Internetworking 1.2 Header

WORKSHOP

Improving "Searchability" of Your Web Site
Pawan Vora, pvora@uswest.com
U S WEST

Introduction
Putting your Web site up on the Internet is not enough. You also have to make sure that interested people can find it using a variety of search engines. Obviously, the first thing you need to do is register your site with search engines such as Yahoo, Excite, AltaVista, Infoseek, and so forth. You can also take advantage of one of the several Web site promotion services, such as Submit It, AAA Internet Promotions, and WebPromote, that help you register your Web site with search engines; some of them charge a fee for this service. You can get an extensive list on the Announcement Services page at Yahoo (Path: Yahoo Home : Computers and Internet : Internet : World Wide Web : Announcement Services).

In addition to registering your Web site with search engines, you also need to make sure that the pages at your Web site are searchable not just by the search engines on the Internet, but also by the search engine you are using at your site. Below, I describe what you need to do to improve the "searchability" of your Web site. But before we get into the specifics, it is important to understand how search engines work.

How do search engines work?
When we search for a keyword in a book, we look at the index, find the keyword, remember the pages the keyword appears on and turn to those pages to get to the desired information. Conceptually, search engines work the same way. However, instead of page numbers, they store the URLs (Uniform Resource Locators; or Web site addresses) for the keywords. When we search on a keyword or a key-phrase, a search engine looks at its index to find the keyword (or a key-phrase) and associated Web site addresses and gives us a list of those addresses in some organized fashion.
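The book-index analogy can be made concrete with a minimal sketch. This is an illustration only, not how any particular search engine is implemented; the function names `build_index` and `search` and the example.com pages are hypothetical.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            # Drop surrounding punctuation so "tutorials." matches "tutorials".
            index[word.strip('.,;:!?"()')].add(url)
    return index

def search(index, phrase):
    """Return the URLs that contain every word of the phrase, sorted."""
    words = phrase.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

# Hypothetical pages, used only for illustration.
pages = {
    "http://example.com/vrml": "VRML training courses and tutorials.",
    "http://example.com/html": "HTML training for beginners.",
}
index = build_index(pages)
print(search(index, "VRML training"))
```

Like the index at the back of a book, the lookup never rereads the pages themselves; it only consults the word-to-address table built in advance.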

To create such an index, most search engines use software programs called spiders (also referred to as crawlers, robots, or bots). These spiders "roam" the Web, look at a Web page, index the content of that page, follow the links on that page to go to other pages, index their contents, and continue following links and indexing the content of Web pages.
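The roaming behavior described above can be sketched as a breadth-first walk over links. To keep the example self-contained, it assumes a small in-memory link table (`LINKS`, hypothetical) instead of fetching real pages, and the `crawl` function is an illustrative name rather than any real spider's code.

```python
from collections import deque

def crawl(start_url, fetch_links):
    """Visit pages breadth-first from start_url, following each link once."""
    seen = {start_url}
    queue = deque([start_url])
    visited_order = []
    while queue:
        url = queue.popleft()
        visited_order.append(url)  # a real spider would index the page here
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited_order

# Hypothetical link graph standing in for the Web.
LINKS = {
    "/home": ["/products", "/about"],
    "/products": ["/home", "/vrml-training"],
    "/about": [],
    "/vrml-training": ["/home"],
    "/orphan": [],  # no page links here, so the spider never finds it
}
order = crawl("/home", lambda url: LINKS.get(url, []))
print(order)
```

Note that `/orphan` is never visited: a page that nothing links to cannot be discovered this way, which is one reason registering a site with the search engines matters.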

Search engines are not created equal!
Although most search engines (spiders) use the method described above to index Web pages, what they index on a Web page varies. Some index only the title of a page, some the text of the page, some a summary of the site provided by the Web site designer, and some just the first 100 words of the page; others index all of the above.

Some engines index specially marked tags called meta tags. Meta tags provide information about the content of a Web page; hence the name "META" tags (information about information). META tags may include a short description of the site's content, list relevant keywords, or identify the author(s) of the Web page. There are other uses of META tags as well. (For more information about META tags, see the META tag tutorial at WebDeveloper.com.)

META tags do not just provide the keywords for the spiders. They also provide the information spiders need to determine the relative usefulness of that page for a given keyword; this process is referred to as ranking. However, not all search engines rely on META information to rank Web pages/sites. For example, AltaVista, Infoseek, and HotBot do rely on meta tags to rank sites, whereas Yahoo, Excite, and Lycos do not.

To summarize, you must remember that different search engines use different methods to index and rank Web sites. And to make your Web site searchable, you should try to understand the needs of different search engines and satisfy them to the extent possible.

Guidelines for improving "searchability"
Results of most search engines number in hundreds or thousands of matching web pages. In most cases, only the 10 most "relevant" matches are displayed first. Because most people don't really pay attention beyond the first 10 or 20 hits, being listed beyond them means that many people may miss your web site. The guidelines below will help you improve searchability both for external search engines and the engines that you are using locally on your Web site.

Use META tags for Keywords and Description
This is probably the most important requirement for getting a page indexed (and searched) effectively. The two most important META tags are: KEYWORDS and DESCRIPTION.

Here's how they are used in the HEAD tag of an HTML document:

<META NAME="DESCRIPTION" CONTENT="A 1-3 sentence description of the page/site.">
<META NAME="KEYWORDS" CONTENT="keyword1, keyword2, etc.">

The order of the META tags is important because Excite uses the first meta tag to summarize a "hit" on the results page.

As mentioned before, Meta tags are not a panacea for search engine problems. Although HotBot and Infoseek do give a slight edge to pages with KEYWORDS in their meta tags, Excite doesn't read them at all; however, Excite does use the DESCRIPTION meta tag as a summary of the page in the "hits" list.
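To see roughly what a spider that honors these tags has to work with, here is a small sketch that reads the DESCRIPTION and KEYWORDS values from a page using Python's standard `html.parser` module. The class name `MetaTagReader`, the helper `read_meta`, and the sample page are assumptions made for illustration; real search engines may read the tags differently.

```python
from html.parser import HTMLParser

class MetaTagReader(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)  # attribute names arrive lowercased
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content

def read_meta(html):
    """Return a dict such as {"description": ..., "keywords": ...}."""
    reader = MetaTagReader()
    reader.feed(html)
    return reader.meta

# Hypothetical page, used only for illustration.
SAMPLE = """<HTML><HEAD>
<TITLE>VRML Training</TITLE>
<META NAME="DESCRIPTION" CONTENT="Hands-on VRML training courses.">
<META NAME="KEYWORDS" CONTENT="VRML training, VRML courses">
</HEAD><BODY></BODY></HTML>"""

meta = read_meta(SAMPLE)
keywords = [k.strip() for k in meta["keywords"].split(",")]
print(meta["description"])
print(keywords)
```

Running a check like this over your own pages is a quick way to spot pages whose DESCRIPTION or KEYWORDS tags are missing.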

Pick keywords strategically
Consider how people will look for your Web page. Imagine the words they will type into the search field and use them as your strategic keywords. For example, say you have a page devoted to VRML training. Anytime someone types "VRML training," you want your page to be in the top 10 results. Your strategic keyword for that page is then "VRML training". Note that each page on your Web site will have different strategic keywords that reflect the page's content.

Your strategic keywords should always be at least two words long. Usually, too many sites will be relevant for a single word, such as "Internet" or "Web." You may feel that adding more words lowers your odds of being found, but competing for a single common word is a waste of time. Pick phrases of two or more words, and you'll have a better shot at success.

Position your keywords on the page strategically
Make sure that your strategic keywords appear in crucial locations on your Web pages. The most important place is the page title. Make sure that the most important keyword appears in the page title; for example, "ITG Publication: Internetworking." However, do not put a list of keywords in the title of the page.

Also, the closer the keywords are to the top of the page, the higher the relevance of the page for those keywords. Therefore, it's better to position the strategic keywords in the first paragraph of the page. Remember that tables, JavaScript, and so on can push the text further down the page, making keywords less relevant because they appear lower on the page. Also, some search engines index only, say, the first 4K of the file or the first 500 words on a page. If a large block of JavaScript is embedded near the top of a page, the content of the page may fall beyond that limit and not be indexed!
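A rough way to check whether a keyword survives such a cutoff is to measure how far into the file it first appears. The sketch below assumes a 4096-byte limit purely for illustration (actual limits vary by engine), and the function names and sample pages are hypothetical. It searches the raw file, so a match inside a tag such as TITLE also counts.

```python
def keyword_offset(html, phrase):
    """Return the byte offset of the first case-insensitive match, or -1."""
    return html.lower().encode("utf-8").find(phrase.lower().encode("utf-8"))

def within_limit(html, phrase, limit_bytes=4096):
    """True if the phrase starts inside the first limit_bytes of the file."""
    offset = keyword_offset(html, phrase)
    return 0 <= offset < limit_bytes

# Hypothetical script block of roughly 4.9K, used only for illustration.
script = "<SCRIPT>" + "x = 1;\n" * 700 + "</SCRIPT>"
text = "<P>VRML training courses</P>"

page_script_first = "<HTML><BODY>" + script + text + "</BODY></HTML>"
page_text_first = "<HTML><BODY>" + text + script + "</BODY></HTML>"

print(within_limit(page_script_first, "VRML training"))
print(within_limit(page_text_first, "VRML training"))
```

The two pages contain exactly the same content; only the order differs, yet in the first one the keyword falls past the assumed cutoff.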

Have relevant content
Making sure that page titles, keywords, and descriptions are relevant to the page content is not sufficient. It is important that the content on the page is relevant as well. Your keywords must be reflected in the page's content. What this means is that you must have HTML text on your page.

Sometimes sites present information mainly via graphics. These may be good to look at, but search engines can't read graphics, so they miss the text that might make your site more relevant. Although it is important to provide ALT text for every graphic on your Web page (and you should do it!), many search engines don't index the ALT text. Therefore, it is important that you use HTML text whenever possible.
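Checking a page for graphics that lack ALT text is easy to automate. The following is a minimal sketch using Python's standard `html.parser`; the class name `AltChecker`, the helper `images_missing_alt`, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class AltChecker(HTMLParser):
    """Record the SRC of every IMG tag that has no ALT text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)  # attribute names arrive lowercased
        if not (attributes.get("alt") or "").strip():
            self.missing.append(attributes.get("src") or "")

def images_missing_alt(html):
    checker = AltChecker()
    checker.feed(html)
    return checker.missing

# Hypothetical markup, used only for illustration.
SAMPLE = '<IMG SRC="logo.gif"><IMG SRC="map.gif" ALT="Site map">'
print(images_missing_alt(SAMPLE))
```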

Link to related content at least on your Web site
Expand your text references, where appropriate. Make links to pages related to the current page. It will help reinforce the strategic keywords on your page. Also, if possible, do not restrict the links only to the content on your Web site. Provide links to related information on other Web sites. Users coming to your Web site are interested in useful and relevant information, and they want to get there quickly! They will return to your Web site if they think you are doing a good job of helping them get to the useful information easily. (This is not very different from how Nordstrom helps its customers by suggesting other stores where they can find the desired merchandise.)

Make sure that the most important page in a Web site (or subsite) is well-linked
Even after trying all the methods to increase a Web page's relevance, it is quite likely that the first page that the user sees is not that site's or sub-site's home page (or the most important page in the Web site); for example, a user looking for ITG (Internet Technical Group) may not see the ITG home page first. However, if you offer a link to ITG's home page on every page of ITG's Web site, the chances of the user reaching it are higher.

Another advantage is that some search engines (e.g., WebCrawler) factor in link popularity when ranking Web pages. Pages that have a lot of links pointing at them (e.g., your home page) are given a higher ranking.
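Link popularity within your own site can be estimated by counting how many pages link to each page. This is only a sketch of the counting idea, not the ranking formula of WebCrawler or any other engine; the function name `inbound_link_counts` and the site map `SITE` are hypothetical.

```python
from collections import Counter

def inbound_link_counts(links):
    """Count how many distinct pages link to each target page."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):  # count each linking page once
            if target != source:     # ignore links from a page to itself
                counts[target] += 1
    return counts

# Hypothetical site in which every page links back to the home page.
SITE = {
    "/index.html": ["/news.html", "/members.html"],
    "/news.html": ["/index.html"],
    "/members.html": ["/index.html", "/news.html"],
    "/archive.html": ["/index.html"],
}
counts = inbound_link_counts(SITE)
print(counts.most_common())
```

Because every page in this example links back to `/index.html`, the home page ends up with the most inbound links.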

Make sure that the title of the page is unique
Pages with keywords appearing in the title are assumed to be more relevant to the topic than others. But this doesn't mean that you should fill the title with a list of keywords. Just make sure that the title of the page is descriptive enough that users know the content of the page from the title, because that's what search engines list in their hits. Having several pages with the same title doesn't help users know which page is most relevant for their purpose. A good page title also helps users find your page again if they bookmark it.
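One way to find pages that share a title is to group them by the text of their TITLE tag. The sketch below uses a simple regular expression, which is an assumption that works for ordinary pages but is not a full HTML parser; the names `find_duplicate_titles` and `PAGES` are hypothetical.

```python
import re
from collections import defaultdict

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def find_duplicate_titles(pages):
    """Return {title: [urls]} for every title used by more than one page."""
    by_title = defaultdict(list)
    for url, html in pages.items():
        match = TITLE_RE.search(html)
        # Collapse runs of whitespace; pages without a TITLE share "".
        title = " ".join(match.group(1).split()) if match else ""
        by_title[title].append(url)
    return {t: sorted(urls) for t, urls in by_title.items() if len(urls) > 1}

# Hypothetical pages, used only for illustration.
PAGES = {
    "/a.html": "<HTML><HEAD><TITLE>ITG</TITLE></HEAD></HTML>",
    "/b.html": "<HTML><HEAD><TITLE>ITG</TITLE></HEAD></HTML>",
    "/c.html": "<HTML><HEAD><TITLE>ITG Publication: Internetworking</TITLE></HEAD></HTML>",
}
print(find_duplicate_titles(PAGES))
```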

Repeat the keywords throughout the page
Frequency is another factor in how search engines determine relevancy. A search engine will analyze how often keywords appear in relation to other words in a Web page. Pages with a higher keyword frequency are often deemed more relevant than other Web pages.
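Keyword frequency in this sense can be approximated as the number of times a phrase occurs divided by the total number of words on the page. The exact formula each search engine uses is not given here, so treat this as one plausible sketch; `keyword_frequency` and the sample text are hypothetical.

```python
import re

def keyword_frequency(text, phrase):
    """Return occurrences of phrase divided by the total word count."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    target = re.findall(r"[a-z0-9]+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Slide a window of n words across the page and count exact matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits / len(words)

# Hypothetical page text, used only for illustration.
TEXT = "VRML training for beginners. Our VRML training covers worlds and nodes."
print(keyword_frequency(TEXT, "VRML training"))
```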

Be careful when using frames
Most search engines are like users with old browsers; they don't understand frames. They ignore all the links in the FRAMESET tags and instead see only the information in the NOFRAMES tags. However, DO NOT include the following information in the NOFRAMES tags:

Viewing this page requires browser capable of displaying frames.

To summarize, if you are using frames, use NOFRAMES tags to provide an alternative for search engines. Also, consider providing a non-frames version of the site so that all the information gets catalogued. Providing NOFRAMES content on your page will also make it more accessible to people using browsers that don't support frames and to those who use special devices to access the Web. For example, people with vision deficiencies may use screen readers to read the Web page content; screen readers don't understand frames.

Restrict Web pages that you don't want to be indexed
This is not very difficult. If you don't want a page on your Web site to be indexed, simply insert the following META tag.

<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">

This means that spiders should NOT INDEX the page and should NOT FOLLOW any of the links within it. If you just want to exclude a page from being indexed, but want the bot to continue following the links within it, change the meta tag to:

<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">

Though this tag is not yet supported by all search engines, the major ones do support it.
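How a cooperating spider might interpret the CONTENT value of the ROBOTS tag can be sketched in a few lines. The function name `parse_robots_meta` is hypothetical, and the sketch handles only the NOINDEX and NOFOLLOW directives discussed above, assuming that a page with neither directive may be indexed and followed.

```python
def parse_robots_meta(content):
    """Return (may_index, may_follow) for a ROBOTS META CONTENT value."""
    directives = {d.strip().lower() for d in content.split(",")}
    may_index = "noindex" not in directives
    may_follow = "nofollow" not in directives
    return may_index, may_follow

print(parse_robots_meta("NOINDEX,NOFOLLOW"))
print(parse_robots_meta("NOINDEX,FOLLOW"))
```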

Another way to achieve the same effect is to use a robots.txt file on your Web server. Like the ROBOTS META tag, a robots.txt file tells robots (or spiders) what they may and may not index within a site. For instance, if I didn't want my Web site to be indexed at all, I would put the following in the robots.txt file:

User-agent: *
Disallow: /

Not all spiders follow what is referred to as the Robots Exclusion protocol, but most do. For more information on the Robots Exclusion protocol, see The Web Robots Pages.
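You can test what a well-behaved spider would make of your robots.txt rules before publishing them. The sketch below uses Python's standard `urllib.robotparser` module; the helper name `allowed`, the robot name "AnyBot", the example.com URLs, and the `/private/` directory are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, user_agent, url):
    """True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Blocks the whole site for every robot.
BLOCK_ALL = """User-agent: *
Disallow: /
"""

# Hypothetical rule that blocks only one directory.
BLOCK_PRIVATE = """User-agent: *
Disallow: /private/
"""

print(allowed(BLOCK_ALL, "AnyBot", "http://example.com/index.html"))
print(allowed(BLOCK_PRIVATE, "AnyBot", "http://example.com/private/a.html"))
print(allowed(BLOCK_PRIVATE, "AnyBot", "http://example.com/index.html"))
```

Keep in mind that this only shows how a compliant robot would behave; a spider that ignores the protocol is not stopped by the file.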

DO NOT spam the search engines
"Spamdexing" means spamming the search engines. Some Web authors try to improve a Web site's ranking by repeating keywords in a meta tag several times, or by including keywords that have absolutely nothing to do with the content of their pages. Search engines may penalize such pages or exclude them from the index if they detect search engine spamming. They watch for common spamming methods in a variety of ways, not least by following up on complaints. The following practices are considered spamming (or "spamdexing"):

  • Repeating a word hundreds of times in a row on a page to increase its frequency and propel the page higher in the listings.
  • Hiding text. Some Web page designers try to spam search engines by repeating keywords in a tiny font or in the same color as the background, making the text invisible to people viewing the page. Many search engines now recognize this trick and don't index such pages. Be sure that your HTML text is "visible."


© Internet Technical Group
Last update: September 8, 1998
URL: http://www.sandia.gov/itg/newsletter/sep98/workshop_search_guidelines.html
hosted by Sandia National Labs

Disclaimer: Neither Sandia Corporation, the United States Government, nor any agency thereof, nor any of their employees makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately-owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by Sandia Corporation, the United States Government, or any agency thereof. The views and opinions expressed herein do not necessarily state or reflect those of Sandia Corporation, the United States Government or any agency thereof.