WORKSHOP
Improving "Searchability" of Your Web Site
Introduction
In addition to registering your Web site with search engines, you also need to make sure that the pages at your Web site are searchable, not just by the search engines on the Internet but also by the search engine you are using at your site. Below, I describe what you need to do to improve the "searchability" of your Web site. But before we get into the specifics, it is important to understand how search engines work.
How do search engines work?
To create an index of the Web pages they can search, most search engines use software programs called spiders (also referred to as crawlers, robots, or bots). These spiders "roam" the Web: they look at a Web page, index the content of that page, follow the links on that page to other pages, index their contents, and continue following links and indexing the content of Web pages.
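The crawl loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular search engine works: it assumes a tiny in-memory "Web" (the `PAGES` dictionary, its URLs, and the `crawl` function are all hypothetical), so it needs no network access.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical three-page "Web" used only for illustration.
PAGES = {
    "/index.html": '<a href="/a.html">A</a> <a href="/b.html">B</a> Welcome home',
    "/a.html": '<a href="/index.html">Home</a> About spiders',
    "/b.html": "A page about robots",
}


class LinkAndTextParser(HTMLParser):
    """Collects link targets and visible words from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl(start):
    """Visit pages breadth-first from `start`; return {word: set of URLs}."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        parser = LinkAndTextParser()
        parser.feed(PAGES.get(url, ""))
        parser.close()
        # Index the content of this page.
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        # Follow links to pages not visited yet.
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("/index.html")
print(sorted(index["about"]))  # ['/a.html', '/b.html']
```

Looking up a word in `index` returns the pages that contain it. A page that nothing links to would never be reached by this loop.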
Search engines are not created equal!
Some engines index specially marked tags called meta tags. Meta tags provide information about the content of a Web page, hence the name "META" tags (information about information). META tags may include a short description of the site's content, list relevant keywords, or identify the author(s) of the Web page. There are other uses of META tags as well (for more information, see the META tag tutorial at WebDeveloper.com).
META tags do not just provide keywords for the spiders. They also provide information that helps spiders determine the relative usefulness of a page for a given keyword, a process referred to as ranking. However, not all search engines rely on META information to rank Web pages or sites. For example, AltaVista, Infoseek, and HotBot do rely on meta tags to rank sites, whereas Yahoo, Excite, and Lycos do not.
To summarize, different search engines use different methods to index and rank Web sites. To make your Web site searchable, you should try to understand the needs of the different search engines and satisfy them to the extent possible.
Guidelines for improving "searchability"
Use META tags for Keywords and Description
Here's how they are used in the HEAD section of an HTML document:
<META NAME="DESCRIPTION" CONTENT="A 1-3 sentence description of the page/site.">
The order of the META tags is important because Excite uses the first meta tag to summarize a "hit" on the results page. As mentioned before, meta tags are not a panacea for search engine problems. Although HotBot and Infoseek do give a slight edge to pages with KEYWORDS in their meta tags, Excite doesn't read the KEYWORDS tag at all; however, Excite does use the DESCRIPTION meta tag as the summary of the page in the "hits" list.
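To check which META tags a page exposes, and in what order, a short script along the following lines may help. It is a sketch using Python's standard `html.parser` module. The sample page, its title, description, and keywords, and the `MetaTagParser` class are all made up for illustration, and it does not reflect how any particular engine reads the tags.

```python
from html.parser import HTMLParser

# Hypothetical page head; the title, description, and keywords are made up.
SAMPLE = """<HTML><HEAD>
<TITLE>Widget Repair Guide</TITLE>
<META NAME="DESCRIPTION" CONTENT="How to repair widgets at home.">
<META NAME="KEYWORDS" CONTENT="widget repair, home repair">
</HEAD><BODY>...</BODY></HTML>"""


class MetaTagParser(HTMLParser):
    """Records META NAME/CONTENT pairs in the order they appear."""

    def __init__(self):
        super().__init__()
        self.meta = []  # list of (name, content), document order preserved

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "meta":
            fields = dict(attrs)
            if fields.get("name") and fields.get("content") is not None:
                self.meta.append((fields["name"].lower(), fields["content"]))


parser = MetaTagParser()
parser.feed(SAMPLE)
parser.close()

first_name, first_content = parser.meta[0]
keywords = [k.strip() for k in dict(parser.meta)["keywords"].split(",")]
print(first_name, "->", first_content)
print(keywords)
```

Because the list keeps document order, `parser.meta[0]` shows which tag an engine that summarizes from the first meta tag would pick up.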
Pick keywords strategically
Your strategic keywords should always be at least two words long. Usually, too many sites will be relevant for a single word, such as "Internet" or "Web." You may feel that adding more words lowers your odds of being found, but competing for a single word means fighting much longer odds, so don't waste your time on it. Pick phrases of two or more words, and you'll have a better shot at success.
Position your keywords on the page strategically
The closer the keywords are to the top of the page, the higher the relevance of the page for those keywords. Therefore, it's better to position your strategic keywords in the first paragraph of the page. Remember that tables, JavaScript, etc. can push the text further down the page, making keywords less relevant because they appear lower on the page. Also, some search engines will index only, say, the first 4K of the file or the first 500 words on a page. If you have JavaScript embedded on a page, the content of the page may not be indexed!
Have relevant content
Making sure that page titles, keywords, and descriptions are relevant to the page content is not sufficient. The content on the page must be relevant as well: your keywords must be reflected in the page's content. This means that you must have HTML text on your page. Some sites present information mainly via graphics. These may be good to look at, but search engines can't read graphics, so they miss the text that might make your site more relevant. Although it is important to provide ALT text for every graphic on your Web page (and you should do it!), many search engines don't index ALT text. Therefore, it is important to use HTML text whenever possible.
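Both points above (a phrase must exist as text in the file, and it should appear early) can be checked roughly with a script. The sketch below assumes the "first 4K" figure mentioned in the text as an example limit. The function names, the sample pages, and the idea of measuring raw byte offsets are illustrative assumptions, not a description of any engine's actual rules. It also matches phrases inside tags, so treat the result as a rough signal only.

```python
INDEX_LIMIT = 4 * 1024  # example cutoff: the "first 4K" figure from the text


def phrase_offset(html_source, phrase):
    """Return the byte offset of the first occurrence of phrase, or -1."""
    data = html_source.lower().encode("utf-8")
    return data.find(phrase.lower().encode("utf-8"))


def within_limit(html_source, phrase, limit=INDEX_LIMIT):
    """True if the whole phrase ends inside the first `limit` bytes."""
    offset = phrase_offset(html_source, phrase)
    return offset != -1 and offset + len(phrase.encode("utf-8")) <= limit


# Hypothetical pages: the same paragraph, with and without a long script first.
lean_page = "<HTML><BODY><P>Widget repair made easy.</P></BODY></HTML>"
script = "<SCRIPT>" + "var x = 0; " * 500 + "</SCRIPT>"
heavy_page = "<HTML><BODY>" + script + "<P>Widget repair made easy.</P></BODY></HTML>"

print(within_limit(lean_page, "widget repair"))   # True
print(within_limit(heavy_page, "widget repair"))  # False
```

A page that presents the phrase only inside a graphic has no matching text at all, so `phrase_offset` returns -1 for it.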
Link to related content at least on your Web site
Make sure that the most important page in a Web site (or subsite) is well-linked
Another advantage is that some search engines (e.g., WebCrawler) factor in link popularity when ranking Web pages. Pages that have a lot of links pointing at them (e.g., your home page) are given a higher ranking.
Make sure that the title of the page is unique
Repeat the keywords throughout the page
Be careful when using frames
Viewing this page requires browser capable of displaying frames.
To summarize, if you are using frames, start using NOFRAMES tags to provide an alternative for search engines. Also, consider providing a non-frames version of the site so that all the information gets catalogued. Providing NOFRAMES content on your page will also make it more accessible for people using browsers that don't support frames and for those who use special devices to access the Web. For example, people with vision deficiencies may use screen readers to read the Web page content; screen readers don't understand frames.
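One way to spot frameset pages that still lack NOFRAMES content is a small audit script. The following is a sketch under stated assumptions: the `FrameAudit` class, the `needs_noframes` function, and both sample pages are hypothetical, and the check only looks for non-blank content inside NOFRAMES, not for whether that content is useful.

```python
from html.parser import HTMLParser


class FrameAudit(HTMLParser):
    """Notes whether a page uses FRAMESET and whether NOFRAMES has content."""

    def __init__(self):
        super().__init__()
        self.has_frameset = False
        self.in_noframes = False
        self.noframes_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "frameset":
            self.has_frameset = True
        elif tag == "noframes":
            self.in_noframes = True

    def handle_endtag(self, tag):
        if tag == "noframes":
            self.in_noframes = False

    def handle_data(self, data):
        # Keep any non-blank content found between <NOFRAMES> and </NOFRAMES>.
        if self.in_noframes and data.strip():
            self.noframes_text.append(data.strip())


def needs_noframes(html_source):
    """True if the page uses frames but offers no NOFRAMES alternative."""
    audit = FrameAudit()
    audit.feed(html_source)
    audit.close()
    return audit.has_frameset and not audit.noframes_text


# Hypothetical frameset pages; file names and text are made up.
bare = '<FRAMESET COLS="20%,80%"><FRAME SRC="menu.html"><FRAME SRC="main.html"></FRAMESET>'
with_alt = (
    '<FRAMESET COLS="20%,80%"><FRAME SRC="menu.html"><FRAME SRC="main.html">'
    "<NOFRAMES><BODY><P>Widget repair guide: see the site index.</P></BODY></NOFRAMES>"
    "</FRAMESET>"
)

print(needs_noframes(bare))      # True
print(needs_noframes(with_alt))  # False
```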
Restrict Web pages that you don't want to be indexed
To keep a page out of the index, add the following META tag to the page:
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
This means that spiders should NOT INDEX the page and should NOT FOLLOW any of the links within it. If you just want to exclude a page from being indexed, but want the bot to continue following the links included within it, change the meta tag to:
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">
Although this tag is not yet supported by all search engines, the major ones do support it.
Another way to achieve the same effect is to use a robots.txt file on your Web server. Like the ROBOTS META tag, the robots.txt file tells robots (or spiders) what they may and may not index within a site. For instance, if I didn't want my Web site to be indexed at all, I would put the following in the robots.txt file:
User-agent: *
Disallow: /
Not all spiders follow what is referred to as the Robots Exclusion Protocol, but most do. For more information on the Robots Exclusion Protocol, see The Web Robots Pages.
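Before publishing a robots.txt file, it can be useful to test how a well-behaved robot would interpret it. The sketch below uses Python's standard `urllib.robotparser` module. The rules, the `/private/` directory, the "ExampleBot" name, and the example.com URLs are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every robot out of /private/ only.
PARTIAL = """\
User-agent: *
Disallow: /private/
"""

# Rules that exclude the entire site for every robot.
EVERYTHING = """\
User-agent: *
Disallow: /
"""


def load_rules(text):
    """Parse robots.txt text without fetching anything from the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


partial = load_rules(PARTIAL)
everything = load_rules(EVERYTHING)

# "ExampleBot" and the URLs below are placeholders.
print(partial.can_fetch("ExampleBot", "http://www.example.com/index.html"))          # True
print(partial.can_fetch("ExampleBot", "http://www.example.com/private/notes.html"))  # False
print(everything.can_fetch("ExampleBot", "http://www.example.com/index.html"))       # False
```

This only shows what a robot that honors the protocol would do. As noted above, not every spider does.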
Search engines may penalize pages, or exclude them from the index, if they detect search engine spamming. The following practices are considered spamming (or "spamdexing"):
© Internet Technical Group
Last update: September 8, 1998
URL: http://www.sandia.gov/itg/newsletter/sep98/workshop_search_guidelines.html
hosted by Sandia National Labs
Disclaimer: Neither Sandia Corporation, the United States Government, nor any agency thereof, nor any of their employees makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately-owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by Sandia Corporation, the United States Government, or any agency thereof. The views and opinions expressed herein do not necessarily state or reflect those of Sandia Corporation, the United States Government or any agency thereof.