Some people believe that they should create different pages for different search engines, each page optimized for one keyword and for one search engine. While I don't recommend creating different pages for different search engines, if you do decide to create such pages, there is one issue that you need to be aware of. These pages, although optimized for different search engines, often turn out to be quite similar to each other. The search engines can now detect when a site has created such similar-looking pages, and they penalize or even ban such sites. To prevent your site from being penalized for spamming, you need to stop each search engine's spider from indexing pages that are not meant for it, i.e. you need to prevent AltaVista from indexing pages meant for Google and vice versa. The best way to do that is to use a robots.txt file.
You should create the robots.txt file using a plain text editor such as Windows Notepad. Don't use a word processor, because it may save formatting information along with the text, and spiders expect a plain text file.
Here is the basic syntax of the robots.txt file:
User-Agent: [Spider Name]
Disallow: [File Name]
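As a minimal sketch of this syntax in Python, the hypothetical helper `make_record` below builds one record: a User-Agent line followed by one Disallow line per file. It is written only for illustration and is not part of any library. The spider name `ExampleBot` and the paths are placeholders.

```python
def make_record(spider: str, paths: list[str]) -> str:
    """Return one robots.txt record: a User-Agent line for `spider`
    followed by one Disallow line per path (hypothetical helper)."""
    lines = [f"User-Agent: {spider}"]
    lines.extend(f"Disallow: {path}" for path in paths)
    return "\n".join(lines) + "\n"


# "ExampleBot" and these paths are placeholders, not a real spider or real files.
print(make_record("ExampleBot", ["/private.html", "/drafts/page.html"]), end="")
```

Each path starts with a slash and is taken relative to the root of your site.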
For instance, to tell AltaVista's spider, Scooter, not to spider the file named myfile1.html in the root directory of the server, you would write
User-Agent: Scooter
Disallow: /myfile1.html
To tell Google's spider, called Googlebot, not to spider the files myfile2.html and myfile3.html, you would write
User-Agent: Googlebot
Disallow: /myfile2.html
Disallow: /myfile3.html
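You can check what records like these mean with Python's standard `urllib.robotparser` module, which parses robots.txt text without fetching anything over the network. Treat the sketch below as an approximation: Python's parser is not the code that AltaVista or Google actually ran. The helper name `parse_robots` is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The Scooter example: one Disallow line.
scooter_rules = parse_robots("User-Agent: Scooter\nDisallow: /myfile1.html\n")
print(scooter_rules.can_fetch("Scooter", "/myfile1.html"))    # False
print(scooter_rules.can_fetch("Googlebot", "/myfile1.html"))  # True: no record names Googlebot

# The Googlebot example: two Disallow lines in one record.
google_rules = parse_robots(
    "User-Agent: Googlebot\nDisallow: /myfile2.html\nDisallow: /myfile3.html\n"
)
print(google_rules.can_fetch("Googlebot", "/myfile2.html"))  # False
print(google_rules.can_fetch("Googlebot", "/myfile3.html"))  # False
print(google_rules.can_fetch("Googlebot", "/index.html"))    # True: not disallowed
```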
You can, of course, put multiple User-Agent lines in the same robots.txt file, each followed by its own Disallow lines, and separate each group from the next with a blank line. Hence, to tell AltaVista not to spider the file named myfile1.html, and to tell Google not to spider the files myfile2.html and myfile3.html, you would write
User-Agent: Scooter
Disallow: /myfile1.html

User-Agent: Googlebot
Disallow: /myfile2.html
Disallow: /myfile3.html
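A quick way to confirm that each spider obeys only the group that names it is to parse the combined file with Python's `urllib.robotparser`. This is again a sketch that uses Python's parser as a stand-in for the real spiders. With this file, Scooter is blocked only from /myfile1.html, and Googlebot only from /myfile2.html and /myfile3.html.

```python
from urllib.robotparser import RobotFileParser

# The combined file from the example; a blank line separates the two groups.
ROBOTS_TXT = """\
User-Agent: Scooter
Disallow: /myfile1.html

User-Agent: Googlebot
Disallow: /myfile2.html
Disallow: /myfile3.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Print a small allowed/blocked table for each spider and file.
for spider in ("Scooter", "Googlebot"):
    for path in ("/myfile1.html", "/myfile2.html", "/myfile3.html"):
        status = "allowed" if parser.can_fetch(spider, path) else "blocked"
        print(f"{spider:10} {path:15} {status}")
```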
If you want to prevent all robots from spidering the file named myfile4.html, you can use the * wildcard character in the User-Agent line, i.e. you would write
User-Agent: *
Disallow: /myfile4.html
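However, the original robots.txt standard does not allow the wildcard character in the Disallow line. Some spiders, Googlebot among them, accept wildcards there as an extension, but you can't count on every spider doing so.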
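The * record can be checked the same way, again using Python's `urllib.robotparser` as a stand-in for real spiders. `SomeOtherBot` is a made-up name. Note that a spider with a User-Agent record of its own follows that record and ignores the * record. The example below has no such named records, so every spider falls back to the * rule.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-Agent: *", "Disallow: /myfile4.html"])

# The * record applies to every spider that has no record of its own.
for spider in ("Scooter", "Googlebot", "SomeOtherBot"):
    print(spider, parser.can_fetch(spider, "/myfile4.html"))  # False for each spider
```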
Once you have created the robots.txt file, upload it to the root directory of your domain, so that it can be reached at an address such as http://www.example.com/robots.txt. Uploading it to a sub-directory won't work: spiders look for the robots.txt file only in the root directory.
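Because spiders look for the file only at the root, you can work out its address from the URL of any page on the site. Here is a minimal sketch using Python's `urllib.parse`. The helper `robots_txt_url` is hypothetical, and www.example.com is a placeholder domain.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the address spiders check for robots.txt: the root of the
    page's scheme and host, whatever directory the page itself is in."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/travel/tourism-in-australia-go.html"))
# http://www.example.com/robots.txt
```

A copy placed at /travel/robots.txt would never be consulted.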
I won't discuss the syntax and structure of the robots.txt file any further; you can get the complete specification from here.
Now we come to how the robots.txt file can be used to keep your site from being penalized for spamming if you are creating different pages for different search engines. What you need to do is prevent each search engine from spidering pages that are not meant for it. For simplicity, let's assume that you are targeting only two keywords, "tourism in Australia" and "travel to Australia", and only three of the major search engines: AltaVista, HotBot and Google.
Now, suppose you have followed this naming convention for the files: each page is named by joining the individual words of its target keyword with hyphens, then appending a hyphen and the first two letters of the name of the search engine for which the page is optimized.
Hence, the files for AltaVista are
tourism-in-australia-al.html
travel-to-australia-al.html
The files for HotBot are
tourism-in-australia-ho.html
travel-to-australia-ho.html
The files for Google are
tourism-in-australia-go.html
travel-to-australia-go.html
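The naming convention can be expressed as a small function. This sketch assumes that file names are lower-cased, as they are in the lists above. `page_filename` is a hypothetical helper written for this illustration.

```python
def page_filename(keyword: str, engine: str) -> str:
    """Join the keyword's words with hyphens, then append a hyphen, the
    first two letters of the search engine's name and .html, all lower-case."""
    words = keyword.lower().split()
    return "-".join(words + [engine[:2].lower()]) + ".html"


print(page_filename("tourism in Australia", "AltaVista"))  # tourism-in-australia-al.html
print(page_filename("travel to Australia", "HotBot"))      # travel-to-australia-ho.html
print(page_filename("travel to Australia", "Google"))      # travel-to-australia-go.html
```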
As I noted earlier, AltaVista's spider is called Scooter and Google's spider is called Googlebot.
A list of spiders for the major search engines can be found here.
Now, we know that HotBot uses Inktomi, and from this list we find that Inktomi's spider is called Slurp.
Using this knowledge, here's what the robots.txt file should contain:
User-Agent: Scooter
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Slurp
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Googlebot
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
When you put the above lines in the robots.txt file, you instruct each search engine not to spider the files meant for the other search engines.
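With more keywords or search engines, writing this file by hand quickly becomes error-prone. It can be generated from the naming convention instead. The sketch below is a hypothetical illustration, not a tool the article refers to. It assumes the lower-case file names and the engine-to-spider mapping given above. It produces the same Disallow lines, in the same order, as the listing, with a blank line between records.

```python
KEYWORDS = ["tourism in Australia", "travel to Australia"]
# Search engine -> spider name, as given in the article.
SPIDERS = {"AltaVista": "Scooter", "HotBot": "Slurp", "Google": "Googlebot"}


def page_filename(keyword: str, engine: str) -> str:
    """'tourism in Australia' + 'AltaVista' -> 'tourism-in-australia-al.html'"""
    return "-".join(keyword.lower().split() + [engine[:2].lower()]) + ".html"


def build_robots_txt(keywords: list[str], spiders: dict[str, str]) -> str:
    """For each engine's spider, disallow every page built for the other engines."""
    records = []
    for engine, spider in spiders.items():
        lines = [f"User-Agent: {spider}"]
        for other in spiders:
            if other == engine:
                continue  # never block a spider from its own pages
            for keyword in keywords:
                lines.append(f"Disallow: /{page_filename(keyword, other)}")
        records.append("\n".join(lines))
    # A blank line separates one record from the next.
    return "\n\n".join(records) + "\n"


print(build_robots_txt(KEYWORDS, SPIDERS), end="")
```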
When you have finished creating the robots.txt file, double-check it carefully for errors. A small error can have disastrous consequences: a search engine may spider files that are not meant for it, in which case it may penalize your site for spamming, or it may not spider any files at all, in which case you won't get top rankings in that search engine.
A useful tool to check the syntax of your robots.txt file can be found here. While it will help you correct syntax errors in the robots.txt file, it won't help you correct logical errors. For those, you will still need to go through the robots.txt file thoroughly, as described above.
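Part of that logical check can be automated, assuming the naming convention above. The sketch below parses the file with Python's `urllib.robotparser` and reports any spider that can fetch another engine's page or is blocked from its own page. Python's parser is only an approximation of how each real spider reads the file, and `find_logic_errors` is a hypothetical helper. For example, if a Disallow line under Slurp accidentally named an -ho.html file, the function would report that Slurp is blocked from that page.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the example above; replace it with your own file's contents.
ROBOTS_TXT = """\
User-Agent: Scooter
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Slurp
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Googlebot
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
"""

# File-name suffix -> the spider those pages are meant for.
SPIDERS = {"al": "Scooter", "ho": "Slurp", "go": "Googlebot"}
KEYWORD_SLUGS = ["tourism-in-australia", "travel-to-australia"]


def find_logic_errors(robots_txt: str) -> list[str]:
    """List every page a spider can wrongly fetch or is wrongly blocked from."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    errors = []
    for spider in SPIDERS.values():
        for suffix, owner in SPIDERS.items():
            for slug in KEYWORD_SLUGS:
                path = f"/{slug}-{suffix}.html"
                allowed = parser.can_fetch(spider, path)
                should_allow = spider == owner  # only the owner may fetch it
                if allowed != should_allow:
                    state = "can fetch" if allowed else "is blocked from"
                    errors.append(f"{spider} {state} {path}")
    return errors


problems = find_logic_errors(ROBOTS_TXT)
print("\n".join(problems) if problems else "No logical errors found.")
```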
