Robots.txt File

We all know search engine optimization is a tricky business. Sometimes we rank well on one engine for a particular keyphrase and assume that all search engines will like our pages, and hence we will rank well for that keyphrase on a number of engines. Unfortunately this is rarely the case. All the major search engines differ somewhat, so what gets you ranked high on one engine may actually help to lower your ranking on another.

It is for this reason that some people like to optimize pages for each particular search engine. Usually these pages differ only slightly, but that slight difference can make all the difference when it comes to ranking high.

However, search engine spiders crawl through sites indexing every page they can find. A spider might therefore come across your engine-specific optimized pages and, because they are very similar, conclude that you are spamming it. It will then do one of two things: ban your site altogether, or severely punish you with lower rankings.

So what can you do to, say, stop Google from indexing pages that are meant for AltaVista? Well, the solution is really quite simple, and I'm surprised that more webmasters who optimize for each search engine don't use it. It's done using a robots.txt file, which resides on your webspace.

A robots.txt file is a vital part of any webmaster's battle against getting banned or punished by the search engines if he or she designs different pages for different search engines.

The robots.txt file is just a simple text file, as the file extension suggests. Create it with a simple text editor like Notepad (or WordPad, saved as plain text). Complicated word processors such as Microsoft Word add their own formatting, which will corrupt the file.

Here's the code you need to put in the file for it to work:

The field names User-Agent: and Disallow: are compulsory and never change. Replace the parts in brackets with the name of the engine's spider and the file you want to keep it away from.

User-Agent: (Spider Name)
Disallow: (File Name)

User-Agent is the name of the search engine's spider. Disallow is the file you don't want that spider to crawl, written as its path from the root of your site (starting with /).

As for case sensitivity, the field names User-Agent and Disallow are not case-sensitive, but the file paths after Disallow are. Make sure each path matches the capitalization of the real file exactly. Capitalizing the U and A in User-Agent and the D in Disallow, as in the code above, is the usual convention and works fine.
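To see how one real parser treats capitalization, here is a minimal sketch using Python's standard urllib.robotparser module. This module follows the original robots.txt rules; the search engines run their own parsers, which may differ in details. The domain www.example.com is a placeholder. The sketch parses the same rule written with capitals and in lower case, then tries a URL whose path is capitalized differently.

```python
from urllib.robotparser import RobotFileParser

# The same rule twice: once with the usual capitals, once all in lower case.
CAPITALISED = """\
User-Agent: Googlebot
Disallow: /advertising-secrets-al.html
"""

LOWER_CASE = """\
user-agent: Googlebot
disallow: /advertising-secrets-al.html
"""


def blocks_page(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text stops `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/advertising-secrets-al.html"  # placeholder URL
    print(blocks_page(CAPITALISED, "Googlebot", page))  # True
    print(blocks_page(LOWER_CASE, "Googlebot", page))   # True
    # The path is case-sensitive, so a differently capitalized URL is not blocked.
    print(blocks_page(CAPITALISED, "Googlebot",
                      "http://www.example.com/Advertising-Secrets-AL.html"))  # False
```

Both spellings of the field names block the page. Changing the capitalization of the file name slips past the rule.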

You have to start a new block of code for each engine. If you want to disallow several files for the same engine, you can list them one under another. For example:

User-Agent: Slurp   # Inktomi's spider
Disallow: /internet-marketing-gg.html
Disallow: /internet-marketing-al.html
Disallow: /advertising-secrets-gg.html
Disallow: /advertising-secrets-al.html

In the above code, I have told Inktomi not to spider two pages optimized for Google (internet-marketing-gg.html and advertising-secrets-gg.html) and two pages optimized for AltaVista (internet-marketing-al.html and advertising-secrets-al.html). If Inktomi were allowed to spider these pages as well as the pages made specifically for Inktomi, I would run the risk of being banned or penalized. That's why it's always a good idea to use a robots.txt file.
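You can test a block like this before it goes live. Here is a minimal sketch using Python's standard urllib.robotparser module, which implements the original robots.txt rules; the engines' own parsers may differ in details. Some names here are assumptions:

- www.example.com is a placeholder domain.
- internet-marketing-ink.html is a hypothetical page optimized for Inktomi.
- The Disallow paths are written from the site root.

```python
from urllib.robotparser import RobotFileParser

SITE = "http://www.example.com"  # placeholder domain

# The Slurp block, with each path written from the site root.
ROBOTS_TXT = """\
User-Agent: Slurp   # Inktomi's spider
Disallow: /internet-marketing-gg.html
Disallow: /internet-marketing-al.html
Disallow: /advertising-secrets-gg.html
Disallow: /advertising-secrets-al.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # the "#" comment is stripped while parsing

if __name__ == "__main__":
    # internet-marketing-ink.html is a hypothetical Inktomi-optimized page.
    for agent in ("Slurp", "Googlebot"):
        for page in ("internet-marketing-gg.html", "internet-marketing-ink.html"):
            verdict = "allowed" if parser.can_fetch(agent, f"{SITE}/{page}") else "blocked"
            print(f"{agent:<10} {page:<28} {verdict}")
```

Only Slurp's request for the -gg page comes back blocked. Googlebot has no block of its own in this file, so nothing stops it.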

I mentioned earlier that the robots.txt file resides on your webspace, but where on your webspace? In the root directory. If you upload the file to a sub-directory, it will not work. To block certain engines from files that are not in your root directory, include the directory in the path and then list the file as normal. For example:

User-Agent: Slurp   # Inktomi's spider
Disallow: /folder/internet-marketing-gg.html
Disallow: /folder/internet-marketing-al.html
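Here is a minimal Python sketch of both points: where a spider looks for the file, and how a path into a sub-directory is matched. The helper robots_url is a hypothetical name for illustration, and www.example.com is a placeholder. The sketch relies on the standard urllib.parse.urljoin and urllib.robotparser, which follow the original rules rather than any one engine's parser.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Hypothetical helper: robots.txt lives at the root of the host, whatever page you start from."""
    return urljoin(page_url, "/robots.txt")


ROBOTS_TXT = """\
User-Agent: Slurp
Disallow: /folder/internet-marketing-gg.html
Disallow: /folder/internet-marketing-al.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    page = "http://www.example.com/folder/internet-marketing-gg.html"  # placeholder URL
    print(robots_url(page))                 # http://www.example.com/robots.txt
    print(parser.can_fetch("Slurp", page))  # False: the page inside /folder/ is blocked
    # A file of the same name in the root directory is not covered by these rules.
    print(parser.can_fetch("Slurp", "http://www.example.com/internet-marketing-gg.html"))  # True
```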

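To keep all engines away from a file, use the * character where the engine's name would usually go. Be aware, however, that the original robots.txt standard does not treat * as a wildcard on the Disallow line. Some engines, Google among them, now accept it there, but don't rely on every engine doing so.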
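Here is a minimal sketch of a * block, checked with Python's standard urllib.robotparser. The URL is a placeholder, and the spider names come from the list in this article.

```python
from urllib.robotparser import RobotFileParser

PAGE = "http://www.example.com/advertising-secrets-gg.html"  # placeholder URL

# "*" in place of a spider name: this block applies to every engine.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /advertising-secrets-gg.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    for agent in ("Googlebot", "Slurp", "Scooter", "ArchitextSpider"):
        print(agent, parser.can_fetch(agent, PAGE))  # False for every agent
```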

Here are the spider names of a few of the big engines:
Excite - ArchitextSpider
AltaVista - Scooter
Lycos - Lycos_Spider_(T-Rex)
Google - Googlebot
Alltheweb - FAST-WebCrawler
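Since each engine needs its own block, a file covering several engines gets long quickly. The Python sketch below writes it for you under a hypothetical naming scheme in which every article exists once per engine, marked by a suffix:

- -gg for Google and -al for AltaVista, as in the examples above.
- -ink for Inktomi, which is invented here.

It writes one block per spider that disallows the versions made for the other engines, using three of the spider names listed above.

```python
from __future__ import annotations

# Hypothetical setup: spider name -> suffix of the pages optimized for that engine.
SPIDER_SUFFIXES = {
    "Googlebot": "-gg",
    "Scooter": "-al",
    "Slurp": "-ink",  # made-up suffix for this sketch
}
ARTICLES = ["internet-marketing", "advertising-secrets"]


def build_robots_txt(spider_suffixes: dict[str, str], articles: list[str]) -> str:
    """Give every spider its own block disallowing the pages made for the other engines."""
    blocks = []
    for spider, own_suffix in spider_suffixes.items():
        lines = [f"User-Agent: {spider}"]
        for article in articles:
            for suffix in spider_suffixes.values():
                if suffix != own_suffix:
                    lines.append(f"Disallow: /{article}{suffix}.html")
        blocks.append("\n".join(lines))
    # A blank line separates one engine's block from the next.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(SPIDER_SUFFIXES, ARTICLES))
```

Generating the file this way means adding a new article is a one-word change, rather than a dozen hand-typed Disallow lines that are easy to get wrong.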

Be sure to check over the file before uploading it. A simple mistake could mean your pages are indexed by engines you don't want indexing them or, even worse, that none of your pages are indexed at all.
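Part of that check can be automated. Here is a minimal sketch using Python's standard urllib.robotparser, which follows the original rules; real engines may differ in details. Some names here are assumptions:

- find_mistakes is a hypothetical helper.
- www.example.com is a placeholder domain.
- The expectations table is something you would fill in yourself.

The draft file contains a deliberate typo ("Disalow"). The parser silently ignores that line, so the page it was meant to block stays open.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser


def find_mistakes(robots_txt: str, expectations: dict[tuple[str, str], bool],
                  site: str = "http://www.example.com") -> list[str]:
    """Hypothetical helper: compare what robots.txt actually allows with what you intended.

    `expectations` maps (spider, page path) -> True if that spider should be allowed.
    Returns a list of problems; an empty list means the file does what you intended.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for (spider, path), should_allow in expectations.items():
        allowed = parser.can_fetch(spider, site + path)
        if allowed != should_allow:
            verb = "can" if allowed else "cannot"
            problems.append(f"{spider} {verb} fetch {path}")
    return problems


# A draft with a typo: "Disalow" is not a recognized field, so that line is skipped.
DRAFT = """\
User-Agent: Slurp
Disalow: /internet-marketing-gg.html
Disallow: /advertising-secrets-gg.html
"""

EXPECTED = {
    ("Slurp", "/internet-marketing-gg.html"): False,
    ("Slurp", "/advertising-secrets-gg.html"): False,
    ("Googlebot", "/internet-marketing-gg.html"): True,
}

if __name__ == "__main__":
    for problem in find_mistakes(DRAFT, EXPECTED):
        print(problem)  # Slurp can fetch /internet-marketing-gg.html
```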

A little note before I go: I have listed the User-Agent names of a few of the big search engines, but in reality it's not worth creating different pages for more than six or seven search engines. It's very time-consuming, and the results would be similar to what you'd get by creating different pages for only the top five. So more is not always better.

So now you know how to make a robots.txt file to keep you from getting banned by the search engines. Wasn't that easy? Till next time!

