Getting Indexed?

As a search engine optimization guy, I always get asked the same question: when will Google list my site?

There is a lot of speculation about how search engines index websites. The exact workings are shrouded in mystery, since most search engines offer limited information about how they architect the indexing process. The truth is that no one really knows. The only thing some experts can do is draw conclusions based on data from log files. Here is some known information. Google runs from about 10 Internet Data Centers (IDCs), each having 1000 to 2000 Pentium-3 or Pentium-4 servers running Linux.

Google has over 200 (some think over 1000) crawlers/bots scanning the web each day. These do not necessarily follow an exclusive pattern, which means different crawlers may visit the same site on the same day, not knowing other crawlers have been there before. This makes webmasters very happy.

Some crawlers' only job is to grab new URLs (let us call them URL Grabbers for convenience). The URL Grabbers collect the links and URLs they detect on various websites (including links pointing to your site) as well as the old and new URLs they detect on your own site. They also capture the date stamp of files when they visit your website, so that they can identify new or updated content pages.

The URL Grabbers write the captured URLs, with their date stamps and other stats, to a Master URL List so that these can be deep-indexed by other special crawlers.
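To make the idea concrete, here is a minimal sketch of what one entry in such a master list could look like. Nobody outside Google knows the real format, so the names `UrlRecord` and `record_visit` and every field below are my own assumptions, used for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class UrlRecord:
    """One row of a hypothetical master URL list (not Google's real format)."""
    url: str
    date_stamp: Optional[datetime]              # date stamp seen on the latest visit
    status: int                                 # HTTP status code seen on the latest visit
    previous_stamp: Optional[datetime] = None   # date stamp from the visit before
    seen_before: bool = False                   # True once the URL has been visited twice


def record_visit(master: Dict[str, UrlRecord], url: str,
                 date_stamp: Optional[datetime], status: int) -> UrlRecord:
    """Add a URL to the master list, or refresh the entry it already has."""
    old = master.get(url)
    if old is None:
        record = UrlRecord(url, date_stamp, status)
    else:
        # Keep the earlier date stamp so a later step can tell whether the page changed.
        record = UrlRecord(url, date_stamp, status,
                           previous_stamp=old.date_stamp, seen_before=True)
    master[url] = record
    return record
```

Calling `record_visit` twice for the same URL leaves a single entry in the list, with the earlier date stamp preserved in `previous_stamp` for comparison.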

The master list is then processed and classified into:

  1. New URLs detected
  2. Old URLs with new date stamp
  3. 301 & 302 redirected URLs
  4. Old URLs with old date stamp
  5. 404 error URLs
  6. Other URLs
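The six buckets above can be sketched as a single function. This is only an illustration: the function name `classify`, its parameters, and the order in which the checks run are my assumptions, not anything Google has published.

```python
from datetime import datetime
from typing import Optional


def classify(status: int, seen_before: bool,
             date_stamp: Optional[datetime],
             previous_stamp: Optional[datetime],
             needs_special_handling: bool = False) -> str:
    """Place one master-list entry into one of the six buckets listed above."""
    if status == 404:
        return "404 error URL"
    if status in (301, 302):
        return "redirected URL"
    if needs_special_handling:
        # Assumed flag for dynamic URLs, session IDs, PDFs and similar files.
        return "other URL"
    if not seen_before:
        return "new URL"
    if (date_stamp is not None and previous_stamp is not None
            and date_stamp > previous_stamp):
        return "old URL, new date stamp"
    return "old URL, old date stamp"
```

The status code is checked first on the assumption that an error or a redirect matters more than the date stamp; a real system could just as well order the checks differently.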

The real indexing is done by what we are calling Deep Crawlers. A Deep Crawler's job is to pick up URLs from the master list, deep crawl each one, and capture all the content: text, HTML, images, Flash, etc.

Priority is given to existing URLs with a new date stamp, as they relate to content that is already indexed but has been updated. 301 and 302 redirected URLs come next in priority, followed by new URLs detected. High priority is also given to URLs whose links appear on several other sites. These are classified as Important URLs. Sites and URLs whose date stamp and content change on a daily or hourly basis are stamped as News sites, which are indexed hourly or even on a minute-by-minute basis.
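The priority order described above can be sketched with a simple priority queue. The category names, the `schedule` function, and the use of an inbound link count as a tie-breaker for Important URLs are all assumptions of mine; the actual weighting is not public.

```python
import heapq
from typing import List, Tuple

# Lower rank means crawled sooner. Categories missing from this table are skipped.
CATEGORY_RANK = {
    "old URL, new date stamp": 0,
    "redirected URL": 1,
    "new URL": 2,
}


def schedule(entries: List[Tuple[str, str, int]]) -> List[str]:
    """Take (url, category, inbound_links) tuples and return URLs in crawl order."""
    heap = []
    for position, (url, category, inbound_links) in enumerate(entries):
        if category not in CATEGORY_RANK:
            continue  # not ranked for deep crawling in this sketch
        # Negating the link count makes URLs with more inbound links come out
        # first within a category; position keeps the original order on ties.
        heapq.heappush(heap, (CATEGORY_RANK[category], -inbound_links, position, url))
    order = []
    while heap:
        order.append(heapq.heappop(heap)[-1])
    return order
```

With this ordering, an updated page is always crawled before a redirect, a redirect before a brand new URL, and among new URLs the one with the most inbound links goes first.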

Old URLs with old date stamps and 404 error URLs are ignored altogether. There is no point wasting resources indexing old URLs with an old date stamp, since the search engine already has that content indexed and it has not been updated.

The Other URLs category may contain dynamic URLs, URLs with session IDs, PDF documents, Word documents, PowerPoint presentations, multimedia files, etc. Google needs to process these further and assess which ones are worth indexing and to what depth. It perhaps allocates the indexing of these to Special Crawlers.

When Google schedules the Deep Crawlers to index new URLs and 301 and 302 redirected URLs, just the URLs (not the descriptions) start appearing in the search engine results pages when you run a search.

Since Deep Crawlers need to crawl billions of web pages each month, they can take as many as 4 to 8 weeks to index even updated content. New URLs may take longer to index.

Once the Deep Crawlers index the content, it goes into their originating IDCs. The content is then processed, sorted and replicated (synchronized) to the rest of the IDCs. A few years back, when the data size was manageable, this synchronization used to happen once a month and last for 5 days, which earned it the nickname Google Dance. Nowadays the synchronization happens constantly, which some people call Everflux.

When you hit www.google.com from your browser, you can land at any of their 10 IDCs depending upon their speed and availability. Since the data at any given time is slightly different at each IDC, you may get different results at different times or on repeated searches of the same term, thus the name Google Dance.

The bottom line is that you may need to wait as long as 8 to 20 weeks to see full indexing in Google. Unless you can increase the importance of your web pages by getting several high quality incoming links from good sites, there is no way to speed up the indexing process.

Dynamic URLs may take longer to index (sometimes they do not get indexed at all), since even a small data change can create unlimited URLs, which can clutter Google's index with duplicate content.
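To see why session IDs cause duplicates, consider a rough sketch that reduces two dynamic URLs to one canonical form. It uses Python's standard `urllib.parse` module; the `normalize` function and the list of session parameter names are my own assumptions, and this is not a description of how Google handles such URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of query parameters that only carry a visitor's session.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def normalize(url: str) -> str:
    """Drop session parameters and sort the rest so duplicate pages share one URL."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in SESSION_PARAMS]
    kept.sort()  # the same parameters in a different order give the same result
    # The fragment is dropped because it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path,
                       urlencode(kept), ""))
```

For example, `http://Example.com/item?id=7&sid=abc` and `http://example.com/item?sid=xyz&id=7` both normalize to `http://example.com/item?id=7`. A crawler without a step like this would treat every visitor's session as a separate page.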

Conclusion:

First of all, most of this will be Greek to most people, but for those who have some idea and attempt SEO on their own, it should be a wake-up call. Not only do you need to know what to do and how to do it, you also need patience, lots of patience. Too much tweaking or too much subversive SEO and you will end up getting banned.

Patiently wait 4 to 20 weeks for the indexing to happen. And then comes the big work: getting page ranking!

