Effective Internet Marketing Strategy and Tactics Through Test

Googlebot and Search Visitors


Published on April 8th, 2010 by Jeremy Chatfield

I’ve been interested in the behaviour of Googlebot, the robot that Google uses to crawl the web, for years. It’s a topic that seems largely unaddressed by search engine optimisers, yet the behaviour of Googlebot should be extremely important. After all, uncrawled sites tend to have problems with ranking many pages – the best you can get is to have pages ranked that other people are pointing to, which, for most businesses, tends to be just the home page.

I’ve recently had discussions with a few web site managers who’d made what seems to me a most peculiar decision: to block Googlebot because of its traffic impact. This resonated with a short article that I’d posted previously, about a problem identified by a Google staffer who was running his own blog. He’d seen his blog dropped from search results and was trying to work out why.

There’s certainly a potential problem: low bandwidth sites may suffer if Googlebot consumes the available bandwidth. But if Googlebot isn’t crawling your site, how are you going to appear in search results at all?

You could use Webmaster Tools to ask Google to slow the crawl of your site. The site should still be crawled and indexed, with minimal impact on visitor traffic. But simply disabling the crawl, whether by using robots.txt to block all crawling or to block large sections of the site that you want users to find, is probably a mistake.
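If you're not sure whether a robots.txt file is shutting Googlebot out, a quick check is possible with Python's standard `urllib.robotparser` module. This is only a minimal sketch: the two robots.txt files and the example.com URLs are hypothetical, and Python's parser may not match Google's own interpretation of a robots.txt file in every case, so treat the answer as a first approximation.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Hypothetical robots.txt that blocks every crawler from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that only keeps crawlers out of a private area.
BLOCK_ADMIN_ONLY = """\
User-agent: *
Disallow: /admin/
"""

if __name__ == "__main__":
    page = "http://www.example.com/products/widget.html"
    print("Block all:", googlebot_allowed(BLOCK_ALL, page))
    print("Block admin only:", googlebot_allowed(BLOCK_ADMIN_ONLY, page))
```

The second file is closer to what most sites want: the private area stays uncrawled, while the pages that should rank remain open to Googlebot.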

There is also the legitimate concern that Googlebot’s visits might be draining server resources at peak traffic periods. That’s moderately difficult for non-technical site owners to work out. Google Analytics (and the other web analytics packages based on JavaScript page bugs, such as CoreMetrics, Omniture, Webtrends, etc.) measures user visits, not visits from Googlebot and other bots. Verifying that Googlebot isn’t interfering with visitors and slowing them down is pretty much impossible without web server log file analysis.

Web Server Log File Analysis

I like web server log files. There are things I can find out from them, in a few hours, that I simply can’t find from Google Analytics, CoreMetrics and Omniture. Look at this graph, for example. I’ve taken web server log files from a UK-targeted business, and extracted Google-inspired visits and Googlebot visits, by hour.

[Graph: Googlebot is more active when visitors aren't present]

The graph shows that Googlebot is busiest when users are less present. That is, when Google can see visitors coming to the site, the crawl volume is reduced.
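For anyone who wants to try the same kind of extraction, here is a rough sketch in Python. It is not the tooling behind the graph. It assumes the server writes the common "combined" log format (with referrer and user agent fields), treats any request whose user agent contains "Googlebot" as a crawler hit, and treats any request whose referrer host contains "google." as Google-inspired. The function name `hourly_counts` and the sample log lines are made up for illustration. Note that it counts requests rather than visits, and that user agents can be faked.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Combined log format: host ident user [time] "request" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):(?P<hour>\d{2}):\d{2}:\d{2} [^\]]+\] '
    r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def hourly_counts(lines):
    """Return (googlebot_hits, google_referred_hits), each a Counter keyed by hour."""
    bot = Counter()
    search = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        hour = int(match.group("hour"))
        if "Googlebot" in match.group("agent"):
            bot[hour] += 1
        elif "google." in urlparse(match.group("referrer")).netloc:
            search[hour] += 1
    return bot, search


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    '66.249.66.1 - - [08/Apr/2010:03:15:42 +0100] "GET /a.html HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [08/Apr/2010:14:02:11 +0100] "GET /a.html HTTP/1.1" 200 5120 '
    '"http://www.google.co.uk/search?q=widgets" "Mozilla/5.0 (Windows; U; en-GB)"',
    '192.0.2.11 - - [08/Apr/2010:14:05:00 +0100] "GET /b.html HTTP/1.1" 200 100 '
    '"-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    bot, search = hourly_counts(SAMPLE)
    for hour in range(24):
        print(f"{hour:02d}:00  googlebot={bot[hour]}  google_referred={search[hour]}")
```

Feed it the lines of a real access log instead of `SAMPLE` and plot the two columns against the hour to get a picture comparable to the one above.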

This pattern, with Googlebot most active when site visits are lowest, is the most common one that I see in clients’ web server log files. It makes a lot of sense for Google, too:

  • Continuing visits by Googlebot allow Google to check that the site is still working (preventing Google from delivering users to a 404’ed page)
  • Site performance under load can be monitored (helping Googlebot tune crawling rates, and verifying that users are getting responses from the site, mostly)

Summary

Googlebot seems to be quite smart about when it visits sites. The more users being sent to a site in a given hour, the lower the rate at which it crawls. So Googlebot should never get in the way of visitors, under normal conditions.
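One way to check whether your own site shows the same inverse relationship is to correlate the hourly visitor counts with the hourly Googlebot counts from your logs. The sketch below computes a plain Pearson correlation coefficient. The function name `pearson` is just an illustration. A value near -1 would suggest that crawling drops as visits rise, while a value near 0 would suggest no clear relationship. It is a rough indicator, not proof of how Googlebot schedules its crawl.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / sqrt(spread_x * spread_y)
```

Pass it two 24-item lists, one of hourly search visits and one of hourly Googlebot requests, both taken from the same log period.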

Simply disabling Googlebot looks like a weak way to go.

Following suggestions from the Google Webmaster Blog, if you have areas of the website that change at different speeds, you might want to validate multiple webmaster consoles for different sections of the site. That would allow setting different crawl rates. I’ve not tried this, yet… I don’t have a client for whom I want to restrict crawling speeds!

"Googlebot and Search Visitors" was published on April 8th, 2010 and is listed in google, SEO, spiders.
