March 30th, 2010 by Susie
Google and the other search engines evaluate websites using web crawlers, also called spiders or bots. These are fully automated critters that follow links across the internet, independent of their owners, and report on what they find. That information is used to measure relevance to particular searches and to rank websites against one another. There is a slew of other factors at work, of course (SEO is a complex business), but the data gathered by crawlers is extremely important.
Sites with high authority, those that the search engines think are good, are crawled (or ‘indexed’) frequently. If they are known to update content frequently, that could mean a few times a day. New and unknown sites, or those with low authority for whatever reason, won’t be crawled as often. Poorer-quality sites may only be indexed once a month or less.
Collecting and storing data with bots is cheap, but when that data runs into terabytes upon terabytes, using it effectively does become more problematic. Of course, there is incredibly valuable information in Google’s data warehouses, but nobody can deal with an infinite amount of data. There are millions and millions of sites on the web, some with a lot of content and many different subpages, and search engines need to prioritise the ones they gather data from.
For the same reasons, crawlers limit the information they gather from each site. Like the frequency of crawl, the amount of information gathered varies with the standing of the site. The higher a site’s authority, the more the crawler will look at.
For most sites, the bots restrict themselves to the top four levels of the URL. That means a page such as thissite.com/level2/level3/level4/apage.html, which sits at the fifth level, won’t be considered. Any keywords or content on it won’t contribute relevancy information. Users don’t like clicking through a lot of levels either, so keeping your site structure at four levels or fewer is a sound idea for more than one search engine optimization reason.
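If you want to check your own URLs against this rule of thumb, something like the following Python sketch could help. It assumes the domain itself counts as level one (which is how the example above reads), and the names `MAX_LEVELS`, `url_level` and `within_crawl_depth` are made up for illustration. The four-level figure is this post's guideline, not a documented crawler setting.

```python
from urllib.parse import urlparse

# Rule of thumb from this post, not a published search engine limit.
MAX_LEVELS = 4


def url_level(url: str) -> int:
    """Return the level a page sits at, counting the domain as level 1 (assumption)."""
    # urlparse only recognises the host when a scheme is present,
    # so add one for bare addresses like "thissite.com/page.html".
    if "://" not in url:
        url = "http://" + url
    path = urlparse(url).path
    # Ignore empty pieces caused by leading, trailing or doubled slashes.
    segments = [part for part in path.split("/") if part]
    return 1 + len(segments)


def within_crawl_depth(url: str, max_levels: int = MAX_LEVELS) -> bool:
    """True if the page is no deeper than max_levels."""
    return url_level(url) <= max_levels


if __name__ == "__main__":
    for candidate in (
        "thissite.com/level2/apage.html",
        "thissite.com/level2/level3/level4/apage.html",
    ):
        print(candidate, url_level(candidate), within_crawl_depth(candidate))
```

Run over a list of your site’s URLs, this flags any page buried deeper than the guideline suggests.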
Crawlers also don’t index more than about 150 kB of content from any page or subpage. Images don’t count towards the total, so you do get quite a lot of text within the limit, and all of that will contribute towards your overall SEO efforts. Again, most users won’t read through nearly that much content on any one page either, so there is a second reason to keep each one at a reasonable size. You have to consider your search engine reputation management with every aspect of your site.
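As a rough way to see how close a page is to that figure, you could measure the size of its HTML source. The sketch below assumes that “content” means the UTF-8 encoded HTML of the page (images are separate files, so they are left out automatically) and that 150 kB means 150,000 bytes. Both are assumptions, and the names `LIMIT_BYTES`, `html_size_bytes` and `over_index_limit` are hypothetical.

```python
# Approximate limit from this post; treating 1 kB as 1,000 bytes is an assumption.
LIMIT_BYTES = 150 * 1000


def html_size_bytes(html: str) -> int:
    """Size of the page source in bytes once encoded as UTF-8."""
    return len(html.encode("utf-8"))


def over_index_limit(html: str, limit: int = LIMIT_BYTES) -> bool:
    """True if the page source is larger than the rule-of-thumb limit."""
    return html_size_bytes(html) > limit


if __name__ == "__main__":
    page = "<html><body><p>Hello, crawler.</p></body></html>"
    print(html_size_bytes(page), over_index_limit(page))
```

Most ordinary pages will come in far below the limit, which fits the point above that the allowance is generous.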
Titles should be no more than 70 characters in length or thereabouts. That’s not far off the length of the previous sentence, so as you can see, it’s a fairly generous allowance. Anything longer than that will look a little odd anyway.
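Checking title length is easy to automate. Here is a minimal Python sketch, assuming the page title lives in a single `<title>` element and that a simple regular expression is good enough to find it (a real HTML parser would be more robust). The names `MAX_TITLE_CHARS`, `extract_title` and `title_too_long` are illustrative only.

```python
import re

# Guideline from this post, not an official cut-off.
MAX_TITLE_CHARS = 70

# Matches the first <title>...</title> pair, across line breaks, in any letter case.
TITLE_PATTERN = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html: str):
    """Return the page title with whitespace collapsed, or None if there is none."""
    match = TITLE_PATTERN.search(html)
    if match is None:
        return None
    return " ".join(match.group(1).split())


def title_too_long(html: str, limit: int = MAX_TITLE_CHARS) -> bool:
    """True if the page has a title longer than the limit."""
    title = extract_title(html)
    return title is not None and len(title) > limit


if __name__ == "__main__":
    page = "<html><head><title>A short, sensible title</title></head></html>"
    print(extract_title(page), title_too_long(page))
```

A page with no title at all is reported as not too long here, so you may want to handle that case separately.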
There are other factors that limit where bots will look (Flash objects and poor or image-based navigation, for example), but as a rule of thumb, create content for easy reading by people and you probably won’t have to worry much about indexation limits. It is a good policy that will serve your SEO well in a lot of areas.