The search spider is an essential part of the search engine indexing process. Spiders are responsible for reading your website's code. Remember, every website is made of code, and the spiders read that code so the search engine can save it in its database.
No, it's not a different species of arachnid…
If you wanna learn how search works, you gotta know what makes search tick. In this lesson I'll be discussing search spiders and Google's three kinds of search spiders.
What’s a Search Spider for?
When you own a website, you know that it's all code. It's the web browser's job to display that code in a format that is friendly to human eyes. The pictures you see? Those are all code. The Flash movie you're looking at? Yep, that's all code. The video from YouTube? All code.
That's why meta tags are read by search engines even though they don't appear in the browser. They're part of the code, and the search spiders crawl the code, not the browser display.
As I laid out in our last lesson, a search engine does three main things: index, retrieve and rank. Search spiders do most of the work in the first part of a search engine's job, which is to index.
All the code in your website has to be read in order for it to be indexed successfully. If it's not read well, it will show up distorted in Google's database, which we can see through its cache on the search results. Websites are built with different kinds of code, such as PHP, Java, HTML, C# and so on, and the search spider has to deal with whatever markup those pages end up serving.
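To make that concrete, here's a minimal sketch of reading meta tags straight from a page's code, the way a spider would, using Python's built-in `html.parser`. The sample page and the names `MetaTagReader` and `read_meta_tags` are made up for illustration; this is not how any real search engine's spider is written.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from raw HTML."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags matter here; everything else is skipped.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            self.meta[name.lower()] = attributes.get("content") or ""


def read_meta_tags(html):
    """Return a dict of meta tag names and their content."""
    reader = MetaTagReader()
    reader.feed(html)
    return reader.meta


# Hypothetical page: a browser shows only "Welcome!", but the code holds more.
SAMPLE_PAGE = """<html><head>
<title>My Shop</title>
<meta name="description" content="Handmade leather bags">
<meta name="keywords" content="bags, leather">
</head><body><p>Welcome!</p></body></html>"""

if __name__ == "__main__":
    print(read_meta_tags(SAMPLE_PAGE))
```

A visitor only ever sees "Welcome!" on that page, yet the description and keywords are sitting right there in the code for anything that reads it.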
What's amazing about the search spider is that it doesn't spin its own web. It uses the links that go in and out of your website to move around. It crawls the outbound links from your website to the other websites you're pointing to. And it most likely reaches your website the same way: by following another website's outbound link that points to yours.
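Here's a rough sketch of that link-hopping idea. It assumes a tiny made-up "web" stored as a Python dictionary (`TOY_WEB`, with invented `.example` sites) instead of real pages, and a hypothetical `crawl` function that follows outbound links breadth-first. Real spiders are far more involved; this only shows the principle.

```python
from collections import deque

# Hypothetical mini web: each site maps to the sites it links out to.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example", "d.example"],
    "d.example": [],
    "island.example": ["a.example"],  # links out, but nobody links to it
}


def crawl(start, web):
    """Visit every site reachable from `start` by following outbound links."""
    visited = []           # sites in the order the spider reached them
    seen = {start}         # sites already queued, so none is crawled twice
    queue = deque([start])
    while queue:
        page = queue.popleft()
        visited.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    print(crawl("a.example", TOY_WEB))
```

Notice that `island.example` never gets visited when the crawl starts at `a.example`. No other site links to it, so in this toy model the spider has no path to reach it.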
There are three kinds of Google search engine spiders – they’re better known as the Google bots.
Secondly, we have the Freshbot. The Freshbot crawls the most visited pages on your website. It doesn't really matter if you just have one popular page or a lot of 'em. Some websites, such as CNN.com or Amazon.com, are crawled every ten minutes because of their rapid content turnover and the popularity of their pages. A typical website is most likely crawled by the Freshbot every 1 to 14 days, depending on how popular its pages are.
The Freshbot also paves the way for the third bot, the DeepCrawl. The Freshbot looks out for all the deeper links in your site and sets them aside for the DeepCrawl to use. The DeepCrawl goes through and indexes your site approximately once a month. That's why it can take a month or more for Google to index your whole website, even with a Google sitemap in place.
You can read further about Google's new indexing system, Google Caffeine, to learn a bit more about how Google indexes websites.