For your website to appear in search results, Google (as well as other search engines such as Bing, Yandex, Baidu, Naver, Yahoo, or DuckDuckGo) uses web crawlers to navigate the web, discovering websites and their pages.
Different search engines have different market shares in each country.
In this guide we cover Google, which is the biggest search engine in most countries. That being said, you might want to check other search engines and their guidelines, especially if your target customers are in China, Russia, Japan or South Korea.
While there are some differences when it comes to Ranking and Rendering, most search engines work in a very similar way when it comes to Crawling and Indexing.
Web crawlers are a type of bot that emulates users and navigates through links found on websites in order to index their pages. Web crawlers identify themselves using custom user-agents. Google has several web crawlers, but the ones used most often are Googlebot Desktop and Googlebot Smartphone.
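Because crawlers announce themselves via the user-agent header, a server can tell a Googlebot visit apart from a regular user. Here is a minimal sketch of such a check; the matching is deliberately simplistic, and note that user-agent strings can be spoofed, so real verification should also confirm the requester's IP resolves back to Google:

```python
# Minimal sketch: spotting Googlebot by its user-agent string.
# The "Googlebot" token appears in both the Desktop and Smartphone
# crawler user-agents; this check alone is not proof of authenticity.

def is_googlebot(user_agent: str) -> bool:
    """Return True if the user-agent string looks like a Google crawler."""
    return "Googlebot" in user_agent

smartphone_ua = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile "
    "Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
print(is_googlebot(smartphone_ua))   # True
print(is_googlebot("Mozilla/5.0 (Windows NT 10.0) Firefox/115.0"))  # False
```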
How Does Googlebot Work?
A general overview of the process can be the following:
Add to Crawl Queue: Discovered URLs are added to the Crawl Queue for Googlebot to process. URLs usually spend only seconds in the Crawl Queue, but this can stretch to a few days depending on the case, especially if the pages need to be rendered, indexed, or, if the URL is already indexed, refreshed. The pages then enter the Render Queue.
HTTP Request: The crawler makes an HTTP request to get the headers and acts according to the returned status code:
200 - it crawls and parses the HTML.
30X - it follows the redirects.
40X - it notes the error and does not load the HTML.
50X - it may come back later to check whether the status code has changed.
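The branching above can be sketched as a small dispatch function. The action names are illustrative, not Google's terminology:

```python
# Hedged sketch of how a crawler might branch on the HTTP status code,
# mirroring the four cases listed above.

def handle_status(status: int) -> str:
    """Map an HTTP status code to the crawler's next action."""
    if status == 200:
        return "parse"            # crawl and parse the HTML
    if 300 <= status < 400:
        return "follow-redirect"  # chase the Location header
    if 400 <= status < 500:
        return "record-error"     # note the error; skip the HTML
    if 500 <= status < 600:
        return "retry-later"      # come back to re-check the page
    return "ignore"

print(handle_status(301))  # follow-redirect
print(handle_status(404))  # record-error
```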
Ready to be indexed: If all criteria are met, the pages may be eligible to be indexed and shown in search results.