In order for your website to appear in search results, Google (as well as other search engines such as Bing, Yandex, Baidu, Naver, Yahoo, and DuckDuckGo) uses web crawlers to navigate your website and discover its pages.
Different search engines have different market shares in each country.
In this guide we cover Google, which is the biggest search engine in most countries. That said, you might want to check other search engines and their guidelines, especially if your target customers are in China, Russia, Japan, or South Korea.
While there are some differences when it comes to Ranking and Rendering, most search engines work in a very similar way when it comes to Crawling and Indexing.
Web crawlers are a type of bot that emulates users and navigates through links found on websites in order to index their pages. Web crawlers identify themselves using custom user-agents. Google has several web crawlers, but the ones used most often are Googlebot Desktop and Googlebot Smartphone.
A general overview of the process looks like this: the crawler requests a URL and, depending on the HTTP status code of the response:

- 200 – it crawls and parses the HTML.
- 30X – it follows the redirects.
- 40X – it notes the error and does not load the HTML.
- 50X – it may come back later to check whether the status code has changed.
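The status-code handling above can be sketched as a small decision function. This is an illustrative simplification, not Googlebot's actual implementation; the function name and action labels are made up for this example.

```typescript
// Maps an HTTP status code to a hypothetical crawler's next action,
// mirroring the four cases listed above.
type CrawlAction = "parse" | "follow-redirect" | "record-error" | "retry-later";

function actionForStatus(status: number): CrawlAction {
  if (status >= 200 && status < 300) return "parse"; // 200: crawl and parse the HTML
  if (status >= 300 && status < 400) return "follow-redirect"; // 30X: follow the redirect target
  if (status >= 400 && status < 500) return "record-error"; // 40X: note the error, skip the HTML
  return "retry-later"; // 50X: try again later
}

console.log(actionForStatus(200)); // "parse"
console.log(actionForStatus(301)); // "follow-redirect"
console.log(actionForStatus(404)); // "record-error"
console.log(actionForStatus(503)); // "retry-later"
```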
In the next few sections, we will take a deep dive into each of the main components of a search system: crawling, indexing, rendering, and ranking.
How can you identify if a user on your site is a web crawler?
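Since crawlers identify themselves with custom user-agents, a simple first check is to look for known crawler tokens in the User-Agent header. The sketch below uses a hand-picked token list; keep in mind that User-Agent strings can be spoofed, so for stronger verification Google recommends a reverse DNS lookup on the requesting IP (not shown here).

```typescript
// Tokens that well-known crawlers include in their User-Agent strings.
const BOT_TOKENS = ["Googlebot", "Bingbot", "DuckDuckBot", "YandexBot", "Baiduspider"];

// Returns true if the User-Agent string contains a known crawler token.
function isKnownCrawler(userAgent: string): boolean {
  return BOT_TOKENS.some((token) => userAgent.includes(token));
}

// Googlebot Smartphone identifies itself with a UA string like this one.
const googlebotSmartphoneUA =
  "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) " +
  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36 " +
  "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

console.log(isKnownCrawler(googlebotSmartphoneUA)); // true
console.log(isKnownCrawler("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")); // false
```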