Saturday 18 February 2017

What happens when your Robots.txt file returns a Server Error?

What is robots.txt?

The robots.txt file is a standard used by websites to tell web crawlers which areas of the site should not be crawled. Here are some basic examples of robots.txt setups:

If you want to allow full access to your site:

User-agent: *
Disallow:

If you want to block access to your whole site:

User-agent: *
Disallow: /

If you want to block a folder:

User-agent: *
Disallow: /folder/
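You can see how a crawler interprets rules like these with Python's standard urllib.robotparser module. A quick sketch (the example.com URLs and paths are placeholders, not from the original post):

```python
from urllib.robotparser import RobotFileParser

# Parse a robots.txt that blocks /folder/ for all crawlers
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /folder/",
])

print(rp.can_fetch("*", "https://example.com/folder/page.html"))  # False: blocked
print(rp.can_fetch("*", "https://example.com/other/page.html"))   # True: allowed
```

This is the same logic well-behaved bots apply after fetching your robots.txt.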
You have to add the robots.txt file to the root folder of your domain (e.g. https://example.com/robots.txt).

Since this file contains important instructions for web crawlers, they fetch it first, before crawling the rest of the site.

Do make a note of this: if Googlebot can't crawl your robots.txt file, it won't crawl your site. If your robots.txt file returns a response code other than 200 or 404, Googlebot can't read it, and as a result it won't crawl your site.
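One way to keep an eye on this is a small status check against your own robots.txt. A minimal sketch using only the Python standard library (the URL in the usage comment is a placeholder, and the 200/404 rule simply encodes the behavior described above):

```python
import urllib.error
import urllib.request

def robots_txt_status(url):
    """Fetch a robots.txt URL and return its HTTP status code."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code

def googlebot_will_crawl(status):
    # Per the behavior described above: 200 (readable) and 404
    # (no robots.txt, so everything is allowed) are fine, but a
    # server error (5xx) can make Googlebot stop crawling the site.
    return status in (200, 404)

print(googlebot_will_crawl(200))  # True
print(googlebot_will_crawl(503))  # False
```

Usage would look like `googlebot_will_crawl(robots_txt_status("https://yoursite.com/robots.txt"))`, run on a schedule so a server error on robots.txt doesn't go unnoticed.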

This is what Google's Eric Kuan once said on the Google Webmaster Help forum:

If Google is having trouble crawling your robots.txt file, it will stop crawling the rest of your site to prevent it from crawling pages that have been blocked by the robots.txt file. If this isn't happening frequently, then it's probably a one off issue you won't need to worry about. If it's happening frequently or if you're worried, you should consider contacting your hosting or service provider to see if they encountered any issues on the date that you saw the crawl error.

Even Gary Illyes from Google recently confirmed the same on Twitter:
And here are a few interesting questions on Twitter with helpful replies from Gary:

Gary Illyes on robots.txt
- Tejas Thakkar
