robots.txt Checker
Fetch a site's robots.txt and review its rules and sitemap declarations.
The robots.txt Checker fetches the robots.txt file from a site's domain in real time and shows you the raw content alongside its key rules. See at a glance which paths are blocked from crawlers, which User-agents have rules applied, and whether your sitemaps are properly declared.
robots.txt always lives at the site root (/robots.txt), and search engines read it before they crawl. A misplaced rule can drop important pages from the index, so it pays to verify your rules with this tool after every deploy. Enter a domain or a full URL, and the tool looks up the robots.txt at that site's root automatically.
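To illustrate the root lookup, here is a minimal Python sketch of how a domain or full URL could be reduced to its robots.txt location. The function name robots_url and the choice to default bare domains to HTTPS are assumptions made for this example, not a description of how the tool itself is implemented.

```python
from urllib.parse import urlsplit


def robots_url(user_input: str) -> str:
    """Return the root robots.txt URL for a domain or a full URL."""
    text = user_input.strip()
    # Assumption: a bare domain such as "example.com" is treated as HTTPS.
    if "://" not in text:
        text = "https://" + text
    parts = urlsplit(text)
    if not parts.netloc:
        raise ValueError(f"no host found in {user_input!r}")
    # Drop path, query, and fragment: robots.txt sits at the root of the host.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


if __name__ == "__main__":
    print(robots_url("example.com"))
    print(robots_url("https://example.com/blog/post?id=1#top"))
```

Both calls resolve to the same root file, because everything after the host is discarded.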
Reading the results
- Raw content: the full robots.txt exactly as the server returned it.
- Sitemap: sitemap URLs discovered from Sitemap: declarations, clickable to open directly.
- User-agent: the crawler identifiers the rules apply to. * means all crawlers.
- Disallow count: how many block rules (Disallow:) exist, a rough gauge of how much is blocked.
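The fields above can be approximated with a few lines of Python. This is a rough sketch only: the function name summarize_robots and the returned dictionary keys are hypothetical, and it assumes that an empty Disallow: line is not counted because it blocks nothing. The tool may count differently.

```python
def summarize_robots(text: str) -> dict:
    """Collect sitemap URLs, user-agents, and a Disallow count from robots.txt text."""
    sitemaps = []
    user_agents = []
    disallow_count = 0
    for raw_line in text.splitlines():
        # Remove trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "sitemap":
            sitemaps.append(value)
        elif field == "user-agent":
            if value not in user_agents:
                user_agents.append(value)
        elif field == "disallow" and value:
            # Assumption: an empty "Disallow:" blocks nothing, so it is skipped.
            disallow_count += 1
    return {
        "sitemaps": sitemaps,
        "user_agents": user_agents,
        "disallow_count": disallow_count,
    }


if __name__ == "__main__":
    sample = (
        "User-agent: *\n"
        "Disallow: /admin/\n"
        "Disallow: /tmp/  # scratch space\n"
        "\n"
        "Sitemap: https://example.com/sitemap.xml\n"
    )
    print(summarize_robots(sample))
```

Field names are matched case-insensitively, so sitemap: and Sitemap: are treated the same way.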
When robots.txt is missing
If robots.txt does not exist (e.g. the server returns 404), the standard interprets that as full crawling allowed. That is not a problem in itself; it simply means there are no block rules. Still, if you want to advertise your sitemap location, it is worth adding a file that contains no block rules and only a Sitemap: line. Build a new file with the robots.txt generator, and confirm the sitemap you declared is healthy with the sitemap validator.
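As a small illustration, the sketch below uses Python's standard urllib.robotparser to show that a file with no rules, and a file with only a Sitemap: line, both leave every path crawlable. The helper name parse_rules and the example.com URLs are placeholders chosen for this example, and Python's parser is only one interpretation of the rules; individual search engines may differ in details.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# No rules at all, comparable to a missing file: nothing is blocked.
empty = parse_rules("")

# Only a sitemap declaration: still nothing blocked, but the sitemap is advertised.
sitemap_only = parse_rules("Sitemap: https://example.com/sitemap.xml")

if __name__ == "__main__":
    print(empty.can_fetch("*", "https://example.com/any/page"))
    print(sitemap_only.can_fetch("*", "https://example.com/any/page"))
    print(sitemap_only.site_maps())
```

Note that site_maps() requires Python 3.8 or later and returns None when no sitemap is declared.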
Common mistakes
- A single Disallow: / blocking the entire site and dropping it from the index.
- Shipping staging block rules straight into a production deploy.
- Expecting a robots-blocked page to be removed from the index. Blocking only prevents crawling; removal requires noindex, which crawlers can only see if the page is not blocked.
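The first two mistakes can be caught with a quick post-deploy check. Here is a minimal sketch using Python's standard urllib.robotparser; the function name blocks_whole_site is hypothetical, and the example.com URL is a placeholder whose path is the only part matched against the rules. Python's parser applies simple prefix matching in file order, so treat the result as an approximation of how a given search engine reads the file.

```python
from urllib.robotparser import RobotFileParser


def blocks_whole_site(robots_text: str, user_agent: str = "*") -> bool:
    """Return True if the rules stop user_agent from fetching the site root."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # Only the path ("/") of this placeholder URL is compared with the rules.
    return not parser.can_fetch(user_agent, "https://example.com/")


if __name__ == "__main__":
    staging_rules = "User-agent: *\nDisallow: /"
    production_rules = "User-agent: *\nDisallow: /admin/"
    print(blocks_whole_site(staging_rules))
    print(blocks_whole_site(production_rules))
```

A check like this could run in a deploy pipeline and fail the build when the production file blocks the root path.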