OneWebDesk

robots.txt Checker

Fetch a site's robots.txt and review its rules and sitemap declarations.

The robots.txt Checker fetches the robots.txt file from a site's domain in real time and shows you the raw content alongside its key rules. See at a glance which paths are blocked from crawlers, which User-agents have rules applied, and whether your sitemaps are properly declared.

robots.txt always lives at the site root (/robots.txt), and search engines read it before they crawl the site. A misplaced rule can drop important pages from the index, so it pays to re-check your rules with this tool after every deploy. Enter a domain or a full URL, and the tool looks up the robots.txt at the root.
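The lookup described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function name robots_url is hypothetical, and defaulting to https for a bare domain is an assumption.

```python
from urllib.parse import urlsplit


def robots_url(user_input: str) -> str:
    """Map a bare domain or a full URL to the root robots.txt URL."""
    text = user_input.strip()
    # A bare domain has no scheme; assume https so urlsplit can find the host.
    if "://" not in text:
        text = "https://" + text
    parts = urlsplit(text)
    if not parts.netloc:
        raise ValueError(f"no host found in {user_input!r}")
    # Keep the scheme and host (including any port); drop path, query, fragment.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_url("example.com"))                         # https://example.com/robots.txt
print(robots_url("https://example.com/blog/post?id=7"))  # https://example.com/robots.txt
```

Both inputs resolve to the same file, because the path and query string are discarded before the request is made.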

Reading the results

  • Raw content: the full robots.txt exactly as the server returned it.
  • Sitemap: sitemap URLs discovered from Sitemap: declarations, clickable to open directly.
  • User-agent: the crawler identifiers the rules apply to. * means all crawlers.
  • Disallow count: how many block rules (Disallow:) exist, a rough gauge of how much is blocked.
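As a rough sketch of how the fields above could be derived from the raw content, the following Python function scans the file line by line. The name summarize_robots and the returned dictionary keys are hypothetical, and counting only non-empty Disallow: values as block rules is an assumption (an empty Disallow: blocks nothing); the tool itself may count differently.

```python
def summarize_robots(raw: str) -> dict:
    """Collect sitemap URLs, user-agent names, and a Disallow count."""
    sitemaps, agents, disallow_count = [], [], 0
    for line in raw.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "sitemap" and value:
            sitemaps.append(value)
        elif field == "user-agent" and value and value not in agents:
            agents.append(value)
        elif field == "disallow" and value:
            # Assumption: an empty "Disallow:" is not counted as a block rule.
            disallow_count += 1
    return {
        "sitemaps": sitemaps,
        "user_agents": agents,
        "disallow_count": disallow_count,
    }


sample = """\
User-agent: *
Disallow: /admin/
Disallow: /tmp/  # scratch space

User-agent: Googlebot
Disallow:

Sitemap: https://example.com/sitemap.xml
"""

print(summarize_robots(sample))
```

For the sample file this reports one sitemap, the user-agents * and Googlebot, and a Disallow count of 2.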

When robots.txt is missing

If robots.txt does not exist (for example, the request returns 404), the standard treats that as permission to crawl everything. That is not a problem in itself; it simply means there are no block rules. Still, if you want to advertise your sitemap location, it is worth adding a file that contains an otherwise empty ruleset plus a Sitemap: line. Build a new file with the robots.txt generator, and confirm the sitemap you declared is healthy with the sitemap validator.
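To make the "empty ruleset plus a Sitemap: line" idea concrete, here is a minimal sketch that parses such a file with Python's standard urllib.robotparser module. The example.com URLs are placeholders, and site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# A minimal file: nothing is blocked, and one sitemap is advertised.
MINIMAL = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(MINIMAL.splitlines())

# An empty Disallow: value blocks nothing, so any path is crawlable.
print(parser.can_fetch("*", "https://example.com/any/page"))  # True
# The declared sitemap is still exposed to crawlers that read the file.
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```

The file behaves like a missing robots.txt as far as crawling goes, but it still tells crawlers where the sitemap is.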

Common mistakes

  • A single Disallow: / blocking the entire site and dropping it from the index.
  • Shipping staging block rules straight into a production deploy.
  • Expecting a robots-blocked page to be removed from the index. Blocking only prevents crawling; removal requires noindex, and the page must stay crawlable so that search engines can see the noindex.
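The first two mistakes can be caught with a small post-deploy check. The sketch below uses Python's standard urllib.robotparser; the function name blocks_whole_site and the example.com probe URL are hypothetical. Note that this parser applies the first matching rule rather than the longest match, so treat the result as a rough guard and not as a statement of how any particular search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser


def blocks_whole_site(raw: str, agent: str = "*") -> bool:
    """Return True if the rules stop `agent` from fetching the home page."""
    parser = RobotFileParser()
    parser.parse(raw.splitlines())
    # Hypothetical probe URL: only the path "/" matters for rule matching.
    return not parser.can_fetch(agent, "https://example.com/")


staging_rules = "User-agent: *\nDisallow: /\n"
production_rules = "User-agent: *\nDisallow: /admin/\n"

print(blocks_whole_site(staging_rules))     # True
print(blocks_whole_site(production_rules))  # False
```

Running a check like this against the deployed file and failing the release when it returns True is one way to keep staging rules out of production.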

Frequently asked questions

Can I enter just a domain?
Yes. Enter a domain and the tool automatically looks up https://domain/robots.txt. If you paste a full URL, the path is ignored and only the root robots.txt is fetched.

It says robots.txt is missing. Is that a problem?
No. When the file is absent, the standard treats all crawling as allowed. You don't need one if you have nothing to block, but it's useful for advertising your sitemap.

How do I remove a page from the index with robots.txt?
You can't. A Disallow rule only blocks crawling. To remove an already-indexed page, use a noindex meta tag (or an X-Robots-Tag header) on that page instead, and make sure the page is not blocked by Disallow, or crawlers will never see the noindex.

Is my input sent anywhere?
The server fetches only the site's public robots.txt. The URL you enter is used for that lookup and is not stored or shared with third parties. Results are cached briefly for fast responses.
