robots.txt Checker

Fetch a site's robots.txt and review its rules and sitemap declarations.

The robots.txt Checker fetches the robots.txt file from a site's domain in real time and shows you the raw content alongside its key rules. See at a glance which paths are blocked from crawlers, which User-agents have rules applied, and whether your sitemaps are properly declared.

robots.txt always lives at the site root (/robots.txt), and search engines read it before crawling anything else. A misplaced rule can drop important pages from the index, so it pays to verify your rules with this tool after every deploy. Enter a domain or a full URL and the tool automatically looks up the robots.txt at the root.

Domain or URL: e.g. example.com (auto-fetches /robots.txt at the root)

Reading the results

Raw content: the full robots.txt exactly as the server returned it.
Sitemap: sitemap URLs discovered from Sitemap: declarations, clickable to open directly.
User-agent: the crawler identifiers the rules apply to. * means all crawlers.
Disallow count: how many block rules (Disallow:) exist, a rough gauge of how much is blocked.
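
As a rough illustration of how these four fields could be derived from the raw content, here is a minimal Python sketch. The function name `summarize` and the returned dictionary keys are hypothetical, not the tool's actual implementation. It also assumes that an empty `Disallow:` line is not counted, since an empty value blocks nothing; the tool may count differently.

```python
def summarize(raw: str) -> dict:
    """Extract sitemap URLs, user-agents, and a Disallow count from robots.txt text."""
    sitemaps, agents, disallow_count = [], [], 0
    for line in raw.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "sitemap":
            sitemaps.append(value)
        elif field == "user-agent":
            if value not in agents:
                agents.append(value)
        elif field == "disallow" and value:
            # Assumption: an empty Disallow value blocks nothing, so skip it.
            disallow_count += 1
    return {
        "raw": raw,
        "sitemaps": sitemaps,
        "user_agents": agents,
        "disallow_count": disallow_count,
    }


example = """User-agent: *
Disallow: /admin/
Disallow: /search
Allow: /search/help
Sitemap: https://example.com/sitemap.xml
"""
summary = summarize(example)
print(summary["sitemaps"], summary["user_agents"], summary["disallow_count"])
```

Field names are compared case-insensitively because robots.txt directives are not case-sensitive, while path values are left untouched because paths are.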

When robots.txt is missing

If robots.txt does not exist (for example, the request returns 404), the standard treats the whole site as open to crawling. That is not a problem in itself; it simply means there are no block rules. Still, if you want to advertise your sitemap location, it is worth adding a file that contains only a Sitemap: line and no block rules. Build a new file with the robots.txt generator, and confirm the sitemap you declared is healthy with the sitemap validator.
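
The status handling can be sketched as a small lookup. This is an illustration based on the Robots Exclusion Protocol (RFC 9309), not a description of how this checker or any particular search engine behaves; individual crawlers may deviate. The function name `crawl_policy` and its return labels are hypothetical. The 5xx branch goes beyond the 404 case discussed above and is included only for contrast.

```python
def crawl_policy(status: int) -> str:
    """Map the HTTP status of a /robots.txt request to a crawl policy label."""
    if 200 <= status < 300:
        return "parse"         # file exists: read and apply its rules
    if 400 <= status < 500:
        return "allow_all"     # file unavailable (e.g. 404): no block rules apply
    if 500 <= status < 600:
        return "disallow_all"  # server unreachable: RFC 9309 says assume full disallow
    return "other"             # e.g. redirects, which a client would follow first


for code in (200, 404, 503):
    print(code, crawl_policy(code))
```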

Common mistakes

A single Disallow: / blocking the entire site and dropping it from the index.
Shipping staging block rules straight into a production deploy.
Expecting a robots-blocked page to be removed from the index. Blocking only prevents crawling; removal requires noindex, and the page must stay crawlable so the crawler can see that noindex.

Directives and crawler support

robots.txt supports several directives, but search engines honor different subsets. If you spot any of these in the raw output above, use this table to confirm it actually does what you intend.

Directive	Meaning	Google	Bing
`User-agent`	Which crawler a rule block applies to	Yes	Yes
`Disallow`	Block crawling of a path (prefix match)	Yes	Yes
`Allow`	Carve an exception out of a Disallow	Yes	Yes
`Sitemap`	Declare sitemap location (absolute URL)	Yes	Yes
`Crawl-delay`	Seconds to wait between requests	Ignored	Yes
`Noindex`	(Non-standard) index block inside robots.txt	Ignored	Ignored
`$` / `*`	End anchor / wildcard pattern	Yes	Yes
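
To make the last table row concrete, here is a minimal Python sketch of how `*` and `$` patterns can be translated into regular expressions. The helper names `pattern_to_regex` and `matches` are hypothetical, and the translation reflects the commonly documented semantics (`*` matches any run of characters, a trailing `$` anchors the end of the URL path); it is not taken from any crawler's source.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape literal pieces; each * becomes "match any run of characters".
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    # Without a trailing $, the pattern is a plain prefix match.
    return re.compile(regex + (r"\Z" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start of the path, giving prefix semantics.
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*.pdf$", "/files/report.pdf"))             # True
print(matches("/*.pdf$", "/files/report.pdf?download=1"))  # False
print(matches("/search", "/searchresults"))                # True
```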

A worked example

Suppose the checker returns this robots.txt as the raw output:

User-agent: *
Disallow: /admin/
Disallow: /search
Allow: /search/help
Sitemap: https://example.com/sitemap.xml

/admin/dashboard → blocked (matches Disallow: /admin/).
/search?q=shoes → blocked (/search is a prefix, so query strings are caught too).
/search/help → allowed — both Disallow: /search and Allow: /search/help match, but Google honors the longer (more specific) path, so Allow wins.
/about → allowed (matches no Disallow).
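
The four outcomes above can be reproduced with a short longest-match sketch in Python. The function `is_allowed` and the `rules` list are hypothetical names for illustration. The sketch handles plain prefix patterns only (no `*` or `$`), and it assumes that a tie between equally long Allow and Disallow patterns goes to Allow, the least restrictive rule.

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide whether a path may be crawled, using longest-match precedence.

    rules is a list of (directive, pattern) pairs for one user-agent group.
    """
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if not pattern or not path.startswith(pattern):
            continue
        is_allow = directive.lower() == "allow"
        # The longer pattern wins; on a tie, prefer Allow (assumption noted above).
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len, allowed = len(pattern), is_allow
    return allowed


rules = [
    ("Disallow", "/admin/"),
    ("Disallow", "/search"),
    ("Allow", "/search/help"),
]
for path in ("/admin/dashboard", "/search?q=shoes", "/search/help", "/about"):
    print(path, "allowed" if is_allowed(path, rules) else "blocked")
```

Reversing the order of the rules gives the same results, because the comparison depends on pattern length rather than position in the file.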

Common pitfalls

Assuming line order decides the winner when Allow and Disallow both match — Google and Bing pick the longest (most specific) path, not whichever comes first.
Adding Crawl-delay expecting Googlebot to slow down — Google ignores it; adjust crawl rate in Search Console instead.
Treating Disallow: /search (no slash) the same as Disallow: /search/ — the former also blocks /searchresults, while the latter only blocks paths under /search/.

Frequently asked questions

Can I enter just a domain?

Yes. Enter a domain and it automatically looks up https://domain/robots.txt. If you paste a full URL, the path is ignored and only the root robots.txt is fetched.
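
The lookup described in this answer can be sketched in Python as follows. This is an illustration under assumptions rather than the tool's actual code: the function name `robots_url` is hypothetical, and the sketch assumes https when no scheme is given.

```python
from urllib.parse import urlsplit


def robots_url(user_input: str) -> str:
    """Return the root robots.txt URL for a bare domain or a full URL."""
    text = user_input.strip()
    # A bare domain has no scheme; assume https so urlsplit can find the host.
    if "://" not in text:
        text = "https://" + text
    parts = urlsplit(text)
    if not parts.netloc:
        raise ValueError(f"no host found in {user_input!r}")
    # Keep scheme and host (including any port); drop path, query, and fragment.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_url("example.com"))                         # https://example.com/robots.txt
print(robots_url("https://example.com/blog/post?id=1"))  # https://example.com/robots.txt
```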

It says robots.txt is missing. Is that a problem?

No. When the file is absent, the standard treats all crawling as allowed. You don't need one if you have nothing to block, but it's useful for advertising your sitemap.

How do I remove a page from the index with robots.txt?

A Disallow rule only blocks crawling; it does not remove pages. To remove an already-indexed page, put a noindex meta tag (or an X-Robots-Tag header) on that page, and leave the page crawlable so the crawler can actually see the noindex.
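
To verify that a page actually carries one of these signals, a check along the following lines could be used. This is a minimal Python sketch: the function `has_noindex` and the helper class are hypothetical, the caller is assumed to have already fetched the response headers and HTML, and only the generic `robots` meta name is inspected (not crawler-specific names such as `googlebot`).

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes; normalize to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            if "noindex" in attr.get("content", "").lower():
                self.noindex = True


def has_noindex(headers: dict, html: str) -> bool:
    """Return True if the X-Robots-Tag header or a robots meta tag says noindex."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    parser = _RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex({}, page))
```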

Is my input sent anywhere?

The server only fetches the site's public robots.txt; the URL you enter is not stored or shared with third parties beyond the lookup. Results are cached briefly for fast responses.

Related guides

How to Write robots.txt: Syntax, Examples and Common Mistakes

robots.txt rule syntax, ready-made examples by site type, and the common mistakes that break indexing.

Creating and Submitting sitemap.xml to Speed Up Indexing

Sitemap format, using lastmod correctly, how to submit to each search engine, and common errors.

Related tools

Meta Title Length Checker

Check title length and how it fits in search results.

Meta Description Length Checker

Check description length and search snippet fit.

SERP Snippet Preview

Preview how your title, URL and description appear in Google results.

Structured Data Generator

Generate Schema.org JSON-LD for Article, FAQ, Breadcrumb and more.

robots.txt Generator

Generate a robots.txt with allow/disallow rules and sitemap.

hreflang Tag Generator

Generate hreflang link tags for multilingual pages.