OneWebDesk

robots.txt Generator

Generate a robots.txt with allow/disallow rules and sitemap.

robots.txt is a plain-text file placed at your site root (/robots.txt) that tells search-engine crawlers which paths they may crawl. This generator turns a User-agent, Allow/Disallow rules, an optional Crawl-delay and one or more Sitemap URLs into a standard, ready-to-paste robots.txt.

Start from a preset — allow all, block all, or the WordPress default — then add or remove paths to fine-tune it. Everything is processed in your browser and nothing is sent to a server.

Presets
Allow / Disallow rules
Paths start with a slash (/). An empty Disallow value means 'allow everything'.
Sitemap URL (multiple allowed)
Enter one absolute URL per line. Example: https://example.com/sitemap.xml
Generated robots.txt
User-agent: *
Disallow: 
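
The output above is what the allow-all preset produces. As a rough sketch of how such a generator might assemble the file, the snippet below builds the text from a user-agent, a list of rules, an optional crawl delay, and sitemap URLs. The names `RobotsConfig`, `build_robots_txt`, and `PRESETS` are hypothetical and not taken from this tool. The WordPress preset contents are an assumption based on the rules WordPress serves by default.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RobotsConfig:
    """Hypothetical input model; field names are illustrative only."""
    user_agent: str = "*"
    # Each rule is a (directive, path) pair, e.g. ("Disallow", "/admin/").
    rules: list = field(default_factory=list)
    crawl_delay: Optional[int] = None
    sitemaps: list = field(default_factory=list)


def build_robots_txt(cfg: RobotsConfig) -> str:
    lines = [f"User-agent: {cfg.user_agent}"]
    if not cfg.rules:
        # An empty Disallow value means "allow everything".
        lines.append("Disallow:")
    for directive, path in cfg.rules:
        if directive not in ("Allow", "Disallow"):
            raise ValueError(f"unknown directive: {directive!r}")
        if path and not path.startswith("/"):
            raise ValueError(f"path must start with '/': {path!r}")
        lines.append(f"{directive}: {path}".rstrip())
    if cfg.crawl_delay is not None:
        lines.append(f"Crawl-delay: {cfg.crawl_delay}")
    if cfg.sitemaps:
        lines.append("")  # blank line before the sitemap block
        for url in cfg.sitemaps:
            if not url.startswith(("http://", "https://")):
                raise ValueError(f"sitemap must be an absolute URL: {url!r}")
            lines.append(f"Sitemap: {url}")
    return "\n".join(lines) + "\n"


# Assumed preset contents (not confirmed by the tool itself).
PRESETS = {
    "allow_all": RobotsConfig(),
    "block_all": RobotsConfig(rules=[("Disallow", "/")]),
    "wordpress": RobotsConfig(rules=[
        ("Disallow", "/wp-admin/"),
        ("Allow", "/wp-admin/admin-ajax.php"),
    ]),
}

if __name__ == "__main__":
    print(build_robots_txt(PRESETS["wordpress"]))
```

Rejecting relative paths and relative sitemap URLs at build time mirrors the two input rules stated above: paths start with a slash, and sitemaps are absolute URLs.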

robots.txt controls crawling, not indexing

The most common misconception is that Disallow removes a page from search results. It does not. robots.txt only stops a crawler from fetching a URL; it does not prevent indexing. If external links point to that URL, Google can still index it without reading its content and show it in results with the note "No information is available for this page."

To reliably keep a page out of search results, use noindex on the page itself rather than robots.txt, via the meta tag <meta name="robots" content="noindex"> or the HTTP header X-Robots-Tag: noindex. Because a crawler must be able to fetch the page to see the noindex, blocking that same URL in robots.txt while also setting noindex is self-defeating: the directive is never read.
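
The conflict described here can be detected mechanically. The following is a minimal sketch, assuming you already have the robots.txt text plus the page's HTML and response headers in hand. The function names are hypothetical, and the meta-tag regex is a simplification that expects the name attribute before content. Note that Python's `urllib.robotparser` applies rules in file order, which can differ from how Google resolves overlapping Allow and Disallow rules.

```python
import re
from urllib import robotparser

# Simplified pattern: assumes name="robots" appears before content="...".
NOINDEX_META = re.compile(
    r"<meta[^>]+name=[\"']robots[\"'][^>]+content=[\"'][^\"']*noindex",
    re.IGNORECASE,
)


def has_noindex(html: str, headers: dict) -> bool:
    """True if the page asks not to be indexed via meta tag or header."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    return bool(NOINDEX_META.search(html))


def blocked_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """True if the given robots.txt text forbids fetching the URL."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def noindex_is_unreachable(robots_txt: str, url: str, html: str, headers: dict) -> bool:
    """True when a page carries noindex but crawlers are not allowed to fetch it."""
    return blocked_by_robots(robots_txt, url) and has_noindex(html, headers)
```

If `noindex_is_unreachable` returns True, remove the Disallow rule for that URL so the noindex can be seen.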

Directives at a glance

  • User-agent: the crawler the rules apply to. * matches all bots; Googlebot matches only Google's crawler.
  • Disallow: a path to block from crawling. It matches by prefix (e.g. /admin/). An empty value allows everything.
  • Allow: an exception inside a disallowed area. The most specific rule (the longest matching path) wins.
  • Crawl-delay: seconds between requests. Google ignores it; only some crawlers (Bing/Yandex) honor it.
  • Sitemap: the absolute URL of a sitemap. You can list several on separate lines.
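
The interaction between Disallow and Allow is easier to see in code. This is a minimal sketch of prefix matching where the longest matching path wins and a tie goes to Allow. It deliberately ignores the `*` and `$` wildcards that some crawlers support, and the function name `is_allowed` is hypothetical.

```python
def is_allowed(path: str, rules: list) -> bool:
    """Decide whether `path` may be crawled.

    `rules` is a list of (directive, path_prefix) pairs belonging to one
    User-agent group, e.g. [("Disallow", "/admin/"), ("Allow", "/admin/public/")].
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix:
            continue  # an empty value matches nothing, so it blocks nothing
        if not path.startswith(prefix):
            continue  # prefix match is case-sensitive
        is_allow = directive.lower() == "allow"
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and is_allow
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("Disallow", "/admin/"), ("Allow", "/admin/public/")]
print(is_allowed("/admin/settings", rules))         # False
print(is_allowed("/admin/public/logo.png", rules))  # True
print(is_allowed("/blog/", rules))                  # True
```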

Common mistakes

  1. Accidentally blocking the whole site with Disallow: /, which stops crawling and can push pages out of search results over time. Always verify after deploying.
  2. Blocking CSS/JS so Google can't render the page properly. Don't disallow your static assets.
  3. Missing leading slashes or wrong casing. Paths are case-sensitive and must always start with /.

After deploying, use the robots.txt checker to confirm the live file behaves as intended, and the sitemap validator to verify the sitemaps you declared.
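
Some of the mistakes listed above can also be caught before deploying. The snippet below is a small, heuristic sketch of such a check, not the logic of the checker mentioned here. The function name `lint_robots_txt` is hypothetical, and the CSS/JS check only looks at rules whose path ends in .css or .js. Wrong casing cannot be detected without knowing the site's real paths, so it is not covered.

```python
def lint_robots_txt(text: str) -> list:
    """Return human-readable warnings for a few common robots.txt mistakes."""
    warnings = []
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has seen a rule line
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        name = name.lower()
        if name == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif name in ("allow", "disallow"):
            in_rules = True
            if value and not value.startswith("/"):
                warnings.append(f"line {number}: path {value!r} should start with /")
            if name == "disallow":
                if value == "/" and "*" in agents:
                    warnings.append(
                        f"line {number}: Disallow: / blocks the whole site for all crawlers"
                    )
                if value.rstrip("$").lower().endswith((".css", ".js")):
                    warnings.append(
                        f"line {number}: blocking CSS/JS can prevent proper rendering"
                    )
    return warnings


sample = "User-agent: *\nDisallow: /\nDisallow: admin/\nDisallow: /*.css$\n"
for warning in lint_robots_txt(sample):
    print(warning)
```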

Frequently asked questions

If I block a page in robots.txt, will it disappear from Google?
No. robots.txt only blocks crawling (fetching the page). If external links point to it, the URL can still be indexed without its content. To reliably remove it from search, use a noindex meta tag or X-Robots-Tag header on the page.
What's the difference between Disallow and noindex?
Disallow is a crawl directive meaning 'don't fetch this URL,' while noindex is an indexing directive meaning 'don't index this page.' For noindex to work the crawler must be able to read the page, so applying both to the same URL means the crawler never sees the noindex.
Where do I put the robots.txt file?
It must live at the domain root (https://example.com/robots.txt). Crawlers won't read it from a subfolder, and each subdomain needs its own robots.txt.
Do I need a Crawl-delay?
It's optional. Googlebot ignores Crawl-delay and adjusts its crawl rate automatically based on how your server responds. The directive only affects some crawlers, such as Bing or Yandex, and can help when server load is a concern.
Are the paths I enter sent to a server?
No. Everything is generated entirely in your browser and your input is never transmitted.
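
For the placement question above, the rule that each host has its own root-level robots.txt can be expressed in a few lines. This is a small sketch with a hypothetical helper name, `robots_url_for`, that derives the robots.txt location from any page URL on a site.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs the given absolute page URL."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError("expected an absolute URL such as https://example.com/page")
    # Keep scheme and host (including any subdomain or port); replace the path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://example.com/blog/post?id=1"))  # https://example.com/robots.txt
print(robots_url_for("https://shop.example.com/cart"))       # https://shop.example.com/robots.txt
```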
