Free Tool
Robots.txt Generator
Build a robots.txt file with per-bot rules, crawl-delay, restricted directories, and sitemap directives. Or read the guide below first.
What robots.txt actually does
robots.txt is a plain text file that lives at the root of your domain (yoursite.com/robots.txt) and tells well-behaved crawlers which parts of your site they are welcome to crawl. It is the first file a bot like Googlebot, Bingbot, or any of the AI training crawlers looks at before requesting anything else. The file uses a simple syntax: User-agent lines name the bot, Disallow lines name paths the bot should not crawl, Allow lines carve out exceptions, and Sitemap lines point to your XML sitemaps.
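As a rough illustration of that syntax, the sketch below parses a small, made-up robots.txt with Python's standard-library urllib.robotparser and asks it a few questions. The domain, paths, and the ExampleBot name are assumptions for the example only. Note that Python's parser applies the first matching rule in a group, whereas Google applies the most specific (longest) match, so the Allow exception is listed first here to get the same answer from both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com; paths and bot names are illustrative.
ROBOTS_TXT = """\
User-agent: *
Allow: /search/help
Disallow: /search/
Disallow: /admin/

User-agent: ExampleBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has no group of its own, so it falls back to the "*" group.
blog_ok = parser.can_fetch("Googlebot", "https://example.com/blog/post")
search_ok = parser.can_fetch("Googlebot", "https://example.com/search/shoes")
help_ok = parser.can_fetch("Googlebot", "https://example.com/search/help")

# ExampleBot matches its own group, which disallows everything.
bot_ok = parser.can_fetch("ExampleBot", "https://example.com/blog/post")

print(blog_ok, search_ok, help_ok, bot_ok)
print(parser.site_maps())  # Sitemap lines apply to all crawlers (Python 3.8+)
```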
The key thing to understand is that robots.txt is a crawling directive, not an indexing directive, and not a security control. It tells a compliant crawler not to fetch the URL. It does not tell search engines to drop the URL from the index, and it does nothing to stop a hostile or non-compliant bot from making the request anyway. If a page is already indexed and you block it in robots.txt, Google can keep showing the URL in results, just without a snippet.
Used well, robots.txt saves crawl budget on large sites, keeps low-value URLs like internal search results out of the crawler's path, and points search engines at your canonical sitemap. Used badly, it can quietly kill organic traffic to an entire section of the site.
When you need a robots.txt file
Every public site benefits from having one, even if it is empty or just points at a sitemap. You actively need a robots.txt file when you have large faceted navigation that creates near-infinite URL combinations, when you run an internal search that generates URLs you do not want crawled, when you have staging or admin paths that crawlers should stay out of (on top of real access control, not instead of it), when you want to ask known scrapers or AI training crawlers not to fetch your content, or when you have multiple sitemaps you want every search engine to discover. Small brochure sites can usually get by with a one-liner. Large e-commerce, media, and SaaS sites need a properly considered file that has been reviewed by whoever owns the URL structure.
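To make those cases concrete, here is a minimal sketch of how a file covering them could be assembled from per-bot rules. The build_robots_txt helper, the paths, and the sitemap URLs are all hypothetical; GPTBot is used only as one example of an AI crawler's user-agent token. This is not how the generator on this page is implemented.

```python
def build_robots_txt(groups, sitemaps=()):
    """Render robots.txt text from per-bot rules (hypothetical helper).

    groups: list of (user_agent, allow_paths, disallow_paths) tuples.
    sitemaps: iterable of absolute sitemap URLs.
    """
    lines = []
    for user_agent, allow, disallow in groups:
        lines.append(f"User-agent: {user_agent}")
        # Allow lines go first so first-match parsers also honour the exceptions.
        lines.extend(f"Allow: {path}" for path in allow)
        lines.extend(f"Disallow: {path}" for path in disallow)
        lines.append("")  # a blank line separates groups
    lines.extend(f"Sitemap: {url}" for url in sitemaps)
    return "\n".join(lines).rstrip("\n") + "\n"


robots_txt = build_robots_txt(
    groups=[
        # Everyone: stay out of internal search and faceted filter paths.
        ("*", ["/search/help"], ["/search/", "/filter/"]),
        # One AI training crawler: asked not to fetch anything.
        ("GPTBot", [], ["/"]),
    ],
    sitemaps=[
        "https://example.com/sitemap-pages.xml",
        "https://example.com/sitemap-products.xml",
    ],
)
print(robots_txt)
```

A small brochure site would pass a single "*" group with no Disallow paths and one sitemap; the structure stays the same.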
Common robots.txt mistakes
The most common mistake we see in audits is blocking CSS and JavaScript that Google needs to render the page. Google's rendering pipeline behaves like a real browser. If you Disallow /assets/ or /static/ or /_next/, Google may see an unstyled, empty page and demote the URL. Always let crawlers fetch the rendering resources, even if they look like internal plumbing.
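One way to catch this before it ships is to list the assets your pages need in order to render and confirm that none of them are disallowed. The sketch below does that with Python's urllib.robotparser; the blocked_assets helper, the robots.txt content, and the asset URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a directory holding CSS and JavaScript.
ASSET_ROBOTS_TXT = """\
User-agent: *
Disallow: /static/
Disallow: /cart/
"""

# Hypothetical resources a page needs in order to render.
RENDER_ASSETS = [
    "https://example.com/static/app.css",
    "https://example.com/static/app.js",
    "https://example.com/images/logo.png",
]


def blocked_assets(robots_txt, asset_urls, user_agent="Googlebot"):
    """Return the asset URLs that the given user-agent may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(user_agent, url)]


for url in blocked_assets(ASSET_ROBOTS_TXT, RENDER_ASSETS):
    print("blocked rendering resource:", url)
```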
The second mistake is conflicting with your sitemap.xml. If your sitemap lists URLs that robots.txt blocks, Google cannot crawl them, and Search Console will flag the contradiction (for example, as submitted URLs blocked by robots.txt). A sitemap entry does nothing for a URL the crawler is not allowed to fetch. Decide what should be indexed, then make sure the sitemap and robots.txt agree.
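A consistency check along these lines can be scripted. The sketch below reads the loc entries from a sitemap and reports any that the robots.txt disallows. The sitemap_conflicts helper and the sample data are hypothetical, and the check relies on Python's urllib.robotparser, which does simple prefix matching and does not understand * or $ wildcards.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SITEMAP_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
"""

SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products/widget</loc></url>
  <url><loc>https://example.com/checkout/start</loc></url>
</urlset>
"""


def sitemap_conflicts(robots_txt, sitemap_xml, user_agent="Googlebot"):
    """Return sitemap URLs that robots.txt blocks for the given user-agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


print(sitemap_conflicts(SITEMAP_ROBOTS_TXT, SITEMAP_XML))
```

Each URL it returns needs a decision: either remove it from the sitemap or stop blocking it.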
Third, accidental wildcard blocks. A line like Disallow: /*? added to clean up query parameters blocks every URL that contains a question mark, including legitimate URLs you forgot about. A single forward slash in the wrong place (Disallow: /) blocks crawling of the entire site for that user-agent. We have lost count of how many post-launch traffic drops trace back to a staging robots.txt that shipped to production.
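To see how far a wildcard rule reaches, it can help to run it against real URLs from your logs or a crawl export. Python's built-in robots.txt parser does not handle wildcards, so the sketch below uses a small hand-written matcher instead. It is a simplified approximation of the common convention (* matches any run of characters, a trailing $ anchors the end of the URL, everything else is a prefix match), not a full implementation of any crawler's matching logic. The rule_matches name and the sample paths are assumptions.

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches a URL path and query.

    Simplified: '*' matches any characters, a trailing '$' anchors the end,
    and otherwise the pattern only has to match a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts and join them with ".*" where the pattern had "*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


sample_paths = [
    "/products",
    "/products?page=2",
    "/login?next=/account",  # a legitimate URL that happens to carry a query
]

for path in sample_paths:
    print(path, "blocked by /*?:", rule_matches("/*?", path))

# "Disallow: /" is a prefix of every path, so it matches everything.
print(all(rule_matches("/", path) for path in sample_paths))
```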
Fourth, treating robots.txt as access control. It is not. The file is public, anyone can read it, and listing your /admin or /private paths in it is a roadmap for attackers. Use HTTP auth, IP allowlists, or proper authentication for anything sensitive, and use robots.txt only to manage crawler behaviour on URLs that are already safe to expose.
How to test your robots.txt
Before you push a new robots.txt to production, test it. Google Search Console used to ship a dedicated robots.txt Tester; it now provides a robots.txt report under Settings, which shows when Google last fetched the file, whether the fetch succeeded, and any parsing errors or warnings. You can also use the URL Inspection tool to check whether a specific URL is blocked. For a quick check of what is actually being served, curl your robots.txt directly (curl -A "Googlebot" https://yoursite.com/robots.txt) and read the response. To verify behaviour at scale, configure Screaming Frog to respect your robots.txt, set its user-agent to Googlebot, and crawl the site. Any URL the crawler skips is a URL Googlebot is likely to skip as well. That is the fastest way to catch wildcard blocks that look fine in isolation but quietly take out hundreds of pages.
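Alongside those tools, a pre-deploy check can be as simple as a list of URLs that must stay crawlable and a list that must stay blocked, run against the candidate file. The sketch below is one possible shape for that, using Python's urllib.robotparser (prefix matching only, no wildcard support). The check_expectations helper and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def check_expectations(robots_txt, must_allow, must_block, user_agent="Googlebot"):
    """Return a list of human-readable failures; an empty list means all good."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    failures = []
    for url in must_allow:
        if not parser.can_fetch(user_agent, url):
            failures.append(f"unexpectedly blocked: {url}")
    for url in must_block:
        if parser.can_fetch(user_agent, url):
            failures.append(f"unexpectedly crawlable: {url}")
    return failures


# Hypothetical expectations for example.com.
MUST_ALLOW = ["https://example.com/", "https://example.com/products/widget"]
MUST_BLOCK = ["https://example.com/search/shoes"]

# A staging-style file that blocks everything fails the must-allow checks.
STAGING_ROBOTS_TXT = "User-agent: *\nDisallow: /\n"
print(check_expectations(STAGING_ROBOTS_TXT, MUST_ALLOW, MUST_BLOCK))

# The intended production file passes.
PRODUCTION_ROBOTS_TXT = "User-agent: *\nDisallow: /search/\n"
print(check_expectations(PRODUCTION_ROBOTS_TXT, MUST_ALLOW, MUST_BLOCK))
```

Running a check like this in a deployment pipeline is one way to stop a staging file from reaching production unnoticed.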
Frequently asked questions
Does Google honor robots.txt for indexing?
No, not directly. Google honors it for crawling, which is different. If a URL is blocked in robots.txt but linked from elsewhere on the web, Google can still index the URL (without a snippet) because it knows the URL exists. To keep a page out of the index, use a noindex meta tag or an X-Robots-Tag header on a crawlable URL.
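For reference, the meta tag form is <meta name="robots" content="noindex"> in the page's head. The header form can be set by whatever serves the page. The sketch below is a minimal WSGI application, using only the Python standard library conventions, that adds X-Robots-Tag: noindex to responses under one path prefix. The /internal-reports/ prefix and the response body are assumptions, and the path must not be disallowed in robots.txt or the crawler will never see the header.

```python
NOINDEX_PREFIX = "/internal-reports/"  # hypothetical section to keep out of the index


def app(environ, start_response):
    """Minimal WSGI app that marks one section of the site as noindex."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIX):
        # The page stays crawlable; the header asks search engines not to index it.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example</title>"]


def response_headers(path):
    """Call the app directly and return the headers it would send."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    app({"PATH_INFO": path}, start_response)
    return captured["headers"]


print(response_headers("/internal-reports/q3"))
print(response_headers("/blog/post"))
```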
What's the difference between robots.txt and noindex?
robots.txt blocks crawling. noindex blocks indexing. If you noindex a page but also disallow it in robots.txt, Googlebot cannot fetch the page and therefore cannot see the noindex tag, so the URL can still appear in results. Pick one mechanism per URL, and pick the one that matches your actual intent.
Can robots.txt prevent crawlers from finding my page?
No. robots.txt only stops compliant crawlers from fetching the URL. It does not hide the URL from the wider web. If your page is linked, shared, or sitemapped anywhere, crawlers will discover it. For real privacy, put the page behind authentication or remove it from the public internet entirely.
Robots.txt Generator
Generator Settings
Configure default and crawler-specific directives
Search Robots
Restricted Directories
Paths are relative to root and typically end with a trailing slash.
Generated Robots.txt
Click create to generate your file, then copy or download it and place it at your site root as /robots.txt.