Free Tool

Robots.txt Tester

A free robots.txt checker that works the way Google does. Paste your file or fetch a live one, then test real URLs against Googlebot, Bingbot, GPTBot, ClaudeBot, and more. Or read the guide below first.


How Google actually parses robots.txt

Most robots.txt mistakes are not typos. They are misunderstandings of how Google reads the file. Google parses robots.txt into groups. A group starts with one or more consecutive User-agent lines, followed by the Allow and Disallow rules that belong to them. When Googlebot fetches the file, it does not read it top to bottom and obey everything. It selects exactly one rule set: the group with the most specific user-agent that matches its product token. An exact match like User-agent: Googlebot beats the wildcard User-agent: *, and once a named group matches, the wildcard group is ignored entirely for that crawler. Write a permissive Googlebot group next to a strict * group and Googlebot only sees the permissive one.
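
The group selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's parser: the names parse_groups and select_rules are hypothetical, user-agent matching is simplified to a case-insensitive exact comparison of the product token, and every directive other than User-agent, Allow, and Disallow is skipped.

```python
def parse_groups(text):
    """Split robots.txt text into groups of (agent tokens, rules)."""
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif key in ("allow", "disallow") and agents:
            # Rules that appear before any User-agent line are ignored.
            rules.append((key, value))
    if agents:
        groups.append((agents, rules))
    return groups


def select_rules(groups, token):
    """Return the rules of the named group, or of the * group as a fallback."""
    token = token.lower()
    if any(token in agents for agents, _ in groups):
        # Assumption: several groups naming the same token are merged.
        return [r for agents, rules in groups if token in agents for r in rules]
    return [r for agents, rules in groups if "*" in agents for r in rules]


ROBOTS = """
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /private
"""

groups = parse_groups(ROBOTS)
googlebot_rules = select_rules(groups, "Googlebot")  # only the named group
bingbot_rules = select_rules(groups, "Bingbot")      # falls back to *
```

In this sample file Googlebot never sees the blanket Disallow: / from the * group, while Bingbot, which has no named group, does.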

Within the selected group, precedence between Allow and Disallow is decided by specificity, not order. The rule with the longest matching path wins. Disallow: /search blocks /search/flights, but Allow: /search/about is longer, so any URL that starts with /search/about stays crawlable. When an Allow and a Disallow match with equal length, Google uses the least restrictive rule, so Allow wins the tie. The order the rules appear in the file is irrelevant.
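
A minimal sketch of that precedence logic, assuming the rules of the selected group are already available as (directive, path) pairs. The function name is_allowed is hypothetical, and only plain prefix matching is handled here; wildcards are left out to keep the example short.

```python
def is_allowed(rules, path):
    """Apply longest-match precedence to a list of (directive, pattern) rules."""
    best_len, allowed = -1, True  # no matching rule means the URL is allowed
    for directive, pattern in rules:
        # An empty pattern matches nothing; otherwise rules are prefix matches.
        if not pattern or not path.startswith(pattern):
            continue
        is_allow = directive == "allow"
        longer = len(pattern) > best_len
        tie_won_by_allow = len(pattern) == best_len and is_allow
        if longer or tie_won_by_allow:
            best_len, allowed = len(pattern), is_allow
    return allowed


rules = [("disallow", "/search"), ("allow", "/search/about")]
print(is_allowed(rules, "/search/flights"))  # False
print(is_allowed(rules, "/search/about"))    # True
```

Reversing the order of the two rules gives the same verdicts, because the comparison only looks at pattern length and, on a tie, at whether the rule is an Allow.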

Two wildcard characters extend path matching. An asterisk (*) matches any sequence of characters, so Disallow: /*?sort= blocks any URL whose query string begins with sort=. It does not match a URL where sort= appears later in the query string, such as /list?page=2&sort=price; catching those needs a second rule like Disallow: /*&sort=. A dollar sign ($) anchors the rule to the end of the URL, so Disallow: /*.pdf$ blocks URLs that end in .pdf but not /report.pdf.html. Paths are case-sensitive: /Admin and /admin are different URLs to a crawler. Our tester implements all of these behaviours, reports the exact rule and line number that decided each verdict, and flags parse problems like unknown directives or rules that appear before any User-agent line.
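
One way to picture the two wildcards is to translate a rule path into a regular expression. The sketch below does that with Python's re module. The function name rule_matches is hypothetical, and percent-encoding is ignored entirely, so treat it as an illustration of the matching rules rather than a complete implementation.

```python
import re


def rule_matches(pattern, url_path):
    """Return True if a robots.txt rule path matches the URL path and query."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces, then let each * match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    regex = re.compile(body)  # case-sensitive by default
    if anchored:
        return regex.fullmatch(url_path) is not None  # must reach the end
    return regex.match(url_path) is not None  # prefix match from the start


print(rule_matches("/*.pdf$", "/docs/report.pdf"))   # True
print(rule_matches("/*.pdf$", "/report.pdf.html"))   # False
print(rule_matches("/*?sort=", "/list?sort=price"))  # True
print(rule_matches("/admin", "/Admin"))              # False
```

Because the ? in /*?sort= is a literal character, the rule needs the exact sequence ?sort= somewhere in the URL, which is why a parameter introduced with & is not caught by it.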

Why Search Console retired its robots.txt tester

For years the standard advice was simple: test robots.txt in Google Search Console. Then Google retired the standalone robots.txt Tester in late 2023, replacing it with a robots.txt report under Settings. The report is useful for what it does. It shows when Google last fetched your file, which version it is using, and whether the file parsed cleanly. What it no longer offers is the interactive part: typing in a URL, picking a user-agent, and getting an instant allowed-or-blocked answer before you deploy a change.

That gap matters because robots.txt edits are one of the few SEO changes that can take out an entire site in a single line. The URL Inspection tool can still tell you whether one indexed URL is blocked, but it is slow, only works for properties you own, and tests one URL at a time against the live file rather than a draft. This tool fills the gap: validate a draft before it ships, test many URLs at once, and check crawlers that Search Console never covered, including the AI crawlers that now account for a meaningful share of bot traffic.

False assumptions that break robots.txt files

The first false assumption is that rules apply in the order written, so an early Allow protects you from a later Disallow. It does not. Only length decides precedence. The second is that the * group acts as a baseline that named groups add to. It does not. A crawler that matches a named group ignores the * group completely, so any rule you want Googlebot to follow must live in the Googlebot group (or only in *, with no named group present).

The third is that Disallow: /page blocks only that page. It is a prefix match, so it also blocks /page-two, /pages/, and /page.html. If you mean exactly one URL, anchor it with $. The fourth is that blocking a URL removes it from Google. robots.txt controls crawling, not indexing. A blocked URL that has inbound links can still be indexed without a snippet, and because Googlebot can no longer fetch it, it cannot even see a noindex tag you add later.

Finally, people assume that an error response for robots.txt is harmless. It is not: the outcome depends on the status code. Google treats a 404 for robots.txt as full permission to crawl, but a 5xx error as a temporary signal to stop crawling the whole site. If your robots.txt endpoint starts throwing server errors, your crawl rate can collapse even though the file itself never changed.
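
The status-code behaviour can be summarised as a small decision function. This is a hedged sketch: the function name and the returned labels are hypothetical, and the 429 branch is an addition based on Google's published documentation rather than on the text above. Redirects and the longer-term fallback to a cached copy are not modelled.

```python
def crawl_policy_for_status(status):
    """Map the HTTP status of a robots.txt fetch to a coarse crawl policy."""
    if 200 <= status < 300:
        return "parse_file"  # use the rules in the response body
    if status == 429 or 500 <= status < 600:
        # Server trouble: treated as a temporary full disallow.
        return "disallow_all_temporarily"
    if 400 <= status < 500:
        # Includes 404: treated as if no robots.txt exists.
        return "allow_all"
    return "not_modelled"  # e.g. redirects, outside this sketch


for code in (200, 404, 503):
    print(code, crawl_policy_for_status(code))
```

A monitoring check built on this idea would alert on any 5xx from /robots.txt, since that is the case where crawling can stall without any change to the file.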

AI crawler user-agents worth testing

Search engines are no longer the only bots that matter. AI companies crawl the web for training data and for live retrieval, and they identify themselves with their own product tokens. GPTBot is OpenAI's training crawler, while ChatGPT-User fetches pages when a ChatGPT user asks about your site. ClaudeBot crawls for Anthropic. PerplexityBot powers Perplexity's answer engine. CCBot is Common Crawl, whose archive feeds many training datasets. Each of these reads robots.txt and selects its own group using the same most-specific-match logic as Googlebot.

That creates a new class of robots.txt accidents. A blanket Disallow in the * group quietly blocks every AI crawler too, which may or may not be what you want now that AI assistants send real referral traffic. Conversely, blocking GPTBot does nothing to ChatGPT-User, because they are different tokens with different jobs. Use the user-agent dropdown in the tester to check each one individually and confirm your file says what you think it says, for search bots and AI bots alike.
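
The GPTBot versus ChatGPT-User distinction is easy to check in code. The sketch below uses Python's standard-library urllib.robotparser, which is not Google's parser: it applies rules in file order rather than by longest match. For a file with one rule per group, as here, that difference does not affect the result. The example.com URL and the file contents are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

url = "https://example.com/article"  # hypothetical URL
for agent in ("GPTBot", "ChatGPT-User", "ClaudeBot"):
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent}: {verdict}")
```

Only GPTBot is blocked by this file. ChatGPT-User and ClaudeBot have no named group, so they fall through to the * group and remain allowed.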

Need to build the file, not just test it?

This tester pairs with our free robots.txt generator. Build a clean file with per-bot rules, restricted directories, AI crawler blocks, and sitemap directives, then paste it back here and verify every important URL before you deploy.


Frequently asked questions

How do I check if Googlebot is blocked from a URL?

Fetch your domain's live robots.txt with the tool above (or paste the file), select Googlebot in the user-agent dropdown, and enter the URL path. The tester applies Google's group selection and longest-match precedence and shows an Allowed or Blocked verdict, plus the exact rule line that made the decision.

Does this tester match Google's official behaviour?

It implements the rules Google documents and ships in its open-source robots.txt parser: most-specific user-agent group selection, longest-path precedence between Allow and Disallow, Allow winning equal-length ties, * and $ wildcards, and case-sensitive path matching. Edge cases around exotic percent-encoding are simplified, so for unusual encoded URLs treat the verdict as a strong signal rather than a guarantee.

My site has no robots.txt file. Is that a problem?

No. A missing robots.txt (a 404) means crawlers can fetch everything, which is the correct state for many small sites. It only becomes a problem when you need to manage crawl budget, keep parameterised or internal-search URLs out of the crawl, or set policy for AI crawlers. In that case, create one with our generator and test it here before deploying.

Why does the tester say Allowed when I have Disallow rules?

Usually because of group selection. If your Disallow rules sit under User-agent: * but a named group exists for the crawler you are testing, the named group wins and the * rules never apply. Other common causes: a longer Allow rule outranks your Disallow; path matching is case-sensitive and the URL's capitalisation differs from the rule's; or the rule sits above any User-agent line, in which case the parser ignores it and the tool reports a warning.
