iMakeMVPs
GEO · April 15, 2025 · 9 min read

ClaudeBot User Agent: How to Allow Anthropic's Crawlers and Stop Blocking Your Own AI Search Visibility

To allow ClaudeBot, add a dedicated User-agent: ClaudeBot block with Allow: / to your robots.txt file. A wildcard User-agent: * paired with Disallow: / blocks every bot that lacks its own named block, including ClaudeBot. Many small business sites ship this configuration without anyone having chosen it on purpose.

By Samer Shaker

Most Small Business Sites Are Already Blocking ClaudeBot and Don't Know It

We audited 10 small business client sites. Seven of them were blocking ClaudeBot. None of the owners knew.

Figure: Split-screen diagram showing robots.txt with a wildcard Disallow rule on the left (red) blocking all bots, and a ClaudeBot-specific Allow block on the right (green) granting access

The culprit is a single line in robots.txt: User-agent: * followed by Disallow: /. That wildcard applies to every crawler that does not have its own named block. ClaudeBot does not get an exemption by default. So unless your file has a dedicated User-agent: ClaudeBot section, Anthropic's crawler reads that wildcard and stops.
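One way to see this behavior is to run the two configurations through a parser. The sketch below uses Python's standard-library urllib.robotparser, which approximates how a compliant crawler reads the file; it is not Anthropic's own parser, and example.com plus the helper name is_allowed are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str,
               url: str = "https://example.com/") -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Wildcard-only file: every crawler without a named block is turned away.
WILDCARD_ONLY = """\
User-agent: *
Disallow: /
"""

# The same file plus a dedicated ClaudeBot block.
WITH_CLAUDEBOT = WILDCARD_ONLY + """
User-agent: ClaudeBot
Allow: /
"""

print(is_allowed(WILDCARD_ONLY, "ClaudeBot"))   # False
print(is_allowed(WITH_CLAUDEBOT, "ClaudeBot"))  # True
```

Only the token that has a named block is exempted; any other crawler still falls back to the wildcard rule.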

This is not a fringe case. A Calvano study of 12.15 million sites found that 97.4% use a wildcard directive. The wildcard itself is not the problem. The blanket Disallow: / attached to it is.

SEO plugins make this worse. Yoast and SEO Manager both generate robots.txt configurations automatically. Some of those defaults include restrictive disallow rules that the site owner never reviewed. The plugin ships the file, the site owner moves on, and ClaudeBot gets quietly turned away.

Blocking ClaudeBot cuts your site out of Anthropic's training corpus and reduces your visibility in AI-powered search results. If you have already read our related guide, the same logic applies here, with one key difference: ClaudeBot needs its own explicit rule.

Anthropic Runs Three Separate Crawlers and Each One Does a Different Job

Most guides treat all Anthropic bots as one thing and give you a single robots.txt rule. That is wrong: a rule written for one token says nothing about the other two, which fall back to whatever your wildcard rule says.

Anthropic runs ClaudeBot, Claude-User, and Claude-SearchBot as distinct systems with different purposes, different speeds, and different costs when blocked. Each one requires its own directive. Anthropic clarifies how Claude bots crawl sites and how to block them in their official guidance.

ClaudeBot: Training Crawler

ClaudeBot is the bulk harvester. It crawls at roughly 500 pages per hour, collecting page content to feed Anthropic's model training pipeline. This is the bot that determines whether your site ends up in the training data behind future versions of Claude. Blocking it signals that your site's content should be left out of future training datasets. The correct token for your robots.txt is ClaudeBot.

Claude-User: Real-Time Fetcher

Claude-User fires on demand, not on a schedule. When a Claude.ai user asks a question that needs live page content, this bot fetches the relevant URLs in real time. It runs at under 10 pages per hour. According to Anthropic, blocking it “prevents our system from retrieving your content in response to a user query, which may reduce your site's visibility for user-directed web search.” The token is Claude-User.

Claude-SearchBot: Search Indexer

Claude-SearchBot builds the index that powers Claude's search results. Like Claude-User, it runs at under 10 pages per hour. Anthropic states that blocking it “may reduce your site's visibility and accuracy in user search results.” The token is Claude-SearchBot.

One more thing before you write any rules: two older tokens, Claude-Web and anthropic-ai, are deprecated but still appear in tutorials. The next section covers exactly what to do with them. If you run a WordPress site, our guide on adding llms.txt to WordPress covers how these bots interact with your content layer beyond robots.txt.

Use the Wrong Token in robots.txt and the Allow Rule Does Nothing

Claude-Web and anthropic-ai are both deprecated. Anthropic no longer honors either token, so any rule you write using them is invisible to the actual ClaudeBot crawler.

Here is the wrong approach:

User-agent: Claude-Web
Allow: /

User-agent: anthropic-ai
Allow: /

Those rules do nothing. ClaudeBot ignores them completely.

The correct token for Anthropic's training crawler is ClaudeBot. If you also want to allow the real-time fetcher that powers Claude's web-browsing feature, use Claude-User. For the search indexer, use Claude-SearchBot.

Here is the correct approach:

User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

Copy that block, paste it into your robots.txt file, and you are done. One wrong token is all it takes to make an allow rule fail silently, so always check the exact string before you save.
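Because a wrong token fails silently, a small lint pass can help. The following is a minimal sketch, assuming the two deprecated tokens named above are the only ones to flag; the function name find_deprecated_tokens is hypothetical.

```python
import re

# Tokens this article describes as deprecated.
DEPRECATED_TOKENS = {"claude-web", "anthropic-ai"}


def find_deprecated_tokens(robots_txt: str) -> list[str]:
    """Return deprecated Anthropic tokens named on User-agent lines, in file order."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.match(r"(?i)user-agent\s*:\s*(.+)$", line)
        if match and match.group(1).strip().lower() in DEPRECATED_TOKENS:
            found.append(match.group(1).strip())
    return found


sample = """\
User-agent: Claude-Web
Allow: /

User-agent: ClaudeBot
Allow: /
"""
print(find_deprecated_tokens(sample))  # ['Claude-Web']
```

An empty list means no deprecated token appears on a User-agent line; it does not by itself prove that the current tokens are present.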

How to Audit Your robots.txt File for ClaudeBot Blocks in Under Five Minutes

Your robots.txt file may already be blocking ClaudeBot without a single line that mentions it by name. A single Disallow: / under User-agent: * blocks every crawler that has no named block of its own. That includes ClaudeBot. Here is how to find the problem fast.

Figure: A browser window showing yourdomain.com/robots.txt with the wildcard Disallow rule highlighted in red and a tooltip reading: This blocks ClaudeBot

  1. Open your browser and go to yourdomain.com/robots.txt. No login required. The file renders as plain text.
  2. Look for a User-agent: * block. Check whether it contains Disallow: / or Disallow: /*. Either one catches ClaudeBot unless the file also has a named User-agent: ClaudeBot block with an Allow rule.
  3. Search the page for ClaudeBot. If you see a Disallow: / line under it, that rule applies to ClaudeBot regardless of what the wildcard block says.
  4. Check whether your SEO plugin generates this file dynamically. If it does, editing robots.txt manually may not stick after the next save or plugin update.
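The manual check above can also be scripted. This is a rough sketch, assuming you paste in the text you see at yourdomain.com/robots.txt; it uses Python's urllib.robotparser as a stand-in for a compliant crawler and only tests the site root. The function name audit_robots is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The three current Anthropic tokens covered in this article.
ANTHROPIC_TOKENS = ("ClaudeBot", "Claude-User", "Claude-SearchBot")


def audit_robots(robots_txt: str,
                 url: str = "https://example.com/") -> dict[str, bool]:
    """Map each Anthropic token to True (allowed) or False (blocked) for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in ANTHROPIC_TOKENS}


# Example file: blanket wildcard block with an exemption for ClaudeBot only.
robots_txt = """\
User-agent: *
Disallow: /

User-agent: ClaudeBot
Allow: /
"""

for token, allowed in audit_robots(robots_txt).items():
    print(f"{token}: {'allowed' if allowed else 'BLOCKED'}")
```

The script reads only the file contents, so it cannot tell whether a plugin will regenerate the file later or whether something in front of your server blocks the crawler first.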

What to Look For in a Wildcard Rule

97.4% of robots.txt files use a wildcard User-agent: * directive. That number matters because most site owners write their rules there and assume the bots they care about are covered. They are not. A wildcard block applies to every bot that has no named block of its own. If your file contains User-agent: * followed by Disallow: /, ClaudeBot is blocked unless a separate User-agent: ClaudeBot block with Allow: / appears somewhere in the file. Order does not decide the outcome; specificity does: a crawler follows the block that names it and ignores the wildcard. For a deep primer on how robots.txt directives are parsed and prioritized, Moz has a solid reference.
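To illustrate that specificity rather than position decides the outcome, the sketch below parses the same two blocks in both orders. It relies on Python's urllib.robotparser as an approximation of a compliant crawler; SomeOtherBot and the /pricing path are made-up placeholders.

```python
from urllib.robotparser import RobotFileParser

NAMED_BLOCK = "User-agent: ClaudeBot\nAllow: /\n"
WILDCARD_BLOCK = "User-agent: *\nDisallow: /\n"


def verdict(robots_txt: str, agent: str,
            url: str = "https://example.com/pricing") -> bool:
    """Return True if agent may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


named_first = NAMED_BLOCK + "\n" + WILDCARD_BLOCK
wildcard_first = WILDCARD_BLOCK + "\n" + NAMED_BLOCK

# ClaudeBot follows its named block in either order.
print(verdict(named_first, "ClaudeBot"), verdict(wildcard_first, "ClaudeBot"))        # True True
# A crawler with no named block falls back to the wildcard in either order.
print(verdict(named_first, "SomeOtherBot"), verdict(wildcard_first, "SomeOtherBot"))  # False False
```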

How SEO Plugins Generate Blocking Rules Without Telling You

Yoast SEO and SEO Manager both generate robots.txt files automatically. That convenience is also the risk. A blanket Disallow rule can appear after a plugin update, a setting change, or even a fresh install, without any prompt or notification. Check your plugin's SEO settings for a robots.txt editor or “crawl rules” panel. If the file is plugin-managed, make your ClaudeBot allow rule inside the plugin's interface, not by editing the raw file directly. A raw edit gets overwritten on the next save.

The Correct robots.txt Syntax to Allow All Three Anthropic Crawlers

Here are the three snippets you need. Copy one, paste it into your robots.txt, and save.

Option 1: Allow all three bots (maximum AI visibility)

Use this if you want Claude to index your content, show it in real-time answers, and use it for training.

User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

Option 2: Allow search visibility, block training data collection

This is the right call for most small businesses. You get cited in Claude's live answers without handing your content to Anthropic's training pipeline.

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: ClaudeBot
Disallow: /

Blocking ClaudeBot does not touch Claude-User or Claude-SearchBot. Each token operates independently.

Option 3: Safe wildcard with explicit allow exemptions

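If your robots.txt already blocks all bots with User-agent: * and Disallow: /, add the exemptions below it. The example exempts the two search-facing crawlers; ClaudeBot stays blocked by the wildcard unless you add a User-agent: ClaudeBot block with Allow: / as well: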

User-agent: *
Disallow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

Figure: Three labeled code blocks showing robots.txt snippets for allowing all three Anthropic crawlers, with color-coded User-agent and Allow lines on a white background

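Give each bot its own User-agent block. Listing all three names on a single User-agent line, for example separated by commas, does not work in standard robots.txt parsing. The Robots Exclusion Protocol (RFC 9309) does let several consecutive User-agent lines share one set of rules, but separate blocks are easier to read and audit. Write them out separately every time.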
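A short check shows why cramming the names onto one line fails. This sketch assumes Python's urllib.robotparser behaves like a typical compliant parser here: it treats the comma-separated value as one literal token that matches none of the three crawlers. The helper name allowed_tokens is hypothetical.

```python
from urllib.robotparser import RobotFileParser

TOKENS = ("ClaudeBot", "Claude-User", "Claude-SearchBot")

# All three names on one User-agent line: read as a single odd token.
COMMA_LIST = """\
User-agent: *
Disallow: /

User-agent: ClaudeBot, Claude-User, Claude-SearchBot
Allow: /
"""

# One block per crawler, as recommended above.
SEPARATE_BLOCKS = """\
User-agent: *
Disallow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /
"""


def allowed_tokens(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the Anthropic tokens that may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [token for token in TOKENS if parser.can_fetch(token, url)]


print(allowed_tokens(COMMA_LIST))       # []
print(allowed_tokens(SEPARATE_BLOCKS))  # ['ClaudeBot', 'Claude-User', 'Claude-SearchBot']
```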

Cloudflare's “Block AI Bots” Rule Fires Before robots.txt and Silently Breaks Your Allow Rules

Your robots.txt can be perfectly written and still do nothing. If you use Cloudflare and the “Block AI Bots” managed WAF rule is active, ClaudeBot never reaches your server to read robots.txt. Cloudflare returns a 403 at the network layer first. The allow rules you wrote are irrelevant.

Figure: Diagram showing the Cloudflare WAF intercepting ClaudeBot with a 403 before robots.txt is reached, with an alternate path showing the WAF rule disabled and ClaudeBot reaching the site

Find the Rule in Cloudflare:

  1. Log into your Cloudflare dashboard and select your domain.
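  2. Go to Security, then Bots. Cloudflare has moved this setting between dashboard versions, so if it is not there, check Security, then WAF, then Managed Rules.
  3. Look for the “Block AI Bots” setting.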

Fix It: Two Options

Option A: Disable the rule entirely.

Click the toggle next to “Block AI Bots” to turn it off. All AI crawlers will pass through. Use this if you want broad AI visibility.

Option B: Create a skip rule for ClaudeBot specifically.

In WAF, click “Add rule” and create a custom rule that skips the managed ruleset when the incoming user-agent string matches ClaudeBot. This lets Cloudflare keep blocking other AI crawlers while allowing Anthropic's bots through.

Most site owners who add robots.txt allow rules never check this setting. They assume the rules worked. They did not. Confirm the WAF rule is off or skipped before testing any ClaudeBot allow configuration. Because Anthropic splits its crawlers across three tokens, robots.txt decisions are more granular than many site owners realize, and that focus on robots.txt is exactly why a network-layer block is so easy to miss.
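A rough way to spot a network-layer block is to request robots.txt with a ClaudeBot-style user-agent and look at the status code. The sketch below is an assumption-heavy probe, not a definitive test: it only spoofs the user-agent string from your own machine, and Cloudflare may treat that differently from genuine crawler traffic. A 403 is a strong hint of a block; a 200 is not proof that the real crawler gets through. The function names and the yourdomain.com URL are placeholders.

```python
import urllib.error
import urllib.request

# ClaudeBot's user-agent string as quoted in this article's FAQ.
CLAUDEBOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "ClaudeBot/1.0; +claudebot@anthropic.com)"
)


def interpret_status(status: int) -> str:
    """Turn an HTTP status for /robots.txt into a short diagnosis."""
    if status == 403:
        return "blocked before robots.txt is read (check WAF and bot settings)"
    if 200 <= status < 300:
        return "robots.txt reachable with this user-agent"
    return f"unexpected status {status}"


def fetch_status(url: str, user_agent: str = CLAUDEBOT_UA,
                 timeout: float = 10.0) -> int:
    """Request url with the given user-agent and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx and 5xx responses arrive as exceptions


# Usage (performs a real network request, so it is left commented out):
# print(interpret_status(fetch_status("https://yourdomain.com/robots.txt")))
```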

Sites That Allow ClaudeBot Get 376 AI Crawler Visits Per Month. Blocked Sites Get 58.

A Duda analysis of 858,000 sites found that sites with proper AI crawler access receive an average of 376 AI crawler visits per month. Sites that block AI crawlers get 58. That is roughly a 6.5x gap (376 / 58 ≈ 6.5). Paul Calvano's analysis of 12.15 million sites shows how common the wildcard directive is: 97.4% of files use User-agent: *, and every one that pairs it with Disallow: / silently catches ClaudeBot.

More crawl visits means more chances to appear when Claude answers a query in your niche. Fewer visits means Claude either has no data about your site or is working from a stale cached snapshot. You cannot show up in AI search results that Claude never indexed you for.

The math is simple. Open the door, get indexed. Keep it shut, stay invisible.

Frequently Asked Questions

How do I allow ClaudeBot in robots.txt?

Add a dedicated User-agent: ClaudeBot block with Allow: / to your robots.txt file. Place it after any User-agent: * block. The named rule takes priority over the wildcard, so ClaudeBot will pass through even if the wildcard has Disallow: /.

Does blocking ClaudeBot hurt my Google rankings?

No. ClaudeBot and Googlebot are separate crawlers run by separate companies. Blocking ClaudeBot has zero effect on Google's ability to index your site. This is an Anthropic-only issue.

Can I block training data collection but keep real-time and search visibility?

Yes. Add Disallow: / under the ClaudeBot user agent to block the training crawler. Then add Allow: / under Claude-User and Claude-SearchBot. Anthropic honors each directive separately.

Does allowing ClaudeBot guarantee my site gets cited in Claude?

No. Access is a prerequisite, not a guarantee. Claude still decides what to cite based on content quality, relevance, and authority. Getting indexed puts you in the running. What you publish determines whether you win.

How do I confirm ClaudeBot is crawling after I update robots.txt?

Check your server access logs for requests with ClaudeBot in the user-agent string. ClaudeBot's full user-agent is:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)

If you see that string in your logs after updating your rules, the configuration is working.
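Searching the logs can be scripted as well. This is a minimal sketch that counts log lines mentioning each Anthropic token; matching Claude-User and Claude-SearchBot by substring is an assumption, since only the ClaudeBot string is quoted above. The sample log lines are invented for illustration, and a user-agent can be spoofed, so treat the counts as a hint rather than proof.

```python
from collections import Counter

ANTHROPIC_TOKENS = ("ClaudeBot", "Claude-User", "Claude-SearchBot")


def count_anthropic_hits(log_lines) -> Counter:
    """Count log lines that mention each Anthropic crawler token (case-insensitive)."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in ANTHROPIC_TOKENS:
            if token.lower() in lowered:
                hits[token] += 1  # at most one hit per token per line
    return hits


# Hypothetical access-log lines in combined log format.
sample_log = [
    '203.0.113.7 - - [15/Apr/2025:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; '
    '+claudebot@anthropic.com)"',
    '198.51.100.9 - - [15/Apr/2025:10:00:05 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

hits = count_anthropic_hits(sample_log)
print(hits["ClaudeBot"])  # 1
```

To run it against a real log, pass an open file object, for example the access log your host exposes, in place of sample_log.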

What is the difference between ClaudeBot and Claude-User?

ClaudeBot is a scheduled training crawler that harvests content for Anthropic's model training pipeline at roughly 500 pages per hour. Claude-User is a real-time fetcher that fires on demand when a Claude.ai user asks a question that requires live web content. They are independent systems with separate user-agent tokens.

Why does my robots.txt Allow rule seem to do nothing?

Two common causes. First, you may be using a deprecated token like Claude-Web or anthropic-ai instead of ClaudeBot. Second, a Cloudflare WAF “Block AI Bots” rule may be returning a 403 before ClaudeBot reaches your server to read robots.txt at all. Check both before assuming the rule failed.

Do I need to add ClaudeBot rules to llms.txt as well?

No. These are separate systems. robots.txt tells crawlers which URLs they may fetch; it is a voluntary standard that compliant crawlers honor, not a network-level block. llms.txt is a separate file that tells AI assistants how to use your content. Allowing ClaudeBot in robots.txt is the first step. Adding llms.txt gives Claude structured guidance about what to read and cite. Our guide on adding llms.txt to WordPress covers the next step after you fix robots.txt.
