Complete Guide to Anthropic Bots (ClaudeBot · Claude-User · Claude-SearchBot)

What are Anthropic bots

Anthropic operates three web crawlers for Claude. Like OpenAI, it separates its bots by purpose, and each can be controlled independently via robots.txt. Anthropic explicitly commits to robots.txt compliance in its official documentation.

TL;DR

Distinguish ClaudeBot (training), Claude-User (user browsing), and Claude-SearchBot (search index). Anthropic officially states it honors robots.txt and does not use CAPTCHA bypass technology. IP ranges are available at claude.com/crawling/bots.json.

Bot identification information

The information below is based on Anthropic's official support documentation (support.claude.com, verified June 2026).

Bot Name	robots.txt Key	Primary Use	Block Effect
ClaudeBot	ClaudeBot	AI model training data collection	Signal to exclude from AI training datasets
Claude-User	Claude-User	Fetches pages when a Claude user's request requires it	May reduce visibility in user-directed web search
Claude-SearchBot	Claude-SearchBot	Indexes content to improve Claude search result quality	May reduce search result accuracy and visibility

IP range check: https://claude.com/crawling/bots.json

Anthropic's official documentation advises contacting Anthropic if you find suspicious crawl traffic, and identifying the affected domain when you do.

How each bot works

ClaudeBot — for training

ClaudeBot collects web data for Claude model training. Anthropic's official documentation describes blocking ClaudeBot as a signal that the site's content should be excluded from AI training datasets. Blocking via robots.txt stops future training collection but does not affect data already collected.

Claude-User — user browsing

Claude-User activates when a user submits a specific URL to Claude via Claude.ai or the Claude API, or asks for a web search. Anthropic's documentation states that blocking it may reduce your site's visibility in user-directed web search. Because it fetches pages at answer time, it is the bot most directly tied to citations in real-time Claude answers.

Claude-SearchBot — search index

Claude-SearchBot builds an index that improves results for Claude's in-product search features. Blocking it may reduce both how often and how accurately your site appears in Claude search results.

Anthropic's crawling policy commitments

As stated in Anthropic's official documentation:

✅ robots.txt compliance: Respects "do not crawl" signals
✅ Non-invasive crawling: Does not disrupt site operations
✅ Transparency: Publicly provides information about its crawlers
✅ No CAPTCHA bypass: Does not use CAPTCHA bypass technology
✅ Per-subdomain rules: robots.txt is applied separately to the main domain and to each subdomain

Three robots.txt examples

Scenario A. Full allow (default)

# No separate configuration required. All Anthropic bots operate under default policy.

Scenario B. Block training only, allow answer citations and search (recommended)

# ClaudeBot: block training data collection
User-agent: ClaudeBot
Disallow: /

# Claude-User, Claude-SearchBot are allowed
# → Maintains Claude answer citations and search exposure

Scenario C. Full block

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Disallow: /

User-agent: Claude-SearchBot
Disallow: /
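
Before deploying Scenario B or C, the rules can be checked locally. The sketch below uses Python's standard urllib.robotparser to ask which of the three user agents may fetch a page. The test URL and the helper name allowed_bots are illustrative assumptions, and Python's parser only approximates how a given crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

ANTHROPIC_BOTS = ("ClaudeBot", "Claude-User", "Claude-SearchBot")

# Scenario B: block training only
SCENARIO_B = """\
User-agent: ClaudeBot
Disallow: /
"""

# Scenario C: full block
SCENARIO_C = """\
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Disallow: /

User-agent: Claude-SearchBot
Disallow: /
"""


def allowed_bots(robots_txt, url="https://example.com/article"):
    """Return {bot name: True if these rules let that bot fetch `url`}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in ANTHROPIC_BOTS}


if __name__ == "__main__":
    print("Scenario B:", allowed_bots(SCENARIO_B))
    print("Scenario C:", allowed_bots(SCENARIO_C))
```

Under Scenario B only ClaudeBot should come back as blocked; under Scenario C all three should.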

Subdomain note

# Subdomains must be configured separately (Anthropic official documentation)
# blog.example.com/robots.txt needs the same settings
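
To make the subdomain rule concrete, here is a small sketch that derives which robots.txt file governs a given page. The function names and example URLs are hypothetical; the point is only that each host name maps to its own file.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL for the host that serves `page_url`."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def robots_files_needed(page_urls):
    """Return the sorted, de-duplicated robots.txt URLs for a set of pages."""
    return sorted({robots_txt_url(url) for url in page_urls})


if __name__ == "__main__":
    pages = [
        "https://example.com/pricing",
        "https://blog.example.com/posts/1",
        "https://blog.example.com/posts/2",
    ]
    for url in robots_files_needed(pages):
        print(url)
```

Running it over a list of your own URLs gives a checklist of the files that each need the same bot rules.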

Recommended scenarios (SMB baseline)

General SMBs: Scenario B recommended. Block only ClaudeBot to minimize training data provision while maintaining Claude answer citation opportunities.

Content asset businesses: Scenario C. Block both training and citations. Exposure in Claude will disappear.

Maximum AI exposure strategy: Scenario A (full allow). Anthropic's robots.txt compliance makes this a high-trust choice.
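
If you maintain robots.txt for several hosts, the three scenarios can be generated rather than hand-edited. The following is a minimal sketch; the SCENARIOS mapping and build_robots_txt are illustrative names, not part of any Anthropic tooling, and the output covers only the Anthropic groups, so merge it with your existing rules.

```python
# Which bots each scenario blocks (A = full allow, B = block training, C = full block)
SCENARIOS = {
    "A": (),
    "B": ("ClaudeBot",),
    "C": ("ClaudeBot", "Claude-User", "Claude-SearchBot"),
}


def build_robots_txt(scenario):
    """Return the robots.txt groups that block the bots for `scenario`."""
    groups = [f"User-agent: {bot}\nDisallow: /\n" for bot in SCENARIOS[scenario]]
    # Separate groups with a blank line; Scenario A yields an empty string.
    return "\n".join(groups)


if __name__ == "__main__":
    print(build_robots_txt("B"))
```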

Verification

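# Filter Anthropic bots in server logs (assumes the nginx "combined" log format)
# Prints: client IP, timestamp, request path, full user-agent string
grep -iE "ClaudeBot|Claude-User|Claude-SearchBot" /var/log/nginx/access.log \
  | awk -F'"' '{split($1, a, " "); split($2, r, " "); print a[1], a[4], r[2], $6}' \
  | tail -50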

# Check IP ranges (reference for bot verification only — IP blocking not recommended)
# curl https://claude.com/crawling/bots.json
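
For a per-bot summary instead of raw lines, the same filter can be sketched in Python. The sample log lines and their user-agent strings are made up for illustration; real user-agent strings will differ. Like the grep filter, this matches the bot name anywhere in the line, and a user-agent string alone can be spoofed.

```python
import re
from collections import Counter

# Longest names first so each hit maps to exactly one bot
BOT_PATTERN = re.compile(r"Claude-SearchBot|Claude-User|ClaudeBot", re.IGNORECASE)
CANONICAL = {
    "claudebot": "ClaudeBot",
    "claude-user": "Claude-User",
    "claude-searchbot": "Claude-SearchBot",
}


def count_bot_hits(lines):
    """Count log lines per Anthropic bot, matching names case-insensitively."""
    counts = Counter()
    for line in lines:
        match = BOT_PATTERN.search(line)
        if match:
            counts[CANONICAL[match.group(0).lower()]] += 1
    return counts


# Hypothetical sample lines, not real traffic
SAMPLE = [
    '192.0.2.1 - - [01/Jan/2026:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "ClaudeBot/1.0"',
    '192.0.2.2 - - [01/Jan/2026:00:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "Claude-User/1.0"',
    '192.0.2.3 - - [01/Jan/2026:00:00:03 +0000] "GET /b HTTP/1.1" 200 512 "-" "claudebot/1.0"',
    '192.0.2.4 - - [01/Jan/2026:00:00:04 +0000] "GET /c HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    print(count_bot_hits(SAMPLE))
```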

⚠️ IP blocking is not recommended. Anthropic's official documentation states that "blocking IP addresses may prevent Anthropic from reading robots.txt, so opt-out may not be correctly or consistently guaranteed." To block Anthropic bots, use robots.txt, which is the officially recommended method, rather than IP blocking.
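
The published ranges are still useful for confirming that a request claiming to be an Anthropic bot comes from an Anthropic address. The sketch below uses Python's standard ipaddress module. It deliberately does not assume the structure of bots.json: PLACEHOLDER_RANGES holds reserved documentation ranges, not real Anthropic addresses, and you would replace it with the CIDR blocks you extract from the published file.

```python
import ipaddress

# Placeholder documentation ranges (RFC 5737 / RFC 3849), NOT Anthropic's real ranges.
# Replace with the CIDR blocks listed at https://claude.com/crawling/bots.json.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def ip_in_ranges(ip, cidrs):
    """Return True if `ip` falls inside any of the CIDR blocks in `cidrs`."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr, strict=False) for cidr in cidrs)


if __name__ == "__main__":
    for candidate in ("192.0.2.10", "198.51.100.1", "2001:db8::1"):
        print(candidate, ip_in_ranges(candidate, PLACEHOLDER_RANGES))
```

Use the result to flag spoofed user agents in logs, not to build firewall rules, for the reason given in the warning above.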

Frequently asked questions

Q. If I block ClaudeBot, will I stop appearing in Claude answers?
A. Blocking ClaudeBot is not directly connected to Claude answer citations. Answer citations are primarily handled by Claude-User and Claude-SearchBot. Blocking ClaudeBot only stops training data provision; Claude's real-time answer citation channels remain open.

Q. What is the relationship between Claude Citations API and ClaudeBot?
A. Anthropic's Citations API is a feature for developers to specify sources in answers via the Claude API. It is a separate system from ClaudeBot crawling. The Citations API works from documents developers provide and is not directly connected to ClaudeBot's web crawl data.

Q. Does Anthropic really honor robots.txt?
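A. Anthropic explicitly commits in its official documentation to honoring robots.txt and to not using CAPTCHA bypass. Treat this as a stated policy rather than a guarantee: site operators, including iFixit in 2024, have publicly complained about ClaudeBot's crawl volume, and a lawsuit filed by Reddit in 2025 alleges that Anthropic's crawlers continued to access its site. Check your own server logs to confirm that your rules are being followed.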

Q. Do I need separate settings for subdomains?
A. Yes. Anthropic's official documentation states that robots.txt on subdomains must be applied separately from the main domain. Settings for blog.example.com must be added separately at blog.example.com/robots.txt.

Q. Can I set Crawl-delay?
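A. Yes. Anthropic's documentation states that it supports the non-standard Crawl-delay directive. To reduce crawl frequency, add a line such as Crawl-delay: 10 under the relevant User-agent group; the value is conventionally read as the number of seconds between requests.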
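
As a quick sanity check that a Crawl-delay line is attached to the intended bot, the rule can be parsed with Python's standard urllib.robotparser. This is a sketch under assumptions: the value 10 and the helper name crawl_delay_for are illustrative, and it shows how Python's parser reads the file, not how Anthropic's crawler schedules requests.

```python
from urllib.robotparser import RobotFileParser

# Crawl-delay must sit inside a User-agent group to take effect
ROBOTS_TXT = """\
User-agent: ClaudeBot
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay that applies to `user_agent`, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print("ClaudeBot:", crawl_delay_for(ROBOTS_TXT, "ClaudeBot"))
    print("Claude-User:", crawl_delay_for(ROBOTS_TXT, "Claude-User"))
```

Because the delay is declared only for ClaudeBot here, the other two bots have no delay set unless you add their own groups.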

References

Anthropic official crawler documentation: https://support.claude.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler (verified June 2026)
Anthropic bot IP ranges: https://claude.com/crawling/bots.json