Complete Guide to Perplexity Bots (PerplexityBot · Perplexity-User)
What are Perplexity bots
Perplexity AI operates two web crawlers. The two bots serve different purposes and are officially documented as behaving differently regarding robots.txt.
⚠️ Warning: Perplexity-User is officially documented as "generally ignoring robots.txt." In 2024, external reports claimed that PerplexityBot also ignored robots.txt, and Perplexity issued an official response. This article relies only on official documentation and public reporting.
TL;DR
PerplexityBot (search index) honors robots.txt and is not used for AI training. Perplexity-User (user browsing) is officially documented as "generally ignoring robots.txt." Full blocking requires WAF and IP range blocking together.
Bot identification information
The information below is based on Perplexity's official documentation (docs.perplexity.ai/guides/bots, verified June 2026).
| Bot Name | robots.txt Key | Primary Use | robots.txt Compliance |
|---|---|---|---|
| PerplexityBot | PerplexityBot | Perplexity search index building | ✅ Compliant |
| Perplexity-User | Perplexity-User | Real-time URL visits when processing user questions | ❌ Generally ignores (officially stated) |
User-Agent strings (official documentation):
# PerplexityBot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
# Perplexity-User
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)
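To make the User-Agent strings above easier to work with, here is a minimal Python sketch that pulls the bot token and version out of a header value. The function name `identify_perplexity_bot` and the regular expression are illustrative assumptions, not something Perplexity publishes.

```python
import re

# Product tokens come from the table above; the pattern itself is an assumption.
_PERPLEXITY_UA = re.compile(
    r"\b(PerplexityBot|Perplexity-User)/(\d+(?:\.\d+)*)", re.IGNORECASE
)


def identify_perplexity_bot(user_agent: str):
    """Return (bot_name, version) if the header names a Perplexity bot, else None."""
    match = _PERPLEXITY_UA.search(user_agent)
    if match is None:
        return None
    return match.group(1), match.group(2)


ua = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"
)
print(identify_perplexity_bot(ua))  # ('PerplexityBot', '1.0')
```

A header match only shows what the client claims to be. It should be paired with an IP range check before the request is trusted or blocked as genuine Perplexity traffic.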
IP range check:
- PerplexityBot: https://www.perplexity.com/perplexitybot.json → https://www.perplexity.ai/perplexitybot.json (302 redirect)
- Perplexity-User: https://www.perplexity.com/perplexity-user.json → https://www.perplexity.ai/perplexity-user.json (302 redirect)
WAF IP range automation scripts should use the final URL (perplexity.ai) or handle redirects.
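As a rough sketch of the redirect-aware fetch described above, the Python below downloads a range file and extracts its CIDR entries. The payload shape (a `prefixes` list whose items carry `ipv4Prefix` or `ipv6Prefix`) is an assumption about the file format and should be checked against the live JSON; `parse_prefixes` and `fetch_ranges` are hypothetical helper names.

```python
import json
import urllib.request


def parse_prefixes(payload: dict) -> list[str]:
    """Collect CIDR strings from a payload shaped like {"prefixes": [{...}]}."""
    cidrs = []
    for entry in payload.get("prefixes", []):
        # Key names are assumed; adjust them to match the published file.
        for key in ("ipv4Prefix", "ipv6Prefix"):
            if key in entry:
                cidrs.append(entry[key])
    return cidrs


def fetch_ranges(url: str, timeout: float = 10.0) -> list[str]:
    """Download a range file; urllib follows the 302 redirect automatically."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_prefixes(json.load(response))


# Offline demonstration with documentation-only addresses (RFC 5737).
sample = {"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv4Prefix": "198.51.100.7/32"}]}
print(parse_prefixes(sample))  # ['192.0.2.0/24', '198.51.100.7/32']
```

In real use, `fetch_ranges("https://www.perplexity.ai/perplexitybot.json")` would replace the sample payload. Parsing is kept separate from the download so that it can be tested without network access.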
How each bot works
PerplexityBot — search index
Perplexity's official documentation describes PerplexityBot as "a bot designed to surface your site and links in search results." Importantly, data collected by PerplexityBot is not used for AI foundation model training, according to official statements.
Perplexity-User — user request based
When users ask questions on Perplexity, Perplexity-User visits relevant URLs in real time to fetch answer source data. Perplexity's official documentation states that this bot "generally ignores robots.txt rules." The rationale is the same as ChatGPT-User — visits are user-initiated. Likewise, it is not used for AI model training.
2024 robots.txt controversy — fact-based summary
In June 2024, multiple media outlets including Wired and Forbes reported that PerplexityBot crawled sites blocked by robots.txt. According to reports, some publishers saw Perplexity traffic in logs despite robots.txt settings.
Perplexity CEO Aravind Srinivas acknowledged the issue on his official X account and promised policy improvements. At the end of July 2024, Perplexity announced a revenue-sharing program with publishers.
This controversy is not ongoing; Perplexity officially responded and refined its policy. Current official documentation states that PerplexityBot complies with robots.txt.
Three robots.txt examples
Scenario A. Full allow
# No separate configuration required. Maintains Perplexity search exposure.
Scenario B. Block PerplexityBot only (exclude from search index)
User-agent: PerplexityBot
Disallow: /
# Perplexity-User is officially documented as generally ignoring robots.txt
# Full blocking requires WAF + IP blocking together
Scenario C. Full block (WAF required)
User-agent: PerplexityBot
Disallow: /
User-agent: Perplexity-User
Disallow: /
# Perplexity-User blocking is not guaranteed by robots.txt alone
# Combine with IP blocking below:
# curl -sL https://www.perplexity.com/perplexity-user.json   (-L follows the 302 redirect)
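Before deploying any of the scenarios above, the rules can be sanity-checked locally. The sketch below uses Python's standard `urllib.robotparser` to evaluate Scenario B; the helper `allowed_by_robots` and the `example.com` URL are placeholders. It reports what the file says, not whether a given crawler actually obeys it.

```python
from urllib.robotparser import RobotFileParser

SCENARIO_B = """\
User-agent: PerplexityBot
Disallow: /
"""


def allowed_by_robots(robots_txt: str, agent: str, url: str) -> bool:
    """Evaluate robots.txt text for one agent and URL without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("PerplexityBot", "Perplexity-User"):
    print(agent, allowed_by_robots(SCENARIO_B, agent, "https://example.com/article"))
# PerplexityBot False
# Perplexity-User True
```

The second result shows that Scenario B leaves Perplexity-User unrestricted even on paper, which is why Scenario C adds an explicit entry and still relies on WAF enforcement.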
Forced blocking with WAF
Because Perplexity-User ignores robots.txt, WAF (Web Application Firewall) configuration is required for full blocking. Perplexity's official documentation provides implementation methods for Cloudflare and AWS WAF.
Core principle: Always use User-Agent matching + IP range verification together. User-Agent alone is vulnerable to spoofing; IP alone risks misses when ranges change.
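The combined check can be sketched in Python with the standard `ipaddress` module. The function name `classify_request`, the four labels it returns, and the sample ranges are illustrative assumptions; real ranges would come from the official JSON files.

```python
import ipaddress


def classify_request(user_agent: str, client_ip: str, official_cidrs: list[str]) -> str:
    """Combine the User-Agent claim with an IP range check."""
    claims_perplexity = any(
        token in user_agent for token in ("PerplexityBot", "Perplexity-User")
    )
    address = ipaddress.ip_address(client_ip)
    in_range = False
    for cidr in official_cidrs:
        network = ipaddress.ip_network(cidr, strict=False)
        # Compare only like with like (IPv4 vs IPv4, IPv6 vs IPv6).
        if address.version == network.version and address in network:
            in_range = True
            break
    if claims_perplexity and in_range:
        return "verified"    # genuine Perplexity traffic
    if claims_perplexity:
        return "spoofed"     # claims the bot name from an unknown address
    if in_range:
        return "unlabeled"   # official address without the bot name
    return "other"


ranges = ["192.0.2.0/24"]  # documentation range, stands in for the official list
print(classify_request("PerplexityBot/1.0", "192.0.2.15", ranges))   # verified
print(classify_request("PerplexityBot/1.0", "203.0.113.9", ranges))  # spoofed
```

For a full block, both "verified" and "spoofed" requests would be rejected. For an allow-list setup, only "verified" requests would be let through.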
Cloudflare WAF configuration
Create a custom rule in Cloudflare dashboard → Security → WAF → Custom Rules.
Block rule:
- Condition 1: HTTP User-Agent contains PerplexityBot or Perplexity-User
- Condition 2: Request IP is outside official IP range
- Combine with AND: block requests that present a Perplexity User-Agent but originate outside the official IP ranges (likely spoofed)
Allow rule — let legitimate Perplexity bots through:
- Condition 1: User-Agent contains PerplexityBot or Perplexity-User
- Condition 2: Request IP is inside official IP range
- Action: Allow (bypass other security rules)
- Priority: Higher than block rule
Official IP range JSON:
- PerplexityBot: https://www.perplexity.com/perplexitybot.json
- Perplexity-User: https://www.perplexity.com/perplexity-user.json
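One way to keep the Cloudflare block rule in sync with the published ranges is to generate its expression from the CIDR list. The Python sketch below builds a rule expression using the `http.user_agent` and `ip.src` fields of Cloudflare's Rules language; the function name and the sample ranges are assumptions, and the output should be reviewed in the dashboard's expression editor before it is applied.

```python
def cloudflare_spoof_block_expression(cidrs: list[str]) -> str:
    """Build an expression matching Perplexity User-Agents outside the given ranges."""
    ua_match = (
        '(http.user_agent contains "PerplexityBot" '
        'or http.user_agent contains "Perplexity-User")'
    )
    ip_set = " ".join(cidrs)  # inline sets are space-separated
    return f"{ua_match} and not ip.src in {{{ip_set}}}"


# Documentation ranges stand in for the official list.
print(cloudflare_spoof_block_expression(["192.0.2.0/24", "198.51.100.0/24"]))
```

Dropping the `not` gives the matching expression for the allow rule. For a full block of the real bots as well, the User-Agent clause alone is sufficient.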
AWS WAF configuration
Perplexity's official documentation describes a three-step setup.
Step 1: Create IP Sets
In AWS WAF console → IP sets, create separate IP Sets for each bot.
- PerplexityBot-IPs: CIDR list from perplexitybot.json
- PerplexityUser-IPs: CIDR list from perplexity-user.json
Step 2: Configure User-Agent string conditions
Create conditions matching each bot's User-Agent header.
- Header: user-agent
- Match type: Contains string
- Value: PerplexityBot (or Perplexity-User)
Step 3: Combine Rule + Web ACL
Add rules combining IP Set and User-Agent conditions with AND to the Web ACL. Set allow rules higher priority than block rules.
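The three steps above can be sketched as a rule definition in the AWS WAFv2 JSON format, built here as a Python dictionary so it can be inspected before use. The helper `waf_rule`, the rule names, and the IP set ARN are hypothetical placeholders; nothing in this sketch calls AWS.

```python
import json


def waf_rule(name, priority, bot_token, ip_set_arn, action, inside_range=True):
    """Build a WAFv2 rule: User-Agent contains bot_token AND IP is (not) in the set."""
    ua_statement = {
        "ByteMatchStatement": {
            "SearchString": bot_token,
            "FieldToMatch": {"SingleHeader": {"Name": "user-agent"}},
            "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
            "PositionalConstraint": "CONTAINS",
        }
    }
    ip_statement = {"IPSetReferenceStatement": {"ARN": ip_set_arn}}
    if not inside_range:
        ip_statement = {"NotStatement": {"Statement": ip_statement}}
    return {
        "Name": name,
        "Priority": priority,  # lower numbers are evaluated first
        "Statement": {"AndStatement": {"Statements": [ua_statement, ip_statement]}},
        "Action": {action: {}},  # "Allow" or "Block"
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


# Placeholder ARN; substitute the ARN of the IP set created in Step 1.
ARN = "arn:aws:wafv2:us-east-1:123456789012:regional/ipset/PerplexityBot-IPs/EXAMPLE-ID"
allow = waf_rule("AllowPerplexityBot", 0, "PerplexityBot", ARN, "Allow")
block = waf_rule("BlockFakePerplexityBot", 1, "PerplexityBot", ARN, "Block", inside_range=False)
print(json.dumps([allow, block], indent=2))
```

The allow rule gets the lower priority number so that it is evaluated before the block rule, matching the ordering described in Step 3.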
Automatic IP range updates
Perplexity may change its IP ranges at any time, and a manually maintained list will drift out of date. A scheduled script that polls the official JSON endpoints and updates the IP Sets avoids that gap.
# Check current IP ranges
curl -sL https://www.perplexity.com/perplexitybot.json
curl -sL https://www.perplexity.com/perplexity-user.json
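A polling script mostly needs to know what changed between the ranges already deployed and the ranges just downloaded. The Python sketch below computes that difference; `diff_ranges` is a hypothetical helper, and pushing the result to a WAF is left out because it depends on the provider.

```python
import ipaddress


def diff_ranges(current: list[str], latest: list[str]) -> dict:
    """Return the CIDRs to add and remove so that `current` matches `latest`."""

    def normalize(cidrs):
        # Canonical form, so "192.0.2.7/24" and "192.0.2.0/24" compare equal.
        return {str(ipaddress.ip_network(cidr, strict=False)) for cidr in cidrs}

    current_set, latest_set = normalize(current), normalize(latest)
    return {
        "add": sorted(latest_set - current_set),
        "remove": sorted(current_set - latest_set),
    }


deployed = ["192.0.2.0/24", "198.51.100.0/24"]   # documentation ranges
published = ["192.0.2.0/24", "203.0.113.0/24"]
print(diff_ranges(deployed, published))
# {'add': ['203.0.113.0/24'], 'remove': ['198.51.100.0/24']}
```

Run from a scheduler such as cron, an empty `add` and `remove` means no update is needed, which keeps WAF API calls to a minimum.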
Recommended scenarios
Want Perplexity search exposure: Scenario A. PerplexityBot must collect your site for search results exposure.
Block training, allow citations: Perplexity states that PerplexityBot data is not used for AI training, so no separate training opt-out is needed. Leave PerplexityBot allowed (Scenario A) to stay eligible for citations.
Full block: Apply robots.txt + WAF + IP blocking together. robots.txt alone does not guarantee Perplexity-User blocking.
Verification — identifying suspicious traffic
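# Filter Perplexity bots in server logs (assumes nginx's default "combined" log format)
# Splitting on double quotes prints: client IP and timestamp, request line, full User-Agent
grep -iE "PerplexityBot|Perplexity-User" /var/log/nginx/access.log \
| awk -F'"' '{print $1, $2, $6}' | tail -50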
# If Perplexity-User traffic appears after robots.txt blocking
# consider adding WAF configuration
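For a more structured view than the shell filter, the Python sketch below parses access log lines and returns the bot name, client IP, and path for each Perplexity hit. It assumes nginx's default "combined" log format; `perplexity_hits` and the sample lines are illustrative.

```python
import re

# Pattern for nginx's default "combined" format (an assumption about the server config).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def perplexity_hits(lines):
    """Return (bot, client_ip, path) for each line whose User-Agent names a Perplexity bot."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        agent = match.group("agent").lower()
        for bot in ("PerplexityBot", "Perplexity-User"):
            if bot.lower() in agent:
                parts = match.group("request").split()
                path = parts[1] if len(parts) > 1 else ""
                hits.append((bot, match.group("ip"), path))
                break
    return hits


sample = [
    '192.0.2.10 - - [10/Jun/2026:12:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; '
    'Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)"',
    '198.51.100.4 - - [10/Jun/2026:12:00:01 +0000] "GET / HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64)"',
]
print(perplexity_hits(sample))  # [('Perplexity-User', '192.0.2.10', '/pricing')]
```

The client IPs returned here can then be compared with the official ranges to separate genuine Perplexity traffic from spoofed User-Agents.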
Frequently asked questions
Q. If I block PerplexityBot, will I stop appearing in Perplexity answers?
A. You will be excluded from the Perplexity search index. However, Perplexity-User ignores robots.txt, so if users include your URL directly in a question, access may still occur.
Q. Did Perplexity really change after the 2024 controversy?
A. Perplexity's official documentation states that PerplexityBot complies with robots.txt. However, Perplexity-User is still officially documented as "generally ignoring robots.txt." WAF is required for full blocking.
Q. Is data collected by PerplexityBot used for AI training?
A. Official documentation states both PerplexityBot and Perplexity-User are "not used for AI foundation model training."
Q. How do I avoid appearing in Perplexity's source list?
A. Blocking PerplexityBot reduces the likelihood of exposure in the source list by excluding you from the search index. However, cases where users include URLs directly in questions are difficult to control.
References
- Perplexity official bot documentation: https://docs.perplexity.ai/guides/bots (verified June 2026)
- PerplexityBot IP range: https://www.perplexity.com/perplexitybot.json
- Perplexity-User IP range: https://www.perplexity.com/perplexity-user.json
- Wired (2024.06). Perplexity Is a Bullshit Machine. https://www.wired.com/story/perplexity-is-a-bullshit-machine/