Complete Guide to Perplexity Bots (PerplexityBot · Perplexity-User)

What are Perplexity bots

Perplexity AI operates two web crawlers. The two bots serve different purposes and are officially documented as behaving differently regarding robots.txt.

⚠️ Warning Perplexity-User is officially documented as "generally ignoring robots.txt." In 2024, external reports claimed PerplexityBot ignored robots.txt; Perplexity issued an official response. This article describes only facts based on official documentation and public reporting.

TL;DR

PerplexityBot (search index) honors robots.txt and is not used for AI training. Perplexity-User (user browsing) is officially documented as "generally ignoring robots.txt." Full blocking requires WAF and IP range blocking together.

Bot identification information

The information below is based on Perplexity's official documentation (docs.perplexity.ai/guides/bots, verified June 2026).

Bot Name	robots.txt Key	Primary Use	robots.txt Compliance
PerplexityBot	PerplexityBot	Perplexity search index building	✅ Compliant
Perplexity-User	Perplexity-User	Real-time URL visits when processing user questions	❌ Generally ignores (officially stated)

User-Agent strings (official documentation):

# PerplexityBot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)

# Perplexity-User
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)
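
As an illustration, the two User-Agent strings above can be told apart by their product tokens. The sketch below is a minimal example, not an official detection method; the function name `classify_perplexity_agent` is hypothetical. Matching is case-insensitive substring matching, which identifies the claimed bot only and says nothing about whether the request is genuine.

```python
from typing import Optional

# Product tokens taken from the User-Agent strings listed above.
PERPLEXITY_TOKENS = ("PerplexityBot", "Perplexity-User")


def classify_perplexity_agent(user_agent: str) -> Optional[str]:
    """Return the bot name a User-Agent claims to be, or None if neither matches."""
    lowered = user_agent.lower()
    for token in PERPLEXITY_TOKENS:
        if token.lower() in lowered:
            return token
    return None
```

Because any client can send these strings, the result should be treated as a claim to verify against the published IP ranges.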

IP range check:

PerplexityBot: https://www.perplexity.com/perplexitybot.json → https://www.perplexity.ai/perplexitybot.json (302 redirect)
Perplexity-User: https://www.perplexity.com/perplexity-user.json → https://www.perplexity.ai/perplexity-user.json (302 redirect)

WAF IP range automation scripts should use the final URL (perplexity.ai) or handle redirects.
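
A minimal sketch of a redirect-aware fetch, for illustration only. The function names are hypothetical, and the `prefixes`, `ipv4Prefix`, and `ipv6Prefix` field names are an assumption about the JSON layout that should be checked against the live file before relying on it.

```python
import json
import urllib.request
from ipaddress import ip_network

# Final URL after the 302 redirect noted above.
PERPLEXITYBOT_RANGES_URL = "https://www.perplexity.ai/perplexitybot.json"


def parse_prefixes(document: dict) -> list:
    """Extract CIDR networks from a decoded range document.

    Assumed layout: {"prefixes": [{"ipv4Prefix": ...}, {"ipv6Prefix": ...}]}.
    Entries without either key are skipped.
    """
    networks = []
    for entry in document.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ip_network(cidr, strict=False))
    return networks


def fetch_prefixes(url: str = PERPLEXITYBOT_RANGES_URL, timeout: float = 10.0) -> list:
    """Download and parse a range file; urlopen follows 302 redirects by default."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_prefixes(json.load(response))
```

Keeping parsing separate from fetching makes it possible to test the parser against a saved copy of the file without network access.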

How each bot works

PerplexityBot — search index

Perplexity's official documentation describes PerplexityBot as "a bot designed to surface your site and links in search results." Importantly, data collected by PerplexityBot is not used for AI foundation model training, according to official statements.

Perplexity-User — user request based

When users ask questions on Perplexity, Perplexity-User visits relevant URLs in real time to fetch answer source data. Perplexity's official documentation states that this bot "generally ignores robots.txt rules." The rationale is the same as ChatGPT-User — visits are user-initiated. Likewise, it is not used for AI model training.

2024 robots.txt controversy — fact-based summary

In June 2024, multiple media outlets including Wired and Forbes reported that PerplexityBot crawled sites blocked by robots.txt. According to reports, some publishers saw Perplexity traffic in logs despite robots.txt settings.

Perplexity CEO Aravind Srinivas acknowledged the issue on his official X account and promised policy improvements. At the end of July 2024, Perplexity announced a revenue-sharing program with publishers.

This controversy is not ongoing; Perplexity officially responded and refined its policy. Current official documentation states that PerplexityBot complies with robots.txt.

Three robots.txt examples

Scenario A. Full allow

# No separate configuration required. Maintains Perplexity search exposure.

Scenario B. Block PerplexityBot only (exclude from search index)

User-agent: PerplexityBot
Disallow: /

# Perplexity-User is officially documented as generally ignoring robots.txt
# Full blocking requires WAF + IP blocking together

Scenario C. Full block (WAF required)

User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

# Perplexity-User blocking is not guaranteed by robots.txt alone
# Combine with IP blocking below:
# curl -sL https://www.perplexity.com/perplexity-user.json
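
To sanity-check what Scenarios B and C tell a compliant crawler, the rules can be evaluated with Python's standard `urllib.robotparser`. This is a sketch: the helper name `allowed_agents` is hypothetical, and the result only shows how a robots.txt-respecting client would interpret the file. It does not predict what Perplexity-User actually does.

```python
from urllib.robotparser import RobotFileParser

SCENARIO_B = """\
User-agent: PerplexityBot
Disallow: /
"""

SCENARIO_C = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /
"""


def allowed_agents(robots_txt: str, path: str = "/") -> dict:
    """Map each Perplexity agent to whether robots_txt permits it to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        agent: parser.can_fetch(agent, path)
        for agent in ("PerplexityBot", "Perplexity-User")
    }
```

Under Scenario B only PerplexityBot is disallowed; under Scenario C both agents are disallowed on paper, which is why the WAF layer is still needed for Perplexity-User.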

Forced blocking with WAF

Because Perplexity-User generally ignores robots.txt, WAF (Web Application Firewall) configuration is required for full blocking. Perplexity's official documentation provides implementation methods for Cloudflare and AWS WAF.

Core principle: Always use User-Agent matching and IP range verification together. User-Agent matching alone is vulnerable to spoofing; IP matching alone can miss requests when the published ranges change.
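
The combined check can be sketched as follows. This is an illustrative example under stated assumptions: `verify_request` and `EXAMPLE_RANGES` are hypothetical names, and the ranges are documentation-only placeholder blocks (RFC 5737), not Perplexity's real addresses. In practice the ranges would come from the official JSON files.

```python
from ipaddress import ip_address, ip_network

# Placeholder ranges (RFC 5737 documentation blocks), NOT Perplexity's real ones.
EXAMPLE_RANGES = {
    "PerplexityBot": [ip_network("192.0.2.0/24")],
    "Perplexity-User": [ip_network("198.51.100.0/24")],
}


def verify_request(user_agent: str, remote_ip: str, ranges: dict = EXAMPLE_RANGES) -> str:
    """Classify a request as 'verified', 'spoofed', or 'other'.

    'verified': the User-Agent claims a Perplexity bot and the IP is in its range.
    'spoofed':  the User-Agent claims a Perplexity bot but the IP is outside it.
    'other':    the User-Agent does not mention either bot.
    """
    lowered = user_agent.lower()
    for bot, networks in ranges.items():
        if bot.lower() in lowered:
            address = ip_address(remote_ip)
            if any(address in network for network in networks):
                return "verified"
            return "spoofed"
    return "other"
```

A request classified as "spoofed" corresponds to the block rule, and "verified" to the allow rule (or to a block rule as well, if the goal is a full block).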

Cloudflare WAF configuration

Create a custom rule in Cloudflare dashboard → Security → WAF → Custom Rules.

Block rule:

Condition 1: HTTP User-Agent contains PerplexityBot or Perplexity-User
Condition 2: Request IP is outside official IP range
Combine with AND: block Perplexity User-Agent from outside IP ranges

Allow rule — let legitimate Perplexity bots through:

Condition 1: User-Agent contains PerplexityBot or Perplexity-User
Condition 2: Request IP is inside official IP range
Action: Allow (bypass other security rules)
Priority: Higher than block rule

Official IP range JSON:

PerplexityBot: https://www.perplexity.com/perplexitybot.json
Perplexity-User: https://www.perplexity.com/perplexity-user.json
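
The two rules above can be written as Cloudflare rule expressions using the `http.user_agent` and `ip.src` fields. The sketch below generates those expressions from a CIDR list; the function names are hypothetical, the CIDRs in the usage note are placeholders, and the output should be reviewed in the Cloudflare expression editor before deployment.

```python
# User-Agent clause shared by the block and allow rules.
UA_CLAUSE = (
    '(http.user_agent contains "PerplexityBot" '
    'or http.user_agent contains "Perplexity-User")'
)


def _inline_ip_set(cidrs: list) -> str:
    """Format CIDRs as an inline set, e.g. {192.0.2.0/24 198.51.100.0/24}."""
    if not cidrs:
        raise ValueError("at least one CIDR is required")
    return "{" + " ".join(cidrs) + "}"


def block_expression(cidrs: list) -> str:
    """Perplexity User-Agent seen from OUTSIDE the published ranges."""
    return f"{UA_CLAUSE} and not ip.src in {_inline_ip_set(cidrs)}"


def allow_expression(cidrs: list) -> str:
    """Perplexity User-Agent seen from INSIDE the published ranges."""
    return f"{UA_CLAUSE} and ip.src in {_inline_ip_set(cidrs)}"
```

For example, `block_expression(["192.0.2.0/24"])` yields an expression that can be pasted into the block rule, with the placeholder range replaced by the CIDRs from the official JSON files.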

AWS WAF configuration

Perplexity's official documentation describes a three-step setup.

Step 1: Create IP Sets

In AWS WAF console → IP sets, create separate IP Sets for each bot.

PerplexityBot-IPs: CIDR list from perplexitybot.json
PerplexityUser-IPs: CIDR list from perplexity-user.json

Step 2: Configure User-Agent string conditions

Create conditions matching each bot's User-Agent header.

Header: user-agent
Match type: Contains string
Value: PerplexityBot (or Perplexity-User)

Step 3: Combine Rule + Web ACL

Add rules that combine the IP Set and User-Agent conditions with AND to the Web ACL. Give allow rules a higher priority than block rules; in AWS WAF, rules with a lower priority number are evaluated first.
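
The three steps above can be sketched as AWS WAFv2 rule definitions built as plain Python dictionaries. This is an illustration under assumptions, not the configuration from Perplexity's documentation: the helper names, rule names, and the IP set ARN passed in are hypothetical placeholders, and the resulting structure would still need to be submitted through the console, CLI, or an SDK.

```python
def user_agent_statement(token: str) -> dict:
    """Step 2: match requests whose User-Agent header contains token."""
    return {
        "ByteMatchStatement": {
            "SearchString": token,
            "FieldToMatch": {"SingleHeader": {"Name": "user-agent"}},
            "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
            "PositionalConstraint": "CONTAINS",
        }
    }


def bot_rule(name: str, token: str, ip_set_arn: str, priority: int, allow: bool) -> dict:
    """Step 3: AND the User-Agent match with the IP set from Step 1.

    allow=True  -> User-Agent matches AND IP is inside the set  -> Allow
    allow=False -> User-Agent matches AND IP is outside the set -> Block
    """
    ip_statement = {"IPSetReferenceStatement": {"ARN": ip_set_arn}}
    if not allow:
        ip_statement = {"NotStatement": {"Statement": ip_statement}}
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "AndStatement": {"Statements": [user_agent_statement(token), ip_statement]}
        },
        "Action": {"Allow": {}} if allow else {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }
```

Giving the allow rule priority 0 and the block rule priority 1 places the allow rule first in evaluation order.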

Automatic IP range updates

Perplexity may update IP ranges frequently. Manual management leaves gaps, so it is recommended to run a script that periodically polls the official JSON endpoints and updates your IP Sets automatically.

# Check current IP ranges (-L follows the 302 redirect to perplexity.ai)
curl -sL https://www.perplexity.com/perplexitybot.json
curl -sL https://www.perplexity.com/perplexity-user.json
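
One possible shape for the update step is to compare the CIDRs currently in the IP Set with the freshly downloaded list and apply only the difference. The sketch below is a hypothetical helper (`plan_update` is not part of any WAF SDK) and leaves the actual API call to the WAF provider out of scope. It refuses to proceed on an empty download, on the assumption that an empty list more likely indicates a failed fetch than a real change.

```python
def plan_update(current: list, latest: list) -> dict:
    """Return the CIDRs to add and remove so that current becomes latest."""
    current_set, latest_set = set(current), set(latest)
    if not latest_set:
        # Guard against wiping the IP set after a failed or empty download.
        raise ValueError("refusing to replace the IP set with an empty list")
    return {
        "add": sorted(latest_set - current_set),
        "remove": sorted(current_set - latest_set),
    }
```

Running this from a scheduled job and logging a non-empty plan gives a simple audit trail of when the published ranges changed.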

Recommended scenarios

Want Perplexity search exposure: Scenario A. PerplexityBot must be able to crawl your site for it to appear in Perplexity search results.

Block training, allow citations: PerplexityBot data is officially stated as not used for AI training, so no separate training opt-out is needed. Leave PerplexityBot allowed (Scenario A) so your pages stay in the index and can be cited.

Full block: Apply robots.txt + WAF + IP blocking together. robots.txt alone does not guarantee Perplexity-User blocking.

Verification — identifying suspicious traffic

# Filter Perplexity bots in server logs
# (assumes nginx "combined" log format: $1 = client IP, $4 = timestamp, $7 = path, $9 = status)
grep -iE "PerplexityBot|Perplexity-User" /var/log/nginx/access.log \
  | awk '{print $1, $4, $7, $9}' | tail -50

# If Perplexity-User traffic appears after robots.txt blocking
# consider adding WAF configuration
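
For a per-IP summary instead of raw lines, the same filter can be sketched in Python. This example assumes the nginx "combined" log format; the names `LOG_PATTERN` and `perplexity_hits` are hypothetical, and lines that do not match the assumed format are skipped silently.

```python
import re
from collections import Counter

# Assumed nginx "combined" format:
# ip - user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
AGENT_PATTERN = re.compile(r"PerplexityBot|Perplexity-User", re.IGNORECASE)


def perplexity_hits(lines) -> Counter:
    """Count requests per (lowercased bot token, client IP) pair."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue
        agent = AGENT_PATTERN.search(match.group("agent"))
        if agent:
            hits[(agent.group(0).lower(), match.group("ip"))] += 1
    return hits
```

The client IPs in the result can then be compared with the published ranges to separate genuine Perplexity traffic from spoofed User-Agents.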

Frequently asked questions

Q. If I block PerplexityBot, will I stop appearing in Perplexity answers?
A. You will be excluded from the Perplexity search index. However, Perplexity-User ignores robots.txt, so if users include your URL directly in a question, access may still occur.

Q. Did Perplexity really change after the 2024 controversy?
A. Perplexity's official documentation states that PerplexityBot complies with robots.txt. However, Perplexity-User is still officially documented as "generally ignoring robots.txt." WAF is required for full blocking.

Q. Is data collected by PerplexityBot used for AI training?
A. Official documentation states both PerplexityBot and Perplexity-User are "not used for AI foundation model training."

Q. How do I avoid appearing in Perplexity's source list?
A. Blocking PerplexityBot reduces the likelihood of exposure in the source list by excluding you from the search index. However, cases where users include URLs directly in questions are difficult to control.

References

Perplexity official bot documentation: https://docs.perplexity.ai/guides/bots (verified June 2026)
PerplexityBot IP range: https://www.perplexity.com/perplexitybot.json
Perplexity-User IP range: https://www.perplexity.com/perplexity-user.json
Wired (2024.06). Perplexity Is a Bullshit Machine. https://www.wired.com/story/perplexity-is-a-bullshit-machine/