📘Concept · ⭐️ Pillar

Complete Guide to Perplexity Bots (PerplexityBot · Perplexity-User)


What are Perplexity bots

Perplexity AI operates two web crawlers, PerplexityBot and Perplexity-User. They serve different purposes, and Perplexity's official documentation describes different robots.txt behavior for each.

⚠️ Warning: Perplexity-User is officially documented as "generally ignoring robots.txt." In 2024, external reports claimed that PerplexityBot ignored robots.txt, and Perplexity issued an official response. This article is limited to facts from official documentation and public reporting.


TL;DR

PerplexityBot (search index) honors robots.txt and is not used for AI training. Perplexity-User (user browsing) is officially documented as "generally ignoring robots.txt." Full blocking requires WAF and IP range blocking together.


Bot identification information

The information below is based on Perplexity's official documentation (docs.perplexity.ai/guides/bots, verified June 2026).

Bot name | robots.txt token | Primary use | robots.txt compliance
PerplexityBot | PerplexityBot | Building the Perplexity search index | ✅ Compliant
Perplexity-User | Perplexity-User | Real-time URL visits when processing user questions | ❌ Generally ignores (officially stated)

User-Agent strings (official documentation):

# PerplexityBot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)

# Perplexity-User
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)
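A minimal Python sketch of User-Agent identification: it looks for the `PerplexityBot/<version>` or `Perplexity-User/<version>` product tokens that appear in the documented strings above. The function name `claimed_perplexity_bot` is hypothetical. A match only tells you what the client claims to be; the header is trivially spoofed, so it should always be paired with an IP range check.

```python
import re

# Product tokens as they appear in the documented User-Agent strings.
# Requiring "<name>/<version>" avoids matching the lowercase URL part.
_TOKEN_RE = re.compile(r"\b(PerplexityBot|Perplexity-User)/(\d+(?:\.\d+)*)")


def claimed_perplexity_bot(user_agent: str):
    """Return (bot_name, version) if the UA claims to be a Perplexity bot, else None.

    This reports a *claim* only; verify the source IP separately.
    """
    match = _TOKEN_RE.search(user_agent)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
          "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)")
    print(claimed_perplexity_bot(ua))  # ('PerplexityBot', '1.0')
```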

IP range check:

  • PerplexityBot: https://www.perplexity.com/perplexitybot.json → https://www.perplexity.ai/perplexitybot.json (302 redirect)
  • Perplexity-User: https://www.perplexity.com/perplexity-user.json → https://www.perplexity.ai/perplexity-user.json (302 redirect)

WAF IP range automation scripts should use the final URL (perplexity.ai) or handle redirects.
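The sketch below shows one way to turn a downloaded range file into network objects and test an address against them, using only Python's standard library. It assumes the JSON uses a `{"prefixes": [{"ipv4Prefix": ...}, {"ipv6Prefix": ...}]}` layout; that is an assumption to check against the live file. The sample ranges come from documentation address space, not Perplexity's real ranges. If you fetch the file with `urllib.request.urlopen`, the 302 redirect is followed by default.

```python
import ipaddress
import json


def parse_ip_ranges(raw_json: str) -> list:
    """Turn an IP-range JSON document into ipaddress network objects.

    ASSUMPTION: {"prefixes": [{"ipv4Prefix": "..."}, {"ipv6Prefix": "..."}]}
    layout; adjust the key names if the live file differs.
    """
    data = json.loads(raw_json)
    networks = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            # strict=False tolerates host bits set in a published prefix
            networks.append(ipaddress.ip_network(prefix, strict=False))
    return networks


def ip_in_ranges(ip: str, networks: list) -> bool:
    """True if the address falls inside any published range.

    An IPv4 address is never "in" an IPv6 network, so mixed lists are safe.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


# Sample data from documentation address space (NOT Perplexity's real ranges)
sample = '{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}'
nets = parse_ip_ranges(sample)
print(ip_in_ranges("192.0.2.10", nets))   # True
print(ip_in_ranges("203.0.113.5", nets))  # False
print(ip_in_ranges("2001:db8::1", nets))  # True
```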


How each bot works

PerplexityBot — search index

Perplexity's official documentation describes PerplexityBot as "a bot designed to surface your site and links in search results." Importantly, data collected by PerplexityBot is not used for AI foundation model training, according to official statements.

Perplexity-User — user request based

When users ask questions on Perplexity, Perplexity-User visits relevant URLs in real time to fetch source data for the answer. Perplexity's official documentation states that this bot "generally ignores robots.txt rules." The rationale is the same as for ChatGPT-User: the visits are initiated by users. Like PerplexityBot, it is not used for AI model training.


2024 robots.txt controversy — fact-based summary

In June 2024, multiple media outlets including Wired and Forbes reported that PerplexityBot crawled sites blocked by robots.txt. According to reports, some publishers saw Perplexity traffic in logs despite robots.txt settings.

Perplexity CEO Aravind Srinivas acknowledged the issue on his official X account and promised policy improvements. In August 2024, Perplexity announced a revenue-sharing program with publishers.

This controversy is not ongoing: Perplexity responded officially and refined its policy, and its current official documentation states that PerplexityBot complies with robots.txt.


Three robots.txt examples

Scenario A. Full allow

# No separate configuration required. Maintains Perplexity search exposure.

Scenario B. Block PerplexityBot only (exclude from search index)

User-agent: PerplexityBot
Disallow: /

# Perplexity-User is officially documented as generally ignoring robots.txt
# Full blocking requires WAF + IP blocking together

Scenario C. Full block (WAF required)

User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

# Perplexity-User blocking is not guaranteed by robots.txt alone
# Combine with IP blocking below:
# curl -sL https://www.perplexity.com/perplexity-user.json   (-L follows the 302 redirect)
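robots.txt only states intent; each crawler decides whether to honor it. As a rough illustration, Python's standard `urllib.robotparser` can show how a compliant crawler would read Scenarios B and C. The URL `https://example.com/articles/1` is a placeholder. The `False` result for Perplexity-User under Scenario C describes what a compliant bot would do, not what Perplexity-User, which is documented as generally ignoring robots.txt, will actually do.

```python
from urllib.robotparser import RobotFileParser

SCENARIO_B = """\
User-agent: PerplexityBot
Disallow: /
"""

SCENARIO_C = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate robots.txt rules the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/articles/1"  # placeholder URL
for name, rules in (("B", SCENARIO_B), ("C", SCENARIO_C)):
    for bot in ("PerplexityBot", "Perplexity-User"):
        print(name, bot, allowed(rules, bot, url))
# Prints:
# B PerplexityBot False
# B Perplexity-User True   (no group matches, so access is allowed)
# C PerplexityBot False
# C Perplexity-User False
```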

Forced blocking with WAF

Because Perplexity-User generally ignores robots.txt, full blocking requires WAF (Web Application Firewall) configuration. Perplexity's official documentation provides setup instructions for Cloudflare and AWS WAF.

Core principle: always combine User-Agent matching with IP range verification. A User-Agent check alone can be defeated by spoofing, and an IP check alone can miss traffic when the published ranges change.

Cloudflare WAF configuration

Create a custom rule in the Cloudflare dashboard under Security → WAF → Custom Rules.

Block rule:

  • Condition 1: HTTP User-Agent contains PerplexityBot or Perplexity-User
  • Condition 2: Request IP is outside official IP range
  • Combine with AND: block Perplexity User-Agent from outside IP ranges

Allow rule — let legitimate Perplexity bots through:

  • Condition 1: User-Agent contains PerplexityBot or Perplexity-User
  • Condition 2: Request IP is inside official IP range
  • Action: Allow (bypass other security rules)
  • Priority: Higher than block rule
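The allow/block pair can be modeled as a small decision function, which helps when reasoning about edge cases before touching the dashboard. This is a Python sketch, not Cloudflare's rule language; `waf_decision`, the `full_block` flag, and the sample ranges (documentation address space) are hypothetical. With `full_block=True` it also blocks any request coming from the official ranges, in line with the "robots.txt + WAF + IP blocking" approach for full blocking.

```python
import ipaddress

PERPLEXITY_TOKENS = ("PerplexityBot", "Perplexity-User")

# HYPOTHETICAL sample ranges (documentation address space). In practice, load
# the CIDRs from perplexitybot.json and perplexity-user.json.
OFFICIAL_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def waf_decision(user_agent: str, client_ip: str, full_block: bool = False) -> str:
    """Model the allow/block rule pair.

    Returns "allow", "block", or "continue" (no Perplexity rule matched,
    so the request falls through to the remaining WAF rules).
    """
    addr = ipaddress.ip_address(client_ip)
    in_official_range = any(addr in net for net in OFFICIAL_RANGES)
    claims_perplexity = any(tok in user_agent for tok in PERPLEXITY_TOKENS)

    if full_block and in_official_range:
        # Full block: IP-range blocking catches Perplexity traffic
        # regardless of the User-Agent it sends.
        return "block"
    if not claims_perplexity:
        return "continue"
    # Allow rule (higher priority): Perplexity UA AND official IP.
    # Block rule: Perplexity UA AND IP outside the official ranges (spoofed).
    return "allow" if in_official_range else "block"


ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)")
print(waf_decision(ua, "192.0.2.7"))                   # allow
print(waf_decision(ua, "203.0.113.9"))                 # block (spoofed UA)
print(waf_decision(ua, "192.0.2.7", full_block=True))  # block
```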

Official IP range JSON:

  • PerplexityBot: https://www.perplexity.com/perplexitybot.json
  • Perplexity-User: https://www.perplexity.com/perplexity-user.json

AWS WAF configuration

Perplexity's official documentation describes a three-step setup.

Step 1: Create IP Sets

In AWS WAF console → IP sets, create separate IP Sets for each bot.

  • PerplexityBot-IPs: CIDR list from perplexitybot.json
  • PerplexityUser-IPs: CIDR list from perplexity-user.json
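An AWS WAF IP set holds addresses of a single IP version (IPV4 or IPV6), so if a range file contains both, each bot needs two sets (hypothetical names such as `PerplexityBot-IPs-v4` and `PerplexityBot-IPs-v6`). A minimal sketch that splits a range file accordingly, assuming a `{"prefixes": [...]}` layout with `ipv4Prefix`/`ipv6Prefix` keys (verify against the live file). The resulting lists are in the CIDR form expected by the `Addresses` parameter of boto3's `wafv2` `create_ip_set` and `update_ip_set`.

```python
import ipaddress
import json


def cidrs_by_version(raw_json: str) -> tuple:
    """Split published prefixes into (ipv4_cidrs, ipv6_cidrs) for AWS WAF IP sets.

    ASSUMPTION: {"prefixes": [{"ipv4Prefix": ...} | {"ipv6Prefix": ...}]} layout.
    Each returned list is sorted, de-duplicated, and in CIDR notation.
    """
    data = json.loads(raw_json)
    v4, v6 = set(), set()
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if not prefix:
            continue
        net = ipaddress.ip_network(prefix, strict=False)
        # str(net) always includes the prefix length, e.g. "192.0.2.1/32"
        (v4 if net.version == 4 else v6).add(str(net))
    return sorted(v4), sorted(v6)


# Documentation-range sample, NOT real Perplexity data
sample = ('{"prefixes": [{"ipv4Prefix": "198.51.100.0/24"}, '
          '{"ipv4Prefix": "192.0.2.1"}, {"ipv6Prefix": "2001:db8::/32"}]}')
ipv4_cidrs, ipv6_cidrs = cidrs_by_version(sample)
print(ipv4_cidrs)  # ['192.0.2.1/32', '198.51.100.0/24']
print(ipv6_cidrs)  # ['2001:db8::/32']
```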

Step 2: Configure User-Agent string conditions

Create conditions matching each bot's User-Agent header.

  • Header: user-agent
  • Match type: Contains string
  • Value: PerplexityBot (or Perplexity-User)

Step 3: Combine Rule + Web ACL

Add rules to the Web ACL that combine the IP set and User-Agent conditions with AND, and give allow rules a higher priority than block rules (in AWS WAF, a lower priority number is evaluated first).
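For scripted setups, the two rules can be expressed in the shape the AWS WAFv2 API uses for entries in a web ACL's `Rules` list (for example, when calling boto3's `wafv2` `update_web_acl`). The sketch below only builds the dictionaries and makes no AWS calls; the function name, rule names, and ARN are hypothetical placeholders. It assumes one IP set per bot; if IPv4 and IPv6 ranges live in separate sets, the two `IPSetReferenceStatement`s would need to be wrapped in an `OrStatement`.

```python
def perplexity_rules(ua_token: str, ip_set_arn: str, name_prefix: str,
                     base_priority: int = 0) -> list:
    """Build an allow rule and a spoofed-UA block rule for one Perplexity bot.

    The dicts follow the AWS WAFv2 Rule shape used in a web ACL's Rules list.
    ua_token, ip_set_arn and name_prefix are supplied by the caller.
    """
    ua_match = {
        "ByteMatchStatement": {
            "SearchString": ua_token.encode(),  # blob field: pass bytes to boto3
            "FieldToMatch": {"SingleHeader": {"Name": "user-agent"}},
            "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
            "PositionalConstraint": "CONTAINS",
        }
    }
    official_ip = {"IPSetReferenceStatement": {"ARN": ip_set_arn}}

    def visibility(metric: str) -> dict:
        return {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": metric,
        }

    allow_rule = {
        "Name": f"{name_prefix}-allow",
        # Lower number = evaluated first; Allow is a terminating action.
        "Priority": base_priority,
        "Statement": {"AndStatement": {"Statements": [ua_match, official_ip]}},
        "Action": {"Allow": {}},
        "VisibilityConfig": visibility(f"{name_prefix}-allow"),
    }
    block_rule = {
        "Name": f"{name_prefix}-block-spoofed",
        "Priority": base_priority + 1,
        "Statement": {"AndStatement": {"Statements": [
            ua_match,
            {"NotStatement": {"Statement": official_ip}},
        ]}},
        "Action": {"Block": {}},
        "VisibilityConfig": visibility(f"{name_prefix}-block-spoofed"),
    }
    return [allow_rule, block_rule]


# Placeholder ARN, for illustration only
rules = perplexity_rules(
    "PerplexityBot",
    "arn:aws:wafv2:us-east-1:123456789012:regional/ipset/PerplexityBot-IPs/EXAMPLE",
    "perplexitybot",
)
print([(r["Name"], r["Priority"]) for r in rules])
# [('perplexitybot-allow', 0), ('perplexitybot-block-spoofed', 1)]
```

Priorities must be unique within a web ACL, so each bot needs its own `base_priority` (for example 0 for PerplexityBot and 2 for Perplexity-User).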

Automatic IP range updates

Perplexity may update its IP ranges frequently, and manually maintained lists leave gaps once they fall out of date. A script that periodically polls the official JSON endpoints and updates the IP sets automatically is recommended.

# Check current IP ranges (-L follows the 302 redirect to perplexity.ai)
curl -sL https://www.perplexity.com/perplexitybot.json
curl -sL https://www.perplexity.com/perplexity-user.json
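A minimal polling sketch, assuming a `{"prefixes": [{"ipv4Prefix"|"ipv6Prefix": ...}]}` layout: it fetches the final `perplexity.ai` URLs, normalizes the prefixes, and computes what changed. AWS WAF's `update_ip_set` replaces the whole address list, so the diff is mainly useful for deciding whether to update and for logging. Refusing an empty list guards against wiping an IP set after a failed download or a format change. All function names are hypothetical; the script would typically run from cron or another scheduler.

```python
import ipaddress
import json
import urllib.request

# Final (post-redirect) URLs. urlopen would also follow the 302 from
# www.perplexity.com on its own.
SOURCES = {
    "PerplexityBot": "https://www.perplexity.ai/perplexitybot.json",
    "Perplexity-User": "https://www.perplexity.ai/perplexity-user.json",
}


def normalize(data: dict) -> set:
    """Extract prefixes as canonical CIDR strings.

    ASSUMPTION: {"prefixes": [{"ipv4Prefix"|"ipv6Prefix": ...}]} layout.
    """
    prefixes = set()
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            prefixes.add(str(ipaddress.ip_network(prefix, strict=False)))
    return prefixes


def fetch_prefixes(url: str, timeout: float = 10.0) -> set:
    """Download one range file (HTTP redirects are followed by default)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return normalize(json.load(resp))


def diff_ranges(current: set, latest: set) -> tuple:
    """Return (added, removed) CIDRs; refuse an empty update."""
    if not latest:
        # A failed download or a format change must not wipe the IP set.
        raise ValueError("latest range list is empty; not updating")
    return sorted(latest - current), sorted(current - latest)


# Offline example with documentation-range data:
current = {"192.0.2.0/24", "198.51.100.0/24"}
latest = normalize({"prefixes": [{"ipv4Prefix": "192.0.2.0/24"},
                                 {"ipv4Prefix": "203.0.113.0/24"}]})
print(diff_ranges(current, latest))  # (['203.0.113.0/24'], ['198.51.100.0/24'])
# Live use: latest = fetch_prefixes(SOURCES["PerplexityBot"])
```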

Recommended scenarios

Want Perplexity search exposure: use Scenario A. PerplexityBot must be able to crawl your site for it to appear in Perplexity search results.

Block training, allow citations: Perplexity officially states that PerplexityBot data is not used for AI training, so no separate training-specific block is needed. Keep PerplexityBot allowed to stay in the search index.

Full block: Apply robots.txt + WAF + IP blocking together. robots.txt alone does not guarantee Perplexity-User blocking.


Verification — identifying suspicious traffic

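# Filter Perplexity bots in server logs (nginx "combined" format)
# Fields printed: client IP ($1), timestamp ($4), path ($7), status ($9)
grep -iE "PerplexityBot|Perplexity-User" /var/log/nginx/access.log \
  | awk '{print $1, $4, $7, $9}' | tail -50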

# If Perplexity-User traffic appears after robots.txt blocking
# consider adding WAF configuration
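The grep above finds requests that claim to be Perplexity; the next question is whether they came from the published ranges. A Python sketch that parses nginx's default `combined` log format and reports Perplexity-UA requests from outside the ranges. The regex assumes the stock `combined` format (a custom `log_format` needs a different pattern), and the sample range is documentation address space. Both official lists (PerplexityBot and Perplexity-User) should go into `OFFICIAL_RANGES`, otherwise legitimate Perplexity-User traffic would be flagged.

```python
import ipaddress
import re

# nginx "combined": IP - user [time] "request" status bytes "referer" "UA"
_COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)
_PERPLEXITY_UA = re.compile(r"PerplexityBot|Perplexity-User", re.IGNORECASE)

# HYPOTHETICAL documentation range; substitute both official lists.
OFFICIAL_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def suspicious_perplexity_hits(lines):
    """Yield (ip, request, ua) for Perplexity-UA requests from outside the ranges."""
    for line in lines:
        m = _COMBINED.match(line)
        if not m or not _PERPLEXITY_UA.search(m["ua"]):
            continue
        try:
            addr = ipaddress.ip_address(m["ip"])
        except ValueError:
            continue  # malformed IP field
        if not any(addr in net for net in OFFICIAL_RANGES):
            yield m["ip"], m["request"], m["ua"]


sample_log = [
    '192.0.2.15 - - [10/Jun/2026:12:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"',
    '203.0.113.9 - - [10/Jun/2026:12:00:01 +0000] "GET /b HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)"',
    '203.0.113.10 - - [10/Jun/2026:12:00:02 +0000] "GET /c HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"',
]
for ip, request, _ua in suspicious_perplexity_hits(sample_log):
    print(ip, request)  # 203.0.113.9 GET /b HTTP/1.1
# Real logs: open("/var/log/nginx/access.log", encoding="utf-8", errors="replace")
```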

Frequently asked questions

Q. If I block PerplexityBot, will I stop appearing in Perplexity answers?
A. You will be excluded from the Perplexity search index. However, Perplexity-User ignores robots.txt, so if users include your URL directly in a question, access may still occur.

Q. Did Perplexity really change after the 2024 controversy?
A. Perplexity's official documentation states that PerplexityBot complies with robots.txt. However, Perplexity-User is still officially documented as "generally ignoring robots.txt," so WAF configuration is required for full blocking.

Q. Is data collected by PerplexityBot used for AI training?
A. Official documentation states both PerplexityBot and Perplexity-User are "not used for AI foundation model training."

Q. How do I avoid appearing in Perplexity's source list?
A. Blocking PerplexityBot reduces the likelihood of exposure in the source list by excluding you from the search index. However, cases where users include URLs directly in questions are difficult to control.



Related pages

📘Concept · Pillar
Complete Guide to Anthropic Bots (ClaudeBot · Claude-User · Claude-SearchBot)
Anthropic operates three bots for training (ClaudeBot), user browsing (Claude-User), and search indexing (Claude-SearchBot), each controllable independently via robots.txt; Anthropic officially commits to honoring robots.txt.
📙How-to
llms.txt Writing Guide
llms.txt is a markdown-format metadata file, placed at the site root (/) as an AI-friendly site guide, that helps LLMs understand site content efficiently.
📘Concept · Pillar
Complete Guide to OpenAI Bots (GPTBot · ChatGPT-User · OAI-SearchBot · OAI-AdsBot)
OpenAI operates four purpose-specific bots for training (GPTBot), user browsing (ChatGPT-User), search indexing (OAI-SearchBot), and ad verification (OAI-AdsBot), each of which can be controlled independently via robots.txt.
📙How-to · Pillar
AI Citation Tracking Methodology
AI Citation Tracking is a methodology for systematically measuring how often and in what context AI answer engines such as ChatGPT, Perplexity, Claude, and Gemini cite your content — the foundational infrastructure for validating AEO and GEO performance.
📙How-to
Perplexity Citation Optimization
Perplexity citation optimization is the work of securing citations from a real-time web search-based AI.
📙How-to
How to Allow AI Bots in robots.txt
Allowing AI bots means explicitly permitting major AI crawlers such as GPTBot, ClaudeBot, and PerplexityBot to access your site in robots.txt, exposing your content for citation in generative AI answers.
