Complete Guide to OpenAI Bots (GPTBot · ChatGPT-User · OAI-SearchBot · OAI-AdsBot)

What are OpenAI bots

OpenAI does not run a single bot; it operates four separate crawlers, one per purpose. Instead of handling all collection with one User-Agent, it assigns training, citation (user browsing), search indexing, and ad verification to different bots. The key advantage of this structure is that you can block specific bots selectively in robots.txt.


TL;DR

You must distinguish GPTBot (training), ChatGPT-User (user browsing), OAI-SearchBot (ChatGPT Search index), and OAI-AdsBot (ad verification). If you want to block training but allow AI answer citations, the recommended setup is to block only GPTBot and allow the rest.


Bot identification information

The information below is from OpenAI's official documentation (developers.openai.com/api/docs/bots, verified June 2026).

| Bot Name | robots.txt Key | Primary Use | IP Range Published |
|---|---|---|---|
| GPTBot | GPTBot | AI model training data collection | openai.com/gptbot.json |
| ChatGPT-User | ChatGPT-User | When users use ChatGPT browsing features | openai.com/chatgpt-user.json |
| OAI-SearchBot | OAI-SearchBot | ChatGPT Search index building | openai.com/searchbot.json |
| OAI-AdsBot | OAI-AdsBot | ChatGPT ad safety verification (not used for training) | (not listed) |

User-Agent strings (official documentation)

# GPTBot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; +https://openai.com/gptbot

# ChatGPT-User
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot

# OAI-SearchBot
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.3; +https://openai.com/searchbot

# OAI-AdsBot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-AdsBot/1.0; +https://openai.com/adsbot

⚠️ Warning: User-Agent version numbers (e.g., /1.3) may change. When filtering server logs, it is safer to match on the bot name alone (for example, GPTBot) without the version number.
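
The version-agnostic matching suggested in the warning can be sketched in Python as follows. This is a minimal illustration rather than an official detection method: the function name `identify_openai_bot` is hypothetical, and the check is a plain case-insensitive substring match on the four bot names listed above.

```python
# Bot names from the identification table; version numbers are deliberately ignored.
OPENAI_BOT_TOKENS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "OAI-AdsBot")


def identify_openai_bot(user_agent):
    """Return the OpenAI bot name found in a User-Agent string, or None.

    Hypothetical helper: matches on the bot name only, so a change from
    /1.3 to another version does not affect the result.
    """
    lowered = user_agent.lower()
    for token in OPENAI_BOT_TOKENS:
        if token.lower() in lowered:
            return token
    return None


if __name__ == "__main__":
    ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); "
          "compatible; GPTBot/1.3; +https://openai.com/gptbot")
    print(identify_openai_bot(ua))
```

Because a User-Agent header can be spoofed by any client, a match here only suggests that a request claims to come from one of these bots.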


How each bot works

GPTBot — for training

GPTBot collects web data for OpenAI's AI model training. Collected content may be used for pre-training or fine-tuning of GPT-family models. Blocking via robots.txt stops future training data collection but does not affect data already collected.

ChatGPT-User — user browsing

ChatGPT-User fetches a page when a ChatGPT user enters its URL or uses a browsing feature that needs that page. OpenAI's official documentation states that "because this bot's visits are user-initiated, robots.txt rules may not apply." Because these fetches supply the pages ChatGPT reads while answering, this bot is directly connected to ChatGPT answer citations.

OAI-SearchBot — ChatGPT Search index

OAI-SearchBot builds the search index for ChatGPT's web search feature (ChatGPT Search). Allowing or blocking it controls whether your site is included in OpenAI's own index, which is separate from the Bing index.

OAI-AdsBot — ad verification

OAI-AdsBot verifies the safety of pages that advertisers register as ChatGPT ads. The data it collects is not used for model training.


Three robots.txt examples

Scenario A. Full allow (default — no action needed)

# No separate configuration required. All OpenAI bots operate under default policy.

Scenario B. Block training only, allow answer citations and search (recommended for SMBs)

# GPTBot: block training data collection
User-agent: GPTBot
Disallow: /

# ChatGPT-User, OAI-SearchBot, OAI-AdsBot are allowed (default)
# → Maintains ChatGPT answer citations and ChatGPT Search exposure

Scenario C. Full block

# Block all OpenAI bots
# ChatGPT answer citations and ChatGPT Search exposure may also disappear

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: OAI-AdsBot
Disallow: /
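
Before deploying one of these files, it can help to check how a standard robots.txt parser reads it. The sketch below uses Python's built-in `urllib.robotparser` for that check. The helper name `allowed_bots` and the URL `https://example.com/any-page` are placeholders chosen for illustration, and the result reflects generic robots.txt semantics rather than how OpenAI's crawlers actually behave.

```python
from urllib.robotparser import RobotFileParser

OPENAI_BOTS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "OAI-AdsBot")

# Scenario B from above: block training only.
SCENARIO_B = """\
User-agent: GPTBot
Disallow: /
"""

# Scenario C from above: one blocking group per bot.
SCENARIO_C = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: OAI-AdsBot
Disallow: /
"""


def allowed_bots(robots_txt, url="https://example.com/any-page"):
    """Map each OpenAI bot name to True (allowed) or False (blocked) for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in OPENAI_BOTS}


if __name__ == "__main__":
    print("Scenario B:", allowed_bots(SCENARIO_B))
    print("Scenario C:", allowed_bots(SCENARIO_C))
```

Under Scenario B only GPTBot should come back as blocked, and under Scenario C all four should. Passing the bot name rather than the full User-Agent string matters here, because this parser matches on the product token.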

Recommended scenarios (SMB baseline)

General SMBs (cafes, clinics, agencies, etc.): Scenario B recommended. Minimize training data provision while maintaining opportunities for exposure in ChatGPT answers.

Content asset businesses (media, education, publishing): Choose Scenario C if unauthorized training on your content is a serious concern. Note that exposure in ChatGPT answers and ChatGPT Search may also disappear.

Maximum AI exposure strategy: Scenario A (full allow). Making content available as AI training data may increase the long-term likelihood of being cited as an authoritative source in AI answers.
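
If you manage several sites, the three scenarios can be generated rather than copied by hand. The following is a small sketch under that assumption. The `SCENARIOS` mapping and the `build_robots_txt` function are hypothetical names, and the output contains only the OpenAI-related groups, so any existing rules in your robots.txt would need to be merged separately.

```python
OPENAI_BOTS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "OAI-AdsBot")

# Which bots each scenario blocks (letters follow the scenarios above).
SCENARIOS = {
    "A": (),            # full allow: no groups emitted
    "B": ("GPTBot",),   # block training only
    "C": OPENAI_BOTS,   # full block
}


def build_robots_txt(scenario):
    """Return the robots.txt groups for scenario 'A', 'B', or 'C'."""
    blocked = SCENARIOS[scenario]
    groups = ["User-agent: {}\nDisallow: /".format(bot) for bot in blocked]
    if not groups:
        return ""
    # Groups are separated by a blank line, as in the examples above.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt("B"))
```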


Verification — checking bot traffic in server logs

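# Filter OpenAI bot requests in the Nginx access.log (combined log format assumed)
# Splitting on double quotes gives: $1 = client IP and timestamp, $2 = request line, $6 = full User-Agent
grep -iE "GPTBot|ChatGPT-User|OAI-SearchBot|OAI-AdsBot" /var/log/nginx/access.log \
  | awk -F'"' '{print $1, $2, $6}' \
  | tail -50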

# Check bot IP ranges (published JSON files)
# curl https://openai.com/gptbot.json
# curl https://openai.com/chatgpt-user.json
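
For a per-bot summary instead of raw lines, the same log can be tallied in Python. This sketch assumes the Nginx "combined" log format; the regular expression `LOG_LINE` and the function `count_bot_hits` are illustrative names, and the log line in the usage block is made-up sample data using a documentation IP address.

```python
import re
from collections import Counter

OPENAI_BOTS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "OAI-AdsBot")

# Combined format: IP - user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def count_bot_hits(lines):
    """Count requests per (bot name, path); lines that do not parse are skipped."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        user_agent = match.group("ua").lower()
        for bot in OPENAI_BOTS:
            if bot.lower() in user_agent:
                hits[(bot, match.group("path"))] += 1
                break
    return hits


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [10/Jun/2026:12:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" '
        '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; '
        '+https://openai.com/gptbot"',
    ]
    for (bot, path), count in count_bot_hits(sample).items():
        print(bot, path, count)
```

To run it against a real file, pass an open file object, for example `count_bot_hits(open("/var/log/nginx/access.log"))`.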

Frequently asked questions

Q. If I block GPTBot, will I stop appearing in ChatGPT answers?
A. Not necessarily. ChatGPT answer citations are primarily handled by ChatGPT-User and OAI-SearchBot. Blocking GPTBot only stops training data provision; answer citation channels remain open. To block answer citations as well, you must also block ChatGPT-User and OAI-SearchBot.

Q. How long after changing robots.txt does it take effect?
A. GPTBot typically recognizes updated robots.txt within days to weeks. ChatGPT-User may apply immediately since it operates in real time on user requests. OpenAI does not officially specify exact timing.

Q. Is it true that ChatGPT-User ignores robots.txt?
A. OpenAI's official documentation states that "because ChatGPT-User visits are user-initiated, robots.txt rules may not apply." In other words, full blocking via robots.txt is not guaranteed.

Q. If the version number in the User-Agent string changes, will blocking stop working?
A. robots.txt matches on the bot name (GPTBot, ChatGPT-User, etc.), not the full User-Agent string. Blocking remains in effect as long as the bot name is the same, even if the version number changes.

Q. What's the difference between IP range blocking and robots.txt blocking?
A. robots.txt blocking is a "policy notice," and whether bots respect it depends on operator policy. IP range blocking physically rejects requests at the server level. IP blocking is stronger but requires maintenance when OpenAI changes IP ranges. Using both methods together is most reliable.
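
The IP-range approach mentioned in the last answer can be sketched with Python's standard `ipaddress` module. The function `ip_in_ranges` is a hypothetical helper, and the ranges in the usage block are reserved documentation prefixes, not OpenAI's real ranges. Extracting the actual CIDR prefixes from the published JSON files is left out, because the layout of those files is not described here.

```python
import ipaddress


def ip_in_ranges(ip, cidr_ranges):
    """Return True if ip falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    for cidr in cidr_ranges:
        network = ipaddress.ip_network(cidr, strict=False)
        # Only compare addresses and networks of the same IP version.
        if address.version == network.version and address in network:
            return True
    return False


if __name__ == "__main__":
    # Placeholder ranges (reserved for documentation), standing in for the
    # prefixes you would load from the published JSON files.
    placeholder_ranges = ["192.0.2.0/24", "2001:db8::/32"]
    print(ip_in_ranges("192.0.2.10", placeholder_ranges))
    print(ip_in_ranges("198.51.100.1", placeholder_ranges))
```

A check like this can confirm that a request claiming to be GPTBot in its User-Agent really originates from a published range before you trust or block it.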

