Wikipedia Entity Registration Guide

How Wikipedia Affects LLMs

“When we ask AI about our company, it gives wrong information—or does not know us at all.” One direct fix for this problem is a Wikipedia entry.

Wikipedia is one of the most widely used LLM training sources. It is listed in the published pre-training mixes of models such as GPT-3 and LLaMA, where it is sampled more heavily than its raw size would suggest, and it is widely assumed to be in the corpora behind Claude, Gemini, and other models whose training data is not fully disclosed. A brand with a Wikipedia article is therefore far more likely to be “known” to an LLM.

Wikipedia articles link to Wikidata, and Wikidata is one of the data sources for Google’s Knowledge Graph. Once an entity is in the Knowledge Graph, it can also influence entity recognition in Google AI Overviews.

Wikipedia vs. Wikidata

Aspect      | Wikipedia                                    | Wikidata
Format      | Encyclopedia article (prose)                 | Structured data (property–value pairs)
Authoring   | Editors write prose                          | Direct property value entry
LLM impact  | Text training → brand knowledge              | Structured fact extraction
Eligibility | Strict (GNG notability required)             | Relatively lenient
Scope       | Each language edition operates independently | Single language-neutral database

They are separate projects but linked. Wikipedia articles connect to Wikidata QIDs; conversely, a Wikidata item can exist without a Wikipedia article.
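
To illustrate that link, the sketch below builds a Wikidata API request that looks up the item attached to an English Wikipedia title and reads the QID out of the response. It assumes the public `wbgetentities` endpoint and its usual response layout. The function names and the two sample responses are made up for the example, and no network call is made.

```python
from typing import Optional
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_lookup_url(title: str, site: str = "enwiki") -> str:
    """Build a wbgetentities URL that resolves a Wikipedia title to its Wikidata item."""
    params = {
        "action": "wbgetentities",
        "sites": site,        # which wiki the title belongs to
        "titles": title,
        "props": "sitelinks",
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def extract_qid(response: dict) -> Optional[str]:
    """Return the first QID in a response, or None when no item is linked."""
    for entity_id, entity in response.get("entities", {}).items():
        if entity_id.startswith("Q") and "missing" not in entity:
            return entity_id
    return None


# Hand-written sample responses (assumed layout, not captured API output).
linked = {"entities": {"Q42": {"id": "Q42", "sitelinks": {"enwiki": {"title": "Douglas Adams"}}}}}
unlinked = {"entities": {"-1": {"site": "enwiki", "title": "Example Co", "missing": ""}}}

print(build_lookup_url("Douglas Adams"))
print(extract_qid(linked))
print(extract_qid(unlinked))
```

A title with no linked item comes back without a QID, which is the case for a company that has neither a Wikipedia article nor a Wikidata item connected to it.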

Notability Requirements

Wikipedia assesses eligibility through the General Notability Guideline (GNG); for companies, reviewers also apply the stricter organization-specific guideline (WP:NCORP). The GNG standard is:

"A topic is presumed to be suitable for a stand-alone article or list when it has received significant coverage in reliable sources that are independent of the subject."

Breaking down the core requirements:

Significant Coverage

  • Coverage that directly addresses the subject, not mere mentions
  • Substantial reporting with detail (articles that only republish press releases do not count)
  • Multiple independent sources (no fixed minimum count; quality matters)

Independent Sources

  • Company press releases, official websites, and autobiographies do not count
  • Third-party sources with no direct interest in the company
  • Media within the same corporate group count as a single source

Reliable Sources

  • Publications with editorial oversight
  • Mainstream news, academic publishing, verified industry media
  • Online or offline, any language

Note: If these criteria are not met, even a published article may face an AfD (Articles for Deletion) nomination. Verify eligibility before attempting publication.

Five-Step Registration Process

Step 1: Verify notability yourself

Before attempting publication, confirm whether your company meets GNG. Checklist:

  • As a rule of thumb, are there at least three substantive reports in mainstream media independent of your company?
  • Is each report more than a republished press release?
  • Does the coverage directly address the company (not just a passing mention)?

If you fall short, postpone publication and build media coverage through PR first.
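
The checklist can be turned into a rough self-audit. The sketch below is one way to do that, assuming you record each article by hand. The `Source` fields, the outlet names, and the threshold of three are illustrative choices taken from the checklist, not Wikipedia rules. Outlets owned by the same corporate group are counted once, as described under Independent Sources.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Source:
    outlet: str
    owner_group: str       # corporate group that owns the outlet
    independent: bool      # no direct interest in the company
    substantive: bool      # more than a republished press release
    directly_about: bool   # addresses the company, not a passing mention


MIN_QUALIFYING = 3  # rule-of-thumb threshold from the checklist (an assumption)


def qualifying_groups(sources: List[Source]) -> Set[str]:
    """Owner groups with at least one source that passes all three checks."""
    return {
        s.owner_group
        for s in sources
        if s.independent and s.substantive and s.directly_about
    }


def ready_for_draft(sources: List[Source]) -> bool:
    """True when enough distinct owner groups have qualifying coverage."""
    return len(qualifying_groups(sources)) >= MIN_QUALIFYING


# Hypothetical coverage log.
coverage = [
    Source("Daily A", "Group A", True, True, True),
    Source("Weekly A", "Group A", True, True, True),       # same group, counted once
    Source("Trade B", "Group B", True, True, True),
    Source("Company blog", "Example Co", False, True, True),  # not independent
    Source("News C", "Group C", True, False, True),           # press release reprint
]

print(sorted(qualifying_groups(coverage)))
print(ready_for_draft(coverage))
```

In this log only two owner groups qualify, so the audit says to keep building coverage before drafting. The check is a planning aid only; reviewers judge source quality case by case.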

Step 2: Register on Wikidata first

Wikidata has more lenient entry criteria than Wikipedia. Create a company item on Wikidata first and enter basic properties (company name, founding year, location, website, industry). Wikidata registration is possible regardless of GNG status.
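
Before opening the Wikidata editor, it can help to collect the basic values in one place. The sketch below maps the fields listed above to Wikidata property IDs and reports which ones are still empty. The company name goes in the item label rather than a property. The field names, the helper functions, and the draft values are hypothetical; check each property ID against Wikidata before use.

```python
# Wikidata property IDs for the basic fields (verify on Wikidata before use).
PROPERTY_IDS = {
    "instance_of": "P31",
    "inception": "P571",               # founding year
    "headquarters_location": "P159",
    "official_website": "P856",
    "industry": "P452",
}


def missing_fields(draft: dict) -> list:
    """Fields from PROPERTY_IDS that the draft leaves empty."""
    return [field for field in PROPERTY_IDS if not draft.get(field)]


def to_statements(draft: dict) -> dict:
    """Map filled-in fields to {property ID: value}."""
    return {
        PROPERTY_IDS[field]: value
        for field, value in draft.items()
        if field in PROPERTY_IDS and value
    }


# Hypothetical draft for a made-up company.
draft = {
    "label": "Example Co",                  # item label, not a property
    "instance_of": "Q4830453",              # assumed to be the "business" item
    "inception": "2019",
    "official_website": "https://example.com",
    "headquarters_location": "",            # QID of the city, still to look up
    "industry": "",                         # QID of the industry, still to look up
}

print(missing_fields(draft))
print(to_statements(draft))
```

Location and industry take QIDs of existing items rather than free text, which is why they are left empty here until looked up.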

Step 3: Draft the English Wikipedia article (Draft space)

Wikipedia allows drafting in the Draft namespace (Draft:CompanyName). At draft stage, you can receive feedback from the editor community.

Drafting guidelines:

  • No promotional language (“industry-leading,” “innovative,” etc.)
  • Footnote every fact with a citation
  • Maintain neutral point of view (NPOV)
  • Prefer third-party sources over official company channels
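
Two of these guidelines can be checked mechanically before a human review. The sketch below flags promotional wording and paragraphs with no `<ref>` tag in a wikitext draft. It is a coarse first pass under stated assumptions: the term list extends the two examples above with extra terms chosen for illustration, and a paragraph-level check cannot confirm that every individual fact is cited.

```python
import re

# "industry-leading" and "innovative" come from the guideline above;
# the other terms are illustrative additions.
PROMOTIONAL_TERMS = ["industry-leading", "innovative", "world-class", "cutting-edge"]


def promotional_hits(text: str) -> list:
    """Promotional terms that appear in the text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in PROMOTIONAL_TERMS if term in lowered]


def uncited_paragraphs(wikitext: str) -> list:
    """Paragraphs (separated by blank lines) that contain no <ref> tag."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", wikitext) if p.strip()]
    return [p for p in paragraphs if "<ref" not in p]


# Hypothetical draft text.
sample_draft = (
    "Example Co is a software company founded in 2019."
    "<ref>Hypothetical newspaper, 2020</ref>\n\n"
    "It is an innovative, industry-leading provider of analytics tools."
)

print(promotional_hits(sample_draft))
print(uncited_paragraphs(sample_draft))
```

Here the second paragraph is flagged twice, once for wording and once for the missing citation, so it would need a rewrite and a third-party source before submission.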

Step 4: Submit via AfC (Articles for Creation)

After drafting, request formal article creation through Wikipedia’s AfC process. Review typically takes weeks to months. Reviewers assess GNG compliance; submissions that fail are declined.

Step 5: Separate registration for other language editions

English Wikipedia and other language editions (e.g., Korean Wikipedia) are separate projects. After English publication, create other language editions separately. Smaller language editions may have fewer editors and different review timelines.

COI (Conflict of Interest) When Self-Registering

When employees or affiliates write or edit their own company’s article, Wikipedia treats it as a conflict of interest (COI).

Risks:

  • The article may be judged promotional and deleted immediately
  • Once marked as a COI editor, all subsequent edits may be scrutinized
  • Aggressive publication attempts can trigger AfD nominations

Recommended approach:

  • Disclose the COI on your user page and on the article’s Talk page (mandatory under Wikipedia policy when you are paid or employed by the subject)
  • Request independent editor review via AfC
  • If using external Wikipedia specialists (including agencies), verify COI guideline compliance

Wikipedia does not ban COI editing outright, but it strongly discourages direct edits to the article and requires disclosure. Hiding the relationship while editing is the greater problem.

Alternatives When Wikipedia Is Not Yet Viable

If notability criteria are not met, immediate alternatives include:

  • Wikidata: Basic entity registration without GNG
  • Crunchbase: Standard database for startups and tech companies
  • AngelList (Wellfound): Startup investment information platform
  • LinkedIn company page: A public profile that search engines index and that may feed Google’s Knowledge Graph
  • G2, Capterra: SaaS product review platforms that tend to rank well in search results
  • Region-specific: Local startup databases, crowdfunding project pages, official corporate filings (usable as reliable sources where applicable)

Among these, Wikidata is the most directly connected to Wikipedia and the Knowledge Graph, and its data may also enter LLM training corpora, so registration is recommended even before Wikipedia publication.

Applying This in Different Markets

English Wikipedia’s coverage of companies varies by region. In many markets, only large enterprises and a few unicorns have articles, so a successfully published article can set a company apart in what global LLMs know about that market.

Examples of reliable sources (media commonly accepted as Wikipedia citations):

  • Major national newspapers and business press in your market
  • Industry and technology trade publications with editorial oversight
  • Official records: securities filings, patent offices, competition authorities (useful for verifying facts; as primary sources they do not establish notability on their own)
  • Broadcast and major online news outlets with editorial standards

Language strategy: English Wikipedia has the largest impact on global LLMs (GPT, Claude, Gemini). A non-English edition on its own has limited influence on English-centric LLM training data.

When LLM Answers Reflect Publication

After a Wikipedia article is created, how quickly it appears in LLM answers depends on each model’s training cutoff and retraining cycle. This can take months to more than a year.

Systems that use real-time web search, such as ChatGPT Search and Perplexity, can cite Wikipedia directly and may reflect new entries faster. Answers drawn only from training data change when a model trained on newer data is released.

Frequently Asked Questions

Can a small startup qualify?
Size is not the criterion; media coverage is. A 10-person startup with multiple independent reports in major outlets can meet GNG. Conversely, a large company known only in one country may struggle on English Wikipedia: sources in any language are acceptable, but reviewers can find non-English coverage harder to assess.

What if we cannot write in English?
Contributing to English Wikipedia requires English proficiency. External specialist editors or agencies are an option, but verify that they comply with the COI guidelines. Be wary of agencies promising “guaranteed publication”: no agency can guarantee an outcome, because AfC decisions are made by the Wikipedia community.

How long does publication take?
AfC review time depends on reviewer availability and queue size; English Wikipedia typically takes weeks to months. Other language editions may differ based on editor availability.

How long until AI reflects the entry?
Real-time search AI (Perplexity, ChatGPT Search, etc.) may reflect changes within days to weeks. Training-data-based LLM answers update at the next retraining cycle, usually months or longer.

Should we start with Wikidata or Wikipedia?
Start with Wikidata. Basic Wikidata registration does not require GNG, and you can link the item to a Wikipedia article later. While you build media coverage for Wikipedia, the Wikidata item can already enter LLM training data.

Related Items

  • AI Share of Voice: AI Share of Voice (AI SOV) is the proportion of brand citations within AI answers for a specific category or query pool, extending Les Binet's Share of Search concept to AI answer engine environments.
  • AI Visibility Score: AI Visibility Score quantifies how much a specific brand is exposed and cited in AI answer engines like ChatGPT, Perplexity, Gemini, and Naver Cue. It is a core KPI measuring brand digital asset value in the AI search era.
  • What Is Domain Authority (DA/DR)?: Domain authority is a site link trust score calculated by Moz, Ahrefs, and Semrush, not an official Google metric.
  • What Are Backlinks?: A backlink is when an external site links to your page, a trust signal for search engines and AI.
  • Entity SEO: From Keywords to Concepts in Search: Entity SEO is an optimization strategy that helps Google recognize your site and content as real-world entities rather than isolated keywords, so you become a trusted presence in AI-based search and the Knowledge Graph.
  • GEO Master Guide: 5-Area Checklist: An execution guide for Generative AI Optimization covering GEO's five areas: content, structure, technical, off-site, and measurement.
  • Google Knowledge Graph: The Core of Entity-Based Search: The Google Knowledge Graph is Google's large-scale knowledge database that stores real-world entities such as people, places, objects, and concepts and their relationships, serving as core infrastructure for AI-based search and GEO.
  • What Is AEO?: AEO is the practice of optimizing content so AI answer engines cite it.
  • What Is GEO?: GEO is the practice of optimizing content so generative AI cites it in answers.
  • E-E-A-T: E-E-A-T is the framework Google uses to evaluate content quality through Experience, Expertise, Authoritativeness, and Trustworthiness.
  • YMYL (Your Money Your Life): YMYL (Your Money Your Life) is a content category that can affect users' money, health, safety, and life. It is a high-risk area where Google applies E-E-A-T most strictly.
  • Korean LLM Optimization: Korean LLM optimization is the work of optimizing content so global AI answer engines cite your content when answering Korean-language questions. Because Korean represents a smaller share of training data than English, it presents both higher barriers and distinct opportunities compared with English AEO.
  • Google Business Profile: Google Business Profile (GBP) is Google's free business listing tool. It is core local SEO infrastructure for exposing company information in local search, Google Maps, and Google AI Overviews, and a direct signal of entity authority.
  • Local SEO: Local SEO is the SEO subfield that optimizes for business visibility in specific geographic areas, with five core signals: Google Business Profile, NAP consistency, local keywords, local backlinks, and reviews.
  • Mental Availability: Mental Availability is the probability that a brand comes to mind in a purchase situation.