📘Concept

Noindex

Definition

noindex is an indexing control directive that instructs search engines not to add a web page to their search index. It is delivered via a robots meta tag or an HTTP response header.

noindex separates crawling from indexing. Bots still visit the page (crawling), read the noindex directive, and then keep the page out of search results. If robots.txt blocks access to the page entirely, bots never read the noindex directive, so the outcome differs. See Crawling vs Indexing for details.


Summary

noindex essentials: ①<meta name="robots" content="noindex"/> → insert in <head> → ②Bots crawl but exclude from indexing → ③Blocking with robots.txt prevents reading noindex, so it has no effect → ④Suitable targets: thank-you pages, login, internal search results, parameter pages → ⑤For complete page deletion, 410 response code is more reliable.


7 Suitable Targets for noindex

1. Thank-You and Confirmation Pages

Post-transaction pages such as payment confirmation and form submission completion. They offer nothing useful to someone arriving from search, so there is no reason for them to appear in results.

2. Login and Registration Pages

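Login and registration pages for authentication-required services have no value to unauthenticated visitors arriving from search. noindex keeps them out of results. Because bots must still crawl a page to read the directive, noindex does not by itself save crawl budget, although crawlers may revisit long-term noindex pages less often.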

3. Internal Search Result Pages

On-site search result pages in the form /search?q=keyword. Infinite URL combinations exhaust crawl budget, and Google may evaluate such pages as low-quality auto-generated pages.

4. URL Parameter Duplicate Pages

Parameter variant URLs that continue to be indexed despite canonical tag handling can have noindex applied instead. See URL Parameters for details.

5. Thin Content Pages

Hundreds of thin listing pages created by category filters, tag archive pages, and similar templates. Apply noindex to pages with no unique value to improve the overall quality of what the site has indexed. See Content Pruning for details.

6. Staging and Test Environments

Apply noindex to test servers like staging.example.com before production deployment to prevent accidental Google indexing.

7. Personal and Internal Documents

Internal documents that should not be public but are accidentally crawlable. Authentication protection takes priority; noindex is a secondary measure.


noindex Implementation Methods

HTML Meta Tag (Most Common)

<head>
  <meta name="robots" content="noindex" />
</head>

To also stop bots from following the links on the page (this does not block crawling of the page itself):

<meta name="robots" content="noindex, nofollow" />

Control specific bots only:

<meta name="googlebot" content="noindex" />

HTTP Header (For Non-HTML Resources)

PDF, images, JavaScript files, etc.:

X-Robots-Tag: noindex
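
As an illustration of sending this header from application code, the following is a minimal sketch of a WSGI wrapper that appends X-Robots-Tag: noindex to responses for selected file types. The extension list, the add_x_robots_tag name, and demo_app are assumptions made for this example; in practice the header is often set in the web server or CDN configuration instead.

```python
from typing import Callable, Iterable

# Hypothetical list of extensions treated as non-HTML resources to keep out of the index.
NOINDEX_EXTENSIONS = (".pdf", ".png", ".jpg", ".js")


def add_x_robots_tag(app: Callable) -> Callable:
    """Wrap a WSGI app so responses for matching paths carry X-Robots-Tag: noindex."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "").lower()

        def patched_start_response(status, headers, exc_info=None):
            # Append the header only for the configured file types.
            if path.endswith(NOINDEX_EXTENSIONS):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in application that returns an empty 200 response."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b""]
```

Wrapping an existing application as add_x_robots_tag(demo_app) leaves HTML responses untouched, so the meta tag remains the mechanism for regular pages.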

noindex vs nofollow Difference

  • noindex: Excludes this page from the index. Links on the page are still followed.
  • nofollow: Does not follow links on this page. The page itself may still be indexed.
  • noindex, nofollow: Excludes from indexing and does not follow links simultaneously.
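
To check which of these directives a page actually declares, the meta tags can be read with a small parser. This is a sketch using only the Python standard library; RobotsMetaParser and robots_directives are names invented for this example, and it only inspects meta tags, not the X-Robots-Tag header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags (or other named bots)."""

    def __init__(self, bot_names=("robots",)):
        super().__init__()
        self.bot_names = {name.lower() for name in bot_names}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        if attr_map.get("name", "").lower() in self.bot_names:
            # content is a comma-separated list such as "noindex, nofollow".
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html, bot_names=("robots",)):
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser(bot_names)
    parser.feed(html)
    return parser.directives
```

For example, robots_directives('<meta name="robots" content="noindex, nofollow" />') yields a set containing both noindex and nofollow, while passing bot_names=("robots", "googlebot") also picks up Googlebot-specific tags.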

noindex vs robots.txt Difference

noindex (meta/header)

  • Bot visit: ✅ Allowed
  • Indexing: ❌ Excluded
  • Link following: Configurable separately
  • PageRank: Can pass (if no nofollow)
  • Suitable situation: Allow access but exclude from search

robots.txt Disallow

  • Bot visit: ❌ Blocked
  • Indexing: Not guaranteed either way (the URL can still be indexed without its content if other pages link to it, and the block prevents reading noindex)
  • Link following: ❌ Blocked
  • PageRank: Does not pass
  • Suitable situation: Crawl budget protection, keeping bots away from resources that do not need crawling (sensitive content needs authentication, not robots.txt)

Important: Pages blocked by robots.txt cannot be read for noindex directives. To exclude from indexing only, noindex meta tags must be used while crawling is allowed.
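
The conflict described above can be checked mechanically. The sketch below uses Python's standard urllib.robotparser to test whether a given bot is allowed to fetch a URL, which is the precondition for its noindex directive being read. The function name noindex_is_readable and the example URLs are assumptions for illustration, and the standard-library parser may not match every search engine's own robots.txt matching rules exactly.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_readable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots.txt lets the bot fetch the URL and so read its noindex."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example robots.txt content (hypothetical site).
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /thanks/
"""

# A noindex tag on /thanks/order would never be seen, because crawling is blocked.
blocked = not noindex_is_readable(EXAMPLE_ROBOTS_TXT, "https://example.com/thanks/order")
```

If a page is meant to be excluded through noindex and this check returns False for it, the Disallow rule needs to be lifted for that path first.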

See robots.txt and AI Bots for details.


Re-indexing After Removing noindex

After removing noindex, it takes time for Google to re-index the page. For faster processing:

  1. Use "URL Inspection → Request Indexing" in Google Search Console
  2. Include the URL in the XML sitemap and submit
  3. Verify internal links point to the page
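
For step 2, a sitemap containing the restored URL can be generated with the standard library. This is a minimal sketch: build_sitemap is a name invented for this example, the URL is a placeholder, and real sitemaps often carry extra fields such as lastmod.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap string listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URL for a page whose noindex was just removed.
sitemap_xml = build_sitemap(["https://example.com/restored-page"])
```

The resulting string can be written to sitemap.xml and submitted in Google Search Console.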

See Indexing Coverage for details.


Application in the Korean Market

Naver Search noindex Support

Naver search bot (Yeti) supports <meta name="robots" content="noindex"/>. However, a more reliable method to control Naver search exposure is using URL blocking in Naver Search Advisor.

noindex Cases in Korean E-commerce

Common noindex applications on Korean e-commerce sites:

  • Sort filter URLs (?sort=price, ?sort=latest)
  • Cart and order completion pages
  • Member-only my page
  • Out-of-stock product temporary pages (noindex if restocking expected; 410 if permanently discontinued)
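
The rules in this list can be expressed as a small decision function. The sketch below is illustrative only: the parameter names, path prefixes, stock status values, and function names are assumptions for a hypothetical shop, not a fixed convention.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical rules for an example shop.
NOINDEX_PARAMS = {"sort"}
NOINDEX_PATH_PREFIXES = ("/cart", "/order/complete", "/mypage")


def robots_meta_for(url: str) -> str:
    """Return the robots meta content to render for a given URL."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if NOINDEX_PARAMS.intersection(query):
        return "noindex"  # sort filter variants such as ?sort=price
    if parts.path.startswith(NOINDEX_PATH_PREFIXES):
        return "noindex"  # cart, order completion, member-only pages
    return "index, follow"


def product_response(stock_status: str) -> Tuple[int, Optional[str]]:
    """Map a product's stock status to (HTTP status, robots meta content)."""
    if stock_status == "discontinued":
        return 410, None  # permanently gone: no page to render
    if stock_status == "restocking":
        return 200, "noindex"  # temporarily out of stock
    return 200, "index, follow"
```

Keeping such rules in one place makes it easier to review which URL patterns are excluded before a deployment.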

Implementation by CMS

In WordPress, set page-level noindex via the Yoast SEO or RankMath plugin "Search appearance" settings. In Next.js (App Router), return robots: { index: false } from generateMetadata() or set it in the static metadata export.


Frequently Asked Questions

Q. Does a noindex page disappear from search immediately?
A. No. Google removes it from the index only after crawling the page and reading noindex. This process can take days to weeks. For fast removal, use the Removals tool in Google Search Console as a temporary measure (it hides the URL for about six months), but keep noindex or a 410 response in place as the lasting fix.

Q. What happens if I accidentally set noindex on an important page?
A. Google removes it from the index on the next crawl. Remove noindex immediately upon discovery and request re-indexing in Google Search Console. Recovering previous rankings can take weeks. To prevent this, keep a pre-deployment QA checklist that confirms staging noindex settings are not carried into production.

Q. Are links from noindex pages to other pages also ignored?
A. With noindex alone, links are still followed (PageRank can pass). To block link following as well, use noindex, nofollow together. However, most noindex target pages (thank-you pages, login pages) have no external links, so noindex alone is typical.

Q. Is there a better method than noindex to remove a page entirely?
A. For permanent page deletion, the 410 (Gone) HTTP status code is the most reliable option. Google recognizes 410 and quickly removes the URL from the index. Use noindex when the page exists but should not appear in search; use 410 when deleting the page itself.

Q. Can I use canonical tags and noindex on the same page?
A. Not recommended. Canonical requests "treat this URL as representative" for indexing; noindex requests "do not index." They contradict each other. Google tends to prioritize noindex in such cases, but confusion can cause unexpected results. Use only one per page.

