Sitemap (XML Sitemap)

Definition

An XML sitemap is a file that lists a website's URLs in a structured XML format. It helps search engine bots discover the site's important pages so that fewer of them are missed during crawling.

It can efficiently notify search engines about pages that are hard to find through internal links alone, newly added pages, and updated pages. A sitemap works as a crawling hint; submitting one does not guarantee that every URL will be crawled or indexed immediately.


Summary

Sitemap essentials: ① Create at /sitemap.xml → ② Submit in Google Search Console (GSC) → ③ Declare location in robots.txt → ④ Include only important URLs (exclude noindex) → ⑤ Use a sitemap index when a file would exceed 50,000 URLs or 50 MB uncompressed. Remember that internal link structure is a stronger crawling signal than a sitemap.


Basic XML Sitemap Structure

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-05-13</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/blog/</loc>
    <lastmod>2026-05-10</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

Tag descriptions

  • <loc>: Absolute page URL (required)
  • <lastmod>: Last modified date in W3C Datetime format, a profile of ISO 8601, e.g. 2026-05-13 (recommended)
  • <changefreq>: Update frequency (always/hourly/daily/weekly/monthly/yearly/never); Google ignores this value
  • <priority>: Relative priority from 0.0 to 1.0; Google ignores this value
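
For sites that build the file from their own data, the structure above can be produced with a small helper. The following is a minimal TypeScript sketch, not a definitive implementation: the SitemapEntry shape and the function names are assumptions made for illustration, and it emits only <loc> and <lastmod>.

```typescript
// Hypothetical entry shape for illustration; not part of any library.
interface SitemapEntry {
  loc: string;     // absolute URL
  lastmod?: Date;  // real last-modified time, if known
}

// Escape the five characters that must be entity-encoded in XML text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render entries as a <urlset> document.
function buildUrlset(entries: SitemapEntry[]): string {
  const urls = entries.map((entry) => {
    const lines = [`    <loc>${escapeXml(entry.loc)}</loc>`];
    if (entry.lastmod) {
      // YYYY-MM-DD (taken in UTC) is a valid W3C Datetime value.
      lines.push(`    <lastmod>${entry.lastmod.toISOString().slice(0, 10)}</lastmod>`);
    }
    return ["  <url>", ...lines, "  </url>"].join("\n");
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}
```

Escaping matters because a raw & in a query string makes the XML invalid and the whole file unreadable.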

Types of Sitemaps

XML Sitemap (Standard)

The most common form. Includes a URL list and metadata. A text sitemap (sitemap.txt) is also possible but cannot carry metadata.

Sitemap Index

When a sitemap would exceed 50,000 URLs or 50 MB uncompressed, split it into multiple files and create a sitemap index that points to them.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
  </sitemap>
</sitemapindex>
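
Splitting can be automated. The sketch below, written in TypeScript under stated assumptions, chunks a URL list by the 50,000-URL limit and builds the matching index. The sitemap-N.xml naming scheme and both function names are hypothetical, and the 50 MB size limit is not checked here.

```typescript
const MAX_URLS_PER_SITEMAP = 50000;

// Split a flat URL list into groups that each fit in one sitemap file.
function chunkUrls(urls: string[], size: number = MAX_URLS_PER_SITEMAP): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

// Build a sitemap index with one child entry per chunk.
// The "sitemap-N.xml" file names are an assumption for illustration.
function buildSitemapIndex(baseUrl: string, chunkCount: number): string {
  const items: string[] = [];
  for (let n = 1; n <= chunkCount; n++) {
    items.push(`  <sitemap>\n    <loc>${baseUrl}/sitemap-${n}.xml</loc>\n  </sitemap>`);
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...items,
    "</sitemapindex>",
  ].join("\n");
}
```

With these helpers, a list of 120,001 URLs yields three child files (50,000 + 50,000 + 20,001 URLs) and an index with three entries.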

Media Sitemap

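Image and video sitemaps supply extra metadata about media files, which Google can use in Google Image Search and video search results. The metadata can live in a dedicated sitemap or be added to a standard sitemap through Google's image and video sitemap extensions.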


4 Steps to Create and Submit a Sitemap

Step 1: Generate the Sitemap

CMS auto-generation

  • WordPress: plugins such as Yoast SEO, Rank Math, and All in One SEO generate a sitemap automatically
  • Next.js: next-sitemap package or App Router sitemap.ts file
  • Shopify: Automatically generates /sitemap.xml

Manual generation: For small sites, write XML directly or use online generator tools

Step 2: Deploy on the Web Server

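Place the file at the site root so that it is served at https://example.com/sitemap.xml. The root is the conventional location: under the sitemaps.org protocol, a sitemap may list only URLs at or below its own directory, so a root-level file can cover the whole site.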

Step 3: Declare Location in robots.txt

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml

Declaring the sitemap location in robots.txt lets bots discover it even without direct submission. See robots.txt and AI Bots for details.
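
To confirm that the declaration is readable, the Sitemap lines can be pulled out of a robots.txt body. This is a minimal TypeScript sketch of that check; the function name is hypothetical, and fetching the file is left out so the example stays self-contained.

```typescript
// Collect the URLs declared by "Sitemap:" lines in a robots.txt body.
// The field name is matched case-insensitively, and a trailing comment
// (whitespace followed by "#") is dropped before matching.
function extractSitemapUrls(robotsTxt: string): string[] {
  const found: string[] = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/\s+#.*$/, "").trim();
    const match = /^sitemap\s*:\s*(\S+)/i.exec(line);
    if (match) {
      found.push(match[1]);
    }
  }
  return found;
}
```

A robots.txt file may contain several Sitemap lines, which is why the function returns a list rather than a single URL.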

Step 4: Submit to Google Search Console

GSC → Indexing → Sitemaps → enter URL and submit. After submission, check processing status and discovered URL count.


URLs to Include/Exclude in a Sitemap

URLs to include

  • All important content pages (blog, product, service)
  • Category and tag pages (when they are indexing targets)
  • Every URL you want indexed

URLs to exclude

  • Pages with noindex
  • Pages blocked by robots.txt
  • Old URLs that redirect (include only the final destination)
  • Duplicate parameter URLs
  • Admin, login, and checkout pages

Listing URLs that should not be indexed wastes crawl budget and sends Google conflicting signals: the sitemap asks for indexing while the page itself or robots.txt forbids it.
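
The include and exclude rules above can be expressed as a filter over crawl data. The following TypeScript sketch assumes a hypothetical PageRecord shape whose fields (statusCode, noindex, blockedByRobots, canonicalUrl) would come from your own crawler or CMS; none of these names belong to a real library.

```typescript
// Hypothetical page record; the field names are assumptions for illustration.
interface PageRecord {
  url: string;
  statusCode: number;        // HTTP status returned when the URL is fetched
  noindex: boolean;          // robots meta tag or X-Robots-Tag says noindex
  blockedByRobots: boolean;  // disallowed in robots.txt
  canonicalUrl: string;      // canonical target declared by the page
}

// Keep only pages that return 200, are indexable, and are their own canonical.
function selectSitemapUrls(pages: PageRecord[]): string[] {
  return pages
    .filter((p) => p.statusCode === 200)              // drops redirects and error pages
    .filter((p) => !p.noindex && !p.blockedByRobots)  // drops non-indexable pages
    .filter((p) => p.canonicalUrl === p.url)          // drops duplicate parameter URLs
    .map((p) => p.url);
}
```

Admin, login, and checkout pages drop out of this filter only if they are already marked noindex or blocked in robots.txt, which is an assumption about the site's setup.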


Korea Market Application

Naver Search Advisor Sitemap Submission

Naver also supports sitemap submission. Register the sitemap URL in Naver Search Advisor → Webmaster Tools → Request → Sitemap submission. This is separate from submission in Google Search Console, so do both.

Korean URL Encoding

When including URLs with non-ASCII characters in a sitemap, apply percent encoding (URL encoding). Most CMS plugins handle this automatically, but manual creation requires caution.

https://example.com/café/menu → https://example.com/caf%C3%A9/menu
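
When sitemaps are written by hand or by a custom script, the encoding step can be done with the standard encodeURI function. The TypeScript sketch below is one possible approach, with a hypothetical function name; it also restores escapes that were already present so that an encoded URL is not encoded twice.

```typescript
// Percent-encode non-ASCII characters in a URL as UTF-8 bytes.
// encodeURI also turns "%" into "%25", so escapes that were already
// present are restored afterwards to avoid double encoding.
function toSitemapLoc(rawUrl: string): string {
  return encodeURI(rawUrl).replace(/%25([0-9A-Fa-f]{2})/g, "%$1");
}

// The example from the text above:
const encodedMenuUrl = toSitemapLoc("https://example.com/caf\u00e9/menu");
// encodedMenuUrl === "https://example.com/caf%C3%A9/menu"
```

Percent encoding does not replace XML escaping: characters such as & in a query string still need to be written as &amp; inside <loc>.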

Next.js Sitemap Configuration

Automatic sitemap generation in Next.js App Router, widely used by Korean startups:

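// app/sitemap.ts
import type { MetadataRoute } from 'next'

// Next.js serves the returned array as /sitemap.xml.
export default function sitemap(): MetadataRoute.Sitemap {
  return [
    // new Date() stamps the generation time; prefer each page's real
    // last-modified date so lastmod stays accurate.
    { url: 'https://example.com', lastModified: new Date() },
    { url: 'https://example.com/blog', lastModified: new Date() },
  ]
}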
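
Because lastModified: new Date() records when the sitemap was generated rather than when a page changed, dynamic sites usually map their own content records to entries instead. The sketch below shows that mapping in plain TypeScript. SitemapItem is a local stand-in for the url and lastModified fields used above, and BlogPost with its slug and updatedAt fields is a hypothetical record shape.

```typescript
// Local stand-in for the entry shape used above (url + lastModified).
// In a real project the return type would be MetadataRoute.Sitemap from 'next'.
interface SitemapItem {
  url: string;
  lastModified: Date;
}

// Hypothetical post record, e.g. loaded from a CMS or database.
interface BlogPost {
  slug: string;
  updatedAt: string; // ISO 8601 timestamp of the last real content change
}

// Map posts to sitemap entries, using each post's own update time
// instead of the build time.
function postsToSitemapItems(baseUrl: string, posts: BlogPost[]): SitemapItem[] {
  return posts.map((post) => ({
    url: `${baseUrl}/blog/${post.slug}`,
    lastModified: new Date(post.updatedAt),
  }));
}
```

Inside app/sitemap.ts, the result of such a helper could be returned alongside the static entries.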

Frequently Asked Questions

Q. Does submitting a sitemap guarantee crawling?
A. No. A sitemap is a crawling "hint," not a command. Google references sitemaps but decides whether to crawl based on crawl budget, internal links, and page quality. High-quality content and a solid internal link structure are stronger crawling signals than a sitemap.

Q. Do I need to set changefreq and priority tags accurately?
A. No. Google's documentation states that it ignores the changefreq and priority values. Keeping lastmod (last modified date) accurate is more effective than tuning the other tags. Google uses lastmod only when it is consistently and verifiably accurate, so values that do not match real changes lead it to stop trusting the field.

Q. Is it a problem if a sitemap has too many URLs?
A. A single sitemap file is limited to 50,000 URLs or 50 MB uncompressed. Beyond that, you must split the file and use a sitemap index. The URL count itself is not a problem, but bulk inclusion of low-quality URLs can waste crawl budget and reduce crawling efficiency across the site.

Q. Should I include images in the sitemap?
A. If images are core content (photo portfolios, ecommerce product images), an image sitemap helps Google Image Search visibility. General content sites are crawled without an image sitemap, but having one allows richer metadata.

Q. How do I fix sitemap errors (shown as errors in GSC)?
A. Check the error type in the GSC sitemap report. Main errors: ① URL could not be read (HTTP error, file inaccessible) → check server settings; ② Unsupported format → validate the XML structure; ③ URL not on site → remove redirected or 404 URLs; ④ URL not indexed → check for noindex or quality issues.
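
Some of these problems can be caught before submission with a quick offline check of the URL list. The TypeScript sketch below is only an illustration under stated assumptions: the function name and the message wording are hypothetical, and it does not reproduce GSC's own validation.

```typescript
// Return human-readable problems found in a list of sitemap URLs.
// `siteOrigin` is the scheme and host the sitemap belongs to,
// e.g. "https://example.com" (no trailing slash).
function findSitemapProblems(urls: string[], siteOrigin: string): string[] {
  const problems: string[] = [];
  if (urls.length > 50000) {
    problems.push(`too many URLs: ${urls.length} (limit is 50000 per file)`);
  }
  const seen: Record<string, boolean> = Object.create(null);
  for (const url of urls) {
    if (!/^https?:\/\//i.test(url)) {
      problems.push(`not an absolute URL: ${url}`);
    } else if (url !== siteOrigin && url.indexOf(siteOrigin + "/") !== 0) {
      problems.push(`different site: ${url}`);
    }
    if (seen[url]) {
      problems.push(`duplicate: ${url}`);
    }
    seen[url] = true;
  }
  return problems;
}
```

An empty result means only that these particular checks passed; HTTP errors, redirects, and noindex pages still have to be verified against the live site.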


Related Sources

  • Google Search Central (2024). Build and submit a sitemap. Google Developers.
  • Sitemaps.org (2024). Sitemap protocol. sitemaps.org.
  • Google Search Central (2024). Large site owner's guide to managing your crawl budget. Google Developers.
