📘Concept

Crawl Budget

Last updated: May 13, 2026

Definition

Crawl budget is the number of pages Googlebot can and wants to crawl on your site within a given time frame. It combines crawl capacity (Google's limit) and crawl demand (Google's interest in your site).

Summary

Crawl budget matters mainly for large sites. To use it well, reduce low-value URLs, fix crawl errors, and strengthen internal links to important pages. The GSC Crawl stats report shows how Googlebot is crawling the site. Most sites under 100K URLs should prioritize content quality and indexing over crawl budget optimization.

When crawl budget matters

Google guidance

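Google's crawl budget documentation is aimed at large sites (roughly 1 million+ unique pages that change about once a week) and at medium or larger sites (roughly 10,000+ unique pages that change daily). Below those sizes crawl budget is rarely a concern; this guide uses ~100,000 URLs as a practical cutoff. Most sites should focus on: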

Content quality
Indexing coverage
Technical health

When to optimize crawl budget

Large sites: 100,000+ URLs, e-commerce, news, UGC
Frequent updates: Daily or hourly new content
Indexing delays: Important pages stuck in "Discovered - currently not indexed"
Crawl waste: Many duplicate, thin, or low-value URLs consuming crawl

Crawl budget components

Crawl capacity

Google's limit on how many URLs it can crawl without overloading your server. Affected by:

Server response time and stability
5xx errors (reduce crawl)
Throttling signals from the site, such as temporary 503 or 429 responses (the GSC crawl rate setting was retired in January 2024)

Crawl demand

Google's interest in crawling your site. Affected by:

Site popularity and authority
Content freshness and update frequency
Indexing value of URLs

Common crawl budget waste

Waste type	Example	Fix
Duplicate URLs	Parameter variants, www/non-www	Canonical, parameter handling
Thin/low-value pages	Tag pages, search results, filters	noindex, consolidate, reduce
Redirect chains	Multiple hops before final URL	Direct 301 redirects
Soft 404s	Empty pages returning 200	Add real content, or return 404/410 or a 301 redirect
Blocked resources	CSS/JS blocked unnecessarily	Allow critical resources in robots.txt
Infinite spaces	Calendar pagination, session IDs	Block in robots.txt, limit pagination

See Indexing Coverage Diagnosis for unindexed page causes.
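
To get a rough sense of how many crawlable URLs are only variants of one another, a URL list (from a crawl export or server logs) can be grouped by a normalized key. The sketch below is one possible approach, not a standard tool: the `IGNORED_PARAMS` set is a hypothetical example and should be replaced with the parameters that really do not change content on your site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters assumed not to change page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def normalize(url: str) -> str:
    """Collapse protocol, www, trailing-slash and tracking-parameter variants."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Keep only meaningful parameters, in a stable order.
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, urlencode(kept), ""))


def duplicate_groups(urls):
    """Return {normalized_url: [variants]} for keys with more than one variant."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Each group with several members is a candidate for a canonical tag or a redirect to a single preferred URL.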

How to monitor crawl budget

GSC Crawl stats report

GSC → Settings → Crawl stats (if available)

Metrics: Total crawl requests, average response time, response code distribution

Use: Identify crawl spikes, error patterns, and response time issues.

Server log analysis

Analyze server logs for Googlebot requests to see which URLs Googlebot crawls most. Tools: Screaming Frog Log File Analyser, custom scripts.
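
A custom script for this can be quite small. The sketch below assumes logs in the common "combined" access log format and identifies Googlebot by user-agent string only; user agents can be spoofed, so a production version would also verify the requesting IP (for example by reverse DNS). The function name `googlebot_hits` is illustrative.

```python
import re
from collections import Counter

# Pulls the request path, status code and final quoted field (user agent)
# out of a combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)


def googlebot_hits(lines):
    """Count Googlebot requests per path and per status code."""
    paths, statuses = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if match is None or "Googlebot" not in match.group("agent"):
            continue
        paths[match.group("path")] += 1
        statuses[match.group("status")] += 1
    return paths, statuses
```

Calling `paths.most_common(20)` on the result lists the most-crawled URLs, which can then be compared against the pages that matter most.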

URL Inspection sampling

GSC URL Inspection → check "Last crawl" dates for important pages. Old crawl dates may indicate low crawl priority.

Five crawl budget optimization strategies

1. Reduce low-value URL count

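Noindex or remove tag pages, internal search results, and faceted navigation duplicates, and consolidate similar content. A noindexed page still has to be crawled for the directive to be seen, so removal or consolidation saves more crawl than noindex alone.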

2. Fix crawl errors

Resolve 5xx errors, redirect loops, and soft 404s. Errors waste crawl requests, and repeated server errors reduce crawl capacity.
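
Redirect chains and loops are easy to find once the redirect rules are available as a source-to-target mapping. The sketch below assumes such a mapping has already been exported (for example from server config or a crawl); `trace_redirects` and `flatten` are illustrative names, not part of any tool.

```python
def trace_redirects(start, redirects):
    """Follow `start` through the redirect map; return (chain, is_loop)."""
    chain = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            return chain, True  # came back to a URL already visited
        seen.add(current)
    return chain, False


def flatten(redirects):
    """Point every source directly at its final target, skipping loops."""
    flat = {}
    for source in redirects:
        chain, is_loop = trace_redirects(source, redirects)
        if not is_loop:
            flat[source] = chain[-1]
    return flat
```

Any chain longer than two entries is a multi-hop redirect that could be replaced by a single 301; anything reported as a loop needs a manual fix.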

3. Optimize sitemap

Submit a sitemap that lists only important, indexable URLs. Remove deleted or noindexed URLs from it. See Sitemap for details.
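
As an illustration, a sitemap can be generated from a page inventory so that only live, indexable URLs are included. The input shape here (dicts with `url`, `status`, `noindex`, and an optional `lastmod`) is an assumption made for the example; adapt it to whatever your CMS or crawler exports.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML containing only pages that return 200 and are indexable."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        # Skip deleted, redirected, erroring or noindexed pages.
        if page["status"] != 200 or page.get("noindex"):
            continue
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["url"]
        if page.get("lastmod"):
            ET.SubElement(url, "lastmod").text = page["lastmod"]
    return ET.tostring(urlset, encoding="unicode")
```

The returned string has no XML declaration; add one when writing the file if your tooling expects it.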

4. Strengthen internal linking

Important pages need strong internal links to signal crawl priority. Orphan pages (pages with no internal links pointing to them) are crawled less often.
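
One way to check this is to count internal inlinks for each important page from a crawl export. The sketch below assumes the link graph is available as a dict of page to linked pages; the function name and input shape are illustrative.

```python
from collections import Counter


def inlink_report(links, important_pages):
    """Return {page: number of distinct other pages linking to it}."""
    inlinks = Counter()
    for source, targets in links.items():
        for target in set(targets):  # count each linking page once
            if target != source:     # ignore self-links
                inlinks[target] += 1
    return {page: inlinks[page] for page in important_pages}


def orphans(report):
    """Pages in the report with no internal inlinks at all."""
    return [page for page, count in report.items() if count == 0]
```

Pages that come back with zero or very few inlinks are the first candidates for links from navigation, hub pages, or related-content blocks.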

5. Improve server performance

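Faster response times let Googlebot fetch more URLs within the same crawl capacity. Server response time and stability matter most; Core Web Vitals work helps where it also reduces server response time.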

robots.txt and crawl budget

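Blocking URLs in robots.txt prevents crawling but does not remove pages that are already indexed. To keep a page out of the index, use noindex and leave the page crawlable, because Googlebot cannot see a noindex directive on a URL it is not allowed to fetch. See How to Allow AI Bots in robots.txt for AI bot considerations.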
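
Before relying on noindex, it can help to confirm that the URLs in question are not blocked. The sketch below uses Python's standard-library robots.txt parser on a made-up rule set; note that this parser does simple prefix matching and does not implement every matching rule Googlebot supports (such as wildcards), so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /cart/
"""


def crawlable(robots_txt, user_agent, urls):
    """Return {url: True if the user agent may fetch it under these rules}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}
```

A URL that maps to `False` here will not be fetched, so any noindex tag on it would go unseen.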

Local market application

Crawl budget for local sites

Most local business sites have few URLs, so crawl budget is rarely an issue. Focus on indexing and content quality. Multi-location sites with many location pages may need crawl optimization.

CMS and hosting

Some CMSs generate many low-value URLs (tags, archives). Audit them, then noindex or consolidate to preserve crawl for important pages.

Frequently asked questions

Q. Should I use "Crawl rate" in GSC?
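A. The legacy crawl rate setting was retired from Search Console in January 2024, so there is no longer a control to adjust. If Googlebot is overloading the server, Google's guidance is to return 500, 503, or 429 responses temporarily. Slowing crawl can delay indexing, so do this only for short periods.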

Q. Does blocking CSS/JS hurt crawl budget?
A. Blocking render-critical resources can prevent proper rendering and indexing. Allow Googlebot to access CSS/JS needed to render content.

Q. How do I know if crawl budget is my problem?
A. Look for these signs: important pages stay in "Discovered - currently not indexed" for months, GSC Crawl stats show heavy crawling of low-value URLs, or server logs show Googlebot hitting duplicate or thin pages heavily.

Q. Does crawl budget affect AI bots?
A. AI bots (GPTBot, etc.) have separate crawl behavior. robots.txt controls them independently. See AI Bots robots.txt Matrix for details.

Q. Can I request more crawl budget?
A. There is no way to request it directly. Improve site quality, fix errors, and reduce waste; Google allocates more crawl to valuable, healthy sites.