Crawl Budget
Definition
Crawl budget is the number of pages Googlebot can and wants to crawl on your site within a given time frame. It combines crawl capacity (Google's limit) and crawl demand (Google's interest in your site).
Summary
Crawl budget matters mainly for large sites. Reduce low-value URLs, fix errors, and strengthen internal links to important pages. The GSC Crawl stats report shows crawl patterns. Most sites under 100K URLs should prioritize content quality and indexing over crawl budget optimization.
When crawl budget matters
Google guidance
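Google's large-site guide is aimed at sites with roughly 1 million+ unique pages that change about weekly, or 10,000+ unique pages that change daily. This page uses ~100,000 URLs as a working threshold between the two: below it, and with updates at most once per day, crawl budget is rarely a concern. Most sites should focus on: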
- Content quality
- Indexing coverage
- Technical health
When to optimize crawl budget
- Large sites: 100,000+ URLs, e-commerce, news, UGC
- Frequent updates: Daily or hourly new content
- Indexing delays: Important pages stuck in "Discovered - currently not indexed"
- Crawl waste: Many duplicate, thin, or low-value URLs consuming crawl
Crawl budget components
Crawl capacity
Google's limit on how many URLs it can crawl without overloading your server. Affected by:
- Server response time and stability
- 5xx errors (reduce crawl)
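- Crawl rate controls: Googlebot ignores robots.txt crawl-delay, and the GSC crawl rate limiter was retired in January 2024, so crawl rate now adjusts automatically to server responses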
Crawl demand
Google's interest in crawling your site. Affected by:
- Site popularity and authority
- Content freshness and update frequency
- Indexing value of URLs
Common crawl budget waste
| Waste type | Example | Fix |
|---|---|---|
| Duplicate URLs | Parameter variants, www/non-www | Canonical, parameter handling |
| Thin/low-value pages | Tag pages, search results, filters | noindex, consolidate, reduce |
| Redirect chains | Multiple hops before final URL | Direct 301 redirects |
| Soft 404s | Empty pages returning 200 | Fix content or 404/301 |
| Blocked resources | CSS/JS blocked unnecessarily | Allow critical resources in robots.txt |
| Infinite spaces | Calendar pagination, session IDs | robots.txt disallow, limit pagination |
See Indexing Coverage Diagnosis for unindexed page causes.
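The duplicate-URL row in the table can be made concrete with a small audit script. The following is a minimal sketch, not a definitive normalizer: `IGNORED_PARAMS`, `normalize_url`, and `group_duplicates` are hypothetical names, the parameter list is an assumption you would adapt to your own site, and forcing `https` and dropping `www.` assumes those variants serve identical content.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change page content on this site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "ref"}


def normalize_url(url: str) -> str:
    """Collapse common duplicate variants into one comparable form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Keep only content-changing parameters, in a stable order.
    params = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path.rstrip("/") or "/"
    # Assumption: http and https variants are the same page.
    return urlunsplit(("https", host, path, urlencode(params), ""))


def group_duplicates(urls):
    """Return {normalized_url: [variants]} for groups with two or more variants."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}
```

Running `group_duplicates` over a crawl export or a list of logged URLs gives a rough list of candidates for canonical tags or redirects; each group still needs a manual check that the variants really serve the same content.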
How to monitor crawl budget
GSC Crawl stats report
GSC → Settings → Crawl stats (if available)
Metrics: Total crawl requests, average response time, response code distribution
Use: Identify crawl spikes, error patterns, and response time issues.
Server log analysis
Analyze server logs for Googlebot requests to see which URLs Googlebot crawls most and which status codes it receives. Tools: Screaming Frog Log File Analyser, custom scripts.
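A custom script for this can be quite small. The sketch below assumes the server writes the common "combined" access log format and identifies Googlebot by user-agent string only; `LOG_PATTERN` and `googlebot_hits` are hypothetical names. User agents can be spoofed, so a production audit would also verify the requesting IP (Google documents a reverse DNS check for this).

```python
import re
from collections import Counter

# Assumption: logs use the "combined" format (IP, time, request, status, size,
# referrer, user agent). Adjust the pattern if your server logs differently.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count Googlebot requests per path and per status code."""
    paths = Counter()
    statuses = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue  # skip unparseable lines and other clients
        paths[match.group("path")] += 1
        statuses[match.group("status")] += 1
    return paths, statuses
```

Calling `paths.most_common(20)` on the result shows where crawl is concentrated; a list dominated by parameter or filter URLs is a sign of crawl waste.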
URL Inspection sampling
GSC URL Inspection → check "Last crawl" dates for important pages. Old crawl dates may indicate low crawl priority.
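If you record the "Last crawl" dates for a sample of important pages, a few lines of code can flag the stale ones. This is a minimal sketch that assumes the dates were copied by hand into a dictionary; `stale_pages` is a hypothetical helper, and the 30-day default is an arbitrary threshold, not a Google figure.

```python
from datetime import date


def stale_pages(last_crawled, today, max_age_days=30):
    """Return (url, age_in_days) pairs older than max_age_days, oldest first.

    last_crawled: {url: date of last crawl as shown in URL Inspection}
    """
    stale = [
        (url, (today - crawled).days)
        for url, crawled in last_crawled.items()
        if (today - crawled).days > max_age_days
    ]
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

For example, `stale_pages({"/pricing": date(2025, 1, 2)}, today=date(2025, 3, 1))` reports `/pricing` as 58 days old.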
Five crawl budget optimization strategies
1. Reduce low-value URL count
Noindex or remove tag pages, internal search results, and faceted-navigation duplicates. Consolidate similar content.
2. Fix crawl errors
Resolve 5xx, redirect loops, and soft 404s. Errors waste crawl and reduce capacity.
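Redirect chains and loops are easy to detect offline once you have a source-to-target map, for example exported from server configuration or a crawl. The sketch below works on such a map without making network requests; `trace_redirects` and `flatten_chains` are hypothetical names, and the `max_hops` cap of 10 is an assumption.

```python
def trace_redirects(start, redirects, max_hops=10):
    """Follow `start` through a {source: target} map.

    Returns (final_url, hops, is_loop). Stops after max_hops hops.
    """
    seen = [start]
    current = start
    while current in redirects and len(seen) <= max_hops:
        current = redirects[current]
        if current in seen:
            return current, len(seen), True  # revisited a URL: loop
        seen.append(current)
    return current, len(seen) - 1, False


def flatten_chains(redirects):
    """Map every source straight to its final URL; list sources caught in loops."""
    direct, loops = {}, []
    for source in redirects:
        final, _hops, is_loop = trace_redirects(source, redirects)
        if is_loop:
            loops.append(source)
        else:
            direct[source] = final
    return direct, loops
```

The `direct` map is what the redirect rules should look like after cleanup: every old URL pointing at its final destination in a single hop.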
3. Optimize sitemap
Submit sitemap with important URLs only. Remove deleted or noindex URLs from sitemap. See Sitemap for details.
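One way to keep the sitemap limited to important URLs is to generate it from a page inventory and filter as you go. The following is a minimal sketch under stated assumptions: `build_sitemap` is a hypothetical helper, and the `status`, `noindex`, and `lastmod` fields are an assumed inventory format (for example, from a CMS export), not a standard.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from dicts with 'url', 'status', 'noindex', optional 'lastmod'.

    Pages that do not return 200 or that carry noindex are left out.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if page["status"] != 200 or page["noindex"]:
            continue
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = page["url"]
        if page.get("lastmod"):
            ET.SubElement(entry, "lastmod").text = page["lastmod"]
    # Returns the XML as a string without an XML declaration.
    return ET.tostring(urlset, encoding="unicode")
```

Regenerating the file from the inventory on each deploy avoids the slow drift where deleted or noindexed URLs linger in a hand-maintained sitemap.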
4. Strengthen internal linking
Important pages need strong internal links to signal crawl priority. Orphan pages get less crawl.
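Orphan pages can be found by comparing the internal link graph against the list of URLs you want crawled. This sketch assumes the graph has already been collected (for example, by a site crawler) as a dictionary of source URL to linked URLs; `inbound_link_counts` and `orphan_pages` are hypothetical names.

```python
from collections import Counter


def inbound_link_counts(link_graph):
    """link_graph: {source_url: [target_urls]} -> Counter of inbound links per URL."""
    counts = Counter({url: 0 for url in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):  # count each linking page once
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts


def orphan_pages(link_graph, known_urls):
    """Return known URLs that no other page links to, sorted."""
    counts = inbound_link_counts(link_graph)
    return sorted(url for url in known_urls if counts[url] == 0)
```

Passing the sitemap URLs as `known_urls` highlights pages that are submitted for indexing but unreachable through internal links. The same counts, sorted ascending, show which important pages are weakly linked.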
5. Improve server performance
Faster, more stable server responses allow more crawl within capacity. Server response time (see TTFB) matters here more than front-end metrics such as Core Web Vitals.
robots.txt and crawl budget
Blocking a URL in robots.txt prevents crawling but does not remove it from the index if it is already indexed. Use noindex for pages you don't want indexed, and leave those pages crawlable: Googlebot has to fetch a page to see its noindex. See How to Allow AI Bots in robots.txt for AI bot considerations.
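Before deploying robots.txt changes, it can help to check which URLs a rule set would block from crawling. The sketch below uses Python's standard `urllib.robotparser`; the rules in `ROBOTS_TXT` and the `crawlable` helper are hypothetical. Note the assumption: this parser does simple prefix matching and may not match Googlebot's handling of wildcard patterns, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block internal search and cart URLs for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /cart/
"""


def crawlable(robots_txt, urls, user_agent="Googlebot"):
    """Return {url: True/False} for whether user_agent may fetch each URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}
```

A `False` result only means the URL would not be crawled. It says nothing about whether the URL is, or stays, in the index.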
Local market application
Crawl budget for local sites
Most local business sites have few URLs, so crawl budget is rarely an issue. Focus on indexing and content quality. Multi-location sites with many location pages may need crawl optimization.
CMS and hosting
Some CMSs generate many low-value URLs (tags, archives). Audit them, then noindex or consolidate to preserve crawl for important pages.
Frequently asked questions
Q. Should I use "Crawl rate" in GSC?
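A. The legacy crawl rate limiter was retired in January 2024, so there is no longer a setting to change. Googlebot slows down on its own when the server responds slowly or returns 5xx or 429 errors; if the server is overloaded, Google suggests temporarily returning 503 or 429. Slowing crawl can delay indexing.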
Q. Does blocking CSS/JS hurt crawl budget?
A. Blocking render-critical resources can prevent proper rendering and indexing. Allow Googlebot to access CSS/JS needed to render content.
Q. How do I know if crawl budget is my problem?
A. Look for three signs: important pages stuck in "Discovered - currently not indexed" for months, GSC crawl stats showing heavy crawl on low-value URLs, or server logs showing Googlebot hitting duplicate or thin pages heavily.
Q. Does crawl budget affect AI bots?
A. AI bots (GPTBot, etc.) have separate crawl behavior. robots.txt controls them independently. See AI Bots robots.txt Matrix for details.
Q. Can I request more crawl budget?
A. There is no direct request. Improve site quality, fix errors, and reduce waste; Google allocates more crawl to valuable, healthy sites.
Related sources
- Google Search Central. Large site owner's guide to managing crawl budget. https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget
- Google Search Central. How Google crawls and indexes. https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
- Google Search Central. Crawl stats report. https://support.google.com/webmasters/answer/9679690