Sitemap/Robots.txt Conflicts Wasting Crawl Budget

A Phase 1 technical audit on a SaaS client surfaced three quiet crawl-budget issues no one had spotted: URLs simultaneously listed in the sitemap and blocked by robots.txt, 18 redundant User-agent blocks in a single robots file, and 554 sitemap URLs with no lastmod data. None caused a ranking event. All capped performance.

3 Sitemap/Robots Conflicts
18 → 1 User-Agent Blocks
554 URLs Missing lastmod

The Situation

During a Phase 1 technical SEO audit for a SaaS client, I identified a pattern of conflicting crawl directives that had gone unnoticed — likely because none of them triggered any dramatic ranking event. The site was performing, but it was leaving efficiency on the table.

Finding 1: URLs Listed in Sitemap While Blocked by Robots.txt

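Three URLs were simultaneously present in the XML sitemap and blocked by the robots.txt file. That's a direct contradiction: the sitemap asks Googlebot to crawl those URLs, while robots.txt tells it not to. Googlebot resolves the conflict in favor of robots.txt and does not fetch the page, but what happens in the index is less predictable. A blocked URL can still be indexed without its content if other pages link to it, Search Console reports it as blocked by robots.txt, and the sitemap entry is wasted either way.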

This kind of conflict is easy to miss on CMS-generated sitemaps. The sitemap updates automatically as content is added or changed. The robots.txt doesn't update to match. If nobody is manually cross-referencing the two files, conflicts accumulate silently.
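The cross-reference described above is easy to script. Here is a minimal sketch, assuming both files have already been downloaded as strings and that the sitemap is a plain urlset rather than a sitemap index. The function names are illustrative. Python's standard-library urllib.robotparser uses simple prefix matching and first-match precedence, so it does not understand Google-style * and $ wildcards. Treat its hits as candidates to verify, not a final verdict.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value from a urlset sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def blocked_sitemap_urls(
    sitemap_xml: str, robots_txt: str, user_agent: str = "Googlebot"
) -> list[str]:
    """List sitemap URLs that robots.txt disallows for the given agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        url
        for url in sitemap_urls(sitemap_xml)
        if not parser.can_fetch(user_agent, url)
    ]
```

Running a check like this on a schedule, or after each CMS publish, is one way to catch conflicts before they pile up.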

Finding 2: 18 Separate User-agent Blocks in Robots.txt

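A well-structured robots.txt uses a single User-agent: * group for all blanket directives. This site's file had 18 of them, all redundant. Parsers that follow RFC 9309, Google's included, merge repeated groups for the same user agent, so splitting the rules up changed nothing about how they were applied. What it did add was a maintenance headache: a fragmented file is significantly harder to debug when something actually goes wrong, because a rule for the same crawler can be hiding in any of 18 places. I consolidated them into one clean block.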
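The consolidation itself can be sketched in a few lines. This is an illustrative helper, not the tool used on the audit. It assumes a robots.txt built from User-agent, Allow, and Disallow lines, lowercases agent names, drops comments and exact duplicate rules, and moves any other line (such as Sitemap) to the end unchanged.

```python
def merge_user_agent_groups(robots_txt: str) -> str:
    """Merge repeated User-agent groups so each agent appears once."""
    rules_by_agent: dict[str, list[str]] = {}  # insertion order is preserved
    passthrough: list[str] = []                # e.g. Sitemap lines
    current_agents: list[str] = []
    group_has_rules = False

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()    # drop comments
        if ":" not in line:
            continue                           # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        key = field.lower()
        if key == "user-agent":
            if group_has_rules:                # a new group starts here
                current_agents = []
                group_has_rules = False
            agent = value.lower()
            current_agents.append(agent)
            rules_by_agent.setdefault(agent, [])
        elif key in ("allow", "disallow"):
            group_has_rules = True
            rule = f"{key.capitalize()}: {value}"
            for agent in current_agents:
                if rule not in rules_by_agent[agent]:  # skip exact duplicates
                    rules_by_agent[agent].append(rule)
        else:
            passthrough.append(f"{field}: {value}")

    blocks = [
        "\n".join([f"User-agent: {agent}", *rules])
        for agent, rules in rules_by_agent.items()
    ]
    if passthrough:
        blocks.append("\n".join(passthrough))
    return "\n\n".join(blocks) + "\n"
```

Rules keep the order in which they first appeared. That makes no difference to crawlers that pick the most specific matching rule, but it can matter for simpler parsers that stop at the first match, so the merged file is worth a manual read before it goes live.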

Finding 3: No lastmod Data Across 554 Sitemap URLs

The CMS was auto-generating the sitemap without a lastmod element on any URL. That means all 554 URLs look the same to Googlebot in terms of freshness: there's no signal about which pages have been updated recently and should be prioritized for recrawling. For a site with a substantial content library, this removes one of the key levers for managing crawl efficiency.
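A gap like this is quick to measure. The sketch below assumes the sitemap is a standard urlset already loaded as a string, and the function name is illustrative. It returns every URL whose entry has no lastmod element, or an empty one.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_missing_lastmod(sitemap_xml: str) -> list[str]:
    """Return the <loc> of every <url> entry without a usable <lastmod>."""
    root = ET.fromstring(sitemap_xml)
    missing = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if not lastmod:
            missing.append(loc)
    return missing
```

On a sitemap like the one in this audit, the returned list would contain every URL in the file.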

The Fix

Three changes, in order of priority: removed the conflicting URLs from the sitemap, consolidated the 18 User-agent: * blocks into one, and flagged the lastmod gap for the development team to resolve at the CMS level.

The Takeaway

None of these issues in isolation would cause a meaningful ranking drop. Together, they represent the kind of technical debt that quietly caps performance — crawl budget wasted on ambiguous directives, a maintenance structure that doesn't scale, and a sitemap that fails to do its job as a freshness signal. CMS-generated sitemaps need human review. Automation handles volume. It doesn't handle accuracy.

Key Findings

3 URLs simultaneously in sitemap and blocked by robots.txt
18 redundant User-agent: * blocks consolidated into one
554 sitemap URLs auto-generated without lastmod data
No manual cross-reference between sitemap and robots.txt was happening
CMS-driven generation introduced silent technical debt over time
None of the issues triggered a ranking drop — all capped efficiency

