Sitemap/Robots.txt Conflicts Wasting Crawl Budget
A Phase 1 technical audit on a SaaS client surfaced three quiet crawl-budget issues no one had spotted: URLs simultaneously listed in the sitemap and blocked by robots.txt, 18 redundant User-agent blocks in a single robots file, and 554 sitemap URLs with no lastmod data. None caused a ranking event. All capped performance.
The Situation
During a Phase 1 technical SEO audit for a SaaS client, I identified a pattern of conflicting crawl directives that had gone unnoticed — likely because none of them triggered any dramatic ranking event. The site was performing, but it was leaving efficiency on the table.
Finding 1: URLs Listed in Sitemap While Blocked by Robots.txt
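Three URLs were simultaneously present in the XML sitemap and blocked by the robots.txt file. That's a direct contradiction: the sitemap invites Googlebot to crawl those URLs, while robots.txt tells it not to. For crawling, the robots.txt rule wins, so the pages are never fetched. The outcome is still untidy: a blocked URL can end up indexed without its content if other pages link to it, Search Console flags sitemap entries that robots.txt blocks, and the contradictory signals do nothing for crawl efficiency.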
This kind of conflict is easy to miss on CMS-generated sitemaps. The sitemap updates automatically as content is added or changed. The robots.txt doesn't update to match. If nobody is manually cross-referencing the two files, conflicts accumulate silently.
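The manual cross-reference described above can be scripted. What follows is a minimal sketch, not the tooling used in the audit: it assumes both files are already available as strings, and the example.com URLs and rules are hypothetical placeholders. It relies on Python's standard urllib.robotparser, which does plain prefix matching and does not understand the * and $ path wildcards that Googlebot supports, so treat its answers as a first pass rather than a verdict.

```python
import urllib.robotparser
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value from a <urlset> sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def blocked_sitemap_urls(
    sitemap_xml: str, robots_txt: str, agent: str = "Googlebot"
) -> list[str]:
    """List sitemap URLs that robots.txt disallows for the given agent."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        url for url in sitemap_urls(sitemap_xml)
        if not parser.can_fetch(agent, url)
    ]


# Hypothetical sample data, for illustration only.
SAMPLE_ROBOTS = """
User-agent: *
Disallow: /app/
"""

SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
  <url><loc>https://example.com/app/login</loc></url>
</urlset>
"""

print(blocked_sitemap_urls(SAMPLE_SITEMAP, SAMPLE_ROBOTS))
# prints ['https://example.com/app/login']
```

Run on a schedule or after each CMS publish, a check like this turns a silent conflict into a visible list.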
Finding 2: 18 Separate User-agent Blocks in Robots.txt
A well-structured robots.txt uses a single User-agent: * block for all blanket directives. This site's file had 18 of them. Compliant parsers, Googlebot included, merge groups that name the same user agent into one rule set, so the extra blocks added length without adding meaning. Beyond the maintenance headache, a fragmented file like that is significantly harder to debug when something actually goes wrong. I consolidated the 18 blocks into one clean block.
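As an illustration of that consolidation, here is a rough sketch in Python. It is an assumption about how one might automate the cleanup, not a record of how the client's file was edited. The function name consolidate_robots is hypothetical, only Allow and Disallow are treated as group rules, and every other line (such as Sitemap:) is passed through at the end of the file.

```python
def consolidate_robots(robots_txt: str) -> str:
    """Merge repeated User-agent groups into one group per agent."""
    rules: dict[str, list[str]] = {}   # lowercased agent -> unique rule lines
    names: dict[str, str] = {}         # lowercased agent -> first-seen spelling
    extras: list[str] = []             # lines outside groups, e.g. Sitemap:
    current: list[str] = []            # agents of the group being read
    reading_agents = False             # True while on consecutive User-agent lines

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and padding
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        key = field.lower()
        if key == "user-agent":
            if not reading_agents:            # a new group starts here
                current = []
                reading_agents = True
            agent = value.lower()
            current.append(agent)
            names.setdefault(agent, value)
            rules.setdefault(agent, [])
        elif key in ("allow", "disallow"):
            reading_agents = False
            rule = f"{key.capitalize()}: {value}".rstrip()
            for agent in current:
                if rule not in rules[agent]:  # skip exact duplicates
                    rules[agent].append(rule)
        else:
            extras.append(f"{field}: {value}")

    out: list[str] = []
    for agent, agent_rules in rules.items():
        out.append(f"User-agent: {names[agent]}")
        out.extend(agent_rules)
        out.append("")                        # blank line between groups
    out.extend(extras)
    return "\n".join(out).rstrip() + "\n"
```

Google picks the most specific matching rule regardless of order, so merging should not change how Googlebot reads the file. Parsers that apply the first matching rule may behave differently, so the merged output is worth a manual read before it ships.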
Finding 3: No lastmod Data Across 554 Sitemap URLs
The CMS was auto-generating the sitemap without a lastmod element on any entry. That means all 554 URLs look equally stale to Googlebot: there's no signal about which pages have been updated recently and should be prioritized for recrawling. Google uses lastmod when the values are consistently accurate, so for a site with a substantial content library, leaving it out removes one of the key levers for managing crawl efficiency.
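To make the gap concrete, the sketch below lists sitemap entries that lack lastmod and shows how an entry looks once a date is added. It is illustrative only: the real fix belongs in the CMS, the function names are hypothetical, and the dates passed in are assumed to come from the CMS's own record of when each page last changed.

```python
import xml.etree.ElementTree as ET

SITEMAP_URI = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SITEMAP_URI}

# Keep the default namespace unprefixed when serializing.
ET.register_namespace("", SITEMAP_URI)


def urls_missing_lastmod(sitemap_xml: str) -> list[str]:
    """Return the <loc> of every <url> entry with no usable <lastmod>."""
    root = ET.fromstring(sitemap_xml)
    missing = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if not lastmod:
            missing.append(loc)
    return missing


def add_lastmod(sitemap_xml: str, modified: dict[str, str]) -> str:
    """Insert <lastmod> after <loc> for URLs with a known modification date.

    `modified` maps URL -> W3C date string such as "2024-05-01".
    Entries that already carry a <lastmod> are left untouched.
    """
    root = ET.fromstring(sitemap_xml)
    for url in root.findall("sm:url", NS):
        loc_el = url.find("sm:loc", NS)
        if loc_el is None or not loc_el.text:
            continue
        loc = loc_el.text.strip()
        if url.find("sm:lastmod", NS) is None and loc in modified:
            lastmod_el = ET.Element(f"{{{SITEMAP_URI}}}lastmod")
            lastmod_el.text = modified[loc]
            # The sitemap schema expects <lastmod> directly after <loc>.
            url.insert(list(url).index(loc_el) + 1, lastmod_el)
    return ET.tostring(root, encoding="unicode")
```

The dates only help if they reflect real content changes. Stamping every URL with the generation time would recreate the original problem in a different form.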
The Fix
Three changes, in order of priority: removed the conflicting URLs from the sitemap, consolidated the 18 User-agent: * blocks into one, and flagged the lastmod gap for the development team to resolve at the CMS level.
The Takeaway
None of these issues in isolation would cause a meaningful ranking drop. Together, they represent the kind of technical debt that quietly caps performance — crawl budget wasted on ambiguous directives, a maintenance structure that doesn't scale, and a sitemap that fails to do its job as a freshness signal. CMS-generated sitemaps need human review. Automation handles volume. It doesn't handle accuracy.
Key Findings
- 3 URLs simultaneously in sitemap and blocked by robots.txt
- 18 redundant User-agent: * blocks consolidated into one
- 554 sitemap URLs auto-generated without lastmod data
- No manual cross-reference between sitemap and robots.txt was happening
- CMS-driven generation introduced silent technical debt over time
- None of the issues triggered a ranking drop — all capped efficiency