XML · Sitemap Validation
Astro · Cloudflare Workers · XML Parsing · Technical SEO

Sitemap Validator Tool Build

A sitemap validator built to answer the question most generic validators skip: not just whether the XML exists and parses, but whether the sitemap lists URLs that look crawlable, canonical, and useful for search discovery.

Index · Sitemap Index Detection
25 · Sample URLs Per Check
Signals · Robots, Noindex, Canonicals
Open the sitemap validator

The Problem

A sitemap can be valid XML and still be bad SEO. It can list redirected URLs, noindex pages, blocked paths, duplicate canonical variants, or a stale set of URLs that no longer represents the site.

That distinction matters because many sitemap validators stop at parseability. In real audits, the more useful question is whether the sitemap is helping search engines discover the right pages.

The Build

I built the sitemap validator as a Cloudflare-backed technical SEO tool. Users can paste either a direct sitemap URL or a homepage. If they paste a homepage, the tool checks robots.txt for Sitemap: declarations first, then falls back to the conventional /sitemap.xml location.
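A minimal sketch of that discovery order, in Worker-style TypeScript. The function name and input handling are illustrative assumptions, not the production code:

```ts
// Hypothetical helper: resolve which sitemap(s) to validate
// from a pasted homepage URL.
async function discoverSitemaps(homepage: string): Promise<string[]> {
  // Assumes the input has already been normalized to an absolute URL.
  const origin = new URL(homepage).origin;

  // 1. Prefer Sitemap: declarations in robots.txt.
  const robots = await fetch(`${origin}/robots.txt`);
  if (robots.ok) {
    const declared = (await robots.text())
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.toLowerCase().startsWith("sitemap:"))
      .map((line) => line.slice("sitemap:".length).trim());
    if (declared.length > 0) return declared;
  }

  // 2. Fall back to the conventional default location.
  return [`${origin}/sitemap.xml`];
}
```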

The Worker fetches the sitemap, detects whether it is a URL set or a sitemap index, counts entries, validates key fields, and samples the listed URLs for HTTP status, redirects, robots.txt conflicts, noindex tags, and canonical mismatches.
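As a sketch of the detection step, assuming the Worker uses lightweight regex extraction rather than a full XML parser (Workers do not ship a DOMParser by default; the names here are illustrative):

```ts
// Regex-based sketch: fine for well-formed sitemaps, though a strict
// XML parser would be needed for hostile or malformed input.
type SitemapKind = "urlset" | "sitemapindex" | "unknown";

function classifySitemap(xml: string): { kind: SitemapKind; locs: string[] } {
  const kind: SitemapKind = /<sitemapindex[\s>]/i.test(xml)
    ? "sitemapindex"
    : /<urlset[\s>]/i.test(xml)
      ? "urlset"
      : "unknown";

  // <loc> entries are child sitemaps in an index, page URLs in a urlset.
  const locs = [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/gi)].map((m) =>
    m[1].trim(),
  );

  return { kind, locs };
}
```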

The Product Decisions

The validator is intentionally transparent about its limits. It does not claim that a valid sitemap guarantees indexing. Instead, it frames sitemap quality as a discovery signal: useful when clean, noisy when full of URLs search engines should not crawl or index.

The sampling control gives users a practical tradeoff. A small sample is fast for a quick check. A larger sample catches more patterns when the user is reviewing a launch, migration, or cleanup project.
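One way to implement that control is an evenly spaced sample, so even a small request budget touches the start, middle, and end of the list. A sketch with hypothetical names:

```ts
// Hypothetical sampler: evenly spaced picks so a small sample still
// covers the start, middle, and end of a large sitemap.
function sampleUrls(locs: string[], sampleSize: number): string[] {
  if (locs.length <= sampleSize) return locs;
  const step = locs.length / sampleSize;
  return Array.from({ length: sampleSize }, (_, i) => locs[Math.floor(i * step)]);
}

// e.g. sampleUrls(allLocs, 25) caps a default check at 25 sub-requests.
```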

The Takeaway

This tool turns a messy audit pattern into a repeatable workflow. It connects sitemap availability, XML structure, URL behavior, robots.txt, noindex, and canonicals into one focused report.

Paired with the robots.txt checker, it creates a practical crawl-discovery diagnostic suite: what the site asks crawlers to find, what crawlers are allowed to request, and where those signals conflict.

Built as a public portfolio asset and as a practical utility: the page has to earn trust twice, once as a usable SEO tool and once as proof that the underlying engineering choices were deliberate.

What I Built

  • Astro tool page with sitemap input and sample-size control
  • Cloudflare Worker endpoint for sitemap discovery and validation
  • XML sitemap and sitemap index detection
  • URL sampling for HTTP status, redirects, robots conflicts, noindex, and canonicals (sketched after this list)
  • Findings renderer with severity labels
  • Educational page copy explaining sitemap limits and interpretation
  • FAQ schema, breadcrumbs, OG image, sitemap entry, and share strip
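
For the per-URL sampling step listed above, a hedged sketch of what one check might look like. The interface and regexes are illustrative simplifications; the production Worker also reconciles robots.txt rules, which are omitted here:

```ts
interface UrlFinding {
  url: string;
  status: number;
  redirectedTo?: string;
  noindex: boolean;
  canonicalMismatch: boolean;
}

async function checkUrl(url: string): Promise<UrlFinding> {
  // redirect: "manual" reports a 301/302 instead of silently following it.
  const res = await fetch(url, { redirect: "manual" });
  const redirectedTo = res.headers.get("location") ?? undefined;
  const html = res.ok ? await res.text() : "";

  // Simplified extraction: assumes attribute order and ignores the
  // X-Robots-Tag response header.
  const noindex = /<meta[^>]+name=["']robots["'][^>]*noindex/i.test(html);
  const canonical = html.match(
    /<link[^>]+rel=["']canonical["'][^>]*href=["']([^"']+)["']/i,
  )?.[1];

  return {
    url,
    status: res.status,
    redirectedTo,
    noindex,
    canonicalMismatch: canonical !== undefined && canonical !== url,
  };
}
```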

More from the tool suite

  • Crawler Rules · Robots.txt Checker Tool Build: a focused crawl-rule tester with user-agent matching and plain-English rule explanations. See the work →
  • Schema · Schema Generator Tool Build: a schema audit and JSON-LD generator powered by Cloudflare Workers and the Claude API. See the work →
  • Technical SEO · Sitemap/Robots Conflicts: a real audit where sitemap and robots.txt signals contradicted each other. See the work →

Need a technical SEO tool or audit workflow built?

I build practical SEO systems that do one useful job clearly, then wire them into the site, schema, analytics, and conversion path around them.

Book Your SEO Health Check →
← Back to all case studies

Want work like this?

Whether you need a technical audit, a public-facing tool, or a workflow that turns messy SEO judgment into a repeatable system, I would love to hear what you are building.

Send a message