CMS Detection System

Technical documentation of the server-side CMS detection system. Covers architecture, detection channels, scoring model and known limitations.

v0.8 Beta · 41 CMS signatures · 15 detection channels · Node.js · No dependencies

About the CMS Detection System

The CMS Detection System analyses websites using up to 15 independent detection channels and identifies the content management system, framework or website builder in use. Detection runs entirely server-side – no browser automation or external services required.

The analysis combines passive signals (HTTP headers, cookies, meta tags) with active probing (path checks, DNS resolution, feed fetching, favicon hashing) and evaluates all matches in a weighted scoring model. The result is a confidence value (Very likely / Likely / Possible) and – where determinable – the exact CMS version.

Detection scope

  • CMS signatures: 41 systems (open source, SaaS, headless, static)
  • Detection channels: 15 independent methods per domain
  • CDN domain signals: ~50 known asset domains
  • DNS fingerprints: 17 SaaS systems via CNAME resolution
  • Version extraction: 5 sources (meta, feed, header, regex, comments)
  • Status: v0.8 Beta – production-ready architecture, detection not yet validated against a test corpus

Technical details

Detection channels (15)

  • DNS / CNAME: Weight 70 – strongest signal
  • Meta Generator: Weight 60
  • HTTP Header: Weight 55
  • X-Powered-By / Server: Weight 55
  • Cookie: Weight 45
  • Path Probe (HEAD): Weight 40
  • Favicon Hash (MD5): Weight 40
  • Feed Generator: Weight 35 (×2 with generator tag)
  • JS Variables: Weight 35
  • CDN Domain Signal: Weight 30
  • HTML Attribute (<html>): Weight 30
  • robots.txt / sitemap.xml: Weight 25
  • Script Tags (src): Weight 25
  • 404 Error-Page Fingerprint: Weight 20
  • Link Tags (href): Weight 20
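
The weights above can be read as a simple lookup table. The following is a minimal sketch, not the tool's actual code: the channel keys and the baseScore helper are hypothetical names, and it assumes that each channel counts at most once per CMS candidate. Only the numeric weights and the doubling of the feed weight come from the list above.

```javascript
// Channel weights as listed above. The key names are hypothetical;
// only the numeric weights come from the documentation.
const CHANNEL_WEIGHTS = new Map([
  ['dns', 70],
  ['metaGenerator', 60],
  ['httpHeader', 55],
  ['poweredBy', 55],
  ['cookie', 45],
  ['pathProbe', 40],
  ['faviconHash', 40],
  ['feedGenerator', 35],
  ['jsVariables', 35],
  ['cdnDomain', 30],
  ['htmlAttribute', 30],
  ['robotsSitemap', 25],
  ['scriptTags', 25],
  ['errorPage404', 20],
  ['linkTags', 20],
]);

// Sum the weights of the channels that matched for one CMS candidate.
// Assumption: each channel counts at most once per candidate.
function baseScore(matchedChannels, { feedHasGeneratorTag = false } = {}) {
  let score = 0;
  for (const channel of new Set(matchedChannels)) {
    const weight = CHANNEL_WEIGHTS.get(channel);
    if (weight === undefined) throw new Error(`Unknown channel: ${channel}`);
    // The feed channel counts double when a <generator> tag was found.
    const doubled = channel === 'feedGenerator' && feedHasGeneratorTag;
    score += doubled ? weight * 2 : weight;
  }
  return score;
}
```

Under these assumptions, a candidate matched by the meta generator and a cookie would start with a base score of 60 + 45 = 105 before any bonuses are applied.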

Scoring model

  • Confidence High: Score ≥ 140, or score ≥ 90 with 4 channels
  • Confidence High (DNS): Score ≥ 70 with 3 channels
  • Confidence Medium: Score ≥ 45, or score ≥ 30 with 2 channels
  • Confidence Low: Below the medium threshold
  • Multi-channel bonus: +30 (≥2), +60 (≥3), +90 (≥4 channels)
  • HTML pattern bonus: +10 for 3 or more matches, +20 for 4 or more
  • Negative indicators: Excluded when score ratio > 3:1
  • Feed generator bonus: Double weight for <generator> tag
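
The thresholds and bonuses above can be sketched as a small classification function. This is an illustration under stated assumptions, not the tool's implementation: the function and field names are hypothetical, bonuses are assumed to be added to the base score before the thresholds are checked, channel counts are read as minimums, only the highest applicable bonus tier is applied, and the DNS rule is assumed to require a DNS match. Negative indicators are left out.

```javascript
// Multi-channel bonus: only the highest applicable tier is applied (assumption).
function multiChannelBonus(channelCount) {
  if (channelCount >= 4) return 90;
  if (channelCount >= 3) return 60;
  if (channelCount >= 2) return 30;
  return 0;
}

// HTML pattern bonus: +10 from 3 matches, +20 from 4 matches.
function htmlPatternBonus(patternMatches) {
  if (patternMatches >= 4) return 20;
  if (patternMatches >= 3) return 10;
  return 0;
}

// Map a candidate to a confidence level using the documented thresholds.
// Assumption: bonuses are added before the thresholds are evaluated.
function classify({ baseScore, channelCount, htmlPatternMatches = 0, hasDnsMatch = false }) {
  const score =
    baseScore + multiChannelBonus(channelCount) + htmlPatternBonus(htmlPatternMatches);

  const high =
    score >= 140 ||
    (score >= 90 && channelCount >= 4) ||
    (hasDnsMatch && score >= 70 && channelCount >= 3);
  const medium = score >= 45 || (score >= 30 && channelCount >= 2);

  const confidence = high ? 'high' : medium ? 'medium' : 'low';
  return { score, confidence };
}
```

With these assumptions, a base score of 105 from two channels becomes 135 after the +30 bonus and is classified as medium, because it stays below 140 and has fewer than four channels.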

CMS coverage (41 systems)

  • Open source CMS: WordPress, Joomla, Drupal, TYPO3, Contao
  • E-commerce: Shopify, Magento, WooCommerce, PrestaShop, OpenCart, OXID
  • SaaS builders: Wix, Squarespace, Webflow, Ghost, Jimdo, Sitejet, HubSpot CMS, Weebly, Framer
  • Headless / API-first: Storyblok, Contentful, Sanity, Strapi, Builder.io, Prismic
  • Static site generators: Hugo, Jekyll, Eleventy, Gatsby, Next.js, Nuxt.js
  • PHP frameworks: Laravel, Symfony
  • Enterprise CMS: Pimcore, Neos, Craft CMS, Sitecore
  • Community / forum: WoltLab, phpBB
  • Other: Mono

Fetch & network

  • Protocol: HTTP/1.1 + HTTPS, native Node.js
  • Redirect handling: Up to 5 hops (301/302/303/307/308)
  • HTTP→HTTPS fallback: Automatic on connection error
  • Body limit: 600 KB per page
  • Main page timeout: 10 seconds
  • Path check timeout: 4 seconds (HEAD)
  • Parallel HEAD requests: Max. 8 concurrent
  • DNS resolution: Parallel to main fetch
  • Feed paths: 7 candidates, tried sequentially
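
The path checks above (HEAD requests, 4-second timeout, at most 8 in flight) could be sketched as follows. Both function names are hypothetical, and the worker-pool approach is one possible way to cap concurrency with built-in modules only; the source does not state how the limit is actually enforced.

```javascript
const https = require('https');

// Send a single HEAD request and resolve with the HTTP status code.
// The request is aborted when the socket stays idle for `timeoutMs`.
function headStatus(url, timeoutMs = 4000) {
  return new Promise((resolve, reject) => {
    const req = https.request(url, { method: 'HEAD', timeout: timeoutMs }, (res) => {
      res.resume(); // HEAD responses carry no body; release the socket
      resolve(res.statusCode);
    });
    req.on('timeout', () => req.destroy(new Error('HEAD request timed out')));
    req.on('error', reject);
    req.end();
  });
}

// Run async tasks with at most `limit` in flight at once.
// `tasks` is an array of functions that each return a promise.
// Failures are recorded per task instead of rejecting the whole batch.
async function runWithLimit(tasks, limit = 8) {
  const results = new Array(tasks.length);
  let next = 0;

  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await tasks[index]() };
      } catch (error) {
        results[index] = { ok: false, error };
      }
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

A caller would pass one task per candidate path, for example paths.map((p) => () => headStatus(`https://${domain}${p}`)), where paths and domain are placeholders for the caller's own values.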

Version extraction (5 sources)

  • Signature regex: CMS-specific patterns in source code
  • Meta generator: content attribute with version number
  • Feed generator tag: <generator> in RSS/Atom
  • HTTP header: X-Powered-By / Server with version
  • HTML comments: Inline version strings (e.g. WordPress, TYPO3)
  • WordPress-specific: ?ver= parameter in script URLs
  • Priority: First found source wins per CMS
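
The "first found source wins" rule can be illustrated with an ordered list of extractors. This sketch covers only two of the sources listed above; the regular expressions, the extractor order and the extractVersion name are assumptions made for illustration, not the tool's actual signatures.

```javascript
// Ordered extractors: the first one that returns a version wins.
// The regular expressions are illustrative, not the tool's signatures.
const VERSION_EXTRACTORS = [
  {
    source: 'meta-generator',
    extract: ({ html = '' }) => {
      const m =
        /<meta[^>]+name=["']generator["'][^>]+content=["'][^"']*?(\d+(?:\.\d+)+)/i.exec(html);
      return m ? m[1] : null;
    },
  },
  {
    source: 'feed-generator',
    extract: ({ feed = '' }) => {
      const m = /<generator[^>]*>[^<]*?(\d+(?:\.\d+)+)/i.exec(feed);
      return m ? m[1] : null;
    },
  },
];

// Return the first version found together with the source that produced it,
// or null when no extractor matches.
function extractVersion(input) {
  for (const { source, extract } of VERSION_EXTRACTORS) {
    const version = extract(input);
    if (version) return { version, source };
  }
  return null;
}
```

The patterns require at least one dot in the version string, so a generator value such as "TYPO3 CMS" is not mistaken for version 3.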

CDN domain signals

  • Sources: src and href attributes and CSS url() references, collected from the entire HTML
  • Covered services: ~20 systems with ~50 domain patterns
  • Shopify: cdn.shopify.com, shopifycloud.com
  • Wix: wixstatic.com, parastorage.com
  • Squarespace: squarespace-cdn.com, sqspcdn.com
  • Contentful: ctfassets.net (images, assets, downloads)
  • Sanity: cdn.sanity.io
  • Builder.io: cdn.builder.io
  • Minimum matches: 1 domain sufficient (exclusive domains)
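
A minimal sketch of the CDN domain check, assuming a regex-based scan of the raw HTML rather than a full parser. The domain table contains only the examples listed above; the function names and the extraction pattern are hypothetical.

```javascript
// Domain patterns taken from the list above (a subset of the ~50 patterns).
const CDN_DOMAINS = {
  Shopify: ['cdn.shopify.com', 'shopifycloud.com'],
  Wix: ['wixstatic.com', 'parastorage.com'],
  Squarespace: ['squarespace-cdn.com', 'sqspcdn.com'],
  Contentful: ['ctfassets.net'],
  Sanity: ['cdn.sanity.io'],
  'Builder.io': ['cdn.builder.io'],
};

// Collect hostnames referenced via src, href and CSS url() in raw HTML.
function extractHosts(html) {
  const hosts = new Set();
  const pattern =
    /(?:\b(?:src|href)\s*=\s*["']?|url\(\s*["']?)((?:https?:)?\/\/[^\s"'<>)]+)/gi;
  for (const match of html.matchAll(pattern)) {
    try {
      // Protocol-relative URLs (//host/path) need a base to be parsed.
      const url = new URL(match[1], 'https://placeholder.invalid');
      hosts.add(url.hostname.toLowerCase());
    } catch {
      // Ignore values that are not parseable URLs.
    }
  }
  return hosts;
}

// True for the domain itself and for any of its subdomains.
function matchesDomain(host, domain) {
  return host === domain || host.endsWith(`.${domain}`);
}

// One matching domain is sufficient to report a signal for a system.
function detectCdnSignals(html) {
  const hosts = [...extractHosts(html)];
  const hits = [];
  for (const [cms, domains] of Object.entries(CDN_DOMAINS)) {
    const matched = domains.filter((d) => hosts.some((h) => matchesDomain(h, d)));
    if (matched.length > 0) hits.push({ cms, domains: matched });
  }
  return hits;
}
```

Matching on the full label boundary (the host equals the domain or ends with a dot plus the domain) avoids treating a lookalike such as notwixstatic.com as a Wix asset domain.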

DNS fingerprinting

  • Method: CNAME resolution via Node.js dns.promises
  • Timing: Parallel to main fetch (no additional latency)
  • Covered systems: 17 SaaS platforms
  • Shopify: myshopify.com, shopify.com
  • Webflow: proxy.webflow.com, webflow.io
  • Netlify / Vercel: netlify.com, vercel-dns.com
  • WordPress.com: wordpress.com, wpcomstaging.com
  • Weight: 70 points – strongest single signal
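
The CNAME lookup can be sketched with dns.promises.resolveCname from the Node.js standard library. The fingerprint table below is a subset of the systems listed above; the function names, the return shape and the injectable resolver parameter are assumptions made for illustration and testing.

```javascript
const dns = require('dns');

// CNAME suffixes taken from the list above (a subset of the 17 systems).
const CNAME_FINGERPRINTS = {
  Shopify: ['myshopify.com', 'shopify.com'],
  Webflow: ['proxy.webflow.com', 'webflow.io'],
  'WordPress.com': ['wordpress.com', 'wpcomstaging.com'],
};

// Return the system whose suffix matches the CNAME target, or null.
function matchCname(cname) {
  const host = cname.toLowerCase().replace(/\.$/, ''); // drop a trailing root dot
  for (const [system, suffixes] of Object.entries(CNAME_FINGERPRINTS)) {
    if (suffixes.some((s) => host === s || host.endsWith(`.${s}`))) return system;
  }
  return null;
}

// Resolve the CNAME records of a domain and match them against the table.
// `resolveCname` is injectable so the logic can be exercised without network.
async function detectByDns(domain, resolveCname = (host) => dns.promises.resolveCname(host)) {
  let records;
  try {
    records = await resolveCname(domain);
  } catch {
    return null; // no CNAME record or lookup failure: no DNS signal
  }
  for (const record of records) {
    const system = matchCname(record);
    if (system) return { system, cname: record, weight: 70 };
  }
  return null;
}
```

Because the returned promise does not depend on the HTTP response, a caller can start detectByDns(domain) before the main fetch and await both together, which matches the "parallel to main fetch" timing described above.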

Backend

  • Runtime: Node.js (CommonJS)
  • Dependencies: No external npm packages
  • Modules: https, http, crypto, dns (all built-in)
  • Parallelisation: Promise.all for all 14 checks
  • Code size: ~1,500 lines (cms-detector.js)
  • API endpoint: /api/cms-detect?domain=
  • Response format: JSON (detectedCMS, confidence, details, version)
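
Running the checks through Promise.all means a single rejected promise would reject the whole batch. One common way to handle this, shown here as a sketch and not necessarily what cms-detector.js does, is to wrap each check so that it always resolves. The helper names safe and runChecks and the result shape are hypothetical.

```javascript
// Wrap a check so that one failing channel cannot reject the whole batch.
function safe(name, check) {
  return async (context) => {
    try {
      return { name, matches: await check(context) };
    } catch (error) {
      return { name, matches: [], error: error.message };
    }
  };
}

// Start all checks at once and wait until every one has settled.
// Results keep the order of the `checks` array.
function runChecks(checks, context) {
  return Promise.all(checks.map((check) => check(context)));
}
```

Each wrapped check receives the same context object (for example the fetched HTML and response headers) and reports either its matches or an error message, so a failing channel simply contributes nothing to the score.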

Frontend

  • JavaScript: Vanilla ES6+, no framework
  • CSS: Injected via design system variables
  • Icons: Inline SVG (Lucide), currentColor
  • Progress display: 14 animated steps with icons
  • Result cards: Score bars, channel tags, confidence badge
  • Version badge: Inline next to CMS name
  • Indicators: Collapsible list per CMS card
  • Enter key: Supported

Known limitations (v0.8)

  • No test corpus: Score thresholds not calibrated
  • Favicon hashes: Database not yet verified
  • Cloudflare: Masks headers, IPs and some CNAMEs
  • No caching: Every domain is fetched fresh
  • Version detection: Reliably tested for only about 5 systems
  • CDN patterns: Unvalidated, possible false positives
  • No TLS fingerprinting: Stage 4 still pending
  • Path to v1.0: 200+ test domains, precision ≥ 90%

Built by Sören Meier, 2026
Technical implementation: cms-detector.js v4 / cms-detection.js v4 | Experimental system – results are provided without warranty.