Meta Tag Optimizer

Technical documentation of the Meta Tag Optimizer: a PHP-based URL crawler, a multi-provider AI architecture with rule-based fallback, and generated output for Title, Description, Keywords, Open Graph, JSON-LD and Robots.

v1.0 · 7 AI providers · 6 output types · PHP · Vanilla JS

About the Meta Tag Optimizer

The Meta Tag Optimizer crawls any URL server-side and extracts the full page content: the H1, the H2 headings, paragraphs, existing meta tags, JSON-LD blocks and body text. This content is passed to an AI provider (or a rule-based fallback), which generates optimized meta tags based on what the page is actually about, not just its existing metadata.

The tool supports manual input as an alternative to URL crawling. All fields are editable before generation, allowing overrides for title, keywords, description and page type. The AI provider is fully swappable via a single config line – from no AI to Gemini, Claude, GPT-4o, Perplexity, Grok or any OpenAI-compatible endpoint.

Tool scope

  • Input methods: URL crawl or manual entry
  • Crawler: PHP cURL, up to 5 redirects, 10s timeout
  • Content extraction: H1, H2s, paragraphs, body text (4,000 chars), JSON-LD, existing tags
  • AI providers: 7 (none/rule-based, Anthropic, OpenAI, Google, Perplexity, Grok, OwnAI)
  • Output types: Title, Description, Keywords, Open Graph, Twitter Card, JSON-LD, Robots
  • Rate limiting: File-based, per IP, configurable window
  • Deployment: Single PHP directory, no framework, no database

Technical details

Crawler (Crawler.php)

  • Protocol: HTTP/HTTPS via PHP cURL
  • Redirects: Up to 5 hops (CURLOPT_FOLLOWLOCATION)
  • Timeout: 10 seconds
  • SSL: CURLOPT_SSL_VERIFYPEER enabled
  • User-Agent: MetaTagOptimizer/1.0
  • Content limit: 50,000 characters (configurable)
  • Parser: PHP DOMDocument + DOMXPath
  • Encoding: Input converted to HTML-ENTITIES via mb_convert_encoding (this target encoding is deprecated as of PHP 8.2)
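
The crawler settings listed above map fairly directly onto cURL options. The following is a minimal sketch under stated assumptions, not the code in Crawler.php: the function names crawlerCurlOptions and fetchHtml are hypothetical, restricting protocols through CURLOPT_PROTOCOLS is an assumed reading of "HTTP/HTTPS", and the content limit is applied here as a plain character cut.

```php
<?php

/**
 * Build cURL options matching the documented crawler settings.
 * Hypothetical helper; names and defaults are illustrative.
 */
function crawlerCurlOptions(int $maxRedirects = 5, int $timeoutSeconds = 10): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects ...
        CURLOPT_MAXREDIRS      => $maxRedirects,   // ... but stop after this many hops
        CURLOPT_TIMEOUT        => $timeoutSeconds, // total time allowed for the request
        CURLOPT_SSL_VERIFYPEER => true,            // reject invalid TLS certificates
        CURLOPT_USERAGENT      => 'MetaTagOptimizer/1.0',
        // Assumption: only HTTP and HTTPS are allowed at the protocol level.
        CURLOPT_PROTOCOLS      => CURLPROTO_HTTP | CURLPROTO_HTTPS,
    ];
}

/**
 * Fetch a URL and return at most $contentLimit characters of the response body.
 */
function fetchHtml(string $url, int $contentLimit = 50000): string
{
    $handle = curl_init($url);
    if ($handle === false) {
        throw new RuntimeException('Could not initialise cURL');
    }
    curl_setopt_array($handle, crawlerCurlOptions());

    $body = curl_exec($handle);
    if ($body === false) {
        throw new RuntimeException('Fetch failed: ' . curl_error($handle));
    }

    return mb_substr($body, 0, $contentLimit);
}
```

A call such as fetchHtml('https://example.com/') would return the (possibly truncated) HTML, or throw if the request fails or times out.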

Content extraction

  • Title: <title> tag
  • Meta tags: description, keywords, author, robots, canonical
  • Open Graph: og:title, og:description, og:image, og:type
  • Twitter Card: twitter:card, twitter:title, twitter:description
  • H1: First <h1> element
  • H2s: Up to 5 subheadings
  • Paragraphs: Up to 6 paragraphs longer than 80 characters
  • JSON-LD: All <script type="application/ld+json"> blocks
  • Body text: Stripped of nav/header/footer/scripts
  • Page type: Auto-detected from JSON-LD or URL pattern
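
The extraction rules above can be sketched with DOMDocument and DOMXPath. This is an illustration under assumptions rather than the tool's actual code: extractPageData and detectPageType are hypothetical names, the URL patterns used for page type detection are invented for the example, and mb_encode_numericentity is swapped in for the HTML-ENTITIES conversion so the sketch runs without deprecation notices on PHP 8.2 and later.

```php
<?php

/** Hypothetical helper: pull the documented fields out of an HTML string. */
function extractPageData(string $html, int $bodyLimit = 4000): array
{
    // Swapped-in technique: encode non-ASCII characters as numeric entities
    // so DOMDocument does not misread UTF-8 input.
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom = new DOMDocument();
    $dom->loadHTML($encoded);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($dom);

    // Collapse whitespace and tolerate missing nodes.
    $clean = static fn (?DOMNode $n): string =>
        $n === null ? '' : trim((string) preg_replace('/\s+/u', ' ', $n->textContent));

    // Collect up to $max non-empty text values longer than $minLength characters.
    $collect = static function (string $query, int $max, int $minLength = 0) use ($xpath, $clean): array {
        $values = [];
        foreach ($xpath->query($query) as $node) {
            $value = $clean($node);
            if (mb_strlen($value) > $minLength && count($values) < $max) {
                $values[] = $value;
            }
        }
        return $values;
    };

    // Read JSON-LD before the script elements are removed below.
    $jsonLd = [];
    foreach ($xpath->query('//script[@type="application/ld+json"]') as $node) {
        $decoded = json_decode($node->textContent, true);
        if (is_array($decoded)) {
            $jsonLd[] = $decoded;
        }
    }

    $data = [
        'title'       => $clean($xpath->query('//title')->item(0)),
        'description' => $clean($xpath->query('//meta[@name="description"]/@content')->item(0)),
        'h1'          => $clean($xpath->query('//h1')->item(0)),
        'h2s'         => $collect('//h2', 5),
        'paragraphs'  => $collect('//p', 6, 80),
        'json_ld'     => $jsonLd,
    ];

    // Strip navigation, header, footer and scripts before reading the body text.
    foreach (iterator_to_array($xpath->query('//nav | //header | //footer | //script')) as $node) {
        $node->parentNode->removeChild($node);
    }
    $data['body'] = mb_substr($clean($xpath->query('//body')->item(0)), 0, $bodyLimit);

    return $data;
}

/** Hypothetical helper: JSON-LD @type first, then an assumed URL pattern. */
function detectPageType(array $jsonLd, string $url): string
{
    foreach ($jsonLd as $block) {
        if (isset($block['@type']) && is_string($block['@type'])) {
            return $block['@type'];
        }
    }
    // Assumption: these path segments indicate an article.
    return preg_match('#/(blog|news|article)s?/#i', $url) === 1 ? 'Article' : 'WebPage';
}
```

One caveat of any entity-based conversion: entities inside script elements are not decoded by the HTML parser, so JSON-LD containing non-ASCII characters may need separate handling.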

AI provider system

  • Interface: ProviderInterface with generate(string): string
  • Factory: ProviderFactory::create($config) – single config line switch
  • none: Rule-based, no API key required
  • anthropic: Claude via /v1/messages
  • openai: GPT-4o via /v1/chat/completions
  • google: Gemini via generateContent API
  • perplexity: OpenAI-compatible endpoint
  • grok: xAI via api.x.ai
  • ownai: Any OpenAI-compatible custom endpoint
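
To illustrate the interface and factory described above, here is a minimal sketch. Only ProviderInterface, its generate(string): string method and ProviderFactory::create($config) come from this documentation; the class names NullProvider and OpenAiCompatibleProvider, the config keys provider, api_key, model and endpoint, and the choice to return an empty string for "none" are assumptions made for the example. Only three of the seven provider settings are shown.

```php
<?php

interface ProviderInterface
{
    /** Takes the prompt text and returns the provider's raw answer. */
    public function generate(string $prompt): string;
}

/** Hypothetical stand-in for "none": an empty answer signals "use the rules". */
final class NullProvider implements ProviderInterface
{
    public function generate(string $prompt): string
    {
        return '';
    }
}

/** Hypothetical client for any endpoint that speaks the OpenAI chat format. */
final class OpenAiCompatibleProvider implements ProviderInterface
{
    private string $endpoint;
    private string $apiKey;
    private string $model;

    public function __construct(string $endpoint, string $apiKey, string $model)
    {
        $this->endpoint = $endpoint;
        $this->apiKey = $apiKey;
        $this->model = $model;
    }

    /** Request body for a chat completion with a single user message. */
    public function buildPayload(string $prompt): array
    {
        return [
            'model'    => $this->model,
            'messages' => [['role' => 'user', 'content' => $prompt]],
        ];
    }

    public function generate(string $prompt): string
    {
        $handle = curl_init($this->endpoint);
        curl_setopt_array($handle, [
            CURLOPT_POST           => true,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT        => 30, // assumed value
            CURLOPT_HTTPHEADER     => [
                'Content-Type: application/json',
                'Authorization: Bearer ' . $this->apiKey,
            ],
            CURLOPT_POSTFIELDS     => json_encode($this->buildPayload($prompt)),
        ]);
        $raw = curl_exec($handle);
        if ($raw === false) {
            throw new RuntimeException('Provider request failed: ' . curl_error($handle));
        }
        $decoded = json_decode($raw, true);

        return (string) ($decoded['choices'][0]['message']['content'] ?? '');
    }
}

final class ProviderFactory
{
    /** Pick a provider from the config array; key names are assumptions. */
    public static function create(array $config): ProviderInterface
    {
        $key = (string) ($config['api_key'] ?? '');
        switch ($config['provider'] ?? 'none') {
            case 'none':
                return new NullProvider();
            case 'openai':
                return new OpenAiCompatibleProvider(
                    'https://api.openai.com/v1/chat/completions',
                    $key,
                    (string) ($config['model'] ?? 'gpt-4o')
                );
            case 'ownai':
                return new OpenAiCompatibleProvider(
                    (string) ($config['endpoint'] ?? ''),
                    $key,
                    (string) ($config['model'] ?? '')
                );
            default:
                throw new InvalidArgumentException('Unknown provider: ' . $config['provider']);
        }
    }
}
```

With a layout like this, switching providers would come down to changing one value, for example 'provider' => 'ownai' together with an endpoint, in config.php.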

Generator (MetaTagGenerator.php)

  • Rule-based title: Strips domain suffix, trims to 60 chars
  • Rule-based description: First complete sentences up to 155 chars
  • AI prompt: 4,000 chars of page content + H1, H2s, keywords
  • AI output: JSON with title, description, keywords, suggestions
  • Fallback: Rule-based if AI response is unparseable
  • OG type: article or website based on page type
  • JSON-LD type: Article, Product, Organization or WebPage
  • Robots: 3 variants – standard, AI open, AI block
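
The rule-based path and the fallback on unparseable AI output could look roughly like the sketch below. It is not taken from MetaTagGenerator.php. The function names are hypothetical, "domain suffix" is read here as a trailing " | Site" or " - Site" segment, cutting the title at a word boundary is an added nicety, and the 'source' field exists only to make the fallback visible.

```php
<?php

/** Strip an assumed trailing site segment and trim to $maxLength characters. */
function ruleBasedTitle(string $rawTitle, int $maxLength = 60): string
{
    // Assumption: the suffix is the last " | X", " - X" or " – X" segment.
    $title = trim((string) preg_replace('/\s+[|\-–]\s+[^|\-–]+$/u', '', $rawTitle));
    if ($title === '') {
        $title = trim($rawTitle);
    }
    if (mb_strlen($title) <= $maxLength) {
        return $title;
    }
    // Cut at the limit, then back off to the last space to avoid half words.
    $cut = mb_substr($title, 0, $maxLength);
    $lastSpace = mb_strrpos($cut, ' ');

    return rtrim($lastSpace !== false ? mb_substr($cut, 0, $lastSpace) : $cut);
}

/** Join leading complete sentences while the result stays within $maxLength. */
function ruleBasedDescription(string $text, int $maxLength = 155): string
{
    $sentences = preg_split('/(?<=[.!?])\s+/u', trim($text)) ?: [];
    $description = '';
    foreach ($sentences as $sentence) {
        $candidate = $description === '' ? $sentence : $description . ' ' . $sentence;
        if (mb_strlen($candidate) > $maxLength) {
            break;
        }
        $description = $candidate;
    }
    // If even the first sentence is too long, fall back to a hard cut.
    return $description !== '' ? $description : mb_substr(trim($text), 0, $maxLength);
}

/** Use the AI's JSON answer when it parses, otherwise fall back to the rules. */
function tagsFromAiResponse(string $aiResponse, array $page): array
{
    $decoded = json_decode($aiResponse, true);
    if (is_array($decoded) && isset($decoded['title'], $decoded['description'])) {
        return [
            'title'       => (string) $decoded['title'],
            'description' => (string) $decoded['description'],
            'keywords'    => $decoded['keywords'] ?? [],
            'suggestions' => $decoded['suggestions'] ?? [],
            'source'      => 'ai',
        ];
    }

    return [
        'title'       => ruleBasedTitle((string) ($page['title'] ?? '')),
        'description' => ruleBasedDescription((string) ($page['body'] ?? '')),
        'keywords'    => [], // keywords are not generated without an AI provider
        'suggestions' => [],
        'source'      => 'rules',
    ];
}
```

For example, ruleBasedTitle('Pricing and Plans | Example.com') yields 'Pricing and Plans', and any AI answer that is not valid JSON with a title and description leads to the rule-based result.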

API (api.php)

  • Method: POST, JSON body
  • Action crawl: Fetches URL, returns page data + provider info
  • Action generate: Takes page data + overrides, returns all tags
  • CORS: Origin-restricted to deploying domain
  • Rate limiting: File-based per IP, 10 requests per 60 seconds (configurable)
  • Error handling: JSON error responses with HTTP status codes
  • Response format: {ok, data} or {ok, error}
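
A file-based, per-IP rate limiter and the response envelope could be sketched as follows. This is one possible reading, not the code in api.php: allowRequest and jsonEnvelope are hypothetical names, the sliding-window logic and the hashed file name are assumptions, and the envelope assumes ok is true alongside data and false alongside error.

```php
<?php

/**
 * Return true if this IP may make another request, recording the hit if so.
 * Keeps one JSON file of timestamps per IP inside $dir (must be writable).
 */
function allowRequest(string $ip, string $dir, int $limit = 10, int $window = 60, ?int $now = null): bool
{
    $now = $now ?? time();
    // Hash the IP so the file name is safe for IPv6 addresses as well.
    $file = rtrim($dir, '/') . '/' . hash('sha256', $ip) . '.json';

    $handle = fopen($file, 'c+'); // create if missing, do not truncate
    if ($handle === false) {
        throw new RuntimeException('Rate limit directory is not writable');
    }
    flock($handle, LOCK_EX); // serialise concurrent requests from the same IP

    $stored = json_decode((string) stream_get_contents($handle), true);
    $hits = is_array($stored) ? $stored : [];
    // Keep only timestamps that still fall inside the window.
    $hits = array_values(array_filter(
        $hits,
        static fn ($t) => is_int($t) && $t > $now - $window
    ));

    $allowed = count($hits) < $limit;
    if ($allowed) {
        $hits[] = $now;
    }

    ftruncate($handle, 0);
    rewind($handle);
    fwrite($handle, json_encode($hits));
    flock($handle, LOCK_UN);
    fclose($handle);

    return $allowed;
}

/** Build the {ok, data} or {ok, error} response body as a JSON string. */
function jsonEnvelope(bool $ok, $payload): string
{
    return json_encode(
        $ok ? ['ok' => true, 'data' => $payload] : ['ok' => false, 'error' => $payload]
    );
}
```

A request handler might call allowRequest($_SERVER['REMOTE_ADDR'], __DIR__ . '/tmp') first and, when it returns false, answer with an error envelope and a status such as 429 Too Many Requests; which status code the tool actually sends is not stated here.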

Frontend

  • JavaScript: Vanilla ES6+, no framework
  • Two-step flow: Crawl → auto-generate, or manual entry → generate
  • Mode toggle: "Crawl URL" / "Manual Input" with active state
  • Override fields: Title, keywords, description, page type
  • Char counters: Live feedback for title (30–60 characters) and description (120–160 characters)
  • Tabs: Title & Desc, Open Graph, Twitter Card, JSON-LD, Robots
  • Copy buttons: Per output block, clipboard API
  • AI badge: Shows active provider name when AI is enabled
  • Syntax highlighting: Inline HTML tag coloring in output

File structure

  • api.php: Request handler, rate limiter, CORS
  • config.php: Provider, API key, model, crawler settings
  • Crawler.php: URL fetch + DOM content extraction
  • Generator.php: Rule-based + AI tag generation (MetaTagGenerator)
  • providers/: ProviderInterface, Factory, 6 provider classes
  • index.html: Frontend (one file per language/deployment)
  • tmp/: Rate limit JSON files (needs write permission)

Known limitations (v1.0)

  • JS-rendered pages: No headless browser – content rendered only by JavaScript is not crawled
  • Login-protected pages: No authentication support
  • AI response time: 5–15s depending on provider and model
  • Rule-based keywords: Not generated; keywords require an AI provider
  • Rate limiting: File-based only, no distributed cache
  • JSON-LD output: Simplified schema, not full structured data audit
  • llms.txt: Removed from output – separate tool on ai-ready-check.de

Built by Sören Meier, 2026
Technical implementation: Crawler.php · MetaTagGenerator.php · ProviderFactory.php | Deployed on Alpine Linux with lighttpd.