Plan and audit programmatic SEO pages generated at scale from structured data. Use when designing templates, URL systems, internal linking, quality gates, and index-bloat safeguards for pages at scale.

SKILL.md•seo-programmatic

Programmatic SEO Analysis & Planning

Build and audit SEO pages generated at scale from structured data sources. Enforces quality gates to prevent thin content penalties and index bloat.

When to Use

Use when the user wants programmatic SEO planning or review.
Use when designing templates, data-driven pages, or scalable URL systems.
Use when preventing thin content and index bloat across large page sets.

Data Source Assessment

Evaluate the data powering programmatic pages:

CSV/JSON files: Row count, column uniqueness, missing values
API endpoints: Response structure, data freshness, rate limits
Database queries: Record count, field completeness, update frequency
Data quality checks:
- Each record must have enough unique attributes to generate distinct content
- Flag duplicate or near-duplicate records (>80% field overlap)
- Verify data freshness; stale data produces stale pages

Template Engine Planning

Design templates that produce unique, valuable pages:

Variable injection points: Title, H1, body sections, meta description, schema
Content blocks: Static (shared across pages) vs dynamic (unique per page)
Conditional logic: Show/hide sections based on data availability
Supplementary content: Related items, contextual tips, user-generated content
Template review checklist:
- Each page must read as a standalone, valuable resource
- No "mad-libs" patterns (just swapping city/product names in identical text)
- Dynamic sections must add genuine information, not just keyword variations

URL Pattern Strategy

Common Patterns

/tools/[tool-name]: Tool/product directory pages
/[city]/[service]: Location + service pages
/integrations/[platform]: Integration landing pages
/glossary/[term]: Definition/reference pages
/templates/[template-name]: Downloadable template pages

URL Rules

Lowercase, hyphenated slugs derived from data
Logical hierarchy reflecting site architecture
No duplicate slugs; enforce uniqueness at generation time
Keep URLs under 100 characters
No query parameters for primary content URLs
Consistent trailing slash usage (match existing site pattern)

Internal Linking Automation

Hub/spoke model: Category hub pages linking to individual programmatic pages
Related items: Auto-link to 3-5 related pages based on data attributes
Breadcrumbs: Generate BreadcrumbList schema from URL hierarchy
Cross-linking: Link between programmatic pages sharing attributes (same category, same city, same feature)
Anchor text: Use descriptive, varied anchor text. Avoid exact-match keyword repetition
Link density: 3-5 internal links per 1000 words (match seo-content guidelines)

Thin Content Safeguards

Quality Gates

Metric	Threshold	Action
Pages without content review	100+	⚠️ WARNING: require content audit before publishing
Pages without justification	500+	🛑 HARD STOP: require explicit user approval and thin content audit
Unique content per page	<40%	❌ Flag as thin content (likely penalty risk)
Word count per page	<300	⚠️ Flag for review (may lack sufficient value)

Scaled Content Abuse: Enforcement Context (2025-2026)

Google's Scaled Content Abuse policy (introduced March 2024) saw major enforcement escalation in 2025:

June 2025: Wave of manual actions targeting websites with AI-generated content at scale
August 2025: SpamBrain spam update enhanced pattern detection for AI-generated link schemes and content farms
Result: Google reported 45% reduction in low-quality, unoriginal content in search results post-March 2024 enforcement

Enhanced quality gates for programmatic pages:

Content differentiation: ≥30-40% of content must be genuinely unique between any two programmatic pages (not just city/keyword string replacement)
Human review: Minimum 5-10% sample review of generated pages before publishing
Progressive rollout: Publish in batches of 50-100 pages. Monitor indexing and rankings for 2-4 weeks before expanding. Never publish 500+ programmatic pages simultaneously without explicit quality review.
Standalone value test: Each page should pass: "Would this page be worth publishing even if no other similar pages existed?"
Site reputation abuse: If publishing programmatic content under a high-authority domain (not your own), this may trigger site reputation abuse penalties. Google began enforcing this aggressively in November 2024.

Recommendation: The WARNING gate at <40% unique content remains appropriate. Consider a HARD STOP at <30% unique content to prevent scaled content abuse risk.

Safe Programmatic Pages (OK at scale)

✅ Integration pages (with real setup docs, API details, screenshots) ✅ Template/tool pages (with downloadable content, usage instructions) ✅ Glossary pages (200+ word definitions with examples, related terms) ✅ Product pages (unique specs, reviews, comparison data) ✅ Data-driven pages (unique statistics, charts, analysis per record)

Penalty Risk (avoid at scale)

❌ Location pages with only city name swapped in identical text ❌ "Best [tool] for [industry]" without industry-specific value ❌ "[Competitor] alternative" without real comparison data ❌ AI-generated pages without human review and unique value-add ❌ Pages where >60% of content is shared template boilerplate

Uniqueness Calculation

Unique content % = (words unique to this page) / (total words on page) × 100

Measure against all other pages in the programmatic set. Shared headers, footers, and navigation are excluded from the calculation. Template boilerplate text IS included.

Canonical Strategy

Every programmatic page must have a self-referencing canonical tag
Parameter variations (sort, filter, pagination) canonical to the base URL
Paginated series: canonical to page 1 or use rel=next/prev
If programmatic pages overlap with manual pages, the manual page is canonical
No canonical to a different domain unless intentional cross-domain setup

Sitemap Integration

Auto-generate sitemap entries for all programmatic pages
Split at 50,000 URLs per sitemap file (protocol limit)
Use sitemap index if multiple sitemap files needed
<lastmod> reflects actual data update timestamp (not generation time)
Exclude noindexed programmatic pages from sitemap
Register sitemap in robots.txt
Update sitemap dynamically as new records are added to data source

Index Bloat Prevention

Noindex low-value pages: Pages that don't meet quality gates
Pagination: Noindex paginated results beyond page 1 (or use rel=next/prev)
Faceted navigation: Noindex filtered views, canonical to base category
Crawl budget: For sites with >10k programmatic pages, monitor crawl stats in Search Console
Thin page consolidation: Merge records with insufficient data into aggregated pages
Regular audits: Monthly review of indexed page count vs intended count

Output

Programmatic SEO Score: XX/100

Assessment Summary

Category	Status	Score
Data Quality	✅/⚠️/❌	XX/100
Template Uniqueness	✅/⚠️/❌	XX/100
URL Structure	✅/⚠️/❌	XX/100
Internal Linking	✅/⚠️/❌	XX/100
Thin Content Risk	✅/⚠️/❌	XX/100
Index Management	✅/⚠️/❌	XX/100

Critical Issues (fix immediately)

High Priority (fix within 1 week)

Medium Priority (fix within 1 month)

Low Priority (backlog)

Recommendations

Data source improvements
Template modifications
URL pattern adjustments
Quality gate compliance actions

Error Handling

Scenario	Action
URL unreachable	Report connection error with status code. Suggest verifying URL accessibility and checking for authentication requirements.
No programmatic pages detected	Inform user that no template-generated or data-driven page patterns were found. Suggest checking if pages use client-side rendering or if the URL points to the correct section.
Thin content threshold exceeded	Trigger quality gate warning. Report the unique content percentage and flag pages below 40% uniqueness. Require user acknowledgment before proceeding.
Quality gate violation	Halt analysis at the HARD STOP threshold (500+ pages without justification or <30% unique content). Present findings and require explicit user approval to continue.

> seo-programmatic