> alert-optimizer

Restructure and optimize alert rules for monitoring platforms (Sentry, PagerDuty, Datadog, OpsGenie). Use when someone asks to "reduce alert noise", "fix alert fatigue", "create alert rules", "set up escalation policies", "tune alerting thresholds", or "create on-call runbooks". Generates platform-specific alert configurations and tiered escalation policies.

Alert Optimizer

Overview

This skill takes error analysis data (ideally from the error-monitoring skill) and generates optimized alert rules, severity tiers, escalation policies, and on-call runbooks. It turns a noisy alerting setup into a structured incident response system.

Instructions

1. Understand Current State

Ask for or infer:

Current monitoring platform (Sentry, Datadog, PagerDuty, etc.)
Current alert volume and on-call team size
Notification channels available (Slack, PagerDuty, email, SMS)
Any existing severity definitions

2. Define Severity Tiers

Create a three-tier model (unless the user specifies otherwise):

| Tier | Criteria | Response Time | Channel |
| --- | --- | --- | --- |
| P1 - Critical | Revenue impact, data loss, security breach, >50% users affected | Immediate page | PagerDuty/SMS |
| P2 - Warning | Degraded experience, >5% users affected, error rate spike | 1 hour | Slack channel |
| P3 - Info | Known issues, cosmetic errors, self-healing transients | Weekly review | Log only |
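
The tier criteria can be expressed as a small classifier. The following is a minimal sketch, not part of the skill itself: the `ErrorGroup` fields and the `classify` function are hypothetical names chosen for illustration, and only the 50% and 5% thresholds come from the tier table.

```python
from dataclasses import dataclass


@dataclass
class ErrorGroup:
    """Hypothetical summary of one error group from an error audit."""
    name: str
    users_affected_pct: float       # share of users hitting the error, 0-100
    revenue_impact: bool = False
    data_loss: bool = False
    security_breach: bool = False
    error_rate_spike: bool = False
    degraded_experience: bool = False


def classify(group: ErrorGroup) -> str:
    """Map an error group onto the three-tier model (P1, P2, P3)."""
    if (group.revenue_impact or group.data_loss or group.security_breach
            or group.users_affected_pct > 50):
        return "P1"
    if (group.degraded_experience or group.error_rate_spike
            or group.users_affected_pct > 5):
        return "P2"
    return "P3"


payment = ErrorGroup("PaymentProcessingError", users_affected_pct=2.0, revenue_impact=True)
redis = ErrorGroup("RedisTimeout", users_affected_pct=0.5)
print(classify(payment), classify(redis))  # P1 P3
```

Checking P1 conditions first means a revenue-impacting error stays P1 even when few users are affected, which matches how Example 1 treats PaymentProcessingError.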

3. Generate Alert Rules

For each error group, produce a platform-specific alert configuration:

Sentry: JSON alert rule with conditions, filters, and actions
Datadog: Monitor definition with query, thresholds, and notification targets
PagerDuty: Event rules with severity mapping and escalation policy
Generic: Webhook payload template with routing logic
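
For the generic case, a webhook payload template with routing logic might look like the sketch below. The payload field names (`title`, `notify`, `dedup_key`) and the `ROUTES` table are assumptions made for illustration, not a schema defined by any of the platforms above; the channels per tier follow the severity table.

```python
import json

# Hypothetical routing table: severity tier -> notification channels.
# Channel labels follow the tier table; they are placeholders, not real targets.
ROUTES = {
    "P1": ["pagerduty", "sms"],
    "P2": ["slack"],
    "P3": [],  # log only, no notification
}


def build_webhook_payload(error_type: str, severity: str, summary: str) -> dict:
    """Build a platform-neutral alert payload with routing decided by severity."""
    if severity not in ROUTES:
        raise ValueError(f"unknown severity: {severity}")
    return {
        "title": f"{severity}: {summary}",
        "error_type": error_type,
        "severity": severity,
        "notify": ROUTES[severity],
        # A stable key lets the receiver collapse repeats of the same root cause.
        "dedup_key": f"{severity}:{error_type}",
    }


payload = build_webhook_payload("PaymentProcessingError", "P1", "Payment Processing Failure")
print(json.dumps(payload, indent=2))
```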

4. Create Escalation Policies

Define who gets notified and when:

P1: On-call engineer immediately → team lead after 10 min → engineering manager after 30 min
P2: Post to team Slack channel → on-call acknowledges within 1 hour
P3: Aggregated weekly digest
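
The escalation timings above can be written down as data and queried. This is a rough sketch under one assumption the policy does not state explicitly: that the 10 and 30 minute steps are counted from when the alert fires and apply while it remains unacknowledged. The `ESCALATION` table and `notified_so_far` function are hypothetical names.

```python
# Escalation steps per tier as (minutes after the alert fires, who is notified).
# Timings mirror the policy above; the recipient labels are placeholders.
ESCALATION = {
    "P1": [(0, "on-call engineer"), (10, "team lead"), (30, "engineering manager")],
    "P2": [(0, "team Slack channel")],
    "P3": [],  # handled by the weekly digest, nobody is notified per event
}


def notified_so_far(severity: str, minutes_unacknowledged: int) -> list[str]:
    """Return everyone who should have been notified by now for an unacknowledged alert."""
    return [who for delay, who in ESCALATION[severity] if minutes_unacknowledged >= delay]


print(notified_so_far("P1", 12))  # ['on-call engineer', 'team lead']
```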

5. Generate Runbooks

For each P1 alert, create a runbook with:

What: One-sentence description of the alert
Why it matters: Business impact
Diagnose: First 3 steps to investigate
Fix: Common resolutions
Escalate: When and to whom
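
One way to keep runbooks uniform is to render them from a fixed structure. The sketch below assumes a hypothetical `Runbook` dataclass and `render` helper; the five fields are the ones listed above, and the Markdown layout loosely follows Example 2.

```python
from dataclasses import dataclass


@dataclass
class Runbook:
    """The five runbook fields, plus a title."""
    title: str
    what: str
    why_it_matters: str
    diagnose: list[str]   # ordered investigation steps
    fix: list[str]        # common resolutions
    escalate: list[str]   # when and to whom


def render(rb: Runbook) -> str:
    """Render a runbook as Markdown, capping Diagnose at the first three steps."""
    lines = [
        f"## RUNBOOK: {rb.title}",
        "",
        f"**What**: {rb.what}",
        f"**Why it matters**: {rb.why_it_matters}",
        "",
        "### Diagnose",
        *[f"{i}. {step}" for i, step in enumerate(rb.diagnose[:3], start=1)],
        "",
        "### Fix",
        *[f"- {item}" for item in rb.fix],
        "",
        "### Escalate",
        *[f"- {item}" for item in rb.escalate],
    ]
    return "\n".join(lines)


# Placeholder content purely for illustration.
example = Runbook(
    title="Example Alert",
    what="Placeholder description of the alert",
    why_it_matters="Placeholder business impact",
    diagnose=["step one", "step two", "step three", "step four"],
    fix=["placeholder fix"],
    escalate=["placeholder escalation rule"],
)
print(render(example))
```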

Examples

Example 1: Sentry Alert Rules

Input: Error audit showing PaymentProcessingError as P1, RedisTimeout as P3.

Output:

{
  "rules": [
    {
      "name": "P1: Payment Processing Failure",
      "conditions": [
        {"id": "sentry.rules.conditions.event_frequency", "value": 1, "interval": "5m"}
      ],
      "filters": [
        {"id": "sentry.rules.filters.tagged_event", "key": "error_type", "value": "PaymentProcessingError"}
      ],
      "actions": [
        {"id": "sentry.integrations.pagerduty.notify_action", "service": "engineering-oncall"},
        {"id": "sentry.integrations.slack.notify_action", "channel": "#incidents", "tags": "environment,transaction"}
      ],
      "frequency": 5
    },
    {
      "name": "P3: Redis Connection Timeout (Log Only)",
      "conditions": [
        {"id": "sentry.rules.conditions.event_frequency", "value": 100, "interval": "1h"}
      ],
      "filters": [
        {"id": "sentry.rules.filters.tagged_event", "key": "error_type", "value": "RedisTimeout"}
      ],
      "actions": [],
      "frequency": 1440
    }
  ]
}

Example 2: On-Call Runbook Entry

Input: P1 alert for database deadlocks.

Output:

## RUNBOOK: Database Deadlock Detected

**Alert**: DatabaseError: deadlock detected in order processing
**Severity**: P1 — Revenue impact, orders fail silently
**Response**: Immediate — acknowledge within 5 minutes

### Diagnose
1. List active sessions, oldest query first: `SELECT * FROM pg_stat_activity WHERE state = 'active' ORDER BY query_start;`
2. Identify blocked queries (lock requests not yet granted): `SELECT * FROM pg_locks WHERE NOT granted;`
3. Check if deployment happened in last 30 min (new query pattern?)

### Common Fixes
- Kill the blocking query: `SELECT pg_terminate_backend(<pid>);`
- If recurring post-deploy: roll back the last migration
- If load-related: enable connection pooling or add a read replica

### Escalate
- If not resolved in 15 min → page team lead
- If data corruption suspected → page engineering manager + DB admin

Guidelines

Always deduplicate alert rules — one root cause should trigger one alert, not five
Set reasonable frequency caps: a P1 alert should re-fire at most once every 5 minutes, a P3 alert at most once a day
Include "auto-resolve" rules where appropriate (e.g., resolve the alert when the error rate drops back below the threshold)
Runbooks should be copy-pasteable: include actual commands, not pseudocode
When the user has fewer than 3 people on call, simplify escalation to two tiers
Ask the user to dry-run each configuration before applying it
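
The deduplication and frequency-cap guidelines lend themselves to a quick lint pass over generated rules. This is a minimal sketch, assuming rules shaped like the Sentry example (a `name` prefixed with the tier, a `filters` list, and `frequency` in minutes). Treating rules with identical filters as the same root cause is a simplifying assumption, and `dedupe_rules` and `frequency_violations` are hypothetical helpers.

```python
# Minimum minutes between re-fires per tier, taken from the frequency-cap guideline.
MIN_FREQUENCY_MIN = {"P1": 5, "P3": 1440}


def dedupe_rules(rules: list[dict]) -> list[dict]:
    """Keep the first rule for each distinct set of filters (one root cause, one alert)."""
    seen = set()
    unique = []
    for rule in rules:
        key = tuple(sorted(
            (str(f.get("key")), str(f.get("value"))) for f in rule.get("filters", [])
        ))
        if key not in seen:
            seen.add(key)
            unique.append(rule)
    return unique


def frequency_violations(rules: list[dict]) -> list[str]:
    """Return names of rules that may re-fire more often than their tier allows."""
    bad = []
    for rule in rules:
        tier = rule["name"].split(":", 1)[0].strip()  # assumes names start with "P1:", "P3:", ...
        floor = MIN_FREQUENCY_MIN.get(tier)
        if floor is not None and rule.get("frequency", 0) < floor:
            bad.append(rule["name"])
    return bad


rules = [
    {"name": "P1: Payment Processing Failure", "frequency": 5,
     "filters": [{"key": "error_type", "value": "PaymentProcessingError"}]},
    {"name": "P1: Payment Failure (duplicate)", "frequency": 1,
     "filters": [{"key": "error_type", "value": "PaymentProcessingError"}]},
    {"name": "P3: Redis Connection Timeout (Log Only)", "frequency": 1440,
     "filters": [{"key": "error_type", "value": "RedisTimeout"}]},
]
print([r["name"] for r in dedupe_rules(rules)])
print(frequency_violations(rules))  # ['P1: Payment Failure (duplicate)']
```

Rules with no filters all share the same empty key here, so a real check would likely need a richer notion of root cause than filter equality.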

stats: installs/wk 0 · github stars 76 · first seen Mar 17, 2026
repo: TerminalSkills/skills, by TerminalSkills
tags: #monitoring