> clade-incident-runbook
Respond to Anthropic API incidents in production: outages, degraded performance, error spikes, and rate limit issues. Use when working with incident-runbook patterns. Trigger with "anthropic down", "claude outage", "anthropic incident", "claude not responding", "anthropic 529".
curl "https://skillshub.wtf/jeremylongshore/claude-code-plugins-plus-skills/clade-incident-runbook?format=md"
Anthropic Incident Runbook
Overview
Respond to Anthropic API incidents in production — outages, sustained 529 errors, authentication failures, and timeouts. Covers status page checking, severity classification, model fallback activation, communication, and post-incident review.
Step 1: Confirm the Issue
# Check Anthropic status
curl -s https://status.anthropic.com/api/v2/status.json | python3 -c "
import json, sys
d = json.load(sys.stdin)
print(f\"Status: {d['status']['description']} ({d['status']['indicator']})\")"
# Test API directly
curl -s -w "\nHTTP %{http_code} in %{time_total}s\n" \
https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{"model":"claude-haiku-4-5-20251001","max_tokens":5,"messages":[{"role":"user","content":"ping"}]}'
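The status check above prints the `description` and `indicator` fields from `status.json`. A minimal sketch of turning that payload into a yes/no incident signal inside a service might look like the following. `summarizeStatus` and `StatusPayload` are hypothetical names, and the sketch assumes that an `indicator` of `"none"` means all systems are operational; verify that against a live response before relying on it.

```typescript
// Only the fields that the curl check above reads from status.json.
interface StatusPayload {
  status: { indicator: string; description: string };
}

// Hypothetical helper: decide whether the status page reports a problem.
// Assumption: "none" means operational; any other indicator is treated as an incident.
function summarizeStatus(payload: StatusPayload): { incident: boolean; line: string } {
  const { indicator, description } = payload.status;
  return {
    incident: indicator !== "none",
    line: `Status: ${description} (${indicator})`,
  };
}

// Example payload with illustrative values, not a captured response.
const sample: StatusPayload = {
  status: { indicator: "none", description: "All Systems Operational" },
};
console.log(summarizeStatus(sample).line);
```

Treating every non-`"none"` value as an incident keeps the helper conservative; a team that wants to ignore minor degradations would need to branch on the specific indicator values.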
Step 2: Classify Severity
| Symptom | Severity | Action |
|---|---|---|
| 529 overloaded (intermittent) | Low | SDK auto-retries handle this |
| 529 overloaded (sustained 5+ min) | Medium | Switch to fallback model |
| 401/403 on all requests | High | API key issue — check console |
| All requests timing out | High | Check status page, activate fallback |
| Status page shows incident | Varies | Follow status page updates |
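The table above can be encoded as a small function so that alerting code and humans apply the same rules. This is a sketch under stated assumptions: the `Symptoms` shape and the `classifySeverity` name are hypothetical, and the final catch-all branch (anything the table does not list) is an illustrative default, not part of the runbook.

```typescript
type Severity = "low" | "medium" | "high";

// Hypothetical summary of what monitoring observed during the incident window.
interface Symptoms {
  statusCode: number | null;   // dominant failing HTTP status, or null when requests time out
  sustainedMinutes: number;    // how long the symptom has persisted
  affectsAllRequests: boolean; // true when every request fails the same way
}

function classifySeverity(s: Symptoms): { severity: Severity; action: string } {
  // All requests timing out.
  if (s.statusCode === null && s.affectsAllRequests) {
    return { severity: "high", action: "Check status page, activate fallback" };
  }
  // 401/403 on all requests.
  if ((s.statusCode === 401 || s.statusCode === 403) && s.affectsAllRequests) {
    return { severity: "high", action: "API key issue: check console" };
  }
  // 529 overloaded: sustained (5+ minutes) versus intermittent.
  if (s.statusCode === 529) {
    return s.sustainedMinutes >= 5
      ? { severity: "medium", action: "Switch to fallback model" }
      : { severity: "low", action: "SDK auto-retries handle this" };
  }
  // Assumption: symptoms outside the table default to low and defer to the status page.
  return { severity: "low", action: "Follow status page updates" };
}

console.log(classifySeverity({ statusCode: 529, sustainedMinutes: 7, affectsAllRequests: false }));
```

The 5-minute threshold comes straight from the table; everything else about how symptoms are measured is left to your monitoring setup.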
Step 3: Activate Fallback
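import Anthropic from '@anthropic-ai/sdk';

// Reads ANTHROPIC_API_KEY from the environment by default.
const client = new Anthropic();

async function callWithFallback(params: Anthropic.MessageCreateParams) {
  try {
    return await client.messages.create(params);
  } catch (err) {
    // Only downgrade on overload (529) or server error (500).
    if (err instanceof Anthropic.APIError && (err.status === 529 || err.status === 500)) {
      // Try the next model down: opus -> sonnet -> haiku
      if (params.model.includes('opus')) {
        return await client.messages.create({ ...params, model: 'claude-sonnet-4-20250514' });
      }
      if (params.model.includes('sonnet')) {
        return await client.messages.create({ ...params, model: 'claude-haiku-4-5-20251001' });
      }
    }
    // No fallback available, or a different kind of error: surface it to the caller.
    throw err;
  }
}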
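The fallback function above attempts a single downgrade. One possible variation, sketched here, keeps the same opus to sonnet to haiku order as data so the full chain can be computed and logged before an incident. The names `nextFallbackModel` and `fallbackChain` are hypothetical, `claude-opus-example` is a placeholder ID rather than a real model name, and the two fallback IDs are the ones used in the runbook's own example.

```typescript
// Fallback order taken from the example above: opus -> sonnet -> haiku.
const FALLBACK_BY_FAMILY: Array<[string, string]> = [
  ["opus", "claude-sonnet-4-20250514"],
  ["sonnet", "claude-haiku-4-5-20251001"],
];

// Returns the next model to try, or null when no cheaper fallback is configured.
function nextFallbackModel(model: string): string | null {
  for (const [family, fallback] of FALLBACK_BY_FAMILY) {
    if (model.includes(family)) return fallback;
  }
  return null;
}

// Lists every model that would be tried after the given one, in order.
function fallbackChain(model: string): string[] {
  const chain: string[] = [];
  let current = nextFallbackModel(model);
  // The includes() check guards against accidental cycles in the table.
  while (current !== null && !chain.includes(current)) {
    chain.push(current);
    current = nextFallbackModel(current);
  }
  return chain;
}

// "claude-opus-example" is a placeholder, not a real model ID.
console.log(fallbackChain("claude-opus-example"));
```

Keeping the order in one table means a model retirement only requires editing the data, not the control flow.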
Step 4: Communicate
- Update your status page if user-facing
- Note: Anthropic incidents typically resolve in 15-60 minutes
Step 5: Post-Incident
- Check your error logs for the incident window
- Calculate impact (failed requests, user impact)
- Verify all systems recovered
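The impact calculation in the checklist above can be sketched as a pure function over request logs. The `RequestLog` shape is an assumption about what your logging captures (timestamp, HTTP status, user ID), and counting any status of 400 or above as a failure is an illustrative choice that you may want to narrow to 5xx and 529.

```typescript
// Hypothetical log record; adapt the fields to your own logging schema.
interface RequestLog {
  timestamp: number; // epoch milliseconds
  status: number;    // HTTP status observed for the request
  userId: string;
}

interface Impact {
  total: number;         // requests inside the incident window
  failed: number;        // requests with status >= 400
  errorRate: number;     // failed / total, or 0 when the window is empty
  affectedUsers: number; // distinct users with at least one failure
}

function calculateImpact(logs: RequestLog[], startMs: number, endMs: number): Impact {
  const inWindow = logs.filter((l) => l.timestamp >= startMs && l.timestamp <= endMs);
  const failures = inWindow.filter((l) => l.status >= 400);
  const affectedUsers = new Set(failures.map((l) => l.userId)).size;
  return {
    total: inWindow.length,
    failed: failures.length,
    errorRate: inWindow.length === 0 ? 0 : failures.length / inWindow.length,
    affectedUsers,
  };
}

// Illustrative data only, not real incident logs.
const exampleLogs: RequestLog[] = [
  { timestamp: 100, status: 200, userId: "a" },
  { timestamp: 150, status: 529, userId: "b" },
  { timestamp: 160, status: 529, userId: "b" },
  { timestamp: 170, status: 500, userId: "c" },
  { timestamp: 900, status: 529, userId: "d" },
];
console.log(calculateImpact(exampleLogs, 100, 200));
```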
Output
- Incident confirmed via status page and direct API test
- Severity classified (Low/Medium/High) based on symptoms
- Fallback activated if needed (downgrade model or queue requests)
- Impact assessed and documented post-incident
Error Handling
| Error | Cause | Solution |
|---|---|---|
| API Error | Check error type and status code | See clade-common-errors |
Examples
See Step 1 (curl status check and API test), Step 2 (severity classification table), Step 3 (fallback code with model downgrade), and Step 5 (post-incident checklist) above.
Resources
Next Steps
See clade-reliability-patterns for building resilient integrations.
Prerequisites
- Production Claude integration deployed
- Fallback model configuration in place (see clade-reliability-patterns)
- Monitoring/alerting configured (see clade-observability)
Instructions
Step 1: Review the patterns in this runbook
Each section contains production-ready code examples. Copy and adapt them to your use case.
Step 2: Apply to your codebase
Integrate the patterns that match your requirements. Test each change individually.
Step 3: Verify
Run your test suite to confirm the integration works correctly.