> canva-incident-runbook
Execute Canva Connect API incident response with triage, mitigation, and postmortem. Use when responding to Canva-related outages, investigating API errors, or running post-incident reviews for Canva integration failures. Trigger with phrases like "canva incident", "canva outage", "canva down", "canva on-call", "canva emergency", "canva broken".
Canva Incident Runbook
Overview
Rapid incident response for Canva Connect API integration failures. Covers triage, mitigation, escalation, and postmortem.
Quick Triage (First 5 Minutes)
```bash
#!/bin/bash
# canva-triage.sh: run immediately when an incident is detected
: "${CANVA_ACCESS_TOKEN:?set CANVA_ACCESS_TOKEN to a known-good token}"
echo "=== Canva Triage ==="
# 1. Is it Canva or us?
echo -n "Canva API: "
curl -s -o /dev/null -w "HTTP %{http_code} (%{time_total}s)\n" \
  -H "Authorization: Bearer $CANVA_ACCESS_TOKEN" \
  "https://api.canva.com/rest/v1/users/me"
# 2. Check our health endpoint
echo -n "Our health: "
curl -s -o /dev/null -w "HTTP %{http_code}\n" \
  "https://api.ourapp.com/health"
# 3. Error rate (if Prometheus is available)
#    -G + --data-urlencode encodes the PromQL; a raw "[5m]" in the URL
#    would be treated as a curl URL glob and rejected
echo "Error rate (5min):"
curl -sG "http://localhost:9090/api/v1/query" \
  --data-urlencode 'query=rate(canva_api_errors_total[5m])' \
  | python3 -c "import sys,json; d=json.load(sys.stdin); print(d['data']['result'])" 2>/dev/null \
  || echo "Prometheus not available"
# 4. Rate limit status
echo -n "Rate limit remaining: "
curl -sD - -o /dev/null -H "Authorization: Bearer $CANVA_ACCESS_TOKEN" \
  "https://api.canva.com/rest/v1/designs?limit=1" 2>&1 \
  | grep -i "x-ratelimit-remaining" || echo "unknown"
```
Decision Tree
```text
API returning errors?
├── YES → What HTTP status?
│   ├── 401 → Token expired → Refresh token, check rotation
│   ├── 403 → Scope issue → Verify integration permissions
│   ├── 429 → Rate limited → Enable backoff, check Retry-After
│   ├── 5xx → Canva outage → Enable fallback, monitor status page
│   └── Other → Check request format against API docs
└── NO → Is our integration healthy?
    ├── YES → Likely resolved or intermittent → Monitor
    └── NO → Check our infra (pods, memory, DNS, TLS)
```
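The decision tree can also be encoded as a lookup so a triage script prints the next step for whatever status it observed. This is a minimal sketch. The function name `canva_next_action` is hypothetical. It assumes the status comes from curl's `%{http_code}`, which prints `000` when no HTTP response arrived at all.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an observed HTTP status to the decision-tree action.
canva_next_action() {
  local status="$1"
  case "$status" in
    2??) echo "No API error -> check whether our integration is healthy" ;;
    401) echo "Token expired -> refresh token, check rotation" ;;
    403) echo "Scope issue -> verify integration permissions" ;;
    429) echo "Rate limited -> enable backoff, check Retry-After" ;;
    5??) echo "Canva outage -> enable fallback, monitor status page" ;;
    # curl reports 000 when it got no HTTP response (DNS, TLS, network)
    000) echo "No response -> check our infra (DNS, TLS, network)" ;;
    *)   echo "Other -> check request format against API docs" ;;
  esac
}
```

Usage: `canva_next_action "$(curl -s -o /dev/null -w '%{http_code}' -H "Authorization: Bearer $CANVA_ACCESS_TOKEN" https://api.canva.com/rest/v1/users/me)"`.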
Severity Levels
| Level | Definition | Response Time | Example |
|---|---|---|---|
| P1 | All design operations broken | < 15 min | All API calls returning 5xx |
| P2 | Degraded — some operations fail | < 1 hour | Exports failing, designs work |
| P3 | Minor — non-critical feature down | < 4 hours | Webhooks delayed |
| P4 | No user impact | Next business day | Monitoring gap |
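The severity table can be applied mechanically once you have failure counts for the incident window. This sketch assumes hypothetical inputs: the number of failed core design operations, the total number of core operations, and the number of failed non-critical operations such as webhooks. It encodes only the table's definitions and adds no thresholds of its own.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pick a severity level per the table above.
# Args: core_failed core_total noncritical_failed (counts over the same window)
canva_severity() {
  local core_failed="$1" core_total="$2" noncritical_failed="$3"
  if (( core_total > 0 && core_failed == core_total )); then
    echo "P1 (<15 min)"            # all design operations broken
  elif (( core_failed > 0 )); then
    echo "P2 (<1 hour)"            # degraded: some operations fail
  elif (( noncritical_failed > 0 )); then
    echo "P3 (<4 hours)"           # non-critical feature down
  else
    echo "P4 (next business day)"  # no user impact
  fi
}
```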
Immediate Mitigation by Error Type
401 — Token Expired / Revoked
```bash
# Check whether the token is valid
curl -s -H "Authorization: Bearer $CANVA_ACCESS_TOKEN" \
  "https://api.canva.com/rest/v1/users/me" | python3 -m json.tool
# If expired: refresh all affected users' tokens
# If revoked: users must re-authorize via the OAuth flow
```
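Refreshing an expired token is a standard OAuth 2.0 `refresh_token` grant. This is a minimal sketch. The token endpoint URL, HTTP Basic client authentication, and the `CANVA_CLIENT_ID` / `CANVA_CLIENT_SECRET` variable names are assumptions, so check them against Canva's current Connect authentication docs before using it in an incident.

```bash
#!/usr/bin/env bash
# Sketch of an OAuth 2.0 refresh_token grant.
# ASSUMED: endpoint URL, Basic client auth, and env var names; verify against Canva docs.
CANVA_TOKEN_URL="${CANVA_TOKEN_URL:-https://api.canva.com/rest/v1/oauth/token}"

canva_refresh_token() {
  local refresh_token="$1"
  # Prints the JSON token response. If refresh tokens are rotated, persist
  # the new refresh_token from this response immediately; the old one may
  # stop working.
  curl -s -X POST "$CANVA_TOKEN_URL" \
    -u "${CANVA_CLIENT_ID}:${CANVA_CLIENT_SECRET}" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    --data-urlencode "grant_type=refresh_token" \
    --data-urlencode "refresh_token=${refresh_token}"
}
```

Usage: `canva_refresh_token "$REFRESH_TOKEN" | python3 -m json.tool`, where `REFRESH_TOKEN` is the affected user's stored refresh token.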
429 — Rate Limited
```bash
# Check how long to wait
curl -sD - -o /dev/null -H "Authorization: Bearer $CANVA_ACCESS_TOKEN" \
  "https://api.canva.com/rest/v1/designs" 2>&1 \
  | grep -i "retry-after"
# Immediate: reduce request rate
# Enable queue-based rate limiting
```
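"Enable backoff, check Retry-After" can be sketched as follows. The sketch prefers the server's `Retry-After` value and otherwise falls back to capped exponential backoff (1, 2, 4, ... up to 60 seconds). The cap, the attempt limit, and both function names are illustrative assumptions. It also assumes `Retry-After` is given in seconds; an HTTP-date value is treated as absent.

```bash
#!/usr/bin/env bash
# Seconds to wait before retry number $1 (0-based); $2 is the Retry-After value, if any.
canva_backoff_delay() {
  local attempt="$1" retry_after="${2:-}" delay=1 cap=60 i
  if [[ "$retry_after" =~ ^[0-9]+$ ]]; then
    echo "$retry_after"   # server-provided delay wins
    return
  fi
  # Double the delay per attempt, stopping early once the cap is reached
  for (( i = 0; i < attempt && delay < cap; i++ )); do
    delay=$(( delay * 2 ))
  done
  (( delay > cap )) && delay=$cap
  echo "$delay"
}

# Hypothetical probe: request $1 until it stops returning 429, up to $2 attempts.
# Prints the final HTTP status.
canva_get_with_backoff() {
  local url="$1" max_attempts="${2:-5}" attempt status retry_after headers
  headers=$(mktemp)
  for (( attempt = 0; attempt < max_attempts; attempt++ )); do
    : > "$headers"   # start each attempt with an empty header file
    status=$(curl -s -o /dev/null -D "$headers" -w '%{http_code}' \
      -H "Authorization: Bearer ${CANVA_ACCESS_TOKEN:-}" "$url")
    if [[ "$status" != 429 ]]; then
      rm -f "$headers"; echo "$status"; return 0
    fi
    retry_after=$(grep -i '^retry-after:' "$headers" | awk '{print $2}' | tr -d '\r')
    sleep "$(canva_backoff_delay "$attempt" "$retry_after")"
  done
  rm -f "$headers"; echo 429; return 1
}
```

Honoring `Retry-After` first avoids retrying sooner than the server asked. The cap keeps one bad hour of 429s from turning into multi-minute sleeps.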
5xx — Canva Service Error
```bash
# There is no official status.canva.com page for the Connect API, so
# check the Canva developer community for reported outages
# Enable graceful degradation:
#   - return cached data where possible
#   - show "Design features temporarily unavailable" to users
```
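"Return cached data where possible" can be done with a read-through cache. This is a minimal sketch: the function name, the cache-file path argument, and the exit-code convention (0 = fresh, 2 = stale cache, 1 = nothing to serve) are hypothetical. Because it uses `curl -f`, any HTTP error of 400 or above or any network failure falls back to the cache, not only 5xx.

```bash
#!/usr/bin/env bash
# Hypothetical read-through cache for graceful degradation during Canva outages.
# Exit codes (assumed convention): 0 = fresh data, 2 = cached data, 1 = nothing to serve.
canva_fetch_with_fallback() {
  local url="$1" cache_file="$2" body
  # -f: curl exits non-zero on HTTP >= 400 and on network errors
  if body=$(curl -sf -H "Authorization: Bearer ${CANVA_ACCESS_TOKEN:-}" "$url"); then
    # Write via temp file + mv so readers never see a half-written cache
    printf '%s\n' "$body" > "$cache_file.tmp" && mv "$cache_file.tmp" "$cache_file"
    printf '%s\n' "$body"
    return 0
  fi
  if [[ -s "$cache_file" ]]; then
    echo "WARN: Canva API unavailable, serving cached data" >&2
    cat "$cache_file"
    return 2
  fi
  echo "Design features temporarily unavailable" >&2
  return 1
}
```

Returning a distinct code for stale data lets the caller keep serving content while still showing the "temporarily unavailable" banner.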
Communication Templates
Internal (Slack)
```text
P[1-4] INCIDENT: Canva Integration
Status: INVESTIGATING | MITIGATING | RESOLVED
Impact: [Describe user impact]
API Response: HTTP [status code]
Current action: [What you're doing]
Next update: [Time]
IC: @[name]
```
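Filling the internal template by hand is error-prone mid-incident, so it can be rendered from arguments and posted to a Slack incoming webhook. This is a sketch under stated assumptions: the argument order and both function names are hypothetical, and `SLACK_WEBHOOK_URL` is assumed to hold an incoming-webhook URL that accepts a JSON body with a `text` field.

```bash
#!/usr/bin/env bash
# Hypothetical renderer for the internal incident template.
# Args: severity status impact http_code action next_update ic_handle
canva_incident_message() {
  local sev="$1" status="$2" impact="$3" http_code="$4" action="$5" next="$6" ic="$7"
  printf 'P%s INCIDENT: Canva Integration\n' "$sev"
  printf 'Status: %s\nImpact: %s\nAPI Response: HTTP %s\n' "$status" "$impact" "$http_code"
  printf 'Current action: %s\nNext update: %s\nIC: @%s\n' "$action" "$next" "$ic"
}

# Reads the message on stdin and posts it to the (assumed) SLACK_WEBHOOK_URL.
post_to_slack() {
  local payload
  # json.dumps escapes quotes and newlines so the payload stays valid JSON
  payload=$(python3 -c 'import json,sys; print(json.dumps({"text": sys.stdin.read()}))')
  curl -s -X POST -H 'Content-Type: application/json' --data "$payload" "$SLACK_WEBHOOK_URL"
}
```

Usage: `canva_incident_message 1 INVESTIGATING "All design calls failing" 503 "Enabling fallback" "14:30 UTC" alice | post_to_slack`.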
External (Status Page)
```text
Canva Design Features — Degraded Performance
We are experiencing issues with our design integration.
Users may see delays or errors when creating/exporting designs.
We are actively working with our design platform provider to resolve this.
Last updated: [ISO 8601 timestamp]
```
Post-Incident
Evidence Collection
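```bash
# Collect logs for the incident window (--tail=-1: with a label selector, kubectl defaults to 10 lines per pod)
kubectl logs -l app=canva-integration --since=2h --tail=-1 --prefix > incident-canva-logs.txt
# Export metrics for the last 2 hours (7200 s) at 60 s resolution
end=$(date +%s); start=$((end - 7200))
curl -s "http://localhost:9090/api/v1/query_range?query=canva_api_errors_total&start=$start&end=$end&step=60" > metrics.json
```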
Postmortem Template
```markdown
## Incident: Canva API [Error Type]
**Date:** YYYY-MM-DD HH:MM UTC
**Duration:** X hours Y minutes
**Severity:** P[1-4]

### Summary
[1-2 sentence description]

### Timeline (UTC)
- HH:MM — [First alert / error detected]
- HH:MM — [Investigation started]
- HH:MM — [Root cause identified]
- HH:MM — [Mitigation applied]
- HH:MM — [Confirmed resolved]

### Root Cause
[Was it Canva-side or our integration? Token issue? Rate limit? Code bug?]

### Impact
- Users affected: N
- Failed operations: N designs / N exports

### Action Items
- [ ] [Preventive measure] — Owner — Due date
```
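The template's "Duration" field can be computed from the first-alert and confirmed-resolved timestamps in the timeline. This is a minimal sketch. The function name is hypothetical, and it assumes GNU coreutils `date`, since BSD/macOS `date` does not support `-d`. Inputs are interpreted as UTC because of `-u`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: "X hours Y minutes" between two UTC timestamps.
# Requires GNU date (the -d flag); inputs like "2024-05-01 14:05".
incident_duration() {
  local start_s end_s mins
  start_s=$(date -u -d "$1" +%s) || return 1
  end_s=$(date -u -d "$2" +%s) || return 1
  mins=$(( (end_s - start_s) / 60 ))
  printf '%d hours %d minutes\n' $(( mins / 60 )) $(( mins % 60 ))
}
```

Usage: `incident_duration "2024-05-01 14:05" "2024-05-01 16:32"`.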
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Can't determine if Canva is down | No status page API | Test with known-good token |
| Token refresh fails | Revoked integration | Re-authorize user |
| All users affected | Integration-level issue | Check client credentials |
| Single user affected | User-level token issue | Refresh that user's token |
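The first and last three rows can be told apart by probing `/users/me` with a known-good token and with an affected user's token, then comparing the results. This is a sketch. `canva_probe`, `canva_scope`, and the `KNOWN_GOOD_TOKEN` / `USER_TOKEN` variables are hypothetical names, and the classification mirrors the table above rather than any official Canva guidance.

```bash
#!/usr/bin/env bash
# Hypothetical probe: print the HTTP status of /users/me for token $1 (000 = no response).
canva_probe() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $1" "https://api.canva.com/rest/v1/users/me"
}

# Classify scope from (known-good token status, affected token status).
canva_scope() {
  local known_good="$1" affected="$2"
  if [[ "$known_good" == 2?? && "$affected" == 2?? ]]; then
    echo "no-error"
  elif [[ "$known_good" == 2?? ]]; then
    echo "user-level"          # only this user's token fails: refresh it
  elif [[ "$known_good" == 5?? || "$known_good" == 000 ]]; then
    echo "canva-or-network"    # even a good token fails: Canva outage or our network
  else
    echo "integration-level"   # e.g. known-good token also 401/403: check client credentials
  fi
}
```

Usage: `canva_scope "$(canva_probe "$KNOWN_GOOD_TOKEN")" "$(canva_probe "$USER_TOKEN")"`.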
Next Steps
For data handling, see canva-data-handling.