> enterprise-agent-ops

Operate long-lived agent workloads with observability, security boundaries, and lifecycle management.

Enterprise Agent Ops

Use this skill for cloud-hosted or continuously running agent systems that need operational controls beyond what a single CLI session provides.

Operational Domains

  1. runtime lifecycle (start, pause, stop, restart)
  2. observability (logs, metrics, traces)
  3. safety controls (scopes, permissions, kill switches)
  4. change management (rollout, rollback, audit)
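The runtime lifecycle and the kill switch can be modeled as a small state machine. The following is a minimal sketch, assuming a single agent process. The names `AgentState` and `AgentRuntime`, and the extra `resume` action that undoes `pause`, are hypothetical and do not come from any particular framework. Rejecting illegal transitions (for example, pausing a stopped agent) surfaces operator mistakes instead of silently ignoring them. The `history` list gives a simple trail of lifecycle changes.

```python
from enum import Enum


class AgentState(Enum):
    """Hypothetical lifecycle states for a long-running agent."""
    STOPPED = "stopped"
    RUNNING = "running"
    PAUSED = "paused"


# action -> (states the action is valid from, resulting state)
TRANSITIONS = {
    "start": ({AgentState.STOPPED}, AgentState.RUNNING),
    "pause": ({AgentState.RUNNING}, AgentState.PAUSED),
    "resume": ({AgentState.PAUSED}, AgentState.RUNNING),  # assumed counterpart to pause
    "stop": ({AgentState.RUNNING, AgentState.PAUSED}, AgentState.STOPPED),
    "restart": (set(AgentState), AgentState.RUNNING),
}


class AgentRuntime:
    def __init__(self) -> None:
        self.state = AgentState.STOPPED
        self.killed = False  # kill switch: once engaged, nothing may bring the agent back up
        self.history: list = []  # (action, from_state, to_state) tuples

    def apply(self, action: str) -> AgentState:
        if action not in TRANSITIONS:
            raise ValueError(f"unknown action: {action}")
        sources, target = TRANSITIONS[action]
        if self.killed and target is not AgentState.STOPPED:
            raise PermissionError("kill switch engaged; refusing to run agent")
        if self.state not in sources:
            raise ValueError(f"cannot {action} from {self.state.value}")
        self.history.append((action, self.state, target))
        self.state = target
        return self.state

    def kill(self) -> None:
        """Engage the kill switch: stop immediately and block any later start."""
        self.killed = True
        self.history.append(("kill", self.state, AgentState.STOPPED))
        self.state = AgentState.STOPPED
```

In a real deployment, the kill flag would live outside the process (for example, in shared configuration) so that a restart by PM2 or systemd cannot clear it. That placement is an assumption, not something this skill prescribes.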

Baseline Controls

  • immutable deployment artifacts
  • least-privilege credentials
  • environment-level secret injection
  • hard timeouts and retry budgets
  • audit logging for high-risk actions
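Hard timeouts and retry budgets work well together. The timeout bounds each attempt, and a budget shared across tasks caps total retry load, so a systemic outage does not turn into a retry storm. The following is a sketch under several assumptions:

  • tasks are asyncio coroutines
  • `TimeoutError` and `ConnectionError` are the only retryable errors
  • the in-memory `audit_log` list stands in for durable, append-only storage

`GuardedRunner`, `RetryBudgetExhausted`, and `RETRYABLE` are hypothetical names.

```python
import asyncio
import json
import time
from typing import Awaitable, Callable

# Errors treated as transient (assumption); anything else propagates immediately.
RETRYABLE = (asyncio.TimeoutError, ConnectionError)


class RetryBudgetExhausted(RuntimeError):
    """Raised when a retryable failure occurs but the shared budget is spent."""


class GuardedRunner:
    """Hypothetical runner: hard per-attempt timeout, shared retry budget, audit log."""

    def __init__(self, timeout_s: float, retry_budget: int, audit_log: list) -> None:
        self.timeout_s = timeout_s        # hard limit for each attempt
        self.retries_left = retry_budget  # shared by every task this runner executes
        self.audit_log = audit_log        # stand-in for durable, append-only storage

    def audit(self, action: str, **details: object) -> None:
        record = {"ts": time.time(), "action": action, **details}
        self.audit_log.append(json.dumps(record, sort_keys=True))

    async def run(self, name: str, task: Callable[[], Awaitable[str]], high_risk: bool = False) -> str:
        if high_risk:
            self.audit("high_risk_start", task=name)
        while True:
            try:
                # wait_for cancels the attempt if it exceeds the timeout.
                result = await asyncio.wait_for(task(), timeout=self.timeout_s)
            except RETRYABLE as exc:
                if self.retries_left <= 0:
                    raise RetryBudgetExhausted(name) from exc
                self.retries_left -= 1
                continue
            if high_risk:
                self.audit("high_risk_done", task=name)
            return result
```

Making the budget per runner rather than per task is a deliberate choice in this sketch. When many tasks fail at once, the budget drains quickly and failures surface fast instead of multiplying load on a struggling dependency.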

Metrics to Track

  • success rate
  • mean retries per task
  • time to recovery
  • cost per successful task
  • failure class distribution
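The metrics above can be derived from per-task records. This sketch assumes two inputs:

  • a hypothetical `TaskRecord` shape with a success flag, retry count, cost, and failure class
  • incidents recorded as `(detected_at, recovered_at)` timestamps in seconds

A production system would more likely compute these in its metrics backend.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRecord:
    """Hypothetical per-task record; field names are illustrative."""
    succeeded: bool
    retries: int
    cost_usd: float
    failure_class: Optional[str] = None  # e.g. "timeout"; None when the task succeeded


def summarize(records: list) -> dict:
    total = len(records)
    successes = sum(1 for r in records if r.succeeded)
    total_cost = sum(r.cost_usd for r in records)
    return {
        "success_rate": successes / total if total else 0.0,
        "mean_retries_per_task": sum(r.retries for r in records) / total if total else 0.0,
        # Spend on failed tasks is charged to the successful ones.
        "cost_per_successful_task": total_cost / successes if successes else float("inf"),
        "failure_class_distribution": Counter(
            r.failure_class or "unknown" for r in records if not r.succeeded
        ),
    }


def mean_time_to_recovery(incidents: list) -> float:
    """Mean seconds from detection to recovery over (detected_at, recovered_at) pairs."""
    if not incidents:
        return 0.0
    return sum(end - start for start, end in incidents) / len(incidents)
```

Dividing total spend by successful tasks only means that a rising failure rate shows up as a rising cost per success. That is usually the figure that matters for budgeting.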

Incident Pattern

When the failure rate spikes:

  1. freeze new rollouts
  2. capture representative traces
  3. isolate the failing route
  4. patch with the smallest safe change
  5. run regression and security checks
  6. resume the rollout gradually
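The freeze, check, and gradual-resume steps of this pattern can be automated with a staged-rollout controller. The sketch below makes several assumptions:

  • `RolloutController` is a hypothetical name
  • the traffic steps of 5, 25, 50, and 100 percent are illustrative
  • a spike is defined as a full-window failure rate above twice a baseline

Trace capture, isolation, and patching (steps 2 to 4) remain manual here.

```python
from collections import deque


class RolloutController:
    """Hypothetical staged-rollout controller that freezes on a failure spike."""

    STEPS = (5, 25, 50, 100)  # percent of traffic on the new version; illustrative values

    def __init__(self, window: int = 20, baseline_failure_rate: float = 0.05, spike_ratio: float = 2.0) -> None:
        self.outcomes: deque = deque(maxlen=window)  # True means the task failed
        self.threshold = baseline_failure_rate * spike_ratio
        self.step = 0
        self.frozen = False

    @property
    def traffic_pct(self) -> int:
        return self.STEPS[self.step]

    def failure_rate(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)
        # Only judge a full window so a single early failure does not trigger a freeze.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        if window_full and self.failure_rate() > self.threshold:
            self.frozen = True  # step 1: freeze new rollouts

    def advance(self) -> int:
        """Move to the next traffic step unless frozen (step 6: resume gradually)."""
        if not self.frozen and self.step < len(self.STEPS) - 1:
            self.step += 1
        return self.traffic_pct

    def unfreeze(self, regression_ok: bool, security_ok: bool) -> None:
        """Lift the freeze only after the patch passes both check suites (step 5)."""
        if not (regression_ok and security_ok):
            raise RuntimeError("checks failed; rollout stays frozen")
        self.frozen = False
        self.outcomes.clear()  # judge the patched version on fresh data
```

Clearing the outcome window on unfreeze means the patched version must fill a fresh window before it can be frozen again. This avoids re-triggering on failures recorded before the fix.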

Deployment Integrations

This skill pairs with:

  • PM2 workflows
  • systemd services
  • container orchestrators
  • CI/CD gates
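A CI/CD gate can reuse the tracked metrics to block promotion. This is a minimal sketch assuming the pipeline exports a dict of metric values from a staging run. The `GATES` thresholds and the function names are hypothetical placeholders, not recommended values.

```python
import sys

# Hypothetical promotion thresholds: metric -> ("min" | "max", limit).
GATES = {
    "success_rate": ("min", 0.95),
    "mean_retries_per_task": ("max", 0.5),
    "cost_per_successful_task": ("max", 0.25),
}


def check_gates(metrics: dict, gates: dict = GATES) -> list:
    """Return human-readable gate violations; an empty list means promote."""
    violations = []
    for name, (kind, limit) in gates.items():
        value = metrics.get(name)
        if value is None:
            violations.append(f"{name}: missing")  # treat absent data as a failure
        elif kind == "min" and value < limit:
            violations.append(f"{name}: {value} < {limit}")
        elif kind == "max" and value > limit:
            violations.append(f"{name}: {value} > {limit}")
    return violations


def main(metrics: dict) -> int:
    violations = check_gates(metrics)
    for v in violations:
        print(f"GATE FAILED {v}", file=sys.stderr)
    return 1 if violations else 0
```

A pipeline step would call `sys.exit(main(metrics))`. CI runners generally mark a step as failed when its process exits with a non-zero status, which is what makes the gate blocking. Treating a missing metric as a violation keeps a broken metrics export from silently passing the gate.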


Repository: affaan-m/everything-claude-code (by affaan-m)
