> enterprise-agent-ops
Operate long-lived agent workloads with observability, security boundaries, and lifecycle management.
curl "https://skillshub.wtf/affaan-m/everything-claude-code/enterprise-agent-ops?format=md"
Enterprise Agent Ops
Use this skill for cloud-hosted or continuously running agent systems that need operational controls beyond single CLI sessions.
Operational Domains
- runtime lifecycle (start, pause, stop, restart)
- observability (logs, metrics, traces)
- safety controls (scopes, permissions, kill switches)
- change management (rollout, rollback, audit)
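As one way to picture the runtime lifecycle and kill-switch domains together, here is a minimal sketch of a lifecycle state machine. The class name `AgentLifecycle`, the transition table, and the rule that an engaged kill switch only permits `stop` are all illustrative assumptions, not something this skill prescribes.

```python
from enum import Enum


class AgentState(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    PAUSED = "paused"


# Hypothetical policy: which (action, current state) pairs are allowed,
# and which state each one leads to.
TRANSITIONS = {
    ("start", AgentState.STOPPED): AgentState.RUNNING,
    ("start", AgentState.PAUSED): AgentState.RUNNING,  # resume after a pause
    ("pause", AgentState.RUNNING): AgentState.PAUSED,
    ("stop", AgentState.RUNNING): AgentState.STOPPED,
    ("stop", AgentState.PAUSED): AgentState.STOPPED,
    ("restart", AgentState.RUNNING): AgentState.RUNNING,
    ("restart", AgentState.PAUSED): AgentState.RUNNING,
}


class AgentLifecycle:
    """Tracks lifecycle state and keeps a history of applied transitions."""

    def __init__(self):
        self.state = AgentState.STOPPED
        self.kill_switch = False  # when True, only "stop" is accepted
        self.history = []  # (action, from_state, to_state) tuples

    def apply(self, action):
        if self.kill_switch and action != "stop":
            raise PermissionError(f"kill switch engaged, refusing {action!r}")
        key = (action, self.state)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action!r} while {self.state.value}")
        new_state = TRANSITIONS[key]
        self.history.append((action, self.state.value, new_state.value))
        self.state = new_state
        return new_state
```

Keeping the allowed transitions in a single table makes the policy easy to review, and the recorded history can feed an audit trail.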
Baseline Controls
- immutable deployment artifacts
- least-privilege credentials
- environment-level secret injection
- hard timeout and retry budgets
- audit log for high-risk actions
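The timeout, retry-budget, and audit-log controls can be sketched roughly as follows. The names `run_with_budget`, `BudgetExceeded`, and `audit_record` are hypothetical, and the default limits are placeholders. Note that this sketch only checks the deadline between attempts; it does not interrupt an attempt that is already running, which a real hard timeout would need to do (for example via a subprocess or the orchestrator).

```python
import json
import time


class BudgetExceeded(Exception):
    """Raised when the retry budget or the deadline is used up."""


def run_with_budget(task, max_retries=3, deadline_s=30.0, clock=time.monotonic):
    """Call task() until it succeeds, retries run out, or the deadline passes.

    Returns (result, attempts). The deadline is checked before each attempt only.
    """
    start = clock()
    attempts = 0
    last_error = None
    while attempts <= max_retries:
        if clock() - start > deadline_s:
            raise BudgetExceeded(
                f"deadline of {deadline_s}s exceeded after {attempts} attempts"
            )
        attempts += 1
        try:
            return task(), attempts
        except Exception as exc:  # sketch only: real code should narrow this
            last_error = exc
    raise BudgetExceeded(
        f"retry budget exhausted after {attempts} attempts"
    ) from last_error


def audit_record(actor, action, target, clock=time.time):
    """Build one JSON audit line for a high-risk action (hypothetical schema)."""
    return json.dumps(
        {"ts": clock(), "actor": actor, "action": action, "target": target},
        sort_keys=True,
    )
```

Passing `clock` as a parameter is a design choice made here so that the budget logic can be tested without sleeping.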
Metrics to Track
- success rate
- mean retries per task
- time to recovery
- cost per successful task
- failure class distribution
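Most of the metrics above can be derived from per-task records. The sketch below assumes a hypothetical `TaskRecord` shape and defines cost per successful task as total spend (failed tasks included) divided by the number of successes, which is one reasonable reading rather than a definition given by this skill. Time to recovery is left out because it needs incident start and end timestamps, not task records.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRecord:
    # Hypothetical record shape; adapt the fields to your own telemetry.
    succeeded: bool
    retries: int
    cost_usd: float
    failure_class: Optional[str] = None


def summarize(records):
    """Aggregate task records into the tracked metrics."""
    total = len(records)
    if total == 0:
        raise ValueError("no task records to summarize")
    successes = sum(1 for r in records if r.succeeded)
    total_cost = sum(r.cost_usd for r in records)
    failures = Counter(r.failure_class for r in records if not r.succeeded)
    return {
        "success_rate": successes / total,
        "mean_retries_per_task": sum(r.retries for r in records) / total,
        # Assumption: failed tasks still cost money, so they count in the numerator.
        "cost_per_successful_task": total_cost / successes if successes else None,
        "failure_class_distribution": dict(failures),
    }
```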
Incident Pattern
When the failure rate spikes, work through these steps in order:
- freeze new rollout
- capture representative traces
- isolate failing route
- patch with smallest safe change
- run regression + security checks
- resume gradually
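The first and last steps (freeze, then resume gradually) can be expressed as a small ramp policy. This is a sketch under assumed values: the `RAMP_STEPS` percentages and the 5% failure threshold are illustrative, not recommendations from this skill.

```python
# Hypothetical traffic ramp, as percent of requests routed to the new version.
RAMP_STEPS = [0, 5, 25, 50, 100]


def next_rollout_percent(current, failure_rate, max_failure_rate=0.05):
    """Return the next rollout percentage given the observed failure rate.

    Above the threshold the rollout is frozen at 0; otherwise it advances
    one step along RAMP_STEPS, staying put once the last step is reached.
    """
    if failure_rate > max_failure_rate:
        return 0  # freeze: stop sending traffic to the new version
    higher = [step for step in RAMP_STEPS if step > current]
    return higher[0] if higher else current
```

Advancing one step per evaluation window keeps the blast radius small if the patch turns out to be incomplete.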
Deployment Integrations
This skill pairs with:
- PM2 workflows
- systemd services
- container orchestrators
- CI/CD gates