Agentic Engineering
Operate as an agentic engineer using eval-first execution, task decomposition, and cost-aware model routing. Use this skill for engineering workflows where AI agents perform most of the implementation work and humans enforce quality and risk controls.
Operating Principles
- Define completion criteria before execution.
- Decompose work into agent-sized units.
- Route model tiers by task complexity.
- Measure with evals and regression checks.
Eval-First Loop
- Define capability eval and regression eval.
- Run baseline and capture failure signatures.
- Execute implementation.
- Re-run evals and compare deltas.
Example workflow:
1. Write test that captures desired behavior (eval)
2. Run test → capture baseline failures
3. Implement feature
4. Re-run test → verify improvements
5. Check for regressions in other tests
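The workflow above can be sketched in a few lines of Python. The `slugify` feature and its test cases are hypothetical examples, not part of the skill itself:

```python
def slugify(title):
    # Step 3: the implementation the agent eventually produces.
    return title.strip().lower().replace(" ", "-")

def capability_eval():
    # Steps 1-2: a test that captures the desired behavior; run it
    # before implementing to record baseline failure signatures.
    cases = [("Hello World", "hello-world"), ("  Trim Me ", "trim-me")]
    return [(raw, want, slugify(raw))
            for raw, want in cases if slugify(raw) != want]

# Steps 4-5: re-run after implementation and compare deltas; an empty
# failure list, with no new failures elsewhere, means the unit is done.
print(capability_eval())  # → []
```

The same eval doubles as the regression check: keep it in the suite so later units cannot silently break this behavior.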
Task Decomposition
Apply the 15-minute unit rule:
- Each unit should be independently verifiable
- Each unit should have a single dominant risk
- Each unit should expose a clear done condition
Good decomposition:
Task: Add user authentication
├─ Unit 1: Add password hashing (15 min, security risk)
├─ Unit 2: Create login endpoint (15 min, API contract risk)
├─ Unit 3: Add session management (15 min, state risk)
└─ Unit 4: Protect routes with middleware (15 min, auth logic risk)
Bad decomposition:
Task: Add user authentication (2 hours, multiple risks)
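A decomposition like the one above can be represented and checked mechanically. In this sketch, the field names and done conditions are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class WorkUnit:
    name: str
    minutes: int         # target: at most 15
    dominant_risk: str   # exactly one dominant risk per unit
    done_condition: str  # an independently verifiable check

def oversized(units):
    # Flag units that violate the 15-minute rule and need splitting.
    return [u.name for u in units if u.minutes > 15]

good = [
    WorkUnit("Add password hashing", 15, "security", "hash round-trips in tests"),
    WorkUnit("Create login endpoint", 15, "API contract", "returns 200/401 correctly"),
    WorkUnit("Add session management", 15, "state", "session survives a reload"),
    WorkUnit("Protect routes with middleware", 15, "auth logic", "anonymous request rejected"),
]
bad = [WorkUnit("Add user authentication", 120, "multiple", "everything works")]

print(oversized(good))  # → []
print(oversized(bad))   # → ['Add user authentication']
```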
Model Routing
Choose model tier based on task complexity:
- Haiku: Classification, boilerplate transforms, narrow edits
  - Example: Rename variable, add type annotation, format code
- Sonnet: Implementation and refactors
  - Example: Implement feature, refactor module, write tests
- Opus: Architecture, root-cause analysis, multi-file invariants
  - Example: Design system, debug complex issue, review architecture
Cost discipline: Escalate model tier only when lower tier fails with a clear reasoning gap.
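One way to encode the routing table plus the escalation rule. The tier-to-task mapping follows the list above; the scoring heuristic itself is an illustrative assumption:

```python
TIERS = ["haiku", "sonnet", "opus"]

BASE_TIER = {
    # Haiku-class work
    "classification": 0, "boilerplate_transform": 0, "narrow_edit": 0,
    # Sonnet-class work
    "implementation": 1, "refactor": 1, "write_tests": 1,
    # Opus-class work
    "architecture": 2, "root_cause": 2, "multi_file_invariant": 2,
}

def route(task_kind, failed_attempts=0):
    # Cost discipline: start at the cheapest adequate tier and escalate
    # one tier per clear reasoning-gap failure, capped at the top tier.
    return TIERS[min(BASE_TIER[task_kind] + failed_attempts, len(TIERS) - 1)]

print(route("narrow_edit"))                        # → haiku
print(route("refactor"))                           # → sonnet
print(route("implementation", failed_attempts=1))  # → opus
```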
Session Strategy
- Continue the session for closely coupled units
  - Example: Implementing related functions in the same module
- Start a fresh session after major phase transitions
  - Example: Moving from implementation to testing
- Compact after milestone completion, not during active debugging
  - Example: After a feature is complete, before starting the next one
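These rules condense into a small decision helper. This is a sketch; the boolean flags are assumptions about how session state might be tracked:

```python
def session_action(phase_changed, milestone_done, debugging_active):
    # Fresh context after major phase transitions (e.g. implementation
    # -> testing); compact only at milestones and never mid-debug;
    # otherwise keep the session for closely coupled units.
    if phase_changed:
        return "start_fresh"
    if milestone_done and not debugging_active:
        return "compact"
    return "continue"

print(session_action(True, False, False))  # → start_fresh
print(session_action(False, True, False))  # → compact
print(session_action(False, True, True))   # → continue
```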
Review Focus for AI-Generated Code
Prioritize:
- Invariants and edge cases
- Error boundaries
- Security and auth assumptions
- Hidden coupling and rollout risk
Do not spend review cycles on style-only disagreements when automated formatters and linters already enforce style.
Review checklist:
- Edge cases handled (null, empty, boundary values)
- Error handling comprehensive
- Security assumptions validated
- No hidden coupling between modules
- Rollout risk assessed (breaking changes, migrations)
Cost Discipline
Track per task:
- Model tier used
- Token estimate
- Retries needed
- Wall-clock time
- Success/failure outcome
Example tracking:
Task: Implement user login
Model: Sonnet
Tokens: ~5k input, ~2k output
Retries: 1 (initial implementation had auth bug)
Time: 8 minutes
Outcome: Success
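The tracking record above maps naturally onto a small dataclass. Field names and the aggregate signals are illustrative choices:

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    task: str
    model: str
    tokens_in: int
    tokens_out: int
    retries: int
    minutes: float
    success: bool

log = [TaskRecord("Implement user login", "sonnet", 5_000, 2_000, 1, 8.0, True)]

def summarize(records):
    # Aggregate per-task records into the signals worth watching:
    # success rate, retry load, and total token spend.
    return {
        "tasks": len(records),
        "success_rate": sum(r.success for r in records) / len(records),
        "retries": sum(r.retries for r in records),
        "tokens": sum(r.tokens_in + r.tokens_out for r in records),
    }

print(summarize(log))  # → {'tasks': 1, 'success_rate': 1.0, 'retries': 1, 'tokens': 7000}
```

Reviewing these summaries over time shows whether a cheaper tier would have sufficed, or whether repeated retries signal a reasoning gap that justifies escalation.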
When to Use This Skill
- Managing AI-driven development workflows
- Planning agent task decomposition
- Optimizing model tier selection
- Implementing eval-first development
- Reviewing AI-generated code
- Tracking development costs
Integration with Other Skills
- tdd-workflow: Combine with eval-first loop for test-driven development
- verification-loop: Use for continuous validation during implementation
- search-first: Apply before implementation to find existing solutions
- coding-standards: Reference during code review phase