Agentic Engineering
Operate as an agentic engineer using eval-first execution, task decomposition, and cost-aware model routing. Use this skill for engineering workflows where AI agents perform most of the implementation work and humans enforce quality and risk controls.
Operating Principles
- Define completion criteria before execution.
- Decompose work into agent-sized units.
- Route model tiers by task complexity.
- Measure with evals and regression checks.
Eval-First Loop
- Define capability eval and regression eval.
- Run baseline and capture failure signatures.
- Execute implementation.
- Re-run evals and compare deltas.
Example workflow:
1. Write test that captures desired behavior (eval)
2. Run test → capture baseline failures
3. Implement feature
4. Re-run test → verify improvements
5. Check for regressions in other tests
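The workflow above can be sketched in a few lines of Python. The `slugify` feature and its test cases are hypothetical examples, not part of the skill itself:

```python
def slugify(title):
    # Step 3: the implementation the agent eventually produces.
    return title.strip().lower().replace(" ", "-")

def capability_eval():
    # Steps 1-2: a test that captures the desired behavior; run it
    # before implementing to record baseline failure signatures.
    cases = [("Hello World", "hello-world"), ("  Trim Me ", "trim-me")]
    return [(raw, want, slugify(raw))
            for raw, want in cases if slugify(raw) != want]

# Steps 4-5: re-run after implementation and compare deltas; an empty
# failure list, with no new failures elsewhere, means the unit is done.
print(capability_eval())  # → []
```

The same eval doubles as the regression check: keep it in the suite so later units cannot silently break this behavior.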
Task Decomposition
Apply the 15-minute unit rule:
- Each unit should be independently verifiable
- Each unit should have a single dominant risk
- Each unit should expose a clear done condition
Good decomposition:
Task: Add user authentication
├─ Unit 1: Add password hashing (15 min, security risk)
├─ Unit 2: Create login endpoint (15 min, API contract risk)
├─ Unit 3: Add session management (15 min, state risk)
└─ Unit 4: Protect routes with middleware (15 min, auth logic risk)
Bad decomposition:
Task: Add user authentication (2 hours, multiple risks)
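A decomposition like the one above can be represented and checked mechanically. In this sketch, the field names and done conditions are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class WorkUnit:
    name: str
    minutes: int         # target: at most 15
    dominant_risk: str   # exactly one dominant risk per unit
    done_condition: str  # an independently verifiable check

def oversized(units):
    # Flag units that violate the 15-minute rule and need splitting.
    return [u.name for u in units if u.minutes > 15]

good = [
    WorkUnit("Add password hashing", 15, "security", "hash round-trips in tests"),
    WorkUnit("Create login endpoint", 15, "API contract", "returns 200/401 correctly"),
    WorkUnit("Add session management", 15, "state", "session survives a reload"),
    WorkUnit("Protect routes with middleware", 15, "auth logic", "anonymous request rejected"),
]
bad = [WorkUnit("Add user authentication", 120, "multiple", "everything works")]

print(oversized(good))  # → []
print(oversized(bad))   # → ['Add user authentication']
```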
Model Routing
Choose model tier based on task complexity:
- Haiku: Classification, boilerplate transforms, narrow edits
  - Example: Rename variable, add type annotation, format code
- Sonnet: Implementation and refactors
  - Example: Implement feature, refactor module, write tests
- Opus: Architecture, root-cause analysis, multi-file invariants
  - Example: Design system, debug complex issue, review architecture
Cost discipline: Escalate model tier only when lower tier fails with a clear reasoning gap.
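One way to encode the routing table plus the escalation rule. The tier-to-task mapping follows the list above; the scoring heuristic itself is an illustrative assumption:

```python
TIERS = ["haiku", "sonnet", "opus"]

BASE_TIER = {
    # Haiku-class work
    "classification": 0, "boilerplate_transform": 0, "narrow_edit": 0,
    # Sonnet-class work
    "implementation": 1, "refactor": 1, "write_tests": 1,
    # Opus-class work
    "architecture": 2, "root_cause": 2, "multi_file_invariant": 2,
}

def route(task_kind, failed_attempts=0):
    # Cost discipline: start at the cheapest adequate tier and escalate
    # one tier per clear reasoning-gap failure, capped at the top tier.
    return TIERS[min(BASE_TIER[task_kind] + failed_attempts, len(TIERS) - 1)]

print(route("narrow_edit"))                        # → haiku
print(route("refactor"))                           # → sonnet
print(route("implementation", failed_attempts=1))  # → opus
```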
Session Strategy
- Continue the session for closely coupled units
  - Example: Implementing related functions in the same module
- Start a fresh session after major phase transitions
  - Example: Moving from implementation to testing
- Compact after milestone completion, not during active debugging
  - Example: After a feature is complete, before starting the next one
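These rules condense into a small decision helper. This is a sketch; the boolean flags are assumptions about how session state might be tracked:

```python
def session_action(phase_changed, milestone_done, debugging_active):
    # Fresh context after major phase transitions (e.g. implementation
    # -> testing); compact only at milestones and never mid-debug;
    # otherwise keep the session for closely coupled units.
    if phase_changed:
        return "start_fresh"
    if milestone_done and not debugging_active:
        return "compact"
    return "continue"

print(session_action(True, False, False))  # → start_fresh
print(session_action(False, True, False))  # → compact
print(session_action(False, True, True))   # → continue
```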
Review Focus for AI-Generated Code
Prioritize:
- Invariants and edge cases
- Error boundaries
- Security and auth assumptions
- Hidden coupling and rollout risk
Do not spend review cycles on style-only disagreements when automated formatters and linters already enforce style.
Review checklist:
- Edge cases handled (null, empty, boundary values)
- Error handling comprehensive
- Security assumptions validated
- No hidden coupling between modules
- Rollout risk assessed (breaking changes, migrations)
Cost Discipline
Track per task:
- Model tier used
- Token estimate
- Retries needed
- Wall-clock time
- Success/failure outcome
Example tracking:
Task: Implement user login
Model: Sonnet
Tokens: ~5k input, ~2k output
Retries: 1 (initial implementation had auth bug)
Time: 8 minutes
Outcome: Success
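The tracking record above maps naturally onto a small dataclass. Field names and the aggregate signals are illustrative choices:

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    task: str
    model: str
    tokens_in: int
    tokens_out: int
    retries: int
    minutes: float
    success: bool

log = [TaskRecord("Implement user login", "sonnet", 5_000, 2_000, 1, 8.0, True)]

def summarize(records):
    # Aggregate per-task records into the signals worth watching:
    # success rate, retry load, and total token spend.
    return {
        "tasks": len(records),
        "success_rate": sum(r.success for r in records) / len(records),
        "retries": sum(r.retries for r in records),
        "tokens": sum(r.tokens_in + r.tokens_out for r in records),
    }

print(summarize(log))  # → {'tasks': 1, 'success_rate': 1.0, 'retries': 1, 'tokens': 7000}
```

Reviewing these summaries over time shows whether a cheaper tier would have sufficed, or whether repeated retries signal a reasoning gap that justifies escalation.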
When to Use This Skill
- Managing AI-driven development workflows
- Planning agent task decomposition
- Optimizing model tier selection
- Implementing eval-first development
- Reviewing AI-generated code
- Tracking development costs
Integration with Other Skills
- tdd-workflow: Combine with eval-first loop for test-driven development
- verification-loop: Use for continuous validation during implementation
- search-first: Apply before implementation to find existing solutions
- coding-standards: Reference during code review phase