> agent-harness-construction

Design and optimize an AI agent's action space, tool definitions, and observation format to raise task completion rates.


Agent Harness Construction

Use this skill when you are improving how an agent plans, calls tools, recovers from errors, and converges on a completed state.

Core Model

Agent output quality is bounded by the quality of four things:

The action space
Observations
Error recovery
The context budget

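Action Space Design

Use stable, explicit tool names.
Keep inputs schema-first and narrowly scoped.
Return deterministic output shapes.
Avoid catch-all tools unless isolation is impossible.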
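The following is a minimal Python sketch of these rules. It assumes a hypothetical `ToolSpec` type and an illustrative `read_file` tool, neither of which comes from the source. The tool name is fixed, and the schema accepts only the listed arguments with the listed types. Every raw result is projected onto the same set of keys, so the agent always sees one output shape.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical tool definition: stable name, narrow schema, fixed output shape."""
    name: str                      # stable, explicit name, e.g. "read_file"
    description: str
    params: Dict[str, type]        # every argument is required and typed
    output_keys: Tuple[str, ...]   # every result carries exactly these keys

def validate_args(spec: ToolSpec, args: Dict[str, Any]) -> List[str]:
    """Return schema violations; an empty list means the call is valid."""
    errors = [f"unexpected argument: {k}" for k in args if k not in spec.params]
    for key, typ in spec.params.items():
        if key not in args:
            errors.append(f"missing argument: {key}")
        elif not isinstance(args[key], typ):
            errors.append(f"{key} must be {typ.__name__}")
    return errors

def shape_output(spec: ToolSpec, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Project a raw result onto the fixed keys so the agent always sees one shape."""
    return {key: raw.get(key) for key in spec.output_keys}

# Illustrative tool; the name and fields are assumptions for this example.
READ_FILE = ToolSpec(
    name="read_file",
    description="Read a UTF-8 text file from the workspace.",
    params={"path": str, "max_lines": int},
    output_keys=("status", "summary", "content"),
)

print(validate_args(READ_FILE, {"path": 3, "mode": "rw"}))
# ['unexpected argument: mode', 'path must be str', 'missing argument: max_lines']
```

Rejecting unknown arguments, rather than silently ignoring them, keeps the schema narrow in practice. A misnamed parameter then surfaces as an observation the agent can act on.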

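Granularity Rules

Use micro-tools for high-risk operations (deploys, migrations, permission changes).
Use medium-sized tools for the common edit/read/search loop.
Use macro-tools only when round-trip overhead is the dominant cost.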
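One way to make these rules explicit is to encode them as a selection function. This is a hypothetical Python sketch, and `Granularity`, `HIGH_RISK`, and `choose_granularity` are illustrative names. The rule ordering is an assumption read from the list above: risk outranks round-trip cost, and medium is the default.

```python
from enum import Enum

class Granularity(Enum):
    MICRO = "micro"    # one high-risk action per call
    MEDIUM = "medium"  # the everyday edit/read/search loop
    MACRO = "macro"    # several steps batched to save round trips

# Assumed operation names; a real harness would use its own catalogue.
HIGH_RISK = {"deploy", "migrate", "grant_permission"}

def choose_granularity(operation: str, round_trip_dominates: bool = False) -> Granularity:
    """Apply the rules in order: risk first, then round-trip cost, else medium."""
    if operation in HIGH_RISK:
        return Granularity.MICRO
    if round_trip_dominates:
        return Granularity.MACRO
    return Granularity.MEDIUM

print(choose_granularity("deploy", round_trip_dominates=True))  # Granularity.MICRO
```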

Observation Design

Every tool response should include:

status: success|warning|error
summary: a one-line result
next_actions: executable follow-up steps
artifacts: file paths / IDs
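The four fields above map naturally onto a small response envelope. Below is a sketch using a hypothetical `ToolResponse` dataclass, not an API defined by the source. It rejects malformed responses at construction time, so a missing status or a multi-line summary never reaches the agent.

```python
from dataclasses import asdict, dataclass, field
from typing import List

VALID_STATUSES = ("success", "warning", "error")

@dataclass
class ToolResponse:
    status: str                                             # success | warning | error
    summary: str                                            # one-line result
    next_actions: List[str] = field(default_factory=list)   # executable follow-ups
    artifacts: List[str] = field(default_factory=list)      # file paths / IDs

    def __post_init__(self) -> None:
        # Reject responses that break the envelope instead of passing them on.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"invalid status: {self.status!r}")
        if not self.summary.strip() or "\n" in self.summary:
            raise ValueError("summary must be a single non-empty line")

resp = ToolResponse(
    status="success",
    summary="Updated 2 files",
    next_actions=["run the test suite"],
    artifacts=["src/parser.py", "tests/test_parser.py"],
)
print(asdict(resp))
```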

Error Recovery Contract

Every error path should include:

A root-cause hint
Safe retry instructions
An explicit stop condition
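The following is a minimal sketch of an error payload carrying those three parts, plus a retry wrapper that honours them. `ToolError` and `run_with_recovery` are hypothetical names. The policy is an assumption: retry only errors flagged as safe, and stop after a fixed number of attempts. The `stop_condition` text is meant for the agent to read; the wrapper enforces its own hard limit through `max_attempts`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass
class ToolError:
    root_cause_hint: str     # likely cause, e.g. "lock held by another process"
    retry_instruction: str   # what is safe to do before retrying
    safe_to_retry: bool      # False means retrying could cause harm
    stop_condition: str      # when the agent must stop and hand back to a human

def run_with_recovery(call: Callable[[], Any], max_attempts: int = 3) -> Tuple[Any, int]:
    """Retry only errors marked safe, and stop after max_attempts (hypothetical policy)."""
    result: Any = None
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        result = call()
        if not isinstance(result, ToolError):
            return result, attempts      # success: hand the result back
        if not result.safe_to_retry:
            break                        # explicit stop: never repeat an unsafe call
    return result, attempts              # last ToolError plus attempts used

lock_error = ToolError(
    root_cause_hint="index.lock held by another git process",
    retry_instruction="wait 2s, then retry the same command",
    safe_to_retry=True,
    stop_condition="stop after 3 failed attempts",
)
outcomes = iter([lock_error, lock_error, "commit created"])
print(run_with_recovery(lambda: next(outcomes)))  # ('commit created', 3)
```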

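Context Budget Management

Keep the system prompt minimal and invariant.
Move bulky guidance into skills that are loaded on demand.
Prefer file references over inlining long documents.
Compact at phase boundaries rather than at arbitrary token thresholds.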
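Phase-boundary compaction can be sketched as follows, assuming a hypothetical `Context` class. The system prompt is never edited. Raw turns accumulate only for the active phase, and ending a phase replaces them with a single summary line. The `summarize` callable stands in for whatever summarization step a real harness would use, such as a model call; that part is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Context:
    system_prompt: str                                   # minimal, never edited mid-run
    summaries: List[str] = field(default_factory=list)   # one line per finished phase
    current: List[str] = field(default_factory=list)     # raw turns of the active phase

    def add_turn(self, text: str) -> None:
        self.current.append(text)

    def end_phase(self, name: str, summarize: Callable[[List[str]], str]) -> None:
        """Compact at a phase boundary: swap the phase's raw turns for one summary."""
        self.summaries.append(f"[{name}] {summarize(self.current)}")
        self.current = []

    def render(self) -> List[str]:
        return [self.system_prompt, *self.summaries, *self.current]

ctx = Context("You are a coding agent. Load skills on demand.")
ctx.add_turn("read src/parser.py")          # a file reference, not the file body
ctx.add_turn("parse() drops trailing commas")
ctx.end_phase("explore", lambda turns: f"{len(turns)} turns; finding: {turns[-1]}")
ctx.add_turn("edit parse()")
print(ctx.render())
```

Compacting when a phase ends, rather than at a token count, means a summary never cuts a line of reasoning in half.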

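Architecture Pattern Guidance

ReAct: best for exploratory tasks where the path is uncertain.
Function calling: best for structured, deterministic flows.
Hybrid (recommended): ReAct planning plus typed tool execution.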
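The hybrid pattern can be sketched as a ReAct-style loop in which the planner picks the next step from observations so far, while a typed dispatch layer executes it. The following code is illustrative only. `search`, `read_file`, and `scripted_planner` are hypothetical stubs, and the scripted planner replaces an LLM so that the loop is deterministic. Catching `TypeError` is a simplified argument check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Action:
    tool: str
    args: Dict[str, str] = field(default_factory=dict)

# Typed execution layer. Both tools are hypothetical stubs.
def search(query: str) -> str:
    return f"3 matches for {query!r}"

def read_file(path: str) -> str:
    return f"contents of {path}"

TOOLS: Dict[str, Callable[..., str]] = {"search": search, "read_file": read_file}

def execute(action: Action) -> str:
    """Dispatch one typed call; failures come back as observations, not exceptions."""
    if action.tool not in TOOLS:
        return f"error: unknown tool {action.tool!r}"
    try:
        return TOOLS[action.tool](**action.args)
    except TypeError as exc:  # missing or unexpected arguments (simplified check)
        return f"error: bad arguments: {exc}"

def react_loop(plan_step: Callable[[List[str]], Optional[Action]], max_steps: int = 5) -> List[str]:
    """ReAct-style loop: the planner reads observations so far and picks the next action."""
    observations: List[str] = []
    for _ in range(max_steps):
        action = plan_step(observations)
        if action is None:  # the planner judges the task complete
            break
        observations.append(execute(action))
    return observations

# Scripted stand-in for an LLM planner.
def scripted_planner(observations: List[str]) -> Optional[Action]:
    steps = [Action("search", {"query": "parse"}), Action("read_file", {"path": "src/parser.py"})]
    return steps[len(observations)] if len(observations) < len(steps) else None

print(react_loop(scripted_planner))
```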

Benchmarking

Track:

Completion rate
Retries per task
pass@1 and pass@3
Cost per successful task
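The source does not say how these metrics are computed. The following sketch makes two explicit assumptions. First, a task counts as complete if any attempt passed. Second, pass@k uses the unbiased estimator from Chen et al. (2021), 1 - C(n-c, k) / C(n, k), where n is the number of attempts and c is the number that passed. `TaskRun` and `benchmark` are hypothetical names.

```python
from dataclasses import dataclass
from math import comb
from typing import Dict, List

@dataclass
class TaskRun:
    samples: int      # independent attempts at the task (n)
    passed: int       # attempts that passed the task's check (c)
    retries: int      # tool-call retries across those attempts
    cost_usd: float   # total spend on the task

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): 1 - C(n-c, k) / C(n, k)."""
    if n < k:
        raise ValueError("pass@k needs at least k samples")
    if n - c < k:
        return 1.0  # every size-k subset contains a passing attempt
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark(runs: List[TaskRun]) -> Dict[str, float]:
    solved = [r for r in runs if r.passed > 0]   # assumption: "complete" = any attempt passed
    total = len(runs)
    return {
        "completion_rate": len(solved) / total,
        "retries_per_task": sum(r.retries for r in runs) / total,
        "pass@1": sum(pass_at_k(r.samples, r.passed, 1) for r in runs) / total,
        "pass@3": sum(pass_at_k(r.samples, r.passed, 3) for r in runs) / total,
        "cost_per_success": sum(r.cost_usd for r in runs) / len(solved) if solved else float("inf"),
    }

runs = [TaskRun(3, 3, 0, 0.30), TaskRun(3, 1, 2, 0.60), TaskRun(3, 0, 4, 0.90)]
print(benchmark(runs))
```

Cost per successful task divides total spend, including failed tasks, by the number of solved tasks. This choice means failures raise the metric instead of being hidden.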

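Anti-Patterns

Too many tools with overlapping semantics.
Opaque tool output with no recovery hints.
Error-only output with no next steps.
Context overloaded with irrelevant references.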
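Two of these anti-patterns can be caught with a rough automated audit. The following sketch is hypothetical. Tool overlap is approximated by Jaccard similarity of description words, which is a crude stand-in for true semantic overlap, and the 0.6 threshold is arbitrary. Dead-end errors are responses with `status` "error" and no `next_actions`. The tool names in the sample catalogue are illustrative.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple

def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def overlapping_tools(descriptions: Dict[str, str], threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """Flag tool pairs whose descriptions share most of their words (crude Jaccard check)."""
    flagged = []
    for (a, da), (b, db) in combinations(sorted(descriptions.items()), 2):
        ta, tb = tokens(da), tokens(db)
        union = ta | tb
        score = len(ta & tb) / len(union) if union else 0.0
        if score >= threshold:
            flagged.append((a, b, round(score, 2)))
    return flagged

def dead_end_errors(responses: List[dict]) -> List[int]:
    """Indexes of error responses that offer no next step."""
    return [i for i, r in enumerate(responses)
            if r.get("status") == "error" and not r.get("next_actions")]

catalogue = {
    "find_files": "Search the workspace for files matching a pattern",
    "search_files": "Search the workspace for files matching a glob pattern",
    "deploy": "Deploy the current build to staging",
}
print(overlapping_tools(catalogue))  # [('find_files', 'search_files', 0.89)]
```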



Repo: affaan-m/everything-claude-code (by affaan-m)
