> weave
You are an expert in Weave, the lightweight toolkit by Weights & Biases for tracking and evaluating AI applications. You help developers trace LLM calls, evaluate outputs, compare model versions, track experiments, and debug AI pipelines — with automatic logging via decorators and a visual dashboard for exploring traces, costs, and quality metrics.
Weave — AI Application Tracking by Weights & Biases
Core Capabilities
Automatic Tracing
import json

import openai
import weave

weave.init("my-ai-project")  # Initialize with project name

client = openai.OpenAI()

# OpenAI calls are automatically traced
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain transformers"}],
)
# Weave captures: model, tokens, latency, cost, input/output — viewable in dashboard

# Custom function tracing
@weave.op()
def extract_entities(text: str) -> list[str]:
    """Extract named entities from text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Extract entities from: {text}\nReturn JSON list."}],
    )
    return json.loads(response.choices[0].message.content)

@weave.op()
def rag_pipeline(query: str) -> str:
    """Full RAG pipeline — each step traced as a child span."""
    docs = retrieve_documents(query)  # Traced if decorated with @weave.op()
    context = "\n".join(docs)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
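The pipeline above calls a `retrieve_documents` helper it never defines. A minimal sketch, assuming a hypothetical in-memory keyword-overlap retriever (a real pipeline would query a vector store); the import fallback is only there so the logic also runs where weave is not installed — with weave initialized, the `@op()` decoration makes each call appear as a child span of `rag_pipeline`:

```python
# Hypothetical retriever for the rag_pipeline sketch above.
try:
    import weave
    op = weave.op
except ImportError:
    def op(*args, **kwargs):  # no-op stand-in for @weave.op()
        def decorate(fn):
            return fn
        return decorate

DOCS = [
    "Python is a high-level programming language.",
    "Linux was created by Linus Torvalds in 1991.",
    "Docker is a containerization platform.",
]

@op()
def retrieve_documents(query: str, k: int = 2) -> list[str]:
    """Rank in-memory docs by keyword overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(
        DOCS,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]
```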
Evaluations
# Define evaluation dataset
eval_dataset = [
    {"query": "What is Python?", "expected": "programming language"},
    {"query": "Who created Linux?", "expected": "Linus Torvalds"},
    {"query": "What is Docker?", "expected": "containerization platform"},
]

# Define scoring functions; parameter names match dataset columns
@weave.op()
def relevance_scorer(output: str, expected: str) -> dict:
    """Score whether the output contains the expected information."""
    contains = expected.lower() in output.lower()
    return {"relevance": 1.0 if contains else 0.0}

@weave.op()
def length_scorer(output: str) -> dict:
    """Score response length (prefer concise)."""
    words = len(output.split())
    return {"conciseness": min(1.0, 50 / max(words, 1))}

# Run evaluation. Note evaluate() is async: await it inside an async
# function, or call asyncio.run(evaluation.evaluate(rag_pipeline)).
evaluation = weave.Evaluation(
    dataset=eval_dataset,
    scorers=[relevance_scorer, length_scorer],
)
results = await evaluation.evaluate(rag_pipeline)
# Results visible in Weave dashboard with per-example scores
# Compare across model versions, prompts, parameters
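Scorers are plain decorated functions, so graded metrics slot in the same way as the boolean check above. A hedged sketch of a token-level F1 scorer (the name and metric choice are illustrative, not part of Weave's API; in an evaluation it would carry `@weave.op()` and sit in the `scorers` list like `relevance_scorer`):

```python
def f1_scorer(output: str, expected: str) -> dict:
    """Token-level F1 between the model output and the expected answer."""
    out_tokens = set(output.lower().split())
    exp_tokens = set(expected.lower().split())
    common = out_tokens & exp_tokens
    if not common:
        return {"f1": 0.0}
    precision = len(common) / len(out_tokens)
    recall = len(common) / len(exp_tokens)
    return {"f1": 2 * precision * recall / (precision + recall)}
```

A graded score like this separates near-misses from total misses, which a substring check collapses into 0.0.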
Model Versioning
# Track model/prompt versions
class SupportAgent(weave.Model):
    model_name: str = "gpt-4o"
    system_prompt: str = "You are a helpful support agent."
    temperature: float = 0.7

    @weave.op()
    def predict(self, query: str) -> str:
        response = client.chat.completions.create(
            model=self.model_name,
            temperature=self.temperature,
            messages=[
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": query},
            ],
        )
        return response.choices[0].message.content

# Version 1
agent_v1 = SupportAgent(system_prompt="Be concise and helpful.")

# Version 2 — compare in dashboard
agent_v2 = SupportAgent(model_name="gpt-4o-mini", system_prompt="Be detailed and empathetic.")

# Evaluate both versions
for agent in [agent_v1, agent_v2]:
    await evaluation.evaluate(agent)
# Dashboard shows side-by-side comparison: quality, cost, latency
Installation
pip install weave
# Uses your W&B account — set WANDB_API_KEY
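A minimal first-run check, assuming a placeholder key (get yours from wandb.ai/authorize; the project name is illustrative):

```shell
export WANDB_API_KEY="<your-api-key>"   # from https://wandb.ai/authorize
python -c 'import weave; weave.init("my-ai-project")'   # verifies the connection
```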
Best Practices
- @weave.op() decorator — Add to any function to trace it; creates hierarchical spans for nested calls
- Auto-instrumentation — OpenAI, Anthropic, LangChain calls traced automatically after weave.init()
- Evaluations — Define datasets + scorers; run systematically; compare versions in dashboard
- weave.Model — Subclass for versioned models; parameters tracked, comparable across evaluations
- W&B integration — Weave data appears in your W&B workspace; share with team, add to reports
- Cost tracking — Automatic per-call cost calculation; aggregate by function, model, or user
- Production monitoring — Use in production for continuous quality tracking; alert on regressions
- Lightweight — Single @weave.op() decorator; no complex setup, no separate infrastructure