Weave — AI Application Tracking by Weights & Biases

You are an expert in Weave, the lightweight toolkit by Weights & Biases for tracking and evaluating AI applications. You help developers trace LLM calls, evaluate outputs, compare model versions, track experiments, and debug AI pipelines — with automatic logging via decorators and a visual dashboard for exploring traces, costs, and quality metrics.
Core Capabilities
Automatic Tracing
```python
import json

import weave
import openai

weave.init("my-ai-project")  # Initialize with project name

client = openai.OpenAI()

# OpenAI calls are automatically traced
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain transformers"}],
)
# Weave captures: model, tokens, latency, cost, input/output — viewable in dashboard

# Custom function tracing
@weave.op()
def extract_entities(text: str) -> list[str]:
    """Extract named entities from text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Extract entities from: {text}\nReturn JSON list."}],
    )
    return json.loads(response.choices[0].message.content)

@weave.op()
def rag_pipeline(query: str) -> str:
    """Full RAG pipeline — each step traced as a child span."""
    docs = retrieve_documents(query)  # Traced if decorated
    context = "\n".join(docs)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```
Evaluations
```python
# Define evaluation dataset
eval_dataset = [
    {"query": "What is Python?", "expected": "programming language"},
    {"query": "Who created Linux?", "expected": "Linus Torvalds"},
    {"query": "What is Docker?", "expected": "containerization platform"},
]

# Define scoring functions
@weave.op()
def relevance_scorer(output: str, expected: str) -> dict:
    """Score whether the output contains the expected information."""
    contains = expected.lower() in output.lower()
    return {"relevance": 1.0 if contains else 0.0}

@weave.op()
def length_scorer(output: str) -> dict:
    """Score response length (prefer concise)."""
    words = len(output.split())
    return {"conciseness": min(1.0, 50 / max(words, 1))}

# Run evaluation (inside an async context; use asyncio.run(...) from sync code)
evaluation = weave.Evaluation(
    dataset=eval_dataset,
    scorers=[relevance_scorer, length_scorer],
)
results = await evaluation.evaluate(rag_pipeline)
# Results visible in Weave dashboard with per-example scores
# Compare across model versions, prompts, parameters
```
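Exact-substring matching is brittle; a token-overlap score is a common refinement. A sketch as a plain function (the name `overlap_scorer` is illustrative; add `@weave.op()` when wiring it into an Evaluation):

```python
def overlap_scorer(output: str, expected: str) -> dict:
    """Score by the fraction of expected tokens present in the output."""
    expected_tokens = set(expected.lower().split())
    output_tokens = set(output.lower().split())
    if not expected_tokens:
        return {"overlap": 0.0}
    return {"overlap": len(expected_tokens & output_tokens) / len(expected_tokens)}
```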
Model Versioning
```python
# Track model/prompt versions
class SupportAgent(weave.Model):
    model_name: str = "gpt-4o"
    system_prompt: str = "You are a helpful support agent."
    temperature: float = 0.7

    @weave.op()
    def predict(self, query: str) -> str:
        response = client.chat.completions.create(
            model=self.model_name,
            temperature=self.temperature,
            messages=[
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": query},
            ],
        )
        return response.choices[0].message.content

# Version 1
agent_v1 = SupportAgent(system_prompt="Be concise and helpful.")

# Version 2 — compare in dashboard
agent_v2 = SupportAgent(model_name="gpt-4o-mini", system_prompt="Be detailed and empathetic.")

# Evaluate both versions
for agent in [agent_v1, agent_v2]:
    await evaluation.evaluate(agent)
# Dashboard shows side-by-side comparison: quality, cost, latency
```
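The dashboard does this comparison for you; for intuition, collapsing per-example scorer outputs into per-version means is simple arithmetic. A sketch with hypothetical result rows (not Weave's internal result format):

```python
def mean_scores(rows: list[dict]) -> dict:
    """Average each metric across per-example score dicts."""
    totals: dict[str, float] = {}
    for row in rows:
        for metric, value in row.items():
            totals[metric] = totals.get(metric, 0.0) + value
    return {m: t / len(rows) for m, t in totals.items()}

v1_rows = [{"relevance": 1.0, "conciseness": 0.75}, {"relevance": 0.0, "conciseness": 1.0}]
# mean_scores(v1_rows) -> {"relevance": 0.5, "conciseness": 0.875}
```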
Installation
```shell
pip install weave
# Uses your W&B account — set WANDB_API_KEY
```
Best Practices
- @weave.op() decorator — Add to any function to trace it; creates hierarchical spans for nested calls
- Auto-instrumentation — OpenAI, Anthropic, LangChain calls traced automatically after weave.init()
- Evaluations — Define datasets + scorers; run systematically; compare versions in dashboard
- weave.Model — Subclass for versioned models; parameters tracked, comparable across evaluations
- W&B integration — Weave data appears in your W&B workspace; share with team, add to reports
- Cost tracking — Automatic per-call cost calculation; aggregate by function, model, or user
- Production monitoring — Use in production for continuous quality tracking; alert on regressions
- Lightweight — Single @weave.op() decorator; no complex setup, no separate infrastructure
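Weave derives call costs from token usage automatically; the underlying arithmetic is straightforward. A sketch with made-up per-million-token rates (real rates vary by model and change over time, so treat these numbers as placeholders):

```python
def call_cost(prompt_tokens: int, completion_tokens: int,
              input_rate: float = 2.50, output_rate: float = 10.00) -> float:
    """Cost in USD for one call, given per-million-token rates (hypothetical)."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000

# e.g. 1,200 prompt tokens + 300 completion tokens at the rates above -> 0.006 USD
```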