> langfuse
You are an expert in Langfuse, the open-source LLM engineering platform. You help developers trace LLM calls, evaluate output quality, manage prompts, track costs and latency, run experiments, and build evaluation datasets — providing full observability into AI applications from development through production.
curl "https://skillshub.wtf/TerminalSkills/skills/langfuse?format=md"

Langfuse — Open-Source LLM Observability
Core Capabilities
Tracing
# Decorator-based tracing (Python)
import openai  # assumes OPENAI_API_KEY is set; langfuse.openai offers a traced drop-in wrapper
from langfuse.decorators import observe, langfuse_context

@observe()
def answer_question(question: str) -> str:
    """Trace the entire RAG pipeline as a single trace."""
    # Step 1: Retrieve relevant docs
    docs = retrieve_docs(question)
    # Step 2: Generate answer
    answer = generate_answer(question, docs)
    # Add metadata to the trace
    langfuse_context.update_current_trace(
        user_id="user-42",
        session_id="session-abc",
        tags=["production", "rag"],
        metadata={"model": "gpt-4o", "doc_count": len(docs)},
    )
    return answer

@observe()
def retrieve_docs(question: str) -> list[str]:
    """Traced as a span within the parent trace."""
    embeddings = openai.embeddings.create(model="text-embedding-3-small", input=question)
    results = vector_store.search(embeddings.data[0].embedding, top_k=5)
    return [r.text for r in results]

@observe(as_type="generation")
def generate_answer(question: str, docs: list[str]) -> str:
    """Traced as an LLM generation with token usage and cost."""
    context = "\n".join(docs)
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
// TypeScript / Vercel AI SDK integration
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
// The AISDKExporter from "langfuse-vercel" must be registered with your
// OpenTelemetry setup so the telemetry below reaches Langfuse.

// Wrap AI SDK calls
const result = await generateText({
  model: openai("gpt-4o"),
  prompt: question,
  experimental_telemetry: {
    isEnabled: true,
    functionId: "answer-question",
    metadata: { userId: "user-42" },
  },
});
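The exporter side of this integration is wired into OpenTelemetry once at process startup. A minimal sketch, assuming a Node runtime with `@opentelemetry/sdk-node` installed and Langfuse credentials in the environment:

```typescript
import { NodeSDK } from "@opentelemetry/sdk-node";
import { AISDKExporter } from "langfuse-vercel";

// Register the Langfuse exporter so spans emitted via
// experimental_telemetry are batched and shipped to Langfuse.
const sdk = new NodeSDK({
  traceExporter: new AISDKExporter(),
});
sdk.start();
```

Run this before any instrumented AI SDK call; without a registered exporter the telemetry flag is a no-op.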
Prompt Management
from langfuse import Langfuse

langfuse = Langfuse()

# Fetch versioned prompt from Langfuse
prompt = langfuse.get_prompt("rag-system-prompt", version=3)

# Use in generation
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": prompt.compile(company_name="Acme", tone="professional")},
        {"role": "user", "content": user_question},
    ],
)

# Prompts managed via Langfuse UI — non-engineers can edit, version, A/B test
Evaluation
# Score traces for quality
langfuse.score(
    trace_id=trace.id,
    name="relevance",
    value=0.9,  # 0-1 scale
    comment="Answer directly addressed the question",
)

# LLM-as-judge evaluation
langfuse.score(
    trace_id=trace.id,
    name="hallucination",
    value=0.0,  # 0 = no hallucination
    data_type="NUMERIC",
)

# Create evaluation datasets
dataset = langfuse.create_dataset("rag-eval-v1")
for item in test_cases:
    langfuse.create_dataset_item(
        dataset_name="rag-eval-v1",
        input={"question": item["question"]},
        expected_output=item["expected_answer"],
    )
Installation
pip install langfuse # Python
npm install langfuse # TypeScript
# Self-hosted: docker-compose up (Langfuse is open-source)
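For self-hosting, the official repository ships a Docker Compose file that brings up the web UI and its backing services. A sketch, assuming Docker and Docker Compose v2 are installed:

```shell
# Clone the open-source repo and start the full stack locally
git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up -d
# The UI is served on http://localhost:3000 by default
```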
Best Practices
- Trace everything — Wrap all LLM calls with tracing; understand latency, cost, and quality per request
- Structured traces — Use nested spans (retrieve → generate → format); identify bottlenecks in pipeline
- Cost tracking — Langfuse auto-calculates token costs per model; track spending by user, feature, prompt version
- Prompt versioning — Manage prompts in Langfuse UI; A/B test versions, rollback safely
- Evaluation datasets — Create test sets from production traces; run regression tests on prompt changes
- LLM-as-judge — Use automated scoring for hallucination, relevance, helpfulness at scale
- Session tracking — Group traces by session for conversational AI; see full conversation flow
- Self-hosted — Deploy with Docker for data sovereignty; same features as cloud, your infrastructure