found 172 skills in registry
You are an expert in Traceloop and its OpenLLMetry SDK, the open-source observability framework that extends OpenTelemetry for LLM applications. You help developers instrument AI pipelines with automatic tracing for OpenAI, Anthropic, Cohere, LangChain, LlamaIndex, vector databases, and frameworks — exporting to any OpenTelemetry-compatible backend (Grafana Tempo, Jaeger, Datadog, Honeycomb, Traceloop Cloud).
Grafana is an open-source visualization and dashboarding platform that connects to dozens of data sources including Prometheus, PostgreSQL, ClickHouse, and Elasticsearch. It lets you build interactive dashboards with panels, set up alerting rules, and manage everything as code through JSON dashboard definitions and provisioning configuration.
LLM observability proxy that sits between your app and LLM providers. Logs every request, enables caching, rate limiting, and provides cost analytics. Works with OpenAI, Anthropic, and other providers with a one-line integration change.
You are an expert in Braintrust, the evaluation and observability platform for AI applications. You help developers run systematic evaluations, compare model versions, track experiments, log production traces, and measure quality metrics — with a focus on making AI development as rigorous as traditional software testing.
You are an expert in Langtrace, the open-source observability platform for LLM applications built on OpenTelemetry. You help developers trace LLM calls, RAG pipelines, agent tool use, and chain executions with automatic instrumentation for OpenAI, Anthropic, LangChain, LlamaIndex, and 20+ providers — providing cost tracking, latency analysis, token usage, and quality evaluation in a self-hostable dashboard.
Run AI agent code safely in isolated sandboxes with resource limits, audit trails, and kill switches. Use when someone asks to "sandbox my agent", "run agent code safely", "add guardrails to AI agent", "isolate agent execution", "audit agent actions", "prevent agent from deleting files", "restrict agent permissions", or "add safety controls to AI coding agent". Covers Docker isolation, filesystem restrictions, network policies, resource locking, and comprehensive audit logging.
Linkerd lightweight service mesh for Kubernetes. Use when the user needs automatic mTLS, traffic splitting, retries, and observability with minimal resource overhead and operational complexity.
Manage Railway deployments using the CLI. Use when a user asks to deploy to Railway, check deployment status, manage Railway services, set environment variables on Railway, view Railway logs, link a Railway project, add a database on Railway, scale Railway services, manage Railway environments, rollback a Railway deployment, or run commands with Railway env vars. Covers the full deploy lifecycle from project setup to production monitoring.
Assists with instrumenting applications using OpenTelemetry for distributed tracing, metrics, and logs. Use when adding observability, configuring auto-instrumentation, building custom spans, setting up OTel Collectors, or exporting telemetry to Jaeger, Grafana, or Datadog. Trigger words: opentelemetry, otel, tracing, spans, metrics, observability, collector.
You are an expert in the OpenAI Agents SDK (formerly Swarm), the official framework for building multi-agent systems. You help developers create agents with tool calling, guardrails, agent handoffs, streaming, tracing, and MCP integration — building production-grade AI agents that coordinate, delegate tasks, and execute tools with built-in safety controls.
Deploy and configure Thanos for long-term Prometheus metric storage, global querying across multiple Prometheus instances, and data compaction. Use when a user needs durable metric storage in object storage, a unified query view across clusters, downsampling for historical data, or high-availability Prometheus with deduplication.
Expert database optimizer specializing in modern performance tuning, query optimization, and scalable architectures. Masters advanced indexing, N+1 resolution, multi-tier caching, partitioning strategies, and cloud database optimization. Handles complex query analysis, migration strategies, and performance monitoring. Use PROACTIVELY for database optimization, performance issues, or scalability challenges.
Patterns and best practices for bringing up a complete ROS2-based robotics system on a robot's onboard computer, including systemd services, launch file composition, ordered startup, and production monitoring. Use this skill when configuring a robot to start ROS2 nodes on boot, writing systemd unit files for ROS2 launch, composing layered launch files for full robot stacks, setting up watchdog monitoring, configuring udev rules for deterministic device naming, or debugging boot-time race conditi
AWS CloudWatch monitoring for logs, metrics, alarms, and dashboards. Use when setting up monitoring, creating alarms, querying logs with Insights, configuring metric filters, building dashboards, or troubleshooting application issues.
This skill automates the setup of distributed tracing for microservices. It helps developers implement end-to-end request visibility by configuring context propagation, span creation, trace collection, and analysis. Use this skill when the user requests to set up distributed tracing, implement observability, or troubleshoot performance issues in a microservices architecture. The skill is triggered by phrases such as "setup tracing", "implement distributed tracing", "configure opentelemetry", or
Standards for Micrometer, Distributed Tracing, and Structured Logging. Use when adding Micrometer metrics, distributed tracing, or structured logging to Spring Boot. (triggers: logback-spring.xml, application.properties, micrometer, tracing, correlation-id, mdc)
Use this skill to work with Microsoft Foundry (Azure AI Foundry): deploy AI models from catalog, build RAG applications with knowledge indexes, create and evaluate AI agents, manage RBAC permissions and role assignments, manage quotas and capacity, create Foundry resources. USE FOR: Microsoft Foundry, AI Foundry, deploy model, model catalog, RAG, knowledge index, create agent, evaluate agent, agent monitoring, create Foundry project, new Foundry project, set up Foundry, onboard to Foundry, provi
This skill enables Claude to monitor and analyze CPU usage patterns within applications. It helps identify CPU hotspots, analyze algorithmic complexity, and detect blocking operations. Use this skill when the user asks to "monitor CPU usage", "optimize CPU performance", "analyze CPU load", or "find CPU bottlenecks". It assists in identifying inefficient loops, regex performance issues, and provides optimization recommendations. This skill is designed for improving application performance by addr
This skill enables Claude to create intelligent alerting rules for proactive performance monitoring. It is triggered when the user requests to "create alerts", "define monitoring rules", or "set up alerting". The skill helps define thresholds, routing, and escalation policies, and offers options for multi-category alert creation, including latency, error rate, throughput, resource utilization, availability, and SLO violation alerts. It is useful for Site Reliability Engineers (SREs) and DevOps t
This skill enables Claude to monitor and analyze application error rates to improve reliability. It is used when the user needs to track and understand errors occurring in their application, including HTTP errors, application exceptions, database errors, external API errors, background job errors, and client-side errors. Use this skill when the user asks to "monitor errors", "analyze error rates", "track application errors", or requests help with "error monitoring". It sets up comprehensive erro