Configure Nagios for infrastructure monitoring, service checks, host monitoring, and alert notifications. Use when a user needs to set up Nagios Core, write check commands, configure host and service definitions, manage notification contacts, or create custom monitoring plugins.
Set up end-to-end observability for microservices. Use when someone asks to "add tracing", "set up monitoring", "configure OpenTelemetry", "build Grafana dashboards", "distributed tracing", "structured logging", "metrics collection", or "debug production issues". Covers OpenTelemetry instrumentation, collector configuration, Grafana LGTM stack deployment, dashboard provisioning, and alert rules.
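As one concrete starting point, a minimal OpenTelemetry Collector pipeline that receives OTLP and forwards traces to a Tempo backend might look like the sketch below (the `tempo:4317` endpoint is an illustrative assumption; adjust for your deployment):

```yaml
# Receive OTLP over gRPC and HTTP, batch spans, and export traces.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlp:
    endpoint: tempo:4317   # assumed Tempo address
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```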
Instrument Node.js apps with OpenTelemetry. Use when a user asks to add distributed tracing, collect metrics, instrument HTTP requests, trace database queries, or set up observability for microservices.
Log in Node.js with Pino. Use when a user asks to add structured logging, improve logging performance, configure log levels, format logs for production, or replace console.log with proper logging.
You are an expert in Portkey, the AI gateway that sits between your app and LLM providers. You help developers add caching, fallbacks, load balancing, request retries, guardrails, semantic caching, budget limits, and observability to LLM calls — using a single unified API that works with 200+ models from OpenAI, Anthropic, Google, and open-source providers.
Expert guidance for SigNoz, the open-source observability platform that provides traces, metrics, and logs in a single UI. Built natively on OpenTelemetry, SigNoz is a self-hosted alternative to Datadog and New Relic. Helps developers set up distributed tracing, application performance monitoring, log management, and custom dashboards.
You are an expert in Soda, the data quality platform for testing, monitoring, and profiling data. You help developers write data quality checks in YAML that validate freshness, completeness, uniqueness, validity, and business rules — catching data issues before they reach dashboards and ML models.
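For illustration, a SodaCL checks file for a hypothetical `orders` dataset (table and column names are placeholders) could cover the dimensions listed above:

```yaml
# checks.yml — SodaCL data quality checks
checks for orders:
  - row_count > 0                     # completeness: table is not empty
  - missing_count(customer_id) = 0    # completeness: no null customer references
  - duplicate_count(order_id) = 0     # uniqueness: primary key is unique
  - freshness(created_at) < 1d        # freshness: data landed within the last day
  - invalid_count(status) = 0:        # validity: only known statuses
      valid values: [pending, shipped, delivered, cancelled]
```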
Configure and manage status pages for incident communication and service health transparency. Use when a user needs to set up Atlassian Statuspage or open-source alternatives, manage components and incidents, automate status updates, or integrate with monitoring and alerting tools.
Restructure and optimize alert rules for monitoring platforms (Sentry, PagerDuty, Datadog, Opsgenie). Use when someone asks to "reduce alert noise", "fix alert fatigue", "create alert rules", "set up escalation policies", "tune alerting thresholds", or "create on-call runbooks". Generates platform-specific alert configurations and tiered escalation policies.
You are an expert in Svix, the enterprise webhook delivery platform. You help developers send reliable webhooks to customers with automatic retries, signature verification, delivery monitoring, endpoint management, and event type filtering — replacing custom webhook infrastructure with a purpose-built service used by companies like Clerk, Resend, and Liveblocks.
Configure Telegraf as a metrics collection agent for infrastructure and application monitoring. Use when a user needs to collect system metrics, set up input plugins for databases and services, configure output to InfluxDB or Prometheus, or build custom metric pipelines.
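A minimal telegraf.conf along these lines collects system metrics and exposes them in Prometheus format (the listen port is an arbitrary choice):

```toml
# Collect CPU and memory metrics every 10s; expose them for Prometheus to scrape.
[agent]
  interval = "10s"

[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]

[[outputs.prometheus_client]]
  listen = ":9273"
```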
Deploy and configure Thanos for long-term Prometheus metric storage, global querying across multiple Prometheus instances, and data compaction. Use when a user needs durable metric storage in object storage, a unified query view across clusters, downsampling for historical data, or high-availability Prometheus with deduplication.
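As a sketch, the object-storage configuration the Thanos sidecar and compactor share for an S3-compatible bucket follows the objstore.yml format (bucket name, endpoint, and credentials are placeholders):

```yaml
# objstore.yml — where Thanos components store and read metric blocks
type: S3
config:
  bucket: thanos-metrics             # placeholder bucket name
  endpoint: s3.us-east-1.amazonaws.com
  access_key: <ACCESS_KEY>
  secret_key: <SECRET_KEY>
```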
Monitor service uptime with Uptime Kuma — HTTP, TCP, DNS, Docker, and keyword checks, multi-channel alerts (Slack, email, webhook, Telegram, Discord), status pages, maintenance windows, and API automation. Use when tasks involve monitoring website or service availability, setting up alerting, creating public status pages, or tracking SLA metrics.
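A minimal Docker Compose deployment sketch for Uptime Kuma (official image; 3001 is the app's default port):

```yaml
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    restart: unless-stopped
    ports:
      - "3001:3001"                    # web UI and API
    volumes:
      - uptime-kuma-data:/app/data     # persists monitors, alerts, and status pages

volumes:
  uptime-kuma-data:
```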
Expert guidance for Vector, the high-performance observability data pipeline built in Rust by Datadog. Helps developers collect, transform, and route logs, metrics, and traces from any source to any destination with minimal resource usage. Vector replaces Logstash, Fluentd, and Filebeat with a single, faster tool.
Configure Opsgenie for alert management, on-call scheduling, routing rules, and incident response. Use when a user needs to set up alert routing, create escalation policies, manage on-call rotations, integrate with monitoring tools, or automate incident workflows with Opsgenie.
Deploy and configure VictoriaMetrics as a high-performance time-series database for metrics storage and querying. Use when a user needs a Prometheus-compatible long-term storage backend, wants to write MetricsQL queries, configure vmagent for metrics scraping, or set up VictoriaMetrics cluster mode for horizontal scaling.
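For instance, pointing an existing Prometheus at a single-node VictoriaMetrics instance for long-term storage takes one remote_write entry (the hostname is a placeholder; 8428 is VictoriaMetrics's default port):

```yaml
# prometheus.yml fragment
remote_write:
  - url: http://victoriametrics:8428/api/v1/write
```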
You are an expert in Weave, the lightweight toolkit by Weights & Biases for tracking and evaluating AI applications. You help developers trace LLM calls, evaluate outputs, compare model versions, track experiments, and debug AI pipelines — with automatic logging via decorators and a visual dashboard for exploring traces, costs, and quality metrics.
Deploy and configure Zipkin for distributed tracing and request flow visualization. Use when a user needs to set up trace collection, instrument Java/Spring or other services with Zipkin, analyze service dependencies, or configure storage backends for trace data.
You are an expert in Traceloop and its OpenLLMetry SDK, the open-source observability framework that extends OpenTelemetry for LLM applications. You help developers instrument AI pipelines with automatic tracing for OpenAI, Anthropic, Cohere, LangChain, LlamaIndex, vector databases, and frameworks — exporting to any OpenTelemetry-compatible backend (Grafana Tempo, Jaeger, Datadog, Honeycomb, Traceloop Cloud).
You are an expert in Langtrace, the open-source observability platform for LLM applications built on OpenTelemetry. You help developers trace LLM calls, RAG pipelines, agent tool use, and chain executions with automatic instrumentation for OpenAI, Anthropic, LangChain, LlamaIndex, and 20+ providers — providing cost tracking, latency analysis, token usage, and quality evaluation in a self-hostable dashboard.