found 172 skills in registry
Apache Airflow is a platform for programmatically authoring, scheduling, and monitoring workflows. Learn to write DAGs, use operators, set up connections, configure scheduling, and deploy with Docker Compose.
Expert guidance for Comet ML, the platform for tracking machine learning experiments, managing models, and monitoring production ML systems. Helps developers log experiments, compare model versions, and build reproducible ML pipelines with automatic code/data versioning.
Expert guidance for Gatus, the lightweight, self-hosted health check and status page tool written in Go. Helps developers set up endpoint monitoring with conditions, alerting, and a built-in status page, all configured via a single YAML file with no database required.
Istio service mesh for Kubernetes traffic management, security, and observability. Use when the user needs to configure traffic routing, mTLS, circuit breaking, fault injection, or observability for microservices.
Deploy and use Jaeger for distributed tracing across microservices. Use when a user needs to set up trace collection, instrument applications with OpenTelemetry, analyze trace data to find latency bottlenecks, or configure Jaeger storage backends and sampling strategies.
Assists with instrumenting applications using OpenTelemetry for distributed tracing, metrics, and logs. Use when adding observability, configuring auto-instrumentation, building custom spans, setting up OTel Collectors, or exporting telemetry to Jaeger, Grafana, or Datadog. Trigger words: opentelemetry, otel, tracing, spans, metrics, observability, collector.
Set up and manage New Relic for full-stack observability including APM, browser monitoring, infrastructure monitoring, and alerting. Use when a user needs to instrument applications, write NRQL queries, create dashboards, configure alert policies, or integrate New Relic with their deployment pipeline.
Configure Nagios for infrastructure monitoring, service checks, host monitoring, and alert notifications. Use when a user needs to set up Nagios Core, write check commands, configure host and service definitions, manage notification contacts, or create custom monitoring plugins.
Set up end-to-end observability for microservices. Use when someone asks to "add tracing", "set up monitoring", "configure OpenTelemetry", "build Grafana dashboards", or "debug production issues", or mentions "distributed tracing", "structured logging", or "metrics collection". Covers OpenTelemetry instrumentation, collector configuration, Grafana LGTM stack deployment, dashboard provisioning, and alert rules.
Instrument Node.js apps with OpenTelemetry. Use when a user asks to add distributed tracing, collect metrics, instrument HTTP requests, trace database queries, or set up observability for microservices.
Add structured logging to Node.js apps with Pino. Use when a user asks to add structured logging, improve logging performance, configure log levels, format logs for production, or replace console.log with proper logging.
You are an expert in Portkey, the AI gateway that sits between your app and LLM providers. You help developers add caching, fallbacks, load balancing, request retries, guardrails, semantic caching, budget limits, and observability to LLM calls — using a single unified API that works with 200+ models from OpenAI, Anthropic, Google, and open-source providers.
You are an expert in AWS Lambda Powertools, the developer toolkit for implementing serverless best practices. You help developers add structured logging, distributed tracing (X-Ray), custom metrics (CloudWatch EMF), idempotency, feature flags, parameter management, and event parsing to Lambda functions, with minimal boilerplate via decorators and middleware.
Expert guidance for SigNoz, the open-source observability platform that provides traces, metrics, and logs in a single UI. Built natively on OpenTelemetry, SigNoz is a self-hosted alternative to Datadog and New Relic. Helps developers set up distributed tracing, application performance monitoring, log management, and custom dashboards.
You are an expert in Soda, the data quality platform for testing, monitoring, and profiling data. You help developers write data quality checks in YAML that validate freshness, completeness, uniqueness, validity, and business rules — catching data issues before they reach dashboards and ML models.
Configure and manage status pages for incident communication and service health transparency. Use when a user needs to set up Atlassian Statuspage or open-source alternatives, manage components and incidents, automate status updates, or integrate with monitoring and alerting tools.
You are an expert in Svix, the enterprise webhook delivery platform. You help developers send reliable webhooks to customers with automatic retries, signature verification, delivery monitoring, endpoint management, and event type filtering — replacing custom webhook infrastructure with a purpose-built service used by companies like Clerk, Resend, and Liveblocks.
Configure Telegraf as a metrics collection agent for infrastructure and application monitoring. Use when a user needs to collect system metrics, set up input plugins for databases and services, configure output to InfluxDB or Prometheus, or build custom metric pipelines.
Deploy and configure Thanos for long-term Prometheus metric storage, global querying across multiple Prometheus instances, and data compaction. Use when a user needs durable metric storage in object storage, a unified query view across clusters, downsampling for historical data, or high-availability Prometheus with deduplication.
Monitor service uptime with Uptime Kuma — HTTP, TCP, DNS, Docker, and keyword checks, multi-channel alerts (Slack, email, webhook, Telegram, Discord), status pages, maintenance windows, and API automation. Use when tasks involve monitoring website or service availability, setting up alerting, creating public status pages, or tracking SLA metrics.