found 157 skills in registry
Configure Prometheus Alertmanager for alert routing, grouping, silencing, and notification delivery. Use when a user needs to set up alert receivers (Slack, PagerDuty, email), define routing trees, manage silences and inhibition rules, or troubleshoot alert delivery pipelines.
Apache Airflow is a platform for programmatically authoring, scheduling, and monitoring workflows. Learn to write DAGs, use operators, set up connections, configure scheduling, and deploy with Docker Compose.
Manage Railway deployments using the CLI. Use when a user asks to deploy to Railway, check deployment status, manage Railway services, set environment variables on Railway, view Railway logs, link a Railway project, add a database on Railway, scale Railway services, manage Railway environments, rollback a Railway deployment, or run commands with Railway env vars. Covers the full deploy lifecycle from project setup to production monitoring.
Spring Boot is a Java framework that simplifies building production-ready applications. It provides auto-configuration, embedded servers, and opinionated defaults for REST APIs, data access with JPA, security, and monitoring via Actuator.
Grafana is an open-source visualization and dashboarding platform that connects to dozens of data sources including Prometheus, PostgreSQL, ClickHouse, and Elasticsearch. It lets you build interactive dashboards with panels, set up alerting rules, and manage everything as code through JSON dashboard definitions and provisioning configuration.
Configure and manage Datadog for infrastructure monitoring, application performance monitoring (APM), log management, and alerting. Use when a user needs to set up Datadog agents, create dashboards, configure monitors and alerts, integrate services, or query metrics and logs through Datadog's API.
You are an expert in Arize and its open-source Phoenix library for AI observability. You help developers monitor LLM applications with tracing, evaluation, embedding analysis, drift detection, and retrieval quality metrics — using Phoenix for local development (open-source, self-hosted) and Arize platform for production monitoring at scale.
Expert guidance for Checkly, the synthetic monitoring platform that runs Playwright-based browser checks and API checks from locations worldwide. Helps developers implement monitoring-as-code (MaC) with the Checkly CLI, set up API and browser checks, configure alerting, and integrate monitoring into CI/CD pipelines.
Run AI agent code safely in isolated sandboxes with resource limits, audit trails, and kill switches. Use when someone asks to "sandbox my agent", "run agent code safely", "add guardrails to AI agent", "isolate agent execution", "audit agent actions", "prevent agent from deleting files", "restrict agent permissions", or "add safety controls to AI coding agent". Covers Docker isolation, filesystem restrictions, network policies, resource locking, and comprehensive audit logging.
Deploy and configure 3proxy — a lightweight universal proxy server. Use when a user asks to set up HTTP, HTTPS, SOCKS4, SOCKS5, or transparent proxies, build proxy chains, configure authentication, set bandwidth limits, manage access control lists, set up proxy rotation, create multi-port proxy servers, configure logging and traffic accounting, or deploy a lightweight proxy without heavy VPN overhead. Covers all 3proxy features including proxy chaining, ACLs, traffic shaping, and multi-protocol
You are an expert in AWS Lambda Powertools, the developer toolkit for implementing serverless best practices. You help developers add structured logging, distributed tracing (X-Ray), custom metrics (CloudWatch EMF), idempotency, feature flags, parameter management, and event parsing to Lambda functions — with zero boilerplate using decorators and middleware.
Linkerd lightweight service mesh for Kubernetes. Use when the user needs automatic mTLS, traffic splitting, retries, and observability with minimal resource overhead and operational complexity.
Assists with monitoring application errors, performance, and user experience using Sentry. Use when integrating Sentry SDKs, configuring alerting, analyzing stack traces, uploading source maps, or tracking release health in production. Trigger words: sentry, error monitoring, error tracking, performance monitoring, source maps, session replay.
This skill automates the setup of distributed tracing for microservices. It helps developers implement end-to-end request visibility by configuring context propagation, span creation, trace collection, and analysis. Use this skill when the user requests to set up distributed tracing, implement observability, or troubleshoot performance issues in a microservices architecture. The skill is triggered by phrases such as "setup tracing", "implement distributed tracing", "configure opentelemetry", or
This skill automates the setup of machine learning experiment tracking using tools like MLflow or Weights & Biases (W&B). It is triggered when the user requests to "track experiments", "setup experiment tracking", "initialize MLflow", or "integrate W&B". The skill configures the necessary environment, initializes the tracking server (if needed), and provides code snippets for logging experiment parameters, metrics, and artifacts. It helps ensure reproducibility and simplifies the comparison of d
Expert service mesh architect specializing in Istio, Linkerd, and cloud-native networking patterns. Masters traffic management, security policies, observability integration, and multi-cluster mesh con
通过可观测性、安全边界和生命周期管理来操作长期运行的代理工作负载。
This skill helps implement database audit logging for tracking changes and ensuring compliance. It is triggered when the user requests to "implement database audit logging", "add audit trails", "track database changes", or mentions "audit_log" in relation to a database. The skill provides options for trigger-based auditing, application-level logging, Change Data Capture (CDC), and parsing database logs. It generates a basic audit table schema and guides the user through selecting the appropriate
Set up Grafana, Prometheus, and Loki for metrics, logs, and tracing.
Expert performance engineer specializing in modern observability, application optimization, and scalable system performance. Masters OpenTelemetry, distributed tracing, load testing, multi-tier caching, Core Web Vitals, and performance monitoring. Handles end-to-end optimization, real user monitoring, and scalability patterns. Use PROACTIVELY for performance optimization, observability, or scalability challenges.