> deploying-monitoring-stacks
Use when deploying monitoring stacks, including Prometheus, Grafana, and Datadog. Trigger with phrases like "deploy monitoring stack", "setup prometheus", "configure grafana", or "install datadog agent". Generates production-ready configurations with metric collection, visualization dashboards, and alerting rules.
Deploying Monitoring Stacks
Overview
Deploy production monitoring stacks (Prometheus + Grafana, Datadog, or Victoria Metrics) with metric collection, custom dashboards, and alerting rules. Configure exporters, scrape targets, recording rules, and notification channels for comprehensive infrastructure and application observability.
Prerequisites
- Target infrastructure identified: Kubernetes cluster, Docker hosts, or bare-metal servers
- Metric endpoints accessible from the monitoring platform (application `/metrics` endpoints, node exporters)
- Storage backend capacity planned for time-series data (Prometheus TSDB, or Thanos/Cortex for long-term retention)
- Alert notification channels defined: Slack webhook, PagerDuty integration key, or email SMTP
- Helm 3+ for Kubernetes deployments using kube-prometheus-stack or similar charts
Instructions
- Select the monitoring platform: Prometheus + Grafana for open-source self-hosted, Datadog for managed SaaS, Victoria Metrics for high-cardinality workloads
- Deploy the monitoring stack: `helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack` on Kubernetes, or Docker Compose on non-Kubernetes hosts
- Install exporters on monitored systems: node-exporter for host metrics, kube-state-metrics for Kubernetes object states, and application-specific exporters
- Configure scrape targets in `prometheus.yml`: define job names, scrape intervals, and relabeling rules for service discovery
- Create recording rules for frequently queried aggregations to reduce dashboard query load
- Define alerting rules with meaningful thresholds: high CPU (>80% for 5m), high memory (>90%), error rate (>1%), latency P99 (>500ms)
- Configure Alertmanager with routing, grouping, and notification channels (Slack, PagerDuty, email)
- Build Grafana dashboards: RED metrics (Rate, Errors, Duration) for services, USE metrics (Utilization, Saturation, Errors) for resources
- Set up data retention: configure TSDB retention period (15-30 days local), set up Thanos/Cortex for long-term storage if needed
- Test the full pipeline: trigger a test alert and verify notification delivery
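For non-Kubernetes hosts, the deploy step above can be sketched as a Docker Compose file. This is a minimal sketch, not a hardened deployment: image tags, ports, and the admin password are assumptions to pin down and replace for your environment.

```yaml
# docker-compose.yml -- minimal Prometheus + Grafana + node-exporter stack.
# Image tags and the Grafana password below are placeholders.
services:
  prometheus:
    image: prom/prometheus:latest
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention.time=15d   # local retention (see the retention step)
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=change-me   # placeholder credential
    ports:
      - "3000:3000"
  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
volumes:
  prometheus-data:
```

The `--storage.tsdb.retention.time` flag implements the local 15-30 day retention described above; long-term storage (Thanos/Cortex) is layered on separately.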
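The scrape-target step can be sketched as a `prometheus.yml`. Job names, target hostnames, and the `request_id` label are assumptions for illustration; the `labeldrop` rule shows the high-cardinality fix referenced in Error Handling.

```yaml
# prometheus.yml sketch -- target hostnames and job names are hypothetical.
global:
  scrape_interval: 30s
  evaluation_interval: 30s

rule_files:
  - /etc/prometheus/rules/*.yml   # recording and alerting rules

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["node-exporter:9100"]
  - job_name: app
    scrape_interval: 15s
    static_configs:
      - targets: ["app-1:8080", "app-2:8080"]   # hypothetical app hosts
    metric_relabel_configs:
      - regex: request_id        # drop an unbounded label to control cardinality
        action: labeldrop
```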
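The recording-rule and alerting-rule steps can be combined in one rules file. The metric names (`http_requests_total`, `http_request_duration_seconds_bucket`) follow common instrumentation conventions but are assumptions; the thresholds mirror the ones listed above, and the recording rules double as the Rate/Errors queries a RED dashboard would use.

```yaml
# /etc/prometheus/rules/app.yml sketch -- metric names are assumed conventions.
groups:
  - name: recording
    rules:
      - record: job:http_requests:rate5m           # Rate (RED)
        expr: sum by (job) (rate(http_requests_total[5m]))
      - record: job:http_errors:ratio5m            # Errors (RED)
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum by (job) (rate(http_requests_total[5m]))
  - name: alerting
    rules:
      - alert: HighCPU
        expr: 100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 > 80
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "CPU above 80% on {{ $labels.instance }} for 5m"
      - alert: HighErrorRate
        expr: job:http_errors:ratio5m > 0.01       # error rate > 1%
        for: 5m
        labels: {severity: critical}
      - alert: HighP99Latency                      # Duration (RED)
        expr: histogram_quantile(0.99, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5
        for: 5m
        labels: {severity: warning}
```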
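The Alertmanager step can be sketched as a routing tree with two receivers. The webhook URL, channel, and PagerDuty routing key are placeholders to replace with your own channel details.

```yaml
# alertmanager.yml sketch -- webhook URL and routing key are placeholders.
route:
  receiver: slack-default
  group_by: [alertname, job]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers: ['severity="critical"']   # page only on critical alerts
      receiver: pagerduty-oncall

receivers:
  - name: slack-default
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE_ME
        channel: "#alerts"
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: REPLACE_ME   # PagerDuty integration key
```

Validate the file with `amtool check-config alertmanager.yml` before rolling it out, as noted in Error Handling.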
Output
- Helm values file or Docker Compose for the monitoring stack
- Prometheus configuration with scrape targets, recording rules, and alerting rules
- Alertmanager configuration with routing tree and notification receivers
- Grafana dashboard JSON files for infrastructure and application metrics
- Exporter deployment manifests (node-exporter DaemonSet, application ServiceMonitor)
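One of the outputs above, a ServiceMonitor, can be sketched as follows. The names and labels are hypothetical; the `release` label must match whatever label selector your Prometheus instance uses to discover ServiceMonitors.

```yaml
# ServiceMonitor sketch for kube-prometheus-stack -- names/labels are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-metrics
  labels:
    release: kube-prometheus-stack   # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app                    # label on the target Service
  endpoints:
    - port: http                     # named port on the Service
      path: /metrics
      interval: 30s
```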
Error Handling
| Error | Cause | Solution |
|---|---|---|
| No data points in dashboard | Scrape target not reachable or metric name wrong | Check the Targets page in the Prometheus UI; verify service discovery and the metric name |
| Too many time series (high cardinality) | Labels with unbounded values (user IDs, request IDs) | Remove high-cardinality labels with `metric_relabel_configs`; use recording rules for aggregation |
| Alert condition met but no notification | Alertmanager routing or receiver misconfigured | Validate the Alertmanager config with `amtool check-config`; verify routing with `amtool config routes test` |
| Prometheus OOMKilled | Insufficient memory for the series count | Increase memory limits; reduce scrape targets or retention; enable WAL compression |
| Grafana datasource connection failed | Wrong Prometheus URL or a network policy blocking access | Verify the datasource URL in Grafana; check the Kubernetes service name and port; review network policies |
Examples
- "Deploy kube-prometheus-stack on Kubernetes with alerts for node CPU > 80%, pod restart count > 5, and API error rate > 1%, sending to Slack."
- "Set up Prometheus + Grafana on Docker Compose for monitoring 10 application servers with node-exporter and custom application metrics."
- "Create Grafana dashboards for the four golden signals (latency, traffic, errors, saturation) for a microservices application."
Resources
- Prometheus documentation: https://prometheus.io/docs/
- Grafana documentation: https://grafana.com/docs/grafana/latest/
- kube-prometheus-stack: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
- Alerting best practices: https://prometheus.io/docs/practices/alerting/
- Datadog documentation: https://docs.datadoghq.com/