> coreweave-observability

Set up GPU monitoring and observability for CoreWeave workloads. Use when implementing GPU metrics dashboards, configuring alerts, or tracking inference latency and throughput. Trigger with phrases like "coreweave monitoring", "coreweave observability", "coreweave gpu metrics", "coreweave grafana".


CoreWeave Observability

GPU Metrics (DCGM Exporter)

CoreWeave Kubernetes Service (CKS) clusters come with the DCGM exporter pre-installed. Key metrics:

Metric	Description
`DCGM_FI_DEV_GPU_UTIL`	GPU core utilization %
`DCGM_FI_DEV_FB_USED`	GPU framebuffer memory used (MiB)
`DCGM_FI_DEV_FB_FREE`	GPU framebuffer memory free (MiB)
`DCGM_FI_DEV_POWER_USAGE`	Power consumption (W)
`DCGM_FI_DEV_GPU_TEMP`	GPU temperature (°C)
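
As a rough illustration of how these metrics combine, the sketch below defines Prometheus recording rules that precompute a per-GPU memory utilization ratio and an average core utilization across all scraped GPUs. The group name and the `record` names are hypothetical choices for this example, not something CKS is known to ship; adjust them to your own naming scheme.

```yaml
groups:
  - name: coreweave-gpu-recording        # hypothetical group name
    rules:
      # Fraction of framebuffer memory in use per GPU (0.0 to 1.0).
      - record: gpu:fb_memory_used:ratio  # hypothetical record name
        expr: DCGM_FI_DEV_FB_USED / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE)

      # Average core utilization (%) across every GPU that Prometheus scrapes.
      - record: cluster:gpu_util:avg      # hypothetical record name
        expr: avg(DCGM_FI_DEV_GPU_UTIL)
```

Dashboards and alerts can then reference the recorded series instead of repeating the full expression.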

Prometheus Alert Rules

groups:
  - name: coreweave-gpu
    rules:
      - alert: GPUUtilizationLow
        expr: avg(DCGM_FI_DEV_GPU_UTIL) < 20
        for: 30m
        labels: { severity: warning }
        annotations:
          summary: "GPU utilization below 20% for 30min -- consider scaling down"

      - alert: GPUMemoryHigh
        expr: DCGM_FI_DEV_FB_USED / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE) > 0.95
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "GPU memory >95% -- risk of OOM"

      - alert: InferencePodDown
        expr: kube_deployment_status_replicas_available{deployment=~".*inference.*"} == 0
        for: 2m
        labels: { severity: critical }
        annotations:
          summary: "Inference deployment {{ $labels.deployment }} has no available replicas"
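
One way to check rules like these before deploying them is a `promtool` unit test. The sketch below assumes the rules above are saved as `coreweave-gpu.rules.yml` (a hypothetical filename) and feeds in made-up samples for a single GPU; the `gpu="0"` label and the memory values are illustrative, not taken from a real cluster.

```yaml
rule_files:
  - coreweave-gpu.rules.yml          # hypothetical filename for the rules above

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # 78000 MiB used and 2000 MiB free gives 97.5% usage, held for 10 minutes.
      - series: 'DCGM_FI_DEV_FB_USED{gpu="0"}'
        values: '78000+0x10'
      - series: 'DCGM_FI_DEV_FB_FREE{gpu="0"}'
        values: '2000+0x10'
    alert_rule_test:
      # After 2 minutes the alert is still pending, so nothing should fire.
      - eval_time: 2m
        alertname: GPUMemoryHigh
        exp_alerts: []
      # After 6 minutes the 5m "for" window has elapsed and the alert fires.
      - eval_time: 6m
        alertname: GPUMemoryHigh
        exp_alerts:
          - exp_labels:
              severity: critical
              gpu: "0"
            exp_annotations:
              summary: "GPU memory >95% -- risk of OOM"
```

Saved as, say, `coreweave-gpu.test.yml` (also a hypothetical name), it can be run with `promtool test rules coreweave-gpu.test.yml`.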

Resources

CoreWeave Observability

Next Steps

For incident response, see coreweave-incident-runbook.


Repo: jeremylongshore/claude-code-plugins-plus-skills (by jeremylongshore). First seen Mar 23, 2026; 2.4K GitHub stars; 0 installs/week.