
Reference architecture for CoreWeave GPU cloud deployments. Use when designing ML infrastructure, planning multi-model serving, or establishing CoreWeave deployment standards. Trigger with phrases like "coreweave architecture", "coreweave design", "coreweave infrastructure", "coreweave best practices".


CoreWeave Reference Architecture

Architecture Diagram

                    ┌─────────────────────┐
                    │   Load Balancer     │
                    │   (Ingress/LB)      │
                    └──────────┬──────────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
     ┌────────▼──────┐ ┌──────▼────────┐ ┌─────▼───────┐
     │ Model A       │ │ Model B       │ │ Model C     │
      │ (vLLM, A100)  │ │ (TGI, H100)   │ │ (SD, L40)   │
     │ 2 replicas    │ │ 1 replica     │ │ 3 replicas  │
     └───────────────┘ └───────────────┘ └─────────────┘
              │                │                │
     ┌────────▼────────────────▼────────────────▼───────┐
     │              Shared Storage (PVC)                │
     │         Models / Checkpoints / Data              │
     └──────────────────────────────────────────────────┘
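A minimal sketch of how one serving tier in the diagram might be expressed as a Kubernetes manifest. The names, image, and the `gpu.nvidia.com/class` affinity value are illustrative assumptions (CoreWeave schedules GPU workloads via node labels, but verify the exact label values for your region), not part of the original skill:

```yaml
# Hypothetical Deployment for the "Model A" tier (vLLM on A100, 2 replicas).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-a-vllm
spec:
  replicas: 2
  selector:
    matchLabels: {app: model-a}
  template:
    metadata:
      labels: {app: model-a}
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: gpu.nvidia.com/class   # CoreWeave GPU class label (assumed value)
                    operator: In
                    values: ["A100_PCIE_80GB"]
      containers:
        - name: vllm
          image: ghcr.io/example/vllm:latest    # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1                 # one GPU per replica
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: shared-models            # the shared PVC tier in the diagram
```

The same pattern applies to the Model B and Model C tiers, changing only the replica count, image, and GPU class value.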

Project Structure

ml-platform/
├── k8s/
│   ├── base/                    # Shared templates
│   ├── models/
│   │   ├── llama-8b/           # Per-model manifests
│   │   ├── llama-70b/
│   │   └── stable-diffusion/
│   └── infra/
│       ├── storage.yaml         # PVCs
│       ├── secrets.yaml         # Model tokens
│       └── monitoring.yaml      # Prometheus rules
├── containers/
│   ├── vllm/Dockerfile
│   └── custom-server/Dockerfile
├── scripts/
│   ├── deploy.sh
│   └── benchmark.sh
└── monitoring/
    ├── grafana-dashboards/
    └── alert-rules.yaml
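A sketch of what `k8s/infra/storage.yaml` might contain for the shared storage tier: a ReadWriteMany PVC so every model replica can mount the same weights. The storage class name is an assumption; use whatever shared filesystem class your CoreWeave region exposes:

```yaml
# Hypothetical k8s/infra/storage.yaml: shared PVC for models/checkpoints/data.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-models
spec:
  accessModes: [ReadWriteMany]      # multiple replicas mount the same volume
  storageClassName: shared-ssd      # assumed class name; check your cluster
  resources:
    requests:
      storage: 500Gi                # size model/checkpoint storage generously
```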

Key Design Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Serving framework | vLLM | Continuous batching, PagedAttention |
| GPU type (production) | A100 80GB | Best price/performance for inference |
| Storage | Shared PVC (SSD) | Fast model loading across replicas |
| Autoscaling | KServe + Knative | Native scale-to-zero support |
| Container registry | GHCR | GitHub integration, free for public repositories |
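The autoscaling choice above can be sketched as a KServe InferenceService: Knative handles request-driven scaling, and `minReplicas: 0` enables scale-to-zero for idle models. The service name and image are placeholders, not from the original:

```yaml
# Hypothetical KServe InferenceService with Knative scale-to-zero.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-8b
spec:
  predictor:
    minReplicas: 0                          # scale to zero when idle
    maxReplicas: 4                          # cap GPU spend under load
    containers:
      - name: kserve-container
        image: ghcr.io/example/vllm:latest  # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1
```

Note that scale-to-zero trades idle cost for cold-start latency, so it suits bursty or low-traffic models better than latency-sensitive production paths.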


Next Steps

For multi-environment setup, see coreweave-multi-env-setup.


Repo: jeremylongshore/claude-code-plugins-plus-skills