> model-deployment
Deploy ML models with FastAPI, Docker, Kubernetes. Use for serving predictions, containerization, monitoring, drift detection, or encountering latency issues, health check failures, version conflicts.
curl "https://skillshub.wtf/secondsky/claude-skills/model-deployment?format=md"ML Model Deployment
Deploy trained models to production with proper serving and monitoring.
Deployment Options
| Method | Use Case | Latency |
|---|---|---|
| REST API | Web services | Medium |
| Batch | Large-scale processing | N/A |
| Streaming | Real-time | Low |
| Edge | On-device | Very low |
FastAPI Model Server
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np
app = FastAPI()
model = joblib.load('model.pkl')
class PredictionRequest(BaseModel):
features: list[float]
class PredictionResponse(BaseModel):
prediction: float
probability: float
@app.get('/health')
def health():
return {'status': 'healthy'}
@app.post('/predict', response_model=PredictionResponse)
def predict(request: PredictionRequest):
features = np.array(request.features).reshape(1, -1)
prediction = model.predict(features)[0]
probability = model.predict_proba(features)[0].max()
return PredictionResponse(prediction=prediction, probability=probability)
Docker Deployment
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.pkl .
COPY app.py .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
Model Monitoring
class ModelMonitor:
def __init__(self):
self.predictions = []
self.latencies = []
def log_prediction(self, input_data, prediction, latency):
self.predictions.append({
'input': input_data,
'prediction': prediction,
'latency': latency,
'timestamp': datetime.now()
})
def detect_drift(self, reference_distribution):
# Compare current predictions to reference
pass
Deployment Checklist
- Model validated on test set
- API endpoints documented
- Health check endpoint
- Authentication configured
- Logging and monitoring setup
- Model versioning in place
- Rollback procedure documented
Quick Start: Deploy Model in 6 Steps
# 1. Save trained model
import joblib
joblib.dump(model, 'model.pkl')
# 2. Create FastAPI app (see references/fastapi-production-server.md)
# app.py with /predict and /health endpoints
# 3. Create Dockerfile
cat > Dockerfile << 'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py model.pkl ./
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
EOF
# 4. Build and test locally
docker build -t model-api:v1.0.0 .
docker run -p 8000:8000 model-api:v1.0.0
# 5. Push to registry
docker tag model-api:v1.0.0 registry.example.com/model-api:v1.0.0
docker push registry.example.com/model-api:v1.0.0
# 6. Deploy to Kubernetes
kubectl apply -f deployment.yaml
kubectl rollout status deployment/model-api
Known Issues Prevention
1. No Health Checks = Downtime
Problem: Load balancer sends traffic to unhealthy pods, causing 503 errors.
Solution: Implement both liveness and readiness probes:
# app.py
@app.get("/health") # Liveness: Is service alive?
async def health():
return {"status": "healthy"}
@app.get("/ready") # Readiness: Can handle traffic?
async def ready():
try:
_ = model_store.model # Verify model loaded
return {"status": "ready"}
except:
raise HTTPException(503, "Not ready")
# deployment.yaml
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 30
readinessProbe:
httpGet:
path: /ready
port: 8000
initialDelaySeconds: 5
2. Model Not Found Errors in Container
Problem: FileNotFoundError: model.pkl when container starts.
Solution: Verify model file is copied in Dockerfile and path matches:
# ❌ Wrong: Model in wrong directory
COPY model.pkl /app/models/ # But code expects /app/model.pkl
# ✅ Correct: Consistent paths
COPY model.pkl /models/model.pkl
ENV MODEL_PATH=/models/model.pkl
# In Python:
model_path = os.getenv("MODEL_PATH", "/models/model.pkl")
3. Unhandled Input Validation = 500 Errors
Problem: Invalid inputs crash API with unhandled exceptions.
Solution: Use Pydantic for automatic validation:
from pydantic import BaseModel, Field, validator
class PredictionRequest(BaseModel):
features: List[float] = Field(..., min_items=1, max_items=100)
@validator('features')
def validate_finite(cls, v):
if not all(np.isfinite(val) for val in v):
raise ValueError("All features must be finite")
return v
# FastAPI auto-validates and returns 422 for invalid requests
@app.post("/predict")
async def predict(request: PredictionRequest):
# Request is guaranteed valid here
pass
4. No Drift Monitoring = Silent Degradation
Problem: Model performance degrades over time, no one notices until users complain.
Solution: Implement drift detection (see references/model-monitoring-drift.md):
monitor = ModelMonitor(reference_data=training_data, drift_threshold=0.1)
@app.post("/predict")
async def predict(request: PredictionRequest):
prediction = model.predict(features)
monitor.log_prediction(features, prediction, latency)
# Alert if drift detected
if monitor.should_retrain():
alert_manager.send_alert("Model drift detected - retrain recommended")
return prediction
5. Missing Resource Limits = OOM Kills
Problem: Pod killed by Kubernetes OOMKiller, service goes down.
Solution: Set memory/CPU limits and requests:
resources:
requests:
memory: "512Mi" # Guaranteed
cpu: "500m"
limits:
memory: "1Gi" # Max allowed
cpu: "1000m"
# Monitor actual usage:
kubectl top pods
6. No Rollback Plan = Stuck on Bad Deploy
Problem: New model version has bugs, no way to revert quickly.
Solution: Tag images with versions, keep previous deployment:
# Deploy with version tag
kubectl set image deployment/model-api model-api=registry/model-api:v1.2.0
# If issues, rollback to previous
kubectl rollout undo deployment/model-api
# Or specify version
kubectl set image deployment/model-api model-api=registry/model-api:v1.1.0
7. Synchronous Prediction = Slow Batch Processing
Problem: Processing 10,000 predictions one-by-one takes hours.
Solution: Implement batch endpoint:
@app.post("/predict/batch")
async def predict_batch(request: BatchPredictionRequest):
# Process all at once (vectorized)
features = np.array(request.instances)
predictions = model.predict(features) # Much faster!
return {"predictions": predictions.tolist()}
8. No CI/CD Validation = Deploy Bad Models
Problem: Deploying model that fails basic tests, breaking production.
Solution: Validate in CI pipeline (see references/cicd-ml-models.md):
# .github/workflows/deploy.yml
- name: Validate model performance
run: |
python scripts/validate_model.py \
--model model.pkl \
--test-data test.csv \
--min-accuracy 0.85 # Fail if below threshold
Best Practices
- Version everything: Models (semantic versioning), Docker images, deployments
- Monitor continuously: Latency, error rate, drift, resource usage
- Test before deploy: Unit tests, integration tests, performance benchmarks
- Deploy gradually: Canary (10%), then full rollout
- Plan for rollback: Keep previous version, document procedure
- Log predictions: Enable debugging and drift detection
- Set resource limits: Prevent OOM kills and resource contention
- Use health checks: Enable proper load balancing
When to Load References
Load reference files for detailed implementations:
-
FastAPI Production Server: Load
references/fastapi-production-server.mdfor complete production-ready FastAPI implementation with error handling, validation (Pydantic models), logging, health/readiness probes, batch predictions, model versioning, middleware, exception handlers, and performance optimizations (caching, async) -
Model Monitoring & Drift: Load
references/model-monitoring-drift.mdfor ModelMonitor implementation with KS-test drift detection, Jensen-Shannon divergence, Prometheus metrics integration, alert configuration (Slack, email), continuous monitoring service, and dashboard endpoints -
Containerization & Deployment: Load
references/containerization-deployment.mdfor multi-stage Dockerfiles, model versioning in containers, Docker Compose setup, A/B testing with Nginx, Kubernetes deployments (rolling update, blue-green, canary), GitHub Actions CI/CD, and deployment checklists -
CI/CD for ML Models: Load
references/cicd-ml-models.mdfor complete GitHub Actions pipeline with model validation, data validation, automated testing, security scanning, performance benchmarks, automated rollback, and deployment strategies
> related_skills --same-repo
> zustand-state-management
--- name: zustand-state-management description: Zustand state management for React with TypeScript. Use for global state, Redux/Context API migration, localStorage persistence, slices pattern, devtools, Next.js SSR, or encountering hydration errors, TypeScript inference issues, persist middleware problems, infinite render loops. Keywords: zustand, state management, React state, TypeScript state, persist middleware, devtools, slices pattern, global state, React hooks, create store, useBoundS
> zod
TypeScript-first schema validation and type inference. Use for validating API requests/responses, form data, env vars, configs, defining type-safe schemas with runtime validation, transforming data, generating JSON Schema for OpenAPI/AI, or encountering missing validation errors, type inference issues, validation error handling problems. Zero dependencies (2kb gzipped).
> xss-prevention
--- name: xss-prevention description: XSS attack prevention with input sanitization, output encoding, Content Security Policy. Use for user-generated content, rich text editors, web application security, or encountering stored XSS, reflected XSS, DOM manipulation, script injection errors. Keywords: sanitization, HTML-encoding, DOMPurify, CSP, Content-Security-Policy, rich-text-editor, user-input, escaping, innerHTML, DOM-manipulation, stored-XSS, reflected-XSS, input-validation, output-encodi
> wordpress-plugin-core
--- name: wordpress-plugin-core description: WordPress plugin development with hooks, security, REST API, custom post types. Use for plugin creation, $wpdb queries, Settings API, or encountering SQL injection, XSS, CSRF, nonce errors. Keywords: wordpress plugin development, wordpress security, wordpress hooks, wordpress filters, wordpress database, wpdb prepare, sanitize_text_field, esc_html, wp_nonce, custom post type, register_post_type, settings api, rest api, admin-ajax, wordpress sql inj