> anth-cost-tuning
Optimize Anthropic Claude API costs with model routing, prompt caching, batching, and spend monitoring. Use when analyzing Claude API billing, reducing costs, or implementing cost controls and budget alerts. Trigger with phrases like "anthropic cost", "claude billing", "reduce claude spend", "anthropic budget", "claude pricing optimize".
# Anthropic Cost Tuning

## Overview

Optimize Claude API spend through model routing, prompt caching, the Message Batches API, and real-time cost tracking. The four biggest levers: model selection (4-19x difference in per-token rates, depending on the model pair), prompt caching (cached input reads cost 10x less), the Batches API (2x), and max_tokens discipline.
## Pricing Reference (per million tokens)
| Model | Input | Output | Cache Read | Cache Write |
|---|---|---|---|---|
| Claude Haiku | $0.80 | $4.00 | $0.08 | $1.00 |
| Claude Sonnet | $3.00 | $15.00 | $0.30 | $3.75 |
| Claude Opus | $15.00 | $75.00 | $1.50 | $18.75 |
Message Batches: 50% off all model pricing for async processing.
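As a quick sanity check, the 50% batch discount applies uniformly to the table's rates. A minimal sketch (the short model keys are illustrative shorthand, not API model IDs):

```python
# Per-million (input, output) rates from the pricing table above.
BASE_RATES = {"haiku": (0.80, 4.00), "sonnet": (3.00, 15.00), "opus": (15.00, 75.00)}

# Message Batches halve both input and output rates.
batch_rates = {m: (inp * 0.5, out * 0.5) for m, (inp, out) in BASE_RATES.items()}
print(batch_rates["sonnet"])  # (1.5, 7.5)
```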
## Cost Calculator

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    model: str = "claude-sonnet-4-20250514",
    cached_input: int = 0,
    use_batch: bool = False,
) -> float:
    """Estimate request cost in USD from the per-million-token rates above."""
    pricing = {
        "claude-haiku-4-20250514": {"input": 0.80, "output": 4.00, "cache_read": 0.08},
        "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00, "cache_read": 0.30},
        "claude-opus-4-20250514": {"input": 15.00, "output": 75.00, "cache_read": 1.50},
    }
    rates = pricing[model]
    uncached_input = input_tokens - cached_input
    cost = (
        uncached_input * rates["input"] +
        cached_input * rates["cache_read"] +
        output_tokens * rates["output"]
    ) / 1_000_000
    if use_batch:
        cost *= 0.5
    return cost

# Example: 10K requests/day, 500 input + 200 output tokens each
daily = estimate_cost(500, 200, "claude-sonnet-4-20250514") * 10_000
print(f"Daily: ${daily:.2f}")         # $0.0045 * 10K = $45/day
print(f"Monthly: ${daily * 30:.2f}")  # $1,350/month

# Same workload on Haiku with batching
daily_optimized = estimate_cost(500, 200, "claude-haiku-4-20250514", use_batch=True) * 10_000
print(f"Optimized: ${daily_optimized:.2f}/day")  # $6/day (7.5x cheaper)
```
## Strategy 1: Model Routing

```python
def route_to_model(task: str, complexity: str) -> str:
    """Route tasks to the cheapest adequate model."""
    # Haiku: classification, extraction, yes/no, routing ($0.80/$4)
    if task in ("classify", "extract", "route", "validate"):
        return "claude-haiku-4-20250514"
    # Sonnet: general tasks, code, tool use ($3/$15)
    if complexity in ("low", "medium"):
        return "claude-sonnet-4-20250514"
    # Opus: only for complex reasoning and research ($15/$75)
    return "claude-opus-4-20250514"
```
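To see what routing buys, here is a rough per-request comparison using the table's rates (a sketch; `request_cost` is a local helper for illustration, not part of the SDK):

```python
# Per-million (input, output) rates from the pricing table.
RATES = {
    "claude-haiku-4-20250514": (0.80, 4.00),
    "claude-opus-4-20250514": (15.00, 75.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = RATES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# A 500-in / 200-out classification call:
haiku = request_cost("claude-haiku-4-20250514", 500, 200)  # $0.0012
opus = request_cost("claude-opus-4-20250514", 500, 200)    # $0.0225
print(f"{opus / haiku:.2f}x")  # 18.75x cheaper on Haiku
```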
## Strategy 2: Prompt Caching

```python
import anthropic

client = anthropic.Anthropic()

# Cache system prompts and reference documents (90% savings on cached input).
# Break-even: 2 requests reusing the same cached content.
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    system=[{
        "type": "text",
        "text": large_reference_document,  # 10K+ tokens
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": user_question}],
)
```
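The break-even claim follows from the rates: a cache write costs 1.25x the base input rate and a cache read 0.1x. A minimal sketch of the cumulative input cost, assuming Sonnet rates (the two helper functions are illustrative, not SDK calls):

```python
def cached_cost(n_requests: int, prompt_tokens: int, base_rate: float = 3.00) -> float:
    """Input cost in USD with caching: one write (1.25x), then reads (0.1x)."""
    write = prompt_tokens * base_rate * 1.25
    reads = (n_requests - 1) * prompt_tokens * base_rate * 0.10
    return (write + reads) / 1_000_000

def uncached_cost(n_requests: int, prompt_tokens: int, base_rate: float = 3.00) -> float:
    """Input cost in USD resending the full prompt every time."""
    return n_requests * prompt_tokens * base_rate / 1_000_000

# 10K-token cached system prompt on Sonnet:
print(cached_cost(1, 10_000), uncached_cost(1, 10_000))  # caching loses on a single call
print(cached_cost(2, 10_000), uncached_cost(2, 10_000))  # caching already wins at 2
```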
## Strategy 3: Batches for Non-Real-Time

```python
# 50% cost reduction for anything that doesn't need an immediate response.
# Ideal for: summarization pipelines, data extraction, content generation.
batch = client.messages.batches.create(requests=[...])  # up to 100K requests per batch
```
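A sketch of how those request entries can be assembled. The `custom_id` + `params` entry shape follows the Batches API; `build_batch_requests` itself is a hypothetical helper:

```python
def build_batch_requests(prompts: list[str], model: str = "claude-haiku-4-20250514") -> list[dict]:
    """Build Batches API entries; custom_id lets you match results to inputs later."""
    return [
        {
            "custom_id": f"req-{i}",
            "params": {
                "model": model,
                "max_tokens": 256,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]

requests = build_batch_requests(["Summarize doc A", "Summarize doc B"])
# batch = client.messages.batches.create(requests=requests)
```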
## Strategy 4: Spend Tracking

```python
from dataclasses import dataclass

@dataclass
class SpendTracker:
    budget_usd: float = 100.0
    spent_usd: float = 0.0
    requests: int = 0

    def track(self, response):
        """Accumulate cost from a Messages API response's usage block
        (uses estimate_cost from the Cost Calculator section)."""
        cost = estimate_cost(
            response.usage.input_tokens,
            response.usage.output_tokens,
            response.model,
            getattr(response.usage, "cache_read_input_tokens", 0),
        )
        self.spent_usd += cost
        self.requests += 1
        if self.spent_usd > self.budget_usd * 0.8:
            print(f"WARNING: 80% of budget used (${self.spent_usd:.2f}/${self.budget_usd:.2f})")
        if self.spent_usd > self.budget_usd:
            raise RuntimeError(f"Budget exceeded: ${self.spent_usd:.2f}")

tracker = SpendTracker(budget_usd=50.0)
## Cost Reduction Checklist

- Use Haiku for classification/extraction/routing tasks
- Enable prompt caching for repeated system prompts
- Use Message Batches for non-real-time processing
- Set `max_tokens` to realistic values (not the maximum)
- Use prefill to reduce output preamble tokens
- Implement spend tracking and budget alerts
- Monitor via the Usage API
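On the prefill point above: seeding the assistant turn trims preamble like "Here is the JSON you asked for:" from structured outputs, since the model continues from the seeded text. A sketch (the `prefilled_messages` helper is hypothetical; the two-message payload shape is what the Messages API expects):

```python
def prefilled_messages(user_prompt: str, prefill: str = "{") -> list[dict]:
    """Messages payload with an assistant prefill; the model continues from `prefill`."""
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": prefill},
    ]

messages = prefilled_messages("Extract name and email as JSON from: ...")
# client.messages.create(model="claude-haiku-4-20250514", max_tokens=200, messages=messages)
```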
## Next Steps

For architecture patterns, see `anth-reference-architecture`.