# LiteLLM

Call 100+ LLM APIs with one interface using LiteLLM — a unified API proxy for OpenAI, Anthropic, Google, Mistral, Cohere, and self-hosted models. Use when someone asks to "switch between LLM providers", "LiteLLM", "unified LLM API", "LLM proxy", "call Claude and GPT with the same code", "LLM load balancing", or "multi-model AI gateway". Covers provider routing, fallbacks, rate limiting, spend tracking, and a self-hosted proxy.
## Overview
LiteLLM provides a single API to call 100+ LLM providers — OpenAI, Anthropic, Google Gemini, Mistral, Cohere, Azure, Bedrock, Ollama, and more. Write your code once using the OpenAI SDK format, then switch providers by changing a model string. As a proxy server, it adds load balancing, fallbacks, rate limiting, spend tracking, and API key management for teams.
## When to Use
- Using multiple LLM providers and want a unified interface
- Need automatic fallbacks (if Claude is down, use GPT; see the sketch after this list)
- Cost tracking across multiple providers and teams
- Load balancing requests across multiple API keys or models
- Self-hosted proxy to manage LLM access for a team
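If you only need a fallback in a single script, a plain try/except over `completion` is enough. A minimal sketch (the model order and helper name are illustrative; the proxy-level fallbacks configured below are the more robust option):

```python
# fallback.py — try a primary model, fall back to a backup on failure
from litellm import completion

def ask_with_fallback(messages):
    for model in ["claude-sonnet-4-20250514", "gpt-4o"]:  # primary, then backup
        try:
            return completion(model=model, messages=messages)
        except Exception:
            continue  # provider error or rate limit: try the next model
    raise RuntimeError("all providers failed")

reply = ask_with_fallback([{"role": "user", "content": "Hello!"}])
print(reply.choices[0].message.content)
```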
## Instructions

### Setup

```bash
pip install litellm

# Or run as a proxy server
pip install 'litellm[proxy]'
```
### SDK Usage (Python)

```python
# llm.py — Call any LLM with the same interface.
# Provider keys are auto-detected from env vars (OPENAI_API_KEY, ANTHROPIC_API_KEY, ...).
from litellm import completion

# OpenAI
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Anthropic — same interface, just change the model string
response = completion(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Google Gemini
response = completion(
    model="gemini/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Local Ollama
response = completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="http://localhost:11434",
)

# All return the same response format (OpenAI-compatible)
print(response.choices[0].message.content)
```
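Streaming uses the same OpenAI-style chunk format on every provider (see Guidelines); a short sketch:

```python
# stream=True yields OpenAI-format chunks regardless of provider
from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
)
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
```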
### Proxy Server

```yaml
# litellm_config.yaml — Proxy configuration
model_list:
  - model_name: "fast"
    litellm_params:
      model: gpt-4o-mini
      api_key: sk-...
  - model_name: "smart"
    litellm_params:
      model: claude-sonnet-4-20250514
      api_key: sk-ant-...
  - model_name: "smart" # Second "smart" entry = load balancing
    litellm_params:
      model: gpt-4o
      api_key: sk-...
  - model_name: "cheap"
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: AIza...

router_settings:
  routing_strategy: "latency-based-routing"
  num_retries: 3
  timeout: 30
  fallbacks: [{"smart": ["fast"]}] # If "smart" fails, use "fast"

general_settings:
  master_key: "sk-master-key-xxx" # Admin key
```
```bash
# Start the proxy
litellm --config litellm_config.yaml --port 4000

# Call it through the OpenAI-compatible API (any language, any OpenAI SDK)
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-master-key-xxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "smart", "messages": [{"role": "user", "content": "Hello"}]}'
```
### Node.js via Proxy

```typescript
// app.ts — Use any OpenAI SDK client with the LiteLLM proxy
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000/v1",
  apiKey: "sk-master-key-xxx",
});

// Calls route to Claude or GPT based on the load-balancing config
const response = await client.chat.completions.create({
  model: "smart",
  messages: [{ role: "user", content: "Explain monads simply." }],
});

console.log(response.choices[0].message.content);
```
### Spend Tracking

```python
# Track costs per team/user/project
from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={
        "user": "user-123",
        "team": "engineering",
        "project": "chatbot",
    },
)

# The LiteLLM proxy stores per-request costs in its database.
# Query them via the API: GET /spend/logs?user=user-123
```
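To read those logs back programmatically, hit the proxy's spend endpoint mentioned above. A sketch using `requests` (the query parameter name follows the note above; verify it against your proxy version's docs):

```python
# spend_report.py — query per-user spend from a running proxy (sketch)
import requests

resp = requests.get(
    "http://localhost:4000/spend/logs",
    params={"user": "user-123"},
    headers={"Authorization": "Bearer sk-master-key-xxx"},
)
resp.raise_for_status()
for entry in resp.json():
    print(entry)
```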
## Examples

### Example 1: Multi-provider AI application

User prompt: "My app uses Claude for reasoning and GPT-4o for function calling. Set up a unified interface."

The agent will configure LiteLLM with named model groups, route by capability, and add fallbacks between providers, as sketched below.
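A sketch of that setup using `litellm.Router` in-process (the group names "reasoning" and "tools" are illustrative; provider keys come from env vars):

```python
# router.py — named model groups with cross-provider fallbacks (sketch)
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "reasoning",  # route reasoning work to Claude
            "litellm_params": {"model": "claude-sonnet-4-20250514"},
        },
        {
            "model_name": "tools",  # route function calling to GPT-4o
            "litellm_params": {"model": "gpt-4o"},
        },
    ],
    fallbacks=[{"reasoning": ["tools"]}],  # if Claude fails, retry on GPT-4o
)

response = router.completion(
    model="reasoning",
    messages=[{"role": "user", "content": "Plan the migration steps."}],
)
print(response.choices[0].message.content)
```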
### Example 2: Team LLM gateway with cost controls

User prompt: "Set up an LLM proxy for our team with per-user rate limits and spend tracking."

The agent will deploy the LiteLLM proxy, configure API keys per team member, set rate limits and budget caps, and enable spend logging.
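A sketch of per-member key provisioning against the proxy's key-management API (field names such as `max_budget` and `rpm_limit` follow LiteLLM's key-generation docs; verify them against your proxy version):

```python
# provision_key.py — create a scoped proxy key with limits (sketch)
import requests

resp = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer sk-master-key-xxx"},
    json={
        "user_id": "user-123",
        "max_budget": 25.0,  # USD budget cap (assumption: adjust per team)
        "rpm_limit": 60,     # requests per minute
        "metadata": {"team": "engineering"},
    },
)
resp.raise_for_status()
print(resp.json()["key"])  # hand this key to the team member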
## Guidelines

- Model format: `provider/model` — `anthropic/claude-sonnet-4-20250514`, `gemini/gemini-2.0-flash`
- Proxy for teams — centralize API keys, track spend, enforce rate limits
- Fallbacks for reliability — if the primary model fails, route to a backup
- Load balancing — multiple entries with the same `model_name` distribute traffic
- Latency-based routing — LiteLLM picks the fastest-responding provider
- Spend tracking — costs calculated per request, queryable via API
- OpenAI SDK compatible — any OpenAI client library works with the proxy
- Streaming works — `stream=True` works across all providers
- Environment variables — `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc. auto-detected