> cohere-prod-checklist

Execute the Cohere production deployment checklist and rollback procedures. Use when deploying Cohere integrations to production, preparing for launch, or implementing go-live procedures for Cohere-powered apps. Trigger with phrases like "cohere production", "deploy cohere", "cohere go-live", "cohere launch checklist".


Cohere Production Checklist

Overview

Complete go-live checklist for deploying Cohere API v2 integrations to production with safety gates, health checks, and rollback procedures.

Prerequisites

  • Staging environment tested and verified
  • Production API key (not trial) from dashboard.cohere.com
  • Deployment pipeline configured
  • Monitoring and alerting ready

Checklist

API & Authentication

  • Using production API key (not trial; trial keys are rate-limited, e.g. 20 Chat calls/min)
  • CO_API_KEY stored in secret manager (Vault, AWS Secrets Manager, GCP Secret Manager)
  • Key rotation procedure documented and tested
  • Billing alerts configured at dashboard.cohere.com
  • Using API v2 endpoints (CohereClientV2, not CohereClient)
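The key-handling items above can be enforced at startup. The following is a minimal sketch, assuming the secret manager injects `CO_API_KEY` into the process environment at deploy time; `loadCohereApiKey` and `redactKey` are hypothetical helper names, not part of the Cohere SDK.

```typescript
// Hypothetical helper: fail fast at startup if the key was not injected.
// Takes the environment as a parameter so it can be exercised without
// touching the real process environment.
export function loadCohereApiKey(
  env: Record<string, string | undefined>,
): string {
  const key = env.CO_API_KEY?.trim();
  if (!key) {
    throw new Error('CO_API_KEY is not set; refusing to start');
  }
  return key;
}

// Hypothetical helper: mask a key before it reaches any log line,
// keeping only the last four characters for identification.
export function redactKey(key: string): string {
  return key.length <= 4 ? '****' : `****${key.slice(-4)}`;
}
```

In application code this could be wired up as `new CohereClientV2({ token: loadCohereApiKey(process.env) })`, so a missing key stops the deploy instead of surfacing later as 401 responses.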

Code Quality

  • All API calls specify model parameter explicitly
  • embeddingTypes set for all Embed calls (required for v3+)
  • inputType set for all Embed calls (required for v3+)
  • Error handling catches CohereError and CohereTimeoutError
  • Retry logic with exponential backoff for 429 and 5xx
  • No hardcoded API keys in source code
  • Request/response logging excludes API keys and PII
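The retry item above can be sketched as follows. This is an illustration under stated assumptions, not the SDK's own retry mechanism: `withRetry`, `isRetryable`, and `backoffMs` are hypothetical names, the 500 ms base and 30 s cap are arbitrary defaults, and the thrown error is assumed to expose a numeric `statusCode`, as `CohereError` does in the health check later in this document.

```typescript
// Retry only rate limits (429) and server-side errors (5xx).
export function isRetryable(statusCode: number | undefined): boolean {
  if (statusCode === undefined) return false;
  return statusCode === 429 || (statusCode >= 500 && statusCode <= 599);
}

// Upper bound on the delay before retry number `attempt` (0-based):
// baseMs * 2^attempt, capped at capMs.
export function backoffMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Runs fn, retrying up to maxRetries times on retryable status codes.
// `sleep` is injectable so the wait can be skipped in unit tests.
export async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { statusCode?: number } | null)?.statusCode;
      if (attempt >= maxRetries || !isRetryable(status)) throw err;
      // Full jitter: wait a random time between 0 and the backoff bound
      // so that many clients do not retry in lockstep.
      await sleep(Math.random() * backoffMs(attempt));
    }
  }
}
```

A call site would look like `await withRetry(() => cohere.chat({ ... }))`. Errors without a status code, such as timeouts, are not retried in this sketch; whether to retry them is a separate decision.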

Model Selection

  • Correct model IDs used (not deprecated names):
Use Case         | Recommended Model    | Fallback
-----------------|----------------------|------------------------
Chat/generation  | command-a-03-2025    | command-r-plus-08-2024
Lightweight chat | command-r7b-12-2024  | command-r-08-2024
Embeddings       | embed-v4.0           | embed-english-v3.0
Reranking        | rerank-v3.5          | rerank-english-v3.0
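One way to keep the recommended and fallback model IDs in a single place is a small lookup plus a wrapper that retries once with the fallback. This is a sketch: the model IDs are copied from the table above, while `MODELS`, `UseCase`, and `withModelFallback` are hypothetical names introduced for illustration.

```typescript
type UseCase = 'chat' | 'lightweightChat' | 'embed' | 'rerank';

// Model IDs as listed in the model selection table.
export const MODELS: Record<UseCase, { primary: string; fallback: string }> = {
  chat: { primary: 'command-a-03-2025', fallback: 'command-r-plus-08-2024' },
  lightweightChat: { primary: 'command-r7b-12-2024', fallback: 'command-r-08-2024' },
  embed: { primary: 'embed-v4.0', fallback: 'embed-english-v3.0' },
  rerank: { primary: 'rerank-v3.5', fallback: 'rerank-english-v3.0' },
};

// Try the recommended model first; on any error, try the fallback once.
// `call` receives the model ID and performs the actual API request.
export async function withModelFallback<T>(
  useCase: UseCase,
  call: (model: string) => Promise<T>,
): Promise<T> {
  const { primary, fallback } = MODELS[useCase];
  try {
    return await call(primary);
  } catch (err) {
    console.warn(`Model ${primary} failed, falling back to ${fallback}`, err);
    return call(fallback);
  }
}
```

Fallback is safest for chat and rerank. Vectors from different embedding models are not comparable, so falling back on the embed path only makes sense if the index was built with the fallback model as well.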

Performance

  • Embed calls batched (up to 96 texts per request)
  • Rerank calls limited to 1000 documents per request
  • Streaming enabled for user-facing chat (chatStream)
  • Connection pooling / keep-alive configured
  • Response caching for repeated embed/rerank queries
  • maxTokens set to prevent runaway generation costs
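The batching item above can be sketched as a chunking helper. The 96-text limit comes from the checklist; `chunk`, `embedAll`, and the `embedBatch` callback are hypothetical names, and the callback stands in for whatever function wraps the real Embed call.

```typescript
// Maximum number of texts per Embed request, per the checklist above.
export const EMBED_BATCH_SIZE = 96;

// Split items into consecutive batches of at most `size` elements.
export function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed all texts batch by batch, preserving input order.
// Batches run sequentially here to stay gentle on rate limits.
export async function embedAll(
  texts: readonly string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of chunk(texts, EMBED_BATCH_SIZE)) {
    vectors.push(...(await embedBatch(batch)));
  }
  return vectors;
}
```

With 200 input texts this issues three requests (96, 96, and 8 texts) instead of 200 single-text calls.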

Health Check Endpoint

// /api/health
import { CohereClientV2, CohereError } from 'cohere-ai';

// With no token passed, the SDK reads CO_API_KEY from the environment.
const cohere = new CohereClientV2();

export async function GET() {
  const start = Date.now();
  let cohereStatus: 'healthy' | 'degraded' | 'down' = 'down';

  try {
    // Cheapest possible health check — minimal chat
    await cohere.chat({
      model: 'command-r7b-12-2024',
      messages: [{ role: 'user', content: 'ping' }],
      maxTokens: 1,
    });
    cohereStatus = 'healthy';
  } catch (err) {
    if (err instanceof CohereError && err.statusCode === 429) {
      cohereStatus = 'degraded'; // Rate limited but reachable
    }
  }

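  // Note: the HTTP status is always 200, and a 'down' Cohere status is reported
  // as 'degraded' at the top level; probes must read the body, not the status code.
  return Response.json({
    status: cohereStatus === 'healthy' ? 'ok' : 'degraded',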
    cohere: {
      status: cohereStatus,
      latencyMs: Date.now() - start,
    },
    timestamp: new Date().toISOString(),
  });
}

Circuit Breaker

class CohereCircuitBreaker {
  private failures = 0;
  private lastFailure = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  constructor(
    private threshold = 5,
    private resetMs = 60_000
  ) {}

  async call<T>(fn: () => Promise<T>, fallback?: () => T): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure > this.resetMs) {
        this.state = 'half-open';
      } else if (fallback) {
        return fallback();
      } else {
        throw new Error('Cohere circuit breaker is open');
      }
    }

    try {
      const result = await fn();
      this.failures = 0;
      this.state = 'closed';
      return result;
    } catch (err) {
      this.failures++;
      this.lastFailure = Date.now();

      if (this.failures >= this.threshold) {
        this.state = 'open';
        console.error(`Cohere circuit breaker OPEN after ${this.failures} failures`);
      }
      throw err;
    }
  }
}

const breaker = new CohereCircuitBreaker();

Gradual Rollout

# Pre-flight
curl -sf https://staging.example.com/api/health | jq '.cohere'
curl -s https://status.cohere.com/api/v2/status.json | jq '.status'

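# Deploy as a canary (target: about 10% of traffic)
# Pausing a rolling update leaves only part of the pods on the new version; the
# actual traffic share depends on replica count and the rollout strategy settings.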
kubectl apply -f k8s/production.yaml
kubectl rollout pause deployment/app

# Monitor for 10 minutes: error rate, latency, 429s
# Check: No increase in CohereError rate
# Check: P95 latency < 5s for chat, < 500ms for embed/rerank

# Proceed to 100%
kubectl rollout resume deployment/app
kubectl rollout status deployment/app
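
The monitoring checks in the rollout comments can be turned into an explicit gate before resuming. This is a sketch under stated assumptions: the thresholds are the ones listed above, while `CanaryMetrics` and `canaryVerdict` are hypothetical names, and the metrics are assumed to come from whatever monitoring system is in use.

```typescript
export interface CanaryMetrics {
  baselineErrorRate: number; // fraction of failed Cohere calls before the deploy
  canaryErrorRate: number;   // same fraction, measured on the new pods
  chatP95Ms: number;         // P95 chat latency on the new pods
  embedRerankP95Ms: number;  // P95 embed/rerank latency on the new pods
}

// Returns whether to resume the rollout, plus the reasons for any refusal.
export function canaryVerdict(
  m: CanaryMetrics,
): { proceed: boolean; reasons: string[] } {
  const reasons: string[] = [];
  if (m.canaryErrorRate > m.baselineErrorRate) {
    reasons.push('CohereError rate increased');
  }
  if (m.chatP95Ms >= 5_000) {
    reasons.push('chat P95 latency is not under 5s');
  }
  if (m.embedRerankP95Ms >= 500) {
    reasons.push('embed/rerank P95 latency is not under 500ms');
  }
  return { proceed: reasons.length === 0, reasons };
}
```

If `proceed` is false, the rollback commands later in this checklist apply instead of `kubectl rollout resume`.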

Monitoring Alerts

Alert              | Condition                    | Severity
-------------------|------------------------------|---------
Cohere unreachable | Health check fails 3x        | P1
High error rate    | 5xx > 5% of requests/5min    | P1
Rate limited       | 429 > 10/min                 | P2
High latency       | Chat P95 > 10s               | P2
Auth failure       | Any 401 response             | P1
Budget exceeded    | Daily token cost > threshold | P2
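
The alert conditions above can also be expressed as code, for teams that evaluate them in an application-level job instead of a monitoring product. This is a sketch: `WindowStats`, `Alert`, and `evaluateAlerts` are hypothetical names, "fails 3x" is read as three consecutive failures (an assumption), and the cost fields assume a single currency.

```typescript
type Severity = 'P1' | 'P2';

export interface Alert {
  name: string;
  severity: Severity;
}

export interface WindowStats {
  consecutiveHealthCheckFailures: number;
  requests5min: number;      // total Cohere requests in the last 5 minutes
  serverErrors5min: number;  // of those, how many returned 5xx
  rateLimitedPerMin: number; // 429 responses in the last minute
  chatP95Ms: number;
  unauthorizedCount: number; // 401 responses in the window
  dailyTokenCost: number;
  dailyBudget: number;       // the "threshold" from the table
}

// Returns every alert whose condition currently holds.
export function evaluateAlerts(s: WindowStats): Alert[] {
  const alerts: Alert[] = [];
  const add = (name: string, severity: Severity): void => {
    alerts.push({ name, severity });
  };
  if (s.consecutiveHealthCheckFailures >= 3) add('Cohere unreachable', 'P1');
  if (s.requests5min > 0 && s.serverErrors5min / s.requests5min > 0.05) {
    add('High error rate', 'P1');
  }
  if (s.rateLimitedPerMin > 10) add('Rate limited', 'P2');
  if (s.chatP95Ms > 10_000) add('High latency', 'P2');
  if (s.unauthorizedCount > 0) add('Auth failure', 'P1');
  if (s.dailyTokenCost > s.dailyBudget) add('Budget exceeded', 'P2');
  return alerts;
}
```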

Rollback

# Immediate rollback
kubectl rollout undo deployment/app
kubectl rollout status deployment/app

# Verify rollback
curl -sf https://api.example.com/api/health | jq '.cohere'

Output

  • Production-ready Cohere integration with health checks
  • Circuit breaker preventing cascade failures
  • Monitoring alerts for Cohere-specific error conditions
  • Documented rollback procedure

Next Steps

For version upgrades, see cohere-upgrade-migration.
