> elevenlabs-rate-limits

Implement ElevenLabs rate limiting, concurrency queuing, and backoff patterns. Use when handling 429 errors, implementing retry logic, or managing concurrent TTS request throughput. Trigger: "elevenlabs rate limit", "elevenlabs throttling", "elevenlabs 429", "elevenlabs retry", "elevenlabs backoff", "elevenlabs concurrent requests".


ElevenLabs Rate Limits

Overview

Handle ElevenLabs rate limits with plan-aware concurrency queuing, exponential backoff, and quota monitoring. ElevenLabs uses two rate limit mechanisms: concurrent request limits (per plan) and system-level throttling.

Prerequisites

  • ElevenLabs SDK installed
  • Understanding of your subscription plan's limits
  • p-queue package (recommended): npm install p-queue

Instructions

Step 1: Understand the Two 429 Error Types

ElevenLabs returns HTTP 429 for two different reasons:

| 429 Variant | Response Body | Cause | Strategy |
| --- | --- | --- | --- |
| too_many_concurrent_requests | `{"detail":{"status":"too_many_concurrent_requests"}}` | Exceeded plan concurrency | Queue requests; don't back off |
| system_busy | `{"detail":{"status":"system_busy"}}` | Server overload | Exponential backoff |
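The two variants can be told apart by inspecting the response body. A small classifier sketch (the `classify429` helper and the error shape it accepts are illustrative, not part of the SDK):

```typescript
// Hypothetical helper: classify an ElevenLabs API error so the caller
// can pick the right strategy (queue retry vs. exponential backoff).
type RateLimitKind = "concurrent_limit" | "system_busy" | "other";

export function classify429(error: {
  statusCode?: number;
  status?: number;
  body?: { detail?: { status?: string } };
}): RateLimitKind {
  const status = error.statusCode ?? error.status;
  if (status !== 429) return "other";
  const detail = error.body?.detail?.status;
  if (detail === "too_many_concurrent_requests") return "concurrent_limit";
  if (detail === "system_busy") return "system_busy";
  return "other";
}
```

The retry helpers in the following steps branch on exactly this distinction.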

Step 2: Plan Concurrency Limits

| Plan | Max Concurrent Requests | Characters/Month |
| --- | --- | --- |
| Free | 2 | 10,000 |
| Starter | 3 | 30,000 |
| Creator | 5 | 100,000 |
| Pro | 10 | 500,000 |
| Scale | 15 | 2,000,000 |
| Business | 15 | Custom |

Step 3: Concurrency-Aware Request Queue

// src/elevenlabs/rate-limiter.ts
import PQueue from "p-queue";

type ElevenLabsPlan = "free" | "starter" | "creator" | "pro" | "scale" | "business";

const CONCURRENCY_LIMITS: Record<ElevenLabsPlan, number> = {
  free: 2,
  starter: 3,
  creator: 5,
  pro: 10,
  scale: 15,
  business: 15,
};

export function createRequestQueue(plan: ElevenLabsPlan) {
  const concurrency = CONCURRENCY_LIMITS[plan];

  const queue = new PQueue({
    concurrency,
    // Each queued request adds ~50ms to response time
    // so keep queue depth reasonable
    timeout: 120_000,  // 2 minute timeout per request
    throwOnTimeout: true,
  });

  queue.on("error", (error) => {
    console.error("[ElevenLabs Queue] Request failed:", error.message);
  });

  return queue;
}

// Usage
const queue = createRequestQueue("pro"); // 10 concurrent

async function generateWithQueue(voiceId: string, text: string) {
  return queue.add(async () => {
    return client.textToSpeech.convert(voiceId, {
      text,
      model_id: "eleven_flash_v2_5",
    });
  });
}

// Example: given an array of texts, all requests run with at most 10 in flight
const results = await Promise.all(
  texts.map(text => generateWithQueue("21m00Tcm4TlvDq8ikWAM", text))
);
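If you would rather avoid the p-queue dependency, a minimal stdlib-only semaphore gives the same concurrency cap. This is an illustrative sketch without the timeout and event handling p-queue provides:

```typescript
// Minimal concurrency limiter: at most `limit` tasks run at once;
// additional calls wait until a slot frees up.
export class Semaphore {
  private active = 0;
  private waiters: Array<() => void> = [];

  constructor(private readonly limit: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // Park until a running task finishes and wakes us.
      await new Promise<void>(resolve => this.waiters.push(resolve));
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--;
      this.waiters.shift()?.(); // wake one waiter, if any
    }
  }
}

// Usage: new Semaphore(CONCURRENCY_LIMITS[plan]).run(() => client...)
```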

Step 4: Exponential Backoff for system_busy

// src/elevenlabs/backoff.ts
export async function withBackoff<T>(
  operation: () => Promise<T>,
  config = {
    maxRetries: 5,
    baseDelayMs: 1000,
    maxDelayMs: 32_000,
    jitterMs: 500,
  }
): Promise<T> {
  for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error: any) {
      const status = error.statusCode || error.status;
      const errorType = error.body?.detail?.status;

      // Don't retry non-retryable errors
      if (status === 401 || status === 400 || status === 404) throw error;

      // For concurrent limit, retry immediately (queue handles spacing)
      if (errorType === "too_many_concurrent_requests") {
        if (attempt === config.maxRetries) throw error;
        // Short pause — the queue is managing concurrency
        await new Promise(r => setTimeout(r, 50 * (attempt + 1)));
        continue;
      }

      // For system_busy or 5xx, exponential backoff with jitter
      if (attempt === config.maxRetries) throw error;

      const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt);
      const jitter = Math.random() * config.jitterMs;
      const delay = Math.min(exponentialDelay + jitter, config.maxDelayMs);

      console.warn(`[ElevenLabs] ${errorType || status}. Retry ${attempt + 1}/${config.maxRetries} in ${delay.toFixed(0)}ms`);
      await new Promise(r => setTimeout(r, delay));
    }
  }
  throw new Error("Unreachable");
}
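The delay schedule above can be factored into a pure function, which makes the curve easy to verify in isolation. `backoffDelay` is a hypothetical helper mirroring the math inside `withBackoff`; jitter is passed in explicitly so the result is deterministic:

```typescript
// Delay for a given retry attempt: base * 2^attempt + jitter, capped at max.
export function backoffDelay(
  attempt: number,
  baseDelayMs = 1000,
  maxDelayMs = 32_000,
  jitter = 0
): number {
  return Math.min(baseDelayMs * Math.pow(2, attempt) + jitter, maxDelayMs);
}

// Attempts 0..5 with no jitter: 1000, 2000, 4000, 8000, 16000, 32000 ms
```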

Step 5: Quota Monitor

// src/elevenlabs/quota-monitor.ts
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

export class QuotaMonitor {
  private characterCount = 0;
  private characterLimit = 0;
  private lastCheck = 0;

  constructor(
    private client: ElevenLabsClient,
    private warningThresholdPct = 80,
    private checkIntervalMs = 60_000
  ) {}

  async check(): Promise<{
    used: number;
    limit: number;
    remaining: number;
    pctUsed: number;
    warning: boolean;
  }> {
    const now = Date.now();
    if (now - this.lastCheck > this.checkIntervalMs) {
      const user = await this.client.user.get();
      this.characterCount = user.subscription.character_count;
      this.characterLimit = user.subscription.character_limit;
      this.lastCheck = now;
    }

    const remaining = this.characterLimit - this.characterCount;
    const pctUsed = (this.characterCount / this.characterLimit) * 100;

    return {
      used: this.characterCount,
      limit: this.characterLimit,
      remaining,
      pctUsed: Math.round(pctUsed * 10) / 10,
      warning: pctUsed >= this.warningThresholdPct,
    };
  }

  async guardRequest(textLength: number): Promise<void> {
    const quota = await this.check();
    if (textLength > quota.remaining) {
      throw new Error(
        `Insufficient quota: need ${textLength} chars, have ${quota.remaining} remaining (${quota.pctUsed}% used)`
      );
    }
    if (quota.warning) {
      console.warn(`[ElevenLabs] Quota warning: ${quota.pctUsed}% used (${quota.remaining} chars remaining)`);
    }
  }
}
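The guard logic can also be expressed as a pure function, convenient for testing without hitting the API. `evaluateQuota` is an illustrative extraction of the same checks `guardRequest` performs:

```typescript
// Given current usage and a request size, decide whether the request fits
// and whether the warning threshold has been crossed.
export function evaluateQuota(
  used: number,
  limit: number,
  requestChars: number,
  warningThresholdPct = 80
): { fits: boolean; warning: boolean; remaining: number } {
  const remaining = limit - used;
  const pctUsed = (used / limit) * 100;
  return {
    fits: requestChars <= remaining,
    warning: pctUsed >= warningThresholdPct,
    remaining,
  };
}
```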

Step 6: Combined Rate-Limited Client

// src/elevenlabs/resilient-client.ts
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createRequestQueue } from "./rate-limiter";
import { withBackoff } from "./backoff";
import { QuotaMonitor } from "./quota-monitor";

export function createResilientClient(plan: "free" | "starter" | "creator" | "pro" | "scale" | "business" = "pro") {
  const client = new ElevenLabsClient({ maxRetries: 0 }); // We handle retries
  const queue = createRequestQueue(plan);
  const quota = new QuotaMonitor(client);

  return {
    async generateSpeech(voiceId: string, text: string, modelId = "eleven_multilingual_v2") {
      await quota.guardRequest(text.length);

      return queue.add(() =>
        withBackoff(() =>
          client.textToSpeech.convert(voiceId, {
            text,
            model_id: modelId,
          })
        )
      );
    },

    getQueueStats() {
      return {
        pending: queue.pending,
        size: queue.size,
      };
    },

    checkQuota: () => quota.check(),
  };
}

Model Cost Impact on Quota

| Model | Credits per Character | 10,000 Chars Cost |
| --- | --- | --- |
| eleven_v3 | 1.0 | 10,000 credits |
| eleven_multilingual_v2 | 1.0 | 10,000 credits |
| eleven_flash_v2_5 | 0.5 | 5,000 credits |
| eleven_turbo_v2_5 | 0.5 | 5,000 credits |

Use Flash/Turbo models during development to conserve quota.
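The table can be encoded so cost is estimated before a request is sent. `estimateCredits` and its rate map are hypothetical helpers derived from the figures above:

```typescript
// Credits consumed per character, by model (from the cost table above).
const CREDITS_PER_CHAR: Record<string, number> = {
  eleven_v3: 1.0,
  eleven_multilingual_v2: 1.0,
  eleven_flash_v2_5: 0.5,
  eleven_turbo_v2_5: 0.5,
};

// Estimate the credit cost of converting `text` with the given model.
// Unknown models are assumed to bill at the full 1.0 rate.
export function estimateCredits(modelId: string, text: string): number {
  const rate = CREDITS_PER_CHAR[modelId] ?? 1.0;
  return Math.ceil(text.length * rate);
}
```

Pairing this with `QuotaMonitor.guardRequest` lets you reject requests that would blow the monthly budget before spending any credits.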

Error Handling

| Scenario | Detection | Response |
| --- | --- | --- |
| Concurrent limit hit | 429 + too_many_concurrent_requests | Queue; retry after ~50ms per queued request |
| System busy | 429 + system_busy | Exponential backoff (1s, 2s, 4s, 8s...) |
| Quota exhausted | 401 + quota_exceeded | Stop requests; alert; wait for reset |
| Server error | 500-599 | Exponential backoff; max 5 retries |

Next Steps

For security configuration, see elevenlabs-security-basics.

Repo: jeremylongshore/claude-code-plugins-plus-skills (by jeremylongshore)