> voice-generation

Use this skill for AI text-to-speech generation. Triggers include: "generate voice", "create audio", "text to speech", "TTS", "read this aloud", "generate narration", "create voiceover", "synthesize speech", "podcast audio", "dialogue audio", "multi-speaker", "audiobook". Supports Google Gemini TTS, ElevenLabs, and OpenAI TTS.


Voice Generation Skill

Generate realistic speech using AI (Google Gemini TTS, ElevenLabs, OpenAI TTS).

Prerequisites

At least one API key is required:

  • GOOGLE_API_KEY - For Google Gemini TTS (same key as video/image/music) ✅
  • ELEVENLABS_API_KEY - For ElevenLabs high-quality voice synthesis
  • OPENAI_API_KEY - For OpenAI TTS voices
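Before choosing a backend, you can check which of these keys are set in the environment. The sketch below uses only the standard library. The backend labels and the `available_backends` name are illustrative and are not part of the skill's scripts.

```python
import os

# Environment variables from the Prerequisites list, mapped to the backend each unlocks.
API_KEYS = {
    "gemini": "GOOGLE_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def available_backends(env=None):
    """Return the backends whose key is set to a non-empty value, in the order above."""
    env = os.environ if env is None else env
    return [name for name, var in API_KEYS.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    found = available_backends()
    if found:
        print("Available TTS backends:", ", ".join(found))
    else:
        print("No TTS key found; set one of:", ", ".join(API_KEYS.values()))
```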

Available APIs

Google Gemini TTS (Recommended - Same API Key)

  • Best for: Podcasts, dialogues, audiobooks with style control
  • Voices: 30 voices with natural language style control
  • Multi-speaker: Up to 2 speakers for dialogues ✅
  • Languages: 24 languages (auto-detected)
  • Features: Control style, accent, pace via prompts
  • Output: 24kHz WAV
  • API Key: Same GOOGLE_API_KEY as video/image/music ✅

ElevenLabs (Best Quality)

  • Best for: Natural-sounding voices, voice cloning, long-form content
  • Voices: 100+ pre-made voices + custom voice cloning
  • Languages: 29+ languages
  • Models: Eleven Multilingual v2, Eleven Turbo v2

OpenAI TTS (Simplest)

  • Best for: Quick, reliable text-to-speech with consistent quality
  • Voices: alloy, echo, fable, onyx, nova, shimmer
  • Models: tts-1 (fast), tts-1-hd (high quality)
  • Output: MP3, Opus, AAC, FLAC

Workflow

Step 1: Understand the Request

Parse the user's voice request for:

  • Text content: What should be spoken?
  • Voice type: Male, female, specific character?
  • Tone: Professional, casual, dramatic, cheerful?
  • Use case: Narration, voiceover, audiobook, notification?
  • Language: English, Spanish, other?
  • Speed: Normal, slow, fast?

Step 2: Select Voice and API

Choose based on requirements:

| Use case | Recommended API | Reason |
|---|---|---|
| Default / same key as video | Gemini TTS | Same GOOGLE_API_KEY |
| Multi-speaker dialogue | Gemini TTS | Up to 2 speakers built in |
| Style/accent control | Gemini TTS | Natural-language prompts |
| Voice cloning | ElevenLabs | Only one of the three with cloning |
| 100+ voice options | ElevenLabs | Widest selection |
| Audiobook/podcast | ElevenLabs or Gemini TTS | Both excellent for long content |
| Quick narration | OpenAI TTS | Fast, reliable |
| Budget-conscious | OpenAI TTS | Lower cost |
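The table above can be written as a small lookup that also accounts for which keys are set. This is a hypothetical sketch. The use-case labels and backend names are shorthand invented here. Falling back to any available backend is an assumption, with one exception: cloning gets no fallback, because only ElevenLabs offers it among the three.

```python
# Preferences transcribed from the table above: use case -> backends, best first.
# The use-case labels and backend names are shorthand invented for this sketch.
PREFERENCES = {
    "default": ["gemini"],
    "dialogue": ["gemini"],
    "style": ["gemini"],
    "cloning": ["elevenlabs"],
    "voice_variety": ["elevenlabs"],
    "long_form": ["elevenlabs", "gemini"],
    "quick": ["openai"],
    "budget": ["openai"],
}

# Use cases no other backend can serve, so there is no fallback.
NO_FALLBACK = {"cloning"}


def choose_backend(use_case, available):
    """Return the first preferred backend that has a key, else any available one, else None."""
    for backend in PREFERENCES.get(use_case, PREFERENCES["default"]):
        if backend in available:
            return backend
    if use_case in NO_FALLBACK or not available:
        return None
    return available[0]
```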

Step 3: Prepare the Text

Optimize text for speech:

  1. Add pauses: Use commas and periods to create a natural rhythm
  2. Spell out numbers: "1,234" → "one thousand two hundred thirty-four" (if needed)
  3. Handle acronyms: Write "NASA" for acronyms spoken as a word and "N.A.S.A." for ones read letter by letter
  4. Mark emphasis: Some APIs support emphasis markers

Example transformation:

  • Original: "The Q4 2024 results show a 15% YoY increase."
  • Optimized: "The Q4 2024 results show a fifteen percent year-over-year increase."
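Part of this transformation can be automated. The sketch below assumes US English number words and a hand-maintained abbreviation map; `ABBREVIATIONS`, `number_to_words` and `prepare_for_speech` are names made up here. It handles only three things: percentages, comma-grouped integers below one million, and the abbreviations in the map. It leaves bare numbers such as years untouched, since "2024" usually reads fine.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

# Hypothetical pronunciation map; extend it with the abbreviations your text uses.
ABBREVIATIONS = {"YoY": "year-over-year", "MoM": "month-over-month"}


def number_to_words(n):
    """Spell out a non-negative integer below one million in US English."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")
    if n < 1_000_000:
        thousands, rest = divmod(n, 1000)
        return number_to_words(thousands) + " thousand" + (" " + number_to_words(rest) if rest else "")
    raise ValueError("number too large for this sketch")


def _spell_grouped(match):
    """Spell out '1,234'-style numbers; leave values of a million or more unchanged."""
    value = int(match.group(0).replace(",", ""))
    return number_to_words(value) if value < 1_000_000 else match.group(0)


def prepare_for_speech(text):
    """Spell out percentages and comma-grouped numbers, then expand known abbreviations."""
    # "15%" -> "fifteen percent"
    text = re.sub(r"\b(\d+)%", lambda m: number_to_words(int(m.group(1))) + " percent", text)
    # "1,234" -> "one thousand two hundred thirty-four"
    text = re.sub(r"\b\d{1,3}(?:,\d{3})+\b", _spell_grouped, text)
    for abbr, spoken in ABBREVIATIONS.items():
        text = re.sub(rf"\b{re.escape(abbr)}\b", spoken, text)
    return text


print(prepare_for_speech("The Q4 2024 results show a 15% YoY increase."))
# -> The Q4 2024 results show a fifteen percent year-over-year increase.
```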

Step 4: Generate the Audio

Execute the appropriate script from ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/:

For Google Gemini TTS (single speaker):

python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Welcome to our podcast!" \
  --voice "Charon"

Gemini TTS with style direction:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Have a wonderful day!" \
  --voice "Puck" \
  --style "Say cheerfully with a British accent:"

Gemini TTS multi-speaker (dialogue):

python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --multi \
  --speaker "Host:Charon" \
  --speaker "Guest:Aoede" \
  --text "Host: Welcome to the show!
Guest: Thanks for having me!"
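If the dialogue is generated programmatically, you can validate it before calling the script. The sketch below checks the script against the 2-speaker limit and builds the argument list. It uses only the `--multi`, `--speaker "Name:Voice"` and `--text` flags shown in the example above. The helper names are hypothetical, and `gemini_tts.py` may parse speaker labels differently from the regular expression assumed here.

```python
import os
import re

# "Speaker: line" at the start of a line, as in the dialogue example above.
TURN = re.compile(r"^\s*([^:\n]+?)\s*:\s*(.+)$")
MAX_GEMINI_SPEAKERS = 2  # multi-speaker limit stated in this skill


def parse_dialogue(script):
    """Split a dialogue into (speaker, line) pairs; unlabeled lines are skipped."""
    turns = []
    for raw in script.splitlines():
        match = TURN.match(raw)
        if match:
            turns.append((match.group(1), match.group(2)))
    return turns


def gemini_multi_command(script, voices, root=None):
    """Build a gemini_tts.py --multi argument list after checking speakers and voices."""
    root = root or os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    # Unique speakers in order of first appearance.
    speakers = list(dict.fromkeys(name for name, _ in parse_dialogue(script)))
    if len(speakers) > MAX_GEMINI_SPEAKERS:
        raise ValueError(f"{len(speakers)} speakers found; Gemini TTS supports at most 2")
    missing = [name for name in speakers if name not in voices]
    if missing:
        raise ValueError("no voice assigned for: " + ", ".join(missing))
    script_path = os.path.join(root, "skills", "voice-generation", "scripts", "gemini_tts.py")
    cmd = ["python3", script_path, "--multi"]
    for name in speakers:
        cmd += ["--speaker", f"{name}:{voices[name]}"]
    return cmd + ["--text", script]
```

You can pass the resulting list to `subprocess.run(cmd, check=True)`. A list avoids shell quoting problems with the multi-line `--text` value.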

For ElevenLabs:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/elevenlabs.py \
  --text "Your text here" \
  --voice "Rachel" \
  --model "eleven_multilingual_v2"

For OpenAI TTS:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/openai_tts.py \
  --text "Your text here" \
  --voice "nova" \
  --model "tts-1-hd"
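When calling the scripts from Python rather than a shell, an argument list built from the flags shown above avoids quoting issues. This is an illustrative sketch: `tts_command` and `run_tts` are hypothetical helpers. The sketch rejects `--model` for Gemini and `--style` for the other two only because the examples here do not show those combinations, not because the scripts are known to refuse them.

```python
import os
import subprocess

SCRIPTS = {
    "gemini": "gemini_tts.py",
    "elevenlabs": "elevenlabs.py",
    "openai": "openai_tts.py",
}


def tts_command(backend, text, voice, model=None, style=None, root=None):
    """Assemble an argument list using only the flags shown in the examples above."""
    root = root or os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    script = os.path.join(root, "skills", "voice-generation", "scripts", SCRIPTS[backend])
    cmd = ["python3", script, "--text", text, "--voice", voice]
    if model:
        if backend == "gemini":
            raise ValueError("the Gemini examples here pass no --model flag")
        cmd += ["--model", model]
    if style:
        if backend != "gemini":
            raise ValueError("--style is only shown for gemini_tts.py")
        cmd += ["--style", style]
    return cmd


def run_tts(*args, **kwargs):
    """Run the chosen script; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(tts_command(*args, **kwargs), check=True, capture_output=True, text=True)
```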

List Gemini voices:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py --list-voices

Step 5: Deliver the Result

  1. Provide the generated audio file path
  2. Mention the voice and settings used
  3. Offer to:
    • Try a different voice
    • Adjust speed or tone
    • Use a different API
    • Generate in a different format

Error Handling

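Missing API key: Tell the user which environment variable is needed: GOOGLE_API_KEY (Gemini TTS), ELEVENLABS_API_KEY (ElevenLabs), or OPENAI_API_KEY (OpenAI TTS).

Missing package: Gemini TTS requires the google-genai package: pip install google-genai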

Text too long: Split the text into chunks below the API's length limit, generate audio for each chunk, and concatenate the files; or suggest shorter text.
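The split-and-concatenate route works in two stages. First, pack whole sentences into chunks under the target API's limit and synthesize each chunk. Then join the resulting clips. The sketch below uses only the standard library, and `chunk_text` and `concatenate_wavs` are illustrative names. Concatenation assumes every chunk was saved as WAV with the same channels, sample width and rate, as Gemini TTS's 24 kHz WAV output should be when all chunks use the same settings. MP3 output from ElevenLabs or OpenAI would need a different tool.

For `max_chars`, the API Comparison table gives 5,000 characters for ElevenLabs and 4,096 for OpenAI TTS. Gemini's limit is in tokens, so a character budget there is only an approximation.

```python
import re
import wave


def _units(text, max_chars):
    """Yield sentences, breaking any longer than max_chars into words, then fixed slices."""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if len(sentence) <= max_chars:
            yield sentence
            continue
        for word in sentence.split():
            for start in range(0, len(word), max_chars):
                yield word[start:start + max_chars]


def chunk_text(text, max_chars):
    """Greedily pack units into chunks no longer than max_chars, joined by single spaces."""
    chunks, current = [], ""
    for unit in _units(text, max_chars):
        candidate = f"{current} {unit}" if current else unit
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = unit
    if current:
        chunks.append(current)
    return chunks


def concatenate_wavs(paths, out_path):
    """Append WAV files that share channels, sample width and frame rate into one file."""
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for path in paths:
            with wave.open(path, "rb") as src:
                if src.getparams()[:3] != params[:3]:
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(src.readframes(src.getnframes()))
```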

Rate limit: Suggest waiting or trying a different API.
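Both suggestions can be combined: wait with exponential backoff, then move on to the next API. This is a hedged sketch. `RateLimited` and the caller-supplied `synthesize(backend)` function are hypothetical. This skill does not document how its scripts report rate limits, so detecting one (for example an HTTP 429 in the script's error output) is left to the caller.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical: raise this from your TTS call when the API reports a rate limit."""


def synthesize_with_fallback(backends, synthesize, retries=3, base_delay=1.0, sleep=time.sleep):
    """Try each backend in order, retrying rate-limited calls with exponential backoff.

    `synthesize(backend)` is supplied by the caller and returns the output file path.
    """
    for backend in backends:
        for attempt in range(retries):
            try:
                return backend, synthesize(backend)
            except RateLimited:
                if attempt < retries - 1:
                    # Wait base_delay * 2**attempt seconds plus up to 10% random jitter.
                    delay = base_delay * (2 ** attempt)
                    sleep(delay + random.uniform(0, delay * 0.1))
        # Retries exhausted for this backend; fall through to the next one.
    raise RateLimited("all backends are rate limited")
```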

Unsupported language: Suggest an alternative API that supports the language.

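Multi-speaker limit: Gemini TTS supports at most 2 speakers. For more, generate each speaker's lines in separate calls (for example, ElevenLabs with a different voice per speaker) and combine the clips in order.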
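For a scene with more than two speakers, you can turn the dialogue into an ordered list of single-voice jobs, one per call. The sketch below makes several assumptions. It expects the same "Speaker: line" format as the Gemini dialogue example, and the voice names come from the ElevenLabs table. `plan_turns` is a hypothetical helper. Each job would become one `elevenlabs.py` call with its `--voice`, and the resulting clips would then be joined in order.

```python
import re

# "Speaker: line" at the start of a line, as in the Gemini dialogue example.
TURN = re.compile(r"^\s*([^:\n]+?)\s*:\s*(.+)$")


def plan_turns(script, voices):
    """Return (speaker, voice, text) jobs, merging consecutive lines by the same speaker."""
    jobs = []
    for raw in script.splitlines():
        match = TURN.match(raw)
        if not match:
            continue  # skip blank or unlabeled lines
        speaker, line = match.group(1), match.group(2)
        voice = voices[speaker]  # KeyError if a speaker has no voice assigned
        if jobs and jobs[-1][0] == speaker:
            jobs[-1] = (speaker, voice, jobs[-1][2] + " " + line)
        else:
            jobs.append((speaker, voice, line))
    return jobs
```

Merging consecutive lines by the same speaker reduces the number of API calls and keeps each speaker's delivery continuous within a turn.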

Voice Selection Guide

Google Gemini TTS Voices (30 available; a selection is listed below)

| Style | Voices | Best for |
|---|---|---|
| Bright/Upbeat | Zephyr, Puck, Aoede, Laomedeia | Marketing, cheerful content |
| Firm/Informative | Charon, Kore, Orus, Rasalgethi | News, tutorials, professional |
| Soft/Warm | Achernar, Sulafat, Vindemiatrix | Meditation, gentle narration |
| Smooth | Algieba, Despina, Callirrhoe | Audiobooks, storytelling |
| Clear | Erinome, Iapetus, Pulcherrima | Instructions, clarity |
| Character | Fenrir (excitable), Enceladus (breathy), Algenib (gravelly), Gacrux (mature) | Character voices, drama |
| Friendly | Achird, Zubenelgenubi (casual) | Casual, conversational |
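For dialogue, the table can double as a lookup that guarantees each character gets a different voice. This is a hypothetical sketch. The style keys are shorthand for the table rows, and always taking the first unused voice is an arbitrary choice. It covers only the voices listed above; run `gemini_tts.py --list-voices` for the full set.

```python
# Style groups transcribed from the table above; the keys are shorthand invented here.
GEMINI_VOICES = {
    "bright": ["Zephyr", "Puck", "Aoede", "Laomedeia"],
    "firm": ["Charon", "Kore", "Orus", "Rasalgethi"],
    "soft": ["Achernar", "Sulafat", "Vindemiatrix"],
    "smooth": ["Algieba", "Despina", "Callirrhoe"],
    "clear": ["Erinome", "Iapetus", "Pulcherrima"],
    "character": ["Fenrir", "Enceladus", "Algenib", "Gacrux"],
    "friendly": ["Achird", "Zubenelgenubi"],
}


def pick_voices(styles):
    """Return one voice per requested style, never reusing a voice within the call."""
    used, picks = set(), []
    for style in styles:
        options = [v for v in GEMINI_VOICES[style] if v not in used]
        if not options:
            raise ValueError(f"no unused '{style}' voice left")
        picks.append(options[0])
        used.add(options[0])
    return picks


host, guest = pick_voices(["firm", "bright"])  # Charon, Zephyr
```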

Gemini TTS Style Tips:

  • Use natural language: --style "Say angrily:" or --style "Whisper mysteriously:"
  • Specify accents: --style "Speak with a British accent from London:"
  • Control pace: --style "Speak slowly and deliberately:"
  • Combine: --style "Say excitedly with a Southern US accent:"
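Each of the style prompts above follows one pattern: a speaking verb, an optional manner, and an optional accent, ending in a colon. The helper below builds such prefixes. It is hypothetical (`gemini_style` is not part of the skill's scripts). `shlex.quote` makes the result safe to paste into a shell command.

```python
import shlex


def gemini_style(verb, manner=None, accent=None):
    """Compose a --style prefix such as 'Say excitedly with a Southern US accent:'."""
    words = [verb]
    if manner:
        words.append(manner)  # e.g. "angrily", "slowly and deliberately"
    if accent:
        words.append(f"with {accent}")  # e.g. "a British accent from London"
    return " ".join(words) + ":"


style = gemini_style("Say", "excitedly", "a Southern US accent")
print("--style", shlex.quote(style))
# prints: --style 'Say excitedly with a Southern US accent:'
```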

OpenAI TTS Voices

| Voice | Description | Best for |
|---|---|---|
| alloy | Neutral, balanced | General purpose |
| echo | Warm, conversational | Podcasts, casual |
| fable | Expressive, British | Storytelling |
| onyx | Deep, authoritative | Narration, professional |
| nova | Friendly, upbeat | Marketing, tutorials |
| shimmer | Soft, gentle | Meditation, ASMR |

ElevenLabs Popular Voices

| Voice | Description | Best for |
|---|---|---|
| Rachel | Young female, American | Narration, audiobooks |
| Domi | Young female, energetic | Marketing, ads |
| Bella | Young female, soft | Storytelling |
| Antoni | Young male, well-rounded | Narration |
| Josh | Young male, deep | Audiobooks |
| Arnold | Mature male, authoritative | Documentary |
| Adam | Middle-aged male, deep | Narration |
| Sam | Young male, raspy | Character voices |

Best Practices

For Narration

  • Use a consistent voice throughout
  • Add natural pauses between paragraphs
  • Consider pacing for the content type

For Dialogue

  • Use different voices for different characters
  • Match voice characteristics to character descriptions
  • Adjust speed for emotional scenes

For Accessibility

  • Use clear, well-paced speech
  • Avoid overly stylized voices
  • Test with screen readers if applicable

API Comparison

| Feature | Gemini TTS | ElevenLabs | OpenAI TTS |
|---|---|---|---|
| API key | GOOGLE_API_KEY | ELEVENLABS_API_KEY | OPENAI_API_KEY |
| Voice quality | Excellent | Excellent | Very good |
| Voice variety | 30 voices | 100+ voices | 6 voices |
| Multi-speaker | ✅ Up to 2 | ❌ No | ❌ No |
| Style control | ✅ Natural language | Limited | ❌ No |
| Voice cloning | ❌ No | ✅ Yes | ❌ No |
| Languages | 24 | 29+ | 50+ |
| Speed control | Via prompts | Yes | Yes (0.25-4x) |
| Max length | 32k tokens | 5,000 chars | 4,096 chars |
| Output format | WAV (24 kHz) | MP3, WAV | MP3, Opus, AAC, FLAC |
| Same key as video/image | ✅ Yes | ❌ No | ❌ No |

Repository: michaelboeding/skills (by michaelboeding)
