> deepgram

Transcribe and analyze audio with the Deepgram API. Use when a user asks to convert speech to text, implement real-time transcription, analyze audio intelligence, detect languages, or build voice-enabled applications with Deepgram SDKs.


Deepgram — Real-Time Speech-to-Text API

Overview

You are an expert in Deepgram, the speech-to-text platform optimized for real-time transcription. You help developers build live transcription systems, voice agents, call analytics, and meeting summarization using Deepgram's Nova-2 model with streaming WebSocket connections, speaker diarization, and smart formatting.

Instructions

Streaming Transcription (Real-Time)

// Real-time transcription via WebSocket
import { createClient, LiveTranscriptionEvents } from "@deepgram/sdk";

const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

async function transcribeLive(audioStream: ReadableStream) {
  const connection = deepgram.listen.live({
    model: "nova-2",                    // Fastest, most accurate model
    language: "en",
    smart_format: true,                 // Auto-punctuation, casing, numbers
    interim_results: true,              // Get partial results as user speaks
    utterance_end_ms: 1000,             // 1s silence = end of utterance
    vad_events: true,                   // Voice activity detection
    diarize: true,                      // Speaker identification
    endpointing: 500,                   // 500ms endpointing for responsiveness
  });

  connection.on(LiveTranscriptionEvents.Transcript, (data) => {
    const transcript = data.channel.alternatives[0];
    if (transcript.transcript) {
      if (data.is_final) {
        console.log(`[Final] Speaker ${transcript.words?.[0]?.speaker}: ${transcript.transcript}`);
        // Send to LLM for response generation
      } else {
        console.log(`[Interim] ${transcript.transcript}`);
        // Show real-time text as user speaks
      }
    }
  });

  connection.on(LiveTranscriptionEvents.UtteranceEnd, () => {
    console.log("[Utterance complete — user stopped speaking]");
  });

  // Wait for the socket to open, then pipe audio to Deepgram
  // (16kHz, 16-bit PCM or any supported format)
  connection.on(LiveTranscriptionEvents.Open, async () => {
    for await (const chunk of audioStream) {
      connection.send(chunk);
    }
  });
}
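When sending raw PCM over the WebSocket, chunk size determines latency: chunks of roughly 20-100 ms keep transcripts responsive without flooding the socket. A minimal sketch of the sizing math (pure Python, independent of the SDK):

```python
def pcm_chunk_bytes(sample_rate_hz: int, bits_per_sample: int,
                    channels: int, chunk_ms: int) -> int:
    """Bytes needed to hold chunk_ms of raw PCM audio."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * chunk_ms // 1000

# 100 ms of 16 kHz / 16-bit / mono audio
print(pcm_chunk_bytes(16000, 16, 1, 100))  # 3200 bytes
```

Reading the audio stream in buffers of this size gives you one send per 100 ms of speech.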

Pre-Recorded Transcription

# Batch transcription for recorded audio/video
import os

from deepgram import DeepgramClient, PrerecordedOptions

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

options = PrerecordedOptions(
    model="nova-2",
    smart_format=True,
    diarize=True,                       # Identify speakers
    summarize="v2",                     # Auto-generate summary
    topics=True,                        # Extract topics
    intents=True,                       # Detect intent (question, command, statement)
    sentiment=True,                     # Sentiment per utterance
    paragraphs=True,                    # Auto-paragraph formatting
    utterances=True,                    # Split by speaker turns
)

# From URL
response = dg.listen.rest.v("1").transcribe_url(
    {"url": "https://example.com/meeting.mp3"}, options
)

# From file
with open("recording.wav", "rb") as f:
    response = dg.listen.rest.v("1").transcribe_file(
        {"buffer": f.read(), "mimetype": "audio/wav"}, options
    )

# Access results
transcript = response.results.channels[0].alternatives[0]
print(f"Transcript: {transcript.transcript}")
print(f"Confidence: {transcript.confidence}")
print(f"Summary: {response.results.summary.short}")
for utterance in response.results.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.transcript}")
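To turn diarized utterances into a readable transcript, consecutive turns by the same speaker can be merged. An illustrative sketch; the utterance objects are mocked here as dicts carrying the same speaker/transcript fields shown above:

```python
def format_diarized(utterances: list[dict]) -> str:
    """Merge consecutive same-speaker utterances into one transcript line."""
    lines, last_speaker = [], None
    for u in utterances:
        if u["speaker"] != last_speaker:
            lines.append(f"Speaker {u['speaker']}: {u['transcript']}")
            last_speaker = u["speaker"]
        else:
            lines[-1] += " " + u["transcript"]
    return "\n".join(lines)

mock = [
    {"speaker": 0, "transcript": "Hi, thanks for joining."},
    {"speaker": 0, "transcript": "Let's get started."},
    {"speaker": 1, "transcript": "Sounds good."},
]
print(format_diarized(mock))
# Speaker 0: Hi, thanks for joining. Let's get started.
# Speaker 1: Sounds good.
```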

Text-to-Speech (Aura)

# Deepgram Aura TTS — low-latency voice synthesis
response = dg.speak.rest.v("1").stream_raw(
    {"text": "Thanks for calling. How can I help you today?"},
    options={
        "model": "aura-asteria-en",     # Female, warm tone
        "encoding": "linear16",          # 16-bit PCM
        "sample_rate": 24000,
    },
)

# Stream audio chunks to speaker/WebRTC
for chunk in response.iter_bytes():
    audio_output.write(chunk)
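For buffering the linear16 output, it helps to know the byte rate: at 24 kHz, 16-bit mono, each second of speech is 48,000 bytes. A small helper for sizing playback buffers or estimating clip length from bytes received:

```python
def linear16_duration_s(num_bytes: int, sample_rate_hz: int = 24000) -> float:
    """Duration in seconds of a mono 16-bit PCM byte buffer."""
    return num_bytes / (sample_rate_hz * 2)  # 2 bytes per sample

print(linear16_duration_s(96000))  # 2.0 seconds of audio
```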

Installation

npm install @deepgram/sdk    # Node.js
pip install deepgram-sdk     # Python

Examples

Example 1: User asks to set up deepgram

User: "Help me set up deepgram for my project"

The agent should:

  1. Install the SDK for the project's language (npm install @deepgram/sdk or pip install deepgram-sdk)
  2. Set DEEPGRAM_API_KEY in the environment rather than hardcoding it
  3. Create a client and run a short test transcription against a sample file or URL
  4. Confirm the response contains a transcript and a reasonable confidence score

Example 2: User asks to build a feature with deepgram

User: "Create a dashboard using deepgram"

The agent should:

  1. Open a streaming WebSocket connection with interim_results enabled for live text
  2. Render interim transcripts as they arrive and replace them with final results
  3. Pull diarization, sentiment, and topics from the batch API for analytics panels
  4. Test with sample audio and validate latency and transcript accuracy

Guidelines

  1. Nova-2 for everything — Nova-2 is Deepgram's best model for accuracy and speed; use it unless you need a specific language model
  2. Streaming for real-time — Use WebSocket connections for live audio; batch API for pre-recorded files
  3. Endpointing tuning — Set endpointing to 300-500ms for voice agents (responsive) or 1000ms for transcription (accurate)
  4. Smart formatting — Always enable smart_format for proper capitalization, punctuation, and number formatting
  5. Diarization for meetings — Enable diarize when multiple speakers are present; Deepgram identifies up to 10 speakers
  6. Interim results for UX — Enable interim_results for real-time text display; show partial transcripts as users speak
  7. Multichannel for calls — Use multichannel: true for phone calls where each speaker is on a separate audio channel
  8. Callback for async — Use callback_url for large file transcription; Deepgram POSTs results when done
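The SDK options above map directly onto query parameters of Deepgram's REST listen endpoint, which is useful when calling the API without an SDK (booleans serialize as lowercase true/false). A sketch; the webhook URL is a hypothetical placeholder:

```python
from urllib.parse import urlencode

def build_listen_url(**params) -> str:
    """Build a Deepgram /v1/listen URL from SDK-style option names."""
    base = "https://api.deepgram.com/v1/listen"
    qs = urlencode(
        {k: (str(v).lower() if isinstance(v, bool) else v) for k, v in params.items()}
    )
    return f"{base}?{qs}"

url = build_listen_url(
    model="nova-2",
    smart_format=True,
    callback="https://example.com/dg-webhook",  # hypothetical callback URL
)
print(url)
```

POSTing audio to this URL with an Authorization: Token header triggers transcription, and the callback parameter makes Deepgram deliver results asynchronously.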
