> assemblyai

AssemblyAI API for speech recognition, transcription, and audio intelligence. Use when transcribing audio or video files, performing speaker diarization, running sentiment analysis on calls, detecting unsafe content in audio, or asking LLM-powered questions about recorded content with LeMUR.

AssemblyAI

Overview

AssemblyAI provides speech recognition plus an intelligence layer: speaker diarization, sentiment analysis, auto chapters, content moderation, and LeMUR (LLM-powered Q&A on audio). Use it to turn audio/video files into structured, queryable data.

Setup

pip install assemblyai python-dotenv
export ASSEMBLYAI_API_KEY="your_api_key_here"

Core Concepts

Transcript: The asynchronous job that converts audio to text. Submit a URL or file, then poll for completion; the Python SDK's transcriber.transcribe() blocks and polls for you.
Audio Intelligence: Optional enrichments added to the transcript request (diarization, sentiment, chapters, etc.).
LeMUR: Apply LLMs to your transcript — summarize, answer questions, extract structured data.
Real-time: Stream audio via WebSocket for live transcription.
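The submit-then-poll pattern described for Transcript can be sketched without touching the network. This is a minimal illustration, not the SDK's implementation: `wait_for_completion` and `fetch_status` are hypothetical names, and the status strings assume the API's queued / processing / completed / error lifecycle.

```python
import time
from typing import Callable


def wait_for_completion(
    fetch_status: Callable[[], str],
    interval_s: float = 3.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns a terminal status.

    fetch_status is a hypothetical callable standing in for "GET the
    transcript and read its status field".
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in ("completed", "error"):  # terminal states
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"still '{status}' after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Simulated job: two non-terminal polls, then done. Sleep is stubbed out
# so the demo returns immediately.
fake_statuses = iter(["queued", "processing", "completed"])
final = wait_for_completion(lambda: next(fake_statuses), sleep=lambda _s: None)
print(final)
```

In the steps that follow, the blocking transcribe() call is used instead, so a loop like this is only needed when calling the REST API directly.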

Instructions

Step 1: Initialize the client

import assemblyai as aai
import os

aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]
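Reading the key with `os.environ[...]` raises a bare `KeyError` when the variable is missing. A small guard can give a clearer message; `load_api_key` is a hypothetical helper for illustration, not part of the SDK.

```python
import os


def load_api_key(env_var: str = "ASSEMBLYAI_API_KEY") -> str:
    """Return the API key from the environment or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    # Also reject the placeholder value from the Setup section.
    if not key or key == "your_api_key_here":
        raise RuntimeError(
            f"{env_var} is not set; export it before running these examples."
        )
    return key
```

It could then be used as `aai.settings.api_key = load_api_key()`.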

Step 2: Transcribe a file (basic)

def transcribe(audio_source: str) -> aai.Transcript:
    """
    audio_source: URL (https://...) or local file path.
    Returns the completed Transcript object.
    """
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(audio_source)

    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(f"Transcription error: {transcript.error}")

    print(f"Transcript ID: {transcript.id}")
    print(f"Text (first 300 chars): {transcript.text[:300]}...")
    return transcript

t = transcribe("https://assembly.ai/sports_injuries.mp3")
print(t.text)

Step 3: Transcribe with full audio intelligence

def transcribe_rich(audio_source: str) -> aai.Transcript:
    """Transcribe with speaker labels, sentiment, chapters, and content safety."""
    config = aai.TranscriptionConfig(
        speaker_labels=True,         # Who said what
        sentiment_analysis=True,     # Positive/negative/neutral per sentence
        auto_chapters=True,          # Generate chapter markers
        content_safety=True,         # Detect profanity, hate speech, etc.
        auto_highlights=True,        # Key phrases and topics
        entity_detection=True,       # People, places, organizations
        iab_categories=True,         # Topic taxonomy
        language_detection=True      # Detect language automatically
    )
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(audio_source, config=config)

    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return transcript

t = transcribe_rich("https://your-audio.com/podcast.mp3")

# Speaker diarization
print("\n--- Speakers ---")
for utt in t.utterances:
    print(f"[{utt.speaker}] {utt.text}")

# Chapters
print("\n--- Chapters ---")
for ch in t.chapters:
    start_min = ch.start // 60000
    print(f"[{start_min}m] {ch.headline}: {ch.summary}")

# Sentiment
print("\n--- Sentiment ---")
for s in t.sentiment_analysis[:5]:
    print(f"{s.sentiment.value}: {s.text[:80]}")

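# Content safety
# Assumes the SDK exposes flagged segments as a list on t.content_safety.results,
# each carrying its own list of labels; verify against your installed version.
print("\n--- Content Safety ---")
for result in t.content_safety.results:
    for label in result.labels:
        print(f"Flagged: {label.label} (confidence: {label.confidence:.2f})")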
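Once speaker labels are available, a common follow-up is totalling talk time per speaker. The sketch below assumes each utterance carries `speaker`, `start`, and `end` (milliseconds); the `Utterance` dataclass is a hypothetical stand-in so the example runs without an API call.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical stand-in for an SDK utterance (times in milliseconds)."""
    speaker: str
    start: int
    end: int
    text: str


def talk_time_by_speaker(utterances) -> dict[str, float]:
    """Sum utterance durations per speaker and return seconds."""
    totals_ms: dict[str, int] = defaultdict(int)
    for utt in utterances:
        totals_ms[utt.speaker] += utt.end - utt.start
    return {speaker: ms / 1000 for speaker, ms in totals_ms.items()}


sample = [
    Utterance("A", 0, 4_000, "Welcome to the show."),
    Utterance("B", 4_000, 7_000, "Thanks for having me."),
    Utterance("A", 7_000, 10_000, "Let's get started."),
]
print(talk_time_by_speaker(sample))
```

With a real transcript, `talk_time_by_speaker(t.utterances)` would be the equivalent call, provided the attribute names match.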

Step 4: Real-time streaming transcription

import assemblyai as aai
import pyaudio  # pip install pyaudio

def on_open(session_opened: aai.RealtimeSessionOpened):
    print(f"Session opened: {session_opened.session_id}")

def on_data(transcript: aai.RealtimeTranscript):
    if not transcript.text:
        return
    if isinstance(transcript, aai.RealtimeFinalTranscript):
        print(f"\n[FINAL] {transcript.text}")
    else:
        print(f"\r[partial] {transcript.text}", end="")

def on_error(error: aai.RealtimeError):
    print(f"Error: {error}")

def on_close():
    print("Session closed.")

def stream_microphone():
    """Stream microphone input to AssemblyAI for real-time transcription."""
    transcriber = aai.RealtimeTranscriber(
        sample_rate=16_000,
        on_data=on_data,
        on_error=on_error,
        on_open=on_open,
        on_close=on_close,
        end_utterance_silence_threshold=700
    )
    transcriber.connect()

    FRAMES_PER_BUFFER = 3200
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16_000

    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                    input=True, frames_per_buffer=FRAMES_PER_BUFFER)
    try:
        print("Recording... Press Ctrl+C to stop.")
        while True:
            data = stream.read(FRAMES_PER_BUFFER)
            transcriber.stream(data)
    except KeyboardInterrupt:
        pass
    finally:
        stream.stop_stream()
        stream.close()
        p.terminate()
        transcriber.close()

stream_microphone()
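The buffer constants above determine how much audio each `transcriber.stream(data)` call sends. The arithmetic can be checked with a small helper; `chunk_stats` is a hypothetical name, and the 2 bytes per sample follows from `pyaudio.paInt16`.

```python
def chunk_stats(
    frames_per_buffer: int,
    sample_rate: int,
    channels: int = 1,
    bytes_per_sample: int = 2,  # 16-bit PCM (paInt16)
) -> tuple[float, int]:
    """Return (chunk duration in ms, chunk size in bytes)."""
    duration_ms = frames_per_buffer * 1000 / sample_rate
    size_bytes = frames_per_buffer * channels * bytes_per_sample
    return duration_ms, size_bytes


duration_ms, size_bytes = chunk_stats(3200, 16_000)
print(f"{duration_ms} ms per chunk, {size_bytes} bytes")  # 200.0 ms per chunk, 6400 bytes
```

So with these settings each chunk carries 200 ms of audio, or five chunks per second. Smaller buffers lower latency at the cost of more frequent sends.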

Step 5: LeMUR — ask questions about audio

def lemur_qa(transcript_id: str, questions: list[str]) -> list[dict]:
    """
    Ask LeMUR questions about a transcript.
    Returns list of {question, answer} dicts.
    """
    transcript = aai.Transcript.get_by_id(transcript_id)
    questions_answers = transcript.lemur.question_answer(
        questions=[
            aai.LemurQuestion(question=q, answer_format="concise")
            for q in questions
        ],
        final_model=aai.LemurModel.claude3_5_sonnet
    )
    results = []
    for qa in questions_answers.response:
        print(f"Q: {qa.question}\nA: {qa.answer}\n")
        results.append({"question": qa.question, "answer": qa.answer})
    return results

# Use LeMUR to extract structured insights
lemur_qa(t.id, [
    "What are the main topics discussed?",
    "List any action items or decisions made.",
    "What is the overall sentiment of the conversation?"
])

Step 6: LeMUR summarization

def lemur_summarize(transcript_id: str, context: str = "") -> str:
    """Generate a concise summary of a transcript."""
    transcript = aai.Transcript.get_by_id(transcript_id)
    result = transcript.lemur.summarize(
        context=context or "This is a podcast episode.",
        answer_format="bullet points",
        final_model=aai.LemurModel.claude3_5_sonnet
    )
    print(result.response)
    return result.response

summary = lemur_summarize(t.id, context="B2B SaaS podcast discussing AI trends")

Step 7: Generate show notes (combined pipeline)

def generate_show_notes(audio_url: str) -> dict:
    """Full podcast processing pipeline."""
    config = aai.TranscriptionConfig(
        speaker_labels=True,
        auto_chapters=True,
        auto_highlights=True
    )
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(audio_url, config=config)

    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)

    # Build chapters list
    chapters = [
        {"time": f"{ch.start // 60000}:{(ch.start % 60000) // 1000:02d}",
         "title": ch.headline,
         "summary": ch.summary}
        for ch in transcript.chapters
    ]

    # LeMUR for show notes
    show_notes = transcript.lemur.task(
        prompt=(
            "Write podcast show notes in markdown. Include: "
            "1-paragraph episode summary, key takeaways as bullets, "
            "and a list of resources mentioned."
        ),
        final_model=aai.LemurModel.claude3_5_sonnet
    )

    # Social clips (key quotes)
    social_clips = transcript.lemur.task(
        prompt="Extract 3 compelling quotes suitable for social media posts. Format each as a standalone quote with speaker label.",
        final_model=aai.LemurModel.claude3_5_sonnet
    )

    return {
        "transcript_id": transcript.id,
        "full_text": transcript.text,
        "chapters": chapters,
        "show_notes": show_notes.response,
        "social_clips": social_clips.response
    }

result = generate_show_notes("https://your-podcast.com/episode-42.mp3")
print(result["show_notes"])
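The chapter timestamps in the pipeline are built as minutes:seconds, so an episode longer than an hour would show values like 75:03. A possible alternative is a formatter that switches to H:MM:SS when needed; `format_timestamp` is a hypothetical helper, assuming chapter start times are in milliseconds as in the code above.

```python
def format_timestamp(ms: int) -> str:
    """Format milliseconds as M:SS, or H:MM:SS once past one hour."""
    total_seconds = ms // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


print(format_timestamp(83_000))     # 1:23
print(format_timestamp(4_503_000))  # 1:15:03
```

In `generate_show_notes`, the `"time"` entry could then be written as `format_timestamp(ch.start)`.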

Audio Intelligence features reference

Feature	Config param	Description
Speaker labels	`speaker_labels=True`	Identify and label each speaker
Sentiment analysis	`sentiment_analysis=True`	Per-sentence positive/negative/neutral
Auto chapters	`auto_chapters=True`	Detect topic segments with summaries
Content safety	`content_safety=True`	Flag hate speech, profanity, etc.
Entity detection	`entity_detection=True`	Extract names, places, organizations
Key phrases	`auto_highlights=True`	Most important topics and phrases
Language detection	`language_detection=True`	Auto-detect spoken language
PII redaction	`redact_pii=True`	Mask personal information

Guidelines

Audio must be accessible via URL or uploaded; local files can be passed directly to transcriber.transcribe() — the SDK handles uploading.
Transcription typically completes in 20–50% of audio duration (a 10-min file → ~2–5 min).
LeMUR runs on top of the completed transcript, adding another few seconds.
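For real-time streaming, use 16 kHz mono PCM audio for best accuracy.
PII redaction (redact_pii=True) is useful for compliance when transcribing customer calls; it is typically paired with redact_pii_policies to choose which categories of personal information to mask.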
Store API keys in environment variables — never hardcode them.
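The 20–50% rule of thumb above can be turned into a rough wait-time estimate, for example to pick a job timeout. This is only arithmetic on the guideline's own figures, not a measured benchmark; `estimate_processing_seconds` is a hypothetical helper.

```python
def estimate_processing_seconds(
    audio_seconds: float,
    low_pct: int = 20,   # lower bound from the guideline
    high_pct: int = 50,  # upper bound from the guideline
) -> tuple[float, float]:
    """Return a (low, high) estimate of transcription time in seconds."""
    return audio_seconds * low_pct / 100, audio_seconds * high_pct / 100


low, high = estimate_processing_seconds(10 * 60)  # a 10-minute file
print(f"Expect roughly {low / 60:.0f} to {high / 60:.0f} minutes")  # Expect roughly 2 to 5 minutes
```

Actual times depend on file length, enabled features, and service load, so treat the result as a starting point for timeouts rather than a guarantee.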
