> zoom-rtms

Zoom Realtime Media Streams (RTMS) for accessing live audio, video, transcript, chat, and screen share from Zoom meetings, webinars, and Video SDK sessions. WebSocket-based protocol using open web standards. Use when building AI/ML applications, live transcription, recording, streaming, or real-time meeting/webinar/session analysis.


Zoom Realtime Media Streams (RTMS)

Expert guidance for accessing live audio, video, transcript, chat, and screen share data from Zoom meetings, webinars, and Video SDK sessions in real time. RTMS uses a WebSocket-based protocol built on open standards; no meeting bots are required.

Read This First (Critical)

RTMS is primarily a backend media ingestion service.

  • Your backend receives and processes live media: audio, video, screen share, chat, transcript.
  • RTMS is not a frontend UI SDK by itself.
  • Processing is event-triggered: backend waits for RTMS start webhook events before stream handling begins.

Optional architecture (common):

  • Add a Zoom App SDK frontend for in-client UI/controls.
  • Stream backend RTMS outputs to frontend via WebSocket (or SSE, gRPC, queue workers, etc.).

Use RTMS for media/data plane, and use frontend frameworks/Zoom Apps for presentation + user interactions.
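As a sketch of that data-plane/presentation split, the backend can wrap whatever it extracts from RTMS in a transport-agnostic envelope before pushing it to the frontend over WebSocket, SSE, or a queue. The envelope shape and `frameForFrontend` helper below are illustrative assumptions, not part of the RTMS API:

```javascript
// Hypothetical helper: wrap processed RTMS output in a JSON envelope
// that a frontend (Zoom App or web dashboard) can consume over any transport.
function frameForFrontend(kind, metadata, body) {
  return JSON.stringify({
    kind,                                // e.g. "transcript", "chat"
    user: metadata && metadata.userName, // speaker attribution from RTMS metadata
    body,                                // decoded payload (text, summary, etc.)
    ts: Date.now(),                      // server-side receive time
  });
}

// Usage: inside an RTMS transcript callback, broadcast to connected UI clients:
// for (const ws of uiClients) ws.send(frameForFrontend("transcript", meta, text));
```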

  • Official Documentation: https://developers.zoom.us/docs/rtms/
  • SDK Reference (JS): https://zoom.github.io/rtms/js/
  • SDK Reference (Python): https://zoom.github.io/rtms/py/
  • Sample Repository: https://github.com/zoom/rtms-samples

Quick Links

New to RTMS? Follow this path:

  1. Connection Architecture - Two-phase WebSocket design
  2. SDK Quickstart - Fastest way to receive media (recommended)
  3. Manual WebSocket - Full protocol control without SDK
  4. Media Types - Audio, video, transcript, chat, screen share

Complete Implementation:

  • RTMS Bot - End-to-end bot implementation guide


Supported Products

| Product   | Webhook Event                                   | Payload ID             | App Type      |
|-----------|-------------------------------------------------|------------------------|---------------|
| Meetings  | `meeting.rtms_started` / `meeting.rtms_stopped` | `meeting_uuid`         | General App   |
| Webinars  | `webinar.rtms_started` / `webinar.rtms_stopped` | `meeting_uuid` (same!) | General App   |
| Video SDK | `session.rtms_started` / `session.rtms_stopped` | `session_id`           | Video SDK App |

Once connected, the WebSocket protocol, media types, and streaming behavior are identical across all products.
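Because the payload identifier differs by product (and webinars reuse `meeting_uuid`), it helps to normalize the webhook before the rest of your pipeline runs. The helper below is a sketch, not SDK code:

```javascript
// Normalize an RTMS start webhook across meetings, webinars, and Video SDK.
// Returns the product name and the identifier used for signing and joining.
function resolveRtmsIdentity(event, payload) {
  if (event.startsWith("session.")) {
    return { product: "videosdk", id: payload.session_id };
  }
  // Meetings AND webinars both carry meeting_uuid in the payload.
  const product = event.startsWith("webinar.") ? "webinar" : "meeting";
  return { product, id: payload.meeting_uuid };
}
```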

RTMS Overview

RTMS is a data pipeline that gives your app access to live media from Zoom meetings, webinars, and Video SDK sessions without participant bots. Instead of having automated clients join meetings, use RTMS to collect media data directly from Zoom's infrastructure.

What RTMS Provides

| Media Type   | Format                       | Use Cases                                |
|--------------|------------------------------|------------------------------------------|
| Audio        | PCM (L16), G.711, G.722, Opus | Transcription, voice analysis, recording |
| Video        | H.264, JPG, PNG              | Recording, AI vision, thumbnails         |
| Screen Share | H.264, JPG, PNG              | Content capture, slide extraction        |
| Transcript   | JSON text                    | Meeting notes, search, compliance        |
| Chat         | JSON text                    | Archive, sentiment analysis              |

Two Approaches

| Approach          | Best For                          | Complexity                         |
|-------------------|-----------------------------------|------------------------------------|
| SDK (`@zoom/rtms`) | Most use cases                    | Low - handles WebSocket complexity |
| Manual WebSocket  | Custom protocols, other languages | High - full protocol implementation |

Prerequisites

  • Node.js 20.3.0+ (24 LTS recommended) for JavaScript SDK
  • Python 3.10+ for Python SDK
  • Zoom General App (for meetings/webinars) or Video SDK App (for Video SDK) with RTMS feature enabled
  • Webhook endpoint for RTMS events
  • Server to receive WebSocket streams

Need RTMS access? Post in the Zoom Developer Forum to request RTMS access, describing your use case.

Quick Start (SDK - Recommended)

```javascript
import rtms from "@zoom/rtms";

// All RTMS start/stop events across products
const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];

// Handle webhook events
rtms.onWebhookEvent(({ event, payload }) => {
  if (!RTMS_EVENTS.includes(event)) return;

  const client = new rtms.Client();

  client.onAudioData((data, timestamp, metadata) => {
    console.log(`Audio from ${metadata.userName}: ${data.length} bytes`);
  });

  client.onTranscriptData((data, timestamp, metadata) => {
    const text = data.toString('utf8');
    console.log(`${metadata.userName}: ${text}`);
  });

  client.onJoinConfirm((reason) => {
    console.log(`Joined session: ${reason}`);
  });

  // SDK handles all WebSocket connections automatically
  // Accepts both meeting_uuid and session_id transparently
  client.join(payload);
});
```
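The `onAudioData` callback delivers raw PCM, so persisting it as a playable file means adding a container header yourself. The sketch below builds a standard 44-byte WAV header; the 16 kHz mono, 16-bit defaults are common for RTMS audio but should be confirmed against your stream configuration:

```javascript
// Build a 44-byte RIFF/WAV header for raw little-endian 16-bit PCM.
// sampleRate/channels defaults are assumptions; match your RTMS audio params.
function buildWavHeader(dataLength, sampleRate = 16000, channels = 1) {
  const header = Buffer.alloc(44);
  const byteRate = sampleRate * channels * 2; // 16-bit samples = 2 bytes each
  header.write("RIFF", 0);
  header.writeUInt32LE(36 + dataLength, 4);   // total file size minus 8
  header.write("WAVE", 8);
  header.write("fmt ", 12);
  header.writeUInt32LE(16, 16);               // fmt chunk size (PCM)
  header.writeUInt16LE(1, 20);                // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(channels * 2, 32);     // block align
  header.writeUInt16LE(16, 34);               // bits per sample
  header.write("data", 36);
  header.writeUInt32LE(dataLength, 40);
  return header;
}

// Usage: concatenate buffered audio chunks, then prepend the header:
// fs.writeFileSync("call.wav", Buffer.concat([buildWavHeader(pcm.length), pcm]));
```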

Quick Start (Manual WebSocket)

For full control or non-SDK languages, implement the two-phase WebSocket protocol:

```javascript
const express = require('express');
const WebSocket = require('ws');
const crypto = require('crypto');

const app = express();
app.use(express.json());

const CLIENT_ID = process.env.ZOOM_CLIENT_ID;
const CLIENT_SECRET = process.env.ZOOM_CLIENT_SECRET;
const RTMS_EVENTS = ['meeting.rtms_started', 'webinar.rtms_started', 'session.rtms_started'];

// 1. Generate signature
// For meetings/webinars: uses meeting_uuid. For Video SDK: uses session_id.
function generateSignature(clientId, idValue, streamId, clientSecret) {
  const message = `${clientId},${idValue},${streamId}`;
  return crypto.createHmac('sha256', clientSecret).update(message).digest('hex');
}

// 2. Handle webhook
app.post('/webhook', (req, res) => {
  res.status(200).send();  // CRITICAL: Respond immediately!

  const { event, payload } = req.body;
  if (RTMS_EVENTS.includes(event)) {
    connectToRTMS(payload);
  }
});

// 3. Connect to signaling WebSocket
function connectToRTMS(payload) {
  const { server_urls, rtms_stream_id } = payload;
  // meeting_uuid for meetings/webinars, session_id for Video SDK
  const idValue = payload.meeting_uuid || payload.session_id;
  const signature = generateSignature(CLIENT_ID, idValue, rtms_stream_id, CLIENT_SECRET);

  const signalingWs = new WebSocket(server_urls);

  signalingWs.on('open', () => {
    signalingWs.send(JSON.stringify({
      msg_type: 1,  // Handshake request
      protocol_version: 1,
      meeting_uuid: idValue,
      rtms_stream_id,
      signature,
      media_type: 9  // AUDIO(1) | TRANSCRIPT(8)
    }));
  });

  // ... handle responses, connect to media WebSocket
}

app.listen(3000);
```

See: Manual WebSocket Guide for complete implementation.
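Part of "handle responses" is the mandatory keep-alive: `msg_type` 12 must be answered with `msg_type` 13 or the connection is closed. A minimal, pure message handler might look like the sketch below (echoing `timestamp` back is an assumption — verify the reply fields against the protocol reference):

```javascript
// Given a parsed signaling/media message, return the reply to send (or null).
// msg_type 12 = keep-alive request, 13 = keep-alive response.
// Echoing the timestamp back is an assumption; check the protocol docs.
function replyFor(message) {
  if (message.msg_type === 12) {
    return { msg_type: 13, timestamp: message.timestamp };
  }
  return null; // other message types are handled elsewhere
}

// Usage inside a WebSocket handler:
// ws.on('message', (raw) => {
//   const reply = replyFor(JSON.parse(raw));
//   if (reply) ws.send(JSON.stringify(reply));
// });
```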

Media Type Bitmask

Combine types with bitwise OR:

| Type         | Value | Description                |
|--------------|-------|----------------------------|
| Audio        | 1     | PCM audio samples          |
| Video        | 2     | H.264/JPG video frames     |
| Screen Share | 4     | Separate from video!       |
| Transcript   | 8     | Real-time speech-to-text   |
| Chat         | 16    | In-meeting chat messages   |
| All          | 32    | All media types            |

Example: Audio + Transcript = 1 | 8 = 9
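A small sketch of combining and checking flags, using the values from the table above (the constant names are illustrative, not SDK exports):

```javascript
// Media type flags from the table above; names are illustrative.
const MEDIA = { AUDIO: 1, VIDEO: 2, SHARE: 4, TRANSCRIPT: 8, CHAT: 16, ALL: 32 };

// Combine flags with bitwise OR.
const combine = (...flags) => flags.reduce((mask, f) => mask | f, 0);

// Check whether a mask includes a given flag.
const has = (mask, flag) => (mask & flag) === flag;

// combine(MEDIA.AUDIO, MEDIA.TRANSCRIPT) === 9, matching the example above.
```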

Critical Gotchas

| Issue                      | Solution                                                              |
|----------------------------|-----------------------------------------------------------------------|
| Only 1 connection allowed  | New connections kick out existing ones. Track active sessions!        |
| Respond 200 immediately    | If the webhook response is delayed, Zoom retries, creating duplicate connections |
| Heartbeat mandatory        | Respond to `msg_type` 12 with `msg_type` 13, or the connection dies   |
| Reconnection is YOUR job   | RTMS doesn't auto-reconnect. Timeouts: media 30s, signaling 60s       |
| Transcript language delay  | Auto-detect takes 30s. Set the language explicitly to skip the delay  |
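Since reconnection is your responsibility, a common pattern is bounded exponential backoff tuned to finish retrying well inside the timeout windows above. The policy below is an illustrative sketch, not an RTMS requirement (`tryReconnect` is a hypothetical function you would supply):

```javascript
// Exponential backoff with a cap, so retries for a dropped media connection
// (30s timeout) complete well before the window closes. Purely illustrative.
function backoffDelayMs(attempt, baseMs = 500, capMs = 8000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Usage sketch:
// ws.on('close', async () => {
//   for (let attempt = 0; attempt < 5; attempt++) {
//     await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
//     if (await tryReconnect()) return; // tryReconnect is your own function
//   }
// });
```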

Environment Variables

SDK Environment Variables

```bash
# Required - Authentication
ZM_RTMS_CLIENT=your_client_id          # Zoom OAuth Client ID
ZM_RTMS_SECRET=your_client_secret      # Zoom OAuth Client Secret

# Optional - Webhook server
ZM_RTMS_PORT=8080                      # Default: 8080
ZM_RTMS_PATH=/webhook                  # Default: /

# Optional - Logging
ZM_RTMS_LOG_LEVEL=info                 # error, warn, info, debug, trace
ZM_RTMS_LOG_FORMAT=progressive         # progressive or json
ZM_RTMS_LOG_ENABLED=true
```

Manual Implementation Variables

```bash
ZOOM_CLIENT_ID=your_client_id
ZOOM_CLIENT_SECRET=your_client_secret
ZOOM_SECRET_TOKEN=your_webhook_token   # For webhook validation
```

Zoom App Setup

For Meetings and Webinars (General App)

  1. Go to marketplace.zoom.us -> Develop -> Build App
  2. Choose General App -> User-Managed
  3. Features -> Access -> Enable Event Subscription
  4. Add Events -> Search "rtms" -> Select:
    • meeting.rtms_started
    • meeting.rtms_stopped
    • webinar.rtms_started (if using webinars)
    • webinar.rtms_stopped (if using webinars)
  5. Scopes -> Add Scopes -> Search "rtms" -> Add:
    • meeting:read:meeting_audio
    • meeting:read:meeting_video
    • meeting:read:meeting_transcript
    • meeting:read:meeting_chat
    • webinar:read:webinar_audio (if using webinars)
    • webinar:read:webinar_video (if using webinars)
    • webinar:read:webinar_transcript (if using webinars)
    • webinar:read:webinar_chat (if using webinars)

For Video SDK (Video SDK App)

  1. Go to marketplace.zoom.us -> Develop -> Build App
  2. Choose Video SDK App
  3. Use your SDK Key and SDK Secret (not OAuth Client ID/Secret)
  4. Add Events:
    • session.rtms_started
    • session.rtms_stopped

Sample Repositories

Official Samples

| Repository           | Description                           |
|----------------------|---------------------------------------|
| `rtms-samples`       | RTMSManager, boilerplates, AI samples |
| `rtms-quickstart-js` | JavaScript SDK quickstart             |
| `rtms-quickstart-py` | Python SDK quickstart                 |
| `rtms-sdk-cpp`       | C++ SDK                               |
| `zoom-rtms`          | Main SDK repository                   |

AI Integration Samples

| Sample                               | Description                          |
|--------------------------------------|--------------------------------------|
| `rtms-meeting-assistant-starter-kit` | AI meeting assistant with summaries  |
| `arlo-meeting-assistant`             | Production meeting assistant with DB |
| `videosdk-rtms-transcribe-audio`     | Whisper transcription                |



Need help? Start with Integrated Index section below for complete navigation.


Integrated Index

This section was migrated from SKILL.md.

RTMS provides real-time access to live audio, video, transcript, chat, and screen share from Zoom meetings, webinars, and Video SDK sessions.

Critical Positioning

Treat RTMS as a backend service for receiving and processing media streams.

  • Backend role: ingest audio/video/share/chat/transcript, run AI/analytics, persist/forward data.
  • Optional frontend role: Zoom App SDK or web dashboard that consumes processed stream data from backend transport (WebSocket/SSE/other).
  • Kickoff model: backend waits for RTMS start webhook events, then starts stream processing.

Do not model RTMS as a frontend-only SDK.

Quick Start Path

If you're new to RTMS, follow this order:

  1. Run preflight checks first -> RUNBOOK.md

  2. Understand the architecture -> concepts/connection-architecture.md

    • Two-phase WebSocket: Signaling + Media
    • Why RTMS doesn't use bots
  3. Choose your approach -> SDK or Manual

  4. Understand the lifecycle -> concepts/lifecycle-flow.md

    • Webhook -> Signaling -> Media -> Streaming
  5. Configure media types -> references/media-types.md

    • Audio, video, transcript, chat, screen share
  6. Troubleshoot issues -> troubleshooting/common-issues.md

    • Connection problems, duplicate webhooks, missing data

Documentation Structure

```
rtms/
├── SKILL.md                           # This file - skill overview and navigation guide
│
├── concepts/                          # Core architectural patterns
│   ├── connection-architecture.md     # Two-phase WebSocket design
│   └── lifecycle-flow.md              # Webhook to streaming flow
│
├── examples/                          # Complete working code
│   ├── sdk-quickstart.md              # Using @zoom/rtms SDK
│   ├── manual-websocket.md            # Raw protocol implementation
│   ├── rtms-bot.md                    # Complete RTMS bot implementation
│   └── ai-integration.md              # Transcription and analysis
│
├── references/                        # Reference documentation
│   ├── media-types.md                 # Audio, video, transcript, chat, share
│   ├── data-types.md                  # All enums and constants
│   ├── connection.md                  # WebSocket protocol details
│   └── webhooks.md                    # Event subscription
│
└── troubleshooting/                   # Problem solving guides
    └── common-issues.md               # FAQ and solutions
```

By Use Case

I want to get meeting transcripts

  1. SDK Quickstart - Fastest approach
  2. Media Types - Transcript configuration
  3. AI Integration - Whisper, Deepgram, AssemblyAI

I want to record meetings

  1. Media Types - Audio + Video configuration
  2. SDK Quickstart - Receiving media
  3. AI Integration - Gap-filled recording

I want to build an AI meeting assistant

  1. AI Integration - Complete patterns
  2. SDK Quickstart - Media ingestion
  3. Lifecycle Flow - Event handling

I want to build a complete RTMS bot

  1. RTMS Bot - Complete implementation guide
  2. Lifecycle Flow - Webhook to streaming flow
  3. Connection Architecture - Two-phase design

I need full protocol control

  1. Manual WebSocket - START HERE
  2. Connection Architecture - Two-phase design
  3. Data Types - All message types and enums
  4. Connection - Protocol details

I'm getting connection errors

  1. Common Issues - Diagnostic checklist
  2. Connection Architecture - Verify flow
  3. Webhooks - Validation and timing

I want to understand the architecture

  1. Connection Architecture - Two-phase WebSocket
  2. Lifecycle Flow - Complete flow diagram
  3. Data Types - Protocol constants

By Product

I'm building for Zoom Meetings

I'm building for Zoom Webinars

  • Same as meetings, but webhook event is webinar.rtms_started. Payload still uses meeting_uuid (NOT webinar_uuid).
  • Add webinar scopes and event subscriptions. See Webhooks.
  • Only panelist streams are confirmed available; attendee streams may not be exposed individually.

I'm building for Zoom Video SDK

  • Webhook event: session.rtms_started. Payload uses session_id (NOT meeting_uuid).
  • Requires a Video SDK App with SDK Key/Secret (not OAuth Client ID/Secret).
  • Once connected, the protocol is identical to meetings.
  • See Webhooks for payload details.

Key Documents

1. Connection Architecture (CRITICAL)

concepts/connection-architecture.md

RTMS uses two separate WebSocket connections:

  • Signaling WebSocket: Authentication, control, heartbeats
  • Media WebSocket: Actual audio/video/transcript data

2. SDK vs Manual (DECISION POINT)

examples/sdk-quickstart.md vs examples/manual-websocket.md

| SDK                          | Manual                        |
|------------------------------|-------------------------------|
| Handles WebSocket complexity | Full protocol control         |
| Automatic reconnection       | DIY reconnection              |
| Less code                    | More code                     |
| Best for most use cases      | Best for custom requirements  |

3. Critical Gotchas (MOST COMMON ISSUES)

troubleshooting/common-issues.md

  1. Respond 200 immediately - Delayed webhook responses cause duplicates
  2. Only 1 connection per stream - New connections kick out existing
  3. Heartbeat required - Must respond to keep-alive or connection dies
  4. Track active sessions - Prevent duplicate join attempts
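A minimal way to enforce points 2 and 4 (one connection per stream, no duplicate joins) is to key active sessions by `rtms_stream_id`. The tracker below is a sketch of that bookkeeping, not SDK functionality:

```javascript
// Track active RTMS sessions by rtms_stream_id so a retried webhook
// does not trigger a second join that kicks out the live connection.
class SessionTracker {
  constructor() {
    this.active = new Map();
  }
  // Returns true if the caller should proceed to join; false if already joined.
  tryJoin(streamId, handle = true) {
    if (this.active.has(streamId)) return false;
    this.active.set(streamId, handle);
    return true;
  }
  leave(streamId) {
    this.active.delete(streamId);
  }
}

// Usage: if (tracker.tryJoin(payload.rtms_stream_id)) connectToRTMS(payload);
```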

Key Learnings

Critical Discoveries:

  1. Two-Phase WebSocket Design

    • Signaling: Control plane (handshake, heartbeat, start/stop)
    • Media: Data plane (audio, video, transcript, chat, share)
    • See: Connection Architecture
  2. Webhook Response Timing

    • MUST respond 200 BEFORE any processing
    • Delayed response -> Zoom retries -> duplicate connections
    • See: Common Issues
  3. Heartbeat is Mandatory

    • Signaling: Receive msg_type 12, respond with msg_type 13
    • Media: Same pattern
    • Failure to respond = connection closed
    • See: Connection
  4. Signature Generation

    • Format: HMAC-SHA256(clientSecret, "clientId,meetingUuid,streamId")
    • For Video SDK, use session_id in place of meetingUuid
    • Webinars still use meeting_uuid (not webinar_uuid)
    • Required for both signaling and media handshakes
    • See: Manual WebSocket
  5. Media Types are Bitmasks

    • Audio=1, Video=2, Share=4, Transcript=8, Chat=16, All=32
    • Combine with OR: Audio+Transcript = 1|8 = 9
    • See: Media Types
  6. Screen Share is SEPARATE from Video

    • Different msg_type (16 vs 15)
    • Different media flag (4 vs 2)
    • Must subscribe separately
    • See: Media Types

Quick Reference

"Connection fails"

-> Common Issues

"Duplicate connections"

-> Webhook timing

"No audio/video data"

-> Media Types - Check configuration

"How do I implement manually?"

-> Manual WebSocket

"What message types exist?"

-> Data Types

"How do I integrate AI?"

-> AI Integration


Document Version

Based on Zoom RTMS SDK v1.x and official documentation as of 2026.


Happy coding!

Remember: Start with SDK Quickstart for the fastest path, or Manual WebSocket if you need full control.
