> ltx-video
Generate videos via LTX-2.3 API (ltx.video). Supports text-to-video, image-to-video, audio-to-video (lip-sync from audio + image), extend, and retake. Use when: generating AI video from text/image/audio, animating a portrait, creating lip-sync video from an existing image + audio recording.
```shell
curl "https://skillshub.wtf/LeoYeAI/openclaw-master-skills/ltx-video?format=md"
```

LTX-2.3 Video API
API Reference
Base URL: https://api.ltx.video/v1
Auth: Authorization: Bearer <API_KEY>
Response: MP4 binary (direct download, no polling)
Endpoints
| Endpoint | Input | Use |
|---|---|---|
| /v1/text-to-video | prompt | Generate video from text |
| /v1/image-to-video | image_uri + prompt | Animate a still image |
| /v1/audio-to-video | audio_uri + image_uri + prompt | Lip-sync video from audio + image |
| /v1/extend | video_uri + prompt | Extend a video at the start or end |
| /v1/retake | video_uri + time range | Regenerate a section of a video |
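The required request fields in the table above can be checked client-side before sending a request. This is a minimal sketch based only on the table; the field names come from the examples in this document, not from an official SDK.

```python
# Required request fields per endpoint, per the table above.
REQUIRED_FIELDS = {
    "/v1/text-to-video": {"prompt"},
    "/v1/image-to-video": {"image_uri", "prompt"},
    "/v1/audio-to-video": {"audio_uri", "image_uri", "prompt"},
    "/v1/extend": {"video_uri", "prompt"},
}

def missing_fields(endpoint: str, payload: dict) -> set:
    """Return the required fields absent from a request payload."""
    return REQUIRED_FIELDS.get(endpoint, set()) - payload.keys()
```

Calling this before the POST turns a confusing API error into an immediate, local one.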
Models
| Model | Speed | Quality |
|---|---|---|
| ltx-2-3-fast | ~17s | Good (use for tests) |
| ltx-2-3-pro | ~30-60s | Best (use for final) |
Supported Resolutions
- 1920x1080 (landscape 16:9)
- 1080x1920 (portrait 9:16 — native vertical, trained on vertical data)
- 1440x1080
- 4096x2160 (text-to-video only)
audio-to-video only supports: 1920x1080 or 1080x1920
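A small guard for the constraints above can reject unsupported combinations before a request is sent. This is a sketch: the doc only states that 4096x2160 is text-to-video only and that audio-to-video is limited to the two 16:9/9:16 sizes, so the image-to-video set here is an assumption.

```python
# Supported resolutions per endpoint, per the list above.
# The image-to-video entry is an assumption (not stated explicitly).
RESOLUTIONS = {
    "text-to-video": {"1920x1080", "1080x1920", "1440x1080", "4096x2160"},
    "image-to-video": {"1920x1080", "1080x1920", "1440x1080"},
    "audio-to-video": {"1920x1080", "1080x1920"},
}

def is_supported(endpoint: str, resolution: str) -> bool:
    """True if the resolution is listed as supported for the endpoint."""
    return resolution in RESOLUTIONS.get(endpoint, set())
```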
Quick Examples
Text to Video
```shell
curl -X POST "https://api.ltx.video/v1/text-to-video" \
  -H "Authorization: Bearer $LTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A man in a navy blue suit sits at a luxury restaurant table...",
    "model": "ltx-2-3-pro",
    "duration": 8,
    "resolution": "1920x1080"
  }' -o output.mp4
```
Audio to Video (Lip-sync)
```shell
curl -X POST "https://api.ltx.video/v1/audio-to-video" \
  -H "Authorization: Bearer $LTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_uri": "https://example.com/voice.mp3",
    "image_uri": "https://example.com/portrait.jpg",
    "prompt": "A man speaks directly to camera...",
    "model": "ltx-2-3-pro",
    "resolution": "1920x1080"
  }' -o output.mp4
```
Python Wrapper
```python
import requests

def ltx_audio_to_video(audio_url, image_url, prompt, api_key,
                       model="ltx-2-3-pro", resolution="1920x1080",
                       output_path="output.mp4"):
    """Generate a lip-sync video and stream the MP4 response to disk."""
    r = requests.post(
        "https://api.ltx.video/v1/audio-to-video",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        json={"audio_uri": audio_url, "image_uri": image_url,
              "prompt": prompt, "model": model, "resolution": resolution},
        timeout=300, stream=True,
    )
    if r.status_code != 200:
        raise RuntimeError(f"LTX error {r.status_code}: {r.text}")
    # The API returns the MP4 directly, so stream it straight to a file.
    with open(output_path, "wb") as f:
        for chunk in r.iter_content(8192):
            f.write(chunk)
    return output_path
```
⚠️ Critical Rules (learned from experience)
File Hosting
- URLs must be HTTPS — HTTP is rejected
- Files must return the correct MIME type (not `application/octet-stream`)
- uguu.se works: upload with `curl -F "files[]=@file.mp3" https://uguu.se/upload`
- Audio: upload as MP3 (not WAV) → uguu returns `audio/mpeg` ✅
- 4K images fail → resize to 1920x1080 before uploading
```shell
# Upload MP3 to uguu.se
AUDIO_URL=$(curl -s -F "files[]=@audio.mp3" "https://uguu.se/upload" | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['files'][0]['url'])")

# Upload image
IMAGE_URL=$(curl -s -F "files[]=@portrait.jpg" "https://uguu.se/upload" | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['files'][0]['url'])")
```
Image Size Limit
```shell
# Resize large images before upload
ffmpeg -y -i input_4k.png -vf "scale=1920:1080" output_1080.jpg
```
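Note that the ffmpeg line above forces exactly 1920x1080, which distorts non-16:9 sources. A sketch of computing an aspect-preserving target size instead (a hypothetical helper, not part of the API):

```python
def fit_within(width: int, height: int, max_w: int = 1920, max_h: int = 1080):
    """Scale (width, height) down to fit max_w x max_h, preserving aspect."""
    scale = min(max_w / width, max_h / height, 1.0)  # never upscale
    # Round down to even dimensions, which most video codecs require.
    return (int(width * scale) // 2 * 2, int(height * scale) // 2 * 2)
```

The result can be passed to ffmpeg as `scale=W:H` with the computed values.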
Face Consistency
- Avoid prompts where the character looks down — breaks face consistency
- Keep head level and gaze forward throughout
- Place objects already in frame instead of having character reach below frame
Last Frame
- LTX does not support first+last frame natively
- Workaround: generate clip A, generate clip B, then use /v1/extend to chain them
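The chaining workaround above can be sketched as an orchestration loop. The `generate` callable here is a stand-in for whatever issues the real /v1/text-to-video and /v1/extend requests; this shows only the order of operations, not actual API calls.

```python
def chain_clips(prompts, generate):
    """Generate the first clip, then extend it once per remaining prompt."""
    video = generate("text-to-video", prompt=prompts[0])
    for p in prompts[1:]:
        # Each extension takes the previous result as its video_uri.
        video = generate("extend", prompt=p, video_uri=video)
    return video
```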
Prompting Guide (LTX-2.3)
LTX-2.3 has a much stronger text connector. Specificity wins.
1. Use Verbs, Not Nouns
❌ "A dramatic portrait of a man standing"
✅ "A man stands on a rooftop. His coat flaps in the wind. He adjusts his collar and steps forward as the camera tracks right."
2. Block the Scene Like a Director
- Specify left vs right, foreground vs background
- Describe who moves, what moves, how they move, what the camera does
- Spatial relationships are now respected
3. Describe Audio Explicitly (for text-to-video)
- Name the type of sound: dialogue, ambient, music
- Specify tone and intensity
- Example:
"His voice is clear and warm. Restaurant ambient sound softly in the background."
4. Avoid Static Photo-Like Prompts
- If the prompt reads like a still image → the output behaves like one
- Add wind, motion, breathing, gestures, camera movement
5. Describe Texture and Material
- Hair, fabric, surface finish, lighting fall-off
- "Individual hair strands visible in the backlight" → now renders correctly
6. Portrait (9:16) Native
- `resolution: "1080x1920"` → trained on vertical data
- Frame for vertical intentionally, don't treat it as cropped landscape
7. Complex Shots Work Now
- Layer multiple actions: "He picks up the banana, raises it to his ear, and smirks"
- Combine character performance + environment + camera motion
Lip-Sync Prompt Template
```
A [description of person] sits/stands [location]. He/she speaks directly
to camera, lips moving in perfect sync with his/her voice. [Gesture details].
Head stays level and gaze remains locked on camera throughout.
[Environment description softly blurred in background].
[Lighting]. [Camera: holds steady at eye level, front-on].
```
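The template above is just string assembly, so a small formatter keeps the fixed lip-sync and framing language intact while swapping scene details. The slot names mirror the bracketed placeholders and are illustrative, not an API.

```python
# Mirrors the lip-sync prompt template above; slot names are illustrative.
TEMPLATE = (
    "A {person} sits {location}. He speaks directly to camera, lips moving "
    "in perfect sync with his voice. {gestures}. Head stays level and gaze "
    "remains locked on camera throughout. {environment} softly blurred in "
    "background. {lighting}. Camera: holds steady at eye level, front-on."
)

def lipsync_prompt(person, location, gestures, environment, lighting):
    """Fill the lip-sync prompt template with scene-specific details."""
    return TEMPLATE.format(person=person, location=location, gestures=gestures,
                           environment=environment, lighting=lighting)
```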
ComfyUI Node
Custom nodes for ComfyUI (no manual API calls):
```shell
cd ComfyUI/custom_nodes
git clone https://github.com/PauldeLavallaz/comfyui-ltx-node
```
Nodes: LTX Text to Video, LTX Image to Video, LTX Extend Video
Category: LTX Video
API Key
Paul's key: stored in ~/clawd/.env as LTX_API_KEY
ltxv_RfSU5hdKJb_g5dwbECZWnilE1P8dJzbavz6niP_0LQJ942ARHIVhrBCfebcytEL1efLVx_63S_PJyWTzicrBcWEkOXfCbGTl8JSzlJJk329MwRViEgOoE2KnE9LIA5t6QSFeBy7DLnTIcX0AZNbV9Jv0TuC7qcq2gV33G6ROhUVUDCuN