> capy-video-gen-skill

Multi-shot AI video generation pipeline with face identity consistency. Converts scripts or ideas into complete videos using character extraction, storyboarding, frame generation, and video assembly. 300 experiments validated, 70% face distance improvement. Use when the user asks to create a video from a script, story, idea, or wants multi-shot video with consistent characters.


Capy Video Gen Skill - Script-to-Video Pipeline

Generate complete multi-shot videos from scripts or ideas, keeping each character's face consistent across all scenes. Built for the HappyCapy AI Gateway. Validated across 300 experiments, with a 70% improvement in face distance.

Overview

ViMax converts text scripts into full videos through an automated pipeline:

Extract characters from script with detailed physical features
Generate front/side/back character portraits
Design shot-by-shot storyboard
Decompose each shot into first_frame, last_frame, and motion descriptions
Build camera tree for shot relationships
Generate frames with reference image selection (face identity as top priority)
Generate video clips from frames
Concatenate into final video

Installation Location

The ViMax pipeline code is at: /home/node/a0/workspace/527fb591-1439-4b5b-ad5d-90f972773f95/workspace/tmp/ViMax/

Run all commands from this directory, using the venv interpreter (.venv/bin/python):

cd /home/node/a0/workspace/527fb591-1439-4b5b-ad5d-90f972773f95/workspace/tmp/ViMax

Prerequisites

AI_GATEWAY_API_KEY environment variable (auto-configured in HappyCapy)
Python venv at .venv/ (already set up)
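
Both prerequisites can be checked before starting a long run. The helper below is a minimal sketch, not part of ViMax: the function name `missing_prerequisites` is hypothetical, and the interpreter path `.venv/bin/python` is taken from the commands used in this document.

```python
import os
from pathlib import Path


def missing_prerequisites(root, env=None):
    """Return a list of human-readable problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    root = Path(root)
    problems = []
    # The gateway key is expected as an environment variable.
    if not env.get("AI_GATEWAY_API_KEY"):
        problems.append("AI_GATEWAY_API_KEY is not set")
    # The venv interpreter path matches the commands shown in this document.
    interpreter = root / ".venv" / "bin" / "python"
    if not interpreter.exists():
        problems.append(f"no venv interpreter at {interpreter}")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(Path.cwd()):
        print("missing:", problem)
```

Run it from the ViMax directory; no output means both prerequisites are in place.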

Quick Start

Script-to-Video

Edit the script text, requirements, and style in the entry script (main_happycapy_script2video.py), then run:

cd /home/node/a0/workspace/527fb591-1439-4b5b-ad5d-90f972773f95/workspace/tmp/ViMax
.venv/bin/python main_happycapy_script2video.py

Idea-to-Video

For generating from a brief idea (auto-generates script first):

cd /home/node/a0/workspace/527fb591-1439-4b5b-ad5d-90f972773f95/workspace/tmp/ViMax
.venv/bin/python main_happycapy_idea2video.py

Programmatic Usage

import asyncio
from langchain.chat_models import init_chat_model
from tools.render_backend import RenderBackend
from utils.config_loader import load_config
from pipelines.script2video_pipeline import Script2VideoPipeline

config = load_config("configs/happycapy_script2video.yaml")
chat_model = init_chat_model(**config["chat_model"]["init_args"])
backend = RenderBackend.from_config(config)

pipeline = Script2VideoPipeline(
    chat_model=chat_model,
    image_generator=backend.image_generator,
    video_generator=backend.video_generator,
    working_dir=config["working_dir"],
)

# Run the pipeline
asyncio.run(pipeline(
    script="Your script here...",
    user_requirement="No more than 8 shots total.",
    style="Cinematic, warm lighting"
))

Pipelines

Script2VideoPipeline

Input: A formatted screenplay/script with character dialogue and scene descriptions
Output: Concatenated video at {working_dir}/final_video.mp4
Config: configs/happycapy_script2video.yaml

Idea2VideoPipeline

Input: A brief idea/concept (1-3 paragraphs)
Output: Auto-generates a script, then produces video
Config: configs/happycapy_idea2video.yaml

Configuration

The HappyCapy script-to-video config lives at configs/happycapy_script2video.yaml:

chat_model:
  init_args:
    model: gpt-4.1
    model_provider: openai
    api_key: ${AI_GATEWAY_API_KEY}
    base_url: https://ai-gateway.happycapy.ai/api/v1/openai/v1

image_generator:
  class_path: tools.ImageGeneratorHappyCapyAPI
  init_args:
    api_key: ${AI_GATEWAY_API_KEY}
    model: google/gemini-3.1-flash-image-preview

video_generator:
  class_path: tools.VideoGeneratorHappyCapyAPI
  init_args:
    api_key: ${AI_GATEWAY_API_KEY}
    model: google/veo-3.1-generate-preview

working_dir: .working_dir/script2video
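
The config refers to the API key as `${AI_GATEWAY_API_KEY}` rather than embedding it. How `load_config` resolves these placeholders is not shown here, so the following is only a sketch of one way it could be done after the YAML has been parsed into a dict. The function name `expand_placeholders` is hypothetical.

```python
import os
import re

# Matches ${NAME} where NAME is a valid environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(value, env=None):
    """Recursively replace ${NAME} in strings with values from the environment."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {k: expand_placeholders(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_placeholders(v, env) for v in value]
    if isinstance(value, str):
        def substitute(match):
            name = match.group(1)
            if name not in env:
                # Fail loudly instead of sending an empty key to the gateway.
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(substitute, value)
    return value  # numbers, booleans, None pass through unchanged


config = {"chat_model": {"init_args": {"model": "gpt-4.1",
                                       "api_key": "${AI_GATEWAY_API_KEY}"}}}
resolved = expand_placeholders(config, env={"AI_GATEWAY_API_KEY": "example-key"})
print(resolved["chat_model"]["init_args"]["api_key"])  # prints example-key
```

Raising on a missing variable surfaces a misconfigured environment before any generation request is made.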

Key Components

Agents (AI Processing)

| Agent | File | Purpose |
| --- | --- | --- |
| CharacterExtractor | `agents/character_extractor.py` | Extract characters with static/dynamic features from script |
| CharacterPortraitsGenerator | `agents/character_portraits_generator.py` | Generate front/side/back portraits for each character |
| StoryboardArtist | `agents/storyboard_artist.py` | Design shot-by-shot storyboard with first/last frames and motion |
| ReferenceImageSelector | `agents/reference_image_selector.py` | Select best reference images for each frame (face identity #1 priority) |
| CameraImageGenerator | `agents/camera_image_generator.py` | Build camera trees and generate transition videos |
| BestImageSelector | `agents/best_image_selector.py` | Select best generated image from candidates |
| Screenwriter | `agents/screenwriter.py` | Generate scripts from ideas |

Tools (Generation Backends)

| Tool | File | Purpose |
| --- | --- | --- |
| ImageGeneratorHappyCapyAPI | `tools/image_generator_happycapy_api.py` | Image generation via HappyCapy Gateway (Gemini) |
| VideoGeneratorHappyCapyAPI | `tools/video_generator_happycapy_api.py` | Video generation via HappyCapy Gateway (Veo) |
| RenderBackend | `tools/render_backend.py` | Factory for instantiating generators from config |

Interfaces (Data Models)

CharacterInScene - Character with identifier, static_features, dynamic_features
ShotDescription - Shot with ff_desc, lf_desc, motion_desc, variation_type
Camera - Camera with parent-child relationships
Frame - Frame with shot_idx, frame_type, visible characters
ImageOutput / VideoOutput - Generation outputs with save methods
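
To make the shapes above concrete, here is a rough sketch of three of these models as dataclasses. The field names come from the list above; the field types, the `idx` field, and the `add_child` helper are assumptions for illustration, and the real ViMax classes may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CharacterInScene:
    identifier: str
    static_features: str   # assumed: traits that stay fixed across the video
    dynamic_features: str  # assumed: traits that may change between scenes


@dataclass
class ShotDescription:
    ff_desc: str         # first-frame description
    lf_desc: str         # last-frame description
    motion_desc: str     # motion between the two frames
    variation_type: str  # type assumed to be a plain string


@dataclass
class Camera:
    idx: int  # hypothetical identifier
    # parent is excluded from repr and comparison to avoid walking the cycle.
    parent: Optional["Camera"] = field(default=None, repr=False, compare=False)
    children: List["Camera"] = field(default_factory=list)

    def add_child(self, child: "Camera") -> "Camera":
        """Attach child under this camera and return it."""
        child.parent = self
        self.children.append(child)
        return child


root = Camera(idx=0)
close_up = root.add_child(Camera(idx=1))
print(close_up.parent.idx)  # prints 0
```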

Face Identity Consistency (CRITICAL)

This pipeline includes face identity improvements validated through 257 experiments, which reduced face distance by about 70% (from 0.74 to 0.22).

Built-In Protections

Reference Image Selector: Face identity is the #1 priority when selecting reference images. The front-view portrait is always included when a character's face is visible.
Character Portraits: Enhanced prompts generate identity-critical details (exact nose shape, eye spacing, jawline, distinguishing marks) for cross-scene recognition.
Video Prompt Face Lock: Every video generation prompt is prepended with a face identity instruction requiring the character's face to remain identical to the starting frame throughout the clip.
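
The face lock amounts to a small string transformation applied to every video prompt. The sketch below shows the idea only: the instruction wording in `FACE_LOCK` and the function name `with_face_lock` are hypothetical, since the pipeline's actual text is not reproduced here.

```python
# Hypothetical wording; the pipeline's real instruction text may differ.
FACE_LOCK = (
    "Keep every character's face identical to the starting frame "
    "for the entire clip."
)


def with_face_lock(motion_prompt: str) -> str:
    """Prepend the face identity instruction unless it is already present."""
    prompt = motion_prompt.strip()
    if prompt.startswith(FACE_LOCK):
        return prompt  # already locked; avoid stacking the instruction
    return f"{FACE_LOCK} {prompt}"


print(with_face_lock("Alice turns toward the window and smiles."))
```

Placing the instruction first, rather than appending it, follows the description above; making the call idempotent keeps retries from stacking the instruction.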

Best Practices When Using ViMax

Hyper-detailed character descriptions: Include ethnicity, age, hair texture/style/color, eye shape, facial hair, glasses, skin tone, build, and distinguishing marks in your script's character introductions
Extreme close-up shots: Include at least one extreme close-up per character to anchor identity
Consistent lighting: Specify similar lighting across scenes to prevent face drift
User-provided reference photos: Place photos in the working directory and pass them as character_portraits_registry to skip AI portrait generation

What Does NOT Work

Complex prompt engineering (viseme morphing, phoneme anchoring) does not improve face identity
Simple, direct prompts with detailed physical descriptions outperform clever prompts
Lip-sync to external audio is NOT possible (Veo generates its own internal audio)

See FACE_IDENTITY_GUIDE.md in the ViMax directory for full details.
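
The 70% figure in this section can be reproduced from the two face distance values it quotes, 0.74 before and 0.22 after. The snippet assumes "improvement" means relative reduction, where a lower distance is better; the function name is illustrative.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a lower-is-better metric decreased."""
    return (before - after) / before


improvement = relative_reduction(0.74, 0.22)
print(f"{improvement:.1%}")  # prints 70.3%
```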

Output Structure

After a run, the working directory contains:

.working_dir/script2video/
  characters.json                      # Extracted characters
  character_portraits_registry.json    # Portrait paths registry
  character_portraits/                 # Generated portraits
    0_CharacterName/
      front.png
      side.png
      back.png
  storyboard.json                     # Shot descriptions
  camera_tree.json                    # Camera relationships
  shots/
    0/
      shot_description.json
      first_frame.png
      last_frame.png (if medium/large variation)
      video.mp4
    1/
      ...
  final_video.mp4                     # Final concatenated output
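
After a run, it can help to check which shots are incomplete before re-running anything. This sketch walks the layout shown above. The function name `summarize_run` is hypothetical, and it treats `last_frame.png` as optional because the tree marks it as conditional.

```python
from pathlib import Path

# Files expected in every shot directory; last_frame.png is optional.
REQUIRED_SHOT_FILES = ("shot_description.json", "first_frame.png", "video.mp4")


def summarize_run(working_dir):
    """Return ({shot_index: [missing files]}, final_video_exists)."""
    working_dir = Path(working_dir)
    shots_dir = working_dir / "shots"
    missing = {}
    if shots_dir.is_dir():
        shot_dirs = [p for p in shots_dir.iterdir()
                     if p.is_dir() and p.name.isdigit()]
        # Sort numerically so shot 10 comes after shot 9.
        for shot_dir in sorted(shot_dirs, key=lambda p: int(p.name)):
            missing[int(shot_dir.name)] = [
                name for name in REQUIRED_SHOT_FILES
                if not (shot_dir / name).exists()
            ]
    return missing, (working_dir / "final_video.mp4").exists()


if __name__ == "__main__":
    per_shot, has_final = summarize_run(".working_dir/script2video")
    for idx, files in per_shot.items():
        print(idx, "ok" if not files else f"missing: {', '.join(files)}")
    print("final_video.mp4 present:", has_final)
```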

Customization

Using Your Own Reference Photos

To use real photos instead of AI-generated portraits:

# Build a portrait registry pointing to your photos
character_portraits_registry = {
    "Alice": {
        "front": {"path": "/path/to/alice_front.png", "description": "Front view of Alice"},
        "side": {"path": "/path/to/alice_side.png", "description": "Side view of Alice"},
        "back": {"path": "/path/to/alice_back.png", "description": "Back view of Alice"},
    }
}

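# Pass to the pipeline (skips portrait generation).
# The call is awaitable, so at module level wrap it in asyncio.run (requires
# `import asyncio`); inside an async function, use `await pipeline(...)` instead.
asyncio.run(pipeline(
    script=script,
    user_requirement=user_requirement,
    style=style,
    character_portraits_registry=character_portraits_registry,
))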
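
With several characters, the registry can be built from a folder of photos instead of being typed by hand. The sketch below assumes a hypothetical naming scheme of `<name>_<view>.png` in lowercase, and it assumes all three views are required. Neither assumption comes from ViMax itself; only the registry shape matches the example above.

```python
from pathlib import Path

VIEWS = ("front", "side", "back")


def build_registry(photo_dir, names):
    """Build a portrait registry from files named <name>_<view>.png."""
    photo_dir = Path(photo_dir)
    registry = {}
    for name in names:
        views = {}
        for view in VIEWS:
            path = photo_dir / f"{name.lower()}_{view}.png"
            if not path.is_file():
                # Catch a missing photo here instead of partway through a run.
                raise FileNotFoundError(f"missing {view} photo for {name}: {path}")
            views[view] = {
                "path": str(path),
                "description": f"{view.capitalize()} view of {name}",
            }
        registry[name] = views
    return registry
```

For example, `build_registry("/path/to/photos", ["Alice"])` would look for `alice_front.png`, `alice_side.png`, and `alice_back.png` in that folder.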

Changing Models

Edit the YAML config to use different models:

Image: google/gemini-3.1-flash-image-preview (recommended for face identity)
Video: google/veo-3.1-generate-preview (recommended) or openai/sora-2
Chat: gpt-4.1 (recommended) or any OpenAI-compatible model

Troubleshooting

"No module named 'tools'" or similar import errors

Run from the ViMax root directory:

cd /home/node/a0/workspace/527fb591-1439-4b5b-ad5d-90f972773f95/workspace/tmp/ViMax
.venv/bin/python main_happycapy_script2video.py

API rate limit errors

Reduce max_requests_per_minute in the YAML config.
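
As a rough guide to picking a value: if the limit is enforced by spacing requests evenly, the setting translates directly into a minimum gap between requests. That enforcement model is an assumption, since the pipeline's limiter is not shown here, and the helper name is illustrative.

```python
def min_interval_seconds(max_requests_per_minute: int) -> float:
    """Minimum gap between requests under an evenly spaced per-minute limit."""
    if max_requests_per_minute <= 0:
        raise ValueError("max_requests_per_minute must be positive")
    return 60.0 / max_requests_per_minute


for limit in (60, 30, 10):
    print(limit, "req/min ->", min_interval_seconds(limit), "s between requests")
```

Under this model, halving the setting doubles the gap between requests, which lengthens the run in exchange for fewer rate limit errors.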

Face identity drift in generated videos

Add more physical detail to character descriptions in your script
Use user-provided reference photos instead of AI-generated portraits
Include extreme close-up shots for important characters
Keep lighting consistent across scenes
