> storyboard-video-orchestrator

Plan and orchestrate a full AI‑generated video (e.g., a 10‑minute short film or promo) by breaking it into **5–10 second scenes**, attaching script lines and audio requirements to each scene, and coordinating downstream skills (video, music, SFX, voice) to produce all assets needed for final assembly.

Provided by TippyEntertainment

https://github.com/tippyentertainment/skills.git

This skill is designed for the Tasking.tech agent platform (https://tasking.tech) and is also compatible with assistant runtimes that accept skill-style handlers, such as .claude, .openai, and .mistral. It can be used from both Claude Code and Tasking.tech agents.

Summary

This skill focuses on storyboarding, structuring, and orchestration, not on low‑level rendering.


When to Use

Use this skill when the user wants to:

  • Turn an idea or script into a scene‑by‑scene storyboard for an AI‑generated video.
  • Build longer videos (e.g., ~10 minutes) from multiple short clips generated by ComfyUI or similar tools.
  • Attach dialogue/voice‑over, music, and sound effects to each scene.
  • Produce a production plan that other skills can execute to render video and audio assets.

Typical use cases:

  • Anime‑style shorts or trailers.
  • Product or platform promos (e.g., for tasking.tech).
  • Narrative explainer videos or cinematic demos.
  • “AI movies” constructed from many short clips.

Inputs to Collect

The assistant should ask for:

High‑Level Video Brief

  • Goal / purpose
    • e.g. brand promo, narrative short, tutorial, trailer.
  • Target total duration
    • e.g. 10 minutes (default), or a specific range.
  • Tone & style
    • Anime, cinematic, documentary, playful, serious, etc.
  • Visual style references
    • Keywords or reference works (without copying them), art styles, color palettes.
  • Audience / rating
    • General, teen, mature.

Story / Content

  • Source material
    • Existing script, outline, or just a high‑level premise.
  • Characters & setting
    • Main characters, roles, important locations.
  • Key beats
    • Moments that must appear (introductions, reveals, climax, call‑to‑action).

Audio Requirements

  • Voice‑over / dialogue
    • Narration only, character dialogue, or both.
  • Music
    • Style (genre, tempo, mood), whether continuous or scene‑based.
  • Sound effects
    • Level of detail (just key effects vs rich sound design).

If any of these are missing, the skill should ask 2–4 clarifying questions before generating the storyboard.
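The clarifying-question step could be sketched as below. This is only an illustration: the brief is assumed to arrive as a plain dictionary, and the field names and question wording are hypothetical, since the skill does not define a schema for the brief. The cap of four questions follows the text above.

```python
# Hypothetical field names and question wording; the skill defines no brief schema.
REQUIRED_FIELDS = {
    "goal": "What is the goal or purpose of the video?",
    "toneStyle": "What tone and visual style should it have?",
    "sourceMaterial": "Do you have a script, an outline, or only a premise?",
    "characters": "Who are the main characters, and where does it take place?",
    "voice": "Should the video use narration, character dialogue, or both?",
    "music": "What music style (genre, tempo, mood) fits the video?",
}


def clarifying_questions(brief: dict, max_questions: int = 4) -> list[str]:
    """Return one question per missing field, capped at max_questions."""
    missing = [
        question
        for field, question in REQUIRED_FIELDS.items()
        if not brief.get(field)
    ]
    return missing[:max_questions]


if __name__ == "__main__":
    for question in clarifying_questions({"goal": "brand promo"}):
        print("-", question)
```

The total duration is left out of the required fields because the brief above already gives it a default of 10 minutes.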


Expected Behavior

1. Break the Story into Scenes

  • Determine the number of scenes from the target duration, assuming 5–10 second clips:
    • For a 10‑minute video (600 seconds), expect roughly 60–100 scenes (all 10‑second clips would give 60; all 5‑second clips would give 120).
    • Optionally group scenes into chapters/segments (e.g., intro, body, outro).
  • For each scene:
    • Assign a scene number and approximate duration (5–10 seconds).
    • Write a short description of what happens visually.
    • Note camera/motion style (static, pan, zoom, orbit, etc.).
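The scene-count arithmetic above can be written out as a small helper. This is a minimal sketch; the function name and parameters are illustrative and not part of the skill.

```python
import math


def scene_count_range(
    total_seconds: float, min_clip: float = 5.0, max_clip: float = 10.0
) -> tuple[int, int]:
    """Return (fewest, most) scenes needed to fill total_seconds.

    The fewest scenes result from using only the longest clips,
    the most scenes from using only the shortest clips.
    """
    fewest = math.ceil(total_seconds / max_clip)
    most = math.floor(total_seconds / min_clip)
    return fewest, most


if __name__ == "__main__":
    print(scene_count_range(600))  # strict bounds for a 10-minute video
```

For 600 seconds the strict bounds are 60 and 120 scenes; a mix of shot lengths averaging 6 seconds or more lands in the 60–100 range quoted above.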

2. Attach Script to Each Scene

  • Take the user’s script or generate one consistent with the brief.
  • Split the script across scenes:
    • For voice‑over, align lines to scenes based on pacing.
    • For dialogue, map lines to characters and scenes where they speak.
  • For each scene, attach:
    • voiceoverText (if any).
    • dialogueLines (character → line).
    • Notes about timing (e.g., line starts mid‑scene, at second 3).
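One way to align voice-over lines to scenes by pacing is to give each scene a word budget and fill it sentence by sentence. The sketch below assumes a speaking rate of 2.5 words per second; that rate, the sentence-splitting rule, and the function name are assumptions for illustration, not something the skill prescribes.

```python
import re


def split_voiceover(
    script: str, scene_durations: list[float], words_per_second: float = 2.5
) -> list[str]:
    """Greedily assign whole sentences to scenes within a per-scene word budget.

    Each scene receives at least one sentence while any remain, and the
    last scene absorbs whatever is left over.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    result = []
    i = 0
    for idx, duration in enumerate(scene_durations):
        budget = duration * words_per_second  # assumed pacing, in words
        is_last = idx == len(scene_durations) - 1
        chunk = []
        used = 0
        while i < len(sentences):
            n_words = len(sentences[i].split())
            if chunk and used + n_words > budget and not is_last:
                break
            chunk.append(sentences[i])
            used += n_words
            i += 1
        result.append(" ".join(chunk))
    return result


if __name__ == "__main__":
    script = "The city sleeps. A signal wakes the tower. Nobody answers."
    print(split_voiceover(script, [5, 5, 6]))
```

A real split would also honor the timing notes (for example, a line starting at second 3), which this sketch ignores.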

3. Attach Audio Requirements

For each scene, define:

  • Music
    • Whether the scene uses:
      • Global background track, or
      • A specific musical cue (e.g., “builds tension”, “drops out here”).
  • Ambience
    • Environment sound: city, forest, office, spaceship, etc.
  • Sound FX
    • List important effects: footsteps, doors, UI beeps, magic attacks, explosions, etc.
  • Voice
    • Which voice(s) are needed:
      • Narrator.
      • Character voices (which characters, approximate lines).

This creates a per‑scene audio spec that downstream skills can implement.
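A per-scene audio spec could be represented as a small record like the one below. The class and attribute names are hypothetical; they simply mirror the four categories listed above (music, ambience, sound FX, voice).

```python
from dataclasses import dataclass, field


@dataclass
class SceneAudioSpec:
    """Hypothetical container for one scene's audio requirements."""

    music: str = "global"  # "global" background track, or a specific cue
    ambience: str = ""  # environment sound, e.g. "city", "forest"
    sfx: list[str] = field(default_factory=list)  # important effects
    voices: list[str] = field(default_factory=list)  # narrator and characters


def voices_needed(specs: list[SceneAudioSpec]) -> list[str]:
    """Collect the distinct voices across scenes, in order of first appearance."""
    seen: list[str] = []
    for spec in specs:
        for voice in spec.voices:
            if voice not in seen:
                seen.append(voice)
    return seen


if __name__ == "__main__":
    specs = [
        SceneAudioSpec(ambience="city", voices=["Narrator"]),
        SceneAudioSpec(music="builds tension", sfx=["footsteps"], voices=["Mika"]),
    ]
    print(voices_needed(specs))
```

Listing the distinct voices up front gives a voice-generation skill the full cast before any lines are rendered.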

4. Produce a Structured Storyboard

The main output is a structured storyboard document (JSON‑like, or a table) in which each scene entry has fields such as:

  • sceneNumber
  • startTime / endTime (cumulative)
  • durationSeconds
  • visualDescription
  • cameraStyle
  • videoPrompt (for video generator)
  • voiceoverText
  • dialogue (array of { character, line })
  • musicSpec (mood, intensity, references)
  • ambienceSpec
  • sfxSpec (list of effects)
  • notes (continuity, transitions, overlays, titles)

The skill should also generate a human‑readable version (e.g., Markdown table) for review.
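As an illustration of the structure described above, the sketch below fills in the cumulative startTime and endTime from each scene's durationSeconds and renders a condensed Markdown table. The field names come from the list above; the helper names and the plain-dictionary representation are assumptions.

```python
def add_timings(scenes: list[dict]) -> list[dict]:
    """Number the scenes and derive cumulative startTime/endTime in seconds."""
    clock = 0.0
    timed = []
    for number, scene in enumerate(scenes, start=1):
        end = clock + scene["durationSeconds"]
        timed.append(
            {**scene, "sceneNumber": number, "startTime": clock, "endTime": end}
        )
        clock = end
    return timed


def to_markdown(scenes: list[dict]) -> str:
    """Render a condensed review table with one row per scene."""
    lines = [
        "| Scene # | Time range | Duration | Visual description |",
        "| --- | --- | --- | --- |",
    ]
    for s in scenes:
        lines.append(
            f"| {s['sceneNumber']} | {s['startTime']:.0f}-{s['endTime']:.0f}s "
            f"| {s['durationSeconds']}s | {s['visualDescription']} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    draft = [
        {"durationSeconds": 6, "visualDescription": "Skyline at dawn, slow pan"},
        {"durationSeconds": 8, "visualDescription": "Hero enters the office"},
    ]
    print(to_markdown(add_timings(draft)))
```

This numbers scenes from scratch, which suits a fresh storyboard; an existing storyboard whose assets are already generated would need its identifiers preserved instead.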

5. Orchestrate Downstream Skills (Conceptually)

This skill does not run external tools itself, but it should explicitly prepare tasks for other skills:

  • For video generation (e.g., comfyui-video-generator):
    • Provide, per scene:
      • videoPrompt, durationSeconds, fps, resolution, reference images.
  • For voice generation (comfyui-voice-generator):
    • Provide, per scene or sequence:
      • voiceoverText, speaker style, language, pace, target duration.
  • For music/ambience (comfyui-audio-creator):
    • Provide:
      • Segment durations and mood/genre per chapter or scene group.
  • For sound effects (comfyui-soundfx-creator):
    • Provide:
      • SFX lists with timestamps within each scene.

The skill should output these as clearly labeled sections or structured objects so an orchestrator can call the other skills in the correct order.
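The per-skill task lists might be assembled along these lines. The skill names are the ones listed above, but the payload shapes, the default fps and resolution, and the grouping of music by consecutive identical musicSpec values are all assumptions made for this sketch; the downstream skills may expect different inputs.

```python
from itertools import groupby

VIDEO = "comfyui-video-generator"
VOICE = "comfyui-voice-generator"
MUSIC = "comfyui-audio-creator"
SFX = "comfyui-soundfx-creator"


def build_tasks(scenes: list[dict], fps: int = 24, resolution: str = "1280x720") -> dict:
    """Group storyboard entries into task lists keyed by downstream skill name."""
    tasks = {VIDEO: [], VOICE: [], MUSIC: [], SFX: []}
    for s in scenes:
        n = s["sceneNumber"]
        tasks[VIDEO].append({
            "sceneNumber": n,
            "videoPrompt": s["videoPrompt"],
            "durationSeconds": s["durationSeconds"],
            "fps": fps,  # assumed default
            "resolution": resolution,  # assumed default
        })
        if s.get("voiceoverText"):
            tasks[VOICE].append({
                "sceneNumber": n,
                "voiceoverText": s["voiceoverText"],
                "targetDurationSeconds": s["durationSeconds"],
            })
        if s.get("sfxSpec"):
            tasks[SFX].append({"sceneNumber": n, "effects": s["sfxSpec"]})
    # One music segment per run of consecutive scenes sharing a musicSpec.
    for mood, run in groupby(scenes, key=lambda s: s.get("musicSpec", "")):
        run = list(run)
        tasks[MUSIC].append({
            "mood": mood,
            "durationSeconds": sum(r["durationSeconds"] for r in run),
            "sceneNumbers": [r["sceneNumber"] for r in run],
        })
    return tasks
```

Speaker style, language, pace, reference images, and SFX timestamps are omitted here for brevity; they would be added to the corresponding payloads in the same way.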


Output Format (to the Caller)

By default, respond with:

  1. Overview

    • 3–6 sentences summarizing the planned video (story, tone, length, structure).
  2. Global Plan

    • Bullet list:
      • Target duration.
      • Number of scenes.
      • Chapter/segment breakdown (if used).
  3. Storyboard Table (Condensed View)

    • A Markdown table with one row per scene, including:
      • Scene #
      • Time range
      • Duration
      • Visual description
      • Key audio (voice/music/SFX keywords)
    • Keep descriptions concise but clear.
  4. Detailed Scene Specs

    • For each scene, a short block including:
      • Visual description & camera style.
      • videoPrompt.
      • voiceoverText and/or dialogue lines.
      • musicSpec, ambienceSpec, sfxSpec.
    • This section is the “API contract” for video/audio skills.
  5. Orchestration Plan

    • Numbered steps like:
      1. Use comfyui-video-generator to render all scenes with given prompts/durations.
      2. Use comfyui-voice-generator to generate narration/dialogue per scene or per segment.
      3. Use comfyui-audio-creator for background music/ambience per segment.
      4. Use comfyui-soundfx-creator for per‑event SFX per scene.
      5. Assemble all assets in an editor (or export pipeline) into a single video of the target length (e.g., ~10 minutes).
  6. Next Actions

    • Clear checklist for the user, e.g.:
      • Approve/adjust storyboard.
      • Select models/styles for video and audio.
      • Kick off asset generation for each scene.

Orchestration Notes

  • This skill is upstream of:
    • comfyui-video-generator
    • comfyui-voice-generator
    • comfyui-audio-creator
    • comfyui-soundfx-creator
  • It should:
    • Maintain continuity of characters, locations, and visual style across scenes.
    • Ensure cumulative duration matches the target (e.g., 10 minutes ± ~5%).
    • Aim for scenes in the 5–10 second range; use shorter shots for action, longer shots for exposition.
  • For iterative workflows:
    • Accept an existing storyboard and:
      • Add or remove scenes.
      • Adjust durations.
      • Rewrite prompts/lines for clarity or style.
    • Keep scene IDs stable where possible so previously generated assets can be reused.
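The duration checks and the stable-ID rule above can be sketched as follows. The sceneId field is hypothetical (the storyboard fields listed earlier only name sceneNumber), and the 5% tolerance and 5–10 second bounds are taken from the notes above.

```python
def validate_storyboard(
    scenes: list[dict],
    target_seconds: float = 600.0,
    tolerance: float = 0.05,
    min_clip: float = 5.0,
    max_clip: float = 10.0,
) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    total = sum(s["durationSeconds"] for s in scenes)
    if abs(total - target_seconds) > tolerance * target_seconds:
        problems.append(
            f"total duration {total}s is not within {tolerance:.0%} of {target_seconds}s"
        )
    for s in scenes:
        if not min_clip <= s["durationSeconds"] <= max_clip:
            problems.append(
                f"scene {s['sceneId']} lasts {s['durationSeconds']}s, "
                f"outside {min_clip}-{max_clip}s"
            )
    return problems


def insert_scene(scenes: list[dict], index: int, new_scene: dict) -> list[dict]:
    """Insert a scene under a fresh ID so existing IDs, and their assets, stay valid."""
    next_id = max((s["sceneId"] for s in scenes), default=0) + 1
    return scenes[:index] + [{**new_scene, "sceneId": next_id}] + scenes[index:]
```

Assigning a new ID rather than renumbering means playback order and identity are separate: order comes from list position, while previously generated clips remain addressable by their original IDs.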

This skill does not perform final rendering or editing; it creates a production‑ready blueprint and detailed per‑scene specs that other skills and tools can execute to build the complete video.
