> media-generation
Generate images, edit existing images, create short videos, run inpainting/outpainting and object-focused edits, use reference images as provider inputs, batch related media jobs from a manifest, and fetch returned media from URLs/HTML/JSON/data URLs/base64. Use when working on AI image generation, AI image editing, mask-based inpainting, outpainting, reference-image workflows, short AI video generation, product-shot variations, or reusable media-production pipelines.
curl "https://skillshub.wtf/LeoYeAI/openclaw-master-skills/media-generation?format=md"

Media Generation
Handle image generation, image editing, and short video generation through one workflow: choose the right modality, pass caller intent through to the provider, save outputs under tmp/images/ or tmp/videos/, and prefer the bundled helpers over ad-hoc one-off API calls.
Workflow decision
- If the user wants a brand-new still image, use an image-generation model.
- If the user supplies an image or wants a specific existing image changed, use an image-edit workflow.
- If the user wants motion / a clip / a short video, use a video-generation model.
- If the request includes one or more reference images, use the helper that supports reference-image transport.
Standard workflow
- Determine whether the task is image generation, image editing, or video generation.
- Clarify only when required to execute the request correctly.
- Prefer `scripts/generate_image.py` for still-image generation.
- Prefer `scripts/edit_image.py` for direct image edits.
- Prefer `scripts/mask_inpaint.py` for localized edits with masks or generated regions.
- Prefer `scripts/outpaint_image.py` for canvas expansion / outpainting.
- Prefer `scripts/generate_consistent_media.py` when reference images need to be passed through.
- Prefer `scripts/generate_video.py` for video generation, especially when the provider may return async job payloads.
- Prefer `scripts/generate_batch_media.py` for repeatable batch jobs, templated variations, or auditable manifests.
- Prefer `scripts/object_select_edit.py` for simple object-vs-background edits on transparent assets or clean backdrops.
- If the provider returns a URL, path, HTML snippet, markdown snippet, `data:` URL, or `b64_json`, use `scripts/fetch_generated_media.py`.
- Save outputs under:
  - images → `tmp/images/`
  - videos → `tmp/videos/`
- If the user wants files sent in chat, prefer sending the local downloaded file.
- Keep the original remote reference as fallback when local retrieval fails.
Prompt handling
Default to prompt pass-through.
- Pass the caller's prompt through unchanged.
- Use optional request fields only when the caller provides them.
- Keep prompt semantics under caller control.
Use the scripts mainly as functional helpers:
- normalize arguments
- map fields to provider-specific JSON
- upload files
- poll async jobs
- download returned media
- save outputs under `tmp/images/` or `tmp/videos/`
Delivery rules
- Save generated or edited images in `tmp/images/`.
- Save generated videos in `tmp/videos/`.
- Never scatter generated files in the workspace root.
- If message delivery blocks remote URLs, download locally first and then send the local file.
- If a remote file cannot be fetched locally but the raw link may still help, provide the original link clearly.
Image generation helper
Use `scripts/generate_image.py` for direct still-image generation.
Example:

```
python3 skills/media-generation/scripts/generate_image.py \
  --prompt 'person' \
  --size '1024x1024' \
  --out-dir 'tmp/images' \
  --prefix 'generated'
```
The helper:
- reads provider credentials from OpenClaw config (`~/.openclaw/openclaw.json` by default, or `--config` / `$OPENCLAW_CONFIG`)
- calls `/images/generations` by default
- supports `size`, `quality`, `style`, `background`, `n`, `seed`, `extra-json`, and `extra-json-file`
- downloads the returned image into `tmp/images/` by default
- handles providers that reply with a URL/path, `data:` URL, or `b64_json`
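The "map fields to provider JSON" step the helper performs can be sketched as below. Which keys the provider actually accepts, and that `--extra-json` fields win on conflicts, are assumptions rather than documented behavior:

```python
# Sketch of normalizing caller arguments into a provider request payload.
def build_image_payload(prompt, model=None, size=None, quality=None,
                        n=None, seed=None, extra=None):
    payload = {"prompt": prompt}
    optional = {"model": model, "size": size, "quality": quality,
                "n": n, "seed": seed}
    # drop unset fields so the provider sees only what the caller asked for
    payload.update({k: v for k, v in optional.items() if v is not None})
    if extra:
        # fields from --extra-json / --extra-json-file (assumed to override)
        payload.update(extra)
    return payload
```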
Image edit helper
Use `scripts/edit_image.py` for direct image-edit calls.
Example:

```
python3 skills/media-generation/scripts/edit_image.py \
  --image 'tmp/images/source.jpg' \
  --prompt 'replace the background' \
  --out-dir 'tmp/images' \
  --prefix 'edited'
```
The helper:
- reads provider credentials from OpenClaw config
- calls `/images/edits` by default
- supports an optional `--mask` input for localized edits
- downloads the returned image into `tmp/images/` by default
- handles URL/path, `data:` URL, or `b64_json`
Mask inpaint helper
Use `scripts/mask_inpaint.py` for localized repainting tasks.
Example:

```
python3 skills/media-generation/scripts/mask_inpaint.py \
  --image 'tmp/images/source.jpg' \
  --x 120 --y 80 --width 220 --height 180 \
  --prompt 'replace the masked area' \
  --out-dir 'tmp/images' \
  --prefix 'mask-result'
```
The helper:
- accepts either an existing `--mask` image or generated regions
- supports rectangle / ellipse regions and repeatable `--region` specs
- supports percentage-based regions like `rect-pct` / `ellipse-pct`
- supports `--expand` / `--shrink` before feathering
- supports `--mask-only` for local preparation / testing without a live API call
- forwards `--config`, `--provider`, `--model`, and `--endpoint` to `scripts/edit_image.py`
- reuses `scripts/edit_image.py` for the final edit call
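A percentage-based region might resolve to pixel bounds roughly as follows; the exact `rect-pct` spec syntax shown here is an assumption for illustration, not the helper's documented format:

```python
# Sketch: convert a percentage rectangle spec into pixel coordinates.
# Assumed spec shape: "rect-pct:x,y,w,h" with values in percent of image size.
def rect_pct_to_px(spec: str, img_w: int, img_h: int):
    kind, nums = spec.split(":")
    assert kind == "rect-pct"
    x, y, w, h = (float(v) for v in nums.split(","))
    return (round(img_w * x / 100), round(img_h * y / 100),
            round(img_w * w / 100), round(img_h * h / 100))
```

This keeps region specs portable across image sizes, which matters when the same edit is replayed against differently sized sources.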
Outpaint helper
Use `scripts/outpaint_image.py` for extension / canvas expansion tasks.
Example:

```
python3 skills/media-generation/scripts/outpaint_image.py \
  --image 'tmp/images/source.jpg' \
  --left 512 --right 512 --top 128 --bottom 128 \
  --mode blur \
  --prompt 'extend outward' \
  --out-dir 'tmp/images' \
  --prefix 'outpaint-result'
```
The helper:
- expands the canvas locally before calling the model
- supports directional expansion on each side
- supports `transparent`, `blur`, and `solid` initialization modes
- forwards `--config`, `--provider`, `--model`, and `--endpoint` to `scripts/edit_image.py`
- reuses `scripts/edit_image.py` for the final edit call
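The directional expansion reduces to simple canvas arithmetic, sketched below; the real helper additionally fills the new pixels per `--mode` before calling the model:

```python
# Sketch of the outpaint canvas math: new canvas size plus the offset at
# which the original image is pasted so its content is preserved.
def expanded_canvas(w, h, left=0, right=0, top=0, bottom=0):
    new_w, new_h = w + left + right, h + top + bottom
    paste_at = (left, top)   # original pixels land here on the new canvas
    return new_w, new_h, paste_at
```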
Reference-image helper
Use `scripts/generate_consistent_media.py` when one or more reference images need to be passed through to the provider.
Note: the script name is historical; its current role is reference-image transport and delegation.
Example:

```
python3 skills/media-generation/scripts/generate_consistent_media.py \
  --mode image \
  --reference-image 'tmp/images/reference.png' \
  --prompt 'character' \
  --size '1024x1024' \
  --out-dir 'tmp/images' \
  --prefix 'reference-output'
```
The helper:
- can pass encoded reference images in provider JSON (default key: `reference_images`)
- can retry without provider-JSON references when transport is `auto`
- delegates to `scripts/generate_image.py` or `scripts/generate_video.py`
Batch generation helper
Use `scripts/generate_batch_media.py` when the user wants several related outputs, repeatable batch rendering, or a manifest-driven workflow.
Example:

```
python3 skills/media-generation/scripts/generate_batch_media.py \
  --manifest 'tmp/images/media-batch.jsonl' \
  --vars-json '{"subject":"item"}' \
  --summary-out 'tmp/images/media-batch-summary.json' \
  --continue-on-error \
  --print-json
```
The helper supports:
- JSON array or JSONL manifests
- image generation, video generation, and reference-image generation
- shared templating vars via `--vars-json` or `--vars-file`
- item-local `vars` objects for per-item string rendering such as `{index}`
- `--summary-out` to persist the resolved batch result JSON
- `--dry-run` to validate a manifest before spending live generation calls
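The templating behavior can be sketched with `str.format`; the merge precedence (item-local `vars` over shared vars, `index` injected last) is an assumption about the helper, not documented behavior:

```python
# Sketch of per-item manifest rendering: merge shared and item-local vars,
# then format every string field of the item.
def render_item(item: dict, shared: dict, index: int) -> dict:
    vars_ = {**shared, **item.get("vars", {}), "index": index}
    return {k: v.format(**vars_) if isinstance(v, str) else v
            for k, v in item.items() if k != "vars"}
```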
Object-select edit helper
Use `scripts/object_select_edit.py` when the source has a transparent background or a simple clean backdrop and the user wants a one-step object or background edit workflow.
Example:

```
python3 skills/media-generation/scripts/object_select_edit.py \
  --image 'tmp/images/product.png' \
  --selection-mode alpha \
  --edit-target background \
  --prompt 'replace the background' \
  --out-dir 'tmp/images' \
  --prefix 'product-bg-edit'
```
The helper:
- prepares an object/background mask with `prepare_object_mask.py`
- flips the mask automatically when editing the background instead of the object
- passes the prepared mask into `mask_inpaint.py`
- supports `--prepare-only` for local inspection/testing without a live edit call
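The mask flip is a plain inversion. This sketch assumes the common inpaint convention where white (255) marks the area to repaint, and uses nested lists as a stand-in for real image data:

```python
# Sketch: invert the object mask when the edit target is the background.
def flip_mask(rows, edit_target: str):
    if edit_target == "object":
        return rows                     # repaint the object itself
    return [[255 - px for px in row]    # repaint everything but the object
            for row in rows]
```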
Video generation helper
Use `scripts/generate_video.py` for direct video-generation calls.
Example:

```
python3 skills/media-generation/scripts/generate_video.py \
  --prompt 'motion clip' \
  --size '720x1280' \
  --seconds 6 \
  --out-dir 'tmp/videos' \
  --prefix 'generated-video'
```
The helper:
- reads provider credentials from OpenClaw config
- calls `/videos` by default
- supports `size`, `seconds`/`duration`, `fps`, `seed`, an optional input image, `extra-json`, and `extra-json-file`
- can resolve both immediate-result and async job responses by polling when the provider returns job metadata instead of the final media directly
- downloads the returned video into `tmp/videos/` by default
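Async-job resolution reduces to a polling loop like this sketch; the `status` and `url` field names describe one assumed provider shape, not a documented schema:

```python
# Sketch: poll a video job until it reaches a terminal state.
import time

def resolve_job(fetch, job_id, interval=1.0, max_polls=100):
    for _ in range(max_polls):
        job = fetch(job_id)                 # e.g. GET /videos/{id} (assumed)
        if job.get("status") == "completed":
            return job["url"]
        if job.get("status") == "failed":
            raise RuntimeError(job.get("error", "video job failed"))
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish")
```

Injecting `fetch` keeps the loop testable without a live provider.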
Retrieval helper
Use `scripts/fetch_generated_media.py` for both images and videos.
It can extract downloadable refs from markdown / HTML / JSON, and can also persist `data:` URLs or `b64_json` payloads directly to local files.
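Persisting the base64 forms can be sketched as below; the real helper also handles markdown/HTML/JSON extraction, which this sketch skips:

```python
# Sketch: decode a data: URL or a bare b64_json payload into raw bytes.
import base64

def decode_inline_media(ref: str) -> bytes:
    if ref.startswith("data:"):
        header, _, payload = ref.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("only base64 data: URLs handled in this sketch")
        return base64.b64decode(payload)
    return base64.b64decode(ref)   # bare b64_json payload
```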
Quick compatibility checklist
Before blaming the skill, check these first:
- the config exists and is valid JSON
- `config.models.providers.<provider>` exists
- the selected provider has both `baseUrl` and `apiKey`
- the chosen endpoint actually exists on that provider
- the chosen model name is valid for that endpoint
- any provider-specific fields passed through `--extra-json` or `--extra-json-file` match that provider's schema
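The first three checks can be expressed directly against the config layout named above (`models.providers.<name>` with `baseUrl` and `apiKey`):

```python
# Sketch: report missing provider config before making any live call.
def check_provider(config: dict, provider: str) -> list[str]:
    problems = []
    entry = config.get("models", {}).get("providers", {}).get(provider)
    if entry is None:
        problems.append(f"config.models.providers.{provider} missing")
        return problems
    for key in ("baseUrl", "apiKey"):
        if not entry.get(key):
            problems.append(f"{provider}.{key} missing or empty")
    return problems
```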
Defaults used by the bundled scripts:
- config path: `~/.openclaw/openclaw.json` or `$OPENCLAW_CONFIG`
- default provider: `$OPENCLAW_MEDIA_PROVIDER`, otherwise the first provider found in config
- default model names: placeholders unless overridden by env vars or `--model`
  - image → `$OPENCLAW_MEDIA_IMAGE_MODEL` or `image-model`
  - edit → `$OPENCLAW_MEDIA_EDIT_MODEL` or `image-edit-model`
  - video → `$OPENCLAW_MEDIA_VIDEO_MODEL` or `video-model`
- output root: `tmp/` or `$MEDIA_GENERATION_OUTPUT_ROOT`
- output paths are resolved relative to the current working directory unless you pass an absolute `--out-dir`
Quick troubleshooting
Common failure patterns:
- `provider not found` → pass `--provider` explicitly or set `$OPENCLAW_MEDIA_PROVIDER`
- placeholder model warning (`image-model` / `image-edit-model` / `video-model`) → pass `--model` explicitly or set the matching `$OPENCLAW_MEDIA_*_MODEL` env var
- `config not found` / invalid JSON → pass `--config` explicitly or fix the OpenClaw config file
- HTTP 404 → check `--endpoint` and video polling paths
- HTTP 400 → check the model name and provider-specific payload fields in `--extra-json` / `--extra-json-file`
- HTTP 401/403 → check the provider `apiKey`
- request failed before any HTTP response → check the base URL, proxy/TLS, or network reachability
- video accepted then failed later → check the request payload and provider logs, or switch provider/model
Use `--print-json` when debugging so the response body, resolved endpoint, and failure hints stay visible.
References
- Batch workflow reference: `references/batch-workflows.md`
- Model capability matrix: `references/model-capabilities.md`
- Reference-image workflow: `references/reference-image-workflow.md`
- Image generation helper: `scripts/generate_image.py`
- Reference-image helper: `scripts/generate_consistent_media.py`
- Image edit helper: `scripts/edit_image.py`
- Mask inpaint helper: `scripts/mask_inpaint.py`
- Outpaint helper: `scripts/outpaint_image.py`
- Video generation helper: `scripts/generate_video.py`
- Batch generation helper: `scripts/generate_batch_media.py`
- Object-select edit helper: `scripts/object_select_edit.py`
- Object mask prep helper: `scripts/prepare_object_mask.py`
- Shared request utility: `scripts/media_request_common.py`
- Smoke tests: `scripts/smoke_test.py`
- Unified fetch helper: `scripts/fetch_generated_media.py`