> media-generation
Generate images, edit existing images, create short videos, run inpainting/outpainting and object-focused edits, use reference images as provider inputs, batch related media jobs from a manifest, and fetch returned media from URLs/HTML/JSON/data URLs/base64. Use when working on AI image generation, AI image editing, mask-based inpainting, outpainting, reference-image workflows, short AI video generation, product-shot variations, or reusable media-production pipelines.
curl "https://skillshub.wtf/LeoYeAI/openclaw-master-skills/media-generation?format=md"

Media Generation
Handle image generation, image editing, and short video generation through one workflow: choose the right modality, pass caller intent through to the provider, save outputs under tmp/images/ or tmp/videos/, and prefer the bundled helpers over ad-hoc one-off API calls.
Workflow decision
- If the user wants a brand-new still image, use an image-generation model.
- If the user supplies an image or wants a specific existing image changed, use an image-edit workflow.
- If the user wants motion / a clip / a short video, use a video-generation model.
- If the request includes one or more reference images, use the helper that supports reference-image transport.
Standard workflow
- Determine whether the task is image generation, image editing, or video generation.
- Clarify only when required to execute the request correctly.
- Prefer `scripts/generate_image.py` for still-image generation.
- Prefer `scripts/edit_image.py` for direct image edits.
- Prefer `scripts/mask_inpaint.py` for localized edits with masks or generated regions.
- Prefer `scripts/outpaint_image.py` for canvas expansion / outpainting.
- Prefer `scripts/generate_consistent_media.py` when reference images need to be passed through.
- Prefer `scripts/generate_video.py` for video generation, especially when the provider may return async job payloads.
- Prefer `scripts/generate_batch_media.py` for repeatable batch jobs, templated variations, or auditable manifests.
- Prefer `scripts/object_select_edit.py` for simple object-vs-background edits on transparent assets or clean backdrops.
- If the provider returns a URL, path, HTML snippet, markdown snippet, `data:` URL, or `b64_json`, use `scripts/fetch_generated_media.py`.
- Save outputs under:
  - images → `tmp/images/`
  - videos → `tmp/videos/`
- If the user wants files sent in chat, prefer sending the local downloaded file.
- Keep the original remote reference as fallback when local retrieval fails.
Prompt handling
Default to prompt pass-through.
- Pass the caller's prompt through unchanged.
- Use optional request fields only when the caller provides them.
- Keep prompt semantics under caller control.
Use the scripts mainly as functional helpers:
- normalize arguments
- map fields to provider-specific JSON
- upload files
- poll async jobs
- download returned media
- save outputs under `tmp/images/` or `tmp/videos/`
Delivery rules
- Save generated or edited images in `tmp/images/`.
- Save generated videos in `tmp/videos/`.
- Never scatter generated files in the workspace root.
- If message delivery blocks remote URLs, download locally first and then send the local file.
- If a remote file cannot be fetched locally but the raw link may still help, provide the original link clearly.
Image generation helper
Use `scripts/generate_image.py` for direct still-image generation.
Example:

```
python3 skills/media-generation/scripts/generate_image.py \
  --prompt 'person' \
  --size '1024x1024' \
  --out-dir 'tmp/images' \
  --prefix 'generated'
```
The helper:
- reads provider credentials from OpenClaw config (`~/.openclaw/openclaw.json` by default, or `--config` / `$OPENCLAW_CONFIG`)
- calls `/images/generations` by default
- supports `size`, `quality`, `style`, `background`, `n`, `seed`, `extra-json`, and `extra-json-file`
- downloads the returned image into `tmp/images/` by default
- handles providers that reply with a URL/path, `data:` URL, or `b64_json`
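The "map fields to provider JSON" step the helper performs can be sketched as below. Which keys the provider actually accepts, and that `--extra-json` fields win on conflicts, are assumptions rather than documented behavior:

```python
# Sketch of normalizing caller arguments into a provider request payload.
def build_image_payload(prompt, model=None, size=None, quality=None,
                        n=None, seed=None, extra=None):
    payload = {"prompt": prompt}
    optional = {"model": model, "size": size, "quality": quality,
                "n": n, "seed": seed}
    # drop unset fields so the provider sees only what the caller asked for
    payload.update({k: v for k, v in optional.items() if v is not None})
    if extra:
        # fields from --extra-json / --extra-json-file (assumed to override)
        payload.update(extra)
    return payload
```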
Image edit helper
Use `scripts/edit_image.py` for direct image-edit calls.
Example:

```
python3 skills/media-generation/scripts/edit_image.py \
  --image 'tmp/images/source.jpg' \
  --prompt 'replace the background' \
  --out-dir 'tmp/images' \
  --prefix 'edited'
```
The helper:
- reads provider credentials from OpenClaw config
- calls `/images/edits` by default
- supports an optional `--mask` input for localized edits
- downloads the returned image into `tmp/images/` by default
- handles URL/path, `data:` URL, or `b64_json`
Mask inpaint helper
Use `scripts/mask_inpaint.py` for localized repainting tasks.
Example:

```
python3 skills/media-generation/scripts/mask_inpaint.py \
  --image 'tmp/images/source.jpg' \
  --x 120 --y 80 --width 220 --height 180 \
  --prompt 'replace the masked area' \
  --out-dir 'tmp/images' \
  --prefix 'mask-result'
```
The helper:
- accepts either an existing `--mask` image or generated regions
- supports rectangle / ellipse regions and repeatable `--region` specs
- supports percentage-based regions like `rect-pct` / `ellipse-pct`
- supports `--expand` / `--shrink` before feathering
- supports `--mask-only` for local preparation / testing without a live API call
- forwards `--config`, `--provider`, `--model`, and `--endpoint` to `scripts/edit_image.py`
- reuses `scripts/edit_image.py` for the final edit call
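A percentage-based region might resolve to pixel bounds roughly as follows; the exact `rect-pct` spec syntax shown here is an assumption for illustration, not the helper's documented format:

```python
# Sketch: convert a percentage rectangle spec into pixel coordinates.
# Assumed spec shape: "rect-pct:x,y,w,h" with values in percent of image size.
def rect_pct_to_px(spec: str, img_w: int, img_h: int):
    kind, nums = spec.split(":")
    assert kind == "rect-pct"
    x, y, w, h = (float(v) for v in nums.split(","))
    return (round(img_w * x / 100), round(img_h * y / 100),
            round(img_w * w / 100), round(img_h * h / 100))
```

This keeps region specs portable across image sizes, which matters when the same edit is replayed against differently sized sources.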
Outpaint helper
Use `scripts/outpaint_image.py` for extension / canvas expansion tasks.
Example:

```
python3 skills/media-generation/scripts/outpaint_image.py \
  --image 'tmp/images/source.jpg' \
  --left 512 --right 512 --top 128 --bottom 128 \
  --mode blur \
  --prompt 'extend outward' \
  --out-dir 'tmp/images' \
  --prefix 'outpaint-result'
```
The helper:
- expands the canvas locally before calling the model
- supports directional expansion on each side
- supports `transparent`, `blur`, and `solid` initialization modes
- forwards `--config`, `--provider`, `--model`, and `--endpoint` to `scripts/edit_image.py`
- reuses `scripts/edit_image.py` for the final edit call
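The directional expansion reduces to simple canvas arithmetic, sketched below; the real helper additionally fills the new pixels per `--mode` before calling the model:

```python
# Sketch of the outpaint canvas math: new canvas size plus the offset at
# which the original image is pasted so its content is preserved.
def expanded_canvas(w, h, left=0, right=0, top=0, bottom=0):
    new_w, new_h = w + left + right, h + top + bottom
    paste_at = (left, top)   # original pixels land here on the new canvas
    return new_w, new_h, paste_at
```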
Reference-image helper
Use `scripts/generate_consistent_media.py` when one or more reference images need to be passed through to the provider.
Note: the script name is historical; its current role is reference-image transport and delegation.
Example:

```
python3 skills/media-generation/scripts/generate_consistent_media.py \
  --mode image \
  --reference-image 'tmp/images/reference.png' \
  --prompt 'character' \
  --size '1024x1024' \
  --out-dir 'tmp/images' \
  --prefix 'reference-output'
```
The helper:
- can pass encoded reference images in provider JSON (default key: `reference_images`)
- can retry without provider-JSON references when transport is `auto`
- delegates to `scripts/generate_image.py` or `scripts/generate_video.py`
Batch generation helper
Use `scripts/generate_batch_media.py` when the user wants several related outputs, repeatable batch rendering, or a manifest-driven workflow.
Example:

```
python3 skills/media-generation/scripts/generate_batch_media.py \
  --manifest 'tmp/images/media-batch.jsonl' \
  --vars-json '{"subject":"item"}' \
  --summary-out 'tmp/images/media-batch-summary.json' \
  --continue-on-error \
  --print-json
```
The helper supports:
- JSON array or JSONL manifests
- image generation, video generation, and reference-image generation
- shared templating vars via `--vars-json` or `--vars-file`
- item-local `vars` objects for per-item string rendering such as `{index}`
- `--summary-out` to persist the resolved batch result JSON
- `--dry-run` to validate a manifest before spending live generation calls
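The templating behavior can be sketched with `str.format`; the merge precedence (item-local `vars` over shared vars, `index` injected last) is an assumption about the helper, not documented behavior:

```python
# Sketch of per-item manifest rendering: merge shared and item-local vars,
# then format every string field of the item.
def render_item(item: dict, shared: dict, index: int) -> dict:
    vars_ = {**shared, **item.get("vars", {}), "index": index}
    return {k: v.format(**vars_) if isinstance(v, str) else v
            for k, v in item.items() if k != "vars"}
```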
Object-select edit helper
Use `scripts/object_select_edit.py` when the source has a transparent background or a simple clean backdrop and the user wants a one-step object or background edit workflow.
Example:

```
python3 skills/media-generation/scripts/object_select_edit.py \
  --image 'tmp/images/product.png' \
  --selection-mode alpha \
  --edit-target background \
  --prompt 'replace the background' \
  --out-dir 'tmp/images' \
  --prefix 'product-bg-edit'
```
The helper:
- prepares an object/background mask with `prepare_object_mask.py`
- flips the mask automatically when editing the background instead of the object
- passes the prepared mask into `mask_inpaint.py`
- supports `--prepare-only` for local inspection/testing without a live edit call
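The mask flip is a plain inversion. This sketch assumes the common inpaint convention where white (255) marks the area to repaint, and uses nested lists as a stand-in for real image data:

```python
# Sketch: invert the object mask when the edit target is the background.
def flip_mask(rows, edit_target: str):
    if edit_target == "object":
        return rows                     # repaint the object itself
    return [[255 - px for px in row]    # repaint everything but the object
            for row in rows]
```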
Video generation helper
Use `scripts/generate_video.py` for direct video-generation calls.
Example:

```
python3 skills/media-generation/scripts/generate_video.py \
  --prompt 'motion clip' \
  --size '720x1280' \
  --seconds 6 \
  --out-dir 'tmp/videos' \
  --prefix 'generated-video'
```
The helper:
- reads provider credentials from OpenClaw config
- calls `/videos` by default
- supports `size`, `seconds`/`duration`, `fps`, `seed`, an optional input image, `extra-json`, and `extra-json-file`
- can resolve both immediate-result and async job responses by polling when the provider returns job metadata instead of the final media directly
- downloads the returned video into `tmp/videos/` by default
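Async-job resolution reduces to a polling loop like this sketch; the `status` and `url` field names describe one assumed provider shape, not a documented schema:

```python
# Sketch: poll a video job until it reaches a terminal state.
import time

def resolve_job(fetch, job_id, interval=1.0, max_polls=100):
    for _ in range(max_polls):
        job = fetch(job_id)                 # e.g. GET /videos/{id} (assumed)
        if job.get("status") == "completed":
            return job["url"]
        if job.get("status") == "failed":
            raise RuntimeError(job.get("error", "video job failed"))
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish")
```

Injecting `fetch` keeps the loop testable without a live provider.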
Retrieval helper
Use `scripts/fetch_generated_media.py` for both images and videos.
It can extract downloadable refs from markdown / HTML / JSON, and can also persist `data:` URLs or `b64_json` payloads directly to local files.
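Persisting the base64 forms can be sketched as below; the real helper also handles markdown/HTML/JSON extraction, which this sketch skips:

```python
# Sketch: decode a data: URL or a bare b64_json payload into raw bytes.
import base64

def decode_inline_media(ref: str) -> bytes:
    if ref.startswith("data:"):
        header, _, payload = ref.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("only base64 data: URLs handled in this sketch")
        return base64.b64decode(payload)
    return base64.b64decode(ref)   # bare b64_json payload
```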
Quick compatibility checklist
Before blaming the skill, check these first:
- the config exists and is valid JSON
- `config.models.providers.<provider>` exists
- the selected provider has both `baseUrl` and `apiKey`
- the chosen endpoint actually exists on that provider
- the chosen model name is valid for that endpoint
- any provider-specific fields passed through `--extra-json` or `--extra-json-file` match that provider's schema
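The first three checks can be expressed directly against the config layout named above (`models.providers.<name>` with `baseUrl` and `apiKey`):

```python
# Sketch: report missing provider config before making any live call.
def check_provider(config: dict, provider: str) -> list[str]:
    problems = []
    entry = config.get("models", {}).get("providers", {}).get(provider)
    if entry is None:
        problems.append(f"config.models.providers.{provider} missing")
        return problems
    for key in ("baseUrl", "apiKey"):
        if not entry.get(key):
            problems.append(f"{provider}.{key} missing or empty")
    return problems
```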
Defaults used by the bundled scripts:
- config path: `~/.openclaw/openclaw.json` or `$OPENCLAW_CONFIG`
- default provider: `$OPENCLAW_MEDIA_PROVIDER`, otherwise the first provider found in config
- default model names: placeholders unless overridden by env vars or `--model`
  - image → `$OPENCLAW_MEDIA_IMAGE_MODEL` or `image-model`
  - edit → `$OPENCLAW_MEDIA_EDIT_MODEL` or `image-edit-model`
  - video → `$OPENCLAW_MEDIA_VIDEO_MODEL` or `video-model`
- output root: `tmp/` or `$MEDIA_GENERATION_OUTPUT_ROOT`
- output paths are resolved relative to the current working directory unless you pass an absolute `--out-dir`
Quick troubleshooting
Common failure patterns:
- `provider not found` → pass `--provider` explicitly or set `$OPENCLAW_MEDIA_PROVIDER`
- placeholder model warning (`image-model` / `image-edit-model` / `video-model`) → pass `--model` explicitly or set the matching `$OPENCLAW_MEDIA_*_MODEL` env var
- `config not found` / invalid JSON → pass `--config` explicitly or fix the OpenClaw config file
- HTTP 404 → check `--endpoint` and video polling paths
- HTTP 400 → check the model name and provider-specific payload fields in `--extra-json` / `--extra-json-file`
- HTTP 401/403 → check the provider `apiKey`
- request failed before any HTTP response → check the base URL, proxy/TLS, or network reachability
- video accepted then failed later → check the request payload and provider logs, or switch provider/model
Use `--print-json` when debugging so the response body, resolved endpoint, and failure hints stay visible.
References
- Batch workflow reference: `references/batch-workflows.md`
- Model capability matrix: `references/model-capabilities.md`
- Reference-image workflow: `references/reference-image-workflow.md`
- Image generation helper: `scripts/generate_image.py`
- Reference-image helper: `scripts/generate_consistent_media.py`
- Image edit helper: `scripts/edit_image.py`
- Mask inpaint helper: `scripts/mask_inpaint.py`
- Outpaint helper: `scripts/outpaint_image.py`
- Video generation helper: `scripts/generate_video.py`
- Batch generation helper: `scripts/generate_batch_media.py`
- Object-select edit helper: `scripts/object_select_edit.py`
- Object mask prep helper: `scripts/prepare_object_mask.py`
- Shared request utility: `scripts/media_request_common.py`
- Smoke tests: `scripts/smoke_test.py`
- Unified fetch helper: `scripts/fetch_generated_media.py`