| name | description |
|---|---|
| gemini-video-web | Generate videos in Gemini web via browser automation, download results, and collect downloaded video files into a local target folder with manifest output. Use when users ask to create Gemini web videos, need multiple video variants generated through repeated requests, or need structured outputs (paths, metadata, dedupe) for downstream publishing workflows. |
Gemini Video Web
Workflow
- Open Gemini web and check whether the user is logged in.
- If not logged in, stop and explicitly ask the user to log in.
- If logged in, open `工具` (Tools) and click `创作视频`/`制作视频` (Create video / Make video).
- Set output directory and target video count.
- Send one video-generation prompt per request until target count is reached.
- For each request, wait until generation ends (the `停止回答` stop-response button disappears), then download the latest video result.
- Collect downloaded files into target folder with batch naming, dedupe, and manifest.
- Return file paths, manifest path, and failure summary.
1) Prerequisites
- Ensure browser session can access Gemini (`https://gemini.google.com/app`).
- If login, captcha, or MFA is required, pause and ask user to complete it manually.
- Use the shared Playwright session policy across all skills:
  - Auto session policy: `tools/pw` derives one Playwright session per `CODEX_THREAD_ID` (fallback: `PLAYWRIGHT_SESSION_OWNER` or explicit `--session`).
  - Invoke Playwright CLI through `/Users/xd/java/xhs/tools/pw`; use `--session <name>` only when explicit multi-session isolation is needed.
- Decide output directory before generation, for example: `/Users/xd/java/xhs/output/gemini-video`
Quick run:
python3 scripts/run_video_flow.py \
--prompt "Generate an 8-second sci-fi city night shot: rainy night, neon lights, cinematic camera movement." \
--target /Users/xd/java/xhs/output/gemini-video \
--count 1
2) Open Gemini And Enforce Login Gate
- Navigate to Gemini app page.
- Check login state via account/avatar area or login controls.
- If login controls are present (`登录`, `Sign in`, or `ServiceLogin` URL), stop immediately and ask user to log in.
- Continue only when login is confirmed.
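The login gate can be sketched as a pure check over the current page URL and the visible control labels. This is a minimal sketch only: `needs_login` and its two inputs are hypothetical names, and how the URL and labels are read from the browser snapshot is deliberately left out.

```python
from typing import List

# Markers come from the rule above; the function name and inputs are illustrative.
LOGIN_LABELS = ("登录", "Sign in")
LOGIN_URL_MARKER = "ServiceLogin"


def needs_login(page_url: str, visible_labels: List[str]) -> bool:
    """Return True when the page shows any sign of a login wall."""
    # A redirect to the Google sign-in flow is enough on its own.
    if LOGIN_URL_MARKER in page_url:
        return True
    # Compare whole labels (trimmed, case-insensitive) to avoid matching
    # unrelated text that merely contains the words.
    normalized = {label.strip().lower() for label in visible_labels}
    return any(marker.lower() in normalized for marker in LOGIN_LABELS)
```

A caller would stop and ask the user to log in whenever this returns `True`, and continue only otherwise.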
3) Enter Video Creation Tool
- Click `工具`.
- Click video tool item by visible text priority:
  - `创作视频`
  - `制作视频`
  - `Create video`
  - `Video`
- If quick-intent card click is intercepted by overlay, retry via `工具` menu item.
- If video tool is not present after login is confirmed, stop and report capability unavailable for this account/region/model.
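The visible-text priority above amounts to a first-match lookup over the labels currently shown in the tools menu. The sketch below assumes the menu labels have already been extracted into a list; `pick_video_tool` is a hypothetical helper, not part of the bundled scripts.

```python
from typing import Iterable, Optional

# Priority order as listed in this section.
VIDEO_TOOL_LABELS = ("创作视频", "制作视频", "Create video", "Video")


def pick_video_tool(menu_labels: Iterable[str]) -> Optional[str]:
    """Return the highest-priority video tool label present, or None."""
    available = {label.strip() for label in menu_labels}
    for candidate in VIDEO_TOOL_LABELS:
        if candidate in available:
            return candidate
    # None means the capability is unavailable for this account/region/model.
    return None
```

A `None` result corresponds to the stop-and-report case in the last bullet.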
4) Multi-Video Generation Strategy
- Gemini web typically returns one video per request.
- If user asks for `N` videos, run `N` requests in sequence.
- Keep a shared base prompt, then apply per-video variants only when needed.
- Record a `download_start_ts` before each download action.
Prompt construction rules:
- Keep one clear scene target per request.
- Include subject, style, motion, camera movement, lighting, and duration.
- Include aspect ratio/fps/quality only when user requests them.
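One way to combine a shared base prompt with optional per-video variants is sketched below. The function names, the `aspect_ratio` and `fps` parameters, and the exact wording appended to the prompt are all assumptions made for illustration; `scripts/run_video_flow.py` may build prompts differently.

```python
from typing import List, Optional, Sequence


def build_prompt(
    base: str,
    variant: str = "",
    *,
    aspect_ratio: Optional[str] = None,
    fps: Optional[int] = None,
) -> str:
    """Join the base prompt, an optional variant, and user-requested settings."""
    parts = [base.strip()]
    if variant.strip():
        parts.append(variant.strip())
    # Technical settings are added only when the user asked for them.
    if aspect_ratio:
        parts.append(f"Aspect ratio: {aspect_ratio}.")
    if fps:
        parts.append(f"Frame rate: {fps} fps.")
    return " ".join(parts)


def build_prompts(base: str, count: int, variants: Sequence[str] = ()) -> List[str]:
    """Return one prompt per request; requests without a variant reuse the base."""
    return [
        build_prompt(base, variants[i] if i < len(variants) else "")
        for i in range(count)
    ]
```

For example, `build_prompts(base, 3, ["Low angle.", "Aerial view."])` yields three prompts, the third being the unmodified base prompt.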
5) Wait For Completion (Explicit End Condition)
- After submit, wait for generation state to appear.
- Treat generation as complete only when:
  - `停止回答` button disappears, and
  - latest assistant response has downloadable video action.
- If refs are stale or state is unclear, re-snapshot and retry once.
6) Download Video
- Download from the latest assistant response block (not old history blocks).
- If download menu appears, prefer `MP4` / highest-quality item.
- Wait for download completion before next request.
- Repeat until target count is reached or retry budget is exhausted.
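The repeat-until-done loop might look like the sketch below. `generate_and_download` is a hypothetical stand-in for the submit, wait, and download steps (it returns a file path, or `None` on failure), and treating the retry budget as a single counter shared across all requests is an assumption; the source does not say how the budget is scoped.

```python
import time
from typing import Callable, Dict, List, Optional


def run_requests(
    prompts: List[str],
    generate_and_download: Callable[[str], Optional[str]],
    retry_budget: int = 2,
    clock: Callable[[], float] = time.time,
) -> List[Dict[str, object]]:
    """Run one request per prompt in sequence, spending retries from one budget."""
    results: List[Dict[str, object]] = []
    retries_left = retry_budget
    for prompt in prompts:
        while True:
            # Timestamp taken before the download action; later usable as --since.
            download_start_ts = clock()
            path = generate_and_download(prompt)
            ok = path is not None
            if ok or retries_left == 0:
                results.append({
                    "prompt": prompt,
                    "path": path,
                    "download_start_ts": download_start_ts,
                    "ok": ok,
                })
                break
            retries_left -= 1
    return results
```

Each result keeps its own `download_start_ts`, so the earliest value can be passed to the collection step.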
7) Collect Downloaded Files
Use bundled script:
python3 scripts/collect_downloads.py \
--source /var/folders/.../playwright-mcp-output/<session-id> \
--source ~/Downloads \
--target /ABS/PATH/TO/output/gemini-video \
--since <download_start_unix_ts> \
--ext mp4,mov,webm,mkv,m4v,avi \
--limit <max_to_collect> \
--expected-count <required_count> \
--prefix gemini-video \
--batch-id <run_id> \
--prompt "<prompt_used>"
Script behavior:
- Source strategy:
  - Prefer Playwright temp download directory first.
  - Also scan `.playwright-cli` and fallback to `~/Downloads`.
- Recursively scans source folders and filters video extensions.
- Merges matches from all source folders, then sorts by mtime.
- Uses batch naming (`<prefix>-<batch-id>-NN.ext`).
- Dedupes by SHA-256 (current run + existing target files).
- Writes JSON manifest and prints absolute output paths.
- Records video metadata when available (`duration_sec`, `bitrate_kbps`, `width`, `height`, `file_size_bytes`).
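The scan, sort, dedupe, and rename behavior described above could be approximated as follows. This is an illustrative sketch and not the contents of `scripts/collect_downloads.py`: the function names are hypothetical, and manifest writing and metadata probing are omitted.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Iterable, List

VIDEO_EXTS = {".mp4", ".mov", ".webm", ".mkv", ".m4v", ".avi"}


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large videos are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_video(path: Path) -> bool:
    return path.is_file() and path.suffix.lower() in VIDEO_EXTS


def collect(sources: Iterable[Path], target: Path, since: float,
            prefix: str, batch_id: str) -> List[Path]:
    """Copy new, unique videos into target as <prefix>-<batch-id>-NN.ext."""
    target.mkdir(parents=True, exist_ok=True)
    # Seed the dedupe set with files already present in the target folder.
    seen = {sha256_of(p) for p in target.rglob("*") if is_video(p)}
    # Merge matches from every source folder, then order by modification time.
    candidates = [
        p for src in sources if src.is_dir()
        for p in src.rglob("*")
        if is_video(p) and p.stat().st_mtime >= since
    ]
    candidates.sort(key=lambda p: p.stat().st_mtime)
    collected: List[Path] = []
    for path in candidates:
        digest = sha256_of(path)
        if digest in seen:
            continue  # duplicate of an earlier file in this run or in target
        seen.add(digest)
        name = f"{prefix}-{batch_id}-{len(collected) + 1:02d}{path.suffix.lower()}"
        dest = target / name
        shutil.copy2(path, dest)
        collected.append(dest)
    return collected
```

Hashing file contents rather than comparing names means a video downloaded twice under different filenames is still collected only once.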
8) Failure Handling By Step
- Login step:
  - If login/captcha/MFA blocks, stop and ask user to complete manually.
- Tool-selection step:
  - If `创作视频` is missing after login, stop and report unsupported capability.
- Generation step:
  - If failed once, retry once with minimal prompt rewrite.
  - If still failing, record failure reason and continue remaining quota if requested.
- Completion detection step:
  - If `停止回答` does not disappear within timeout, retry snapshot+wait once.
  - If still stuck, mark timeout and skip this request.
- Download step:
  - If click intercepted or stale ref, re-snapshot and retry once.
  - If no file detected after timeout, mark download failure for that request.
- Collection step:
  - If no matching files, return manifest with failure status.
  - If dedupe removes all files, return manifest with `no_files_after_dedupe`.
  - If collected count < required count, return `insufficient_files`.
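The collection-step outcomes can be reduced to a small decision function. `no_files_after_dedupe` and `insufficient_files` are the status values named above; `no_matching_files` and `ok` are hypothetical labels added here only to make the sketch complete, since the source says just "failure status" for the first case.

```python
def collection_status(matched: int, after_dedupe: int, expected: int) -> str:
    """Map collection counts to a manifest status string."""
    if matched == 0:
        return "no_matching_files"  # hypothetical label for the generic failure
    if after_dedupe == 0:
        return "no_files_after_dedupe"
    if after_dedupe < expected:
        return "insufficient_files"
    return "ok"  # hypothetical label for the success case
```

The checks are ordered from most to least severe, so a run with no matches never reports as a dedupe problem.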
9) Return Output
Return:
- prompt used
- target count and successful count
- absolute file paths for collected files
- manifest absolute path
- retries, failures, and skipped duplicates
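The returned fields could be carried in a single structure and serialized for downstream publishing workflows. The class and field names below are assumptions for illustration; the actual manifest schema written by the bundled script is not specified here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RunSummary:
    """Hypothetical container for the values listed above."""
    prompt: str
    target_count: int
    file_paths: List[str] = field(default_factory=list)  # absolute paths
    manifest_path: str = ""
    retries: int = 0
    failures: List[str] = field(default_factory=list)
    skipped_duplicates: int = 0

    @property
    def successful_count(self) -> int:
        return len(self.file_paths)

    def to_json(self) -> str:
        payload = asdict(self)
        payload["successful_count"] = self.successful_count
        # ensure_ascii=False keeps non-ASCII prompts readable in the output.
        return json.dumps(payload, ensure_ascii=False, indent=2)
```

Deriving `successful_count` from `file_paths` avoids the two values drifting apart.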
10) Reliability Rules
- Re-snapshot after navigation, tool switch, and generation completion.
- If refs are stale or click intercepted, re-snapshot and retry once.
- Do not assume static selectors across Gemini updates; rely on visible text and role-first matching.
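The re-snapshot-and-retry-once rule can be wrapped in a small helper. `StaleRefError` is a hypothetical exception standing in for whatever the automation layer raises on a stale ref or intercepted click, and `resnapshot` is a hypothetical callable that refreshes the page snapshot.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class StaleRefError(Exception):
    """Hypothetical error for a stale element ref or an intercepted click."""


def with_one_retry(action: Callable[[], T], resnapshot: Callable[[], None]) -> T:
    """Run action; on a stale ref, refresh the snapshot and try exactly once more."""
    try:
        return action()
    except StaleRefError:
        resnapshot()
        # A second failure propagates so the caller can record it.
        return action()
```

Limiting the helper to one retry matches the rule above and keeps a persistently broken page from looping.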
11) Boundaries
- Do not bypass login verification, captcha, paywalls, or security checks.
- Do not submit disallowed or unsafe video prompts.
- Stop before posting to third-party platforms; this skill only generates and collects video files.
Scripts
- `/Users/xd/java/xhs/tools/pw`: Shared Playwright CLI entrypoint with per-session lock, auto per-thread session resolution, and shared Chrome CDP defaults.
- `scripts/run_video_flow.py`: End-to-end runner (login gate, enter video tool, generate, download video, collect files).
- `scripts/collect_downloads.py`: Collect recent downloaded video files with fallback sources, dedupe, and manifest.