feat(skill/gemini-image-web): unify image flow with music/video skills

2026-03-04 13:54:27 +08:00
parent 787a3334b6
commit b9153c70c7
4 changed files with 364 additions and 35 deletions
+49 -22
@@ -7,12 +7,14 @@ description: "Generate images in Gemini web via browser automation, download res
## Workflow
1. Open Gemini web and check whether the user is logged in.
2. If not logged in, stop and explicitly ask the user to log in.
3. If logged in, open `工具` (Tools) and click `创作图片`/`制作图片` ("Create image"/"Make image").
4. Set output directory and target image count.
5. Send one image-generation prompt per request until the target count is reached.
6. For each request, wait until generation ends (the "Stop answering" button, `停止回答`, disappears), then download the latest image result.
7. Collect the downloaded files into the target folder with batch naming, deduplication, and a manifest.
8. Return file paths, manifest path, and failure summary.
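The workflow above can be outlined as a single orchestration function. This is a hypothetical sketch, not the code of `scripts/run_image_flow.py`. The names `run_flow` and `RunResult`, the two exception classes, and the injected callables are illustrative stand-ins for the browser steps.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical orchestration sketch; names are illustrative, not from the skill's scripts.


class LoginRequired(Exception):
    """Stop condition: the user must log in manually before the flow can continue."""


class ImageToolUnavailable(Exception):
    """Stop condition: no image-creation tool is offered after login."""


@dataclass
class RunResult:
    files: List[str] = field(default_factory=list)
    manifest: Optional[str] = None
    failures: List[str] = field(default_factory=list)


def run_flow(
    is_logged_in: Callable[[], bool],
    enter_image_tool: Callable[[], bool],
    generate_and_download: Callable[[int], Optional[str]],
    collect: Callable[[], Tuple[List[str], str]],
    count: int,
) -> RunResult:
    # Steps 1-2: login gate; never attempt to log in on the user's behalf.
    if not is_logged_in():
        raise LoginRequired("Log in to Gemini web, then rerun.")
    # Step 3: open the tools menu and select the image tool.
    if not enter_image_tool():
        raise ImageToolUnavailable("Image creation is unavailable for this account/region/model.")
    result = RunResult()
    # Steps 5-6: one prompt per request, strictly sequential; None means success.
    for index in range(count):
        error = generate_and_download(index)
        if error is not None:
            result.failures.append(f"request {index + 1}: {error}")
    # Step 7: collect, dedupe, and write the manifest (the target dir lives inside `collect`).
    result.files, result.manifest = collect()
    return result
```

Passing the browser steps in as callables lets you test the two hard stop conditions (login gate, missing image tool) without a live Gemini session.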
## 1) Prerequisites
@@ -22,16 +24,38 @@ description: "Generate images in Gemini web via browser automation, download res
- `export PLAYWRIGHT_SHARED_SESSION=codex-shared`
- Invoke Playwright CLI through `/Users/xd/java/xhs/tools/pw` (do not pass `--session` manually).
- Choose the output directory before generation, for example:
- `/Users/xd/java/xhs/output/gemini-image`
Quick run:
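```bash
export PLAYWRIGHT_SHARED_SESSION=codex-shared
# Prompt (English): "Generate a cinematic cyberpunk street-scene poster:
# neon at night, rain-slick reflections, vertical composition."
python3 scripts/run_image_flow.py \
--prompt "生成一张电影感赛博朋克街景海报,夜晚霓虹,雨天反光,纵向构图。" \
--target /Users/xd/java/xhs/output/gemini-image \
--count 1
```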
## 2) Open Gemini And Enforce Login Gate
- Navigate to Gemini app page.
- Check login state via account/avatar area or login controls.
- If login controls are present (`登录` "Log in", `Sign in`, or a `ServiceLogin` URL), stop immediately and ask the user to log in.
- Continue only when login is confirmed.
- If model selection is needed, choose a model that supports image output.
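The login gate reduces to a three-way decision. This is a minimal sketch. It assumes a page snapshot can yield three things: the current URL, the labels of visible buttons and links, and whether an account avatar is shown. `login_state` and its inputs are hypothetical. Only `logged_in` lets the flow continue. On `unknown`, re-snapshot rather than guess.

```python
from typing import Iterable

# Hypothetical helper; login markers copied from the skill text, illustrative rather than exhaustive.
LOGIN_CONTROL_LABELS = {"登录", "Sign in"}
LOGIN_URL_MARKER = "ServiceLogin"


def login_state(url: str, control_labels: Iterable[str], has_account_avatar: bool) -> str:
    """Classify the page as "login_required", "logged_in", or "unknown"."""
    # Any login control or a ServiceLogin URL means: stop and ask the user to log in.
    if LOGIN_URL_MARKER in url or any(label.strip() in LOGIN_CONTROL_LABELS for label in control_labels):
        return "login_required"
    # Login counts as confirmed only when the account/avatar area is visible.
    if has_account_avatar:
        return "logged_in"
    # Neither signal is present: re-snapshot instead of guessing.
    return "unknown"
```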
## 3) Enter Image Creation Tool
- Click `工具` (Tools).
- Click the image tool item, trying visible text in this priority order:
- `创作图片` ("Create image")
- `制作图片` ("Make image")
- `Create image`
- `Image`
- If a click on the quick-intent card is intercepted by an overlay, retry via the `工具` menu item.
- If the image tool is not present after login is confirmed, stop and report that the capability is unavailable for this account/region/model.
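You can sketch tool selection by visible-text priority as a lookup over the labels present in the `工具` menu. `pick_image_tool` is a hypothetical helper, and its label list is copied from the steps above. It uses exact matching after trimming whitespace, so a label that merely contains `Image` is not mistaken for the image tool.

```python
from typing import Iterable, Optional

# Hypothetical helper; priority order copied from the skill text (Chinese labels first).
IMAGE_TOOL_LABELS = ("创作图片", "制作图片", "Create image", "Image")


def pick_image_tool(menu_labels: Iterable[str]) -> Optional[str]:
    """Return the highest-priority image-tool label present in the menu, else None."""
    available = {label.strip() for label in menu_labels}
    for candidate in IMAGE_TOOL_LABELS:
        if candidate in available:
            return candidate
    # None: stop and report the capability as unavailable.
    return None
```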
## 4) Multi-Image Generation Strategy
- Gemini web currently returns one image per request.
- If user asks for `N` images, run `N` requests in sequence.
@@ -44,7 +68,7 @@ Prompt construction rules:
- Include visual style, lighting, composition, and aspect ratio.
- Include banned elements only if user requests negative constraints.
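The one-request-per-image strategy fits in a short sequential loop. The sketch combines it with the single retry described for the generation step under Failure Handling. It rests on two assumptions: `attempt` is a hypothetical callable that submits a prompt, waits for completion, downloads, and reports success, and `rewrite` produces the minimal prompt rewrite used for the retry.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch; `attempt` and `rewrite` stand in for browser-driven steps.


@dataclass
class BatchReport:
    downloaded: int = 0
    retries: int = 0
    failures: List[str] = field(default_factory=list)


def generate_sequentially(
    prompt: str,
    count: int,
    attempt: Callable[[str], bool],
    rewrite: Callable[[str], str],
    continue_on_failure: bool = True,
) -> BatchReport:
    """Run one request per image, in order; retry a failed request once with a rewritten prompt."""
    report = BatchReport()
    for index in range(count):
        if attempt(prompt):
            report.downloaded += 1
            continue
        report.retries += 1
        if attempt(rewrite(prompt)):
            report.downloaded += 1
            continue
        report.failures.append(f"image {index + 1}: failed after one retry")
        if not continue_on_failure:
            break  # the user did not ask to continue the remaining quota
    return report
```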
## 5) Wait For Completion (Explicit End Condition)
- After submitting, wait for the generating state to appear.
- Treat generation as complete only when:
@@ -52,14 +76,14 @@ Prompt construction rules:
- the latest assistant response has a downloadable-image action.
- If refs are stale or state is unclear, re-snapshot and retry once.
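The end condition can be expressed as a two-phase poll. In this sketch, `is_generating` (the `停止回答` button is visible) and `latest_has_download` (the latest assistant response exposes the download action) are hypothetical probes. The timeouts are illustrative defaults, not values from the skill. Waiting for the generating state first means an older response's download button cannot be mistaken for the new result.

```python
import time
from typing import Callable

# Hypothetical sketch; probes and timeouts are assumptions, not part of the skill's scripts.


def wait_for_completion(
    is_generating: Callable[[], bool],
    latest_has_download: Callable[[], bool],
    start_timeout_s: float = 30.0,
    finish_timeout_s: float = 180.0,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "completed", "never_started", or "timeout"."""
    # Phase 1: the generating state (stop button) must appear first.
    deadline = clock() + start_timeout_s
    while not is_generating():
        if clock() >= deadline:
            return "never_started"
        sleep(poll_s)
    # Phase 2: complete only when the stop button is gone AND the latest
    # assistant response offers the download action.
    deadline = clock() + finish_timeout_s
    while is_generating() or not latest_has_download():
        if clock() >= deadline:
            return "timeout"
        sleep(poll_s)
    return "completed"
```

Because `clock` and `sleep` are injectable, you can exercise the timeout paths with a fake clock instead of real waiting.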
## 6) Download Images
- Download from the latest assistant response block (not old history blocks).
- Click `下载完整尺寸的图片` ("Download full-size image").
- Wait for the download toast/progress indicator to finish before the next request.
- Repeat until target count is reached or retry budget is exhausted.
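To restrict downloads to the latest response, scan the blocks backward and stop at the first assistant block. The block representation (dicts with `role` and `actions`) is an assumption about how a page snapshot might be summarized, and `latest_download_target` is a hypothetical name. The function returns `None` rather than falling back to an older block, because that would download a previous image again.

```python
from typing import Optional, Sequence

# Hypothetical sketch; the block dicts are an assumed summary of a page snapshot.
DOWNLOAD_LABEL = "下载完整尺寸的图片"  # "Download full-size image"


def latest_download_target(blocks: Sequence[dict]) -> Optional[int]:
    """Index of the latest assistant block if it offers the download action, else None."""
    for index in range(len(blocks) - 1, -1, -1):
        block = blocks[index]
        if block.get("role") != "assistant":
            continue
        # Only the newest assistant block counts; older history is never used.
        return index if DOWNLOAD_LABEL in block.get("actions", ()) else None
    return None
```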
## 7) Collect Downloaded Files
Use bundled script:
@@ -67,11 +91,11 @@ Use bundled script:
```bash
python3 scripts/collect_downloads.py \
--source /var/folders/.../playwright-mcp-output/<session-id> \
--source ~/Downloads \
--target /ABS/PATH/TO/output/gemini-image \
--since <download_start_unix_ts> \
--limit <max_to_collect> \
--expected-count <required_count> \
--prefix gemini-image \
--batch-id <run_id> \
--prompt "<prompt_used>"
```
@@ -80,17 +104,19 @@ Script behavior:
- Source strategy:
- Prefer Playwright temp download directory first.
- Also scan `.playwright-cli`, then fall back to `~/Downloads`.
- Filters to image extensions (`png,jpg,jpeg,webp`).
- Uses batch naming (`<prefix>-<batch-id>-NN.ext`).
- Dedupes by SHA-256 (current run + existing target files).
- Captures dimensions (`width`, `height`) and writes JSON manifest.
- Prints absolute output paths and manifest path.
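The behavior listed above can be approximated in a few dozen lines of standard-library Python: extension filter, recency cutoff, SHA-256 dedupe against the target folder, `<prefix>-<batch-id>-NN.ext` naming, and a JSON manifest. This is an illustrative sketch, not `scripts/collect_downloads.py` itself. The `collect` function, the manifest file name and fields, and the PNG-only dimension reader are assumptions. The sketch also scans every source in the order given rather than reproducing the script's exact source priority.

```python
import hashlib
import json
import shutil
import struct
from pathlib import Path
from typing import Iterable, Optional, Tuple

# Hypothetical sketch of the collection step; not the bundled script.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large images are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def png_size(path: Path) -> Optional[Tuple[int, int]]:
    """Read (width, height) from a PNG IHDR chunk; other formats return None in this sketch."""
    with path.open("rb") as fh:
        head = fh.read(24)
    if len(head) == 24 and head[:8] == PNG_SIGNATURE and head[12:16] == b"IHDR":
        width, height = struct.unpack(">II", head[16:24])
        return width, height
    return None


def collect(sources: Iterable[Path], target: Path, since: float, prefix: str, batch_id: str) -> Path:
    """Copy new, unique images from `sources` into `target` and write a JSON manifest."""
    target.mkdir(parents=True, exist_ok=True)
    # Seed the dedupe set with images already in the target folder (earlier runs).
    seen = {sha256_of(p) for p in target.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_EXTS}
    records = []
    for source in sources:
        if not source.is_dir():
            continue  # missing sources are skipped rather than treated as errors
        candidates = [
            p for p in source.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTS and p.stat().st_mtime >= since
        ]
        for path in sorted(candidates, key=lambda p: p.stat().st_mtime):
            digest = sha256_of(path)
            if digest in seen:
                continue  # same bytes already collected in this run or a previous one
            seen.add(digest)
            dest = target / f"{prefix}-{batch_id}-{len(records) + 1:02d}{path.suffix.lower()}"
            shutil.copy2(path, dest)
            size = png_size(dest)
            records.append({
                "path": str(dest.resolve()),
                "sha256": digest,
                "width": size[0] if size else None,
                "height": size[1] if size else None,
            })
    manifest = target / f"{prefix}-{batch_id}-manifest.json"
    manifest.write_text(json.dumps({"batch_id": batch_id, "files": records}, indent=2))
    return manifest.resolve()
```

If you run it twice with the same sources but a new batch id, the second manifest has an empty `files` list, because every candidate hash already exists in the target folder.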
## 8) Failure Handling By Step
- Login step:
- If login, captcha, or MFA blocks progress, stop and ask the user to complete it manually.
- Tool-selection step:
- If `创作图片` ("Create image") is missing after login, stop and report that the capability is unsupported.
- Generation step:
- If generation fails once, retry once with a minimally rewritten prompt.
- If it still fails, record the failure reason and continue with the remaining quota if requested.
@@ -105,7 +131,7 @@ Script behavior:
- If dedupe removes all files, return manifest with `no_files_after_dedupe`.
- If collected count < required count, return `insufficient_files`.
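The two collection outcomes named above map directly onto counts. This is a minimal sketch: `collection_status` and its arguments are hypothetical, and it does not model statuses other than these two.

```python
from typing import Optional

# Hypothetical helper; status strings copied from the skill text.


def collection_status(found: int, kept_after_dedupe: int, expected: int) -> Optional[str]:
    """Map collection counts to a status code; None means the run collected enough files."""
    # Files were found, but every one was a duplicate.
    if found > 0 and kept_after_dedupe == 0:
        return "no_files_after_dedupe"
    # Fewer unique files than the user asked for.
    if kept_after_dedupe < expected:
        return "insufficient_files"
    return None
```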
## 9) Return Output
Return:
@@ -115,13 +141,13 @@ Return:
- manifest absolute path
- retries, failures, and skipped duplicates
## 10) Reliability Rules
- Re-snapshot after navigation, tool switch, and generation completion.
- If a ref is stale or a click is intercepted, re-snapshot and retry once.
- Do not assume static selectors across Gemini updates; rely on visible text and role-first matching.
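You can capture the retry-once rule in a small wrapper. `StaleRefError` is a hypothetical exception standing in for whatever the Playwright CLI reports for a stale ref or an intercepted click, and `resnapshot` is an injected callable. A second failure propagates, so the step-level failure handling decides what happens next.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class StaleRefError(Exception):
    """Hypothetical error for a stale element ref or an intercepted click."""


def with_one_retry(action: Callable[[], T], resnapshot: Callable[[], None]) -> T:
    """Run `action`; on a stale ref or intercepted click, re-snapshot and retry exactly once."""
    try:
        return action()
    except StaleRefError:
        resnapshot()
        return action()  # a second failure propagates to the caller
```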
## 11) Boundaries
- Do not bypass login verification, captcha, paywalls, or security checks.
- Do not submit disallowed or unsafe image prompts.
@@ -130,4 +156,5 @@ Return:
## Scripts
- `/Users/xd/java/xhs/tools/pw`: Shared Playwright CLI entrypoint with a fixed session and lock.
- `scripts/run_image_flow.py`: End-to-end runner (login gate, enter image tool, generate, download image, collect files).
- `scripts/collect_downloads.py`: Collect recent downloaded images with fallback sources, dedupe, and manifest.