feat(skill/gemini-image-web): unify image flow with music/video skills
@@ -7,12 +7,14 @@ description: "Generate images in Gemini web via browser automation, download res
 
 ## Workflow
 
-1. Open Gemini web and confirm user is logged in.
-2. Set output directory and target image count.
-3. Send one image-generation prompt per request until target count is reached.
-4. For each request, wait until generation ends (`停止回答` button disappears), then download.
-5. Collect downloaded files into target folder with batch naming, dedupe, and manifest.
-6. Return file paths, manifest path, and failure summary.
+1. Open Gemini web and check whether the user is logged in.
+2. If not logged in, stop and explicitly ask the user to log in.
+3. If logged in, open `工具` and click `创作图片`/`制作图片`.
+4. Set output directory and target image count.
+5. Send one image-generation prompt per request until target count is reached.
+6. For each request, wait until generation ends (`停止回答` button disappears), then download the latest image result.
+7. Collect downloaded files into target folder with batch naming, dedupe, and manifest.
+8. Return file paths, manifest path, and failure summary.
 
 ## 1) Prerequisites
 
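Taken together, the new workflow is a login gate, a one-time switch into the image tool, and a sequential loop of single-image requests. A minimal Python sketch of that control flow, assuming the browser actions are injected as callables: `run_flow`, `FlowResult`, the step callables, and the status strings are all hypothetical and are not the interface of `scripts/run_image_flow.py`. (`工具` is the "Tools" menu; `创作图片` and `制作图片` both mean roughly "Create image".)

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FlowResult:
    """Hypothetical run summary: overall status, saved paths, failure notes."""

    status: str
    downloaded: list[str] = field(default_factory=list)
    failures: list[str] = field(default_factory=list)


def run_flow(
    count: int,
    is_logged_in: Callable[[], bool],
    enter_image_tool: Callable[[], bool],
    generate_one: Callable[[int, int], str],
) -> FlowResult:
    """Run the gated, sequential image loop; all step callables are injected."""
    # Steps 1-2: hard login gate, never continue on a logged-out page.
    if not is_logged_in():
        return FlowResult(status="login_required")
    # Step 3: open the tools menu and enter the image-creation tool.
    if not enter_image_tool():
        return FlowResult(status="image_tool_unavailable")
    result = FlowResult(status="ok")
    # Steps 5-6: one image per request, so issue `count` requests in sequence.
    for i in range(count):
        for attempt in range(2):  # first try plus one retry
            try:
                # generate_one(i, attempt) submits, waits for completion,
                # downloads, and returns the downloaded file path.
                result.downloaded.append(generate_one(i, attempt))
                break
            except RuntimeError as exc:
                if attempt == 1:
                    # Record the failure and continue with the remaining quota.
                    result.failures.append(f"request {i}: {exc}")
    if result.failures and not result.downloaded:
        result.status = "all_failed"
    return result
```

Passing the attempt number to `generate_one` lets the caller apply the "retry once with minimal prompt rewrite" rule from the failure-handling section on the second attempt.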
@@ -22,16 +24,38 @@ description: "Generate images in Gemini web via browser automation, download res
 - `export PLAYWRIGHT_SHARED_SESSION=codex-shared`
 - Invoke Playwright CLI through `/Users/xd/java/xhs/tools/pw` (do not pass `--session` manually).
 - Decide output directory before generation, for example:
-  - `/Users/xd/java/xhs/output/gemini`
+  - `/Users/xd/java/xhs/output/gemini-image`
 
-## 2) Open Gemini
+Quick run:
+
+```bash
+export PLAYWRIGHT_SHARED_SESSION=codex-shared
+python3 scripts/run_image_flow.py \
+  --prompt "生成一张电影感赛博朋克街景海报,夜晚霓虹,雨天反光,纵向构图。" \
+  --target /Users/xd/java/xhs/output/gemini-image \
+  --count 1
+```
+
+## 2) Open Gemini And Enforce Login Gate
 
 - Navigate to Gemini app page.
-- Confirm login state by checking account/avatar area.
-- If not logged in, stop and ask user to complete login manually.
+- Check login state via account/avatar area or login controls.
+- If login controls are present (`登录`, `Sign in`, or `ServiceLogin` URL), stop immediately and ask user to log in.
+- Continue only when login is confirmed.
 - If model selection is needed, choose a model that supports image output.
 
-## 3) Multi-Image Generation Strategy
+## 3) Enter Image Creation Tool
+
+- Click `工具`.
+- Click image tool item by visible text priority:
+  - `创作图片`
+  - `制作图片`
+  - `Create image`
+  - `Image`
+- If quick-intent card click is intercepted by overlay, retry via `工具` menu item.
+- If image tool is not present after login is confirmed, stop and report capability unavailable for this account/region/model.
+
+## 4) Multi-Image Generation Strategy
 
 - Gemini web currently returns one image per request.
 - If user asks for `N` images, run `N` requests in sequence.
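Both checks added in this hunk, the login gate and the image-tool lookup, come down to matching visible text from a fresh snapshot. A minimal sketch, assuming the caller has already pulled the page URL and the visible labels out of that snapshot; `needs_login`, `pick_image_tool`, and the exact-match rule are illustrative choices, not taken from `scripts/run_image_flow.py`. `登录` is the Chinese "Log in" label.

```python
from typing import Iterable, Optional

# Login controls named in the skill; any match means "stop and ask the user".
LOGIN_MARKERS = ("登录", "Sign in")
# Image tool labels, in the priority order given by the skill.
IMAGE_TOOL_LABELS = ("创作图片", "制作图片", "Create image", "Image")


def needs_login(url: str, visible_texts: Iterable[str]) -> bool:
    """Return True if the page is a ServiceLogin URL or shows a login control."""
    if "ServiceLogin" in url:
        return True
    return any(t.strip() in LOGIN_MARKERS for t in visible_texts)


def pick_image_tool(menu_items: Iterable[str]) -> Optional[str]:
    """Return the highest-priority image tool label present in the Tools menu."""
    items = {t.strip() for t in menu_items}
    for label in IMAGE_TOOL_LABELS:
        if label in items:
            return label
    return None  # caller should report the capability as unavailable
```

Exact matching (rather than substring matching) keeps the low-priority `Image` label from matching longer menu entries that merely contain the word.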
@@ -44,7 +68,7 @@ Prompt construction rules:
 - Include visual style, lighting, composition, and aspect ratio.
 - Include banned elements only if user requests negative constraints.
 
-## 4) Wait For Completion (Explicit End Condition)
+## 5) Wait For Completion (Explicit End Condition)
 
 - After submit, wait for generation state to appear.
 - Treat generation as complete only when:
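The prompt construction rules in this hunk's context can be made concrete with a small builder. The Quick run prompt (roughly "a cinematic cyberpunk street-scene poster, neon at night, rain reflections, vertical composition") already follows them. `build_prompt` is a hypothetical helper, not part of the bundled scripts, and the `"; "`-joined English field layout is an assumption for illustration only.

```python
from typing import Optional, Sequence


def build_prompt(
    subject: str,
    style: str,
    lighting: str,
    composition: str,
    aspect_ratio: str,
    avoid: Optional[Sequence[str]] = None,
) -> str:
    """Assemble one image prompt that always states style, lighting, composition, aspect ratio."""
    parts = [
        subject,
        f"style: {style}",
        f"lighting: {lighting}",
        f"composition: {composition}",
        f"aspect ratio: {aspect_ratio}",
    ]
    # Banned elements are added only when the user asked for negative constraints.
    if avoid:
        parts.append("avoid: " + ", ".join(avoid))
    return "; ".join(parts)
```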
@@ -52,14 +76,14 @@ Prompt construction rules:
   - latest assistant response has downloadable image action.
 - If refs are stale or state is unclear, re-snapshot and retry once.
 
-## 5) Download Images
+## 6) Download Images
 
 - Download from the latest assistant response block (not old history blocks).
 - Click `下载完整尺寸的图片`.
 - Wait for download completion toast/progress to end before next request.
 - Repeat until target count is reached or retry budget is exhausted.
 
-## 6) Collect Downloaded Files
+## 7) Collect Downloaded Files
 
 Use bundled script:
 
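The end condition can be expressed as a polling loop over fresh snapshots: generation counts as done only when the `停止回答` ("Stop responding") button is gone and the latest response block offers `下载完整尺寸的图片` ("Download full-size image"). This sketch assumes a hypothetical `snapshot()` callable that returns page-level texts and the latest response's texts; the 180 s timeout and 2 s poll interval are arbitrary defaults, not values from the skill.

```python
import time
from typing import Callable

STOP_LABEL = "停止回答"  # "Stop responding" button, visible while generating
DOWNLOAD_LABEL = "下载完整尺寸的图片"  # "Download full-size image" action

Snapshot = dict[str, list[str]]


def wait_for_image(
    snapshot: Callable[[], Snapshot],
    timeout_s: float = 180.0,
    poll_s: float = 2.0,
) -> bool:
    """Return True once generation has ended and a download action is present.

    `snapshot` is hypothetical: it must return
    {"page_texts": [...], "latest_response_texts": [...]} from a fresh snapshot.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        snap = snapshot()  # fresh snapshot on every poll, so refs never go stale
        still_generating = STOP_LABEL in snap["page_texts"]
        # Only the latest response block counts, never older history blocks.
        has_download = DOWNLOAD_LABEL in snap["latest_response_texts"]
        if not still_generating and has_download:
            return True
        time.sleep(poll_s)
    return False
```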
@@ -67,11 +91,11 @@ Use bundled script:
 python3 scripts/collect_downloads.py \
   --source /var/folders/.../playwright-mcp-output/<session-id> \
   --source ~/Downloads \
-  --target /ABS/PATH/TO/output/gemini \
+  --target /ABS/PATH/TO/output/gemini-image \
   --since <download_start_unix_ts> \
   --limit <max_to_collect> \
   --expected-count <required_count> \
-  --prefix gemini \
+  --prefix gemini-image \
   --batch-id <run_id> \
   --prompt "<prompt_used>"
 ```
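The placeholders in the `collect_downloads.py` command (`<download_start_unix_ts>`, `<run_id>`, and the counts) have to be filled in by the caller. A minimal Python sketch that records the start time before the first download and builds the argument list using only the flags shown in that command. The timestamp-based run id and the choice of `--limit` equal to the expected count are assumptions, and the elided Playwright temp path is left out rather than guessed.

```python
import shlex
import time
from pathlib import Path
from typing import Optional


def collect_cmd(
    sources: list[str],
    target: str,
    since_ts: int,
    expected_count: int,
    prompt: str,
    batch_id: str,
    prefix: str = "gemini-image",
    limit: Optional[int] = None,
) -> list[str]:
    """Build argv for scripts/collect_downloads.py from the flags in the command."""
    cmd = ["python3", "scripts/collect_downloads.py"]
    for src in sources:
        cmd += ["--source", src]  # one --source per directory, in the order given
    cmd += [
        "--target", target,
        "--since", str(since_ts),
        # Assumption: collect at most as many files as were requested.
        "--limit", str(limit if limit is not None else expected_count),
        "--expected-count", str(expected_count),
        "--prefix", prefix,
        "--batch-id", batch_id,
        "--prompt", prompt,
    ]
    return cmd


if __name__ == "__main__":
    # Take the timestamp before the first download click so --since only
    # matches files produced by this run.
    start_ts = int(time.time())
    run_id = time.strftime("%Y%m%d-%H%M%S")  # hypothetical run-id format
    argv = collect_cmd(
        [str(Path.home() / "Downloads")],
        "/ABS/PATH/TO/output/gemini-image",
        start_ts,
        1,
        "<prompt_used>",
        run_id,
    )
    print(shlex.join(argv))
```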
@@ -80,17 +104,19 @@ Script behavior:
 
 - Source strategy:
   - Prefer Playwright temp download directory first.
-  - Fallback to `~/Downloads` when primary source has no matches.
+  - Also scan `.playwright-cli` and fallback to `~/Downloads`.
 - Filters to image extensions (`png,jpg,jpeg,webp`).
 - Uses batch naming (`<prefix>-<batch-id>-NN.ext`).
 - Dedupes by SHA-256 (current run + existing target files).
 - Captures dimensions (`width`, `height`) and writes JSON manifest.
 - Prints absolute output paths and manifest path.
 
-## 7) Failure Handling By Step
+## 8) Failure Handling By Step
 
 - Login step:
   - If login/captcha/MFA blocks, stop and ask user to complete manually.
+- Tool-selection step:
+  - If `创作图片` is missing after login, stop and report unsupported capability.
 - Generation step:
   - If failed once, retry once with minimal prompt rewrite.
   - If still failing, record failure reason and continue remaining quota if requested.
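The extension filter, SHA-256 dedupe, and batch naming listed under "Script behavior" can be sketched as follows. This is not the code of `scripts/collect_downloads.py`; it is an illustration that assumes candidate files arrive already in the desired order and that extensions are lower-cased in the output name.

```python
import hashlib
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def plan_batch(
    candidates: list[Path],
    prefix: str,
    batch_id: str,
    existing_hashes: set[str],
) -> list[tuple[Path, str]]:
    """Return (source, new_name) pairs after the extension filter and SHA-256 dedupe."""
    seen = set(existing_hashes)  # start from hashes of files already in the target
    planned: list[tuple[Path, str]] = []
    for path in candidates:
        ext = path.suffix.lower()
        if ext not in IMAGE_EXTS:
            continue  # not an image download
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            continue  # duplicate within this run or of an existing target file
        seen.add(digest)
        # <prefix>-<batch-id>-NN.ext with a two-digit, 1-based counter
        planned.append((path, f"{prefix}-{batch_id}-{len(planned) + 1:02d}{ext}"))
    return planned
```

Hashing file contents rather than comparing names catches the same image saved twice under different browser-assigned names, such as `image.png` and `image (1).png`.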
@@ -105,7 +131,7 @@ Script behavior:
   - If dedupe removes all files, return manifest with `no_files_after_dedupe`.
   - If collected count < required count, return `insufficient_files`.
 
-## 8) Return Output
+## 9) Return Output
 
 Return:
 
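The two collection failure codes visible in this hunk can be derived from simple counts. A minimal sketch; the function name and arguments are hypothetical, and any other codes defined in lines this diff does not show are not modeled. In particular, when nothing was found at all, this sketch simply reports `insufficient_files`.

```python
from typing import Optional


def collection_status(found: int, kept_after_dedupe: int, required: int) -> Optional[str]:
    """Map collection counts to the failure codes shown in the skill; None means success."""
    if found > 0 and kept_after_dedupe == 0:
        return "no_files_after_dedupe"  # every file found was a duplicate
    if kept_after_dedupe < required:
        return "insufficient_files"
    return None
```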
@@ -115,13 +141,13 @@ Return:
 - manifest absolute path
 - retries, failures, and skipped duplicates
 
-## 9) Reliability Rules
+## 10) Reliability Rules
 
-- Re-snapshot after navigation, model switch, and generation completion.
+- Re-snapshot after navigation, tool switch, and generation completion.
 - If refs are stale or click intercepted, re-snapshot and retry once.
 - Do not assume static selectors across Gemini updates; rely on visible text and role-first matching.
 
-## 10) Boundaries
+## 11) Boundaries
 
 - Do not bypass login verification, captcha, paywalls, or security checks.
 - Do not submit disallowed or unsafe image prompts.
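The "re-snapshot and retry once" rule can be captured in a small wrapper so every click follows it the same way. `StaleRefError` is a hypothetical exception standing in for whatever the automation layer raises on a stale ref or an intercepted click; a second failure is deliberately not retried and propagates to the caller.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class StaleRefError(Exception):
    """Hypothetical: a snapshot ref no longer resolves, or the click was intercepted."""


def with_resnapshot_retry(action: Callable[[], T], resnapshot: Callable[[], None]) -> T:
    """Run `action`; on a stale ref or intercepted click, re-snapshot and retry exactly once."""
    try:
        return action()
    except StaleRefError:
        resnapshot()  # refresh refs before the single retry
        return action()  # a second StaleRefError propagates to the caller
```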
@@ -130,4 +156,5 @@ Return:
 ## Scripts
 
 - `/Users/xd/java/xhs/tools/pw`: Shared Playwright CLI entrypoint with fixed session + lock.
+- `scripts/run_image_flow.py`: End-to-end runner (login gate, enter image tool, generate, download image, collect files).
 - `scripts/collect_downloads.py`: Collect recent downloaded images with fallback sources, dedupe, and manifest.