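kurihada 4caa839154 feat(skills): initialize multi-skill package and refine release workflow

- Add git-push skill and agent metadata, with safe-push and commit message conventions

- Add two skills, xiaohongshu-engage and xiaohongshu-publish-note, with metadata

- Add gemini-image-web skill, metadata, and download collection script

- Add .gitignore to ignore common generated artifacts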

2026-03-04 00:46:44 +08:00

name	description
gemini-image-web	Generate images in Gemini web via browser automation, download results, and collect downloaded files into a local target folder with manifest output. Use when users ask to create Gemini web images, need multiple images generated through repeated requests, or need structured outputs (paths, metadata, dedupe) for downstream publishing workflows.

Gemini Image Web

Workflow

Open Gemini web and confirm user is logged in.
Set output directory and target image count.
Send one image-generation prompt per request until target count is reached.
For each request, wait until generation ends (the 停止回答 ("stop answering") button disappears), then download.
Collect downloaded files into target folder with batch naming, dedupe, and manifest.
Return file paths, manifest path, and failure summary.

1) Prerequisites

Ensure browser session can access Gemini (https://gemini.google.com/app).
If login, captcha, or MFA is required, pause and ask user to complete it manually.
Decide output directory before generation, for example:
- /Users/xd/java/xhs/output/gemini

2) Open Gemini

Navigate to Gemini app page.
Confirm login state by checking account/avatar area.
If not logged in, stop and ask user to complete login manually.
If model selection is needed, choose a model that supports image output.

3) Multi-Image Generation Strategy

Gemini web currently returns one image per request.
If user asks for N images, run N requests in sequence.
Keep a shared base prompt, then apply per-image variants only when needed.
Record a download_start_ts before each download action.

Prompt construction rules:

Keep a single clear subject per prompt.
Include visual style, lighting, composition, and aspect ratio.
List excluded elements only if the user asks for negative constraints.
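
The one-request-per-image strategy above can be sketched as a small prompt builder. This is a minimal illustration, not part of the skill's bundled scripts: the function name `build_prompts` and the comma-joined variant format are assumptions made for the example.

```python
from typing import List, Optional, Sequence


def build_prompts(base_prompt: str, count: int,
                  variants: Optional[Sequence[str]] = None) -> List[str]:
    """Return `count` prompts, one per request, sharing a base prompt.

    Hypothetical helper: the name and the ", "-joined variant format
    are illustrative choices, not something the skill prescribes.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    prompts = []
    for i in range(count):
        # Apply a per-image variant only when one was supplied for this slot.
        variant = variants[i] if variants and i < len(variants) else ""
        prompts.append(f"{base_prompt}, {variant}" if variant else base_prompt)
    return prompts


if __name__ == "__main__":
    for prompt in build_prompts("a red fox in snow, soft light, 3:4", 3, ["side view"]):
        print(prompt)
```

Each returned prompt corresponds to one sequential request; a download_start_ts would be recorded separately before each download action.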

4) Wait For Completion (Explicit End Condition)

After submitting the prompt, wait for the generating state to appear.
Treat generation as complete only when:
- 停止回答 button disappears, and
- latest assistant response has downloadable image action.
If refs are stale or state is unclear, re-snapshot and retry once.
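
The end condition above can be expressed as a polling loop over two checks. The sketch below assumes the caller supplies two callables (hypothetical names `stop_button_visible` and `download_action_visible`) that inspect a fresh page snapshot; how those are implemented depends on the browser automation tool in use, and the default timeout is an arbitrary placeholder.

```python
import time
from typing import Callable


def wait_for_completion(stop_button_visible: Callable[[], bool],
                        download_action_visible: Callable[[], bool],
                        timeout_s: float = 120.0,
                        poll_s: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once generation looks complete, False on timeout.

    Complete means both conditions hold in the same poll:
    the stop button is gone AND the latest response offers a download action.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if not stop_button_visible() and download_action_visible():
            return True
        sleep(poll_s)
    return False
```

Requiring both checks in the same poll avoids treating the brief moment before the stop button first appears as completion, since no downloadable image exists yet at that point.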

5) Download Images

Download from the latest assistant response block (not old history blocks).
Click 下载完整尺寸的图片 ("download full-size image").
Wait for download completion toast/progress to end before next request.
Repeat until target count is reached or retry budget is exhausted.
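
The stopping rule in the last step (target reached or retry budget exhausted) can be sketched as follows. The function `download_until` and the meaning of `retry_budget` as "extra attempts allowed after failures" are assumptions for illustration; the skill does not fix an exact budget.

```python
from typing import Callable, Tuple


def download_until(target: int, attempt: Callable[[], bool],
                   retry_budget: int) -> Tuple[int, int]:
    """Run attempts until `target` downloads succeed or retries run out.

    `attempt` is a hypothetical callable that performs one
    generate-and-download cycle and reports success.
    Returns (downloaded, retries_used).
    """
    downloaded = 0
    retries_used = 0
    while downloaded < target:
        if attempt():
            downloaded += 1
        elif retries_used < retry_budget:
            retries_used += 1  # a failure consumes one retry, then we try again
        else:
            break  # budget exhausted: stop with a partial result
    return downloaded, retries_used
```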

6) Collect Downloaded Files

Use bundled script:

python3 scripts/collect_downloads.py \
  --source /var/folders/.../playwright-mcp-output/<session-id> \
  --source ~/Downloads \
  --target /ABS/PATH/TO/output/gemini \
  --since <download_start_unix_ts> \
  --limit <max_to_collect> \
  --expected-count <required_count> \
  --prefix gemini \
  --batch-id <run_id> \
  --prompt "<prompt_used>"

Script behavior:

Source strategy:
- Prefer Playwright temp download directory first.
- Fallback to ~/Downloads when primary source has no matches.
Filters to image extensions (png,jpg,jpeg,webp).
Uses batch naming (<prefix>-<batch-id>-NN.ext).
Dedupes by SHA-256 (current run + existing target files).
Captures dimensions (width, height) and writes JSON manifest.
Prints absolute output paths and manifest path.
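
To make the source fallback, extension filter, SHA-256 dedupe, and batch naming concrete, here is a minimal sketch of that logic. It is not the contents of scripts/collect_downloads.py: the function names, the modification-time comparison used for the since filter, and lowercasing the extension are assumptions, and dimension capture and manifest writing are omitted.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List, Sequence, Set, Tuple

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def recent_images(sources: Sequence[Path], since: float) -> List[Path]:
    """Return images modified at or after `since` from the first source with matches."""
    for source in sources:
        if not source.is_dir():
            continue
        matches = [
            p for p in sorted(source.iterdir())
            if p.is_file()
            and p.suffix.lower() in IMAGE_EXTS
            and p.stat().st_mtime >= since
        ]
        if matches:
            return matches  # later sources act only as fallbacks
    return []


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large images are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_batch(candidates: Iterable[Path], known_hashes: Set[str],
               prefix: str, batch_id: str) -> List[Tuple[Path, str]]:
    """Pair each unique image with a <prefix>-<batch-id>-NN.ext name."""
    seen = set(known_hashes)  # hashes of files already in the target folder
    plan: List[Tuple[Path, str]] = []
    for path in candidates:
        if path.suffix.lower() not in IMAGE_EXTS:
            continue
        file_hash = sha256_of(path)
        if file_hash in seen:
            continue  # duplicate of an existing file or of one earlier in this run
        seen.add(file_hash)
        index = len(plan) + 1
        plan.append((path, f"{prefix}-{batch_id}-{index:02d}{path.suffix.lower()}"))
    return plan
```

Seeding `seen` with hashes from the target folder is what lets a rerun skip images collected by an earlier batch.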

7) Failure Handling By Step

Login step:
- If login/captcha/MFA blocks, stop and ask user to complete manually.
Generation step:
- If generation fails, retry once with a minimal prompt rewrite.
- If it still fails, record the failure reason and continue with the remaining quota if requested.
Completion detection step:
- If 停止回答 does not disappear within timeout, retry snapshot+wait once.
- If still stuck, mark timeout and skip this request.
Download step:
- If click intercepted or stale ref, re-snapshot and retry once.
- If no file detected after timeout, mark download failure for that request.
Collection step:
- If no matching files, return manifest with failure status.
- If dedupe removes all files, return manifest with no_files_after_dedupe.
- If collected count < required count, return insufficient_files.
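
The collection-step outcomes above map naturally onto a small status function. In this sketch, `no_files_after_dedupe` and `insufficient_files` come from the rules above, while `no_matching_files` and `ok` are hypothetical labels chosen for illustration, since the text only says "failure status" for the first case.

```python
def collection_status(matched: int, unique: int, required: int) -> str:
    """Classify a collection run.

    matched:  files found in the sources before dedupe
    unique:   files remaining after dedupe
    required: the expected count for this run
    """
    if matched == 0:
        return "no_matching_files"  # hypothetical label for the generic failure status
    if unique == 0:
        return "no_files_after_dedupe"
    if unique < required:
        return "insufficient_files"
    return "ok"  # hypothetical success label


if __name__ == "__main__":
    print(collection_status(matched=3, unique=2, required=4))
```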

8) Return Output

Return:

prompt used
target count and successful count
absolute file paths for collected files
manifest absolute path
retries, failures, and skipped duplicates
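
One way to keep the returned fields consistent across runs is a small record type. The class below is an illustrative assumption: `RunSummary` and its field names are not defined by the skill, and the real manifest may use a different layout.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RunSummary:
    """Hypothetical container for the values listed above."""
    prompt: str
    target_count: int
    files: List[str] = field(default_factory=list)      # absolute paths
    manifest_path: str = ""
    retries: int = 0
    failures: List[str] = field(default_factory=list)   # one reason per failed request
    skipped_duplicates: int = 0

    @property
    def successful_count(self) -> int:
        return len(self.files)

    def to_json(self) -> str:
        payload = asdict(self)
        payload["successful_count"] = self.successful_count
        # ensure_ascii=False keeps non-ASCII prompts readable in the output
        return json.dumps(payload, ensure_ascii=False, indent=2)
```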

9) Reliability Rules

Re-snapshot after navigation, model switch, and generation completion.
If refs are stale or click intercepted, re-snapshot and retry once.
Do not assume static selectors across Gemini updates; rely on visible text and role-first matching.
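
The "re-snapshot and retry once" rule can be wrapped in a helper so every click follows the same policy. This is a sketch under assumptions: `StaleRefError` is a hypothetical exception standing in for whatever stale-ref or click-intercepted error the automation tool raises, and `resnapshot` is a caller-supplied callable.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class StaleRefError(Exception):
    """Hypothetical stand-in for a stale element ref or an intercepted click."""


def with_one_retry(action: Callable[[], T], resnapshot: Callable[[], None]) -> T:
    """Run `action`; on a stale ref, re-snapshot and try exactly once more."""
    try:
        return action()
    except StaleRefError:
        resnapshot()      # refresh element refs before the single retry
        return action()   # a second failure propagates to the caller
```

Letting the second failure propagate keeps the retry budget explicit: the caller decides whether to mark the request as failed or skip it.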

10) Boundaries

Do not bypass login verification, captcha, paywalls, or security checks.
Do not submit disallowed or unsafe image prompts.
Stop before posting to third-party platforms; this skill only generates and collects images.

Scripts

scripts/collect_downloads.py: Collect recent downloaded images with fallback sources, dedupe, and manifest.
