kurihada 8f9fc6d7f7 feat(playwright): adopt per-thread sessions for shared Chrome automation

2026-03-05 18:33:36 +08:00

name: gemini-music-web
description: Generate music in Gemini web via browser automation, download results, and collect downloaded audio files into a local target folder with manifest output. Use when users ask to create Gemini web music, need multiple tracks generated through repeated requests, or need structured outputs (paths, metadata, dedupe) for downstream publishing workflows.

Gemini Music Web

Workflow

Open Gemini web and check whether the user is logged in.
If not logged in, stop and explicitly ask the user to log in.
If logged in, open 工具 (Tools) and click 创作音乐/制作音乐 (Create music / Make music).
Set output directory and target track count.
Send one music-generation prompt per request until target count is reached.
For each request, wait until generation ends (the 停止回答 "Stop responding" button disappears), then download the latest audio result.
Collect downloaded files into target folder with batch naming, dedupe, and manifest.
Return file paths, manifest path, and failure summary.

1) Prerequisites

Ensure browser session can access Gemini (https://gemini.google.com/app).
If login, captcha, or MFA is required, pause and ask user to complete it manually.
Use the shared Playwright session policy across all skills:
- Auto session policy: tools/pw derives one Playwright session per CODEX_THREAD_ID (fallback: PLAYWRIGHT_SESSION_OWNER or explicit --session).
- Invoke Playwright CLI through /Users/xd/java/xhs/tools/pw; use --session <name> only when explicit multi-session isolation is needed.
Decide output directory before generation, for example:
- /Users/xd/java/xhs/output/gemini-music
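The session policy above can be pictured with a small resolver. This is a sketch only, not the actual logic of tools/pw: the function name `resolve_session`, the precedence (explicit name first, then CODEX_THREAD_ID, then PLAYWRIGHT_SESSION_OWNER), and the name sanitising are all assumptions made for illustration.

```python
import os
import re
from typing import Mapping, Optional


def resolve_session(explicit: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Pick a Playwright session name (hypothetical precedence and naming)."""
    candidates = [
        explicit,                             # value of --session, if given
        env.get("CODEX_THREAD_ID"),           # one session per thread
        env.get("PLAYWRIGHT_SESSION_OWNER"),  # fallback owner
    ]
    for value in candidates:
        if value and value.strip():
            # Assumption: keep the name safe for CLI arguments and file names.
            return re.sub(r"[^A-Za-z0-9_.-]+", "-", value.strip())
    raise RuntimeError("no session identity: set CODEX_THREAD_ID or pass --session")
```

Passing the environment as a parameter keeps the resolver testable without touching the real process environment.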

Quick run:

python3 scripts/run_music_flow.py \
  --prompt "创作一段 90 BPM 的 lo-fi hiphop，温暖、夜晚、钢琴和刷镲，时长 30 秒。" \
  --target /Users/xd/java/xhs/output/gemini-music \
  --count 1

2) Open Gemini And Enforce Login Gate

Navigate to Gemini app page.
Check login state via account/avatar area or login controls.
If login controls are present (登录, Sign in, or ServiceLogin URL), stop immediately and ask user to log in.
Continue only when login is confirmed.
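The login gate can be expressed as a small pure function over what the page snapshot exposes. This is a minimal sketch: the function name `login_state`, its three return values, and the idea of passing control labels plus an avatar flag are assumptions, not the behavior of scripts/run_music_flow.py.

```python
from typing import Iterable

# Labels named in the login gate, lower-cased for comparison.
LOGIN_LABELS = {"登录", "sign in"}


def login_state(url: str, control_labels: Iterable[str], has_avatar: bool) -> str:
    """Classify the page as 'login_required', 'logged_in', or 'unknown'."""
    labels = {label.strip().lower() for label in control_labels}
    # Any login control or a ServiceLogin redirect means: stop and ask the user.
    if "ServiceLogin" in url or labels & LOGIN_LABELS:
        return "login_required"
    # Only a visible account/avatar area counts as confirmation.
    if has_avatar:
        return "logged_in"
    return "unknown"
```

Returning `unknown` instead of assuming success reflects the rule that the flow continues only when login is confirmed.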

3) Enter Music Creation Tool

Click 工具.
Click music tool item by visible text priority:
- 创作音乐
- 制作音乐
- Create music
- Music
If quick-intent card click is intercepted by overlay, retry via 工具 menu item (制作音乐).
If music tool is not present after login is confirmed, stop and report capability unavailable for this account/region/model.
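The visible-text priority can be sketched as a lookup over the labels found in the tools menu. The helper name `pick_music_tool` and the choice of exact (whitespace-trimmed) matching are assumptions for illustration.

```python
from typing import Optional, Sequence

# Priority order taken from the list above.
MUSIC_TOOL_LABELS = ("创作音乐", "制作音乐", "Create music", "Music")


def pick_music_tool(menu_labels: Sequence[str]) -> Optional[str]:
    """Return the menu label to click, or None if no music tool is offered."""
    # Map trimmed text back to the original label so the caller can click it.
    by_text = {label.strip(): label for label in menu_labels}
    for wanted in MUSIC_TOOL_LABELS:
        if wanted in by_text:
            return by_text[wanted]
    return None
```

A `None` result corresponds to the "capability unavailable" stop condition rather than a retry.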

4) Multi-Track Generation Strategy

Gemini web typically returns one track per request.
If user asks for N tracks, run N requests in sequence.
Keep a shared base prompt, then apply per-track variants only when needed.
Record a download_start_ts before each download action.

Prompt construction rules:

Keep a single clear style target per request.
Include genre, tempo/BPM, mood, instrumentation, and duration.
Include structure constraints only if user requests them.
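One way to keep a shared base prompt with optional per-track variants is to build the prompt from named fields. The `TrackSpec` class, the `build_prompt` function, and the English sentence template are hypothetical; the real runner takes a free-form `--prompt` string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackSpec:
    genre: str
    bpm: int
    mood: str
    instrumentation: str
    duration_sec: int
    structure: Optional[str] = None  # only set when the user asks for it


def build_prompt(spec: TrackSpec, variant: Optional[str] = None) -> str:
    """Render one single-style prompt; `variant` is an optional per-track tweak."""
    parts = [
        f"Create a {spec.bpm} BPM {spec.genre} track",
        f"mood: {spec.mood}",
        f"instruments: {spec.instrumentation}",
        f"duration: {spec.duration_sec} seconds",
    ]
    if spec.structure:
        parts.append(f"structure: {spec.structure}")
    if variant:
        parts.append(f"variation: {variant}")
    return "; ".join(parts) + "."
```

For N tracks, the same `TrackSpec` would be rendered N times, passing a different `variant` only where variation is wanted.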

5) Wait For Completion (Explicit End Condition)

After submit, wait for generation state to appear.
Treat generation as complete only when:
- 停止回答 button disappears, and
- latest assistant response has downloadable audio action.
If refs are stale or state is unclear, re-snapshot and retry once.
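The two-part end condition can be sketched as a polling loop. The `probe` callable, the default timeout, and the poll interval are assumptions; in practice `probe` would re-snapshot the page and inspect only the latest assistant response.

```python
import time
from typing import Callable, Tuple


def wait_for_completion(probe: Callable[[], Tuple[bool, bool]],
                        timeout_sec: float = 180.0,
                        poll_sec: float = 2.0,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until generation is done; return False on timeout.

    probe() returns (stop_button_visible, latest_response_has_download).
    """
    deadline = clock() + timeout_sec
    while True:
        stop_visible, has_download = probe()
        # Both conditions must hold: the button is gone AND audio is downloadable.
        if not stop_visible and has_download:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_sec)
```

Requiring the download action as well guards against treating a not-yet-started generation (no stop button yet) as finished.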

6) Download Audio

Download from the latest assistant response block (not old history blocks).
If a download menu appears, prefer 纯音频 MP3 音轨 (audio-only MP3 track).
Wait for download completion before next request.
Repeat until target count is reached or retry budget is exhausted.

7) Collect Downloaded Files

Use bundled script:

python3 scripts/collect_downloads.py \
  --source /var/folders/.../playwright-mcp-output/<session-id> \
  --source ~/Downloads \
  --target /ABS/PATH/TO/output/gemini-music \
  --since <download_start_unix_ts> \
  --ext mp3,wav,m4a,ogg,flac,aac \
  --limit <max_to_collect> \
  --expected-count <required_count> \
  --prefix gemini-music \
  --batch-id <run_id> \
  --prompt "<prompt_used>"

Script behavior:

Source strategy:
- Prefer Playwright temp download directory first.
- Fallback to ~/Downloads when primary source has no matches.
Recursively scans source folders and filters audio extensions.
Merges matches from all source folders, then sorts by mtime.
Uses batch naming (<prefix>-<batch-id>-NN.ext).
Dedupes by SHA-256 (current run + existing target files).
Writes JSON manifest and prints absolute output paths.
Records audio metadata when available (duration_sec, bitrate_kbps, file_size_bytes).
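The dedupe and batch-naming behavior can be illustrated with a short sketch. It is not the code of scripts/collect_downloads.py: the function names, numbering from 01, and copying (rather than moving) files are assumptions, and manifest writing and metadata extraction are left out.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Iterable, List, Set


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large audio files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect(candidates: Iterable[Path], target: Path,
            prefix: str, batch_id: str) -> List[Path]:
    """Copy unique files into target as <prefix>-<batch-id>-NN.ext, oldest first."""
    target.mkdir(parents=True, exist_ok=True)
    # Seed with hashes of files already in the target folder.
    seen: Set[str] = {sha256_of(p) for p in target.iterdir() if p.is_file()}
    collected: List[Path] = []
    for src in sorted(candidates, key=lambda p: p.stat().st_mtime):
        digest = sha256_of(src)
        if digest in seen:
            continue  # duplicate of an earlier file in this run or in target
        seen.add(digest)
        name = f"{prefix}-{batch_id}-{len(collected) + 1:02d}{src.suffix.lower()}"
        dest = target / name
        shutil.copy2(src, dest)
        collected.append(dest)
    return collected
```

Seeding `seen` from the target folder is what makes a second run over the same downloads collect nothing new.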

8) Failure Handling By Step

Login step:
- If login/captcha/MFA blocks, stop and ask user to complete manually.
Tool-selection step:
- If no music tool label (创作音乐 or one of its fallbacks) is present after login, stop and report unsupported capability.
Generation step:
- If a request fails, retry once with a minimally rewritten prompt.
- If it still fails, record the failure reason and, if the user asked for it, continue with the remaining requests.
Completion detection step:
- If 停止回答 does not disappear within timeout, retry snapshot+wait once.
- If still stuck, mark timeout and skip this request.
Download step:
- If click intercepted or stale ref, re-snapshot and retry once.
- If no file detected after timeout, mark download failure for that request.
Collection step:
- If no matching files, return manifest with failure status.
- If dedupe removes all files, return manifest with no_files_after_dedupe.
- If collected count < required count, return insufficient_files.
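The collection-step outcomes can be summarised as a single status decision. `no_files_after_dedupe` and `insufficient_files` come from the list above; the function name and the `no_matching_files` and `ok` values are hypothetical placeholders for statuses the text does not name.

```python
def collection_status(matched: int, unique: int, required: int) -> str:
    """Map collection counts to a manifest status string."""
    if matched == 0:
        return "no_matching_files"      # hypothetical name for the failure status
    if unique == 0:
        return "no_files_after_dedupe"  # every match was a duplicate
    if unique < required:
        return "insufficient_files"     # fewer files than --expected-count
    return "ok"                         # hypothetical success value
```

Checking the cases in this order means the most specific cause is reported first.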

9) Return Output

Return:

prompt used
target count and successful count
absolute file paths for collected files
manifest absolute path
retries, failures, and skipped duplicates

10) Reliability Rules

Re-snapshot after navigation, tool switch, and generation completion.
If refs are stale or click intercepted, re-snapshot and retry once.
Do not assume static selectors across Gemini updates; rely on visible text and role-first matching.
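The "re-snapshot and retry once" rule can be captured in a small wrapper. This is a sketch under assumptions: `StaleRefError` is a hypothetical exception standing in for whatever the Playwright wrapper raises on a stale ref or intercepted click, and `resnapshot` is any callable that refreshes the page snapshot.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class StaleRefError(Exception):
    """Hypothetical error for a stale element ref or an intercepted click."""


def with_one_retry(action: Callable[[], T], resnapshot: Callable[[], None]) -> T:
    """Run action; on a stale ref, re-snapshot and try exactly once more."""
    try:
        return action()
    except StaleRefError:
        resnapshot()
        # A second failure propagates to the caller, which records it.
        return action()
```

Capping the retry at one attempt keeps a changed Gemini layout from turning into an endless click loop.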

11) Boundaries

Do not bypass login verification, captcha, paywalls, or security checks.
Do not submit disallowed or unsafe music prompts.
Stop before posting to third-party platforms; this skill only generates and collects music files.

Scripts

/Users/xd/java/xhs/tools/pw: Shared Playwright CLI entrypoint with per-session lock, auto per-thread session resolution, and shared Chrome CDP defaults.
scripts/run_music_flow.py: End-to-end runner (login gate, enter music tool, generate, download MP3, collect files).
scripts/collect_downloads.py: Collect recent downloaded audio files with fallback sources, dedupe, and manifest.
