diff --git a/skills/gemini-music-web/SKILL.md b/skills/gemini-music-web/SKILL.md new file mode 100644 index 0000000..468214d --- /dev/null +++ b/skills/gemini-music-web/SKILL.md @@ -0,0 +1,156 @@ +--- +name: gemini-music-web +description: "Generate music in Gemini web via browser automation, download results, and collect downloaded audio files into a local target folder with manifest output. Use when users ask to create Gemini web music, need multiple tracks generated through repeated requests, or need structured outputs (paths, metadata, dedupe) for downstream publishing workflows." +--- + +# Gemini Music Web + +## Workflow + +1. Open Gemini web and check whether the user is logged in. +2. If not logged in, stop and explicitly ask the user to log in. +3. If logged in, open `工具` and click `创作音乐`/`制作音乐`. +4. Set output directory and target track count. +5. Send one music-generation prompt per request until target count is reached. +6. For each request, wait until generation ends (`停止回答` button disappears), then download the latest audio result. +7. Collect downloaded files into target folder with batch naming, dedupe, and manifest. +8. Return file paths, manifest path, and failure summary. + +## 1) Prerequisites + +- Ensure browser session can access Gemini (`https://gemini.google.com/app`). +- If login, captcha, or MFA is required, pause and ask user to complete it manually. +- Decide output directory before generation, for example: + - `/Users/xd/java/xhs/output/gemini-music` + +Quick run: + +```bash +bash scripts/run_music_flow.sh \ + --prompt "创作一段 90 BPM 的 lo-fi hiphop,温暖、夜晚、钢琴和刷镲,时长 30 秒。" \ + --target /Users/xd/java/xhs/output/gemini-music \ + --count 1 +``` + +## 2) Open Gemini And Enforce Login Gate + +- Navigate to Gemini app page. +- Check login state via account/avatar area or login controls. +- If login controls are present (`登录`, `Sign in`, or `ServiceLogin` URL), stop immediately and ask user to log in. +- Continue only when login is confirmed. 
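The gate described in this section can be sketched as a small predicate. This is a minimal illustration under stated assumptions, not part of the skill's scripts: `PageSignals` and `login_required` are hypothetical names, and the three inputs are assumed to have been gathered by the browser automation beforehand.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Visible login-control texts named in this section.
LOGIN_TEXTS = ("登录", "Sign in")


@dataclass
class PageSignals:
    """Hypothetical snapshot of what the automation observed on the page."""

    url: str
    has_account_avatar: bool
    control_texts: list[str] = field(default_factory=list)


def login_required(signals: PageSignals) -> bool:
    """Return True when the flow should stop and ask the user to log in."""
    # A visible account/avatar control is treated as proof of login.
    if signals.has_account_avatar:
        return False
    # A ServiceLogin URL means Google redirected to the sign-in page.
    if "ServiceLogin" in signals.url:
        return True
    # Otherwise look for a login control by its visible text.
    return any(
        marker.lower() in text.lower()
        for text in signals.control_texts
        for marker in LOGIN_TEXTS
    )
```

Checking the avatar first mirrors the rule that the flow continues only when login is confirmed; a page with neither an avatar nor a login control is treated as not requiring login in this sketch, which a stricter gate might want to reject as well.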
+ +## 3) Enter Music Creation Tool + +- Click `工具` (Tools). +- Click music tool item by visible text priority: + - `创作音乐` + - `制作音乐` + - `Create music` + - `Music` +- If quick-intent card click is intercepted by overlay, retry via `工具` menu item (`制作音乐`). +- If music tool is not present after login is confirmed, stop and report capability unavailable for this account/region/model. + +## 4) Multi-Track Generation Strategy + +- Gemini web typically returns one track per request. +- If user asks for `N` tracks, run `N` requests in sequence. +- Keep a shared base prompt, then apply per-track variants only when needed. +- Record a `download_start_ts` before each download action. + +Prompt construction rules: + +- Keep a single clear style target per request. +- Include genre, tempo/BPM, mood, instrumentation, and duration. +- Include structure constraints only if user requests them. + +## 5) Wait For Completion (Explicit End Condition) + +- After submit, wait for generation state to appear. +- Treat generation as complete only when: + - `停止回答` (Stop responding) button disappears, and + - latest assistant response has downloadable audio action. +- If refs are stale or state is unclear, re-snapshot and retry once. + +## 6) Download Audio + +- Download from the latest assistant response block (not old history blocks). +- If download menu appears, prefer `纯音频 MP3 音轨` (audio-only MP3 track). +- Wait for download completion before next request. +- Repeat until target count is reached or retry budget is exhausted. + +## 7) Collect Downloaded Files + +Use bundled script: + +```bash +python3 scripts/collect_downloads.py \ + --source /var/folders/.../playwright-mcp-output/ \ + --source ~/Downloads \ + --target /ABS/PATH/TO/output/gemini-music \ + --since UNIX_TS \ + --ext mp3,wav,m4a,ogg,flac,aac \ + --limit MAX_FILES \ + --expected-count EXPECTED_COUNT \ + --prefix gemini-music \ + --batch-id BATCH_ID \ + --prompt "PROMPT_TEXT" +``` + +Script behavior: + +- Source strategy: + - Prefer Playwright temp download directory first.
+ - When `--source` is omitted, `~/Downloads` is appended as an additional source; its matches are merged with the others rather than used only when the primary source has none. +- Recursively scans source folders and filters audio extensions. +- Merges matches from all source folders, then sorts by mtime, newest first. +- Uses batch naming (`<prefix>-<batch-id>-NN.ext`). +- Dedupes by SHA-256 (current run + existing target files). +- Writes JSON manifest and prints absolute output paths. +- Records audio metadata when available (`duration_sec`, `bitrate_kbps`, `file_size_bytes`). + +## 8) Failure Handling By Step + +- Login step: + - If login/captcha/MFA blocks, stop and ask user to complete manually. +- Tool-selection step: + - If `创作音乐` (Create music) is missing after login, stop and report unsupported capability. +- Generation step: + - If failed once, retry once with minimal prompt rewrite. + - If still failing, record failure reason and continue remaining quota if requested. +- Completion detection step: + - If `停止回答` (Stop responding) does not disappear within timeout, retry snapshot+wait once. + - If still stuck, mark timeout and skip this request. +- Download step: + - If click intercepted or stale ref, re-snapshot and retry once. + - If no file detected after timeout, mark download failure for that request. +- Collection step: + - If no matching files, return manifest with `no_matching_files`. + - If dedupe removes all files, return manifest with `no_files_after_dedupe`. + - If collected count < required count, return `insufficient_files`. + +## 9) Return Output + +Return: + +- prompt used +- target count and successful count +- absolute file paths for collected files +- manifest absolute path +- retries, failures, and skipped duplicates + +## 10) Reliability Rules + +- Re-snapshot after navigation, tool switch, and generation completion. +- If refs are stale or click intercepted, re-snapshot and retry once. +- Do not assume static selectors across Gemini updates; rely on visible text and role-first matching. + +## 11) Boundaries + +- Do not bypass login verification, captcha, paywalls, or security checks.
+- Do not submit disallowed or unsafe music prompts. +- Stop before posting to third-party platforms; this skill only generates and collects music files. + +## Scripts + +- `scripts/run_music_flow.sh`: End-to-end runner (login gate, enter music tool, generate, download MP3, collect files). +- `scripts/collect_downloads.py`: Collect recent downloaded audio files with fallback sources, dedupe, and manifest. diff --git a/skills/gemini-music-web/agents/openai.yaml b/skills/gemini-music-web/agents/openai.yaml new file mode 100644 index 0000000..bce7702 --- /dev/null +++ b/skills/gemini-music-web/agents/openai.yaml @@ -0,0 +1,4 @@ +interface: + display_name: "Gemini Music Web" + short_description: "Generate Gemini music via web with login gate and manifest." + default_prompt: "Use $gemini-music-web to run scripts/run_music_flow.sh: verify Gemini login, enter 创作音乐, generate tracks one-by-one, prefer MP3 download, then collect files with dedupe and manifest metadata." diff --git a/skills/gemini-music-web/scripts/collect_downloads.py b/skills/gemini-music-web/scripts/collect_downloads.py new file mode 100755 index 0000000..25717b0 --- /dev/null +++ b/skills/gemini-music-web/scripts/collect_downloads.py @@ -0,0 +1,404 @@ +#!/usr/bin/env python3 +"""Collect recent audio downloads into a target directory with manifest output.""" + +from __future__ import annotations + +import argparse +import hashlib +import json +import re +import shutil +import subprocess +import sys +import time +import wave +from datetime import datetime, timezone +from pathlib import Path + + +def parse_args() -> argparse.Namespace: + parser = argparse.ArgumentParser( + description="Collect recent audio downloads into a target directory." + ) + parser.add_argument( + "--source", + action="append", + help=( + "Source download directory. Repeatable. " + "If omitted, auto-discovers Playwright temp downloads and then " + "falls back to ~/Downloads." 
+ ), + ) + parser.add_argument( + "--target", + required=True, + help="Target directory for collected files.", + ) + parser.add_argument( + "--since", + type=float, + default=time.time() - 1800, + help="Unix timestamp lower bound for file mtime. Default: now-1800s", + ) + parser.add_argument( + "--ext", + default="mp3,wav,m4a,ogg,flac,aac", + help="Comma-separated file extensions to include.", + ) + parser.add_argument( + "--limit", + type=int, + default=8, + help="Maximum files to collect. Default: 8", + ) + parser.add_argument( + "--expected-count", + type=int, + default=None, + help="Required minimum number of collected files.", + ) + parser.add_argument( + "--prefix", + default="gemini-music", + help="Filename prefix for collected files. Default: gemini-music", + ) + parser.add_argument( + "--batch-id", + default=None, + help="Batch ID used in output filenames. Default: current timestamp.", + ) + parser.add_argument( + "--manifest", + default=None, + help="Manifest output path. Default: <target>/<prefix>-<batch-id>-manifest.json", + ) + parser.add_argument( + "--prompt", + default="", + help="Prompt text to store in manifest.", + ) + parser.add_argument( + "--move", + action="store_true", + help="Move files instead of copying.", + ) + parser.add_argument( + "--no-dedupe-target", + action="store_true", + help="Disable hash dedupe against existing files in target directory.", + ) + return parser.parse_args() + + +def unique_path(path: Path) -> Path: + if not path.exists(): + return path + stem = path.stem + suffix = path.suffix + parent = path.parent + idx = 2 + while True: + candidate = parent / f"{stem}-{idx}{suffix}" + if not candidate.exists(): + return candidate + idx += 1 + + +def collect_candidates(source: Path, since_ts: float, allowed_ext: set[str]) -> list[Path]: + files: list[Path] = [] + if not source.exists(): + return files + for path in source.rglob("*"): + if not path.is_file(): + continue + ext = path.suffix.lower().lstrip(".") + if ext not in allowed_ext: + continue +
try: + mtime = path.stat().st_mtime + except OSError: + continue + if mtime >= since_ts: + files.append(path) + files.sort(key=lambda p: p.stat().st_mtime, reverse=True) + return files + + +def discover_playwright_sources() -> list[Path]: + globs = ( + "/var/folders/*/*/T/playwright-mcp-output/*", + "/private/var/folders/*/*/T/playwright-mcp-output/*", + "/var/folders/*/*/*/T/playwright-mcp-output/*", + "/private/var/folders/*/*/*/T/playwright-mcp-output/*", + "/tmp/playwright-mcp-output/*", + ) + candidates: list[Path] = [] + seen: set[Path] = set() + for pattern in globs: + for raw in Path("/").glob(pattern.lstrip("/")): + if not raw.is_dir(): + continue + path = raw.resolve() + if path in seen: + continue + seen.add(path) + candidates.append(path) + candidates.sort(key=lambda p: p.stat().st_mtime, reverse=True) + return candidates + + +def resolve_sources(raw_sources: list[str] | None) -> list[Path]: + if raw_sources: + return [Path(item).expanduser().resolve() for item in raw_sources] + auto_sources = discover_playwright_sources() + auto_sources.append((Path.home() / "Downloads").resolve()) + result: list[Path] = [] + seen: set[Path] = set() + for path in auto_sources: + if path in seen: + continue + seen.add(path) + result.append(path) + return result + + +def sha256_of_file(path: Path) -> str: + digest = hashlib.sha256() + with path.open("rb") as fh: + while True: + chunk = fh.read(1024 * 1024) + if not chunk: + break + digest.update(chunk) + return digest.hexdigest() + + +def read_audio_metadata(path: Path) -> tuple[float | None, int | None]: + # Prefer ffprobe for broad codec/container support. 
+ try: + proc = subprocess.run( + [ + "ffprobe", + "-v", + "error", + "-show_entries", + "format=duration,bit_rate", + "-of", + "json", + str(path), + ], + check=False, + capture_output=True, + text=True, + ) + except OSError: + proc = None + if proc and proc.returncode == 0: + try: + payload = json.loads(proc.stdout or "{}") + fmt = payload.get("format", {}) + dur_raw = fmt.get("duration") + br_raw = fmt.get("bit_rate") + duration = float(dur_raw) if dur_raw not in (None, "") else None + bitrate_kbps = ( + int(int(br_raw) / 1000) if br_raw not in (None, "") else None + ) + return duration, bitrate_kbps + except (ValueError, TypeError, json.JSONDecodeError): + pass + + # macOS fallback for compressed formats when ffprobe is unavailable. + try: + proc = subprocess.run( + ["afinfo", str(path)], + check=False, + capture_output=True, + text=True, + ) + except OSError: + proc = None + if proc and proc.returncode == 0: + duration_match = re.search(r"estimated duration:\s*([0-9.]+)\s*sec", proc.stdout) + bitrate_match = re.search(r"bit rate:\s*([0-9]+)\s*bits per second", proc.stdout) + duration = float(duration_match.group(1)) if duration_match else None + bitrate_kbps = int(int(bitrate_match.group(1)) / 1000) if bitrate_match else None + if duration is not None or bitrate_kbps is not None: + return duration, bitrate_kbps + + # Fallback for WAV without external dependencies. 
+ if path.suffix.lower() == ".wav": + try: + with wave.open(str(path), "rb") as wav_file: + frames = wav_file.getnframes() + frame_rate = wav_file.getframerate() + channels = wav_file.getnchannels() + sample_width = wav_file.getsampwidth() + duration = (frames / frame_rate) if frame_rate else None + bitrate_kbps = int((frame_rate * channels * sample_width * 8) / 1000) + return duration, bitrate_kbps + except (wave.Error, OSError, ValueError): + return None, None + + return None, None + + +def iso_ts(ts: float) -> str: + return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat() + + +def collect_candidates_all_sources( + sources: list[Path], since_ts: float, allowed_ext: set[str] +) -> tuple[list[Path], list[dict[str, object]]]: + tried: list[dict[str, object]] = [] + merged: list[Path] = [] + seen: set[Path] = set() + for source in sources: + files = collect_candidates(source, since_ts, allowed_ext) + tried.append({"source": str(source), "matches": len(files)}) + for file_path in files: + resolved = file_path.resolve() + if resolved in seen: + continue + seen.add(resolved) + merged.append(file_path) + merged.sort(key=lambda p: p.stat().st_mtime, reverse=True) + return merged, tried + + +def collect_existing_hashes(target: Path, allowed_ext: set[str]) -> set[str]: + hashes: set[str] = set() + for path in target.iterdir(): + if not path.is_file(): + continue + ext = path.suffix.lower().lstrip(".") + if ext not in allowed_ext: + continue + try: + hashes.add(sha256_of_file(path)) + except OSError: + continue + return hashes + + +def write_manifest(manifest_path: Path, payload: dict[str, object]) -> None: + manifest_path.parent.mkdir(parents=True, exist_ok=True) + with manifest_path.open("w", encoding="utf-8") as fh: + json.dump(payload, fh, ensure_ascii=False, indent=2) + fh.write("\n") + + +def main() -> int: + args = parse_args() + target = Path(args.target).expanduser().resolve() + target.mkdir(parents=True, exist_ok=True) + batch_id = args.batch_id or 
time.strftime("%Y%m%d-%H%M%S") + manifest_path = ( + Path(args.manifest).expanduser().resolve() + if args.manifest + else target / f"{args.prefix}-{batch_id}-manifest.json" + ) + + allowed_ext = { + ext.strip().lower().lstrip(".") + for ext in args.ext.split(",") + if ext.strip() + } + if not allowed_ext: + print("No valid extensions provided.", file=sys.stderr) + return 2 + + sources = resolve_sources(args.source) + candidates, tried_sources = collect_candidates_all_sources(sources, args.since, allowed_ext) + if not candidates: + payload = { + "status": "no_matching_files", + "created_at": iso_ts(time.time()), + "batch_id": batch_id, + "prompt": args.prompt, + "target_dir": str(target), + "since_ts": args.since, + "sources_tried": tried_sources, + "collected_count": 0, + "files": [], + } + write_manifest(manifest_path, payload) + print("No matching files found.") + print(f"MANIFEST: {manifest_path}") + return 1 + + dedupe_target = not args.no_dedupe_target + seen_hashes: set[str] = set() + if dedupe_target: + seen_hashes.update(collect_existing_hashes(target, allowed_ext)) + + files: list[dict[str, object]] = [] + skipped_duplicates = 0 + for src in candidates: + if len(files) >= args.limit: + break + try: + src_hash = sha256_of_file(src) + except OSError: + continue + if src_hash in seen_hashes: + skipped_duplicates += 1 + continue + + idx = len(files) + 1 + dst = target / f"{args.prefix}-{batch_id}-{idx:02d}{src.suffix.lower()}" + dst = unique_path(dst) + src_mtime = src.stat().st_mtime + if args.move: + shutil.move(str(src), str(dst)) + else: + shutil.copy2(str(src), str(dst)) + duration_sec, bitrate_kbps = read_audio_metadata(dst) + file_entry = { + "prompt": args.prompt, + "generated_at": iso_ts(src_mtime), + "source_filename": src.name, + "source_path": str(src.resolve()), + "target_path": str(dst.resolve()), + "sha256": src_hash, + "file_size_bytes": dst.stat().st_size, + "duration_sec": duration_sec, + "bitrate_kbps": bitrate_kbps, + } + 
files.append(file_entry) + seen_hashes.add(src_hash) + + status = "ok" + exit_code = 0 + expected_count = args.expected_count + if not files: + status = "no_files_after_dedupe" + exit_code = 1 + elif expected_count is not None and len(files) < expected_count: + status = "insufficient_files" + exit_code = 1 + + payload = { + "status": status, + "created_at": iso_ts(time.time()), + "batch_id": batch_id, + "prompt": args.prompt, + "target_dir": str(target), + "sources_tried": tried_sources, + "since_ts": args.since, + "limit": args.limit, + "expected_count": expected_count, + "dedupe_target": dedupe_target, + "skipped_duplicates": skipped_duplicates, + "collected_count": len(files), + "files": files, + } + write_manifest(manifest_path, payload) + + for item in files: + print(item["target_path"]) + print(f"MANIFEST: {manifest_path}") + return exit_code + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/skills/gemini-music-web/scripts/run_music_flow.sh b/skills/gemini-music-web/scripts/run_music_flow.sh new file mode 100755 index 0000000..b47d886 --- /dev/null +++ b/skills/gemini-music-web/scripts/run_music_flow.sh @@ -0,0 +1,230 @@ +#!/usr/bin/env bash +set -euo pipefail + +usage() { + cat <<'EOF' +Usage: + run_music_flow.sh --prompt "TEXT" --target /abs/output/dir [--count N] [--session NAME] [--no-headed] + +Example: + run_music_flow.sh \ + --prompt "创作一段 90 BPM 的 lo-fi hiphop,温暖、夜晚、钢琴和刷镲,时长 30 秒。" \ + --target /Users/xd/java/xhs/output/gemini-music \ + --count 2 +EOF +} + +PROMPT="" +TARGET="" +COUNT=1 +SESSION="gmw$(date +%s)" +HEADED=1 + +while [[ $# -gt 0 ]]; do + case "$1" in + --prompt) + PROMPT="${2:-}" + shift 2 + ;; + --target) + TARGET="${2:-}" + shift 2 + ;; + --count) + COUNT="${2:-1}" + shift 2 + ;; + --session) + SESSION="${2:-$SESSION}" + shift 2 + ;; + --no-headed) + HEADED=0 + shift + ;; + -h|--help) + usage + exit 0 + ;; + *) + echo "Unknown arg: $1" >&2 + usage + exit 1 + ;; + esac +done + +if [[ -z "$PROMPT" || -z "$TARGET" ]];
then + echo "Both --prompt and --target are required." >&2 + usage + exit 1 +fi + +if ! [[ "$COUNT" =~ ^[0-9]+$ ]] || [[ "$COUNT" -lt 1 ]]; then + echo "--count must be a positive integer." >&2 + exit 1 +fi + +CODEX_HOME="${CODEX_HOME:-$HOME/.codex}" +PWCLI="${PWCLI:-$CODEX_HOME/skills/playwright/scripts/playwright_cli.sh}" +COLLECT_SCRIPT="$(cd "$(dirname "$0")" && pwd)/collect_downloads.py" + +if ! command -v npx >/dev/null 2>&1; then + echo "npx is required." >&2 + exit 1 +fi +if [[ ! -x "$PWCLI" ]]; then + echo "Playwright wrapper not found or not executable: $PWCLI" >&2 + exit 1 +fi +if [[ ! -f "$COLLECT_SCRIPT" ]]; then + echo "Collector script not found: $COLLECT_SCRIPT" >&2 + exit 1 +fi + +pw() { + "$PWCLI" --session "$SESSION" "$@" +} + +json_escape() { + python3 - "$1" <<'PY' +import json +import sys +print(json.dumps(sys.argv[1])) +PY +} + +is_login_required() { + local out + out="$( + pw eval "() => { + const hasAccount = !!document.querySelector('button[aria-label*=\\\"Google 账号\\\"], button[aria-label*=\\\"Google Account\\\"]'); + const hasService = !!document.querySelector('a[href*=\\\"ServiceLogin\\\"]'); + const hasLoginCtl = Array.from(document.querySelectorAll('a,button')).some(el => /登录|Sign in/i.test((el.textContent || '').trim())); + return !hasAccount && (hasService || hasLoginCtl); + }" + )" + # Use grep, not rg: ripgrep is never checked as a dependency, and a missing rg would make this gate silently report "logged in". + echo "$out" | grep -q '^true$' +} + +enter_music_tool() { + local js + js="$(cat <<'JS' +const labels = [/创作音乐/, /制作音乐/, /Create music/i, /Music/i]; + +const tryCardButtons = async () => { + for (const re of labels) { + const btn = page.getByRole('button', { name: re }).first(); + if (await btn.count()) { + try { + await btn.click({ timeout: 2000 }); + return true; + } catch (_) { + // Overlay may intercept pointer. Fall through to menu strategy.
+ } + } + } + return false; +}; + +const tryToolMenu = async () => { + await page.getByRole('button', { name: '工具', exact: true }).click(); + for (const re of labels) { + const itemCheck = page.getByRole('menuitemcheckbox', { name: re }).first(); + if (await itemCheck.count()) { + await itemCheck.click(); + return true; + } + const itemPlain = page.getByRole('menuitem', { name: re }).first(); + if (await itemPlain.count()) { + await itemPlain.click(); + return true; + } + } + return false; +}; + +let ok = await tryCardButtons(); +if (!ok) ok = await tryToolMenu(); +if (!ok) { + // Re-open the tool menu once and retry as a last attempt. + ok = await tryToolMenu(); +} +if (!ok) { + throw new Error('Music tool entry not found'); +} +JS +)" + pw run-code "$js" >/dev/null +} + +submit_and_download_one() { + local track_prompt="$1" + local escaped + escaped="$(json_escape "$track_prompt")" + local js + js="$(cat < {}); +await stopBtn.waitFor({ state: 'hidden', timeout: 240000 }); + +const downloadBtn = page.getByRole('button', { name: /下载音乐作品|Download music/i }).last(); +await downloadBtn.click(); + +const mp3Item = page.getByRole('menuitem', { name: /纯音频|MP3/i }).first(); +if (await mp3Item.count()) { + await mp3Item.click(); +} else { + const anyItem = page.getByRole('menuitem').first(); + if (await anyItem.count()) await anyItem.click(); +} + +await page.waitForTimeout(1200); +JS +)" + pw run-code "$js" >/dev/null +} + +mkdir -p "$TARGET" +start_ts="$(python3 - <<'PY' +import time +print(time.time()) +PY +)" + +if [[ "$HEADED" -eq 1 ]]; then + pw open "https://gemini.google.com/app" --headed >/dev/null +else + pw open "https://gemini.google.com/app" >/dev/null +fi +pw snapshot >/dev/null + +if is_login_required; then + echo "Gemini is not logged in. Please log in at https://gemini.google.com/app and rerun." 
>&2 + exit 2 +fi + +enter_music_tool + +for ((i=1; i<=COUNT; i++)); do + current_prompt="$PROMPT" + if [[ "$COUNT" -gt 1 ]]; then + current_prompt="$PROMPT +变体要求:这是第 $i / $COUNT 首。保持风格一致,但旋律和节奏细节需要变化。" + fi + submit_and_download_one "$current_prompt" +done + +python3 "$COLLECT_SCRIPT" \ + --target "$TARGET" \ + --since "$start_ts" \ + --expected-count "$COUNT" \ + --limit "$COUNT" \ + --prefix "gemini-music" \ + --prompt "$PROMPT"
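
A downstream step might consume the manifest that `collect_downloads.py` writes. The following is a minimal sketch, assuming the manifest keys shown in the collector (`status`, `collected_count`, `expected_count`, `skipped_duplicates`, and `files[].target_path`); `load_manifest` and `summarize_manifest` are hypothetical helper names that are not part of this change.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_manifest(manifest_path: str) -> dict:
    """Read the JSON manifest from disk (hypothetical helper)."""
    return json.loads(Path(manifest_path).read_text(encoding="utf-8"))


def summarize_manifest(payload: dict) -> dict:
    """Reduce a manifest payload to the fields a caller usually reports."""
    files = payload.get("files", [])
    status = payload.get("status")
    return {
        # Only "ok" counts as success; the collector also emits
        # no_matching_files, no_files_after_dedupe and insufficient_files.
        "ok": status == "ok",
        "status": status,
        "collected": payload.get("collected_count", len(files)),
        # The no_matching_files manifest omits these two keys.
        "expected": payload.get("expected_count"),
        "skipped_duplicates": payload.get("skipped_duplicates", 0),
        "paths": [item["target_path"] for item in files],
    }
```

Keeping the summary separate from file loading lets the same function be used on a payload that was parsed elsewhere, for example from the `MANIFEST:` path printed by the collector.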