feat(skill/gemini-image-web): unify image flow with music/video skills
@@ -7,12 +7,14 @@ description: "Generate images in Gemini web via browser automation, download res
|
|||||||
|
|
||||||
## Workflow
|
## Workflow
|
||||||
|
|
||||||
1. Open Gemini web and confirm user is logged in.
|
1. Open Gemini web and check whether the user is logged in.
|
||||||
2. Set output directory and target image count.
|
2. If not logged in, stop and explicitly ask the user to log in.
|
||||||
3. Send one image-generation prompt per request until target count is reached.
|
3. If logged in, open `工具` and click `创作图片`/`制作图片`.
|
||||||
4. For each request, wait until generation ends (`停止回答` button disappears), then download.
|
4. Set output directory and target image count.
|
||||||
5. Collect downloaded files into target folder with batch naming, dedupe, and manifest.
|
5. Send one image-generation prompt per request until target count is reached.
|
||||||
6. Return file paths, manifest path, and failure summary.
|
6. For each request, wait until generation ends (`停止回答` button disappears), then download the latest image result.
|
||||||
|
7. Collect downloaded files into target folder with batch naming, dedupe, and manifest.
|
||||||
|
8. Return file paths, manifest path, and failure summary.
|
||||||
|
|
||||||
## 1) Prerequisites
|
## 1) Prerequisites
|
||||||
|
|
||||||
@@ -22,16 +24,38 @@ description: "Generate images in Gemini web via browser automation, download res
|
|||||||
- `export PLAYWRIGHT_SHARED_SESSION=codex-shared`
|
- `export PLAYWRIGHT_SHARED_SESSION=codex-shared`
|
||||||
- Invoke Playwright CLI through `/Users/xd/java/xhs/tools/pw` (do not pass `--session` manually).
|
- Invoke Playwright CLI through `/Users/xd/java/xhs/tools/pw` (do not pass `--session` manually).
|
||||||
- Decide output directory before generation, for example:
|
- Decide output directory before generation, for example:
|
||||||
- `/Users/xd/java/xhs/output/gemini`
|
- `/Users/xd/java/xhs/output/gemini-image`
|
||||||
|
|
||||||
## 2) Open Gemini
|
Quick run:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
export PLAYWRIGHT_SHARED_SESSION=codex-shared
|
||||||
|
python3 scripts/run_image_flow.py \
|
||||||
|
--prompt "生成一张电影感赛博朋克街景海报,夜晚霓虹,雨天反光,纵向构图。" \
|
||||||
|
--target /Users/xd/java/xhs/output/gemini-image \
|
||||||
|
--count 1
|
||||||
|
```
|
||||||
|
|
||||||
|
## 2) Open Gemini And Enforce Login Gate
|
||||||
|
|
||||||
- Navigate to Gemini app page.
|
- Navigate to Gemini app page.
|
||||||
- Confirm login state by checking account/avatar area.
|
- Check login state via account/avatar area or login controls.
|
||||||
- If not logged in, stop and ask user to complete login manually.
|
- If login controls are present (`登录`, `Sign in`, or `ServiceLogin` URL), stop immediately and ask user to log in.
|
||||||
|
- Continue only when login is confirmed.
|
||||||
- If model selection is needed, choose a model that supports image output.
|
- If model selection is needed, choose a model that supports image output.
|
||||||
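The login gate in section 2 reduces to a small decision over three signals. The following is a minimal sketch of that decision in Python, not the skill's actual implementation: the function name `login_required` and its parameters are hypothetical, and the caller is assumed to have already extracted the page URL, the visible text of links and buttons, and whether an account/avatar button exists.

```python
from __future__ import annotations

import re

# Visible login labels named in the skill text.
LOGIN_TEXT = re.compile(r"登录|Sign in", re.IGNORECASE)


def login_required(url: str, control_texts: list[str], has_account_button: bool) -> bool:
    """Return True when the page looks logged out (hypothetical helper)."""
    # An account/avatar button is treated as proof of an active session.
    if has_account_button:
        return False
    on_login_url = "ServiceLogin" in url
    has_login_control = any(LOGIN_TEXT.search(text.strip()) for text in control_texts)
    return on_login_url or has_login_control
```

When neither an account button nor a login control is found, this sketch returns `False`; a stricter gate could treat that ambiguous state as "re-snapshot and check again" instead.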
|
|
||||||
## 3) Multi-Image Generation Strategy
|
## 3) Enter Image Creation Tool
|
||||||
|
|
||||||
|
- Click `工具`.
|
||||||
|
- Click image tool item by visible text priority:
|
||||||
|
- `创作图片`
|
||||||
|
- `制作图片`
|
||||||
|
- `Create image`
|
||||||
|
- `Image`
|
||||||
|
- If quick-intent card click is intercepted by overlay, retry via `工具` menu item.
|
||||||
|
- If image tool is not present after login is confirmed, stop and report capability unavailable for this account/region/model.
|
||||||
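The visible-text priority above can be sketched as an ordered pattern list where the first pattern that matches any visible item wins. This is an illustration only: `TOOL_LABELS` and `pick_tool_item` are hypothetical names, and the list of visible menu texts is assumed to come from a fresh snapshot.

```python
from __future__ import annotations

import re

# Priority order taken from the list above; earlier patterns win.
TOOL_LABELS = [
    re.compile(r"创作图片"),
    re.compile(r"制作图片"),
    re.compile(r"Create image", re.IGNORECASE),
    re.compile(r"Image", re.IGNORECASE),
]


def pick_tool_item(visible_items: list[str]) -> str | None:
    """Return the visible item matching the highest-priority label, or None."""
    for pattern in TOOL_LABELS:
        for item in visible_items:
            if pattern.search(item):
                return item
    return None
```

A `None` result corresponds to the "capability unavailable" case: the caller stops and reports instead of guessing another menu entry. Note that the last pattern, `Image`, is deliberately loose and will match any item containing that word.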
|
|
||||||
|
## 4) Multi-Image Generation Strategy
|
||||||
|
|
||||||
- Gemini web currently returns one image per request.
|
- Gemini web currently returns one image per request.
|
||||||
- If user asks for `N` images, run `N` requests in sequence.
|
- If user asks for `N` images, run `N` requests in sequence.
|
||||||
@@ -44,7 +68,7 @@ Prompt construction rules:
|
|||||||
- Include visual style, lighting, composition, and aspect ratio.
|
- Include visual style, lighting, composition, and aspect ratio.
|
||||||
- Include banned elements only if user requests negative constraints.
|
- Include banned elements only if user requests negative constraints.
|
||||||
|
|
||||||
## 4) Wait For Completion (Explicit End Condition)
|
## 5) Wait For Completion (Explicit End Condition)
|
||||||
|
|
||||||
- After submit, wait for generation state to appear.
|
- After submit, wait for generation state to appear.
|
||||||
- Treat generation as complete only when:
|
- Treat generation as complete only when:
|
||||||
@@ -52,14 +76,14 @@ Prompt construction rules:
|
|||||||
- latest assistant response has downloadable image action.
|
- latest assistant response has downloadable image action.
|
||||||
- If refs are stale or state is unclear, re-snapshot and retry once.
|
- If refs are stale or state is unclear, re-snapshot and retry once.
|
||||||
|
|
||||||
## 5) Download Images
|
## 6) Download Images
|
||||||
|
|
||||||
- Download from the latest assistant response block (not old history blocks).
|
- Download from the latest assistant response block (not old history blocks).
|
||||||
- Click `下载完整尺寸的图片`.
|
- Click `下载完整尺寸的图片`.
|
||||||
- Wait for download completion toast/progress to end before next request.
|
- Wait for download completion toast/progress to end before next request.
|
||||||
- Repeat until target count is reached or retry budget is exhausted.
|
- Repeat until target count is reached or retry budget is exhausted.
|
||||||
|
|
||||||
## 6) Collect Downloaded Files
|
## 7) Collect Downloaded Files
|
||||||
|
|
||||||
Use bundled script:
|
Use bundled script:
|
||||||
|
|
||||||
@@ -67,11 +91,11 @@ Use bundled script:
|
|||||||
python3 scripts/collect_downloads.py \
|
python3 scripts/collect_downloads.py \
|
||||||
--source /var/folders/.../playwright-mcp-output/<session-id> \
|
--source /var/folders/.../playwright-mcp-output/<session-id> \
|
||||||
--source ~/Downloads \
|
--source ~/Downloads \
|
||||||
--target /ABS/PATH/TO/output/gemini \
|
--target /ABS/PATH/TO/output/gemini-image \
|
||||||
--since <download_start_unix_ts> \
|
--since <download_start_unix_ts> \
|
||||||
--limit <max_to_collect> \
|
--limit <max_to_collect> \
|
||||||
--expected-count <required_count> \
|
--expected-count <required_count> \
|
||||||
--prefix gemini \
|
--prefix gemini-image \
|
||||||
--batch-id <run_id> \
|
--batch-id <run_id> \
|
||||||
--prompt "<prompt_used>"
|
--prompt "<prompt_used>"
|
||||||
```
|
```
|
||||||
@@ -80,17 +104,19 @@ Script behavior:
|
|||||||
|
|
||||||
- Source strategy:
|
- Source strategy:
|
||||||
- Prefer Playwright temp download directory first.
|
- Prefer Playwright temp download directory first.
|
||||||
- Fallback to `~/Downloads` when primary source has no matches.
|
- Also scan `.playwright-cli` and fallback to `~/Downloads`.
|
||||||
- Filters to image extensions (`png,jpg,jpeg,webp`).
|
- Filters to image extensions (`png,jpg,jpeg,webp`).
|
||||||
- Uses batch naming (`<prefix>-<batch-id>-NN.ext`).
|
- Uses batch naming (`<prefix>-<batch-id>-NN.ext`).
|
||||||
- Dedupes by SHA-256 (current run + existing target files).
|
- Dedupes by SHA-256 (current run + existing target files).
|
||||||
- Captures dimensions (`width`, `height`) and writes JSON manifest.
|
- Captures dimensions (`width`, `height`) and writes JSON manifest.
|
||||||
- Prints absolute output paths and manifest path.
|
- Prints absolute output paths and manifest path.
|
||||||
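To make the naming and dedupe behavior concrete, here is a small in-memory sketch. It is not the code of `collect_downloads.py`: `plan_batch` is a hypothetical helper, files are passed as `(name, bytes)` pairs instead of paths, and the two-digit zero padding for `NN` is an assumption.

```python
from __future__ import annotations

import hashlib


def plan_batch(
    files: list[tuple[str, bytes]],
    prefix: str,
    batch_id: str,
    existing_hashes: set[str],
) -> tuple[list[tuple[str, str]], int]:
    """Return (original name, batch name) pairs and the number of skipped duplicates."""
    seen = set(existing_hashes)  # hashes already present in the target folder
    planned: list[tuple[str, str]] = []
    skipped = 0
    for name, data in files:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            skipped += 1
            continue
        seen.add(digest)
        ext = name.rsplit(".", 1)[-1].lower()
        index = len(planned) + 1  # numbering counts only files that are kept
        planned.append((name, f"{prefix}-{batch_id}-{index:02d}.{ext}"))
    return planned, skipped
```

Seeding `seen` with the target folder's hashes is what lets a rerun skip images that an earlier batch already collected.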
|
|
||||||
## 7) Failure Handling By Step
|
## 8) Failure Handling By Step
|
||||||
|
|
||||||
- Login step:
|
- Login step:
|
||||||
- If login/captcha/MFA blocks, stop and ask user to complete manually.
|
- If login/captcha/MFA blocks, stop and ask user to complete manually.
|
||||||
|
- Tool-selection step:
|
||||||
|
- If `创作图片` is missing after login, stop and report unsupported capability.
|
||||||
- Generation step:
|
- Generation step:
|
||||||
- If failed once, retry once with minimal prompt rewrite.
|
- If failed once, retry once with minimal prompt rewrite.
|
||||||
- If still failing, record failure reason and continue remaining quota if requested.
|
- If still failing, record failure reason and continue remaining quota if requested.
|
||||||
@@ -105,7 +131,7 @@ Script behavior:
|
|||||||
- If dedupe removes all files, return manifest with `no_files_after_dedupe`.
|
- If dedupe removes all files, return manifest with `no_files_after_dedupe`.
|
||||||
- If collected count < required count, return `insufficient_files`.
|
- If collected count < required count, return `insufficient_files`.
|
||||||
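The collection failure statuses can be read as a short decision ladder. The sketch below assumes one plausible ordering of the checks; `collection_status` is a hypothetical helper, and the success value `"ok"` is a placeholder, since the status the script reports on success is not shown here.

```python
from __future__ import annotations


def collection_status(candidate_count: int, kept_count: int, expected_count: int | None) -> str:
    """Map collection counts to a manifest status (illustrative ordering)."""
    if candidate_count == 0:
        return "no_matching_files"
    if kept_count == 0:
        return "no_files_after_dedupe"
    if expected_count is not None and kept_count < expected_count:
        return "insufficient_files"
    return "ok"  # placeholder success value, not taken from the script
```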
|
|
||||||
## 8) Return Output
|
## 9) Return Output
|
||||||
|
|
||||||
Return:
|
Return:
|
||||||
|
|
||||||
@@ -115,13 +141,13 @@ Return:
|
|||||||
- manifest absolute path
|
- manifest absolute path
|
||||||
- retries, failures, and skipped duplicates
|
- retries, failures, and skipped duplicates
|
||||||
|
|
||||||
## 9) Reliability Rules
|
## 10) Reliability Rules
|
||||||
|
|
||||||
- Re-snapshot after navigation, model switch, and generation completion.
|
- Re-snapshot after navigation, tool switch, and generation completion.
|
||||||
- If refs are stale or click intercepted, re-snapshot and retry once.
|
- If refs are stale or click intercepted, re-snapshot and retry once.
|
||||||
- Do not assume static selectors across Gemini updates; rely on visible text and role-first matching.
|
- Do not assume static selectors across Gemini updates; rely on visible text and role-first matching.
|
||||||
|
|
||||||
## 10) Boundaries
|
## 11) Boundaries
|
||||||
|
|
||||||
- Do not bypass login verification, captcha, paywalls, or security checks.
|
- Do not bypass login verification, captcha, paywalls, or security checks.
|
||||||
- Do not submit disallowed or unsafe image prompts.
|
- Do not submit disallowed or unsafe image prompts.
|
||||||
@@ -130,4 +156,5 @@ Return:
|
|||||||
## Scripts
|
## Scripts
|
||||||
|
|
||||||
- `/Users/xd/java/xhs/tools/pw`: Shared Playwright CLI entrypoint with fixed session + lock.
|
- `/Users/xd/java/xhs/tools/pw`: Shared Playwright CLI entrypoint with fixed session + lock.
|
||||||
|
- `scripts/run_image_flow.py`: End-to-end runner (login gate, enter image tool, generate, download image, collect files).
|
||||||
- `scripts/collect_downloads.py`: Collect recent downloaded images with fallback sources, dedupe, and manifest.
|
- `scripts/collect_downloads.py`: Collect recent downloaded images with fallback sources, dedupe, and manifest.
|
||||||
|
|||||||
@@ -1,4 +1,4 @@
|
|||||||
interface:
|
interface:
|
||||||
display_name: "Gemini Image Web"
|
display_name: "Gemini Image Web"
|
||||||
short_description: "Generate Gemini images via web, multi-request, dedupe, and manifest."
|
short_description: "Generate Gemini images via web, multi-request, dedupe, and manifest."
|
||||||
default_prompt: "Use $gemini-image-web with PLAYWRIGHT_SHARED_SESSION=codex-shared; run browser steps only through /Users/xd/java/xhs/tools/pw, generate one image per Gemini request until target count is reached, download full-size outputs, then collect files with fallback source strategy, dedupe, and manifest metadata."
|
default_prompt: "Use $gemini-image-web with PLAYWRIGHT_SHARED_SESSION=codex-shared; run scripts/run_image_flow.py via /Users/xd/java/xhs/tools/pw-backed CLI flow to verify login, generate images, prefer full-size download, and collect deduped outputs with manifest."
|
||||||
|
|||||||
@@ -58,8 +58,8 @@ def parse_args() -> argparse.Namespace:
|
|||||||
)
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--prefix",
|
"--prefix",
|
||||||
default="gemini",
|
default="gemini-image",
|
||||||
help="Filename prefix for collected files. Default: gemini",
|
help="Filename prefix for collected files. Default: gemini-image",
|
||||||
)
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--batch-id",
|
"--batch-id",
|
||||||
@@ -107,7 +107,7 @@ def collect_candidates(source: Path, since_ts: float, allowed_ext: set[str]) ->
|
|||||||
files: list[Path] = []
|
files: list[Path] = []
|
||||||
if not source.exists():
|
if not source.exists():
|
||||||
return files
|
return files
|
||||||
for path in source.iterdir():
|
for path in source.rglob("*"):
|
||||||
if not path.is_file():
|
if not path.is_file():
|
||||||
continue
|
continue
|
||||||
ext = path.suffix.lower().lstrip(".")
|
ext = path.suffix.lower().lstrip(".")
|
||||||
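The switch from `iterdir()` to `rglob("*")` matters when downloads land in nested session folders. A minimal sketch of the difference, using a temporary directory; `list_files` and `demo` are hypothetical names used only for this illustration.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def list_files(source: Path, recursive: bool) -> list[str]:
    """List files under source, either top-level only or recursively."""
    entries = source.rglob("*") if recursive else source.iterdir()
    return sorted(p.relative_to(source).as_posix() for p in entries if p.is_file())


def demo() -> tuple[list[str], list[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "top.png").write_bytes(b"x")
        nested = root / "session" / "downloads"
        nested.mkdir(parents=True)
        (nested / "deep.png").write_bytes(b"y")
        return list_files(root, recursive=False), list_files(root, recursive=True)
```

Only the recursive listing sees `session/downloads/deep.png`. The trade-off is that a recursive scan of a large directory such as `~/Downloads` touches more files, so the `--since` filter carries more of the load.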
@@ -125,7 +125,10 @@ def collect_candidates(source: Path, since_ts: float, allowed_ext: set[str]) ->
|
|||||||
|
|
||||||
def discover_playwright_sources() -> list[Path]:
|
def discover_playwright_sources() -> list[Path]:
|
||||||
globs = (
|
globs = (
|
||||||
|
"/var/folders/*/*/T/playwright-mcp-output/*",
|
||||||
|
"/private/var/folders/*/*/T/playwright-mcp-output/*",
|
||||||
"/var/folders/*/*/*/T/playwright-mcp-output/*",
|
"/var/folders/*/*/*/T/playwright-mcp-output/*",
|
||||||
|
"/private/var/folders/*/*/*/T/playwright-mcp-output/*",
|
||||||
"/tmp/playwright-mcp-output/*",
|
"/tmp/playwright-mcp-output/*",
|
||||||
)
|
)
|
||||||
candidates: list[Path] = []
|
candidates: list[Path] = []
|
||||||
@@ -147,6 +150,8 @@ def resolve_sources(raw_sources: list[str] | None) -> list[Path]:
|
|||||||
if raw_sources:
|
if raw_sources:
|
||||||
return [Path(item).expanduser().resolve() for item in raw_sources]
|
return [Path(item).expanduser().resolve() for item in raw_sources]
|
||||||
auto_sources = discover_playwright_sources()
|
auto_sources = discover_playwright_sources()
|
||||||
|
auto_sources.append((Path.cwd() / ".playwright-cli").resolve())
|
||||||
|
auto_sources.append((Path(__file__).resolve().parents[3] / ".playwright-cli").resolve())
|
||||||
auto_sources.append((Path.home() / "Downloads").resolve())
|
auto_sources.append((Path.home() / "Downloads").resolve())
|
||||||
result: list[Path] = []
|
result: list[Path] = []
|
||||||
seen: set[Path] = set()
|
seen: set[Path] = set()
|
||||||
@@ -214,16 +219,23 @@ def iso_ts(ts: float) -> str:
|
|||||||
return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
|
return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
|
||||||
|
|
||||||
|
|
||||||
def select_source_candidates(
|
def collect_candidates_all_sources(
|
||||||
sources: list[Path], since_ts: float, allowed_ext: set[str]
|
sources: list[Path], since_ts: float, allowed_ext: set[str]
|
||||||
) -> tuple[Path | None, list[Path], list[dict[str, object]]]:
|
) -> tuple[list[Path], list[dict[str, object]]]:
|
||||||
tried: list[dict[str, object]] = []
|
tried: list[dict[str, object]] = []
|
||||||
|
merged: list[Path] = []
|
||||||
|
seen: set[Path] = set()
|
||||||
for source in sources:
|
for source in sources:
|
||||||
files = collect_candidates(source, since_ts, allowed_ext)
|
files = collect_candidates(source, since_ts, allowed_ext)
|
||||||
tried.append({"source": str(source), "matches": len(files)})
|
tried.append({"source": str(source), "matches": len(files)})
|
||||||
if files:
|
for file_path in files:
|
||||||
return source, files, tried
|
resolved = file_path.resolve()
|
||||||
return None, [], tried
|
if resolved in seen:
|
||||||
|
continue
|
||||||
|
seen.add(resolved)
|
||||||
|
merged.append(file_path)
|
||||||
|
merged.sort(key=lambda p: p.stat().st_mtime, reverse=True)
|
||||||
|
return merged, tried
|
||||||
|
|
||||||
|
|
||||||
def collect_existing_hashes(target: Path, allowed_ext: set[str]) -> set[str]:
|
def collect_existing_hashes(target: Path, allowed_ext: set[str]) -> set[str]:
|
||||||
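The behavioral change in this hunk is from "first source with matches wins" to "merge every source, drop repeated paths, newest first". The following self-contained sketch shows that merge on temporary directories. It omits the `since_ts` and extension filters for brevity, and `merge_sources` and `demo` are hypothetical names, not functions from the script.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def merge_sources(sources: list[Path]) -> tuple[list[Path], list[dict[str, object]]]:
    """Merge files from all sources, deduping by resolved path, newest first."""
    tried: list[dict[str, object]] = []
    merged: list[Path] = []
    seen: set[Path] = set()
    for source in sources:
        files = [p for p in source.rglob("*") if p.is_file()] if source.exists() else []
        tried.append({"source": str(source), "matches": len(files)})
        for path in files:
            resolved = path.resolve()
            if resolved in seen:  # same file reached through another source
                continue
            seen.add(resolved)
            merged.append(path)
    merged.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return merged, tried


def demo() -> tuple[list[str], list[object]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        a, b = root / "a", root / "b"
        a.mkdir()
        b.mkdir()
        old, new = a / "old.png", b / "new.png"
        old.write_bytes(b"1")
        new.write_bytes(b"2")
        os.utime(old, (1000, 1000))  # force a deterministic mtime order
        os.utime(new, (2000, 2000))
        merged, tried = merge_sources([a, b, a, root / "missing"])
        return [p.name for p in merged], [t["matches"] for t in tried]
```

Deduping on the resolved path matters on macOS, where `/var/folders/...` and `/private/var/folders/...` point to the same files, so the same download can be reached through two glob patterns.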
@@ -269,9 +281,7 @@ def main() -> int:
|
|||||||
return 2
|
return 2
|
||||||
|
|
||||||
sources = resolve_sources(args.source)
|
sources = resolve_sources(args.source)
|
||||||
selected_source, candidates, tried_sources = select_source_candidates(
|
candidates, tried_sources = collect_candidates_all_sources(sources, args.since, allowed_ext)
|
||||||
sources, args.since, allowed_ext
|
|
||||||
)
|
|
||||||
if not candidates:
|
if not candidates:
|
||||||
payload = {
|
payload = {
|
||||||
"status": "no_matching_files",
|
"status": "no_matching_files",
|
||||||
@@ -345,7 +355,6 @@ def main() -> int:
|
|||||||
"batch_id": batch_id,
|
"batch_id": batch_id,
|
||||||
"prompt": args.prompt,
|
"prompt": args.prompt,
|
||||||
"target_dir": str(target),
|
"target_dir": str(target),
|
||||||
"source_dir": str(selected_source) if selected_source else None,
|
|
||||||
"sources_tried": tried_sources,
|
"sources_tried": tried_sources,
|
||||||
"since_ts": args.since,
|
"since_ts": args.since,
|
||||||
"limit": args.limit,
|
"limit": args.limit,
|
||||||
|
|||||||
@@ -0,0 +1,293 @@
#!/usr/bin/env python3
"""Run Gemini image generation flow end-to-end via Playwright CLI."""

from __future__ import annotations

import argparse
import json
import os
import re
import subprocess
import sys
import time
from pathlib import Path


class FlowError(RuntimeError):
    """Raised when a subprocess command in the flow fails."""


def run_command(
    cmd: list[str], *, capture_output: bool = True, check: bool = True
) -> subprocess.CompletedProcess[str]:
    kwargs: dict[str, object] = {"text": True}
    if capture_output:
        kwargs["stdout"] = subprocess.PIPE
        kwargs["stderr"] = subprocess.STDOUT
    proc = subprocess.run(cmd, **kwargs)
    if check and proc.returncode != 0:
        output = proc.stdout if capture_output else ""
        raise FlowError(
            f"Command failed ({proc.returncode}): {' '.join(cmd)}\n{output}"
        )
    return proc


def run_pw(pw_shared: Path, *args: str) -> str:
    proc = run_command([str(pw_shared), *args], capture_output=True)
    return proc.stdout or ""


def is_login_required(pw_shared: Path) -> bool:
    out = run_pw(
        pw_shared,
        "eval",
        (
            "() => {"
            "const hasAccount = !!document.querySelector("
            "'button[aria-label*=\\\"Google 账号\\\"], "
            "button[aria-label*=\\\"Google Account\\\"]'"
            ");"
            "const hasService = !!document.querySelector('a[href*=\\\"ServiceLogin\\\"]');"
            "const hasLoginCtl = Array.from(document.querySelectorAll('a,button'))"
            ".some(el => /登录|Sign in/i.test((el.textContent || '').trim()));"
            "return !hasAccount && (hasService || hasLoginCtl);"
            "}"
        ),
    )
    return bool(re.search(r"(?m)^true$", out))


def enter_image_tool(pw_shared: Path) -> None:
    js = r"""
async (page) => {
  const labels = [/创作图片/, /制作图片/, /Create image/i, /Image/i];

  const openToolMenu = async () => {
    const cn = page.getByRole('button', { name: '工具', exact: true }).first();
    if (await cn.count()) {
      await cn.click();
      return true;
    }
    const generic = page.getByRole('button', { name: /工具|Tools/i }).first();
    if (await generic.count()) {
      await generic.click();
      return true;
    }
    return false;
  };

  const tryCardButtons = async () => {
    for (const re of labels) {
      const btn = page.getByRole('button', { name: re }).first();
      if (await btn.count()) {
        try {
          await btn.click({ timeout: 2000 });
          return true;
        } catch (_) {
          // Overlay may intercept pointer. Fall through to menu strategy.
        }
      }
    }
    return false;
  };

  const tryToolMenu = async () => {
    const opened = await openToolMenu();
    if (!opened) return false;
    for (const re of labels) {
      const itemCheck = page.getByRole('menuitemcheckbox', { name: re }).first();
      if (await itemCheck.count()) {
        await itemCheck.click();
        return true;
      }
      const itemPlain = page.getByRole('menuitem', { name: re }).first();
      if (await itemPlain.count()) {
        await itemPlain.click();
        return true;
      }
    }
    return false;
  };

  let ok = await tryCardButtons();
  if (!ok) ok = await tryToolMenu();
  if (!ok) ok = await tryToolMenu();
  if (!ok) throw new Error('Image tool entry not found');
}
"""
    run_pw(pw_shared, "run-code", js)


def submit_and_download_one(pw_shared: Path, prompt: str) -> None:
    js = f"""
async (page) => {{
  const prompt = {json.dumps(prompt)};
  const input = page.getByRole('textbox', {{ name: /为 Gemini 输入提示|Enter a prompt/i }}).first();
  await input.click();
  await input.fill(prompt);
  await input.press('Enter');

  const stopBtn = page.getByRole('button', {{ name: /停止回答|Stop response/i }}).first();
  await stopBtn.waitFor({{ state: 'visible', timeout: 15000 }}).catch(() => {{}});
  await stopBtn.waitFor({{ state: 'hidden', timeout: 240000 }});

  const downloadBtn = page.getByRole('button', {{ name: /下载完整尺寸的图片|下载图片|Download full size|Download image|Download/i }}).last();
  if (!(await downloadBtn.count())) {{
    throw new Error('Image download button not found');
  }}

  const downloadPromise = page.waitForEvent('download', {{ timeout: 45000 }}).catch(() => null);
  await downloadBtn.click();

  const preferredItem = page.getByRole('menuitem', {{ name: /完整尺寸|Full size|PNG|JPG|JPEG|WEBP/i }}).first();
  if (await preferredItem.isVisible().catch(() => false)) {{
    await preferredItem.click();
  }} else {{
    const anyItem = page.getByRole('menuitem').first();
    if (await anyItem.isVisible().catch(() => false)) {{
      await anyItem.click();
    }}
  }}

  const download = await downloadPromise;
  if (!download) {{
    const failedToast = page.getByText(/下载失败|Download failed|无法下载|保存失败/i).first();
    if (await failedToast.isVisible().catch(() => false)) {{
      throw new Error('Image download failed');
    }}
    throw new Error('Image download did not start');
  }}
  await download.path().catch(() => null);
  await page.waitForTimeout(800);
}}
"""
    run_pw(pw_shared, "run-code", js)


def retry_click_latest_download(pw_shared: Path) -> None:
    js = r"""
async (page) => {
  const btn = page.getByRole('button', { name: /下载完整尺寸的图片|下载图片|Download full size|Download image|Download/i }).last();
  if (!(await btn.count())) {
    throw new Error('Image download button not found for retry');
  }
  const downloadPromise = page.waitForEvent('download', { timeout: 45000 }).catch(() => null);
  await btn.click();
  const download = await downloadPromise;
  if (!download) {
    throw new Error('Retry image download did not start');
  }
  await download.path().catch(() => null);
  await page.waitForTimeout(800);
}
"""
    run_pw(pw_shared, "run-code", js)


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Generate images on Gemini web and collect downloaded files."
    )
    parser.add_argument("--prompt", required=True, help="Prompt text for image generation.")
    parser.add_argument(
        "--target", required=True, help="Absolute output directory for collected files."
    )
    parser.add_argument(
        "--count", type=int, default=1, help="Number of images to generate. Default: 1."
    )
    parser.add_argument(
        "--no-headed",
        action="store_true",
        help="Run browser without headed mode.",
    )
    return parser.parse_args()


def main() -> int:
    args = parse_args()
    if args.count < 1:
        print("--count must be a positive integer.", file=sys.stderr)
        return 1

    repo_root = Path(__file__).resolve().parents[3]
    pw_shared = Path(
        os.environ.get("PW_SHARED_WRAPPER", str(repo_root / "tools/pw"))
    ).expanduser()
    collect_script = (Path(__file__).resolve().parent / "collect_downloads.py").resolve()

    if not pw_shared.exists() or not pw_shared.is_file():
        print(f"Shared Playwright wrapper not found: {pw_shared}", file=sys.stderr)
        return 1
    if not os.access(pw_shared, os.X_OK):
        print(f"Shared Playwright wrapper is not executable: {pw_shared}", file=sys.stderr)
        return 1
    if not collect_script.exists():
        print(f"Collector script not found: {collect_script}", file=sys.stderr)
        return 1

    target = Path(args.target).expanduser().resolve()
    target.mkdir(parents=True, exist_ok=True)
    start_ts = time.time()

    try:
        os.environ["PLAYWRIGHT_SHARED_INIT_MODE"] = (
            "headless" if args.no_headed else "headed"
        )
        run_pw(pw_shared, "snapshot")
        run_pw(pw_shared, "goto", "https://gemini.google.com/app")
        run_pw(pw_shared, "snapshot")

        if is_login_required(pw_shared):
            print(
                "Gemini is not logged in. Please log in at https://gemini.google.com/app and rerun.",
                file=sys.stderr,
            )
            return 2

        enter_image_tool(pw_shared)

        for i in range(1, args.count + 1):
            current_prompt = args.prompt
            if args.count > 1:
                current_prompt = (
                    f"{args.prompt}\n"
                    f"变体要求:这是第 {i} / {args.count} 张。保持主题一致,但构图和光影细节需要变化。"
                )
            submit_and_download_one(pw_shared, current_prompt)

        collect_cmd = [
            sys.executable,
            str(collect_script),
            "--target",
            str(target),
            "--since",
            str(start_ts),
            "--expected-count",
            str(args.count),
            "--limit",
            str(args.count),
            "--prefix",
            "gemini-image",
            "--prompt",
            args.prompt,
        ]
        proc = run_command(collect_cmd, capture_output=False, check=False)
        if proc.returncode == 0:
            return 0

        # Fallback: click latest image download button once and retry collection.
        try:
            retry_click_latest_download(pw_shared)
        except FlowError:
            return proc.returncode

        retry_proc = run_command(collect_cmd, capture_output=False, check=False)
        return retry_proc.returncode
    except FlowError as exc:
        print(str(exc), file=sys.stderr)
        return 1


if __name__ == "__main__":
    raise SystemExit(main())
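The per-request prompt variation in `main()` can be isolated as a pure function, which makes it easy to check without a browser. This is a sketch of one possible refactor, not part of the commit: `build_prompts` is a hypothetical name, and the suffix text is copied from the loop in `main()`.

```python
from __future__ import annotations


def build_prompts(prompt: str, count: int) -> list[str]:
    """Return one prompt per request; add a variant hint only when count > 1."""
    if count < 1:
        raise ValueError("count must be a positive integer")
    if count == 1:
        return [prompt]
    return [
        f"{prompt}\n"
        f"变体要求:这是第 {i} / {count} 张。保持主题一致,但构图和光影细节需要变化。"
        for i in range(1, count + 1)
    ]
```

The suffix asks Gemini for image `i` of `count` with the same theme but different composition and lighting, which lowers the chance that sequential requests return byte-identical images that the SHA-256 dedupe would then discard.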