commit 4caa839154a5d6bd80e147412fa9ed2a1223be1b
Author: kurihada
Date:   Wed Mar 4 00:41:21 2026 +0800

    feat(skills): initialize multi-skill package and refine the publishing workflow

    - Add git-push skill and agent metadata, with safe-push and commit message conventions
    - Add the xiaohongshu-engage and xiaohongshu-publish-note skills with their metadata
    - Add gemini-image-web skill, metadata, and download collection script
    - Add .gitignore to ignore common generated artifacts

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..e0d3016
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,7 @@
+output/
+.playwright-cli/
+
+__pycache__/
+*.pyc
+
+.DS_Store
diff --git a/skills/gemini-image-web/SKILL.md b/skills/gemini-image-web/SKILL.md
new file mode 100644
index 0000000..76ebca2
--- /dev/null
+++ b/skills/gemini-image-web/SKILL.md
@@ -0,0 +1,129 @@
+---
+name: gemini-image-web
+description: "Generate images in Gemini web via browser automation, download results, and collect downloaded files into a local target folder with manifest output. Use when users ask to create Gemini web images, need multiple images generated through repeated requests, or need structured outputs (paths, metadata, dedupe) for downstream publishing workflows."
+---
+
+# Gemini Image Web
+
+## Workflow
+
+1. Open Gemini web and confirm user is logged in.
+2. Set output directory and target image count.
+3. Send one image-generation prompt per request until target count is reached.
+4. For each request, wait until generation ends (`停止回答` button disappears), then download.
+5. Collect downloaded files into target folder with batch naming, dedupe, and manifest.
+6. Return file paths, manifest path, and failure summary.
+
+## 1) Prerequisites
+
+- Ensure browser session can access Gemini (`https://gemini.google.com/app`).
+- If login, captcha, or MFA is required, pause and ask user to complete it manually.
+- Decide output directory before generation, for example:
+  - `/Users/xd/java/xhs/output/gemini`
+
+## 2) Open Gemini
+
+- Navigate to Gemini app page.
+- Confirm login state by checking account/avatar area.
+- If not logged in, stop and ask user to complete login manually.
+- If model selection is needed, choose a model that supports image output.
+
+## 3) Multi-Image Generation Strategy
+
+- Gemini web currently returns one image per request.
+- If user asks for `N` images, run `N` requests in sequence.
+- Keep a shared base prompt, then apply per-image variants only when needed.
+- Record a `download_start_ts` before each download action.
+
+Prompt construction rules:
+
+- Keep a single clear subject per prompt.
+- Include visual style, lighting, composition, and aspect ratio.
+- Include banned elements only if user requests negative constraints.
+
+## 4) Wait For Completion (Explicit End Condition)
+
+- After submit, wait for generation state to appear.
+- Treat generation as complete only when:
+  - `停止回答` button disappears, and
+  - latest assistant response has downloadable image action.
+- If refs are stale or state is unclear, re-snapshot and retry once.
+
+## 5) Download Images
+
+- Download from the latest assistant response block (not old history blocks).
+- Click `下载完整尺寸的图片`.
+- Wait for download completion toast/progress to end before next request.
+- Repeat until target count is reached or retry budget is exhausted.
+
+## 6) Collect Downloaded Files
+
+Use bundled script:
+
+```bash
+python3 scripts/collect_downloads.py \
+  --source /var/folders/.../playwright-mcp-output/ \
+  --source ~/Downloads \
+  --target /ABS/PATH/TO/output/gemini \
+  --since <download_start_ts> \
+  --limit <N> \
+  --expected-count <N> \
+  --prefix gemini \
+  --batch-id <batch-id> \
+  --prompt "<prompt>"
+```
+
+Script behavior:
+
+- Source strategy:
+  - Prefer Playwright temp download directory first.
+  - Fallback to `~/Downloads` when primary source has no matches.
+- Filters to image extensions (`png,jpg,jpeg,webp`).
+- Uses batch naming (`<prefix>-<batch-id>-NN.ext`).
+- Dedupes by SHA-256 (current run + existing target files).
+- Captures dimensions (`width`, `height`) and writes JSON manifest.
+- Prints absolute output paths and manifest path.
+
+## 7) Failure Handling By Step
+
+- Login step:
+  - If login/captcha/MFA blocks, stop and ask user to complete manually.
+- Generation step:
+  - If failed once, retry once with minimal prompt rewrite.
+  - If still failing, record failure reason and continue remaining quota if requested.
+- Completion detection step:
+  - If `停止回答` does not disappear within timeout, retry snapshot+wait once.
+  - If still stuck, mark timeout and skip this request.
+- Download step:
+  - If click intercepted or stale ref, re-snapshot and retry once.
+  - If no file detected after timeout, mark download failure for that request.
+- Collection step:
+  - If no matching files, return manifest with failure status.
+  - If dedupe removes all files, return manifest with `no_files_after_dedupe`.
+  - If collected count < required count, return `insufficient_files`.
+
+## 8) Return Output
+
+Return:
+
+- prompt used
+- target count and successful count
+- absolute file paths for collected files
+- manifest absolute path
+- retries, failures, and skipped duplicates
+
+## 9) Reliability Rules
+
+- Re-snapshot after navigation, model switch, and generation completion.
+- If refs are stale or click intercepted, re-snapshot and retry once.
+- Do not assume static selectors across Gemini updates; rely on visible text and role-first matching.
+
+## 10) Boundaries
+
+- Do not bypass login verification, captcha, paywalls, or security checks.
+- Do not submit disallowed or unsafe image prompts.
+- Stop before posting to third-party platforms; this skill only generates and collects images.
+
+## Scripts
+
+- `scripts/collect_downloads.py`: Collect recent downloaded images with fallback sources, dedupe, and manifest.
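Note on the skill text above: the link between step 3 (`download_start_ts`) and the collection command in step 6 can be sketched roughly as follows. This is a minimal illustration and not part of the commit. `build_collect_command` and its parameter names are hypothetical; the script path comes from the skill text, and only flags that `collect_downloads.py` defines are used. `--source` is omitted on the assumption that the script's auto-discovery is wanted.

```python
from __future__ import annotations

import time


def build_collect_command(
    target_dir: str,
    download_start_ts: float,
    image_count: int,
    batch_id: str,
    prompt: str,
) -> list[str]:
    """Assemble argv for scripts/collect_downloads.py (hypothetical helper)."""
    return [
        "python3",
        "scripts/collect_downloads.py",
        "--target", target_dir,
        # The script keeps files whose mtime is at or after this timestamp.
        "--since", f"{download_start_ts:.3f}",
        "--limit", str(image_count),
        "--expected-count", str(image_count),
        "--prefix", "gemini",
        "--batch-id", batch_id,
        "--prompt", prompt,
    ]


# Record the timestamp before the first download action, collect afterwards.
start_ts = time.time()
cmd = build_collect_command(
    "/ABS/PATH/TO/output/gemini", start_ts, 3, "demo-batch", "a red kite"
)
print(" ".join(cmd))
```

Building the command as an argument list rather than a shell string avoids quoting problems when the prompt contains spaces or quotes.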
diff --git a/skills/gemini-image-web/agents/openai.yaml b/skills/gemini-image-web/agents/openai.yaml
new file mode 100644
index 0000000..d4fa8fa
--- /dev/null
+++ b/skills/gemini-image-web/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Gemini Image Web"
+  short_description: "Generate Gemini images via web, multi-request, dedupe, and manifest."
+  default_prompt: "Use $gemini-image-web to generate one image per Gemini request until target count is reached, download full-size outputs, then collect files with fallback source strategy, dedupe, and manifest metadata."
diff --git a/skills/gemini-image-web/scripts/collect_downloads.py b/skills/gemini-image-web/scripts/collect_downloads.py
new file mode 100755
index 0000000..40b29ce
--- /dev/null
+++ b/skills/gemini-image-web/scripts/collect_downloads.py
@@ -0,0 +1,367 @@
+#!/usr/bin/env python3
+"""Collect recent image downloads into a target directory with manifest output."""
+
+from __future__ import annotations
+
+import argparse
+import hashlib
+import json
+import re
+import shutil
+import subprocess
+import sys
+import time
+from datetime import datetime, timezone
+from pathlib import Path
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        description="Collect recent image downloads into a target directory."
+    )
+    parser.add_argument(
+        "--source",
+        action="append",
+        help=(
+            "Source download directory. Repeatable. "
+            "If omitted, auto-discovers Playwright temp downloads and then "
+            "falls back to ~/Downloads."
+        ),
+    )
+    parser.add_argument(
+        "--target",
+        required=True,
+        help="Target directory for collected files.",
+    )
+    parser.add_argument(
+        "--since",
+        type=float,
+        default=time.time() - 1800,
+        help="Unix timestamp lower bound for file mtime. Default: now-1800s",
+    )
+    parser.add_argument(
+        "--ext",
+        default="png,jpg,jpeg,webp",
+        help="Comma-separated file extensions to include.",
+    )
+    parser.add_argument(
+        "--limit",
+        type=int,
+        default=8,
+        help="Maximum files to collect. Default: 8",
+    )
+    parser.add_argument(
+        "--expected-count",
+        type=int,
+        default=None,
+        help="Required minimum number of collected files.",
+    )
+    parser.add_argument(
+        "--prefix",
+        default="gemini",
+        help="Filename prefix for collected files. Default: gemini",
+    )
+    parser.add_argument(
+        "--batch-id",
+        default=None,
+        help="Batch ID used in output filenames. Default: current timestamp.",
+    )
+    parser.add_argument(
+        "--manifest",
+        default=None,
+        help="Manifest output path. Default: <target>/<prefix>-<batch-id>-manifest.json",
+    )
+    parser.add_argument(
+        "--prompt",
+        default="",
+        help="Prompt text to store in manifest.",
+    )
+    parser.add_argument(
+        "--move",
+        action="store_true",
+        help="Move files instead of copying.",
+    )
+    parser.add_argument(
+        "--no-dedupe-target",
+        action="store_true",
+        help="Disable hash dedupe against existing files in target directory.",
+    )
+    return parser.parse_args()
+
+
+def unique_path(path: Path) -> Path:
+    if not path.exists():
+        return path
+    stem = path.stem
+    suffix = path.suffix
+    parent = path.parent
+    idx = 2
+    while True:
+        candidate = parent / f"{stem}-{idx}{suffix}"
+        if not candidate.exists():
+            return candidate
+        idx += 1
+
+
+def collect_candidates(source: Path, since_ts: float, allowed_ext: set[str]) -> list[Path]:
+    files: list[Path] = []
+    if not source.exists():
+        return files
+    for path in source.iterdir():
+        if not path.is_file():
+            continue
+        ext = path.suffix.lower().lstrip(".")
+        if ext not in allowed_ext:
+            continue
+        try:
+            mtime = path.stat().st_mtime
+        except OSError:
+            continue
+        if mtime >= since_ts:
+            files.append(path)
+    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
+    return files
+
+
+def discover_playwright_sources() -> list[Path]:
+    globs = (
"/var/folders/*/*/*/T/playwright-mcp-output/*", + "/tmp/playwright-mcp-output/*", + ) + candidates: list[Path] = [] + seen: set[Path] = set() + for pattern in globs: + for raw in Path("/").glob(pattern.lstrip("/")): + if not raw.is_dir(): + continue + path = raw.resolve() + if path in seen: + continue + seen.add(path) + candidates.append(path) + candidates.sort(key=lambda p: p.stat().st_mtime, reverse=True) + return candidates + + +def resolve_sources(raw_sources: list[str] | None) -> list[Path]: + if raw_sources: + return [Path(item).expanduser().resolve() for item in raw_sources] + auto_sources = discover_playwright_sources() + auto_sources.append((Path.home() / "Downloads").resolve()) + result: list[Path] = [] + seen: set[Path] = set() + for path in auto_sources: + if path in seen: + continue + seen.add(path) + result.append(path) + return result + + +def sha256_of_file(path: Path) -> str: + digest = hashlib.sha256() + with path.open("rb") as fh: + while True: + chunk = fh.read(1024 * 1024) + if not chunk: + break + digest.update(chunk) + return digest.hexdigest() + + +def dimensions_from_sips(path: Path) -> tuple[int, int] | None: + try: + proc = subprocess.run( + ["sips", "-g", "pixelWidth", "-g", "pixelHeight", str(path)], + check=False, + capture_output=True, + text=True, + ) + except OSError: + return None + if proc.returncode != 0: + return None + width_match = re.search(r"pixelWidth:\s+(\d+)", proc.stdout) + height_match = re.search(r"pixelHeight:\s+(\d+)", proc.stdout) + if not width_match or not height_match: + return None + return int(width_match.group(1)), int(height_match.group(1)) + + +def dimensions_from_png(path: Path) -> tuple[int, int] | None: + try: + with path.open("rb") as fh: + header = fh.read(24) + except OSError: + return None + if len(header) < 24 or header[:8] != b"\x89PNG\r\n\x1a\n": + return None + width = int.from_bytes(header[16:20], "big") + height = int.from_bytes(header[20:24], "big") + return width, height + + +def 
read_dimensions(path: Path) -> tuple[int, int] | None: + dims = dimensions_from_sips(path) + if dims: + return dims + if path.suffix.lower() == ".png": + return dimensions_from_png(path) + return None + + +def iso_ts(ts: float) -> str: + return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat() + + +def select_source_candidates( + sources: list[Path], since_ts: float, allowed_ext: set[str] +) -> tuple[Path | None, list[Path], list[dict[str, object]]]: + tried: list[dict[str, object]] = [] + for source in sources: + files = collect_candidates(source, since_ts, allowed_ext) + tried.append({"source": str(source), "matches": len(files)}) + if files: + return source, files, tried + return None, [], tried + + +def collect_existing_hashes(target: Path, allowed_ext: set[str]) -> set[str]: + hashes: set[str] = set() + for path in target.iterdir(): + if not path.is_file(): + continue + ext = path.suffix.lower().lstrip(".") + if ext not in allowed_ext: + continue + try: + hashes.add(sha256_of_file(path)) + except OSError: + continue + return hashes + + +def write_manifest(manifest_path: Path, payload: dict[str, object]) -> None: + manifest_path.parent.mkdir(parents=True, exist_ok=True) + with manifest_path.open("w", encoding="utf-8") as fh: + json.dump(payload, fh, ensure_ascii=False, indent=2) + fh.write("\n") + + +def main() -> int: + args = parse_args() + target = Path(args.target).expanduser().resolve() + target.mkdir(parents=True, exist_ok=True) + batch_id = args.batch_id or time.strftime("%Y%m%d-%H%M%S") + manifest_path = ( + Path(args.manifest).expanduser().resolve() + if args.manifest + else target / f"{args.prefix}-{batch_id}-manifest.json" + ) + + allowed_ext = { + ext.strip().lower().lstrip(".") + for ext in args.ext.split(",") + if ext.strip() + } + if not allowed_ext: + print("No valid extensions provided.", file=sys.stderr) + return 2 + + sources = resolve_sources(args.source) + selected_source, candidates, tried_sources = select_source_candidates( + 
+        sources, args.since, allowed_ext
+    )
+    if not candidates:
+        payload = {
+            "status": "no_matching_files",
+            "created_at": iso_ts(time.time()),
+            "batch_id": batch_id,
+            "prompt": args.prompt,
+            "target_dir": str(target),
+            "since_ts": args.since,
+            "sources_tried": tried_sources,
+            "collected_count": 0,
+            "files": [],
+        }
+        write_manifest(manifest_path, payload)
+        print("No matching files found.")
+        print(f"MANIFEST: {manifest_path}")
+        return 1
+
+    dedupe_target = not args.no_dedupe_target
+    seen_hashes: set[str] = set()
+    if dedupe_target:
+        seen_hashes.update(collect_existing_hashes(target, allowed_ext))
+
+    files: list[dict[str, object]] = []
+    skipped_duplicates = 0
+    for src in candidates:
+        if len(files) >= args.limit:
+            break
+        try:
+            src_hash = sha256_of_file(src)
+        except OSError:
+            continue
+        if src_hash in seen_hashes:
+            skipped_duplicates += 1
+            continue
+
+        idx = len(files) + 1
+        dst = target / f"{args.prefix}-{batch_id}-{idx:02d}{src.suffix.lower()}"
+        dst = unique_path(dst)
+        src_mtime = src.stat().st_mtime
+        if args.move:
+            shutil.move(str(src), str(dst))
+        else:
+            shutil.copy2(str(src), str(dst))
+        dims = read_dimensions(dst)
+        file_entry = {
+            "prompt": args.prompt,
+            "generated_at": iso_ts(src_mtime),
+            "source_filename": src.name,
+            "source_path": str(src.resolve()),
+            "target_path": str(dst.resolve()),
+            "sha256": src_hash,
+            "width": dims[0] if dims else None,
+            "height": dims[1] if dims else None,
+        }
+        files.append(file_entry)
+        seen_hashes.add(src_hash)
+
+    status = "ok"
+    exit_code = 0
+    expected_count = args.expected_count
+    if not files:
+        status = "no_files_after_dedupe"
+        exit_code = 1
+    elif expected_count is not None and len(files) < expected_count:
+        status = "insufficient_files"
+        exit_code = 1
+
+    payload = {
+        "status": status,
+        "created_at": iso_ts(time.time()),
+        "batch_id": batch_id,
+        "prompt": args.prompt,
+        "target_dir": str(target),
+        "source_dir": str(selected_source) if selected_source else None,
+        "sources_tried": tried_sources,
+        "since_ts": args.since,
+        "limit": args.limit,
+        "expected_count": expected_count,
+        "dedupe_target": dedupe_target,
+        "skipped_duplicates": skipped_duplicates,
+        "collected_count": len(files),
+        "files": files,
+    }
+    write_manifest(manifest_path, payload)
+
+    for item in files:
+        print(item["target_path"])
+    print(f"MANIFEST: {manifest_path}")
+    return exit_code
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/skills/git-push/SKILL.md b/skills/git-push/SKILL.md
new file mode 100644
index 0000000..90eed91
--- /dev/null
+++ b/skills/git-push/SKILL.md
@@ -0,0 +1,148 @@
+---
+name: git-push
+description: "Standardize safe Git push workflows: inspect repository state, configure identity, normalize staging with .gitignore, create or repair local commits (with guarded root-commit rollback), configure remote URLs, and push current branch via SSH or HTTPS with troubleshooting. Use when users ask to initialize repos, prepare clean commits, push to GitLab/GitHub, or resolve auth/push errors."
+---
+
+# Git Push
+
+## Workflow
+
+1. Inspect current repository status and risk.
+2. Configure local identity if needed.
+3. Normalize staging with `.gitignore`.
+4. Create or repair commits with a consistent commit-message policy.
+5. Configure remote URL and auth strategy.
+6. Push current branch and verify results.
+7. Return a concise operation report.
+
+## 1) Inspect Repository State
+
+- Run baseline checks:
+  - `git status --short --branch`
+  - `git rev-parse --abbrev-ref HEAD`
+  - `git log --oneline -n 5` (only if `HEAD` exists)
+  - `git remote -v`
+- Detect whether `HEAD` exists before using `HEAD~1`.
+- If unexpected file mutations appear during the run, stop and ask the user before continuing.
+- Never discard user changes by default.
+
+## 2) Configure Local Identity
+
+- If missing or explicitly requested, set:
+  - `git config --local user.name "<name>"`
+  - `git config --local user.email "<email>"`
+- Verify with:
+  - `git config --local --get user.name`
+  - `git config --local --get user.email`
+
+## 3) Normalize Staging With .gitignore
+
+- Ensure `.gitignore` contains generated artifacts when applicable:
+  - `output/`
+  - `.playwright-cli/`
+  - `__pycache__/`
+  - `*.pyc`
+  - `.DS_Store`
+- If ignored files are already tracked, untrack only matching paths first:
+  - `git rm -r --cached -- output .playwright-cli __pycache__`
+  - `git rm --cached -- '*.pyc' '.DS_Store'`
+- Rebuild whole index (`git rm -r --cached . && git add .`) only when user explicitly confirms.
+- Re-check staged files before commit.
+
+## 4) Create Or Repair Commits
+
+- Use this commit-message format by default:
+  - `<type>(<scope>): <summary>`
+- Allowed `type` values:
+  - `feat`
+  - `fix`
+  - `docs`
+  - `refactor`
+  - `test`
+  - `chore`
+- `scope` policy:
+  - Prefer including scope for multi-module repositories.
+  - Prefer `skill/git-push` for this skill.
+  - Allow omitting scope when the target area is unclear.
+- `summary` policy:
+  - Use present tense and one sentence.
+  - Keep it short and specific (about 20-60 characters when practical).
+  - Do not end with a period.
+- If the user provides an explicit commit message, use it as-is.
+- Auto-select `type` when user does not specify:
+  - Docs-only changes -> `docs`
+  - Test-only changes -> `test`
+  - Behavior/workflow additions or changes -> `feat`
+  - Bug/risk fixes -> `fix`
+  - Misc maintenance -> `chore`
+- Example messages:
+  - `feat(skill/git-push): unify naming and harden safe push defaults`
+  - `docs(skill/git-push): document commit message policy`
+
+- Create commit:
+  - `git commit -m "<message>"`
+- Undo latest commit but keep staged changes:
+  - `git reset --soft HEAD~1`
+- If only a root commit exists and user asks to undo it, run only after explicit confirmation that history is unpublished:
+  - `git rev-list --count HEAD` (must be `1`)
+  - `git status --short --branch` (no upstream tracking to remote)
+  - `branch=$(git symbolic-ref --short HEAD)`
+  - `git update-ref -d "refs/heads/$branch"`
+- Avoid destructive commands (`git reset --hard`) unless explicitly requested.
+
+## 5) Configure Remote
+
+- Add origin when missing:
+  - `git remote add origin <url>`
+- Replace origin URL when needed:
+  - `git remote set-url origin <url>`
+- Migrate existing origin if required:
+  - `git remote rename origin old-origin`
+  - `git remote add origin <url>`
+- Verify connectivity:
+  - `git ls-remote origin`
+
+## 6) Choose Authentication Strategy
+
+- Prefer SSH when server accepts user key.
+- Fall back to HTTPS + Personal Access Token (PAT) when SSH fails or policy requires.
+- SSH diagnostics:
+  - `ssh -T git@<host>`
+  - `ssh -vT git@<host>`
+- HTTPS push:
+  - `git push --set-upstream origin <branch>`
+- If prompted for HTTPS password, enter PAT (not account password).
+
+## 7) Push Recipes
+
+- Detect current branch:
+  - `branch=$(git symbolic-ref --short HEAD)`
+- Default push (new or existing repository):
+  - `git push --set-upstream origin "$branch"`
+- Push all branches/tags only when user explicitly requests bulk publish:
+  - `git push origin --all`
+  - `git push origin --tags`
+- Post-push verification:
+  - `git remote -v`
+  - `git branch -vv`
+
+## 8) Failure Handling
+
+- `Permission denied (publickey)`:
+  - Confirm key type is supported by remote.
+  - Confirm expected fingerprint matches uploaded key.
+  - Confirm key is loaded and offered by ssh client.
+- `could not read Username`:
+  - Use interactive terminal or credential helper.
+  - Retry with username + PAT.
+- `repository not found`:
+  - Confirm remote URL path and account permission.
+- `non-fast-forward`:
+  - Fetch and rebase/merge per user preference, then retry push.
+
+## 9) Boundaries
+
+- Never force-push by default.
+- Never rewrite published history unless user explicitly requests it.
+- Never commit secrets or token files intentionally.
+- Always report commands run plus resulting branch/remote status.
diff --git a/skills/git-push/agents/openai.yaml b/skills/git-push/agents/openai.yaml
new file mode 100644
index 0000000..53de144
--- /dev/null
+++ b/skills/git-push/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Git Push"
+  short_description: "Prepare clean commits and push safely"
+  default_prompt: "Use $git-push to inspect repo state, clean staging with .gitignore, repair local commits when needed, and push the current branch to GitLab/GitHub via SSH or HTTPS with safe troubleshooting. Generate commit messages using <type>(<scope>): <summary> by default, while honoring explicit user-provided messages."
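Note on the git-push skill above: the commit-message policy in its section 4 can be checked mechanically. The following is a minimal sketch and not part of the commit. It assumes the format is `type(scope): summary` with an optional scope, as the example messages in the skill suggest. `check_commit_message` and the regular expression are hypothetical, and the 20-60 character guidance is treated as advisory, so it is not enforced here.

```python
from __future__ import annotations

import re

# Allowed types as listed in the skill's commit-message policy.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")

# type, optional "(scope)", then ": " and a non-empty summary.
_SUBJECT = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()]+)\))?: (?P<summary>\S.*)$"
)


def check_commit_message(message: str) -> list[str]:
    """Return policy problems for the subject line; an empty list means it passes."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    match = _SUBJECT.match(subject)
    if match is None:
        return ["does not match type(scope): summary"]
    problems: list[str] = []
    if match.group("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type: {match.group('type')}")
    if match.group("summary").rstrip().endswith("."):
        problems.append("summary ends with a period")
    return problems


print(check_commit_message("docs(skill/git-push): document commit message policy"))
print(check_commit_message("update stuff."))
```

Because the skill says an explicit user-provided message is used as-is, a check like this would only apply to auto-generated messages.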
diff --git a/skills/xiaohongshu-engage/SKILL.md b/skills/xiaohongshu-engage/SKILL.md
new file mode 100644
index 0000000..135b390
--- /dev/null
+++ b/skills/xiaohongshu-engage/SKILL.md
@@ -0,0 +1,81 @@
+---
+name: xiaohongshu-engage
+description: "Browse XiaoHongShu (小红书) with Playwright and execute account interactions end-to-end: discover content, open notes, like, favorite, and post concise comments. Use when the user asks to '逛逛', browse feed content, or delegate 点赞/收藏/评论 on their logged-in account."
+---
+
+# Xiaohongshu Engage
+
+## Workflow
+
+1. Confirm login state before interacting.
+2. Navigate to discovery feed and identify relevant content.
+3. Open note details and perform requested engagement actions.
+4. Validate action success from UI state.
+5. Summarize exactly what was done.
+
+## 1) Confirm Login State
+
+- Snapshot current page.
+- Check that the sidebar includes the `我` entry and that profile link is accessible.
+- If not logged in, stop and ask the user to complete login first.
+
+## 2) Enter Feed And Pick Topics
+
+- Click `发现`.
+- Prefer topics related to user context or explicit interest (for example AI, coding, photography, travel).
+- Open one note at a time from the card list.
+
+## 3) Engage On A Note
+
+For each selected note:
+
+- Click `点赞`.
+- Click `收藏`.
+- Enter a concise comment and submit.
+- Keep comments specific to note content, short, and polite.
+
+Comment style:
+
+- 1 sentence, <= 35 Chinese characters preferred.
+- Avoid arguments, sarcasm, or sensitive claims.
+- Prefer useful and neutral phrasing.
+
+Examples:
+
+- `这个思路很实用,感谢分享。`
+- `观点很清晰,落地建议也很有帮助。`
+- `内容很有启发,我也在实践类似方法。`
+
+## 4) Verify Success Signals
+
+After each action, verify at least one visible signal:
+
+- Like count increases or icon state toggles.
+- `收藏成功` toast appears or favorite count changes.
+- `评论成功` toast appears and new comment is present in comment list.
+
+If a send/click fails because overlay intercepts:
+
+- Re-snapshot.
+- Wait briefly and retry.
+- Use keyboard `Enter` submission as fallback for comment sending.
+
+## 5) Handle UI Reliability
+
+- Re-snapshot after navigation, modal open/close, and any major DOM change.
+- Treat stale refs as expected behavior; refresh refs before retrying.
+- Close note modal before opening another card when interactions become blocked.
+
+## 6) Report Back To User
+
+Return a compact action log:
+
+- Which notes were engaged.
+- How many likes, favorites, and comments were completed.
+- Any failures or skipped items.
+
+## Boundaries
+
+- Do not follow users, send private messages, or change profile/account settings unless explicitly requested.
+- Do not post repeated or spammy comments across multiple notes.
+- Stop and notify user if login state is lost or account safety prompts appear.
diff --git a/skills/xiaohongshu-engage/agents/openai.yaml b/skills/xiaohongshu-engage/agents/openai.yaml
new file mode 100644
index 0000000..78dcc73
--- /dev/null
+++ b/skills/xiaohongshu-engage/agents/openai.yaml
@@ -0,0 +1,7 @@
+interface:
+  display_name: "XHS Engage"
+  short_description: "Browse feed and engage via likes, favorites, and comments."
+  default_prompt: "Use $xiaohongshu-engage to browse XiaoHongShu and interact with relevant posts through likes, favorites, and concise comments."
+
+policy:
+  allow_implicit_invocation: true
diff --git a/skills/xiaohongshu-publish-note/SKILL.md b/skills/xiaohongshu-publish-note/SKILL.md
new file mode 100644
index 0000000..162ba00
--- /dev/null
+++ b/skills/xiaohongshu-publish-note/SKILL.md
@@ -0,0 +1,175 @@
+---
+name: xiaohongshu-publish-note
+description: "Execute XiaoHongShu (小红书) image-note publishing workflow in the creator platform: open publish center, switch to 图文 mode, prepare images, upload images, fill title/body/topics, configure settings, and validate pre-publish state. Use when the user asks to 发布图文, 发笔记, or automate web publishing steps before final submit. Prefer user-provided image paths; if no image paths are provided, generate images via gemini-image-web and consume manifest target paths for upload. Support configurable publish modes (`safe_mode` and `live_mode`)."
+---
+
+# Xiaohongshu Publish Note
+
+## Workflow
+
+1. Confirm account is logged in and creator center is reachable.
+2. Open creator publish page and switch to 图文 publishing mode.
+3. Prepare image inputs: use user-provided paths first, otherwise generate via gemini-image-web.
+4. Upload images and wait for editor panel readiness.
+5. Fill title/body/topics and required settings.
+6. Validate preview and publish controls.
+7. Execute publish behavior according to publish mode.
+8. Save publish evidence and return summary.
+
+## 1) Enter Creator Publish Page
+
+- From web homepage, click left sidebar `发布`.
+- Expect new tab: `https://creator.xiaohongshu.com/publish/publish?...`.
+- Switch to the new tab before continuing.
+
+## 2) Switch To 图文 Mode
+
+- Click `上传图文` tab.
+- If tab click fails due to viewport/position issues:
+  - Resize viewport to desktop (for example 1440x1000).
+  - Re-snapshot and retry with latest refs.
+- Use `上传图片` area visibility as the final switch-success signal.
+- Do not treat tab text click success as completion by itself.
+
+## 3) Prepare Images
+
+- Prefer user-provided absolute file paths when available.
+- If user did not provide image paths:
+  - Generate images from the article topic/content using `gemini-image-web`.
+  - Read the generated manifest JSON and extract `files[*].target_path`.
+  - Use manifest-derived paths as upload input.
+- Ensure at least 1 valid image path exists before entering upload.
+
+Manifest linkage rules:
+
+- Trust manifest `status=ok` and `collected_count>=1` before upload.
+- If manifest status is not ok, stop and return image-preparation failure.
+- Avoid guessing latest files by timestamp when manifest is available.
+
+## 4) Upload Images
+
+- Click `上传图片` to open file chooser.
+- Upload absolute file paths from Step 3.
+- After upload, verify editor state:
+  - `图片编辑`
+  - image counter like `1/18`
+  - title and body input areas.
+
+## 5) Fill Core Content
+
+- Fill title textbox (`填写标题会有更多赞哦`).
+- Fill body contenteditable area.
+- Insert topics (mandatory):
+  - Click recommended topic chips, and/or
+  - Use `话题` button for manual insertion.
+  - Validate final topic count from actual editor content, not click count.
+  - Ensure total topic count is >= 5 before proceeding.
+- Verify body counter and preview panel update.
+
+Topic counting rule:
+
+- Count inserted topic tokens in the body content (for example `#话题`).
+- If token count < 5, continue inserting until count >= 5.
+- If repeated insertion still fails to reach 5, block final publish.
+
+## 6) Configure Settings
+
+- Do not add location.
+- Review content settings:
+  - `原创声明`
+  - `公开可见`
+  - `定时发布`
+  - other toggles as needed by request.
+- Ignore geolocation prompts/errors and keep location empty.
+
+## 7) Validate Pre-Publish State
+
+- Confirm right-side preview reflects title/body/topics.
+- Confirm topic count is >= 5.
+- Confirm no location tag is present in preview.
+- Confirm uploaded image count is >= 1.
+- Confirm bottom actions are visible:
+  - `暂存离开`
+  - `发布`
+- Publish hard gate (must all pass):
+  - image count >= 1
+  - topic count >= 5
+  - location is empty
+- Default safety behavior:
+  - Do not click final `发布` unless user explicitly asks for real submission now.
+
+## 8) Publish Mode
+
+- `safe_mode` (default):
+  - Run full workflow and hard-gate checks.
+  - Never click final `发布`.
+- `live_mode`:
+  - Require explicit user intent for real posting in current turn.
+  - Click final `发布` only after hard-gate checks pass.
+
+## 9) Failure Handling (Layered Retries)
+
+- Image preparation failure:
+  - If user paths are invalid, ask for corrected absolute paths.
+  - If gemini generation fails, stop publish flow and return failure reason.
+- Manifest linkage failure:
+  - If manifest missing/invalid/empty, retry reading once.
+  - If still invalid, stop and return manifest failure.
+- Upload failure:
+  - If upload area blocks or file chooser fails, re-snapshot and retry once.
+  - If still failing, stop and report upload failure.
+- Topic insertion failure:
+  - Retry insertion via alternate path (chip -> manual topic button).
+  - If topic count stays < 5, block final publish.
+- UI interaction failure:
+  - On stale refs/click interception/modal cover, wait briefly, re-snapshot, retry once.
+- Publish action failure:
+  - If publish click fails, re-snapshot and retry click once.
+  - If success page still not reached, return publish-step failure.
+- Publish gate failure:
+  - If any hard gate fails, do not click `发布`; return blocking conditions.
+
+## 10) Reliability Rules
+
+- Re-snapshot after:
+  - tab switches
+  - upload completion
+  - dropdown/modal open/close
+- On click interception or stale refs:
+  - wait briefly
+  - refresh snapshot
+  - retry with updated refs.
+
+## 11) Save Publish Evidence
+
+- On successful live publish:
+  - Wait for success indicator (for example `发布成功`).
+  - Capture success screenshot to:
+    - `/Users/xd/java/xhs/output/playwright/xhs-publish-success-.png`
+  - Record:
+    - publish time
+    - title
+    - uploaded image path list
+    - screenshot path
+
+## 12) Return Report
+
+Return a compact execution summary:
+
+- uploaded files count
+- title/body/topic/location status
+- settings changed
+- publish mode used (`safe_mode` or `live_mode`)
+- whether final publish was intentionally skipped or executed
+- evidence info when published (success marker, screenshot path, publish time)
+
+## Boundaries
+
+- Do not post misleading or spam content.
+- Do not publish real content without clear user confirmation in current turn.
+- Do not modify unrelated account settings.
+- Never add location information.
+- Never proceed to final publish when topic count is < 5.
+- Never proceed to final publish when uploaded image count is < 1.
+- Never proceed to final publish when location is non-empty.
+- Default to `safe_mode` unless user clearly requests real posting now.
diff --git a/skills/xiaohongshu-publish-note/agents/openai.yaml b/skills/xiaohongshu-publish-note/agents/openai.yaml
new file mode 100644
index 0000000..5e10ce4
--- /dev/null
+++ b/skills/xiaohongshu-publish-note/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "XHS Publish Note"
+  short_description: "Publish XHS image notes with manifest image linkage, hard gates, and publish modes"
+  default_prompt: "Use $xiaohongshu-publish-note to publish XiaoHongShu image notes: prefer user image paths, otherwise generate via $gemini-image-web and use manifest target paths, enforce hard gates (images>=1, topics>=5, no location), and run in safe_mode by default unless live_mode is explicitly requested."
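Note on the publish-note skill above: its manifest linkage rules (trust `status=ok` and `collected_count>=1`, then read `files[*].target_path`) can be sketched as below. This is a minimal illustration and not part of the commit. `load_upload_paths` is a hypothetical helper, the manifest fields are the ones `collect_downloads.py` writes, and the demo manifest and its path are made up for the example.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def load_upload_paths(manifest_path: str) -> list[str]:
    """Return target paths from a manifest, or raise ValueError if it is unusable."""
    try:
        payload = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        raise ValueError(f"manifest missing or invalid: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("manifest is not a JSON object")
    if payload.get("status") != "ok":
        raise ValueError(f"manifest status is {payload.get('status')!r}")
    count = payload.get("collected_count")
    if not isinstance(count, int) or count < 1:
        raise ValueError("manifest reports no collected files")
    # Use the recorded paths instead of guessing the latest files by timestamp.
    paths = [
        str(entry["target_path"])
        for entry in payload.get("files", [])
        if isinstance(entry, dict) and entry.get("target_path")
    ]
    if not paths:
        raise ValueError("manifest lists no target paths")
    return paths


# Demo with a throwaway manifest (contents are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "gemini-demo-manifest.json"
    demo.write_text(
        json.dumps(
            {
                "status": "ok",
                "collected_count": 1,
                "files": [{"target_path": "/ABS/PATH/TO/output/gemini/gemini-demo-01.png"}],
            }
        ),
        encoding="utf-8",
    )
    demo_paths = load_upload_paths(str(demo))
    print(demo_paths)
```

Raising on any non-`ok` status matches the skill's rule to stop and return an image-preparation failure rather than upload a partial set.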