From 84bd4fd9a701cc043762c73655f0ec4d747e7e3c Mon Sep 17 00:00:00 2001 From: kurihada Date: Thu, 19 Mar 2026 02:55:22 +0800 Subject: [PATCH] Add design docs and gitignore --- .gitignore | 7 + docs/architecture.md | 161 +++++++++ docs/blog-project-example.md | 281 +++++++++++++++ docs/council-review.md | 545 +++++++++++++++++++++++++++++ docs/implementation-roadmap.md | 315 +++++++++++++++++ docs/inbox-cli.md | 479 ++++++++++++++++++++++++++ docs/orch-cli.md | 602 +++++++++++++++++++++++++++++++++ docs/worktree-execution.md | 225 ++++++++++++ 8 files changed, 2615 insertions(+) create mode 100644 .gitignore create mode 100644 docs/architecture.md create mode 100644 docs/blog-project-example.md create mode 100644 docs/council-review.md create mode 100644 docs/implementation-roadmap.md create mode 100644 docs/inbox-cli.md create mode 100644 docs/orch-cli.md create mode 100644 docs/worktree-execution.md diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..5ffc648 --- /dev/null +++ b/.gitignore @@ -0,0 +1,7 @@ +.agents/ +.orch/ +bin/ +dist/ +*.db +coverage.out +.DS_Store diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..6203da3 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,161 @@ +# Agent Coordination Architecture + +## Purpose + +This document defines the system split between the worker-facing `inbox` layer and the leader-facing `orch` layer. + +The design target is a local, file-portable agent coordination stack: + +- `inbox`: durable communication bus +- `orch`: task graph and scheduling control plane +- worktree-backed task execution for code-writing workers +- optional user-facing council review workflow on top of `orch` +- shared SQLite database file +- leader and workers coordinated through stable CLI commands + +## Why Two Layers + +`inbox` and `orch` solve different problems. + +- `inbox` answers: how do agents exchange durable messages, claim work, report progress, and return results? 
+- `orch` answers: what work exists, which tasks are ready, who should get them, and what happens after a block, failure, or retry? + +If `inbox` is reduced to pure chat storage, the scheduler must reconstruct state from message history and ownership becomes ambiguous. If `inbox` tries to become a full scheduler, worker concerns and leader concerns get mixed into one unstable interface. + +## Role Model + +- `user`: talks only to the leader +- `leader`: owns the overall goal, task graph, acceptance criteria, and final integration +- `worker`: executes one assigned task at a time and reports through `inbox` +- `inbox`: durable thread/message/lease/artifact store +- `orch`: run/task/dependency/dispatch state machine built on top of `inbox` + +## Default Usage Rules + +- The leader should use `orch` as the default control surface. +- The leader may use `inbox` directly for inspection or manual repair. +- Workers should use `inbox` only. +- Workers should not use `orch`. +- User-facing discussion stays with the leader. +- Code-writing workers should run in `orch`-assigned Git worktrees, not in the user's primary checkout. + +## Shared Storage Model + +Both CLIs should point at the same SQLite file. + +- `inbox` owns communication tables such as threads, messages, leases, and artifacts. +- `orch` owns scheduling tables such as runs, tasks, dependencies, and attempts. +- both layers append to a shared event stream for blocking waits +- `orch dispatch` creates or updates `inbox` threads. +- `orch reconcile` reads `inbox` state and updates task state. + +This preserves a clean boundary while keeping deployment simple. + +## Worker Execution Model + +For code tasks, execution should be isolated from the user's primary checkout. 
+ +- `orch dispatch` should create a task-attempt worktree +- the assigned worktree path should be stored in attempt metadata and inbox task payload +- the worker runtime should execute inside that worktree +- strict mode should require a committed base revision + +See [worktree-execution.md](/home/kurihada/project/ai-workflow-skill/docs/worktree-execution.md) for the full lifecycle. + +## Event-Driven Waiting + +The leader does not receive worker messages as an in-memory push. Workers write state into `inbox`, and the leader must read it back through CLI commands. + +The intended solution is event-driven blocking waits, not ad hoc `sleep` loops. + +- leaders should use `orch wait` +- blocked workers should use `inbox wait-reply` +- low-level polling may still exist internally, but it should be hidden inside the CLI + +This means there is still one logical leader. The extra behavior is a blocking wait primitive, not a second leader. + +## Shared Event Stream + +To support blocking waits cleanly, both layers should append rows to a shared `events` table. + +Typical emitters: + +- `inbox`: claim, progress, blocked, answer, done, fail, cancel +- `orch`: dispatch, answer, retry, reassign, cancel, reconcile-driven task state changes + +Typical consumers: + +- `orch wait`: watches run-scoped task events for the leader +- `inbox wait-reply`: watches thread-scoped reply events for a blocked worker + +Every waiter should use a monotonic cursor such as `event_id` or `message_id`, so it can resume safely without reprocessing old events. 
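The cursor-based waiting described above can be sketched as follows. This is a minimal illustration, not the project's actual schema or implementation: the `events` column names, the helpers `emit` and `wait_for_events`, and the `task_dispatched` event kind are all assumptions made for the example. The `task_blocked`, `task_done`, and `task_failed` kinds are borrowed from the `orch wait --for` usage in the blog example. Polling stays hidden inside the helper, which is the behavior the text asks the CLI to provide.

```python
import sqlite3
import time

# Hypothetical schema: column names are illustrative, not the project's DDL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    event_id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id   TEXT NOT NULL,
    kind     TEXT NOT NULL,
    task_id  TEXT
);
"""


def emit(conn, run_id, kind, task_id=None):
    """Append one event row and return its monotonic event_id."""
    cur = conn.execute(
        "INSERT INTO events (run_id, kind, task_id) VALUES (?, ?, ?)",
        (run_id, kind, task_id),
    )
    conn.commit()
    return cur.lastrowid


def wait_for_events(conn, run_id, kinds, after_event=0,
                    timeout_seconds=5.0, poll_interval=0.05):
    """Block until run-scoped events newer than after_event exist, or time out.

    Returns (rows, next_cursor). On timeout the cursor is returned unchanged,
    so the caller can resume later without reprocessing old events.
    """
    placeholders = ",".join("?" for _ in kinds)
    query = (
        "SELECT event_id, kind, task_id FROM events "
        "WHERE run_id = ? AND event_id > ? AND kind IN (%s) "
        "ORDER BY event_id" % placeholders
    )
    deadline = time.monotonic() + timeout_seconds
    while True:
        rows = conn.execute(query, (run_id, after_event, *kinds)).fetchall()
        if rows:
            # The last matching event_id becomes the caller's next cursor.
            return rows, rows[-1][0]
        if time.monotonic() >= deadline:
            return [], after_event
        time.sleep(poll_interval)  # low-level polling, hidden from the caller


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    emit(conn, "blog_mvp_001", "task_dispatched", "T1")  # assumed event kind
    emit(conn, "blog_mvp_001", "task_done", "T1")
    rows, cursor = wait_for_events(
        conn, "blog_mvp_001", ["task_blocked", "task_done", "task_failed"]
    )
    print(rows, cursor)
```

Passing the returned cursor back as `after_event` on the next call is what lets a waiter resume after a restart without seeing the same event twice.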
+ +## Recommended Binary Layout + +The recommended v1 shape is: + +- `inbox` binary for communication primitives +- `orch` binary for leader-side planning and scheduling +- one shared `--db PATH` + +If packaging later favors a single binary, the same model can be exposed as command groups: + +- `agentctl inbox ...` +- `agentctl orch ...` + +## Responsibility Split + +`inbox` should own: + +- directed messages +- durable threads +- worker claiming and leases +- progress, blocked, result, and failure events +- artifact references +- thread history and watch functionality +- thread-scoped waiting for replies + +`orch` should own: + +- runs +- task graph and dependencies +- ready queue calculation +- dispatch decisions +- task-attempt worktree allocation +- blocked queue review for the leader +- retries, reassignment, and cancellation +- mapping task attempts to inbox threads +- run-scoped waiting for actionable events +- reusable higher-level workflows such as council review + +## What Not To Mix + +Do not put these into `inbox`: + +- dependency graph logic +- automatic worker selection policy +- retry policy +- acceptance-driven task completion logic + +Do not put these into `orch`: + +- worker claiming +- low-level message append/reply primitives +- raw thread history storage + +## Reading Order + +- [inbox-cli.md](/home/kurihada/project/ai-workflow-skill/docs/inbox-cli.md): worker-facing bus and low-level message protocol +- [orch-cli.md](/home/kurihada/project/ai-workflow-skill/docs/orch-cli.md): leader-facing scheduler and task graph control plane +- [worktree-execution.md](/home/kurihada/project/ai-workflow-skill/docs/worktree-execution.md): strict worktree model for code-writing task attempts +- [council-review.md](/home/kurihada/project/ai-workflow-skill/docs/council-review.md): user-facing three-reviewer brainstorm and voting workflow +- [implementation-roadmap.md](/home/kurihada/project/ai-workflow-skill/docs/implementation-roadmap.md): handoff-oriented 
implementation order and next steps +- [blog-project-example.md](/home/kurihada/project/ai-workflow-skill/docs/blog-project-example.md): concrete example using both layers + +## Skills + +The intended skill split mirrors the CLI split. + +- `inbox` skill: used when an agent needs to fetch work, claim a thread, send progress, ask blocked questions, reply, or return results through `inbox` +- `orchestrator` skill: used when the leader needs to create runs, decompose tasks, manage dependencies, dispatch ready work, inspect blocks, answer them, retry failures, or reassign work through `orch` +- `council-review` skill: used when the user explicitly wants a structured three-reviewer brainstorm or review with grouped and tallied recommendations diff --git a/docs/blog-project-example.md b/docs/blog-project-example.md new file mode 100644 index 0000000..a984b73 --- /dev/null +++ b/docs/blog-project-example.md @@ -0,0 +1,281 @@ +# Blog Project Example + +## Goal + +This document simulates how a leader should use `orch` and how workers should use `inbox` for a blog MVP. 
+ +Assumptions: + +- stack: `Next.js + PostgreSQL + Prisma` +- features: public blog pages, admin login, post CRUD, tags, basic tests +- leader owns planning, architecture, dependency decisions, and final integration + +## Task Graph + +The leader decomposes the work into these tasks: + +- `T1`: project skeleton and base wiring +- `T2`: data model and migrations +- `T3`: auth and API contract +- `T4`: backend post and tag APIs +- `T5`: admin UI +- `T6`: public blog UI +- `T7`: QA and acceptance +- `T8`: final integration and user-facing handoff + +Dependencies: + +- `T2` depends on `T1` +- `T3` depends on `T2` +- `T4` depends on `T3` +- `T5` depends on `T3` +- `T6` depends on `T3` +- `T7` depends on `T4`, `T5`, and `T6` +- `T8` depends on `T7` + +## Who Uses What + +- leader uses the `orchestrator` skill and `orch` CLI +- workers use the `inbox` skill and `inbox` CLI +- `orch` is the leader's main interface +- `inbox` is the worker's main interface + +## Simulated Leader Flow + +### 1. Create the Run + +```bash +orch run init --db .agents/coord.db --run blog_mvp_001 --goal "Build blog MVP" --summary "Public blog plus admin CRUD" --json +``` + +### 2. 
Register Tasks + +```bash +orch task add --run blog_mvp_001 --task T1 --title "Project skeleton" --summary "Initialize app structure, env template, and DB wiring" --default-to foundation-worker --json +orch task add --run blog_mvp_001 --task T2 --title "Data model and migrations" --summary "Create users, posts, tags, and post_tags schema" --default-to db-worker --json +orch task add --run blog_mvp_001 --task T3 --title "Auth and API contract" --summary "Define admin auth flow and CRUD contract" --default-to backend-worker --json +orch task add --run blog_mvp_001 --task T4 --title "Backend APIs" --summary "Implement post and tag CRUD" --default-to backend-worker --json +orch task add --run blog_mvp_001 --task T5 --title "Admin UI" --summary "Implement login and post management pages" --default-to admin-ui-worker --json +orch task add --run blog_mvp_001 --task T6 --title "Public blog UI" --summary "Implement homepage, post detail, and tag page" --default-to frontend-worker --json +orch task add --run blog_mvp_001 --task T7 --title "QA and acceptance" --summary "Run smoke tests and verify MVP acceptance" --default-to qa-worker --json +orch task add --run blog_mvp_001 --task T8 --title "Final integration" --summary "Leader verifies final system and prepares handoff" --json +``` + +### 3. Register Dependencies + +```bash +orch dep add --run blog_mvp_001 --task T2 --depends-on T1 --json +orch dep add --run blog_mvp_001 --task T3 --depends-on T2 --json +orch dep add --run blog_mvp_001 --task T4 --depends-on T3 --json +orch dep add --run blog_mvp_001 --task T5 --depends-on T3 --json +orch dep add --run blog_mvp_001 --task T6 --depends-on T3 --json +orch dep add --run blog_mvp_001 --task T7 --depends-on T4 --json +orch dep add --run blog_mvp_001 --task T7 --depends-on T5 --json +orch dep add --run blog_mvp_001 --task T7 --depends-on T6 --json +orch dep add --run blog_mvp_001 --task T8 --depends-on T7 --json +``` + +### 4. 
Ask What Is Ready + +```bash +orch ready --run blog_mvp_001 --json +``` + +Expected result: + +```json +{ + "ready": ["T1"] +} +``` + +### 5. Dispatch `T1` + +```bash +orch dispatch --run blog_mvp_001 --task T1 --to foundation-worker --body "Set up the app skeleton, env.example, Prisma initialization, base routes, and startup instructions." --json +``` + +The leader does not hand-write `inbox send`. `orch dispatch` does that under the hood. + +If nothing else is actionable after dispatch, the leader should block on `orch wait` instead of sleeping: + +```bash +orch wait --run blog_mvp_001 --for task_blocked,task_done,task_failed --after-event 0 --timeout-seconds 900 --json +``` + +## Simulated Worker Flow + +### 6. Foundation Worker Polls Inbox + +```bash +inbox fetch --db .agents/coord.db --agent foundation-worker --status pending --json +inbox claim --db .agents/coord.db --agent foundation-worker --thread thr_t1_attempt1 --lease-seconds 1800 --json +inbox update --db .agents/coord.db --agent foundation-worker --thread thr_t1_attempt1 --status in_progress --summary "Initializing project and wiring Prisma" --json +``` + +### 7. Foundation Worker Finishes + +```bash +inbox done --db .agents/coord.db --agent foundation-worker --thread thr_t1_attempt1 --summary "Project skeleton complete" --body "Project starts locally, DB wiring exists, and env template is present." --json +``` + +## Leader Reconciliation Loop + +### 8. Leader Reconciles + +```bash +orch reconcile --run blog_mvp_001 --json +orch ready --run blog_mvp_001 --json +``` + +Expected result after reconciliation: + +```json +{ + "ready": ["T2"] +} +``` + +### 9. Leader Dispatches `T2` + +```bash +orch dispatch --run blog_mvp_001 --task T2 --to db-worker --body "Create the blog schema with users, posts, tags, and post_tags. Include migration files and keep the schema MVP-scoped." --json +``` + +### 10. Worker Gets Blocked + +The database worker discovers a missing product decision. 
+ +```bash +inbox update --db .agents/coord.db --agent db-worker --thread thr_t2_attempt1 --status blocked --summary "Need post status values" --payload-json '{"question":"Should MVP support only draft and published, or more states?"}' --json +inbox wait-reply --db .agents/coord.db --thread thr_t2_attempt1 --after-event 0 --timeout-seconds 1800 --json +``` + +### 11. Leader Reviews Blocked Work Through `orch` + +```bash +orch reconcile --run blog_mvp_001 --json +orch blocked --run blog_mvp_001 --json +``` + +Expected result: + +```json +{ + "blocked": [ + { + "task_id": "T2", + "thread_id": "thr_t2_attempt1", + "question": "Should MVP support only draft and published, or more states?" + } + ] +} +``` + +### 12. Leader Answers Without Leaving the Scheduler + +```bash +orch answer --run blog_mvp_001 --task T2 --body "Use only draft and published for the MVP. Do not add archived or soft delete yet." --json +``` + +`orch answer` writes the response back into the mapped inbox thread. + +### 13. `T2` Completes, Then `T3` Runs + +The worker resumes and finishes through `inbox done`. + +The leader continues the same loop: + +```bash +orch reconcile --run blog_mvp_001 --json +orch ready --run blog_mvp_001 --json +orch dispatch --run blog_mvp_001 --task T3 --to backend-worker --body "Define admin auth and the API contract for posts and tags. Output the contract as a project artifact and keep UI workers unblocked." --json +``` + +## Parallel Stage + +### 14. `T3` Finishes and Unblocks `T4`, `T5`, and `T6` + +```bash +orch reconcile --run blog_mvp_001 --json +orch ready --run blog_mvp_001 --json +``` + +Expected result: + +```json +{ + "ready": ["T4", "T5", "T6"] +} +``` + +### 15. Leader Dispatches Three Tasks + +```bash +orch dispatch --run blog_mvp_001 --task T4 --to backend-worker --body "Implement the post and tag APIs exactly as defined by the contract." 
--json +orch dispatch --run blog_mvp_001 --task T5 --to admin-ui-worker --body "Build the admin login and post management UI against the published API contract." --json +orch dispatch --run blog_mvp_001 --task T6 --to frontend-worker --body "Build the public blog pages against the published API contract." --json +``` + +### 16. Another Block Arrives + +The admin UI worker asks whether to use a rich text editor. + +```bash +inbox update --db .agents/coord.db --agent admin-ui-worker --thread thr_t5_attempt1 --status blocked --summary "Editor choice undecided" --payload-json '{"question":"Should the admin editor use a rich text editor or plain textarea in MVP?"}' --json +``` + +Leader response: + +```bash +orch reconcile --run blog_mvp_001 --json +orch blocked --run blog_mvp_001 --json +orch answer --run blog_mvp_001 --task T5 --body "Use a plain textarea in MVP. Avoid adding a rich text dependency in this phase." --json +``` + +The worker then resumes after `inbox wait-reply` wakes with the leader's answer. No side uses blind `sleep` loops. + +## QA and Follow-Up Fix + +### 17. QA Runs After `T4`, `T5`, and `T6` + +```bash +orch reconcile --run blog_mvp_001 --json +orch ready --run blog_mvp_001 --json +orch dispatch --run blog_mvp_001 --task T7 --to qa-worker --body "Verify login, post CRUD, public blog pages, and tag filtering. Return minimal repro steps for any failures." --json +``` + +### 18. QA Finds a Contract Mismatch + +QA reports that deleting a post returns an unexpected response shape. 
+ +The leader can create a follow-up fix task: + +```bash +orch task add --run blog_mvp_001 --task T7a --title "Fix delete post response mismatch" --summary "Align delete response with the published API contract" --default-to backend-worker --json +orch dep add --run blog_mvp_001 --task T7a --depends-on T7 --json +orch reconcile --run blog_mvp_001 --json +orch ready --run blog_mvp_001 --json +orch dispatch --run blog_mvp_001 --task T7a --to backend-worker --body "Fix the delete post response so it matches the contract used by QA and admin UI." --json +``` + +## Final Leader Responsibilities + +The leader still owns: + +- keeping the run coherent +- answering blocked questions +- deciding scope reductions such as textarea instead of rich text +- adding fix tasks after QA +- verifying final acceptance before handoff + +The workers do not own those cross-task decisions. + +## What This Example Shows + +- leader planning and dispatch happen through `orch` +- worker execution happens through `inbox` +- blocked handling stays leader-controlled +- `inbox` is the durable bus +- `orch` is the actual scheduling surface +- leader and workers wait on blocking CLI primitives rather than manual sleeps diff --git a/docs/council-review.md b/docs/council-review.md new file mode 100644 index 0000000..758f189 --- /dev/null +++ b/docs/council-review.md @@ -0,0 +1,545 @@ +# Council Review Workflow + +## Purpose + +This document defines a user-facing brainstorming and review workflow where three reviewer agents analyze the same target from different perspectives, return structured suggestions, and then have their suggestions grouped and tallied. + +This is intended to be exposed as a separate skill because a user may explicitly ask for this workflow, for example: + +- "Use the council review skill" +- "Start a three-reviewer brainstorm" +- "Have three reviewers propose optimizations" + +Under the hood, this workflow should reuse `orch` and `inbox`. 
It does not need a separate infrastructure binary in v1. + +## Why This Is A Separate Skill + +This workflow is not just generic scheduling. It has: + +- fixed reviewer roles +- a fixed collection and tally process +- a distinct output format +- user-visible semantics around agreement and disagreement + +That makes it a good skill boundary even if the execution still rides on `orch`. + +## Default Model + +The default council has three reviewer agents with intentionally different lenses: + +- `architecture-reviewer`: architecture, boundaries, interfaces, composition +- `implementation-reviewer`: code simplicity, maintainability, duplication, practicality +- `risk-reviewer`: regressions, correctness, security, operability + +These are roles, not necessarily permanently running services. `orch council start` may dispatch them as three worker tasks for the same target. + +## Default Choices + +If the user does not specify otherwise, use these defaults: + +- analysis only, no code changes +- show `consensus` and `majority` in the main report +- keep `minority` out of the main report unless it is unusually valuable +- fixed reviewer roles: `architecture-reviewer`, `implementation-reviewer`, `risk-reviewer` +- support both text targets and codebase targets +- emit both a human-readable markdown report and `--json` +- persist council inputs, reviewer outputs, grouped recommendations, and final report metadata in `orch` +- use `normal` proposal grouping instead of `strict` +- use `low|medium|high` confidence values +- reuse the normal `orch run_id` namespace rather than creating a separate id space + +These defaults are intended to make the workflow useful out of the box without making it too expensive or too conservative. + +## What The Council Produces + +Each reviewer should return: + +- key findings +- proposed modifications +- rationale +- confidence +- optional risks or tradeoffs + +The system then groups similar proposals across reviewers and assigns a vote count. 
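The grouping and counting step can be sketched as follows. This is a rough illustration under stated assumptions, not the tally implementation: `normalize` and `tally` are hypothetical helpers, and the matching here is plain text normalization, which is closer to `strict` grouping than to the `normal` mode this document recommends. The bucket names follow the Voting Model section of this document, and the reviewer count is assumed to be fixed at three as in v1.

```python
from collections import defaultdict

# Bucket names come from the Voting Model; keys are supporter counts out of 3.
BUCKETS = {3: "consensus", 2: "majority", 1: "minority"}


def normalize(proposal):
    """Naive stand-in for 'normalized intent' (an assumption for this sketch):
    lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(proposal.lower().split()).rstrip(".")


def tally(findings):
    """Group (reviewer_role, proposal) pairs and count distinct supporters.

    Returns a list of group dicts ordered by descending support. A reviewer
    who repeats the same proposal is counted only once.
    """
    supporters = defaultdict(set)
    first_text = {}
    for role, proposal in findings:
        key = normalize(proposal)
        supporters[key].add(role)
        # Keep the first phrasing seen as the group's display text.
        first_text.setdefault(key, proposal)

    groups = [
        {
            "proposal": first_text[key],
            "support_count": len(roles),
            "supporters": sorted(roles),
            # Assumes exactly three reviewers, so the count is always 1 to 3.
            "bucket": BUCKETS[len(roles)],
        }
        for key, roles in supporters.items()
    ]
    groups.sort(key=lambda g: -g["support_count"])
    return groups
```

A real tally step would likely need a semantic similarity measure in place of `normalize`; the counting and bucketing logic would stay the same.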
+ +## Voting Model + +Do not collapse everything into a single "all or nothing" rule by default. + +Recommended output buckets: + +- `consensus`: 3 of 3 reviewers support the proposal +- `majority`: 2 of 3 reviewers support the proposal +- `minority`: 1 of 3 reviewers supports the proposal + +Recommended behavior: + +- if the user asks for "only unanimous ideas", show only `consensus` +- otherwise show `consensus` first, then `majority`, and only include `minority` if it is especially insightful + +This keeps user feedback useful without making the system overly conservative. + +## Execution Model + +The council workflow should be implemented as an `orch council` command group. + +Recommended phases: + +1. create a council review run for a target +2. dispatch three reviewer tasks +3. wait for all three reviewer responses +4. group similar suggestions +5. tally votes +6. produce a report for the leader or directly for the user + +The recommended default report policy is: + +- main report: `consensus` and `majority` +- optional appendix: `minority` + +## CLI Shape + +Recommended command group: + +- `orch council start` +- `orch council wait` +- `orch council tally` +- `orch council report` + +### `orch council start` + +Create and dispatch a three-reviewer council run. + +Suggested flags: + +- `--run RUN_ID` +- `--target TEXT` +- `--target-file PATH` +- `--target-type text|repo|mixed` +- `--mode brainstorm|review` +- `--only-unanimous` +- `--output markdown|json|both` + +Behavior: + +- creates three review tasks with fixed reviewer roles +- dispatches them through `orch` +- records the council metadata for later tally + +Default behavior: + +- `--mode brainstorm` +- `--target-type mixed` +- `--output both` +- unanimous-only disabled unless explicitly requested +- reviewer count fixed at `3` in v1 + +### `orch council wait` + +Block until all reviewer outputs arrive or the timeout is reached. 
+ +Suggested flags: + +- `--run RUN_ID` +- `--timeout-seconds N` + +### `orch council tally` + +Group similar suggestions and count supporting reviewers. + +Suggested flags: + +- `--run RUN_ID` +- `--similarity strict|normal` +- `--json` + +Behavior: + +- reads the three reviewer outputs +- groups proposals by normalized intent +- records supporter count and dissent +- persists grouped recommendations in `orch` storage + +### `orch council report` + +Produce the final council report. + +Suggested flags: + +- `--run RUN_ID` +- `--show consensus|majority|minority|all` +- `--json` + +Default behavior: + +- report `consensus,majority` +- also allow a markdown artifact for user-facing output + +## Run Identity + +Council runs should reuse the existing `orch run_id` namespace. + +This keeps: + +- storage simpler +- waiting and event handling consistent +- reporting compatible with the rest of `orch` + +There is no need for a second run identifier system in v1. + +## Supported Inputs + +The workflow should support three common target shapes: + +- `text`: a prompt, design note, requirement, or copied problem statement +- `repo`: a repository path or code snapshot to inspect +- `mixed`: a text brief plus repository context + +Recommended default is `mixed` because many useful brainstorms need both goal context and code context. + +## Target Reference Schema + +The council input should allow both free text and explicit references. 
+ +Suggested shape: + +```json +{ + "run_id": "council_blog_001", + "target": { + "target_type": "mixed", + "prompt": "Review the current blog architecture and propose optimizations.", + "target_file": "brief.md", + "repo_path": ".", + "task_id": "T4" + } +} +``` + +Recommended field meanings: + +- `prompt`: optional human brief +- `target_file`: optional design note or copied context file +- `repo_path`: optional repository root for code inspection +- `task_id`: optional existing `orch` task reference when the brainstorm is about a specific task + +This keeps the input flexible without making the protocol ambiguous. + +## Confidence Enum + +Use a simple confidence enum in v1: + +- `low` +- `medium` +- `high` + +Avoid more granular values unless later evidence shows they are useful. + +## Reviewer Output Schema + +Each reviewer should return a structured document like: + +```json +{ + "reviewer_role": "architecture-reviewer", + "findings": [ + { + "title": "Split API contract from UI implementation details", + "summary": "The current structure mixes transport contracts with view concerns.", + "proposal": "Move API contract definitions into a dedicated module and keep UI-specific mapping local to the UI layer.", + "rationale": "This lowers coupling and reduces integration churn.", + "confidence": "high", + "tags": ["architecture", "coupling"], + "target_refs": { + "repo_path": ".", + "files": ["src/api/contracts.ts", "src/ui/admin/posts.tsx"] + } + } + ] +} +``` + +The tally step should map semantically similar `proposal` values into one grouped recommendation. + +Required reviewer output expectations: + +- `reviewer_role` must match one of the assigned council roles +- `confidence` must use `low|medium|high` +- `proposal` should describe one actionable recommendation +- `target_refs` may point to files, repo path, or task id when relevant + +## Grouped Recommendation Schema + +After tally, the system should persist grouped proposals with support counts. 
+ +Suggested shape: + +```json +{ + "run_id": "council_blog_001", + "grouped_recommendations": [ + { + "group_id": "grp_01", + "proposal": "Move API contract definitions into a dedicated module.", + "bucket": "consensus", + "support_count": 3, + "supporters": [ + "architecture-reviewer", + "implementation-reviewer", + "risk-reviewer" + ], + "dissenters": [], + "rationale_summary": "All three reviewers agree the current structure is too coupled.", + "tags": ["architecture", "coupling"], + "source_finding_ids": [ + "architecture-reviewer:f01", + "implementation-reviewer:f03", + "risk-reviewer:f02" + ] + } + ] +} +``` + +## Proposal Grouping Rules + +The default grouping mode should be `normal`. + +Recommended interpretation: + +- `strict`: only group nearly identical proposals +- `normal`: group proposals that clearly recommend the same underlying change even if phrased differently + +In v1, `normal` should be the default because it is more useful for brainstorming workflows. + +## Persistence Model + +Council review should persist its state in `orch` instead of remaining purely ephemeral. + +Recommended persisted objects: + +- council run metadata +- council reviewer assignments +- reviewer outputs +- grouped recommendations +- final report metadata and artifact paths + +This makes it possible to: + +- resume after interruption +- audit how a recommendation was formed +- regenerate user-facing reports +- compare multiple council runs over time + +## JSON Output Shapes + +`orch council start` should return enough information for downstream automation. 
+ +Suggested shape: + +```json +{ + "ok": true, + "command": "council-start", + "run_id": "council_blog_001", + "mode": "brainstorm", + "target_type": "mixed", + "output": "both", + "reviewers": [ + { + "reviewer_role": "architecture-reviewer", + "task_id": "CR1", + "status": "dispatched" + }, + { + "reviewer_role": "implementation-reviewer", + "task_id": "CR2", + "status": "dispatched" + }, + { + "reviewer_role": "risk-reviewer", + "task_id": "CR3", + "status": "dispatched" + } + ] +} +``` + +`orch council tally` should return grouped recommendations. + +Suggested shape: + +```json +{ + "ok": true, + "command": "council-tally", + "run_id": "council_blog_001", + "similarity": "normal", + "counts": { + "consensus": 2, + "majority": 3, + "minority": 1 + }, + "grouped_recommendations": [ + { + "group_id": "grp_01", + "bucket": "consensus", + "support_count": 3, + "proposal": "Move API contract definitions into a dedicated module." + } + ] +} +``` + +`orch council report` should support both machine and human output. + +Suggested JSON shape: + +```json +{ + "ok": true, + "command": "council-report", + "run_id": "council_blog_001", + "show": ["consensus", "majority"], + "report_artifacts": [ + { + "kind": "markdown", + "path": ".orch/reports/council_blog_001.md" + } + ], + "summary": { + "consensus": 2, + "majority": 3, + "minority": 1 + } +} +``` + +## Read-Only By Default + +This workflow should be analysis-first by default. + +- reviewers should inspect the target and return suggestions +- reviewers should not change code unless the user explicitly asks for proposal patches + +If patch proposals are later enabled, they can still run through `orch` with worktrees, but that should be a separate mode. 
+ +## When To Use It + +Use this workflow when: + +- the user explicitly asks for structured brainstorming +- the user wants multiple perspectives on a design or codebase +- the user wants proposals ranked by agreement +- the user wants a more rigorous alternative to a single-agent suggestion list + +Do not use it when: + +- the task is a simple direct implementation request +- a single code review is enough +- time or cost should be minimized + +## Relationship To Existing Docs + +- [architecture.md](/home/kurihada/project/ai-workflow-skill/docs/architecture.md): overall system split +- [orch-cli.md](/home/kurihada/project/ai-workflow-skill/docs/orch-cli.md): leader-facing scheduling surface used to execute the council +- [inbox-cli.md](/home/kurihada/project/ai-workflow-skill/docs/inbox-cli.md): transport used by reviewer tasks + +## Recommended Storage Shape + +One simple approach is to add council-specific tables inside the shared `orch` database. + +Suggested tables: + +```sql +CREATE TABLE IF NOT EXISTS council_runs ( + run_id TEXT PRIMARY KEY, + mode TEXT NOT NULL, + target_type TEXT NOT NULL, + output_mode TEXT NOT NULL, + only_unanimous INTEGER NOT NULL DEFAULT 0, + created_at TEXT NOT NULL, + updated_at TEXT NOT NULL +); + +CREATE TABLE IF NOT EXISTS council_reviewers ( + run_id TEXT NOT NULL, + reviewer_role TEXT NOT NULL, + task_id TEXT NOT NULL, + status TEXT NOT NULL, + PRIMARY KEY (run_id, reviewer_role) +); + +CREATE TABLE IF NOT EXISTS council_findings ( + run_id TEXT NOT NULL, + reviewer_role TEXT NOT NULL, + finding_id TEXT NOT NULL, + title TEXT NOT NULL, + summary TEXT NOT NULL, + proposal TEXT NOT NULL, + rationale TEXT NOT NULL, + confidence TEXT NOT NULL, + tags_json TEXT NOT NULL DEFAULT '[]', + target_refs_json TEXT NOT NULL DEFAULT '{}', + PRIMARY KEY (run_id, reviewer_role, finding_id) +); + +CREATE TABLE IF NOT EXISTS council_groups ( + run_id TEXT NOT NULL, + group_id TEXT NOT NULL, + proposal TEXT NOT NULL, + bucket TEXT NOT NULL, + 
support_count INTEGER NOT NULL, + supporters_json TEXT NOT NULL DEFAULT '[]', + dissenters_json TEXT NOT NULL DEFAULT '[]', + rationale_summary TEXT NOT NULL DEFAULT '', + tags_json TEXT NOT NULL DEFAULT '[]', + source_finding_ids_json TEXT NOT NULL DEFAULT '[]', + PRIMARY KEY (run_id, group_id) +); +``` + +## Embedded Skill Draft + +The following block is a draft `SKILL.md` for the user-facing council review skill. + +````markdown +```markdown +--- +name: council-review +description: Use this skill when the user explicitly asks for a multi-reviewer brainstorming or review workflow. It launches three reviewer roles with different perspectives, collects their suggestions, groups similar proposals, tallies agreement, and reports consensus, majority, and minority recommendations. Use it for structured idea generation or design critique, not for ordinary task execution. +--- + +# Council Review + +Use this skill when the user wants a structured three-reviewer brainstorm or review. + +## Reviewer Roles + +- architecture-reviewer +- implementation-reviewer +- risk-reviewer + +## Rules + +- Treat this as an analysis workflow, not a code-writing workflow, unless the user explicitly asks for patch proposals. +- Use `orch council` as the execution surface. +- Default to the fixed reviewer roles architecture, implementation, and risk. +- Collect all three reviewer responses before tallying. +- Group similar suggestions before counting votes. +- Use `normal` grouping and `low|medium|high` confidence unless the user asks otherwise. +- If the user asks for unanimous-only output, show only `consensus`. +- Otherwise present `consensus` first, then `majority`, then optional `minority`. +- Support both text and repository context when available. +- Persist the council run and grouped outputs in `orch`. 
+ +## Typical Commands + +```bash +orch council start --run council_blog_001 --target-file brief.md --target-type mixed --mode brainstorm --output both --json +orch council wait --run council_blog_001 --timeout-seconds 900 --json +orch council tally --run council_blog_001 --similarity normal --json +orch council report --run council_blog_001 --show consensus,majority --json +``` +``` +```` diff --git a/docs/implementation-roadmap.md b/docs/implementation-roadmap.md new file mode 100644 index 0000000..955f9bb --- /dev/null +++ b/docs/implementation-roadmap.md @@ -0,0 +1,315 @@ +# Implementation Roadmap + +## Purpose + +This document is the handoff-oriented implementation plan for the project. It is intentionally short and execution-focused. + +A new agent should be able to read this file, understand the current project state, and immediately know what to build next without re-deriving the whole design. + +## Current Status + +As of now: + +- architecture and workflow docs are written +- CLI surfaces for `inbox`, `orch`, worktree execution, and `council-review` are defined +- SQLite schema drafts exist in the docs +- JSON output shapes are defined for the major flows +- Go module and initial command skeletons exist +- `inbox` and `orch` both compile +- shared SQLite schema initialization exists +- `inbox init` works and creates the database schema +- `orch` currently exists as a command skeleton only +- no higher-level inbox or orch workflows have been implemented yet + +This means the project is past design discovery and ready for code implementation. 
+ +## Source Of Truth + +Read these docs first: + +- [architecture.md](/home/kurihada/project/ai-workflow-skill/docs/architecture.md) +- [inbox-cli.md](/home/kurihada/project/ai-workflow-skill/docs/inbox-cli.md) +- [orch-cli.md](/home/kurihada/project/ai-workflow-skill/docs/orch-cli.md) +- [worktree-execution.md](/home/kurihada/project/ai-workflow-skill/docs/worktree-execution.md) +- [council-review.md](/home/kurihada/project/ai-workflow-skill/docs/council-review.md) + +Use this roadmap for implementation order, not for protocol design. + +## Project Goal + +Build a Go-based local agent orchestration stack with: + +- `inbox`: worker-facing durable coordination bus +- `orch`: leader-facing scheduler and control plane +- strict worktree-backed execution for code-writing task attempts +- `council-review`: a user-facing three-reviewer brainstorm workflow implemented on top of `orch` + +## Implementation Principles + +- Do not redesign the protocol unless implementation reveals a real contradiction. +- Keep `inbox` and `orch` as separate CLIs or command groups, but share one SQLite file. +- Prefer one small working path over broad unfinished scaffolding. +- Make JSON output stable early. +- Implement the happy path first, then add wait/retry/cleanup. + +## Progress Snapshot + +Current implementation status: + +- `Milestone 1: Go Skeleton` is complete +- `Milestone 2: Shared DB Layer` is partially complete +- `Milestone 3: Inbox Happy Path` has started only through `inbox init` + +The next practical coding target is the rest of the inbox happy path. + +## Recommended v1 Order 
+ +### Milestone 1: Go Skeleton + +Goal: + +- initialize the Go module +- choose CLI framework and SQLite driver +- create package layout +- make empty commands compile + +Recommended shape: + +- `cmd/inbox` +- `cmd/orch` +- `internal/db` +- `internal/store` +- `internal/protocol` +- `internal/cli` + +Definition of done: + +- `go build ./...` succeeds +- `inbox --help` works +- `orch --help` works + +Status: + +- completed + +### Milestone 2: Shared DB Layer + +Goal: + +- create the SQLite connection layer +- enable required pragmas +- add schema initialization and migration mechanism + +Minimum scope: + +- communication tables for `inbox` +- scheduling tables for `orch` +- shared `events` table + +Definition of done: + +- `inbox init` initializes the database +- `orch` can open the same database successfully + +Status: + +- partially completed + +Completed so far: + +- shared DB open layer exists +- required SQLite pragmas are applied +- embedded schema files exist +- `inbox init` applies schema successfully + +Remaining: + +- decide whether `orch` should gain an explicit DB bootstrap check or reuse `inbox init` + +### Milestone 3: Inbox Happy Path + +Goal: + +- implement worker-facing coordination primitives first + +First commands: + +- `inbox init` +- `inbox send` +- `inbox fetch` +- `inbox claim` +- `inbox update` +- `inbox reply` +- `inbox done` +- `inbox fail` +- `inbox show` + +Delay if needed: + +- `watch` +- `wait-reply` +- `cancel` +- `list` + +Definition of done: + +- one thread can be created, claimed, updated, replied to, and completed +- all major commands support `--json` + +Status: + +- not complete + +Completed so far: + +- `inbox init` + +Next commands to implement: + +- `inbox send` +- `inbox fetch` +- `inbox claim` +- `inbox update` +- `inbox done` +- `inbox show` + +### Milestone 4: Orch Core Scheduling + +Goal: + +- implement run/task/dependency/attempt orchestration on top of `inbox` + +First commands: + +- `orch run init` +- `orch task add` 
+- `orch dep add` +- `orch ready` +- `orch dispatch` +- `orch reconcile` +- `orch blocked` +- `orch answer` +- `orch status` + +Delay if needed: + +- `retry` +- `reassign` +- `cancel` +- `cleanup` +- `wait` + +Definition of done: + +- a leader can create a run +- add tasks and dependencies +- dispatch a task through `orch` +- see worker state reflected back after `reconcile` + +### Milestone 5: Strict Worktree Support + +Goal: + +- ensure code-writing tasks execute in isolated worktrees + +First scope: + +- `orch dispatch` resolves `base_ref` +- strict mode fails when the repo is dirty and no explicit base is provided +- worktree path and branch name are stored on the attempt + +Definition of done: + +- a code task dispatch creates a real worktree +- the assigned worktree path appears in attempt metadata and inbox payload + +### Milestone 6: Waiting Primitives + +Goal: + +- replace blind polling with blocking CLI waits + +Commands: + +- `orch wait` +- `inbox wait-reply` + +Definition of done: + +- leader can block on new task events +- blocked worker can block on reply events + +### Milestone 7: Council Review + +Goal: + +- implement the user-facing three-reviewer brainstorming workflow + +First commands: + +- `orch council start` +- `orch council wait` +- `orch council tally` +- `orch council report` + +Definition of done: + +- one council run can dispatch three reviewers +- tally grouped recommendations into `consensus`, `majority`, and `minority` +- produce stable JSON and a markdown report artifact + +## Immediate Next Task + +If a new agent is taking over now, the next concrete step should be: + +1. implement `inbox send` +2. implement `inbox fetch` +3. implement `inbox claim` +4. add a small integration test covering `init -> send -> fetch -> claim` + +This is the smallest meaningful slice because the project already has a compiling skeleton and working schema initialization. 
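The `init -> send -> fetch -> claim` slice named above can be sketched against an in-memory SQLite database. This is an illustration in Python rather than the project's Go code. The helper names (`init_db`, `send`, `fetch`, `claim`) are hypothetical, the tables are trimmed versions of the `threads` and `leases` drafts in inbox-cli.md, and lease expiry is deliberately ignored.

```python
import sqlite3
import uuid
from datetime import datetime, timedelta, timezone


def now_iso(offset_seconds=0):
    """UTC timestamp as ISO-8601 text, optionally shifted into the future."""
    return (datetime.now(timezone.utc) + timedelta(seconds=offset_seconds)).isoformat()


def init_db(conn):
    # Trimmed versions of the draft tables; the real schema has more columns.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS threads (
            thread_id   TEXT PRIMARY KEY,
            subject     TEXT NOT NULL,
            assigned_to TEXT NOT NULL,
            status      TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS leases (
            thread_id  TEXT PRIMARY KEY,
            agent_id   TEXT NOT NULL,
            expires_at TEXT NOT NULL
        );
    """)


def send(conn, to_agent, subject):
    """Create a pending thread and return its id."""
    thread_id = "thr_" + uuid.uuid4().hex[:8]
    with conn:
        conn.execute(
            "INSERT INTO threads VALUES (?, ?, ?, 'pending')",
            (thread_id, subject, to_agent),
        )
    return thread_id


def fetch(conn, agent):
    # Read-only: fetch lists candidates and never grants ownership.
    rows = conn.execute(
        "SELECT thread_id FROM threads WHERE assigned_to = ? AND status = 'pending'",
        (agent,),
    )
    return [row[0] for row in rows]


def claim(conn, agent, thread_id, lease_seconds=900):
    """Return True if the lease was acquired, False on a lease conflict."""
    try:
        with conn:  # one transaction: lease insert plus status update
            conn.execute(
                "INSERT INTO leases VALUES (?, ?, ?)",
                (thread_id, agent, now_iso(lease_seconds)),
            )
            conn.execute(
                "UPDATE threads SET status = 'claimed' WHERE thread_id = ?",
                (thread_id,),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key on leases.thread_id rejects a second active lease.
        return False
```

A test for this slice would create one thread, confirm that `fetch` returns it, claim it once successfully, and confirm that a second `claim` from another agent is rejected. A Go integration test could follow the same shape.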
+ +## Recommended Driver Choices + +Current recommendation: + +- CLI framework: `Cobra` +- SQLite driver: pure-Go driver + +Reason: + +- command surfaces are already command-group heavy +- pure-Go SQLite keeps distribution simpler + +## Suggested Early Tests + +Add these tests before the codebase grows too much: + +- schema init test +- inbox thread lifecycle test +- single-task orch dispatch and reconcile test +- worktree path generation test +- council tally grouping test + +## Out Of Scope For First Pass + +Do not block v1 on these: + +- distributed execution +- advanced auth or permissions +- patch-producing council mode +- configurable reviewer counts beyond three +- rich similarity engines for proposal grouping +- background daemons beyond blocking CLI commands + +## Handoff Notes For Future Agents + +- The design phase is complete enough to start coding. +- Avoid reopening major design questions unless implementation forces it. +- The repository already has compiling binaries and working schema init. +- Continue with inbox lifecycle commands before adding advanced orchestration. +- Preserve the separation: + - `inbox` handles communication + - `orch` handles scheduling + - `council-review` is a workflow on top of `orch` +- Treat this file as the implementation entrypoint for new agents. diff --git a/docs/inbox-cli.md b/docs/inbox-cli.md new file mode 100644 index 0000000..93e500e --- /dev/null +++ b/docs/inbox-cli.md @@ -0,0 +1,479 @@ +# Inbox CLI + +## Purpose + +`inbox` is the durable coordination bus for agent-to-agent communication. It is not the scheduler. It stores threads, messages, leases, and artifacts so workers and leaders can coordinate through reliable state instead of ad hoc multi-turn chat. 
+ +In normal operation: + +- workers use `inbox` directly +- leaders use `inbox` mainly for inspection or manual override +- leader-side task planning and dispatch should happen through `orch` + +## Responsibilities + +`inbox` is responsible for: + +- creating and updating communication threads +- sending directed messages between agents +- allowing a worker to claim one thread at a time through leases +- recording progress, blocked questions, results, and failures +- attaching artifact references such as files, logs, or patches +- listing, watching, waiting, and showing thread history + +## Non-Responsibilities + +`inbox` should not decide: + +- how a large goal is decomposed into tasks +- whether a task is ready based on dependencies +- which worker should receive a task by policy +- when a failed task should be retried + +Those decisions belong to `orch`. + +## Core Objects + +- `thread`: the durable container for one work conversation +- `message`: one event inside a thread +- `lease`: an exclusive worker claim for a thread +- `artifact`: a path or file reference attached to a message +- `event`: a monotonic record used to wake blocking waiters + +## Required Fields + +### Thread Fields + +- `thread_id` +- `run_id` +- `task_id` +- `subject` +- `created_by` +- `assigned_to` +- `status` +- `priority` +- `created_at` +- `updated_at` + +### Message Fields + +- `message_id` +- `thread_id` +- `from_agent` +- `to_agent` +- `kind` +- `summary` +- `body` +- `payload_json` +- `created_at` + +## Message Kinds + +- `task` +- `progress` +- `question` +- `answer` +- `result` +- `control` +- `event` + +## Thread Status Values + +- `pending` +- `claimed` +- `in_progress` +- `blocked` +- `done` +- `failed` +- `cancelled` + +## Worker Protocol + +The normal worker flow is: + +1. `fetch` candidate threads +2. `claim` one thread +3. `update --status in_progress` +4. continue with `update` messages as needed +5. if blocked, set `blocked` and ask a precise question +6. 
wait for a reply with `inbox wait-reply` +7. finish with `done` or `fail` + +Rules: + +- `fetch` does not grant ownership +- only `claim` grants ownership +- only one active lease may exist per thread +- blocked messages must say exactly what is missing +- terminal messages should include result or failure summary +- a blocked worker should wait on a reply event rather than sleeping blindly + +## CLI Surface + +The binary name is `inbox`. + +### Global Flags + +- `--db PATH` +- `--json` +- `--agent NAME` + +### `inbox init` + +Initialize the communication schema and SQLite pragmas. + +Example: + +```bash +inbox init --db .agents/coord.db +``` + +### `inbox send` + +Create a new thread or append a message to an existing one. + +Suggested flags: + +- `--from AGENT` +- `--to AGENT` +- `--subject TEXT` +- `--thread THREAD_ID` +- `--run RUN_ID` +- `--task TASK_ID` +- `--kind task|question|answer|progress|result|control|event` +- `--summary TEXT` +- `--body TEXT` +- `--body-file PATH` +- `--payload-json STRING` +- `--priority low|normal|high` + +### `inbox fetch` + +List candidate threads for an agent without claiming them. + +Suggested flags: + +- `--agent AGENT` +- `--status pending,blocked` +- `--limit N` +- `--unread` + +### `inbox claim` + +Acquire a lease on a thread. + +Suggested flags: + +- `--agent AGENT` +- `--thread THREAD_ID` +- `--lease-seconds N` + +### `inbox renew` + +Extend an existing lease. + +Suggested flags: + +- `--agent AGENT` +- `--thread THREAD_ID` +- `--lease-seconds N` + +### `inbox update` + +Append a progress or blocked update. + +Suggested flags: + +- `--agent AGENT` +- `--thread THREAD_ID` +- `--status in_progress|blocked` +- `--summary TEXT` +- `--body TEXT` +- `--body-file PATH` +- `--payload-json STRING` + +### `inbox reply` + +Reply inside an existing thread. 
+ +Suggested flags: + +- `--from AGENT` +- `--to AGENT` +- `--thread THREAD_ID` +- `--kind answer|question|progress|control` +- `--summary TEXT` +- `--body TEXT` +- `--body-file PATH` +- `--payload-json STRING` + +### `inbox done` + +Mark a thread complete and attach the final result. + +Suggested flags: + +- `--agent AGENT` +- `--thread THREAD_ID` +- `--summary TEXT` +- `--body TEXT` +- `--body-file PATH` +- `--payload-json STRING` + +### `inbox fail` + +Mark a thread failed. + +Suggested flags: + +- `--agent AGENT` +- `--thread THREAD_ID` +- `--summary TEXT` +- `--body TEXT` +- `--payload-json STRING` + +### `inbox cancel` + +Cancel a thread. + +Suggested flags: + +- `--agent AGENT` +- `--thread THREAD_ID` +- `--reason TEXT` + +### `inbox list` + +List threads with filters. + +Suggested flags: + +- `--agent AGENT` +- `--status pending,claimed,in_progress,blocked,done,failed,cancelled` +- `--created-by AGENT` +- `--assigned-to AGENT` +- `--limit N` + +### `inbox show` + +Show one thread with full message history. + +Suggested flags: + +- `--thread THREAD_ID` +- `--json` + +### `inbox watch` + +Block until new matching activity appears. + +Suggested flags: + +- `--agent AGENT` +- `--status pending,blocked,done,failed` +- `--timeout-seconds N` + +### `inbox wait-reply` + +Block until a new reply-like message appears for one thread. + +This is the normal wait primitive for a blocked worker. + +Suggested flags: + +- `--thread THREAD_ID` +- `--after-message MESSAGE_ID` +- `--after-event EVENT_ID` +- `--kinds answer|control|result` +- `--timeout-seconds N` + +Behavior: + +- waits until a later matching message exists in the thread +- returns the new message and associated event cursor +- avoids blind `sleep` loops in worker logic + +## JSON Contract + +Every command should support `--json`. 
+ +Suggested success shape: + +```json +{ + "ok": true, + "command": "claim", + "thread": { + "thread_id": "thr_123", + "task_id": "T4", + "status": "claimed", + "assigned_to": "backend-worker" + } +} +``` + +Suggested error shape: + +```json +{ + "ok": false, + "error": { + "code": "lease_conflict", + "message": "thread already claimed by another worker" + } +} +``` + +Suggested `wait-reply` wake shape: + +```json +{ + "ok": true, + "command": "wait-reply", + "woke": true, + "next_event_id": 127, + "message": { + "message_id": "msg_901", + "thread_id": "thr_123", + "kind": "answer", + "summary": "Use email/password for MVP", + "body": "Use a simple credential flow for the first iteration." + } +} +``` + +## Exit Codes + +- `0`: success +- `10`: no matching work +- `20`: conflict such as lease contention +- `30`: invalid input or invalid state transition +- `40`: not found +- `50`: storage or internal error + +## SQLite Schema Draft + +```sql +CREATE TABLE IF NOT EXISTS threads ( + thread_id TEXT PRIMARY KEY, + run_id TEXT NOT NULL, + task_id TEXT NOT NULL, + subject TEXT NOT NULL, + created_by TEXT NOT NULL, + assigned_to TEXT NOT NULL, + status TEXT NOT NULL, + priority TEXT NOT NULL DEFAULT 'normal', + latest_message_id TEXT, + created_at TEXT NOT NULL, + updated_at TEXT NOT NULL +); + +CREATE TABLE IF NOT EXISTS messages ( + message_id TEXT PRIMARY KEY, + thread_id TEXT NOT NULL, + from_agent TEXT NOT NULL, + to_agent TEXT NOT NULL, + kind TEXT NOT NULL, + summary TEXT NOT NULL, + body TEXT NOT NULL DEFAULT '', + payload_json TEXT NOT NULL DEFAULT '{}', + created_at TEXT NOT NULL, + FOREIGN KEY(thread_id) REFERENCES threads(thread_id) +); + +CREATE TABLE IF NOT EXISTS leases ( + thread_id TEXT PRIMARY KEY, + agent_id TEXT NOT NULL, + lease_token TEXT NOT NULL, + claimed_at TEXT NOT NULL, + expires_at TEXT NOT NULL, + released_at TEXT +); + +CREATE TABLE IF NOT EXISTS artifacts ( + artifact_id TEXT PRIMARY KEY, + message_id TEXT NOT NULL, + path TEXT NOT NULL, + 
kind TEXT NOT NULL, + metadata_json TEXT NOT NULL DEFAULT '{}', + created_at TEXT NOT NULL, + FOREIGN KEY(message_id) REFERENCES messages(message_id) +); + +CREATE TABLE IF NOT EXISTS events ( + event_id INTEGER PRIMARY KEY AUTOINCREMENT, + run_id TEXT NOT NULL, + task_id TEXT NOT NULL, + thread_id TEXT NOT NULL, + source TEXT NOT NULL, + event_type TEXT NOT NULL, + message_id TEXT, + summary TEXT NOT NULL DEFAULT '', + payload_json TEXT NOT NULL DEFAULT '{}', + created_at TEXT NOT NULL +); + +CREATE INDEX IF NOT EXISTS idx_threads_status_assigned + ON threads(status, assigned_to, updated_at); + +CREATE INDEX IF NOT EXISTS idx_messages_thread_created + ON messages(thread_id, created_at); + +CREATE INDEX IF NOT EXISTS idx_events_thread_event + ON events(thread_id, event_id); +``` + +## Concurrency Notes + +- use `PRAGMA journal_mode=WAL` +- keep write transactions short +- make `claim` atomic +- never mutate state during `fetch` +- implement `watch` and `wait-reply` as blocking queries over message or event cursors, not as user-managed `sleep` + +## Embedded Skill Draft + +The following block is a draft `SKILL.md` for the `inbox` skill. + +````markdown +```markdown +--- +name: inbox +description: Use this skill when an agent needs durable communication through the local inbox CLI. It is for fetching work, claiming a thread, sending progress updates, raising blocked questions, waiting for replies, replying inside a thread, returning results, and watching inbox activity. Do not use it for task decomposition or scheduling decisions; use the orchestrator skill for that. +--- + +# Inbox + +Use this skill when you need to communicate through the `inbox` CLI and its SQLite-backed thread store. 
+ +## When To Use + +- a worker needs to fetch and claim work +- a worker needs to report progress +- a worker is blocked and must ask for clarification +- a leader needs to inspect or manually reply inside a thread +- an agent needs durable, machine-readable coordination instead of ad hoc chat + +## Rules + +- Prefer `--json` when another agent will consume the output. +- Never treat `fetch` as ownership; only `claim` grants ownership. +- Do not start work without a valid lease. +- When blocked, say exactly what is missing. +- If blocked, prefer `inbox wait-reply` instead of manual sleep loops. +- Use `result` for final output and `fail` for terminal failure. +- Keep scheduling decisions out of this layer. + +## Typical Commands + +```bash +inbox fetch --agent backend-worker --status pending --json +inbox claim --agent backend-worker --thread thr_123 --lease-seconds 900 --json +inbox update --agent backend-worker --thread thr_123 --status in_progress --summary "Implementing post CRUD routes" --json +inbox update --agent backend-worker --thread thr_123 --status blocked --summary "Need auth decision" --payload-json '{"question":"Should admin auth use email/password in MVP?"}' --json +inbox wait-reply --thread thr_123 --after-event 51 --timeout-seconds 1800 --json +inbox reply --from leader --to backend-worker --thread thr_123 --kind answer --summary "Use email/password for MVP" --body "Use a simple credential flow for the first iteration." --json +inbox done --agent backend-worker --thread thr_123 --summary "Post CRUD implemented" --body-file result.md --json +``` +``` +```` diff --git a/docs/orch-cli.md b/docs/orch-cli.md new file mode 100644 index 0000000..159a872 --- /dev/null +++ b/docs/orch-cli.md @@ -0,0 +1,602 @@ +# Orch CLI + +## Purpose + +`orch` is the leader-facing scheduler and control plane. It owns the run, task graph, dependencies, ready queue, dispatch decisions, retries, and reassignment logic. + +`orch` does not replace `inbox`. 
It uses `inbox` as the durable transport and execution record. + +In normal operation: + +- leaders use `orch` +- `orch` creates and monitors `inbox` threads +- workers continue using `inbox` + +## Responsibilities + +`orch` is responsible for: + +- creating a run for one user request or project +- defining tasks and dependencies +- calculating which tasks are ready +- dispatching ready tasks to workers +- tracking attempts and mapping them to inbox threads +- allocating attempt worktrees for code tasks +- surfacing blocked tasks to the leader +- sending answers back into the active inbox thread +- reconciling thread state into task state +- blocking until actionable events arrive for the leader +- retrying, reassigning, cancelling, or adding follow-up tasks + +## Non-Responsibilities + +`orch` should not implement: + +- worker claiming +- direct worker polling +- raw message append storage +- low-level thread history management + +Those belong to `inbox`. + +## Core Objects + +- `run`: one coordinated execution for a user request +- `task`: one schedulable unit of work +- `dependency`: an edge between tasks +- `attempt`: one execution try for a task +- `dispatch`: the act of materializing a task into an inbox thread +- `workspace`: the branch and worktree assigned to one code-writing attempt + +## Workspace Model + +For code-writing tasks, `orch` should allocate one Git worktree per attempt. + +Strict policy: + +- dispatch from a concrete committed `base_ref` +- fail dispatch if strict mode is enabled and the leader is implicitly relying on uncommitted state +- create a fresh worktree for every retry +- do not let workers edit the user's primary checkout + +See [worktree-execution.md](/home/kurihada/project/ai-workflow-skill/docs/worktree-execution.md) for the full execution model. 
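The strict policy can be read as a small decision function. The Python sketch below models strict mode only and assumes the caller has already checked whether the repository has uncommitted changes. `resolve_base_ref` and `DispatchError` are hypothetical names, and the `HEAD` default follows the strict-mode recommendation listed under `orch dispatch`.

```python
from typing import Optional


class DispatchError(Exception):
    """Raised when strict mode refuses to dispatch (hypothetical error type)."""


def resolve_base_ref(base_ref: Optional[str], repo_dirty: bool) -> str:
    """Pick the ref a new attempt worktree should be created from.

    Returns a ref name only. Resolving it to a concrete commit, for
    example with `git rev-parse`, is left to the caller.
    """
    if base_ref:
        # An explicit base is used exactly as given.
        return base_ref
    if repo_dirty:
        # No explicit base plus uncommitted changes: the leader would be
        # relying on state that a fresh worktree cannot see.
        raise DispatchError("repository is dirty and no --base-ref was provided")
    # Clean repository and no explicit base: default to the current commit.
    return "HEAD"
```

Keeping the dirty check as an input rather than calling Git inside the function makes the rule easy to unit test without a repository.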
+ +## Task State Model + +- `planned`: task exists but is not yet eligible for dispatch +- `ready`: dependencies are satisfied and it can be dispatched +- `dispatched`: an inbox thread exists but the worker has not started yet +- `running`: the task has been claimed and is actively executing +- `blocked`: the active attempt needs clarification or an external dependency +- `done`: task completed and passed its current acceptance gate +- `failed`: task completed unsuccessfully +- `cancelled`: task was cancelled and should not continue + +Suggested transitions: + +- `planned -> ready` +- `ready -> dispatched` +- `dispatched -> running` +- `running -> blocked` +- `blocked -> running` +- `running -> done` +- `running -> failed` +- `failed -> ready` through explicit retry +- `* -> cancelled` by leader action + +## Leader Workflow + +The normal leader loop is: + +1. create a run +2. add tasks +3. add dependencies +4. inspect `ready` +5. `dispatch` tasks +6. `reconcile` inbox state back into task state +7. inspect `blocked` +8. answer blocked questions +9. if nothing is actionable, call `wait` +10. retry or reassign failures when needed +11. finish when all required tasks are `done` + +The leader should block on `orch wait`, not on ad hoc `sleep`. + +## CLI Surface + +The binary name is `orch`. + +### Global Flags + +- `--db PATH` +- `--json` + +### `orch run init` + +Create a new run. + +Suggested flags: + +- `--run RUN_ID` +- `--goal TEXT` +- `--summary TEXT` + +Example: + +```bash +orch run init --db .agents/coord.db --run blog_mvp_001 --goal "Build blog MVP" --summary "Public blog plus admin CRUD" +``` + +### `orch run show` + +Show run metadata and current aggregate status. + +Suggested flags: + +- `--run RUN_ID` + +### `orch task add` + +Add a task to a run. 
+ +Suggested flags: + +- `--run RUN_ID` +- `--task TASK_ID` +- `--title TEXT` +- `--summary TEXT` +- `--default-to AGENT` +- `--acceptance-json STRING` +- `--priority low|normal|high` + +### `orch dep add` + +Add a dependency edge. + +Suggested flags: + +- `--run RUN_ID` +- `--task TASK_ID` +- `--depends-on TASK_ID` + +### `orch ready` + +List tasks ready for dispatch. + +Suggested flags: + +- `--run RUN_ID` +- `--limit N` + +### `orch dispatch` + +Dispatch a ready task to a worker by creating an inbox thread and the first task message. + +Suggested flags: + +- `--run RUN_ID` +- `--task TASK_ID` +- `--to AGENT` +- `--base-ref REF` +- `--workspace-root PATH` +- `--strict-worktree` +- `--body TEXT` +- `--body-file PATH` + +Behavior: + +- creates a new attempt +- resolves a committed base revision +- creates a branch and worktree for the attempt when the task writes code +- creates or links an `inbox` thread +- writes workspace metadata into attempt storage and task payload +- moves the task to `dispatched` + +Strict-mode recommendation: + +- if `--base-ref` is omitted and the repository is clean, default to `HEAD` +- if `--base-ref` is omitted and the repository is dirty, fail dispatch +- if `--base-ref` is provided, resolve it to a commit and use it exactly + +### `orch reconcile` + +Read inbox state and update run/task state. + +Suggested flags: + +- `--run RUN_ID` + +Behavior: + +- maps inbox `claimed` or `in_progress` to `running` +- maps inbox `blocked` to `blocked` +- maps inbox `done` to `done` +- maps inbox `failed` to `failed` + +### `orch blocked` + +List blocked tasks and their latest question. + +Suggested flags: + +- `--run RUN_ID` + +### `orch wait` + +Block until one or more run-scoped events become available. + +This is the normal wait primitive for the interactive leader. 
+ +Suggested flags: + +- `--run RUN_ID` +- `--for task_ready,task_blocked,task_done,task_failed` +- `--after-event EVENT_ID` +- `--timeout-seconds N` + +Behavior: + +- blocks until a later matching event exists +- returns a cursor for the next wait +- lets the leader wait for worker activity without manual sleep loops + +### `orch answer` + +Answer the active blocked question for a task by writing into the mapped inbox thread. + +Suggested flags: + +- `--run RUN_ID` +- `--task TASK_ID` +- `--body TEXT` +- `--body-file PATH` +- `--payload-json STRING` + +### `orch retry` + +Explicitly retry a failed task. + +Suggested flags: + +- `--run RUN_ID` +- `--task TASK_ID` +- `--to AGENT` +- `--body TEXT` +- `--body-file PATH` + +Behavior: + +- creates a new attempt +- links the retry to the prior failed attempt +- dispatches a new inbox thread or fresh task message + +### `orch reassign` + +Move a blocked or failed task to another worker. + +Suggested flags: + +- `--run RUN_ID` +- `--task TASK_ID` +- `--to AGENT` +- `--reason TEXT` + +### `orch cancel` + +Cancel a task or an entire run. + +Suggested flags: + +- `--run RUN_ID` +- `--task TASK_ID` +- `--reason TEXT` + +### `orch cleanup` + +Remove completed or abandoned attempt worktrees that are no longer needed. + +Suggested flags: + +- `--run RUN_ID` +- `--task TASK_ID` +- `--attempt N` +- `--all-completed` +- `--force` + +### `orch status` + +Show task state summary for the run. + +Suggested flags: + +- `--run RUN_ID` + +### `orch show` + +Show one task with dependencies, attempts, and inbox mapping. + +Suggested flags: + +- `--run RUN_ID` +- `--task TASK_ID` + +### `orch council start` + +Start a three-reviewer council workflow for one target. 
+ +Suggested flags: + +- `--run RUN_ID` +- `--target TEXT` +- `--target-file PATH` +- `--repo-path PATH` +- `--task-id TASK_ID` +- `--target-type text|repo|mixed` +- `--mode brainstorm|review` +- `--output markdown|json|both` +- `--only-unanimous` + +Default behavior: + +- fixed reviewer roles: `architecture-reviewer`, `implementation-reviewer`, `risk-reviewer` +- analysis only +- `--target-type mixed` +- `--output both` +- unanimous-only disabled unless requested +- reviewer count fixed at `3` in v1 + +### `orch council wait` + +Block until the council has enough reviewer responses to continue. + +Suggested flags: + +- `--run RUN_ID` +- `--timeout-seconds N` + +### `orch council tally` + +Group similar reviewer suggestions and compute support counts. + +Suggested flags: + +- `--run RUN_ID` +- `--similarity strict|normal` + +Behavior: + +- groups semantically similar reviewer proposals +- assigns `consensus`, `majority`, or `minority` +- persists grouped recommendations in `orch` storage + +Default behavior: + +- `--similarity normal` + +### `orch council report` + +Render the final grouped council output. + +Suggested flags: + +- `--run RUN_ID` +- `--show consensus|majority|minority|all|consensus,majority` + +Default behavior: + +- show `consensus,majority` +- preserve `minority` in persisted storage even if omitted from the main report +- support both markdown artifacts and JSON output + +## Relationship To Inbox + +`orch` should be implemented as a control plane on top of `inbox`. + +- `orch dispatch` writes the first `task` message into `inbox` +- `orch dispatch` also writes worktree metadata for code tasks into the attempt record and inbox payload +- workers claim and update status through `inbox` +- `orch reconcile` reads thread state and converts it into task state +- `orch answer` writes an inbox `answer` message to the active thread + +The leader should not need to hand-write `inbox send` during normal dispatch. 
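The reconcile step in the list above amounts to a status translation. The Python sketch below uses the mapping given under `orch reconcile`. The function name, and the choice to leave unmapped thread states such as `pending` untouched, are assumptions of this sketch rather than part of the design.

```python
from typing import Optional

# Inbox thread status -> orch task status, per the `orch reconcile` behavior list.
THREAD_TO_TASK_STATUS = {
    "claimed": "running",
    "in_progress": "running",
    "blocked": "blocked",
    "done": "done",
    "failed": "failed",
}


def reconcile_status(thread_status: str, current_task_status: str) -> Optional[str]:
    """Return the new task status, or None when nothing should change."""
    new_status = THREAD_TO_TASK_STATUS.get(thread_status)
    if new_status is None or new_status == current_task_status:
        # Unmapped thread states leave the task as it is, and an unchanged
        # status means there is nothing to write.
        return None
    return new_status
```

Returning `None` for no-op cases lets the caller skip the write, which may help keep the run-scoped event stream free of duplicate events.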
+ +Higher-level workflows such as council review should also run on top of `orch`, not as a separate infrastructure layer. See [council-review.md](/home/kurihada/project/ai-workflow-skill/docs/council-review.md). + +## Waiting Model + +The leader does not receive worker output as an in-memory push. Instead: + +- workers write updates into `inbox` +- `inbox` appends events +- `orch reconcile` converts thread state into task state +- `orch wait` blocks on the run-scoped event stream + +This is still a single leader model. `orch wait` is just the leader's blocking read primitive. + +## JSON Contract + +Every command should support `--json`. + +Suggested success shape: + +```json +{ + "ok": true, + "command": "dispatch", + "run_id": "blog_mvp_001", + "task": { + "task_id": "T4", + "status": "dispatched", + "assigned_to": "backend-worker" + }, + "attempt": { + "attempt_no": 1, + "thread_id": "thr_987", + "base_ref": "main", + "base_commit": "abc1234", + "branch_name": "orch/blog_mvp_001/T4/attempt-1", + "worktree_path": ".orch/worktrees/blog_mvp_001/T4/attempt-1" + } +} +``` + +Suggested `wait` wake shape: + +```json +{ + "ok": true, + "command": "wait", + "woke": true, + "next_event_id": 127, + "events": [ + { + "event_id": 127, + "type": "task_blocked", + "run_id": "blog_mvp_001", + "task_id": "T5", + "thread_id": "thr_t5_attempt1", + "summary": "Editor choice undecided", + "payload": { + "question": "Should the admin editor use a rich text editor or plain textarea in MVP?" + } + } + ] +} +``` + +## Exit Codes + +- `0`: success +- `10`: no ready or matching tasks +- `20`: conflict +- `30`: invalid input or invalid state transition +- `40`: not found +- `50`: storage or internal error + +## SQLite Schema Draft + +These tables should live in the same SQLite file as the inbox tables. 
+ +```sql +CREATE TABLE IF NOT EXISTS runs ( + run_id TEXT PRIMARY KEY, + goal TEXT NOT NULL, + summary TEXT NOT NULL DEFAULT '', + status TEXT NOT NULL DEFAULT 'active', + created_at TEXT NOT NULL, + updated_at TEXT NOT NULL +); + +CREATE TABLE IF NOT EXISTS tasks ( + run_id TEXT NOT NULL, + task_id TEXT NOT NULL, + title TEXT NOT NULL, + summary TEXT NOT NULL DEFAULT '', + status TEXT NOT NULL, + default_to TEXT, + priority TEXT NOT NULL DEFAULT 'normal', + acceptance_json TEXT NOT NULL DEFAULT '[]', + latest_attempt_no INTEGER, + created_at TEXT NOT NULL, + updated_at TEXT NOT NULL, + PRIMARY KEY (run_id, task_id), + FOREIGN KEY(run_id) REFERENCES runs(run_id) +); + +CREATE TABLE IF NOT EXISTS task_dependencies ( + run_id TEXT NOT NULL, + task_id TEXT NOT NULL, + depends_on_task_id TEXT NOT NULL, + PRIMARY KEY (run_id, task_id, depends_on_task_id) +); + +CREATE TABLE IF NOT EXISTS task_attempts ( + run_id TEXT NOT NULL, + task_id TEXT NOT NULL, + attempt_no INTEGER NOT NULL, + assigned_to TEXT NOT NULL, + thread_id TEXT NOT NULL, + base_ref TEXT, + base_commit TEXT, + branch_name TEXT, + worktree_path TEXT, + workspace_status TEXT, + result_commit TEXT, + status TEXT NOT NULL, + created_at TEXT NOT NULL, + updated_at TEXT NOT NULL, + PRIMARY KEY (run_id, task_id, attempt_no) +); + +CREATE INDEX IF NOT EXISTS idx_tasks_run_status + ON tasks(run_id, status, priority, updated_at); + +CREATE TABLE IF NOT EXISTS events ( + event_id INTEGER PRIMARY KEY AUTOINCREMENT, + run_id TEXT NOT NULL, + task_id TEXT NOT NULL, + thread_id TEXT, + source TEXT NOT NULL, + event_type TEXT NOT NULL, + message_id TEXT, + summary TEXT NOT NULL DEFAULT '', + payload_json TEXT NOT NULL DEFAULT '{}', + created_at TEXT NOT NULL +); + +CREATE INDEX IF NOT EXISTS idx_events_run_event + ON events(run_id, event_id); +``` + +## Embedded Skill Draft + +The following block is a draft `SKILL.md` for the leader-facing orchestration skill. 
+ +````markdown +```markdown +--- +name: orchestrator +description: Use this skill when the leader needs to plan and schedule work through the orch CLI. It is for creating runs, adding tasks and dependencies, finding ready work, dispatching tasks to workers, allocating task worktrees, reconciling inbox state, waiting for worker events, reviewing blocked tasks, answering them, retrying failures, reassigning work, and cleaning up attempt worktrees. Do not use this skill for worker-side claim or progress updates; use inbox for that. +--- + +# Orchestrator + +Use this skill when you are the leader and need to control the task graph through the `orch` CLI. + +## When To Use + +- you need to decompose a goal into tasks +- you need to record dependencies +- you need to know which tasks are ready +- you need to dispatch work to workers +- you need to allocate isolated worktrees for code-writing tasks +- you need to inspect blocked tasks and answer them +- you need to retry or reassign a failed task + +## Rules + +- Prefer `orch` over hand-written `inbox send` for normal leader operations. +- Reconcile inbox state before making new dispatch decisions. +- If nothing is actionable, use `orch wait` instead of manual sleep loops. +- For code tasks, dispatch from a committed base and allocate a fresh worktree per attempt. +- Keep tasks small enough to be checkable and to minimize clarification loops. +- Use `inbox` directly only for inspection or manual repair. +- Keep user-facing discussion in the leader. 
+ +## Typical Commands + +```bash +orch run init --run blog_mvp_001 --goal "Build blog MVP" --summary "Public blog plus admin CRUD" --json +orch task add --run blog_mvp_001 --task T1 --title "Project skeleton" --summary "Initialize app structure and database wiring" --default-to foundation-worker --json +orch dep add --run blog_mvp_001 --task T2 --depends-on T1 --json +orch ready --run blog_mvp_001 --json +orch dispatch --run blog_mvp_001 --task T1 --to foundation-worker --base-ref main --workspace-root .orch/worktrees --strict-worktree --body-file tasks/t1.md --json +orch reconcile --run blog_mvp_001 --json +orch wait --run blog_mvp_001 --for task_blocked,task_done,task_failed --after-event 0 --timeout-seconds 900 --json +orch blocked --run blog_mvp_001 --json +orch answer --run blog_mvp_001 --task T2 --body "MVP supports draft and published only." --json +orch retry --run blog_mvp_001 --task T7a --to backend-worker --body "Retry after fixing the contract mismatch." --json +orch cleanup --run blog_mvp_001 --all-completed --json +``` +``` +```` diff --git a/docs/worktree-execution.md b/docs/worktree-execution.md new file mode 100644 index 0000000..fc2504d --- /dev/null +++ b/docs/worktree-execution.md @@ -0,0 +1,225 @@ +# Worktree Execution Model + +## Purpose + +This document defines how code-writing workers should execute in isolated Git worktrees instead of modifying the user's primary working tree. + +The default recommendation is strict mode: + +- every task attempt gets its own Git worktree +- the worktree is created by `orch` during dispatch +- the worker writes code only inside that assigned worktree +- the leader reviews and integrates the result later + +This model is intended for code tasks. Non-code tasks may not need a worktree. 
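To make "every task attempt gets its own Git worktree" concrete, here is a minimal sketch of the Git invocation a dispatcher might issue. The helper names `worktree_add_argv` and `create_attempt_worktree` are hypothetical and not part of `orch`; only the `git worktree add -b <branch> <path> <commit>` form is standard Git. The sample values are taken from the dispatch example in architecture.md.

```python
import subprocess
from typing import List


def worktree_add_argv(repo: str, branch: str, path: str, base_commit: str) -> List[str]:
    """Build the argv that creates a new branch and worktree at base_commit."""
    # `git -C <repo>` runs the command as if it had been started in <repo>.
    # `worktree add -b <branch> <path> <commit>` creates <branch> at <commit>
    # and checks it out into a new worktree directory at <path>.
    return ["git", "-C", repo, "worktree", "add", "-b", branch, path, base_commit]


def create_attempt_worktree(repo: str, branch: str, path: str, base_commit: str) -> None:
    """Run the command (hypothetical helper); raises CalledProcessError if Git refuses."""
    subprocess.run(worktree_add_argv(repo, branch, path, base_commit), check=True)


# Values mirror the dispatch example; nothing is executed here, only printed.
argv = worktree_add_argv(
    repo=".",
    branch="orch/blog_mvp_001/T4/attempt-1",
    path=".orch/worktrees/blog_mvp_001/T4/attempt-1",
    base_commit="abc1234",
)
print(" ".join(argv))
```

Passing the resolved commit rather than a branch name is one way to keep the attempt pinned to the exact base recorded at dispatch time, even if the base branch moves afterwards.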
+ +## Why Worktrees + +Using a worktree per task attempt gives: + +- isolation between concurrent workers +- protection for the user's main working tree +- deterministic base revisions for review and retry +- easier cleanup of abandoned or failed attempts +- a clean mapping from task attempt to diff, branch, and workspace path + +## Strict Mode + +Strict mode means workers only start from an explicit committed Git base. + +Rules: + +- `orch dispatch` must resolve a concrete `base_ref` to a commit +- worker execution must happen in a separate worktree +- the worker must not modify the user's primary checkout +- each retry gets a fresh attempt and a fresh worktree +- the leader integrates results explicitly after review + +## Base Revision Policy + +The safest default is: + +- use an explicit committed `base_ref` +- fail dispatch if the leader is implicitly relying on uncommitted local changes + +Recommended strict policy: + +1. if `--base-ref` is provided, resolve it to a commit and use that exact commit +2. if `--base-ref` is omitted and the repository is clean, default to `HEAD` +3. if `--base-ref` is omitted and the repository has uncommitted changes, fail dispatch + +This keeps worker execution reproducible and avoids hidden divergence from the leader's uncommitted state. + +## Lifecycle + +### 1. Leader Decides a Task Is Ready + +`orch` determines that a task may be dispatched. + +### 2. `orch dispatch` Creates an Attempt Workspace + +For one task attempt, `orch` should: + +- pick a `base_ref` +- create an attempt record +- choose a branch name +- create a worktree path +- create the worktree from the chosen base commit +- write workspace metadata into the attempt record +- include the workspace metadata in the inbox task payload + +### 3. Worker Runs Inside the Assigned Worktree + +The worker runtime should launch `codex exec` with the assigned worktree as its working directory. 
+ +Example shape: + +```bash +codex exec -C /path/to/worktrees/blog_mvp_001/T4/attempt-1 "Implement the assigned task using the provided repository context." +``` + +The worker must treat the assigned worktree as its only writable repository root. + +### 4. Worker Reports Through `inbox` + +The worker reports progress, blocked questions, results, and failure through `inbox`. + +### 5. Leader Reviews and Integrates + +After completion, the leader may: + +- inspect the worktree directly +- inspect the branch diff +- request fixes through a new attempt +- merge or cherry-pick the result into the integration branch + +### 6. `orch cleanup` Removes Unneeded Worktrees + +After review and integration, `orch cleanup` should remove completed or abandoned worktrees that are no longer needed. + +## Attempt-To-Workspace Mapping + +The mapping should be one-to-one: + +- one task attempt +- one branch +- one worktree path + +Do not reuse a worktree for multiple attempts. Reuse creates hidden state and makes retries harder to reason about. + +## Naming Conventions + +Recommended branch naming: + +```text +orch///attempt- +``` + +Recommended worktree path: + +```text +.orch/worktrees///attempt- +``` + +Example: + +```text +branch: orch/blog_mvp_001/T4/attempt-1 +path: .orch/worktrees/blog_mvp_001/T4/attempt-1 +``` + +Identifiers should be sanitized for filesystem and Git branch compatibility. + +## Workspace Metadata + +Each task attempt should record: + +- `base_ref` +- `base_commit` +- `branch_name` +- `worktree_path` +- `workspace_status` +- `result_commit` if the worker produces one + +Suggested `workspace_status` values: + +- `created` +- `active` +- `completed` +- `abandoned` +- `cleaned` + +## Dispatch-Time Behavior + +`orch dispatch` should treat worktree creation as part of dispatch, not as a later best-effort step. + +Recommended behavior: + +1. validate the Git repository and base commit +2. enforce strict mode policy +3. create the attempt row +4. 
create the branch and worktree +5. create the inbox thread and initial task message +6. write worktree metadata into the task payload +7. mark the task as `dispatched` + +If worktree creation fails, dispatch should fail atomically and the task should remain undispatched. + +## Worker Runtime Contract + +The worker runtime may be a small launcher around `codex exec`. + +Recommended responsibilities: + +- read the assigned thread or attempt metadata +- claim the inbox thread +- launch `codex exec -C ` +- forward result status into `inbox` +- never rewrite the worktree assignment itself + +This keeps workspace ownership in `orch` and execution ownership in the worker runtime. + +## Review and Integration + +Strict mode works best when integration is explicit. + +The leader should decide whether to: + +- merge the attempt branch +- cherry-pick one or more commits +- open a follow-up attempt for fixes +- discard the attempt and remove the worktree + +Workers should not self-merge into the user's main branch. + +## Retry Policy + +A retry should create: + +- a new attempt number +- a new branch +- a new worktree + +Do not reopen the previous worktree for a retry unless you are intentionally debugging that attempt by hand. 
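The retry rule above (new attempt number, new branch, new worktree) can be sketched as a pure function. This is only an illustration under stated assumptions: `AttemptWorkspace`, `sanitize`, and `next_attempt_workspace` are hypothetical names, the sanitizing rule is a deliberately conservative guess, and the branch and path layout is inferred from the `orch/blog_mvp_001/T4/attempt-1` example rather than from any confirmed `orch` behavior.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttemptWorkspace:
    attempt_no: int
    branch_name: str
    worktree_path: str


def sanitize(identifier: str) -> str:
    """Replace runs of characters outside [A-Za-z0-9_-] with '-' (assumed rule)."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "-", identifier).strip("-")
    if not cleaned:
        raise ValueError(f"identifier {identifier!r} is empty after sanitizing")
    return cleaned


def next_attempt_workspace(
    run_id: str,
    task_id: str,
    latest_attempt_no: Optional[int],
    workspace_root: str = ".orch/worktrees",
) -> AttemptWorkspace:
    """Derive a fresh attempt number, branch, and worktree path for a dispatch or retry."""
    # latest_attempt_no is None before the first dispatch, so the first attempt is 1.
    attempt_no = (latest_attempt_no or 0) + 1
    suffix = f"{sanitize(run_id)}/{sanitize(task_id)}/attempt-{attempt_no}"
    return AttemptWorkspace(
        attempt_no=attempt_no,
        branch_name=f"orch/{suffix}",
        worktree_path=f"{workspace_root}/{suffix}",
    )


first = next_attempt_workspace("blog_mvp_001", "T4", None)
retry = next_attempt_workspace("blog_mvp_001", "T4", first.attempt_no)
print(first.branch_name)
print(retry.worktree_path)
```

Because the names depend only on the run, the task, and the attempt number, a retry can never land on the previous attempt's branch or directory. Note that this sanitizer can map distinct identifiers to the same string (for example `a/b` and `a-b`), so a real implementation would probably also need a uniqueness check.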
+ +## Failure and Cleanup + +Recommended `orch cleanup` behavior: + +- remove worktrees for completed attempts after integration +- remove worktrees for abandoned or superseded attempts +- preserve worktrees whose attempts are still running or under active review unless `--force` is passed + +Suggested flags: + +- `--run RUN_ID` +- `--task TASK_ID` +- `--attempt N` +- `--all-completed` +- `--force` + +## Relationship To Existing Docs + +- [architecture.md](/home/kurihada/project/ai-workflow-skill/docs/architecture.md): high-level separation of `inbox` and `orch` +- [orch-cli.md](/home/kurihada/project/ai-workflow-skill/docs/orch-cli.md): scheduling commands that create and manage attempts +- [blog-project-example.md](/home/kurihada/project/ai-workflow-skill/docs/blog-project-example.md): example of dispatch creating a worktree-backed attempt