# Implementation Roadmap

## Purpose

This document is the handoff-oriented implementation plan for the project. It is intentionally short and execution-focused.

A new agent should be able to read this file, understand the current project state, and immediately know what to build next without re-deriving the whole design.

## Current Status

As of now:

- architecture and workflow docs are written
- CLI surfaces for `inbox`, `orch`, worktree execution, and `council-review` are defined
- embedded SQLite schema and migrations exist in code
- JSON output shapes are defined for the major flows
- Go module and initial command skeletons exist
- `inbox` and `orch` both compile
- shared SQLite schema initialization exists
- `inbox` is implemented end-to-end, including send/fetch/claim/renew/update/reply/done/fail/cancel/list/show/watch/wait-reply
- `inbox` supports blocking waits, lease renewal, unread fetches backed by per-agent read cursors, `--body-file`, artifact attachments, and structured JSON errors with stable exit codes
- integration tests now cover each implemented inbox command, plus the main inbox workflows, wait/watch flows, artifact persistence, unread behavior, and JSON error contracts
- a human-readable inbox command test-plan set has been authored under `docs/tests/inbox/`
- a human-readable `orch` test-plan set has now been authored under `docs/tests/orch/`, with a `ROADMAP.md`, shared conventions, workflow scenarios, per-command indexes, and concrete case documents aligned to the current CLI surface, including supplemental coverage for key flag validation, ordering/limit behavior, payload-only answers, cleanup errors, and council report default/error contracts
- a reusable Codex skill package for `inbox` now exists under `skills/inbox/`, with a formal `SKILL.md`, `agents/openai.yaml`, and a bundled CLI binary asset
- reusable Codex skill packages for `orch` and `council-review` now exist under `skills/orch/` and `skills/council-review/`, both using bundled copies of the `orch` CLI binary asset
- an inbox skill forward-test plan directory now exists under `docs/tests/inbox-skill/`, with a shared execution template and multiple scenario cases
- an orch skill forward-test plan directory now exists under `docs/tests/orch-skill/`, with a shared execution contract and eight leader-side workflow scenarios
- a repo-local replay runner now exists at `scripts/run_orch_skill_forward_tests.sh`, and all eight `docs/tests/orch-skill/` cases now include recorded example runs from bundled-CLI replays captured on `2026-03-19`, including added coverage for dependency-gated ready sequencing, active task cancellation, and payload-only blocked answers
- the original five `docs/tests/orch-skill/` cases also include recorded real subagent-forward runs captured on `2026-03-19`, with spawned leader and worker agents using the packaged `skills/orch/` and `skills/inbox/` bundles
- a council-review skill forward-test plan directory now exists under `docs/tests/council-review-skill/`, with a shared execution contract and nine council workflow scenarios covering end-to-end flow, unanimous-only defaults, timeout/before-tally errors, explicit minority reporting, invalid report filters, strict tally semantics, malformed reviewer JSON, and target-file inputs
- an execution-roadmap workflow now exists under `docs/roadmaps/active/` and `docs/roadmaps/archive/` for agent-level work traces and completion archives
- a repo-local `scripts/package_skill_clis.sh` packaging flow now builds bundled skill CLI assets for `inbox`, `orch`, and `council-review`
- `orch` now implements `run init/show`, `task add`, `dep add`, `ready`, `dispatch`, `reconcile`, `wait`, `blocked`, `answer`, `retry`, `reassign`, `cancel`, `cleanup`, and `status`
- `orch` can create runs, gate tasks through dependencies, dispatch work through `inbox`, reconcile worker thread state back into task state, answer blocked tasks, retry or reassign work, cancel tasks or runs, clean attempt worktrees, and create per-attempt Git worktrees during strict dispatch
- `orch dispatch` now supports `--repo-path`, `--workspace-root`, and `--strict-worktree`, auto-enables strict worktree mode for code-like tasks inferred from task metadata, resolves committed base revisions, records workspace metadata on attempts, and writes that metadata into inbox task payloads
- `orch wait` now blocks on run-scoped task events and reconciles inbox state while polling so leader waits can wake on worker progress without manual sleep loops
- `orch council start` now creates a dedicated council run, persists council target input metadata, and dispatches the three fixed reviewer roles through the existing scheduler
- `orch council wait` now blocks until the three reviewer tasks reach terminal states or a timeout is reached
- `orch council tally` now parses completed reviewer outputs, persists `council_findings`, groups recommendations into `consensus`, `majority`, and `minority`, and persists `council_groups`
- `orch council report` now reads persisted `council_groups`, renders human-readable markdown reports, writes markdown artifacts, and persists final report metadata in `council_reports`
- automated integration tests now cover the main `orch` scheduler slice, including dependency gating, dispatch, blocked-answer flow, retry, reassign, cancel, cleanup, strict worktree creation, automatic code-task worktree enablement, dirty-repo rejection rules, wait wake/timeout behavior, and council start/wait/tally/report behavior
- additional `orch` command and workflow contract tests now cover the full documented Markdown case set under `docs/tests/orch/`, including `run init/show`, `task add` validation, ready ordering, dispatch attempt/thread contracts, blocked latest-question output, answer payload-only and empty-input rejection, cleanup selector and no-match errors, status summaries, reconcile failed-state mapping, strict-worktree dispatch-to-cleanup, and council report default/error behavior

This means the project now has a working `orch` core scheduler with automatic worktree selection for code-like tasks, strict worktree-backed dispatch, the main leader-side control loop, and the full v1 council workflow from start through final report generation.

## Source Of Truth

Read these docs first:

- [architecture.md](/home/kurihada/project/ai-workflow-skill/docs/architecture.md)
- [inbox-cli.md](/home/kurihada/project/ai-workflow-skill/docs/inbox-cli.md)
- [orch-cli.md](/home/kurihada/project/ai-workflow-skill/docs/orch-cli.md)
- [worktree-execution.md](/home/kurihada/project/ai-workflow-skill/docs/worktree-execution.md)
- [council-review.md](/home/kurihada/project/ai-workflow-skill/docs/council-review.md)

Use this roadmap for implementation order, not for protocol design.

## Project Goal

Build a Go-based local agent orchestration stack with:

- `inbox`: worker-facing durable coordination bus
- `orch`: leader-facing scheduler and control plane
- strict worktree-backed execution for code-writing task attempts
- `council-review`: a user-facing three-reviewer brainstorm workflow implemented on top of `orch`

## Implementation Principles

- Do not redesign the protocol unless implementation reveals a real contradiction.
- Keep `inbox` and `orch` as separate CLIs or command groups, but share one SQLite file.
- Prefer one small working path over broad unfinished scaffolding.
- Make JSON output stable early.
- Implement the happy path first, then add wait/retry/cleanup.

## Recommended v1 Order

### Progress Snapshot

Current implementation status:

- `Milestone 1: Go Skeleton` is complete
- `Milestone 2: Shared DB Layer` is complete enough for both CLIs
- `Milestone 3: Inbox Happy Path` is complete
- `Milestone 4: Orch Core Scheduling` is complete for the current non-worktree scheduler scope
- `Milestone 5: Strict Worktree Support` is complete
- `Milestone 6: Waiting Primitives` is complete
- `Milestone 7: Council Review` is complete

The council review v1 surface is now complete, including final report rendering and metadata persistence.

### Milestone 1: Go Skeleton

Goal:

- initialize the Go module
- choose CLI framework and SQLite driver
- create package layout
- make empty commands compile

Recommended shape:

- `cmd/inbox`
- `cmd/orch`
- `internal/db`
- `internal/store`
- `internal/protocol`
- `internal/cli`

Definition of done:

- `go build ./...` succeeds
- `inbox --help` works
- `orch --help` works

Status:

- completed

### Milestone 2: Shared DB Layer

Goal:

- create the SQLite connection layer
- enable required pragmas
- add schema initialization and migration mechanism

Minimum scope:

- communication tables for `inbox`
- scheduling tables for `orch`
- shared `events` table

Definition of done:

- `inbox init` initializes the database
- `orch` can open the same database successfully

Status:

- completed for the current needs of both `inbox` and `orch`

Completed so far:

- shared DB open layer exists
- required SQLite pragmas are applied
- embedded schema files exist
- `inbox init` applies schema successfully

Remaining:

- decide whether `orch` should gain an explicit DB bootstrap check or continue to rely on `inbox init`
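
As a rough illustration of the migration mechanism named in this milestone, the sketch below selects which schema steps still need to be applied. The `Migration` type, the integer version scheme, and the `threads` table are assumptions made for this example; the project's embedded schema files may be organized differently.

```go
package main

import (
	"fmt"
	"sort"
)

// Migration is a hypothetical versioned schema step; the real project
// embeds its schema files and may track versions differently.
type Migration struct {
	Version int
	SQL     string
}

// PendingMigrations returns the migrations with a version greater than
// current, sorted ascending so they can be applied in order.
func PendingMigrations(all []Migration, current int) []Migration {
	pending := make([]Migration, 0, len(all))
	for _, m := range all {
		if m.Version > current {
			pending = append(pending, m)
		}
	}
	sort.Slice(pending, func(i, j int) bool {
		return pending[i].Version < pending[j].Version
	})
	return pending
}

func main() {
	all := []Migration{
		{Version: 2, SQL: "CREATE TABLE events (id INTEGER PRIMARY KEY)"},
		{Version: 1, SQL: "CREATE TABLE threads (id INTEGER PRIMARY KEY)"},
	}
	for _, m := range PendingMigrations(all, 1) {
		fmt.Println("would apply version", m.Version)
	}
}
```

Keeping the selection step pure makes it easy to unit test without opening a database; the caller would then run each pending step inside a transaction and record the new version.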

### Milestone 3: Inbox Happy Path

Goal:

- implement worker-facing coordination primitives first

First commands:

- `inbox init`
- `inbox send`
- `inbox fetch`
- `inbox claim`
- `inbox update`
- `inbox reply`
- `inbox done`
- `inbox fail`
- `inbox show`

Delay if needed:

- `watch`
- `wait-reply`
- `cancel`
- `list`

Definition of done:

- one thread can be created, claimed, updated, replied to, and completed
- all major commands support `--json`
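
One possible shape for a structured JSON error paired with a stable exit code is sketched below. The `CLIError` type, its field names, the `not_found` code, and exit code `3` are all hypothetical; the authoritative contract is whatever `inbox-cli.md` defines.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CLIError is a hypothetical error envelope; the field names and exit
// codes here are illustrative, not the project's documented contract.
type CLIError struct {
	Code     string `json:"code"`
	Message  string `json:"message"`
	ExitCode int    `json:"-"` // used for the process exit status, not serialized
}

func (e *CLIError) Error() string { return e.Code + ": " + e.Message }

// RenderJSON wraps the error in a top-level "error" object so callers
// can distinguish failures from successful payloads.
func RenderJSON(e *CLIError) (string, error) {
	out, err := json.Marshal(map[string]*CLIError{"error": e})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	e := &CLIError{Code: "not_found", Message: "thread does not exist", ExitCode: 3}
	s, err := RenderJSON(e)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	// A real command would finish with os.Exit(e.ExitCode).
}
```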

Status:

- completed

Completed so far:

- `inbox init`
- `inbox send`
- `inbox fetch`
- `inbox claim`
- `inbox renew`
- `inbox update`
- `inbox reply`
- `inbox done`
- `inbox fail`
- `inbox cancel`
- `inbox list`
- `inbox show`
- `inbox watch`
- `inbox wait-reply`

### Milestone 4: Orch Core Scheduling

Goal:

- implement run/task/dependency/attempt orchestration on top of `inbox`

First commands:

- `orch run init`
- `orch task add`
- `orch dep add`
- `orch ready`
- `orch dispatch`
- `orch reconcile`
- `orch blocked`
- `orch answer`
- `orch status`

Delay if needed:

- `retry`
- `reassign`
- `cancel`
- `cleanup`
- `wait`

Definition of done:

- a leader can create a run
- add tasks and dependencies
- dispatch a task through `orch`
- see worker state reflected back after `reconcile`
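
The dependency gating behind `orch ready` can be pictured with a small in-memory sketch. The `Task` type and the status strings `pending` and `done` are assumptions for illustration only; the actual scheduler reads task and dependency state from SQLite.

```go
package main

import "fmt"

// Task is a hypothetical in-memory view of a scheduled task; the real
// scheduler reads this state from SQLite.
type Task struct {
	ID     string
	Status string   // assumed values: "pending", "running", "done"
	Deps   []string // IDs of tasks that must be done first
}

// ReadyTasks returns the IDs of pending tasks whose dependencies are all
// done, preserving input order.
func ReadyTasks(tasks []Task) []string {
	status := make(map[string]string, len(tasks))
	for _, t := range tasks {
		status[t.ID] = t.Status
	}
	var ready []string
	for _, t := range tasks {
		if t.Status != "pending" {
			continue
		}
		ok := true
		for _, d := range t.Deps {
			if status[d] != "done" {
				ok = false
				break
			}
		}
		if ok {
			ready = append(ready, t.ID)
		}
	}
	return ready
}

func main() {
	tasks := []Task{
		{ID: "T1", Status: "done"},
		{ID: "T2", Status: "pending", Deps: []string{"T1"}},
		{ID: "T3", Status: "pending", Deps: []string{"T2"}},
	}
	fmt.Println(ReadyTasks(tasks)) // prints [T2]
}
```

Here `T3` stays gated until `T2` completes, which is the behavior the dependency-gating tests exercise.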

Status:

- completed for the current non-worktree scheduling scope

Completed so far:

- `orch run init`
- `orch run show`
- `orch task add`
- `orch dep add`
- `orch ready`
- `orch dispatch`
- `orch reconcile`
- `orch wait`
- `orch blocked`
- `orch answer`
- `orch retry`
- `orch reassign`
- `orch cancel`
- `orch cleanup`
- `orch status`
- CLI integration tests cover dispatch/reconcile, dependency gating, blocked-answer flow, wait wake/timeout, retry, reassign, cancel, cleanup, and non-ready dispatch rejection

Remaining:

- none for the current scheduler control surface

### Milestone 5: Strict Worktree Support

Goal:

- ensure code-writing tasks execute in isolated worktrees

First scope:

- `orch dispatch` resolves `base_ref`
- strict mode fails when the repo is dirty and no explicit base is provided
- worktree path and branch name are stored on the attempt

Definition of done:

- a code task dispatch creates a real worktree
- the assigned worktree path appears in attempt metadata and inbox payload

Status:

- completed

Completed so far:

- `orch dispatch` can use `--repo-path` to target a source Git repository without relying on the caller's current working directory
- `orch dispatch --strict-worktree` resolves `base_ref` to a concrete commit, defaults to `HEAD` on clean repositories, and rejects dirty repositories when `--base-ref` is omitted
- `orch dispatch` auto-selects worktree mode for code-like tasks inferred from existing task metadata such as worker role and acceptance markers
- dispatch creates a fresh branch and Git worktree per attempt and persists `base_ref`, `base_commit`, `branch_name`, `worktree_path`, and `workspace_status`
- dispatch writes workspace metadata into the inbox task payload for worker runtimes
- reconcile now advances `workspace_status` from `created` to `active`, `completed`, or `abandoned` based on thread state
- `orch cleanup` removes completed or abandoned worktrees and marks attempt workspace state as `cleaned`
- CLI integration tests cover strict worktree creation, auto-enabled worktrees for code-like tasks, explicit-base dispatch on dirty repos, strict dirty-repo rejection, and cleanup

Remaining:

- none

### Milestone 6: Waiting Primitives

Goal:

- replace blind polling with blocking CLI waits

Commands:

- `orch wait`
- `inbox wait-reply`

Definition of done:

- leader can block on new task events
- blocked worker can block on reply events

Status:

- completed

Completed so far:

- `orch wait`
- `inbox wait-reply`
- `orch wait` reconciles inbox state while polling and wakes on matching run-scoped `task_*` events
- CLI integration tests cover wait wake and timeout behavior

### Milestone 7: Council Review

Goal:

- implement the user-facing three-reviewer brainstorming workflow

First commands:

- `orch council start`
- `orch council wait`
- `orch council tally`
- `orch council report`

Definition of done:

- one council run can dispatch three reviewers
- tally groups recommendations into `consensus`, `majority`, and `minority`
- produce stable JSON and a markdown report artifact

Status:

- completed

Completed so far:

- council-specific storage now includes run metadata, reviewer assignment rows, reviewer findings/groups tables, persisted council input references, and final report metadata
- `orch council start`
- `orch council wait`
- `orch council tally`
- `orch council report`
- council start creates a dedicated run, stores council target input metadata, creates reviewer tasks `CR1` through `CR3`, and dispatches the fixed reviewer roles `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer`
- council wait blocks until all three reviewer tasks reach terminal states or timeout
- council tally parses structured reviewer outputs from completed reviewer result messages and persists grouped recommendations
- council report reads grouped recommendations from persisted `council_groups`, supports `--show` bucket filtering, renders markdown report artifacts, and persists report metadata plus artifact paths
- CLI integration tests cover council start dispatch, metadata persistence, council wait wake/timeout behavior, council tally grouping in `normal` and `strict` modes, and council report default/all/JSON rendering behavior

Remaining:

- none for the v1 council workflow
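
To make the three buckets concrete, here is a minimal sketch of count-based grouping for three reviewers. The thresholds (3 supporters = `consensus`, 2 = `majority`, 1 = `minority`), the `Finding` type, and the recommendation keys are assumptions for illustration; the sketch does not model the `normal` and `strict` tally modes, whose exact semantics are defined in `council-review.md`.

```go
package main

import (
	"fmt"
	"sort"
)

// Finding is a hypothetical reviewer recommendation keyed by a normalized
// string; the real tally parses structured reviewer JSON.
type Finding struct {
	Reviewer string
	Key      string
}

// Bucket maps a supporter count to a group name for three reviewers.
// The thresholds are an assumption: 3 = consensus, 2 = majority, 1 = minority.
func Bucket(supporters int) string {
	switch {
	case supporters >= 3:
		return "consensus"
	case supporters == 2:
		return "majority"
	default:
		return "minority"
	}
}

// Tally groups finding keys by how many distinct reviewers raised them.
func Tally(findings []Finding) map[string][]string {
	seen := map[string]map[string]bool{}
	for _, f := range findings {
		if seen[f.Key] == nil {
			seen[f.Key] = map[string]bool{}
		}
		seen[f.Key][f.Reviewer] = true
	}
	groups := map[string][]string{}
	for key, reviewers := range seen {
		b := Bucket(len(reviewers))
		groups[b] = append(groups[b], key)
	}
	for _, keys := range groups {
		sort.Strings(keys) // deterministic output order
	}
	return groups
}

func main() {
	groups := Tally([]Finding{
		{"architecture-reviewer", "add-index"},
		{"implementation-reviewer", "add-index"},
		{"risk-reviewer", "add-index"},
		{"architecture-reviewer", "split-module"},
		{"risk-reviewer", "split-module"},
		{"risk-reviewer", "add-rate-limit"},
	})
	fmt.Println(groups["consensus"], groups["majority"], groups["minority"])
	// prints [add-index] [split-module] [add-rate-limit]
}
```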

## Immediate Next Task

If a new agent is taking over now, the next concrete steps are:

1. treat `Milestone 7: Council Review` as complete unless a new user request introduces a new council capability
2. keep the authored inbox test-plan set in `docs/tests/inbox/` synchronized if future `orch` work changes shared CLI behavior
3. choose the next milestone explicitly instead of reopening the completed council v1 slice

The inbox implementation and its human-readable test-plan set are already in place, and `orch` now supports the main scheduler loop plus the complete council start/wait/tally/report workflow. Any next step should therefore be a new milestone rather than unfinished council v1 work.

## Recommended Driver Choices

Current recommendation:

- CLI framework: `Cobra`
- SQLite driver: pure-Go driver

Reason:

- command surfaces are already command-group heavy
- pure-Go SQLite keeps distribution simpler

## Suggested Early Tests

Completed so far:

- schema init test
- inbox command-level CLI integration coverage aligned to `docs/tests/inbox/`
- inbox workflow lifecycle coverage
- orch scheduler lifecycle coverage for run/task/dependency/dispatch/reconcile
- orch blocked-question and answer coverage
- orch strict worktree creation and dirty-repo policy coverage
- orch wait wake and timeout coverage
- orch retry, reassign, cancel, and cleanup coverage
- orch council start dispatch and persistence coverage
- orch council wait wake and timeout coverage
- orch council tally grouping coverage
- orch council report default markdown, `--show all`, and JSON shape coverage

Still recommended before the codebase grows too much:

- worktree path generation test

## Inbox Test Documentation Roadmap

Status:

- completed for the current inbox CLI surface
- command-level and workflow Markdown documents exist under `docs/tests/inbox/`
- future updates should revise this section only when new inbox commands or materially new CLI-visible behavior are added

Goal:

- make inbox behavior easy for a new agent to understand and convert into automated tests without re-reading all code paths

Directory layout:

- `docs/tests/inbox/README.md`
- `docs/tests/inbox/_shared/README.md`
- `docs/tests/inbox/workflows/README.md`
- `docs/tests/inbox/<command>/README.md`
- `docs/tests/inbox/<command>/<case-slug>.md`

Initial command folders:

- `init`
- `send`
- `fetch`
- `claim`
- `renew`
- `update`
- `reply`
- `done`
- `fail`
- `cancel`
- `list`
- `show`
- `watch`
- `wait-reply`

Documentation rules:

- organize by folder with a `README.md` entrypoint
- command folders use `README.md` as an index only
- each command case lives in its own Markdown file named after the case slug
- do not use numeric test case IDs
- identify command cases by concrete file path
- keep one command per directory, plus `workflows/` for cross-command behavior
- use `_shared/` for common fixtures, database conventions, exit-code rules, and shared JSON assertions

Required per-case structure:

- `用例意义` (case purpose)
- `前置条件` (preconditions)
- `输入` (input)
- `预期输出` (expected output)
- `断言结论` (assertion conclusions)

Case file naming pattern:

- `<stable-slug>.md`

Authoring order:

1. global conventions in `docs/tests/inbox/README.md`
2. shared fixtures and assertion helpers in `docs/tests/inbox/_shared/README.md`
3. lifecycle flow in `docs/tests/inbox/workflows/README.md`
4. core command docs: `send`, `fetch`, `claim`, `reply`, `done`, `show`
5. secondary command docs: `renew`, `update`, `fail`, `cancel`, `list`
6. waiting and read-state docs: `watch`, `wait-reply`, unread and mark-read workflow cases

Definition of done:

- every implemented inbox command has a dedicated document directory
- every documented case contains concrete input and expected output
- shared assumptions are centralized instead of copied into each command file
- a new agent can pick any case and implement it as an automated test with minimal additional discovery

## Orch Test Documentation Roadmap

Status:

- current planned `orch` Markdown test-plan set is authored under `docs/tests/orch/`
- global conventions, shared fixtures, workflow scenarios, per-command indexes, and concrete case documents now exist
- `docs/tests/orch/ROADMAP.md` now tracks authored counts, document progress, and future additions in the same style used for `docs/tests/inbox/ROADMAP.md`
- supplemental command-visible cases now cover high-value gaps in `task add`, `ready`, `answer`, `cleanup`, and `council report`

Goal:

- make `orch` behavior easy for a new agent to understand and convert into automated tests or manual validation steps without re-reading all scheduler code paths

Directory layout:

- `docs/tests/orch/README.md`
- `docs/tests/orch/ROADMAP.md`
- `docs/tests/orch/_shared/README.md`
- `docs/tests/orch/workflows/README.md`
- `docs/tests/orch/<leaf-command>/README.md`
- `docs/tests/orch/<leaf-command>/<case-slug>.md`

Current document model:

- one folder per implemented leaf command
- each command folder uses `README.md` as an index only
- workflow cases live in `docs/tests/orch/workflows/README.md`
- detailed case backlog and authored-case register are tracked centrally in `docs/tests/orch/ROADMAP.md`

Next documentation step:

- keep `docs/tests/orch/ROADMAP.md` synchronized when new `orch` CLI behavior or workflow cases are added, removed, or materially revised

## Out Of Scope For First Pass

Do not block v1 on these:

- advanced auth or permissions
- background daemons beyond blocking CLI commands

## Handoff Notes For Future Agents

- The design phase is complete, and all seven v1 milestones listed above are implemented.
- Avoid reopening major design questions unless implementation forces it.
- The repository already has compiling binaries and working schema init.
- The inbox test-plan docs are in place; keep them synchronized whenever further `orch` work changes shared CLI behavior.
- inbox command test-plan folders use `README.md` as an index plus one file per case; keep any further structural changes consistent with the documented rules above.
- Preserve the separation:
  - `inbox` handles communication
  - `orch` handles scheduling
  - `council-review` is a workflow on top of `orch`
- When writing inbox test docs, use the folder-per-command structure described above and keep cross-command cases inside `docs/tests/inbox/workflows/`.
- Treat this file as the implementation entrypoint for new agents.