Remove markdown test docs and document tests inline

2026-03-24 02:30:38 +08:00
parent fb2b2dc8be
commit fd2b57feaf
257 changed files with 174 additions and 10431 deletions
@@ -96,15 +96,6 @@ Examples:
 │  ├─ package_skill_runtimes.sh
 │  ├─ skill-bundles.json
 │  └─ ...
-└─ docs/tests/
-   ├─ inbox/
-   ├─ orch/
-   ├─ repo-memory/
-   ├─ inbox-skill/
-   ├─ orch-skill/
-   ├─ council-review-skill/
-   ├─ repo-memory-skill/
-   └─ ...
 ```

 ## Package Boundaries
@@ -314,28 +305,16 @@ Each runtime package owns:
 - integration tests
 - package-local fixtures

-### CLI Markdown Test Plans
+### Test Intent Documentation

-Standalone CLIs with user-facing contracts should also keep a Markdown test-plan
-set under `docs/tests/<cli>/`.
+User-facing test intent should live with the executable tests, not in a separate
+Markdown plan tree.

-Examples:
+Required shape:

- `docs/tests/inbox/`
- `docs/tests/orch/`
- `docs/tests/repo-memory/`
-
-### Skill Forward Tests
-
-`docs/tests/*-skill/` remains skill-oriented.
-These tests validate the bundled skill behavior, not only the runtime package.
-
-Examples:
-
- `docs/tests/inbox-skill/`
- `docs/tests/orch-skill/`
- `docs/tests/council-review-skill/`
- `docs/tests/repo-memory-skill/`
+- add a short comment above each top-level test describing the behavior it protects
+- prefer package-local fixtures and helpers over cross-repo prose test plans
+- keep bundled-skill verification as executable tests or scripts, not as standalone Markdown inventories
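
A minimal sketch of the "short comment above each top-level test" rule, assuming a Python `unittest` suite. `parse_show` is a hypothetical stand-in for report-bucket parsing, written only so the example runs; it is not the real `orch` implementation. The bucket names and error text are taken from the removed `council report --show` test plans.

```python
import unittest

ALLOWED = ("consensus", "majority", "minority")


def parse_show(value: str) -> list[str]:
    """Hypothetical stand-in for report-bucket parsing; not the real orch code."""
    if value == "all":
        return list(ALLOWED)
    buckets = value.split(",")
    if any(bucket not in ALLOWED for bucket in buckets):
        raise ValueError("show must contain consensus, majority, minority, or all")
    return buckets


class ParseShowTest(unittest.TestCase):
    # Protects: `--show all` expands to every bucket, including minority.
    def test_all_expands_to_every_bucket(self):
        self.assertEqual(parse_show("all"), ["consensus", "majority", "minority"])

    # Protects: an unknown bucket is rejected instead of being silently dropped.
    def test_unknown_bucket_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_show("consensus,invalid")
```

Saved as a test module, this would run with `python -m unittest`. Each comment states the behavior being protected, which is the intent the Markdown plans used to carry.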

 ### Cross-Package Validation

@@ -351,7 +330,7 @@ Keep documentation split by concern:

 - runtime/package docs live under the owning package when tightly tied to implementation
 - cross-workspace architecture docs stay in root `docs/`
- skill forward-test plans stay in `docs/tests/*-skill/`
+- test intent stays in executable test source through short comments above top-level test cases

 This document becomes the repository-level source of truth for the workspace
 split.
@@ -424,8 +403,6 @@ Changes:
 - move the exploratory repo-memory runtime into `packages/repo-memory-runtime`
 - normalize module pathing, tests, and packaging
 - add `skills/repo-memory`
- add `docs/tests/repo-memory/`
- add `docs/tests/repo-memory-skill/`

 Exit criteria:

@@ -1,182 +0,0 @@
-# Council Review Skill Test Plan
-
-## Purpose
-
-This directory tracks human-readable test plans for the `skills/council-review/` Codex skill bundle.
-
-These documents are not command-contract specs for the `orch council` CLI itself.
-That coverage already lives under [../orch/](../orch/).
-
-This directory exists to describe a different test surface:
-
- whether a leader agent can actually use the packaged `council-review` skill
- whether the bundled `./assets/orch` CLI works inside real skill-guided council workflows
- whether a council run driven by the skill reaches the expected reviewer, grouping, tally, and report state
-
-## Test Model
-
- `README.md` is the index for this directory
- each skill test case lives in its own Markdown file
- use stable case slugs in filenames
-
-## Shared Execution Contract
-
-Use these defaults unless a case file explicitly overrides them:
-
- run the scenario with real subagents, not simulated transcripts
- inject `skills/council-review/` into the leader agent
- inject `skills/inbox/` into reviewer agents whenever reviewer task completion is required
- initialize the shared SQLite DB before launching role agents with `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
- require the leader to coordinate through the bundled `./assets/orch` CLI from the council-review skill instead of ordinary chat
- require reviewer agents to coordinate through the bundled `./assets/inbox` CLI from their skill instead of ordinary chat
- validate final council run, reviewer task state, and report state independently from the main thread after the agents stop
- create any required target-file or repo fixture before launching agents for target-file, mixed, or repo-target cases
-
-## How An Agent Runs These Cases
-
-Use one test-runner agent to execute each case.
-
-The test-runner agent is responsible for:
-
- reading this `README.md` first, then one specific case file
- creating an isolated temporary directory and DB path for that run
- initializing the DB once through the bundled inbox CLI before launching role agents
- creating any required temporary target file or Git repo fixture before launching role agents
- launching the role agents described in `Agent Topology`
- injecting `skills/council-review/` into the leader and `skills/inbox/` into reviewers
- passing each role agent the prompt text from the case file with concrete values substituted for `COUNCIL_SKILL_PATH`, `INBOX_SKILL_PATH`, `TMPDIR`, `RUN_ID`, `THREAD_ID`, and `REPORT_PATH` when needed
- coordinating launch order or parallel start according to the case file
- collecting agent final summaries as evidence
- resolving final run ids, thread ids, and report artifact paths from agent outputs
- running the `Validation Commands` from the main thread after the role agents stop
- comparing the observed results against `Expected Outcomes` and `Assertions`
- returning a final pass/fail judgment with concrete evidence
-
-The role agents are responsible for:
-
- acting only within the role assigned in the case file
- using the injected skill bundle rather than ad hoc repository discovery
- coordinating through the bundled CLI and shared DB
- reporting concrete run ids, thread ids, report artifact paths, and key command outcomes back to the test-runner agent
-
-The test-runner agent should treat a case as passed only when:
-
- all role agents reach a final state without violating the case contract
- the independent validation commands succeed
- the final council, orch, and inbox state matches the assertions in the case file
-
-The test-runner agent should treat a case as failed when:
-
- any required agent times out or stalls
- a required council, orch, or inbox action is skipped
- the leader falls back to ordinary chat for workflow control that should go through the bundled council-review skill
- reviewer agents fall back to ordinary chat instead of returning results through inbox
- the final council grouping, summary, or report state conflicts with the documented assertions
-
-The test-runner agent should report results in this shape:
-
- `case`
- `db_path`
- `run_id`
- `thread_ids`
- `report_paths`
- `result`: `pass` or `fail`
- `agent_summaries`
- `validation_evidence`
- `assertion_checklist`
- `notes`
-
-## Default Timeouts
-
-Use these defaults unless a case file explicitly overrides them:
-
- per-agent timeout: `4m`
- overall scenario timeout: `6m`
- async wait margin for the main thread: `45s`
-
-## Default Failure Conditions
-
-Treat the test as failed if any of the following happens:
-
- any required agent does not reach a final state before timeout
- any required council, orch, or inbox command returns a non-success result unless the case expects that failure
- the final `council report --json` output does not match the expected grouped recommendations
- the final `orch status` output does not match the expected reviewer task state
- a required markdown report artifact is missing when the case expects one
- the agents fall back to ordinary chat for critical coordination instead of the bundled CLIs
-
-## Evidence Capture
-
-Collect at least the following artifacts for every run:
-
- agent final summaries
- final `council report --json` output when the case reaches report stage
- final `orch status --run RUN_ID --json` output
- final `inbox show --thread THREAD_ID --json` output for every relevant reviewer thread when reviewers participated
- any `council wait` or `council tally` output relevant to the case
- the temporary DB path, resolved run id, resolved thread ids, and any report artifact paths
-
-## Cleanup Policy
-
-Use these defaults unless a case file explicitly overrides them:
-
- keep the temporary DB, repo fixture, and working directory on failure for debugging
- cleanup the temporary working directory on success only if the caller does not need replay artifacts
-
-## Per-Case Template
-
-Each case file should use this structure:
-
- `Test Type`
- `Purpose`
- `Preconditions`
- `Agent Topology`
- `Inputs`
- `Execution Parameters`
- `Execution Steps`
- `Validation Commands`
- `Expected Outcomes`
- `Assertions`
- `Cleanup`
- `Recorded Example Run` when a real run has already been captured
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `council-brainstorm-end-to-end-through-bundled-cli` | [council-brainstorm-end-to-end-through-bundled-cli.md](./council-brainstorm-end-to-end-through-bundled-cli.md) | validates that the council-review skill can drive `start -> wait -> tally -> report` with three real reviewer agents |
-| `council-unanimous-only-default-report-through-bundled-cli` | [council-unanimous-only-default-report-through-bundled-cli.md](./council-unanimous-only-default-report-through-bundled-cli.md) | validates that unanimous-only runs default to `consensus` output while preserving the underlying summary counts |
-| `council-wait-timeout-through-bundled-cli` | [council-wait-timeout-through-bundled-cli.md](./council-wait-timeout-through-bundled-cli.md) | validates that the leader sees the expected timeout contract when reviewer tasks do not complete |
-| `council-report-rejects-before-tally-through-bundled-cli` | [council-report-rejects-before-tally-through-bundled-cli.md](./council-report-rejects-before-tally-through-bundled-cli.md) | validates that the skill surfaces the stable invalid-state error when report is attempted before tally |
-| `council-report-show-all-includes-minority-through-bundled-cli` | [council-report-show-all-includes-minority-through-bundled-cli.md](./council-report-show-all-includes-minority-through-bundled-cli.md) | validates that an explicit `--show all` report includes the otherwise hidden minority group |
-| `council-report-rejects-invalid-show-through-bundled-cli` | [council-report-rejects-invalid-show-through-bundled-cli.md](./council-report-rejects-invalid-show-through-bundled-cli.md) | validates that the leader sees the stable `invalid_input` contract for an invalid report bucket selection |
-| `council-tally-strict-keeps-distinct-proposals-through-bundled-cli` | [council-tally-strict-keeps-distinct-proposals-through-bundled-cli.md](./council-tally-strict-keeps-distinct-proposals-through-bundled-cli.md) | validates that strict similarity preserves near-duplicate wording as separate minority groups |
-| `council-reviewer-output-invalid-json-fails-tally-through-bundled-cli` | [council-reviewer-output-invalid-json-fails-tally-through-bundled-cli.md](./council-reviewer-output-invalid-json-fails-tally-through-bundled-cli.md) | validates that malformed reviewer result JSON reaches the leader as the stable tally-time `invalid_input` contract |
-| `council-start-with-target-file-through-bundled-cli` | [council-start-with-target-file-through-bundled-cli.md](./council-start-with-target-file-through-bundled-cli.md) | validates that the skill can start a council run from explicit `--target-file` context instead of a pure inline prompt |
-
-## Scope
-
-In scope:
-
- explicit `$council-review` skill invocation
- bundled `./assets/orch` CLI usage for `orch council ...`
- end-to-end council start, wait, tally, and report flows
- interaction between a leader using `skills/council-review/` and reviewers using `skills/inbox/`
- default report policy, explicit minority inclusion, and invalid report-filter validation
- normal and strict tally behavior
- malformed reviewer-output failure paths
- non-prompt target context including `target-file`
-
-Out of scope:
-
- per-command flag and JSON contract coverage for `orch council`
- generic leader orchestration flows that already belong under [../orch-skill/](../orch-skill/)
- worker-only skill behavior that belongs under [../inbox-skill/](../inbox-skill/)
- implicit skill triggering without `$council-review`
-
-## Relationship To Other Test Docs
-
- [../orch/](../orch/) covers CLI command behavior
- [../orch-skill/](../orch-skill/) covers generic leader-side orchestration behavior on top of `orch`
- [../inbox-skill/](../inbox-skill/) covers worker-side skill-guided behavior on top of inbox
- this directory covers the separate user-facing `council-review` skill on top of `orch council`
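
The result shape and pass criteria in the removed README above could be captured in code roughly as follows. This is a sketch only: the field names come from the README's result list, while the field types, the `CaseResult` and `judge` names, and the three-flag signature are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CaseResult:
    """Result record mirroring the README's reporting shape (types are assumed)."""
    case: str
    db_path: str
    run_id: str
    result: str  # must be "pass" or "fail"
    thread_ids: dict[str, str] = field(default_factory=dict)
    report_paths: list[str] = field(default_factory=list)
    agent_summaries: dict[str, str] = field(default_factory=dict)
    validation_evidence: list[str] = field(default_factory=list)
    assertion_checklist: dict[str, bool] = field(default_factory=dict)
    notes: str = ""

    def __post_init__(self) -> None:
        if self.result not in ("pass", "fail"):
            raise ValueError(f"result must be 'pass' or 'fail', got {self.result!r}")


def judge(agents_finished: bool, validation_ok: bool, checklist: dict[str, bool]) -> str:
    """Pass only when agents finished, validation succeeded, and every assertion held.

    Note: an empty checklist counts as satisfied in this sketch.
    """
    ok = agents_finished and validation_ok and all(checklist.values())
    return "pass" if ok else "fail"
```

A test runner might build the checklist from a case's `Assertions` list and pass the verdict from `judge` into `CaseResult.result`.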
@@ -1,108 +0,0 @@
-# Case: `council-brainstorm-end-to-end-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a high-level council workflow validation.
-
-The goal is to verify that a leader using the packaged `council-review` skill can drive `council start -> wait -> tally -> report` while three real reviewer agents return structured outputs through the packaged inbox skill.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can use the bundled `./assets/orch` CLI through the council-review skill
- three reviewer agents can claim and complete their fixed-role inbox tasks
- the leader can wait, tally, and report after all reviewer outputs arrive
- the final report defaults to `consensus,majority`
- a markdown report artifact is written
-
-## Preconditions
-
- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
- `architecture-reviewer`
- `implementation-reviewer`
- `risk-reviewer`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_001 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) tally with normal similarity, 4) report with default settings, 5) stop after reporting RUN_ID and REPORT_PATH. Do not use ordinary chat to coordinate with the reviewers.
-```
-
-### Architecture Reviewer Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as architecture-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This lowers coupling.","confidence":"high","tags":["architecture"],"target_refs":{"repo_path":"."}},{"title":"Share helpers","summary":"Council report rendering paths are repeated.","proposal":"Introduce shared council coordinator helpers for report rendering.","rationale":"This keeps report assembly consistent.","confidence":"medium","tags":["reporting"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
-```
-
-### Implementation Reviewer Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as implementation-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"high","tags":["maintainability"],"target_refs":{"repo_path":"."}},{"title":"Reuse report helpers","summary":"Formatting logic should stay shared.","proposal":"Introduce shared council coordinator helpers for report rendering","rationale":"This avoids formatter drift.","confidence":"medium","tags":["reporting"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
-```
-
-### Risk Reviewer Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"risk-reviewer","findings":[{"title":"Lock contracts","summary":"Contract drift becomes risky over time.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This reduces integration regressions.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}},{"title":"Cover JSON output","summary":"The council report response should stay stable.","proposal":"Add regression tests for council report JSON output.","rationale":"This catches contract regressions earlier.","confidence":"high","tags":["testing"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/council-review/` into `leader`
-3. Inject `skills/inbox/` into the three reviewer agents
-4. Point all agents at the same database path `TMPDIR/coord.db`
-5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
-6. Wait for all agents to finish
-7. Resolve `RUN_ID=council_skill_001`, reviewer `THREAD_ID`s, and `REPORT_PATH` from the agent outputs
-8. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run council_skill_001
-COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council report --run council_skill_001
-test -f REPORT_PATH
-```
-
-## Expected Outcomes
-
- the leader successfully starts `council_skill_001`
- all three reviewers complete their fixed-role tasks
- `council wait` returns `all_complete == true`
- `council tally` returns one `consensus`, one `majority`, and one `minority`
- `council report` defaults to showing `consensus,majority`
- a markdown report artifact exists on disk
-
-## Assertions
-
- `status.data.run.status == "done"`
- `status.data.tasks` contains exactly three reviewer tasks and all are `done`
- `report.data.show == ["consensus","majority"]`
- `report.data.summary.consensus == 1`
- `report.data.summary.majority == 1`
- `report.data.summary.minority == 1`
- `report.data.grouped_recommendations` length is `2`
- `REPORT_PATH` exists
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
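
In line with the commit's move toward executable tests, the `Assertions` list of the case above could be expressed as a check over the parsed `status` and `report` JSON. This is a hedged sketch: the field paths follow the assertions as written, but the per-task `"status"` key and the `check_brainstorm_case` name are assumptions, and the `REPORT_PATH` existence check is left out because it needs the real filesystem.

```python
def check_brainstorm_case(status: dict, report: dict) -> list[str]:
    """Return the assertions that failed; an empty list means the case passed."""
    failures: list[str] = []
    run = status["data"]["run"]
    tasks = status["data"]["tasks"]
    data = report["data"]

    if run["status"] != "done":
        failures.append("status.data.run.status != 'done'")
    # Assumption: each task object exposes its state under a "status" key.
    if len(tasks) != 3 or any(task.get("status") != "done" for task in tasks):
        failures.append("expected exactly three reviewer tasks, all done")
    if data["show"] != ["consensus", "majority"]:
        failures.append("report.data.show is not the default bucket list")
    for bucket in ("consensus", "majority", "minority"):
        if data["summary"].get(bucket) != 1:
            failures.append(f"report.data.summary.{bucket} != 1")
    if len(data["grouped_recommendations"]) != 2:
        failures.append("grouped_recommendations length != 2")
    return failures
```

Returning the list of failures, instead of raising on the first one, keeps the full assertion checklist available as evidence for a failed run.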
@@ -1,73 +0,0 @@
-# Case: `council-report-rejects-before-tally-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and an invalid-state council workflow validation.
-
-The goal is to verify that a leader using the packaged `council-review` skill sees the expected stable error when report is attempted before grouped recommendations have been persisted.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can start a council run through the bundled council-review skill
- the leader can attempt report without tally
- the command returns the stable invalid-state contract rather than fabricating an empty report
-
-## Preconditions
-
- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- bundled CLI executable exists at `COUNCIL_SKILL_PATH/assets/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_004 with a short review target, 2) attempt council report immediately without running tally, 3) stop after reporting RUN_ID, exit code, and error payload. Do not use ordinary chat to simulate reviewer output.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/council-review/` into `leader`
-3. Point the leader at the database path `TMPDIR/coord.db`
-4. Launch the leader
-5. Wait for the leader to finish
-6. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council report --run council_skill_004
-```
-
-## Expected Outcomes
-
- the leader successfully starts `council_skill_004`
- the report command exits with the stable invalid-state contract
- the error message indicates that council tally must run first
-
-## Assertions
-
- command exit code is `30`
- error code is `invalid_state`
- the error message mentions that grouped recommendations are not available yet or that `council tally` must run first
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
@@ -1,102 +0,0 @@
-# Case: `council-report-rejects-invalid-show-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and an invalid-input report-filter validation.
-
-The goal is to verify that a leader using the packaged `council-review` skill reaches the stable `invalid_input` error contract when it asks `council report` for an unsupported bucket list.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can drive a real council run through `start -> wait -> tally`
- three reviewer agents can complete their tasks through the packaged inbox skill
- the leader can attempt `council report --show consensus,invalid`
- the skill surfaces the stable `invalid_input` error instead of silently dropping the bad bucket
-
-## Preconditions
-
- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
- `architecture-reviewer`
- `implementation-reviewer`
- `risk-reviewer`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_006 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) tally with normal similarity, 4) attempt council report with --show consensus,invalid, 5) stop after reporting RUN_ID, exit code, and the error payload you observed. Do not use ordinary chat to coordinate with the reviewers.
-```
-
-### Reviewer Prompts
-
- Reuse the same reviewer body JSON and inbox-only workflow as in [council-brainstorm-end-to-end-through-bundled-cli.md](./council-brainstorm-end-to-end-through-bundled-cli.md), but target run `council_skill_006`.
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/council-review/` into `leader`
-3. Inject `skills/inbox/` into the three reviewer agents
-4. Point all agents at the same database path `TMPDIR/coord.db`
-5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
-6. Wait for all agents to finish
-7. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council report --run council_skill_006 --show consensus,invalid
-```
-
-## Expected Outcomes
-
- the leader successfully starts `council_skill_006`
- reviewer completion and tally both succeed before the invalid report attempt
- the report command exits with the stable invalid-input contract
- the error message names the accepted bucket values
-
-## Assertions
-
- command exit code is `30`
- error code is `invalid_input`
- the error message mentions `consensus`
- the error message mentions `majority`
- the error message mentions `minority`
- the error message mentions `all`
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
-
-## Recorded Real Forward Run
-
- recorded on: `2026-03-19`
- execution mode: `real_subagent_forward_test`
- result: `pass`
- evidence root: `/tmp/council-skill-invalid-show-narrow.Sw6so6`
- observed run id: `council_skill_006`
- observed thread ids:
- `architecture-reviewer`: `thr_7fad634dd9d245239d4fbd2287992d54`
- `implementation-reviewer`: `thr_fc76cff125f04fc491064b828a18ff69`
- `risk-reviewer`: `thr_f421bf49fa1240beb5c7a2d5f38aab6b`
- evidence summary:
- main-thread `status --run council_skill_006 --json` returned `run.status == "done"` and `task_counts.done == 3`
- main-thread `council report --run council_skill_006 --show consensus,invalid --json` exited with code `30`
- the returned error payload was `invalid_input` with message `show must contain consensus, majority, minority, or all`
@@ -1,107 +0,0 @@
-# Case: `council-report-show-all-includes-minority-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and an explicit report-filter validation.
-
-The goal is to verify that a leader using the packaged `council-review` skill can override the default report buckets and explicitly request the minority group through the bundled CLI.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can drive a complete `start -> wait -> tally -> report` council flow through the bundled council-review skill
- three reviewer agents can complete their tasks through the packaged inbox skill
- the leader can request `council report --show all`
- the final report includes `consensus`, `majority`, and `minority`
-
-## Preconditions
-
- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
- `architecture-reviewer`
- `implementation-reviewer`
- `risk-reviewer`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_005 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) tally with normal similarity, 4) report with --show all, 5) stop after reporting RUN_ID, REPORT_PATH, and the show buckets you observed. Do not use ordinary chat to coordinate with the reviewers.
-```
-
-### Reviewer Prompts
-
- Reuse the same reviewer body JSON and inbox-only workflow as in [council-brainstorm-end-to-end-through-bundled-cli.md](./council-brainstorm-end-to-end-through-bundled-cli.md), but target run `council_skill_005`.
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/council-review/` into `leader`
-3. Inject `skills/inbox/` into the three reviewer agents
-4. Point all agents at the same database path `TMPDIR/coord.db`
-5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
-6. Wait for all agents to finish
-7. Resolve `RUN_ID=council_skill_005` and `REPORT_PATH` from the agent outputs
-8. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council report --run council_skill_005 --show all
-test -f REPORT_PATH
-```
-
-## Expected Outcomes
-
- the leader successfully starts `council_skill_005`
- all three reviewers complete their fixed-role tasks
- the report succeeds with explicit `show == ["consensus","majority","minority"]`
- the minority recommendation is present in `grouped_recommendations`
- a markdown report artifact exists on disk
-
-## Assertions
-
- `report.data.show == ["consensus","majority","minority"]`
- `report.data.summary.consensus == 1`
- `report.data.summary.majority == 1`
- `report.data.summary.minority == 1`
- `report.data.grouped_recommendations` length is `3`
- at least one returned recommendation has `bucket == "minority"`
- `REPORT_PATH` exists
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
-
-## Recorded Real Forward Run
-
- recorded on: `2026-03-19`
- execution mode: `real_subagent_forward_test`
- result: `pass`
- evidence root: `/tmp/council-skill-show-all-narrow.Uk0ThB`
- observed run id: `council_skill_005`
- observed thread ids:
- `architecture-reviewer`: `thr_c4cb0a9a5dd142619e854fc0f3864ea8`
- `implementation-reviewer`: `thr_3a54f2e1bc6945f38627958f7f6b4728`
- `risk-reviewer`: `thr_16765453dedf45b4a6ccf4ecfab710db`
- observed report path: `/tmp/council-skill-show-all-narrow.Uk0ThB/.orch/reports/council_skill_005.md`
- evidence summary:
- main-thread `status --run council_skill_005 --json` returned `run.status == "done"` and `task_counts.done == 3`
- main-thread `council report --run council_skill_005 --show all --json` returned `show == ["consensus","majority","minority"]`, summary counts `1/1/1`, and `grouped_recommendations` length `3`
- the returned groups included a `minority` bucket and the markdown artifact existed on disk
@@ -1,126 +0,0 @@
-# Case: `council-reviewer-output-invalid-json-fails-tally-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a malformed-reviewer-output validation.
-
-The goal is to verify that a leader using the packaged `council-review` skill reaches the stable tally-time `invalid_input` contract when one reviewer completes its inbox task with malformed council JSON.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can start a real council run through the bundled council-review skill
- all three reviewer tasks can still reach terminal `done` state through the packaged inbox skill
- one reviewer can return malformed JSON in the result body
- the leader sees `council tally` fail with the expected invalid-input error instead of a silent partial tally
- malformed JSON serves as the representative case for the reviewer-output validation layer, which also rejects a missing `reviewer_role` and role mismatches
-
-## Preconditions
-
- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
- `architecture-reviewer`
- `implementation-reviewer`
- `risk-reviewer`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_008 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) attempt council tally with normal similarity, 4) stop after reporting RUN_ID, exit code, and the error payload you observed. Do not use ordinary chat to coordinate with the reviewers.
-```
-
-### Architecture Reviewer Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as architecture-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill.
-
-Workflow:
-1) fetch and claim your assigned council task
-2) write TMPDIR/architecture-invalid.json containing exactly this invalid JSON body:
-{"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module."}
-3) complete the task with done using summary "Review complete" and --body-file TMPDIR/architecture-invalid.json
-4) stop after reporting THREAD_ID and the body file path
-
-Do not use ordinary chat to coordinate with the leader.
-```
-
-### Implementation Reviewer Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as implementation-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract API contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"medium","tags":["maintainability"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
-```
-
-### Risk Reviewer Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"risk-reviewer","findings":[{"title":"Add auth integration tests","summary":"Login regressions are hard to catch.","proposal":"Add integration tests for auth flows.","rationale":"This catches regressions earlier.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/council-review/` into `leader`
-3. Inject `skills/inbox/` into the three reviewer agents
-4. Point all agents at the same database path `TMPDIR/coord.db`
-5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
-6. Wait for all agents to finish
-7. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council wait --run council_skill_008 --timeout-seconds 2
-COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council tally --run council_skill_008 --similarity normal
-```
-
-## Expected Outcomes
-
- all three reviewer tasks still reach terminal `done`
- `council wait` returns `all_complete == true`
- `council tally` exits with the stable invalid-input contract
- the error message indicates that reviewer output must be valid JSON
-
-## Assertions
-
- `wait.data.all_complete == true`
- command exit code for `council tally` is `30`
- error code is `invalid_input`
- the error message mentions `reviewer output must be valid JSON`
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
-
-## Recorded Real Forward Run
-
- recorded on: `2026-03-19`
- execution mode: `real_subagent_forward_test`
- result: `pass`
- evidence root: `/tmp/council-reviewer-output-invalid-json-fails-tally-through-bundled-cli.narrow1.i6ZP98`
- observed run id: `council_skill_008`
- observed thread ids:
- `architecture-reviewer`: `thr_350c43fdf8a449228b8611ce5114326d`
- `implementation-reviewer`: `thr_db858b530cb044a7bceeaa417f1cea75`
- `risk-reviewer`: `thr_1c93381b070c47c49e312039b8343655`
- evidence summary:
- main-thread `council wait --run council_skill_008 --timeout-seconds 2 --json` returned `woke == true` and `all_complete == true`
- main-thread `council tally --run council_skill_008 --similarity normal --json` exited with code `30`
- the returned error payload was `invalid_input` with message `reviewer output must be valid JSON`
- this run confirmed the negative path where reviewer tasks are all `done` but tally still fails on stored reviewer-output validation
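
Review note: since this commit moves test intent into executable tests, the removed case above could be carried forward roughly as follows. This is a minimal sketch, not the `orch` implementation: `validate_reviewer_output` is a hypothetical helper, and the messages for the missing-role and role-mismatch branches are placeholders. Only the `invalid_input` code and the `reviewer output must be valid JSON` message come from the removed plan.

```python
import json

# Body copied from the architecture reviewer prompt in the removed plan.
# It is truncated on purpose: the closing "]}" is missing.
MALFORMED_BODY = (
    '{"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts",'
    '"summary":"Transport contracts are mixed into UI code.",'
    '"proposal":"Move API contract definitions into a dedicated module."}'
)


def validate_reviewer_output(body: str, expected_role: str) -> dict:
    """Hypothetical stand-in for the tally-time reviewer-output check."""
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        # Code and message as recorded in the removed plan.
        return {"ok": False, "code": "invalid_input",
                "message": "reviewer output must be valid JSON"}
    if not isinstance(parsed, dict) or "reviewer_role" not in parsed:
        # Placeholder message; the real wording is not in the source.
        return {"ok": False, "code": "invalid_input",
                "message": "placeholder: reviewer_role is missing"}
    if parsed["reviewer_role"] != expected_role:
        # Placeholder message; the real wording is not in the source.
        return {"ok": False, "code": "invalid_input",
                "message": "placeholder: reviewer_role does not match"}
    return {"ok": True}


# Protects: malformed reviewer JSON must fail the tally, never yield a partial tally.
result = validate_reviewer_output(MALFORMED_BODY, "architecture-reviewer")
assert result["code"] == "invalid_input"
```

Appending the missing `]}` to the same body makes it parse, which gives a cheap positive control for the same test.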
@@ -1,113 +0,0 @@
-# Case: `council-start-with-target-file-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a non-prompt target-context validation.
-
-The goal is to verify that a leader using the packaged `council-review` skill can start a council run from explicit `--target-file` context instead of relying on a pure inline prompt.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the test runner can prepare a concrete brief file before launching the leader
- the leader can start a council run through the bundled council-review skill using `--target-file`
- the target-file path is persisted in council input metadata
- reviewer tasks are still dispatched normally from the file-based target
-
-## Preconditions
-
- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- `sqlite3` is available locally for metadata validation
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching the leader through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
-
-## Inputs
-
-### Target File Fixture
-
-Create `TMPDIR/brief.md` before launching the leader with contents similar to:
-
-```md
-# Brief
-
-Review the current council-review packaging flow.
-
- Confirm the skill can carry file-based context.
- Focus on documentation quality and report semantics.
-```
-
-### Leader Prompt
-
-```text
-Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_009 using --target-file TMPDIR/brief.md, --target-type mixed, and --mode review, 2) stop after reporting RUN_ID and the target metadata you observed from the start response. Do not use ordinary chat to simulate reviewer work.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Create `TMPDIR/brief.md` with the target file contents
-3. Inject `skills/council-review/` into `leader`
-4. Point the leader at the database path `TMPDIR/coord.db`
-5. Launch the leader
-6. Wait for the leader to finish
-7. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json run show --run council_skill_009
-COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run council_skill_009
-sqlite3 TMPDIR/coord.db "SELECT prompt, target_file, repo_path, target_task_id FROM council_inputs WHERE run_id = 'council_skill_009';"
-sqlite3 TMPDIR/coord.db "SELECT acceptance_json FROM tasks WHERE run_id = 'council_skill_009' AND task_id = 'CR1';"
-```
-
-## Expected Outcomes
-
- the leader successfully starts `council_skill_009`
- the run goal references the target file rather than an inline prompt
- the stored council input row keeps `target_file == TMPDIR/brief.md`
- reviewer task dispatch still produces the usual three council tasks
- reviewer task acceptance metadata carries the `target_file` reference forward
-
-## Assertions
-
- `run_show.data.run.goal` mentions `brief.md`
- `status.data.tasks` length is `3`
- `status.data.run.status` is not terminal
- the `council_inputs` row has empty `prompt`, `repo_path`, and `target_task_id`
- the `council_inputs` row has `target_file == "TMPDIR/brief.md"`
- the `CR1` acceptance JSON contains `"target_file":"TMPDIR/brief.md"`
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR`, `brief.md`, and `coord.db` for replay and manual inspection
-
-## Recorded Real Forward Run
-
- recorded on: `2026-03-19`
- execution mode: `real_subagent_forward_test`
- result: `pass`
- evidence root: `/tmp/council-skill-target-file.ikPOLP`
- observed run id: `council_skill_009`
- observed thread ids:
- `CR1`: `thr_32df58f9b55945b899257f583708b7ef`
- `CR2`: `thr_c5f8c552cb1240649546df8386be3668`
- `CR3`: `thr_172eabff13eb48ed9af2deee928a9438`
- evidence summary:
- main-thread `status --run council_skill_009 --json` returned three `dispatched` council tasks and a non-terminal run
- main-thread `sqlite3` validation showed `council_inputs.target_file == "/tmp/council-skill-target-file.ikPOLP/brief.md"` with empty `prompt`, `repo_path`, and `target_task_id`
- main-thread `sqlite3` validation of `CR1` acceptance JSON showed the same `target_file` persisted into the council task payload
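
Review note: the `sqlite3` check from the removed plan translates directly into an executable assertion. The sketch below assumes a stand-in `council_inputs` table built in memory; only the column names used in the removed validation query are taken from the source, while the column types, the primary key, and the `/tmp/example/brief.md` path are illustrative assumptions.

```python
import sqlite3


def council_input_row(conn: sqlite3.Connection, run_id: str):
    """Run the same SELECT as the removed validation command, parameterized."""
    return conn.execute(
        "SELECT prompt, target_file, repo_path, target_task_id "
        "FROM council_inputs WHERE run_id = ?",
        (run_id,),
    ).fetchone()


# Stand-in schema and row for illustration; the real schema is not shown in the source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE council_inputs ("
    "run_id TEXT PRIMARY KEY, prompt TEXT, target_file TEXT, "
    "repo_path TEXT, target_task_id TEXT)"
)
conn.execute(
    "INSERT INTO council_inputs VALUES (?, ?, ?, ?, ?)",
    ("council_skill_009", "", "/tmp/example/brief.md", "", ""),
)

# Protects: a run started with --target-file persists the path and nothing else.
prompt, target_file, repo_path, target_task_id = council_input_row(
    conn, "council_skill_009"
)
assert target_file.endswith("brief.md")
# "empty" is treated as NULL or "" because the source does not say which one is stored.
assert not prompt and not repo_path and not target_task_id
```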
@@ -1,120 +0,0 @@
-# Case: `council-tally-strict-keeps-distinct-proposals-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a strict-similarity tally validation.
-
-The goal is to verify that a leader using the packaged `council-review` skill can request `--similarity strict` and preserve wording-level proposal differences that would normally collapse in `normal` mode.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can drive `start -> wait -> tally` through the bundled council-review skill
- three reviewer agents can complete their tasks through the packaged inbox skill
- the architecture and implementation reviewers can submit near-duplicate but not identical proposals
- strict tally keeps all three proposals as separate minority groups
-
-## Preconditions
-
- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
- `architecture-reviewer`
- `implementation-reviewer`
- `risk-reviewer`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_007 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) tally with --similarity strict, 4) stop after reporting RUN_ID, tally counts, and the grouped proposals you observed. Do not use ordinary chat to coordinate with the reviewers.
-```
-
-### Architecture Reviewer Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as architecture-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This lowers coupling.","confidence":"high","tags":["architecture"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
-```
-
-### Implementation Reviewer Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as implementation-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract API contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"medium","tags":["maintainability"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
-```
-
-### Risk Reviewer Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"risk-reviewer","findings":[{"title":"Add auth integration tests","summary":"Login regressions are hard to catch.","proposal":"Add integration tests for auth flows.","rationale":"This catches regressions earlier.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/council-review/` into `leader`
-3. Inject `skills/inbox/` into the three reviewer agents
-4. Point all agents at the same database path `TMPDIR/coord.db`
-5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
-6. Wait for all agents to finish
-7. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council wait --run council_skill_007 --timeout-seconds 2
-COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council tally --run council_skill_007 --similarity strict
-```
-
-## Expected Outcomes
-
- all three reviewers complete their fixed-role tasks
- `council wait` returns `all_complete == true`
- `council tally` succeeds with `similarity == "strict"`
- the two nearly identical contract proposals remain separate rather than merging
- every resulting recommendation lands in `minority`
-
-## Assertions
-
- `wait.data.all_complete == true`
- `tally.data.similarity == "strict"`
- `tally.data.counts.minority == 3`
- `tally.data.grouped_recommendations` length is `3`
- every returned recommendation has `bucket == "minority"`
- the returned proposal set contains `Move API contract definitions into a dedicated module.`
- the returned proposal set contains `Move API contract definitions into dedicated module`
- the returned proposal set contains `Add integration tests for auth flows.`
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
-
-## Recorded Real Forward Run
-
- recorded on: `2026-03-19`
- execution mode: `real_subagent_forward_test`
- result: `pass`
- evidence root: `/tmp/council-tally-strict-keeps-distinct-proposals-through-bundled-cli.narrow4.UCbqOc`
- observed run id: `council_skill_007`
- observed thread ids:
- `architecture-reviewer`: `thr_9e153f61692b4475a55f5c3068842ea5`
- `implementation-reviewer`: `thr_abbd9a2961374b13b3d3e27720fe27ab`
- `risk-reviewer`: `thr_3f2d64211f274f64b606bd8b8c6be5f7`
- evidence summary:
- main-thread `council wait --run council_skill_007 --timeout-seconds 2 --json` returned `woke == true` and `all_complete == true`
- main-thread `council tally --run council_skill_007 --similarity strict --json` returned `similarity == "strict"` and `counts.minority == 3`
- the returned proposal set preserved all three distinct values, including both `Move API contract definitions into a dedicated module.` and `Move API contract definitions into dedicated module`
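
Review note: the strict-versus-normal distinction this case protected can be illustrated with the three proposal strings from the removed prompts. Everything else here is an assumption: `normal_key` is a made-up normalization (lowercase, drop punctuation and articles), and the bucket thresholds are guessed from the counts recorded in the removed plans. Neither is claimed to match what `orch council tally` actually does.

```python
import re
from collections import defaultdict

# Proposal strings copied from the removed reviewer prompts.
PROPOSALS = {
    "architecture-reviewer": "Move API contract definitions into a dedicated module.",
    "implementation-reviewer": "Move API contract definitions into dedicated module",
    "risk-reviewer": "Add integration tests for auth flows.",
}


def strict_key(proposal: str) -> str:
    """Strict mode in this sketch: proposals match only if identical."""
    return proposal


def normal_key(proposal: str) -> str:
    """Hypothetical normalization: lowercase, strip punctuation, drop articles."""
    words = re.findall(r"[a-z0-9]+", proposal.lower())
    return " ".join(w for w in words if w not in {"a", "an", "the"})


def group(proposals: dict, key_fn) -> dict:
    """Map each grouping key to the reviewer roles that proposed it."""
    groups = defaultdict(list)
    for role, proposal in proposals.items():
        groups[key_fn(proposal)].append(role)
    return dict(groups)


def bucket(size: int, total: int) -> str:
    """Assumed thresholds: all reviewers, more than half, otherwise minority."""
    if size == total:
        return "consensus"
    if size * 2 > total:
        return "majority"
    return "minority"


# Protects: strict similarity keeps wording-level differences as separate groups.
strict_groups = group(PROPOSALS, strict_key)
assert len(strict_groups) == 3
assert all(bucket(len(roles), 3) == "minority" for roles in strict_groups.values())
```

Under the hypothetical `normal_key`, the two contract proposals collapse into one group of two reviewers, which is the contrast the removed case was written to guard against.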
@@ -1,88 +0,0 @@
-# Case: `council-unanimous-only-default-report-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a unanimous-only reporting validation.
-
-The goal is to verify that a leader using the packaged `council-review` skill can run a unanimous-only council and observe the expected default report behavior after tally.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can start a council run with `--only-unanimous`
- three reviewer agents can complete their tasks through the packaged inbox skill
- the leader can tally and report through the bundled council-review skill
- the final report defaults to `consensus` only while preserving the full summary counts
-
-## Preconditions
-
- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
- `architecture-reviewer`
- `implementation-reviewer`
- `risk-reviewer`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_002 with --only-unanimous, 2) wait until all three reviewers complete, 3) tally with normal similarity, 4) report with default settings, 5) stop after reporting RUN_ID and the default show buckets you observed. Do not use ordinary chat to coordinate with the reviewers.
-```
-
-### Reviewer Prompts
-
- Reuse the same reviewer body JSON and inbox-only workflow as in [council-brainstorm-end-to-end-through-bundled-cli.md](./council-brainstorm-end-to-end-through-bundled-cli.md), but target run `council_skill_002`.
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/council-review/` into `leader`
-3. Inject `skills/inbox/` into the three reviewer agents
-4. Point all agents at the same database path `TMPDIR/coord.db`
-5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
-6. Wait for all agents to finish
-7. Resolve `RUN_ID=council_skill_002` from the agent outputs
-8. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council report --run council_skill_002
-COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run council_skill_002
-```
-
-## Expected Outcomes
-
- the unanimous-only run completes successfully
- the report default `show` value is only `consensus`
- the underlying summary still contains `consensus`, `majority`, and `minority` counts
- only the consensus group is returned in `grouped_recommendations`
-
-## Assertions
-
- `report.data.show == ["consensus"]`
- `report.data.summary.consensus == 1`
- `report.data.summary.majority == 1`
- `report.data.summary.minority == 1`
- `report.data.grouped_recommendations` length is `1`
- the sole returned recommendation has `bucket == "consensus"`
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
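
Review note: the assertions in the removed unanimous-only case describe a report that filters the returned groups by `show` while keeping the full summary counts. A minimal sketch of that contract, assuming a hypothetical `build_report` helper and placeholder proposal text; the field names `show`, `summary`, `grouped_recommendations`, and `bucket` are the ones the removed assertions reference.

```python
BUCKETS = ("consensus", "majority", "minority")


def build_report(groups: list, show: list) -> dict:
    """Hypothetical report builder: summary counts every bucket, output is filtered."""
    summary = {name: 0 for name in BUCKETS}
    for item in groups:
        summary[item["bucket"]] += 1
    return {
        "show": list(show),
        "summary": summary,
        "grouped_recommendations": [g for g in groups if g["bucket"] in show],
    }


# Placeholder groups; the real proposals live in a case file not shown here.
GROUPS = [
    {"bucket": "consensus", "proposal": "placeholder A"},
    {"bucket": "majority", "proposal": "placeholder B"},
    {"bucket": "minority", "proposal": "placeholder C"},
]

# Protects: the default unanimous-only report shows consensus only but keeps all counts.
report = build_report(GROUPS, ["consensus"])
assert report["show"] == ["consensus"]
assert report["summary"] == {"consensus": 1, "majority": 1, "minority": 1}
assert len(report["grouped_recommendations"]) == 1
assert report["grouped_recommendations"][0]["bucket"] == "consensus"
```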
@@ -1,77 +0,0 @@
-# Case: `council-wait-timeout-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a timeout-path council workflow validation.
-
-The goal is to verify that a leader using the packaged `council-review` skill sees the expected timeout contract when reviewer tasks do not complete.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can start a council run through the bundled skill CLI
- the leader can call `council wait` with a short timeout
- the command reports `woke == false` and `all_complete == false`
- reviewer task metadata remains visible for later follow-up
-
-## Preconditions
-
- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- bundled CLI executable exists at `COUNCIL_SKILL_PATH/assets/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_003 with a short review target, 2) immediately call council wait with a short timeout such as 1 second, 3) stop after reporting RUN_ID and the wait result you observed. Do not use ordinary chat to simulate reviewer output.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- override the council wait timeout to a short interval such as `1s`
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/council-review/` into `leader`
-3. Point the leader at the database path `TMPDIR/coord.db`
-4. Launch the leader
-5. Wait for the leader to finish
-6. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council wait --run council_skill_003 --timeout-seconds 1
-COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run council_skill_003
-```
-
-## Expected Outcomes
-
- the leader successfully starts `council_skill_003`
- `council wait` times out cleanly
- the wait response still includes three reviewer statuses
- the run remains non-terminal because reviewers have not completed
-
-## Assertions
-
- `wait.data.woke == false`
- `wait.data.all_complete == false`
- `wait.data.reviewers` length is `3`
- `status.data.run.status` is not `done`
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
@@ -1,162 +0,0 @@
-# Inbox Skill Test Plan
-
-## Purpose
-
-This directory tracks human-readable test plans for the `skills/inbox/` Codex skill bundle.
-
-These documents are not command-contract specs for the `inbox` CLI itself.
-That coverage already lives under [../inbox/](../inbox/).
-
-This directory exists to describe a different test surface:
-
- whether an agent can actually use the packaged inbox skill
- whether multiple agents can coordinate through the bundled CLI asset
- whether a real skill-guided conversation reaches the expected inbox state
-
-## Test Model
-
- `README.md` is the index for this directory
- each skill test case lives in its own Markdown file
- use stable case slugs in filenames
-
-## Shared Execution Contract
-
-Use these defaults unless a case file explicitly overrides them:
-
- run the scenario with real subagents, not simulated transcripts
- inject the same skill bundle into every participating agent
- launch all role agents in parallel when the scenario depends on agent-to-agent timing
- require every agent to coordinate through the bundled CLI and shared SQLite DB instead of ordinary chat
- validate the final inbox state independently from the main thread after the agents stop
-
-## How An Agent Runs These Cases
-
-Use one test-runner agent to execute each case.
-
-The test-runner agent is responsible for:
-
- reading this `README.md` first, then one specific case file
- creating an isolated temporary directory and SQLite DB path for that run
- launching the role agents described in `Agent Topology`
- injecting the same `skills/inbox/` bundle into every role agent
- passing each role agent the prompt text from the case file with concrete values substituted for `SKILL_PATH`, `TMPDIR`, and `THREAD_ID` when needed
- coordinating launch order or parallel start according to the case file
- collecting agent final summaries as evidence
- resolving the final `THREAD_ID`
- running the `Validation Commands` from the main thread after the role agents stop
- comparing the observed results against `Expected Outcomes` and `Assertions`
- returning a final pass/fail judgment with concrete evidence
-
-The role agents are responsible for:
-
- acting only within the role assigned in the case file
- using the injected inbox skill rather than ad hoc repository discovery
- coordinating through the bundled CLI and shared DB
- reporting the concrete thread id, key command outcomes, and final observed state back to the test-runner agent
-
-The test-runner agent should treat a case as passed only when:
-
- all role agents reach a final state without violating the case contract
- the independent validation commands succeed
- the final inbox state matches the assertions in the case file
-
-The test-runner agent should treat a case as failed when:
-
- any role agent times out or stalls
- a required inbox action is skipped
- a role agent falls back to ordinary chat for critical coordination
- the final inbox state conflicts with the documented assertions
-
-The test-runner agent should report results in this shape:
-
- `case`
- `db_path`
- `thread_id`
- `result`: `pass` or `fail`
- `agent_summaries`
- `validation_evidence`
- `assertion_checklist`
- `notes`
-
-## Default Timeouts
-
-Use these defaults unless a case file explicitly overrides them:
-
- per-agent timeout: `3m`
- overall scenario timeout: `5m`
- async wait margin for the main thread: `30s`
-
-## Default Failure Conditions
-
-Treat the test as failed if any of the following happens:
-
- any required agent does not reach a final state before timeout
- any required inbox command returns a non-success result unless the case expects that failure
- the final `show` output does not match the expected thread state
- the expected message sequence or key message bodies do not appear
- the agents fall back to ordinary chat for critical coordination instead of inbox messages
-
-## Evidence Capture
-
-Collect at least the following artifacts for every run:
-
- agent final summaries
- final `show --thread THREAD_ID --json` output
- at least one independent listing or lookup command such as `list` or `fetch`
- the temporary DB path and resolved thread id
-
-## Cleanup Policy
-
-Use these defaults unless a case file explicitly overrides them:
-
- keep the temporary DB and working directory on failure for debugging
- clean up the temporary DB and working directory on success only if the caller does not need replay artifacts
-
-## Per-Case Template
-
-Each case file should use this structure:
-
- `Test Type`
- `Purpose`
- `Preconditions`
- `Agent Topology`
- `Inputs`
- `Execution Parameters`
- `Execution Steps`
- `Validation Commands`
- `Expected Outcomes`
- `Assertions`
- `Cleanup`
- `Recorded Example Run` when a real run has already been captured
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `multi-agent-roundtrip-through-bundled-cli` | [multi-agent-roundtrip-through-bundled-cli.md](./multi-agent-roundtrip-through-bundled-cli.md) | validates that two agents can use the bundled inbox skill to complete a blocked question and done result roundtrip |
-| `parallel-workers-claim-conflict-through-bundled-cli` | [parallel-workers-claim-conflict-through-bundled-cli.md](./parallel-workers-claim-conflict-through-bundled-cli.md) | validates that two workers using the skill observe a real `lease_conflict` on the same thread |
-| `blocked-worker-timeout-without-reply-through-bundled-cli` | [blocked-worker-timeout-without-reply-through-bundled-cli.md](./blocked-worker-timeout-without-reply-through-bundled-cli.md) | validates that a blocked worker using the skill receives the expected `wait-reply` timeout outcome when no leader reply arrives |
-| `leader-cancels-claimed-thread-through-bundled-cli` | [leader-cancels-claimed-thread-through-bundled-cli.md](./leader-cancels-claimed-thread-through-bundled-cli.md) | validates that a leader can cancel an actively claimed thread and that both agents observe the cancelled terminal state |
-| `artifact-roundtrip-through-bundled-cli` | [artifact-roundtrip-through-bundled-cli.md](./artifact-roundtrip-through-bundled-cli.md) | validates that bundled CLI usage through the skill preserves body-file and artifact data across task and result messages |
-
-## Scope
-
-In scope:
-
- explicit `$inbox` skill invocation
- bundled `./assets/inbox` CLI usage
- shared SQLite DB coordination between multiple agents
- end-to-end thread state and message history validation
- negative-path skill scenarios such as lease conflicts and reply timeouts
- skill-guided artifact and body-file roundtrips
-
-Out of scope:
-
- per-command flag and JSON contract coverage
- store-level race conditions
- implicit skill triggering without `$inbox`
-
-## Relationship To Other Test Docs
-
- [../inbox/](../inbox/) covers CLI command behavior
- this directory covers skill-guided multi-agent behavior on top of that CLI
@@ -1,83 +0,0 @@
-# Case: `artifact-roundtrip-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and an artifact-preservation validation.
-
-The goal is to verify that agents using the packaged inbox skill can exchange body-file content and artifacts through the bundled CLI without losing message data.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can create task input files and send them through the bundled CLI
- the worker can inspect those artifacts through inbox history
- the worker can return a final result using body-file or artifact inputs
- the final thread history preserves both task-side and result-side file references
-
-## Preconditions
-
- skill path exists: `SKILL_PATH=skills/inbox`
- bundled CLI executable exists: `SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- test database path is `TMPDIR/coord.db`
-
-## Agent Topology
-
- `leader`
- `worker-a`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $inbox at SKILL_PATH to act as leader on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) initialize the DB, 2) create a small task file under TMPDIR, 3) send one task to worker-a using body-file plus at least one artifact and artifact metadata, 4) wait until worker-a marks the thread done, 5) inspect the final thread with show, 6) stop. Do not use ordinary chat to coordinate with the other agent.
-```
-
-### Worker Prompt
-
-```text
-Use $inbox at SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the task, 2) inspect the task message with show and confirm the artifact is visible, 3) create a small result file under TMPDIR, 4) finish the thread with done using body-file or artifact input, 5) stop after reporting what files were preserved. Do not use ordinary chat to coordinate with the other agent.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Inject the same `skills/inbox/` skill into both real agents
-2. Point both agents at the same database path `TMPDIR/coord.db`
-3. Launch `leader` and `worker-a` in parallel
-4. Wait for both agents to finish
-5. Resolve `THREAD_ID` from the agent outputs or inbox history
-6. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-```
-
-## Expected Outcomes
-
- `leader` successfully creates a task file and sends it through `body-file`
- the initial task message contains at least one artifact reference
- `worker-a` successfully inspects the task artifact through `show`
- `worker-a` completes the thread with `done`
- the final `show` output preserves task-side and result-side file content or artifact references
-
-## Assertions
-
- the first task message contains non-empty body content sourced from a file
- the first task message contains at least one artifact entry
- the final `result` message contains either body-file content or at least one artifact entry
- the final thread status is `done`
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR`, created files, and `coord.db` for replay and manual inspection
@@ -1,88 +0,0 @@
-# Case: `blocked-worker-timeout-without-reply-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a timeout-path skill validation.
-
-The goal is to verify that a blocked worker using the bundled inbox skill sees the correct `wait-reply` timeout behavior when no answer arrives.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- a worker can use the skill to fetch, claim, and block a real thread
- the worker can call `wait-reply` through the bundled CLI
- the leader intentionally does not answer
- the worker receives the expected timeout contract instead of silently succeeding
- the thread remains in a blocked state with the question preserved
-
-## Preconditions
-
- skill path exists: `SKILL_PATH=skills/inbox`
- bundled CLI executable exists: `SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- test database path is `TMPDIR/coord.db`
-
-## Agent Topology
-
- `leader`
- `worker-a`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $inbox at SKILL_PATH to act as leader on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) initialize the DB, 2) send exactly one task to worker-a, 3) monitor until worker-a asks one blocked question, 4) intentionally do not reply, 5) stop after confirming the thread is still blocked. Do not use ordinary chat to coordinate with the other agent.
-```
-
-### Worker Prompt
-
-```text
-Use $inbox at SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch pending work, 2) claim it, 3) send a blocked update with one precise question, 4) call wait-reply with a short timeout, 5) stop after reporting the timeout result exactly as observed. Do not use ordinary chat to coordinate with the other agent.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- override the worker-side wait timeout to a short interval such as `10s`
- keep the default cleanup policy
-
-## Execution Steps
-
-1. Inject the same `skills/inbox/` skill into both real agents
-2. Point both agents at the same database path `TMPDIR/coord.db`
-3. Launch `leader` and `worker-a` in parallel
-4. Wait for both agents to finish
-5. Resolve `THREAD_ID` from the agent outputs or inbox history
-6. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json list --status blocked
-```
-
-## Expected Outcomes
-
- `leader` successfully creates one thread for `worker-a`
- `worker-a` successfully fetches and claims it
- `worker-a` emits one blocked `question`
- the blocked question is preserved at least in `message.payload_json.question`
- `worker-a` runs `wait-reply` and receives the no-match timeout contract
- the leader emits no `answer` message
- the final thread status remains `blocked`
-
-## Assertions
-
- the worker reports exit code `10` and JSON error code `no_matching_work` from `wait-reply`
- `show` includes the blocked `question` message
- `show.data.messages[*].payload_json.question` contains `Should logging go to stdout or stderr?`
- `show` does not include any `answer` message
- `list --status blocked` returns the thread
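
The exit-code and error-code pairing asserted above can be expressed as an executable check. The following is a minimal Go sketch, not the project's implementation: the `{"error":{"code":...}}` envelope, the `errorEnvelope` type, and the `checkFailure` helper are assumptions made for illustration. Only the pairs `10`/`no_matching_work` and `20`/`lease_conflict` come from these test plans.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// expectedErrorCode lists only the exit-code pairs stated in these test plans.
var expectedErrorCode = map[int]string{
	10: "no_matching_work",
	20: "lease_conflict",
}

// errorEnvelope is a hypothetical shape for the CLI's JSON error output.
// The real payload may nest or name these fields differently.
type errorEnvelope struct {
	Error struct {
		Code string `json:"code"`
	} `json:"error"`
}

// checkFailure verifies that the process exit code and the JSON error code
// describe the same failure contract.
func checkFailure(exitCode int, stdout []byte) error {
	want, known := expectedErrorCode[exitCode]
	if !known {
		return fmt.Errorf("unexpected exit code %d", exitCode)
	}
	var env errorEnvelope
	if err := json.Unmarshal(stdout, &env); err != nil {
		return fmt.Errorf("decode error payload: %w", err)
	}
	if env.Error.Code != want {
		return fmt.Errorf("exit code %d: got error code %q, want %q",
			exitCode, env.Error.Code, want)
	}
	return nil
}

func main() {
	// Simulated wait-reply timeout output; a real run would capture stdout
	// and the exit status from the bundled CLI.
	err := checkFailure(10, []byte(`{"error":{"code":"no_matching_work"}}`))
	fmt.Println("timeout contract check:", err)
}
```

Checking both values together catches the case where a command exits non-zero for an unrelated reason but the worker still reports a timeout.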
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
@@ -1,84 +0,0 @@
-# Case: `leader-cancels-claimed-thread-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a terminal-state intervention validation.
-
-The goal is to verify that a leader and worker can both observe a thread transition to `cancelled` through the bundled inbox skill while the thread is actively claimed.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the worker can fetch and claim a real thread through the skill
- the leader can cancel that thread through the same bundled CLI
- the final thread state is `cancelled`
- both parties can inspect the terminal state from inbox history
-
-## Preconditions
-
- skill path exists: `SKILL_PATH=skills/inbox`
- bundled CLI executable exists: `SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- test database path is `TMPDIR/coord.db`
-
-## Agent Topology
-
- `leader`
- `worker-a`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $inbox at SKILL_PATH to act as leader on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) initialize the DB, 2) send exactly one task to worker-a, 3) wait until worker-a has claimed the thread or reported in_progress, 4) cancel the thread with a clear reason, 5) inspect the final thread with show, 6) stop. Do not use ordinary chat to coordinate with the other agent.
-```
-
-### Worker Prompt
-
-```text
-Use $inbox at SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch pending work, 2) claim it, 3) send an in_progress update, 4) keep monitoring the thread until it reaches a terminal state, 5) stop after reporting the final status you observed. Do not use ordinary chat to coordinate with the other agent.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Inject the same `skills/inbox/` skill into both real agents
-2. Point both agents at the same database path `TMPDIR/coord.db`
-3. Launch `leader` and `worker-a` in parallel
-4. Wait for both agents to finish
-5. Resolve `THREAD_ID` from the agent outputs or inbox history
-6. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json list --status cancelled
-```
-
-## Expected Outcomes
-
- `worker-a` successfully claims the thread
- `worker-a` emits one `progress` message
- `leader` successfully emits `cancel` with a reason
- the final thread status is `cancelled`
- the worker reports that it observed the cancelled terminal state
-
-## Assertions
-
- `show` contains at least `task -> event -> progress -> control`
- the final thread status is `cancelled`
- the terminal message or thread history captures the cancel reason
- `list --status cancelled` returns the thread
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
@@ -1,106 +0,0 @@
-# Case: `multi-agent-roundtrip-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a multi-agent end-to-end skill validation.
-
-The goal is not to validate one CLI subcommand in isolation. The goal is to validate that two real agents can complete a closed-loop coordination flow through the packaged `skills/inbox/` skill and bundled CLI.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- both agents can explicitly use `$inbox`
- both agents coordinate through the bundled `./assets/inbox` against the same SQLite DB
- the worker follows the protocol `fetch -> claim -> update -> wait-reply -> done`
- the leader follows the protocol `init -> send -> show/reply -> show`
- the final inbox thread state and message history match the expected contract
-
-## Preconditions
-
- skill path exists: `SKILL_PATH=skills/inbox`
- bundled CLI executable exists: `SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- test database path is `TMPDIR/coord.db`
-
-## Agent Topology
-
- `leader`
- `worker-a`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $inbox at SKILL_PATH to act as leader on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) initialize the DB, 2) send exactly one task to worker-a asking them to implement a small logging choice, 3) monitor the thread until worker-a asks one blocked question, 4) answer the blocked question with a clear decision ('use stdout'), 5) wait until worker-a marks the thread done, 6) inspect the final thread with show, then stop. Do not use ordinary chat to coordinate with the other agent.
-```
-
-### Worker Prompt
-
-```text
-Use $inbox at SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) wait until there is pending work for worker-a, 2) fetch it, 3) claim it, 4) send an in_progress update, 5) send a blocked update with one precise question asking whether logging should go to stdout or stderr, 6) wait for a reply, 7) finish the task with done using the received decision, 8) stop. Do not use ordinary chat to coordinate with the other agent.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Inject the same `skills/inbox/` skill into both real agents
-2. Point both agents at the same database path `TMPDIR/coord.db`
-3. Launch `leader` and `worker-a` in parallel
-4. Wait for both agents to finish
-5. Resolve `THREAD_ID` from the agent outputs or inbox history
-6. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json list --assigned-to worker-a
-```
-
-## Expected Outcomes
-
- `leader` successfully runs `init`
- `leader` successfully `send`s one new thread to `worker-a`
- `worker-a` successfully `fetch`es that thread and successfully `claim`s it
- `worker-a` emits one `progress` message
- `worker-a` emits one `question` message focused on `stdout` vs `stderr`
- `leader` successfully emits one `answer` message with the explicit decision `Use stdout.`
- `worker-a` successfully consumes that answer through `wait-reply`
- `worker-a` successfully emits `done`
- `show` returns `thread.status == "done"`
-
-## Assertions
-
- `show` contains at least the following message kinds in order:
-  - `task`
-  - `event` (`thread claimed`)
-  - `progress`
-  - `question`
-  - `answer`
-  - `result`
- `question.body == "Should logging go to stdout or stderr?"`
- `answer.body == "Use stdout."`
- the final `result` message explicitly states that logging uses `stdout`
- `list --assigned-to worker-a` shows the thread and its status is `done`
- coordination happens primarily through the inbox thread rather than ordinary chat
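
The ordered message-kind assertion above can be checked mechanically rather than by reading the history. The following is a minimal Go sketch, assuming the `show --json` payload lists messages under `data.messages` and that each message carries a `kind` field. The `kind` field name and the `kindsInOrder` helper are assumptions for illustration, not the documented schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// showPayload mirrors only the fields this sketch needs.
// The "kind" field name is an assumption; the real schema may differ.
type showPayload struct {
	Data struct {
		Messages []struct {
			Kind string `json:"kind"`
		} `json:"messages"`
	} `json:"data"`
}

// kindsInOrder reports whether want appears as an ordered subsequence of the
// message kinds in raw. Extra messages between the wanted kinds are allowed,
// which matches the "at least ... in order" wording of the assertion.
func kindsInOrder(raw []byte, want []string) (bool, error) {
	var p showPayload
	if err := json.Unmarshal(raw, &p); err != nil {
		return false, err
	}
	next := 0
	for _, m := range p.Data.Messages {
		if next < len(want) && m.Kind == want[next] {
			next++
		}
	}
	return next == len(want), nil
}

func main() {
	// Hand-written sample history, not output captured from a real run.
	raw := []byte(`{"data":{"messages":[
		{"kind":"task"},{"kind":"event"},{"kind":"progress"},
		{"kind":"question"},{"kind":"answer"},{"kind":"result"}]}}`)
	want := []string{"task", "event", "progress", "question", "answer", "result"}
	ok, err := kindsInOrder(raw, want)
	fmt.Println(ok, err)
}
```

A subsequence match is used instead of strict equality so that additional `event` or `progress` messages emitted by a real agent do not fail the check.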
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
-
-## Recorded Example Run
-
-This case already has one reference forward-test run:
-
- DB: `/tmp/inbox-skill-fwd.j9kKvp/coord.db`
- Thread: `thr_48d6f6a77eff4c2e88ce80e8fdc05da3`
-
-That run passed. The thread history contained `task -> event -> progress -> question -> answer -> result`, and the final thread state was `done`.
@@ -1,93 +0,0 @@
-# Case: `parallel-workers-claim-conflict-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a multi-agent negative-path validation.
-
-The goal is to verify that two workers using the same bundled inbox skill can exercise a real claim conflict through the SQLite-backed inbox instead of simulating the outcome.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- multiple workers can use the same `skills/inbox/` bundle against one shared DB
- one worker can successfully claim the thread
- a competing worker can observe and attempt to claim that same thread
- the competing worker receives the expected `lease_conflict` contract
- the thread remains owned by the original worker
-
-## Preconditions
-
- skill path exists: `SKILL_PATH=skills/inbox`
- bundled CLI executable exists: `SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- test database path is `TMPDIR/coord.db`
-
-## Agent Topology
-
- `leader`
- `worker-a`
- `worker-b`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $inbox at SKILL_PATH to act as leader on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) initialize the DB, 2) send exactly one task assigned to worker-a, 3) stop after confirming the thread exists and report the thread id. Do not use ordinary chat to coordinate with the workers.
-```
-
-### Worker A Prompt
-
-```text
-Use $inbox at SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) wait for pending work assigned to worker-a, 2) fetch it, 3) claim it, 4) stop after confirming the claim succeeded and report the thread id and lease result. Do not use ordinary chat to coordinate with the other agents.
-```
-
-### Worker B Prompt
-
-```text
-Use $inbox at SKILL_PATH to act as worker-b on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. This is a conflict test. Workflow: 1) wait until there is a thread assigned to worker-a visible through inbox inspection, 2) resolve its thread id, 3) attempt to claim that thread as worker-b, 4) stop after reporting the exact error contract you observed. Do not use ordinary chat to coordinate with the other agents.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Inject the same `skills/inbox/` skill into all three real agents
-2. Point all three agents at the same database path `TMPDIR/coord.db`
-3. Launch `leader`, `worker-a`, and `worker-b` in parallel
-4. Wait for all agents to finish
-5. Resolve `THREAD_ID` from the agent outputs or inbox history
-6. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json list --assigned-to worker-a
-```
-
-## Expected Outcomes
-
- `leader` successfully runs `init`
- `leader` successfully creates one thread for `worker-a`
- `worker-a` successfully `claim`s that thread
- `worker-b` attempts `claim --agent worker-b --thread THREAD_ID`
- `worker-b` receives exit code `20` and JSON error code `lease_conflict`
- the final thread remains assigned to `worker-a`
-
-## Assertions
-
- `show` contains a worker-side `event` message with summary `thread claimed`
- the final thread status is still `claimed` or `in_progress`, not transferred to `worker-b`
- `list --assigned-to worker-a` still returns the thread
- no agent reports successful ownership transfer to `worker-b`
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
@@ -1,71 +0,0 @@
-# Inbox Markdown Test Plan
-
-## Purpose
-
-This directory contains the human-readable Markdown test plan for the `inbox` CLI.
-
-It complements automated Go tests. The goal is not to restate implementation details, but to preserve the user-visible CLI contract in a form that can be reviewed, extended, and executed manually when needed.
-
-## Directory Rules
-
- one folder per command or shared area
- each folder keeps a `README.md` entrypoint
- command folders use `README.md` as an index only
- each command test case lives in its own Markdown file named after the case slug
- no numeric test IDs
- each command case is identified by its concrete file path
-
-Case file naming pattern:
-
-```text
-<case-slug>.md
-```
-
-## Authoring Principles
-
- focus on externally visible behavior of the CLI
- prefer stable command examples that a new agent can replay against a temp database
- describe both success shape and failure contract
- when a case already exists in automated Go tests, reuse its scenario rather than inventing a new one
- keep terminology consistent with command flags and JSON fields exposed by the CLI
-
-## Common Execution Model
-
-Most cases in this directory assume the same baseline:
-
-1. create an isolated temporary directory
-2. choose a database path such as `TMPDIR/coord.db`
-3. run `inbox --db TMPDIR/coord.db --json init`
-4. run the target command sequence against that database
-
-Unless a case says otherwise:
-
- commands should use `--json`
- assertions should check both exit code and JSON payload
- examples may use explicit `--agent`, or rely on the root `--agent` flag when that is the behavior under test
-
-## Folder Map
-
- `README.md`: global conventions and glossary
- `_shared/README.md`: reusable fixtures, JSON assertions, exit codes, payload rules
- `workflows/README.md`: cross-command end-to-end scenarios
- per-command folders: command-specific index `README.md` files plus one case document per test case
-
-## Glossary
-
- `thread`: the durable coordination unit tracked by `thread_id`
- `message`: an event-bearing entry appended to a thread
- `artifact`: a file attachment associated with a message
- `read cursor`: the per-agent marker used by unread flows
- `lease`: the temporary ownership granted by `claim` and extended by `renew`
- `terminal state`: a thread state such as `done`, `failed`, or `cancelled`
-
-## Relationship To Automated Tests
-
-The current best executable reference is [integration_test.go](../../../packages/inbox-runtime/internal/cli/inbox/integration_test.go).
-
-When this Markdown plan is expanded:
-
- prefer matching an existing automated scenario first
- record any additional manual-only contract coverage explicitly in the relevant command case file and keep the folder index synchronized
- keep `docs/tests/inbox/ROADMAP.md` synchronized with authored files and case slugs
@@ -1,364 +0,0 @@
-# Inbox Test Documentation Roadmap
-
-## Purpose
-
-This roadmap tracks the human-readable Markdown test plan for `inbox`.
-
-It exists so a new agent can immediately answer four questions without re-reading the whole codebase:
-
- which test-plan documents already exist
- which cases have already been written down
- which cases are still missing
- what file should be updated next
-
-This roadmap is for the Markdown test-plan set under `docs/tests/inbox/`.
-It is not a replacement for automated Go tests.
-
-## Current Snapshot
-
-Snapshot date:
-
- `2026-03-19`
-
-Current state:
-
- `inbox` CLI is implemented end-to-end
- automated Go integration tests already exist for the main lifecycle, wait flows, unread behavior, artifacts, and JSON error contracts
- this roadmap now exists under `docs/tests/inbox/ROADMAP.md`
- all planned global, shared, workflow, command-index, and command-case Markdown documents have been authored
- command-level documents have been audited once per command against current CLI and store behavior, with edge-contract notes added for defaults, fallbacks, and error boundaries where needed
- every inbox command folder now uses `README.md` as an index plus one Markdown file per case
-
-Progress summary for planned test-plan documents, excluding `ROADMAP.md`:
-
- planned document files: `70`
- authored document files: `70`
- planned case slugs in this roadmap: `61`
- authored case slugs in this roadmap: `61`
-
-## Scope
-
-In scope:
-
- `inbox init`
- `inbox send`
- `inbox fetch`
- `inbox claim`
- `inbox renew`
- `inbox update`
- `inbox reply`
- `inbox done`
- `inbox fail`
- `inbox cancel`
- `inbox list`
- `inbox show`
- `inbox watch`
- `inbox wait-reply`
- cross-command workflows
- shared test conventions for JSON output, exit codes, fixtures, and assertions
-
-Out of scope:
-
- `orch`
- `council-review`
- implementation details that are not visible through the CLI contract
-
-## Tracking Rules
-
-Directory model:
-
- one folder per command or shared area
- each folder keeps a `README.md` entrypoint
- command folders use `README.md` as an index only
- each command case lives in its own Markdown file named after the case slug
- cross-command workflow cases remain grouped in `docs/tests/inbox/workflows/README.md`
-
-Case identity:
-
- do not use numeric IDs
- identify each command case by its concrete file path
- identify each workflow case by `path + case slug`
- command case file naming pattern:
-
-```text
-<case-slug>.md
-```
-
- workflow case heading pattern:
-
-```md
-## case: send-rejects-invalid-payload-json
-```
-
-Per-case structure inside the case document:
-
- `用例意义` (case rationale)
- `前置条件` (preconditions)
- `输入` (inputs)
- `预期输出` (expected output)
- `断言结论` (assertion conclusions)
-
-How to update this roadmap when a new case is written:
-
-1. if it is a command case, create or update the target `<case-slug>.md` file under the relevant command folder
-2. if it is a command case, add or update the entry in that folder `README.md` index
-3. if it is a workflow case, add or update the case inside `docs/tests/inbox/workflows/README.md`
-4. move the case slug from `Pending Case Backlog` to `Authored Case Register`
-5. update the authored counts in `Current Snapshot`
-6. if a new Markdown file is created, update `Document Progress`
-
-Allowed status values in this roadmap:
-
- `pending`
- `in_progress`
- `done`
- `deferred`
-
-## Existing Automated Coverage Reference
-
-The Markdown test-plan set started from zero, but these automated tests already existed and remain the source material to use when writing or revising the docs:
-
- [integration_test.go](../../../packages/inbox-runtime/internal/cli/inbox/integration_test.go#L12) `TestInboxLifecycle`
- [integration_test.go](../../../packages/inbox-runtime/internal/cli/inbox/integration_test.go#L176) `TestInboxFailLifecycle`
- [integration_test.go](../../../packages/inbox-runtime/internal/cli/inbox/integration_test.go#L243) `TestInboxRenewWaitReplyAndCancel`
- [integration_test.go](../../../packages/inbox-runtime/internal/cli/inbox/integration_test.go#L392) `TestInboxWatchListUnreadAndAppend`
- [integration_test.go](../../../packages/inbox-runtime/internal/cli/inbox/integration_test.go#L549) `TestInboxUnreadReadCursor`
- [integration_test.go](../../../packages/inbox-runtime/internal/cli/inbox/integration_test.go#L639) `TestInboxJSONErrorsAndExitCodes`
-
-These tests do not remove the need for the Markdown plan. They only reduce discovery work.
-
-## Planned Directory Tree
-
-```text
-docs/tests/inbox/
-  ROADMAP.md
-  README.md
-  _shared/
-    README.md
-  workflows/
-    README.md
-  init/
-    README.md
-    <case-slug>.md
-  send/
-    README.md
-    <case-slug>.md
-  fetch/
-    README.md
-    <case-slug>.md
-  claim/
-    README.md
-    <case-slug>.md
-  renew/
-    README.md
-    <case-slug>.md
-  update/
-    README.md
-    <case-slug>.md
-  reply/
-    README.md
-    <case-slug>.md
-  done/
-    README.md
-    <case-slug>.md
-  fail/
-    README.md
-    <case-slug>.md
-  cancel/
-    README.md
-    <case-slug>.md
-  list/
-    README.md
-    <case-slug>.md
-  show/
-    README.md
-    <case-slug>.md
-  watch/
-    README.md
-    <case-slug>.md
-  wait-reply/
-    README.md
-    <case-slug>.md
-```
-
-## Document Progress
-
-| Path | Purpose | Planned Cases | Authored Cases | Status |
-| --- | --- | ---: | ---: | --- |
-| `docs/tests/inbox/README.md` | Global testing conventions and glossary | 0 | 0 | done |
-| `docs/tests/inbox/_shared/README.md` | Shared fixtures, JSON assertions, exit-code rules | 0 | 0 | done |
-| `docs/tests/inbox/workflows/README.md` | Cross-command scenarios | 8 | 8 | done |
-| `docs/tests/inbox/init/README.md` | `init` command case index | 0 | 0 | done |
-| `docs/tests/inbox/init/init-creates-schema-on-empty-db.md` | `init` command case | 1 | 1 | done |
-| `docs/tests/inbox/init/init-is-idempotent-on-existing-db.md` | `init` command case | 1 | 1 | done |
-| `docs/tests/inbox/send/README.md` | `send` command case index | 0 | 0 | done |
-| `docs/tests/inbox/send/send-creates-new-thread.md` | `send` command case | 1 | 1 | done |
-| `docs/tests/inbox/send/send-appends-message-to-existing-thread.md` | `send` command case | 1 | 1 | done |
-| `docs/tests/inbox/send/send-reads-body-from-body-file.md` | `send` command case | 1 | 1 | done |
-| `docs/tests/inbox/send/send-attaches-artifact-with-metadata.md` | `send` command case | 1 | 1 | done |
-| `docs/tests/inbox/send/send-rejects-invalid-payload-json.md` | `send` command case | 1 | 1 | done |
-| `docs/tests/inbox/send/send-rejects-invalid-artifact-metadata-json.md` | `send` command case | 1 | 1 | done |
-| `docs/tests/inbox/fetch/README.md` | `fetch` command case index | 0 | 0 | done |
-| `docs/tests/inbox/fetch/fetch-returns-pending-thread-for-target-agent.md` | `fetch` command case | 1 | 1 | done |
-| `docs/tests/inbox/fetch/fetch-respects-status-and-limit-filters.md` | `fetch` command case | 1 | 1 | done |
-| `docs/tests/inbox/fetch/fetch-unread-uses-read-cursor.md` | `fetch` command case | 1 | 1 | done |
-| `docs/tests/inbox/fetch/fetch-returns-no-matching-work-when-empty.md` | `fetch` command case | 1 | 1 | done |
-| `docs/tests/inbox/claim/README.md` | `claim` command case index | 0 | 0 | done |
-| `docs/tests/inbox/claim/claim-acquires-thread-lease.md` | `claim` command case | 1 | 1 | done |
-| `docs/tests/inbox/claim/claim-rejects-when-thread-missing.md` | `claim` command case | 1 | 1 | done |
-| `docs/tests/inbox/claim/claim-rejects-when-thread-already-claimed.md` | `claim` command case | 1 | 1 | done |
-| `docs/tests/inbox/claim/claim-records-requested-lease-duration.md` | `claim` command case | 1 | 1 | done |
-| `docs/tests/inbox/renew/README.md` | `renew` command case index | 0 | 0 | done |
-| `docs/tests/inbox/renew/renew-extends-active-lease.md` | `renew` command case | 1 | 1 | done |
-| `docs/tests/inbox/renew/renew-rejects-non-owner.md` | `renew` command case | 1 | 1 | done |
-| `docs/tests/inbox/renew/renew-rejects-without-active-lease.md` | `renew` command case | 1 | 1 | done |
-| `docs/tests/inbox/update/README.md` | `update` command case index | 0 | 0 | done |
-| `docs/tests/inbox/update/update-moves-thread-to-in-progress.md` | `update` command case | 1 | 1 | done |
-| `docs/tests/inbox/update/update-moves-thread-to-blocked-with-payload.md` | `update` command case | 1 | 1 | done |
-| `docs/tests/inbox/update/update-accepts-body-file-and-artifact.md` | `update` command case | 1 | 1 | done |
-| `docs/tests/inbox/update/update-rejects-invalid-payload-json.md` | `update` command case | 1 | 1 | done |
-| `docs/tests/inbox/update/update-rejects-non-owner.md` | `update` command case | 1 | 1 | done |
-| `docs/tests/inbox/reply/README.md` | `reply` command case index | 0 | 0 | done |
-| `docs/tests/inbox/reply/reply-adds-answer-message.md` | `reply` command case | 1 | 1 | done |
-| `docs/tests/inbox/reply/reply-supports-control-kind.md` | `reply` command case | 1 | 1 | done |
-| `docs/tests/inbox/reply/reply-attaches-artifact.md` | `reply` command case | 1 | 1 | done |
-| `docs/tests/inbox/reply/reply-rejects-invalid-payload-json.md` | `reply` command case | 1 | 1 | done |
-| `docs/tests/inbox/done/README.md` | `done` command case index | 0 | 0 | done |
-| `docs/tests/inbox/done/done-marks-thread-terminal.md` | `done` command case | 1 | 1 | done |
-| `docs/tests/inbox/done/done-persists-result-body-and-artifact.md` | `done` command case | 1 | 1 | done |
-| `docs/tests/inbox/done/done-rejects-non-owner.md` | `done` command case | 1 | 1 | done |
-| `docs/tests/inbox/done/done-rejects-on-terminal-thread.md` | `done` command case | 1 | 1 | done |
-| `docs/tests/inbox/fail/README.md` | `fail` command case index | 0 | 0 | done |
-| `docs/tests/inbox/fail/fail-marks-thread-failed.md` | `fail` command case | 1 | 1 | done |
-| `docs/tests/inbox/fail/fail-persists-failure-body-and-artifact.md` | `fail` command case | 1 | 1 | done |
-| `docs/tests/inbox/fail/fail-rejects-non-owner.md` | `fail` command case | 1 | 1 | done |
-| `docs/tests/inbox/fail/fail-rejects-on-terminal-thread.md` | `fail` command case | 1 | 1 | done |
-| `docs/tests/inbox/cancel/README.md` | `cancel` command case index | 0 | 0 | done |
-| `docs/tests/inbox/cancel/cancel-marks-thread-cancelled.md` | `cancel` command case | 1 | 1 | done |
-| `docs/tests/inbox/cancel/cancel-persists-reason-and-artifact.md` | `cancel` command case | 1 | 1 | done |
-| `docs/tests/inbox/cancel/cancel-rejects-when-thread-missing.md` | `cancel` command case | 1 | 1 | done |
-| `docs/tests/inbox/list/README.md` | `list` command case index | 0 | 0 | done |
-| `docs/tests/inbox/list/list-filters-by-status.md` | `list` command case | 1 | 1 | done |
-| `docs/tests/inbox/list/list-filters-by-created-by.md` | `list` command case | 1 | 1 | done |
-| `docs/tests/inbox/list/list-filters-by-assigned-to.md` | `list` command case | 1 | 1 | done |
-| `docs/tests/inbox/list/list-respects-limit.md` | `list` command case | 1 | 1 | done |
-| `docs/tests/inbox/show/README.md` | `show` command case index | 0 | 0 | done |
-| `docs/tests/inbox/show/show-returns-thread-and-message-history.md` | `show` command case | 1 | 1 | done |
-| `docs/tests/inbox/show/show-includes-artifacts-per-message.md` | `show` command case | 1 | 1 | done |
-| `docs/tests/inbox/show/show-mark-read-advances-read-cursor.md` | `show` command case | 1 | 1 | done |
-| `docs/tests/inbox/show/show-rejects-when-thread-missing.md` | `show` command case | 1 | 1 | done |
-| `docs/tests/inbox/watch/README.md` | `watch` command case index | 0 | 0 | done |
-| `docs/tests/inbox/watch/watch-wakes-on-matching-thread.md` | `watch` command case | 1 | 1 | done |
-| `docs/tests/inbox/watch/watch-respects-status-filter.md` | `watch` command case | 1 | 1 | done |
-| `docs/tests/inbox/watch/watch-times-out-with-no-activity.md` | `watch` command case | 1 | 1 | done |
-| `docs/tests/inbox/wait-reply/README.md` | `wait-reply` command case index | 0 | 0 | done |
-| `docs/tests/inbox/wait-reply/wait-reply-wakes-on-answer-after-message.md` | `wait-reply` command case | 1 | 1 | done |
-| `docs/tests/inbox/wait-reply/wait-reply-can-start-from-after-event.md` | `wait-reply` command case | 1 | 1 | done |
-| `docs/tests/inbox/wait-reply/wait-reply-times-out-when-no-reply.md` | `wait-reply` command case | 1 | 1 | done |
-
-## Authoring Order
-
-Recommended order:
-
-1. `docs/tests/inbox/README.md`
-2. `docs/tests/inbox/_shared/README.md`
-3. `docs/tests/inbox/workflows/README.md`
-4. `docs/tests/inbox/send/README.md` plus its linked case files
-5. `docs/tests/inbox/fetch/README.md` plus its linked case files
-6. `docs/tests/inbox/claim/README.md` plus its linked case files
-7. `docs/tests/inbox/reply/README.md` plus its linked case files
-8. `docs/tests/inbox/done/README.md` plus its linked case files
-9. `docs/tests/inbox/show/README.md` plus its linked case files
-10. the remaining command indexes and case files
-
-Reason:
-
-- the workflow file captures the highest-value end-to-end behavior first
-- the command documents can then reuse shared conventions and already-fixed terminology
-
-## Authored Case Register
-
-| Path | Case Slug | Coverage Note | Status |
-| --- | --- | --- | --- |
-| `docs/tests/inbox/workflows/README.md` | `thread-lifecycle-happy-path` | end-to-end happy path from send to show after done | done |
-| `docs/tests/inbox/workflows/README.md` | `blocked-question-reply-resume-to-done` | blocked thread receives answer and resumes to done | done |
-| `docs/tests/inbox/workflows/README.md` | `fail-lifecycle-from-claim-to-terminal` | claimed thread transitions to failed terminal state | done |
-| `docs/tests/inbox/workflows/README.md` | `cancel-lifecycle-after-worker-claim` | claimed thread can be cancelled by initiator | done |
-| `docs/tests/inbox/workflows/README.md` | `watch-wakes-then-fetch-sees-new-thread` | watch wake-up remains consistent with unread fetch visibility | done |
-| `docs/tests/inbox/workflows/README.md` | `artifact-visible-through-send-and-show` | body-file and artifact data survive send and show | done |
-| `docs/tests/inbox/workflows/README.md` | `unread-clears-after-mark-read-and-reappears-on-new-message` | read cursor clears unread and new message restores it | done |
-| `docs/tests/inbox/workflows/README.md` | `wait-reply-clears-blocked-unread-for-agent` | wait-reply consumes reply and clears blocked unread view | done |
-| `docs/tests/inbox/init/init-creates-schema-on-empty-db.md` | `init-creates-schema-on-empty-db` | initializes an empty database path and returns initialized status | done |
-| `docs/tests/inbox/init/init-is-idempotent-on-existing-db.md` | `init-is-idempotent-on-existing-db` | repeated init succeeds on the same database path | done |
-| `docs/tests/inbox/send/send-creates-new-thread.md` | `send-creates-new-thread` | creates a pending thread with an initial task message | done |
-| `docs/tests/inbox/send/send-appends-message-to-existing-thread.md` | `send-appends-message-to-existing-thread` | appends a message to an existing non-terminal thread | done |
-| `docs/tests/inbox/send/send-reads-body-from-body-file.md` | `send-reads-body-from-body-file` | reads message body from a file path | done |
-| `docs/tests/inbox/send/send-attaches-artifact-with-metadata.md` | `send-attaches-artifact-with-metadata` | persists artifact path, kind, and metadata on send | done |
-| `docs/tests/inbox/send/send-rejects-invalid-payload-json.md` | `send-rejects-invalid-payload-json` | rejects malformed payload JSON with `invalid_input` | done |
-| `docs/tests/inbox/send/send-rejects-invalid-artifact-metadata-json.md` | `send-rejects-invalid-artifact-metadata-json` | rejects malformed artifact metadata JSON | done |
-| `docs/tests/inbox/fetch/fetch-returns-pending-thread-for-target-agent.md` | `fetch-returns-pending-thread-for-target-agent` | returns pending candidate work for the target agent | done |
-| `docs/tests/inbox/fetch/fetch-respects-status-and-limit-filters.md` | `fetch-respects-status-and-limit-filters` | enforces status filtering and max row count | done |
-| `docs/tests/inbox/fetch/fetch-unread-uses-read-cursor.md` | `fetch-unread-uses-read-cursor` | unread filtering depends on per-agent read cursor state | done |
-| `docs/tests/inbox/fetch/fetch-returns-no-matching-work-when-empty.md` | `fetch-returns-no-matching-work-when-empty` | empty fetch result returns no_matching_work | done |
-| `docs/tests/inbox/claim/claim-acquires-thread-lease.md` | `claim-acquires-thread-lease` | claims a pending thread and records a claim event message | done |
-| `docs/tests/inbox/claim/claim-rejects-when-thread-missing.md` | `claim-rejects-when-thread-missing` | missing thread returns not_found | done |
-| `docs/tests/inbox/claim/claim-rejects-when-thread-already-claimed.md` | `claim-rejects-when-thread-already-claimed` | active lease conflict returns lease_conflict | done |
-| `docs/tests/inbox/claim/claim-records-requested-lease-duration.md` | `claim-records-requested-lease-duration` | claim event payload records requested lease duration | done |
-| `docs/tests/inbox/renew/renew-extends-active-lease.md` | `renew-extends-active-lease` | owner renews an active lease and gets a renewal event | done |
-| `docs/tests/inbox/renew/renew-rejects-non-owner.md` | `renew-rejects-non-owner` | non-owner renew attempt returns lease_conflict | done |
-| `docs/tests/inbox/renew/renew-rejects-without-active-lease.md` | `renew-rejects-without-active-lease` | missing active lease returns invalid_state | done |
-| `docs/tests/inbox/update/update-moves-thread-to-in-progress.md` | `update-moves-thread-to-in-progress` | moves a claimed thread to `in_progress` and emits a progress message | done |
-| `docs/tests/inbox/update/update-moves-thread-to-blocked-with-payload.md` | `update-moves-thread-to-blocked-with-payload` | moves a claimed thread to `blocked` with structured question payload | done |
-| `docs/tests/inbox/update/update-accepts-body-file-and-artifact.md` | `update-accepts-body-file-and-artifact` | persists update body from file plus artifacts | done |
-| `docs/tests/inbox/update/update-rejects-invalid-payload-json.md` | `update-rejects-invalid-payload-json` | rejects malformed `--payload-json` input | done |
-| `docs/tests/inbox/update/update-rejects-non-owner.md` | `update-rejects-non-owner` | rejects update when caller is not the active lease owner | done |
-| `docs/tests/inbox/reply/reply-adds-answer-message.md` | `reply-adds-answer-message` | appends default `answer` message to an existing non-terminal thread | done |
-| `docs/tests/inbox/reply/reply-supports-control-kind.md` | `reply-supports-control-kind` | supports explicit `--kind control` reply message | done |
-| `docs/tests/inbox/reply/reply-attaches-artifact.md` | `reply-attaches-artifact` | appends reply message with artifact payload | done |
-| `docs/tests/inbox/reply/reply-rejects-invalid-payload-json.md` | `reply-rejects-invalid-payload-json` | rejects malformed `--payload-json` input | done |
-| `docs/tests/inbox/done/done-marks-thread-terminal.md` | `done-marks-thread-terminal` | marks a claimed thread as `done` with a result message | done |
-| `docs/tests/inbox/done/done-persists-result-body-and-artifact.md` | `done-persists-result-body-and-artifact` | persists result body and artifact for follow-up reads | done |
-| `docs/tests/inbox/done/done-rejects-non-owner.md` | `done-rejects-non-owner` | rejects `done` from non-owner agent | done |
-| `docs/tests/inbox/done/done-rejects-on-terminal-thread.md` | `done-rejects-on-terminal-thread` | rejects `done` on terminal thread states | done |
-| `docs/tests/inbox/fail/fail-marks-thread-failed.md` | `fail-marks-thread-failed` | marks a claimed thread as `failed` with a result message | done |
-| `docs/tests/inbox/fail/fail-persists-failure-body-and-artifact.md` | `fail-persists-failure-body-and-artifact` | persists failure body and artifacts for diagnosis | done |
-| `docs/tests/inbox/fail/fail-rejects-non-owner.md` | `fail-rejects-non-owner` | rejects `fail` from non-owner agent | done |
-| `docs/tests/inbox/fail/fail-rejects-on-terminal-thread.md` | `fail-rejects-on-terminal-thread` | rejects `fail` on terminal thread states | done |
-| `docs/tests/inbox/cancel/cancel-marks-thread-cancelled.md` | `cancel-marks-thread-cancelled` | moves a non-terminal thread into `cancelled` and emits a control message | done |
-| `docs/tests/inbox/cancel/cancel-persists-reason-and-artifact.md` | `cancel-persists-reason-and-artifact` | persists cancel reason text and attached artifacts | done |
-| `docs/tests/inbox/cancel/cancel-rejects-when-thread-missing.md` | `cancel-rejects-when-thread-missing` | returns stable not-found contract when thread does not exist | done |
-| `docs/tests/inbox/list/list-filters-by-status.md` | `list-filters-by-status` | filters returned threads by status set | done |
-| `docs/tests/inbox/list/list-filters-by-created-by.md` | `list-filters-by-created-by` | filters returned threads by creator | done |
-| `docs/tests/inbox/list/list-filters-by-assigned-to.md` | `list-filters-by-assigned-to` | filters returned threads by current assignee | done |
-| `docs/tests/inbox/list/list-respects-limit.md` | `list-respects-limit` | enforces hard cap on returned thread count | done |
-| `docs/tests/inbox/show/show-returns-thread-and-message-history.md` | `show-returns-thread-and-message-history` | returns thread details and full time-ordered message history | done |
-| `docs/tests/inbox/show/show-includes-artifacts-per-message.md` | `show-includes-artifacts-per-message` | expands per-message artifacts in the show payload | done |
-| `docs/tests/inbox/show/show-mark-read-advances-read-cursor.md` | `show-mark-read-advances-read-cursor` | advances caller read cursor when `--mark-read` is used | done |
-| `docs/tests/inbox/show/show-rejects-when-thread-missing.md` | `show-rejects-when-thread-missing` | returns stable not-found contract for missing thread | done |
-| `docs/tests/inbox/watch/watch-wakes-on-matching-thread.md` | `watch-wakes-on-matching-thread` | wakes when a matching post-start event arrives and returns event context | done |
-| `docs/tests/inbox/watch/watch-respects-status-filter.md` | `watch-respects-status-filter` | wakes only when thread transitions into requested status | done |
-| `docs/tests/inbox/watch/watch-times-out-with-no-activity.md` | `watch-times-out-with-no-activity` | returns timeout contract when no matching activity arrives | done |
-| `docs/tests/inbox/wait-reply/wait-reply-wakes-on-answer-after-message.md` | `wait-reply-wakes-on-answer-after-message` | wakes for a qualifying reply after known message boundary | done |
-| `docs/tests/inbox/wait-reply/wait-reply-can-start-from-after-event.md` | `wait-reply-can-start-from-after-event` | resumes waiting from a known event cursor | done |
-| `docs/tests/inbox/wait-reply/wait-reply-times-out-when-no-reply.md` | `wait-reply-times-out-when-no-reply` | returns timeout contract when no qualifying reply arrives | done |
-
-## Pending Case Backlog
-
-No pending case slugs remain in the current plan.
-
-When a new CLI contract or workflow needs coverage:
-
-1. if it is a command case, create a new `<case-slug>.md` file under the relevant command folder and add it to that folder's `README.md` index
-2. if it is a workflow case, add it to `docs/tests/inbox/workflows/README.md`
-3. add the new slug to `Authored Case Register`
-4. update `Current Snapshot` and `Document Progress`
-
-## Definition Of Done
-
-This roadmap is complete only when all of the following are true:
-
-- every implemented inbox command has a corresponding document folder
-- each planned command index and case document exists
-- each pending case slug has been either authored or explicitly deferred
-- the authored-case register matches the actual Markdown files on disk
-- a new agent can pick any pending case and know exactly where it should be written
@@ -1,130 +0,0 @@
-# Inbox Shared Test Conventions
-
-## Purpose
-
-This document captures shared assumptions used by multiple `inbox` test-plan documents so command and workflow files can stay focused on behavior rather than repeating setup boilerplate.
-
-## Recommended Fixture Shape
-
-Use an isolated temp workspace per case:
-
-- database path: `TMPDIR/coord.db`
-- optional body file: `TMPDIR/body.md`
-- optional artifact file: `TMPDIR/artifact.txt`
-
-Recommended bootstrap command:
-
-```bash
-inbox --db TMPDIR/coord.db --json init
-```
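The fixture shape above could be built per case roughly as follows. This is a minimal sketch in Python, not the repository's actual test helper: the names `make_case_workspace` and `init_argv` are hypothetical, and only the file names and the `init` invocation come from the conventions above.

```python
import tempfile
from pathlib import Path


def make_case_workspace(with_body=False, with_artifact=False):
    """Create an isolated temp directory laid out like the recommended fixture."""
    root = Path(tempfile.mkdtemp(prefix="inbox-case-"))
    # The database file is only named here; `inbox ... init` is expected to create it.
    paths = {"root": root, "db": root / "coord.db"}
    if with_body:
        paths["body"] = root / "body.md"
        paths["body"].write_text("example body\n", encoding="utf-8")
    if with_artifact:
        paths["artifact"] = root / "artifact.txt"
        paths["artifact"].write_text("example artifact\n", encoding="utf-8")
    return paths


def init_argv(paths):
    """Build the bootstrap command line shown above for this workspace."""
    return ["inbox", "--db", str(paths["db"]), "--json", "init"]
```

A test would pass `init_argv(ws)` to its process runner of choice; the sketch stops short of executing the CLI so that it stays independent of how `inbox` is installed.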
-
-## Global Flags
-
-Root-level flags apply to every subcommand:
-
-- `--db`: SQLite database path, default `.agents/coord.db`
-- `--json`: emit machine-readable JSON
-- `--agent`: acting agent identity shortcut used by commands that accept agent context
-
-When a command-specific `--agent` or `--from` flag is omitted, the root `--agent` value may be used instead. Cases that verify fallback behavior should state that explicitly.
-
-## Success JSON Contract
-
-Successful JSON output uses this shape:
-
-```json
-{
-  "ok": true,
-  "command": "send",
-  "data": {}
-}
-```
-
-Shared assertion points:
-
-- `ok` is `true`
-- `command` matches the invoked subcommand
-- `data` contains the command-specific payload
-
-## Error JSON Contract
-
-Failure JSON output uses this shape:
-
-```json
-{
-  "ok": false,
-  "error": {
-    "code": "invalid_input",
-    "message": "..."
-  }
-}
-```
-
-Shared assertion points:
-
-- `ok` is `false`
-- `error.code` matches the stable contract
-- `error.message` is present and human-readable
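The shared assertion points for both envelopes could be collected into one helper. The sketch below is illustrative only: `check_envelope` is a hypothetical name, and it checks nothing beyond the fields listed in the two contracts above.

```python
import json


def check_envelope(raw, command=None):
    """Validate a JSON envelope and return ("ok", data) or ("error", code)."""
    doc = json.loads(raw)
    if doc.get("ok") is True:
        if "data" not in doc:
            raise ValueError("success envelope must carry `data`")
        if command is not None and doc.get("command") != command:
            raise ValueError("`command` does not match the invoked subcommand")
        return ("ok", doc["data"])
    if doc.get("ok") is not False:
        raise ValueError("`ok` must be a boolean")
    error = doc.get("error") or {}
    code = error.get("code")
    message = error.get("message")
    if not isinstance(code, str) or not code:
        raise ValueError("error envelope must carry `error.code`")
    if not isinstance(message, str) or not message:
        raise ValueError("error envelope must carry `error.message`")
    return ("error", code)
```

Returning a tagged tuple keeps the per-case assertions short: a case can compare the result against `("error", "not_found")` in one line.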
-
-## Exit Code Contract
-
-The current CLI contract uses these exit codes:
-
-| Exit Code | Meaning | Typical Error Code |
-| --- | --- | --- |
-| `0` | success | none |
-| `10` | no matching work / timeout without match | `no_matching_work` |
-| `20` | lease conflict | `lease_conflict` |
-| `30` | invalid input, invalid state, usage-style error | `invalid_input` or `invalid_state` |
-| `40` | referenced thread or message missing | `not_found` |
-| `50` | unexpected internal failure | `internal_error` |
-
-When a case expects a failure or a no-match result, assert both the exit code and the JSON error code.
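The exit-code table can be encoded as data so that a case asserts the exit code and the JSON error code together. This is a sketch under the assumption that the table above is complete; `EXIT_CODE_ERRORS` and `exit_code_matches` are hypothetical names.

```python
# Exit code -> error codes allowed by the contract table above.
EXIT_CODE_ERRORS = {
    0: set(),
    10: {"no_matching_work"},
    20: {"lease_conflict"},
    30: {"invalid_input", "invalid_state"},
    40: {"not_found"},
    50: {"internal_error"},
}


def exit_code_matches(exit_code, error_code):
    """Return True when the exit code and JSON error code agree with the table."""
    allowed = EXIT_CODE_ERRORS.get(exit_code)
    if allowed is None:
        return False  # exit code is not part of the contract
    if exit_code == 0:
        return error_code is None  # success carries no error code
    return error_code in allowed
```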
-
-## Body Input Rules
-
-Commands that support `--body` and `--body-file` follow these rules:
-
-- `--body` and `--body-file` are mutually exclusive
-- `--body-file` content is read verbatim into the message body
-- unreadable `--body-file` should be treated as `invalid_input`
-
-Relevant commands:
-
-- `send`
-- `update`
-- `reply`
-- `done`
-- `fail`
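The three body input rules can be modelled in a few lines. The sketch below is not the CLI's implementation: `resolve_body` and `InvalidInput` are hypothetical names, and returning an empty string when neither flag is given is an assumption that the rules above do not state.

```python
class InvalidInput(ValueError):
    """Stands in for the `invalid_input` error code."""


def resolve_body(body=None, body_file=None):
    """Apply the body rules: mutual exclusion, verbatim read, unreadable file."""
    if body is not None and body_file is not None:
        raise InvalidInput("--body and --body-file are mutually exclusive")
    if body_file is not None:
        try:
            # newline="" disables newline translation so the content is verbatim.
            with open(body_file, encoding="utf-8", newline="") as handle:
                return handle.read()
        except (OSError, UnicodeDecodeError) as exc:
            raise InvalidInput(f"cannot read --body-file: {exc}") from exc
    # Assumption: no body flag at all yields an empty body.
    return body if body is not None else ""
```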
-
-## Artifact Rules
-
-Commands with artifact support use these shared rules:
-
-- `--artifact` may be repeated
-- `--artifact-kind` may be specified once for all artifacts, or once per artifact
-- `--artifact-metadata-json` may be specified once for all artifacts, or once per artifact
-- `--artifact-kind` and `--artifact-metadata-json` are invalid without at least one `--artifact`
-- an empty artifact path is invalid input
-
-When artifact behavior is under test, assert at least:
-
-- artifact count
-- artifact `path`
-- artifact `kind`
-- metadata presence when supplied
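The "once for all artifacts, or once per artifact" rule is easiest to see as a small expansion step. The following sketch assumes that any other count is rejected as `invalid_input`; `build_artifacts`, `expand_per_artifact`, and `InvalidInput` are hypothetical names rather than the CLI's own.

```python
class InvalidInput(ValueError):
    """Stands in for the `invalid_input` error code."""


def expand_per_artifact(values, artifact_count, flag):
    """Expand a repeated flag to exactly one value per artifact."""
    values = list(values)
    if not values:
        return [None] * artifact_count
    if artifact_count == 0:
        raise InvalidInput(f"{flag} requires at least one --artifact")
    if len(values) == 1:
        return values * artifact_count  # one value applies to every artifact
    if len(values) == artifact_count:
        return values  # values align one-to-one with artifacts
    raise InvalidInput(f"{flag} must be given once or once per artifact")


def build_artifacts(paths, kinds=(), metadata=()):
    """Pair each artifact path with its kind and metadata."""
    paths = list(paths)
    if any(path == "" for path in paths):
        raise InvalidInput("an empty artifact path is invalid input")
    kinds = expand_per_artifact(kinds, len(paths), "--artifact-kind")
    metadata = expand_per_artifact(metadata, len(paths), "--artifact-metadata-json")
    return [
        {"path": p, "kind": k, "metadata": m}
        for p, k, m in zip(paths, kinds, metadata)
    ]
```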
-
-## Read And Unread Assertions
-
-Unread-related cases should verify behavior from the agent's point of view, not only raw message existence.
-
-Recommended checks:
-
-- `fetch --unread` returns a thread before read acknowledgement
-- `show --mark-read` clears unread state for that agent
-- a new message to the same thread makes the thread unread again
-- `wait-reply` may clear blocked unread state for the waiting agent when the reply is consumed
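The first three checks describe a per-agent read cursor. A toy in-memory model makes the expected sequence concrete; it assumes monotonically increasing integer message ids and is not the real storage schema. `UnreadModel` is a hypothetical name.

```python
class UnreadModel:
    """Toy model of per-agent read cursors over thread messages."""

    def __init__(self):
        self.messages = {}  # thread_id -> list of message ids, oldest first
        self.cursors = {}   # (agent, thread_id) -> last read message id
        self._next_id = 1

    def add_message(self, thread_id):
        """Append a message to the thread and return its id."""
        message_id = self._next_id
        self._next_id += 1
        self.messages.setdefault(thread_id, []).append(message_id)
        return message_id

    def mark_read(self, agent, thread_id):
        """Advance the agent's cursor to the newest message, like `show --mark-read`."""
        ids = self.messages.get(thread_id, [])
        if ids:
            self.cursors[(agent, thread_id)] = ids[-1]

    def is_unread(self, agent, thread_id):
        """A thread is unread when its newest message is past the agent's cursor."""
        ids = self.messages.get(thread_id, [])
        if not ids:
            return False
        return ids[-1] > self.cursors.get((agent, thread_id), 0)
```

Because the cursor is keyed by agent, marking a thread read for one agent leaves it unread for every other agent, which is the point of asserting from the agent's view rather than from raw message existence.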
-
-## Workflow Authoring Rule
-
-If a case spans multiple commands, place the end-to-end narrative in `workflows/README.md` first, then add narrower command-level cases only when they introduce behavior that is easier to reason about in isolation.
@@ -1,9 +0,0 @@
-# Inbox `cancel` Test Plan Index
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `cancel-marks-thread-cancelled` | [cancel-marks-thread-cancelled.md](./cancel-marks-thread-cancelled.md) | moves a non-terminal thread into `cancelled` and emits a control message |
-| `cancel-persists-reason-and-artifact` | [cancel-persists-reason-and-artifact.md](./cancel-persists-reason-and-artifact.md) | persists cancel reason text and attached artifacts |
-| `cancel-rejects-when-thread-missing` | [cancel-rejects-when-thread-missing.md](./cancel-rejects-when-thread-missing.md) | returns stable not-found contract when thread does not exist |
@@ -1,29 +0,0 @@
-# case: cancel-marks-thread-cancelled
-
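-### Purpose
-
-Verify that `cancel` moves a non-terminal thread into the terminal `cancelled` state and emits a control message.
-
-### Preconditions
-
-- a non-terminal thread `THREAD_ID` already exists
-
-### Input
-
-```bash
-inbox --db TMPDIR/coord.db --json cancel --agent leader --thread THREAD_ID --reason "Task superseded by a larger refactor"
-```
-
-### Expected Output
-
-- the command exits with code `0`
-- `thread.status == "cancelled"`
-- `message.kind == "control"`
-
-### Asserted Conclusions
-
-- `cancel` is a thread-level terminal transition
-- cancelling releases the active lease
-- `cancel` does not require the caller to hold the active lease; any thread that exists and is not yet terminal can be cancelled
-- if the thread is already `done`, `failed`, or `cancelled`, the command should return `invalid_state`, not `lease_conflict`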
-
@@ -1,30 +0,0 @@
-# case: cancel-persists-reason-and-artifact
-
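-### Purpose
-
-Verify that the reason text and attachments passed to `cancel` are persisted in full.
-
-### Preconditions
-
-- a non-terminal thread `THREAD_ID` already exists
-- `TMPDIR/cancel.md` already exists
-
-### Input
-
-```bash
-inbox --db TMPDIR/coord.db --json cancel --agent leader --thread THREAD_ID --reason "Task superseded by a larger refactor" --artifact TMPDIR/cancel.md --artifact-kind brief
-inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-```
-
-### Expected Output
-
-- `cancel` succeeds
-- the cancel message keeps the cancel reason in both `summary` and `body`
-- the cancel message contains 1 artifact
-
-### Asserted Conclusions
-
-- `cancel` keeps the human-readable reason and also supports attaching context material
-- when `--reason` is empty, the cancel message `summary` falls back to `thread cancelled` and `body` stays an empty string
-- `--artifact-kind` and `--artifact-metadata-json` require at least one `--artifact`, and the number of values must be `1` or equal to the number of artifacts; otherwise the command should return `invalid_input`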
-
@@ -1,27 +0,0 @@
-# case: cancel-rejects-when-thread-missing
-
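-### Purpose
-
-Verify that `cancel` returns the stable not-found error contract for a thread that does not exist.
-
-### Preconditions
-
-- an empty database has completed `init`
-
-### Input
-
-```bash
-inbox --db TMPDIR/coord.db --json cancel --agent leader --thread thr_missing
-```
-
-### Expected Output
-
-- the exit code is `40`
-- the JSON error code is `not_found`
-
-### Asserted Conclusions
-
-- `cancel` does not implicitly create a control message for a missing thread
-- when the command-level `--agent` is not passed explicitly, the root-level `--agent` may be used as a fallback; if both are missing, the command should return `invalid_input`
-- `--thread` is a required flag; omitting it is an `invalid_input`-class usage error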
-
@@ -1,10 +0,0 @@
-# Inbox `claim` Test Plan Index
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `claim-acquires-thread-lease` | [claim-acquires-thread-lease.md](./claim-acquires-thread-lease.md) | claims a pending thread and records a claim event message |
-| `claim-rejects-when-thread-missing` | [claim-rejects-when-thread-missing.md](./claim-rejects-when-thread-missing.md) | missing thread returns not_found |
-| `claim-rejects-when-thread-already-claimed` | [claim-rejects-when-thread-already-claimed.md](./claim-rejects-when-thread-already-claimed.md) | active lease conflict returns lease_conflict |
-| `claim-records-requested-lease-duration` | [claim-records-requested-lease-duration.md](./claim-records-requested-lease-duration.md) | claim event payload records requested lease duration |
@@ -1,33 +0,0 @@
-# Case: `claim-acquires-thread-lease`
-
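-## Purpose
-
-Verify that `claim` moves a `pending` thread to `claimed` and emits a lease event message.
-
-## Preconditions
-
-- a `pending` thread `THREAD_ID` already exists
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json claim --agent worker-a --thread THREAD_ID --lease-seconds 300
-```
-
-## Expected Output
-
-- the command exits with code `0`
-- `thread.status == "claimed"`
-- `thread.assigned_to == "worker-a"`
-- `message.kind == "event"`
-- `message.summary == "thread claimed"`
-
-## Asserted Conclusions
-
-- `claim` updates the thread status and the active lease together
-- a successful claim is accompanied by an event message rather than a silent status change
-- when `--lease-seconds` is omitted or given a non-positive value, the lease duration falls back to the default of `900` seconds
-
-## Additional Constraints
-
-- when `--agent` is not passed explicitly, the root-level `--agent` may be used as a fallback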
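The lease-duration fallback stated in this case reduces to a single guard. A minimal sketch, assuming the default of `900` seconds given above; `effective_lease_seconds` is a hypothetical name.

```python
DEFAULT_LEASE_SECONDS = 900


def effective_lease_seconds(requested=None):
    """Return the lease duration the claim would use for a requested value."""
    # Omitted or non-positive values fall back to the default.
    if requested is None or requested <= 0:
        return DEFAULT_LEASE_SECONDS
    return requested
```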
@@ -1,25 +0,0 @@
-# Case: `claim-records-requested-lease-duration`
-
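-## Purpose
-
-Verify that the value requested via `claim --lease-seconds` is written into the event message payload so it can be audited later.
-
-## Preconditions
-
-- a `pending` thread `THREAD_ID` already exists
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json claim --agent worker-a --thread THREAD_ID --lease-seconds 300
-```
-
-## Expected Output
-
-- the command exits with code `0`
-- `message.payload_json.lease_seconds == 300`
-- `message.payload_json.lease_token` is present
-
-## Asserted Conclusions
-
-- the requested lease duration is not used only for internal computation; it is also persisted in the event message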
@@ -1,28 +0,0 @@
-# Case: `claim-rejects-when-thread-already-claimed`
-
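-## Purpose
-
-Verify that while a thread has an active lease, another worker cannot claim it again.
-
-## Preconditions
-
-- `worker-z` has successfully run `claim` on thread `THREAD_ID`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json claim --agent worker-y --thread THREAD_ID
-```
-
-## Expected Output
-
-- the exit code is `20`
-- the JSON error code is `lease_conflict`
-
-## Asserted Conclusions
-
-- an active lease makes `claim` exclusive
-
-## Additional Constraints
-
-- `claim` is only allowed on `pending` threads; if the thread is already `claimed`, `in_progress`, `blocked`, or in any terminal state, the command should return `invalid_state`, not `lease_conflict`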
@@ -1,28 +0,0 @@
-# Case: `claim-rejects-when-thread-missing`
-
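-## Purpose
-
-Verify that `claim` returns the stable not-found error contract for a thread that does not exist.
-
-## Preconditions
-
-- an empty database has completed `init`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json claim --agent worker-z --thread thr_missing
-```
-
-## Expected Output
-
-- the exit code is `40`
-- the JSON error code is `not_found`
-
-## Asserted Conclusions
-
-- a missing thread is clearly distinguished as a reference error rather than a lease conflict
-
-## Additional Constraints
-
-- `--thread` is a required flag; omitting it is an `invalid_input`-class usage error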
@@ -1,10 +0,0 @@
-# Inbox `done` Test Plan Index
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `done-marks-thread-terminal` | [done-marks-thread-terminal.md](./done-marks-thread-terminal.md) | marks a claimed thread as `done` with a result message |
-| `done-persists-result-body-and-artifact` | [done-persists-result-body-and-artifact.md](./done-persists-result-body-and-artifact.md) | persists result body and artifact for follow-up reads |
-| `done-rejects-non-owner` | [done-rejects-non-owner.md](./done-rejects-non-owner.md) | rejects `done` from non-owner agent |
-| `done-rejects-on-terminal-thread` | [done-rejects-on-terminal-thread.md](./done-rejects-on-terminal-thread.md) | rejects `done` on terminal thread states |
@@ -1,33 +0,0 @@
-# Case: `done-marks-thread-terminal`
-
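-## Purpose
-
-Verify that the lease owner can move the thread to the terminal `done` state and emit a result message.
-
-## Preconditions
-
-- `worker-a` has successfully run `claim` on thread `THREAD_ID`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json done --agent worker-a --thread THREAD_ID --summary "Retry policy implemented" --body "The HTTP client now retries the selected transient failures."
-```
-
-## Expected Output
-
-- the command exits with code `0`
-- `thread.status == "done"`
-- `message.kind == "result"`
-
-## Asserted Conclusions
-
-- `done` moves the thread to the successful terminal state
-- completing the thread releases the active lease
-
-## Additional Constraints
-
-- when `--agent` is not passed explicitly, the root-level `--agent` may be used as a fallback
-- if the thread exists but currently has no active lease, for example because the lease was released or has expired, `done` should return `invalid_state`, not `lease_conflict`
-- `--thread` and `--summary` are required flags; omitting either is an `invalid_input`-class usage error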
-
@@ -1,34 +0,0 @@
-# Case: `done-persists-result-body-and-artifact`
-
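-## Purpose
-
-Verify that `done` persists the result body and attachments so that a later `show` can read them.
-
-## Preconditions
-
-- `worker-a` has successfully run `claim` on thread `THREAD_ID`
-- `TMPDIR/result.md` already exists
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json done --agent worker-a --thread THREAD_ID --summary "Retry policy implemented" --body-file TMPDIR/result.md --artifact TMPDIR/result.md --artifact-kind report
-inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-```
-
-## Expected Output
-
-- `done` succeeds
-- the final result message `body` equals the file content
-- the result message contains 1 `report` artifact
-
-## Asserted Conclusions
-
-- `done` is a result-delivery command, not only a status-change command
-- `done` also supports `--payload-json`; invalid JSON should return `invalid_input`
-
-## Additional Constraints
-
-- `--body` and `--body-file` are mutually exclusive; an unreadable `--body-file` is also `invalid_input`
-- artifact-related flags require at least one `--artifact` and follow the counting rule of "specified once, or once per artifact"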
-
@@ -1,25 +0,0 @@
-# Case: `done-rejects-non-owner`
-
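-## Purpose
-
-Verify that an agent that does not own the lease cannot complete the thread on the worker's behalf.
-
-## Preconditions
-
-- `worker-a` has successfully run `claim` on thread `THREAD_ID`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json done --agent worker-b --thread THREAD_ID --summary "Retry policy implemented"
-```
-
-## Expected Output
-
-- the exit code is `20`
-- the JSON error code is `lease_conflict`
-
-## Asserted Conclusions
-
-- `done` is restricted to the owner of the active lease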
-
@@ -1,25 +0,0 @@
-# Case: `done-rejects-on-terminal-thread`
-
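-## Purpose
-
-Verify that `done` cannot be run again on a thread that has already reached a terminal state.
-
-## Preconditions
-
-- thread `THREAD_ID` is already `done`, `failed`, or `cancelled`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json done --agent worker-a --thread THREAD_ID --summary "Retry policy implemented"
-```
-
-## Expected Output
-
-- the exit code is `30`
-- the JSON error code is `invalid_state`
-
-## Asserted Conclusions
-
-- `done` on a terminal thread fails the same way every time rather than succeeding again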
-
@@ -1,10 +0,0 @@
-# Inbox `fail` Test Plan Index
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `fail-marks-thread-failed` | [fail-marks-thread-failed.md](./fail-marks-thread-failed.md) | marks a claimed thread as `failed` with a result message |
-| `fail-persists-failure-body-and-artifact` | [fail-persists-failure-body-and-artifact.md](./fail-persists-failure-body-and-artifact.md) | persists failure body and artifacts for diagnosis |
-| `fail-rejects-non-owner` | [fail-rejects-non-owner.md](./fail-rejects-non-owner.md) | rejects `fail` from non-owner agent |
-| `fail-rejects-on-terminal-thread` | [fail-rejects-on-terminal-thread.md](./fail-rejects-on-terminal-thread.md) | rejects `fail` on terminal thread states |
@@ -1,33 +0,0 @@
-# Case: `fail-marks-thread-failed`
-
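-## Purpose
-
-Verify that the lease owner can move the thread to the terminal `failed` state and emit a failure result message.
-
-## Preconditions
-
-- `worker-b` has successfully run `claim` on thread `THREAD_ID`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json fail --agent worker-b --thread THREAD_ID --summary "Migration failed" --body "The migration cannot proceed because the prior schema is inconsistent."
-```
-
-## Expected Output
-
-- the command exits with code `0`
-- `thread.status == "failed"`
-- `message.kind == "result"`
-
-## Asserted Conclusions
-
-- `fail` shares the result message model with `done` but enters the failed terminal state
-- a successful `fail` releases the current active lease, so the thread does not sit in the failed terminal state while still appearing occupied
-
-## Additional Constraints
-
-- when `--agent` is not passed explicitly, the root-level `--agent` may be used as a fallback
-- the `result` message produced by `fail` is sent back to the thread creator, not to the current worker itself
-- if the thread has no active lease, `fail` should return `invalid_state`, not `lease_conflict`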
-
@@ -1,34 +0,0 @@
-# Case: `fail-persists-failure-body-and-artifact`
-
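-## Purpose
-
-Verify that `fail` persists the failure description and attachments.
-
-## Preconditions
-
-- `worker-b` has successfully run `claim` on thread `THREAD_ID`
-- `TMPDIR/failure.md` already exists
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json fail --agent worker-b --thread THREAD_ID --summary "Migration failed" --body-file TMPDIR/failure.md --artifact TMPDIR/failure.md --artifact-kind report
-inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-```
-
-## Expected Output
-
-- `fail` succeeds
-- the final result message `body` equals the file content
-- the result message contains 1 `report` artifact
-
-## Asserted Conclusions
-
-- the failed terminal state must also deliver complete troubleshooting material
-
-## Additional Constraints
-
-- `--payload-json` must be valid JSON; an empty value is treated as `{}`
-- `--body` and `--body-file` are mutually exclusive; an unreadable `--body-file` is `invalid_input`
-- `--artifact-kind` and `--artifact-metadata-json` cannot be used without `--artifact`, and the number of values must either be one value applied to all artifacts or align one-to-one with the artifacts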
-
@@ -1,25 +0,0 @@
-# Case: `fail-rejects-non-owner`
-
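-## Purpose
-
-Verify that an agent that does not own the lease cannot mark the thread as failed.
-
-## Preconditions
-
-- `worker-b` has successfully run `claim` on thread `THREAD_ID`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json fail --agent worker-x --thread THREAD_ID --summary "Migration failed"
-```
-
-## Expected Output
-
-- the exit code is `20`
-- the JSON error code is `lease_conflict`
-
-## Asserted Conclusions
-
-- `fail`, like `done`, is restricted to the lease owner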
-
@@ -1,25 +0,0 @@
-# Case: `fail-rejects-on-terminal-thread`
-
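-## Purpose
-
-Verify that `fail` cannot be run again on a thread that has already reached a terminal state.
-
-## Preconditions
-
-- thread `THREAD_ID` is already `done`, `failed`, or `cancelled`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json fail --agent worker-b --thread THREAD_ID --summary "Migration failed"
-```
-
-## Expected Output
-
-- the exit code is `30`
-- the JSON error code is `invalid_state`
-
-## Asserted Conclusions
-
-- `fail` does not succeed a second time on a terminal thread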
-
@@ -1,10 +0,0 @@
-# Inbox `fetch` Test Plan Index
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `fetch-returns-pending-thread-for-target-agent` | [fetch-returns-pending-thread-for-target-agent.md](./fetch-returns-pending-thread-for-target-agent.md) | returns pending candidate work for the target agent |
-| `fetch-respects-status-and-limit-filters` | [fetch-respects-status-and-limit-filters.md](./fetch-respects-status-and-limit-filters.md) | enforces status filtering and max row count |
-| `fetch-unread-uses-read-cursor` | [fetch-unread-uses-read-cursor.md](./fetch-unread-uses-read-cursor.md) | unread filtering depends on per-agent read cursor state |
-| `fetch-returns-no-matching-work-when-empty` | [fetch-returns-no-matching-work-when-empty.md](./fetch-returns-no-matching-work-when-empty.md) | empty fetch result returns no_matching_work |
@@ -1,31 +0,0 @@
-# Case: `fetch-respects-status-and-limit-filters`
-
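-## Purpose
-
-Verify that `fetch` honors both the status filter and the result limit.
-
-## Preconditions
-
-- `worker-a` has multiple threads in different states
-- at least two of those threads match the target status filter
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json fetch --agent worker-a --status pending,blocked --limit 1
-```
-
-## Expected Output
-
-- the command exits with code `0`
-- no more than `1` thread is returned
-- every returned thread satisfies `status in ["pending","blocked"]`
-
-## Asserted Conclusions
-
-- the `status` and `limit` filters of `fetch` apply together
-- results are ordered by `updated_at` descending, so the most recently updated threads surface first
-
-## Additional Constraints
-
-- when `--limit` is `0` or negative, the effective limit falls back to the default of `20`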
@@ -1,24 +0,0 @@
-# Case: `fetch-returns-no-matching-work-when-empty`
-
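-## Purpose
-
-Verify that `fetch` returns the stable "no work" error contract when no thread matches.
-
-## Preconditions
-
-- an empty database has completed `init`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json fetch --agent worker-z --status pending
-```
-
-## Expected Output
-
-- the exit code is `10`
-- the JSON error code is `no_matching_work`
-
-## Asserted Conclusions
-
-- an empty result is not a successful empty array but an explicit "no matching work" signal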
@@ -1,30 +0,0 @@
-# Case: `fetch-returns-pending-thread-for-target-agent`
-
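-## Purpose
-
-Verify that `fetch` pulls pending threads for the target worker.
-
-## Preconditions
-
-- `leader` has sent at least one `pending` thread to `worker-a`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json fetch --agent worker-a --status pending
-```
-
-## Expected Output
-
-- the command exits with code `0`
-- the response includes `data.threads`
-- it contains at least one thread with `assigned_to == "worker-a"` and `status == "pending"`
-
-## Asserted Conclusions
-
-- by default `fetch` is a worker-centric list of candidate work, not a global thread scan
-
-## Additional Constraints
-
-- when `--status` is not passed explicitly, `fetch` queries only `pending` threads
-- when the command-level `--agent` is not passed explicitly, the root-level `--agent` may be used as a fallback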
@@ -1,34 +0,0 @@
-# Case: `fetch-unread-uses-read-cursor`
-
-## Purpose
-
-Verify that `fetch --unread` computes unread state from the agent's read cursor, not merely from whether the thread has new messages.
-
-## Preconditions
-
-- `leader` has sent a `pending` thread `THREAD_ID` to `worker-e`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json fetch --agent worker-e --status pending --unread
-inbox --db TMPDIR/coord.db --agent worker-e --json show --thread THREAD_ID --mark-read
-inbox --db TMPDIR/coord.db --json fetch --agent worker-e --status pending --unread
-inbox --db TMPDIR/coord.db --json send --from leader --to worker-e --thread THREAD_ID --summary "Use sentence case" --body "Keep the nav labels in sentence case."
-inbox --db TMPDIR/coord.db --json fetch --agent worker-e --status pending --unread
-```
-
-## Expected Output
-
-- the first `fetch --unread` returns the thread
-- after `show --mark-read`, the second `fetch --unread` has no matching result
-- after the new message is appended, the third `fetch --unread` returns the thread again
-
-## Asserted Conclusions
-
-- the unread decision depends on `thread_reads.last_read_message_id`
-- a newly arrived message puts the same thread back into the unread result set
-
-## Additional Constraints
-
-- `--unread` requires an agent identity; without one the command returns `invalid_input`
@@ -1,8 +0,0 @@
-# Inbox `init` Test Plan Index
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `init-creates-schema-on-empty-db` | [init-creates-schema-on-empty-db.md](./init-creates-schema-on-empty-db.md) | initializes an empty database path and returns initialized status |
-| `init-is-idempotent-on-existing-db` | [init-is-idempotent-on-existing-db.md](./init-is-idempotent-on-existing-db.md) | repeated init succeeds on the same database path |
@@ -1,28 +0,0 @@
-# Case: `init-creates-schema-on-empty-db`
-
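-## Purpose
-
-Verify that running `init` against an empty database path creates a usable inbox schema and returns a stable initialization response.
-
-## Preconditions
-
- Choose a database path `TMPDIR/coord.db` that does not exist yet
-
-## Input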
-
-```bash
-inbox --db TMPDIR/coord.db --json init
-```
-
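-## Expected Output
-
- The command exits with code `0`
- The response has `ok=true`
- `command` is `init`
- `data.db_path` equals the path passed in
- `data.status` is `initialized`
-
-## Assertions
-
- `init` completes schema initialization directly on an empty path
- The initialized database is usable by subsequent commands such as `send` and `fetch`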
@@ -1,27 +0,0 @@
-# Case: `init-is-idempotent-on-existing-db`
-
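-## Purpose
-
-Verify that `init` can be rerun against an already initialized database without failing or damaging the existing schema.
-
-## Preconditions
-
- `inbox --db TMPDIR/coord.db --json init` has already been run once against `TMPDIR/coord.db`
-
-## Input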
-
-```bash
-inbox --db TMPDIR/coord.db --json init
-inbox --db TMPDIR/coord.db --json init
-```
-
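-## Expected Output
-
- Both commands exit with code `0`
- Both responses return `data.status == "initialized"`
- Both responses return the same `data.db_path`
-
-## Assertions
-
- `init` is idempotent
- Reinitializing an existing schema must not cause extra migration failures or state drift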
@@ -1,10 +0,0 @@
-# Inbox `list` Test Plan Index
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `list-filters-by-status` | [list-filters-by-status.md](./list-filters-by-status.md) | filters returned threads by status set |
-| `list-filters-by-created-by` | [list-filters-by-created-by.md](./list-filters-by-created-by.md) | filters returned threads by creator |
-| `list-filters-by-assigned-to` | [list-filters-by-assigned-to.md](./list-filters-by-assigned-to.md) | filters returned threads by current assignee |
-| `list-respects-limit` | [list-respects-limit.md](./list-respects-limit.md) | enforces hard cap on returned thread count |
@@ -1,25 +0,0 @@
-# case: list-filters-by-assigned-to
-
-### Purpose
-
-Verify that `list --assigned-to` filters threads by their current assignee.
-
-### Preconditions
-
- The database contains threads with several different `assigned_to` values
-
-### Input
-
-```bash
-inbox --db TMPDIR/coord.db --json list --assigned-to worker-d --status pending
-```
-
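-### Expected Output
-
- The command exits with code `0`
- Every returned thread satisfies `assigned_to == "worker-d"`
-
-### Assertions
-
- `list` lets the management side inspect the set of threads a given assignee currently holds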
-
@@ -1,26 +0,0 @@
-# case: list-filters-by-created-by
-
-### Purpose
-
-Verify that `list --created-by` filters results by thread creator.
-
-### Preconditions
-
- Threads exist from at least two different creators
-
-### Input
-
-```bash
-inbox --db TMPDIR/coord.db --json list --created-by leader
-```
-
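-### Expected Output
-
- The command exits with code `0`
- Every returned thread satisfies `created_by == "leader"`
-
-### Assertions
-
- The `created-by` filter applies directly to thread metadata
- When no thread matches, `list` returns exit code `10` and error code `no_matching_work`, not a successful empty array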
-
@@ -1,26 +0,0 @@
-# case: list-filters-by-status
-
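-### Purpose
-
-Verify that `list --status` returns only threads whose status is in the given set.
-
-### Preconditions
-
- The database contains threads in several different statuses
-
-### Input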
-
-```bash
-inbox --db TMPDIR/coord.db --json list --status pending,blocked
-```
-
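-### Expected Output
-
- The command exits with code `0`
- Every returned thread satisfies `status in ["pending","blocked"]`
-
-### Assertions
-
- `list` applies the status filter strictly
- Without an explicit `--assigned-to`, `list` gives a global view; when `--agent` or the root-level `--agent` is provided, it instead acts as a shortcut for filtering by `assigned_to`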
-
@@ -1,26 +0,0 @@
-# case: list-respects-limit
-
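-### Purpose
-
-Verify that `list --limit` caps the number of returned threads and returns the most recently updated threads first.
-
-### Preconditions
-
- Several threads match the filter
-
-### Input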
-
-```bash
-inbox --db TMPDIR/coord.db --json list --assigned-to worker-d --limit 1
-```
-
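-### Expected Output
-
- The command exits with code `0`
- At most `1` thread is returned
-
-### Assertions
-
- The `list` limit is a hard cap; no excess results are returned
- `--limit <= 0` falls back to the default of `20`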
-
@@ -1,9 +0,0 @@
-# Inbox `renew` Test Plan Index
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `renew-extends-active-lease` | [renew-extends-active-lease.md](./renew-extends-active-lease.md) | owner renews an active lease and gets a renewal event |
-| `renew-rejects-non-owner` | [renew-rejects-non-owner.md](./renew-rejects-non-owner.md) | non-owner renew attempt returns lease_conflict |
-| `renew-rejects-without-active-lease` | [renew-rejects-without-active-lease.md](./renew-rejects-without-active-lease.md) | missing active lease returns invalid_state |
@@ -1,33 +0,0 @@
-# Case: `renew-extends-active-lease`
-
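-## Purpose
-
-Verify that the lease owner can renew an active lease and that a renewal event message is generated.
-
-## Preconditions
-
- `worker-c` has successfully claimed thread `THREAD_ID` with `claim`
-
-## Input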
-
-```bash
-inbox --db TMPDIR/coord.db --json renew --agent worker-c --thread THREAD_ID --lease-seconds 600
-```
-
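-## Expected Output
-
- The command exits with code `0`
- `thread.status` keeps its previous value
- `message.kind == "event"`
- `message.summary == "lease renewed"`
- `message.payload_json.lease_seconds == 600`
- `message.payload_json.lease_token` is present
-
-## Assertions
-
- `renew` appends a renewal event to the existing thread; it does not claim the thread again
-
-## Additional Constraints
-
- `renew` requires an agent identity, supplied either by the command-level `--agent` or by falling back to the root-level `--agent`
- When `--lease-seconds` is `0` or negative, the CLI uses the default of `900` seconds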
@@ -1,24 +0,0 @@
-# Case: `renew-rejects-non-owner`
-
-## Purpose
-
-Verify that a non-owner cannot renew someone else's active lease.
-
-## Preconditions
-
- `worker-c` has successfully claimed thread `THREAD_ID` with `claim`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json renew --agent worker-x --thread THREAD_ID --lease-seconds 600
-```
-
-## Expected Output
-
- Exit code is `20`
- JSON error code is `lease_conflict`
-
-## Assertions
-
- Like `claim`, `renew` is constrained by lease ownership
@@ -1,26 +0,0 @@
-# Case: `renew-rejects-without-active-lease`
-
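-## Purpose
-
-Verify that `renew` fails explicitly when the thread has no active lease.
-
-## Preconditions
-
- Thread `THREAD_ID` already exists
- The thread currently has no active lease
-
-## Input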
-
-```bash
-inbox --db TMPDIR/coord.db --json renew --agent worker-c --thread THREAD_ID --lease-seconds 600
-```
-
-## Expected Output
-
- Exit code is `30`
- JSON error code is `invalid_state`
-
-## Assertions
-
- `renew` depends on an existing active lease
- A missing lease is a state error, not a not-found error
@@ -1,10 +0,0 @@
-# Inbox `reply` Test Plan Index
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `reply-adds-answer-message` | [reply-adds-answer-message.md](./reply-adds-answer-message.md) | appends default `answer` message to an existing non-terminal thread |
-| `reply-supports-control-kind` | [reply-supports-control-kind.md](./reply-supports-control-kind.md) | supports explicit `--kind control` reply message |
-| `reply-attaches-artifact` | [reply-attaches-artifact.md](./reply-attaches-artifact.md) | appends reply message with artifact payload |
-| `reply-rejects-invalid-payload-json` | [reply-rejects-invalid-payload-json.md](./reply-rejects-invalid-payload-json.md) | rejects malformed `--payload-json` input |
@@ -1,34 +0,0 @@
-# Case: `reply-adds-answer-message`
-
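-## Purpose
-
-Verify that `reply` by default appends an `answer` message to an existing thread and leaves the thread status unchanged.
-
-## Preconditions
-
- A non-terminal thread `THREAD_ID` already exists
-
-## Input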
-
-```bash
-inbox --db TMPDIR/coord.db --json reply --from leader --to worker-a --thread THREAD_ID --summary "Retry read timeouts" --body "Yes, include read timeouts in the retry policy."
-```
-
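-## Expected Output
-
- The command exits with code `0`
- `message.kind == "answer"`
- `thread.thread_id == THREAD_ID`
- The thread status keeps its previous value
-
-## Assertions
-
- `reply` appends a message within a thread; it is not a state-transition command
-
-## Additional Constraints
-
- Without an explicit `--from`, `reply` falls back to the root-level `--agent`; if both are missing, it should return `invalid_input`
- `--thread`, `--to`, and `--summary` are all required flags; omitting any of them is an `invalid_input` usage error
- `reply` may only target an existing non-terminal thread; a missing thread should return `not_found`, and a terminal thread should return `invalid_state`
- `--body` and `--body-file` are mutually exclusive; an unreadable `--body-file` should return `invalid_input`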
-
@@ -1,31 +0,0 @@
-# Case: `reply-attaches-artifact`
-
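-## Purpose
-
-Verify that `reply` can append an answer message that carries an attachment.
-
-## Preconditions
-
- A non-terminal thread `THREAD_ID` already exists
- `TMPDIR/decision.md` already exists
-
-## Input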
-
-```bash
-inbox --db TMPDIR/coord.db --json reply --from leader --to worker-a --thread THREAD_ID --summary "Retry read timeouts" --artifact TMPDIR/decision.md --artifact-kind brief --artifact-metadata-json '{"label":"decision"}'
-```
-
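-## Expected Output
-
- The command exits with code `0`
- `message.artifacts` has length `1`
- The artifact path, kind, and metadata are all readable
-
-## Assertions
-
- `reply` shares the artifact write contract with `send/update/done/fail`
-
-## Additional Constraints
-
- `artifact-kind` and `artifact-metadata-json` require at least one `--artifact`; a count mismatch should also return `invalid_input`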
-
@@ -1,25 +0,0 @@
-# Case: `reply-rejects-invalid-payload-json`
-
-## Purpose
-
-Verify that `reply` returns a stable error contract for invalid `--payload-json` input.
-
-## Preconditions
-
- A non-terminal thread `THREAD_ID` already exists
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json reply --from leader --to worker-a --thread THREAD_ID --summary "Retry read timeouts" --payload-json not-json
-```
-
-## Expected Output
-
- Exit code is `30`
- JSON error code is `invalid_input`
-
-## Assertions
-
- Like the other message-writing commands, `reply` requires its payload to pass JSON validation
-
@@ -1,25 +0,0 @@
-# Case: `reply-supports-control-kind`
-
-## Purpose
-
-Verify that `reply --kind control` can send a control message and is not limited to the default `answer`.
-
-## Preconditions
-
- A non-terminal thread `THREAD_ID` already exists
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json reply --from leader --to worker-a --thread THREAD_ID --kind control --summary "Pause rollout" --body "Pause rollout until QA confirms the fix."
-```
-
-## Expected Output
-
- The command exits with code `0`
- `message.kind == "control"`
-
-## Assertions
-
- The caller can explicitly choose the message kind for `reply`
-
@@ -1,12 +0,0 @@
-# Inbox `send` Test Plan Index
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `send-creates-new-thread` | [send-creates-new-thread.md](./send-creates-new-thread.md) | creates a pending thread with an initial task message |
-| `send-appends-message-to-existing-thread` | [send-appends-message-to-existing-thread.md](./send-appends-message-to-existing-thread.md) | appends a message to an existing non-terminal thread |
-| `send-reads-body-from-body-file` | [send-reads-body-from-body-file.md](./send-reads-body-from-body-file.md) | reads message body from a file path |
-| `send-attaches-artifact-with-metadata` | [send-attaches-artifact-with-metadata.md](./send-attaches-artifact-with-metadata.md) | persists artifact path, kind, and metadata on send |
-| `send-rejects-invalid-payload-json` | [send-rejects-invalid-payload-json.md](./send-rejects-invalid-payload-json.md) | rejects malformed payload JSON with `invalid_input` |
-| `send-rejects-invalid-artifact-metadata-json` | [send-rejects-invalid-artifact-metadata-json.md](./send-rejects-invalid-artifact-metadata-json.md) | rejects malformed artifact metadata JSON |
@@ -1,33 +0,0 @@
-# Case: `send-appends-message-to-existing-thread`
-
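-## Purpose
-
-Verify that `send` with an existing `--thread` appends a message to that thread instead of recreating it.
-
-## Preconditions
-
- A non-terminal thread `THREAD_ID` sent by `leader` to `worker-d` already exists
-
-## Input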
-
-```bash
-inbox --db TMPDIR/coord.db --json send --from leader --to worker-d --thread THREAD_ID --summary "Use a markdown editor" --body "Prefer a textarea-based markdown editor for v1."
-inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-```
-
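-## Expected Output
-
- `send` succeeds and the returned `thread.thread_id` is still `THREAD_ID`
- The thread status keeps its previous value and is not forced to a new status
- `show` reports an increased message count
-
-## Assertions
-
- Appending a message does not reset the thread lifecycle
- Thread history keeps old and new messages in chronological order
-
-## Additional Constraints
-
- `--to` is a required CLI argument and cannot be omitted even when appending to an existing thread
- When appending to an existing thread with a different `--to`, the thread's `assigned_to` is updated to the new recipient
- Terminal threads do not accept further messages through `send`; the expected error type is `invalid_state`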
@@ -1,27 +0,0 @@
-# Case: `send-attaches-artifact-with-metadata`
-
-## Purpose
-
-Verify that `send` can attach an artifact with its kind and metadata, and that they can be read from the response or a later `show`.
-
-## Preconditions
-
- `TMPDIR/task.md` already exists
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json send --from leader --to worker-d --subject "Build admin editor" --summary "Create the first editor screen" --artifact TMPDIR/task.md --artifact-kind brief --artifact-metadata-json '{"label":"task-brief"}'
-```
-
-## Expected Output
-
- The command exits with code `0`
- `message.artifacts` has length `1`
- artifact `path == "TMPDIR/task.md"`
- artifact `kind == "brief"`
- artifact `metadata_json.label == "task-brief"`
-
-## Assertions
-
- `send` can persist attachments and their structured metadata when creating a message
@@ -1,36 +0,0 @@
-# Case: `send-creates-new-thread`
-
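-## Purpose
-
-Verify that `send` without an existing thread creates a new thread and writes the initial task message.
-
-## Preconditions
-
- An empty database has been initialized with `init`
-
-## Input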
-
-```bash
-inbox --db TMPDIR/coord.db --json send --from leader --to worker-a --subject "Implement feature X" --summary "Add retry policy" --body "Implement retry handling for the HTTP client." --run run_blog_001 --task T1
-```
-
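-## Expected Output
-
- The command exits with code `0`
- The response contains `thread.thread_id`
- `thread.status == "pending"`
- `thread.created_by == "leader"`
- `thread.assigned_to == "worker-a"`
- `message.kind == "task"`
-
-## Assertions
-
- `send` creates a new thread rather than inserting an orphaned message
- The default initial status of a new thread is `pending`
-
-## Additional Constraints
-
- Without an explicit `--from`, `send` falls back to the root-level `--agent`
- When creating a thread without an explicit `--summary`, `send` falls back to `--subject`
- When creating a thread, `--kind` defaults to `task` and `--priority` defaults to `normal`
- When `--thread` names a thread that does not exist, `send` creates a new thread with that thread ID instead of returning `not_found`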
@@ -1,30 +0,0 @@
-# Case: `send-reads-body-from-body-file`
-
-## Purpose
-
-Verify that `send --body-file` writes the file contents into the message body.
-
-## Preconditions
-
- `TMPDIR/task.md` already exists and contains the test body text
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json send --from leader --to worker-d --subject "Build admin editor" --summary "Create the first editor screen" --body-file TMPDIR/task.md
-inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-```
-
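-## Expected Output
-
- `send` succeeds
- The `body` of the first message in `show` matches the file contents
-
-## Assertions
-
- The `body-file` contents are read verbatim
- The stored result is equivalent to passing the same text through `--body`
-
-## Additional Constraints
-
- `--body` and `--body-file` are mutually exclusive; this constraint is documented centrally in the shared docs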
@@ -1,24 +0,0 @@
-# Case: `send-rejects-invalid-artifact-metadata-json`
-
-## Purpose
-
-Verify that `send` returns a stable error contract for invalid artifact metadata JSON.
-
-## Preconditions
-
- An empty database has been initialized with `init`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json send --from leader --to worker-z --subject "Invalid artifact json" --artifact TMPDIR/report.md --artifact-metadata-json not-json
-```
-
-## Expected Output
-
- Exit code is `30`
- JSON error code is `invalid_input`
-
-## Assertions
-
- Artifact metadata is validated as JSON before it is written
@@ -1,25 +0,0 @@
-# Case: `send-rejects-invalid-payload-json`
-
-## Purpose
-
-Verify that `send` returns a stable error contract for invalid `--payload-json` input.
-
-## Preconditions
-
- An empty database has been initialized with `init`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json send --from leader --to worker-z --subject "Invalid payload json" --payload-json not-json
-```
-
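-## Expected Output
-
- Exit code is `30`
- JSON error code is `invalid_input`
-
-## Assertions
-
- An invalid payload is rejected before anything is written to the database
- The error is classified as an input problem, not an internal error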
@@ -1,10 +0,0 @@
-# Inbox `show` Test Plan Index
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `show-returns-thread-and-message-history` | [show-returns-thread-and-message-history.md](./show-returns-thread-and-message-history.md) | returns thread details and full time-ordered message history |
-| `show-includes-artifacts-per-message` | [show-includes-artifacts-per-message.md](./show-includes-artifacts-per-message.md) | expands per-message artifacts in the show payload |
-| `show-mark-read-advances-read-cursor` | [show-mark-read-advances-read-cursor.md](./show-mark-read-advances-read-cursor.md) | advances caller read cursor when `--mark-read` is used |
-| `show-rejects-when-thread-missing` | [show-rejects-when-thread-missing.md](./show-rejects-when-thread-missing.md) | returns stable not-found contract for missing thread |
@@ -1,26 +0,0 @@
-# case: show-includes-artifacts-per-message
-
-### Purpose
-
-Verify that every message returned by `show` includes its associated artifact list.
-
-### Preconditions
-
- At least one message in thread `THREAD_ID` carries an artifact
-
-### Input
-
-```bash
-inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-```
-
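-### Expected Output
-
- The command exits with code `0`
- The relevant message nodes contain `artifacts`
- The artifact's `path`, `kind`, and `metadata_json` are readable
-
-### Assertions
-
- `show` must expand attachments along with each message, not return only the basic message fields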
-
@@ -1,27 +0,0 @@
-# case: show-mark-read-advances-read-cursor
-
-### Purpose
-
-Verify that `show --mark-read` advances the calling agent's read cursor and affects subsequent unread queries.
-
-### Preconditions
-
- `worker-e` has an unread thread `THREAD_ID`
-
-### Input
-
-```bash
-inbox --db TMPDIR/coord.db --agent worker-e --json show --thread THREAD_ID --mark-read
-inbox --db TMPDIR/coord.db --json fetch --agent worker-e --status pending --unread
-```
-
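-### Expected Output
-
- `show` succeeds
- The subsequent `fetch --unread` no longer returns the thread
-
-### Assertions
-
- The side effect of `mark-read` is to advance the agent's `last_read_message_id`
- `--mark-read` requires an agent identity, passed through the root-level `--agent` or a command argument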
-
@@ -1,26 +0,0 @@
-# case: show-rejects-when-thread-missing
-
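-### Purpose
-
-Verify that `show` returns a stable not-found error contract for a thread that does not exist.
-
-### Preconditions
-
- An empty database has been initialized with `init`
-
-### Input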
-
-```bash
-inbox --db TMPDIR/coord.db --json show --thread thr_missing
-```
-
-### Expected Output
-
- Exit code is `40`
- JSON error code is `not_found`
-
-### Assertions
-
- `show` does not return an empty object for a missing thread
- `--thread` is a required flag; omitting it is an `invalid_input` usage error
-
@@ -1,29 +0,0 @@
-# case: show-returns-thread-and-message-history
-
-### Purpose
-
-Verify that `show` returns the thread details and the full message history.
-
-### Preconditions
-
- A thread `THREAD_ID` with multiple messages already exists
-
-### Input
-
-```bash
-inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-```
-
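-### Expected Output
-
- The command exits with code `0`
- The response contains `data.thread`
- The response contains `data.messages`
- Messages are sorted by creation time, ascending
-
-### Assertions
-
- `show` is the read entry point for thread details and chronological history
- `show` does not depend on the thread being active; as long as the thread exists, its full history is readable, including for terminal threads
- Without `--mark-read`, `show` does not require an agent identity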
-
@@ -1,11 +0,0 @@
-# Inbox `update` Test Plan Index
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `update-moves-thread-to-in-progress` | [update-moves-thread-to-in-progress.md](./update-moves-thread-to-in-progress.md) | moves a claimed thread to `in_progress` and emits a progress message |
-| `update-moves-thread-to-blocked-with-payload` | [update-moves-thread-to-blocked-with-payload.md](./update-moves-thread-to-blocked-with-payload.md) | moves a claimed thread to `blocked` with structured question payload |
-| `update-accepts-body-file-and-artifact` | [update-accepts-body-file-and-artifact.md](./update-accepts-body-file-and-artifact.md) | persists update body from file plus artifacts |
-| `update-rejects-invalid-payload-json` | [update-rejects-invalid-payload-json.md](./update-rejects-invalid-payload-json.md) | rejects malformed `--payload-json` input |
-| `update-rejects-non-owner` | [update-rejects-non-owner.md](./update-rejects-non-owner.md) | rejects update when caller is not the active lease owner |
@@ -1,33 +0,0 @@
-# Case: `update-accepts-body-file-and-artifact`
-
-## Purpose
-
-Verify that `update` can send structured progress material through `body-file` and artifacts.
-
-## Preconditions
-
- `worker-a` has successfully claimed thread `THREAD_ID` with `claim`
- `TMPDIR/progress.md` already exists
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json update --agent worker-a --thread THREAD_ID --status in_progress --summary "Implementation started" --body-file TMPDIR/progress.md --artifact TMPDIR/progress.md --artifact-kind note
-inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-```
-
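-## Expected Output
-
- `update` succeeds
- The `body` of the corresponding message equals the file contents
- The corresponding message contains 1 artifact with kind `note`
-
-## Assertions
-
- Body and artifact support in `update` is consistent with `send/reply/done/fail`
-
-## Additional Constraints
-
- `--body` and `--body-file` are mutually exclusive; a failure to read `body-file` should return `invalid_input`
- `artifact-kind` and `artifact-metadata-json` cannot be used without `--artifact`; a count mismatch should also return `invalid_input`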
-
@@ -1,27 +0,0 @@
-# Case: `update-moves-thread-to-blocked-with-payload`
-
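-## Purpose
-
-Verify that `update --status blocked` writes a blocking question message and preserves the structured payload.
-
-## Preconditions
-
- `worker-a` has successfully claimed thread `THREAD_ID` with `claim`
-
-## Input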
-
-```bash
-inbox --db TMPDIR/coord.db --json update --agent worker-a --thread THREAD_ID --status blocked --summary "Need timeout decision" --payload-json '{"question":"Should retries apply to read timeouts?"}'
-```
-
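-## Expected Output
-
- The command exits with code `0`
- `thread.status == "blocked"`
- `message.kind == "question"`
- `message.payload_json.question` stores the question text
-
-## Assertions
-
- A `blocked` update generates a question message addressed to the thread creator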
-
@@ -1,34 +0,0 @@
-# Case: `update-moves-thread-to-in-progress`
-
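-## Purpose
-
-Verify that the lease owner can move the thread to `in_progress` and that a progress message is generated.
-
-## Preconditions
-
- `worker-a` has successfully claimed thread `THREAD_ID` with `claim`
-
-## Input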
-
-```bash
-inbox --db TMPDIR/coord.db --json update --agent worker-a --thread THREAD_ID --status in_progress --summary "Implementation started" --body "Scanning current HTTP client usage."
-```
-
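-## Expected Output
-
- The command exits with code `0`
- `thread.status == "in_progress"`
- `message.kind == "progress"`
- `message.to_agent` points to the thread creator
-
-## Assertions
-
- `update` performs the status transition and the message append in a single transaction
-
-## Additional Constraints
-
- `update` accepts only `in_progress` and `blocked` for `--status`; any other value should return exit code `30` and error code `invalid_input`
- `update` depends on an active lease:
- If the thread has an active lease owned by another agent, it should return `lease_conflict`
- If the thread currently has no active lease, it should return `invalid_state`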
-
@@ -1,25 +0,0 @@
-# Case: `update-rejects-invalid-payload-json`
-
-## Purpose
-
-Verify that `update` returns a stable error contract for invalid `--payload-json` input.
-
-## Preconditions
-
- `worker-a` has successfully claimed thread `THREAD_ID` with `claim`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json update --agent worker-a --thread THREAD_ID --status blocked --summary "Need timeout decision" --payload-json not-json
-```
-
-## Expected Output
-
- Exit code is `30`
- JSON error code is `invalid_input`
-
-## Assertions
-
- The payload of a blocking question must be valid JSON
-
@@ -1,25 +0,0 @@
-# Case: `update-rejects-non-owner`
-
-## Purpose
-
-Verify that a non-owner of the lease cannot update the thread status.
-
-## Preconditions
-
- `worker-a` has successfully claimed thread `THREAD_ID` with `claim`
-
-## Input
-
-```bash
-inbox --db TMPDIR/coord.db --json update --agent worker-b --thread THREAD_ID --status in_progress --summary "Implementation started"
-```
-
-## Expected Output
-
- Exit code is `20`
- JSON error code is `lease_conflict`
-
-## Assertions
-
- `update` explicitly depends on the owner of the active lease
-
@@ -1,9 +0,0 @@
-# Inbox `wait-reply` Test Plan Index
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `wait-reply-wakes-on-answer-after-message` | [wait-reply-wakes-on-answer-after-message.md](./wait-reply-wakes-on-answer-after-message.md) | wakes for a qualifying reply after known message boundary |
-| `wait-reply-can-start-from-after-event` | [wait-reply-can-start-from-after-event.md](./wait-reply-can-start-from-after-event.md) | resumes waiting from a known event cursor |
-| `wait-reply-times-out-when-no-reply` | [wait-reply-times-out-when-no-reply.md](./wait-reply-times-out-when-no-reply.md) | returns timeout contract when no qualifying reply arrives |
@@ -1,29 +0,0 @@
-# case: wait-reply-can-start-from-after-event
-
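-### Purpose
-
-Verify that `wait-reply --after-event` can resume waiting from after a known event cursor.
-
-### Preconditions
-
- A `NEXT_EVENT_ID` has been obtained from an earlier `watch` or `wait-reply` result
- Thread `THREAD_ID` will receive further reply-type messages
-
-### Input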
-
-```bash
-inbox --db TMPDIR/coord.db --agent worker-c --json wait-reply --thread THREAD_ID --after-event NEXT_EVENT_ID --timeout-seconds 2
-inbox --db TMPDIR/coord.db --json reply --from leader --to worker-c --thread THREAD_ID --summary "Redirect to login" --body "Redirect guests to login for the MVP."
-```
-
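-### Expected Output
-
- `wait-reply` wakes when a new reply appears after the event cursor
- A new `next_event_id` is returned
-
-### Assertions
-
- `after-event` lets the wait resume from the checkpoint without consuming old replies again
- `--kinds` accepts a custom comma-separated list of message kinds that trigger a wake-up
- When `--kinds` is not given, the default wake-up kinds are `answer,control,result`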
-
@@ -1,28 +0,0 @@
-# case: wait-reply-times-out-when-no-reply
-
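-### Purpose
-
-Verify that `wait-reply` returns a stable timeout contract when no matching reply arrives within the timeout.
-
-### Preconditions
-
- A thread `THREAD_ID` exists
- No new `answer/control/result` message will arrive
-
-### Input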
-
-```bash
-inbox --db TMPDIR/coord.db --agent worker-c --json wait-reply --thread THREAD_ID --timeout-seconds 1
-```
-
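-### Expected Output
-
- Exit code is `10`
- JSON error code is `no_matching_work`
-
-### Assertions
-
- A `wait-reply` timeout is treated as "no matching reply arrived"
- `--thread` is a required flag; omitting it is an `invalid_input` usage error
- `--timeout-seconds=0` means wait indefinitely, not time out immediately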
-
@@ -1,31 +0,0 @@
-# case: wait-reply-wakes-on-answer-after-message
-
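-### Purpose
-
-Verify that `wait-reply` can start waiting from after a known message and wakes when an answer arrives.
-
-### Preconditions
-
- `worker-c` already owns a `blocked` thread `THREAD_ID`
- The `message_id` of the blocking message is `BLOCKED_MESSAGE_ID`
-
-### Input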
-
-```bash
-inbox --db TMPDIR/coord.db --agent worker-c --json wait-reply --thread THREAD_ID --after-message BLOCKED_MESSAGE_ID --timeout-seconds 2
-inbox --db TMPDIR/coord.db --json reply --from leader --to worker-c --thread THREAD_ID --summary "Redirect to login" --body "Redirect guests to login for the MVP."
-```
-
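-### Expected Output
-
- `wait-reply` exits with code `0`
- `wait-reply.data.woke == true`
- The returned `message.kind == "answer"`
-
-### Assertions
-
- `wait-reply` can reliably wait for later answers from after a known message boundary
- `--agent` is optional; it mainly serves to advance that agent's read cursor when an incoming message from another agent is matched
- `--after-message` must reference a known message in the thread; if the message does not exist, `not_found` should be returned
- When the returned message is an incoming message addressed to the waiting agent, `wait-reply` also advances that agent's read cursor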
-
@@ -1,9 +0,0 @@
-# Inbox `watch` Test Plan Index
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `watch-wakes-on-matching-thread` | [watch-wakes-on-matching-thread.md](./watch-wakes-on-matching-thread.md) | wakes when a matching post-start event arrives and returns event context |
-| `watch-respects-status-filter` | [watch-respects-status-filter.md](./watch-respects-status-filter.md) | wakes only when thread transitions into requested status |
-| `watch-times-out-with-no-activity` | [watch-times-out-with-no-activity.md](./watch-times-out-with-no-activity.md) | returns timeout contract when no matching activity arrives |
@@ -1,29 +0,0 @@
-# case: watch-respects-status-filter
-
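-### Purpose
-
-Verify that `watch --status` wakes only for subsequent events that match the requested status.
-
-### Preconditions
-
- A thread `THREAD_ID` exists that will be moved to `blocked`
- `watch` is started first with `--status blocked`
-
-### Input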
-
-```bash
-inbox --db TMPDIR/coord.db --json watch --agent worker-c --status blocked --timeout-seconds 2
-inbox --db TMPDIR/coord.db --json update --agent worker-c --thread THREAD_ID --status blocked --summary "Need policy decision"
-```
-
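-### Expected Output
-
- `watch` returns only after the thread enters `blocked`
- The returned `thread.status == "blocked"`
-
-### Assertions
-
- The `watch` status filter applies to the thread status after the event has occurred
- `--status` defaults to `pending,blocked,done,failed`; without an explicit value, `watch` does not observe only `pending`
- With an explicit `--after-event`, `watch` resumes from after that event cursor, letting the caller pick up where it left off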
-
@@ -1,27 +0,0 @@
-# case: watch-times-out-with-no-activity
-
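-### Purpose
-
-Verify that `watch` returns a stable timeout contract when no matching activity occurs within the timeout.
-
-### Preconditions
-
- No new matching event will occur
-
-### Input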
-
-```bash
-inbox --db TMPDIR/coord.db --json watch --agent worker-d --status pending --timeout-seconds 1
-```
-
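-### Expected Output
-
- Exit code is `10`
- JSON error code is `no_matching_work`
-
-### Assertions
-
- A `watch` timeout is classified as "no matching work", not an internal error
- `--timeout-seconds 0` means wait indefinitely, not time out immediately
- Without `--after-event`, `watch` starts waiting from the current moment onward and does not replay existing events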
-
@@ -1,30 +0,0 @@
-# case: watch-wakes-on-matching-thread
-
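-### Purpose
-
-Verify that `watch` wakes when a new matching thread arrives and returns the thread, message, and event information.
-
-### Preconditions
-
- `worker-d` currently has no matching `pending` thread
- `watch` is started before `send`
-
-### Input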
-
-```bash
-inbox --db TMPDIR/coord.db --json watch --agent worker-d --status pending --timeout-seconds 2
-inbox --db TMPDIR/coord.db --json send --from leader --to worker-d --subject "Build admin editor" --summary "Create the first editor screen"
-```
-
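-### Expected Output
-
- `watch` exits with code `0`
- `watch.data.woke == true`
- The response contains `thread`, `message`, and `event`
-
-### Assertions
-
- The `watch` wake-up result does more than signal a wake-up; it also provides the context of the specific event that triggered it
- Without an explicit `--agent`, `watch` falls back to the root-level `--agent`; if neither is provided, `watch` becomes a global observer that does not filter by `assigned_to`
- On a successful wake-up, the returned `next_event_id` should equal the triggering `event.event_id`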
-
@@ -1,276 +0,0 @@
-# Inbox Workflow Test Plan
-
-## Scope
-
-This document tracks cross-command scenarios where the main value is the interaction between multiple `inbox` subcommands.
-
-All examples assume:
-
- isolated temp database
- `inbox --db TMPDIR/coord.db --json init` already executed
- assertions follow the shared rules in [../_shared/README.md](../_shared/README.md)
-
-## case: thread-lifecycle-happy-path
-
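-### Purpose
-
-Verify that the main path `send -> fetch -> claim -> update(in_progress) -> update(blocked) -> reply -> done -> show` works end to end and that the thread and message history stay consistent.
-
-### Preconditions
-
- An empty database has been initialized with `init`
- The sender is `leader`
- The assignee is `worker-a`
-
-### Input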
-
-```bash
-inbox --db TMPDIR/coord.db --json send --from leader --to worker-a --subject "Implement feature X" --summary "Add retry policy" --body "Implement retry handling for the HTTP client." --run run_blog_001 --task T1
-inbox --db TMPDIR/coord.db --json fetch --agent worker-a --status pending
-inbox --db TMPDIR/coord.db --json claim --agent worker-a --thread THREAD_ID --lease-seconds 300
-inbox --db TMPDIR/coord.db --json update --agent worker-a --thread THREAD_ID --status in_progress --summary "Implementation started" --body "Scanning current HTTP client usage."
-inbox --db TMPDIR/coord.db --json update --agent worker-a --thread THREAD_ID --status blocked --summary "Need timeout decision" --payload-json '{"question":"Should retries apply to read timeouts?"}'
-inbox --db TMPDIR/coord.db --json reply --from leader --to worker-a --thread THREAD_ID --summary "Retry read timeouts" --body "Yes, include read timeouts in the retry policy."
-inbox --db TMPDIR/coord.db --json done --agent worker-a --thread THREAD_ID --summary "Retry policy implemented" --body "The HTTP client now retries the selected transient failures."
-inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-```
-
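-### Expected Output
-
- `send` returns a newly created thread with status `pending`
- `fetch` returns exactly one matching thread
- After `claim`, the thread status is `claimed`
- After the first `update`, the thread status is `in_progress`
- After the second `update`, the thread status is `blocked`
- `reply` returns a message with `kind=answer`
- After `done`, the thread status is `done`
- `show` returns thread status `done` and includes the full message history
-
-### Assertions
-
- Every command in the chain exits with code `0`
- `show.data.thread.status == "done"`
- `show.data.messages` has length `6`
- Status transitions in the history follow execution order, with no lost messages and no status rollback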
-
-## case: blocked-question-reply-resume-to-done
-
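-### Purpose
-
-Verify that a blocked thread can continue after receiving an answer and eventually reach the done state.
-
-### Preconditions
-
- A thread sent by `leader` to `worker-c` already exists
- `worker-c` has already claimed it successfully with `claim`
-
-### Input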
-
-```bash
-inbox --db TMPDIR/coord.db --json update --agent worker-c --thread THREAD_ID --status blocked --summary "Need policy decision" --body "Should guest users be redirected to login or shown a 403 page?"
-inbox --db TMPDIR/coord.db --agent worker-c --json wait-reply --thread THREAD_ID --after-message BLOCKED_MESSAGE_ID --timeout-seconds 2
-inbox --db TMPDIR/coord.db --json reply --from leader --to worker-c --thread THREAD_ID --summary "Redirect to login" --body "Redirect guests to login for the MVP."
-inbox --db TMPDIR/coord.db --json done --agent worker-c --thread THREAD_ID --summary "Policy applied" --body "The flow now redirects guests to login."
-```
-
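-### Expected Output
-
- `update` moves the thread to `blocked`
- `wait-reply` wakes after the answer appears
- The wake-up result contains the answer message
- `done` successfully moves the thread to `done`
-
-### Assertions
-
- `wait-reply.data.woke == true`
- `wait-reply.data.message.kind == "answer"`
- Finally, `done.data.thread.status == "done"`
- This case emphasizes that a blocked thread can be resumed; it does not merely verify `reply` itself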
-
-## case: fail-lifecycle-from-claim-to-terminal
-
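-### Purpose
-
-Verify that a claimed thread can move directly to the failed terminal state and that `show` reads the terminal state consistently.
-
-### Preconditions
-
- An empty database has been initialized with `init`
- `leader` has sent a task to `worker-b`
- `worker-b` has claimed the thread with `claim`
-
-### Input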
-
-```bash
-inbox --db TMPDIR/coord.db --json fail --agent worker-b --thread THREAD_ID --summary "Migration failed" --body "The migration cannot proceed because the prior schema is inconsistent."
-inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-```
-
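-### Expected Output
-
- `fail` returns thread status `failed`
- `show` returns the same terminal state
-
-### Assertions
-
- `fail.data.thread.status == "failed"`
- `show.data.thread.status == "failed"`
- The failure message stays in the thread history and can be read during later troubleshooting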
-
-## case: cancel-lifecycle-after-worker-claim
-
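-### Purpose
-
-Verify that the initiator can still cancel a task after the assignee has claimed it, moving the thread to the `cancelled` terminal state.
-
-### Preconditions
-
- `leader` has sent a task to `worker-c`
- `worker-c` has claimed it successfully with `claim`
-
-### Input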
-
-```bash
-inbox --db TMPDIR/coord.db --json cancel --agent leader --thread THREAD_ID --reason "Task superseded by a larger refactor"
-```
-
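-### Expected Output
-
- `cancel` succeeds
- The returned thread status is `cancelled`
- The returned message records the cancellation reason
-
-### Assertions
-
- `cancel.data.thread.status == "cancelled"`
- Cancellation is a terminal-state transition and does not require the assignee to release the lease first
- The reason field can be consumed by a later `show` or by audit scenarios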
-
-## case: watch-wakes-then-fetch-sees-new-thread
-
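-### Purpose
-
-Verify that the wait semantics of `watch` are consistent with the visibility of `fetch --unread`: when a new thread arrives, the assignee is woken and can then still fetch the unread task.
-
-### Preconditions
-
- `worker-d` has no matching `pending` thread yet
- `watch` is started before `send`
-
-### Input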
-
-```bash
-inbox --db TMPDIR/coord.db --json watch --agent worker-d --status pending --timeout-seconds 2
-inbox --db TMPDIR/coord.db --json send --from leader --to worker-d --subject "Build admin editor" --summary "Create the first editor screen" --body-file TMPDIR/task.md --artifact TMPDIR/task.md --artifact-kind brief --artifact-metadata-json '{"label":"task-brief"}' --run run_blog_004 --task T4
-inbox --db TMPDIR/coord.db --json fetch --agent worker-d --status pending --unread
-```
-
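-### Expected Output
-
- `watch` wakes because a new thread arrived
- The `thread_id` in the wake-up result matches the value returned by `send`
- The subsequent `fetch --unread` still sees the `pending` thread
-
-### Assertions
-
- `watch.data.woke == true`
- `watch.data.thread.thread_id == send.data.thread.thread_id`
- `fetch.data.threads` has length `1`
- A `watch` wake-up must not prematurely consume the thread's unread visibility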
-
-## case: artifact-visible-through-send-and-show
-
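-### Purpose
-
-Verify that the body-file and artifact information written by `send` can be read back in full by a later `show`.
-
-### Preconditions
-
- `TMPDIR/task.md` already exists and contains the test task body
- An empty database has been initialized with `init`
-
-### Input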
-
-```bash
-inbox --db TMPDIR/coord.db --json send --from leader --to worker-d --subject "Build admin editor" --summary "Create the first editor screen" --body-file TMPDIR/task.md --artifact TMPDIR/task.md --artifact-kind brief --artifact-metadata-json '{"label":"task-brief"}' --run run_blog_004 --task T4
-inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-```
-
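-### Expected Output
-
- `send` successfully creates a thread with one artifact attached
- The first message in `show` contains the body read from the file and the artifact list
-
-### Assertions
-
- The first message's `body` equals the contents of `TMPDIR/task.md`
- The first message's `artifacts` has length `1`
- The first artifact's `path` equals `TMPDIR/task.md`
- The first artifact's `kind` equals `brief`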
-
-## case: unread-clears-after-mark-read-and-reappears-on-new-message
-
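-### Purpose
-
-Verify the most user-visible behavior of the read cursor: unread tasks can be explicitly cleared, and they reappear when a new message arrives on the same thread.
-
-### Preconditions
-
- `leader` has sent a `pending` thread to `worker-e`
-
-### Input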
-
-```bash
-inbox --db TMPDIR/coord.db --json fetch --agent worker-e --status pending --unread
-inbox --db TMPDIR/coord.db --agent worker-e --json show --thread THREAD_ID --mark-read
-inbox --db TMPDIR/coord.db --json fetch --agent worker-e --status pending --unread
-inbox --db TMPDIR/coord.db --json send --from leader --to worker-e --thread THREAD_ID --summary "Use sentence case" --body "Keep the nav labels in sentence case."
-inbox --db TMPDIR/coord.db --json fetch --agent worker-e --status pending --unread
-```
-
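-### Expected Output
-
- the first `fetch --unread` returns the thread
- `show --mark-read` advances `worker-e`'s read cursor
- the second `fetch --unread` has no matching results
- after the new message is appended, the third `fetch --unread` returns the thread again
-
-### Assertions
-
- the first `fetch` returns 1 thread
- the second `fetch` exits with code `10` and error code `no_matching_work`
- after the message is appended, the third `fetch` again returns 1 thread
- unread state is computed per agent, not as a thread-level boolean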
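
The exit-code assertion above could be checked from a script with a small helper. This is a minimal sketch, not part of the `inbox` CLI: `expect_exit` is a hypothetical helper name, and the only behavior it relies on is the shell's own exit status.

```bash
# expect_exit WANT CMD [ARGS...]
# Hypothetical helper: runs CMD and succeeds only if its exit status equals WANT.
expect_exit() {
  local want="$1"
  shift
  local got=0
  # Discard command output; only the exit status matters here.
  "$@" >/dev/null 2>&1 || got=$?
  if [ "$got" -ne "$want" ]; then
    echo "expected exit $want, got $got: $*" >&2
    return 1
  fi
}
```

With the placeholders replaced by concrete values, the second `fetch` in this case would be wrapped as `expect_exit 10 inbox --db TMPDIR/coord.db --json fetch --agent worker-e --status pending --unread`. Checking the `no_matching_work` error code would need a separate look at the JSON output, which this sketch does not attempt.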
-
-## case: wait-reply-clears-blocked-unread-for-agent
-
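-### Purpose
-
-Verify that once a consumer waiting for a reply receives it, the unread state of its blocked thread is consumed, avoiding the false impression that "the reply was already handled but the list still shows it as unread".
-
-### Preconditions
-
- `worker-c` already owns a `blocked` thread
- the `message_id` of the thread's blocking message is known
- `worker-c` uses `wait-reply` to wait for the reply
-
-### Input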
-
-```bash
-inbox --db TMPDIR/coord.db --agent worker-c --json wait-reply --thread THREAD_ID --after-message BLOCKED_MESSAGE_ID --timeout-seconds 2
-inbox --db TMPDIR/coord.db --json reply --from leader --to worker-c --thread THREAD_ID --summary "Redirect to login" --body "Redirect guests to login for the MVP."
-inbox --db TMPDIR/coord.db --agent worker-c --json fetch --status blocked --unread
-```
-
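-### Expected Output
-
- `wait-reply` wakes after the reply
- the wake result carries the `answer` message
- the subsequent `fetch --status blocked --unread` no longer returns the thread
-
-### Assertions
-
- `wait-reply.data.woke == true`
- `wait-reply.data.message.kind == "answer"`
- the subsequent `fetch` exits with code `10`
- for a waiting agent, consuming the reply and clearing unread state belong to the same user-contract chain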
@@ -1,210 +0,0 @@
-# Orch Skill Test Plan
-
-## Purpose
-
-This directory tracks human-readable test plans for the `skills/orch/` Codex skill bundle.
-
-These documents are not command-contract specs for the `orch` CLI itself.
-That coverage already lives under [../orch/](../orch/).
-
-This directory exists to describe a different test surface:
-
- whether a leader agent can actually use the packaged `orch` skill
- whether the bundled `./assets/orch` CLI works inside real skill-guided conversations
- whether leader-side orchestration driven by the skill reaches the expected run, task, thread, and worktree state
-
-## Test Model
-
- `README.md` is the index for this directory
- each skill test case lives in its own Markdown file
- use stable case slugs in filenames
-
-## Shared Execution Contract
-
-Use these defaults unless a case file explicitly overrides them:
-
- run the scenario with real subagents, not simulated transcripts
- inject `skills/orch/` into the leader agent
- inject `skills/inbox/` into worker agents whenever worker-side thread progress is required
- initialize the shared SQLite DB before launching role agents with `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
- require the leader to coordinate through the bundled `./assets/orch` CLI from the skill instead of ordinary chat
- require workers to coordinate through the bundled `./assets/inbox` CLI from their skill instead of ordinary chat
- launch-bridge cases may use a leader-only topology where the leader spawns worker subagents after dispatch instead of relying on the test-runner to launch separate worker roles
- validate final run and thread state independently from the main thread after the agents stop
- create any required Git repo fixture before launching agents for worktree cases
-
-## How An Agent Runs These Cases
-
-Use one test-runner agent to execute each case.
-
-The test-runner agent is responsible for:
-
- reading this `README.md` first, then one specific case file
- creating an isolated temporary directory and DB path for that run
- initializing the DB once through the bundled inbox CLI before launching role agents
- creating any required temporary Git repo fixture before launching role agents
- launching the role agents described in `Agent Topology`
- injecting `skills/orch/` into the leader and `skills/inbox/` into workers
- passing each role agent the prompt text from the case file with concrete values substituted for `ORCH_SKILL_PATH`, `INBOX_SKILL_PATH`, `TMPDIR`, `RUN_ID`, `THREAD_ID`, and `WORKTREE_PATH` when needed
- coordinating launch order or parallel start according to the case file
- collecting agent final summaries as evidence
- resolving final run ids, thread ids, and worktree paths from agent outputs
- running the `Validation Commands` from the main thread after the role agents stop
- comparing the observed results against `Expected Outcomes` and `Assertions`
- returning a final pass/fail judgment with concrete evidence
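
The placeholder substitution step in the list above could be sketched as a small filter. This is an illustration under assumptions: `render_prompt` is a hypothetical name, only three of the listed placeholders are handled, and the substituted values must not contain `|`, `&`, backslashes, or the placeholder tokens themselves.

```bash
# render_prompt ORCH_PATH INBOX_PATH TMP_PATH < template
# Hypothetical helper: reads a prompt template on stdin and replaces the
# placeholder tokens with the concrete values given as arguments.
render_prompt() {
  local orch_path="$1"
  local inbox_path="$2"
  local tmp_path="$3"
  # `|` is the sed delimiter so that path separators need no escaping.
  sed \
    -e "s|ORCH_SKILL_PATH|$orch_path|g" \
    -e "s|INBOX_SKILL_PATH|$inbox_path|g" \
    -e "s|TMPDIR|$tmp_path|g"
}
```

A runner would pipe each case's prompt text through `render_prompt skills/orch skills/inbox "$workdir"` before handing it to a role agent, where `workdir` stands for the isolated temporary directory created for that run. `RUN_ID`, `THREAD_ID`, and `WORKTREE_PATH` could be added as further `-e` expressions in the same way.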
-
-The role agents are responsible for:
-
- acting only within the role assigned in the case file
- using the injected skill bundle rather than ad hoc repository discovery
- coordinating through the bundled CLI and shared DB
- reporting concrete run ids, thread ids, worktree paths, and key command outcomes back to the test-runner agent
-
-For launch-bridge cases:
-
- the leader may be the only top-level role agent
- that leader is responsible for spawning any worker subagents itself after `dispatch`
- spawned worker subagents should use the generated worker brief plus `skills/inbox/`, not ordinary chat
-
-The test-runner agent should treat a case as passed only when:
-
- all role agents reach a final state without violating the case contract
- the independent validation commands succeed
- the final orch and inbox state matches the assertions in the case file
-
-The test-runner agent should treat a case as failed when:
-
- any required agent times out or stalls
- a required orch or inbox action is skipped
- the leader falls back to ordinary chat for orchestration decisions that should go through `orch`
- workers fall back to ordinary chat for progress that should go through `inbox`
- the final run, task, thread, or worktree state conflicts with the documented assertions
-
-The test-runner agent should report results in this shape:
-
- `case`
- `db_path`
- `run_id`
- `thread_ids`
- `worktree_paths`
- `result`: `pass` or `fail`
- `agent_summaries`
- `validation_evidence`
- `assertion_checklist`
- `notes`
-
-## Default Timeouts
-
-Use these defaults unless a case file explicitly overrides them:
-
- per-agent timeout: `4m`
- overall scenario timeout: `6m`
- async wait margin for the main thread: `45s`
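
One way to enforce these defaults from a script is to wrap each long-running command in a deadline. The sketch below assumes GNU coreutils `timeout` is installed (it is not available by default on macOS); `run_with_deadline` is a hypothetical wrapper name, and how role agents are actually launched is outside what this document specifies.

```bash
# run_with_deadline DURATION CMD [ARGS...]
# Hypothetical wrapper: runs CMD under GNU `timeout` and reports a stall
# separately from an ordinary command failure.
run_with_deadline() {
  local limit="$1"
  shift
  local status=0
  timeout "$limit" "$@" || status=$?
  # GNU timeout exits with 124 when the deadline expired.
  if [ "$status" -eq 124 ]; then
    echo "timed out after $limit: $*" >&2
  fi
  return "$status"
}
```

A per-agent limit would then be expressed as `run_with_deadline 4m` followed by the launch command, and the overall scenario limit as `run_with_deadline 6m` around the whole case.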
-
-## Default Failure Conditions
-
-Treat the test as failed if any of the following happens:
-
- any required agent does not reach a final state before timeout
- any required orch or inbox command returns a non-success result unless the case expects that failure
- the final `orch status` output does not match the expected run or task state
- the final `inbox show` output does not match the expected thread or message history
- a required worktree is missing too early or still present after cleanup in a cleanup case
- the agents fall back to ordinary chat for critical coordination instead of the bundled CLIs
-
-## Evidence Capture
-
-Collect at least the following artifacts for every run:
-
- agent final summaries
- final `orch status --run RUN_ID --json` output
- final `inbox show --thread THREAD_ID --json` output for every relevant thread
- any `blocked`, `wait`, `retry`, `reassign`, or `cleanup` output relevant to the case
- the temporary DB path, resolved run id, resolved thread ids, and any worktree paths
-
-## Cleanup Policy
-
-Use these defaults unless a case file explicitly overrides them:
-
- keep the temporary DB, repo fixture, and working directory on failure for debugging
- clean up the temporary working directory on success only if the caller does not need replay artifacts
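
The policy above can be written as a short teardown step. This is a sketch under assumptions: `finish_case` and its optional `keep` argument are hypothetical names introduced here, not part of either CLI.

```bash
# finish_case RESULT WORKDIR [keep]
# Hypothetical teardown: removes WORKDIR only when RESULT is "pass" and the
# caller did not ask to keep replay artifacts; otherwise leaves it in place.
finish_case() {
  local result="$1"
  local workdir="$2"
  local keep="${3:-}"
  # Refuse to act on an empty path so `rm -rf` never runs without a target.
  [ -n "$workdir" ] || return 2
  if [ "$result" = "pass" ] && [ "$keep" != "keep" ]; then
    rm -rf -- "$workdir"
  else
    echo "kept $workdir for replay or debugging" >&2
  fi
}
```

Calling `finish_case fail "$workdir"` leaves the DB, repo fixture, and working directory untouched, which matches the keep-on-failure default.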
-
-## Direct CLI Replay
-
-The repository also includes a reusable direct replay runner at `scripts/run_orch_skill_forward_tests.sh`.
-
-This runner executes the bundled `skills/orch/assets/orch` and `skills/inbox/assets/inbox` binaries against temporary SQLite DBs and Git fixtures without spawning Codex role agents.
-
-Use it to validate packaged CLI behavior and record concrete evidence quickly, but do not treat it as a full replacement for the real subagent-forward model described above.
-
-All eight case files in this directory now include recorded example runs captured through that direct replay path on `2026-03-19`.
-
-## Real Subagent Forward Runs
-
-The original five cases in this directory were also executed with real spawned role agents on `2026-03-19`.
-
-That run used injected project-local `skills/orch/` and `skills/inbox/` bundles with a narrow-context fallback (`fork_context: false`) after an earlier wider-context attempt proved unreliable for this repo.
-
-The successful evidence root for those runs was `/tmp/orch-skill-subagents.J1XWgs`.
-
-Some longer cases used staged leader progression while keeping the same leader agent active across phases so the run still exercised real agent-driven `orch` control flow instead of a main-thread direct replay.
-
-The three gap-fill cases added later on `2026-03-19` currently have direct replay evidence only and have not yet been rerun through the real subagent-forward path.
-
-## Per-Case Template
-
-Each case file should use this structure:
-
- `Test Type`
- `Purpose`
- `Preconditions`
- `Agent Topology`
- `Inputs`
- `Execution Parameters`
- `Execution Steps`
- `Validation Commands`
- `Expected Outcomes`
- `Assertions`
- `Cleanup`
- `Recorded Example Run` when a real run has already been captured
-
-## Case Files
-
-| Case Slug | File | Coverage Note |
-| --- | --- | --- |
-| `leader-run-dispatch-reconcile-through-bundled-cli` | [leader-run-dispatch-reconcile-through-bundled-cli.md](./leader-run-dispatch-reconcile-through-bundled-cli.md) | validates that a leader can drive a complete `run -> task -> dispatch -> reconcile -> status` happy path through the packaged orch skill |
-| `leader-blocked-answer-resume-through-bundled-cli` | [leader-blocked-answer-resume-through-bundled-cli.md](./leader-blocked-answer-resume-through-bundled-cli.md) | validates that a leader can observe a blocked task, answer it through `orch`, and reach final completion with a real worker |
-| `strict-worktree-dispatch-to-cleanup-through-bundled-cli` | [strict-worktree-dispatch-to-cleanup-through-bundled-cli.md](./strict-worktree-dispatch-to-cleanup-through-bundled-cli.md) | validates that the skill can drive `execution-mode code` worktree allocation, reconcile completion, and cleanup through the bundled orch CLI |
-| `leader-dispatches-dependent-task-after-prerequisite-through-bundled-cli` | [leader-dispatches-dependent-task-after-prerequisite-through-bundled-cli.md](./leader-dispatches-dependent-task-after-prerequisite-through-bundled-cli.md) | validates that a leader can use `dep add` and `ready` to hold back dependent work until a prerequisite completes, then dispatch the newly ready task |
-| `leader-cancels-active-task-through-bundled-cli` | [leader-cancels-active-task-through-bundled-cli.md](./leader-cancels-active-task-through-bundled-cli.md) | validates that a leader can cancel an already active task through the packaged orch skill without cancelling unrelated ready work |
-| `leader-answers-blocked-task-with-payload-json-through-bundled-cli` | [leader-answers-blocked-task-with-payload-json-through-bundled-cli.md](./leader-answers-blocked-task-with-payload-json-through-bundled-cli.md) | validates that a leader can answer a blocked task with structured payload data only and still drive the run to completion |
-| `leader-retries-failed-task-through-bundled-cli` | [leader-retries-failed-task-through-bundled-cli.md](./leader-retries-failed-task-through-bundled-cli.md) | validates that a leader can reconcile a failed attempt and create a successful retry through the packaged orch skill |
-| `leader-reassigns-blocked-task-through-bundled-cli` | [leader-reassigns-blocked-task-through-bundled-cli.md](./leader-reassigns-blocked-task-through-bundled-cli.md) | validates that a leader can reassign a blocked task from one worker to another and close the run through the packaged orch skill |
-| `leader-dispatches-and-launches-worker-through-codex-bridge` | [leader-dispatches-and-launches-worker-through-codex-bridge.md](./leader-dispatches-and-launches-worker-through-codex-bridge.md) | validates that a leader can dispatch a task, render a standardized worker brief, and launch a worker subagent from the same Codex thread |
-| `strict-worktree-dispatch-launches-worker-through-codex-bridge` | [strict-worktree-dispatch-launches-worker-through-codex-bridge.md](./strict-worktree-dispatch-launches-worker-through-codex-bridge.md) | validates that a leader can launch a code-writing worker subagent from saved `execution-mode code` dispatch metadata while preserving the assigned worktree contract |
-
-## Scope
-
-In scope:
-
- explicit `$orch` skill invocation
- bundled `./assets/orch` CLI usage
- leader-side run, task, dependency, dispatch, reconcile, answer, retry, reassign, wait, status, and cleanup flows
- interaction between a leader using `skills/orch/` and workers using `skills/inbox/`
- leader-side launch-bridge workflows where the leader spawns worker subagents after `dispatch`
- worktree-backed dispatch and cleanup validation
- end-to-end run state and thread history validation
-
-Out of scope:
-
- per-command flag and JSON contract coverage for `orch`
- worker-only skill behavior that already belongs under [../inbox-skill/](../inbox-skill/)
- the separate `council-review` skill package
- implicit skill triggering without `$orch`
- changing the core `orch` CLI so it launches workers by itself
-
-## Relationship To Other Test Docs
-
- [../orch/](../orch/) covers CLI command behavior
- [../inbox-skill/](../inbox-skill/) covers worker-side skill-guided behavior on top of inbox
- this directory covers leader-side skill-guided behavior on top of `orch`
@@ -1,107 +0,0 @@
-# Case: `leader-answers-blocked-task-with-payload-json-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a structured-answer skill validation.
-
-The goal is to verify that a leader using the packaged `orch` skill can answer a blocked task with pure `--payload-json`, allowing the worker to resume without relying on a freeform answer body.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can use `wait`, `blocked`, `answer --payload-json`, `reconcile`, and `status` through the bundled orch skill
- a worker can post a blocked question through the bundled inbox skill
- the answer reaches the active thread as structured payload data
- the worker resumes after reading that payload and completes the task
- the final run reaches `done`
-
-## Preconditions
-
- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
- `worker-a`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_payload_answer_001, 2) add and dispatch one task T1 to worker-a with --execution-mode analysis, 3) wait until the task becomes blocked, 4) inspect blocked tasks, 5) answer the blocked question using payload-json only with decision=stdout, source=leader, and format=structured, 6) wait until the task completes, 7) reconcile and inspect final status, 8) stop after reporting RUN_ID and THREAD_ID. Do not use ordinary chat to coordinate with the worker.
-```
-
-### Worker Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the assigned task, 2) send a blocked update asking for a structured logging decision, 3) wait for a reply, 4) confirm the reply payload tells you to use stdout, 5) finish the task with done, 6) stop after reporting the THREAD_ID you handled. Do not use ordinary chat to coordinate with the leader.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/orch/` into `leader`
-3. Inject `skills/inbox/` into `worker-a`
-4. Point both agents at the same database path `TMPDIR/coord.db`
-5. Launch `leader` and `worker-a` in parallel
-6. Wait for both agents to finish
-7. Resolve `THREAD_ID` from the agent outputs
-8. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_payload_answer_001
-INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-```
-
-## Expected Outcomes
-
- the leader successfully observes a blocked event and inspects the blocked queue
- the leader successfully emits one payload-only answer through `orch`
- `worker-a` receives that answer through inbox history and sees `payload_json.decision == "stdout"`
- `worker-a` completes the task after the structured answer arrives
- the final run state is `done`
-
-## Assertions
-
- `status.data.run.status == "done"`
- `status.data.tasks[0].status == "done"`
- `show.data.messages[*].kind` includes `question`, `answer`, and `result`
- one `question` message contains `payload_json.question == "Use stdout or stderr for structured logs?"`
- one `answer` message contains `payload_json.decision == "stdout"`
- one `answer` message contains `payload_json.source == "leader"`
- one `answer` message contains `payload_json.format == "structured"`
- the final thread status is `done`
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
-
-## Recorded Example Run
-
- recorded on: `2026-03-19`
- execution mode: `direct_cli_replay` via `scripts/run_orch_skill_forward_tests.sh`
- result: `pass`
- observed run id: `run_blog_skill_payload_answer_001`
- observed thread id: `thr_735bde0f91794174b2b85fbe89e80581`
- evidence summary:
- `orch wait --for task_blocked` woke after the worker question, and `orch blocked` listed task `T1` as the active blocked task
- `orch answer --payload-json '{"decision":"stdout","source":"leader","format":"structured"}'` appended an `answer` message with those exact payload fields and an empty body
- `inbox wait-reply` woke on that structured answer and exposed `payload_json.decision == "stdout"`
- final `orch status --run run_blog_skill_payload_answer_001 --json` returned `run.status == "done"` and `tasks[0].status == "done"`
- final `inbox show --thread thr_735bde0f91794174b2b85fbe89e80581 --json` contained the blocked `question`, the structured `answer`, and the terminal `result`
- note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents
@@ -1,116 +0,0 @@
-# Case: `leader-blocked-answer-resume-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a blocked-question resolution skill validation.
-
-The goal is to verify that a leader using the packaged `orch` skill can observe a blocked task, answer it through `orch`, and reach final completion with a real worker using the packaged inbox skill.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can use `orch wait`, `blocked`, `answer`, `reconcile`, and `status` through the bundled skill CLI
- a worker can ask a blocked question through the bundled inbox skill
- the answer reaches the active attempt thread
- the worker resumes after the answer and completes the task
- the final run reaches `done`
-
-## Preconditions
-
- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
- `worker-a`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_002, 2) add and dispatch one task T1 to worker-a with --execution-mode analysis, 3) wait until the task becomes blocked, 4) inspect blocked tasks, 5) answer the blocked question with the decision "Use stdout for MVP.", 6) wait until the task completes, 7) reconcile and inspect final status, 8) stop after reporting RUN_ID and THREAD_ID. Do not use ordinary chat to coordinate with the worker.
-```
-
-### Worker Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the assigned task, 2) send one in_progress update, 3) send a blocked update asking "Should logging go to stdout or stderr?", 4) wait for a reply, 5) finish the task with done after you receive the leader decision, 6) stop after reporting the THREAD_ID you handled. Do not use ordinary chat to coordinate with the leader.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/orch/` into `leader`
-3. Inject `skills/inbox/` into `worker-a`
-4. Point both agents at the same database path `TMPDIR/coord.db`
-5. Launch `leader` and `worker-a` in parallel
-6. Wait for both agents to finish
-7. Resolve `THREAD_ID` from the agent outputs
-8. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_002
-INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-```
-
-## Expected Outcomes
-
- `leader` successfully observes a blocked event through `orch`
- `leader` successfully inspects the blocked queue and emits one `answer`
- `worker-a` receives that answer through inbox history and completes the task
- the final run state is `done`
-
-## Assertions
-
- `status.data.run.status == "done"`
- `status.data.tasks[0].status == "done"`
- `show.data.messages[*].kind` includes `question`, `answer`, and `result`
- one `question` message contains `payload_json.question == "Should logging go to stdout or stderr?"`
- one `answer` message contains body `Use stdout for MVP.`
- the final thread status is `done`
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
-
-## Recorded Example Run
-
- recorded on: `2026-03-19`
- execution mode: `direct_cli_replay` via `scripts/run_orch_skill_forward_tests.sh`
- result: `pass`
- observed run id: `run_blog_skill_002`
- observed thread id: `thr_42ce634f273745e9b95badc14ce52708`
- evidence summary:
- `orch wait --for task_blocked` woke on the worker question, and `inbox wait-reply` later woke on the leader answer
- final `orch status --run run_blog_skill_002 --json` returned `run.status == "done"` and `tasks[0].status == "done"`
- final `inbox show --thread thr_42ce634f273745e9b95badc14ce52708 --json` contained `question`, `answer`, and `result` messages
- the recorded `question` payload was `Should logging go to stdout or stderr?`, and the recorded `answer` body was `Use stdout for MVP.`
- note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents
-
-## Recorded Real Forward Run
-
- recorded on: `2026-03-19`
- execution mode: `real_subagent_forward_test`
- result: `pass`
- evidence root: `/tmp/orch-skill-subagents.J1XWgs/leader-blocked-answer-resume-through-bundled-cli`
- observed run id: `run_blog_skill_002`
- observed thread id: `thr_fd11536a0b2f4c668f6e78c38090816e`
- evidence summary:
- a real leader agent using `skills/orch/` completed `wait --for task_blocked`, `blocked`, `answer`, `wait --for task_done`, `reconcile`, and `status`
- a real worker agent using `skills/inbox/` completed `claim`, `update --status in_progress`, `update --status blocked`, `wait-reply`, resume `update`, and `done`
- main-thread validation confirmed `run.status == "done"`, `task.status == "done"`, the blocked question payload `Should logging go to stdout or stderr?`, and the answer body `Use stdout for MVP.`
@@ -1,105 +0,0 @@
-# Case: `leader-cancels-active-task-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a direct task-cancel skill validation.
-
-The goal is to verify that a leader using the packaged `orch` skill can cancel an already active task attempt without cancelling unrelated ready work in the same run.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can use `dispatch`, `cancel`, `ready`, and `status` through the bundled orch skill
- `worker-a` can claim the original thread and report active progress through the bundled inbox skill
- the leader can cancel that active task through `orch cancel --task`
- the original thread reaches `cancelled`
- another task in the same run remains actionable instead of being implicitly cancelled
-
-## Preconditions
-
- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
- `worker-a`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_cancel_001, 2) add task T1 for worker-a and a second task T2 that should remain untouched, 3) dispatch T1 with --execution-mode analysis, 4) wait until worker-a has claimed it or marked it in progress, 5) cancel T1 with a clear reason through orch, 6) inspect ready work and final run status, 7) stop after reporting THREAD_ID_1. Do not use ordinary chat to coordinate with the worker.
-```
-
-### Worker Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the assigned thread, 2) send one in_progress update, 3) stop after reporting THREAD_ID_1 and that the task became active. Do not use ordinary chat to coordinate with the leader.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/orch/` into `leader`
-3. Inject `skills/inbox/` into `worker-a`
-4. Point both agents at the same database path `TMPDIR/coord.db`
-5. Launch `leader` and `worker-a` in parallel
-6. Wait for both agents to finish
-7. Resolve `THREAD_ID_1` from the agent outputs
-8. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_cancel_001
-ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json ready --run run_blog_skill_cancel_001
-INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_1
-```
-
-## Expected Outcomes
-
- `worker-a` successfully claims the original thread and reports `in_progress`
- the leader successfully cancels `T1` through `orch cancel --task`
- the original thread reaches `cancelled`
- the untouched task `T2` remains available in the ready queue
- the run remains open rather than collapsing into a fully cancelled run
-
-## Assertions
-
- `status.data.tasks` contains `T1` with status `cancelled`
- `status.data.tasks` contains `T2` with status `ready`
- `status.data.run.status == "ready"`
- `ready.data.tasks` contains only `T2`
- `show.data.thread.status == "cancelled"`
- the thread history preserves the worker `progress` message before the cancel
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
-
-## Recorded Example Run
-
- recorded on: `2026-03-19`
- execution mode: `direct_cli_replay` via `scripts/run_orch_skill_forward_tests.sh`
- result: `pass`
- observed run id: `run_blog_skill_cancel_001`
- observed thread id: `thr_175e00bca76549ea8529cb4c92d99fd4`
- evidence summary:
- final `orch status --run run_blog_skill_cancel_001 --json` returned `run.status == "ready"` with task counts `cancelled: 1` and `ready: 1`
- that same `status` output showed `T1.status == "cancelled"` while `T2.status == "ready"`
- final `orch ready --run run_blog_skill_cancel_001 --json` returned only `T2`, confirming the untouched task remained dispatchable
- final `inbox show --thread thr_175e00bca76549ea8529cb4c92d99fd4 --json` returned `thread.status == "cancelled"` and preserved the worker `progress` message before the cancel
- note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents
@@ -1,97 +0,0 @@
-# Case: `leader-dispatches-and-launches-worker-through-codex-bridge`
-
-## Test Type
-
-This is a `forward-test` and a leader-side launch-bridge validation.
-
-The goal is to verify that a leader using the packaged `orch` skill can dispatch work, render a standardized worker brief through the skill assets, and launch a worker subagent from the same Codex thread without hand-writing the inbox handoff.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can use the bundled `./assets/orch` CLI through the skill
- the leader can save `dispatch --json` output and turn it into a stable worker brief through `./assets/orch-worker-brief`
- the leader can spawn a worker subagent that uses `skills/inbox/` instead of ordinary chat
- the launched worker claims the dispatched thread and completes it
- the final orch run state and inbox thread state both reach `done`
-
-## Preconditions
-
- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- the helper asset exists at `ORCH_SKILL_PATH/assets/orch-worker-brief`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
-
-The leader is responsible for spawning the worker subagent after dispatch.
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_launch_001, 2) add exactly one task T1 assigned to worker-a, 3) dispatch it with --execution-mode analysis and save --json to TMPDIR/dispatch.json, 4) render a worker brief with ORCH_SKILL_PATH/assets/orch-worker-brief into TMPDIR/worker-brief.txt, 5) spawn one worker subagent that uses INBOX_SKILL_PATH and the generated worker brief, 6) wait or poll until the worker reports completion, 7) inspect final status, 8) stop after reporting RUN_ID and THREAD_ID. Do not use ordinary chat to coordinate with the worker; the launched worker must use inbox only.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/orch/` into `leader`
-3. Ensure `leader` can also reference `skills/inbox/` by path when it spawns the worker subagent
-4. Point the leader at the same database path `TMPDIR/coord.db`
-5. Launch `leader`
-6. Wait for `leader` and any spawned worker subagent(s) to finish
-7. Resolve `RUN_ID=run_blog_skill_launch_001` and `THREAD_ID` from the leader output
-8. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_launch_001
-INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-test -f TMPDIR/dispatch.json
-test -f TMPDIR/worker-brief.txt
-```
-
-## Expected Outcomes
-
- the leader successfully creates `run_blog_skill_launch_001`
- the leader successfully dispatches `T1` and saves the JSON response
- the leader successfully renders a non-empty worker brief from that JSON response
- the leader successfully spawns a worker subagent that uses `skills/inbox/`
- the launched worker successfully claims the dispatched thread
- the launched worker completes the thread with `done`
- the final run state is `done`
-
-## Assertions
-
- `status.data.run.run_id == "run_blog_skill_launch_001"`
- `status.data.run.status == "done"`
- `status.data.tasks` contains exactly one task `T1`
- `status.data.tasks[0].status == "done"`
- `status.data.tasks[0].latest_attempt.assigned_to == "worker-a"`
- `show.data.thread.status == "done"`
- `show.data.messages[*].kind` includes `task`, `progress`, and `result`
- `TMPDIR/worker-brief.txt` mentions the expected `thread_id`
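
The assertions above could be expressed as an executable check. The following is a minimal Python sketch, assuming the `status` and `show` JSON payloads have already been parsed and follow the field paths named in the assertions. The function name `check_launch_case`, the `task_id` field name, and the sample payloads are hypothetical illustrations, not output from a real run.

```python
REQUIRED_KINDS = {"task", "progress", "result"}


def check_launch_case(status, show, brief_text, run_id, thread_id):
    """Return a list of failed assertion descriptions (empty list means pass)."""
    failures = []
    run = status["data"]["run"]
    tasks = status["data"]["tasks"]

    if run["run_id"] != run_id:
        failures.append("run_id mismatch")
    if run["status"] != "done":
        failures.append("run status is not done")

    # "task_id" is an assumed field name; the plan only says the task is T1.
    if [t.get("task_id") for t in tasks] != ["T1"]:
        failures.append("expected exactly one task T1")
    else:
        task = tasks[0]
        if task["status"] != "done":
            failures.append("task status is not done")
        if task["latest_attempt"]["assigned_to"] != "worker-a":
            failures.append("latest attempt is not assigned to worker-a")

    if show["data"]["thread"]["status"] != "done":
        failures.append("thread status is not done")
    kinds = {m["kind"] for m in show["data"]["messages"]}
    missing = REQUIRED_KINDS - kinds
    if missing:
        failures.append("missing message kinds: " + ", ".join(sorted(missing)))

    if thread_id not in brief_text:
        failures.append("worker brief does not mention thread id")
    return failures


# Hypothetical payloads shaped after the assertion paths, not real CLI output.
SAMPLE_STATUS = {
    "data": {
        "run": {"run_id": "run_blog_skill_launch_001", "status": "done"},
        "tasks": [
            {
                "task_id": "T1",
                "status": "done",
                "latest_attempt": {"assigned_to": "worker-a"},
            }
        ],
    }
}
SAMPLE_SHOW = {
    "data": {
        "thread": {"status": "done"},
        "messages": [{"kind": "task"}, {"kind": "progress"}, {"kind": "result"}],
    }
}

print(
    check_launch_case(
        SAMPLE_STATUS,
        SAMPLE_SHOW,
        "claim thread thr_example",
        "run_blog_skill_launch_001",
        "thr_example",
    )
)
```

Returning a list of failures rather than raising on the first one keeps every violated assertion visible in a single run, which is useful when the temporary DB is retained for manual inspection.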
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
-
-## Recorded Example Run
-
- no recorded run yet
- this case should be captured with a real leader agent plus leader-launched worker subagent after the launch bridge assets are adopted
@@ -1,115 +0,0 @@
-# Case: `leader-dispatches-dependent-task-after-prerequisite-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a dependency-gated ready-queue skill validation.
-
-The goal is to verify that a leader using the packaged `orch` skill can create a dependency edge, observe the correct `ready` set before and after prerequisite completion, and dispatch the dependent task only after it becomes eligible.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can use `dep add`, `ready`, `dispatch`, `wait`, `reconcile`, and `status` through the bundled orch skill
- `worker-a` can complete the prerequisite task on the bundled inbox skill
- the dependent task stays out of the initial `ready` queue
- the dependent task appears in `ready` only after the prerequisite reaches `done`
- the leader can dispatch that newly ready dependent task to `worker-b` and close the run
-
-## Preconditions
-
- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
- `worker-a`
- `worker-b`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_deps_001, 2) add prerequisite task T1 for worker-a and dependent task T2 for worker-b, 3) make T2 depend on T1, 4) inspect ready work and confirm only T1 is dispatchable at first, 5) dispatch T1 with --execution-mode analysis, 6) wait until T1 completes, 7) reconcile and inspect ready work again, 8) dispatch T2 only after it becomes ready with --execution-mode analysis, 9) wait until T2 completes, 10) reconcile and inspect final status, 11) stop after reporting THREAD_ID_1 and THREAD_ID_2. Do not use ordinary chat to coordinate with the workers.
-```
-
-### Worker A Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the prerequisite thread assigned to worker-a, 2) send one in_progress update, 3) finish it with done, 4) stop after reporting THREAD_ID_1. Do not use ordinary chat to coordinate with the leader or worker-b.
-```
-
-### Worker B Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as worker-b on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) wait until dependent work assigned to worker-b appears, 2) fetch and claim that thread, 3) finish it with done, 4) stop after reporting THREAD_ID_2. Do not use ordinary chat to coordinate with the leader or worker-a.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/orch/` into `leader`
-3. Inject `skills/inbox/` into `worker-a` and `worker-b`
-4. Point all agents at the same database path `TMPDIR/coord.db`
-5. Launch `leader`, `worker-a`, and `worker-b` in parallel
-6. Wait for all agents to finish
-7. Resolve `THREAD_ID_1` and `THREAD_ID_2` from the agent outputs
-8. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_deps_001
-INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_1
-INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_2
-```
-
-## Expected Outcomes
-
- the leader initially sees only `T1` in the `ready` output
- `worker-a` completes the prerequisite thread for `T1`
- after reconcile, the leader sees `T2` become ready
- `worker-b` receives a distinct thread for `T2` and completes it
- the final run reaches `done`
-
-## Assertions
-
- the initial `ready` output contains `T1` and does not contain `T2`
- the post-reconcile `ready` output contains `T2`
- `THREAD_ID_1 != THREAD_ID_2`
- `status.data.run.status == "done"`
- `status.data.tasks` contains `T1` and `T2`, both with status `done`
- `show.data.thread.status == "done"` for `THREAD_ID_1`
- `show.data.thread.status == "done"` for `THREAD_ID_2`
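
The gating behavior these assertions describe can be modeled in a few lines. This is a rough Python sketch of a dependency-gated ready queue, not the `orch` implementation: the function `ready_tasks` and the status names `"pending"` and `"done"` are assumptions chosen for illustration.

```python
def ready_tasks(statuses, deps):
    """Return the ids of undispatched tasks whose prerequisites are all done.

    statuses maps task id -> status string; deps maps task id -> set of
    prerequisite task ids. "pending" and "done" are illustrative status names.
    """
    return sorted(
        task
        for task, status in statuses.items()
        if status == "pending"
        and all(statuses[dep] == "done" for dep in deps.get(task, ()))
    )


statuses = {"T1": "pending", "T2": "pending"}
deps = {"T2": {"T1"}}  # T2 depends on T1

before = ready_tasks(statuses, deps)
statuses["T1"] = "done"  # worker-a finishes the prerequisite
after = ready_tasks(statuses, deps)
print(before, after)
```

In this model `before` holds only `T1` and `after` holds only `T2`, which mirrors the initial and post-reconcile `ready` outputs the case expects.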
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
-
-## Recorded Example Run
-
- recorded on: `2026-03-19`
- execution mode: `direct_cli_replay` via `scripts/run_orch_skill_forward_tests.sh`
- result: `pass`
- observed run id: `run_blog_skill_deps_001`
- observed first thread id: `thr_7f57b577e5ce4cc094341e7d2eae4570`
- observed second thread id: `thr_5dbc81f2fe234b6dbf0c57a176e13acf`
- evidence summary:
- the initial `ready` output returned only `T1`, confirming that dependent task `T2` stayed gated before prerequisite completion
- after `worker-a` completed `T1` and the leader ran `reconcile`, the next `ready` output returned only `T2`
- final `orch status --run run_blog_skill_deps_001 --json` returned `run.status == "done"` with both tasks `T1` and `T2` in state `done`
- final `inbox show` on both thread ids returned terminal thread state `done`
- the replay also observed `orch wait --for task_done` wake on the prerequisite completion before the dependent dispatch
- note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents
@@ -1,129 +0,0 @@
-# Case: `leader-reassigns-blocked-task-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a reassignment-path skill validation.
-
-The goal is to verify that a leader using the packaged `orch` skill can observe a blocked task, reassign it from one worker to another, and drive the run to completion through the new attempt.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can use `blocked`, `reassign`, `reconcile`, and `status` through the bundled orch skill
- `worker-a` can claim the original attempt and block on a question
- `worker-b` can receive the reassigned attempt as a new thread
- the original thread is cancelled and the new thread reaches `done`
- the final run reaches `done`
-
-## Preconditions
-
- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
- `worker-a`
- `worker-b`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_reassign_001, 2) add and dispatch one task T1 to worker-a with --execution-mode analysis, 3) wait until worker-a blocks, 4) inspect blocked tasks, 5) reassign T1 to worker-b with a short reason, 6) wait until worker-b completes the new attempt, 7) reconcile and inspect final status, 8) stop after reporting THREAD_ID_1 and THREAD_ID_2. Do not use ordinary chat to coordinate with the workers.
-```
-
-### Worker A Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the initial assigned thread, 2) send one blocked update with a precise question, 3) stop after reporting THREAD_ID_1 and the blocked summary you sent. Do not use ordinary chat to coordinate with the leader or worker-b.
-```
-
-### Worker B Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as worker-b on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) wait until reassigned work for worker-b appears, 2) fetch and claim it, 3) complete it with done, 4) stop after reporting THREAD_ID_2. Do not use ordinary chat to coordinate with the leader or worker-a.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/orch/` into `leader`
-3. Inject `skills/inbox/` into `worker-a` and `worker-b`
-4. Point all agents at the same database path `TMPDIR/coord.db`
-5. Launch `leader`, `worker-a`, and `worker-b` in parallel
-6. Wait for all agents to finish
-7. Resolve `THREAD_ID_1` and `THREAD_ID_2` from the agent outputs
-8. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_reassign_001
-INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_1
-INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_2
-```
-
-## Expected Outcomes
-
- `worker-a` successfully claims the original thread and blocks it
- the leader successfully reassigns the task to `worker-b`
- the original thread reaches `cancelled`
- `worker-b` receives a distinct reassigned thread and completes it
- the final run reaches `done`
-
-## Assertions
-
- `THREAD_ID_1 != THREAD_ID_2`
- `status.data.run.status == "done"`
- `status.data.tasks[0].status == "done"`
- `show.data.thread.status == "cancelled"` for `THREAD_ID_1`
- `show.data.thread.status == "done"` for `THREAD_ID_2`
- the blocked question remains visible in the original thread history
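
The thread-level assertions above could be checked roughly as follows. This Python sketch assumes both `show` payloads are already parsed and that the blocked question is stored as a message whose `kind` is `question`. The function name `check_reassignment` and the sample thread ids are hypothetical.

```python
def check_reassignment(thread_id_1, thread_id_2, show_original, show_reassigned):
    """Return failed assertion descriptions for the reassignment path."""
    failures = []
    if thread_id_1 == thread_id_2:
        failures.append("reassigned attempt reused the original thread")
    if show_original["data"]["thread"]["status"] != "cancelled":
        failures.append("original thread is not cancelled")
    if show_reassigned["data"]["thread"]["status"] != "done":
        failures.append("reassigned thread is not done")

    # Assumption: the blocked question appears as a message of kind "question".
    kinds = [m["kind"] for m in show_original["data"]["messages"]]
    if "question" not in kinds:
        failures.append("blocked question missing from original thread history")
    return failures


# Hypothetical payloads for illustration only.
original = {
    "data": {
        "thread": {"status": "cancelled"},
        "messages": [{"kind": "task"}, {"kind": "question"}],
    }
}
reassigned = {
    "data": {
        "thread": {"status": "done"},
        "messages": [{"kind": "task"}, {"kind": "result"}],
    }
}
print(check_reassignment("thr_a", "thr_b", original, reassigned))
```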
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
-
-## Recorded Example Run
-
- recorded on: `2026-03-19`
- execution mode: `direct_cli_replay` via `scripts/run_orch_skill_forward_tests.sh`
- result: `pass`
- observed run id: `run_blog_skill_reassign_001`
- observed original thread id: `thr_0a61240412134de3b3d9ab219b6c8f19`
- observed reassigned thread id: `thr_12fbcf6d89d948548306198d013d77a5`
- evidence summary:
- `orch wait --for task_blocked` woke after worker-a posted a blocked question with payload `Proceed with v1 scope?`
- `orch reassign --run run_blog_skill_reassign_001 --task T1 --to worker-b --json` returned `attempt_no == 2` and assigned the new attempt to `worker-b`
- final `inbox show` on the original thread returned `thread.status == "cancelled"` and preserved the blocked `question` message
- final `inbox show` on the reassigned thread returned `thread.status == "done"`
- final `orch status --run run_blog_skill_reassign_001 --json` returned `run.status == "done"` and `tasks[0].status == "done"`
- note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents
-
-## Recorded Real Forward Run
-
- recorded on: `2026-03-19`
- execution mode: `real_subagent_forward_test`
- result: `pass`
- evidence root: `/tmp/orch-skill-subagents.J1XWgs/leader-reassigns-blocked-task-through-bundled-cli-phased`
- observed run id: `run_blog_skill_reassign_001`
- observed original thread id: `thr_7d43af5bc1f7467da98a39adb0de5808`
- observed reassigned thread id: `thr_eba253db8965423b855d0c784a29702c`
- evidence summary:
- the same real leader agent using `skills/orch/` completed the case in three phases: initial `run/task/dispatch`, then `wait --for task_blocked` plus `reassign`, then final `wait --for task_done` plus `status`
- a real `worker-a` agent using `skills/inbox/` claimed the original thread and posted the blocked question `Proceed with v1 scope?`
- a real `worker-b` agent using `skills/inbox/` claimed the reassigned thread and completed it
- main-thread validation confirmed the original thread finished `cancelled`, the reassigned thread finished `done`, and the original blocked question remained visible in thread history
@@ -1,121 +0,0 @@
-# Case: `leader-retries-failed-task-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a retry-path skill validation.
-
-The goal is to verify that a leader using the packaged `orch` skill can reconcile a failed attempt, issue `retry`, and drive the task to success through a second attempt handled by a real worker.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can use the bundled orch skill to dispatch an initial attempt
- a worker can fail the first attempt through inbox
- the leader can reconcile that failure and create a fresh retry attempt
- the worker can complete the retried attempt
- the final run reaches `done` and the two attempts map to different threads
-
-## Preconditions
-
- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
- `worker-a`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_retry_001, 2) add and dispatch one task T1 to worker-a with --execution-mode analysis, 3) wait until the first attempt fails, 4) reconcile, 5) retry T1 with a short retry note, 6) wait until the retried attempt completes, 7) reconcile again and inspect final status, 8) stop after reporting RUN_ID, THREAD_ID_1, and THREAD_ID_2. Do not use ordinary chat to coordinate with the worker.
-```
-
-### Worker Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the first assigned thread, 2) fail that first attempt with a clear summary, 3) keep watching for retried work assigned to worker-a, 4) fetch and claim the retried thread, 5) finish the retried attempt with done, 6) stop after reporting both THREAD_ID_1 and THREAD_ID_2. Do not use ordinary chat to coordinate with the leader.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/orch/` into `leader`
-3. Inject `skills/inbox/` into `worker-a`
-4. Point both agents at the same database path `TMPDIR/coord.db`
-5. Launch `leader` and `worker-a` in parallel
-6. Wait for both agents to finish
-7. Resolve `THREAD_ID_1` and `THREAD_ID_2` from the agent outputs
-8. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_retry_001
-INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_1
-INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_2
-```
-
-## Expected Outcomes
-
- the first worker-owned thread reaches `failed`
- the leader successfully issues `retry`
- the second worker-owned thread is distinct from the first
- the second worker-owned thread reaches `done`
- the final run state is `done`
-
-## Assertions
-
- `THREAD_ID_1 != THREAD_ID_2`
- `status.data.run.status == "done"`
- `status.data.tasks[0].status == "done"`
- `show.data.thread.status == "failed"` for `THREAD_ID_1`
- `show.data.thread.status == "done"` for `THREAD_ID_2`
- the worker summary confirms that the retried attempt was a new thread rather than a reused one
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
-
-## Recorded Example Run
-
- recorded on: `2026-03-19`
- execution mode: `direct_cli_replay` via `scripts/run_orch_skill_forward_tests.sh`
- result: `pass`
- observed run id: `run_blog_skill_retry_001`
- observed first thread id: `thr_8dbf2d2e46d7469891cc1ef604da476f`
- observed second thread id: `thr_bdd86f4fe08e4ebfb39b8151ac41a3bb`
- evidence summary:
- `orch wait --for task_failed` woke after the first worker-owned thread failed
- `orch retry --run run_blog_skill_retry_001 --task T1 --json` returned `attempt_no == 2` with a distinct replacement thread for the same worker
- final `inbox show` on the first thread returned `thread.status == "failed"`
- final `inbox show` on the second thread returned `thread.status == "done"`
- final `orch status --run run_blog_skill_retry_001 --json` returned `run.status == "done"` and `tasks[0].status == "done"`
- note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents
-
-## Recorded Real Forward Run
-
- recorded on: `2026-03-19`
- execution mode: `real_subagent_forward_test`
- result: `pass`
- evidence root: `/tmp/orch-skill-subagents.J1XWgs/leader-retries-failed-task-through-bundled-cli-phased`
- observed run id: `run_blog_skill_retry_001`
- observed first thread id: `thr_1e22121642294b56aae351ddec5180d1`
- observed second thread id: `thr_f2ab1f1899964007b2447796204e1928`
- evidence summary:
- the same real leader agent using `skills/orch/` completed the case in three phases: initial `run/task/dispatch`, then `wait --for task_failed` plus `retry`, then final `wait --for task_done` plus `status`
- a real worker agent using `skills/inbox/` failed the first thread, polled for the retried pending thread, then claimed and completed the second thread
- main-thread validation confirmed the two thread ids were distinct, the first thread finished `failed`, the second thread finished `done`, and the run/task both finished `done`
@@ -1,116 +0,0 @@
-# Case: `leader-run-dispatch-reconcile-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a leader-side happy-path skill validation.
-
-The goal is to verify that a leader using the packaged `orch` skill can drive a complete run lifecycle while a worker uses the packaged `inbox` skill for thread progress.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can use the bundled `./assets/orch` CLI through the skill
- the leader can create a run, add a task, dispatch it, reconcile worker progress, and inspect final status
- a worker using the bundled inbox skill can claim the dispatched thread and finish it
- the final orch run state and inbox thread state both reach `done`
-
-## Preconditions
-
- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
-
-## Agent Topology
-
- `leader`
- `worker-a`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_001, 2) add exactly one task T1 assigned to worker-a, 3) dispatch it with --execution-mode analysis, 4) wait or poll until the worker reports completion, 5) reconcile the run, 6) inspect final status, 7) stop after reporting RUN_ID and THREAD_ID. Do not use ordinary chat to coordinate with the worker.
-```
-
-### Worker Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch pending work for worker-a, 2) claim it, 3) send one in_progress update, 4) finish it with done, 5) stop after reporting the THREAD_ID you handled. Do not use ordinary chat to coordinate with the leader.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Inject `skills/orch/` into `leader`
-3. Inject `skills/inbox/` into `worker-a`
-4. Point both agents at the same database path `TMPDIR/coord.db`
-5. Launch `leader` and `worker-a` in parallel
-6. Wait for both agents to finish
-7. Resolve `RUN_ID=run_blog_skill_001` and `THREAD_ID` from the agent outputs
-8. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_001
-INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-```
-
-## Expected Outcomes
-
- `leader` successfully creates `run_blog_skill_001`
- `leader` successfully adds and dispatches `T1`
- `worker-a` successfully claims the dispatched thread
- `worker-a` emits at least one `in_progress` update
- `worker-a` completes the thread with `done`
- `leader` successfully reconciles and sees `run.status == "done"`
-
-## Assertions
-
- `status.data.run.run_id == "run_blog_skill_001"`
- `status.data.run.status == "done"`
- `status.data.tasks` contains exactly one task `T1`
- `status.data.tasks[0].status == "done"`
- `show.data.thread.status == "done"`
- `show.data.messages[*].kind` includes `task`, `progress`, and `result`
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
-
-## Recorded Example Run
-
- recorded on: `2026-03-19`
- execution mode: `direct_cli_replay` via `scripts/run_orch_skill_forward_tests.sh`
- result: `pass`
- observed run id: `run_blog_skill_001`
- observed thread id: `thr_eced1b8cb1254065a7cd3aaff6dc0bcb`
- evidence summary:
- final `orch status --run run_blog_skill_001 --json` returned `run.status == "done"` with a single task `T1` in state `done`
- final `inbox show --thread thr_eced1b8cb1254065a7cd3aaff6dc0bcb --json` returned thread state `done` and message kinds `task`, `progress`, and `result`
- the replay also observed `orch wait --for task_done` wake successfully before the final reconcile
- note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents
-
-## Recorded Real Forward Run
-
- recorded on: `2026-03-19`
- execution mode: `real_subagent_forward_test`
- result: `pass`
- evidence root: `/tmp/orch-skill-subagents.J1XWgs/leader-run-dispatch-reconcile-through-bundled-cli`
- observed run id: `run_blog_skill_001`
- observed thread id: `thr_7c64e75bbcce4143a7fc425242f7e7d3`
- evidence summary:
- a real leader agent using `skills/orch/` completed `run init`, `task add`, `dispatch`, `wait`, `reconcile`, and `status`
- a real worker agent using `skills/inbox/` completed `fetch`, `claim`, `update --status in_progress`, and `done`
- main-thread validation confirmed `status.data.run.status == "done"`, `status.data.tasks[0].status == "done"`, and thread history kinds `task`, `progress`, and `result`
@@ -1,97 +0,0 @@
-# Case: `strict-worktree-dispatch-launches-worker-through-codex-bridge`
-
-## Test Type
-
-This is a `forward-test` and a worktree launch-bridge validation.
-
-The goal is to verify that a leader using the packaged `orch` skill can dispatch a code task, render a standardized worker brief from the saved dispatch JSON, and launch a worker subagent that respects the assigned worktree contract.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can dispatch a code task with `--execution-mode code` through the bundled orch skill
- the leader can turn that dispatch JSON into a stable worker brief through `./assets/orch-worker-brief`
- the launched worker subagent uses `skills/inbox/` and reports through inbox
- the launched worker observes the assigned `worktree_path` and completes the attempt
- the leader can reconcile the finished task and clean the attempt worktree
-
-## Preconditions
-
- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- the helper asset exists at `ORCH_SKILL_PATH/assets/orch-worker-brief`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
- create `TMPDIR/repo` as a Git repository with one committed file before launching agents
-
-## Agent Topology
-
- `leader`
-
-The leader is responsible for spawning the code-writing worker subagent after dispatch.
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_launch_worktree_001, 2) add one code task T1 for worker-a, 3) dispatch it with --execution-mode code --repo-path TMPDIR/repo --workspace-root .orch/worktrees while saving --json to TMPDIR/dispatch.json, 4) render a worker brief with ORCH_SKILL_PATH/assets/orch-worker-brief into TMPDIR/worker-brief.txt, 5) spawn one worker subagent that uses INBOX_SKILL_PATH and the generated worker brief, 6) wait until the worker completes, 7) inspect final status, 8) clean up attempt 1, 9) stop after reporting RUN_ID, THREAD_ID, and WORKTREE_PATH. Do not use ordinary chat to coordinate with the worker; the launched worker must use inbox only and should respect the assigned worktree.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Create `TMPDIR/repo` with an initial commit before launching agents
-3. Inject `skills/orch/` into `leader`
-4. Ensure `leader` can also reference `skills/inbox/` by path when it spawns the worker subagent
-5. Point the leader at the same database path `TMPDIR/coord.db`
-6. Launch `leader`
-7. Wait for `leader` and any spawned worker subagent(s) to finish
-8. Resolve `THREAD_ID` and `WORKTREE_PATH` from the leader output
-9. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_launch_worktree_001
-INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-test -f TMPDIR/dispatch.json
-test -f TMPDIR/worker-brief.txt
-test ! -d WORKTREE_PATH
-```
-
-## Expected Outcomes
-
- the leader reports a non-empty `WORKTREE_PATH` from dispatch
- the rendered worker brief includes that same `worktree_path`
- the launched worker subagent claims the assigned thread and completes it through inbox
- the final run status is `done`
- the cleanup step removes the worktree directory
-
-## Assertions
-
- `status.data.run.status == "done"`
- `status.data.tasks[0].status == "done"`
- `status.data.tasks[0].latest_attempt.worktree_path == WORKTREE_PATH`
- `show.data.thread.status == "done"`
- the task-side thread history includes a payload field or body content referencing the worktree path
- `TMPDIR/worker-brief.txt` mentions the expected `WORKTREE_PATH`
- `WORKTREE_PATH` does not exist after cleanup
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR`, `coord.db`, and the Git repo fixture for replay and manual inspection
-
-## Recorded Example Run
-
- no recorded run yet
- this case should be captured with a real leader agent plus leader-launched worker subagent after the launch bridge assets are adopted
@@ -1,119 +0,0 @@
-# Case: `strict-worktree-dispatch-to-cleanup-through-bundled-cli`
-
-## Test Type
-
-This is a `forward-test` and a worktree-lifecycle skill validation.
-
-The goal is to verify that a leader using the packaged `orch` skill can allocate a code-mode worktree, reconcile completion, and clean that worktree up through the bundled CLI while a worker completes the task through inbox.
-
-## Purpose
-
-Validate that all of the following can be true at the same time:
-
- the leader can dispatch a code task with `--execution-mode code` through the bundled orch skill
- the worker can complete the resulting attempt thread through inbox
- the leader can reconcile the finished task and clean the attempt worktree
- the final filesystem state matches the cleanup contract
-
-## Preconditions
-
- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
- create `TMPDIR/repo` as a Git repository with one committed file before launching role agents
-
-## Agent Topology
-
- `leader`
- `worker-a`
-
-## Inputs
-
-### Leader Prompt
-
-```text
-Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_worktree_001, 2) add one code task T1 for worker-a, 3) dispatch it with --execution-mode code --repo-path TMPDIR/repo --workspace-root .orch/worktrees, 4) record the returned THREAD_ID and WORKTREE_PATH, 5) wait until the worker completes, 6) reconcile, 7) clean up attempt 1, 8) stop after reporting RUN_ID, THREAD_ID, and WORKTREE_PATH. Do not use ordinary chat to coordinate with the worker.
-```
-
-### Worker Prompt
-
-```text
-Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the assigned task, 2) inspect the task payload enough to confirm a worktree path was provided, 3) finish the task with done, 4) stop after reporting the THREAD_ID you handled and whether you observed a worktree path. Do not use ordinary chat to coordinate with the leader.
-```
-
-## Execution Parameters
-
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
-
-## Execution Steps
-
-1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
-2. Create `TMPDIR/repo` with an initial commit before launching agents
-3. Inject `skills/orch/` into `leader`
-4. Inject `skills/inbox/` into `worker-a`
-5. Point both agents at the same database path `TMPDIR/coord.db`
-6. Launch `leader` and `worker-a` in parallel
-7. Wait for both agents to finish
-8. Resolve `THREAD_ID` and `WORKTREE_PATH` from the agent outputs
-9. Independently run the validation commands from the main thread
-
-## Validation Commands
-
-```bash
-ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_worktree_001
-INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
-test ! -d WORKTREE_PATH
-```
-
-## Expected Outcomes
-
- the leader reports a non-empty `WORKTREE_PATH` from dispatch
- the worker reports that the task payload exposed a worktree path
- the final run status is `done`
- the cleanup step removes the worktree directory
-
-## Assertions
-
- `status.data.run.status == "done"`
- `status.data.tasks[0].status == "done"`
- `show.data.thread.status == "done"`
- the task-side thread history includes a payload field or body content referencing the worktree path
- `WORKTREE_PATH` does not exist after cleanup
-
-## Cleanup
-
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR`, `coord.db`, and the Git repo fixture for replay and manual inspection
-
-## Recorded Example Run
-
- recorded on: `2026-03-19`
- execution mode: `direct_cli_replay` via `scripts/run_orch_skill_forward_tests.sh`
- result: `pass`
- observed run id: `run_blog_skill_worktree_001`
- observed thread id: `thr_5743259fdccb41f9bb33dce0040b27a5`
- observed worktree suffix: `.orch/worktrees/run-blog-skill-worktree-001/T1/attempt-1`
- evidence summary:
- `orch dispatch --execution-mode code` returned `base_ref == "HEAD"`, a concrete `base_commit`, branch `orch/run-blog-skill-worktree-001/T1/attempt-1`, and a non-empty `worktree_path`
- the task payload stored on the worker thread exposed the same `worktree_path`
- final `orch status --run run_blog_skill_worktree_001 --json` returned `run.status == "done"` and `tasks[0].status == "done"`
- final `orch cleanup --run run_blog_skill_worktree_001 --task T1 --json` returned one cleaned attempt and the worktree directory no longer existed afterward
- note: this recorded run exercised the packaged binaries directly in a temporary DB and Git fixture and did not spawn separate Codex role agents
-
-## Recorded Real Forward Run
-
- recorded on: `2026-03-19`
- execution mode: `real_subagent_forward_test`
- result: `pass`
- evidence root: `/tmp/orch-skill-subagents.J1XWgs/strict-worktree-dispatch-to-cleanup-through-bundled-cli`
- observed run id: `run_blog_skill_worktree_001`
- observed thread id: `thr_089527cd07f74b52a524ba07ed74c2e4`
- observed worktree path: `/private/tmp/orch-skill-subagents.J1XWgs/strict-worktree-dispatch-to-cleanup-through-bundled-cli/repo/.orch/worktrees/run-blog-skill-worktree-001/T1/attempt-1`
- evidence summary:
- a real leader agent using `skills/orch/` completed code-mode `dispatch`, `wait`, `reconcile`, `cleanup`, and `status`
- a real worker agent using `skills/inbox/` claimed the thread and finished it with `done`
- main-thread validation confirmed that the task payload did include the same `worktree_path` even though the worker agent summary failed to notice it, and also confirmed the worktree directory no longer existed after cleanup
@@ -1,77 +0,0 @@
-# Orch Markdown Test Plan
-
-## Purpose
-
-This directory contains the human-readable Markdown test plan for the `orch` CLI.
-
-It complements automated Go tests. The goal is to preserve the user-visible scheduler contract in a form that can be reviewed, extended, and executed manually without re-deriving command behavior from implementation code.
-
-## Directory Rules
-
- one folder per `orch` leaf command or shared area
- each folder keeps a `README.md` entrypoint
- command folders use `README.md` as an index only
- each command test case lives in its own Markdown file named after the case slug
- no numeric test IDs
- each command case is identified by its concrete file path
-
-Case file naming pattern:
-
-```text
-<case-slug>.md
-```
-
-## Authoring Principles
-
- focus on externally visible CLI behavior rather than store internals
- prefer stable command sequences that a new agent can replay against a temp database
- document both success contracts and failure boundaries
- reuse scenarios from automated `orch` integration tests before inventing new cases
- keep terminology consistent with the scheduler concepts exposed by `orch`: run, task, dependency, attempt, blocked task, worktree, and council review
-
-## Common Execution Model
-
-Most cases in this directory assume the same baseline:
-
-1. create an isolated temporary directory
-2. choose a database path such as `TMPDIR/coord.db`
-3. run the target `orch` command sequence with `--db TMPDIR/coord.db --json`
-4. when a case needs worker-side state transitions, drive them through `inbox` against the same database
-
-Unless a case says otherwise:
-
- commands should use `--json`
- assertions should check both exit code and JSON payload
- `orch` may be pointed at an empty database path; schema bootstrapping happens automatically on open
-
-## Folder Map
-
- `README.md`: global conventions and glossary
- `ROADMAP.md`: document progress, planned case backlog, and authored-case register
- `_shared/README.md`: reusable fixtures, JSON assertions, exit-code rules, and worktree conventions
- `workflows/README.md`: cross-command end-to-end scenarios
- per-command folders: one leaf-command directory per implemented `orch` command surface
- `verify/`: verification-gate command cases
-
-## Glossary
-
- `run`: one coordinated execution for a user request
- `task`: one schedulable unit of work inside a run
- `dependency`: an edge that gates one task on another
- `attempt`: one execution try for a task
- `dispatch`: the act of materializing a task into an inbox thread
- `workspace`: the branch and worktree assigned to a code-writing attempt
- `verification gate`: the check aggregation state between worker `done` and final task completion
- `verifying`: the task state used while required checks are still pending or being recorded
- `blocked task`: a task whose active attempt requires clarification or another external decision
- `council review`: a higher-level workflow built on top of `orch` that dispatches fixed reviewer roles and tallies recommendations
-
-## Relationship To Automated Tests
-
-The current best executable reference is [integration_test.go](../../../packages/orch-runtime/internal/cli/orch/integration_test.go).
-
-When this Markdown plan expands:
-
- prefer matching an existing automated scenario first
- record any additional manual-only contract coverage explicitly in the relevant command case file
- keep [ROADMAP.md](./ROADMAP.md) synchronized with authored files and case slugs