Add orch skill test plan docs

2026-03-19 17:25:03 +08:00
parent 8f10dff823
commit 8b26815d53
8 changed files with 697 additions and 0 deletions
@@ -25,6 +25,7 @@ As of now:
 - a reusable Codex skill package for `inbox` now exists under `skills/inbox/`, with a formal `SKILL.md`, `agents/openai.yaml`, and a bundled CLI binary asset
 - reusable Codex skill packages for `orch` and `council-review` now exist under `skills/orch/` and `skills/council-review/`, both using bundled copies of the `orch` CLI binary asset
 - an inbox skill forward-test plan directory now exists under `docs/tests/inbox-skill/`, with a shared execution template and multiple scenario cases
 - an orch skill forward-test plan directory now exists under `docs/tests/orch-skill/`, with a shared execution contract and initial leader-side workflow scenarios
 - an execution-roadmap workflow now exists under `docs/roadmaps/active/` and `docs/roadmaps/archive/` for agent-level work traces and completion archives
 - a repo-local `scripts/package_skill_clis.sh` packaging flow now builds bundled skill CLI assets for `inbox`, `orch`, and `council-review`
 - `orch` now implements `run init/show`, `task add`, `dep add`, `ready`, `dispatch`, `reconcile`, `wait`, `blocked`, `answer`, `retry`, `reassign`, `cancel`, `cleanup`, and `status`
@@ -0,0 +1,64 @@
 # Title
 Add Orch Skill Test Plan Documents
 ## Status
 - `completed`
 ## Owner
 - Codex main agent
 ## Started At
 - `2026-03-19`
 ## Goal
 - Add a human-readable forward-test plan directory for the packaged `skills/orch/` bundle under `docs/tests/`.
 - Mirror the proven structure of `docs/tests/inbox-skill/` while adapting it to leader-side `orch` workflows and `inbox`-backed worker coordination.
 ## Scope
 - Create `docs/tests/orch-skill/README.md`.
 - Author an initial set of `orch` skill scenario cases as separate Markdown files.
 - Update implementation progress docs to record the new test-plan directory.
 ## Checklist
 - [x] Review `docs/tests/inbox-skill/`, `skills/orch/`, and current `orch` workflow surface.
 - [x] Create `docs/tests/orch-skill/README.md` with shared execution contract and case index.
 - [x] Author initial `orch-skill` case documents.
 - [x] Update implementation roadmap and archive this execution roadmap.
 ## Files
 - `docs/tests/orch-skill/README.md`
 - `docs/tests/orch-skill/leader-run-dispatch-reconcile-through-bundled-cli.md`
 - `docs/tests/orch-skill/leader-blocked-answer-resume-through-bundled-cli.md`
 - `docs/tests/orch-skill/strict-worktree-dispatch-to-cleanup-through-bundled-cli.md`
 - `docs/tests/orch-skill/leader-retries-failed-task-through-bundled-cli.md`
 - `docs/tests/orch-skill/leader-reassigns-blocked-task-through-bundled-cli.md`
 - `docs/implementation-roadmap.md`
 - `docs/roadmaps/archive/orch-skill-test-plan.md`
 ## Decisions
 - Keep `orch-skill` separate from `council-review` skill docs, because `council-review` is a distinct project-local skill package.
 - Use the same forward-test style as `inbox-skill`, but inject `skills/orch/` only into the leader and `skills/inbox/` only into workers.
 - Treat shared DB bootstrap through `inbox init` as part of the test-runner setup contract rather than pretending `orch` owns schema initialization.
 ## Blockers
 - none
 ## Next Step
 - Add a parallel `docs/tests/council-review-skill/` directory when the separate council skill test surface is ready to be documented.
 ## Completion Summary
 - Added `docs/tests/orch-skill/README.md` as the shared execution contract and index for leader-side skill validation.
 - Added five initial forward-test scenario documents covering happy-path orchestration, blocked-answer resume, strict worktree cleanup, retry after failure, and reassignment from one worker to another.
 - Updated `docs/implementation-roadmap.md` to record that the `orch` skill now has a dedicated forward-test plan directory under `docs/tests/orch-skill/`.
@@ -0,0 +1,174 @@
 # Orch Skill Test Plan
 ## Purpose
 This directory tracks human-readable test plans for the `skills/orch/` Codex skill bundle.
 These documents are not command-contract specs for the `orch` CLI itself.
 That coverage already lives under [../orch/](../orch/).
 This directory exists to describe a different test surface:
 - whether a leader agent can actually use the packaged `orch` skill
 - whether the bundled `./assets/orch` CLI works inside real skill-guided conversations
 - whether leader-side orchestration driven by the skill reaches the expected run, task, thread, and worktree state
 ## Test Model
 - `README.md` is the index for this directory
 - each skill test case lives in its own Markdown file
 - use stable case slugs in filenames
 ## Shared Execution Contract
 Use these defaults unless a case file explicitly overrides them:
 - run the scenario with real subagents, not simulated transcripts
 - inject `skills/orch/` into the leader agent
 - inject `skills/inbox/` into worker agents whenever worker-side thread progress is required
 - before launching role agents, initialize the shared SQLite DB with `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
 - require the leader to coordinate through the bundled `./assets/orch` CLI from the skill instead of ordinary chat
 - require workers to coordinate through the bundled `./assets/inbox` CLI from their skill instead of ordinary chat
 - validate final run and thread state independently from the main thread after the agents stop
 - create any required Git repo fixture before launching agents for worktree cases
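A minimal bash sketch of the setup half of this contract. The `setup_case_dir` function, the `init.json` evidence file, and the use of `mktemp -d` are illustrative assumptions; only the `inbox --db ... --json init` invocation comes from this document. The sketch names the run directory `run_dir` instead of assigning `TMPDIR`, because the shell's real `TMPDIR` variable controls where `mktemp` itself creates directories.
```bash
#!/usr/bin/env bash
# Hypothetical helper: create an isolated run directory and bootstrap the shared DB.
# $1: path to the bundled inbox CLI, e.g. "$INBOX_SKILL_PATH/assets/inbox".
# Prints the run directory (the placeholder called TMPDIR in these docs) on success.
setup_case_dir() {
  local inbox_bin="$1"
  local run_dir
  run_dir="$(mktemp -d)" || return 1
  # Initialize the DB exactly once, before any role agent is launched.
  # The JSON reply is kept next to the DB as evidence.
  if ! "$inbox_bin" --db "$run_dir/coord.db" --json init >"$run_dir/init.json"; then
    echo "inbox init failed; keeping $run_dir for debugging" >&2
    return 1
  fi
  printf '%s\n' "$run_dir"
}
```
For example, `run_dir="$(setup_case_dir "$INBOX_SKILL_PATH/assets/inbox")"`, after which `$run_dir/coord.db` plays the role of `TMPDIR/coord.db` in the case files.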
 ## How An Agent Runs These Cases
 Use one test-runner agent to execute each case.
 The test-runner agent is responsible for:
 - reading this `README.md` first, then one specific case file
 - creating an isolated temporary directory and DB path for that run
 - initializing the DB once through the bundled inbox CLI before launching role agents
 - creating any required temporary Git repo fixture before launching role agents
 - launching the role agents described in `Agent Topology`
 - injecting `skills/orch/` into the leader and `skills/inbox/` into workers
 - passing each role agent the prompt text from the case file with concrete values substituted for `ORCH_SKILL_PATH`, `INBOX_SKILL_PATH`, `TMPDIR`, `RUN_ID`, `THREAD_ID`, and `WORKTREE_PATH` when needed
 - coordinating launch order or parallel start according to the case file
 - collecting agent final summaries as evidence
 - resolving final run ids, thread ids, and worktree paths from agent outputs
 - running the `Validation Commands` from the main thread after the role agents stop
 - comparing the observed results against `Expected Outcomes` and `Assertions`
 - returning a final pass/fail judgment with concrete evidence
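The placeholder substitution step can be sketched as two pure bash functions. Both are illustrative assumptions, not part of either skill bundle. `render_prompt` replaces only the three path placeholders, because in the case prompts `RUN_ID`, `THREAD_ID`, and `WORKTREE_PATH` name values the agents are asked to report rather than values the runner already knows. `render_validation` fills resolved thread ids into validation commands and replaces `THREAD_ID_1` and `THREAD_ID_2` before the bare `THREAD_ID`, since the shorter name is a prefix of the longer ones.
```bash
#!/usr/bin/env bash
# Hypothetical helper: substitute path placeholders in a case prompt.
# $1: prompt text, $2: orch skill path, $3: inbox skill path, $4: run directory (TMPDIR).
render_prompt() {
  local text="$1"
  # Quoted replacements keep characters such as '&' literal (bash 5.2 patsub_replacement).
  text=${text//ORCH_SKILL_PATH/"$2"}
  text=${text//INBOX_SKILL_PATH/"$3"}
  text=${text//TMPDIR/"$4"}
  printf '%s\n' "$text"
}

# Hypothetical helper: fill resolved thread ids into a validation command.
# $1: command text, $2: THREAD_ID or THREAD_ID_1, $3: THREAD_ID_2 (optional).
# Longer placeholders go first so THREAD_ID does not clobber THREAD_ID_1/_2.
render_validation() {
  local text="$1"
  text=${text//THREAD_ID_1/"$2"}
  text=${text//THREAD_ID_2/"${3-}"}
  text=${text//THREAD_ID/"$2"}
  printf '%s\n' "$text"
}
```
Path placeholders in validation commands can be handled by passing the command through `render_prompt` as well.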
 The role agents are responsible for:
 - acting only within the role assigned in the case file
 - using the injected skill bundle rather than ad hoc repository discovery
 - coordinating through the bundled CLI and shared DB
 - reporting concrete run ids, thread ids, worktree paths, and key command outcomes back to the test-runner agent
 The test-runner agent should treat a case as passed only when:
 - all role agents reach a final state without violating the case contract
 - the independent validation commands succeed
 - the final orch and inbox state matches the assertions in the case file
 The test-runner agent should treat a case as failed when:
 - any required agent times out or stalls
 - a required orch or inbox action is skipped
 - the leader falls back to ordinary chat for orchestration decisions that should go through `orch`
 - workers fall back to ordinary chat for progress that should go through `inbox`
 - the final run, task, thread, or worktree state conflicts with the documented assertions
 The test-runner agent should report results in this shape:
 - `case`
 - `db_path`
 - `run_id`
 - `thread_ids`
 - `worktree_paths`
 - `result`: `pass` or `fail`
 - `agent_summaries`
 - `validation_evidence`
 - `assertion_checklist`
 - `notes`
 ## Default Timeouts
 Use these defaults unless a case file explicitly overrides them:
 - per-agent timeout: `4m`
 - overall scenario timeout: `6m`
 - async wait margin for the main thread: `45s`
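The main thread's async wait margin can be made explicit with a bounded poll. A minimal sketch; `wait_for` is a hypothetical helper and the one-second poll interval is an assumption. It tries the command at least once, then retries until the command succeeds or the deadline passes.
```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or the timeout elapses.
# $1: timeout in seconds; remaining args: the command to run.
# Returns 0 as soon as the command succeeds, 1 once the deadline has passed.
wait_for() {
  local timeout_s="$1"
  shift
  local deadline=$((SECONDS + timeout_s))
  while true; do
    if "$@"; then
      return 0
    fi
    if ((SECONDS >= deadline)); then
      return 1
    fi
    sleep 1
  done
}
```
For example, `wait_for 45 some_check`, where `some_check` is a caller-supplied (hypothetical) function that inspects `orch status` output before the final validation commands run.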
 ## Default Failure Conditions
 Treat the test as failed if any of the following happens:
 - any required agent does not reach a final state before timeout
 - any required orch or inbox command returns a non-success result unless the case expects that failure
 - the final `orch status` output does not match the expected run or task state
 - the final `inbox show` output does not match the expected thread or message history
 - a required worktree is missing before the case expects it to be removed, or still exists after cleanup in a cleanup case
 - the agents fall back to ordinary chat for critical coordination instead of the bundled CLIs
 ## Evidence Capture
 Collect at least the following artifacts for every run:
 - agent final summaries
 - final `orch status --run RUN_ID --json` output
 - final `inbox show --thread THREAD_ID --json` output for every relevant thread
 - any `blocked`, `wait`, `retry`, `reassign`, or `cleanup` output relevant to the case
 - the temporary DB path, resolved run id, resolved thread ids, and any worktree paths
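A sketch of how the runner could save the final `status` and `show` outputs as files. The function name, argument order, and file naming are hypothetical, and thread ids are assumed to be safe to use in file names. The helper keeps going after a failed command so partial evidence survives, and reports failure at the end.
```bash
#!/usr/bin/env bash
# Hypothetical helper: save the final orch and inbox state as evidence files.
# $1: orch CLI, $2: inbox CLI, $3: DB path, $4: run id, $5: output directory,
# remaining args: every relevant thread id (assumed to be filename-safe).
# Returns 1 if any capture command failed, 0 otherwise.
capture_evidence() {
  local orch_bin="$1" inbox_bin="$2" db="$3" run_id="$4" out_dir="$5"
  shift 5
  mkdir -p "$out_dir" || return 1
  local rc=0 thread_id
  "$orch_bin" --db "$db" --json status --run "$run_id" >"$out_dir/status.json" || rc=1
  for thread_id in "$@"; do
    "$inbox_bin" --db "$db" --json show --thread "$thread_id" \
      >"$out_dir/show-$thread_id.json" || rc=1
  done
  return "$rc"
}
```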
 ## Cleanup Policy
 Use these defaults unless a case file explicitly overrides them:
 - keep the temporary DB, repo fixture, and working directory on failure for debugging
 - clean up the temporary working directory on success only if the caller does not need replay artifacts
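The default policy reduces to a small decision. A sketch, with `finish_case` and its `keep` flag as hypothetical names for the runner's own bookkeeping: the run directory is deleted only after a pass when no replay artifacts are requested, and retained otherwise.
```bash
#!/usr/bin/env bash
# Hypothetical helper applying the default cleanup policy.
# $1: result ("pass" or "fail"), $2: run directory, $3: "keep" to retain replay artifacts.
finish_case() {
  local result="$1" run_dir="$2" keep="${3-}"
  # Refuse to act on an empty path.
  [[ -n "$run_dir" ]] || return 1
  if [[ "$result" == pass && "$keep" != keep ]]; then
    rm -rf -- "$run_dir"
  else
    # Failures (and explicit keep requests) leave DB, fixtures, and logs in place.
    echo "retaining $run_dir for replay (result=$result)" >&2
  fi
}
```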
 ## Per-Case Template
 Each case file should use this structure:
 - `Test Type`
 - `Purpose`
 - `Preconditions`
 - `Agent Topology`
 - `Inputs`
 - `Execution Parameters`
 - `Execution Steps`
 - `Validation Commands`
 - `Expected Outcomes`
 - `Assertions`
 - `Cleanup`
 - `Recorded Example Run` when a real run has already been captured
 ## Case Files
 | Case Slug | File | Coverage Note |
 | --- | --- | --- |
 | `leader-run-dispatch-reconcile-through-bundled-cli` | [leader-run-dispatch-reconcile-through-bundled-cli.md](./leader-run-dispatch-reconcile-through-bundled-cli.md) | validates that a leader can drive a complete `run -> task -> dispatch -> reconcile -> status` happy path through the packaged orch skill |
 | `leader-blocked-answer-resume-through-bundled-cli` | [leader-blocked-answer-resume-through-bundled-cli.md](./leader-blocked-answer-resume-through-bundled-cli.md) | validates that a leader can observe a blocked task, answer it through `orch`, and reach final completion with a real worker |
 | `strict-worktree-dispatch-to-cleanup-through-bundled-cli` | [strict-worktree-dispatch-to-cleanup-through-bundled-cli.md](./strict-worktree-dispatch-to-cleanup-through-bundled-cli.md) | validates that the skill can drive strict worktree allocation, reconcile completion, and cleanup through the bundled orch CLI |
 | `leader-retries-failed-task-through-bundled-cli` | [leader-retries-failed-task-through-bundled-cli.md](./leader-retries-failed-task-through-bundled-cli.md) | validates that a leader can reconcile a failed attempt and create a successful retry through the packaged orch skill |
 | `leader-reassigns-blocked-task-through-bundled-cli` | [leader-reassigns-blocked-task-through-bundled-cli.md](./leader-reassigns-blocked-task-through-bundled-cli.md) | validates that a leader can reassign a blocked task from one worker to another and close the run through the packaged orch skill |
 ## Scope
 In scope:
 - explicit `$orch` skill invocation
 - bundled `./assets/orch` CLI usage
 - leader-side run, task, dependency, dispatch, reconcile, answer, retry, reassign, wait, status, and cleanup flows
 - interaction between a leader using `skills/orch/` and workers using `skills/inbox/`
 - worktree-backed dispatch and cleanup validation
 - end-to-end run state and thread history validation
 Out of scope:
 - per-command flag and JSON contract coverage for `orch`
 - worker-only skill behavior that already belongs under [../inbox-skill/](../inbox-skill/)
 - the separate `council-review` skill package
 - implicit skill triggering without `$orch`
 ## Relationship To Other Test Docs
 - [../orch/](../orch/) covers CLI command behavior
 - [../inbox-skill/](../inbox-skill/) covers worker-side skill-guided behavior on top of inbox
 - this directory covers leader-side skill-guided behavior on top of `orch`
@@ -0,0 +1,89 @@
 # Case: `leader-blocked-answer-resume-through-bundled-cli`
 ## Test Type
 This is a `forward-test` and a blocked-question resolution skill validation.
 The goal is to verify that a leader using the packaged `orch` skill can observe a blocked task, answer it through `orch`, and reach final completion with a real worker using the packaged inbox skill.
 ## Purpose
 Validate that all of the following can be true at the same time:
 - the leader can use `wait`, `blocked`, `answer`, `reconcile`, and `status` through the bundled orch CLI from the skill
 - a worker can ask a blocked question through the bundled inbox skill
 - the answer reaches the active attempt thread
 - the worker resumes after the answer and completes the task
 - the final run reaches `done`
 ## Preconditions
 - orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
 - inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
 - bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
 - use an empty temporary directory `TMPDIR`
 - initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
 ## Agent Topology
 - `leader`
 - `worker-a`
 ## Inputs
 ### Leader Prompt
 ```text
 Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_002, 2) add and dispatch one task T1 to worker-a, 3) wait until the task becomes blocked, 4) inspect blocked tasks, 5) answer the blocked question with the decision "Use stdout for MVP.", 6) wait until the task completes, 7) reconcile and inspect final status, 8) stop after reporting RUN_ID and THREAD_ID. Do not use ordinary chat to coordinate with the worker.
 ```
 ### Worker Prompt
 ```text
 Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the assigned task, 2) send one in_progress update, 3) send a blocked update asking "Should logging go to stdout or stderr?", 4) wait for a reply, 5) finish the task with done after you receive the leader decision, 6) stop after reporting the THREAD_ID you handled. Do not use ordinary chat to coordinate with the leader.
 ```
 ## Execution Parameters
 - use the shared execution contract from [README.md](./README.md)
 - use the shared timeout defaults from [README.md](./README.md)
 - do not override the default cleanup policy
 ## Execution Steps
 1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
 2. Inject `skills/orch/` into `leader`
 3. Inject `skills/inbox/` into `worker-a`
 4. Point both agents at the same database path `TMPDIR/coord.db`
 5. Launch `leader` and `worker-a` in parallel
 6. Wait for both agents to finish
 7. Resolve `THREAD_ID` from the agent outputs
 8. Independently run the validation commands from the main thread
 ## Validation Commands
 ```bash
 ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_002
 INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
 ```
 ## Expected Outcomes
 - `leader` successfully observes a blocked event through `orch`
 - `leader` successfully inspects the blocked queue and emits one `answer`
 - `worker-a` receives that answer through inbox history and completes the task
 - the final run state is `done`
 ## Assertions
 - `status.data.run.status == "done"`
 - `status.data.tasks[0].status == "done"`
 - `show.data.messages[*].kind` includes `question`, `answer`, and `result`
 - one `question` message contains `payload_json.question == "Should logging go to stdout or stderr?"`
 - one `answer` message contains body `Use stdout for MVP.`
 - `show.data.thread.status == "done"`
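If the runner saves the two validation outputs to files, these assertions can be checked mechanically with `jq` (assumed installed). A sketch; the function name is hypothetical. The `.data.thread.status` path for the final thread status follows the happy-path case and is an assumption here. Whether `payload_json` is an object or a JSON-encoded string is not pinned down by this case, so the sketch accepts both.
```bash
#!/usr/bin/env bash
# Hypothetical checker for leader-blocked-answer-resume-through-bundled-cli.
# $1: saved `orch status --json` output, $2: saved `inbox show --json` output.
# Requires jq. payload_json may be an object or a JSON-encoded string (assumption).
check_blocked_answer_case() {
  local status_json="$1" show_json="$2"
  jq -e '.data.run.status == "done" and .data.tasks[0].status == "done"' \
    "$status_json" >/dev/null || return 1
  jq -e '
    .data.thread.status == "done"
    and ([.data.messages[].kind]
         | index("question") != null and index("answer") != null
           and index("result") != null)
    and any(.data.messages[];
            .kind == "question"
            and ((.payload_json | if type == "string" then fromjson else . end).question
                 == "Should logging go to stdout or stderr?"))
    and any(.data.messages[]; .kind == "answer" and .body == "Use stdout for MVP.")
  ' "$show_json" >/dev/null
}
```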
 ## Cleanup
 - use the default cleanup policy from [README.md](./README.md)
 - if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
@@ -0,0 +1,98 @@
 # Case: `leader-reassigns-blocked-task-through-bundled-cli`
 ## Test Type
 This is a `forward-test` and a reassignment-path skill validation.
 The goal is to verify that a leader using the packaged `orch` skill can observe a blocked task, reassign it from one worker to another, and drive the run to completion through the new attempt.
 ## Purpose
 Validate that all of the following can be true at the same time:
 - the leader can use `blocked`, `reassign`, `reconcile`, and `status` through the bundled orch skill
 - `worker-a` can claim the original attempt and block on a question
 - `worker-b` can receive the reassigned attempt as a new thread
 - the original thread is cancelled and the new thread reaches `done`
 - the final run reaches `done`
 ## Preconditions
 - orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
 - inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
 - bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
 - use an empty temporary directory `TMPDIR`
 - initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
 ## Agent Topology
 - `leader`
 - `worker-a`
 - `worker-b`
 ## Inputs
 ### Leader Prompt
 ```text
 Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_reassign_001, 2) add and dispatch one task T1 to worker-a, 3) wait until worker-a blocks, 4) inspect blocked tasks, 5) reassign T1 to worker-b with a short reason, 6) wait until worker-b completes the new attempt, 7) reconcile and inspect final status, 8) stop after reporting THREAD_ID_1 and THREAD_ID_2. Do not use ordinary chat to coordinate with the workers.
 ```
 ### Worker A Prompt
 ```text
 Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the initial assigned thread, 2) send one blocked update with a precise question, 3) stop after reporting THREAD_ID_1 and the blocked summary you sent. Do not use ordinary chat to coordinate with the leader or worker-b.
 ```
 ### Worker B Prompt
 ```text
 Use $inbox at INBOX_SKILL_PATH to act as worker-b on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) wait until reassigned work for worker-b appears, 2) fetch and claim it, 3) complete it with done, 4) stop after reporting THREAD_ID_2. Do not use ordinary chat to coordinate with the leader or worker-a.
 ```
 ## Execution Parameters
 - use the shared execution contract from [README.md](./README.md)
 - use the shared timeout defaults from [README.md](./README.md)
 - do not override the default cleanup policy
 ## Execution Steps
 1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
 2. Inject `skills/orch/` into `leader`
 3. Inject `skills/inbox/` into `worker-a` and `worker-b`
 4. Point all agents at the same database path `TMPDIR/coord.db`
 5. Launch `leader`, `worker-a`, and `worker-b` in parallel
 6. Wait for all agents to finish
 7. Resolve `THREAD_ID_1` and `THREAD_ID_2` from the agent outputs
 8. Independently run the validation commands from the main thread
 ## Validation Commands
 ```bash
 ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_reassign_001
 INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_1
 INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_2
 ```
 ## Expected Outcomes
 - `worker-a` successfully claims the original thread and blocks it
 - the leader successfully reassigns the task to `worker-b`
 - the original thread reaches `cancelled`
 - `worker-b` receives a distinct reassigned thread and completes it
 - the final run reaches `done`
 ## Assertions
 - `THREAD_ID_1 != THREAD_ID_2`
 - `status.data.run.status == "done"`
 - `status.data.tasks[0].status == "done"`
 - `show THREAD_ID_1` reports a terminal cancelled thread state
 - `show THREAD_ID_2` reports a terminal done thread state
 - the blocked question remains visible in the original thread history
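The same assertions as a `jq` sketch, assuming `jq` is installed and that the runner saved each validation output to a file. The function name is hypothetical. The thread ids are passed in as the values resolved from agent outputs, since this case does not name a thread-id field in the JSON. The `.data.thread.status` path mirrors the happy-path case, and treating the blocked question as a message of kind `question` follows the blocked-answer case; both are assumptions here.
```bash
#!/usr/bin/env bash
# Hypothetical checker for leader-reassigns-blocked-task-through-bundled-cli.
# $1/$2: THREAD_ID_1 and THREAD_ID_2 as reported by the agents,
# $3: saved status JSON, $4/$5: saved show JSON for each thread. Requires jq.
check_reassign_case() {
  local id1="$1" id2="$2" status_json="$3" show1="$4" show2="$5"
  # Reassignment must open a distinct thread.
  [[ -n "$id1" && -n "$id2" && "$id1" != "$id2" ]] || return 1
  jq -e '.data.run.status == "done" and .data.tasks[0].status == "done"' \
    "$status_json" >/dev/null || return 1
  # Original attempt: cancelled, with the blocked question still in its history
  # (assumes the blocked update is stored with kind "question").
  jq -e '.data.thread.status == "cancelled"
         and any(.data.messages[]; .kind == "question")' "$show1" >/dev/null || return 1
  # Reassigned attempt: finished by worker-b.
  jq -e '.data.thread.status == "done"' "$show2" >/dev/null
}
```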
 ## Cleanup
 - use the default cleanup policy from [README.md](./README.md)
 - if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
@@ -0,0 +1,91 @@
 # Case: `leader-retries-failed-task-through-bundled-cli`
 ## Test Type
 This is a `forward-test` and a retry-path skill validation.
 The goal is to verify that a leader using the packaged `orch` skill can reconcile a failed attempt, issue `retry`, and drive the task to success through a second attempt handled by a real worker.
 ## Purpose
 Validate that all of the following can be true at the same time:
 - the leader can use the bundled orch skill to dispatch an initial attempt
 - a worker can fail the first attempt through inbox
 - the leader can reconcile that failure and create a fresh retry attempt
 - the worker can complete the retried attempt
 - the final run reaches `done` and the two attempts map to different threads
 ## Preconditions
 - orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
 - inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
 - bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
 - use an empty temporary directory `TMPDIR`
 - initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
 ## Agent Topology
 - `leader`
 - `worker-a`
 ## Inputs
 ### Leader Prompt
 ```text
 Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_retry_001, 2) add and dispatch one task T1 to worker-a, 3) wait until the first attempt fails, 4) reconcile, 5) retry T1 with a short retry note, 6) wait until the retried attempt completes, 7) reconcile again and inspect final status, 8) stop after reporting RUN_ID, THREAD_ID_1, and THREAD_ID_2. Do not use ordinary chat to coordinate with the worker.
 ```
 ### Worker Prompt
 ```text
 Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the first assigned thread, 2) fail that first attempt with a clear summary, 3) keep watching for retried work assigned to worker-a, 4) fetch and claim the retried thread, 5) finish the retried attempt with done, 6) stop after reporting both THREAD_ID_1 and THREAD_ID_2. Do not use ordinary chat to coordinate with the leader.
 ```
 ## Execution Parameters
 - use the shared execution contract from [README.md](./README.md)
 - use the shared timeout defaults from [README.md](./README.md)
 - do not override the default cleanup policy
 ## Execution Steps
 1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
 2. Inject `skills/orch/` into `leader`
 3. Inject `skills/inbox/` into `worker-a`
 4. Point both agents at the same database path `TMPDIR/coord.db`
 5. Launch `leader` and `worker-a` in parallel
 6. Wait for both agents to finish
 7. Resolve `THREAD_ID_1` and `THREAD_ID_2` from the agent outputs
 8. Independently run the validation commands from the main thread
 ## Validation Commands
 ```bash
 ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_retry_001
 INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_1
 INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_2
 ```
 ## Expected Outcomes
 - the first worker-owned thread reaches `failed`
 - the leader successfully issues `retry`
 - the second worker-owned thread is distinct from the first
 - the second worker-owned thread reaches `done`
 - the final run state is `done`
 ## Assertions
 - `THREAD_ID_1 != THREAD_ID_2`
 - `status.data.run.status == "done"`
 - `status.data.tasks[0].status == "done"`
 - `show THREAD_ID_1` reports a terminal failed thread state
 - `show THREAD_ID_2` reports a terminal done thread state
 - the worker summary confirms that the retried attempt was a new thread rather than a reused one
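A `jq` sketch of the mechanically checkable assertions, assuming `jq` is installed and the validation outputs were saved to files. The function name is hypothetical and the `.data.thread.status` path is borrowed from the happy-path case. The last assertion relies on the worker's own summary, so it stays a manual check.
```bash
#!/usr/bin/env bash
# Hypothetical checker for leader-retries-failed-task-through-bundled-cli.
# $1/$2: THREAD_ID_1 and THREAD_ID_2 from the agent summaries,
# $3: saved status JSON, $4/$5: saved show JSON for each thread. Requires jq.
check_retry_case() {
  local id1="$1" id2="$2" status_json="$3" show1="$4" show2="$5"
  # A retry must create a fresh attempt thread, not reuse the failed one.
  [[ -n "$id1" && -n "$id2" && "$id1" != "$id2" ]] || return 1
  jq -e '.data.run.status == "done" and .data.tasks[0].status == "done"' \
    "$status_json" >/dev/null || return 1
  # Expected terminal state per attempt: first failed, second done.
  jq -e '.data.thread.status == "failed"' "$show1" >/dev/null || return 1
  jq -e '.data.thread.status == "done"' "$show2" >/dev/null
}
```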
 ## Cleanup
 - use the default cleanup policy from [README.md](./README.md)
 - if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
@@ -0,0 +1,90 @@
 # Case: `leader-run-dispatch-reconcile-through-bundled-cli`
 ## Test Type
 This is a `forward-test` and a leader-side happy-path skill validation.
 The goal is to verify that a leader using the packaged `orch` skill can drive a complete run lifecycle while a worker uses the packaged `inbox` skill for thread progress.
 ## Purpose
 Validate that all of the following can be true at the same time:
 - the leader can use the bundled `./assets/orch` CLI through the skill
 - the leader can create a run, add a task, dispatch it, reconcile worker progress, and inspect final status
 - a worker using the bundled inbox skill can claim the dispatched thread and finish it
 - the final orch run state and inbox thread state both reach `done`
 ## Preconditions
 - orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
 - inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
 - bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
 - use an empty temporary directory `TMPDIR`
 - initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
 ## Agent Topology
 - `leader`
 - `worker-a`
 ## Inputs
 ### Leader Prompt
 ```text
 Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_001, 2) add exactly one task T1 assigned to worker-a, 3) dispatch it, 4) wait or poll until the worker reports completion, 5) reconcile the run, 6) inspect final status, 7) stop after reporting RUN_ID and THREAD_ID. Do not use ordinary chat to coordinate with the worker.
 ```
 ### Worker Prompt
 ```text
 Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch pending work for worker-a, 2) claim it, 3) send one in_progress update, 4) finish it with done, 5) stop after reporting the THREAD_ID you handled. Do not use ordinary chat to coordinate with the leader.
 ```
 ## Execution Parameters
 - use the shared execution contract from [README.md](./README.md)
 - use the shared timeout defaults from [README.md](./README.md)
 - do not override the default cleanup policy
 ## Execution Steps
 1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
 2. Inject `skills/orch/` into `leader`
 3. Inject `skills/inbox/` into `worker-a`
 4. Point both agents at the same database path `TMPDIR/coord.db`
 5. Launch `leader` and `worker-a` in parallel
 6. Wait for both agents to finish
 7. Confirm `RUN_ID=run_blog_skill_001` and resolve `THREAD_ID` from the agent outputs
 8. Independently run the validation commands from the main thread
 ## Validation Commands
 ```bash
 ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_001
 INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
 ```
 ## Expected Outcomes
 - `leader` successfully creates `run_blog_skill_001`
 - `leader` successfully adds and dispatches `T1`
 - `worker-a` successfully claims the dispatched thread
 - `worker-a` emits at least one `in_progress` update
 - `worker-a` completes the thread with `done`
 - `leader` successfully reconciles and sees `run.status == "done"`
 ## Assertions
 - `status.data.run.run_id == "run_blog_skill_001"`
 - `status.data.run.status == "done"`
 - `status.data.tasks` contains exactly one task `T1`
 - `status.data.tasks[0].status == "done"`
 - `show.data.thread.status == "done"`
 - `show.data.messages[*].kind` includes `task`, `progress`, and `result`
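These assertions translate almost one-to-one into `jq` checks. A sketch, assuming `jq` is installed and the two validation outputs were saved to files; the function name is hypothetical. The JSON field holding a task's id is not documented in this case, so the sketch checks that exactly one task exists but does not verify that it is named `T1`.
```bash
#!/usr/bin/env bash
# Hypothetical checker for leader-run-dispatch-reconcile-through-bundled-cli.
# $1: saved status JSON, $2: saved show JSON for THREAD_ID. Requires jq.
check_happy_path_case() {
  local status_json="$1" show_json="$2"
  # Task-id field name is not documented here, so only the task count is checked.
  jq -e '.data.run.run_id == "run_blog_skill_001"
         and .data.run.status == "done"
         and (.data.tasks | length) == 1
         and .data.tasks[0].status == "done"' "$status_json" >/dev/null || return 1
  jq -e '.data.thread.status == "done"
         and ([.data.messages[].kind]
              | index("task") != null and index("progress") != null
                and index("result") != null)' "$show_json" >/dev/null
}
```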
 ## Cleanup
 - use the default cleanup policy from [README.md](./README.md)
 - if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
@@ -0,0 +1,90 @@
 # Case: `strict-worktree-dispatch-to-cleanup-through-bundled-cli`
 ## Test Type
 This is a `forward-test` and a worktree-lifecycle skill validation.
 The goal is to verify that a leader using the packaged `orch` skill can allocate a strict worktree, reconcile completion, and clean that worktree up through the bundled CLI while a worker completes the task through inbox.
 ## Purpose
 Validate that all of the following can be true at the same time:
 - the leader can dispatch a code task with `--strict-worktree` through the bundled orch skill
 - the worker can complete the resulting attempt thread through inbox
 - the leader can reconcile the finished task and clean the attempt worktree
 - the final filesystem state matches the cleanup contract
 ## Preconditions
 - orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
 - inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
 - bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
 - use an empty temporary directory `TMPDIR`
 - initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
 - create `TMPDIR/repo` as a Git repository with one committed file before launching role agents
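One way to build the repo fixture, sketched in bash. The helper name, file name, commit message, and committer identity are placeholders; the inline `-c` settings only make the commit independent of the machine's global Git configuration.
```bash
#!/usr/bin/env bash
# Hypothetical helper: create the TMPDIR/repo fixture with one committed file.
# $1: repository directory. Requires git.
make_repo_fixture() {
  local repo="$1"
  mkdir -p "$repo" || return 1
  git -C "$repo" init -q || return 1
  printf 'orch skill worktree fixture\n' >"$repo/README.md" || return 1
  git -C "$repo" add README.md || return 1
  # Inline identity and no signing, so the commit works without global git config.
  git -C "$repo" -c user.name=orch-skill-test -c user.email=orch-skill-test@example.invalid \
    -c commit.gpgsign=false commit -q -m "Initial fixture commit"
}
```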
 ## Agent Topology
 - `leader`
 - `worker-a`
 ## Inputs
 ### Leader Prompt
 ```text
 Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_worktree_001, 2) add one code task T1 for worker-a, 3) dispatch it with --repo-path TMPDIR/repo --workspace-root .orch/worktrees --strict-worktree, 4) record the returned THREAD_ID and WORKTREE_PATH, 5) wait until the worker completes, 6) reconcile, 7) clean up attempt 1, 8) stop after reporting RUN_ID, THREAD_ID, and WORKTREE_PATH. Do not use ordinary chat to coordinate with the worker.
 ```
 ### Worker Prompt
 ```text
 Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the assigned task, 2) inspect the task payload just enough to confirm that a worktree path was provided, 3) finish the task with done, 4) stop after reporting the THREAD_ID you handled and whether you observed a worktree path. Do not use ordinary chat to coordinate with the leader.
 ```
 ## Execution Parameters
 - use the shared execution contract from [README.md](./README.md)
 - use the shared timeout defaults from [README.md](./README.md)
 - do not override the default cleanup policy
 ## Execution Steps
 1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
 2. Create `TMPDIR/repo` with an initial commit before launching agents
 3. Inject `skills/orch/` into `leader`
 4. Inject `skills/inbox/` into `worker-a`
 5. Point both agents at the same database path `TMPDIR/coord.db`
 6. Launch `leader` and `worker-a` in parallel
 7. Wait for both agents to finish
 8. Resolve `THREAD_ID` and `WORKTREE_PATH` from the agent outputs
 9. Independently run the validation commands from the main thread
 ## Validation Commands
 ```bash
 ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_worktree_001
 INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
 test ! -d WORKTREE_PATH
 ```
 ## Expected Outcomes
 - the leader reports a non-empty `WORKTREE_PATH` from dispatch
 - the worker reports that the task payload exposed a worktree path
 - the final run status is `done`
 - the cleanup step removes the worktree directory
 ## Assertions
 - `status.data.run.status == "done"`
 - `status.data.tasks[0].status == "done"`
 - `show.data.thread.status == "done"`
 - the task-side thread history includes a payload field or body content referencing the worktree path
 - `WORKTREE_PATH` does not exist after cleanup
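A `jq` plus shell sketch of these assertions, assuming `jq` is installed; the function name is hypothetical. Because the case allows the worktree path to appear in either a payload field or the body, the sketch serializes each message with `tostring` and does a substring search, which also works if `payload_json` is stored as an encoded string. A path containing characters that JSON escapes, such as quotes, would not match; that limitation is accepted here.
```bash
#!/usr/bin/env bash
# Hypothetical checker for strict-worktree-dispatch-to-cleanup-through-bundled-cli.
# $1: saved status JSON, $2: saved show JSON, $3: WORKTREE_PATH reported by the leader.
# Requires jq.
check_worktree_case() {
  local status_json="$1" show_json="$2" worktree_path="$3"
  [[ -n "$worktree_path" ]] || return 1
  jq -e '.data.run.status == "done" and .data.tasks[0].status == "done"' \
    "$status_json" >/dev/null || return 1
  # Search each message as serialized JSON, so a path in a payload field
  # or in the body both count as a reference.
  jq -e --arg p "$worktree_path" '.data.thread.status == "done"
         and any(.data.messages[]; tostring | contains($p))' "$show_json" >/dev/null || return 1
  # Cleanup must have removed the attempt worktree directory.
  [[ ! -d "$worktree_path" ]]
}
```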
 ## Cleanup
 - use the default cleanup policy from [README.md](./README.md)
 - if the run fails, retain `TMPDIR`, `coord.db`, and the Git repo fixture for replay and manual inspection