whether a leader agent can actually use the packaged orch skill
whether the bundled ./assets/orch CLI works inside real skill-guided conversations
whether leader-side orchestration driven by the skill reaches the expected run, task, thread, and worktree state

Test Model

Shared Execution Contract

Use these defaults unless a case file explicitly overrides them:

run the scenario with real subagents, not simulated transcripts
inject skills/orch/ into the leader agent
inject skills/inbox/ into worker agents whenever worker-side thread progress is required
initialize the shared SQLite DB before launching role agents by running INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init
require the leader to coordinate through the bundled ./assets/orch CLI from the skill instead of ordinary chat
require workers to coordinate through the bundled ./assets/inbox CLI from their skill instead of ordinary chat
validate final run and thread state independently from the main thread after the agents stop
create any required Git repo fixture before launching agents for worktree cases
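
The setup defaults above can be sketched as a small helper, assuming the test runner is scripted in Python. The function names `prepare_run_dir` and `build_init_command` are hypothetical; only the `--db`, `--json`, and `init` arguments and the `assets/inbox` and `coord.db` paths come from this README.

```python
import tempfile
from pathlib import Path


def prepare_run_dir(prefix: str = "orch-case-") -> Path:
    """Create an isolated temporary directory for one case run."""
    return Path(tempfile.mkdtemp(prefix=prefix))


def build_init_command(inbox_skill_path: str, tmpdir: Path) -> list[str]:
    """Build the argv for the one-time DB init through the bundled inbox CLI.

    Mirrors: INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init
    """
    inbox_cli = str(Path(inbox_skill_path) / "assets" / "inbox")
    db_path = str(Path(tmpdir) / "coord.db")
    return [inbox_cli, "--db", db_path, "--json", "init"]
```

The returned list could be passed to `subprocess.run(cmd, check=True)` once, before any role agent is launched.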

Use one test-runner agent to execute each case.

The test-runner agent is responsible for:

reading this README.md first, then one specific case file
creating an isolated temporary directory and DB path for that run
initializing the DB once through the bundled inbox CLI before launching role agents
creating any required temporary Git repo fixture before launching role agents
launching the role agents described in Agent Topology
injecting skills/orch/ into the leader and skills/inbox/ into workers
passing each role agent the prompt text from the case file with concrete values substituted for ORCH_SKILL_PATH, INBOX_SKILL_PATH, TMPDIR, RUN_ID, THREAD_ID, and WORKTREE_PATH when needed
coordinating launch order or parallel start according to the case file
collecting agent final summaries as evidence
resolving final run ids, thread ids, and worktree paths from agent outputs
running the Validation Commands from the main thread after the role agents stop
comparing the observed results against Expected Outcomes and Assertions
returning a final pass/fail judgment with concrete evidence
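
The placeholder substitution step listed above might look like the following. This is a minimal sketch, not the suite's actual implementation: `substitute_placeholders` is a hypothetical helper, and leaving unresolved placeholders untouched is an assumption based on the "when needed" wording.

```python
import re

# Placeholder names listed in this README.
PLACEHOLDERS = (
    "ORCH_SKILL_PATH",
    "INBOX_SKILL_PATH",
    "TMPDIR",
    "RUN_ID",
    "THREAD_ID",
    "WORKTREE_PATH",
)

_PATTERN = re.compile(r"\b(" + "|".join(PLACEHOLDERS) + r")\b")


def substitute_placeholders(prompt: str, values: dict[str, str]) -> str:
    """Replace known placeholders with concrete values.

    Placeholders without a value in `values` are left as-is (assumption),
    so a case that does not need WORKTREE_PATH is not forced to supply one.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        return values.get(name, name)

    return _PATTERN.sub(replace, prompt)
```

Using a replacement function rather than a replacement string keeps backslashes in substituted paths from being interpreted as regex escapes.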

The role agents are responsible for:

acting only within the role assigned in the case file
using the injected skill bundle rather than ad hoc repository discovery
coordinating through the bundled CLI and shared DB
reporting concrete run ids, thread ids, worktree paths, and key command outcomes back to the test-runner agent

The test-runner agent should treat a case as passed only when:

The test-runner agent should treat a case as failed when:

any required agent times out or stalls
a required orch or inbox action is skipped
the leader falls back to ordinary chat for orchestration decisions that should go through orch
workers fall back to ordinary chat for progress that should go through inbox
the final run, task, thread, or worktree state conflicts with the documented assertions

The test-runner agent should report results in this shape:

Use these defaults unless a case file explicitly overrides them:

Treat the test as failed if any of the following happens:

any required agent does not reach a final state before timeout
any required orch or inbox command returns a non-success result unless the case expects that failure
the final orch status output does not match the expected run or task state
the final inbox show output does not match the expected thread or message history
a required worktree is removed before the case expects cleanup, or is still present after cleanup in a cleanup case
the agents fall back to ordinary chat for critical coordination instead of the bundled CLIs
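
The failure conditions above can be expressed as a simple checklist. The sketch below is illustrative only: the `CaseObservations` fields and the `failure_reasons` function are hypothetical names, and how each observation is actually gathered is left to the test runner.

```python
from dataclasses import dataclass


@dataclass
class CaseObservations:
    """What the test runner observed for one case (hypothetical shape)."""
    all_agents_finished: bool          # every required agent reached a final state
    unexpected_command_failures: int   # non-success results the case did not expect
    run_state_matches: bool            # final orch status matches expectations
    thread_state_matches: bool         # final inbox show matches expectations
    worktree_state_matches: bool       # worktree present or absent as expected
    used_chat_for_coordination: bool   # agents bypassed the bundled CLIs


def failure_reasons(obs: CaseObservations) -> list[str]:
    """Return one reason per violated condition; an empty list means no failure."""
    reasons = []
    if not obs.all_agents_finished:
        reasons.append("agent did not reach a final state before timeout")
    if obs.unexpected_command_failures > 0:
        reasons.append("unexpected non-success orch or inbox command result")
    if not obs.run_state_matches:
        reasons.append("final orch status does not match expected run or task state")
    if not obs.thread_state_matches:
        reasons.append("final inbox show does not match expected thread history")
    if not obs.worktree_state_matches:
        reasons.append("worktree state does not match the case expectations")
    if obs.used_chat_for_coordination:
        reasons.append("critical coordination fell back to ordinary chat")
    return reasons
```

Returning every violated condition, rather than stopping at the first, gives the final judgment concrete evidence for each failure.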

Collect at least the following artifacts for every run:

agent final summaries
final orch status --run RUN_ID --json output
final inbox show --thread THREAD_ID --json output for every relevant thread
any blocked, wait, retry, reassign, or cleanup output relevant to the case
the temporary DB path, resolved run id, resolved thread ids, and any worktree paths
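
As a rough illustration of the artifact collection above, the final validation commands could be assembled like this. The helper name is hypothetical, the bare `orch` and `inbox` executables stand in for the bundled CLI paths, and any `--db` argument is omitted because the artifact list above does not show one; a real run would likely need it.

```python
def build_validation_commands(run_id: str, thread_ids: list[str]) -> list[list[str]]:
    """Build argv lists for the final status and thread checks.

    One `orch status` command for the run, then one `inbox show`
    command per relevant thread, in the order given.
    """
    commands = [["orch", "status", "--run", run_id, "--json"]]
    for thread_id in thread_ids:
        commands.append(["inbox", "show", "--thread", thread_id, "--json"])
    return commands
```

Each command would be run from the main thread after the role agents stop, and its JSON output stored alongside the agent summaries.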

Use these defaults unless a case file explicitly overrides them:

keep the temporary DB, repo fixture, and working directory on failure for debugging
clean up the temporary working directory on success only if the caller does not need replay artifacts
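
The cleanup defaults above reduce to a single decision. A minimal sketch, assuming the DB and repo fixture live inside the temporary working directory; `finalize_workdir` and its parameters are hypothetical names.

```python
import shutil
from pathlib import Path


def finalize_workdir(workdir: Path, passed: bool, keep_replay_artifacts: bool = False) -> bool:
    """Remove the working directory only on success without replay needs.

    Returns True if the directory was removed, False if it was kept
    (on failure, or when the caller asked to keep replay artifacts).
    """
    if passed and not keep_replay_artifacts:
        shutil.rmtree(workdir, ignore_errors=True)
        return True
    return False
```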

Each case file should use this structure:

| Case Slug | File | Coverage Note |
| --- | --- | --- |
| `leader-run-dispatch-reconcile-through-bundled-cli` | leader-run-dispatch-reconcile-through-bundled-cli.md | validates that a leader can drive a complete `run -> task -> dispatch -> reconcile -> status` happy path through the packaged orch skill |
| `leader-blocked-answer-resume-through-bundled-cli` | leader-blocked-answer-resume-through-bundled-cli.md | validates that a leader can observe a blocked task, answer it through `orch`, and reach final completion with a real worker |
| `strict-worktree-dispatch-to-cleanup-through-bundled-cli` | strict-worktree-dispatch-to-cleanup-through-bundled-cli.md | validates that the skill can drive strict worktree allocation, reconcile completion, and cleanup through the bundled orch CLI |
| `leader-retries-failed-task-through-bundled-cli` | leader-retries-failed-task-through-bundled-cli.md | validates that a leader can reconcile a failed attempt and create a successful retry through the packaged orch skill |
| `leader-reassigns-blocked-task-through-bundled-cli` | leader-reassigns-blocked-task-through-bundled-cli.md | validates that a leader can reassign a blocked task from one worker to another and close the run through the packaged orch skill |

In scope:

explicit $orch skill invocation
bundled ./assets/orch CLI usage
leader-side run, task, dependency, dispatch, reconcile, answer, retry, reassign, wait, status, and cleanup flows
interaction between a leader using skills/orch/ and workers using skills/inbox/
worktree-backed dispatch and cleanup validation
end-to-end run state and thread history validation

Out of scope: