Orch Skill Test Plan
Purpose
This directory tracks human-readable test plans for the skills/orch/ Codex skill bundle.
These documents are not command-contract specs for the orch CLI itself.
That coverage already lives under ../orch/.
This directory exists to describe a different test surface:
- whether a leader agent can actually use the packaged `orch` skill
- whether the bundled `./assets/orch` CLI works inside real skill-guided conversations
- whether leader-side orchestration driven by the skill reaches the expected run, task, thread, and worktree state
Test Model
- `README.md` is the index for this directory
- each skill test case lives in its own Markdown file
- use stable case slugs in filenames
Shared Execution Contract
Use these defaults unless a case file explicitly overrides them:
- run the scenario with real subagents, not simulated transcripts
- inject `skills/orch/` into the leader agent
- inject `skills/inbox/` into worker agents whenever worker-side thread progress is required
- initialize the shared SQLite DB before launching role agents with `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
- require the leader to coordinate through the bundled `./assets/orch` CLI from the skill instead of ordinary chat
- require workers to coordinate through the bundled `./assets/inbox` CLI from their skill instead of ordinary chat
- validate final run and thread state independently from the main thread after the agents stop
- create any required Git repo fixture before launching agents for worktree cases
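The DB initialization step above can be sketched as a small helper that a test-runner might use to build the command line. This is a minimal sketch, not the project's tooling: the function names `make_run_dir` and `build_init_command` are hypothetical, and only the argument order `--db TMPDIR/coord.db --json init` comes from this README.

```python
import tempfile


def make_run_dir(prefix: str = "orch-skill-case-") -> str:
    """Create an isolated temporary directory for one scenario run."""
    return tempfile.mkdtemp(prefix=prefix)


def build_init_command(inbox_skill_path: str, tmpdir: str) -> list[str]:
    """Return the argv for the documented init call:
    INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init
    """
    inbox_bin = f"{inbox_skill_path.rstrip('/')}/assets/inbox"
    db_path = f"{tmpdir.rstrip('/')}/coord.db"
    return [inbox_bin, "--db", db_path, "--json", "init"]
```

The returned list can be handed to `subprocess.run(..., check=True)` once, before any role agent is launched, so that every agent sees an already initialized DB.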
How An Agent Runs These Cases
Use one test-runner agent to execute each case.
The test-runner agent is responsible for:
- reading this `README.md` first, then one specific case file
- creating an isolated temporary directory and DB path for that run
- initializing the DB once through the bundled inbox CLI before launching role agents
- creating any required temporary Git repo fixture before launching role agents
- launching the role agents described in `Agent Topology`
- injecting `skills/orch/` into the leader and `skills/inbox/` into workers
- passing each role agent the prompt text from the case file with concrete values substituted for `ORCH_SKILL_PATH`, `INBOX_SKILL_PATH`, `TMPDIR`, `RUN_ID`, `THREAD_ID`, and `WORKTREE_PATH` when needed
- coordinating launch order or parallel start according to the case file
- collecting agent final summaries as evidence
- resolving final run ids, thread ids, and worktree paths from agent outputs
- running the `Validation Commands` from the main thread after the role agents stop
- comparing the observed results against `Expected Outcomes` and `Assertions`
- returning a final pass/fail judgment with concrete evidence
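The placeholder substitution step in the list above could look roughly like the following. This is an illustrative sketch: `substitute_placeholders` is a hypothetical helper, and the choice to raise an error when a prompt mentions a placeholder that has no value yet is an assumption, not a documented rule.

```python
import re

# Placeholder names taken from this README.
PLACEHOLDERS = (
    "ORCH_SKILL_PATH",
    "INBOX_SKILL_PATH",
    "TMPDIR",
    "RUN_ID",
    "THREAD_ID",
    "WORKTREE_PATH",
)
_PATTERN = re.compile(r"\b(" + "|".join(PLACEHOLDERS) + r")\b")


def substitute_placeholders(prompt: str, values: dict[str, str]) -> str:
    """Replace each placeholder that appears in the prompt with its concrete value.

    Placeholders that do not appear in the prompt need no value.
    Assumption: a placeholder that appears but has no value is an error.
    """

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no concrete value supplied for {name}")
        return values[name]

    return _PATTERN.sub(replace, prompt)
```

Matching on word boundaries keeps a longer identifier that merely contains a placeholder name (for example `MY_RUN_ID`) untouched.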
The role agents are responsible for:
- acting only within the role assigned in the case file
- using the injected skill bundle rather than ad hoc repository discovery
- coordinating through the bundled CLI and shared DB
- reporting concrete run ids, thread ids, worktree paths, and key command outcomes back to the test-runner agent
The test-runner agent should treat a case as passed only when:
- all role agents reach a final state without violating the case contract
- the independent validation commands succeed
- the final orch and inbox state matches the assertions in the case file
The test-runner agent should treat a case as failed when:
- any required agent times out or stalls
- a required orch or inbox action is skipped
- the leader falls back to ordinary chat for orchestration decisions that should go through `orch`
- workers fall back to ordinary chat for progress that should go through `inbox`
- the final run, task, thread, or worktree state conflicts with the documented assertions
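The pass and fail rules above reduce to a conjunction of a few checks. The sketch below shows one way to encode them; the `CaseObservation` fields and the `judge_case` function are hypothetical names chosen for illustration, and how each boolean is actually determined is left to the case file.

```python
from dataclasses import dataclass


@dataclass
class CaseObservation:
    # All field names are illustrative, not part of any documented schema.
    agents_reached_final_state: bool  # False if any required agent timed out or stalled
    contract_violations: list[str]    # e.g. skipped action, chat fallback
    validation_commands_ok: bool      # independent validation from the main thread
    assertions_matched: bool          # final orch and inbox state vs. case assertions


def judge_case(obs: CaseObservation) -> tuple[str, list[str]]:
    """Return ("pass" | "fail", reasons). Any single failed check fails the case."""
    reasons: list[str] = []
    if not obs.agents_reached_final_state:
        reasons.append("a required agent timed out or stalled")
    reasons.extend(obs.contract_violations)
    if not obs.validation_commands_ok:
        reasons.append("independent validation commands failed")
    if not obs.assertions_matched:
        reasons.append("final state conflicts with the documented assertions")
    return ("pass" if not reasons else "fail", reasons)
```

Returning the reasons alongside the verdict keeps the judgment tied to concrete evidence rather than a bare boolean.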
The test-runner agent should report results in this shape:
- `case`
- `db_path`
- `run_id`
- `thread_ids`
- `worktree_paths`
- `result`: `pass` or `fail`
- `agent_summaries`
- `validation_evidence`
- `assertion_checklist`
- `notes`
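For a runner that emits the result programmatically, the report shape above maps naturally onto a record type. The following is a sketch only: the field names come from this README, while the field types (for example a name-to-boolean mapping for `assertion_checklist`) and the JSON serialization are assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CaseReport:
    # Field names follow the README; the types are assumed for illustration.
    case: str
    db_path: str
    run_id: str
    thread_ids: list[str]
    worktree_paths: list[str]
    result: str  # "pass" or "fail"
    agent_summaries: dict[str, str]
    validation_evidence: dict[str, str]
    assertion_checklist: dict[str, bool]
    notes: str

    def __post_init__(self) -> None:
        if self.result not in ("pass", "fail"):
            raise ValueError(f"result must be 'pass' or 'fail', got {self.result!r}")

    def to_json(self) -> str:
        """Serialize with keys in the documented order."""
        return json.dumps(asdict(self), indent=2)
```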
Default Timeouts
Use these defaults unless a case file explicitly overrides them:
- per-agent timeout: `4m`
- overall scenario timeout: `6m`
- async wait margin for the main thread: `45s`
Default Failure Conditions
Treat the test as failed if any of the following happens:
- any required agent does not reach a final state before timeout
- any required orch or inbox command returns a non-success result unless the case expects that failure
- the final `orch status` output does not match the expected run or task state
- the final `inbox show` output does not match the expected thread or message history
- a required worktree is missing too early or still present after cleanup in a cleanup case
- the agents fall back to ordinary chat for critical coordination instead of the bundled CLIs
Evidence Capture
Collect at least the following artifacts for every run:
- agent final summaries
- final `orch status --run RUN_ID --json` output
- final `inbox show --thread THREAD_ID --json` output for every relevant thread
- any `blocked`, `wait`, `retry`, `reassign`, or `cleanup` output relevant to the case
- the temporary DB path, resolved run id, resolved thread ids, and any worktree paths
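The two final-state commands in the list above can be assembled and captured from the main thread along these lines. This is a hedged sketch: `evidence_commands` and `capture` are hypothetical helpers, the 60 second default is arbitrary, and the argument lists contain only the flags shown in this README. Whether these commands also need a `--db` argument is not stated here, so none is added.

```python
import subprocess


def evidence_commands(
    orch_bin: str, inbox_bin: str, run_id: str, thread_ids: list[str]
) -> list[list[str]]:
    """Build `orch status --run RUN_ID --json` plus one `inbox show` per thread."""
    commands = [[orch_bin, "status", "--run", run_id, "--json"]]
    for thread_id in thread_ids:
        commands.append([inbox_bin, "show", "--thread", thread_id, "--json"])
    return commands


def capture(command: list[str], timeout_s: float = 60.0) -> dict:
    """Run one command and keep its exit code and output as evidence."""
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    return {
        "command": command,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

Keeping the exit code and stderr next to stdout makes it possible to tell an expected failure from an unexpected one when reviewing a run.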
Cleanup Policy
Use these defaults unless a case file explicitly overrides them:
- keep the temporary DB, repo fixture, and working directory on failure for debugging
- clean up the temporary working directory on success only if the caller does not need replay artifacts
Per-Case Template
Each case file should use this structure:
- `Test Type`
- `Purpose`
- `Preconditions`
- `Agent Topology`
- `Inputs`
- `Execution Parameters`
- `Execution Steps`
- `Validation Commands`
- `Expected Outcomes`
- `Assertions`
- `Cleanup`
- `Recorded Example Run` when a real run has already been captured
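A quick structural check can catch a case file that is missing one of these sections. The sketch below assumes each section name appears on its own line, optionally prefixed with Markdown `#` markers; that layout is an assumption, and `missing_sections` is a hypothetical helper. `Recorded Example Run` is treated as optional because it applies only when a run has been captured.

```python
REQUIRED_SECTIONS = (
    "Test Type",
    "Purpose",
    "Preconditions",
    "Agent Topology",
    "Inputs",
    "Execution Parameters",
    "Execution Steps",
    "Validation Commands",
    "Expected Outcomes",
    "Assertions",
    "Cleanup",
)


def missing_sections(case_text: str) -> list[str]:
    """Return required section names that never appear as a heading line."""
    # Strip optional leading '#' markers and surrounding whitespace from each line.
    headings = {line.lstrip("#").strip() for line in case_text.splitlines()}
    return [name for name in REQUIRED_SECTIONS if name not in headings]
```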
Case Files
| Case Slug | File | Coverage Note |
|---|---|---|
| `leader-run-dispatch-reconcile-through-bundled-cli` | leader-run-dispatch-reconcile-through-bundled-cli.md | validates that a leader can drive a complete run -> task -> dispatch -> reconcile -> status happy path through the packaged orch skill |
| `leader-blocked-answer-resume-through-bundled-cli` | leader-blocked-answer-resume-through-bundled-cli.md | validates that a leader can observe a blocked task, answer it through orch, and reach final completion with a real worker |
| `strict-worktree-dispatch-to-cleanup-through-bundled-cli` | strict-worktree-dispatch-to-cleanup-through-bundled-cli.md | validates that the skill can drive strict worktree allocation, reconcile completion, and cleanup through the bundled orch CLI |
| `leader-retries-failed-task-through-bundled-cli` | leader-retries-failed-task-through-bundled-cli.md | validates that a leader can reconcile a failed attempt and create a successful retry through the packaged orch skill |
| `leader-reassigns-blocked-task-through-bundled-cli` | leader-reassigns-blocked-task-through-bundled-cli.md | validates that a leader can reassign a blocked task from one worker to another and close the run through the packaged orch skill |
Scope
In scope:
- explicit `$orch` skill invocation
- bundled `./assets/orch` CLI usage
- leader-side run, task, dependency, dispatch, reconcile, answer, retry, reassign, wait, status, and cleanup flows
- interaction between a leader using `skills/orch/` and workers using `skills/inbox/`
- worktree-backed dispatch and cleanup validation
- end-to-end run state and thread history validation
Out of scope:
- per-command flag and JSON contract coverage for `orch`
- worker-only skill behavior that already belongs under ../inbox-skill/
- the separate `council-review` skill package
- implicit skill triggering without `$orch`
Relationship To Other Test Docs
- ../orch/ covers CLI command behavior
- ../inbox-skill/ covers worker-side skill-guided behavior on top of inbox
- this directory covers leader-side skill-guided behavior on top of `orch`