Orch Skill Test Plan

Purpose

This directory tracks human-readable test plans for the skills/orch/ Codex skill bundle.

These documents are not command-contract specs for the orch CLI itself. That coverage already lives under ../orch/.

This directory exists to describe a different test surface:

  • whether a leader agent can actually use the packaged orch skill
  • whether the bundled ./assets/orch CLI works inside real skill-guided conversations
  • whether leader-side orchestration driven by the skill reaches the expected run, task, thread, and worktree state

Test Model

  • README.md is the index for this directory
  • each skill test case lives in its own Markdown file
  • use stable case slugs in filenames

Shared Execution Contract

Use these defaults unless a case file explicitly overrides them:

  • run the scenario with real subagents, not simulated transcripts
  • inject skills/orch/ into the leader agent
  • inject skills/inbox/ into worker agents whenever worker-side thread progress is required
  • before launching role agents, initialize the shared SQLite DB by running INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init
  • require the leader to coordinate through the bundled ./assets/orch CLI from the skill instead of ordinary chat
  • require workers to coordinate through the bundled ./assets/inbox CLI from their skill instead of ordinary chat
  • after the agents stop, validate final run and thread state independently by running the validation commands from the main thread
  • create any required Git repo fixture before launching agents for worktree cases
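
The DB initialization step in the contract above can be sketched as a small helper that assembles the command line from the two placeholders. This is a minimal illustration, not part of the skill bundle: the function name `build_init_command` and the example paths are hypothetical, and only the argument order shown in the contract is used.

```python
from pathlib import PurePosixPath


def build_init_command(inbox_skill_path: str, tmpdir: str) -> list[str]:
    """Return the argv for the shared-DB init step named in the contract.

    Mirrors: INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init
    """
    inbox_cli = PurePosixPath(inbox_skill_path) / "assets" / "inbox"
    db_path = PurePosixPath(tmpdir) / "coord.db"
    return [str(inbox_cli), "--db", str(db_path), "--json", "init"]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(build_init_command("/skills/inbox", "/tmp/orch-case")))
```

Returning an argv list rather than a shell string keeps paths with spaces safe if the list is later handed to `subprocess.run`.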

How An Agent Runs These Cases

Use one test-runner agent to execute each case.

The test-runner agent is responsible for:

  • reading this README.md first, then one specific case file
  • creating an isolated temporary directory and DB path for that run
  • initializing the DB once through the bundled inbox CLI before launching role agents
  • creating any required temporary Git repo fixture before launching role agents
  • launching the role agents described in Agent Topology
  • injecting skills/orch/ into the leader and skills/inbox/ into workers
  • passing each role agent the prompt text from the case file with concrete values substituted for ORCH_SKILL_PATH, INBOX_SKILL_PATH, TMPDIR, RUN_ID, THREAD_ID, and WORKTREE_PATH when needed
  • coordinating launch order or parallel start according to the case file
  • collecting agent final summaries as evidence
  • resolving final run ids, thread ids, and worktree paths from agent outputs
  • running the Validation Commands from the main thread after the role agents stop
  • comparing the observed results against Expected Outcomes and Assertions
  • returning a final pass/fail judgment with concrete evidence
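
The placeholder substitution step listed above might look like the sketch below. It assumes the placeholders appear in prompt text as bare upper-case tokens, exactly as this README writes them; the helper name `substitute_placeholders` is hypothetical. Placeholders with no known value yet (for example RUN_ID before the leader has created a run) are left untouched.

```python
import re

# Placeholder names taken from the list of responsibilities above.
PLACEHOLDERS = (
    "ORCH_SKILL_PATH",
    "INBOX_SKILL_PATH",
    "TMPDIR",
    "RUN_ID",
    "THREAD_ID",
    "WORKTREE_PATH",
)

_PATTERN = re.compile(r"\b(" + "|".join(PLACEHOLDERS) + r")\b")


def substitute_placeholders(prompt: str, values: dict[str, str]) -> str:
    """Replace known placeholders with concrete values; keep unknown ones."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        return values.get(name, name)

    return _PATTERN.sub(replace, prompt)


if __name__ == "__main__":
    text = "Run INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init"
    # Hypothetical values, for illustration only.
    print(substitute_placeholders(text, {"INBOX_SKILL_PATH": "/skills/inbox",
                                         "TMPDIR": "/tmp/orch-case"}))
```

The word-boundary match prevents a token such as RUN_ID from being rewritten when it is only part of a longer identifier.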

The role agents are responsible for:

  • acting only within the role assigned in the case file
  • using the injected skill bundle rather than ad hoc repository discovery
  • coordinating through the bundled CLI and shared DB
  • reporting concrete run ids, thread ids, worktree paths, and key command outcomes back to the test-runner agent

The test-runner agent should treat a case as passed only when:

  • all role agents reach a final state without violating the case contract
  • the independent validation commands succeed
  • the final orch and inbox state matches the assertions in the case file

The test-runner agent should treat a case as failed when:

  • any required agent times out or stalls
  • a required orch or inbox action is skipped
  • the leader falls back to ordinary chat for orchestration decisions that should go through orch
  • workers fall back to ordinary chat for progress that should go through inbox
  • the final run, task, thread, or worktree state conflicts with the documented assertions

The test-runner agent should report results in this shape:

  • case
  • db_path
  • run_id
  • thread_ids
  • worktree_paths
  • result: pass or fail
  • agent_summaries
  • validation_evidence
  • assertion_checklist
  • notes
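
One way to hold the report shape above is a small dataclass that serializes to JSON. The field names come from the list; the value types (lists of strings, dictionaries keyed by agent or assertion name) are assumptions made for this sketch, since the README does not specify them.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CaseReport:
    # Field order follows the report shape listed above.
    case: str
    db_path: str
    run_id: str
    thread_ids: list[str]
    worktree_paths: list[str]
    result: str  # "pass" or "fail"
    agent_summaries: dict[str, str]       # assumed: role name -> summary
    validation_evidence: dict[str, str]   # assumed: command -> captured output
    assertion_checklist: dict[str, bool]  # assumed: assertion -> held or not
    notes: str = ""

    def __post_init__(self) -> None:
        if self.result not in ("pass", "fail"):
            raise ValueError(f"result must be 'pass' or 'fail', got {self.result!r}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)
```

Rejecting any result other than pass or fail at construction time keeps an ambiguous judgment from reaching the final report.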

Default Timeouts

Use these defaults unless a case file explicitly overrides them:

  • per-agent timeout: 4m
  • overall scenario timeout: 6m
  • async wait margin for the main thread: 45s
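
The defaults above, and the per-case override rule, can be sketched as follows. The names `parse_duration`, `Timeouts`, and `with_overrides` are hypothetical. This README does not say how the wait margin combines with the other two values, so the sketch only stores it.

```python
import re
from dataclasses import dataclass, replace


def parse_duration(text: str) -> int:
    """Convert a value such as '4m' or '45s' to whole seconds."""
    match = re.fullmatch(r"(\d+)([sm])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * 60 if unit == "m" else amount


@dataclass(frozen=True)
class Timeouts:
    per_agent_s: int = parse_duration("4m")
    overall_s: int = parse_duration("6m")
    wait_margin_s: int = parse_duration("45s")

    def with_overrides(self, **overrides: str) -> "Timeouts":
        """Return a copy with case-file overrides such as per_agent_s='2m'."""
        parsed = {name: parse_duration(value) for name, value in overrides.items()}
        return replace(self, **parsed)
```

A case file that overrides only the per-agent timeout would then call `Timeouts().with_overrides(per_agent_s="2m")` and inherit the other two defaults.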

Default Failure Conditions

Treat the test as failed if any of the following happens:

  • any required agent does not reach a final state before timeout
  • any required orch or inbox command returns a non-success result unless the case expects that failure
  • the final orch status output does not match the expected run or task state
  • the final inbox show output does not match the expected thread or message history
  • a required worktree is missing before cleanup is expected, or is still present after cleanup in a cleanup case
  • the agents fall back to ordinary chat for critical coordination instead of the bundled CLIs
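
The worktree condition in the list above has two directions, which a validation helper needs to distinguish. The sketch below assumes a worktree counts as present when its path exists as a directory; the function name `worktree_problems` and the `cleanup_done` flag are hypothetical.

```python
from pathlib import Path


def worktree_problems(worktree_paths: list[str], cleanup_done: bool) -> list[str]:
    """Return one message per worktree whose presence contradicts the phase."""
    problems = []
    for raw in worktree_paths:
        exists = Path(raw).is_dir()
        if cleanup_done and exists:
            problems.append(f"worktree still present after cleanup: {raw}")
        elif not cleanup_done and not exists:
            problems.append(f"worktree missing before cleanup: {raw}")
    return problems
```

An empty list means the worktree state is consistent with the phase; any entry is grounds for failing the case.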

Evidence Capture

Collect at least the following artifacts for every run:

  • agent final summaries
  • final orch status --run RUN_ID --json output
  • final inbox show --thread THREAD_ID --json output for every relevant thread
  • any blocked, wait, retry, reassign, or cleanup output relevant to the case
  • the temporary DB path, resolved run id, resolved thread ids, and any worktree paths
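
The two final-state commands named above can be generated per run as in this sketch. It uses only the subcommands and flags written in this README. The bare `orch` and `inbox` program names are placeholders for the bundled ./assets binaries, and how the DB path is passed to these commands is left to the case file. The helper name `evidence_commands` is hypothetical.

```python
def evidence_commands(run_id: str, thread_ids: list[str]) -> list[list[str]]:
    """Build argv lists for the final status and thread-history captures."""
    commands = [["orch", "status", "--run", run_id, "--json"]]
    for thread_id in thread_ids:
        # One show command per relevant thread, as the list above requires.
        commands.append(["inbox", "show", "--thread", thread_id, "--json"])
    return commands


if __name__ == "__main__":
    # Hypothetical ids, for illustration only.
    for argv in evidence_commands("run-1", ["thread-a", "thread-b"]):
        print(" ".join(argv))
```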

Cleanup Policy

Use these defaults unless a case file explicitly overrides them:

  • keep the temporary DB, repo fixture, and working directory on failure for debugging
  • clean up the temporary working directory on success only if the caller does not need replay artifacts
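
The default policy above reduces to a single condition, sketched here. The function name `apply_cleanup_policy` and its parameters are hypothetical; the working directory is assumed to contain the temporary DB and any repo fixture.

```python
import shutil


def apply_cleanup_policy(workdir: str, passed: bool, keep_replay_artifacts: bool) -> bool:
    """Delete workdir only on success when replay artifacts are not needed.

    Returns True when the directory was removed, False when it was kept.
    """
    if passed and not keep_replay_artifacts:
        shutil.rmtree(workdir)
        return True
    # On failure, or when the caller wants replay artifacts, keep everything.
    return False
```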

Per-Case Template

Each case file should use this structure:

  • Test Type
  • Purpose
  • Preconditions
  • Agent Topology
  • Inputs
  • Execution Parameters
  • Execution Steps
  • Validation Commands
  • Expected Outcomes
  • Assertions
  • Cleanup
  • Recorded Example Run when a real run has already been captured
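
A case file can be checked against this structure mechanically. The sketch below assumes each section appears as a Markdown heading line (one or more `#` followed by the section name), which this README does not state explicitly; `missing_sections` is a hypothetical helper. Recorded Example Run is treated as optional because it applies only when a run has been captured.

```python
import re

REQUIRED_SECTIONS = [
    "Test Type",
    "Purpose",
    "Preconditions",
    "Agent Topology",
    "Inputs",
    "Execution Parameters",
    "Execution Steps",
    "Validation Commands",
    "Expected Outcomes",
    "Assertions",
    "Cleanup",
]

_HEADING = re.compile(r"^#+[ \t]+(.+?)[ \t]*$", flags=re.MULTILINE)


def missing_sections(case_markdown: str) -> list[str]:
    """Return required section names that have no matching heading."""
    headings = {match.group(1) for match in _HEADING.finditer(case_markdown)}
    return [name for name in REQUIRED_SECTIONS if name not in headings]
```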

Case Files

| Case Slug | File | Coverage Note |
| --- | --- | --- |
| leader-run-dispatch-reconcile-through-bundled-cli | leader-run-dispatch-reconcile-through-bundled-cli.md | validates that a leader can drive a complete run -> task -> dispatch -> reconcile -> status happy path through the packaged orch skill |
| leader-blocked-answer-resume-through-bundled-cli | leader-blocked-answer-resume-through-bundled-cli.md | validates that a leader can observe a blocked task, answer it through orch, and reach final completion with a real worker |
| strict-worktree-dispatch-to-cleanup-through-bundled-cli | strict-worktree-dispatch-to-cleanup-through-bundled-cli.md | validates that the skill can drive strict worktree allocation, reconcile completion, and cleanup through the bundled orch CLI |
| leader-retries-failed-task-through-bundled-cli | leader-retries-failed-task-through-bundled-cli.md | validates that a leader can reconcile a failed attempt and create a successful retry through the packaged orch skill |
| leader-reassigns-blocked-task-through-bundled-cli | leader-reassigns-blocked-task-through-bundled-cli.md | validates that a leader can reassign a blocked task from one worker to another and close the run through the packaged orch skill |

Scope

In scope:

  • explicit $orch skill invocation
  • bundled ./assets/orch CLI usage
  • leader-side run, task, dependency, dispatch, reconcile, answer, retry, reassign, wait, status, and cleanup flows
  • interaction between a leader using skills/orch/ and workers using skills/inbox/
  • worktree-backed dispatch and cleanup validation
  • end-to-end run state and thread history validation

Out of scope:

  • per-command flag and JSON contract coverage for orch
  • worker-only skill behavior that already belongs under ../inbox-skill/
  • the separate council-review skill package
  • implicit skill triggering without $orch

Relationship To Other Test Docs

  • ../orch/ covers CLI command behavior
  • ../inbox-skill/ covers worker-side skill-guided behavior on top of inbox
  • this directory covers leader-side skill-guided behavior on top of orch