# Orch Skill Test Plan

## Purpose

This directory tracks human-readable test plans for the `skills/orch/` Codex skill bundle.

These documents are not command-contract specs for the `orch` CLI itself.
That coverage already lives under [../orch/](../orch/).

This directory exists to describe a different test surface:

- whether a leader agent can actually use the packaged `orch` skill
- whether the bundled `./assets/orch` CLI works inside real skill-guided conversations
- whether leader-side orchestration driven by the skill reaches the expected run, task, thread, and worktree state

## Test Model

- `README.md` is the index for this directory
- each skill test case lives in its own Markdown file
- use stable case slugs in filenames

## Shared Execution Contract

Use these defaults unless a case file explicitly overrides them:

- run the scenario with real subagents, not simulated transcripts
- inject `skills/orch/` into the leader agent
- inject `skills/inbox/` into worker agents whenever worker-side thread progress is required
- initialize the shared SQLite DB before launching role agents with `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
- require the leader to coordinate through the bundled `./assets/orch` CLI from the skill instead of ordinary chat
- require workers to coordinate through the bundled `./assets/inbox` CLI from their skill instead of ordinary chat
- validate final run and thread state independently from the main thread after the agents stop
- create any required Git repo fixture before launching agents for worktree cases
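
As a minimal sketch of the DB initialization step in the contract above, a test runner might assemble the init command like this. The function name `build_init_command` and the example paths are hypothetical; only the command shape (`INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`) comes from this document.

```python
from pathlib import PurePosixPath


def build_init_command(inbox_skill_path: str, tmpdir: str) -> list[str]:
    """Return the argv that initializes the shared SQLite DB.

    Mirrors the documented command:
    INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init
    """
    inbox_cli = PurePosixPath(inbox_skill_path) / "assets" / "inbox"
    db_path = PurePosixPath(tmpdir) / "coord.db"
    return [str(inbox_cli), "--db", str(db_path), "--json", "init"]


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the substitution.
    print(build_init_command("/skills/inbox", "/tmp/orch-case"))
```

The resulting list can be handed to `subprocess.run` once, before any role agent is launched.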

## How An Agent Runs These Cases

Use one test-runner agent to execute each case.

The test-runner agent is responsible for:

- reading this `README.md` first, then one specific case file
- creating an isolated temporary directory and DB path for that run
- initializing the DB once through the bundled inbox CLI before launching role agents
- creating any required temporary Git repo fixture before launching role agents
- launching the role agents described in `Agent Topology`
- injecting `skills/orch/` into the leader and `skills/inbox/` into workers
- passing each role agent the prompt text from the case file with concrete values substituted for `ORCH_SKILL_PATH`, `INBOX_SKILL_PATH`, `TMPDIR`, `RUN_ID`, `THREAD_ID`, and `WORKTREE_PATH` when needed
- coordinating launch order or parallel start according to the case file
- collecting agent final summaries as evidence
- resolving final run ids, thread ids, and worktree paths from agent outputs
- running the `Validation Commands` from the main thread after the role agents stop
- comparing the observed results against `Expected Outcomes` and `Assertions`
- returning a final pass/fail judgment with concrete evidence
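
The placeholder substitution step in the list above could be sketched as follows. The helper name `render_prompt` and the choice to raise on a missing value are assumptions made for illustration; the placeholder names are the ones listed in this document.

```python
import re

# Placeholder names taken from this README.
PLACEHOLDERS = (
    "ORCH_SKILL_PATH",
    "INBOX_SKILL_PATH",
    "TMPDIR",
    "RUN_ID",
    "THREAD_ID",
    "WORKTREE_PATH",
)

# Match whole placeholder tokens only, so "MY_RUN_ID" is left untouched.
_PATTERN = re.compile(r"\b(" + "|".join(PLACEHOLDERS) + r")\b")


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Substitute concrete values for the placeholders used in a case prompt.

    A value is only required for placeholders that actually occur in the
    template, matching the "when needed" wording above.
    """

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no concrete value supplied for {name}")
        return values[name]

    return _PATTERN.sub(replace, template)
```

Failing loudly on a missing value keeps an unresolved placeholder from reaching a role agent as literal text.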

The role agents are responsible for:

- acting only within the role assigned in the case file
- using the injected skill bundle rather than ad hoc repository discovery
- coordinating through the bundled CLI and shared DB
- reporting concrete run ids, thread ids, worktree paths, and key command outcomes back to the test-runner agent

The test-runner agent should treat a case as passed only when:

- all role agents reach a final state without violating the case contract
- the independent validation commands succeed
- the final orch and inbox state matches the assertions in the case file

The test-runner agent should treat a case as failed when:

- any required agent times out or stalls
- a required orch or inbox action is skipped
- the leader falls back to ordinary chat for orchestration decisions that should go through `orch`
- workers fall back to ordinary chat for progress that should go through `inbox`
- the final run, task, thread, or worktree state conflicts with the documented assertions
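
The pass and fail criteria above reduce to a single conjunction. The following is a sketch under assumed names (`CaseObservations`, `judge`); how each observation is gathered is left to the test runner and the case file.

```python
from dataclasses import dataclass


@dataclass
class CaseObservations:
    """Hypothetical summary of what the test runner observed for one case."""

    agents_reached_final_state: bool      # no required agent timed out or stalled
    contract_violations: list[str]        # e.g. skipped action, chat fallback
    validation_commands_succeeded: bool   # independent checks from the main thread
    final_state_matches_assertions: bool  # orch and inbox state vs. case assertions


def judge(obs: CaseObservations) -> str:
    """Return "pass" only when every pass criterion holds, else "fail"."""
    passed = (
        obs.agents_reached_final_state
        and not obs.contract_violations
        and obs.validation_commands_succeeded
        and obs.final_state_matches_assertions
    )
    return "pass" if passed else "fail"
```

Recording violations as a list of strings rather than a flag lets the same data feed the evidence that accompanies the final judgment.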

The test-runner agent should report results in this shape:

- `case`
- `db_path`
- `run_id`
- `thread_ids`
- `worktree_paths`
- `result`: `pass` or `fail`
- `agent_summaries`
- `validation_evidence`
- `assertion_checklist`
- `notes`
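
One way to hold the report fields listed above is a small dataclass. The field names and the `pass`/`fail` values come from this document; the field types and the class name `CaseReport` are assumptions, since the README does not specify them.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CaseReport:
    """Report fields in the order listed in this README; types are assumed."""

    case: str
    db_path: str
    run_id: str
    thread_ids: list[str]
    worktree_paths: list[str]
    result: str  # "pass" or "fail"
    agent_summaries: dict[str, str]
    validation_evidence: dict[str, str]
    assertion_checklist: dict[str, bool]
    notes: str

    def __post_init__(self) -> None:
        if self.result not in ("pass", "fail"):
            raise ValueError("result must be 'pass' or 'fail'")

    def to_json(self) -> str:
        """Serialize the report, keeping the documented field order."""
        return json.dumps(asdict(self), indent=2)
```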

## Default Timeouts

Use these defaults unless a case file explicitly overrides them:

- per-agent timeout: `4m`
- overall scenario timeout: `6m`
- async wait margin for the main thread: `45s`

## Default Failure Conditions

Treat the test as failed if any of the following happens:

- any required agent does not reach a final state before timeout
- any required orch or inbox command returns a non-success result unless the case expects that failure
- the final `orch status` output does not match the expected run or task state
- the final `inbox show` output does not match the expected thread or message history
- a required worktree is missing before the case expects its removal, or is still present after cleanup in a cleanup case
- the agents fall back to ordinary chat for critical coordination instead of the bundled CLIs

## Evidence Capture

Collect at least the following artifacts for every run:

- agent final summaries
- final `orch status --run RUN_ID --json` output
- final `inbox show --thread THREAD_ID --json` output for every relevant thread
- any `blocked`, `wait`, `retry`, `reassign`, or `cleanup` output relevant to the case
- the temporary DB path, resolved run id, resolved thread ids, and any worktree paths
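
The two final-state commands above can be assembled per run as sketched below. The helper name `evidence_commands` is hypothetical, and the argv contains only the arguments shown in this README; any further arguments a case needs (for example a DB path) should be taken from that case's `Validation Commands`, not from this sketch.

```python
from pathlib import PurePosixPath


def evidence_commands(
    orch_skill_path: str,
    inbox_skill_path: str,
    run_id: str,
    thread_ids: list[str],
) -> list[list[str]]:
    """Build one `orch status` argv plus one `inbox show` argv per thread."""
    orch_cli = str(PurePosixPath(orch_skill_path) / "assets" / "orch")
    inbox_cli = str(PurePosixPath(inbox_skill_path) / "assets" / "inbox")

    commands = [[orch_cli, "status", "--run", run_id, "--json"]]
    for thread_id in thread_ids:
        commands.append([inbox_cli, "show", "--thread", thread_id, "--json"])
    return commands
```

Capturing the raw JSON output of each command alongside the agent summaries gives the test runner evidence that does not depend on what the agents reported.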

## Cleanup Policy

Use these defaults unless a case file explicitly overrides them:

- keep the temporary DB, repo fixture, and working directory on failure for debugging
- clean up the temporary working directory on success only if the caller does not need replay artifacts
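
The default policy above amounts to a two-input decision. This is a minimal sketch; the function names and the `keep_replay_artifacts` flag are assumptions about how a runner might model "the caller does not need replay artifacts".

```python
import shutil
from pathlib import Path


def should_remove_workdir(result: str, keep_replay_artifacts: bool) -> bool:
    """Remove only on success, and only when replay artifacts are not needed."""
    return result == "pass" and not keep_replay_artifacts


def apply_cleanup_policy(workdir: Path, result: str, keep_replay_artifacts: bool) -> bool:
    """Delete the temporary working directory if the policy allows it.

    Returns True when removal was attempted, False when everything was kept.
    """
    if should_remove_workdir(result, keep_replay_artifacts):
        shutil.rmtree(workdir, ignore_errors=True)
        return True
    return False
```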

## Per-Case Template

Each case file should use this structure:

- `Test Type`
- `Purpose`
- `Preconditions`
- `Agent Topology`
- `Inputs`
- `Execution Parameters`
- `Execution Steps`
- `Validation Commands`
- `Expected Outcomes`
- `Assertions`
- `Cleanup`
- `Recorded Example Run` when a real run has already been captured
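
A case file can be checked against this template mechanically. The sketch below assumes each template entry appears as a Markdown ATX heading (any level) in the case file, which this README does not state explicitly; `missing_sections` is a hypothetical helper name.

```python
import re

# Required entries from the template; `Recorded Example Run` is optional.
REQUIRED_SECTIONS = (
    "Test Type",
    "Purpose",
    "Preconditions",
    "Agent Topology",
    "Inputs",
    "Execution Parameters",
    "Execution Steps",
    "Validation Commands",
    "Expected Outcomes",
    "Assertions",
    "Cleanup",
)

# Assumption: sections are written as ATX headings such as "## Purpose".
_HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(markdown_text: str) -> list[str]:
    """Return the required section names that have no matching heading."""
    found = {m.group(1).strip("`") for m in _HEADING.finditer(markdown_text)}
    return [name for name in REQUIRED_SECTIONS if name not in found]
```

An empty return value means the case file carries every required section; the order of sections is not checked.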

## Case Files

| Case Slug | File | Coverage Note |
| --- | --- | --- |
| `leader-run-dispatch-reconcile-through-bundled-cli` | [leader-run-dispatch-reconcile-through-bundled-cli.md](./leader-run-dispatch-reconcile-through-bundled-cli.md) | validates that a leader can drive a complete `run -> task -> dispatch -> reconcile -> status` happy path through the packaged orch skill |
| `leader-blocked-answer-resume-through-bundled-cli` | [leader-blocked-answer-resume-through-bundled-cli.md](./leader-blocked-answer-resume-through-bundled-cli.md) | validates that a leader can observe a blocked task, answer it through `orch`, and reach final completion with a real worker |
| `strict-worktree-dispatch-to-cleanup-through-bundled-cli` | [strict-worktree-dispatch-to-cleanup-through-bundled-cli.md](./strict-worktree-dispatch-to-cleanup-through-bundled-cli.md) | validates that the skill can drive strict worktree allocation, reconcile completion, and cleanup through the bundled orch CLI |
| `leader-retries-failed-task-through-bundled-cli` | [leader-retries-failed-task-through-bundled-cli.md](./leader-retries-failed-task-through-bundled-cli.md) | validates that a leader can reconcile a failed attempt and create a successful retry through the packaged orch skill |
| `leader-reassigns-blocked-task-through-bundled-cli` | [leader-reassigns-blocked-task-through-bundled-cli.md](./leader-reassigns-blocked-task-through-bundled-cli.md) | validates that a leader can reassign a blocked task from one worker to another and close the run through the packaged orch skill |

## Scope

In scope:

- explicit `$orch` skill invocation
- bundled `./assets/orch` CLI usage
- leader-side run, task, dependency, dispatch, reconcile, answer, retry, reassign, wait, status, and cleanup flows
- interaction between a leader using `skills/orch/` and workers using `skills/inbox/`
- worktree-backed dispatch and cleanup validation
- end-to-end run state and thread history validation

Out of scope:

- per-command flag and JSON contract coverage for `orch`
- worker-only skill behavior that already belongs under [../inbox-skill/](../inbox-skill/)
- the separate `council-review` skill package
- implicit skill triggering without `$orch`

## Relationship To Other Test Docs

- [../orch/](../orch/) covers CLI command behavior
- [../inbox-skill/](../inbox-skill/) covers worker-side skill-guided behavior on top of inbox
- this directory covers leader-side skill-guided behavior on top of `orch`