From 0b533a70f9cc50bc106e4dc943fbf081bef4c358 Mon Sep 17 00:00:00 2001
From: kurihada
Date: Thu, 19 Mar 2026 17:25:40 +0800
Subject: [PATCH] Add council-review skill test plan docs

---
 docs/implementation-roadmap.md                |   1 +
 .../archive/council-review-skill-test-plan.md |  63 +++++++
 docs/tests/council-review-skill/README.md     | 174 ++++++++++++++++++
 ...ainstorm-end-to-end-through-bundled-cli.md | 108 +++++++++++
 ...ejects-before-tally-through-bundled-cli.md |  73 ++++++++
 ...only-default-report-through-bundled-cli.md |  88 +++++++++
 ...ouncil-wait-timeout-through-bundled-cli.md |  77 ++++++++
 7 files changed, 584 insertions(+)
 create mode 100644 docs/roadmaps/archive/council-review-skill-test-plan.md
 create mode 100644 docs/tests/council-review-skill/README.md
 create mode 100644 docs/tests/council-review-skill/council-brainstorm-end-to-end-through-bundled-cli.md
 create mode 100644 docs/tests/council-review-skill/council-report-rejects-before-tally-through-bundled-cli.md
 create mode 100644 docs/tests/council-review-skill/council-unanimous-only-default-report-through-bundled-cli.md
 create mode 100644 docs/tests/council-review-skill/council-wait-timeout-through-bundled-cli.md

diff --git a/docs/implementation-roadmap.md b/docs/implementation-roadmap.md
index 5d6b84e..a69e71f 100644
--- a/docs/implementation-roadmap.md
+++ b/docs/implementation-roadmap.md
@@ -26,6 +26,7 @@ As of now:
 - reusable Codex skill packages for `orch` and `council-review` now exist under `skills/orch/` and `skills/council-review/`, both using bundled copies of the `orch` CLI binary asset
 - an inbox skill forward-test plan directory now exists under `docs/tests/inbox-skill/`, with a shared execution template and multiple scenario cases
 - an orch skill forward-test plan directory now exists under `docs/tests/orch-skill/`, with a shared execution contract and initial leader-side workflow scenarios
+- a council-review skill forward-test plan directory now exists under `docs/tests/council-review-skill/`, with a shared execution contract and initial council workflow scenarios
 - an execution-roadmap workflow now exists under `docs/roadmaps/active/` and `docs/roadmaps/archive/` for agent-level work traces and completion archives
 - a repo-local `scripts/package_skill_clis.sh` packaging flow now builds bundled skill CLI assets for `inbox`, `orch`, and `council-review`
 - `orch` now implements `run init/show`, `task add`, `dep add`, `ready`, `dispatch`, `reconcile`, `wait`, `blocked`, `answer`, `retry`, `reassign`, `cancel`, `cleanup`, and `status`
diff --git a/docs/roadmaps/archive/council-review-skill-test-plan.md b/docs/roadmaps/archive/council-review-skill-test-plan.md
new file mode 100644
index 0000000..55ca37f
--- /dev/null
+++ b/docs/roadmaps/archive/council-review-skill-test-plan.md
@@ -0,0 +1,63 @@
+# Title
+
+Add Council Review Skill Test Plan Documents
+
+## Status
+
+- `completed`
+
+## Owner
+
+- Codex main agent
+
+## Started At
+
+- `2026-03-19`
+
+## Goal
+
+- Add a human-readable forward-test plan directory for the packaged `skills/council-review/` bundle under `docs/tests/`.
+- Mirror the structure used by `docs/tests/inbox-skill/` and `docs/tests/orch-skill/` while adapting it to the high-level `orch council ...` workflow and reviewer coordination model.
+
+## Scope
+
+- Create `docs/tests/council-review-skill/README.md`.
+- Author an initial set of `council-review` skill scenario cases as separate Markdown files.
+- Update implementation progress docs to record the new test-plan directory.
+
+## Checklist
+
+- [x] Review `docs/tests/orch-skill/`, `skills/council-review/`, and the current council workflow surface.
+- [x] Create `docs/tests/council-review-skill/README.md` with shared execution contract and case index.
+- [x] Author initial `council-review-skill` case documents.
+- [x] Update implementation roadmap and archive this execution roadmap.
+
+## Files
+
+- `docs/tests/council-review-skill/README.md`
+- `docs/tests/council-review-skill/council-brainstorm-end-to-end-through-bundled-cli.md`
+- `docs/tests/council-review-skill/council-unanimous-only-default-report-through-bundled-cli.md`
+- `docs/tests/council-review-skill/council-wait-timeout-through-bundled-cli.md`
+- `docs/tests/council-review-skill/council-report-rejects-before-tally-through-bundled-cli.md`
+- `docs/implementation-roadmap.md`
+- `docs/roadmaps/archive/council-review-skill-test-plan.md`
+
+## Decisions
+
+- Keep `council-review-skill` separate from `orch-skill`, because `council-review` is a distinct project-local skill package with its own user-facing semantics.
+- Use the same forward-test style as the other skill test-plan directories, but inject `skills/council-review/` only into the leader and `skills/inbox/` into reviewer agents.
+- Treat shared DB bootstrap through `inbox init` as part of the test-runner setup contract rather than pretending `council-review` owns schema initialization.
+
+## Blockers
+
+- none
+
+## Next Step
+
+- Capture one or more real recorded example runs for the end-to-end and unanimous-only cases after the packaged council-review skill is exercised in practice.
+
+## Completion Summary
+
+- Added `docs/tests/council-review-skill/README.md` as the shared execution contract and index for council-review skill validation.
+- Added four initial forward-test scenario documents covering end-to-end brainstorm/report, unanimous-only default reporting, wait timeout, and report-before-tally invalid-state behavior.
+- Updated `docs/implementation-roadmap.md` to record that the separate `council-review` skill now has a dedicated forward-test plan directory under `docs/tests/council-review-skill/`.
diff --git a/docs/tests/council-review-skill/README.md b/docs/tests/council-review-skill/README.md
new file mode 100644
index 0000000..16ecd9a
--- /dev/null
+++ b/docs/tests/council-review-skill/README.md
@@ -0,0 +1,174 @@
+# Council Review Skill Test Plan
+
+## Purpose
+
+This directory tracks human-readable test plans for the `skills/council-review/` Codex skill bundle.
+
+These documents are not command-contract specs for the `orch council` CLI itself.
+That coverage already lives under [../orch/](../orch/).
+
+This directory exists to describe a different test surface:
+
+- whether a leader agent can actually use the packaged `council-review` skill
+- whether the bundled `./assets/orch` CLI works inside real skill-guided council workflows
+- whether a council run driven by the skill reaches the expected reviewer, grouping, tally, and report state
+
+## Test Model
+
+- `README.md` is the index for this directory
+- each skill test case lives in its own Markdown file
+- use stable case slugs in filenames
+
+## Shared Execution Contract
+
+Use these defaults unless a case file explicitly overrides them:
+
+- run the scenario with real subagents, not simulated transcripts
+- inject `skills/council-review/` into the leader agent
+- inject `skills/inbox/` into reviewer agents whenever reviewer task completion is required
+- initialize the shared SQLite DB before launching role agents with `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
+- require the leader to coordinate through the bundled `./assets/orch` CLI from the council-review skill instead of ordinary chat
+- require reviewer agents to coordinate through the bundled `./assets/inbox` CLI from their skill instead of ordinary chat
+- validate final council run, reviewer task state, and report state independently from the main thread after the agents stop
+- create any required repo fixture before launching agents for mixed or repo-target cases
+
+## How An Agent Runs These Cases
+
+Use one test-runner agent to execute each case.
+
+The test-runner agent is responsible for:
+
+- reading this `README.md` first, then one specific case file
+- creating an isolated temporary directory and DB path for that run
+- initializing the DB once through the bundled inbox CLI before launching role agents
+- creating any required temporary Git repo fixture before launching role agents
+- launching the role agents described in `Agent Topology`
+- injecting `skills/council-review/` into the leader and `skills/inbox/` into reviewers
+- passing each role agent the prompt text from the case file with concrete values substituted for `COUNCIL_SKILL_PATH`, `INBOX_SKILL_PATH`, `TMPDIR`, `RUN_ID`, `THREAD_ID`, and `REPORT_PATH` when needed
+- coordinating launch order or parallel start according to the case file
+- collecting agent final summaries as evidence
+- resolving final run ids, thread ids, and report artifact paths from agent outputs
+- running the `Validation Commands` from the main thread after the role agents stop
+- comparing the observed results against `Expected Outcomes` and `Assertions`
+- returning a final pass/fail judgment with concrete evidence
+
+The role agents are responsible for:
+
+- acting only within the role assigned in the case file
+- using the injected skill bundle rather than ad hoc repository discovery
+- coordinating through the bundled CLI and shared DB
+- reporting concrete run ids, thread ids, report artifact paths, and key command outcomes back to the test-runner agent
+
+The test-runner agent should treat a case as passed only when:
+
+- all role agents reach a final state without violating the case contract
+- the independent validation commands succeed
+- the final council, orch, and inbox state matches the assertions in the case file
+
+The test-runner agent should treat a case as failed when:
+
+- any required agent times out or stalls
+- a required council, orch, or inbox action is skipped
+- the leader falls back to ordinary chat for workflow control that should go through the bundled council-review skill
+- reviewer agents fall back to ordinary chat instead of returning results through inbox
+- the final council grouping, summary, or report state conflicts with the documented assertions
+
+The test-runner agent should report results in this shape:
+
+- `case`
+- `db_path`
+- `run_id`
+- `thread_ids`
+- `report_paths`
+- `result`: `pass` or `fail`
+- `agent_summaries`
+- `validation_evidence`
+- `assertion_checklist`
+- `notes`
+
+## Default Timeouts
+
+Use these defaults unless a case file explicitly overrides them:
+
+- per-agent timeout: `4m`
+- overall scenario timeout: `6m`
+- async wait margin for the main thread: `45s`
+
+## Default Failure Conditions
+
+Treat the test as failed if any of the following happens:
+
+- any required agent does not reach a final state before timeout
+- any required council, orch, or inbox command returns a non-success result unless the case expects that failure
+- the final `council report --json` output does not match the expected grouped recommendations
+- the final `orch status` output does not match the expected reviewer task state
+- a required markdown report artifact is missing when the case expects one
+- the agents fall back to ordinary chat for critical coordination instead of the bundled CLIs
+
+## Evidence Capture
+
+Collect at least the following artifacts for every run:
+
+- agent final summaries
+- final `council report --json` output when the case reaches report stage
+- final `orch status --run RUN_ID --json` output
+- final `inbox show --thread THREAD_ID --json` output for every relevant reviewer thread when reviewers participated
+- any `council wait` or `council tally` output relevant to the case
+- the temporary DB path, resolved run id, resolved thread ids, and any report artifact paths
+
+## Cleanup Policy
+
+Use these defaults unless a case file explicitly overrides them:
+
+- keep the temporary DB, repo fixture, and working directory on failure for debugging
+- cleanup the temporary working directory on success only if the caller does not need replay artifacts
+
+## Per-Case Template
+
+Each case file should use this structure:
+
+- `Test Type`
+- `Purpose`
+- `Preconditions`
+- `Agent Topology`
+- `Inputs`
+- `Execution Parameters`
+- `Execution Steps`
+- `Validation Commands`
+- `Expected Outcomes`
+- `Assertions`
+- `Cleanup`
+- `Recorded Example Run` when a real run has already been captured
+
+## Case Files
+
+| Case Slug | File | Coverage Note |
+| --- | --- | --- |
+| `council-brainstorm-end-to-end-through-bundled-cli` | [council-brainstorm-end-to-end-through-bundled-cli.md](./council-brainstorm-end-to-end-through-bundled-cli.md) | validates that the council-review skill can drive `start -> wait -> tally -> report` with three real reviewer agents |
+| `council-unanimous-only-default-report-through-bundled-cli` | [council-unanimous-only-default-report-through-bundled-cli.md](./council-unanimous-only-default-report-through-bundled-cli.md) | validates that unanimous-only runs default to `consensus` output while preserving the underlying summary counts |
+| `council-wait-timeout-through-bundled-cli` | [council-wait-timeout-through-bundled-cli.md](./council-wait-timeout-through-bundled-cli.md) | validates that the leader sees the expected timeout contract when reviewer tasks do not complete |
+| `council-report-rejects-before-tally-through-bundled-cli` | [council-report-rejects-before-tally-through-bundled-cli.md](./council-report-rejects-before-tally-through-bundled-cli.md) | validates that the skill surfaces the stable invalid-state error when report is attempted before tally |
+
+## Scope
+
+In scope:
+
+- explicit `$council-review` skill invocation
+- bundled `./assets/orch` CLI usage for `orch council ...`
+- end-to-end council start, wait, tally, and report flows
+- interaction between a leader using `skills/council-review/` and reviewers using `skills/inbox/`
+- default report policy, unanimous-only behavior, and timeout/error-path validation
+
+Out of scope:
+
+- per-command flag and JSON contract coverage for `orch council`
+- generic leader orchestration flows that already belong under [../orch-skill/](../orch-skill/)
+- worker-only skill behavior that belongs under [../inbox-skill/](../inbox-skill/)
+- implicit skill triggering without `$council-review`
+
+## Relationship To Other Test Docs
+
+- [../orch/](../orch/) covers CLI command behavior
+- [../orch-skill/](../orch-skill/) covers generic leader-side orchestration behavior on top of `orch`
+- [../inbox-skill/](../inbox-skill/) covers worker-side skill-guided behavior on top of inbox
+- this directory covers the separate user-facing `council-review` skill on top of `orch council`
diff --git a/docs/tests/council-review-skill/council-brainstorm-end-to-end-through-bundled-cli.md b/docs/tests/council-review-skill/council-brainstorm-end-to-end-through-bundled-cli.md
new file mode 100644
index 0000000..bb50073
--- /dev/null
+++ b/docs/tests/council-review-skill/council-brainstorm-end-to-end-through-bundled-cli.md
@@ -0,0 +1,108 @@
+# Case: `council-brainstorm-end-to-end-through-bundled-cli`
+
+## Test Type
+
+This is a `forward-test` and a high-level council workflow validation.
+
+The goal is to verify that a leader using the packaged `council-review` skill can drive `council start -> wait -> tally -> report` while three real reviewer agents return structured outputs through the packaged inbox skill.
+
+## Purpose
+
+Validate that all of the following can be true at the same time:
+
+- the leader can use the bundled `./assets/orch` CLI through the council-review skill
+- three reviewer agents can claim and complete their fixed-role inbox tasks
+- the leader can wait, tally, and report after all reviewer outputs arrive
+- the final report defaults to `consensus,majority`
+- a markdown report artifact is written
+
+## Preconditions
+
+- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
+- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
+- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
+- use an empty temporary directory `TMPDIR`
+- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
+
+## Agent Topology
+
+- `leader`
+- `architecture-reviewer`
+- `implementation-reviewer`
+- `risk-reviewer`
+
+## Inputs
+
+### Leader Prompt
+
+```text
+Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_001 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) tally with normal similarity, 4) report with default settings, 5) stop after reporting RUN_ID and REPORT_PATH. Do not use ordinary chat to coordinate with the reviewers.
+```
+
+### Architecture Reviewer Prompt
+
+```text
+Use $inbox at INBOX_SKILL_PATH to act as architecture-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This lowers coupling.","confidence":"high","tags":["architecture"],"target_refs":{"repo_path":"."}},{"title":"Share helpers","summary":"Council report rendering paths are repeated.","proposal":"Introduce shared council coordinator helpers for report rendering.","rationale":"This keeps report assembly consistent.","confidence":"medium","tags":["reporting"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
+```
+
+### Implementation Reviewer Prompt
+
+```text
+Use $inbox at INBOX_SKILL_PATH to act as implementation-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"high","tags":["maintainability"],"target_refs":{"repo_path":"."}},{"title":"Reuse report helpers","summary":"Formatting logic should stay shared.","proposal":"Introduce shared council coordinator helpers for report rendering","rationale":"This avoids formatter drift.","confidence":"medium","tags":["reporting"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
+```
+
+### Risk Reviewer Prompt
+
+```text
+Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"risk-reviewer","findings":[{"title":"Lock contracts","summary":"Contract drift becomes risky over time.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This reduces integration regressions.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}},{"title":"Cover JSON output","summary":"The council report response should stay stable.","proposal":"Add regression tests for council report JSON output.","rationale":"This catches contract regressions earlier.","confidence":"high","tags":["testing"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
+```
+
+## Execution Parameters
+
+- use the shared execution contract from [README.md](./README.md)
+- use the shared timeout defaults from [README.md](./README.md)
+- do not override the default cleanup policy
+
+## Execution Steps
+
+1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
+2. Inject `skills/council-review/` into `leader`
+3. Inject `skills/inbox/` into the three reviewer agents
+4. Point all agents at the same database path `TMPDIR/coord.db`
+5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
+6. Wait for all agents to finish
+7. Resolve `RUN_ID=council_skill_001`, reviewer `THREAD_ID`s, and `REPORT_PATH` from the agent outputs
+8. Independently run the validation commands from the main thread
+
+## Validation Commands
+
+```bash
+COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run council_skill_001
+COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council report --run council_skill_001
+test -f REPORT_PATH
+```
+
+## Expected Outcomes
+
+- the leader successfully starts `council_skill_001`
+- all three reviewers complete their fixed-role tasks
+- `council wait` returns `all_complete == true`
+- `council tally` returns one `consensus`, one `majority`, and one `minority`
+- `council report` defaults to showing `consensus,majority`
+- a markdown report artifact exists on disk
+
+## Assertions
+
+- `status.data.run.status == "done"`
+- `status.data.tasks` contains exactly three reviewer tasks and all are `done`
+- `report.data.show == ["consensus","majority"]`
+- `report.data.summary.consensus == 1`
+- `report.data.summary.majority == 1`
+- `report.data.summary.minority == 1`
+- `report.data.grouped_recommendations` length is `2`
+- `REPORT_PATH` exists
+
+## Cleanup
+
+- use the default cleanup policy from [README.md](./README.md)
+- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
diff --git a/docs/tests/council-review-skill/council-report-rejects-before-tally-through-bundled-cli.md b/docs/tests/council-review-skill/council-report-rejects-before-tally-through-bundled-cli.md
new file mode 100644
index 0000000..0fdccbe
--- /dev/null
+++ b/docs/tests/council-review-skill/council-report-rejects-before-tally-through-bundled-cli.md
@@ -0,0 +1,73 @@
+# Case: `council-report-rejects-before-tally-through-bundled-cli`
+
+## Test Type
+
+This is a `forward-test` and an invalid-state council workflow validation.
+
+The goal is to verify that a leader using the packaged `council-review` skill sees the expected stable error when report is attempted before grouped recommendations have been persisted.
+
+## Purpose
+
+Validate that all of the following can be true at the same time:
+
+- the leader can start a council run through the bundled council-review skill
+- the leader can attempt report without tally
+- the command returns the stable invalid-state contract rather than fabricating an empty report
+
+## Preconditions
+
+- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
+- bundled CLI executable exists at `COUNCIL_SKILL_PATH/assets/orch`
+- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
+- use an empty temporary directory `TMPDIR`
+- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
+
+## Agent Topology
+
+- `leader`
+
+## Inputs
+
+### Leader Prompt
+
+```text
+Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_004 with a short review target, 2) attempt council report immediately without running tally, 3) stop after reporting RUN_ID, exit code, and error payload. Do not use ordinary chat to simulate reviewer output.
+```
+
+## Execution Parameters
+
+- use the shared execution contract from [README.md](./README.md)
+- use the shared timeout defaults from [README.md](./README.md)
+- do not override the default cleanup policy
+
+## Execution Steps
+
+1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
+2. Inject `skills/council-review/` into `leader`
+3. Point the leader at the database path `TMPDIR/coord.db`
+4. Launch the leader
+5. Wait for the leader to finish
+6. Independently run the validation commands from the main thread
+
+## Validation Commands
+
+```bash
+COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council report --run council_skill_004
+```
+
+## Expected Outcomes
+
+- the leader successfully starts `council_skill_004`
+- the report command exits with the stable invalid-state contract
+- the error message indicates that council tally must run first
+
+## Assertions
+
+- command exit code is `30`
+- error code is `invalid_state`
+- the error message mentions that grouped recommendations are not available yet or that `council tally` must run first
+
+## Cleanup
+
+- use the default cleanup policy from [README.md](./README.md)
+- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
diff --git a/docs/tests/council-review-skill/council-unanimous-only-default-report-through-bundled-cli.md b/docs/tests/council-review-skill/council-unanimous-only-default-report-through-bundled-cli.md
new file mode 100644
index 0000000..31408a7
--- /dev/null
+++ b/docs/tests/council-review-skill/council-unanimous-only-default-report-through-bundled-cli.md
@@ -0,0 +1,88 @@
+# Case: `council-unanimous-only-default-report-through-bundled-cli`
+
+## Test Type
+
+This is a `forward-test` and a unanimous-only reporting validation.
+
+The goal is to verify that a leader using the packaged `council-review` skill can run a unanimous-only council and observe the expected default report behavior after tally.
+
+## Purpose
+
+Validate that all of the following can be true at the same time:
+
+- the leader can start a council run with `--only-unanimous`
+- three reviewer agents can complete their tasks through the packaged inbox skill
+- the leader can tally and report through the bundled council-review skill
+- the final report defaults to `consensus` only while preserving the full summary counts
+
+## Preconditions
+
+- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
+- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
+- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
+- use an empty temporary directory `TMPDIR`
+- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
+
+## Agent Topology
+
+- `leader`
+- `architecture-reviewer`
+- `implementation-reviewer`
+- `risk-reviewer`
+
+## Inputs
+
+### Leader Prompt
+
+```text
+Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_002 with --only-unanimous, 2) wait until all three reviewers complete, 3) tally with normal similarity, 4) report with default settings, 5) stop after reporting RUN_ID and the default show buckets you observed. Do not use ordinary chat to coordinate with the reviewers.
+```
+
+### Reviewer Prompts
+
+- Reuse the same reviewer body JSON and inbox-only workflow as in [council-brainstorm-end-to-end-through-bundled-cli.md](./council-brainstorm-end-to-end-through-bundled-cli.md), but target run `council_skill_002`.
+
+## Execution Parameters
+
+- use the shared execution contract from [README.md](./README.md)
+- use the shared timeout defaults from [README.md](./README.md)
+- do not override the default cleanup policy
+
+## Execution Steps
+
+1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
+2. Inject `skills/council-review/` into `leader`
+3. Inject `skills/inbox/` into the three reviewer agents
+4. Point all agents at the same database path `TMPDIR/coord.db`
+5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
+6. Wait for all agents to finish
+7. Resolve `RUN_ID=council_skill_002` from the agent outputs
+8. Independently run the validation commands from the main thread
+
+## Validation Commands
+
+```bash
+COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council report --run council_skill_002
+COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run council_skill_002
+```
+
+## Expected Outcomes
+
+- the unanimous-only run completes successfully
+- the report default `show` value is only `consensus`
+- the underlying summary still contains `consensus`, `majority`, and `minority` counts
+- only the consensus group is returned in `grouped_recommendations`
+
+## Assertions
+
+- `report.data.show == ["consensus"]`
+- `report.data.summary.consensus == 1`
+- `report.data.summary.majority == 1`
+- `report.data.summary.minority == 1`
+- `report.data.grouped_recommendations` length is `1`
+- the sole returned recommendation has `bucket == "consensus"`
+
+## Cleanup
+
+- use the default cleanup policy from [README.md](./README.md)
+- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
diff --git a/docs/tests/council-review-skill/council-wait-timeout-through-bundled-cli.md b/docs/tests/council-review-skill/council-wait-timeout-through-bundled-cli.md
new file mode 100644
index 0000000..4c7752a
--- /dev/null
+++ b/docs/tests/council-review-skill/council-wait-timeout-through-bundled-cli.md
@@ -0,0 +1,77 @@
+# Case: `council-wait-timeout-through-bundled-cli`
+
+## Test Type
+
+This is a `forward-test` and a timeout-path council workflow validation.
+
+The goal is to verify that a leader using the packaged `council-review` skill sees the expected timeout contract when reviewer tasks do not complete.
+
+## Purpose
+
+Validate that all of the following can be true at the same time:
+
+- the leader can start a council run through the bundled skill CLI
+- the leader can call `council wait` with a short timeout
+- the command reports `woke == false` and `all_complete == false`
+- reviewer task metadata remains visible for later follow-up
+
+## Preconditions
+
+- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
+- bundled CLI executable exists at `COUNCIL_SKILL_PATH/assets/orch`
+- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
+- use an empty temporary directory `TMPDIR`
+- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
+
+## Agent Topology
+
+- `leader`
+
+## Inputs
+
+### Leader Prompt
+
+```text
+Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_003 with a short review target, 2) immediately call council wait with a short timeout such as 1 second, 3) stop after reporting RUN_ID and the wait result you observed. Do not use ordinary chat to simulate reviewer output.
+```
+
+## Execution Parameters
+
+- use the shared execution contract from [README.md](./README.md)
+- override the council wait timeout to a short interval such as `1s`
+- do not override the default cleanup policy
+
+## Execution Steps
+
+1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
+2. Inject `skills/council-review/` into `leader`
+3. Point the leader at the database path `TMPDIR/coord.db`
+4. Launch the leader
+5. Wait for the leader to finish
+6. Independently run the validation commands from the main thread
+
+## Validation Commands
+
+```bash
+COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council wait --run council_skill_003 --timeout-seconds 1
+COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run council_skill_003
+```
+
+## Expected Outcomes
+
+- the leader successfully starts `council_skill_003`
+- `council wait` times out cleanly
+- the wait response still includes three reviewer statuses
+- the run remains non-terminal because reviewers have not completed
+
+## Assertions
+
+- `wait.data.woke == false`
+- `wait.data.all_complete == false`
+- `wait.data.reviewers` length is `3`
+- `status.data.run.status` is not `done`
+
+## Cleanup
+
+- use the default cleanup policy from [README.md](./README.md)
+- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
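
Review note (not part of the patch): the `Assertions` of the `council-wait-timeout-through-bundled-cli` case could be checked mechanically once the two validation commands have been captured as JSON. The sketch below is a minimal illustration under stated assumptions. The envelope shape (`data.woke`, `data.all_complete`, `data.reviewers`, `data.run.status`) is inferred only from the assertion paths in the case file, the function name `check_wait_timeout` is hypothetical, and the sample values such as `"running"` and `"pending"` are placeholders rather than documented `orch` output.

```python
from typing import Any


def check_wait_timeout(wait: dict[str, Any], status: dict[str, Any]) -> list[str]:
    """Return the failed assertions for the wait-timeout case (empty list = pass).

    Both arguments are parsed JSON objects; their shape is an assumption
    inferred from the assertion paths in the case file.
    """
    failures: list[str] = []
    wait_data = wait.get("data", {})

    # wait.data.woke == false
    if wait_data.get("woke") is not False:
        failures.append("wait.data.woke should be false")

    # wait.data.all_complete == false
    if wait_data.get("all_complete") is not False:
        failures.append("wait.data.all_complete should be false")

    # wait.data.reviewers length is 3
    if len(wait_data.get("reviewers", [])) != 3:
        failures.append("wait.data.reviewers should have length 3")

    # status.data.run.status is not "done"; a missing status is also a failure
    run_status = status.get("data", {}).get("run", {}).get("status")
    if run_status is None or run_status == "done":
        failures.append("status.data.run.status should be present and not done")

    return failures


# Hypothetical payloads; real field values come from the bundled orch CLI.
sample_wait = {
    "data": {
        "woke": False,
        "all_complete": False,
        "reviewers": [
            {"role": "architecture-reviewer", "status": "pending"},
            {"role": "implementation-reviewer", "status": "pending"},
            {"role": "risk-reviewer", "status": "pending"},
        ],
    }
}
sample_status = {"data": {"run": {"status": "running"}}}

if __name__ == "__main__":
    problems = check_wait_timeout(sample_wait, sample_status)
    print("pass" if not problems else "fail: " + "; ".join(problems))
```

Returning a list of failures rather than raising on the first mismatch would let a test-runner agent copy the result directly into the `assertion_checklist` field of its report.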