Add council-review skill gap-fill test plans

2026-03-19 17:59:58 +08:00
parent 0b533a70f9
commit d17b5ebfbd
8 changed files with 561 additions and 4 deletions
+11 -3
@@ -30,7 +30,7 @@ Use these defaults unless a case file explicitly overrides them:
- require the leader to coordinate through the bundled `./assets/orch` CLI from the council-review skill instead of ordinary chat
- require reviewer agents to coordinate through the bundled `./assets/inbox` CLI from their skill instead of ordinary chat
- validate final council run, reviewer task state, and report state independently from the main thread after the agents stop
-- create any required repo fixture before launching agents for mixed or repo-target cases
+- create any required target-file or repo fixture before launching agents for target-file, mixed, or repo-target cases
## How An Agent Runs These Cases
@@ -41,7 +41,7 @@ The test-runner agent is responsible for:
- reading this `README.md` first, then one specific case file
- creating an isolated temporary directory and DB path for that run
- initializing the DB once through the bundled inbox CLI before launching role agents
-- creating any required temporary Git repo fixture before launching role agents
+- creating any required temporary target file or Git repo fixture before launching role agents
- launching the role agents described in `Agent Topology`
- injecting `skills/council-review/` into the leader and `skills/inbox/` into reviewers
- passing each role agent the prompt text from the case file with concrete values substituted for `COUNCIL_SKILL_PATH`, `INBOX_SKILL_PATH`, `TMPDIR`, `RUN_ID`, `THREAD_ID`, and `REPORT_PATH` when needed
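The isolation and substitution steps listed above can be sketched in Python. This is a minimal illustration, not the test runner's actual code: the helper names `new_run_dir` and `fill_prompt`, the `council-case-` directory prefix, and the `inbox.db` file name are all hypothetical. Only the six placeholder names come from the README text.

```python
import re
import tempfile
from pathlib import Path

# Placeholder names taken from the README; everything else here is assumed.
PLACEHOLDERS = (
    "COUNCIL_SKILL_PATH",
    "INBOX_SKILL_PATH",
    "TMPDIR",
    "RUN_ID",
    "THREAD_ID",
    "REPORT_PATH",
)

_PLACEHOLDER_RE = re.compile(r"\b(" + "|".join(PLACEHOLDERS) + r")\b")


def new_run_dir() -> tuple[Path, Path]:
    """Create an isolated temporary directory and a DB path inside it.

    The DB file is not created here; the README says the DB is initialized
    through the bundled inbox CLI. The file name is a hypothetical choice.
    """
    tmpdir = Path(tempfile.mkdtemp(prefix="council-case-"))
    return tmpdir, tmpdir / "inbox.db"


def fill_prompt(prompt: str, values: dict[str, str]) -> str:
    """Replace known placeholders with concrete values.

    Placeholders without a supplied value are left as-is, which matches the
    "when needed" wording: not every case uses every placeholder.
    """
    return _PLACEHOLDER_RE.sub(
        lambda m: values.get(m.group(1), m.group(0)), prompt
    )
```

A runner could call `new_run_dir()` once per case, then pass the resulting paths through `fill_prompt` for each role agent's prompt before launch.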
@@ -148,6 +148,11 @@ Each case file should use this structure:
| `council-unanimous-only-default-report-through-bundled-cli` | [council-unanimous-only-default-report-through-bundled-cli.md](./council-unanimous-only-default-report-through-bundled-cli.md) | validates that unanimous-only runs default to `consensus` output while preserving the underlying summary counts |
| `council-wait-timeout-through-bundled-cli` | [council-wait-timeout-through-bundled-cli.md](./council-wait-timeout-through-bundled-cli.md) | validates that the leader sees the expected timeout contract when reviewer tasks do not complete |
| `council-report-rejects-before-tally-through-bundled-cli` | [council-report-rejects-before-tally-through-bundled-cli.md](./council-report-rejects-before-tally-through-bundled-cli.md) | validates that the skill surfaces the stable invalid-state error when report is attempted before tally |
+| `council-report-show-all-includes-minority-through-bundled-cli` | [council-report-show-all-includes-minority-through-bundled-cli.md](./council-report-show-all-includes-minority-through-bundled-cli.md) | validates that an explicit `--show all` report includes the otherwise hidden minority group |
+| `council-report-rejects-invalid-show-through-bundled-cli` | [council-report-rejects-invalid-show-through-bundled-cli.md](./council-report-rejects-invalid-show-through-bundled-cli.md) | validates that the leader sees the stable `invalid_input` contract for an invalid report bucket selection |
+| `council-tally-strict-keeps-distinct-proposals-through-bundled-cli` | [council-tally-strict-keeps-distinct-proposals-through-bundled-cli.md](./council-tally-strict-keeps-distinct-proposals-through-bundled-cli.md) | validates that strict similarity preserves near-duplicate wording as separate minority groups |
+| `council-reviewer-output-invalid-json-fails-tally-through-bundled-cli` | [council-reviewer-output-invalid-json-fails-tally-through-bundled-cli.md](./council-reviewer-output-invalid-json-fails-tally-through-bundled-cli.md) | validates that malformed reviewer result JSON reaches the leader as the stable tally-time `invalid_input` contract |
+| `council-start-with-target-file-through-bundled-cli` | [council-start-with-target-file-through-bundled-cli.md](./council-start-with-target-file-through-bundled-cli.md) | validates that the skill can start a council run from explicit `--target-file` context instead of a pure inline prompt |
## Scope
@@ -157,7 +162,10 @@ In scope:
- bundled `./assets/orch` CLI usage for `orch council ...`
- end-to-end council start, wait, tally, and report flows
- interaction between a leader using `skills/council-review/` and reviewers using `skills/inbox/`
-- default report policy, unanimous-only behavior, and timeout/error-path validation
+- default report policy, explicit minority inclusion, and invalid report-filter validation
+- normal and strict tally behavior
+- malformed reviewer-output failure paths
+- non-prompt target context including `target-file`
Out of scope: