Add council-review skill gap-fill test plans

2026-03-19 17:59:58 +08:00
parent 0b533a70f9
commit d17b5ebfbd
8 changed files with 561 additions and 4 deletions
@@ -30,7 +30,7 @@ Use these defaults unless a case file explicitly overrides them:
 - require the leader to coordinate through the bundled `./assets/orch` CLI from the council-review skill instead of ordinary chat
 - require reviewer agents to coordinate through the bundled `./assets/inbox` CLI from their skill instead of ordinary chat
 - validate final council run, reviewer task state, and report state independently from the main thread after the agents stop
- create any required repo fixture before launching agents for mixed or repo-target cases
+- create any required target-file or repo fixture before launching agents for target-file, mixed, or repo-target cases

 ## How An Agent Runs These Cases

@@ -41,7 +41,7 @@ The test-runner agent is responsible for:
 - reading this `README.md` first, then one specific case file
 - creating an isolated temporary directory and DB path for that run
 - initializing the DB once through the bundled inbox CLI before launching role agents
- creating any required temporary Git repo fixture before launching role agents
+- creating any required temporary target file or Git repo fixture before launching role agents
 - launching the role agents described in `Agent Topology`
 - injecting `skills/council-review/` into the leader and `skills/inbox/` into reviewers
 - passing each role agent the prompt text from the case file with concrete values substituted for `COUNCIL_SKILL_PATH`, `INBOX_SKILL_PATH`, `TMPDIR`, `RUN_ID`, `THREAD_ID`, and `REPORT_PATH` when needed
@@ -148,6 +148,11 @@ Each case file should use this structure:
 | `council-unanimous-only-default-report-through-bundled-cli` | [council-unanimous-only-default-report-through-bundled-cli.md](./council-unanimous-only-default-report-through-bundled-cli.md) | validates that unanimous-only runs default to `consensus` output while preserving the underlying summary counts |
 | `council-wait-timeout-through-bundled-cli` | [council-wait-timeout-through-bundled-cli.md](./council-wait-timeout-through-bundled-cli.md) | validates that the leader sees the expected timeout contract when reviewer tasks do not complete |
 | `council-report-rejects-before-tally-through-bundled-cli` | [council-report-rejects-before-tally-through-bundled-cli.md](./council-report-rejects-before-tally-through-bundled-cli.md) | validates that the skill surfaces the stable invalid-state error when report is attempted before tally |
+| `council-report-show-all-includes-minority-through-bundled-cli` | [council-report-show-all-includes-minority-through-bundled-cli.md](./council-report-show-all-includes-minority-through-bundled-cli.md) | validates that an explicit `--show all` report includes the otherwise hidden minority group |
+| `council-report-rejects-invalid-show-through-bundled-cli` | [council-report-rejects-invalid-show-through-bundled-cli.md](./council-report-rejects-invalid-show-through-bundled-cli.md) | validates that the leader sees the stable `invalid_input` contract for an invalid report bucket selection |
+| `council-tally-strict-keeps-distinct-proposals-through-bundled-cli` | [council-tally-strict-keeps-distinct-proposals-through-bundled-cli.md](./council-tally-strict-keeps-distinct-proposals-through-bundled-cli.md) | validates that strict similarity preserves near-duplicate wording as separate minority groups |
+| `council-reviewer-output-invalid-json-fails-tally-through-bundled-cli` | [council-reviewer-output-invalid-json-fails-tally-through-bundled-cli.md](./council-reviewer-output-invalid-json-fails-tally-through-bundled-cli.md) | validates that malformed reviewer result JSON reaches the leader as the stable tally-time `invalid_input` contract |
+| `council-start-with-target-file-through-bundled-cli` | [council-start-with-target-file-through-bundled-cli.md](./council-start-with-target-file-through-bundled-cli.md) | validates that the skill can start a council run from explicit `--target-file` context instead of a pure inline prompt |

 ## Scope

@@ -157,7 +162,10 @@ In scope:
 - bundled `./assets/orch` CLI usage for `orch council ...`
 - end-to-end council start, wait, tally, and report flows
 - interaction between a leader using `skills/council-review/` and reviewers using `skills/inbox/`
- default report policy, unanimous-only behavior, and timeout/error-path validation
+- default report policy, explicit minority inclusion, and invalid report-filter validation
+- normal and strict tally behavior
+- malformed reviewer-output failure paths
+- non-prompt target context including `target-file`

 Out of scope: