# Case: `council-tally-strict-keeps-distinct-proposals-through-bundled-cli`
## Test Type
This is a `forward-test` and a strict-similarity tally validation.
The goal is to verify that a leader using the packaged `council-review` skill can request `--similarity strict` and preserve wording-level proposal differences that would normally collapse in `normal` mode.
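
To make the distinction concrete, the sketch below shows one way wording-level differences could collapse under a looser comparison yet survive an exact one. The normalization rules (lowercasing, dropping punctuation and articles) and the names `normalize_loose` and `group_proposals` are assumptions made purely for illustration; this case does not describe how `orch` actually implements `normal` or `strict` similarity.

```python
import re
from collections import defaultdict

# Hypothetical rule set: these articles are ignored by the loose comparison.
ARTICLES = {"a", "an", "the"}


def normalize_loose(text: str) -> str:
    """Assumed loose key: lowercase, strip punctuation, drop articles."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(w for w in words if w not in ARTICLES)


def group_proposals(proposals, strict: bool):
    """Group proposals by exact text (strict) or by the assumed loose key."""
    groups = defaultdict(list)
    for proposal in proposals:
        key = proposal if strict else normalize_loose(proposal)
        groups[key].append(proposal)
    return list(groups.values())


# The three proposal strings used by the reviewer prompts in this case.
PROPOSALS = [
    "Move API contract definitions into a dedicated module.",
    "Move API contract definitions into dedicated module",
    "Add integration tests for auth flows.",
]

strict_groups = group_proposals(PROPOSALS, strict=True)
loose_groups = group_proposals(PROPOSALS, strict=False)
print(len(strict_groups), len(loose_groups))  # prints "3 2"
```

Under these assumed rules the two contract proposals share a loose key and would merge, while exact matching keeps all three apart, which is the behavior this case expects from `--similarity strict`.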
## Purpose
Validate that all of the following can be true at the same time:
- the leader can drive `start -> wait -> tally` through the bundled council-review skill
- three reviewer agents can complete their tasks through the packaged inbox skill
- the architecture and implementation reviewers can submit near-duplicate but not identical proposals
- strict tally keeps all three proposals as separate minority groups
## Preconditions
- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- before launching role agents, initialize `TMPDIR/coord.db` by running `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
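
The preconditions above could be checked and prepared from a harness along the following lines. This is a minimal sketch: the helper names (`prepare_workdir`, `check_preconditions`, `init_command`, `initialize_db`) and the temp-directory prefix are hypothetical, and only the `--db ... --json init` invocation comes from this case.

```python
import subprocess
import tempfile
from pathlib import Path


def prepare_workdir() -> Path:
    """Create a fresh, empty temporary directory to stand in for TMPDIR."""
    return Path(tempfile.mkdtemp(prefix="council-strict-"))


def check_preconditions(council_skill_path: Path, inbox_skill_path: Path) -> list[str]:
    """Return the bundled executables that are missing (empty list means OK)."""
    required = [
        council_skill_path / "assets" / "orch",
        inbox_skill_path / "assets" / "inbox",
    ]
    return [str(path) for path in required if not path.is_file()]


def init_command(inbox_skill_path: Path, db_path: Path) -> list[str]:
    """Build the init invocation quoted in the preconditions."""
    return [str(inbox_skill_path / "assets" / "inbox"), "--db", str(db_path), "--json", "init"]


def initialize_db(inbox_skill_path: Path, workdir: Path) -> Path:
    """Run init once; raises CalledProcessError if the CLI exits non-zero."""
    db_path = workdir / "coord.db"
    subprocess.run(init_command(inbox_skill_path, db_path), check=True)
    return db_path
```

A harness would typically call `check_preconditions` first, then `prepare_workdir`, then `initialize_db` exactly once before any role agent starts.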
## Agent Topology
- `leader`
- `architecture-reviewer`
- `implementation-reviewer`
- `risk-reviewer`
## Inputs
### Leader Prompt
```text
Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_007 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) tally with --similarity strict, 4) stop after reporting RUN_ID, tally counts, and the grouped proposals you observed. Do not use ordinary chat to coordinate with the reviewers.
```
### Architecture Reviewer Prompt
```text
Use $inbox at INBOX_SKILL_PATH to act as architecture-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This lowers coupling.","confidence":"high","tags":["architecture"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
```
### Implementation Reviewer Prompt
```text
Use $inbox at INBOX_SKILL_PATH to act as implementation-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract API contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"medium","tags":["maintainability"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
```
### Risk Reviewer Prompt
```text
Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"risk-reviewer","findings":[{"title":"Add auth integration tests","summary":"Login regressions are hard to catch.","proposal":"Add integration tests for auth flows.","rationale":"This catches regressions earlier.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
```
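
Because each reviewer prompt embeds an exact JSON body, it can be useful to confirm that the body parses and carries the same fields as the others before launching agents. The sketch below does that for the architecture reviewer body. The key set is simply what the three bodies in this case share, not a documented schema, and `check_done_body` is a hypothetical helper.

```python
import json

# Keys observed in every finding in this case (an observation, not a schema).
FINDING_KEYS = {"title", "summary", "proposal", "rationale", "confidence", "tags", "target_refs"}


def check_done_body(raw: str, expected_role: str) -> list[str]:
    """Return a list of problems found in a reviewer `done` body (empty means OK)."""
    problems = []
    body = json.loads(raw)  # raises json.JSONDecodeError on malformed input
    if body.get("reviewer_role") != expected_role:
        problems.append("reviewer_role mismatch")
    findings = body.get("findings", [])
    if not findings:
        problems.append("no findings")
    for index, finding in enumerate(findings):
        missing = FINDING_KEYS - set(finding)
        if missing:
            problems.append(f"finding {index} missing {sorted(missing)}")
    return problems


# Body copied from the architecture reviewer prompt above.
ARCHITECTURE_BODY = (
    '{"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts",'
    '"summary":"Transport contracts are mixed into UI code.",'
    '"proposal":"Move API contract definitions into a dedicated module.",'
    '"rationale":"This lowers coupling.","confidence":"high","tags":["architecture"],'
    '"target_refs":{"repo_path":"."}}]}'
)

print(check_done_body(ARCHITECTURE_BODY, "architecture-reviewer"))  # prints []
```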
## Execution Parameters
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
## Execution Steps
1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
2. Inject `skills/council-review/` into `leader`
3. Inject `skills/inbox/` into the three reviewer agents
4. Point all agents at the same database path `TMPDIR/coord.db`
5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
6. Wait for all agents to finish
7. Independently run the validation commands from the main thread
## Validation Commands
```bash
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council wait --run council_skill_007 --timeout-seconds 2
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council tally --run council_skill_007 --similarity strict
```
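
If the main thread drives validation from a script rather than a shell, the two commands above might be wrapped as follows. This is a sketch under one assumption: that `--json` makes the CLI print a single JSON document on stdout. The helper names are hypothetical, while the flags and run id are taken from the commands above.

```python
import json
import subprocess

RUN_ID = "council_skill_007"


def orch_command(council_skill_path: str, db_path: str, *args: str) -> list[str]:
    """Build an `orch ... council <args>` invocation with JSON output enabled."""
    return [f"{council_skill_path}/assets/orch", "--db", db_path, "--json", "council", *args]


def wait_command(council_skill_path: str, db_path: str) -> list[str]:
    """Mirror of the `council wait` validation command."""
    return orch_command(council_skill_path, db_path, "wait", "--run", RUN_ID, "--timeout-seconds", "2")


def tally_command(council_skill_path: str, db_path: str) -> list[str]:
    """Mirror of the `council tally` validation command."""
    return orch_command(council_skill_path, db_path, "tally", "--run", RUN_ID, "--similarity", "strict")


def run_json(cmd: list[str]) -> dict:
    """Run a command and parse its stdout as one JSON document (assumed output shape)."""
    completed = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(completed.stdout)
```

Calling `run_json(wait_command(...))` and `run_json(tally_command(...))` would yield the two payloads that the assertions in this case inspect.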
## Expected Outcomes
- all three reviewers complete their fixed-role tasks
- `council wait` returns `all_complete == true`
- `council tally` succeeds with `similarity == "strict"`
- the two nearly identical contract proposals remain separate rather than merging
- every resulting recommendation lands in `minority`
## Assertions
- `wait.data.all_complete == true`
- `tally.data.similarity == "strict"`
- `tally.data.counts.minority == 3`
- `tally.data.grouped_recommendations` length is `3`
- every returned recommendation has `bucket == "minority"`
- the returned proposal set contains `Move API contract definitions into a dedicated module.`
- the returned proposal set contains `Move API contract definitions into dedicated module`
- the returned proposal set contains `Add integration tests for auth flows.`
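
The assertions above can be expressed as a single checker over the two parsed payloads. This is a minimal sketch: `wait` and `tally` are the decoded JSON objects, and the per-recommendation field holding the proposal text is assumed to be called `proposal` (exposed as the `proposal_key` parameter), since this case does not name it.

```python
EXPECTED_PROPOSALS = {
    "Move API contract definitions into a dedicated module.",
    "Move API contract definitions into dedicated module",
    "Add integration tests for auth flows.",
}


def check_strict_tally(wait: dict, tally: dict, proposal_key: str = "proposal") -> list[str]:
    """Return descriptions of failed assertions (an empty list means pass)."""
    failures = []
    data = tally.get("data", {})
    groups = data.get("grouped_recommendations", [])

    if wait.get("data", {}).get("all_complete") is not True:
        failures.append("wait.data.all_complete is not true")
    if data.get("similarity") != "strict":
        failures.append("tally.data.similarity is not 'strict'")
    if data.get("counts", {}).get("minority") != 3:
        failures.append("tally.data.counts.minority is not 3")
    if len(groups) != 3:
        failures.append("grouped_recommendations length is not 3")
    if any(group.get("bucket") != "minority" for group in groups):
        failures.append("a recommendation is outside the minority bucket")

    # proposal_key is an assumed field name; adjust it to the real payload.
    seen = {group.get(proposal_key) for group in groups}
    for missing in sorted(EXPECTED_PROPOSALS - seen):
        failures.append(f"missing proposal: {missing}")
    return failures
```

Returning a list of failures instead of raising on the first one lets a failed run report every violated assertion at once, which helps when inspecting a retained `coord.db`.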
## Cleanup
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
## Recorded Real Forward Run
- recorded on: `2026-03-19`
- execution mode: `real_subagent_forward_test`
- result: `pass`
- evidence root: `/tmp/council-tally-strict-keeps-distinct-proposals-through-bundled-cli.narrow4.UCbqOc`
- observed run id: `council_skill_007`
- observed thread ids:
  - `architecture-reviewer`: `thr_9e153f61692b4475a55f5c3068842ea5`
  - `implementation-reviewer`: `thr_abbd9a2961374b13b3d3e27720fe27ab`
  - `risk-reviewer`: `thr_3f2d64211f274f64b606bd8b8c6be5f7`
- evidence summary:
  - main-thread `council wait --run council_skill_007 --timeout-seconds 2 --json` returned `woke == true` and `all_complete == true`
  - main-thread `council tally --run council_skill_007 --similarity strict --json` returned `similarity == "strict"` and `counts.minority == 3`
  - the returned proposal set preserved all three distinct values, including both `Move API contract definitions into a dedicated module.` and `Move API contract definitions into dedicated module`