Case: council-tally-strict-keeps-distinct-proposals-through-bundled-cli
Test Type
This is a forward-test and a strict-similarity tally validation.
The goal is to verify that a leader using the packaged council-review skill can request --similarity strict and preserve wording-level differences between proposals that normal mode would collapse into one group.
Purpose
Validate that all of the following can be true at the same time:
- the leader can drive start -> wait -> tally through the bundled council-review skill
- three reviewer agents can complete their tasks through the packaged inbox skill
- the architecture and implementation reviewers can submit near-duplicate but not identical proposals
- strict tally keeps all three proposals as separate minority groups
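The last two points depend on the two contract proposals differing only by an article and a trailing period. The sketch below illustrates how exact-text grouping keeps them apart while a looser grouping merges them. Both key functions are hypothetical stand-ins for illustration; the normalization that orch actually applies in either mode is not documented in this case.

```python
import re
from collections import defaultdict

# Proposal texts taken verbatim from the reviewer prompts in this case.
PROPOSALS = [
    "Move API contract definitions into a dedicated module.",
    "Move API contract definitions into dedicated module",
    "Add integration tests for auth flows.",
]


def exact_key(text: str) -> str:
    """Hypothetical strict key: the proposal text, unchanged."""
    return text


def loose_key(text: str) -> str:
    """Hypothetical loose key: lowercase, no punctuation, no articles."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(w for w in words if w not in {"a", "an", "the"})


def group(proposals, key):
    """Bucket proposals that share the same key."""
    groups = defaultdict(list)
    for proposal in proposals:
        groups[key(proposal)].append(proposal)
    return list(groups.values())


strict_groups = group(PROPOSALS, exact_key)
loose_groups = group(PROPOSALS, loose_key)
print(len(strict_groups), len(loose_groups))  # prints "3 2"
```

Under these assumed keys, exact matching yields three single-member groups, which is the shape this case expects from strict tally.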
Preconditions
- council-review skill path exists: COUNCIL_SKILL_PATH=skills/council-review
- inbox skill path exists: INBOX_SKILL_PATH=skills/inbox
- bundled CLI executables exist at COUNCIL_SKILL_PATH/assets/orch and INBOX_SKILL_PATH/assets/inbox
- use an empty temporary directory TMPDIR
- initialize TMPDIR/coord.db before launching role agents through INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init
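A harness could check these preconditions before spending time on agent launches. The following is a minimal sketch, assuming a POSIX-style filesystem; the helper names (init_command, missing_preconditions, init_db) are hypothetical and not part of either skill. Only the init invocation itself comes from the list above.

```python
import os
import subprocess
from pathlib import Path


def init_command(inbox_skill_path: str, tmpdir: str) -> list[str]:
    """Build the init invocation listed in the preconditions."""
    return [
        str(Path(inbox_skill_path) / "assets" / "inbox"),
        "--db", str(Path(tmpdir) / "coord.db"),
        "--json", "init",
    ]


def missing_preconditions(council_skill_path: str,
                          inbox_skill_path: str,
                          tmpdir: str) -> list[str]:
    """Return human-readable problems; an empty list means ready to init."""
    problems = []
    executables = (
        Path(council_skill_path) / "assets" / "orch",
        Path(inbox_skill_path) / "assets" / "inbox",
    )
    for exe in executables:
        if not (exe.is_file() and os.access(exe, os.X_OK)):
            problems.append(f"missing or non-executable CLI: {exe}")
    tmp = Path(tmpdir)
    if not tmp.is_dir():
        problems.append(f"TMPDIR is not a directory: {tmp}")
    elif any(tmp.iterdir()):
        problems.append(f"TMPDIR is not empty: {tmp}")
    return problems


def init_db(inbox_skill_path: str, tmpdir: str) -> None:
    """Run the init command once; raises CalledProcessError on failure."""
    subprocess.run(init_command(inbox_skill_path, tmpdir), check=True)
```

Calling missing_preconditions first and init_db only when it returns an empty list keeps a misconfigured path from surfacing later as an agent failure.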
Agent Topology
- leader
- architecture-reviewer
- implementation-reviewer
- risk-reviewer
Inputs
Leader Prompt
Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_007 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) tally with --similarity strict, 4) stop after reporting RUN_ID, tally counts, and the grouped proposals you observed. Do not use ordinary chat to coordinate with the reviewers.
Architecture Reviewer Prompt
Use $inbox at INBOX_SKILL_PATH to act as architecture-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This lowers coupling.","confidence":"high","tags":["architecture"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
Implementation Reviewer Prompt
Use $inbox at INBOX_SKILL_PATH to act as implementation-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract API contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"medium","tags":["maintainability"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
Risk Reviewer Prompt
Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"risk-reviewer","findings":[{"title":"Add auth integration tests","summary":"Login regressions are hard to catch.","proposal":"Add integration tests for auth flows.","rationale":"This catches regressions earlier.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
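Because the reviewer prompts require an exact JSON body, a harness may prefer to generate the bodies rather than hand-copy them. The sketch below is one way to do that, assuming the compact, space-free serialization seen in the prompts; finding_body and compact are hypothetical helper names.

```python
import json


def finding_body(role, title, summary, proposal, rationale, confidence, tag):
    """Build one reviewer body with the key order used in the prompts."""
    return {
        "reviewer_role": role,
        "findings": [{
            "title": title,
            "summary": summary,
            "proposal": proposal,
            "rationale": rationale,
            "confidence": confidence,
            "tags": [tag],
            "target_refs": {"repo_path": "."},
        }],
    }


def compact(body) -> str:
    """Serialize without spaces after separators, matching the prompt text."""
    return json.dumps(body, separators=(",", ":"))


risk_body = finding_body(
    "risk-reviewer",
    "Add auth integration tests",
    "Login regressions are hard to catch.",
    "Add integration tests for auth flows.",
    "This catches regressions earlier.",
    "high",
    "risk",
)
print(compact(risk_body))
```

Generating the two contract proposals the same way makes the intended difference (the article and the trailing period) explicit in the harness source instead of hidden in a long prompt string.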
Execution Parameters
- use the shared execution contract from README.md
- use the shared timeout defaults from README.md
- do not override the default cleanup policy
Execution Steps
- Initialize TMPDIR/coord.db once through the bundled inbox CLI before launching agents
- Inject skills/council-review/ into leader
- Inject skills/inbox/ into the three reviewer agents
- Point all agents at the same database path TMPDIR/coord.db
- Launch leader, architecture-reviewer, implementation-reviewer, and risk-reviewer in parallel
- Wait for all agents to finish
- Independently run the validation commands from the main thread
Validation Commands
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council wait --run council_skill_007 --timeout-seconds 2
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council tally --run council_skill_007 --similarity strict
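The two commands above can also be driven from a script. This is a minimal sketch: the helper names are hypothetical, and run_json assumes that --json makes the CLI print a single JSON document on stdout, which this case does not state explicitly.

```python
import json
import subprocess

RUN_ID = "council_skill_007"


def orch_command(council_skill_path: str, tmpdir: str, *council_args: str) -> list[str]:
    """Build an orch council invocation against the shared database."""
    return [
        f"{council_skill_path}/assets/orch",
        "--db", f"{tmpdir}/coord.db",
        "--json", "council", *council_args,
    ]


def wait_command(council_skill_path: str, tmpdir: str) -> list[str]:
    """First validation command: short wait on the finished run."""
    return orch_command(council_skill_path, tmpdir,
                        "wait", "--run", RUN_ID, "--timeout-seconds", "2")


def tally_command(council_skill_path: str, tmpdir: str) -> list[str]:
    """Second validation command: tally with strict similarity."""
    return orch_command(council_skill_path, tmpdir,
                        "tally", "--run", RUN_ID, "--similarity", "strict")


def run_json(argv: list[str]) -> dict:
    """Run a command and parse stdout as JSON (assumed output shape)."""
    completed = subprocess.run(argv, check=True, capture_output=True, text=True)
    return json.loads(completed.stdout)
```

Keeping command construction separate from execution lets the argument lists be checked without the bundled CLI being present.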
Expected Outcomes
- all three reviewers complete their fixed-role tasks
- council wait returns all_complete == true
- council tally succeeds with similarity == "strict"
- the two nearly identical contract proposals remain separate rather than merging
- every resulting recommendation lands in minority
Assertions
- wait.data.all_complete == true
- tally.data.similarity == "strict"
- tally.data.counts.minority == 3
- tally.data.grouped_recommendations length is 3
- every returned recommendation has bucket == "minority"
- the returned proposal set contains "Move API contract definitions into a dedicated module."
- the returned proposal set contains "Move API contract definitions into dedicated module"
- the returned proposal set contains "Add integration tests for auth flows."
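The assertions above translate directly into a checker over the parsed wait and tally payloads. The sketch below makes two assumptions that this case does not confirm: that each entry of grouped_recommendations carries its own bucket field, and that the proposal text is exposed under a key named proposal. Adjust the lookups if the real payload nests them differently.

```python
EXPECTED_PROPOSALS = {
    "Move API contract definitions into a dedicated module.",
    "Move API contract definitions into dedicated module",
    "Add integration tests for auth flows.",
}


def check_strict_tally(wait: dict, tally: dict) -> None:
    """Raise AssertionError if any assertion from this case fails."""
    assert wait["data"]["all_complete"] is True

    data = tally["data"]
    assert data["similarity"] == "strict"
    assert data["counts"]["minority"] == 3

    groups = data["grouped_recommendations"]
    assert len(groups) == 3
    # Assumption: bucket lives on each grouped recommendation.
    assert all(g["bucket"] == "minority" for g in groups)

    # Assumption: each group exposes its text under "proposal".
    proposals = {g["proposal"] for g in groups}
    assert EXPECTED_PROPOSALS <= proposals
```

Comparing against a set of exact strings means a tally that merged the two contract proposals fails twice: once on the group count and once on the missing wording.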
Cleanup
- use the default cleanup policy from README.md
- if the run fails, retain TMPDIR and coord.db for replay and manual inspection
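The retain-on-failure rule can be expressed as a context manager around the whole run. This is only a sketch: it assumes the default cleanup policy removes TMPDIR after a passing run, which is defined in README.md and not restated here, and case_tmpdir is a hypothetical helper name.

```python
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def case_tmpdir(prefix: str = "council-tally-strict."):
    """Yield a fresh TMPDIR; delete it on success, keep it on failure."""
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    except BaseException:
        # Failure path: leave TMPDIR and coord.db in place for replay.
        print(f"run failed; retaining {path} for manual inspection")
        raise
    else:
        # Assumed default policy: remove the directory after a passing run.
        shutil.rmtree(path)
```

Wrapping initialization, agent launch, and validation in a single with block keeps the retained directory consistent with the state the failing assertion actually saw.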
Recorded Real Forward Run
- recorded on: 2026-03-19
- execution mode: real_subagent_forward_test
- result: pass
- evidence root: /tmp/council-tally-strict-keeps-distinct-proposals-through-bundled-cli.narrow4.UCbqOc
- observed run id: council_skill_007
- observed thread ids:
  - architecture-reviewer: thr_9e153f61692b4475a55f5c3068842ea5
  - implementation-reviewer: thr_abbd9a2961374b13b3d3e27720fe27ab
  - risk-reviewer: thr_3f2d64211f274f64b606bd8b8c6be5f7
- evidence summary:
  - main-thread council wait --run council_skill_007 --timeout-seconds 2 --json returned woke == true and all_complete == true
  - main-thread council tally --run council_skill_007 --similarity strict --json returned similarity == "strict" and counts.minority == 3
  - the returned proposal set preserved all three distinct values, including both "Move API contract definitions into a dedicated module." and "Move API contract definitions into dedicated module"