Case: council-tally-strict-keeps-distinct-proposals-through-bundled-cli

Test Type

This is a forward-test and a strict-similarity tally validation.

The goal is to verify that a leader using the packaged council-review skill can request --similarity strict and keep proposals that differ only in wording separate, where normal mode would merge them.

Purpose

Validate that all of the following can be true at the same time:

  • the leader can drive start -> wait -> tally through the bundled council-review skill
  • three reviewer agents can complete their tasks through the packaged inbox skill
  • the architecture and implementation reviewers can submit near-duplicate but not identical proposals
  • strict tally keeps all three proposals as separate minority groups
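The difference between the two modes can be illustrated with a minimal sketch. The normalization in `normal_key` (lowercasing, stripping punctuation, dropping articles) is an assumption made purely for illustration; this case does not document how the bundled orch CLI compares proposals, and its real logic may differ. The three proposal strings are the ones used by the reviewers in this case.

```python
import string
from collections import defaultdict

# Hypothetical stop-word list; not taken from the orch CLI.
ARTICLES = {"a", "an", "the"}

PROPOSALS = [
    "Move API contract definitions into a dedicated module.",
    "Move API contract definitions into dedicated module",
    "Add integration tests for auth flows.",
]


def strict_key(proposal: str) -> str:
    """Strict comparison in this sketch: the text must match exactly."""
    return proposal


def normal_key(proposal: str) -> str:
    """Assumed 'normal' comparison: ignore case, punctuation, and articles."""
    cleaned = proposal.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(word for word in cleaned.split() if word not in ARTICLES)


def group(proposals, key_fn):
    """Group proposals whose keys are equal, preserving first-seen order."""
    groups = defaultdict(list)
    for proposal in proposals:
        groups[key_fn(proposal)].append(proposal)
    return list(groups.values())


if __name__ == "__main__":
    print(len(group(PROPOSALS, strict_key)), "groups under the strict key")
    print(len(group(PROPOSALS, normal_key)), "groups under the assumed normal key")
```

Under these assumptions the two contract proposals differ only by the article "a" and the trailing period, which is exactly the kind of wording-level difference this case expects strict mode to preserve.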

Preconditions

  • council-review skill path exists: COUNCIL_SKILL_PATH=skills/council-review
  • inbox skill path exists: INBOX_SKILL_PATH=skills/inbox
  • bundled CLI executables exist at COUNCIL_SKILL_PATH/assets/orch and INBOX_SKILL_PATH/assets/inbox
  • use an empty temporary directory TMPDIR
  • before launching role agents, initialize TMPDIR/coord.db by running INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init
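The path and directory preconditions above could be checked from the main thread before the init command runs. The following is a sketch only: `missing_preconditions` is a hypothetical helper, not part of either skill, and it checks that the files exist rather than that they are executable.

```python
from pathlib import Path


def missing_preconditions(council_skill_path: str, inbox_skill_path: str, tmpdir: str) -> list[str]:
    """Return unmet preconditions; an empty list means it is safe to run init."""
    council = Path(council_skill_path)
    inbox = Path(inbox_skill_path)
    tmp = Path(tmpdir)
    problems = []

    # Both skill directories must exist.
    for label, skill_dir in (("council-review", council), ("inbox", inbox)):
        if not skill_dir.is_dir():
            problems.append(f"{label} skill path missing: {skill_dir}")

    # Both bundled CLIs must be present under assets/.
    for exe in (council / "assets" / "orch", inbox / "assets" / "inbox"):
        if not exe.is_file():
            problems.append(f"bundled CLI missing: {exe}")

    # TMPDIR must exist and be empty before coord.db is created in it.
    if not tmp.is_dir():
        problems.append(f"TMPDIR missing: {tmp}")
    elif any(tmp.iterdir()):
        problems.append(f"TMPDIR is not empty: {tmp}")

    return problems
```

Run the check before initializing the database, since init creates coord.db inside TMPDIR and the directory is no longer empty afterwards.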

Agent Topology

  • leader
  • architecture-reviewer
  • implementation-reviewer
  • risk-reviewer

Inputs

Leader Prompt

Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_007 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) tally with --similarity strict, 4) stop after reporting RUN_ID, tally counts, and the grouped proposals you observed. Do not use ordinary chat to coordinate with the reviewers.

Architecture Reviewer Prompt

Use $inbox at INBOX_SKILL_PATH to act as architecture-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This lowers coupling.","confidence":"high","tags":["architecture"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.

Implementation Reviewer Prompt

Use $inbox at INBOX_SKILL_PATH to act as implementation-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract API contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"medium","tags":["maintainability"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.

Risk Reviewer Prompt

Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"risk-reviewer","findings":[{"title":"Add auth integration tests","summary":"Login regressions are hard to catch.","proposal":"Add integration tests for auth flows.","rationale":"This catches regressions earlier.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
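Because the assertions later in this case depend on the exact proposal strings, it can help to sanity-check each JSON body before pasting it into a prompt. The sketch below is an illustration under assumptions: `proposals_from_done_body` is a hypothetical helper, and the field list simply mirrors the three bodies above. Whether the inbox CLI enforces any of these fields is not stated in this case.

```python
import json

# Field names observed in the three reviewer bodies of this case.
FINDING_FIELDS = {
    "title", "summary", "proposal", "rationale",
    "confidence", "tags", "target_refs",
}

RISK_BODY = (
    '{"reviewer_role":"risk-reviewer","findings":[{"title":"Add auth integration tests",'
    '"summary":"Login regressions are hard to catch.",'
    '"proposal":"Add integration tests for auth flows.",'
    '"rationale":"This catches regressions earlier.","confidence":"high",'
    '"tags":["risk"],"target_refs":{"repo_path":"."}}]}'
)


def proposals_from_done_body(raw: str, expected_role: str) -> list[str]:
    """Parse a done body and return its proposal strings, or raise ValueError."""
    body = json.loads(raw)
    if body.get("reviewer_role") != expected_role:
        raise ValueError(f"unexpected reviewer_role: {body.get('reviewer_role')!r}")

    proposals = []
    for finding in body.get("findings", []):
        missing = FINDING_FIELDS - finding.keys()
        if missing:
            raise ValueError(f"finding is missing fields: {sorted(missing)}")
        proposals.append(finding["proposal"])
    return proposals


if __name__ == "__main__":
    print(proposals_from_done_body(RISK_BODY, "risk-reviewer"))
```

A check like this catches an accidentally edited body, such as a trailing period added to or dropped from a proposal, before it changes the strict tally result.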

Execution Parameters

  • use the shared execution contract from README.md
  • use the shared timeout defaults from README.md
  • do not override the default cleanup policy

Execution Steps

  1. Initialize TMPDIR/coord.db once through the bundled inbox CLI before launching agents
  2. Inject skills/council-review/ into the leader
  3. Inject skills/inbox/ into the three reviewer agents
  4. Point all agents at the same database path TMPDIR/coord.db
  5. Launch leader, architecture-reviewer, implementation-reviewer, and risk-reviewer in parallel
  6. Wait for all agents to finish
  7. Independently run the validation commands from the main thread

Validation Commands

COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council wait --run council_skill_007 --timeout-seconds 2
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council tally --run council_skill_007 --similarity strict
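The two commands above can be driven from a script so that their output is available for the assertions. This is a sketch under two assumptions that this case does not confirm: that `--json` makes the CLI print a single JSON document on stdout, and that a failed command exits with a non-zero status. `build_orch_command` and `run_json` are hypothetical helper names.

```python
import json
import subprocess


def build_orch_command(council_skill_path: str, db_path: str, *args: str) -> list[str]:
    """Build the argument list for the bundled orch CLI, using only flags shown in this case."""
    return [f"{council_skill_path}/assets/orch", "--db", db_path, "--json", *args]


def run_json(command: list[str]) -> dict:
    """Run a command and parse stdout as JSON (assumes one JSON document on stdout)."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


def run_validation(council_skill_path: str, db_path: str, run_id: str) -> tuple[dict, dict]:
    """Run the wait and tally commands from this case and return both parsed results."""
    wait = run_json(build_orch_command(
        council_skill_path, db_path,
        "council", "wait", "--run", run_id, "--timeout-seconds", "2",
    ))
    tally = run_json(build_orch_command(
        council_skill_path, db_path,
        "council", "tally", "--run", run_id, "--similarity", "strict",
    ))
    return wait, tally
```

For this case the call would be `run_validation(COUNCIL_SKILL_PATH, TMPDIR + "/coord.db", "council_skill_007")`, with the placeholders replaced by real paths.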

Expected Outcomes

  • all three reviewers complete their fixed-role tasks
  • council wait returns all_complete == true
  • council tally succeeds with similarity == "strict"
  • the two nearly identical contract proposals remain separate rather than merging
  • every resulting recommendation lands in minority

Assertions

  • wait.data.all_complete == true
  • tally.data.similarity == "strict"
  • tally.data.counts.minority == 3
  • tally.data.grouped_recommendations length is 3
  • every returned recommendation has bucket == "minority"
  • the returned proposal set contains "Move API contract definitions into a dedicated module." (with article and trailing period)
  • the returned proposal set contains "Move API contract definitions into dedicated module" (no article, no trailing period)
  • the returned proposal set contains "Add integration tests for auth flows."
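The assertions above can be expressed as a single check over the parsed wait and tally results. This is a sketch, not the project's test harness: `strict_tally_failures` is a hypothetical helper, and it assumes each entry in `grouped_recommendations` exposes its text under a `proposal` key, which this case does not specify. The `data`, `counts`, `similarity`, and `bucket` names come from the assertions themselves.

```python
EXPECTED_PROPOSALS = {
    "Move API contract definitions into a dedicated module.",
    "Move API contract definitions into dedicated module",
    "Add integration tests for auth flows.",
}


def strict_tally_failures(wait: dict, tally: dict) -> list[str]:
    """Return a message for each failed assertion; an empty list means the case passed."""
    failures = []
    if wait["data"]["all_complete"] is not True:
        failures.append("wait.data.all_complete is not true")

    data = tally["data"]
    if data["similarity"] != "strict":
        failures.append("tally.data.similarity is not 'strict'")
    if data["counts"]["minority"] != 3:
        failures.append("tally.data.counts.minority is not 3")

    groups = data["grouped_recommendations"]
    if len(groups) != 3:
        failures.append("grouped_recommendations does not have 3 entries")
    if not all(group["bucket"] == "minority" for group in groups):
        failures.append("not every recommendation is in the minority bucket")

    # Assumed field name: "proposal". Adjust to the real tally output.
    returned = {group.get("proposal") for group in groups}
    for proposal in sorted(EXPECTED_PROPOSALS - returned):
        failures.append(f"missing proposal: {proposal!r}")
    return failures
```

Returning a list of messages rather than stopping at the first failed assertion shows every mismatch in one pass, which is useful when replaying a retained coord.db.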

Cleanup

  • use the default cleanup policy from README.md
  • if the run fails, retain TMPDIR and coord.db for replay and manual inspection