kurihada/ai-workflow-skill

kurihada d17b5ebfbd Add council-review skill gap-fill test plans

2026-03-19 17:59:58 +08:00

Case: `council-tally-strict-keeps-distinct-proposals-through-bundled-cli`

Test Type

This is a forward-test and a strict-similarity tally validation.

The goal is to verify two things. First, a leader using the packaged council-review skill can request --similarity strict. Second, the strict tally keeps proposals that differ only in wording as separate groups, even though those proposals would collapse into one group in normal mode.

Purpose

Validate that all of the following can be true at the same time:

the leader can drive start -> wait -> tally through the bundled council-review skill
three reviewer agents can complete their tasks through the packaged inbox skill
the architecture and implementation reviewers can submit near-duplicate but not identical proposals
strict tally keeps all three proposals as separate minority groups

Preconditions

council-review skill path exists: COUNCIL_SKILL_PATH=skills/council-review
inbox skill path exists: INBOX_SKILL_PATH=skills/inbox
bundled CLI executables exist at COUNCIL_SKILL_PATH/assets/orch and INBOX_SKILL_PATH/assets/inbox
use an empty temporary directory TMPDIR
initialize TMPDIR/coord.db once with INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init before launching any role agents
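The preconditions can be scripted. Below is a minimal Python sketch that assumes it runs from the repository root, using the placeholder paths skills/council-review and skills/inbox. The helper names missing_executables, init_argv, and prepare_db are hypothetical and are not part of either skill. The sketch uses only the init command quoted above, and it assumes that command exits non-zero on failure.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Placeholder skill locations from the Preconditions (relative to the repo root).
COUNCIL_SKILL_PATH = Path("skills/council-review")
INBOX_SKILL_PATH = Path("skills/inbox")


def missing_executables(council_skill: Path, inbox_skill: Path) -> list[Path]:
    """Return the bundled CLI paths that are missing or not executable."""
    required = [council_skill / "assets" / "orch", inbox_skill / "assets" / "inbox"]
    return [p for p in required if not (p.is_file() and os.access(p, os.X_OK))]


def init_argv(inbox_skill: Path, tmpdir: Path) -> list[str]:
    """Build: INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init"""
    return [str(inbox_skill / "assets" / "inbox"), "--db", str(tmpdir / "coord.db"), "--json", "init"]


def prepare_db(council_skill: Path = COUNCIL_SKILL_PATH,
               inbox_skill: Path = INBOX_SKILL_PATH) -> Path:
    """Check the bundled CLIs, create an empty TMPDIR, and initialize coord.db once."""
    missing = missing_executables(council_skill, inbox_skill)
    if missing:
        raise FileNotFoundError(f"bundled CLI not found or not executable: {missing}")
    tmpdir = Path(tempfile.mkdtemp(prefix="council-strict-"))  # new, empty directory
    # check=True raises CalledProcessError if init exits non-zero (assumed failure signal).
    subprocess.run(init_argv(inbox_skill, tmpdir), check=True, capture_output=True, text=True)
    return tmpdir
```

Calling prepare_db() once, before any agent is launched, matches the "initialize once" rule. The returned directory is the TMPDIR that every agent receives.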

Agent Topology

leader
architecture-reviewer
implementation-reviewer
risk-reviewer

Inputs

Leader Prompt

Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_007 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) tally with --similarity strict, 4) stop after reporting RUN_ID, tally counts, and the grouped proposals you observed. Do not use ordinary chat to coordinate with the reviewers.

Architecture Reviewer Prompt

Use $inbox at INBOX_SKILL_PATH to act as architecture-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This lowers coupling.","confidence":"high","tags":["architecture"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.

Implementation Reviewer Prompt

Use $inbox at INBOX_SKILL_PATH to act as implementation-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract API contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"medium","tags":["maintainability"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.

Risk Reviewer Prompt

Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"risk-reviewer","findings":[{"title":"Add auth integration tests","summary":"Login regressions are hard to catch.","proposal":"Add integration tests for auth flows.","rationale":"This catches regressions earlier.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
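The three done bodies make up the whole fixture, so it is worth seeing why they work as a probe. The Python sketch below does three things: it parses the exact bodies, checks that each body names its own role, and compares the proposals under two rules. The first rule is exact string equality. The second is a hypothetical loose normalization that lowercases the text, keeps alphanumeric words, and drops articles. The loose rule is only an illustration. This plan does not specify the similarity rule that orch actually uses in normal mode.

```python
import json
import re

# The exact done bodies from the three reviewer prompts, copied verbatim.
DONE_BODIES = {
    "architecture-reviewer": '{"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This lowers coupling.","confidence":"high","tags":["architecture"],"target_refs":{"repo_path":"."}}]}',
    "implementation-reviewer": '{"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract API contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"medium","tags":["maintainability"],"target_refs":{"repo_path":"."}}]}',
    "risk-reviewer": '{"reviewer_role":"risk-reviewer","findings":[{"title":"Add auth integration tests","summary":"Login regressions are hard to catch.","proposal":"Add integration tests for auth flows.","rationale":"This catches regressions earlier.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}}]}',
}


def proposals(bodies: dict[str, str]) -> list[str]:
    """Parse each body, check it names its own role, and collect every proposal."""
    found = []
    for role, raw in bodies.items():
        body = json.loads(raw)
        if body["reviewer_role"] != role:
            raise ValueError(f"{role} body declares {body['reviewer_role']!r}")
        found.extend(finding["proposal"] for finding in body["findings"])
    return found


def loose_key(text: str) -> str:
    """HYPOTHETICAL loose normalization for illustration, not orch's normal mode:
    lowercase, keep alphanumeric words, and drop the articles a/an/the."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(w for w in words if w not in {"a", "an", "the"})


props = proposals(DONE_BODIES)
print(len(set(props)))                     # → 3 (exact strings stay distinct)
print(len({loose_key(p) for p in props}))  # → 2 (the two contract proposals merge)
```

Under any rule that ignores articles and trailing punctuation, the architecture and implementation proposals become identical. That is why the strict tally is expected to keep them apart.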

Execution Parameters

use the shared execution contract from README.md
use the shared timeout defaults from README.md
do not override the default cleanup policy

Execution Steps

Initialize TMPDIR/coord.db once through the bundled inbox CLI before launching agents
Inject skills/council-review/ into the leader
Inject skills/inbox/ into the three reviewer agents
Point all agents at the same database path TMPDIR/coord.db
Launch leader, architecture-reviewer, implementation-reviewer, and risk-reviewer in parallel
Wait for all agents to finish
Independently run the validation commands from the main thread

Validation Commands

COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council wait --run council_skill_007 --timeout-seconds 2
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council tally --run council_skill_007 --similarity strict
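A small Python wrapper can run the two validation commands from the main thread. This is a sketch only: validation_argvs and run_validation are hypothetical helper names, and the argument lists are copied from the commands above. It assumes that --json makes orch print a single JSON document on stdout and that orch exits non-zero on failure.

```python
import json
import subprocess
from pathlib import Path

RUN_ID = "council_skill_007"


def validation_argvs(council_skill: Path, tmpdir: Path, run_id: str = RUN_ID) -> dict[str, list[str]]:
    """Build the wait and tally commands exactly as listed in Validation Commands."""
    orch = str(council_skill / "assets" / "orch")
    base = [orch, "--db", str(tmpdir / "coord.db"), "--json", "council"]
    return {
        "wait": base + ["wait", "--run", run_id, "--timeout-seconds", "2"],
        "tally": base + ["tally", "--run", run_id, "--similarity", "strict"],
    }


def run_validation(council_skill: Path, tmpdir: Path) -> dict[str, dict]:
    """Run wait, then tally, and return each command's parsed JSON output."""
    results = {}
    for name, argv in validation_argvs(council_skill, tmpdir).items():
        # check=True raises CalledProcessError on a non-zero exit (assumed failure signal).
        proc = subprocess.run(argv, capture_output=True, text=True, check=True)
        results[name] = json.loads(proc.stdout)  # assumes one JSON document on stdout
    return results
```

For example, run_validation(Path("skills/council-review"), tmpdir) returns a dict with "wait" and "tally" entries that can be checked against the Assertions.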

Expected Outcomes

all three reviewers complete their fixed-role tasks
council wait returns all_complete == true
council tally succeeds with similarity == "strict"
the two nearly identical contract proposals remain separate rather than merging
every resulting recommendation lands in the minority bucket

Assertions

wait.data.all_complete == true
tally.data.similarity == "strict"
tally.data.counts.minority == 3
tally.data.grouped_recommendations length is 3
every returned recommendation has bucket == "minority"
the returned proposal set contains "Move API contract definitions into a dedicated module."
the returned proposal set contains "Move API contract definitions into dedicated module"
the returned proposal set contains "Add integration tests for auth flows."
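The assertions map directly onto checks over the parsed wait and tally JSON. In this Python sketch, the key paths come from the list above. Two things are assumptions: the "proposal" key on each grouped recommendation, and the helper name failed_assertions. The plan does not spell out the recommendation schema.

```python
# Proposal strings the Assertions section requires, including exact punctuation.
EXPECTED_PROPOSALS = {
    "Move API contract definitions into a dedicated module.",
    "Move API contract definitions into dedicated module",
    "Add integration tests for auth flows.",
}


def failed_assertions(wait: dict, tally: dict) -> list[str]:
    """Return the label of every Assertions check that fails (empty list means pass)."""
    data = tally.get("data", {})
    recs = data.get("grouped_recommendations", [])
    checks = [
        (wait.get("data", {}).get("all_complete") is True, "wait.data.all_complete == true"),
        (data.get("similarity") == "strict", 'tally.data.similarity == "strict"'),
        (data.get("counts", {}).get("minority") == 3, "tally.data.counts.minority == 3"),
        (len(recs) == 3, "tally.data.grouped_recommendations length is 3"),
        (all(rec.get("bucket") == "minority" for rec in recs), 'every bucket == "minority"'),
        # ASSUMPTION: each grouped recommendation carries its text under "proposal".
        (EXPECTED_PROPOSALS <= {rec.get("proposal") for rec in recs}, "expected proposals present"),
    ]
    return [label for ok, label in checks if not ok]
```

The function reports every failing check instead of stopping at the first one. That keeps the full picture available when a failed run's TMPDIR is retained for replay.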

Cleanup

use the default cleanup policy from README.md
if the run fails, retain TMPDIR and coord.db for replay and manual inspection