Case: council-brainstorm-end-to-end-through-bundled-cli
Test Type
This is a forward-test and a high-level council workflow validation.
The goal is to verify that a leader using the packaged council-review skill can drive council start -> wait -> tally -> report while three real reviewer agents return structured outputs through the packaged inbox skill.
Purpose
Validate that all of the following can be true at the same time:
- the leader can use the bundled ./assets/orch CLI through the council-review skill
- three reviewer agents can claim and complete their fixed-role inbox tasks
- the leader can wait, tally, and report after all reviewer outputs arrive
- the final report defaults to consensus,majority
- a markdown report artifact is written
Preconditions
- council-review skill path exists: COUNCIL_SKILL_PATH=skills/council-review
- inbox skill path exists: INBOX_SKILL_PATH=skills/inbox
- bundled CLI executables exist at COUNCIL_SKILL_PATH/assets/orch and INBOX_SKILL_PATH/assets/inbox
- use an empty temporary directory TMPDIR
- initialize TMPDIR/coord.db before launching role agents through INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init
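The filesystem preconditions above can be checked mechanically before any agent is launched. The following is a minimal sketch, not part of the bundled CLIs: check_preconditions is a hypothetical helper name, and it only inspects the filesystem (it does not run inbox init).

```python
import os
from pathlib import Path


def check_preconditions(council_skill_path, inbox_skill_path, tmpdir):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper: mirrors the Preconditions list, nothing more.
    """
    problems = []
    council = Path(council_skill_path)
    inbox = Path(inbox_skill_path)
    tmp = Path(tmpdir)

    # Both skill directories must exist.
    for label, path in (("council-review skill", council), ("inbox skill", inbox)):
        if not path.is_dir():
            problems.append(f"{label} path missing: {path}")

    # The bundled CLIs must exist and be executable.
    for cli in (council / "assets" / "orch", inbox / "assets" / "inbox"):
        if not (cli.is_file() and os.access(cli, os.X_OK)):
            problems.append(f"bundled CLI not executable: {cli}")

    # TMPDIR must exist and be empty before coord.db is initialized.
    if not tmp.is_dir():
        problems.append(f"TMPDIR missing: {tmp}")
    elif any(tmp.iterdir()):
        problems.append(f"TMPDIR not empty: {tmp}")

    return problems
```

Run it before the init step; once coord.db has been created, TMPDIR is no longer empty and the last check is expected to fail.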
Agent Topology
- leader
- architecture-reviewer
- implementation-reviewer
- risk-reviewer
Inputs
Leader Prompt
Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_001 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) tally with normal similarity, 4) report with default settings, 5) stop after reporting RUN_ID and REPORT_PATH. Do not use ordinary chat to coordinate with the reviewers.
Architecture Reviewer Prompt
Use $inbox at INBOX_SKILL_PATH to act as architecture-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This lowers coupling.","confidence":"high","tags":["architecture"],"target_refs":{"repo_path":"."}},{"title":"Share helpers","summary":"Council report rendering paths are repeated.","proposal":"Introduce shared council coordinator helpers for report rendering.","rationale":"This keeps report assembly consistent.","confidence":"medium","tags":["reporting"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
Implementation Reviewer Prompt
Use $inbox at INBOX_SKILL_PATH to act as implementation-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"high","tags":["maintainability"],"target_refs":{"repo_path":"."}},{"title":"Reuse report helpers","summary":"Formatting logic should stay shared.","proposal":"Introduce shared council coordinator helpers for report rendering","rationale":"This avoids formatter drift.","confidence":"medium","tags":["reporting"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
Risk Reviewer Prompt
Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"risk-reviewer","findings":[{"title":"Lock contracts","summary":"Contract drift becomes risky over time.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This reduces integration regressions.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}},{"title":"Cover JSON output","summary":"The council report response should stay stable.","proposal":"Add regression tests for council report JSON output.","rationale":"This catches contract regressions earlier.","confidence":"high","tags":["testing"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
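The three JSON bodies above share one shape, so a harness can sanity-check a body before handing it to an agent. The sketch below assumes the key set seen in those three bodies is the full required set; that is inferred from the examples, not from a published schema, and validate_reviewer_payload is a hypothetical name.

```python
# Keys that appear on every finding in the three reviewer prompts.
# Assumption: treated here as required; the real schema may differ.
REQUIRED_FINDING_KEYS = {
    "title", "summary", "proposal", "rationale",
    "confidence", "tags", "target_refs",
}


def validate_reviewer_payload(payload, expected_role):
    """Return a list of problems found in one reviewer body (empty if none)."""
    errors = []
    if payload.get("reviewer_role") != expected_role:
        errors.append(
            f"reviewer_role is {payload.get('reviewer_role')!r}, "
            f"expected {expected_role!r}"
        )

    findings = payload.get("findings")
    if not isinstance(findings, list) or not findings:
        errors.append("findings must be a non-empty list")
        return errors

    for index, finding in enumerate(findings):
        missing = REQUIRED_FINDING_KEYS - set(finding)
        if missing:
            errors.append(f"findings[{index}] missing keys: {sorted(missing)}")
    return errors
```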
Execution Parameters
- use the shared execution contract from README.md
- use the shared timeout defaults from README.md
- do not override the default cleanup policy
Execution Steps
- Initialize TMPDIR/coord.db once through the bundled inbox CLI before launching agents
- Inject skills/council-review/ into leader
- Inject skills/inbox/ into the three reviewer agents
- Point all agents at the same database path TMPDIR/coord.db
- Launch leader, architecture-reviewer, implementation-reviewer, and risk-reviewer in parallel
- Wait for all agents to finish
- Resolve RUN_ID=council_skill_001, reviewer THREAD_IDs, and REPORT_PATH from the agent outputs
- Independently run the validation commands from the main thread
Validation Commands
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run council_skill_001
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council report --run council_skill_001
test -f REPORT_PATH
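When the validation commands are driven from a script rather than typed by hand, they can be assembled as argument lists and their output parsed. This is a sketch under one assumption: that --json makes each command print a single JSON document on stdout, which the assertions on status.data and report.data suggest but this case does not state. validation_commands and run_json are hypothetical helper names.

```python
import json
import subprocess


def validation_commands(council_skill_path, db_path, run_id):
    """Build the two orch invocations listed under Validation Commands."""
    orch = f"{council_skill_path}/assets/orch"
    base = [orch, "--db", db_path, "--json"]
    return {
        "status": base + ["status", "--run", run_id],
        "report": base + ["council", "report", "--run", run_id],
    }


def run_json(argv):
    """Run one command and parse its stdout as JSON.

    Raises subprocess.CalledProcessError on a non-zero exit and
    json.JSONDecodeError if stdout is not valid JSON (assumed format).
    """
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

The third command, test -f REPORT_PATH, needs no subprocess; a file-existence check in the harness covers it.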
Expected Outcomes
- the leader successfully starts council_skill_001
- all three reviewers complete their fixed-role tasks
- council wait returns all_complete == true
- council tally returns one consensus, one majority, and one minority
- council report defaults to showing consensus,majority
- a markdown report artifact exists on disk
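The expected tally follows from how the six proposals overlap: one proposal is shared by all three reviewers, one by two, and one appears only once. The sketch below illustrates that arithmetic with token-set Jaccard similarity and a 0.8 threshold. Both are stand-ins chosen for illustration; this case does not specify how the "normal similarity" setting of council tally is computed, and the labelling rule (all reviewers = consensus, more than half = majority, otherwise minority) is inferred from the expected counts.

```python
import re

# (reviewer role, proposal text) copied from the three reviewer prompts.
PROPOSALS = [
    ("architecture-reviewer", "Move API contract definitions into a dedicated module."),
    ("architecture-reviewer", "Introduce shared council coordinator helpers for report rendering."),
    ("implementation-reviewer", "Move API contract definitions into dedicated module"),
    ("implementation-reviewer", "Introduce shared council coordinator helpers for report rendering"),
    ("risk-reviewer", "Move API contract definitions into a dedicated module."),
    ("risk-reviewer", "Add regression tests for council report JSON output."),
]


def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed stand-in metric)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def tally(proposals, reviewer_count, threshold=0.8):
    """Greedily group similar proposals, then label each group by support."""
    groups = []
    for role, text in proposals:
        for group in groups:
            if jaccard(group["text"], text) >= threshold:
                group["roles"].add(role)
                break
        else:
            groups.append({"text": text, "roles": {role}})

    summary = {"consensus": 0, "majority": 0, "minority": 0}
    for group in groups:
        support = len(group["roles"])
        if support == reviewer_count:
            group["label"] = "consensus"
        elif support * 2 > reviewer_count:
            group["label"] = "majority"
        else:
            group["label"] = "minority"
        summary[group["label"]] += 1
    return groups, summary


groups, summary = tally(PROPOSALS, reviewer_count=3)
print(summary)  # {'consensus': 1, 'majority': 1, 'minority': 1}
```

Filtering the groups to the default consensus,majority view leaves two entries, which matches the expected length of grouped_recommendations.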
Assertions
- status.data.run.status == "done"
- status.data.tasks contains exactly three reviewer tasks and all are done
- report.data.show == ["consensus","majority"]
- report.data.summary.consensus == 1
- report.data.summary.majority == 1
- report.data.summary.minority == 1
- report.data.grouped_recommendations length is 2
- REPORT_PATH exists
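The assertions above can be evaluated against the parsed JSON from the two validation commands. A minimal sketch follows, with two labelled assumptions: each entry in status.data.tasks is an object with a "status" field, and every entry in that list is a reviewer task. Neither is stated in this case. check_assertions is a hypothetical name.

```python
from pathlib import Path


def check_assertions(status, report, report_path):
    """Return a list of failed assertions; an empty list means the case passed."""
    failures = []

    run_status = status["data"]["run"]["status"]
    if run_status != "done":
        failures.append(f"run status is {run_status!r}, expected 'done'")

    # Assumption: every task has a "status" key and all tasks are reviewer tasks.
    tasks = status["data"]["tasks"]
    if len(tasks) != 3:
        failures.append(f"expected 3 reviewer tasks, found {len(tasks)}")
    not_done = [task for task in tasks if task.get("status") != "done"]
    if not_done:
        failures.append(f"{len(not_done)} task(s) not done")

    data = report["data"]
    if data["show"] != ["consensus", "majority"]:
        failures.append(f"report show is {data['show']!r}")
    for label in ("consensus", "majority", "minority"):
        if data["summary"][label] != 1:
            failures.append(f"summary.{label} is {data['summary'][label]}, expected 1")
    if len(data["grouped_recommendations"]) != 2:
        failures.append("grouped_recommendations length is not 2")

    if not Path(report_path).is_file():
        failures.append(f"report artifact missing: {report_path}")

    return failures
```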
Cleanup
- use the default cleanup policy from README.md
- if the run fails, retain TMPDIR and coord.db for replay and manual inspection