kurihada/ai-workflow-skill

kurihada 0b533a70f9 Add council-review skill test plan docs

2026-03-19 17:25:40 +08:00


Case: `council-brainstorm-end-to-end-through-bundled-cli`

Test Type

This is a forward-test and a high-level council workflow validation.

The goal is to verify that a leader using the packaged council-review skill can drive council start -> wait -> tally -> report while three real reviewer agents return structured outputs through the packaged inbox skill.

Purpose

Validate that all of the following hold within a single run:

the leader can use the bundled ./assets/orch CLI through the council-review skill
three reviewer agents can claim and complete their fixed-role inbox tasks
the leader can wait, tally, and report after all reviewer outputs arrive
the final report defaults to consensus,majority
a markdown report artifact is written

Preconditions

council-review skill path exists: COUNCIL_SKILL_PATH=skills/council-review
inbox skill path exists: INBOX_SKILL_PATH=skills/inbox
bundled CLI executables exist at COUNCIL_SKILL_PATH/assets/orch and INBOX_SKILL_PATH/assets/inbox
use an empty temporary directory TMPDIR
before launching the role agents, initialize TMPDIR/coord.db by running INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init
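
The preconditions above can be checked mechanically before any agent is launched. The following is a minimal sketch and not part of either skill: the function names `check_preconditions` and `init_command` are hypothetical, and only the paths and the `--db ... --json init` invocation are taken from this case.

```python
import os
from pathlib import Path


def check_preconditions(council_skill: Path, inbox_skill: Path, tmpdir: Path) -> list[str]:
    """Return human-readable problems; an empty list means the case can start."""
    problems = []
    # Both bundled CLIs must exist and be executable.
    for cli in (council_skill / "assets" / "orch", inbox_skill / "assets" / "inbox"):
        if not cli.is_file():
            problems.append(f"missing bundled CLI: {cli}")
        elif not os.access(cli, os.X_OK):
            problems.append(f"bundled CLI is not executable: {cli}")
    # TMPDIR must exist and be empty before coord.db is initialized.
    if not tmpdir.is_dir():
        problems.append(f"TMPDIR does not exist: {tmpdir}")
    elif any(tmpdir.iterdir()):
        problems.append(f"TMPDIR is not empty: {tmpdir}")
    return problems


def init_command(inbox_skill: Path, tmpdir: Path) -> list[str]:
    """Build the init invocation quoted in the preconditions."""
    return [str(inbox_skill / "assets" / "inbox"),
            "--db", str(tmpdir / "coord.db"), "--json", "init"]
```

A harness would run `check_preconditions` first and only execute `init_command` when the returned list is empty.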

Agent Topology

leader
architecture-reviewer
implementation-reviewer
risk-reviewer

Inputs

Leader Prompt

Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_001 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) tally with normal similarity, 4) report with default settings, 5) stop after reporting RUN_ID and REPORT_PATH. Do not use ordinary chat to coordinate with the reviewers.

Architecture Reviewer Prompt

Use $inbox at INBOX_SKILL_PATH to act as architecture-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This lowers coupling.","confidence":"high","tags":["architecture"],"target_refs":{"repo_path":"."}},{"title":"Share helpers","summary":"Council report rendering paths are repeated.","proposal":"Introduce shared council coordinator helpers for report rendering.","rationale":"This keeps report assembly consistent.","confidence":"medium","tags":["reporting"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.

Implementation Reviewer Prompt

Use $inbox at INBOX_SKILL_PATH to act as implementation-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"high","tags":["maintainability"],"target_refs":{"repo_path":"."}},{"title":"Reuse report helpers","summary":"Formatting logic should stay shared.","proposal":"Introduce shared council coordinator helpers for report rendering","rationale":"This avoids formatter drift.","confidence":"medium","tags":["reporting"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.

Risk Reviewer Prompt

Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"risk-reviewer","findings":[{"title":"Lock contracts","summary":"Contract drift becomes risky over time.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This reduces integration regressions.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}},{"title":"Cover JSON output","summary":"The council report response should stay stable.","proposal":"Add regression tests for council report JSON output.","rationale":"This catches contract regressions earlier.","confidence":"high","tags":["testing"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
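
Because each reviewer prompt embeds an exact JSON body, a harness may want to confirm that a body still has the fixture's shape before it is pasted into a prompt. The sketch below is illustrative only: `validate_reviewer_body` is a hypothetical helper, and the set of required keys is inferred from the three bodies above rather than from any schema published by the inbox skill.

```python
import json

# Keys present on every finding in the three fixture bodies (an inference, not a schema).
REQUIRED_FINDING_KEYS = {
    "title", "summary", "proposal", "rationale",
    "confidence", "tags", "target_refs",
}


def validate_reviewer_body(raw: str, expected_role: str) -> list[str]:
    """Return problems found in a reviewer `done` body; empty means it matches the fixture shape."""
    body = json.loads(raw)
    problems = []
    if body.get("reviewer_role") != expected_role:
        problems.append(
            f"reviewer_role is {body.get('reviewer_role')!r}, expected {expected_role!r}"
        )
    findings = body.get("findings")
    if not isinstance(findings, list) or not findings:
        return problems + ["findings must be a non-empty list"]
    for i, finding in enumerate(findings):
        missing = REQUIRED_FINDING_KEYS - set(finding)
        if missing:
            problems.append(f"findings[{i}] is missing {sorted(missing)}")
    return problems
```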

Execution Parameters

use the shared execution contract from README.md
use the shared timeout defaults from README.md
do not override the default cleanup policy

Execution Steps

Initialize TMPDIR/coord.db once through the bundled inbox CLI before launching agents
Inject skills/council-review/ into leader
Inject skills/inbox/ into the three reviewer agents
Point all agents at the same database path TMPDIR/coord.db
Launch leader, architecture-reviewer, implementation-reviewer, and risk-reviewer in parallel
Wait for all agents to finish
Resolve RUN_ID=council_skill_001, reviewer THREAD_IDs, and REPORT_PATH from the agent outputs
Independently run the validation commands from the main thread

Validation Commands

COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run council_skill_001
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council report --run council_skill_001
test -f REPORT_PATH
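
The validation commands above could be driven from a script along the following lines. This is a sketch under stated assumptions: `orch_argv` and `run_json` are hypothetical helper names, and it assumes that `--json` makes the CLI print a single JSON document on stdout and exit with status 0 on success.

```python
import json
import subprocess


def orch_argv(council_skill_path: str, db_path: str, *args: str) -> list[str]:
    """Build an orch invocation with the global flags used in this case."""
    return [f"{council_skill_path}/assets/orch", "--db", db_path, "--json", *args]


def run_json(argv: list[str]) -> dict:
    """Run a command and parse stdout as JSON; raises CalledProcessError on a nonzero exit."""
    completed = subprocess.run(argv, check=True, capture_output=True, text=True)
    return json.loads(completed.stdout)
```

With these helpers, the first command becomes `run_json(orch_argv(COUNCIL_SKILL_PATH, db, "status", "--run", "council_skill_001"))`, and the second swaps in `"council", "report"` for `"status"`.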

Expected Outcomes

the leader successfully starts council_skill_001
all three reviewers complete their fixed-role tasks
council wait returns all_complete == true
council tally returns one consensus, one majority, and one minority
council report defaults to showing consensus,majority
a markdown report artifact exists on disk
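
To see why the fixture bodies should produce one consensus, one majority, and one minority group, the tally can be approximated by hand. The sketch below is not the algorithm used by `orch`: the normalization (lowercasing, dropping punctuation and articles) and the thresholds (all reviewers, more than half, otherwise minority) are assumptions chosen only to illustrate the expected grouping.

```python
import re
from collections import defaultdict

ARTICLES = {"a", "an", "the"}


def normalize(proposal: str) -> str:
    """Lowercase and drop punctuation and articles so near-identical proposals compare equal."""
    words = re.findall(r"[a-z0-9]+", proposal.lower())
    return " ".join(w for w in words if w not in ARTICLES)


def tally(proposals_by_role: dict[str, list[str]]) -> dict[str, int]:
    """Count proposal groups by how many distinct reviewers support each one."""
    supporters = defaultdict(set)
    for role, proposals in proposals_by_role.items():
        for proposal in proposals:
            supporters[normalize(proposal)].add(role)
    total = len(proposals_by_role)
    summary = {"consensus": 0, "majority": 0, "minority": 0}
    for roles in supporters.values():
        if len(roles) == total:
            summary["consensus"] += 1
        elif len(roles) * 2 > total:
            summary["majority"] += 1
        else:
            summary["minority"] += 1
    return summary


# Proposal strings copied from the three reviewer prompts in this case.
FIXTURE = {
    "architecture-reviewer": [
        "Move API contract definitions into a dedicated module.",
        "Introduce shared council coordinator helpers for report rendering.",
    ],
    "implementation-reviewer": [
        "Move API contract definitions into dedicated module",
        "Introduce shared council coordinator helpers for report rendering",
    ],
    "risk-reviewer": [
        "Move API contract definitions into a dedicated module.",
        "Add regression tests for council report JSON output.",
    ],
}

print(tally(FIXTURE))  # {'consensus': 1, 'majority': 1, 'minority': 1}
```

The implementation reviewer's proposals deliberately differ in punctuation and articles, so any similarity setting that treats them as distinct would change the expected counts.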

Assertions

status.data.run.status == "done"
status.data.tasks contains exactly three reviewer tasks and all are done
report.data.show == ["consensus","majority"]
report.data.summary.consensus == 1
report.data.summary.majority == 1
report.data.summary.minority == 1
report.data.grouped_recommendations has length 2 (the minority group is not shown by default)
REPORT_PATH exists
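
The assertions above translate directly into a small checker over the parsed JSON outputs. This is a minimal sketch: `check_assertions` is a hypothetical name, and it assumes that `status.data.tasks` lists only the reviewer tasks and that each task carries a `status` field, which this case does not spell out.

```python
from pathlib import Path


def check_assertions(status: dict, report: dict, report_path: str) -> list[str]:
    """Return the assertions that failed; an empty list means the case passed."""
    failures = []
    if status["data"]["run"]["status"] != "done":
        failures.append("status.data.run.status != 'done'")
    # Assumption: tasks holds only reviewer tasks, each with a `status` field.
    tasks = status["data"]["tasks"]
    if len(tasks) != 3 or any(task.get("status") != "done" for task in tasks):
        failures.append("expected exactly three reviewer tasks, all done")
    data = report["data"]
    if data["show"] != ["consensus", "majority"]:
        failures.append("report.data.show is not the default")
    for bucket in ("consensus", "majority", "minority"):
        if data["summary"][bucket] != 1:
            failures.append(f"report.data.summary.{bucket} != 1")
    if len(data["grouped_recommendations"]) != 2:
        failures.append("report.data.grouped_recommendations length != 2")
    if not Path(report_path).is_file():
        failures.append(f"REPORT_PATH does not exist: {report_path}")
    return failures
```

Collecting failures in a list rather than stopping at the first one keeps every mismatch visible when a failed run is inspected afterwards.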

Cleanup

use the default cleanup policy from README.md
if the run fails, retain TMPDIR and coord.db for replay and manual inspection
