kurihada/ai-workflow-skill

kurihada d17b5ebfbd Add council-review skill gap-fill test plans

2026-03-19 17:59:58 +08:00

5.7 KiB

Case: `council-reviewer-output-invalid-json-fails-tally-through-bundled-cli`

Test Type

This is a forward-test and a malformed-reviewer-output validation.

The goal is to verify that a leader using the packaged council-review skill reaches the stable tally-time invalid_input contract when one reviewer completes its inbox task with malformed council JSON.

Purpose

Validate that all of the following can be true at the same time:

the leader can start a real council run through the bundled council-review skill
all three reviewer tasks can still reach terminal done state through the packaged inbox skill
one reviewer can return malformed JSON in the result body
the leader sees council tally fail with the expected invalid-input error instead of a silent partial tally
malformed JSON serves as the most realistic representative of the reviewer-output validation layer, which also rejects a missing reviewer_role and role mismatches
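The validation layer named in the last item can be pictured as three ordered checks: parse, require reviewer_role, then match the assigned role. The sketch below is a hypothetical Python illustration, not the orch CLI's implementation. `InvalidInput` and `validate_reviewer_output` are invented names, and the error wording is assumed from the expected outcomes of this case.

```python
import json


class InvalidInput(Exception):
    """Hypothetical stand-in for the orch CLI's invalid_input error."""


def validate_reviewer_output(body: str, expected_role: str) -> dict:
    """Validate one reviewer's result body; names and messages are illustrative."""
    # Check 1: the body must parse as JSON (the failure this case exercises).
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise InvalidInput(f"reviewer output must be valid JSON: {exc.msg}") from exc
    # Check 2: the payload must be an object that declares reviewer_role.
    if not isinstance(payload, dict) or "reviewer_role" not in payload:
        raise InvalidInput("reviewer output must include reviewer_role")
    # Check 3: the declared role must match the role the task was assigned to.
    if payload["reviewer_role"] != expected_role:
        raise InvalidInput(
            f"reviewer_role mismatch: expected {expected_role!r}, "
            f"got {payload['reviewer_role']!r}"
        )
    return payload
```

Because parsing runs first, a malformed body never reaches the role checks, which is why one malformed-JSON case stands in for the whole layer.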

Preconditions

council-review skill path exists: COUNCIL_SKILL_PATH=skills/council-review
inbox skill path exists: INBOX_SKILL_PATH=skills/inbox
bundled CLI executables exist at COUNCIL_SKILL_PATH/assets/orch and INBOX_SKILL_PATH/assets/inbox
use an empty temporary directory TMPDIR
before launching role agents, initialize TMPDIR/coord.db with INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init
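The preconditions can be checked mechanically before any agent starts. This is a minimal Python sketch, assuming the harness runs on a POSIX host; the helper names `is_empty_dir`, `missing_clis`, and `init_argv` are hypothetical, and the init command mirrors the one documented above without adding any flags.

```python
import os
from pathlib import Path


def is_empty_dir(path: str) -> bool:
    """True if path is an existing directory with no entries."""
    p = Path(path)
    return p.is_dir() and not any(p.iterdir())


def missing_clis(council_skill_path: str, inbox_skill_path: str) -> list:
    """Return the bundled CLI paths that are absent or not executable."""
    candidates = [
        Path(council_skill_path) / "assets" / "orch",
        Path(inbox_skill_path) / "assets" / "inbox",
    ]
    return [str(p) for p in candidates if not (p.is_file() and os.access(p, os.X_OK))]


def init_argv(inbox_skill_path: str, tmpdir: str) -> list:
    """Build the documented init command for TMPDIR/coord.db."""
    return [
        str(Path(inbox_skill_path) / "assets" / "inbox"),
        "--db", str(Path(tmpdir) / "coord.db"),
        "--json", "init",
    ]
```

A harness would then call something like `subprocess.run(init_argv(INBOX_SKILL_PATH, TMPDIR), check=True)` once, after confirming `missing_clis(...)` is empty and `is_empty_dir(TMPDIR)` is true.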

Agent Topology

leader
architecture-reviewer
implementation-reviewer
risk-reviewer

Inputs

Leader Prompt

Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_008 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) attempt council tally with normal similarity, 4) stop after reporting RUN_ID, exit code, and the error payload you observed. Do not use ordinary chat to coordinate with the reviewers.

Architecture Reviewer Prompt

Use $inbox at INBOX_SKILL_PATH to act as architecture-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill.

Workflow:
1) fetch and claim your assigned council task
2) write TMPDIR/architecture-invalid.json containing exactly this invalid JSON body:
{"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module."}
3) complete the task with done using summary "Review complete" and --body-file TMPDIR/architecture-invalid.json
4) stop after reporting THREAD_ID and the body file path

Do not use ordinary chat to coordinate with the leader.

Implementation Reviewer Prompt

Use $inbox at INBOX_SKILL_PATH to act as implementation-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract API contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"medium","tags":["maintainability"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.

Risk Reviewer Prompt

Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"risk-reviewer","findings":[{"title":"Add auth integration tests","summary":"Login regressions are hard to catch.","proposal":"Add integration tests for auth flows.","rationale":"This catches regressions earlier.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
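A quick offline check confirms the fixtures behave as intended: only the architecture body is malformed, and it is truncated rather than otherwise corrupted. The Python sketch below copies the three bodies verbatim from the prompts; `parses_as_json` is a hypothetical helper name.

```python
import json

# Result bodies copied verbatim from the three reviewer prompts.
ARCHITECTURE_BODY = (
    '{"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts",'
    '"summary":"Transport contracts are mixed into UI code.",'
    '"proposal":"Move API contract definitions into a dedicated module."}'
)
IMPLEMENTATION_BODY = (
    '{"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract API contracts",'
    '"summary":"Shared transport shapes are duplicated.",'
    '"proposal":"Move API contract definitions into dedicated module",'
    '"rationale":"This reduces duplication.","confidence":"medium",'
    '"tags":["maintainability"],"target_refs":{"repo_path":"."}}]}'
)
RISK_BODY = (
    '{"reviewer_role":"risk-reviewer","findings":[{"title":"Add auth integration tests",'
    '"summary":"Login regressions are hard to catch.",'
    '"proposal":"Add integration tests for auth flows.",'
    '"rationale":"This catches regressions earlier.","confidence":"high",'
    '"tags":["risk"],"target_refs":{"repo_path":"."}}]}'
)


def parses_as_json(body: str) -> bool:
    """True if body is one complete JSON document."""
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return False
    return True
```

Appending `]}` to the architecture body makes it parse, which shows the only defect is the missing close of the findings array and the outer object.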

Execution Parameters

use the shared execution contract from README.md
use the shared timeout defaults from README.md
do not override the default cleanup policy

Execution Steps

Initialize TMPDIR/coord.db once through the bundled inbox CLI before launching agents
Inject skills/council-review/ into leader
Inject skills/inbox/ into the three reviewer agents
Point all agents at the same database path TMPDIR/coord.db
Launch leader, architecture-reviewer, implementation-reviewer, and risk-reviewer in parallel
Wait for all agents to finish
Independently run the validation commands from the main thread

Validation Commands

COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council wait --run council_skill_008 --timeout-seconds 2
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council tally --run council_skill_008 --similarity normal
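From the main thread, the two validation commands can be run and their exit codes and JSON payloads captured. This Python sketch builds exactly the documented argv lists; `validation_argvs` and `run_json` are hypothetical names, and the assumption that `--json` prints a single JSON document (on stdout, or possibly stderr on error) is not confirmed by this plan.

```python
import json
import subprocess
from pathlib import Path


def validation_argvs(council_skill_path: str, tmpdir: str, run_id: str = "council_skill_008") -> dict:
    """Build the two documented validation commands as argv lists."""
    orch = str(Path(council_skill_path) / "assets" / "orch")
    base = [orch, "--db", str(Path(tmpdir) / "coord.db"), "--json", "council"]
    return {
        "wait": base + ["wait", "--run", run_id, "--timeout-seconds", "2"],
        "tally": base + ["tally", "--run", run_id, "--similarity", "normal"],
    }


def run_json(argv: list) -> tuple:
    """Run one command and return (exit_code, parsed JSON payload or None).

    Assumption: with --json the CLI emits one JSON document. Stdout is tried
    first, then stderr, in case error payloads are written there.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    for stream in (proc.stdout, proc.stderr):
        if stream.strip():
            try:
                return proc.returncode, json.loads(stream)
            except json.JSONDecodeError:
                continue
    return proc.returncode, None
```

Usage: `wait_exit, wait_payload = run_json(cmds["wait"])` and `tally_exit, tally_payload = run_json(cmds["tally"])`, where `cmds = validation_argvs(COUNCIL_SKILL_PATH, TMPDIR)`.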

Expected Outcomes

all three reviewer tasks still reach terminal done
council wait returns all_complete == true
council tally exits with the stable invalid-input contract
the error message indicates that reviewer output must be valid JSON

Assertions

wait.data.all_complete == true
command exit code for council tally is 30
error code is invalid_input
the error message mentions that reviewer output must be valid JSON
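The four assertions can be evaluated together against the captured results. In this Python sketch, `wait.data.all_complete`, exit code 30, and `invalid_input` come from the plan; the error envelope `{"error": {"code": ..., "message": ...}}` and the loose "valid json" substring match are assumptions, and `check_assertions` is a hypothetical name.

```python
def check_assertions(wait_payload, tally_exit: int, tally_payload) -> list:
    """Return the assertions from this case that failed; an empty list means pass.

    Assumed error envelope: {"error": {"code": ..., "message": ...}}.
    Adjust the lookups if the orch CLI nests the error differently.
    """
    failures = []
    if ((wait_payload or {}).get("data") or {}).get("all_complete") is not True:
        failures.append("wait.data.all_complete is not true")
    if tally_exit != 30:
        failures.append(f"council tally exit code is {tally_exit}, expected 30")
    error = (tally_payload or {}).get("error") or {}
    if error.get("code") != "invalid_input":
        failures.append(f"error code is {error.get('code')!r}, expected 'invalid_input'")
    # Loose, case-insensitive match: the exact CLI wording is not pinned down.
    if "valid json" not in str(error.get("message", "")).lower():
        failures.append("error message does not mention valid JSON")
    return failures
```

Returning every failure, rather than stopping at the first, makes a single replay report all contract drift at once.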

Cleanup

use the default cleanup policy from README.md
if the run fails, retain TMPDIR and coord.db for replay and manual inspection
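The retain-on-failure rule can be expressed as a context manager around the whole case. The README's default policy is not reproduced here, so this Python sketch assumes that default means "delete TMPDIR on success"; `case_tmpdir` is a hypothetical name.

```python
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def case_tmpdir(prefix: str = "council-invalid-json-"):
    """Yield a fresh empty TMPDIR; delete it on success, keep it on failure.

    Assumption: "success" means the with-body finished without raising,
    e.g. no AssertionError from the validation step.
    """
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    except BaseException:
        # Keep TMPDIR (and coord.db inside it) for replay and manual inspection.
        print(f"case failed; retaining {path}")
        raise
    else:
        shutil.rmtree(path, ignore_errors=True)
```

Wrapping initialization, agent launch, and validation in one `with case_tmpdir() as TMPDIR:` block ties cleanup to the case outcome instead of to individual steps.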
