Add council-review skill gap-fill test plans
@@ -26,7 +26,8 @@ As of now:
- reusable Codex skill packages for `orch` and `council-review` now exist under `skills/orch/` and `skills/council-review/`, both using bundled copies of the `orch` CLI binary asset
- an inbox skill forward-test plan directory now exists under `docs/tests/inbox-skill/`, with a shared execution template and multiple scenario cases
- an orch skill forward-test plan directory now exists under `docs/tests/orch-skill/`, with a shared execution contract and initial leader-side workflow scenarios
- a council-review skill forward-test plan directory now exists under `docs/tests/council-review-skill/`, with a shared execution contract and initial council workflow scenarios
- a repo-local replay runner now exists at `scripts/run_orch_skill_forward_tests.sh`, and the five `docs/tests/orch-skill/` cases now include recorded example runs from a bundled-CLI replay captured on `2026-03-19`
- a council-review skill forward-test plan directory now exists under `docs/tests/council-review-skill/`, with a shared execution contract and nine council workflow scenarios covering end-to-end flow, unanimous-only defaults, timeout/before-tally errors, explicit minority reporting, invalid report filters, strict tally semantics, malformed reviewer JSON, and target-file inputs
- an execution-roadmap workflow now exists under `docs/roadmaps/active/` and `docs/roadmaps/archive/` for agent-level work traces and completion archives
- a repo-local `scripts/package_skill_clis.sh` packaging flow now builds bundled skill CLI assets for `inbox`, `orch`, and `council-review`
- `orch` now implements `run init/show`, `task add`, `dep add`, `ready`, `dispatch`, `reconcile`, `wait`, `blocked`, `answer`, `retry`, `reassign`, `cancel`, `cleanup`, and `status`
@@ -0,0 +1,62 @@
# Title

Expand Council Review Skill Test Plan Coverage

## Status

- `completed`

## Owner

- Codex main agent

## Started At

- `2026-03-19`

## Goal

- Add the next batch of high-value `docs/tests/council-review-skill/` forward-test cases to cover important council workflow gaps beyond the initial smoke suite.

## Scope

- Update `docs/tests/council-review-skill/README.md` with the expanded case index.
- Add five new council-review skill case documents.
- Update `docs/implementation-roadmap.md` to reflect the broader skill test-plan coverage.

## Checklist

- [x] Identify the highest-value uncovered council-review skill scenarios.
- [x] Update the council-review skill README case index.
- [x] Author the five new case documents.
- [x] Update implementation roadmap and archive this execution roadmap.

## Files

- `docs/tests/council-review-skill/README.md`
- `docs/tests/council-review-skill/council-report-show-all-includes-minority-through-bundled-cli.md`
- `docs/tests/council-review-skill/council-report-rejects-invalid-show-through-bundled-cli.md`
- `docs/tests/council-review-skill/council-tally-strict-keeps-distinct-proposals-through-bundled-cli.md`
- `docs/tests/council-review-skill/council-reviewer-output-invalid-json-fails-tally-through-bundled-cli.md`
- `docs/tests/council-review-skill/council-start-with-target-file-through-bundled-cli.md`
- `docs/implementation-roadmap.md`
- `docs/roadmaps/archive/council-review-skill-gap-fill.md`

## Decisions

- Prioritize skill-level scenarios that add real workflow coverage rather than mechanically duplicating every lower-level `orch council` contract.
- Focus the gap fill on explicit report filtering, strict tally semantics, malformed reviewer output, and non-prompt target context.

## Blockers

- none

## Next Step

- Capture recorded example runs or direct CLI replays for the new council-review skill cases when execution evidence is needed.

## Completion Summary

- Expanded `docs/tests/council-review-skill/README.md` so the shared execution contract now explicitly covers target-file fixtures and indexes the five new gap-fill cases.
- Added five forward-test case documents covering explicit `--show all` minority reporting, invalid `--show` rejection, strict tally semantics, malformed reviewer JSON at tally time, and target-file council start inputs.
- Updated `docs/implementation-roadmap.md` to record that the council-review skill test-plan directory now covers nine workflow scenarios rather than only the initial smoke suite.
@@ -30,7 +30,7 @@ Use these defaults unless a case file explicitly overrides them:
- require the leader to coordinate through the bundled `./assets/orch` CLI from the council-review skill instead of ordinary chat
- require reviewer agents to coordinate through the bundled `./assets/inbox` CLI from their skill instead of ordinary chat
- validate final council run, reviewer task state, and report state independently from the main thread after the agents stop
- create any required repo fixture before launching agents for mixed or repo-target cases
- create any required target-file or repo fixture before launching agents for target-file, mixed, or repo-target cases

## How An Agent Runs These Cases

@@ -41,7 +41,7 @@ The test-runner agent is responsible for:
- reading this `README.md` first, then one specific case file
- creating an isolated temporary directory and DB path for that run
- initializing the DB once through the bundled inbox CLI before launching role agents
- creating any required temporary Git repo fixture before launching role agents
- creating any required temporary target file or Git repo fixture before launching role agents
- launching the role agents described in `Agent Topology`
- injecting `skills/council-review/` into the leader and `skills/inbox/` into reviewers
- passing each role agent the prompt text from the case file with concrete values substituted for `COUNCIL_SKILL_PATH`, `INBOX_SKILL_PATH`, `TMPDIR`, `RUN_ID`, `THREAD_ID`, and `REPORT_PATH` when needed

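
The placeholder substitution described in the last item above can be sketched in shell. `render_prompt` and `RUN_TMPDIR` are hypothetical names chosen for illustration (the latter avoids clobbering the shell's own `TMPDIR` variable); the actual runner may substitute values differently, and `RUN_ID`, `THREAD_ID`, and `REPORT_PATH` could be handled the same way.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace placeholder tokens in a case-file prompt with
# concrete values. This is an illustration, not part of the orch or inbox CLIs.
render_prompt() {
  # $1 = prompt template text copied from a case file.
  # Values must not contain "|" or "&", which are special to this sed call.
  printf '%s\n' "$1" | sed \
    -e "s|COUNCIL_SKILL_PATH|${COUNCIL_SKILL_PATH}|g" \
    -e "s|INBOX_SKILL_PATH|${INBOX_SKILL_PATH}|g" \
    -e "s|TMPDIR|${RUN_TMPDIR}|g"
}

# Example values; the temporary directory path is an assumption.
COUNCIL_SKILL_PATH=skills/council-review
INBOX_SKILL_PATH=skills/inbox
RUN_TMPDIR=/tmp/council-case

render_prompt 'Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db.'
```

Keeping the substitution in one helper means every role agent receives prompts rendered from the same set of values.
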
@@ -148,6 +148,11 @@ Each case file should use this structure:
| `council-unanimous-only-default-report-through-bundled-cli` | [council-unanimous-only-default-report-through-bundled-cli.md](./council-unanimous-only-default-report-through-bundled-cli.md) | validates that unanimous-only runs default to `consensus` output while preserving the underlying summary counts |
| `council-wait-timeout-through-bundled-cli` | [council-wait-timeout-through-bundled-cli.md](./council-wait-timeout-through-bundled-cli.md) | validates that the leader sees the expected timeout contract when reviewer tasks do not complete |
| `council-report-rejects-before-tally-through-bundled-cli` | [council-report-rejects-before-tally-through-bundled-cli.md](./council-report-rejects-before-tally-through-bundled-cli.md) | validates that the skill surfaces the stable invalid-state error when report is attempted before tally |
| `council-report-show-all-includes-minority-through-bundled-cli` | [council-report-show-all-includes-minority-through-bundled-cli.md](./council-report-show-all-includes-minority-through-bundled-cli.md) | validates that an explicit `--show all` report includes the otherwise hidden minority group |
| `council-report-rejects-invalid-show-through-bundled-cli` | [council-report-rejects-invalid-show-through-bundled-cli.md](./council-report-rejects-invalid-show-through-bundled-cli.md) | validates that the leader sees the stable `invalid_input` contract for an invalid report bucket selection |
| `council-tally-strict-keeps-distinct-proposals-through-bundled-cli` | [council-tally-strict-keeps-distinct-proposals-through-bundled-cli.md](./council-tally-strict-keeps-distinct-proposals-through-bundled-cli.md) | validates that strict similarity preserves near-duplicate wording as separate minority groups |
| `council-reviewer-output-invalid-json-fails-tally-through-bundled-cli` | [council-reviewer-output-invalid-json-fails-tally-through-bundled-cli.md](./council-reviewer-output-invalid-json-fails-tally-through-bundled-cli.md) | validates that malformed reviewer result JSON reaches the leader as the stable tally-time `invalid_input` contract |
| `council-start-with-target-file-through-bundled-cli` | [council-start-with-target-file-through-bundled-cli.md](./council-start-with-target-file-through-bundled-cli.md) | validates that the skill can start a council run from explicit `--target-file` context instead of a pure inline prompt |

## Scope

@@ -157,7 +162,10 @@ In scope:
- bundled `./assets/orch` CLI usage for `orch council ...`
- end-to-end council start, wait, tally, and report flows
- interaction between a leader using `skills/council-review/` and reviewers using `skills/inbox/`
- default report policy, unanimous-only behavior, and timeout/error-path validation
- default report policy, explicit minority inclusion, and invalid report-filter validation
- normal and strict tally behavior
- malformed reviewer-output failure paths
- non-prompt target context including `target-file`

Out of scope:

@@ -0,0 +1,86 @@
# Case: `council-report-rejects-invalid-show-through-bundled-cli`

## Test Type

This is a `forward-test` and an invalid-input report-filter validation.

The goal is to verify that a leader using the packaged `council-review` skill reaches the stable `invalid_input` error contract when it asks `council report` for an unsupported bucket list.

## Purpose

Validate that all of the following can be true at the same time:

- the leader can drive a real council run through `start -> wait -> tally`
- three reviewer agents can complete their tasks through the packaged inbox skill
- the leader can attempt `council report --show consensus,invalid`
- the skill surfaces the stable `invalid_input` error instead of silently dropping the bad bucket

## Preconditions

- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`

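
The preconditions above can be checked mechanically before the DB is initialized. The sketch below is one possible shape; `check_preconditions` is a hypothetical helper name, not an `orch` or `inbox` command, and it must run before `init` because the temporary directory stops being empty once `coord.db` exists.

```bash
#!/usr/bin/env bash
# Sketch: verify case preconditions before initializing the DB.
# check_preconditions is a hypothetical helper, not part of either bundled CLI.
check_preconditions() {
  council_skill_path=$1
  inbox_skill_path=$2
  run_tmpdir=$3

  # Both bundled CLI binaries must exist and be executable.
  [ -x "$council_skill_path/assets/orch" ] || { echo "missing orch binary" >&2; return 1; }
  [ -x "$inbox_skill_path/assets/inbox" ] || { echo "missing inbox binary" >&2; return 1; }

  # The temporary directory must exist and contain nothing yet.
  [ -d "$run_tmpdir" ] || { echo "missing temp dir" >&2; return 1; }
  [ -z "$(ls -A "$run_tmpdir")" ] || { echo "temp dir is not empty" >&2; return 1; }
}

# Usage (paths are the defaults named in the preconditions):
#   check_preconditions skills/council-review skills/inbox "$(mktemp -d)"
```
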
## Agent Topology

- `leader`
- `architecture-reviewer`
- `implementation-reviewer`
- `risk-reviewer`

## Inputs

### Leader Prompt

```text
Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_006 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) tally with normal similarity, 4) attempt council report with --show consensus,invalid, 5) stop after reporting RUN_ID, exit code, and the error payload you observed. Do not use ordinary chat to coordinate with the reviewers.
```

### Reviewer Prompts

- Reuse the same reviewer body JSON and inbox-only workflow as in [council-brainstorm-end-to-end-through-bundled-cli.md](./council-brainstorm-end-to-end-through-bundled-cli.md), but target run `council_skill_006`.

## Execution Parameters

- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy

## Execution Steps

1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
2. Inject `skills/council-review/` into `leader`
3. Inject `skills/inbox/` into the three reviewer agents
4. Point all agents at the same database path `TMPDIR/coord.db`
5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
6. Wait for all agents to finish
7. Independently run the validation commands from the main thread

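
Steps 5 and 6 above amount to a parallel launch followed by a barrier. A minimal shell sketch of that pattern follows; `launch_agent` is a hypothetical placeholder, since how role agents are actually started depends on the test harness and is not specified here.

```bash
#!/usr/bin/env bash
# Sketch of steps 5 and 6: start all role agents in parallel, then wait for all.
# launch_agent is a hypothetical stand-in; a real harness would start the agent
# with its skill and rendered prompt instead of writing a log line.
run_dir=$(mktemp -d)

launch_agent() {
  role=$1
  echo "$role finished" > "$run_dir/$role.log"
}

for role in leader architecture-reviewer implementation-reviewer risk-reviewer; do
  launch_agent "$role" &
done

# Block until every background agent has exited before validating anything.
wait
ls "$run_dir"
```

Running the validation commands only after `wait` returns keeps the main-thread checks independent of agent timing.
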
## Validation Commands

```bash
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council report --run council_skill_006 --show consensus,invalid
```

## Expected Outcomes

- the leader successfully starts `council_skill_006`
- reviewer completion and tally both succeed before the invalid report attempt
- the report command exits with the stable invalid-input contract
- the error message names the accepted bucket values

## Assertions

- command exit code is `30`
- error code is `invalid_input`
- the error message mentions `consensus`
- the error message mentions `majority`
- the error message mentions `minority`
- the error message mentions `all`

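
The assertions above can be scripted against the captured exit code and output. The sketch below assumes the error payload is plain text or JSON that contains the error code and message somewhere in its body; the sample payload is an invented shape for illustration, not recorded CLI output, and `assert_invalid_show` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: check the invalid_input contract for a bad --show value.
# $1 = exit code of the report command, $2 = its captured output.
assert_invalid_show() {
  status=$1
  payload=$2

  [ "$status" -eq 30 ] || { echo "expected exit code 30, got $status" >&2; return 1; }
  printf '%s' "$payload" | grep -q 'invalid_input' \
    || { echo "payload does not carry invalid_input" >&2; return 1; }

  # Every accepted bucket value should be named as a whole word.
  for bucket in consensus majority minority all; do
    printf '%s' "$payload" | grep -qw "$bucket" \
      || { echo "error message does not mention $bucket" >&2; return 1; }
  done
}

# In a real run the inputs would come from the validation command, for example:
#   payload=$(COUNCIL_SKILL_PATH/assets/orch ... --show consensus,invalid 2>&1); status=$?
# Assumed sample payload, for illustration only:
sample_payload='{"error":{"code":"invalid_input","message":"show must be one of: consensus, majority, minority, all"}}'
assert_invalid_show 30 "$sample_payload" && echo "invalid-show contract holds"
```
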
## Cleanup

- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
@@ -0,0 +1,90 @@
# Case: `council-report-show-all-includes-minority-through-bundled-cli`

## Test Type

This is a `forward-test` and an explicit report-filter validation.

The goal is to verify that a leader using the packaged `council-review` skill can override the default report buckets and explicitly request the minority group through the bundled CLI.

## Purpose

Validate that all of the following can be true at the same time:

- the leader can drive a complete `start -> wait -> tally -> report` council flow through the bundled council-review skill
- three reviewer agents can complete their tasks through the packaged inbox skill
- the leader can request `council report --show all`
- the final report includes `consensus`, `majority`, and `minority`

## Preconditions

- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`

## Agent Topology

- `leader`
- `architecture-reviewer`
- `implementation-reviewer`
- `risk-reviewer`

## Inputs

### Leader Prompt

```text
Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_005 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) tally with normal similarity, 4) report with --show all, 5) stop after reporting RUN_ID, REPORT_PATH, and the show buckets you observed. Do not use ordinary chat to coordinate with the reviewers.
```

### Reviewer Prompts

- Reuse the same reviewer body JSON and inbox-only workflow as in [council-brainstorm-end-to-end-through-bundled-cli.md](./council-brainstorm-end-to-end-through-bundled-cli.md), but target run `council_skill_005`.

## Execution Parameters

- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy

## Execution Steps

1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
2. Inject `skills/council-review/` into `leader`
3. Inject `skills/inbox/` into the three reviewer agents
4. Point all agents at the same database path `TMPDIR/coord.db`
5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
6. Wait for all agents to finish
7. Resolve `RUN_ID=council_skill_005` and `REPORT_PATH` from the agent outputs
8. Independently run the validation commands from the main thread

## Validation Commands

```bash
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council report --run council_skill_005 --show all
test -f REPORT_PATH
```

## Expected Outcomes

- the leader successfully starts `council_skill_005`
- all three reviewers complete their fixed-role tasks
- the report succeeds with explicit `show == ["consensus","majority","minority"]`
- the minority recommendation is present in `grouped_recommendations`
- a markdown report artifact exists on disk

## Assertions

- `report.data.show == ["consensus","majority","minority"]`
- `report.data.summary.consensus == 1`
- `report.data.summary.majority == 1`
- `report.data.summary.minority == 1`
- `report.data.grouped_recommendations` length is `3`
- at least one returned recommendation has `bucket == "minority"`
- `REPORT_PATH` exists

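
The bucket assertion above can be approximated with a text search when no JSON tooling is at hand. The sample response below is an assumed, trimmed-down shape inferred from the assertion names, not recorded CLI output, and `count_bucket` is a hypothetical helper; a proper JSON parser would be the more robust choice for the remaining field checks.

```bash
#!/usr/bin/env bash
# Sketch: count how many recommendations in a report response carry a bucket.
# $1 = JSON text, $2 = bucket name. This is a text match, not a JSON parser.
count_bucket() {
  printf '%s' "$1" \
    | grep -oE "\"bucket\"[[:space:]]*:[[:space:]]*\"$2\"" \
    | wc -l | tr -d ' '
}

# Assumed sample response shape, for illustration only.
sample_report='{"data":{"show":["consensus","majority","minority"],"grouped_recommendations":[{"bucket":"consensus"},{"bucket":"majority"},{"bucket":"minority"}]}}'

minority_count=$(count_bucket "$sample_report" minority)
echo "minority groups: $minority_count"
```
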
## Cleanup

- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
@@ -0,0 +1,109 @@
# Case: `council-reviewer-output-invalid-json-fails-tally-through-bundled-cli`

## Test Type

This is a `forward-test` and a malformed-reviewer-output validation.

The goal is to verify that a leader using the packaged `council-review` skill reaches the stable tally-time `invalid_input` contract when one reviewer completes its inbox task with malformed council JSON.

## Purpose

Validate that all of the following can be true at the same time:

- the leader can start a real council run through the bundled council-review skill
- all three reviewer tasks can still reach terminal `done` state through the packaged inbox skill
- one reviewer can return malformed JSON in the result body
- the leader sees `council tally` fail with the expected invalid-input error instead of a silent partial tally
- malformed JSON is exercised as the most realistic representative of the same reviewer-output validation layer that also rejects missing `reviewer_role` and role mismatches

## Preconditions

- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`

## Agent Topology

- `leader`
- `architecture-reviewer`
- `implementation-reviewer`
- `risk-reviewer`

## Inputs

### Leader Prompt

```text
Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_008 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) attempt council tally with normal similarity, 4) stop after reporting RUN_ID, exit code, and the error payload you observed. Do not use ordinary chat to coordinate with the reviewers.
```

### Architecture Reviewer Prompt

```text
Use $inbox at INBOX_SKILL_PATH to act as architecture-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill.

Workflow:
1) fetch and claim your assigned council task
2) write TMPDIR/architecture-invalid.json containing exactly this invalid JSON body:
{"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module."}
3) complete the task with done using summary "Review complete" and --body-file TMPDIR/architecture-invalid.json
4) stop after reporting THREAD_ID and the body file path

Do not use ordinary chat to coordinate with the leader.
```

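
The body in the architecture reviewer prompt above is invalid because the `findings` array and the outer object are never closed. A rough way to confirm that a fixture is truncated in this manner is to compare brace counts, as sketched below. The `count_char` helper and the temporary directory are illustrative assumptions; this is a sanity check on the fixture, not a JSON validator, and the heredoc appends a trailing newline that the prompt's exact body does not have.

```bash
#!/usr/bin/env bash
# Sketch: write the malformed reviewer body and confirm its braces do not balance.
run_tmpdir=$(mktemp -d)

cat > "$run_tmpdir/architecture-invalid.json" <<'EOF'
{"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module."}
EOF

# Hypothetical helper: count occurrences of one character ($1) in a file ($2).
count_char() {
  tr -cd "$1" < "$2" | wc -c | tr -d ' '
}

open_braces=$(count_char '{' "$run_tmpdir/architecture-invalid.json")
close_braces=$(count_char '}' "$run_tmpdir/architecture-invalid.json")

if [ "$open_braces" -ne "$close_braces" ]; then
  echo "fixture is unbalanced: $open_braces opening vs $close_braces closing braces"
fi
```
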
### Implementation Reviewer Prompt

```text
Use $inbox at INBOX_SKILL_PATH to act as implementation-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract API contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"medium","tags":["maintainability"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
```

### Risk Reviewer Prompt

```text
Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"risk-reviewer","findings":[{"title":"Add auth integration tests","summary":"Login regressions are hard to catch.","proposal":"Add integration tests for auth flows.","rationale":"This catches regressions earlier.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
```

## Execution Parameters

- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy

## Execution Steps

1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
2. Inject `skills/council-review/` into `leader`
3. Inject `skills/inbox/` into the three reviewer agents
4. Point all agents at the same database path `TMPDIR/coord.db`
5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
6. Wait for all agents to finish
7. Independently run the validation commands from the main thread

## Validation Commands

```bash
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council wait --run council_skill_008 --timeout-seconds 2
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council tally --run council_skill_008 --similarity normal
```

## Expected Outcomes

- all three reviewer tasks still reach terminal `done`
- `council wait` returns `all_complete == true`
- `council tally` exits with the stable invalid-input contract
- the error message indicates that reviewer output must be valid JSON

## Assertions

- `wait.data.all_complete == true`
- command exit code for `council tally` is `30`
- error code is `invalid_input`
- the error message mentions `reviewer output must be valid JSON`

## Cleanup

- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
@@ -0,0 +1,97 @@
# Case: `council-start-with-target-file-through-bundled-cli`

## Test Type

This is a `forward-test` and a non-prompt target-context validation.

The goal is to verify that a leader using the packaged `council-review` skill can start a council run from explicit `--target-file` context instead of relying on a pure inline prompt.

## Purpose

Validate that all of the following can be true at the same time:

- the test runner can prepare a concrete brief file before launching the leader
- the leader can start a council run through the bundled council-review skill using `--target-file`
- the target-file path is persisted in council input metadata
- reviewer tasks are still dispatched normally from the file-based target

## Preconditions

- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- `sqlite3` is available locally for metadata validation
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching the leader through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`

## Agent Topology

- `leader`

## Inputs

### Target File Fixture

Create `TMPDIR/brief.md` before launching the leader with contents similar to:

```md
# Brief

Review the current council-review packaging flow.

- Confirm the skill can carry file-based context.
- Focus on documentation quality and report semantics.
```

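
One way to create this fixture from the test runner is a quoted heredoc, which writes the brief verbatim without shell expansion. This is a sketch only; the `run_tmpdir` variable stands in for the case's `TMPDIR` and is an assumption of the example.

```bash
#!/usr/bin/env bash
# Sketch: create the target-file fixture before the leader is launched.
# run_tmpdir stands in for the case's TMPDIR placeholder.
run_tmpdir=$(mktemp -d)

cat > "$run_tmpdir/brief.md" <<'EOF'
# Brief

Review the current council-review packaging flow.

- Confirm the skill can carry file-based context.
- Focus on documentation quality and report semantics.
EOF

# Fail early if the fixture is missing or empty.
test -s "$run_tmpdir/brief.md" && echo "fixture ready: $run_tmpdir/brief.md"
```
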
### Leader Prompt

```text
Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_009 using --target-file TMPDIR/brief.md, --target-type mixed, and --mode review, 2) stop after reporting RUN_ID and the target metadata you observed from the start response. Do not use ordinary chat to simulate reviewer work.
```

## Execution Parameters

- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy

## Execution Steps

1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
2. Create `TMPDIR/brief.md` with the target file contents
3. Inject `skills/council-review/` into `leader`
4. Point the leader at the database path `TMPDIR/coord.db`
5. Launch the leader
6. Wait for the leader to finish
7. Independently run the validation commands from the main thread

## Validation Commands

```bash
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json run show --run council_skill_009
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run council_skill_009
sqlite3 TMPDIR/coord.db "SELECT prompt, target_file, repo_path, target_task_id FROM council_inputs WHERE run_id = 'council_skill_009';"
sqlite3 TMPDIR/coord.db "SELECT acceptance_json FROM tasks WHERE run_id = 'council_skill_009' AND task_id = 'CR1';"
```

## Expected Outcomes

- the leader successfully starts `council_skill_009`
- the run goal references the target file rather than an inline prompt
- the stored council input row keeps `target_file == TMPDIR/brief.md`
- reviewer task dispatch still produces the usual three council tasks
- reviewer task acceptance metadata carries the `target_file` reference forward

## Assertions

- `run_show.data.run.goal` mentions `brief.md`
- `status.data.tasks` length is `3`
- `status.data.run.status` is not terminal
- the `council_inputs` row has empty `prompt`, `repo_path`, and `target_task_id`
- the `council_inputs` row has `target_file == "TMPDIR/brief.md"`
- the `CR1` acceptance JSON contains `"target_file":"TMPDIR/brief.md"`

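
The `council_inputs` assertions above can be checked against the row that `sqlite3` prints, since its default list mode separates columns with `|` and prints NULL as an empty string. The sketch below parses such a row; `check_council_input_row` is a hypothetical helper, the sample row is illustrative rather than recorded output, and the parse would break if a stored value itself contained `|`.

```bash
#!/usr/bin/env bash
# Sketch: validate one council_inputs row printed by sqlite3 in list mode.
# $1 = row text in the column order prompt|target_file|repo_path|target_task_id
# $2 = expected target_file path
check_council_input_row() {
  row=$1
  expected_target=$2

  # Split on "|"; empty columns are preserved because "|" is not whitespace.
  IFS='|' read -r prompt target_file repo_path target_task_id <<EOF
$row
EOF

  [ -z "$prompt" ] && [ -z "$repo_path" ] && [ -z "$target_task_id" ] \
    && [ "$target_file" = "$expected_target" ]
}

# Illustrative row: only target_file is populated.
sample_row='|/tmp/council-case/brief.md||'
check_council_input_row "$sample_row" /tmp/council-case/brief.md && echo "council_inputs row matches"
```
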
## Cleanup

- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR`, `brief.md`, and `coord.db` for replay and manual inspection
@@ -0,0 +1,104 @@
|
||||
# Case: `council-tally-strict-keeps-distinct-proposals-through-bundled-cli`
|
||||
|
||||
## Test Type
|
||||
|
||||
This is a `forward-test` and a strict-similarity tally validation.
|
||||
|
||||
The goal is to verify that a leader using the packaged `council-review` skill can request `--similarity strict` and preserve wording-level proposal differences that would normally collapse in `normal` mode.
|
||||
|
||||
## Purpose
|
||||
|
||||
Validate that all of the following can be true at the same time:
|
||||
|
||||
- the leader can drive `start -> wait -> tally` through the bundled council-review skill
|
||||
- three reviewer agents can complete their tasks through the packaged inbox skill
|
||||
- the architecture and implementation reviewers can submit near-duplicate but not identical proposals
|
||||
- strict tally keeps all three proposals as separate minority groups
|
||||
|
||||
## Preconditions
|
||||
|
||||
- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
|
||||
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
|
||||
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
|
||||
- use an empty temporary directory `TMPDIR`
|
||||
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
|
||||
|
||||
## Agent Topology
|
||||
|
||||
- `leader`
|
||||
- `architecture-reviewer`
|
||||
- `implementation-reviewer`
|
||||
- `risk-reviewer`
|
||||
|
||||
## Inputs
|
||||
|
||||
### Leader Prompt
|
||||
|
||||
```text
|
||||
Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_007 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) tally with --similarity strict, 4) stop after reporting RUN_ID, tally counts, and the grouped proposals you observed. Do not use ordinary chat to coordinate with the reviewers.
|
||||
```
|
||||
|
||||
### Architecture Reviewer Prompt

```text
Use $inbox at INBOX_SKILL_PATH to act as architecture-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This lowers coupling.","confidence":"high","tags":["architecture"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
```

### Implementation Reviewer Prompt

```text
Use $inbox at INBOX_SKILL_PATH to act as implementation-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract API contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"medium","tags":["maintainability"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
```

### Risk Reviewer Prompt

```text
Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"risk-reviewer","findings":[{"title":"Add auth integration tests","summary":"Login regressions are hard to catch.","proposal":"Add integration tests for auth flows.","rationale":"This catches regressions earlier.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
```

## Execution Parameters

- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy

## Execution Steps

1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
2. Inject `skills/council-review/` into `leader`
3. Inject `skills/inbox/` into the three reviewer agents
4. Point all agents at the same database path `TMPDIR/coord.db`
5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
6. Wait for all agents to finish
7. Independently run the validation commands from the main thread

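The ordering constraints in the steps above (initialize once, launch all four roles in parallel, validate only after every agent has finished) can be sketched as a small driver. This is an illustration under assumptions, not the harness this plan is executed with: `run_case` and its three callables `init_db`, `launch_agent`, and `validate` are hypothetical stand-ins for however agents are actually started.

```python
from concurrent.futures import ThreadPoolExecutor

ROLES = ("leader", "architecture-reviewer", "implementation-reviewer", "risk-reviewer")


def run_case(init_db, launch_agent, validate):
    """Initialize once, run all roles in parallel, then validate."""
    # Step 1: initialize coord.db before any agent starts.
    init_db()
    with ThreadPoolExecutor(max_workers=len(ROLES)) as pool:
        # Steps 2-5: launch_agent(role) is expected to inject the right
        # skill and point the agent at the shared database path.
        futures = {role: pool.submit(launch_agent, role) for role in ROLES}
        # Step 6: block until every agent finishes; result() re-raises
        # any exception an agent launch produced.
        reports = {role: future.result() for role, future in futures.items()}
    # Step 7: validation runs from the calling (main) thread.
    return reports, validate()
```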
## Validation Commands

```bash
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council wait --run council_skill_007 --timeout-seconds 2
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council tally --run council_skill_007 --similarity strict
```

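When the validation commands are driven from a script rather than typed by hand, a thin wrapper can build the argument lists and parse the output. The sketch below assumes that `--json` makes the CLI print a single JSON document on stdout and exit with status zero on success; the plan does not state this explicitly. The helper names `validation_commands` and `run_json` are hypothetical.

```python
import json
import subprocess


def validation_commands(orch_binary, db_path, run_id):
    """Build argv lists for the two validation commands of this case."""
    base = [orch_binary, "--db", db_path, "--json", "council"]
    return {
        "wait": base + ["wait", "--run", run_id, "--timeout-seconds", "2"],
        "tally": base + ["tally", "--run", run_id, "--similarity", "strict"],
    }


def run_json(argv):
    """Run one command and parse its stdout as JSON.

    Assumes a zero exit status on success; check=True raises
    CalledProcessError otherwise.
    """
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

For this case the calls would use `skills/council-review/assets/orch`, the `coord.db` path inside the temporary directory, and the run ID `council_skill_007`.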
## Expected Outcomes

- all three reviewers complete their fixed-role tasks
- `council wait` returns `all_complete == true`
- `council tally` succeeds with `similarity == "strict"`
- the two nearly identical contract proposals remain separate rather than merging
- every resulting recommendation lands in `minority`

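To see why the two contract proposals are expected to stay separate, it helps to compare the three proposal strings directly. The sketch below assumes that strict similarity amounts to exact text comparison, which is an interpretation of the expected outcomes rather than a description of the `orch` implementation. The `loose_key` normalizer is hypothetical and exists only for contrast.

```python
import re
from collections import defaultdict

# Proposal strings copied from the three reviewer prompts in this case.
PROPOSALS = {
    "architecture-reviewer": "Move API contract definitions into a dedicated module.",
    "implementation-reviewer": "Move API contract definitions into dedicated module",
    "risk-reviewer": "Add integration tests for auth flows.",
}


def strict_key(proposal):
    # Assumption: strict similarity compares the proposal text exactly.
    return proposal


def loose_key(proposal):
    # Hypothetical looser normalizer, for contrast only:
    # lowercase, drop punctuation, drop English articles.
    words = re.findall(r"[a-z0-9]+", proposal.lower())
    return " ".join(word for word in words if word not in {"a", "an", "the"})


def group(proposals, key):
    """Group reviewer roles by the key derived from their proposal."""
    groups = defaultdict(list)
    for role, proposal in proposals.items():
        groups[key(proposal)].append(role)
    return list(groups.values())


strict_groups = group(PROPOSALS, strict_key)  # three single-reviewer groups
loose_groups = group(PROPOSALS, loose_key)    # the two contract proposals merge
```

Under exact comparison, the missing article and trailing period are enough to keep the architecture and implementation proposals apart, so each group has a single supporter. That is consistent with every recommendation landing in `minority`.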
## Assertions

- `wait.data.all_complete == true`
- `tally.data.similarity == "strict"`
- `tally.data.counts.minority == 3`
- `tally.data.grouped_recommendations` length is `3`
- every returned recommendation has `bucket == "minority"`
- the returned proposal set contains `Move API contract definitions into a dedicated module.`
- the returned proposal set contains `Move API contract definitions into dedicated module`
- the returned proposal set contains `Add integration tests for auth flows.`

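The assertions above can be evaluated against the parsed JSON from the two validation commands. This is a minimal sketch: the `data` envelope, `counts.minority`, `grouped_recommendations`, and `bucket` come from the assertion paths, but the assumption that each grouped recommendation exposes its text under a `proposal` key is not confirmed by this plan. The helper name `check_assertions` is hypothetical.

```python
EXPECTED_PROPOSALS = {
    "Move API contract definitions into a dedicated module.",
    "Move API contract definitions into dedicated module",
    "Add integration tests for auth flows.",
}


def check_assertions(wait, tally):
    """Return a list of failed assertions; an empty list means the case passed."""
    failures = []
    if wait["data"]["all_complete"] is not True:
        failures.append("wait.data.all_complete is not true")
    data = tally["data"]
    if data["similarity"] != "strict":
        failures.append("tally.data.similarity is not 'strict'")
    if data["counts"]["minority"] != 3:
        failures.append("tally.data.counts.minority is not 3")
    groups = data["grouped_recommendations"]
    if len(groups) != 3:
        failures.append("grouped_recommendations length is not 3")
    if any(item.get("bucket") != "minority" for item in groups):
        failures.append("a recommendation is outside the minority bucket")
    # Assumption: the proposal text is stored under the "proposal" key.
    returned = {item.get("proposal") for item in groups}
    for proposal in sorted(EXPECTED_PROPOSALS - returned):
        failures.append(f"missing proposal: {proposal}")
    return failures
```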
## Cleanup

- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection