Add council-review skill test plan docs

# Council Review Skill Test Plan

## Purpose

This directory tracks human-readable test plans for the `skills/council-review/` Codex skill bundle.

These documents are not command-contract specs for the `orch council` CLI itself.
That coverage already lives under [../orch/](../orch/).

This directory exists to describe a different test surface:

- whether a leader agent can actually use the packaged `council-review` skill
- whether the bundled `./assets/orch` CLI works inside real skill-guided council workflows
- whether a council run driven by the skill reaches the expected reviewer, grouping, tally, and report state

## Test Model

- `README.md` is the index for this directory
- each skill test case lives in its own Markdown file
- use stable case slugs in filenames

## Shared Execution Contract

Use these defaults unless a case file explicitly overrides them:

- run the scenario with real subagents, not simulated transcripts
- inject `skills/council-review/` into the leader agent
- inject `skills/inbox/` into reviewer agents whenever reviewer task completion is required
- before launching role agents, initialize the shared SQLite DB with `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
- require the leader to coordinate through the bundled `./assets/orch` CLI from the council-review skill instead of ordinary chat
- require reviewer agents to coordinate through the bundled `./assets/inbox` CLI from their skill instead of ordinary chat
- after the agents stop, validate the final council run, reviewer task state, and report state independently from the main thread
- for mixed or repo-target cases, create any required repo fixture before launching agents

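The per-run setup part of this contract can be sketched in Bash as follows. This is an illustration under assumptions, not part of the skill bundle: the `setup_run` helper is hypothetical, only the `init` invocation comes from this README, and the directory is held in a shell variable named `run_dir` because `TMPDIR` is also the standard environment variable that `mktemp` reads to choose its parent directory.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: create an isolated run directory and initialize the
# shared coordination DB once through the bundled inbox CLI.
# Usage: setup_run INBOX_SKILL_PATH
# Prints the run directory (the value substituted for TMPDIR in prompts).
setup_run() {
  local inbox_skill_path=$1
  local inbox_bin="$inbox_skill_path/assets/inbox"
  local run_dir

  if [[ ! -x "$inbox_bin" ]]; then
    echo "missing bundled inbox CLI: $inbox_bin" >&2
    return 1
  fi

  # One isolated directory per run. The variable is deliberately not named
  # TMPDIR, since mktemp uses $TMPDIR to pick the parent directory.
  run_dir=$(mktemp -d "${TMPDIR:-/tmp}/council-review.XXXXXX")

  # Initialize the DB before any role agent starts; keep the JSON response
  # as evidence for the run.
  "$inbox_bin" --db "$run_dir/coord.db" --json init >"$run_dir/init.json"

  printf '%s\n' "$run_dir"
}
```

The printed path is what the runner would then substitute for `TMPDIR` in each role prompt, for example `run_dir=$(setup_run skills/inbox)`.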
## How An Agent Runs These Cases

Use one test-runner agent to execute each case.

The test-runner agent is responsible for:

- reading this `README.md` first, then one specific case file
- creating an isolated temporary directory and DB path for that run
- initializing the DB once through the bundled inbox CLI before launching role agents
- creating any required temporary Git repo fixture before launching role agents
- launching the role agents described in `Agent Topology`
- injecting `skills/council-review/` into the leader and `skills/inbox/` into reviewers
- passing each role agent the prompt text from the case file, with concrete values substituted for `COUNCIL_SKILL_PATH`, `INBOX_SKILL_PATH`, `TMPDIR`, `RUN_ID`, `THREAD_ID`, and `REPORT_PATH` where needed
- coordinating launch order (sequential or parallel start) as the case file specifies
- collecting agent final summaries as evidence
- resolving final run ids, thread ids, and report artifact paths from agent outputs
- running the `Validation Commands` from the main thread after the role agents stop
- comparing the observed results against `Expected Outcomes` and `Assertions`
- returning a final pass/fail judgment with concrete evidence

The role agents are responsible for:

- acting only within the role assigned in the case file
- using the injected skill bundle rather than ad hoc repository discovery
- coordinating through the bundled CLI and shared DB
- reporting concrete run ids, thread ids, report artifact paths, and key command outcomes back to the test-runner agent

The test-runner agent should treat a case as passed only when:

- all role agents reach a final state without violating the case contract
- the independent validation commands succeed
- the final council, orch, and inbox state matches the assertions in the case file

The test-runner agent should treat a case as failed when:

- any required agent times out or stalls
- a required council, orch, or inbox action is skipped
- the leader falls back to ordinary chat for workflow control that should go through the bundled council-review skill
- reviewer agents fall back to ordinary chat instead of returning results through inbox
- the final council grouping, summary, or report state conflicts with the documented assertions

The test-runner agent should report results in this shape:

- `case`
- `db_path`
- `run_id`
- `thread_ids`
- `report_paths`
- `result`: `pass` or `fail`
- `agent_summaries`
- `validation_evidence`
- `assertion_checklist`
- `notes`

## Default Timeouts

Use these defaults unless a case file explicitly overrides them:

- per-agent timeout: `4m`
- overall scenario timeout: `6m`
- async wait margin for the main thread: `45s`

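One way a test runner could enforce these limits, assuming each role agent is driven by a command-line process and GNU coreutils `timeout` is available: `timeout` accepts durations with `s`/`m` suffixes such as `4m` and exits with status `124` when the limit is hit. The `run_bounded` helper and its `ok`/`timeout`/`failed:<status>` labels are hypothetical.

```bash
#!/usr/bin/env bash
set -uo pipefail

# Defaults from this README; a case file may override them.
PER_AGENT_TIMEOUT=${PER_AGENT_TIMEOUT:-4m}
SCENARIO_TIMEOUT=${SCENARIO_TIMEOUT:-6m}
WAIT_MARGIN=${WAIT_MARGIN:-45s}

# Hypothetical helper: run a command under a time limit and classify it.
# Usage: run_bounded DURATION COMMAND [ARGS...]
# Prints "ok", "timeout", or "failed:<status>" and returns the command's status.
run_bounded() {
  local limit=$1
  shift
  local status=0
  timeout "$limit" "$@" || status=$?
  if (( status == 0 )); then
    echo ok
  elif (( status == 124 )); then
    # GNU timeout exits with 124 when the time limit expires.
    echo timeout
  else
    echo "failed:$status"
  fi
  return "$status"
}
```

For example, `run_bounded "$PER_AGENT_TIMEOUT" some-agent-launcher --role leader` would label a stalled leader as `timeout`; `some-agent-launcher` stands in for whatever actually starts an agent.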
## Default Failure Conditions

Treat the test as failed if any of the following happens:

- any required agent does not reach a final state before timeout
- any required council, orch, or inbox command returns a non-success result, unless the case expects that failure
- the final `council report --json` output does not match the expected grouped recommendations
- the final `orch status` output does not match the expected reviewer task state
- a required Markdown report artifact is missing when the case expects one
- the agents fall back to ordinary chat for critical coordination instead of the bundled CLIs

## Evidence Capture

Collect at least the following artifacts for every run:

- agent final summaries
- final `council report --json` output when the case reaches the report stage
- final `orch status --run RUN_ID --json` output
- final `inbox show --thread THREAD_ID --json` output for every relevant reviewer thread when reviewers participated
- any `council wait` or `council tally` output relevant to the case
- the temporary DB path, resolved run id, resolved thread ids, and any report artifact paths

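A minimal sketch of capturing the CLI-side evidence into one directory per run. The `orch` invocations follow the global-flag order used in the case files' `Validation Commands` (`--db ... --json` before the subcommand); the `inbox show` call assumes the bundled inbox CLI accepts the same global-flag order as its `init` command, which this README does not state. The `capture_evidence` name and the output file names are hypothetical.

```bash
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical helper: save final orch/inbox JSON outputs as evidence files.
# Usage: capture_evidence ORCH_BIN INBOX_BIN DB RUN_ID OUT_DIR [THREAD_ID...]
# Every command runs; returns non-zero if any capture command failed.
capture_evidence() {
  local orch=$1 inbox=$2 db=$3 run_id=$4 out=$5
  shift 5
  local failures=0 tid

  mkdir -p "$out"
  "$orch" --db "$db" --json status --run "$run_id" >"$out/status.json" || failures=$((failures + 1))
  # Only meaningful once the case has reached the report stage.
  "$orch" --db "$db" --json council report --run "$run_id" >"$out/report.json" || failures=$((failures + 1))
  for tid in "$@"; do
    # Assumed flag order for the inbox CLI (see the lead-in).
    "$inbox" --db "$db" --json show --thread "$tid" >"$out/thread-$tid.json" || failures=$((failures + 1))
  done
  # Record the resolved identifiers next to the command outputs.
  printf 'db_path=%s\nrun_id=%s\nthread_ids=%s\n' "$db" "$run_id" "$*" >"$out/ids.txt"
  (( failures == 0 ))
}
```

In a case that expects a command to fail, such as report-before-tally, a non-zero return here is expected and the saved output is still the evidence to inspect.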
## Cleanup Policy

Use these defaults unless a case file explicitly overrides them:

- keep the temporary DB, repo fixture, and working directory on failure for debugging
- on success, clean up the temporary working directory only if the caller does not need replay artifacts

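A sketch of the default cleanup policy as a Bash helper, assuming (hypothetically) that the DB, repo fixture, and working files for a run all live under one run directory. `finish_run` and its `KEEP_ARTIFACTS` argument are illustrative names, not part of any bundled CLI. Any result other than `pass` is treated as a failure, so an unexpected value errs on the side of keeping artifacts.

```bash
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical helper applying the default cleanup policy.
# Usage: finish_run RESULT RUN_DIR [KEEP_ARTIFACTS]
#   RESULT          "pass" or "fail"
#   KEEP_ARTIFACTS  "1" if the caller still needs replay artifacts
finish_run() {
  local result=$1 run_dir=$2 keep=${3:-0}

  if [[ "$result" != "pass" ]]; then
    # Failure (or unknown result): keep DB, fixture, and working directory.
    echo "kept $run_dir for debugging" >&2
    return 0
  fi
  if [[ "$keep" == "1" ]]; then
    # Success, but the caller asked for replay artifacts.
    echo "kept $run_dir for replay" >&2
    return 0
  fi
  # Success and no replay needed: remove the whole run directory.
  rm -rf -- "$run_dir"
}
```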
## Per-Case Template

Each case file should use this structure:

- `Test Type`
- `Purpose`
- `Preconditions`
- `Agent Topology`
- `Inputs`
- `Execution Parameters`
- `Execution Steps`
- `Validation Commands`
- `Expected Outcomes`
- `Assertions`
- `Cleanup`
- `Recorded Example Run` when a real run has already been captured

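Because every case file is expected to follow this structure, a runner or CI job could lint case files before executing them. The sketch below is hypothetical: it assumes each section appears as a level-2 heading (`## Test Type`, and so on), as in the existing case files, and it does not check the optional `Recorded Example Run` section.

```bash
#!/usr/bin/env bash
set -uo pipefail

# Required sections from the per-case template.
# "Recorded Example Run" is optional, so it is not listed.
REQUIRED_SECTIONS=(
  "Test Type" "Purpose" "Preconditions" "Agent Topology" "Inputs"
  "Execution Parameters" "Execution Steps" "Validation Commands"
  "Expected Outcomes" "Assertions" "Cleanup"
)

# Hypothetical lint: print each "## <Section>" heading missing from a case
# file and return non-zero if any are missing.
check_case_file() {
  local file=$1 section missing=0
  for section in "${REQUIRED_SECTIONS[@]}"; do
    # -x: match the whole line, -F: treat the heading as a fixed string.
    if ! grep -qxF "## $section" "$file"; then
      echo "missing section: $section"
      missing=1
    fi
  done
  return "$missing"
}
```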
## Case Files

| Case Slug | File | Coverage Note |
| --- | --- | --- |
| `council-brainstorm-end-to-end-through-bundled-cli` | [council-brainstorm-end-to-end-through-bundled-cli.md](./council-brainstorm-end-to-end-through-bundled-cli.md) | validates that the council-review skill can drive `start -> wait -> tally -> report` with three real reviewer agents |
| `council-unanimous-only-default-report-through-bundled-cli` | [council-unanimous-only-default-report-through-bundled-cli.md](./council-unanimous-only-default-report-through-bundled-cli.md) | validates that unanimous-only runs default to `consensus` output while preserving the underlying summary counts |
| `council-wait-timeout-through-bundled-cli` | [council-wait-timeout-through-bundled-cli.md](./council-wait-timeout-through-bundled-cli.md) | validates that the leader sees the expected timeout contract when reviewer tasks do not complete |
| `council-report-rejects-before-tally-through-bundled-cli` | [council-report-rejects-before-tally-through-bundled-cli.md](./council-report-rejects-before-tally-through-bundled-cli.md) | validates that the skill surfaces the stable invalid-state error when report is attempted before tally |

## Scope

In scope:

- explicit `$council-review` skill invocation
- bundled `./assets/orch` CLI usage for `orch council ...`
- end-to-end council start, wait, tally, and report flows
- interaction between a leader using `skills/council-review/` and reviewers using `skills/inbox/`
- default report policy, unanimous-only behavior, and timeout/error-path validation

Out of scope:

- per-command flag and JSON contract coverage for `orch council`
- generic leader orchestration flows that already belong under [../orch-skill/](../orch-skill/)
- worker-only skill behavior that belongs under [../inbox-skill/](../inbox-skill/)
- implicit skill triggering without `$council-review`

## Relationship To Other Test Docs

- [../orch/](../orch/) covers CLI command behavior
- [../orch-skill/](../orch-skill/) covers generic leader-side orchestration behavior on top of `orch`
- [../inbox-skill/](../inbox-skill/) covers worker-side skill-guided behavior on top of inbox
- this directory covers the separate user-facing `council-review` skill on top of `orch council`

# Case: `council-brainstorm-end-to-end-through-bundled-cli`

## Test Type

This is a `forward-test` and a high-level council workflow validation.

The goal is to verify that a leader using the packaged `council-review` skill can drive `council start -> wait -> tally -> report` while three real reviewer agents return structured outputs through the packaged inbox skill.

## Purpose

Validate that all of the following can be true at the same time:

- the leader can use the bundled `./assets/orch` CLI through the council-review skill
- three reviewer agents can claim and complete their fixed-role inbox tasks
- the leader can wait, tally, and report after all reviewer outputs arrive
- the final report defaults to `consensus,majority`
- a Markdown report artifact is written

## Preconditions

- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- before launching role agents, initialize `TMPDIR/coord.db` through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`

## Agent Topology

- `leader`
- `architecture-reviewer`
- `implementation-reviewer`
- `risk-reviewer`

## Inputs

### Leader Prompt

```text
Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_001 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) tally with normal similarity, 4) report with default settings, 5) stop after reporting RUN_ID and REPORT_PATH. Do not use ordinary chat to coordinate with the reviewers.
```

### Architecture Reviewer Prompt

```text
Use $inbox at INBOX_SKILL_PATH to act as architecture-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This lowers coupling.","confidence":"high","tags":["architecture"],"target_refs":{"repo_path":"."}},{"title":"Share helpers","summary":"Council report rendering paths are repeated.","proposal":"Introduce shared council coordinator helpers for report rendering.","rationale":"This keeps report assembly consistent.","confidence":"medium","tags":["reporting"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
```

### Implementation Reviewer Prompt

```text
Use $inbox at INBOX_SKILL_PATH to act as implementation-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"high","tags":["maintainability"],"target_refs":{"repo_path":"."}},{"title":"Reuse report helpers","summary":"Formatting logic should stay shared.","proposal":"Introduce shared council coordinator helpers for report rendering","rationale":"This avoids formatter drift.","confidence":"medium","tags":["reporting"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
```

### Risk Reviewer Prompt

```text
Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"risk-reviewer","findings":[{"title":"Lock contracts","summary":"Contract drift becomes risky over time.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This reduces integration regressions.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}},{"title":"Cover JSON output","summary":"The council report response should stay stable.","proposal":"Add regression tests for council report JSON output.","rationale":"This catches contract regressions earlier.","confidence":"high","tags":["testing"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
```

## Execution Parameters

- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy

## Execution Steps

1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
2. Inject `skills/council-review/` into `leader`
3. Inject `skills/inbox/` into the three reviewer agents
4. Point all agents at the same database path `TMPDIR/coord.db`
5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
6. Wait for all agents to finish
7. Resolve `RUN_ID=council_skill_001`, reviewer `THREAD_ID`s, and `REPORT_PATH` from the agent outputs
8. Independently run the validation commands from the main thread

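Steps 5 and 6 describe a plain fork-and-join, which in Bash is typically background jobs plus `wait PID`; `wait` returns the exit status of the job it waited for. In this sketch, `launch_agent` is a hypothetical placeholder for however the runner actually starts a skill-injected agent; it is defined as a stub here so the pattern runs on its own.

```bash
#!/usr/bin/env bash
set -uo pipefail

# Placeholder for the real agent launcher (hypothetical). It only simulates
# a short piece of work so the fork-and-join pattern can run standalone.
launch_agent() {
  local role=$1
  sleep 0.1
  echo "$role finished"
}

# Start every role in parallel, then wait for each and record its status.
# Usage: run_roles_in_parallel LOG_DIR ROLE...
# Returns non-zero if any role exited non-zero.
run_roles_in_parallel() {
  local log_dir=$1
  shift
  local -a roles=("$@") pids=()
  local role i status rc=0

  mkdir -p "$log_dir"
  # Fork: start each role in the background and remember its PID.
  for role in "${roles[@]}"; do
    launch_agent "$role" >"$log_dir/$role.log" 2>&1 &
    pids+=("$!")
  done
  # Join: wait on each PID individually so every exit status is recorded.
  for i in "${!roles[@]}"; do
    status=0
    wait "${pids[$i]}" || status=$?
    echo "${roles[$i]} exit=$status"
    (( status == 0 )) || rc=1
  done
  return "$rc"
}
```

A runner would call it as `run_roles_in_parallel "$run_dir/logs" leader architecture-reviewer implementation-reviewer risk-reviewer` and keep the per-role logs as agent evidence.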
## Validation Commands

```bash
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run council_skill_001
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council report --run council_skill_001
test -f REPORT_PATH
```

## Expected Outcomes

- the leader successfully starts `council_skill_001`
- all three reviewers complete their fixed-role tasks
- `council wait` returns `all_complete == true`
- `council tally` returns one `consensus` group, one `majority` group, and one `minority` group
- `council report` defaults to showing `consensus,majority`
- a Markdown report artifact exists on disk

## Assertions

- `status.data.run.status == "done"`
- `status.data.tasks` contains exactly three reviewer tasks, all with status `done`
- `report.data.show == ["consensus","majority"]`
- `report.data.summary.consensus == 1`
- `report.data.summary.majority == 1`
- `report.data.summary.minority == 1`
- `report.data.grouped_recommendations` length is `2`
- `REPORT_PATH` exists

## Cleanup

- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection

# Case: `council-report-rejects-before-tally-through-bundled-cli`

## Test Type

This is a `forward-test` and an invalid-state council workflow validation.

The goal is to verify that a leader using the packaged `council-review` skill sees the expected stable error when `council report` is attempted before `council tally` has persisted grouped recommendations.

## Purpose

Validate that all of the following can be true at the same time:

- the leader can start a council run through the bundled council-review skill
- the leader can attempt `council report` without first running `council tally`
- the command returns the stable invalid-state contract rather than fabricating an empty report

## Preconditions

- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- bundled CLI executable exists at `COUNCIL_SKILL_PATH/assets/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- use an empty temporary directory `TMPDIR`
- before launching role agents, initialize `TMPDIR/coord.db` through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`

## Agent Topology

- `leader`

## Inputs

### Leader Prompt

```text
Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_004 with a short review target, 2) attempt council report immediately without running tally, 3) stop after reporting RUN_ID, exit code, and error payload. Do not use ordinary chat to simulate reviewer output.
```

## Execution Parameters

- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy

## Execution Steps

1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
2. Inject `skills/council-review/` into `leader`
3. Point the leader at the database path `TMPDIR/coord.db`
4. Launch the leader
5. Wait for the leader to finish
6. Independently run the validation commands from the main thread

## Validation Commands

```bash
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council report --run council_skill_004
```

## Expected Outcomes

- the leader successfully starts `council_skill_004`
- the `council report` command fails with the stable invalid-state contract
- the error message indicates that `council tally` must run first

## Assertions

- command exit code is `30`
- error code is `invalid_state`
- the error message mentions that grouped recommendations are not available yet or that `council tally` must run first

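A minimal main-thread check for the exit-code and error-code assertions, as a sketch: it runs the documented validation command, compares the exit status with `30`, and looks for `invalid_state` in the output. The `check_report_rejected` name is hypothetical, and because the exact location of the error code in the JSON payload is not specified here, the plain text match is deliberately loose; a JSON-aware check would be stricter. Which stream carries the error payload is also not stated, so stdout and stderr are captured together.

```bash
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical check for the "report before tally" assertions.
# Usage: check_report_rejected ORCH_BIN DB RUN_ID
check_report_rejected() {
  local orch=$1 db=$2 run_id=$3 output status=0

  # Capture stdout and stderr together; the payload's stream is unspecified.
  output=$("$orch" --db "$db" --json council report --run "$run_id" 2>&1) || status=$?

  if (( status != 30 )); then
    echo "FAIL: expected exit code 30, got $status"
    return 1
  fi
  if ! grep -q 'invalid_state' <<<"$output"; then
    echo "FAIL: output does not mention invalid_state"
    return 1
  fi
  echo "PASS: report rejected with exit code 30 and invalid_state"
}
```

The third assertion, about the error message wording, still needs a human or a stricter parser to confirm.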
## Cleanup

- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection

# Case: `council-unanimous-only-default-report-through-bundled-cli`

## Test Type

This is a `forward-test` and a unanimous-only reporting validation.

The goal is to verify that a leader using the packaged `council-review` skill can run a unanimous-only council and observe the expected default report behavior after tally.

## Purpose

Validate that all of the following can be true at the same time:

- the leader can start a council run with `--only-unanimous`
- three reviewer agents can complete their tasks through the packaged inbox skill
- the leader can tally and report through the bundled council-review skill
- the final report defaults to showing only `consensus`, while preserving the full summary counts

## Preconditions

- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- before launching role agents, initialize `TMPDIR/coord.db` through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`

## Agent Topology

- `leader`
- `architecture-reviewer`
- `implementation-reviewer`
- `risk-reviewer`

## Inputs

### Leader Prompt

```text
Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_002 with --only-unanimous, 2) wait until all three reviewers complete, 3) tally with normal similarity, 4) report with default settings, 5) stop after reporting RUN_ID and the default show buckets you observed. Do not use ordinary chat to coordinate with the reviewers.
```

### Reviewer Prompts

- Reuse the same reviewer JSON bodies and inbox-only workflow as in [council-brainstorm-end-to-end-through-bundled-cli.md](./council-brainstorm-end-to-end-through-bundled-cli.md), but target run `council_skill_002`.

## Execution Parameters

- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy

## Execution Steps

1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
2. Inject `skills/council-review/` into `leader`
3. Inject `skills/inbox/` into the three reviewer agents
4. Point all agents at the same database path `TMPDIR/coord.db`
5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
6. Wait for all agents to finish
7. Resolve `RUN_ID=council_skill_002` from the agent outputs
8. Independently run the validation commands from the main thread

## Validation Commands

```bash
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council report --run council_skill_002
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run council_skill_002
```

## Expected Outcomes

- the unanimous-only run completes successfully
- the default report `show` value contains only `consensus`
- the underlying summary still contains `consensus`, `majority`, and `minority` counts
- only the consensus group is returned in `grouped_recommendations`

## Assertions

- `report.data.show == ["consensus"]`
- `report.data.summary.consensus == 1`
- `report.data.summary.majority == 1`
- `report.data.summary.minority == 1`
- `report.data.grouped_recommendations` length is `1`
- the sole returned recommendation has `bucket == "consensus"`

## Cleanup

- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection

# Case: `council-wait-timeout-through-bundled-cli`

## Test Type

This is a `forward-test` and a timeout-path council workflow validation.

The goal is to verify that a leader using the packaged `council-review` skill sees the expected timeout contract when reviewer tasks do not complete.

## Purpose

Validate that all of the following can be true at the same time:

- the leader can start a council run through the bundled skill CLI
- the leader can call `council wait` with a short timeout
- the command reports `woke == false` and `all_complete == false`
- reviewer task metadata remains visible for later follow-up

## Preconditions

- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- bundled CLI executable exists at `COUNCIL_SKILL_PATH/assets/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- use an empty temporary directory `TMPDIR`
- before launching role agents, initialize `TMPDIR/coord.db` through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`

## Agent Topology

- `leader`

## Inputs

### Leader Prompt

```text
Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_003 with a short review target, 2) immediately call council wait with a short timeout such as 1 second, 3) stop after reporting RUN_ID and the wait result you observed. Do not use ordinary chat to simulate reviewer output.
```

## Execution Parameters

- use the shared execution contract from [README.md](./README.md)
- override the `council wait` timeout with a short interval such as `1s` (`--timeout-seconds 1`)
- do not override the default cleanup policy

## Execution Steps

1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
2. Inject `skills/council-review/` into `leader`
3. Point the leader at the database path `TMPDIR/coord.db`
4. Launch the leader
5. Wait for the leader to finish
6. Independently run the validation commands from the main thread

## Validation Commands

```bash
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council wait --run council_skill_003 --timeout-seconds 1
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run council_skill_003
```

## Expected Outcomes

- the leader successfully starts `council_skill_003`
- `council wait` times out cleanly
- the wait response still includes three reviewer statuses
- the run remains non-terminal because reviewers have not completed

## Assertions

- `wait.data.woke == false`
- `wait.data.all_complete == false`
- `wait.data.reviewers` length is `3`
- `status.data.run.status` is not `done`

## Cleanup

- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
