# Case: `council-reviewer-output-invalid-json-fails-tally-through-bundled-cli`

## Test Type

This is a `forward-test` and a malformed-reviewer-output validation.

The goal is to verify that a leader using the packaged `council-review` skill reaches the stable tally-time `invalid_input` contract when one reviewer completes its inbox task with malformed council JSON.

## Purpose

Validate that all of the following can be true at the same time:

- the leader can start a real council run through the bundled council-review skill
- all three reviewer tasks can still reach terminal `done` state through the packaged inbox skill
- one reviewer can return malformed JSON in the result body
- the leader sees `council tally` fail with the expected invalid-input error instead of a silent partial tally
- malformed JSON serves as the most realistic representative of the reviewer-output validation layer, which also rejects a missing `reviewer_role` and role mismatches
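
The three rejection reasons named in the last bullet can be pictured as one layered check. The following is only an illustrative sketch, not the bundled CLI's implementation: the function name `validate_reviewer_output` and the wording of the two role-related reasons are hypothetical, and only the message `reviewer output must be valid JSON` comes from this case.

```python
import json
from typing import Optional


def validate_reviewer_output(body: str, expected_role: str) -> Optional[str]:
    """Return None when the body passes, otherwise a short rejection reason.

    Hypothetical helper for illustration; the real tally-time validation
    lives inside the bundled CLI and may differ.
    """
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        # Message text taken from this case's assertions.
        return "reviewer output must be valid JSON"
    if not isinstance(parsed, dict) or "reviewer_role" not in parsed:
        return "missing reviewer_role"  # hypothetical wording
    if parsed["reviewer_role"] != expected_role:
        return "reviewer_role mismatch"  # hypothetical wording
    return None
```

The syntax check runs first, which is why a truncated body fails before any role comparison is attempted.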

## Preconditions

- council-review skill path exists: `COUNCIL_SKILL_PATH=skills/council-review`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `COUNCIL_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- before launching role agents, initialize `TMPDIR/coord.db` by running `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`

## Agent Topology

- `leader`
- `architecture-reviewer`
- `implementation-reviewer`
- `risk-reviewer`

## Inputs

### Leader Prompt

```text
Use $council-review at COUNCIL_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) start council run council_skill_008 with a short architecture review prompt, 2) wait until all three reviewers complete, 3) attempt council tally with normal similarity, 4) stop after reporting RUN_ID, exit code, and the error payload you observed. Do not use ordinary chat to coordinate with the reviewers.
```

### Architecture Reviewer Prompt

```text
Use $inbox at INBOX_SKILL_PATH to act as architecture-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill.

Workflow:
1) fetch and claim your assigned council task
2) write TMPDIR/architecture-invalid.json containing exactly this invalid JSON body:
{"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module."}
3) complete the task with done using summary "Review complete" and --body-file TMPDIR/architecture-invalid.json
4) stop after reporting THREAD_ID and the body file path

Do not use ordinary chat to coordinate with the leader.
```
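
The body in step 2 is invalid because the `findings` array and the outer object are never closed. A quick way to confirm this before running the case is to parse it with a standard JSON parser. The sketch below uses Python's `json` module; the helper name `parses_as_json` is an illustrative choice and is not part of either skill.

```python
import json

# The exact body from the architecture reviewer prompt (closing "]}" is absent).
INVALID_BODY = (
    '{"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts",'
    '"summary":"Transport contracts are mixed into UI code.",'
    '"proposal":"Move API contract definitions into a dedicated module."}'
)


def parses_as_json(text: str) -> bool:
    """Return True if text is syntactically valid JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(parses_as_json(INVALID_BODY))         # body as written in the prompt
print(parses_as_json(INVALID_BODY + "]}"))  # same body with the closers restored
```

The body still carries a correct `reviewer_role`, so this case isolates the JSON-syntax failure from the role checks.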

### Implementation Reviewer Prompt

```text
Use $inbox at INBOX_SKILL_PATH to act as implementation-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract API contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"medium","tags":["maintainability"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
```

### Risk Reviewer Prompt

```text
Use $inbox at INBOX_SKILL_PATH to act as risk-reviewer on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim your assigned council task, 2) complete it with done using this exact JSON body: {"reviewer_role":"risk-reviewer","findings":[{"title":"Add auth integration tests","summary":"Login regressions are hard to catch.","proposal":"Add integration tests for auth flows.","rationale":"This catches regressions earlier.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}}]}, 3) stop after reporting THREAD_ID. Do not use ordinary chat to coordinate with the leader.
```

## Execution Parameters

- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy

## Execution Steps

1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
2. Inject `skills/council-review/` into `leader`
3. Inject `skills/inbox/` into the three reviewer agents
4. Point all agents at the same database path `TMPDIR/coord.db`
5. Launch `leader`, `architecture-reviewer`, `implementation-reviewer`, and `risk-reviewer` in parallel
6. Wait for all agents to finish
7. Independently run the validation commands from the main thread

## Validation Commands

```bash
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council wait --run council_skill_008 --timeout-seconds 2
COUNCIL_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json council tally --run council_skill_008 --similarity normal
```
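
`COUNCIL_SKILL_PATH` and `TMPDIR` in the commands above are placeholders, not shell variables. One possible way for a harness to expand them and capture the exit code is sketched below. The helper names `orch_command` and `run_orch` are hypothetical, and both output streams are captured because this case does not state which stream carries the error payload.

```python
import subprocess
from pathlib import Path


def orch_command(council_skill_path: str, tmpdir: str, *args: str) -> list[str]:
    """Build the argv for a `council` subcommand of the bundled orch CLI."""
    orch = Path(council_skill_path) / "assets" / "orch"
    db = Path(tmpdir) / "coord.db"
    return [str(orch), "--db", str(db), "--json", "council", *args]


def run_orch(command: list[str]) -> tuple[int, str, str]:
    """Run the command and return (exit code, stdout, stderr).

    check=False keeps a non-zero exit from raising, since the tally
    step is expected to fail in this case.
    """
    completed = subprocess.run(command, capture_output=True, text=True, check=False)
    return completed.returncode, completed.stdout, completed.stderr


wait_cmd = orch_command(
    "skills/council-review", "/tmp/example",  # example values for the placeholders
    "wait", "--run", "council_skill_008", "--timeout-seconds", "2",
)
tally_cmd = orch_command(
    "skills/council-review", "/tmp/example",
    "tally", "--run", "council_skill_008", "--similarity", "normal",
)
```

Passing `wait_cmd` and then `tally_cmd` to `run_orch` would reproduce the two validation commands in order.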

## Expected Outcomes

- all three reviewer tasks still reach terminal `done`
- `council wait` returns `all_complete == true`
- `council tally` exits with the stable invalid-input contract
- the error message indicates that reviewer output must be valid JSON

## Assertions

- `wait.data.all_complete == true`
- command exit code for `council tally` is `30`
- error code is `invalid_input`
- the error message mentions `reviewer output must be valid JSON`
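
The assertions above can be checked mechanically once the output of both validation commands has been captured. The sketch below is one hedged way to do so. The function name `check_case` is hypothetical, and because the exact shape of the error payload is not specified in this case, the error code and message are matched as substrings of the raw tally output rather than read from assumed JSON keys.

```python
import json

EXPECTED_TALLY_EXIT_CODE = 30


def check_case(wait_stdout: str, tally_exit_code: int, tally_output: str) -> list[str]:
    """Return a list of failed assertions; an empty list means the case passed."""
    failures = []

    # `wait.data.all_complete == true`
    wait = json.loads(wait_stdout)
    data = wait.get("data") if isinstance(wait, dict) else None
    if not isinstance(data, dict) or data.get("all_complete") is not True:
        failures.append("wait.data.all_complete is not true")

    # exit code for `council tally` is 30
    if tally_exit_code != EXPECTED_TALLY_EXIT_CODE:
        failures.append(f"tally exit code was {tally_exit_code}, expected 30")

    # error code and message, matched as substrings (payload shape is assumed unknown)
    if "invalid_input" not in tally_output:
        failures.append("error code invalid_input not found")
    if "reviewer output must be valid JSON" not in tally_output:
        failures.append("expected error message not found")

    return failures
```

Returning the full list of failures, rather than stopping at the first, makes a failed run easier to diagnose from retained evidence.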

## Cleanup

- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection

## Recorded Real Forward Run

- recorded on: `2026-03-19`
- execution mode: `real_subagent_forward_test`
- result: `pass`
- evidence root: `/tmp/council-reviewer-output-invalid-json-fails-tally-through-bundled-cli.narrow1.i6ZP98`
- observed run id: `council_skill_008`
- observed thread ids:
  - `architecture-reviewer`: `thr_350c43fdf8a449228b8611ce5114326d`
  - `implementation-reviewer`: `thr_db858b530cb044a7bceeaa417f1cea75`
  - `risk-reviewer`: `thr_1c93381b070c47c49e312039b8343655`
- evidence summary:
  - main-thread `council wait --run council_skill_008 --timeout-seconds 2 --json` returned `woke == true` and `all_complete == true`
  - main-thread `council tally --run council_skill_008 --similarity normal --json` exited with code `30`
  - the returned error payload was `invalid_input` with message `reviewer output must be valid JSON`
  - this run confirmed the negative path where reviewer tasks are all `done` but tally still fails on stored reviewer-output validation