Author orch Markdown test plan

2026-03-19 16:27:28 +08:00
parent b448d98e71
commit a20bec1cac
68 changed files with 2225 additions and 160 deletions
+5 -4
@@ -1,7 +1,8 @@
# Orch `council tally` Test Plan Index
## Status
## Case Files
No command case files are authored yet.
Use [../ROADMAP.md](../ROADMAP.md) for planned case slugs and document progress.
| Case Slug | File | Coverage Note |
| --- | --- | --- |
| `council-tally-groups-reviewer-findings-in-normal-mode` | [council-tally-groups-reviewer-findings-in-normal-mode.md](./council-tally-groups-reviewer-findings-in-normal-mode.md) | groups semantically similar reviewer outputs into majority and minority buckets in `normal` mode |
| `council-tally-keeps-distinct-proposals-in-strict-mode` | [council-tally-keeps-distinct-proposals-in-strict-mode.md](./council-tally-keeps-distinct-proposals-in-strict-mode.md) | preserves wording differences as separate minority groups in `strict` mode |
@@ -0,0 +1,67 @@
# Case: `council-tally-groups-reviewer-findings-in-normal-mode`
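## Purpose
Verify that `council tally --similarity normal` merges semantically similar reviewer proposals into one group and produces `majority` / `minority` buckets.
## Preconditions
- Use an isolated temporary directory, `TMPDIR`
- `sqlite3` is available locally for reading reviewer thread IDs from `task_attempts`
- Three reviewer output JSON files are prepared; the architecture and implementation proposals are semantically similar, and the risk proposal is independent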
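
The preconditions can be set up along these lines. This is a minimal sketch that assumes `TMPDIR` is meant to be a shell variable pointing at a fresh scratch directory; the `scratch` and `sqlite3_status` names are hypothetical and not part of the test plan.

```bash
# Create a fresh scratch directory and expose it as TMPDIR (assumed convention).
scratch=$(mktemp -d) || exit 1
TMPDIR=$scratch
export TMPDIR

# Remove the scratch directory when the shell exits.
trap 'rm -rf "$scratch"' EXIT

# Check that the sqlite3 CLI required by the case is on PATH.
if command -v sqlite3 >/dev/null 2>&1; then
  sqlite3_status=available
else
  sqlite3_status=missing
fi

echo "scratch dir: $TMPDIR (sqlite3: $sqlite3_status)"
```

Using a per-run directory keeps `coord.db` and the reviewer JSON files from one case out of the way of the next.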
## Input
```bash
# Write the three reviewer payloads. The architecture and implementation
# proposals differ only in wording; the risk proposal is unrelated.
cat <<'EOF' > "$TMPDIR/architecture-review.json"
{"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This lowers coupling.","confidence":"high","tags":["architecture","coupling"],"target_refs":{"repo_path":"."}}]}
EOF
cat <<'EOF' > "$TMPDIR/implementation-review.json"
{"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract API contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"medium","tags":["maintainability"],"target_refs":{"repo_path":"."}}]}
EOF
cat <<'EOF' > "$TMPDIR/risk-review.json"
{"reviewer_role":"risk-reviewer","findings":[{"title":"Add auth integration tests","summary":"Login regressions are hard to catch.","proposal":"Add integration tests for auth flows.","rationale":"This catches regressions earlier.","confidence":"high","tags":["risk","testing"],"target_refs":{"repo_path":"."}}]}
EOF

# Start the council run.
orch --db "$TMPDIR/coord.db" --json council start \
  --run council_blog_tally_001 \
  --target "Review the current blog architecture."

# Look up each reviewer's thread ID from the first attempt of tasks CR1..CR3.
THREAD_ID_CR1=$(sqlite3 "$TMPDIR/coord.db" "SELECT thread_id FROM task_attempts WHERE run_id = 'council_blog_tally_001' AND task_id = 'CR1' AND attempt_no = 1;")
THREAD_ID_CR2=$(sqlite3 "$TMPDIR/coord.db" "SELECT thread_id FROM task_attempts WHERE run_id = 'council_blog_tally_001' AND task_id = 'CR2' AND attempt_no = 1;")
THREAD_ID_CR3=$(sqlite3 "$TMPDIR/coord.db" "SELECT thread_id FROM task_attempts WHERE run_id = 'council_blog_tally_001' AND task_id = 'CR3' AND attempt_no = 1;")

# Each reviewer claims its thread, then completes it with its JSON payload as the message body.
inbox --db "$TMPDIR/coord.db" --json claim --agent architecture-reviewer --thread "$THREAD_ID_CR1"
inbox --db "$TMPDIR/coord.db" --json done --agent architecture-reviewer --thread "$THREAD_ID_CR1" --summary "Review complete" --body-file "$TMPDIR/architecture-review.json"
inbox --db "$TMPDIR/coord.db" --json claim --agent implementation-reviewer --thread "$THREAD_ID_CR2"
inbox --db "$TMPDIR/coord.db" --json done --agent implementation-reviewer --thread "$THREAD_ID_CR2" --summary "Review complete" --body-file "$TMPDIR/implementation-review.json"
inbox --db "$TMPDIR/coord.db" --json claim --agent risk-reviewer --thread "$THREAD_ID_CR3"
inbox --db "$TMPDIR/coord.db" --json done --agent risk-reviewer --thread "$THREAD_ID_CR3" --summary "Review complete" --body-file "$TMPDIR/risk-review.json"

# Tally the completed reviews in normal similarity mode.
orch --db "$TMPDIR/coord.db" --json council tally \
  --run council_blog_tally_001 \
  --similarity normal
```
## Expected Output
- `council tally` exits with code `0`
- `tally.data.similarity == "normal"`
- `tally.data.counts.majority == 1`
- `tally.data.counts.minority == 1`
- `tally.data.grouped_recommendations` has length `2`
- The first recommendation group has `bucket == "majority"`
- The first recommendation group has `support_count == 2`
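
The expectations above could be checked mechanically with `jq`. The sketch below runs against a hand-written sample payload, not real `orch` output: the field names come from the list above, but the overall JSON shape, the second group's `support_count`, and the `tally` and `check_status` variable names are assumptions. In a real run, `tally` would hold the captured stdout of `council tally`.

```bash
# Hypothetical sample payload shaped after the expectations listed above.
tally='{"data":{"similarity":"normal","counts":{"majority":1,"minority":1},"grouped_recommendations":[{"bucket":"majority","support_count":2},{"bucket":"minority","support_count":1}]}}'

if command -v jq >/dev/null 2>&1; then
  # jq -e exits non-zero when the final result is false or null.
  if printf '%s' "$tally" | jq -e '
      .data.similarity == "normal"
      and .data.counts.majority == 1
      and .data.counts.minority == 1
      and (.data.grouped_recommendations | length) == 2
      and .data.grouped_recommendations[0].bucket == "majority"
      and .data.grouped_recommendations[0].support_count == 2
    ' >/dev/null; then
    check_status=pass
  else
    check_status=fail
  fi
else
  check_status=skipped   # jq is not installed on this machine
fi

echo "normal-mode expectations: $check_status"
```

Combining every condition into one `jq -e` expression gives a single exit status to assert on, at the cost of not reporting which condition failed.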
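## Asserted Conclusions
- `normal` mode merges proposals by normalized intent rather than comparing them literally
- The tally output returns the grouped recommendation details, not just a summary of counts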
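
To make the difference between literal comparison and normalized comparison concrete, here is a small sketch using the two proposals from this case. The `normalize` function is hypothetical (lowercase, strip punctuation, drop English articles) and is not `orch`'s actual grouping logic, which this test plan does not describe.

```bash
# Hypothetical normalizer, for illustration only: lowercase the text,
# strip punctuation, and drop the articles "a", "an", and "the".
normalize() (
  lowered=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]')
  out=""
  for word in $lowered; do   # unquoted on purpose: split on whitespace
    case "$word" in
      a|an|the) ;;
      *) out="${out:+$out }$word" ;;
    esac
  done
  printf '%s\n' "$out"
)

arch='Move API contract definitions into a dedicated module.'
impl='Move API contract definitions into dedicated module'

# Literal comparison: the strings differ by an article and a full stop.
if [ "$arch" = "$impl" ]; then literal_match=yes; else literal_match=no; fi

# Normalized comparison: both reduce to the same word sequence.
if [ "$(normalize "$arch")" = "$(normalize "$impl")" ]; then
  normalized_match=yes
else
  normalized_match=no
fi

echo "literal=$literal_match normalized=$normalized_match"   # prints "literal=no normalized=yes"
```

The two proposals fail a byte-for-byte comparison yet collapse to the same string once normalized, which is the behavior this case expects from `normal` mode.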
## Additional Constraints
- The body of a reviewer `done` message must be structured JSON; invalid JSON or a missing `reviewer_role`/`proposal` makes tally return `invalid_input`
@@ -0,0 +1,61 @@
# Case: `council-tally-keeps-distinct-proposals-in-strict-mode`
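## Purpose
Verify that `council tally --similarity strict` does not merge proposals whose wording differs; even when they are semantically close, each is kept as a separate recommendation.
## Preconditions
- Use an isolated temporary directory, `TMPDIR`
- `sqlite3` is available locally for reading reviewer thread IDs from `task_attempts`
- Three reviewer output JSON files are prepared; the architecture and implementation proposals are semantically similar but worded differently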
## Input
```bash
# Write the three reviewer payloads. The architecture and implementation
# proposals are close in meaning but not identical in wording.
cat <<'EOF' > "$TMPDIR/architecture-review.json"
{"reviewer_role":"architecture-reviewer","findings":[{"title":"Split contracts","summary":"Transport contracts are mixed into UI code.","proposal":"Move API contract definitions into a dedicated module.","rationale":"This lowers coupling.","confidence":"high","tags":["architecture"],"target_refs":{"repo_path":"."}}]}
EOF
cat <<'EOF' > "$TMPDIR/implementation-review.json"
{"reviewer_role":"implementation-reviewer","findings":[{"title":"Extract API contracts","summary":"Shared transport shapes are duplicated.","proposal":"Move API contract definitions into dedicated module","rationale":"This reduces duplication.","confidence":"medium","tags":["maintainability"],"target_refs":{"repo_path":"."}}]}
EOF
cat <<'EOF' > "$TMPDIR/risk-review.json"
{"reviewer_role":"risk-reviewer","findings":[{"title":"Add auth integration tests","summary":"Login regressions are hard to catch.","proposal":"Add integration tests for auth flows.","rationale":"This catches regressions earlier.","confidence":"high","tags":["risk"],"target_refs":{"repo_path":"."}}]}
EOF

# Start the council run.
orch --db "$TMPDIR/coord.db" --json council start \
  --run council_blog_tally_002 \
  --target "Review the current blog architecture."

# Look up each reviewer's thread ID from the first attempt of tasks CR1..CR3.
THREAD_ID_CR1=$(sqlite3 "$TMPDIR/coord.db" "SELECT thread_id FROM task_attempts WHERE run_id = 'council_blog_tally_002' AND task_id = 'CR1' AND attempt_no = 1;")
THREAD_ID_CR2=$(sqlite3 "$TMPDIR/coord.db" "SELECT thread_id FROM task_attempts WHERE run_id = 'council_blog_tally_002' AND task_id = 'CR2' AND attempt_no = 1;")
THREAD_ID_CR3=$(sqlite3 "$TMPDIR/coord.db" "SELECT thread_id FROM task_attempts WHERE run_id = 'council_blog_tally_002' AND task_id = 'CR3' AND attempt_no = 1;")

# Each reviewer claims its thread, then completes it with its JSON payload as the message body.
inbox --db "$TMPDIR/coord.db" --json claim --agent architecture-reviewer --thread "$THREAD_ID_CR1"
inbox --db "$TMPDIR/coord.db" --json done --agent architecture-reviewer --thread "$THREAD_ID_CR1" --summary "Review complete" --body-file "$TMPDIR/architecture-review.json"
inbox --db "$TMPDIR/coord.db" --json claim --agent implementation-reviewer --thread "$THREAD_ID_CR2"
inbox --db "$TMPDIR/coord.db" --json done --agent implementation-reviewer --thread "$THREAD_ID_CR2" --summary "Review complete" --body-file "$TMPDIR/implementation-review.json"
inbox --db "$TMPDIR/coord.db" --json claim --agent risk-reviewer --thread "$THREAD_ID_CR3"
inbox --db "$TMPDIR/coord.db" --json done --agent risk-reviewer --thread "$THREAD_ID_CR3" --summary "Review complete" --body-file "$TMPDIR/risk-review.json"

# Tally the completed reviews in strict similarity mode.
orch --db "$TMPDIR/coord.db" --json council tally \
  --run council_blog_tally_002 \
  --similarity strict
```
## Expected Output
- `council tally` exits with code `0`
- `tally.data.similarity == "strict"`
- `tally.data.counts.minority == 3`
- `tally.data.grouped_recommendations` has length `3`
- All three recommendation groups should fall into `minority`
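## Asserted Conclusions
- `strict` mode aims to preserve literal differences between proposals rather than merge them loosely
- When no proposals are merged, each group's support count falls back to a single supporting reviewer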
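
The counts expected in this case can be reproduced with a short sketch that groups the three proposals by exact string. The bucket rule used here (a group is `majority` only when more than half of the reviewers back it) is an assumption that happens to match the expected outputs of both cases; the test plan does not state the actual rule, and the variable names are hypothetical.

```bash
# Proposals copied from the three reviewer payloads in this case.
p1='Move API contract definitions into a dedicated module.'
p2='Move API contract definitions into dedicated module'
p3='Add integration tests for auth flows.'
total=3

# Group by exact string: uniq -c prefixes each distinct line with its count.
grouped=$(printf '%s\n' "$p1" "$p2" "$p3" | sort | uniq -c)

majority=0
minority=0
while read -r support rest; do
  # Assumed rule: "majority" requires support from more than half of the reviewers.
  if [ $((support * 2)) -gt "$total" ]; then
    majority=$((majority + 1))
  else
    minority=$((minority + 1))
  fi
done <<EOF
$grouped
EOF

echo "majority=$majority minority=$minority"   # prints "majority=0 minority=3"
```

Because no two strings are identical, every group has a support count of 1 and none crosses the assumed majority threshold.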