docs: add orch skill gap-fill cases

2026-03-19 19:27:50 +08:00
parent b753102312
commit b6ba470190
7 changed files with 723 additions and 5 deletions
+3 -3
@@ -25,9 +25,9 @@ As of now:
- a reusable Codex skill package for `inbox` now exists under `skills/inbox/`, with a formal `SKILL.md`, `agents/openai.yaml`, and a bundled CLI binary asset
- reusable Codex skill packages for `orch` and `council-review` now exist under `skills/orch/` and `skills/council-review/`, both using bundled copies of the `orch` CLI binary asset
- an inbox skill forward-test plan directory now exists under `docs/tests/inbox-skill/`, with a shared execution template and multiple scenario cases
-- an orch skill forward-test plan directory now exists under `docs/tests/orch-skill/`, with a shared execution contract and initial leader-side workflow scenarios
-- a repo-local replay runner now exists at `scripts/run_orch_skill_forward_tests.sh`, and the five `docs/tests/orch-skill/` cases now include recorded example runs from a bundled-CLI replay captured on `2026-03-19`
-- the five `docs/tests/orch-skill/` cases now also include recorded real subagent-forward runs captured on `2026-03-19`, with spawned leader and worker agents using the packaged `skills/orch/` and `skills/inbox/` bundles
+- an orch skill forward-test plan directory now exists under `docs/tests/orch-skill/`, with a shared execution contract and eight leader-side workflow scenarios
+- a repo-local replay runner now exists at `scripts/run_orch_skill_forward_tests.sh`, and all eight `docs/tests/orch-skill/` cases now include recorded example runs from bundled-CLI replays captured on `2026-03-19`, including added coverage for dependency-gated ready sequencing, active task cancellation, and payload-only blocked answers
+- the original five `docs/tests/orch-skill/` cases also include recorded real subagent-forward runs captured on `2026-03-19`, with spawned leader and worker agents using the packaged `skills/orch/` and `skills/inbox/` bundles
- a council-review skill forward-test plan directory now exists under `docs/tests/council-review-skill/`, with a shared execution contract and nine council workflow scenarios covering end-to-end flow, unanimous-only defaults, timeout/before-tally errors, explicit minority reporting, invalid report filters, strict tally semantics, malformed reviewer JSON, and target-file inputs
- an execution-roadmap workflow now exists under `docs/roadmaps/active/` and `docs/roadmaps/archive/` for agent-level work traces and completion archives
- a repo-local `scripts/package_skill_clis.sh` packaging flow now builds bundled skill CLI assets for `inbox`, `orch`, and `council-review`
@@ -0,0 +1,66 @@
# Title
Add Missing Orch Skill Forward-Test Cases
## Status
- `completed`
## Owner
- Codex main agent
## Started At
- `2026-03-19`
## Goal
- Add three missing `docs/tests/orch-skill/` forward-test cases covering dependency-gated ready sequencing, direct task cancellation, and payload-only blocked answers.
- Extend the bundled-CLI replay runner so the new cases can be executed and recorded with concrete evidence.
## Scope
- update `docs/tests/orch-skill/README.md` with the new case index entries
- add three new case documents under `docs/tests/orch-skill/`
- extend `scripts/run_orch_skill_forward_tests.sh` to execute the new scenarios
- run the replay suite and record the new example-run evidence
- update `docs/implementation-roadmap.md`
## Checklist
- [x] Review the current orch skill docs, runner, and CLI coverage to identify the missing skill-side cases.
- [x] Add the active orch-skill gap-fill case documents and README updates.
- [x] Extend the orch-skill replay runner for the new scenarios.
- [x] Execute the replay coverage and capture recorded example-run evidence.
- [x] Update implementation progress docs and archive this roadmap.
## Files
- `docs/tests/orch-skill/README.md`
- `docs/tests/orch-skill/leader-dispatches-dependent-task-after-prerequisite-through-bundled-cli.md`
- `docs/tests/orch-skill/leader-cancels-active-task-through-bundled-cli.md`
- `docs/tests/orch-skill/leader-answers-blocked-task-with-payload-json-through-bundled-cli.md`
- `scripts/run_orch_skill_forward_tests.sh`
- `docs/implementation-roadmap.md`
- `docs/roadmaps/archive/orch-skill-gap-fill.md`
## Decisions
- Keep the new coverage in `orch-skill` rather than `docs/tests/orch/`, because the gap is leader-side skill usage, not raw CLI contract coverage.
- Use the direct bundled-CLI replay path for immediate recorded evidence, consistent with the existing orch-skill case documents.
## Blockers
- none
## Next Step
- rerun the direct replay suite when the orch skill bundle or any orch-skill case document changes, and add real subagent-forward evidence for the three new gap-fill cases when that coverage is needed.
## Completion Summary
- Added three new `docs/tests/orch-skill/` cases covering dependency-gated ready sequencing, direct active-task cancellation, and payload-only blocked answers.
- Extended `scripts/run_orch_skill_forward_tests.sh` so the replay runner now executes all eight documented orch-skill scenarios.
- Ran the full replay suite at `/tmp/orch-skill-forward-gap-fill` and recorded passing example-run evidence for the three new cases.
- Updated `docs/tests/orch-skill/README.md` and `docs/implementation-roadmap.md` to reflect the expanded orch-skill coverage and evidence status.
+7 -2
@@ -130,11 +130,11 @@ This runner executes the bundled `skills/orch/assets/orch` and `skills/inbox/ass
Use it to validate packaged CLI behavior and record concrete evidence quickly, but do not treat it as a full replacement for the real subagent-forward model described above.
-The case files in this directory now include recorded example runs captured through that direct replay path on `2026-03-19`.
+All eight case files in this directory now include recorded example runs captured through that direct replay path on `2026-03-19`.
## Real Subagent Forward Runs
-The five cases in this directory were also executed with real spawned role agents on `2026-03-19`.
+The original five cases in this directory were also executed with real spawned role agents on `2026-03-19`.
That run used injected project-local `skills/orch/` and `skills/inbox/` bundles with a narrow-context fallback (`fork_context: false`) after an earlier wider-context attempt proved unreliable for this repo.
@@ -142,6 +142,8 @@ The successful evidence root for those runs was `/tmp/orch-skill-subagents.J1XWg
Some longer cases used staged leader progression while keeping the same leader agent active across phases so the run still exercised real agent-driven `orch` control flow instead of a main-thread direct replay.
The three gap-fill cases added later on `2026-03-19` currently have direct replay evidence only and have not yet been rerun through the real subagent-forward path.
## Per-Case Template
Each case file should use this structure:
@@ -166,6 +168,9 @@ Each case file should use this structure:
| `leader-run-dispatch-reconcile-through-bundled-cli` | [leader-run-dispatch-reconcile-through-bundled-cli.md](./leader-run-dispatch-reconcile-through-bundled-cli.md) | validates that a leader can drive a complete `run -> task -> dispatch -> reconcile -> status` happy path through the packaged orch skill |
| `leader-blocked-answer-resume-through-bundled-cli` | [leader-blocked-answer-resume-through-bundled-cli.md](./leader-blocked-answer-resume-through-bundled-cli.md) | validates that a leader can observe a blocked task, answer it through `orch`, and reach final completion with a real worker |
| `strict-worktree-dispatch-to-cleanup-through-bundled-cli` | [strict-worktree-dispatch-to-cleanup-through-bundled-cli.md](./strict-worktree-dispatch-to-cleanup-through-bundled-cli.md) | validates that the skill can drive strict worktree allocation, reconcile completion, and cleanup through the bundled orch CLI |
| `leader-dispatches-dependent-task-after-prerequisite-through-bundled-cli` | [leader-dispatches-dependent-task-after-prerequisite-through-bundled-cli.md](./leader-dispatches-dependent-task-after-prerequisite-through-bundled-cli.md) | validates that a leader can use `dep add` and `ready` to hold back dependent work until a prerequisite completes, then dispatch the newly ready task |
| `leader-cancels-active-task-through-bundled-cli` | [leader-cancels-active-task-through-bundled-cli.md](./leader-cancels-active-task-through-bundled-cli.md) | validates that a leader can cancel an already active task through the packaged orch skill without cancelling unrelated ready work |
| `leader-answers-blocked-task-with-payload-json-through-bundled-cli` | [leader-answers-blocked-task-with-payload-json-through-bundled-cli.md](./leader-answers-blocked-task-with-payload-json-through-bundled-cli.md) | validates that a leader can answer a blocked task with structured payload data only and still drive the run to completion |
| `leader-retries-failed-task-through-bundled-cli` | [leader-retries-failed-task-through-bundled-cli.md](./leader-retries-failed-task-through-bundled-cli.md) | validates that a leader can reconcile a failed attempt and create a successful retry through the packaged orch skill |
| `leader-reassigns-blocked-task-through-bundled-cli` | [leader-reassigns-blocked-task-through-bundled-cli.md](./leader-reassigns-blocked-task-through-bundled-cli.md) | validates that a leader can reassign a blocked task from one worker to another and close the run through the packaged orch skill |
@@ -0,0 +1,107 @@
# Case: `leader-answers-blocked-task-with-payload-json-through-bundled-cli`
## Test Type
This is a `forward-test` and a structured-answer skill validation.
The goal is to verify that a leader using the packaged `orch` skill can answer a blocked task with pure `--payload-json`, allowing the worker to resume without relying on a freeform answer body.
## Purpose
Validate that all of the following can be true at the same time:
- the leader can use `wait`, `blocked`, `answer --payload-json`, `reconcile`, and `status` through the bundled orch skill
- a worker can post a blocked question through the bundled inbox skill
- the answer reaches the active thread as structured payload data
- the worker resumes after reading that payload and completes the task
- the final run reaches `done`
## Preconditions
- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- before launching role agents, initialize `TMPDIR/coord.db` by running `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
## Agent Topology
- `leader`
- `worker-a`
## Inputs
### Leader Prompt
```text
Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_payload_answer_001, 2) add and dispatch one task T1 to worker-a, 3) wait until the task becomes blocked, 4) inspect blocked tasks, 5) answer the blocked question using payload-json only with decision=stdout, source=leader, and format=structured, 6) wait until the task completes, 7) reconcile and inspect final status, 8) stop after reporting RUN_ID and THREAD_ID. Do not use ordinary chat to coordinate with the worker.
```
### Worker Prompt
```text
Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the assigned task, 2) send a blocked update asking for a structured logging decision, 3) wait for a reply, 4) confirm the reply payload tells you to use stdout, 5) finish the task with done, 6) stop after reporting the THREAD_ID you handled. Do not use ordinary chat to coordinate with the leader.
```
## Execution Parameters
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
## Execution Steps
1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
2. Inject `skills/orch/` into `leader`
3. Inject `skills/inbox/` into `worker-a`
4. Point both agents at the same database path `TMPDIR/coord.db`
5. Launch `leader` and `worker-a` in parallel
6. Wait for both agents to finish
7. Resolve `THREAD_ID` from the agent outputs
8. Independently run the validation commands from the main thread
## Validation Commands
```bash
ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_payload_answer_001
INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
```
## Expected Outcomes
- the leader successfully observes a blocked event and inspects the blocked queue
- the leader successfully emits one payload-only answer through `orch`
- `worker-a` receives that answer through inbox history and sees `payload_json.decision == "stdout"`
- `worker-a` completes the task after the structured answer arrives
- the final run state is `done`
## Assertions
- `status.data.run.status == "done"`
- `status.data.tasks[0].status == "done"`
- `show.data.messages[*].kind` includes `question`, `answer`, and `result`
- one `question` message contains `payload_json.question == "Use stdout or stderr for structured logs?"`
- one `answer` message contains `payload_json.decision == "stdout"`
- one `answer` message contains `payload_json.source == "leader"`
- one `answer` message contains `payload_json.format == "structured"`
- the final thread status is `done`
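
The three payload fields asserted above can be assembled into one JSON object before being handed to `orch answer --payload-json`. The sketch below is a minimal illustration, not the runner's code: `build_payload` is a hypothetical helper, it assumes the values contain no quotes or backslashes (it does no JSON escaping), and any other flags `orch answer` requires are left out because this case document does not list them.

```bash
# Hypothetical helper: assemble the payload-only answer as a JSON string.
# Assumption: values are plain words with no quotes or backslashes.
build_payload() {
  printf '{"decision":"%s","source":"%s","format":"%s"}\n' "$1" "$2" "$3"
}

payload=$(build_payload stdout leader structured)
printf '%s\n' "$payload"
# The leader would then pass it along, for example:
#   orch answer --payload-json "$payload"   (remaining flags omitted here)
```

Keeping the payload in a single shell variable makes it easy to log the exact bytes that were sent and compare them with the `answer` message returned by `inbox show`.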
## Cleanup
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
## Recorded Example Run
- recorded on: `2026-03-19`
- execution mode: `direct_cli_replay` via `scripts/run_orch_skill_forward_tests.sh`
- result: `pass`
- observed run id: `run_blog_skill_payload_answer_001`
- observed thread id: `thr_735bde0f91794174b2b85fbe89e80581`
- evidence summary:
  - `orch wait --for task_blocked` woke after the worker question, and `orch blocked` listed task `T1` as the active blocked task
  - `orch answer --payload-json '{"decision":"stdout","source":"leader","format":"structured"}'` appended an `answer` message with those exact payload fields and an empty body
  - `inbox wait-reply` woke on that structured answer and exposed `payload_json.decision == "stdout"`
  - final `orch status --run run_blog_skill_payload_answer_001 --json` returned `run.status == "done"` and `tasks[0].status == "done"`
  - final `inbox show --thread thr_735bde0f91794174b2b85fbe89e80581 --json` contained the blocked `question`, the structured `answer`, and the terminal `result`
- note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents
@@ -0,0 +1,105 @@
# Case: `leader-cancels-active-task-through-bundled-cli`
## Test Type
This is a `forward-test` and a direct task-cancel skill validation.
The goal is to verify that a leader using the packaged `orch` skill can cancel an already active task attempt without cancelling unrelated ready work in the same run.
## Purpose
Validate that all of the following can be true at the same time:
- the leader can use `dispatch`, `cancel`, `ready`, and `status` through the bundled orch skill
- `worker-a` can claim the original thread and report active progress through the bundled inbox skill
- the leader can cancel that active task through `orch cancel --task`
- the original thread reaches `cancelled`
- another task in the same run remains actionable instead of being implicitly cancelled
## Preconditions
- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- before launching role agents, initialize `TMPDIR/coord.db` by running `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
## Agent Topology
- `leader`
- `worker-a`
## Inputs
### Leader Prompt
```text
Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_cancel_001, 2) add task T1 for worker-a and a second task T2 that should remain untouched, 3) dispatch T1, 4) wait until worker-a has claimed it or marked it in progress, 5) cancel T1 with a clear reason through orch, 6) inspect ready work and final run status, 7) stop after reporting THREAD_ID_1. Do not use ordinary chat to coordinate with the worker.
```
### Worker Prompt
```text
Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the assigned thread, 2) send one in_progress update, 3) stop after reporting THREAD_ID_1 and that the task became active. Do not use ordinary chat to coordinate with the leader.
```
## Execution Parameters
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
## Execution Steps
1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
2. Inject `skills/orch/` into `leader`
3. Inject `skills/inbox/` into `worker-a`
4. Point both agents at the same database path `TMPDIR/coord.db`
5. Launch `leader` and `worker-a` in parallel
6. Wait for both agents to finish
7. Resolve `THREAD_ID_1` from the agent outputs
8. Independently run the validation commands from the main thread
## Validation Commands
```bash
ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_cancel_001
ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json ready --run run_blog_skill_cancel_001
INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_1
```
## Expected Outcomes
- `worker-a` successfully claims the original thread and reports `in_progress`
- the leader successfully cancels `T1` through `orch cancel --task`
- the original thread reaches `cancelled`
- the untouched task `T2` remains available in the ready queue
- the run remains open rather than collapsing into a fully cancelled run
## Assertions
- `status.data.tasks` contains `T1` with status `cancelled`
- `status.data.tasks` contains `T2` with status `ready`
- `status.data.run.status == "ready"`
- `ready.data.tasks` contains only `T2`
- `show.data.thread.status == "cancelled"`
- the thread history preserves the worker `progress` message before the cancel
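
The isolation property these assertions describe (cancelling `T1` must not touch `T2`) can be pictured with a toy model. This is only a sketch under stated assumptions: tasks are plain `id:status` strings, `cancel_task` and `ready_ids` are hypothetical helpers, and nothing here reflects how `orch` stores or updates state.

```bash
# Toy model only: each task is an "id:status" string.

# Mark the target task as cancelled and leave every other task unchanged.
cancel_task() {
  target=$1
  shift
  result=""
  for entry in "$@"; do
    if [ "${entry%%:*}" = "$target" ]; then
      entry="$target:cancelled"
    fi
    result="${result:+$result }$entry"
  done
  printf '%s\n' "$result"
}

# Print the ids of tasks whose status is "ready".
ready_ids() {
  ids=""
  for entry in "$@"; do
    if [ "${entry#*:}" = "ready" ]; then
      ids="${ids:+$ids }${entry%%:*}"
    fi
  done
  printf '%s\n' "$ids"
}

after=$(cancel_task T1 T1:in_progress T2:ready)
printf '%s\n' "$after"   # prints: T1:cancelled T2:ready
ready_ids $after         # prints: T2
```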
## Cleanup
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
## Recorded Example Run
- recorded on: `2026-03-19`
- execution mode: `direct_cli_replay` via `scripts/run_orch_skill_forward_tests.sh`
- result: `pass`
- observed run id: `run_blog_skill_cancel_001`
- observed thread id: `thr_175e00bca76549ea8529cb4c92d99fd4`
- evidence summary:
  - final `orch status --run run_blog_skill_cancel_001 --json` returned `run.status == "ready"` with task counts `cancelled: 1` and `ready: 1`
  - that same `status` output showed `T1.status == "cancelled"` while `T2.status == "ready"`
  - final `orch ready --run run_blog_skill_cancel_001 --json` returned only `T2`, confirming the untouched task remained dispatchable
  - final `inbox show --thread thr_175e00bca76549ea8529cb4c92d99fd4 --json` returned `thread.status == "cancelled"` and preserved the worker `progress` message before the cancel
- note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents
@@ -0,0 +1,115 @@
# Case: `leader-dispatches-dependent-task-after-prerequisite-through-bundled-cli`
## Test Type
This is a `forward-test` and a dependency-gated ready-queue skill validation.
The goal is to verify that a leader using the packaged `orch` skill can create a dependency edge, observe the correct `ready` set before and after prerequisite completion, and dispatch the dependent task only after it becomes eligible.
## Purpose
Validate that all of the following can be true at the same time:
- the leader can use `dep add`, `ready`, `dispatch`, `wait`, `reconcile`, and `status` through the bundled orch skill
- `worker-a` can complete the prerequisite task through the bundled inbox skill
- the dependent task stays out of the initial `ready` queue
- the dependent task appears in `ready` only after the prerequisite reaches `done`
- the leader can dispatch that newly ready dependent task to `worker-b` and close the run
## Preconditions
- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- before launching role agents, initialize `TMPDIR/coord.db` by running `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
## Agent Topology
- `leader`
- `worker-a`
- `worker-b`
## Inputs
### Leader Prompt
```text
Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_deps_001, 2) add prerequisite task T1 for worker-a and dependent task T2 for worker-b, 3) make T2 depend on T1, 4) inspect ready work and confirm only T1 is dispatchable at first, 5) dispatch T1, 6) wait until T1 completes, 7) reconcile and inspect ready work again, 8) dispatch T2 only after it becomes ready, 9) wait until T2 completes, 10) reconcile and inspect final status, 11) stop after reporting THREAD_ID_1 and THREAD_ID_2. Do not use ordinary chat to coordinate with the workers.
```
### Worker A Prompt
```text
Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the prerequisite thread assigned to worker-a, 2) send one in_progress update, 3) finish it with done, 4) stop after reporting THREAD_ID_1. Do not use ordinary chat to coordinate with the leader or worker-b.
```
### Worker B Prompt
```text
Use $inbox at INBOX_SKILL_PATH to act as worker-b on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) wait until dependent work assigned to worker-b appears, 2) fetch and claim that thread, 3) finish it with done, 4) stop after reporting THREAD_ID_2. Do not use ordinary chat to coordinate with the leader or worker-a.
```
## Execution Parameters
- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy
## Execution Steps
1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
2. Inject `skills/orch/` into `leader`
3. Inject `skills/inbox/` into `worker-a` and `worker-b`
4. Point all agents at the same database path `TMPDIR/coord.db`
5. Launch `leader`, `worker-a`, and `worker-b` in parallel
6. Wait for all agents to finish
7. Resolve `THREAD_ID_1` and `THREAD_ID_2` from the agent outputs
8. Independently run the validation commands from the main thread
## Validation Commands
```bash
ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_deps_001
INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_1
INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_2
```
## Expected Outcomes
- the leader initially sees only `T1` in the `ready` output
- `worker-a` completes the prerequisite thread for `T1`
- after reconcile, the leader sees `T2` become ready
- `worker-b` receives a distinct thread for `T2` and completes it
- the final run reaches `done`
## Assertions
- the initial `ready` output contains `T1` and does not contain `T2`
- the post-reconcile `ready` output contains `T2`
- `THREAD_ID_1 != THREAD_ID_2`
- `status.data.run.status == "done"`
- `status.data.tasks` contains `T1` and `T2`, both with status `done`
- `show.data.thread.status == "done"` for `THREAD_ID_1`
- `show.data.thread.status == "done"` for `THREAD_ID_2`
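
The gating rule behind the two `ready` assertions can be sketched as a small toy model: a task is ready only when it has not started and its dependency, if any, is `done`. The `id:status:dependency` encoding, the placeholder status `todo`, and the helper names are assumptions of this sketch, not the `orch` data model.

```bash
# Toy model only: entries are "id:status:dependency"; the dependency may be empty.
# The status name "todo" is a placeholder of this sketch, not an orch state.

# Print the status recorded for task $1 in the space-separated list $2.
task_status() {
  for entry in $2; do
    if [ "${entry%%:*}" = "$1" ]; then
      rest=${entry#*:}
      printf '%s\n' "${rest%%:*}"
      return 0
    fi
  done
  return 1
}

# Print the ids of tasks that are not started and whose dependency, if any, is done.
ready_tasks() {
  tasks=$1
  ready=""
  for entry in $tasks; do
    id=${entry%%:*}
    rest=${entry#*:}
    status=${rest%%:*}
    dep=${rest#*:}
    [ "$status" = "todo" ] || continue
    if [ -z "$dep" ] || [ "$(task_status "$dep" "$tasks")" = "done" ]; then
      ready="${ready:+$ready }$id"
    fi
  done
  printf '%s\n' "$ready"
}

ready_tasks "T1:todo: T2:todo:T1"   # prints: T1
ready_tasks "T1:done: T2:todo:T1"   # prints: T2
```

The two calls mirror the before and after snapshots in this case: only `T1` is ready at first, and only `T2` is ready once `T1` is done.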
## Cleanup
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
## Recorded Example Run
- recorded on: `2026-03-19`
- execution mode: `direct_cli_replay` via `scripts/run_orch_skill_forward_tests.sh`
- result: `pass`
- observed run id: `run_blog_skill_deps_001`
- observed first thread id: `thr_7f57b577e5ce4cc094341e7d2eae4570`
- observed second thread id: `thr_5dbc81f2fe234b6dbf0c57a176e13acf`
- evidence summary:
  - the initial `ready` output returned only `T1`, confirming that dependent task `T2` stayed gated before prerequisite completion
  - after `worker-a` completed `T1` and the leader ran `reconcile`, the next `ready` output returned only `T2`
  - final `orch status --run run_blog_skill_deps_001 --json` returned `run.status == "done"` with both tasks `T1` and `T2` in state `done`
  - final `inbox show` on both thread ids returned terminal thread state `done`
  - the replay also observed `orch wait --for task_done` wake on the prerequisite completion before the dependent dispatch
- note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents
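
The replay starts the `task_done` waiter before the worker finishes, so the waiter is already listening when the completion lands. A minimal sketch of that ordering follows, with a marker file standing in for the task event. `wait_for_event` and `replay_wait_then_event` are hypothetical names, and nothing here calls the real CLIs.

```bash
# Hypothetical stand-in for a blocking wait: poll until a marker file exists.
wait_for_event() {
  marker=$1
  tries=$2
  while [ "$tries" -gt 0 ]; do
    if [ -e "$marker" ]; then
      echo "woke"
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "timeout"
  return 1
}

replay_wait_then_event() {
  dir=$(mktemp -d)
  # 1) Start the waiter in the background before the event exists.
  wait_for_event "$dir/task_done" 5 > "$dir/wait.out" &
  waiter_pid=$!
  # 2) Produce the event (stands in for the worker reporting completion).
  : > "$dir/task_done"
  # 3) Join the waiter and surface what it observed.
  if wait "$waiter_pid"; then
    cat "$dir/wait.out"
  else
    echo "waiter failed"
  fi
  rm -rf "$dir"
}

replay_wait_then_event   # prints: woke
```

Capturing the waiter's output in a file and joining it with `wait` keeps the check deterministic: the replay fails loudly on a timeout instead of silently racing past it.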
+320
@@ -448,6 +448,320 @@ run_case_strict_worktree_cleanup() {
    "Direct CLI replay of strict worktree dispatch, completion, and cleanup."
}

run_case_dependency_ready_dispatch() {
  local case_slug="leader-dispatches-dependent-task-after-prerequisite-through-bundled-cli"
  local run_id="run_blog_skill_deps_001"
  local case_dir
  case_dir="$(init_case_dir "${case_slug}")"
  local db_path="${case_dir}/coord.db"
  local started_at
  started_at="$(date +%s)"

  init_db "${db_path}"
  run_json "${case_dir}/run.json" \
    "${ORCH_BIN}" --db "${db_path}" --json run init \
    --run "${run_id}" --goal "Validate dependency-gated dispatch" \
    --summary "Exercise dep add, ready, and staged dispatch"
  run_json "${case_dir}/task-1.json" \
    "${ORCH_BIN}" --db "${db_path}" --json task add \
    --run "${run_id}" --task T1 --title "Implement backend" \
    --summary "Prerequisite task" --default-to worker-a
  run_json "${case_dir}/task-2.json" \
    "${ORCH_BIN}" --db "${db_path}" --json task add \
    --run "${run_id}" --task T2 --title "Implement frontend" \
    --summary "Dependent task" --default-to worker-b
  run_json "${case_dir}/dep.json" \
    "${ORCH_BIN}" --db "${db_path}" --json dep add \
    --run "${run_id}" --task T2 --depends-on T1
  run_json "${case_dir}/ready-initial.json" \
    "${ORCH_BIN}" --db "${db_path}" --json ready --run "${run_id}"
  run_json "${case_dir}/dispatch-1.json" \
    "${ORCH_BIN}" --db "${db_path}" --json dispatch \
    --run "${run_id}" --task T1 --to worker-a \
    --body "Complete the prerequisite task first."
  local thread_id_1
  thread_id_1="$(json_get "${case_dir}/dispatch-1.json" '.data.attempt.thread_id')"

  local wait_done_pid
  start_wait "${case_dir}/wait-task-done-1.json" \
    "${ORCH_BIN}" --db "${db_path}" --json wait \
    --run "${run_id}" --for task_done --timeout-seconds 15
  wait_done_pid="${LAST_BG_PID}"
  sleep 0.2

  run_json "${case_dir}/claim-1.json" \
    "${INBOX_BIN}" --db "${db_path}" --json claim \
    --agent worker-a --thread "${thread_id_1}"
  run_json "${case_dir}/progress-1.json" \
    "${INBOX_BIN}" --db "${db_path}" --json update \
    --agent worker-a --thread "${thread_id_1}" \
    --status in_progress --summary "Prerequisite started"
  run_json "${case_dir}/done-1.json" \
    "${INBOX_BIN}" --db "${db_path}" --json done \
    --agent worker-a --thread "${thread_id_1}" \
    --summary "Prerequisite complete" \
    --body "T1 completed so T2 can be dispatched."
  wait_for_pid "${wait_done_pid}" "${case_slug}: wait first task_done"

  run_json "${case_dir}/reconcile-1.json" \
    "${ORCH_BIN}" --db "${db_path}" --json reconcile --run "${run_id}"
  run_json "${case_dir}/ready-after-prerequisite.json" \
    "${ORCH_BIN}" --db "${db_path}" --json ready --run "${run_id}"
  run_json "${case_dir}/dispatch-2.json" \
    "${ORCH_BIN}" --db "${db_path}" --json dispatch \
    --run "${run_id}" --task T2 --to worker-b \
    --body "Dependent task is now ready after T1."
  local thread_id_2
  thread_id_2="$(json_get "${case_dir}/dispatch-2.json" '.data.attempt.thread_id')"

  run_json "${case_dir}/claim-2.json" \
    "${INBOX_BIN}" --db "${db_path}" --json claim \
    --agent worker-b --thread "${thread_id_2}"
  run_json "${case_dir}/done-2.json" \
    "${INBOX_BIN}" --db "${db_path}" --json done \
    --agent worker-b --thread "${thread_id_2}" \
    --summary "Dependent task complete" \
    --body "T2 completed after the dependency cleared."

  run_json "${case_dir}/reconcile-2.json" \
    "${ORCH_BIN}" --db "${db_path}" --json reconcile --run "${run_id}"
  run_json "${case_dir}/status.json" \
    "${ORCH_BIN}" --db "${db_path}" --json status --run "${run_id}"
  run_json "${case_dir}/show-1.json" \
    "${INBOX_BIN}" --db "${db_path}" --json show --thread "${thread_id_1}"
  run_json "${case_dir}/show-2.json" \
    "${INBOX_BIN}" --db "${db_path}" --json show --thread "${thread_id_2}"

  json_check "${case_dir}/dep.json" '.data.dependency.task_id == "T2" and .data.dependency.depends_on_task_id == "T1"' "dep add stores T2 -> T1"
  json_check "${case_dir}/ready-initial.json" '.data.tasks | length == 1 and .[0].task_id == "T1"' "initial ready lists only prerequisite"
  json_check "${case_dir}/wait-task-done-1.json" '.data.woke == true and (.data.events | length) >= 1 and .data.events[0].type == "task_done"' "wait woke on prerequisite completion"
  json_check "${case_dir}/ready-after-prerequisite.json" '.data.tasks | length == 1 and .[0].task_id == "T2"' "dependent task becomes ready after reconcile"
  json_check "${case_dir}/status.json" '.data.run.status == "done" and ([.data.tasks[] | select(.task_id == "T1")][0].status == "done") and ([.data.tasks[] | select(.task_id == "T2")][0].status == "done")' "dependency-gated run completes with both tasks done"
  json_check "${case_dir}/show-1.json" '.data.thread.status == "done"' "prerequisite thread done"
  json_check "${case_dir}/show-2.json" '.data.thread.status == "done"' "dependent thread done"
  if [ "${thread_id_1}" = "${thread_id_2}" ]; then
    printf 'FAIL: dependency flow reused thread ID %s\n' "${thread_id_1}" >&2
    exit 1
  fi
  printf 'PASS: dependency flow created distinct thread IDs\n'

  local duration_seconds
  duration_seconds="$(( $(date +%s) - started_at ))"
  write_result_json \
    "${case_dir}" "${case_slug}" "${db_path}" "${run_id}" pass "${duration_seconds}" \
    "$(join_json_array "${thread_id_1}" "${thread_id_2}")" \
    "$(join_json_array)" \
    "Direct CLI replay of dependency-gated ready sequencing from prerequisite completion to dependent dispatch."
}

run_case_cancel_active_task() {
  local case_slug="leader-cancels-active-task-through-bundled-cli"
  local run_id="run_blog_skill_cancel_001"
  local case_dir
  case_dir="$(init_case_dir "${case_slug}")"
  local db_path="${case_dir}/coord.db"
  local started_at
  started_at="$(date +%s)"

  init_db "${db_path}"
  run_json "${case_dir}/run.json" \
    "${ORCH_BIN}" --db "${db_path}" --json run init \
    --run "${run_id}" --goal "Validate active task cancellation" \
    --summary "Exercise direct orch cancel on an active attempt"
  run_json "${case_dir}/task-1.json" \
    "${ORCH_BIN}" --db "${db_path}" --json task add \
    --run "${run_id}" --task T1 --title "Implement backend" \
    --summary "Task that will be cancelled mid-flight" --default-to worker-a
  run_json "${case_dir}/task-2.json" \
    "${ORCH_BIN}" --db "${db_path}" --json task add \
    --run "${run_id}" --task T2 --title "Implement frontend" \
    --summary "Task that should remain ready" --default-to worker-b
  run_json "${case_dir}/dispatch.json" \
    "${ORCH_BIN}" --db "${db_path}" --json dispatch \
    --run "${run_id}" --task T1 --to worker-a \
    --body "Start work so the leader can cancel an active task."
  local thread_id
  thread_id="$(json_get "${case_dir}/dispatch.json" '.data.attempt.thread_id')"

  run_json "${case_dir}/claim.json" \
    "${INBOX_BIN}" --db "${db_path}" --json claim \
    --agent worker-a --thread "${thread_id}"
  run_json "${case_dir}/progress.json" \
    "${INBOX_BIN}" --db "${db_path}" --json update \
    --agent worker-a --thread "${thread_id}" \
    --status in_progress --summary "Active work in progress"
  run_json "${case_dir}/cancel.json" \
    "${ORCH_BIN}" --db "${db_path}" --json cancel \
    --run "${run_id}" --task T1 \
    --reason "Task superseded by dependency review."

  run_json "${case_dir}/status.json" \
    "${ORCH_BIN}" --db "${db_path}" --json status --run "${run_id}"
  run_json "${case_dir}/ready.json" \
    "${ORCH_BIN}" --db "${db_path}" --json ready --run "${run_id}"
  run_json "${case_dir}/show.json" \
    "${INBOX_BIN}" --db "${db_path}" --json show --thread "${thread_id}"

  json_check "${case_dir}/cancel.json" '.data.cancelled_tasks | length == 1 and .[0].task_id == "T1" and .[0].status == "cancelled"' "cancel returns cancelled T1"
  json_check "${case_dir}/status.json" '.data.run.status == "ready"' "run remains ready after cancelling one active task"
  json_check "${case_dir}/status.json" '([.data.tasks[] | select(.task_id == "T1")][0].status == "cancelled") and ([.data.tasks[] | select(.task_id == "T2")][0].status == "ready")' "status keeps T2 ready while T1 is cancelled"
  json_check "${case_dir}/ready.json" '.data.tasks | length == 1 and .[0].task_id == "T2"' "ready queue still exposes untouched T2"
  json_check "${case_dir}/show.json" '.data.thread.status == "cancelled"' "cancelled thread reaches terminal state"
  json_check "${case_dir}/show.json" 'any(.data.messages[]; .kind == "progress")' "cancelled thread preserves worker progress history"

  local duration_seconds
  duration_seconds="$(( $(date +%s) - started_at ))"
write_result_json \
"${case_dir}" "${case_slug}" "${db_path}" "${run_id}" pass "${duration_seconds}" \
"$(join_json_array "${thread_id}")" \
"$(join_json_array)" \
"Direct CLI replay of cancelling an active task while leaving unrelated ready work untouched."
}
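# Case: a worker blocks with a structured question and the leader answers with --payload-json only (no --body).
run_case_payload_only_answer() {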
local case_slug="leader-answers-blocked-task-with-payload-json-through-bundled-cli"
local run_id="run_blog_skill_payload_answer_001"
local case_dir
case_dir="$(init_case_dir "${case_slug}")"
local db_path="${case_dir}/coord.db"
local started_at
started_at="$(date +%s)"
init_db "${db_path}"
run_json "${case_dir}/run.json" \
"${ORCH_BIN}" --db "${db_path}" --json run init \
--run "${run_id}" --goal "Validate payload-only answers" \
--summary "Exercise answer --payload-json on a blocked task"
run_json "${case_dir}/task.json" \
"${ORCH_BIN}" --db "${db_path}" --json task add \
--run "${run_id}" --task T1 --title "Build frontend" \
--summary "Resume after structured leader decision" --default-to worker-a
run_json "${case_dir}/dispatch.json" \
"${ORCH_BIN}" --db "${db_path}" --json dispatch \
--run "${run_id}" --task T1 --to worker-a \
--body "Pause if a structured logging decision is needed."
local thread_id
thread_id="$(json_get "${case_dir}/dispatch.json" '.data.attempt.thread_id')"
local wait_blocked_pid
start_wait "${case_dir}/wait-task-blocked.json" \
"${ORCH_BIN}" --db "${db_path}" --json wait \
--run "${run_id}" --for task_blocked --timeout-seconds 15
wait_blocked_pid="${LAST_BG_PID}"
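# Give the background waiter a moment to start before triggering the event it waits for.
# The later waiters in this case follow the same pattern.
sleep 0.2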
run_json "${case_dir}/claim.json" \
"${INBOX_BIN}" --db "${db_path}" --json claim \
--agent worker-a --thread "${thread_id}"
run_json "${case_dir}/blocked.json" \
"${INBOX_BIN}" --db "${db_path}" --json update \
--agent worker-a --thread "${thread_id}" \
--status blocked --summary "Need structured logging decision" \
--payload-json '{"question":"Use stdout or stderr for structured logs?"}'
wait_for_pid "${wait_blocked_pid}" "${case_slug}: wait task_blocked"
local blocked_message_id
blocked_message_id="$(json_get "${case_dir}/blocked.json" '.data.message.message_id')"
run_json "${case_dir}/orch-blocked.json" \
"${ORCH_BIN}" --db "${db_path}" --json blocked --run "${run_id}"
local wait_reply_pid
start_wait "${case_dir}/wait-reply.json" \
"${INBOX_BIN}" --db "${db_path}" --agent worker-a --json wait-reply \
--thread "${thread_id}" --after-message "${blocked_message_id}" --timeout-seconds 15
wait_reply_pid="${LAST_BG_PID}"
sleep 0.2
run_json "${case_dir}/answer.json" \
"${ORCH_BIN}" --db "${db_path}" --json answer \
--run "${run_id}" --task T1 \
--payload-json '{"decision":"stdout","source":"leader","format":"structured"}'
wait_for_pid "${wait_reply_pid}" "${case_slug}: inbox wait-reply"
local wait_done_pid
start_wait "${case_dir}/wait-task-done.json" \
"${ORCH_BIN}" --db "${db_path}" --json wait \
--run "${run_id}" --for task_done --timeout-seconds 15
wait_done_pid="${LAST_BG_PID}"
sleep 0.2
run_json "${case_dir}/resume.json" \
"${INBOX_BIN}" --db "${db_path}" --json update \
--agent worker-a --thread "${thread_id}" \
--status in_progress --summary "Structured decision applied"
run_json "${case_dir}/done.json" \
"${INBOX_BIN}" --db "${db_path}" --json done \
--agent worker-a --thread "${thread_id}" \
--summary "Frontend complete" \
--body "Worker resumed after reading the structured leader decision."
wait_for_pid "${wait_done_pid}" "${case_slug}: wait task_done"
run_json "${case_dir}/reconcile.json" \
"${ORCH_BIN}" --db "${db_path}" --json reconcile --run "${run_id}"
run_json "${case_dir}/status.json" \
"${ORCH_BIN}" --db "${db_path}" --json status --run "${run_id}"
run_json "${case_dir}/show.json" \
"${INBOX_BIN}" --db "${db_path}" --json show --thread "${thread_id}"
json_check "${case_dir}/wait-task-blocked.json" '.data.woke == true and (.data.events | length) >= 1 and .data.events[0].type == "task_blocked"' "payload-answer flow woke on task_blocked"
json_check "${case_dir}/orch-blocked.json" '.data.blocked | length == 1 and .[0].task.task_id == "T1"' "blocked queue lists payload-answer task"
json_check "${case_dir}/answer.json" '.data.message.kind == "answer" and .data.message.payload_json.decision == "stdout" and .data.message.payload_json.source == "leader" and .data.message.payload_json.format == "structured"' "answer writes payload-json without body"
json_check "${case_dir}/wait-reply.json" '.data.woke == true and .data.message.kind == "answer" and .data.message.payload_json.decision == "stdout"' "wait-reply exposes structured answer payload"
json_check "${case_dir}/status.json" '.data.run.status == "done" and .data.tasks[0].status == "done"' "payload-answer flow run completes"
json_check "${case_dir}/show.json" 'any(.data.messages[]; .kind == "question" and .payload_json.question == "Use stdout or stderr for structured logs?")' "payload-answer question recorded"
json_check "${case_dir}/show.json" 'any(.data.messages[]; .kind == "answer" and .payload_json.decision == "stdout" and .payload_json.source == "leader" and .payload_json.format == "structured")' "payload-answer message preserved in history"
local duration_seconds
duration_seconds="$(( $(date +%s) - started_at ))"
write_result_json \
"${case_dir}" "${case_slug}" "${db_path}" "${run_id}" pass "${duration_seconds}" \
"$(join_json_array "${thread_id}")" \
"$(join_json_array)" \
"Direct CLI replay of answering a blocked task with payload-json only."
}
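# Illustrative sketch only; nothing in this runner calls it.
# The case above starts each waiter in the background (start_wait), records its
# PID (LAST_BG_PID), triggers the event, then joins the waiter (wait_for_pid).
# Those helpers are defined elsewhere in this script. The hypothetical sketch_*
# functions below assume a waiter is simply a background job whose stdout is
# captured to a file and whose exit status is checked when it is joined.
sketch_start_wait() {
  local output_path
  output_path="$1"
  shift
  # Run the remaining arguments as a background command, capturing stdout.
  "$@" >"${output_path}" &
  SKETCH_LAST_BG_PID="$!"
}
sketch_wait_for_pid() {
  local pid
  local label
  pid="$1"
  label="$2"
  # wait returns the exit status of the background job.
  if ! wait "${pid}"; then
    printf 'FAIL: %s\n' "${label}" >&2
    return 1
  fi
}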
run_case_retry() {
local case_slug="leader-retries-failed-task-through-bundled-cli"
local run_id="run_blog_skill_retry_001"
@@ -723,6 +1037,9 @@ main() {
run_case_happy_path
run_case_blocked_answer
run_case_strict_worktree_cleanup
run_case_dependency_ready_dispatch
run_case_cancel_active_task
run_case_payload_only_answer
run_case_retry
run_case_reassign
@@ -731,6 +1048,9 @@ main() {
print_case_summary "leader-run-dispatch-reconcile-through-bundled-cli" "${OUTPUT_ROOT}/leader-run-dispatch-reconcile-through-bundled-cli"
print_case_summary "leader-blocked-answer-resume-through-bundled-cli" "${OUTPUT_ROOT}/leader-blocked-answer-resume-through-bundled-cli"
print_case_summary "strict-worktree-dispatch-to-cleanup-through-bundled-cli" "${OUTPUT_ROOT}/strict-worktree-dispatch-to-cleanup-through-bundled-cli"
print_case_summary "leader-dispatches-dependent-task-after-prerequisite-through-bundled-cli" "${OUTPUT_ROOT}/leader-dispatches-dependent-task-after-prerequisite-through-bundled-cli"
print_case_summary "leader-cancels-active-task-through-bundled-cli" "${OUTPUT_ROOT}/leader-cancels-active-task-through-bundled-cli"
print_case_summary "leader-answers-blocked-task-with-payload-json-through-bundled-cli" "${OUTPUT_ROOT}/leader-answers-blocked-task-with-payload-json-through-bundled-cli"
print_case_summary "leader-retries-failed-task-through-bundled-cli" "${OUTPUT_ROOT}/leader-retries-failed-task-through-bundled-cli"
print_case_summary "leader-reassigns-blocked-task-through-bundled-cli" "${OUTPUT_ROOT}/leader-reassigns-blocked-task-through-bundled-cli"