Add orch skill forward test evidence

2026-03-19 18:36:31 +08:00
parent d17b5ebfbd
commit e9cbb15c2d
10 changed files with 1036 additions and 0 deletions
@@ -96,3 +96,34 @@ INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_I
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection

## Recorded Example Run

- recorded on: `2026-03-19`
- execution mode: `direct_cli_replay` via `scripts/run_orch_skill_forward_tests.sh`
- result: `pass`
- observed run id: `run_blog_skill_reassign_001`
- observed original thread id: `thr_0a61240412134de3b3d9ab219b6c8f19`
- observed reassigned thread id: `thr_12fbcf6d89d948548306198d013d77a5`
- evidence summary:
  - `orch wait --for task_blocked` woke after worker-a posted a blocked question with payload `Proceed with v1 scope?`
  - `orch reassign --run run_blog_skill_reassign_001 --task T1 --to worker-b --json` returned `attempt_no == 2` and assigned the new attempt to `worker-b`
  - final `inbox show` on the original thread returned `thread.status == "cancelled"` and preserved the blocked `question` message
  - final `inbox show` on the reassigned thread returned `thread.status == "done"`
  - final `orch status --run run_blog_skill_reassign_001 --json` returned `run.status == "done"` and `tasks[0].status == "done"`
- note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents
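
The `orch` checks in the evidence summary above can be expressed as a small validator. This is a minimal sketch, not part of the recorded run: the field names `attempt_no`, `run.status`, and `tasks[0].status` come from the summary, while the function name `check_reassign_and_status` and the sample payloads are hypothetical and show only the fields under test. The real `--json` output may carry more fields or a different envelope.

```python
import json


def check_reassign_and_status(reassign_out: str, status_out: str) -> list:
    """Return a list of failed expectations; an empty list means pass."""
    reassign = json.loads(reassign_out)
    status = json.loads(status_out)
    failures = []

    # `orch reassign ... --json` is expected to report the second attempt.
    if reassign.get("attempt_no") != 2:
        failures.append(f"attempt_no: expected 2, got {reassign.get('attempt_no')!r}")

    # `orch status ... --json` is expected to report a finished run.
    run_status = status.get("run", {}).get("status")
    if run_status != "done":
        failures.append(f"run.status: expected 'done', got {run_status!r}")

    # A missing or empty task list is treated as a failed expectation.
    tasks = status.get("tasks") or [{}]
    task_status = tasks[0].get("status")
    if task_status != "done":
        failures.append(f"tasks[0].status: expected 'done', got {task_status!r}")

    return failures


# Hypothetical payloads (assumption): trimmed to the fields named in the summary.
SAMPLE_REASSIGN = '{"attempt_no": 2}'
SAMPLE_STATUS = '{"run": {"status": "done"}, "tasks": [{"status": "done"}]}'

if __name__ == "__main__":
    print(check_reassign_and_status(SAMPLE_REASSIGN, SAMPLE_STATUS))
```

Returning a list of failures rather than raising on the first mismatch keeps every unmet expectation visible in one pass, which suits a retained `TMPDIR` that is inspected after a failed run.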

## Recorded Real Forward Run

- recorded on: `2026-03-19`
- execution mode: `real_subagent_forward_test`
- result: `pass`
- evidence root: `/tmp/orch-skill-subagents.J1XWgs/leader-reassigns-blocked-task-through-bundled-cli-phased`
- observed run id: `run_blog_skill_reassign_001`
- observed original thread id: `thr_7d43af5bc1f7467da98a39adb0de5808`
- observed reassigned thread id: `thr_eba253db8965423b855d0c784a29702c`
- evidence summary:
  - a single real leader agent using `skills/orch/` completed the case in three phases: initial `run/task/dispatch`, then `wait --for task_blocked` plus `reassign`, then final `wait --for task_done` plus `status`
  - a real `worker-a` agent using `skills/inbox/` claimed the original thread and posted the blocked question `Proceed with v1 scope?`
  - a real `worker-b` agent using `skills/inbox/` claimed the reassigned thread and completed it
  - main-thread validation confirmed that the original thread finished `cancelled`, the reassigned thread finished `done`, and the original blocked question remained visible in the thread history
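
The main-thread validation described above amounts to two thread-level checks, sketched here under stated assumptions. Only `thread.status` and the `question` message type appear in this document. The `messages`, `kind`, and `body` keys, the function name `check_thread`, and the sample payloads are hypothetical stand-ins for whatever `inbox show --json` actually emits.

```python
import json
from typing import Optional


def check_thread(show_out: str, expected_status: str,
                 required_question: Optional[str] = None) -> list:
    """Validate one `inbox show` payload; an empty list means pass."""
    payload = json.loads(show_out)
    failures = []

    actual = payload.get("thread", {}).get("status")
    if actual != expected_status:
        failures.append(f"thread.status: expected {expected_status!r}, got {actual!r}")

    if required_question is not None:
        # Assumed layout: a top-level `messages` list with `kind` and `body` keys.
        messages = payload.get("messages", [])
        preserved = any(
            m.get("kind") == "question" and m.get("body") == required_question
            for m in messages
        )
        if not preserved:
            failures.append(f"question {required_question!r} missing from thread history")

    return failures


# Hypothetical payloads (assumption): the real output shape may differ.
ORIGINAL_THREAD = json.dumps({
    "thread": {"status": "cancelled"},
    "messages": [{"kind": "question", "body": "Proceed with v1 scope?"}],
})
REASSIGNED_THREAD = json.dumps({"thread": {"status": "done"}, "messages": []})

if __name__ == "__main__":
    print(check_thread(ORIGINAL_THREAD, "cancelled", "Proceed with v1 scope?"))
    print(check_thread(REASSIGNED_THREAD, "done"))
```

Checking the question on the original thread only mirrors the recorded evidence: reassignment cancels that thread but should not erase its history.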