Add orch skill forward test evidence
@@ -89,3 +89,33 @@ INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_I
- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection

## Recorded Example Run

- recorded on: `2026-03-19`
- execution mode: `direct_cli_replay` via `scripts/run_orch_skill_forward_tests.sh`
- result: `pass`
- observed run id: `run_blog_skill_retry_001`
- observed first thread id: `thr_8dbf2d2e46d7469891cc1ef604da476f`
- observed second thread id: `thr_bdd86f4fe08e4ebfb39b8151ac41a3bb`
- evidence summary:
  - `orch wait --for task_failed` woke after the first worker-owned thread failed
  - `orch retry --run run_blog_skill_retry_001 --task T1 --json` returned `attempt_no == 2` with a distinct replacement thread for the same worker
  - final `inbox show` on the first thread returned `thread.status == "failed"`
  - final `inbox show` on the second thread returned `thread.status == "done"`
  - final `orch status --run run_blog_skill_retry_001 --json` returned `run.status == "done"` and `tasks[0].status == "done"`
- note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents
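
The evidence checks listed above can be expressed as a small script. This is a minimal sketch, not the project's test harness: `check_retry_evidence` is a hypothetical helper, and the JSON payloads are trimmed to the fields named in the evidence summary (`attempt_no`, `thread.status`, `run.status`, `tasks[0].status`). The real `--json` output may contain more fields or a different layout, and the thread ids are passed in separately because the field that carries them is not shown in this record.

```python
import json


def check_retry_evidence(first_id, second_id, retry_json,
                         first_show_json, second_show_json, status_json):
    """Return the names of failed checks; an empty list means all passed."""
    retry = json.loads(retry_json)
    first = json.loads(first_show_json)
    second = json.loads(second_show_json)
    status = json.loads(status_json)

    # Guard against a missing or empty task list before indexing tasks[0].
    first_task = (status.get("tasks") or [{}])[0]

    checks = {
        "retry reports attempt_no == 2": retry.get("attempt_no") == 2,
        "thread ids are distinct": first_id != second_id,
        "first thread is failed": first.get("thread", {}).get("status") == "failed",
        "second thread is done": second.get("thread", {}).get("status") == "done",
        "run is done": status.get("run", {}).get("status") == "done",
        "first task is done": first_task.get("status") == "done",
    }
    return [name for name, ok in checks.items() if not ok]


# Illustrative payloads (assumed shape), using the thread ids recorded above.
failures = check_retry_evidence(
    "thr_8dbf2d2e46d7469891cc1ef604da476f",
    "thr_bdd86f4fe08e4ebfb39b8151ac41a3bb",
    '{"attempt_no": 2}',
    '{"thread": {"status": "failed"}}',
    '{"thread": {"status": "done"}}',
    '{"run": {"status": "done"}, "tasks": [{"status": "done"}]}',
)
print(failures)  # prints []
```

Returning the list of failed check names, rather than raising on the first failure, keeps a failing run's report complete when `TMPDIR` and `coord.db` are retained for inspection.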

## Recorded Real Forward Run

- recorded on: `2026-03-19`
- execution mode: `real_subagent_forward_test`
- result: `pass`
- evidence root: `/tmp/orch-skill-subagents.J1XWgs/leader-retries-failed-task-through-bundled-cli-phased`
- observed run id: `run_blog_skill_retry_001`
- observed first thread id: `thr_1e22121642294b56aae351ddec5180d1`
- observed second thread id: `thr_f2ab1f1899964007b2447796204e1928`
- evidence summary:
  - the same real leader agent using `skills/orch/` completed the case in three phases: initial `run/task/dispatch`, then `wait --for task_failed` plus `retry`, then final `wait --for task_done` plus `status`
  - a real worker agent using `skills/inbox/` failed the first thread, polled for the retried pending thread, then claimed and completed the second thread
  - main-thread validation confirmed that the two thread ids were distinct, the first thread finished `failed`, the second thread finished `done`, and the run and the task both finished `done`
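
The worker's "poll for the retried pending thread" step can be sketched as a bounded polling loop. This is an illustration under stated assumptions, not the worker's actual implementation: `poll_for_pending_thread` and the `list_threads` callable are hypothetical, the `id` and `status` keys are an assumed record shape, and the `inbox` command that would back `list_threads` is not shown in this record.

```python
import time


def poll_for_pending_thread(list_threads, skip_id, timeout_s=30.0,
                            interval_s=0.5):
    """Poll until a pending thread other than skip_id appears.

    list_threads is a caller-supplied callable (hypothetical) that returns
    a list of dicts with "id" and "status" keys. Returns the matching
    thread, or None if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        for thread in list_threads():
            if thread["id"] != skip_id and thread["status"] == "pending":
                return thread
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)


# Fake listing: the replacement thread only shows up on the third poll.
snapshots = iter([
    [{"id": "thr_first", "status": "failed"}],
    [{"id": "thr_first", "status": "failed"}],
    [{"id": "thr_first", "status": "failed"},
     {"id": "thr_second", "status": "pending"}],
])
found = poll_for_pending_thread(lambda: next(snapshots),
                                skip_id="thr_first", interval_s=0.0)
print(found)
```

Skipping the failed thread's id explicitly means the loop cannot return the original thread, which mirrors the requirement that the retry produce a distinct replacement thread.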