# Case: `leader-retries-failed-task-through-bundled-cli`

## Test Type

This is a `forward-test` and a retry-path skill validation.

The goal is to verify that a leader using the packaged `orch` skill can reconcile a failed attempt, issue `retry`, and drive the task to success through a second attempt handled by a real worker.

## Purpose

Validate that all of the following can be true at the same time:

- the leader can use the bundled orch skill to dispatch an initial attempt
- a worker can fail the first attempt through inbox
- the leader can reconcile that failure and create a fresh retry attempt
- the worker can complete the retried attempt
- the final run reaches `done` and the two attempts map to different threads

## Preconditions

- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` before launching role agents through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`

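
The preconditions above can be checked mechanically before any agent is launched. The following is a minimal sketch, not part of the packaged skills: the function name `check_preconditions` is hypothetical, and its third argument stands in for the `TMPDIR` placeholder so that the shell's own `TMPDIR` environment variable (which `mktemp` reads) is left untouched.

```bash
# Hypothetical helper: verify the case preconditions before launching agents.
# Arguments: orch skill path, inbox skill path, case directory.
check_preconditions() {
  local orch_skill="$1"
  local inbox_skill="$2"
  local case_dir="$3"

  # Both bundled CLI executables must exist and be executable.
  [ -x "$orch_skill/assets/orch" ] || { echo "missing orch CLI" >&2; return 1; }
  [ -x "$inbox_skill/assets/inbox" ] || { echo "missing inbox CLI" >&2; return 1; }

  # The case directory must exist and be empty.
  [ -d "$case_dir" ] || { echo "case directory not found" >&2; return 1; }
  [ -z "$(ls -A "$case_dir")" ] || { echo "case directory not empty" >&2; return 1; }
}
```

A typical call would be `check_preconditions skills/orch skills/inbox "$CASE_DIR"` with `CASE_DIR` set from `mktemp -d`, followed by the `init` command listed in the last precondition.
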
## Agent Topology

- `leader`
- `worker-a`

## Inputs

### Leader Prompt

```text
Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_retry_001, 2) add and dispatch one task T1 to worker-a with --execution-mode analysis, 3) wait until the first attempt fails, 4) reconcile, 5) retry T1 with a short retry note, 6) wait until the retried attempt completes, 7) reconcile again and inspect final status, 8) stop after reporting RUN_ID, THREAD_ID_1, and THREAD_ID_2. Do not use ordinary chat to coordinate with the worker.
```

### Worker Prompt

```text
Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the first assigned thread, 2) fail that first attempt with a clear summary, 3) keep watching for retried work assigned to worker-a, 4) fetch and claim the retried thread, 5) finish the retried attempt with done, 6) stop after reporting both THREAD_ID_1 and THREAD_ID_2. Do not use ordinary chat to coordinate with the leader.
```

## Execution Parameters

- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy

## Execution Steps

1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
2. Inject `skills/orch/` into `leader`
3. Inject `skills/inbox/` into `worker-a`
4. Point both agents at the same database path `TMPDIR/coord.db`
5. Launch `leader` and `worker-a` in parallel
6. Wait for both agents to finish
7. Resolve `THREAD_ID_1` and `THREAD_ID_2` from the agent outputs
8. Independently run the validation commands from the main thread

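
Steps 5 and 6 (launch both roles in parallel, then wait for both) could be sketched in shell as follows. This case does not specify how the role agents are started, so the two command strings are placeholders supplied by the caller, and the function name `run_roles_in_parallel` is hypothetical.

```bash
# Hypothetical helper for steps 5 and 6: run two role commands in parallel
# and wait for both. Returns 0 only if both commands exit 0.
run_roles_in_parallel() {
  local leader_cmd="$1"
  local worker_cmd="$2"
  local leader_pid worker_pid rc=0

  # Start each role in the background and remember its PID.
  sh -c "$leader_cmd" &
  leader_pid=$!
  sh -c "$worker_cmd" &
  worker_pid=$!

  # Wait on each PID separately so that a failure of either role is detected.
  wait "$leader_pid" || rc=1
  wait "$worker_pid" || rc=1
  return "$rc"
}
```

Waiting on each PID individually, rather than issuing a bare `wait`, keeps the exit status of each role available, so a failed leader or worker is not silently treated as a completed run.
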
## Validation Commands

```bash
ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_retry_001
INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_1
INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_2
```

## Expected Outcomes

- the first worker-owned thread reaches `failed`
- the leader successfully issues `retry`
- the second worker-owned thread is distinct from the first
- the second worker-owned thread reaches `done`
- the final run state is `done`

## Assertions

- `THREAD_ID_1 != THREAD_ID_2`
- `status.data.run.status == "done"`
- `status.data.tasks[0].status == "done"`
- `show THREAD_ID_1` reports a terminal failed thread state
- `show THREAD_ID_2` reports a terminal done thread state
- the worker summary confirms that the retried attempt was a new thread rather than a reused one

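
The machine-checkable assertions above can be collected into one shell function. This is a sketch under assumptions: the function name `assert_retry_outcome` is hypothetical, and it takes values that have already been extracted from the `status` and `show` JSON output (for example with a JSON tool such as `jq`, if one is available), because the exact shape of the `show` payload is not spelled out in this case. The worker-summary assertion still needs manual review.

```bash
# Hypothetical helper: check the case assertions on already-extracted values.
# Arguments: THREAD_ID_1, THREAD_ID_2, run status, task status,
#            first thread status, second thread status.
assert_retry_outcome() {
  local failures=0

  # The retry must have produced a new thread, not reused the first one.
  if [ -z "$1" ] || [ "$1" = "$2" ]; then
    echo "FAIL: THREAD_ID_1 and THREAD_ID_2 must be set and distinct"
    failures=$((failures + 1))
  fi

  # The run and its single task must both end in done.
  [ "$3" = "done" ] || { echo "FAIL: run status is '$3'"; failures=$((failures + 1)); }
  [ "$4" = "done" ] || { echo "FAIL: task status is '$4'"; failures=$((failures + 1)); }

  # The first thread must end failed, the second must end done.
  [ "$5" = "failed" ] || { echo "FAIL: first thread status is '$5'"; failures=$((failures + 1)); }
  [ "$6" = "done" ] || { echo "FAIL: second thread status is '$6'"; failures=$((failures + 1)); }

  # Succeed only if no check failed.
  [ "$failures" -eq 0 ]
}
```

Called with two distinct thread ids followed by `done done failed done`, the function prints nothing and returns 0; any mismatch prints one `FAIL:` line per violated assertion and returns non-zero.
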
## Cleanup

- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection

## Recorded Example Run

- recorded on: `2026-03-19`
- execution mode: `direct_cli_replay` via `scripts/run_orch_skill_forward_tests.sh`
- result: `pass`
- observed run id: `run_blog_skill_retry_001`
- observed first thread id: `thr_8dbf2d2e46d7469891cc1ef604da476f`
- observed second thread id: `thr_bdd86f4fe08e4ebfb39b8151ac41a3bb`
- evidence summary:
  - `orch wait --for task_failed` woke after the first worker-owned thread failed
  - `orch retry --run run_blog_skill_retry_001 --task T1 --json` returned `attempt_no == 2` with a distinct replacement thread for the same worker
  - final `inbox show` on the first thread returned `thread.status == "failed"`
  - final `inbox show` on the second thread returned `thread.status == "done"`
  - final `orch status --run run_blog_skill_retry_001 --json` returned `run.status == "done"` and `tasks[0].status == "done"`
- note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents

## Recorded Real Forward Run

- recorded on: `2026-03-19`
- execution mode: `real_subagent_forward_test`
- result: `pass`
- evidence root: `/tmp/orch-skill-subagents.J1XWgs/leader-retries-failed-task-through-bundled-cli-phased`
- observed run id: `run_blog_skill_retry_001`
- observed first thread id: `thr_1e22121642294b56aae351ddec5180d1`
- observed second thread id: `thr_f2ab1f1899964007b2447796204e1928`
- evidence summary:
  - the same real leader agent using `skills/orch/` completed the case in three phases: initial `run/task/dispatch`, then `wait --for task_failed` plus `retry`, then final `wait --for task_done` plus `status`
  - a real worker agent using `skills/inbox/` failed the first thread, polled for the retried pending thread, then claimed and completed the second thread
  - main-thread validation confirmed the two thread ids were distinct, the first thread finished `failed`, the second thread finished `done`, and the run/task both finished `done`