Case: leader-retries-failed-task-through-bundled-cli
Test Type
This case is both a forward test and a retry-path skill validation.
The goal is to verify that a leader using the packaged orch skill can reconcile a failed attempt, issue a retry, and drive the task to success through a second attempt handled by a real worker.
Purpose
Validate that all of the following can be true at the same time:
- the leader can use the bundled orch skill to dispatch an initial attempt
- a worker can fail the first attempt through inbox
- the leader can reconcile that failure and create a fresh retry attempt
- the worker can complete the retried attempt
- the final run reaches `done`, and the two attempts map to different threads
Preconditions
- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- initialize `TMPDIR/coord.db` through `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init` before launching role agents
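A minimal bash sketch of the preconditions, assuming the skill paths are relative to the repository root and the case directory comes from `mktemp -d`. The helper names `check_preconditions` and `init_coord_db` are hypothetical, and `CASE_DIR` stands in for the document's `TMPDIR` so that the standard `TMPDIR` variable read by `mktemp` is left untouched. The only CLI call is the `init` command quoted in the preconditions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: verify the bundled executables and an empty case dir.
# Usage: check_preconditions ORCH_SKILL_PATH INBOX_SKILL_PATH CASE_DIR
check_preconditions() {
  local orch_skill="$1" inbox_skill="$2" case_dir="$3"
  local exe
  for exe in "$orch_skill/assets/orch" "$inbox_skill/assets/inbox"; do
    if [ ! -x "$exe" ]; then
      echo "missing bundled executable: $exe" >&2
      return 1
    fi
  done
  # The case requires an existing, empty temporary directory.
  if [ ! -d "$case_dir" ] || [ -n "$(ls -A -- "$case_dir")" ]; then
    echo "missing or non-empty case directory: $case_dir" >&2
    return 1
  fi
}

# Initialize coord.db once, through the bundled inbox CLI, before any agent starts.
# Usage: init_coord_db INBOX_SKILL_PATH CASE_DIR
init_coord_db() {
  local inbox_skill="$1" case_dir="$2"
  "$inbox_skill/assets/inbox" --db "$case_dir/coord.db" --json init
}

# Hypothetical usage from the repository root:
#   CASE_DIR="$(mktemp -d)"
#   check_preconditions skills/orch skills/inbox "$CASE_DIR"
#   init_coord_db skills/inbox "$CASE_DIR"
```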
Agent Topology
- `leader`
- `worker-a`
Inputs
Leader Prompt
Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_retry_001, 2) add and dispatch one task T1 to worker-a with --execution-mode analysis, 3) wait until the first attempt fails, 4) reconcile, 5) retry T1 with a short retry note, 6) wait until the retried attempt completes, 7) reconcile again and inspect final status, 8) stop after reporting RUN_ID, THREAD_ID_1, and THREAD_ID_2. Do not use ordinary chat to coordinate with the worker.
Worker Prompt
Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the first assigned thread, 2) fail that first attempt with a clear summary, 3) keep watching for retried work assigned to worker-a, 4) fetch and claim the retried thread, 5) finish the retried attempt with done, 6) stop after reporting both THREAD_ID_1 and THREAD_ID_2. Do not use ordinary chat to coordinate with the leader.
Execution Parameters
- use the shared execution contract from README.md
- use the shared timeout defaults from README.md
- do not override the default cleanup policy
Execution Steps
- Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
- Inject `skills/orch/` into `leader`
- Inject `skills/inbox/` into `worker-a`
- Point both agents at the same database path `TMPDIR/coord.db`
- Launch `leader` and `worker-a` in parallel
- Wait for both agents to finish
- Resolve `THREAD_ID_1` and `THREAD_ID_2` from the agent outputs
- Independently run the validation commands from the main thread
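One way to resolve `THREAD_ID_1` and `THREAD_ID_2` from the agent outputs is to scan a saved transcript for thread ids. This sketch assumes ids have the `thr_` plus 32 lowercase hex characters shape seen in the recorded runs, and that they first appear in attempt order (first attempt, then retry). `extract_thread_ids` and the transcript file are hypothetical; neither CLI provides them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print each distinct thread id in a transcript, in the
# order it first appears. Assumes ids look like thr_<32 lowercase hex chars>.
# Usage: extract_thread_ids TRANSCRIPT_FILE
extract_thread_ids() {
  local transcript="$1"
  # grep -o prints one match per line; awk keeps only first occurrences.
  # "|| true" keeps a transcript without ids from failing under pipefail.
  { grep -oE 'thr_[0-9a-f]{32}' "$transcript" || true; } | awk '!seen[$0]++'
}

# Hypothetical usage with a saved worker-a transcript:
#   ids="$(extract_thread_ids worker-a.log)"
#   THREAD_ID_1="$(printf '%s\n' "$ids" | sed -n 1p)"
#   THREAD_ID_2="$(printf '%s\n' "$ids" | sed -n 2p)"
```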
Validation Commands
ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_retry_001
INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_1
INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_2
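A minimal bash wrapper for the three validation commands, assuming each prints one JSON document on stdout when `--json` is passed. It saves each output next to `coord.db` so the assertions can be checked afterwards and the evidence survives a failed run. `run_validation` and the output file names are hypothetical; the CLI invocations are the ones listed above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run the validation commands from the main thread and
# save their JSON output in the case directory.
# Usage: run_validation ORCH_SKILL_PATH INBOX_SKILL_PATH CASE_DIR THREAD_ID_1 THREAD_ID_2
run_validation() {
  local orch_skill="$1" inbox_skill="$2" case_dir="$3"
  local thread_1="$4" thread_2="$5"
  local db="$case_dir/coord.db"

  "$orch_skill/assets/orch" --db "$db" --json status --run run_blog_skill_retry_001 \
    > "$case_dir/status.json"
  "$inbox_skill/assets/inbox" --db "$db" --json show --thread "$thread_1" \
    > "$case_dir/thread_1.json"
  "$inbox_skill/assets/inbox" --db "$db" --json show --thread "$thread_2" \
    > "$case_dir/thread_2.json"
}
```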
Expected Outcomes
- the first worker-owned thread reaches `failed`
- the leader successfully issues `retry`
- the second worker-owned thread is distinct from the first
- the second worker-owned thread reaches `done`
- the final run state is `done`
Assertions
- `THREAD_ID_1 != THREAD_ID_2`
- `status.data.run.status == "done"`
- `status.data.tasks[0].status == "done"`
- `show THREAD_ID_1` reports a terminal `failed` thread state
- `show THREAD_ID_2` reports a terminal `done` thread state
- the worker summary confirms that the retried attempt was a new thread rather than a reused one
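The machine-checkable assertions can be expressed with `jq -e`, which exits non-zero when the last output is `false` or `null`. This is a sketch under stated assumptions: the `status` paths come from the assertions, while the `.data.thread.status` path for `inbox show` is inferred from the `data` envelope plus the `thread.status` field in the recorded runs, not from a documented contract. `check_assertions` is hypothetical, and the final assertion about the worker summary is a judgment on the agent's prose, so it is not covered.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: check the JSON-level assertions with jq.
# Assumes inbox show wraps the thread as .data.thread (an inference).
# Usage: check_assertions STATUS_JSON THREAD_1_JSON THREAD_2_JSON THREAD_ID_1 THREAD_ID_2
check_assertions() {
  local status_json="$1" thread_1_json="$2" thread_2_json="$3"
  local thread_id_1="$4" thread_id_2="$5"

  # The retry must have produced a new thread, not reused the first one.
  [ "$thread_id_1" != "$thread_id_2" ] || { echo "thread ids are equal" >&2; return 1; }

  # jq -e: exit 0 only if the comparison printed true.
  jq -e '.data.run.status == "done"' "$status_json" > /dev/null \
    || { echo "run is not done" >&2; return 1; }
  jq -e '.data.tasks[0].status == "done"' "$status_json" > /dev/null \
    || { echo "task is not done" >&2; return 1; }
  jq -e '.data.thread.status == "failed"' "$thread_1_json" > /dev/null \
    || { echo "first thread is not failed" >&2; return 1; }
  jq -e '.data.thread.status == "done"' "$thread_2_json" > /dev/null \
    || { echo "second thread is not done" >&2; return 1; }
}
```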
Cleanup
- use the default cleanup policy from README.md
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection
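The retain-on-failure rule can be wired into a driver script with an `EXIT` trap. This is a sketch only: `cleanup_case_dir` and `CASE_DIR` are hypothetical, and deleting the directory on success stands in for whatever the default policy in README.md actually does.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: delete the case directory only when the run passed;
# otherwise keep it, including coord.db, for replay and manual inspection.
# Usage: cleanup_case_dir EXIT_STATUS CASE_DIR
cleanup_case_dir() {
  local exit_status="$1" case_dir="$2"
  if [ "$exit_status" -eq 0 ]; then
    rm -rf -- "$case_dir"
  else
    echo "run failed (status $exit_status); kept $case_dir for inspection" >&2
  fi
}

# Hypothetical wiring: inside an EXIT trap, $? holds the script's exit status,
# so a failed assertion leaves CASE_DIR in place.
#   CASE_DIR="$(mktemp -d)"
#   trap 'cleanup_case_dir "$?" "$CASE_DIR"' EXIT
```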
Recorded Example Run
- recorded on: 2026-03-19
- execution mode: `direct_cli_replay` via `scripts/run_orch_skill_forward_tests.sh`
- result: pass
- observed run id: `run_blog_skill_retry_001`
- observed first thread id: `thr_8dbf2d2e46d7469891cc1ef604da476f`
- observed second thread id: `thr_bdd86f4fe08e4ebfb39b8151ac41a3bb`
- evidence summary:
  - `orch wait --for task_failed` woke after the first worker-owned thread failed
  - `orch retry --run run_blog_skill_retry_001 --task T1 --json` returned `attempt_no == 2` with a distinct replacement thread for the same worker
  - final `inbox show` on the first thread returned `thread.status == "failed"`
  - final `inbox show` on the second thread returned `thread.status == "done"`
  - final `orch status --run run_blog_skill_retry_001 --json` returned `run.status == "done"` and `tasks[0].status == "done"`
- note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents
Recorded Real Forward Run
- recorded on: 2026-03-19
- execution mode: `real_subagent_forward_test`
- result: pass
- evidence root: `/tmp/orch-skill-subagents.J1XWgs/leader-retries-failed-task-through-bundled-cli-phased`
- observed run id: `run_blog_skill_retry_001`
- observed first thread id: `thr_1e22121642294b56aae351ddec5180d1`
- observed second thread id: `thr_f2ab1f1899964007b2447796204e1928`
- evidence summary:
  - the same real leader agent using `skills/orch/` completed the case in three phases: initial `run`/`task`/`dispatch`, then `wait --for task_failed` plus `retry`, then a final `wait --for task_done` plus `status`
  - a real worker agent using `skills/inbox/` failed the first thread, polled for the retried pending thread, then claimed and completed the second thread
  - main-thread validation confirmed that the two thread ids were distinct, the first thread finished `failed`, the second thread finished `done`, and the run and task both finished `done`