ai-workflow-skill/docs/tests/orch-skill/leader-retries-failed-task-through-bundled-cli.md

Case: leader-retries-failed-task-through-bundled-cli

Test Type

This is a forward-test and a retry-path skill validation.

The goal is to verify that a leader using the packaged orch skill can reconcile a failed attempt, issue a retry, and drive the task to success through a second attempt handled by a real worker.

Purpose

Validate that all of the following can be true at the same time:

  • the leader can use the bundled orch skill to dispatch an initial attempt
  • a worker can fail the first attempt through inbox
  • the leader can reconcile that failure and create a fresh retry attempt
  • the worker can complete the retried attempt
  • the final run reaches done and the two attempts map to different threads

Preconditions

  • orch skill path exists: ORCH_SKILL_PATH=skills/orch
  • inbox skill path exists: INBOX_SKILL_PATH=skills/inbox
  • bundled CLI executables exist at ORCH_SKILL_PATH/assets/orch and INBOX_SKILL_PATH/assets/inbox
  • use an empty temporary directory TMPDIR
  • before launching role agents, initialize TMPDIR/coord.db by running INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init
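
The preconditions above can be checked mechanically before any agent is launched. The following is a minimal sketch in Python, not part of the bundled tooling: the helper names `check_preconditions` and `init_command` are hypothetical, and the only CLI invocation it builds is the `init` command quoted in the list.

```python
import os
from pathlib import Path


def check_preconditions(orch_skill_path, inbox_skill_path, tmpdir):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    for label, skill_path in (("orch", orch_skill_path), ("inbox", inbox_skill_path)):
        skill_dir = Path(skill_path)
        cli = skill_dir / "assets" / label  # bundled CLI lives under assets/
        if not skill_dir.is_dir():
            problems.append(f"{label} skill path missing: {skill_dir}")
        elif not (cli.is_file() and os.access(cli, os.X_OK)):
            problems.append(f"{label} CLI missing or not executable: {cli}")
    tmp = Path(tmpdir)
    if not tmp.is_dir():
        problems.append(f"TMPDIR missing: {tmp}")
    elif any(tmp.iterdir()):
        # The case requires an empty temporary directory.
        problems.append(f"TMPDIR is not empty: {tmp}")
    return problems


def init_command(inbox_skill_path, tmpdir):
    """Build the argv for the one-time coord.db initialization."""
    return [
        str(Path(inbox_skill_path) / "assets" / "inbox"),
        "--db", str(Path(tmpdir) / "coord.db"),
        "--json", "init",
    ]
```

Run `check_preconditions` first and pass the result of `init_command` to a process runner only when it returns no problems, so that TMPDIR is verified as empty before coord.db is created in it.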

Agent Topology

  • leader
  • worker-a

Inputs

Leader Prompt

Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_retry_001, 2) add and dispatch one task T1 to worker-a with --execution-mode analysis, 3) wait until the first attempt fails, 4) reconcile, 5) retry T1 with a short retry note, 6) wait until the retried attempt completes, 7) reconcile again and inspect final status, 8) stop after reporting RUN_ID, THREAD_ID_1, and THREAD_ID_2. Do not use ordinary chat to coordinate with the worker.

Worker Prompt

Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the first assigned thread, 2) fail that first attempt with a clear summary, 3) keep watching for retried work assigned to worker-a, 4) fetch and claim the retried thread, 5) finish the retried attempt with done, 6) stop after reporting both THREAD_ID_1 and THREAD_ID_2. Do not use ordinary chat to coordinate with the leader.

Execution Parameters

  • use the shared execution contract from README.md
  • use the shared timeout defaults from README.md
  • do not override the default cleanup policy

Execution Steps

  1. Initialize TMPDIR/coord.db once through the bundled inbox CLI before launching agents
  2. Inject skills/orch/ into leader
  3. Inject skills/inbox/ into worker-a
  4. Point both agents at the same database path TMPDIR/coord.db
  5. Launch leader and worker-a in parallel
  6. Wait for both agents to finish
  7. Resolve THREAD_ID_1 and THREAD_ID_2 from the agent outputs
  8. Independently run the validation commands from the main thread
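
Step 7 depends on how the agents phrase their final reports, which this case does not fix. The sketch below assumes each agent prints labels such as `THREAD_ID_1=thr_...` or `THREAD_ID_2: thr_...`, and that thread ids look like the ones in the recorded runs (`thr_` followed by 32 lowercase hex characters). Both the report format and the helper names are assumptions made for illustration.

```python
import re

# Assumed report format: "THREAD_ID_<n>" followed by a short separator and the id.
LABELED_THREAD_ID = re.compile(r"THREAD_ID_([12])\W{1,5}(thr_[0-9a-f]{32})\b")


def resolve_thread_ids(agent_output):
    """Map slot number (1 or 2) to the thread id reported by one agent."""
    found = {}
    for slot, thread_id in LABELED_THREAD_ID.findall(agent_output):
        slot = int(slot)
        if found.setdefault(slot, thread_id) != thread_id:
            raise ValueError(f"conflicting values for THREAD_ID_{slot}")
    missing = [n for n in (1, 2) if n not in found]
    if missing:
        raise ValueError(f"agent output did not report THREAD_ID_{missing[0]}")
    return found


def resolve_from_agents(leader_output, worker_output):
    """Return (THREAD_ID_1, THREAD_ID_2), requiring both agents to agree."""
    leader_ids = resolve_thread_ids(leader_output)
    worker_ids = resolve_thread_ids(worker_output)
    if leader_ids != worker_ids:
        raise ValueError("leader and worker reported different thread ids")
    return leader_ids[1], leader_ids[2]
```

Requiring the leader and the worker to agree catches a report that mixes up the two attempts before the validation commands are run against the wrong thread.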

Validation Commands

ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_retry_001
INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_1
INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_2
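
For a scripted main thread, the three commands above can be built and executed as follows. This is a sketch under two assumptions that the case does not state explicitly: that each command exits with status zero on success, and that `--json` makes it print a single JSON document on stdout. The function names are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def validation_commands(orch_skill_path, inbox_skill_path, tmpdir, run_id, thread_ids):
    """Build the argv lists for the status check and one show per thread."""
    db = str(Path(tmpdir) / "coord.db")
    orch = str(Path(orch_skill_path) / "assets" / "orch")
    inbox = str(Path(inbox_skill_path) / "assets" / "inbox")
    commands = [[orch, "--db", db, "--json", "status", "--run", run_id]]
    for thread_id in thread_ids:
        commands.append([inbox, "--db", db, "--json", "show", "--thread", thread_id])
    return commands


def run_json(argv, runner=subprocess.run):
    """Run one command and parse its stdout as JSON.

    Assumption: the CLI exits zero on success and prints one JSON document.
    The runner parameter exists so the call can be replaced in tests.
    """
    completed = runner(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

Passing argv lists instead of a shell string avoids quoting problems when TMPDIR contains spaces.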

Expected Outcomes

  • the first worker-owned thread reaches failed
  • the leader successfully issues a retry
  • the second worker-owned thread is distinct from the first
  • the second worker-owned thread reaches done
  • the final run state is done

Assertions

  • THREAD_ID_1 != THREAD_ID_2
  • status.data.run.status == "done"
  • status.data.tasks[0].status == "done"
  • show THREAD_ID_1 reports a terminal failed thread state
  • show THREAD_ID_2 reports a terminal done thread state
  • the worker summary confirms that the retried attempt was a new thread rather than a reused one
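
The machine-checkable assertions above can be expressed against the parsed JSON as in the sketch below. The `status.data.run.status` and `status.data.tasks[0].status` paths come from the assertions themselves. The location of the thread status in the show output is an assumption: the recorded evidence refers to `thread.status`, so the helper accepts that key either inside a `data` envelope or at the top level. The last assertion, about the worker summary, still needs a human or agent review and is not covered here.

```python
def thread_status(show):
    """Read the thread status from parsed `inbox show` output.

    Assumption: the status is at data.thread.status, or at thread.status
    when the output has no data envelope.
    """
    body = show.get("data", show)
    return body["thread"]["status"]


def check_assertions(status, show_1, show_2, thread_id_1, thread_id_2):
    """Return a list of failed assertions; an empty list means the case passed."""
    failures = []
    if thread_id_1 == thread_id_2:
        failures.append("THREAD_ID_1 and THREAD_ID_2 are the same thread")
    expected = [
        ("status.data.run.status", status["data"]["run"]["status"], "done"),
        ("status.data.tasks[0].status", status["data"]["tasks"][0]["status"], "done"),
        ("show THREAD_ID_1 thread status", thread_status(show_1), "failed"),
        ("show THREAD_ID_2 thread status", thread_status(show_2), "done"),
    ]
    for label, actual, wanted in expected:
        if actual != wanted:
            failures.append(f"{label} is {actual!r}, expected {wanted!r}")
    return failures
```

Collecting every failure instead of stopping at the first one makes a failed run easier to diagnose from a single report.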

Cleanup

  • use the default cleanup policy from README.md
  • if the run fails, retain TMPDIR and coord.db for replay and manual inspection
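
The retention rule can be sketched as below. The default cleanup policy is defined in README.md and is not reproduced in this case, so the sketch assumes, purely for illustration, that the default on a passing run is to delete TMPDIR. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def cleanup(tmpdir, passed):
    """Delete TMPDIR after a pass; keep it (with coord.db) after a failure.

    Assumption: deleting on success stands in for the README default policy.
    Returns the retained path, or None when the directory was removed.
    """
    tmp = Path(tmpdir)
    if passed:
        shutil.rmtree(tmp, ignore_errors=True)
        return None
    return tmp
```

Returning the retained path lets the caller print it, so the database can be found for replay and manual inspection.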

Recorded Example Run

  • recorded on: 2026-03-19
  • execution mode: direct_cli_replay via scripts/run_orch_skill_forward_tests.sh
  • result: pass
  • observed run id: run_blog_skill_retry_001
  • observed first thread id: thr_8dbf2d2e46d7469891cc1ef604da476f
  • observed second thread id: thr_bdd86f4fe08e4ebfb39b8151ac41a3bb
  • evidence summary:
      • orch wait --for task_failed woke after the first worker-owned thread failed
      • orch retry --run run_blog_skill_retry_001 --task T1 --json returned attempt_no == 2 with a distinct replacement thread for the same worker
      • final inbox show on the first thread returned thread.status == "failed"
      • final inbox show on the second thread returned thread.status == "done"
      • final orch status --run run_blog_skill_retry_001 --json returned run.status == "done" and tasks[0].status == "done"
  • note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents

Recorded Real Forward Run

  • recorded on: 2026-03-19
  • execution mode: real_subagent_forward_test
  • result: pass
  • evidence root: /tmp/orch-skill-subagents.J1XWgs/leader-retries-failed-task-through-bundled-cli-phased
  • observed run id: run_blog_skill_retry_001
  • observed first thread id: thr_1e22121642294b56aae351ddec5180d1
  • observed second thread id: thr_f2ab1f1899964007b2447796204e1928
  • evidence summary:
      • the same real leader agent using skills/orch/ completed the case in three phases: initial run/task/dispatch, then wait --for task_failed plus retry, then final wait --for task_done plus status
      • a real worker agent using skills/inbox/ failed the first thread, polled for the retried pending thread, then claimed and completed the second thread
      • main-thread validation confirmed the two thread ids were distinct, the first thread finished failed, the second thread finished done, and the run/task both finished done