kurihada/ai-workflow-skill

kurihada 5859ff219e orch: require explicit dispatch execution mode

2026-03-20 19:27:30 +08:00

Case: `leader-retries-failed-task-through-bundled-cli`

Test Type

This case is both a forward test and a retry-path skill validation.

The goal is to verify that a leader using the packaged orch skill can reconcile a failed attempt, issue a retry, and drive the task to success through a second attempt handled by a real worker.

Purpose

Validate that all of the following can be true at the same time:

the leader can use the bundled orch skill to dispatch an initial attempt
a worker can fail the first attempt through inbox
the leader can reconcile that failure and create a fresh retry attempt
the worker can complete the retried attempt
the final run reaches done and the two attempts map to different threads

Preconditions

orch skill path exists: ORCH_SKILL_PATH=skills/orch
inbox skill path exists: INBOX_SKILL_PATH=skills/inbox
bundled CLI executables exist at ORCH_SKILL_PATH/assets/orch and INBOX_SKILL_PATH/assets/inbox
use an empty temporary directory TMPDIR
before launching role agents, initialize TMPDIR/coord.db by running INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init
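
The preconditions above can be checked mechanically before any agent is launched. The following is a minimal sketch, not part of the repository: `missing_preconditions` and `init_command` are hypothetical helper names, and the only CLI invocation it builds is the documented `init` call.

```python
import os
from pathlib import Path


def missing_preconditions(orch_skill: Path, inbox_skill: Path, tmpdir: Path) -> list[str]:
    """Return human-readable descriptions of unmet preconditions (empty means ready)."""
    problems = []
    # Both skill directories must exist.
    for skill in (orch_skill, inbox_skill):
        if not skill.is_dir():
            problems.append(f"skill path missing: {skill}")
    # Each skill must ship its bundled CLI under assets/.
    for cli in (orch_skill / "assets" / "orch", inbox_skill / "assets" / "inbox"):
        if not (cli.is_file() and os.access(cli, os.X_OK)):
            problems.append(f"bundled CLI missing or not executable: {cli}")
    # TMPDIR must exist and contain nothing yet.
    if not tmpdir.is_dir() or any(tmpdir.iterdir()):
        problems.append(f"TMPDIR must be an existing empty directory: {tmpdir}")
    return problems


def init_command(inbox_skill: Path, tmpdir: Path) -> list[str]:
    """Build the argv for the documented database initialization step."""
    return [str(inbox_skill / "assets" / "inbox"),
            "--db", str(tmpdir / "coord.db"), "--json", "init"]
```

Running the check first keeps a missing binary from surfacing later as a confusing agent failure; the argv returned by `init_command` can then be passed to `subprocess.run`.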

Agent Topology

leader
worker-a

Inputs

Leader Prompt

Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_retry_001, 2) add and dispatch one task T1 to worker-a with --execution-mode analysis, 3) wait until the first attempt fails, 4) reconcile, 5) retry T1 with a short retry note, 6) wait until the retried attempt completes, 7) reconcile again and inspect final status, 8) stop after reporting RUN_ID, THREAD_ID_1, and THREAD_ID_2. Do not use ordinary chat to coordinate with the worker.

Worker Prompt

Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the first assigned thread, 2) fail that first attempt with a clear summary, 3) keep watching for retried work assigned to worker-a, 4) fetch and claim the retried thread, 5) finish the retried attempt with done, 6) stop after reporting both THREAD_ID_1 and THREAD_ID_2. Do not use ordinary chat to coordinate with the leader.

Execution Parameters

use the shared execution contract from README.md
use the shared timeout defaults from README.md
do not override the default cleanup policy

Execution Steps

Initialize TMPDIR/coord.db once through the bundled inbox CLI before launching agents
Inject skills/orch/ into leader
Inject skills/inbox/ into worker-a
Point both agents at the same database path TMPDIR/coord.db
Launch leader and worker-a in parallel
Wait for both agents to finish
Resolve THREAD_ID_1 and THREAD_ID_2 from the agent outputs
Independently run the validation commands from the main thread
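
The step that resolves THREAD_ID_1 and THREAD_ID_2 from the agent outputs could be sketched as below. This assumes thread ids always look like the ones in the recorded runs (`thr_` followed by 32 lowercase hex characters) and that the first id mentioned belongs to the first attempt; neither is guaranteed by the case description, and both function names are hypothetical.

```python
import re

# Assumption: ids match the shape seen in the recorded runs,
# e.g. thr_8dbf2d2e46d7469891cc1ef604da476f.
THREAD_ID_RE = re.compile(r"\bthr_[0-9a-f]{32}\b")


def resolve_thread_ids(*agent_outputs: str) -> list[str]:
    """Collect distinct thread ids in order of first appearance."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for text in agent_outputs:
        for thread_id in THREAD_ID_RE.findall(text):
            seen.setdefault(thread_id, None)
    return list(seen)


def resolve_attempt_pair(*agent_outputs: str) -> tuple[str, str]:
    """Return (THREAD_ID_1, THREAD_ID_2), or raise if the outputs are ambiguous."""
    ids = resolve_thread_ids(*agent_outputs)
    if len(ids) != 2:
        raise ValueError(f"expected exactly 2 distinct thread ids, found {len(ids)}")
    return ids[0], ids[1]
```

Raising on anything other than two distinct ids is deliberate: if the retry reused the original thread, or an agent reported an unrelated id, the case should stop before validation rather than compare the wrong threads.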

Validation Commands

ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_retry_001
INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_1
INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_2
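
For a harness that runs these commands from the main thread, a sketch along the following lines may help. The argv layout mirrors the three commands above exactly; `validation_commands` and `run_json` are hypothetical helpers, and `run_json` assumes that `--json` makes the CLI print a single JSON document on stdout and exit non-zero on failure.

```python
import json
import subprocess


def validation_commands(orch_cli, inbox_cli, db_path, run_id, thread_ids):
    """Build argv lists mirroring the documented validation commands."""
    commands = [[orch_cli, "--db", db_path, "--json", "status", "--run", run_id]]
    for thread_id in thread_ids:
        commands.append(
            [inbox_cli, "--db", db_path, "--json", "show", "--thread", thread_id]
        )
    return commands


def run_json(argv):
    """Run one command and parse its stdout as JSON.

    Assumption: stdout holds exactly one JSON document when --json is set.
    check=True raises CalledProcessError on a non-zero exit code.
    """
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

Passing argv lists instead of a shell string avoids quoting problems when TMPDIR contains spaces.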

Expected Outcomes

the first worker-owned thread reaches failed
the leader successfully issues a retry
the second worker-owned thread is distinct from the first
the second worker-owned thread reaches done
the final run state is done

Assertions

THREAD_ID_1 != THREAD_ID_2
status.data.run.status == "done"
status.data.tasks[0].status == "done"
show THREAD_ID_1 reports a terminal failed thread state
show THREAD_ID_2 reports a terminal done thread state
the worker summary confirms that the retried attempt was a new thread rather than a reused one
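
The machine-checkable assertions above can be expressed as one function over the parsed JSON payloads. This is a sketch under assumptions: the `status` paths come straight from the assertions, but the location of the thread state in the `show` payload (`data.thread.status`) is inferred from the recorded evidence and may differ in the real output. The last assertion, about the worker summary, still needs a human or agent-level check.

```python
def dig(payload, *path):
    """Follow dict keys and list indexes, returning None if any step is missing."""
    current = payload
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return None
    return current


def check_retry_case(status, show_1, show_2, thread_id_1, thread_id_2):
    """Return the labels of failed assertions (an empty list means pass)."""
    checks = [
        ("THREAD_ID_1 != THREAD_ID_2", thread_id_1 != thread_id_2),
        ('status.data.run.status == "done"',
         dig(status, "data", "run", "status") == "done"),
        ('status.data.tasks[0].status == "done"',
         dig(status, "data", "tasks", 0, "status") == "done"),
        # Assumed path: data.thread.status in the `show` payload.
        ("THREAD_ID_1 is terminal failed",
         dig(show_1, "data", "thread", "status") == "failed"),
        ("THREAD_ID_2 is terminal done",
         dig(show_2, "data", "thread", "status") == "done"),
    ]
    return [label for label, ok in checks if not ok]
```

Returning every failed label, rather than stopping at the first, makes a failing run easier to diagnose from a single report.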

Cleanup

use the default cleanup policy from README.md
if the run fails, retain TMPDIR and coord.db for replay and manual inspection
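
The retention rule could be sketched as follows. The default cleanup policy lives in README.md and is not reproduced here, so removing the directory after a passing run is an assumption of this sketch; only the retain-on-failure branch is taken from this case. The function name `cleanup` is hypothetical.

```python
import shutil
from pathlib import Path


def cleanup(tmpdir: Path, run_passed: bool) -> bool:
    """Remove tmpdir only after a passing run; return True if it was removed."""
    if not run_passed:
        # Keep TMPDIR and coord.db for replay and manual inspection.
        return False
    # Assumption: the default policy deletes the temporary directory on success.
    shutil.rmtree(tmpdir, ignore_errors=True)
    return True
```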

Recorded Example Run

recorded on: 2026-03-19
execution mode: direct_cli_replay via scripts/run_orch_skill_forward_tests.sh
result: pass
observed run id: run_blog_skill_retry_001
observed first thread id: thr_8dbf2d2e46d7469891cc1ef604da476f
observed second thread id: thr_bdd86f4fe08e4ebfb39b8151ac41a3bb
evidence summary:
orch wait --for task_failed woke after the first worker-owned thread failed
orch retry --run run_blog_skill_retry_001 --task T1 --json returned attempt_no == 2 with a distinct replacement thread for the same worker
final inbox show on the first thread returned thread.status == "failed"
final inbox show on the second thread returned thread.status == "done"
final orch status --run run_blog_skill_retry_001 --json returned run.status == "done" and tasks[0].status == "done"
note: this recorded run exercised the packaged binaries directly in a temporary DB and did not spawn separate Codex role agents

Recorded Real Forward Run

recorded on: 2026-03-19
execution mode: real_subagent_forward_test
result: pass
evidence root: /tmp/orch-skill-subagents.J1XWgs/leader-retries-failed-task-through-bundled-cli-phased
observed run id: run_blog_skill_retry_001
observed first thread id: thr_1e22121642294b56aae351ddec5180d1
observed second thread id: thr_f2ab1f1899964007b2447796204e1928
evidence summary:
the same real leader agent using skills/orch/ completed the case in three phases: initial run/task/dispatch, then wait --for task_failed plus retry, then final wait --for task_done plus status
a real worker agent using skills/inbox/ failed the first thread, polled for the retried pending thread, then claimed and completed the second thread
main-thread validation confirmed the two thread ids were distinct, the first thread finished failed, the second thread finished done, and the run/task both finished done
