Add orch skill test plan docs

2026-03-19 17:25:03 +08:00
parent 8f10dff823
commit 8b26815d53
8 changed files with 697 additions and 0 deletions
@@ -0,0 +1,91 @@
+# Case: `leader-retries-failed-task-through-bundled-cli`
+
+## Test Type
+
+This is a `forward-test` and a retry-path skill validation.
+
+The goal is to verify that a leader using the packaged `orch` skill can reconcile a failed attempt, issue `retry`, and drive the task to success through a second attempt handled by a real worker.
+
+## Purpose
+
+Validate that all of the following can be true at the same time:
+
+- the leader can use the bundled orch skill to dispatch an initial attempt
+- a worker can fail the first attempt through the bundled inbox CLI
+- the leader can reconcile that failure and create a fresh retry attempt
+- the worker can complete the retried attempt
+- the final run reaches `done` and the two attempts map to different threads
+
+## Preconditions
+
+- orch skill path exists: `ORCH_SKILL_PATH=skills/orch`
+- inbox skill path exists: `INBOX_SKILL_PATH=skills/inbox`
+- bundled CLI executables exist at `ORCH_SKILL_PATH/assets/orch` and `INBOX_SKILL_PATH/assets/inbox`
+- use an empty temporary directory `TMPDIR`
+- before launching any role agent, initialize `TMPDIR/coord.db` by running `INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json init`
+
+## Agent Topology
+
+- `leader`
+- `worker-a`
+
+## Inputs
+
+### Leader Prompt
+
+```text
+Use $orch at ORCH_SKILL_PATH to act as leader on the already initialized SQLite DB TMPDIR/coord.db. Only coordinate through the bundled orch CLI from the skill. Workflow: 1) create run run_blog_skill_retry_001, 2) add and dispatch one task T1 to worker-a, 3) wait until the first attempt fails, 4) reconcile, 5) retry T1 with a short retry note, 6) wait until the retried attempt completes, 7) reconcile again and inspect final status, 8) stop after reporting RUN_ID, THREAD_ID_1, and THREAD_ID_2. Do not use ordinary chat to coordinate with the worker.
+```
+
+### Worker Prompt
+
+```text
+Use $inbox at INBOX_SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch and claim the first assigned thread, 2) fail that first attempt with a clear summary, 3) keep watching for retried work assigned to worker-a, 4) fetch and claim the retried thread, 5) finish the retried attempt with done, 6) stop after reporting both THREAD_ID_1 and THREAD_ID_2. Do not use ordinary chat to coordinate with the leader.
+```
+
+## Execution Parameters
+
+- use the shared execution contract from [README.md](./README.md)
+- use the shared timeout defaults from [README.md](./README.md)
+- do not override the default cleanup policy
+
+## Execution Steps
+
+1. Initialize `TMPDIR/coord.db` once through the bundled inbox CLI before launching agents
+2. Inject `skills/orch/` into `leader`
+3. Inject `skills/inbox/` into `worker-a`
+4. Point both agents at the same database path `TMPDIR/coord.db`
+5. Launch `leader` and `worker-a` in parallel
+6. Wait for both agents to finish
+7. Resolve `THREAD_ID_1` and `THREAD_ID_2` from the agent outputs
+8. Independently run the validation commands from the main thread
+
+## Validation Commands
+
+```bash
+ORCH_SKILL_PATH/assets/orch --db TMPDIR/coord.db --json status --run run_blog_skill_retry_001
+INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_1
+INBOX_SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID_2
+```
+
+## Expected Outcomes
+
+- the first worker-owned thread reaches `failed`
+- the leader successfully issues `retry`
+- the second worker-owned thread is distinct from the first
+- the second worker-owned thread reaches `done`
+- the final run state is `done`
+
+## Assertions
+
+- `THREAD_ID_1 != THREAD_ID_2`
+- `status.data.run.status == "done"`
+- `status.data.tasks[0].status == "done"`
+- `show --thread THREAD_ID_1` reports the terminal thread state `failed`
+- `show --thread THREAD_ID_2` reports the terminal thread state `done`
+- the worker summary confirms that the retried attempt was a new thread rather than a reused one
+
+## Cleanup
+
+- use the default cleanup policy from [README.md](./README.md)
+- if the run fails, retain `TMPDIR` (including `coord.db`) for replay and manual inspection
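
The first three checks in the Assertions section lend themselves to automation. Below is a minimal sketch, assuming the `status` command prints a JSON object whose fields match the paths named there (`data.run.status` and `data.tasks[0].status`). The helper names `run_json` and `check_retry_run` are hypothetical and not part of the orch or inbox skills. The thread-state assertions are left to manual inspection because the plan does not specify the JSON shape of `show`.

```python
import json
import subprocess


def run_json(cmd):
    """Run a CLI command and parse its stdout as JSON (hypothetical helper)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def check_retry_run(status, thread_id_1, thread_id_2):
    """Return descriptions of failed assertions; an empty list means all passed.

    `status` is the parsed output of the `status` validation command. The
    field paths below are taken from the Assertions section of the plan.
    """
    failures = []

    # The retried attempt must live on a fresh thread, not a reused one.
    if thread_id_1 == thread_id_2:
        failures.append("THREAD_ID_1 must differ from THREAD_ID_2")

    data = status.get("data", {})

    # The run as a whole must have reached the terminal state "done".
    if data.get("run", {}).get("status") != "done":
        failures.append('status.data.run.status is not "done"')

    # The single task T1 must also be "done" after the retry.
    tasks = data.get("tasks") or []
    if not tasks or tasks[0].get("status") != "done":
        failures.append('status.data.tasks[0].status is not "done"')

    return failures
```

To use it, the main thread would pass the first validation command to `run_json` as an argument list (the `orch` executable path, then `--db`, the database path, `--json`, `status`, `--run`, `run_blog_skill_retry_001`) and hand the result to `check_retry_run` together with the two thread IDs resolved in step 7. Returning a list of failures rather than raising on the first one keeps every violated assertion visible in a single pass.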