Add orch skill forward test evidence

2026-03-19 18:36:31 +08:00
parent d17b5ebfbd
commit e9cbb15c2d
10 changed files with 1036 additions and 0 deletions
@@ -0,0 +1,66 @@
+# Title
+
+Direct Replay For Orch Skill Cases
+
+## Status
+
+- `completed`
+
+## Owner
+
+- codex
+
+## Started At
+
+- `2026-03-19`
+
+## Goal
+
+- Execute the documented `docs/tests/orch-skill/` scenarios against the bundled `skills/orch/assets/orch` and `skills/inbox/assets/inbox` binaries, capture concrete evidence, and sync the repo docs with the observed results.
+
+## Scope
+
+- add a reusable local runner for the five documented orch-skill scenarios
+- run the scenarios and capture per-case evidence
+- update the orch-skill docs with recorded runs and note the execution mode
+- update the implementation roadmap to reflect the new replay coverage
+
+## Checklist
+
+- [x] Review the orch-skill case docs and bundled CLI surfaces.
+- [x] Add a reusable direct replay runner for the five orch-skill scenarios.
+- [x] Execute the runner and collect evidence for all five cases.
+- [x] Update the orch-skill docs with recorded example runs and execution notes.
+- [x] Update the implementation roadmap and archive this execution roadmap.
+
+## Files
+
+- `scripts/run_orch_skill_forward_tests.sh`
+- `docs/tests/orch-skill/README.md`
+- `docs/tests/orch-skill/leader-run-dispatch-reconcile-through-bundled-cli.md`
+- `docs/tests/orch-skill/leader-blocked-answer-resume-through-bundled-cli.md`
+- `docs/tests/orch-skill/strict-worktree-dispatch-to-cleanup-through-bundled-cli.md`
+- `docs/tests/orch-skill/leader-retries-failed-task-through-bundled-cli.md`
+- `docs/tests/orch-skill/leader-reassigns-blocked-task-through-bundled-cli.md`
+- `docs/implementation-roadmap.md`
+- `docs/roadmaps/archive/orch-skill-direct-replay.md`
+
+## Decisions
+
+- Use direct bundled-CLI replay instead of spawning Codex role agents in this turn, because the current session does not permit subagent delegation unless the user explicitly asks for it.
+- Keep the replay runner repo-local so the same scenarios can be rerun later without reconstructing the command flow by hand.
+
+## Blockers
+
+- none
+
+## Next Step
+
+- rerun `scripts/run_orch_skill_forward_tests.sh` whenever the bundled skill binaries or orch-skill case docs change; add true multi-agent forward coverage later if explicit subagent execution is needed
+
+## Completion Summary
+
+- Added `scripts/run_orch_skill_forward_tests.sh` as a reusable direct bundled-CLI replay runner for the five documented orch-skill scenarios.
+- Executed the runner on `2026-03-19`; all five scenarios passed and produced per-case JSON evidence under a temporary output root.
+- Updated `docs/tests/orch-skill/README.md` plus all five case files with recorded example runs and explicit execution-mode notes.
+- Updated `docs/implementation-roadmap.md` to record the new replay runner and captured orch-skill execution evidence.
@@ -0,0 +1,67 @@
+# Title
+
+Real Subagent Forward Tests For Orch Skill
+
+## Status
+
+- `completed`
+
+## Owner
+
+- codex
+
+## Started At
+
+- `2026-03-19`
+
+## Goal
+
+- Execute the documented `docs/tests/orch-skill/` scenarios using real spawned role agents with injected `skills/orch/` and `skills/inbox/`, then record concrete pass/fail evidence and sync the repository docs.
+
+## Scope
+
+- validate subagent skill injection for project-local orch and inbox skills
+- run the five documented orch-skill forward cases with real leader and worker subagents
+- collect main-thread validation evidence and agent summaries
+- update the orch-skill docs and implementation roadmap with the real forward-test results
+
+## Checklist
+
+- [x] Re-read the orch-skill shared execution contract and worker skill constraints.
+- [x] Validate project-local skill injection with a small spawned-agent probe.
+- [x] Execute the five orch-skill cases with real spawned role agents and collect evidence.
+- [x] Update the orch-skill docs and implementation roadmap with the real forward-test results.
+- [x] Archive this execution roadmap with a completion summary.
+
+## Files
+
+- `docs/tests/orch-skill/README.md`
+- `docs/tests/orch-skill/leader-run-dispatch-reconcile-through-bundled-cli.md`
+- `docs/tests/orch-skill/leader-blocked-answer-resume-through-bundled-cli.md`
+- `docs/tests/orch-skill/strict-worktree-dispatch-to-cleanup-through-bundled-cli.md`
+- `docs/tests/orch-skill/leader-retries-failed-task-through-bundled-cli.md`
+- `docs/tests/orch-skill/leader-reassigns-blocked-task-through-bundled-cli.md`
+- `docs/implementation-roadmap.md`
+- `docs/roadmaps/archive/orch-skill-real-forward-test.md`
+
+## Decisions
+
+- Use real spawned role agents per case instead of the direct replay runner, because the user explicitly asked for true tests with subagents.
+- Keep the main thread responsible for DB setup, fixture creation, and independent validation so the final judgment does not rely only on role-agent self-reporting.
+- Fall back from `fork_context: true` to `fork_context: false` for the real case runs, because the first attempt with the wider forked context stalled and mis-executed the worker-side contract in this repo.
+- For the longer `retry` and `reassign` cases, keep one leader agent active across staged prompts instead of one long monolithic prompt, because staged execution proved more reliable while still preserving a real agent-owned `orch` flow.
+
+## Blockers
+
+- none
+
+## Next Step
+
+- rerun the same five cases whenever the packaged skill binaries or case docs change; consider adding the same real subagent coverage for `council-review` if that surface needs parity
+
+## Completion Summary
+
+- Verified both project-local skill bundles with spawned-agent help-command probes before the real runs.
+- Collected successful real subagent evidence for all five orch-skill cases under `/tmp/orch-skill-subagents.J1XWgs`.
+- Main-thread validation confirmed that the final, successful run of each of the five cases reached the expected `orch` and `inbox` states.
+- Updated `docs/tests/orch-skill/README.md`, all five case files, and `docs/implementation-roadmap.md` to record the new real forward-test coverage.