docs: add execution roadmap workflow

2026-03-19 12:52:02 +08:00
parent 72d7caa552
commit b110bb24d9
8 changed files with 238 additions and 0 deletions
@@ -29,6 +29,55 @@ Use these defaults unless a case file explicitly overrides them:
 - require every agent to coordinate through the bundled CLI and shared SQLite DB instead of ordinary chat
 - validate the final inbox state independently from the main thread after the agents stop

+## How An Agent Runs These Cases
+
+Use one test-runner agent to execute each case.
+
+The test-runner agent is responsible for:
+
+- reading this `README.md` first, then one specific case file
+- creating an isolated temporary directory and SQLite DB path for that run
+- launching the role agents described in `Agent Topology`
+- injecting the same `skills/inbox/` bundle into every role agent
+- passing each role agent the prompt text from the case file, with concrete values substituted for the `SKILL_PATH`, `TMPDIR`, and `THREAD_ID` placeholders wherever the prompt uses them
+- coordinating launch order or parallel start according to the case file
+- collecting each role agent's final summary as evidence
+- resolving the final `THREAD_ID`
+- running the `Validation Commands` from the main thread after the role agents stop
+- comparing the observed results against `Expected Outcomes` and `Assertions`
+- returning a final pass/fail judgment with concrete evidence
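
The setup steps in the list above (isolated directory, DB path, placeholder substitution) could be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the helper names `prepare_run` and `render_prompt`, the `inbox.db` file name, and the assumption that placeholders appear in prompts as the bare tokens `SKILL_PATH`, `TMPDIR`, and `THREAD_ID` are all hypothetical.

```python
import re
import tempfile
from pathlib import Path


def prepare_run(case_name: str) -> tuple[Path, Path]:
    """Create an isolated temp directory and choose a DB path inside it.

    The DB file itself is not created here; "inbox.db" is a hypothetical name.
    """
    run_dir = Path(tempfile.mkdtemp(prefix=f"inbox-{case_name}-"))
    db_path = run_dir / "inbox.db"
    return run_dir, db_path


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Replace placeholder tokens with concrete values in a single pass.

    Assumes placeholders are bare tokens such as SKILL_PATH. A single regex
    pass avoids re-substituting text that a previous replacement introduced.
    """
    if not values:
        return template
    pattern = re.compile("|".join(re.escape(token) for token in values))
    return pattern.sub(lambda match: values[match.group(0)], template)
```

A runner would call `prepare_run` once per case, then pass the resulting paths (as strings) to `render_prompt` for each role agent's prompt.
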
+
+The role agents are responsible for:
+
+- acting only within the role assigned in the case file
+- using the injected inbox skill rather than ad hoc repository discovery
+- coordinating through the bundled CLI and shared DB
+- reporting the concrete thread id, key command outcomes, and final observed state back to the test-runner agent
+
+The test-runner agent should treat a case as passed only when all of the following hold:
+
+- all role agents reach a final state without violating the case contract
+- the independent validation commands succeed
+- the final inbox state matches the assertions in the case file
+
+The test-runner agent should treat a case as failed when any of the following occurs:
+
+- any role agent times out or stalls
+- a required inbox action is skipped
+- a role agent falls back to ordinary chat for critical coordination
+- the final inbox state conflicts with the documented assertions
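
The pass and fail conditions above amount to "pass only if nothing went wrong", which can be expressed as a small decision function. The sketch below assumes the runner has already reduced its observations to booleans; the `CaseObservations` type and its field names are hypothetical and are not defined by the case files.

```python
from dataclasses import dataclass


@dataclass
class CaseObservations:
    """Hypothetical summary of what the test-runner observed for one case."""
    agents_finished_cleanly: bool  # no timeout, stall, or contract violation
    required_actions_done: bool    # no required inbox action was skipped
    used_chat_fallback: bool       # critical coordination went through chat
    validation_succeeded: bool     # independent validation commands passed
    assertions_hold: bool          # final inbox state matches the case file


def judge(obs: CaseObservations) -> tuple[str, list[str]]:
    """Return ("pass" or "fail", reasons); any single failure fails the case."""
    reasons: list[str] = []
    if not obs.agents_finished_cleanly:
        reasons.append("a role agent timed out, stalled, or broke the case contract")
    if not obs.required_actions_done:
        reasons.append("a required inbox action was skipped")
    if obs.used_chat_fallback:
        reasons.append("a role agent used ordinary chat for critical coordination")
    if not obs.validation_succeeded:
        reasons.append("independent validation commands did not succeed")
    if not obs.assertions_hold:
        reasons.append("final inbox state conflicts with the documented assertions")
    return ("fail" if reasons else "pass"), reasons
```

Collecting every failure reason, rather than stopping at the first, gives the runner concrete evidence to include in its final judgment.
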
+
+The test-runner agent should report results with these fields:
+
+- `case`
+- `db_path`
+- `thread_id`
+- `result`: `pass` or `fail`
+- `agent_summaries`
+- `validation_evidence`
+- `assertion_checklist`
+- `notes`
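
One way to keep the report shape consistent across cases is to model it as a small record and serialize it. The field names below come from the list above; the field types, the JSON serialization, and the `CaseReport` class itself are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CaseReport:
    """Hypothetical container mirroring the documented report fields."""
    case: str
    db_path: str
    thread_id: str
    result: str  # must be "pass" or "fail"
    agent_summaries: dict[str, str] = field(default_factory=dict)
    validation_evidence: list[str] = field(default_factory=list)
    assertion_checklist: dict[str, bool] = field(default_factory=dict)
    notes: str = ""

    def __post_init__(self) -> None:
        # Reject anything other than the two documented result values.
        if self.result not in ("pass", "fail"):
            raise ValueError(f"result must be 'pass' or 'fail', got {self.result!r}")

    def to_json(self) -> str:
        """Serialize the report with fields in their documented order."""
        return json.dumps(asdict(self), indent=2)
```
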
+
 ## Default Timeouts

 Use these defaults unless a case file explicitly overrides them: