docs: add execution roadmap workflow

2026-03-19 12:52:02 +08:00
parent 72d7caa552
commit b110bb24d9
8 changed files with 238 additions and 0 deletions
@@ -29,6 +29,55 @@ Use these defaults unless a case file explicitly overrides them:
- require every agent to coordinate through the bundled CLI and shared SQLite DB instead of ordinary chat
- validate the final inbox state independently from the main thread after the agents stop
## How An Agent Runs These Cases
Use one test-runner agent to execute each case.
The test-runner agent is responsible for:
- reading this `README.md` first, then one specific case file
- creating an isolated temporary directory and SQLite DB path for that run
- launching the role agents described in `Agent Topology`
- injecting the same `skills/inbox/` bundle into every role agent
- passing each role agent the prompt text from the case file with concrete values substituted for `SKILL_PATH`, `TMPDIR`, and `THREAD_ID` when needed
- starting the role agents in sequence or in parallel, as the case file specifies
- collecting each role agent's final summary as evidence
- resolving the final `THREAD_ID`
- running the `Validation Commands` from the main thread after the role agents stop
- comparing the observed results against `Expected Outcomes` and `Assertions`
- returning a final pass/fail judgment with concrete evidence
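
The setup steps above (an isolated directory and DB path per run, plus placeholder substitution in each role prompt) can be sketched as follows. This is a minimal sketch, not part of the bundled CLI. It assumes the case file writes placeholders as bare tokens such as `SKILL_PATH`. The helpers `prepare_run` and `render_prompt` and the file name `inbox.db` are hypothetical. Tokens with no value yet, such as `THREAD_ID` before a thread exists, are left untouched.

```python
import re
import tempfile
from pathlib import Path


def prepare_run(case_name: str) -> tuple[Path, Path]:
    """Create a fresh temp directory and choose a DB path inside it for one case run."""
    # One new directory per run, so repeated or concurrent runs never share a DB.
    tmpdir = Path(tempfile.mkdtemp(prefix=f"inbox-{case_name}-"))
    # Hypothetical file name; this only chooses the path, nothing is created here.
    db_path = tmpdir / "inbox.db"
    return tmpdir, db_path


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Replace bare placeholder tokens (assumed format, e.g. SKILL_PATH) with concrete values."""
    if not values:
        return template
    # \b keeps MY_TMPDIR from matching TMPDIR; tokens absent from `values` stay as-is.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, values)) + r")\b")
    # A function replacement avoids backslash-escape surprises in path values.
    return pattern.sub(lambda m: values[m.group(1)], template)
```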

The role agents are responsible for:
- acting only within the role assigned in the case file
- using the injected inbox skill rather than ad hoc repository discovery
- coordinating through the bundled CLI and shared DB
- reporting the concrete thread id, key command outcomes, and final observed state back to the test-runner agent
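
The test-runner can use these role agent reports to resolve the final `THREAD_ID`. Here is a minimal sketch. It assumes each summary has been parsed into a hypothetical `RoleSummary` record. It also assumes a healthy run converges on exactly one thread id. The field names are illustrative and not a schema defined by the inbox skill.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RoleSummary:
    """Illustrative shape of one role agent's report back to the test-runner."""
    role: str
    thread_id: Optional[str]
    command_outcomes: list[str] = field(default_factory=list)
    final_state: str = ""


def resolve_thread_id(summaries: list[RoleSummary]) -> str:
    """Return the single thread id the role agents agree on, or raise if they disagree."""
    ids = {s.thread_id for s in summaries if s.thread_id}
    if len(ids) != 1:
        # Zero ids (nobody reported one) or several ids are both treated as errors.
        raise ValueError(f"expected exactly one thread id, got {sorted(ids)}")
    return ids.pop()
```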

The test-runner agent should treat a case as passed only when:
- all role agents reach a final state without violating the case contract
- the independent validation commands succeed
- the final inbox state matches the assertions in the case file
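
Running the validation commands independently from the main thread can be sketched with Python's `subprocess` module. This sketch assumes the case file lists each command as a shell string. The 60-second timeout is arbitrary and not taken from `Default Timeouts`. `CommandEvidence`, `run_validation`, and `validation_succeeded` are hypothetical names.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class CommandEvidence:
    command: str
    returncode: int
    stdout: str
    stderr: str


def run_validation(commands: list[str], timeout_s: float = 60.0) -> list[CommandEvidence]:
    """Run each validation command and keep its exit code and output as evidence."""
    evidence = []
    for cmd in commands:
        # shell=True because the commands are assumed to be shell strings.
        # A timeout raises subprocess.TimeoutExpired; callers should count that as a failure.
        proc = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=timeout_s)
        evidence.append(CommandEvidence(cmd, proc.returncode, proc.stdout, proc.stderr))
    return evidence


def validation_succeeded(evidence: list[CommandEvidence]) -> bool:
    """Validation passes only if every command exited with status 0."""
    return all(e.returncode == 0 for e in evidence)
```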

The test-runner agent should treat a case as failed when:
- any role agent times out or stalls
- a required inbox action is skipped
- a role agent falls back to ordinary chat for critical coordination
- the final inbox state conflicts with the documented assertions
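
One reading of the pass and fail conditions above is that any single failure condition makes the case fail, and a case passes only when none apply. Under that assumption, the judgment can be sketched as below. `RunObservations` is a hypothetical record standing in for whatever the test-runner actually collects.

```python
from dataclasses import dataclass, field


@dataclass
class RunObservations:
    """Hypothetical summary of what the test-runner observed during one run."""
    agents_finished: bool  # every role agent reached a final state
    validation_ok: bool  # the independent validation commands succeeded
    # e.g. "skipped a required inbox action", "used ordinary chat for coordination"
    contract_violations: list[str] = field(default_factory=list)
    failed_assertions: list[str] = field(default_factory=list)


def judge(obs: RunObservations) -> tuple[str, list[str]]:
    """Return ("pass" or "fail", reasons); any reason at all means fail."""
    reasons = []
    if not obs.agents_finished:
        reasons.append("a role agent timed out or stalled")
    reasons.extend(obs.contract_violations)
    if not obs.validation_ok:
        reasons.append("independent validation failed")
    reasons.extend(f"assertion failed: {a}" for a in obs.failed_assertions)
    return ("pass" if not reasons else "fail", reasons)
```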

The test-runner agent should report results with these fields:
- `case`
- `db_path`
- `thread_id`
- `result`: `pass` or `fail`
- `agent_summaries`
- `validation_evidence`
- `assertion_checklist`
- `notes`
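
The report fields above map onto a small record. This sketch assumes JSON output. The README names the fields but not their types, so each type here is a guess. `CaseReport` is a hypothetical name.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Literal


@dataclass
class CaseReport:
    """One case run's report; field names follow the README, types are assumptions."""
    case: str
    db_path: str
    thread_id: str
    result: Literal["pass", "fail"]
    agent_summaries: dict[str, str] = field(default_factory=dict)  # role -> final summary
    validation_evidence: list[str] = field(default_factory=list)  # command output excerpts
    assertion_checklist: dict[str, bool] = field(default_factory=dict)  # assertion -> held?
    notes: str = ""

    def __post_init__(self) -> None:
        # Literal is not enforced at runtime, so check the allowed values explicitly.
        if self.result not in ("pass", "fail"):
            raise ValueError(f"result must be 'pass' or 'fail', got {self.result!r}")

    def to_json(self) -> str:
        # asdict keeps field declaration order, so the JSON keys follow the README order.
        return json.dumps(asdict(self), indent=2)
```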
## Default Timeouts
Use these defaults unless a case file explicitly overrides them: