docs: add execution roadmap workflow
@@ -29,6 +29,55 @@ Use these defaults unless a case file explicitly overrides them:
- require every agent to coordinate through the bundled CLI and shared SQLite DB instead of ordinary chat
- validate the final inbox state independently from the main thread after the agents stop

## How An Agent Runs These Cases

Use one test-runner agent to execute each case.

The test-runner agent is responsible for:

- reading this `README.md` first, then one specific case file
- creating an isolated temporary directory and SQLite DB path for that run
- launching the role agents described in `Agent Topology`
- injecting the same `skills/inbox/` bundle into every role agent
- passing each role agent the prompt text from the case file with concrete values substituted for `SKILL_PATH`, `TMPDIR`, and `THREAD_ID` when needed
- coordinating launch order or parallel start according to the case file
- collecting agent final summaries as evidence
- resolving the final `THREAD_ID`
- running the `Validation Commands` from the main thread after the role agents stop
- comparing the observed results against `Expected Outcomes` and `Assertions`
- returning a final pass/fail judgment with concrete evidence

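As an illustration of the setup steps above, the following Python sketch creates an isolated temporary directory and DB path, then substitutes concrete values into a prompt. The helper names `prepare_run` and `render_prompt`, the `inbox.db` file name, and the assumption that placeholders appear as bare tokens in the prompt text are all hypothetical; the case files and the bundled CLI define the real conventions.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def prepare_run(case_name: str) -> Tuple[Path, Path]:
    """Create an isolated temporary directory and a SQLite DB path for one run."""
    tmpdir = Path(tempfile.mkdtemp(prefix=f"{case_name}-"))
    # Hypothetical file name; the DB file itself is not created here.
    db_path = tmpdir / "inbox.db"
    return tmpdir, db_path


def render_prompt(template: str, skill_path: str, tmpdir: Path,
                  thread_id: Optional[str] = None) -> str:
    """Substitute concrete values for the placeholders named in this README.

    Assumes placeholders are written as bare tokens (SKILL_PATH, TMPDIR,
    THREAD_ID). THREAD_ID is replaced only when a value is supplied.
    """
    prompt = template.replace("SKILL_PATH", skill_path)
    prompt = prompt.replace("TMPDIR", str(tmpdir))
    if thread_id is not None:
        prompt = prompt.replace("THREAD_ID", thread_id)
    return prompt
```

Creating a fresh directory per run keeps one case's DB from leaking state into another, which is what makes the later independent validation meaningful.
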
The role agents are responsible for:

- acting only within the role assigned in the case file
- using the injected inbox skill rather than ad hoc repository discovery
- coordinating through the bundled CLI and shared DB
- reporting the concrete thread id, key command outcomes, and final observed state back to the test-runner agent

The test-runner agent should treat a case as passed only when:

- all role agents reach a final state without violating the case contract
- the independent validation commands succeed
- the final inbox state matches the assertions in the case file

The test-runner agent should treat a case as failed when:

- any role agent times out or stalls
- a required inbox action is skipped
- a role agent falls back to ordinary chat for critical coordination
- the final inbox state conflicts with the documented assertions

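The pass and fail criteria above can be read as one decision rule. The sketch below is a minimal Python rendering of that rule; the `CaseObservations` container, its field names, and the `judge` function are hypothetical and only illustrate how the listed conditions combine.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CaseObservations:
    """Hypothetical record of what the test-runner observed for one case."""
    agents_finished: bool   # False if any role agent timed out or stalled
    validation_ok: bool     # independent validation commands succeeded
    # Contract violations, e.g. falling back to ordinary chat for coordination.
    contract_violations: List[str] = field(default_factory=list)
    skipped_actions: List[str] = field(default_factory=list)
    failed_assertions: List[str] = field(default_factory=list)


def judge(obs: CaseObservations) -> Tuple[str, List[str]]:
    """Return ("pass" | "fail", reasons); any single failure reason fails the case."""
    reasons: List[str] = []
    if not obs.agents_finished:
        reasons.append("a role agent timed out or stalled")
    reasons += [f"contract violation: {v}" for v in obs.contract_violations]
    reasons += [f"skipped inbox action: {a}" for a in obs.skipped_actions]
    if not obs.validation_ok:
        reasons.append("independent validation commands did not succeed")
    reasons += [f"assertion not met: {a}" for a in obs.failed_assertions]
    return ("fail" if reasons else "pass"), reasons
```

Returning the reasons alongside the verdict keeps the judgment tied to concrete evidence rather than a bare boolean.
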
The test-runner agent should report results in this shape:

- `case`
- `db_path`
- `thread_id`
- `result`: `pass` or `fail`
- `agent_summaries`
- `validation_evidence`
- `assertion_checklist`
- `notes`

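One possible encoding of this report shape is sketched below in Python. Only the field names and the two allowed `result` values come from the list above; the `CaseReport` class, the value types chosen for each field, and the JSON serialization are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class CaseReport:
    """Hypothetical container mirroring the documented report fields."""
    case: str
    db_path: str
    thread_id: str
    result: str  # must be "pass" or "fail"
    agent_summaries: Dict[str, str] = field(default_factory=dict)      # assumed: role -> summary
    validation_evidence: List[str] = field(default_factory=list)       # assumed: command outputs
    assertion_checklist: Dict[str, bool] = field(default_factory=dict) # assumed: assertion -> met?
    notes: str = ""

    def __post_init__(self) -> None:
        # Reject anything other than the two documented result values.
        if self.result not in ("pass", "fail"):
            raise ValueError(f"result must be 'pass' or 'fail', got {self.result!r}")

    def to_json(self) -> str:
        """Serialize the report with fields in the documented order."""
        return json.dumps(asdict(self), indent=2)
```

For example, `CaseReport(case="example-case", db_path="/tmp/run/inbox.db", thread_id="t-1", result="pass").to_json()` yields a JSON object whose keys follow the order listed above; the values shown here are placeholders, not real run data.
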
## Default Timeouts

Use these defaults unless a case file explicitly overrides them: