docs: add inbox skill test scenarios
# Case: `leader-cancels-claimed-thread-through-bundled-cli`

## Test Type

This is a `forward-test` and a terminal-state intervention validation.

The goal is to verify that a leader and worker can both observe a thread transition to `cancelled` through the bundled inbox skill while the thread is actively claimed.

## Purpose

Validate that all of the following can be true at the same time:

- the worker can fetch and claim a real thread through the skill
- the leader can cancel that thread through the same bundled CLI
- the final thread state is `cancelled`
- both parties can inspect the terminal state from inbox history

## Preconditions

- skill path exists: `SKILL_PATH=skills/inbox`
- bundled CLI executable exists: `SKILL_PATH/assets/inbox`
- use an empty temporary directory `TMPDIR`
- test database path is `TMPDIR/coord.db`

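
The preconditions above can be checked mechanically before any agent is launched. The following is a minimal sketch, not part of the skill: the function name `check_preconditions` is hypothetical, and it assumes the skill path and the temporary directory are passed in as plain directory paths.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify the preconditions for this case.
# Usage: check_preconditions <skill_path> <tmpdir>
check_preconditions() {
  local skill_path="$1" tmpdir="$2"

  # The skill directory must exist.
  [ -d "$skill_path" ] || { echo "missing skill path: $skill_path" >&2; return 1; }

  # The bundled CLI must exist and be executable.
  [ -x "$skill_path/assets/inbox" ] || { echo "missing CLI: $skill_path/assets/inbox" >&2; return 1; }

  # The temporary directory must exist and be empty.
  [ -d "$tmpdir" ] || { echo "missing tmpdir: $tmpdir" >&2; return 1; }
  [ -z "$(ls -A "$tmpdir")" ] || { echo "tmpdir not empty: $tmpdir" >&2; return 1; }

  # Print the database path the run should use.
  echo "$tmpdir/coord.db"
}
```

A run could then capture the database path with something like `DB_PATH="$(check_preconditions skills/inbox "$(mktemp -d)")"`.
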
## Agent Topology

- `leader`
- `worker-a`

## Inputs

### Leader Prompt

```text
Use $inbox at SKILL_PATH to act as leader on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) initialize the DB, 2) send exactly one task to worker-a, 3) wait until worker-a has claimed the thread or reported in_progress, 4) cancel the thread with a clear reason, 5) inspect the final thread with show, 6) stop. Do not use ordinary chat to coordinate with the other agent.
```

### Worker Prompt

```text
Use $inbox at SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) fetch pending work, 2) claim it, 3) send an in_progress update, 4) keep monitoring the thread until it reaches a terminal state, 5) stop after reporting the final status you observed. Do not use ordinary chat to coordinate with the other agent.
```

## Execution Parameters

- use the shared execution contract from [README.md](./README.md)
- use the shared timeout defaults from [README.md](./README.md)
- do not override the default cleanup policy

## Execution Steps

1. Inject the same `skills/inbox/` skill into both real agents
2. Point both agents at the same database path `TMPDIR/coord.db`
3. Launch `leader` and `worker-a` in parallel
4. Wait for both agents to finish
5. Resolve `THREAD_ID` from the agent outputs or inbox history
6. Independently run the validation commands from the main thread

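
Steps 3 and 4 above (parallel launch, then wait for both) can be sketched in shell as follows. This is only an illustration under stated assumptions: how a real agent is started depends on the harness described in the README, so `run_agent` is a hypothetical placeholder that merely echoes its arguments, and `run_case` is likewise a made-up name.

```bash
#!/usr/bin/env bash
# Hypothetical launcher: a real harness would start an agent here.
# This placeholder only echoes what it was asked to run.
run_agent() {
  local role="$1" prompt_file="$2"
  echo "agent=$role prompt=$prompt_file"
}

# Launch both roles in parallel (step 3) and wait for both (step 4).
# Returns non-zero if either agent exits with a failure status.
run_case() {
  local leader_prompt="$1" worker_prompt="$2"
  local leader_pid worker_pid status=0

  run_agent leader "$leader_prompt" &
  leader_pid=$!
  run_agent worker-a "$worker_prompt" &
  worker_pid=$!

  # Wait on each PID separately so each exit status is observed.
  wait "$leader_pid" || status=1
  wait "$worker_pid" || status=1
  return "$status"
}
```

Waiting on each PID individually, rather than calling a bare `wait`, keeps a failure in either agent visible to the caller.
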
## Validation Commands

```bash
SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json list --status cancelled
```

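
In the commands above, `SKILL_PATH`, `TMPDIR`, and `THREAD_ID` are placeholders. One way to run them with real values is a small wrapper like the sketch below. The function name `run_validation` is hypothetical, and only the flags already shown in this document are used.

```bash
#!/usr/bin/env bash
# Run the two validation commands with the placeholders expanded.
# Usage: run_validation <skill_path> <tmpdir> <thread_id>
run_validation() {
  local inbox="$1/assets/inbox" db="$2/coord.db" thread_id="$3"

  # Inspect the single thread, then list all cancelled threads.
  "$inbox" --db "$db" --json show --thread "$thread_id" || return 1
  "$inbox" --db "$db" --json list --status cancelled || return 1
}
```
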
## Expected Outcomes

- `worker-a` successfully claims the thread
- `worker-a` emits one `progress` message
- `leader` successfully emits `cancel` with a reason
- the final thread status is `cancelled`
- the worker reports that it observed the cancelled terminal state

## Assertions

- `show` contains at least `task -> event -> progress -> control`
- the final thread status is `cancelled`
- the terminal message or thread history captures the cancel reason
- `list --status cancelled` returns the thread

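
The first assertion above is an ordered-subsequence check: the four kinds must appear in that order, but other entries may sit between them. A minimal sketch of that check follows. The function name `has_ordered_kinds` is hypothetical, and the sketch assumes the kinds have already been extracted from the `show` output, one per line, since this document does not specify the JSON layout.

```bash
#!/usr/bin/env bash
# Succeeds if the expected kinds (arguments) occur in the given order within
# the observed kinds (stdin, one per line). Extra observed kinds are allowed.
has_ordered_kinds() {
  local kind
  while IFS= read -r kind; do
    # Nothing left to match: the whole sequence was found.
    [ "$#" -eq 0 ] && break
    # Advance to the next expected kind when the current one matches.
    [ "$kind" = "$1" ] && shift
  done
  # All expected kinds were consumed only if none remain.
  [ "$#" -eq 0 ]
}
```

For example, `printf '%s\n' task event progress control | has_ordered_kinds task event progress control` succeeds, while an observed history with `progress` before `event` fails.
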
## Cleanup

- use the default cleanup policy from [README.md](./README.md)
- if the run fails, retain `TMPDIR` and `coord.db` for replay and manual inspection