Case: multi-agent-roundtrip-through-bundled-cli
Test Type
This is a forward-test and a multi-agent end-to-end skill validation.
The goal is not to validate one CLI subcommand in isolation. The goal is to validate that two real agents can complete a closed-loop coordination flow through the packaged skills/inbox/ skill and bundled CLI.
Purpose
Validate that all of the following can be true at the same time:
- both agents can explicitly use $inbox
- both agents coordinate through the bundled ./assets/inbox against the same SQLite DB
- the worker follows the protocol fetch -> claim -> update -> wait-reply -> done
- the leader follows the protocol init -> send -> show/reply -> show
- the final inbox thread state and message history match the expected contract
Preconditions
- skill path exists: SKILL_PATH=skills/inbox
- bundled CLI executable exists: SKILL_PATH/assets/inbox
- use an empty temporary directory TMPDIR
- test database path is TMPDIR/coord.db
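The preconditions above can be checked mechanically before any agent is launched. The following is a minimal sketch, not part of the skill: the helper names make_workdir and check_preconditions are hypothetical, and the only layout assumed is the one stated above (SKILL_PATH/assets/inbox and TMPDIR/coord.db).

```python
import os
import tempfile


def make_workdir():
    """Create an empty TMPDIR and return it with the coord.db path inside it."""
    tmpdir = tempfile.mkdtemp(prefix="inbox-skill-fwd.")
    return tmpdir, os.path.join(tmpdir, "coord.db")


def check_preconditions(skill_path, tmpdir):
    """Return a list of problems; an empty list means the case can start.

    Hypothetical helper: it checks only the paths named in the preconditions.
    """
    problems = []
    cli = os.path.join(skill_path, "assets", "inbox")
    if not os.path.isdir(skill_path):
        problems.append(f"skill path missing: {skill_path}")
    if not (os.path.isfile(cli) and os.access(cli, os.X_OK)):
        problems.append(f"bundled CLI missing or not executable: {cli}")
    if not os.path.isdir(tmpdir):
        problems.append(f"temporary directory missing: {tmpdir}")
    elif os.listdir(tmpdir):
        problems.append(f"temporary directory is not empty: {tmpdir}")
    return problems
```

Returning a list of problems rather than raising on the first one makes a failed setup easier to diagnose in a single pass.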
Agent Topology
- leader
- worker-a
Inputs
Leader Prompt
Use $inbox at SKILL_PATH to act as leader on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) initialize the DB, 2) send exactly one task to worker-a asking them to implement a small logging choice, 3) monitor the thread until worker-a asks one blocked question, 4) answer the blocked question with a clear decision ('use stdout'), 5) wait until worker-a marks the thread done, 6) inspect the final thread with show, then stop. Do not use ordinary chat to coordinate with the other agent.
Worker Prompt
Use $inbox at SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) wait until there is pending work for worker-a, 2) fetch it, 3) claim it, 4) send an in_progress update, 5) send a blocked update with one precise question asking whether logging should go to stdout or stderr, 6) wait for a reply, 7) finish the task with done using the received decision, 8) stop. Do not use ordinary chat to coordinate with the other agent.
Execution Parameters
- use the shared execution contract from README.md
- use the shared timeout defaults from README.md
- do not override the default cleanup policy
Execution Steps
- Inject the same skills/inbox/ skill into both real agents
- Point both agents at the same database path TMPDIR/coord.db
- Launch leader and worker-a in parallel
- Wait for both agents to finish
- Resolve THREAD_ID from the agent outputs or inbox history
- Independently run the validation commands from the main thread
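Resolving THREAD_ID from the agent outputs can be sketched as a small text scan. This is an illustration under one loud assumption: thread IDs look like the one in the recorded example run, that is, thr_ followed by 32 lowercase hex characters. The function name resolve_thread_id is hypothetical.

```python
import re

# Assumption: thread IDs follow the shape of the recorded example,
# "thr_" followed by 32 lowercase hex characters.
THREAD_ID_RE = re.compile(r"\bthr_[0-9a-f]{32}\b")


def resolve_thread_id(*agent_outputs):
    """Return the single thread ID mentioned across the given output strings.

    Raises ValueError when zero or several distinct IDs appear, because this
    case expects the leader to create exactly one thread.
    """
    found = set()
    for text in agent_outputs:
        found.update(THREAD_ID_RE.findall(text))
    if len(found) != 1:
        raise ValueError(f"expected exactly one thread ID, found {sorted(found)}")
    return found.pop()
```

Failing on more than one distinct ID is deliberate: a second thread would already violate the "exactly one task" instruction in the leader prompt.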
Validation Commands
SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json list --assigned-to worker-a
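For an automated harness, the two commands above can be assembled and run from Python. This is a sketch only: validation_commands and run_json are hypothetical helper names, the flags are exactly those shown above, and the behavior of --json (one JSON document on stdout, non-zero exit on failure) is an assumption rather than documented CLI behavior.

```python
import json
import subprocess


def validation_commands(skill_path, tmpdir, thread_id):
    """Build the two validation command lines as argv lists."""
    cli = f"{skill_path}/assets/inbox"
    base = [cli, "--db", f"{tmpdir}/coord.db", "--json"]
    return [
        base + ["show", "--thread", thread_id],
        base + ["list", "--assigned-to", "worker-a"],
    ]


def run_json(argv):
    """Run one command and parse its stdout as JSON.

    Assumption: with --json the CLI prints a single JSON document on stdout
    and exits non-zero on failure (check=True then raises CalledProcessError).
    """
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

Passing argv lists instead of a shell string avoids quoting problems if TMPDIR contains spaces.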
Expected Outcomes
- leader successfully runs init
- leader successfully sends one new thread to worker-a
- worker-a successfully fetches that thread and successfully claims it
- worker-a emits one progress message
- worker-a emits one question message focused on stdout vs stderr
- leader successfully emits one answer message with the explicit decision Use stdout.
- worker-a successfully consumes that answer through wait-reply
- worker-a successfully emits done
- show returns thread.status == "done"
Assertions
- show contains at least the following message kinds in order:
  - task
  - event (thread claimed)
  - progress
  - question
  - answer
  - result
- question.body == "Should logging go to stdout or stderr?"
- answer.body == "Use stdout."
- the final result message explicitly states that logging uses stdout
- list --assigned-to worker-a shows the thread and its status is done
- coordination happens primarily through the inbox thread rather than ordinary chat
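The show-based assertions above can be expressed as a checker over the parsed JSON. This is a sketch under a stated assumption about the output shape, which this case does not specify: a top-level thread object with a status field, and a messages list whose items carry kind and body. The function names are hypothetical, and the substring test for stdout is only a loose proxy for "explicitly states".

```python
EXPECTED_KINDS = ["task", "event", "progress", "question", "answer", "result"]


def kinds_in_order(messages, expected=EXPECTED_KINDS):
    """True if the expected kinds appear in order; extra messages are allowed."""
    remaining = iter(m.get("kind") for m in messages)
    # `kind in remaining` consumes the iterator up to the first match,
    # so this is an ordered-subsequence check.
    return all(kind in remaining for kind in expected)


def first_of_kind(messages, kind):
    """Return the first message with the given kind, or None."""
    return next((m for m in messages if m.get("kind") == kind), None)


def check_show(show):
    """Return a list of failed assertions for the parsed `show` output.

    Assumed shape: {"thread": {"status": ...}, "messages": [{"kind", "body"}]}.
    """
    failures = []
    messages = show.get("messages", [])
    if show.get("thread", {}).get("status") != "done":
        failures.append("thread.status is not done")
    if not kinds_in_order(messages):
        failures.append("message kinds are missing or out of order")
    question = first_of_kind(messages, "question") or {}
    if question.get("body") != "Should logging go to stdout or stderr?":
        failures.append("unexpected question.body")
    answer = first_of_kind(messages, "answer") or {}
    if answer.get("body") != "Use stdout.":
        failures.append("unexpected answer.body")
    results = [m for m in messages if m.get("kind") == "result"]
    if not results or "stdout" not in (results[-1].get("body") or ""):
        failures.append("final result does not mention stdout")
    return failures
```

The ordering check tolerates additional messages between the required kinds, which matches the "at least the following" wording of the first assertion.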
Cleanup
- use the default cleanup policy from README.md
- if the run fails, retain TMPDIR and coord.db for replay and manual inspection
Recorded Example Run
This case already has one reference forward-test run:
- DB: /tmp/inbox-skill-fwd.j9kKvp/coord.db
- Thread: thr_48d6f6a77eff4c2e88ce80e8fdc05da3
That run passed. The thread history contained task -> event -> progress -> question -> answer -> result, and the final thread state was done.