ai-workflow-skill/docs/tests/inbox-skill/multi-agent-roundtrip-through-bundled-cli.md

Case: multi-agent-roundtrip-through-bundled-cli

Test Type

This is a forward test: a multi-agent, end-to-end validation of the skill.

The goal is not to validate one CLI subcommand in isolation, but to confirm that two real agents can complete a closed-loop coordination flow through the packaged skills/inbox/ skill and its bundled CLI.

Purpose

Validate that all of the following hold in the same run:

  • both agents can explicitly use $inbox
  • both agents coordinate through the bundled ./assets/inbox against the same SQLite DB
  • the worker follows the protocol fetch -> claim -> update -> wait-reply -> done
  • the leader follows the protocol init -> send -> show/reply -> show
  • the final inbox thread state and message history match the expected contract

Preconditions

  • skill path exists: SKILL_PATH=skills/inbox
  • bundled CLI executable exists: SKILL_PATH/assets/inbox
  • use an empty temporary directory TMPDIR
  • test database path is TMPDIR/coord.db
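
The preconditions above can be checked mechanically before any agent is launched. The following is a minimal sketch, not part of the skill: the function name check_preconditions and its return shape are hypothetical, and it assumes the bundled CLI is a regular file with the executable bit set.

```python
import os
from pathlib import Path


def check_preconditions(skill_path: str, tmpdir: str) -> list[str]:
    """Return human-readable precondition failures (empty list if all hold)."""
    problems = []
    skill = Path(skill_path)
    cli = skill / "assets" / "inbox"  # bundled CLI location from the preconditions
    tmp = Path(tmpdir)

    if not skill.is_dir():
        problems.append(f"skill path missing: {skill}")
    if not (cli.is_file() and os.access(cli, os.X_OK)):
        problems.append(f"bundled CLI missing or not executable: {cli}")
    if not tmp.is_dir():
        problems.append(f"temporary directory missing: {tmp}")
    elif any(tmp.iterdir()):
        # The case requires an empty directory so coord.db starts fresh.
        problems.append(f"temporary directory is not empty: {tmp}")
    return problems
```

An empty return value means the run can proceed; the database path is then simply TMPDIR/coord.db.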

Agent Topology

  • leader
  • worker-a

Inputs

Leader Prompt

Use $inbox at SKILL_PATH to act as leader on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) initialize the DB, 2) send exactly one task to worker-a asking them to implement a small logging choice, 3) monitor the thread until worker-a asks one blocked question, 4) answer the blocked question with a clear decision ('use stdout'), 5) wait until worker-a marks the thread done, 6) inspect the final thread with show, then stop. Do not use ordinary chat to coordinate with the other agent.

Worker Prompt

Use $inbox at SKILL_PATH to act as worker-a on SQLite DB TMPDIR/coord.db. Only coordinate through the bundled inbox CLI from the skill. Workflow: 1) wait until there is pending work for worker-a, 2) fetch it, 3) claim it, 4) send an in_progress update, 5) send a blocked update with one precise question asking whether logging should go to stdout or stderr, 6) wait for a reply, 7) finish the task with done using the received decision, 8) stop. Do not use ordinary chat to coordinate with the other agent.

Execution Parameters

  • use the shared execution contract from README.md
  • use the shared timeout defaults from README.md
  • do not override the default cleanup policy

Execution Steps

  1. Inject the same skills/inbox/ skill into both real agents
  2. Point both agents at the same database path TMPDIR/coord.db
  3. Launch leader and worker-a in parallel
  4. Wait for both agents to finish
  5. Resolve THREAD_ID from the agent outputs or inbox history
  6. Independently run the validation commands from the main (orchestrating) session, not from either agent
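
Step 5 can be sketched as a small text scan over the captured agent outputs. This is an illustration under one assumption: that thread IDs always look like the one in the recorded run (thr_ followed by 32 lowercase hex characters). The helper name resolve_thread_id is hypothetical.

```python
import re

# Assumed ID shape, inferred only from the recorded example run:
# "thr_" followed by 32 lowercase hexadecimal characters.
THREAD_ID_RE = re.compile(r"\bthr_[0-9a-f]{32}\b")


def resolve_thread_id(*agent_outputs: str) -> str:
    """Return the single thread ID mentioned across all agent outputs."""
    found = {m for text in agent_outputs for m in THREAD_ID_RE.findall(text)}
    if len(found) != 1:
        # Zero IDs or several distinct IDs both violate "exactly one task".
        raise ValueError(f"expected exactly one thread ID, found {sorted(found)}")
    return found.pop()
```

Failing when more than one distinct ID appears is deliberate: the leader prompt asks for exactly one task, so a second thread is itself a sign that the run went wrong.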

Validation Commands

SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json show --thread THREAD_ID
SKILL_PATH/assets/inbox --db TMPDIR/coord.db --json list --assigned-to worker-a
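
For automated runs, the two commands above can be wrapped in a thin helper. This sketch uses only the flags shown in the commands (--db, --json, show --thread, list --assigned-to). It assumes that --json prints a single JSON document on stdout and that a non-zero exit code means failure; the 30-second timeout is a placeholder, not the shared default from README.md.

```python
import json
import subprocess


def inbox_argv(skill_path: str, db_path: str, *subcommand: str) -> list[str]:
    """Build the argument vector for one bundled-CLI call with JSON output."""
    return [f"{skill_path}/assets/inbox", "--db", db_path, "--json", *subcommand]


def run_inbox_json(skill_path: str, db_path: str, *subcommand: str, timeout: float = 30):
    """Run the bundled CLI and parse its stdout as JSON (assumed output format)."""
    proc = subprocess.run(
        inbox_argv(skill_path, db_path, *subcommand),
        capture_output=True,
        text=True,
        check=True,       # raise CalledProcessError on a non-zero exit code
        timeout=timeout,  # placeholder value, see the lead-in
    )
    return json.loads(proc.stdout)
```

For example, run_inbox_json(skill_path, db_path, "show", "--thread", thread_id) would correspond to the first validation command.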

Expected Outcomes

  • leader successfully runs init
  • leader successfully sends one new thread to worker-a
  • worker-a successfully fetches that thread and successfully claims it
  • worker-a emits one progress message
  • worker-a emits one question message focused on stdout vs stderr
  • leader successfully emits one answer message with the explicit decision "Use stdout."
  • worker-a successfully consumes that answer through wait-reply
  • worker-a successfully emits done
  • show returns thread.status == "done"

Assertions

  • show contains at least the following message kinds in order:
    • task
    • event (thread claimed)
    • progress
    • question
    • answer
    • result
  • question.body == "Should logging go to stdout or stderr?"
  • answer.body == "Use stdout."
  • the final result message explicitly states that logging uses stdout
  • list --assigned-to worker-a shows the thread and its status is done
  • coordination happens through the inbox thread rather than ordinary chat, as both prompts require
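
The message-level assertions above can be sketched as a checker over the parsed show output. The JSON shape is an assumption: only thread.status is named in this case, so the top-level messages list and the per-message kind and body fields are hypothetical names chosen for illustration. "At least ... in order" is read as an ordered-subsequence check, so extra messages between the expected kinds are tolerated.

```python
EXPECTED_KINDS = ["task", "event", "progress", "question", "answer", "result"]


def kinds_in_order(kinds, expected=EXPECTED_KINDS):
    """True if `expected` appears in `kinds` as an ordered subsequence."""
    remaining = iter(kinds)
    # `kind in remaining` advances the iterator, so order is enforced.
    return all(kind in remaining for kind in expected)


def first_body(messages, kind):
    """Body of the first message of the given kind, or None if absent."""
    return next((m.get("body") for m in messages if m.get("kind") == kind), None)


def check_show(show):
    """Return the list of failed assertions for one parsed `show` payload."""
    messages = show.get("messages", [])  # assumed field name
    failures = []
    if show.get("thread", {}).get("status") != "done":
        failures.append("thread.status is not done")
    if not kinds_in_order([m.get("kind") for m in messages]):
        failures.append("message kinds missing or out of order")
    if first_body(messages, "question") != "Should logging go to stdout or stderr?":
        failures.append("unexpected question body")
    if first_body(messages, "answer") != "Use stdout.":
        failures.append("unexpected answer body")
    results = [m.get("body") or "" for m in messages if m.get("kind") == "result"]
    # Loose proxy for "explicitly states that logging uses stdout".
    if not results or "stdout" not in results[-1]:
        failures.append("final result does not mention stdout")
    return failures
```

The substring test on the result body is intentionally loose; a stricter harness might also reject results that mention stderr.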

Cleanup

  • use the default cleanup policy from README.md
  • if the run fails, retain TMPDIR and coord.db for replay and manual inspection
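
The retain-on-failure rule can be expressed as a context manager around the whole run. This is a sketch only: it assumes the default policy deletes the directory after a passing run, which README.md would need to confirm, and the name scratch_dir is hypothetical. The directory prefix mirrors the recorded example run.

```python
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def scratch_dir(prefix: str = "inbox-skill-fwd."):
    """Yield a fresh empty directory; delete it on success, keep it on failure."""
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    except BaseException:
        # Keep the directory (and coord.db inside it) for replay and inspection.
        print(f"run failed; retaining {path}")
        raise
    else:
        # Assumed default policy: remove scratch data after a passing run.
        shutil.rmtree(path)
```

Raising an assertion error inside the with block is enough to keep the directory, so the validation checks can run inside the same block.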

Recorded Example Run

This case already has one reference forward-test run:

  • DB: /tmp/inbox-skill-fwd.j9kKvp/coord.db
  • Thread: thr_48d6f6a77eff4c2e88ce80e8fdc05da3

That run passed. The thread history contained task -> event -> progress -> question -> answer -> result, and the final thread state was done.