run the scenario with real subagents, not simulated transcripts
inject the same skill bundle into every participating agent
launch all role agents in parallel when the scenario depends on agent-to-agent timing
require every agent to coordinate through the bundled CLI and shared SQLite DB instead of ordinary chat
validate the final inbox state independently from the main thread after the agents stop
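
The parallel-launch rule above can be sketched as follows. This is a minimal illustration, not the actual runner: `launch_roles_in_parallel` and the `run_role` callable are hypothetical names, and `run_role` stands in for whatever mechanism starts one real subagent and returns its final state.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, Iterable


def launch_roles_in_parallel(
    roles: Iterable[str],
    run_role: Callable[[str], str],
    timeout_s: float,
) -> Dict[str, str]:
    """Start every role at once and collect each role's final state.

    `run_role` is a hypothetical callable that drives one real subagent.
    Roles still running at the deadline are reported as "timeout";
    roles whose callable raised are reported as "error: ...".
    """
    role_list = list(roles)
    # One worker per role so no role waits for another to finish first.
    pool = ThreadPoolExecutor(max_workers=max(1, len(role_list)))
    try:
        futures = {pool.submit(run_role, role): role for role in role_list}
        done, _pending = wait(futures, timeout=timeout_s)
        results: Dict[str, str] = {}
        for future, role in futures.items():
            if future not in done:
                results[role] = "timeout"
                continue
            try:
                results[role] = future.result()
            except Exception as exc:  # surface the failure instead of raising
                results[role] = f"error: {exc}"
        return results
    finally:
        # Do not block on stragglers; a real runner would also stop them.
        pool.shutdown(wait=False, cancel_futures=True)
```

The main thread would call this once per scenario and only then inspect the shared DB, which keeps validation separate from anything the agents report about themselves.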

Default Timeouts

Use these defaults unless a case file explicitly overrides them:

Default Failure Conditions

Treat the test as failed if any of the following happens:

any required agent does not reach a final state before timeout
any required inbox command returns a non-success result unless the case expects that failure
the final `show` output does not match the expected thread state
the expected message sequence or key message bodies do not appear
the agents fall back to ordinary chat for critical coordination instead of inbox messages
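
As a rough sketch, the default failure conditions above could be evaluated from a per-run summary like this. The `RunObservation` type and all of its field names are assumptions made for illustration; the source does not define how a runner records these observations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunObservation:
    """Hypothetical summary of one finished run; field names are illustrative."""
    timed_out_agents: List[str] = field(default_factory=list)
    # Only failures the case file did NOT expect belong in this list.
    unexpected_command_failures: List[str] = field(default_factory=list)
    final_state_matches: bool = True
    message_sequence_matches: bool = True
    chat_fallback_detected: bool = False


def failure_reasons(obs: RunObservation) -> List[str]:
    """Return one reason per default failure condition that was hit."""
    reasons: List[str] = []
    if obs.timed_out_agents:
        reasons.append(
            "no final state before timeout: " + ", ".join(obs.timed_out_agents)
        )
    if obs.unexpected_command_failures:
        reasons.append(
            "unexpected non-success inbox command: "
            + ", ".join(obs.unexpected_command_failures)
        )
    if not obs.final_state_matches:
        reasons.append("final show output does not match expected thread state")
    if not obs.message_sequence_matches:
        reasons.append("expected message sequence or key bodies missing")
    if obs.chat_fallback_detected:
        reasons.append("critical coordination happened in ordinary chat")
    return reasons
```

A run passes only when `failure_reasons` returns an empty list; returning every reason rather than the first one makes a failed run easier to debug.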

Collect at least the following artifacts for every run:

Use these defaults unless a case file explicitly overrides them:

keep the temporary DB and working directory on failure for debugging
clean up the temporary DB and working directory on success only if the caller does not need replay artifacts
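
One way to express these cleanup defaults in code is shown below. The function name `finalize_run` and its parameters are hypothetical, and the sketch assumes the temporary DB is a single file that may or may not live inside the working directory.

```python
import shutil
from pathlib import Path


def finalize_run(
    db_path: Path,
    workdir: Path,
    passed: bool,
    keep_for_replay: bool = False,
) -> bool:
    """Apply the default cleanup policy; return True if artifacts were removed.

    Failed runs always keep their artifacts for debugging. Successful runs
    are cleaned up unless the caller asked to keep replay artifacts.
    """
    if not passed or keep_for_replay:
        return False
    # Remove the DB first in case it lives outside the working directory.
    db_path.unlink(missing_ok=True)
    shutil.rmtree(workdir, ignore_errors=True)
    return True
```

A case file that overrides the defaults would simply pass a different `keep_for_replay` value.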

Each case file should use this structure:

| Case Slug | File | Coverage Note |
| --- | --- | --- |
| `multi-agent-roundtrip-through-bundled-cli` | multi-agent-roundtrip-through-bundled-cli.md | validates that two agents can use the bundled inbox skill to complete a blocked question and done result roundtrip |
| `parallel-workers-claim-conflict-through-bundled-cli` | parallel-workers-claim-conflict-through-bundled-cli.md | validates that two workers using the skill observe a real `lease_conflict` on the same thread |
| `blocked-worker-timeout-without-reply-through-bundled-cli` | blocked-worker-timeout-without-reply-through-bundled-cli.md | validates that a blocked worker using the skill receives the expected `wait-reply` timeout outcome when no leader reply arrives |
| `leader-cancels-claimed-thread-through-bundled-cli` | leader-cancels-claimed-thread-through-bundled-cli.md | validates that a leader can cancel an actively claimed thread and that both agents observe the cancelled terminal state |
| `artifact-roundtrip-through-bundled-cli` | artifact-roundtrip-through-bundled-cli.md | validates that bundled CLI usage through the skill preserves body-file and artifact data across task and result messages |

In scope:

Out of scope: