Inbox Skill Test Plan
Purpose
This directory tracks human-readable test plans for the skills/inbox/ Codex skill bundle.
These documents are not command-contract specs for the inbox CLI itself.
That coverage already lives under ../inbox/.
This directory exists to describe a different test surface:
- whether an agent can actually use the packaged inbox skill
- whether multiple agents can coordinate through the bundled CLI asset
- whether a real skill-guided conversation reaches the expected inbox state
Test Model
- `README.md` is the index for this directory
- each skill test case lives in its own Markdown file
- use stable case slugs in filenames
Shared Execution Contract
Use these defaults unless a case file explicitly overrides them:
- run the scenario with real subagents, not simulated transcripts
- inject the same skill bundle into every participating agent
- launch all role agents in parallel when the scenario depends on agent-to-agent timing
- require every agent to coordinate through the bundled CLI and shared SQLite DB instead of ordinary chat
- validate the final inbox state independently from the main thread after the agents stop
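The parallel-launch and timeout rules above can be sketched as a small harness. This is a minimal illustration, not the project's runner: `run_scenario`, `AgentFn`, and the `"timeout"` marker are hypothetical names, and each role agent is assumed to be exposed as an async callable that returns its final summary.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

# Defaults taken from this README; a case file may override them.
PER_AGENT_TIMEOUT_S = 3 * 60
SCENARIO_TIMEOUT_S = 5 * 60

# Hypothetical agent interface: an async callable returning a final summary.
AgentFn = Callable[[], Awaitable[str]]


async def _run_one(role: str, agent: AgentFn, timeout_s: float) -> Tuple[str, str]:
    """Run one role agent; report "timeout" if it misses its deadline."""
    try:
        summary = await asyncio.wait_for(agent(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return role, "timeout"
    return role, summary


async def run_scenario(
    agents: Dict[str, AgentFn],
    per_agent_timeout_s: float = PER_AGENT_TIMEOUT_S,
    scenario_timeout_s: float = SCENARIO_TIMEOUT_S,
) -> Dict[str, str]:
    """Start every role agent at once and collect one result per role."""
    runs = [_run_one(role, agent, per_agent_timeout_s) for role, agent in agents.items()]
    # gather() schedules all agents concurrently; the outer wait_for enforces
    # the overall scenario timeout and raises if it is exceeded.
    pairs = await asyncio.wait_for(asyncio.gather(*runs), timeout=scenario_timeout_s)
    return dict(pairs)
```

The main thread would validate the final inbox state only after `run_scenario` returns, so that validation stays independent of the agents' own reports.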
Default Timeouts
Use these defaults unless a case file explicitly overrides them:
- per-agent timeout: `3m`
- overall scenario timeout: `5m`
- async wait margin for the main thread: `30s`
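One way to hold these defaults and apply per-case overrides is sketched below. The `Timeouts` class, its field names, and the `with_overrides` helper are assumptions made for illustration; only the three duration values come from this document.

```python
import re
from dataclasses import dataclass, replace
from typing import Dict

_UNIT_SECONDS = {"s": 1, "m": 60}


def parse_duration(text: str) -> int:
    """Convert a duration such as "3m" or "30s" to whole seconds."""
    match = re.fullmatch(r"(\d+)([sm])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


@dataclass(frozen=True)
class Timeouts:
    # Field names are hypothetical; values are the README defaults.
    per_agent_s: int = parse_duration("3m")
    scenario_s: int = parse_duration("5m")
    wait_margin_s: int = parse_duration("30s")


def with_overrides(defaults: Timeouts, overrides: Dict[str, str]) -> Timeouts:
    """Return a copy of the defaults with case-specific durations applied."""
    parsed = {name: parse_duration(value) for name, value in overrides.items()}
    return replace(defaults, **parsed)
```

For example, `with_overrides(Timeouts(), {"per_agent_s": "90s"})` shortens only the per-agent timeout and leaves the other two defaults untouched.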
Default Failure Conditions
Treat the test as failed if any of the following happens:
- any required agent does not reach a final state before timeout
- any required inbox command returns a non-success result unless the case expects that failure
- the final `show` output does not match the expected thread state
- the expected message sequence or key message bodies do not appear
- the agents fall back to ordinary chat for critical coordination instead of inbox messages
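The failure conditions above can be expressed as a single check that returns every reason a run failed. This is a sketch under stated assumptions: `RunObservation` and its fields are hypothetical, the state and message-kind strings are placeholders, and the message sequence is treated as an ordered subsequence of the observed history, which a case file may tighten.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class RunObservation:
    # Hypothetical record of what the main thread observed after a run.
    timed_out_agents: List[str] = field(default_factory=list)
    unexpected_command_failures: List[str] = field(default_factory=list)
    final_state: str = ""
    expected_state: str = ""
    message_kinds: List[str] = field(default_factory=list)
    expected_kinds: List[str] = field(default_factory=list)
    used_chat_for_coordination: bool = False


def is_subsequence(expected: Iterable[str], actual: Iterable[str]) -> bool:
    """True if every expected item appears in actual, in the same order."""
    remaining = iter(actual)
    return all(item in remaining for item in expected)


def failure_reasons(obs: RunObservation) -> List[str]:
    """Return one reason per violated condition; an empty list means pass."""
    reasons = []
    if obs.timed_out_agents:
        reasons.append(f"agents timed out: {obs.timed_out_agents}")
    if obs.unexpected_command_failures:
        reasons.append(f"unexpected command failures: {obs.unexpected_command_failures}")
    if obs.final_state != obs.expected_state:
        reasons.append(f"final state {obs.final_state!r} != {obs.expected_state!r}")
    if not is_subsequence(obs.expected_kinds, obs.message_kinds):
        reasons.append("expected message sequence not found")
    if obs.used_chat_for_coordination:
        reasons.append("agents coordinated through ordinary chat")
    return reasons
```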
Evidence Capture
Collect at least the following artifacts for every run:
- agent final summaries
- final `show --thread THREAD_ID --json` output
- at least one independent listing or lookup command such as `list` or `fetch`
- the temporary DB path and resolved thread id
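As a rough sketch, the evidence for one run could be gathered into a single JSON record. The helper names and the record layout are hypothetical. The `show` arguments are the ones listed above; the bare `list` call is a placeholder whose real arguments are defined by the CLI docs, and how the DB path is passed to the CLI is not covered here, so the sketch leaves it out of the commands.

```python
import json
from typing import Dict, List


def evidence_commands(cli: str, thread_id: str) -> List[List[str]]:
    """Build the argv lists for the validation commands (not executed here)."""
    return [
        [cli, "show", "--thread", thread_id, "--json"],
        [cli, "list"],  # placeholder: real arguments depend on the CLI contract
    ]


def evidence_bundle(
    db_path: str,
    thread_id: str,
    agent_summaries: Dict[str, str],
    show_output: str,
    lookup_output: str,
) -> str:
    """Serialize the required artifacts for one run as a JSON document."""
    record = {
        "db_path": db_path,
        "thread_id": thread_id,
        "agent_summaries": agent_summaries,
        "show_output": show_output,
        "lookup_output": lookup_output,
    }
    return json.dumps(record, indent=2, sort_keys=True)
```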
Cleanup Policy
Use these defaults unless a case file explicitly overrides them:
- keep the temporary DB and working directory on failure for debugging
- clean up the temporary DB and working directory on success only if the caller does not need replay artifacts
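The cleanup defaults reduce to one decision, sketched here. `finish_run` and its parameters are hypothetical names, and the sketch assumes the temporary DB lives inside the working directory.

```python
import shutil
from pathlib import Path


def should_cleanup(passed: bool, keep_for_replay: bool) -> bool:
    """Remove artifacts only after a passing run that nobody wants to replay."""
    return passed and not keep_for_replay


def finish_run(workdir: Path, passed: bool, keep_for_replay: bool = False) -> bool:
    """Apply the cleanup policy; return True if the directory was removed."""
    if should_cleanup(passed, keep_for_replay):
        shutil.rmtree(workdir, ignore_errors=True)
        return True
    # On failure (or when replay is requested) leave everything for debugging.
    return False
```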
Per-Case Template
Each case file should use this structure:
- Test Type
- Purpose
- Preconditions
- Agent Topology
- Inputs
- Execution Parameters
- Execution Steps
- Validation Commands
- Expected Outcomes
- Assertions
- Cleanup
- Recorded Example Run (when a real run has already been captured)
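A case file can be checked against this template with a short script. The sketch below assumes each section appears in the case file as a Markdown `#`-style heading with exactly the name given above; that convention is an assumption, and `missing_sections` is a hypothetical helper.

```python
import re
from typing import List

REQUIRED_SECTIONS = [
    "Test Type", "Purpose", "Preconditions", "Agent Topology", "Inputs",
    "Execution Parameters", "Execution Steps", "Validation Commands",
    "Expected Outcomes", "Assertions", "Cleanup",
]
# Present only when a real run has already been captured.
OPTIONAL_SECTIONS = ["Recorded Example Run"]


def headings(text: str) -> List[str]:
    """Return the titles of all '#'-style headings in a Markdown document."""
    pattern = r"^#+[ \t]+(.+?)[ \t]*$"
    return [m.group(1) for m in re.finditer(pattern, text, flags=re.MULTILINE)]


def missing_sections(text: str) -> List[str]:
    """List the required template sections that the case file lacks."""
    present = set(headings(text))
    return [name for name in REQUIRED_SECTIONS if name not in present]
```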
Case Files
| Case Slug | File | Coverage Note |
|---|---|---|
| `multi-agent-roundtrip-through-bundled-cli` | multi-agent-roundtrip-through-bundled-cli.md | validates that two agents can use the bundled inbox skill to complete a blocked question and done result roundtrip |
| `parallel-workers-claim-conflict-through-bundled-cli` | parallel-workers-claim-conflict-through-bundled-cli.md | validates that two workers using the skill observe a real lease_conflict on the same thread |
| `blocked-worker-timeout-without-reply-through-bundled-cli` | blocked-worker-timeout-without-reply-through-bundled-cli.md | validates that a blocked worker using the skill receives the expected wait-reply timeout outcome when no leader reply arrives |
| `leader-cancels-claimed-thread-through-bundled-cli` | leader-cancels-claimed-thread-through-bundled-cli.md | validates that a leader can cancel an actively claimed thread and that both agents observe the cancelled terminal state |
| `artifact-roundtrip-through-bundled-cli` | artifact-roundtrip-through-bundled-cli.md | validates that bundled CLI usage through the skill preserves body-file and artifact data across task and result messages |
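Because filenames follow the case slugs, the index can be checked mechanically. The sketch below is one possible check, assuming each row is a three-cell pipe-table row and that the File cell holds a plain filename; `case_rows` and `mismatched_rows` are hypothetical helpers.

```python
from typing import List, Tuple


def case_rows(readme_text: str) -> List[Tuple[str, str]]:
    """Extract (slug, file) pairs from the Case Files pipe table."""
    rows = []
    for line in readme_text.splitlines():
        if not line.startswith("|"):
            continue
        cells = [c.strip().strip("`") for c in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue
        # Skip the header row and the |---|---|---| separator row.
        if cells[0] == "Case Slug" or set(cells[0]) <= {"-", ":"}:
            continue
        rows.append((cells[0], cells[1]))
    return rows


def mismatched_rows(readme_text: str) -> List[Tuple[str, str]]:
    """Return rows whose file name is not '<slug>.md'."""
    return [(slug, name) for slug, name in case_rows(readme_text) if name != f"{slug}.md"]
```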
Scope
In scope:
- explicit `$inbox` skill invocation
- bundled `./assets/inbox` CLI usage
- shared SQLite DB coordination between multiple agents
- end-to-end thread state and message history validation
- negative-path skill scenarios such as lease conflicts and reply timeouts
- skill-guided artifact and body-file roundtrips
Out of scope:
- per-command flag and JSON contract coverage
- store-level race conditions
- implicit skill triggering without `$inbox`
Relationship To Other Test Docs
- ../inbox/ covers CLI command behavior
- this directory covers skill-guided multi-agent behavior on top of that CLI