# Inbox Skill Test Plan

## Purpose

This directory tracks human-readable test plans for the `skills/inbox/` Codex skill bundle.

These documents are not command-contract specs for the `inbox` CLI itself.
That coverage already lives under [../inbox/](../inbox/).

This directory exists to describe a different test surface:

- whether an agent can actually use the packaged inbox skill
- whether multiple agents can coordinate through the bundled CLI asset
- whether a real skill-guided conversation reaches the expected inbox state

## Test Model

- `README.md` is the index for this directory
- each skill test case lives in its own Markdown file
- name each case file after its stable case slug, so the slug and the filename never drift apart
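
The slug-to-filename convention can be checked mechanically. The sketch below is illustrative only: `slug_matches_file` is a hypothetical helper, and the slug pattern (lowercase words joined by hyphens) is an assumption inferred from the slugs listed in this README, not a documented rule.

```python
import re
from pathlib import PurePosixPath

# Assumed slug shape: lowercase alphanumeric words joined by single hyphens.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def slug_matches_file(slug, file_name):
    """Return True when file_name is exactly '<slug>.md' (any directory prefix allowed)."""
    path = PurePosixPath(file_name)
    return (
        SLUG_PATTERN.fullmatch(slug) is not None
        and path.suffix == ".md"
        and path.stem == slug
    )
```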

## Shared Execution Contract

Use these defaults unless a case file explicitly overrides them:

- run the scenario with real subagents, not simulated transcripts
- inject the same skill bundle into every participating agent
- launch all role agents in parallel when the scenario depends on agent-to-agent timing
- require every agent to coordinate through the bundled CLI and shared SQLite DB instead of ordinary chat
- validate the final inbox state independently from the main thread after the agents stop
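
As a rough illustration of the parallel-launch rule, the sketch below starts every role at once and reports which roles did not finish in time. It is not the harness used by these tests: `run_agent` is a hypothetical callable standing in for whatever actually starts a subagent with the skill bundle injected, and the 180-second default simply mirrors the per-agent default documented in this README.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_roles_in_parallel(roles, run_agent, per_agent_timeout_s=180.0):
    """Start every role agent at once; return (results, timed_out_roles).

    run_agent is a hypothetical callable: role name -> final summary.
    """
    if not roles:
        return {}, []
    pool = ThreadPoolExecutor(max_workers=len(roles))
    # All agents are submitted before any result is awaited, so they overlap in time.
    futures = {pool.submit(run_agent, role): role for role in roles}
    done, not_done = wait(futures, timeout=per_agent_timeout_s)
    # Do not block on stragglers; already-running threads are not killed by this call.
    pool.shutdown(wait=False, cancel_futures=True)
    results = {futures[f]: f.result() for f in done}
    timed_out = sorted(futures[f] for f in not_done)
    return results, timed_out
```

Final-state validation is deliberately left out of this helper: it belongs in the main thread, after the agents stop.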

## Default Timeouts

Use these defaults unless a case file explicitly overrides them:

- per-agent timeout: `3m`
- overall scenario timeout: `5m`
- async wait margin for the main thread: `30s`
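
A harness needs these values as numbers. The following is a minimal sketch of resolving the defaults with per-case overrides; the key names (`per_agent`, `overall`, `async_wait_margin`) and the helper functions are hypothetical, and only the `s` and `m` suffixes used in this README are handled.

```python
import re

# Defaults copied from this README; the key names are illustrative.
DEFAULT_TIMEOUTS = {"per_agent": "3m", "overall": "5m", "async_wait_margin": "30s"}
_UNIT_SECONDS = {"s": 1, "m": 60}


def parse_duration(text):
    """Convert a duration such as '3m' or '30s' into whole seconds."""
    match = re.fullmatch(r"(\d+)([sm])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def resolve_timeouts(overrides=None):
    """Merge case-level overrides over the defaults and return seconds per key."""
    merged = {**DEFAULT_TIMEOUTS, **(overrides or {})}
    return {name: parse_duration(value) for name, value in merged.items()}
```

For example, a case file that only shortens the per-agent timeout would call `resolve_timeouts({"per_agent": "90s"})` and inherit the other two defaults.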

## Default Failure Conditions

Treat the test as failed if any of the following happens:

- any required agent does not reach a final state before timeout
- any required inbox command returns a non-success result unless the case expects that failure
- the final `show` output does not match the expected thread state
- the expected message sequence or key message bodies do not appear
- the agents fall back to ordinary chat for critical coordination instead of inbox messages
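
The conditions above can be evaluated as one checklist, so that a failing run reports every reason rather than only the first. This is a sketch under assumptions: `RunRecord` and its field names are hypothetical, and how each field gets populated (for example, how chat fallback is detected) is left to the harness.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Hypothetical summary of one scenario run; field names are illustrative."""
    agents_finished: dict  # role name -> reached a final state before timeout
    unexpected_command_failures: list = field(default_factory=list)
    final_show_matches: bool = True
    messages_match: bool = True
    chat_fallback_used: bool = False


def failure_reasons(run):
    """Return every default failure condition the run violated (empty list = pass)."""
    reasons = []
    for role, finished in sorted(run.agents_finished.items()):
        if not finished:
            reasons.append(f"agent {role} did not reach a final state before timeout")
    # Failures that the case expects should be filtered out before this point.
    for command in run.unexpected_command_failures:
        reasons.append(f"unexpected non-success result from: {command}")
    if not run.final_show_matches:
        reasons.append("final show output does not match the expected thread state")
    if not run.messages_match:
        reasons.append("expected message sequence or key message bodies are missing")
    if run.chat_fallback_used:
        reasons.append("agents used ordinary chat for critical coordination")
    return reasons
```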

## Evidence Capture

Collect at least the following artifacts for every run:

- agent final summaries
- final `show --thread THREAD_ID --json` output
- output of at least one independent listing or lookup command such as `list` or `fetch`
- the temporary DB path and resolved thread id
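
One way to enforce this list is to check the collected bundle before a run is recorded. The sketch below assumes the evidence is gathered into a plain dictionary of strings; the key names and the `evidence_problems` helper are hypothetical. The only format check it makes is that the captured `show --thread THREAD_ID --json` output parses as JSON; it assumes nothing about that output's shape.

```python
import json

# Illustrative key names, one per artifact listed in this section.
REQUIRED_EVIDENCE = (
    "agent_summaries",
    "final_show_json",
    "independent_lookup",
    "db_path",
    "thread_id",
)


def evidence_problems(bundle):
    """Return a list of problems with the evidence bundle (empty list = complete)."""
    problems = [f"missing: {key}" for key in REQUIRED_EVIDENCE if not bundle.get(key)]
    show_output = bundle.get("final_show_json")
    if show_output:
        try:
            json.loads(show_output)
        except json.JSONDecodeError:
            problems.append("final_show_json is not valid JSON")
    return problems
```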

## Cleanup Policy

Use these defaults unless a case file explicitly overrides them:

- keep the temporary DB and working directory on failure for debugging
- clean up the temporary DB and working directory on success only if the caller does not need replay artifacts
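
The default policy reduces to a single decision. A minimal sketch follows, assuming the temporary DB lives inside the working directory so that removing the directory removes both; `apply_cleanup_policy` and its parameters are hypothetical names.

```python
import shutil


def apply_cleanup_policy(work_dir, passed, keep_replay_artifacts):
    """Remove work_dir only for a passing run whose artifacts are not needed.

    Assumes the temporary DB is stored inside work_dir. Returns True when
    the directory was removed, False when it was kept.
    """
    if passed and not keep_replay_artifacts:
        shutil.rmtree(work_dir, ignore_errors=True)
        return True
    # Failed runs, and passing runs kept for replay, leave everything in place.
    return False
```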

## Per-Case Template

Each case file should use this structure:

- `Test Type`
- `Purpose`
- `Preconditions`
- `Agent Topology`
- `Inputs`
- `Execution Parameters`
- `Execution Steps`
- `Validation Commands`
- `Expected Outcomes`
- `Assertions`
- `Cleanup`
- `Recorded Example Run` when a real run has already been captured
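
A small lint can confirm that a case file follows this template. The sketch below assumes each template entry appears as a Markdown heading with exactly that title, which this README does not state explicitly; `missing_sections` is a hypothetical helper. `Recorded Example Run` is treated as optional because it applies only when a real run has been captured.

```python
import re

# Required section titles, in template order; "Recorded Example Run" is optional.
REQUIRED_SECTIONS = [
    "Test Type",
    "Purpose",
    "Preconditions",
    "Agent Topology",
    "Inputs",
    "Execution Parameters",
    "Execution Steps",
    "Validation Commands",
    "Expected Outcomes",
    "Assertions",
    "Cleanup",
]

_HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(markdown_text):
    """Return required section titles that have no matching Markdown heading."""
    headings = {match.group(1) for match in _HEADING.finditer(markdown_text)}
    return [name for name in REQUIRED_SECTIONS if name not in headings]
```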

## Case Files

| Case Slug | File | Coverage Note |
| --- | --- | --- |
| `multi-agent-roundtrip-through-bundled-cli` | [multi-agent-roundtrip-through-bundled-cli.md](./multi-agent-roundtrip-through-bundled-cli.md) | validates that two agents can use the bundled inbox skill to complete a roundtrip consisting of a blocked question followed by a done result |
| `parallel-workers-claim-conflict-through-bundled-cli` | [parallel-workers-claim-conflict-through-bundled-cli.md](./parallel-workers-claim-conflict-through-bundled-cli.md) | validates that two workers using the skill observe a real `lease_conflict` on the same thread |
| `blocked-worker-timeout-without-reply-through-bundled-cli` | [blocked-worker-timeout-without-reply-through-bundled-cli.md](./blocked-worker-timeout-without-reply-through-bundled-cli.md) | validates that a blocked worker using the skill receives the expected `wait-reply` timeout outcome when no leader reply arrives |
| `leader-cancels-claimed-thread-through-bundled-cli` | [leader-cancels-claimed-thread-through-bundled-cli.md](./leader-cancels-claimed-thread-through-bundled-cli.md) | validates that a leader can cancel an actively claimed thread and that both agents observe the cancelled terminal state |
| `artifact-roundtrip-through-bundled-cli` | [artifact-roundtrip-through-bundled-cli.md](./artifact-roundtrip-through-bundled-cli.md) | validates that bundled CLI usage through the skill preserves body-file and artifact data across task and result messages |

## Scope

In scope:

- explicit `$inbox` skill invocation
- bundled `./assets/inbox` CLI usage
- shared SQLite DB coordination between multiple agents
- end-to-end thread state and message history validation
- negative-path skill scenarios such as lease conflicts and reply timeouts
- skill-guided artifact and body-file roundtrips

Out of scope:

- per-command flag and JSON contract coverage
- store-level race conditions
- implicit skill triggering without `$inbox`

## Relationship To Other Test Docs

- [../inbox/](../inbox/) covers CLI command behavior
- this directory covers skill-guided multi-agent behavior on top of that CLI