docs: add inbox skill test scenarios

2026-03-19 12:35:05 +08:00
parent 1a9fc4c136
commit 72d7caa552
7 changed files with 568 additions and 0 deletions
@@ -0,0 +1,113 @@
+# Inbox Skill Test Plan
+
+## Purpose
+
+This directory tracks human-readable test plans for the `skills/inbox/` Codex skill bundle.
+
+These documents are not command-contract specs for the `inbox` CLI itself.
+That coverage already lives under [../inbox/](../inbox/).
+
+This directory exists to describe a different test surface:
+
+- whether an agent can actually use the packaged inbox skill
+- whether multiple agents can coordinate through the bundled CLI asset
+- whether a real skill-guided conversation reaches the expected inbox state
+
+## Test Model
+
+- `README.md` is the index for this directory
+- each skill test case lives in its own Markdown file
+- use stable case slugs in filenames
+
+## Shared Execution Contract
+
+Use these defaults unless a case file explicitly overrides them:
+
+- run the scenario with real subagents, not simulated transcripts
+- inject the same skill bundle into every participating agent
+- launch all role agents in parallel when the scenario depends on agent-to-agent timing
+- require every agent to coordinate through the bundled CLI and shared SQLite DB instead of ordinary chat
+- after the agents stop, validate the final inbox state from the main thread, independently of the agents' own reports
+
+## Default Timeouts
+
+Use these defaults unless a case file explicitly overrides them:
+
+- per-agent timeout: `3m`
+- overall scenario timeout: `5m`
+- async wait margin for the main thread: `30s`
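The defaults above can be written down as a small configuration object. This is a minimal sketch, not part of the skill bundle: the `ScenarioTimeouts` class and its field names are hypothetical, and it assumes the main thread waits for the overall scenario budget plus the async wait margin, which the plan does not state explicitly.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ScenarioTimeouts:
    """Default time budgets from this plan; a case file may override any of them."""

    per_agent: timedelta = timedelta(minutes=3)
    overall: timedelta = timedelta(minutes=5)
    async_wait_margin: timedelta = timedelta(seconds=30)

    def main_thread_deadline(self) -> timedelta:
        # Assumption: the main thread waits for the overall budget plus the margin.
        return self.overall + self.async_wait_margin


defaults = ScenarioTimeouts()

# Hypothetical override, as a case file might specify for a slower scenario.
slow_case = ScenarioTimeouts(per_agent=timedelta(minutes=4))

print(defaults.main_thread_deadline().total_seconds())
```

Keeping the object frozen means an override always produces a new value, so the shared defaults cannot be mutated by one case and leak into another.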
+
+## Default Failure Conditions
+
+Treat the test as failed if any of the following happens:
+
+- any required agent does not reach a final state before timeout
+- any required inbox command returns a non-success result, unless the case expects that failure
+- the final `show` output does not match the expected thread state
+- the expected message sequence or key message bodies do not appear
+- the agents fall back to ordinary chat for critical coordination instead of inbox messages
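One way to apply the failure conditions above is to evaluate them against a summary of the run and report every reason at once. The sketch below is illustrative only: `RunRecord` and its fields are hypothetical names, and how each field gets populated (parsing agent summaries, comparing `show` output) is left to the harness.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Hypothetical summary of one scenario run; field names are illustrative."""

    agents_finished: dict[str, bool]
    unexpected_command_failures: list[str] = field(default_factory=list)
    final_show_matches: bool = True
    message_sequence_matches: bool = True
    used_chat_for_coordination: bool = False


def failure_reasons(run: RunRecord) -> list[str]:
    """Return one reason per violated default failure condition (empty means pass)."""
    reasons = []
    for agent, finished in sorted(run.agents_finished.items()):
        if not finished:
            reasons.append(f"agent {agent} did not reach a final state before timeout")
    for command in run.unexpected_command_failures:
        reasons.append(f"unexpected non-success result from: {command}")
    if not run.final_show_matches:
        reasons.append("final show output does not match the expected thread state")
    if not run.message_sequence_matches:
        reasons.append("expected message sequence or key message bodies are missing")
    if run.used_chat_for_coordination:
        reasons.append("agents used ordinary chat instead of inbox messages")
    return reasons


example = RunRecord(agents_finished={"leader": True, "worker": False})
print(failure_reasons(example))
```

Collecting all reasons, rather than stopping at the first, makes a failed run easier to debug from its report alone.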
+
+## Evidence Capture
+
+Collect at least the following artifacts for every run:
+
+- agent final summaries
+- final `show --thread THREAD_ID --json` output
+- output from at least one independent listing or lookup command, such as `list` or `fetch`
+- the temporary DB path and resolved thread id
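A harness could refuse to mark a run complete until every artifact listed above has been captured. The following is a sketch under assumptions: the key names in `REQUIRED_EVIDENCE` are hypothetical labels for the artifacts, and the values in the example bundle are placeholders, not real command output.

```python
# Hypothetical keys, one per artifact named in the evidence list.
REQUIRED_EVIDENCE = (
    "agent_summaries",
    "final_show_json",
    "independent_lookup",
    "db_path",
    "thread_id",
)


def missing_evidence(bundle: dict) -> list[str]:
    """Return the required evidence keys that are absent or empty."""
    return [key for key in REQUIRED_EVIDENCE if not bundle.get(key)]


# Placeholder values only; a real run would store the captured text and paths.
bundle = {
    "agent_summaries": ["<leader summary>", "<worker summary>"],
    "final_show_json": "<captured show output>",
    "db_path": "<temporary DB path>",
    "thread_id": "<resolved thread id>",
}
print(missing_evidence(bundle))
```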
+
+## Cleanup Policy
+
+Use these defaults unless a case file explicitly overrides them:
+
+- keep the temporary DB and working directory on failure for debugging
+- clean up the temporary DB and working directory on success only if the caller does not need replay artifacts
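The policy above reduces to a single decision: remove the working directory only when the run passed and nobody needs the artifacts for replay. A minimal sketch, assuming the temporary DB lives inside the working directory; the function names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def should_clean_up(passed: bool, keep_replay_artifacts: bool) -> bool:
    """Failures are always kept for debugging; successes are kept only on request."""
    return passed and not keep_replay_artifacts


def finish_run(workdir: Path, passed: bool, keep_replay_artifacts: bool = False) -> bool:
    """Remove workdir when the policy allows it; return True if it was removed."""
    if should_clean_up(passed, keep_replay_artifacts):
        shutil.rmtree(workdir, ignore_errors=True)
        return True
    return False


# Usage: a failed run keeps its directory, a passing run removes it.
failed_dir = Path(tempfile.mkdtemp(prefix="inbox-skill-test-"))
passed_dir = Path(tempfile.mkdtemp(prefix="inbox-skill-test-"))
print(finish_run(failed_dir, passed=False), finish_run(passed_dir, passed=True))
```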
+
+## Per-Case Template
+
+Each case file should use this structure:
+
+- `Test Type`
+- `Purpose`
+- `Preconditions`
+- `Agent Topology`
+- `Inputs`
+- `Execution Parameters`
+- `Execution Steps`
+- `Validation Commands`
+- `Expected Outcomes`
+- `Assertions`
+- `Cleanup`
+- `Recorded Example Run` when a real run has already been captured
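The template above lends itself to a simple lint over each case file. The sketch below assumes each section appears as a Markdown heading (of any level) whose text matches the section name exactly; the plan does not specify the heading level, so treat that as an assumption. `Recorded Example Run` is treated as optional because it applies only once a real run has been captured.

```python
import re

REQUIRED_SECTIONS = [
    "Test Type",
    "Purpose",
    "Preconditions",
    "Agent Topology",
    "Inputs",
    "Execution Parameters",
    "Execution Steps",
    "Validation Commands",
    "Expected Outcomes",
    "Assertions",
    "Cleanup",
]

# Matches ATX headings such as "## Purpose" and captures the heading text.
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(markdown: str) -> list[str]:
    """Return required section names that have no matching heading, in template order."""
    headings = {match.group(1) for match in HEADING.finditer(markdown)}
    return [name for name in REQUIRED_SECTIONS if name not in headings]


# A case file that forgot its Cleanup section.
sample = "\n\n".join(f"## {name}\n\ntext" for name in REQUIRED_SECTIONS[:-1])
print(missing_sections(sample))
```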
+
+## Case Files
+
+| Case Slug | File | Coverage Note |
+| --- | --- | --- |
+| `multi-agent-roundtrip-through-bundled-cli` | [multi-agent-roundtrip-through-bundled-cli.md](./multi-agent-roundtrip-through-bundled-cli.md) | validates that two agents can use the bundled inbox skill to complete a roundtrip from a blocked question to a done result |
+| `parallel-workers-claim-conflict-through-bundled-cli` | [parallel-workers-claim-conflict-through-bundled-cli.md](./parallel-workers-claim-conflict-through-bundled-cli.md) | validates that two workers using the skill observe a real `lease_conflict` on the same thread |
+| `blocked-worker-timeout-without-reply-through-bundled-cli` | [blocked-worker-timeout-without-reply-through-bundled-cli.md](./blocked-worker-timeout-without-reply-through-bundled-cli.md) | validates that a blocked worker using the skill receives the expected `wait-reply` timeout outcome when no leader reply arrives |
+| `leader-cancels-claimed-thread-through-bundled-cli` | [leader-cancels-claimed-thread-through-bundled-cli.md](./leader-cancels-claimed-thread-through-bundled-cli.md) | validates that a leader can cancel an actively claimed thread and that both agents observe the cancelled terminal state |
+| `artifact-roundtrip-through-bundled-cli` | [artifact-roundtrip-through-bundled-cli.md](./artifact-roundtrip-through-bundled-cli.md) | validates that bundled CLI usage through the skill preserves body-file and artifact data across task and result messages |
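Because every row in the table above repeats the slug three times (slug, link label, link target), the index is easy to check mechanically. This is a sketch of such a check, assuming the row layout shown here; the `ROW` pattern and `inconsistent_rows` helper are hypothetical and tolerate a leading `+` so they also work on diff output.

```python
import re

# One table row: | `slug` | [label](href) | note |
ROW = re.compile(
    r"^\|\s*`(?P<slug>[^`]+)`\s*\|\s*\[(?P<label>[^\]]+)\]\((?P<href>[^)]+)\)\s*\|"
)


def inconsistent_rows(readme_text: str) -> list[str]:
    """Return slugs whose link label or target does not match '<slug>.md'."""
    bad = []
    for line in readme_text.splitlines():
        match = ROW.match(line.lstrip("+").strip())
        if match is None:
            continue  # header, separator, or non-table line
        expected = f"{match['slug']}.md"
        if match["label"] != expected or match["href"] != f"./{expected}":
            bad.append(match["slug"])
    return bad


table = """\
| Case Slug | File | Coverage Note |
| --- | --- | --- |
| `case-a` | [case-a.md](./case-a.md) | consistent row |
| `case-b` | [case-b.md](./case-bee.md) | link target does not match the slug |
"""
print(inconsistent_rows(table))
```

The slugs `case-a` and `case-b` are placeholders for illustration, not case files in this directory.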
+
+## Scope
+
+In scope:
+
+- explicit `$inbox` skill invocation
+- bundled `./assets/inbox` CLI usage
+- shared SQLite DB coordination between multiple agents
+- end-to-end thread state and message history validation
+- negative-path skill scenarios such as lease conflicts and reply timeouts
+- skill-guided artifact and body-file roundtrips
+
+Out of scope:
+
+- per-command flag and JSON contract coverage
+- store-level race conditions
+- implicit skill triggering without `$inbox`
+
+## Relationship To Other Test Docs
+
+- [../inbox/](../inbox/) covers CLI command behavior
+- this directory covers skill-guided multi-agent behavior on top of that CLI