# Repo Memory Skill Test Plan

## Purpose

This directory tracks human-readable test plans for the `skills/repo-memory/` Codex skill bundle.

These documents are not direct CLI command-contract specs for `repo-memory`. That coverage now lives under [../repo-memory/](../repo-memory/).

These documents are also not package-level unit tests for the runtime. Those live under `packages/repo-memory-runtime/`.

This directory covers a different surface:

- whether an agent can actually use the packaged `repo-memory` skill
- whether the bundled `./assets/repo-memory` CLI works inside real skill-guided repository work
- whether durable repository knowledge is stored and retrieved correctly

## Test Model

- `README.md` is the index for this directory
- each skill test case lives in its own Markdown file
- use stable case slugs in filenames

## Shared Execution Contract

Use these defaults unless a case file explicitly overrides them:

- run the scenario with one real agent using the bundled `repo-memory` skill
- create an isolated temporary directory, repository fixture, and SQLite DB path
- require the agent to use the bundled `./assets/repo-memory` CLI instead of ad hoc notes
- validate final database state independently from the main thread after the agent stops

## How An Agent Runs These Cases

Use one test-runner agent to execute each case.
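The isolation step from the shared execution contract could be sketched roughly as follows. This is a minimal illustration, not the project's actual harness: the `CaseWorkspace` type, the `create_case_workspace` helper, the `repo` subdirectory, and the `repo-memory.sqlite` file name are all hypothetical, and the sketch assumes the DB file is left for the bundled CLI to create.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CaseWorkspace:
    """Paths a test-runner could hand to the role agent for one case run."""

    tmpdir: Path
    repo_path: Path
    db_path: Path


def create_case_workspace(case_slug: str) -> CaseWorkspace:
    """Create a fresh temporary directory with a repo fixture dir and a DB path."""
    # mkdtemp guarantees a new, uniquely named directory for every call.
    tmpdir = Path(tempfile.mkdtemp(prefix=f"{case_slug}-"))

    # Hypothetical layout: an empty directory standing in for the repository fixture.
    repo_path = tmpdir / "repo"
    repo_path.mkdir()

    # Hypothetical file name. Only the path is reserved; the file is not created here.
    db_path = tmpdir / "repo-memory.sqlite"

    return CaseWorkspace(tmpdir=tmpdir, repo_path=repo_path, db_path=db_path)
```

The three fields line up with the `TMPDIR`, `REPO_PATH`, and `DB_PATH` values that a case file is expected to supply. Each call returns a separate workspace, so two runs of the same case never share a database.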
The test-runner agent is responsible for:

- reading this `README.md` first, then one specific case file
- creating an isolated temporary directory, repository fixture, and SQLite DB path
- injecting `skills/repo-memory/` into the role agent
- passing the concrete `SKILL_PATH`, `TMPDIR`, `DB_PATH`, and `REPO_PATH` values from the case file
- requiring the role agent to use the bundled `./assets/repo-memory` CLI instead of free-form notes
- collecting the role agent final summary as evidence
- running the case `Validation Commands` from the main thread after the role agent stops
- comparing the observed results against `Expected Outcomes` and `Assertions`

The role agent is responsible for:

- acting only within the case scope
- using the injected `repo-memory` skill rather than ad hoc repository discovery
- coordinating through the bundled CLI and SQLite DB
- reporting concrete keys, entry ids, and final observed state back to the test-runner agent

## Default Timeouts

Use these defaults unless a case file explicitly overrides them:

- per-agent timeout: `3m`
- overall scenario timeout: `4m`

## Default Failure Conditions

Treat the test as failed if any of the following happens:

- the role agent does not reach a final state before timeout
- a required bundled CLI command returns a non-success result unless the case expects that failure
- the final repo-memory DB state conflicts with the documented assertions
- the role agent falls back to free-form notes for durable knowledge that should go through the bundled CLI

## Evidence Capture

Collect at least the following artifacts for every run:

- the role agent final summary
- the temporary DB path and repository path
- the outputs of the case `Validation Commands`
- any resolved entry ids, keys, or relation rows needed to verify the case

## Cleanup Policy

Use these defaults unless a case file explicitly overrides them:

- keep the temporary DB and repo fixture on failure for debugging
- cleanup on success only if replay
artifacts are not needed

## Per-Case Template

Each case file should use this structure:

- `Test Type`
- `Purpose`
- `Preconditions`
- `Inputs`
- `Execution Parameters`
- `Execution Steps`
- `Validation Commands`
- `Expected Outcomes`
- `Assertions`
- `Cleanup`
- `Recorded Example Run` when a real run has already been captured

## Case Files

| Case Slug | File | Coverage Note |
| --- | --- | --- |
| `search-and-add-through-bundled-cli` | [search-and-add-through-bundled-cli.md](./search-and-add-through-bundled-cli.md) | validates that an agent can miss on search, add one durable entry, then retrieve it through the packaged `repo-memory` skill |
| `ingest-and-search-through-bundled-cli` | [ingest-and-search-through-bundled-cli.md](./ingest-and-search-through-bundled-cli.md) | validates that an agent can ingest `docs/ai` markdown through the bundled CLI and then retrieve imported knowledge through search and list |
| `verify-downgrade-after-file-change-through-bundled-cli` | [verify-downgrade-after-file-change-through-bundled-cli.md](./verify-downgrade-after-file-change-through-bundled-cli.md) | validates that an agent can record confirmed knowledge, mutate the tracked file, run verify, and observe a `needs_review` downgrade |
| `verify-stale-missing-hard-dependency-through-bundled-cli` | [verify-stale-missing-hard-dependency-through-bundled-cli.md](./verify-stale-missing-hard-dependency-through-bundled-cli.md) | validates that an agent can detect a missing hard dependency through `verify` and observe a `stale` result |
| `link-two-entries-through-bundled-cli` | [link-two-entries-through-bundled-cli.md](./link-two-entries-through-bundled-cli.md) | validates that an agent can add two entries, link them, and leave a durable relation in the packaged repo-memory database |

## Scope

In scope:

- explicit `$repo-memory` skill invocation
- bundled `./assets/repo-memory` CLI usage
- durable knowledge add/search/list/event flows
- markdown ingest through `docs/ai`
- verify
downgrade and stale transitions
- entry relation/link flows
- package-backed SQLite memory database behavior as surfaced through the skill

Out of scope:

- direct CLI contract coverage that now belongs under [../repo-memory/](../repo-memory/)
- package-level unit tests for `packages/repo-memory-runtime`
- future auto-export flows such as `repo-brief` generation
- implicit skill triggering without `$repo-memory`
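As a supplement to the per-case template listed earlier, a case file's structure could be checked mechanically before a run. The sketch below is only an illustration: the `missing_sections` helper is hypothetical, and it assumes each template field appears in the case file as a Markdown heading with that exact text, which this README does not itself specify. `Recorded Example Run` is left out of the check because the template marks it as conditional.

```python
import re

# Required fields from the per-case template, in the documented order.
REQUIRED_SECTIONS = [
    "Test Type",
    "Purpose",
    "Preconditions",
    "Inputs",
    "Execution Parameters",
    "Execution Steps",
    "Validation Commands",
    "Expected Outcomes",
    "Assertions",
    "Cleanup",
]

# Matches an ATX heading such as "## Purpose" and captures its text.
HEADING = re.compile(r"^#{1,6}[ \t]+(.*?)[ \t]*$", re.MULTILINE)


def missing_sections(case_markdown: str) -> list[str]:
    """Return the required template sections that have no matching heading."""
    # Strip surrounding backticks so "## `Purpose`" also counts (an assumption).
    found = {match.group(1).strip("` ") for match in HEADING.finditer(case_markdown)}
    return [name for name in REQUIRED_SECTIONS if name not in found]
```

An empty result means every required heading was found; any returned names identify sections to add before the case is run.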