# Repo Memory Skill Test Plan

## Purpose

This directory tracks human-readable test plans for the `skills/repo-memory/`
Codex skill bundle.

These documents are not direct CLI command-contract specs for `repo-memory`.
That coverage now lives under [../repo-memory/](../repo-memory/).

These documents are also not package-level unit tests for the runtime.
Those live under `packages/repo-memory-runtime/`.

This directory covers a different surface:

- whether an agent can actually use the packaged `repo-memory` skill
- whether the bundled `./assets/repo-memory` CLI works inside real skill-guided
  repository work
- whether durable repository knowledge is stored and retrieved correctly

## Test Model

- `README.md` is the index for this directory
- each skill test case lives in its own Markdown file
- use stable case slugs in filenames

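
The filename convention above can be checked mechanically. The sketch below is only an illustration: the helper name `slug_from_filename` is hypothetical, and the lowercase kebab-case rule is an assumption inferred from the case filenames listed in this README, not a documented requirement.

```python
import re
from pathlib import PurePosixPath

# Assumption: slugs are lowercase kebab-case, as the listed case files suggest.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def slug_from_filename(filename):
    """Return the case slug for a case file, or None if the name does not qualify."""
    path = PurePosixPath(filename)
    # README.md is the index, not a case file.
    if path.suffix != ".md" or path.name == "README.md":
        return None
    return path.stem if SLUG_PATTERN.fullmatch(path.stem) else None
```

A runner could use this to reject a case file whose name would not produce a stable slug.
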
## Shared Execution Contract

Use these defaults unless a case file explicitly overrides them:

- run the scenario with one real agent using the bundled `repo-memory` skill
- create an isolated temporary directory, repository fixture, and SQLite DB path
- require the agent to use the bundled `./assets/repo-memory` CLI instead of ad hoc
  notes
- validate final database state independently from the main thread after the
  agent stops

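
Independent validation can be sketched without assuming anything about the repo-memory schema: open the case database read-only from the main thread and list what it contains. The function name `list_tables` is hypothetical, and real cases would run their own `Validation Commands` rather than this generic check.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a SQLite file without modifying it."""
    # mode=ro makes SQLite refuse writes, so validation cannot alter the evidence.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Opening the file read-only keeps the validation step from changing the state it is supposed to observe.
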
## How An Agent Runs These Cases

Use one test-runner agent to execute each case.

The test-runner agent is responsible for:

- reading this `README.md` first, then one specific case file
- creating an isolated temporary directory, repository fixture, and SQLite DB path
- injecting `skills/repo-memory/` into the role agent
- passing the concrete `SKILL_PATH`, `TMPDIR`, `DB_PATH`, and `REPO_PATH` values from the case file
- requiring the role agent to use the bundled `./assets/repo-memory` CLI instead of free-form notes
- collecting the role agent's final summary as evidence
- running the case's `Validation Commands` from the main thread after the role agent stops
- comparing the observed results against `Expected Outcomes` and `Assertions`

The role agent is responsible for:

- acting only within the case scope
- using the injected `repo-memory` skill rather than ad hoc repository discovery
- coordinating through the bundled CLI and SQLite DB
- reporting concrete keys, entry ids, and final observed state back to the test-runner agent

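
The isolation step can be sketched as follows, assuming the runner generates the paths itself when a case file does not pin them. The helper name `make_case_workspace`, the `repo` subdirectory, and the `repo-memory.sqlite` file name are all hypothetical. Only the four value names come from this README.

```python
import tempfile
from pathlib import Path


def make_case_workspace(case_slug, skill_path="skills/repo-memory"):
    """Create a fresh directory tree for one case and return the values for the role agent."""
    tmpdir = Path(tempfile.mkdtemp(prefix=f"{case_slug}-"))
    repo_path = tmpdir / "repo"              # hypothetical fixture location
    repo_path.mkdir()
    db_path = tmpdir / "repo-memory.sqlite"  # hypothetical file name; not created here
    return {
        "SKILL_PATH": skill_path,
        "TMPDIR": str(tmpdir),
        "DB_PATH": str(db_path),
        "REPO_PATH": str(repo_path),
    }
```

Each call yields a new directory, so two runs of the same case never share a database or fixture.
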
## Default Timeouts

Use these defaults unless a case file explicitly overrides them:

- per-agent timeout: `3m`
- overall scenario timeout: `4m`

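
One way to combine the two limits is to give each agent process the smaller of its own timeout and whatever remains of the scenario budget. This is a sketch under that assumption. The names `remaining_budget` and `run_with_timeouts` are hypothetical, and `argv` stands for whatever command launches the agent.

```python
import subprocess
import time

PER_AGENT_TIMEOUT_S = 3 * 60  # `3m` default from this README
SCENARIO_TIMEOUT_S = 4 * 60   # `4m` default from this README


def remaining_budget(scenario_start, now,
                     per_agent=PER_AGENT_TIMEOUT_S, scenario=SCENARIO_TIMEOUT_S):
    """Seconds an agent may still run: its own limit, capped by what is left of the scenario."""
    return max(0.0, min(per_agent, scenario - (now - scenario_start)))


def run_with_timeouts(argv, scenario_start):
    """Run one command under both limits and report whether it timed out."""
    budget = remaining_budget(scenario_start, time.monotonic())
    try:
        completed = subprocess.run(argv, capture_output=True, text=True, timeout=budget)
    except subprocess.TimeoutExpired:
        return {"timed_out": True, "returncode": None, "stdout": ""}
    return {"timed_out": False, "returncode": completed.returncode, "stdout": completed.stdout}
```

`scenario_start` is expected to be a `time.monotonic()` reading taken when the scenario begins, so wall-clock adjustments do not affect the budget.
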
## Default Failure Conditions

Treat the test as failed if any of the following happens:

- the role agent does not reach a final state before timeout
- a required bundled CLI command returns a non-success result unless the case expects that failure
- the final repo-memory DB state conflicts with the documented assertions
- the role agent falls back to free-form notes for durable knowledge that should go through the bundled CLI

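
The four conditions above can be expressed as one check over what the runner observed. The sketch below assumes the runner records these facts in a small structure. `CliResult`, `RunObservation`, and `failure_reasons` are hypothetical names, not part of the skill or the runtime.

```python
from dataclasses import dataclass, field


@dataclass
class CliResult:
    command: str                    # label for the bundled CLI call, e.g. "verify"
    succeeded: bool
    failure_expected: bool = False  # set when the case documents this failure


@dataclass
class RunObservation:
    reached_final_state: bool
    cli_results: list = field(default_factory=list)
    failed_assertions: list = field(default_factory=list)
    used_free_form_notes: bool = False


def failure_reasons(run):
    """Return every default failure condition the run triggered; empty means pass."""
    reasons = []
    if not run.reached_final_state:
        reasons.append("role agent did not reach a final state before timeout")
    for result in run.cli_results:
        if not result.succeeded and not result.failure_expected:
            reasons.append(f"unexpected CLI failure: {result.command}")
    for assertion in run.failed_assertions:
        reasons.append(f"DB state conflicts with assertion: {assertion}")
    if run.used_free_form_notes:
        reasons.append("durable knowledge went to free-form notes instead of the bundled CLI")
    return reasons
```

Returning all reasons rather than stopping at the first keeps the failure report useful for debugging.
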
## Evidence Capture

Collect at least the following artifacts for every run:

- the role agent's final summary
- the temporary DB path and repository path
- the outputs of the case's `Validation Commands`
- any resolved entry ids, keys, or relation rows needed to verify the case

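
A run record holding these artifacts might look like the sketch below. The `RunEvidence` class, its field names, and the JSON layout are assumptions made for illustration. This README does not prescribe a storage format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunEvidence:
    case_slug: str
    final_summary: str              # the role agent's final summary, verbatim
    db_path: str
    repo_path: str
    validation_outputs: dict = field(default_factory=dict)    # command -> captured output
    resolved_identifiers: list = field(default_factory=list)  # entry ids, keys, relation rows


def evidence_to_json(evidence):
    """Serialize one run's evidence so it can be stored next to the case results."""
    return json.dumps(asdict(evidence), indent=2, sort_keys=True)
```

Keeping the record as plain JSON makes it easy to attach under a case's `Recorded Example Run`.
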
## Cleanup Policy

Use these defaults unless a case file explicitly overrides them:

- keep the temporary DB and repo fixture on failure for debugging
- clean up on success only if replay artifacts are not needed

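
The default policy reduces to a single decision: remove the workspace only when the run passed and nobody needs the artifacts for replay. A minimal sketch follows, with hypothetical helper names and the assumption that the DB and fixture both live under one temporary directory.

```python
import shutil
from pathlib import Path


def should_remove_workspace(passed, keep_replay_artifacts=False):
    """Failures are always kept for debugging; successes are kept only on request."""
    return passed and not keep_replay_artifacts


def cleanup_workspace(tmpdir, passed, keep_replay_artifacts=False):
    """Remove tmpdir when the policy allows it; return True if it was removed."""
    if not should_remove_workspace(passed, keep_replay_artifacts):
        return False
    shutil.rmtree(Path(tmpdir), ignore_errors=True)
    return True
```
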
## Per-Case Template

Each case file should use this structure:

- `Test Type`
- `Purpose`
- `Preconditions`
- `Inputs`
- `Execution Parameters`
- `Execution Steps`
- `Validation Commands`
- `Expected Outcomes`
- `Assertions`
- `Cleanup`
- `Recorded Example Run` when a real run has already been captured

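
A case file can be linted against this template. The sketch below assumes each template item appears in the case file as a Markdown heading with exactly that title, which this README does not state. It also treats `Recorded Example Run` as optional. The function names are hypothetical.

```python
REQUIRED_SECTIONS = [
    "Test Type", "Purpose", "Preconditions", "Inputs", "Execution Parameters",
    "Execution Steps", "Validation Commands", "Expected Outcomes", "Assertions",
    "Cleanup",
]


def heading_titles(markdown):
    """Collect heading titles from Markdown text (simplistic: ignores code fences)."""
    titles = set()
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            titles.add(stripped.lstrip("#").strip().strip("`"))
    return titles


def missing_sections(case_markdown):
    """Return the required template sections that the case file lacks, in template order."""
    found = heading_titles(case_markdown)
    return [name for name in REQUIRED_SECTIONS if name not in found]
```

An empty result means the case file has every required section.
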
## Case Files

| Case Slug | File | Coverage Note |
| --- | --- | --- |
| `search-and-add-through-bundled-cli` | [search-and-add-through-bundled-cli.md](./search-and-add-through-bundled-cli.md) | validates that an agent can miss on search, add one durable entry, then retrieve it through the packaged `repo-memory` skill |
| `ingest-and-search-through-bundled-cli` | [ingest-and-search-through-bundled-cli.md](./ingest-and-search-through-bundled-cli.md) | validates that an agent can ingest `docs/ai` markdown through the bundled CLI and then retrieve imported knowledge through search and list |
| `verify-downgrade-after-file-change-through-bundled-cli` | [verify-downgrade-after-file-change-through-bundled-cli.md](./verify-downgrade-after-file-change-through-bundled-cli.md) | validates that an agent can record confirmed knowledge, mutate the tracked file, run verify, and observe a `needs_review` downgrade |
| `verify-stale-missing-hard-dependency-through-bundled-cli` | [verify-stale-missing-hard-dependency-through-bundled-cli.md](./verify-stale-missing-hard-dependency-through-bundled-cli.md) | validates that an agent can detect a missing hard dependency through `verify` and observe a `stale` result |
| `link-two-entries-through-bundled-cli` | [link-two-entries-through-bundled-cli.md](./link-two-entries-through-bundled-cli.md) | validates that an agent can add two entries, link them, and leave a durable relation in the packaged repo-memory database |

## Scope

In scope:

- explicit `$repo-memory` skill invocation
- bundled `./assets/repo-memory` CLI usage
- durable knowledge add/search/list/event flows
- markdown ingest through `docs/ai`
- verify downgrade and stale transitions
- entry relation/link flows
- package-backed SQLite memory database behavior as surfaced through the skill

Out of scope:

- direct CLI contract coverage that now belongs under [../repo-memory/](../repo-memory/)
- package-level unit tests for `packages/repo-memory-runtime`
- future auto-export flows such as `repo-brief` generation
- implicit skill triggering without `$repo-memory`