diff --git a/docs/tests/repo-memory-skill/README.md b/docs/tests/repo-memory-skill/README.md
index 98e5c0f..47be397 100644
--- a/docs/tests/repo-memory-skill/README.md
+++ b/docs/tests/repo-memory-skill/README.md
@@ -35,6 +35,60 @@ Use these defaults unless a case file explicitly overrides them:
 - validate final database state independently from the main thread after the agent stops
 
+## How An Agent Runs These Cases
+
+Use one test-runner agent to execute each case.
+
+The test-runner agent is responsible for:
+
+- reading this `README.md` first, then one specific case file
+- creating an isolated temporary directory, repository fixture, and SQLite DB path
+- injecting `skills/repo-memory/` into the role agent
+- passing the concrete `SKILL_PATH`, `TMPDIR`, `DB_PATH`, and `REPO_PATH` values from the case file
+- requiring the role agent to use the bundled `./assets/repo-memory` CLI instead of free-form notes
+- collecting the role agent final summary as evidence
+- running the case `Validation Commands` from the main thread after the role agent stops
+- comparing the observed results against `Expected Outcomes` and `Assertions`
+
+The role agent is responsible for:
+
+- acting only within the case scope
+- using the injected `repo-memory` skill rather than ad hoc repository discovery
+- coordinating through the bundled CLI and SQLite DB
+- reporting concrete keys, entry ids, and final observed state back to the test-runner agent
+
+## Default Timeouts
+
+Use these defaults unless a case file explicitly overrides them:
+
+- per-agent timeout: `3m`
+- overall scenario timeout: `4m`
+
+## Default Failure Conditions
+
+Treat the test as failed if any of the following happens:
+
+- the role agent does not reach a final state before timeout
+- a required bundled CLI command returns a non-success result unless the case expects that failure
+- the final repo-memory DB state conflicts with the documented assertions
+- the role agent falls back to free-form notes for durable knowledge that should go through the bundled CLI
+
+## Evidence Capture
+
+Collect at least the following artifacts for every run:
+
+- the role agent final summary
+- the temporary DB path and repository path
+- the outputs of the case `Validation Commands`
+- any resolved entry ids, keys, or relation rows needed to verify the case
+
+## Cleanup Policy
+
+Use these defaults unless a case file explicitly overrides them:
+
+- keep the temporary DB and repo fixture on failure for debugging
+- cleanup on success only if replay artifacts are not needed
+
 ## Per-Case Template
 
 Each case file should use this structure:
@@ -56,6 +110,10 @@ Each case file should use this structure:
 | Case Slug | File | Coverage Note |
 | --- | --- | --- |
 | `search-and-add-through-bundled-cli` | [search-and-add-through-bundled-cli.md](./search-and-add-through-bundled-cli.md) | validates that an agent can miss on search, add one durable entry, then retrieve it through the packaged `repo-memory` skill |
+| `ingest-and-search-through-bundled-cli` | [ingest-and-search-through-bundled-cli.md](./ingest-and-search-through-bundled-cli.md) | validates that an agent can ingest `docs/ai` markdown through the bundled CLI and then retrieve imported knowledge through search and list |
+| `verify-downgrade-after-file-change-through-bundled-cli` | [verify-downgrade-after-file-change-through-bundled-cli.md](./verify-downgrade-after-file-change-through-bundled-cli.md) | validates that an agent can record confirmed knowledge, mutate the tracked file, run verify, and observe a `needs_review` downgrade |
+| `verify-stale-missing-hard-dependency-through-bundled-cli` | [verify-stale-missing-hard-dependency-through-bundled-cli.md](./verify-stale-missing-hard-dependency-through-bundled-cli.md) | validates that an agent can detect a missing hard dependency through `verify` and observe a `stale` result |
+| `link-two-entries-through-bundled-cli` | [link-two-entries-through-bundled-cli.md](./link-two-entries-through-bundled-cli.md) | validates that an agent can add two entries, link them, and leave a durable relation in the packaged repo-memory database |
 
 ## Scope
 
@@ -64,6 +122,9 @@ In scope:
 - explicit `$repo-memory` skill invocation
 - bundled `./assets/repo-memory` CLI usage
 - durable knowledge add/search/list/event flows
+- markdown ingest through `docs/ai`
+- verify downgrade and stale transitions
+- entry relation/link flows
 - package-backed SQLite memory database behavior as surfaced through the skill
 
 Out of scope:
diff --git a/docs/tests/repo-memory-skill/ingest-and-search-through-bundled-cli.md b/docs/tests/repo-memory-skill/ingest-and-search-through-bundled-cli.md
new file mode 100644
index 0000000..6a02d6d
--- /dev/null
+++ b/docs/tests/repo-memory-skill/ingest-and-search-through-bundled-cli.md
@@ -0,0 +1,72 @@
+# Ingest And Search Through Bundled CLI
+
+## Test Type
+
+- forward skill execution
+
+## Purpose
+
+- validate that a single agent can use `skills/repo-memory/` to ingest
+  repository-local `docs/ai` markdown through the bundled CLI and retrieve the
+  imported knowledge afterwards through `search` and `list`
+
+## Preconditions
+
+- `skills/repo-memory/assets/repo-memory` exists and is executable
+- the test runner can create a temporary Git repository fixture
+- the test runner can create a temporary SQLite DB path
+- the repository fixture includes one `docs/ai/repo-memory.md` file with at
+  least `Module Map` and `Danger Zones` sections
+
+## Inputs
+
+- `SKILL_PATH=/.../skills/repo-memory`
+- `TMPDIR=/tmp/...`
+- `DB_PATH=TMPDIR/repo-memory.db`
+- `REPO_PATH=TMPDIR/repo-fixture`
+
+## Execution Parameters
+
+- one agent only
+- per-agent timeout: `3m`
+- overall timeout: `4m`
+
+## Execution Steps
+
+1. Create a temporary Git repository fixture under `REPO_PATH`.
+2. Add `docs/ai/repo-memory.md` with markdown content that describes module and
+   danger knowledge.
+3. Ask the agent to use `$repo-memory` against `DB_PATH`.
+4. Have the agent initialize or bootstrap the DB as needed, run `ingest`
+   against `REPO_PATH`, then use `search` and `list` to confirm the imported
+   knowledge is visible.
+5. Capture the agent summary and the concrete imported entry keys it reports.
+
+## Validation Commands
+
+Run these from the main thread after the agent stops:
+
+```bash
+SKILL_PATH/assets/repo-memory ingest --db DB_PATH --repo REPO_PATH
+SKILL_PATH/assets/repo-memory search --db DB_PATH --repo REPO_PATH --query "gateway"
+SKILL_PATH/assets/repo-memory list --db DB_PATH --repo REPO_PATH
+```
+
+## Expected Outcomes
+
+- `ingest` succeeds and reports one imported markdown document
+- `search` returns the imported `module` entry for the `Module Map` section
+- `list` returns at least one `module` entry and one `danger` entry for the
+  fixture repo
+
+## Assertions
+
+- the agent used the bundled CLI instead of copying markdown into ad hoc notes
+- the imported knowledge is attached to the target repo path
+- the imported keys match the expected `repo-memory:` style generated from
+  the markdown sections
+
+## Cleanup
+
+- keep the temporary DB and repo on failure
+- remove temporary artifacts on success only if replay evidence is not needed
diff --git a/docs/tests/repo-memory-skill/link-two-entries-through-bundled-cli.md b/docs/tests/repo-memory-skill/link-two-entries-through-bundled-cli.md
new file mode 100644
index 0000000..90fc1ce
--- /dev/null
+++ b/docs/tests/repo-memory-skill/link-two-entries-through-bundled-cli.md
@@ -0,0 +1,69 @@
+# Link Two Entries Through Bundled CLI
+
+## Test Type
+
+- forward skill execution
+
+## Purpose
+
+- validate that a single agent can use `skills/repo-memory/` to add two durable
+  knowledge entries, create a relation between them through the bundled CLI,
+  and leave a durable graph edge in the SQLite database
+
+## Preconditions
+
+- `skills/repo-memory/assets/repo-memory` exists and is executable
+- the test runner can create a temporary Git repository fixture
+- the test runner can create a temporary SQLite DB path
+- the repository fixture includes any evidence files needed for the two entries
+
+## Inputs
+
+- `SKILL_PATH=/.../skills/repo-memory`
+- `TMPDIR=/tmp/...`
+- `DB_PATH=TMPDIR/repo-memory.db`
+- `REPO_PATH=TMPDIR/repo-fixture`
+
+## Execution Parameters
+
+- one agent only
+- per-agent timeout: `3m`
+- overall timeout: `4m`
+
+## Execution Steps
+
+1. Create a temporary Git repository fixture under `REPO_PATH`.
+2. Add any files needed to justify two durable knowledge entries.
+3. Ask the agent to use `$repo-memory` against `DB_PATH`.
+4. Have the agent add one `term` entry and one `chain` entry for the same repo.
+5. Have the agent link the first entry to the second with relation
+   `related_to`.
+6. Capture the agent summary and the concrete entry ids it reports.
+
+## Validation Commands
+
+Run these from the main thread after the agent stops:
+
+```bash
+SKILL_PATH/assets/repo-memory list --db DB_PATH --repo REPO_PATH
+SKILL_PATH/assets/repo-memory events --db DB_PATH --id 1
+SKILL_PATH/assets/repo-memory events --db DB_PATH --id 2
+sqlite3 DB_PATH "SELECT relation FROM knowledge_links WHERE from_entry_id = 1 AND to_entry_id = 2;"
+```
+
+## Expected Outcomes
+
+- both `add` calls succeed and leave two queryable entries
+- `link` succeeds and reports the relation textually
+- the final SQL validation returns one `related_to` row
+
+## Assertions
+
+- the agent used the bundled CLI for entry creation and relation creation
+- the relation is durable in the packaged SQLite DB, not just mentioned in the summary
+- both entries remain independently inspectable through `events`
+
+## Cleanup
+
+- keep the temporary DB and repo on failure
+- remove temporary artifacts on success only if replay evidence is not needed
diff --git a/docs/tests/repo-memory-skill/verify-downgrade-after-file-change-through-bundled-cli.md b/docs/tests/repo-memory-skill/verify-downgrade-after-file-change-through-bundled-cli.md
new file mode 100644
index 0000000..0b82180
--- /dev/null
+++ b/docs/tests/repo-memory-skill/verify-downgrade-after-file-change-through-bundled-cli.md
@@ -0,0 +1,71 @@
+# Verify Downgrade After File Change Through Bundled CLI
+
+## Test Type
+
+- forward skill execution
+
+## Purpose
+
+- validate that a single agent can use `skills/repo-memory/` to record
+  confirmed knowledge with a hard file dependency, change that file, run
+  `verify`, and observe the expected `needs_review` downgrade
+
+## Preconditions
+
+- `skills/repo-memory/assets/repo-memory` exists and is executable
+- the test runner can create a temporary Git repository fixture
+- the repository fixture contains one evidence file committed in Git before the
+  agent starts
+- the test runner can modify the evidence file before or during the scenario
+
+## Inputs
+
+- `SKILL_PATH=/.../skills/repo-memory`
+- `TMPDIR=/tmp/...`
+- `DB_PATH=TMPDIR/repo-memory.db`
+- `REPO_PATH=TMPDIR/repo-fixture`
+- `EVIDENCE_PATH=REPO_PATH/foo.txt`
+
+## Execution Parameters
+
+- one agent only
+- per-agent timeout: `3m`
+- overall timeout: `4m`
+
+## Execution Steps
+
+1. Create a temporary Git repository fixture under `REPO_PATH`.
+2. Commit one evidence file at `EVIDENCE_PATH`.
+3. Ask the agent to use `$repo-memory` against `DB_PATH`.
+4. Have the agent add one `confirmed` entry that depends on `EVIDENCE_PATH`.
+5. Mutate `EVIDENCE_PATH` after the entry is recorded.
+6. Have the agent run `verify`, then inspect the result with `list` and
+   `events`.
+7. Capture the agent summary and the final entry status it reports.
+
+## Validation Commands
+
+Run these from the main thread after the agent stops:
+
+```bash
+SKILL_PATH/assets/repo-memory verify --db DB_PATH --repo REPO_PATH
+SKILL_PATH/assets/repo-memory list --db DB_PATH --repo REPO_PATH --status needs_review
+SKILL_PATH/assets/repo-memory events --db DB_PATH --id 1
+```
+
+## Expected Outcomes
+
+- `verify` reports one downgraded entry
+- `list` returns the target entry in `needs_review`
+- `events` includes a `downgraded` event for the target entry
+
+## Assertions
+
+- the agent used the bundled CLI for both the write and the verification flow
+- the downgrade reason is driven by real repository state, not by chat-only reasoning
+- the final state transition is visible both in the current listing and the event history
+
+## Cleanup
+
+- keep the temporary DB and repo on failure
+- remove temporary artifacts on success only if replay evidence is not needed
diff --git a/docs/tests/repo-memory-skill/verify-stale-missing-hard-dependency-through-bundled-cli.md b/docs/tests/repo-memory-skill/verify-stale-missing-hard-dependency-through-bundled-cli.md
new file mode 100644
index 0000000..08d0ae3
--- /dev/null
+++ b/docs/tests/repo-memory-skill/verify-stale-missing-hard-dependency-through-bundled-cli.md
@@ -0,0 +1,70 @@
+# Verify Stale Missing Hard Dependency Through Bundled CLI
+
+## Test Type
+
+- forward skill execution
+
+## Purpose
+
+- validate that a single agent can use `skills/repo-memory/` to record
+  confirmed knowledge with a missing hard dependency, run `verify`, and observe
+  the expected `stale` outcome
+
+## Preconditions
+
+- `skills/repo-memory/assets/repo-memory` exists and is executable
+- the test runner can create a temporary Git repository fixture
+- the repository fixture has a valid Git HEAD before verification starts
+- the hard dependency path referenced by the entry does not exist
+
+## Inputs
+
+- `SKILL_PATH=/.../skills/repo-memory`
+- `TMPDIR=/tmp/...`
+- `DB_PATH=TMPDIR/repo-memory.db`
+- `REPO_PATH=TMPDIR/repo-fixture`
+- `MISSING_PATH=REPO_PATH/missing.txt`
+
+## Execution Parameters
+
+- one agent only
+- per-agent timeout: `3m`
+- overall timeout: `4m`
+
+## Execution Steps
+
+1. Create a temporary Git repository fixture under `REPO_PATH` and ensure it
+   has an initial commit.
+2. Ask the agent to use `$repo-memory` against `DB_PATH`.
+3. Have the agent add one `confirmed` entry that declares `MISSING_PATH` as a
+   hard dependency.
+4. Have the agent run `verify`, then inspect the result with `list` and
+   `events`.
+5. Capture the agent summary and the final entry status it reports.
+
+## Validation Commands
+
+Run these from the main thread after the agent stops:
+
+```bash
+SKILL_PATH/assets/repo-memory verify --db DB_PATH --repo REPO_PATH
+SKILL_PATH/assets/repo-memory list --db DB_PATH --repo REPO_PATH --status stale
+SKILL_PATH/assets/repo-memory events --db DB_PATH --id 1
+```
+
+## Expected Outcomes
+
+- `verify` reports one stale entry
+- `list` returns the target entry in `stale`
+- `events` includes a `marked_stale` event for the target entry
+
+## Assertions
+
+- the agent used the bundled CLI for the full verify flow
+- the stale result is driven by the missing hard dependency, not by a generic command failure
+- the final state is visible in both current listing output and event history
+
+## Cleanup
+
+- keep the temporary DB and repo on failure
+- remove temporary artifacts on success only if replay evidence is not needed
diff --git a/skills/repo-memory/assets/repo-memory b/skills/repo-memory/assets/repo-memory
index a03b005..9b221e7 100755
Binary files a/skills/repo-memory/assets/repo-memory and b/skills/repo-memory/assets/repo-memory differ
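
Reviewer note: the fixture side of the `verify-stale-missing-hard-dependency-through-bundled-cli` case (step 1 plus its preconditions) could be prepared roughly as follows. This is a minimal sketch under assumptions, not the project's test harness: `RUN_DIR` is a hypothetical stand-in for the `TMPDIR` placeholder, the commit identity and message are made up, and the sketch stops before any `repo-memory` call because only the flags shown in the case files are known.

```bash
#!/usr/bin/env bash
set -euo pipefail

# RUN_DIR stands in for the TMPDIR placeholder from the case Inputs. The
# separate name is an assumption of this sketch; it avoids overriding the
# shell's own TMPDIR variable.
RUN_DIR="$(mktemp -d)"
DB_PATH="$RUN_DIR/repo-memory.db"
REPO_PATH="$RUN_DIR/repo-fixture"
MISSING_PATH="$REPO_PATH/missing.txt"

# Step 1: create the fixture repository and give it an initial commit so that
# HEAD is valid before verification starts. The identity is illustrative only.
mkdir -p "$REPO_PATH"
git -C "$REPO_PATH" init -q
git -C "$REPO_PATH" \
  -c user.name="fixture" \
  -c user.email="fixture@example.invalid" \
  -c commit.gpgsign=false \
  commit -q --allow-empty -m "initial fixture commit"

# Preconditions: HEAD resolves, and the hard dependency path does not exist.
git -C "$REPO_PATH" rev-parse --verify -q HEAD >/dev/null
[ ! -e "$MISSING_PATH" ]

# Print the concrete values that replace the placeholders in the case file.
printf 'DB_PATH=%s\nREPO_PATH=%s\nMISSING_PATH=%s\n' \
  "$DB_PATH" "$REPO_PATH" "$MISSING_PATH"
```

The printed paths are the values a test runner would hand to the role agent and later substitute into the case `Validation Commands`. Because of `set -e`, the script exits non-zero if either precondition does not hold.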