# Verify Downgrade After File Change Through Bundled CLI
## Test Type

- forward skill execution
## Purpose

- validate that a single agent can use `skills/repo-memory/` to record
  confirmed knowledge with a hard file dependency, change that file, run
  `verify`, and observe the expected `needs_review` downgrade
## Preconditions

- `skills/repo-memory/assets/repo-memory` exists and is executable
- the test runner can create a temporary Git repository fixture
- the repository fixture contains one evidence file committed in Git before the
  agent starts
- the test runner can modify the evidence file before or during the scenario
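
The first two preconditions can be checked mechanically before the agent starts. The sketch below is one possible guard, not part of the skill itself: `check_preconditions` is a hypothetical helper name, and the CLI path is passed in by the caller.

```bash
# check_preconditions CLI_PATH
# Hypothetical helper: fail fast when a precondition is not met.
check_preconditions() {
  cli="$1"
  # The bundled CLI must exist and be executable.
  if [ ! -x "$cli" ]; then
    echo "missing or not executable: $cli" >&2
    return 1
  fi
  # The runner needs git to build the repository fixture.
  if ! command -v git >/dev/null 2>&1; then
    echo "git not found on PATH" >&2
    return 1
  fi
  return 0
}
```

A runner could call `check_preconditions "$SKILL_PATH/assets/repo-memory"` and abort the scenario on a non-zero status.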
## Inputs

- `SKILL_PATH=/.../skills/repo-memory`
- `TMPDIR=/tmp/...`
- `DB_PATH=$TMPDIR/repo-memory.db`
- `REPO_PATH=$TMPDIR/repo-fixture`
- `EVIDENCE_PATH=$REPO_PATH/foo.txt`
## Execution Parameters

- one agent only
- per-agent timeout: `3m`
- overall timeout: `4m`
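
How the runner enforces these limits is not specified here. As one possibility, the sketch below wraps a command in GNU coreutils `timeout`, which accepts durations such as `3m` and exits with status 124 when the limit is hit. `run_limited` is a hypothetical helper, and `run-agent` in the comment is a placeholder for whatever executable starts the agent.

```bash
# run_limited LIMIT COMMAND [ARGS...]
# Hypothetical helper: run COMMAND under GNU coreutils `timeout`.
# Exit status 124 means the limit was reached and the command was stopped.
run_limited() {
  limit="$1"
  shift
  timeout "$limit" "$@"
}

# Possible use for the per-agent limit (run-agent is a placeholder executable):
#   run_limited 3m run-agent ...
```

The `4m` overall limit could be applied the same way around the script that drives the whole scenario.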
## Execution Steps

1. Create a temporary Git repository fixture under `REPO_PATH`.
2. Commit one evidence file at `EVIDENCE_PATH`.
3. Ask the agent to use `$repo-memory` against `DB_PATH`.
4. Have the agent add one `confirmed` entry that depends on `EVIDENCE_PATH`.
5. Mutate `EVIDENCE_PATH` after the entry is recorded.
6. Have the agent run `verify`, then inspect the result with `list` and
   `events`.
7. Capture the agent summary and the final entry status it reports.
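
The runner-side steps (1, 2, and 5) can be sketched with plain Git commands. This is a minimal illustration under stated assumptions: `mktemp -d` stands in for the elided `/tmp/...` location, the file contents and commit identity are invented for the fixture, and the mutation is left uncommitted because the steps do not say whether `verify` needs it committed. The agent-driven steps are not scripted.

```bash
# Sketch of the runner-side fixture work (steps 1, 2, and 5).
set -e

TMPDIR="$(mktemp -d)"   # assumed stand-in for the elided /tmp/... path
REPO_PATH="$TMPDIR/repo-fixture"
EVIDENCE_PATH="$REPO_PATH/foo.txt"

# Step 1: create the temporary Git repository.
git init -q "$REPO_PATH"

# Step 2: commit one evidence file. The identity is set inline so the
# commit does not depend on the machine's global Git configuration.
printf 'original evidence\n' > "$EVIDENCE_PATH"
git -C "$REPO_PATH" add foo.txt
git -C "$REPO_PATH" -c user.name='fixture' -c user.email='fixture@example.invalid' \
  commit -q -m 'add evidence file'

# Steps 3 and 4 happen here: the agent records its confirmed entry.

# Step 5: mutate the evidence file after the entry is recorded.
printf 'changed evidence\n' >> "$EVIDENCE_PATH"

# Show that the working tree now differs from the committed version.
git -C "$REPO_PATH" status --porcelain
```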
## Validation Commands

Run these from the main thread after the agent stops:

```bash
# Set SKILL_PATH, DB_PATH, and REPO_PATH from the Inputs section first.
"$SKILL_PATH/assets/repo-memory" verify --db "$DB_PATH" --repo "$REPO_PATH"
"$SKILL_PATH/assets/repo-memory" list --db "$DB_PATH" --repo "$REPO_PATH" --status needs_review
# --id 1 assumes the single entry recorded in this scenario has ID 1.
"$SKILL_PATH/assets/repo-memory" events --db "$DB_PATH" --id 1
```
## Expected Outcomes

- `verify` reports one downgraded entry
- `list` returns the target entry in `needs_review`
- `events` includes a `downgraded` event for the target entry
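
These outcomes could be checked by searching the captured command output for the expected status and event names. The CLI's output format is not documented here, so the sketch below assumes only that the literal strings `needs_review` and `downgraded` appear somewhere in the text. `assert_contains` is a hypothetical helper.

```bash
# assert_contains LABEL TEXT NEEDLE
# Hypothetical helper: succeed when NEEDLE occurs literally in TEXT,
# otherwise report LABEL on stderr and fail.
assert_contains() {
  if printf '%s\n' "$2" | grep -qF -- "$3"; then
    return 0
  fi
  echo "FAIL: $1: expected to find '$3'" >&2
  return 1
}

# Possible use once the inputs are set (the output format is an assumption):
#   out="$("$SKILL_PATH/assets/repo-memory" list --db "$DB_PATH" --repo "$REPO_PATH" --status needs_review)"
#   assert_contains list "$out" needs_review
#   out="$("$SKILL_PATH/assets/repo-memory" events --db "$DB_PATH" --id 1)"
#   assert_contains events "$out" downgraded
```

A substring match is deliberately loose. If the CLI offers a structured output mode, parsing that would be a stricter check.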
## Assertions

- the agent used the bundled CLI for both the write and the verification flow
- the downgrade reason is driven by real repository state, not by chat-only reasoning
- the final state transition is visible both in the current listing and the event history
## Cleanup

- keep the temporary DB and repo on failure
- remove temporary artifacts on success only if replay evidence is not needed
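
The cleanup policy can be expressed as a small function that decides from the scenario's exit status. This is a sketch: `cleanup_artifacts` and its `KEEP` argument are hypothetical names, not part of the skill.

```bash
# cleanup_artifacts STATUS DIR [KEEP]
# STATUS: exit status of the scenario (0 means success).
# DIR:    directory holding the temporary DB and repo.
# KEEP:   pass 1 to retain artifacts after a success (replay evidence needed).
cleanup_artifacts() {
  status="$1"
  dir="$2"
  keep="${3:-0}"
  # Refuse to act on an empty path.
  [ -n "$dir" ] || return 1
  if [ "$status" -ne 0 ]; then
    echo "scenario failed; keeping $dir for inspection" >&2
    return 0
  fi
  if [ "$keep" = "1" ]; then
    echo "keeping $dir as replay evidence" >&2
    return 0
  fi
  rm -rf -- "$dir"
}
```

One way to wire it in is an exit trap such as `trap 'cleanup_artifacts "$?" "$TMPDIR"' EXIT`, so the decision uses the status the script actually exits with.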