Verify Downgrade After File Change Through Bundled CLI
Test Type
- forward skill execution
Purpose
- validate that a single agent can use skills/repo-memory/ to record confirmed knowledge with a hard file dependency, change that file, run verify, and observe the expected needs_review downgrade
Preconditions
- skills/repo-memory/assets/repo-memory exists and is executable
- the test runner can create a temporary Git repository fixture
- the repository fixture contains one evidence file committed in Git before the agent starts
- the test runner can modify the evidence file before or during the scenario
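The first two preconditions can be checked mechanically before the agent starts. The following is a minimal sketch, assuming a POSIX-style executable bit and a git binary on PATH; the function name check_preconditions is hypothetical and not part of the skill.

```python
import os
import shutil
from pathlib import Path


def check_preconditions(skill_path: Path) -> list[str]:
    """Return a list of unmet preconditions (an empty list means ready)."""
    problems = []
    cli = skill_path / "assets" / "repo-memory"
    if not cli.is_file():
        problems.append(f"missing CLI: {cli}")
    elif not os.access(cli, os.X_OK):
        problems.append(f"CLI is not executable: {cli}")
    # Creating and committing the fixture needs a git binary (assumption:
    # the test runner shells out to git rather than using a library).
    if shutil.which("git") is None:
        problems.append("git not found on PATH")
    return problems
```

A runner would typically abort the scenario when the returned list is non-empty, so that a later failure is not mistaken for a missing downgrade.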
Inputs
- SKILL_PATH=/.../skills/repo-memory
- TMPDIR=/tmp/...
- DB_PATH=TMPDIR/repo-memory.db
- REPO_PATH=TMPDIR/repo-fixture
- EVIDENCE_PATH=REPO_PATH/foo.txt
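Only SKILL_PATH and TMPDIR are free inputs; the other three paths are derived from TMPDIR. A small sketch of that derivation follows. The ScenarioPaths class and make_paths function are hypothetical names introduced for illustration, and the elided parts of SKILL_PATH and TMPDIR are left for the caller to supply.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ScenarioPaths:
    skill_path: Path
    tmpdir: Path
    db_path: Path
    repo_path: Path
    evidence_path: Path


def make_paths(skill_path: Path, tmpdir: Path) -> ScenarioPaths:
    """Derive DB_PATH, REPO_PATH and EVIDENCE_PATH from TMPDIR."""
    repo_path = tmpdir / "repo-fixture"
    return ScenarioPaths(
        skill_path=skill_path,
        tmpdir=tmpdir,
        db_path=tmpdir / "repo-memory.db",
        repo_path=repo_path,
        evidence_path=repo_path / "foo.txt",
    )
```

Keeping the paths in one frozen object makes it harder for the setup, the agent prompt, and the validation commands to drift apart.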
Execution Parameters
- one agent only
- per-agent timeout: 3m
- overall timeout: 4m
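The two timeouts interact: the agent may use at most 3m, but never more than what is left of the 4m overall budget. A minimal sketch of that arithmetic, assuming durations are written as an integer followed by s, m, or h; parse_duration and remaining_budget are hypothetical helpers.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a duration such as '3m' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


PER_AGENT_TIMEOUT_S = parse_duration("3m")
OVERALL_TIMEOUT_S = parse_duration("4m")


def remaining_budget(elapsed_s: float) -> float:
    """Seconds an agent started now may run, given time already spent."""
    return max(0.0, min(PER_AGENT_TIMEOUT_S, OVERALL_TIMEOUT_S - elapsed_s))
```

For example, if fixture setup has already consumed 90 seconds, the agent gets 150 seconds rather than the full 180.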
Execution Steps
- Create a temporary Git repository fixture under REPO_PATH.
- Commit one evidence file at EVIDENCE_PATH.
- Ask the agent to use $repo-memory against DB_PATH.
- Have the agent add one confirmed entry that depends on EVIDENCE_PATH.
- Mutate EVIDENCE_PATH after the entry is recorded.
- Have the agent run verify, then inspect the result with list and events.
- Capture the agent summary and the final entry status it reports.
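The runner-side steps (creating the fixture, committing the evidence file, and mutating it later) can be sketched as below. This is an illustration under assumptions: the function names, the file contents, and the commit identity are hypothetical, and this test case does not state whether the mutation must also be committed for verify to notice it, so the sketch leaves the change uncommitted.

```python
import subprocess
from pathlib import Path


def fixture_commands(repo_path: Path, evidence_name: str = "foo.txt") -> list[list[str]]:
    """Git commands that initialise the fixture and commit the evidence file."""
    git = ["git", "-C", str(repo_path)]
    return [
        ["git", "init", "--quiet", str(repo_path)],
        git + ["add", evidence_name],
        # Inline identity so the commit works without global git config.
        git + ["-c", "user.name=fixture", "-c", "user.email=fixture@example.invalid",
               "commit", "--quiet", "-m", "add evidence file"],
    ]


def create_fixture(repo_path: Path, evidence_name: str = "foo.txt") -> Path:
    """Create the repository and commit one evidence file; return its path."""
    init, add, commit = fixture_commands(repo_path, evidence_name)
    subprocess.run(init, check=True)
    evidence_path = repo_path / evidence_name
    evidence_path.write_text("original evidence\n")
    subprocess.run(add, check=True)
    subprocess.run(commit, check=True)
    return evidence_path


def mutate_evidence(evidence_path: Path) -> None:
    """Append a line so the file no longer matches what the entry recorded."""
    with evidence_path.open("a") as handle:
        handle.write("mutated after the entry was recorded\n")
```

The agent-facing steps (adding the entry, running verify, list, and events) stay with the agent, since the scenario asserts that the agent itself used the bundled CLI.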
Validation Commands
Run these from the main thread after the agent stops:
SKILL_PATH/assets/repo-memory verify --db DB_PATH --repo REPO_PATH
SKILL_PATH/assets/repo-memory list --db DB_PATH --repo REPO_PATH --status needs_review
SKILL_PATH/assets/repo-memory events --db DB_PATH --id 1
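The three commands above can be assembled from the scenario inputs so that the placeholders are substituted consistently. The sketch below uses only the subcommands and flags shown above; validation_commands and run_validation are hypothetical helper names, and the default entry id of 1 assumes the target entry is the first one written to a fresh database.

```python
import subprocess
from pathlib import Path


def validation_commands(skill_path: Path, db_path: Path, repo_path: Path,
                        entry_id: int = 1) -> list[list[str]]:
    """Build the verify, list and events invocations as argument lists."""
    cli = str(skill_path / "assets" / "repo-memory")
    db, repo = str(db_path), str(repo_path)
    return [
        [cli, "verify", "--db", db, "--repo", repo],
        [cli, "list", "--db", db, "--repo", repo, "--status", "needs_review"],
        [cli, "events", "--db", db, "--id", str(entry_id)],
    ]


def run_validation(commands: list[list[str]]) -> list[subprocess.CompletedProcess]:
    """Run each command and capture its output as text for later inspection."""
    return [subprocess.run(cmd, capture_output=True, text=True) for cmd in commands]
```

Passing argument lists instead of a shell string avoids quoting problems when TMPDIR contains spaces.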
Expected Outcomes
- verify reports one downgraded entry
- list returns the target entry in needs_review
- events includes a downgraded event for the target entry
Assertions
- the agent used the bundled CLI for both the write and the verification flow
- the downgrade reason is driven by real repository state, not by chat-only reasoning
- the final state transition is visible both in the current listing and the event history
Cleanup
- keep the temporary DB and repo on failure
- remove temporary artifacts on success only if replay evidence is not needed
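The cleanup rule reduces to a single condition: remove the artifacts only when the scenario passed and nobody needs them for replay. A minimal sketch, assuming the DB and the repo both live under one temporary directory; the cleanup function and its keep_for_replay flag are hypothetical.

```python
import shutil
from pathlib import Path


def cleanup(tmpdir: Path, passed: bool, keep_for_replay: bool = False) -> bool:
    """Remove tmpdir only on success without replay needs; return True if removed."""
    if not passed or keep_for_replay:
        # Failure, or evidence requested: leave the DB and repo in place.
        return False
    shutil.rmtree(tmpdir, ignore_errors=True)
    return True
```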