Verify Downgrade After File Change Through Bundled CLI

Test Type

forward skill execution

Purpose

validate that a single agent can use skills/repo-memory/ to record a confirmed entry with a hard dependency on a file, change that file, run verify, and observe the entry being downgraded to needs_review

Preconditions

skills/repo-memory/assets/repo-memory exists and is executable
the test runner can create a temporary Git repository fixture
the repository fixture contains one evidence file committed in Git before the agent starts
the test runner can modify the evidence file during the scenario, after the entry is recorded and before verify runs
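
The first two preconditions could be checked up front with a small guard like the one below. This is only a sketch: check_preconditions is a hypothetical helper name, and it assumes the skill directory is passed as the first argument in the same layout as SKILL_PATH.

```shell
#!/bin/sh
# check_preconditions SKILL_PATH
# Hypothetical helper: fails fast when the bundled CLI or git is unavailable.
check_preconditions() {
    cli="$1/assets/repo-memory"

    # The bundled CLI must exist and be executable.
    if [ ! -x "$cli" ]; then
        echo "missing or not executable: $cli" >&2
        return 1
    fi

    # The fixture is a Git repository, so git must be on PATH.
    if ! command -v git >/dev/null 2>&1; then
        echo "git not found on PATH" >&2
        return 1
    fi

    return 0
}
```

Calling check_preconditions with the skill directory before starting the agent keeps an environment problem from being misread as a failed downgrade.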

Inputs

SKILL_PATH=/.../skills/repo-memory
TMPDIR=/tmp/...
DB_PATH=TMPDIR/repo-memory.db
REPO_PATH=TMPDIR/repo-fixture
EVIDENCE_PATH=REPO_PATH/foo.txt

Execution Parameters

one agent only
per-agent timeout: 3m
overall timeout: 4m

Execution Steps

Create a temporary Git repository fixture under REPO_PATH.
Commit one evidence file at EVIDENCE_PATH.
Ask the agent to use $repo-memory against DB_PATH.
Have the agent add one confirmed entry that depends on EVIDENCE_PATH.
Mutate EVIDENCE_PATH after the entry is recorded.
Have the agent run verify, then inspect the result with list and events.
Capture the agent summary and the final entry status it reports.
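
The runner-side steps (fixture creation, commit, and mutation) might look like the sketch below. WORK_DIR is a hypothetical stand-in for the elided TMPDIR value, the file contents and commit identity are placeholders, and whether the CLI detects an uncommitted working-tree change or needs a new commit is an assumption this spec does not settle.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory standing in for the TMPDIR input.
WORK_DIR=$(mktemp -d)
REPO_PATH="$WORK_DIR/repo-fixture"
EVIDENCE_PATH="$REPO_PATH/foo.txt"

# Create the fixture repository and commit the evidence file.
git init -q "$REPO_PATH"
printf 'original evidence\n' > "$EVIDENCE_PATH"
git -C "$REPO_PATH" add foo.txt
git -C "$REPO_PATH" \
    -c user.name='fixture' \
    -c user.email='fixture@example.invalid' \
    commit -q -m 'Add evidence file'

# The agent records its confirmed entry at this point (not shown).

# Mutate the evidence file after the entry is recorded.
# Assumption: a working-tree change is enough for verify to notice.
printf 'changed evidence\n' >> "$EVIDENCE_PATH"

# Print the working-tree state so the change is visible in the run log.
git -C "$REPO_PATH" status --porcelain
```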

Validation Commands

Run these from the main thread after the agent stops:

SKILL_PATH/assets/repo-memory verify --db DB_PATH --repo REPO_PATH
SKILL_PATH/assets/repo-memory list --db DB_PATH --repo REPO_PATH --status needs_review
SKILL_PATH/assets/repo-memory events --db DB_PATH --id 1
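
As a sketch, the three commands above can be wrapped so that the placeholders become shell arguments and the run stops at the first failing command. run_validation is a hypothetical name; the flags are exactly those listed above, and the entry id of 1 is carried over from the events command rather than looked up.

```shell
#!/bin/sh
# run_validation SKILL_PATH DB_PATH REPO_PATH
# Hypothetical wrapper around the three validation commands.
run_validation() {
    cli="$1/assets/repo-memory"
    db=$2
    repo=$3

    # Stop at the first command that exits non-zero.
    "$cli" verify --db "$db" --repo "$repo" || return 1
    "$cli" list --db "$db" --repo "$repo" --status needs_review || return 1
    "$cli" events --db "$db" --id 1 || return 1
}
```

The wrapper only checks exit codes; comparing the printed output against the expected outcomes is left to the runner, since the CLI's output format is not described here.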

Expected Outcomes

verify reports one downgraded entry
list returns the target entry in needs_review
events includes a downgraded event for the target entry

Assertions

the agent used the bundled CLI for both the write and the verification flow
the downgrade reason is driven by real repository state, not by chat-only reasoning
the final state transition is visible both in the current listing and the event history

Cleanup

keep the temporary DB and repo on failure
remove temporary artifacts on success only if replay evidence is not needed
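
One way to express this cleanup policy in the runner is sketched below. cleanup_fixture and the KEEP_REPLAY_EVIDENCE variable are hypothetical names, not part of the skill.

```shell
#!/bin/sh
# cleanup_fixture STATUS DIR
# Hypothetical helper: keeps DIR on failure, or on success when
# KEEP_REPLAY_EVIDENCE=1; otherwise removes it.
cleanup_fixture() {
    status=$1
    dir=$2

    if [ "$status" -ne 0 ]; then
        # Failure: keep the DB and repo for debugging.
        echo "kept (failure): $dir"
    elif [ "${KEEP_REPLAY_EVIDENCE:-0}" = "1" ]; then
        # Success, but replay evidence was requested.
        echo "kept (replay evidence): $dir"
    else
        rm -rf -- "$dir"
        echo "removed: $dir"
    fi
}
```

A runner would pass the scenario's exit status and the temporary directory, for example cleanup_fixture "$?" "$WORK_DIR", where WORK_DIR is whatever directory holds the DB and the repo fixture.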
