# Harness Engineering

Cadence UI already has good validation primitives. Harness engineering makes those primitives
agent-usable by turning the repo into a system that can explain itself, accept explicit execution
plans, and expose repeatable machine-runnable feedback loops.

## What it means in this repo

For Cadence UI, harness engineering means:

- the repository has clear system-of-record files for architecture, contracts, and release rules
- non-trivial work starts with an execution plan checked into git
- validation is exposed as stable suites that humans, agents, and CI can run the same way
- known gaps are recorded explicitly instead of being rediscovered by every new task

This follows the direction described in OpenAI's harness engineering guidance: repositories should
be more legible to agents, use explicit execution plans, and prefer faster feedback loops over
purely ad hoc prompting.

## System Of Record

Agents and contributors should treat these files as the repository's baseline knowledge:

- `DESIGN.md`: active visual language, dynamic color direction, and motion rules
- `README.md`: repo purpose, workspace layout, distribution modes, and QA surface
- `CONTRIBUTING.md`: component contract, review expectations, and definition of done
- `roadmap.md`: current system direction and planned component work
- `packages/ui/src/lib/contracts.ts`: public authoring contract for components
- `apps/docs/src/component-authoring.stories.tsx`: review surface for authoring rules
- `docs/registry.md`: source-copy registry contract
- `docs/releasing.md`: package release contract
- `docs/rfcs/*`: design decisions that should not be silently bypassed
- `AGENTS.md`: agent operating mode for this repository

## Execution Plans

Non-trivial changes should start with an execution plan under `docs/exec-plans/`.

Use an execution plan when the work:

- touches multiple repo surfaces such as `packages/ui`, `apps/docs`, `tests`, or `registry`
- changes public component contracts or release behavior
- introduces new dependencies, workflows, or automation
- is large enough that another engineer or agent may need to resume it later

Plans should state:

- the problem or goal
- constraints and non-goals
- affected surfaces and likely files
- validation suites to run
- a status log with concrete checkpoints

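
The checklist of what a plan should state can be modelled as a small type plus a completeness check. This is only a sketch: the `ExecutionPlan` interface, its field names, and `missingPlanParts` are hypothetical, and plans under `docs/exec-plans/` may simply be prose documents with no machine-readable form.

```typescript
// Hypothetical model of an execution plan. The field names are illustrative
// assumptions, not a format defined by this repository.
interface ExecutionPlan {
  goal: string;
  constraints: string[];
  nonGoals: string[];
  affectedSurfaces: string[];
  validationSuites: string[];
  statusLog: { checkpoint: string; done: boolean }[];
}

// Lists the required parts of a plan that are still empty.
function missingPlanParts(plan: ExecutionPlan): string[] {
  const missing: string[] = [];
  if (plan.goal.trim() === "") missing.push("problem or goal");
  if (plan.constraints.length + plan.nonGoals.length === 0) {
    missing.push("constraints and non-goals");
  }
  if (plan.affectedSurfaces.length === 0) missing.push("affected surfaces");
  if (plan.validationSuites.length === 0) missing.push("validation suites");
  if (plan.statusLog.length === 0) missing.push("status log");
  return missing;
}

// Made-up example plan: everything is filled in except the status log.
const draftPlan: ExecutionPlan = {
  goal: "Add a keyboard-accessible menu component",
  constraints: ["No new runtime dependencies"],
  nonGoals: [],
  affectedSurfaces: ["packages/ui", "apps/docs"],
  validationSuites: ["component", "docs"],
  statusLog: [],
};

const draftGaps = missingPlanParts(draftPlan);
```

Here `draftGaps` contains only `"status log"`, which is the kind of gap a reviewer or agent would want surfaced before work starts.
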
## Validation Suites

Harness validation is exposed through `scripts/harness/validate.mjs` and root `pnpm` scripts.

Primary suites:

- `pnpm harness:validate:static`
  - lint and workspace typecheck for general repo changes
- `pnpm harness:validate:component`
  - lint, typecheck, and unit coverage for normal component work
- `pnpm harness:validate:docs`
  - Storybook build for docs-surface changes
- `pnpm harness:validate:a11y`
  - curated Storybook accessibility validation from the shared browser coverage contract
- `pnpm harness:validate:docs-smoke`
  - Playwright smoke coverage for high-value Storybook flows
- `pnpm harness:validate:consumers`
  - registry metadata plus consumer smoke validation
- `pnpm harness:validate:pr`
  - baseline pull request gate for packages, docs build, and consumer surfaces
- `pnpm harness:validate:release`
  - full release gate, including browser-driven smoke coverage
- `pnpm harness:validate:changed`
  - selects suites from git diff or working tree changes before validating

Each run writes a JSON report to `.artifacts/harness/<suite>.json`.

GitHub Actions uploads the generated harness, a11y, and browser test artifacts from `.artifacts/`
so CI failures stay debuggable after the run finishes.

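
The naming convention above (one `pnpm harness:validate:<suite>` script and one `.artifacts/harness/<suite>.json` report per suite) can be captured in a few lines. This is a sketch rather than the contents of `scripts/harness/validate.mjs`: `HARNESS_SUITES`, `isHarnessSuite`, `commandFor`, and `reportPathFor` are hypothetical names, and `changed` is left out on the assumption that it delegates to whichever suites it selects.

```typescript
// Suite names taken from the `pnpm harness:validate:<suite>` scripts listed above.
// `changed` is omitted: it is assumed to run the suites it selects rather than
// act as a suite of its own.
const HARNESS_SUITES = [
  "static",
  "component",
  "docs",
  "a11y",
  "docs-smoke",
  "consumers",
  "pr",
  "release",
] as const;

type HarnessSuite = (typeof HARNESS_SUITES)[number];

// Type guard for suite names that arrive as plain strings (for example, CLI input).
function isHarnessSuite(name: string): name is HarnessSuite {
  return (HARNESS_SUITES as readonly string[]).indexOf(name) !== -1;
}

// The root pnpm script that runs a suite.
function commandFor(suite: HarnessSuite): string {
  return `pnpm harness:validate:${suite}`;
}

// Where that suite's JSON report is written, relative to the repo root.
function reportPathFor(suite: HarnessSuite): string {
  return `.artifacts/harness/${suite}.json`;
}
```

For example, `reportPathFor("docs-smoke")` yields `.artifacts/harness/docs-smoke.json`, the file to open first when that suite fails locally or in CI.
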
## Working Loop

Recommended change loop:

1. Read the relevant system-of-record files.
2. Create or update an execution plan when the change is non-trivial.
3. Modify the smallest surface that can prove the change.
4. Run the narrowest useful harness suite first.
5. Escalate to broader suites before merge.
6. Record any skipped checks or known failures in the execution plan or PR.

## Diff-aware Selection

Harness selection is exposed through `pnpm harness:select` and `pnpm harness:validate:changed`.

- `pnpm harness:select`
  - reads the current working tree diff by default
- `pnpm harness:select -- --from <ref> --to <ref>`
  - selects suites from an explicit git range
- `pnpm harness:validate:changed`
  - runs the selected suites with a JSON report

The CI workflow uses `pnpm harness:validate:changed` as its execution entrypoint. Selection is no
longer advisory-only: harness-risk and release-risk surfaces escalate directly to the broader `pr`
or `release` gates instead of falling back to `static`.

Selection intentionally maps repo surfaces to validation surfaces:

- package source changes select `component`, `docs`, `docs-smoke`, and `consumers`
- docs/story changes select `static`, `docs`, and `docs-smoke`
- registry/consumer changes select `static` and `consumers`
- doc-only or metadata-only changes may select no suites

High-risk control-plane changes override those narrow mappings:

- `scripts/harness/*` and other harness-control files escalate to `pr`
- release workflows, release docs, registry docs, and consumer/release scripts escalate to `release`

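
The mapping and override rules above can be sketched as a pure function from changed paths to suites. This is an illustration under stated assumptions, not the repository's selector: the path patterns, the `selectSuites` name, and the rule that `release` wins when both gates match are all hypothetical, and release workflows and scripts are omitted because their paths are not given here.

```typescript
type SelectedSuite =
  | "static" | "component" | "docs" | "docs-smoke" | "consumers" | "pr" | "release";

interface SurfaceRule { pattern: RegExp; suites: SelectedSuite[]; }
interface EscalationRule { pattern: RegExp; gate: "pr" | "release"; }

// Assumed path patterns for the narrow surface mappings.
const SURFACE_RULES: SurfaceRule[] = [
  { pattern: /^packages\/ui\//, suites: ["component", "docs", "docs-smoke", "consumers"] },
  { pattern: /^apps\/docs\//, suites: ["static", "docs", "docs-smoke"] },
  { pattern: /^registry\//, suites: ["static", "consumers"] },
];

// Assumed path patterns for the high-risk control-plane overrides.
const ESCALATION_RULES: EscalationRule[] = [
  { pattern: /^scripts\/harness\//, gate: "pr" },
  { pattern: /^docs\/(releasing|registry)\.md$/, gate: "release" },
];

function selectSuites(changedFiles: string[]): SelectedSuite[] {
  let gate: "pr" | "release" | null = null;
  const selected: SelectedSuite[] = [];

  for (const file of changedFiles) {
    for (const rule of ESCALATION_RULES) {
      // Assumption: `release` is the broader gate, so it replaces `pr`.
      if (rule.pattern.test(file) && (gate === null || rule.gate === "release")) {
        gate = rule.gate;
      }
    }
    for (const rule of SURFACE_RULES) {
      if (!rule.pattern.test(file)) continue;
      for (const suite of rule.suites) {
        if (selected.indexOf(suite) === -1) selected.push(suite);
      }
    }
  }

  // An escalation overrides the narrow mappings entirely.
  return gate !== null ? [gate] : selected;
}
```

With these rules, a change to `README.md` alone selects nothing, while touching `scripts/harness/validate.mjs` alongside a story file collapses the selection to the `pr` gate.
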
## Browser Coverage Contract

Curated browser coverage lives in `tests/e2e/support/story-harness-contract.json`.

That contract is the shared source of truth for:

- `pnpm harness:validate:a11y`
- `pnpm harness:validate:docs-smoke`

Each entry records the story id, why it is covered, which suites own it, and any required
interaction scenario keys. When a new high-value review surface should participate in browser or
accessibility validation, update this contract instead of adding one-off story lists in multiple
scripts.

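
To show why one shared contract beats per-script story lists, the entry described above can be modelled as a type that both suites filter. The key names (`storyId`, `reason`, `suites`, `scenarios`), the `storiesFor` helper, and the sample story ids are all hypothetical; the real keys in `story-harness-contract.json` may differ.

```typescript
type BrowserSuite = "a11y" | "docs-smoke";

// Hypothetical entry shape based on the prose description of the contract.
interface ContractEntry {
  storyId: string;      // Storybook story id
  reason: string;       // why the story is covered
  suites: BrowserSuite[]; // which suites own it
  scenarios: string[];  // required interaction scenario keys, if any
}

// Returns the story ids that a given suite should visit.
function storiesFor(entries: ContractEntry[], suite: BrowserSuite): string[] {
  return entries
    .filter((entry) => entry.suites.indexOf(suite) !== -1)
    .map((entry) => entry.storyId);
}

// Made-up entries for illustration only.
const sampleContract: ContractEntry[] = [
  {
    storyId: "components-button--default",
    reason: "baseline interactive control",
    suites: ["a11y", "docs-smoke"],
    scenarios: [],
  },
  {
    storyId: "components-dialog--open",
    reason: "focus management",
    suites: ["a11y"],
    scenarios: ["open-and-close"],
  },
];

const a11yStories = storiesFor(sampleContract, "a11y");
const smokeStories = storiesFor(sampleContract, "docs-smoke");
```

Both suites derive their story lists from the same entries, so adding coverage means editing one file rather than several scripts.
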
## Worktree Orchestration

Cadence UI also exposes an orchestration wrapper:

- `pnpm harness:orch -- <orch command>`

The wrapper keeps orchestration state under `.artifacts/orch/` and applies worktree defaults for
dispatch. Details live in [docs/orchestration.md](/docs/orchestration.md).

Use `pnpm harness:orch -- doctor` to verify binary discovery and plan parsing on a new machine, and
prefer `dispatch --plan-file <plan>` so execution plans remain the source of truth for generated
task bodies.

## Current Rollout Scope

This repository is adding harness engineering in phases.

Phase 1 established:

- a documented harness workflow
- execution-plan conventions
- shared validation suites
- a pull request workflow that runs the repository harness gate

Phase 2 established:

- change-aware suite selection from git diff
- a stabilized Storybook smoke harness
- worktree-oriented orchestration defaults

Phase 3 hardens:

- authoritative changed-suite execution in CI
- uploaded CI artifacts for harness, a11y, and browser test output
- a shared browser coverage contract
- orchestration preflight and plan-linked dispatch helpers
- harness self-tests for selection, artifact, and orchestration helpers

Future phases can layer on:

- richer browser/app harnesses for interactive review
- deeper plan-to-task automation for orchestration runs
- stronger safety rails for agent-owned automation