# Agent Coordination Architecture
## Purpose
This document defines the system split between the worker-facing `inbox` layer and the leader-facing `orch` layer.
The design target is a local, file-portable agent coordination stack:
- `inbox`: durable communication bus
- `orch`: task graph and scheduling control plane
- worktree-backed task execution for code-writing workers
- optional user-facing council review workflow on top of `orch`
- shared SQLite database file
- leader and workers coordinated through stable CLI commands
## Why Two Layers
`inbox` and `orch` solve different problems.
- `inbox` answers: how do agents exchange durable messages, claim work, report progress, and return results?
- `orch` answers: what work exists, which tasks are ready, who should get them, and what happens after a block, failure, or retry?
If `inbox` is reduced to pure chat storage, the scheduler must reconstruct state from message history and ownership becomes ambiguous. If `inbox` tries to become a full scheduler, worker concerns and leader concerns get mixed into one unstable interface.
## Role Model
- `user`: talks only to the leader
- `leader`: owns the overall goal, task graph, acceptance criteria, and final integration
- `worker`: executes one assigned task at a time and reports through `inbox`
- `inbox`: durable thread/message/lease/artifact store
- `orch`: run/task/dependency/dispatch state machine built on top of `inbox`
## Default Usage Rules
- The leader should use `orch` as the default control surface.
- The leader may use `inbox` directly for inspection or manual repair.
- Workers should use `inbox` only.
- Workers should not use `orch`.
- User-facing discussion stays with the leader.
- Code-writing workers should run in `orch`-assigned Git worktrees, not in the user's primary checkout.
## Shared Storage Model
Both CLIs should point at the same SQLite file.
- `inbox` owns communication tables such as threads, messages, leases, and artifacts.
- `orch` owns scheduling tables such as runs, tasks, dependencies, and attempts.
- Both layers append to a shared event stream that backs blocking waits.
- `orch dispatch` creates or updates `inbox` threads.
- `orch reconcile` reads `inbox` state and updates task state.
This preserves a clean boundary while keeping deployment simple.
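
The ownership split can be sketched as follows. This is a minimal illustration, not the actual schema: every table name, column name, and status value here is a hypothetical stand-in, since this document does not define them. It shows a dispatch step that creates an `inbox` thread for a task and a reconcile step that reads thread state back into task state.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE threads (          -- owned by inbox
    thread_id INTEGER PRIMARY KEY,
    status    TEXT NOT NULL DEFAULT 'open'
);
CREATE TABLE tasks (            -- owned by orch
    task_id   INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    status    TEXT NOT NULL DEFAULT 'pending',
    thread_id INTEGER REFERENCES threads(thread_id)
);
CREATE TABLE events (           -- shared, append-only
    event_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    source    TEXT NOT NULL,    -- 'inbox' or 'orch'
    kind      TEXT NOT NULL,
    thread_id INTEGER
);
"""


def open_db(path=":memory:"):
    """Open the shared database file and create the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def dispatch(conn, task_id):
    """orch side: create an inbox thread for a task and append an event."""
    with conn:  # one transaction for the thread, the task update, and the event
        thread_id = conn.execute("INSERT INTO threads DEFAULT VALUES").lastrowid
        conn.execute(
            "UPDATE tasks SET status = 'dispatched', thread_id = ? WHERE task_id = ?",
            (thread_id, task_id),
        )
        conn.execute(
            "INSERT INTO events (source, kind, thread_id) VALUES ('orch', 'dispatch', ?)",
            (thread_id,),
        )
    return thread_id


def reconcile(conn):
    """orch side: copy terminal inbox thread states onto the linked tasks."""
    with conn:
        conn.execute(
            "UPDATE tasks SET status = "
            "(SELECT status FROM threads WHERE threads.thread_id = tasks.thread_id) "
            "WHERE thread_id IN "
            "(SELECT thread_id FROM threads WHERE status IN ('done', 'failed'))"
        )
```

In this sketch `orch` never rewrites thread rows after creating them, and `reconcile` only reads them, which is the boundary the section describes.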
## Worker Execution Model
For code tasks, execution should be isolated from the user's primary checkout.
- `orch dispatch` should create a task-attempt worktree
- the assigned worktree path should be stored in attempt metadata and inbox task payload
- the worker runtime should execute inside that worktree
- strict mode should require a committed base revision
See [worktree-execution.md](/home/kurihada/project/ai-workflow-skill/docs/worktree-execution.md) for the full lifecycle.
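
As a rough sketch of the allocation step, the helper below builds a `git worktree add` command for one task attempt and, in strict mode, first resolves the base revision to a commit with `git rev-parse --verify`. The directory layout, branch naming, and function names are hypothetical; only the two Git commands are real.

```python
import subprocess
from pathlib import Path


def worktree_command(root, task_id, attempt, base_rev):
    """Build the `git worktree add` command for one task attempt.

    The directory layout and branch name are hypothetical conventions.
    Returns (command, worktree_path).
    """
    path = Path(root) / f"task-{task_id}" / f"attempt-{attempt}"
    branch = f"task-{task_id}-attempt-{attempt}"
    return ["git", "worktree", "add", "-b", branch, str(path), base_rev], path


def create_worktree(repo, root, task_id, attempt, base_rev):
    """Create the worktree; base_rev must resolve to a committed revision."""
    # `rev-parse --verify <rev>^{commit}` exits non-zero if the revision
    # does not name a commit, so check=True raises in that case.
    resolved = subprocess.run(
        ["git", "rev-parse", "--verify", f"{base_rev}^{{commit}}"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout.strip()
    cmd, path = worktree_command(root, task_id, attempt, resolved)
    subprocess.run(cmd, cwd=repo, check=True)
    return path
```

The returned path is what would be stored in attempt metadata and in the inbox task payload.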
## Event-Driven Waiting
The leader does not receive worker messages as an in-memory push. Workers write state into `inbox`, and the leader must read it back through CLI commands.
The intended solution is event-driven blocking waits, not ad hoc `sleep` loops.
- leaders should use `orch wait`
- blocked workers should use `inbox wait-reply`
- low-level polling may still exist internally, but it should be hidden inside the CLI
The system therefore still has a single logical leader. What is added is a blocking wait primitive, not a second leader.
## Shared Event Stream
To support blocking waits cleanly, both layers should append rows to a shared `events` table.
Typical emitters:
- `inbox`: claim, progress, blocked, answer, done, fail, cancel
- `orch`: dispatch, answer, retry, reassign, cancel, reconcile-driven task state changes
Typical consumers:
- `orch wait`: watches run-scoped task events for the leader
- `inbox wait-reply`: watches thread-scoped reply events for a blocked worker
Every waiter should use a monotonic cursor such as `event_id` or `message_id`, so it can resume safely without reprocessing old events.
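
A cursor-based wait could look roughly like the sketch below. The `events` columns (`run_id`, `kind`) and the function name are assumptions made for illustration; the point is that the caller passes in the last `event_id` it has seen and gets back only newer rows plus the advanced cursor, with polling kept inside the function.

```python
import sqlite3
import time

# Hypothetical minimal events table for this sketch.
EVENTS_SCHEMA = """
CREATE TABLE events (
    event_id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id   INTEGER NOT NULL,
    kind     TEXT NOT NULL
);
"""


def wait_for_events(conn, run_id, after_event_id, timeout=30.0, poll_interval=0.2):
    """Block until events newer than the cursor exist, or the timeout expires.

    Returns (rows, new_cursor). On timeout, rows is empty and the cursor
    is returned unchanged, so the caller can simply call again.
    """
    deadline = time.monotonic() + timeout
    while True:
        rows = conn.execute(
            "SELECT event_id, kind FROM events "
            "WHERE run_id = ? AND event_id > ? ORDER BY event_id",
            (run_id, after_event_id),
        ).fetchall()
        if rows:
            # The highest event_id seen becomes the next cursor.
            return rows, rows[-1][0]
        if time.monotonic() >= deadline:
            return [], after_event_id
        time.sleep(poll_interval)  # internal polling, hidden from the caller
```

Because the cursor only moves forward, a waiter that crashes and restarts with its last saved cursor sees each event at most once per successful return.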
## Recommended Binary Layout
The recommended v1 shape is:
- `inbox` binary for communication primitives
- `orch` binary for leader-side planning and scheduling
- one shared `--db PATH`
If packaging later favors a single binary, the same model can be exposed as command groups:
- `agentctl inbox ...`
- `agentctl orch ...`
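
If the single-binary shape were adopted, the command groups could be wired up along these lines. This is a sketch using Python's `argparse`; the subcommand names are the ones mentioned in this document, while the placement of `--db` before the group name is an assumption.

```python
import argparse


def build_parser():
    """Build a hypothetical `agentctl` parser with `inbox` and `orch` groups."""
    parser = argparse.ArgumentParser(prog="agentctl")
    # Assumption: the shared database path is a top-level option.
    parser.add_argument("--db", metavar="PATH", required=True,
                        help="shared SQLite file used by both layers")
    groups = parser.add_subparsers(dest="group", required=True)

    inbox = groups.add_parser("inbox", help="communication primitives")
    inbox_cmds = inbox.add_subparsers(dest="command", required=True)
    inbox_cmds.add_parser("wait-reply")

    orch = groups.add_parser("orch", help="leader-side planning and scheduling")
    orch_cmds = orch.add_subparsers(dest="command", required=True)
    for name in ("dispatch", "reconcile", "wait"):
        orch_cmds.add_parser(name)

    return parser
```

Parsing `["--db", "state.db", "orch", "dispatch"]` with this parser yields a namespace whose `group` is `orch` and whose `command` is `dispatch`, so both groups share one database argument.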
## Responsibility Split
`inbox` should own:
- directed messages
- durable threads
- worker claiming and leases
- progress, blocked, result, and failure events
- artifact references
- thread history and watch functionality
- thread-scoped waiting for replies
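
Worker claiming with leases is commonly implemented as a single conditional `UPDATE`, and a sketch of that pattern follows. The lease columns, the time-to-live default, and the function name are all hypothetical, and the table shows only the lease-related columns; this document does not specify how `inbox` stores leases.

```python
import sqlite3
import time

# Hypothetical lease columns on the thread row.
LEASE_SCHEMA = """
CREATE TABLE threads (
    thread_id     INTEGER PRIMARY KEY,
    lease_owner   TEXT,
    lease_expires REAL
);
"""


def claim_thread(conn, thread_id, worker, ttl=60.0, now=None):
    """Try to take the lease on a thread; return True if this worker got it.

    The claim succeeds only if the thread is unclaimed or its lease expired.
    """
    now = time.time() if now is None else now
    with conn:
        cur = conn.execute(
            "UPDATE threads SET lease_owner = ?, lease_expires = ? "
            "WHERE thread_id = ? "
            "AND (lease_owner IS NULL OR lease_expires <= ?)",
            (worker, now + ttl, thread_id, now),
        )
    # Exactly one row changes when the claim wins; zero when someone else holds it.
    return cur.rowcount == 1
```

Doing the check and the write in one statement avoids a read-then-write race between two workers claiming the same thread.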
`orch` should own:
- runs
- task graph and dependencies
- ready queue calculation
- dispatch decisions
- task-attempt worktree allocation
- blocked queue review for the leader
- retries, reassignment, and cancellation
- mapping task attempts to inbox threads
- run-scoped waiting for actionable events
- reusable higher-level workflows such as council review
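
To make the ready queue calculation listed above concrete, here is a minimal sketch over in-memory dictionaries. The status values (`pending`, `done`) and the data shapes are assumptions; a real implementation would read them from the `orch` tables.

```python
def ready_tasks(status, deps):
    """Return pending tasks whose dependencies are all done, sorted by id.

    status: maps task id -> state string (hypothetical values).
    deps:   maps task id -> iterable of task ids it depends on.
    """
    return sorted(
        task
        for task, state in status.items()
        if state == "pending"
        and all(status.get(dep) == "done" for dep in deps.get(task, ()))
    )
```

For example, with `a` done, `b` depending on `a`, `c` depending on `b`, and `d` having no dependencies, the ready set is `b` and `d`; `c` stays out until `b` completes.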
## What Not To Mix
Do not put these into `inbox`:
- dependency graph logic
- automatic worker selection policy
- retry policy
- acceptance-driven task completion logic
Do not put these into `orch`:
- worker claiming
- low-level message append/reply primitives
- raw thread history storage
## Reading Order
- [inbox-cli.md](/home/kurihada/project/ai-workflow-skill/docs/inbox-cli.md): worker-facing bus and low-level message protocol
- [orch-cli.md](/home/kurihada/project/ai-workflow-skill/docs/orch-cli.md): leader-facing scheduler and task graph control plane
- [worktree-execution.md](/home/kurihada/project/ai-workflow-skill/docs/worktree-execution.md): strict worktree model for code-writing task attempts
- [council-review.md](/home/kurihada/project/ai-workflow-skill/docs/council-review.md): user-facing three-reviewer brainstorm and voting workflow
- [implementation-roadmap.md](/home/kurihada/project/ai-workflow-skill/docs/implementation-roadmap.md): handoff-oriented implementation order and next steps
- [blog-project-example.md](/home/kurihada/project/ai-workflow-skill/docs/blog-project-example.md): concrete example using both layers
## Skills
The intended skill split mirrors the CLI split.
- `inbox` skill: used when an agent needs to fetch work, claim a thread, send progress, ask blocked questions, reply, or return results through `inbox`
- `orch` skill: used when the leader needs to create runs, decompose tasks, manage dependencies, dispatch ready work, inspect blocks, answer them, retry failures, or reassign work through `orch`
- `council-review` skill: used when the user explicitly wants a structured three-reviewer brainstorm or review with grouped and tallied recommendations