Agent Coordination Architecture

Purpose

This document defines the system split between the worker-facing inbox layer and the leader-facing orch layer.

The design target is a local, file-portable agent coordination stack:

  • inbox: durable communication bus
  • orch: task graph and scheduling control plane
  • worktree-backed task execution for code-writing workers
  • optional user-facing council review workflow on top of orch
  • shared SQLite database file
  • leader and workers coordinated through stable CLI commands

Why Two Layers

inbox and orch solve different problems.

  • inbox answers: how do agents exchange durable messages, claim work, report progress, and return results?
  • orch answers: what work exists, which tasks are ready, who should get them, and what happens after a block, failure, or retry?

If inbox is reduced to pure chat storage, the scheduler must reconstruct state from message history and ownership becomes ambiguous. If inbox tries to become a full scheduler, worker concerns and leader concerns get mixed into one unstable interface.

Role Model

  • user: talks only to the leader
  • leader: owns the overall goal, task graph, acceptance criteria, and final integration
  • worker: executes one assigned task at a time and reports through inbox
  • inbox: durable thread/message/lease/artifact store
  • orch: run/task/dependency/dispatch state machine built on top of inbox

Default Usage Rules

  • The leader should use orch as the default control surface.
  • The leader may use inbox directly for inspection or manual repair.
  • Workers should use inbox only.
  • Workers should not use orch.
  • User-facing discussion stays with the leader.
  • Code-writing workers should run in orch-assigned Git worktrees, not in the user's primary checkout.

Shared Storage Model

Both CLIs should point at the same SQLite file.

  • inbox owns communication tables such as threads, messages, leases, and artifacts.
  • orch owns scheduling tables such as runs, tasks, dependencies, and attempts.
  • Both layers append to a shared event stream for blocking waits.
  • orch dispatch creates or updates inbox threads.
  • orch reconcile reads inbox state and updates task state.

This preserves a clean boundary while keeping deployment simple.
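
The ownership split above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the project's schema: the column names, the state strings, and the `open_db`, `dispatch`, and `reconcile` helpers are all assumptions. Only the table groups and the dispatch/reconcile direction come from this document.

```python
import sqlite3

# Hypothetical columns; the document only names the tables each layer owns.
INBOX_SCHEMA = """
CREATE TABLE IF NOT EXISTS threads (
    thread_id INTEGER PRIMARY KEY,
    subject   TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS messages (
    message_id INTEGER PRIMARY KEY,
    thread_id  INTEGER NOT NULL REFERENCES threads(thread_id),
    kind       TEXT NOT NULL,
    body       TEXT NOT NULL
);
"""

ORCH_SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    task_id   INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    state     TEXT NOT NULL DEFAULT 'pending',
    thread_id INTEGER REFERENCES threads(thread_id)
);
"""


def open_db(path):
    """Open the shared database file and ensure both table groups exist."""
    conn = sqlite3.connect(path)
    conn.executescript(INBOX_SCHEMA)
    conn.executescript(ORCH_SCHEMA)
    return conn


def dispatch(conn, task_id):
    """orch side: create an inbox thread for a task and link the two."""
    with conn:  # commits on success, rolls back on error
        (title,) = conn.execute(
            "SELECT title FROM tasks WHERE task_id = ?", (task_id,)
        ).fetchone()
        thread_id = conn.execute(
            "INSERT INTO threads (subject) VALUES (?)", (title,)
        ).lastrowid
        conn.execute(
            "UPDATE tasks SET state = 'dispatched', thread_id = ? WHERE task_id = ?",
            (thread_id, task_id),
        )
    return thread_id


def reconcile(conn, task_id):
    """orch side: read the latest terminal inbox message and update task state."""
    with conn:
        row = conn.execute(
            "SELECT m.kind FROM messages m "
            "JOIN tasks t ON t.thread_id = m.thread_id "
            "WHERE t.task_id = ? AND m.kind IN ('done', 'fail') "
            "ORDER BY m.message_id DESC LIMIT 1",
            (task_id,),
        ).fetchone()
        if row is None:
            return None  # the worker has not reported a terminal result yet
        state = "done" if row[0] == "done" else "failed"
        conn.execute("UPDATE tasks SET state = ? WHERE task_id = ?", (state, task_id))
    return state
```

In this sketch orch writes to the inbox tables only through `dispatch`, and it learns about worker results only by reading messages in `reconcile`, which mirrors the boundary described above.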

Worker Execution Model

For code tasks, execution should be isolated from the user's primary checkout.

  • orch dispatch should create a task-attempt worktree
  • the assigned worktree path should be stored in attempt metadata and inbox task payload
  • the worker runtime should execute inside that worktree
  • strict mode should require a committed base revision

See worktree-execution.md for the full lifecycle.
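
As a rough sketch of the dispatch step, the following Python helpers build and run a `git worktree add` command for one task attempt. The directory layout, branch naming scheme, and function names are hypothetical. The strict-mode check is one possible interpretation, resolving the base revision with `git rev-parse --verify` before any worktree is created.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo, worktree_root, task_id, attempt, base_rev):
    """Build the git command for a task-attempt worktree (naming is hypothetical)."""
    path = Path(worktree_root) / f"task-{task_id}-attempt-{attempt}"
    branch = f"task/{task_id}/attempt-{attempt}"
    argv = [
        "git", "-C", str(repo),
        "worktree", "add", "-b", branch, str(path), base_rev,
    ]
    return argv, path


def resolve_base_rev(repo, rev):
    """Strict-mode check: fail unless rev names an existing commit."""
    result = subprocess.run(
        ["git", "-C", str(repo), "rev-parse", "--verify", f"{rev}^{{commit}}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def create_attempt_worktree(repo, worktree_root, task_id, attempt, base_rev):
    """Create the worktree and return metadata to store with the attempt."""
    commit = resolve_base_rev(repo, base_rev)
    argv, path = worktree_add_command(repo, worktree_root, task_id, attempt, commit)
    subprocess.run(argv, check=True)
    # The same path would also go into the inbox task payload for the worker.
    return {"worktree_path": str(path), "base_rev": commit}
```

Keeping command construction separate from execution makes the naming scheme easy to test without touching a real repository.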

Event-Driven Waiting

The leader does not receive worker messages as an in-memory push. Workers write state into inbox, and the leader must read it back through CLI commands.

The intended solution is event-driven blocking waits, not ad hoc sleep loops.

  • leaders should use orch wait
  • blocked workers should use inbox wait-reply
  • low-level polling may still exist internally, but it should be hidden inside the CLI

The design still has one logical leader. The extra behavior is a blocking wait primitive, not a second leader.
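
A blocking wait that hides polling inside the CLI might look like the sketch below. The `wait_until` helper, its parameters, and its defaults are assumptions for illustration. The caller makes one call and blocks, while the sleep loop stays an internal detail.

```python
import time


def wait_until(fetch, timeout=30.0, interval=0.5):
    """Block until fetch() returns a truthy result or the timeout expires.

    Returns the result, or None on timeout. The polling interval is an
    internal detail the caller never has to manage.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

A command such as orch wait could pass a `fetch` callable that queries for new run-scoped events, so the leader issues one blocking call instead of writing its own sleep loop.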

Shared Event Stream

To support blocking waits cleanly, both layers should append rows to a shared events table.

Typical emitters:

  • inbox: claim, progress, blocked, answer, done, fail, cancel
  • orch: dispatch, answer, retry, reassign, cancel, reconcile-driven task state changes

Typical consumers:

  • orch wait: watches run-scoped task events for the leader
  • inbox wait-reply: watches thread-scoped reply events for a blocked worker

Every waiter should use a monotonic cursor such as event_id or message_id, so it can resume safely without reprocessing old events.
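
The cursor pattern can be illustrated with a small SQLite sketch. The `events` columns and the `emit` and `events_after` helpers are hypothetical. The point is that a waiter asks only for rows with an `event_id` greater than its last seen value, optionally scoped to a run or a thread.

```python
import sqlite3


def open_events(path):
    """Open the database and ensure a (hypothetical) shared events table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               event_id  INTEGER PRIMARY KEY AUTOINCREMENT,
               source    TEXT NOT NULL,   -- 'inbox' or 'orch'
               kind      TEXT NOT NULL,   -- e.g. 'claim', 'dispatch', 'done'
               run_id    INTEGER,
               thread_id INTEGER
           )"""
    )
    return conn


def emit(conn, source, kind, run_id=None, thread_id=None):
    """Append one event row and return its event_id."""
    with conn:
        cur = conn.execute(
            "INSERT INTO events (source, kind, run_id, thread_id) VALUES (?, ?, ?, ?)",
            (source, kind, run_id, thread_id),
        )
    return cur.lastrowid


def events_after(conn, cursor, run_id=None, thread_id=None):
    """Return (rows, new_cursor) for events newer than cursor in the given scope."""
    sql = "SELECT event_id, source, kind FROM events WHERE event_id > ?"
    params = [cursor]
    if run_id is not None:
        sql += " AND run_id = ?"
        params.append(run_id)
    if thread_id is not None:
        sql += " AND thread_id = ?"
        params.append(thread_id)
    sql += " ORDER BY event_id"
    rows = conn.execute(sql, params).fetchall()
    # Advance the cursor only when something new was seen.
    new_cursor = rows[-1][0] if rows else cursor
    return rows, new_cursor
```

A waiter that persists `new_cursor` between calls can stop and resume without seeing the same event twice, whether it filters by `run_id` (leader) or `thread_id` (blocked worker).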

The recommended v1 CLI shape is:

  • inbox binary for communication primitives
  • orch binary for leader-side planning and scheduling
  • one shared --db PATH

If packaging later favors a single binary, the same model can be exposed as command groups:

  • agentctl inbox ...
  • agentctl orch ...
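
If the single-binary route were taken, the command groups could be wired up roughly as follows with Python's argparse. Only `--db PATH`, the group names, and the `wait` and `wait-reply` subcommands come from this document. The `--run` and `--thread` flags are hypothetical.

```python
import argparse


def build_parser():
    """Build an agentctl-style parser with inbox and orch command groups."""
    parser = argparse.ArgumentParser(prog="agentctl")
    parser.add_argument("--db", metavar="PATH", required=True,
                        help="shared SQLite database file")
    groups = parser.add_subparsers(dest="group", required=True)

    # Worker-facing group.
    inbox = groups.add_parser("inbox", help="communication primitives")
    inbox_cmds = inbox.add_subparsers(dest="command", required=True)
    wait_reply = inbox_cmds.add_parser("wait-reply")
    wait_reply.add_argument("--thread", type=int, required=True)  # hypothetical flag

    # Leader-facing group.
    orch = groups.add_parser("orch", help="planning and scheduling")
    orch_cmds = orch.add_subparsers(dest="command", required=True)
    wait = orch_cmds.add_parser("wait")
    wait.add_argument("--run", type=int, required=True)  # hypothetical flag

    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Because both groups hang off one top-level `--db` option, the two-binary and single-binary layouts can share the same storage contract.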

Responsibility Split

inbox should own:

  • directed messages
  • durable threads
  • worker claiming and leases
  • progress, blocked, result, and failure events
  • artifact references
  • thread history and watch functionality
  • thread-scoped waiting for replies

orch should own:

  • runs
  • task graph and dependencies
  • ready queue calculation
  • dispatch decisions
  • task-attempt worktree allocation
  • blocked queue review for the leader
  • retries, reassignment, and cancellation
  • mapping task attempts to inbox threads
  • run-scoped waiting for actionable events
  • reusable higher-level workflows such as council review
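
To make one of the orch responsibilities concrete, ready queue calculation can be sketched as a pure function over task states and dependencies. The state names (`pending`, `done`) and the dictionary-based data shapes are assumptions for illustration.

```python
def ready_tasks(states, deps):
    """Return pending task ids whose dependencies are all done, in sorted order.

    states: mapping of task id -> state string
    deps:   mapping of task id -> iterable of task ids it depends on
    """
    return sorted(
        task
        for task, state in states.items()
        if state == "pending"
        and all(states.get(dep) == "done" for dep in deps.get(task, ()))
    )


states = {"parse": "done", "lint": "pending", "build": "pending", "ship": "pending"}
deps = {"build": ["parse"], "ship": ["build", "lint"]}
print(ready_tasks(states, deps))  # prints ['build', 'lint']
```

Here `ship` stays out of the ready queue until both `build` and `lint` are done. This logic reads only orch-owned state, which is why it belongs in orch and not in inbox.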

What Not To Mix

Do not put these into inbox:

  • dependency graph logic
  • automatic worker selection policy
  • retry policy
  • acceptance-driven task completion logic

Do not put these into orch:

  • worker claiming
  • low-level message append/reply primitives
  • raw thread history storage

Reading Order

Skills

The intended skill split mirrors the CLI split.

  • inbox skill: used when an agent needs to fetch work, claim a thread, send progress, ask blocked questions, reply, or return results through inbox
  • orchestrator skill: used when the leader needs to create runs, decompose tasks, manage dependencies, dispatch ready work, inspect blocks, answer them, retry failures, or reassign work through orch
  • council-review skill: used when the user explicitly wants a structured three-reviewer brainstorm or review with grouped and tallied recommendations