# Agent Coordination Architecture

## Purpose

This document defines the system split between the worker-facing `inbox` layer and the leader-facing `orch` layer.

The design target is a local, file-portable agent coordination stack:

- `inbox`: durable communication bus
- `orch`: task graph and scheduling control plane
- worktree-backed task execution for code-writing workers
- optional user-facing council review workflow on top of `orch`
- shared SQLite database file
- leader and workers coordinated through stable CLI commands

## Why Two Layers

`inbox` and `orch` solve different problems.

- `inbox` answers: how do agents exchange durable messages, claim work, report progress, and return results?
- `orch` answers: what work exists, which tasks are ready, who should get them, and what happens after a block, failure, or retry?

If `inbox` is reduced to pure chat storage, the scheduler must reconstruct state from message history and ownership becomes ambiguous. If `inbox` tries to become a full scheduler, worker concerns and leader concerns get mixed into one unstable interface.

## Role Model

- `user`: talks only to the leader
- `leader`: owns the overall goal, task graph, acceptance criteria, and final integration
- `worker`: executes one assigned task at a time and reports through `inbox`
- `inbox`: durable thread/message/lease/artifact store
- `orch`: run/task/dependency/dispatch state machine built on top of `inbox`

## Default Usage Rules

- The leader should use `orch` as the default control surface.
- The leader may use `inbox` directly for inspection or manual repair.
- Workers should use `inbox` only.
- Workers should not use `orch`.
- `orch dispatch` creates handoff state, not execution. Leaders still need a separate worker runtime or worker agent to consume the assigned inbox thread.
- User-facing discussion stays with the leader.
- Code-writing workers should run in `orch`-assigned Git worktrees, not in the user's primary checkout.
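
To make the "handoff state, not execution" rule concrete, here is a minimal Python sketch against an in-memory SQLite database, assuming (as described under Shared Storage Model) that both layers share one file. The tables `tasks`, `attempts`, and `threads`, their columns, and the `dispatch` and `claim` functions are hypothetical stand-ins, not the real `inbox` or `orch` schema or CLI. The point it illustrates: dispatching only writes rows, and nothing happens until a separate worker runtime claims the thread.

```python
import sqlite3

# Hypothetical minimal schema: table and column names are illustrative,
# not the real inbox/orch schema.
SCHEMA = """
CREATE TABLE tasks    (task_id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE threads  (thread_id INTEGER PRIMARY KEY AUTOINCREMENT,
                       assigned_to TEXT NOT NULL,
                       claimed_by TEXT,
                       status TEXT NOT NULL);
CREATE TABLE attempts (attempt_id INTEGER PRIMARY KEY AUTOINCREMENT,
                       task_id TEXT NOT NULL REFERENCES tasks(task_id),
                       thread_id INTEGER NOT NULL REFERENCES threads(thread_id));
"""


def dispatch(conn: sqlite3.Connection, task_id: str, worker: str) -> int:
    """Leader side (orch): record the handoff only. Nothing is executed here."""
    with conn:  # commit the thread, attempt, and task rows together
        cur = conn.execute(
            "INSERT INTO threads (assigned_to, status) VALUES (?, 'open')",
            (worker,),
        )
        thread_id = cur.lastrowid
        conn.execute(
            "INSERT INTO attempts (task_id, thread_id) VALUES (?, ?)",
            (task_id, thread_id),
        )
        conn.execute(
            "UPDATE tasks SET status = 'dispatched' WHERE task_id = ?",
            (task_id,),
        )
    return thread_id


def claim(conn: sqlite3.Connection, thread_id: int, worker: str) -> bool:
    """Worker side (inbox): succeeds only for the assignee, and only once."""
    with conn:
        cur = conn.execute(
            "UPDATE threads SET claimed_by = ?, status = 'claimed' "
            "WHERE thread_id = ? AND assigned_to = ? AND claimed_by IS NULL",
            (worker, thread_id, worker),
        )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO tasks VALUES ('T1', 'ready')")
tid = dispatch(conn, "T1", "worker-a")
# The thread just sits there until some worker runtime claims it.
print(conn.execute("SELECT status, claimed_by FROM threads").fetchone())  # ('open', None)
print(claim(conn, tid, "worker-a"))  # True
print(claim(conn, tid, "worker-a"))  # False: already claimed
```

The claim here is a single conditional `UPDATE`; a real `inbox` lease would also need an expiry so that a crashed worker's thread can be reclaimed.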
## Shared Storage Model

Both CLIs should point at the same SQLite file.

- `inbox` owns communication tables such as threads, messages, leases, and artifacts.
- `orch` owns scheduling tables such as runs, tasks, dependencies, and attempts.
- Both layers append to a shared event stream for blocking waits.
- `orch dispatch` creates or updates `inbox` threads.
- `orch reconcile` reads `inbox` state and updates task state.

This preserves a clean boundary while keeping deployment simple.

## Optional Codex Launch Bridge

Some environments may layer an execution bridge on top of `orch`.

Recommended shape:

- `orch dispatch --json` creates the durable handoff state
- a leader-side Codex bridge reads the dispatch result
- that bridge may spawn a worker sub-agent and pass it the mapped `thread_id`, `assigned_to`, and any `worktree_path`
- the worker still reports only through `inbox`

This bridge belongs above the CLI layer. It should not be implemented as core `orch` runtime behavior, because worker launch is host-specific while run and attempt state are meant to stay portable.

## Worker Execution Model

For code tasks, execution should be isolated from the user's primary checkout.

- `orch dispatch` should create a task-attempt worktree
- the assigned worktree path should be stored in the attempt metadata and the inbox task payload
- the worker runtime should execute inside that worktree
- strict mode should require a committed base revision
- non-code tasks may stay on a thread-only dispatch path with no worktree, but they still require a separate worker runtime to claim the inbox thread

See [worktree-execution.md](/home/kurihada/project/ai-workflow-skill/docs/worktree-execution.md) for the full lifecycle.

## Event-Driven Waiting

The leader does not receive worker messages as an in-memory push. Workers write state into `inbox`, and the leader must read it back through CLI commands.

The intended solution is event-driven blocking waits, not ad hoc `sleep` loops.
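
A minimal sketch of such a wait primitive, assuming the shared `events` table has a monotonically increasing integer `event_id` plus nullable `run_id` and `thread_id` scope columns. The table layout and the `wait_for_events` function are hypothetical illustrations, not the actual `orch wait` or `inbox wait-reply` implementation. The loop still polls, but only inside the primitive, so a caller makes one blocking call and later resumes from the returned cursor.

```python
import sqlite3
import time
from typing import List, Optional, Tuple

# Hypothetical shape of the shared events table; real column names may differ.
SCHEMA = """
CREATE TABLE events (
    event_id  INTEGER PRIMARY KEY AUTOINCREMENT,  -- monotonic cursor, never reused
    run_id    TEXT,
    thread_id TEXT,
    kind      TEXT NOT NULL
);
"""


def wait_for_events(
    conn: sqlite3.Connection,
    after_id: int,
    *,
    run_id: Optional[str] = None,     # run-scoped wait (leader)
    thread_id: Optional[str] = None,  # thread-scoped wait (blocked worker)
    timeout: float = 30.0,
    interval: float = 0.2,
) -> Tuple[List[Tuple[int, str]], int]:
    """Block until events newer than `after_id` exist in the given scope.

    Returns (events, next_cursor). On timeout the cursor comes back unchanged.
    """
    deadline = time.monotonic() + timeout
    while True:
        rows = conn.execute(
            "SELECT event_id, kind FROM events "
            "WHERE event_id > ? "
            "AND (? IS NULL OR run_id = ?) "
            "AND (? IS NULL OR thread_id = ?) "
            "ORDER BY event_id",
            (after_id, run_id, run_id, thread_id, thread_id),
        ).fetchall()
        if rows:
            return rows, rows[-1][0]
        if time.monotonic() >= deadline:
            return [], after_id
        time.sleep(interval)  # the polling stays hidden inside the primitive


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO events (run_id, thread_id, kind) VALUES (?, ?, ?)",
    [("run-1", "th-1", "claim"), ("run-1", "th-1", "blocked"), ("run-2", "th-9", "done")],
)
conn.commit()
events, cursor = wait_for_events(conn, 0, run_id="run-1", timeout=1.0)
print(events, cursor)  # [(1, 'claim'), (2, 'blocked')] 2
# Resuming from the cursor skips events that were already delivered.
print(wait_for_events(conn, cursor, run_id="run-1", timeout=0.5))  # ([], 2)
```

Returning the cursor unchanged on timeout lets a caller simply retry with the same value, and because `event_id` only grows, a caller that always passes back the latest cursor never sees an event twice.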
- leaders should use `orch wait`
- blocked workers should use `inbox wait-reply`
- low-level polling may still exist internally, but it should be hidden inside the CLI

This means there is still one logical leader. The extra behavior is a blocking wait primitive, not a second leader.

## Shared Event Stream

To support blocking waits cleanly, both layers should append rows to a shared `events` table.

Typical emitters:

- `inbox`: claim, progress, blocked, answer, done, fail, cancel
- `orch`: dispatch, answer, retry, reassign, cancel, reconcile-driven task state changes

Typical consumers:

- `orch wait`: watches run-scoped task events for the leader
- `inbox wait-reply`: watches thread-scoped reply events for a blocked worker

Every waiter should use a monotonic cursor such as `event_id` or `message_id`, so it can resume safely without reprocessing old events.

## Recommended Binary Layout

The recommended v1 shape is:

- `inbox` binary for communication primitives
- `orch` binary for leader-side planning and scheduling
- one shared `--db PATH`

If packaging later favors a single binary, the same model can be exposed as command groups:

- `agentctl inbox ...`
- `agentctl orch ...`

## Responsibility Split

`inbox` should own:

- directed messages
- durable threads
- worker claiming and leases
- progress, blocked, result, and failure events
- artifact references
- thread history and watch functionality
- thread-scoped waiting for replies

`orch` should own:

- runs
- task graph and dependencies
- ready queue calculation
- dispatch decisions
- task-attempt worktree allocation
- blocked queue review for the leader
- retries, reassignment, and cancellation
- mapping task attempts to inbox threads
- run-scoped waiting for actionable events
- reusable higher-level workflows such as council review

## What Not To Mix

Do not put these into `inbox`:

- dependency graph logic
- automatic worker selection policy
- retry policy
- acceptance-driven task completion logic

Do not put these into
`orch`:

- worker claiming
- low-level message append/reply primitives
- raw thread history storage

## Reading Order

- [inbox-cli.md](/home/kurihada/project/ai-workflow-skill/docs/inbox-cli.md): worker-facing bus and low-level message protocol
- [orch-cli.md](/home/kurihada/project/ai-workflow-skill/docs/orch-cli.md): leader-facing scheduler and task graph control plane
- [worktree-execution.md](/home/kurihada/project/ai-workflow-skill/docs/worktree-execution.md): strict worktree model for code-writing task attempts
- [council-review.md](/home/kurihada/project/ai-workflow-skill/docs/council-review.md): user-facing three-reviewer brainstorm and voting workflow
- [skill-workspace-monorepo.md](/home/kurihada/project/ai-workflow-skill/docs/skill-workspace-monorepo.md): repository structure, package ownership, and skill workspace layout

## Skills

The intended skill split mirrors the CLI split.

- `inbox` skill: used when an agent needs to fetch work, claim a thread, send progress, ask questions when blocked, reply, or return results through `inbox`
- `orch` skill: used when the leader needs to create runs, decompose tasks, manage dependencies, dispatch ready work, inspect blocks, answer them, retry failures, or reassign work through `orch`; it is not itself the worker launcher
- the `orch` skill may include helper assets for leader-side launch bridges, but the durable source of truth for scheduling remains the `orch` CLI and the shared SQLite state
- `council-review` skill: used when the user explicitly wants a structured three-reviewer brainstorm or review with grouped and tallied recommendations
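
Ready queue calculation, listed among the `orch` responsibilities and behind the `orch` skill's "dispatch ready work" step, can be sketched as a pure function. The status names `pending` and `done`, the dict-based inputs, and `ready_tasks` are hypothetical; this sketch also ignores the blocked, failed, and retry states that a real scheduler has to handle.

```python
from collections.abc import Mapping, Sequence

# Hypothetical status names; the real orch task states may differ.
DONE = "done"
PENDING = "pending"


def ready_tasks(status: Mapping[str, str],
                deps: Mapping[str, Sequence[str]]) -> list[str]:
    """Return pending tasks whose dependencies are all done, sorted by id.

    A dependency on an unknown task counts as unsatisfied, so a typo in the
    graph holds work back instead of releasing it early.
    """
    ready = []
    for task_id in sorted(status):
        if status[task_id] != PENDING:
            continue
        if all(status.get(dep) == DONE for dep in deps.get(task_id, ())):
            ready.append(task_id)
    return ready


status = {"plan": "done", "api": "pending", "ui": "pending", "docs": "pending"}
deps = {"api": ["plan"], "ui": ["api"], "docs": ["plan", "missing"]}
print(ready_tasks(status, deps))  # ['api']
```

Sorting the task ids keeps the output deterministic, so repeated dispatch decisions over the same state are reproducible.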