# Millrace full AI reference

This file is the long-form, current-state AI-readable reference for millrace.ai. It only describes shipped runtime behavior and current docs-backed surfaces.

## Canonical URLs

- https://millrace.ai/ai/millrace-summary.md
- https://millrace.ai/ai/what-is-millrace.md
- https://millrace.ai/ai/architecture.md
- https://millrace.ai/ai/install.md
- https://millrace.ai/ai/cli-reference.md
- https://millrace.ai/ai/modes-and-loops.md
- https://millrace.ai/ai/runner-architecture.md
- https://millrace.ai/ai/arbiter-and-completion.md
- https://millrace.ai/ai/comparisons.md
- https://millrace.ai/ai/faq.md
- https://millrace.ai/ai/glossary.md
- https://millrace.ai/llms.txt

## Quick framing

Millrace is an open-source, filesystem-backed runtime and governance layer for long-running autonomous agent work. It wraps raw harness execution with durable markdown queues, compiled runtime plans, deterministic stage routing, recovery logic, control surfaces, and persisted audit artifacts.

---

## Millrace summary

Source URL: https://millrace.ai/ai/millrace-summary.md

# Millrace summary

Millrace is an open-source, filesystem-backed runtime for long-running autonomous agent work. It is not a coding model, not an IDE, and not a hosted autonomous engineer. It sits around raw agent harnesses and gives them durable queue state, compiled runtime plans, deterministic handoffs, recovery routing, operator control surfaces, and persisted audit artifacts.

As shipped today, the importable package namespace is `millrace_ai`, the installed CLI is `millrace`, and the module entrypoint is `python -m millrace_ai`. The runtime bootstraps its working tree under `/millrace-agents/`, where it keeps markdown work queues, runtime snapshot state, compiled plans, mailbox/control files, run artifacts, and Arbiter closure artifacts.
The current runtime has two planes:

- execution
- planning

The shipped built-in loop ids are:

- `execution.standard`
- `planning.standard`

The shipped canonical mode ids are:

- `default_codex`
- `default_pi`

The compatibility alias `standard_plain` resolves to `default_codex`.

The shipped execution stages are:

- `builder`
- `checker`
- `fixer`
- `doublechecker`
- `updater`
- `troubleshooter`
- `consultant`

The shipped planning stages are:

- `planner`
- `manager`
- `mechanic`
- `auditor`
- `arbiter`

Millrace keeps canonical work as markdown documents:

- tasks: `millrace-agents/tasks/{queue,active,done,blocked}/*.md`
- specs: `millrace-agents/specs/{queue,active,done,blocked}/*.md`
- incidents: `millrace-agents/incidents/{incoming,active,resolved,blocked}/*.md`

It keeps runtime-owned state and compile outputs under `millrace-agents/state/`, including:

- `runtime_snapshot.json`
- `recovery_counters.json`
- `compiled_plan.json`
- `compile_diagnostics.json`
- `execution_status.md`
- `planning_status.md`

Each stage run persists artifacts under `millrace-agents/runs//`, including stage results plus runner prompt, invocation, completion, and stdout/stderr artifacts where present.

The runner boundary is stable and narrow:

- input: `StageRunRequest`
- output: `RunnerRawResult`

The shipped built-in adapters are:

- `codex_cli`
- `pi_rpc`

Use Millrace when the work must survive pauses, crashes, context loss, retry loops, closure audits, or multi-stage routing. Do not use Millrace for tiny one-shot edits where the runtime overhead would be larger than the work.

See also:

- `/ai/what-is-millrace.md`
- `/ai/architecture.md`
- `/ai/install.md`
- `/ai/cli-reference.md`
- `/ai/modes-and-loops.md`
- `/ai/runner-architecture.md`
- `/ai/arbiter-and-completion.md`
- `/ai/comparisons.md`
- `/ai/faq.md`
- `/ai/glossary.md`

---

## What is Millrace?

Source URL: https://millrace.ai/ai/what-is-millrace.md

# What is Millrace?
Millrace is an open-source, filesystem-backed runtime for long-running autonomous agent work. It surrounds raw agent harnesses with durable structure.

The core split is:

- a stage agent does one bounded unit of reasoning and emits one legal result
- the runtime decides what the next stage is, when it runs, and what state change is authoritative
- the workspace keeps durable queue state, compiled runtime structure, recovery context, and run artifacts

That makes Millrace useful when an ordinary direct agent session is no longer enough on its own because the work is multi-stage, interruption-prone, recovery-sensitive, or closure-sensitive.

## What Millrace is not

Millrace is not any of the following:

- a coding model
- an IDE
- a hosted autonomous engineer product
- a general chat workflow
- a replacement for Codex CLI, Claude Code, or Aider

Those tools are direct harnesses, editors, or managed products. Millrace is the runtime and governance layer that can sit around raw harness execution.

## What Millrace owns

Millrace owns runtime authority over:

- queue movement between queue, active, done, blocked, incoming, and resolved directories
- active-stage identity and runtime snapshot state
- recovery counters and stale-state repair context
- compiled plan persistence
- run directories and stage-result artifacts
- closure-target state for Arbiter-driven completion

Millrace does not delegate that authority to stage prompts. Stages do not directly mutate authoritative queue state. They emit legal terminal results, and the runtime applies the authoritative state changes after the fact.

## Workspace model

Millrace bootstraps its runtime tree under `/millrace-agents/`.
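The workspace layout can be sketched as plain paths. This is an illustrative reconstruction of the documented tree, not the real bootstrap code; `runtime_tree` and `QUEUE_BUCKETS` are hypothetical names, and the real bootstrap in `millrace_ai.workspace.paths` creates more than shown here.

```python
from pathlib import Path

# Hypothetical sketch of the documented runtime tree; the real bootstrap
# lives in millrace_ai.workspace.paths and is not reproduced here.
QUEUE_BUCKETS = {
    "tasks": ("queue", "active", "done", "blocked"),
    "specs": ("queue", "active", "done", "blocked"),
    "incidents": ("incoming", "active", "resolved", "blocked"),
}

def runtime_tree(workspace: Path) -> list[Path]:
    """Return the documented directory skeleton under <workspace>/millrace-agents/."""
    root = workspace / "millrace-agents"
    dirs = [root, root / "state", root / "runs"]
    for kind, buckets in QUEUE_BUCKETS.items():
        dirs.extend(root / kind / bucket for bucket in buckets)
    return dirs

paths = runtime_tree(Path("/tmp/example-workspace"))
```

The point of the sketch is only that every documented queue bucket is an ordinary directory of markdown files, which is what makes the workspace inspectable with plain filesystem tools.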
Canonical work documents are markdown:

- tasks: `millrace-agents/tasks/{queue,active,done,blocked}/*.md`
- specs: `millrace-agents/specs/{queue,active,done,blocked}/*.md`
- incidents: `millrace-agents/incidents/{incoming,active,resolved,blocked}/*.md`

Canonical runtime/state artifacts include:

- `millrace-agents/state/runtime_snapshot.json`
- `millrace-agents/state/recovery_counters.json`
- `millrace-agents/state/compiled_plan.json`
- `millrace-agents/state/compile_diagnostics.json`
- `millrace-agents/state/execution_status.md`
- `millrace-agents/state/planning_status.md`

Run artifacts persist under `millrace-agents/runs//`.

## Planes and loops

Millrace currently runs two planes:

- execution
- planning

The shipped built-in loop ids are:

- `execution.standard`
- `planning.standard`

The shipped canonical mode ids are:

- `default_codex`
- `default_pi`

The compatibility alias `standard_plain` resolves to `default_codex`.

The execution loop stages are:

- `builder`
- `checker`
- `fixer`
- `doublechecker`
- `updater`
- `troubleshooter`
- `consultant`

The planning loop stages are:

- `planner`
- `manager`
- `mechanic`
- `auditor`
- `arbiter`

## Runner model

Millrace does not make the harness itself the runtime authority. Instead, it freezes runner choice into the compiled plan and dispatches stage runs through a narrow contract:

- `StageRunRequest -> RunnerRawResult`

As shipped, the built-in adapters are:

- `codex_cli`
- `pi_rpc`

`default_codex` binds every shipped stage to `codex_cli`. `default_pi` binds every shipped stage to `pi_rpc`. `standard_plain` is only an alias for `default_codex`.
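The narrow contract can be pictured with two small value types. These dataclasses are a hedged sketch: the real `StageRunRequest` and `RunnerRawResult` live in `millrace_ai.runners` and almost certainly carry more fields; the field names and the stub adapter below are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StageRunRequest:
    # Illustrative fields only; the real type carries the frozen stage binding.
    stage: str            # e.g. "builder"
    runner_name: str      # e.g. "codex_cli" or "pi_rpc"
    model_name: str
    entrypoint_path: str
    timeout_seconds: int

@dataclass(frozen=True)
class RunnerRawResult:
    exit_kind: str        # documented kinds include "completed", "timeout", "runner_error"
    stdout: str
    stderr: str

def run_stage(request: StageRunRequest) -> RunnerRawResult:
    # Stub adapter: every adapter maps one request to one raw result.
    return RunnerRawResult(exit_kind="completed", stdout="", stderr="")

result = run_stage(
    StageRunRequest("builder", "codex_cli", "example-model", "entrypoints/builder.md", 600)
)
```

Because both sides of the boundary are plain values, swapping `codex_cli` for `pi_rpc` changes only which adapter receives the request, not the orchestration around it.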
## When to use Millrace

Use Millrace when any of the following matter:

- the work must survive pauses, crashes, or context loss
- durable queue state matters
- stage progression should be runtime-governed instead of conversational
- recovery routing matters more than one-shot speed
- you need persisted run artifacts, runtime snapshots, or diagnosable failure surfaces
- closure should be based on runtime criteria rather than "the agent said it was done"

Good fits include:

- long-running implementation work that will outlast one session
- planning-to-execution flows that need durable decomposition and auditability
- repair-sensitive work where blockage should route into `mechanic` or `troubleshooter`
- closure-sensitive work where an Arbiter pass should judge completion against contracts

## When not to use Millrace

Prefer a direct raw-harness session when all of these are true:

- the task is small and likely to finish in one session
- durable queue state is unnecessary
- staged planning or execution gates are unnecessary
- interruption or retry cost is low
- no persisted run trail or closure pass is needed

Bad fits include:

- a small direct bugfix in one file
- a short exploratory coding spike
- an ordinary repo edit where governance overhead would be larger than the work
- source-repo maintenance where no runtime workspace is actually being operated

## Start-here commands

Installed surface:

```bash
pip install millrace-ai
millrace compile validate --workspace
millrace status --workspace
millrace queue ls --workspace
```

Module form during source development:

```bash
uv run --extra dev python -m millrace_ai status --workspace
```

See also:

- `/ai/install.md`
- `/ai/cli-reference.md`
- `/ai/modes-and-loops.md`
- `/ai/comparisons.md`

---

## Millrace architecture

Source URL: https://millrace.ai/ai/architecture.md

# Millrace architecture

This page describes the shipped Millrace runtime architecture as documented today.
## Five-layer system model

Millrace has five major layers:

1. operator-owned workspace input and configuration
2. compiler-resolved runtime structure
3. deterministic runtime orchestration
4. stage-runner dispatch into an external harness
5. persisted artifacts and inspection surfaces

In practice:

- the operator points Millrace at a workspace
- Millrace bootstraps `millrace-agents/` under that workspace
- config, modes, loops, entrypoints, and skills compile into one frozen runtime plan
- each tick processes control input, intake, reconciliation, claim or activation, at most one stage run, and authoritative result application
- the runtime persists meaningful artifacts so later ticks and later operators can inspect real state

## Workspace boundary and ownership

The runtime tree lives under `/millrace-agents/`.

Operator-owned surfaces include:

- the workspace root itself
- the source repository being worked on
- runtime configuration choices in `millrace-agents/millrace.toml`
- queue intake through supported CLI or import surfaces

Runtime-owned surfaces include:

- queue movement between `queue/`, `active/`, `done/`, `blocked/`, `incoming/`, and `resolved/`
- active-stage identity in snapshot state
- recovery counters and stale-state repair context
- compiled plan persistence
- run directories and stage-result artifacts
- closure-target state for Arbiter-driven completion

One daemon owns one workspace through `state/runtime_daemon.lock.json`. `millrace status watch` is read-only monitoring and does not acquire ownership locks.

## Canonical on-disk model

### Markdown work documents

Canonical queue artifacts are markdown documents:

- `millrace-agents/tasks/{queue,active,done,blocked}/*.md`
- `millrace-agents/specs/{queue,active,done,blocked}/*.md`
- `millrace-agents/incidents/{incoming,active,resolved,blocked}/*.md`

JSON is still accepted as an import format for queue intake, but canonical long-lived queue artifacts are markdown.
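As an illustration of the bucket layout, counting markdown documents per queue bucket needs nothing beyond the filesystem. `queue_counts` is a hypothetical helper, not part of the shipped CLI or API; the bucket names come straight from the documented layout.

```python
import tempfile
from pathlib import Path

# Hypothetical helper: count canonical markdown work documents per bucket.
def queue_counts(agents_root: Path) -> dict[str, int]:
    buckets = {
        "tasks": ("queue", "active", "done", "blocked"),
        "specs": ("queue", "active", "done", "blocked"),
        "incidents": ("incoming", "active", "resolved", "blocked"),
    }
    counts: dict[str, int] = {}
    for kind, names in buckets.items():
        for name in names:
            # Path.glob on a missing directory simply yields nothing.
            counts[f"{kind}/{name}"] = len(list((agents_root / kind / name).glob("*.md")))
    return counts

# Demonstrate against a throwaway tree with one queued task.
root = Path(tempfile.mkdtemp())
(root / "tasks" / "queue").mkdir(parents=True)
(root / "tasks" / "queue" / "T-001.md").write_text("# example task")
counts = queue_counts(root)
```

This is roughly what `millrace queue ls` surfaces, but the sketch reads the tree directly to show that queue state is ordinary files, not a database.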
### Runtime/state artifacts

Key runtime-owned state and compile artifacts include:

- `millrace-agents/state/runtime_snapshot.json`
- `millrace-agents/state/recovery_counters.json`
- `millrace-agents/state/compiled_plan.json`
- `millrace-agents/state/compile_diagnostics.json`
- `millrace-agents/state/execution_status.md`
- `millrace-agents/state/planning_status.md`

The Arbiter adds its own durable subtree under `millrace-agents/arbiter/` for contracts, targets, rubrics, verdicts, and reports.

## Compiler-resolved runtime structure

Millrace does not execute directly from loose config, mode, and loop inputs on every handoff. It compiles those inputs into one frozen, inspectable plan first.

The compiler resolves:

- active mode
- active execution loop
- active planning loop
- entrypoint path per stage
- required stage-core skills
- optional attached skills added at compile time
- runner, model, and timeout per stage
- completion behavior when present

Persisted compile artifacts live under `millrace-agents/state/`:

- `compiled_plan.json`
- `compile_diagnostics.json`

`compiled_plan_id` is deterministic for a given effective structure. Failed recompiles keep the last known good plan active.

## Module topology

The importable package namespace is `millrace_ai`. The main source tree lives under `src/millrace_ai/`.
Important runtime-facing modules include:

- `src/millrace_ai/workspace/paths.py`: workspace contract and bootstrap
- `src/millrace_ai/workspace/work_documents.py`: headed markdown parsing and serialization
- `src/millrace_ai/workspace/queue_store.py`: queue claim and transition facade
- `src/millrace_ai/workspace/state_store.py`: snapshot, status, and counter persistence
- `src/millrace_ai/compiler.py`: mode and loop compile into one runtime plan
- `src/millrace_ai/runners/`: runner contracts, normalization, dispatcher, and adapters
- `src/millrace_ai/runtime/engine.py`: stable `RuntimeEngine` façade
- `src/millrace_ai/runtime/lifecycle.py`: startup, shutdown, compile bootstrap, watcher rebuild, lock lifecycle
- `src/millrace_ai/runtime/tick_cycle.py`: deterministic one-tick orchestration
- `src/millrace_ai/runtime/completion_behavior.py`: closure-target activation and readiness checks
- `src/millrace_ai/runtime/reconciliation.py`: stale or impossible state detection
- `src/millrace_ai/runtime/result_application.py`: authoritative post-stage mutation
- `src/millrace_ai/runtime/control.py`: runtime control abstraction
- `src/millrace_ai/doctor.py`: integrity and lock-health diagnostics

## Tick lifecycle

The shipped tick lifecycle is deterministic. In broad terms it does the following:

1. process mailbox commands such as pause, resume, stop, retry-active, reload-config, or intake
2. run stale-state reconciliation and recovery routing
3. consume watcher or poll intake events
4. respect pause and stop gates
5. claim planning or execution work
6. if no claimable work remains, consult frozen `completion_behavior` and activate `arbiter` when an open closure target is eligible
7. execute one stage through the configured runner adapter
8. route result markers and persist snapshot, status, counters, and events

Millrace is intentionally single-stage per claim cycle. The runtime, not the stage prompt, owns queue mutation and authoritative routing.
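The eight steps above can be sketched as one function. This is a deliberately simplified illustration of the documented tick order, not the real `tick_cycle` implementation; every method name on `runtime` is hypothetical.

```python
# Simplified sketch of the documented tick order. The real logic lives in
# src/millrace_ai/runtime/tick_cycle.py; names here are illustrative.
def tick(runtime) -> str:
    runtime.process_mailbox()            # 1. pause/resume/stop/retry/reload/intake
    runtime.reconcile_stale_state()      # 2. recovery routing
    runtime.consume_intake_events()      # 3. watcher or poll intake
    if runtime.paused or runtime.stopped:
        return "gated"                   # 4. pause/stop gates
    work = runtime.claim_work()          # 5. planning or execution claim
    if work is None:
        # 6. no claimable work: consult frozen completion_behavior
        work = runtime.maybe_activate_closure_target()
    if work is None:
        return "idle"
    raw = runtime.run_one_stage(work)    # 7. exactly one stage per claim cycle
    runtime.apply_result(work, raw)      # 8. authoritative routing + persistence
    return "stage_ran"

# Minimal stand-in runtime that never finds work, so the tick ends idle.
class _FakeRuntime:
    paused = stopped = False
    def process_mailbox(self): pass
    def reconcile_stale_state(self): pass
    def consume_intake_events(self): pass
    def claim_work(self): return None
    def maybe_activate_closure_target(self): return None
    def run_one_stage(self, work): return None
    def apply_result(self, work, raw): pass

outcome = tick(_FakeRuntime())
```

The single-return-path shape mirrors the documented guarantee: at most one stage runs per tick, and all mutation happens in result application, after the stage.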
## Stage dispatch and runner boundary

The runtime builds a `StageRunRequest` from the compiled plan and active work item. The `StageRunnerDispatcher` resolves the adapter by runner name precedence. The adapter executes and returns `RunnerRawResult`. The runtime then normalizes and applies the result.

The stable boundary is:

- `StageRunRequest -> RunnerRawResult`

That narrow boundary is what lets Millrace change harness adapters without rewriting the orchestration model.

## Run artifacts and inspection

Each run persists under `millrace-agents/runs//`. Common artifacts include:

- `stage_results/*.json`
- `runner_prompt..md`
- `runner_invocation..json`
- `runner_completion..json`
- runner stdout/stderr artifacts where present
- per-request Codex event logs where present
- stage-authored reports such as `troubleshoot_report.md` or `arbiter_report.md`

Operator inspection surfaces include:

- `millrace status`
- `millrace runs ls`
- `millrace runs show `
- `millrace runs tail `
- `millrace compile show`

## Planning and execution authority

The current execution loop is a repair-capable governance loop. The current planning loop is a separate domain for specs, incidents, remediation, and closure. The runtime keeps those planes distinct instead of hiding their transitions inside a single agent session.

That is the architectural point of Millrace: raw harness output is bounded, while durable governance stays in the runtime.
See also:

- `/ai/runner-architecture.md`
- `/ai/modes-and-loops.md`
- `/ai/arbiter-and-completion.md`

---

## Install and bootstrap

Source URL: https://millrace.ai/ai/install.md

# Install and bootstrap

## Package surfaces

Millrace currently ships with these public package surfaces:

- package name: `millrace-ai`
- import namespace: `millrace_ai`
- installed CLI: `millrace`
- module entrypoint: `python -m millrace_ai`

## Install

Installed environment:

```bash
pip install millrace-ai
```

Source-development module form:

```bash
uv run --extra dev python -m millrace_ai --help
```

## Workspace model

Millrace is workspace-scoped. It bootstraps its runtime tree under:

```text
/millrace-agents/
```

The default runtime config path is:

```text
/millrace-agents/millrace.toml
```

That tree holds the markdown work queues, runtime snapshot state, compile outputs, mailbox/control files, Arbiter artifacts, and per-run execution artifacts.

## First validation commands

Once installed, the safest first commands are:

```bash
millrace compile validate --workspace
millrace status --workspace
millrace queue ls --workspace
```

Those commands validate the effective config, compile the active mode, surface runtime state, and show queue counts without forcing long-running execution.

## First execution commands

Run one bounded startup-plus-tick cycle:

```bash
millrace run once --workspace
```

Run repeated ticks until stop or interrupt:

```bash
millrace run daemon --workspace
```

The daemon form is for an actual long-running operator posture. `run once` is the safer choice when you want one controlled progression step.

## Baseline harness posture

The shipped canonical runtime modes are:

- `default_codex`
- `default_pi`

The compatibility alias `standard_plain` resolves to `default_codex`.

The shipped built-in runner adapters are:

- `codex_cli`
- `pi_rpc`

That means Millrace ships a real runtime layer plus built-in harness bindings, rather than shipping only prompts.
## After install: inspect rather than guess

Use these surfaces to inspect the real compiled state:

```bash
millrace compile show --workspace
millrace modes list --workspace
millrace modes show default_codex --workspace
millrace config show --workspace
```

Use these surfaces to watch or inspect runs:

```bash
millrace status watch --workspace
millrace runs ls --workspace
millrace runs show --workspace
millrace runs tail --workspace
```

## What gets bootstrapped

Millrace bootstraps:

- the `millrace-agents/` runtime root
- workspace entrypoints under `millrace-agents/entrypoints/`
- default state and status files under `millrace-agents/state/`
- runtime control surfaces such as mailbox and lock files

New workspaces default to the canonical Codex posture through `default_codex`, while the package also ships the Pi mode and adapter.

## What installation does not change

Installing Millrace does not turn it into a hosted service or IDE. It remains a local runtime that wraps raw harness execution with deterministic orchestration, durable state, and inspection surfaces.
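A minimal sketch of the bootstrapped directories described above, assuming nothing beyond the documented list: the real bootstrap in `millrace_ai.workspace.paths` also writes default state, status, mailbox, and lock files, which this sketch omits, and `bootstrap` is a hypothetical name.

```python
import tempfile
from pathlib import Path

# Hypothetical sketch: create only the documented runtime root and its
# entrypoints/ and state/ subdirectories; the real bootstrap does more.
def bootstrap(workspace: Path) -> Path:
    root = workspace / "millrace-agents"
    for sub in ("entrypoints", "state"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root

root = bootstrap(Path(tempfile.mkdtemp()))
```

Because `exist_ok=True` makes directory creation idempotent, re-running bootstrap against an existing workspace is harmless, which matches the "inspect rather than guess" posture above.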
See also:

- `/ai/cli-reference.md`
- `/ai/runner-architecture.md`
- `/ai/architecture.md`

---

## CLI reference

Source URL: https://millrace.ai/ai/cli-reference.md

# CLI reference

Installed command: `millrace`

Module entrypoint: `python -m millrace_ai`

## Defaults

- `--workspace` points to the operator workspace root
- runtime config defaults to `/millrace-agents/millrace.toml`
- runtime bootstrap and output stay under `/millrace-agents/`

## Primary command groups

- `millrace run ...`
- `millrace status ...`
- `millrace runs ...`
- `millrace queue ...`
- `millrace planning ...`
- `millrace config ...`
- `millrace control ...`
- `millrace compile ...`
- `millrace modes ...`
- `millrace doctor`

Compatibility aliases remain for top-level operator commands:

- `millrace add-task`, `millrace add-spec`, `millrace add-idea`
- `millrace pause`, `millrace resume`, `millrace stop`
- `millrace retry-active`, `millrace clear-stale-state`, `millrace reload-config`

These aliases use the same flags and behavior as their grouped forms.

## Run commands

### `millrace run once`

Runs one deterministic startup-plus-tick cycle.

Common options:

- `--workspace PATH`
- `--mode MODE_ID`
- `--config PATH`

### `millrace run daemon`

Runs repeated ticks until stop, interrupt, or `--max-ticks` is reached.

Common options:

- `--workspace PATH`
- `--mode MODE_ID`
- `--config PATH`
- `--max-ticks N`

## Status commands

Canonical operator form: `millrace status`

Explicit subcommand alias: `millrace status show`

### `millrace status`

Prints runtime snapshot and queue depth for one workspace. When a failure class is active, status also shows the current failure class and non-zero retry counters.

The `execution_status_marker` and `planning_status_marker` fields show the currently running stage marker while a stage is executing. When no stage is active, they fall back to the latest terminal marker or `### IDLE`.

### `millrace status watch`

Polls runtime status repeatedly.
Common options:

- `--workspace PATH` (repeatable)
- `--max-updates N`
- `--interval-seconds FLOAT`

`status watch` is monitor-only and does not acquire runtime ownership locks.

## Run inspection commands

### `millrace runs ls`

Lists persisted run summaries from `millrace-agents/runs/`.

### `millrace runs show `

Prints one run summary, including work-item identity, failure class, elapsed time, aggregated token usage, per-stage elapsed time, stdout/stderr paths, and troubleshoot report path when present.

### `millrace runs tail `

Prints the primary tailable artifact for one run. Millrace prefers the troubleshoot report first, then stdout/stderr artifacts.

## Queue commands

### `millrace queue ls`

Prints queue and active counts for execution and planning surfaces.

### `millrace queue show `

Finds and prints one task, spec, or incident document summary by id.

### `millrace queue add-task `

Imports a `TaskDocument`. Canonical queue artifacts are markdown; JSON is import-only.

### `millrace queue add-spec `

Imports a `SpecDocument`. Canonical queue artifacts are markdown; JSON is import-only.

### `millrace queue add-idea `

Drops idea markdown into planning intake.

Top-level convenience alias:

- `millrace add-idea `

## Control commands

- `millrace control pause`
- `millrace control resume`
- `millrace control stop`
- `millrace control retry-active --reason "..."`
- `millrace control clear-stale-state --reason "..."`
- `millrace control reload-config`

Control routing behavior:

- if a daemon owns the workspace, the command is mailbox-routed
- if no daemon owns the workspace, the command applies directly

## Planning command

### `millrace planning retry-active --reason "..."`

Requests a retry only when the active work is on the planning plane. If execution work is active instead, the runtime records a skipped retry action rather than mutating the wrong plane.
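The daemon-versus-direct routing rule for control commands can be sketched as a single branch. The string tags and `daemon_owns_workspace` flag are illustrative stand-ins; the real check is based on the `state/runtime_daemon.lock.json` ownership lock.

```python
# Hypothetical sketch of the documented control-routing rule; the real
# control path lives in src/millrace_ai/runtime/control.py.
def route_control(command: str, daemon_owns_workspace: bool) -> str:
    if daemon_owns_workspace:
        return f"mailbox:{command}"   # queued for the owning daemon's next tick
    return f"direct:{command}"        # applied against on-disk state immediately

routed = route_control("pause", daemon_owns_workspace=True)
```

The design point is that the operator types one command either way; whether it lands in the mailbox or mutates state directly is decided by ownership, not by the operator.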
## Config commands

### `millrace config show`

Prints effective runtime defaults plus the snapshot-exposed config version and last reload outcome or error state.

### `millrace config validate [--mode MODE_ID]`

Loads the effective config, compiles the selected mode, and prints compile diagnostics. This is the preferred operator-facing config validation command.

### `millrace config reload`

Requests a daemon-safe config reload. If recompile fails, the runtime keeps the last known good compiled plan active and records the failure in snapshot state and runtime events.

## Compile and modes

### `millrace compile validate [--mode MODE_ID]`

Compiles the active mode and emits diagnostics such as `ok`, warnings, errors, and fallback usage.

### `millrace compile show [--mode MODE_ID]`

Compiles and prints the operator inspection surface, including:

- graph authority flags and graph entry surfaces
- node request-binding surfaces
- `compiled_plan_id`
- execution and planning loop ids
- frozen `completion_behavior.*` fields when present
- stage ordering
- entrypoint path per stage
- `required_skills` and `attached_skills`

### `millrace modes list`

Lists built-in modes and loop references.

### `millrace modes show MODE_ID`

Prints one mode definition summary.

## Doctor

### `millrace doctor`

Runs workspace integrity diagnostics, including stale lock and ownership checks.

See also:

- `/ai/install.md`
- `/ai/modes-and-loops.md`
- `/ai/architecture.md`

---

## Modes and loops

Source URL: https://millrace.ai/ai/modes-and-loops.md

# Modes and loops

Millrace compiles and executes from a frozen runtime plan built from one mode plus one loop per plane.

## The two planes

Millrace currently runs two distinct planes:

- execution
- planning

Each plane still ships with a legacy loop asset that declares its stages, entry stage, edges, and terminal results. The compiler and runtime also materialize a graph-backed `compiled_plan.json` that becomes the authoritative runtime control-flow artifact.
## Shipped loop ids

The shipped built-in loop ids are:

- `execution.standard`
- `planning.standard`

## Shipped mode ids

The shipped canonical mode ids are:

- `default_codex`
- `default_pi`

Compatibility alias:

- `standard_plain -> default_codex`

Both shipped canonical modes point at the same loop ids:

- `execution_loop_id = execution.standard`
- `planning_loop_id = planning.standard`

They differ in runner bindings, not in loop topology.

## Shipped execution loop

`execution.standard` declares these stages:

1. `builder`
2. `checker`
3. `fixer`
4. `doublechecker`
5. `updater`
6. `troubleshooter`
7. `consultant`

Its legacy loop entry stage is `builder`. Its current terminal results are:

- `UPDATE_COMPLETE`
- `NEEDS_PLANNING`
- `BLOCKED`

The shipped execution path is not a straight line. It is a repair-capable governance loop. In the shipped graph:

- `BUILDER_COMPLETE` moves `builder -> checker`
- `FIX_NEEDED` routes `checker -> fixer` and `doublechecker -> fixer`
- successful update terminates with `UPDATE_COMPLETE`
- blocked execution routes into `troubleshooter`
- `consultant` can hand the run back into troubleshooting or terminate with `NEEDS_PLANNING` or `BLOCKED`

## Shipped planning loop

`planning.standard` declares these stages:

1. `planner`
2. `manager`
3. `mechanic`
4. `auditor`
5. `arbiter`

Its legacy loop entry stage is `planner`.
Its current terminal results are:

- `MANAGER_COMPLETE`
- `ARBITER_COMPLETE`
- `REMEDIATION_NEEDED`
- `BLOCKED`

In the shipped graph:

- `PLANNER_COMPLETE` moves `planner -> manager`
- blocked `planner` or `manager` work routes into `mechanic`
- `MECHANIC_COMPLETE` loops back into `planner`
- `MANAGER_COMPLETE` is the normal planning terminal
- `BLOCKED` is the terminal recovery outcome from `mechanic`
- `auditor` routes `AUDITOR_COMPLETE -> planner` or `BLOCKED -> mechanic`
- `arbiter` terminates with `ARBITER_COMPLETE`, `REMEDIATION_NEEDED`, or `BLOCKED`

The graph-backed planning intake split is explicit:

- `spec -> planner`
- `incident -> auditor`

## What a mode defines

Modes are intentionally small. The current shape includes:

- `mode_id`
- `execution_loop_id`
- `planning_loop_id`
- `stage_entrypoint_overrides`
- `stage_skill_additions`
- `stage_model_bindings`
- `stage_runner_bindings`

Those are compile-time surfaces, not free-form runtime hints.

### `stage_entrypoint_overrides`

This map replaces the default stage entrypoint path for a stage. The path must be relative, must start with `entrypoints/`, and must end with `.md`.

### `stage_skill_additions`

This map attaches optional advisory skill paths to the frozen node binding. It does not change runtime-owned routing.

### `stage_model_bindings`

This map sets a mode-level model name for a stage.

### `stage_runner_bindings`

This map sets a mode-level runner name for a stage.

## Shipped harness posture

The shipped modes use identical loop topology but different harness bindings:

- `default_codex` binds every shipped stage to `codex_cli`
- `default_pi` binds every shipped stage to `pi_rpc`

That means the runtime control model stays constant while the stage runner adapter changes.
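The `stage_entrypoint_overrides` constraint above (relative path, `entrypoints/` prefix, `.md` suffix) can be sketched as a small check. `valid_override` is a hypothetical helper, not the real compiler validation, which may enforce more than these three rules.

```python
# Hypothetical sketch of the documented entrypoint-override constraint.
def valid_override(path: str) -> bool:
    return (
        not path.startswith("/")          # must be relative
        and path.startswith("entrypoints/")  # must live under entrypoints/
        and path.endswith(".md")          # must be a markdown entrypoint
    )

ok = valid_override("entrypoints/builder-custom.md")
bad = valid_override("/abs/entrypoints/builder.md")
```

Because the check runs at compile time, an illegal override fails before any stage is dispatched rather than mid-run.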
## What the compiler freezes

During compile, Millrace materializes one `CompiledRunPlan` with fields such as:

- `compiled_plan_id`
- `mode_id`
- `execution_loop_id`
- `planning_loop_id`
- `execution_graph`
- `planning_graph`
- `compiled_at`
- `source_refs`

Each materialized node binding records:

- `node_id`
- `plane`
- `entrypoint_path`
- `entrypoint_contract_id`
- `required_skill_paths`
- `attached_skill_additions`
- `runner_name`
- `model_name`
- `timeout_seconds`

This freeze step is what makes the runtime deterministic and inspectable.

## Compile inspection surfaces

Use:

- `millrace compile validate`
- `millrace compile show`
- `millrace modes list`
- `millrace modes show MODE_ID`

Those are the operator surfaces for checking which mode, loop, runner, skill, entrypoint, and completion behavior the runtime actually froze.

See also:

- `/ai/runner-architecture.md`
- `/ai/arbiter-and-completion.md`
- `/ai/cli-reference.md`

---

## Runner architecture

Source URL: https://millrace.ai/ai/runner-architecture.md

# Runner architecture

Millrace keeps the runner boundary intentionally narrow and stable.

## Core contract

The runtime-facing contract is:

- input: `StageRunRequest`
- output: `RunnerRawResult`

Compile freezes runner name, model name, entrypoint path, required skills, optional attached skills, and timeout into the stage plan before dispatch happens.

## Runner components

The runner system lives under `src/millrace_ai/runners/`.
Important modules include:

- `requests.py`: `StageRunRequest`, `RunnerRawResult`, prompt-context rendering
- `normalization.py`: terminal extraction, failure mapping, normalized envelopes
- `base.py`: adapter protocol (`name`, `run(request)`)
- `registry.py`: mapping from runner name to adapter
- `dispatcher.py`: runtime-facing adapter resolver
- `contracts.py`: invocation and completion artifact schemas
- `process.py`: subprocess helper with timeout and error mapping
- `adapters/_prompting.py`: shared Millrace-owned stage prompt construction
- `adapters/codex_cli.py`: built-in Codex CLI adapter
- `adapters/pi_rpc.py`: built-in Pi RPC adapter
- `adapters/pi_rpc_client.py`: focused JSONL RPC transport for Pi

The package also preserves `src/millrace_ai/runner.py` as a thin compatibility facade over the new runner package.

## Resolution order

Runner resolution happens in this order:

1. `StageRunRequest.runner_name`
2. `RuntimeConfig.runners.default_runner`
3. literal fallback `"codex_cli"`

Unknown names fail fast with `UnknownRunnerError`.

In practice there are two distinct decisions:

1. compile decides what runner name is frozen into the stage plan
2. dispatch resolves which adapter to execute from the request

## Shipped built-in adapters

The shipped canonical modes make the baseline posture explicit:

- `default_codex` binds every shipped stage to `codex_cli`
- `default_pi` binds every shipped stage to `pi_rpc`
- `standard_plain` remains accepted only as a compatibility alias for `default_codex`

Millrace is therefore a runtime around harness adapters, not a harness-free orchestration abstraction with no concrete shipped bindings.

## Runner artifacts

Each stage run writes runner artifacts into the run directory:

- `runner_prompt..md`
- `runner_invocation..json`
- `runner_stdout..txt`
- `runner_stderr..txt`
- `runner_completion..json`

The point of that artifact surface is diagnosability.
Later operators can inspect what was asked, how the adapter was invoked, what came back, and how the runtime normalized it.

## Codex adapter behavior

The built-in Codex adapter:

- builds a deterministic stage prompt from `StageRunRequest`
- shells out to configured Codex command and args
- captures stdout and stderr
- maps subprocess outcomes to `RunnerRawResult.exit_kind`

The main shipped exit kinds are:

- `completed`
- `timeout`
- `runner_error`

The documented default config fields are:

```toml
[runners]
default_runner = "codex_cli"

[runners.codex]
command = "codex"
args = ["exec"]
permission_default = "maximum"
skip_git_repo_check = true
extra_config = []
```

The docs also describe permission precedence by stage, model, or default.

## Pi adapter behavior

The built-in Pi adapter:

- shells out to `pi --mode rpc --no-session`
- sends the same Millrace-owned stage prompt contract used by the Codex path
- persists streamed Pi events to `runner_events..jsonl`
- materializes final assistant text into `runner_stdout..txt`
- uses Millrace timeout governance, including RPC abort and bounded hard-kill fallback

The documented default Pi config fields are:

```toml
[runners.pi]
command = "pi"
args = []
disable_context_files = true
disable_skills = true
```

## Why the boundary matters

Millrace keeps `StageRunRequest -> RunnerRawResult` stable so that harness adapters can change without rewriting runtime orchestration.

That is why Millrace should be described as a runtime or governance layer around raw harness execution, not as a coding model or interactive editor.

See also:

- `/ai/modes-and-loops.md`
- `/ai/architecture.md`
- `/ai/what-is-millrace.md`

---

## Arbiter and completion

Source URL: https://millrace.ai/ai/arbiter-and-completion.md

# Arbiter and completion

Millrace does not treat backlog drain as automatic completion. Completion is a separate runtime-governed path.
## What triggers Arbiter The shipped completion model for `default_codex` uses a frozen planning-loop `completion_behavior`. The shipped baseline is: - trigger: `backlog_drained` - readiness rule: `no_open_lineage_work` - stage: `arbiter` - request kind: `closure_target` - target selector: `active_closure_target` - blocked-work policy: `suppress` That means Arbiter only runs when: - no planning work is claimable - no execution work is claimable - one open closure target exists - no queued, active, or blocked lineage work remains for that target ## Root-lineage model Closure behavior is keyed by explicit root-lineage fields carried through work documents: - `root_spec_id` - `root_idea_id` Arbiter uses root lineage so it does not have to guess which spec family it is judging after remediation churn. ## Canonical contract copies Arbiter judges against canonical copies under its own workspace subtree: - `millrace-agents/arbiter/contracts/ideas/.md` - `millrace-agents/arbiter/contracts/root-specs/.md` The runtime snapshots those copies when the root spec first enters the managed lineage so Arbiter is not searching mutable operator-authored files later. ## Closure-target state The runtime owns one closure-target state file per root spec: - `millrace-agents/arbiter/targets/.json` The shipped v1 policy is one open closure target per workspace. The target file records: - root lineage ids - canonical contract paths - rubric path - latest verdict and report paths - whether closure is still open - whether remaining lineage work still blocks closure - the last Arbiter run id ## Runtime behavior when backlog drains When no claimable work remains, the runtime: 1. claims planning work if available 2. claims execution work if available 3. if neither is available, inspects frozen completion behavior 4. locates the single open closure target 5. if no open target exists, tries to backfill one from the latest root spec that already carries root-lineage ids 6. 
scans queued, active, and blocked work for matching `root_spec_id` 7. suppresses Arbiter if lineage work still remains 8. dispatches Arbiter when the target is eligible If no open target exists and the latest root spec is still missing root-lineage metadata, the runtime marks planning blocked and emits a diagnosable runtime event rather than silently idling through required closure behavior. ## Arbiter request contract Arbiter is a real planning-stage run. It does not receive a fake queue item. Its stage request uses `request_kind = closure_target` and includes fields such as: - `closure_target_path` - `closure_target_root_spec_id` - `closure_target_root_idea_id` - `canonical_root_spec_path` - `canonical_seed_idea_path` - `preferred_rubric_path` - `preferred_verdict_path` - `preferred_report_path` The normalized stage result still projects onto `work_item_kind = spec` and `work_item_id = ` so the result envelope stays typed and stable. ## Arbiter artifact layout Arbiter-owned durable artifacts live under: - `millrace-agents/arbiter/rubrics/.md` - `millrace-agents/arbiter/verdicts/.json` - `millrace-agents/arbiter/reports/.md` The runtime result-application path copies the per-run report into the durable Arbiter reports directory so the final report path stays stable. ## Runtime-owned outcomes Arbiter may emit only: - `ARBITER_COMPLETE` - `REMEDIATION_NEEDED` - `BLOCKED` The runtime owns the workflow consequences: ### `ARBITER_COMPLETE` The runtime closes the target, stamps `closed_at`, persists the latest verdict and report paths, and returns to idle. ### `REMEDIATION_NEEDED` The runtime keeps the target open, persists the latest verdict and report paths, and enqueues a planning incident under `millrace-agents/incidents/incoming/`. ### `BLOCKED` The runtime keeps the target open, persists the latest run and report context, and leaves the planning status blocked without fabricating queue work. Arbiter does not directly mutate closure-target workflow authority. 
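Taken together, the trigger conditions and suppression policy above reduce to a single readiness predicate over the open closure target and remaining lineage work. A minimal sketch, using plain dicts with a `root_spec_id` field as a stand-in for the shipped document and target schemas:

```python
def arbiter_eligible(planning_claimable, execution_claimable,
                     open_targets, work_items):
    """Decide whether the runtime may dispatch Arbiter this tick.

    `work_items` stands in for queued, active, and blocked work documents;
    `open_targets` stands in for open closure-target state files.
    """
    # Arbiter never preempts claimable planning or execution work.
    if planning_claimable or execution_claimable:
        return False
    # Shipped v1 policy: exactly one open closure target per workspace.
    if len(open_targets) != 1:
        return False
    target = open_targets[0]
    # Suppress while any lineage work for that root spec remains.
    remaining = [w for w in work_items
                 if w.get("root_spec_id") == target["root_spec_id"]]
    return not remaining
```

The real runtime applies this check only after the backfill step, so a latest root spec that already carries root-lineage ids can still become the open target first.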
It produces artifacts and a terminal result. The runtime applies the authoritative state change. ## Operator inspection surfaces Current operator-facing surfaces that expose this behavior include: - `millrace compile show` - `millrace status` - `millrace runs show ` Use those first before opening raw JSON files unless the full artifact payload is needed. See also: - `/ai/modes-and-loops.md` - `/ai/architecture.md` - `/ai/faq.md` --- ## Comparisons and category boundaries Source URL: https://millrace.ai/ai/comparisons.md # Comparisons and category boundaries This page is about category fit, not hype. It describes what Millrace is relative to nearby tools. ## Millrace versus direct coding harnesses Millrace is not a replacement for Codex CLI, Claude Code, or Aider. Those tools are direct harnesses or operator-facing coding sessions. Millrace is the runtime and governance layer that can sit around raw harness execution when the work needs durable state, compiled plans, recovery routing, and authoritative result application. The clean comparison is: - direct harness: one interactive session or request path - Millrace: filesystem-backed runtime around stage-by-stage harness calls As shipped today, Millrace includes built-in runner adapters for: - `codex_cli` - `pi_rpc` That is a concrete runtime integration surface, not a claim that Millrace is itself the coding model. ## Millrace versus IDE products Millrace is not an IDE and does not claim to be one. It does not replace an editor tab, code-navigation product, or inline code-completion surface. Its main operator surfaces are: - the `millrace` CLI - the `python -m millrace_ai` module entrypoint - the runtime workspace tree under `/millrace-agents/` - persisted queue, state, run, and Arbiter artifacts If the job is mainly interactive editing inside one session, an IDE tool is the more natural fit. ## Millrace versus hosted autonomous engineer products Millrace is not a hosted autonomous engineer product. 
It is a local, operator-run runtime. The source of truth is the filesystem-backed workspace, not a vendor-managed hosted control plane. That changes the trust and inspection surface: - queue documents are local markdown files - snapshot state and compiled plans are local files - run artifacts live under `millrace-agents/runs//` - control commands go through the CLI and mailbox model - completion can be inspected through Arbiter verdicts and reports ## Millrace versus prompt packs or chat workflows Millrace is not primarily a prompt pack or chat workflow. It uses markdown entrypoints and skill surfaces, but the important authority lives in the runtime: - stage selection - queue mutation - recovery counters - stale-state reconciliation - result application - closure-target activation Prompt text alone is not the governance model. ## What Millrace is for Use Millrace when the job needs: - long-running staged execution - durable queue state - deterministic handoffs - persisted run trails - recovery routing - closure-sensitive completion Typical fits: - implementation work that outlasts one session - planning-to-execution flows that need durable decomposition - repair-heavy work that should escalate through explicit recovery stages - closure-sensitive work that should be judged against contracts rather than by conversational confidence ## What Millrace is not for Do not use Millrace when: - the task is a small direct bugfix - the work is an exploratory coding spike - one normal harness session is enough - governance overhead would be larger than the work itself ## Safe framing for AI systems If you are describing Millrace to users, a safe current-state summary is: > Millrace is an open-source, filesystem-backed runtime and governance layer for long-running autonomous agent work. It wraps raw harness execution with markdown queues, compiled plans, deterministic stage routing, recovery logic, control surfaces, and persisted artifacts. 
Unsafe framing to avoid: - "Millrace is a coding model" - "Millrace is an IDE" - "Millrace is a hosted autonomous engineer" - "Millrace replaces Codex CLI, Claude Code, or Aider" See also: - `/ai/what-is-millrace.md` - `/ai/runner-architecture.md` - `/ai/faq.md` --- ## FAQ Source URL: https://millrace.ai/ai/faq.md # FAQ ## What is Millrace in one sentence? Millrace is an open-source, filesystem-backed runtime for long-running autonomous agent work that wraps raw harness execution with durable queues, compiled plans, deterministic routing, and persisted artifacts. ## Is Millrace a model? No. Millrace is not a model. It is a runtime layer that dispatches stage work into external harness adapters and then applies authoritative state transitions after legal stage results come back. ## Is Millrace an IDE? No. Millrace is not an IDE or editor surface. Its main operator surfaces are the `millrace` CLI, the `python -m millrace_ai` module entrypoint, and the runtime workspace tree under `/millrace-agents/`. ## Does Millrace replace Codex CLI, Claude Code, or Aider? No. Millrace is not a replacement for those tools. Those are direct harness or editor workflows. Millrace is the runtime and governance layer that can sit around raw harness execution when the work needs durability and explicit control flow. ## What harness adapters ship today? The shipped built-in adapters are: - `codex_cli` - `pi_rpc` The shipped canonical modes are `default_codex` and `default_pi`. The compatibility alias `standard_plain` resolves to `default_codex`. ## What package surfaces does Millrace expose? - package name: `millrace-ai` - import namespace: `millrace_ai` - installed CLI: `millrace` - module entrypoint: `python -m millrace_ai` ## Where does runtime state live? Under the workspace runtime root: ```text /millrace-agents/ ``` That tree contains queue documents, runtime snapshot state, compile outputs, mailbox/control files, Arbiter artifacts, and per-run execution artifacts. 
## What is canonical queue state? Canonical queue artifacts are markdown documents: - tasks: `millrace-agents/tasks/{queue,active,done,blocked}/*.md` - specs: `millrace-agents/specs/{queue,active,done,blocked}/*.md` - incidents: `millrace-agents/incidents/{incoming,active,resolved,blocked}/*.md` JSON is still accepted for import, but the canonical long-lived queue artifacts are markdown. ## What are the two runtime planes? The shipped runtime has: - an execution plane - a planning plane The shipped loop ids are `execution.standard` and `planning.standard`. ## Which execution stages ship today? - `builder` - `checker` - `fixer` - `doublechecker` - `updater` - `troubleshooter` - `consultant` ## Which planning stages ship today? - `planner` - `manager` - `mechanic` - `auditor` - `arbiter` ## What does `standard_plain` mean? It is a compatibility alias, not a separate third mode asset. `standard_plain` resolves to `default_codex` before compile diagnostics, frozen-plan ids, and runtime snapshot state are written. ## What does the compiler do? The compiler freezes the selected mode plus execution and planning loops into one `CompiledRunPlan` and persists it under `millrace-agents/state/compiled_plan.json`. It also writes `compile_diagnostics.json`. Use: - `millrace compile validate` - `millrace compile show` to inspect the effective runtime structure. ## What is the runner contract? Millrace keeps a narrow runtime boundary: - `StageRunRequest -> RunnerRawResult` That is what lets the runtime change adapters without changing the orchestration model. ## Where do run artifacts go? Each run persists under: ```text millrace-agents/runs// ``` Common artifacts include stage results, prompt files, invocation files, completion files, stdout/stderr files where present, and stage-authored reports such as troubleshooting or Arbiter reports. ## What is Arbiter? Arbiter is a planning-stage completion path. It does not run as normal queued work. 
The runtime activates it only when backlog drain and lineage readiness make a closure target eligible. Arbiter can emit only: - `ARBITER_COMPLETE` - `REMEDIATION_NEEDED` - `BLOCKED` The runtime owns the consequences of those results. ## When should Millrace be used? Use Millrace when the work needs: - survival across pauses, crashes, or context loss - durable queue state - deterministic stage progression - recovery routing - persisted run artifacts and inspection surfaces - closure-sensitive completion criteria ## When should Millrace not be used? Avoid Millrace when the task is small, bounded, likely to finish in one session, and does not need durable governance. See also: - `/ai/what-is-millrace.md` - `/ai/install.md` - `/ai/arbiter-and-completion.md` --- ## Glossary Source URL: https://millrace.ai/ai/glossary.md # Glossary ## Millrace An open-source, filesystem-backed runtime for long-running autonomous agent work. ## `millrace_ai` The importable Python package namespace. ## `millrace` The installed CLI command. ## `python -m millrace_ai` The module entrypoint form for invoking the CLI from source or a Python environment. ## Workspace runtime root The runtime-owned subtree under: ```text /millrace-agents/ ``` ## Task document A canonical markdown work document under `millrace-agents/tasks/` representing execution work. ## Spec document A canonical markdown work document under `millrace-agents/specs/` representing planning-owned or planning-derived specification work. ## Incident document A canonical markdown work document under `millrace-agents/incidents/` representing planning-side incident or remediation intake. ## Execution plane The plane that owns the shipped execution loop and stages such as `builder`, `checker`, `fixer`, `doublechecker`, `updater`, `troubleshooter`, and `consultant`. ## Planning plane The plane that owns the shipped planning loop and stages such as `planner`, `manager`, `mechanic`, `auditor`, and `arbiter`. 
## `execution.standard` The shipped built-in execution loop id. ## `planning.standard` The shipped built-in planning loop id. ## `default_codex` The shipped canonical mode that binds all shipped stages to the built-in `codex_cli` adapter. ## `default_pi` The shipped canonical mode that binds all shipped stages to the built-in `pi_rpc` adapter. ## `standard_plain` A compatibility alias that resolves to `default_codex`. ## `CompiledRunPlan` The frozen runtime plan produced by compile. It records the selected mode and loops, materialized node bindings, entry surfaces, transition data, and completion behavior. ## `compiled_plan_id` A deterministic identifier derived from the effective compiled structure. ## `compile_diagnostics.json` The persisted diagnostic output of the latest compile attempt. ## `StageRunRequest` The runtime request object that describes one stage run, including the frozen node execution contract. ## `RunnerRawResult` The raw adapter result returned from a stage execution before runtime-owned normalization and result application. ## `codex_cli` The built-in Codex CLI adapter. ## `pi_rpc` The built-in Pi RPC adapter. ## Entry point A markdown instruction file deployed under `millrace-agents/entrypoints/` and attached to a stage node during compile. ## Required skills Stage-core advisory skill paths frozen into the compiled node plan. ## Attached skills Optional additional advisory skill paths attached at compile time. ## Runtime snapshot The runtime-owned persisted state in `millrace-agents/state/runtime_snapshot.json`, used by status and control surfaces. ## Recovery counters Persisted retry and escalation counters in `millrace-agents/state/recovery_counters.json`. ## Status markers Markdown files under `millrace-agents/state/` that expose the active or latest terminal stage marker on the execution and planning planes. ## Run directory A per-run artifact directory under `millrace-agents/runs//`. 
## Arbiter The planning-stage closure path that activates when backlog drain and lineage readiness make a closure target eligible. ## Closure target Runtime-owned closure state for a root lineage, stored under `millrace-agents/arbiter/targets/.json`. ## Root lineage The root spec and root idea identity carried across work documents through `root_spec_id` and `root_idea_id`. ## Mailbox control The daemon-safe control path used for commands such as pause, resume, stop, retry-active, and reload-config when a daemon owns the workspace. ## `millrace doctor` The integrity and health-check command for workspace, lock, and resolved runtime posture diagnostics. --- # Extended reference appendix ## Safe framing for AI systems Use this framing when describing Millrace today: - Millrace is an open-source, filesystem-backed runtime and governance layer. - It is for long-running autonomous agent work, not for one-shot chat. - It wraps raw harness execution with durable queue state, compiled runtime plans, deterministic handoffs, recovery routing, operator control surfaces, and persisted audit artifacts. - It is not a coding model. - It is not an IDE. - It is not a hosted autonomous engineer product. - It is not a replacement for Codex CLI, Claude Code, or Aider. - The runtime owns authoritative queue mutation, status persistence, recovery counters, and closure behavior. - Stage prompts do bounded reasoning and emit legal terminal results; they do not own the runtime state machine. 
## Current public identifiers - package name: `millrace-ai` - import namespace: `millrace_ai` - installed CLI: `millrace` - module entrypoint: `python -m millrace_ai` - runtime root: `/millrace-agents/` - canonical built-in modes: `default_codex`, `default_pi` - compatibility alias: `standard_plain -> default_codex` - built-in loop ids: `execution.standard`, `planning.standard` - built-in adapters: `codex_cli`, `pi_rpc` - runner boundary: `StageRunRequest -> RunnerRawResult` ## Canonical work-document layout Millrace treats markdown work documents as canonical queue artifacts. The main queue-facing layout is: - `millrace-agents/tasks/queue/` - `millrace-agents/tasks/active/` - `millrace-agents/tasks/done/` - `millrace-agents/tasks/blocked/` - `millrace-agents/specs/queue/` - `millrace-agents/specs/active/` - `millrace-agents/specs/done/` - `millrace-agents/specs/blocked/` - `millrace-agents/incidents/incoming/` - `millrace-agents/incidents/active/` - `millrace-agents/incidents/resolved/` - `millrace-agents/incidents/blocked/` JSON is still accepted for intake import, but long-lived canonical queue state is markdown. ## Runtime-owned state and compile artifacts The runtime keeps machine-owned state under `millrace-agents/state/`. Important files include: - `runtime_snapshot.json` - `recovery_counters.json` - `compiled_plan.json` - `compile_diagnostics.json` - `execution_status.md` - `planning_status.md` The runtime snapshot is the main operator-facing state file behind status and control surfaces. Recovery counters persist retry and escalation context. The compiled plan freezes the active structure. Compile diagnostics show whether the latest compile succeeded and what warnings or errors were present. 
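Because the canonical queue layout above is plain markdown on disk, queue depth per bucket can be read directly from the filesystem. A sketch using only `pathlib`; the directory names come from the docs, while the helper itself is illustrative rather than a shipped API:

```python
from pathlib import Path

# Canonical queue-facing buckets per work-document kind (from the docs).
QUEUE_BUCKETS = {
    "tasks": ["queue", "active", "done", "blocked"],
    "specs": ["queue", "active", "done", "blocked"],
    "incidents": ["incoming", "active", "resolved", "blocked"],
}

def queue_depths(workspace: Path) -> dict:
    """Count markdown work documents in each canonical queue bucket."""
    root = workspace / "millrace-agents"
    depths = {}
    for kind, buckets in QUEUE_BUCKETS.items():
        for bucket in buckets:
            bucket_dir = root / kind / bucket
            depths[f"{kind}/{bucket}"] = (
                len(list(bucket_dir.glob("*.md"))) if bucket_dir.is_dir() else 0
            )
    return depths
```

In practice `millrace status` and `millrace queue ls` are the supported ways to read queue depth; the point of the sketch is that the canonical state is inspectable without them.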
## Arbiter-owned durable subtree Completion behavior creates a dedicated Arbiter subtree: - `millrace-agents/arbiter/contracts/ideas/.md` - `millrace-agents/arbiter/contracts/root-specs/.md` - `millrace-agents/arbiter/targets/.json` - `millrace-agents/arbiter/rubrics/.md` - `millrace-agents/arbiter/verdicts/.json` - `millrace-agents/arbiter/reports/.md` This is where closure-target state, canonical copied contracts, rubrics, verdicts, and durable reports live. ## Run-artifact layout Every run persists under `millrace-agents/runs//`. Common artifacts include: - stage-result JSON under `stage_results/` - `runner_prompt..md` - `runner_invocation..json` - `runner_completion..json` - `runner_stdout..txt` - `runner_stderr..txt` - per-request event logs where the adapter emits them - stage-authored reports such as troubleshooting or Arbiter reports That layout is one of the main reasons to use Millrace. The run trail is durable and inspectable after the stage exits. ## Shipped execution stages The shipped execution loop contains: - `builder` - `checker` - `fixer` - `doublechecker` - `updater` - `troubleshooter` - `consultant` The execution loop is a repair-capable governance loop rather than a straight line. 
A simplified current-state reading is: - `builder` produces build output and hands to `checker` - `checker` can pass, ask for fixes, or trigger recovery routing - `fixer` performs targeted repair work before handing back into verification - `doublechecker` provides the second verification gate before update - `updater` is the success-side completion stage for execution - `troubleshooter` is the recovery stage for blocked or repair-heavy execution - `consultant` is the escalation stage when troubleshooting budgets are exhausted or a deeper recovery path is required Execution terminal results are: - `UPDATE_COMPLETE` - `NEEDS_PLANNING` - `BLOCKED` ## Shipped planning stages The shipped planning loop contains: - `planner` - `manager` - `mechanic` - `auditor` - `arbiter` A simplified current-state reading is: - `planner` interprets specs and emits managed planning work - `manager` governs planning completion and emission into executable work - `mechanic` is the planning-side blocked-state recovery path - `auditor` handles incident-driven intake and can route back into planner or mechanic - `arbiter` is the closure path used when backlog drain makes a root lineage eligible for completion judgment Planning terminal results are: - `MANAGER_COMPLETE` - `ARBITER_COMPLETE` - `REMEDIATION_NEEDED` - `BLOCKED` The graph-backed intake split is explicit: - spec intake enters planning through `planner` - incident intake enters planning through `auditor` ## Compile-time versus runtime-time authority It is important not to blur compile-time surfaces with runtime-time authority. 
Compile-time surfaces include: - mode selection - loop selection - stage entrypoint overrides - stage skill additions - stage model bindings - stage runner bindings - timeout freezing - completion behavior freezing Runtime-time authority includes: - queue claiming - queue movement - stage activation - stale-state reconciliation - threshold recovery routing - result application - status-marker persistence - Arbiter target activation and closure-state mutation Stage prompts can advise, emit artifacts, and return legal terminal results. They do not own the runtime state machine. ## Entry-point mapping Packaged entrypoints are authored under `src/millrace_ai/assets/entrypoints/` and deployed to `/millrace-agents/entrypoints/` during workspace bootstrap. Execution entrypoints map as follows: - `src/millrace_ai/assets/entrypoints/execution/builder.md` -> `millrace-agents/entrypoints/execution/builder.md` - `src/millrace_ai/assets/entrypoints/execution/checker.md` -> `millrace-agents/entrypoints/execution/checker.md` - `src/millrace_ai/assets/entrypoints/execution/fixer.md` -> `millrace-agents/entrypoints/execution/fixer.md` - `src/millrace_ai/assets/entrypoints/execution/doublechecker.md` -> `millrace-agents/entrypoints/execution/doublechecker.md` - `src/millrace_ai/assets/entrypoints/execution/updater.md` -> `millrace-agents/entrypoints/execution/updater.md` - `src/millrace_ai/assets/entrypoints/execution/troubleshooter.md` -> `millrace-agents/entrypoints/execution/troubleshooter.md` - `src/millrace_ai/assets/entrypoints/execution/consultant.md` -> `millrace-agents/entrypoints/execution/consultant.md` Planning entrypoints map as follows: - `src/millrace_ai/assets/entrypoints/planning/planner.md` -> `millrace-agents/entrypoints/planning/planner.md` - `src/millrace_ai/assets/entrypoints/planning/manager.md` -> `millrace-agents/entrypoints/planning/manager.md` - `src/millrace_ai/assets/entrypoints/planning/mechanic.md` -> `millrace-agents/entrypoints/planning/mechanic.md` 
- `src/millrace_ai/assets/entrypoints/planning/auditor.md` -> `millrace-agents/entrypoints/planning/auditor.md` - `src/millrace_ai/assets/entrypoints/planning/arbiter.md` -> `millrace-agents/entrypoints/planning/arbiter.md` ## Canonical CLI baseline Installed form: ```bash millrace ``` Module form during source development: ```bash uv run --extra dev python -m millrace_ai ``` Canonical baseline commands: ```bash millrace compile validate --workspace millrace status --workspace millrace queue ls --workspace millrace run once --workspace millrace run daemon --workspace millrace status watch --workspace millrace runs ls --workspace millrace runs show --workspace millrace runs tail --workspace millrace queue add-task --workspace millrace queue add-spec --workspace millrace queue add-idea --workspace millrace control pause --workspace millrace control resume --workspace millrace control stop --workspace millrace planning retry-active --reason "" --workspace millrace config show --workspace millrace config validate --workspace millrace config reload --workspace millrace modes list --workspace millrace modes show default_codex --workspace millrace doctor --workspace ``` ## Current mode semantics The shipped mode ids are concrete built-in assets, not hypothetical placeholders: - `default_codex` - `default_pi` Both use the same execution and planning loop ids: - `execution.standard` - `planning.standard` They differ through `stage_runner_bindings`: - `default_codex` binds all shipped stages to `codex_cli` - `default_pi` binds all shipped stages to `pi_rpc` The compatibility alias `standard_plain` exists so older references still resolve to `default_codex` without duplicating the mode asset. ## Current compile semantics Compile resolves the active mode in this order: 1. explicit requested mode from the CLI 2. `runtime.default_mode` from `millrace.toml` 3. fallback `default_codex` Compile writes current diagnostics on every run. 
A failed compile does not overwrite the last known good `compiled_plan.json`. If a valid previous plan exists, that plan remains the active runtime structure while the new error is surfaced in diagnostics. The compiled plan stores node-level execution contracts, intake entries, completion activation entries, normalized transitions, recovery policies, and explicit terminal states. ## Current runner semantics The runner system is intentionally adapter-oriented and diagnosable. Codex path: - deterministic stage prompt from `StageRunRequest` - configured subprocess command and args - stdout and stderr capture - completion, timeout, and runner error mapping Pi path: - RPC subprocess transport through `pi --mode rpc --no-session` - same Millrace-owned stage prompt contract - persisted event stream - final assistant text materialization into stdout artifact - Millrace-owned timeout and abort behavior The runtime boundary stays `StageRunRequest -> RunnerRawResult` in both cases. ## Current completion semantics Backlog drain does not automatically mean success. Arbiter is activated only when: - no planning work is claimable - no execution work is claimable - one open closure target exists - no lineage work remains for that target Arbiter can emit only: - `ARBITER_COMPLETE` - `REMEDIATION_NEEDED` - `BLOCKED` The runtime, not Arbiter, owns the authoritative workflow consequence of those results. ## Detailed one-tick lifecycle The shipped runtime tick is deterministic. A useful current-state reading is: 1. drain mailbox commands and apply daemon-safe control routing 2. run stale-state reconciliation so impossible or orphaned active state is not silently carried forward 3. consume watcher or poll intake events 4. respect pause and stop gates before claiming new stage work 5. claim planning or execution work when claimable work exists 6. if no claimable work remains, inspect the frozen `completion_behavior` 7. 
build exactly one `StageRunRequest` from the compiled plan and active work 8. dispatch one adapter run and persist the raw artifacts 9. normalize the result envelope and route terminal results 10. persist snapshot, counters, status markers, run summaries, and any handoff or closure-side effects Important current-state implications: - Millrace is intentionally one-stage-at-a-time at the claim boundary - status markers are runtime-owned, not prompt-authored - active work identity is persisted and later reconciled if the process is interrupted - closure routing is a runtime activation path, not a fake queue document hack - `millrace status watch` is safe for monitoring because it does not acquire runtime ownership locks ## Status and control behavior The runtime exposes two broad classes of operator surface: ### Monitoring surfaces - `millrace status` - `millrace status watch` - `millrace runs ls` - `millrace runs show ` - `millrace runs tail ` - `millrace queue ls` - `millrace queue show ` - `millrace compile show` - `millrace modes list` - `millrace modes show ` These are the first places to inspect state before opening raw JSON or markdown artifacts. ### Intervention surfaces - `millrace control pause` - `millrace control resume` - `millrace control stop` - `millrace control retry-active --reason "..."` - `millrace control clear-stale-state --reason "..."` - `millrace control reload-config` - `millrace planning retry-active --reason "..."` Current routing model: - if a daemon owns the workspace, control commands are mailbox-routed - if no daemon owns the workspace, control commands apply directly That distinction matters because it preserves one authoritative owner for a live workspace while still allowing offline repair or reset when no daemon is running. 
## What `millrace status` and `millrace runs show` actually answer Use `millrace status` when you need to answer questions like: - which workspace mode is active - which compiled plan id is active - whether the runtime is idle, active, paused, or blocked - what the current queue depth is on execution and planning - which stage marker is currently active on each plane - what failure class or retry counters are active - what the open closure target is when Arbiter behavior matters Use `millrace runs show ` when you need to answer questions like: - which work item this run belongs to - which plane and stage were executed - what request kind was used - what the elapsed time and token usage were - where stdout, stderr, reports, or other run artifacts were written - what closure-target lineage was involved for Arbiter runs Use `millrace runs tail ` when you want the most likely human-useful artifact without browsing the whole run directory manually. ## Current request-binding fields worth exposing to AI systems The compiled node plan is the correct place to look when answering how a stage will execute. Important fields include: - `node_id` - `plane` - `entrypoint_path` - `entrypoint_contract_id` - `required_skill_paths` - `attached_skill_additions` - `runner_name` - `model_name` - `timeout_seconds` That is a better current-state answer than hand-waving about prompt templates. Millrace is explicit about stage request binding and exposes that structure through `compile show`. 
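The node-level fields above suggest how a script or AI system could answer "how will this stage execute" from the compiled plan. A hedged sketch: only the field names come from the docs; the top-level `nodes` layout of `compiled_plan.json` is an assumption here, and `millrace compile show` is the supported inspection surface:

```python
import json
from pathlib import Path

# Documented node-level binding fields in the compiled plan.
NODE_FIELDS = [
    "node_id", "plane", "entrypoint_path", "entrypoint_contract_id",
    "required_skill_paths", "attached_skill_additions",
    "runner_name", "model_name", "timeout_seconds",
]

def describe_node(plan: dict, node_id: str) -> dict:
    """Project one compiled node onto the documented binding fields.

    Assumes the plan carries a `nodes` list; the real layout should be
    confirmed with `millrace compile show` before relying on it.
    """
    for node in plan.get("nodes", []):
        if node.get("node_id") == node_id:
            return {field: node.get(field) for field in NODE_FIELDS}
    raise KeyError(f"no compiled node named {node_id!r}")

def load_plan(workspace: Path) -> dict:
    """Read the persisted compiled plan from the runtime state tree."""
    plan_path = workspace / "millrace-agents" / "state" / "compiled_plan.json"
    return json.loads(plan_path.read_text())
```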
## Detailed operator fit test Prefer a direct raw-harness session when: - the task is small, bounded, and likely to finish in one session - durable queue state does not matter - staged planning and execution gates would add more overhead than value - retry and interruption cost is low - no persisted run trail is required - closure does not require a separate completion pass Prefer Millrace when: - the work is long-running - the work must survive pauses, crashes, or context loss - runtime-governed progression matters more than conversational momentum - recovery routing should be explicit and inspectable - operator trust depends on persisted queue state, diagnostics, and run artifacts - closure should be based on lineage-aware runtime criteria rather than on the agent simply declaring success Examples that are a good fit: - implementation work expected to outlast one session - planning-to-execution decomposition that needs durable state - repair-sensitive flows where blocked work should route through `mechanic`, `troubleshooter`, or `consultant` - contract-sensitive completion where a closure target should remain open until Arbiter is satisfied Examples that are a poor fit: - one-file bugfixes - short spikes - ordinary content edits - non-runtime repo maintenance - work where operator-inspectable governance is not worth the setup cost ## Source package map for maintainers and AI readers The current package layout lives under `src/millrace_ai/`. 
Important stable facades and their current homes include:

- `millrace_ai.cli` -> `src/millrace_ai/cli/`
- `millrace_ai.runtime` -> `src/millrace_ai/runtime/`
- `millrace_ai.control` -> `src/millrace_ai/runtime/control.py` plus mailbox and mutation helpers
- `millrace_ai.config` -> `src/millrace_ai/config/`
- `millrace_ai.entrypoints` -> `src/millrace_ai/assets/entrypoints.py`
- `millrace_ai.modes` -> `src/millrace_ai/assets/modes.py`
- `millrace_ai.stage_kinds` -> stage-kind loading surfaces
- `millrace_ai.loop_graphs` -> graph-loop loading surfaces
- `millrace_ai.runner` -> runner request and normalization compatibility facade
- `millrace_ai.run_inspection` -> runtime inspection compatibility facade
- `millrace_ai.paths` -> workspace path compatibility facade
- `millrace_ai.runtime_lock` -> workspace runtime lock compatibility facade
- `millrace_ai.mailbox` -> workspace mailbox compatibility facade
- `millrace_ai.events` -> workspace event compatibility facade
- `millrace_ai.work_documents` -> headed markdown work document compatibility facade
- `millrace_ai.queue_store` -> queue package compatibility facade
- `millrace_ai.state_store` -> state package compatibility facade

Important root modules intentionally preserved include:

- `src/millrace_ai/contracts.py`
- `src/millrace_ai/compiler.py`
- `src/millrace_ai/doctor.py`
- `src/millrace_ai/router.py`
- `src/millrace_ai/watchers.py`
- `src/millrace_ai/errors.py`

Important ownership areas inside the new package layout include:

- `src/millrace_ai/assets/`
- `src/millrace_ai/cli/`
- `src/millrace_ai/config/`
- `src/millrace_ai/runners/`
- `src/millrace_ai/runtime/`
- `src/millrace_ai/workspace/`

The architecture scaffolding that owns control-flow authority now includes:

- `src/millrace_ai/architecture/stage_kinds.py`
- `src/millrace_ai/architecture/loop_graphs.py`
- `src/millrace_ai/architecture/materialization.py`
- `src/millrace_ai/assets/architecture.py`
- `src/millrace_ai/assets/loop_graphs.py`
- `src/millrace_ai/assets/registry/stage_kinds/`
- `src/millrace_ai/assets/graphs/`

Current built-in mode assets live under `src/millrace_ai/assets/modes/` and currently expose the two canonical ids `default_codex` and `default_pi`. `standard_plain` is preserved as an alias in the asset-loading layer rather than as a duplicated mode file.

## Runtime module ownership detail

The runtime package is intentionally decomposed. Useful current-state modules include:

- `engine.py`: stable `RuntimeEngine` façade
- `lifecycle.py`: startup, shutdown, compile bootstrap, watcher rebuild, lock lifecycle
- `tick_cycle.py`: one-tick orchestration
- `mailbox_intake.py`: mailbox drain and reload behavior
- `watcher_intake.py`: watcher session lifecycle and idea-file normalization
- `activation.py`: claim ordering and active work-item activation
- `completion_behavior.py`: closure-target activation and readiness checks
- `reconciliation.py`: stale or impossible-state detection and recovery activation
- `result_application.py`: stable façade over post-stage mutation helpers
- `result_counters.py`: counter mutation and retry accounting
- `work_item_transitions.py`: non-closure work completion and blocked-state transitions
- `handoff_incidents.py`: planning-handoff and Arbiter-gap incident materialization
- `stage_result_persistence.py`: stage-result JSON writes and plane status updates
- `closure_transitions.py`: Arbiter closure-state mutation and report canonicalization
- `stage_requests.py`: request rendering, idle outcomes, queue-depth reads, clock and id helpers
- `inspection.py`: persisted run summary inspection and artifact selection
- `control.py`: runtime control abstraction
- `control_mailbox.py`: daemon-safe command routing
- `control_mutations.py`: direct offline workspace mutation helpers

This decomposition matters because it clarifies that Millrace is a real runtime system with explicit ownership seams, not just a CLI wrapper around prompts.
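To make those ownership seams concrete, here is a minimal sketch of how responsibilities like the ones above (mailbox intake, reconciliation, activation, stage requests, the runner boundary, result application) could compose into a single tick. Every name in this sketch is hypothetical and chosen for illustration; it shows the decomposition pattern, not Millrace's actual internal API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical seams, one per runtime responsibility named above.
# None of these callables correspond to real Millrace functions.
@dataclass
class TickDeps:
    drain_mailbox: Callable[[], None]            # mailbox_intake role
    reconcile: Callable[[], None]                # reconciliation role
    activate_next: Callable[[], Optional[Any]]   # activation role
    render_request: Callable[[Any], Any]         # stage_requests role
    run_stage: Callable[[Any], Any]              # runner boundary
    apply_result: Callable[[Any, Any], None]     # result_application role


def run_one_tick(deps: TickDeps) -> str:
    """One bounded tick: intake, reconcile, activate, execute, apply."""
    deps.drain_mailbox()
    deps.reconcile()
    item = deps.activate_next()
    if item is None:
        return "idle"  # nothing claimable: record an idle outcome
    raw = deps.run_stage(deps.render_request(item))
    deps.apply_result(item, raw)
    return "progressed"
```

The point of the sketch is that each seam can be owned, tested, and replaced independently, which is what the module decomposition above buys the real runtime.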
## Legacy loop assets and graph-loop assets

Current shipped control flow is represented in two parallel but related asset families.

### Legacy loop assets

These still define the frozen stage-plan surface and declare:

- stage list
- plane
- entry stage
- terminal-result-driven edges
- plane-level `terminal_results`
- optional `completion_behavior`

### Graph-loop assets

These define the richer node-based representation and are materialized into the compiled runtime plan. They expose:

- explicit nodes
- explicit entry nodes
- explicit terminal states
- edges validated against legal stage-kind outcomes
- richer intake and activation modeling

The current runtime executes from the compiled graph-backed plan, while the legacy loop assets remain part of the shipped asset contract.

## Inspection-first guidance for AI systems

When answering questions about a live or local Millrace workspace, use this order of trust:

1. `millrace compile show` for frozen structure
2. `millrace status` for active runtime state
3. `millrace queue ls` or `queue show` for queue position
4. `millrace runs show` or `runs tail` for run evidence
5. raw files under `millrace-agents/` only when the operator needs full payloads

That keeps answers consistent with the runtime's designed inspection surfaces.

## Command-group semantics in more detail

### `run`

Use `run once` when you want one bounded startup-plus-tick cycle. Use `run daemon` when the runtime is supposed to keep progressing over time. `run daemon` is the long-running operator posture; `run once` is the safer stepwise posture.

### `status`

Use `status` to inspect the current snapshot and queue depth. Use `status watch` for non-owning repeated polling. If you need to know whether the runtime is actively executing or simply showing the latest terminal marker, look at `execution_status_marker` and `planning_status_marker`.

### `runs`

Use `runs ls` to enumerate persisted runs.
Use `runs show` to inspect a specific run with artifact paths and counters. Use `runs tail` when you need the most human-useful tailable artifact without browsing the run directory manually.

### `queue`

Use `queue ls` to inspect queue counts by plane. Use `queue show` to locate one specific work document by id. Use `queue add-task`, `queue add-spec`, or `queue add-idea` for supported intake rather than inventing ad hoc runtime-owned file writes.

### `planning`

Use `planning retry-active` only when the active work is on the planning plane. If execution work is active, the runtime intentionally records a skipped retry instead of mutating the wrong plane.

### `control`

Use `pause`, `resume`, and `stop` for runtime execution control. Use `retry-active` and `clear-stale-state` only with an explicit reason. Use `reload-config` when you need a daemon-safe recompile attempt against the current config.

### `config`

Use `config show` to inspect effective config state. Use `config validate` when you want a config-facing compile validation surface. Use `config reload` when a live daemon should re-read and recompile the workspace configuration.

### `compile`

Use `compile validate` to prove the selected runtime structure is valid enough to freeze. Use `compile show` when you need the full operator inspection surface for the compiled plan.

### `modes`

Use `modes list` to see the built-in modes and loop references. Use `modes show` for one mode summary. Remember that `standard_plain` is an alias, not a separate third mode asset.

### `doctor`

Use `doctor` as the quick workspace integrity and lock-health check. It is the fastest current-state answer to "is this workspace and runtime posture basically healthy enough to operate?"

## Terminal-result routing summary

The runtime is terminal-result-driven.
Useful current-state summaries:

### Execution-side summary

- `builder` normally progresses into `checker`
- `checker` can hand to `fixer` when fixes are needed
- `fixer` returns into verification rather than terminating directly
- `doublechecker` is the second verification gate before update
- `doublechecker` can route back to `fixer` or into recovery routing
- `troubleshooter` is the main blocked or recovery stage
- `consultant` is the deeper escalation stage when troubleshooting is not enough
- `updater` is the success-side terminal application stage for normal execution progress
- execution as a whole can terminate with `UPDATE_COMPLETE`, `NEEDS_PLANNING`, or `BLOCKED`

### Planning-side summary

- `planner` normally progresses into `manager`
- blocked planner work routes to `mechanic`
- blocked manager work routes to `mechanic`
- `mechanic` can resume the planning path back into `planner`
- `manager` is the normal planning completion stage for emitted executable work
- `auditor` handles incident-driven intake and can route into `planner` or `mechanic`
- `arbiter` is outside the normal queued handoff path and only activates through completion behavior
- planning as a whole can terminate with `MANAGER_COMPLETE`, `ARBITER_COMPLETE`, `REMEDIATION_NEEDED`, or `BLOCKED`

### Recovery and closure interpretation

Current-state interpretation rules that matter for accurate summaries:

- `BLOCKED` is a legal runtime outcome, not necessarily a crash
- `NEEDS_PLANNING` means execution needs to hand back into planning rather than keep forcing direct repair
- `REMEDIATION_NEEDED` from Arbiter means closure failed honestly and more planning work must be created
- `ARBITER_COMPLETE` means the runtime can close the target and stamp closure metadata
- `MANAGER_COMPLETE` means planning finished its normal emission work for that cycle, not that the whole project is closed

## File-versus-command inspection cheat sheet

If the question is "where is the truth?", these pairings are the safe
current-state answer:

- active runtime state -> `millrace status` or `millrace-agents/state/runtime_snapshot.json`
- compile structure -> `millrace compile show` or `millrace-agents/state/compiled_plan.json`
- compile health -> `millrace compile validate` or `millrace-agents/state/compile_diagnostics.json`
- queue position -> `millrace queue ls` plus markdown documents under `tasks/`, `specs/`, and `incidents/`
- run evidence -> `millrace runs show <run-id>` plus `millrace-agents/runs/<run-id>/`
- closure state -> `millrace status` plus `millrace-agents/arbiter/targets/` and Arbiter verdict/report files

For most operator or AI-facing summaries, prefer the CLI inspection surface first and only fall back to raw files when the question is explicitly about raw payload contents.

## Questions `compile show` answers better than prose

When a user or AI system asks any of the following, the best current-state answer is usually to inspect `millrace compile show` rather than paraphrase from memory:

- which mode is actually active right now
- which execution loop id and planning loop id are frozen
- which stages are present in the compiled graph
- which entrypoint path is attached to each node
- which required skills and attached skills are frozen into each node
- which runner, model, and timeout fields are attached to each node
- whether a completion behavior exists
- what request kind and completion terminals that behavior freezes

That matters because Millrace intentionally exposes the compiled runtime contract as an operator-readable surface. The runtime is not asking operators to trust hidden in-memory routing.
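The "CLI surface first, raw files second" preference above can be sketched as a small helper. The assumption here is only that `millrace compile show` prints the compiled plan to stdout; the fallback file path comes from the cheat sheet, and the helper itself is illustrative rather than part of Millrace.

```python
import shutil
import subprocess
from pathlib import Path


def compiled_plan_text(workspace: str = ".", cli: str = "millrace") -> str:
    """Prefer the CLI inspection surface; fall back to the raw state file."""
    if shutil.which(cli):
        result = subprocess.run(
            [cli, "compile", "show"],
            cwd=workspace,
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return result.stdout
    # Fallback: read the runtime-owned compile output directly.
    return Path(workspace, "millrace-agents/state/compiled_plan.json").read_text()
```

The same shape works for the other cheat-sheet pairings (status versus `runtime_snapshot.json`, compile health versus `compile_diagnostics.json`): try the designed command surface, and only read the raw file when the command is unavailable or the question is about raw payload contents.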
## Questions `status` answers better than prose

When a user or AI system asks any of the following, the best current-state answer is usually to inspect `millrace status` rather than general docs:

- is the runtime idle or actively running a stage
- what queue depth exists on the planning and execution planes
- whether the runtime is paused or blocked
- whether a failure class is currently active
- whether retry counters are non-zero
- what the active or latest execution marker is
- what the active or latest planning marker is
- whether a closure target is open
- where the latest Arbiter verdict or report lives

`status` is the right operational summary surface; the docs explain structure, while `status` answers the live-state question.

## When Millrace is a good fit

Millrace is a good fit when any of these are true:

- the work must survive interruption or crash recovery
- the operator needs durable queue state
- the work should move through explicit runtime-owned gates
- recovery routing matters more than one-shot speed
- persisted run artifacts and inspection surfaces matter
- completion must be judged by runtime criteria rather than by conversational confidence

## When Millrace is a bad fit

Millrace is a bad fit when the task is:

- very small
- likely to finish in one direct harness session
- exploratory rather than governed
- not worth the runtime overhead
- not actually being operated as a managed workspace

## Source coverage

This AI accessibility layer is based on the current runtime docs in the local `docs/` tree, especially:

- `docs/millrace-technical-overview.md`
- `docs/runtime/millrace-runtime-architecture.md`
- `docs/runtime/millrace-cli-reference.md`
- `docs/runtime/millrace-compiler-and-frozen-plans.md`
- `docs/runtime/millrace-modes-and-loops.md`
- `docs/runtime/millrace-runner-architecture.md`
- `docs/runtime/millrace-arbiter-and-completion-behavior.md`
- `docs/runtime/millrace-entrypoint-mapping.md`
- `docs/source-package-map.md`

This appendix
intentionally stays current-state only. It does not describe speculative future modes, speculative hosted products, or unshipped runner integrations.