Executable Roadmap

This roadmap is staged around testable increments. Each milestone should produce a working regression surface in the Web app, not only library code.

Milestone 0: Project Spine and Governance

Target: 1 week

Goal: create the repo skeleton, rules, CI baseline, and architectural fitness checks before agent behavior grows.

Tasks:

Create monorepo structure.
Add AGENTS.md, rules/mainline.md, and mainline-guardian skill.
Add package boundaries and import rules.
Add schema package with initial event types.
Add CI workflow skeleton.
Add PR template.
Add ADR template.
Add Web app shell with empty session list.

Acceptance criteria:

CI runs lint, typecheck, unit tests, and architecture checks.
A fake session fixture can be rendered in Web.
Any cross-boundary import violation fails CI.
Every PR must reference a roadmap item.
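
The import-rule check above can be sketched as a pure function over import edges. This is a minimal sketch, not the planned implementation: the package names (schema, core, tools, web) and the allowed-import table are hypothetical, since the roadmap does not name the packages, and a real check would derive the edges from source files.

```typescript
// Hypothetical package names; the roadmap does not fix the package layout.
type PackageName = "schema" | "core" | "tools" | "web";

// Each package lists the packages it is allowed to import from (assumed rules).
const allowedImports: Record<PackageName, readonly PackageName[]> = {
  schema: [],
  core: ["schema"],
  tools: ["schema", "core"],
  web: ["schema", "core"],
};

interface ImportEdge {
  from: PackageName; // package that contains the importing file
  to: PackageName; // package being imported
  file: string; // importing file, used in the error message
}

// Returns every edge that crosses a boundary the rules do not allow.
function findBoundaryViolations(edges: readonly ImportEdge[]): ImportEdge[] {
  return edges.filter(
    (edge) => edge.from !== edge.to && !allowedImports[edge.from].includes(edge.to),
  );
}

const sampleEdges: ImportEdge[] = [
  { from: "core", to: "schema", file: "core/src/session.ts" },
  { from: "schema", to: "web", file: "schema/src/events.ts" },
];

const violations = findBoundaryViolations(sampleEdges);
for (const violation of violations) {
  console.log(`${violation.file}: ${violation.from} must not import ${violation.to}`);
}
```

A CI step would exit non-zero whenever the returned list is non-empty, which is what makes a cross-boundary import fail the build.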

Milestone 1: Event-Sourced Session Core

Target: 2 weeks

Goal: build a minimal session engine without tools.

Tasks:

Implement append-only JSONL event log.
Implement SQLite index for sessions and turns.
Implement SessionEngine.createSession.
Implement SessionEngine.runTurn with fake model provider.
Implement cancellation.
Implement session replay.
Build Web event timeline and transcript viewer.

Acceptance criteria:

A user message produces a deterministic fake assistant response.
Session replay reconstructs the exact transcript.
Web can display live and replayed events.
Golden transcript tests pass.
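
To illustrate why replay can reconstruct the exact transcript, here is a minimal in-memory sketch of an append-only JSONL log and a replay fold. The event type names (session.created, user.message, assistant.message), the EventLog class, and the echoing fake model are assumptions for illustration; the real schema package and SessionEngine may look different.

```typescript
// Hypothetical event shapes; the real schema package may differ.
type SessionEvent =
  | { type: "session.created"; sessionId: string }
  | { type: "user.message"; sessionId: string; text: string }
  | { type: "assistant.message"; sessionId: string; text: string };

interface TranscriptEntry {
  role: "user" | "assistant";
  text: string;
}

// Append-only log: one JSON object per line, as in a JSONL file.
class EventLog {
  private readonly lines: string[] = [];

  append(entry: SessionEvent): void {
    this.lines.push(JSON.stringify(entry));
  }

  // Serialized form, ready to be written to disk.
  toJsonl(): string {
    return this.lines.join("\n");
  }
}

// Replay is a pure fold over the log, so the same log always yields the same transcript.
function replay(jsonl: string): TranscriptEntry[] {
  const transcript: TranscriptEntry[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const entry = JSON.parse(line) as SessionEvent;
    if (entry.type === "user.message") {
      transcript.push({ role: "user", text: entry.text });
    } else if (entry.type === "assistant.message") {
      transcript.push({ role: "assistant", text: entry.text });
    }
  }
  return transcript;
}

// Deterministic fake model: echoes its input.
function fakeModel(userText: string): string {
  return `echo: ${userText}`;
}

const sessionLog = new EventLog();
sessionLog.append({ type: "session.created", sessionId: "s1" });
sessionLog.append({ type: "user.message", sessionId: "s1", text: "hello" });
sessionLog.append({ type: "assistant.message", sessionId: "s1", text: fakeModel("hello") });

const replayed = replay(sessionLog.toJsonl());
```

A golden transcript test can then compare the replayed transcript against a checked-in fixture.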

Milestone 2: Model Gateway

Target: 2 weeks

Goal: connect one real model provider behind the normalized stream interface.

Tasks:

Define ModelProvider interface.
Implement one provider first, preferably the OpenAI Responses API or an OpenAI-compatible endpoint.
Normalize text delta, tool call request, usage, and errors.
Add provider fixture tests with recorded responses.
Add token and cost accounting fields.
Add Web model request inspector.

Acceptance criteria:

A real model can answer in a session.
Provider-specific raw payloads are not used by core.
Fixture tests do not require network.
Web shows model latency, usage, and normalized stream events.
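
One possible shape for the normalized stream is sketched below. The event union, its field names, the fake provider, and the summarize helper are all assumptions made for illustration; the point is only that core consumes a provider-neutral event type and that fixture tests can run without network access.

```typescript
// Hypothetical normalized event union; field names are illustrative.
type ModelStreamEvent =
  | { kind: "text_delta"; text: string }
  | { kind: "tool_call"; callId: string; toolName: string; argsJson: string }
  | { kind: "usage"; inputTokens: number; outputTokens: number }
  | { kind: "error"; code: string; message: string };

// Core depends only on this interface, never on a provider SDK type.
interface ModelProvider {
  readonly id: string;
  stream(prompt: string): AsyncIterable<ModelStreamEvent>;
}

// Fake provider for fixture tests: no network, fixed output.
const fakeProvider: ModelProvider = {
  id: "fake",
  async *stream(prompt: string): AsyncGenerator<ModelStreamEvent> {
    yield { kind: "text_delta", text: "You said: " };
    yield { kind: "text_delta", text: prompt };
    yield { kind: "usage", inputTokens: 3, outputTokens: 4 };
  },
};

interface TurnSummary {
  text: string;
  inputTokens: number;
  outputTokens: number;
  toolCalls: number;
  errors: string[];
}

// Folds normalized events into the values an inspector might display.
function summarize(events: readonly ModelStreamEvent[]): TurnSummary {
  const summary: TurnSummary = { text: "", inputTokens: 0, outputTokens: 0, toolCalls: 0, errors: [] };
  for (const item of events) {
    switch (item.kind) {
      case "text_delta":
        summary.text += item.text;
        break;
      case "tool_call":
        summary.toolCalls += 1;
        break;
      case "usage":
        summary.inputTokens += item.inputTokens;
        summary.outputTokens += item.outputTokens;
        break;
      case "error":
        summary.errors.push(`${item.code}: ${item.message}`);
        break;
    }
  }
  return summary;
}

// Drains a provider stream into an array so the reducer above can be reused.
async function collect(stream: AsyncIterable<ModelStreamEvent>): Promise<ModelStreamEvent[]> {
  const out: ModelStreamEvent[] = [];
  for await (const item of stream) out.push(item);
  return out;
}
```

A recorded provider response would be converted to this union once, inside the provider adapter, so that fixture tests only need to replay arrays of normalized events.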

Milestone 3: Local Tools and Permission Engine

Target: 2 weeks

Goal: give the agent safe local coding capabilities.

Tasks:

Implement PermissionEngine.
Implement local tools:
- read_file
- list_files
- search_text
- shell
- apply_patch
- git_diff
Add command risk classifier.
Add output budget and truncation.
Add Web approval UI.
Add Web diff viewer.

Acceptance criteria:

Every tool call has a permission event.
Read-only tools can be auto-approved by policy.
Shell and patch tools require approval by default.
Tool output is bounded and visible in Web.
A regression scenario can modify a fixture repo and show the resulting diff.
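
The permission and output-budget criteria above could be approached as in the sketch below. The policy shape, the event name permission.decided, and the decision to treat git_diff as read-only are assumptions; only the tool names come from this roadmap.

```typescript
type ToolName = "read_file" | "list_files" | "search_text" | "shell" | "apply_patch" | "git_diff";
type PermissionDecision = "auto_approved" | "needs_approval";

// Tools that cannot change the workspace. Treating git_diff as read-only is an assumption.
const readOnlyTools: ReadonlySet<ToolName> = new Set<ToolName>([
  "read_file",
  "list_files",
  "search_text",
  "git_diff",
]);

// Hypothetical policy and event shapes.
interface PermissionPolicy {
  autoApproveReadOnly: boolean;
}

interface PermissionEvent {
  type: "permission.decided";
  tool: ToolName;
  decision: PermissionDecision;
}

class PermissionEngine {
  readonly events: PermissionEvent[] = [];
  private readonly policy: PermissionPolicy;

  constructor(policy: PermissionPolicy) {
    this.policy = policy;
  }

  // Every call records an event, including auto-approved ones.
  decide(tool: ToolName): PermissionDecision {
    const decision: PermissionDecision =
      this.policy.autoApproveReadOnly && readOnlyTools.has(tool)
        ? "auto_approved"
        : "needs_approval";
    this.events.push({ type: "permission.decided", tool, decision });
    return decision;
  }
}

// Bounds tool output to a character budget and marks where it was cut.
function truncateOutput(output: string, maxChars: number): string {
  if (output.length <= maxChars) return output;
  const omitted = output.length - maxChars;
  return `${output.slice(0, maxChars)}\n[truncated ${omitted} chars]`;
}

const engine = new PermissionEngine({ autoApproveReadOnly: true });
console.log(engine.decide("read_file")); // auto_approved
console.log(engine.decide("shell")); // needs_approval
```

Recording the event inside decide, rather than at each call site, is one way to guarantee that no tool call is missing its permission event.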

Milestone 4: Context Builder, Instructions, and Compaction

Target: 2 weeks

Goal: make sessions aware of project rules without letting context become unbounded.

Tasks:

Implement instruction discovery:
- global file
- project root file
- directory-scoped file
Support AGENTS.md as the default project instruction file.
Add configurable fallback names.
Add context budget accounting.
Add deterministic compaction.
Add Web context inspector.

Acceptance criteria:

Nested instructions load in documented order.
Context budget is visible before model call.
Compaction creates a replayable session.compacted event.
Instructions survive compaction.
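
A sketch of instruction discovery and deterministic compaction follows. It assumes POSIX-style paths, a general-to-specific load order (global, then project root, then each directory down to the working directory), and a compaction rule that drops the oldest messages first. None of these choices is fixed by the roadmap, and the function names and ContextItem shape are hypothetical.

```typescript
// Returns candidate instruction file paths from least to most specific (assumed order).
function instructionCandidates(
  globalDir: string,
  projectRoot: string,
  workingDir: string,
  fileName = "AGENTS.md",
): string[] {
  if (workingDir !== projectRoot && !workingDir.startsWith(projectRoot + "/")) {
    throw new Error("workingDir must be inside projectRoot");
  }
  const segments = workingDir
    .slice(projectRoot.length)
    .split("/")
    .filter((segment) => segment !== "");
  const candidates = [`${globalDir}/${fileName}`, `${projectRoot}/${fileName}`];
  let current = projectRoot;
  for (const segment of segments) {
    current = `${current}/${segment}`;
    candidates.push(`${current}/${fileName}`);
  }
  return candidates;
}

interface ContextItem {
  kind: "instruction" | "message";
  text: string;
  tokens: number;
}

// Deterministic compaction: walk oldest to newest and drop messages while over budget.
// Instructions are never dropped, so they survive compaction.
function compact(items: readonly ContextItem[], budget: number): ContextItem[] {
  let total = items.reduce((sum, item) => sum + item.tokens, 0);
  const kept: ContextItem[] = [];
  for (const item of items) {
    if (item.kind === "message" && total > budget) {
      total -= item.tokens; // drop this message
      continue;
    }
    kept.push(item);
  }
  return kept;
}

const candidatePaths = instructionCandidates("/home/u/.config/agent", "/repo", "/repo/packages/core");
console.log(candidatePaths.join("\n"));
```

Because compact depends only on its inputs, replaying a session.compacted event with the same items and budget yields the same context.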

Milestone 5: Skills

Target: 2 weeks

Goal: package repeatable workflows without bloating every prompt.

Tasks:

Implement skill discovery.
Parse skill metadata.
Load only metadata at startup.
Lazy-load full SKILL.md on invocation.
Add allowed_tools enforcement.
Add /skill list and /skill run.
Add Web skill inspector.

Acceptance criteria:

Skills cannot use tools outside their declared policy.
Skill load events appear in transcript.
A regression skill can run a documented workflow.
Adding a new skill does not change base context size except metadata.
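
The metadata-first loading and allowed_tools enforcement described above might look like the following sketch. The SkillRegistry class, its method names, and the injected loadBody callback (standing in for reading SKILL.md from disk) are hypothetical.

```typescript
interface SkillMetadata {
  name: string;
  description: string;
  allowedTools: readonly string[];
}

class SkillRegistry {
  private readonly metadata = new Map<string, SkillMetadata>();
  private readonly bodies = new Map<string, string>();
  private readonly loadBody: (name: string) => string;
  bodyLoads = 0; // counts how often a full skill body was read

  // Only metadata is registered at startup; loadBody stands in for reading SKILL.md.
  constructor(skills: readonly SkillMetadata[], loadBody: (name: string) => string) {
    this.loadBody = loadBody;
    for (const skill of skills) this.metadata.set(skill.name, skill);
  }

  list(): SkillMetadata[] {
    return Array.from(this.metadata.values());
  }

  // Loads the full body on first invocation and caches it afterwards.
  body(name: string): string {
    const cached = this.bodies.get(name);
    if (cached !== undefined) return cached;
    if (!this.metadata.has(name)) throw new Error(`unknown skill: ${name}`);
    const text = this.loadBody(name);
    this.bodyLoads += 1;
    this.bodies.set(name, text);
    return text;
  }

  // Throws when a skill tries to use a tool outside its declared policy.
  assertToolAllowed(skillName: string, tool: string): void {
    const skill = this.metadata.get(skillName);
    if (skill === undefined) throw new Error(`unknown skill: ${skillName}`);
    if (!skill.allowedTools.includes(tool)) {
      throw new Error(`skill ${skillName} may not use tool ${tool}`);
    }
  }
}

const skills = new SkillRegistry(
  [{ name: "regression", description: "Run the regression workflow", allowedTools: ["read_file", "shell"] }],
  (skillName) => `# ${skillName}\nDocumented workflow steps go here.`,
);
```

With this split, adding a skill grows the base context only by its metadata entry; the body is paid for when the skill is actually run.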

Milestone 6: MCP Stdio

Target: 2 weeks

Goal: connect external tools through MCP without bypassing permissions.

Tasks:

Implement MCP stdio transport.
Implement server lifecycle.
Implement initialize.
Implement tools/list and tools/call.
Implement tool namespacing.
Implement include/exclude tool config.
Add timeout, cancellation, and health state.
Add Web MCP server panel.

Acceptance criteria:

A local MCP server can expose a tool.
MCP tool calls go through PermissionEngine.
Tool name conflicts are resolved by namespace.
Server crash is visible and recoverable.
MCP contract tests pass against a fake MCP server.
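
Tool namespacing and include/exclude filtering can be sketched as below. The mcp__<server>__<tool> naming scheme and the McpServerConfig shape are assumptions. The tools/call request follows the JSON-RPC 2.0 envelope with name and arguments params as described in the MCP specification; note that the server receives the original tool name, not the namespaced one.

```typescript
// Hypothetical per-server config; the roadmap only says include/exclude is configurable.
interface McpServerConfig {
  include?: readonly string[];
  exclude?: readonly string[];
}

// Assumed naming scheme: mcp__<server>__<tool>.
function namespacedToolName(server: string, tool: string): string {
  return `mcp__${server}__${tool}`;
}

// Applies include (allow-list) first, then exclude, then namespaces what is left.
function exposedTools(
  server: string,
  advertised: readonly string[],
  config: McpServerConfig,
): string[] {
  const include = config.include;
  const exclude = config.exclude ?? [];
  return advertised
    .filter((tool) => include === undefined || include.includes(tool))
    .filter((tool) => !exclude.includes(tool))
    .map((tool) => namespacedToolName(server, tool));
}

// Builds the JSON-RPC request sent to the server, using the un-namespaced tool name.
function toolsCallRequest(id: number, tool: string, args: Record<string, unknown>) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: tool, arguments: args },
  };
}

const githubTools = exposedTools("github", ["search", "delete_repo"], { exclude: ["delete_repo"] });
const localTools = exposedTools("local", ["search"], {});
console.log(githubTools, localTools);
```

Two servers that both advertise search end up with distinct names, which is how the conflict criterion is met in this sketch.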

Milestone 7: MCP Resources and Prompts

Target: 1-2 weeks

Goal: support MCP context and reusable prompt templates.

Tasks:

Implement resources/list.
Implement resources/read.
Implement prompts/list.
Implement prompts/get.
Add resource selection model.
Add prompt invocation model.
Add Web resource and prompt browser.

Acceptance criteria:

Resource content can be explicitly included in context.
Prompt templates can be invoked by user command.
MCP resources are not automatically dumped into model context.
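
One way to keep resources out of context unless explicitly chosen is a selection object that gates reads, sketched here. The ResourceSelection class and the injected read callback (standing in for a resources/read call) are hypothetical.

```typescript
// Minimal view of an entry returned by resources/list.
interface ListedResource {
  uri: string;
  name: string;
}

interface IncludedResource {
  uri: string;
  text: string;
}

class ResourceSelection {
  private readonly selected = new Set<string>();

  select(uri: string): void {
    this.selected.add(uri);
  }

  deselect(uri: string): void {
    this.selected.delete(uri);
  }

  // Reads only the selected resources; unselected ones are never fetched.
  resolve(listed: readonly ListedResource[], read: (uri: string) => string): IncludedResource[] {
    return listed
      .filter((resource) => this.selected.has(resource.uri))
      .map((resource) => ({ uri: resource.uri, text: read(resource.uri) }));
  }
}

const listedResources: ListedResource[] = [
  { uri: "file:///repo/README.md", name: "README" },
  { uri: "file:///repo/CHANGELOG.md", name: "CHANGELOG" },
];

const readUris: string[] = []; // records which resources were actually read
const resourceSelection = new ResourceSelection();
resourceSelection.select("file:///repo/README.md");

const includedResources = resourceSelection.resolve(listedResources, (uri) => {
  readUris.push(uri);
  return `contents of ${uri}`;
});
```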

Milestone 8: ACP Server

Target: 2 weeks

Goal: expose the core to ACP-compatible clients.

Tasks:

Implement JSON-RPC server.
Implement initialize.
Implement session/new.
Implement session/load.
Implement session/prompt.
Implement session/cancel.
Translate core events into ACP updates.
Forward permission requests.
Add ACP protocol fixture tests.

Acceptance criteria:

ACP client can start a session and receive streamed updates.
ACP session replay matches Web replay.
ACP adapter owns no agent logic.
ACP errors are typed and tested.
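
The sketch below shows a thin JSON-RPC dispatcher that delegates to the core and returns typed errors. The error codes are the standard JSON-RPC 2.0 ones. The CoreSessions interface, the simplified params, and the result shapes are assumptions; the exact ACP message shapes, and whether each method is a request or a notification, should be taken from the ACP specification.

```typescript
// Standard JSON-RPC 2.0 error codes.
const JSON_RPC_ERRORS = {
  methodNotFound: -32601,
  invalidParams: -32602,
} as const;

type RequestId = number | string;

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: RequestId;
  method: string;
  params?: unknown;
}

type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: RequestId; result: unknown }
  | { jsonrpc: "2.0"; id: RequestId; error: { code: number; message: string } };

// Hypothetical slice of the core that the adapter is allowed to call.
interface CoreSessions {
  create(): string;
  load(sessionId: string): boolean;
}

function errorResponse(id: RequestId, code: number, message: string): JsonRpcResponse {
  return { jsonrpc: "2.0", id, error: { code, message } };
}

// The adapter only validates and routes; all session behavior lives in the core.
function handleRequest(core: CoreSessions, request: JsonRpcRequest): JsonRpcResponse {
  const { id } = request;
  switch (request.method) {
    case "session/new":
      return { jsonrpc: "2.0", id, result: { sessionId: core.create() } };
    case "session/load": {
      const params = request.params as { sessionId?: unknown } | undefined;
      const sessionId = params?.sessionId;
      if (typeof sessionId !== "string") {
        return errorResponse(id, JSON_RPC_ERRORS.invalidParams, "sessionId must be a string");
      }
      return { jsonrpc: "2.0", id, result: { loaded: core.load(sessionId) } };
    }
    default:
      return errorResponse(id, JSON_RPC_ERRORS.methodNotFound, `unknown method: ${request.method}`);
  }
}

// Fake core for protocol fixture tests.
const fakeCore: CoreSessions = {
  create: () => "s1",
  load: (sessionId) => sessionId === "s1",
};

const newSessionResponse = handleRequest(fakeCore, { jsonrpc: "2.0", id: 1, method: "session/new" });
const unknownMethodResponse = handleRequest(fakeCore, { jsonrpc: "2.0", id: 2, method: "session/unknown" });
```

Keeping the handler free of session state makes the "adapter owns no agent logic" criterion checkable: swapping fakeCore for the real core should not require touching the dispatcher.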

Milestone 9: Hardening and Beta

Target: 3-4 weeks

Goal: make the tool reliable enough for real projects.

Tasks:

Add sandbox profile support.
Add secret redaction.
Add audit log export.
Add large output summarization.
Add retry/backoff for providers.
Add failure taxonomy.
Add benchmark scenarios.
Add release packaging.

Acceptance criteria:

Regression suite covers common coding tasks.
Permission bypass tests pass.
A large repo fixture stays within the configured context budget.
A release can be installed and used on a clean machine.
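
Two of the hardening tasks, provider retry/backoff and secret redaction, are small enough to sketch directly. The function names, the capped exponential schedule without jitter, and the [REDACTED] marker are assumptions; a production version would likely add jitter and pattern-based detection.

```typescript
// Capped exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... up to maxMs.
// attempt is zero-based. No jitter, so the schedule is deterministic in tests.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  if (attempt < 0) throw new Error("attempt must be non-negative");
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retries a synchronous operation, reporting each planned delay to the caller.
// The caller decides how to wait, which keeps this function easy to test.
function retryWithBackoff<T>(
  operation: () => T,
  maxAttempts: number,
  onDelay: (delayMs: number) => void,
): T {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    try {
      return operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) onDelay(backoffDelayMs(attempt, 100, 1000));
    }
  }
  throw lastError;
}

// Replaces every occurrence of each known secret value with a marker.
function redactSecrets(text: string, secrets: readonly string[]): string {
  let redacted = text;
  for (const secret of secrets) {
    if (secret === "") continue; // an empty secret would match everywhere
    redacted = redacted.split(secret).join("[REDACTED]");
  }
  return redacted;
}

console.log(redactSecrets("Authorization: Bearer abc123", ["abc123"]));
// prints "Authorization: Bearer [REDACTED]"
```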

Roadmap Rule

No milestone should be considered complete until:

A Web regression scenario exists for it.
Unit tests cover its core behavior.
Contract tests cover any external protocol behavior.
Documentation is updated.
The mainline-guardian review has no blocking findings.