Executable Roadmap
This roadmap is intentionally staged around testable increments. Each milestone should produce a working Web regression surface, not only library code.
Milestone 0: Project Spine and Governance
Target: 1 week
Goal: create the repo skeleton, rules, CI baseline, and architectural fitness checks before agent behavior grows.
Tasks:
- Create monorepo structure.
- Add AGENTS.md, rules/mainline.md, and the mainline-guardian skill.
- Add package boundaries and import rules.
- Add schema package with initial event types.
- Add CI workflow skeleton.
- Add PR template.
- Add ADR template.
- Add Web app shell with empty session list.
Acceptance criteria:
- CI runs lint, typecheck, unit tests, and architecture checks.
- A fake session fixture can be rendered in Web.
- Any cross-boundary import violation fails CI.
- Every PR must reference a roadmap item.
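One way to make the cross-boundary rule enforceable in CI is a small fitness check. It scans import specifiers and compares them against an allowed-dependency map. This is a minimal sketch under stated assumptions:

- The package names (schema, core, acp, web), the @agent/ workspace scope, and the findViolations helper are all hypothetical.
- The regex only sees scoped imports, so relative paths that escape a package would need a separate check.

```typescript
// Hypothetical package graph: each package lists the packages it may import.
const ALLOWED_DEPS: Record<string, readonly string[]> = {
  schema: [],
  core: ["schema"],
  acp: ["core", "schema"],
  web: ["core", "schema"],
};

// Hypothetical workspace scope used by internal package imports.
const SCOPE = "@agent/";

interface SourceFile {
  pkg: string; // package that owns the file
  path: string;
  source: string;
}

interface Violation {
  path: string;
  from: string;
  to: string;
}

// Captures the specifier of `import ... from "x"`, `export ... from "x"`, and `import "x"`.
const IMPORT_RE = /(?:import|export)\s+(?:[^'"]*?\sfrom\s+)?["']([^"']+)["']/g;

function findViolations(files: SourceFile[]): Violation[] {
  const violations: Violation[] = [];
  for (const file of files) {
    const allowed = ALLOWED_DEPS[file.pkg] ?? [];
    for (const match of file.source.matchAll(IMPORT_RE)) {
      const specifier = match[1];
      // Relative and third-party imports are out of scope for this check.
      if (!specifier.startsWith(SCOPE)) continue;
      const target = specifier.slice(SCOPE.length).split("/")[0];
      if (target !== file.pkg && !allowed.includes(target)) {
        violations.push({ path: file.path, from: file.pkg, to: target });
      }
    }
  }
  return violations;
}
```

In CI, the file list would come from walking each package's source directory, and any non-empty result would fail the job.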
Milestone 1: Event-Sourced Session Core
Target: 2 weeks
Goal: build a minimal session engine without tools.
Tasks:
- Implement append-only JSONL event log.
- Implement SQLite index for sessions and turns.
- Implement SessionEngine.createSession.
- Implement SessionEngine.runTurn with a fake model provider.
- Implement cancellation.
- Implement session replay.
- Build Web event timeline and transcript viewer.
Acceptance criteria:
- A user message produces a deterministic fake assistant response.
- Session replay reconstructs the exact transcript.
- Web can display live and replayed events.
- Golden transcript tests pass.
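The replay criterion depends on the log being the only source of truth. Sequence-numbered events are appended as JSON lines, and the transcript is derived from those lines and nothing else. The sketch below keeps the JSONL in memory rather than on disk. The event type names, the WithoutSeq helper, and the echo-style fake model are hypothetical stand-ins for the real schema package and SessionEngine.

```typescript
// Hypothetical event shapes; the real types belong in the schema package.
type SessionEvent =
  | { seq: number; type: "session.created"; sessionId: string }
  | { seq: number; type: "user.message"; text: string }
  | { seq: number; type: "assistant.message"; text: string };

// Distributes Omit over the union so each variant keeps its own fields.
type WithoutSeq<T> = T extends unknown ? Omit<T, "seq"> : never;

type TranscriptEntry = { role: "user" | "assistant"; text: string };

// Append-only log held as JSONL lines in memory; a real log would append to a file.
class EventLog {
  private readonly lines: string[] = [];

  append(event: WithoutSeq<SessionEvent>): SessionEvent {
    const full = { ...event, seq: this.lines.length } as SessionEvent;
    this.lines.push(JSON.stringify(full));
    return full;
  }

  toJsonl(): string {
    return this.lines.map((line) => line + "\n").join("");
  }
}

// Deterministic fake model: the same input always yields the same output.
function fakeModel(userText: string): string {
  return `echo: ${userText}`;
}

function runTurn(log: EventLog, userText: string): void {
  log.append({ type: "user.message", text: userText });
  log.append({ type: "assistant.message", text: fakeModel(userText) });
}

// Rebuilds the transcript from the log alone, in sequence order.
function replay(jsonl: string): TranscriptEntry[] {
  const events = jsonl
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as SessionEvent)
    .sort((a, b) => a.seq - b.seq);
  const transcript: TranscriptEntry[] = [];
  for (const event of events) {
    if (event.type === "user.message") transcript.push({ role: "user", text: event.text });
    if (event.type === "assistant.message") transcript.push({ role: "assistant", text: event.text });
  }
  return transcript;
}
```

A golden transcript test then reduces to comparing replay(log.toJsonl()) against a stored expected transcript.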
Milestone 2: Model Gateway
Target: 2 weeks
Goal: connect one real model provider behind the normalized stream interface.
Tasks:
- Define the ModelProvider interface.
- Implement one provider first, preferably the OpenAI Responses API or an OpenAI-compatible API.
- Normalize text delta, tool call request, usage, and errors.
- Add provider fixture tests with recorded responses.
- Add token and cost accounting fields.
- Add Web model request inspector.
Acceptance criteria:
- A real model can answer in a session.
- Provider-specific raw payloads are not used by core.
- Fixture tests do not require network.
- Web shows model latency, usage, and normalized stream events.
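Here is a minimal sketch of what "normalized stream interface" could mean in practice. The provider translates its raw payload into a small union of events, and core code consumes only that union. Several names are hypothetical:

- The event names, the ModelProvider shape, and the collectText consumer are illustrative assumptions.
- The RawChunk format merely stands in for a recorded provider fixture; it is not any real provider's wire format.

```typescript
// Normalized events that core consumes; names are illustrative assumptions.
type ModelStreamEvent =
  | { kind: "text_delta"; text: string }
  | { kind: "tool_call"; id: string; name: string; argumentsJson: string }
  | { kind: "usage"; inputTokens: number; outputTokens: number }
  | { kind: "error"; message: string; retryable: boolean };

interface ModelRequest {
  model: string;
  messages: { role: "user" | "assistant"; text: string }[];
}

interface ModelProvider {
  readonly id: string;
  stream(request: ModelRequest): AsyncIterable<ModelStreamEvent>;
}

// Hypothetical raw chunk format standing in for a recorded fixture.
type RawChunk =
  | { t: "delta"; content: string }
  | { t: "done"; usage: { prompt: number; completion: number } };

// Fixture-backed provider: replays recorded chunks, so tests need no network.
class FixtureProvider implements ModelProvider {
  readonly id = "fixture";
  private readonly recorded: RawChunk[];

  constructor(recorded: RawChunk[]) {
    this.recorded = recorded;
  }

  async *stream(_request: ModelRequest): AsyncIterable<ModelStreamEvent> {
    for (const chunk of this.recorded) {
      // Raw-to-normalized translation stays inside the provider.
      if (chunk.t === "delta") {
        yield { kind: "text_delta", text: chunk.content };
      } else {
        yield { kind: "usage", inputTokens: chunk.usage.prompt, outputTokens: chunk.usage.completion };
      }
    }
  }
}

// Core-side consumer: it never sees provider-specific payloads.
async function collectText(provider: ModelProvider, request: ModelRequest) {
  let text = "";
  let usage = { inputTokens: 0, outputTokens: 0 };
  for await (const event of provider.stream(request)) {
    if (event.kind === "text_delta") text += event.text;
    else if (event.kind === "usage") usage = { inputTokens: event.inputTokens, outputTokens: event.outputTokens };
    else if (event.kind === "error") throw new Error(event.message);
  }
  return { text, usage };
}
```

A real provider would implement the same interface over an HTTP stream. Fixture tests would keep using FixtureProvider with recorded chunks.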
Milestone 3: Local Tools and Permission Engine
Target: 2 weeks
Goal: give the agent safe local coding capabilities.
Tasks:
- Implement PermissionEngine.
- Implement local tools:
  - read_file
  - list_files
  - search_text
  - shell
  - apply_patch
  - git_diff
- Add command risk classifier.
- Add output budget and truncation.
- Add Web approval UI.
- Add Web diff viewer.
Acceptance criteria:
- Every tool call has a permission event.
- Read-only tools can be auto-approved by policy.
- Shell and patch tools require approval by default.
- Tool output is bounded and visible in Web.
- Regression scenario can modify a fixture repo and show diff.
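Three acceptance criteria can be captured in one small policy sketch:

- every call yields a permission event;
- read-only tools can be auto-approved;
- shell and patch tools ask by default.

A head-and-tail truncation helper illustrates the output budget. The decision names, the "deny" outcome for high-risk commands, and the regex-based risk classifier are hypothetical; a real classifier would need to be far more thorough.

```typescript
// Tool names come from the roadmap; decisions, reasons and risk patterns are assumptions.
type ToolName = "read_file" | "list_files" | "search_text" | "shell" | "apply_patch" | "git_diff";
type Decision = "allow" | "ask" | "deny";

interface PermissionEvent {
  tool: ToolName;
  decision: Decision;
  reason: string;
}

const READ_ONLY_TOOLS: ReadonlySet<ToolName> = new Set<ToolName>(["read_file", "list_files", "search_text", "git_diff"]);

// Deliberately crude, hypothetical risk patterns for shell commands.
const HIGH_RISK_SHELL: RegExp[] = [/\brm\s+-rf\b/, /\bgit\s+push\b/, /\bcurl\b.*\|\s*(ba)?sh\b/];

class PermissionEngine {
  readonly events: PermissionEvent[] = [];
  private readonly autoApproveReadOnly: boolean;

  constructor(autoApproveReadOnly: boolean) {
    this.autoApproveReadOnly = autoApproveReadOnly;
  }

  check(tool: ToolName, args: { command?: string } = {}): Decision {
    let decision: Decision = "ask";
    let reason = "approval required by default";
    if (tool === "shell" && HIGH_RISK_SHELL.some((re) => re.test(args.command ?? ""))) {
      decision = "deny";
      reason = "high-risk shell command";
    } else if (READ_ONLY_TOOLS.has(tool) && this.autoApproveReadOnly) {
      decision = "allow";
      reason = "read-only tool auto-approved by policy";
    }
    // Every call, including auto-approved ones, leaves a permission event.
    this.events.push({ tool, decision, reason });
    return decision;
  }
}

// Bounds tool output to `budget` characters, keeping the head and tail.
// Assumes budget is larger than the marker itself.
function truncateOutput(output: string, budget: number): { text: string; omittedChars: number } {
  if (output.length <= budget) return { text: output, omittedChars: 0 };
  const marker = "\n[... output truncated ...]\n";
  const keep = Math.max(0, budget - marker.length);
  const head = Math.ceil(keep / 2);
  const tail = keep - head;
  const text = output.slice(0, head) + marker + (tail > 0 ? output.slice(-tail) : "");
  return { text, omittedChars: output.length - keep };
}
```

Keeping both head and tail is one reasonable choice for command output, since errors often appear at the end.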
Milestone 4: Context Builder, Instructions, and Compaction
Target: 2 weeks
Goal: make sessions aware of project rules without letting context become unbounded.
Tasks:
- Implement instruction discovery:
  - global file
  - project root file
  - directory-scoped file
- Support AGENTS.md as the default project instruction file.
- Add configurable fallback names.
- Add context budget accounting.
- Add deterministic compaction.
- Add Web context inspector.
Acceptance criteria:
- Nested instructions load in documented order.
- Context budget is visible before model call.
- Compaction creates a replayable session.compacted event.
- Instructions survive compaction.
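"Documented order" could be as simple as the following walk: first the global file, then the project root, then each directory from the root down to the working directory. In each directory, the first existing name among AGENTS.md and the configured fallbacks wins. This is a minimal sketch under these assumptions:

- The option names and the INSTRUCTIONS.md fallback are hypothetical.
- Paths are treated as POSIX strings.
- File existence is injected so the walk can be tested without a file system.

```typescript
interface DiscoveryOptions {
  globalFile: string | null; // hypothetical user-level instruction file
  projectRoot: string; // absolute POSIX path
  cwd: string; // absolute POSIX path inside projectRoot
  fileNames: string[]; // ["AGENTS.md", ...configured fallbacks]
  exists: (path: string) => boolean;
}

// Returns instruction files from least to most specific.
function discoverInstructions(opts: DiscoveryOptions): string[] {
  const found: string[] = [];
  if (opts.globalFile && opts.exists(opts.globalFile)) found.push(opts.globalFile);

  const root = opts.projectRoot.replace(/\/+$/, "");
  if (opts.cwd !== root && !opts.cwd.startsWith(root + "/")) {
    throw new Error(`cwd ${opts.cwd} is outside project root ${root}`);
  }
  const segments = opts.cwd.slice(root.length).split("/").filter((s) => s.length > 0);

  // Walk from the root down to cwd, one directory at a time.
  let dir = root;
  for (let depth = 0; depth <= segments.length; depth++) {
    if (depth > 0) dir = `${dir}/${segments[depth - 1]}`;
    // First matching name wins per directory: AGENTS.md, then fallbacks in order.
    const match = opts.fileNames.map((name) => `${dir}/${name}`).find(opts.exists);
    if (match) found.push(match);
  }
  return found;
}
```

Returning the list from least to most specific keeps the precedence rule explicit. It also lets the context inspector show exactly which files were loaded and in what order.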
Milestone 5: Skills
Target: 2 weeks
Goal: package repeatable workflows without bloating every prompt.
Tasks:
- Implement skill discovery.
- Parse skill metadata.
- Load only metadata at startup.
- Lazy-load the full SKILL.md on invocation.
- Add allowed_tools enforcement.
- Add /skill list and /skill run.
- Add Web skill inspector.
Acceptance criteria:
- Skills cannot use tools outside their declared policy.
- Skill load events appear in transcript.
- A regression skill can run a documented workflow.
- Adding a new skill increases base context size only by the size of its metadata.
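The skill criteria hinge on three behaviors: metadata is parsed up front, the full SKILL.md is read lazily and at most once, and allowed_tools is enforced per call. The sketch below makes several assumptions, all hypothetical:

- A "---" delimited header of key: value lines, with allowed_tools written as a bracketed comma list. This is not a full YAML parser.
- The SkillRegistry class.
- An injected loader function.

```typescript
interface SkillMeta {
  name: string;
  description: string;
  allowedTools: string[];
}

// Parses only the hypothetical header block; the body is not retained.
function parseSkillMeta(text: string): SkillMeta {
  const match = /^---\n([\s\S]*?)\n---\n/.exec(text);
  if (!match) throw new Error("SKILL.md is missing a metadata header");
  const fields = new Map<string, string>();
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) fields.set(line.slice(0, idx).trim(), line.slice(idx + 1).trim());
  }
  const name = fields.get("name");
  if (!name) throw new Error("skill metadata requires a name");
  const tools = (fields.get("allowed_tools") ?? "").replace(/^\[|\]$/g, "");
  return {
    name,
    description: fields.get("description") ?? "",
    allowedTools: tools.split(",").map((t) => t.trim()).filter((t) => t.length > 0),
  };
}

class SkillRegistry {
  private readonly metas = new Map<string, SkillMeta>();
  private readonly bodies = new Map<string, string>();
  private readonly loadFile: (name: string) => string;

  constructor(loadFile: (name: string) => string) {
    this.loadFile = loadFile;
  }

  // Startup: only metadata is kept, so base context grows by metadata alone.
  register(meta: SkillMeta): void {
    this.metas.set(meta.name, meta);
  }

  list(): SkillMeta[] {
    return [...this.metas.values()];
  }

  // Invocation: the full SKILL.md is read lazily and cached.
  load(name: string): string {
    if (!this.metas.has(name)) throw new Error(`unknown skill: ${name}`);
    let body = this.bodies.get(name);
    if (body === undefined) {
      body = this.loadFile(name);
      this.bodies.set(name, body);
    }
    return body;
  }

  // A skill may only use tools it declares; anything else is rejected.
  assertToolAllowed(skill: string, tool: string): void {
    const meta = this.metas.get(skill);
    if (!meta || !meta.allowedTools.includes(tool)) {
      throw new Error(`skill ${skill} may not use tool ${tool}`);
    }
  }
}
```

In this design, assertToolAllowed would run before PermissionEngine. A skill can only narrow the tools available, never widen them.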
Milestone 6: MCP Stdio
Target: 2 weeks
Goal: connect external tools through MCP without bypassing permissions.
Tasks:
- Implement MCP stdio transport.
- Implement server lifecycle.
- Implement initialize.
- Implement tools/list and tools/call.
- Implement tool namespacing.
- Implement include/exclude tool config.
- Add timeout, cancellation, and health state.
- Add Web MCP server panel.
Acceptance criteria:
- A local MCP server can expose a tool.
- MCP tool calls go through PermissionEngine.
- Tool name conflicts are resolved by namespace.
- Server crash is visible and recoverable.
- MCP contract tests pass against a fake MCP server.
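The namespacing and include/exclude tasks can be sketched as pure functions over a tools/list result. The stdio framing is shown as one JSON-RPC message per line, which is how the MCP stdio transport delimits messages. The "server__tool" separator, the config shape, and the helper names are hypothetical, and PermissionEngine is assumed to run before encodeToolsCall is ever reached.

```typescript
// Tool shape follows MCP's tools/list result; namespacing and config are assumptions.
interface McpTool {
  name: string;
  description?: string;
  inputSchema: object;
}

interface ServerToolConfig {
  include?: string[];
  exclude?: string[];
}

// Hypothetical separator; server names must not contain it.
const SEP = "__";

function namespaceTools(server: string, tools: McpTool[], config: ServerToolConfig = {}): McpTool[] {
  return tools
    .filter((t) => !config.include || config.include.includes(t.name))
    .filter((t) => !config.exclude || !config.exclude.includes(t.name))
    .map((t) => ({ ...t, name: `${server}${SEP}${t.name}` }));
}

// Maps a namespaced name back to the owning server and its original tool name.
function resolveTool(namespaced: string): { server: string; tool: string } {
  const idx = namespaced.indexOf(SEP);
  if (idx <= 0) throw new Error(`not a namespaced tool: ${namespaced}`);
  return { server: namespaced.slice(0, idx), tool: namespaced.slice(idx + SEP.length) };
}

// Encodes a tools/call request as a single newline-terminated JSON-RPC message.
let nextId = 1;
function encodeToolsCall(tool: string, args: Record<string, unknown>): string {
  const request = { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name: tool, arguments: args } };
  return JSON.stringify(request) + "\n";
}
```

A fake MCP server for contract tests can parse these lines back and reply with canned results. No real subprocess is needed for the unit-level checks.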
Milestone 7: MCP Resources and Prompts
Target: 1-2 weeks
Goal: support MCP context and reusable prompt templates.
Tasks:
- Implement resources/list.
- Implement resources/read.
- Implement prompts/list.
- Implement prompts/get.
- Add resource selection model.
- Add prompt invocation model.
- Add Web resource and prompt browser.
Acceptance criteria:
- Resource content can be explicitly included in context.
- Prompt templates can be invoked by user command.
- MCP resources are not automatically dumped into model context.
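An explicit selection model is one way to guarantee resources never reach the model implicitly. Listing a resource does nothing to context, and only URIs the user selected are read and rendered. The content shape mirrors a resources/read result. The ResourceSelection class, its methods, and the wrapper markup are hypothetical, and the read function is injected rather than tied to a real MCP client.

```typescript
// Mirrors the shape of an MCP resources/read result; everything else is hypothetical.
interface ResourceContent {
  uri: string;
  mimeType?: string;
  text?: string;
  blob?: string;
}

interface ReadResult {
  contents: ResourceContent[];
}

type ReadResource = (uri: string) => Promise<ReadResult>;

class ResourceSelection {
  private readonly selected = new Set<string>();

  // Only an explicit user action adds a resource; listing alone adds nothing.
  select(uri: string): void {
    this.selected.add(uri);
  }

  deselect(uri: string): void {
    this.selected.delete(uri);
  }

  // Reads only selected resources and renders text parts; binary blobs are skipped.
  async buildContext(read: ReadResource): Promise<string[]> {
    const parts: string[] = [];
    for (const uri of this.selected) {
      const result = await read(uri);
      for (const item of result.contents) {
        if (item.text !== undefined) parts.push(`<resource uri="${item.uri}">\n${item.text}\n</resource>`);
      }
    }
    return parts;
  }
}
```

Because reads happen only inside buildContext, the context inspector can report exactly which resources contributed to a given model call.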
Milestone 8: ACP Server
Target: 2 weeks
Goal: expose the core to ACP-compatible clients.
Tasks:
- Implement JSON-RPC server.
- Implement initialize.
- Implement session/new.
- Implement session/load.
- Implement session/prompt.
- Implement session/cancel.
- Translate core events into ACP updates.
- Forward permission requests.
- Add ACP protocol fixture tests.
Acceptance criteria:
- ACP client can start a session and receive streamed updates.
- ACP session replay matches Web replay.
- ACP adapter owns no agent logic.
- ACP errors are typed and tested.
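Making the ACP adapter a pure, stateless translation from core events to session/update notifications serves two criteria at once. It keeps agent logic out of the adapter, and ACP replay falls out of core replay. The core event names here are hypothetical. The ACP field names (sessionUpdate, agent_message_chunk, tool_call, toolCallId, status) reflect our reading of the ACP schema and should be checked against the current specification before use.

```typescript
// Hypothetical core events.
type CoreEvent =
  | { type: "assistant.delta"; text: string }
  | { type: "tool.requested"; callId: string; toolName: string };

interface JsonRpcNotification {
  jsonrpc: "2.0";
  method: string;
  params: unknown;
}

// Pure translation: no state and no agent logic, so the same events always yield the same updates.
function toAcpUpdate(sessionId: string, event: CoreEvent): JsonRpcNotification {
  switch (event.type) {
    case "assistant.delta":
      return {
        jsonrpc: "2.0",
        method: "session/update",
        params: {
          sessionId,
          update: { sessionUpdate: "agent_message_chunk", content: { type: "text", text: event.text } },
        },
      };
    case "tool.requested":
      return {
        jsonrpc: "2.0",
        method: "session/update",
        params: {
          sessionId,
          update: { sessionUpdate: "tool_call", toolCallId: event.callId, title: event.toolName, status: "pending" },
        },
      };
  }
}
```

Under this design, an ACP replay test maps a recorded core event log through toAcpUpdate and compares the result with a stored fixture.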
Milestone 9: Hardening and Beta
Target: 3-4 weeks
Goal: make the tool reliable enough for real projects.
Tasks:
- Add sandbox profile support.
- Add secret redaction.
- Add audit log export.
- Add large output summarization.
- Add retry/backoff for providers.
- Add failure taxonomy.
- Add benchmark scenarios.
- Add release packaging.
Acceptance criteria:
- Regression suite covers common coding tasks.
- Permission bypass tests pass.
- Large repo fixture does not exceed context budget unexpectedly.
- A release can be installed and used on a clean machine.
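Retry/backoff and the failure taxonomy interact. Only some failure classes should be retried, and the delay should grow exponentially with jitter so many clients do not retry in lockstep. The sketch below uses "full jitter": a uniformly random wait up to a capped exponential ceiling. The failure kinds, ProviderError, and withRetry are hypothetical names. sleep and random are injectable so tests stay fast and deterministic.

```typescript
// Hypothetical taxonomy: only rate limits and transient failures are retried.
type FailureKind = "rate_limited" | "transient" | "auth" | "invalid_request";

class ProviderError extends Error {
  readonly kind: FailureKind;

  constructor(kind: FailureKind, message: string) {
    super(message);
    this.name = "ProviderError";
    this.kind = kind;
  }
}

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number;
  maxDelayMs: number;
  sleep?: (ms: number) => Promise<void>;
  random?: () => number; // returns a value in [0, 1)
}

// Waits a random time in [0, min(maxDelayMs, baseDelayMs * 2^attempt)) between attempts.
async function withRetry<T>(call: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const random = opts.random ?? Math.random;
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const retryable = err instanceof ProviderError && (err.kind === "rate_limited" || err.kind === "transient");
      if (!retryable || attempt + 1 >= opts.maxAttempts) throw err;
      const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
      await sleep(Math.floor(random() * ceiling));
    }
  }
}
```

Non-retryable kinds such as auth failures surface immediately. Each failure kind maps naturally onto a distinct, typed error in the transcript and audit log.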
Roadmap Rule
No milestone should be considered complete until:
- Web regression scenario exists.
- Unit tests cover core behavior.
- Contract tests cover external protocol behavior.
- Documentation is updated.
- The mainline-guardian review has no blocking findings.