Implementation Backlog

This backlog breaks the roadmap into issue-sized work items. Each item should become one PR, unless the implementation proves small enough to share a PR with a related item.

Definition of Done for Every Item

  • Roadmap id is named in PR.
  • Package ownership is clear.
  • Tests are added or updated.
  • Web regression impact is documented.
  • No architecture fitness rule is weakened.
  • mainline-guardian has no blocking finding.

M0: Project Spine

M0-01: Create Monorepo Skeleton

Depends on: none

Deliverables:

  • apps/web-client.
  • apps/cli.
  • apps/acp-server.
  • packages/core.
  • packages/schema.
  • packages/storage.
  • packages/permissions.

Tests:

  • Workspace install.
  • Empty package typecheck.

Acceptance:

  • CI can discover all packages.
  • No package has circular dependencies.

M0-02: Add Architecture Fitness Test Harness

Depends on: M0-01

Deliverables:

  • Import boundary test.
  • Circular dependency test.
  • Forbidden dependency list.

Tests:

  • Positive fixture.
  • Negative fixture that fails on core -> apps/*.

Acceptance:

  • CI fails when core imports a client package.
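
As a rough illustration of what the import boundary and circular dependency tests could check, here is a minimal sketch that works on an in-memory package graph. The `PackageGraph` and `ForbiddenRule` shapes and both function names are assumptions made for this example; a real harness would first have to derive the graph from the workspace.

```typescript
// Hypothetical shape: package name -> packages it imports.
type PackageGraph = Record<string, string[]>;

// Hypothetical rule: packages whose name starts with `from`
// must not import packages whose name starts with `to`.
interface ForbiddenRule {
  from: string;
  to: string;
}

// Returns every edge "a -> b" that matches a forbidden rule.
function findForbiddenImports(graph: PackageGraph, rules: ForbiddenRule[]): string[] {
  const violations: string[] = [];
  for (const pkg of Object.keys(graph)) {
    for (const dep of graph[pkg]) {
      for (const rule of rules) {
        if (pkg.startsWith(rule.from) && dep.startsWith(rule.to)) {
          violations.push(`${pkg} -> ${dep}`);
        }
      }
    }
  }
  return violations;
}

// Depth-first search that returns one cycle as a path, or null if the graph is acyclic.
function findCycle(graph: PackageGraph): string[] | null {
  const visiting = new Set<string>();
  const done = new Set<string>();
  const stack: string[] = [];

  const visit = (node: string): string[] | null => {
    if (done.has(node)) return null;
    if (visiting.has(node)) {
      // The node is already on the current path, so the path closes a cycle.
      return [...stack.slice(stack.indexOf(node)), node];
    }
    visiting.add(node);
    stack.push(node);
    for (const dep of graph[node] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    visiting.delete(node);
    done.add(node);
    return null;
  };

  for (const node of Object.keys(graph)) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}
```

The negative fixture from the test list would then be a graph containing the edge `packages/core -> apps/web-client` together with the rule `{ from: "packages/core", to: "apps/" }`.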

M0-03: Add Event Schema Foundation

Depends on: M0-01

Deliverables:

  • Versioned event envelope.
  • Initial event union.
  • Fixture normalization helper.

Tests:

  • Schema parse tests.
  • Fixture round-trip tests.

Acceptance:

  • A fake event log validates fully from disk.
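
To make the envelope deliverable concrete, the sketch below shows one possible versioned envelope with a hand-written validator and a fixture normalization helper. Every field name (`v`, `id`, `sessionId`, `seq`, `type`, `payload`) is an assumption for illustration; the backlog only requires that the envelope is versioned and that fixtures round-trip.

```typescript
// Hypothetical envelope fields; only the presence of a version is taken from the backlog.
interface EventEnvelope {
  v: 1;
  id: string;
  sessionId: string;
  seq: number;
  type: string;
  payload: unknown;
}

type EnvelopeResult =
  | { ok: true; envelope: EventEnvelope }
  | { ok: false; error: string };

// Validates an unknown value (for example, one parsed JSONL line) as an envelope.
function parseEnvelope(raw: unknown): EnvelopeResult {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "envelope must be an object" };
  }
  const fields = raw as Record<string, unknown>;
  if (fields.v !== 1) {
    return { ok: false, error: "unsupported envelope version" };
  }
  const id = fields.id;
  const sessionId = fields.sessionId;
  const eventType = fields.type;
  const seq = fields.seq;
  if (typeof id !== "string" || typeof sessionId !== "string" || typeof eventType !== "string") {
    return { ok: false, error: "id, sessionId, and type must be strings" };
  }
  if (typeof seq !== "number" || !Number.isInteger(seq) || seq < 0) {
    return { ok: false, error: "seq must be a non-negative integer" };
  }
  return {
    ok: true,
    envelope: { v: 1, id, sessionId, seq, type: eventType, payload: fields.payload },
  };
}

// Replaces the random id with a stable one so fixtures compare byte-for-byte.
function normalizeForFixture(envelope: EventEnvelope): EventEnvelope {
  return { ...envelope, id: `evt-${envelope.seq}` };
}
```

A schema library could replace the hand-written checks; the point of the sketch is only that an unknown version is rejected before any field is trusted.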

M0-04: Add Web Shell

Depends on: M0-03

Deliverables:

  • Session list screen.
  • Transcript screen.
  • Event timeline screen.
  • Fixture event loader.

Tests:

  • Playwright loads fixture session.
  • Screenshot smoke test.

Acceptance:

  • Web renders a static fake session without a backend.

M1: Session Core

M1-01: Implement Append-Only Event Log

Depends on: M0-03

Deliverables:

  • JSONL writer.
  • JSONL reader.
  • Event ordering guarantees.
  • Corrupt-line recovery policy.

Tests:

  • Append/replay test.
  • Crash-safe partial write test.

Acceptance:

  • Replay returns exactly the events that were committed.
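
One way to read "exactly the events that were committed" is sketched below on an in-memory string rather than a file. It assumes a policy in which a record counts as committed only once its trailing newline has been written, so a crash mid-write leaves a tail fragment that replay drops. The policy, the `LogRecord` shape, and the function names are assumptions for this example.

```typescript
// Hypothetical minimal record; seq is expected to count up from 0 without gaps.
interface LogRecord {
  seq: number;
  type: string;
}

// Appends one record as a single JSONL line terminated by "\n".
function appendLine(log: string, record: LogRecord): string {
  return log + JSON.stringify(record) + "\n";
}

interface ReplayResult {
  events: LogRecord[];
  droppedTail: boolean; // true when an unterminated partial write was ignored
}

// Replays committed lines only. A trailing fragment without a newline is dropped;
// a corrupt or out-of-order line before the tail is treated as a hard error.
function replay(log: string): ReplayResult {
  const lines = log.split("\n");
  const tail = lines.pop() ?? ""; // text after the final newline, if any
  const events: LogRecord[] = [];
  lines.forEach((line, index) => {
    let record: LogRecord;
    try {
      record = JSON.parse(line) as LogRecord;
    } catch {
      throw new Error(`corrupt committed line ${index + 1}`);
    }
    if (record.seq !== events.length) {
      throw new Error(`unexpected seq at line ${index + 1}`);
    }
    events.push(record);
  });
  return { events, droppedTail: tail.length > 0 };
}
```

A file-backed writer would additionally need to flush before treating a record as committed; that part is outside this sketch.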

M1-02: Implement Session Index

Depends on: M1-01

Deliverables:

  • SQLite session index.
  • Turn index.
  • Rebuild index from JSONL.

Tests:

  • Index rebuild test.
  • Missing index recovery test.

Acceptance:

  • Deleting SQLite and rebuilding from JSONL restores session list.

M1-03: Implement Fake Provider Turn

Depends on: M1-01

Deliverables:

  • SessionEngine.createSession.
  • SessionEngine.runTurn.
  • Fake streaming model provider.

Tests:

  • Turn state machine test.
  • Cancellation test.

Acceptance:

  • Web can display live fake streaming from backend events.
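
The turn state machine and cancellation tests could target something like the following sketch. The state names, signal names, and the `runFakeTurn` helper are assumptions; the fake provider here is just a fixed array of chunks, and cancellation is simulated by a chunk count.

```typescript
type TurnState = "idle" | "streaming" | "completed" | "cancelled" | "failed";
type TurnSignal = "start" | "delta" | "finish" | "cancel" | "error";

// Allowed transitions; terminal states accept no further signals.
const transitions: Record<TurnState, Partial<Record<TurnSignal, TurnState>>> = {
  idle: { start: "streaming" },
  streaming: { delta: "streaming", finish: "completed", cancel: "cancelled", error: "failed" },
  completed: {},
  cancelled: {},
  failed: {},
};

function nextState(state: TurnState, signal: TurnSignal): TurnState {
  const next = transitions[state][signal];
  if (next === undefined) {
    throw new Error(`illegal transition: ${state} + ${signal}`);
  }
  return next;
}

// Runs one turn against a fake provider that emits the given chunks in order.
// If cancelAfter is set, the turn is cancelled once that many chunks were consumed.
function runFakeTurn(chunks: string[], cancelAfter?: number): { state: TurnState; text: string } {
  let state = nextState("idle", "start");
  let text = "";
  let seen = 0;
  for (const chunk of chunks) {
    if (cancelAfter !== undefined && seen >= cancelAfter) {
      return { state: nextState(state, "cancel"), text };
    }
    state = nextState(state, "delta");
    text += chunk;
    seen++;
  }
  return { state: nextState(state, "finish"), text };
}
```

Keeping the transition table as data makes the state machine test a simple enumeration of legal and illegal pairs.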

M1-04: Implement Replay API

Depends on: M1-01, M1-03

Deliverables:

  • Session replay endpoint/API.
  • Normalized transcript projection.

Tests:

  • Golden transcript test.
  • Replay/live equivalence test.

Acceptance:

  • Web replay matches the live transcript.
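
Replay/live equivalence is easiest to guarantee when both paths share one reducer. The sketch below assumes three hypothetical event types (`user.message`, `assistant.delta`, `turn.completed`); live rendering applies `applyEvent` as events arrive, while replay folds the whole log through the same function.

```typescript
// Hypothetical event union, much smaller than a real one would be.
type TranscriptEvent =
  | { type: "user.message"; turn: number; text: string }
  | { type: "assistant.delta"; turn: number; text: string }
  | { type: "turn.completed"; turn: number };

interface TranscriptEntry {
  turn: number;
  role: "user" | "assistant";
  text: string;
}

// Pure reducer: returns a new transcript with one event applied.
function applyEvent(entries: TranscriptEntry[], ev: TranscriptEvent): TranscriptEntry[] {
  switch (ev.type) {
    case "user.message":
      return [...entries, { turn: ev.turn, role: "user", text: ev.text }];
    case "assistant.delta": {
      const last = entries[entries.length - 1];
      if (last && last.role === "assistant" && last.turn === ev.turn) {
        // Merge consecutive deltas of the same turn into one assistant entry.
        return [...entries.slice(0, -1), { ...last, text: last.text + ev.text }];
      }
      return [...entries, { turn: ev.turn, role: "assistant", text: ev.text }];
    }
    case "turn.completed":
      return entries;
  }
}

// Replay path: fold the full event log through the same reducer.
function projectTranscript(events: TranscriptEvent[]): TranscriptEntry[] {
  return events.reduce(applyEvent, [] as TranscriptEntry[]);
}
```

The golden transcript test can then compare the serialized output of `projectTranscript` against a stored fixture.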

M2: Model Gateway

M2-01: Define Model Provider Port

Depends on: M1-03

Deliverables:

  • Provider interface.
  • Normalized stream event types.
  • Capability model.

Tests:

  • Fake provider contract test.

Acceptance:

  • Core depends only on provider port, not SDKs.

M2-02: Add First Real Provider

Depends on: M2-01

Deliverables:

  • One provider adapter.
  • Fixture recording format.
  • Error normalization.

Tests:

  • Recorded stream fixture test.
  • Network-disabled CI test.

Acceptance:

  • Real provider works locally.
  • CI can test provider behavior without network.

M3: Tools and Permissions

M3-01: Implement Permission Engine

Depends on: M1-03

Deliverables:

  • Permission request schema.
  • Policy evaluator.
  • Ask/allow/deny decisions.

Tests:

  • Policy matrix tests.
  • Permission event emission tests.

Acceptance:

  • No tool executor can be reached without a permission result.
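
A possible shape for the policy evaluator is sketched below, together with one way to enforce the acceptance criterion through types: the executor accepts only a `Granted` value, and `authorize` is the only function that creates one. Rule matching by first match, the `"ask"` default, and all names are assumptions for this example.

```typescript
type Decision = "allow" | "ask" | "deny";

// Hypothetical request shape.
interface ToolPermissionRequest {
  tool: string;
  risk: "read" | "write" | "exec";
}

// A rule matches when every field it specifies equals the request's field.
interface PolicyRule {
  tool?: string;
  risk?: ToolPermissionRequest["risk"];
  decision: Decision;
}

// First matching rule wins; with no match the assumed default is "ask".
function evaluatePolicy(rules: PolicyRule[], req: ToolPermissionRequest): Decision {
  for (const rule of rules) {
    const toolMatches = rule.tool === undefined || rule.tool === req.tool;
    const riskMatches = rule.risk === undefined || rule.risk === req.risk;
    if (toolMatches && riskMatches) return rule.decision;
  }
  return "ask";
}

// Proof that a permission decision was made in favor of the request.
interface Granted {
  readonly granted: true;
  readonly request: ToolPermissionRequest;
}

// Resolves "ask" through the supplied approval callback (for example, a user prompt).
function authorize(
  rules: PolicyRule[],
  req: ToolPermissionRequest,
  approve: (req: ToolPermissionRequest) => boolean,
): Granted | null {
  const decision = evaluatePolicy(rules, req);
  if (decision === "deny") return null;
  if (decision === "ask" && !approve(req)) return null;
  return { granted: true, request: req };
}

// The executor takes a Granted value, so callers cannot skip authorize().
function executeTool(grant: Granted, run: (req: ToolPermissionRequest) => string): string {
  return run(grant.request);
}
```

The policy matrix tests can enumerate tool and risk combinations against `evaluatePolicy` alone, without touching any executor.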

M3-02: Implement Read/Search Tools

Depends on: M3-01

Deliverables:

  • read_file.
  • list_files.
  • search_text.
  • Output budget.

Tests:

  • Path safety tests.
  • Gitignore behavior tests.
  • Output truncation tests.

Acceptance:

  • Read-only tool calls are visible in Web event timeline.
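
The path safety and output truncation tests could start from helpers like the ones below. This is a lexical sketch for POSIX-style relative paths only: it does not resolve symlinks, which a real `read_file` would need to handle through the file system. Function names and the character-based budget are assumptions.

```typescript
// Resolves a relative path against the workspace root.
// Returns null when the path is absolute or climbs above the root.
function resolveInsideRoot(root: string, requested: string): string | null {
  if (requested.startsWith("/")) return null; // absolute paths are rejected in this sketch
  const parts: string[] = [];
  for (const segment of requested.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (parts.length === 0) return null; // would escape the root
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  return [root.replace(/\/+$/, ""), ...parts].join("/");
}

// Applies an output budget measured in characters and reports whether it truncated.
function applyOutputBudget(output: string, maxChars: number): { text: string; truncated: boolean } {
  if (output.length <= maxChars) return { text: output, truncated: false };
  return { text: output.slice(0, maxChars), truncated: true };
}
```

Reporting `truncated` explicitly lets the tool result event say that output was cut, rather than leaving the model to guess.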

M3-03: Implement Shell Tool

Depends on: M3-01

Deliverables:

  • Command execution.
  • Timeout.
  • CWD restriction.
  • Environment redaction.

Tests:

  • Deny dangerous command.
  • Timeout.
  • Output budget.

Acceptance:

  • Risky shell calls require approval by default.

M3-04: Implement Patch Tool

Depends on: M3-01

Deliverables:

  • Apply patch.
  • Diff capture.
  • Dirty worktree warning.

Tests:

  • Patch success.
  • Patch conflict.
  • Existing user change preservation.

Acceptance:

  • Web diff viewer shows generated changes by turn.

M4: Context, Instructions, and Compaction

M4-01: Implement Instruction Discovery

Depends on: M1-03

Deliverables:

  • Global instruction file loading.
  • Project instruction file loading.
  • Directory-scoped instruction loading.
  • Configurable fallback file names.

Tests:

  • Nested instruction order.
  • Override behavior.
  • Missing file behavior.
  • Max bytes behavior.

Acceptance:

  • Web context inspector shows every loaded instruction source in order.

M4-02: Implement Context Budget Accounting

Depends on: M4-01, M2-01

Deliverables:

  • Context part model.
  • Token estimation.
  • Budget categories.
  • Truncation policy.

Tests:

  • Budget calculation fixtures.
  • Deterministic truncation tests.

Acceptance:

  • Every model request records a context.built event with budget details.
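
The sketch below shows one deterministic way to account for a context budget and produce the `context.built` event. The three categories, the four-characters-per-token estimate, the truncation policy (keep all instructions, then the newest remaining parts that fit), and the event fields other than its name are all assumptions for illustration.

```typescript
type BudgetCategory = "instructions" | "history" | "tool_output";

interface ContextPart {
  category: BudgetCategory;
  text: string;
}

// Rough heuristic, assumed here: about four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface ContextBuilt {
  type: "context.built";
  totalTokens: number;
  byCategory: Record<BudgetCategory, number>;
  droppedParts: number;
}

// Parts are given oldest first. Instructions are always kept; other parts are
// kept newest first until the next one no longer fits.
function buildContext(parts: ContextPart[], maxTokens: number): { kept: ContextPart[]; event: ContextBuilt } {
  const keep = parts.map((p) => p.category === "instructions");
  let total = 0;
  for (const p of parts) {
    if (p.category === "instructions") total += estimateTokens(p.text);
  }
  for (let i = parts.length - 1; i >= 0; i--) {
    const p = parts[i];
    if (p.category === "instructions") continue;
    const cost = estimateTokens(p.text);
    if (total + cost > maxTokens) break; // stop here so the kept history stays contiguous
    keep[i] = true;
    total += cost;
  }
  const kept = parts.filter((_, i) => keep[i]);
  const byCategory: Record<BudgetCategory, number> = { instructions: 0, history: 0, tool_output: 0 };
  for (const p of kept) byCategory[p.category] += estimateTokens(p.text);
  return {
    kept,
    event: { type: "context.built", totalTokens: total, byCategory, droppedParts: parts.length - kept.length },
  };
}
```

The sketch does not handle instructions that exceed the budget on their own; a real policy would need an explicit rule for that case.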

M4-03: Implement Deterministic Compaction

Depends on: M4-02

Deliverables:

  • Compaction trigger.
  • Summary event.
  • Replay integration.
  • Instruction preservation.

Tests:

  • Compaction replay test.
  • Instruction survival test.
  • Golden transcript after compaction.

Acceptance:

  • A compacted session can continue and replay without losing active instructions.

M4-04: Implement Memory Candidate Workflow

Depends on: M4-01

Deliverables:

  • Memory candidate schema.
  • Markdown diff candidate storage.
  • Review/apply/discard model.
  • Web memory candidate panel.

Tests:

  • Candidate creation.
  • Candidate apply.
  • Candidate discard.
  • Rollback.

Acceptance:

  • No durable memory write can happen without an auditable candidate event.

M5: Skills

M5-01: Implement Skill Discovery

Depends on: M4-02

Deliverables:

  • Skill directory scan.
  • SKILL.md metadata parser.
  • Skill registry.
  • Startup metadata context.

Tests:

  • Valid skill metadata.
  • Invalid skill metadata.
  • Duplicate skill names.

Acceptance:

  • Base context includes skill metadata only, not full skill bodies.
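
A metadata-only parser might look like the sketch below. It assumes that `SKILL.md` starts with a front matter block delimited by `---` lines containing `name:` and `description:` entries; the backlog does not specify the format, so treat this layout and the function names as assumptions. The body after the front matter is never stored, which matches the acceptance criterion.

```typescript
interface SkillMetadata {
  name: string;
  description: string;
}

// Parses only the front matter; the skill body is intentionally ignored.
function parseSkillMetadata(markdown: string): SkillMetadata {
  const lines = markdown.split(/\r?\n/);
  if (lines[0] !== "---") throw new Error("missing front matter");
  const end = lines.indexOf("---", 1);
  if (end === -1) throw new Error("unterminated front matter");

  const fields = new Map<string, string>();
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // not a "key: value" line
    fields.set(line.slice(0, colon).trim(), line.slice(colon + 1).trim());
  }

  const skillName = fields.get("name");
  const description = fields.get("description");
  if (!skillName || !description) throw new Error("name and description are required");
  return { name: skillName, description };
}

// Builds a registry keyed by skill name and rejects duplicates.
function buildSkillRegistry(documents: string[]): Map<string, SkillMetadata> {
  const registry = new Map<string, SkillMetadata>();
  for (const doc of documents) {
    const meta = parseSkillMetadata(doc);
    if (registry.has(meta.name)) throw new Error(`duplicate skill name: ${meta.name}`);
    registry.set(meta.name, meta);
  }
  return registry;
}
```

A full YAML parser would be the safer choice if the metadata grows beyond flat string fields.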

M5-02: Implement Skill Lazy Loading

Depends on: M5-01

Deliverables:

  • Skill invocation model.
  • Full body load on demand.
  • skill.loaded event.
  • Allowed tool policy.

Tests:

  • Lazy-load behavior.
  • Tool policy enforcement.
  • Missing skill behavior.

Acceptance:

  • Skill bodies load only when invoked and cannot exceed declared tool policy.

M5-03: Add Skill UX and Regression

Depends on: M5-02

Deliverables:

  • /skill list.
  • /skill run.
  • Web skill inspector.
  • Regression skill fixture.

Tests:

  • CLI command test.
  • Web skill inspector test.
  • End-to-end skill scenario.

Acceptance:

  • A skill can drive a repeatable workflow with visible events and Web regression evidence.

M6: MCP Stdio

M6-01: Implement MCP Server Lifecycle

Depends on: M3-01

Deliverables:

  • MCP server config schema.
  • Stdio process startup.
  • Initialize handshake.
  • Shutdown.
  • Health state.

Tests:

  • Fake MCP server initialize.
  • Startup failure.
  • Shutdown cleanup.

Acceptance:

  • Web MCP panel shows configured servers and health.

M6-02: Implement MCP Tool Discovery and Calls

Depends on: M6-01

Deliverables:

  • tools/list.
  • tools/call.
  • Tool namespace.
  • Include/exclude config.
  • Tool output normalization.

Tests:

  • Tool list contract.
  • Tool call contract.
  • Namespace collision.
  • Invalid arguments.

Acceptance:

  • MCP tools appear in the tool registry with namespaced identifiers.
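
Namespacing and collision handling could be sketched as follows. The `<server>__<tool>` identifier format and the restriction on server names are assumptions made so that the separator stays unambiguous; the backlog only requires that identifiers are namespaced.

```typescript
interface McpToolInfo {
  server: string; // configured server name
  tool: string;   // tool name as reported by tools/list
}

const SEPARATOR = "__";

// Hypothetical format "<server>__<tool>". Server names are limited to
// lowercase letters, digits, and hyphens so they can never contain the separator.
function namespacedId(info: McpToolInfo): string {
  if (!/^[a-z0-9-]+$/.test(info.server)) {
    throw new Error(`invalid server name: ${info.server}`);
  }
  return `${info.server}${SEPARATOR}${info.tool}`;
}

// Merges discovered MCP tools into the registry of built-in tool ids.
// Excluded ids are skipped; any id that already exists is a collision.
function registerTools(builtins: string[], discovered: McpToolInfo[], exclude: string[] = []): string[] {
  const registry = new Set<string>(builtins);
  for (const info of discovered) {
    const id = namespacedId(info);
    if (exclude.indexOf(id) !== -1) continue;
    if (registry.has(id)) throw new Error(`namespace collision: ${id}`);
    registry.add(id);
  }
  return Array.from(registry);
}
```

With this scheme two servers can both expose a tool called `search` without shadowing each other or a built-in tool.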

M6-03: Integrate MCP With Permissions

Depends on: M6-02

Deliverables:

  • MCP risk classification.
  • Permission event integration.
  • Timeout and cancellation.
  • Server crash recovery behavior.

Tests:

  • MCP tool requires approval.
  • MCP timeout.
  • MCP cancellation.
  • MCP crash during call.

Acceptance:

  • No MCP tool can execute without a permission decision.

M7: MCP Resources, Prompts, and HTTP

M7-01: Implement MCP Resources

Depends on: M6-01

Deliverables:

  • resources/list.
  • resources/read.
  • Resource selection model.
  • Web resource browser.

Tests:

  • Resource list contract.
  • Resource read contract.
  • Explicit inclusion only.

Acceptance:

  • MCP resources can be included in context only by explicit user or policy action.

M7-02: Implement MCP Prompts

Depends on: M6-01

Deliverables:

  • prompts/list.
  • prompts/get.
  • Prompt argument validation.
  • Prompt invocation UX.

Tests:

  • Prompt list contract.
  • Prompt get contract.
  • Missing argument behavior.

Acceptance:

  • MCP prompts can be surfaced as user-invoked commands.

M7-03: Implement Streamable HTTP MCP

Depends on: M6-02

Deliverables:

  • HTTP transport.
  • Protocol version header.
  • Session id header handling.
  • SSE response handling.
  • Reconnect behavior.

Tests:

  • HTTP initialize.
  • SSE response stream.
  • Session id propagation.
  • 404 session restart behavior.

Acceptance:

  • A Streamable HTTP MCP test server passes the same tool contract suite as stdio.
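
The header and restart deliverables can be sketched as two pure functions with no network code. The header names `MCP-Protocol-Version` and `Mcp-Session-Id`, and the rule that a 404 on a request carrying a session id means the client starts over with a new initialize request, follow my reading of the MCP Streamable HTTP transport specification and should be verified against the revision being targeted. The types and function names are assumptions.

```typescript
interface McpHttpSession {
  protocolVersion: string; // negotiated protocol revision
  sessionId?: string;      // assigned by the server during initialize, if any
}

// Builds the headers for a POST to the MCP endpoint.
function requestHeaders(session: McpHttpSession): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    // The client must be able to accept a plain JSON response or an SSE stream.
    "Accept": "application/json, text/event-stream",
    "MCP-Protocol-Version": session.protocolVersion,
  };
  if (session.sessionId !== undefined) {
    headers["Mcp-Session-Id"] = session.sessionId;
  }
  return headers;
}

type NextAction =
  | { kind: "continue"; session: McpHttpSession }
  | { kind: "reinitialize"; session: McpHttpSession }
  | { kind: "fail"; httpStatus: number };

// Decides what the transport does after a response arrives.
function onResponse(session: McpHttpSession, httpStatus: number, responseSessionId?: string): NextAction {
  if (httpStatus === 404 && session.sessionId !== undefined) {
    // The server no longer knows this session: drop the id and initialize again.
    return { kind: "reinitialize", session: { protocolVersion: session.protocolVersion } };
  }
  if (httpStatus >= 400) {
    return { kind: "fail", httpStatus };
  }
  if (responseSessionId !== undefined) {
    return { kind: "continue", session: { ...session, sessionId: responseSessionId } };
  }
  return { kind: "continue", session };
}
```

Keeping this logic free of I/O lets the session id propagation and 404 restart tests run without an HTTP server.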

M8: ACP Server

M8-01: Implement ACP JSON-RPC Transport

Depends on: M1-04

Deliverables:

  • JSON-RPC message parser.
  • Request/response mapping.
  • Error model.
  • Initialization method.

Tests:

  • Valid JSON-RPC request.
  • Invalid JSON-RPC request.
  • Initialize negotiation.

Acceptance:

  • ACP client can initialize and receive declared capabilities.
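
For the parser and error model, a minimal sketch might validate one message at a time and map failures to the standard JSON-RPC 2.0 error codes `-32700` (parse error) and `-32600` (invalid request). The type and function names are assumptions, and batch arrays are rejected here for brevity even though JSON-RPC 2.0 permits them.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id?: string | number | null; // absent for notifications
  method: string;
  params?: unknown;
}

interface JsonRpcErrorResponse {
  jsonrpc: "2.0";
  id: null;
  error: { code: number; message: string };
}

const PARSE_ERROR = -32700;
const INVALID_REQUEST = -32600;

function errorResponse(code: number, message: string): JsonRpcErrorResponse {
  return { jsonrpc: "2.0", id: null, error: { code, message } };
}

function isValidId(id: unknown): id is string | number | null | undefined {
  return id === undefined || id === null || typeof id === "string" || typeof id === "number";
}

// Parses one serialized message into a request, or into the error response to send back.
function parseRequest(line: string): JsonRpcRequest | JsonRpcErrorResponse {
  let value: unknown;
  try {
    value = JSON.parse(line);
  } catch {
    return errorResponse(PARSE_ERROR, "Parse error");
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return errorResponse(INVALID_REQUEST, "Invalid Request");
  }
  const msg = value as Record<string, unknown>;
  const method = msg.method;
  const id = msg.id;
  if (msg.jsonrpc !== "2.0" || typeof method !== "string" || !isValidId(id)) {
    return errorResponse(INVALID_REQUEST, "Invalid Request");
  }
  return { jsonrpc: "2.0", id, method, params: msg.params };
}
```

Callers can tell the two outcomes apart with `"error" in result` before dispatching on `method`.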

M8-02: Implement ACP Session Methods

Depends on: M8-01, M1-04

Deliverables:

  • session/new.
  • session/load.
  • session/prompt.
  • session/cancel.
  • Session id mapping.

Tests:

  • New session.
  • Load session.
  • Prompt session.
  • Cancel turn.

Acceptance:

  • ACP session behavior matches core session behavior.

M8-03: Translate Core Events to ACP Updates

Depends on: M8-02

Deliverables:

  • Event-to-update mapper.
  • Permission request forwarding.
  • Tool update forwarding.
  • Plan/update extension hooks.

Tests:

  • Streaming update fixture.
  • Permission request fixture.
  • Tool call fixture.

Acceptance:

  • ACP replay and Web replay show equivalent turn semantics.

M9: Hardening and Beta

M9-01: Add Sandbox Execution Profiles

Depends on: M3-03

Deliverables:

  • Sandbox config schema.
  • Local sandbox adapter.
  • Policy integration.
  • Web sandbox visibility.

Tests:

  • Read-only sandbox.
  • Workspace-write sandbox.
  • Denied path write.

Acceptance:

  • Shell and patch behavior can be constrained by sandbox policy.

M9-02: Add Secret Redaction and Audit Export

Depends on: M1-01, M3-03

Deliverables:

  • Secret pattern redactor.
  • Log redaction pipeline.
  • Audit export.
  • Redaction test fixtures.

Tests:

  • API-key-like string redaction.
  • Redaction before persistence.
  • Audit export integrity.

Acceptance:

  • Secret-looking values are redacted before durable logs.
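
Redaction before persistence could be sketched as a list of pattern rules applied in front of the only write path. The three patterns below are illustrative guesses at what "secret-looking" might mean, not a vetted rule set, and the names are assumptions.

```typescript
interface RedactionRule {
  pattern: RegExp;     // must carry the "g" flag so every occurrence is replaced
  replacement: string; // may reference capture groups
}

// Hypothetical starter rules; a real list would be reviewed and tested against fixtures.
const REDACTION_RULES: RedactionRule[] = [
  // Tokens that look like "sk-" followed by a long alphanumeric run.
  { pattern: /\bsk-[A-Za-z0-9]{20,}\b/g, replacement: "[REDACTED]" },
  // Strings shaped like AWS access key ids.
  { pattern: /\bAKIA[0-9A-Z]{16}\b/g, replacement: "[REDACTED]" },
  // key=value or key: value pairs with a sensitive-looking key; the key is kept.
  { pattern: /\b(api[_-]?key|token|password)(\s*[=:]\s*)\S+/gi, replacement: "$1$2[REDACTED]" },
];

// Applies every rule in order and returns the redacted text.
function redact(text: string): string {
  return REDACTION_RULES.reduce((acc, rule) => acc.replace(rule.pattern, rule.replacement), text);
}

// The durable log only ever receives redacted lines.
function persistRedacted(durableLog: string[], line: string): void {
  durableLog.push(redact(line));
}
```

Routing every write through `persistRedacted` is what makes the "redaction before persistence" test meaningful: the raw line never reaches the log array.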

M9-03: Add Release Packaging

Depends on: M8-03

Deliverables:

  • CLI package.
  • Web build package.
  • ACP server entrypoint.
  • Version metadata.
  • Release dry run.

Tests:

  • Clean install smoke test.
  • Version command.
  • Packaged Web launch.

Acceptance:

  • A clean machine can install and run the beta build.

M9-04: Add Beta Regression Suite

Depends on: M9-01, M9-02

Deliverables:

  • Fixture repo suite.
  • Golden event logs.
  • Playwright screenshot baselines.
  • Replay compatibility suite.

Tests:

  • Basic coding task.
  • Permission denial.
  • MCP tool call.
  • Skill workflow.
  • Compaction/resume.

Acceptance:

  • Release candidates must pass the full beta regression suite.