Web Client Regression Plan
Purpose
The Web client should be the first client because it makes the agent observable. It should expose the exact event stream, tool calls, context construction, permissions, and diffs that other clients may hide.
This is how the project avoids debugging a black-box agent.
MVP Web Screens
Session List
Shows:
- Session id.
- CWD.
- Created time.
- Last turn status.
- Model.
- Error state.
Transcript
Shows:
- User messages.
- Assistant deltas.
- Tool calls.
- Tool results.
- Final response.
- Compact summaries.
Event Timeline
Shows every raw event in order, with filters:
- Model.
- Tool.
- Permission.
- Memory.
- Skill.
- Error.
Permission Panel
Shows:
- Requested tool.
- Arguments.
- Risk class.
- CWD.
- Policy reason.
- Allow, deny, or edit decision.
Tool Inspector
Shows:
- Tool source: local, MCP, skill.
- Input.
- Output.
- Duration.
- Truncation.
- Error.
Diff Viewer
Shows:
- Files changed.
- Patch.
- User pre-existing changes warning.
- Generated changes by turn.
Context Inspector
Shows:
- Instruction files loaded.
- Skills metadata loaded.
- Memory entries loaded.
- MCP resources included.
- Token budget.
- Truncation or compaction decisions.
Regression Runner
Runs canned scenarios and stores:
- Input prompt.
- Fixture repo.
- Expected event predicates.
- Expected file diffs.
- Screenshots.
- Replay logs.
Regression Scenario Format
Each scenario should define:
id: basic-read-searchfixture: fixture-basic-jsmodel: fake-tool-modelprompt: "Find the add function and explain it."expect: events: - tool.started: search_text - tool.completed: search_text - turn.completed filesChanged: []Initial Scenarios
session-basic-response
Goal: prove session creation, streaming, and replay.
Expected:
- Session created.
- User message event.
- Model delta event.
- Turn completed.
- Replay matches live transcript.
permission-deny-shell
Goal: prove dangerous tools can be denied.
Expected:
- Shell permission requested.
- User denies.
- Tool does not execute.
- Model receives denial result.
patch-small-file
Goal: prove controlled file edit.
Expected:
- Read file.
- Patch permission requested.
- Patch applied.
- Diff displayed.
- Tests run after approval.
context-nested-agents
Goal: prove instruction hierarchy.
Expected:
- Global instruction loaded.
- Project instruction loaded.
- Directory instruction loaded.
- Later instruction overrides earlier instruction in context order.
skill-lazy-load
Goal: prove skills do not bloat base context.
Expected:
- Skill metadata available.
- Full skill body not loaded at session start.
- Skill loaded only after invocation.
mcp-stdio-tool
Goal: prove MCP tool discovery and call.
Expected:
- MCP server initialized.
- Tools listed.
- Namespaced tool called.
- Permission event exists.
session-compact-resume
Goal: prove compaction does not lose active instructions.
Expected:
- Context crosses budget threshold.
- Compaction event created.
- Session resumes.
- Instruction files still present in rebuilt context.
Playwright Expectations
The UI tests should assert:
- No blank transcript during streaming.
- Permission panel appears before risky tool execution.
- Diff viewer renders changed files.
- Event filters work.
- Replay screen matches live transcript snapshot.
Golden Event Tests
For each Web scenario, save a normalized event log fixture.
Normalize:
- Timestamps.
- Random ids.
- Durations.
- Provider request ids.
Do not normalize:
- Event order.
- Tool names.
- Permission decisions.
- File paths within fixture root.
- Error categories.