
RFC: browser event capture #145

Open
rgarcia wants to merge 5 commits into main from rfc/browser-event-capture


@rgarcia (Contributor) commented Feb 5, 2026

Summary

Design document for a configurable browser event streaming system on the image server.

  • Captures CDP events (console, network, DOM, layout shifts, screenshots, interactions) over a raw WebSocket connection to Chrome
  • Tags every event with tab/frame/target context (session ID, target ID, frame ID) using Target.setAutoAttach with flatten: true
  • Computes meta-events for smart waiting: network_idle, layout_settled, navigation_settled (composite of dom_content_loaded + network_idle + layout_settled)
  • Dual-writes events to S2 streams (durable, multi-consumer) and a local ring buffer (SSE endpoint)
  • All capture is off by default; it is enabled or reconfigured via POST /events/start with a config body
  • Events are capped at 1MB (S2 limit); large network response bodies are truncated and marked with a truncated flag
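A minimal sketch of the response-body cut, assuming a hypothetical `truncateBody` helper and an assumed `maxEventBytes` budget (the RFC only states the 1MB S2 cap, not where the cut is made). Backing off to a UTF-8 rune boundary avoids splitting a multi-byte character before the body is JSON-encoded.

```go
package main

import "unicode/utf8"

// maxEventBytes is an assumed budget under the 1MB S2 record limit,
// leaving headroom for the event envelope (type, IDs, URL, ...).
const maxEventBytes = 1_000_000

// truncateBody cuts body to at most limit bytes, stepping back to a UTF-8
// rune boundary so no multi-byte character is split. The bool result is
// what would be copied into BrowserEvent.Truncated.
func truncateBody(body string, limit int) (string, bool) {
	if len(body) <= limit {
		return body, false
	}
	cut := limit
	// body[cut] exists because limit < len(body); back off while it is a
	// continuation byte, i.e. the middle of a multi-byte rune.
	for cut > 0 && !utf8.RuneStart(body[cut]) {
		cut--
	}
	return body[:cut], true
}
```

For example, cutting "héllo" at 2 bytes yields "h" rather than "h" plus half of "é".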

The full RFC is in .cursor/plans/2026-02-05-events.md. The PR also adds devtools-protocol/ as a reference for CDP domain definitions.

Test plan

  • Review RFC for completeness and correctness
  • Validate event schema covers agent use cases
  • Validate computed settling signals are useful wait primitives
  • Confirm S2 integration approach matches existing kernel patterns

Made with Cursor


Note

Low Risk
Documentation-only change that adds no runtime code or API surface; low risk aside from potentially setting expectations for future implementation.

Overview
Adds a new RFC document (.cursor/plans/2026-02-05-events.md) describing a proposed browser event capture/streaming system for the image server.

The doc specifies the intended event schema, capture configuration and endpoints (/events/start, /events/stop, /events/stream), computed “settling” events, multi-target CDP strategy, screenshot/S2 streaming approach, and a planned testing matrix.

Written by Cursor Bugbot for commit 30372ae.

Add design document for a configurable browser event streaming system
that captures CDP events (console, network, DOM, layout shifts,
screenshots, interactions), tags them with tab/frame context, and
writes them durably to S2 streams.

Co-authored-by: Cursor <cursoragent@cursor.com>
- layout_settled: start 1s timer after page_load, reset on each shift,
  emit when timer expires. Handles zero-shift pages correctly.
- screenshots: downscale PNG by halving dimensions if base64 exceeds
  ~950KB, rather than truncating (which corrupts binary data).

Co-authored-by: Cursor <cursoragent@cursor.com>
@rgarcia (Contributor Author) left a comment

Both addressed in 7b9c491. Thanks for the catches.

| Type | Trigger |
|------|---------|
| `network_idle` | Pending request count at 0 for 500ms after navigation |
| `layout_settled` | 1s of no layout-shift entries after page_load (timer resets on each shift) |
@rgarcia (Contributor Author)

Good catch -- the table and description were contradictory. Fixed in 7b9c491: after page_load, start a 1s timer. Each layout shift resets the timer. layout_settled fires when the timer expires (1s of quiet). For zero-shift pages, this correctly fires 1s after page_load.

| `interaction_key` | Injected JS | key, selector, tag |
| `interaction_scroll` | Injected JS | from_x, from_y, to_x, to_y, target_selector |
| `layout_shift` | Injected PerformanceObserver | score, sources (element, previous_rect, current_rect) |
| `screenshot` | ffmpeg x11grab (full display) | base64 PNG in data |
@rgarcia (Contributor Author)

Valid concern -- truncating base64 PNG data produces corrupt output. We don't support 4K displays, so this is unlikely in practice, but the plan now specifies: if the base64 PNG exceeds ~950KB, downscale by halving dimensions and re-encode. This keeps a usable PNG under the 1MB S2 limit. Fixed in 7b9c491.

@ulziibay-kernel (Contributor) left a comment

Is this infra through which we can log the IP addresses that browser sessions are assigned? That is highly relevant for https://linear.app/onkernel/issue/KERNEL-801/residential-ip-reputation-measurement

@Sayan- (Contributor) left a comment

Building on top of CDP and S2 makes sense! I think the main risks will be the signal settling and Chromium lifecycle handling, but those are all solvable problems.

@rgarcia (Contributor Author) commented Feb 6, 2026

Responding to all of @Sayan-'s comments inline below. Will push a spec update commit shortly.

CDP perf (L37): Dug into the Chromium source (content/browser/devtools/devtools_session.cc and devtools_agent_host_impl.cc). CDP connections are fully session-isolated: each WebSocket client gets its own DevToolsSession with its own handlers_ map, and each domain handler has its own enabled_ flag. Events flow through SendProtocolNotification → DispatchProtocolMessageToClient → client_->DispatchProtocolMessage(), scoped to the session that enabled the domain. So a second CDP connection does NOT cause Chrome to "double" events — each connection only receives events for domains it has explicitly enabled. The overhead is one additional WebSocket connection + the serialization of events the monitor subscribes to. User opts into the scope/duration/quantity of events to capture, so this is acceptable. Will add a note to the spec clarifying this and recommending a benchmark under load once implemented.
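To make "each connection only receives events for domains it has explicitly enabled" concrete, here is a sketch of the opt-in commands a monitor connection might send. The `cdpCommand` type and the `enableCommands`/`frames` helpers are hypothetical; the method names (`Network.enable`, `Runtime.enable`, `Target.setAutoAttach` with `autoAttach`, `waitForDebuggerOnStart`, `flatten`) are CDP's. With `flatten: true`, commands for an attached target carry its `sessionId` at the top level, so one WebSocket can address every tab and iframe it has attached to.

```go
package main

import "encoding/json"

// cdpCommand is the JSON shape of a CDP request. In flattened mode,
// commands for an attached target carry that target's sessionId.
type cdpCommand struct {
	ID        int64          `json:"id"`
	Method    string         `json:"method"`
	Params    map[string]any `json:"params,omitempty"`
	SessionID string         `json:"sessionId,omitempty"`
}

// enableCommands builds the opt-in sequence for one session ("" = the
// root browser connection). Only the listed domains will send events to
// this connection; which domains to list would come from the capture config.
func enableCommands(nextID *int64, sessionID string, domains []string) []cdpCommand {
	var cmds []cdpCommand
	add := func(method string, params map[string]any) {
		*nextID++
		cmds = append(cmds, cdpCommand{ID: *nextID, Method: method, Params: params, SessionID: sessionID})
	}
	for _, d := range domains {
		add(d+".enable", nil)
	}
	// Auto-attach to child targets (tabs, out-of-process iframes) on the same socket.
	add("Target.setAutoAttach", map[string]any{
		"autoAttach":             true,
		"waitForDebuggerOnStart": false,
		"flatten":                true,
	})
	return cmds
}

// frames serializes each command as one WebSocket text-frame payload.
func frames(cmds []cdpCommand) ([]string, error) {
	out := make([]string, 0, len(cmds))
	for _, c := range cmds {
		b, err := json.Marshal(c)
		if err != nil {
			return nil, err
		}
		out = append(out, string(b))
	}
	return out, nil
}
```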

monitor_disconnected/reconnected (L39): Agreed. Will add monitor_disconnected and monitor_reconnected as synthetic events so consumers know there's a gap in the stream.
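On the consumer side, the synthetic events make gaps easy to find. A minimal sketch, assuming a pared-down `event` type (not the full BrowserEvent) and a hypothetical `findGaps` helper:

```go
package main

// event is a pared-down BrowserEvent with only the fields gap detection needs.
type event struct {
	TS   int64 // unix millis
	Type string
}

// gap is a window during which the monitor was not receiving CDP events.
type gap struct {
	FromTS, ToTS int64
}

// findGaps pairs each monitor_disconnected with the next monitor_reconnected.
// A disconnect with no matching reconnect yields an open gap (ToTS = -1).
func findGaps(events []event) []gap {
	var gaps []gap
	open := false
	var from int64
	for _, e := range events {
		switch e.Type {
		case "monitor_disconnected":
			if !open {
				open, from = true, e.TS
			}
		case "monitor_reconnected":
			if open {
				gaps = append(gaps, gap{from, e.TS})
				open = false
			}
		}
	}
	if open {
		gaps = append(gaps, gap{from, -1})
	}
	return gaps
}
```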

Sequence ID / ordering (L52): Will add a monotonic seq field to BrowserEvent that resets on server startup. This gives total ordering within a capture session. (seq, type, ts) triples can be used for deduplication. SSE events will include id: <seq> so clients can use Last-Event-ID for reconnection.
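A sketch of the seq counter and SSE framing. `seqCounter`, `sseFrame`, and `resumeAfter` are hypothetical names; the `event:` line and having the first event get seq 1 are assumptions. The `id:` field and the `Last-Event-ID` request header that EventSource clients send on reconnect are standard SSE behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync/atomic"
)

// seqCounter hands out monotonic seq values. It starts at 0 when the
// process starts, so the first event gets seq 1.
type seqCounter struct{ n atomic.Uint64 }

// Next is safe to call from multiple goroutines.
func (c *seqCounter) Next() uint64 { return c.n.Add(1) }

// sseFrame formats one event for the SSE stream. jsonData must be a single
// line (json.Marshal output contains no raw newlines).
func sseFrame(seq uint64, eventType string, jsonData []byte) string {
	return fmt.Sprintf("id: %d\nevent: %s\ndata: %s\n\n", seq, eventType, jsonData)
}

// resumeAfter parses a Last-Event-ID header value. ok is false when it is
// absent or malformed, in which case the server would stream from the tail.
func resumeAfter(lastEventID string) (uint64, bool) {
	s := strings.TrimSpace(lastEventID)
	if s == "" {
		return 0, false
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}
```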

cdp_session_id (L56): Good call. Renaming session_id → cdp_session_id in the schema to avoid confusion with Kernel's broader "session" concept.

Custom transform semantics (L67): Confirmed — this is intentional. Each event type has a custom handler that maps CDP params into our data schema. This gives us control over field naming, truncation, context enrichment, and forward compatibility. Generic CDP passthrough would leak internal Chrome details and make the consumer contract fragile. Adding a new event type is a small, contained code change.

Ring buffer as single write path (L204): Yes — rewiring the architecture so the monitor writes only to the ring buffer, and the S2 writer is just another consumer (like SSE clients). Single write path, S2 latency decoupled from CDP processing. Updating the diagram and description.
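A minimal sketch of the single-write-path ring buffer, assuming contiguous seq values and hypothetical `ringBuffer`/`entry` types. Each consumer (S2 writer, SSE client) keeps its own cursor and calls `Since`; a real implementation would also need a wake-up mechanism (for example a channel or `sync.Cond`) rather than polling.

```go
package main

import "sync"

// entry is one serialized event keyed by its seq.
type entry struct {
	Seq  uint64
	Data []byte
}

// ringBuffer is a fixed-capacity log. The monitor is the only writer; when
// full, the oldest entry is overwritten. Capacity must be at least 1.
type ringBuffer struct {
	mu    sync.RWMutex
	items []entry
	next  int // slot the next Append writes
	size  int // number of valid entries
}

func newRingBuffer(capacity int) *ringBuffer {
	return &ringBuffer{items: make([]entry, capacity)}
}

// Append stores e, evicting the oldest entry when the buffer is full.
func (r *ringBuffer) Append(e entry) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.items[r.next] = e
	r.next = (r.next + 1) % len(r.items)
	if r.size < len(r.items) {
		r.size++
	}
}

// Since returns retained entries with Seq > after, oldest first, and
// whether entries the reader had not seen yet were already evicted.
func (r *ringBuffer) Since(after uint64) ([]entry, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]entry, 0, r.size)
	start := (r.next - r.size + len(r.items)) % len(r.items)
	for i := 0; i < r.size; i++ {
		if e := r.items[(start+i)%len(r.items)]; e.Seq > after {
			out = append(out, e)
		}
	}
	missed := r.size > 0 && r.items[start].Seq > after+1
	return out, missed
}
```

Because S2 is just another reader, a slow S2 append only makes that reader's cursor lag; it never blocks the monitor's CDP loop.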

Testing gaps (L235): Adding failure mode test scenarios: Chrome crash/restart during capture (verify monitor_disconnected / monitor_reconnected + automatic re-subscribe), ring buffer overflow under high event volume (verify oldest events evicted, no crash), and /events/start when Chrome isn't ready (verify graceful error or queued start).

…tecture, failure tests

- Add monotonic `seq` field to BrowserEvent for total ordering and SSE reconnection
- Rename session_id → cdp_session_id to avoid confusion with Kernel sessions
- Rewire architecture: monitor writes only to ring buffer, S2 writer is a consumer
- Add CDP connection isolation note (confirmed from Chromium source)
- Add monitor_disconnected/reconnected synthetic events for gap detection
- Add e2e_events_failure_test.go for Chrome crash, ring buffer overflow, early start
- Update SSE endpoint to include id: <seq> for Last-Event-ID support

Co-authored-by: Cursor <cursoragent@cursor.com>
cursor bot commented Feb 6, 2026

Bugbot Autofix prepared fixes for 3 of the 3 bugs found in the latest run.

  • ✅ Fixed: Deduplication strategy fails across server restarts
    • Added a capture_session_id (UUIDv4) field to BrowserEvent and changed deduplication guidance from (seq, type, ts) to (capture_session_id, seq), which is unique across server restarts.
  • ✅ Fixed: Computed events have undocumented config dependencies
    • Documented that computed_events requires network, navigation, layout_shifts, and interactions, and specified that config validation will auto-enable these dependencies with a warning.
  • ✅ Fixed: Missing state reset after Chrome crash reconnection
    • Added explicit reconnect protocol requiring full settling state reset (counters, timers, booleans), script re-injection (PerformanceObserver, interaction JS), and CDP domain re-subscription, with corresponding test plan updates.
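Assuming the S2 writer appends in seq order, a consumer can apply the (capture_session_id, seq) deduplication guidance with a per-capture-session high-water mark instead of remembering every pair. `deduper` is a hypothetical helper, and the in-order assumption should be checked against how the writer retries batches.

```go
package main

// deduper drops redelivered events (S2 delivery is at-least-once). seq
// resets on server restart, but capture_session_id is a fresh UUIDv4 per
// capture, so the pair stays unique across restarts.
type deduper struct {
	lastSeq map[string]uint64 // capture_session_id -> highest seq accepted
}

func newDeduper() *deduper {
	return &deduper{lastSeq: make(map[string]uint64)}
}

// Accept reports whether the event is new, recording it if so.
func (d *deduper) Accept(captureSessionID string, seq uint64) bool {
	last, seen := d.lastSeq[captureSessionID]
	if seen && seq <= last {
		return false // replay of an already-accepted event
	}
	d.lastSeq[captureSessionID] = seq
	return true
}
```

Under the old (seq, type, ts) scheme, a post-restart event that happened to share seq and type with an earlier one (and landed in the same millisecond) could be wrongly dropped; keying on capture_session_id removes that collision.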

Push these changes by commenting:

@cursor push 80a186b49e
Preview (80a186b49e)
diff --git a/.cursor/plans/2026-02-05-events.md b/.cursor/plans/2026-02-05-events.md
--- a/.cursor/plans/2026-02-05-events.md
+++ b/.cursor/plans/2026-02-05-events.md
@@ -54,20 +54,21 @@
 
 ```go
 type BrowserEvent struct {
-    Seq           uint64          `json:"seq"`                           // monotonic sequence number, resets on server startup
-    Timestamp     int64           `json:"ts"`                            // unix millis
-    Type          string          `json:"type"`                          // snake_case event name
-    TargetID      string          `json:"target_id,omitempty"`           // CDP target ID (tab/window)
-    CDPSessionID  string          `json:"cdp_session_id,omitempty"`      // CDP session ID (not Kernel session)
-    FrameID       string          `json:"frame_id,omitempty"`            // CDP frame ID
-    ParentFrameID string          `json:"parent_frame_id,omitempty"`     // non-empty = iframe
-    URL           string          `json:"url,omitempty"`                 // URL context
-    Data          json.RawMessage `json:"data"`                          // event-specific payload
-    Truncated     bool            `json:"truncated,omitempty"`           // true if payload was cut to fit 1MB
+    CaptureSessionID string          `json:"capture_session_id"`            // unique ID generated at each capture start (UUIDv4), stable across reconnects within one session
+    Seq              uint64          `json:"seq"`                           // monotonic sequence number, resets on server startup
+    Timestamp        int64           `json:"ts"`                            // unix millis
+    Type             string          `json:"type"`                          // snake_case event name
+    TargetID         string          `json:"target_id,omitempty"`           // CDP target ID (tab/window)
+    CDPSessionID     string          `json:"cdp_session_id,omitempty"`      // CDP session ID (not Kernel session)
+    FrameID          string          `json:"frame_id,omitempty"`            // CDP frame ID
+    ParentFrameID    string          `json:"parent_frame_id,omitempty"`     // non-empty = iframe
+    URL              string          `json:"url,omitempty"`                 // URL context
+    Data             json.RawMessage `json:"data"`                          // event-specific payload
+    Truncated        bool            `json:"truncated,omitempty"`           // true if payload was cut to fit 1MB
 }

-The seq field provides total ordering within a capture session. Consumers can use (seq, type, ts) triples for deduplication (S2 provides at-least-once delivery). The counter is a uint64 incremented atomically and resets when the server process restarts.
+The capture_session_id is a UUIDv4 generated once when POST /events/start creates a new capture session; it remains stable across Chrome crash reconnections within that session but changes on every new start. The seq field provides total ordering within a capture session. Consumers should use (capture_session_id, seq) pairs for deduplication (S2 provides at-least-once delivery). This is safe across server restarts because capture_session_id is unique per session even though seq resets. The counter is a uint64 incremented atomically and resets when the server process restarts.

Event Types

@@ -101,6 +102,12 @@

These events let consumers detect gaps in the event stream rather than silently missing events during Chrome restarts.

+On reconnect, the monitor must perform a full state reset before re-subscribing to CDP domains:
+
+1. Reset settling state: zero the pending network request counter, cancel all in-flight timers (network idle, layout settled), and clear the navigation_settled boolean flags (dom_content_loaded_fired, network_idle_fired, layout_settled_fired). Without this, the request counter could be stuck at a non-zero value from requests that will never complete in the crashed context, blocking network_idle forever.
+2. Re-inject page scripts: the PerformanceObserver for layout shifts and the interaction tracking JS (clicks, keys, scrolls) lived in the old page context and are lost on crash. After reconnection and domain re-subscription, re-inject these scripts into all attached targets.
+3. Re-subscribe CDP domains: call the appropriate *.enable methods and Target.setAutoAttach on the new connection, as already implied by the architecture.
+
Computed meta-events (emitted by the monitor's settling logic):

| Type | Trigger |
@@ -204,7 +211,13 @@
       description: Inject JS to track clicks, keys, scrolls
     computed_events:
       type: boolean
-      description: Emit computed meta-events (network_idle, layout_settled, scroll_settled, navigation_settled)
+      description: >
+        Emit computed meta-events (network_idle, layout_settled, scroll_settled, navigation_settled).
+        Requires network=true (for network_idle request counting), navigation=true (for page load/navigation triggers),
+        layout_shifts=true (for layout_settled shift detection), and interactions=true (for scroll_settled tracking).
+        When computed_events is true, config validation will auto-enable these dependencies and log a warning
+        for any that were not explicitly set. Without these dependencies, computed events would produce
+        vacuously-true or never-firing signals.

## Multi-Target via setAutoAttach
@@ -233,8 +246,8 @@
|------|---------|
| `server/lib/cdpmonitor/monitor.go` | Core: raw coder/websocket CDP client, domain enablement, setAutoAttach, event dispatch loop |
| `server/lib/cdpmonitor/events.go` | BrowserEvent struct, event type constants, JSON serialization, 1MB truncation |
-| `server/lib/cdpmonitor/config.go` | EventCaptureConfig struct, validation, reconfiguration |
-| `server/lib/cdpmonitor/settling.go` | Network idle state machine, layout shift observer injection/polling, composite navigation_settled |
+| `server/lib/cdpmonitor/config.go` | EventCaptureConfig struct, validation (including computed_events dependency auto-enable), reconfiguration |
+| `server/lib/cdpmonitor/settling.go` | Network idle state machine, layout shift observer injection/polling, composite navigation_settled, full state reset on reconnect |
| `server/lib/cdpmonitor/interactions.go` | JS injection for click/key/scroll tracking, 500ms polling, scroll 300ms debounce |
| `server/lib/cdpmonitor/screenshot.go` | Full-display screenshot via ffmpeg x11grab, base64 encode, triggered by event hooks |
| `server/lib/cdpmonitor/s2writer.go` | Batched S2 append writer, graceful degradation |
@@ -258,7 +271,7 @@
| File | Coverage |
|------|----------|
| `events_test.go` | Event serialization, 1MB truncation (verify truncated flag set, payload under limit), snake_case type validation |
-| `config_test.go` | Config validation, defaults, reconfiguration merging, network_response_body requires network |
+| `config_test.go` | Config validation, defaults, reconfiguration merging, network_response_body requires network, computed_events auto-enables network/navigation/layout_shifts/interactions |
| `settling_test.go` | Network idle state machine (request counting, 500ms timer, reset on navigation), layout settled 1s timer, composite navigation_settled requires all 3 signals |
| `buffer_test.go` | Ring buffer overflow, subscriber catch-up, concurrent read/write safety |
| `s2writer_test.go` | Time-based and count-based flush batching, graceful skip when S2 not configured |
@@ -272,7 +285,7 @@
| `e2e_events_core_test.go` | **Lifecycle**: start/stop/restart capture. **Reconfigure**: start with network-only, verify no console events, reconfigure to add console, verify console events appear. **Console**: navigate to page with console.log/console.error, verify `console_log` and `console_error` events. **Network**: navigate to page that fetches an API, verify `network_request` + `network_response`, test with response bodies enabled, test large response truncation. |
| `e2e_events_navigation_test.go` | **Navigation & settling**: navigate between pages, verify `navigation`, `dom_content_loaded`, `page_load` events. Verify `network_idle`, `layout_settled`, `navigation_settled` fire in correct order. **Iframes**: load page with iframe, verify events carry correct `frame_id` and `parent_frame_id`. **Screenshots**: configure screenshot on `navigation_settled`, verify `screenshot` event with base64 PNG data. |
| `e2e_events_targets_test.go` | **Multi-target (setAutoAttach)**: open new tab via `window.open()`, verify `target_created` with correct URL and distinct `cdp_session_id`. Navigate in second tab, verify events attributed correctly. Close tab, verify `target_destroyed`. **Interactions**: click element, type in input, scroll page; verify `interaction_click`, `interaction_key`, `interaction_scroll`, `scroll_settled` events. |
-| `e2e_events_failure_test.go` | **Chrome crash/restart**: kill Chrome process during active capture, verify `monitor_disconnected` event with reason, verify automatic reconnection and `monitor_reconnected` event, verify domain re-subscription and events resume. **Ring buffer overflow**: generate high event volume (e.g., tight network request loop), verify oldest events are evicted without crash, verify SSE clients receive latest events. **Start before Chrome ready**: call `/events/start` before Chrome has finished launching, verify graceful error response (503) or queued start that activates once Chrome is available. |
+| `e2e_events_failure_test.go` | **Chrome crash/restart**: kill Chrome process during active capture, verify `monitor_disconnected` event with reason, verify automatic reconnection and `monitor_reconnected` event, verify domain re-subscription and events resume, verify settling state is fully reset (network_idle fires after reconnect navigation rather than staying stuck), verify PerformanceObserver and interaction tracking JS are re-injected (layout_settled and interaction events work post-reconnect). **Ring buffer overflow**: generate high event volume (e.g., tight network request loop), verify oldest events are evicted without crash, verify SSE clients receive latest events. **Start before Chrome ready**: call `/events/start` before Chrome has finished launching, verify graceful error response (503) or queued start that activates once Chrome is available. |

## Appendix: Prior Art

@rgarcia (Contributor Author) commented Feb 7, 2026

@cursor push 80a186b

- Add capture_session_id (UUIDv4) to BrowserEvent schema for robust
  deduplication via (capture_session_id, seq) instead of (seq, type, ts),
  preventing silent data loss across server restarts.

- Document computed_events config dependencies (network, navigation,
  layout_shifts, interactions) and specify auto-enable with warning.
  Without these, computed events produce vacuously-true or never-firing signals.

- Specify full settling state reset and script re-injection on Chrome
  crash reconnection: zero request counter, cancel timers, clear boolean
  flags, re-inject PerformanceObserver and interaction tracking JS.
  Update test plan to verify post-reconnect behavior.

Applied via @cursor push command
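A sketch of the computed_events auto-enable rule from this commit, assuming a hypothetical flat `eventCaptureConfig` with plain bools. Plain bools cannot distinguish "not set" from an explicit `false`; honoring the "not explicitly set" wording exactly would need `*bool` fields.

```go
package main

import "log"

// eventCaptureConfig is a hypothetical, flattened subset of the capture config.
type eventCaptureConfig struct {
	Network, Navigation, LayoutShifts, Interactions, ComputedEvents bool
}

// normalize switches on every dependency computed_events needs and returns
// the names it changed, in a fixed order, so the caller can warn about them.
func (c *eventCaptureConfig) normalize() []string {
	if !c.ComputedEvents {
		return nil
	}
	deps := []struct {
		name string
		flag *bool
	}{
		{"network", &c.Network},            // network_idle request counting
		{"navigation", &c.Navigation},      // page load / navigation triggers
		{"layout_shifts", &c.LayoutShifts}, // layout_settled shift detection
		{"interactions", &c.Interactions},  // scroll_settled tracking
	}
	var enabled []string
	for _, d := range deps {
		if !*d.flag {
			*d.flag = true
			enabled = append(enabled, d.name)
		}
	}
	return enabled
}

// validate applies normalize and logs one warning per auto-enabled dependency.
func validate(c *eventCaptureConfig) {
	for _, name := range c.normalize() {
		log.Printf("warning: computed_events requires %s=true; enabling it", name)
	}
}
```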
cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON. A Cloud Agent has been kicked off to fix the reported issue.

cursor bot commented Feb 7, 2026

Bugbot Autofix prepared fixes for 1 of the 1 bugs found in the latest run.

  • ✅ Fixed: layout_settled missing from screenshot triggers enum
    • Added layout_settled to the screenshot_triggers enum so it is consistent with the computed meta-events list that already includes it.

Push these changes by commenting:

@cursor push 84e8f5d0c7
Preview (84e8f5d0c7)
diff --git a/.cursor/plans/2026-02-05-events.md b/.cursor/plans/2026-02-05-events.md
--- a/.cursor/plans/2026-02-05-events.md
+++ b/.cursor/plans/2026-02-05-events.md
@@ -201,7 +201,7 @@
       type: array
       items:
         type: string
-        enum: [error, page_load, navigation_settled, scroll_settled, network_idle]
+        enum: [error, page_load, navigation_settled, scroll_settled, network_idle, layout_settled]
       description: Which events trigger a screenshot. Default [error, navigation_settled]
     targets:
       type: boolean

@rgarcia (Contributor Author) commented Feb 8, 2026

@cursor push 84e8f5d

The screenshot_triggers enum included all computed events (network_idle,
scroll_settled, navigation_settled) except layout_settled. This was an
oversight since layout_settled is a distinct computed meta-event and should
be available as a screenshot trigger for capturing visual stability moments
independently of network activity.

Applied via @cursor push command