# Carrot Browser Bridge API

Base URL: use your bridge URL. Community-hosted instance: https://browser.carrotlabs.ai

Carrot Browser Bridge lets an authorized AI agent control the user's Chrome browser through either direct HTTP calls or MCP tools. The complete browser command surface is available through `POST /cmd`; convenience HTTP routes and MCP tools cover common operations.

## Agent Workflow

1. Ask the user to install/open the Carrot Chrome extension.
2. Ask the user to generate a pairing code from the Carrot side panel.
3. Claim the pairing code:

   ```http
   POST /sessions/claim
   Content-Type: application/json

   {"code":"ABC123","agent_name":"your-agent-name"}
   ```

4. Use the returned `session_token` on authenticated requests:

   ```http
   Authorization: Bearer SESSION_TOKEN
   ```

5. Start page work by calling `readPage` or `find`, then use returned `ref` IDs for interactions.

Local `--no-auth` servers do not require pairing or bearer tokens. Public or network-accessible bridge instances should normally require auth.

## Sessions and Scope

Pairing codes are single-use and expire quickly. Sessions default to 8 hours. The user chooses one of these scopes:

| Scope | Meaning |
|---|---|
| `tab:TAB_ID` | Commands are constrained to a single tab. |
| `window:WINDOW_ID` | Commands are constrained to one Chrome window. |
| `browser` | Commands can act across the browser. |

Tab-scoped sessions can use page reading, page interaction, navigation, debug, screenshots, and tab-local operations. Window-scoped sessions add window-local tab/window/group operations. Browser-scoped sessions can use all commands.

## Top-Level HTTP Routes

All agent-facing routes require `Authorization: Bearer SESSION_TOKEN` when auth is enabled, except public information routes.

### Public Information

| Route | Description |
|---|---|
| `GET /` | Human-readable bridge status page. |
| `GET /status` | JSON service status, auth mode, live counts, and doc links. |
| `GET /llms.txt` | Compact agent quick-start and discovery file. |
| `GET /api.md` | This full HTTP API and command reference. |
| `GET /openapi.json` | FastAPI OpenAPI schema for top-level HTTP routes. |
| `GET /privacy` | Privacy policy HTML for the community-hosted instance. |
| `GET /privacy.md` | Privacy policy Markdown for the community-hosted instance. |
| `/mcp` | Streamable HTTP MCP endpoint for MCP clients. |

### Session Routes

| Route | Body / Params | Description |
|---|---|---|
| `POST /sessions/claim` | `{"code":"ABC123","agent_name":"name"}` | Claim a user-generated pairing code. Returns `session_token`, `scope`, `browser_id`, and `expires_in`. |
| `GET /sessions` | none | List sessions visible to the current session. |

### Generic Browser Command

`POST /cmd` is the preferred HTTP interface for browser actions.

```http
POST /cmd
Content-Type: application/json
Authorization: Bearer SESSION_TOKEN

{"type":"readPage"}
```

The request body must include `type`; all other fields are passed as command parameters. The default command timeout is 30 seconds. Override it with `"_timeout": 60`.

Typical response shape:

```json
{"result": "...command-specific result..."}
```

Errors return JSON with an `error` field and a non-2xx status when the bridge or extension rejects the command.

### Convenience Routes

These routes wrap common `/cmd` command types. They are kept for simple clients; agents should prefer `POST /cmd` for the complete command surface.

| Route | Equivalent command | Notes |
|---|---|---|
| `POST /navigate` | `navigate` | Body: `url`, optional `tabId`. |
| `POST /execute` | `execute` | Body: `script`, optional `tabId`, `world`. |
| `POST /click` | `click` | Body: `selector`, optional `index`, `tabId`. |
| `POST /query` | `query` | Body: `selector`, optional `limit`, `tabId`. |
| `POST /scroll` | `scroll` | Body: optional `selector`, `direction`, `tabId`. |
| `POST /press` | `press` | Body: `key`, optional `tabId`. |
| `POST /close` | `closeTab` | Body: `tabId`. |
| `GET /tabs` | `tabs` | Lists tabs, constrained by session scope. |
| `GET /focused` | `focused` | Active tab in the last-focused window/scope. |
| `GET /screenshot` | `screenshot` | Query: optional `windowId`, `tabId`, `format`. |
| `GET /windows` | `getWindows` | Lists browser windows and their tabs. |
| `GET /groups` | `queryGroups` | Query: optional `windowId`. |
| `GET /history` | `searchHistory` | Query: `q`, `maxResults`, optional `startTime`. |
| `GET /bookmarks` | `searchBookmarks` | Query: `q`. |
| `GET /bookmark_tree` | `getBookmarkTree` | Full bookmark tree. |

## Command Reference

All commands below are sent as JSON to `POST /cmd`.

### Page Reading and Element Discovery

Always call `readPage` or `find` before interacting with elements. Prefer `ref` IDs over CSS selectors for actions because refs target the page snapshot the extension just observed.

| Command | Parameters | Description |
|---|---|---|
| `readPage` | optional `selector`, `maxElements`, `tabId` | Accessibility-style tree with visible text and interactive elements. Returns `ref` IDs. |
| `find` | `query`; optional `role`, `limit`, `tabId` | Search visible elements by text, aria-label, placeholder, and role. Returns `ref` IDs. |
| `getPageText` | optional `selector`, `maxLength`, `tabId` | Extract page text, up to 500k chars by default. |
| `dom` | optional `selector`, `tabId` | Alias-like text/DOM extraction path used by older clients. |
| `query` | `selector`; optional `limit`, `tabId` | CSS selector query returning refs and text. |

Examples:

```json
{"type":"readPage","selector":"body","maxElements":500}
{"type":"find","query":"Submit","role":"button","limit":5}
{"type":"query","selector":"button.primary","limit":10}
```

### Page Interaction

Most interaction commands accept either `ref` or `selector`; use `ref` when available. Many also accept `tabId`.

| Command | Parameters | Description |
|---|---|---|
| `click` | `ref` or `selector`; optional `index`, `tabId` | Click an element. |
| `hover` | `ref` or `selector`; optional `index`, `tabId` | Hover an element. |
| `type` | `text`; optional `ref`, `selector`, `tabId` | Type with simulated keyboard input. |
| `formInput` | `value`; optional `ref`, `selector`, `tabId` | Set input/select/checkbox/radio/textarea values directly. |
| `fillContentEditable` | `text`; optional `ref`, `selector`, `maxScrolls`, `tabId` | Fill rich text/contenteditable editors. |
| `scroll` | optional `direction`, `amount`, `ref`, `selector`, `tabId` | Scroll page or element. Directions: `up`, `down`, `left`, `right`. |
| `press` | `key`; optional `ref`, `selector`, `tabId` | Send a key press such as `Enter`, `Tab`, `Escape`, or `ArrowDown`. |

Examples:

```json
{"type":"click","ref":"ref_42"}
{"type":"formInput","ref":"ref_7","value":"hello@example.com"}
{"type":"press","key":"Enter"}
```

### Navigation and JavaScript

| Command | Parameters | Description |
|---|---|---|
| `navigate` | `url`; optional `tabId` | Navigate a tab. |
| `goBack` | optional `tabId` | Navigate back. |
| `goForward` | optional `tabId` | Navigate forward. |
| `execute` | `script`; optional `tabId`, `world` | Execute JavaScript in the page. Prefer page actions for normal interaction. |

Examples:

```json
{"type":"navigate","url":"https://example.com"}
{"type":"execute","script":"document.title","world":"MAIN"}
```

### Tabs

| Command | Parameters | Description |
|---|---|---|
| `focused` | optional `tabId`, `windowId` | Return focused tab context. |
| `tabs` | optional `tabId`, `windowId` | List tabs or return one tab. |
| `activateTab` | `tabId` | Focus Chrome and activate a tab. |
| `createTab` | optional `url`, `active`, `pinned`, `index`, `windowId` | Create a tab. |
| `closeTab` | optional `tabId` | Close a tab; defaults to active scoped tab where allowed. |
| `reloadTab` | optional `tabId`, `bypassCache` | Reload a tab. |
| `duplicateTab` | optional `tabId` | Duplicate a tab. |
| `pinTab` | optional `tabId`, `pinned` | Pin or unpin a tab. |
| `muteTab` | optional `tabId`, `muted` | Mute or unmute a tab. |
| `moveTab` | optional `tabId`, `index`, `windowId` | Move a tab. |
| `discardTab` | optional `tabId` | Discard a tab to free memory. |

Examples:

```json
{"type":"createTab","url":"https://example.com","active":true}
{"type":"activateTab","tabId":123}
{"type":"moveTab","tabId":123,"index":0}
```

### Tab Groups

| Command | Parameters | Description |
|---|---|---|
| `groupTabs` | `tabIds`; optional `groupId`, `title`, `color`, `collapsed` | Group tabs and optionally update group metadata. |
| `ungroupTabs` | `tabIds` | Remove tabs from groups. |
| `updateGroup` | `groupId`; optional `title`, `color`, `collapsed` | Update group metadata. |
| `queryGroups` | optional `queryInfo` | Query tab groups. Window scope injects its `windowId`. |

Example:

```json
{"type":"groupTabs","tabIds":[1,2],"title":"Research","color":"blue"}
```

### Windows

| Command | Parameters | Description |
|---|---|---|
| `getWindows` | optional `windowId` | List windows with tabs. |
| `createWindow` | optional `url`, `windowType`, `state`, `incognito`, `focused`, `left`, `top`, `width`, `height` | Create a Chrome window. |
| `updateWindow` | `windowId`; optional `state`, `focused`, `drawAttention`, `left`, `top`, `width`, `height` | Update a window. |
| `closeWindow` | `windowId` | Close a window. |
| `resizeWindow` | optional `windowId`, `width`, `height`, `left`, `top` | Resize/reposition a window; defaults to last-focused window. |

Example:

```json
{"type":"createWindow","url":"https://example.com","state":"maximized"}
```

### Screenshots

| Command | Parameters | Description |
|---|---|---|
| `screenshot` | optional `windowId`, `tabId`, `format`, `quality` | Capture visible tab or a specific tab. `format` defaults to `png`; `quality` defaults to 90. |

Example:

```json
{"type":"screenshot","tabId":123,"format":"png"}
```

### Debugging

Console and network commands install page interceptors. Call with `"install": true` once per page load, then call again to read buffered events.

| Command | Parameters | Description |
|---|---|---|
| `readConsole` | optional `install`, `tabId` | Read console log/warn/error messages. |
| `readNetwork` | optional `install`, `tabId` | Read fetch/XHR network activity. |

Examples:

```json
{"type":"readConsole","install":true}
{"type":"readNetwork","install":true}
```

### History, Bookmarks, Downloads, Notifications

Some browser-wide commands require browser scope. Window/tab scopes may reject commands that expose browser-wide data or act outside the granted scope.

| Command | Parameters | Description |
|---|---|---|
| `searchHistory` | optional `query`, `startTime`, `endTime`, `maxResults` | Search Chrome history. |
| `deleteHistory` | `url` | Delete a URL from history. |
| `searchBookmarks` | optional `query` | Search bookmarks. |
| `createBookmark` | optional `title`, `url`, `parentId` | Create a bookmark or folder. |
| `getBookmarkTree` | none | Return the full bookmark tree. |
| `download` | `url`; optional `filename`, `saveAs` | Start a browser download. |
| `getDownloads` | optional `queryInfo` | Search downloads. |
| `notify` | optional `title`, `message`, `iconUrl` | Create a Chrome notification. |

Examples:

```json
{"type":"searchHistory","query":"docs","maxResults":25}
{"type":"createBookmark","title":"Carrot","url":"https://carrotlabs.ai"}
{"type":"download","url":"https://example.com/file.pdf","filename":"file.pdf"}
```

## MCP

MCP clients can connect to any bridge instance's `/mcp` endpoint. For the community-hosted instance:

```text
https://browser.carrotlabs.ai/mcp
```

Use the `claim_session` MCP tool with a user-generated pairing code, then call the available MCP tools. MCP is a convenience interface for common commands; the direct HTTP `/cmd` endpoint remains the canonical complete command surface.

## Extension and Legacy Routes

The Chrome extension connects to `GET /ws` by WebSocket. Older no-auth local setups may use `GET /poll` and `POST /complete`; those legacy routes are not available when auth is enabled. The extension also uses `POST /ping` for browser registration/keepalive.

## Best Practices for Agents

- Read `/llms.txt` first for the compact flow, then `/api.md` for details.
- Always call `readPage` or `find` before interacting with page elements.
- Prefer `ref` IDs over selectors.
- Use `formInput` for form values; use `type` when simulated keystrokes matter.
- Use `fillContentEditable` for rich text editors.
- Respect the scope the user granted. Do not ask for broader scope unless the task truly needs it.
- Verify important actions by reading the page again.
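
## Example: Minimal Python Client Sketch

The claim-then-command flow described under Agent Workflow and Generic Browser Command might look roughly like this in Python, using only the standard library. This is an illustrative sketch, not an official client: the helper names `build_claim_request`, `build_cmd_request`, and `send` are hypothetical, and the 35-second HTTP timeout is an assumption chosen to sit slightly above the documented 30-second command default.

```python
import json
import urllib.request

# Community-hosted instance named in this document; swap in your own bridge URL.
BASE_URL = "https://browser.carrotlabs.ai"


def build_claim_request(code, agent_name, base_url=BASE_URL):
    """Build the POST /sessions/claim request for a user-generated pairing code."""
    body = json.dumps({"code": code, "agent_name": agent_name}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/sessions/claim",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_cmd_request(session_token, command_type, timeout=None,
                      base_url=BASE_URL, **params):
    """Build a POST /cmd request; extra keyword arguments become command parameters."""
    payload = {"type": command_type, **params}
    if timeout is not None:
        payload["_timeout"] = timeout  # overrides the default command timeout
    headers = {"Content-Type": "application/json"}
    if session_token:
        # Local --no-auth servers need no bearer token, so pass None there.
        headers["Authorization"] = f"Bearer {session_token}"
    return urllib.request.Request(
        f"{base_url}/cmd",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request, http_timeout=35):
    """Send a request and decode the JSON body.

    urllib raises urllib.error.HTTPError on non-2xx responses, which is how a
    rejected command would surface here.
    """
    with urllib.request.urlopen(request, timeout=http_timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a reachable bridge and a real pairing code):
#   session = send(build_claim_request("ABC123", "your-agent-name"))
#   token = session["session_token"]
#   page = send(build_cmd_request(token, "readPage", maxElements=500))
#   send(build_cmd_request(token, "click", ref="ref_42"))
```

Separating request construction from sending keeps the payload shape easy to inspect and log before anything touches the user's browser.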