Skip to main content
Control a Chromium browser for web navigation, element interaction and content extraction. Supports JavaScript-rendered pages and uses a compact DOM snapshot so the Agent can efficiently understand page structure.

Installation

  1. Supported on Ubuntu 20.04+, Debian 10+, macOS and Windows. Older systems such as Ubuntu 18.04 will fall back to a compatible version automatically.
  2. The browser tool has heavy dependencies (~300MB) and is optional. For lightweight web content retrieval, use the web_fetch tool.

Workflow

A typical browser workflow for the Agent:
  1. navigate — Open the target URL
  2. snapshot — Get a compact DOM with auto-numbered interactive elements (ref)
  3. click / fill / select — Operate elements by ref
  4. snapshot — Snapshot again to verify the result

Supported Actions

ActionDescriptionKey parameters
navigateOpen URLurl
snapshotGet structured page text (primary way)selector (optional)
clickClick an elementref or selector
fillFill text into an inputref or selector, text
selectSelect a dropdown optionref or selector, value
scrollScroll the pagedirection (up/down/left/right)
screenshotSave a screenshot to the workspacefull_page
waitWait for an element or timeoutselector, timeout
pressPress a key (Enter, Tab, etc.)key
back / forwardBrowser back / forward-
get_textGet an element’s text contentselector
evaluateRun JavaScriptscript

Use Cases

  • Access a URL to retrieve dynamic page content
  • Fill in forms and log in
  • Operate web elements (click buttons, select options, etc.)
  • Verify the result of a deployed web page
  • Scrape content that requires JS rendering

Run Mode

The browser picks a mode based on the runtime environment:
EnvironmentMode
macOS / WindowsHeaded (browser window visible)
Linux desktop (with DISPLAY)Headed
Linux server (no DISPLAY)Headless
You can override it in config.json:
{
  "tools": {
    "browser": {
      "headless": true
    }
  }
}

Persistent Login

Log in to a target site once and the Agent can keep using it. Two ways are supported:

Option 1: Persistent mode (default)

Works out of the box. Login state is saved under ~/.cow/browser_profile. No configuration needed. To disable persistence and start with a clean environment every time:
{
  "tools": {
    "browser": {
      "persistent": false
    }
  }
}

Option 2: CDP mode (attach to real Chrome)

Have the Agent connect to a separately launched real Chrome (instead of the Chromium bundled with Playwright) for full browser fingerprints. Useful for sites with strict bot detection. Launch Chrome with a debugging port and a dedicated user data directory:
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --remote-debugging-port=9222 \
  --user-data-dir="$HOME/.cow/chrome-cdp"
Then point the Agent at the endpoint in config.json:
{
  "tools": {
    "browser": {
      "cdp_endpoint": "http://localhost:9222"
    }
  }
}
Chrome 137+ requires --remote-debugging-port to be paired with a dedicated --user-data-dir. As a result, the CDP-launched Chrome cannot directly reuse the login state of your daily Chrome; you’ll need to log in once inside this dedicated profile.