Browserbase Release Notes
109 release notes curated from 75 sources by the Releasebot Team. Last updated: May 28, 2026
- May 28, 2026
- Date parsed from source: May 28, 2026
- First seen by Releasebot: May 28, 2026
stagehand/server-v3 v3.7.1
Stagehand adds a SEA --version flag in a new server release.
What's Changed
- STG-1746 Add SEA --version flag by @monadoid in #2167
Full Changelog: stagehand-server-v3/v3.7.0...stagehand-server-v3/v3.7.1
Original source
- May 27, 2026
- Date parsed from source: May 27, 2026
- First seen by Releasebot: May 28, 2026
stagehand/server-v3 v3.7.0
Stagehand adds a screenshot option to Extract and expands its verifier and eval tooling with an evaluator backend facade, a verifier evaluator shell and types, published eval results, and recorded agent trajectories. The release also removes the Braintrust API key requirement for evals and fixes the SDK API reference parameter labels.
What's Changed
- Add screenshot option to Extract by @miguelg719 in #2149
- feat(verifier): add evaluator backend facade by @miguelg719 in #2129
- Workflow: publish eval results by @miguelg719 in #2093
- fix[evals] remove braintrust api key requirement (#2145) by @miguelg719 in #2153
- Delete packages/server-v4 entirely for now by @pirate in #2151
- feat(verifier): add verifier evaluator shell and types by @miguelg719 in #2157
- feat(verifier): record agent trajectories by @miguelg719 in #2131
- Fix SDK API reference parameter labels by @monadoid in #2164
- [chore]: update lockfile by @seanmcguire12 in #2168
- [chore]: bump ws dep by @seanmcguire12 in #2169
- [STG-1756] forward Vertex model config by @monadoid in #2160
Full Changelog: stagehand-server-v3/v3.6.10...stagehand-server-v3/v3.7.0
Original source
- May 21, 2026
- Date parsed from source: May 21, 2026
- First seen by Releasebot: May 22, 2026
Fetch API now returns markdown and JSON
Browserbase adds an updated Fetch API that returns web content as markdown or JSON, with a bigger 5MB data cap and lower costs.
Use the updated Fetch API to fetch web content from any URL and return it as markdown or structured JSON. It is faster and cheaper than spinning up a browser, and the output formats are easier for humans and models to read.
Data cap raised from 1MB to 5MB.
Extract ~$4 / 1k pages, ~$7 / 1k with proxies.
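Here is a back-of-the-envelope estimator that turns the quoted rates into totals. It is a sketch that assumes billing scales linearly with the approximate list prices above (~$4 per 1,000 pages, ~$7 per 1,000 with proxies). Actual invoices may differ, and `estimateFetchCost` is a hypothetical helper, not part of any Browserbase SDK.

```javascript
// Approximate per-1,000-page rates from the release note (USD).
// Assumption: cost scales linearly with page count.
const RATE_PER_1K = { standard: 4, withProxies: 7 };

// Hypothetical helper: estimate spend for a batch of Fetch API calls.
function estimateFetchCost(pages, { proxies = false } = {}) {
  if (!Number.isInteger(pages) || pages < 0) {
    throw new RangeError("pages must be a non-negative integer");
  }
  const rate = proxies ? RATE_PER_1K.withProxies : RATE_PER_1K.standard;
  return (pages / 1000) * rate;
}

console.log(estimateFetchCost(25000)); // 100
console.log(estimateFetchCost(25000, { proxies: true })); // 175
```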
Original source
- May 20, 2026
- Date parsed from source: May 20, 2026
- First seen by Releasebot: May 28, 2026
stagehand/server-v3 v3.6.10
Stagehand ships a broad update with evals TUI improvements, onboarding flow updates, better Anthropic model support, selector and snapshot fixes, Chrome launch and CDP attachment enhancements, Vertex auth params, and refreshed docs plus dependency maintenance.
What's Changed
- Version Packages by @github-actions[bot] in #2067
- [chore]: bump mcp sdk & fastify by @seanmcguire12 in #2106
- Evals TUI tree traversal by @miguelg719 in #2100
- [docs]: add docs for ignoreSelectors by @seanmcguire12 in #2097
- [docs]: add ignoreSelectors to docs for extract() by @seanmcguire12 in #2088
- [fix] Anthropic CUA triple_click mapping (#2104) by @miguelg719 in #2107
- [evals] Onboarding flow by @miguelg719 in #2103
- Updated readme for evals package by @miguelg719 in #2112
- [chore]: bump more deps by @seanmcguire12 in #2114
- [chore]: bump mintlify version by @seanmcguire12 in #2115
- [fix]: include native selector state into snapshot by @seanmcguire12 in #2116
- Pin github action/workflow versions by @miguelg719 in #2121
- Fix structuredOutputMode for newer Anthropic models by @miguelg719 in #2120
- [chore]: omit test files from published package by @seanmcguire12 in #2122
- [chore]: rm deepmerge from peer deps by @seanmcguire12 in #2124
- [chore]: rm langchain deps by @seanmcguire12 in #2123
- [fix]: import ToolSet from ai public export by @seanmcguire12 in #2126
- [feat]: forward ignoreDefaultArgs to chrome-launcher by @seanmcguire12 in #2127
- [chore]: rm cli release workflow by @seanmcguire12 in #2143
- Pass local launch options when attaching over CDP by @shriyatheunicorn in #2146
- [STG-1756] add Vertex auth params to Stagehand spec by @monadoid in #2118
New Contributors
- @shriyatheunicorn made their first contribution in #2146
Full Changelog: stagehand-server-v3/v3.6.9...stagehand-server-v3/v3.6.10
Original source
- May 20, 2026
- Date parsed from source: May 20, 2026
- First seen by Releasebot: May 27, 2026
Adding Foreground Tab Tracking to the Chrome Devtools Protocol
Browserbase says Chrome Canary now exposes tabActive, tabPinned and tabGroupId in CDP TargetInfo, giving browser automation libraries reliable tab state without focus-stealing hacks.
TL;DR: Browser automation libraries no longer need crazy hacks to find the foreground tab! Chrome v150+ now exposes tabActive, tabPinned, tabGroupId, and more via TargetInfo.embedderData!
Chrome’s DevTools Protocol (CDP) has been the de facto standard for automating browsers for years.
It’s nearly feature-complete: it can evaluate JavaScript, inspect the DOM, watch network traffic, capture screenshots, drive input, and attach to workers and frames. However, until recently, browser automation libraries still had a painful blind spot: there was no CDP API to tell where a tab sits in the browser UI or whether it is in the foreground or background.
That matters when a browser driver needs to coexist with a human user or another agent.
Automation code often needs to answer basic questions:
- What order are the tabs in?
- Which tab is foregrounded?
- Is this tab pinned?
- Is this tab in a tab group?
- Which browser window contains this tab?
Chrome extension APIs have exposed most of this for a long time via chrome.tabs.query(...), but CDP lacked any equivalent API.
This post documents my work to fix this in a Chromium patch, which just landed in Chrome Canary v150.0.7848.0 (on May 20th, 2026)!
🎭 Playwright, Puppeteer, Selenium, Stagehand, etc. have historically struggled to track foreground tab state reliably.
Browser driver libraries have had to infer tab state indirectly. That leads to bad behavior.
❌ Some libraries assumed the most recently opened tab is the foreground tab. That breaks as soon as a human clicks elsewhere, another automation client activates something else, the browser restores tabs, or a tab opens in the background.
❌ Some libraries forced every tab they used into the foreground with Target.activateTarget. That is disruptive. It steals focus from the human, and on macOS it can bring the entire browser application to the front, which makes background browser agents almost unusable.
❌ Some libraries assumed the order returned by Target.getTargets matches the tab strip order or the most-recently-foregrounded order. It does neither.
❌ Some libraries injected scripts into every page to watch focus, visibility, mousemove, load state, or foreground JavaScript events. Those approaches are brittle. Browser UI transitions do not reliably fire page JS events, pages can be frozen or discarded, and injected scripts cannot access Chrome’s tab strip APIs.
The missing data was ordinary tab metadata users can see with their eyes: order, active state, pinned state, and group membership.
🫣 First Try: Adding a new Target.queryTabs() method
My first proposal was direct: add a CDP command named Target.queryTabs that worked exactly like the Extensions API chrome.tabs.query(...).
I prototyped this in my Chromium fork first:
https://github.com/pirate/chromium/pull/1
Then I submitted it to Gerrit:
https://chromium-review.googlesource.com/c/chromium/src/+/7787097
The Chrome DevTools reviewers pushed the design in a better direction: add an extensible embedderData object to Target.TargetInfo, then let Chrome fill it for tab targets.
👌 Second Try: Target.getTargets() + embedderData
The Google developers on the Chromium team pointed out that tabs can be embedded in many different UIs, not just Chrome (e.g. on Android, Chrome OS, Fuchsia, etc.). They wanted the embedder-specific tab metadata kept in its own container, so the core protocol stays generic across embedders with different tab models.
The final shape we landed on is:
```
Target.getTargets({ filter: [{ type: 'tab', exclude: false }] })
// -> { targetInfos: [...] }, where each tab entry looks like:
{
  "targetId": "...",
  "type": "tab",
  "title": "...",
  "url": "...",
  "attached": false,
  "browserContextId": "...",
  "embedderData": {
    "tabStripIndex": 0,
    "tabActive": true,
    "tabPinned": false,
    "tabGroupId": "..."
  }
}
```
tabGroupId is an optional string, present only when the tab is in a group.
browserContextId already exists on TargetInfo, so the final patch kept it there instead of duplicating it inside embedderData.
windowId stayed out of embedderData as well. CDP has Browser.getWindowForTarget for mapping a tab target to its window and bounds.
The change intentionally does not emit new Target.targetInfoChanged events when embedderData changes. For now this is pull-based: clients call Target.getTargets or Target.getTargetInfo.
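Because the data is pull-based for now, a client that wants to notice tab switches has to poll and compare snapshots. The sketch below shows one way to do that. It assumes a `cdp` client object with an async `send(method, params)` method, the same kind of object the `getAllTabs` example later in this post takes. `snapshotTabs`, `diffActiveTabs`, and `watchActiveTab` are hypothetical names. The 500 ms default interval is an arbitrary choice, not a recommendation from the patch.

```javascript
// Map tabTargetId -> embedderData for every tab target in a getTargets result.
function snapshotTabs(targetInfos) {
  const snapshot = new Map();
  for (const info of targetInfos) {
    if (info.type === "tab" && info.embedderData) {
      snapshot.set(info.targetId, { ...info.embedderData });
    }
  }
  return snapshot;
}

// Compare two snapshots and report which tabs gained or lost tabActive.
function diffActiveTabs(prev, next) {
  const activated = [];
  const deactivated = [];
  for (const [id, data] of next) {
    const wasActive = prev.get(id)?.tabActive === true;
    if (data.tabActive && !wasActive) activated.push(id);
    if (!data.tabActive && wasActive) deactivated.push(id);
  }
  // A tab that closed while active also counts as deactivated.
  for (const [id, data] of prev) {
    if (!next.has(id) && data.tabActive) deactivated.push(id);
  }
  return { activated, deactivated };
}

// Poll Target.getTargets on an interval and call onChange when the active set changes.
// Returns a stop() function. The first poll reports all active tabs as "activated".
function watchActiveTab(cdp, onChange, intervalMs = 500) {
  let prev = new Map();
  let timer = null;
  let stopped = false;
  async function tick() {
    try {
      const { targetInfos } = await cdp.send("Target.getTargets", {
        filter: [{ type: "tab", exclude: false }],
      });
      const next = snapshotTabs(targetInfos);
      const change = diffActiveTabs(prev, next);
      prev = next;
      if (change.activated.length || change.deactivated.length) onChange(change);
    } catch {
      // Sketch only: skip this round on any error (including one thrown by onChange).
    }
    if (!stopped) timer = setTimeout(tick, intervalMs);
  }
  tick();
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Keeping the diff as a pure function means the change-detection logic can be tested without a browser. Only `watchActiveTab` touches CDP.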
The feature is available in Chrome Canary today:
Chrome Canary >= 150.0.7848.0 (commit: 5aa804ae0b62bd1b0d54f57494211239e2ed5ffe)
📑 Why are there both tab and page targets for the same URL?
Most browser automation libraries are built around CDP targets of type "page". These are the targets you attach to for Runtime.*, Page.*, DOM.*, and similar page-level commands.
The new metadata lives on targets of type "tab" (which you may have never seen before).
In Chromium, a tab target is the browser/UI container. It corresponds to a WebContents and gives Chrome a reliable place to answer tab strip questions.
A page target is a renderer/main-frame debugging target: the surface you drive when you inspect DOM, evaluate JavaScript, or watch page lifecycle.
In 2020, Chromium changed its internal model to allow exposing multiple page-like targets associated with one tab, as part of the MPArch (Multi-Page Architecture) initiative. Internally, MPArch is used for prerendering and the back/forward cache, but curiously Chrome’s new split view feature does not use MPArch (split view tabs show up as multiple tab targets instead).
More historic context on MPArch:
- Overview of the MPArch Project in Chromium | Igalia Blogpost
- Multi-Page Architecture Original Design Document | Google
- Multi Page Architecture (BlinkOn 13) | YouTube Talk + Slides
- Pre-Rendering (BlinkOn 13) | YouTube Talk
- Fixing Long Tail of Features for MPArch | Google Doc
Because one tab can own several page targets, this model is too narrow:
```typescript
type Tab = { tabTargetId: string; pageTargetId: string; };
```
A better normalized model is:
```typescript
type Tab = { tabTargetId: string; pages: PageTarget[]; };
```
The tab target describes the browser UI container: where it sits in the tab strip and whether it is visible.
The page targets are the debuggable page surfaces you can send DOM.* commands to.
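One way to build that normalized model is to group the page targets reported by `Target.attachedToTarget` under the tab they were collected for. This minimal sketch assumes the events were already recorded as `{ tabTargetId, targetInfo }` pairs, a hypothetical shape chosen for illustration. It uses JSDoc types in place of TypeScript. The `PageTarget` fields shown are an assumption modeled on what the `getAllTabs` example in this post records.

```javascript
/**
 * @typedef {{ pageTargetId: string, url: string, title: string, subtype?: string }} PageTarget
 * @typedef {{ tabTargetId: string, pages: PageTarget[] }} Tab
 */

/**
 * Group attached page targets under the tab target they belong to.
 * @param {string[]} tabTargetIds tab target ids, already in tab strip order
 * @param {{ tabTargetId: string, targetInfo: object }[]} attachEvents hypothetical recorded events
 * @returns {Tab[]}
 */
function normalizeTabs(tabTargetIds, attachEvents) {
  const byTab = new Map(tabTargetIds.map((id) => [id, []]));
  for (const { tabTargetId, targetInfo } of attachEvents) {
    // Only page targets are kept; workers, iframes, etc. are skipped.
    if (targetInfo.type !== "page") continue;
    const pages = byTab.get(tabTargetId);
    if (!pages) continue; // event for a tab we did not ask about
    pages.push({
      pageTargetId: targetInfo.targetId,
      url: targetInfo.url,
      title: targetInfo.title,
      subtype: targetInfo.subtype, // e.g. "prerender" for a prerendered page
    });
  }
  return tabTargetIds.map((id) => ({ tabTargetId: id, pages: byTab.get(id) }));
}
```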
🧑💻 So how do we use this new feature?
```javascript
// `cdp` is assumed to be a thin CDP client exposing:
//   send(method, params) -> Promise<result>
//   onEvent(eventName, handler) -> unsubscribe function
async function getAllTabs(cdp) {
  // get the list of all *tab* targets (not page targets)
  const tabsResp = await cdp.send("Target.getTargets", {
    filter: [{ type: "tab", exclude: false }],
  });

  // sort them by tab strip UI order
  const tabs = tabsResp.targetInfos
    .filter((target) => target.type === "tab" && target.embedderData)
    .sort((a, b) => a.embedderData.tabStripIndex - b.embedderData.tabStripIndex);

  const tabs_with_info = [];

  // for each tab, get the pages within
  for (const tab of tabs) {
    const pages = [];

    // set up the event listener to collect page info
    const stopCollecting = cdp.onEvent("Target.attachedToTarget", (event) => {
      if (event.targetInfo?.type !== "page") return;
      pages.push({
        pageTargetId: event.targetInfo.targetId,
        pageSessionId: event.sessionId,
        url: event.targetInfo.url,
        title: event.targetInfo.title,
        subtype: event.targetInfo.subtype,
      });
    });

    // tell CDP to attach all the pages for this tab (triggers the listener above);
    // this relies on the attach events arriving before the command resolves
    await cdp.send("Target.autoAttachRelated", {
      targetId: tab.targetId,
      waitForDebuggerOnStart: false,
      filter: [{ type: "page", exclude: false }],
    });
    stopCollecting();

    // get window info for the tab
    let windowId;
    try {
      const windowInfo = await cdp.send("Browser.getWindowForTarget", { targetId: tab.targetId });
      windowId = windowInfo.windowId;
    } catch {
      // Some embedders or target kinds may lack window lookup support.
    }

    tabs_with_info.push({
      tabIdx: tab.embedderData.tabStripIndex,
      tabActive: tab.embedderData.tabActive,
      tabPinned: tab.embedderData.tabPinned,
      tabGroupId: tab.embedderData.tabGroupId ?? null,
      tabTargetId: tab.targetId,
      windowId,
      primaryUrl: tab.url,
      primaryTitle: tab.title,
      pages,
    });
  }

  return tabs_with_info;
}
```
➕ Chromium Contribution Flow for Non-Google Devs
This was my first Chromium patch that I’ve submitted upstream, and I hit several process issues that were obvious in hindsight.
The path that worked was:
- Opened an issue with a detailed problem statement and got stakeholders to upvote it: https://issues.chromium.org/u/1/issues/497896141
- Prototyped the fix on a GitHub fork (just for convenience; I find GitHub easier to use than Gerrit): https://github.com/pirate/chromium/pull/1
- Submitted my patchset to Gerrit: https://chromium-review.googlesource.com/c/chromium/src/+/7787097
- Updated the patch in Gerrit based on review comments and CI errors:
  - used an alias email for Gerrit & AUTHORS to avoid getting spam for years
  - added myself to AUTHORS (keeping AUTHORS in alphabetical order)
  - added test coverage on all the happy paths (it’s ok if not every null branch is covered)
- Got code owner review and a Chromium committer to trigger CI + the commit queue.
I never once had to build all of Chromium on my local machine (which would’ve taken 8hr+).
I used cheap local checks where possible, and relied on Gerrit CI for the full platform matrix.
The most useful local checks were:
```shell
git diff --check
python3 third_party/inspector_protocol/convert_protocol_to_json.py third_party/blink/public/devtools_protocol/browser_protocol.pdl /tmp/browser_protocol.json
git cl presubmit -f --files=...
```
🚀 Potential Future Improvements
This is a big improvement already, but there are several things we could do to make it even easier to interact with tab state via CDP:
- a way to get all page/tab/window state in a single call without multiple roundtrips/events
- an event that notifies when foreground tab or tab ordering/grouping/pinning state changes
- inline tabTargetId and pageTargetId on every response to easily link tabs<->pages
- a CDP field to get the chrome.tabs.query()[0].id integer tab ID exposed to Extensions
For now, Target.getTargets({ filter: [{ type: "tab", exclude: false }] }), Target.autoAttachRelated, and Browser.getWindowForTarget are enough to build a reliable tab inventory without focus-stealing hacks.
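As a usage sketch, the inventory built by `getAllTabs` above can answer the questions listed at the start of this post for each window, without activating any tab. `summarizeWindows` is a hypothetical helper. It assumes the entry fields `getAllTabs` produces (`windowId`, `tabIdx`, `tabActive`, `tabPinned`, `tabGroupId`, `tabTargetId`). It also assumes that at most one tab per window reports `tabActive: true`, mirroring how `chrome.tabs` reports one active tab per window.

```javascript
// Summarize a getAllTabs()-style inventory per browser window.
function summarizeWindows(inventory) {
  const byWindow = new Map();
  for (const tab of inventory) {
    // windowId is undefined when Browser.getWindowForTarget failed.
    const key = tab.windowId ?? "unknown";
    if (!byWindow.has(key)) byWindow.set(key, []);
    byWindow.get(key).push(tab);
  }
  const summary = new Map();
  for (const [windowId, tabs] of byWindow) {
    // Order this window's tabs by their position in the tab strip.
    const ordered = [...tabs].sort((a, b) => a.tabIdx - b.tabIdx);
    const groups = {};
    for (const t of ordered) {
      if (t.tabGroupId == null) continue; // ungrouped tab
      if (!groups[t.tabGroupId]) groups[t.tabGroupId] = [];
      groups[t.tabGroupId].push(t.tabTargetId);
    }
    summary.set(windowId, {
      activeTabTargetId: ordered.find((t) => t.tabActive)?.tabTargetId ?? null,
      order: ordered.map((t) => t.tabTargetId),
      pinned: ordered.filter((t) => t.tabPinned).map((t) => t.tabTargetId),
      groups,
    });
  }
  return summary;
}
```

It groups by window before sorting. That way it does not rely on tabStripIndex being unique across windows, which the post does not specify either way.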
Original source
- May 18, 2026
- Date parsed from source: May 18, 2026
- First seen by Releasebot: May 27, 2026
Browse.sh, a catalog of browser skills for the agentic future
Browserbase launches Browse.sh, an open catalog of 100+ curated browser skills that agents can install with one CLI command. Built on Autobrowse, it helps browser agents reuse reliable workflows, cut discovery, and lower costs across real websites.
Kyle Jeong
Shubhankar Srivastava
Alex Qiu
Shrey Pandya
TL;DR
We built Browse.sh, an open catalog of 100+ curated browser skills that any agent can install with one CLI command. Our Skills are durable, reusable playbooks that capture how to navigate real websites, so your agents stop re-discovering every site from scratch on every run. All of this is powered by Autobrowse, our system that uses AI to iterate on real tasks until it converges on the cheapest, fastest path. Open source, free, and ready to use today at browse.sh.
Over 100 skills. Zero re-learning. Your agent’s brain grew some grooves.
Browser agents are everywhere right now, living in Claude Code, Cursor, and Codex. AI products are now shipping some version of "let the model drive a browser." And yet, every single one of these agents does the same dumb thing: it re-discovers every website from scratch, every time it runs.
Open a browser. Poke around. Find the button. Click it. Parse the response. Close the session. Forget everything, then do it all again tomorrow.
We've been building browser agents and infra at Browserbase for a while now. We've watched agents burn through tokens re-learning sites they've already conquered. We've watched customers painstakingly hand-write Playwright scripts for workflows an agent already solved last Tuesday. We've watched the same discovery tax get paid over and over, across thousands of sessions, by thousands of teams.
Today we're launching Browse.sh: an open catalog of browser skills that any agent can install and use immediately. 100 curated skills at launch and one CLI command to install.
What is Browse.sh?
Browse.sh is two things:
- A catalog of browser skills at browse.sh, where you can search, preview, and install curated skills for navigating real websites.
- The Browse CLI (npm i -g browse), the open-source command-line tool your agents use to actually drive browsers, fetch pages, search the web, and load skills on demand.
A "skill" is a markdown file (SKILL.md) plus any helper scripts needed to repeat a browser workflow reliably. It contains the exact steps, gotchas, API endpoints, selectors, and fallback strategies an agent needs to complete a task on a specific site. No vector embeddings or screenshot reels. Just plain text that humans can read and agents can execute.
It’s just like a playbook. An agent that loads the Craigslist skill doesn't need to spend 30 turns figuring out that the search page is fully JS-rendered and that there's a hidden JSON API at sapi.craigslist.org. That knowledge is already in the skill. The agent reads it, runs it, and moves on.
Why this exists
If you've shipped a browser agent into production, you know this shape intimately.
The first run on a new site is exciting. The agent wanders around, figures out the page, eventually completes the task. The second run looks almost identical. The hundredth run is depressing. By then you've paid for the same exploration a hundred times, the cost graph is a straight line going up, and you still don't have a clean artifact you can hand to a teammate and say "this is how we do this job."
Reasoning has stopped being the constraint. The bottleneck is now memory: knowledge stored in a form that humans and agents can both read and trust.
The unit economics are brutal
We benchmarked this on Craigslist. A generic agent loop searching listings costs ~$0.22 per run. The agent has to discover that the search page is fully JS-rendered, stumble onto the hidden JSON API at sapi.craigslist.org, figure out the positional array decoding, learn that item[0] is an offset (not the posting ID), and work around IP-based geo-scoping. Every run pays that discovery tax from scratch.
After four Autobrowse iterations, the graduated Browse.sh skill does the same job for ~$0.12 per run. The 45% cost reduction comes from better memory.
Every run after the first is fundamentally cheaper because the skill encodes the shortest reliable path the agent could find (the undocumented endpoint, the decode tables, the geo-override hack) and reuses it instead of re-deriving it. At scale, this is the difference between a cost curve that flatlines and one that compounds against you.
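The arithmetic behind the 45% figure is easy to check, along with what it means over many runs. This sketch plugs in the two benchmark numbers quoted above (~$0.22 and ~$0.12 per run). It assumes per-run costs stay flat, and the run count is illustrative, not measured data.

```javascript
// Approximate per-run costs from the Craigslist benchmark (USD).
const GENERIC_LOOP_COST = 0.22;
const SKILL_COST = 0.12;

// Fractional cost reduction of the skill vs. the generic agent loop.
function costReduction(before, after) {
  return (before - after) / before;
}

// Total savings in USD after `runs` runs, assuming per-run costs stay flat.
function savingsAfter(runs, before = GENERIC_LOOP_COST, after = SKILL_COST) {
  return runs * (before - after);
}

console.log((costReduction(GENERIC_LOOP_COST, SKILL_COST) * 100).toFixed(1)); // 45.5
console.log(savingsAfter(10000).toFixed(2)); // 1000.00
```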
Skills are the new primitives
The industry is converging on this. Claude Code ships with skills. OpenAI Codex supports them. The AgentSkills standard is gaining traction. Every major agent framework is adding some version of "load a markdown file that tells the agent how to do a specific thing."
Browser skills are the natural next step. The web is messy: sites render differently for different user agents, gate content behind JavaScript, hide data behind undocumented endpoints, throw CAPTCHAs on a whim, and redesign their flows on a Tuesday. A generic agent loop copes with all of that in the moment, then forgets everything once the session closes.
Browse.sh captures what the agent learned, so the next agent (or the next teammate, or the next customer) doesn't have to learn it again.
How it works
Install the CLI
npm i -g browse
That's it.
Browse a skill
Head to browse.sh and search for the site or task you need. Each skill page shows what the skill does, how it works, site-specific gotchas, and the install command.
Install a skill
browse skills add zillow.com/extract-listings
This pulls the skill into your local skills directory. Your agent can now load it on demand.
Use it in your agent
Point your agent at the skill and let it run. The skill provides the playbook; the agent provides the reasoning. A typical prompt looks like:
Use /extract-listings to find apartments under $3,000 in SF with 2+ bedrooms.
The agent reads the SKILL.md, follows the workflow, handles edge cases using the documented gotchas, and returns structured results.
Verify it yourself
Quick check: does the CLI see your installed skills?
browse skills list
What's inside a skill?
Every skill graduates from Autobrowse, our system that uses AI to improve AI. You give an agent a real task on a real site. It runs the task end to end, studies its own trace, iterates on its strategy, and keeps going until the workflow becomes reliable rather than lucky. Once it converges, it writes out a durable skill.
Here's what that looks like in practice. This is a real excerpt from our Craigslist skill:
Site-Specific Gotchas
- Snapshot returns 0 refs on /search/: The search page is fully JS-rendered. Don't use browse snapshot.
- item[0] is NOT the postingId - it's an offset from data.decode.minPostingId. Treating it as the ID produces 404s.
- API geolocates by request IP. Add postal=<zip> to override. A residential proxy is not required.
- Rate-limit: keep ≤ 1 req/s sustained.
This is the kind of knowledge that takes a human engineer a couple of hours to reverse-engineer, and an agent dozens of dollars in tokens to discover from scratch. Once it's in a skill, it's free forever.
If the agent discovered an undocumented JSON endpoint, that endpoint is in there. If a particular form needs a small wait before submission, that's in there too. If a domain-specific helper script is worth keeping around, it gets checked in next to the skill.
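Three of the Craigslist gotchas listed above translate directly into small helpers. This is a minimal sketch built on stated assumptions:

- The offset in `item[0]` is added to `data.decode.minPostingId`. The excerpt says it is an offset from that value but does not spell out the arithmetic.
- The caller already has the API URL. No endpoint path is given in the excerpt, so none is hard-coded.

`decodePostingId`, `withPostalOverride`, and `createThrottle` are hypothetical helpers, not part of the Browse CLI or the actual skill.

```javascript
// Hypothetical: recover the real posting ID from a search result row.
// Assumption: postingId = minPostingId + item[0].
function decodePostingId(item, data) {
  const offset = item[0];
  const base = data.decode.minPostingId;
  if (!Number.isFinite(offset) || !Number.isFinite(base)) {
    throw new TypeError("expected numeric offset and minPostingId");
  }
  return base + offset;
}

// Override IP-based geolocation by adding postal=<zip> to an API URL.
function withPostalOverride(apiUrl, zip) {
  const url = new URL(apiUrl);
  url.searchParams.set("postal", zip);
  return url.toString();
}

// Keep sustained traffic at or below one request per intervalMs (1 req/s by default).
// Returns reserveSlot(), which books the next slot and returns how many ms to wait.
function createThrottle(intervalMs = 1000, now = () => Date.now()) {
  let nextAllowed = 0;
  return function reserveSlot() {
    const t = now();
    const wait = Math.max(0, nextAllowed - t);
    nextAllowed = Math.max(t, nextAllowed) + intervalMs;
    return wait;
  };
}
```

A caller would create one throttle with `const reserveSlot = createThrottle();` and run `await new Promise((r) => setTimeout(r, reserveSlot()))` before each request. Reserving the slot synchronously keeps concurrent callers from all claiming the same free slot.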
What shipped?
We're launching with 100 skills spanning:
- Marketplaces: Craigslist, Zillow, Amazon, eBay
- Food & dining: OpenTable, DoorDash, McDonald's online ordering
- Travel: flight search, hotel booking, Airbnb
- Government: federal grants portals, state program catalogs
- Developer tools: GitHub, npm, documentation sites
- Enterprise SaaS: via partner integrations
Each skill is tagged with a category, verified status, and the site it targets. Partner skills from companies like Ramp, Lovable, Poke, and Reducto ship with a verified badge.
Generate your own
Don't see the skill you need? Type any domain and task into browse.sh, and Autobrowse will generate a skill for you. It runs the task against the live site, iterates until it converges, and publishes the result to the public catalog for anyone to use.
Every new skill makes the catalog more valuable, which brings more users, who generate more skills.
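The post doesn't describe Autobrowse's internals. Still, the loop it outlines can be sketched in a few lines: run the task end to end, study the results, revise the strategy, and stop once success is reliable rather than lucky. Everything here is hypothetical. runTrial, revise, the batch size and the success-rate threshold stand in for whatever Autobrowse actually does, and a real agent run would be asynchronous.

```typescript
// Hypothetical sketch of "iterate until the workflow is reliable".
interface TrialResult {
  success: boolean;
  costUsd: number; // available to `revise`, e.g. to prefer cheaper paths
}

interface ConvergeOptions {
  trialsPerIteration: number; // end-to-end runs used to judge each strategy
  targetSuccessRate: number; // e.g. 0.95: "reliable rather than lucky"
  maxIterations: number;
}

interface ConvergeResult<S> {
  strategy: S; // the last strategy that was actually evaluated
  iterations: number;
  converged: boolean;
  successRate: number;
}

function iterateUntilConverged<S>(
  initial: S,
  runTrial: (strategy: S) => TrialResult,
  revise: (strategy: S, results: TrialResult[]) => S,
  opts: ConvergeOptions,
): ConvergeResult<S> {
  if (opts.trialsPerIteration < 1) {
    throw new RangeError("trialsPerIteration must be at least 1");
  }
  let strategy = initial;
  let evaluated = initial;
  let successRate = 0;
  for (let i = 1; i <= opts.maxIterations; i++) {
    evaluated = strategy;
    // Run the same strategy several times so one lucky pass doesn't count.
    const results: TrialResult[] = [];
    for (let t = 0; t < opts.trialsPerIteration; t++) {
      results.push(runTrial(strategy));
    }
    successRate = results.filter((r) => r.success).length / results.length;
    if (successRate >= opts.targetSuccessRate) {
      return { strategy, iterations: i, converged: true, successRate };
    }
    // Study the results and try a revised strategy on the next iteration.
    strategy = revise(strategy, results);
  }
  return { strategy: evaluated, iterations: opts.maxIterations, converged: false, successRate };
}
```

Only the converged strategy would be written out as a skill. Requiring several successful trials per iteration is what separates "reliable" from "lucky" in this sketch.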
Who is this for?
Do you build agents that need the web?
├── Yes
│   ├── Are you tired of re-writing browser logic?
│   │   ├── Yes → install browse, load skills, ship
│   │   └── No → you will be. bookmark this
│   └── Do you want your agents to get cheaper over time?
│       ├── Yes → browse.sh skills compound
│       └── No → keep paying the discovery tax
└── No → browse.sh isn't for you (yet)
More specifically:
- AI engineers building agents that automate web workflows (QA, data extraction, form filling, monitoring).
- Product teams shipping browser-based features who want deterministic, auditable playbooks instead of black-box agent runs.
- Platform teams looking to reduce token spend and latency across their agent fleet.
- Anyone using Claude Code, Cursor, or Codex who wants their coding agent to browse the web with pre-built expertise.
The Vision
A dominant story about browser agents right now is that they'll get good when the underlying models get good, and that we're one Anthropic or OpenAI release away from agents that just work on the web.
We don't entirely buy that.
Even a perfect model still has to discover, on every new site, what it would already know if it had been there before. Without a place to put what the agent learns, every run is a fresh start. The models will keep getting better. The web will keep getting messier. The gap between "can reason about a page" and "knows the fastest path through this specific site" will persist.
Browse.sh is that place. One CLI. A growing catalog of skills. Memory that compounds.
We built this because we believe the real unlock for browser agents isn't better reasoning. It's better memory, in a form that humans can audit and agents can execute.
Install our CLI with:
npm i -g browse
And find or create the skill you need at browse.sh.
The bottleneck for browser agents was never intelligence. It was amnesia.
Browse.sh is the cure.
Original source - May 18, 2026
- Date parsed from source:May 18, 2026
- First seen by Releasebot:May 19, 2026
Browse.sh, a catalog of browser skills for the agentic future
Browserbase launches Browse.sh, an open catalog of 100+ curated browser skills and an open-source CLI that lets agents install reusable web workflows with one command. Powered by Autobrowse, it aims to cut token spend, speed up browser tasks, and help agents stop relearning sites from scratch.
TL;DR
We built Browse.sh, an open catalog of 100+ curated browser skills that any agent can install with one CLI command. Our skills are durable, reusable playbooks that capture how to navigate real websites, so your agents stop re-discovering every site from scratch on every run. All of this is powered by Autobrowse, our system that uses AI to iterate on real tasks until it converges on the cheapest, fastest path. Open source, free, and ready to use today at browse.sh.
Over 100 skills. Zero re-learning. Your agent’s brain grew some grooves.
Browser agents are everywhere right now, living in Claude Code, Cursor, and Codex. AI products are now shipping some version of "let the model drive a browser." And yet, every single one of these agents does the same dumb thing: it re-discovers every website from scratch, every time it runs.
Open a browser. Poke around. Find the button. Click it. Parse the response. Close the session. Forget everything, then do it all again tomorrow.
We've been building browser agents and infra at Browserbase for a while now. We've watched agents burn through tokens re-learning sites they've already conquered. We've watched customers painstakingly hand-write Playwright scripts for workflows an agent already solved last Tuesday. We've watched the same discovery tax get paid over and over, across thousands of sessions, by thousands of teams.
Today we're launching Browse.sh: an open catalog of browser skills that any agent can install and use immediately. 100 curated skills at launch and one CLI command to install.
What is Browse.sh?
Browse.sh is two things:
- A catalog of browser skills at browse.sh, where you can search, preview, and install curated skills for navigating real websites.
- The Browse CLI (npm i -g browse), the open-source command-line tool your agents use to actually drive browsers, fetch pages, search the web, and load skills on demand.
A "skill" is a markdown file (SKILL.md) plus any helper scripts needed to repeat a browser workflow reliably. It contains the exact steps, gotchas, API endpoints, selectors, and fallback strategies an agent needs to complete a task on a specific site. No vector embeddings or screenshot reels. Just plain text that humans can read and agents can execute.
It’s just like a playbook. An agent that loads the Craigslist skill doesn't need to spend 30 turns figuring out that the search page is fully JS-rendered and that there's a hidden JSON API at sapi.craigslist.org. That knowledge is already in the skill. The agent reads it, runs it, and moves on.
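Because a skill is plain text, loading one needs very little machinery. Below is a minimal sketch that splits a SKILL.md into metadata and playbook. It assumes the file follows the AgentSkills convention of a YAML frontmatter block (for example name and description) above the markdown body. parseSkill is a hypothetical helper, not part of the browse CLI, and it only understands flat key: value lines rather than full YAML.

```typescript
interface Skill {
  meta: Record<string, string>; // flat key: value pairs from the frontmatter
  body: string; // the markdown playbook the agent reads
}

// Minimal parser for "---\nkey: value\n---\n<body>". Not a YAML parser:
// nested keys, lists, and multi-line values are not handled.
function parseSkill(markdown: string): Skill {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(markdown);
  if (!match) {
    // No frontmatter: treat the whole file as the playbook.
    return { meta: {}, body: markdown };
  }
  const meta: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const kv = /^([A-Za-z0-9_-]+):\s*(.*)$/.exec(line);
    if (kv) {
      // Strip one pair of surrounding quotes, if present.
      meta[kv[1]] = kv[2].replace(/^["']|["']$/g, "");
    }
  }
  return { meta, body: match[2] };
}
```

An agent runtime could list skills from their name and description alone and load the full body only when a task calls for it. That matches the "load it on demand" behavior described here.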
Why this exists
If you've shipped a browser agent into production, you know this shape intimately.
The first run on a new site is exciting. The agent wanders around, figures out the page, eventually completes the task. The second run looks almost identical. The hundredth run is depressing. By then you've paid for the same exploration a hundred times, the cost graph is a straight line going up, and you still don't have a clean artifact you can hand to a teammate and say "this is how we do this job."
Reasoning has stopped being the constraint. Memory has become the bottleneck: a record of what worked, in a form that humans and agents can both read and trust.
The unit economics are brutal
We benchmarked this on Craigslist. A generic agent loop searching listings costs ~$0.22 per run. The agent has to discover that the search page is fully JS-rendered, stumble onto the hidden JSON API at sapi.craigslist.org, figure out the positional array decoding, learn that item[0] is an offset (not the posting ID), and work around IP-based geo-scoping. Every run pays that discovery tax from scratch.
After four Autobrowse iterations, the graduated Browse.sh skill does the same job for ~$0.12 per run. The 45% cost reduction comes from better memory.
Every run after the first is fundamentally cheaper because the skill encodes the shortest reliable path the agent could find (the undocumented endpoint, the decode tables, the geo-override hack) and reuses it instead of re-deriving it. At scale, this is the difference between a cost curve that flatlines and one that compounds against you.
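The savings above are simple arithmetic. This small sketch reproduces them from the two quoted per-run figures: ~$0.22 for the generic loop and ~$0.12 with the skill. It deliberately ignores the one-time cost of the Autobrowse iterations that produce the skill, which the post doesn't state. costReduction and cumulativeCost are illustrative names.

```typescript
// Per-run costs quoted in the Craigslist benchmark (USD, approximate).
const GENERIC_LOOP_COST = 0.22;
const SKILL_COST = 0.12;

// Fractional cost reduction, e.g. about 0.4545 for 0.22 -> 0.12.
function costReduction(before: number, after: number): number {
  return (before - after) / before;
}

// Total spend over `runs` executions. Assumption: every run costs the same
// and the one-time cost of generating the skill is excluded.
function cumulativeCost(perRun: number, runs: number): number {
  return perRun * runs;
}

const runs = 10_000;
const saved = cumulativeCost(GENERIC_LOOP_COST, runs) - cumulativeCost(SKILL_COST, runs);
console.log(`reduction: ${(costReduction(GENERIC_LOOP_COST, SKILL_COST) * 100).toFixed(1)}%`);
console.log(`saved over ${runs} runs: $${saved.toFixed(2)}`);
```

The gap grows linearly with run count. Any one-time generation cost is therefore amortized quickly once a workflow runs thousands of times.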
Skills are the new primitives
The industry is converging on this. Claude Code ships with skills. OpenAI Codex supports them. The AgentSkills standard is gaining traction. Every major agent framework is adding some version of "load a markdown file that tells the agent how to do a specific thing."
Browser skills are the natural next step. The web is messy: sites render differently for different user agents, gate content behind JavaScript, hide data behind undocumented endpoints, throw CAPTCHAs on a whim, and redesign their flows on a Tuesday. A generic agent loop copes with all of that in the moment, then forgets everything once the session closes.
Browse.sh captures what the agent learned, so the next agent (or the next teammate, or the next customer) doesn't have to learn it again.
How it works
Install the CLI
npm i -g browse
That's it.
Browse a skill
Head to browse.sh and search for the site or task you need. Each skill page shows what the skill does, how it works, site-specific gotchas, and the install command.
Install a skill
browse skills add zillow.com/extract-listings
This pulls the skill into your local skills directory. Your agent can now load it on demand.
Use it in your agent
Point your agent at the skill and let it run. The skill provides the playbook; the agent provides the reasoning. A typical prompt looks like:
Use /extract-listings to find apartments under $3,000 in SF with 2+ bedrooms.
The agent reads the SKILL.md, follows the workflow, handles edge cases using the documented gotchas, and returns structured results.
Verify it yourself
Quick check: does the CLI see your installed skills?
browse skills list
What's inside a skill?
Every skill graduates from Autobrowse, our system that uses AI to improve AI. You give an agent a real task on a real site. It runs the task end to end, studies its own trace, iterates on its strategy, and keeps going until the workflow becomes reliable rather than lucky. Once it converges, it writes out a durable skill.
Here's what that looks like in practice. This is a real excerpt from our Craigslist skill:
Site-Specific Gotchas
- Snapshot returns 0 refs on /search/: The search page is fully JS-rendered. Don't use browse snapshot.
- item[0] is NOT the postingId - it's an offset from data.decode.minPostingId. Treating it as the ID produces 404s.
- API geolocates by request IP. Add postal=<zip> to override. A residential proxy is not required.
- Rate-limit: keep ≤ 1 req/s sustained.
This is the kind of knowledge that takes a human engineer a couple of hours to reverse-engineer, and an agent dozens of dollars in tokens to discover from scratch. Once it's in a skill, it's free forever.
If the agent discovered an undocumented JSON endpoint, that endpoint is in there. If a particular form needs a small wait before submission, that's in there too. If a domain-specific helper script is worth keeping around, it gets checked in next to the skill.
What shipped?
We're launching with 100 skills spanning:
- Marketplaces: Craigslist, Zillow, Amazon, eBay
- Food & dining: OpenTable, DoorDash, McDonald's online ordering
- Travel: flight search, hotel booking, Airbnb
- Government: federal grants portals, state program catalogs
- Developer tools: GitHub, npm, documentation sites
- Enterprise SaaS: via partner integrations
Each skill is tagged with a category, verified status, and the site it targets. Partner skills from companies like Ramp, Lovable, Poke, and Reducto ship with a verified badge.
Generate your own
Don't see the skill you need? Type any domain and task into browse.sh, and Autobrowse will generate a skill for you. It runs the task against the live site, iterates until it converges, and publishes the result to the public catalog for anyone to use.
Every new skill makes the catalog more valuable, which brings more users, who generate more skills.
Who is this for?
Do you build agents that need the web?
├── Yes
│   ├── Are you tired of re-writing browser logic?
│   │   ├── Yes → install browse, load skills, ship
│   │   └── No → you will be. bookmark this
│   └── Do you want your agents to get cheaper over time?
│       ├── Yes → browse.sh skills compound
│       └── No → keep paying the discovery tax
└── No → browse.sh isn't for you (yet)
More specifically:
- AI engineers building agents that automate web workflows (QA, data extraction, form filling, monitoring).
- Product teams shipping browser-based features who want deterministic, auditable playbooks instead of black-box agent runs.
- Platform teams looking to reduce token spend and latency across their agent fleet.
- Anyone using Claude Code, Cursor, or Codex who wants their coding agent to browse the web with pre-built expertise.
The Vision
A dominant story about browser agents right now is that they'll get good when the underlying models get good, and that we're one Anthropic or OpenAI release away from agents that just work on the web.
We don't entirely buy that.
Even a perfect model still has to discover, on every new site, what it would already know if it had been there before. Without a place to put what the agent learns, every run is a fresh start. The models will keep getting better. The web will keep getting messier. The gap between "can reason about a page" and "knows the fastest path through this specific site" will persist.
Browse.sh is that place. One CLI. A growing catalog of skills. Memory that compounds.
We built this because we believe the real unlock for browser agents isn't better reasoning. It's better memory, in a form that humans can audit and agents can execute.
Install our CLI with:
npm i -g browse
And find or create the skill you need at browse.sh.
The bottleneck for browser agents was never intelligence. It was amnesia.
Browse.sh is the cure.
Original source - May 14, 2026
- Date parsed from source:May 14, 2026
- First seen by Releasebot:May 27, 2026
Introducing the Session Replay API: Stream browser session replays
Browserbase adds a Session Replay API that lets teams stream completed browser session replays through HLS, list recorded tabs, and embed playback in their own products with standard players like hls.js, Shaka, video.js, or Safari.
TL;DR
Today you can curl a Browserbase session and get back HLS. List the tabs in a session, fetch a .m3u8 for any of them, hand it to hls.js (or Shaka, or video.js, or Safari natively), and you're done. The same fMP4 segments that power our session replay, embedded directly in your product.
What is Session Replay?
Every browser session you run on Browserbase records itself by default. Frames, tab switches, the lot, captured as the session happens and stored for 31 days. We've been using this in our own dashboard for a while. Open any session in our dashboard and you get a scrubbable video timeline, a tab switcher, and synchronized network and console panels alongside the playback. When a browser agent does something strange at 3 AM, that's where you go to see the exact frames it saw.
Many of our customers have been using that view and asking for the same recording, but inside their product instead of ours. A QA tool that drops the failing run next to the bug report, a support dashboard where the end-user replays the agent that booked their flight, an internal review surface where ops looks over sessions without bouncing into Browserbase.
What’s new with Session Replays?
Our new Session Replay API allows you to stream completed browser session replays when you want, wherever you want.
The new surface fits into the API you already use. There are two endpoints, both authenticated with your existing x-bb-api-key and meant to be called from your backend, so the key never reaches a browser. The first lists every tab a session recorded; the second hands back the HLS playlist for whichever one you ask for.
The metadata response is a list of pages, ordered by pageId ascending, with millisecond offsets from the moment the session started (not Unix epoch).
Each url is a path relative to our API (https://api.browserbase.com), and a session can record up to ten concurrent tabs. The second endpoint returns a standard VOD .m3u8 that any HLS-compatible player can play. Its segment URLs are pre-signed CloudFront links, so the browser pulls bytes straight from our CDN with your servers nowhere in the loop.
No re-encoding, no custom token format, no SDK.
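Here is a small TypeScript sketch for working with the metadata from the first endpoint. The field names (pages, pageId, url, startTimeMs, endTimeMs, pageCount) follow the example response Browserbase published for this API. The types and the playlistUrls and pagesOpenAt helpers are our own illustration, and treating endTimeMs as exclusive is an assumption.

```typescript
// Shape of GET /v1/sessions/:id/replays, per the published example response.
interface ReplayPage {
  pageId: string;
  url: string; // relative to https://api.browserbase.com
  startTimeMs: number; // offset from session start, not Unix epoch
  endTimeMs: number;
}

interface ReplayMetadata {
  pages: ReplayPage[];
  pageCount: number;
}

const API_BASE = "https://api.browserbase.com";

// Resolve each page's relative playlist path into an absolute URL, keyed by pageId.
function playlistUrls(meta: ReplayMetadata): Map<string, string> {
  return new Map(
    meta.pages.map((p): [string, string] => [p.pageId, new URL(p.url, API_BASE).toString()]),
  );
}

// Tabs that were recording at a given offset (ms since session start).
// Assumption: startTimeMs is inclusive and endTimeMs is exclusive.
function pagesOpenAt(meta: ReplayMetadata, offsetMs: number): string[] {
  return meta.pages
    .filter((p) => p.startTimeMs <= offsetMs && offsetMs < p.endTimeMs)
    .map((p) => p.pageId);
}
```

These absolute URLs still require the API key, so they belong on your backend. The player should point at your own route instead.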
Embed it in your dashboard
Pick whichever player your stack already has. The playlist URL is the same in all four.
For example:
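<!-- hls.js: the default for Chromium and Firefox -->
<video id="replay" controls muted autoplay playsinline></video>
<script src="https://cdn.jsdelivr.net/npm/hls.js@1"></script>
<script>
  const video = document.getElementById("replay");
  const src = "/your-backend-route/replay.m3u8"; // your backend route that returns the .m3u8
  if (Hls.isSupported()) {
    const hls = new Hls();
    hls.loadSource(src);
    hls.attachMedia(video);
  } else if (video.canPlayType("application/vnd.apple.mpegurl")) {
    video.src = src; // Safari plays HLS natively
  }
</script>
Shaka Player, video.js, and Safari's native HLS playback all work too; the docs have the boilerplate for each.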
The integration pattern we recommend
Three hops, no proxying segments:
- Your backend hits GET /v1/sessions/:id/replays/:pageId with x-bb-api-key, gets back a .m3u8, and forwards the body to your frontend unchanged.
- Your frontend points its HLS player at your backend route. The player parses the manifest and starts requesting segments.
- The browser fetches segments directly from CloudFront using the pre-signed URLs embedded in the playlist.
This pattern gives you two things for free. The API key never reaches a browser, so it can't leak from the client. And segments stream directly from the CDN to your end user, so you don't pay double egress to proxy them through your servers.
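The backend hop in this pattern can be written without any framework. The sketch below assumes a runtime with the standard fetch, Request and Response globals (Node 18+, Deno, Bun, or a Workers-style runtime). replayRequest and replayPlaylistResponse are hypothetical helper names. Forwarding the upstream status is our own choice, so that an API error isn't served to the player as if it were a playlist.

```typescript
const BROWSERBASE_API = "https://api.browserbase.com";
const HLS_CONTENT_TYPE = "application/vnd.apple.mpegurl";

// Build the upstream request. The API key is only ever used server-side.
function replayRequest(sessionId: string, pageId: string, apiKey: string): Request {
  const path = `/v1/sessions/${encodeURIComponent(sessionId)}/replays/${encodeURIComponent(pageId)}`;
  return new Request(new URL(path, BROWSERBASE_API), {
    headers: { "x-bb-api-key": apiKey },
  });
}

// Fetch the .m3u8 and forward the body unchanged. Segment URLs inside it are
// pre-signed CDN links, so the player downloads segments without touching this server.
async function replayPlaylistResponse(
  sessionId: string,
  pageId: string,
  apiKey: string,
  fetchImpl: typeof fetch = fetch, // injectable for testing
): Promise<Response> {
  const upstream = await fetchImpl(replayRequest(sessionId, pageId, apiKey));
  const body = await upstream.text();
  const contentType = upstream.ok
    ? HLS_CONTENT_TYPE
    : (upstream.headers.get("content-type") ?? "text/plain");
  return new Response(body, {
    status: upstream.status,
    headers: { "content-type": contentType },
  });
}
```

A route handler in any framework can then return `await replayPlaylistResponse(sessionId, pageId, apiKey)` directly.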
What's next?
For most of the last year, session replay was something you could look at inside our dashboard. That was the right starting point: most of the work was making the recording itself trustworthy, getting encoding off the critical path, and getting multi-tab recording right.
Now the same fMP4 segments that power Session Inspector are reachable from your backend with a key you already have, behind a .m3u8 any player understands.
You can stream session replays in your own product. Drop a failing agent run next to the bug report in your QA tool. Show an end user the exact session that booked their flight. Let ops review overnight runs without ever opening our dashboard. The recording was always yours; now the surface is too.
Get started with our new Session Replay API with the docs here.
Original source - May 14, 2026
- Date parsed from source:May 14, 2026
- First seen by Releasebot:May 15, 2026
Session replay streaming
Browserbase adds embeddable session replays that stream to users in seconds, with HLS playback and CDN delivery included.
Session replays can now be embedded directly in your product and streamed to your end users within seconds of a session ending.
Includes:
- API for embedding session replays in your own product
- Playback in any HLS-compatible player
- Storage, fMP4 encoding, and CDN delivery handled by Browserbase
Your backend fetches the session's .m3u8 playlist from our API, your frontend hands it to a player, and end users stream segments directly from our CDN.
Free on every plan, up to 120 sessions per minute.
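If you embed replays at volume, it can help to pace requests on your side. Below is a minimal client-side sliding-window limiter. It assumes the "120 sessions per minute" figure is a rate limit on replay requests. The note doesn't say exactly what is counted, so check the docs before relying on that number. SlidingWindowLimiter is a hypothetical helper, not part of any Browserbase SDK.

```typescript
// Sliding-window limiter: allows at most `limit` acquisitions per `windowMs`.
class SlidingWindowLimiter {
  private readonly timestamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records the call and returns true if it fits in the window; otherwise returns false.
  tryAcquire(nowMs: number = Date.now()): boolean {
    // Drop calls that have aged out of the window.
    while (this.timestamps.length > 0 && this.timestamps[0] <= nowMs - this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) {
      return false;
    }
    this.timestamps.push(nowMs);
    return true;
  }
}

// Assumption: the 120-per-minute figure applies to your replay requests.
const replayLimiter = new SlidingWindowLimiter(120, 60_000);
```

When `tryAcquire()` returns false, a backend can queue the request or return a 429 to its own client instead of calling the API.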
Original source - May 14, 2026
- Date parsed from source:May 14, 2026
- First seen by Releasebot:May 14, 2026
Introducing the Session Replay API: Stream browser session replays
Browserbase introduces Session Replay Streaming, letting users list recorded tabs in a session and fetch HLS .m3u8 replays for any page. The API is built for backend use and works with HLS players like hls.js, Shaka, video.js, and Safari.
TL;DR
Today you can curl a Browserbase session and get back HLS. List the tabs in a session, fetch a .m3u8 for any of them, hand it to hls.js (or Shaka, or video.js, or Safari natively), and you're done. The same fMP4 segments that power our session replay, embedded directly in your product.
What is Session Replay?
Every browser session you run on Browserbase records itself by default. Frames, tab switches, the lot, captured as the session happens and stored for 31 days. We've been using this in our own dashboard for a while. Open any session in our dashboard and you get a scrubbable video timeline, a tab switcher, and synchronized network and console panels alongside the playback. When a browser agent does something strange at 3 AM, that's where you go to see the exact frames it saw.
Many of our customers have been using that view and asking for the same recording, but inside their product instead of ours. A QA tool that drops the failing run next to the bug report, a support dashboard where the end-user replays the agent that booked their flight, an internal review surface where ops looks over sessions without bouncing into Browserbase.
What’s new with Session Replays?
Our new Session Replay API allows you to stream completed browser session replays when you want, wherever you want.
The new surface fits into the API you already use. There are two endpoints, both authenticated with your existing x-bb-api-key and meant to be called from your backend, so the key never reaches a browser. The first lists every tab a session recorded; the second hands back the HLS playlist for whichever one you ask for.
// 1. List the tabs (pages) recorded in a session
const pages = await fetch(`https://api.browserbase.com/v1/sessions/${SESSION_ID}/replays`, {
  headers: { "x-bb-api-key": BROWSERBASE_API_KEY },
}).then((r) => r.json());
// 2. Fetch the HLS playlist for any page
const playlist = await fetch(`https://api.browserbase.com/v1/sessions/${SESSION_ID}/replays/0`, {
  headers: { "x-bb-api-key": BROWSERBASE_API_KEY },
}).then((r) => r.text());
The metadata response is a list of pages, ordered by pageId ascending, with millisecond offsets from the moment the session started (not Unix epoch):
{
  "pages": [
    {
      "pageId": "0",
      "url": "/v1/sessions/0a4c.../replays/0",
      "startTimeMs": 0,
      "endTimeMs": 121382
    },
    {
      "pageId": "1",
      "url": "/v1/sessions/0a4c.../replays/1",
      "startTimeMs": 13001,
      "endTimeMs": 121382
    }
  ],
  "pageCount": 2
}
Each url is a path relative to our API (https://api.browserbase.com), and a session can record up to ten concurrent tabs. The second endpoint returns a standard VOD .m3u8 that any HLS-compatible player can play. Its segment URLs are pre-signed CloudFront links, so the browser pulls bytes straight from our CDN with your servers nowhere in the loop.
No re-encoding, no custom token format, no SDK.
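To see what a player receives from the second endpoint, here is a sketch that summarizes a media playlist. It relies only on general HLS rules from RFC 8216:

- #EXTINF carries segment durations.
- Other non-tag lines are segment URIs.
- #EXT-X-MAP names the fMP4 initialization segment.
- #EXT-X-ENDLIST marks a complete VOD playlist.

It assumes nothing else about Browserbase's exact output, and summarizeMediaPlaylist is a hypothetical helper. Players do this parsing themselves, so you normally won't need it, but it can be handy for logging or sanity checks on your backend.

```typescript
interface MediaPlaylistSummary {
  initSegment?: string; // from #EXT-X-MAP:URI="..." (fMP4 init segment)
  segments: string[]; // media segment URIs, in playback order
  totalDurationSec: number; // sum of #EXTINF durations
  isComplete: boolean; // true when #EXT-X-ENDLIST is present (VOD)
}

function summarizeMediaPlaylist(m3u8: string): MediaPlaylistSummary {
  const lines = m3u8
    .split(/\r?\n/)
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
  if (lines[0] !== "#EXTM3U") {
    throw new Error("not an HLS playlist: missing #EXTM3U header");
  }
  const summary: MediaPlaylistSummary = { segments: [], totalDurationSec: 0, isComplete: false };
  for (const line of lines) {
    if (line.startsWith("#EXT-X-MAP:")) {
      const uri = /URI="([^"]*)"/.exec(line);
      if (uri) summary.initSegment = uri[1];
    } else if (line.startsWith("#EXTINF:")) {
      // Format: #EXTINF:<duration>,[<title>]; parseFloat stops at the comma.
      summary.totalDurationSec += parseFloat(line.slice("#EXTINF:".length));
    } else if (line === "#EXT-X-ENDLIST") {
      summary.isComplete = true;
    } else if (!line.startsWith("#")) {
      // Any non-comment, non-tag line is a segment URI (here, a pre-signed CDN link).
      summary.segments.push(line);
    }
  }
  return summary;
}
```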
Embed it in your dashboard
Pick whichever player your stack already has. The playlist URL is the same in all four.
For example:
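<!-- hls.js: the default for Chromium and Firefox -->
<video id="replay" controls muted autoplay playsinline></video>
<script src="https://cdn.jsdelivr.net/npm/hls.js@1"></script>
<script>
  const video = document.getElementById("replay");
  const src = "/your-backend-route/replay.m3u8"; // your backend route that returns the .m3u8
  if (Hls.isSupported()) {
    const hls = new Hls();
    hls.loadSource(src);
    hls.attachMedia(video);
  } else if (video.canPlayType("application/vnd.apple.mpegurl")) {
    video.src = src; // Safari plays HLS natively
  }
</script>
Shaka Player, video.js, and Safari's native HLS playback all work too; the docs have the boilerplate for each.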
The integration pattern we recommend
Three hops, no proxying segments:
- Your backend hits GET /v1/sessions/:id/replays/:pageId with x-bb-api-key, gets back a .m3u8, and forwards the body to your frontend unchanged.
- Your frontend points its HLS player at your backend route. The player parses the manifest and starts requesting segments.
- The browser fetches segments directly from CloudFront using the pre-signed URLs embedded in the playlist.
import { Hono } from "hono";
// Example backend implementation: the only place the API key lives.
// Assumes a Hono app; c.req.param() returns the route params.
const app = new Hono();
app.get("/replay/:sessionId/:pageId", async (c) => {
  const { sessionId, pageId } = c.req.param();
  const res = await fetch(`https://api.browserbase.com/v1/sessions/${sessionId}/replays/${pageId}`, {
    headers: { "x-bb-api-key": process.env.BROWSERBASE_API_KEY! },
  });
  // Forward the playlist body unchanged; keep the upstream status so errors aren't served as a playlist.
  return new Response(await res.text(), {
    status: res.status,
    headers: { "content-type": "application/vnd.apple.mpegurl" },
  });
});
This pattern gives you two things for free. The API key never reaches a browser, so it can't leak from the client. And segments stream directly from the CDN to your end user, so you don't pay double egress to proxy them through your servers.
What's next?
For most of the last year, session replay was something you could look at inside our dashboard. That was the right starting point: most of the work was making the recording itself trustworthy, getting encoding off the critical path, and getting multi-tab recording right.
Now the same fMP4 segments that power Session Inspector are reachable from your backend with a key you already have, behind a .m3u8 any player understands.
You can stream session replays in your own product. Drop a failing agent run next to the bug report in your QA tool. Show an end user the exact session that booked their flight. Let ops review overnight runs without ever opening our dashboard. The recording was always yours; now the surface is too.
Get started with our new Session Replay Streaming with the docs here.
Original source - May 13, 2026
- Date parsed from source:May 13, 2026
- First seen by Releasebot:May 14, 2026
Reduce context bloat in Stagehand 3.4
Browserbase adds Stagehand 3.4.0, bringing ignoreSelectors to keep noisy page elements out of extract() and observe(), plus support for agent variables in the API schema, smarter agent mode defaults, new CUA model support, and stronger frame handling.
Stagehand 3.4.0 is live.
Use ignoreSelectors to keep noisy parts of a page out of extract() and observe(): ads, nav, modals, related posts, and anything else your agent should ignore.
import { z } from "zod";
// Assumes `stagehand` is an initialized Stagehand instance.
const article = await stagehand.extract(
  "extract the article title and body",
  z.object({
    title: z.string(),
    body: z.string(),
  }),
  {
    ignoreSelectors: [".ad", ".newsletter-modal"],
  },
);
Plus:
- Agent variables are now supported in the Stagehand API schema without the experimental requirement.
- Agent mode now defaults to hybrid for compatible models and DOM mode otherwise.
- New CUA model support: openai/gpt-5.4-mini, openai/gpt-5.5, and anthropic/claude-haiku-4-5.
- Better OOPIF frame handling and stronger observe element ID prompting.
Read the Stagehand docs for extract() and observe().
Original source - May 11, 2026
- Date parsed from source:May 11, 2026
- First seen by Releasebot:May 12, 2026
stagehand/server-v3 v3.6.9
Stagehand ships v3.6.9 with evals help, docs cleanup, and dependency updates.
What's Changed
- Evals man help by @miguelg719 in #2092
- [docs]: rm lockfile from docs package by @seanmcguire12 in #2099
- [chore]: move integration libs into peer deps by @seanmcguire12 in #2101
- [chore]: rm evals changeset by @seanmcguire12 in #2108
Full Changelog: stagehand-server-v3/v3.6.8...stagehand-server-v3/v3.6.9
Original source - May 11, 2026
- Date parsed from source:May 11, 2026
- First seen by Releasebot:May 12, 2026
@browserbasehq/[email protected]
Stagehand adds ignoreSelectors for extract() and observe(), expands v3 agentExecute variables support, improves frame handling and prompts, updates default agent mode, and adds support for new CUA models.
Minor Changes
- #2084 0641d44 Thanks @seanmcguire12! - add ignoreSelectors param to extract()
- #2096 a11603d Thanks @seanmcguire12! - add ignoreSelectors to observe()
Patch Changes
- #2080 21c78b3 Thanks @miguelg719! - Add variables support to v3 agentExecute API schema and remove experimental requirement
- #2077 f437f73 Thanks @monadoid! - Fix frame registry handling for OOPIF pages
- #2098 a783b99 Thanks @seanmcguire12! - bump transitive deps to patched versions
- #2089 8d2f354 Thanks @shrey150! - Strengthen observe prompts so LLMs return complete encoded element IDs.
- #2047 a87c1fc Thanks @tkattkat! - Set default agent mode to hybrid with auto routing to dom for non compatible models
- #2101 26e6c96 Thanks @seanmcguire12! - move playwright-core, puppeteer-core, patchright-core from optional dependencies to peer dependencies
- #2068 1d176c4 Thanks @filip-michalsky! - Remove the default temperature setting from v3 agent AI SDK calls so reasoning models that do not support temperature run without provider warnings.
- #2040 1fa9613 Thanks @monadoid! - Prefer STAGEHAND_API_URL for Stagehand API overrides while retaining STAGEHAND_BASE_URL as a deprecated fallback.
- #2065 9ff70dd Thanks @miguelg719! - Add support for CUA models: openai/gpt-5.4-mini, openai/gpt-5.5, and anthropic/claude-haiku-4-5
- #2039 7640381 Thanks @monadoid! - Deprecate Browserbase project ID configuration.
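The STAGEHAND_API_URL change (#2040) follows a standard precedence rule. The sketch below assumes nothing about Stagehand's internals beyond the two variable names in the note:

- The new variable wins.
- The old one is used only as a fallback and is flagged as deprecated.
- The default URL is left to the caller, because the note doesn't state it.

resolveStagehandApiUrl is a hypothetical name. Treating empty or whitespace-only values as unset is also our assumption.

```typescript
type Env = Record<string, string | undefined>;

interface ResolvedApiUrl {
  url: string | undefined;
  source: "STAGEHAND_API_URL" | "STAGEHAND_BASE_URL" | "default";
  deprecated: boolean; // true when only the legacy variable was set
}

// Prefer STAGEHAND_API_URL, fall back to the deprecated STAGEHAND_BASE_URL,
// then to a caller-supplied default.
function resolveStagehandApiUrl(env: Env, defaultUrl?: string): ResolvedApiUrl {
  const preferred = env["STAGEHAND_API_URL"]?.trim();
  if (preferred) {
    return { url: preferred, source: "STAGEHAND_API_URL", deprecated: false };
  }
  const legacy = env["STAGEHAND_BASE_URL"]?.trim();
  if (legacy) {
    return { url: legacy, source: "STAGEHAND_BASE_URL", deprecated: true };
  }
  return { url: defaultUrl, source: "default", deprecated: false };
}
```

In Node you would pass `process.env` and log a migration warning whenever `deprecated` is true.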
- May 8, 2026
- Date parsed from source:May 8, 2026
- First seen by Releasebot:May 9, 2026
stagehand/server-v3 v3.6.8
Stagehand ships a maintenance update with bumped transitive dependencies across the monorepo.
What's Changed
- [chore]: bump various transitive deps across monorepo by @seanmcguire12 in #2098
Full Changelog: stagehand-server-v3/v3.6.7...stagehand-server-v3/v3.6.8
Original source
Curated by the Releasebot team
Releasebot is an aggregator of official release notes from hundreds of software vendors and thousands of sources.
Our editorial process involves the manual review and audit of release notes procured with the help of automated systems.