How We Made Cloud Browsers 3x Cheaper and Faster

B browser-use.com ↗

▲ 322 points • 238 comments • by gregpr07 • 1w ago • HN discussion ↗

Our cloud browsers need to do three things at once: start quickly, remain isolated, and be cheap. That is why we rebuilt Browser Use Cloud, so a new session starts in under a second and costs $0.02 per browser hour, down from $0.06.

This is harder than it sounds. A browser has Chromium, a filesystem, cookies, cache, proxy settings, downloads, and sometimes a logged-in customer session. If one browser can read another browser's state, it creates a security problem.

The normal answer is a virtual machine, or VM. A VM is a computer inside a computer: it gets its own CPU, memory, disk, and network devices. It is separate from everything else on its host, and if the browser breaks, leaks information, or gets attacked, the damage stays within the VM.

Normal VMs, however, are too heavy for cloud browsers. We need to create them constantly, sometimes thousands at a time, and throw them away as soon as sessions end. If each browser needs a slow, expensive VM, the product becomes slow and expensive, too.

The question for us was whether we could give every browser its own VM without making users wait or pay for it. We now do that with Firecracker, a lightweight VM system. Every Browser Use Cloud session runs in its own tiny VM. These VMs run on EC2, Amazon's rented cloud servers.

That is the unusual part. Firecracker is normally run on bare-metal servers, where you rent the whole physical machine. To reduce customers' cost, we run it on regular EC2, where AWS has already put your server inside a VM. This should be slow. Nested VMs make memory and CPU operations more expensive, and Chromium takes time to start. This post is about how we made this setup fast and efficient.

But first, why did we rebuild our infrastructure?

[Figure: It is difficult to be fast, isolated, and cheap all at once.]

Why we left unikernels behind

We used to run cloud browsers with Unikraft, which builds small, VM-like containers called unikernels. Instead of booting a full Linux system, a unikernel loads a small image built for your purposes. Unikernels start quickly and are cheap when idle because you can shut them down when no one is using them.

Unikraft was good for turning browsers off when they were not in use, but bad at adding more browsers quickly when traffic spiked. When many users ask for browsers at once, capacity has to scale rapidly. Unikraft does not have good built-in autoscaling, so an engineer had to change a variable to add more instances by hand. During a burst in traffic, the system required humans to adjust it instead of reacting on its own. This caused problems: one load test brought down production for 45 minutes.

So we rebuilt our setup on Firecracker.

[Figure: Unikraft needed an engineer to add capacity by hand, so it lagged behind a traffic spike and collapsed. After the rebuild, capacity tracks demand automatically.]

Firecracker provides a layer through which you can create, monitor, and run VMs. It gives each VM CPU, memory, disk, and network devices, and it keeps each VM isolated from the host and from other VMs.

Teaching browsers to scale themselves

Firecracker gave each browser its own VM. But it did not inherently solve the problem that broke the old system: deciding how many VMs to run, where to put them, and when to add more. So we built our own control plane.

The control plane monitors our fleet of browsers and decides whether we should scale up or down. When a user asks for a browser, the control plane picks a machine with room. When traffic rises, it starts more machines. When traffic falls, it stops sending new browsers to machines we want to remove.

It checks the fleet in real time. That is much faster than waiting on CloudWatch, AWS's monitoring service, which usually reacts on one-minute windows. It also knows things generic metrics do not: browsers that are still starting, machines we are trying to remove, and machines that should not receive new sessions.

[Figure: A request flows from user code through stateless edge routers; the control plane picks an EC2 host with room and keeps draining hosts out.]
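
The placement rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Browser Use's actual control plane: the `Host` fields, the `pick_host` function, and the "most free slots first" policy are all assumptions made for the example. It shows why counting browsers that are still starting, and skipping hosts that are draining, can give a different answer than a generic utilization metric.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Host:
    # All field names are hypothetical; they mirror the states named in the text.
    host_id: str
    capacity: int            # maximum browser VMs this host may run
    running: int = 0         # browsers already serving sessions
    starting: int = 0        # browsers still resuming or launching Chromium
    draining: bool = False   # host is being removed and takes no new sessions

    @property
    def free_slots(self) -> int:
        # Starting browsers occupy a slot even though they serve no traffic yet.
        return self.capacity - self.running - self.starting


def pick_host(hosts: Sequence[Host]) -> Optional[Host]:
    """Return a non-draining host with room, or None if the fleet is full."""
    candidates = [h for h in hosts if not h.draining and h.free_slots > 0]
    if not candidates:
        return None  # the caller would trigger a scale-up here
    # Assumed policy: prefer the host with the most free slots, then lowest id.
    return min(candidates, key=lambda h: (-h.free_slots, h.host_id))


fleet = [
    Host("a", capacity=4, running=3, starting=1),    # full once starters count
    Host("b", capacity=4, running=1, draining=True), # has room but is draining
    Host("c", capacity=4, running=2, starting=1),    # one slot left
]
chosen = pick_host(fleet)
print(chosen.host_id if chosen else "scale up")  # prints "c"
```

A metric that only counted running browsers would see host "a" at 75% and host "b" at 25% and might send the session to either one. Tracking the extra states is what rules both out.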

Why we run VMs inside VMs

Once we had a control plane, the next question was what kind of machines it should add. The usual way to run Firecracker on AWS is a .metal instance. This means you rent the whole physical server, and Firecracker runs directly on it. We chose regular EC2 instead. Regular EC2 machines are faster to get and cheaper to keep around.

Our hosts boot from a pre-built image and start serving browsers about 30 seconds after launch. The faster we can add a host, the less idle capacity we need to pay for, and the lower the cost we pass on to our customers.

The catch is that regular EC2 is already a VM. AWS runs our host inside its own isolation layer, and then we run browser VMs inside that host. In other words, every browser is a VM inside a VM. This is not the normal way of using Firecracker. When a browser VM needs help from the host, the request passes through two VM layers instead of one, adding latency.

We decided the tradeoff was worth it, as regular EC2 gives us faster scale-up and lower cost. To mitigate the effects of nested virtualization, we focused on making Firecracker as speedy as possible.

[Figure: On regular EC2, the browser VM sits above an extra AWS hypervisor layer, so a page fault can cross both VM layers.]

From request to usable browser

When a user asks for a browser, the control plane picks a machine with room. That machine restores a saved browser VM, starts Chromium inside it, waits until Chromium is ready to be controlled, and returns a connection URL. That URL is what the user's agent connects to.

Browser Use controls Chromium over a WebSocket using the Chrome DevTools Protocol, or CDP. CDP is the remote-control API for Chrome: click this button, type this text, read this page, take this screenshot.

[Figure: Every session in five steps: the control plane picks a host, the VM resumes from a snapshot, Chromium launches, and once it is CDP-ready, the agent connects over a WebSocket.]

Three things made this take longer: restoring the VM's memory, launching Chromium, and keeping the browser stealthy and undetected by anti-bot security.

The first slowdown: memory

The first bottleneck was memory. A production browser is not booted from scratch. We resume it from a snapshot: a saved VM that is already booted and paused just before Chromium launches. Resuming a VM is much faster than booting one.
Our first resumes were still too slow. When a restored VM touches memory for the first time, the host has to map that memory back in. This event is called a page fault. In a nested VM, each page fault is expensive because it can cross both VM layers. During an early cold start, page faults were 72% of all VM exits, the moments when the browser VM has to pause so the host can handle work for it. Getting from resume to a CDP-ready browser took 9.8 seconds.

The fix was to map memory in larger chunks. Before, the VM restored memory in 4KB pages. Now, it uses 2MB pages. Each page covers 512 times more memory, so the browser triggers far fewer page faults while it wakes up. Fewer page faults mean fewer trips through the nested VM layers.

[Figure: Mapping memory in 2MB pages made browsers start more quickly.]

We also now handle page faults ourselves with a custom handler for userfaultfd, a Linux API for handling missing memory pages. Before the VM starts running, our handler loads the memory Chromium is most likely to access first. This keeps Chromium from receiving a flood of page faults as it starts. The host has already loaded the hot pages, and the remaining pages arrive before the browser needs most of them.

These changes cut the time from resuming the VM to having a browser ready to accept commands from 9.8 seconds to 3.1 seconds. They also cut the number of times the browser VM had to stop and ask the host to handle missing memory from roughly 100,000 per resume to about 1,100, about a 91x drop.

We made smaller refinements, too. The VM was spending 500ms looking for an old PS/2 keyboard that didn't exist. We disabled this check.

Additionally, we changed how the host waits for the browser to become ready. Before, the host kept polling the VM with HTTP requests, and each poll created extra VM exits. Now, the browser driver writes a ready message to its log, and the host reads that log over vsock, a fast communication channel between the host and the VM. The host sees the ready message in under a millisecond.

The second slowdown: Chromium startup

The next bottleneck was CPU.
When Chromium starts, it is hungry and demanding. It creates renderers, compositors, and V8 isolates at once. After that, browser automation is much quieter. An agent clicks, waits, reads, clicks again.

Because Chromium is quieter once it has started, we can pack many browsers onto the same host. Browsers spend most of their time waiting: for a page, a network response, or the next agent action.

[Figure: Browser VMs packed onto a single regular EC2 host.]

We handle the launch burst in two phases. While a browser resumes and Chromium starts, we leave its virtual CPUs unpinned. That means Linux can move the browser's CPU work across the host instead of locking it to fixed cores. This spreads the burst out. Once the browser reports that it's ready, we pin those virtual CPUs to stable cores, so the browser VM runs on specific cores from then on.

Stable placement lets us pack more browsers onto the same host without guessing. We know which cores are taken, which ones still have room, and which browsers might interfere with each other. The launch phase is like letting a crowd enter through every open door. Once everyone is inside, assigned seats work better.

Pinning from the start made things worse. When many browsers launched at once, they piled onto the same hot cores, and some launches failed.

We also became careful about hyperthreads. A physical CPU core often appears as two logical CPUs, called sibling threads. Those siblings still share the same physical core. If two browser VMs each get one sibling, they fight over the same core.
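
One way to avoid that contention is to hand out whole physical cores at pin time, so two VMs never hold sibling threads of the same core. The sketch below illustrates that bookkeeping under stated assumptions and is not a description of what Browser Use runs: the `CoreAllocator` class, its method names, and the topology dictionary are hypothetical. It only models the decision; on Linux the pin itself would be applied with a call such as `os.sched_setaffinity`.

```python
from collections import defaultdict
from typing import Dict, List


class CoreAllocator:
    """Tracks free physical cores and pins each VM to whole cores only."""

    def __init__(self, cpu_to_core: Dict[int, int]) -> None:
        # cpu_to_core maps logical CPU id -> physical core id (hypothetical input).
        grouped: Dict[int, List[int]] = defaultdict(list)
        for cpu, core in sorted(cpu_to_core.items()):
            grouped[core].append(cpu)
        self.free: Dict[int, List[int]] = dict(grouped)
        self.owned: Dict[str, Dict[int, List[int]]] = {}

    def pin(self, vm_id: str, vcpus: int) -> List[int]:
        """Reserve whole cores for a VM; meant to run once the browser is ready."""
        taken: Dict[int, List[int]] = {}
        cpus: List[int] = []
        for core in sorted(self.free):
            if len(cpus) >= vcpus:
                break
            taken[core] = self.free[core]
            cpus.extend(self.free[core])
        if len(cpus) < vcpus:
            raise RuntimeError("not enough free physical cores")
        for core in taken:
            del self.free[core]
        self.owned[vm_id] = taken
        # A leftover sibling on the last core stays idle instead of being shared.
        return cpus[:vcpus]

    def release(self, vm_id: str) -> None:
        """Return a finished VM's cores to the free pool."""
        self.free.update(self.owned.pop(vm_id))


# Two physical cores, each exposing two sibling threads.
allocator = CoreAllocator({0: 0, 1: 1, 2: 0, 3: 1})
first = allocator.pin("vm-a", vcpus=2)   # [0, 2]: both siblings of core 0
second = allocator.pin("vm-b", vcpus=2)  # [1, 3]: both siblings of core 1
```

The cost of this policy is that a VM with an odd number of virtual CPUs leaves one sibling unused. Whether that trade is worth it depends on how much the two siblings actually interfere under browser workloads.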