I’ve always found serverless platforms a little magical. You deploy a function, send a request, and somewhere in the world resources are allocated automatically and your code gets executed. But the more I used these platforms, the more I wanted to understand what was actually happening underneath.

So I built a lightweight serverless runtime from scratch using Firecracker microVMs.

This is an honest account of what I learned, including the parts where I had no idea what I was doing.

The full project is on GitHub: https://github.com/vivek1504/serverless-runtime

Why Firecracker?

Traditional containers boot fast but share the host kernel, so their isolation is weaker. Traditional VMs offer strong isolation but boot slowly. Neither is a great fit for serverless, where you need to run arbitrary user-submitted code quickly and safely on shared infrastructure.

Firecracker is a minimal VMM (Virtual Machine Monitor) developed by Amazon, built specifically for Lambda and Fargate. Instead of booting full VMs, it launches microVMs with a stripped-down device model, minimal attack surface, and KVM-based virtualization: fast enough for serverless workloads, isolated enough to run untrusted code.

The Cold Start Problem

The biggest challenge in any serverless system is cold start latency. A naive execution looks like this:

Request → Create VM → Boot kernel → Start runtime → Load user code → Execute handler

Even optimized microVMs take time to boot. In my implementation, a full cold boot clocked in at around 200ms. At serverless scale, that becomes noticeable and painful.

The solution is snapshots. Boot the VM once, initialize the runtime, then take a memory snapshot of that state. For every subsequent invocation, restore from the snapshot instead of booting fresh.

- Full VM boot → ~200ms
- Snapshot restore → ~1–5ms

That’s a 40–200x improvement.
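
To make the snapshot flow above a little more concrete, here is a rough sketch of the three requests involved, assuming the endpoints and field names from Firecracker’s snapshot API documentation (they have shifted between releases, so check them against your version). The helper names and file paths are hypothetical, not taken from the project, and the requests are only built here, not sent to a real API socket.

```python
import json


def pause_vm_request():
    """Pause the microVM; Firecracker snapshots a VM only while it is paused."""
    return ("PATCH", "/vm", {"state": "Paused"})


def create_snapshot_request(snapshot_path, mem_path):
    """Write the VM state and guest memory to two files (paths are illustrative)."""
    return ("PUT", "/snapshot/create", {
        "snapshot_type": "Full",
        "snapshot_path": snapshot_path,
        "mem_file_path": mem_path,
    })


def load_snapshot_request(snapshot_path, mem_path):
    """Restore into a fresh Firecracker process and resume the guest right away."""
    return ("PUT", "/snapshot/load", {
        "snapshot_path": snapshot_path,
        "mem_backend": {"backend_path": mem_path, "backend_type": "File"},
        "resume_vm": True,
    })


def to_http(method, path, body):
    """Serialize one request as raw HTTP/1.1 bytes for the API Unix socket."""
    payload = json.dumps(body).encode("utf-8")
    head = (
        f"{method} {path} HTTP/1.1\r\n"
        "Host: localhost\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(payload)}\r\n\r\n"
    )
    return head.encode("ascii") + payload


# Deploy time: pause, then snapshot. Invoke time: load into a new process.
deploy = [pause_vm_request(), create_snapshot_request("/tmp/fn.snap", "/tmp/fn.mem")]
invoke = [load_snapshot_request("/tmp/fn.snap", "/tmp/fn.mem")]
```

The split between the deploy list and the invoke list mirrors the idea in the text: the expensive boot happens once, and every later invocation only pays for the load request.
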
Most of the work in this project went into making that restore reliable.

System Architecture

The runtime has two layers:

- Control Plane: handles deployment, VM lifecycle, snapshot orchestration, scheduling, and request routing.
- Execution Layer: handles isolated function execution inside Firecracker microVMs.

The two layers communicate over vsock.

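
For a sense of what vsock communication looks like from the host side: Firecracker exposes the guest’s vsock device as a Unix domain socket, and according to its vsock documentation a host-initiated connection begins with a short text handshake (`CONNECT <port>` followed by a newline, answered with `OK <host_port>`). The sketch below assumes that documented handshake. The function names and the socket path argument are hypothetical, and the article does not say whether the project talks to the socket this way or only through its socat bridge.

```python
import socket


def read_line(sock, limit=64):
    """Read up to and including a newline, one byte at a time so that no
    application data after the handshake is consumed by accident."""
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("peer closed the socket during the handshake")
        buf += chunk
        if len(buf) > limit:
            raise ValueError("handshake line too long")
    return bytes(buf)


def vsock_handshake(sock, guest_port):
    """Ask Firecracker to connect us to guest_port; return the host-side port."""
    sock.sendall(f"CONNECT {guest_port}\n".encode("ascii"))
    reply = read_line(sock).decode("ascii").strip()
    if not reply.startswith("OK "):
        raise ConnectionError(f"vsock connect failed: {reply!r}")
    return int(reply.split()[1])


def connect_to_guest(uds_path, guest_port):
    """Open the vsock Unix socket and complete the handshake."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(uds_path)
    vsock_handshake(sock, guest_port)
    return sock
```

If nothing is listening on the guest port, the socket is closed before any `OK` arrives, which `read_line` surfaces as a `ConnectionError` instead of a silent hang.
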
When a function is deployed:

1. The function zip is uploaded
2. A minimal root filesystem is prepared
3. User code is extracted into the VM filesystem
4. Firecracker boots the microVM
5. The runtime initializes inside the VM
6. A snapshot is taken

Future invocations restore directly from the snapshot taken in step 6.

Three Things That Were Harder Than Expected

1. The PID 1 Rabbit Hole

The first time I booted a custom microVM, I got a kernel panic almost immediately. The VM crashed and gave me nothing useful to debug with: no SSH, no shell, just silence.

Linux expects PID 1 to stay alive and take responsibility for the entire userspace. It needs to reap zombie processes, forward signals, and manage process lifecycle. My init script was exiting after launching the runtime, and when PID 1 exits, the kernel panics.

I spent a whole afternoon trying to figure out why my VMs were crashing. The fix was tini, a minimal init that handles zombie cleanup, signal forwarding, and process lifecycle correctly. One binary, many headaches solved.

2. Snapshot Boundaries Are Architectural Decisions

I assumed snapshotting was straightforward: freeze the VM state, save it, restore it later. It’s not.

The tricky part is timing. Snapshot too early, and the runtime isn’t initialized. Snapshot too late, and you’ve captured live socket state, open vsock connections, or in-flight network handles that won’t survive a restore correctly.

I learned this the hard way when restored VMs inherited identical vsock state, causing guest-host communication to collide or hang immediately after restore.

The fix was using a newer Firecracker development build with improved vsock handling during snapshot restore.

3. vsock IPC Race Conditions

Host-to-VM communication uses vsock. It is conceptually simple but, in practice, much harder to implement.

The failure mode I kept hitting was that the guest runtime would start listening on a vsock port before the host-side bridge was ready, or vice versa.
The connection would either fail silently or hang indefinitely. Using socat as a bridge introduced its own synchronization race conditions on top.

The fix required careful lifecycle ordering: host bridge first, then guest listener, with explicit readiness signaling before any IPC traffic. But the bigger lesson was that vsock is only a byte-stream transport. Message framing, connection lifecycle, and error propagation are entirely your responsibility. There are no built-in message boundaries. Partial reads are real.
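
Since a byte stream has no message boundaries, one common way to supply them is length-prefix framing: every message is preceded by a fixed-size length header, and the reader loops until it has exactly that many bytes. Here is a minimal sketch, assuming a 4-byte big-endian length prefix and JSON payloads. The article does not describe the project’s actual wire format, so the header layout, the size cap, and the function names are illustrative only.

```python
import json
import socket
import struct

HEADER = struct.Struct(">I")      # 4-byte unsigned big-endian payload length
MAX_FRAME = 16 * 1024 * 1024      # arbitrary cap so a corrupt header cannot exhaust memory


def recv_exact(sock, n):
    """Loop over recv() until exactly n bytes have arrived (handles partial reads)."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf += chunk
    return bytes(buf)


def send_message(sock, obj):
    """Encode obj as JSON and send it as one length-prefixed frame."""
    payload = json.dumps(obj).encode("utf-8")
    if len(payload) > MAX_FRAME:
        raise ValueError("message too large")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_message(sock):
    """Read one frame: the header first, then exactly the advertised payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    if length > MAX_FRAME:
        raise ValueError("frame length exceeds limit")
    return json.loads(recv_exact(sock, length))
```

The same functions work on any stream socket, so they can be exercised locally with `socket.socketpair()` before pointing them at a vsock connection. Two messages sent back to back come out as two separate objects, which is exactly what a raw `recv()` does not guarantee.
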
I had to design an explicit framing protocol on top.

Runtime Design

Inside each microVM, a lightweight Node.js runtime handles function execution. The lifecycle is deterministic:

request → execute handler → return response → (reset state)

Each invocation receives a request over vsock, calls the exported handler function, and returns the result to the host. The runtime also handles malformed input, execution errors, and JSON parsing failures gracefully.

One of the most important design decisions was warm runtime reuse. Rather than restoring a fresh VM snapshot for every request, I reused warm runtimes across multiple invocations. This trades strict isolation (no shared state between invocations) for dramatically better throughput and latency.

Multi-Tenant Scheduling

Serverless platforms never run just one function. To handle concurrent workloads, I implemented:

- Per-function queues: each deployed function has its own execution queue
- Concurrency limits: prevent any single function from monopolizing VM capacity
- Weighted scheduling: distributes load fairly across functions

Without this, a single bursty function can starve everything else. Scheduling turned out to be a distributed systems problem in miniature: balancing fairness, throughput, and isolation under load.

Benchmark Results

Tested with autocannon, 10 concurrent connections over 30 seconds.

Host: Intel i5-11400H · 8GB RAM · Ubuntu 24.04 · Firecracker v1.16 · Node.js v20

Metric         Result
Throughput     ~5,400 req/s
p50 latency    ~1ms
p99 latency    ~4ms

Key optimizations that got us here:

- Snapshot reuse instead of cold boots
- Persistent runtime reuse across invocations
- Persistent vsock channels instead of per-request connections

Tradeoffs I’d Think About Differently

Isolation vs. throughput: Warm runtime reuse is fast, but it means shared process state between invocations. A stricter model (fresh snapshot restore per request) would improve isolation at the cost of latency.
The right choice depends on your threat model.

Node.js only: The current runtime is intentionally scoped to Node.js handlers. Supporting multiple runtimes would require a more flexible execution contract and significantly more orchestration work.

What This Project Taught Me

Before building this, serverless platforms felt abstract to me, something that just worked.
Now I understand the engineering problems underneath:

- Why cold starts are hard to eliminate
- Why isolation and performance are fundamentally in tension
- Why observability in minimal systems requires deliberate design

It was the most technically difficult project I’ve worked on. It was also the most educational. Every layer (virtualization, Linux boot internals, IPC design, scheduling) required genuinely learning something new, not just applying existing knowledge.

If you’re interested in systems programming, infrastructure engineering, or just want to understand how serverless actually works under the hood, I’d encourage building something like this. Not because it’s practical, but because it’s one of the best ways to develop real systems intuition.

The full code is on GitHub: https://github.com/vivek1504/serverless-runtime