Salah Adawi

Composition Shouldn't be this Hard — Cambra

I’ve spent over a decade building data infrastructure: observability systems at Twitter, streaming data processing systems at Google, declarative data processing systems at Snowflake. From the beginning, I noticed a strange gap between the conceptual elegance of programming languages and databases, and the reality of developing and operating real systems using them. That reality is filled with tedium and stress. All of the systems I’ve ever worked on have felt brittle in one way or another: hard to change, and easy to break. Infrastructure engineers develop paranoia around change. We invest more effort testing and deploying changes than making them. We call it maturity, but I’ve never stopped questioning it.

There must be a way to delegate the tedium to our tools and focus on what attracted us to this field: brainstorming ideas, trying them out, and seeing their effects. But what’s missing, exactly? Decades of effort by thousands of brilliant minds have gone into the field of computing, much of it directed at closing the gap between accidental and inherent complexity. Surely some major innovation in the foundation isn’t just waiting to be discovered—wouldn’t someone have found it already? Maybe. But maybe not. The structure of modern abstractions points to a specific opportunity: the status quo forces a choice between powerful tools and general-purpose tools. This feels like a false dichotomy. There’s no reason we can’t have both—if we can find the right model.

After years of searching, I think I’ve found a model that can break out of this tradeoff. Implementing it is more than I can do alone, which is why my cofounders, Daniel Mills and Skylar Cook, and I are starting Cambra. We are developing a new kind of programming system that rethinks the traditional internet software stack on the basis of a new model. Our goal: make developing internet software feel like working on a single, coherent system, not wiring together a fragmented mess of components.
In what follows, I will explain why models matter, how fragmentation undermines them, and why building multi-domain coherent systems is both possible and necessary.

Models Give You Superpowers

Computers are magic. They let abstract concepts manifest in and affect the real world. A spreadsheet formula updates a budget, and you decide whether you can afford that shiny new thing. A routing algorithm computes the shortest path, and you arrive at your destination.
A database records a transaction, and money moves between bank accounts.

Every computer program works in terms of a model: an abstract way to represent the world in simplified terms. Models allow programs to ignore the overwhelming complexity of reality, and instead focus on the parts of the world that are essential to the programmer’s goal. At its most reductive, a program is a loop that receives input, updates internal state, computes consequences, and sends output. However, that oversimplification masks a deep truth: the choice of model has a huge impact on which programs are feasible to develop and maintain.

In other words, there are better models and worse models. Better models rely on intuitive, well-behaved concepts and give you useful rules about how to create programs and reason about their behavior. Great models give you superpowers. They don’t just make programs easier to read and write. They make them easier to reason about. They make it possible to create tooling that can verify, optimize, and refactor programs automatically.

So why don’t we just use great models all the time? To answer that, we need to start at the bottom. All modern computer programs ultimately work in terms of the same foundational model: bits stored in memory and instructions to manipulate them. But this model is so low-level that it’s hard to map its concepts to the familiar concepts we typically care about. In other words, given a program written in terms of bits and instructions, it’s very difficult to infer its purpose. Conversely, given an intuitive specification of a program’s effects on the real world, it’s very difficult to map this specification to a “bits and instructions” program.

To make this mapping easier, we build higher-level models atop this foundation: programming languages, operating systems, databases. Programming in terms of a higher-level model comes with a sacrifice: you give up control over how the program is “lowered” into lower-level terms. But with that loss of control comes a reduction in complexity, which is often a favorable trade. For example, garbage collection allows a programmer to not worry about deallocation, in exchange for giving up control over memory management.

Models form a partially-ordered hierarchy, with a model being higher than those it builds upon and lower than those that build upon it. But higher-level models are not necessarily better suited for implementing a particular program. A better choice is a model whose concepts correspond cleanly to those of the problem domain.
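The reductive "input, state, consequences, output" loop described above can be sketched in a few lines. This is a toy illustration, not any particular system; the running-total example and all of its names are invented for the sketch:

```python
def run(receive, send, state):
    """A program at its most reductive: a loop that receives input,
    updates internal state, computes consequences, and sends output."""
    while True:
        event = receive()                # receive input
        if event is None:                # end of input: stop
            return state
        state = update(state, event)     # update internal state
        send(consequences(state))        # compute and send output

# A toy model: the state is a running total, and events are numbers.
def update(total, event):
    return total + event

def consequences(total):
    return f"total is now {total}"

# Drive the loop from a list of events.
events = iter([1, 2, 3])
outputs = []
final = run(lambda: next(events, None), outputs.append, 0)
# final == 6; outputs records one consequence per input
```

The interesting part is everything this sketch hides: which concepts `state`, `event`, and `consequences` stand for is exactly the choice of model.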
Working within a domain-aligned model makes it easier to convert back and forth between requirements and implementation.

Much of the value of models comes from tooling. Tooling can help us ensure correctness, improve performance, and evolve our systems over time. But tooling works in terms of a specific model, and only has leverage over the concepts in that model. For example, consider what an OS-level tool like top can tell you about your program: resource consumption, uptime, network throughput, etc. It cannot do the things that are possible for a language-level tool like gdb, which works in terms of C’s programming model. But since tooling only helps within its model, if you frequently need to “drop down” to a lower level, you lose those benefits.

The best higher-level models are ones where you rarely need to drop down. We call these models sealed: they provide an abstraction that doesn’t leak its internal details often. The modern world has many examples of ubiquitous, sealed models: it’s rare to find programs written directly in assembly, that implement their own operating system, or that manage state without a database. Once a model becomes sealed, efforts bifurcate: some people develop programs in terms of that model, others develop programs that implement it.

This is the ideal: work within a sealed, domain-aligned model, and let tooling handle the boring stuff. But what happens when the system you’re building doesn’t fit within a single model?

Interoperability Causes Fragmentation

Modern software systems are assembled from components: databases, caches, queues, services, frontends. In principle, this is empowering—you take components off the shelf, wire them together, and have a sophisticated system. In practice, the process is often frustrating:

It’s tedious. The job of so many software developers in the last decade has come to involve an inordinate amount of configuration management and quality assurance, at the cost of the creativity and ingenuity that attracted us to the field.

It’s inflexible. Once you’ve chosen some components and wired them together, changing the capabilities of your system is quite difficult, as modifying or swapping components is often very hard.

It’s error-prone. Ensuring that components are wired together correctly is the developer’s responsibility, with only limited tooling available to assist. Bespoke testing frameworks abound, but they invariably fall short.
It’s unperformant. Priorities are (rightly) driven by the need to minimize development cost and mitigate deployment risk. As a result, performance rarely receives much attention, and often degrades over the lifetime of a system.

So, the systems we build often end up brittle. But why? Is it a necessary consequence of building complex systems? We don’t think so. We think it happens for a specific reason.

Each component has an internal model—the concepts it uses internally. But components also need to interact with each other, and often use a different, lower-level model for those interactions. A library interacts in terms of the same model as your code. A microservice exposing an API does not. When we build a system out of components, the model we use to reason about the system is determined by these interaction models, not the internal models. When components use a lower-level model to interact, the whole system is forced down to that level.

In internet software, systems are overwhelmingly forced into what we call the “networks and operating systems” model: computers, processes, memory, network addresses, packets. These are powerful abstractions, but they’re far removed from what we actually care about. They work in terms of bytes and addresses, not objects, people, places, and actions.

For example, say we write a program and connect it to a relational database. The internal models of the program and database have clean, well-defined semantics, and they allow us to model our domain reasonably well. But the behavior of the system is not easily constrained by the semantics of either model. Instead, we have to think in terms of networks and operating systems to understand any problem that is not entirely contained to one of the components (e.g. “the server process crashed”, “the data encoding is corrupted”, “the connection was dropped”).

There’s a good reason so many components use a different interaction model than their internal model: interoperability.
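The program-plus-database example can be made concrete. The sketch below uses Python’s built-in sqlite3 as a stand-in for any relational database (the table and values are invented for illustration). Neither component’s internal model constrains what crosses the boundary, and failures at the boundary surface in operating-system terms:

```python
import sqlite3

# The database's internal model has clean semantics...
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")

# ...but the boundary is not constrained by either model. SQLite's
# type affinity quietly accepts a string where the program's model
# expects an integer -- an impedance mismatch no tool on either
# side of the boundary will flag.
conn.execute("INSERT INTO accounts VALUES (2, 'lots')")
balances = [b for (_, b) in conn.execute(
    "SELECT id, balance FROM accounts ORDER BY id")]
# balances mixes an int and a str; the program must now defend against this.

# And boundary failures arrive in OS terms ("unable to open database
# file"), not in terms of accounts or balances.
caught_os_level_error = False
try:
    sqlite3.connect("/no/such/dir/db.sqlite")
except sqlite3.OperationalError:
    caught_os_level_error = True
```

With a networked database the same pattern holds, except the errors are connection resets and timeouts rather than file-open failures.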
There are lots of models out there, with valuable components built using them. But most of those models are incompatible with each other—either because they have incompatible concepts (e.g. programming languages and databases) or because they simply don’t have the concepts we need (e.g. the programs that most programming languages produce run in a single OS process, not across multiple machines). Components with incompatible internal models cannot interact directly—they must drop down to a lower-level, common model.
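Dropping down to a common model usually means flattening domain objects into bytes, which is where each internal model’s guarantees end. A minimal sketch, with an invented `Invoice` type and JSON as the common interaction model:

```python
import json
from dataclasses import dataclass
from datetime import date

@dataclass
class Invoice:          # component A's internal model: typed, structured
    number: int
    due: date

def to_wire(inv: Invoice) -> bytes:
    # A must drop down to the common model: bytes. `date` has no JSON
    # representation, so its meaning is flattened into a string; the
    # type information does not cross the boundary.
    return json.dumps({"number": inv.number,
                       "due": inv.due.isoformat()}).encode()

# Component B, with a different internal model, decodes the bytes.
decoded = json.loads(to_wire(Invoice(42, date(2025, 1, 31))))
# decoded["due"] is now just a string; whether B parses it back into
# a date, and with which rules, is B's problem -- neither internal
# model constrains it.
```

The two components interoperate, but only by agreeing on the lower-level model; everything the richer models knew about `due` being a date has to be re-established by convention on each side.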
This is why the “networks and OSes” model is ubiquitous: it’s powerful, battle-tested, and sufficiently low-level that most components can build atop it. But achieving interoperability this way sacrifices the system-level benefits of working within a domain-aligned model.

The Costs of Fragmentation

Let’s call this kind of system a fragmented system. The distinguishing characteristic of a fragmented system is that it is assembled out of numerous components with incompatible internal models. Fragmented systems are brittle: they are hard to change and easy to break. In practice, that brittleness manifests in many ways.

Contract mismatches:

- Tweak the semantic meaning of an API field, and a downstream service still expects the old meaning—runtime error.
- Microservice A deploys v2, Microservice B still expects v1—runtime error.
- None of these are caught at compile time, because the structure of the overall system isn’t represented anywhere but at runtime.

Cross-component optimizations:

- “Push a filter down”—you want to fetch less data, but it requires changing the API contract at every layer between UI and database.
- “Reorder a join”—changing the order in which lookups are done can massively reduce processing, but might require moving logic between components in a very awkward way.
- Move some logic from app to database (or vice versa)—rewrite it in a different language, re-test, and hope the semantics match.

Ceremony and risk around changes:

- Database migrations: write SQL, write rollback SQL, coordinate deploy order, handle partial failures.
- Changing a shared data model: update the schema, update every service, deploy in the right order and pray, or spend weeks testing with staging environments.

Impedance mismatches:

- The type systems of databases and programming languages are often incompatible, leading to subtle edge cases that are hard to test because they depend on the data actually stored in the database.
- Logic tests and data tests live in separate worlds, even though they’re fundamentally specifying requirements on the same program.
- Your ORM makes relationships easy to traverse, but generates N+1 queries because it doesn’t understand the database.

These are symptoms. What is the underlying cause? In a fragmented system, the developer must reason about behavior in terms of a low-level interaction model. Components are not trivially composable—every time one is added or modified, the implications of that change on other parts of the system are not constrained by that component’s internal model.
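The N+1 pattern mentioned above can be shown without any ORM at all. The sketch below (sqlite3, with an invented authors/posts schema) issues one query for the parent rows and then one per row for the children—the shape an ORM’s lazy relationship traversal typically produces—next to the single-query form that SQL’s own model makes possible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'lin');
    INSERT INTO posts VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

# N+1: one query for the authors, then one query *per author* for
# their posts, because the traversal layer doesn't know about joins.
queries = 0
authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
queries += 1
naive = {}
for author_id, name in authors:
    rows = conn.execute(
        "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
        (author_id,),
    ).fetchall()
    queries += 1                      # one round trip per author
    naive[name] = [title for (title,) in rows]

# The same result in a single query, because SQL's model has joins.
joined = {}
for name, title in conn.execute(
    "SELECT a.name, p.title FROM authors a "
    "JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
):
    joined.setdefault(name, []).append(title)
```

With two authors the naive version costs three queries; with a thousand, a thousand and one. Against an in-process database the difference is invisible, which is exactly why it survives until the component boundary becomes a network.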