Project Valhalla, Explained: How a Decade of Work Arrives in JDK 28 - JVM Weekly vol. 180
On June 15, Oracle engineer Lois Foltan confirmed what a good chunk of the industry had stopped believing: JEP 401, Value Classes and Objects, will be integrated into the main OpenJDK repository and is targeting JDK 28.

The change is so large that the other committers were asked to hold off on bigger commits during the integration. The pull request alone adds over 197 thousand lines of code across 1,816 files.

Before we pop the champagne, though: this is a preview feature, disabled by default, and, as Brian Goetz was quick to cool everyone down, “only the first part of Valhalla.” Goetz added a great observation that the “they’ll never ship it” crowd will now smoothly switch over to “but they didn’t ship the most important part” (and a joke has been going around the community for years that we’ll sooner end up in Valhalla ourselves, the Norse-afterlife one, than see the project ship).

You have to earn your own haters.

So this is a good moment to tell the whole story. This issue is one big deep-dive, written on the assumption that you’ve never followed the work on Valhalla before: from the 2014 problem, through the evolution of ideas (a fair number of which ended up in the trash), all the way to what exactly we’ll be getting our hands on in JDK 28. Brew yourself a coffee. I’ve been sitting on this edition for a long time, saving it for exactly this occasion.

The slogan Valhalla has carried from the start is: “codes like a class, works like an int.” In a single sentence it captures the whole point of the project: we want to write normal, readable classes with methods, constructor validation, and sensible field names, but we want the JVM to be able to treat them as efficiently as primitives.

To understand why this is a problem, you have to go back to Java’s foundation. In this language, with the exception of the eight primitives (int, long, double, boolean, and the rest), everything is a reference type. When you write Point p = new Point(1, 2), the variable p isn’t a point.
The variable p is a pointer, a coat-check number: somewhere on the heap sits an object, and you’re holding a slip of paper with its address.
Every time you want to read a field, the JVM has to “go to the coat check,” performing a hop through the pointer (pointer indirection).

For a single object, that’s nothing. The problem starts at scale. Every object on the heap has its own header: a dozen or so bytes of metadata that tell the JVM, among other things, what type the object is and whether anyone is synchronizing on it. Incidentally, this is exactly the problem Project Lilliput has been tackling lately, helping to shrink object header sizes. But header size isn’t everything. Every object has to be allocated, and later garbage collected. And since objects are scattered across the heap, an array of a million Points is in practice a million slips of paper pointing at a million boxes strewn across the whole warehouse.

Brian Goetz, in his “State of Valhalla” documents, calls such a memory layout “fluffy”: puffed up, bloated. What we dream of is a dense layout, one where the data lies side by side.

Why does density matter? Because the hardware changed faster than Java did. In 1995, a memory access cost roughly the same as a CPU operation. Today the CPU is two orders of magnitude faster than main memory, and the whole gap is bridged by the cache. The processor reads memory in chunks called cache lines (usually 64 bytes). If the data lies densely and in order, one such chunk brings in a ton of useful values at once. If we’re hopping across pointers, every access risks a cache miss, and that can be a hundred times slower than a hit. This is locality of reference, and it’s the real stake in this whole game.

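
To make the fluffy-versus-dense contrast concrete, here is a minimal sketch in today’s Java. The names LayoutSketch, Point, sumObjects, and sumFlat are made up for illustration and don’t come from JEP 401. Both methods compute the same sum; the first walks an array of references to separate heap objects, the second walks primitive arrays whose elements sit next to each other.

```java
final class LayoutSketch {
    // Ordinary identity class: every instance is its own heap object with a header.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // "Fluffy" layout: the array holds references, so reading p.x and p.y
    // means following a pointer to wherever that Point lives on the heap.
    static long sumObjects(Point[] points) {
        long sum = 0;
        for (Point p : points) {
            sum += p.x + p.y;
        }
        return sum;
    }

    // Dense layout, encoded by hand: the coordinates live side by side
    // in two primitive arrays, with no per-element header or pointer.
    static long sumFlat(int[] xs, int[] ys) {
        long sum = 0;
        for (int i = 0; i < xs.length; i++) {
            sum += xs[i] + ys[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Point[] points = new Point[n];
        int[] xs = new int[n];
        int[] ys = new int[n];
        for (int i = 0; i < n; i++) {
            points[i] = new Point(i, 2 * i);
            xs[i] = i;
            ys[i] = 2 * i;
        }
        // Same data, same answer; only the memory layout differs.
        System.out.println(sumObjects(points) == sumFlat(xs, ys)); // prints true
    }
}
```

No timings here on purpose: how big the difference is depends on the heap layout, the garbage collector, and the JIT, so treat this as a picture of the two layouts rather than a benchmark.
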
“But the JVM has escape analysis,” someone sharp will say. True: the virtual machine can recognize that some object never “escapes” beyond a local fragment of code, and then it doesn’t allocate it at all. From the programmer’s point of view it looks as if the object exists, but in reality its fields get spread out into ordinary variables or CPU registers. In the best case, the cost of allocation and the later cleanup by the garbage collector drops to practically zero.

The trouble is that this optimization is unpredictable and fragile. It works only when the JIT compiler can trace the object’s entire flow with high confidence. But all it takes is for the object to land in a field of another class, get stored in an array, get passed into a more complex method, or appear beyond the boundary of code the JIT can analyze, and the whole trick stops working. The source code stays identical, but the performance behavior can change dramatically.

This is precisely why experienced JVM programmers treat escape analysis as a nice bonus, not a project’s foundation. If an application’s performance depends on whether a particular JIT version manages to apply this optimization, it’s very easy to fall into the trap of hard-to-predict regressions. A minor refactor, a JDK update, or a change in code structure can send objects back onto the heap, and the costs of allocation and garbage-collector work return in full force.

That leaves the brute-force option: give up on objects and encode the data by hand. Instead of a Color class, hold three bytes r, g, b. This isn’t just an academic example. The approach has been used for years in game engines, graphics libraries, image-processing systems, databases, analytics engines, and HPC code, where every byte of memory and every allocation matters. The trouble is that the speed comes at the cost of safety and readability. We lose names, private state, validation, and methods.

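
Here is what that hand-encoding can look like, as a rough sketch. PackedColorDemo, pack, and the Color record are hypothetical names of mine, not anything taken from JEP 401 or a real library. The packed version is compact, but nothing stops an out-of-range channel from spilling into its neighbor, while the record rejects the same input in its constructor.

```java
final class PackedColorDemo {
    // Hand-encoded color: three channels packed into one int as 0xRRGGBB.
    // Fast and compact, but there is no validation and no type to protect it.
    static int pack(int r, int g, int b) {
        return (r << 16) | (g << 8) | b;
    }

    static int red(int rgb)   { return (rgb >> 16) & 0xFF; }
    static int green(int rgb) { return (rgb >> 8) & 0xFF; }
    static int blue(int rgb)  { return rgb & 0xFF; }

    // Class-based alternative: named components plus constructor validation.
    record Color(int r, int g, int b) {
        Color {
            if (r < 0 || r > 255 || g < 0 || g > 255 || b < 0 || b > 255) {
                throw new IllegalArgumentException("channel out of range 0..255");
            }
        }
    }

    public static void main(String[] args) {
        // 300 does not fit in 8 bits, so the extra bit leaks into the red channel.
        int bad = pack(0, 300, 0);
        System.out.println("red=" + red(bad) + " green=" + green(bad)); // red=1 green=44

        // The record refuses the same input instead of silently corrupting it.
        try {
            new Color(0, 300, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The record is the safe version, but in today’s Java every Color instance is a separate heap object, which is exactly the cost the packed int avoids.
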
JEP 401 gives a simple example: a developer working on “raw” color bytes might mistakenly interpret them as BGR instead of RGB, swap red with blue, and quietly corrupt the entire image. A class wouldn’t have allowed it. A bare int? Sure it would.

And it’s exactly this dichotomy, either convenient classes or fast primitives, that Valhalla is trying to erase.

Officially, Project Valhalla started in 2014. James Gosling described it at the time as “six PhDs tied into a single knot,” and that was no exaggeration. Interestingly, the idea is older than the project itself: Java’s creators wanted value types as early as the first version of the language, but in 1995 they gave up, because the problem was too hard.

The goal was set ambitiously: to restore alignment between the programming model and the performance characteristics of modern hardware. In other words, to let programmers declare their own types that are flat and dense in memory like primitives, but look and behave like normal classes.

Easier said than done. Over the following years the team built five different prototypes, each probing a different aspect of the problem. And this is where the most interesting part of the story begins, because to appreciate Valhalla’s current shape, you have to see how many ideas died along the way.

The early prototypes went in a direction we now call “Q World.” It assumed that the new value types were a fundamentally different beast from objects, with separate type descriptors, separate bytecodes, and separate top types, exactly like primitives. Sounds logical: if they’re supposed to work like int, let them be represented like int. The trouble is that such a separation flooded the entire JVM type system with extra complexity: everything had to be done in two variants.

The breakthrough came with a prototype christened “L World” (around 2019).
The name comes from the fact that value types started sharing the “L carrier” with object references (the L type descriptor, the one the JVM already uses for ordinary references). The team expected such a unification to be too hard, and yet, to their own surprise, it worked without major compromises and incidentally solved a whole pile of problems from the earlier rounds.

L World produced one more fundamental “aha” that shaped everything that came after: the language model and the JVM model don’t have to overlap one hundred percent.
L World is the right model for the virtual machine, but you can treat it as a translation target and offer the programmer something more convenient in the language. This separation of layers turned out to be the key to the rest of the project.

That’s also when the plan to split the work into two phases crystallized: first value classes (still called something else at the time, more on that shortly), and only then, specialized generics. We’ll come back to generics in section 6, because that’s a separate, longer treatise.

If you’ve ever tried to read about Valhalla and bounced off a wall of contradictory terms, it’s not your fault. The naming changed several times here, and not cosmetically: behind each name change stood a change in the model. Let’s trace it, because it’s the best illustration of how this feature was designed.

Stage 1: value types. The earliest term. Vague, because it wasn’t yet clear what exactly these things were supposed to be.

Stage 2: inline classes. Around 2019–2020 a distinction settled in that has survived to this day in its essence: classes split into identity classes (the ones with identity, that is, everything we’ve known until now) and the new inline classes (without identity). That’s when the slogan “codes like a class, works like an int” was coined, and the basic constraints were set: inline classes are final by default, their fields are final, you can’t synchronize on them.

Stage 3: “primitive classes” and the two-projection model. And here it gets interesting, because this is exactly the idea that got significantly cut down. In the 2021 “State of Valhalla” documents, Valhalla promised three things: value objects, primitive classes, and specialized generics. The idea for a “primitive class” was that a single type would have two projections: a value variant (flat, never null, behaving like a primitive) and a reference variant (a box that allows null).
Across various iterations this was written as Point.val/Point.ref, and later they experimented with the Point! and Point? syntax.

The model was powerful, but also mentally heavy. A programmer would have to juggle two forms of the same type day to day and understand when a conversion between them happens.
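
Neither Point.val/Point.ref nor Point!/Point? is something you can compile today, so I won’t pretend to show that syntax. The closest thing in current Java is the int/Integer pair, which gives a feel for the juggling: one form is never null, the other is a nullable box, and the conversion between them happens implicitly. This is only an analogy, and TwoFormsAnalogy and its methods are names I made up for the sketch.

```java
final class TwoFormsAnalogy {
    // Going from the nullable reference form to the value form safely
    // requires an explicit null check before the implicit unboxing.
    static int unboxOrDefault(Integer boxed, int fallback) {
        return boxed == null ? fallback : boxed;
    }

    // Without that check, the implicit conversion fails at run time.
    static boolean unboxingNullThrows() {
        Integer boxed = null;
        try {
            int value = boxed; // implicit reference-to-value conversion
            return false;      // not reached: the line above throws
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int flat = 42;        // value form: can never be null
        Integer boxed = flat; // implicit value-to-reference conversion (boxing)

        System.out.println(unboxOrDefault(boxed, 0)); // prints 42
        System.out.println(unboxOrDefault(null, 0));  // prints 0
        System.out.println(unboxingNullThrows());     // prints true
    }
}
```

Now imagine having to keep this distinction in mind for every type you declare, not just for eight primitives, and the “mentally heavy” verdict starts to make sense.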