Value numbering

B bernsteinbear.com ↗

▲ 38 points • 0 comments • by surprisetalk • 2w ago • HN discussion ↗

Pangram verdict · v3.3

We believe that this document is fully human-written

0 %

AI likelihood · overall

Human

100% human-written 0% AI-generated

SEGMENTS · HUMAN 6 of 6

SEGMENTS · AI 0 of 6

WORD COUNT 1,858

PEAK AI % 0% · §2

Analyzed

Jun 10

backend: pangram/v3.3

Segments scanned

6 windows

avg 310 words each

Distribution

100 / 0%

human / AI fraction

Verdict

Human

Pangram v3.3

Article text · 1,858 words · 6 segments analyzed

Human AI-generated

§1 Human · 0%

Welcome back to compiler land. Today we’re going to talk about value numbering, which is like SSA, but more.

Static single assignment (SSA) gives names to values: every expression has a name, and each name corresponds to exactly one expression. It transforms programs like this:

x = 0 x = x + 1 x = x + 1

where the variable x is assigned more than once in the program text, into programs like this:

v0 = 0 v1 = v0 + 1 v2 = v1 + 1

where each assignment to x has been replaced with an assignment to a new fresh name.

It’s great because it makes clear the differences between the two x + 1 expressions. Though they textually look similar, they compute different values. The first computes 1 and the second computes 2. In this example, it is not possible to substitute in a variable and re-use the value of x + 1, because the xs are different.

But what if we see two “textually” identical instructions in SSA? That sounds much more promising than non-SSA because the transformation into SSA form has removed (much of) the statefulness of it all. When can we re-use the result?

Identifying instructions that are known at compile-time to always produce the same value at run-time is called value numbering.

Eliminating common subexpressions

To understand value numbering, let’s extend the above IR snippet with two more instructions, v3 and v4.

v0 = 0 v1 = v0 + 1 v2 = v1 + 1 v3 = v0 + 1 # new v4 = do_something(v2, v3) # new

In this new snippet, v3 looks the same as v1: adding v0 and 1. Assuming our addition operation is some ideal mathematical addition, we can absolutely re-use v1; no need to compute the addition again.

§2 Human · 0%

We can rewrite the IR to something like:

v0 = 0 v1 = v0 + 1 v2 = v1 + 1 v3 = v1 v4 = do_something(v2, v3)

This is kind of similar to the destructive union-find representation that JavaScriptCore and a couple other compilers use, where the optimizer doesn’t eagerly re-write all uses but instead leaves a little breadcrumb Identity/Assign instruction1.

We could then run our copy propagation pass (“union-find cleanup”?) and get:

v0 = 0 v1 = v0 + 1 v2 = v1 + 1 v4 = do_something(v2, v1)

Great. But how does this happen? How does an optimizer identify reusable instruction candidates that are “textually identical”? Generally, there is no actual text in the IR.

One popular solution is to compute a hash of each instruction. Then any instructions with the same hash (that also compare equal, in case of collisions) are considered equivalent. This is called hash-consing.

When trying to figure all this out, I read through a couple of different implementations. I particularly like the Maxine VM implementation. For example, here is the valueNumber (hashing) and valueEqual functions for most binary operations, slightly modified for clarity:

public abstract class Instruction extends Value { ... }

// The base class for binary operations public abstract class Op2 extends Instruction { // Each binary operation has an opcode and two opearands public final int opcode; // (IMUL, IADD, ...) Value x; Value y;

@Override public int valueNumber() { // There are other fields but only opcode, and operands get hashed. // Always set at least one bit in case the hash wraps to zero.

§3 Human · 0%

return 0x20000000 | (opcode + 7 * System.identityHashCode(x) + 11 * System.identityHashCode(y)); }

@Override public boolean valueEqual(Instruction i) { if (i instanceof Op2) { Op2 o = (Op2) i; return opcode == o.opcode && x == o.x && y == o.y; } return false; } }

The rest of the value numbering implementation assumes that if a valueNumber function returns 0, it does not wish to be considered for value numbering. Why might an instruction opt-out of value numbering?

Pure vs impure

An instruction might opt out of value numbering if it is not “pure”.

Some instructions are not pure. Purity is in the eye of the beholder, but in general it means that an instruction does not interact with the state of the outside world, except for trivial computation on its operands. (What does it mean to de-duplicate/cache/reuse printf?)

A load from an array object is also not a pure operation2. The load operation implicitly relies on the state of the memory. Also, even if the array was known-constant, in some runtime systems, the load might raise an exception. Changing the source location where an exception is raised is generally frowned upon. Languages such as Java often have requirements about where exceptions are raised codified in their specifications.

We’ll work only on pure operations for now, but we’ll come back to this later. We do often want to optimize impure operations as well!

We’ll start off with the simplest form of value numbering, which operates only on linear sequences of instructions, like basic blocks or traces.

Local value numbering

Let’s build a small implementation of local value numbering (LVN). We’ll start with straight-line code—no branches or anything tricky.

Most compiler optimizations on control-flow graphs (CFGs) iterate over the instructions “top to bottom”3 and it seems like we can do the same thing here too.

§4 Human · 0%

From what we’ve seen so far optimizing our made-up IR snippet, we can do something like this:

initialize a map from instruction numbers to instruction pointers for each instruction i if i wants to participate in value numbering if i’s value number is already in the map, replace all pointers to i in the rest of the program with the corresponding value from the map otherwise, add i to the map

The find-and-replace, remember, is not a literal find-and-replace, but instead something like:

instr.opcode = "Assign" instr.operands[0] = replacement

instr.make_equal_to(replacement)

(if you have been following along with the toy optimizer series)

This several-line function (as long as you already have a hash map and a union-find available to you) is enough to build local value numbering! And real compilers are built this way, too.

If you don’t believe me, take a look at this slightly edited snippet from Maxine’s value numbering implementation. It has all of the components we just talked about: iterating over instructions, map lookup, and some substitution.

// Local value numbering BlockBegin block = ...; ValueMap currentMap = new ValueMap(); InstructionSubstituter subst = new InstructionSubstituter();

// visit all instructions of this block for (Instruction instr = block.next(); instr != null; instr = instr.next()) { // attempt value numbering (uses valueNumber() and valueEqual()) // // return a previous instruction if it exists in the map, or insert the // current instruction into the map and return it Instruction f = currentMap.findInsert(instr); if (f != instr) { // remember the replacement in the union-find subst.setSubst(instr, f); } }

This alone will get you pretty far. Code generators of all shapes tend to leave messy repeated computations all over their generated code and this will make short work of them.

Sometimes, though, your computations are spread across control flow—over multiple basic blocks.

§5 Human · 0%

What do you do then?

Global value numbering

Computing value numbers for an entire function is called global value numbering (GVN) and it requires dealing with control flow (if, loops, etc). I don’t just mean that for an entire function, we run local value numbering block-by-block. Global value numbering implies that expressions can be de-duplicated and shared across blocks.

Let’s tackle control flow case by case.

First is the simple case from above: one block. In this case, we can go top to bottom with our value numbering and do alright.

The second case is also reasonable to handle: one block flowing into another. In this case, we can still go top to bottom. We just have to find a way to iterate over the blocks.

If we’re not going to share value maps between blocks, the order doesn’t matter. But since the point of global value numbering is to share values, we have to iterate them in topological order (reverse post order (RPO)). This ensures that predecessors get visited before successors. If you have bb0 -> bb1, we have to visit first bb0 and then bb1.

Because of how SSA works and how CFGs work, the second block can “look up” into the first block and use the values from it. To get global value numbering working, we have to copy bb0’s value map before we start processing bb1 so we can re-use the instructions.

Maybe something like:

value_map = ValueMap() for block in function.reverse_post_order(): local_value_numbering(block, value_map)

Then the expressions can accrue across blocks. bb1 can re-use the already-computed Add v0, 1 from bb0 because it is still in the map.

…but this breaks as soon as you have control-flow splits. Consider the following shape graph:

We’re going to iterate over that graph in one of two orders: A B C or A C B. In either case, we’re going to be adding all this stuff into the value map from one block (say, B) that is not actually available to its sibling block (say, C).

When I say “not available”, I mean “would not have been computed before”.

§6 Human · 0%

This is because we execute either A then B or A then C. There’s no world in which we execute B then C.

But alright, look at a third case where there is such a world: a control-flow join. In this diagram, we have two predecessor blocks B and C each flowing into D. In this diagram, B always flows into D and also C always flows into D. So the iterator order is fine, right?

Well, still no. We have the same sibling problem as before. B and C still can’t share value maps.

We also have a weird question when we enter D: where did we come from? If we came from B, we can re-use expressions from B. If we came from C, we can re-use expressions from C. But we cannot in general know which predecessor block we came from.

The only block we know for sure that we executed before D is A. This means we can re-use A’s value map in D because we can guarantee that all execution paths that enter D have previously gone through A.

This relationship is called a dominator relationship and this is the key to one style of global value numbering that we’re going to talk about in this post. A block can always use the value map from any other block that dominates it. For completeness’ sake, in the diamond diagram, A dominates each of B and C, too.

We can compute dominators a couple of ways4, but that’s a little bit out of scope for this blog post. If we assume that we have dominator information available in our CFG, we can use that for global value numbering. And that’s just what—you guessed it—Maxine VM does.

It iterates over all blocks in reverse post-order, doing local value numbering, threading through value maps from dominator blocks. In this case, their method dominator gets the immediate dominator: the “closest” dominator block of all the blocks that dominate the current one.