Floats Don't Agree With Themselves

D docs.merca.earth ↗

▲ 37 points • 34 comments • by cremer • 3w ago • HN discussion ↗

Pangram verdict · v3.3

We believe that this document is primarily human-written, with some AI-generated content detected

34 %

AI likelihood · overall

Mixed

85% human-written 15% AI-generated

SEGMENTS · HUMAN 2 of 6

SEGMENTS · AI 0 of 6

WORD COUNT 1,883

PEAK AI % 66% · §6

Analyzed

May 8

backend: pangram/v3.3

Segments scanned

6 windows

avg 314 words each

Distribution

85 / 15%

human / AI fraction

Verdict

Mixed

Pangram v3.3

Article text · 1,883 words · 6 segments analyzed

Human AI-generated

§1 Human · 17%

I was debugging a polygon-overlap test that worked locally and failed on the server. Same code. Same input. Different answer.The function deciding the answer was small. Three points A, B, C; return the sign of (B - A) cross (C - A). That sign tells you whether a vertex is convex or reflex, whether a diagonal stays inside the polygon, whether a point is above a line or below. On x86, LLVM had folded the multiply and the subtract into a single fma — one rounding step instead of two. On WASM there is no FMA, so two roundings. A vertex sitting in the epsilon neighborhood of zero fell on opposite sides. From one bit of disagreement, the whole decomposition forked.This wasn't a bug in my code. It was IEEE 754 working as advertised. The standard pins down the storage format. It does not pin down behavior. Reassociation, FMA contraction, x87 registers at 80 bits, denormal flush flags — four separate ways to get a different answer from the same inputs. Tightening tolerances doesn't help: convex decomposition is discrete. A vertex is reflex or it isn't. An epsilon at the input is a binary difference at the output.So I wrote exact-poly, a 2D geometry library that uses no floats at all.A 10-vertex star decomposed into 5 convex parts. Vertex labels (v0–v9) match input order; part labels (p0.0–p5.3) are emitted by the cascade. Sum of part areas equals the ring area, exactly.Try it GitHub — mercaearth/exact-poly Live demo (7 tabs) — exact-poly.merca.earth In production — merca.earth runs exact-poly in WASM client-side and at validation; pick a spot, draw a polygon, watch the same i64 math accept or reject the claim on both sides Why floats aren't the same everywhereIEEE 754 specifies formats and basic operations. The gaps between those operations are where reproducibility leaks out.

§2 Mixed · 42%

EffectMechanismWhere it divergesIntermediate registersx87 FPU keeps values in 80 bits, rounds only on a spill to memoryx86 vs ARM, RISC-V, WASM (no extended precision)Fused multiply-addfma(a, b, c) rounds once instead of twice — more accurate, different resultLLVM enables on one target, disables on the nextReassociation(a + b) + c rewritten as a + (b + c) changes rounding-ffast-math default, often on without intentDenormalsFlush-to-zero avoids the subnormal slowdownPer-process flag, silently flipped"IEEE 754 compatibility" is about how floats are stored, not how they behave. Reproducibility has to be asked for explicitly: one architecture, one toolchain, one set of flags. Cross a process or ISA boundary and the guarantees stop.Why one bit ruins a decompositioncross_sign(A, B, C) is the only signal the algorithm has. Positive means left turn, negative right, zero collinear. From that sign comes reflex vs convex, inside vs outside, above vs below.Near zero — three points nearly collinear — float can return +1e-12 on one machine and -2e-13 on another. Flip that one sign bit and a different vertex is reflex. A different reflex vertex means a different starting cut, which means a different decomposition graph. One bit, dominoes."Round to a micrometer before comparing" doesn't save you. Convex decomposition is discrete: a vertex is reflex or it isn't. Any epsilon zone around zero eventually catches a real polygon, and the behavior forks.The honest fix is not to use floats. With integers the sign is bits, not an approximation. Same input, same answer on x86, ARM, WASM — anywhere i64 exists, and on platforms with no float type at all. Shewchuk's 1996 paper on adaptive precision predicates is what Triangle and CGAL sit on; everyone is patching the same hole. exact-poly takes the most direct route: compute the cross product in i128 with no filter. One extra multiplier per call, no "try double, expand on failure." Cheap and correct, always.Picking a scale"Don't use floats" is simple with an expensive corollary: integer arithmetic doesn't scale itself.

§3 Mixed · 41%

You pick a scale for your problem and pay for the choice.The budget is i64::MAX ≈ 9.2 × 10^18. Two knobs: unit of measure, and the largest coordinate you ever want to handle. Their product hits the ceiling.DomainSCALE1 unitMax coord (i64)Geodesy (Web Mercator)10⁶1 µm±9 × 10¹² m — more than the planetPhysics engine10³1 mm±9 × 10¹⁵ m — solar-system territoryCAD machining10⁹1 nm±9 × 10⁹ m — factory floor, not a cityThere is no right scale. Each is a tradeoff.exact-poly is currently calibrated for geodesy. SCALE = 1_000_000. One meter is a million units; one unit is a micrometer. Consumer GPS is good to about a centimeter on a good day, so a micrometer is four orders finer than anything we measure — not for the user, but as headroom for edge normalization, midpoints, and snapping.Earth's circumference in Web Mercator is 40_075_017 meters, or 4 × 10^13 units. That's the upper bound of the use case, not the library. From there to i64::MAX is five and a half orders of headroom. Every coordinate on the planet fits in i64.SCALE and MAX_WORLD are hardcoded at compile time on purpose. Change them at runtime and two processes running the same code can quietly disagree about what a "meter" is — exactly the failure the library was built to prevent. Forking and recompiling for a different calibration is two minutes of work; runtime configurability is a foot-gun.When i128 shows upThe cross product is (bx - ax)(cy - ay) - (by - ay)(cx - ax). Differences run up to ~8 × 10^13; their product can hit 6.4 × 10^27 — well past i64.i128::MAX is about 1.7 × 10^38, ten orders of headroom.

§4 Mixed · 36%

The rule is one line: coordinates are i64, anything multiplied is i128.// src/signed.rs:21-27pub fn cross_sign(ax: i64, ay: i64, bx: i64, by: i64, cx: i64, cy: i64) -> i128 { let dx1 = (bx as i128) - (ax as i128); let dy1 = (by as i128) - (ay as i128); let dx2 = (cx as i128) - (ax as i128); let dy2 = (cy as i128) - (ay as i128); dx1 * dy2 - dy1 * dx2}Widen before subtract(bx as i128) - (ax as i128), not (bx - ax) as i128. Subtract first in i64 and the multiply overflows; you get garbage in i128 shape. Half the naive integer-geometry implementations I've read get this backwards.Why i128 and not arbitrary-precision rationals like GMP? Because i128 fits in two registers, runs in hardware, allocates nothing. Rationals want a heap and branches. For a cross product the precision needed is bounded: i64 in, product of two differences out, fits in i128. Not infinite, just enough for a proven bound. Same trick Shewchuk uses in filtered predicates: cheap-and-correct first, fall back only if you must. [1]Area as exact equalityThe shoelace formula gives 2 × area. No reason to divide by 2 — every check works on the doubled value, and dividing throws away the low bit. Why throw away a bit you paid for in i128?twice_area is a signed i128. Positive for CCW, negative for CW. The central invariant of the library is this: sum(twice_area(parts[i])) == twice_area(original_ring) ==, not abs(a - b) < eps. Exact integer equality. On floats this invariant is impossible in principle — accumulating N areas perturbs low bits in different ways.

§5 Human · 17%

On integers the parts add up to the whole, bit for bit. If a single bit is off, the decomposer lost or duplicated a microscopic piece, and the test fails.Why decompose at allThe Separating Axis Theorem is a fast, simple test for whether two polygons overlap. It only works on convex polygons. Anything non-convex has to be carved into convex parts and checked pairwise.The simplest non-convex shape is an L. One reflex vertex and SAT no longer applies. A typical territory polygon — a parcel, a frequency band, a sensor footprint — has thirty to fifty vertices and several reflex corners. SAT needs convex parts before it can answer "do these overlap" honestly.A cascade with rotationsAlgorithms for convex decomposition have been around since the eighties. Chazelle published the asymptotically optimal O(n + r²) in 1991, and that ceiling still stands. Implementing it honestly in integer arithmetic is three months of work and an unstable trapezoidal decomposition to keep upright. For practical n < 100, simpler algorithms run faster than Chazelle's implementation runs at all. Textbooks have Chazelle 1991. Production has Bayazit, Keil, and Hertel-Mehlhorn.Each of the simpler algorithms has a weak spot. ExactPartition is a greedy partitioner that prioritizes diagonals from reflex vertices — fast, usually works, gives up on pathological inputs. Bayazit is recursive and can insert Steiner points mid-edge — stronger, but doesn't always converge in a reasonable number of steps. EarClip is the 1985 ear-clipping triangulation; it works on any simple polygon and produces O(n) triangles, which then need merging. That's where Hertel-Mehlhorn (1983) comes in — a merge step with a 4 × OPT guarantee on part count.The trick is not to pick one. It's to chain them. If the first fails, the next picks up. If all three fail, rotate the ring and try again. Defense in depth.Figure 1. The cascade. Each algorithm tries; on failure the next picks up. If all three fail, rotate the ring and retry.The Hertel-Mehlhorn merge step is itself a small example of how cheap integers make things. To merge two triangles across a shared edge, you check that the result stays convex: two cross products with the same sign.

§6 Mixed · 66%

No epsilon, no tolerance.The rotation step sounds like "try again and pray." It is a heuristic — no optimality proof. It works because pathology is usually a function of one vertex, not all of them. The starting index is a lever you can pull without changing the geometry: same area, same perimeter, same set of reflex vertices. Different traversal, different result. Worst case is N²; in practice, almost everything is solved at rotation zero.Three traps in one seamBayazit sometimes inserts a vertex mid-edge — a Steiner point — when no diagonal between existing vertices works. Useful, but it lights up three integer-specific traps.Midpoint without losing a bit(a + b) / 2 on integers throws away the low bit; the point ends up a half-unit off, and collinearity with neighbors breaks. The fix is round_div2 — (a + b) >> 1 with round-half-away-from-zero for negatives.Snap away from endpointsIf a midpoint lands close to an existing vertex, it has to be nudged off, or three points go collinear and break in_cone for every diagonal through them.Synthetic-vertex bookkeepingSteiner points are tagged so they don't drift the input vertex count. They live internally during decomposition and materialize into the outer ring only at commit. Otherwise a downstream validator counts a different number of vertices than the client created, and the system silently disagrees with itself again — exactly the failure mode the integer rewrite was supposed to eliminate.These aren't brilliant ideas. They're three small wedges driven into the same seam, where continuous geometry meets discrete arithmetic. The literature mostly skips them. They appear when a test on a specific polygon refuses to converge.What doesn't workThe honest list, not the marketing one. Self-intersecting input — rejected by is_simple() (O(n²) over non-adjacent edge pairs). A figure-8 returns DecompError::NotSimple. No heroic recovery; fix it on the caller side, e.g. with a Clipper2 boolean preprocess. Polygons with holes — not supported. Outer ring only. Parts above MAX_VERTICES_PER_PART — validator rejects. Not a fallback; a signal that the input doesn't fit the configured limits. Zero-area or collinear three-vertex polygons — rejected after normalize_ring collapses consecutive collinear vertices.