Modern rendering culling techniques | krupitskas 🌦️

K krupitskas.com ↗

▲ 195 points • 43 comments • by krupitskas • 3mo ago • HN discussion ↗

Saints Row: The Third Remastered - my first shipped title. Steelport is a dense open-world city, but the game also has tight indoor corridors, jets, cars, and parachute sequences. Getting culling right across all of that was a real challenge. Shoutout to Timur Gagiev - absolute legend!

Intro

In the modern era of AI coding, “AI game generation”, DLSS 5, Unreal Engine 5, and phenomenal Gaussian Splat demos, people tend to think graphics and games are solved problems. “Just grab AI and start building games within days,” they say. Obviously that’s bullshit. The hard engineering work, knowledge, tradeoffs, and art direction are not going anywhere. Whether your game is 2D or 3D, realistic or cartoonish, set in a closed Mars base or an open-world zombie-infested New York, you still need to optimize it. One of the most important optimizations every game has used, and will keep using, is culling.

Good news: almost 80% of the optimizations I’ve seen over my career boil down to “don’t do extra stupid work when you don’t need to.”

Bad news: you still need to implement culling while balancing scene structure, game design, art direction, hardware limits, and performance budgets.

So this article walks through the main culling techniques used in modern real-time renderers. I’ll group them by category so it’s easier to see how they relate to each other. Almost every one of these techniques deserves its own article, because as always, the devil is in the details.

1. The Basics: Distance, Backface, and Frustum

These are the cheapest and most universally applied techniques. They catch the obvious cases before anything more expensive runs.

Distance Culling

The simplest form: if an object is farther than some max distance from the camera, skip it. That’s it.

This is trivially fast and works well for small props where the visual impact of disappearing is minimal. Most engines let you set a cull distance per mesh or per material.

The tricky part is avoiding visible pop-in. Common mitigations are dithered fade-out, aggressive LOD before the cull point, or impostors (billboards that replace the real mesh at distance).

Screen-size culling is covered in more detail in the Screen Size Culling section below, but it’s worth flagging here: if something projects to only a handful of pixels, it’s often not worth the cost to draw. Distance alone doesn’t catch that cleanly - you also want a screen-space size check.

[Figure: Exaggerated example with a house and small rocks. Small rocks disappearing at distance is barely noticeable, but larger objects popping in and out is hard to miss.]

Backface Culling

This is the first culling technique you’ll usually encounter when working with a graphics API because it’s configured as part of the pipeline state object (PSO) and is one of the easiest wins to enable.

Every triangle has a front face and a back face. For closed, opaque meshes, back faces are never visible because the front faces of the same object always cover them. The GPU can automatically skip them based on winding order, which saves roughly half the rasterization and fragment work for typical geometry.

[Figure: Triangle winding order]

[Figure: Rotating icosahedron showing viewer-facing triangles being rasterized while the others are skipped]

One thing worth knowing: in a traditional vertex + fragment pipeline, backface culling happens after the vertex shader has already processed the vertices. So you don’t save vertex work, only rasterization and fragment work. In more GPU-driven pipelines, you can move this decision earlier, for example in compute or task/amplification work that culls meshlets before they ever reach rasterization.

Backface culling is mostly free, but it’s worth understanding because it interacts with transparency, two-sided materials, and some culling algorithms that exploit it explicitly.

Frustum Culling

[Figure: Top-down view: objects outside the frustum are culled and never submitted to the rendering pipeline]

For a perspective camera, the view frustum is the truncated pyramid-shaped volume that represents what the camera can see. Anything outside of it doesn’t need to be rendered. Frustum culling tests objects, usually via bounding volumes like spheres or AABBs, against the six planes of the frustum and skips anything that lies entirely outside.

This is almost always the first pass in a culling pipeline, or second after distance culling.
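To make the plane test concrete, here is a minimal C++ sketch of a sphere-vs-frustum check. The `Vec3`, `Plane`, `Sphere`, and `Frustum` types and the `makeBoxFrustum` helper are made up for this example and don't come from any particular engine. It assumes plane normals are normalized and point into the frustum. The box-shaped frustum is only there so the numbers are easy to check by hand; a real perspective camera would typically derive its six planes from the view-projection matrix.

```cpp
#include <array>

// Hypothetical minimal math types, for illustration only.
struct Vec3 { float x, y, z; };

// Plane stored as dot(n, p) + d = 0, with the unit normal n pointing INTO the frustum.
struct Plane { Vec3 n; float d; };

struct Sphere { Vec3 center; float radius; };

using Frustum = std::array<Plane, 6>; // left, right, bottom, top, near, far

inline float signedDistance(const Plane& p, const Vec3& v) {
    return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d;
}

// Returns false only when the sphere lies entirely on the outer side of some plane.
// Conservative: a sphere just outside a frustum corner may still be reported as visible.
inline bool sphereIntersectsFrustum(const Frustum& f, const Sphere& s) {
    for (const Plane& p : f) {
        if (signedDistance(p, s.center) < -s.radius) {
            return false; // fully outside this plane, so it cannot be visible
        }
    }
    return true;
}

// Assumption for the example: an axis-aligned box "frustum" looking down +z.
inline Frustum makeBoxFrustum(float halfWidth, float halfHeight, float nearZ, float farZ) {
    return Frustum{{
        {{ 1.0f,  0.0f,  0.0f}, halfWidth},   // left:   x >= -halfWidth
        {{-1.0f,  0.0f,  0.0f}, halfWidth},   // right:  x <=  halfWidth
        {{ 0.0f,  1.0f,  0.0f}, halfHeight},  // bottom: y >= -halfHeight
        {{ 0.0f, -1.0f,  0.0f}, halfHeight},  // top:    y <=  halfHeight
        {{ 0.0f,  0.0f,  1.0f}, -nearZ},      // near:   z >=  nearZ
        {{ 0.0f,  0.0f, -1.0f}, farZ},        // far:    z <=  farZ
    }};
}
```

With `makeBoxFrustum(10, 10, 1, 100)`, a unit sphere at (0, 0, 5) is kept and one at (12, 0, 5) is culled. A sphere that straddles a plane is kept, which is the safe direction to be wrong in.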

Frustum culling is fast, cheap, and can cut a huge chunk of the scene in one shot, especially in open worlds where large portions of the map are behind or beside the camera.

[Figure: This is how Horizon Zero Dawn’s frustum culling works]

Notice in the gif above that big objects like mountains are still rendered even when they’re almost outside the frustum. This is the core tradeoff with object-level culling: many small objects give you fine-grained culling opportunities but each one is a draw call and a CPU-side visibility test. A handful of large objects is cheap on draw calls, but you’re stuck rendering the whole thing even when 90% of its triangles are offscreen - and you pay vertex shader cost for all of them, since the rasterizer clips after vertex shading, not before. That wasted vertex work on off-screen geometry is exactly the problem meshlet culling in section 4 solves.

2. Occlusion Culling

Occlusion culling tells you what’s behind other things. It’s harder but often gives you the biggest win in dense scenes like cities or interiors.

[Figure: This is only occlusion culling. Note how the boxes behind the house disappear on the right; when we peek around the corner, some of them come back.]

Hardware Occlusion Queries

All major graphics APIs expose occlusion-query-style features. Direct3D 12 has query heaps, Vulkan has occlusion queries, and Metal has visibility result buffers. The idea is the same: render proxy geometry, typically the object’s bounds, and count whether any samples passed the depth test. Zero visible samples means the proxy was fully occluded from that view, so the real object can usually be skipped.

In DX12 you’d use D3D12_QUERY_TYPE_BINARY_OCCLUSION, which returns just 0 or 1 rather than an exact sample count - cheaper and enough for culling:

    // setup (once)
    D3D12_QUERY_HEAP_DESC desc = { D3D12_QUERY_HEAP_TYPE_OCCLUSION, objectCount };
    device->CreateQueryHeap(&desc, IID_PPV_ARGS(&queryHeap));

    // per frame - render proxy, wrap with query
    cmdList->BeginQuery(queryHeap, D3D12_QUERY_TYPE_BINARY_OCCLUSION, objectIndex);
    cmdList->DrawIndexedInstanced(...); // draw bounding box
    cmdList->EndQuery(queryHeap, D3D12_QUERY_TYPE_BINARY_OCCLUSION, objectIndex);

    // resolve to a readback buffer (still on GPU timeline)
    cmdList->ResolveQueryData(queryHeap, D3D12_QUERY_TYPE_BINARY_OCCLUSION, 0, objectCount, readbackBuffer, 0);

The catch is latency and synchronization. Results only become visible to the CPU after the GPU finishes, so in practice you often read frame N’s results while rendering frame N+1. That one-frame lag is usually acceptable, but it can briefly keep rendering something that just became occluded, or skip something that just became visible.

Software Occlusion Culling (CPU)

Instead of asking the GPU, you rasterize a low-resolution depth buffer on the CPU and test objects against it. Intel’s Masked Software Occlusion Culling (MSOC) is probably the most well-known implementation here. It uses SIMD to rasterize triangles in 8x4 pixel tiles and can process millions of triangles per second.

The upside is zero readback latency since it all happens on the CPU before you submit anything to the GPU. The downside is CPU cost and the need to maintain a separate simplified occluder mesh, since you can’t afford to rasterize your full scene geometry.

[Figure: Battlefield 3 - final rendered scene]

[Figure: Battlefield 3 - the same scene rasterized by the CPU software occluder.]

The software occluder result is intentionally coarse - over-culling something that’s actually visible is worse than under-culling something that should have been skipped.

Hi-Z (Hierarchical Z-Buffer)

Hi-Z is a mip chain of the depth buffer, often called a depth pyramid, where each level stores a conservative depth value for a larger region of the screen.

[Figure: Visualization of a very simple 3D scene]

To test whether an object is occluded, you project its bounds to screen space, choose the mip level that roughly matches its footprint, and compare the object’s nearest depth against the pyramid. For a conventional LESS depth test this pyramid often stores the maximum depth in each region; with reversed-Z it is typically the minimum. The important part is that the representation stays conservative. If the test says “occluded”, you can safely skip the object. If not, you keep it. Good implementations would rather report a hidden object as visible than report a visible object as occluded.

[Figure: Hi-Z pyramid]

This is the basis for most GPU-driven occlusion culling today. It’s fast to build and query, and it lives entirely on the GPU.

Two-Pass Occlusion Culling

A common pattern in GPU-driven renderers: use the previous frame’s Hi-Z to cull objects before rendering the current frame.

The simple version is one pass: cull everything against last frame’s Hi-Z, render what survives. It’s cheap, but objects that just became visible get wrongly culled and stay invisible for one frame.

The two-pass version fixes this. Pass 1 tests objects that were visible last frame, renders the survivors, and builds a fresh Hi-Z from them. Pass 2 then takes everything that was culled in pass 1 and retests it against the new Hi-Z. Anything that just became visible gets a second chance and renders this frame. The Hi-Z used in pass 1 is still one frame old, so there’s a small residual inaccuracy that no extra passes can fix. In “normal gameplay” you won’t notice it.
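The Hi-Z build and the conservative test that both passes lean on can be sketched on the CPU like this. It is a rough illustration, not how any specific engine does it: it assumes a square power-of-two depth buffer, a conventional depth range (0 = near, 1 = far, LESS test), and screen bounds that are already clamped to the buffer. `DepthMip`, `ScreenBounds`, `buildHiZ`, and `isOccluded` are hypothetical names. On the GPU the same logic would normally live in compute shaders.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One level of the depth pyramid (square, power-of-two size for simplicity).
struct DepthMip { int size; std::vector<float> depth; };

// Build a max-depth pyramid: each texel of level i+1 stores the FARTHEST depth
// of the 2x2 block below it, which keeps the occlusion test conservative.
inline std::vector<DepthMip> buildHiZ(const std::vector<float>& depth, int size) {
    assert(size > 0 && (size & (size - 1)) == 0);
    assert(depth.size() == static_cast<std::size_t>(size) * size);
    std::vector<DepthMip> mips;
    mips.push_back({size, depth});
    while (mips.back().size > 1) {
        const DepthMip& src = mips.back();
        const int half = src.size / 2;
        DepthMip dst{half, std::vector<float>(static_cast<std::size_t>(half) * half)};
        for (int y = 0; y < half; ++y) {
            for (int x = 0; x < half; ++x) {
                auto at = [&](int px, int py) { return src.depth[py * src.size + px]; };
                dst.depth[y * half + x] = std::max({at(2 * x, 2 * y), at(2 * x + 1, 2 * y),
                                                    at(2 * x, 2 * y + 1), at(2 * x + 1, 2 * y + 1)});
            }
        }
        mips.push_back(std::move(dst)); // src is not used after this point
    }
    return mips;
}

// Screen-space bounds in level-0 texels (inclusive) plus the object's nearest depth.
struct ScreenBounds { int minX, minY, maxX, maxY; float nearestDepth; };

// True only if the object's nearest point is behind the farthest occluder depth
// in every pyramid texel its bounds touch.
inline bool isOccluded(const std::vector<DepthMip>& mips, const ScreenBounds& b) {
    const int extent = std::max(b.maxX - b.minX, b.maxY - b.minY) + 1;
    int level = 0; // pick the level whose texel size roughly matches the footprint
    while ((1 << level) < extent && level + 1 < static_cast<int>(mips.size())) ++level;
    const DepthMip& m = mips[level];
    float farthest = 0.0f;
    for (int y = b.minY >> level; y <= (b.maxY >> level); ++y)
        for (int x = b.minX >> level; x <= (b.maxX >> level); ++x)
            farthest = std::max(farthest, m.depth[y * m.size + x]);
    return b.nearestDepth > farthest;
}
```

Take a 4x4 buffer where the left two columns hold a wall at depth 0.2 and the rest is empty (1.0). An object at depth 0.5 fully behind the wall is reported as occluded, while one that pokes past the wall's edge is kept, because the region it touches contains far depth.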
The case where two-pass culling breaks down is a hard camera cut, like a sudden 90-degree rotation: pass 1’s visible set is basically wrong, the rebuilt Hi-Z is unreliable, and you get one bad frame. Engines usually detect this and fall back to a full depth prepass for that frame.

The GPU-side cost is much lower than always doing a full depth prepass, which is why most modern game engines use this approach.

3. Even More Culling Techniques!

Screen Size Culling

Instead of a fixed world-space distance, you cull based on projected screen area. An object 10 meters away might be worth rendering, but the same object at 2000 meters might project to 3 pixels and not be worth the draw call overhead. Screen size culling handles this more gracefully than a raw distance threshold.

For example, Unreal uses screen size as the primary metric for static-mesh LOD transitions, while min/max draw distance are separate distance-based controls.

PVS (Potentially Visible Sets)

PVS precomputes for each region of the world which other regions can possibly be seen from it. At runtime you just look up the current region’s PVS and skip anything that isn’t in it. This is extremely fast at runtime but expensive to compute and doesn’t handle dynamic objects well.

Because it depends on that precomputation, PVS can be impractical or impossible for procedurally generated games.

Quake made PVS famous. It’s still useful in some indoor games where the scene geometry is static and bake time is acceptable.

Portal Culling

For indoor scenes with well-defined rooms and doorways, portal culling is very effective. Each doorway is a portal. You trace the camera’s view through portals and only render rooms that are reachable through visible portals. This can eliminate entire rooms of geometry very cheaply.

Portal culling shows up in a lot of first-person games set in buildings. It complements frustum culling well since portals naturally shrink the effective view cone as you look through multiple doorways.

4. GPU-Driven Rendering and Cluster Culling

This is where things get interesting!

Instead of the CPU deciding what to draw and issuing one draw call per object, you push the culling logic onto the GPU and use indirect draw calls to let the GPU decide.

Indirect Drawing

DirectX, Vulkan, and Metal all support indirect drawing, although the exact API differs. The draw arguments, such as index count and base vertex, come from a GPU buffer rather than CPU code. A compute shader runs culling and writes only surviving objects into that buffer.
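As a rough CPU-side stand-in for that compute shader, the sketch below compacts surviving draws into an argument buffer. The `DrawIndexedArgs` struct mirrors the field layout of `D3D12_DRAW_INDEXED_ARGUMENTS` and `VkDrawIndexedIndirectCommand`. `ObjectDraw`, its `visible` flag, and `compactDraws` are hypothetical names for this example; the flag stands in for whatever culling tests ran earlier.

```cpp
#include <cstdint>
#include <vector>

// Same field order as D3D12_DRAW_INDEXED_ARGUMENTS / VkDrawIndexedIndirectCommand.
struct DrawIndexedArgs {
    std::uint32_t indexCount;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t firstInstance;
};

// Hypothetical per-object record; 'visible' stands in for the culling result.
struct ObjectDraw {
    DrawIndexedArgs args;
    bool visible;
};

// CPU stand-in for the culling compute shader: append the arguments of every
// surviving object and return how many draws were written.
// On the GPU, each thread would handle one object and reserve its output slot
// with an atomic add on a counter; that counter can then drive a call such as
// ExecuteIndirect (D3D12) or vkCmdDrawIndexedIndirectCount (Vulkan).
inline std::uint32_t compactDraws(const std::vector<ObjectDraw>& objects,
                                  std::vector<DrawIndexedArgs>& outArgs) {
    outArgs.clear();
    for (const ObjectDraw& o : objects) {
        if (o.visible) {
            outArgs.push_back(o.args);
        }
    }
    return static_cast<std::uint32_t>(outArgs.size());
}
```

The point of the layout is that the CPU never needs to know which objects survived: it records one indirect call that covers the whole buffer, and the GPU-written count decides how many draws actually run.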