3D Gaussian Splatting in a Weekend

B bfeldman.me ↗

▲ 135 points • 12 comments • by b__feldman • 2mo ago • HN discussion ↗


Introduction

3D Gaussian splatting (3DGS) is a technique that answers this question: given a dataset of pictures of a scene, how can I reconstruct it in 3D? This is achieved using a machine learning algorithm that, for many different camera angles, does the following: render the scene, compare it to the picture taken at the same camera angle, and update the scene to reduce the difference between the rendered image and the ground truth. However, unlike traditional 3D renderers, 3DGS does not use triangles as primitives, but objects called Gaussian splats, which makes the rendering algorithm unique to 3DGS.

In this post, I aim to show how 3D Gaussian splatting works by building a simplified renderer from scratch, in ~1000 lines of code. The main motivation is to build intuition for the 3DGS math. You should be comfortable with the basics of linear algebra, probability theory and computer graphics.

The renderer is written in C++ and OpenGL. All code is available on GitHub, but I tried to make this tutorial general enough that you can reproduce it using any graphics API (WebGPU, Metal, DirectX…). At the end of the tutorial, we’ll be able to render a Gaussian scene in real time. I’ve also included interactive WebGPU visualizations that you can navigate with WASD and the mouse.

This article covers only rendering, not training. However, some technical decisions in the 3DGS renderer of Kerbl et al. (2023) are tightly coupled to the training pipeline (differentiability, keeping the covariance positive semi-definite), and we’ll highlight them.

Loading a 3DGS scene

First, we need a scene to render. At the time of writing, good 3DGS scenes are still hard to find, but thankfully Supersplat recently made it possible to download splats from their website, so this is what we’ll be doing. We’ll use this plate of tomatoes scene because it’s fairly lightweight (only ~200k splats), but feel free to use any scene you like. Just be aware that the renderer we’re building is not really optimized for large scenes.

Then, we’re going to load the Gaussian splat scene we downloaded from Supersplat.


This lets us check that the scene is oriented correctly, and more importantly, gives us a first look at what a splat actually is. The typical format is .ply, for which I’ve included a custom loader in ply_loader.h. The loader parses the .ply file into an array of GaussianSplat objects, wrapped in a Scene:

    #include <array>
    #include <vector>
    #include <glm/glm.hpp>

    constexpr int SH_COUNT = 16;         // SH coefficients per color channel
    constexpr int SH_CHANNEL_COUNT = 3;  // R, G, B
    constexpr int SH_FLOAT_COUNT = SH_COUNT * SH_CHANNEL_COUNT;

    struct GaussianSplat {
        glm::vec3 centroid = glm::vec3(0.0f);
        float opacity = 0.0f;
        std::array<float, SH_FLOAT_COUNT> sphericalHarmonics = {};
        std::array<float, 3> scale = {0.0f, 0.0f, 0.0f};
        std::array<float, 4> rotation = {1.0f, 0.0f, 0.0f, 0.0f};
    };

    struct Scene {
        std::vector<GaussianSplat> splats;
    };

Let’s break this down:

- centroid is the position of the splat in world-space coordinates. This differs from the usual graphics Model-View-Projection pipeline, where model data is represented in model space. Therefore, there won’t be any Model matrix in the 3DGS pipeline, only a View matrix (3D world space -> 3D camera space) and a Projection matrix (3D camera space -> 2D screen space).
- scale and rotation describe the geometry of the splat, but we’ll get back to this later.
- opacity and sphericalHarmonics describe the visibility and color of the splat, but we’ll also get back to this later.

In a trained 3DGS scene, these values are optimized during training by rendering the splats from known camera views, comparing the result to the training photos, and backpropagating the image error into each splat’s centroid, scale, rotation, opacity, and color coefficients.
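Going back to the loader for a moment: ply_loader.h isn’t reproduced in this post, so here is only a rough sketch of the very first step such a loader has to do, which is reading the text header of a .ply file. It relies on the standard PLY header keywords (ply, element, property, end_header). The PlyHeader struct and the readPlyHeader function are hypothetical names made up for this illustration, not the ones used in the actual loader, and the sketch ignores list properties.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type: what we learn from the text header of a .ply file.
struct PlyHeader {
    std::size_t vertexCount = 0;                // number of splats in the file
    std::vector<std::string> vertexProperties;  // per-splat property names, in file order
    bool valid = false;                         // true once "end_header" was reached
};

// Reads the header line by line and stops right after "end_header",
// leaving the stream positioned at the start of the vertex data.
inline PlyHeader readPlyHeader(std::istream& in) {
    PlyHeader header;
    std::string line;

    // A PLY file always starts with the magic word "ply".
    if (!std::getline(in, line) || line.rfind("ply", 0) != 0) {
        return header;
    }

    bool inVertexElement = false;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string keyword;
        words >> keyword;

        if (keyword == "end_header") {
            header.valid = true;
            break;
        }
        if (keyword == "element") {
            std::string name;
            std::size_t count = 0;
            words >> name >> count;
            inVertexElement = (name == "vertex");
            if (inVertexElement) {
                header.vertexCount = count;
            }
        } else if (keyword == "property" && inVertexElement) {
            // Scalar properties look like "property float x".
            std::string type, name;
            words >> type >> name;
            header.vertexProperties.push_back(name);
        }
    }
    return header;
}
```

After this, a loader would read vertexCount records from the stream and copy each property into the matching GaussianSplat field. Which property names a given exporter writes is something to check in the file itself rather than assume.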


As a first sanity check, let’s load our 3DGS scene and draw each splat centroid using GL_POINTS (i.e. as a point cloud):

    Scene scene = loadPly("scene.ply");

    // Gather the centroids into one contiguous array for upload to the GPU
    std::vector<glm::vec3> centroids;
    centroids.reserve(scene.splats.size());

    for (const GaussianSplat& splat : scene.splats) {
        centroids.push_back(splat.centroid);
    }

    GLuint vao = 0;
    GLuint vbo = 0;
    glGenVertexArrays(1, &vao);
    glGenBuffers(1, &vbo);

    glBindVertexArray(vao);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, centroids.size() * sizeof(glm::vec3), centroids.data(), GL_STATIC_DRAW);

    // Attribute 0: three floats per vertex, tightly packed
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(glm::vec3), nullptr);

    glDrawArrays(GL_POINTS, 0, static_cast<GLsizei>(centroids.size()));

Adding color: spherical harmonics

Now that we have our point cloud, let’s add color! One thing that makes 3DGS look so good is that it captures view-dependent color: shiny objects, highlights, and small reflections change as the camera moves. This is achieved by storing color as spherical harmonics (SH) rather than as a single RGB value, so each splat’s color can depend on the viewing direction.

I’m not going to go deep on SH here, but the intuition is fairly simple: it’s a way to express a function on a sphere as a weighted sum of basis functions, a bit like a Fourier series. This allows us to have a function $\mathrm{rgb}(\theta, \phi)$ that, for each splat, outputs its RGB color at direction $(\theta, \phi)$.

It’s defined like this:

$$ \mathrm{rgb}(\theta, \phi) = \sum_{\ell=0}^{L}\sum_{m=-\ell}^{\ell} c_{\ell m}Y_{\ell m}(\theta,\phi) $$

where $Y_{\ell m}$ are the SH basis functions and $c_{\ell m}$ are the RGB coefficients stored in the splat. With SH_COUNT = 16 coefficients per channel, the sum runs up to $L = 3$, since degrees 0 to 3 contribute $(L+1)^2 = 16$ basis functions.


For a deeper visual explanation of SH, I recommend Visual Notes on Spherical Harmonics.

To recover RGB for a given camera view, we take the normalized direction from the camera to the splat centroid, evaluate the SH basis in that direction, multiply each basis value by the stored RGB coefficient, sum the result, then apply a +0.5 bias (so a zero-centered SH output lands around mid-grey instead of black) and clamp to [0, 1].

In shader terms, the operation is basically:

    vec3 rgb = vec3(0.5);  // the +0.5 bias
    for (int i = 0; i < 16; ++i) {
        rgb += shCoefficient[i] * shBasis(i, direction);
    }
    rgb = clamp(rgb, 0.0, 1.0);

Our actual shader writes the SH basis functions out explicitly instead of using a loop, but this is too verbose to fit in here.

We now have a colored point cloud. You can already see how the colors change based on the viewing direction!

The Gaussian distribution

Let’s leave point clouds and get into actual splats! But for that we need to do a bit of math. The main mathematical concept in 3D Gaussian splatting is… Gaussians! While I don’t intend this article to serve as a probability course, I do think it’s important to keep a few elements in mind about the Gaussian distribution.

You’ve likely encountered the 1D Gaussian, which is a probability distribution parameterized by its mean $\mu\in\mathbb{R}$ and its standard deviation $\sigma > 0$:

$$ p(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$

[Figure: A 1D Gaussian with mean $\mu$ and variance $\sigma^2$]

This distribution can also be extended to 2 or 3 dimensions. For a $d$-dimensional Gaussian, the probability density function is:

$$ p(x) = \frac{1}{\sqrt{(2\pi)^d|\Sigma|}} \exp\left(-\frac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\right) $$

In 3D, $\mu$ is now a 3D vector, called the centroid.


This is our GaussianSplat.centroid that we loaded earlier. The geometric interpretation of $\mu$ is simply the splat’s position in world space. Our variance $\sigma^2$ becomes a $3\times 3$ positive semi-definite matrix $\Sigma$, called the covariance matrix. (Strictly speaking, the density formula needs $\Sigma$ to be invertible, i.e. positive definite.)

In general, if we want to talk about a Gaussian with mean $\mu$ and covariance $\Sigma$, we write $\mathcal{N}(\mu,\Sigma)$. The $\mathcal{N}$ stands for the normal distribution, another name for the Gaussian distribution.

And here’s what it looks like in 2D and 3D:

[Figure: The iso-density contours of a 2D Gaussian are concentric ellipses whose axes are determined by the eigenvectors and eigenvalues of $\Sigma$]

[Figure: A 3D Gaussian is similar to the 2D case: its iso-density surfaces are ellipsoids]

We can already understand how 3D Gaussian splatting works:

1. Start from a 3D Gaussian distribution $\mathcal{N}(\mu, \Sigma)$.
2. Project it into 2D space so it becomes a 2D Gaussian distribution $\mathcal{N}(\mu_{2D}, \Sigma_{2D})$.
3. Draw the ellipse resulting from $(\mu_{2D}, \Sigma_{2D})$.

[Figure: The splat lives as a probability distribution in 3D, gets projected to a 2D distribution in screen space, and only at the very end becomes pixels]

Do this for every single splat in our scene and that’s how we render a 3DGS scene! There are, however, a few tricky parts in this.

The main idea to understand at this stage is that in steps 1 and 2, we are not dealing with concrete geometric objects, like we would with a triangle in rasterization. Instead, we’re dealing with probability distributions, which only become drawable geometry in the final step. This might seem a bit abstract at this point, but we’ll get to it!

Reparameterizing the 3D Gaussian distribution

A covariance matrix has to be symmetric and positive semi-definite (PSD). Intuitively, this means it can stretch space along some axes, but it cannot create a negative variance. This matters because the covariance controls the ellipsoid’s size and orientation.
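To make “the covariance controls the size and orientation” concrete, here’s a sketch for the 2D case, where everything has a closed form. For a symmetric matrix $\Sigma = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$, the eigenvalues give the variances along the ellipse axes and the leading eigenvector gives the direction of the major axis. The Ellipse2D struct, the ellipseFromCovariance function and the choice of drawing the $k$-sigma contour (with $k = 3$ by default) are assumptions made for this illustration, not something taken from the renderer.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical description of one iso-density ellipse of a 2D Gaussian.
struct Ellipse2D {
    float majorRadius;  // k * sqrt(largest eigenvalue)
    float minorRadius;  // k * sqrt(smallest eigenvalue)
    float angle;        // angle of the major axis from the x axis, in radians
};

// Sigma = [[a, b], [b, c]] must be symmetric positive semi-definite.
// k selects which contour to describe (k = 3 is the "3-sigma" ellipse).
inline Ellipse2D ellipseFromCovariance(float a, float b, float c, float k = 3.0f) {
    // Closed-form eigenvalues of a symmetric 2x2 matrix.
    const float mid = 0.5f * (a + c);
    const float radius = std::sqrt(0.25f * (a - c) * (a - c) + b * b);
    const float lambdaMax = mid + radius;
    // Guard against tiny negative values caused by rounding.
    const float lambdaMin = std::max(mid - radius, 0.0f);

    // Direction of the eigenvector associated with lambdaMax.
    const float angle = 0.5f * std::atan2(2.0f * b, a - c);

    return {k * std::sqrt(lambdaMax), k * std::sqrt(lambdaMin), angle};
}
```

For example, a covariance with $a = 4$, $b = 0$, $c = 1$ describes an ellipse that is twice as long along x as along y. Making $b$ non-zero tilts the ellipse without changing the fact that it’s an ellipse, as long as the matrix stays PSD.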


The 3D Gaussian distribution is parameterized by $\mu\in\mathbb{R}^3$ and $\Sigma\in S_+^3$, the set of $3\times 3$ symmetric positive semi-definite matrices.

An important property of $\Sigma$ is that there exists a rotation matrix $R$ and a diagonal scaling matrix $S$ such that:

$$\Sigma = RS(RS)^\top$$

See the appendix for why this is the case.

We can also see this result the other way around: for any rotation matrix $R$ and scaling matrix $S$, $\Sigma = RS(RS)^\top$ is guaranteed to be a covariance matrix.

This compact primitive is part of why 3DGS is practical: the geometry of each splat is described with a centroid, three scales, and a rotation, so the training code has only ten geometry floats to optimize per splat (3 + 3 + 4).

Remember our GaussianSplat struct? It’s using exactly this representation, with the scale and rotation members, not a full $3 \times 3$ covariance matrix!

- scale is three log-space floats. After applying exp(scale), they become the diagonal values of the scaling matrix $S$.
- rotation is four floats encoding a quaternion, which is a compact way to store the rotation matrix $R$. I won’t cover quaternions in detail here, but there are a lot of good resources online.

Why does this matter? Remember that a 3DGS scene is created via a training loop, where each Gaussian’s parameters are updated by backpropagating the error between the rendered image at a specific camera angle and the ground truth image. If during one update we directly change the entries of $\Sigma$, nothing guarantees that the updated matrix is still PSD, so it would not be a covariance matrix anymore. But if we update the scale and rotation parameters, we know that the reconstructed $\Sigma$ will be a covariance matrix.

Rendering a splat

As we’ve just seen, our graphics primitive is a probability distribution. Unlike points or triangles, we cannot directly draw such a primitive using a standard raster pipeline.

A naive way to put our splat on screen would be to sample points from the distribution, and draw those points. This would look like a cloud of random dots: dense near the centroid, sparse near the edges.
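Here’s a sketch that ties the last two ideas together, using only the standard library. It rebuilds $M = RS$ from a splat’s scale and rotation, computes $\Sigma = MM^\top$, and draws a random point from $\mathcal{N}(\mu, \Sigma)$ for the naive approach: if $z$ is a standard normal vector, then $\mu + Mz$ has exactly the covariance $MM^\top$. The function names are made up for this example, and the quaternion is assumed to be stored as $(w, x, y, z)$, which is consistent with the struct’s default value of {1, 0, 0, 0} but worth verifying against your file.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <random>

using Vec3 = std::array<float, 3>;
using Mat3 = std::array<std::array<float, 3>, 3>;  // row-major: m[row][col]

// Rotation matrix from a quaternion. Assumption: stored as (w, x, y, z).
// The quaternion is normalized first, since a trained value may not be unit length.
inline Mat3 rotationFromQuaternion(std::array<float, 4> q) {
    const float n = std::sqrt(q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3]);
    const float w = q[0] / n, x = q[1] / n, y = q[2] / n, z = q[3] / n;

    Mat3 r{};
    r[0][0] = 1.0f - 2.0f * (y * y + z * z);
    r[0][1] = 2.0f * (x * y - w * z);
    r[0][2] = 2.0f * (x * z + w * y);
    r[1][0] = 2.0f * (x * y + w * z);
    r[1][1] = 1.0f - 2.0f * (x * x + z * z);
    r[1][2] = 2.0f * (y * z - w * x);
    r[2][0] = 2.0f * (x * z - w * y);
    r[2][1] = 2.0f * (y * z + w * x);
    r[2][2] = 1.0f - 2.0f * (x * x + y * y);
    return r;
}

// M = R * S, where S = diag(exp(logScale)). Multiplying by a diagonal matrix
// on the right just scales the columns of R.
inline Mat3 rotationTimesScale(const std::array<float, 4>& rotation, const Vec3& logScale) {
    const Mat3 r = rotationFromQuaternion(rotation);
    Mat3 m{};
    for (std::size_t i = 0; i < 3; ++i) {
        for (std::size_t j = 0; j < 3; ++j) {
            m[i][j] = r[i][j] * std::exp(logScale[j]);
        }
    }
    return m;
}

// Sigma = M * M^T, symmetric and PSD by construction.
inline Mat3 covarianceFromRS(const Mat3& m) {
    Mat3 sigma{};
    for (std::size_t i = 0; i < 3; ++i) {
        for (std::size_t j = 0; j < 3; ++j) {
            for (std::size_t k = 0; k < 3; ++k) {
                sigma[i][j] += m[i][k] * m[j][k];
            }
        }
    }
    return sigma;
}

// Draws one point from N(centroid, M * M^T) as centroid + M * z, z ~ N(0, I).
template <class Rng>
Vec3 sampleSplatPoint(const Vec3& centroid, const Mat3& m, Rng& rng) {
    std::normal_distribution<float> unit(0.0f, 1.0f);
    const Vec3 z = {unit(rng), unit(rng), unit(rng)};
    Vec3 p = centroid;
    for (std::size_t i = 0; i < 3; ++i) {
        for (std::size_t k = 0; k < 3; ++k) {
            p[i] += m[i][k] * z[k];
        }
    }
    return p;
}
```

Calling sampleSplatPoint a few thousand times per splat with a std::mt19937 and drawing the results with GL_POINTS would give the cloud of dots described above. Note that sampling never needs $\Sigma$ itself, only $RS$, which is another small perk of this parameterization.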