Salah Adawi

A 3D Body from Eight Questions — No Photo, No GPU

8 questions in, 58 Anny body params out. A small MLP trained with a physics-aware loss runs in milliseconds on CPU. Height accuracy 0.3 cm, mass 0.3 kg, BWH 3-4 cm — better than our photo pipeline on circumferences, without needing a photo. That’s the questionnaire path I promised in the previous post.

The whole story begins with one observation: height and weight alone can estimate body measurements quite accurately (Bartol’s regression). The original idea isn’t as accurate as it claims, but after a bit of tuning the results are quite promising.

The questionnaire addresses privacy, speed and cost concerns, and skips the phase where the user spends 5 minutes scrolling for perfect-light, tight-clothes photos. It also helped us find and fix a mass-calculation inconsistency in the Anny model, and to model the “muscle weighs more” problem.

Backstory

When we want to create a digital twin, we naturally think of HMR photo reconstruction. This route has a lot of ups and downs. During one “down”, the research agent brought up this:

“The most striking finding is from Bartol et al. (2022): a simple linear regression from just height + weight (no photo!) predicts 15 body measurements at 1.2-1.6 cm MAE. Many deep learning methods with photos don’t even beat this.”

At first I quickly calculated the number of possible combinations against the number of people and thought it didn’t make sense. But then, after comparing friends, I thought there might be something to it.

It’s not just height and weight

Intuitively we all know that a man at 178 cm and 80 kg can carry a belly or come straight from the gym. So it wasn’t a surprise that we came up with these two bodies. They are a bit cartoonish and pushed to extremes, but they clearly show the problem.

The next obvious step: the weights from the original regression are public, so we downloaded them and ran them on our validation set. Raw BWH MAE landed around 9-11 cm, up to ~25 cm in the worst case.
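Reproducing that kind of regression takes only a few lines. A minimal sketch below, with synthetic placeholder data and a made-up waist relation (the real model uses Bartol et al.'s published coefficients, which we downloaded rather than refit):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic placeholder data; the waist relation here is made up
# purely to illustrate the fitting step.
n = 1000
height = rng.normal(172, 8, n)               # cm
weight = rng.normal(75, 12, n)               # kg
waist = 0.2 * height + 0.7 * weight + rng.normal(0, 2, n)

# Linear regression from [height, weight, bias] to one measurement.
X = np.column_stack([height, weight, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, waist, rcond=None)

mae = np.abs(X @ coef - waist).mean()
print(f"waist MAE: {mae:.2f} cm")
```

The same two-column design matrix, fit once per measurement, is all the original method needs; that simplicity is exactly why the result is so striking.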
Some of that is measurement-convention mismatch — Bartol slices SMPL meshes at fixed landmark vertex indices (e.g. a “lower belly point” vertex for the waist), while we follow ISO 8559-1 anatomical rules (waist at the narrowest point, bust at the breast prominence). Same plane-sweep math, different slice location — bust alone is off by ~10 cm systematically. After correcting for that bias, BWH MAE drops to ~7 cm. That is still above Bartol’s own ~3 cm BWH MAE on their data (paper Table 6, BODY-fit+W: chest 3.0, waist 4.0, hips 2.2 cm), but that’s not really the story — we’re evaluating on a different population. Anny has explicit body-shape variation at fixed height/weight that an h+w regression fundamentally can’t see, regardless of how well it was trained. And I’m not saying this to undermine the original research — quite the opposite: it was a good spark for this project.

What else carries signal

As the previous example showed, the same height and weight can produce different bodies, but we can differentiate them via more params. Some obvious ones:

Build/belly — muscular and athletic, or soft with a belly. Common knowledge says muscle weighs more than fat, so a fat-heavy body will have more volume (and thus different measurements) than an athletic one at the same weight.

Shape — some people have wider hips, others a bigger bust. Body shape tells us how the weight is distributed. The catch, which I will describe later, is that people don’t know their own shape.

Cup size — relevant for women; quite an obvious feature.

These are the features we naturally think of. To make sure they carry enough signal and aren’t too noisy, we ran the numbers against the dataset.
The method is simple — bucket people by height (±1 cm), weight (±1 kg), and shape, then measure how much waist variation is left as each additional feature is locked in.

Features locked         Waist std inside bucket    Theoretical best MAE
h, w, shape, build      2.25 cm                    ~1.8 cm
+ belly                 2.08 cm                    ~1.7 cm
+ cup, gender           1.30 cm                    ~1.0 cm

A smaller std inside a bucket means the features explain more of what’s going on.
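The bucketing itself is only a few lines. A toy sketch below, with synthetic data and a made-up waist relation, just to show the mechanics and the std-to-MAE conversion (for a roughly Gaussian residual, predicting the bucket mean gives MAE = std · √(2/π) ≈ 0.8 · std, which is where the "theoretical best MAE" column comes from):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the dataset: height, weight, and a waist that also
# depends on a hidden "build" factor the h/w bucket can't see.
n = 20_000
h = rng.normal(172, 5, n)
w = rng.normal(75, 8, n)
build = rng.normal(0, 1, n)
waist = 0.2 * h + 0.7 * w + 2.0 * build   # hypothetical relation

# Bucket by height (+/-1 cm) and weight (+/-1 kg), then measure the
# waist spread left inside each sufficiently populated bucket.
keys = np.stack([np.round(h), np.round(w)], axis=1)
_, inverse = np.unique(keys, axis=0, return_inverse=True)
stds = [waist[inverse == b].std()
        for b in range(inverse.max() + 1)
        if np.sum(inverse == b) >= 30]
std = float(np.mean(stds))

# Gaussian residual: best achievable MAE ~= 0.8 * within-bucket std.
print(f"within-bucket waist std ~{std:.2f} cm")
print(f"theoretical best MAE ~{std * np.sqrt(2 / np.pi):.2f} cm")
```

Here the hidden build factor (std 2.0 cm) survives the bucketing almost untouched, which is exactly the residual the extra questionnaire features are meant to explain.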
Build does most of the work — on its own, it moves the waist by about 1.8 cm at fixed h/w/shape. Belly adds another ~0.2 cm. Cup plus gender knocks off 0.8 cm more. Each feature earns its place.

Side-finding: the build signal is strongest on inverted-triangle shapes — 8 of the top 10 high-signal buckets are inverted triangles. The narrow waist amplifies relative fat changes; shapes with wider baseline waists (apple, rectangle) show smaller absolute shifts.

At the extremes: same height, same weight, different body shape — bust can differ by 25 cm, hips by 30. That’s six clothing sizes at identical h/w. A height+weight regression simply can’t see this — the signal isn’t there in the input.

And there’s a floor. Even with every questionnaire input locked, about 1.3 cm of waist variation remains, coming from ~50 continuous blendshape params that don’t map to any multiple-choice question. So the theoretical best a form can ever do is ~1 cm waist MAE.

Model & dataset

The previous article describes the available body models. After the initial phase we operate solely on the Anny model, heavily leveraging its explainable features. Thanks to it, tasks like generating a huge dataset of people are easy.

The dataset we generate and use for distribution analysis, training and validation contains a couple of tens of thousands of synthetically generated bodies, validated against a broad population distribution. For each body in the dataset we derive the described features from its body measurements.

Anny is full of blendshapes, but not all of them matter for virtual try-on. We carefully selected the 58 that matter here. The 8 questionnaire questions one-hot encode into 20 features, so the space is 20 input × 58 output params. We actually train two such models — one per gender.
Male and female bodies differ enough that a shared network wastes capacity reconciling them.

Training a small MLP

The original paper used simple regression to predict the params, so that was the obvious starting point. On our synthetic dataset it gets around 2.5 cm BWH MAE — decent. The problem was mass: Ridge predicts each of the 58 params independently, but mass depends on many of them working together (torso width × depth × height, hip volume, limb fat…).
L2 regularization shrinks them all toward zero, and the small errors compound. Result: 3.9 kg mean mass error, 9.7 kg at p95, up to 16 kg for heavy bodies — even after output standardization and tuned regularization (the best Ridge we could build on this dataset).

So we moved to an MLP. Two hidden layers, 256 units each, ReLU, a bit of dropout. Tiny — about 85 KB of weights, trains on a laptop in ~60 minutes per gender. Nothing fancy architecturally.

The loss is the interesting part. The user already gives us their exact height and weight — those need to match precisely in the generated body, not just be close on average. Standard MSE on the 58 params doesn’t care about that and treats every param equally. And mass isn’t a param at all — it’s a consequence of volume, which comes out of the body model’s forward pass.

So we include the forward pass in the loss. The MLP’s 58 outputs go through Anny — blendshapes, vertices, volume — and we compare the resulting mass and height against the user-provided targets. Gradients from a mass error flow back through all the volume-related params together. Ridge couldn’t do that, because each output was solved independently; the MLP can, because the hidden layers couple them. This is what closes the mass gap.

graph LR
  Q[8 questionnaire inputs] --> MLP[MLP]
  MLP --> P[58 Anny params]
  P --> A[Anny forward]
  A --> MHW[mass, height, waist predicted]
  MHW --> L[loss vs targets]
  P --> L
  L -. gradients .-> MLP

The dotted arrow is the whole trick. Anny’s forward is surprisingly autograd-friendly — blendshapes are linear, volume is a sum of signed tetrahedra. No custom backward; standard PyTorch ops end to end. Measurements like waist are differentiable too, but that’s a whole story for the measurements-tuning post.

On top of params, mass, and height, we added a waist term. That’s it — bust and hip looked tempting, but in practice they introduced more noise than signal, and waist carries the most body-shape signal anyway.
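The trick can be shown end to end with a toy stand-in for Anny: a scaled cube mesh and a single hypothetical scaling blendshape (none of this is Anny's real API). The volume is a sum of signed tetrahedra built from plain autograd ops, so a mass error backpropagates straight into the shape parameter:

```python
import torch

def mesh_volume(verts, faces):
    # Signed-tetrahedron volume: (1/6) * sum over outward-oriented
    # triangles of det([v0; v1; v2]). Plain autograd ops throughout,
    # so it is differentiable w.r.t. the vertex positions.
    tris = torch.stack([verts[faces[:, 0]],
                        verts[faces[:, 1]],
                        verts[faces[:, 2]]], dim=1)
    return torch.det(tris).sum() / 6.0

# Toy closed mesh standing in for the body: a 0.42 m cube
# (12 outward-oriented triangles), ~74 L, ~78 kg at tissue density.
verts0 = 0.42 * torch.tensor(
    [[0,0,0],[1,0,0],[1,1,0],[0,1,0],
     [0,0,1],[1,0,1],[1,1,1],[0,1,1]], dtype=torch.float32)
faces = torch.tensor(
    [[0,2,1],[0,3,2],[4,5,6],[4,6,7],
     [0,1,5],[0,5,4],[1,2,6],[1,6,5],
     [2,3,7],[2,7,6],[3,0,4],[3,4,7]])

# One hypothetical "blendshape": uniform scaling about the centroid,
# driven by a single parameter (stand-in for an MLP output).
blend = verts0 - verts0.mean(dim=0)
param = torch.zeros(1, requires_grad=True)
density = 1059.0        # kg/m^3, tissue-only, male (see "Lessons learned")
target_mass = 75.0      # user-provided weight, kg

opt = torch.optim.Adam([param], lr=0.01)
for _ in range(500):
    opt.zero_grad()
    verts = verts0 + param * blend              # linear blendshape forward
    mass = mesh_volume(verts, faces) * density  # mass as a consequence of volume
    loss = (mass - target_mass) ** 2            # the physics term of the loss
    loss.backward()                             # gradients flow through the volume
    opt.step()

with torch.no_grad():
    mass = mesh_volume(verts0 + param * blend, faces) * density
print(f"final mass: {mass.item():.2f} kg")      # converges toward 75 kg
```

The real loss adds the param-MSE, height, and waist terms on top, and the "MLP" replaces the single free parameter, but the gradient path is the same: mass error, through volume, into shape.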
Honest results

Height is essentially solved — 0.3 cm mean MAE for both genders. Mass lands close behind, around 0.3-0.5 kg mean (p95 around 1 kg). Circumferences are harder; BWH sits at 3-4 cm, with waist the weakest.

Averages lie about the tails, and a person who gets a 15 cm bust error doesn’t care that the mean is 4 cm. So we tracked p95 (5% of predictions are worse than this) and max alongside the mean, and actively optimized for them — barrier terms in the loss that specifically penalize outliers on height and mass.

                              Male                    Female
Height — mean / p95 / max     0.3 / 0.8 / 3.9 cm      0.3 / 0.8 / 4.6 cm
Mass — mean / p95 / max       0.5 / 1.2 / 3.3 kg      0.4 / 1.0 / 2.1 kg
Bust — mean / p95 / max       4.9 / 11.9 / 18.4 cm    2.7 / 6.6 / 11.0 cm
Waist — mean / p95 / max      4.3 / 10.0 / 20.7 cm    4.0 / 9.0 / 13.0 cm
Hips — mean / p95 / max       3.3 / 8.4 / 14.8 cm     3.3 / 8.0 / 13.3 cm

For comparison: on the same validation set, Bartol’s h+w regression sits at ~7 cm BWH MAE (bias-corrected, as above). Our photo-based pipeline from the previous post gets 5-8 cm BWH MAE on real people. The questionnaire beats both — without needing a photo.

The numbers above are from synthetic Anny bodies — the same model we train against. We also validated on a small group of real people measured by hand with a tape. The first results there were ugly — mass off by several kg even when circumferences were close. That pushed us to fix how mass is calculated in the first place (next section). After those fixes landed, the real-people numbers line up with the synthetic ones on the measurements we tested.
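The barrier terms mentioned above aren't spelled out in detail in this post; one common differentiable form (a softplus barrier, which is an assumption here, not necessarily the exact one we used) stays near zero inside a tolerance band and grows almost linearly past it:

```python
import torch
import torch.nn.functional as F

def barrier(err, tol, sharpness=10.0):
    # Close to zero while |err| < tol, then grows almost linearly:
    # a soft, differentiable stand-in for "never exceed tol".
    return F.softplus(sharpness * (err.abs() - tol)) / sharpness

# Height errors in cm: two inside the tolerance band, one outlier.
height_err = torch.tensor([0.1, 0.4, 2.0])
print(barrier(height_err, tol=0.5))  # ~[0.00, 0.03, 1.50]
```

Added to a mean-based loss, a term like this barely moves the average cases but makes the optimizer care disproportionately about the tail, which is how p95 and max get pulled down.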
Worth remembering: it’s a statistical model, so what you get is the population-average body for your inputs, not your exact body. Everyone is different — but it’s a very good base for measurement tuning, which then gets <1 cm error. I’m planning the next article on that.

Lessons learned

The most striking lesson was a real-world inconsistency in Anny’s anthropometry module. The approach to mass is simple: calculate the volume of the body and multiply by body density. Primary-school math. But Anny used a density of 980 kg/m³, which is indeed the value you get after typing “average person density” into a web search. It’s more subtle than it first seems, though.

First, the value differs between men and women. Second, “body density” isn’t one number — it depends on the convention. Whole-body density (lungs included, ~985 kg/m³) is what you’d measure by submerging someone in a tank — just below water, which is why humans barely float. Tissue-only density (~1030-1080 kg/m³) is what hydrostatic weighing reports after subtracting residual lung air, and it’s what fat-vs-muscle composition actually gives you. The 980 kg/m³ figure sits between these two conventions — close to whole-body, but not quite. Third, “muscle weighs more”: the per-gender tissue-only medians we ended up using (male ~1059, female ~1031 kg/m³) live in clad-body, derived from body-fat percentage via the Siri two-component model. Empirically the correction works — lean bodies gain mass, soft bodies lose it — though the absolute scale still rests on the 980 calibration being roughly right for the “average” subject.

Density isn’t the same for everyone, and muscle has a different density than fat. Not by much, but it can change the mass by 2-3 kg. To account for that, clad-body estimates body fat using the Navy formula.

The second finding (to be described more in the measurements-tuning post) is that each cm matters.
A 2 cm shift across all torso circumferences (bust, waist, hips) moves the computed mass by ~2 kg! All of the above, summed together, had a big impact on the incorrect mass predictions.
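Both findings can be checked with back-of-the-envelope math: the Siri two-component model inverted to get density from body fat, and a crude cylinder torso (an illustration of the scaling only, not Anny's mesh) for the per-centimetre sensitivity:

```python
import math

def siri_density(body_fat_frac):
    # Siri two-component model: BF% = 495 / D - 450 (D in g/cm^3),
    # inverted to D = 495 / (BF% + 450). Returned in kg/m^3.
    return 1000.0 * 495.0 / (100.0 * body_fat_frac + 450.0)

# ~17% body fat lands near the male tissue-only median quoted above;
# ~30% lands near the female one.
print(f"{siri_density(0.174):.0f} kg/m^3")  # ~1059
print(f"{siri_density(0.301):.0f} kg/m^3")  # ~1031

# Sensitivity of mass to circumference with a crude cylinder torso:
# V = C^2 * L / (4*pi), so a shift dC changes volume by 2 * dC / C.
C, L = 0.80, 0.75                  # waist circumference and torso length, m
V = C * C * L / (4 * math.pi)      # ~38 L
dV = V * 2 * (0.02 / C)            # +2 cm on the circumference
print(f"+2 cm -> +{dV * siri_density(0.174):.1f} kg")  # ~2 kg
```

Because volume goes with circumference squared, a fixed 2 cm shift costs more mass on a larger torso — which is why the heavy-body tail was where Ridge failed hardest.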