Your Face Is a Façade: What Apple's Personas Reveal About Gaussian Splatting
Apple's Vision Pro Personas use Gaussian splatting. PAZ unpacks the geometry behind it — the R·S·Sᵀ·Rᵀ covariance — and what it means for AEC survey.
Apple just confirmed, almost in passing, the most interesting sentence in spatial computing this year. Asked how Vision Pro’s Personas — the ghost-walking telepresence avatars now out of beta in visionOS 26 — manage to look like a person rather than a polygon mannequin, Apple’s Jeff Norris told CNET the answer plainly: Gaussian splatting. “There’s machine learning involved,” Norris added, “but not many people realize it’s a concert of networks — over a dozen.” The headline writes itself as a gadget story. The deeper signal is geometric: a human face and a cathedral façade are now the same fitting problem.
←TODAY: In 2026, the same representation that captures a Berlin monument from tourist photos captures your face from a handful of phone-grade frames — metric, view-dependent, on-device. →3012: Every surface in the Zurich-3012 city carries a fitted radiance twin; “as-built” stops meaning a drawing and starts meaning a measured field. Fulcrum: Both directions only resolve once you see that a person and a building are reconstructed by minimising the exact same photometric energy over a cloud of anisotropic blobs.
What it is: A 3D Gaussian Splat is a scene represented not as a mesh, not as a neural network, but as a cloud of roughly a million little ellipsoidal smudges floating in space. Each smudge, each Gaussian, carries four things: a position μ in 3D, a 3×3 covariance Σ that sets its shape and orientation, an opacity α, and a view-dependent colour stored as spherical-harmonic coefficients so the smudge can look bluer head-on and warmer at a grazing angle. To render, you project every ellipsoid onto the image plane, sort them by depth per screen tile, and alpha-composite front to back. That is rasterisation, the same kind of operation a GPU does for game triangles, which is why the thing can reach 100+ FPS on consumer silicon, and why Apple can paint your Persona’s eyelashes in real time inside a FaceTime call.
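The compositing step is easier to trust once you have seen it for a single pixel. The sketch below is a minimal illustration, not Apple's or any library's renderer: the function name composite_pixel, the sample colours and the early-exit threshold are all assumptions made for the example, and the alphas are taken to be each splat's effective opacity at that pixel (its stored α already multiplied by the projected Gaussian falloff).

```python
import numpy as np

def composite_pixel(colours, alphas, depths):
    """Front-to-back alpha compositing of the splats that cover one pixel."""
    order = np.argsort(depths)          # nearest splat first
    pixel = np.zeros(3)
    transmittance = 1.0                 # fraction of light not yet blocked
    for i in order:
        pixel += transmittance * alphas[i] * np.asarray(colours[i], dtype=float)
        transmittance *= 1.0 - alphas[i]
        if transmittance < 1e-4:        # illustrative cut-off: pixel is effectively opaque
            break
    return pixel, transmittance

# Hypothetical input: a red splat at depth 2.0 behind a blue splat at depth 1.0,
# both half-transparent at this pixel.
colours = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
alphas = [0.5, 0.5]
depths = [2.0, 1.0]
pixel, remaining = composite_pixel(colours, alphas, depths)
print(pixel, remaining)
```

The nearer blue splat contributes half its colour, the red one behind it contributes a quarter, and a quarter of the background would still show through: the pixel comes out as (0.25, 0, 0.5) with 0.25 transmittance left.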
Why it works: The cleverness is that the covariance Σ is not stored as six free numbers. It is factored as Σ = R S Sᵀ Rᵀ, built from a rotation R (stored as a unit quaternion) and a diagonal scale S. This guarantees Σ stays a valid positive-semidefinite covariance no matter what gradient descent does to the underlying parameters, and because every step from parameters to pixels is differentiable, the whole scene can be optimised end-to-end. You hand the optimiser a hundred photos, render the current cloud, measure the photometric error against each real image, and let backpropagation nudge every μ, R, S, α and colour toward agreement. The objective function being minimised is just image reconstruction loss, and that is the line a structural reviewer of geometry should never let pass unspoken. As our Kaffipedia panel on neural radiance fields puts it, an implicit NeRF is “a continuous physics model of how a building reflects light, and a Gaussian splat is its rasterisable twin.” Same captured reality, two languages.
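The render, compare, nudge loop can be shown at toy scale. The sketch below is a deliberately reduced stand-in, not the real training code: it fits one 1D Gaussian to a 64-pixel "image" by gradient descent on a mean-squared photometric error. The names render, loss_and_grad and fit, the target values, the learning rate and the step-halving rule are all assumptions chosen for the illustration. The width is optimised as log_s so that it stays positive whatever the optimiser does, which is the same idea as factoring Σ.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 64)                 # pixel centres of a 1D "image"

def render(params):
    """Render one Gaussian blob: params = (centre mu, log of width s, amplitude a)."""
    mu, log_s, a = params
    u = (x - mu) / np.exp(log_s)
    return a * np.exp(-0.5 * u**2)

def loss_and_grad(params, target):
    """Mean-squared photometric error and its analytic gradient."""
    mu, log_s, a = params
    s = np.exp(log_s)
    u = (x - mu) / s
    g = np.exp(-0.5 * u**2)
    residual = a * g - target
    loss = np.mean(residual**2)
    # Derivatives of the rendered image with respect to mu, log_s and a.
    d_image = np.stack([a * g * u / s, a * g * u**2, g])
    grad = 2.0 * np.mean(residual * d_image, axis=1)
    return loss, grad

def fit(target, params, steps=500, lr=0.05):
    """Gradient descent that halves the step until the loss actually drops."""
    params = np.asarray(params, dtype=float)
    loss, grad = loss_and_grad(params, target)
    history = [loss]
    for _ in range(steps):
        step = lr
        while step > 1e-12:
            trial = params - step * grad
            trial_loss, trial_grad = loss_and_grad(trial, target)
            if trial_loss < loss:
                params, loss, grad = trial, trial_loss, trial_grad
                break
            step *= 0.5
        history.append(loss)
    return params, history

# Hypothetical "photo" and a deliberately wrong starting guess.
target = render((0.6, np.log(0.08), 0.9))
fitted, history = fit(target, (0.4, np.log(0.15), 0.5))
print(f"loss: {history[0]:.5f} -> {history[-1]:.5f}")
print("fitted mu, s, a:", fitted[0], np.exp(fitted[1]), fitted[2])
```

A real splat trainer does the same thing with millions of parameters, many images and automatic differentiation in place of the hand-written gradient.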
Origins: The lineage is short and exact. March 2020: Ben Mildenhall and colleagues post the NeRF paper, presented at ECCV that August: a tiny MLP mapping a 5D coordinate to colour and density, gorgeous reconstructions, training measured in hours to days per scene. August 2023: Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler and George Drettakis, working at Inria Sophia-Antipolis and the Max Planck Institute for Informatics, publish 3D Gaussian Splatting for Real-Time Radiance Field Rendering at SIGGRAPH, throw out the implicit network, and replace it with explicit ellipsoids and a custom CUDA splatter. Photogrammetry stopped being a mesh you reconstruct and became a field you fit. The emission-absorption integral underneath both is older still: Kajiya and Von Herzen formalised it for clouds and smoke in 1984. Apple’s Personas, and Gracia’s newly launched 4D (moving) splatting app for Vision Pro, are simply the consumer edge of that 2023 hinge.
In practice: For a Swiss studio the payoff is not avatars; it is survey. A five-minute phone walk-around of a Zürich Altstadt façade now yields a metric, view-dependent capture good enough for as-built verification, heritage documentation, and BIM clash-checking against the real building instead of against a CAD wish. ETH Zurich’s photogrammetry tradition and the TU Delft 2024 aerial benchmark already treat splats as a planning instrument, not a demo. The trade-off is brutal and worth stating plainly: an explicit splat is hundreds of megabytes to several gigabytes of un-derivable state, a beautiful guess you cannot defend in a structural review and cannot rebuild once the format goes dark. Atelier: capture the field this week, but extract a clean mesh or measured dimensions from it the same day, and keep the geometry you can reconstruct from principle, not the file you merely downloaded.
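How heavy is that state? A back-of-envelope count helps. The sketch below assumes an uncompressed float32 layout with degree-3 spherical harmonics (16 coefficients per colour channel); the function names are hypothetical, and real file formats may add fields, quantise or compress.

```python
FLOAT32_BYTES = 4

def floats_per_gaussian(sh_degree=3):
    """Count stored numbers per Gaussian under the assumed layout."""
    position, scale, rotation, opacity = 3, 3, 4, 1   # mu, S diagonal, quaternion, alpha
    sh_coefficients = 3 * (sh_degree + 1) ** 2        # RGB times (degree + 1)^2
    return position + scale + rotation + opacity + sh_coefficients

def splat_megabytes(num_gaussians, sh_degree=3):
    """Uncompressed size in megabytes (10^6 bytes)."""
    return num_gaussians * floats_per_gaussian(sh_degree) * FLOAT32_BYTES / 1e6

for n in (1_000_000, 10_000_000):
    print(f"{n:>10,d} Gaussians: {splat_megabytes(n):,.0f} MB")
```

Under these assumptions each Gaussian costs 59 floats, so a million of them come to about 236 MB, and a capture only reaches gigabytes at roughly ten million Gaussians or more.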
Hack: This Hack teaches you to build the one matrix that makes a splat a splat, the covariance Σ = R S Sᵀ Rᵀ, from a quaternion and a scale, so the ellipsoid is correct by construction. Understand these few lines and you understand why a Gaussian built this way can never end up with an invalid covariance.
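import numpy as np

def covariance(quat, scale):
    """Build Σ = R S Sᵀ Rᵀ from quat=(w, x, y, z) and scale=(sx, sy, sz) in metres."""
    # Normalise so that any non-zero quaternion yields a proper rotation.
    w, x, y, z = np.asarray(quat, dtype=float) / np.linalg.norm(quat)
    # Rotation matrix of the unit quaternion.
    R = np.array([[1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
                  [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
                  [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])
    S = np.diag(scale)          # per-axis standard deviations on the diagonal
    return R @ S @ S.T @ R.T    # symmetric positive-semidefinite by construction

# Identity rotation, one long axis and two short ones: a thin needle along x.
print(covariance((1, 0, 0, 0), (0.10, 0.02, 0.02)))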
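A scale of (0.10, 0.02, 0.02) gives a thin needle along the local x axis; change it to (0.10, 0.10, 0.02) and you get a flat disc that hugs a wall; change the quaternion and it tilts. What it can never do is acquire a negative variance, because R S Sᵀ Rᵀ is a symmetric positive-semidefinite matrix for any non-zero quaternion and any real scale (a zero scale merely flattens the ellipsoid along that axis). That single guarantee is what lets gradient descent fit a million of them to your face without ever producing a broken one.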
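You can check that guarantee numerically rather than take it on trust. The sketch below repeats the covariance function so it runs on its own, reads the ellipsoid's axis lengths back out of Σ through its eigenvalues, and contrasts the result with a hand-edited symmetric matrix of six free numbers. The particular quaternion and the matrix named raw are arbitrary values chosen for illustration.

```python
import numpy as np

def covariance(quat, scale):
    """Σ = R S Sᵀ Rᵀ from quat=(w, x, y, z) and scale=(sx, sy, sz)."""
    w, x, y, z = np.asarray(quat, dtype=float) / np.linalg.norm(quat)
    R = np.array([[1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
                  [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
                  [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])
    S = np.diag(scale)
    return R @ S @ S.T @ R.T

# The eigenvalues of Σ are the squared scales, whatever the rotation.
needle = covariance((1, 0, 0, 0), (0.10, 0.02, 0.02))
disc = covariance((0.3, -1.2, 0.5, 2.0), (0.10, 0.10, 0.02))  # un-normalised on purpose
needle_axes = np.sqrt(np.linalg.eigvalsh(needle))   # ascending order
disc_axes = np.sqrt(np.linalg.eigvalsh(disc))
print("needle axes:", needle_axes)
print("disc axes:  ", disc_axes)

# Contrast: six free numbers edited directly can stop being a covariance.
raw = np.diag([0.01, 0.0004, 0.0004])
raw[1, 2] = raw[2, 1] = 0.01                         # one careless off-diagonal step
raw_min = np.linalg.eigvalsh(raw).min()
print("smallest eigenvalue of the raw matrix:", raw_min)
```

The needle reports one long axis and two short ones, the disc two long and one short, and the tilted disc keeps exactly the axis lengths it was given. The raw matrix, by contrast, ends up with a negative eigenvalue, which is a "variance" no ellipsoid can have.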
The discipline AXIS//NOLL would press on you: before you let a tool fit a structure — a face, a façade, a shell — make it tell you which energy it minimised and which assumptions it factored in. Run the snippet, watch the matrix stay valid, and you will never again mistake a downloaded splat for a derived geometry.
Source: HN Concepts
PAZ Kaffi · multidisciplinary editorial, led by PAZ Academy