Revisiting an Old Perspective Projection for Monocular 3D Morphable Models Regression

This paper introduces a novel camera model that extends orthographic projection with a shrinkage parameter to effectively capture perspective distortion in close-up monocular 3D Morphable Model regression, enabling stable and accurate fitting for head-mounted camera footage.

Toby Chong, Ryota Nakajima

Published 2026-03-06
📖 4 min read · ☕ Coffee break read

Imagine you are trying to build a perfect clay sculpture of a friend's face based on a single photo. This is what computer scientists call 3D Morphable Model (3DMM) regression. They use math to turn a flat 2D picture into a 3D digital head.

For a long time, these digital sculptors have used a very simple rule of thumb: "Everything is the same size, no matter how far away it is."

In this paper, Toby Chong and Ryota Nakajima from Toei Company say, "Hey, that rule doesn't work for close-up photos!" Here is the story of their fix, explained simply.

The Problem: The "Floating Jaw" and the "Tiny Nose"

Imagine you take a selfie with your phone held very close to your face. Because of perspective, your nose looks huge, and your ears look small and far away. This is how our eyes and real cameras work.

However, most computer vision software uses a "flat" projection (called orthographic projection). It's like looking at your face through a window where everything is flattened onto the glass.

  • The Result: When the computer tries to recreate a close-up selfie, it gets confused. Orthographic math assumes that distance never changes apparent size, so it has no way to explain why the nose looms so large in the frame. The only way to make the equations balance is to distort the 3D shape itself.
  • The Glitch: The final 3D model ends up with a tiny nose and a jawline that looks like it's floating in mid-air. It also creates a weird "Expanding Brain" effect, where the top of the head looks like it's bulging outward.
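To see the mismatch concretely, here is a minimal sketch (my own illustration, not the paper's code) projecting two face points with a pinhole camera versus an orthographic one. The point coordinates and camera distance are made-up numbers chosen to mimic a close-up shot.

```python
# Illustrative sketch: why orthographic projection fails up close.
# We project two face points -- a nose tip and an ear -- with a pinhole
# (perspective) camera and with an orthographic camera, and compare.

def perspective_project(x, z, focal=1.0):
    """Pinhole camera: apparent size shrinks as depth z grows."""
    return focal * x / z

def orthographic_project(x, scale=1.0):
    """Orthographic camera: depth is ignored entirely."""
    return scale * x

# Hypothetical close-up: camera 20 cm from the nose tip; the ear sits
# ~10 cm farther back. x is lateral offset from the optical axis (cm).
nose_x, nose_z = 2.0, 20.0
ear_x, ear_z = 8.0, 30.0

# Match the orthographic scale to the nose depth so the nose agrees
# under both cameras; only the ear reveals the difference.
scale = 1.0 / nose_z

print(perspective_project(nose_x, nose_z))      # 0.1
print(orthographic_project(nose_x, scale))      # 0.1  (same, by construction)
print(perspective_project(ear_x, ear_z))        # ~0.267 (ear shrunk by depth)
print(orthographic_project(ear_x, scale))       # 0.4  (ear NOT shrunk)
```

Under the pinhole camera the farther ear comes out noticeably smaller; the orthographic camera renders it as if it sat at the nose's depth. A fitter built on the orthographic model has to absorb that discrepancy by warping the face geometry, which is exactly the "tiny nose, floating jaw" glitch.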

The Solution: The "Magic Shrinkage Knob"

The authors realized they didn't need to throw away the old, stable math. Instead, they invented a new camera model that acts like a "magic knob."

  1. The Old Way: The computer draws the face flat.
  2. The New Way: They added a learnable parameter (let's call it ρ, or "Rho") that acts like a perspective dial.
    • Turn the dial to 0: The face looks flat (like the old way).
    • Turn the dial to 5: The face starts to warp, making the nose look bigger and the ears smaller, just like a real close-up photo.

Think of it like a lens filter on a camera. The software learns to twist this "lens" just enough to match the distortion in the photo, without breaking the rest of the math.
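The dial idea can be sketched in a few lines of code. This is my own illustrative formulation, not necessarily the paper's exact equation: a depth-dependent divisor `1 + rho * dz` shrinks points that sit farther from the camera, and setting `rho = 0` recovers plain orthographic projection.

```python
# Illustrative "shrinkage knob" camera (assumed formulation, not the
# paper's exact math): rho blends between orthographic projection
# (rho = 0) and perspective-like foreshortening (rho > 0).

def shrinkage_project(x, dz, scale=1.0, rho=0.0):
    """Project lateral offset x of a point whose depth offset from the
    face's reference plane is dz (positive = farther from the camera).
    With rho = 0 the divisor is 1, i.e. pure orthographic projection."""
    return scale * x / (1.0 + rho * dz)

# Made-up face points: the nose tip sticks out toward the camera,
# the ear sits behind the reference plane (offsets in cm).
nose_x, nose_dz = 2.0, -3.0
ear_x, ear_dz = 8.0, 7.0

# Dial at 0: flat, old-style projection -- depth is ignored.
print(shrinkage_project(nose_x, nose_dz, rho=0.0))  # 2.0
print(shrinkage_project(ear_x, ear_dz, rho=0.0))    # 8.0

# Dial turned up: the closer nose is magnified, the farther ear
# shrinks, mimicking a close-up photo.
print(shrinkage_project(nose_x, nose_dz, rho=0.1))  # ~2.857
print(shrinkage_project(ear_x, ear_dz, rho=0.1))    # ~4.706
```

Because the orthographic case is just one setting of the knob, a fitter can start from the old, stable behavior and let gradient descent turn ρ up only as far as the photo's distortion demands.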

The "Head-Mounted Camera" Dataset

To teach their computer this new trick, they needed photos that showed extreme distortion. They couldn't just use standard celebrity photos (which are usually taken from far away).

So, they created a special dataset called HMC1M.

  • The Setup: They strapped cameras to the heads of 200 professional actors.
  • The Result: These cameras were only 15–30 cm (6–12 inches) away from the actors' faces. This created the "extreme close-up" look where the nose is huge and the perspective is wild.
  • The Training: They took an existing AI model (trained on flat photos) and "fine-tuned" it using these crazy close-up photos. The AI learned to turn that "Magic Shrinkage Knob" automatically.

Why This Matters

This isn't just about making better 3D models; it's about fixing the "Uncanny Valley."

  • Before: If you tried to put a 3D face on a video game character or a movie actor using a head-mounted camera, the face would look weird, with a tiny nose and a floating chin.
  • After: With this new "Shrinkage Knob," the computer understands that close-up = distortion. It can now recreate a realistic 3D face from a selfie or a camera strapped to a helmet, making the nose look big and the jaw look grounded.

The Bottom Line

The authors didn't reinvent the wheel; they just added a suspension system to it.

They took a method that was great for driving on straight highways (standard photos) and added a suspension that allows it to handle bumpy, off-road terrain (extreme close-ups). Now, whether the camera is far away or right in your face, the 3D model looks real, not like a cartoon with a tiny nose.