No Caption, No Problem: Caption-Free Membership Inference via Model-Fitted Embeddings

This paper introduces MoFit, a caption-free membership inference attack framework for latent diffusion models that constructs model-fitted synthetic embeddings to effectively identify training data memorization without relying on ground-truth text captions.

Joonsung Jeon, Woo Jae Kim, Suhyeon Ha, Sooel Son, Sung-Eui Yoon

Published 2026-02-27

Imagine you have a magical art machine (a Latent Diffusion Model) that learned to draw pictures by studying a secret photo album. You suspect this machine memorized specific photos from that album and can reproduce them almost perfectly.

You want to catch the machine in the act: Did it memorize this specific photo you're holding, or did it just learn the general style? This is called a Membership Inference Attack (MIA).

The Problem: The Missing Recipe

Usually, to test if the machine memorized a photo, you need two things:

  1. The Photo.
  2. The Exact Caption (the "recipe") that was used to teach the machine about that photo.

Think of the caption as the specific instruction: "A golden retriever wearing a red hat." If you give the machine the photo and the exact same instruction it learned from, it says, "Oh, I know this! I drew this!"

But here's the catch: In the real world, you often only have the photo. You don't have the secret recipe. The artist who trained the machine never told you what words they used.

If you try to guess the recipe using a smart AI (a Vision-Language Model) that looks at the photo and writes a description, it usually gets close, but not an exact match. It might say, "A dog with a hat."

  • The Result: The machine gets confused. It doesn't react strongly to your "guess" recipe, whether the photo is from its secret album or not. The test fails because the signal is too weak.

The Solution: MoFit (The "Overfitting" Trick)

The authors' method, MoFit, takes a clever workaround. Instead of trying to guess the real recipe, it creates a fake, super-specific recipe that is perfectly tuned to the machine's brain, even if it doesn't match the photo perfectly.

Here is how they do it, step-by-step:

Step 1: The "Chameleon" Photo (Surrogate Optimization)

Imagine you have a photo of a cat. You want to know if the machine memorized it.
Instead of asking the machine to describe the cat, you take the photo and start tweaking it slightly (adding tiny, invisible noise). You keep tweaking it until the machine looks at it and thinks, "Wow, this looks exactly like something I've seen before!"

You aren't trying to make the photo look better; you are trying to make the photo fit perfectly into the machine's memory. You create a "Chameleon Photo" that the machine loves.
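The tweaking loop in Step 1 can be sketched as a toy gradient descent. Everything below (the quadratic loss, the vectors, the learning rate) is invented purely for illustration; the real attack perturbs the image so as to minimize the latent diffusion model's own denoising loss:

```python
import numpy as np

# Toy stand-in for the diffusion model's loss: the model is most "comfortable"
# (lowest loss) near points it memorized during training. The quadratic loss,
# the vectors, and the step size are all invented; the real attack minimizes
# the LDM's noise-prediction (denoising) loss instead.
memorized = np.array([1.0, 2.0, 3.0])          # a point the toy model "remembers"

def model_loss(x):
    return 0.5 * float(np.sum((x - memorized) ** 2))

def surrogate_optimize(photo, steps=200, lr=0.1):
    """Nudge the photo in tiny steps until it fits the model's memory."""
    x = photo.copy()
    for _ in range(steps):
        grad = x - memorized                   # analytic gradient of the toy loss
        x = x - lr * grad                      # a tiny, "invisible" tweak
    return x

photo = np.array([1.3, 1.6, 3.4])              # the photo we want to test
chameleon = surrogate_optimize(photo)
print(model_loss(chameleon) < model_loss(photo))  # → True: the chameleon fits better
```

The original photo is barely changed at each step, but after many steps the result sits squarely inside the (toy) model's comfort zone.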

Step 2: The "Perfect" Recipe (Embedding Extraction)

Now that you have this "Chameleon Photo" that the machine loves, you ask the machine: "What words would you use to describe this specific Chameleon Photo?"

The machine spits out a super-specific text embedding (a digital recipe) that is perfectly matched to that Chameleon Photo. Let's call this the "MoFit Recipe."
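A minimal sketch of "extracting the recipe," assuming a toy linear conditioning map (the matrix `W`, the dimensions, and the least-squares solve are invented for illustration; the real MoFit recipe is a text embedding fitted against the diffusion model itself):

```python
import numpy as np

# Toy conditioning map: the model turns a text embedding e into an expected
# image W @ e. Everything here is a caricature for illustration only.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))                    # toy "text embedding -> image" map

def extract_recipe(chameleon):
    """Return the embedding whose conditioned output best matches the photo."""
    e, *_ = np.linalg.lstsq(W, chameleon, rcond=None)
    return e

chameleon = np.array([1.0, 2.0, 3.0])          # the tuned photo from Step 1
mofit_recipe = extract_recipe(chameleon)
fit_error = np.linalg.norm(W @ mofit_recipe - chameleon)
```

By construction, no other embedding explains the Chameleon Photo better under this toy map, which is the "super-specific, perfectly matched" property the attack needs.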

Step 3: The Trap (The Mismatch)

Here is the magic trick. You take the MoFit Recipe (which was made for the Chameleon Photo) and feed it to the machine along with your Original Photo.

  • If the Original Photo is a "Member" (from the secret album): The machine's brain is wired to be very sensitive to its own training data. When you give it a recipe that is almost right but slightly off (because it was made for the Chameleon, not the Original), the machine gets very stressed. It screams, "This doesn't match my memory!" Its internal error score goes way up.
  • If the Original Photo is a "Non-Member" (new to the machine): The machine doesn't care as much. It's not used to these specific photos, so a slightly mismatched recipe doesn't bother it much. Its error score stays low.
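The trap above boils down to a thresholding rule, caricatured in a few lines: overfitting makes the model sharply sensitive to a slightly wrong recipe on its own training data, so the same mismatch produces a much larger error for a member than a non-member. The "sharpness" numbers are invented purely to illustrate the decision rule, not taken from the paper:

```python
import numpy as np

# Caricature of the trap: an overfit model reacts strongly to a slightly wrong
# recipe on data it memorized, and shrugs on data it never saw. The sharpness
# values below are invented for illustration only.
def conditioned_error(is_member, recipe, true_recipe):
    sharpness = 10.0 if is_member else 0.5     # overfitting = high sensitivity
    return sharpness * float(np.sum((recipe - true_recipe) ** 2))

true_recipe = np.array([1.0, 0.0])             # recipe that fits the Original Photo
mofit_recipe = true_recipe + 0.3               # made for the Chameleon: slightly off

def infer_membership(is_member, threshold=0.5):
    """Flag the photo as a member if the mismatch error exceeds the threshold."""
    return conditioned_error(is_member, mofit_recipe, true_recipe) > threshold

print(infer_membership(True), infer_membership(False))  # → True False
```

The real attack scores with the model's conditional loss rather than an explicit "sharpness"; the point is only that overfitting turns a small recipe mismatch into a large, thresholdable error gap.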

The Analogy: The Strict Chef vs. The Casual Cook

Think of the machine as a Strict Chef who memorized a specific cookbook.

  1. The Old Way (Guessing the Recipe): You show the Chef a dish and ask, "Did you make this?" You guess the recipe is "Spicy Chicken." The Chef says, "Maybe, maybe not." (Low accuracy).
  2. The MoFit Way:
    • You take the dish and tweak it until it looks exactly like a dish the Chef memorized.
    • You ask the Chef, "What is the name of this tweaked dish?" He writes down a very specific, complex name: "Spicy Chicken with a pinch of saffron and a hint of lemon."
    • Now, you show him the Original Dish (which is just "Spicy Chicken") but tell him the Complex Name.
    • If he memorized the dish: He panics! "Wait, my memory says it needs saffron! This is wrong!" (High Stress = MEMBER).
    • If he didn't memorize it: He shrugs. "I don't know what that is, but I'll eat it." (Low Stress = NON-MEMBER).

Why This Matters

  • Privacy: This proves that even without the secret training data (the captions), hackers can still figure out if a specific person's photo was used to train an AI.
  • Better than Guessing: The paper shows that this "Chameleon" trick works much better than just using a smart AI to guess the caption. In fact, on some tests, it worked even better than methods that did have the secret captions!
  • The Warning: It tells AI developers that they need to be more careful. Just because you hide the text descriptions doesn't mean the images are safe from being "sniffed out."

In short: MoFit tricks the AI into revealing its secrets by creating a perfect "fake match" and seeing how the AI reacts when that match is applied to the real photo. If the AI freaks out, the photo was probably in its training set all along.
