Morphological Addressing of Identity Basins in Text-to-Image Diffusion Models

This paper demonstrates that morphological structures in prompts, from explicit feature descriptions down to sub-lexical sound-symbolic patterns, create navigable gradients within text-to-image diffusion models. These gradients allow systematic navigation to specific identity basins and coherent visual concepts without requiring target names or training images.

Andrew Fraser

Published 2026-02-24

Imagine you have a giant, magical library where every book is a picture. This library was built by a robot that read billions of books and looked at billions of photos. The robot didn't just store the photos; it learned the vibe of everything. It knows what "Marilyn Monroe" looks like, but it also knows what "platinum blonde hair," "a beauty mark," and "1950s glamour" look like separately.

This paper is about two clever tricks to navigate this library without using the "name tags" (like "Marilyn Monroe" or "Crungus") that the robot might have been told to ignore. Instead, the researchers used Morphological Addressing—which is a fancy way of saying "using the building blocks of language to find specific places in the robot's mind."

Here is the story of their two main discoveries, explained simply:

1. The "Marilyn" Puzzle (Study 1)

The Problem: You can't just ask the robot to draw "Marilyn Monroe" because the library has rules against famous names. Even if you try to describe her, the robot might just draw a generic blonde woman.

The Solution: The researchers realized that Marilyn Monroe isn't just a name; she is a specific intersection of features. Think of it like a Venn diagram.

  • Circle A: Platinum blonde hair.
  • Circle B: A beauty mark on the cheek.
  • Circle C: 1950s Hollywood style.

Where these three circles overlap is "Marilyn." The researchers didn't use her name. Instead, they fed the robot a list of these overlapping features over and over again, teaching it a special "map" (called a LoRA) to find that specific intersection.
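The caption idea above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual training pipeline: the feature phrases come from the Venn-diagram example, and shuffling their order is one plausible way to make a LoRA learn the intersection rather than a fixed string.

```python
import random

# Feature phrases from the article's Venn-diagram example.
# The target name never appears anywhere in the captions.
FEATURES = [
    "platinum blonde hair",
    "a beauty mark on the cheek",
    "1950s Hollywood glamour styling",
]

def make_captions(n: int, seed: int = 0) -> list[str]:
    """Generate n name-free training captions, each listing the same
    features in a shuffled order so the model learns the intersection."""
    rng = random.Random(seed)
    captions = []
    for _ in range(n):
        order = FEATURES[:]
        rng.shuffle(order)
        captions.append("portrait of a woman with " + ", ".join(order))
    return captions
```

A LoRA trained on captions like these has no name to memorize; the only stable signal across the set is the feature intersection itself.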

The Magic Result:

  • The Magnet: Once they built this map, they could ask for a simple "portrait of a woman," and the robot would pull the image toward the Marilyn spot.
  • The Inverse: They also tested what happens if they push the robot away from Marilyn.
    • Without the map, the robot just makes weird, broken monsters (like a horror movie).
    • With the map, the robot makes something called the "Uncanny Valley." It looks like a human, but slightly wrong—like a doll with hollow eyes. The map was so strong it shaped not just the "good" version, but also the "weird" version. It's like having a magnet that pulls metal toward it, but also pushes other metal into a specific, strange shape.
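The "pull toward" and "push away" conditions map naturally onto the prompt / negative-prompt split found in common diffusion interfaces. The sketch below is an assumption about how such an experiment could be wired up (the `negative_prompt` name follows the Hugging Face diffusers convention; the paper's exact setup is not reproduced here):

```python
BASE_PROMPT = "portrait of a woman"
FEATURES = ("platinum blonde hair, beauty mark on cheek, "
            "1950s Hollywood glamour")

def make_condition(direction: str) -> dict:
    """Build a generation config that pulls toward or pushes away
    from the feature intersection the LoRA was trained on."""
    if direction == "toward":
        # With the LoRA loaded, even the bare prompt drifts toward
        # the learned identity basin (the "magnet" effect).
        return {"prompt": BASE_PROMPT, "negative_prompt": ""}
    if direction == "away":
        # Negating the features steers sampling out of the basin;
        # with the LoRA loaded this yields the uncanny-valley results.
        return {"prompt": BASE_PROMPT, "negative_prompt": FEATURES}
    raise ValueError(f"unknown direction: {direction!r}")
```

The interesting finding is that the *same* config behaves differently with and without the LoRA loaded: the map shapes both the attracted and the repelled images.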

2. The "Crungus" Hunt (Study 2)

The Problem: The internet had a mystery. People found that if you type a nonsense word like "Crungus" into the robot, it draws the exact same weird creature every time. But "Crungus" doesn't exist! How does the robot know what it is?

The Solution: The researchers looked at Sound Symbolism (Phonesthemes). This is the idea that certain sounds in English naturally feel like certain things.

  • "Cr-" sounds like crashing or breaking (Crash, Crush, Crumble).
  • "Sn-" sounds like sneaking or noses (Snout, Sniff, Sneak).
  • "-oid" sounds like a robot or a thing that resembles something (Android, Humanoid).

They made up 200 new nonsense words using these sound blocks. For example, they made "Snudgeoid" (Sn- + sludge + -oid).
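Crossing sound blocks like this is easy to sketch. The inventories below are illustrative stand-ins (the paper's actual phonestheme list is larger, and how it reaches 200 words is not shown here), but the construction is the same: onset + root + suffix.

```python
# Illustrative sound blocks; the paper's inventory is larger.
ONSETS = {"cr": "crashing/breaking", "sn": "sneaky/nose-like", "gl": "light/shine"}
ROOTS = ["udge", "ash", "oom"]
SUFFIXES = {"oid": "resembles / robot-like", "ix": "comic-book suffix", "ax": "tool/vehicle"}

def coin_words() -> list[str]:
    """Cross every onset, root, and suffix into a candidate nonsense word."""
    return sorted(onset + root + suffix
                  for onset in ONSETS
                  for root in ROOTS
                  for suffix in SUFFIXES)
```

With these three small inventories the cross product already yields 27 coinages, including "snudgeoid" (sn- + udge + -oid) and "crashax" (cr- + ash + -ax).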

The Magic Result:

  • When they asked the robot to draw "Snudgeoid," it didn't draw random noise. It drew a robot made of sludge.
  • When they asked for "Crashax" (Crash + Ax), it drew a rugged off-road vehicle.
  • When they asked for "Broomix" (Broom + the comic book suffix -ix), it drew a cartoon character that looks like it belongs in an Asterix comic.

Why this matters:
The robot wasn't remembering a picture of a "Snudgeoid" because no one ever took a photo of one. Instead, the robot was building the picture from the sounds. It heard "Sn-" and thought "slimy/metal," heard "-oid" and thought "robot," and glued them together.
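The "glue the sounds together" step can be made concrete: decompose a coinage into its sound blocks and look up the visual association of each. The mapping table below is a hypothetical reconstruction built only from the examples in this article:

```python
# Illustrative phonestheme-to-visual-association table, based on the
# examples above. Leading "x-" entries match word onsets; "-x" entries
# match word endings.
PHONESTHEMES = {
    "sn-": "slimy, nose-like, sneaking",
    "cr-": "crashing, crushing, rugged",
    "-oid": "robot, resembles-a-thing",
    "-ax": "tool or vehicle",
    "-ix": "comic-book character",
}

def visual_hints(word: str) -> list[str]:
    """Return the visual associations of every sound block found in word."""
    w = word.lower()
    hints = []
    for block, meaning in PHONESTHEMES.items():
        if block.endswith("-") and w.startswith(block[:-1]):
            hints.append(meaning)
        elif block.startswith("-") and w.endswith(block[1:]):
            hints.append(meaning)
    return hints
```

Running this on "Snudgeoid" yields the slimy/sneaking association from "Sn-" plus the robot association from "-oid", which is exactly the composite the model drew.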

The Big Picture: The Library is Organized

The main takeaway is that the robot's brain (its "latent space") isn't a chaotic mess. It's actually very structured, like a city with neighborhoods.

  1. You can find things without names: You can navigate to a specific "neighborhood" (like Marilyn Monroe) just by describing the street signs (features) that lead there.
  2. Sounds have maps: The way a word sounds gives the robot a map to a specific visual neighborhood. If you use the right sound blocks, you can invent a new creature that the robot will draw consistently, even if that creature has never existed before.

In short: The researchers proved that you don't need to know the "secret password" (the name) to find a specific place in the robot's imagination. You just need to know the grammar of the sounds and features that build that place. They turned the robot's brain from a black box into a map we can actually read.
