Beyond Semantic Similarity: Open Challenges for Embedding-Based Creative Process Analysis Across AI Design Tools

This paper argues that relying solely on fixed embedding similarity for analyzing creative processes in AI design tools is insufficient, because it fails to capture meaningful creative pivots. It outlines three key challenges: aligning metrics with creative significance, handling multimodal traces, and evaluating agentic systems. It then proposes context-aware LLM interventions to better capture session-specific dynamics.

Seung Won Lee, Semin Jin, Kyung Hoon Hyun

Published Tue, 10 Ma

Imagine you are trying to understand how different artists create their masterpieces. Some paint with watercolors, some sculpt with clay, and some design digital worlds.

Currently, when we try to study how these artists work, we usually look at the final picture or the finished statue. We ask, "Is this pretty?" or "Is this useful?" But this is like judging a movie only by the final frame; it tells us nothing about the story, the plot twists, or the journey the characters took to get there.

This paper argues that we need a new way to study the journey of creation, especially when AI is helping the artist. The authors suggest using a tool called "Embedding Analysis," which is like a super-smart translator that turns every word, sketch, or action an artist takes into a mathematical point on a map. If two actions are close together on the map, the computer thinks they are similar.

Here is the problem: The computer's map is too simple. It sees "similarity" only as "looking the same," but it misses the "spark" of creativity.
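The "map" metaphor can be made concrete in a few lines. This is a minimal sketch, not the paper's implementation: the three-dimensional vectors below are toy stand-ins for real embeddings (which have hundreds of dimensions), and only the cosine-similarity computation itself is standard.

```python
import math

def cosine_similarity(a, b):
    # Closeness on the "map": near 1.0 means same direction (similar),
    # near 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" of three creative actions.
sketch_v1 = [0.9, 0.1, 0.2]
sketch_v2 = [0.8, 0.2, 0.3]   # a small revision of the same idea
new_idea  = [0.1, 0.9, 0.1]   # a very different action

print(cosine_similarity(sketch_v1, sketch_v2))  # close to 1: "similar"
print(cosine_similarity(sketch_v1, new_idea))   # much lower: "far apart"
```

Everything the post critiques follows from this one number: the analysis only ever sees how close two points are, never why the creator moved between them.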

The authors use three main metaphors to explain why this current method fails and what we need to fix:

1. The "Same Word, Different World" Trap

Imagine a designer is building a tiny apartment.

  • Step A: They say, "I need stackable modules to save space." (They are thinking about furniture).
  • Step B: They say, "I need stackable modules to change the room layout." (They are now thinking about architecture).

To a human, this is a huge creative leap. The designer didn't just add more furniture; they completely changed the problem they were solving. They pivoted from "storage" to "spatial design."

But to the computer's "Similarity Map," these two steps look almost identical because they both use the words "stackable" and "modules." The computer thinks, "Oh, they are just continuing the same idea." It misses the plot twist. It's like a GPS that sees you turning a corner and thinks you are still driving straight because the road looks the same.
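The trap is easy to reproduce. The sketch below uses crude word-count cosine similarity as a stand-in for an embedding model (an assumption for illustration; real models are more sophisticated but share the failure mode): the two steps from the apartment example score as substantially similar despite the pivot.

```python
import math
from collections import Counter

def bag_of_words_similarity(a, b):
    """Cosine similarity over word counts -- a crude stand-in for
    embedding similarity that exhibits the same surface-level bias."""
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(wa[w] * wb[w] for w in set(wa) & set(wb))
    return dot / (math.sqrt(sum(c * c for c in wa.values())) *
                  math.sqrt(sum(c * c for c in wb.values())))

step_a = "I need stackable modules to save space"
step_b = "I need stackable modules to change the room layout"

# Shared vocabulary yields a high score, even though the designer
# pivoted from storage (furniture) to spatial design (architecture).
print(round(bag_of_words_similarity(step_a, step_b), 2))  # prints 0.63
```

The score says "same idea, continuing"; a human reader sees a changed problem. That gap between lexical closeness and creative significance is exactly the first challenge.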

The Fix: We need to teach the computer to understand the context and the intent, not just the words. It's like hiring a human editor who knows when a character in a story has changed their mind, even if they are using the same vocabulary.

2. The "Mixed Media" Puzzle

Most creative tools today are messy. You might type a prompt, draw a rough sketch, ask the AI to generate an image, and then tweak a slider. It's a mix of text, pictures, and numbers.

Currently, the computer's map mostly understands text.

  • If you draw a rough sketch and then a polished final version, they might look totally different to a computer (low similarity). But to a human, they are clearly the same idea evolving (high continuity).
  • Conversely, two very different-looking sketches might actually be part of the same creative strategy.

The challenge is figuring out what counts as a single "move" in this messy mix. Is a single brushstroke a move? Or is the whole sketch a move? Without a clear rule, the computer's map gets scrambled, like trying to build a puzzle where some pieces are 3D and others are flat.
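One way to make "what counts as a move" explicit is to log every action as a typed event and then pick a segmentation rule over the log. The sketch below is one possible rule (grouping by pauses), offered as an assumption for illustration; the paper's point is precisely that no canonical rule exists yet.

```python
from dataclasses import dataclass

@dataclass
class TraceEvent:
    timestamp: float   # seconds since the session started
    modality: str      # "text", "sketch", "image_gen", "parameter"
    payload: str       # prompt text, sketch id, slider name, etc.

def segment_into_moves(events, gap_seconds=30.0):
    """Group events into 'moves' by pauses: any gap longer than
    gap_seconds starts a new move. Whether a brushstroke, a burst,
    or a whole sketch is the right unit is the open question."""
    moves, current, last_t = [], [], None
    for e in sorted(events, key=lambda e: e.timestamp):
        if last_t is not None and e.timestamp - last_t > gap_seconds:
            moves.append(current)
            current = []
        current.append(e)
        last_t = e.timestamp
    if current:
        moves.append(current)
    return moves

session = [
    TraceEvent(0.0, "text", "stackable modules to save space"),
    TraceEvent(5.0, "sketch", "rough_layout_v1"),
    TraceEvent(90.0, "parameter", "wall_height=2.4"),
    TraceEvent(95.0, "image_gen", "render_v1"),
]
print(len(segment_into_moves(session)))  # prints 2: two bursts of activity
```

Change `gap_seconds` and the same session yields a different number of "moves", which is exactly why the map gets scrambled: the analysis depends on a segmentation choice the current methods leave implicit.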

3. The "Self-Driving Car" Problem

This is the trickiest part. In the future, AI won't just be a tool; it will be a partner (an "agent") that makes its own decisions. It will suggest ideas, pick the best ones, and steer the project.

If the AI uses this simple "Similarity Map" to decide what to do next, it might get stuck in a loop.

  • Imagine an AI agent that is told to "be diverse." It might generate 100 wildly different ideas just to look diverse, not because the human designer is exploring new ground.
  • The computer sees a "diverse" path and thinks, "Great, the process is working!"
  • But in reality, the AI is just spinning its wheels, creating noise instead of progress.

The danger is that the AI's own "personality" (how it is programmed) starts to look like the human's creativity. We need a way to tell the difference between "The human is exploring" and "The robot is just following its programming."
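The wheel-spinning failure is easy to simulate. Below, "diversity" is scored as mean pairwise distance between idea embeddings (a naive metric assumed here for illustration, not one the paper endorses). Pure random noise trivially beats a human's small, purposeful steps around a theme, so an agent optimizing this score looks creative while producing nothing.

```python
import math
import random

def mean_pairwise_distance(points):
    """A naive 'diversity' score: average Euclidean distance
    between every pair of idea embeddings."""
    dists = [math.dist(p, q)
             for i, p in enumerate(points)
             for q in points[i + 1:]]
    return sum(dists) / len(dists)

random.seed(0)
dim = 8

# An agent gaming the metric: pure random noise in embedding space.
noise = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(20)]

# A human exploring: small purposeful variations around one theme.
theme = [0.5] * dim
exploration = [[t + random.uniform(-0.1, 0.1) for t in theme]
               for _ in range(20)]

# Noise wins on "diversity" despite carrying no creative progress.
print(mean_pairwise_distance(noise) > mean_pairwise_distance(exploration))
```

The metric cannot tell the agent's programmed scatter from the human's exploration, which is why the authors argue process metrics must separate the two.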

The Big Picture

The authors are saying: "We have a great new telescope (Embedding Analysis) to watch how people create, but the lens is blurry."

It sees the surface details but misses the deep meaning. To fix this, they propose using Large Language Models (LLMs) as a "smart guide" to help interpret the data. Instead of just measuring how close two things look, we need to ask, "Did the creator change their mind? Did they solve a new problem?"
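What an LLM "smart guide" might be asked could look like the sketch below. The prompt wording and function are hypothetical illustrations of the idea; the paper proposes context-aware LLM interpretation, not this exact prompt or any specific API.

```python
def build_pivot_prompt(step_a, step_b):
    """Builds a prompt asking an LLM to judge creative significance
    (did the problem framing change?) rather than surface similarity.
    Illustrative wording only."""
    return (
        "You are analyzing a design session.\n"
        f"Earlier step: {step_a}\n"
        f"Later step: {step_b}\n"
        "Did the designer's goal or problem framing change between "
        "these steps, even if the wording is similar? "
        "Answer PIVOT or CONTINUATION, then explain in one sentence."
    )

prompt = build_pivot_prompt(
    "I need stackable modules to save space",
    "I need stackable modules to change the room layout",
)
print(prompt)
```

Where a fixed similarity score would report "nearly identical", a judge with session context can label the apartment example a pivot, which is the kind of session-specific signal the authors want.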

By fixing these three issues, we can finally compare how a graphic designer works with an AI versus how a musician works with an AI, giving us a true understanding of human-AI creativity across all fields.