DEO: Training-Free Direct Embedding Optimization for Negation-Aware Retrieval

Imagine you are asking a very smart librarian for a book. You say: "I want a story about a dragon, but not one that breathes fire."

In the world of traditional search engines (the "librarians" of the internet), this is a nightmare. Why? Because most search engines are like enthusiastic but slightly confused assistants. When they hear "dragon," they get excited and start pulling out every dragon book they have. When they hear "not fire," their brains often glitch. They might think, "Oh, they want a dragon, and maybe they want to know about fire too?" or they simply ignore the "not" part entirely. The result? You get a pile of fire-breathing dragons, and you have to sift through them all to find the one you actually wanted.

This paper introduces a new method called DEO (Direct Embedding Optimization) to fix this problem. It's like giving the librarian a special pair of glasses and a magic eraser, all without needing to retrain the librarian for years.

Here is how DEO works, broken down into simple steps:

1. The "Translator" Step (Query Decomposition)

First, DEO uses a super-smart AI (a Large Language Model) to act as a translator. It takes your messy, complicated sentence and breaks it down into two clear lists:

The "Yes" List (Positive): Things you do want. (e.g., "Dragon," "Fantasy story," "Mythical creature").
The "No" List (Negative): Things you definitely do not want. (e.g., "Fire," "Burning," "Ash").

Think of this as the librarian reading your request and saying, "Okay, I understand. You want the concept of a dragon, but I need to make sure I don't show you anything with flames."

2. The "Magic Adjustment" (Direct Embedding Optimization)

This is the secret sauce. Usually, to make a search engine smarter, you have to feed it thousands of examples and retrain it (like sending the librarian to a 6-month boot camp). That takes a lot of time, money, and computer power.

DEO skips the boot camp entirely. Instead, it takes your original search request and tweaks it on the fly right before the search happens.

Imagine your search request is a magnet.

The "Yes" List acts like a strong magnet pulling your request closer to the right answers.
The "No" List acts like a repulsive force (like two north poles of a magnet) pushing your request away from the wrong answers.

DEO uses a mathematical formula to nudge your search request just enough so that it sits perfectly in the "sweet spot" of the library. It moves your request closer to the "dragon without fire" section and pushes it far away from the "fire-breathing dragon" section.

3. The Result

Once this tiny adjustment is made, the search engine runs the query. Because the request has been "tuned" to understand the "not," the engine is far more likely to put the book you wanted at the top and to push the fire-breathing ones down the list.

Why is this a big deal?

No Re-training: You don't need to spend months teaching the computer new tricks. It works with the tools you already have.
Works Across Text and Images: It works for text (searching documents) and images (searching for photos). For example, if you ask for "a picture of a cat, but not a black one," DEO helps the computer understand that "black" is a thing to avoid, not a thing to include.
Fast and Cheap: Because no model is retrained, there is no training bill, and the adjustment itself is quick: the paper reports about 0.016 seconds on a CPU for 20 steps.

The Analogy in a Nutshell

If traditional search is like shouting "Find me a red car, not a blue one and not a truck!" and getting back a red car, a blue car, and a red truck because the computer didn't listen to your "not blue" or "not truck" instructions...

DEO is like whispering to the computer, "Okay, I see the red car. Now, let's gently push the blue car and the red truck out of the way so the red car is the only thing left in front of you."

It's a simple, clever trick that makes search engines much better at understanding what you don't want, not just what you do.

Here is a detailed technical summary of the paper "DEO: Training-Free Direct Embedding Optimization for Negation-Aware Retrieval."

1. Problem Statement

Recent advancements in Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) have improved information retrieval, but existing methods struggle significantly with negation and exclusion queries (e.g., "show me X, but exclude Y").

Limitations of Current Approaches:
- Fine-tuning: Existing solutions often rely on fine-tuning embedding models or on sparse autoencoders (SAEs) to control latent features. These methods require substantial GPU resources, large-scale datasets, and extensive training time, making them impractical for resource-constrained environments.
- Performance Degradation: Fine-tuning can sometimes degrade general retrieval performance or lack controllability.
- Semantic Ambiguity: Standard dense retrievers often fail to distinguish between inclusion and exclusion semantics, leading to the retrieval of irrelevant documents that contain the excluded terms.

2. Methodology: Direct Embedding Optimization (DEO)

The authors propose DEO, a training-free method that optimizes query embeddings at inference time without updating the underlying encoder or requiring additional training data. The process consists of two main stages:

A. Query Decomposition (via LLM)

The input query is semantically analyzed by a Large Language Model (LLM) to explicitly separate positive (inclusion) and negative (exclusion) intents.

Input: A query with negation (e.g., "characteristics of Bayreuth excluding its identity as a city").
Output:
- Positive Sub-queries ( $P$ ): Enriched queries capturing the desired semantic content (e.g., "cultural significance of Bayreuth").
- Negative Sub-queries ( $N$ ): Explicit queries encoding what to exclude (e.g., "geographic location of Bayreuth").
Embedding: Both the original query ( $q$ ) and the decomposed sub-queries are encoded using a frozen pre-trained embedding model $E(\cdot)$ .
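The decomposition stage can be pictured as a small data structure plus a parser for the LLM's reply. The sketch below is only an illustration under stated assumptions: the prompt wording, the JSON reply format, the names `DecomposedQuery`, `build_prompt`, and `parse_decomposition`, and the fallback to the original query are all hypothetical and are not taken from the paper. The LLM call itself is left out; a canned reply stands in for it.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecomposedQuery:
    """Holds the original query q plus its positive (P) and negative (N) sub-queries."""
    original: str
    positives: List[str] = field(default_factory=list)
    negatives: List[str] = field(default_factory=list)


# Hypothetical prompt; the paper's actual prompt is not given in this summary.
PROMPT_TEMPLATE = (
    "Split the search query into what the user wants and what they want excluded.\n"
    'Reply with JSON only: {{"positive": [...], "negative": [...]}}\n'
    "Query: {query}"
)


def build_prompt(query: str) -> str:
    """Fill the prompt template with the user's query."""
    return PROMPT_TEMPLATE.format(query=query)


def parse_decomposition(query: str, llm_reply: str) -> DecomposedQuery:
    """Parse the (assumed) JSON reply into positive and negative sub-queries."""
    data = json.loads(llm_reply)
    positives = [s.strip() for s in data.get("positive", []) if s.strip()]
    negatives = [s.strip() for s in data.get("negative", []) if s.strip()]
    if not positives:
        # Assumption: with no positive sub-query, fall back to the original query.
        positives = [query]
    return DecomposedQuery(original=query, positives=positives, negatives=negatives)


# Canned reply standing in for a real LLM call, using the example from the text.
query = "characteristics of Bayreuth excluding its identity as a city"
reply = (
    '{"positive": ["cultural significance of Bayreuth"], '
    '"negative": ["geographic location of Bayreuth"]}'
)
decomposed = parse_decomposition(query, reply)
print(decomposed.positives)
print(decomposed.negatives)
```

Each string in `positives` and `negatives`, together with `original`, would then be passed through the frozen encoder $E(\cdot)$ to obtain the embeddings used in the next stage.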

B. Direct Embedding Optimization

Instead of updating model weights, DEO treats the query embedding vector itself as a learnable parameter. It initializes a user embedding $e_u$ with the original query embedding $e_o$ and optimizes it with a contrastive loss function over a fixed number of steps (e.g., 20 steps using the Adam optimizer).

The loss function $L(e_u)$ comprises three terms:

Attraction Term: Pulls $e_u$ toward the positive sub-query embeddings ( $e_{pi}$ ) by penalizing its mean squared distance to them.
Repulsion Term: Pushes $e_u$ away from the negative sub-query embeddings ( $e_{nj}$ ) by rewarding its mean squared distance from them.
Consistency Term: Prevents $e_u$ from drifting too far from the original query semantics ( $e_o$ ).

$L(e_u) = \lambda_p \frac{1}{K}\sum_{i=1}^K \|e_u - e_{pi}\|^2 - \lambda_n \frac{1}{M}\sum_{j=1}^M \|e_u - e_{nj}\|^2 + \lambda_o \|e_u - e_o\|^2$

Where $\lambda_p, \lambda_n, \lambda_o$ are hyperparameters controlling the strength of attraction, repulsion, and consistency, respectively.
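To make the optimization concrete, here is a minimal sketch of the loss above and an Adam loop that updates only the query vector. Differentiating the loss gives $\nabla L(e_u) = 2\lambda_p (e_u - \bar{e}_p) - 2\lambda_n (e_u - \bar{e}_n) + 2\lambda_o (e_u - e_o)$, where $\bar{e}_p$ and $\bar{e}_n$ denote the means of the positive and negative sub-query embeddings. Everything else is an assumption for illustration: the function names, the hyperparameter values ($\lambda_p = 1.0$, $\lambda_n = 0.5$, $\lambda_o = 0.5$, learning rate 0.05), the use of plain Python lists instead of tensors, and the toy 2-D vectors. None of these come from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(len(vectors[0]))]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def deo_loss(e_u, e_o, positives, negatives, lam_p=1.0, lam_n=0.5, lam_o=0.5) -> float:
    """Attraction minus repulsion plus consistency, as in the formula above."""
    attract = sum(sq_dist(e_u, p) for p in positives) / len(positives)
    repel = sum(sq_dist(e_u, n) for n in negatives) / len(negatives)
    return lam_p * attract - lam_n * repel + lam_o * sq_dist(e_u, e_o)


def deo_gradient(e_u, e_o, positives, negatives, lam_p=1.0, lam_n=0.5, lam_o=0.5) -> Vector:
    """Analytic gradient of deo_loss with respect to e_u."""
    p_bar, n_bar = mean_vector(positives), mean_vector(negatives)
    return [
        2 * lam_p * (u - p) - 2 * lam_n * (u - n) + 2 * lam_o * (u - o)
        for u, p, n, o in zip(e_u, p_bar, n_bar, e_o)
    ]


def optimize_embedding(e_o, positives, negatives, steps=20, lr=0.05,
                       lam_p=1.0, lam_n=0.5, lam_o=0.5) -> Vector:
    """Run Adam on the query vector only; the encoder is never touched."""
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    e_u = list(e_o)                      # initialize e_u with e_o
    m = [0.0] * len(e_u)                 # first-moment estimate
    v = [0.0] * len(e_u)                 # second-moment estimate
    for t in range(1, steps + 1):
        grad = deo_gradient(e_u, e_o, positives, negatives, lam_p, lam_n, lam_o)
        for d, g in enumerate(grad):
            m[d] = beta1 * m[d] + (1 - beta1) * g
            v[d] = beta2 * v[d] + (1 - beta2) * g * g
            m_hat = m[d] / (1 - beta1 ** t)   # bias correction
            v_hat = v[d] / (1 - beta2 ** t)
            e_u[d] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return e_u


# Toy 2-D space (illustrative only): axis 0 is the wanted concept, axis 1 the excluded one.
e_o = [1.0, 1.0]                 # original query mixes both concepts
positives = [[1.0, 0.0]]         # one positive sub-query embedding
negatives = [[0.0, 1.0]]         # one negative sub-query embedding
e_u = optimize_embedding(e_o, positives, negatives)
print("loss before:", deo_loss(e_o, e_o, positives, negatives))
print("loss after: ", deo_loss(e_u, e_o, positives, negatives))
print("optimized embedding:", e_u)
```

Because the loss is quadratic in $e_u$, it has a unique minimizer $e^* = (\lambda_p \bar{e}_p - \lambda_n \bar{e}_n + \lambda_o e_o) / (\lambda_p - \lambda_n + \lambda_o)$ whenever $\lambda_p + \lambda_o > \lambda_n$, and it is unbounded below when $\lambda_n > \lambda_p + \lambda_o$. With a small step budget, $e_u$ stops somewhere between $e_o$ and $e^*$; in the toy setup above, $e^* = (1.5, 0)$.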

C. Retrieval

The optimized embedding $e_u$ is used directly for retrieval (e.g., cosine-similarity search with a FAISS index). Because only the query vector changes, the approach is model-agnostic and modality-agnostic, applicable to both text-only and multimodal (text-to-image) retrieval.
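The retrieval step needs no special machinery: the optimized vector simply replaces the original query vector in an ordinary similarity search. The sketch below uses a brute-force cosine ranking in plain Python as a stand-in for a vector index. The document IDs, the 2-D vectors, and the "optimized" query vector are made-up values for illustration, not outputs of a real encoder or of the paper's method.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding: List[float],
             doc_embeddings: Dict[str, List[float]],
             top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank documents by cosine similarity to the query, best first."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, emb))
        for doc_id, emb in doc_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy corpus (illustrative): axis 0 = wanted concept, axis 1 = excluded concept.
docs = {
    "doc_with_excluded_topic": [1.0, 1.0],
    "doc_without_excluded_topic": [1.0, 0.0],
}
e_o = [1.0, 0.6]   # original query embedding still leans toward the excluded axis
e_u = [1.2, 0.1]   # hand-picked stand-in for an optimized embedding

print("original query ranking: ", retrieve(e_o, docs))
print("optimized query ranking:", retrieve(e_u, docs))
```

In a larger system the brute-force loop would be replaced by an index. Inner-product search over L2-normalized vectors produces the same ranking as cosine similarity, which is the usual way to get cosine search out of an inner-product index.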

3. Key Contributions

Training-Free Framework: DEO achieves state-of-the-art performance on negation-aware tasks without fine-tuning the backbone model or requiring new datasets.
Explicit Intent Separation: By decomposing queries into positive and negative components and optimizing the embedding space via contrastive loss, DEO precisely captures user intent regarding exclusion.
Generalizability: The method works across diverse embedding models (e.g., BGE variants, CLIP) and modalities (text and image), demonstrating consistent improvements over baselines.

4. Experimental Results

The authors evaluated DEO on NegConstraint (text retrieval), NevIR (pairwise discrimination), and COCO-Neg (text-to-image retrieval).

Text Retrieval (NegConstraint):
- Using BGE-large-en-v1.5, DEO improved MAP@100 by +0.1028 (from 0.6299 to 0.7327) and nDCG@10 by +0.0738 (from 0.7139 to 0.7877).
- Consistent gains were observed across all tested BGE model variants (Small, Large, M3).
Multimodal Retrieval (COCO-Neg):
- Using OpenAI CLIP, DEO increased Recall@5 by +0.0600, i.e., 6 percentage points (from 0.4792 to 0.5392).
- Improvements were also seen on NegCLIP (a model already fine-tuned for negation), indicating that DEO adds value even on specialized models.
Ablation Studies:
- Decomposition vs. Optimization: Decomposition alone yielded marginal gains; the primary performance boost came from the embedding optimization step.
- LLM Choice: While larger LLMs (GPT-4.1-nano) provided better decomposition than smaller ones (Qwen2.5-1.5B), DEO improved performance regardless of the LLM used.
- Optimization Steps: Performance peaked around 20–50 steps; excessive steps (>100) led to performance degradation.
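The headline gains reported above are absolute differences between scores, which are easy to confuse with relative improvements. A short check of the arithmetic, using only the before and after values quoted in this section (the helper name `gains` is just for illustration):

```python
from typing import Dict, Tuple


def gains(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute gain, relative gain) for a metric that went from before to after."""
    absolute = after - before
    relative = absolute / before
    return absolute, relative


# (before, after) pairs as quoted in the results above.
reported: Dict[str, Tuple[float, float]] = {
    "NegConstraint MAP@100 (BGE-large-en-v1.5)": (0.6299, 0.7327),
    "NegConstraint nDCG@10 (BGE-large-en-v1.5)": (0.7139, 0.7877),
    "COCO-Neg Recall@5 (OpenAI CLIP)": (0.4792, 0.5392),
}

for name, (before, after) in reported.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute:.4f} absolute, +{relative:.1%} relative")
```

For Recall@5, the absolute gain is 0.06 (6 percentage points), while the relative gain over the 0.4792 baseline is roughly 12.5%.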

5. Significance and Impact

Practicality: DEO offers a lightweight, computationally efficient solution for real-world retrieval systems where fine-tuning is too expensive or data is unavailable.
Robustness: It effectively handles complex user intents involving negation and exclusion, a known weakness in current RAG and search systems.
Efficiency: The optimization process is extremely fast (approx. 0.016 seconds on CPU for 20 steps), making it suitable for low-latency applications.
Future Direction: The paper suggests that DEO provides a foundation for building controllable retrieval systems, with potential extensions to adaptive parameter selection and other modalities like audio.

In conclusion, DEO demonstrates that direct optimization of the query embedding space is a highly effective strategy for negation-aware retrieval, outperforming fine-tuned baselines while avoiding the associated computational costs.