Efficient Vector Search in the Wild: One Model for Multi-K Queries

The paper introduces OMEGA, a K-generalizable learned top-K search method. It leverages a base model trained on K=1 with trajectory-based features, plus a dynamic refinement procedure, to achieve high accuracy and low latency for multi-K vector queries while significantly reducing preprocessing time compared to state-of-the-art methods.

Yifan Peng, Jiafei Fan, Xingda Wei, Sijie Shen, Rong Chen, Jianning Wang, Xiaojian Luo, Wenyuan Yu, Jingren Zhou, Haibo Chen

Published Mon, 09 Ma

Imagine you are running a massive, high-speed library where millions of books (vectors) are stored. Your job is to find the "best match" for a user's request. Sometimes a user asks for just the one best book (K=1). Other times, they want a list of the top 100 best books (K=100).

In the past, finding these books was like searching a dark maze. You had to set a "search budget" (how many steps you take) before you started.

  • If you set the budget too low, you might miss the best book (low accuracy).
  • If you set it too high, you waste time walking around the library when you could have stopped earlier (high latency).
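The budget trade-off can be made concrete with a toy sketch. (This is illustrative only: a flat scan over 1-D "books", whereas real systems traverse graph or clustered indexes. All names here are made up.)

```python
import random

random.seed(0)

# Toy "library": 10,000 one-dimensional books (vectors), in random order.
library = [random.random() for _ in range(10_000)]
query = 0.5

def fixed_budget_search(budget):
    """Examine at most `budget` books; return the distance of the best one found."""
    best = min(library[:budget], key=lambda v: abs(v - query))
    return abs(best - query)

# A small budget risks missing the true best match (worse accuracy);
# a large budget examines the whole library (more latency, no surprises).
small_budget_err = fixed_budget_search(100)
large_budget_err = fixed_budget_search(10_000)
```

The larger budget can never do worse on accuracy, but you pay for every extra step whether or not it was needed; that is exactly the dilemma a learned stopping rule tries to escape.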

The Problem: The "One-Size-Fits-None" Model

Recently, scientists built a "Smart Librarian" (a machine learning model) who could look at the maze and say, "Stop! You've found the best book!" This saved a lot of time.

But there was a catch: This Smart Librarian was trained only to find the single best book.

  • If you asked for the Top 1, the librarian was perfect.
  • If you asked for the Top 10, the librarian got confused. It would stop too early because it was used to finding just one, leaving you with a bad list.
  • If you asked for the Top 1, but the librarian was trained on "Top 100," it would walk the whole maze unnecessarily, wasting time.

To fix this, other researchers tried training a different librarian for every possible number (one for Top 1, one for Top 10, one for Top 100...).
The downside? Training all these librarians takes forever and costs a fortune in computing power. It's like hiring 50 different specialists just to answer simple questions.

The Solution: OMEGA (The "Master Detective")

The authors of this paper introduce OMEGA, a new system that solves this with a clever trick.

1. The "One Model to Rule Them All"

Instead of hiring 50 librarians, OMEGA hires one Master Detective who is an expert at finding the single best item (K=1).

  • The Magic Trick: If you need the Top 5, OMEGA doesn't ask for a new expert. It asks the Master Detective to find the #1 book. Then, it says, "Okay, ignore that book. Now, find the best book among the remaining ones."
  • It repeats this process 5 times.
  • Why it works: The paper discovered that the pattern of how the detective gets closer to the target (the "distance trajectory") looks the same whether they are looking for the #1 book or the #50 book. So, the same detective can do all the jobs if you just tell them to ignore the ones they already found.
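The "find one, exclude it, repeat" loop above can be sketched in a few lines. (A hedged illustration, not the paper's implementation: here the K=1 "expert" is a plain minimum over toy 1-D points, and `top_k_by_repeated_best` is a name invented for this sketch.)

```python
def top_k_by_repeated_best(vectors, query, k, dist):
    """Reuse a K=1 best-match routine K times, excluding each
    winner before the next round (illustrative sketch)."""
    found = []
    remaining = list(vectors)
    for _ in range(k):
        best = min(remaining, key=lambda v: dist(v, query))  # the K=1 "expert"
        found.append(best)
        remaining.remove(best)  # "ignore that book" on the next round
    return found

# Toy example: 1-D "vectors", distance is plain absolute difference.
top3 = top_k_by_repeated_best([0.9, 0.1, 0.4, 0.7, 0.3], 0.5, 3,
                              lambda v, q: abs(v - q))
```

Each round is a self-contained K=1 problem, which is why a single model trained only on K=1 trajectories can serve every K.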

2. The "Crystal Ball" (Statistical Forecasting)

There was still a problem: Asking the detective to find the Top 100 one by one takes 100 questions. That's too many questions!

  • The Fix: OMEGA uses a "Crystal Ball" (statistics).
  • Once the detective has found the first 20 books, the Crystal Ball can predict: "Based on the pattern, there's a 95% chance the remaining 80 books are already in your list. You don't need to ask the detective anymore!"
  • This allows OMEGA to stop asking questions early, saving massive amounts of time.
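The early-stopping idea can be sketched as well. (Illustrative only: the paper uses a proper statistical forecast over distance trajectories, whereas this toy version just stops after a few consecutive confirmations agree with the candidate ordering it already has; every name below is an assumption.)

```python
def top_k_with_forecast(pool, query, k, dist, patience=3):
    """Confirm results one at a time with a K=1 routine; once `patience`
    consecutive confirmations merely agreed with the existing candidate
    ordering, trust that ordering for the remaining slots and stop."""
    # Stand-in for the candidate ordering produced by the search phase.
    ordered = sorted(pool, key=lambda v: dist(v, query))
    confirmed, streak = [], 0
    remaining = list(pool)
    for i in range(k):
        best = min(remaining, key=lambda v: dist(v, query))  # one "question" to the detective
        confirmed.append(best)
        remaining.remove(best)
        streak = streak + 1 if best == ordered[i] else 0
        if streak >= patience:               # confident enough: stop asking
            confirmed += ordered[i + 1 : k]  # take the rest from the existing order
            break
    return confirmed

# Toy run: 1-D points, query at 0; top-5 answered with early stopping.
top5 = top_k_with_forecast([5, 1, 9, 3, 7, 2], 0, 5, lambda v, q: abs(v - q))
```

In this run, only three "questions" are asked before the remaining two slots are filled from the existing ordering, which is the essence of the time saving.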

The Real-World Impact

The authors tested this in a real-world scenario (Alibaba's cloud database), which handles millions of queries with different "Top K" requests every day.

  • Speed: OMEGA is 6% to 33% faster than the current best methods.
  • Cost: It requires only 16% to 30% of the preparation time (training cost) compared to other methods.
  • Accuracy: It hits the same high accuracy targets as the others.

The Analogy Summary

Think of it like a GPS navigation system:

  • Old Way: You have a different GPS map for every trip length. If you want a 1-mile trip, you load the "1-mile map." If you want a 100-mile trip, you load the "100-mile map." Loading all these maps takes forever.
  • The "Bad" Smart GPS: You have one GPS that only knows how to find the nearest gas station. If you ask for the "Top 5 gas stations," it gets lost or takes too long.
  • OMEGA: You have one GPS that is an expert at finding the nearest gas station.
    1. It finds the nearest one.
    2. It marks it as "visited."
    3. It instantly finds the next nearest one from the remaining list.
    4. After finding a few, it uses a statistical guess to say, "I'm 99% sure the next 95 stations are right here, so let's just stop and give you the list."

Result: You get the perfect list of gas stations, instantly, without needing to download a million different maps.