Information-Theoretic Bayesian Optimization for Bilevel Optimization Problems

This paper proposes an information-theoretic Bayesian optimization framework for bilevel problems with expensive black-box functions, introducing a unified acquisition criterion that simultaneously maximizes information gain for both upper- and lower-level optimal solutions and values.

Takuya Kanayama, Yuki Ito, Tomoyuki Tamura, Masayuki Karasuyama

Published 2026-02-27
📖 5 min read🧠 Deep dive

Imagine you are trying to design the perfect new smartphone.

You have two goals that are tangled together:

  1. The Upper Goal (The Boss): You want the phone to have the best possible battery life and camera quality.
  2. The Lower Goal (The Worker): But, before you can judge the battery or camera, you first have to figure out the perfect internal wiring that makes those features work. If the wiring is bad, the battery and camera don't matter.

This is a Bilevel Optimization Problem. It's like a "Boss" trying to make a decision, but the Boss can only make a good decision if a "Worker" first solves a complex problem perfectly.

The Problem: It's Expensive and Slow

In the real world, testing these phones isn't free.

  • The Boss's test: Building a prototype costs $10,000.
  • The Worker's test: Simulating the wiring inside a supercomputer takes 10 hours and costs $5,000.

You can't just try a million different designs. You have a limited budget and time. This is where Bayesian Optimization (BO) comes in. It's like a smart, cautious explorer who uses a map to guess where the treasure is, so they don't waste steps.

However, most existing "smart explorers" only looked at the Boss's map. They assumed the Worker's job was easy and free. But in this paper, the authors say: "Wait, the Worker's job is also expensive and hard!"

The Solution: The "Information Detective" (BLJES)

The authors propose a new method called BLJES (Bilevel optimization via Lower-bound based Joint Entropy Search).

To understand how it works, let's use a Detective Analogy.

1. The Old Way (The "Guess and Check" Detective)

Old methods were like detectives who only cared about solving the final crime (the Boss's goal). They would ask the Worker, "What's the best wiring?" 100 times for every single clue they found.

  • Result: They wasted a lot of money on the Worker, or they made bad guesses because they didn't understand how the Worker's job influenced the final result.

2. The New Way (The "Information Detective")

The new method, BLJES, is a detective who asks a different question: "Which single test will teach me the most about both the final crime AND the wiring?"

Instead of just looking for the "best" phone, the detective looks for the "most informative" phone.

  • If testing a specific wiring design teaches us a lot about how to fix the battery and the camera, that's a high-value test.
  • If testing a design only tells us something we already know, we skip it.

How Do They Do It? (The Magic Tricks)

The paper uses some heavy math, but here are the two main "magic tricks" they use to make this work:

Trick #1: The "What If" Simulation (The Truncation)
Imagine you are guessing the winner of a race.

  • Normal thinking: "Who is the fastest runner?"
  • BLJES thinking: "If I knew the winner was exactly Runner A, what would that tell me about the other runners?"

The authors use a mathematical trick called Truncation. They pretend they already know the answer (the optimal solution) and ask, "If this were the answer, what would the data look like?" By comparing their current guess to this "perfect answer" scenario, they can calculate exactly how much "information" they would gain by running a new test.

Trick #2: The "Shadow Puppet" (Random Fourier Features)
Calculating the "perfect answer" is computationally impossible because there are too many variables. It's like trying to simulate every single atom in a phone.

  • The Fix: They use Random Fourier Features. Think of this as creating a "shadow puppet" of the complex problem. Instead of simulating the real, heavy 3D object, they simulate a simplified 2D shadow that moves exactly the same way. This allows them to run thousands of "What If" simulations in their head (on the computer) very quickly to find the best move.

The Result

The authors tested this new detective (BLJES) against old methods on various problems, from designing chemical reactions to optimizing energy markets.

  • The Outcome: The new method found better solutions faster and spent less money. It was able to balance the needs of the "Boss" and the "Worker" simultaneously, rather than treating them as separate problems.

Summary

  • The Problem: Solving a two-layered puzzle where both layers are expensive to test.
  • The Mistake: Old methods ignored the cost of the inner layer.
  • The Fix: A new method (BLJES) that treats the whole puzzle as one big information game. It asks, "Where should I look next to learn the most about the whole system?"
  • The Analogy: Instead of just trying to win the race, the detective figures out which practice run will teach them the most about both the track conditions and the runner's shoes, ensuring every step counts.

Get papers like this in your inbox

Personalized daily or weekly digests matching your interests. Gists or technical summaries, in your language.

Try Digest →