Knowledge-informed Bidding with Dual-process Control for Online Advertising

This paper proposes KBD, a novel bid optimization method that integrates human expertise as inductive biases and employs a dual-process control mechanism combining a fast rule-based PID system with a Decision Transformer to overcome the limitations of black-box models in data-sparse, long-term, and out-of-distribution scenarios.

Huixiang Luo, Longyu Gao, Yaqi Liu, Qianqian Chen, Pingchun Huang, Tianning Li

Published 2026-03-06

Imagine you are running a lemonade stand in a very busy, chaotic city square. You have a limited amount of money (your budget) and you want to bring in as much in sales as possible (GMV, gross merchandise value) without paying more than a set amount to win each sale (tCPA, target cost per acquisition).

Every time a new customer walks by, you have to decide instantly: Do I offer them a discount? Do I raise my price? Do I even bother trying to sell to them?

In the past, companies used simple rules (like "always offer 10% off") or complex computer programs that just looked at yesterday's sales to guess what to do today. But these methods often failed when the weather changed, a festival started, or a new competitor opened next door. They were either too rigid or too short-sighted.

This paper introduces a new, smarter way to make these decisions called KBD. Think of KBD as a super-lemonade-stand manager that combines three powerful tools: Human Wisdom, Long-Term Vision, and Two Brains.

Here is how it works, broken down into simple parts:

1. The Two-Stage Strategy: The Daily Plan vs. The Hourly Hustle

KBD doesn't just make one decision; it thinks in two timeframes:

  • The Macro Stage (The Daily Plan):
    Imagine you are the Head Manager who wakes up every morning. You look at the forecast, check your bank account, and decide, "Today, we should aim to sell 500 cups at an average price of $2."

    • The Innovation: Instead of just guessing, this manager uses Human Wisdom (Expert Knowledge). It knows that if you lower the price too much, you might sell a lot but lose money. It uses a special "Price-Volume Map" that is built with rules humans have learned over years (e.g., "price goes down, volume goes up, but not forever"). This ensures the daily plan is realistic and safe, even if there isn't much data yet.
  • The Micro Stage (The Hourly Hustle):
    Now, imagine the Floor Manager who is actually standing at the stand. Every hour, they look at the crowd. Is it raining? Is a bus stopping nearby? Are people rushing? They tweak the price slightly up or down based on what's happening right now.

    • The Innovation: This manager uses a Decision Transformer (DT). Think of this as a "Time-Traveler." Instead of just looking at the last 5 minutes, it simulates the next 24 hours. It asks, "If I lower the price now, will I run out of money by 5 PM?" It optimizes for the whole day, not just the next minute.
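
To make "optimizing for the whole day" a little more concrete: a Decision Transformer is typically conditioned on the return-to-go, the total reward still expected from the current step to the end of the episode. The sketch below only shows how that quantity is computed from a day of hourly rewards. The function name and the sample numbers are illustrative assumptions, not taken from the paper.

```python
from typing import List


def returns_to_go(rewards: List[float]) -> List[float]:
    """For each step t, sum the rewards from t through the end of the day."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the total already computed after it.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


# Hypothetical hourly sales values for a four-hour "day".
hourly_sales = [3.0, 5.0, 2.0, 4.0]
print(returns_to_go(hourly_sales))  # [14.0, 11.0, 6.0, 4.0]
```

The first entry is the whole day's total, which is why a model conditioned on it is nudged toward actions that pay off over the full day rather than the next hour alone.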

2. The Secret Sauce: The "Two-Brain" System (Dual-Process Control)

This is the most creative part of the paper. The authors realized that sometimes the "Time-Traveler" (the AI) gets confused. Maybe a surprise parade happens, or a new product launches, and the AI's past data doesn't match the present. It might make a crazy mistake.

To fix this, they added a Second Brain based on how humans think:

  • System 1 (The Reflex Brain - PID Controller):
    This is your knee-jerk reaction. If you see a fire, you run. If you are spending too fast, this "Reflex Brain" immediately hits the brakes. It's a simple, rule-based calculator that says, "We are spending too fast, slow down!" It's fast, reliable, and never panics.
  • System 2 (The Thinking Brain - The AI):
    This is the slow, deep thinker. It analyzes complex patterns and plans for the future. It's smart but can be slow or make mistakes when things are weird.
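
For readers who have not met a PID controller, here is a minimal textbook sketch of the "Reflex Brain" idea. It compares planned spend with actual spend and returns a correction. The class name, the gains, and the spend figures are assumptions chosen for illustration; the paper's actual controller and tuning are not described in this summary.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    """Textbook discrete PID: proportional + integral + derivative terms."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, target: float, actual: float) -> float:
        # Positive error: under-spending. Negative error: over-spending.
        error = target - actual
        self.integral += error                  # accumulated past error
        derivative = error - self.prev_error    # change since the last step
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains and spend figures.
pid = PIDController(kp=0.5, ki=0.1, kd=0.2)
adjustment = pid.update(target=100.0, actual=120.0)
print(adjustment)  # negative value: spending is ahead of plan, so slow down
```

A negative output means the stand has spent more than planned so far, so the next bids should be lowered; a positive output means there is room to bid more aggressively.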

How they work together:
The system fuses these two.

  • When things are normal, the Thinking Brain (AI) takes the lead, making smart, long-term bets.
  • When things get weird (like a sudden storm or a viral trend), the Reflex Brain steps in. It acts as a safety net. If the AI gets too excited or confused, the Reflex Brain gently pulls the reins back to keep you safe.
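
This summary does not say exactly how the two brains are fused, so the following is only one plausible sketch: a weighted blend in which a trust score decides how much the AI's bid counts. The function `fuse_bids`, the `trust_in_dt` score, and the numbers are all hypothetical and are not the authors' method.

```python
def fuse_bids(dt_bid: float, pid_bid: float, trust_in_dt: float) -> float:
    """Blend the Thinking Brain's bid with the Reflex Brain's bid.

    trust_in_dt is an assumed score in [0, 1]: 1.0 means conditions look
    normal, 0.0 means the situation looks unlike anything in past data.
    """
    w = min(1.0, max(0.0, trust_in_dt))  # clamp to [0, 1]
    return w * dt_bid + (1.0 - w) * pid_bid


# Normal day: the AI's bid dominates.
print(fuse_bids(dt_bid=2.0, pid_bid=1.0, trust_in_dt=1.0))  # 2.0
# Surprise parade: lean halfway toward the rule-based bid.
print(fuse_bids(dt_bid=2.0, pid_bid=1.0, trust_in_dt=0.5))  # 1.5
```

The design choice worth noting is that the rule-based bid is always available as a fallback, so a confused AI can never move the final bid further than its trust score allows.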

3. Why is this better than the old way?

  • Old Way (Black Box): The computer just memorized the past. If the past changed, the computer crashed or made bad bets. It was like driving a car with your eyes closed, hoping the road doesn't change.
  • KBD (Knowledge-Informed):
    1. It respects human rules: It knows that "lowering the price usually raises sales, but not forever" and builds that rule into its brain from day one.
    2. It thinks ahead: It doesn't just win the next auction; it plans the whole day's strategy.
    3. It has a safety net: By combining the smart AI with a simple, reliable rule-based system, it can handle sudden changes (like a holiday sale) without losing money.
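
The first point, building a human rule into the model, can be illustrated with a toy example. Suppose the rule is "volume should never rise when the price rises." One very simple way to impose it on noisy estimates is a running-minimum clamp, sketched below. This is an assumed stand-in for illustration only; how the paper actually constructs its price-volume map is not described in this summary.

```python
from typing import List


def enforce_decreasing(volumes_by_price: List[float]) -> List[float]:
    """Clamp volume estimates so they never increase as price increases.

    Input is ordered from the lowest price to the highest price.
    """
    clamped = []
    ceiling = float("inf")
    for volume in volumes_by_price:
        # A higher price may not sell more than any lower price did.
        ceiling = min(ceiling, volume)
        clamped.append(ceiling)
    return clamped


# Hypothetical noisy estimates of cups sold at five rising price points.
noisy = [500.0, 420.0, 450.0, 300.0, 310.0]
print(enforce_decreasing(noisy))  # [500.0, 420.0, 420.0, 300.0, 300.0]
```

With little data, raw estimates like 450 cups at a higher price than 420 cups are likely noise; the clamp removes them so the daily plan never relies on a relationship that experts know to be implausible.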

The Result

In their tests (simulated and real-world), this "Two-Brain" manager made more money and stayed within budget much better than the old methods. It was especially good at handling "surprises" where other systems failed.

In short: KBD is like hiring a brilliant strategist who plans the whole day, paired with a vigilant guard who keeps an eye on the clock and the wallet, ensuring you stay within budget even when the world gets crazy.
