Dynamic Knowledge Fusion for Multi-Domain Dialogue State Tracking

This paper proposes a dynamic knowledge fusion framework for multi-domain dialogue state tracking. To address the challenges of modeling long dialogue histories and of data scarcity, it uses a contrastive learning-based encoder to select the relevant slots, then leverages their structured information as contextual prompts to improve tracking accuracy and generalization.

Haoxiang Su, Ruiyu Fang, Liting Jiang, Xiaomeng Huang, Shuangyong Song

Published Thu, 12 Ma

Imagine you are a super-efficient concierge at a massive, multi-story hotel. This hotel isn't just for guests; it's also a travel agency, a restaurant guide, a taxi service, and a hospital all rolled into one.

Every day, guests (the users) come to you with complex requests. They might say, "I need a cheap hotel near the beach, but also a taxi to get there, and I want to book a table at a Mexican restaurant for later."

Your job is Dialogue State Tracking (DST). You need to listen to the conversation, remember what the guest wants, and keep a perfect mental list of their "state" (e.g., Hotel: Cheap/Beach, Taxi: Needed, Restaurant: Mexican).
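Concretely, a dialogue state is usually represented as a set of (domain, slot) → value pairs that gets updated turn by turn. A minimal sketch (the domain and slot names here are illustrative, not taken from the paper):

```python
# A dialogue state maps (domain, slot) pairs to the values the user has
# stated so far. Each new turn merges its updates into the running state.
def update_state(state, turn_updates):
    """Return a new state with the latest turn's slot values merged in."""
    new_state = dict(state)
    new_state.update(turn_updates)
    return new_state

state = {}
# Turn 1: "I need a cheap hotel near the beach"
state = update_state(state, {("hotel", "price"): "cheap",
                             ("hotel", "area"): "beach"})
# Turn 2: "...also a taxi, and a Mexican restaurant"
state = update_state(state, {("taxi", "needed"): "yes",
                             ("restaurant", "food"): "mexican"})
print(state[("hotel", "price")])  # -> cheap
```

The tracker's whole job is to keep this mapping correct as the conversation drifts across domains.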

The Problem: The "Information Overload" Headache

In the past, trying to be this concierge was incredibly hard for two reasons:

  1. The Memory Wall: The guest talks for a long time. Keeping track of every single detail without getting confused is tough.
  2. The "Kitchen Sink" Approach: To help the concierge, developers used to dump everything they knew about the hotel onto the desk. They gave the concierge a giant book containing every possible room type, every taxi route, and every menu item, even if the guest only asked about a hotel.
    • The Result: The concierge got overwhelmed. They spent too much time reading the irrelevant parts of the book (like the taxi menu when the guest just wanted a hotel), leading to mistakes. This is called "attention dilution."

The Solution: The "Smart Filter" System (DKF-DST)

The paper introduces a new system called DKF-DST (Dynamic Knowledge Fusion). Think of this as giving your concierge a smart, magical assistant that works in two distinct steps.

Step 1: The "Relevance Radar" (Information Selection)

Before the concierge even looks at the guest's request, the assistant scans the conversation and the giant knowledge book.

  • How it works: It uses a technique called Contrastive Learning. Imagine the assistant is playing a game of "Hot and Cold." It compares the guest's words ("I need a cheap hotel") against every single item in the knowledge book.
  • The Magic: It instantly realizes, "Hey, 'cheap' and 'hotel' are a hot match! But 'taxi' and 'restaurant' are cold right now."
  • The Result: It filters out most of the noise. It only pulls out the specific pages about "Hotel Prices" and "Hotel Locations" and puts them on the desk. It ignores the taxi and restaurant info for now. This saves the concierge's brainpower.
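The selection step boils down to scoring each candidate slot against the user's utterance and keeping the top matches. The paper trains a contrastive encoder for this; the toy bag-of-words similarity below only stands in for that learned encoder, and the slot names and descriptions are made up for illustration:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding". The real system uses a contrastive
    # learning-based encoder; this Counter is just a stand-in.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_relevant_slots(utterance, slot_descriptions, top_k=2):
    """Keep only the top_k slots whose descriptions match the utterance."""
    u = embed(utterance)
    scored = [(cosine(u, embed(desc)), slot)
              for slot, desc in slot_descriptions.items()]
    scored.sort(reverse=True)
    return [slot for score, slot in scored[:top_k] if score > 0]

slots = {
    "hotel-price": "price range of the hotel cheap expensive",
    "hotel-area": "area location of the hotel north south beach",
    "taxi-destination": "taxi destination place",
    "restaurant-food": "type of food restaurant mexican italian",
}
print(select_relevant_slots("I need a cheap hotel near the beach", slots))
# -> ['hotel-price', 'hotel-area']
```

The taxi and restaurant slots score zero and never reach the model, which is exactly the "attention dilution" fix described above.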

Step 2: The "Dynamic Prompt" (Knowledge Fusion)

Now that the relevant pages are on the desk, the assistant doesn't just hand them over; it organizes them into a fill-in-the-blank template.

  • The Setup: Instead of a messy pile of papers, the assistant creates a neat form that says: "The guest wants a [0] hotel in the [1] area."
  • The Dynamic Part: It fills the [0] and [1] spots with the specific options the guest mentioned (e.g., "cheap" and "south").
  • The Output: The concierge (the AI model) looks at this clean, focused form and easily writes the final answer: "Okay, I found a cheap hotel in the south."
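Mechanically, this fusion step is just filling a slot template with the candidate values selected in Step 1. A minimal sketch (the bracket-index template format is illustrative, not necessarily the paper's exact prompt layout):

```python
def build_prompt(template, values):
    """Fill numbered placeholders [0], [1], ... with selected slot values."""
    prompt = template
    for i, value in enumerate(values):
        prompt = prompt.replace(f"[{i}]", value)
    return prompt

template = "The guest wants a [0] hotel in the [1] area."
print(build_prompt(template, ["cheap", "south"]))
# -> The guest wants a cheap hotel in the south area.
```

Because the template is rebuilt every turn from the freshly selected slots, the prompt stays focused even as the conversation changes topic.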

Why is this better?

  1. No More Distractions: By filtering out irrelevant info first (Step 1), the model doesn't get confused by things the guest didn't ask about.
  2. Adaptable: If the guest suddenly says, "Actually, forget the hotel, I need a taxi," the system instantly re-runs the radar, drops the hotel pages, and pulls up the taxi pages. It's dynamic.
  3. Works with Less Data: Because the system is so smart at focusing, it doesn't need to have read millions of examples to learn how to do this. It can generalize well even with fewer training examples.

The Analogy Summary

  • Old Way: Giving a student a library of 10,000 books and asking them to write an essay on "Apples." They spend hours reading about "Bananas" and "Cars" before finally finding the apple section.
  • DKF-DST Way: A librarian (the radar) instantly finds the 3 books about apples, hands them to the student, and gives them a worksheet with blanks to fill in. The student writes the essay quickly and perfectly.

The Bottom Line

This paper proves that by filtering information before processing it, AI can understand complex, multi-topic conversations much better. It makes the AI less like a confused robot drowning in data, and more like a sharp, focused expert who knows exactly what to pay attention to.