IntPro: A Proxy Agent for Context-Aware Intent Understanding via Retrieval-conditioned Inference

Imagine you are talking to a very smart, but slightly distant, AI assistant (like a super-charged version of Siri or a chatbot). Sometimes, this AI gets your request wrong because it doesn't "know" you. It sees the words you type, but it misses the vibe, your past habits, or the specific reason you're asking.

The paper introduces IntPro, a solution to this problem. Think of IntPro not as the main AI, but as your personal "Context Coach" or a smart interpreter sitting between you and the big AI.

Here is how it works, broken down with simple analogies:

1. The Problem: The "Amnesiac" AI

Imagine you are at a restaurant. You say, "I'll have the usual."

Without IntPro: The waiter (the AI) looks at you blankly. "What is usual? I don't know you. Do you want the steak? The salad? The soup?" It treats every customer as a brand-new stranger.
The Issue: Current AIs are great at reading text, but they are bad at remembering who you are and why you usually do things. They miss the "context."

2. The Solution: IntPro (The Personal Coach)

IntPro is a small, specialized agent that sits between you and the big AI. Its job is to figure out what you really mean before passing the message along.

It does this in two clever ways:

A. The "Intent Explanation" (The Translator)

Instead of just guessing your intent (e.g., "He wants food"), IntPro writes a short, human-like note explaining why.

Analogy: Instead of just handing the waiter a ticket that says "Order," IntPro writes a note: "This customer is stressed about work and usually orders the spicy soup to comfort themselves. They are likely asking for the spicy soup."
This note is stored in a Personal Library for that specific user.

B. The "Smart Memory Check" (The Librarian)

This is the magic part. IntPro doesn't always guess. It knows when to check its memory.

The Easy Case: If you say "I want pizza," IntPro knows immediately. It doesn't need to check the library. It just passes the order.
The Tricky Case: If you say something vague like "Ugh, do that again," the words alone are ambiguous. Is it a joke? Are you angry?
- The Action: IntPro acts like a librarian. It goes to your Personal Library, looks at your past notes, and finds similar situations.
- The Discovery: It finds a past note: "Last time you said 'do that again,' you were annoyed at your boss."
- The Result: IntPro now knows you are annoyed, not joking. It updates the note for the big AI: "User is annoyed, likely complaining about a recurring task."

3. How It Learned to Be Smart (The Training)

You might wonder, "How does this coach know when to check the library and when to just guess?"

The researchers taught IntPro using a method called Reinforcement Learning (like training a dog with treats).

The Game: They gave IntPro thousands of scenarios.
The Reward:
- If the situation was easy and IntPro guessed right without checking the library, it got a small treat (because checking the library takes time).
- If the situation was hard and IntPro guessed wrong without checking, it got a "scolding."
- If the situation was hard and IntPro checked the library and got it right, it got a big treat.
- If the situation was easy but IntPro wasted time checking the library, it got a small scolding.

Over time, IntPro learned the perfect balance: "Be fast when you know the answer, but dig deep into the memory when you're unsure."

4. Why This Matters

Privacy: Because IntPro is small and runs on your device (like your phone or laptop), your personal history stays with you. You don't have to send your private thoughts to a giant cloud server.
Speed: It's much faster than waiting for a giant server to think, because IntPro is a lightweight "coach" that knows exactly what to look for.
Understanding: It stops the AI from being a "one-size-fits-all" robot and makes it feel like it actually knows you.

Summary

IntPro is like a personal translator and memory keeper for your AI. It listens to you, checks your personal history if things are confusing, writes a clear explanation of what you mean, and then tells the big AI exactly how to respond. It makes the AI feel less like a machine and more like a friend who remembers your habits.

Here is a detailed technical summary of the paper "IntPro: A Proxy Agent for Context-Aware Intent Understanding via Retrieval-conditioned Inference".

1. Problem Statement

In Human-LLM collaboration workflows, accurately understanding user intent is critical for generating satisfactory responses. However, existing approaches face three main challenges:

Context-Awareness Gap: User intent is not static; it depends heavily on situational environments (e.g., current text, dialogue history) and underlying motivations. Standard models often treat intent recognition as a static classification task, failing to leverage implicit patterns in interaction history.
Lack of Personalization: Different users exhibit distinct intent patterns even under similar contexts. Current systems rarely adapt to individual user histories or provide explanations for why an intent was inferred.
Cost and Fragility: Directly using large cloud LLMs for complex context reasoning is expensive and relies on fragile prompt engineering. Conversely, small on-device models often lack the reasoning depth to handle ambiguity without external aids.

The paper proposes IntPro, a proxy agent designed to sit between the user and the cloud LLM. Its goal is to perform context-aware intent understanding by actively reasoning over both immediate context and personalized historical intent patterns, outputting structured intent labels and explanations.

2. Methodology

IntPro operates as a retrieval-conditioned inference agent. The methodology consists of three core components:

A. Intent Explanations as Retrieval Representations

Instead of storing raw user queries, IntPro generates Intent Explanations—natural language descriptions that abstract how contextual signals connect to the expressed intent.

Generic vs. Personalized: Explanations can be generic (semantic patterns) or personalized (incorporating user-specific motivations).
Library Construction: These explanations are stored in a per-user Intent History Library. This library serves as the retrieval source, allowing the agent to match current ambiguous contexts with past similar situations.
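
A per-user library of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the class names `IntentRecord` and `IntentHistoryLibrary`, the bag-of-words cosine similarity, and the example entries are all assumptions made for the sketch (a real system would more likely use dense embeddings).

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IntentRecord:
    """One stored entry: an intent label plus its natural-language explanation."""
    label: str
    explanation: str


def _bag_of_words(text: str) -> Counter:
    # Toy text representation (assumption); stands in for a learned embedding.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class IntentHistoryLibrary:
    """Hypothetical per-user store, searched by similarity to past explanations."""
    records: list[IntentRecord] = field(default_factory=list)

    def add(self, label: str, explanation: str) -> None:
        self.records.append(IntentRecord(label, explanation))

    def retrieve(self, query: str, k: int = 3) -> list[IntentRecord]:
        q = _bag_of_words(query)
        ranked = sorted(
            self.records,
            key=lambda r: _cosine(q, _bag_of_words(r.explanation)),
            reverse=True,
        )
        return ranked[:k]


library = IntentHistoryLibrary()
library.add("complain", "User repeats a request after a failed task and sounds annoyed about a recurring chore.")
library.add("order_food", "User asks for the usual spicy soup when stressed about work.")
top = library.retrieve("user sounds annoyed and asks to repeat the task", k=1)
```

Here `top[0]` is the stored explanation closest to the query, which the agent could then use as evidence for the current context.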

B. Training Pipeline

The training process involves two distinct phases:

Supervised Fine-Tuning (SFT) with Trajectory Generation:
- A framework generates training trajectories where the model learns two behaviors: Direct Inference (answering confidently) and Retrieval-Conditioned Inference (calling a retrieval tool when uncertain).
- The model learns to generate intent options, retrieve relevant historical patterns from the library, and synthesize a final judgment.
- This phase initializes the agent with the ability to generate structured explanations and use tools.
Reinforcement Learning (RL) via GRPO:
- The authors employ Group Relative Policy Optimization (GRPO) to refine the agent's policy.
- Tool-Aware Reward Function: A novel reward mechanism dynamically adjusts based on context difficulty (estimated by group accuracy):
  - Easy Contexts: Rewards correct direct answers; penalizes unnecessary tool calls.
  - Hard/Ambiguous Contexts: Rewards successful retrieval (finding the correct intent via history); penalizes incorrect direct answers.
- This encourages the agent to learn a conditional strategy: deciding when to rely on historical patterns versus when to infer directly, rather than blindly retrieving or ignoring history.
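
The reward logic described above can be sketched as follows. This is an illustration under stated assumptions: the summary does not give the reward magnitudes or how group accuracy is turned into an easy/hard decision, so the 0.5 threshold, the bonus and penalty values, and the names `Rollout`, `tool_aware_rewards`, and `group_relative_advantages` are all hypothetical. The advantage step follows the usual GRPO idea of standardizing rewards within a group of rollouts sampled for the same context.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    correct: bool    # did the final intent label match the gold label?
    used_tool: bool  # did the trajectory call the retrieval tool?


def tool_aware_rewards(group: list[Rollout], easy_threshold: float = 0.5) -> list[float]:
    """Score one group of rollouts for the same context (all constants are assumptions)."""
    group_accuracy = mean(1.0 if r.correct else 0.0 for r in group)
    is_easy = group_accuracy >= easy_threshold  # difficulty estimated from group accuracy
    rewards = []
    for r in group:
        reward = 1.0 if r.correct else 0.0
        if is_easy and r.used_tool:
            reward -= 0.25  # unnecessary tool call on an easy context
        if not is_easy and r.used_tool and r.correct:
            reward += 0.5   # retrieval that led to the right intent on a hard context
        if not is_easy and not r.used_tool and not r.correct:
            reward -= 0.5   # wrong direct answer on a hard context
        rewards.append(reward)
    return rewards


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Standardize each reward against the mean and spread of its own group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


hard_group = [Rollout(False, False), Rollout(False, False), Rollout(True, True), Rollout(False, True)]
easy_group = [Rollout(True, False), Rollout(True, True), Rollout(True, False), Rollout(False, False)]
hard_rewards = tool_aware_rewards(hard_group)
easy_rewards = tool_aware_rewards(easy_group)
hard_advantages = group_relative_advantages(hard_rewards)
```

In the hard group, the only rollout that retrieved and answered correctly receives the largest advantage, which is the behavior the conditional strategy is meant to reinforce.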

C. Inference Workflow

The proxy receives user context ($C$).
It decides whether to invoke the retrieval tool based on confidence.
If retrieving, it queries the user's Intent History Library for similar past intent explanations.
It synthesizes the retrieved evidence with the current context to generate the final intent label ($\ell$) and a detailed explanation ($exp$).
This structured output is sent to the cloud LLM to generate the final response.
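
The workflow above can be sketched as a single function. This is a simplified illustration, not the paper's code: in IntPro the decision to retrieve is made by the trained model itself, whereas this sketch stands in an explicit confidence threshold. The function `infer_intent`, the `IntentOutput` type, the 0.7 threshold, and the toy callables are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class IntentOutput:
    label: str            # final intent label
    explanation: str      # natural-language explanation sent along with the label
    used_retrieval: bool  # whether the history library was consulted


def infer_intent(
    context: str,
    direct_guess: Callable[[str], tuple[str, float]],
    retrieve: Callable[[str], list[str]],
    synthesize: Callable[[str, list[str]], tuple[str, str]],
    confidence_threshold: float = 0.7,  # assumption: stands in for the learned decision
) -> IntentOutput:
    label, confidence = direct_guess(context)
    if confidence >= confidence_threshold:
        # Direct inference: the context is clear enough on its own.
        return IntentOutput(label, f"Inferred directly from context: {context!r}", False)
    # Retrieval-conditioned inference: consult past intent explanations.
    evidence = retrieve(context)
    label, explanation = synthesize(context, evidence)
    return IntentOutput(label, explanation, True)


# Toy stand-ins for the model and the library (hypothetical).
def toy_guess(context: str) -> tuple[str, float]:
    return ("order_food", 0.95) if "pizza" in context else ("unknown", 0.2)


def toy_retrieve(context: str) -> list[str]:
    return ["Last time the user said 'do that again', they were annoyed about a recurring task."]


def toy_synthesize(context: str, evidence: list[str]) -> tuple[str, str]:
    return "complain", f"Context {context!r} matches history: {evidence[0]}"


easy = infer_intent("I want pizza", toy_guess, toy_retrieve, toy_synthesize)
hard = infer_intent("Ugh, do that again", toy_guess, toy_retrieve, toy_synthesize)
```

The clear request is answered directly, while the vague one triggers retrieval before the label and explanation are produced.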

3. Key Contributions

Retrieval-Conditioned Intent Inference: A novel framework that treats intent understanding as a generative process augmented by a personalized, evolving history library.
Intent Explanations as Retrieval Keys: The design of natural language "intent explanations" that abstract context-intent connections, proving more effective for retrieval than raw utterances or full context.
Tool-Aware GRPO Training: A multi-turn reinforcement learning paradigm that explicitly teaches the agent to balance direct inference and retrieval based on context difficulty, avoiding the pitfalls of always-retrieving or never-retrieving.
Comprehensive Evaluation: Validation across three diverse domains (Reading, Dialogue, Social Media) and multiple model sizes (3B–4B parameters), demonstrating robustness and generalization.

4. Experimental Results

Experiments were conducted on Highlight-Intent (reading), MIntRec2.0 (multi-party dialogue), and Weibo Post-Sync (social media).

Performance: IntPro consistently outperformed baselines, including:
- Cloud LLMs (GPT-4o, Qwen3-30B): Even when augmented with retrieval, IntPro (using smaller 3B-4B models) achieved higher accuracy and better generalization.
- Discriminative Models (BERT, RoBERTa): IntPro surpassed them by generating explanations and handling long-tail intent distributions better.
- Training Variants: IntPro significantly outperformed "Naive GRPO" (without tool-aware rewards) and standard SFT, particularly in generalization gaps and handling ambiguous cases.
Ablation Studies:
- Personalized Explanations: Showed a 46% improvement in retrieval accuracy (Global R@1) compared to generic explanations.
- Reward Design: Removing tool-aware rewards caused the model to converge to a retrieval-averse policy, confirming the necessity of the dynamic reward signal.
- Retrieval Strategy: The "Self-decided" strategy (IntPro) outperformed both "Forced Retrieval" and "No Retrieval," proving the agent learns to adaptively select the best strategy.
Progressive Accumulation: As the user's intent history library grew, IntPro's performance improved linearly (+5.5% gain), whereas non-retrieval baselines remained flat.
Efficiency: IntPro runs on-device (3B-4B models) with low latency (~150 ms) and memory usage (~8-11 GB), offering a privacy-preserving alternative to cloud LLMs while maintaining high performance.
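
For readers unfamiliar with the R@1 figure in the ablation, the sketch below shows how a recall-at-k score is commonly computed. It assumes R@1 means the fraction of queries whose top-ranked retrieved entry carries the correct intent; the summary does not define what "Global" covers, and the function name and example data are hypothetical.

```python
def recall_at_k(ranked_labels: list[list[str]], gold_labels: list[str], k: int = 1) -> float:
    """Fraction of queries whose gold label appears among the top-k retrieved labels."""
    if not gold_labels:
        return 0.0
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_labels, gold_labels))
    return hits / len(gold_labels)


# Each inner list is the ranked retrieval result for one query (made-up data).
ranked = [["complain", "joke"], ["joke", "complain"], ["order_food"]]
gold = ["complain", "complain", "order_food"]
r_at_1 = recall_at_k(ranked, gold, k=1)
r_at_2 = recall_at_k(ranked, gold, k=2)
```

With this data, two of the three queries have the right label in first place, and all three have it within the top two.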

5. Significance

This paper addresses a critical bottleneck in Human-AI collaboration: the inability of current systems to deeply understand user-specific intent in complex contexts.

Paradigm Shift: It moves intent understanding from a static classification task to a dynamic, retrieval-augmented generative process.
Personalization at Scale: By maintaining an evolving intent history library, IntPro enables continuous adaptation to individual users without retraining the base model.
Deployment Viability: It demonstrates that small, on-device models can achieve state-of-the-art context-aware reasoning when equipped with the right retrieval mechanisms and RL training, making advanced AI assistants feasible for local, privacy-sensitive environments.
Interpretability: The generation of "intent explanations" provides transparency, allowing humans to audit why the system inferred a specific intent, bridging the gap between black-box AI and human oversight.