LocalSUG: Geography-Aware LLM for Query Suggestion in Local-Life Services

This paper presents LocalSUG, a geography-aware LLM framework for local-life service query suggestion that overcomes challenges in geographic grounding, exposure bias, and inference latency through city-aware candidate mining, a beam-search-driven GRPO algorithm, and quality-aware acceleration techniques, ultimately achieving significant improvements in click-through rate and search success in large-scale online deployment.

Jinwen Chen, Shuai Gong, Shiwen Zhang, Zheng Zhang, Yachao Zhao, Lingxiang Wang, Haibo Zhou, Yuan Zhan, Wei Lin, Hainan Zhang

Published 2026-03-06

Imagine you are walking into a massive, bustling food court in a new city. You are hungry and type "Pizza" into the search bar.

In a traditional system, the computer looks at a giant, static list of the most popular pizzas ever ordered anywhere in the world. It might suggest "Domino's." But if you are in a city where Domino's doesn't exist, that suggestion is useless. You have to keep typing, get frustrated, and maybe leave without eating.

This paper introduces LocalSUG, a new, super-smart assistant designed specifically for local life services (like food delivery, hotels, or local shops) that fixes these problems using a Large Language Model (LLM).

Here is how LocalSUG works, explained through simple analogies:

1. The Problem: The "Tourist" vs. The "Local"

The authors identified three main headaches with using AI for local search:

  • The Tourist Problem (Geographic Grounding): A standard AI is like a tourist who thinks "Pizza" means "Domino's" everywhere. It doesn't know that in your specific city, "Pizza" actually means "Papa John's" or a local favorite. It gives you suggestions that sound right but are impossible to find.
  • The "Practice vs. Reality" Problem (Exposure Bias): Imagine a chef who practices cooking by reading a recipe book (training) but then has to cook a meal for a crowd using a specific order of ingredients (inference). If the practice doesn't match the real cooking method, the meal fails. Traditional AI trains on lists of past searches but tries to generate suggestions one by one, leading to messy, inconsistent results.
  • The Speed Bump (Latency): These smart AI models are heavy. Asking them for an answer takes too long. In a busy food court, if the waiter takes 5 seconds to think of a suggestion, the customer walks away.

2. The Solution: LocalSUG

LocalSUG is a framework built to be a hyper-local, fast, and consistent assistant. Here are its three secret weapons:

A. The "Local Guide" Map (City-Aware Mining)

Instead of guessing, LocalSUG acts like a local guide who knows exactly what's open in your neighborhood.

  • How it works: Before the AI even starts thinking, it pulls a "cheat sheet" based on your city. If you are in Beijing, it knows to suggest Domino's. If you are in Macau, it knows Domino's isn't there and suggests Pizza Hut instead.
  • The Analogy: It's like having a GPS that doesn't just show you the fastest route, but also knows which gas stations are actually open in your specific town.
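The "cheat sheet" step above can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the city index, brand names, and candidate list are all made up, and a production system would mine this index offline from city-level search and order logs.

```python
# Illustrative city -> available-brands index, mined offline in a real system.
CITY_INDEX = {
    "Beijing": {"Domino's", "Pizza Hut", "Local Wood-Fired Pizza"},
    "Macau": {"Pizza Hut", "Local Wood-Fired Pizza"},
}

# Globally popular candidates for the prefix "Pizza", best-ranked first.
GLOBAL_CANDIDATES = ["Domino's", "Pizza Hut", "Local Wood-Fired Pizza"]

def city_aware_candidates(prefix_candidates, city):
    """Keep only candidates that actually operate in the user's city,
    preserving the original ranking order."""
    available = CITY_INDEX.get(city, set())
    return [c for c in prefix_candidates if c in available]

# In Macau, Domino's is filtered out before the LLM ever sees the list.
print(city_aware_candidates(GLOBAL_CANDIDATES, "Macau"))
```

The point of doing this *before* generation is that the model can only suggest things that are findable, so it never hallucinates a brand with no local presence.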

B. The "Rehearsal" Method (Beam-Search-Driven Training)

To fix the "Practice vs. Reality" problem, the authors changed how they teach the AI.

  • How it works: Instead of teaching the AI to produce one perfect suggestion at a time, they train it with the same beam search it uses in production: it generates a whole group of candidate suggestions at once, and a scoring system (GRPO, Group Relative Policy Optimization) rewards each candidate not just for being well-formed, but for being relevant to the business (e.g., "Did the user click? Did they order?").
  • The Analogy: Imagine a basketball coach. Instead of just telling the player, "Shoot the ball," the coach makes the player practice shooting a whole series of shots in a game scenario. The player learns to handle the pressure of the game, not just the drill. This ensures the AI performs perfectly when it's actually serving customers.

C. The "Speedy Chef" (Quality-Aware Acceleration)

To fix the speed issue, they made the AI run faster without losing its brainpower.

  • How it works: They taught the AI to ignore words it will never use (vocabulary pruning) and to stop thinking once it has found a few good answers (early stopping).
  • The Analogy: Imagine a chef with a pantry of 10,000 spices. LocalSUG is like a chef who keeps only the few hundred spices actually used in local dishes on the counter and knows exactly when to stop tasting the soup. They get the perfect flavor 3x faster.
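Both tricks can be sketched independently of any particular LLM. This is an assumption-laden illustration: the token IDs, the "allowed" vocabulary, and the stopping threshold are all hypothetical, and a real decoder would apply the mask to a logits tensor rather than a dict.

```python
def prune_logits(logits, allowed_ids):
    """Vocabulary pruning: restrict the next-token choice to a small
    domain-specific vocabulary before picking the best token."""
    return {tid: score for tid, score in logits.items() if tid in allowed_ids}

def should_stop(finished_suggestions, needed=3):
    """Early stopping: halt decoding once enough suggestions are complete,
    instead of exhausting the full beam budget."""
    return len(finished_suggestions) >= needed

# Token 999 scores highest but lies outside the local-service vocabulary,
# so pruning removes it before selection.
logits = {101: 2.3, 205: 1.7, 999: 4.0}
pruned = prune_logits(logits, allowed_ids={101, 205})
best = max(pruned, key=pruned.get)  # picks 101, not 999
```

Pruning shrinks the per-step work (fewer tokens to score and sort), and early stopping shrinks the number of steps, which is where the overall speedup comes from.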

3. The Results: A Happier Customer

When they tested this in the real world (on a massive platform with millions of users), the results were impressive:

  • Fewer Dead Ends: The rate of users searching for something and finding nothing dropped by 2.56%.
  • More Clicks: People clicked on the suggestions more often (+0.35%).
  • Less Typing: Users had to type fewer letters to find what they wanted because the suggestions were so accurate.
  • Discovery: Users found more unique, interesting items they hadn't seen before (like that hidden gem pizza place).

Summary

LocalSUG is like upgrading a generic, slow, out-of-touch tour guide into a local expert who knows your neighborhood, practices for the real game, and gives you answers instantly. It turns the frustrating experience of "I can't find anything" into a smooth, helpful discovery of exactly what you need, right where you are.