cs.IR papers | Gist.Science

RED: Robust Event-Guided Motion Deblurring with Modality-Specific Disentanglement

This paper introduces RED, a robust event-guided motion deblurring network that employs a robustness-oriented perturbation strategy and a modality-specific disentanglement mechanism to effectively reconstruct sharp images from fragmented event data caused by real-world sensor under-reporting.

Yihong Leng, Siming Zheng, Jinwei Chen, Bo Li, Jiaojiao Li, Peng-Tao Jiang | Mon, 09 Ma | cs

MLLMRec-R1: Incentivizing Reasoning Capability in Large Language Models for Multimodal Sequential Recommendation

MLLMRec-R1 is an efficient GRPO-based framework for multimodal sequential recommendation that overcomes the high computational costs of visual token processing and the issue of reward inflation by textualizing visual signals offline and employing a mixed-grained data augmentation strategy to construct high-quality reasoning supervision.

Yu Wang, Yonghui Yang, Le Wu, Jiancan Wu, Hefei Xu, Hui Lin | Mon, 09 Ma | cs

ChatShopBuddy: Towards Reliable Conversational Shopping Agents via Reinforcement Learning

This paper introduces ChatShopBuddy, a conversational shopping agent optimized via Reinforcement Learning using a new benchmark (SmartShopBench), a Hierarchical Reward Modeling framework, and a Dynamic Contrastive Policy Optimization algorithm to effectively balance product correctness, persuasiveness, and operational efficiency in real-world scenarios.

Yiruo Cheng, Kelong Mao, Tianhao Li, Jiejun Tan, Ji-Rong Wen, Zhicheng Dou | Mon, 09 Ma | cs

AutoThinkRAG: Complexity-Aware Control of Retrieval-Augmented Reasoning for Image-Text Interaction

AutoThinkRAG is a complexity-aware framework for image-text interaction that improves document question answering by routing queries based on difficulty and decoupling visual interpretation from logical reasoning to achieve state-of-the-art performance with reduced inference costs.

Jiashu Yang, Chi Zhang, Abudukelimu Wuerkaixi, Xuxin Cheng, Cao Liu, Ke Zeng, Xu Jia, Xunliang Cai | Mon, 09 Ma | cs

Both Ends Count! Just How Good are LLM Agents at "Text-to-Big SQL"?

This paper introduces novel "Text-to-Big SQL" evaluation metrics to address the limitations of existing benchmarks in assessing production-level LLM agents, demonstrating that traditional Text-to-SQL metrics fail to capture critical cost, latency, and efficiency implications that arise when scaling to large datasets.

Germán T. Eizaguirre, Lars Tissen, Marc Sánchez-Artigas | Mon, 09 Ma | cs.CL

Verify as You Go: An LLM-Powered Browser Extension for Fake News Detection

This paper introduces Aletheia, a novel LLM-powered browser extension that combines Retrieval-Augmented Generation with interactive user features to effectively detect fake news and provide transparent, evidence-based explanations, outperforming existing baselines in both detection accuracy and user engagement.

Dorsaf Sallami, Esma Aïmeur | Mon, 09 Ma | cs.CL

GaiaFlow: Semantic-Guided Diffusion Tuning for Carbon-Frugal Search

GaiaFlow is a novel framework that achieves carbon-frugal search by integrating semantic-guided diffusion tuning, retrieval-guided Langevin dynamics, and adaptive efficiency protocols to balance high retrieval accuracy with significantly reduced environmental impact.

Rong Fu, Jia Yee Tan, Chunlei Meng, Shuo Yin, Xiaowen Ma, Wangyu Wu, Muge Qi, Guangzhen Yao, Zhaolu Kang, Zeli Su, Simon Fong | Mon, 09 Ma | cs.LG

Efficient, Property-Aligned Fan-Out Retrieval via RL-Compiled Diffusion

The paper proposes R4T, a three-stage framework that leverages reinforcement learning to synthesize objective-aligned training data for a lightweight diffusion retriever, enabling efficient, high-quality set-valued retrieval that optimizes complex properties like diversity and coverage while significantly reducing inference latency compared to RL-based baselines.

Pengcheng Jiang, Judith Yue Li, Moonkyung Ryu, R. Lily Hu, Kun Su, Zhong Yi Wan, Liam Hebert, Hao Peng, Jiawei Han, Dima Kuzmin, Craig Boutilier | Mon, 09 Ma | cs.LG

Efficient Vector Search in the Wild: One Model for Multi-K Queries

The paper introduces OMEGA, a K-generalizable learned top-K search method that leverages a base model trained on K=1 with trajectory-based features and a dynamic refinement procedure to achieve high accuracy and low latency for multi-K vector queries while significantly reducing preprocessing time compared to state-of-the-art methods.

Yifan Peng, Jiafei Fan, Xingda Wei, Sijie Shen, Rong Chen, Jianning Wang, Xiaojian Luo, Wenyuan Yu, Jingren Zhou, Haibo Chen | Mon, 09 Ma | cs.LG

HCT-QA: A Benchmark for Question Answering on Human-Centric Tables

This paper introduces HCT-QA, a comprehensive benchmark comprising thousands of real-world and synthetic human-centric tables with natural language question-answer pairs, designed to evaluate and improve the performance of Large Language Models and Vision Language Models in querying complex tabular data.

Mohammad S. Ahmad, Zan A. Naeem, Michaël Aupetit, Ahmed Elmagarmid, Mohamed Eltabakh, Xiaosong Ma, Mourad Ouzzani, Chaoyi Ruan, Hani Al-Sayeh | Mon, 09 Ma | cs.AI

Sensitivity-Aware Retrieval-Augmented Intent Clarification

This paper proposes a three-step framework to develop sensitivity-aware retrieval-augmented intent clarification systems that balance user utility with the protection of sensitive information in domains like healthcare and legal contexts by defining attack models, designing retrieval-level defenses, and establishing evaluation metrics for the protection-utility trade-off.

Maik Larooij | Mon, 09 Ma | cs.AI

Balancing Domestic and Global Perspectives: Evaluating Dual-Calibration and LLM-Generated Nudges for Diverse News Recommendation

This study evaluates a dual-calibration algorithmic nudge and an LLM-based presentation nudge within a personalized diversity framework, finding that while algorithmic nudges effectively increase news consumption diversity and shift long-term reading habits toward balanced domestic and global coverage, LLM-based presentation nudges yield variable results and user-specific topic interest remains the strongest predictor of engagement.

Ruixuan Sun, Matthew Zent, Minzhu Zhao, Thanmayee Boyapati, Xinyi Li, Joseph A. Konstan | Mon, 09 Ma | cs.AI

The DSA's Blind Spot: Algorithmic Audit of Advertising and Minor Profiling on TikTok

This paper presents an algorithmic audit of TikTok revealing that while the platform technically complies with the Digital Services Act's ban on profiled advertising to minors, it effectively circumvents this protection by delivering highly personalized, often undisclosed influencer marketing content to adolescents, thereby highlighting the urgent need to expand the regulatory definition of "advertisement" to cover such commercial practices.

Sara Solarova, Matej Mosnar, Matus Tibensky, Jan Jakubcik, Adrian Bindas, Simon Liska, Filip Hossner, Matúš Mesarčík, Ivan Srba | Mon, 09 Ma | cs.AI

CBR-to-SQL: Rethinking Retrieval-based Text-to-SQL using Case-based Reasoning in the Healthcare Domain

The paper introduces CBR-to-SQL, a Case-Based Reasoning framework that improves Text-to-SQL generation in healthcare by utilizing abstract case templates and a two-stage retrieval process to achieve higher accuracy, sample efficiency, and robustness compared to standard Retrieval-Augmented Generation methods on the MIMICSQL dataset.

Hung Nguyen, Hans Moen, Pekka Marttinen | Mon, 09 Ma | cs.AI

VDCook: DIY video data cook your MLLMs

VDCook is a self-evolving, configurable video data operating system that enables researchers and domain teams to automatically generate, update, and manage specialized video training datasets for MLLMs through natural language queries, integrated retrieval-synthesis modules, and automated metadata annotation.

Chengwei Wu | Mon, 09 Ma | cs.AI

MDER-DR: Multi-Hop Question Answering with Entity-Centric Summaries

The paper introduces MDER-DR, a novel Retrieval-Augmented Generation framework that combines a Map-Disambiguate-Enrich-Reduce indexing strategy with a Decompose-Resolve retrieval mechanism to significantly improve multi-hop question answering on Knowledge Graphs by preserving contextual nuance and enabling robust reasoning without explicit graph traversal.

Riccardo Campi, Nicolò Oreste Pinciroli Vago, Mathyas Giudici, Marco Brambilla, Piero Fraternali | Fri, 13 Ma | cs.CL

How Auditing Methodologies Can Impact Our Understanding of YouTube's Recommendation Systems

This paper examines how various methodological decisions and configuration parameters in YouTube audits significantly impact the accuracy of inferred algorithmic biases, ultimately offering strategies to optimize audit overhead without compromising scientific validity.

Sarmad Chandio, Daniyal Pirwani Dar, Rishab Nithyanand | 2026-03-10 | cs

Agent-OM: Leveraging LLM Agents for Ontology Matching

This paper introduces Agent-OM, a novel framework leveraging Siamese LLM agents and specialized tools to address ontology matching challenges, demonstrating competitive performance on simple tasks and significant improvements on complex and few-shot scenarios compared to state-of-the-art systems.

Zhangcheng Qiang, Weiqing Wang, Kerry Taylor | 2026-03-10 | cs.CL

Detecting RAG Advertisements Across Advertising Styles

This paper introduces a taxonomy for advertising styles in RAG systems, simulates style-based evasion tactics, and demonstrates that while entity recognition models effectively detect generated ads and remain robust to style changes, lightweight models suitable for end-user devices currently lack the necessary resilience.

Sebastian Heineking, Wilhelm Pertsch, Ines Zelch + 4 more | 2026-03-06 | cs

Beyond Text: Aligning Vision and Language for Multimodal E-Commerce Retrieval

This paper proposes a novel modality fusion network that leverages domain-specific fine-tuning and a two-stage alignment strategy to effectively unify text and visual signals for improved multimodal retrieval in e-commerce search.

Qujiaheng Zhang, Guagnyue Xu, Fengjie Li | 2026-03-06 | cs
