Beyond Linear LLM Invocation: An Efficient and Effective Semantic Filter Paradigm

This paper proposes Clustering-Sampling-Voting (CSV), a novel framework that significantly reduces the linear latency and token costs of semantic filtering in large language models by embedding tuples into semantic clusters, sampling subsets for evaluation, and inferring cluster-level labels through voting strategies, thereby achieving sublinear complexity with strong error guarantees.

Nan Hou, Kangfei Zhao, Jiadong Xie + 1 more | 2026-03-06 | 💻 cs
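
To make the clustering, sampling, and voting steps in the CSV summary above concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes cluster assignments are already available (for example from an embedding model plus any clustering routine), uses a plain majority vote, and provides no error guarantee. The names `cluster_sample_vote`, `predicate`, and `sample_size` are hypothetical, and `predicate` stands in for a per-tuple LLM call.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple


def cluster_sample_vote(
    items: Sequence[str],
    cluster_ids: Sequence[Hashable],
    predicate: Callable[[str], bool],
    sample_size: int = 3,
    seed: int = 0,
) -> Tuple[List[bool], int]:
    """Label every item by evaluating only a sample from each cluster.

    Returns the per-item labels and the number of predicate calls made.
    Assumption: cluster_ids[i] is the precomputed cluster of items[i].
    """
    rng = random.Random(seed)

    # Group item indices by their cluster id.
    clusters = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        clusters[cid].append(idx)

    labels = [False] * len(items)
    calls = 0
    for members in clusters.values():
        # Evaluate the (expensive) predicate on a small sample only.
        sampled = rng.sample(members, min(sample_size, len(members)))
        votes = [predicate(items[i]) for i in sampled]
        calls += len(votes)

        # Strict majority vote decides the label for the whole cluster.
        decision = sum(votes) * 2 > len(votes)
        for i in members:
            labels[i] = decision
    return labels, calls


if __name__ == "__main__":
    fruits = {"apple", "pear", "plum", "fig", "kiwi"}
    data = ["apple", "pear", "plum", "car", "bus", "van", "fig", "kiwi"]
    groups = [0, 0, 0, 1, 1, 1, 0, 0]
    labels, calls = cluster_sample_vote(data, groups, lambda x: x in fruits)
    print(labels, calls)
```

In this toy run the predicate is called 6 times for 8 items. The saving grows when clusters are large relative to `sample_size`, but the result is only as good as the clusters: a mixed cluster will be mislabeled for its minority members.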

Beyond the Context Window: A Cost-Performance Analysis of Fact-Based Memory vs. Long-Context LLMs for Persistent Agents

This paper compares fact-based memory systems against long-context LLMs for persistent agents, finding that while long-context models generally offer superior factual recall, fact-based memory provides a more cost-effective solution for long-term interactions by maintaining fixed per-turn costs after an initial write phase, with a specific break-even point determined by context length and interaction volume.

Natchanon Pollertlam, Witchayut Kornsuwannawit | 2026-03-06 | 💬 cs.CL
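
The break-even point mentioned in the summary above can be illustrated with a simple cost model. This is a sketch under stated assumptions, not the paper's analysis: it supposes a one-time write cost for the memory system, a fixed per-turn cost afterwards, and a constant per-turn cost for the long-context model. The function name and all parameters are hypothetical, and the numbers in the usage example are made up for illustration.

```python
import math
from typing import Optional


def break_even_turns(
    write_cost: float,
    memory_cost_per_turn: float,
    long_context_cost_per_turn: float,
) -> Optional[int]:
    """Smallest number of turns at which fact-based memory is no more
    expensive than long context, under the simplified model
        memory total       = write_cost + n * memory_cost_per_turn
        long-context total = n * long_context_cost_per_turn
    Returns None if memory never catches up.
    """
    saving = long_context_cost_per_turn - memory_cost_per_turn
    if saving <= 0:
        # Memory saves nothing per turn, so the write cost is never recovered.
        return None
    return math.ceil(write_cost / saving)


if __name__ == "__main__":
    # Illustrative numbers only (arbitrary cost units).
    print(break_even_turns(10.0, 0.5, 2.5))  # 5
    print(break_even_turns(1.0, 2.0, 1.0))   # None
```

In practice the long-context cost per turn depends on context length, so a closer model would make `long_context_cost_per_turn` a function of the accumulated history rather than a constant.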

From Unfamiliar to Familiar: Detecting Pre-training Data via Gradient Deviations in Large Language Models

This paper proposes GDS, a novel method for detecting pre-training data in Large Language Models by analyzing systematic gradient deviations—specifically update magnitudes, locations, and neuron activation patterns—that distinguish familiar samples from unfamiliar ones, achieving state-of-the-art performance and superior cross-dataset transferability compared to existing likelihood-based or heuristic approaches.

Ruiqi Zhang, Lingxiang Wang, Hainan Zhang + 2 more | 2026-03-06 | 💬 cs.CL

An Approach to Simultaneous Acquisition of Real-Time MRI Video, EEG, and Surface EMG for Articulatory, Brain, and Muscle Activity During Speech Production

This paper presents a novel framework for the simultaneous acquisition of real-time MRI, EEG, and surface EMG to capture brain, muscle, and articulatory activity during speech, featuring a specialized artifact suppression pipeline to overcome technical challenges and enable unprecedented insights into speech neuroscience.

Jihwan Lee, Parsa Razmara, Kevin Huang + 16 more | 2026-03-06 | 🤖 cs.AI