One Language, Two Scripts: Probing Script-Invariance in LLM Concept Representations
This paper demonstrates that Sparse Autoencoder (SAE) features in Gemma models capture abstract semantics rather than surface orthography. Identical Serbian sentences written in two scripts that tokenize completely differently (Latin and Cyrillic) activate highly overlapping sets of features, and this script invariance increases with model scale.
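The core measurement can be sketched as a feature-overlap score between the SAE activations for the two renderings of a sentence. The function names, the activation threshold, and the choice of Jaccard similarity below are illustrative assumptions, not necessarily the paper's exact metric:

```python
import numpy as np

def active_features(sae_activations, threshold=0.0):
    """Indices of SAE features whose activation exceeds the threshold."""
    return set(np.flatnonzero(sae_activations > threshold))

def script_overlap(acts_latin, acts_cyrillic, threshold=0.0):
    """Jaccard overlap between the feature sets active on the two scripts.

    1.0 means the Latin and Cyrillic renderings activate identical
    features; 0.0 means they share none.
    """
    a = active_features(acts_latin, threshold)
    b = active_features(acts_cyrillic, threshold)
    if not a and not b:
        return 1.0  # no active features in either: vacuously identical
    return len(a & b) / len(a | b)

# Toy example: hypothetical SAE activation vectors for the same sentence
# rendered in Latin vs. Cyrillic script (made-up numbers).
latin = np.array([0.9, 0.0, 0.4, 0.0, 0.7])
cyrillic = np.array([0.8, 0.0, 0.5, 0.1, 0.0])
print(script_overlap(latin, cyrillic))  # 2 shared / 4 active overall = 0.5
```

Under the paper's claim, this score would be high for script pairs and would grow with model size; a script-sensitive model would instead show low overlap because the two inputs share few tokens.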