AdaFuse: Accelerating Dynamic Adapter Inference via Token-Level Pre-Gating and Fused Kernel Optimization

AdaFuse is a framework that accelerates dynamic adapter inference in Large Language Models by employing a token-level pre-gating strategy to enable a single global routing decision, which is then executed via a custom fused CUDA kernel to reduce decoding latency by over 2.4x while maintaining accuracy.

Qiyang Li, Rui Kong, Yuchen Li, Hengyi Cai, Shuaiqiang Wang, Linghe Kong, Guihai Chen, Dawei Yin · 2026-03-13 · 🤖 cs.AI
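The core idea of token-level pre-gating can be sketched in a few lines. The shapes, gating parameters, and aggregation rule below are illustrative assumptions, not AdaFuse's actual design: each token is scored against every adapter, and the per-token routing distributions are averaged into one global decision so that a single adapter path is executed for the whole sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

num_tokens, hidden, num_adapters = 16, 64, 4
hidden_states = rng.normal(size=(num_tokens, hidden))
# One learned gating vector per adapter (hypothetical shapes).
gate_weights = rng.normal(size=(num_adapters, hidden))

def pre_gate(hidden_states, gate_weights):
    """Aggregate per-token gate scores into one global adapter choice."""
    # Per-token affinity of each token to each adapter.
    scores = hidden_states @ gate_weights.T           # (tokens, adapters)
    # Softmax over adapters, then mean over tokens: a single
    # routing distribution for the whole sequence.
    probs = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    global_probs = probs.mean(axis=0)                 # (adapters,)
    return int(global_probs.argmax()), global_probs

adapter_id, dist = pre_gate(hidden_states, gate_weights)
print(adapter_id, dist.round(3))
```

Collapsing per-token decisions into one global choice is what makes a fused kernel attractive: with a single adapter selected up front, the adapter computation can be baked into one launch instead of branching per token.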

Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language

This paper introduces Bielik-Minitron-7B, a compressed 7.35B-parameter Polish language model created by applying structured pruning and knowledge distillation to the Bielik-11B-v3.0 model, which achieves a 33.4% parameter reduction and up to 50% inference speedup while retaining approximately 90% of the original model's performance.

Remigiusz Kinas, Paweł Kiszczak, Sergio P. Perez, Krzysztof Ociepa, Łukasz Flis, Krzysztof Wróbel, Adrian Gwozdziej · 2026-03-13 · 💬 cs.CL

Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models

The paper proposes "Think While Watching," a memory-anchored streaming framework that enables efficient multi-turn video reasoning in multimodal large language models by preserving segment-level memory and overlapping perception with generation, thereby significantly improving accuracy on streaming benchmarks while reducing output tokens.

Lu Wang, Zhuoran Jin, Yupu Hao, Yubo Chen, Kang Liu, Yulong Ao (Beijing Academy of Artificial Intelligence), Jun Zhao; all except Yulong Ao: The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China · 2026-03-13 · 💬 cs.CL

MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?

This paper introduces MobileKernelBench, a framework revealing that current LLMs struggle to generate efficient mobile kernels due to compilation failures and hallucinations, and proposes MoKA, a multi-agent system that significantly improves compilation success and kernel performance.

Xingze Zou, Jing Wang, Yuhua Zheng, Xueyi Chen, Haolei Bai, Lingcheng Kong, Syed A. R. Abu-Bakar, Zhaode Wang, Chengfei Lv, Haoji Hu, Huan Wang · 2026-03-13 · 🤖 cs.LG

Fair Learning for Bias Mitigation and Quality Optimization in Paper Recommendation

This paper introduces Fair-PaperRec, an MLP-based model that effectively mitigates demographic biases in paper recommendations by penalizing disparities through intersectional criteria and a customized fairness loss, achieving a significant increase in underrepresented group participation while simultaneously improving overall utility without compromising academic rigor.

Uttamasha Anjally Oyshi, Susan Gauch · 2026-03-13 · 🤖 cs.AI

Prototype-Based Knowledge Guidance for Fine-Grained Structured Radiology Reporting

The paper proposes ProtoSR, a prototype-based framework that leverages an instruction-tuned LLM to extract visual prototypes from free-text radiology reports, thereby injecting unstructured knowledge to significantly improve fine-grained structured report generation and achieve state-of-the-art performance on the Rad-ReStruct benchmark.

Chantal Pellegrini, Adrian Delchev, Ege Özsoy, Nassir Navab, Matthias Keicher · 2026-03-13 · 🤖 cs.AI

Effective Resistance Rewiring: A Simple Topological Correction for Over-Squashing

This paper introduces Effective Resistance Rewiring (ERR), a parameter-free topological correction method that iteratively optimizes graph connectivity by adding and removing edges based on global effective resistance, alleviating over-squashing in Graph Neural Networks. The authors further show that combining ERR with normalization techniques effectively balances the trade-off between improved long-range signal propagation and oversmoothing.

Bertran Miquel-Oliver, Manel Gil-Sorribes, Victor Guallar, Alexis Molina · 2026-03-13 · 🤖 cs.LG
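Effective resistance itself is a standard graph quantity, computable from the Moore-Penrose pseudoinverse of the graph Laplacian as R(u, v) = L⁺[u,u] + L⁺[v,v] − 2·L⁺[u,v]. The sketch below shows only this textbook computation, not ERR's rewiring procedure:

```python
import numpy as np

def effective_resistance(adj):
    """All-pairs effective resistance from the Laplacian pseudoinverse."""
    deg = np.diag(adj.sum(axis=1))
    laplacian = deg - adj
    lp = np.linalg.pinv(laplacian)        # Moore-Penrose pseudoinverse
    d = np.diag(lp)
    # R[u, v] = L+[u,u] + L+[v,v] - 2 * L+[u,v]
    return d[:, None] + d[None, :] - 2 * lp

# Path graph 0-1-2: resistance between endpoints is the series sum.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
R = effective_resistance(adj)
print(round(R[0, 2], 6))                  # -> 2.0 (two unit edges in series)
```

Pairs with high effective resistance are poorly connected, which is why a rewiring method can use this quantity as a global signal for where to add (or remove) edges.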

Delayed Backdoor Attacks: Exploring the Temporal Dimension as a New Attack Surface in Pre-Trained Models

This paper introduces Delayed Backdoor Attacks (DBA), a novel threat paradigm that decouples trigger exposure from malicious activation along a temporal dimension, enabling common words to serve as triggers. Its DND prototype remains dormant until activation, then achieves near-perfect attack success rates while evading current defenses.

Zikang Ding, Haomiao Yang, Meng Hao, Wenbo Jiang, Kunlan Xiang, Runmeng Du, Yijing Liu, Ruichen Zhang, Dusit Niyato · 2026-03-13 · 🤖 cs.AI

Learning Transferable Sensor Models via Language-Informed Pretraining

This paper introduces SLIP, an open-source framework that leverages language-informed pretraining with a flexible patch-embedder and cross-attention mechanism to learn transferable sensor representations capable of handling diverse configurations and achieving superior zero-shot performance in classification, captioning, and question answering across 11 datasets.

Yuliang Chen, Arvind Pillai, Yu Yvonne Wu, Tess Z. Griffin, Lisa Marsch, Michael V. Heinz, Nicholas C. Jacobson, Andrew Campbell · 2026-03-13 · 🤖 cs.AI

Normative Common Ground Replication (NormCoRe): Replication-by-Translation for Studying Norms in Multi-agent AI

This paper introduces NormCoRe, a novel methodological framework that systematically translates human subject experiments into multi-agent AI environments to study normative coordination, demonstrating through a distributive justice replication that AI agents' normative judgments differ from human baselines and are sensitive to foundation model and persona choices.

Luca Deck, Simeon Allmendinger, Lucas Müller, Niklas Kühl · 2026-03-13 · 🤖 cs.AI

HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios

This paper introduces HomeSafe-Bench, a comprehensive benchmark for evaluating unsafe action detection in household scenarios using 438 diverse cases, and proposes HD-Guard, a hierarchical dual-brain architecture that effectively balances real-time inference efficiency with deep multimodal reasoning for safety monitoring.

Jiayue Pu, Zhongxiang Sun, Zilu Zhang, Xiao Zhang, Jun Xu · 2026-03-13 · 🤖 cs.AI

LABSHIELD: A Multimodal Benchmark for Safety-Critical Reasoning and Planning in Scientific Laboratories

This paper introduces LABSHIELD, a multimodal benchmark grounded in OSHA and GHS standards that evaluates the safety awareness and reasoning capabilities of large language models in laboratory settings, revealing a significant performance gap in hazard identification and safety-critical planning compared to general-domain tasks.

Qianpu Sun, Xiaowei Chi, Yuhan Rui, Ying Li, Kuangzhi Ge, Jiajun Li, Sirui Han, Shanghang Zhang · 2026-03-13 · 🤖 cs.AI

BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs

This paper introduces BTZSC, a comprehensive benchmark of 22 datasets designed to systematically evaluate and compare the zero-shot text classification capabilities of NLI cross-encoders, embedding models, rerankers, and instruction-tuned LLMs, revealing that modern rerankers currently achieve state-of-the-art performance while embedding models offer the best accuracy-latency trade-off.

Ilias Aarab · 2026-03-13 · 💬 cs.CL
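The embedding-model approach to zero-shot classification, which the summary notes offers the best accuracy-latency trade-off, amounts to embedding the input text and each candidate label into a shared vector space and picking the label with the highest cosine similarity. The toy 3-d vectors below stand in for a real sentence-embedding model; only the mechanism is illustrated:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_classify(text_vec, label_vecs):
    """Assign the label whose embedding is most similar to the text."""
    sims = {label: cosine(text_vec, vec) for label, vec in label_vecs.items()}
    return max(sims, key=sims.get), sims

# Toy embeddings standing in for a real model's label/text encodings.
label_vecs = {
    "sports":  np.array([1.0, 0.1, 0.0]),
    "finance": np.array([0.0, 1.0, 0.2]),
}
text_vec = np.array([0.9, 0.2, 0.1])   # e.g. embed("The striker scored twice")

label, sims = zero_shot_classify(text_vec, label_vecs)
print(label)
```

Because label embeddings can be precomputed, classification at inference time costs one text embedding plus a handful of dot products, which explains the favorable latency profile relative to cross-encoders and rerankers that must score every text-label pair jointly.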