Tracking Cancer Through Text: Longitudinal Extraction From Radiology Reports Using Open-Source Large Language Models

This paper presents a fully open-source, locally deployable pipeline built on the Qwen2.5-72B model that extracts longitudinal tumor burden data from radiology reports and links it across reports in accordance with RECIST criteria. It demonstrates that privacy-preserving open-source large language models can achieve clinically meaningful performance in oncology.

Luc Builtjes, Alessa Hering | Wed, 11 Ma | cs.CL

Modelling the Diachronic Emergence of Phoneme Frequency Distributions

This paper demonstrates that key statistical regularities in phoneme frequency distributions, such as exponential-tailed frequency patterns and the inverse relationship between inventory size and relative entropy, can emerge naturally from a stochastic model of diachronic sound change that incorporates functional load and a stabilizing preference for inventory size. No explicit optimization mechanism is required to produce them.

Fermín Moscoso del Prado Martín, Suchir Salhan | Wed, 11 Ma | cs.CL

Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs

This paper systematically diagnoses the performance gap between text and image inputs in multimodal LLMs and finds that rendering text as pixels primarily amplifies reading errors rather than reasoning failures. To close this gap, it proposes a self-distillation method that trains models on their own text-based reasoning traces paired with the corresponding image inputs.

Kaiser Sun, Xiaochuang Yuan, Hongjun Liu, Chen Zhao, Cheng Zhang, Mark Dredze, Fan Bai | Wed, 11 Ma | cs.CL

Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning

This paper proposes a confidence-aware self-consistency framework that uses features from a single reasoning trajectory to decide adaptively between single-path and multi-path reasoning. It achieves accuracy comparable to multi-path baselines while reducing token usage by up to 80%, without any additional fine-tuning.

Juming Xiong, Kevin Guo, Congning Ni, Chao Yan, Katherine Brown, Avinash Baidya, Xiang Gao, Bradley Marlin, Zhijun Yin | Wed, 11 Ma | cs.CL

Automated Thematic Analysis for Clinical Qualitative Data: Iterative Codebook Refinement with Full Provenance

This paper presents an automated thematic analysis framework for qualitative clinical data that combines iterative codebook refinement with full provenance tracking. Compared with existing baselines, it significantly improves the scalability, reproducibility, and expert alignment of the analysis.

Seungjun Yi, Joakim Nguyen, Huimin Xu, Terence Lim, Joseph Skrovan, Mehak Beri, Hitakshi Modi, Andrew Well, Carlos M. Mery, Yan Zhang, Mia K. Markey, Ying Ding | Wed, 11 Ma | cs.CL

SciTaRC: Benchmarking QA on Scientific Tabular Data that Requires Language Reasoning and Complex Computation

The paper introduces SciTaRC, an expert-authored benchmark of questions over scientific tables that require both deep language reasoning and complex computation. Current state-of-the-art AI models struggle significantly on it because of a universal "execution bottleneck": they often devise a correct strategy but fail to execute the plan faithfully.

Hexuan Wang, Yaxuan Ren, Srikar Bommireddypalli, Shuxian Chen, Adarsh Prabhudesai, Rongkun Zhou, Elina Baral, Philipp Koehn | Wed, 11 Ma | cs.CL

MultiGraSCCo: A Multilingual Anonymization Benchmark with Annotations of Personal Identifiers

The paper introduces MultiGraSCCo, a multilingual anonymization benchmark containing over 2,500 annotated personal identifiers across ten languages. It was built by applying culturally adapted machine translation to synthetic data, which supports the development and evaluation of anonymization systems while avoiding the privacy restrictions that apply to real patient data.

Ibrahim Baroud, Christoph Otto, Vera Czehmann, Christine Hovhannisyan, Lisa Raithel, Sebastian Möller, Roland Roller | Wed, 11 Ma | cs.CL