Beyond the Prompt in Large Language Models: Comprehension, In-Context Learning, and Chain-of-Thought

This paper provides a theoretical framework explaining how Large Language Models achieve semantic prompt comprehension, In-Context Learning, and Chain-of-Thought reasoning: respectively, by inferring transition probabilities, reducing prompt ambiguity, and decomposing complex tasks into simpler sub-problems. The framework offers novel insights into the statistical superiority of advanced prompt engineering techniques.

Yuling Jiao, Yanming Lai, Huazhen Lin, Wensen Ma, Houduo Qi, Defeng Sun · Thu, 12 Ma · cs.CL

Leveraging Wikidata for Geographically Informed Sociocultural Bias Dataset Creation: Application to Latin America

This paper introduces LatamQA, a geographically informed sociocultural bias dataset of over 26,000 multilingual multiple-choice questions derived from Wikidata and Wikipedia, which reveals that current large language models exhibit significant performance disparities across Latin American countries, favoring Iberian Spanish culture and their original training languages.

Yannis Karmim (ALMAnaCH), Renato Pino (UCHILE), Hernan Contreras (UCHILE), Hernan Lira (CENIA), Sebastian Cifuentes (CENIA), Simon Escoffier (PUC), Luis Martí (UP4, ALPAGE), Djamé Seddah (UP4, ALPAGE), Valentin Barrière (UCHILE, CENIA) · Thu, 12 Ma · cs.CL

SpreadsheetArena: Decomposing Preference in LLM Generation of Spreadsheet Workbooks

This paper introduces SpreadsheetArena, a platform for evaluating large language models' end-to-end spreadsheet generation capabilities through blind pairwise comparisons, revealing that while models can produce functional workbooks, they often fail to align with domain-specific best practices and that user preferences vary significantly across different use cases.

Srivatsa Kundurthy, Clara Na, Michael Handley, Zach Kirshner, Chen Bo Calvin Zhang, Manasi Sharma, Emma Strubell, John Ling · Thu, 12 Ma · cs.CL

Fine-Tune, Don't Prompt, Your Language Model to Identify Biased Language in Clinical Notes

This study demonstrates that fine-tuning specialized language models on domain-specific clinical data significantly outperforms prompting-based approaches for detecting biased language, highlighting the critical need for specialty-specific adaptation to accurately capture context-dependent semantic shifts in medical documentation.

Isotta Landi, Eugenia Alleva, Nicole Bussola, Rebecca M. Cohen, Sarah Nowlin, Leslee J. Shaw, Alexander W. Charney, Kimberly B. Glazer · Thu, 12 Ma · cs.CL

SENS-ASR: Semantic Embedding injection in Neural-transducer for Streaming Automatic Speech Recognition

The paper introduces SENS-ASR, a streaming automatic speech recognition approach that improves transcription quality under low-latency constraints by injecting semantic information extracted from past frame embeddings via a context module trained through knowledge distillation from a fine-tuned language model.

Youness Dkhissi (LIUM), Valentin Vielzeuf (LIUM), Elys Allesiardo (LIUM), Anthony Larcher (LIUM) · Thu, 12 Ma · cs.CL

Adaptive Engram Memory System for Indonesian Language Model: Generative AI Based on TOBA LM for Batak and Minang Language

This study introduces TOBA-LM, a 1.2-billion-parameter trilingual language model for Indonesian, Batak, and Minangkabau that integrates an adaptive Engram Memory mechanism to achieve significantly faster training convergence and reduced computational costs compared to conventional transformer architectures.

Hokky Situngkir, Kevin Siringoringo, Andhika Bernard Lumbantobing · Thu, 12 Ma · cs.CL

GATech at AbjadGenEval Shared Task: Multilingual Embeddings for Arabic Machine-Generated Text Classification

The GATech team's approach to the AbjadGenEval shared task used a fine-tuned multilingual E5-large encoder with simple mean pooling to achieve an F1 score of 0.75 for detecting AI-generated Arabic text, finding that this stable baseline outperformed more complex pooling strategies, likely due to data limitations and a distinct length difference between human-written and machine-generated texts.

Ahmed Khaled Khamis · Thu, 12 Ma · cs.CL

Personalized Group Relative Policy Optimization for Heterogeneous Preference Alignment

This paper introduces Personalized Group Relative Policy Optimization (P-GRPO), a novel framework that improves alignment with diverse individual preferences by decoupling advantage estimation from batch statistics and normalizing rewards against preference-group-specific histories, thereby overcoming the limitations of standard GRPO in handling heterogeneous user signals.

Jialu Wang, Heinrich Peters, Asad A. Butt, Navid Hashemi, Alireza Hashemi, Pouya M. Ghari, Joseph Hoover, James Rae, Morteza Dehghani · Thu, 12 Ma · cs.LG

Measuring and Eliminating Refusals in Military Large Language Models

This paper introduces a novel gold-standard dataset developed by US military veterans to quantify excessive safety refusals in military Large Language Models, demonstrating that while specialized fine-tuning can significantly reduce these refusals, achieving zero refusals and maximum accuracy requires deeper, end-to-end specialization.

Jack FitzGerald, Dylan Bates, Aristotelis Lazaridis, Aman Sharma, Vincent Lu, Brian King, Yousif Azami, Sean Bailey, Jeremy Cao, Peter Damianov, Kevin de Haan, Joseph Madigan, Jeremy McLaurin, Luke Kerbs, Jonathan Tainer, Dave Anderson, Jonathan Beck, Jamie Cuticello, Colton Malkerson, Tyler Saltsman · Thu, 12 Ma · cs.CL

A Principle-Driven Adaptive Policy for Group Cognitive Stimulation Dialogue for Elderly with Cognitive Impairment

This paper presents GCSD, a principle-driven adaptive policy system that leverages a large-scale real-world and simulated dataset, along with four specialized modules, to overcome the limitations of existing LLMs in delivering scalable, personalized, and therapeutically effective group cognitive stimulation dialogues for elderly people with cognitive impairment.

Jiyue Jiang, Yanyu Chen, Pengan Chen, Kai Liu, Jingqi Zhou, Zheyong Zhu, He Hu, Fei Ma, Qi Tian, Chuan Wu · Thu, 12 Ma · cs.CL

TriageSim: A Conversational Emergency Triage Simulation Framework from Structured Electronic Health Records

The paper introduces TriageSim, a framework that generates controlled, multi-turn synthetic nurse-patient triage conversations and audio from structured electronic health records, which are validated for linguistic and medical fidelity and demonstrated to support conversational triage classification.

Dipankar Srirag, Quoc Dung Nguyen, Aditya Joshi, Padmanesan Narasimhan, Salil Kanhere · Thu, 12 Ma · cs.CL