CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework

The paper introduces CARE, an evidence-grounded agentic framework that enhances clinical accountability and reasoning accuracy in multi-modal medical AI by decomposing tasks into specialized modules for entity proposal, pixel-level localization, and evidence-based reasoning, thereby outperforming state-of-the-art models on medical VQA benchmarks.

Yuexi Du, Jinglu Wang, Shujie Liu, Nicha C. Dvornek, Yan Lu · 2026-03-12 · 🤖 cs.AI
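Since CARE's modules are only named at this level of detail, the toy Python sketch below shows the general shape of such a decomposition into entity proposal, localization, and evidence-grounded reasoning; every function body here is an illustrative stand-in, not the paper's implementation.

```python
# Toy sketch of a CARE-style modular pipeline (all logic is a stand-in).
from dataclasses import dataclass

@dataclass
class Evidence:
    entity: str
    bbox: tuple   # pixel-level localization result (illustrative)
    finding: str

def propose_entities(question: str) -> list[str]:
    # Stand-in for a specialized entity-proposal module.
    vocab = ["pneumothorax", "effusion", "cardiomegaly"]
    return [e for e in vocab if e in question.lower()] or vocab[:1]

def localize(image, entity: str) -> tuple:
    # Stand-in for pixel-level localization (e.g., a grounding model).
    return (0, 0, 64, 64)  # placeholder bounding box

def reason(question: str, evidence: list[Evidence]) -> str:
    # The answer cites its evidence, which is the accountability hook.
    cited = "; ".join(f"{e.entity}@{e.bbox}" for e in evidence)
    return f"Answer grounded in: {cited}"

def care_pipeline(image, question: str) -> str:
    evidence = [Evidence(e, localize(image, e), "present")
                for e in propose_entities(question)]
    return reason(question, evidence)

print(care_pipeline(image=None, question="Is there a pleural effusion?"))
```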

One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis

This paper presents the first comprehensive evaluation of parameter-efficient fine-tuning (PEFT) for multitask code analysis, demonstrating that a single shared PEFT module can match or surpass full fine-tuning performance while significantly reducing computational and storage costs, provided that tasks are strategically grouped based on factors like complementarity and stability.

Amal Akli, Maxime Cordy, Mike Papadakis, Yves Le Traon · 2026-03-12 · 💻 cs
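A minimal sketch of the shared-adapter pattern using the Hugging Face peft library; the base model (gpt2), target modules, and task tags are placeholders chosen for the sketch, not the paper's setup.

```python
# Shared-adapter sketch with the Hugging Face peft library. "gpt2" and the
# task tags are placeholders; the paper targets code models and code tasks.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["c_attn"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # the adapter is a tiny fraction of the base

# One adapter is trained on a mixed, task-tagged stream instead of one adapter
# per task; strategic grouping decides which tasks share this stream.
batch = ["<defect> def f(x): return x / 0",
         "<summarize> def add(a, b): return a + b"]
inputs = tokenizer(batch, return_tensors="pt", padding=True)
```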

TAMUSA-Chat: A Domain-Adapted Large Language Model Conversational System for Research and Responsible Deployment

This paper introduces TAMUSA-Chat, a research-oriented framework that enables academic institutions to build responsible, domain-adapted conversational AI systems through supervised fine-tuning, retrieval-augmented generation, and systematic evaluation, while providing a publicly available codebase to support reproducible experimentation and ethical deployment.

Izzat Alsmadi, Anas Alsobeh · 2026-03-12 · 💬 cs.CL
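A self-contained sketch of the retrieval-augmented step, assuming a placeholder `embed` function and a hypothetical prompt template; a real deployment would swap in a proper sentence encoder and the fine-tuned model.

```python
# Minimal RAG sketch; `embed`, the documents, and the prompt template are
# assumptions for illustration, not TAMUSA-Chat's actual components.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call a sentence encoder here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

docs = ["The add/drop deadline is the 12th class day.",
        "Graduate theses must use the official template."]
index = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 1) -> list[str]:
    scores = index @ embed(query)  # cosine similarity on unit vectors
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\nQ: {query}\nA:")

print(build_prompt("When is the add/drop deadline?"))
```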

There Are No Silly Questions: Evaluation of Offline LLM Capabilities from a Turkish Perspective

This study evaluates the robustness and pedagogical safety of offline large language models for Turkish heritage language education using a custom anomaly suite, finding that reasoning-oriented models in the 8B–14B parameter range offer the optimal balance between cost and safety while demonstrating that anomaly resistance is not strictly dependent on model scale.

Edibe Yilmaz, Kahraman Kostas · 2026-03-12 · 💬 cs.CL

Beyond the Prompt in Large Language Models: Comprehension, In-Context Learning, and Chain-of-Thought

This paper provides a theoretical framework explaining how Large Language Models achieve semantic prompt comprehension, In-Context Learning, and Chain-of-Thought reasoning by inferring transition probabilities, reducing prompt ambiguity, and decomposing complex tasks into simpler sub-problems, respectively, thereby offering novel insights into the statistical superiority of advanced prompt engineering techniques.

Yuling Jiao, Yanming Lai, Huazhen Lin, Wensen Ma, Houduo Qi, Defeng Sun · 2026-03-12 · 💬 cs.CL
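One common way to formalize the chain-of-thought claim, in notation chosen here rather than taken from the paper: conditioning on an intermediate reasoning trace splits a hard prediction into simpler sub-problems.

```latex
% Chain-of-thought as marginalization over intermediate reasoning z:
% sampling z decomposes the hard task P(y | x) into the simpler
% sub-problems P(z | x) and P(y | x, z).
\[
  P(y \mid x) \;=\; \sum_{z} P(y \mid x, z)\, P(z \mid x)
\]
```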

Leveraging Wikidata for Geographically Informed Sociocultural Bias Dataset Creation: Application to Latin America

This paper introduces LatamQA, a geographically informed sociocultural bias dataset of over 26,000 multilingual multiple-choice questions derived from Wikidata and Wikipedia, which reveals that current large language models exhibit significant performance disparities across Latin American countries, favoring Iberian Spanish culture and the models' original training languages.

Yannis Karmim (ALMAnaCH), Renato Pino (UCHILE), Hernan Contreras (UCHILE), Hernan Lira (CENIA), Sebastian Cifuentes (CENIA), Simon Escoffier (PUC), Luis Martí (UP4, ALPAGE), Djamé Seddah (UP4, ALPAGE), Valentin Barrière (UCHILE, CENIA) · 2026-03-12 · 💬 cs.CL
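A small example of the kind of Wikidata query such a pipeline can start from, here pulling Chilean public figures from the live SPARQL endpoint; the query shape is an assumption for illustration, not LatamQA's actual extraction code.

```python
# Query Wikidata's public SPARQL endpoint for country-specific seed entities.
import requests

SPARQL = """
SELECT ?person ?personLabel WHERE {
  ?person wdt:P31 wd:Q5 ;      # instance of: human
          wdt:P27 wd:Q298 .    # country of citizenship: Chile
  SERVICE wikibase:label { bd:serviceParam wikibase:language "es,en". }
}
LIMIT 10
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": SPARQL, "format": "json"},
    headers={"User-Agent": "latamqa-sketch/0.1 (research demo)"},
    timeout=30,
)
for row in resp.json()["results"]["bindings"]:
    print(row["personLabel"]["value"])
```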

SpreadsheetArena: Decomposing Preference in LLM Generation of Spreadsheet Workbooks

This paper introduces SpreadsheetArena, a platform for evaluating large language models' end-to-end spreadsheet generation capabilities through blind pairwise comparisons, revealing that while models can produce functional workbooks, they often fail to align with domain-specific best practices and that user preferences vary significantly across different use cases.

Srivatsa Kundurthy, Clara Na, Michael Handley, Zach Kirshner, Chen Bo Calvin Zhang, Manasi Sharma, Emma Strubell, John Ling · 2026-03-12 · 💬 cs.CL
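The summary does not say how pairwise votes are aggregated into a ranking; a plain Elo update, sketched below, is one common choice for blind pairwise arenas and stands in for whatever scheme the platform actually uses.

```python
# Aggregating blind pairwise votes with a standard Elo update (one common
# option; not necessarily SpreadsheetArena's actual rating scheme).
def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    expected_w = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    r_winner += k * (1.0 - expected_w)   # winner gains more for an upset
    r_loser -= k * (1.0 - expected_w)
    return r_winner, r_loser

ratings = {"model_a": 1000.0, "model_b": 1000.0}
votes = [("model_a", "model_b"), ("model_a", "model_b"), ("model_b", "model_a")]
for winner, loser in votes:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser])
print(ratings)
```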

GATech at AbjadGenEval Shared Task: Multilingual Embeddings for Arabic Machine-Generated Text Classification

The GATech team's approach to the AbjadGenEval shared task used a fine-tuned multilingual E5-large encoder with simple mean pooling, reaching an F1 score of 0.75 for detecting AI-generated Arabic text; this stable baseline outperformed more complex pooling strategies, likely because of data limitations and a marked length difference between human-written and machine-generated texts.

Ahmed Khaled Khamis · 2026-03-12 · 💬 cs.CL
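The pooling step the summary names is straightforward to reproduce; the sketch below shows masked mean pooling over intfloat/multilingual-e5-large, with an illustrative linear head standing in for the team's fine-tuned classifier.

```python
# Masked mean pooling over a multilingual E5-large encoder; the linear head
# and the "query: " prefix choice are illustrative, not the team's exact code.
import torch
from transformers import AutoModel, AutoTokenizer

name = "intfloat/multilingual-e5-large"
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

def mean_pooled(texts: list[str]) -> torch.Tensor:
    # E5 models expect a "query: "/"passage: " prefix on inputs.
    batch = tokenizer([f"query: {t}" for t in texts],
                      padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state      # (B, L, H)
    mask = batch["attention_mask"].unsqueeze(-1)         # (B, L, 1)
    return (hidden * mask).sum(1) / mask.sum(1)          # masked mean

emb = mean_pooled(["نص تجريبي قصير"])            # a short Arabic test text
logits = torch.nn.Linear(emb.shape[-1], 2)(emb)  # human vs. machine head
print(logits.shape)
```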

Personalized Group Relative Policy Optimization for Heterogeneous Preference Alignment

This paper introduces Personalized Group Relative Policy Optimization (P-GRPO), a novel framework that improves alignment with diverse individual preferences by decoupling advantage estimation from batch statistics and normalizing rewards against preference-group-specific histories, thereby overcoming the limitations of standard GRPO in handling heterogeneous user signals.

Jialu Wang, Heinrich Peters, Asad A. Butt, Navid Hashemi, Alireza Hashemi, Pouya M. Ghari, Joseph Hoover, James Rae, Morteza Dehghani · 2026-03-12 · 🤖 cs.LG
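A sketch of the history-based normalization idea: each reward is standardized against a running window for its own preference group rather than against current-batch statistics. Class and parameter names are illustrative; the paper's estimator may differ in detail.

```python
# Per-group running-history reward normalization (illustrative sketch of the
# P-GRPO idea; window size and the cold-start rule are assumptions).
from collections import defaultdict, deque
import statistics

class GroupHistoryNormalizer:
    def __init__(self, window: int = 512):
        self.histories = defaultdict(lambda: deque(maxlen=window))

    def advantage(self, group_id: str, reward: float, eps: float = 1e-6) -> float:
        hist = self.histories[group_id]
        hist.append(reward)
        if len(hist) < 2:
            return 0.0  # not enough group history yet
        mu = statistics.fmean(hist)
        sigma = statistics.pstdev(hist)
        return (reward - mu) / (sigma + eps)

norm = GroupHistoryNormalizer()
for r in [0.2, 0.9, 0.4]:
    print(norm.advantage("terse_users", r))
```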

Defining AI Models and AI Systems: A Framework to Resolve the Boundary Problem

This paper addresses the regulatory ambiguity surrounding "AI models" and "AI systems" by proposing clear conceptual and operational definitions that distinguish trained parameters from broader system components, thereby facilitating the precise allocation of obligations across the AI value chain.

Yuanyuan Sun, Timothy Parker, Lara Gierschmann, Sana Shams, Teo Canmetin, Mathieu Duteil, Rokas Gipiškis, Ze Shen Chin · 2026-03-12 · 🤖 cs.AI

LWM-Temporal: Sparse Spatio-Temporal Attention for Wireless Channel Representation Learning

LWM-Temporal is a task-agnostic foundation model for wireless channels that leverages a novel Sparse Spatio-Temporal Attention mechanism and physics-informed pretraining to learn universal, geometry-consistent embeddings, achieving superior performance in channel prediction across diverse mobility regimes with limited fine-tuning data.

Sadjad Alikhani, Akshay Malhotra, Shahab Hamidi-Rad, Ahmed Alkhateeb · 2026-03-12 · 🤖 cs.LG
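The exact sparsity pattern is not given in the summary; the sketch below shows one common axial-style factorization, letting each token attend within its own time step or along its own spatial index across time, as a plausible stand-in.

```python
# Illustrative sparse spatio-temporal attention mask (the paper's actual
# pattern may differ): attend within a time step OR along one spatial index.
import torch
import torch.nn.functional as F

S, T, H = 4, 6, 32             # spatial positions, time steps, head dim
N = S * T
t_idx = torch.arange(N) // S   # time index of each token
s_idx = torch.arange(N) % S    # spatial index of each token
mask = (t_idx[:, None] == t_idx[None, :]) | (s_idx[:, None] == s_idx[None, :])

q = k = v = torch.randn(1, 1, N, H)  # (batch, heads, tokens, dim)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape, f"{mask.float().mean().item():.2f} of pairs attended")
```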

HTM-EAR: Importance-Preserving Tiered Memory with Hybrid Routing under Saturation

HTM-EAR is a hierarchical tiered memory system that combines HNSW-based working memory with archival storage, importance-aware eviction, and hybrid routing to effectively preserve essential information and maintain high retrieval precision under sustained saturation, significantly outperforming traditional LRU approaches while approaching the performance of unbounded oracle memory.

Shubham Kumar Singh · 2026-03-12 · 🤖 cs.AI
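A toy contrast with LRU: under saturation, the lowest-importance item is demoted to the archive tier rather than the least-recently-used one. The data structures here are stand-ins; HTM-EAR's working tier is HNSW-based and its routing is more involved.

```python
# Toy importance-aware eviction across two memory tiers (illustrative only).
class TieredMemory:
    def __init__(self, capacity: int = 3):
        self.capacity = capacity
        self.working = {}  # key -> (importance, value), bounded tier
        self.archive = {}  # overflow tier for demoted items

    def add(self, key, value, importance: float):
        if len(self.working) >= self.capacity:
            # Evict by lowest importance, not by recency (contrast with LRU).
            victim = min(self.working, key=lambda k: self.working[k][0])
            self.archive[victim] = self.working.pop(victim)
        self.working[key] = (importance, value)

    def get(self, key):
        # Hybrid routing: check the working tier first, then the archive.
        return self.working.get(key) or self.archive.get(key)

mem = TieredMemory()
for k, imp in [("a", 0.9), ("b", 0.1), ("c", 0.5), ("d", 0.7)]:
    mem.add(k, f"note-{k}", imp)
print(sorted(mem.working), sorted(mem.archive))  # 'b' demoted, not oldest 'a'
```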