CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework

The paper introduces CARE, an evidence-grounded agentic framework that enhances clinical accountability and reasoning accuracy in multi-modal medical AI by decomposing tasks into specialized modules for entity proposal, pixel-level localization, and evidence-based reasoning, thereby outperforming state-of-the-art models on medical VQA benchmarks.

Yuexi Du, Jinglu Wang, Shujie Liu, Nicha C. Dvornek, Yan Lu · 2026-03-12 · 🤖 cs.AI
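Since CARE's modules are only named at this level of detail, the toy Python sketch below shows the general shape of such a decomposition into entity proposal, localization, and evidence-grounded reasoning; every function body here is an illustrative stand-in, not the paper's implementation.

```python
# Toy sketch of a CARE-style modular pipeline (all logic is a stand-in).
from dataclasses import dataclass

@dataclass
class Evidence:
    entity: str
    bbox: tuple   # pixel-level localization result (illustrative)
    finding: str

def propose_entities(question: str) -> list[str]:
    # Stand-in for a specialized entity-proposal module.
    vocab = ["pneumothorax", "effusion", "cardiomegaly"]
    return [e for e in vocab if e in question.lower()] or vocab[:1]

def localize(image, entity: str) -> tuple:
    # Stand-in for pixel-level localization (e.g., a grounding model).
    return (0, 0, 64, 64)  # placeholder bounding box

def reason(question: str, evidence: list[Evidence]) -> str:
    # The answer cites its evidence, which is the accountability hook.
    cited = "; ".join(f"{e.entity}@{e.bbox}" for e in evidence)
    return f"Answer grounded in: {cited}"

def care_pipeline(image, question: str) -> str:
    evidence = [Evidence(e, localize(image, e), "present")
                for e in propose_entities(question)]
    return reason(question, evidence)

print(care_pipeline(image=None, question="Is there a pleural effusion?"))
```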

One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis

This paper presents the first comprehensive evaluation of parameter-efficient fine-tuning (PEFT) for multitask code analysis, demonstrating that a single shared PEFT module can match or surpass full fine-tuning performance while significantly reducing computational and storage costs, provided that tasks are strategically grouped based on factors like complementarity and stability.

Amal Akli, Maxime Cordy, Mike Papadakis, Yves Le Traon · 2026-03-12 · 💻 cs
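A minimal sketch of the shared-adapter pattern using the Hugging Face peft library; the base model (gpt2), target modules, and task tags are placeholders chosen for the sketch, not the paper's setup.

```python
# Shared-adapter sketch with the Hugging Face peft library. "gpt2" and the
# task tags are placeholders; the paper targets code models and code tasks.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["c_attn"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # the adapter is a tiny fraction of the base

# One adapter is trained on a mixed, task-tagged stream instead of one adapter
# per task; strategic grouping decides which tasks share this stream.
batch = ["<defect> def f(x): return x / 0",
         "<summarize> def add(a, b): return a + b"]
inputs = tokenizer(batch, return_tensors="pt", padding=True)
```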

TAMUSA-Chat: A Domain-Adapted Large Language Model Conversational System for Research and Responsible Deployment

This paper introduces TAMUSA-Chat, a research-oriented framework that enables academic institutions to build responsible, domain-adapted conversational AI systems through supervised fine-tuning, retrieval-augmented generation, and systematic evaluation, while providing a publicly available codebase to support reproducible experimentation and ethical deployment.

Izzat Alsmadi, Anas Alsobeh · 2026-03-12 · 💬 cs.CL
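A self-contained sketch of the retrieval-augmented step, assuming a placeholder `embed` function and a hypothetical prompt template; a real deployment would swap in a proper sentence encoder and the fine-tuned model.

```python
# Minimal RAG sketch; `embed`, the documents, and the prompt template are
# assumptions for illustration, not TAMUSA-Chat's actual components.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call a sentence encoder here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

docs = ["The add/drop deadline is the 12th class day.",
        "Graduate theses must use the official template."]
index = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 1) -> list[str]:
    scores = index @ embed(query)  # cosine similarity on unit vectors
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\nQ: {query}\nA:")

print(build_prompt("When is the add/drop deadline?"))
```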

There Are No Silly Questions: Evaluation of Offline LLM Capabilities from a Turkish Perspective

This study evaluates the robustness and pedagogical safety of offline large language models for Turkish heritage language education using a custom anomaly suite, finding that reasoning-oriented models in the 8B–14B parameter range offer the optimal balance between cost and safety while demonstrating that anomaly resistance is not strictly dependent on model scale.

Edibe Yilmaz, Kahraman Kostas · 2026-03-12 · 💬 cs.CL

Beyond the Prompt in Large Language Models: Comprehension, In-Context Learning, and Chain-of-Thought

This paper provides a theoretical framework explaining how Large Language Models achieve semantic prompt comprehension, In-Context Learning, and Chain-of-Thought reasoning by inferring transition probabilities, reducing prompt ambiguity, and decomposing complex tasks into simpler sub-problems, respectively, thereby offering novel insights into the statistical superiority of advanced prompt engineering techniques.

Yuling Jiao, Yanming Lai, Huazhen Lin, Wensen Ma, Houduo Qi, Defeng Sun · 2026-03-12 · 💬 cs.CL
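One common way to formalize the chain-of-thought claim, in notation chosen here rather than taken from the paper: conditioning on an intermediate reasoning trace splits a hard prediction into simpler sub-problems.

```latex
% Chain-of-thought as marginalization over intermediate reasoning z:
% sampling z decomposes the hard task P(y | x) into the simpler
% sub-problems P(z | x) and P(y | x, z).
\[
  P(y \mid x) \;=\; \sum_{z} P(y \mid x, z)\, P(z \mid x)
\]
```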

Leveraging Wikidata for Geographically Informed Sociocultural Bias Dataset Creation: Application to Latin America

This paper introduces LatamQA, a geographically informed sociocultural bias dataset of over 26,000 multilingual multiple-choice questions derived from Wikidata and Wikipedia, which reveals that current large language models exhibit significant performance disparities across Latin American countries, favoring Iberian Spanish culture and the models' original training languages.

Yannis Karmim (ALMAnaCH), Renato Pino (UCHILE), Hernan Contreras (UCHILE), Hernan Lira (CENIA), Sebastian Cifuentes (CENIA), Simon Escoffier (PUC), Luis Martí (UP4, ALPAGE), Djamé Seddah (UP4, ALPAGE), Valentin Barrière (UCHILE, CENIA) · 2026-03-12 · 💬 cs.CL
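A small example of the kind of Wikidata query such a pipeline can start from, here pulling Chilean public figures from the live SPARQL endpoint; the query shape is an assumption for illustration, not LatamQA's actual extraction code.

```python
# Query Wikidata's public SPARQL endpoint for country-specific seed entities.
import requests

SPARQL = """
SELECT ?person ?personLabel WHERE {
  ?person wdt:P31 wd:Q5 ;      # instance of: human
          wdt:P27 wd:Q298 .    # country of citizenship: Chile
  SERVICE wikibase:label { bd:serviceParam wikibase:language "es,en". }
}
LIMIT 10
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": SPARQL, "format": "json"},
    headers={"User-Agent": "latamqa-sketch/0.1 (research demo)"},
    timeout=30,
)
for row in resp.json()["results"]["bindings"]:
    print(row["personLabel"]["value"])
```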

SpreadsheetArena: Decomposing Preference in LLM Generation of Spreadsheet Workbooks

This paper introduces SpreadsheetArena, a platform for evaluating large language models' end-to-end spreadsheet generation capabilities through blind pairwise comparisons, revealing that while models can produce functional workbooks, they often fail to align with domain-specific best practices and that user preferences vary significantly across different use cases.

Srivatsa Kundurthy, Clara Na, Michael Handley, Zach Kirshner, Chen Bo Calvin Zhang, Manasi Sharma, Emma Strubell, John Ling · 2026-03-12 · 💬 cs.CL
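The summary does not say how pairwise votes are aggregated into a ranking; a plain Elo update, sketched below, is one common choice for blind pairwise arenas and stands in for whatever scheme the platform actually uses.

```python
# Aggregating blind pairwise votes with a standard Elo update (one common
# option; not necessarily SpreadsheetArena's actual rating scheme).
def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    expected_w = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    r_winner += k * (1.0 - expected_w)   # winner gains more for an upset
    r_loser -= k * (1.0 - expected_w)
    return r_winner, r_loser

ratings = {"model_a": 1000.0, "model_b": 1000.0}
votes = [("model_a", "model_b"), ("model_a", "model_b"), ("model_b", "model_a")]
for winner, loser in votes:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser])
print(ratings)
```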

GATech at AbjadGenEval Shared Task: Multilingual Embeddings for Arabic Machine-Generated Text Classification

The GATech team's approach to the AbjadGenEval shared task used a fine-tuned multilingual E5-large encoder with simple mean pooling, reaching an F1 score of 0.75 for detecting AI-generated Arabic text; this stable baseline outperformed more complex pooling strategies, likely because of data limitations and a marked length difference between human-written and machine-generated texts.

Ahmed Khaled Khamis · 2026-03-12 · 💬 cs.CL
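The pooling step the summary names is straightforward to reproduce; the sketch below shows masked mean pooling over intfloat/multilingual-e5-large, with an illustrative linear head standing in for the team's fine-tuned classifier.

```python
# Masked mean pooling over a multilingual E5-large encoder; the linear head
# and the "query: " prefix choice are illustrative, not the team's exact code.
import torch
from transformers import AutoModel, AutoTokenizer

name = "intfloat/multilingual-e5-large"
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

def mean_pooled(texts: list[str]) -> torch.Tensor:
    # E5 models expect a "query: "/"passage: " prefix on inputs.
    batch = tokenizer([f"query: {t}" for t in texts],
                      padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state      # (B, L, H)
    mask = batch["attention_mask"].unsqueeze(-1)         # (B, L, 1)
    return (hidden * mask).sum(1) / mask.sum(1)          # masked mean

emb = mean_pooled(["نص تجريبي قصير"])            # a short Arabic test text
logits = torch.nn.Linear(emb.shape[-1], 2)(emb)  # human vs. machine head
print(logits.shape)
```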

Personalized Group Relative Policy Optimization for Heterogeneous Preference Alignment

This paper introduces Personalized Group Relative Policy Optimization (P-GRPO), a novel framework that improves alignment with diverse individual preferences by decoupling advantage estimation from batch statistics and normalizing rewards against preference-group-specific histories, thereby overcoming the limitations of standard GRPO in handling heterogeneous user signals.

Jialu Wang, Heinrich Peters, Asad A. Butt, Navid Hashemi, Alireza Hashemi, Pouya M. Ghari, Joseph Hoover, James Rae, Morteza Dehghani · 2026-03-12 · 🤖 cs.LG
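A sketch of the history-based normalization idea: each reward is standardized against a running window for its own preference group rather than against current-batch statistics. Class and parameter names are illustrative; the paper's estimator may differ in detail.

```python
# Per-group running-history reward normalization (illustrative sketch of the
# P-GRPO idea; window size and the cold-start rule are assumptions).
from collections import defaultdict, deque
import statistics

class GroupHistoryNormalizer:
    def __init__(self, window: int = 512):
        self.histories = defaultdict(lambda: deque(maxlen=window))

    def advantage(self, group_id: str, reward: float, eps: float = 1e-6) -> float:
        hist = self.histories[group_id]
        hist.append(reward)
        if len(hist) < 2:
            return 0.0  # not enough group history yet
        mu = statistics.fmean(hist)
        sigma = statistics.pstdev(hist)
        return (reward - mu) / (sigma + eps)

norm = GroupHistoryNormalizer()
for r in [0.2, 0.9, 0.4]:
    print(norm.advantage("terse_users", r))
```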

Defining AI Models and AI Systems: A Framework to Resolve the Boundary Problem

This paper addresses the regulatory ambiguity surrounding "AI models" and "AI systems" by proposing clear conceptual and operational definitions that distinguish trained parameters from broader system components, thereby facilitating the precise allocation of obligations across the AI value chain.

Yuanyuan Sun, Timothy Parker, Lara Gierschmann, Sana Shams, Teo Canmetin, Mathieu Duteil, Rokas Gipiškis, Ze Shen Chin · 2026-03-12 · 🤖 cs.AI

LWM-Temporal: Sparse Spatio-Temporal Attention for Wireless Channel Representation Learning

LWM-Temporal is a task-agnostic foundation model for wireless channels that leverages a novel Sparse Spatio-Temporal Attention mechanism and physics-informed pretraining to learn universal, geometry-consistent embeddings, achieving superior performance in channel prediction across diverse mobility regimes with limited fine-tuning data.

Sadjad Alikhani, Akshay Malhotra, Shahab Hamidi-Rad, Ahmed Alkhateeb · 2026-03-12 · 🤖 cs.LG
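The exact sparsity pattern is not given in the summary; the sketch below shows one common axial-style factorization, letting each token attend within its own time step or along its own spatial index across time, as a plausible stand-in.

```python
# Illustrative sparse spatio-temporal attention mask (the paper's actual
# pattern may differ): attend within a time step OR along one spatial index.
import torch
import torch.nn.functional as F

S, T, H = 4, 6, 32             # spatial positions, time steps, head dim
N = S * T
t_idx = torch.arange(N) // S   # time index of each token
s_idx = torch.arange(N) % S    # spatial index of each token
mask = (t_idx[:, None] == t_idx[None, :]) | (s_idx[:, None] == s_idx[None, :])

q = k = v = torch.randn(1, 1, N, H)  # (batch, heads, tokens, dim)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape, f"{mask.float().mean().item():.2f} of pairs attended")
```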

HTM-EAR: Importance-Preserving Tiered Memory with Hybrid Routing under Saturation

HTM-EAR is a hierarchical tiered memory system that combines HNSW-based working memory with archival storage, importance-aware eviction, and hybrid routing to effectively preserve essential information and maintain high retrieval precision under sustained saturation, significantly outperforming traditional LRU approaches while approaching the performance of unbounded oracle memory.

Shubham Kumar Singh · 2026-03-12 · 🤖 cs.AI
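A toy contrast with LRU: under saturation, the lowest-importance item is demoted to the archive tier rather than the least-recently-used one. The data structures here are stand-ins; HTM-EAR's working tier is HNSW-based and its routing is more involved.

```python
# Toy importance-aware eviction across two memory tiers (illustrative only).
class TieredMemory:
    def __init__(self, capacity: int = 3):
        self.capacity = capacity
        self.working = {}  # key -> (importance, value), bounded tier
        self.archive = {}  # overflow tier for demoted items

    def add(self, key, value, importance: float):
        if len(self.working) >= self.capacity:
            # Evict by lowest importance, not by recency (contrast with LRU).
            victim = min(self.working, key=lambda k: self.working[k][0])
            self.archive[victim] = self.working.pop(victim)
        self.working[key] = (importance, value)

    def get(self, key):
        # Hybrid routing: check the working tier first, then the archive.
        return self.working.get(key) or self.archive.get(key)

mem = TieredMemory()
for k, imp in [("a", 0.9), ("b", 0.1), ("c", 0.5), ("d", 0.7)]:
    mem.add(k, f"note-{k}", imp)
print(sorted(mem.working), sorted(mem.archive))  # 'b' demoted, not oldest 'a'
```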