Context Over Compute: Human-in-the-Loop Outperforms Iterative Chain-of-Thought Prompting in Interview Answer Quality

This paper demonstrates through controlled experiments that a human-in-the-loop approach significantly outperforms iterative chain-of-thought prompting in improving behavioral interview answer quality, offering superior gains in confidence and authenticity with fewer iterations by prioritizing context availability over computational resources.

Kewen Zhu, Zixi Liu, Yanjing Li · 2026-03-12 · cs.CL

There Are No Silly Questions: Evaluation of Offline LLM Capabilities from a Turkish Perspective

This study evaluates the robustness and pedagogical safety of offline large language models for Turkish heritage language education using a custom anomaly suite, finding that reasoning-oriented models in the 8B–14B parameter range offer the optimal balance between cost and safety while demonstrating that anomaly resistance is not strictly dependent on model scale.

Edibe Yilmaz, Kahraman Kostas · 2026-03-12 · cs.CL

Empathy Is Not What Changed: Clinical Assessment of Psychological Safety Across GPT Model Generations

This study refutes the claim that newer AI models have lost empathy, demonstrating through clinical assessment that while empathetic responses remain statistically consistent across generations, users' perception of "lost empathy" actually stems from a significant shift toward heightened crisis detection and altered safety postures that make the models appear more intrusive during vulnerable moments.

Michael Keeman, Anastasia Keeman · 2026-03-12 · cs.CL

Automated evaluation of LLMs for effective machine translation of Mandarin Chinese to English

This paper presents an automated evaluation framework using semantic and sentiment analysis to assess Mandarin-to-English machine translation across news and literary texts, revealing that while LLMs like GPT-4o and DeepSeek excel in news translation and semantic conservation, they still struggle with preserving cultural subtleties, classical references, and figurative expressions in literary works.

Yue Zhang, Rodney Beard, John Hawkins, Rohitash Chandra · 2026-03-12 · cs.CL
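The semantic-conservation side of an evaluation framework like the one above can be sketched as a similarity score between sentence embeddings. This is a minimal illustration, not the authors' metric: the function names are hypothetical, and the embeddings are assumed to come from some sentence encoder applied to the source, the hypothesis translation, and a reference.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def semantic_conservation(src_embedding, hyp_embedding, ref_embedding):
    """Score a translation hypothesis by how closely its embedding
    tracks both the source text and a reference translation.
    (Illustrative averaging; the actual weighting is an assumption.)"""
    return 0.5 * (cosine_similarity(hyp_embedding, src_embedding)
                  + cosine_similarity(hyp_embedding, ref_embedding))
```

A score near 1.0 would indicate the hypothesis preserves the meaning of both source and reference; literary features such as classical allusions, which the abstract flags as a weakness, tend to depress exactly this kind of score.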

Leveraging Wikidata for Geographically Informed Sociocultural Bias Dataset Creation: Application to Latin America

This paper introduces LatamQA, a geographically informed sociocultural bias dataset of over 26,000 multilingual multiple-choice questions derived from Wikidata and Wikipedia, which reveals that current large language models exhibit significant performance disparities across Latin American countries, favoring Iberian Spanish culture and their original training languages.

Yannis Karmim (ALMAnaCH), Renato Pino (UCHILE), Hernan Contreras (UCHILE), Hernan Lira (CENIA), Sebastian Cifuentes (CENIA), Simon Escoffier (PUC), Luis Martí (UP4, ALPAGE), Djamé Seddah (UP4, ALPAGE), Valentin Barrière (UCHILE, CENIA) · 2026-03-12 · cs.CL
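Deriving multiple-choice questions from Wikidata facts, as LatamQA does, can be sketched as pairing a correct answer with geographically plausible distractors. This is a hypothetical construction, not the paper's pipeline; the function and field names are assumptions.

```python
import random

def build_mcq(entity, correct_country, distractor_countries, seed=0):
    """Turn a Wikidata-style (entity, country) fact into a
    multiple-choice question: the correct country plus up to three
    distractor countries, in shuffled order."""
    rng = random.Random(seed)  # fixed seed for reproducible shuffling
    options = [correct_country] + list(distractor_countries[:3])
    rng.shuffle(options)
    return {
        "question": f"Which country is {entity} associated with?",
        "options": options,
        "answer": options.index(correct_country),
    }
```

Scaled over thousands of entities and several languages, per-country accuracy on such items is what exposes the disparities the abstract reports.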

SpreadsheetArena: Decomposing Preference in LLM Generation of Spreadsheet Workbooks

This paper introduces SpreadsheetArena, a platform for evaluating large language models' end-to-end spreadsheet generation capabilities through blind pairwise comparisons, revealing that while models can produce functional workbooks, they often fail to align with domain-specific best practices and that user preferences vary significantly across different use cases.

Srivatsa Kundurthy, Clara Na, Michael Handley, Zach Kirshner, Chen Bo Calvin Zhang, Manasi Sharma, Emma Strubell, John Ling · 2026-03-12 · cs.CL

SENS-ASR: Semantic Embedding injection in Neural-transducer for Streaming Automatic Speech Recognition

The paper introduces SENS-ASR, a streaming automatic speech recognition approach that improves transcription quality under low-latency constraints by injecting semantic information extracted from past frame-embeddings via a context module trained through knowledge distillation from a fine-tuned language model.

Youness Dkhissi (LIUM), Valentin Vielzeuf (LIUM), Elys Allesiardo (LIUM), Anthony Larcher (LIUM) · 2026-03-12 · cs.CL
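The knowledge-distillation objective for a context module like the one described above can be sketched as a regression loss pulling the module's semantic embeddings toward those of a teacher language model. This is a generic sketch under stated assumptions (a simple MSE target over frame-aligned embedding pairs), not the SENS-ASR training recipe.

```python
def distillation_loss(student_embs, teacher_embs):
    """Mean-squared error between the context module's semantic
    embeddings (student) and the fine-tuned language model's
    embeddings (teacher), averaged over all dimensions."""
    assert len(student_embs) == len(teacher_embs)
    total, count = 0.0, 0
    for s_vec, t_vec in zip(student_embs, teacher_embs):
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return total / count
```

Minimizing such a loss lets the streaming model inject teacher-like semantic context from past frames without running the full language model at inference time, which is how low latency is preserved.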

Personalized Group Relative Policy Optimization for Heterogeneous Preference Alignment

This paper introduces Personalized Group Relative Policy Optimization (P-GRPO), a novel framework that improves alignment with diverse individual preferences by decoupling advantage estimation from batch statistics and normalizing rewards against preference-group-specific histories, thereby overcoming the limitations of standard GRPO in handling heterogeneous user signals.

Jialu Wang, Heinrich Peters, Asad A. Butt, Navid Hashemi, Alireza Hashemi, Pouya M. Ghari, Joseph Hoover, James Rae, Morteza Dehghani · 2026-03-12 · cs.LG
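The core idea of normalizing rewards against preference-group-specific histories, rather than batch statistics, can be sketched as a running normalizer keyed by group. This is a minimal illustration of the decoupling the abstract describes, not the P-GRPO implementation; the class name, window size, and epsilon are assumptions.

```python
from collections import defaultdict, deque

class GroupRewardNormalizer:
    """Compute an advantage for each reward by standardizing it
    against a running history kept separately per preference group,
    so heterogeneous groups do not contaminate each other's baseline."""

    def __init__(self, window=1000):
        # One bounded history of recent rewards per group id.
        self.histories = defaultdict(lambda: deque(maxlen=window))

    def advantage(self, group_id, reward):
        hist = self.histories[group_id]
        hist.append(reward)
        mean = sum(hist) / len(hist)
        var = sum((r - mean) ** 2 for r in hist) / len(hist)
        std = var ** 0.5
        # Epsilon guards the first sample, where std is zero.
        return (reward - mean) / (std + 1e-8)
```

With per-batch normalization, a batch mixing two groups with different reward scales would give one group systematically negative advantages; keying the baseline by group removes that bias.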

Measuring and Eliminating Refusals in Military Large Language Models

This paper introduces a novel gold-standard dataset developed by US military veterans to quantify excessive safety refusals in military Large Language Models, demonstrating that while specialized fine-tuning can significantly reduce these refusals, achieving zero refusals and maximum accuracy requires deeper, end-to-end specialization.

Jack FitzGerald, Dylan Bates, Aristotelis Lazaridis, Aman Sharma, Vincent Lu, Brian King, Yousif Azami, Sean Bailey, Jeremy Cao, Peter Damianov, Kevin de Haan, Joseph Madigan, Jeremy McLaurin, Luke Kerbs, Jonathan Tainer, Dave Anderson, Jonathan Beck, Jamie Cuticello, Colton Malkerson, Tyler Saltsman · 2026-03-12 · cs.CL

Assessing Cognitive Biases in LLMs for Judicial Decision Support: Virtuous Victim and Halo Effects

This study evaluates five large language models for judicial sentencing support and finds that while they exhibit a stronger virtuous victim effect and lack a significant penalty for adjacent consent compared to humans, they generally demonstrate reduced prestige-based halo effects, particularly regarding credentials, though current variability still limits their immediate deployment in legal settings.

Sierra S. Liu · 2026-03-12 · cs

Defining AI Models and AI Systems: A Framework to Resolve the Boundary Problem

This paper addresses the regulatory ambiguity surrounding "AI models" and "AI systems" by proposing clear conceptual and operational definitions that distinguish trained parameters from broader system components, thereby facilitating the precise allocation of obligations across the AI value chain.

Yuanyuan Sun, Timothy Parker, Lara Gierschmann, Sana Shams, Teo Canmetin, Mathieu Duteil, Rokas Gipiškis, Ze Shen Chin · 2026-03-12 · cs.AI

A Governance and Evaluation Framework for Deterministic, Rule-Based Clinical Decision Support in Empiric Antibiotic Prescribing

This paper proposes a governance and evaluation framework for deterministic, rule-based clinical decision support systems in empiric antibiotic prescribing that prioritizes transparency, auditability, and conservative behavior by formally separating decision logic from scope constraints and utilizing synthetic case validation to ensure behavioral alignment with predefined rules.

Francisco José Gárate, Paloma Chausa, Diego Moreno, Judit López Luque, Vicens Díaz-Brito, Enrique Javier Gómez · 2026-03-12 · cs.AI