Measuring and Eliminating Refusals in Military Large Language Models

This paper introduces a novel gold-standard dataset developed by US military veterans to quantify excessive safety refusals in military Large Language Models, demonstrating that while specialized fine-tuning can significantly reduce these refusals, achieving zero refusals and maximum accuracy requires deeper, end-to-end specialization.

Jack FitzGerald, Dylan Bates, Aristotelis Lazaridis, Aman Sharma, Vincent Lu, Brian King, Yousif Azami, Sean Bailey, Jeremy Cao, Peter Damianov, Kevin de Haan, Joseph Madigan, Jeremy McLaurin, Luke Kerbs, Jonathan Tainer, Dave Anderson, Jonathan Beck, Jamie Cuticello, Colton Malkerson, Tyler Saltsman | 2026-03-12 | cs.CL

Assessing Cognitive Biases in LLMs for Judicial Decision Support: Virtuous Victim and Halo Effects

This study evaluates five large language models for judicial sentencing support and finds that while they exhibit a stronger virtuous victim effect and lack a significant penalty for adjacent consent compared to humans, they generally demonstrate reduced prestige-based halo effects, particularly regarding credentials, though current variability still limits their immediate deployment in legal settings.

Sierra S. Liu | 2026-03-12 | cs

Defining AI Models and AI Systems: A Framework to Resolve the Boundary Problem

This paper addresses the regulatory ambiguity surrounding "AI models" and "AI systems" by proposing clear conceptual and operational definitions that distinguish trained parameters from broader system components, thereby facilitating the precise allocation of obligations across the AI value chain.

Yuanyuan Sun, Timothy Parker, Lara Gierschmann, Sana Shams, Teo Canmetin, Mathieu Duteil, Rokas Gipiškis, Ze Shen Chin | 2026-03-12 | cs.AI

A Governance and Evaluation Framework for Deterministic, Rule-Based Clinical Decision Support in Empiric Antibiotic Prescribing

This paper proposes a governance and evaluation framework for deterministic, rule-based clinical decision support systems in empiric antibiotic prescribing that prioritizes transparency, auditability, and conservative behavior by formally separating decision logic from scope constraints and utilizing synthetic case validation to ensure behavioral alignment with predefined rules.

Francisco José Gárate, Paloma Chausa, Diego Moreno, Judit López Luque, Vicens Díaz-Brito, Enrique Javier Gómez | 2026-03-12 | cs.AI

Architecture-Aware LLM Inference Optimization on AMD Instinct GPUs: A Comprehensive Benchmark and Deployment Study

This paper presents a comprehensive benchmark of production LLM inference on AMD Instinct MI325X GPUs, demonstrating that architecture-aware optimizations, namely selective use of the AITER runtime and tuned KV cache configurations, are critical for maximizing throughput across diverse model families while maintaining high reliability under heavy concurrency.

Athos Georgiou | 2026-03-12 | cs.AI

HTM-EAR: Importance-Preserving Tiered Memory with Hybrid Routing under Saturation

HTM-EAR is a hierarchical tiered memory system that combines HNSW-based working memory with archival storage, importance-aware eviction, and hybrid routing to effectively preserve essential information and maintain high retrieval precision under sustained saturation, significantly outperforming traditional LRU approaches while approaching the performance of unbounded oracle memory.

Shubham Kumar Singh | 2026-03-12 | cs.AI
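
To make the contrast between LRU and importance-aware eviction in the HTM-EAR summary concrete, here is a minimal sketch. It is not the paper's implementation: the class names `LRUMemory` and `TieredMemory`, the scalar importance score, and the plain dictionary standing in for the archival tier are all assumptions made for illustration, and the HNSW index and hybrid routing are omitted entirely.

```python
from collections import OrderedDict


class LRUMemory:
    """Baseline: evict the least recently inserted/updated item when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def put(self, key, importance):
        self.items[key] = importance
        self.items.move_to_end(key)  # mark as most recently used
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the oldest entry for good


class TieredMemory:
    """Hypothetical two-tier store: bounded working set plus an archive."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.working = {}  # bounded, fast tier
        self.archive = {}  # assumed unbounded, slower tier

    def put(self, key, importance):
        self.working[key] = importance
        if len(self.working) > self.capacity:
            # Importance-aware eviction: demote the least important item
            # to the archive instead of discarding it.
            victim = min(self.working, key=self.working.get)
            self.archive[victim] = self.working.pop(victim)

    def get(self, key):
        # Look in the working tier first, then fall back to the archive.
        if key in self.working:
            return self.working[key]
        return self.archive.get(key)


# One important item arrives first, followed by low-importance noise.
stream = [("critical", 0.9)] + [(f"noise{i}", 0.1) for i in range(5)]

lru = LRUMemory(capacity=3)
tiered = TieredMemory(capacity=3)
for key, importance in stream:
    lru.put(key, importance)
    tiered.put(key, importance)
```

Under this saturating stream the LRU baseline loses the early important item, while the tiered sketch keeps it in the working set and demotes the noise to the archive, where it remains retrievable.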

AMB-DSGDN: Adaptive Modality-Balanced Dynamic Semantic Graph Differential Network for Multimodal Emotion Recognition

The paper proposes AMB-DSGDN, a novel network for multimodal emotion recognition that utilizes modality-specific semantic graphs with a differential attention mechanism to filter noise and an adaptive balancing strategy to prevent dominant modalities from suppressing complementary cues, thereby enhancing the accuracy of dynamic emotional state modeling.

Yunsheng Wang, Yuntao Shou, Yilong Tan, Wei Ai, Tao Meng, Keqin Li | 2026-03-12 | cs.AI

Gated Adaptation for Continual Learning in Human Activity Recognition

This paper proposes a parameter-efficient continual learning framework for Human Activity Recognition that mitigates catastrophic forgetting in domain-incremental scenarios by employing channel-wise gated modulation to adapt frozen pretrained representations through bounded diagonal scaling, thereby achieving superior stability and plasticity with minimal parameter updates.

Reza Rahimi Azghan, Gautham Krishna Gudur, Mohit Malu, Edison Thomaz, Giulia Pedrielli, Pavan Turaga, Hassan Ghasemzadeh | 2026-03-12 | cs.LG
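
As a rough illustration of what "channel-wise gated modulation through bounded diagonal scaling" could look like, the sketch below multiplies each channel of a frozen feature vector by its own bounded scale. The function name `gated_scale`, the form `1 + delta * tanh(g)`, and the `delta` parameter are assumptions chosen for illustration; the summary does not specify the paper's actual gating function.

```python
import math


def gated_scale(features, gate_logits, delta=0.5):
    """Scale each channel of frozen features by a bounded per-channel gate.

    features:    list of rows, each a list of per-channel activations
    gate_logits: one learnable scalar per channel (the only trained values)
    delta:       hypothetical bound; each scale stays in [1 - delta, 1 + delta]
    """
    # A diagonal scaling: channel c is multiplied only by its own scale.
    scales = [1.0 + delta * math.tanh(g) for g in gate_logits]
    return [[x * s for x, s in zip(row, scales)] for row in features]


frozen = [[1.0, 2.0, -4.0]]

# Zero logits leave the pretrained representation unchanged.
unchanged = gated_scale(frozen, [0.0, 0.0, 0.0])

# Extreme logits saturate at the bounds rather than growing without limit.
saturated = gated_scale(frozen, [0.0, 100.0, -100.0])
```

Because only one scalar per channel is trained and the scale is bounded, the adapted features cannot drift arbitrarily far from the frozen ones, which is one plausible way to trade plasticity against forgetting.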

Toward Epistemic Stability: Engineering Consistent Procedures for Industrial LLM Hallucination Reduction

This paper presents and evaluates five prompt engineering strategies for reducing LLM hallucinations in industrial settings without modifying model weights, finding that an Enhanced Data Registry (M4) achieved perfect consistency in initial trials while a revised Decomposed Model-Agnostic Prompting (M2) showed the most significant improvement in subsequent verification.

Brian Freeman, Adam Kicklighter, Matt Erdman, Zach Gordon | 2026-03-12 | cs.AI

Where Do Flow Semantics Reside? A Protocol-Native Tabular Pretraining Paradigm for Encrypted Traffic Classification

This paper addresses the failure of byte-sequence-based masked modeling in encrypted traffic classification by identifying a mismatch in inductive bias and proposing FlowSem-MAE, a protocol-native tabular masked autoencoder that leverages field-specific semantics and temporal patterns to significantly outperform existing methods with substantially less labeled data.

Sizhe Huang, Shujie Yang | 2026-03-12 | cs.AI