Optimizing Language Models for Crosslingual Knowledge Consistency
This paper introduces Direct Consistency Optimization (DCO), a reinforcement-learning-inspired method that improves crosslingual knowledge consistency in large language models. DCO derives a structured reward function directly from the model itself, eliminating the need for an explicit reward model, and outperforms existing approaches.