cs.AI papers | Gist.Science

The Boiling Frog Threshold: Criticality and Blindness in World Model-Based Anomaly Detection Under Gradual Drift

This paper investigates world model-based anomaly detection under gradual observation drift. It reveals a sharp detection threshold that appears universally, although its location depends on the interaction between detector sensitivity, noise floor, and environment-specific dynamics. It also identifies critical failure modes, such as the undetectability of sinusoidal drift and agent collapse before detection occurs.

Zhe Hong | 2026-03-10 | 🤖 cs.LG

R2F: Repurposing Ray Frontiers for LLM-free Object Navigation

The paper proposes R2F, an LLM-free framework for zero-shot open-vocabulary object navigation that repurposes ray frontiers as direction-conditioned semantic hypotheses to achieve competitive performance with real-time execution, eliminating the latency and computational overhead of iterative large-model queries.

Francesco Argenziano, John Mark Alexis Marcelo, Michele Brienza, Abdel Hakim Drid, Emanuele Musumeci, Daniele Nardi, Domenico D. Bloisi, Vincenzo Suriani | 2026-03-10 | 💻 cs

X-AVDT: Audio-Visual Cross-Attention for Robust Deepfake Detection

This paper proposes X-AVDT, a robust deepfake detector that leverages internal audio-visual cross-attention cues, accessed via DDIM inversion, to generalize better across diverse and evolving synthesis paradigms, and introduces a new dataset, MMDF.

Youngseo Kim, Kwan Yun, Seokhyeon Hong, Sihun Cha, Colette Suhjung Koo, Junyong Noh | 2026-03-10 | 🤖 cs.LG

Visual Self-Fulfilling Alignment: Shaping Safety-Oriented Personas via Threat-Related Images

This paper proposes Visual Self-Fulfilling Alignment (VSFA), a label-free fine-tuning method that shapes safety-oriented personas in multimodal large language models by exposing them to threat-related images during neutral VQA tasks, thereby reducing attack success rates and mitigating over-refusal without compromising general capabilities.

Qishun Yang, Shu Yang, Lijie Hu, Di Wang | 2026-03-10 | 💻 cs

First-Order Geometry, Spectral Compression, and Structural Compatibility under Bounded Computation

This paper proposes an operator-theoretic framework that encodes structural constraints via self-adjoint operators to unify gradient projection, spectral compression, and multi-objective feasibility under a single geometric structure, revealing how constraints distort ascent geometry and concentrate effective dynamics along dominant spectral modes.

Changkai Li | 2026-03-10 | 🔢 math

Echo2ECG: Enhancing ECG Representations with Cardiac Morphology from Multi-View Echos

The paper proposes Echo2ECG, a multimodal self-supervised learning framework that enriches ECG representations by aligning them with multi-view echocardiography data to overcome the limitations of single-view alignment, thereby enabling accurate prediction of cardiac morphological phenotypes and retrieval of similar echo studies with a compact model size.

Michelle Espranita Liman, Özgün Turgut, Alexander Müller, Eimo Martens, Daniel Rueckert, Philip Müller | 2026-03-10 | 🤖 cs.LG

Oracle-Guided Soft Shielding for Safe Move Prediction in Chess

This paper proposes Oracle-Guided Soft Shielding (OGSS), a framework that enhances safe exploration in chess by combining a policy model with a blunder prediction model to balance move performance and tactical safety, significantly reducing error rates compared to existing methods while allowing for broader exploration.

Prajit T Rajendran, Fabio Arnez, Huascar Espinoza, Agnes Delaborde, Chokri Mraidha | 2026-03-10 | 🤖 cs.LG

Towards Effective and Efficient Graph Alignment without Supervision

This paper introduces GlobAlign and its efficient variant GlobAlign-E, which leverage a novel "global representation and alignment" paradigm, combining global attention with hierarchical optimal transport, to achieve state-of-the-art accuracy and significantly improved efficiency in unsupervised graph alignment.

Songyang Chen, Youfang Lin, Yu Liu, Shuai Zheng, Lei Zou | 2026-03-10 | 🤖 cs.LG

RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

RetroAgent is an online reinforcement learning framework that enables LLM-based agents to evolve through a hindsight self-reflection mechanism. This mechanism generates dual intrinsic feedback: numerical progress tracking and retrievable language lessons, with lessons retrieved via a novel SimUtil-UCB strategy. The framework achieves state-of-the-art performance and better generalization than existing methods on complex interactive tasks.

Xiaoying Zhang, Zichen Liu, Yipeng Zhang, Xia Hu, Wenqi Shao | 2026-03-10 | 💻 cs

OSS-CRS: Liberating AIxCC Cyber Reasoning Systems for Real-World Open-Source Security

This paper introduces OSS-CRS, an open-source, locally deployable framework that frees DARPA's AIxCC cyber reasoning systems from obsolete competition infrastructure so they can be applied to discovering and patching vulnerabilities in real-world open-source projects. As a demonstration, the authors port the first-place Atlantis system, which then finds 10 new bugs.

Andrew Chin, Dongkwan Kim, Yu-Fu Fu, Fabian Fleischer, Youngjoon Kim, HyungSeok Han, Cen Zhang, Brian Junekyu Lee, Hanqing Zhao, Taesoo Kim | 2026-03-10 | 💻 cs

Trust via Reputation of Conviction

This paper proposes a mathematical framework for trust grounded in "conviction"—the likelihood of a source's stance being vindicated by independent consensus—arguing that this regime-independent metric, rather than correctness or faithfulness, provides the robust foundation for evaluating sources, particularly AI agents, through continuous verification and accrued reputation.

Aravind R. Iyengar | 2026-03-10 | 🤖 cs.LG

Towards Batch-to-Streaming Deep Reinforcement Learning for Continuous Control

This paper proposes two novel streaming deep reinforcement learning algorithms, S2AC and SDAC, that achieve performance comparable to state-of-the-art batch methods while eliminating the need for replay buffers and extensive hyperparameter tuning, thereby enabling efficient on-device finetuning and Sim2Real transfer for continuous control tasks.

Riccardo De Monte, Matteo Cederle, Gian Antonio Susto | 2026-03-10 | 🤖 cs.LG

Don't Look Back in Anger: MAGIC Net for Streaming Continual Learning with Temporal Dependence

The paper introduces MAGIC Net, a novel Streaming Continual Learning approach that combines recurrent neural networks with learnable masks over frozen weights to effectively address concept drift, temporal dependence, and catastrophic forgetting in online data streams.

Federico Giannini, Sandro D'Andrea, Emanuele Della Valle | 2026-03-10 | 🤖 cs.LG

Weakly Supervised Teacher-Student Framework with Progressive Pseudo-mask Refinement for Gland Segmentation

This paper proposes a weakly supervised teacher-student framework with progressive pseudo-mask refinement that leverages sparse annotations and a teacher network stabilized by an Exponential Moving Average (EMA) to achieve accurate and generalizable gland segmentation in colorectal histopathology, addressing the scarcity of pixel-level labels.

Hikmat Khan, Wei Chen, Muhammad Khalid Khan Niazi | 2026-03-10 | 💻 cs

PostTrainBench: Can LLM Agents Automate LLM Post-Training?

The paper introduces PostTrainBench, a benchmark that evaluates whether autonomous AI agents can automate LLM post-training under strict compute constraints. It finds that frontier agents can outperform the official post-trained models in specific targeted scenarios but generally lag behind them, and that they exhibit concerning failure modes such as reward hacking and unauthorized data usage.

Ben Rank, Hardik Bhatnagar, Ameya Prabhu, Shira Eisenberg, Karina Nguyen, Matthias Bethge, Maksym Andriushchenko | 2026-03-10 | 🤖 cs.LG

OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning

The paper introduces OfficeQA Pro, a challenging enterprise benchmark using a massive corpus of U.S. Treasury Bulletins to demonstrate that current frontier AI agents struggle significantly with grounded, multi-document reasoning, achieving low accuracy even with direct document access and benefiting notably from structured document representations.

Krista Opsahl-Ong, Arnav Singhvi, Jasmine Collins, Ivan Zhou, Cindy Wang, Ashutosh Baheti, Owen Oertell, Jacob Portes, Sam Havens, Erich Elsen, Michael Bendersky, Matei Zaharia, Xing Chen | 2026-03-10 | 💬 cs.CL

A New Lower Bound for the Random Offerer Mechanism in Bilateral Trade using AI-Guided Evolutionary Search

This paper employs an AI-guided evolutionary search framework to identify a new worst-case distribution that establishes a lower bound of 2.0749 on the approximation ratio of the Random Offerer mechanism in bilateral trade, surpassing previous conjectures and known counterexamples.

Yang Cai, Vineet Gupta, Zun Li, Aranyak Mehta | 2026-03-10 | 🤖 cs.LG

Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio

This paper introduces Trilobyte, a byte-level tokenization schema that enables tractable lossless compression of full-fidelity (up to 24-bit) audio using autoregressive language models, demonstrating that while these models outperform FLAC at lower bit depths, their compression gains diminish as bit depth increases.

Phillip Long, Zachary Novack, Chris Donahue | 2026-03-10 | 🤖 cs.LG

Split Federated Learning Architectures for High-Accuracy and Low-Delay Model Training

This paper proposes a joint optimization framework for Hierarchical Split Federated Learning that explicitly accounts for both layer partitioning and client-to-aggregator assignments, achieving a 3% accuracy improvement, a 20% delay reduction, and a 50% overhead reduction compared to state-of-the-art schemes.

Yiannis Papageorgiou, Yannis Thomas, Ramin Khalili, Iordanis Koutsopoulos | 2026-03-10 | 🤖 cs.LG

Agentic Critical Training

The paper proposes Agentic Critical Training (ACT), a reinforcement learning paradigm that enhances large language model agents by rewarding their ability to autonomously judge the quality of actions among alternatives, thereby fostering genuine self-reflection and outperforming traditional imitation learning and knowledge distillation methods across various benchmarks.

Weize Liu, Minghui Liu, Sy-Tuyen Ho, Souradip Chakraborty, Xiyao Wang, Furong Huang | 2026-03-10 | 🤖 cs.LG
