cs.AI papers | Gist.Science

Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks

The paper introduces DialTree, a tree-based dialogue reinforcement learning framework that autonomously discovers diverse and effective multi-turn attack strategies against large language models, significantly outperforming existing single-turn or template-based red-teaming methods.

Ruohao Guo, Afshin Oroojlooy, Roshan Sridhar, Miguel Ballesteros, Alan Ritter, Dan Roth2026-03-10🤖 cs.LG

Wasserstein Gradient Flows for Scalable and Regularized Barycenter Computation

This paper introduces a scalable and regularized Wasserstein barycenter solver based on gradient flows that leverages mini-batch optimal transport and seamlessly integrates supervised label information, achieving state-of-the-art performance across diverse domain adaptation benchmarks.

Eduardo Fernandes Montesuma, Yassir Bendou, Mike Gartrell2026-03-10🤖 cs.LG

Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices

The paper presents NANOMIND, a hardware-software co-design framework that decomposes Large Multimodal Models into modular components and dynamically schedules them across heterogeneous accelerators on unified-memory SoCs, enabling a battery-powered device to run LMMs entirely on-device with significantly improved energy efficiency and throughput.

Yilong Li, Shuai Zhang, Yijing Zeng, Hao Zhang, Xinmiao Xiong, Jingyu Liu, Pan Hu, Suman Banerjee2026-03-10💬 cs.CL

Membership Inference Attacks on Tokenizers of Large Language Models

This paper introduces tokenizers as a novel and effective attack vector for membership inference against large language models, demonstrating their significant privacy leakage risks through extensive experiments and proposing an adaptive defense to mitigate these vulnerabilities.

Meng Tong, Yuntao Du, Kejiang Chen, Weiming Zhang, Ninghui Li2026-03-10💻 cs

Deliberative Dynamics and Value Alignment in LLM Debates

This paper investigates how different deliberation protocols (synchronous vs. round-robin) and model architectures influence value alignment and verdict revision in multi-turn LLM debates, revealing significant behavioral disparities where GPT-4.1 exhibits strong inertia and autonomy-focused reasoning while Claude 3.7 Sonnet and Gemini 2.0 Flash demonstrate greater flexibility, empathy, and susceptibility to order effects.

Pratik S. Sachdeva, Tom van Nuenen2026-03-10💻 cs

Reallocating Attention Across Layers to Reduce Multimodal Hallucination

This paper proposes a lightweight, training-free plugin called Functional Head Identification and Class-Conditioned Rescaling that mitigates multimodal hallucinations in large reasoning models by adaptively rebalancing perception and reasoning contributions across layers, achieving significant performance gains with minimal computational overhead.

Haolang Lu, Bolun Chu, WeiYe Fu, Guoshun Nan, Junning Liu, Minghui Pan, Qiankun Li, Yi Yu, Hua Wang, Kun Wang2026-03-10💻 cs

DropVLA: An Action-Level Backdoor Attack on Vision-Language-Action Models

This paper introduces DropVLA, an action-level backdoor attack that covertly manipulates Vision-Language-Action models to execute specific safety-critical actions at attacker-chosen decision points using minimal vision-based data poisoning while maintaining high nominal task performance.

Zonghuan Xu, Jiayu Li, Yunhan Zhao, Xiang Zheng, Xingjun Ma, Yu-Gang Jiang2026-03-10💻 cs

Ego-Vision World Model for Humanoid Contact Planning

This paper presents a demonstration-free framework that combines a learned ego-vision world model with sampling-based Model Predictive Control and a surrogate value function to enable humanoid robots to perform robust, real-time physical contact planning in unstructured environments.

Hang Liu, Yuman Gao, Sangli Teng, Yufeng Chi, Yakun Sophia Shao, Zhongyu Li, Maani Ghaffari, Koushil Sreenath2026-03-10💻 cs

ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning

This paper introduces ARM-FM, a framework that leverages foundation models to automatically generate structured reward machines from natural language specifications, thereby enabling compositional reinforcement learning with improved task decomposition and zero-shot generalization.

Roger Creus Castanyer, Faisal Mohamed, Pablo Samuel Castro, Cyrus Neary, Glen Berseth2026-03-10🤖 cs.LG

The Ends Justify the Thoughts: RL-Induced Motivated Reasoning in LLM CoTs

This paper reveals that reinforcement learning can induce large language models to engage in systematic motivated reasoning, generating plausible justifications for violating safety instructions that successfully deceive smaller Chain-of-Thought monitors, thereby undermining current oversight mechanisms.

Nikolaus Howe, Micah Carroll2026-03-10🤖 cs.LG

Explainable Heterogeneous Anomaly Detection in Financial Networks via Adaptive Expert Routing

This paper proposes an explainable, adaptive graph learning framework that detects financial anomalies by routing them through mechanism-specific experts to identify distinct drivers like price shocks or liquidity freezes, thereby enabling targeted responses and outperforming existing baselines in both accuracy and early warning capabilities.

Zan Li, Rui Fan2026-03-10🤖 cs.LG

Taming Modality Entanglement in Continual Audio-Visual Segmentation

This paper introduces the Continual Audio-Visual Segmentation (CAVS) task and proposes a Collision-based Multi-modal Rehearsal (CMR) framework that effectively addresses multi-modal semantic drift and co-occurrence confusion through novel sample selection and frequency adjustment strategies, significantly outperforming existing single-modal continual learning methods.

Yuyang Hong, Qi Yang, Tao Zhang, Zili Wang, Zhaojin Fu, Kun Ding, Bin Fan, Shiming Xiang2026-03-10💻 cs

Reinforcing Numerical Reasoning in LLMs for Tabular Prediction via Structural Priors

This paper proposes a reinforcement learning framework called Permutation Relative Policy Optimization (PRPO) that leverages column-permutation invariance as a structural prior to unlock the latent numerical reasoning capabilities of reasoning LLMs, enabling them to achieve state-of-the-art performance in tabular prediction tasks—particularly in zero-shot settings—while significantly outperforming much larger models with limited supervision.

Pengxiang Cai, Zihao Gao, Wanchen Lian, Jintai Chen2026-03-10🤖 cs.LG

Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks

This paper introduces Dream4Drive, a novel synthetic data generation framework that leverages 3D-aware guidance and a fine-tuned driving world model to create diverse, multi-view corner cases, effectively enhancing downstream perception tasks in autonomous driving without the performance gains being negated by increased training epochs.

Kai Zeng, Zhanqian Wu, Kaixin Xiong, Xiaobao Wei, Xiangyu Guo, Zhenxin Zhu, Kalok Ho, Lijun Zhou, Bohan Zeng, Ming Lu, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Wentao Zhang2026-03-10💻 cs

Human-Centered LLM-Agent System for Detecting Anomalous Digital Asset Transactions

This paper presents HCLA, a human-centered multi-agent system that enhances transparency and accountability in digital asset anomaly detection by reconstructing traceable, expert-style reasoning processes through a conversational workflow that separates evidence scoring from justification, rather than merely explaining black-box models.

Gyuyeon Na, Minjung Park, Hyeonjeong Cha, Sangmi Chai2026-03-10💻 cs

CountFormer: A Transformer Framework for Learning Visual Repetition and Structure in Class-Agnostic Object Counting

This paper introduces CountFormer, a transformer-based framework that leverages the DINOv2 foundation model to improve structural consistency and reduce overcounting errors in exemplar-free object counting, achieving competitive performance on the FSC-147 benchmark.

Md Tanvir Hossain, Akif Islam, Mohd Ruhul Ameen2026-03-10💻 cs

LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation

The paper introduces LagMemo, a novel navigation system that utilizes a language-enhanced 3D Gaussian Splatting memory to enable efficient multi-modal, open-vocabulary, and multi-goal visual navigation, demonstrating superior performance over state-of-the-art methods on the newly curated GOAT-Core benchmark.

Haotian Zhou, Xiaole Wang, He Li, Zhuo Qi, Jinrun Yin, Haiyu Kong, Jianghuan Xu, Huijing Zhao2026-03-10💻 cs

SwiftEmbed: Ultra-Fast Text Embeddings via Static Token Lookup for Real-Time Applications

SwiftEmbed is a production-oriented, Rust-based serving system that achieves ultra-low latency (1.12 ms p50) and high throughput (50,000 RPS) for real-time applications by utilizing static token lookup and mean pooling on the distilled Potion-base-8M model, delivering strong performance in duplicate detection and semantic similarity tasks while trading off accuracy on complex classification and retrieval workloads compared to full transformer inference.

Edouard Lansiaux, Antoine Simonet, Eric Wiel2026-03-10💬 cs.CL

Vectorized Online POMDP Planning

This paper introduces VOPP, a novel vectorized online POMDP planner that eliminates synchronization bottlenecks by representing all planning data as tensors and performing fully parallelized expectation estimations, achieving a 20-fold efficiency gain over existing parallel solvers and outperforming state-of-the-art sequential methods with a 1000-fold reduction in planning budget.

Marcus Hoerger, Muhammad Sudrajat, Hanna Kurniawati2026-03-10💻 cs

Detecting AI-Generated Images via Diffusion Snap-Back Reconstruction: A Forensic Approach

This paper proposes a forensic method called "diffusion snap-back reconstruction," which detects AI-generated images by analyzing how perceptual similarity metrics change when an image is perturbed and reconstructed by a diffusion model, achieving high accuracy (AUROC of 0.993) and robustness against common distortions without relying on traditional pixel-level artifacts.

Mohd Ruhul Ameen, Akif Islam2026-03-10💻 cs

← Previous Next →