cs.AI papers | Gist.Science

Compose by Focus: Scene Graph-based Atomic Skills

This paper introduces a scene graph-based framework that enhances the compositional generalization of generalist robots by learning robust, focused atomic skills via graph neural networks and diffusion models, which are then orchestrated by a vision-language model planner to achieve superior performance in complex, long-horizon tasks.

Han Qi, Changhe Chen, Heng Yang | 2026-03-10 | 💻 cs

Efficient Construction of Implicit Surface Models From a Single Image for Motion Generation

This paper introduces Fast Image-to-Neural Surface (FINS), a lightweight framework that reconstructs high-fidelity implicit surfaces and signed distance fields (SDFs) from a single image within seconds by leveraging multi-resolution hash grids and pre-trained foundation models, outperforming existing methods in speed and accuracy for robotics applications.

Wei-Teng Chu, Tianyi Zhang, Matthew Johnson-Roberson, Weiming Zhi | 2026-03-10 | 💻 cs

Linear probes rely on textual evidence: Results from leakage mitigation studies in language models

This paper demonstrates that linear probes used to detect harmful behaviors in language models are heavily reliant on explicit textual evidence, as their performance significantly degrades when such surface-level cues are filtered out or when models are trained to express behaviors without verbalization.

Gerard Boxo, Aman Neelappa, Shivam Raval | 2026-03-10 | 🤖 cs.LG

Towards Strategic Persuasion with Language Models

This paper introduces a theory-driven framework grounded in Bayesian persuasion theory to evaluate and train large language models as strategic persuaders, demonstrating that both frontier and smaller models can achieve significant persuasion gains and exhibit sophisticated strategies through reinforcement learning.

Zirui Cheng, Jiaxuan You | 2026-03-10 | 💻 cs

Generative Evolutionary Meta-Solver (GEMS): Scalable Surrogate-Free Multi-Agent Reinforcement Learning

The paper introduces Generative Evolutionary Meta-Solver (GEMS), a scalable, surrogate-free multi-agent reinforcement learning framework that replaces explicit policy populations with a compact generator and latent anchors to achieve significantly faster training, lower memory usage, and higher rewards than traditional methods like PSRO while maintaining game-theoretic guarantees.

Alakh Sharma, Gaurish Trivedi, Kartikey Singh Bhandari, Yash Sinha, Dhruv Kumar, Pratik Narang, Jagat Sesh Challa | 2026-03-10 | 🤖 cs.LG

Mapping Overlaps in Benchmarks through Perplexity in the Wild

This paper introduces "benchmark signatures"—sets of salient tokens from in-the-wild corpora whose perplexity predicts model performance—to reveal nuanced overlaps and distinct capacities across 89 LLM benchmarks, offering a robust alternative to raw performance correlations for understanding the landscape of LLM abilities and the divergence between machine and human semantic organization.

Siyang Wu, Honglin Bao, Sida Li, Ari Holtzman, James A. Evans | 2026-03-10 | 💬 cs.CL

ELHPlan: Efficient Long-Horizon Task Planning for Multi-Agent Collaboration

ELHPlan is a framework for efficient long-horizon multi-agent planning that uses intention-bound action chains within a cyclical validation process to achieve task success rates comparable to state-of-the-art methods while significantly reducing computational costs and token consumption.

Shaobin Ling, Yun Wang, Chenyou Fan, Tin Lun Lam, Junjie Hu | 2026-03-10 | 💻 cs

Cold-Start Active Correlation Clustering

This paper addresses the cold-start scenario in active correlation clustering, where no initial pairwise similarities are available, by proposing a coverage-aware method that encourages diversity to efficiently query similarities and achieve effective clustering.

Linus Aronsson, Han Wu, Morteza Haghir Chehreghani | 2026-03-10 | 🤖 cs.LG

Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents

This paper introduces and empirically validates the concept of "misevolution," demonstrating that self-evolving LLM agents face widespread, emergent risks across model, memory, tool, and workflow pathways that can lead to safety degradation and unintended vulnerabilities, thereby highlighting an urgent need for new safety paradigms.

Shuai Shao, Qihan Ren, Chen Qian, Boyi Wei, Dadi Guo, Jingyi Yang, Xinhao Song, Linfeng Zhang, Weinan Zhang, Dongrui Liu, Jing Shao | 2026-03-10 | 🤖 cs.LG

CroSTAta: Cross-State Transition Attention Transformer for Robotic Manipulation

The paper introduces CroSTAta, a Cross-State Transition Attention Transformer that enhances robotic manipulation robustness by employing a novel State Transition Attention mechanism to model temporal structures like failure and recovery patterns, outperforming standard attention and sequential models in simulation.

Giovanni Minelli, Giulio Turrisi, Victor Barasuol, Claudio Semini | 2026-03-10 | 🤖 cs.LG

Automated Extraction of Material Properties using LLM-based AI Agents

This study presents an automated, cost-effective LLM-based agentic workflow that successfully extracts over 27,000 thermoelectric and structural property records from approximately 10,000 scientific articles, creating the largest LLM-curated dataset to date and establishing a scalable foundation for data-driven materials discovery.

Subham Ghosh, Abhishek Tewari | 2026-03-10 | 🔬 cond-mat.mtrl-sci

FOR-Prompting: From Objection to Revision via an Asymmetric Prompting Protocol

The paper introduces FOR-Prompting, a model-agnostic, asymmetric prompting protocol that enhances reasoning and iterative refinement across diverse tasks by structuring interactions between a Defender, a Questioner, and an optional Host, enabling even small models to achieve performance comparable to or better than standard baselines without requiring training or access to model internals.

He Zhang, Anzhou Zhang, Jian Dai | 2026-03-10 | 💬 cs.CL

Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks

The paper introduces DialTree, a tree-based dialogue reinforcement learning framework that autonomously discovers diverse and effective multi-turn attack strategies against large language models, significantly outperforming existing single-turn or template-based red-teaming methods.

Ruohao Guo, Afshin Oroojlooy, Roshan Sridhar, Miguel Ballesteros, Alan Ritter, Dan Roth | 2026-03-10 | 🤖 cs.LG

Wasserstein Gradient Flows for Scalable and Regularized Barycenter Computation

This paper introduces a scalable and regularized Wasserstein barycenter solver based on gradient flows that leverages mini-batch optimal transport and seamlessly integrates supervised label information, achieving state-of-the-art performance across diverse domain adaptation benchmarks.

Eduardo Fernandes Montesuma, Yassir Bendou, Mike Gartrell | 2026-03-10 | 🤖 cs.LG

Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices

The paper presents NANOMIND, a hardware-software co-design framework that decomposes Large Multimodal Models into modular components and dynamically schedules them across heterogeneous accelerators on unified-memory SoCs, enabling a battery-powered device to run LMMs entirely on-device with significantly improved energy efficiency and throughput.

Yilong Li, Shuai Zhang, Yijing Zeng, Hao Zhang, Xinmiao Xiong, Jingyu Liu, Pan Hu, Suman Banerjee | 2026-03-10 | 💬 cs.CL

Membership Inference Attacks on Tokenizers of Large Language Models

This paper introduces tokenizers as a novel and effective attack vector for membership inference against large language models, demonstrating their significant privacy leakage risks through extensive experiments and proposing an adaptive defense to mitigate these vulnerabilities.

Meng Tong, Yuntao Du, Kejiang Chen, Weiming Zhang, Ninghui Li | 2026-03-10 | 💻 cs

Deliberative Dynamics and Value Alignment in LLM Debates

This paper investigates how different deliberation protocols (synchronous vs. round-robin) and model architectures influence value alignment and verdict revision in multi-turn LLM debates, revealing significant behavioral disparities where GPT-4.1 exhibits strong inertia and autonomy-focused reasoning while Claude 3.7 Sonnet and Gemini 2.0 Flash demonstrate greater flexibility, empathy, and susceptibility to order effects.

Pratik S. Sachdeva, Tom van Nuenen | 2026-03-10 | 💻 cs

Reallocating Attention Across Layers to Reduce Multimodal Hallucination

This paper proposes a lightweight, training-free plugin called Functional Head Identification and Class-Conditioned Rescaling that mitigates multimodal hallucinations in large reasoning models by adaptively rebalancing perception and reasoning contributions across layers, achieving significant performance gains with minimal computational overhead.

Haolang Lu, Bolun Chu, WeiYe Fu, Guoshun Nan, Junning Liu, Minghui Pan, Qiankun Li, Yi Yu, Hua Wang, Kun Wang | 2026-03-10 | 💻 cs

DropVLA: An Action-Level Backdoor Attack on Vision-Language-Action Models

This paper introduces DropVLA, an action-level backdoor attack that covertly manipulates Vision-Language-Action models to execute specific safety-critical actions at attacker-chosen decision points using minimal vision-based data poisoning while maintaining high nominal task performance.

Zonghuan Xu, Jiayu Li, Yunhan Zhao, Xiang Zheng, Xingjun Ma, Yu-Gang Jiang | 2026-03-10 | 💻 cs

Ego-Vision World Model for Humanoid Contact Planning

This paper presents a demonstration-free framework that combines a learned ego-vision world model with sampling-based Model Predictive Control and a surrogate value function to enable humanoid robots to perform robust, real-time physical contact planning in unstructured environments.

Hang Liu, Yuman Gao, Sangli Teng, Yufeng Chi, Yakun Sophia Shao, Zhongyu Li, Maani Ghaffari, Koushil Sreenath | 2026-03-10 | 💻 cs
