cs.MA papers | Gist.Science

Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification

This paper identifies a pervasive "agreement bias" in Multimodal LLM verifiers that causes them to over-validate agent behavior, and proposes a lightweight Self-Grounded Verification (SGV) method that significantly improves failure detection and task completion across web navigation, computer use, and robotics by decoupling prior generation from trajectory evaluation.

Moises Andrade, Joonhyuk Cha, Brandon Ho, Vriksha Srihari, Karmesh Yadav, Zsolt Kira | Tue, 10 Ma | 🤖 cs.LG

CRAwDAD: Causal Reasoning Augmentation with Dual-Agent Debate

The paper introduces CRAwDAD, a dual-agent debate framework that enhances causal inference in reasoning language models by facilitating structured dialogue and adversarial critique between agents, significantly improving accuracy on the CLadder benchmark across all levels of Pearl's causal ladder.

Finn G. Vamosi, Nils D. Forkert | Tue, 10 Ma | 🤖 cs.LG

IronEngine: Towards General AI Assistant

This paper introduces IronEngine, a general AI assistant platform featuring a unified orchestration core and a three-phase pipeline that integrates diverse backends, adaptive memory, and extensive tooling to achieve high task completion rates while separating planning quality from execution capability.

Xi Mo | Tue, 10 Ma | 🤖 cs.LG

Multi-Agent DRL for V2X Resource Allocation: Disentangling Challenges and Benchmarking Solutions

This paper addresses the lack of systematic evaluation in Multi-Agent Deep Reinforcement Learning for C-V2X resource allocation by introducing a disentangled benchmark suite of interference games and diverse datasets to isolate specific challenges, ultimately identifying policy robustness and generalization across vehicular topologies as the primary hurdle and demonstrating the superiority of actor-critic methods over value-based approaches.

Siyuan Wang, Lei Lei, Pranav Maheshwari, Sam Bellefeuille, Kan Zheng, Dusit Niyato | Tue, 10 Ma | 🤖 cs.LG

What Do Agents Think One Another Want? Level-2 Inverse Games for Inferring Agents' Estimates of Others' Objectives

This paper proposes a novel level-2 inverse game framework that infers agents' estimates of each other's objectives to address the limitations of traditional level-1 methods in decentralized scenarios, demonstrating through theory and experiments that accounting for these mutual misalignments is crucial for accurately predicting strategic interactions.

Hamzah I. Khan, Jingqi Li, David Fridovich-Keil | Thu, 12 Ma | 💻 cs

LLMGreenRec: LLM-Based Multi-Agent Recommender System for Sustainable E-Commerce

The paper introduces LLMGreenRec, a novel multi-agent framework leveraging Large Language Models to deduce user green intents and prioritize sustainable product recommendations, thereby bridging the gap between eco-friendly intentions and actions while minimizing the system's own digital carbon footprint.

Hao N. Nguyen, Hieu M. Nguyen, Son Van Nguyen, Nguyen Thi Hanh | Thu, 12 Ma | 💻 cs

The Yokai Learning Environment: Tracking Beliefs Over Space and Time

This paper introduces the Yokai Learning Environment (YLE), a new open-source benchmark for zero-shot coordination that overcomes the saturation of the Hanabi Learning Environment by requiring agents to track moving cards and reason under ambiguous hints, thereby revealing that current state-of-the-art methods fail to maintain consistent internal models when paired with unseen partners.

Constantin Ruhdorfer, Matteo Bortoletto, Johannes Forkel, Jakob Foerster, Andreas Bulling | Thu, 12 Ma | 🤖 cs.AI

GRACE: A Unified 2D Multi-Robot Path Planning Simulator & Benchmark for Grid, Roadmap, And Continuous Environments

This paper introduces GRACE, a unified 2D simulator and benchmark that enables transparent, reproducible comparisons of multi-robot path planning algorithms across grid, roadmap, and continuous environments by standardizing task instantiation, execution, and evaluation protocols.

Chuanlong Zang, Anna Mannucci, Isabelle Barz, Philipp Schillinger, Florian Lier, Wolfgang Hönig | Thu, 12 Ma | 🤖 cs.AI

Mindstorms in Natural Language-Based Societies of Mind

This paper proposes Natural Language-Based Societies of Mind (NLSOMs), a modular framework where large multimodal neural networks communicate via natural language to solve complex AI tasks more effectively than single models, while also exploring the emerging social, economic, and structural challenges of scaling these heterogeneous societies to include billions of agents.

Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, Jinjie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanic, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, Jürgen Schmidhuber | Thu, 12 Ma | 💬 cs.CL

COMIC: Agentic Sketch Comedy Generation

The paper presents COMIC, a fully automated AI system that generates high-quality, diverse comedic sketch videos by employing a multi-agent framework with specialized roles and LLM-based critics trained on YouTube data to iteratively refine content toward professional standards.

Susung Hong, Brian Curless, Ira Kemelmacher-Shlizerman, Steve Seitz | Thu, 12 Ma | 💬 cs.CL

Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

This position paper reframes multi-agent memory as a computer architecture challenge by proposing a three-layer hierarchy and identifying critical protocol gaps, with a specific focus on resolving multi-agent memory consistency as the primary obstacle to building reliable and scalable collaborative systems.

Zhongming Yu, Naicheng Yu, Hejia Zhang, Wentao Ni, Mingrui Yin, Jiaying Yang, Yujie Zhao, Jishen Zhao | Thu, 12 Ma | 🤖 cs.AI

Sequential Causal Normal Form Games: Theory, Computation, and Strategic Signaling

This paper extends Causal Normal Form Games to sequential settings by introducing Sequential Causal Multi-Agent Systems, but its comprehensive theoretical and empirical analysis reveals that, under standard rational assumptions, these causal frameworks offer no welfare advantage over classical Stackelberg equilibrium, thereby highlighting a fundamental incompatibility between rational choice and causal reasoning benefits in current game-theoretic models.

Dennis Thumm | Thu, 12 Ma | 📊 stat

KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization

KernelSkill is a multi-agent framework that enhances GPU kernel optimization by replacing opaque LLM heuristics with a knowledge-driven, dual-level memory architecture of expert skills, achieving state-of-the-art speedups and a 100% success rate on KernelBench.

Qitong Sun, Jun Han, Tianlin Li, Zhe Tang, Sheng Chen, Fei Yang, Aishan Liu, Xianglong Liu, Yang Liu | Thu, 12 Ma | 🤖 cs.LG

Symmetry-Breaking in Multi-Agent Navigation: Winding Number-Aware MPC with a Learned Topological Strategy

This paper introduces WNumMPC, a hierarchical multi-agent navigation framework that combines a reinforcement learning-based planner and a model-based controller to resolve symmetry-induced deadlocks in dense environments by leveraging topological winding numbers for robust, communication-free coordination.

Tomoki Nakao, Kazumi Kasaura, Tadashi Kozuno | Mon, 09 Ma | 💻 cs

MARLIN: Multi-Agent Reinforcement Learning with Murmuration Intelligence and LLM Guidance for Reservoir Management

The paper introduces MARLIN, a decentralized reservoir management framework that combines multi-agent reinforcement learning inspired by starling murmurations with LLM-guided reward shaping to effectively handle environmental uncertainties, significantly improving flood response times and computational efficiency compared to traditional methods.

Heming Fu, Shan Lin, Guojun Xiong | Mon, 09 Ma | 💻 cs

OA-Bug: An Olfactory-Auditory Augmented Bug Algorithm for Swarm Robots in a Denied Environment

This paper proposes the Olfactory-Auditory augmented Bug algorithm (OA-Bug) for swarm robots to effectively explore denied environments without GNSS or central processing, demonstrating through simulations and real-world experiments that it achieves significantly higher search coverage (96.93%) compared to existing methods like SGBA.

Siqi Tan, Xiaoya Zhang, Jingyao Li, Ruitao Jing, Mufan Zhao, Yang Liu, Quan Quan | Mon, 09 Ma | 💻 cs

Impact of arbitrage between leveraged ETF and futures on market liquidity during market crash

Using artificial market simulations, this study demonstrates that arbitrage trading between leveraged ETFs and futures acts as a critical liquidity bridge during market crashes, supplying depth and tightness from the stable market to the distressed one to mitigate price declines.

Ryuki Hayase, Takanobu Mizuta, Isao Yagi | Mon, 09 Ma | 💻 cs

The Coordination Gap: Alternation Metrics for Temporal Dynamics in Multi-Agent Battle of the Exes

This paper introduces temporally sensitive Alternation (ALT) metrics to reveal that conventional outcome-based evaluations can severely mischaracterize multi-agent coordination, as demonstrated by Q-learning agents in a Battle of the Exes variant that achieve high traditional fairness scores but perform significantly worse than random baselines in actual turn-taking dynamics.

Nikolaos Al. Papadopoulos, Konstantinos Psannis | Mon, 09 Ma | 🤖 cs.LG

Information-Theoretic Privacy Control for Sequential Multi-Agent LLM Systems

This paper addresses the risk of amplified privacy leakage in sequential multi-agent LLM systems by formalizing compositional leakage through mutual information, deriving a theoretical bound on its propagation, and proposing a privacy-regularized training framework that enforces system-level privacy guarantees rather than relying on local agent constraints alone.

Sadia Asif, Mohammad Mohammadi Amiri | Mon, 09 Ma | 🤖 cs.LG

XR-DT: Extended Reality-Enhanced Digital Twin for Safe Motion Planning via Human-Aware Model Predictive Path Integral Control

This paper introduces XR-DT, an Extended Reality-enhanced Digital Twin framework that integrates a novel Human-Aware Model Predictive Path Integral (HA-MPPI) controller with an attention-based trajectory prediction model to enable safe, efficient, and interpretable motion planning for mobile robots operating alongside humans.

Tianyi Wang, Jiseop Byeon, Ahmad Yehia, Yiming Xu, Jihyung Park, Tianyi Zeng, Sikai Chen, Ziran Wang, Junfeng Jiao, Christian Claudel | Mon, 09 Ma | 🤖 cs.AI
