cs.AI papers | Gist.Science

SafeGen-LLM: Enhancing Safety Generalization in Task Planning for Robotic Systems

This paper introduces SafeGen-LLM, a two-stage post-training framework combining supervised fine-tuning and GRPO with formal verification rewards to enhance the safety satisfaction and generalization of robotic task planning across diverse domains and input formats.

Jialiang Fan, Weizhe Xu, Mengyu Liu + 3 more | 2026-03-11 | 🤖 cs.AI

Breaking the Factorization Barrier in Diffusion Language Models

The paper introduces Coupled Discrete Diffusion (CoDD), a hybrid framework that overcomes the "factorization barrier" in diffusion language models by replacing fully factorized outputs with a lightweight probabilistic inference layer, thereby enabling efficient parallel generation of coherent, high-quality text without the prohibitive costs of full joint modeling or reinforcement learning.

Ian Li, Zilei Shao, Benjie Wang, Rose Yu, Guy Van den Broeck, Anji Liu | 2026-03-11 | 🤖 cs.AI

OrthoAI: A Neurosymbolic Framework for Evidence-Grounded Biomechanical Reasoning in Clear Aligner Orthodontics

OrthoAI is a neurosymbolic framework that bridges 3D tooth segmentation and clinical reasoning for clear aligner orthodontics by combining sparse-supervision learning, knowledge-grounded biomechanical constraint inference, and multi-criteria treatment evaluation to enable fast, evidence-based automated decision support.

Edouard Lansiaux, Margaux Leman, Mehdi Ammi | 2026-03-11 | 🤖 cs.AI

Zero-Shot and Supervised Bird Image Segmentation Using Foundation Models: A Dual-Pipeline Approach with Grounding DINO 1.5, YOLOv11, and SAM 2.1

This paper proposes a dual-pipeline framework for bird image segmentation that leverages the frozen SAM 2.1 backbone with either a zero-shot Grounding DINO 1.5 detector or a supervised fine-tuned YOLOv11 detector, achieving state-of-the-art performance on the CUB-200-2011 dataset while eliminating the need for retraining the segmentation model across different species or domains.

Abhinav Munagala | 2026-03-11 | 🤖 cs.AI

Pri4R: Learning World Dynamics for Vision-Language-Action Models with Privileged 4D Representation

Pri4R is a simple yet effective method that enhances Vision-Language-Action models with an implicit understanding of world dynamics by training them to predict 3D point tracks using privileged 4D information, thereby significantly improving physical manipulation performance without adding inference overhead.

Jisoo Kim, Jungbin Cho, Sanghyeok Chu, Ananya Bal, Jinhyung Kim, Gunhee Lee, Sihaeng Lee, Seung Hwan Kim, Bohyung Han, Hyunmin Lee, Laszlo A. Jeni, Seungryong Kim | 2026-03-11 | 🤖 cs.AI

Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search

This paper introduces Gome, a gradient-based MLE agent that outperforms traditional tree search methods on MLE-Bench by mapping diagnostic reasoning to gradient computation, demonstrating that as LLM reasoning capabilities improve, gradient-based optimization becomes increasingly superior to exhaustive enumeration.

Yifei Zhang, Xu Yang, Xiao Yang, Bowen Xian, Qizheng Li, Shikai Fang, Jingyuan Li, Jian Wang, Mingrui Xu, Weiqing Liu, Jiang Bian | 2026-03-11 | 🤖 cs.AI

Boltzmann-based Exploration for Robust Decentralized Multi-Agent Planning (Extended Version)

This paper introduces Coordinated Boltzmann MCTS (CB-MCTS), a novel decentralized multi-agent planning algorithm that replaces deterministic UCT with a stochastic Boltzmann policy and decaying entropy bonus to overcome the limitations of existing methods in sparse or deceptive reward environments.

Nhat D. A. Nguyen, Duong D. Nguyen, Gianluca Rizzo, Hung X. Nguyen | 2026-03-11 | 🤖 cs.AI
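
To make the contrast in the CB-MCTS summary concrete, the sketch below compares deterministic UCT selection with sampling from a Boltzmann (softmax) policy over child values. It is a generic illustration, not the paper's algorithm: the function names, the exploration constant `c`, and the `temperature` parameter are assumptions, and the decaying entropy bonus and the multi-agent coordination are left out.

```python
import math
import random


def uct_select(q_values, visit_counts, c=1.4):
    """Deterministic UCT: return the index maximizing Q plus an exploration bonus."""
    total = sum(visit_counts)

    def score(i):
        if visit_counts[i] == 0:
            return float("inf")  # unvisited children are tried first
        return q_values[i] + c * math.sqrt(math.log(total) / visit_counts[i])

    return max(range(len(q_values)), key=score)


def boltzmann_probs(q_values, temperature=1.0):
    """Softmax over Q-values; subtracting the max keeps exp() numerically stable."""
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(weights)
    return [w / z for w in weights]


def boltzmann_select(q_values, temperature=1.0, rng=random):
    """Stochastic selection: sample a child in proportion to its softmax weight."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

With equal visit counts, `uct_select` always returns the child with the highest value estimate, whereas `boltzmann_select` still visits lower-valued children with nonzero probability. Lowering `temperature` over time makes the sampling progressively greedier.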

FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing

The paper introduces FinTexTS, a large-scale financial text-paired time-series dataset constructed via a novel semantic-based and multi-level pairing framework that overcomes the limitations of simple keyword matching by leveraging LLMs to align news articles with stock prices across macro, sector, related company, and target-company levels, thereby significantly improving stock price forecasting performance.

Jaehoon Lee, Suhwan Park, Tae Yoon Lim, Seunghan Lee, Jun Seo, Dongwan Kang, Hwanil Choi, Minjae Kim, Sungdong Yoo, SoonYoung Lee, Yongjae Lee, Wonbin Ahn | 2026-03-11 | 🤖 cs.AI

SPARC: Spatial-Aware Path Planning via Attentive Robot Communication

The paper proposes SPARC, a spatial-aware path planning framework that introduces a Relation-enhanced Multi-Head Attention (RMHA) mechanism to explicitly encode pairwise distances into robot communication, significantly improving decentralized multi-robot coordination and zero-shot generalization in high-density environments compared to existing methods.

Sayang Mu, Xiangyu Wu, Bo An | 2026-03-11 | 🤖 cs.AI

Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction

This paper introduces two software-only techniques, Overflow-Aware Scaling (OAS) and Macro Block Scaling (MBS), that significantly reduce the accuracy gap between the hardware-efficient MXFP4 format and NVIDIA's NVFP4 standard in Large Language Models, achieving near-parity performance with minimal computational overhead.

Jatin Chhugani, Geonhwa Jeong, Bor-Yiing Su, Yunjie Pan, Hanmei Yang, Aayush Ankit, Jiecao Yu, Summer Deng, Yunqing Chen, Nadathur Satish, Changkyu Kim | 2026-03-11 | 🤖 cs.AI
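
For readers unfamiliar with MXFP4, the sketch below simulates baseline block quantization as commonly described for the OCP Microscaling format: each block shares one power-of-two scale, and each element is stored as a 4-bit E2M1 value. It does not implement the paper's OAS or MBS techniques. The function name and the simple nearest-value rounding (ties go to the smaller magnitude) are assumptions made for illustration. The format uses 32-element blocks, but the function accepts any length.

```python
import math

# Magnitudes representable by a 4-bit E2M1 element (the sign is stored separately).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = 6.0
FP4_EMAX = 2  # exponent of the largest E2M1 binade (4.0 = 2**2)


def quantize_block_mxfp4(block):
    """Return (shared_scale, dequantized_values) for one block of floats."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 1.0, [0.0] * len(block)

    # The shared scale is a power of two derived from the block's largest magnitude.
    shared_exp = math.floor(math.log2(max_abs)) - FP4_EMAX
    scale = 2.0 ** shared_exp

    dequantized = []
    for x in block:
        # Scaled magnitudes can reach just under 8.0, so values above 6.0 saturate.
        mag = min(abs(x) / scale, FP4_MAX)
        q = min(FP4_GRID, key=lambda g: abs(g - mag))
        dequantized.append(math.copysign(q, x) * scale)
    return scale, dequantized
```

A block whose largest element is 7.0 gets a scale of 1.0, so that element is clamped to 6.0. This saturation is one source of MXFP4 quantization error.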

Design Conductor: An agent autonomously builds a 1.5 GHz Linux-capable RISC-V CPU

The paper introduces Design Conductor, an autonomous agent that leverages frontier models to independently design, verify, and generate a tape-out ready 1.48 GHz RISC-V CPU (VerCore) from a text specification to GDSII layout in just 12 hours, marking the first instance of an agent building a complete, working CPU end-to-end.

The Verkor Team, Ravi Krishna, Suresh Krishna, David Chin | 2026-03-11 | 🤖 cs.AI

CktEvo: Repository-Level RTL Code Benchmark for Design Evolution

This paper introduces CktEvo, a repository-level benchmark and closed-loop framework that enables large language models to iteratively optimize Power, Performance, and Area (PPA) in complete RTL designs by preserving functional behavior across cross-file dependencies without human intervention.

Zhengyuan Shi, Jingxin Wang, Tairan Cheng, Changran Xu, Weikang Qian, Qiang Xu | 2026-03-11 | 🤖 cs.AI

SiliconMind-V1: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation

The paper introduces SiliconMind-V1, a unified multi-agent framework that leverages testbench-driven verification and iterative debug-reasoning workflows to train locally fine-tuned LLMs for generating functionally correct Verilog RTL designs, outperforming state-of-the-art models with greater efficiency and privacy.

Mu-Chi Chen, Yu-Hung Kao, Po-Hsuan Huang, Shao-Chun Ho, Hsiang-Yu Tsou, I-Ting Wu, En-Ming Huang, Yu-Kai Hung, Wei-Po Hsin, Cheng Liang, Chia-Heng Tu, Shih-Hao Hung, Hsiang-Tsung Kung | 2026-03-11 | 🤖 cs.AI

ALADIN: Accuracy-Latency-Aware Design-space Inference Analysis for Embedded AI Accelerators

This paper presents ALADIN, an accuracy-latency-aware framework that enables the pre-deployment evaluation of mixed-precision quantized neural networks on scratchpad-based embedded AI accelerators by transforming models into platform-aware representations to analyze trade-offs and bottlenecks without requiring physical hardware.

T. Baldi, D. Casini, A. Biondi | 2026-03-11 | 🤖 cs.AI

Alignment Is the Disease: Censorship Visibility and Alignment Constraint Complexity as Determinants of Collective Pathology in Multi-Agent LLM Systems

This paper presents preliminary evidence from multi-agent simulations suggesting that alignment techniques and invisible censorship in large language models may paradoxically induce collective pathological behaviors and insight-action dissociation, indicating that safety interventions can sometimes cause the very harms they aim to prevent.

Hiroki Fukui | 2026-03-11 | 🤖 cs.AI

PhD Thesis Summary: Methods for Reliability Assessment and Enhancement of Deep Neural Network Hardware Accelerators

This PhD thesis presents novel, cost-efficient methods for assessing and enhancing the reliability of Deep Neural Network hardware accelerators, including a systematic literature review, new analytical tools, optimized trade-off methodologies, and the development of the AdAM real-time fault tolerance technique.

Mahdi Taheri | 2026-03-11 | 🤖 cs.AI

ARKV: Adaptive and Resource-Efficient KV Cache Management under Limited Memory Budget for Long-Context Inference in LLMs

ARKV is a lightweight, adaptive framework that dynamically allocates precision levels to KV cache tokens based on per-layer attention dynamics and token importance, achieving a 4x reduction in memory usage while preserving ~97% of baseline accuracy for long-context LLM inference without requiring retraining or architectural modifications.

Jianlong Lei, Shashikant Ilager | 2026-03-11 | 🤖 cs.AI

Measurement-Free Ancilla Recycling via Blind Reset: A Cross-Platform Study on Superconducting and Trapped-Ion Processors

This cross-platform study evaluates blind reset as a measurement-free ancilla recycling technique on superconducting and trapped-ion processors, demonstrating that it can significantly reduce logical-cycle latency while maintaining high ancilla cleanliness and identifying specific architecture-dependent crossover points for optimal deployment.

Sangkeum Lee | 2026-03-11 | ⚛️ quant-ph

Benchmarking Federated Learning in Edge Computing Environments: A Systematic Review and Performance Evaluation

This paper presents a systematic review and performance evaluation of Federated Learning in edge computing, benchmarking five leading algorithms across key metrics to identify trade-offs, highlight SCAFFOLD's superior accuracy and robustness versus FedAvg's efficiency, and propose a future research agenda to address challenges like data heterogeneity and energy limitations.

Sales Aribe Jr., Gil Nicholas Cagande | 2026-03-11 | 🤖 cs.AI
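
As background for the FedAvg baseline named in the summary above, the sketch below shows the standard aggregation step: the server averages client parameters, weighting each client by its number of local training samples. The function name and the flat-list parameter representation are assumptions for illustration. Nothing here reproduces the paper's benchmark setup or SCAFFOLD's control variates.

```python
def fedavg(client_weights, client_sizes):
    """Weighted average of client parameter vectors.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of local training samples held by each client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]
```

For example, `fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])` weights the second client three times as heavily as the first.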

Autonomous Edge-Deployed AI Agents for Electric Vehicle Charging Infrastructure Management

This paper introduces Auralink SDC, an edge-deployed multi-agent AI architecture that autonomously manages electric vehicle charging infrastructure with high reliability and sub-50ms latency, achieving 78% autonomous incident resolution and 87.6% diagnostic accuracy to address the critical failure rates and slow remediation times of current cloud-centric systems.

Mohammed Cherifi | 2026-03-11 | 🤖 cs.AI
