cs.AI papers | Gist.Science

Dr. Seg: Revisiting GRPO Training for Visual Large Language Models through Perception-Oriented Design

Dr. Seg challenges the assumption that language-based GRPO training transfers seamlessly to visual perception by introducing a plug-and-play framework with a Look-to-Confirm mechanism and Distribution-Ranked Reward module that significantly enhances performance in complex visual scenarios without requiring architectural modifications.

Haoxiang Sun, Tao Wang, Chenwei Tang + 2 more | 2026-03-06 | cs

AlignVAR: Towards Globally Consistent Visual Autoregression for Image Super-Resolution

This paper proposes AlignVAR, a globally consistent visual autoregressive framework for image super-resolution that overcomes locality bias and error accumulation through Spatial Consistency Autoregression and Hierarchical Consistency Constraint, achieving superior structural coherence and perceptual fidelity with significantly faster inference and fewer parameters than diffusion-based methods.

Cencen Liu, Dongyang Zhang, Wen Yin + 6 more | 2026-03-06 | cs

Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards

The paper introduces SOLACE, a fully unsupervised post-training framework for text-to-image generation that leverages intrinsic self-confidence signals derived from noise recovery to optimize model performance without external reward models or annotated datasets.

Seungwook Kim, Minsu Cho | 2026-03-06 | cs

AutoSkill: Experience-Driven Lifelong Learning via Skill Self-Evolution

AutoSkill is a model-agnostic, experience-driven lifelong learning framework that enables LLM agents to automatically derive, evolve, and dynamically reuse personalized skills from interaction traces without retraining, thereby transforming ephemeral user experiences into explicit, composable capabilities for personalized digital surrogates.

Yutao Yang, Junsong Li, Qianjun Pan + 9 more | 2026-03-06 | cs

Agents Learn Their Runtime: Interpreter Persistence as Training-Time Semantics

This paper demonstrates that interpreter persistence is a critical training-time semantic that significantly impacts agent efficiency and stability, revealing that misalignment between training data and deployment runtime causes substantial token waste or error rates despite achieving comparable solution quality.

Victor May, Aaditya Salgarkar, Yishan Wang + 2 more | 2026-03-06 | cs

ToolRLA: Multiplicative Reward Decomposition for Tool-Integrated Agents

ToolRLA is a three-stage post-training pipeline that employs a novel multiplicative reward decomposition across four dimensions to significantly enhance the accuracy, compliance, and task completion rates of tool-integrated agents in high-stakes financial advisory scenarios.

Pengbo Liu | 2026-03-06 | cs

FreeAct: Freeing Activations for LLM Quantization

FreeAct is a novel quantization framework that improves Large Language Model performance by relaxing rigid one-to-one transformation constraints to dynamically allocate token-specific activation transformations, thereby addressing the distinct distribution patterns in diffusion and multimodal models.

Xiaohao Liu, Xiaobo Xia, Manyi Zhang + 6 more | 2026-03-06 | cs

Real Money, Fake Models: Deceptive Model Claims in Shadow APIs

This paper presents the first systematic audit revealing that widely used "shadow APIs," which claim to provide access to restricted frontier LLMs, frequently employ deceptive practices such as model substitution and safety manipulation, thereby compromising the reliability, reproducibility, and validity of downstream applications and academic research.

Yage Zhang, Yukun Jiang, Zeyuan Chen, Michael Backes, Xinyue Shen, Yang Zhang | 2026-03-06 | cs.CR

MatRIS: Toward Reliable and Efficient Pretrained Machine Learning Interatomic Potentials

MatRIS is a novel, computationally efficient invariant machine learning interatomic potential that utilizes a linear-complexity separable attention mechanism for three-body interactions to achieve accuracy comparable to state-of-the-art equivariant models at a significantly lower training cost.

Yuanchang Zhou, Siyu Hu, Xiangyu Zhang + 3 more | 2026-03-06 | cs

Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

Kiwi-Edit addresses the limitations of instruction-based video editing and the scarcity of reference-guided training data by introducing a scalable data generation pipeline to create the RefVIE dataset and a unified architecture that synergizes learnable queries with latent visual features to achieve state-of-the-art controllable video editing.

Yiqi Lin, Guoqiang Liang, Ziyun Zeng + 3 more | 2026-03-06 | cs

IoUCert: Robustness Verification for Anchor-based Object Detectors

The paper introduces IoUCert, a novel formal verification framework that overcomes the challenges of non-linear coordinate transformations and IoU metrics to enable the first robustness verification of realistic, anchor-based object detection models like SSD and YOLO.

Benedikt Brückner, Alejandro J. Mercado, Yanghao Zhang, Panagiotis Kouvaros, Alessio Lomuscio | 2026-03-06 | cs.CR
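
For readers unfamiliar with the IoU metric named in the summary above, the following is a minimal sketch of the standard intersection-over-union computation for two axis-aligned boxes. It is not the verification procedure from the paper; the function name `iou` and the `(x1, y1, x2, y2)` box layout are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2
    (an assumed layout, not taken from the paper).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs yield 0.0 rather than dividing by zero.
    return inter / union if union > 0 else 0.0
```

The `max` and `min` clamps make the function piecewise and non-linear in the box coordinates, which gives some intuition for why the summary calls the metric a challenge for formal verification.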

AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis

AOI is a secure, trainable multi-agent framework that automates Site Reliability Engineering by leveraging Group Relative Policy Optimization and a read-write separated architecture to distill expert knowledge into local models and convert failed trajectories into corrective signals, achieving state-of-the-art performance on the AIOpsLab benchmark while ensuring data privacy and safe execution.

Pei Yang, Wanyi Chen, Asuka Yuxi Zheng + 11 more | 2026-03-06 | cs
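
As background on Group Relative Policy Optimization, mentioned in the summary above, here is a minimal sketch of the group-relative advantage commonly used in GRPO: each sampled trajectory's reward is standardized against the other trajectories drawn for the same prompt. How AOI itself computes rewards or handles failed trajectories is not stated in the summary; the function name and the `eps` parameter are assumptions for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled trajectories.

    `rewards` holds one scalar reward per trajectory sampled for the
    same prompt. `eps` (an assumed value) avoids division by zero when
    every trajectory in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Under this scheme a failed trajectory in a group that also contains successes receives a negative advantage, so it still contributes a gradient signal rather than being discarded.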

RADAR: Learning to Route with Asymmetry-aware DistAnce Representations

RADAR is a scalable neural framework that enhances vehicle routing problem solvers for asymmetric scenarios by leveraging Singular Value Decomposition to encode static distance asymmetry and Sinkhorn normalization to model dynamic interaction asymmetry, thereby achieving superior generalization and performance on both synthetic and real-world benchmarks.

Hang Yi, Ziwei Huang, Yining Ma + 1 more | 2026-03-06 | cs
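
To illustrate the Sinkhorn normalization named in the summary above, the sketch below shows the generic algorithm: rows and columns of a positive square matrix are rescaled alternately until the matrix is approximately doubly stochastic. This is not RADAR's implementation; the function name, the iteration count, and the pure-Python list representation are assumptions for illustration.

```python
def sinkhorn(matrix, n_iters=50):
    """Alternately normalize rows and columns of a positive square matrix.

    Returns a new matrix whose rows and columns each sum to roughly 1.
    `n_iters` is an assumed default, not a value from the paper.
    """
    m = [list(row) for row in matrix]  # copy so the input is not mutated
    for _ in range(n_iters):
        # Row step: make every row sum to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column step: make every column sum to 1.
        col_sums = [sum(col) for col in zip(*m)]
        m = [[v / col_sums[j] for j, v in enumerate(row)] for row in m]
    return m


# Asymmetric input: scores[i][j] differs from scores[j][i].
scores = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 10.0]]
balanced = sinkhorn(scores)
```

The output is generally still asymmetric; the normalization only balances the total weight per row and per column.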

A theoretical model of dynamical grammatical gender shifting based on set-valued set function

This paper proposes a mathematical framework based on a set-valued set function within a Template-Based and Modular Cognitive model to formally explain the nonlinear dynamics of grammatical gender shifting and noun-to-noun derivation, using empirical data from Riffian to challenge conventional views on word formation.

Mohamed El Idrissi | 2026-03-06 | cs

Baseline Performance of AI Tools in Classifying Cognitive Demand of Mathematical Tasks

This study evaluates eleven general-purpose and education-specific AI tools, finding that they achieve only moderate accuracy (63%) in classifying the cognitive demand of mathematical tasks due to a systematic bias toward middle-level categories and a tendency to prioritize surface textual features over underlying cognitive processes, thereby limiting their immediate reliability for teacher planning without improved prompt engineering or tool development.

Danielle S. Fox, Brenda L. Robles, Elizabeth DiPietro Brovey + 1 more | 2026-03-06 | cs

DMD-augmented Unpaired Neural Schrödinger Bridge for Ultra-Low Field MRI Enhancement

This paper proposes a DMD-augmented Unpaired Neural Schrödinger Bridge framework that enhances Ultra-Low Field (64 mT) MRI image quality by leveraging diffusion-guided distribution matching and anatomical structure preservation to achieve superior realism and structural fidelity in translating unpaired 64 mT scans to 3 T quality.

Youngmin Kim, Jaeyun Shin, Jeongchan Kim + 5 more | 2026-03-06 | cs

Zero-Knowledge Proof (ZKP) Authentication for Offline CBDC Payment System Using IoT Devices

This paper proposes a privacy-preserving, offline Central Bank Digital Currency (CBDC) payment model for resource-constrained IoT devices that integrates Secure Elements, lightweight Zero-Knowledge Proofs, and intermittent synchronization to enable secure, cash-like transactions while preventing double-spending and ensuring AML/CFT compliance without continuous internet connectivity.

Santanu Mondal, T. Chithralekha | 2026-03-06 | cs.CR

Measuring AI R&D Automation

This paper proposes a set of empirical metrics to track the extent and consequences of AI R&D automation, aiming to address data gaps regarding its impact on capability acceleration, safety progress, and oversight capabilities to guide better decision-making by companies and governments.

Alan Chan, Ranay Padarath, Joe Kwon + 2 more | 2026-03-06 | cs

Bielik-Q2-Sharp: A Comparative Study of Extreme 2-bit Quantization Methods for a Polish 11B Language Model

This paper presents Bielik-Q2-Sharp, a systematic evaluation of six 2-bit quantization methods on a Polish 11B language model that identifies QuIP# as a high-performing variant comparable to the IQ2_XXS baseline while revealing a critical dissociation between log-likelihood preservation and autoregressive generation in rotation-based methods.

Jakub Prejzner | 2026-03-06 | cs

FinRetrieval: A Benchmark for Financial Data Retrieval by AI Agents

This paper introduces FinRetrieval, a benchmark evaluating AI agents' ability to retrieve specific financial data from structured databases, revealing that tool availability (particularly structured APIs) is the primary driver of performance while highlighting nuanced impacts of reasoning modes and geographic naming conventions.

Eric Y. Kim, Jie Huang | 2026-03-06 | cs
