cs.AI papers | Gist.Science

HURRI-GAN: A Novel Approach for Hurricane Bias-Correction Beyond Gauge Stations using Generative Adversarial Networks

The paper introduces HURRI-GAN, a novel TimeGAN-based framework that corrects systemic biases in high-resolution hurricane simulation models like ADCIRC, enabling accurate, near real-time storm surge forecasting and bias extrapolation beyond gauge station locations while significantly reducing computational runtime.

Noujoud Nadera, Hadi Majed, Stefanos Giaremis, Rola El Osta, Clint Dawson, Carola Kaiser, Hartmut Kaiser2026-03-10🤖 cs.LG

Geodesic Gradient Descent: A Generic and Learning-rate-free Optimizer on Objective Function-induced Manifolds

This paper introduces Geodesic Gradient Descent (GGD), a generic, learning-rate-free optimization algorithm that approximates local neighborhoods of objective function-induced hypersurfaces using n-dimensional spheres to ensure update trajectories remain on the manifold, achieving significant performance improvements over Adam on both regression and classification tasks.

Liwei Hu, Guangyao Li, Wenyong Wang, Xiaoming Zhang, Yu Xiang2026-03-10🤖 cs.LG

PaLMR: Towards Faithful Visual Reasoning via Multimodal Process Alignment

PaLMR is a novel framework that enhances the faithfulness of multimodal large language models by aligning both the reasoning process and outcomes through a perception-aligned data layer and a hierarchical reward fusion scheme, thereby significantly reducing visual hallucinations while achieving state-of-the-art performance on key benchmarks.

Yantao Li, Qiang Hui, Chenyang Yan, Kanzhi Cheng, Fang Zhao, Chao Tan, Huanling Gao, Jianbing Zhang, Kai Wang, Xinyu Dai, Shiguo Lian2026-03-10💻 cs

A Parameter-efficient Convolutional Approach for Weed Detection in Multispectral Aerial Imagery

This paper introduces FCBNet, a parameter-efficient convolutional model featuring a frozen ConvNeXt backbone and a Feature Correction Block that achieves superior weed segmentation accuracy (over 85% mIoU) and computational efficiency across RGB and multispectral aerial imagery compared to existing state-of-the-art models.

Leo Thomas Ramos, Angel D. Sappa2026-03-10💻 cs

GameVerse: Can Vision-Language Models Learn from Video-based Reflection?

The paper introduces GameVerse, a comprehensive benchmark featuring a novel reflect-and-retry paradigm and a hierarchical taxonomy across 15 games, demonstrating that Vision-Language Models can effectively improve their gameplay policies through video-based reflection by combining failure trajectories with expert tutorials.

Kuan Zhang, Dongchen Liu, Qiyue Zhao, Jinkun Hou, Xinran Zhang, Qinlei Xie, Miao Liu, Yiming Li2026-03-10💻 cs

Science Literacy: Generative AI as Enabler of Coherence in the Teaching, Learning, and Assessment of Scientific Knowledge and Reasoning

This chapter explores the potential of generative AI to enhance K-16+ science literacy by proposing a coherent architectural framework that aligns the teaching, learning, and assessment of scientific knowledge and reasoning, while addressing associated challenges and outlining future research needs.

Xiaoming Zhai, James W. Pellegrino, Matias Rojas, Jongchan Park, Matthew Nyaaba, Clayton Cohn, Gautam Biswas2026-03-10💻 cs

Graph-of-Mark: Promote Spatial Reasoning in Multimodal Language Models with Graph-Based Visual Prompting

The paper proposes Graph-of-Mark (GoM), a novel pixel-level visual prompting technique that overlays scene graphs onto images to capture object relationships, thereby significantly enhancing the spatial reasoning and zero-shot performance of multimodal language models.

Giacomo Frisoni, Lorenzo Molfetta, Mattia Buzzoni, Gianluca Moro2026-03-10💻 cs

Accelerating Video Generation Inference with Sequential-Parallel 3D Positional Encoding Using a Global Time Index

This paper introduces a system-level inference optimization for Diffusion Transformer-based video generation that employs a sequence-parallel Causal-RoPE mechanism and operator fusion to overcome memory and latency bottlenecks, achieving near real-time speeds and sub-second first-frame latency on an eight-GPU cluster.

Chao Yuan, Pan Li2026-03-10💻 cs

Better Eyes, Better Thoughts: Why Vision Chain-of-Thought Fails in Medicine

This paper reveals that Chain-of-Thought prompting often underperforms direct answering in medical visual question answering due to a "medical perception bottleneck," and proposes training-free grounding interventions to restore visual accuracy and improve model reasoning.

Yuan Wu, Zongxian Yang, Jiayu Qian, Songpan Gao, Guanxing Chen, Qiankun Li, Yu-An Huang, Zhi-An Huang2026-03-10💻 cs

Hybrid Orchestration of Edge AI and Microservices via Graph-based Self-Imitation Learning

This paper introduces SIL-GPO, a reinforcement learning framework that combines graph attention networks with self-imitation learning to optimize the joint deployment and routing of heterogeneous edge AI and microservices, significantly reducing end-to-end latency and improving resource utilization compared to existing methods.

Chen Yang, Jin Zheng, Yang Zhuolin, Lai Pan, Zhang Xiao, Hu Menglan, Yin Haiyan2026-03-10💻 cs

calibfusion: Transformer-Based Differentiable Calibration for Radar-Camera Fusion Detection in Water-Surface Environments

The paper proposes CalibFusion, a Transformer-based differentiable calibration framework that learns to implicitly refine Radar-Camera extrinsics end-to-end to overcome the challenges of textureless, cluttered water-surface environments and significantly improve fusion-based 2D object detection.

Yuting Wan, Liguo Sun, Jiuwu Hao, Pin LV2026-03-10💻 cs

ERP-RiskBench: Leakage-Safe Ensemble Learning for Financial Risk

This paper introduces ERP-RiskBench, a leakage-safe ensemble learning framework for detecting financial risks in ERP systems that combines hybrid risk definitions, rigorous time- and group-aware validation to prevent data leakage, and a stacking gradient boosting model to deliver reproducible, interpretable, and operationally grounded fraud detection results.

Sanjay Mishra2026-03-10🤖 cs.LG

Does Semantic Noise Initialization Transfer from Images to Videos? A Paired Diagnostic Study

This paper investigates whether semantic noise initialization, known to improve image diffusion models, transfers to text-to-video generation, finding that while it shows a slight positive trend on temporal metrics, it does not significantly outperform standard Gaussian noise due to weak or unstable signals in the noise space.

Yixiao Jing, Chaoyu Zhang, Zixuan Zhong, Peizhou Huang2026-03-10💻 cs

AutoFigure-Edit: Generating Editable Scientific Illustration

AutoFigure-Edit is an end-to-end system that generates fully editable, high-quality scientific illustrations from long-form text with flexible style adaptation via reference images, leveraging long-context understanding and native SVG support to overcome limitations in editability and efficiency found in existing automated tools.

Zhen Lin, Qiujie Xie, Minjun Zhu, Shichen Li, Qiyao Sun, Enhao Gu, Yiran Ding, Ke Sun, Fang Guo, Panzhong Lu, Zhiyuan Ning, Yixuan Weng, Yue Zhang2026-03-10💻 cs

XAI and Few-shot-based Hybrid Classification Model for Plant Leaf Disease Prognosis

This paper proposes a hybrid few-shot learning model integrating Siamese and Prototypical Networks with Grad-CAM-based Explainable AI to achieve high-accuracy, interpretable classification of maize, rice, and wheat leaf diseases under limited data conditions.

Diana Susan Joseph, Pranav M Pawar, Raja Muthalagu, Mithun Mukharjee2026-03-10🤖 cs.LG

Chart Deep Research in LVLMs via Parallel Relative Policy Optimization

This paper addresses the limitations of current Large Vision-Language Models in deep chart research by proposing Parallel Relative Policy Optimization (PRPO) to resolve training conflicts and constructing the MCDR-Bench evaluation framework to enable objective assessment of complex reasoning capabilities.

Jiajin Tang, Gaoyang, Wenjie Wang, Sibei Yang, Xing Chen2026-03-10🤖 cs.LG

MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines

The paper introduces MultiGen, a novel diffusion-based game engine that incorporates an explicit, persistent external memory to enable user-editable world structures and support coherent, real-time multiplayer interactions, overcoming the limitations of conventional next-frame prediction models.

Ryan Po, David Junhao Zhang, Amir Hertz, Gordon Wetzstein, Neal Wadhwa, Nataniel Ruiz2026-03-10💻 cs

VB: Visibility Benchmark for Visibility and Perspective Reasoning in Images

This paper introduces VB, a novel benchmark designed to evaluate vision-language models' ability to determine image visibility and appropriately abstain from answering when evidence is insufficient, utilizing controlled minimal edits and specialized metrics to reveal that top-tier models like GPT-4o and Gemini 3.1 Pro significantly outperform open-source alternatives in confidence-aware accuracy and perspective reasoning.

Neil Tripathi2026-03-10💻 cs

Narrative Weaver: Towards Controllable Long-Range Visual Consistency with Multi-Modal Conditioning

The paper introduces "Narrative Weaver," a novel framework that achieves controllable, long-range visual consistency in generative AI by integrating multimodal narrative planning with a dynamic memory bank, validated through extensive experiments and a newly released e-commerce advertising dataset.

Zhengjian Yao, Yongzhi Li, Xinyuan Gao, Quan Chen, Peng Jiang, Yanye Lu2026-03-10💻 cs

Thinking with Gaze: Sequential Eye-Tracking as Visual Reasoning Supervision for Medical VLMs

This paper introduces a method that enhances medical Vision-Language Models by using sequential eye-tracking data as supervision to train dedicated gaze tokens, enabling the models to mimic radiologists' visual search patterns and achieve state-of-the-art performance in both in-domain and out-of-domain medical reasoning tasks.

Yiwei Li, Zihao Wu, Yanjun Lv, Hanqi Jiang, Weihang You, Zhengliang Liu, Dajiang Zhu, Xiang Li, Quanzheng Li, Tianming Liu, Lin Zhao2026-03-10💻 cs

← Previous Next →