cs.AI papers | Gist.Science

Does Semantic Noise Initialization Transfer from Images to Videos? A Paired Diagnostic Study

This paper investigates whether semantic noise initialization, known to improve image diffusion models, transfers to text-to-video generation, finding that while it shows a slight positive trend on temporal metrics, it does not significantly outperform standard Gaussian noise due to weak or unstable signals in the noise space.

Yixiao Jing, Chaoyu Zhang, Zixuan Zhong, Peizhou Huang | 2026-03-10 | cs

AutoFigure-Edit: Generating Editable Scientific Illustration

AutoFigure-Edit is an end-to-end system that generates fully editable, high-quality scientific illustrations from long-form text with flexible style adaptation via reference images, leveraging long-context understanding and native SVG support to overcome limitations in editability and efficiency found in existing automated tools.

Zhen Lin, Qiujie Xie, Minjun Zhu, Shichen Li, Qiyao Sun, Enhao Gu, Yiran Ding, Ke Sun, Fang Guo, Panzhong Lu, Zhiyuan Ning, Yixuan Weng, Yue Zhang | 2026-03-10 | cs

XAI and Few-shot-based Hybrid Classification Model for Plant Leaf Disease Prognosis

This paper proposes a hybrid few-shot learning model integrating Siamese and Prototypical Networks with Grad-CAM-based Explainable AI to achieve high-accuracy, interpretable classification of maize, rice, and wheat leaf diseases under limited data conditions.

Diana Susan Joseph, Pranav M Pawar, Raja Muthalagu, Mithun Mukharjee | 2026-03-10 | cs.LG

Chart Deep Research in LVLMs via Parallel Relative Policy Optimization

This paper addresses the limitations of current Large Vision-Language Models in deep chart research by proposing Parallel Relative Policy Optimization (PRPO) to resolve training conflicts and constructing the MCDR-Bench evaluation framework to enable objective assessment of complex reasoning capabilities.

Jiajin Tang, Gaoyang, Wenjie Wang, Sibei Yang, Xing Chen | 2026-03-10 | cs.LG

MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines

The paper introduces MultiGen, a novel diffusion-based game engine that incorporates an explicit, persistent external memory to enable user-editable world structures and support coherent, real-time multiplayer interactions, overcoming the limitations of conventional next-frame prediction models.

Ryan Po, David Junhao Zhang, Amir Hertz, Gordon Wetzstein, Neal Wadhwa, Nataniel Ruiz | 2026-03-10 | cs

VB: Visibility Benchmark for Visibility and Perspective Reasoning in Images

This paper introduces VB, a novel benchmark designed to evaluate vision-language models' ability to determine image visibility and appropriately abstain from answering when evidence is insufficient, utilizing controlled minimal edits and specialized metrics to reveal that top-tier models like GPT-4o and Gemini 3.1 Pro significantly outperform open-source alternatives in confidence-aware accuracy and perspective reasoning.

Neil Tripathi | 2026-03-10 | cs

Narrative Weaver: Towards Controllable Long-Range Visual Consistency with Multi-Modal Conditioning

The paper introduces "Narrative Weaver," a novel framework that achieves controllable, long-range visual consistency in generative AI by integrating multimodal narrative planning with a dynamic memory bank, validated through extensive experiments and a newly released e-commerce advertising dataset.

Zhengjian Yao, Yongzhi Li, Xinyuan Gao, Quan Chen, Peng Jiang, Yanye Lu | 2026-03-10 | cs

Thinking with Gaze: Sequential Eye-Tracking as Visual Reasoning Supervision for Medical VLMs

This paper introduces a method that enhances medical Vision-Language Models by using sequential eye-tracking data as supervision to train dedicated gaze tokens, enabling the models to mimic radiologists' visual search patterns and achieve state-of-the-art performance in both in-domain and out-of-domain medical reasoning tasks.

Yiwei Li, Zihao Wu, Yanjun Lv, Hanqi Jiang, Weihang You, Zhengliang Liu, Dajiang Zhu, Xiang Li, Quanzheng Li, Tianming Liu, Lin Zhao | 2026-03-10 | cs

Mining Beyond the Bools: Learning Data Transformations and Temporal Specifications

This paper proposes a novel approach to mining data-aware temporal specifications from execution traces by combining Syntax Guided Synthesis with a finite-prefix interpretation of Temporal Stream Logic (TSL$_f$), enabling the robust and sample-efficient synthesis of reactive programs that capture both data transformations and temporal behaviors.

Sam Nicholas Kouteili, William Fishell, Christian Scaff, Mark Santolucito, Ruzica Piskac | 2026-03-10 | cs

Scaling Agentic Capabilities, Not Context: Efficient Reinforcement Finetuning for Large Toolspaces

The paper introduces ATLAS, a reinforcement finetuning framework that enables small language models to effectively navigate large toolspaces by learning adaptive context acquisition and execution strategies, thereby achieving frontier-level performance with significantly reduced parameter and context budgets.

Karan Gupta, Pranav Vajreshwari, Yash Pandya, Raghav Magazine, Akshay Nambi, Ahmed Awadallah | 2026-03-10 | cs.LG

Dynamic Targeting of Satellite Observations Using Supplemental Geostationary Satellite Data and Hierarchical Planning

This paper proposes a hierarchical planning approach that integrates supplemental geostationary satellite data to extend lookahead horizons for Dynamic Targeting missions, demonstrating up to a 41% performance improvement over traditional onboard-only planners, particularly in scenarios with sparsely distributed targets.

Akseli Kangaslahti, Itai Zilberstein, Alberto Candela, Steve Chien | 2026-03-10 | cs

ProtAlign: Contrastive learning paradigm for Sequence and structure alignment

The paper introduces ProtAlign, a contrastive learning framework that unifies protein sequence and structure representations into a shared embedding space, thereby enabling cross-modal retrieval and improving downstream tasks like function annotation and stability estimation.

Aditya Ranganath, Hasin Us Sami, Kowshik Thopalli, Bhavya Kailkhura, Wesam Sakla | 2026-03-10 | cs.LG

UWPD: A General Paradigm for Invisible Watermark Detection Agnostic to Embedding Algorithms

This paper introduces Universal Watermark Presence Detection (UWPD), a novel task for identifying invisible watermarks without prior algorithm knowledge, supported by the UniFreq-100K dataset and the Frequency Shield Network (FSNet) model that achieves superior zero-shot detection by dynamically amplifying high-frequency watermark signals while suppressing semantic content.

Xiang Ao, Yiling Du, Zidan Wang, Mengru Chen | 2026-03-10 | cs

Bi Directional Feedback Fusion for Activity Aware Forecasting of Indoor CO2 and PM2.5

This paper proposes a bi-directional feedback fusion framework that integrates human activity embeddings with dual-timescale temporal modules to significantly improve the accuracy and interpretability of indoor CO2 and PM2.5 forecasting compared to traditional data-driven models.

Harshala Gammulle, Lidia Morawska, Sridha Sridharan, Clinton Fookes | 2026-03-10 | cs.LG

Regression Models Meet Foundation Models: A Hybrid-AI Approach to Practical Electricity Price Forecasting

This paper introduces FutureBoosting, a hybrid-AI framework that enhances electricity price forecasting by integrating forecasted features from a frozen time series foundation model into a regression model, thereby achieving significant accuracy improvements over state-of-the-art baselines while maintaining interpretability.

Yunzhong Qiu, Binzhu Li, Hao Wei, Shenglin Weng, Chen Wang, Zhongyi Pei, Mingsheng Long, Jianmin Wang | 2026-03-10 | cs.LG

Safe Transformer: An Explicit Safety Bit For Interpretable And Controllable Alignment

The paper proposes Safe Transformer, a modular approach that inserts an explicit, interpretable safety bit into pre-trained language models to achieve controllable alignment and near-zero attack success rates through lightweight fine-tuning, addressing the opacity of traditional implicit safety methods.

Jingyuan Feng, Andrew Gambardella, Gouki Minegishi, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo | 2026-03-10 | cs.LG

Don't Freeze, Don't Crash: Extending the Safe Operating Range of Neural Navigation in Dense Crowds

This paper proposes a reinforcement learning approach for dense crowd navigation that achieves zero-shot generalization to higher crowd densities by combining density-invariant observation encoding, density-randomized training, and physics-informed proxemic reward shaping, thereby significantly outperforming existing learning-based and analytical methods in success rate and collision avoidance without freezing.

Jiefu Zhang, Yang Xu, Vaneet Aggarwal | 2026-03-10 | cs.LG

Calibrated Credit Intelligence: Shift-Robust and Fair Risk Scoring with Bayesian Uncertainty and Gradient Boosting

This paper introduces Calibrated Credit Intelligence (CCI), a deployment-oriented framework that integrates Bayesian uncertainty quantification, fairness-constrained gradient boosting, and shift-aware fusion to deliver accurate, reliable, and equitable credit risk scores that remain robust under temporal distribution shifts.

Srikumar Nayak | 2026-03-10 | cs.LG

Agent Hunt: Bounty Based Collaborative Autoformalization With LLM Agents

This paper presents "Agent Hunt," a decentralized autoformalization system for algebraic topology that utilizes a simulated bounty-based marketplace to coordinate multiple LLM agents in dynamically proposing, competing to prove, and iteratively refining formal statements within an Interactive Theorem Proving environment.

Chad E. Brown, Cezary Kaliszyk, Josef Urban | 2026-03-10 | cs

Rank-Factorized Implicit Neural Bias: Scaling Super-Resolution Transformer with FlashAttention

This paper proposes Rank-factorized Implicit Neural Bias (RIB), a novel positional bias mechanism that enables the use of hardware-efficient FlashAttention in Super-Resolution Transformers, allowing for significantly larger window sizes and training patches that achieve state-of-the-art performance (35.63 dB PSNR) while reducing training and inference times by 2.1$\times$ and 2.9$\times$, respectively.

Dongheon Lee, Seokju Yun, Jaegyun Im, Youngmin Ro | 2026-03-10 | cs.LG
