cs.AI papers | Gist.Science

Lost in Stories: Consistency Bugs in Long Story Generation by LLMs

This paper introduces ConStory-Bench, a new benchmark with 2,000 prompts and a detailed error taxonomy, alongside the ConStory-Checker automated pipeline, to systematically evaluate and analyze the prevalence and patterns of consistency errors in long-form story generation by Large Language Models.

Junjie Li, Xinrui Guo, Yuhao Wu, Roy Ka-Wei Lee, Hongzhi Li, Yutao Xie | 2026-03-09 | cs.AI

Reference-guided Policy Optimization for Molecular Optimization via LLM Reasoning

This paper introduces Reference-guided Policy Optimization (RePO), a novel framework that combines reinforcement learning with verifiable rewards and supervised reference guidance to effectively balance exploration and exploitation in molecular optimization tasks where only single-reference data is available, thereby outperforming existing SFT and RLVR baselines.

Xuan Li, Zhanke Zhou, Zongze Li, Jiangchao Yao, Yu Rong, Lu Zhang, Bo Han | 2026-03-09 | cs.AI

LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis

LUMINA is an LLM-driven framework that enhances GPU architecture exploration for AI workloads by automatically extracting design rules and performing bottleneck analysis; at minimal search cost, it achieves significantly higher efficiency and better performance-area trade-offs than existing methods.

Tao Zhang, Rui Ma, Shuotao Xu, Peng Cheng, Yongqiang Xiong | 2026-03-09 | cs.AI

The World Won't Stay Still: Programmable Evolution for Agent Benchmarks

This paper introduces ProEvolve, a graph-based framework that enables programmable and scalable evolution of agent environments through graph transformations, addressing the limitations of static benchmarks by better evaluating agents' adaptability to real-world dynamics.

Guangrui Li, Yaochen Xie, Yi Liu, Ziwei Dong, Xingyuan Pan, Tianqi Zheng, Jason Choi, Michael J. Morais, Binit Jha, Shaunak Mishra, Bingrou Zhou, Chen Luo, Monica Xiao Cheng, Dawn Song | 2026-03-09 | cs.AI

CORE-Seg: Reasoning-Driven Segmentation for Complex Lesions via Reinforcement Learning

This paper introduces CORE-Seg, a reinforcement learning-driven framework that integrates a Semantic-Guided Prompt Adapter with a progressive SFT-to-GRPO training strategy to bridge the gap between visual segmentation and cognitive reasoning for complex medical lesions, achieving state-of-the-art performance on the newly proposed ComLesion-14K Chain-of-Thought benchmark.

Yuxin Xie, Yuming Chen, Yishan Yang, Yi Zhou, Tao Zhou, Zhen Zhao, Jiacheng Liu, Huazhu Fu | 2026-03-09 | cs.AI

DeepFact: Co-Evolving Benchmarks and Agents for Deep Research Factuality

This paper introduces DeepFact, a framework that addresses the brittleness of static factuality benchmarks for deep research reports by proposing an Evolving Benchmarking via Audit-then-Score (AtS) methodology, which significantly improves expert verification accuracy and enables the development of a high-performing document-level verification agent.

Yukun Huang, Leonardo F. R. Ribeiro, Momchil Hardalov, Bhuwan Dhingra, Markus Dreyer, Venkatesh Saligrama | 2026-03-09 | cs.AI

Stock Market Prediction Using Node Transformer Architecture Integrated with BERT Sentiment Analysis

This paper proposes an integrated framework combining a node transformer architecture with BERT-based sentiment analysis to model stock market graphs and social media sentiment, demonstrating superior forecasting accuracy (0.80% MAPE) and directional precision compared to traditional ARIMA and LSTM models across 20 S&P 500 stocks from 1982 to 2025.

Mohammad Al Ridhawi, Mahtab Haj Ali, Hussein Al Osman | 2026-03-09 | cs.AI

BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation

This paper introduces BlackMirror, a novel black-box, training-free framework that detects backdoored text-to-image models by identifying and verifying the stability of partial semantic deviations between instructions and generated images, overcoming the limitations of existing image-similarity-based methods against diverse backdoor attacks.

Feiran Li, Qianqian Xu, Shilong Bao, Zhiyong Yang, Xilin Zhao, Xiaochun Cao, Qingming Huang | 2026-03-09 | cs.AI

RAC: Rectified Flow Auto Coder

The paper introduces RAC (Rectified Flow Auto Coder), a novel architecture that replaces traditional VAEs by leveraging rectified flow for multi-step, bidirectional inference, thereby achieving superior reconstruction and generation quality with significantly reduced parameters and computational cost.

Sen Fang, Yalin Feng, Yanxin Zhang, Dimitris N. Metaxas | 2026-03-09 | cs.AI

Addressing the Ecological Fallacy in Larger LMs with Human Context

This paper shows that modeling an author's language context through the HuLM task, particularly during fine-tuning (HuFT) or continued pre-training, addresses the ecological fallacy and significantly improves the performance of an 8B Llama model across multiple downstream tasks compared to standard training methods.

Nikita Soni, Dhruv Vijay Kunjadiya, Pratham Piyush Shah, Dikshya Mohanty, H. Andrew Schwartz, Niranjan Balasubramanian | 2026-03-09 | cs.AI

Facial Expression Recognition Using Residual Masking Network

This paper proposes a novel Residual Masking Network that integrates a Deep Residual Network with a U-Net-like segmentation architecture to refine feature maps via an attention mechanism, achieving state-of-the-art accuracy on the FER2013 and VEMO facial expression recognition datasets.

Luan Pham, The Huynh Vu, Tuan Anh Tran | 2026-03-09 | cs.AI

XAI for Coding Agent Failures: Transforming Raw Execution Traces into Actionable Insights

This paper presents a systematic explainable AI framework that transforms raw coding agent execution traces into structured, visual, and natural language explanations, significantly accelerating root cause identification and improving fix accuracy for both technical and non-technical users compared to raw traces or ad-hoc model explanations.

Arun Joshi | 2026-03-09 | cs.AI

Energy-Driven Adaptive Visual Token Pruning for Efficient Vision-Language Models

The paper proposes E-AdaPrune, an energy-driven adaptive token pruning framework that dynamically allocates visual token budgets based on spectral energy to improve Vision-Language Model efficiency and performance without adding learnable parameters or significant latency.

Jialuo He, Huangxun Chen | 2026-03-09 | cs.AI

Who We Are, Where We Are: Mental Health at the Intersection of Person, Situation, and Large Language Models

This paper proposes an interpretable modeling approach that integrates person-level psychological traits with situational context features derived from social media data to predict dynamic mental well-being, demonstrating that theory-driven methods offer competitive performance and greater human-understandable insights compared to standard language model embeddings.

Nikita Soni, August Håkan Nilsson, Syeda Mahwish, Vasudha Varadarajan, H. Andrew Schwartz, Ryan L. Boyd | 2026-03-09 | cs.AI

Domain-Adaptive Model Merging across Disconnected Modes

The paper introduces DMM, a data-free framework that merges highly divergent domain-specific models by first consolidating similar ones and then refining the result with synthesized pseudo-data to achieve state-of-the-art performance across unimodal and multimodal benchmarks without requiring centralized data.

Junming Liu, Yusen Zhang, Rongchao Zhang, Wenkai Zhu, Tian Wu | 2026-03-09 | cs.AI

Skeleton-to-Image Encoding: Enabling Skeleton Representation Learning via Vision-Pretrained Models

This paper introduces Skeleton-to-Image Encoding (S2I), a novel method that transforms heterogeneous 3D skeleton sequences into standardized image-like formats to leverage powerful vision-pretrained models for effective self-supervised skeleton representation learning and cross-modal action recognition.

Siyuan Yang, Jun Liu, Hao Cheng, Chong Wang, Shijian Lu, Hedvig Kjellstrom, Weisi Lin, Alex C. Kot | 2026-03-09 | cs.AI

Imagine How To Change: Explicit Procedure Modeling for Change Captioning

The paper introduces ProCap, a novel framework that improves change captioning by reformulating static image comparison into dynamic procedure modeling through a two-stage design that learns latent change dynamics from sparse keyframes and utilizes learnable procedure queries to generate temporally coherent descriptions of how changes occur.

Jiayang Sun, Zixin Guo, Min Cao, Guibo Zhu, Jorma Laaksonen | 2026-03-09 | cs.AI

An Interactive Multi-Agent System for Evaluation of New Product Concepts

This paper proposes an automated, LLM-based multi-agent system that utilizes retrieval-augmented generation, real-time search, and fine-tuned specialized agents to objectively evaluate new product concepts on technical and market feasibility, demonstrating results consistent with senior industry experts.

Bin Xuan, Ruo Ai, Hakyeon Lee | 2026-03-09 | cs.AI

Technical Report: Automated Optical Inspection of Surgical Instruments

This technical report details a collaboration with industry leaders in Pakistan's Sialkot surgical cluster to develop an Automated Optical Inspection system using deep learning models (YOLOv8, ResNet-152, and EfficientNet-b4) on a new dataset of 4,414 images to detect manufacturing defects in surgical instruments, thereby enhancing patient safety and manufacturing quality.

Zunaira Shafqat, Atif Aftab Ahmed Jilani, Qurrat Ul Ain | 2026-03-09 | cs.AI

TADPO: Reinforcement Learning Goes Off-road

This paper introduces TADPO, a novel reinforcement learning framework that extends Proximal Policy Optimization with off-policy teacher guidance and on-policy student exploration to enable zero-shot sim-to-real, high-speed autonomous driving on full-scale off-road vehicles navigating complex, unmapped terrain.

Zhouchonghao Wu, Raymond Song, Vedant Mundheda, Luis E. Navarro-Serment, Christof Schoenborn, Jeff Schneider | 2026-03-09 | cs.AI
