cs.AI papers | Gist.Science

SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models

This paper introduces SCAM, the largest and most diverse real-world dataset of typographic attack images, to evaluate and demonstrate the significant vulnerability of state-of-the-art multimodal foundation models to such attacks while providing empirical insights into how model architecture and training data influence robustness.

Justus Westerhoff, Erblina Purelku, Jakob Hackstein + 4 more | 2026-03-12 | 🤖 cs.AI

Offline Dynamic Inventory and Pricing Strategy: Addressing Censored and Dependent Demand

This paper proposes a novel data-driven framework using offline reinforcement learning and survival analysis to estimate optimal pricing and inventory control policies in sequential environments with censored and dependent demand, overcoming challenges like missing profit information and non-stationarity by approximating the problem as a high-order Markov decision process.

Korel Gundem, Zhengling Qi | 2026-03-12 | 📊 stat

Scalable Multi-Task Learning through Spiking Neural Networks with Adaptive Task-Switching Policy for Intelligent Autonomous Agents

The paper proposes SwitchMT, a novel methodology for scalable multi-task learning in resource-constrained autonomous agents that combines a Deep Spiking Q-Network with active dendrites and an adaptive task-switching policy to effectively mitigate task interference and outperform state-of-the-art methods on Atari games.

Rachmad Vidya Wicaksana Putra, Avaneesh Devkota, Muhammad Shafique | 2026-03-12 | 🤖 cs.AI

Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement

This systematic review introduces the emerging interdisciplinary field of LLM Psychometrics, which applies psychometric theories and instruments to develop comprehensive evaluation frameworks for measuring human-like psychological constructs in large language models, ultimately guiding the creation of more robust, human-centered AI systems.

Haoran Ye, Jing Jin, Yuhang Xie, Xin Zhang, Guojie Song | 2026-03-12 | 💬 cs.CL

REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning?

This paper introduces REI-Bench, the first benchmark for evaluating robot task planning under vague referring expressions, revealing that such vagueness significantly degrades performance and demonstrating that a task-oriented context cognition approach effectively mitigates this issue to improve accessibility for non-expert users.

Chenxi Jiang, Chuhao Zhou, Jianfei Yang | 2026-03-12 | 💬 cs.CL

Training with Pseudo-Code for Instruction Following

This paper proposes a training-time approach that fine-tunes Large Language Models using instruction-tuning data augmented with pseudo-code representations of natural language instructions, resulting in significant improvements in instruction-following reliability and overall reasoning performance across multiple benchmarks.

Prince Kumar, Rudra Murthy, Riyaz Bhat, Danish Contractor | 2026-03-12 | 💬 cs.CL

LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models

This paper presents a data-driven survey of 14,648 studies from 2022 to early 2025, revealing that research on the limitations of large language models (LLLMs) has surged to over 30% of all LLM-related work, with reasoning, generalization, and hallucination being the most prominent areas of focus.

Aida Kostikova, Zhipin Wang, Deidamea Bajri, Ole Pütz, Benjamin Paaßen, Steffen Eger | 2026-03-12 | 💬 cs.CL

Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments

This paper proposes a consistency-based abductive reasoning framework that integrates predictions from multiple pre-trained models at test time to mitigate performance degradation in novel environments, achieving significant improvements in accuracy and F1-score over individual models and standard ensembles by selecting a subset of predictions that maximizes coverage while minimizing logical inconsistencies.

Mario Leiva, Noel Ngu, Joshua Shay Kricheli, Aditya Taparia, Ransalu Senanayake, Paulo Shakarian, Nathaniel Bastian, John Corcoran, Gerardo Simari | 2026-03-12 | 🤖 cs.AI

Comparative Analysis of Modern Machine Learning Models for Retail Sales Forecasting

This study demonstrates that for retail sales forecasting characterized by intermittent demand and missing data, localized tree-based ensemble methods like XGBoost outperform sophisticated deep learning architectures, suggesting that aligning model selection with specific problem constraints is more critical than architectural complexity.

Luka Hobor, Mario Brcic, Lidija Polutnik, Ante Kapetanovic | 2026-03-12 | 🤖 cs.LG

Self-Improving Loops for Visual Robotic Planning

This paper proposes SILVR, a self-improving framework that enables visual robotic planners to iteratively enhance their performance on novel tasks by continuously updating an in-domain video model using self-collected trajectories, achieving robust results without requiring ground-truth reward functions or expert demonstrations.

Calvin Luo, Zilai Zeng, Mingxi Jia, Yilun Du, Chen Sun | 2026-03-12 | 🤖 cs.AI

Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions

The paper introduces ReLIFT, a novel training framework that interleaves reinforcement learning with online supervised fine-tuning on challenging questions, enabling large language models to acquire new knowledge and reasoning patterns beyond their original capabilities while achieving superior performance with significantly less demonstration data.

Lu Ma, Hao Liang, Meiyi Qiang, Lexiang Tang, Xiaochen Ma, Zhen Hao Wong, Junbo Niu, Chengyu Shen, Runming He, Yanhao Li, Bin Cui, Wentao Zhang | 2026-03-12 | 🤖 cs.AI

Differential Privacy in Machine Learning: A Survey from Symbolic AI to LLMs

This survey provides a comprehensive overview of differential privacy in machine learning, tracing its theoretical evolution from symbolic AI to large language models, examining integration methods for privacy-preserving training, and outlining practical evaluation techniques.

Francisco Aguilera-Martínez, Fernando Berzal | 2026-03-12 | 🤖 cs.AI

Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

This paper introduces Locality-aware Parallel Decoding (LPD), a novel framework combining flexible parallelized autoregressive modeling and locality-aware generation ordering to significantly accelerate autoregressive image generation by reducing inference steps and latency while maintaining high-quality results.

Zhuoyang Zhang, Luke J. Huang, Chengyue Wu, Shang Yang, Kelly Peng, Yao Lu, Song Han | 2026-03-12 | 🤖 cs.AI

Technological folie à deux: Feedback Loops Between AI Chatbots and Mental Illness

This paper argues that the interaction between human cognitive biases and AI chatbot behaviors like sycophancy creates dangerous feedback loops that can destabilize beliefs and exacerbate mental illness, necessitating coordinated interventions across clinical, technical, and regulatory domains.

Sebastian Dohnány, Zeb Kurth-Nelson, Eleanor Spens, Lennart Luettgau, Alastair Reid, Iason Gabriel, Christopher Summerfield, Murray Shanahan, Matthew M Nour | 2026-03-12 | 🧬 q-bio

What Makes Code Generation Ethically Sourced?

This paper introduces the novel concept of Ethically Sourced Code Generation (ES-CodeGen) as a framework for managing the entire lifecycle of code generation models through ethical and sustainable practices, establishing an 11-dimension taxonomy and identifying key consequences through a comprehensive literature review and practitioner survey.

Zhuolin Xu, Chenglin Li, Qiushi Li, Shin Hwei Tan | 2026-03-12 | 🤖 cs.AI

IntrinsicWeather: Controllable Weather Editing in Intrinsic Space

IntrinsicWeather is a diffusion-based framework that achieves controllable weather editing by decomposing images into intrinsic maps (geometry, material, and lighting) for enhanced spatial control and utilizing CLIP-space interpolation for fine-grained weather manipulation, outperforming existing methods on both synthetic and real-world datasets.

Yixin Zhu, Zuo-Liang Zhu, Jian Yang + 3 more | 2026-03-12 | 🤖 cs.AI

Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference

This paper reveals that the Key-Value (KV) cache used to accelerate Large Language Model inference is vulnerable to privacy attacks that allow attackers to reconstruct sensitive user inputs, and it proposes KV-Cloak, a lightweight and efficient obfuscation defense that effectively prevents such leakage without compromising model accuracy or performance.

Zhifan Luo, Shuo Shao, Su Zhang, Lijing Zhou, Yuke Hu, Chenxu Zhao, Zhihao Liu, Zhan Qin | 2026-03-12 | 💬 cs.CL

The Yokai Learning Environment: Tracking Beliefs Over Space and Time

This paper introduces the Yokai Learning Environment (YLE), a new open-source benchmark for zero-shot coordination that overcomes the saturation of the Hanabi Learning Environment by requiring agents to track moving cards and reason under ambiguous hints, thereby revealing that current state-of-the-art methods fail to maintain consistent internal models when paired with unseen partners.

Constantin Ruhdorfer, Matteo Bortoletto, Johannes Forkel, Jakob Foerster, Andreas Bulling | 2026-03-12 | 🤖 cs.AI

From Next Token Prediction to (STRIPS) World Models

This paper investigates whether next-token prediction can learn symbolic STRIPS world models for planning, finding that while a specialized STRIPS Transformer offers theoretical alignment, a standard transformer with stick-breaking attention achieves superior training accuracy and generalization, enabling effective planning across unseen states and goals.

Carlos Núñez-Molina, Vicenç Gómez, Hector Geffner | 2026-03-12 | 🤖 cs.AI

Global Minimizers of Sigmoid Contrastive Loss

This paper theoretically characterizes the global minimizers of sigmoid contrastive loss as $(\mathsf{m}, \mathsf{b}_{\mathsf{rel}})$-Constellations, providing a rigorous explanation for the success of SigLIP models, the origin of the modality gap, and the necessary dimensionality for high-quality representations while proposing an improved reparameterization for training dynamics.

Kiril Bangachev, Guy Bresler, Iliyas Noman, Yury Polyanskiy | 2026-03-12 | 🤖 cs.LG
