Dual-Metric Evaluation of Social Bias in Large Language Models: Evidence from an Underrepresented Nepali Cultural Context

This study evaluates seven state-of-the-art large language models in the underrepresented Nepali cultural context using a Dual-Metric Bias Assessment framework, revealing that while explicit agreement with biased statements is measurable, implicit generative bias is distinct, follows a non-linear relationship with temperature, and is poorly predicted by agreement metrics, thereby highlighting the critical need for culturally grounded datasets and evaluation strategies.

Ashish Pandey, Tek Raj Chhetri · 2026-03-10 · cs.CL

Gradient Iterated Temporal-Difference Learning

This paper introduces Gradient Iterated Temporal-Difference (GTD) learning, a novel algorithm that modifies iterated TD by computing gradients over moving targets to achieve the stability of gradient methods while matching the competitive learning speed of semi-gradient methods across diverse benchmarks like Atari games.

Théo Vincent, Kevin Gerhardt, Yogesh Tripathi, Habib Maraqten, Adam White, Martha White, Jan Peters, Carlo D'Eramo · 2026-03-10 · cs.LG

AI Steerability 360: A Toolkit for Steering Large Language Models

The paper introduces AI Steerability 360, an open-source, Hugging Face-native Python toolkit that provides a unified interface for composing, evaluating, and comparing diverse large language model steering methods across input, structural, state, and output control surfaces.

Erik Miehling, Karthikeyan Natesan Ramamurthy, Praveen Venkateswaran, Irene Ko, Pierre Dognin, Moninder Singh, Tejaswini Pedapati, Avinash Balakrishnan, Matthew Riemer, Dennis Wei, Inge Vejsbjerg, Elizabeth M. Daly, Kush R. Varshney · 2026-03-10 · cs.CL

SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans

The paper introduces SynPlanResearch-R1, a framework that synthesizes tool-use trajectories to encourage deeper exploration during supervised fine-tuning, thereby overcoming the limitations of reinforcement learning with verifiable rewards and significantly improving research agent performance across multiple benchmarks.

Hansi Zeng, Zoey Li, Yifan Gao, Chenwei Zhang, Xiaoman Pan, Tao Yang, Fengran Mo, Jiacheng Lin, Xian Li, Jingbo Shang · 2026-03-10 · cs.CL

Hospitality-VQA: Decision-Oriented Informativeness Evaluation for Vision-Language Models

This paper introduces a formal framework for "informativeness" and a corresponding hospitality-specific VQA dataset to evaluate Vision-Language Models, revealing that while current models struggle with decision-oriented reasoning, their performance improves significantly with modest domain-specific fine-tuning.

Jeongwoo Lee, Baek Duhyeong, Eungyeol Han, Soyeon Shin, Gukin Han, Seungduk Kim, Jaehyun Jeon, Taewoo Jeong · 2026-03-10 · cs.LG

CCR-Bench: A Comprehensive Benchmark for Evaluating LLMs on Complex Constraints, Control Flows, and Real-World Cases

This paper introduces CCR-Bench, a novel benchmark designed to rigorously evaluate large language models on complex, real-world industrial tasks involving entangled content-format requirements and intricate logical workflows, revealing significant performance gaps in even state-of-the-art models.

Xiaona Xue, Yiqiao Huang, Jiacheng Li, Yuanhang Zheng, Huiqi Miao, Yunfei Ma, Rui Liu, Xinbao Sun, Minglu Liu, Fanyu Meng, Chao Deng, Junlan Feng · 2026-03-10 · cs.CL

Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference

This paper introduces a particle filtering framework to rigorously analyze the accuracy-cost tradeoffs of parallel inference methods in large language models, establishing theoretical guarantees and identifying fundamental limits while demonstrating that sampling error alone does not fully predict final model accuracy.

Noah Golowich, Fan Chen, Dhruv Rohatgi, Raghav Singhal, Carles Domingo-Enrich, Dylan J. Foster, Akshay Krishnamurthy · 2026-03-10 · cs.LG

Designing probabilistic AI monsoon forecasts to inform agricultural decision-making

This paper presents a decision-theory framework and a blended AI-statistical forecasting system that successfully delivered skillful, tailored monsoon onset predictions to 38 million Indian farmers in 2025, enabling better agricultural decision-making under uncertainty.

Colin Aitken, Rajat Masiwal, Adam Marchakitus, Katherine Kowal, Mayank Gupta, Tyler Yang, Amir Jina, Pedram Hassanzadeh, William R. Boos, Michael Kremer · 2026-03-10 · cs.LG

EveryQuery: Zero-Shot Clinical Prediction via Task-Conditioned Pretraining over Electronic Health Records

EveryQuery is a novel electronic health record foundation model that achieves efficient, zero-shot clinical prediction by directly estimating outcome likelihoods through task-conditioned pretraining, outperforming computationally expensive autoregressive baselines, particularly for rare events, while currently facing limitations in complex disjunctive reasoning tasks.

Payal Chandak, Gregory Kondas, Isaac Kohane, Matthew McDermott · 2026-03-10 · cs