cs.AI papers | Gist.Science

AdAEM: An Adaptively and Automated Extensible Measurement of LLMs' Value Difference

This paper introduces AdAEM, a novel self-extensible evaluation framework that automatically generates adaptive test questions by probing the internal value boundaries of diverse LLMs to overcome the limitations of static benchmarks and provide more informative, distinguishable insights into models' value differences and alignment dynamics.

Jing Yao, Shitong Duan, Xiaoyuan Yi, Dongkuan Xu, Peng Zhang, Tun Lu, Ning Gu, Zhicheng Dou, Xing Xie2026-03-09🤖 cs.AI

ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge

The paper introduces ESGenius, the first comprehensive benchmark comprising a curated corpus of authoritative ESG documents and a rigorously validated question-answer dataset, which reveals that while large language models exhibit moderate zero-shot performance in sustainability domains, their accuracy significantly improves when grounded in retrieval-augmented generation (RAG) using the provided source materials.

Chaoyue He, Xin Zhou, Yi Wu + 9 more2026-03-09💬 cs.CL

KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes

The paper introduces KramaBench, a comprehensive benchmark featuring 104 real-world data-to-insight challenges across diverse domains, which reveals that current AI systems struggle to orchestrate end-to-end data pipelines over data lakes, achieving a maximum of only 55% accuracy despite strong performance in isolated tasks.

Eugenie Lai, Gerardo Vitagliano, Ziyu Zhang, Om Chabra, Sivaprasad Sudhir, Anna Zeng, Anton A. Zabreyko, Chenning Li, Ferdi Kossmann, Jialin Ding, Jun Chen, Markos Markakis, Matthew Russo, Weiyang Wang, Ziniu Wu, Michael J. Cafarella, Lei Cao, Samuel Madden, Tim Kraska2026-03-09🤖 cs.AI

VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs

This paper introduces VisioMath, a benchmark of 1,800 K-12 mathematics problems featuring visually similar diagrams, to reveal that current Large Multimodal Models struggle with fine-grained comparative reasoning due to image-text misalignment and to demonstrate that alignment-oriented strategies can significantly improve performance.

Can Li, Ying Liu, Ting Zhang, Mei Wang, Hua Huang2026-03-09🤖 cs.AI

Discerning What Matters: A Multi-Dimensional Assessment of Moral Competence in LLMs

This paper critiques existing evaluations of LLM moral competence for over-relying on simplified scenarios and proposes a novel five-dimensional framework that reveals models often outperform humans in structured tasks but significantly underperform when required to discern moral relevance from noisy information, suggesting current assessments substantially overestimate their true moral reasoning capabilities.

Daniel Kilov, Caroline Hendy, Secil Yanik Guyot, Aaron J. Snoswell, Seth Lazar2026-03-09🤖 cs.AI

ContextBench: Modifying Contexts for Targeted Latent Activation

This paper introduces ContextBench, a benchmark for evaluating methods that generate fluent inputs to trigger specific latent features in language models, and demonstrates that enhanced Evolutionary Prompt Optimization variants achieve state-of-the-art performance in balancing elicitation strength with linguistic fluency.

Robert Graham, Edward Stevinson, Leo Richter, Alexander Chia, Joseph Miller, Joseph Isaac Bloom2026-03-09🤖 cs.AI

Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts

The paper introduces Sysformer, a novel approach that safeguards frozen large language models by learning to adapt system prompts in the embedding space to significantly improve safety robustness against harmful inputs and jailbreaking attacks without requiring costly model fine-tuning.

Kartik Sharma, Yiqiao Jin, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar2026-03-09🤖 cs.AI

Iterative Quantum Feature Maps

The paper proposes Iterative Quantum Feature Maps (IQFMs), a hybrid quantum-classical framework that constructs deep architectures by iteratively connecting shallow, noise-resilient quantum feature maps with classically computed weights to mitigate hardware limitations and achieve performance comparable to classical neural networks without optimizing variational quantum parameters.

Nasa Matsumoto, Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima2026-03-09⚛️ quant-ph

SPARC: Concept-Aligned Sparse Autoencoders for Cross-Model and Cross-Modal Interpretability

SPARC introduces a novel framework that unifies concept representations across diverse AI architectures and modalities by enforcing global sparsity and cross-reconstruction loss, thereby creating a shared latent space that enables direct cross-model interpretability and applications like text-guided localization without manual alignment.

Ali Nasiri-Sarvi, Hassan Rivaz, Mahdi S. Hosseini2026-03-09🤖 cs.AI

Bridging MOOCs, Smart Teaching, and AI: A Decade of Evolution Toward a Unified Pedagogy

This paper proposes a unified instructional framework that integrates MOOCs, Smart Teaching, and AI into a coherent, teaching-driven pedagogy, formalizing them as a layered knowledge transformation model to maximize systemic educational potential through structured exposure, adaptive allocation, and efficiency amplification.

Bo Yuan, Jiazi Hu2026-03-09🤖 cs.AI

ExDD: Explicit Dual Distribution Learning for Surface Defect Detection via Diffusion Synthesis

The paper introduces ExDD, a novel framework for industrial surface defect detection that overcomes data scarcity and uniform outlier assumptions by explicitly modeling dual feature distributions via parallel memory banks and generating context-aware synthetic defects using latent diffusion models.

Muhammad Aqeel, Federico Leonardi, Francesco Setti2026-03-09🤖 cs.AI

A Multi-Agent System Enables Versatile Information Extraction from the Chemical Literature

This paper presents a multimodal large language model-based multi-agent system that significantly outperforms existing state-of-the-art methods in automatically extracting structured chemical information from diverse and complex literature graphics, thereby advancing AI-driven chemical research.

Yufan Chen, Ching Ting Leung, Bowen Yu, Jianwei Sun, Yong Huang, Linyan Li, Hao Chen, Hanyu Gao2026-03-09🤖 cs.AI

MAP: Mitigating Hallucinations in Large Vision-Language Models with Map-Level Attention Processing

This paper introduces MAP, a training-free decoding method that mitigates hallucinations in Large Vision-Language Models by interpreting hidden states as a 2D semantic map and employing layer-wise criss-cross attention and global-local logit fusion to aggregate widely distributed factual information for improved factual consistency.

Chenxi Li, Yichen Guo, Benfang Qian, Jinhao You, Kai Tang, Yaosong Du, Zonghao Zhang, Xiande Huang2026-03-09🤖 cs.AI

VLMQ: Token Saliency-Driven Post-Training Quantization for Vision-language Models

This paper introduces VLMQ, a post-training quantization framework tailored for vision-language models that leverages a gradient-driven importance factor to address visual over-representation and modality gaps, thereby achieving state-of-the-art performance across various model sizes and low-bit settings.

Yufei Xue, Yushi Huang, Jiawei Shao, Lunjie Zhu, Chi Zhang, Xuelong Li, Jun Zhang2026-03-09🤖 cs.AI

SGDFuse: SAM-Guided Diffusion Model for High-Fidelity Infrared and Visible Image Fusion

The paper proposes SGDFuse, a novel two-stage conditional diffusion model guided by Segment Anything Model (SAM) semantic masks, which achieves high-fidelity infrared and visible image fusion by leveraging explicit semantic priors to preserve key targets and minimize artifacts for superior downstream task performance.

Xiaoyang Zhang, jinjiang Li, Guodong Fan, Yakun Ju, Linwei Fan, Jun Liu, Alex C. Kot2026-03-09🤖 cs.AI

Handling Infinite Domain Parameters in Planning Through Best-First Search with Delayed Partial Expansions

This paper proposes a best-first search algorithm utilizing delayed partial expansions to explicitly treat control parameters as decision points within infinite domains, offering a complete and competitive alternative to existing constraint-based approaches for automated planning.

Ángel Aso-Mollar, Diego Aineto, Enrico Scala + 1 more2026-03-09⚡ eess

Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check

This paper introduces "Answer-Then-Check," a novel safety alignment method that enhances LLM robustness against jailbreak attacks by training models to generate direct answers internally and then critically evaluate their safety before responding, achieving superior protection with reduced over-refusal while maintaining general reasoning capabilities through the newly constructed 80K-sample ReSA dataset.

Chentao Cao, Xiaojun Xu, Bo Han, Hang Li2026-03-09🤖 cs.AI

Better Late Than Never: Meta-Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation

This paper addresses the inconsistency and structural biases in existing latency metrics for simultaneous speech-to-text translation by introducing a comprehensive meta-evaluation, proposing new metrics (YAAL and LongYAAL) and a resegmentation tool (SoftSegmenter), and implementing these solutions within the OmniSTEval toolkit to enable more reliable system assessments.

Peter Polák, Sara Papi, Luisa Bentivogli, Ondřej Bojar2026-03-09🤖 cs.AI

LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference

The paper introduces LikePhys, a training-free evaluation method using likelihood preferences to assess intuitive physics understanding in video diffusion models, demonstrating that current models show improving capabilities in physical reasoning as they scale despite challenges with complex dynamics.

Jianhao Yuan, Fabio Pizzati, Francesco Pinto, Lars Kunze, Ivan Laptev, Paul Newman, Philip Torr, Daniele De Martini2026-03-09🤖 cs.AI

Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation

Phys2Real is a real-to-sim-to-real reinforcement learning framework that enhances sim-to-real transfer for precise robotic manipulation by fusing vision-language model-inferred physical parameter priors with online interactive adaptation through uncertainty-aware ensemble estimation.

Maggie Wang, Stephen Tian, Aiden Swann, Ola Shorinwa, Jiajun Wu, Mac Schwager2026-03-09🤖 cs.AI

← Previous Next →