cs.LG papers | Gist.Science

AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications

This paper introduces AMA-Bench, a novel benchmark designed to evaluate long-horizon memory in agentic applications using real-world and synthetic machine-generated trajectories, and proposes AMA-Agent, a causality-driven memory system that significantly outperforms existing baselines by addressing the limitations of current similarity-based retrieval methods.

Yujie Zhao, Boqin Yuan, Junbo Huang + 9 more | 2026-03-05 | cs.AI

Causal Identification from Counterfactual Data: Completeness and Bounding Results

This paper introduces the CTFIDU+ algorithm to establish the completeness of identifying counterfactual queries from physically realizable Layer 3 data, defines the fundamental limits of exact causal inference in this setting, and derives novel analytic bounds for non-identifiable quantities that are empirically shown to be tighter with counterfactual data.

Arvind Raghavan, Elias Bareinboim | 2026-03-05 | cs.AI

CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction

This paper addresses the evaluation gap in music generation by introducing CMI-RewardBench, a comprehensive ecosystem comprising large-scale datasets, a unified benchmark, and efficient reward models to evaluate and improve music generation under complex compositional multimodal instructions.

Yinghao Ma, Haiwen Xia, Hewei Gao + 9 more | 2026-03-05 | cs.AI

Causal Circuit Tracing Reveals Distinct Computational Architectures in Single-Cell Foundation Models: Inhibitory Dominance, Biological Coherence, and Cross-Model Convergence

This study introduces causal circuit tracing to reveal that distinct single-cell foundation models (Geneformer and scGPT) share conserved computational architectures characterized by inhibitory dominance and biological coherence, with cross-model consensus identifying disease-associated domains that are validated by CRISPRi as reflecting co-expression rather than causal encoding.

Ihor Kendiukhov | 2026-03-05 | cs.LG

From Variance to Invariance: Qualitative Content Analysis for Narrative Graph Annotation

This paper introduces a qualitative content analysis-based framework for annotating economic narratives as directed acyclic graphs and demonstrates through a factorial experiment that locally-constrained representations and overlap-based metrics significantly improve inter-annotator agreement by reducing human label variation.

Junbo Huang, Max Weinig, Ulrich Fritsche + 1 more | 2026-03-05 | cs.AI

Rich Insights from Cheap Signals: Efficient Evaluations via Tensor Factorization

This paper proposes a sample-efficient tensor factorization model that combines cheap automated ratings with a small set of human labels to enable fine-grained, accurate, and confidence-interval-backed evaluation of generative models at the prompt level.

Felipe Maia Polo, Aida Nematzadeh, Virginia Aglietti + 2 more | 2026-03-05 | cs.AI

Federated Inference: Toward Privacy-Preserving Collaborative and Incentivized Model Serving

This paper establishes Federated Inference as a distinct collaborative paradigm for privacy-preserving model serving by formalizing its core requirements, analyzing structural trade-offs, and identifying system-level challenges that differentiate it from federated learning and classical ensemble methods.

Jungwon Seo, Ferhat Ozgur Catak, Chunming Rong + 1 more | 2026-03-05 | cs.AI

Structured vs. Unstructured Pruning: An Exponential Gap

This paper demonstrates an exponential gap between structured and unstructured pruning within the Strong Lottery Ticket Hypothesis, proving that while unstructured weight pruning requires only logarithmic overparameterization to approximate a target neuron, structured neuron pruning necessitates linear overparameterization.

Davide Ferre', Frédéric Giroire, Frederik Mallmann-Trenn + 1 more | 2026-03-05 | cs.AI

A Unified Revisit of Temperature in Classification-Based Knowledge Distillation

This paper presents a unified study that systematically investigates the interactions between the temperature parameter and various training components in knowledge distillation, offering practical guidance for selecting optimal temperature values to improve student performance.

Logan Frank, Jim Davis | 2026-03-05 | cs.LG
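
For readers unfamiliar with the temperature parameter this summary refers to, the following is a minimal sketch of the standard temperature-scaled softmax used in classification-based distillation. It is not taken from the paper; the function name `softmax_with_temperature` and the example logits are illustrative assumptions.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for a three-class problem.
teacher_logits = [4.0, 1.0, 0.5]
sharp = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=4.0)
```

Raising the temperature flattens the distribution, so `soft` puts less mass on the top class than `sharp` and more on the others. Those softened teacher probabilities are what the student is typically trained to match.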

Causal Learning Should Embrace the Wisdom of the Crowd

This paper proposes a new paradigm for causal discovery that leverages crowdsourcing, expert elicitation, and LLM-based simulation to aggregate fragmented knowledge from multiple agents into a comprehensive global causal structure unattainable by any single entity.

Ryan Feng Lin, Yuantao Wei, Huiling Liao + 2 more | 2026-03-05 | cs.LG

Toward Early Quality Assessment of Text-to-Image Diffusion Models

This paper introduces Probe-Select, a plug-in module that predicts final image quality from early denoising activations to enable efficient early termination of unpromising seeds, thereby reducing sampling costs by over 60% while improving the quality of retained images in text-to-image generation.

Huanlei Guo, Hongxin Wei, Bingyi Jing | 2026-03-05 | cs.LG

Learning in Markov Decision Processes with Exogenous Dynamics

This paper introduces a reinforcement learning framework for Markov Decision Processes with exogenous dynamics that leverages the independence of certain state components from agent actions to achieve information-theoretically optimal regret bounds and significantly improved sample efficiency compared to standard methods.

Davide Maran, Davide Salaorni, Marcello Restelli | 2026-03-05 | cs.LG

AriadneMem: Threading the Maze of Lifelong Memory for LLM Agents

AriadneMem is a structured memory system for long-horizon LLM agents that employs a decoupled two-phase pipeline of entropy-aware filtering, conflict-aware coarsening, and algorithmic bridge discovery to significantly improve multi-hop reasoning accuracy and efficiency while drastically reducing context usage and runtime.

Wenhui Zhu, Xiwen Chen, Zhipeng Wang + 11 more | 2026-03-05 | cs.AI

Fine-Tuning and Evaluating Conversational AI for Agricultural Advisory

This paper presents a hybrid LLM architecture and evaluation framework (DG-EVAL) that combines supervised fine-tuning on expert-curated agricultural facts with a safety-aware stitching layer to deliver accurate, culturally appropriate, and cost-effective conversational advisory for smallholder farmers in India.

Sanyam Singh, Naga Ganesh, Vineet Singh + 8 more | 2026-03-05 | cs.AI

TTSR: Test-Time Self-Reflection for Continual Reasoning Improvement

This paper proposes TTSR, a test-time self-evolving framework where a single language model alternates between student and teacher roles to analyze reasoning failures and generate targeted variant questions, thereby enabling stable and continual improvement in reasoning performance on challenging benchmarks.

Haoyang He, Zihua Rong, Liangjie Zhao + 3 more | 2026-03-05 | cs.AI

From Exact Hits to Close Enough: Semantic Caching for LLM Embeddings

This paper addresses the NP-hard challenge of optimal offline semantic caching for LLM embeddings by proposing polynomial-time heuristics and novel online policies that leverage recency, frequency, and locality to improve response speed, reduce costs, and enhance semantic accuracy.

Dvir David Biton, Roy Friedman | 2026-03-05 | cs.AI
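
To illustrate the "close enough" idea in this summary, here is a minimal sketch of a semantic cache: a lookup counts as a hit when a stored embedding is within a cosine-similarity threshold of the query. It is a generic illustration with plain least-recently-used eviction, not the paper's heuristics or online policies; the class name `SemanticCache`, the threshold value, and the toy two-dimensional embeddings are assumptions.

```python
import math
from collections import OrderedDict

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Hypothetical cache: approximate lookup by similarity, LRU eviction."""

    def __init__(self, capacity=2, threshold=0.9):
        self.capacity = capacity
        self.threshold = threshold
        self._items = OrderedDict()  # key -> (embedding, value)

    def get(self, embedding):
        # Linear scan for the most similar entry at or above the threshold.
        best_key, best_sim = None, self.threshold
        for key, (emb, _) in self._items.items():
            sim = cosine(embedding, emb)
            if sim >= best_sim:
                best_key, best_sim = key, sim
        if best_key is None:
            return None  # miss: nothing is close enough
        self._items.move_to_end(best_key)  # mark as most recently used
        return self._items[best_key][1]

    def put(self, key, embedding, value):
        self._items[key] = (embedding, value)
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

cache = SemanticCache(capacity=2, threshold=0.9)
cache.put("q1", [1.0, 0.0], "answer-1")
hit = cache.get([0.99, 0.05])   # nearly parallel to q1's embedding
miss = cache.get([0.0, 1.0])    # orthogonal to q1's embedding
```

The threshold trades hit rate against semantic accuracy: a lower value serves more queries from the cache but risks returning a result for a query that only loosely matches.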

Knowledge Graph and Hypergraph Transformers with Repository-Attention and Journey-Based Role Transport

This paper proposes a dual-stream transformer architecture that unifies language and structured data processing by encoding knowledge graphs and hypergraphs into a separate key-value repository, where a journey-based role transport mechanism enables the language model to attend over structured instances while maintaining a clear, inspectable separation between linguistic context and structured knowledge.

Mahesh Godavarti | 2026-03-05 | cs.AI

Draft-Conditioned Constrained Decoding for Structured Generation in LLMs

The paper proposes Draft-Conditioned Constrained Decoding (DCCD), a training-free two-step inference method that decouples semantic planning from structural enforcement to significantly improve the accuracy and parameter efficiency of structured generation in large language models by mitigating the distortions caused by hard constraints.

Avinash Reddy, Thayne T. Walker, James S. Ide + 1 more | 2026-03-05 | cs.AI

Entropic-Time Inference: Self-Organizing Large Language Model Decoding Beyond Attention

This paper proposes "entropic-time inference," a novel paradigm that replaces linear token-based decoding with a self-organizing, entropy-driven architecture to dynamically allocate computational resources, optimize attention sparsification, and adapt sampling temperatures for more efficient and intelligent LLM generation.

Andrew Kiruluta | 2026-03-05 | cs.LG

Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO

This paper proposes CoIPO, a contrastive learning-based method that enhances the intrinsic robustness of large language models against prompt noise by minimizing the discrepancy between clean and noisy prompt outputs, demonstrating superior performance on the newly introduced NoisyPromptBench benchmark.

Xin Yang, Letian Li, Abudukelimu Wuerkaixi + 5 more | 2026-03-05 | cs.AI
