SLO-Aware Compute Resource Allocation for Prefill-Decode Disaggregated LLM Inference
This paper proposes a hybrid methodology that combines theoretical performance modeling with empirical benchmarking to determine the optimal allocation of hardware resources between the prefill and decode stages of disaggregated Large Language Model inference, subject to throughput targets, latency SLOs, and request characteristics such as prompt and output lengths.
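To make the allocation problem concrete, the following is a minimal sketch (not the paper's actual method) of how an analytic throughput model calibrated by benchmarks might drive the resource split. All names, capacity constants, and the utilization-based latency model are illustrative assumptions: per-GPU prefill and decode token rates stand in for benchmarked values, TTFT and TPOT serve as the SLO targets, and a brute-force search finds the cheapest GPU split that satisfies both.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class Workload:
    request_rate: float  # requests per second
    prompt_len: int      # average prompt tokens per request
    output_len: int      # average generated tokens per request

@dataclass
class SLO:
    ttft_ms: float       # time-to-first-token target
    tpot_ms: float       # time-per-output-token target

# Hypothetical per-GPU capacities, as would be obtained from benchmarking:
# prefill throughput in prompt tokens/s, decode throughput in output tokens/s.
# These numbers are placeholders, not measurements.
PREFILL_TOKENS_PER_SEC = 50_000.0
DECODE_TOKENS_PER_SEC = 4_000.0

def prefill_latency_ms(wl: Workload, n_prefill: int) -> float:
    """Estimated TTFT: prompt service time inflated by a crude
    utilization penalty as the offered load nears capacity."""
    capacity = n_prefill * PREFILL_TOKENS_PER_SEC
    offered = wl.request_rate * wl.prompt_len
    if offered >= capacity:
        return float("inf")  # overloaded: SLO unattainable
    service_ms = wl.prompt_len / PREFILL_TOKENS_PER_SEC * 1000.0
    return service_ms / (1.0 - offered / capacity)

def decode_latency_ms(wl: Workload, n_decode: int) -> float:
    """Estimated TPOT: per-token decode time under the offered decode load."""
    capacity = n_decode * DECODE_TOKENS_PER_SEC
    offered = wl.request_rate * wl.output_len
    if offered >= capacity:
        return float("inf")
    per_token_ms = 1000.0 / DECODE_TOKENS_PER_SEC
    return per_token_ms / (1.0 - offered / capacity)

def cheapest_allocation(wl: Workload, slo: SLO, max_gpus: int = 64):
    """Search prefill/decode GPU splits and return the smallest total
    (total, n_prefill, n_decode) meeting both SLOs, or None."""
    best = None
    for n_prefill, n_decode in product(range(1, max_gpus + 1), repeat=2):
        if (prefill_latency_ms(wl, n_prefill) <= slo.ttft_ms
                and decode_latency_ms(wl, n_decode) <= slo.tpot_ms):
            total = n_prefill + n_decode
            if best is None or total < best[0]:
                best = (total, n_prefill, n_decode)
    return best

if __name__ == "__main__":
    wl = Workload(request_rate=20.0, prompt_len=1024, output_len=256)
    slo = SLO(ttft_ms=500.0, tpot_ms=50.0)
    print(cheapest_allocation(wl, slo))  # e.g. (3, 1, 2)
```

In this toy formulation the TTFT and TPOT constraints decouple, so the two stage sizes could be searched independently; the joint brute-force search is kept only as a stand-in for richer models in which the stages interact (e.g., through KV-cache transfer or shared interconnect bandwidth).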