When Scaling Fails: Network and Fabric Effects on Distributed GPU Training Performance
This paper presents an empirical study showing that network topology, congestion dynamics, and GPU locality frequently cause unpredictable scaling failures in distributed GPU training, and argues that system builders should adopt concrete diagnostic principles to address these often-overlooked fabric-level bottlenecks.