Context-Dependent Affordance Computation in Vision-Language Models
Through a large-scale study of Qwen-VL and LLaVA-1.5, this paper demonstrates that vision-language models exhibit significant context-dependent affordance drift: the affordances a model reports for the same scene vary substantially, both lexically and semantically, with the agentic persona given in the prompt. The authors argue that this motivates dynamic, query-dependent ontological projection in robotics rather than static world modeling.
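The following is a minimal sketch of how persona-conditioned affordance drift could be measured, not the paper's actual protocol. It assumes the VLM is wrapped as a simple callable; the persona list, the `toy_model` stub, and the choice of mean pairwise Jaccard distance as a lexical-drift metric are all illustrative assumptions (a semantic-drift metric would additionally compare embeddings of the responses).

```python
from itertools import combinations

def query_affordances(model, image, persona: str) -> set[str]:
    # Hypothetical affordance query: in the actual study the callable would
    # wrap Qwen-VL or LLaVA-1.5; here it is a stand-in for a VLM call.
    prompt = f"You are {persona}. List, comma-separated, the actions you could take with this object."
    response = model(image, prompt)
    return {tok.strip().lower() for tok in response.split(",")}

def lexical_drift(affordance_sets: list[set[str]]) -> float:
    # Lexical drift as mean pairwise Jaccard distance between the affordance
    # sets elicited under different personas (assumed metric, 0.0 = no drift).
    dists = []
    for a, b in combinations(affordance_sets, 2):
        union = a | b
        jaccard = len(a & b) / len(union) if union else 1.0
        dists.append(1.0 - jaccard)
    return sum(dists) / len(dists) if dists else 0.0

def toy_model(image, prompt):
    # Stub returning persona-dependent affordances, mimicking the drift
    # the paper reports; a real run would call the VLM on an image.
    if "chef" in prompt:
        return "cut, chop, slice"
    if "firefighter" in prompt:
        return "pry, break, cut"
    return "grasp, lift, poke"

personas = ["a chef", "a firefighter", "a robot arm with a two-finger gripper"]
sets_ = [query_affordances(toy_model, image=None, persona=p) for p in personas]
print(f"lexical affordance drift: {lexical_drift(sets_):.2f}")
```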