VSPrefill: Vertical-Slash Sparse Attention with Lightweight Indexing for Long-Context Prefilling
VSPrefill is a lightweight, training-efficient sparse attention mechanism for long-context prefilling. It exploits vertical-slash structural patterns in attention maps, combined with adaptive thresholding for index selection, to achieve linear complexity, delivering a 4.95x speedup while preserving 98.35% of full-attention accuracy at 128k context length.
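The source does not show VSPrefill's implementation, but the vertical-slash idea can be illustrated with a minimal sketch: given cheaply estimated attention scores (e.g. from a pooled low-resolution pass), keep only high-mass vertical stripes (key columns) and slash stripes (fixed diagonal offsets), choosing how many via an adaptive mass-coverage threshold. All names, the `tau` coverage parameters, and the selection heuristic below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def select_by_threshold(mass, tau):
    # Adaptive thresholding (assumed heuristic): pick the smallest index
    # set whose cumulative mass covers a tau fraction of the total.
    order = np.argsort(mass)[::-1]
    cum = np.cumsum(mass[order])
    k = int(np.searchsorted(cum, tau * cum[-1])) + 1
    return order[:k]

def vertical_slash_mask(scores, tau_v=0.9, tau_s=0.9):
    # Build a causal vertical-slash sparsity mask from estimated
    # attention scores of shape (L, L).
    L = scores.shape[0]
    causal = np.tril(np.ones((L, L), dtype=bool))
    masked = np.where(causal, scores, 0.0)

    # Vertical pattern: key columns that attract high total mass.
    col_mass = masked.sum(axis=0)
    v_idx = select_by_threshold(col_mass, tau_v)

    # Slash pattern: diagonals at fixed offsets d = i - j with high mass.
    diag_mass = np.array([np.trace(masked, offset=-d) for d in range(L)])
    s_idx = select_by_threshold(diag_mass, tau_s)

    mask = np.zeros((L, L), dtype=bool)
    mask[:, v_idx] = True                 # keep selected vertical stripes
    for d in s_idx:                       # keep selected slash stripes
        i = np.arange(d, L)
        mask[i, i - d] = True
    return mask & causal
```

Because only the selected stripes are materialized, the number of attended positions grows roughly with the number of stripes rather than quadratically with context length, which is the structural source of the claimed speedup.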