cs.LG papers | Gist.Science

TrainDeeploy: Hardware-Accelerated Parameter-Efficient Fine-Tuning of Small Transformer Models at the Extreme Edge

TrainDeeploy is a framework that enables parameter-efficient on-device fine-tuning of both CNN and Transformer models on ultra-low-power, memory-constrained RISC-V SoCs, achieving significant reductions in memory usage and computational overhead while supporting end-to-end training at the extreme edge.

Run Wang, Victor J. B. Jung, Philip Wiese, Francesco Conti, Alessio Burrello, Luca Benini | Wed, 11 Ma | 🤖 cs.LG

You Didn't Have to Say It like That: Subliminal Learning from Faithful Paraphrases

This paper demonstrates that language models can covertly acquire behavioral traits from a teacher model through "subliminal learning" on faithful paraphrases, where the student adopts the teacher's preferences even when the paraphrased content is semantically unrelated or explicitly contradicts those preferences, rendering content-based inspection ineffective.

Isaia Gisler (ETH Zürich), Zhonghao He (University of Cambridge), Tianyi Qiu (Peking University) | Wed, 11 Ma | 🤖 cs.LG

Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation

This paper introduces Efficient Draft Adaptation (EDA), a parameter- and data-efficient framework that restores speculative decoding performance on fine-tuned target models through a decoupled architecture, data regeneration strategy, and sample selection mechanism, achieving superior acceptance lengths with significantly reduced training costs compared to full retraining.

Luxi Lin, Zhihang Lin, Zhanpeng Zeng, Yuhao Chen, Qingyu Zhang, Jixiang Luo, Xuelong Li, Rongrong Ji | Wed, 11 Ma | 🤖 cs.AI

What Do We Care About in Bandits with Noncompliance? BRACE: Bandits with Recommendations, Abstention, and Certified Effects

This paper introduces BRACE, a parameter-free algorithm for multi-armed bandits with noncompliance that simultaneously optimizes recommendation welfare and treatment learning by performing certified instrumental variable inversion only when identification is strong, otherwise providing honest structural intervals to navigate the trade-offs between mediated and direct-control regimes.

Nicolás Della Penna | Wed, 11 Ma | 🤖 cs.LG

Compiler-First State Space Duality and Portable $O(1)$ Autoregressive Caching for Inference

This paper demonstrates that Mamba-2's state space duality can be implemented entirely using standard XLA primitives without custom kernels, achieving portable, host-synchronization-free $O(1)$ autoregressive caching with high performance across CPU, NVIDIA GPU, and Google Cloud TPU hardware.

Cosmo Santoni | Wed, 11 Ma | 🤖 cs.AI

Learning Bayesian and Markov Networks with an Unreliable Oracle

This paper investigates constraint-based structure learning for Markov and Bayesian networks using an unreliable oracle, demonstrating that Markov networks remain uniquely identifiable under bounded errors if vertex-wise disjoint paths are limited, whereas Bayesian networks cannot tolerate any errors for guaranteed identification, and subsequently providing algorithms for cases where unique identifiability holds.

Juha Harviainen, Pekka Parviainen, Vidya Sagar Sharma | Wed, 11 Ma | 🤖 cs.LG

a-TMFG: Scalable Triangulated Maximally Filtered Graphs via Approximate Nearest Neighbors

This paper introduces a-TMFG, a scalable algorithm that overcomes the memory and computational limitations of traditional Triangulated Maximally Filtered Graphs by leveraging k-Nearest Neighbors and on-the-fly correlation estimation to construct sparse graphs from massive datasets.

Lionel Yelibi | Wed, 11 Ma | 🤖 cs.LG

An Optimal Control Approach To Transformer Training

This paper proposes a rigorous optimal control framework that models Transformer training as a lifted Markov decision process on probability measures, establishing the existence of globally optimal policies and providing a quantized, gradient-free training alternative that respects key architectural constraints like input independence and positional encoding.

Kağan Akman, Naci Saldı, Serdar Yüksel | Wed, 11 Ma | 🤖 cs.LG

SCDP: Learning Humanoid Locomotion from Partial Observations via Mixed-Observation Distillation

The paper introduces Sensor-Conditioned Diffusion Policies (SCDP), a novel framework that enables robust humanoid locomotion using only onboard sensors by distilling privileged full-body knowledge through mixed-observation training and specialized denoising techniques, successfully achieving near-perfect simulation performance and real-world deployment on a G1 robot without explicit state estimation.

Milo Carroll, Tianhu Peng, Lingfan Bao, Chengxu Zhou, Zhibin Li | Wed, 11 Ma | 🤖 cs.LG

Routing without Forgetting

The paper introduces Routing without Forgetting (RwF), a transformer architecture that addresses Online Continual Learning by replacing iterative gradient-based specialization with dynamic, single-step associative retrieval of input-conditioned prompts via energy-based layers, thereby achieving superior performance on class-incremental benchmarks without explicit task identifiers.

Alessio Masano, Giovanni Bellitto, Dipam Goswani, Joost Van de Weijer, Concetto Spampinato | Wed, 11 Ma | 🤖 cs.AI

Towards Understanding Adam Convergence on Highly Degenerate Polynomials

This paper theoretically demonstrates that Adam naturally achieves local linear convergence on highly degenerate polynomials through a decoupling mechanism that amplifies the effective learning rate, significantly outperforming Gradient Descent and Momentum without requiring external schedulers.

Zhiwei Bai, Jiajie Zhao, Zhangchen Zhou, Zhi-Qin John Xu, Yaoyu Zhang | Wed, 11 Ma | 🤖 cs.LG

Nonparametric Variational Differential Privacy via Embedding Parameter Clipping

This paper introduces a theoretically grounded parameter clipping strategy derived from Rényi Divergence minimization to stabilize training and improve the privacy-utility trade-off in nonparametric variational differential privacy models by preventing learned latent representations from drifting into high-information regions.

Dina El Zein, Shashi Kumar, James Henderson | Wed, 11 Ma | 🤖 cs.LG

Memorization capacity of deep ReLU neural networks characterized by width and depth

This paper establishes the optimal trade-off between width and depth for deep ReLU neural networks to memorize $N$ separated data points, proving that the product of the squared width and squared depth must scale as $\Theta(N\log(\delta^{-1}))$.

Xin Yang, Yunfei Yang | Wed, 11 Ma | 🤖 cs.LG

MM-algorithms for traditional and convex NMF with Tweedie and Negative Binomial cost functions and empirical evaluation

This paper presents a unified framework for traditional and convex Non-negative Matrix Factorization (NMF) under Negative Binomial and Tweedie distributions, deriving novel multiplicative update rules via Majorize-Minimization and demonstrating through empirical evaluation that appropriate noise model selection and convex formulations significantly improve feature recovery in overdispersed data.

Elisabeth Sommer James, Asger Hobolth, Marta Pelizzola | Wed, 11 Ma | 🤖 cs.LG

Learning the Hierarchical Organization in Brain Network for Brain Disorder Diagnosis

The paper proposes BrainHO, a novel framework that learns intrinsic hierarchical brain network dependencies from fMRI data using a hierarchical attention mechanism and orthogonality constraints, thereby achieving state-of-the-art diagnosis performance and uncovering interpretable biomarkers for brain disorders without relying on predefined sub-network labels.

Jingfeng Tang, Peng Cao, Guangqi Wen, Jinzhu Yang, Xiaoli Liu, Osmar R. Zaiane | Wed, 11 Ma | 🤖 cs.LG

Multi-DNN Inference of Sparse Models on Edge SoCs

This paper introduces SparseLoom, a system that employs model stitching to recombine subgraphs from sparse models without re-training, thereby significantly improving throughput, reducing memory overhead, and lowering Service Level Objective violation rates for multi-DNN inference on edge SoCs compared to state-of-the-art systems.

Jiawei Luo, Di Wu, Simon Dobson, Blesson Varghese | Wed, 11 Ma | 🤖 cs.LG

Evolution of Photonic Quantum Machine Learning under Noise

This review systematically analyzes noise sources in photonic quantum machine learning, examining their impact on algorithm performance and exploring characterization and mitigation strategies to guide the development of robust, scalable systems.

A. M. A. S. D. Alagiyawanna, Asoka Karunananda | Wed, 11 Ma | ⚛️ quant-ph

Well Log-Guided Synthesis of Subsurface Images from Sparse Petrography Data Using cGANs

This paper presents a conditional Generative Adversarial Network (cGAN) framework that synthesizes realistic, continuous pore-scale images of carbonate rock formations by conditioning on well log-derived porosity values, effectively bridging gaps between sparse petrography samples to enhance reservoir characterization for energy transition applications.

Ali Sadeghkhani, A. Assadi, B. Bennett, A. Rabbani | Wed, 11 Ma | 🤖 cs.LG

FreqCycle: A Multi-Scale Time-Frequency Analysis Method for Time Series Forecasting

FreqCycle is a multi-scale time-frequency analysis framework that improves time series forecasting by combining a Filter-Enhanced Cycle module for low-frequency patterns with a Segmented Frequency-domain module for mid-to-high frequencies. An extension, MFreqCycle, disentangles coupled multi-periodicity, and the approach achieves state-of-the-art accuracy with efficient inference.

Boya Zhang, Shuaijie Yin, Huiwen Zhu, Xing He | Wed, 11 Ma | 🤖 cs.LG

No evaluation without fair representation: Impact of label and selection bias on the evaluation, performance and mitigation of classification models

This paper empirically analyzes the distinct impacts of label and selection bias on classification model evaluation and performance using a new framework for introducing controlled bias, revealing that fairness-accuracy trade-offs disappear when models are evaluated on unbiased data and demonstrating that the effectiveness of mitigation methods depends on the specific bias type present.

Magali Legast, Toon Calders, François Fouss | Wed, 11 Ma | 🤖 cs.LG
