Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction

This paper introduces two software-only techniques, Overflow-Aware Scaling (OAS) and Macro Block Scaling (MBS), that significantly reduce the accuracy gap between the hardware-efficient MXFP4 format and NVIDIA's NVFP4 standard in Large Language Models, achieving near-parity performance with minimal computational overhead.

Jatin Chhugani, Geonhwa Jeong, Bor-Yiing Su, Yunjie Pan, Hanmei Yang, Aayush Ankit, Jiecao Yu, Summer Deng, Yunqing Chen, Nadathur Satish, Changkyu Kim · 2026-03-11 · 🤖 cs.AI

SiliconMind-V1: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation

The paper introduces SiliconMind-V1, a unified multi-agent framework that leverages testbench-driven verification and iterative debug-reasoning workflows to train locally fine-tuned LLMs for generating functionally correct Verilog RTL designs, outperforming state-of-the-art models with greater efficiency and privacy.

Mu-Chi Chen, Yu-Hung Kao, Po-Hsuan Huang, Shao-Chun Ho, Hsiang-Yu Tsou, I-Ting Wu, En-Ming Huang, Yu-Kai Hung, Wei-Po Hsin, Cheng Liang, Chia-Heng Tu, Shih-Hao Hung, Hsiang-Tsung Kung · 2026-03-11 · 🤖 cs.AI

Alignment Is the Disease: Censorship Visibility and Alignment Constraint Complexity as Determinants of Collective Pathology in Multi-Agent LLM Systems

This paper presents preliminary evidence from multi-agent simulations suggesting that alignment techniques and invisible censorship in large language models may paradoxically induce collective pathological behaviors and insight-action dissociation, indicating that safety interventions can sometimes cause the very harms they aim to prevent.

Hiroki Fukui · 2026-03-11 · 🤖 cs.AI

ARKV: Adaptive and Resource-Efficient KV Cache Management under Limited Memory Budget for Long-Context Inference in LLMs

ARKV is a lightweight, adaptive framework that dynamically assigns precision levels to KV cache tokens based on per-layer attention dynamics and token importance, achieving a 4× reduction in memory usage while preserving ~97% of baseline accuracy for long-context LLM inference, without retraining or architectural modifications.

Jianlong Lei, Shashikant Ilager · 2026-03-11 · 🤖 cs.AI

Measurement-Free Ancilla Recycling via Blind Reset: A Cross-Platform Study on Superconducting and Trapped-Ion Processors

This cross-platform study evaluates blind reset as a measurement-free ancilla recycling technique on superconducting and trapped-ion processors, demonstrating that it can significantly reduce logical-cycle latency while maintaining high ancilla cleanliness and identifying specific architecture-dependent crossover points for optimal deployment.

Sangkeum Lee · 2026-03-11 · ⚛️ quant-ph

Benchmarking Federated Learning in Edge Computing Environments: A Systematic Review and Performance Evaluation

This paper presents a systematic review and performance evaluation of Federated Learning in edge computing, benchmarking five leading algorithms across key metrics to identify trade-offs (notably SCAFFOLD's superior accuracy and robustness versus FedAvg's efficiency) and proposing a research agenda for open challenges such as data heterogeneity and energy limitations.

Sales Aribe Jr., Gil Nicholas Cagande · 2026-03-11 · 🤖 cs.AI

Sensitivity-Guided Framework for Pruned and Quantized Reservoir Computing Accelerators

This paper presents a sensitivity-guided framework for compressing Reservoir Computing accelerators that systematically balances quantization and pruning to significantly improve hardware efficiency and reduce power consumption on FPGAs while maintaining high model accuracy across various time-series tasks.

Atousa Jafari, Mahdi Taheri, Hassan Ghasemzadeh Mohammadi, Christian Herglotz, Marco Platzner · 2026-03-11 · 🤖 cs.AI

Zipage: Maintain High Request Concurrency for LLM Reasoning through Compressed PagedAttention

The paper introduces Zipage, an LLM inference engine whose Compressed PagedAttention combines token-wise KV cache eviction with PagedAttention, achieving over 2.1× speedup in high-concurrency reasoning tasks while retaining approximately 95% of the performance of full-KV inference.

Mengqi Liao, Lu Wang, Chaoyun Zhang, Bo Qiao, Si Qin, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Huaiyu Wan · 2026-03-11 · 🤖 cs.AI

Permutation-Equivariant 2D State Space Models: Theory and Canonical Architecture for Multivariate Time Series

This paper introduces the Variable-Invariant Two-Dimensional State Space Model (VI-2D-SSM) and its unified VI-2D-Mamba architecture, which together establish and implement a permutation-equivariant framework for multivariate time series, eliminating artificial variable ordering and achieving state-of-the-art performance with improved structural scalability.

Seungwoo Jeong, Heung-Il Suk · 2026-03-11 · 🤖 cs.AI

Hindsight Credit Assignment for Long-Horizon LLM Agents

The paper introduces HCAPO, a framework that improves long-horizon LLM agents by leveraging hindsight reasoning to refine step-level Q-values and a multi-scale advantage mechanism to address sparse rewards, significantly outperforming state-of-the-art methods such as GRPO on benchmarks including WebShop and ALFWorld.

Hui-Ze Tan, Xiao-Wen Yang, Hao Chen, Jie-Jing Shao, Yi Wen, Yuteng Shen, Weihong Luo, Xiku Du, Lan-Zhe Guo, Yu-Feng Li · 2026-03-11 · 🤖 cs.AI

Generalized Reduction to the Isotropy for Flexible Equivariant Neural Fields

This paper introduces a principled method to reduce G-invariant functions on product spaces X × M to H-invariant functions on X alone, where H is the isotropy subgroup of M, thereby enabling flexible Equivariant Neural Fields to handle arbitrary group actions and heterogeneous product spaces without structural constraints.

Alejandro García-Castellanos, Gijs Bellaard, Remco Duits, Daniel Pelt, Erik J Bekkers · 2026-03-11 · 🤖 cs.AI