VMXDOTP: A RISC-V Vector ISA Extension for Efficient Microscaling (MX) Format Acceleration

This paper proposes VMXDOTP, a RISC-V Vector ISA extension that accelerates microscaling (MX) format computations in modern transformer models. It introduces specialized instructions for flexible block sizes and mixed-precision operations, and reports significant gains in performance, energy efficiency, and hardware utilization over software emulation and prior MX engines.

Max Wipfli, Gamze İslamoğlu, Navaneeth Kunhi Purayil + 2 more · 2026-03-06 · 💻 cs

Hardware-Software Co-design for 3D-DRAM-based LLM Serving Accelerator

This paper presents Helios, a hybrid-bonding-based, hardware-software co-designed accelerator for 3D-DRAM. It overcomes limitations of existing near-memory processing designs by introducing spatially-aware KV cache allocation and distributed tiled attention execution, which significantly improve the speed and energy efficiency of serving highly dynamic large language model workloads.

Cong Li, Yihan Yin, Chenhao Xue + 7 more · 2026-03-06 · 💻 cs

When Small Variations Become Big Failures: Reliability Challenges in Compute-in-Memory Neural Accelerators

This paper addresses critical reliability challenges in Compute-in-Memory neural accelerators caused by device non-idealities. It demonstrates that small variations have a disproportionate impact on safety-critical workloads, and proposes cross-layer solutions, including a selective write-verify mechanism (SWIM) and noise-aware training, to ensure robust and efficient deployment.

Yifan Qin, Jiahao Zheng, Zheyu Yan + 3 more · 2026-03-05 · 🤖 cs.LG