Explainable and Hardware-Efficient Jamming Detection for 5G Networks Using the Convolutional Tsetlin Machine

This paper proposes and validates a hardware-efficient, explainable Convolutional Tsetlin Machine (CTM) for real-time 5G jamming detection that achieves accuracy comparable to convolutional neural networks while significantly reducing training time and memory usage and enabling deterministic FPGA deployment.

Vojtech Halenka, Mohammadreza Amini, Per-Arne Andersen, Ole-Christoffer Granmo, Burak Kantarci · Tue, 10 Ma · 🤖 cs.LG
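A Tsetlin Machine classifies by a majority vote over conjunctive clauses of Boolean literals, which is what makes inference integer-only, deterministic, and FPGA-friendly. A minimal inference sketch, assuming the clause sets have already been learned (the `tm_classify` helper, the clause encoding, and the toy pattern are illustrative, not the paper's implementation; the convolutional variant additionally slides clauses over input patches):

```python
def tm_classify(literals, clauses_pos, clauses_neg):
    """Tsetlin Machine inference sketch: each clause is a conjunction of
    (index, expected_value) literal conditions; the class score is the
    number of positive-clause votes minus negative-clause votes.
    Real (C)TM training learns the clauses with Tsetlin automata."""
    def fires(clause):
        return all(literals[i] == v for i, v in clause)
    pos = sum(fires(c) for c in clauses_pos)
    neg = sum(fires(c) for c in clauses_neg)
    return pos - neg

# Toy example: detect the pattern "bit0 AND NOT bit2" in a 3-bit input.
clauses_pos = [[(0, 1), (2, 0)]]   # fires when x0 == 1 and x2 == 0
clauses_neg = [[(2, 1)]]           # votes against when x2 == 1
score = tm_classify([1, 0, 0], clauses_pos, clauses_neg)
```

Because the score is just a signed count of clause hits, the whole pipeline maps onto fixed-point logic with no multipliers, which is the basis of the deterministic FPGA deployment claimed above.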

In-Memory ADC-Based Nonlinear Activation Quantization for Efficient In-Memory Computing

This paper proposes Boundary Suppressed K-Means Quantization (BS-KMQ), a novel nonlinear quantization method that suppresses boundary outliers to optimize analog-to-digital converter resolution in in-memory computing, achieving significant improvements in quantization accuracy, area efficiency, and energy performance across various deep learning models.

Shuai Dong, Junyi Yang, Biyan Zhou, Hongyang Shang, Gourav Datta, Arindam Basu · Thu, 12 Ma · 💻 cs
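The codebook side of the idea can be illustrated with plain k-means quantization; the percentile clip below is a simplified stand-in for BS-KMQ's boundary-outlier suppression, not the paper's actual algorithm, and the level count and clip percentile are arbitrary assumptions:

```python
import numpy as np

def kmeans_quantize(x, n_levels=8, clip_pct=1.0, iters=50):
    """Quantize values with a k-means codebook. Boundary suppression is
    approximated by clipping the top/bottom percentiles before fitting,
    so rare outliers do not stretch the codebook across the ADC range
    (a toy stand-in for BS-KMQ's suppression scheme)."""
    lo, hi = np.percentile(x, [clip_pct, 100 - clip_pct])
    xc = np.clip(x, lo, hi)
    centroids = np.linspace(lo, hi, n_levels)  # uniform initialization
    for _ in range(iters):
        idx = np.argmin(np.abs(xc[:, None] - centroids[None, :]), axis=1)
        for k in range(n_levels):
            if np.any(idx == k):
                centroids[k] = xc[idx == k].mean()
    idx = np.argmin(np.abs(xc[:, None] - centroids[None, :]), axis=1)
    return centroids[idx], centroids

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)
xq, codebook = kmeans_quantize(x, n_levels=8)
```

With the codebook concentrated where the activation mass actually lies, an 8-level (3-bit) ADC wastes no resolution on the tails, which is the accuracy/area trade the summary refers to.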

Pooling Engram Conditional Memory in Large Language Models using CXL

This paper proposes a scalable and cost-efficient solution for Large Language Models by integrating Compute Express Link (CXL) memory pools into SGLang to store Engram conditional memory, achieving near-DRAM end-to-end performance while overcoming the latency limitations of traditional RDMA approaches.

Ruiyang Ma, Teng Ma, Zhiyuan Su, Hantian Zha, Xinpeng Zhao, Xuchun Shang, Xingrui Yi, Zheng Liu, Zhu Cao, An Wu, Zhichong Dou, Ziqian Liu, Daikang Kuang, Guojie Luo · Thu, 12 Ma · 💻 cs

Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

This position paper reframes multi-agent memory as a computer architecture challenge by proposing a three-layer hierarchy and identifying critical protocol gaps, with a specific focus on resolving multi-agent memory consistency as the primary obstacle to building reliable and scalable collaborative systems.

Zhongming Yu, Naicheng Yu, Hejia Zhang, Wentao Ni, Mingrui Yin, Jiaying Yang, Yujie Zhao, Jishen Zhao · Thu, 12 Ma · 🤖 cs.AI

Reference Architecture of a Quantum-Centric Supercomputer

This paper presents a reference architecture and roadmap for Quantum-Centric Supercomputing (QCSC) systems that integrate quantum, GPU, and CPU resources to overcome current isolation challenges and enable seamless, high-performance hybrid workflows across three evolutionary phases.

Seetharami Seelam, Jerry M. Chow, Antonio Córcoles, Sarah Sheldon, Tushar Mittal, Abhinav Kandala, Sean Dague, Ian Hincks, Hiroshi Horii, Blake Johnson, Michael Le, Hani Jamjoom, Jay M. Gambetta · Thu, 12 Ma · ⚡ eess

HTM-EAR: Importance-Preserving Tiered Memory with Hybrid Routing under Saturation

HTM-EAR is a hierarchical tiered memory system that combines HNSW-based working memory with archival storage, importance-aware eviction, and hybrid routing to effectively preserve essential information and maintain high retrieval precision under sustained saturation, significantly outperforming traditional LRU approaches while approaching the performance of unbounded oracle memory.

Shubham Kumar Singh · Thu, 12 Ma · 🤖 cs.AI
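The contrast with LRU can be made concrete with a toy eviction policy that scores entries by a blend of importance and recency and demotes the loser to an archival tier. The class, the 0.7/0.3 weighting, and the two-tier layout below are illustrative assumptions, not HTM-EAR's actual routing or HNSW-based retrieval:

```python
class ImportanceAwareMemory:
    """Toy tiered memory: under saturation, evict by a combined
    importance/recency score instead of pure recency (LRU), moving
    victims to an archival tier rather than dropping them."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}      # key -> (value, importance), working tier
        self.last_used = {}  # key -> logical timestamp
        self.archive = {}    # evicted items land here, still retrievable
        self.clock = 0

    def put(self, key, value, importance):
        self.clock += 1
        if key not in self.store and len(self.store) >= self.capacity:
            def score(k):
                _, imp = self.store[k]
                recency = self.last_used[k] / self.clock
                return 0.7 * imp + 0.3 * recency  # arbitrary weighting
            victim = min(self.store, key=score)
            self.archive[victim] = self.store.pop(victim)
            del self.last_used[victim]
        self.store[key] = (value, importance)
        self.last_used[key] = self.clock

    def get(self, key):
        self.clock += 1
        if key in self.store:
            self.last_used[key] = self.clock
            return self.store[key][0]
        return self.archive.get(key, (None,))[0]  # archival fallback
```

Under pure LRU, the oldest entry dies regardless of how essential it is; here a high-importance entry survives saturation while a stale low-importance one is demoted but not lost.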

Architecture-Aware LLM Inference Optimization on AMD Instinct GPUs: A Comprehensive Benchmark and Deployment Study

This paper presents a comprehensive benchmark of production LLM inference on AMD Instinct MI325X GPUs, demonstrating that architecture-aware optimizations—specifically the selective use of the AITER runtime and specific KV cache configurations—are critical for maximizing throughput across diverse model families while maintaining high reliability under heavy concurrency.

Athos Georgiou · Thu, 12 Ma · 🤖 cs.AI

Machine Learning on Heterogeneous, Edge, and Quantum Hardware for Particle Physics (ML-HEQUPP)

This white paper presents a community-driven vision to prioritize research and development in hardware-based machine learning systems—leveraging emerging technologies like AI, silicon microelectronics, and quantum algorithms—to address the unprecedented data acquisition challenges and enable real-time scientific discovery in next-generation particle physics experiments.

Julia Gonski, Jenni Ott, Shiva Abbaszadeh, Sagar Addepalli, Matteo Cremonesi, Jennet Dickinson, Giuseppe Di Guglielmo, Erdem Yigit Ertorer, Lindsey Gray, Ryan Herbst, Christian Herwig, Tae Min Hong, Benedikt Maier, Maryam Bayat Makou, David Miller, Mark S. Neubauer, Cristián Peña, Dylan Rankin, Seon-Hee (Sunny) Seo, Giordon Stark, Alexander Tapper, Audrey Corbeil Therrien, Ioannis Xiotidis, Keisuke Yoshihara, G Abarajithan, Sagar Addepalli, Nural Akchurin, Carlos Argüelles, Saptaparna Bhattacharya, Lorenzo Borella, Christian Boutan, Tom Braine, James Brau, Martin Breidenbach, Antonio Chahine, Talal Ahmed Chowdhury, Yuan-Tang Chou, Seokju Chung, Alberto Coppi, Mariarosaria D'Alfonso, Abhilasha Dave, Chance Desmet, Angela Di Fulvio, Karri DiPetrillo, Javier Duarte, Auralee Edelen, Jan Eysermans, Yongbin Feng, Emmett Forrestel, Dolores Garcia, Loredana Gastaldo, Julián García Pardiñas, Lino Gerlach, Loukas Gouskos, Katya Govorkova, Carl Grace, Christopher Grant, Philip Harris, Ciaran Hasnip, Timon Heim, Abraham Holtermann, Tae Min Hong, Gian Michele Innocenti, Koji Ishidoshiro, Miaochen Jin, Jyothisraj Johnson, Stephen Jones, Andreas Jung, Georgia Karagiorgi, Ryan Kastner, Nicholas Kamp, Doojin Kim, Kyoungchul Kong, Katie Kudela, Jelena Lalic, Bo-Cheng Lai, Yun-Tsung Lai, Tommy Lam, Jeffrey Lazar, Aobo Li, Zepeng Li, Haoyun Liu, Vladimir Lončar, Luca Macchiarulo, Christopher Madrid, Benedikt Maier, Zhenghua Ma, Prashansa Mukim, Mark S. Neubauer, Victoria Nguyen, Sungbin Oh, Isobel Ojalvo, Hideyoshi Ozaki, Simone Pagan Griso, Myeonghun Park, Christoph Paus, Santosh Parajuli, Benjamin Parpillon, Sara Pozzi, Ema Puljak, Benjamin Ramhorst, Amy Roberts, Larry Ruckman, Kate Scholberg, Sebastian Schmitt, Noah Singer, Eluned Anne Smith, Alexandre Sousa, Michael Spannowsky, Sioni Summers, Yanwen Sun, Daniel Tapia Takaki, Antonino Tumeo, Caterina Vernieri, Belina von Krosigk, Yash Vora, Linyan Wan, Michael H. L. S. Wang, Amanda Weinstein, Andy White, Simon Williams, Felix Yu · Thu, 12 Ma · ⚛️ hep-ex

Linear Layouts: Robust Code Generation of Efficient Tensor Computation Using $\mathbb{F}_2$

This paper introduces "Linear Layouts," a novel framework that models tensor layouts as linear algebra operations over $\mathbb{F}_2$ to enable generic, efficient, and bug-free layout definitions and conversions for deep learning workloads, successfully integrating with the Triton compiler to overcome the limitations of existing case-by-case approaches.

Keren Zhou, Mario Lezcano, Adam Goucher, Akhmed Rakhmati, Jeff Niu, Justin Lebar, Pawel Szczerbuk, Peter Bell, Phil Tillet, Thomas Raoux, Zahi Moudallal · Mon, 09 Ma · 💻 cs
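The core $\mathbb{F}_2$ idea is that an index's bits are a vector over GF(2), and a layout is a binary matrix: each set bit of the logical index XORs one basis vector (matrix column) into the physical location. A small sketch of that view (the `apply_f2_layout` helper and the 3-bit swizzle are illustrative, not Triton's actual implementation):

```python
def apply_f2_layout(basis, index, n_bits):
    """Map a logical index to a physical location by treating the
    layout as a linear map over F2: each set bit of the index XORs
    in the corresponding basis vector of the layout matrix."""
    out = 0
    for bit in range(n_bits):
        if (index >> bit) & 1:
            out ^= basis[bit]
    return out

# A 3-bit XOR-swizzle: index bit 0 also flips bit 2 of the location,
# the kind of pattern used to avoid shared-memory bank conflicts.
basis = [0b101, 0b010, 0b100]
layout = [apply_f2_layout(basis, i, 3) for i in range(8)]
```

Because layouts are just matrices over $\mathbb{F}_2$, composition and inversion reduce to binary matrix algebra, which is what makes generic, case-free layout conversion possible.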

An Integrated Failure and Threat Mode and Effect Analysis (FTMEA) Framework with Quantified Cross-Domain Correlation Factors for Automotive Semiconductors

This paper proposes an Integrated Failure and Threat Mode and Effect Analysis (FTMEA) framework for automotive semiconductors that unifies functional safety and cybersecurity assessments by introducing quantified Cross-Domain Correlation Factors to accurately identify and prioritize synergistic risks that traditional methods often overlook.

Antonino Armato, Marzana Khatun, Sebastian Fischer · Mon, 09 Ma · 💻 cs
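The prioritization effect of a quantified correlation factor can be sketched with classic FMEA arithmetic (RPN = severity × occurrence × detection) coupled to a threat term. The weighting below is a made-up illustration, not the paper's FTMEA formula; it only shows how a cross-domain correlation factor (CCF) can elevate a risk that looks moderate in either domain alone:

```python
def integrated_risk(severity, occurrence, detection, threat_level, ccf):
    """Toy integrated safety/security score: the classic FMEA risk
    priority number scaled up by a cross-domain correlation factor
    (ccf in [0, 1]) times the threat level (illustrative weighting,
    not the paper's quantification)."""
    rpn = severity * occurrence * detection   # functional-safety view
    return rpn * (1.0 + ccf * threat_level)   # coupled safety+security

# The same fault, assessed without and with a strongly correlated,
# highly exploitable threat:
standalone = integrated_risk(5, 3, 4, threat_level=0.0, ccf=0.8)
coupled = integrated_risk(5, 3, 4, threat_level=0.9, ccf=0.8)
```

Treating the two domains independently would rank this fault at RPN 60 in both analyses; the coupled score surfaces the synergistic risk the summary says traditional methods overlook.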

Scalable Digital Compute-in-Memory Ising Machines for Robustness Verification of Binary Neural Networks

This paper proposes a scalable digital compute-in-memory SRAM-based Ising machine that reformulates binary neural network robustness verification as a QUBO problem, leveraging imperfect solutions to efficiently detect adversarial perturbations while achieving significant improvements in convergence speed and power efficiency compared to conventional CPU implementations.

Madhav Vadlamani, Rahul Singh, Yuyao Kong, Zheng Zhang, Shimeng Yu · Mon, 09 Ma · 💻 cs
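A QUBO instance asks for a binary vector minimizing the quadratic energy E(x) = xᵀQx; an Ising machine anneals toward low-energy states in hardware, and, as the summary notes, even an imperfect (near-optimal) state can already witness an adversarial perturbation. A brute-force sketch on a toy Q (not a real BNN encoding):

```python
import itertools
import numpy as np

def qubo_energy(Q, x):
    """QUBO objective E(x) = x^T Q x for a binary vector x."""
    return x @ Q @ x

def brute_force_qubo(Q):
    """Exhaustive minimizer, feasible only for tiny instances; an
    Ising machine replaces this search with hardware annealing."""
    n = Q.shape[0]
    best_x, best_e = None, float("inf")
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits)
        e = qubo_energy(Q, x)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

# Toy upper-triangular Q: negative diagonal rewards setting bits,
# positive off-diagonals penalize setting certain pairs together.
Q = np.array([[-1.0, 2.0, 0.0],
              [0.0, -1.0, 2.0],
              [0.0, 0.0, -1.0]])
x_opt, e_opt = brute_force_qubo(Q)
```

In the verification setting, x encodes a candidate input perturbation and low energy corresponds to a misclassification witness, so any sufficiently low-energy sample, optimal or not, is actionable.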

Estimation of Energy-dissipation Lower-bounds for Neuromorphic Learning-in-memory

This paper derives model-agnostic theoretical lower-bounds for the energy-to-solution metric of ideal neuromorphic learning-in-memory optimizers by analyzing their out-of-equilibrium thermodynamics, demonstrating how matching memory dynamics to optimization processes can overcome energy bottlenecks associated with memory writes and consolidation in large-scale AI workloads.

Zihao Chen, Faiek Ahsan, Johannes Leugering, Gert Cauwenberghs, Shantanu Chakrabartty · Mon, 09 Ma · 🤖 cs.AI
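The classic equilibrium reference point for such energy-to-solution bounds is the Landauer limit of k_BT ln 2 dissipated per bit erased; the paper's out-of-equilibrium bounds refine this, but the limit sets the scale of the memory-write bottleneck. A quick sketch of that scale (the 10¹² weight count is an arbitrary illustration):

```python
from math import log

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact, SI 2019)

def landauer_bound_joules(n_bits, temperature_k=300.0):
    """Minimum dissipation k_B * T * ln(2) per bit erased: the
    equilibrium floor for the energy cost of irreversible memory
    updates at temperature T."""
    return n_bits * K_B * temperature_k * log(2)

# Energy floor for one irreversible update of 1e12 binary weights
# at room temperature (illustrative scale, ~nanojoules):
e = landauer_bound_joules(1e12)
```

Real write energies in today's memories sit many orders of magnitude above this floor, which is why matching memory dynamics to the optimization process, as the paper proposes, has so much headroom.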