cs.AR papers | Gist.Science

Sensitivity-Guided Framework for Pruned and Quantized Reservoir Computing Accelerators

This paper presents a sensitivity-guided framework for compressing Reservoir Computing accelerators that systematically balances quantization and pruning to significantly improve hardware efficiency and reduce power consumption on FPGAs while maintaining high model accuracy across various time-series tasks.

Atousa Jafari, Mahdi Taheri, Hassan Ghasemzadeh Mohammadi, Christian Herglotz, Marco PlatznerWed, 11 Ma🤖 cs.AI

ARKV: Adaptive and Resource-Efficient KV Cache Management under Limited Memory Budget for Long-Context Inference in LLMs

ARKV is a lightweight, adaptive framework that dynamically allocates precision levels to KV cache tokens based on per-layer attention dynamics and token importance, achieving a 4x reduction in memory usage while preserving ~97% of baseline accuracy for long-context LLM inference without requiring retraining or architectural modifications.

Jianlong Lei, Shashikant IlagerWed, 11 Ma🤖 cs.AI

PhD Thesis Summary: Methods for Reliability Assessment and Enhancement of Deep Neural Network Hardware Accelerators

This PhD thesis presents novel, cost-efficient methods for assessing and enhancing the reliability of Deep Neural Network hardware accelerators, including a systematic literature review, new analytical tools, optimized trade-off methodologies, and the development of the AdAM real-time fault tolerance technique.

Mahdi TaheriWed, 11 Ma🤖 cs.AI

ALADIN: Accuracy-Latency-Aware Design-space Inference Analysis for Embedded AI Accelerators

This paper presents ALADIN, an accuracy-latency-aware framework that enables the pre-deployment evaluation of mixed-precision quantized neural networks on scratchpad-based embedded AI accelerators by transforming models into platform-aware representations to analyze trade-offs and bottlenecks without requiring physical hardware.

T. Baldi, D. Casini, A. BiondiWed, 11 Ma🤖 cs.AI

SiliconMind-V1: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation

The paper introduces SiliconMind-V1, a unified multi-agent framework that leverages testbench-driven verification and iterative debug-reasoning workflows to train locally fine-tuned LLMs for generating functionally correct Verilog RTL designs, outperforming state-of-the-art models with greater efficiency and privacy.

Mu-Chi Chen, Yu-Hung Kao, Po-Hsuan Huang, Shao-Chun Ho, Hsiang-Yu Tsou, I-Ting Wu, En-Ming Huang, Yu-Kai Hung, Wei-Po Hsin, Cheng Liang, Chia-Heng Tu, Shih-Hao Hung, Hsiang-Tsung KungWed, 11 Ma🤖 cs.AI

CktEvo: Repository-Level RTL Code Benchmark for Design Evolution

This paper introduces CktEvo, a repository-level benchmark and closed-loop framework that enables large language models to iteratively optimize Power, Performance, and Area (PPA) in complete RTL designs by preserving functional behavior across cross-file dependencies without human intervention.

Zhengyuan Shi, Jingxin Wang, Tairan Cheng, Changran Xu, Weikang Qian, Qiang XuWed, 11 Ma🤖 cs.AI

Design Conductor: An agent autonomously builds a 1.5 GHz Linux-capable RISC-V CPU

The paper introduces Design Conductor, an autonomous agent that leverages frontier models to independently design, verify, and generate a tape-out ready 1.48 GHz RISC-V CPU (VerCore) from a text specification to GDSII layout in just 12 hours, marking the first instance of an agent building a complete, working CPU end-to-end.

The Verkor Team, Ravi Krishna, Suresh Krishna, David ChinWed, 11 Ma🤖 cs.AI

Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction

This paper introduces two software-only techniques, Overflow-Aware Scaling (OAS) and Macro Block Scaling (MBS), that significantly reduce the accuracy gap between the hardware-efficient MXFP4 format and NVIDIA's NVFP4 standard in Large Language Models, achieving near-parity performance with minimal computational overhead.

Jatin Chhugani, Geonhwa Jeong, Bor-Yiing Su, Yunjie Pan, Hanmei Yang, Aayush Ankit, Jiecao Yu, Summer Deng, Yunqing Chen, Nadathur Satish, Changkyu KimWed, 11 Ma🤖 cs.AI

Measurement-Free Ancilla Recycling via Blind Reset: A Cross-Platform Study on Superconducting and Trapped-Ion Processors

This cross-platform study evaluates blind reset as a measurement-free ancilla recycling technique on superconducting and trapped-ion processors, demonstrating that it can significantly reduce logical-cycle latency while maintaining high ancilla cleanliness and identifying specific architecture-dependent crossover points for optimal deployment.

Sangkeum LeeWed, 11 Ma⚛️ quant-ph

Trust Nothing: RTOS Security without Run-Time Software TCB (Extended Version)

This paper presents a novel capability architecture and a corresponding Zephyr-based real-time operating system that achieves comprehensive security for embedded devices by fully disaggregating and isolating all software subsystems and peripherals, thereby eliminating the need for a run-time software Trusted Computing Base (TCB) without requiring hardware modifications.

Eric Ackermann, Sven BugielTue, 10 Ma💻 cs

Why Learn What Physics Already Knows? Realizing Agile mmWave-based Human Pose Estimation via Physics-Guided Preprocessing

This paper proposes a physics-guided preprocessing framework for millimeter-wave human pose estimation that explicitly models signal correlations and kinematics to achieve real-time, lightweight performance with significantly fewer parameters than existing data-driven baselines while maintaining competitive accuracy.

Shuntian Zheng, Jiaqi Li, Minzhe Ni, Xiaoman Lu, Yu GuanTue, 10 Ma💻 cs

GOMA: Geometrically Optimal Mapping via Analytical Modeling for Spatial Accelerators

This paper presents GOMA, a geometrically abstracted, globally optimal framework that uses analytical modeling to efficiently solve the combinatorial GEMM mapping problem for spatial accelerators, achieving significant improvements in energy-delay product and search speed over state-of-the-art methods.

Wulve Yang, Hailong Zou, Rui Zhou, Jionghao Zhang, Qiang Li, Gang Li, Yi Zhan, Shushan QiaoTue, 10 Ma💻 cs

ConnChecker: Automated Root-Cause Analysis for Formal Connectivity Check via Graph

ConnChecker is a novel graph-based automated tool that accelerates formal connectivity verification in complex SoC designs by categorizing counterexamples and localizing root causes, achieving up to an 80% reduction in debugging time.

Do Ngoc Tiep, Nguyen Linh Anh, Luu Danh MinhTue, 10 Ma💻 cs

Mozart: Modularized and Efficient MoE Training on 3.5D Wafer-Scale Chiplet Architectures

The paper introduces Mozart, an algorithm-hardware co-design framework that leverages 3.5D wafer-scale chiplet architectures with specialized expert allocation and scheduling strategies to overcome communication and memory bottlenecks in the efficient training of large-scale Mixture-of-Experts (MoE) language models.

Shuqing Luo (Katie), Ye Han (Katie), Pingzhi Li (Katie), Jiayin Qin (Katie), Jie Peng (Katie), Yang (Katie), Zhao (Kevin), Yu (Kevin), Cao, Tianlong ChenTue, 10 Ma💻 cs

Space-Control: Process-Level Isolation for Sharing CXL-based Disaggregated Memory

Space-Control is a hardware-software co-design that addresses the critical security gap of missing process-level isolation in CXL-based disaggregated memory by authenticating execution contexts and enforcing fine-grained access control with minimal performance overhead.

Kaustav Goswami, Sean Peisert, Venkatesh Akella, Jason Lowe-PowerTue, 10 Ma💻 cs

HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture

This paper proposes HaLoRA, a hardware-aware low-rank adaptation method that enhances the robustness of Large Language Models deployed on hybrid Compute-in-Memory architectures by training noise-resilient LoRA branches on SRAM while storing pretrained weights on noisy RRAM, thereby achieving significant energy efficiency and up to a 22.7-point performance improvement without compromising accuracy.

Taiqiang Wu, Chenchen Ding, Wenyong Zhou, Yuxin Cheng, Xincheng Feng, Shuqi Wang, Wendong Xu, Chufan Shi, Zhengwu Liu, Ngai WongTue, 10 Ma💬 cs.CL

HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases

The paper proposes HDLxGraph, a novel framework that integrates Abstract Syntax Trees and Data Flow Graphs into Retrieval Augmented Generation to overcome structural and vocabulary mismatches in Hardware Description Language tasks, while also introducing the HDLSearch benchmark to demonstrate significant improvements in search, debugging, and code completion accuracy over existing baselines.

Pingqing Zheng (Katie), Jiayin Qin (Katie), Fuqi Zhang (Katie), Niraj Chitla (Katie), Zishen Wan (Katie), Shang Wu (Katie), Yu Cao (Katie), Caiwen Ding (Katie), Yang (Katie), ZhaoTue, 10 Ma🤖 cs.LG

Continuous-Flow Data-Rate-Aware CNN Inference on FPGA

This paper proposes a novel data-rate-aware continuous-flow architecture for CNN inference on FPGAs that mitigates hardware underutilization caused by data reduction in pooling and strided convolution layers by interleaving signals and sharing resources, thereby enabling the high-throughput implementation of complex models like MobileNet on a single device.

Tobias Habermann, Michael Mecik, Zhenyu Wang, César David Vera, Martin Kumm, Mario GarridoTue, 10 Ma🤖 cs.LG

Mitigating the Memory Bottleneck with Machine Learning-Driven and Data-Aware Microarchitectural Techniques

This dissertation addresses the memory bottleneck in modern computing by advocating a shift from data-agnostic to data-informed microarchitectural designs, proposing four machine learning-driven and data-aware mechanisms that significantly enhance performance and energy efficiency.

Rahul BeraTue, 10 Ma🤖 cs.LG

Accelerating Diffusion Models for Generative AI Applications with Silicon Photonics

This paper introduces a novel silicon photonics-based accelerator that significantly enhances the energy efficiency and throughput of diffusion models, addressing the high computational costs and energy consumption associated with their iterative denoising processes on conventional electronic platforms.

Tharini Suresh, Salma Afifi, Sudeep PasrichaTue, 10 Ma🤖 cs.LG

← Previous Next →