cs.AR papers | Gist.Science

LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis

LUMINA is an LLM-driven framework for GPU architecture exploration on AI workloads. It automatically extracts design rules and performs bottleneck analysis, achieving substantially higher exploration efficiency and better performance-area trade-offs than existing methods at minimal search cost.

Tao Zhang, Rui Ma, Shuotao Xu, Peng Cheng, Yongqiang Xiong · Mon, 09 Ma · 🤖 cs.AI

NL2GDS: LLM-aided interface for Open Source Chip Design

NL2GDS is a novel framework that leverages large language models to automatically translate natural language hardware specifications into synthesizable RTL and complete GDSII layouts via the OpenLane flow, achieving significant improvements in area, delay, and power efficiency while democratizing ASIC design.

Max Eland, Jeyan Thiyagalingam, Dinesh Pamunuwa + 1 more · 2026-03-06 · 💻 cs

Network Design for Wafer-Scale Systems with Wafer-on-Wafer Hybrid Bonding

This paper proposes four optimized reticle placement strategies for wafer-on-wafer hybrid-bonded systems that significantly improve network throughput, latency, and energy efficiency compared to a baseline 2D mesh topology.

Patrick Iff, Tommaso Bonato, Maciej Besta + 2 more · 2026-03-06 · 💻 cs

VMXDOTP: A RISC-V Vector ISA Extension for Efficient Microscaling (MX) Format Acceleration

This paper proposes VMXDOTP, a RISC-V Vector ISA extension that efficiently accelerates microscaling (MX) format computations in modern transformer models by introducing specialized instructions for flexible block sizes and mixed-precision operations, achieving significant gains in performance, energy efficiency, and hardware utilization compared to software emulation and prior MX engines.

Max Wipfli, Gamze İslamoğlu, Navaneeth Kunhi Purayil + 2 more · 2026-03-06 · 💻 cs
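
The core idea behind MX formats is that each block of elements shares one power-of-two scale, so a dot product can run as cheap low-precision integer multiply-accumulates inside each block and apply a single scale per block pair. The sketch below illustrates that arithmetic only. It assumes a simplified MXINT8-like encoding (signed 8-bit elements in [-127, 127] plus a shared exponent per block of 32); the real OCP MX element encodings differ in detail, and the helper names `quantize_block`, `quantize_mx`, and `mx_dot` are hypothetical, not VMXDOTP instructions.

```python
import math
from typing import List, Tuple

# Simplified MX block: (shared power-of-two exponent, signed 8-bit integer elements).
Block = Tuple[int, List[int]]


def quantize_block(xs: List[float]) -> Block:
    """Quantize one block to a shared exponent plus int8 elements (simplified scheme)."""
    amax = max(abs(x) for x in xs)
    if amax == 0.0:
        return 0, [0] * len(xs)
    # Smallest exponent e such that amax / 2**e fits in the int8 range [-127, 127].
    e = math.ceil(math.log2(amax / 127))
    q = [max(-127, min(127, round(x / 2.0**e))) for x in xs]
    return e, q


def quantize_mx(xs: List[float], block_size: int = 32) -> List[Block]:
    """Split a vector into fixed-size blocks, each with its own shared scale."""
    return [quantize_block(xs[i:i + block_size]) for i in range(0, len(xs), block_size)]


def mx_dot(a: List[Block], b: List[Block]) -> float:
    """Block-scaled dot product: integer MACs per block, one scale per block pair."""
    total = 0.0
    for (ea, qa), (eb, qb) in zip(a, b):
        acc = sum(x * y for x, y in zip(qa, qb))  # integer accumulation inside the block
        total += math.ldexp(acc, ea + eb)         # apply 2**(ea + eb) once per block pair
    return total


if __name__ == "__main__":
    a = [0.5, -1.25, 2.0, 0.75] * 16  # 64 values -> two blocks of 32
    b = [1.0, 0.5, -0.25, 2.0] * 16
    approx = mx_dot(quantize_mx(a), quantize_mx(b))
    exact = sum(x * y for x, y in zip(a, b))
    print(approx, exact)  # both 14.0: these inputs are exactly representable
```

Applying the scale once per block, rather than per element, is what makes block sizes and element widths the main knobs in MX hardware design.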

MCEL: Margin-Based Cross-Entropy Loss for Error-Tolerant Quantized Neural Networks

This paper proposes Margin-Based Cross-Entropy Loss (MCEL), a novel, efficient training objective that explicitly maximizes output logit margins to significantly enhance bit error tolerance in quantized neural networks, offering a scalable alternative to computationally expensive error-injection training methods.

Mikail Yayla, Akash Kumar · 2026-03-06 · 🤖 cs.LG
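
Intuitively, a larger gap between the target logit and the runner-up leaves more room for bit-error-induced logit perturbations before the predicted class flips. The sketch below shows one common way to encode a margin in a cross-entropy objective: an additive margin subtracted from the target logit before the softmax. This is a generic stand-in, not MCEL's published formulation, and the function names and the `margin` parameter are hypothetical.

```python
import math
from typing import List


def cross_entropy(logits: List[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def margin_cross_entropy(logits: List[float], target: int, margin: float) -> float:
    """Additive-margin variant: the loss stays high until the target logit
    leads the other classes by roughly `margin`."""
    shifted = list(logits)
    shifted[target] -= margin
    return cross_entropy(shifted, target)


def logit_margin(logits: List[float], target: int) -> float:
    """Gap between the target logit and the largest non-target logit."""
    return logits[target] - max(z for i, z in enumerate(logits) if i != target)


if __name__ == "__main__":
    logits = [2.0, 0.5, 1.0]
    print(logit_margin(logits, 0))
    print(cross_entropy(logits, 0), margin_cross_entropy(logits, 0, 1.0))
```

The margin variant penalizes correct but narrow predictions, which is the behavior a margin-maximizing objective relies on; it adds no cost beyond a standard cross-entropy, unlike training with injected bit errors.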

AI+HW 2035: Shaping the Next Decade

This vision paper outlines a 10-year roadmap for the strategic co-design of AI and hardware, aiming for a 1000× improvement in energy efficiency and for sustainable, human-centric intelligent systems through coordinated global efforts across algorithms, architectures, and policy.

Deming Chen, Jason Cong, Azalia Mirhoseini + 27 more · 2026-03-06 · 🤖 cs.AI

Hardware-Software Co-design for 3D-DRAM-based LLM Serving Accelerator

This paper presents Helios, a hybrid-bonding-based, hardware-software co-designed LLM serving accelerator for 3D-DRAM. It overcomes limitations of existing near-memory processing designs by introducing spatially-aware KV cache allocation and distributed tiled attention execution, significantly improving the speed and energy efficiency of serving highly dynamic large language model workloads.

Cong Li, Yihan Yin, Chenhao Xue + 7 more · 2026-03-06 · 💻 cs

HDLFORGE: A Two-Stage Multi-Agent Framework for Efficient Verilog Code Generation with Adaptive Model Escalation

HDLFORGE is a two-stage multi-agent framework that optimizes the trade-off between Verilog generation speed and accuracy by defaulting to a medium-sized LLM and escalating to a stronger model only when necessary, while leveraging a counterexample-guided formal agent to efficiently detect and repair bugs.

Armin Abdollahi, Saeid Shokoufa, Negin Ashrafi + 2 more · 2026-03-06 · 💻 cs
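
The escalation policy summarized above can be pictured as a cost-ordered cascade: try the cheaper model first, check its output, and move to a stronger model only after repeated failures, feeding each counterexample back into the next attempt. The sketch below is a generic illustration under stated assumptions. The tier list, the `attempts_per_tier` budget, and the generator and checker callables are all hypothetical stand-ins (a real checker might wrap a formal equivalence or property check), not HDLFORGE's actual interfaces.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical generator: maps (spec, feedback-or-None) to candidate Verilog text.
Generator = Callable[[str, Optional[str]], str]
# Hypothetical checker: None on success, otherwise a counterexample description.
Checker = Callable[[str], Optional[str]]


def generate_with_escalation(
    spec: str,
    tiers: Sequence[Tuple[str, Generator]],
    check: Checker,
    attempts_per_tier: int = 2,
) -> Optional[Tuple[str, str]]:
    """Return (tier name, passing candidate), or None if every tier fails."""
    feedback: Optional[str] = None
    for name, generate in tiers:  # tiers are ordered cheapest first
        for _ in range(attempts_per_tier):
            candidate = generate(spec, feedback)
            counterexample = check(candidate)
            if counterexample is None:
                return name, candidate
            # Carry the failure forward so the next attempt can repair it.
            feedback = counterexample
    return None


if __name__ == "__main__":
    def medium(spec: str, fb: Optional[str]) -> str:
        return "module bad;"

    def strong(spec: str, fb: Optional[str]) -> str:
        return "module good;" if fb else "module bad;"

    def check(code: str) -> Optional[str]:
        return None if "good" in code else "output mismatch at cycle 3"

    print(generate_with_escalation("adder", [("medium", medium), ("strong", strong)], check))
```

Keeping the counterexample across tiers means the stronger model starts from a concrete failure rather than from scratch, which is one plausible way such a cascade saves calls to the expensive model.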

FPGA-Enabled Machine Learning Applications in Earth Observation: A Systematic Review

This systematic review analyzes 68 experiments deploying machine learning models on FPGAs for Earth Observation, introducing dual taxonomies for model architectures and implementation strategies to address the challenges of onboard processing in the NewSpace era.

Cédric Léonard, Dirk Stober, Martin Schulz · 2026-03-06 · 💻 cs

Lyra: A Hardware-Accelerated RISC-V Verification Framework with Generative Model-Based Processor Fuzzing

This paper presents Lyra, a heterogeneous RISC-V verification framework that combines FPGA-based hardware acceleration with a domain-specialized, ISA-aware generative model to overcome the limitations of software simulation and random fuzzing, achieving significantly higher coverage and up to 3343× faster verification speeds.

Juncheng Huo, Yunfan Gao, Xinxin Liu + 4 more · 2026-03-05 · 💻 cs

ChipletPart: Cost-Aware Partitioning for 2.5D Systems

This paper introduces ChipletPart, a cost-driven 2.5D system partitioner that integrates a sophisticated cost model with genetic algorithm-based technology assignment and simulated annealing floorplanning to significantly reduce chiplet costs and ensure I/O feasibility compared to state-of-the-art methods.

Alexander Graening, Puneet Gupta, Andrew B. Kahng + 2 more · 2026-03-05 · 💻 cs

Arapai: An Offline-First AI Chatbot Architecture for Low-Connectivity Educational Environments

This paper introduces Arapai, an offline-first AI chatbot architecture that enables personalized, curriculum-aligned learning on low-specification, CPU-only devices without internet connectivity, thereby addressing digital inequalities and enhancing educational resilience in resource-constrained environments.

Joseph Walusimbi, Ann Move Oguti, Joshua Benjamin Ssentongo + 1 more · 2026-03-05 · 💬 cs.CL

When Small Variations Become Big Failures: Reliability Challenges in Compute-in-Memory Neural Accelerators

This paper addresses critical reliability challenges that device non-idealities cause in Compute-in-Memory neural accelerators. It demonstrates that small variations have a disproportionate impact on safety-critical workloads and proposes cross-layer solutions, including a selective write-verify mechanism (SWIM) and noise-aware training, to ensure robust and efficient deployment.

Yifan Qin, Jiahao Zheng, Zheyu Yan + 3 more · 2026-03-05 · 🤖 cs.LG
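
Selective write-verify trades programming time for accuracy: instead of iteratively verifying every cell, only the weights that matter most get the expensive read-back-and-rewrite loop. The sketch below illustrates that trade-off under a deliberately toy device model (each write lands at target plus Gaussian noise, and a verify step simply rewrites until the error is within tolerance). The sensitivity scores are supplied by the caller; SWIM's actual sensitivity metric and programming procedure are not reproduced here, and the names `most_sensitive` and `program_weights` are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple


def most_sensitive(sensitivity: Sequence[float], fraction: float) -> Set[int]:
    """Indices of the top `fraction` of weights by a caller-supplied sensitivity score."""
    k = round(len(sensitivity) * fraction)
    ranked = sorted(range(len(sensitivity)), key=lambda i: sensitivity[i], reverse=True)
    return set(ranked[:k])


def program_weights(
    weights: Sequence[float],
    sensitivity: Sequence[float],
    fraction: float,
    sigma: float,
    tol: float,
    rng: random.Random,
) -> Tuple[List[float], int]:
    """Program all weights once; write-verify only the selected subset.
    Returns the programmed values and the total number of write pulses."""
    chosen = most_sensitive(sensitivity, fraction)
    programmed: List[float] = []
    pulses = 0
    for i, target in enumerate(weights):
        # Toy device model: every write lands at target + Gaussian noise.
        value = target + rng.gauss(0.0, sigma)
        pulses += 1
        if i in chosen:
            # Toy write-verify loop: read back and rewrite until within tolerance.
            while abs(value - target) > tol:
                value = target + rng.gauss(0.0, sigma)
                pulses += 1
        programmed.append(value)
    return programmed, pulses


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(100)]
    sensitivity = [abs(w) for w in weights]  # placeholder score, purely illustrative
    programmed, pulses = program_weights(weights, sensitivity, 0.2, sigma=0.1, tol=0.05, rng=rng)
    print(pulses)
```

Raising `fraction` toward 1.0 recovers full write-verify at maximum pulse cost; lowering it to 0.0 gives a single unverified write per cell.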

Joint Hardware-Workload Co-Optimization for In-Memory Computing Accelerators

This paper proposes a joint hardware-workload co-optimization framework using an evolutionary algorithm to design generalized in-memory computing accelerators that significantly reduce the energy-delay-area product across multiple neural network workloads, overcoming the limitations of single-workload specialized designs.

Olga Krestinskaya, Mohammed E. Fouda, Ahmed Eltawil + 1 more · 2026-03-05 · 🤖 cs.AI

Formal that "Floats" High: Formal Verification of Floating Point Arithmetic

This paper presents a scalable, modular methodology for the formal verification of floating-point arithmetic that employs direct RTL-to-RTL model checking, counterexample-guided refinement, and AI-assisted property generation to overcome the limitations of high-level abstraction and achieve higher coverage efficiency.

Hansa Mohanty, Vaisakh Naduvodi Viswambharan, Deepak Narayan Gadde · 2026-03-05 · 🤖 cs.AI

Efficient Image Reconstruction Architecture for Neutral Atom Quantum Computing

This paper presents a highly parallel FPGA-based image reconstruction accelerator for neutral atom quantum computers that employs hardware-software co-design to achieve a 34.9× speedup over CPU baselines, significantly reducing the time overhead associated with atom detection and state measurement.

Jonas Winklmann, Yian Yu, Xiaorang Guo + 2 more · 2026-03-03 · ⚛️ quant-ph
