da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs
The paper introduces **da4ml**, an efficient distributed arithmetic algorithm integrated into the `hls4ml` library. It reduces FPGA resource utilization by up to one-third while also lowering latency for real-time inference of highly quantized neural networks.
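To make the term concrete, the following is a minimal sketch of the general idea behind multiplierless constant matrix-vector multiplication, which is what distributed arithmetic targets on FPGAs: each constant weight is rewritten as a few signed powers of two, so every product becomes shifts and additions. It is not da4ml's algorithm and does not use the `hls4ml` API; the function names `csd_digits` and `shift_add_matvec`, the integer weights, and the example values are all assumptions chosen for illustration.

```python
def csd_digits(c):
    """Decompose integer c into signed power-of-two terms [(shift, sign), ...].

    Uses the canonical signed digit (CSD) form, in which no two adjacent
    bit positions are both nonzero. Hypothetical helper for illustration.
    """
    digits = []
    shift = 0
    while c != 0:
        if c & 1:
            # c mod 4 == 1 -> digit +1, c mod 4 == 3 -> digit -1
            d = 2 - (c & 3)
            digits.append((shift, d))
            c -= d
        c >>= 1
        shift += 1
    return digits


def shift_add_matvec(weights, x):
    """Compute weights @ x with shifts and adds only (no multiplications).

    weights: list of rows of integer constants (known ahead of time).
    x: list of integer inputs.
    """
    out = []
    for row in weights:
        acc = 0
        for w, xj in zip(row, x):
            for shift, sign in csd_digits(w):
                term = xj << shift  # multiply by 2**shift via a shift
                acc = acc + term if sign > 0 else acc - term
        out.append(acc)
    return out


# Illustrative values only: a 2x2 integer weight matrix and an input vector.
W = [[7, -3], [5, 0]]
x = [3, -2]
y = shift_add_matvec(W, x)
```

In this sketch the weight 7 costs one subtraction (8x - x) instead of the two additions a plain binary expansion (4x + 2x + x) would need. Algorithms in this family typically go further by sharing repeated sub-sums across outputs, a step this sketch omits.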