Machine Learning on Heterogeneous, Edge, and Quantum Hardware for Particle Physics (ML-HEQUPP)

This whitepaper presents a community-driven vision for prioritizing R&D of machine learning systems on hardware for next-generation particle physics experiments, integrating emerging technologies such as artificial intelligence, silicon microelectronics, and quantum computing to meet unprecedented data rates, extreme environments, and real-time processing challenges.

Julia Gonski, Jenni Ott, Shiva Abbaszadeh, Sagar Addepalli, Matteo Cremonesi, Jennet Dickinson, Giuseppe Di Guglielmo, Erdem Yigit Ertorer, Lindsey Gray, Ryan Herbst, Christian Herwig, Tae Min Hong, Benedikt Maier, Maryam Bayat Makou, David Miller, Mark S. Neubauer, Cristián Peña, Dylan Rankin, Seon-Hee Seo, Giordon Stark, Alexander Tapper, Audrey Corbeil Therrien, Ioannis Xiotidis, Keisuke Yoshihara, G Abarajithan, Sagar Addepalli, Nural Akchurin, Carlos Argüelles, Saptaparna Bhattacharya, Lorenzo Borella, Christian Boutan, Tom Braine, James Brau, Martin Breidenbach, Antonio Chahine, Talal Ahmed Chowdhury, Yuan-Tang Chou, Seokju Chung, Alberto Coppi, Mariarosaria D'Alfonso, Abhilasha Dave, Chance Desmet, Angela Di Fulvio, Karri DiPetrillo, Javier Duarte, Auralee Edelen, Jan Eysermans, Yongbin Feng, Emmett Forrestel, Dolores Garcia, Loredana Gastaldo, Julián García Pardiñas, Lino Gerlach, Loukas Gouskos, Katya Govorkova, Carl Grace, Christopher Grant, Philip Harris, Ciaran Hasnip, Timon Heim, Abraham Holtermann, Tae Min Hong, Gian Michele Innocenti, Koji Ishidoshiro, Miaochen Jin, Jyothisraj Johnson, Stephen Jones, Andreas Jung, Georgia Karagiorgi, Ryan Kastner, Nicholas Kamp, Doojin Kim, Kyoungchul Kong, Katie Kudela, Jelena Lalic, Bo-Cheng Lai, Yun-Tsung Lai, Tommy Lam, Jeffrey Lazar, Aobo Li, Zepeng Li, Haoyun Liu, Vladimir Lončar, Luca Macchiarulo, Christopher Madrid, Benedikt Maier, Zhenghua Ma, Prashansa Mukim, Mark S. Neubauer, Victoria Nguyen, Sungbin Oh, Isobel Ojalvo, Hideyoshi Ozaki, Simone Pagan Griso, Myeonghun Park, Christoph Paus, Santosh Parajuli, Benjamin Parpillon, Sara Pozzi, Ema Puljak, Benjamin Ramhorst, Amy Roberts, Larry Ruckman, Kate Scholberg, Sebastian Schmitt, Noah Singer, Eluned Anne Smith, Alexandre Sousa, Michael Spannowsky, Sioni Summers, Yanwen Sun, Daniel Tapia Takaki, Antonino Tumeo, Caterina Vernieri, Belina von Krosigk, Yash Vora, Linyan Wan, Michael H. L. S. Wang, Amanda Weinstein, Andy White, Simon Williams, Felix Yu
Thu, 12 Ma · hep-ex

Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

From a computer-architecture perspective, this paper models multi-agent memory as an architectural problem comprising shared and distributed paradigms and a three-level hierarchy, and identifies cross-agent cache sharing, structured access control, and, most critically, memory consistency as the core challenges in building reliable and scalable multi-agent systems today.

Zhongming Yu, Naicheng Yu, Hejia Zhang, Wentao Ni, Mingrui Yin, Jiaying Yang, Yujie Zhao, Jishen Zhao
Thu, 12 Ma · cs.AI

Pooling Engram Conditional Memory in Large Language Models using CXL

This paper proposes using Compute Express Link (CXL) memory pooling to store the Engram conditional memory of large language models; integrated with SGLang, it achieves near-DRAM end-to-end performance, offering a scalable and cost-effective storage solution for future Engram-integrated LLMs.

Ruiyang Ma, Teng Ma, Zhiyuan Su, Hantian Zha, Xinpeng Zhao, Xuchun Shang, Xingrui Yi, Zheng Liu, Zhu Cao, An Wu, Zhichong Dou, Ziqian Liu, Daikang Kuang, Guojie Luo
Thu, 12 Ma · cs

In-Memory ADC-Based Nonlinear Activation Quantization for Efficient In-Memory Computing

This paper proposes Boundary-Suppressed K-Means Quantization (BS-KMQ), a novel nonlinear quantization method that suppresses outliers at the distribution boundaries to improve clustering, and pairs it with a reconfigurable in-memory nonlinear ADC design, substantially reducing both quantization error and ADC resolution requirements while markedly improving the accuracy, area efficiency, and energy efficiency of in-memory computing systems.

Shuai Dong, Junyi Yang, Biyan Zhou, Hongyang Shang, Gourav Datta, Arindam Basu
Thu, 12 Ma · cs
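The clustering idea behind such a codebook is simple enough to sketch. Below is an illustrative 1-D k-means quantizer in which boundary suppression is approximated by percentile clipping before clustering; the function names, the clipping percentile, and the Lloyd-style update loop are my assumptions for illustration, not the paper's actual BS-KMQ algorithm.

```python
import numpy as np

def bskmq_codebook(x, k=8, clip_pct=1.0, iters=20):
    """Illustrative boundary-suppressed k-means codebook: clip the
    extreme tails before clustering so outliers at the distribution
    boundary cannot drag the centroids outward."""
    lo, hi = np.percentile(x, [clip_pct, 100.0 - clip_pct])
    xc = np.clip(x, lo, hi)
    centroids = np.linspace(lo, hi, k)          # uniform initialization
    for _ in range(iters):                      # 1-D Lloyd iterations
        idx = np.abs(xc[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(idx == j):
                centroids[j] = xc[idx == j].mean()
    return np.sort(centroids)

def quantize(x, centroids):
    """Map each value to its nearest codebook level."""
    idx = np.abs(np.asarray(x)[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids[idx]

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, 10_000)             # mock activation samples
cb = bskmq_codebook(acts, k=8)                  # 8 levels ~ a 3-bit ADC
mse = np.mean((acts - quantize(acts, cb)) ** 2)
```

A nonlinear 8-level codebook of this kind is what lets an in-memory ADC resolve only 3 bits per activation instead of a much finer uniform scale.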

Reference Architecture of a Quantum-Centric Supercomputer

Addressing the efficiency bottleneck caused by today's siloed operation of quantum and classical supercomputing systems, this paper proposes a quantum-centric supercomputer (QCSC) reference architecture that integrates quantum, graphics, and central processing units, and lays out a three-phase evolution roadmap from dedicated offload engines to full co-design, aiming to accelerate the exploration of quantum algorithms in key domains such as chemistry and materials science.

Seetharami Seelam, Jerry M. Chow, Antonio Córcoles, Sarah Sheldon, Tushar Mittal, Abhinav Kandala, Sean Dague, Ian Hincks, Hiroshi Horii, Blake Johnson, Michael Le, Hani Jamjoom, Jay M. Gambetta
Thu, 12 Ma · eess

Estimation of Energy-dissipation Lower-bounds for Neuromorphic Learning-in-memory

This paper derives theoretical lower bounds on the energy dissipation of an ideal neuromorphic optimizer based on the learning-in-memory (LIM) paradigm, in which the energy barriers of physical memory are modulated to match the optimization dynamics; it proposes a model-agnostic performance-evaluation framework that depends only on operand count, model size, convergence rate, and precision, and applies it to estimate the energy consumption of large-scale AI workloads.

Zihao Chen, Faiek Ahsan, Johannes Leugering, Gert Cauwenberghs, Shantanu Chakrabartty
Mon, 09 Ma · cs.AI
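As a loose point of reference only, and not the paper's LIM-specific bound, the generic Landauer limit already yields an energy floor that depends on nothing but operation count, bit width, and temperature; the sketch below is my illustration of what such a model-agnostic estimate looks like.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K (exact in the 2019 SI)

def landauer_floor_joules(num_ops, bits_per_op, temperature_k=300.0):
    """Generic Landauer lower bound: every irreversible bit operation
    dissipates at least k_B * T * ln(2) joules. The paper's LIM bound
    additionally accounts for convergence speed and precision."""
    return num_ops * bits_per_op * K_B * temperature_k * math.log(2)

# e.g. 1e23 hypothetical 8-bit irreversible operations at room temperature
floor_j = landauer_floor_joules(1e23, bits_per_op=8)   # ~2.3 kJ
```

Real hardware sits many orders of magnitude above this floor, which is what makes workload-level lower-bound analyses like the paper's informative.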

Linear Layouts: Robust Code Generation of Efficient Tensor Computation Using $\mathbb{F}_2$

This paper introduces "linear layouts," a new approach that models tensor layouts as binary matrices using linear algebra over $\mathbb{F}_2$, enabling general and efficient layout definition and conversion, significantly reducing the engineering complexity of the Triton compiler backend while improving tensor-computation performance.

Keren Zhou, Mario Lezcano, Adam Goucher, Akhmed Rakhmati, Jeff Niu, Justin Lebar, Pawel Szczerbuk, Peter Bell, Phil Tillet, Thomas Raoux, Zahi Moudallal
Mon, 09 Ma · cs
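The core trick is that common layout transformations (bit permutations, XOR swizzles) act linearly over $\mathbb{F}_2$ on the bits of an element index, so composing layouts reduces to binary matrix multiplication. A minimal sketch under that framing; the particular swizzle is my own example, not one from the paper.

```python
import numpy as np

def apply_layout(M, x, nbits):
    """Apply a linear layout over F2: each output index bit is an
    XOR (GF(2) sum) of the input index bits selected by a row of M."""
    bits = np.array([(x >> i) & 1 for i in range(nbits)], dtype=np.uint8)
    out_bits = (M @ bits) % 2
    return int(sum(int(b) << i for i, b in enumerate(out_bits)))

def compose(M2, M1):
    """Composing two layouts is matrix multiplication over F2."""
    return (M2 @ M1) % 2

nbits = 4
# A bit permutation is linear over F2: swap index bits 0 and 1.
swap01 = np.eye(nbits, dtype=np.uint8)
swap01[[0, 1]] = swap01[[1, 0]]
# An XOR swizzle (hypothetical example): bit0_out = bit0 ^ bit2.
swizzle = np.eye(nbits, dtype=np.uint8)
swizzle[0, 2] = 1

composed = compose(swizzle, swap01)
for x in range(1 << nbits):
    # applying the composed matrix == applying the layouts in sequence
    assert apply_layout(composed, x, nbits) == apply_layout(
        swizzle, apply_layout(swap01, x, nbits), nbits)
```

Representing every layout in this one algebraic form is what makes conversions between arbitrary layouts mechanical rather than hand-written per pair.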

A Persistent-State Dataflow Accelerator for Memory-Bound Linear Attention Decode on FPGA

This paper presents an FPGA-based persistent-state dataflow accelerator that keeps the full recurrent state of Gated DeltaNet resident in on-chip BRAM and combines it with a pipelined dataflow design, turning memory-bound linear-attention decode into a compute-bound task; on an AMD Alveo U55C it achieves 4.5x faster decode with 60x better energy efficiency than an NVIDIA H100 GPU.

Neelesh Gupta, Peter Wang, Rajgopal Kannan, Viktor K. Prasanna
Mon, 09 Ma · cs.LG
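To see why decode is memory-bound here, note that generating each token streams the entire d_v x d_k state matrix once. A minimal NumPy sketch of a delta-rule decode step with the state S as the only tensor carried between tokens; the exact gating and normalization in Gated DeltaNet differ from this simplification.

```python
import numpy as np

def gated_delta_decode(S, q, k, v, alpha=0.99, beta=0.5):
    """One decode step of a simplified gated delta rule.
    S (d_v x d_k) is the persistent state: decay it, erase the old
    association under key k, write the new one, then read out with q.
    Every step touches all of S, so decode is memory-bound unless S
    stays on-chip (BRAM on the FPGA)."""
    S = alpha * (S - beta * np.outer(S @ k, k)) + beta * np.outer(v, k)
    return S, S @ q

d_k = d_v = 8
S = np.zeros((d_v, d_k))
rng = np.random.default_rng(0)
for _ in range(4):                       # a few autoregressive decode steps
    q, k, v = rng.normal(size=(3, d_k))
    S, out = gated_delta_decode(S, q, k, v)
```

Pinning S in BRAM removes the per-token DRAM round trip, which is how the accelerator flips the workload from memory-bound to compute-bound.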