cs.DC papers | Gist.Science

Two Teachers Better Than One: Hardware-Physics Co-Guided Distributed Scientific Machine Learning

The paper introduces EPIC, a hardware- and physics-co-guided distributed scientific machine learning framework that significantly reduces communication latency and energy consumption while preserving physical fidelity by performing lightweight local encoding and physics-aware decoding with cross-attention for tasks like full-waveform inversion.

Yuchen Yuan, Junhuan Yang, Hao Wan, Yipei Liu, Hanhan Wu, Youzuo Lin, Lei YangWed, 11 Ma🤖 cs.LG

The $qs$ Inequality: Quantifying the Double Penalty of Mixture-of-Experts at Inference

This paper introduces the $qs$ inequality to demonstrate that Mixture-of-Experts (MoE) models suffer from a structural "double penalty" of routing fragmentation and memory constraints during inference, often rendering them significantly less efficient than quality-matched dense models for long-context serving despite their training-time FLOP advantages.

Vignesh Adhinarayanan, Nuwan JayasenaWed, 11 Ma🤖 cs.LG

Compiler-First State Space Duality and Portable $O(1)$ Autoregressive Caching for Inference

This paper demonstrates that Mamba-2's state space duality can be implemented entirely using standard XLA primitives without custom kernels, achieving portable, host-synchronization-free $O(1)$ autoregressive caching with high performance across CPU, NVIDIA GPU, and Google Cloud TPU hardware.

Cosmo SantoniWed, 11 Ma🤖 cs.AI

FedLECC: Cluster- and Loss-Guided Client Selection for Federated Learning under Non-IID Data

FedLECC is a lightweight client selection strategy for federated learning under non-IID data that groups clients by label-distribution similarity and prioritizes those with higher local loss, thereby significantly improving test accuracy while reducing communication rounds and overhead.

Daniel M. Jimenez-Gutierrez, Giovanni Giunta, Mehrdad Hassanzadeh, Aris Anagnostopoulos, Ioannis Chatzigiannakis, Andrea VitalettiWed, 11 Ma🤖 cs.AI

Zipage: Maintain High Request Concurrency for LLM Reasoning through Compressed PagedAttention

The paper introduces Zipage, an LLM inference engine utilizing Compressed PagedAttention to combine token-wise KV cache eviction with PagedAttention, achieving over 2.1 $\times$ speedup in high-concurrency reasoning tasks while maintaining approximately 95% of the performance of full KV inference.

Mengqi Liao, Lu Wang, Chaoyun Zhang, Bo Qiao, Si Qin, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Huaiyu WanWed, 11 Ma🤖 cs.AI

Sensitivity-Guided Framework for Pruned and Quantized Reservoir Computing Accelerators

This paper presents a sensitivity-guided framework for compressing Reservoir Computing accelerators that systematically balances quantization and pruning to significantly improve hardware efficiency and reduce power consumption on FPGAs while maintaining high model accuracy across various time-series tasks.

Atousa Jafari, Mahdi Taheri, Hassan Ghasemzadeh Mohammadi, Christian Herglotz, Marco PlatznerWed, 11 Ma🤖 cs.AI

Autonomous Edge-Deployed AI Agents for Electric Vehicle Charging Infrastructure Management

This paper introduces Auralink SDC, an edge-deployed multi-agent AI architecture that autonomously manages electric vehicle charging infrastructure with high reliability and sub-50ms latency, achieving 78% autonomous incident resolution and 87.6% diagnostic accuracy to address the critical failure rates and slow remediation times of current cloud-centric systems.

Mohammed CherifiWed, 11 Ma🤖 cs.AI

Benchmarking Federated Learning in Edge Computing Environments: A Systematic Review and Performance Evaluation

This paper presents a systematic review and performance evaluation of Federated Learning in edge computing, benchmarking five leading algorithms across key metrics to identify trade-offs, highlight SCAFFOLD's superior accuracy and robustness versus FedAvg's efficiency, and propose a future research agenda to address challenges like data heterogeneity and energy limitations.

Sales Aribe Jr., Gil Nicholas CagandeWed, 11 Ma🤖 cs.AI

ARKV: Adaptive and Resource-Efficient KV Cache Management under Limited Memory Budget for Long-Context Inference in LLMs

ARKV is a lightweight, adaptive framework that dynamically allocates precision levels to KV cache tokens based on per-layer attention dynamics and token importance, achieving a 4x reduction in memory usage while preserving ~97% of baseline accuracy for long-context LLM inference without requiring retraining or architectural modifications.

Jianlong Lei, Shashikant IlagerWed, 11 Ma🤖 cs.AI

PhD Thesis Summary: Methods for Reliability Assessment and Enhancement of Deep Neural Network Hardware Accelerators

This PhD thesis presents novel, cost-efficient methods for assessing and enhancing the reliability of Deep Neural Network hardware accelerators, including a systematic literature review, new analytical tools, optimized trade-off methodologies, and the development of the AdAM real-time fault tolerance technique.

Mahdi TaheriWed, 11 Ma🤖 cs.AI

A Consensus-Driven Multi-LLM Pipeline for Missing-Person Investigations

This paper introduces Guardian, a consensus-driven, multi-LLM pipeline enhanced by QLoRA fine-tuning that coordinates specialized models and a consensus engine to perform auditable, structured information extraction for critical missing-person investigations while avoiding unconstrained decision-making.

Joshua Castillo, Ravi MukkamalaWed, 11 Ma🤖 cs.AI

A Blockchain-based Traceability System for AI-Driven Engine Blade Inspection

This paper presents BladeChain, a blockchain-based system that integrates multi-stakeholder endorsement, automated scheduling, and AI model provenance to provide immutable, auditable traceability for aircraft engine blade inspections across the entire component life cycle.

Mahmoud Hafez, Eman Ouda, Mohammed A. Mohammed Eltoum, Khaled Salah, Yusra AbdulrahmanTue, 10 Ma💻 cs

A Hodge-Based Framework for Service Operational Analysis in Serverless Platforms

This paper proposes a topological framework utilizing Hodge decomposition to analyze serverless service flows, distinguishing between locally correctable errors and globally persistent harmonic inefficiencies to derive actionable remediation strategies without requiring complete architectural restructuring.

Gianluca Reali, Mauro FemminellaTue, 10 Ma💻 cs

SafarDB: FPGA-Accelerated Distributed Transactions via Replicated Data Types

SafarDB is a novel FPGA-accelerated distributed transaction system that co-designs a network-attached replication engine with a custom FPGA network interface to achieve significantly lower latency and higher throughput for both Conflict-Free and Well-coordinated Replicated Data Types compared to state-of-the-art RDMA-based implementations.

Javad Saberlatibari, Prithviraj Yuvaraj, Mohsen Lesani, Philip Brisk, Mohammad SadoghiTue, 10 Ma💻 cs

SI-ChainFL: Shapley-Incentivized Secure Federated Learning for High-Speed Rail Data Sharing

This paper proposes SI-ChainFL, a secure and efficient federated learning framework for high-speed rail data sharing that combines Shapley value-based contribution incentives with a blockchain-driven decentralized aggregation protocol to mitigate free-riding and model poisoning while ensuring robust performance against malicious attacks.

Mingjie Zhao, Cheng Dai, Fei Chen, Xin Chen, Kaoru Ota, Mianxiong Dong, Bing GuoTue, 10 Ma💻 cs

ACE-GF-based Attestation Relay for PQC - Lightweight Mempool Propagation Without On-Path Proofs

The paper introduces AR-ACE, a lightweight mempool propagation protocol for post-quantum blockchains that eliminates on-path validity proofs by having relay nodes forward only objects with compact attestations, thereby achieving an order-of-magnitude bandwidth reduction while shifting proof verification entirely to the builder.

Jian Sheng WangTue, 10 Ma💻 cs

ZK-ACE: Identity-Centric Zero-Knowledge Authorization for Post-Quantum Blockchain Systems

ZK-ACE is a post-quantum authorization framework that replaces kilobyte-scale signature artifacts with compact, identity-bound zero-knowledge proofs, achieving an order-of-magnitude reduction in on-chain data while providing formal security guarantees against replay and substitution attacks.

Jian Sheng WangTue, 10 Ma💻 cs

RAPID: Redundancy-Aware and Compatibility-Optimal Edge-Cloud Partitioned Inference for Diverse VLA models

The paper introduces RAPID, a novel Edge-Cloud Collaborative inference framework designed to optimize the deployment of Vision Language Action models by addressing visual noise interference and step-wise task redundancy, thereby achieving up to a 1.73x speedup with minimal overhead.

Zihao Zheng, Sicheng Tian, Hangyu Cao, Chenyue Li, Jiayu Chen, Maoliang Li, Xinhao Sun, Hailong Zou, Guojie Luo, Xiang ChenTue, 10 Ma💻 cs

SageSched: Efficient LLM Scheduling Confronting Demand Uncertainty and Hybridity

SageSched is an efficient LLM scheduler that addresses demand uncertainty and hybrid resource requirements by integrating lightweight output-length prediction with a comprehensive cost model and an uncertainty-aware policy, achieving over 28.7% efficiency improvement in diverse testbed experiments.

Zhenghao Gan, Yichen Bao, Yifei Liu, Chen Chen, Quan Chen, Minyi GuoTue, 10 Ma💻 cs

The Consistency Correctness in CoPPar Tree

This supplementary document provides a detailed correctness proof for the CoPPar architecture, which was introduced in the main CoPPar Tree paper.

Xincheng Yang, Kyle HaleTue, 10 Ma💻 cs

← Previous Next →

cs.DC