cs.DC papers | Gist.Science

Reexamining Paradigms of End-to-End Data Movement

This paper argues that achieving high-performance end-to-end data movement requires shifting focus from raw network bandwidth to a holistic hardware-software co-design approach, introducing the "Drainage Basin Pattern" to identify and resolve bottlenecks across six critical paradigms ranging from network latency to host-side factors.

Chin Fang, Timothy Stitt, Michael J. McManus, Toshio Moriya | Mon, 09 Ma | 💻 cs

A Systematic Evaluation of the Potential of Carbon-Aware Execution for Scientific Workflows

This paper systematically evaluates the potential of carbon-aware execution strategies for scientific workflows, demonstrating that leveraging their inherent flexibility through temporal shifting and dynamic resource scaling can reduce carbon emissions by over 80% and 67%, respectively.

Kathleen West, Youssef Moawad, Fabian Lehmann, Vasilis Bountris, Ulf Leser, Yehia Elkhatib, Lauritz Thamsen | Mon, 09 Ma | 💻 cs
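Temporal shifting, the first strategy in the summary above, can be pictured as a sliding-window search over a carbon-intensity forecast. This is a minimal sketch under stated assumptions, not the authors' evaluation method: the function name `best_start_hour`, the hourly granularity, and the assumption that a job draws constant power (so its emissions scale with the summed intensity of the hours it runs in) are all hypothetical.

```python
from typing import Sequence


def best_start_hour(intensity: Sequence[float], duration: int, deadline: int) -> int:
    """Pick the start hour that minimizes summed carbon intensity for one job.

    intensity: forecast carbon intensity per hour (e.g. gCO2/kWh), hour 0 = now.
    duration:  job length in whole hours (constant power draw assumed).
    deadline:  the job must finish by this hour (exclusive end index).
    """
    if duration <= 0 or duration > deadline or deadline > len(intensity):
        raise ValueError("no feasible window for this job")
    # Cost of starting immediately, i.e. the carbon-unaware baseline.
    window = sum(intensity[:duration])
    best_start, best_cost = 0, window
    # Slide the window one hour at a time: add the newest hour, drop the oldest.
    for start in range(1, deadline - duration + 1):
        window += intensity[start + duration - 1] - intensity[start - 1]
        if window < best_cost:
            best_start, best_cost = start, window
    return best_start
```

With a made-up forecast of `[420, 380, 300, 210, 190, 250, 400, 450]`, a two-hour job due by hour 8 is placed at hour 3; tightening the deadline to hour 3 moves it to hour 1. The deadline is what bounds how much flexibility the scheduler can exploit.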

A Hierarchical Sharded Blockchain Balancing Performance and Availability

This paper introduces PyloChain, a hierarchical sharded blockchain that balances performance and availability by utilizing speculative execution across local chains and a DAG-based main chain to efficiently handle global transactions while ensuring data availability and fault tolerance.

Yongrae Jo, Chanik Park | Mon, 09 Ma | 💻 cs

Linear Layouts: Robust Code Generation of Efficient Tensor Computation Using $\mathbb{F}_2$

This paper introduces "Linear Layouts," a novel framework that models tensor layouts as linear algebra operations over $\mathbb{F}_2$ to enable generic, efficient, and bug-free layout definitions and conversions for deep learning workloads, successfully integrating with the Triton compiler to overcome the limitations of existing case-by-case approaches.

Keren Zhou, Mario Lezcano, Adam Goucher, Akhmed Rakhmati, Jeff Niu, Justin Lebar, Pawel Szczerbuk, Peter Bell, Phil Tillet, Thomas Raoux, Zahi Moudallal | Mon, 09 Ma | 💻 cs
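The idea of treating a layout as a linear map over $\mathbb{F}_2$ can be illustrated with bit masks: each bit of a hardware index is sent to a fixed output bit pattern, and the images of all set bits are combined with XOR (addition in $\mathbb{F}_2$). This toy sketch is not Triton's API or the paper's internal representation; the helper names `apply_layout` and `compose` and the 4x4 XOR-swizzle example are hypothetical.

```python
from typing import List

# A layout over F2 stored column by column: columns[i] is the bit mask
# that input bit i maps to (hypothetical representation for illustration).
Layout = List[int]


def apply_layout(columns: Layout, index: int) -> int:
    """Matrix-vector product over F2: XOR together the images of the set input bits."""
    out = 0
    for i, col in enumerate(columns):
        if (index >> i) & 1:
            out ^= col
    return out


def compose(outer: Layout, inner: Layout) -> Layout:
    """Layout equal to applying `inner` first, then `outer` (matrix product over F2)."""
    # By linearity, the image of input bit i is outer(inner(e_i)).
    return [apply_layout(outer, col) for col in inner]


# 4x4 tile: index bits are (col0, col1, row0, row1); output bits use the same order.
# The swizzle keeps the row and sets col' = col XOR row.
SWIZZLE: Layout = [0b0001, 0b0010, 0b0101, 0b1010]
```

Row 1, column 0 (`0b0100`) maps to row 1, column 1 (`0b0101`). Because XOR-ing the row into the column twice cancels out, `compose(SWIZZLE, SWIZZLE)` is the identity `[1, 2, 4, 8]`, so converting into and back out of this layout is a no-op.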

FAST: An Efficient Scheduler for All-to-All GPU Communication

FAST is an efficient scheduler designed to overcome the scalability and performance limitations of existing solutions for All-to-All(v) communication in dynamic Mixture-of-Experts workloads by addressing traffic skew and incast congestion while drastically reducing synthesis time on modern GPU clusters.

Yiran Lei, Dongjoo Lee, Liangyu Zhao, Daniar Kurniawan, Chanmyeong Kim, Heetaek Jeong, Changsu Kim, Hyeonseong Choi, Liangcheng Yu, Arvind Krishnamurthy, Justine Sherry, Eriko Nurvitadhi | Mon, 09 Ma | 💻 cs

λScale: Enabling Fast Scaling for Serverless Large Language Model Inference

λScale is an efficient serverless inference system that accelerates large language model scaling by leveraging high-speed RDMA networks for fast model multicast and enabling "execute-while-load" distributed inference, thereby significantly reducing tail latency and costs compared to state-of-the-art solutions.

Minchen Yu, Rui Yang, Chaobo Jia, Zhaoyuan Su, Sheng Yao, Tingfeng Lan, Yuchen Yang, Zirui Wang, Yue Cheng, Wei Wang, Ao Wang, Ruichuan Chen | Mon, 09 Ma | 💻 cs

Comparative Analysis of Cross-Chain Token Standards

This paper provides a comprehensive comparative analysis of five leading cross-chain token standards—xERC20, OFT, NTT, CCT, and SuperchainERC20—examining their architectural designs, message-passing mechanisms, and security features to highlight their distinct implementation approaches, trust models, and target ecosystems despite their shared goal of seamless cross-chain fungibility.

Fatemeh Heidari Soureshjani, Jan Gorzny | Mon, 09 Ma | 💻 cs

Provuse: Platform-Side Function Fusion for Performance and Efficiency in FaaS Environments

This paper introduces Provuse, a transparent platform-side optimization for FaaS environments that automatically fuses independently deployed functions at runtime to eliminate redundant instances, thereby reducing end-to-end latency by an average of 26.33% and RAM usage by 53.57% without requiring any code changes from developers.

Niklas Kowallik, Natalie Carl, Leon Pöllinger, Wei Wang, Sharan Santhahanam, David Bermbach | Mon, 09 Ma | 💻 cs

Edge Intelligence-Driven LegalEdge Contracts for EV Charging Stations: A Federated Learning with Deep Q-Networks Approach

This paper introduces LegalEdge, an edge intelligence framework that combines Federated Learning and Deep Q-Networks within blockchain smart contracts to optimize electric vehicle charging through privacy-preserving, real-time dynamic pricing and autonomous energy allocation.

Rahim Rahmani, Arman Chianeh | Mon, 09 Ma | 💻 cs

Knowledge-driven Reasoning for Mobile Agentic AI: Concepts, Approaches, and Directions

This paper proposes a knowledge-driven reasoning framework for mobile agentic AI that extracts and synchronizes reusable decision structures to optimize on-device performance under resource and connectivity constraints, demonstrating that an optimal, non-monotonic level of knowledge injection significantly enhances mission reliability and efficiency compared to existing approaches.

Guangyuan Liu, Changyuan Zhao, Yinqiu Liu, Dusit Niyato, Biplab Sikdar | Mon, 09 Ma | 💻 cs

Gathering Autonomous Mobile Robots Under the Adversarial Defected View Model

This paper presents two distributed algorithms that guarantee deterministic finite-time gathering of $N$ oblivious autonomous mobile robots in the Euclidean plane under the adversarial defected view model, succeeding in the fully synchronous setting under a $(4, 2)$ fault constraint and in the asynchronous setting under a general $(N, K)$ fault constraint, both with non-rigid motion.

Prakhar Shukla, Seshunadh Tanuj Peddinti, Subhash Bhagat | Mon, 09 Ma | 💻 cs

Why Ethereum Needs Fairness Mechanisms that Do Not Depend on Participant Altruism

This paper argues that Ethereum's decentralization and censorship-resistance ideals cannot be restored by relying on altruistic block proposers, as empirical analysis reveals that fewer than 1.4% of proposers consistently act in accordance with these objectives, thereby necessitating incentive- or penalty-based fairness mechanisms.

Patrick Spiesberger, Nils Henrik Beyer, Hannes Hartenstein | Mon, 09 Ma | 💻 cs

A Lock-Free Work-Stealing Algorithm for Bulk Operations

This paper presents a specialized lock-free work-stealing queue designed for a master-worker framework in mixed-integer programming solvers that leverages restricted concurrency assumptions to support native bulk operations and achieve constant-latency push performance, significantly outperforming general-purpose implementations like C++ Taskflow in batch processing scenarios.

Raja Sai Nandhan Yadav Kataru, Danial Davarnia, Ali Jannesari | Mon, 09 Ma | 🔢 math

Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks

This paper investigates parallelization strategies for deploying dense LLMs, demonstrating that while Tensor Parallelism optimizes latency and Pipeline Parallelism enhances throughput, a hybrid approach allows for effective control over the inherent latency-throughput tradeoff to meet specific application requirements.

Burak Topcu, Musa Oguzhan Cim, Poovaiah Palangappa, Meena Arunachalam, Mahmut Taylan Kandemir | Mon, 09 Ma | 🤖 cs.LG

First-Order Softmax Weighted Switching Gradient Method for Distributed Stochastic Minimax Optimization with Stochastic Constraints

This paper proposes a first-order Softmax-Weighted Switching Gradient method for distributed stochastic minimax optimization under stochastic constraints, achieving optimal oracle complexity and high-probability convergence guarantees in both full and partial client participation settings while avoiding the instability of traditional primal-dual approaches.

Zhankun Luo, Antesh Upadhyay, Sang Bin Moon, Abolfazl Hashemi | Mon, 09 Ma | 🤖 cs.LG

A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation

The paper introduces A-3PO, a method that accelerates asynchronous LLM training by 1.8x by approximating the computationally expensive proximal policy in Decoupled PPO through simple interpolation, thereby eliminating the need for extra forward passes while maintaining comparable performance.

Xiaocan Li, Shiliang Wu, Zheng Shen | Mon, 09 Ma | 🤖 cs.AI

MoEless: Efficient MoE LLM Serving via Serverless Computing

MoEless is a novel serverless serving framework that mitigates expert load imbalance in Mixture-of-Experts (MoE) large language models through proactive load prediction and dynamic scaling, achieving significant reductions in inference latency and cost compared to existing solutions.

Hanfei Yu, Bei Ouyang, Shwai He, Ang Li, Hao Wang | Mon, 09 Ma | 🤖 cs.AI
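Proactive load prediction followed by dynamic scaling, as summarized for MoEless above, can be pictured as a two-step loop: forecast each expert's next-step load, then size its replica pool to fit. This is a minimal sketch under loud assumptions, not MoEless's predictor or scaling policy: the exponential-moving-average forecaster, the fixed `tokens_per_replica` capacity, and the names `ema_forecast` and `plan_replicas` are all hypothetical.

```python
import math
from typing import Dict, List


def ema_forecast(history: List[float], alpha: float = 0.5) -> float:
    """Hypothetical predictor: exponential moving average of past per-step token counts."""
    if not history:
        return 0.0
    est = history[0]
    for x in history[1:]:
        # Weight the newest observation by alpha, the running estimate by 1 - alpha.
        est = alpha * x + (1 - alpha) * est
    return est


def plan_replicas(histories: Dict[str, List[float]], tokens_per_replica: float,
                  min_replicas: int = 1) -> Dict[str, int]:
    """Choose a replica count per expert so its predicted load fits the assumed capacity."""
    if tokens_per_replica <= 0:
        raise ValueError("tokens_per_replica must be positive")
    plan = {}
    for expert, hist in histories.items():
        predicted = ema_forecast(hist)
        # Scale out ahead of demand; never drop below the warm minimum.
        plan[expert] = max(min_replicas, math.ceil(predicted / tokens_per_replica))
    return plan
```

For histories `{"e0": [100, 100, 100], "e1": [400, 800, 1200], "e2": [0, 0, 0]}` and a capacity of 256 tokens per replica, the plan is `{"e0": 1, "e1": 4, "e2": 1}`: the hot expert gets more replicas while idle ones stay at the minimum. An EMA lags a rising trend (it predicts 900 tokens for `e1`), so a practical predictor would likely need to account for trends.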

Domain-Adaptive Model Merging across Disconnected Modes

The paper introduces DMM, a data-free framework that merges highly divergent domain-specific models by first consolidating similar ones and then refining the result with synthesized pseudo-data to achieve state-of-the-art performance across unimodal and multimodal benchmarks without requiring centralized data.

Junming Liu, Yusen Zhang, Rongchao Zhang, Wenkai Zhu, Tian Wu | Mon, 09 Ma | 🤖 cs.AI

StreamWise: Serving Multi-Modal Generation in Real-Time at Scale

StreamWise is an adaptive, modular serving system that leverages heterogeneous hardware and dynamic resource management to enable cost-effective, high-quality real-time multi-modal generation (such as podcast videos) with sub-second startup delays, overcoming the latency and complexity challenges of coordinating diverse models at scale.

Haoran Qiu, Gohar Irfan Chaudhry, Chaojie Zhang, Íñigo Goiri, Esha Choukse, Rodrigo Fonseca, Ricardo Bianchini | Mon, 09 Ma | 🤖 cs.AI

Radiation Hydrodynamics at Scale: Comparing MPI and Asynchronous Many-Task Runtimes with FleCSI

This paper benchmarks the FleCSI framework's MPI, Legion, and HPX backends using Poisson and radiation hydrodynamics applications on up to 1024 nodes, revealing that while the MPI backend offers superior scalability for communication-heavy tasks, the HPX backend delivers significant performance gains (up to 1.64x speedup) for computation-intensive hydrodynamics workloads on smaller node counts.

Alexander Strack, Hartmut Kaiser, Dirk Pflüger | 2026-03-06 | 💻 cs

