Communication-Efficient Decentralized Optimization via Double-Communication Symmetric ADMM

This paper proposes a novel communication-efficient decentralized symmetric ADMM algorithm that uses multiple communication rounds per iteration to achieve linear convergence and reduce total communication cost compared with existing methods.

Jinrui Huang, Xueqin Wang, Dong Liu, Jingguo Lan, Runxiong Wu

Published 2026-03-06

Imagine a group of friends trying to solve a massive jigsaw puzzle together, but they are all in different rooms and can only talk to the friends sitting right next to them. There is no "boss" in the middle telling everyone what to do. This is the world of Decentralized Optimization.

The problem? If they only talk once per round of thinking, the puzzle takes forever to finish. If they talk too much, they get tired from shouting across the rooms.

This paper introduces a new way for these friends to work together called DS-ADMM (Double-Communication Symmetric ADMM). Here is the simple breakdown of how it works and why it's a game-changer.

1. The Old Way: The "One-and-Done" Rule

In the past, most decentralized algorithms followed a strict rule: Think, then talk once.

  • Step 1: Everyone looks at their own puzzle pieces and makes a guess.
  • Step 2: Everyone whispers their guess to their immediate neighbors.
  • Step 3: Everyone updates their guess based on what they heard.
  • Repeat.
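The think-then-talk-once loop above can be sketched as plain decentralized gradient descent, a standard baseline rather than this paper's method. The ring network, weights, step size, and local objectives below are invented toy values for illustration:

```python
import numpy as np

# Toy network (invented for illustration): 5 nodes on a ring, each
# averaging equally with itself and its two neighbors.
n = 5
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

def dgd(a, W, alpha=0.01, iters=3000):
    """One 'whisper' per round: mix with neighbors (W @ x), then take a
    local gradient step. Node i holds f_i(x) = 0.5 * (x - a[i])**2, so
    the network's joint optimum is mean(a)."""
    x = a.astype(float).copy()          # Step 1: each node starts from its own data
    for _ in range(iters):
        x = W @ x - alpha * (x - a)     # Step 2 (talk once) + Step 3 (update)
    return x

a = np.arange(1.0, 6.0)                 # node i's private data: 1, 2, ..., 5
x = dgd(a, W)                           # every node drifts toward mean(a) = 3
```

Note how slowly agreement spreads here: node 1's information reaches node 3 only after two full rounds, which is exactly the whispering bottleneck described above.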

The problem is that this "whisper" only reaches the people sitting right next to you. It takes hundreds of rounds for a piece of information to travel from one side of the room to the other. It's like trying to pass a message down a long line of people by whispering; by the time it gets to the end, it's slow and the message might get distorted.

2. The New Idea: The "Double-Check" Strategy

The authors realized that while talking more sounds like it would waste time, talking smarter actually saves time. They proposed a method where, inside a single "round" of work, the friends talk twice.

Think of it like a relay race where a runner passes the baton, and the receiver immediately hands it one runner further down the line before the lap officially ends.

  • The Innovation: Instead of just mixing their own data with their neighbors' data once, they mix it, pass it along, and then mix it again within the same step.
  • The Result: Information travels much faster across the network. It's as if the friends suddenly developed a way to "teleport" their ideas a few steps further in a single second.
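A spectral way to see why talking twice helps (this is an intuition sketch, not the paper's DS-ADMM update): two gossip rounds per iteration act like mixing with W @ W, whose second-largest eigenvalue modulus is the square of W's, so disagreement between nodes decays roughly twice as fast per iteration. The 5-node ring below is an invented toy mixing matrix:

```python
import numpy as np

# Invented toy mixing matrix: 5 nodes on a ring, equal weights.
n = 5
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

# How fast information spreads is governed by the second-largest
# eigenvalue modulus of the mixing matrix (smaller = faster consensus).
eig_single = np.sort(np.abs(np.linalg.eigvalsh(W)))[-2]      # one talk per round
eig_double = np.sort(np.abs(np.linalg.eigvalsh(W @ W)))[-2]  # two talks per round
```

On its own, squaring the mixing matrix roughly breaks even (twice the messages, half the rounds); the savings the paper claims come from pairing the double communication with the symmetric ADMM structure described in the next section.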

3. The "Symmetric" Secret Sauce

Why does this work without causing chaos? The authors used a mathematical trick called Symmetric ADMM.

Imagine two teams of dancers (Team A and Team B) trying to synchronize their moves.

  • Old methods: Team A moves, then Team B copies them. Then Team B moves, and Team A copies them. It's a bit lopsided and can get out of sync.
  • This paper's method: They use a Symmetric approach. Team A and Team B move in perfect harmony, mirroring each other's steps. They update their positions and their "dual" (the plan for the next move) simultaneously and equally.

This symmetry acts like a stabilizer. It prevents the group from going off-track, allowing them to take bigger, bolder steps toward the solution without falling over.
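For the mathematically curious, the "mirrored dance" can be sketched as a minimal single-machine symmetric ADMM on a two-block toy problem. The functions, penalty parameter rho, and update formulas below are invented for illustration; this is not the paper's decentralized algorithm:

```python
# Toy symmetric ADMM for  min f(x) + g(z)  subject to  x = z,
# with f(x) = 0.5*(x - 1)**2 and g(z) = 0.5*(z - 3)**2 (optimum: x = z = 2).
# Hallmark of the symmetric variant: the dual variable u is updated TWICE
# per iteration, once between the two primal blocks and once after,
# with equal steps -- the two teams "mirror" each other.
rho = 1.0                                    # penalty parameter (invented)
x = z = u = 0.0
for _ in range(50):
    x = (1.0 + rho * (z - u)) / (1.0 + rho)  # Team A moves (minimize over x)
    u = u + (x - z)                          # first (half) dual update
    z = (3.0 + rho * (x + u)) / (1.0 + rho)  # Team B moves (minimize over z)
    u = u + (x - z)                          # second, symmetric dual update
```

Because both primal blocks see a dual update of equal weight, neither "team" lags behind the other, which is the balance that lets the method take larger steps without destabilizing.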

4. The "Optimal Messenger"

You might think, "If they talk twice, they must be sending double the messages!"
The authors were clever: they designed a compact communication rule, a kind of shorthand code.

  • Instead of shouting the whole puzzle piece (which is huge data), they shout a tiny, summarized "hint" (a small vector).
  • They figured out the exact minimum amount of information needed to make the double-talk work.
  • Analogy: Imagine instead of sending a whole photo of the puzzle piece, you just send a text message saying, "The edge is blue." It conveys the necessary info with almost zero effort.

5. The Big Payoff: Speed vs. Effort

Here is the magic trade-off:

  • Per Round Cost: Yes, each round is slightly more expensive because they talk twice.
  • Total Cost: Because the information spreads so much faster, the group reaches the solution in far fewer rounds.
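To make the trade-off concrete, here is back-of-the-envelope arithmetic with invented numbers (the actual round counts depend on the network and the problem, and are not taken from the paper):

```python
# Invented illustration, not measurements from the paper.
msgs_per_round_old, rounds_old = 1, 1000   # talk once per round, many rounds
msgs_per_round_new, rounds_new = 2, 300    # talk twice per round, far fewer rounds

total_old = msgs_per_round_old * rounds_old  # 1000 messages in total
total_new = msgs_per_round_new * rounds_new  #  600 messages in total
savings = 1 - total_new / total_old          # 40% less total communication
```

The break-even point is simple: doubling the per-round cost pays off whenever it cuts the number of rounds by more than half.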

The Analogy:
Imagine you are trying to clean a huge house.

  • Old Method: You and your friends clean one room, then check with the next room. It takes 100 hours.
  • New Method: You and your friends clean a room, but during the cleaning you also quickly check with the rooms two and three doors down. Checking takes a few extra minutes per room, yet you finish the whole house in 20 hours, because you never have to wait for the "cleaning wave" to slowly crawl across the house.

Summary

This paper proves that by letting decentralized networks talk twice in a single step, using a symmetric (balanced) mathematical structure and smart, minimal messaging, we can solve complex machine learning problems much faster.

It's a win-win: The computers do less total work, send less total data, and get the answer sooner. It turns a slow, whispering chain into a fast, synchronized chorus.