Pick your neighbor: Local Gauss-Southwell rule for fast asynchronous decentralized optimization

[C81] - Pick your neighbor: Local Gauss-Southwell rule for fast asynchronous decentralized optimization

M. Costantini, N. Liakopoulos, P. Mertikopoulos, and T. Spyropoulos. In CDC '22: Proceedings of the 61st IEEE Annual Conference on Decision and Control, 2022.

Abstract

In decentralized optimization environments, each agent $i$ in a network of $n$ optimization nodes possesses its individual component function $f_i$, and all nodes in the network communicate with their neighbors to cooperatively minimize the aggregate objective $\sum_{i=1}^n f_{i}$. In this setting, synchronizing the nodes' updates incurs significant communication overhead and computational costs, so much of the recent literature has focused on the analysis and design of asynchronous optimization algorithms where agents activate and communicate with their neighbors at arbitrary times, without being managed by a global synchronization enforcer. Nonetheless, in most of the work on the topic, active nodes select a neighbor to contact based on a fixed probability (e.g., uniformly at random), a choice that ignores the optimization landscape at the moment of activation.

Instead, in this work we introduce an optimization-aware selection rule that chooses the neighbor with the highest dual cost improvement (a quantity related to a consensus-based dualization of the problem at hand). This scheme is related to the coordinate descent (CD) method with a Gauss-Southwell (GS) rule for coordinate updates. In our setting however, only a subset of coordinates is accessible at each iteration (because each node is constrained to communicate only with its direct neighbors), so the existing literature on GS methods does not apply. To overcome this difficulty, we develop a new analytical framework for smooth and strongly convex $f_i$ that covers the class of setwise CD algorithms – a class that directly applies to decentralized scenarios, but is not limited to them – and we show that the proposed setwise GS rule achieves a speedup by a factor of up to the maximum degree in the network (which is of the order of $\Theta(n)$ in highly connected graphs). The speedup predicted by our theoretical analysis is subsequently validated in a series of numerical experiments with synthetic data.

arXiv link: https://arxiv.org/abs/2207.07543