Z. Zhou, P. Mertikopoulos, N. Bambos, P. W. Glynn, and C. Tomlin. Working paper.
Multi-agent online learning revolves around multiple interacting agents that make sequential decisions online, each concerned with maximizing their individual rewards (which, a priori, typically depend on the actions of all other players). In this paper, we consider a model of multi-agent online learning where the game is not known in advance, and the agents’ feedback is subject to both noise and delays. Motivated by its strong no-regret properties, we first focus on a class of learning algorithms known as online mirror descent (OMD), and we show that, even in the presence of noise, the induced sequence of play converges to Nash equilibria in a wide class of continuous games, provided that the feedback delays faced by the agents are synchronous and bounded. Subsequently, to tackle fully decentralized, asynchronous environments with unbounded feedback delays, we propose a variant of OMD, which we call delayed mirror descent (DMD), that relies on repeatedly leveraging past information. With this modification, the algorithm converges to Nash equilibria almost surely, even in noisy environments with no feedback synchronicity assumptions and with feedback delays that grow superlinearly in the game’s horizon.
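To make the OMD scheme concrete, the following is a minimal illustrative sketch (not the paper's exact method): a single agent on a probability simplex, using the entropic regularizer (whose mirror map is the softmax choice) and a vanishing step size, with additive Gaussian noise standing in for noisy payoff-gradient feedback. The function names, the noise model, and the fixed gradient are all assumptions made for the toy example.

```python
import numpy as np

def mirror_map(y):
    # Entropic regularizer on the simplex: the induced mirror (choice)
    # map is the softmax of the dual scores y.
    z = np.exp(y - y.max())  # shift for numerical stability
    return z / z.sum()

def omd_step(y, v, step):
    # OMD in dual form: accumulate the observed (noisy) payoff
    # gradient v in the dual scores, then mirror back to the simplex.
    y_next = y + step * v
    return y_next, mirror_map(y_next)

# Toy usage: one agent with 3 actions and noisy gradient feedback.
rng = np.random.default_rng(0)
y = np.zeros(3)
for t in range(1, 101):
    # Noisy feedback around a fixed payoff gradient favoring action 0.
    v = np.array([1.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(3)
    y, x = omd_step(y, v, step=1.0 / t)  # vanishing step size
```

In this toy run, the play x concentrates on the best action despite the noise; the delayed variant (DMD) studied in the paper would additionally reuse stale gradients that arrive out of order, which this sketch does not model.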
[Finalist for the INFORMS George Nicholson and Applied Probability Society awards]