Z. Zhou, P. Mertikopoulos, A. L. Moustakas, N. Bambos, and P. W. Glynn. In CDC '17: Proceedings of the 56th IEEE Annual Conference on Decision and Control, 2017.
Online mirror descent (OMD) is an important and widely used class of adaptive learning algorithms that enjoys good regret performance guarantees. It is therefore natural to study the evolution of the joint action in a multi-agent decision process (typically modeled as a repeated game) where every agent employs an OMD algorithm. This well-motivated question has received much attention in the literature that lies at the intersection between learning and games. However, much of the existing literature has been focused on the time average of the joint iterates. In this paper, we tackle a harder problem that is of practical utility, particularly in the online decision making setting: the convergence of the last iterate when all the agents make decisions according to OMD. We introduce an equilibrium stability notion called variational stability (VS) and show that in variationally stable games, the last iterate of OMD converges to the set of Nash equilibria. We also extend the OMD learning dynamics to a more general setting where the exact gradient is not available and show that the last iterate (now random) of OMD converges to the set of Nash equilibria almost surely.