P. Mertikopoulos and M. Staudigl. In CDC '21: Proceedings of the 60th IEEE Annual Conference on Decision and Control.
In this paper, we examine the equilibrium tracking and convergence properties of no-regret learning algorithms in continuous games that evolve over time. Specifically, we focus on learning via “mirror descent”, a widely used class of no-regret learning schemes where players take small steps along their individual payoff gradients and then “mirror” the output back to their action sets. In this general context, we show that the induced sequence of play stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone), and converges to it if the game stabilizes to a strictly monotone limit. Our results apply to both gradient- and payoff-based feedback, i.e., the “bandit” case where players only observe the payoffs of their chosen actions.