P. Mertikopoulos, Y.-P. Hsieh, and V. Cevher. Working paper.
We develop a unified stochastic approximation framework for analyzing the long-run behavior of online learning in games. Our framework is based on a flexible primal-dual Robbins–Monro (PDRM) template which encompasses a wide array of popular game-theoretic learning algorithms, from gradient methods and their optimistic variants to the exponential/multiplicative weights algorithm and its bandit adaptations for learning in finite games. In addition to providing an integrated view of these algorithms, the proposed PDRM framework allows us to obtain a broad range of new convergence results, both asymptotic and in finite time, in both continuous and finite games. Specifically, we provide a range of criteria for identifying sets of action profiles that are attracting with high probability, and we introduce the notion of coherence, a game-theoretic property that guarantees convergence in finite time, despite all the randomness involved. Importantly, our analysis applies to both oracle-based and bandit, payoff-based methods, that is, when players only observe their realized payoffs.
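As a concrete illustration of the exponential/multiplicative weights algorithm mentioned in the abstract, here is a minimal sketch of one update step. The step size `eta`, the uniform initialization, and the two-action payoff vector are illustrative assumptions for this example, not the paper's PDRM template itself:

```python
import math

def mw_update(weights, payoffs, eta=0.1):
    # Exponential / multiplicative weights step: reinforce each action
    # in proportion to exp(eta * observed payoff), then renormalize
    # so the weights remain a probability distribution.
    new_w = [w * math.exp(eta * u) for w, u in zip(weights, payoffs)]
    total = sum(new_w)
    return [w / total for w in new_w]

# Two actions; the second earns a higher payoff, so its weight grows.
mixed = mw_update([0.5, 0.5], [0.0, 1.0])
```

Iterating this step on the players' (possibly noisy) payoff observations yields the learning dynamics whose long-run behavior the paper's framework analyzes.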
arXiv link: https://arxiv.org/abs/2206.03922