P. Mertikopoulos. Habilitation à diriger des recherches (HDR) en Informatique et Mathématiques Appliquées, Université Grenoble Alpes, 2019.
The bane of decision-making in an unknown environment is that of regret: no rational agent would want to realize in hindsight that the decision policy they employed was strictly inferior to a crude policy prescribing the same action throughout. As such, the minimization of regret has become the linchpin of an extensive literature on online learning and optimization, with far-reaching applications in machine learning, operations reserach, network science, and many other fields where online decision-making plays a predominant role.
The aim of this HDR dissertation is to examine the ramifications of no-regret learning in multi-agent environments, i.e., when the agents’ rewards are determined by their interactions. In this context, a natural question that arises is whether no-regret learning always leads to a rationally admissible states – such as a Nash equilibrium. We provide a range of answers to this question, both positive and negative, and we examine in detail how these answers are affected by the structure of the underlying game, the information available to the players, the time-scales involved, etc. We also discuss a range of concrete applications to large-scale distributed machine learning problems and multiple-input/multiple-output systems in signal processing.