Research
My list of publications is now extracted from HAL.
Research Topics
Although our everyday life and society now depend heavily on communication and computation infrastructures, scientists and engineers have always been among the main consumers of computing power. My research targets the management and performance evaluation of large-scale distributed computing infrastructures (clusters, grids, desktop grids, volunteer computing platforms, clouds, …) when used for scientific computing. More specifically, I am interested in understanding how to make better use of these platforms and, possibly, how to extend their applicability to workloads other than those for which they are already used efficiently. Although my motivations are quite practical, my work is mostly theoretical. Whenever possible, it is done in connection with practitioners so as to keep my modeling assumptions as reasonable as possible.
Scheduling for Distributed Platforms
I study scheduling problems arising on distributed platforms (like computing grids) with a particular emphasis on heterogeneity and multi-user issues, hence some background in game theory.
During my PhD thesis, I initially worked on scheduling and parallel algorithms for dense linear algebra kernels on heterogeneous platforms (IJHPCA01, ParCo02), but my main results were obtained in the context of steady-state scheduling, i.e., throughput optimization instead of the more classical makespan minimization (TPDS03, TPDS04, JPDC05, TPDS05-2), and of divisible load scheduling (ParCo03, TPDS05-1). These two models are relaxed versions of more classical scheduling frameworks. They make it easy to account for key platform characteristics such as heterogeneity or complex topology while still providing efficient practical solutions.
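To make the steady-state viewpoint concrete, here is a minimal sketch on a toy master-worker platform. Everything in it is an assumption made for illustration (the one-port star platform, the `steady_state_throughput` name, the `(c, w)` cost pairs and the numbers), and it is not meant to reproduce the formulation of the papers cited above. The master can send to only one worker at a time; sending one task to a worker takes `c` time units and computing it there takes `w`. Instead of building a full schedule and minimizing its makespan, the sketch looks for per-worker task rates that maximize the number of tasks completed per time unit, which in this toy setting reduces to a fractional-knapsack greedy.

```python
from fractions import Fraction


def steady_state_throughput(workers):
    """Toy steady-state throughput on a one-port star platform (illustrative only).

    workers: list of (c, w) pairs with c > 0 and w > 0, where
      c = time the master needs to send one task to that worker,
      w = time that worker needs to compute one task.
    Returns (total_rate, rates), where rates[i] is the number of tasks
    per time unit assigned to worker i.
    """
    rates = [Fraction(0)] * len(workers)
    # Fraction of each time unit during which the master's port is still free.
    port_time_left = Fraction(1)

    # Serving workers by increasing communication cost is optimal here:
    # every task is worth the same, so the cheapest ones to send go first.
    for i in sorted(range(len(workers)), key=lambda k: workers[k][0]):
        c, w = Fraction(workers[i][0]), Fraction(workers[i][1])
        # A worker cannot compute more than 1/w tasks per time unit, and the
        # master cannot spend more port time than what is left.
        rate = min(1 / w, port_time_left / c)
        rates[i] = rate
        port_time_left -= rate * c
        if port_time_left == 0:
            break

    return sum(rates), rates


if __name__ == "__main__":
    total, rates = steady_state_throughput([(1, 3), (2, 2), (4, 1)])
    print("throughput:", total)
    print("per-worker rates:", rates)
```

With the workers `[(1, 3), (2, 2), (4, 1)]`, the master's port is saturated by the first two workers and the throughput is 2/3 task per time unit. The third worker, although the fastest at computing, receives nothing because reaching it is too expensive.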
Since then, I have been particularly interested in trying to incorporate a notion of user in such scheduling problems, in particular using classical game theory notions:
- Non-Cooperative Throughput Optimization (Infocom07, GameComm07) with Corinne Touati.
- Max-min Fair Throughput Optimization (IPDPS06, TPDS08) with Yves Robert, Loris Marchal, Olivier Beaumont, Larry Carter and Jeanne Ferrante.
- Proportionally Fair Distributed Throughput Optimization (Grid08, JPDC13) with Corinne Touati, Sascha Hunold, and Rémi Bertin.
- Centralized Response Time Optimization (HCW05, SPAA06, JoS08) with Frédéric Vivien and Alan Su.
- Non-Cooperative Throughput and Response Time Optimization (CCGrid11) with Bruno Donassolo and Claudio Geyer.
Some of this work has been part of the ANR ALPAGE and ANR DOCCA projects. It is also the result of my participation in the CloudShare and CloudComputing@home associated teams with Berkeley.
Simulation and Performance Evaluation of Distributed Platforms
I have been one of the main developers of the SimGrid project since 2000. SimGrid is a simulation toolkit for building simulators of distributed applications (originally designed for evaluating scheduling algorithms). This software is developed in collaboration with Henri Casanova, Martin Quinson and Frédéric Suter.
The official website is simgrid.gforge.inria.fr. We also have a mailing list for SimGrid users.
We try to provide high-quality software and to study the validity of such simulations. My most notable scientific contributions to this software are the following:
- Deep assessment of the validity of fluid network models with my former PhD student Pedro Velho.
- Fast and scalable implementation of fluid network models.
- Reliable performance prediction capabilities in the context of complex HPC applications with former and current PhD students (Luka Stanisic, Christian Heinrich, Tom Cornebize,…) and engineers (in particular Augustin Degomme) and numerous other colleagues.
Since 2009, I have been working with Lucas Schnorr and Jean-Marc Vincent on visualization and trace analysis (Triva, Viva, Paje…).
All this work has been supported by INRIA through ADTs and ODL and by the ANR USS-SimGrid and ANR SONGS projects.
Former and Current Students and Collaborators
This list only includes graduate students (Masters and above). I only recently realized that I should maintain it, so I hope I did not forget anyone.
- Pedro Bruel (co-tutelle with USP 2017-…): Design of experiments and autotuning of HPC computation kernels (co-advised with Alfredo Goldman and Brice Videau, funded by the Brazilian Government).
- Tom Cornebize (2017-…): Capacity planning and performance evaluation of supercomputers (funded by the French Ministry for Research).
- Bruno Luis de Moura Donassolo (CIFRE Orange 2017-…): Decentralized management of applications in Fog computing environments (co-advised with Panayotis Mertikopoulos and Ilhem Fajari, funded by Orange).
- Vinicius Garcia Pinto (co-tutelle with UFRGS 2013-…): Performance analysis and visualization of dynamic task-based applications (co-advised with Lucas Schnorr and Nicolas Maillard, funded by the Brazilian government).
- Christian Heinrich (2015-…): Modeling of performance and energy consumption of HPC systems (funded by Inria).
- Luka Stanisic (MSc + PhD, 2012-2015): Performance evaluation, modeling and simulation of HPC systems; Experimental methodology and reproducible research.
- Rafael Tesser (co-tutelle with UFRGS 2013-…): Simulation and performance evaluation of dynamic load balancing of an over-decomposed Geophysics application (co-advised with Lucas Schnorr and Philippe Navaux, funded by the Brazilian government).
- Augustin Degomme (Eng. 2012-2015): Simulation/performance prediction of MPI applications
- Sascha Hunold (Post-doc 2011-2012): Design of Experiments, Reproducible Research, Fair Scheduling of Bag-of-Tasks Applications Using Distributed Lagrangian Optimization
- Lucas Schnorr (Post-doc 2009-2012): Tracing, observation and visualization of large scale distributed systems.
- Wagner Kolberg (MSc 2012 from UFRGS): Faithful Modeling of MapReduce Applications
- Pierre Navarro (Eng. 2010-2012): Improvement of the SimGrid Framework (scalability, robustness, new features, …)
- Pedro Velho (PhD 2006-2011): Accurate and Fast Simulations of Large-Scale Distributed Computing Systems (co-advised with Jean-François Méhaut).
- Lionel Eyraud-Dubois (Post-doc 2007): Automatically Building Sound Network Representations
- Rémi Bertin (PhD 2007-2009, interrupted): Collaboration Mechanisms in Peer-to-Peer and Collaborative Computing Systems
- Bruno Donassolo (MSc 2007-2009): Design and Implementation of a Scalable Scheduler for the SimGrid Project; Study of Non-Cooperative Optimization in Volunteer Computing Systems.
- Rémi Vannier (MSc 2006): Proportionally Fair and Distributed Scheduling of Multiple Bag-Of-Task Applications
- Darina Dimitrova (MSc 2006): Application-level Network Topology Discovery in Grid Computing Platforms