PUF/JLPC Summer school on Performance Metrics, Modeling and Simulation of Large HPC Systems




The Joint Laboratory for Petascale Computing focuses on software challenges found in complex high-performance computers. The Joint Laboratory is based at the University of Illinois at Urbana-Champaign and includes researchers from the French national computer science institute Inria, Argonne National Laboratory, Illinois' Center for Extreme-Scale Computation, and the National Center for Supercomputing Applications. The Barcelona Supercomputer Center also recently joined this effort.

The 11th workshop of the INRIA-Illinois-ANL Joint Laboratory will be held in Sophia Antipolis, on the French Riviera, from June 9th to June 11th, 2014.

The first summer school of the Joint Laboratory will follow at the same location on June 12th-13th. Both events are supported by the PUF in the context of the NEXTGN project. The theme of this summer school is performance metrics, modeling and simulation of large HPC systems. The school is aimed at students of the collaboration (mainly PhD students and post-docs, although Master's students may also be interested) and is open to students and faculty outside the collaboration if room allows. We are considering a maximum of 30 students.


Participants should register at the following address: http://registration.gipco-adns.com/site/2736.

Travel Information and Accommodation

Housing is funded by the JLPC and by the PUF. Travel will be funded within the limits of the school budget.


Modern computing systems have become increasingly complex and large-scale. Understanding their performance has thus become dramatically more difficult than it has ever been. The aim of this summer school is to present attendees with modern tools and techniques for studying the performance of large HPC systems. Four series of lectures (including hands-on sessions) on trace/workload analysis/visualization and system simulation/emulation will be given by domain experts. Although classical APIs (such as MPI, CHARM++, OmpSs) will be used to illustrate the techniques, the underlying problems are the same and remain relevant to any student working with HPC systems and less classical APIs or applications.

Day                      | Morning (8:30AM-12:00PM)                          | Afternoon (1:20PM-4:30PM)
Thursday, June 12th 2014 | HPC Application Tracing and Analysis (J. Labarta) | HPC Applications Performance Analysis and Debugging (S. Kale)
Friday, June 13th 2014   | Simulation of HPC systems (M. Quinson)            | Reproducible Research (A. Legrand)

The tutorials given during this summer school will include hands-on sessions. Attendees are thus expected to bring their own laptop. Software requirements are described for each tutorial in the next sections.

Software Requirements

Each tutorial has a specific software requirements section in case you want to run code natively. However, to make sure everyone has a working environment, we decided to set up Virtual Machines. There are two alternatives:


If you're running linux, you probably want to use this VM (debian_test.qcow2), which can be run as follows:

      sudo qemu-system-x86_64 -smp threads=3 -m 4G --enable-kvm debian_test.qcow2

If you're running Mac OS X, you can choose the easy way (i.e., use the VirtualBox image) or the hard one (use the QEMU image). Joseph Emeras told me it was possible to get qemu 2.0.0 by installing brew (http://brew.sh/) and then brew install qemu.
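Assuming Homebrew is already installed, the QEMU route on Mac OS X might look like the following sketch. Note that --enable-kvm is Linux-only, so it must be dropped here and the VM will run fully emulated (i.e., noticeably slower):

```shell
# Install QEMU via Homebrew (http://brew.sh/)
brew install qemu

# Run the Linux VM image without KVM acceleration
# (KVM does not exist on Mac OS X, hence no --enable-kvm flag)
qemu-system-x86_64 -smp threads=3 -m 4G debian_test.qcow2
```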

If you go the VirtualBox route, you probably want to use this VM (debian_test.vdi). You may want to select the Mac keyboard layout once logged in. Also make sure you give the VM enough RAM (e.g., 2GB) if you want to be able to work with it and possibly run a large data analysis. Here are, roughly, the steps for running it:
  • Run VirtualBox
  • Click the "New" button
  • Enter the name "PUF/JLPC VM";
  • Select "Linux" from the OS Type dropdown and "Debian (64 bit)" from the Version dropdown
  • Select "Next"
  • On the "Memory" panel, increase to at least 2GB and select "Next"
  • On the Virtual Hard Disk panel select "Existing" and then select the file with the small icon
  • Select the "debian_test.vdi" file that you just downloaded
  • Click "Open", "Create" and you're all set
  • Run the VM by double-clicking on it
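If you prefer the command line, the same setup can be sketched with VBoxManage, VirtualBox's CLI. The VM name and controller name below are arbitrary choices, not anything the images require:

```shell
# Create and register a 64-bit Debian VM (the name is an arbitrary choice)
VBoxManage createvm --name "PUF-JLPC-VM" --ostype Debian_64 --register

# Give the VM at least 2GB of RAM
VBoxManage modifyvm "PUF-JLPC-VM" --memory 2048

# Add a SATA controller and attach the downloaded disk image to it
VBoxManage storagectl "PUF-JLPC-VM" --name "SATA" --add sata
VBoxManage storageattach "PUF-JLPC-VM" --storagectl "SATA" \
  --port 0 --device 0 --type hdd --medium debian_test.vdi

# Boot the VM
VBoxManage startvm "PUF-JLPC-VM"
```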

These VM images are recent "debian testing" images with

  • R, Rstudio, knitr, plyr and ggplot2;
  • emacs (set up with my emacs configuration and some pretty efficient shortcuts I demonstrate in the talk) with ess, auctex, pdflatex;
  • git and svn…
  • gcc, cmake, and simgrid;
  • a bunch of example org files (including an excerpt of my journal.org) and data to look at in the data_analysis/ directory;
  • several tools and traces from the BSC;
  • projections and some traces from the UIUC/Charm++ team.

I used Kameleon to set up these images and I will make the corresponding recipes available as soon as I find some time, so that people who want to do something similar natively know where to start. If you're in a hurry, send me an email and I'll send you the uncleaned versions.

Full Description of the Lectures

HPC Applications Tracing and Analysis


Jesus Labarta (Director of the Barcelona Supercomputing Center) and Juan Gonzalez (Junior Researcher at Barcelona Supercomputing Center)


Obtaining traces from HPC applications is the first step before analysis, understanding, optimization or simulation. At large scale, issues such as tracing overhead, intrusiveness or trace size become so important that classical tracing mechanisms have to be redesigned.


In the first part, we will present the basics of tracing and the major issues it raises. We will present some of the main tracing environments and try to compare them. We will focus on extrae and present some mechanisms that increase scalability. We will also describe some of the analysis capabilities motivating these mechanisms and how to use them. In particular, we will present a couple of recent developments on clustering and on processing sampled data ("folding"). We will illustrate the resulting analysis with paraver.

In the second part, a few traces will be provided and attendees will have the opportunity to analyze them, or even to obtain traces from their own code if time allows.


We will illustrate and use several tools and traces from the BSC toolchain (e.g., extrae, dimemas, paraver). Such tools can be downloaded from the BSC software page.

Identifying bottlenecks in applications


Sanjay Kale (Professor in the CS dept., University of Illinois, Urbana Champaign) and Ronak Buch (PhD candidate, University of Illinois, Urbana Champaign)


Unavailable at the moment. I'm working on it…


Performance of parallel applications is notoriously hard to optimize. This gets even more challenging when running on a large number of processors, and/or when strong scaling an application to the limit of its scalability. Fortunately, one can obtain detailed performance data in the form of traces or summaries. However, it requires skill and expertise to use this data to identify the major bottleneck that is holding up performance. This is further compounded by the "whack-a-mole" nature of performance debugging: when you fix one problem, another problem that was masked by the first one emerges as the next bottleneck.


In this tutorial, we will learn about techniques and methodologies that one can use to track down and solve the major performance problems faced by your applications. You will learn about different views of performance, such as time profiles, processor profiles, communication graphs, outlier analysis, histograms, and the richest of them all: detailed timelines. You will then learn about the rules, heuristics, and idioms (i.e., sequences of analyses/visualizations in pursuit of a conclusion or inference) that experts use in performance tuning.

The tutorial will use the projections performance analysis tool that is part of the Charm++ parallel programming system. It will include several case studies from applications such as NAMD (biophysics), ChaNGa (astronomy), and several mini-applications.


It is required that the attendees download and test "Projections" on their laptops. In addition, several log files will be provided that must be downloaded prior to the workshop. Projections can be downloaded from http://charm.cs.uiuc.edu/software

Using Simulation to study HPC codes


Martin Quinson (Lorraine University/Inria) and Augustin Degomme (CNRS/University of Grenoble/Inria)

Slides and Material

Modern computing systems have become increasingly complex and large-scale. This irreducible complexity creates a large gap between our understanding of the system and its reality, between the facts and our analysis. Simulation is thus an appealing alternative for studying such systems. Indeed, in silico studies have proved their usefulness in most other scientific and engineering disciplines. SimGrid is an open-source framework to study the behavior of large-scale distributed systems such as Grids, Clouds, HPC or P2P systems. It can be used to evaluate heuristics, prototype applications or even assess legacy MPI applications. It has been developed and optimized for more than 10 years and makes it possible to build simulators that are several orders of magnitude faster than many ad hoc simulators, while relying on fluid (coarse-grain) resource models whose validity has been thoroughly studied. These models account for network topology (e.g., contention, performance heterogeneity, communication locality) and resource capacity fluctuations, as well as non-trivial network protocol peculiarities (e.g., protocol switches, fairness, reverse traffic impact).

This tutorial will provide attendees with clear perspectives on the challenges for experimental research in the area of parallel and large-scale distributed computing, and on current technology for conducting experiments with real-world testbeds, emulated testbeds, or simulated testbeds. It will particularly emphasize the capabilities and limitations of simulation.


The first part of the tutorial will present and contrast current experimental methodologies, giving attendees in-depth understanding of the scientific and technological issues at hand.

The second part of the tutorial will focus on simulation, giving a state of the art of current simulation technology and discussing challenges for the development of sound simulation models. The tutorial will use the SimGrid simulation framework as an exemplar since it implements sophisticated and validated simulation models.

The last part of the tutorial will be a hands-on session, focusing on an in-depth presentation of how SMPI can be used to predict the performance of MPI codes.

By the end of this tutorial attendees should have a clear understanding of current technology and best practice for experimental parallel large-scale distributed computing research, and in particular on the use of simulation.


The attendees should be familiar with MPI and HPC systems. If they have an idea of a system they would like to study through simulation, we will be happy to brainstorm with them. To fully benefit from the hands-on, however, it is essential that they bring their own laptop with a working, recent coding environment (C/C++, cmake, whatever IDE they prefer).

The simplest option for attendees is to install the libsimgrid-dev package from the official repositories of recent Debian or Ubuntu distributions. If you have another OS, you should install from the sources (or use a VM with an up-to-date Debian).
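As a quick sanity check of the installation, compiling and simulating a trivial MPI program with SimGrid's SMPI wrappers might look like the sketch below. The file names (hello.c, platform.xml, hostfile) are made-up examples; the platform and hostfile describe the simulated machines and must be provided by the user:

```shell
# Install SimGrid's development package (recent Debian/Ubuntu)
sudo apt-get install libsimgrid-dev

# Compile an MPI program with SimGrid's compiler wrapper
# (hello.c is a made-up example file name)
smpicc -O2 hello.c -o hello

# Simulate a 4-process run; platform.xml and hostfile are user-provided
# descriptions of the simulated platform and of process placement
smpirun -np 4 -platform platform.xml -hostfile hostfile ./hello
```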

Reproducible Research: Best Practices and Tools in Conducting Experiments and Analyzing Results


Arnaud Legrand (CNRS/University of Grenoble/Inria).


Reproducibility of experiments and analyses by others is one of the pillars of modern science. Yet, the description of experimental protocols (particularly in computer science articles) is often incomplete and rarely allows a study to be reproduced. Such inaccuracies may not have been too problematic 20 years ago, when hardware and operating systems were not overly complex. Nowadays, however, computers are made of a huge number of heterogeneous components and rely on a software stack (OS, compiler, libraries, virtual machines, …) so complex that it cannot be perfectly controlled anymore. As a consequence, some observations have become more and more difficult to reproduce and to explain by other researchers, and sometimes even by the original researchers themselves. Although computer systems are theoretically deterministic, the state explosion problem and the inability to perfectly observe and control their state leave no other option than treating them as essentially stochastic systems.

Yet, the analysis of stochastic systems is more complex, and the graphs provided in articles are generally insufficient to draw solid conclusions. Although simple graphs may illustrate the authors' point of view, they rarely convey information about the variability of the system, which is nevertheless critical to evaluating how much confidence can be placed in the analysis.

In the last decade, there has been an increasing number of article retractions, even in prestigious journals, and a realization by both the scientific community and the general public that many research results and studies were actually flawed or wrong.

Open science is the umbrella term for the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society. In particular, it encompasses practices such as the use of open laboratory notebooks and reproducible research, which refers to the idea that the ultimate product of academic research is not only the paper but also the full computational environment used to produce its results, such as the code and data, so that others can reproduce the results and create new work based on the research.

An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.

– Jonathan Buckheit and David Donoho


The aim of this tutorial is to raise the audience's awareness of the reproducibility of experiments and analyses, in particular in computer science. I will present tools that help with the analysis problem and may also prove useful for managing the experimental process through notebooks.

More precisely, I will introduce the audience to the following tools:

  • R and ggplot2, which provide a standard, efficient and flexible data management and graph generation mechanism. Although R is quite cumbersome at first for computer scientists, it quickly proves an incredible asset compared to spreadsheets, gnuplot or graphical libraries like matplotlib or TikZ.
  • knitr, a tool that integrates R commands within a LaTeX or Markdown document. It fully automates data post-processing/analysis and figure generation, down to their integration into a report. Beyond the gains in ease of generation, page layout and consistency, such integration allows anyone to easily check what was done during the analysis and possibly to improve the graphs or the analysis.
  • Rstudio, a multi-platform and easy-to-use IDE for R, with which I will explain how to use the previous tools. For example, using R+Markdown (Rmd files) in Rstudio, it is extremely easy to export the result to Rpubs and hence make your research available to others in no more than two clicks.
  • Alternatives such as org-mode and babel, or the IPython notebook, which allow a day-to-day practice of reproducible research in a somewhat more fluent way than knitr, although this is mainly a matter of taste.
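As a minimal sketch of the knitr workflow described above, the following creates a tiny LaTeX/knitr document and knits it from the command line. The file name report.Rnw is a made-up example, and this assumes R and the knitr package are installed:

```shell
# Create a tiny knitr/LaTeX document (report.Rnw is a made-up name);
# the <<...>>= ... @ block is a knitr code chunk
cat > report.Rnw <<'EOF'
\documentclass{article}
\begin{document}
A minimal reproducible report.
<<summary-chunk>>=
summary(cars)  # cars is a dataset that ships with R
@
\end{document}
EOF

# Knit it: this runs the R chunk and produces report.tex with the code
# and its results embedded; pdflatex report.tex would then build the PDF
Rscript -e 'library(knitr); knit("report.Rnw")'
```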

Depending on the questions from the audience, I can also help attendees analyze some of their own data and introduce them to the basics of data analysis.


To make sure everyone uses their time as efficiently as possible, attendees are required to install R, Rstudio and ggplot2 on their laptop beforehand.

Here is how to proceed on a recent Debian distribution:

     sudo apt-get install r-base r-cran-ggplot2 r-cran-reshape

Rstudio and knitr are unfortunately not packaged in Debian, so the easiest way is to download the corresponding Debian package from the Rstudio webpage and install it manually.

     wget http://download1.rstudio.org/rstudio-0.98.490-amd64.deb
     sudo dpkg -i rstudio-0.98.490-amd64.deb
     sudo apt-get -f install # to fix possibly missing dependencies

You will also need to install knitr. To this end, simply run R (or Rstudio) and use the following command (you'll have to answer yes to the first question about installing in your home directory, and then select a mirror):

     install.packages("knitr")

If r-cran-ggplot2 or r-cran-reshape could not be installed for some reason, you can also install them through R by doing:

     install.packages(c("ggplot2", "reshape"))

Those of you who would like to play with org-mode should install emacs24, org-mode (>=8.2!!!) and ess:

     sudo apt-get install emacs24 org-mode ess

You may want to use my emacs configuration.