M2R Performance Evaluation


Performance Evaluation

General Information

The coordinator for these lectures is Jean-Marc Vincent. The lecturers are Jean-Marc Vincent and Arnaud Legrand.

Lectures generally take place on Monday afternoon in PLURIEL 134 (but twice in H103) from 13:45 to 15:45.

If in doubt, the schedule with lecture rooms is available here (login/password is voirIMATEL/imatel).

Objectives

The aim of this course is to provide the fundamental basis for performance evaluation of computer systems. Two approaches are developed:

  • performance measurement: based on experimental platforms (benchmarks or instrumented executions of one's own code), how to analyze data and synthesize performance indices
  • performance modeling: from a description of the resources and of the application's behavior, how to predict the performance of the application

Here are links to the previous editions of this lecture: 2011-2012, 2012-2013.

Program and expected schedule

  • 23 September 2013 (13:45 - 16:45): Arnaud Legrand and Jean-Marc Vincent. Data presentation and reporting results: an introduction to visualizing data with R.
  • 11 October 2013 (13:45 - 15:45): Arnaud Legrand
    1. Together we reviewed the Rpubs document I had submitted as a naive analysis. Here is the original document on which the analysis was run, in case it gets updated. Here is a short summary of what we said about this poor analysis:
      • You used a "Dell Latitude 6430u with 16Gb of RAM".
        • Unless the size of the workload is huge, the amount of RAM is not really useful.
        • You meant GB, right?
        • What kind of processor is in this machine? How many cores? Looking at "/proc/cpuinfo" would indicate that you used an "Intel(R) Core(TM) i7-3687U CPU @ 2.10GHz" and that you have 4 cores. Looking at "/sys/devices/system/node/node0/cpu*/online" would indicate that Hyperthreading is actually activated. Ideally, you should use hwloc to obtain all such information, as well as how much cache there is. Note that some of you performed the experiment on a single machine but were so convinced that there would be a gain that you claimed there was one.
        • What frequency was used? You should make sure that your CPU was in the right frequency state, e.g., using "cpufreq-info" and "cpufreq-set". In my case, the machine was on battery, so the governor was powersave with the minimal frequency (i.e., 754MHz)…
        • Did you activate compiler optimizations?
      • There are no units on the labels…
      • Do you think 10 measurements is really enough?
      • For small arrays, the time taken is around a dozen microseconds. Are your timing functions accurate enough?
      • Why did you plot only the mean time? Couldn't you also plot all the values to get a feeling for the variability? If you aim at comparing average response times, confidence intervals might be a good option.
      • It is actually not possible to reproduce your analysis, as the measurements.csv file is not available. Ideally, you would have either inlined it in the Rmd file or made it available on the web. This would allow others to try other analyses and plots with the same dataset. Note that many of you explained that you changed the code but did not bother to provide the modifications…
      • Wouldn't it be interesting to make measurements for larger data sets as well? Just to see how much you manage to gain. Indeed, since this is a log-log scale and the complexity should be n·log(n), the slope of the parallel plot is likely to increase.
      • You claim that activating parallelism is beneficial as soon as tables comprise 5E05 elements. However, this value was determined graphically, because you connected your estimates of the expected execution time with lines. Additional measurements should be made around the announced value to check whether this is true.
    2. Measurement on computer systems (benchmarking, observation, tracing, monitoring, profiling).
      • Documents: slides EP_02_measurements.pdf
      • Goal: become fully aware of variability, the need to replicate, the experimental setup (compilation, machine, input workload), timing resolution, warm-up, randomization, …
      • To do for the next time: Use what you learnt to improve the previous work! :) Try to assess the importance of gcc optimizations, of loop unrolling, of the input workload, … To this end, here is a new version of the code and the corresponding Rpubs document.
  • 14 October 2013 (13:45 - 15:45): Arnaud Legrand Expectation and variance estimation, confidence intervals, distribution comparison. Checking whether the hypotheses apply or not. Illustration with R.
    • Documents: slides EP_03_confidence_interval.pdf
    • Goal: understand how to deal with randomness
    • To do for the next time: Use what you learnt to improve previous work! :) Make sure you do not run into issues such as:
      • Crappy data
      • Inadequate data
      • Temporal dependencies
  • 21 October 2013 (13:45 - 15:45): Arnaud Legrand Using simulation/emulation/real experiments to study large scale distributed systems.
  • 4 November 2013 (13:45 - 15:45): Jean-Marc Vincent Linear Regression
  • 18 November 2013 (13:45 - 15:45): Jean-Marc Vincent Statistical Modeling
  • 9 December 2013 (13:45 - 15:45): Jean-Marc Vincent Design of Experiments
    • Documents: slides
    • Goals:
      • Understand the difference between observational and experimental data.
      • Fishbone
      • Factorial designs
      • Fractional designs
      • Saturated designs for screening
    • To do for the next time: Use what you learnt to improve previous work! Propose an experimental design.
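As a concrete illustration of the advice given in class (plot all the values rather than just the mean, and report confidence intervals), here is a small R sketch. The timing values are made up for illustration, and ggplot2 is only used if it happens to be installed:

```r
# Hypothetical timings (microseconds) for one array size, 10 replicates.
time <- c(11.2, 12.8, 11.9, 13.1, 12.2, 11.7, 12.5, 13.4, 11.5, 12.0)

# 95% confidence interval for the mean, assuming i.i.d. samples.
ci <- t.test(time)$conf.int
cat(sprintf("mean = %.2f us, 95%% CI = [%.2f, %.2f]\n",
            mean(time), ci[1], ci[2]))

# Plot every measurement plus the mean, instead of the mean alone.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  df <- data.frame(run = seq_along(time), time = time)
  ggplot(df, aes(x = run, y = time)) +
    geom_point() +
    geom_hline(yintercept = mean(time), linetype = "dashed") +
    labs(x = "Run", y = "Time (microseconds)")
}
```

With only 10 replicates the interval is wide, which is precisely the point: it makes the uncertainty of the estimate visible instead of hiding it behind a single averaged number.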

Using R

Installing R and Rstudio

Here is how to proceed on debian-based distributions:

sudo apt-get install r-base r-cran-ggplot2 r-cran-reshape 

Rstudio and knitr are unfortunately not packaged in Debian, so the easiest approach is to download the corresponding Debian package from the Rstudio webpage and then install it manually (depending on when you do this, you can obviously change the version number).

wget http://download1.rstudio.org/rstudio-0.97.551-amd64.deb
sudo dpkg -i rstudio-0.97.551-amd64.deb
sudo apt-get -f install # to fix possibly missing dependencies

You will also need to install knitr. To this end, you should simply run R (or Rstudio) and use the following command.

install.packages("knitr")
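Once knitr is installed, an Rmd file can be rendered to HTML directly from the R console. As a hedged sketch (the file name and content are purely illustrative), this writes a minimal Rmd file and knits it:

```r
library(knitr)

# Write a minimal hypothetical Rmd file...
writeLines(c("# Demo",
             "",
             "The mean of 1:10 is computed below.",
             "",
             "```{r}",
             "mean(1:10)",
             "```"),
           "demo.Rmd")

# ...and render it to HTML (produces demo.html, plus demo.md as a by-product).
knit2html("demo.Rmd")
```

In Rstudio the same thing is done with the "Knit HTML" button, which also offers a one-click upload to Rpubs.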

If r-cran-ggplot2 or r-cran-reshape cannot be installed for some reason, you can also install them through R:

install.packages("ggplot2")
install.packages("reshape")

Producing documents

The easiest way is probably to use R+Markdown (Rmd files) in Rstudio and to export them via Rpubs to make whatever you want publicly available.
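A minimal Rmd file mixes Markdown prose with executable R chunks; here is a hypothetical skeleton (title, text and numbers are purely illustrative):

````markdown
# Sorting benchmark

Timings (in microseconds) for arrays of 10^4 elements.

```{r}
time <- c(11.2, 12.8, 11.9, 13.1, 12.2)
summary(time)
```
````

When the document is knitted, the chunk is executed and its output is embedded in the resulting HTML, so the text, the code and the results stay in sync.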

We can roughly distinguish between three kinds of documents:

  1. Lab notebook (with everything you try and that is meant mainly for yourself)
  2. Experimental report (selected results and explanations with enough details to discuss with your advisor)
  3. Result description (rather short, with only the main point, which could be embedded in an article)

We expect you to provide us with the last two and to make them publicly available so as to allow others to comment on them.

Documentation

For a quick start, you may want to look at R for Beginners (French version). A probably more entertaining option is to follow a good online lecture introducing R and data analysis, such as this one: https://www.coursera.org/course/compdata

Bibliography