Tutorial on Reproducible Research for the SyncFree Project
After my tutorial at Compas, Marc Shapiro kindly invited me to give another tutorial to the PhD students of the SyncFree European project. There were about 8-9 people, mainly first- or second-year PhD students, plus Annette Middelkoop-Bieniusa, the project coordinator.
The purpose of this tutorial was:
- To give an overview of current trends in reproducible research,
- To demonstrate that R/knitr or R/org-mode are perfectly usable for doing reproducible analysis and that there is no longer any excuse for not doing it… ;)
- To show how to use emacs/org-mode for keeping a laboratory notebook.
To make sure everyone would have a working environment, I decided to learn to use Kameleon to build properly configured virtual machines. There are two alternatives:
- QEMU/KVM
If you're running Linux, you probably want to use this VM (debian_test.qcow2), which can be run as follows:
sudo qemu-system-x86_64 -m 2048 --enable-kvm debian_test.qcow2
If you're running Mac OS X, you can choose the easy way (i.e., use the VirtualBox image) or the hard one (use the QEMU image). Joseph Emeras told me it was possible to get qemu 2.0.0 by installing brew (http://brew.sh/) and then running
brew install qemu
- VirtualBox
- If you're running on Mac OS X, you probably want to use this VM (debian_test.vdi). You may want to select the Mac keyboard layout once logged in. Also make sure you give enough RAM (e.g., 2GB) to the VM if you want to be able to work with it and possibly run a large data analysis. Here are roughly the steps for running it:
- Run VirtualBox
- Click the "New" button
- Enter the name "PUF/JLPC VM"
- Select "Linux" in the OS Type dropdown and "Debian (64 bit)" in the Version dropdown
- Select "Next"
- On the "Memory" panel, increase to at least 2GB and select "Next"
- On the Virtual Hard Disk panel select "Existing" and then select the file with the small icon
- Select the "debian_test.vdi" file that you just downloaded
- Click "Open", "Create" and you're all set
- Run the VM by double-clicking on it
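For those who prefer the command line, the GUI steps above can be approximated with VBoxManage. This is only a sketch under a few assumptions: VBoxManage is on the PATH, debian_test.vdi sits in the current directory, and the VM name and the "SATA" controller name are placeholders I picked for illustration. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch: command-line counterpart of the VirtualBox GUI steps.
# Assumptions: VBoxManage is installed, the .vdi file is in the current
# directory, and VM_NAME / the "SATA" controller name are illustrative.

VM_NAME="${VM_NAME:-PUF-JLPC-VM}"   # placeholder name for this sketch
VDI="${VDI:-debian_test.vdi}"       # image downloaded beforehand
RAM_MB="${RAM_MB:-2048}"            # at least 2GB, as recommended above

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

setup_vm() {
    # Create and register a 64-bit Debian VM
    run VBoxManage createvm --name "$VM_NAME" --ostype Debian_64 --register
    # Give it enough memory
    run VBoxManage modifyvm "$VM_NAME" --memory "$RAM_MB"
    # Add a disk controller and attach the existing disk image to it
    run VBoxManage storagectl "$VM_NAME" --name SATA --add sata
    run VBoxManage storageattach "$VM_NAME" --storagectl SATA \
        --port 0 --device 0 --type hdd --medium "$VDI"
    # Boot the VM
    run VBoxManage startvm "$VM_NAME"
}

setup_vm
```

Running it with RAM_MB=4096 DRY_RUN=0 would create the VM with 4GB instead of the default 2GB.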
These VMs are recent Debian images with
- R, Rstudio, knitr, plyr and ggplot2
- emacs (set up with my emacs configuration and some pretty efficient shortcuts I demonstrate in the talk) with ess, auctex, pdflatex
- git and svn…
- gcc, cmake, and simgrid
- a bunch of example org files (including an excerpt of my journal.org) and data to look at in the data_analysis/ directory
The VM approach was only partly successful, and here are a few issues to fix:
- Some people on Mac had applications crashing (e.g., when knitting or when calling pdflatex from Rstudio). We suspect this was due to giving only 512 MB of RAM to the VM… I generally set a 2GB limit, which is not reached in practice but leaves enough room to work.
- Some graphical gnome menus (e.g., when listing a directory or when opening the terminal preferences) are all messed up under qemu, but there was no such issue with VirtualBox.
- I made VMs that automatically log in as the default user (kameleon), whose password is kameleon. But when the screen automatically locks after some idle time, we could never unlock it! Entering the password did not help. :( I should probably retry it and either set an empty password or an azerty/qwerty-friendly password, just to be sure it does not come from a keyboard mapping issue.
Here are the slides I used. Everything went smoothly except that the video projector died after one hour. When we finally moved to another room, there was a power surge when I plugged in my AC adapter (fortunately, my laptop was unharmed) and the projector there was not working either… :( Bad luck. Demoing Rstudio and emacs without a projector was quite painful and I did lose some people here…
A few things to improve for the next time:
- The introduction/motivation was probably too long. I had added slides about the Duke story and Falsifiability that are not useful (slides 14, 17, 18). I could probably remove the execo slide as I did not really know what to say about it.
- Every time I do this kind of talk, there is a moment where I talk about Design of Experiments and the need to randomize to reduce bias. Maybe I should devote a slide to this…
- Also I had questions about how to archive things (e.g., measurements) so a few slides on our git + org-mode workflow would probably be good.
- We had all the beamer/projector issues that made us lose time (especially when demoing emacs and Rstudio) but still, the talk seems a little too long. I may have to shorten it a bit for the summer school.
- Showing knitr/Rstudio is good as it shows people that it's easy and does not freak them out right away with emacs. In the end I tell people to use a well-configured emacs/org-mode instead, so I wonder whether the Rstudio/knitr part makes me waste time or not. I think it helps, as it is comforting to see you can get something in a few clicks and have a clean IDE.
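On the Design of Experiments point in the list above, here is the kind of thing such a slide could show: build the full list of runs, then execute them in a random order so that slow drifts (machine load, caches warming up) do not get confounded with one factor. This is a minimal sketch and not taken from the talk; the factors (size, threads), their levels, and the seed are all made up for illustration.

```shell
#!/bin/sh
# Sketch: full-factorial experiment plan, executed in a seeded random order.
# The factors (size, threads), their levels, and the seed are hypothetical.

# One line per run: size,threads,replicate
make_design() {
    for size in 1000 10000 100000; do
        for threads in 1 2 4; do
            for rep in 1 2 3; do
                echo "$size,$threads,$rep"
            done
        done
    done
}

# Tag each run with a pseudo-random key, sort on that key, then drop it.
# A fixed seed gives the same order again with the same awk implementation.
randomize_design() {
    seed="${1:-42}"
    make_design |
        awk -v seed="$seed" 'BEGIN { srand(seed) } { printf "%.9f\t%s\n", rand(), $0 }' |
        LC_ALL=C sort -n |
        cut -f2-
}

# Print the randomized plan with a CSV header
echo "size,threads,rep"
randomize_design 42
```

Redirecting the output to a file and committing it next to the measurements keeps a record of the order in which the runs were actually performed.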