M2R Parallel Systems
General Information
These lectures take place every Monday from 9h45 to 12h45. The coordinator for these lectures is Arnaud Legrand.
The next lecture will be on Monday 17/10 in room CUEFA 134 from 9h45 to 13h00.
The planning with lecture rooms is available here.
Objectives
Today, parallel computing is omnipresent across a large spectrum of computing platforms. At the ``microscopic'' level, processor cores have used multiple functional units in concurrent and pipelined fashions for years, and multi-core chips are now commonplace, with a trend toward rapidly increasing numbers of cores per chip. At the ``macroscopic'' level, one can now build clusters of hundreds to thousands of individual (multi-core) computers. Such distributed-memory systems have become mainstream and affordable in the form of commodity clusters. Furthermore, advances in network technology and infrastructures have made it possible to aggregate parallel computing platforms across wide-area networks in so-called ``grids.'' The popularization of virtualization has made it possible to consolidate workloads and resource usage in ``clouds,'' raising many energy and efficiency issues.
An efficient exploitation of such platforms requires a deep understanding of architecture, software, and infrastructure mechanisms, as well as of advanced algorithmic principles. The aim of this course is thus twofold. First, it introduces the main trends and principles in the area of high-performance computing infrastructures, illustrated by examples from the current state of the art. Second, it provides a rigorous yet accessible treatment of parallel algorithms, including theoretical models of parallel computation, parallel algorithm design for homogeneous and heterogeneous platforms, complexity and performance analysis, and fundamental notions of scheduling and work-stealing. These notions will always be presented in connection with real applications and platforms.
Program and expected schedule
Check last year's schedule to get a foretaste.
26/09/2011: Arnaud Legrand
Introduction to parallel computing. High-performance architectures: processors (superscalar, simultaneous multi-threading, multi-core, GPU…), symmetric multiprocessors. OS features for cluster computing. Multi-threading. From clusters to Grids.
Documents:
slides PC_01_parallel_architectures.pdf and What every programmer should know about memory.
Work to do:
Study the slides and get more knowledge (e.g., on Wikipedia) on all the techniques presented in them (e.g., pipelining, hyperthreading, snoopy cache coherence); a small cache-locality experiment is also sketched after the references below.
References:
I don't think reading these books will be particularly useful to you at the moment (read more condensed information on Wikipedia instead), but here are two classical references on Grid computing.
- Fran Berman, Geoffrey Fox, and Anthony Hey. Grid Computing: Making the Global Infrastructure a Reality. John Wiley & Sons, 2003.
- Ian Foster and Carl Kesselman. The Grid 2: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 2003.
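As a small companion to the memory document above, here is a minimal cache-locality experiment (the matrix size and the use of clock() are arbitrary choices for illustration). Both loops compute the same sum, but the row-major traversal follows the memory layout while the column-major one touches a new cache line at almost every access, so the first is usually much faster.

/* Cache-locality sketch: compare row-major and column-major traversals. */
#include <stdio.h>
#include <time.h>

#define N 2048

static double a[N][N];            /* ~32 MB, zero-initialized in BSS */

int main(void) {
    clock_t t0, t1;
    volatile double sum = 0.0;    /* volatile keeps the loops from being optimized away */

    /* Row-major traversal: consecutive accesses stay within the same cache lines. */
    t0 = clock();
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    t1 = clock();
    printf("row-major:    %.2fs\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    /* Column-major traversal: almost every access lands on a different cache line. */
    t0 = clock();
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    t1 = clock();
    printf("column-major: %.2fs\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    return 0;
}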
03/10/2011: Arnaud Legrand
Parallel algorithms on a ring and on a grid. The key notions are speedup/efficiency, Amdahl's law, pipelining, changing granularity to improve efficiency, and searching for sequential time in parallel algorithms.
Documents:
slides PC_02_parallel_algorithms.pdf
Work to do:
There is a lot more in these slides than what we can talk about during the lecture, but I expect you to read the rest of the slides as homework. Don't just read them: do the complexity evaluation by yourself, and try to implement one of these algorithms with MPI (maybe not now, but as soon as you have time to give MPI a shot). Amdahl's law is recalled after the references below.
References:
- "Parallel Algorithms", by H. Casanova, A. Legrand, and
- Robert. Chapman & Hall.
- "Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes", Frank Thomson Leighton
- "Parallel Algorithms", by H. Casanova, A. Legrand, and
10/10/2011: Vincent Danjean
High Performance Networks: bandwidth, latency, DMA, PIO, overlapping. How to Efficiently Program High Performance Architectures? System and low-level approaches. MPI, pthreads.
Documents:
slides PC_03_progpar.pdf. A document explaining the internals of the __thread keyword. Another document on Synchronization and Futex.
Work to do:
Read the articles about futexes (you can also read this article that presents a more user-oriented point of view), read-copy-update, and race conditions on LWN (search for the previous keywords in this list; there is a bunch of associated articles). A small race-condition example is sketched below.
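To make the race-condition reading concrete, here is a minimal pthreads sketch (the names counter and ITERS are just illustrative): two threads increment a shared counter, and without the mutex some increments are lost. On Linux, the mutex itself is implemented on top of a futex, so it only enters the kernel on contention. Compile with gcc -pthread.

/* Race-condition sketch: remove the lock/unlock pair to observe lost updates. */
#include <pthread.h>
#include <stdio.h>

#define ITERS 1000000

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&lock);     /* protects the read-modify-write below */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected %d)\n", counter, 2 * ITERS);
    return 0;
}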
17/10/2011: Vincent Danjean
How to Efficiently Communicate on Distributed Architectures? Research aspects of mixing different high-performance APIs (e.g., how to efficiently use MPI and pthreads, how to efficiently use threads on hierarchical platforms, …).
Documents:
slides PC_04_commdist.pdf
Work to do:
Playing with MPI on G5K. First, you may want to have a look at the following documents to know how to run an MPI code on G5K (slides in French, the Grid'5000 Wikipage, and the Cheat Sheet). Then, there are two possible options that do not illustrate the same issues:
- Practical session on performance evaluation of parallel programs: PC_par_sort.pdf, parsort-1.0.tar.gz, mpi_sort.c. This session is great for understanding the problems of measurement and parallel speedup.
- Double broadcast parallel matrix multiplication: 07_MPI_tutorial.tgz. This session is great for understanding the data movements and distribution in an SPMD program and the power of collective communication operations (a minimal collective-communication sketch is given below).
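As a warm-up before the practical sessions above, here is a minimal SPMD program using a collective operation: the root broadcasts a value and every rank reports it. Compile with mpicc and launch with mpirun (the exact launch commands depend on the MPI installation available on G5K).

/* Minimal SPMD example: every process runs the same code; MPI_Bcast is collective. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, data = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        data = 42;                       /* only the root holds the value initially */

    /* Collective call: every process in MPI_COMM_WORLD must reach it. */
    MPI_Bcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d of %d received %d\n", rank, size, data);

    MPI_Finalize();
    return 0;
}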
07/11/2011: Arnaud Legrand
From fine-grain to coarse-grain. PRAM, sorting networks, and application to implementation on NOWs (networks of workstations).
Documents:
slides PC_05_theory.pdf and, as Alberto suggested, some good class notes covering the same subject. There is also a document for those who want to know about the parallel matrix product on heterogeneous platforms using static load balancing. Last, here is the reference Kaue suggested on the Master Theorem, which lists solutions for most recurrence formulas you may come up with when working on algorithm complexity.
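For convenience, here is the standard statement of the Master Theorem as found in most algorithms textbooks (the reference suggested by Kaue covers more general recurrences). For divide-and-conquer recurrences of the form

\[
T(n) \;=\; a\,T(n/b) + f(n), \qquad a \ge 1,\; b > 1,
\]

\[
T(n) \;=\;
\begin{cases}
\Theta\!\left(n^{\log_b a}\right) & \text{if } f(n) = O\!\left(n^{\log_b a - \varepsilon}\right) \text{ for some } \varepsilon > 0,\\
\Theta\!\left(n^{\log_b a} \log n\right) & \text{if } f(n) = \Theta\!\left(n^{\log_b a}\right),\\
\Theta\!\left(f(n)\right) & \text{if } f(n) = \Omega\!\left(n^{\log_b a + \varepsilon}\right) \text{ and } a\,f(n/b) \le c\,f(n) \text{ for some } c < 1.
\end{cases}
\]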
14/11/2011: Arnaud Legrand
Modeling parallel programs and platforms. Fundamental characteristics: Work and Depth. Dataflow graph representation of an execution. BSP programs. Introduction to Scheduling!
Documents:
slides PC_06_scheduling.pdf, Peter Brucker's website on complexity results for scheduling problems, Viggo Kann's website on NP-hard optimization problems.
Work to do:
Select a few research problems and try to see what the underlying scheduling problems are. Then try to define them formally and search for the closest known problems using the three-field notation.
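As a reminder of how Work and Depth constrain any schedule: if W denotes the total work of the dataflow graph, D its depth (critical path), and T_p the execution time on p identical processors, then

\[
\max\!\left(\frac{W}{p},\, D\right) \;\le\; T_p \;\le\; \frac{W}{p} + D,
\]

where the upper bound (Brent's bound) is achieved by any greedy/list schedule that never leaves a processor idle while a task is ready.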
21/11/2011: Vincent Danjean
From parallelism-aware algorithms to parallelism-oblivious algorithms. Work, depth, and work-stealing. Illustration with a real implementation of such a technique.
Documents:
slides PC_07_work_stealing.pdf, PC_07_kaapi.pdf.
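The lecture illustrates work-stealing with Kaapi; as a self-contained stand-in, here is the same divide-and-conquer pattern written with OpenMP tasks (many OpenMP runtimes schedule tasks with a work-stealing strategy, but this is an assumption about the runtime, not the lecture's implementation). The recursive sum has work Θ(n) and depth Θ(log n); the grain size is an arbitrary choice. Compile with gcc -fopenmp.

/* Parallelism-oblivious recursive sum: the programmer only expresses tasks. */
#include <omp.h>
#include <stdio.h>

#define N (1 << 20)
#define GRAIN 1024                /* below this size, fall back to sequential code */

static double a[N];

static double sum(double *x, long n) {
    if (n <= GRAIN) {             /* sequential leaf: controls the granularity */
        double s = 0.0;
        for (long i = 0; i < n; i++) s += x[i];
        return s;
    }
    double s1, s2;
    #pragma omp task shared(s1)   /* this half may be executed (stolen) by an idle thread */
    s1 = sum(x, n / 2);
    s2 = sum(x + n / 2, n - n / 2);
    #pragma omp taskwait          /* join: wait for the child task created above */
    return s1 + s2;
}

int main(void) {
    for (long i = 0; i < N; i++) a[i] = 1.0;
    double s;
    #pragma omp parallel
    #pragma omp single            /* one thread spawns the root of the task tree */
    s = sum(a, N);
    printf("sum = %.0f (expected %d)\n", s, N);
    return 0;
}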
28/11/2011: Arnaud Legrand
Hype and trends: cloud computing and how virtualization changed the Grid perspective, exascale computing and how energy issues changed the HPC perspective.
Documents:
slides PC_08_hype.pdf
05/12/2011: Vincent Danjean
Work-stealing and data locality. Sorting and merging, FFT, matrix operations. Adaptive algorithms and cascading divide & conquer: prefix computation, data compression, linear system solving.
Documents:
slides PC_09_work_stealing.pdf
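As an illustration of the prefix computation mentioned above, here is a sketch of the classical two-pass parallel (inclusive) prefix sum, written with OpenMP for self-containedness rather than with the work-stealing runtime used in the lecture: each thread scans its own block, the per-block totals are scanned sequentially, and each block is then shifted by the total of the blocks before it. Sizes and types are illustrative. Compile with gcc -fopenmp.

/* Two-pass blocked parallel prefix sum (inclusive scan). */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

int main(void) {
    double *x = malloc(N * sizeof *x);
    for (long i = 0; i < N; i++) x[i] = 1.0;

    int nthreads;
    double *block_sum;

    #pragma omp parallel
    {
        #pragma omp single
        {
            nthreads = omp_get_num_threads();
            block_sum = calloc(nthreads + 1, sizeof *block_sum);
        }
        int t = omp_get_thread_num();
        long lo = (long)N * t / nthreads, hi = (long)N * (t + 1) / nthreads;

        /* Pass 1: local prefix sum of this thread's block. */
        for (long i = lo + 1; i < hi; i++) x[i] += x[i - 1];
        block_sum[t + 1] = (hi > lo) ? x[hi - 1] : 0.0;
        #pragma omp barrier

        /* Sequential scan of the per-block totals. */
        #pragma omp single
        for (int i = 1; i <= nthreads; i++) block_sum[i] += block_sum[i - 1];

        /* Pass 2: shift each block by the sum of all preceding blocks. */
        for (long i = lo; i < hi; i++) x[i] += block_sum[t];
    }

    printf("x[N-1] = %.0f (expected %d)\n", x[N - 1], N);
    free(x); free(block_sum);
    return 0;
}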
12/12/2011: Derrick Kondo
Desktop Grids. On the convergence of cloud computing and desktop grids; Google MapReduce.
Documents:
slides PC_10_mapreduce.pdf
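To fix ideas on the MapReduce programming model, here is a minimal, single-machine word-count sketch (the input and names are made up): the map phase emits one (word, 1) pair per word, the shuffle phase groups identical keys (here with a local sort), and the reduce phase sums the values of each group. A real MapReduce runtime distributes these phases over many machines and handles failures.

/* Word count in the MapReduce style: map, shuffle/sort, reduce. Illustration only. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct pair { const char *key; int value; };

static int cmp_pairs(const void *a, const void *b) {
    return strcmp(((const struct pair *)a)->key, ((const struct pair *)b)->key);
}

int main(void) {
    const char *input[] = { "cloud", "grid", "cloud", "desktop", "grid", "cloud" };
    int n = sizeof(input) / sizeof(input[0]);
    struct pair pairs[64];

    /* Map: emit one (word, 1) pair per input word. */
    for (int i = 0; i < n; i++) {
        pairs[i].key = input[i];
        pairs[i].value = 1;
    }

    /* Shuffle: bring identical keys together (here, a local sort). */
    qsort(pairs, n, sizeof(struct pair), cmp_pairs);

    /* Reduce: sum the values over each run of identical keys. */
    for (int i = 0; i < n; ) {
        int j = i, sum = 0;
        while (j < n && strcmp(pairs[j].key, pairs[i].key) == 0)
            sum += pairs[j++].value;
        printf("%s: %d\n", pairs[i].key, sum);
        i = j;
    }
    return 0;
}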
09/01/2012: Derrick Kondo
FTA, Desktop Grid modeling, and Amazon Spot Instances.
12/01/2012: Arnaud Legrand
Answering last year's exam questions.
Course Organization
The course gives 6 credits (ECTS). In previous years, each student used to perform an individual performance evaluation study; now that the MOSIG comprises a series of lectures on Performance Evaluation, this no longer needs to be done solely in the context of the Parallel Systems lecture. There will be an exam at the end of the year, and an extra lecture is devoted to studying one of the previous years' exams together.
Bibliography
- Henri Casanova, Arnaud Legrand and Yves Robert. Parallel Algorithms. Chapman & Hall, 2008.