M2R Parallel Systems

Parallel Systems

General Information

These lectures take place every Monday from 9h45 to 12h45. The coordinator for these lectures is Arnaud Legrand.

The next lecture will be on Monday 24/09 in room CNAM 128 from 9h45 to 13h00.

If in doubt, the schedule with lecture rooms is available here.

Objectives

Today, parallel computing is omnipresent across a large spectrum of computing platforms. At the "microscopic" level, processor cores have used multiple functional units in concurrent and pipelined fashions for years, and multiple-core chips are now commonplace, with a trend toward rapidly increasing numbers of cores per chip. At the "macroscopic" level, one can now build clusters of hundreds to thousands of individual (multi-core) computers. Such distributed-memory systems have become mainstream and affordable in the form of commodity clusters. Furthermore, advances in network technology and infrastructures have made it possible to aggregate parallel computing platforms across wide-area networks in so-called "grids". The popularization of virtualization has made it possible to consolidate workloads and resource usage in "clouds", which raises many energy and efficiency issues.

Exploiting such platforms efficiently requires a deep understanding of architecture, software and infrastructure mechanisms as well as of advanced algorithmic principles. The aim of this course is thus twofold. First, it introduces the main trends and principles in the area of high performance computing infrastructures, illustrated by examples of the current state of the art. Second, it provides a rigorous yet accessible treatment of parallel algorithms, including theoretical models of parallel computation, parallel algorithm design for homogeneous and heterogeneous platforms, complexity and performance analysis, and fundamental notions of scheduling and work-stealing. These notions will always be presented in connection with real applications and platforms.

Program and expected schedule

Check last year's schedule to get a foretaste.

24 September 2012: Arnaud Legrand Introduction to parallel computing. High Performance Architectures: processors (superscalar, simultaneous multi-threading, multi-core, GPU…), Symmetric MultiProcessors. OS features for cluster computing. Multi-threading. Cache-aware vs. cache-oblivious algorithms.
- Documents: slides PC_01_parallel_architectures.pdf and What every programmer should know about memory
- Work to do: Study the slides and learn more (e.g., on Wikipedia) about all the techniques presented in them (e.g., pipelining, hyperthreading, snoopy cache coherence, Grid computing, etc.). Do not limit yourself to reading the previous links; browse related links to get a good picture of recent evolutions. You can also have a look at the top500 and summary.
1 October 2012: Arnaud Legrand From clusters to Grids. Communication models.
- Documents: End of the previous slides + beginning of these ones PC_02_parallel_algorithms.pdf
- References: I don't think reading these books will be particularly useful to you for this lecture (read more condensed information on Wikipedia instead) but here are two classical references on Grid computing.
  - Fran Berman, Geoffrey Fox, and Anthony Hey. Grid Computing: Making the Global Infrastructure a Reality. John Wiley & Sons, 2003.
  - Ian Foster and Carl Kesselman. The Grid 2: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 2003.
8 October 2012: Arnaud Legrand Parallel algorithms on a ring and on a grid. The key notions are speedup/efficiency, Amdahl's law, pipelining, changing granularity to improve efficiency, and searching for sequential time in parallel algorithms.
- Documents: The series of slides from the previous lecture.
- Work to do: You need to read the whole set of slides even if I haven't presented all of them during the lecture. You should also play with MPI on G5K: First, you may want to have a look at the following documents to know how to run an MPI code on G5K (slides in French, Wikipage, and Cheat Sheet). Then, there are two possible options that do not illustrate the same issues:
  - Practical session on performance evaluation of parallel programs: PC_par_sort.pdf, parsort-1.0.tar.gz, mpi_sort.c. This session is great for understanding the problems of measurement and parallel speedup.
  - Double broadcast parallel matrix multiplication: 07_MPI_tutorial.tgz. This session is great for understanding the data movements and distribution in an SPMD program and the power of collective communication operations.
15 October 2012: Vincent Danjean High Performance Networks: bandwidth, latency, DMA, PIO, overlapping. How to Efficiently Program High Performance Architectures? System and Low Level approaches. MPI, pthreads, openMPI, CUDA, OpenCL, …
- Documents: slides PC_04_HPC_101.pdf
22 October 2012: Arnaud Legrand From fine-grain to coarse-grain. PRAM, sorting networks and application to implementation on NOWs.
- Documents: PC_05_theory.pdf (you need to read and understand the whole set of slides, even those on FFT that have not been presented during the lecture; they are very similar in essence to the sorting networks and are thus a good exercise) and beginning of PC_06_scheduling.pdf.
5 November 2012: Arnaud Legrand Modeling parallel programs and platforms. Fundamental characteristics: Work and Depth. Dataflow graph representation of an execution. BSP programs. Introduction to Scheduling (see previous slides)
12 November 2012: Vincent Danjean How to Efficiently Communicate on Distributed Architectures? Research aspects of mixing different HP APIs (e.g., how to efficiently use MPI and pthreads, how to efficiently use threads on hierarchical platforms, …).
- Documents: slides PC_04_HPC_102.pdf
19 November 2012: Vincent Danjean From parallelism-aware algorithms to parallelism-oblivious algorithms. Work, depth and work-stealing. Illustration with a real implementation of such a technique.
- Documents: slides PC_07_WS.pdf and slides on Kaapi PC_07_kaapi.pdf
26 November 2012: Vincent Danjean Work-stealing and data locality. Sorting and merging, FFT, matrix operations. Adaptive algorithms and cascading divide & conquer: prefix computation, data compression, linear system solving
- Documents: slides PC_08_algo_adaptatif.pdf
3 December 2012: Arnaud Legrand Hype and trends: cloud computing and how virtualization changed the Grid perspective, exascale computing and how energy issues changed the HPC perspective. Desktop Grids. On the convergence of cloud computing and desktop grids; Google MapReduce.
- Documents: Cloud + MR slides PC_10_mapreduce.pdf, Cloud + Exascale slides PC_08_hype.pdf
10 December 2012: Arnaud Legrand Recent challenges and trends in HPC linear algebra: a recap on the whole lecture.
- Documents: slides PC_11_linalg_challenges.pdf
17 December 2012: Arnaud Legrand Answering last year's exam questions.
21 January 2013: Exam
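
As a foretaste of the notions of speedup, efficiency and Amdahl's law listed for the 8 October lecture, here is a minimal sketch in Python. It is not taken from the course material: the function names and the 5% sequential fraction are assumptions chosen for illustration only. It uses the textbook form of Amdahl's law, where a fraction s of the work is sequential and the rest is perfectly parallel on p processors, so the speedup is at most 1 / (s + (1 - s) / p).

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup under Amdahl's law.

    serial_fraction: share of the sequential execution time that cannot
                     be parallelized (hypothetical parameter name).
    processors:      number of processors, assumed identical.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def efficiency(speedup: float, processors: int) -> float:
    """Efficiency is the speedup divided by the number of processors."""
    return speedup / processors


if __name__ == "__main__":
    # Illustrative value only: 5% of the work is assumed to be sequential.
    s = 0.05
    for p in (1, 2, 8, 64, 1024):
        sp = amdahl_speedup(s, p)
        print(f"p={p:5d}  speedup={sp:7.2f}  efficiency={efficiency(sp, p):.3f}")
```

With these assumptions the speedup can never exceed 1 / s, whatever the number of processors, while the efficiency keeps dropping as processors are added. This is one reason why searching for the sequential time in a parallel algorithm matters.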

Course Organization

The course gives 6 credits (ECTS). In previous years, each student used to perform an individual performance evaluation study. Now that the MOSIG comprises a series of lectures on Performance Evaluation, this no longer needs to be done solely in the context of the Parallel Systems lecture. There will be an exam at the end of the year, and an extra lecture should be devoted to studying together one of the exams from previous years.

Bibliography

Henri Casanova, Arnaud Legrand, and Yves Robert. Parallel Algorithms. Chapman & Hall, 2008.
