SkaMPI measurements on griffon G5K
It's now time for me to learn how to perform my own MPI measurements on G5K (yeah, it's been a few years since I last ran a real MPI code). My goal is to play a bit with SkaMPI and check whether I can statistically trust the measurements of Mark and Stéphane, as well as possibly improve them.
Commands to access griffon
http://www.grid5000.fr/mediawiki/images/G5k_cheat_sheet.pdf
Added export OMPI_MCA_plm_rsh_agent=oarsh to the .bashrc
(http://www.grid5000.fr/mediawiki/index.php/Run_MPI_On_Grid'5000)
ssh nancy.grid5000  # oddly, griffon nancy.g5k does not work, check with Mt
oarsub -I -l nodes=1,walltime=2 -p "cluster='griffon'"
Ski file used
# ----------------------------------------------------------------------
# pt2pt measurements
set_min_repetitions(8)
set_max_repetitions(16)
set_max_relative_standard_error(0.03)
set_skampi_buffer(64mb)
datatype = MPI_CHAR
# ----------------------------------------------------------------------
comm_pt2pt = comm2_max_latency_with_root()

begin measurement "Pingpong_Send_Recv"
  for count = 1 to ... step *sqrt(2) do
    measure comm_pt2pt : Pingpong_Send_Recv(count, datatype, 0, 1)
  od
end measurement
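The "step *sqrt(2)" loop makes SkaMPI sweep the message sizes geometrically, doubling every two steps (which is why 11863283 = 2^23 * sqrt(2) shows up between 8 MB and 16 MB in the outputs below). A throwaway awk sketch of my own (not part of SkaMPI) to enumerate the sizes the sweep will hit:

```shell
# List the message sizes of a "for count = 1 to ... step *sqrt(2)" sweep,
# rounded to the nearest integer, stopping at the 16 MB mark; show the last 3.
awk 'BEGIN { c = 1; while (c < 16777217) { printf "%d\n", int(c + 0.5); c *= sqrt(2) } }' | tail -3
```

The last three sizes printed (8388608, 11863283, 16777216) are exactly the ones appearing in the tails of the measurement logs below.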
Informations sur Griffon
https://api.grid5000.fr/2.0/ui/nodes.html
network_adapters_2_interface: InfiniBand
network_adapters_2_rate: 20000000000
network_adapters_0_device: eth0
network_adapters_0_rate: 1000000000

So we have 1 Gbit Ethernet and 20 Gbit InfiniBand.
Running skampi
Stéphane's measurements
/home/sgenaud/openmpi.install/bin/mpiexec --mca btl_tcp_if_include eth0 --mca btl_tcp_if_exclude ib0 --mca btl tcp,self -mca orte_base_help_aggregate 0 -mca plm_rsh_agent oarsh -machinefile machinefile -n 2 skampi -i ski_smpi/skampi_pt2pt.ski
count= 8388608 8388608 144965.0 15.2 32 144965.0 143205.7
-> Bw = 8388608/144965.0 = 57.8664367261063 MB/s
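The bandwidth computations in this page are all the same operation, assuming (as above) that the third field of a SkaMPI "count=" line is the message size in bytes and the fourth the measured time in microseconds; bytes/µs conveniently comes out directly in MB/s. A little awk helper of my own to avoid doing it by hand:

```shell
# Compute MB/s from a SkaMPI result line: field 3 = size in bytes (assumed),
# field 4 = measured time in microseconds (assumed); bytes/us == MB/s.
echo "count= 8388608 8388608 144965.0 15.2 32 144965.0 143205.7" \
  | awk '{ printf "%.1f MB/s\n", $3 / $4 }'
```

On the line above this prints 57.9 MB/s, matching the hand computation.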
Use tcp
mpirun --mca btl self,tcp -machinefile $OAR_NODEFILE ./skampi -i ski/skampi_pt2pt_alvin.ski -o p2p.log 2>/dev/null ; tail p2p.log
count= 8388608 8388608 19509.3 14.2 8 19448.5 19484.8
count= 11863283 11863283 28949.4 485.2 8 27303.0 28949.4
count= 16777216 16777216 41638.3 71.9 8 41607.3 41525.5
# end result "Pingpong_Send_Recv"
# duration = 1.71 sec
# Finished at Mon Jan 23 16:50:46 2012
# Total runtime 2 seconds
-> Bw = 8388608/19509.3 = 429.979958276309 MB/s. Interestingly, this measure is not stable: across runs I sometimes get 435.502624351699, 464.717079386184, … Note that I use at most 16 repetitions, but still, this means the speed is not really stable.
Use openib
mpirun --mca btl self,openib -machinefile $OAR_NODEFILE ./skampi -i ski/skampi_pt2pt_alvin.ski -o p2p.log 2>/dev/null ; tail p2p.log
count= 8388608 8388608 13706.0 275.7 8 13696.0 13434.9
count= 11863283 11863283 18585.0 112.0 8 18580.2 18565.8
count= 16777216 16777216 26465.1 5.3 8 26417.7 26465.1
# end result "Pingpong_Send_Recv"
# duration = 1.30 sec
# Finished at Mon Jan 23 16:54:39 2012
# Total runtime 2 seconds
-> Bw = 8388608/13706 = 612.039106960455 MB/s. This is quite stable.
Use ipoib
mpirun --mca btl self,ipoib -machinefile $OAR_NODEFILE ./skampi -i ski/skampi_pt2pt_alvin.ski -o p2p.log 2>/dev/null ; tail p2p.log
count= 8388608 8388608 13706.0 275.7 8 13696.0 13434.9
count= 11863283 11863283 18585.0 112.0 8 18580.2 18565.8
count= 16777216 16777216 26465.1 5.3 8 26417.7 26465.1
# end result "Pingpong_Send_Recv"
# duration = 1.30 sec
# Finished at Mon Jan 23 16:54:39 2012
# Total runtime 2 seconds
Same behavior as openib here.
Sebastien's arguments
mpirun --mca btl self,tcp --mca btl_tcp_if_include lo,br0 -machinefile $OAR_NODEFILE ./skampi -i ski/skampi_pt2pt_alvin.ski -o p2p.log 2>/dev/null ; tail p2p.log
count= 8388608 8388608 20787.5 13.7 8 20779.5 20711.0
count= 11863283 11863283 29042.0 26.2 8 28986.8 28333.2
count= 16777216 16777216 41008.8 17.0 8 40965.3 40983.5
# end result "Pingpong_Send_Recv"
# duration = 1.71 sec
# Finished at Mon Jan 23 16:58:00 2012
# Total runtime 2 seconds
-> Bw = 8388608/20787.5 = 403.540974143115 MB/s. Even when asked to use br0 (eth0 is an epic failure), it keeps using IB (400 MB/s is way too much for a Gbit Ethernet link).
Stéphane's arguments
/home/sgenaud/openmpi.install/bin/mpiexec --mca btl_tcp_if_include eth0 --mca btl_tcp_if_exclude ib0 --mca btl tcp,self -mca orte_base_help_aggregate 0 -mca plm_rsh_agent oarsh -machinefile machinefile -n 2 skampi -i ski_smpi/skampi_pt2pt.ski
Stéphane was using his own openmpi install, and I had to replace eth0 with br0 because it does not work otherwise.
mpirun --mca btl_tcp_if_include br0 --mca btl_tcp_if_exclude ib0 --mca btl tcp,self -mca orte_base_help_aggregate 0 --mca plm_rsh_agent "oarsh" -machinefile $OAR_NODEFILE ./skampi -i ski/skampi_pt2pt_alvin.ski -o p2p.log 2>/dev/null; tail p2p.log
count= 8388608 8388608 20786.2 8.1 8 20754.7 20694.3
count= 11863283 11863283 29065.3 60.1 8 29065.3 28288.8
count= 16777216 16777216 41059.5 16.5 8 41038.8 40960.0
# end result "Pingpong_Send_Recv"
# duration = 1.72 sec
# Finished at Mon Jan 23 17:06:39 2012
# Total runtime 2 seconds
No better, then. And the same thing when just excluding ib0:
mpirun --mca btl_tcp_if_exclude ib0 --mca btl tcp,self -mca orte_base_help_aggregate 0 --mca plm_rsh_agent "oarsh" -machinefile $OAR_NODEFILE ./skampi -i ski/skampi_pt2pt_alvin.ski -o p2p.log 2>/dev/null; tail p2p.log
count= 8388608 8388608 19631.0 27.1 8 19631.0 19536.7
count= 11863283 11863283 28554.7 616.4 8 28554.7 27378.3
count= 16777216 16777216 38772.5 49.2 8 38626.3 38730.5
# end result "Pingpong_Send_Recv"
# duration = 1.66 sec
# Finished at Mon Jan 23 17:08:08 2012
# Total runtime 1 seconds
Still the same. There is no way to keep it from using IB, and no way to remove IB either:
/sbin/modprobe -r `lsmod | grep '^ib' | sed 's/ .*//' `
FATAL: Error removing ib_ipoib (/lib/modules/2.6.32-5-amd64/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko): Operation not permitted
Deploying my own image:
dtach -A /tmp/alegrand-dtach-socket bash
oarsub -I -l 'nodes=1,walltime=1' -p "cluster='griffon'" -t deploy
kadeploy3 -e squeeze-x64-nfs -f $OAR_NODE_FILE -k ~/.ssh/id_rsa.pub
/etc/init.d/openibd stop
/etc/init.d/mx stop
mpirun --mca btl self,tcp -machinefile $OAR_NODEFILE ./skampi -i ski/skampi_pt2pt_alvin.ski -o p2p.log 2>/dev/null ; tail p2p.log
count= 8388608 8388608 19631.0 27.1 8 19631.0 19536.7
count= 11863283 11863283 28554.7 616.4 8 28554.7 27378.3
count= 16777216 16777216 38772.5 49.2 8 38626.3 38730.5
# end result "Pingpong_Send_Recv"
# duration = 1.66 sec
-> Bw = 8388608/19631 = 427.314349752942 MB/s
mpirun --mca btl_tcp_if_include br0 --mca btl_tcp_if_exclude ib0 --mca btl tcp,self -mca orte_base_help_aggregate 0 --mca plm_rsh_agent "oarsh" -machinefile $OAR_NODEFILE ./skampi -i ski/skampi_pt2pt_alvin.ski -o p2p.log 2>/dev/null; tail p2p.log
count= 8388608 8388608 19631.0 27.1 8 19631.0 19536.7
count= 11863283 11863283 28554.7 616.4 8 28554.7 27378.3
count= 16777216 16777216 38772.5 49.2 8 38626.3 38730.5
# end result "Pingpong_Send_Recv"
# duration = 1.66 sec
Same thing!
/sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr 00:e0:81:b2:c0:46
          inet addr:172.16.65.90  Bcast:172.16.79.255  Mask:255.255.240.0
          inet6 addr: fe80::2e0:81ff:feb2:c046/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1640 errors:0 dropped:0 overruns:0 frame:0
          TX packets:928 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:353702 (345.4 KiB)  TX bytes:139603 (136.3 KiB)
          Interrupt:18 Memory:dca00000-dca20000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:54 errors:0 dropped:0 overruns:0 frame:0
          TX packets:54 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4908 (4.7 KiB)  TX bytes:4908 (4.7 KiB)
Argh, what an idiot, I was going through the loopback!!! That explains it. Now I just have to start over.
Use two production nodes (and not just one… :( )
uniq $OAR_NODEFILE > machinefile
mpirun --mca btl self,tcp -machinefile machinefile ./skampi -i ski/skampi_pt2pt_alvin.ski -o p2p.log 2>/dev/null ; tail p2p.log
count= 8388608 8388608 74580.0 41.5 8 74580.0 73331.5
count= 11863283 11863283 104334.2 62.9 8 104334.2 102909.4
count= 16777216 16777216 145859.3 117.3 8 145859.3 144548.6
# end result "Pingpong_Send_Recv"
# duration = 6.46 sec
-> Bw = 8388608/74580 = 112.477983373559 MB/s. YEAAAH!!!!
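This finally looks like Ethernet: a 1 Gbit/s link cannot exceed 125 MB/s of payload at wire rate, and the measured value sits just below that ceiling. A quick awk sanity check (wire rate only, ignoring TCP/IP and MPI overhead):

```shell
# Theoretical ceiling of a 1 Gbit/s link vs. the measured pingpong bandwidth.
awk 'BEGIN { printf "link max = %.0f MB/s, measured = %.1f MB/s\n",
             1e9 / 8 / 1e6, 8388608 / 74580.0 }'
```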
Getting Stéphane's Ski file
# ----------------------------------------------------------------------
# pt2pt measurements
set_min_repetitions(32)
set_max_repetitions(64)
set_max_relative_standard_error(0.03)
set_skampi_buffer(32768kb)
datatype = MPI_CHAR
# ----------------------------------------------------------------------
comm_pt2pt = comm2_max_latency_with_root()

begin measurement "Pingpong_Send_Recv"
  for count = 1 to ... step *2 do
    measure comm_pt2pt : Pingpong_Send_Recv(count, datatype, 0, 1)
  od
  for count = 1024 to 8192 step +512 do
    measure comm_pt2pt : Pingpong_Send_Recv(count, datatype, 0, 1)
  od
  for count = 32768 to 262144 step +1024 do
    measure comm_pt2pt : Pingpong_Send_Recv(count, datatype, 0, 1)
  od
end measurement
Comparing two measurements
I have performed two measurements to see whether they were stable or not. Everything is detailed in this Sweave document. The conclusion is that there is some noise, that a piecewise (not necessarily linear) model could be just fine, and that a sound experiment plan could be set up.
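For reference, the 0.03 threshold in set_max_relative_standard_error is, if I read SkaMPI right, the standard error of the mean divided by the mean. A small awk sketch of that statistic, fed with a few 8 MB pingpong times taken from the runs above:

```shell
# Relative standard error (stderr of the mean / mean) of a list of timings;
# the four values are 8 MB pingpong times (in us) picked from the runs above.
printf '%s\n' 19509.3 19448.5 19484.8 19631.0 \
  | awk '{ s += $1; ss += $1 * $1; n++ }
         END { m = s / n; se = sqrt((ss / n - m * m) / (n - 1));
               printf "rel. std. error = %.4f\n", se / m }'
```

Within a run the criterion is easily met (here about 0.002, well under 0.03), which is consistent with the instability showing up between runs rather than between repetitions.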