370CT: Distributed Programming - 1

Dr Carey Pridgeon

2016-08-06

Created: 2017-02-01 Wed 13:22

MPI

Introduction

  • MPI (Message Passing Interface) is a communication system used on distributed systems (clusters).
  • We will use OpenMPI, an open source implementation of the MPI standard specification (which all implementations must follow).
  • You can read the specification if you want, and have a couple of weeks spare, but you don’t need to in order to pass this module.
  • OpenMPI is maintained by a consortium of academic and industry partners.
  • MPI is not a language itself; it is a library, with bindings for various languages (C, C++, Python, Java, Fortran), that enables cluster computation.

Message Passing

  • MPI works via a message passing system. Messages contain data when sent by the user; MPI uses the same message system for its own inter-node communication, but we won't cover that.
  • One node (usually the one that initially loads the program) manages the distributed program by sending and receiving these messages and collating the results. A minimal send/receive sketch follows.
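To make that concrete, here is a minimal sketch of point-to-point message passing, using the standard MPI C API from C++. The payload value and the two-rank setup are invented for illustration.

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            // Rank 0 sends one integer to rank 1 (message tag 0).
            int payload = 42;
            MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            // Rank 1 blocks until the matching message arrives.
            int payload = 0;
            MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Rank 1 received %d from rank 0\n", payload);
        }

        MPI_Finalize();
        return 0;
    }

Run it with at least two processes (e.g. mpirun -np 2), since the sketch assumes ranks 0 and 1 both exist.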

Inter Node Comms

  • MPI isn’t hard to set up, and will work on any connected set of computers that share the same operating system. MPI uses SSH to communicate between nodes, and all nodes should be on the same subnet.
  • The same subnet isn’t a strict requirement for MPI, but if the computers aren’t on the same subnet or in the same building, you will incur network overheads.
  • Your accounts are already set up so SSH will work with MPI across the cluster, so don't alter that setup. The nodes themselves are listed in a hostfile, sketched below.
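For reference, an OpenMPI hostfile is just a plain text file listing one machine per line. The hostnames and slot counts below are hypothetical:

    # Hypothetical OpenMPI hostfile: "slots" caps how many
    # processes mpirun will start on each machine.
    node01 slots=4
    node02 slots=4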

Usage

Basic Usage - 1

  • MPI has a wrapper for the GNU C++ compiler, called mpic++. Using this to compile MPI code builds in all the MPI dependencies (include paths and libraries).
  • The programs that result have to be passed as parameters to mpirun.
  • Programs passed to mpirun need not have been compiled for MPI, but they usually are.
  • For mpirun to work you need to give it a list of machines set up to communicate using MPI's protocols (the hostfile above), and some other experiment-specific instructions. A typical invocation is sketched below.
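A typical compile-and-run sequence might look like the following; the file names hello.cpp and hosts.txt and the process count of 8 are placeholders, not fixed names.

    mpic++ hello.cpp -o hello
    mpirun --hostfile hosts.txt -np 8 ./hello

Here -np tells mpirun how many copies of the program to launch across the machines listed in the hostfile.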

Basic Usage - 2

  • MPI runs a duplicated copy of the same process across all the compute nodes.
  • The duplication happens automatically; what we control is how the data to be processed is distributed among those copies, as in the sketch below the figure.

    [Figure: mpi.png]
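The sketch below shows the duplication model at its simplest: every copy runs identical code, and each copy discovers its own rank and the total number of copies, which is what you branch on to divide up the data. The printed message is purely illustrative.

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        // Every duplicated process runs this same code...
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // ...but each gets a unique rank,
        MPI_Comm_size(MPI_COMM_WORLD, &size);  // and knows how many copies exist.

        printf("Hello from rank %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }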

No Parallel Programming with MPI

Why Not

  • MPI can do its thing quite easily on a single multi-core system, as you may have seen (or will see) when you play with the exercises.
  • Working on a multi-core system means parallel, not distributed, computing, though you can mix the two paradigms easily with MPI.
  • We don't want to, because MPI's strength is distribution; for parallelism it can instead be used in conjunction with OpenMP, a library aimed specifically at parallelism. A hybrid sketch follows.
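Purely to illustrate what that combination looks like (you won't need it for this module's exercises), here is a minimal hybrid sketch, assuming OpenMPI plus an OpenMP-capable compiler; it would be compiled with something like mpic++ -fopenmp.

    #include <mpi.h>
    #include <omp.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        // Request threaded support, since OpenMP threads run inside each MPI process.
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // MPI distributes processes across nodes; OpenMP runs threads within each one.
        #pragma omp parallel
        printf("MPI rank %d, OpenMP thread %d\n", rank, omp_get_thread_num());

        MPI_Finalize();
        return 0;
    }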

Exercises

Intro to mpirun

  • Go through exercise_d1 to get an initial idea of how MPI works over the cluster.
  • That exercise involves no coding, so you should get through it fairly easily.
  • Once you have, start on exercise_d2, where you will write some basic C++ code to try out MPI.