370CT: Parallel Programming - 2

Dr Carey Pridgeon

2016-08-06

OpenMP Features Walkthrough

Compatible Languages

  • Fortran
  • C
  • C++
  • Matlab MEX files (I have zero clue how to do this myself)
  • We will stick with C++

Pragmas - 1

  • A pragma is a directive used at compile time to control what the compiler does with a given block of code.
  • Outside of parallel programming, I mostly use them to specify compile-time inclusion or exclusion of code.
  • In OpenMP, pragmas are used to tell the compiler how to incorporate your code with OpenMP.

Pragmas - 2

#pragma omp parallel
  • Declares that the code scope that follows this pragma will be compiled as a parallel block.
  • Scope is whatever normally defines scope in the language being used.
  • The type of parallel block is determined by further options.

Pragma practice

  • Using the provided base code, run this.
#pragma omp parallel
{
    int i = omp_get_thread_num();
    std::cout << i << std::endl;
}
  • If all goes well, you should have a program that prints out the number of each thread that OpenMP creates.

OMP Options

  • Basic extensions to the parallel directive (see the sketches after this list):

    #pragma omp parallel for

  • This tells OpenMP to share the iterations of the next code block (which must be a for loop with no breakouts) across the threads of a threadpool.

    #pragma omp parallel sections

  • This allows different functions to be run in the threads of a threadpool.
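
  • A rough sketch of both forms, meant to run inside the provided base code (compute, readInput and writeOutput are placeholder names, not from the lecture code):

// iterations of the loop are divided among the threads of the pool
#pragma omp parallel for
for (int i = 0; i < 1000; i++) {
    results[i] = compute(i);
}

// each section runs in a different thread of the pool
#pragma omp parallel sections
{
    #pragma omp section
    readInput();
    #pragma omp section
    writeOutput();
}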

Barriers - 1

  • Each threadpool is followed by a barrier. All threads wait until every other thread has reached this barrier before the program can continue past it.
  • Barriers are implicit, they are always present, so they must be disabled if you don't want them.
  • To disable one, you add a nowait clause to the worksharing construct (the for or sections) inside a parallel region. Note that nowait cannot be attached to the combined forms like parallel for, because the parallel region itself always ends with a barrier.
#pragma omp for nowait
#pragma omp sections nowait

Barriers - 2

  • Barriers can be disabled for loop regions if no following code depends on the output of the threadpool.
  • Barriers can be disabled for sections if you want a function launched by one to have a longer runtime (e.g. a menu thread), or if no following code depends on the output. A sketch follows below.
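
  • A minimal sketch of disabling a loop barrier (work1 and work2 are placeholder functions):

#pragma omp parallel
{
    // no barrier at the end of this loop: threads that finish their
    // share of the iterations move straight on to work2()
    #pragma omp for nowait
    for (int i = 0; i < 1000; i++) {
        work1(i);
    }
    work2();   // must not depend on every iteration being finished
}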

A bit more thread num practice

  • By using the function omp_get_thread_num() we can index into an array that we are parallelising with the #pragma omp parallel directive.
#define sqr(x) ((x)*(x))
int arr[10] = {2,3,4,5,6,7,8,9,10,11};
#pragma omp parallel
{
    int i = omp_get_thread_num();
    if (i < 10)   // guard: there may be more threads than array elements
        std::cout << sqr(arr[i]) << std::endl;
}

Clauses

  • Type in and run this code. You can use any operation on the array elements; the point is just to show how thread numbers can be used for array access.
  • If there are more array elements than threads (as there will be in any non-trivial project), use #pragma omp parallel for instead: the threads are then fed new iterations as they complete previous ones until the work is done, as sketched below.
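
  • For example, a sketch with far more elements than threads (the array size is arbitrary):

int arr[1000];
#pragma omp parallel for
for (int i = 0; i < 1000; i++) {
    arr[i] = i * i;   // iterations are shared out across the pool
}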

Setting the thread count

  • If you want to, you can set a specific number of threads:
#pragma omp parallel num_threads(4)
  • This can work to reduce the number of threads, but may not work if you seek to increase the number, as there may be an effective upper limit.
  • Threads have a creation cost, so creating ones you could do without is wasteful.
  • OpenMP will create only a small number of threads if it only needs a few. A sketch follows below.
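
  • A minimal sketch, run inside the provided base code (the body just prints the thread number, as before):

#pragma omp parallel num_threads(4)
{
    // runs with at most 4 threads, however many cores the machine has
    std::cout << omp_get_thread_num() << std::endl;
}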

Master

  • The master block will execute the enclosed code in only one thread (the master thread, thread 0) of a thread pool.
#pragma omp master
{
    block
}
  • It cannot be used inside a parallel for loop.
  • This gives us a way of telling OpenMP that we want this code to be run in the threadpool, but only once, not by every thread; see the sketch below.
  • Use it in the code you wrote earlier to try out parallel, so that the program only prints to screen once.
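
  • A minimal sketch of master inside a plain parallel region:

#pragma omp parallel
{
    int i = omp_get_thread_num();   // every thread runs this line

    #pragma omp master
    {
        // only the master thread (thread 0) runs this block;
        // note there is no implied barrier at its end
        std::cout << "threads are running" << std::endl;
    }
}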

Critical - 1

  • A critical block implements a classic critical section, controlled entirely by OpenMP.
#pragma omp critical
{
    block
}
  • Use it to print to screen (screen printing is not thread safe); a sketch follows below.
  • Use it for nothing else. If the logic of your program needs critical sections, it is a bad parallel program.
  • Printing in a threadpool is not really a good idea anyway, other than for testing or initial learning.
  • As with all critical sections, they will not scale well.
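
  • A minimal sketch of critical used for printing:

#pragma omp parallel
{
    #pragma omp critical
    {
        // one thread at a time enters, so output lines don't interleave
        std::cout << "hello from thread " << omp_get_thread_num() << std::endl;
    }
}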

Nowait

  • Nowait disables the barrier on some OpenMP constructs. Applied to a for construct, it tells OpenMP that program execution can continue before the whole threadpool finishes.
  • More on this later, but using it with a parallel loop is something to consider carefully.

Variable settings

OpenMP lets you assign properties to variables that affect how they function within the structured block.

Shared

#pragma omp parallel shared (x)
  • In OpenMP, all variables are shared by default in parallel regions, unless declared within the scope of a given parallel region.
  • Explicitly specifying them as shared leads to clearer code that is easier to debug, as in the sketch below.
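
  • A minimal sketch (x is just an example variable):

int x = 42;
#pragma omp parallel shared(x)
{
    // every thread reads the same x; concurrent writes to a shared
    // variable would need protection (e.g. a critical section)
    std::cout << x << std::endl;
}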

Private

#pragma omp parallel private (x)
  • A variable in this clause, whether of scalar or compound type, is duplicated and provided uninitialized to each thread in the block, as sketched below.
  • By default these copies are used internally by each thread and then discarded. If this isn't the required result, there are other options.
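
  • A minimal sketch (x is just an example variable):

int x = 42;
#pragma omp parallel private(x)
{
    // each thread has its own copy of x, which starts uninitialized;
    // the original value (42) is not visible here
    x = omp_get_thread_num();
    std::cout << x << std::endl;
}
// the threads worked on their own copies, not the original x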

Firstprivate

#pragma omp parallel firstprivate (x)
  • Declares one or more list items to be private to each thread.
  • Each thread's private copy is initialized with the value the variable held just before the structured block began.
  • The original variable itself is left unchanged by the block, so it can be set, used by the threadpool, then used again afterwards, as sketched below.
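
  • A minimal sketch (x is just an example variable):

int x = 42;
#pragma omp parallel firstprivate(x)
{
    // each thread's private copy starts at 42
    x += omp_get_thread_num();   // changes only this thread's copy
    std::cout << x << std::endl;
}
// the original x is still 42 here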

Lastprivate

#pragma omp parallel for lastprivate (x)
  • Declares one or more list items to be private to each thread; the clause belongs on a worksharing construct such as a parallel for loop or sections, not on a plain parallel region.
  • After the structured block completes, it updates the variable(s) passed with the value from the sequentially last loop iteration (or the last section), not simply from whichever thread happened to finish last.
  • The private copies start uninitialized; on completion the original variable holds that final value, as sketched below.
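
  • A minimal sketch (x is just an example variable):

int x = 0;
#pragma omp parallel for lastprivate(x)
for (int i = 0; i < 100; i++) {
    x = i * i;   // each thread writes to its own private x
}
// x now holds the value from the sequentially last iteration
// (i == 99), i.e. 9801
std::cout << x << std::endl;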

threadprivate

#pragma omp threadprivate (x)
  • The threadprivate directive is a standalone directive placed after a variable's declaration (not a clause on parallel). It is used to make global, file-scope variables local and persistent to a thread through the execution of multiple parallel regions.
  • This allows some simplification of programming logic when privates are used repeatedly. A sketch follows below.
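
  • A minimal sketch (counter and count are example names; the values persist between the two regions provided the thread count does not change):

int counter = 0;                    // global, file-scope variable
#pragma omp threadprivate(counter)

void count()
{
    #pragma omp parallel
    {
        counter++;                  // increments this thread's own copy
    }
    #pragma omp parallel
    {
        // each thread still sees the value it wrote above
        std::cout << counter << std::endl;
    }
}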