370CT: Parallel Programming - 4

Dr Carey Pridgeon


Created: 2017-01-22 Sun 11:55

Threading Building Blocks

TBB - 1

  • Threading Building Blocks (TBB) is a C++ template library that provides task-level parallelism.
  • It first appeared in 2006, around the time of Intel's first dual-core processors.
  • TBB contains algorithms and data structures that simplify some aspects of parallel programming.
  • It is rapidly becoming an industry standard tool, so learning it can be beneficial to your careers.
  • OpenMP is still more prevalent, but arguably less capable.

TBB - 2

  • Templates are a feature of the C++ programming language that allows functions and classes to operate with generic types.
  • We need to use Templates to use TBB, but we do not need to understand how templates work in too much detail.
  • OpenMP’s approach simply isn’t fine-grained enough to scale to the sorts of systems and problems we can expect in the future.
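As a minimal sketch of what "generic types" means in practice, the function template below works for any element type and any callable. This is the same mechanism that lets a TBB loop template accept a lambda. The names apply_to_each and template_demo are hypothetical and are not part of TBB, and the loop here is plain and serial.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of TBB): applies `body` to every element.
// T and Body are template parameters, so the compiler generates a separate
// version of this function for each element type and callable we pass in.
template <typename T, typename Body>
void apply_to_each(std::vector<T>& items, Body body) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        body(items[i]);
    }
}

// The same template is reused for int and for double without any changes.
inline void template_demo() {
    std::vector<int> ints = {1, 2, 3};
    apply_to_each(ints, [](int& v) { v *= 2; });
    assert(ints[2] == 6);

    std::vector<double> reals = {0.5, 1.5};
    apply_to_each(reals, [](double& v) { v += 1.0; });
    assert(reals[0] == 1.5);
}
```

Using a template only requires calling it; the compiler works out T and Body from the arguments.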


Partitioners - 1

  • A partitioner specifies how a loop template should partition its work among threads.
  • Finding the optimal automatic partitioning is still an active research topic.
  • The default behavior of the loop templates parallel_for, parallel_reduce, and parallel_scan tries to recursively split a range into enough parts to keep processors busy, not necessarily splitting as finely as possible.
  • auto_partitioner is the default partitioner.

Partitioners - 2

  • affinity_partitioner
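    • Like auto_partitioner, performs sufficient splitting to balance load, not necessarily splitting as finely as Range::is_divisible permits. It also tries to give each thread the same subranges when a loop is run again over the same data, which improves cache affinity.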
  • simple_partitioner
    • Recursively splits a range until it is no longer divisible. This used to be the default, until it was replaced by auto_partitioner.
  • Most of the time, the default will do until you hit rather complex code.
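The recursive splitting described above can be sketched in plain C++ without TBB. This is only an illustration of the idea, assuming a hypothetical IndexRange type and split_fully function; it is not TBB's implementation, and it runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical half-open index range [begin, end). Not TBB's class.
struct IndexRange {
    std::size_t begin;
    std::size_t end;
    std::size_t grain;  // a range of this size or smaller is not split

    std::size_t size() const { return end - begin; }
    bool is_divisible() const { return size() > grain; }
};

// Split recursively until no piece is divisible, which is the behaviour
// the slides describe for simple_partitioner. Leaves are appended in order.
inline void split_fully(const IndexRange& r, std::vector<IndexRange>& leaves) {
    assert(r.grain > 0 && r.begin <= r.end);
    if (!r.is_divisible()) {
        leaves.push_back(r);
        return;
    }
    const std::size_t mid = r.begin + r.size() / 2;
    split_fully({r.begin, mid, r.grain}, leaves);
    split_fully({mid, r.end, r.grain}, leaves);
}
```

For the range [0, 100) with a grain of 25 this produces four leaves of 25 iterations each. An auto_partitioner-style strategy would stop earlier, once there are enough pieces to keep the processors busy.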


  • parallel_for does exactly what parallel for does in OpenMP, except that it adds task stealing.
#include "tbb/tbb.h"
using namespace tbb;
void Foo( float& x );  // assumed to be defined elsewhere
void ParallelApplyFoo( float a[], size_t n ) {
    tbb::parallel_for( size_t(0), n, [&]( size_t i ) {
        Foo(a[i]);
    } );
}



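  • std::vector is not safe to modify from several threads at once, so TBB provides an alternative in concurrent_vector.
#include <vector>
#include "tbb/tbb.h"
using namespace std;
tbb::concurrent_vector<int> my_list;
// push_back on a concurrent_vector may be called from many threads at once.
void add_element(int i) {
    my_list.push_back(i);
}
int main() {
    const int size = 100000;
    // Iterations run concurrently, so elements arrive in no particular order.
    tbb::parallel_for(0, size, 1, [=](int i) {
        add_element(i);
    } );
    return 0;
}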


  • parallel_sort sorts a sequence or a container in parallel.
#include "tbb/parallel_sort.h"
#include <functional>
#include <math.h>
using namespace tbb;
const int N = 100000;
float a[N], b[N], c[N], d[N];
int main() {
    for( int i = 0; i < N; i++ ) {
        a[i] = sin((double)i);
        b[i] = cos((double)i);
        c[i] = 1/sin((double)i);
        d[i] = 1/cos((double)i);
    }
    // Iterator pair, ascending order by default.
    parallel_sort(a, a + N);
    // Iterator pair with a comparator: descending order.
    parallel_sort(b, b + N, std::greater<float>());
    // Whole-container forms, without and with a comparator.
    parallel_sort(c);
    parallel_sort(d, std::greater<float>());
    return 0;
}


  • parallel_reduce does the same job as reduction did in OpenMP.
  • It is more complicated to set up, as with most of TBB, but once set up the same code can be edited easily for a multitude of tasks.
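Most of the extra setup is the body class: in TBB, a parallel_reduce body provides a splitting constructor, an operator() that accumulates over a subrange, and a join method. The sketch below mimics that protocol in plain, single-threaded C++ so the flow is easy to follow. SumBody, split_from and reduce_range are hypothetical names and this is not TBB's implementation; a real scheduler may run the two halves on different threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reduction body: accumulates over a subrange, can be split
// into a fresh copy, and can absorb another body's partial result.
struct SumBody {
    const std::vector<float>* data;
    float total;

    explicit SumBody(const std::vector<float>& d) : data(&d), total(0.0f) {}

    // Stand-in for a splitting constructor: the copy starts from the
    // identity value of the reduction (0 for a sum).
    static SumBody split_from(const SumBody& other) {
        return SumBody(*other.data);
    }

    void accumulate(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) total += (*data)[i];
    }

    void join(const SumBody& rhs) { total += rhs.total; }
};

// Serial stand-in for the scheduler: split the range, reduce each half
// with its own body, then join the right half into the left.
inline void reduce_range(SumBody& body, std::size_t begin, std::size_t end,
                         std::size_t grain) {
    assert(grain > 0 && begin <= end);
    if (end - begin <= grain) {
        body.accumulate(begin, end);
        return;
    }
    const std::size_t mid = begin + (end - begin) / 2;
    SumBody right = SumBody::split_from(body);
    reduce_range(body, begin, mid, grain);
    reduce_range(right, mid, end, grain);
    body.join(right);
}
```

Changing the reduction to a maximum or a product only means changing the identity value, accumulate and join, which is why the same skeleton can be reused for many tasks.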

Nested Parallelism

  • This involves using parallel_for with blocked_range2d.
  • blocked_range2d and blocked_range3d allow a parallel loop to extend over more than one dimension of a structure.
  • If your calculation is on a matrix, TBB will operate over both dimensions simultaneously.
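To picture what operating over both dimensions means, the sketch below cuts a two-dimensional index space into rectangular tiles, each of which could be handed to a different thread. Tile and split_tiles are hypothetical names, the code is serial, and the rule for choosing which side to halve is an assumption made for illustration; TBB's actual splitting policy may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D half-open range: rows [r0, r1) by columns [c0, c1).
struct Tile {
    std::size_t r0, r1;
    std::size_t c0, c1;
    std::size_t rows() const { return r1 - r0; }
    std::size_t cols() const { return c1 - c0; }
};

// Recursively halve a tile until both sides are within their grain sizes.
// When both sides are still divisible, the longer side is halved.
inline void split_tiles(const Tile& t, std::size_t row_grain,
                        std::size_t col_grain, std::vector<Tile>& out) {
    assert(row_grain > 0 && col_grain > 0);
    const bool split_rows = t.rows() > row_grain;
    const bool split_cols = t.cols() > col_grain;
    if (!split_rows && !split_cols) {
        out.push_back(t);  // small enough: one independent unit of work
        return;
    }
    if (split_rows && (!split_cols || t.rows() >= t.cols())) {
        const std::size_t mid = t.r0 + t.rows() / 2;
        split_tiles({t.r0, mid, t.c0, t.c1}, row_grain, col_grain, out);
        split_tiles({mid, t.r1, t.c0, t.c1}, row_grain, col_grain, out);
    } else {
        const std::size_t mid = t.c0 + t.cols() / 2;
        split_tiles({t.r0, t.r1, t.c0, mid}, row_grain, col_grain, out);
        split_tiles({t.r0, t.r1, mid, t.c1}, row_grain, col_grain, out);
    }
}
```

A 4 by 4 matrix with grain sizes of 2 in each direction ends up as four 2 by 2 tiles, and every cell belongs to exactly one tile.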


  • Not all iteration problems have a known iteration space, so some loops need to exit when a condition is met.
  • OpenMP's parallel for cannot do this: the loop's iteration count must be computable before the loop starts.
  • With parallel_do, tasks are added to the job queue only while the exit condition hasn't been met, so it never 'breaks' out (a thing that parallel threading libraries can't normally do); it just stops once no work remains.
void operator()( Cell* c, tbb::parallel_do_feeder<Cell*>& feeder ) const {
    // Restore ref_count in preparation for subsequent traversal.
    c->ref_count = ArityOfOp[c->op];
    for( size_t k=0; k<c->successor.size(); ++k ) {
        Cell* successor = c->successor[k];
        // A successor becomes ready once all of its inputs have been processed.
        if( 0 == --(successor->ref_count) ) {
            feeder.add( successor );
        }
    }
}
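The pattern of stopping when no work remains, rather than breaking out, can be sketched in plain serial C++. Node and run_until_empty are hypothetical names; the work list plays the role that the feeder plays in parallel_do, and this is not TBB's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical graph node: ready to run once all predecessors have run.
struct Node {
    int id;
    int ref_count;                 // predecessors not yet processed
    std::vector<Node*> successor;
};

// Serial stand-in for a parallel_do-style loop. Processing one item may add
// more items to the work list; the loop ends when the list is empty.
inline std::vector<int> run_until_empty(const std::vector<Node*>& roots) {
    std::deque<Node*> work(roots.begin(), roots.end());
    std::vector<int> order;  // ids in the order they were processed
    while (!work.empty()) {
        Node* n = work.front();
        work.pop_front();
        order.push_back(n->id);
        for (Node* s : n->successor) {
            assert(s->ref_count > 0);
            // The last predecessor to finish makes the successor ready.
            if (--(s->ref_count) == 0) {
                work.push_back(s);
            }
        }
    }
    return order;
}
```

There is no exit test inside the loop body: a node that has no ready successors simply adds nothing, and the loop finishes on its own.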


  • parallel_invoke is nearly TBB's version of sections (in reality just single-thread task groups with no task stealing), and operates in almost the same way.
  • You cannot disable the implicit barrier, at least not that I can find.
void RunFunctions() {
    // Runs the three functions, possibly in parallel, and returns once all have finished.
    tbb::parallel_invoke(Function1, Function2, Function3);
}