Chapter 61. Parallel programming framework

There are many types of parallel programming. One classic model for parallel programming is that of Communicating Sequential Processes (CSP). CSP models a parallel program as a set of sequential (serial) programs that communicate with each other.

Within CSP two main models exist: threaded parallelism, in which the communicating elements are threads within a single process, and distributed parallelism, in which they are separate processes, possibly running on different hosts.

NSCLDAQ provides a high-level library that insulates your program from the detailed mechanisms of specific message-passing libraries. Programs that communicate only via message passing can easily be rehosted from a threaded implementation to a distributed implementation, changing only initialization code.

In this chapter we'll describe the concepts behind this framework and the classes that implement it.

61.1. Concepts and classes

Parallel programs built on top of this library consist of objects that communicate via message passing. To limit the amount of data copying required, a message need not be a single block of data; it can be a list of blocks that the library gathers into a single message.
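To make the gather-list idea concrete, here is a minimal sketch of what such a message description could look like; MessagePart and Message are illustrative names, not the actual NSCLDAQ types:

    #include <cstddef>
    #include <vector>

    // One block of a message: a pointer/size pair (hypothetical type).
    struct MessagePart {
        const void* s_pData;    // start of the block
        size_t      s_nBytes;   // size of the block in bytes
    };

    // A message is an ordered list of blocks.  The library gathers the
    // blocks into a single wire message, so the sender never copies
    // them into one contiguous buffer.
    using Message = std::vector<MessagePart>;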

The assumption is that a program consists of some source of data, and that processing these data involves parceling out work units to parallel processing objects, which then fan their results back in to processing objects that gather the data.

Thus the programming model has you stringing together a pipeline, each element of which runs in parallel; some stages of that pipeline may consist of several workers that operate in parallel on the data streaming through the pipeline.
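Schematically, such a computation is a fan-out/fan-in flow:

                    +--> worker 1 --+
    data source ----+--> worker 2 --+----> data sink
                    +--> worker n --+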

The following classes, therefore, model several types of processing elements that can appear in these data/processing flows:

CDataSourceElement

These elements are intended to connect the program to concrete sources of data. In NSCLDAQ these sources can be ring buffers, files, or some other communicating processor (internal or external).

These elements are normally at the start of the processing pipeline.

CDataSinkElement

These elements are intended to connect the program to some concrete sink of data. In NSCLDAQ, sinks can be ring buffers, files or other communicating processors.

CParallelWorker

A generic parallel worker. This is normally used to encapsulate code that runs in the data parallel segments of the computation. Normally this encapsulation separates the application-specific processing from the code that moves data in and out of the worker.

From these classes you can see that the computation normally takes the form of a pipeline in which data come from a CDataSourceElement, run through several CParallelWorker elements, and are finally emitted from the program via a CDataSinkElement.

Having these classes is all well and good, but how do they communicate with each other? Following the pipeline model, each element of the computation gets its data from a CReceiver and sends the results of its computation to the next stage of the pipeline via a CSender.
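Here is a minimal sketch of how a processing element might use these objects; the getMessage and sendMessage calls and the end-of-data convention are assumptions for illustration, so consult the CReceiver and CSender reference pages for the actual interfaces:

    #include <cstddef>
    #include <cstdlib>

    // Hypothetical processing loop for a pipeline element.  The element
    // neither knows nor cares what transports sit behind the receiver
    // and sender it was handed at initialization time.
    void processingLoop(CReceiver& source, CSender& sink) {
        while (true) {
            void*  pData(nullptr);
            size_t nBytes(0);
            source.getMessage(&pData, nBytes);  // assumed signature
            if (nBytes == 0) break;             // assume empty message == end of data
            // ... transform pData/nBytes into a result here ...
            sink.sendMessage(pData, nBytes);    // assumed signature
            std::free(pData);                   // assume the receiver allocated the buffer
        }
    }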

CReceiver and CSender objects encapsulate another class derived from CTransport. Transport classes actually do the messaging required by the receiver and sender objects. In doing so, they contain code specific to the type of communication library being used (e.g. MPI or ZeroMQ), and they also encapsulate a specific communication pattern.

Some base classes for transports are:

CTransport

The abstract base class for all transports. This class provides the interfaces used by sender and receiver objects to request actual communication. A sketch of these interfaces appears after this list.

CFanoutTransport

Abstract base class for transports that fan-out work items to data parallel sections of the program. In addition to the data transfer interfaces, this class provides interfaces to inform the members of the fanout that there is no more data available.

CFanoutClientTransport

Abstract base class for transports that get data from a fanout transport. The model provided requires that each client provide a unique integer client identifier. This class encapsulates interfaces for both setting the id and communicating the id to the other end of the transport.
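To make the division of labor concrete, here is a minimal sketch of what these abstract interfaces could look like. The method names and signatures are illustrative assumptions; consult the class reference pages for the actual declarations:

    #include <sys/uio.h>    // iovec, for gather-list sends
    #include <cstddef>
    #include <cstdint>

    // Hypothetical base transport.  Receivers and senders call these
    // methods without knowing whether ZeroMQ, MPI, or threads sit
    // underneath.
    class CTransport {
    public:
        virtual ~CTransport() {}
        virtual void recv(void** ppData, size_t& size) = 0;    // one message in
        virtual void send(iovec* parts, size_t numParts) = 0;  // gather-list out
    };

    // Hypothetical fan-out side: adds a way to tell all clients that
    // no more data will be coming.
    class CFanoutTransport : public CTransport {
    public:
        virtual void end() = 0;
    };

    // Hypothetical fan-out client side: each client carries a unique
    // integer id that is communicated to the fan-out end.
    class CFanoutClientTransport : public CTransport {
    public:
        virtual void setId(uint64_t id) = 0;
    };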

Thus the computation can be made up of processing elements that get and send data without actually knowing how that's done. Program initialization can select the actual transports and bind them into processing elements. If necessary, program initialization can also allocate processors to computing resources. This allows a computation to be rehosted without the actual computing elements being aware of the change.
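For example, initialization code along the following lines could choose between a threaded (in-process) and a distributed (network) transport while leaving the worker code untouched. The concrete class names, endpoint, and constructor arguments are hypothetical, standing in for whatever concrete CFanoutClientTransport subclasses a given installation provides:

    #include <memory>
    #include <cstdint>

    // Hypothetical factory: only this initialization code knows which
    // concrete transport is in play; the worker just sees the abstract
    // CFanoutClientTransport sketched above.
    std::unique_ptr<CFanoutClientTransport>
    makeWorkerInput(bool distributed, uint64_t workerId) {
        if (distributed) {
            // Hypothetical network transport reaching across hosts:
            return std::make_unique<CZMQFanoutClientTransport>(
                "tcp://somehost:5672", workerId);
        } else {
            // Hypothetical in-process transport between threads:
            return std::make_unique<CInProcFanoutClientTransport>(workerId);
        }
    }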