C++ Parallel Processing in Computational Finance, Part I: An Overview of OpenMP
www.openmp.org
This blog is the first in a series of three on
the application of parallel programming techniques to Computational Finance,
with special attention paid to Monte Carlo and PDE techniques. The
current blog is an overview of OpenMP at 30,000 feet.
OpenMP is an API specification consisting of a
collection of compiler directives (called pragmas),
library functions and environment variables that developers can use to specify
shared-memory parallelism in C, C++ and Fortran. By shared memory we mean that all threads
share a single address space and communicate with each other by writing
and reading shared variables. The OpenMP Application Programming Interface
(API) is portable between various shared-memory architectures. It has been designed
to support programs that run in both parallel and sequential
modes. It is especially useful for
large array-based applications. The main features are:
- Support for parallelization of loops and iterations
- Work-sharing constructs and the construction of parallel regions
- Synchronisation constructs
- Sharing and privatization of data
We shall discuss how to realize the above features in
OpenMP in the next blog.
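To give a flavour of the three ingredients mentioned above (a directive, a library function and an environment variable), here is a minimal sketch. The function names `availableThreads` and `axpy` are our own illustrative choices and are not part of OpenMP. The `_OPENMP` guard is there so that the same source also compiles without OpenMP support, in which case the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>   // OpenMP runtime library functions
#endif

// Hypothetical helper: the maximum number of threads OpenMP would use
// for a parallel region, or 1 when compiled without OpenMP.
int availableThreads()
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Hypothetical helper: y = a*x + y, element by element.
// Each iteration touches only its own element, so the iterations
// are independent and can be shared among threads.
void axpy(double a, const std::vector<double>& x, std::vector<double>& y)
{
    assert(x.size() == y.size());

    // A signed loop index keeps the loop acceptable to older OpenMP versions.
    const long n = static_cast<long>(x.size());

    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
    {
        y[i] += a * x[i];
    }
}
```

When the code is compiled with OpenMP enabled (for example with the `-fopenmp` flag in GCC), the number of threads can be set from outside the program using the environment variable `OMP_NUM_THREADS`, for example `OMP_NUM_THREADS=4 ./a.out` (the executable name is an assumption).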
Many engineering, scientific and computational finance
applications are expressed in the form of iterative constructs. In these cases
the code consists of various kinds of loops. In general, loops are needed when
we navigate in (hierarchical) data structures such as vectors,
matrices, trees and graphs. Improving the performance of applications that rely
heavily on loop constructs is a high priority. Given that many developers will
choose to port their serial programs to equivalent parallel ones using
OpenMP, we now discuss the forces to be reckoned with when we port loop-based
code written for serial machines to parallelized loop code. These forces
are related to the correctness and efficiency of the
parallel code:
- Sequential equivalence: a given program should produce the same results when
executed with one thread or with more than one thread. To achieve this end we
apply a series of transformations to the loops. These are called semantically
neutral transformations if they leave the semantics of the program
unchanged. A point to note is that due to round-off errors the results from a
serial loop may be different from those in the equivalent parallel loop. For example,
adding floating-point numbers in a loop can give different answers depending on the order in
which they are added, for example in the order in which they are stored or in
ascending order. This may be unacceptable and we should then decide not to
parallelise the loop.
- Incremental parallelism: we parallelise a program step by step, for example
by examining one loop at a time. We apply a sequence of incremental
transformations and we test the sequential equivalence after each increment.
In this way we can be assured of correctness on the one hand and we can measure
the performance improvement on the other hand.
- Memory utilization: in order to achieve good performance it is important
that the way data is accessed (for example, using indexing operators or by
dereferencing) is consistent with how the data is laid out in memory and with
the memory model of the underlying system.
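The round-off issue mentioned under sequential equivalence can be illustrated without running any threads at all. The sketch below compares a plain serial sum with a sum formed from partial sums over contiguous chunks, which is one way a parallel reduction might combine its results. The actual partitioning used by an OpenMP implementation is not specified here, and the function names `serialSum` and `chunkedSum` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: add the elements in the order in which they are stored.
double serialSum(const std::vector<double>& v)
{
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        total += v[i];
    }
    return total;
}

// Hypothetical helper: mimic a reduction by summing 'chunks' contiguous
// blocks separately and then adding the partial sums together.
double chunkedSum(const std::vector<double>& v, std::size_t chunks)
{
    assert(chunks > 0);

    const std::size_t n = v.size();
    double total = 0.0;
    for (std::size_t c = 0; c < chunks; ++c)
    {
        // Block c covers the index range [begin, end).
        const std::size_t begin = c * n / chunks;
        const std::size_t end = (c + 1) * n / chunks;

        double partial = 0.0;
        for (std::size_t i = begin; i < end; ++i)
        {
            partial += v[i];
        }
        total += partial;
    }
    return total;
}
```

Assuming IEEE double precision arithmetic with no extended intermediate precision, the input `{1e17, 1.0, -1e17, 1.0}` gives 1 for `serialSum` and 0 for `chunkedSum` with two chunks, while the exact answer is 2. One option when testing sequential equivalence after each increment is therefore to compare results up to a tolerance rather than for bit-wise equality.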
If you have any comments or questions, I’d be happy to answer them.
Daniel























