IFS documentation front page
Chapter 1. Technical overview | Chapter 2. FULL-POS post-processing and interpolation | Chapter 3. Parallel implementation | References
3.1 Introduction

In order to achieve efficient execution on computers with multiple processors and distributed memory, a message-passing programming model has been adopted. This enforces a local view of data for each processor and requires that data belonging to other processors be copied into local memory before it can be read. This is accomplished using messages which are communicated by means of message-passing library routines such as MPI.

Given a strong desire to protect the scientific code from details of the parallel implementation, a transposition strategy is used to handle the distributed memory code. With this approach, the complete data required are redistributed at various stages of a time step so that the arithmetic computations between two consecutive transpositions can be performed without any interprocess communication. Such an approach is feasible because data dependencies in the spectral transform method exist in only one coordinate direction, and this direction is different for each algorithmic component.

An overwhelming practical advantage of this technique is that the message-passing logic is localized in a few routines. It also turns out to be a very efficient method. The transpositions are executed prior to the appropriate algorithmic stage, so that the computational routines (which constitute the vast bulk of the IFS source code) need have no knowledge of this activity.

In each of the algorithmic stages the source code is similar on each processor but the data are distributed among the processors. The distribution of data among the processors is determined during the setup phase of the IFS and involves, in some cases, quite complex calculations to achieve load balancing of the work among the processors. Care has been taken to be able to re-use the algorithms from the original serial code so that reproducible results are retained.
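The transposition idea can be illustrated with a toy sketch. It is not IFS code and uses no real message passing: each "processor" is a dictionary entry, and the function names (`distribute_rows`, `transpose`, `column_sums`), the 4 x 6 field, and the use of two processors are all assumptions made purely for illustration. The sketch starts from a distribution in which each processor holds complete rows, redistributes the data so that each processor holds complete columns, and then performs a computation along columns that needs no further communication.

```python
# Toy simulation of a transposition between two data distributions.
# Hypothetical names throughout; this is not how IFS implements it.

def distribute_rows(field, nproc):
    """Give each processor a contiguous block of complete rows."""
    nrows = len(field)
    chunks = {}
    for p in range(nproc):
        lo = p * nrows // nproc
        hi = (p + 1) * nrows // nproc
        chunks[p] = {r: list(field[r]) for r in range(lo, hi)}
    return chunks

def transpose(row_chunks, ncols, nproc):
    """Redistribute so that each processor owns complete columns.

    Each (sender, receiver) pair stands in for one message in a real
    message-passing implementation.
    """
    col_chunks = {p: {} for p in range(nproc)}
    for sender, rows in row_chunks.items():
        for receiver in range(nproc):
            lo = receiver * ncols // nproc
            hi = (receiver + 1) * ncols // nproc
            for r, values in rows.items():
                for c in range(lo, hi):
                    col_chunks[receiver].setdefault(c, {})[r] = values[c]
    return col_chunks

def column_sums(col_chunks):
    """Computation along columns; every column is local to one processor."""
    sums = {}
    for owned in col_chunks.values():
        for c, column in owned.items():
            sums[c] = sum(column.values())
    return sums

nproc = 2
field = [[10 * r + c for c in range(6)] for r in range(4)]  # 4 rows, 6 columns

rows = distribute_rows(field, nproc)   # stage 1: complete rows are local
cols = transpose(rows, 6, nproc)       # transposition (the only "communication")
sums = column_sums(cols)               # stage 2: complete columns are local

print(sums)  # {0: 60, 1: 64, 2: 68, 3: 72, 4: 76, 5: 80}
```

In this sketch the only place where data cross processor boundaries is `transpose`; `column_sums` plays the role of a computational routine that needs no knowledge of how the data arrived.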
As an example of this re-use of the serial algorithms, it has been possible to keep much of the physics code untouched by allowing for either a local or a global view of the data, as appropriate. The call to the physics routines simply handles a set (vector) of grid-point columns, which can in principle come from anywhere on the globe. The setup phase ensures that the proper Coriolis parameters, orography, etc. are associated with each grid-point vector. The definitions of many of the variables used to describe the local data distributions are given in Appendix A.

For the purpose of further description of the parallel strategy, the algorithmic steps are referred to in the following order: