3.7 Semi-Lagrangian calculation

3.7.1 Introduction

The semi-Lagrangian calculation in the IFS consists of two parts. The first is the computation of a trajectory backwards in time from each grid point to determine its departure point. The second is the interpolation of various quantities at the departure point and at the midpoint of the trajectory.

On distributed-memory computer systems, the difficulty is that both parts require access to grid-point data held on neighbouring processors. On shared-memory systems, processors can simply read these data from the shared memory; on distributed-memory systems, message passing is required to obtain them.

The grid-column data that a processor could potentially require from its neighbours is called the halo. In the present implementation, the halo is determined from two quantities: a conservative estimate of the global maximum wind speed likely to be encountered, VMAX2 (m s⁻¹), and the model time step, TSTEP (s). Typical values of VMAX2 are 150-200 m s⁻¹.

The advantage of a fixed (large) VMAX2 is that the semi-Lagrangian communication tables can be computed once and for all during the setup phase, after the distribution of grid columns to processors has been defined. This fixed pattern then allows efficient block transfers of halo data between processors, and debugging is easier than with a dynamic scheme. The disadvantage is that much of the halo data communicated may not actually be needed. The wind speed may be much lower than VMAX2, or the wind may blow from the core region towards the edges of the processor's domain, so that the departure point lies within the processor's own core region. The semi-Lagrangian communication tables are calculated by the subroutine SLRSET, which is called from SLCSET within SUSC2.
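The link between VMAX2, TSTEP and the halo size can be illustrated with a short calculation. The sketch below is a minimal illustration, not the SUSC2/SLCSET code. It rests on several assumptions:

* A two-time-level scheme, so the largest possible displacement is VMAX2*TSTEP.
* A spherical Earth of radius 6371 km.
* Latitudes roughly equally spaced between the poles.
* A fixed, assumed number of extra rows for the interpolation stencil.

The function name `halo_latitudes` and its parameters are hypothetical.

```python
import math

# Assumed mean Earth radius (m); the model's own constant may differ.
EARTH_RADIUS_M = 6.371e6


def halo_latitudes(vmax2: float, tstep: float, nlat: int, stencil: int = 2) -> int:
    """Hypothetical estimate of the SL halo width, in latitude rows.

    vmax2   -- assumed global maximum wind speed (m/s), in the role of VMAX2
    tstep   -- model time step (s), in the role of TSTEP
    nlat    -- number of latitudes from pole to pole (assumed roughly equally spaced)
    stencil -- extra rows for the interpolation stencil (an assumed value)
    """
    # Two-time-level assumption: the back trajectory spans one time step.
    max_displacement = vmax2 * tstep
    # Approximate distance between neighbouring latitude rows.
    row_spacing = math.pi * EARTH_RADIUS_M / nlat
    # Round up so that the worst-case departure point is always covered.
    return math.ceil(max_displacement / row_spacing) + stencil


print(halo_latitudes(200.0, 1200.0, 320))  # 6
```

Here is a worked case with VMAX2 = 200 m s⁻¹, TSTEP = 1200 s and 320 latitudes. A departure point can then lie up to 240 km away, which is about 3.8 row spacings. That rounds up to 4 rows, plus the 2 assumed stencil rows. Because VMAX2 is a fixed upper bound, this width is valid for every time step, which is what allows the communication tables to be built once at setup.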
When LMESSP is FALSE, grid-point space is distributed to processors in partitions of contiguous equally spaced latitudes. The halo can therefore be computed (in SUSC2) simply as a number of latitudes (NSLWIDE) north and/or south of each processor's partition.

When LMESSP is TRUE, the distribution of grid-point space is more general and supports both north-south and east-west partitioning. In general, a processor can hold any contiguous block of grid columns on the sphere (see Figs. A.6 and A.7), so a processor's halo cannot be expressed by NSLWIDE alone. In this case SLCSET is called on each processor to calculate the halo of grid-point columns that the processor itself requires. The calculation is based on VMAX2 and TSTEP and on the additional stencil requirements of the semi-Lagrangian interpolation method. SLRSET is then called to exchange this halo information with the other processors. Each processor then knows which data must be sent and received, and which processors it must communicate with. All of this is done only once, at initialization.

During each model time step, the SL halo is constructed by calling two routines. SLCOMM performs the message passing. SLEXTPOL initializes those halo grid points that only need to be copied from other grid points held on the local processor. To simplify the interpolation routines, halo points are cyclically mirrored in the east-west direction for complete latitudes, and mirror-extended near the poles.

3.7.2 SLCSET

SLCSET is called mainly to determine the SL halo for the local processor. This halo is described by the array variables NSLSTA, NSLONL and NSLOFF, which are dimensioned by the number of latitudes covering the halo and core region (see Fig. A.7) and are, briefly,
The semi-Lagrangian buffer SLBUF1 contains the variables needed for semi-Lagrangian interpolation.

Its first dimension is a one-dimensional, collapsed horizontal index over the latitude fractions of every latitude in this processor's core plus halo region, stored from north to south. The size of this dimension is calculated in SLCSET and held in NASLB1. To improve vector efficiency and cache performance, this collapsed horizontal dimension is the innermost loop over the semi-Lagrangian buffer. NASLB1 is only the container size, and SLCSET may increase it slightly to avoid bank conflicts on vector machines.

The second dimension indexes the fields in the buffer and varies with the chosen semi-Lagrangian configuration. This strategy makes it simple to add new fields to the semi-Lagrangian buffer: no changes to the message-passing routines are needed.

The calculation of the halo is done as follows: For each latitude,
To aid debugging, space is also reserved on each latitude for NSLPAD grid points east and west of the halo. These points are initialized to huge(), so any attempt to use them in an interpolation routine will cause an immediate floating-point exception. This simplifies the detection of programming errors in SL interpolation routines. NSLPAD is 0 by default.

Once the halo has been determined, SLCSET initializes NSLCORE to hold the position of each core point in the SL buffer. This data structure is used in the semi-Lagrangian calculations at every time step.

Finally, NSLEXT is initialized. It simplifies an SL buffer (lat, lon) offset calculation in LASCAW by replacing an IF test (to account for the phase change over the poles) and a modulo function with a simple array access. As a result, LASCAW is both more efficient and easier to maintain.

3.7.3 SLRSET

SLRSET is called by SLCSET at initialization time to determine the detailed send- and receive-list information that SLCOMM later uses during model execution. This is achieved by a global communication in which send and receive lists are exchanged in terms of global (LAT, LON) coordinates. The data structures initialized by SLRSET are as follows (see Fig. A.7):
3.7.4 SLCOMM

SLCOMM is called at each model time step to obtain grid-point halo data from neighbouring processors. The data volume for these communications can be very large, so a strategy is used to limit the memory needed for the message-passing mailbox. The data to be sent are split into blocks, and the number of such blocks that can be queued in a processor's mailbox is controlled.

This control relies on the fact that processor pairs involved in SL communication send and receive similar amounts of data. The next block of data is sent to a processor only when a corresponding block has been received from that processor. With this approach we avoid waiting for messages from processors that are still computing. Instead, a probe function is used to detect whether a message has arrived before a receive is issued for it. In this protocol, up to two blocks from any one processor can be queued at a destination processor.

At initialization time (SLRSET) we compute NSLPROCSMX, the maximum number of processors that any processor has to communicate with; it is simply the maximum of NSLPROCS. Let NCOMBFLEN be the maximum number of words that we want to use in any processor's mailbox. Given NCOMBFLEN and NSLPROCSMX, the maximum block size is

NSLMPBUFSZ = NCOMBFLEN / (3*NSLPROCSMX - 1).

The term 2*NSLPROCSMX corresponds to the maximum number of blocks that can be queued in a processor's mailbox for SL communication. The extra NSLPROCSMX - 1 blocks are needed to take mailbox fragmentation into account.
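Both the block-size formula and the "two queued blocks" property of the handshake can be checked with a small model. The sketch below is illustrative and is not the SLCOMM code.

* `nslmpbufsz` evaluates NSLMPBUFSZ = NCOMBFLEN / (3*NSLPROCSMX - 1) with integer division.
* `max_queued_blocks` simulates a single processor pair under the rule described for SLCOMM. The first block is sent unconditionally, and each later block is sent only after the corresponding block has arrived from the partner.
* The model assumes both partners send the same number of blocks. It interleaves actions at random to mimic processors running at different speeds.
* Both function names are hypothetical.

```python
import random
from collections import deque


def nslmpbufsz(ncombflen: int, nslprocs: list[int]) -> int:
    """Block size in words: NCOMBFLEN / (3*NSLPROCSMX - 1).

    2*NSLPROCSMX blocks may be queued in a mailbox; NSLPROCSMX - 1 further
    blocks' worth of space is reserved for mailbox fragmentation.
    """
    nslprocsmx = max(nslprocs)  # NSLPROCSMX is the maximum of NSLPROCS
    return ncombflen // (3 * nslprocsmx - 1)


def max_queued_blocks(nblocks: int, seed: int = 0) -> int:
    """Simulate the block handshake between one processor pair (hypothetical model).

    Returns the largest number of unreceived blocks ever waiting in either
    processor's mailbox during the exchange.
    """
    rng = random.Random(seed)
    sent = [0, 0]
    received = [0, 0]
    mailbox = [deque(), deque()]  # mailbox[p]: blocks waiting at processor p
    worst = 0
    while min(received) < nblocks:
        actions = []
        for p in (0, 1):
            # First block goes out unconditionally; block k+1 only after
            # k blocks have been received from the partner.
            if sent[p] < nblocks and (sent[p] == 0 or received[p] >= sent[p]):
                actions.append(("send", p))
            # A probe would report a waiting message, so a receive is safe.
            if mailbox[p]:
                actions.append(("recv", p))
        kind, p = rng.choice(actions)
        if kind == "send":
            sent[p] += 1
            mailbox[1 - p].append(sent[p])
        else:
            mailbox[p].popleft()
            received[p] += 1
        worst = max(worst, len(mailbox[0]), len(mailbox[1]))
    return worst


print(nslmpbufsz(1_000_000, [3, 5, 4]))  # 71428
```

The bound of two follows from the sending rule:

* A processor never has sent more than one block beyond what it has received, so sent[p] ≤ received[p] + 1.
* The same holds for its partner, so the backlog at the partner, sent[p] - received[1 - p], cannot exceed 2.

Across all partners this gives the 2*NSLPROCSMX term. For example, with NCOMBFLEN = 1,000,000 words and NSLPROCS values of 3, 5 and 4, NSLPROCSMX = 5 and each block holds 1,000,000 // 14 = 71,428 words.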