   








 


3.7 Semi-Lagrangian calculation




3.7.1 Introduction




The semi-Lagrangian calculation in the IFS consists of two parts: the computation of a trajectory from a grid point backwards in time to determine the departure point, and the interpolation of various quantities at the departure point and at the midpoint of the trajectory.


The issue for distributed-memory computer systems is that both parts require access to grid-point data held on neighbouring processors. On shared-memory systems, processors can simply access the data in the shared memory; on distributed-memory systems, message passing is required to obtain these data.


The grid-column data that a processor could potentially require from neighbouring processors (called the halo) is, in the present implementation, calculated from a conservative estimate of the global maximum wind speed likely to be encountered, VMAX2 (m/s), and the model time step TSTEP (s). Typical values for VMAX2 are 150-200 m/s. The advantage of using a fixed (large) VMAX2 is that the semi-Lagrangian communication tables can be calculated once and for all during the setup phase, after the distribution of grid columns to processors has been defined. This fixed pattern then allows efficient block transfers of halo data between processors. Debugging is also easier than in a dynamic scheme. The disadvantage is that a large amount of the halo data communicated may not actually be required: the wind speed may be much lower than VMAX2, or the wind may blow from the core region towards the edges of the processor's domain, in which case the departure point lies within the processor's own core region. The semi-Lagrangian communication tables are calculated by the subroutine SLRSET, called from SLCSET within SUSC2.
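The relation between VMAX2, TSTEP and the halo width can be illustrated with a minimal sketch. It is not the IFS code: the Earth radius value, the assumption of equally spaced latitude rows, the `stencil_rows` parameter and both function names are hypothetical choices made for illustration only.

```python
import math

EARTH_RADIUS_M = 6.371e6  # assumed mean Earth radius; not stated in the text


def max_travel_angle(vmax2, tstep, radius=EARTH_RADIUS_M):
    """Largest angle (radians) a particle can cover on the sphere in one step."""
    return vmax2 * tstep / radius


def halo_latitudes(vmax2, tstep, nlat, stencil_rows=1):
    """Latitude rows needed north or south of a partition.

    Assumes nlat equally spaced rows from pole to pole and adds
    stencil_rows extra rows for the interpolation stencil (an assumption).
    """
    dlat = math.pi / nlat
    return math.ceil(max_travel_angle(vmax2, tstep) / dlat) + stencil_rows


# 150 m/s over a 900 s step is 135 km, i.e. just over two rows of a
# 320-row grid, so three rows plus one stencil row are reserved.
print(halo_latitudes(150.0, 900.0, 320))  # prints 4
```

Because the result depends only on fixed parameters, it can be evaluated once during setup, which is the property the fixed-VMAX2 approach relies on.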


When LMESSP is FALSE, the distribution of grid-point space to processors is done in partitions of contiguous equally spaced latitudes, which allows the halo to be computed (in SUSC2) simply as a number of latitudes (NSLWIDE) north and/or south of each processor partition.


However, when LMESSP is TRUE, the distribution of grid-point space is more general, supporting both north-south and east-west partitioning; in general a processor can have any continuous block of grid columns on the sphere (see Figs. A.6 and A.7). As a result, a processor's halo cannot be expressed just by NSLWIDE. In this case SLCSET is called on each processor to calculate the halo of grid-point columns required by itself, based on (VMAX2, TSTEP) and on the additional stencil requirements of the semi-Lagrangian interpolation method. Once this is done, SLRSET is called to exchange this halo information with other processors so that each processor knows what data need to be sent and received and which processors to communicate with. All this is done only once at initialization time.


Then, during each model time step, the SL halo is constructed by calling SLCOMM to perform the message passing and SLEXTPOL to initialize those halo grid points that only need to be copied from other grid points held on the local processor. To simplify the interpolation routines, halo points are cyclically mirrored for complete latitudes in the east-west direction, and mirror-extended near the poles.
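The east-west cyclic extension of a complete latitude can be sketched as follows. This is only an illustration of the idea of filling halo points by local copies; the function name `extend_cyclic` is hypothetical, and the polar mirror extension performed by SLEXTPOL is not covered here.

```python
def extend_cyclic(row, nhalo):
    """Return a full latitude row with nhalo wrapped points on each side.

    The western halo is a copy of the easternmost points and the eastern
    halo is a copy of the westernmost points, so no message passing is
    needed to fill them.
    """
    n = len(row)
    if not 0 <= nhalo <= n:
        raise ValueError("nhalo must be between 0 and len(row)")
    west = row[n - nhalo:]  # empty when nhalo == 0
    east = row[:nhalo]
    return west + row + east


print(extend_cyclic([10, 20, 30, 40], 2))
# prints [30, 40, 10, 20, 30, 40, 10, 20]
```

With the row extended in this way, an interpolation stencil centred near the first or last longitude can read its neighbours without any wrap-around test.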


3.7.2 SLCSET




SLCSET is called mainly to determine the SL halo for the local processor. This halo is described by the array variables NSLSTA, NSLONL and NSLOFF, which are dimensioned by the number of latitudes that cover the halo and core region (see Fig. A.7) and are, briefly:
  NSLSTA(JN): starting (most westerly) grid point (relative to Greenwich) for the halo on relative latitude JN (negative if the area starts west of Greenwich)
  NSLONL(JN): number of halo and core (i.e. belonging to this processor) grid points on relative latitude JN
  NSLOFF(JN): offset from the beginning of the SL buffer to the first halo grid point on relative latitude JN
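How the three arrays could work together is sketched below. The sketch assumes that the latitude rows are packed back to back in the buffer, so that NSLOFF is a running sum of NSLONL, and it uses 0-based indices; both are assumptions for illustration, and the helper names `build_offsets` and `buffer_index` are hypothetical.

```python
def build_offsets(nslonl):
    """Return (nsloff, total): the start offset of each row and the buffer size.

    Assumes rows are stored contiguously, north to south, with no padding.
    """
    nsloff = []
    pos = 0
    for npoints in nslonl:
        nsloff.append(pos)
        pos += npoints
    return nsloff, pos


def buffer_index(jn, jlon, nslsta, nsloff):
    """Position in the flat buffer of grid point jlon on relative latitude jn."""
    return nsloff[jn] + (jlon - nslsta[jn])


nslsta = [-2, 0, 3]  # first row starts two points west of Greenwich
nslonl = [8, 6, 5]
nsloff, total = build_offsets(nslonl)
print(nsloff, total)                        # prints [0, 8, 14] 19
print(buffer_index(2, 7, nslsta, nsloff))   # prints 18
```

A negative NSLSTA needs no special handling in this layout, since only the difference between the longitude index and the row start enters the calculation.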


The semi-Lagrangian buffer SLBUF1 contains the variables needed for semi-Lagrangian interpolation. Its first dimension is a collapsed 1-dimensional 'horizontal' structure holding the latitude fractions for each latitude within this processor's core plus halo region, stored from north to south. Its total size, NASLB1, is calculated in SLCSET. To improve vector efficiency and cache performance, this collapsed horizontal dimension is the innermost dimension of the semi-Lagrangian buffer. NASLB1 is just the container size, and it may be increased slightly in SLCSET to avoid bank conflicts on vector machines. The second dimension represents the fields in the semi-Lagrangian buffer and varies according to the chosen semi-Lagrangian configuration. This strategy makes it simple to add new fields to the semi-Lagrangian buffer: no changes in the message-passing routines are needed.


The calculation of the halo is done as follows:


For each latitude,
1)   the minimum (i.e. most westerly) and maximum (i.e. most easterly) angles on the sphere are determined for the local processor's core region by considering NSLWIDE latitudes to the north and south.
2)   the angular distance a particle can travel on the sphere (given the maximum wind speed VMAX2 and time step TSTEP) is then subtracted from the above minimum angle and added to the above maximum angle.
3)   the angular distances are converted to grid points and at the same time a further grid point is added to satisfy the requirements of the interpolation method used. For more complex interpolation methods more points are required.
4)   NSLSTA, NSLONL and NSLOFF are then updated for this latitude, such that the number of grid points required for the halo and core region is never greater than the number of grid points on the whole latitude plus the extra points (IPERIOD) required for the interpolation. In addition, the NSLWIDE latitudes at the north and south poles are forced to require full latitudes to simplify the design.


To aid debugging, space is also reserved on each latitude for NSLPAD grid points east and west of the halo. As these points are initialized to huge(), any attempt to use these data in an interpolation routine results in an immediate floating-point exception, which simplifies the detection of programming errors in the SL interpolation routines. NSLPAD is 0 by default.


Once the halo has been determined, SLCSET initializes NSLCORE to contain the position of each core point in the SL buffer. This data structure is used during the semi-Lagrangian calculations at every time step.


Finally, NSLEXT is initialized. It is used to simplify an SL buffer (lat, lon) offset calculation in LASCAW, replacing an IF test (to account for the phase change over the poles) and a modulo function with a simple array access. As a result LASCAW becomes more efficient and more maintainable.
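The general idea of trading a conditional and a modulo for a precomputed table can be sketched as follows. The actual layout of NSLEXT is not described here, so everything below is an assumption: the table shape, the treatment of the polar phase change as a half-row shift, and the names `build_ext_table` and `lookup` are all hypothetical.

```python
def build_ext_table(nlon, nhalo, across_pole=False):
    """Precompute wrapped longitude indices for j in [-nhalo, nlon + nhalo).

    For a row reached by crossing a pole, the index is shifted by half a
    row (assumed here to represent the phase change over the pole).
    """
    shift = nlon // 2 if across_pole else 0
    return [(j + shift) % nlon for j in range(-nhalo, nlon + nhalo)]


def lookup(table, nhalo, j):
    """Replace the IF test and modulo by one array access."""
    return table[j + nhalo]


normal = build_ext_table(8, 2)
polar = build_ext_table(8, 2, across_pole=True)
print(lookup(normal, 2, -1))  # prints 7
print(lookup(normal, 2, 8))   # prints 0
print(lookup(polar, 2, 5))    # prints 1
```

The tables are built once at setup, so the inner interpolation loop contains only the array access.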


3.7.3 SLRSET




SLRSET is called by SLCSET at initialization time to determine the detailed send- and receive-list information that will be used later by SLCOMM during model execution. This is achieved by a global communication where send and receive lists are exchanged in terms of global (LAT, LON) coordinates.


The data structures initialized by SLRSET are as follows (see Fig. A.7):
  NSLPROCS is a scalar which defines the number of processors that the local processor has to communicate with during SL halo communication.
  Array NSLCOMM contains the list of processors that the local processor has to communicate with, and is dimensioned 1:NSLPROCS.
  Arrays NSENDNUM, NRECVNUM describe the number of send and receive (lat, lon) pairs that the local processor has to communicate. The difference between elements (N) and (N+1) gives the number of entries that apply to processor N.
  Arrays NRLSTLAT, NRLSTLON describe the global latitude and longitude of the grid-point columns to be received during SL halo communication. Columns to be received from processor N start at entry NRECVNUM(N) in these arrays.
  Arrays NSLSTLAT, NSLSTLON describe the global latitude and longitude of the grid-point columns to be sent during SL halo communication. Columns to be sent to processor N start at entry NSENDNUM(N) in these arrays.
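The offset-array layout described above can be read as in the following sketch, which extracts the (lat, lon) pairs belonging to one communication partner. It uses 0-based indices and made-up example values, and the helper name `entries_for` is hypothetical.

```python
def entries_for(n, offsets, lat_list, lon_list):
    """Return the (lat, lon) pairs exchanged with the n-th partner.

    offsets plays the role of NSENDNUM or NRECVNUM: entries for partner n
    occupy positions offsets[n] up to, but not including, offsets[n + 1].
    """
    start, end = offsets[n], offsets[n + 1]
    return list(zip(lat_list[start:end], lon_list[start:end]))


# Illustrative receive lists for two partners (values are invented).
nrecvnum = [0, 3, 5]
nrlstlat = [10, 10, 11, 40, 41]
nrlstlon = [5, 6, 5, 100, 100]

print(entries_for(0, nrecvnum, nrlstlat, nrlstlon))
# prints [(10, 5), (10, 6), (11, 5)]
print(nrecvnum[2] - nrecvnum[1])  # prints 2, the count for the second partner
```

Storing start offsets rather than counts lets each partner's columns be located directly, while the count remains available as a difference of neighbouring entries.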


3.7.4 SLCOMM




SLCOMM is called at each model time step to obtain grid-point halo data from neighbouring processors. As the data volume for these communications can be very large, a strategy is used to control the amount of memory needed for the message-passing mailbox. This is done by blocking the data to be sent and controlling how many such blocks can be queued in a processor's mailbox. Control is achieved by recognising that processor pairs involved in SL communication send and receive similar amounts of data, and by only sending the next block of data to a processor when a corresponding block has been received from that processor. With this approach we avoid waiting for messages from processors that are still computing, and we use a probe function to detect whether a message has arrived before issuing a receive for it. In this protocol, it is possible for 2 blocks to be queued at a destination processor from another processor.


At initialization time (SLRSET) we compute the maximum number of processors that any processor has to communicate with (NSLPROCSMX), which is simply the maximum of NSLPROCS over all processors. Given NSLPROCSMX and NCOMBFLEN, the maximum number of words that we want to use in any processor's mailbox, we can compute the maximum block size as NSLMPBUFSZ = NCOMBFLEN/(3*NSLPROCSMX-1). The denominator is the sum of two terms: 2*NSLPROCSMX corresponds to the maximum number of blocks that can be queued at a processor's mailbox for SL communication, and the extra NSLPROCSMX-1 is needed to take into account mailbox fragmentation.
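The block-size formula can be checked with a short worked example. The numerical values are invented for illustration, integer (truncating) division is assumed, and the function names `sl_block_size` and `blocks_needed` are hypothetical.

```python
def sl_block_size(ncombflen, nslprocsmx):
    """Maximum block size: NCOMBFLEN / (3*NSLPROCSMX - 1), truncated.

    2*NSLPROCSMX accounts for the queued blocks and NSLPROCSMX - 1 for
    mailbox fragmentation, giving 3*NSLPROCSMX - 1 in total.
    """
    if nslprocsmx < 1:
        raise ValueError("nslprocsmx must be at least 1")
    return ncombflen // (3 * nslprocsmx - 1)


def blocks_needed(nwords, block_size):
    """Number of blocks required to send nwords words."""
    return -(-nwords // block_size)  # ceiling division


size = sl_block_size(1_800_000, 8)
print(size)                          # prints 78260
print(blocks_needed(200_000, size))  # prints 3
```

With these example values the denominator is 23, so a mailbox of 1 800 000 words yields blocks of 78 260 words, and a 200 000-word halo message is sent in three blocks.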





 

23.04.2002   © ECMWF