2. The observation screening
The ECMWF 3D/4D-Var data assimilation system makes use
of an incremental minimization scheme to reduce the computational cost.
The variational data assimilation starts with the first (high resolution)
trajectory run. During this run the model counterparts for all the observations
are calculated through the non-linear observation operators. As soon as
these background departures are available for observations, the screening
can be performed. Options for 3D- and 4D-screening are available. The 3D-screening
time window extends over the whole assimilation time window (currently six
hours), whereas in 4D-screening the assimilation time window is partitioned
into one-hour time slots in which the screening decisions are taken independently
of the other time slots.
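The partitioning into time slots can be illustrated with a minimal sketch. The function name `time_slot`, the half-open treatment of slot boundaries and the example dates are assumptions made for illustration only; they are not taken from the IFS code.

```python
from datetime import datetime, timedelta

WINDOW_HOURS = 6  # assimilation window length quoted in the text
SLOT_HOURS = 1    # 4D-screening time-slot length quoted in the text


def time_slot(obs_time, window_start, mode="4D"):
    """Return the index of the screening time slot an observation falls in.

    In "3D" mode the whole window is a single slot (index 0); in "4D" mode
    the window is split into one-hour slots. Observations outside the
    window return None. Slot boundaries are treated as half-open intervals
    [start, end), which is an assumption of this sketch.
    """
    offset = obs_time - window_start
    if offset < timedelta(0) or offset >= timedelta(hours=WINDOW_HOURS):
        return None
    if mode == "3D":
        return 0
    return int(offset // timedelta(hours=SLOT_HOURS))
```

With a window starting at 09 UTC, an observation at 11:30 UTC falls in slot 2 under 4D-screening and in slot 0 under 3D-screening, so two reports from the same station one hour apart compete with each other only in the 3D case.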
2.1 Screening of conventional observations
2.1 (a) Preliminary checks of observations
The observation screening begins with a preliminary check
of the completeness of the reports. For instance, the observation and background
errors should not be missing, as otherwise the background quality control
cannot be performed. Also the reporting practice for synop and temp mass
observations (surface pressure and geopotential height) is checked.
Next the observations are scanned through for blacklisting.
The blacklist consists formally of two parts. First, the selection of variables
for assimilation is done using the data selection part of the blacklist
file. This controls which observation types, variables, vertical ranges
etc. will be selected for the assimilation. Some more complicated decisions
are also performed through the data selection file. For instance, an orographic
rejection limit is applied in the case of the observation being too deep
inside the model orography. This part of the blacklist also provides a handy
tool for experimentation. Second, a monthly monitoring blacklist is applied
for discarding the stations that have recently been reporting in an excessively
noisy or biased manner as compared with the ECMWF background field.
2.1 (b) Background quality control
The background quality control is performed for all the
variables that are intended to be used in the assimilation. The procedure
is as follows. The variance of the background departure $y - H(x_b)$ can be estimated as a sum of observation and background
error variances, $\sigma_o^2 + \sigma_b^2$, assuming that the observation and
the background errors are uncorrelated. After normalizing with $\sigma_b$, the estimate of variance for the normalized departure
is given by $1 + \sigma_o^2/\sigma_b^2$. In the background quality control, the square of the normalized background
departure is considered suspect when it exceeds its expected variance
by more than a predefined multiple. For the wind observations, the background
quality control is performed simultaneously for both wind components. There
is also a background quality control for the observed wind direction. For
the scatt winds, a test for high wind speeds and cold SST (possible sea-ice)
is applied. An example of the background quality control rejections is given
in Fig. 1. It shows that
the background quality control effectively cuts off the tails of observation
minus background departure distribution.
Figure 1. An example of a histogram
of background departures for airep temperature observations. Variational
and background quality control rejections are denoted by filled and outlined
columns, respectively.
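The background quality control test of section 2.1 (b) can be written as a short sketch. The function name and the default value of `multiple` are hypothetical; the text only says that a predefined multiple is used and does not give its value.

```python
def background_qc_suspect(departure, sigma_o, sigma_b, multiple=9.0):
    """Return True when a background departure is considered suspect.

    departure : observation minus its model counterpart
    sigma_o   : observation error standard deviation
    sigma_b   : background error standard deviation
    multiple  : predefined multiple of the expected variance
                (the value 9.0 is a placeholder, not an operational limit)
    """
    if sigma_o is None or sigma_b is None or sigma_b <= 0.0:
        # The preliminary checks require both errors to be present.
        raise ValueError("observation and background errors must be available")
    normalized_sq = (departure / sigma_b) ** 2
    expected_variance = 1.0 + (sigma_o / sigma_b) ** 2
    return normalized_sq > multiple * expected_variance
```

Because the limit scales with the expected variance, an observation type with a large observation error relative to the background error tolerates a larger normalized departure before being flagged.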
2.1 (c) Vertical consistency of multi-level reports
The multi-level reports are checked for vertical consistency,
and duplicated levels are removed from the reports. The vertical consistency
check of multi-level reports is applied in such a way that if four consecutive
layers are found to be of suspicious quality, then these layers are rejected,
and in the case of geopotential observations also all the layers above these
four are rejected.
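The rule above can be sketched as follows. The function name, the bottom-to-top ordering of the input and the treatment of runs longer than four layers (all of them are rejected) are assumptions of this illustration.

```python
def vertical_consistency(suspect, is_geopotential=False, run_length=4):
    """Return a list of booleans, True where a layer is rejected.

    suspect : list of booleans ordered from the lowest layer upwards,
              True marking a layer of suspicious quality.
    """
    rejected = [False] * len(suspect)
    run = 0
    for i, flag in enumerate(suspect):
        run = run + 1 if flag else 0
        if run >= run_length:
            # Reject the consecutive suspicious layers ending at layer i.
            for j in range(i - run_length + 1, i + 1):
                rejected[j] = True
            if is_geopotential:
                # For geopotential, everything above is rejected as well.
                for j in range(i + 1, len(suspect)):
                    rejected[j] = True
                break
    return rejected
```

A run of three suspicious layers leaves the report untouched, whereas a run of four removes those layers and, for geopotential, the rest of the sounding above them.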
2.1 (d) Removal of duplicated reports
The removal of duplicated reports is performed by searching
pairs of co-located reports of the same observation types and then checking
the content of these reports. It may, for instance, happen that an airep
report is duplicated having only a slightly different station identifier
but the observed variables inside these reports are exactly the same ones,
or partially duplicated. The pair-wise checking of duplicates results in
a rejection of some or all of the content of one of the reports.
2.1 (e) Redundancy check
The redundancy check of the reports, together with the
level selection of multi-level reports, is performed next for the active
reports that are co-located and originate from the same station. For land
synop and paob reports, the report closest to the centre of the screening
time window with most active data is retained whereas the other reports
from that station are considered as redundant and are therefore rejected
from the assimilation. For ship synop and dribu observations the redundancy
check is done in a slightly modified fashion. These observations are considered
as potentially redundant if the moving platforms are within a circle with
a radius of one degree latitude. Also in this case only the report closest
to the centre of the screening time window with most active data is retained.
All the data from the multi-level temp and pilot reports from the same station
are considered at the same time in the redundancy check. The principle is
to retain the best quality data at the significant levels (i.e. the turning
points of the sounding) and closest to the centre of the screening time
window. Each such datum will, however, be retained in only one of the reports.
A wind observation, for instance, from a sounding station may therefore
be retained either in a temp or in a pilot report, depending on which one
happens to be of a better quality. A synop mass observation, if made at
the same time and at the same station as the temp report, is redundant if
there are any temp geopotential height observations that are no more than
50 hPa above the synop mass observation.
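Two of the redundancy rules above lend themselves to a small sketch. The text does not say how closeness in time and the amount of active data are weighed against each other, so the ordering used here (time first, number of active data as tie-breaker) is an assumption, as are the function names and the dictionary keys.

```python
def select_retained_report(reports, window_centre):
    """Pick the report to retain among co-located reports of one station.

    Each report is a dict with 'time' (hours) and 'n_active' (number of
    active data). Assumed ordering: closest to the window centre first,
    then the report with the most active data.
    """
    return min(
        reports,
        key=lambda r: (abs(r["time"] - window_centre), -r["n_active"]),
    )


def synop_mass_redundant(synop_pressure_hpa, temp_z_pressures_hpa,
                         limit_hpa=50.0):
    """Return True if a synop mass observation is redundant.

    It is redundant when some temp geopotential height observation lies
    no more than limit_hpa above it, i.e. at a pressure lower by at most
    limit_hpa.
    """
    return any(
        0.0 <= synop_pressure_hpa - p <= limit_hpa
        for p in temp_z_pressures_hpa
    )
```

All reports other than the one returned by `select_retained_report` would then be marked redundant and rejected.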
2.1 (f) Thinning
Finally, a horizontal thinning is performed for the airep
and TOVS reports. The horizontal thinning of reports means that a predefined
minimum horizontal distance between the nearby reports from the same platform
is enforced. For airep reports the minimum distance between reports is currently
set to about 125 km. The thinning of the airep data is performed with
respect to one airliner at a time. Reports from different airliners may
however be very close to each other. In this removal of redundant reports
the best quality data are retained, as the preceding quality control is taken
into account. In the vertical, the thinning is performed for layers around standard
pressure levels, thus allowing more reports for ascending and descending
flight paths. Thinning of TOVS reports is done in two stages. First a minimum
distance of about 70 km is enforced, and thereafter a repeated scan is performed
to achieve the final separation of roughly 250 km between reports from one
platform. The thinning algorithm is the same as used for aireps but in case
of TOVS reports a different preference order is applied: a sea sounding
is preferred over a land one, a clear sounding is preferred over a cloudy
one and finally, the closeness of observation time to the centre of the screening
time window is preferred. Fig. 2 gives an example of the overall
usage of TOVS reports. There is also an option for further thinning of SSM/I
and satob observations within the IFS.
Figure 2. The usage of TOVS reports
in the assimilation over the North Eastern Atlantic. Filled rings mark reports
that contain one or more channels used in the assimilation, whereas the empty
rings denote rejected reports. Most of the rejections are due to the horizontal
thinning and far fewer to quality reasons. Note that both edges
of the swath are rejected.
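The horizontal thinning of section 2.1 (f) can be illustrated with a simple greedy scheme. This is only a sketch of the idea of enforcing a minimum distance under a preference order; it is not the IFS thinning algorithm, and the function names and dictionary keys are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def distance_km(a, b):
    """Great-circle distance (haversine) between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def thin(reports, min_distance_km, preference):
    """Visit reports in preference order and keep a report only if it is
    at least min_distance_km away from every report already kept."""
    kept = []
    for rep in sorted(reports, key=preference):
        if all(distance_km(rep["pos"], k["pos"]) >= min_distance_km
               for k in kept):
            kept.append(rep)
    return kept


def tovs_preference(rep, window_centre=0.0):
    """Sea before land, clear before cloudy, then closeness in time."""
    return (rep["land"], rep["cloudy"], abs(rep["time"] - window_centre))
```

The two-stage TOVS thinning would then correspond to `thin(thin(reports, 70.0, tovs_preference), 250.0, tovs_preference)` applied to the reports of one platform, and the airep case to a single pass with 125 km and a quality-based preference.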
The effect of observation screening on synop surface pressure
observations is summarized in Fig. 3 in the case of 3D-Var and 4D-Var,
demonstrating the potential of 4D-Var in using observations from frequently
reporting stations.
Figure 3. The effect of the observation
screening on synop surface pressure observations. Column height gives the
number of observations available, while the shaded part displays those actually
used in the assimilation. (a) 4D-screening for 4D-Var, and (b) 3D-screening
for 3D/4D-Var.
2.2 Screening of satellite radiances
The TOVS radiances (currently 120 km resolution) are preprocessed
in a dedicated module which performs several functions to allow the assimilation
of TOVS radiances in 4D-Var (the NESDIS retrievals are not used in 4D-Var
but only monitored with the background profiles). This module is called
advar and it is called for each TOVS observation with the model background
temperature, specific humidity and ozone profiles and surface parameters
interpolated to the location of the observations. For each analysis cycle
there are typically 20,000 TOVS observations in total, for a dual polar
orbiter system. In the screening run, advar is called twice.
2.2 (a) Input
The fast radiative transfer model for TOVS radiances requires
an input profile from 1000 to 0.1 hPa. For the current 31 level model the
background profiles are only available up to 10 hPa and so an extrapolation
has to be performed up to 0.1 hPa for temperature using the NESDIS retrievals
to 1 hPa and then a simple extrapolation based on model atmospheres above
this level. Climatological mean profiles are assumed for water vapour and
ozone. For the next version of the ECMWF forecast model, with levels in the
stratosphere, this extrapolation will no longer be necessary. Once the full
profile from 1000 to 0.1 hPa is defined and checked, the radiative transfer model
is called to compute the background radiances from the background profiles.
2.2 (b) Quality control
Several quality checks are applied to the measured and
background radiances. The gross checks applied are:
(i) Check that the background profile is within realistic limits
(e.g. temperature in the range 150 to 350 K, specific humidity positive
and not supersaturated, ozone within climatological extremes).
(ii) Check that the measured and background brightness temperatures
are present for all required channels and within the range 150 to 350 K.
A series of more critical tests are then applied:
(i) Gross background check (i.e. measured radiance departures from
the background are less than 20 K).
(ii) The background temperature, specific humidity and ozone profiles
are checked to make sure they are close to or within the range encompassed
by the diverse 32 (or 35 for ozone) profile dataset for which the radiative
transfer model is valid.
(iii) A fine background check where the squares of the radiance
departures are flagged if they are greater than .
(iv) A check for cloud contamination of the HIRS channels is included
by checking that the radiance departure for HIRS channel 10 is inside the
range -4 to +8 K.
(v) Radiances at the two extreme edge positions of the swath are
flagged at present and not used in 4D-Var.
(vi) Checks are also made that the bias correction coefficients,
satellite id, and scan position are all valid before proceeding.
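Those checks above that come with explicit numerical limits can be sketched as follows. The function name, the data layout (dictionaries keyed by channel) and the sign convention of the departure (measured minus background) are assumptions of this illustration; the remaining checks depend on data not described in the text and are left out.

```python
def radiance_checks(obs_bt, bg_bt, hirs10_departure=None):
    """Return a list of (check_name, channel) tuples for failed checks.

    obs_bt, bg_bt    : dicts mapping channel -> brightness temperature in K
                       (None when the value is missing)
    hirs10_departure : departure for HIRS channel 10 in K, if available
    """
    failed = []
    for channel, obs in obs_bt.items():
        bg = bg_bt.get(channel)
        if obs is None or bg is None:
            failed.append(("missing", channel))
        elif not (150.0 <= obs <= 350.0 and 150.0 <= bg <= 350.0):
            failed.append(("range", channel))
        elif abs(obs - bg) >= 20.0:
            failed.append(("gross_background", channel))
    if hirs10_departure is not None and not (-4.0 <= hirs10_departure <= 8.0):
        failed.append(("cloud_contamination", 10))
    return failed
```

An empty list means that the sounding passed all the checks covered by the sketch.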
2.2 (c) Retrieval
The main task for advar is to perform a 1D-Var retrieval
of temperature, water vapour and ozone profiles. Each radiance profile is
assigned to be clear, partly cloudy or cloudy by NESDIS and different TOVS
channels and observation errors are used for each type. The background error
covariances are also specified in a file and for temperature are
close to the global mean background errors assumed in 4D-Var. For specific
humidity the background errors assumed in 1D-Var follow the same formulation
as in 4D-Var and the correlations are the same as in 4D-Var.
The minimisation of the cost function is performed using
the method of Newtonian iteration and up to 5 iterations are allowed before
the minimisation fails. If the cost function of the observed radiance in
any of the channels exceeds a predefined threshold then the set of radiances
is indicated as inconsistent. The output of 1D-Var includes background and
retrieved temperature, water vapour and ozone profiles together with several
retrieved surface parameters also included in the 1D-Var control vector.
A final check on the stability of the retrieved profile
is provided in the code but not implemented as the profiles are not used
in 4D-Var.
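For orientation, the minimisation can be written in the usual variational notation. The cost function and the iteration below are the standard textbook form of a 1D-Var retrieval and are given here as an assumption, since these notes do not state them explicitly. Here $x$ is the control vector (profiles and surface parameters), $x_b$ the background, $B$ and $R$ the background and observation error covariances, $y$ the measured radiances, $H$ the radiative transfer model and $\mathbf{H}_n$ its Jacobian evaluated at the iterate $x_n$.

$$
J(x) = \tfrac{1}{2}\,(x - x_b)^{T} B^{-1} (x - x_b)
     + \tfrac{1}{2}\,\bigl(y - H(x)\bigr)^{T} R^{-1} \bigl(y - H(x)\bigr)
$$

$$
x_{n+1} = x_n - \bigl(B^{-1} + \mathbf{H}_n^{T} R^{-1} \mathbf{H}_n\bigr)^{-1}
          \Bigl[\,B^{-1}(x_n - x_b) - \mathbf{H}_n^{T} R^{-1}\bigl(y - H(x_n)\bigr)\Bigr],
\qquad n = 0, \dots, 4, \quad x_0 = x_b
$$

The bracketed term is the gradient of $J$ at $x_n$, and the matrix in front of it approximates the inverse Hessian, so each step is a Newton step on $J$ with second derivatives of $H$ neglected.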
2.2 (d) SSM/I radiances
SSM/I radiances are also screened in a similar module, which
performs a similar set of functions to advar, retrieving total column water
vapour, surface wind speed and cloud liquid water path. At the time of writing
the SSM/I radiances are used operationally only in a passive mode enabling
a full scale performance monitoring.
2.2 (e) Scatterometer processing
A horizontal thinning is performed for the ERS scatterometer
reports with respect to the particular measurement geometry of the instrument.
The backscatter data are acquired within individual cells related to a 450
km wide grid with a mesh of 25 km in the across and along track directions.
19 measurement nodes are thus defined across the scatterometer's swath,
while 19 rows are also considered in the along track direction to gather
the data in squares of 19 by 19 points. The thinning is then achieved by
keeping only every fourth point within these squares. The data are thus
used at a resolution of 100 km instead of the original 25 km sampling distance.
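The selection of points within a 19 by 19 square can be sketched as below. The text does not say which point the selection starts from, so the `offset` parameter is hypothetical, and it is assumed that every fourth point is kept in both the across-track and along-track directions, consistent with the quoted 100 km resolution.

```python
def thinned_points(n=19, step=4, offset=0):
    """Return the (row, node) index pairs kept in an n-by-n square when
    only every step-th point is retained in each direction."""
    kept = range(offset, n, step)
    return [(row, node) for row in kept for node in kept]
```

With the default values this keeps 5 points per direction, i.e. 25 of the 361 points in each square, spaced 4 x 25 km = 100 km apart.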
Apart from the thinning, the other observation dependent
decisions involved in the screening of the scatt data come essentially from
the application of a sea-ice contamination test from the model sea surface
temperature analysis, using a minimum threshold of 273 K, and a high wind
rejection test with an upper wind speed limit set to 25 m/s for the higher
of the scatt and background winds.
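The two local tests can be sketched as a single function. The function name and the handling of values exactly at the limits (strict inequalities) are assumptions of this illustration.

```python
def scatt_local_rejections(sst_k, scatt_speed, background_speed,
                           sst_min_k=273.0, speed_max=25.0):
    """Return the list of reasons for rejecting one scatterometer wind.

    sst_k            : model sea surface temperature analysis in K
    scatt_speed      : scatterometer wind speed in m/s
    background_speed : background wind speed in m/s
    """
    reasons = []
    if sst_k < sst_min_k:
        reasons.append("sea_ice")
    # The test uses the higher of the scatt and background winds.
    if max(scatt_speed, background_speed) > speed_max:
        reasons.append("high_wind")
    return reasons
```

Note that a moderate scatterometer wind is still rejected when the background wind at that location exceeds the limit.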
An extra quality control is done on the wind retrieval
residual or so-called "normalized distance to the cone". This quantity is
tested as a global average over the six hours of the analysis cycle for each
of the 19 measurement nodes across the swath. All the data are then rejected
in bulk if an excessive value is found for any node (more than 1.3 times
the expected average), provided that the number of data taken into account is judged
significant (more than 500). While the first check, performed locally, aims
at avoiding geophysical effects not explained by the transfer function (cmod4),
for example rain or sea-state effects in the vicinity of deep lows, this
global quality control on the distance to the cone makes it possible to detect technical
anomalies not reported in real time by ESA and likely to affect the measurements
in a correlated way and at larger scales. Such anomalies occur typically
in the case of orbital manoeuvres.
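The global check on the distance to the cone can be sketched as follows. The function name and the data layout (one list of values per node, collected over the analysis cycle) are assumptions; the limits 1.3 and 500 are those quoted above.

```python
def reject_all_scatt(node_values, expected_average,
                     ratio_limit=1.3, min_count=500):
    """Return True if all scatterometer data of the cycle should be rejected.

    node_values      : dict mapping node index -> list of normalized
                       distances to the cone collected over the cycle
    expected_average : dict mapping node index -> expected average value
    """
    for node, values in node_values.items():
        if len(values) <= min_count:
            continue  # too few data for the average to be significant
        mean = sum(values) / len(values)
        if mean > ratio_limit * expected_average[node]:
            return True
    return False
```

A single node with a significant sample and an excessive average is enough to reject the data from all 19 nodes.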
2.3 A summary of the current use of observations
A summary of the current status of use of observations in the 4D-Var data
assimilation is given in Table 1 below.
2.4 Compression of the CMA-file
After the observation screening roughly 15% of all the
observed data are active and the compressed observation array for the minimization
run only contains those data. That large compression rate is mainly driven
by the number of TOVS data as after the screening there are only 10-20%
of the TOVS reports left, whereas for the conventional observations the
figure is around 40%. As a part of the compression, the observations are
re-sorted among the processors for the minimization job in order to achieve
a better load balance on the parallel computer.
2.5 A massively parallel computing environment
The migration of operational codes at the ECMWF in 1996
to support a massively parallel computing environment set a requirement
for reproducibility. The observation screening should result in exactly
the same selection of observations when different numbers of processors are
used for the computations. In the observation screening there are two
basic types of decisions to be made. Independent decisions, on one hand,
are those where no information of any other observations or decisions is
needed. In a parallel computing environment these decisions can be happily
made at different processors fully in parallel. For dependent decisions,
on the other hand, a global view of the observations is needed which implies
that some communication between the processors is required. The observation
array is however far too large to be copied for each individual processor.
Therefore, the implementation of observation screening at the ECMWF is such
that only the minimum necessary information on the reports is globally communicated
in order to provide the global view of the observations needed for the dependent
decisions.
The global view of the observations is provided in the
form of a global "time-location" array for selected observation types. This
array contains compact information of the reports that are still active
at this stage. For instance, the observation time, location and station
identifier as well as the owner processor of that report are included. The
time-location array is composed at each processor locally and then collected
for merging and redistributed to each processor. After the redistribution
the array is sorted locally at the processors according to the unique sequence
number. Every processor thus has exactly the same information to start with
and the dependent decisions can be performed in a reproducible manner independently
of the computer configuration.
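The role of the sorted time-location array in reproducibility can be demonstrated with a small simulation. Processors are represented by plain Python lists, the round-robin `partition` function stands in for whatever distribution the real system uses, and the dependent decision (keep, per station, the report closest to the window centre) is a simplified example; all names are hypothetical.

```python
def partition(entries, n_proc):
    """Distribute entries over n_proc simulated processors (round-robin)
    and record the owner processor in each entry."""
    return [[dict(e, owner=p) for e in entries[p::n_proc]]
            for p in range(n_proc)]


def build_global_view(local_arrays):
    """Merge the local time-location arrays and sort the result by the
    unique sequence number, so every processor sees the same order."""
    merged = [entry for local in local_arrays for entry in local]
    return sorted(merged, key=lambda e: e["seq"])


def retained_sequence_numbers(global_view, window_centre=0.0):
    """A dependent decision: per station, retain the report closest to
    the window centre. Ties go to the lower sequence number because the
    array is traversed in sorted order."""
    best = {}
    for e in global_view:
        cur = best.get(e["station"])
        if cur is None or (abs(e["time"] - window_centre)
                           < abs(cur["time"] - window_centre)):
            best[e["station"]] = e
    return sorted(e["seq"] for e in best.values())
```

Without the sort, the order of the merged array would depend on how the reports were distributed, and ties between equally good reports could be resolved differently for different numbers of processors.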
The time-location array is just enough for all the dependent
decisions, except for the redundancy checking of the multi-level temp and
pilot reports. This is a special case in the sense that the information
of each and every observed variable and from each level is needed. This
actually means that the whole multi-level report has to be communicated.
The other way out of this would be to force the observation clusters of
the multi-level reports always into one processor without splitting them.
In that case the codes responsible for creating the observation arrays for
assimilation would have to ensure the geographical integrity of the observation
arrays distributed to the processors. This is, however, not possible in all
the cases, and the observation screening has to be able to cope with this.
Currently, it is coded in such a way that only a limited number of multi-level
temp and pilot reports, based on the time-location array, are communicated
between the appropriate processors as copies of these common stations.