Data assimilation concepts and methods
March 1999

By F. Bouttier and P. Courtier


1. Basic concepts in data assimilation
2. The state vector, control space and observations
3. The modelling of errors
4. Statistical interpolation with least-squares estimation
5. A simple scalar illustration of least-squares estimation
6. Models of error covariance
7. Optimal interpolation (OI) analysis
8. Three-dimensional variational analysis (3D-Var)
9. 1D-Var and other variational analysis systems
10. Four-dimensional variational assimilation (4D-Var)
11. Estimating the quality of the analysis
12. Implementation techniques
13. Dual formulation of 3D/4D-Var (PSAS)
14. The extended Kalman filter (EKF)
15. Conclusion
Appendix A. A primer on linear matrix algebra
Appendix B. Practical adjoint coding
Appendix C. Exercises
Appendix D. Main symbols
References
 





3. The modelling of errors

To represent the fact that there is some uncertainty in the background, in the observations and in the analysis, we will assume some model of the errors between these vectors and their true counterparts. The correct way to do this is to assume some probability density function, or pdf, for each kind of error. There is a sophisticated and rigorous mathematical theory of probabilities to which the reader may refer. For the more practical minds we present a simplified (and mathematically loose) explanation of pdfs in the paragraph below, using the example of background errors.

3.1 Using pdfs to represent uncertainty

Given a background field just before doing an analysis, there is one and only one vector of errors that separates it from the true state:

$$\varepsilon_b = \mathbf{x}_b - \mathbf{x}_t$$

If we were able to repeat each analysis experiment a large number of times, under exactly the same conditions, but with different realizations of errors generated by unknown causes, $\varepsilon_b$ would be different each time. We can calculate statistics such as averages, variances and histograms of frequencies of $\varepsilon_b$. In the limit of a very large number of realizations, we expect the statistics to converge to values which depend only on the physical processes responsible for the errors, not on any particular realization of these errors. When we do another analysis under the same conditions, we do not expect to know what the error $\varepsilon_b$ will be, but at least we will know its statistics. The best information about the distribution of $\varepsilon_b$ is given by the limit of the histogram when the classes are infinitely small, which is a scalar function of integral 1 called the probability density function of $\varepsilon_b$. From this function one can derive all statistics, including the average (or expectation) $\overline{\varepsilon_b}$ and the variances [1]. A popular model of scalar pdf is the Gaussian function, which can be generalized to a multivariate pdf.
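The limiting-histogram idea can be illustrated numerically. The sketch below assumes, purely for illustration, a scalar Gaussian error with a hypothetical bias of 0.5 and standard deviation of 2 (neither value comes from the text). It draws many realizations, computes their average and variance, and compares a normalized histogram with the Gaussian pdf.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar error model (illustrative values only).
true_bias, true_std = 0.5, 2.0
errors = rng.normal(true_bias, true_std, size=200_000)

# Empirical statistics over many realizations.
mean_est = errors.mean()
var_est = errors.var()

# Histogram normalized to integrate to 1: an approximation of the pdf.
density, edges = np.histogram(errors, bins=200, density=True)
integral = np.sum(density * np.diff(edges))

# Compare with the Gaussian pdf evaluated at the bin centres.
centres = 0.5 * (edges[:-1] + edges[1:])
gauss = np.exp(-0.5 * ((centres - true_bias) / true_std) ** 2)
gauss /= true_std * np.sqrt(2.0 * np.pi)
max_gap = np.max(np.abs(density - gauss))
```

With more realizations and narrower classes, `max_gap` is expected to shrink, which is the convergence described above.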

3.2 Error variables

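The errors in the background and in the observations [2] are modelled as follows:
    background errors: $\varepsilon_b = \mathbf{x}_b - \mathbf{x}_t$, of average $\overline{\varepsilon_b}$ and covariances $\mathbf{B} = \overline{(\varepsilon_b - \overline{\varepsilon_b})(\varepsilon_b - \overline{\varepsilon_b})^{\mathrm{T}}}$. They are the estimation errors of the background state, i.e. the difference between the background state vector and its true value. They do not include discretization errors.
    observation errors: $\varepsilon_o = \mathbf{y} - H(\mathbf{x}_t)$, of average $\overline{\varepsilon_o}$ and covariances $\mathbf{R} = \overline{(\varepsilon_o - \overline{\varepsilon_o})(\varepsilon_o - \overline{\varepsilon_o})^{\mathrm{T}}}$. They contain errors in the observation process (instrumental errors, because the reported value is not a perfect image of reality), errors in the design of the operator $H$, and representativeness errors, i.e. discretization errors which prevent $\mathbf{x}_t$ from being a perfect image of the true state [3].
    analysis errors: $\varepsilon_a = \mathbf{x}_a - \mathbf{x}_t$, of average $\overline{\varepsilon_a}$. A measure $\|\varepsilon_a - \overline{\varepsilon_a}\|$ of these errors is given by the trace of the analysis error covariance matrix $\mathbf{A}$:

    $$\mathrm{Tr}(\mathbf{A}) = \overline{\|\varepsilon_a - \overline{\varepsilon_a}\|^2}$$

    They are the estimation errors of the analysis state, which is what we want to minimize.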

The averages of errors are called biases and they are the sign of a systematic problem in the assimilating system: a model drift, or a bias in the observations, or a systematic error in the way they are used.
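As a minimal sketch of these definitions, the snippet below estimates a bias vector and an error covariance matrix from synthetic analysis-error realizations, then checks that the trace of the covariance matrix equals the mean squared norm of the departures from the average. The state dimension, bias and standard deviations are hypothetical values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_samples = 3, 100_000

# Hypothetical analysis errors with a nonzero bias (illustrative values).
bias_true = np.array([0.3, 0.0, -0.2])
std_true = np.array([1.0, 0.5, 2.0])
eps_a = bias_true + std_true * rng.standard_normal((n_samples, n))

bias = eps_a.mean(axis=0)                # estimate of the average error
unbiased = eps_a - bias                  # departures from the average
A = unbiased.T @ unbiased / n_samples    # error covariance matrix estimate

# The trace of A equals the mean squared norm of the unbiased errors.
mean_sq_norm = np.mean(np.sum(unbiased ** 2, axis=1))
```

A clearly nonzero entry in `bias` would be the numerical signature of the systematic problems mentioned above.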

It is important to understand the algebraic nature of the statistics. Biases are vectors of the same kind as the model state or observation vectors, so their interpretation is straightforward. Linear transforms that are applied to model state or observation vectors (such as spectral transforms) can be applied to bias vectors.

3.3 Using error covariances

Error covariances are more subtle and we will illustrate this with the background errors (all remarks apply to observation errors too). In a scalar system, the background error covariance is simply the variance, i.e. the mean-square (or quadratic) average of departures from the mean; its square root is the root-mean-square (r.m.s.) departure:

$$B = \mathrm{var}(\varepsilon_b) = \overline{(\varepsilon_b - \overline{\varepsilon_b})^2}$$

 

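In a multidimensional system, the covariances form a square symmetric matrix. If the model state vector has dimension $n$, then the covariances are an $n \times n$ matrix. The diagonal of the matrix contains the variances [4] of each variable of the model; the off-diagonal terms are cross-covariances between each pair of variables of the model. The matrix is positive [5]. Unless the error variance is zero along some direction of the state space, which happens only in the rather special case where one believes some features are perfect in the background, the error covariance matrix is positive definite. For instance, if the model state is three-dimensional and the background errors (minus their average) are denoted $(e_1, e_2, e_3)$, then

$$\mathbf{B} = \begin{pmatrix} \mathrm{var}(e_1) & \mathrm{cov}(e_1, e_2) & \mathrm{cov}(e_1, e_3) \\ \mathrm{cov}(e_1, e_2) & \mathrm{var}(e_2) & \mathrm{cov}(e_2, e_3) \\ \mathrm{cov}(e_1, e_3) & \mathrm{cov}(e_2, e_3) & \mathrm{var}(e_3) \end{pmatrix}$$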
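The properties listed above (symmetry, positiveness, and loss of definiteness when a feature is believed perfect) can be checked on a small synthetic example. The sketch below builds correlated errors for a three-variable state from a hypothetical mixing matrix `L`; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical correlated errors for a three-variable state: e = L z.
L = np.array([[1.0, 0.0, 0.0],
              [0.8, 0.6, 0.0],
              [0.0, 0.5, 1.5]])
e = rng.standard_normal((50_000, 3)) @ L.T
e -= e.mean(axis=0)                      # remove the average (bias)

B = e.T @ e / e.shape[0]                 # 3 x 3 error covariance estimate
eigvals = np.linalg.eigvalsh(B)          # eigenvalues of a symmetric matrix

# Special case: the second variable is believed perfect (zero error).
e_perfect = e.copy()
e_perfect[:, 1] = 0.0
B_perfect = e_perfect.T @ e_perfect / e.shape[0]
eig_perfect = np.linalg.eigvalsh(B_perfect)
```

Here `eigvals` are all strictly positive, whereas `eig_perfect` contains a zero eigenvalue: the matrix is still positive but no longer positive definite.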

 

The off-diagonal terms can be transformed into error correlations (if the corresponding variances are non-zero):

$$\rho(e_i, e_j) = \frac{\mathrm{cov}(e_i, e_j)}{\sqrt{\mathrm{var}(e_i)\,\mathrm{var}(e_j)}}$$
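The conversion between covariances and correlations can be sketched as follows, using a hypothetical 3 x 3 covariance matrix whose values are chosen only for illustration.

```python
import numpy as np

# Hypothetical 3 x 3 error covariance matrix (illustrative values).
B = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.3],
              [0.0, 0.3, 2.5]])

std = np.sqrt(np.diag(B))          # standard deviations
corr = B / np.outer(std, std)      # rho_ij = cov_ij / (std_i * std_j)

# The covariances are recovered from correlations and standard deviations.
B_back = corr * np.outer(std, std)
```

The division requires every entry of `std` to be non-zero, which is the condition stated above.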

 


Finally, linear transformations of the model state vector can only be applied to covariances as full matrix transforms. In particular, it is not possible to directly transform the fields of variances or standard deviations. If one defines a linear transformation by a matrix $\mathbf{P}$ (i.e. a matrix whose rows are the coordinates of the new basis vectors in terms of the old ones, so that the new coordinates of the transform of $\mathbf{x}$ are $\mathbf{P}\mathbf{x}$), then the covariance matrix in terms of the new variables is $\mathbf{P}\mathbf{B}\mathbf{P}^{\mathrm{T}}$.
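A small numerical check of the full matrix transform is sketched below. The names `P` and `B` are labels chosen for this example, and the change of variables (sum, difference and one unchanged variable) is hypothetical. The last lines show why transforming the variances alone gives a wrong result: the cross-covariances contribute to the new variances.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical error covariance matrix in the original variables.
B = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.3],
              [0.0, 0.3, 2.5]])

# Hypothetical change of variables: sum, difference, unchanged variable.
P = np.array([[1.0,  1.0, 0.0],
              [1.0, -1.0, 0.0],
              [0.0,  0.0, 1.0]])

B_new = P @ B @ P.T                      # covariances of the new variables

# Check against samples: draw errors with covariance B, transform, re-estimate.
e = rng.multivariate_normal(np.zeros(3), B, size=200_000)
B_sampled = np.cov(e @ P.T, rowvar=False)

# Transforming only the variances ignores the cross-covariances.
naive_var = (P ** 2) @ np.diag(B)
```

In this example the variance of the sum is 3.6 and that of the difference is 0.4, while the variance-only shortcut would give 2.0 for both.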

3.4 Estimating statistics in practice

The error statistics (biases and covariances) are functions of the physical processes governing the meteorological situation and the observing network. They also depend on our a priori knowledge of the errors. Error variances in particular reflect our uncertainty in features of the background or the observations. In general, the only way to estimate statistics is to assume that they are stationary over a period of time and uniform over a domain [6] so that one can take a number of error realizations and make empirical statistics. This is in a sense a climatology of errors. Another empirical way to specify error statistics is to take them to be a fraction of the climatological statistics of the fields themselves.
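Both empirical approaches can be sketched on synthetic data. In the snippet below, the array shapes, the error model, the climatological field and the value of `fraction` are all hypothetical. Under the stationarity and uniformity assumption, every (time, gridpoint) sample is pooled into a single estimate.

```python
import numpy as np

rng = np.random.default_rng(5)
n_times, n_points = 500, 40

# Hypothetical error realizations on a (time, gridpoint) array.
errors = 0.2 + 1.5 * rng.standard_normal((n_times, n_points))

# Stationarity and uniformity: each (time, point) pair is treated as a
# realization of the same error, so all samples are pooled together.
bias = errors.mean()
variance = errors.var()

# Alternative: take the error variance to be a fraction of the
# climatological variance of the field itself (hypothetical values).
field = 5.0 * rng.standard_normal((n_times, n_points))
fraction = 0.1
variance_from_climatology = fraction * field.var()
```

Pooling trades detail for sample size: the estimate is stable, but any dependence of the statistics on location or date is averaged out.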

When setting up an assimilation system in practice, such approximations are unavoidable because it is very difficult to gather accurate data to calibrate statistics: estimation errors cannot be observed directly. Some useful information on the average values of the statistics can be gathered from diagnostics of an existing data assimilation system using the observational method (see its description below; Hollingsworth et al. 1986) and the NMC method (use of forecast differences as surrogates to short-range forecast errors; Parrish and Derber 1992). More detailed, flow-dependent forecast error covariances can be estimated directly from a Kalman filter (described below), although this algorithm raises other problems. Finally, meteorological common sense can be used to specify error statistics, to the extent that they reflect our a priori knowledge of the physical processes responsible for the errors [7].
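The idea of using forecast differences as surrogates for short-range forecast errors can be sketched on synthetic data. Everything below is an assumption made for illustration: the forecast arrays are random stand-ins for archived forecasts valid at the same times, the 24 h and 48 h lead times are an arbitrary choice, and `alpha` is a hypothetical rescaling factor, not a value taken from the text or the cited papers.

```python
import numpy as np

rng = np.random.default_rng(4)
n_dates, n_grid = 2_000, 4

# Synthetic stand-ins for archived forecasts valid at the same times
# (hypothetical data; a real system would read these from an archive).
truth = rng.standard_normal((n_dates, n_grid))
fc_24h = truth + 0.5 * rng.standard_normal((n_dates, n_grid))
fc_48h = truth + 1.0 * rng.standard_normal((n_dates, n_grid))

# Forecast differences serve as surrogates for short-range forecast errors.
diff = fc_48h - fc_24h
diff -= diff.mean(axis=0)

# Assume stationarity in time: pool all dates into one covariance estimate.
B_surrogate = diff.T @ diff / n_dates

# Hypothetical amplitude rescaling of the surrogate covariances.
alpha = 0.5
B_est = alpha * B_surrogate
```

No true state is needed to form `B_surrogate`, which is what makes this kind of estimate practical, but the result describes forecast differences rather than the errors themselves.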






[1] Mathematically speaking, a pdf may not have an average or variances, but in the usual geophysical problems all pdfs do, and we will assume this throughout this presentation.
[2] One could model forecast errors and balance properties in a similar way, although this is outside the scope of this discussion. See the section on the Kalman filter.
[3] An example is sharp temperature inversions in the vertical. They can be fairly well observed using a radiosonde, but it is impossible to represent them precisely with the current vertical resolution of atmospheric models. On the other hand, temperature soundings obtained from satellite cannot themselves observe sharp inversions.
[4] The square roots of variances are called standard deviations, or standard errors.
[5] This does not mean that all the matrix elements are positive; the definition of a positive definite matrix is given in Appendix A. The positiveness can be proven by remarking that the eigenvalues of the matrix are the variances in the direction of the eigenvectors, and thus cannot be negative.
[6] It is called an assumption of ergodicity.
[7] For instance, forecast errors in a tropical meteorological assimilation should be increased in the vicinity of reported tropical cyclones, and observation operators for satellite radiances have more errors in cloudy areas.



 

03.12.2001 © ECMWF