Home page  
Home   Your Room   Login   Contact   Feedback   Site Map   Search:  
Discover this product  
About Us
Overview
Getting here
Committees
Products
Forecasts
Order Data
Order Software
Services
Computing
Archive
PrepIFS
Research
Modelling
Reanalysis
Seasonal
Publications
Newsletters
Manuals
Library
News&Events
Calendar
Employment
Open Tenders
   
Home > Newsevents > Training > Rcourse_notes > DATA_ASSIMILATION > ASSIM_CONCEPTS >  
   

Data assimilation concepts and methods
March 1999

By F. Bouttier and P. Courtier


1. Basic concepts in data assimilation
2. The state vector, control space and observations
3. The modelling of errors
4. Statistical interpolation with least-squares estimation
5. A simple scalar illustration of least-squares estimation
6. Models of error covariance
7. Optimal interpolation (OI) analysis
8. Three-dimensional variational analysis (3D-Var)
9. 1D-Var and other variational analysis systems
10. Four-dimensional variational assimilation (4D-Var)
11. Estimating the quality of the analysis
12. Implementation techniques
13. Dual formulation of 3D/4D-Var (PSAS)
14. The extended Kalman filter (EKF)
15. Conclusion
Appendix A. A primer on linear matrix algebra
Appendix B. Practical adjoint coding
Appendix C. Exercises
Appendix D. Main symbols
References
 
  Training Course Notes Front Page >>
Table of contents >>
Next Section >>
Previous Section >>





4 . Statistical interpolation with least-squares estimation

In this section we present the fundamental equation for linear analysis in a general algebraic form: the least squares estimation, also called Best Linear Unbiased Estimator (BLUE). The following sections will provide more explanations and illustrations, and we shall see how the least-squares estimation can be simplified to yield the most common algorithms used nowadays in meteorology and oceanography.

4.1 Notation and hypotheses

The dimension of the model state is and the dimension of the observation vector is . We will denote:
  true model state (dimension )
  background model state (dimension )
  analysis model state (dimension )
  vector of observations (dimension )
  observation operator (from dimension to )
  covariance matrix of the background errors (dimension )
  covariance matrix of observation errors (dimension )
  covariance matrix of the analysis errors (dimension )

The following hypotheses are assumed:
    Linearized observation operator: the variations of the observation operator in the vicinity of the background state are linear: for any close enough to , where is a linear operator.
    Non-trivial errors: and are positive definite matrices.
    Unbiased errors: the expectation of the background and observation errors is zero i.e.
    Uncorrelated errors: observation and background errors are mutually uncorrelated i.e.
    Linear analysis: we look for an analysis defined by corrections to the background which depend linearly on background observation departures.
    Optimal analysis: we look for an analysis state which is as close as possible to the true state in an r.m.s. sense (i.e. it is a minimum variance estimate).

ref:
Daley 1991; Lorenc 1986; Ghil 1989

4.2 Theorem: least-squares analysis equations
(a)   The optimal least-squares estimator, or BLUE analysis, is defined by the following interpolation equations:

 
(A1)


 
(A2)

  where the linear operator is called the gain, or weight matrix, of the analysis.
(a)   The analysis error covariance matrix is, for any :

 
(A3)

  If is the optimal least-squares gain, the expression becomes

 
(A4)

(a)   The BLUE analysis is equivalently obtained as a solution to the variational optimization problem:

 
(A5)

  where is called the cost function of the analysis (or misfit, or penalty function), is the background term, is the observation term.
(a)   The analysis is optimal: it is closest in an r.m.s. sense to the true state .
(b)   If the background and observation error pdfs are Gaussian, then is also the maximum likelihood estimator of .


Proof:
  With a translation of by , we can assume that so the observation operator is linear for our purposes. The equation (A1) is simply a mathematical expression of the fact that we want the analysis to depend linearly on the observation departures. The expression of in (A2) is well-defined because is a positive definite matrix, and is positive. The minimization problem (A5) is well-defined because is a convex function and is a strictly convex function (it is a quadratic form).
  The equivalence between items (a) and (c) of the theorem stems from the requirement that the gradient of is zero at the optimum :

 

  The identity with (A2) is straightforward to prove (all inverse matrices considered are positive definite):

 


 

  hence

 

  The expressions (A3) and (A4) for are obtained by rewriting the analysis equation (A1) in terms of the background, analysis and observation errors:

 

  By developing the expression of and taking its expectation, by linearity of the expectation operator one finds the general expression (A3) (remember that and being uncorrelated, their cross-covariance is zero). The simpler form (A4) is easy to derive by substituting the expression for the optimal and simplifying the terms that cancel.
  Finally to prove (A2) itself we take the analysis error covariance matrix given by (A3) and we minimize its trace, i.e. the total error variance: (note that and )

 

  This is a continuous differentiable scalar function of the coefficients of , so we can express its derivative as the first- order terms in of the difference , being an arbitrary test matrix:

 

  The last line shows that the derivative is zero for any choice of if and only if , which is equivalent to

 

  because is assumed to be invertible.

In the case of Gaussian pdfs, one can model the background, observation and analysis pdfs as follows, respectively: ( , and are normalization factors.)

 


which yields the right averages and covariances for the background and observations errors, and the analysis error pdf is simply defined as the Bayesian product of the two known sources of information, the background and the observation pdfs (this can be derived rigorously by using Bayes' theorem to write as a conditional probability of given the observations and the a priori pdf of the background). Then, by taking minus the logarithm of , one finds that the model state with the maximum probability (or likelihood) is the one that minimizes the cost function expressed in the theorem.

4.3 Comments

The hypotheses of non-triviality can always been made in well-posed analysis problems: if is non-positive, one can restrict the control space to the orthogonal of the kernel of (the analysis will not make any correction to background variables that are perfectly known). If is not a surjection, then some observations are redundant and the observing network shall be restricted to the image of . If is non-positive, the expression
(A2) for still holds (then the analysis will be equal to the observed value at the observation points ), but the variational version of the least-squares analysis cannot be used. It is even possible (with some algebraic precautions) to have some infinite eigenvalues in , i.e. a non-positive , which means that some observations are not used because their errors are infinite.

The hypothesis of unbiased errors is a difficult one in practice because there often are significant biases in the background fields (caused by biases in the forecast model) and in the observations (or in the observation operators). If the biases are known, they can be subtracted from the background and observation values, and the above algebra applies to the debiased quantities. If the biases are left in, the analysis will not be optimal, even though it will seem to reduce the biases by interpolating between the background and observations. It is important to monitor the biases in an assimilation system, e.g. by looking at averages of background departures, but it is not trivial to decide which part of these are model or observation biases. The problem of bias monitoring and removal is the subject of ongoing research.

The hypothesis of uncorrelated errors is usually justified because the causes of errors in the background and in the observations are supposed to be completely independent. However, one must be careful about observation preprocessing practices (such as satellite retrieval procedures) that use the background field in a way that biases the observations toward the background. It might reduce the apparent background departures, but it will cause the analysis to be suboptimal (too close to the background, a condition nicknamed as the incest problem).

The tangent linear hypothesis is not trivial and it is commented in the next section.

It is possible to rewrite the least-squares analysis equations in terms of the inverses of the error covariance matrices, called information matrices. It makes the algebra a bit more complicated, but it allows one to see clearly that the information contained in the analysis is the sum, in a simple sense, of the observations provided by the background and by the observations. This is illustrated in the section on the estimation of analysis quality below.

It will be shown in the section on dual algorithms (PSAS analysis) that the equations, and in particular the cost function , can be rewritten in the space of the observations . Also, it is easy to that least-squares analysis is closely related to a linear regression between model state and observations.

4.4 On the tangent linear hypothesis

The hypothesis of linearized observation operator is needed in order to derive a rigorous algebraic expression for the optimal . In practice, may not be linear, but it usually makes physical sense to linearize it in the vicinity of the background state:

 

Then, being a continuous function of , the least-squares equations for the analysis should intuitively yield a nearly optimal .

More generally, the tangent linear hypothesis on can be written as the first-order Taylor-Young formula in the vicinity of an arbitrary state and for a perturbation :

 
,

with . This hypothesis, called the tangent linear hypothesis is only acceptable if the higher-order variations of can be neglected (in particular there should be no discontinuities) for all perturbations of the model state which have the same order of magnitude as the background errors. The operator is called the differential, or first derivative, or tangent linear (TL)1 function of at point . Although this is a desirable mathematical property of , it is not enough for practical purposes, because the approximation

 

must be satisfactory, in user-defined terms, for finite values of that depend on the application considered. In the least-squares analysis problem, we need

 

for all values of that will be encountered in the analysis procedure, notably , , and also all trial values used in the minimization of if a variational analysis is performed2. Thus the important requirement is that the difference between and should be much smaller than the typical observation errors (defined by ), for all model state perturbations of size and structure consistent with typical background errors, and also with the amplitude of the analysis increments .

Thus the problem of linearizing is not just related to the observation errors themselves. It must be appreciated in terms of the errors in the background too, which in a sequential assimilation system are the previous forecast errors, which depend on the forecast range and the quality of the model. Ultimately the correctness of the linearization must be appreciated in the context of the fully integrated assimilation system. It will be easier to apply the linearization to a good system because the departures will be smaller. Conversely, the linearization may be inapplicable to difficult data assimilation problems. This is often the case with ocean models or satellite data, which means that it can be wrong to use sophisticated analysis algorithms that rely too much on the linearity of the problem.

The linearization problem can be even more acute for the linearization of the model forecast operator which is needed in 4D-Var and in the Kalman filter described below. As with the linearization of , it may or may not be licit depending on the quality of all components of the assimilation system: data coverage, observation quality, model resolution and physics, and forecast range. The user requirements and the physical properties of the system must be considered.

The non-linear analysis problem

The assumption of linear analysis is a strong one. Linear algebra is needed to derive the optimal analysis equations. One can rely on the linearization of a weakly non-linear observation operator, at the expense of optimality. The incremental method (described below for the variational analysis) performs this procedure iteratively in an empirical attempt to make the analysis more optimal. For strongly non-linear problems, there is no general and simple way to calculate the optimal analysis. The simulated annealing method can be useful; specific methods, such as the simplex, deal with variables with bounded definition domains. Finally, it is sometimes possible to make a problem more linear simply by a clever definition of model and observation variables (see the section on minimization methods).

4.5 The point of view of conditional probabilities

It is interesting to formalize the analysis problem using the conditional, or Bayesian, probabilities. Let us denote the a priori pdf (probability density function) of the model state before the observations are considered, i.e. the background pdf. Let us denote the pdf of the observations. The aim of the analysis is to find the maximum of , the conditional probability of the model state given the observations. The joint pdf of and (i.e. the probability that and occur together) is

 

i.e. it is the probability that occurs when occurs, and vice versa. The above expression is the Bayes theorem. In the analysis procedure we know that a measurement has been made and we know its value , so and we obtain

 

which means that the analysis pdf is equal to the background pdf times the observation pdf . The latter peaks at but it is not a Dirac distribution because the observations are not error-free.

The virtue of the probabilistic derivation of the analysis problem is that it can be extended to non-Gaussian probabilities (although this spoils the equivalence with the
(A2) equation for ). A practical application is done in the framework of variational quality control, where it is assumed that observation errors are not Gaussian but they contain some amount of "gross errors", i.e. there is a probability that the error is not generated by the usual Gaussian physical processes but by some more serious problem, like coding or instrumental failure. The gross errors might be modelled using a uniform pdf over a predefined interval of admissible gross errors, leading to a non-Gaussian observation pdf. When the opposite of the logarithm of this pdf is taken, the resulting observation cost function is not quadratic, but gives less weight to the observation (i.e. there is less slope) for model states that disagree strongly with the observed value.

ref: Lorenc 1986

4.6 Numerical cost of least-squares analysis

In current operational meteorological models, the dimension of the model state (or, more precisely, of the control variable space) is of the order of , and the dimension of the observation vector (the number of observed scalars) is of the order of per analysis
3. Therefore the analysis problem is mathematically underdetermined (although in some regions it might be overdetermined if the density of the observations is larger than the resolution of the model). In any practical application it is essential to keep in mind the size of the matrix operators involved in computing the analysis (Fig. 4 ). The least-squares analysis method requires in principle the specification of covariance matrices and (or their inverses in the variational form of the algorithm) which respectively contain of the order of and distinct coefficients, which are statistics to estimate (the estimation of a variance or covariance statistic converges like the square root of the number of realizations). The explicit determination of requires the inversion of a matrix of size , which has an asymptotic complexity of the order of . The exact minimization of the cost function requires, in principle, evaluations of the cost function and its gradient, assuming is quadratic and there are no numerical errors (e.g. using a conjugate gradient method).


Figure 4 . Sketches of the shapes of the matrices and vector dimensions involved in an usual analysis problem where there are many fewer observations than degrees of freedom in the model: from top to bottom, in the equations of the linear analysis, the computation of , of the term, and the computation of the cost function .

It is obvious that, except in analysis problems of very small dimension (like one-dimensional retrievals), it is impossible to compute exactly the least-squares analysis. Some approximations are necessary, they are the subject of the following sections.

4.7 Conclusion

We have seen that there are two main ways of defining the statistical analysis problem:
    either assume that the background and error covariances are known, and derive the analysis equations by requiring that the total analysis error variances are minimum,
    or assume that the background and observation error pdfs are Gaussian, and derive the analysis equations by looking for the state with the maximum probability.
Both approaches lead to two mathematically equivalent algorithms:
    the direct determination of the analysis gain matrix ,
    the minimization of a quadratic cost function.

These algorithms have very different numerical properties, and their equivalence stops as soon as some underlying hypotheses are not verified, like the linearization of the observation operator, for instance.


Training Course Notes Front Page >>
Table of contents >>
Next Section >>
Previous Section >>




1 Both qualifiers tangent and linear are needed: obviously could be linear without satisfying the Taylor formula. A function can also be tangent to another without being linear, if the difference between them is an , e.g. and are tangent to each other for .
2 Qualitatively speaking they all belong to a neighbourhood of having a shape and size which is consistent with the and error covariances.
3 At ECMWF in winter 1998 the control variable dimension was 512000, the number of active observations (per 6-hour interval) was about 150000



 

Top of page 03.12.2001
 
   Page Details         © ECMWF
shim shim shim