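A reliability diagram (Wilks, 1995) is one in which the observed frequency
of the event is plotted against the forecast probability, for some finite
bin width. In a perfectly reliable system the observed frequency equals the
forecast probability in every bin, so the graph is a straight line oriented
at 45° to the axes and the reliability term is zero. Reliability measures
the mean square distance of the graph from the diagonal line.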
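As an illustration of how the points of a reliability diagram can be computed, the following is a minimal sketch in plain Python. It assumes a list of forecast probabilities and a matching list of 0/1 outcomes; the function name `reliability_curve` and the default bin width of 0.1 are choices made for this example, not taken from the notes.

```python
def reliability_curve(p_forecast, occurred, bin_width=0.1):
    """Return one (mean forecast probability, observed frequency, count)
    tuple per non-empty probability bin."""
    n_bins = int(round(1.0 / bin_width))
    sums_p = [0.0] * n_bins   # sum of forecast probabilities per bin
    sums_o = [0.0] * n_bins   # number of occurrences per bin
    counts = [0] * n_bins     # number of forecasts per bin
    for p, o in zip(p_forecast, occurred):
        # Small tolerance guards against floating-point round-off at bin
        # edges; a forecast of exactly 1.0 is folded into the last bin.
        k = min(int(p * n_bins + 1e-9), n_bins - 1)
        sums_p[k] += p
        sums_o[k] += o
        counts[k] += 1
    return [
        (sums_p[k] / counts[k], sums_o[k] / counts[k], counts[k])
        for k in range(n_bins)
        if counts[k] > 0
    ]


# Toy sample: low probabilities when E did not occur, high when it did.
points = reliability_curve([0.05, 0.05, 0.95, 0.95], [0, 0, 1, 1])
for mean_p, obs_freq, n in points:
    print(f"forecast {mean_p:.2f}  observed {obs_freq:.2f}  n={n}")
```

Plotting the second element of each tuple against the first gives the reliability curve; the counts give the histogram of forecast probabilities that usually accompanies it.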
The uncertainty term on the right-hand side of (49)
ranges from 0 to 0.25. If E was either so common, or so rare, that
it either always occurred or never occurred within the sample of years studied,
then the uncertainty is 0; conversely, if E occurred 50% of the time within
the sample, then the uncertainty takes its maximum value of 0.25.
Uncertainty is a function of the climatological frequency of E, and is
not dependent on the forecasting system itself. It can be shown that the
resolution of a perfect deterministic system is equal to the uncertainty.
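The statements about uncertainty and resolution can be checked numerically. The sketch below assumes the standard Murphy decomposition of the Brier score into reliability, resolution and uncertainty (Brier score = reliability - resolution + uncertainty), with forecasts grouped by their distinct probability values; the function name is hypothetical and the notation may differ from that of Eq. (49).

```python
from collections import defaultdict


def murphy_decomposition(p_forecast, occurred):
    """Return (brier, reliability, resolution, uncertainty).

    Forecasts are grouped by distinct probability value, for which
    brier == reliability - resolution + uncertainty holds exactly.
    """
    n = len(p_forecast)
    groups = defaultdict(list)
    for p, o in zip(p_forecast, occurred):
        groups[p].append(o)

    o_bar = sum(occurred) / n  # sample climatological frequency of E
    rel = sum(len(obs) * (p - sum(obs) / len(obs)) ** 2
              for p, obs in groups.items()) / n
    res = sum(len(obs) * (sum(obs) / len(obs) - o_bar) ** 2
              for obs in groups.values()) / n
    unc = o_bar * (1.0 - o_bar)
    brier = sum((p - o) ** 2 for p, o in zip(p_forecast, occurred)) / n
    return brier, rel, res, unc


# Perfect deterministic system: probability 1 when E occurs, 0 otherwise.
outcomes = [1, 0, 0, 1]
brier, rel, res, unc = murphy_decomposition([float(o) for o in outcomes], outcomes)
print(brier, rel, res, unc)
```

For this perfect deterministic sample the Brier score and reliability term vanish and the resolution equals the uncertainty, which is 0.25 because E occurs in half of the cases.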
Fig.
13 shows two examples of reliability diagrams for the ECMWF EPS taken
over all day-6 forecasts from December 1998 - February 1999 over Europe
(cf. Fig. 8). The two events are lower tropospheric temperature being
at least 4°C, and at least 8°C, greater than normal.
The Brier score, Brier skill score, and Murphy decomposition
are shown on the figure.
To see why probability forecasts of the 4°C event
have higher Brier skill scores than probability forecasts of
the 8°C event, consider Eq. (57). From
Fig. 13, whilst
is the same for both events,
is larger for
than for
. This can be seen by comparing the histograms of
in Fig. 13, which are
more highly peaked for the 8°C event than for the 4°C event;
there is less dispersion of the probability forecasts of the more extreme
event about its climatological frequency, than the equivalent probability
forecasts of the more moderate event. This is hardly surprising; the more
extreme event
is relatively rare (its climatological frequency is
) and most of the time is forecast with probabilities which almost always
lie in the first probability category (
). In order to increase the Brier skill score of this relatively extreme event,
one would need to increase the ensemble size so that finer probability categories
can be reliably defined. (For example, suppose an extreme event has a climatological
probability of occurrence of
Let us suppose that we want to be able to forecast probabilities of this
event which can discriminate between probability categories with a band
width comparable with this climatological frequency, then the ensemble size
should be
.) With finer probability categories, the resolution component of the Brier
score can be expected to increase. Providing reliability is not compromised,
this will lead to higher overall skill scores.
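The ensemble-size argument in the parenthetical example can be sketched as simple arithmetic. An N-member ensemble can only issue the probabilities 0, 1/N, 2/N, ..., 1, so its finest probability band has width 1/N. Under the assumption that "comparable with the climatological frequency" means a band width no larger than that frequency, the ensemble size is of order one over the climatological frequency. The function names below are hypothetical and the rule is an order-of-magnitude guide only.

```python
import math


def attainable_probabilities(n_members):
    """Probabilities an n-member ensemble can issue (fraction of members)."""
    return [m / n_members for m in range(n_members + 1)]


def min_ensemble_size(clim_freq):
    """Smallest N whose probability spacing 1/N does not exceed clim_freq.

    Assumes the band width should be no larger than the climatological
    frequency; the tiny offset guards against floating-point round-off.
    """
    return math.ceil(1.0 / clim_freq - 1e-9)


print(attainable_probabilities(4))
print(min_ensemble_size(0.25))
```

With this reading, an event that occurs a quarter of the time needs at least 4 members, and the required size grows in inverse proportion as the event becomes rarer.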
The relative operating characteristic (ROC; Swets, 1973; Mason, 1982;
Harvey et al., 1992) is based on the forecast assumption that E will occur, provided
E is forecast by at least a fraction
of ensemble members, where the threshold
is defined a priori by the user. As discussed below, optimal
can be determined by the parameters of a simple decision model.
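The construction of a ROC curve from ensemble probabilities can be sketched as follows. For each threshold, E is forecast to occur when the fraction of members predicting E is at least the threshold; the hit rate and false alarm rate are then computed from the resulting contingency table, and the area under the curve is estimated with the trapezoidal rule. The function names and the choice of thresholds at multiples of 0.1 are assumptions made for this example.

```python
def roc_points(p_forecast, occurred, thresholds):
    """Return (false alarm rate, hit rate) for each probability threshold.

    Requires at least one occurrence and one non-occurrence of E.
    """
    n_occ = sum(occurred)
    n_non = len(occurred) - n_occ
    points = []
    for t in thresholds:
        hits = sum(1 for p, o in zip(p_forecast, occurred) if p >= t and o)
        false_alarms = sum(1 for p, o in zip(p_forecast, occurred)
                           if p >= t and not o)
        points.append((false_alarms / n_non, hits / n_occ))
    return points


def roc_area(points):
    """Trapezoidal area under the ROC curve, with end points (0,0) and (1,1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (f0, h0), (f1, h1) in zip(pts, pts[1:]):
        area += (f1 - f0) * (h0 + h1) / 2.0
    return area


thresholds = [k / 10 for k in range(1, 11)]
# Forecasts that carry no information about E give the no-skill area.
no_skill = roc_points([0.5, 0.5, 0.5, 0.5], [0, 1, 0, 1], thresholds)
print(roc_area(no_skill))
```

A forecast system with no discrimination lies on the diagonal of the ROC diagram and has area 0.5, while a perfect deterministic system reaches the top-left corner and has area 1.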
We illustrate in Fig.
14 the application of these measures of skill to a set of multi-model
multi-initial condition ensemble integrations made over the seasonal timescale
(Palmer et al., 2000).
The event being forecast is
:- the seasonal-mean (December-February) 850 hPa temperature anomaly will
be below normal. The global climate models used in the ensemble are the
ECMWF model, the UK Meteorological Office Unified Model, and two versions
of the French Arpège model; the integrations were made as part of
the European Union project "Prediction of Climate Variations on Seasonal to
Interannual Timescales" (PROVOST). For each of these models, 9-member ensembles were
run over the boreal winter season for the period 1979-1993 using observed
specified SSTs. The values
and
have been estimated from probability bins of width 0.1. The ROC curve and
corresponding A value are shown for the 9-member ECMWF model ensemble,
and for the 36-member multi-model ensemble. It can be seen that in both
cases, A is greater than the no-skill value of 0.5; however, the
multi-model ensemble is more skilful than the ECMWF-model ensemble. Studies
have shown that the higher skill of the multi-model ensemble arises mainly
because of the larger ensemble size, but also because of a sampling of the
pdf associated with model uncertainty.