The primary goal of the PURE Research Programme is to improve the assessment and quantification of uncertainty and risk in natural hazards, to the benefit of scientific progress, risk management and decision-making. High wind speed is an important natural hazard for the UK and northwest Europe, as evidenced by the series of damaging storms which struck these regions, and Britain in particular, during the winter of 2013/14. Ensemble forecasts made by operational numerical weather prediction (NWP) models are increasingly used to quantify the likelihood that a given storm event, storm loss or high wind speed will occur. But how well do ensemble forecasts of wind speed (through their spread of outcomes) represent the true forecast uncertainty at a given lead time? Have ensemble forecasts of wind speed been calibrated well enough to provide accurate probabilistic information? These questions are relevant to all user applications of ensemble wind speed forecasts, including wind power generation, storm loss prediction and storm surge prediction. They are also fundamental to the main PURE objective of offering improved assessment of uncertainty.
The above questions are being examined within the RACER consortium of PURE by myself and Dr Adam Lea (both UCL Department of Space and Climate Physics) and by Professor Richard Chandler (UCL Department of Statistical Science). We are assessing ensemble forecasts of European wind speed made by eight state-of-the-art numerical weather prediction models archived in the TIGGE (THORPEX Interactive Grand Global Ensemble) database. These models have between 14 and 50 ensemble members. Our assessment covers the European region 35°N-65°N, 10°W-30°E. We consider ensemble predictions out to a lead time of 10 days, with updates every 6 or 12 hours, and have included all forecasts between March 2007 and February 2014. Two different re-analysis datasets are employed for verification: ERA-Interim at a lat/long resolution of 0.75° x 0.75° and NASA-MERRA at a lat/long resolution of 0.5° x 0.667°.
We employ two tools, verification rank (VR) histograms and the Reliability Index, to assess the calibration of ensemble predictions. A VR histogram plots the distribution of the ranks that the verifications take when each is pooled with the ordered members of its ensemble forecast. A well calibrated ensemble forecast will produce a flat (uniform) VR histogram. The Reliability Index (RI) quantifies the deviation of a VR histogram from uniformity; a small RI shows good calibration while a large RI shows that uncertainty is poorly represented. Maps of RI have been made by grid cell across Europe for each forecast model, for four different lead times (24, 72, 144 and 240 hours), for five periods (the full year, winter, spring, summer and autumn), and with both ERA-Interim and NASA-MERRA used as the verification.
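The two tools described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study. The function names are hypothetical, ties between the verification and ensemble members are ignored, and the Reliability Index is assumed to take a commonly used form: the sum of absolute deviations of the rank frequencies from the uniform value 1/(m+1), expressed in percent, for an ensemble of m members. The synthetic data at the end are invented purely to show the behaviour.

```python
import numpy as np


def verification_ranks(ensemble, verification):
    """Rank (1..m+1) of each verification within its m-member ensemble.

    ensemble: array of shape (n_forecasts, m)
    verification: array of shape (n_forecasts,)
    Ties are ignored (assumed negligible for continuous wind speeds).
    """
    ensemble = np.asarray(ensemble, dtype=float)
    verification = np.asarray(verification, dtype=float)
    # Rank = 1 + number of ensemble members strictly below the verification.
    return 1 + np.sum(ensemble < verification[:, None], axis=1)


def rank_histogram(ranks, n_members):
    """Relative frequency of each rank 1..m+1 (the VR histogram)."""
    counts = np.bincount(ranks, minlength=n_members + 2)[1:]
    return counts / counts.sum()


def reliability_index(freqs):
    """Assumed RI: summed absolute deviation from uniformity, in percent."""
    freqs = np.asarray(freqs, dtype=float)
    return 100.0 * np.sum(np.abs(freqs - 1.0 / len(freqs)))


# Synthetic demonstration (invented data, 20 members as in the example model).
rng = np.random.default_rng(0)
n_forecasts, n_members = 5000, 20
truth = rng.normal(0.0, 1.0, n_forecasts)

# Calibrated ensemble: members drawn from the same distribution as the truth.
calibrated = rng.normal(0.0, 1.0, (n_forecasts, n_members))
# Under-dispersive ensemble: spread is half of the true uncertainty.
underdispersive = rng.normal(0.0, 0.5, (n_forecasts, n_members))

for name, ens in [("calibrated", calibrated), ("under-dispersive", underdispersive)]:
    freqs = rank_histogram(verification_ranks(ens, truth), n_members)
    outside = freqs[0] + freqs[-1]  # verifications outside the ensemble spread
    print(f"{name}: RI = {reliability_index(freqs):.1f}, outside = {100 * outside:.1f}%")
```

With this definition a perfectly flat histogram gives an RI of zero, while an ensemble whose spread is too narrow piles verifications into the first and last ranks and gives a large RI.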
Figure 1: Verification rank histogram of ensemble wind speed forecasts from a well-known NWP model for a grid cell centred 100 km southeast of Paris.
Figure 2: Reliability Index map for Europe quantifying how well uncertainty is represented in ensemble forecasts of wind speed at a lead of 72 hours made by the same well-known NWP model employed in Figure 1. The higher the RI value, the worse the representation of uncertainty.
Figures 1 and 2 show examples which typify how well we find ensemble forecasts of wind speed to represent uncertainty. These examples employ forecasts from a well-known global NWP model having 20 ensemble members. They refer to a forecast lead time of 72 hours and include 10,200 forecasts made over six years. ERA-Interim data are used for verification. Figure 1 displays a VR histogram for a verification location near Paris. The figure shows that for this grid cell 30% of all wind speed forecasts (the summed percentage occurrence of ranks 1 and 21) verified outside the ensemble spread. Clearly the ensemble spread of forecast wind speed under-represents the true forecast uncertainty by a sizeable margin. Figure 2 uses the Reliability Index to map across Europe how well forecast uncertainty is represented at a lead of 72 hours. The RI value for the near-Paris location used in Figure 1 is 35-40. Much of Europe has RI values comparable to or greater than 35-40, indicating that ensemble forecasts of wind speed made with this leading model are poorly calibrated over most of the continent. Only over sea areas is the RI value lower (typically 20-25).
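To put the 30% figure in context, a short worked calculation helps. It assumes a perfectly calibrated ensemble whose m = 20 members are statistically indistinguishable from the verification, so that each of the m + 1 = 21 ranks is equally likely; the symbol m is introduced here for illustration only.

```latex
\[
P(\text{rank } 1 \text{ or } 21)
  = \frac{2}{m+1}
  = \frac{2}{21}
  \approx 9.5\%,
\qquad
\frac{30\%}{9.5\%} \approx 3.15 .
\]
```

Under that assumption, verifications near Paris fall outside the ensemble spread roughly three times as often as they would for a well calibrated ensemble.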
Our findings show that leading ensemble forecasts of European wind speed often represent uncertainty poorly. In general the mis-calibration is worst at shorter lead times and improves at longer lead times. Our findings are repeatable across Europe using different re-analysis verification datasets and are consistent across seasons. Although performance differs between NWP models, with some much better or worse than others, one has to conclude, regrettably, that current ensemble forecasts of wind speed are not well calibrated for most land areas across Europe. The probabilistic information which they provide is therefore likely to be inaccurate for users. An important goal to which PURE can contribute will therefore be to improve the calibration of ensemble wind speed forecasts.