Forecast Evaluation for Fun and Profit

Harold E. Brooks

NOAA/National Severe Storms Laboratory

Harold.Brooks@noaa.gov

Allan Murphy defined three types of forecast “goodness”:

1.  Consistency:  The correspondence between the forecaster’s true beliefs and the actual forecast.

2.  Quality:  The correspondence between the forecast and the observations.

3.  Value:  The incremental benefit users gain from using the forecasts in making decisions.

Weather forecasts are made by a wide variety of forecasters and systems for a wide variety of purposes and a wide variety of users.  Unfortunately, they are frequently evaluated narrowly and unsystematically.  This diminishes their value both for users, who want to improve how they apply the forecasts, and for forecasters, who want to improve the quality and usability of the forecasts.

It is perhaps only a slight oversimplification to say that, in many cases, so little evaluation is done because people think it is hard to do “right.”  In fact, it is hard to do completely, but a great deal of insight into a forecast system can be gained without an enormous effort.

Fundamentally, evaluating the quality of forecasts involves trying to estimate the joint probability distribution of forecasts and events [p(f,x)].  For most problems of interest, this is a very high-dimensional problem.  Summarizing the results of a forecast verification problem in one or a small number of values throws away most of the information in the problem.
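A minimal sketch of what estimating p(f,x) involves, assuming probability forecasts issued in a few discrete categories for a yes/no event (observation 1 = event occurred).  If the forecast can take n_f distinct values and the observation n_x values, the joint distribution has n_f × n_x − 1 free parameters, because all the cells must sum to one; probabilities issued in tenths (11 values) for a yes/no event already give 21 dimensions.  The calibration-refinement factorization p(f,x) = p(x|f) p(f) is one standard way to split the joint distribution into interpretable pieces.  The function names are hypothetical and the ten forecast/observation pairs are made up for illustration.

```python
from collections import Counter
from fractions import Fraction


def joint_distribution(forecasts, observations):
    """Estimate p(f, x) as relative frequencies of (forecast, observation) pairs.

    Only pairs that actually occur appear as keys; absent cells have probability 0.
    Fractions keep the estimates exact, so the cells sum to exactly 1.
    """
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need a non-empty set of paired forecasts and observations")
    n = len(forecasts)
    counts = Counter(zip(forecasts, observations))
    return {pair: Fraction(count, n) for pair, count in counts.items()}


def dimensionality(n_forecast_values, n_observed_values):
    """Free parameters of the joint distribution: number of cells minus one."""
    return n_forecast_values * n_observed_values - 1


def calibration_refinement(joint):
    """Factor p(f, x) = p(x | f) p(f) for a binary event (x in {0, 1}).

    Returns p(f), the refinement distribution, and p(x = 1 | f), the
    calibration (observed event frequency given each forecast value).
    """
    p_f = {}
    for (f, _x), p in joint.items():
        p_f[f] = p_f.get(f, Fraction(0)) + p
    p_event_given_f = {f: joint.get((f, 1), Fraction(0)) / p_f[f] for f in p_f}
    return p_f, p_event_given_f


# Hypothetical probability forecasts and yes/no observations (1 = event).
forecasts = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5, 0.9, 0.9]
observations = [0, 0, 0, 1, 1, 0, 1, 0, 1, 1]

joint = joint_distribution(forecasts, observations)
p_f, p_event_given_f = calibration_refinement(joint)

print(dimensionality(11, 2))  # forecasts in tenths, yes/no event -> 21
for f in sorted(p_f):
    print(f"f = {f}: p(f) = {p_f[f]}, p(x=1 | f) = {p_event_given_f[f]}")
# f = 0.1: p(f) = 2/5, p(x=1 | f) = 1/4
# f = 0.5: p(f) = 2/5, p(x=1 | f) = 1/2
# f = 0.9: p(f) = 1/5, p(x=1 | f) = 1
```

Reading p(x=1|f) against f is one way to look inside the joint distribution: in this made-up sample, the 0.1 forecasts were followed by the event a quarter of the time, something a single summary score would not reveal.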

Measuring the value of forecasts also requires knowing the user’s decision-making process, which makes it even more complicated at one level.  At another level, however, learning about users’ decisions leaves forecasters better equipped to make forecasts that serve those users’ needs.  These users may be end users of forecasts or human forecasters using guidance products.
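Value can only be computed once a decision model is assumed.  One simple, commonly used choice, assumed here rather than taken from this summary, is the static cost-loss model: a user can protect at cost C, or risk a loss L if the event occurs while unprotected, and protects whenever the forecast probability exceeds C/L.  Value is then expressed on a scale where using only the climatological base rate scores 0 and perfect forecasts score 1.  The sketch uses made-up data and hypothetical function names.

```python
def mean_expense(forecasts, observations, cost, loss):
    """Average expense per occasion for a user who protects when p > cost/loss.

    Protecting costs `cost` whether or not the event occurs; not protecting
    costs `loss` if the event occurs and nothing otherwise.
    """
    threshold = cost / loss
    total = 0.0
    for p, x in zip(forecasts, observations):
        if p > threshold:
            total += cost
        elif x == 1:
            total += loss
    return total / len(forecasts)


def relative_value(forecasts, observations, cost, loss):
    """Expense saved relative to climatology, as a fraction of what perfect forecasts save.

    1 means as good as perfect forecasts, 0 means no better than using the
    base rate alone, and negative values mean the forecasts cost the user money.
    """
    if not 0 < cost < loss:
        raise ValueError("the model assumes 0 < cost < loss")
    base_rate = sum(observations) / len(observations)
    e_climate = min(cost, base_rate * loss)  # always or never protect, whichever is cheaper
    e_perfect = base_rate * cost             # protect only when the event will occur
    if e_climate == e_perfect:
        raise ValueError("value is undefined when perfect forecasts save nothing")
    e_forecast = mean_expense(forecasts, observations, cost, loss)
    return (e_climate - e_forecast) / (e_climate - e_perfect)


# Hypothetical probability forecasts and yes/no observations (1 = event).
forecasts = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5, 0.9, 0.9]
observations = [0, 0, 0, 1, 1, 0, 1, 0, 1, 1]

for cost, loss in [(1, 2), (1, 5)]:
    value = relative_value(forecasts, observations, cost, loss)
    print(f"C/L = {cost / loss:.1f}: relative value = {value:.2f}")
# C/L = 0.5: relative value = 0.40
# C/L = 0.2: relative value = -0.20
```

The same forecasts are worth something to a user with C/L = 0.5 and cost money for a user with C/L = 0.2, mainly because the 0.1 forecasts in this sample understate how often the event occurred.  Under this model, value depends on the user as much as on the forecasts.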

I’ll summarize why it is important to explore the dimensionality of the forecast evaluation problem, rather than just looking at a single measure.  I hope this will provide insight into how and, more importantly, why to take more sophisticated approaches to forecast evaluation.
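As a small illustration of how a single measure can hide differences, consider the simplest case: yes/no forecasts of a yes/no event.  The joint distribution is then a 2×2 contingency table whose four cells sum to one, so it has three dimensions and no single score can describe it.  The sketch below uses made-up counts for two hypothetical forecast systems verified against the same 80 events in 1000 cases; the class name is hypothetical, while POD, FAR, frequency bias, and CSI are standard measures computed from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """Counts from verifying yes/no forecasts of a yes/no event."""

    hits: int           # event forecast, event observed
    false_alarms: int   # event forecast, not observed
    misses: int         # not forecast, event observed
    correct_nulls: int  # not forecast, not observed

    def joint(self):
        """Relative frequencies estimating p(f, x); keys are (forecast, observed)."""
        n = self.hits + self.false_alarms + self.misses + self.correct_nulls
        return {
            (1, 1): self.hits / n,
            (1, 0): self.false_alarms / n,
            (0, 1): self.misses / n,
            (0, 0): self.correct_nulls / n,
        }

    def pod(self):
        """Probability of detection: fraction of events that were forecast."""
        return self.hits / (self.hits + self.misses)

    def far(self):
        """False alarm ratio: fraction of yes forecasts not followed by the event."""
        return self.false_alarms / (self.hits + self.false_alarms)

    def bias(self):
        """Frequency bias: yes forecasts divided by observed events."""
        return (self.hits + self.false_alarms) / (self.hits + self.misses)

    def csi(self):
        """Critical success index: hits / (hits + false alarms + misses)."""
        return self.hits / (self.hits + self.false_alarms + self.misses)


# Two hypothetical systems verified over the same 1000 cases containing 80 events.
system_a = ContingencyTable(hits=50, false_alarms=20, misses=30, correct_nulls=900)
system_b = ContingencyTable(hits=60, false_alarms=40, misses=20, correct_nulls=880)

for name, table in [("A", system_a), ("B", system_b)]:
    print(f"{name}: CSI={table.csi():.3f} POD={table.pod():.3f} "
          f"FAR={table.far():.3f} bias={table.bias():.3f}")
# A: CSI=0.500 POD=0.625 FAR=0.286 bias=0.875
# B: CSI=0.500 POD=0.750 FAR=0.400 bias=1.250
```

Both systems have a CSI of 0.5, yet B detects more events at the price of more false alarms and overforecasts the event, while A underforecasts it.  Which is preferable depends on the user.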