M. Statistics applications and forecast verification

1. Background

The RAP Verification Group continued to provide independent verification of improved aviation weather forecasting systems developed at NCAR and other laboratories. RAP works closely with other verification groups [e.g., the Real-Time Verification System group at NOAA's Forecast Systems Laboratory (FSL)] to evaluate the forecasting capabilities of experimental products and products being considered for operational use. A major study in 2002 involved evaluation of the Integrated Turbulence Forecasting Algorithm, which is going through the NWS and FAA approval process.

Because aviation weather forecasts are presented in varying formats and frequencies, and because the phenomena of concern (e.g., icing, turbulence) can be difficult to observe, the Verification Group put a great deal of effort toward developing methods and learning how to use the available observations appropriately. In addition to these concerns, it often is difficult to find meaningful verification measures that provide useful information for forecast users and developers. Many of the verification issues are pervasive in meteorological forecasting and have become more important as forecast grids have become finer in scale. To help cope with these issues, the RAP Verification Group co-hosted a workshop on verification, titled "Making Verification More Meaningful," at which many of these issues were discussed. In addition, the Verification Group continues to work on development of improved verification approaches for convective and precipitation forecasts. The workshop and development of an object-based verification approach for convective/precipitation forecasts are the subjects of the following two subsections.

B. Brown and T. Fowler also are involved in a different type of application of statistics to atmospheric sciences, in collaboration with E. Tollerud at FSL. Their study concerns development of an approach to identify changes (i.e., inhomogeneities) in precipitation observations. Because daily observations are used in many applications (some of which are economically sensitive), it is important to alert users if characteristics of the observations change due to an unreported change in station location, growth of vegetation around the precipitation gauge, or some other factor. The third subsection considers ongoing work on this study.

2. Workshop on "Making Verification More Meaningful"

This workshop was conceived as a way to bring together verification experts who have similar problems (e.g., difficult observations, gridded forecasts, a need for operationally meaningful metrics), to provide opportunities for discussion of specific issues, and to develop new collaborations. The workshop was organized by Barbara Brown, Tressa Fowler, and Agnes Takacs, all of RAP, in collaboration with Jennifer Mahoney of FSL. In addition, members of the RAP Verification Group (J. Braid, R. Bullock, and M. Chapman) and RAP administrative staff (I. Gallo and C. Park) facilitated the workshop preparations and event.




Figure 1. Attendees at the Workshop on Making Verification More Meaningful.

The workshop proved to be more popular than anticipated, with approximately 90 participants, including meteorologists, hydrologists, statisticians, mathematicians, researchers, and operational staff members from several countries, from weather services, universities, and research institutes (Figure 1). The workshop included several components: invited speakers, contributed talks, a poster session, working group meetings and reports, and a panel discussion. These components focused on three general themes: User and Operational Issues; Scaling and Observations; and Advanced Methods (including ensemble methods). Most of the individual and working group presentations are available on the workshop web page (http://www.rap.ucar.edu/research/verification/ver_wkshp1.html), along with the workshop program. A summary report on the workshop also is available. A number of conclusions can be drawn from the workshop presentations and discussions:

  • Users of verification information need to be taken into consideration when designing verification approaches and measures - each user may need a particular kind of information about forecast quality, which is likely to differ from other users' needs.
  • Operationally-relevant metrics are needed along with meteorologically-relevant metrics; each type serves a different purpose.
  • Scale issues (e.g., observation scale vs. forecast scale) must be taken into account in verification studies. Scale separation approaches are available and should be applied.
  • Current verification methods for spatial forecasts provide only limited information about the quality of these forecasts; new object- or field-based approaches show promise for providing more useful information.
  • Observational uncertainty limits how well we can measure forecast quality; ideally, observational uncertainty should be taken into account in verification studies, but this is very difficult to accomplish and should be a subject of research.
  • Additional educational opportunities regarding statistics and verification should be made available, through atmospheric science curricula, short courses, and web-based material. Future workshops on this topic would be desirable and were requested by many of the attendees.

3. New verification approaches for precipitation and convective weather

Standard approaches for verification of forecasts of convection and precipitation generally have relied on overlaying grids of observations and grids of forecasts; the individual grid values for the two fields are compared and statistics such as the POD (Probability of Detection), FAR (False Alarm Ratio), and CSI (Critical Success Index) are computed. Unfortunately, these measures do not indicate what was wrong with a forecast (e.g., its location, size, or timing), so they offer little guidance for improving the forecasts. Moreover, these measures can unfairly penalize forecasts that should be considered "good" (e.g., a forecast area located adjacent to the observed area has no skill according to these measures).
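
The penalty on a displaced forecast can be illustrated with a small sketch. The grids, the rain threshold, and the function name `contingency_scores` below are made up for illustration and are not taken from the group's software; the score definitions are the usual ones based on hits, misses, and false alarms.

```python
def contingency_scores(forecast, observed, threshold):
    """Compare two grids cell by cell and return (POD, FAR, CSI)."""
    hits = misses = false_alarms = 0
    for f_row, o_row in zip(forecast, observed):
        for f, o in zip(f_row, o_row):
            f_yes, o_yes = f >= threshold, o >= threshold
            if f_yes and o_yes:
                hits += 1
            elif o_yes:
                misses += 1
            elif f_yes:
                false_alarms += 1
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    return pod, far, csi


# Hypothetical 4 x 8 grids: observed rain in columns 2-3.
observed = [[0, 0, 5, 5, 0, 0, 0, 0] for _ in range(4)]
# Forecast with the right size and shape, displaced to the adjacent columns 4-5.
adjacent = [[0, 0, 0, 0, 5, 5, 0, 0] for _ in range(4)]
# Forecast displaced much farther, to columns 6-7.
distant = [[0, 0, 0, 0, 0, 0, 5, 5] for _ in range(4)]

near_scores = contingency_scores(adjacent, observed, threshold=1)
far_scores = contingency_scores(distant, observed, threshold=1)
print("adjacent forecast (POD, FAR, CSI):", near_scores)
print("distant forecast  (POD, FAR, CSI):", far_scores)
```

In this toy case the adjacent and the distant forecasts receive identical scores, because grid-point matching sees no overlap in either case.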

In response to concerns about the limitations of standard approaches for verification of convective and precipitation forecasts, B. Brown, R. Bullock, and C. Mueller of RAP, along with C. Davis (MMM), K. Manning (MMM), and R. Morss (MMM and ESIG), are developing an object-based approach for these evaluations. The goals of this project are to develop and test new approaches for verification of convective and precipitation forecasts; characterize precipitation/convective regions in a "natural" way; tie the verification method development to user studies; and apply the approaches developed to nowcasts and NWP forecasts.

The proposed approach is an adaptable method that is based on attributes of precipitation objects/shapes and their associated precipitation values. It will provide the capability to answer a variety of questions about the forecasts, observations, and their relationship. Specifically, the approach involves several steps: (1) define the relevant precipitation/convective objects and shapes; (2) diagnose errors in the location, shape, orientation, size, timing, and other attributes of the forecasts; and (3) characterize basic attributes of the precipitation/convection within the objects (e.g., intensity and density). In parallel, R. Morss is investigating users' needs for precipitation information and information about precipitation forecast quality, through interviews with flood control managers, emergency managers, and water resource managers in the Colorado Front Range.

One approach for identifying a forecast region of interest is shown in Figure 2. In this approach, the forecast region is "smoothed" using a convolving disk. A threshold value is then applied to filter out regions that are not of interest. The same approach would be applied to the observations. Because the original values on the grid are still available, it is possible to directly compare characteristics of the values inside the objects. An example of such a comparison is shown in Figure 3.
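
The smoothing and thresholding steps can be sketched as follows. This is a minimal illustration, not the group's implementation: the disk radius, the threshold, the treatment of off-grid cells as zero, and all function names are assumptions chosen for the example.

```python
def disk_offsets(radius):
    """Grid offsets (di, dj) that fall inside a disk of the given radius."""
    r = int(radius)
    return [(di, dj)
            for di in range(-r, r + 1)
            for dj in range(-r, r + 1)
            if di * di + dj * dj <= radius * radius]


def convolve_with_disk(field, radius):
    """Replace each value by the disk average (assumption: off-grid cells count as zero)."""
    n_rows, n_cols = len(field), len(field[0])
    offsets = disk_offsets(radius)
    smoothed = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            total = 0.0
            for di, dj in offsets:
                ii, jj = i + di, j + dj
                if 0 <= ii < n_rows and 0 <= jj < n_cols:
                    total += field[ii][jj]
            smoothed[i][j] = total / len(offsets)
    return smoothed


def object_mask(field, radius, threshold):
    """True where the smoothed field reaches the threshold."""
    smoothed = convolve_with_disk(field, radius)
    return [[value >= threshold for value in row] for row in smoothed]


def values_inside(field, mask):
    """Original (unsmoothed) grid values inside the object mask."""
    return [v for f_row, m_row in zip(field, mask)
            for v, m in zip(f_row, m_row) if m]


# Hypothetical 7 x 7 field: a 3 x 3 rain area plus one isolated wet cell.
field = [[0.0] * 7 for _ in range(7)]
for i in range(2, 5):
    for j in range(2, 5):
        field[i][j] = 9.0
field[0][6] = 9.0

mask = object_mask(field, radius=1, threshold=4.0)
inside = values_inside(field, mask)
print("cells in object:", sum(sum(row) for row in mask))
print("original values inside object:", inside)
```

In this example the isolated wet cell is smoothed below the threshold and dropped, while the larger area survives as an object whose original values remain available for comparison.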

Figure 2. Example of an approach to defining "objects" in a precipitation field: (a) the original precipitation field from the Weather Research and Forecasting (WRF) modeling system; (b) the smoothed WRF precipitation field after a convolving disk has been applied; and (c) the final field after a threshold has been applied to the convolved field.

Figure 3. An example of a defined (bandaid) object applied to an observed (Stage 4 precipitation) field and to a WRF precipitation field: (a) objects applied to the original (convolved and thresholded) fields, with original (upper plots) and optimized (lower right) forecast location and orientation; and (b) distributions of observed and forecast precipitation values inside the matched shapes.

Ongoing work on this study will include developing and testing the ability to match forecast and observed objects. One aspect of this area of research will involve investigating the scale of predictability of different types of phenomena. In addition, the applications of the research will expand to include nowcasts as well as the numerical weather prediction forecasts considered thus far. This work was presented at the USWRP Science Symposium in April, at the Workshop on Making Verification More Meaningful in July, and at the WWRP International Conference on Quantitative Precipitation Forecasting in September.

4. Detecting inhomogeneities in precipitation observations

Detecting actual changes in the amount or frequency of precipitation received at a United States cooperative observer network (COOP) station requires eliminating apparent "changes" that are the result of instrument drift, alterations in method of measurement and/or reporting, modification of the station's surroundings, etc. Change point detection is very challenging even when the measurements possess nice statistical properties such as normality, continuity, and homogeneity of variance. However, precipitation data do not possess nice statistical properties. In fact, the occurrences of precipitation can be relatively infrequent and when precipitation does occur, the measurements tend to have a skewed distribution. For these sorts of measurements, use of standard change point methods is not recommended.

Fortunately, the COOP network is fairly dense. Each station has several neighbors also measuring precipitation. This study [by B. Brown, T. Fowler, and E. Tollerud (FSL)] takes advantage of these neighbors to develop an alternative approach for detecting inhomogeneities. In particular, the frequency and amount of precipitation at each station are compared with those of its neighbors for each month over the entire period of record. Thus the empirical distribution and time series of various measures of association between stations are obtained. New measurements can be compared to these to determine if the recent measures are "typical" or not. If not, the station can be flagged as possibly having experienced a change, and further checks can be performed.
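
One simple way to judge whether a new score is "typical" is to locate it within the empirical distribution of the station's own past scores. The sketch below is only an illustration under stated assumptions: the one-sided lower-tail test, the 5% cutoff, the function names, and the score values are hypothetical and are not taken from the study.

```python
def fraction_at_or_below(history, value):
    """Empirical fraction of historical scores that are <= value."""
    return sum(1 for h in history if h <= value) / len(history)


def is_atypical(history, new_score, tail=0.05):
    """Flag a score that falls in the lower tail of the station's history.

    Assumption: only unusually LOW association with neighbors is flagged,
    and the 5% tail is an arbitrary illustrative cutoff.
    """
    return fraction_at_or_below(history, new_score) <= tail


# Hypothetical past scores of association between a station and its neighbors.
history = [0.62, 0.58, 0.66, 0.60, 0.64, 0.59, 0.63, 0.61, 0.65, 0.57]

print(is_atypical(history, 0.31))  # well below every past score
print(is_atypical(history, 0.60))  # within the historical range
```

A flag raised this way would only mark the station for the further checks mentioned above; it does not by itself identify the cause.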

Data from spring seasons (April - June) were analyzed at all stations in Iowa. In many ways, this set of data is easy to analyze because precipitation is plentiful in this area during the spring. Additionally, the terrain in Iowa is relatively uniform and the COOP network is fairly dense. However, the precipitation in Iowa during the spring months tends to be convective in nature. Thus, the precipitation may be very localized.

The equitable threat score (ETS) is one measure of a station's relationship to its neighbors. Figure 4 shows a time series plot of ETS for a particular station in Iowa. The scores are relatively random in nature until the last three seasons, when they attain their lowest values. The cause of this apparent inhomogeneity has yet to be determined. However, the behavior of the scores in the last three seasons clearly differs from the behavior of the scores in the preceding seasons.
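
For reference, the ETS can be computed from paired yes/no precipitation events as sketched below. The formula is the standard one (hits adjusted for the number expected by chance); treating the target station as the "forecast" and a neighbor as the "observation" is an assumption made for this illustration, as are the function name and the sample data.

```python
def equitable_threat_score(target_events, neighbor_events):
    """ETS for paired yes/no events (assumption: target plays the 'forecast' role)."""
    pairs = list(zip(target_events, neighbor_events))
    n = len(pairs)
    hits = sum(1 for t, o in pairs if t and o)
    false_alarms = sum(1 for t, o in pairs if t and not o)
    misses = sum(1 for t, o in pairs if o and not t)
    # Number of hits expected if the two series were unrelated.
    hits_random = (hits + misses) * (hits + false_alarms) / n
    denominator = hits + misses + false_alarms - hits_random
    if denominator == 0:
        return float("nan")  # undefined, e.g. when no events occur at all
    return (hits - hits_random) / denominator


# Hypothetical ten-day records (1 = precipitation reported, 0 = none).
target = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
neighbor = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]

score = equitable_threat_score(target, neighbor)
print(round(score, 3))
```

Here there are 3 hits, 1 miss, and 1 false alarm in 10 days, so 1.6 hits are expected by chance and the score is (3 - 1.6) / (5 - 1.6).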

Figure 4. Time series plot of Equitable Threat Score for each spring season 1950-2001 for a station in Iowa.

 


Figure 5. Time series plot of Equitable Threat Score for 33 seasons of simulated data. About half of the observations from the last season were replaced with observations of no precipitation.

 

Precipitation observations were simulated in order to test the efficacy of the methods on data with known (i.e., constructed) inhomogeneities. Figure 5 shows the time series of ETS for seven simulated stations. The frequency of precipitation is different at each station and is indicated by the color of the line. The inhomogeneity was the same at each station: about half (44%) of the observations from the final season were replaced with observations of no precipitation. Note the dramatic change in ETS for the last season for all stations except the station with extremely rare precipitation events (probability of precipitation p = 1%).
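
An experiment of this kind can be sketched as follows. This is not the simulation procedure used in the study: the shared "regional" event series, the 2% chance that a station disagrees with it, the four neighbors, the 91-day season, the pooling of neighbor comparisons, and all names are assumptions for illustration. Only the 33 seasons and the 44% replacement rate come from the text.

```python
import random


def ets(target, neighbor):
    """Equitable threat score for paired yes/no events; NaN if undefined."""
    pairs = list(zip(target, neighbor))
    hits = sum(1 for t, o in pairs if t and o)
    false_alarms = sum(1 for t, o in pairs if t and not o)
    misses = sum(1 for t, o in pairs if o and not t)
    hits_random = (hits + misses) * (hits + false_alarms) / len(pairs)
    denominator = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denominator if denominator else float("nan")


def simulate_station(p_event, n_seasons=33, days=91, n_neighbors=4,
                     p_flip=0.02, p_dropout=0.44, seed=0):
    """Return one ETS value per season; the last season has a known inhomogeneity."""
    rng = random.Random(seed)
    scores = []
    for season in range(n_seasons):
        # Assumed model: every station sees the same regional events,
        # except for occasional independent disagreements (p_flip).
        regional = [rng.random() < p_event for _ in range(days)]

        def observe():
            return [wet != (rng.random() < p_flip) for wet in regional]

        target = observe()
        neighbors = [observe() for _ in range(n_neighbors)]
        if season == n_seasons - 1:
            # Constructed inhomogeneity: replace about 44% of the target's
            # observations with "no precipitation" in the final season.
            target = [wet and rng.random() >= p_dropout for wet in target]
        # Pool the target-versus-neighbor comparisons over all neighbors.
        pooled_target = target * n_neighbors
        pooled_neighbors = [v for nb in neighbors for v in nb]
        scores.append(ets(pooled_target, pooled_neighbors))
    return scores


for p in (0.30, 0.01):  # hypothetical precipitation frequencies
    series = simulate_station(p)
    print(f"p = {p:.2f}: last three seasons:",
          ", ".join(f"{s:.2f}" for s in series[-3:]))
```

With frequent precipitation the final-season score should fall well below the earlier ones; with very rare events a season may contain too few events for the score to be informative, or even defined.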

More complete analyses of the Iowa and simulated data can be found in Tollerud et al. (2002) and Fowler et al. (2003), respectively. These analyses confirm that changes of various types, including inhomogeneities, may be indicated by changes in the scores.

While much progress has been made in the detection of inhomogeneities, much work remains to be completed. The scores have only been used on the spring measurements from Iowa and one set of simulated data. Other states and seasons must be investigated to determine how well the methods work in less ideal circumstances. The simulation procedure assumed that there was very good event agreement between the target station and its neighbors. Further research will include investigation of the scores computed on simulated data with less agreement between neighbors. It is likely that less correlation among neighbors will result in the scores being less sensitive to inhomogeneities. Additionally, all of the analyses so far have treated the seasons separately. Attempts are being made to homogenize the measurements from different seasons, so that precipitation totals from all seasons may be considered together rather than separately, thus yielding a larger sample size in the same amount of time. Certainly, not all inhomogeneities will be detectable by these methods. However, the focus of the future research will be to determine what can be detected, i.e., how large a change must be and how soon after it occurs the methods will detect it.

 

References

Tollerud, E. I., B. G. Brown, and T. L. Fowler, 2002: Identifying inhomogeneities in precipitation time series: 1. Diagnostic measures of spatial correlation. 13th Conference on Applied Climatology, Portland, OR, May 12-16.

Fowler, T. L., E. I. Tollerud, and B. G. Brown, 2003: You've Changed! Inhomogeneity detection for COOP network precipitation measurements. 7th Conference on Integrated Observing Systems, Long Beach, CA, February 9-13.

 
