FAA Summary Project Report

A Comparison of Hail Detection Algorithms

31 January 1995

Cathy J. Kessinger and Edward A. Brandes

Research Applications Program
National Center for Atmospheric Research*
P.O. Box 3000
Boulder, CO 80307

*NCAR is sponsored by the National Science Foundation


1.0 Introduction

"A United Express jet flying into a hailstorm was forced to return to Stapleton International Airport shortly after takeoff Saturday [1 October 1994] when hail shattered the plane's windshield and injured both crew members" (The Sunday Camera, 1994). This recent encounter illustrates just one potential hazard from hailstorms for aircraft. Damage to airfoils can significantly degrade aircraft performance by a loss of lift. Windshields can be cracked or, as in the case above, shattered. Hail ingestion has been identified as the primary cause of in-flight engine shutdowns (FSF News, 1993). A hail detection algorithm that provides timely and accurate warnings could have substantial economic benefit. For these reasons, the National Center for Atmospheric Research (NCAR) Research Applications Program (RAP) has undertaken a two year project to evaluate three reflectivity-based hail detection algorithms. This report summaries our findings.

The Joint Systems Project Office (JSPO) Next Generation Radar (NEXRAD) hail algorithm and the National Severe Storms Laboratory (NSSL) Hail Detection Algorithm (HDA) are selected for evaluation. The NSSL HDA has two components, the Probability of Severe Hail (POSH) and the Probability of Hail (POH). The NEXRAD algorithm is currently in use at National Weather Service (NWS) offices with a Weather Surveillance Radar-1988 Doppler (WSR-88D) installation. The NSSL algorithms are planned replacements of the NEXRAD algorithm. Both the NEXRAD and POSH algorithms were developed using data from Great Plains thunderstorms, while the POH algorithm was developed from data taken in central Switzerland. Evaluation of algorithm performance in other climatic regimes should determine what, if any, regional biases exist in the algorithm designs.

To ensure a data set adequate for verification, RAP conducted a Hail Project in the High Plains of northeastern Colorado during the summer months of 1992 and 1993. Documentation of precipitation type and hailstone sizes and characteristics comprises a "ground truth" verification data set for comparison with the predictions from the three hail algorithms. Statistical quantities are calculated to evaluate algorithm skill and performance at increasing hail size thresholds.


2.0 Algorithm Descriptions

2.1 NEXRAD Algorithms

The NEXRAD Hail algorithm identifies storms that are currently producing or will soon produce hail via a reflectivity-based determination of storm characteristics (Petrocchi, 1982; Smart and Alberty, 1984; Smart, 1985; Smart and Alberty, 1985). The NEXRAD Hail algorithm is one of six algorithms that constitute the Storm Sequence Algorithm (SSA) to define storm characteristics based on radar reflectivity.

2.1.1 Storm Sequence Algorithm (SSA)

The Storm Sequence Algorithm (SSA) constructs storm "segments" along a radial wherever the reflectivity is >30 dBZ and spans a distance of at least 5 km. Within a segment, "dropouts" may occur where the reflectivity data are below the threshold for a specified, small distance. Segments are combined into two-dimensional (2-d) storm components when sufficient overlap exists with adjacent segments, as determined by proximity of the centroids. To be defined a "storm," vertical correlation must exist between 2-d storm components.
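The segment construction described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the NEXRAD implementation: the gate spacing (`gate_km`), the dropout allowance (`max_dropout_gates`), and the function name `find_segments` are hypothetical, since the report gives only the 30 dBZ threshold and the 5 km minimum length.

```python
from typing import List, Tuple


def find_segments(dbz: List[float],
                  gate_km: float = 1.0,          # assumed gate spacing
                  threshold_dbz: float = 30.0,   # from the text
                  min_length_km: float = 5.0,    # from the text
                  max_dropout_gates: int = 2     # assumed dropout allowance
                  ) -> List[Tuple[int, int]]:
    """Return (first_gate, last_gate) index pairs of segments on one radial."""
    segments = []
    start = None      # first gate of the segment currently being built
    last_hit = None   # most recent gate above the threshold
    for i, value in enumerate(dbz):
        if value > threshold_dbz:
            if start is None:
                start = i
            last_hit = i
        elif start is not None and i - last_hit > max_dropout_gates:
            # The dropout is too long: close the segment at the last good gate.
            segments.append((start, last_hit))
            start = None
    if start is not None:
        segments.append((start, last_hit))
    # Keep only segments that span the minimum length.
    return [(a, b) for a, b in segments
            if (b - a + 1) * gate_km >= min_length_km]
```

With a 1 km gate spacing, a run of seven gates above 30 dBZ interrupted by a single weaker gate is kept as one segment, while an isolated two-gate run is discarded.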

2.1.2 Hail Algorithm

The NEXRAD Hail algorithm checks the geometry of the storm as defined by the SSA for hail indicators (Table 1) and assigns an appropriate weight. When an indicator is satisfied, it is assigned as positive. When an indicator is not satisfied or cannot be tested, it is assigned as probable. Weights are accumulated as the "Sum of Positive Weights" and as the "Sum of Probable Weights." The Confidence Factor (CFA) and the Score (SCR) are calculated by

CFA = 100 - Sum of Probable Weights, and
SCR = (Sum of Positive Weights/CFA) x 100.

The four outcomes for the NEXRAD Hail algorithm are (Smart, 1985)

Hail:               CFA > 50 and SCR > 60,
Probable Hail:      25 < CFA < 50 or 50 < SCR < 60,
No Hail:            CFA > 25 and SCR < 50, and
Insufficient Data:  CFA < 25.

Additionally, a storm with maximum reflectivity >70 dBZ that is not labeled as a Hail storm is designated as a Probable Hail storm. Figure 1 illustrates a model hailstorm as defined by NEXRAD.
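The scoring and labeling steps above can be sketched in Python as follows. This is a hedged illustration, not the operational code. The published criteria overlap (for example, a storm with CFA = 40 and SCR = 30 meets both the Probable Hail and the No Hail conditions) and do not say how values exactly on a boundary are treated, so the sketch assumes the outcomes are tested in order of severity and that boundary values fall in the Probable Hail class. It also assumes that the >70 dBZ rule applies to every storm not labeled Hail. The function name `nexrad_hail_label` is hypothetical.

```python
def nexrad_hail_label(sum_positive: float, sum_probable: float,
                      max_dbz: float = 0.0) -> str:
    """Label one storm from its accumulated indicator weights."""
    cfa = 100.0 - sum_probable                 # Confidence Factor
    if cfa < 25.0:
        label = "Insufficient Data"
    else:
        scr = sum_positive / cfa * 100.0       # Score
        if cfa > 50.0 and scr > 60.0:
            label = "Hail"
        elif cfa <= 50.0 or scr >= 50.0:
            # Assumption: Probable Hail is tested before No Hail.
            label = "Probable Hail"
        else:
            label = "No Hail"
    # Assumption: the reflectivity override applies to any non-Hail label.
    if max_dbz > 70.0 and label != "Hail":
        label = "Probable Hail"
    return label
```

For example, positive weights of 70 with probable weights of 10 give CFA = 90 and SCR of about 78, which is labeled Hail.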

2.2 NSSL Algorithms

The two components of the NSSL Hail Detection Algorithm (HDA) have different hail size criteria (Witt, 1990). The Probability of Severe Hail (POSH) estimates the probability of hail >19 mm in diameter, the definition of severe hail in the NWS. The Probability of Hail (POH) is based on a method described by Waldvogel et al. (1979) and predicts the probability of hail of any size. Both algorithms use storm characteristics defined by the NSSL Storm Cell Identification and Tracking (SCIT) algorithm (Witt and Johnson, 1993) to determine the likelihood of hail. The SCIT differs from the NEXRAD SSA because it examines the higher reflectivity regions that are typically located aloft during the early stages of storm development.

2.2.1 Storm Cell Identification and Tracking (SCIT)

The NSSL SCIT algorithm and the HDA are run in conjunction. The SCIT computes storm segments along a radial by application of seven reflectivity threshold levels (60, 55, 50, 45, 40, 35, and 30 dBZ), in descending order. At each elevation angle, segments from a given reflectivity threshold are combined into 2-d storm components after application of proximity and area constraints. Storm centroids are computed for each of the seven sets of 2-d storm components. The centroid of the storm component derived from the highest reflectivity threshold is retained. Vertical correlation of the 2-d storm components defines a storm "cell" and is determined through an iterative process that correlates horizontal positions of storm centroids at increasing heights. A 5 km horizontal influence radius is applied initially, then increased to 7.5 km, and finally to 10 km to achieve correlation. Once storm cells are identified, a storm motion vector is calculated from temporal correlation of storm centroids.

2.2.2 Probability of Severe Hail (POSH)

For each storm cell, the Hailfall Kinetic Energy (Waldvogel et al. 1978a,b; Waldvogel and Schmid 1982) is calculated as

Equation

where the reflectivity weighting factor, W(Z), is defined

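$$W(Z) = \begin{cases} 0, & Z \le Z_1 \\ \dfrac{Z - Z_1}{Z_2 - Z_1}, & Z_1 < Z < Z_2 \\ 1, & Z \ge Z_2 \end{cases}$$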

with Z in dBZ, the Hailfall Kinetic Energy in J m-2 s-1, Z1=40 dBZ, and Z2=50 dBZ. The 10 dBZ difference (Z2-Z1) defines the transition of precipitation type from only rain (Z<Z1) to only hail (Z>Z2). To reduce the influence of seasonal and latitudinal lapse rate variations, a temperature weighting factor, WT(H), is computed from a nearby sounding as

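$$W_T(H) = \begin{cases} 0, & H \le H_0 \\ \dfrac{H - H_0}{H_{-20} - H_0}, & H_0 < H < H_{-20} \\ 1, & H \ge H_{-20} \end{cases}$$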

where H is the height above ground level (all heights AGL). H0 and H-20 are the heights of the 0°C and -20°C isotherms, respectively, as determined from the 12 UTC Denver sounding each day. Note that the weight is applied only at temperatures <0°C (H>H0) and that the maximum weight occurs at temperatures <-20°C (H>H-20).
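Both weighting factors behave as ramps that rise from 0 to 1 between two bounds. The Python sketch below assumes that the transition between the bounds is linear; the text states only where each weight begins and where it reaches its maximum. The function names and the example isotherm heights are hypothetical.

```python
def linear_ramp(x: float, lower: float, upper: float) -> float:
    """0 at or below `lower`, 1 at or above `upper`, linear in between (assumed)."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    return (x - lower) / (upper - lower)


def reflectivity_weight(z_dbz: float, z1: float = 40.0, z2: float = 50.0) -> float:
    """W(Z): 0 for rain-only reflectivity, 1 for hail-only reflectivity."""
    return linear_ramp(z_dbz, z1, z2)


def temperature_weight(h_km: float, h0_km: float, hm20_km: float) -> float:
    """WT(H): 0 below the 0 C isotherm height, 1 above the -20 C isotherm height."""
    return linear_ramp(h_km, h0_km, hm20_km)
```

With isotherm heights of 3 km and 6 km, for instance, a 45 dBZ echo at 4.5 km would receive a reflectivity weight of 0.5 and a temperature weight of 0.5.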

The Severe Hail Index (SHI) is calculated by

Equation ,

where N is the number of 2-d storm components within each storm cell and the Hailfall Kinetic Energy is calculated using the maximum reflectivity within each 2-d storm component.

The POSH is calculated from the SHI and the SHI warning threshold (SWT). The SWT (Fig. 3) is calculated from the height of the 0°C isotherm as

Equation .

Values of the SWT that are <20 are set to 20. For this study, the SWT is determined daily using the 12 UTC sounding from Denver. For a given SHI value and SWT, POSH is calculated as

Equation .

where NINT is the FORTRAN intrinsic function that rounds a floating-point value to the nearest integer. Values of POSH <0 are set to 0 and values of POSH >100 are set to 100. The POSH is incremented at 10% intervals. Normalizing the POSH by the SWT places all environmental conditions within a common context. Notice that for SHI=SWT the value for POSH is 50%. As used by NSSL, values of the POSH >50% designate "Hail" while values of POSH <50% designate "No Hail." A specified number of storm cells, ranked by the maximum POSH values, are retained for output and display. For this report, the maximum number of storm cells processed at one time is 20.
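The properties stated above (an SWT floor of 20, POSH = 50% when SHI = SWT, limits of 0 and 100, and 10% steps) can be illustrated with the Python sketch below. The numerical coefficients are assumptions: the defaults (a slope of 57.5 and an offset of 121 for the SWT with the 0°C isotherm height in km, and a factor of 29 on the natural logarithm of SHI/SWT) are taken from the later published form of the NSSL algorithm and should be checked against the original equations. Rounding to the nearest 10% and returning 0 for a non-positive SHI are likewise assumptions, and the function names are hypothetical.

```python
import math


def severe_hail_warning_threshold(h0_km: float,
                                  slope: float = 57.5,    # assumed coefficient
                                  offset: float = 121.0,  # assumed coefficient
                                  floor: float = 20.0     # from the text
                                  ) -> float:
    """SWT from the 0 C isotherm height; values below the floor are raised to it."""
    return max(floor, slope * h0_km - offset)


def posh_percent(shi: float, swt: float, scale: float = 29.0) -> int:
    """POSH in percent, in 10% steps, limited to the range 0-100."""
    if shi <= 0.0:
        return 0                                  # assumed handling of zero SHI
    raw = scale * math.log(shi / swt) + 50.0      # gives 50 when SHI == SWT
    stepped = 10 * math.floor(raw / 10.0 + 0.5)   # nearest 10% (assumed)
    return int(min(100, max(0, stepped)))
```

Because the ratio SHI/SWT is the only input to the probability, a given POSH value carries the same meaning on days with different freezing-level heights.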

2.2.3 Probability of Hail (POH)

Using storm cell designations from the SCIT algorithm, the POH algorithm computes the height of the 45 dBZ echo above the 0°C isotherm height (H45-H0) and applies this difference to the probability curve in Fig. 4. A positive indication for hail begins when (H45-H0) is >1.4 km, with a greater difference indicating a higher probability of hail. The algorithm outputs a percent probability of hail that varies from 0 to 100% in 10% increments. Unlike the POSH, the POH predicts the probability of hail of any size. The POH is retained for the same 20 storm cells as POSH.

2.3 NCAR Thunderstorm Identification, Tracking, Analysis, and Nowcasting (TITAN) Program

The NCAR TITAN program (Dixon and Weiner, 1993) allows rapid perusal of radar reflectivity data as well as identification of storms, computation of storm motions, storm tracking, and estimation of storm tendencies for growth or dissipation. Since storm identification information is not retained from either the NEXRAD SSA or NSSL SCIT algorithms, the TITAN storm outlines at 30 and 40 dBZ reflectivity thresholds define two of the five influence regions used in this study to define the area where verification data are matched to algorithm output. Radii from the verification report define the remaining influence regions. Influence regions are discussed more fully in the next section.

TITAN operates within a Cartesian reference frame having 1 km horizontal and vertical grid spacing and a 300 X 300 X 20 km domain. Cartesian volumes are constructed at the conclusion of each 6 min radar volume with a time stamp applied at the midpoint of the collection volume. Storm identification is accomplished by application of a minimum reflectivity threshold, minimum volume requirements, and minimum lifetimes. A storm outline is constructed using Cartesian (x, y) coordinates.

Further, TITAN allows the concurrent display of verification reports, algorithm predictions, and radar reflectivity data. This capability is invaluable in finding and correcting hail verification reports that have spatial or temporal errors as well as verifying that all algorithms are operational.


3.0 Verification Methodology

3.1 Data collection

The NCAR/RAP Hail Project was conducted in northeastern Colorado during the months of June and July in 1992 and 1993 (Fig. 5). The Hail Project was contained within a larger research effort named the Real-time Analysis and Prediction of Storms (RAPS-92 and RAPS-93), an umbrella project covering all summer convective research within RAP. Neilley et al. (1993) describe these research efforts and the deployment of additional instrumentation.

The operations center for RAPS 92-93 was the Aviation Weather Development Laboratory (AWDL) housed at RAP. Reflectivity data from the Mile High Radar (MHR), a NEXRAD prototype radar located 15 km northeast of Denver Stapleton International Airport, were used for testing the hail algorithms. Radar characteristics are listed in Pratte et al. (1991). Two hail intercept teams were vectored to storms of interest for documentation of precipitation type and the size and characteristics of the hailstones. For RAPS-93, Global Positioning System (GPS) units were placed in the hail cars, providing precise locations at 15-20 s intervals. The GPS units were a distinct advantage over the handwritten navigational documentation used in 1992. Freedom from navigational documentation allowed the intercept crews to document hail characteristics more frequently, ideally at 1 min intervals. A Volunteer Observing Network (VHN) was formed from area high school and junior high school teachers and students and the public. Other observations were provided by the Mountain States Weather Services (MSWS) and the Denver NWS office.

Field operations were conducted Monday through Saturday from 18 UTC to the end of weather activity (typically 01 UTC). Sunday was an operational day if weather conditions were forecast as favorable for hail. An 18 UTC forecast of convective development was given before operations each day. After deployment of the intercept cars, the Hail Coordinator vectored them to storms of interest. As rain or hail was encountered, documentation procedures were begun for precipitation type and intensity (light, moderate, heavy), the minimum, maximum, and average hailstone size (mm), hail density (number of stones/m3), hail depth, stone shape (round, flat, conical), stone color (clear, milky), stone hardness (hard, mushy), and whether damage to vegetation or property occurred. Photographic documentation of the hailstones was made. Detailing similar information about rain or hail occurrence, members of the VHN and MSWS network mailed their reports to NCAR. Information gained from the VHN and MSWS network had less temporal resolution than that from the intercept cars since their reports were for the event as a whole rather than for the evolution of the event. NWS reports typically contain the maximum hail size, location and the time the report was received.

MHR data were transmitted into the AWDL and to the Forecast Systems Laboratory (FSL) via a high speed telephone link and ingested into the respective computer networks. The NSSL algorithms were installed by NSSL on a RAP computer in the AWDL. The algorithms were run during operations and the results displayed. The NEXRAD algorithm was run at FSL and the output given to RAP at the conclusion of each field season.

3.2 Data editing

After each summer field program, the handwritten observations were converted to ASCII computer files. For RAPS-92, the intercept car positions were determined from topographic maps. Location, time of occurrence, and the precipitation documentation described above were included in the files. The verification and algorithm outputs were written using a similar format, ingested into TITAN and overlaid onto the MHR data. Verification reports were checked for spatial and temporal errors. If a verification report was not located within precipitation echo (i.e., a temporal error) or its location within the storm was in error (i.e., severe hail occurring in a low reflectivity region at the edge of the storm instead of near the maximum reflectivity region), its location and time were rechecked and corrected. If a position or time error could not be resolved satisfactorily, the report was deleted. NWS reports were especially prone to temporal errors since public reports were often received many minutes after the event.

Each storm with verification data was characterized as either a "Hailstorm" or a "Rainstorm" for each radar volume. A ±3 min time window centered on the verification time was selected for matching algorithm predictions to precipitation reports because the collection time for a radar volume was 6 min. Many storms contained multiple observations of either hail or rain or both, necessitating the selection of one verification report for retention. For a hailstorm, the hail observation judged most representative was retained: the report closest to the maximum reflectivity region (>45 dBZ) at the lowest TITAN Cartesian level (either 2 or 3 km MSL, depending on range) and with the largest hailstone size was kept. Rainstorms (i.e., "no hail") test the ability of an algorithm to predict the non-occurrence of hail and improve the distribution of observed events. However, characterizing a rainstorm must be done carefully since it is easily argued that the observer was not in the right position to encounter the hail swath. For this reason, rules applied to rainstorms were more stringent than those for hailstorms. First, when the maximum reflectivity region was >45 dBZ at the lowest Cartesian level, the observer location was required to be within the maximum reflectivity as determined by TITAN at 1 km grid spacing, and the times of the rain report and the TITAN radar analysis were required to correspond within 1 min. Second, when the maximum reflectivity contour at the lowest level was <45 dBZ, the rain and radar observations were required to be within 1 km and ±3 min. Third, for stratiform rain situations, only those rain observations within an isolated maximum reflectivity contour were retained. Examples of hail and rain event selection are shown in Fig. 6.

Defining hail or rainstorms in this manner means the statistical analysis is "observation driven" since only those storms with verification data are used. Not all storms can be characterized, as would be optimal for determination of the population characteristics. For this reason, statistical results are dependent on the characteristics of the observations, such as their distribution. Inclusion of rain events improves the distribution of observed event categories and the statistical results.

3.3 Correlation Program

Using both temporal and spatial boundaries, a "correlation" program matches the edited verification data with the algorithm predictions. The confines of the temporal and spatial boundaries are termed an "influence region." Five methods are used to define the influence region for matching a verification report with an algorithm prediction. The influence regions are determined by the distance from the verification report at 5, 10, and 15 km influence radii and by the TITAN storm outline at 30 and 40 dBZ thresholds.
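The radius-based influence regions can be sketched in Python as below. This is a simplified illustration, not the correlation program itself: it covers only the 5, 10, and 15 km radii (the TITAN storm-outline regions would need a point-in-polygon test that is omitted here), it reuses the ±3 min window from Section 3.2, and the `Report` and `Prediction` records and the function name are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Report:
    """One verification report in Cartesian coordinates (hypothetical layout)."""
    x_km: float
    y_km: float
    t_min: float


@dataclass
class Prediction:
    """One algorithm prediction for a storm cell (hypothetical layout)."""
    x_km: float
    y_km: float
    t_min: float
    probability: float


def predictions_in_region(report: Report, predictions: List[Prediction],
                          radius_km: float,
                          half_window_min: float = 3.0) -> List[Prediction]:
    """Return the predictions inside the influence region of one report."""
    matched = []
    for p in predictions:
        distance = math.hypot(p.x_km - report.x_km, p.y_km - report.y_km)
        in_space = distance <= radius_km
        in_time = abs(p.t_min - report.t_min) <= half_window_min
        if in_space and in_time:
            matched.append(p)
    return matched
```

Widening the radius from 5 to 10 km can only add predictions to the matched set, which is one reason performance may depend on the influence region chosen.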

Due to the 30 dBZ threshold, the NEXRAD SSA typically identifies large regions that may contain multiple storm cells with reflectivity >50 dBZ. This is especially true in large squall lines and is illustrated by a TITAN 30 dBZ storm outline (Fig. 7a). The smaller size of the 40 dBZ outline is shown in Fig. 8a. The NSSL SCIT algorithm identifies individual storm cells within a storm on a 5-10 km spatial scale. Algorithm performance is expected to be a function of the influence region because of the different storm identification techniques used. All algorithms are evaluated using the five methods.

3.4 Contingency Table Program and Statistical Quantities

After the correlation program was run for all storm days using the five methods described above, the "contingency table" program placed the matched verification and algorithm predictions into the cells of a contingency table for statistical analysis (Table 2). The Critical Success Index (CSI), Probability of Detection (POD) or prefigurance, False Alarm Ratio (FAR), the Frequency of Misses (FOM), the Heidke Skill Score (HSS) and the Mean Square Error (MSE) are calculated from the contingency table (Donaldson et al. 1975; Stanski et al. 1989; Doswell et al. 1990; Harvey et al. 1992). The POD, FAR, CSI and FOM are calculated as

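$$\mathrm{POD} = \frac{\mathrm{YY}}{\mathrm{YY} + \mathrm{NY}}, \quad \mathrm{FAR} = \frac{\mathrm{YN}}{\mathrm{YY} + \mathrm{YN}}, \quad \mathrm{CSI} = \frac{\mathrm{YY}}{\mathrm{YY} + \mathrm{NY} + \mathrm{YN}}, \quad \mathrm{FOM} = \frac{\mathrm{NY}}{\mathrm{YY} + \mathrm{NY}},$$

where YY is the number of hits, NY is the number of misses, and YN is the number of false alarms (Table 2).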

While somewhat redundant, the FOM is included to contrast algorithm declaration of a miss versus a false alarm. Notice that when the algorithm correctly identifies a no hail event (NN), the event does not contribute to the CSI, POD, FAR or FOM. The inclusion of the rain events as discussed in Section 3.2 contributes to the evaluation by measuring algorithm effectiveness in no hail events.
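The four categorical scores can be computed from the contingency table counts with the short Python sketch below. It uses the standard definitions of these scores with YY as hits, NY as misses, and YN as false alarms. The function name is hypothetical, and returning `nan` for an empty denominator is an assumption made for the sketch.

```python
from typing import Dict


def categorical_scores(yy: int, ny: int, yn: int) -> Dict[str, float]:
    """POD, FAR, CSI and FOM from hits (yy), misses (ny) and false alarms (yn)."""
    def ratio(numerator: float, denominator: float) -> float:
        # Assumption: an undefined score is reported as nan.
        return numerator / denominator if denominator else float("nan")

    return {
        "POD": ratio(yy, yy + ny),        # fraction of observed hail detected
        "FAR": ratio(yn, yy + yn),        # fraction of hail predictions that failed
        "CSI": ratio(yy, yy + ny + yn),   # hits relative to all hail-related outcomes
        "FOM": ratio(ny, yy + ny),        # fraction of observed hail missed
    }
```

The NN count does not appear in any of the four ratios, and POD and FOM always sum to 1, which is the redundancy noted above.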

The Heidke Skill Score (HSS) is computed by

$$\mathrm{HSS} = \frac{R - E_c}{T - E_c}, \qquad E_c = \frac{C_y R_y + C_n R_n}{T}.$$

For the HSS, R is defined as the number of perfect forecasts, T is the total number of events, Cy is the positive column sum (see Table 2), Cn is the negative column sum, Ry is the positive row sum, Rn is the negative row sum, and Ec is the expected number of correct predictions from chance.

The Heidke Skill Score (HSS) uses the contingency table scores to test the skill of each algorithm above a standard which, for purposes of this report, is assumed to be "chance." In the HSS, forecasts that are correct on the basis of chance are removed. No correlation is assumed between the predicted and observed values. A perfect HSS score is +1. If the algorithm has the same skill as chance, then the HSS = 0. Negative values indicate fewer right predictions than chance.

Unlike the CSI, the Heidke Skill Score (HSS) uses the sum of the perfect forecasts (i.e., YY and NN) and cannot be computed if either cell is missing. Inclusion of rain events fills the NN cell of the contingency table, assuming the algorithm has a correct prediction of "no hail". Because the HSS requires both perfect forecasts (YY and NN) for calculation, it is a better method for evaluation than the CSI alone. However, for contingency tables, they give similar results.

The Mean Square Error (MSE) is computed from the contingency table as

$$\mathrm{MSE} = \frac{\mathrm{YN} + \mathrm{NY}}{T}$$

and measures the proportion of misclassified events (Harvey et al. 1992). The MSE is equivalent to (1 - Proportion Correct) where the Proportion Correct = (YY + NN)/T.
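A minimal Python sketch of the HSS and MSE calculations follows, using the standard Heidke formulation with the quantities defined above (R, T, Ec, and the row and column sums). Whether predictions occupy the rows or the columns of Table 2 does not affect Ec, because only products of matching marginal sums enter. The function names are hypothetical, and returning `nan` when T equals Ec is an assumption.

```python
def heidke_skill_score(yy: int, yn: int, ny: int, nn: int) -> float:
    """HSS = (R - Ec) / (T - Ec) for a 2 x 2 contingency table."""
    t = yy + yn + ny + nn                  # total number of events
    r = yy + nn                            # number of perfect forecasts
    predicted_yes, predicted_no = yy + yn, ny + nn
    observed_yes, observed_no = yy + ny, yn + nn
    # Expected number of correct predictions from chance.
    ec = (predicted_yes * observed_yes + predicted_no * observed_no) / t
    if t == ec:
        return float("nan")                # assumption: undefined score
    return (r - ec) / (t - ec)


def mean_square_error(yy: int, yn: int, ny: int, nn: int) -> float:
    """Proportion of misclassified events, equal to 1 - (YY + NN) / T."""
    return (yn + ny) / (yy + yn + ny + nn)
```

As a worked case, a table with YY = 40, YN = 10, NY = 20, and NN = 30 has R = 70 and Ec = 50, giving HSS = 0.4 and MSE = 0.3.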

Reliability diagrams (Stanski et al. 1989) are computed for the NSSL algorithms to illustrate the extent that the forecast probability matches the actual frequency that hail is observed. Diagrams are constructed with the forecast probability categories along the X-axis and the observed frequencies along the Y-axis. Reliability is shown by proximity of the curve to the bisecting 45° line. The 45° line indicates perfect forecasts of the probability of hail. When an algorithm over-forecasts (probabilities are too high), points are under the 45° line. For under-forecasts (probabilities are too low), the points are over the 45° line. Reliability is most accurate when a sufficient number of points is contained in each probability category.

Sharpness is defined from the distribution of points within each probability category (Stanski et al. 1989). A perfectly sharp algorithm has all forecasts in the 0% and 100% probability categories and acts as a binary flag for the prediction of the event. Sharpness increases as the number of forecasts in the extreme probability categories increases.
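The points of a reliability diagram and a simple sharpness measure can be computed as in the Python sketch below. It is an illustration only: the fraction of forecasts in the 0% and 100% categories is used here as one possible sharpness summary, which is an assumption rather than the definition in Stanski et al. (1989), and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def reliability_table(forecast_percent: Sequence[int],
                      hail_observed: Sequence[bool]
                      ) -> Dict[int, Tuple[int, float]]:
    """Map each forecast category to (number of forecasts, observed hail frequency)."""
    counts = defaultdict(int)
    hits = defaultdict(int)
    for category, observed in zip(forecast_percent, hail_observed):
        counts[category] += 1
        if observed:
            hits[category] += 1
    return {c: (counts[c], hits[c] / counts[c]) for c in sorted(counts)}


def sharpness_fraction(forecast_percent: Sequence[int]) -> float:
    """Fraction of forecasts in the extreme (0% and 100%) categories (assumed measure)."""
    extreme = sum(1 for c in forecast_percent if c in (0, 100))
    return extreme / len(forecast_percent)
```

A category whose observed frequency is lower than its forecast probability plots under the 45° line, indicating over-forecasting; the count in each category shows how much weight the point deserves.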

3.5 Statistical Analysis Methodology

Procedural rules governing the statistical analysis were:

1) For each radar volume, storms are classified as a Hailstorm or a Rainstorm based on the verification data. Only storms with verification data are selected. Section 3.2 discusses the rules used in the classification process. For this study, a total of 237 hail events and 95 rain events are included. See Fig. 6 for an illustration of event selection criteria.

2) Within multicellular storms or convective cell complexes, the NEXRAD Hail algorithm typically produces 1-3 predictions while the NSSL hail algorithms typically produce >3. Figures 7b and 8b show an example of this for a squall line. Having different numbers of algorithm predictions per storm creates problems with interpretation of the statistical results. Within the same storm, NEXRAD may predict "Hail" and "No Hail", while POH and POSH may predict a 10%, 40%, 50% and 100% probability of hail. To simplify the analysis, only 1 algorithm prediction at the level deemed most severe is kept per storm per radar volume. Similarly, only 1 verification report is kept per storm per radar volume, as discussed in Section 3.2. For NEXRAD, predictions are ranked by severity as "Hail," "Probable Hail," and "No Hail." All "Insufficient Data" designations and their corresponding verification data are removed from the data set. For POH and POSH, the maximum percent probability of hail is kept.

3) The correlation program matches the algorithm predictions to the verification data using the appropriate spatial and temporal boundaries. If either or both algorithms have no pairing with a verification report, a prediction is inserted at the lowest level, which, for NEXRAD, is a "No Hail" prediction and, for POH and POSH, is a 0% probability prediction. Predictions are inserted more frequently for the NEXRAD algorithm than the NSSL algorithms because fewer algorithm predictions are generated. Typically, POH and POSH are inserted when the maximum number of storm cells identified exceeds 20 and the verification is with a storm cell outside of the 20. In these cases, the storm cell typically has weaker reflectivity values than the 20 identified storm cells.

4) Only days with both NEXRAD and NSSL algorithm predictions are used. If one algorithm is not present for a short period during the day, the other algorithm and the verification report are deleted.

5) Because the NSSL algorithms yield a probability of hail ranging from 0 to 100%, determination of a "Hail" versus "No Hail" threshold is desired. To test for the appropriate threshold, increasing thresholds are applied at 10% intervals with those predictions at or above the threshold probability being designated as "Hail" while those below are "No Hail." For example, to test at the 50% probability level, algorithm predictions <50% are designated as a "No Hail" prediction while those at or above 50% are designated as a "Hail" prediction. At the 0% probability threshold, all algorithm predictions are set to "Hail" such that the CSI scores are actually the percentage of hail events. The seeming discontinuity between the 0% and 10% probability thresholds seen in most statistical quantities results from the discretization of the algorithm probability predictions into 10% intervals. Witt (1993) used >50% as the probability threshold in his performance evaluation of the POSH algorithm.

6) NEXRAD outputs "Hail" and "Probable Hail" designations. To test the added value of the "Probable Hail" prediction, the algorithm is scored two ways. In one test, "Probable Hail" predictions are designated as "No Hail." Test results are termed the NEXRAD Hail algorithm in subsequent figures and tables. In the second test, "Probable Hail" reports are considered "Hail" predictions and are termed the NEXRAD Hail + Probable Hail algorithm.

7) To test algorithm performance as hail size increases, the verification reports are thresholded at four levels: >0 mm, >6 mm, >13 mm, and >19 mm. Hail reports of 1-5 mm are typically small ice particles or graupel. Graupel is fairly common in Colorado due to the relatively low height of the wet bulb zero isotherm when compared to other geographical regions. During the test, hail reports below the size threshold being applied are changed to "No Hail." Too few reports of hail >25 mm prevent statistical analysis at larger hail sizes. The NWS identifies severe hail as >19 mm and only those reports are contained within Storm Data.
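Rules 2, 5, and 7 can be illustrated together with the Python sketch below. It is a sketch under stated assumptions rather than the program used in this study: the function names are hypothetical, a rain report is represented by a hail size of 0 mm, and the size test is taken literally as strictly greater than the threshold.

```python
from typing import Dict, Optional, Sequence

# Rule 2: NEXRAD predictions ranked by severity; "Insufficient Data" is dropped.
NEXRAD_SEVERITY = {"No Hail": 0, "Probable Hail": 1, "Hail": 2}


def most_severe_nexrad(labels: Sequence[str]) -> Optional[str]:
    """Keep the single most severe NEXRAD prediction for a storm and volume."""
    ranked = [label for label in labels if label in NEXRAD_SEVERITY]
    if not ranked:
        return None
    return max(ranked, key=lambda label: NEXRAD_SEVERITY[label])


def observed_hail(size_mm: float, size_threshold_mm: float) -> bool:
    """Rule 7: reports not above the size threshold count as "No Hail"."""
    return size_mm > size_threshold_mm   # assumption: strict inequality


def csi_by_probability_threshold(probabilities: Sequence[float],
                                 hail_sizes_mm: Sequence[float],
                                 size_threshold_mm: float
                                 ) -> Dict[int, Optional[float]]:
    """Rule 5: CSI at probability thresholds of 0, 10, ..., 100%."""
    scores: Dict[int, Optional[float]] = {}
    for threshold in range(0, 101, 10):
        yy = yn = ny = 0
        for prob, size in zip(probabilities, hail_sizes_mm):
            predicted = prob >= threshold          # at or above means "Hail"
            observed = observed_hail(size, size_threshold_mm)
            if predicted and observed:
                yy += 1
            elif predicted:
                yn += 1
            elif observed:
                ny += 1
        total = yy + yn + ny
        scores[threshold] = yy / total if total else None
    return scores
```

At the 0% threshold every prediction is "Hail," so the CSI reduces to the fraction of events that are hail events, as noted in rule 5.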


4.0 Discussion of Results

An activity summary from the RAPS-92 and RAPS-93 Hail Projects is compiled in Table 3. Five days from 1992 and 20 from 1993 are used. Sixteen days are characterized with severe and non-severe hail events, 4 days have only non-severe hail events, and 5 days have no hail events. A total of 97 hailstorms and 68 rainstorms have verification data. A storm, as defined by TITAN at a 40 dBZ threshold, may exist for multiple radar volumes and may contain multiple hail or rain events. The verification data classify the storm as a hailstorm or a rainstorm. Of the 97 hailstorms documented, a total of 237 hail events were recorded with the maximum hailstone having a diameter of 65 mm and the smallest hailstone having a diameter of a few millimeters. Of the 237 hail events, 193 are >6 mm, 102 are >13 mm, and 52 are >19 mm. From the 68 rainstorms, a total of 95 rain reports were collected. About half of the hail reports were collected by the NCAR hail intercept crews, with the sum of the VHN, NWS, and MSWS reports comprising the other half. The mobility of intercept crews ensured that a large number of storms were investigated.

The statistical analysis of the NEXRAD and NSSL hail algorithms for the five influence regions (5, 10, and 15 km influence radii, the 30 dBZ and 40 dBZ storm outlines, respectively) and 4 hail size categories is summarized in numeric form in Tables A1-A5 of Appendix A. In the following sections, results from the 15 km influence radius are discussed for the NSSL algorithms while results from the 30 dBZ influence region are used for the NEXRAD algorithm because these influence regions produced the maximum CSI scores for the respective algorithms. Except for the 5 km influence radius, which has the worst performance, statistical quantities are similar for all influence regions and differ by 0.15, at most. The similarity of the statistical results demonstrates a lack of sensitivity to influence region selection and may be a consequence of selecting the maximum algorithm predictions. The poor performance at a 5 km influence radius suggests that this spatial scale is less consistent with the cell and storm definitions used by the algorithms. The statistical quantities for the 15 km influence radius are plotted in Figs. 9-12.

4.1 NEXRAD Hail Algorithm Results

The CSI for the NEXRAD Hail algorithm (hereafter referred to as the NHail algorithm) varies from 0.22 to 0.48 for the four hail size categories (see Table A4 for the 30 dBZ influence region statistics). The maximum CSI occurs with the hail threshold >6 mm and the minimum CSI occurs for severe hail (>19 mm). Having the maximum CSI at a small hail size threshold is consistent with the algorithm design of detecting hail of any size. The POD scores range from 0.49 to 0.60 and increase as the lower bound for defining a hail event increases. The maximum HSS is 0.36 and occurs for hail sizes >6 mm. The minimum HSS value (0.18) occurs with severe hail. Values for the FAR include a minimum of 0.06 for hail >0 mm and a maximum of 0.75 for severe hail. The FOM decreases from 0.52 to 0.40 as the hail size threshold increases. The MSE has a maximum of 0.39 for all hail sizes and a minimum of 0.33 at a hail size threshold >6 mm.

The NEXRAD Hail + Probable Hail algorithm (hereafter referred to as the NHailPH algorithm) shows significant improvement in CSI over NHail for hail size categories >0 and >6 mm (Table A4) with CSI scores of 0.72 and 0.68, an increase of 0.25 and 0.20, respectively. The maximum CSI score for NHailPH occurs at >0 mm. At hail size categories >13 mm and >19 mm, the CSI scores for NHailPH decrease markedly from those at the small hail size thresholds but are greater than or equal to the NHail CSI scores, respectively. The improvement in CSI caused by inclusion of the Probable Hail prediction has been documented by Smart (1985) and Witt (1993). As might be expected from the NEXRAD SSA storm definition, the CSI scores for the 30 dBZ influence region are higher than those for the 40 dBZ influence region (see Tables A4 and A5). For the two smaller hail size thresholds, the 30 dBZ influence region outperforms the 15 km radius of influence region in CSI scores. The NHailPH POD scores are an improvement of 0.21 to 0.27 over the NHail scores. The FAR scores of NHail and NHailPH differ by <0.02 for all hail size thresholds. With a few exceptions, the statistical quantities show that the NHailPH has more skill than the NHail.

Smart (1985) evaluated the NHail and NHailPH algorithms in Colorado for severe and non-severe hailstorms. Rainstorm data were presented but were not included within the statistical calculations. His verification data set consisted of chase car observations and weather observations reported to the Denver NWS office. For comparison to the statistical results from this study, the rainstorm data are added to the Smart contingency tables and presented in Table 4. Further, the scoring methodology used in this study for the NHailPH algorithm is applied (see Section 3.5, Item 6). A difference between the two analyses is the time window applied to match the weather observations to the algorithm predictions. Smart used a 20 min time window centered on the observation time such that 1 observation might be counted as many as 5 times (a function of the radar volume collection time). This study uses a 6 min window centered on the observation time with each observation being counted once. However, chase crews tended to remain with a storm for an extended period of time taking frequent observations and producing a similar number of events per storm as in the Smart study. Smart included hailstones as small as 3-6 mm; therefore, the statistical results from the hail size threshold >6 mm should be comparable. After the modifications are made to the Smart study, good agreement is found with the results from this study with maximum differences of <0.21 among the CSI, POD, and FAR for both the NHail and NHailPH algorithms using the 30 dBZ influence region (compare Table 4 to Table A4). The CSI and POD scores found in this study are better than those found by Smart; FAR scores are comparable.

Using Oklahoma and Florida data, Witt (1993) evaluated the NHail and NHailPH algorithms for severe hailstorms and non-severe storms using Storm Data hail reports as the verification data set. All storm cells identified by the NEXRAD SSA algorithm were assigned a precipitation-type designation such that any storm cell without a corresponding hail report was assumed to be a non-severe (i.e., non-hail) storm. Witt used a 60 min time window (-45 to +15 min) with t=0 min being the time of the verification report. The rule set for filling the cells of the 2 x 2 contingency table varied over the 60 min time window. For the -15 to +5 min window, both hits (YY) and misses (NY) were accumulated in the contingency table. For the -45 to -16 and +6 to +15 min time intervals, only hits were recorded. False alarms (YN) were recorded at t=0 min whenever a positive algorithm indication of hail occurrence existed but no corresponding hail report was found. This scoring scheme tends to inflate the YY cell of the contingency table and favor higher CSI and POD scores and lower FAR scores, particularly for long-lived hailstorms. For the NHail and NHailPH algorithms, Witt found CSI=0.47 and 0.54, POD=0.66 and 0.84, and FAR=0.38 and 0.39, respectively. We attribute Witt's higher CSI scores (by 0.25 and 0.32), higher POD (by 0.06 and 0.03), and lower FAR (by 0.37 and 0.38) principally to the inflation of the YY cell in the contingency table.
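The direction of this effect can be checked with a minimal sketch that uses the standard 2 x 2 contingency table definitions POD = YY/(YY+NY), FAR = YN/(YY+YN), and CSI = YY/(YY+YN+NY). The cell counts below are hypothetical and are chosen only to show what happens when extra hits are added while the YN and NY cells are held fixed; they are not taken from Witt (1993) or from this study.

```python
def scores(yy, yn, ny):
    """Return (POD, FAR, CSI) from hits, false alarms, and misses."""
    pod = yy / (yy + ny)        # probability of detection
    far = yn / (yy + yn)        # false alarm ratio
    csi = yy / (yy + yn + ny)   # critical success index
    return pod, far, csi


# Hypothetical counts: same false alarms and misses, twice as many hits.
base = scores(yy=20, yn=30, ny=10)
inflated = scores(yy=40, yn=30, ny=10)

for name, (pod, far, csi) in (("base", base), ("inflated YY", inflated)):
    print(f"{name:12s} POD={pod:.2f} FAR={far:.2f} CSI={csi:.2f}")
```

With the YN and NY cells unchanged, the additional hits raise the POD and CSI and lower the FAR, which is the behavior described above.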

4.2 Results from the NSSL POSH and POH Algorithms

The verification data for the NSSL POH and POSH algorithms are included in Figs. 9-12 and Tables A1-A5 of Appendix A. Statistical results from the 15 km influence radius are used in the discussion. Recall that the POH algorithm is designed specifically for hail of any size, while the POSH algorithm is designed specifically for severe hail (>19 mm). The POH, NHail, and NHailPH are comparable algorithms. However, to ascertain how well the POH and POSH algorithms perform their respective functions and to check for redundancy, the results for the POSH algorithm at the smaller size thresholds for defining hail events and the POH algorithm at larger size thresholds have been tabulated for comparison.

Inspection of Fig. 9a and Table A3 reveals that the skill of the POH algorithm for hail of any size, as measured by the CSI score, is relatively constant (decreasing from 0.89 to 0.81) over the probability threshold interval of 10 to 80%. At higher probability thresholds, the CSI rapidly decreases. The POH algorithm has considerably greater skill for identifying hail of any size than the NEXRAD algorithm for both the NHail and NHailPH categories. When the POSH algorithm is applied to hail of any size, its CSI decreases steadily from 0.85 at 10% probability to 0.46 at a probability of 80%. The difference between the two algorithms ranges from 0.04 at 10% to 0.35 at 80%. Thus, the POH algorithm also outperforms the POSH algorithm for hail of any size.
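The way a CSI curve is obtained from probabilistic predictions can be sketched as follows. The sketch assumes that a prediction of "Hail" is declared whenever the algorithm probability equals or exceeds the threshold; that rule, the function name, and the ten events listed are hypothetical and serve only to illustrate the procedure.

```python
def csi_at_threshold(probs, observed, threshold):
    """CSI when "Hail" is predicted for probabilities >= threshold (%)."""
    yy = yn = ny = 0
    for p, hail_seen in zip(probs, observed):
        predicted = p >= threshold
        if predicted and hail_seen:
            yy += 1          # hit
        elif predicted:
            yn += 1          # false alarm
        elif hail_seen:
            ny += 1          # miss
    total = yy + yn + ny
    return yy / total if total else float("nan")


# Hypothetical algorithm probabilities (%) and matching observations.
probs = [0, 0, 10, 30, 50, 60, 80, 90, 100, 100]
observed = [False, False, False, True, False, True, True, True, True, True]

curve = {thr: csi_at_threshold(probs, observed, thr)
         for thr in range(10, 101, 10)}
for thr, csi in curve.items():
    print(f"{thr:3d}%  CSI={csi:.2f}")
```

A sharp algorithm, whose probabilities cluster near 0% and 100%, changes few predictions as the threshold moves and therefore yields a nearly flat curve over most of the threshold range.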

The discriminant function for the POH (Fig. 13) shows that the majority of the algorithm predictions for "Hail" or "No Hail" are at the 0% and >80% probability categories, while the discriminant function for the POSH (Fig. 14) shows that more of its predictions fall at intermediate probabilities. Consequently, the POH algorithm is sharper than the POSH, with fewer of the POH predictions changing from "Hail" to "No Hail" as the probability threshold increases. As a result, a near-constant slope is attained by the POH. This effect is less striking at the two larger hail size categories. The general sharpness exhibited by the two algorithms may be partly a consequence of selecting the maximum algorithm prediction within each influence region. A numeric tabulation of the distribution of precipitation events for the NEXRAD and NSSL algorithms is given in Tables 5-7.

For hail >0 mm (Fig. 9a and Table A4), the CSI score for the POH algorithm is greater than that for the NHail algorithm at all algorithm probability thresholds and is greater than the NHailPH algorithm except at the 90% and 100% probability thresholds. The large improvement demonstrated by the POH shows it is a better algorithm for detecting hail of any size.

In general, as the size threshold for defining hail increases, the CSI scores decrease for the NSSL algorithms, like those for the NEXRAD algorithm. The decrease is attributable to a rapid increase in FAR. Notice also that the range in CSI values among all algorithms decreases significantly as the hail size threshold increases. As the size threshold for defining hail events increases, the performance of the POSH algorithm relative to the POH algorithm steadily improves. In fact, for severe hail (Fig. 12a), the POSH algorithm outperforms all others. This result is consistent with the design of the algorithm. Also, the slope of the CSI line for the POSH becomes positive. As mentioned earlier, the HSS scores (Figs. 9c-12c) behave similarly to the CSI scores.

Witt (1993) evaluated the NSSL POSH algorithm for severe hail and non-hail events using the methodology discussed in Section 4.1 but using the NSSL SCIT algorithm for storm cell identification. Using a 50% probability threshold, he found the POSH CSI=0.71, POD=0.86 and the FAR=0.21. This study found the CSI=0.29, POD=0.94, and the FAR=0.71 at the 50% probability threshold and for the 15 km influence radius. We attribute the greater skill found by Witt primarily to the variable rule set used for matching precipitation observations to the algorithm designations.
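Because the CSI, POD, and FAR are all computed from the same three cells of the contingency table, they are linked by the identity 1/CSI = 1/POD + 1/(1-FAR) - 1, which follows from the standard definitions POD = YY/(YY+NY), FAR = YN/(YY+YN), and CSI = YY/(YY+YN+NY). The short sketch below uses that identity as a consistency check on the two sets of scores quoted above; the function name is hypothetical, and small residuals are to be expected because the quoted scores are rounded to two decimals.

```python
def csi_from_pod_far(pod, far):
    """CSI implied by POD and FAR via 1/CSI = 1/POD + 1/(1 - FAR) - 1."""
    return 1.0 / (1.0 / pod + 1.0 / (1.0 - far) - 1.0)


# Scores quoted in the text at the 50% probability threshold.
witt_csi = csi_from_pod_far(pod=0.86, far=0.21)        # reported CSI=0.71
this_study_csi = csi_from_pod_far(pod=0.94, far=0.71)  # reported CSI=0.29

print(f"Witt (1993): implied CSI={witt_csi:.3f}")
print(f"This study:  implied CSI={this_study_csi:.3f}")
```

Both implied values agree with the reported CSI scores to within about 0.01, which is compatible with rounding of the POD and FAR.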

Similar to the CSI curves, the POD curves for the POH algorithm (Figs. 9b-12b) are also nearly flat to the 80% probability threshold and then decrease rapidly. For hail of any size, the POH POD is generally greater than 0.80. The POH POD is always greater than the NHail POD, and except for the 100% probability threshold at the three smallest size categories, the POH PODs are greater than the NHailPH values. As expected, the POSH PODs are largest for hail >19 mm. Although the POSH PODs improve relative to POH as hail size increases and become quite large (i.e., 0.94 for hail >19 mm and a probability threshold of 50%), they remain less than those for POH.

Figures 9d-12d show that the FARs for all algorithms increase dramatically as the size threshold for defining hail increases. The range in FAR values among algorithms remains small. For hail >0 mm, the typical FAR is about 0.05; for severe hail, the typical FAR is about 0.75. At the higher probability of hail thresholds, the NSSL POSH algorithm tends to have the lowest FAR. To examine the possible contribution to the FAR of the 95 observed events from rainstorms (i.e., rain-only events), Table 8 shows the distribution of these events for the hail predicted/not observed category (i.e., the YN cell) of the contingency table. At probabilities in excess of 50%, the POSH algorithm has no rain-only events, unlike the POH, and is therefore a better discriminator of rain-only events than the POH; NHail and NHailPH have 7 and 9 rain-only events, respectively, that contribute to the FAR.

The MSE curves (Figs. 9e-12e and Table A4) show that, for hail of any size, the POH algorithm has the lowest error except at 90% and 100% probability where it is greater by 0.03 and 0.12, respectively, than the NHailPH algorithm. Hence, the POH algorithm represents a significant improvement over the NEXRAD algorithm. For hail size thresholds >13 mm and >19 mm, the MSE for the POSH algorithm has a mean value and a 50% probability threshold value that are roughly equivalent to those of the NEXRAD algorithms but lower than those of the POH algorithm.

Reliability diagrams are constructed (see Section 3.4 for discussion) to examine the extent to which the NSSL algorithm forecast probabilities match the observed frequencies of hail (Figs. 15-16). For hail events >0 and >6 mm, the POSH algorithm under-forecasts the occurrence of hail (Fig. 15a-b). For hail size categories >13 and >19 mm, the POSH algorithm over-forecasts the occurrence of hail. The reliability diagram suggests that the POSH algorithm would be optimal for detecting hail with a minimum size between 6 and 13 mm. Although the number of hail events in some of the probability categories is small (see Figs. 13 and 14), the POH algorithm (Fig. 16) clearly approaches the 45° line for hail >6 mm, under-forecasts for hail of any size, and over-forecasts for the two largest hail size thresholds.
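The points on a reliability diagram can be computed with a minimal sketch such as the one below, which groups events by forecast probability and compares each forecast probability with the observed relative frequency of hail. The function names and the nine events are hypothetical; the details of the construction actually used are those given in Section 3.4.

```python
from collections import defaultdict


def reliability(probs, observed):
    """Map each forecast probability (%) to the observed hail frequency."""
    counts = defaultdict(lambda: [0, 0])  # prob -> [hail events, all events]
    for p, hail_seen in zip(probs, observed):
        counts[p][0] += int(hail_seen)
        counts[p][1] += 1
    return {p: hits / n for p, (hits, n) in sorted(counts.items())}


def bias_label(prob_percent, observed_freq):
    """Classify one point relative to the 45 degree line."""
    forecast = prob_percent / 100.0
    if observed_freq > forecast:
        return "under-forecast"   # hail observed more often than forecast
    if observed_freq < forecast:
        return "over-forecast"    # hail observed less often than forecast
    return "reliable"


# Hypothetical forecast probabilities (%) and matching observations.
probs = [20, 20, 20, 20, 80, 80, 80, 80, 80]
observed = [True, True, False, False, True, True, False, False, False]

points = reliability(probs, observed)
for p, freq in points.items():
    print(f"{p:3d}%  observed={freq:.2f}  {bias_label(p, freq)}")
```

Points above the 45° line indicate under-forecasting and points below it indicate over-forecasting; categories containing few events give noisy frequencies and should be read with caution.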


5.0 Conclusions

This study has examined three reflectivity-based hail detection algorithms. Verification data taken over two summers have been compared to matching algorithm predictions of hail occurrence. Statistical analysis was applied and indices used to compare algorithm performance.

The NEXRAD Hail algorithm performed best for hail of any size and a 30 dBZ influence region (CSI=0.47). As noted in previous studies, changing the Probable Hail designation to a prediction of Hail resulted in improved performance over NEXRAD Hail alone (CSI=0.72). Results shown here are slightly better than those in the Smart (1985) study of the NEXRAD algorithms in Colorado.

The NSSL POH algorithm, like NEXRAD, was designed to detect small hail and had its best CSI performance at the smaller hail size categories with typical CSI values between 0.80 and 0.90. The POH performed significantly better than the NEXRAD Hail or Hail + Probable Hail algorithms by all measures except perhaps the FAR where all results were similar. For hail of any size (>0 mm), the POH under-forecasts the probability of hail occurrence.

When the results for the POSH algorithm are compared to those for the algorithms designed to detect hail of any size but applied to severe hail, the POSH algorithm has the highest CSI and HSS scores. At the 50% probability threshold and for a 15 km influence radius, the CSI is 0.29. Further, the POSH algorithm has a higher POD and lower FOM than the NEXRAD algorithm for the NHail and NHailPH cases. However, the highest POD and lowest FOM for severe hail were attained by the POH algorithm. The POSH algorithm was comparable in performance to the NEXRAD algorithm in terms of the MSE and the FAR. For Colorado, the POSH algorithm over-forecasts the probability of severe hail. Further, algorithm hail probabilities seem optimal for hail with a minimum size between 6 and 13 mm. Overall, our results were not as spectacular as those found by Witt (1993) in Oklahoma and Florida. We believe this result is due in part to differences in matching the precipitation observations to the algorithm designations. We conclude, however, that the NSSL POH algorithm represents a significant improvement over the NEXRAD algorithm for detecting all hail events and that the NSSL POSH algorithm exhibits some skill in detecting severe hail events.


Acknowledgments

Special thanks to Mr. J. Smith for data archival, quality control of the verification data, and figure preparation, to Ms. L. Carson and Dr. M. Dixon for their correlation program and for help in understanding TITAN, and to Mr. E. Jeannette for data archival, all from NCAR. Thanks to Ms. C. Mueller, Ms. B. Brown, and Mr. J. Wilson, all of NCAR, for discussions on statistical analysis and for reviewing this paper. Thanks also to Mr. A. Witt, Mr. J. Johnson and Mr. M. Eilts, all of NSSL, for installation of the NSSL HDA program and for answering questions; to Mr. R. Lipschutz, FSL, for providing the NEXRAD data; to Mr. L. Mooney, NWS, for providing the NWS severe weather reports; to Mr. J. Wirshborn and Mr. B. Bernstein for their organization of the VHN; to Ms. R. Swindle, NCAR, for maintaining the NSSL HDA program at RAP and for setting up archival procedures; and to Dr. P. Neilley for installing the GPS systems. Special thanks to Ms. N. Knight, NCAR, and to the many participants in the RAPS 92-93 Hail Projects whose dedication and enthusiasm ensured that a high quality verification data set was collected. The radar data set was collected by the NCAR Atmospheric Technology Division. This research is sponsored by the National Science Foundation through an Interagency Agreement in response to requirements and funding by the Federal Aviation Administration's Aviation Weather Development Program. The views expressed are those of the authors and do not necessarily represent the official policy or position of the U.S. Government.


References

The Sunday Camera, 1994: Hail shatters plane's windshield in flight. The Boulder Daily Camera, 2 Oct. 1994, Boulder, CO, p. 3B.

Dixon, M., and G. Wiener, 1993: TITAN: Thunderstorm Identification, Tracking, Analysis, and Nowcasting - A radar-based methodology. J. Atmos. Ocean. Tech., 10, 785-797.

Donaldson, R.J., R.M. Dyer, and M.J. Kraus, 1975: An objective evaluator of techniques for predicting severe weather events. Preprints, 9th Conference on Severe Local Storms, Norman, OK, Amer. Meteor. Soc., 321-326.

Doswell, C.A., III, R. Davies-Jones, and D.L. Keller, 1990: On summary measures of skill in rare event forecasting based on contingency tables. Weather and Forecasting, 5, 576-585.

Flight Safety Foundation News, 1993: International panel of aviation experts delivers safety messages to European and regional operators, Flight Safety Foundation News, 32.

Harvey, Jr., L.O., K.R. Hammond, C.M. Lusk, and E.F. Mross, 1992: The application of signal detection theory to weather forecasting behavior, Mon. Wea. Rev., 120, 863-883.

Neilley, P.P., N.A. Crook, E.A. Brandes, M. Dixon, C. Kessinger, C. Mueller, R. Roberts, and J. Tuttle, 1993: RAPS92 - Realtime analysis and prediction of storms, 1992. Preprints, 26th Radar Meteor. Conf., Norman, OK, 24-28 May 1993, Amer. Meteor. Soc., Boston, 135-137.

Petrocchi, P.J., 1982: Automatic detection of hail by radar. AFGL, Tech. Report 82-0277, 33 pp.

Pratte, J.F., J.H. Van Andel, D.G. Ferraro, R.W. Gagnon, S.M. Maher, G.L. Blair, 1991: NCAR's Mile High meteorological radar. Preprints, 25th International Conf. on Radar Meteor., 24-28 June 1991, Paris, France, Amer. Meteor. Soc., 863-866.

Smart, J.R., and R.L. Alberty, 1984: An evaluation of the performance of the NEXRAD hail algorithm. Preprints, 22nd Radar Meteor. Conf., Zurich, Amer. Meteor. Soc., Boston, 202-207.

Smart, J.R., 1985: Performance evaluation of the NEXRAD hail algorithm applied to Colorado thunderstorms. NOAA Tech. Memo. ERL ESG-18, 29 pp.

Smart, J.R., and R.L. Alberty, 1985: The NEXRAD hail algorithm applied to Colorado thunderstorms. Preprints, 14th Conf. Severe Local Storms, Indianapolis, Amer. Meteor. Soc., Boston, 244-247.

Stanski, H.R., L.J. Wilson, and W.R. Burrows, 1989: Survey of Common Verification Methods in Meteorology. 2nd Edition, Atmospheric Environment Service, Downsview, Ontario, Canada, 112 pp.

Waldvogel, A., W. Schmid, and B. Federer, 1978a: The kinetic energy of hailfalls. Part I: Hailstone spectra. J. Appl. Meteor., 17, 515-520.

Waldvogel, A., B. Federer, W. Schmid, and J.F. Mezeix, 1978b: The kinetic energy of hailfalls. Part II: Radar and hailpads. J. Appl. Meteor., 17, 1680-1693.

Waldvogel, A., B. Federer and P. Grimm, 1979: Criteria for the detection of hail cells. J. Appl. Meteor., 18, 1521-1525.

Waldvogel, A., and W. Schmid, 1982: The kinetic energy of hailfalls. Part III: Sampling errors inferred from radar data. J. Appl. Meteor., 21, 1228-1238.

Witt, A., 1990: A hail core aloft detection algorithm. Preprints, 16th Conf. Severe Local Storms and Conf. Atmos. Electr., Kananaskis Park, Amer. Meteor. Soc., Boston, 232-235.

Witt, A., 1993: Comparison of the performance of two hail detection algorithms using WSR-88D data. Preprints, 26th Radar Meteor. Conf., Norman, OK, Amer. Meteor. Soc., Boston, 154-156.

Witt, A., and J.T. Johnson, 1993: An enhanced storm cell identification and tracking algorithm. Preprints, 26th Radar Meteor. Conf., Norman, OK, Amer. Meteor. Soc., Boston, 141-143.