Research Applications Program
FAA Summary Project Report
A Comparison of Hail Detection Algorithms
31 January 1995
Cathy J. Kessinger and Edward A. Brandes
*NCAR is sponsored by the National Science Foundation
"A United Express jet flying into a hailstorm was forced to return to Stapleton International Airport shortly after takeoff Saturday [1 October 1994] when hail shattered the plane's windshield and injured both crew members" (The Sunday Camera, 1994). This recent encounter illustrates just one potential hazard that hailstorms pose to aircraft. Damage to airfoils can significantly degrade aircraft performance by a loss of lift. Windshields can be cracked or, as in the case above, shattered. Hail ingestion has been identified as the primary cause of in-flight engine shutdowns (FSF News, 1993). A hail detection algorithm that provides timely and accurate warnings could have substantial economic benefit. For these reasons, the National Center for Atmospheric Research (NCAR) Research Applications Program (RAP) has undertaken a two-year project to evaluate three reflectivity-based hail detection algorithms. This report summarizes our findings.
The Joint Systems Project Office (JSPO) Next Generation Radar (NEXRAD) hail algorithm and the National Severe Storms Laboratory (NSSL) Hail Detection Algorithm (HDA) are selected for evaluation. The NSSL HDA has two components, the Probability of Severe Hail (POSH) and the Probability of Hail (POH). The NEXRAD algorithm is currently in use at National Weather Service (NWS) offices with a Weather Surveillance Radar-1988 Doppler (WSR-88D) installation. The NSSL algorithms are planned replacements of the NEXRAD algorithm. Both the NEXRAD and POSH algorithms were developed using data from Great Plains thunderstorms, while the POH algorithm was developed from data taken in central Switzerland. Evaluation of algorithm performance in other climatic regimes should determine what, if any, regional biases exist in the algorithm designs.
To ensure a data set adequate for verification, RAP conducted a Hail Project in the High Plains of northeastern Colorado during the summer months of 1992 and 1993. Documentation of precipitation type and hailstone sizes and characteristics comprises a "ground truth" verification data set for comparison with the predictions from the three hail algorithms. Statistical quantities are calculated to evaluate algorithm skill and performance at increasing hail size thresholds.
2.1 NEXRAD Algorithms
The NEXRAD Hail algorithm identifies storms that are currently producing or will soon produce hail via a reflectivity-based determination of storm characteristics (Petrocchi, 1982; Smart and Alberty, 1984; Smart, 1985; Smart and Alberty, 1985). The NEXRAD Hail algorithm is one of six algorithms that constitute the Storm Sequence Algorithm (SSA), which defines storm characteristics based on radar reflectivity.
2.1.1 Storm Sequence Algorithm (SSA)
The Storm Sequence Algorithm (SSA) constructs storm "segments" along a radial wherever the reflectivity is >30 dBZ and spans a distance of at least 5 km. Within a segment, "dropouts" may occur where the reflectivity data are below the threshold for a specified, small distance. Segments are combined into two-dimensional (2-d) storm components when sufficient overlap exists with adjacent segments, as determined by proximity of the centroids. To be defined a "storm," vertical correlation must exist between 2-d storm components.
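As an illustration of the segment test described above, the following minimal Python sketch scans one radial for runs of gates above 30 dBZ that span at least 5 km. The function name, the 1 km gate spacing, and the sample values are assumptions made for illustration only. Dropout handling is omitted, and this is not the operational NEXRAD code.

```python
from typing import List, Tuple


def find_segments(dbz: List[float],
                  gate_km: float = 1.0,        # assumed gate spacing
                  threshold_dbz: float = 30.0,
                  min_length_km: float = 5.0) -> List[Tuple[int, int]]:
    """Return (start, end) gate indices (end exclusive) of runs of gates
    with reflectivity above the threshold that span at least min_length_km."""
    segments = []
    start = None
    for i, z in enumerate(dbz):
        if z > threshold_dbz:
            if start is None:
                start = i                      # a new run begins here
        elif start is not None:
            if (i - start) * gate_km >= min_length_km:
                segments.append((start, i))    # run is long enough to keep
            start = None
    # Close a run that extends to the last gate of the radial.
    if start is not None and (len(dbz) - start) * gate_km >= min_length_km:
        segments.append((start, len(dbz)))
    return segments


# Hypothetical radial: one 5-gate run (kept) and one 2-gate run (rejected).
radial = [10, 35, 36, 40, 45, 38, 20, 33, 34, 10]
segments = find_segments(radial)
```

A fuller version would also bridge short dropouts inside a run before applying the length test.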
2.1.2 Hail Algorithm
The NEXRAD Hail algorithm checks the geometry of the storm as defined by the SSA for hail indicators (Table 1) and assigns an appropriate weight. When an indicator is satisfied, it is assigned as positive. When an indicator is not satisfied or cannot be tested, it is assigned as probable. Weights are accumulated as the "Sum of Positive Weights" and as the "Sum of Probable Weights." The Confidence Factor (CFA) and the Score (SCR) are calculated by
CFA = 100 - Sum of Probable Weights, and
SCR = (Sum of Positive Weights/CFA) x 100.
The four outcomes for the NEXRAD Hail algorithm are (Smart, 1985)
Hail: CFA > 50 and SCR > 60,
Probable Hail: 25 < CFA < 50 or 50 < SCR < 60,
No Hail: CFA > 25 and SCR < 50, and
Insufficient Data: CFA < 25.
Additionally, a storm with maximum reflectivity >70 dBZ that is not labeled as a Hail storm is designated as a Probable Hail storm. Figure 1 illustrates a model hailstorm as defined by NEXRAD.
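The CFA and SCR arithmetic can be sketched in a few lines of Python. The indicator weights used in the example are hypothetical, since Table 1 is not reproduced here, and the function names are illustrative. Only the "Hail" and "Insufficient Data" tests are coded, because the remaining criteria overlap at some values and their precedence is not stated in this text.

```python
from typing import Iterable, Tuple


def confidence_and_score(positive_weights: Iterable[float],
                         probable_weights: Iterable[float]) -> Tuple[float, float]:
    """Return (CFA, SCR) from the accumulated indicator weights."""
    sum_positive = sum(positive_weights)
    sum_probable = sum(probable_weights)
    cfa = 100.0 - sum_probable
    # Guard against division by zero when every indicator is "probable".
    scr = (sum_positive / cfa) * 100.0 if cfa > 0 else 0.0
    return cfa, scr


def is_hail(cfa: float, scr: float) -> bool:
    """'Hail' outcome: CFA > 50 and SCR > 60."""
    return cfa > 50 and scr > 60


def is_insufficient_data(cfa: float) -> bool:
    """'Insufficient Data' outcome: CFA < 25."""
    return cfa < 25


# Hypothetical weights: two satisfied indicators, one "probable" indicator.
cfa, scr = confidence_and_score([30, 25], [20])
```

With these hypothetical weights, CFA is 80 and SCR is 68.75, so the storm would meet the "Hail" criterion.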
2.2 NSSL Algorithms
The two components of the NSSL Hail Detection Algorithm (HDA) have different hail size criteria (Witt, 1990). The Probability of Severe Hail (POSH) estimates the probability of hail >19 mm in diameter, the definition of severe hail in the NWS. The Probability of Hail (POH) is based on a method described by Waldvogel et al. (1979) and predicts the probability of hail of any size. Both algorithms use storm characteristics defined by the NSSL Storm Cell Identification and Tracking (SCIT) algorithm (Witt and Johnson, 1993) to determine the likelihood of hail. The SCIT differs from the NEXRAD SSA because it examines the higher reflectivity regions that are typically located aloft during the early stages of storm development.
2.2.1 Storm Cell Identification and Tracking (SCIT)
The NSSL SCIT algorithm and the HDA are run in conjunction. The SCIT computes storm segments along a radial by application of seven reflectivity threshold levels (60, 55, 50, 45, 40, 35, and 30 dBZ), in descending order. At each elevation angle, segments from a given reflectivity threshold are combined into 2-d storm components after application of proximity and area constraints. Storm centroids are computed for each of the seven sets of 2-d storm components. The centroid of the storm component derived from the highest reflectivity threshold is retained. Vertical correlation of the 2-d storm components defines a storm "cell" and is determined through an iterative process that correlates horizontal positions of storm centroids at increasing heights. A 5 km horizontal influence radius is applied initially, then increased to 7.5 km, and finally to 10 km to achieve correlation. Once storm cells are identified, a storm motion vector is calculated from temporal correlation of storm centroids.
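The expanding search radius used for vertical correlation can be sketched as follows. This is a simplified illustration, not the NSSL implementation: the greedy nearest-neighbor pairing, the one-to-one matching rule, and all names are assumptions. Only the 5, 7.5, and 10 km radii come from the description above.

```python
import math
from typing import Dict, List, Tuple

Centroid = Tuple[float, float]  # (x_km, y_km) of a 2-d storm component


def correlate_levels(lower: List[Centroid],
                     upper: List[Centroid],
                     radii_km: Tuple[float, ...] = (5.0, 7.5, 10.0)) -> Dict[int, int]:
    """Pair centroids at one elevation with centroids at the next one.

    Each pass uses a larger radius and only considers components that
    were left unpaired by the previous, stricter pass.
    """
    pairs: Dict[int, int] = {}   # lower index -> upper index
    used = set()                 # upper indices already paired
    for radius in radii_km:
        for i, (xl, yl) in enumerate(lower):
            if i in pairs:
                continue
            best, best_dist = None, radius
            for j, (xu, yu) in enumerate(upper):
                if j in used:
                    continue
                dist = math.hypot(xu - xl, yu - yl)
                if dist <= best_dist:
                    best, best_dist = j, dist
            if best is not None:
                pairs[i] = best
                used.add(best)
    return pairs


# Hypothetical centroids: the second pair matches at 5 km, the first at 7.5 km.
pairs = correlate_levels([(0.0, 0.0), (20.0, 0.0)], [(6.0, 0.0), (21.0, 0.0)])
```

Running the tighter radius first gives close matches priority before looser ones are allowed.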
2.2.2 Probability of Severe Hail (POSH)
For each storm cell, the Hailfall Kinetic Energy (Waldvogel et al. 1978a,b; Waldvogel and Schmid 1982) is calculated as
Equation
where the reflectivity weighting factor, W(Z), is defined
Equation
with Z in dBZ, the hailfall kinetic energy in J m^-2 s^-1, Z1=40 dBZ, and Z2=50 dBZ. The 10 dBZ
difference (Z2-Z1) defines the transition of precipitation type from
only rain (Z
Equation
where H is the height above ground level (all heights AGL). H0 and H-20
are the heights of the 0°C and -20°C isotherms, respectively, as
determined from the 12 UTC Denver sounding each day. Note that the
weight is applied only at temperatures <0°C (H>H0) and that the maximum
weight occurs at temperatures <-20°C (H>H-20).
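The two weighting functions can be illustrated with a short Python sketch. The equations themselves are not reproduced in this text, so the linear ramp between the lower and upper limits is an assumption that is consistent with the stated thresholds (Z1=40 dBZ, Z2=50 dBZ, and the 0°C and -20°C isotherm heights). The function names and sample heights are hypothetical.

```python
def reflectivity_weight(z_dbz: float, z1: float = 40.0, z2: float = 50.0) -> float:
    """Weight of 0 at or below Z1, 1 at or above Z2, assumed linear between."""
    if z_dbz <= z1:
        return 0.0
    if z_dbz >= z2:
        return 1.0
    return (z_dbz - z1) / (z2 - z1)


def temperature_weight(h_km: float, h0_km: float, hm20_km: float) -> float:
    """Weight of 0 at or below the 0 C height, 1 at or above the -20 C height,
    assumed linear between (all heights AGL)."""
    if h_km <= h0_km:
        return 0.0
    if h_km >= hm20_km:
        return 1.0
    return (h_km - h0_km) / (hm20_km - h0_km)


# Hypothetical sounding: 0 C at 3 km AGL and -20 C at 6 km AGL.
w_z = reflectivity_weight(45.0)
w_t = temperature_weight(4.0, 3.0, 6.0)
```

Under these assumptions a 45 dBZ echo receives half weight, and an echo 1 km above the freezing level receives one third of the full temperature weight.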
The Severe Hail Index (SHI) is calculated by
Equation ,
where N is the number of 2-d storm components within each storm cell
and the hailfall kinetic energy is calculated using the maximum
reflectivity within each 2-d storm component.
The POSH is calculated from the SHI and the SHI warning threshold
(SWT). The SWT (Fig. 3) is calculated from the height of the 0°C
isotherm as
Equation .
Values of the SWT that are <20 are set to 20. For this study, the SWT
is determined daily using the 12 UTC sounding from Denver. For a given
SHI value and SWT, POSH is calculated as
Equation .
where NINT is a FORTRAN intrinsic function that rounds a floating-point
value to the nearest integer. Values of POSH <0
are set to 0 and values of POSH >100 are set to 100. The POSH is
incremented at 10% intervals. Normalizing the POSH by the SWT places
all environmental conditions within a common context. Notice that for
SHI=SWT the value for POSH is 50%. As used by NSSL, values of the POSH
>50% designate "Hail" while values of POSH <50% designate "No Hail." A
specified number of storm cells, ranked by the maximum POSH values, are
retained for output and display. For this report, the maximum number of
storm cells processed at one time is 20.
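The limits applied to the SWT and POSH values can be sketched as below. The POSH equation itself is not reproduced in this text, so the sketch starts from an already computed raw POSH value. How NINT enters the original equation is not recoverable here, so the rounding to 10% steps shown is an assumption, as are the function names.

```python
import math


def nint(x: float) -> int:
    """Round to the nearest integer, halves away from zero (as Fortran NINT does)."""
    return int(math.floor(abs(x) + 0.5)) * (1 if x >= 0 else -1)


def floor_swt(swt: float) -> float:
    """Values of the SWT below 20 are set to 20."""
    return max(swt, 20.0)


def finalize_posh(raw_posh_percent: float) -> int:
    """Step a raw POSH value to 10% increments (assumed form) and
    limit the result to the 0-100% range."""
    stepped = 10 * nint(raw_posh_percent / 10.0)
    return min(100, max(0, stepped))


# Hypothetical raw values before limiting.
examples = [finalize_posh(v) for v in (47.0, -12.0, 137.0)]
```

Python's built-in round() uses round-half-to-even, which is why a separate nint helper is used here.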
2.2.3 Probability of Hail (POH)
Using storm cell designations from the SCIT algorithm, the POH
algorithm computes the height of the 45 dBZ echo above the 0°C isotherm
height (H45-H0) and applies this difference to the probability curve in
Fig. 4. When (H45-H0) is >1.4 km, a positive indication for hail begins
with a greater difference indicating a higher probability of hail. The
algorithm outputs a percent probability of hail that varies from 0 to
100% in 10% increments. Unlike the POSH, the POH predicts the
probability of hail of any size. The POH is retained for the same 20
storm cells as POSH.
2.3 NCAR Thunderstorm Identification, Tracking, Analysis, and Nowcasting (TITAN) Program
The NCAR TITAN program (Dixon and Weiner, 1993) allows rapid perusal of
radar reflectivity data as well as identification of storms,
computation of storm motions, storm tracking, and estimation of storm
tendencies for growth or dissipation. Since storm identification
information is not retained from either the NEXRAD SSA or NSSL SCIT
algorithms, the TITAN storm outlines at 30 and 40 dBZ reflectivity
thresholds define two of the five influence regions used in this study
to define the area where verification data are matched to algorithm
output. Radii from the verification report define the remaining
influence regions. Influence regions are discussed more fully in the
next section.
TITAN operates within a Cartesian reference frame having 1 km
horizontal and vertical grid spacing and a 300 X 300 X 20 km domain.
Cartesian volumes are constructed at the conclusion of each 6 min radar
volume with a time stamp applied at the midpoint of the collection
volume. Storm identification is accomplished by application of a
minimum reflectivity threshold, minimum volume requirements, and
minimum lifetimes. A storm outline is constructed using Cartesian (x,
y) coordinates.
Further, TITAN allows the concurrent display of verification reports,
algorithm predictions, and radar reflectivity data. This capability is
invaluable in finding and correcting hail verification reports that
have spatial or temporal errors as well as verifying that all
algorithms are operational.
The NCAR/RAP Hail Project was conducted in northeastern Colorado during
the months of June and July in 1992 and 1993 (Fig. 5). The Hail Project
was contained within a larger research effort named the Real-time
Analysis and Prediction of Storms (RAPS-92 and RAPS-93), an umbrella
project covering all summer convective research within RAP. Neilley et
al. (1993) describe these research efforts and the deployment of
additional instrumentation.
The operations center for RAPS 92-93 was the Aviation Weather
Development Laboratory (AWDL) housed at RAP. Reflectivity data from the
Mile High Radar (MHR), a NEXRAD prototype radar located 15 km northeast
of Denver Stapleton International Airport, were used for testing the
hail algorithms. Radar characteristics are listed in Pratte et al.
(1991). Two hail intercept teams were vectored to storms of interest
for documentation of precipitation type and the size and
characteristics of the hailstones. For RAPS-93, Global Positioning
System (GPS) units were placed in the hail cars, providing precise
locations at 15-20 s intervals. The GPS units were a distinct advantage
over the handwritten navigational documentation used in 1992. Freedom
from navigational documentation allowed the intercept crews to document
hail characteristics more frequently, ideally at 1 min intervals. A
Volunteer Observing Network (VHN) was formed from area high school and
junior high school teachers and students and the public. Other
observations were provided by the Mountain States Weather Services
(MSWS) and the Denver NWS office.
Field operations were conducted Monday through Saturday from 18 UTC to
the end of weather activity (typically 01 UTC). Sunday was an
operational day if weather conditions were forecast as favorable for
hail. An 18 UTC forecast of convective development was given before
operations each day. After deployment of the intercept cars, the Hail
Coordinator vectored them to storms of interest. As rain or hail was
encountered, documentation procedures were begun for precipitation type
and intensity (light, moderate, heavy), the minimum, maximum, and
average hailstone size (mm), hail density (number of stones/m3), hail
depth, stone shape (round, flat, conical), stone color (clear, milky),
stone hardness (hard, mushy), and whether damage to vegetation or
property occurred. Photographic documentation of the hailstones was
made. Detailing similar information about rain or hail occurrence,
members of the VHN and MSWS network mailed their reports to NCAR.
Information gained from the VHN and MSWS network had less temporal
resolution than that from the intercept cars since their reports were
for the event as a whole rather than for the evolution of the event.
NWS reports typically contain the maximum hail size, location and the
time the report was received.
MHR data were transmitted into the AWDL and to the Forecast Systems
Laboratory (FSL) via a high speed telephone link and ingested into the
respective computer networks. The NSSL algorithms were installed by
NSSL on a RAP computer in the AWDL. The algorithms were run during
operations and the results displayed. The NEXRAD algorithm was run at
FSL and the output given to RAP at the conclusion of each field season.
3.2 Data editing
After each summer field program, the handwritten observations were
converted to ASCII computer files. For RAPS-92, the intercept car
positions were determined from topographic maps. Location, time of
occurrence, and the precipitation documentation described above were
included in the files. The verification and algorithm outputs were
written using a similar format, ingested into TITAN and overlaid onto
the MHR data. Verification reports were checked for spatial and
temporal errors. If a verification report was not located within a
precipitation echo (i.e., a temporal error) or its location within the
storm was in error (i.e., severe hail occurring in a low reflectivity
region at the edge of the storm instead of near the maximum
reflectivity region), its location and time were rechecked and
corrected. If a position or time error could not be resolved
satisfactorily, the report was deleted. NWS reports were especially
prone to temporal errors since public reports were often received many
minutes after the event.
Each storm with verification data was characterized as either a
"Hailstorm" or a "Rainstorm" for each radar volume. A ±3 min time
window centered on the verification time was selected for matching
algorithm predictions to precipitation reports because the collection
time for a radar volume was 6 min. Many storms contained multiple
observations of either hail or rain or both, necessitating the
selection of one verification report for retention. For a hailstorm,
the hail observation judged most representative was retained. The
report closest to the maximum reflectivity region (>45 dBZ) at the
lowest TITAN Cartesian level (either 2 or 3 km MSL, depending on range)
and with the largest hailstone size was kept. Rainstorms (i.e., "no
hail") test the ability of an algorithm to predict the non-occurrence
of hail and improve the distribution of observed events. However,
characterizing a rainstorm must be carefully done since it is easily
argued that the observer was not in the right position to encounter the
hail swath. For this reason, rules applied to rainstorms were more
stringent than those for hailstorms. First, when the maximum
reflectivity region was >45 dBZ at the lowest Cartesian level, the
observer location was required to be within the maximum reflectivity as
determined by TITAN at 1 km grid spacing, and the times of the rain
report and the TITAN radar analysis were required to correspond within
1 min. Second, when the maximum reflectivity contour at the lowest level
was <45 dBZ, the rain and radar observations were required to be within
1 km and ±3 min. Third, for stratiform rain situations, only those rain
observations within an isolated maximum reflectivity contour were
retained. Examples of hail and rain event selection are shown in Fig.
6.
Defining hail or rainstorms in this manner means the statistical
analysis is "observation driven" since only those storms with
verification data are used. Not all storms can be characterized, as is
optimal for determination of the population characteristics. For this
reason, statistical results are dependent on the characteristics of the
observations, such as their distribution. Inclusion of rain events
improves the distribution of observed event categories and the
statistical results.
3.3 Correlation Program
Using both temporal and spatial boundaries, a "correlation" program
matches the edited verification data with the algorithm predictions.
The confines of the temporal and spatial boundaries are termed an
"influence region." Five methods are used to define the influence
region for matching a verification report with an algorithm prediction.
The influence regions are determined by the distance from the
verification report at 5, 10, and 15 km influence radii and by the
TITAN storm outline at 30 and 40 dBZ thresholds.
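The radius-based influence regions can be sketched as a simple distance and time test. The class and function names are hypothetical, the treatment of boundary values as inclusive is an assumption, and the ±3 min window is taken from Section 3.2. Matching against the TITAN storm outlines would require a point-in-polygon test and is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    x_km: float    # east-west position relative to the radar
    y_km: float    # north-south position relative to the radar
    t_min: float   # time in minutes


def in_influence_region(report: Event, prediction: Event,
                        radius_km: float, half_window_min: float = 3.0) -> bool:
    """True when the prediction lies within the radius and time window
    centered on the verification report."""
    distance = math.hypot(prediction.x_km - report.x_km,
                          prediction.y_km - report.y_km)
    lag = abs(prediction.t_min - report.t_min)
    return distance <= radius_km and lag <= half_window_min


def matching_radii(report: Event, prediction: Event,
                   radii_km: Tuple[float, ...] = (5.0, 10.0, 15.0)) -> List[float]:
    """List the influence radii under which the pair would be matched."""
    return [r for r in radii_km if in_influence_region(report, prediction, r)]


# Hypothetical pair: 10 km apart and 2 min apart.
radii = matching_radii(Event(0.0, 0.0, 100.0), Event(6.0, 8.0, 102.0))
```

In this example the pair is matched under the 10 and 15 km radii but not under the 5 km radius.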
Due to the 30 dBZ threshold, the NEXRAD SSA typically identifies large
regions that may contain multiple storm cells with reflectivity >50
dBZ. This is especially true in large squall lines and is illustrated
by a TITAN 30 dBZ storm outline (Fig. 7a). The smaller size of the 40
dBZ outline is shown in Fig. 8a. The NSSL SCIT algorithm identifies
individual storm cells within a storm on a 5-10 km spatial scale.
Algorithm performance is expected to be a function of the influence
region because of the different storm identification techniques used.
All algorithms are evaluated using the five methods.
3.4 Contingency Table Program and Statistical Quantities
After the correlation program was run for all storm days using the five
methods described above, the "contingency table" program placed the
matched verification and algorithm predictions into the cells of a
contingency table for statistical analysis (Table 2). The Critical
Success Index (CSI), Probability of Detection (POD) or prefigurance,
False Alarm Ratio (FAR), the Frequency of Misses (FOM), the Heidke
Skill Score (HSS) and the Mean Square Error (MSE) are calculated from
the contingency table (Donaldson et al. 1975; Stanski et al. 1989;
Doswell et al. 1990; Harvey et al. 1992).
The POD, FAR, CSI and FOM are calculated as
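POD = YY/(YY + NY),
FAR = YN/(YY + YN),
CSI = YY/(YY + YN + NY), and
FOM = NY/(YY + NY),
where YY is the number of hits, YN is the number of false alarms, and NY
is the number of misses.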
While somewhat redundant, the FOM is included to contrast algorithm
declaration of a miss versus a false alarm. Notice when the algorithm
correctly identifies a no hail event (NN), the event does not
contribute to the CSI, POD, FAR or FOM. The inclusion of the rain
events as discussed in Section 3.2 contributes to the evaluation by
measuring algorithm effectiveness in no hail events.
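The four scores can be computed directly from the contingency table counts. The sketch below assumes the standard definitions, with YY the number of hits, YN the number of false alarms (algorithm "Hail", observed "No Hail"), and NY the number of misses. The function name and the sample counts are hypothetical.

```python
from typing import Dict


def categorical_scores(yy: int, yn: int, ny: int) -> Dict[str, float]:
    """Return POD, FAR, CSI and FOM; NN does not enter any of them."""
    def ratio(num: float, den: float) -> float:
        # Return NaN rather than raise when a score is undefined.
        return num / den if den else float("nan")

    return {
        "POD": ratio(yy, yy + ny),        # detected fraction of observed hail
        "FAR": ratio(yn, yy + yn),        # fraction of "Hail" calls that failed
        "CSI": ratio(yy, yy + yn + ny),   # hits over all non-NN outcomes
        "FOM": ratio(ny, yy + ny),        # missed fraction, equal to 1 - POD
    }


# Hypothetical counts, not taken from the report's tables.
scores = categorical_scores(yy=60, yn=20, ny=40)
```

For these counts the POD is 0.60, the FAR is 0.25, the CSI is 0.50, and the FOM is 0.40.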
The Heidke Skill Score (HSS) is computed by
HSS = (R - Ec)/(T - Ec), with Ec = (Cy x Ry + Cn x Rn)/T.
For the HSS, R is defined as the number of perfect forecasts, T is the
total number of events, Cy is the positive columns sum (see Table 2),
Cn is the negative columns sum, Ry is the positive row sum, Rn is the
negative row sum, and Ec is the expected number of correct predictions
from chance.
The Heidke Skill Score (HSS) uses the contingency table scores to test
the skill of each algorithm above a standard which, for purposes of
this report, is assumed to be "chance." In the HSS, forecasts that are
correct on the basis of chance are removed. No correlation is assumed
between the predicted and observed values. A perfect HSS score is +1.
If the algorithm has the same skill as chance, then the HSS = 0.
Negative values indicate fewer right predictions than chance.
Unlike the CSI, the Heidke Skill Score (HSS) uses the sum of the
perfect forecasts (i.e., YY and NN) and cannot be computed if either
cell is missing. Inclusion of rain events fills the NN cell of the
contingency table, assuming the algorithm has a correct prediction of
"no hail". Because the HSS requires both perfect forecasts (YY and NN)
for calculation, it is a better method for evaluation than the CSI
alone. However, for contingency tables, they give similar results.
The Mean Square Error (MSE) is computed from the contingency table as
MSE = (YN + NY)/T
and measures the proportion of misclassified events (Harvey et al.
1992). The MSE is equivalent to (1 - Proportion Correct) where the
Proportion Correct = (YY + NN)/T.
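The HSS and MSE can be sketched from the same four counts. The HSS form used here is the standard one, (R - Ec)/(T - Ec), with Ec built from the row and column sums as defined above. The assignment of algorithm outcomes to rows and observations to columns is an assumption, although the result does not depend on it. Function names and counts are hypothetical.

```python
def heidke_skill_score(yy: int, yn: int, ny: int, nn: int) -> float:
    """Skill relative to chance: +1 perfect, 0 equal to chance."""
    t = yy + yn + ny + nn            # total number of events
    r = yy + nn                      # number of perfect forecasts
    ry, rn = yy + yn, ny + nn        # algorithm "Hail" / "No Hail" sums
    cy, cn = yy + ny, yn + nn        # observed hail / no hail sums
    ec = (cy * ry + cn * rn) / t     # correct predictions expected by chance
    return (r - ec) / (t - ec)


def mean_square_error(yy: int, yn: int, ny: int, nn: int) -> float:
    """Proportion of misclassified events, equal to 1 - Proportion Correct."""
    t = yy + yn + ny + nn
    return (yn + ny) / t


# Hypothetical counts, not taken from the report's tables.
hss = heidke_skill_score(60, 20, 40, 80)
mse = mean_square_error(60, 20, 40, 80)
```

For these counts Ec is 100 of 200 events, giving an HSS of 0.40 and an MSE of 0.30.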
Reliability diagrams (Stanski et al. 1989) are computed for the NSSL
algorithms to illustrate the extent that the forecast probability
matches the actual frequency that hail is observed. Diagrams are
constructed with the forecast probability categories along the X-axis
and the observed frequencies along the Y-axis. Reliability is shown by
proximity of the curve to the bisecting 45° line. The 45° line
indicates perfect forecasts of the probability of hail. When an
algorithm over-forecasts (probabilities are too high), points are under
the 45° line. For under-forecasts (probabilities are too low), the
points are over the 45° line. Reliability is most accurate when a
sufficient number of points is contained in each probability
category.
Sharpness is defined from the distribution of points within each
probability category (Stanski et al. 1989). A perfectly sharp algorithm
has all forecasts in the 0% and 100% probability categories and acts as
a binary flag for the prediction of the event. Sharpness increases as
the number of forecasts in the extreme probability categories
increases.
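The points of a reliability diagram, and a simple indicator of sharpness, can be computed as sketched below. The function names and sample data are hypothetical, and the fraction of forecasts in the 0% and 100% categories is used here only as an illustrative proxy for sharpness, not as a measure defined in this report.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def reliability_points(forecast_pct: Sequence[int],
                       hail_observed: Sequence[bool]) -> List[Tuple[float, float, int]]:
    """Return (forecast probability, observed frequency, sample count)
    for each probability category that contains at least one forecast."""
    tallies = defaultdict(lambda: [0, 0])   # category -> [forecasts, hail events]
    for pct, observed in zip(forecast_pct, hail_observed):
        tallies[pct][0] += 1
        tallies[pct][1] += 1 if observed else 0
    return [(pct / 100.0, hits / n, n)
            for pct, (n, hits) in sorted(tallies.items())]


def extreme_fraction(forecast_pct: Sequence[int]) -> float:
    """Fraction of forecasts in the 0% or 100% categories (sharpness proxy)."""
    extreme = sum(1 for pct in forecast_pct if pct in (0, 100))
    return extreme / len(forecast_pct)


# Hypothetical forecasts and outcomes.
forecasts = [0, 0, 50, 50, 50, 50, 100, 100]
observed = [False, False, True, False, True, True, True, True]
points = reliability_points(forecasts, observed)
```

In this example the 50% category verifies 75% of the time, so its point lies above the 45° line and indicates under-forecasting.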
3.5 Statistical Analysis Methodology
Procedural rules governing the statistical analysis were:
1) For each radar volume, storms are classified as a Hailstorm or a
Rainstorm based on the verification data. Only storms with verification
data are selected. Section 3.2 discusses the rules used in the
classification process. For this study, a total of 237 hail events and
95 rain events are included. See Fig. 6 for an illustration of event
selection criteria.
2) Within multicellular storms or convective cell complexes, the NEXRAD
Hail algorithm typically produces 1-3 predictions while the NSSL hail
algorithms typically produce >3. Figures 7b and 8b show an example of
this for a squall line. Having different numbers of algorithm
predictions per storm creates problems with interpretation of the
statistical results. Within the same storm, NEXRAD may predict "Hail"
and "No Hail", while POH and POSH may predict a 10%, 40%, 50% and 100%
probability of hail. To simplify the analysis, only 1 algorithm
prediction at the level deemed most severe is kept per storm per radar
volume. Similarly, only 1 verification report is kept per storm per
radar volume, as discussed in Section 3.2. For NEXRAD, predictions are
ranked by severity as "Hail," "Probable Hail," and "No Hail." All
"Insufficient Data" designations and their corresponding verification
data are removed from the data set. For POH and POSH, the maximum
percent probability of hail is kept.
3) The correlation program matches the algorithm predictions to the
verification data using the appropriate spatial and temporal
boundaries. If either or both algorithms have no pairing with a
verification report, a prediction is inserted at the lowest level,
which, for NEXRAD, is a "No Hail" prediction and, for POH and POSH, is
a 0% probability prediction. Predictions are inserted more frequently
for the NEXRAD algorithm than the NSSL algorithms because fewer
algorithm predictions are generated. Typically, POH and POSH are
inserted when the maximum number of storm cells identified exceeds 20
and the verification is with a storm cell outside of the 20. In these
cases, the storm cell typically has weaker reflectivity values than the
20 identified storm cells.
4) Only days with both NEXRAD and NSSL algorithm predictions are used.
If one algorithm is not present for a short period during the day, the
other algorithm and the verification report are deleted.
5) Because the NSSL algorithms yield a probability of hail ranging from
0 to 100%, determination of a "Hail" versus "No Hail" threshold is
desired. To test for the appropriate threshold, increasing thresholds
are applied at 10% intervals with those predictions at or above the
threshold probability being designated as "Hail" while those below are
"No Hail." For example, to test at the 50% probability level, algorithm
predictions <50% are designated as a "No Hail" prediction while those
≥50% are designated as a "Hail" prediction. At the 0% probability
threshold, all algorithm predictions are set to "Hail" such that the
CSI scores are actually the percentage of hail events. The seeming
discontinuity between the 0% and 10% probability thresholds seen in
most statistical quantities results from the discretization of the
algorithm probability predictions into 10% intervals. Witt (1993) used
>50% as the probability threshold in his performance evaluation of the
POSH algorithm.
6) NEXRAD outputs "Hail" and "Probable Hail" designations. To test the
added value of the "Probable Hail" prediction, the algorithm is scored
two ways. In one test, "Probable Hail" predictions are designated as
"No Hail." Test results are termed the NEXRAD Hail algorithm in
subsequent figures and tables. In the second test, "Probable Hail"
reports are considered "Hail" predictions and are termed the NEXRAD
Hail + Probable Hail algorithm.
7) To test algorithm performance as hail size increases, the
verification reports are thresholded at four levels: >0 mm, >6 mm, >13
mm, and >19 mm. Hail reports between 1-5 mm are typically small ice
particles or graupel. Graupel is fairly common in Colorado due to the
relatively low height of the wet bulb zero isotherm when compared to
other geographical regions. During the test, hail reports below the
size threshold being applied are changed to "No Hail." Too few reports
of hail >25 mm prevent statistical analysis at larger hail sizes. The
NWS identifies severe hail as >19 mm and only those reports are
contained within Storm Data.
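The threshold tests in Items 5 and 7 can be sketched as a sweep over probability thresholds for a chosen hail size threshold. The function name and sample data are hypothetical. Each storm is assumed to carry its maximum algorithm probability and its retained verification report, with a hail size of 0 mm standing for a rain report.

```python
from typing import Dict, Sequence


def csi_by_probability_threshold(max_prob_pct: Sequence[int],
                                 hail_size_mm: Sequence[float],
                                 size_threshold_mm: float = 0.0) -> Dict[int, float]:
    """Return the CSI at each 10% probability threshold.

    Predictions at or above the threshold count as "Hail". Reports that do
    not exceed the size threshold count as "No Hail".
    """
    results = {}
    for threshold in range(0, 101, 10):
        yy = yn = ny = 0
        for prob, size in zip(max_prob_pct, hail_size_mm):
            predicted = prob >= threshold
            observed = size > size_threshold_mm
            if predicted and observed:
                yy += 1
            elif predicted:
                yn += 1
            elif observed:
                ny += 1
        total = yy + yn + ny
        results[threshold] = yy / total if total else float("nan")
    return results


# Hypothetical storms: probabilities in percent and hail sizes in mm.
csi = csi_by_probability_threshold([0, 40, 60, 100], [0.0, 10.0, 0.0, 25.0])
```

At the 0% threshold every storm is scored as "Hail," so the CSI (0.5 here) equals the fraction of storms with hail, as noted in Item 5.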
The statistical analysis of the NEXRAD and NSSL hail algorithms for the
five influence regions (5, 10, and 15 km influence radii, the 30 dBZ
and 40 dBZ storm outlines, respectively) and 4 hail size categories is
summarized in numeric form in Tables A1-A5 of Appendix A. In the
following sections, results from the 15 km influence radius are
discussed for the NSSL algorithms while results from the 30 dBZ
influence region are used for the NEXRAD algorithm because these
influence regions produced the maximum CSI scores for the respective
algorithms. Except for the 5 km influence radius, which has the worst
performance, statistical quantities are similar for all influence
regions and differ by 0.15, at most. The similarity of the statistical
results demonstrates a lack of sensitivity to influence region
selection and may be a consequence of selecting the maximum algorithm
predictions. The poor performance at a 5 km influence radius suggests
that this spatial scale is less consistent with the cell and storm
definitions used by the algorithms. The statistical quantities for the
15 km influence radius are plotted in Figs. 9-12.
4.1 NEXRAD Hail Algorithm Results
The CSI for the NEXRAD Hail algorithm (hereafter referred to as the
NHail algorithm) varies from 0.22 to 0.48 for the four hail size
categories (see Table A4 for the 30 dBZ influence region statistics).
The maximum CSI occurs with the hail threshold >6 mm and the minimum
CSI occurs for severe hail (>19 mm). Having the maximum CSI at a small
hail size threshold is consistent with the algorithm design of
detecting hail of any size. The POD scores range from 0.49 to 0.60 and
increase as the lower bound for defining a hail event increases. The
maximum HSS is 0.36 and occurs for hail sizes >6 mm. The minimum HSS
value (0.18) occurs with severe hail. Values for the FAR include a
minimum of 0.06 for hail >0 mm and a maximum of 0.75 for severe hail.
The FOM decreases from 0.52 to 0.40 as the hail size threshold
increases. The MSE has a maximum of 0.39 for all hail sizes and a
minimum of 0.33 at a hail size threshold >6 mm.
The NEXRAD Hail + Probable Hail algorithm (hereafter referred to as the
NHailPH algorithm) shows significant improvement in CSI over NHail for
hail size categories >0 and >6 mm (Table A4) with CSI scores of 0.72
and 0.68, increases of 0.25 and 0.20, respectively. The maximum CSI
score for NHailPH occurs at >0 mm. At hail size categories >13 mm and
>19 mm, the CSI scores for NHailPH decrease markedly from those at the
small hail size thresholds but are greater than or equal to the NHail
CSI scores, respectively. The improvement in CSI caused by inclusion of
the Probable Hail prediction has been documented by Smart (1985) and
Witt (1993). As might be expected from the NEXRAD SSA storm definition,
the CSI scores for the 30 dBZ influence region are higher than those
for the 40 dBZ influence region (see Tables A4 and A5). For the two
smaller hail size thresholds, the 30 dBZ influence region outperforms
the 15 km radius of influence region in CSI scores. The NHailPH POD
scores are an improvement of 0.21 to 0.27 over the NHail scores. The
FAR scores of NHail and NHailPH differ by <0.02 for all hail size
thresholds. With a few exceptions, the statistical quantities show that
the NHailPH has more skill than the NHail.
Smart (1985) evaluated the NHail and NHailPH algorithms in Colorado for
severe and non-severe hailstorms. Rainstorm data were presented but
were not included within the statistical calculations. His verification
data set consisted of chase car observations and weather observations
reported to the Denver NWS office. For comparison to the statistical
results from this study, the rainstorm data are added to the Smart
contingency tables and presented in Table 4. Further, the scoring
methodology used in this study for the NHailPH algorithm is applied
(see Section 3.5, Item 6). A difference between the two analyses is the
time window applied to match the weather observations to the algorithm
predictions. Smart used a 20 min time window centered on the
observation time such that 1 observation might be counted as many as 5
times (a function of the radar volume collection time). This study uses
a 6 min window centered on the observation time with each observation
being counted once. However, chase crews tended to remain with a storm
for an extended period of time taking frequent observations and
producing a similar number of events per storm as in the Smart study.
Smart included hailstones as small as 3-6 mm, therefore, the
statistical results from the hail size threshold >6 mm should be
comparable. After the modifications are made to the Smart study, good
agreement is found with the results from this study with maximum
differences of <0.21 among the CSI, POD, and FAR for both the NHail and
NHailPH algorithms using the 30 dBZ influence region (compare Table 4
to Table A4). The CSI and POD scores found in this study are better
than those found by Smart; FAR scores are comparable.
Using Oklahoma and Florida data, Witt (1993) evaluated the NHail and
NHailPH algorithms for severe hailstorms and non-severe storms using
Storm Data hail reports as the verification data set. All storm cells
identified by the NEXRAD SSA algorithm were assigned a
precipitation-type designation such that any storm cell without a
corresponding hail report was assumed to be a non-severe (i.e.,
non-hail) storm. Witt used a 60 min time window (-45 to +15 min) with
t=0 min being the time of the verification report. The rule set for
filling the cells of the 2 X 2 contingency table varied over the 60 min time
window. For the -15 to +5 min window, both hits (YY) and misses (NY)
were accumulated in the contingency table. For the -45 to -16 and +6 to
+15 min time intervals, only hits were recorded. False alarms (YN) were
recorded at t=0 min whenever a positive algorithm indication of hail
occurrence existed but no corresponding hail report was found. This
scoring scheme tends to inflate the YY cell of the contingency table
and favor higher CSI and POD scores and lower FAR scores, particularly
for long-lived hailstorms. For the NHail and NHailPH algorithms, Witt
found the CSI=0.47 and 0.54, the POD=0.66 and 0.84, and the FAR=0.38
and 0.39, respectively. We attribute Witt's higher CSI scores (by 0.25
and 0.32), higher POD (by 0.06 and 0.03), and lower FAR (by 0.37 and
0.38) principally to the inflation of the YY cell in the contingency
table.
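The sensitivity of the scores to the YY cell can be sketched in Python using the standard contingency-table definitions of POD, FAR, and CSI, which are assumed here to match those used in this report. The counts are hypothetical and are not taken from either study.

```python
def scores(yy, ny, yn):
    """POD, FAR and CSI from 2 x 2 contingency table counts.

    yy: hail predicted and observed (hits)
    ny: hail observed but not predicted (misses)
    yn: hail predicted but not observed (false alarms)
    """
    pod = yy / (yy + ny)
    far = yn / (yy + yn)
    csi = yy / (yy + ny + yn)
    return pod, far, csi

# Hypothetical counts with one hit per hail report.
base = scores(yy=20, ny=10, yn=20)
# Same storms, but with 30 extra hits accumulated over a long time window
# in which misses and false alarms are not recorded.
inflated = scores(yy=20 + 30, ny=10, yn=20)

for label, (pod, far, csi) in (("base", base), ("inflated", inflated)):
    print(f"{label}: POD={pod:.2f} FAR={far:.2f} CSI={csi:.2f}")
```

Adding hits without adding misses or false alarms raises POD and CSI and lowers FAR, which is the direction of the differences discussed above.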
4.2 Results from the NSSL POSH and POH Algorithms
The verification data for the NSSL POH and POSH algorithms are included
in Figs. 9-12 and Tables A1-A5 of Appendix A. Statistical results from
the 15 km influence radius are used in the discussion. Recall that the
POH algorithm is designed specifically for hail of any size, while the
POSH algorithm is designed specifically for severe hail (>19 mm). The
POH, NHail, and NHailPH are comparable algorithms. However, to
ascertain how well the POH and POSH algorithms perform their respective
functions and to check for redundancy, the results for the POSH
algorithm at the smaller size thresholds for defining hail events and
the POH algorithm at larger size thresholds have been tabulated for
comparison.
Inspection of Fig. 9a and Table A3 reveals that the skill of the POH
algorithm for hail of any size, as measured by the CSI score, is
relatively constant (decreasing from 0.89 to 0.81) over the probability
threshold interval of 10 to 80%. At higher probability thresholds, the
CSI rapidly decreases. The POH algorithm has considerably greater skill
for identifying hail of any size than the NEXRAD algorithm for both the
NHail and NHailPH categories. When the POSH algorithm is applied to
hail of any size, its CSI decreases steadily from 0.85 at 10%
probability to 0.46 at a probability of 80%. The difference between the
two algorithms ranges from 0.03 at 10% to 0.35 at 80%. Thus, the POH
algorithm also outperforms the POSH algorithm for any size hail.
The discriminant function for the POH (Fig. 13) shows that the majority
of the algorithm predictions for "Hail" or "No Hail" are at the 0% and
>80% probability categories, while the discriminant function for the
POSH (Fig. 14) shows more algorithm predictions than POH are at
intermediate probabilities. Consequently, the POH algorithm is sharper
than POSH, with fewer of the POH predictions changing from "Hail"
to "No Hail" as the probability threshold increases. As a result, a
near-constant slope is attained by the POH. This effect is less
striking at the two larger hail size categories. The general sharpness
exhibited by the two algorithms may be partly a consequence of
selecting the maximum algorithm prediction within each influence
region. A numeric tabulation of the distribution of precipitation
events for the NEXRAD and NSSL algorithms is given in Tables 5-7.
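The link between sharpness and a flat CSI curve can be sketched as follows. The forecast lists are hypothetical (not data from this study), and treating a prediction as "Hail" when its probability is greater than or equal to the threshold is an assumption about the thresholding rule.

```python
def csi_by_threshold(forecasts, thresholds):
    """CSI at each probability threshold.

    forecasts: list of (probability_percent, hail_observed) pairs.
    A forecast counts as "Hail" when its probability >= threshold (assumed).
    """
    out = {}
    for thr in thresholds:
        yy = sum(1 for p, obs in forecasts if p >= thr and obs)
        ny = sum(1 for p, obs in forecasts if p < thr and obs)
        yn = sum(1 for p, obs in forecasts if p >= thr and not obs)
        denom = yy + ny + yn
        out[thr] = yy / denom if denom else float("nan")
    return out

thresholds = range(10, 101, 10)

# Sharp forecasts: probabilities cluster at 0% and 90%.
sharp = [(90, True)] * 8 + [(0, True)] + [(0, False)] * 3 + [(90, False)]
# Less sharp forecasts: hail events spread over intermediate probabilities.
spread = ([(p, True) for p in (20, 30, 40, 50, 60, 70, 80, 90)]
          + [(0, True)] + [(0, False)] * 3 + [(90, False)])

sharp_csi = csi_by_threshold(sharp, thresholds)
spread_csi = csi_by_threshold(spread, thresholds)
for thr in thresholds:
    print(thr, round(sharp_csi[thr], 2), round(spread_csi[thr], 2))
```

In this toy case the sharp forecasts keep the same CSI up to the 90% threshold, while the CSI of the spread forecasts falls steadily as predictions flip from "Hail" to "No Hail".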
For hail >0 mm (Fig. 9a and Table A4), the CSI score for the POH
algorithm is greater than that for the NHail algorithm at all algorithm
probability thresholds and is greater than the NHailPH algorithm except
at the 90% and 100% probability thresholds. The large improvement
demonstrated by the POH shows it is a better algorithm for detecting
hail of any size.
In general, as the size threshold for defining hail increases, the CSI
scores decrease for the NSSL algorithms, like those for the NEXRAD
algorithm. The decrease is attributable to a rapid increase in FAR.
Notice also that the range in CSI values among all algorithms decreases
significantly as the hail size threshold increases. As the size
threshold for defining hail events increases, the performance of the
POSH algorithm relative to the POH algorithm steadily improves. In
fact, for severe hail (Fig. 12a), the POSH algorithm outperforms all
others. This result is consistent with the design of the algorithm.
Also, the slope of the CSI line for the POSH becomes positive. As
mentioned earlier, the HSS scores (Figs. 9c-12c) behave similarly to
the CSI scores.
Witt (1993) evaluated the NSSL POSH algorithm for severe hail and
non-hail events using the methodology discussed in Section 4.1 but
using the NSSL SCIT algorithm for storm cell identification. Using a
50% probability threshold, he found the POSH CSI=0.71, POD=0.86 and the
FAR=0.21. This study found the CSI=0.29, POD=0.94, and the FAR=0.71 at
the 50% probability threshold and for the 15 km influence radius. We
attribute the greater skill found by Witt primarily to the variable
rule set used for matching precipitation observations to the algorithm
designations.
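As a consistency check on the quoted scores, note that with the standard definitions the three indices are not independent: 1/CSI = 1/POD + 1/(1 - FAR) - 1. The short Python sketch below applies this identity to the POSH values quoted above; the function name is illustrative only.

```python
def csi_from_pod_far(pod, far):
    """CSI implied by POD and FAR.

    Follows from POD = YY/(YY+NY), FAR = YN/(YY+YN), CSI = YY/(YY+NY+YN):
    1/CSI = 1/POD + 1/(1 - FAR) - 1.
    """
    return 1.0 / (1.0 / pod + 1.0 / (1.0 - far) - 1.0)

# (POD, FAR, reported CSI) at the 50% probability threshold, from the text.
reported = {
    "Witt (1993) POSH": (0.86, 0.21, 0.71),
    "This study POSH": (0.94, 0.71, 0.29),
}
for name, (pod, far, csi) in reported.items():
    implied = csi_from_pod_far(pod, far)
    print(f"{name}: implied CSI={implied:.3f}, reported CSI={csi:.2f}")
```

The implied and reported CSI values agree to within about 0.01, which is consistent with rounding of the published POD and FAR.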
Similar to the CSI curves, the POD curves for the POH algorithm (Figs.
9b-12b) are also nearly flat to the 80% probability threshold and then
decrease rapidly. For hail of any size, the POH POD is generally
greater than 0.80. The POH POD is always greater than the NHail POD;
and except for the 100% probability threshold at the three smallest
size categories, the POH PODs are greater than the NHailPH values. As
expected, the POSH PODs are largest for hail >19 mm. Although the POSH
PODs improve relative to POH as hail size increases and they become
quite large (i.e., 0.94 for hail >19 mm and a probability threshold of
50%), they remain less than those for POH.
Figures 9d-12d show that the FARs for all algorithms increase
dramatically as the size threshold for defining hail increases. The
range in FAR values among algorithms remains small. For hail >0 mm, the
typical FAR is about 0.05; for severe hail, the typical FAR is about
0.75. At the higher probability of hail thresholds, the NSSL POSH
algorithm tends to have the lowest FAR. To examine the possible
contribution to the FAR of the 95 observed events from rainstorms
(i.e., rain-only events), Table 8 shows the distribution of these
events for the hail predicted/not observed category (i.e., the YN cell)
of the contingency table. The POSH algorithm, unlike the POH, has no
rain-only events at probabilities in excess of 50% and is a better
discriminator of rain-only events than the POH; NHail and NHailPH have
7 and 9 rain-only events, respectively, that contribute to the FAR.
The MSE curves (Figs. 9e-12e and Table A4) show that, for hail of any
size, the POH algorithm has the lowest error except at 90% and 100%
probability where it is greater by 0.03 and 0.12, respectively, than
the NHailPH algorithm. Hence, the POH algorithm represents a
significant improvement over the NEXRAD algorithm. For hail size
thresholds >13 mm and >19 mm, the MSE for the POSH algorithm has a mean
value and a 50% probability threshold value that are roughly equivalent
to those of the NEXRAD algorithms but lower than those of the POH algorithm.
Reliability diagrams are constructed (see Section 3.4 for discussion)
to examine the extent to which the NSSL algorithm forecast probabilities
match the observed frequencies of hail (Figs. 15-16). For hail events >0
and >6 mm, the POSH algorithm under-forecasts the occurrence of hail
(Fig. 15a-b). For hail size categories >13 and >19 mm, the POSH
algorithm over-forecasts the occurrence of hail. The reliability
diagram suggests that the algorithm would be optimal for detecting hail
with a minimum size between 6 and 13 mm. Although the number of hail
events in some of the probability categories is small (see Figs. 13 and
14), the POH algorithm (Fig. 16) clearly approaches the 45° line for
hail >6 mm, under-forecasts for hail of any size, and over-forecasts
for the two largest hail size thresholds.
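The reading of a reliability diagram described above can be sketched in Python. The sample forecasts are hypothetical, and the function names are illustrative; the sketch only shows how observed frequency per probability category is compared with the 45° line.

```python
from collections import defaultdict

def reliability(forecasts):
    """Observed hail frequency for each forecast probability category.

    forecasts: list of (probability_percent, hail_observed) pairs.
    Returns {probability_percent: (observed_frequency, n_events)}.
    """
    bins = defaultdict(list)
    for p, obs in forecasts:
        bins[p].append(1 if obs else 0)
    return {p: (sum(v) / len(v), len(v)) for p, v in sorted(bins.items())}

def bias_label(prob_percent, observed_freq):
    """Classify one point relative to the 45-degree line."""
    forecast = prob_percent / 100.0
    if observed_freq > forecast:
        return "under-forecast"   # hail occurs more often than predicted
    if observed_freq < forecast:
        return "over-forecast"    # hail occurs less often than predicted
    return "reliable"

# Hypothetical sample: ten forecasts at 30% and ten at 70%.
sample = ([(30, True)] * 6 + [(30, False)] * 4
          + [(70, True)] * 2 + [(70, False)] * 8)
rel = reliability(sample)
for p, (freq, n) in rel.items():
    print(p, freq, n, bias_label(p, freq))
```

Because the number of events in some probability categories is small, the event count returned for each category is worth inspecting before interpreting a point.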
The NEXRAD Hail algorithm performed best for hail of any size and a 30
dBZ influence region (CSI=0.47). As noted in previous studies, changing
the Probable Hail designation to a prediction of Hail resulted in
improved performance over NEXRAD Hail alone (CSI=0.72). Results shown
here are slightly better than those in the Smart (1985) study of the
NEXRAD algorithms in Colorado.
The NSSL POH algorithm, like NEXRAD, was designed to detect small hail
and had its best CSI performance at the smaller hail size categories
with typical CSI values between 0.80 and 0.90. The POH performed
significantly better than the NEXRAD Hail or Hail + Probable Hail
algorithms by all measures except perhaps the FAR where all results
were similar. For hail of any size (>0 mm), the POH under-forecasts the
probability of hail occurrence.
When the results for the POSH algorithm are compared to those for the
algorithms designed to detect hail of any size but applied to severe
hail, the POSH algorithm has the highest CSI and HSS scores. At the 50%
probability threshold and for a 15 km influence radius, the CSI is
0.29. Further, the POSH algorithm has a higher POD and lower FOM than
the NEXRAD algorithm for the NHail and NHailPH cases. However, the
highest POD and lowest FOM for severe hail were attained by the POH
algorithm. The POSH algorithm was comparable in performance to the
NEXRAD algorithm in terms of the MSE and the FAR. For Colorado, the
POSH algorithm over-forecasts the probability of severe hail. Further,
algorithm hail probabilities seem optimal for hail with a minimum size
between 6 and 13 mm. Overall, our results were not as spectacular as
those found by Witt (1993) in Oklahoma and Florida. We believe this
result is due in part to differences in matching the precipitation
observations to the algorithm designations. We conclude, however, that
the NSSL POH algorithm represents a significant improvement over the
NEXRAD algorithm for detecting all hail events and that the NSSL POSH
algorithm exhibits some skill in detecting severe hail events.
Acknowledgments
Special thanks to Mr. J. Smith for data archival, quality control of
the verification data, and figure preparation, to Ms. L. Carson and Dr.
M. Dixon for their correlation program and for help in understanding TITAN, and
to Mr. E. Jeannette for data archival, all from NCAR. Thanks to Ms. C.
Mueller, Ms. B. Brown, and Mr. J. Wilson, all of NCAR, for discussions
on statistical analysis and for reviewing this paper. Thanks also to
Mr. A. Witt, Mr. J. Johnson and Mr. M. Eilts, all of NSSL, for
installation of the NSSL HDA program and for answering questions; to
Mr. R. Lipschutz, FSL, for providing the NEXRAD data; to Mr. L. Mooney,
NWS, for providing the NWS severe weather reports; to Mr. J. Wirshborn
and Mr. B. Bernstein for their organization of the VHN; to Ms. R.
Swindle, NCAR, for maintaining the NSSL HDA program at RAP and for
setting up archival procedures; and to Dr. P. Neilley for installing
the GPS systems. Special thanks to Ms. N. Knight, NCAR, and to the many
participants in the RAPS 92-93 Hail Projects whose dedication and
enthusiasm ensured a high quality verification data set was collected.
The radar data set was collected by the NCAR Atmospheric Technology
Division. This research is sponsored by the National Science Foundation
through an Interagency Agreement in response to requirements and
funding by the Federal Aviation Administration's Aviation Weather
Development Program. The views expressed are those of the authors and
do not necessarily represent the official policy or position of the
U.S. Government.
References
The Sunday Camera, 1994: Hail shatters plane's windshield in flight.
The Boulder Daily Camera, 2 Oct. 1994, Boulder, CO, p. 3B.
Dixon, M., and G. Wiener, 1993: TITAN: Thunderstorm Identification,
Tracking, Analysis, and Nowcasting - A radar-based methodology. J.
Atmos. Ocean. Tech., 10, 785-797.
Donaldson, R.J., R.M. Dyer, and M.J. Kraus, 1975: An objective
evaluator of techniques for predicting severe weather events.
Preprints, 9th Conference on Severe Local Storms, Norman, OK, Amer.
Meteor. Soc., 321-326.
Doswell, C.A., III, R. Davies-Jones, and D.L. Keller, 1990: On summary
measures of skill in rare event forecasting based on contingency
tables. Weather and Forecasting, 5, 576-585.
Flight Safety Foundation News, 1993: International panel of aviation
experts delivers safety messages to European and regional operators,
Flight Safety Foundation News, 32.
Harvey, Jr., L.O., K.R. Hammond, C.M. Lusk, and E.F. Mross, 1992: The
application of signal detection theory to weather forecasting behavior,
Mon. Wea. Rev., 120, 863-883.
Neilley, P.P., N.A. Crook, E.A. Brandes, M. Dixon, C. Kessinger, C.
Mueller, R. Roberts, and J. Tuttle, 1993: RAPS92 - Realtime analysis
and prediction of storms, 1992. Preprints, 26th Radar Meteor. Conf.,
Norman, OK, 24-28 May 1993, Amer. Meteor. Soc., Boston, 135-137.
Petrocchi, P.J., 1982: Automatic detection of hail by radar. AFGL,
Tech. Report 82-0277, 33 pp.
Pratte, J.F., J.H. Van Andel, D.G. Ferraro, R.W. Gagnon, S.M. Maher, and
G.L. Blair, 1991: NCAR's Mile High meteorological radar. Preprints,
25th International Conf. on Radar Meteor., 24-28 June 1991, Paris,
France, Amer. Meteor. Soc., 863-866.
Smart, J.R., and R.L. Alberty, 1984: An evaluation of the performance
of the NEXRAD hail algorithm. Preprints, 22nd Radar Meteor. Conf.,
Zurich, Amer. Meteor. Soc., Boston, 202-207.
Smart, J.R., 1985: Performance evaluation of the NEXRAD hail algorithm
applied to Colorado thunderstorms. NOAA Tech. Memo. ERL ESG-18, 29
pp.
Smart, J.R., and R.L. Alberty, 1985: The NEXRAD hail algorithm applied
to Colorado thunderstorms. Preprints, 14th Conf. Severe Local Storms,
Indianapolis, Amer. Meteor. Soc., Boston, 244-247.
Stanski, H.R., L.J. Wilson, and W.R. Burrows, 1989: Survey of Common
Verification Methods in Meteorology. 2nd Edition, Atmospheric
Environment Service, Downsview, Ontario, Canada, 112 pp.
Waldvogel, A., W. Schmid, and B. Federer, 1978a: The kinetic energy of
hailfalls. Part I: Hailstone spectra. J. Appl. Meteor., 17, 515-520.
Waldvogel, A., B. Federer, W. Schmid, and J.F. Mezeix, 1978b: The
kinetic energy of hailfalls. Part II: Radar and hailpads. J. Appl.
Meteor., 17, 1680-1693.
Waldvogel, A., B. Federer and P. Grimm, 1979: Criteria for the
detection of hail cells. J. Appl. Meteor., 18, 1521-1525.
Waldvogel, A., and W. Schmid, 1982: The kinetic energy of hailfalls.
Part III: Sampling errors inferred from radar data. J. Appl. Meteor.,
21, 1228-1238.
Witt, A., 1990: A hail core aloft detection algorithm. Preprints, 16th
Conf. Severe Local Storms and Conf. Atmos. Electr., Kananaskis Park,
Amer. Meteor. Soc., Boston, 232-235.
Witt, A., 1993: Comparison of the performance of two hail detection
algorithms using WSR-88D data. Preprints, 26th Radar Meteor. Conf.,
Norman, OK, Amer. Meteor. Soc., Boston, 154-156.
Witt, A., and J.T. Johnson, 1993: An enhanced storm cell identification
and tracking algorithm. Preprints, 26th Radar Meteor. Conf., Norman,
OK, Amer. Meteor. Soc., Boston, 141-143.
3.0 Verification Methodology
3.1 Data collection
4.0 Discussion of Results
An activity summary from the RAPS-92 and RAPS-93 Hail Projects is
compiled in Table 3. Five days from 1992 and 20 from 1993 are used.
Sixteen days are characterized with severe and non-severe hail events,
4 days have only non-severe hail events, and 5 days have no hail
events. A total of 97 hailstorms and 68 rainstorms have verification
data. A storm, as defined by TITAN at a 40 dBZ threshold, may exist for
multiple radar volumes and may contain multiple hail or rain events.
The verification data classify the storm as a hailstorm or a rainstorm.
Of the 97 hailstorms documented, a total of 237 hail events were
recorded with the maximum hailstone having a diameter of 65 mm and the
smallest hailstone having a diameter of a few millimeters. Of the 237
hail events, 193 are >6 mm, 102 are >13 mm, and 52 are >19 mm. From the
68 rainstorms, a total of 95 rain reports were collected. About half of
the hail reports were collected by the NCAR hail intercept crews, with
the sum of the VHN, NWS, and MSWS reports comprising the other half.
The mobility of intercept crews ensured that a large number of storms
were investigated.
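The event counts above determine how the fraction of "hail" events shrinks as the size threshold rises. The Python sketch below tabulates this, assuming (this is not stated explicitly here) that all 237 hail reports and 95 rain reports are scored together and that hail reports below a given size threshold count as non-events at that threshold.

```python
# Counts taken from the text above.
hail_events = {">0 mm": 237, ">6 mm": 193, ">13 mm": 102, ">19 mm": 52}
rain_reports = 95

# Assumed total number of scored verification events.
total = hail_events[">0 mm"] + rain_reports

base_rate = {}
for label, n_events in hail_events.items():
    non_events = total - n_events   # rain plus hail below the threshold
    base_rate[label] = n_events / total
    print(f"{label}: events={n_events}, non-events={non_events}, "
          f"event fraction={base_rate[label]:.2f}")
```

Under these assumptions the event fraction falls from about 0.71 for hail of any size to about 0.16 for severe hail, so a positive prediction has many more opportunities to become a false alarm at the larger thresholds. This may help explain the rise in FAR with hail size reported in this section.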
5.0 Conclusions
This study has examined three reflectivity-based hail detection
algorithms. Verification data taken over two summers have been compared
to matching algorithm predictions of hail occurrence. Statistical
analysis was applied and indices used to compare algorithm performance.
Acknowledgments
References