A severe-weather warning is not one thing. A warning is a bundle of decisions made by a forecaster — event type, hail estimate, wind estimate, damage threat tags, geometry — wrapped into a single CAP feed message and pushed out the door in seconds. Two warnings with the same headline can describe wildly different events on the ground. One is a 0.75-inch nuisance hailer that will dent gutters. The other is a 3-inch baseball-hail catastrophe that will total roofs across a half-mile corridor.

For a roofing company, treating those two events as equivalent is the difference between a productive week and a wasted one. That is why every warning that crosses our pipe gets pushed through a single, consistent scoring model — and why the score is the first thing that decides whether the event ever reaches a customer at all.

What the score actually is

The model produces two things: a 0–100 severity score and a tier. The score is the continuous number — useful for sorting, ranking, and weighting. The tier is the categorical label — useful for product surfaces, alerts, and the four-color heatmap on the live map. Every storm-belt warning we ingest gets both.

The four tiers, in order:

Extreme (score 80–100): Giant hail, destructive straight-line winds, tornado emergencies, or any warning carrying an Impact-Based Warning “Destructive” tag. These are the events that drive insurance claim volume.
Very Severe (score 60–79): Significant hail (typically 2″+) or significant winds (75 mph+), or any warning carrying an IBW “Considerable” tag. Roof-damage probability is high across the polygon.
Severe (score 40–59): Standard severe thunderstorm criteria — 1″+ hail or 58 mph+ winds. Damage is real but selective; outcome depends heavily on local roof age and exposure.
Watch (score under 40): Sub-severe events, marginal warnings, and watch-only alerts. Surfaced for context, not for chase prioritization.
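
As a rough illustration, the tier lookup can be sketched in a few lines of Python. The function name `tier_for_score` is hypothetical, and the handling of fractional scores (anything below 80 stays Very Severe, and so on) is an assumption, since the cutoffs above are given as whole numbers.

```python
def tier_for_score(score: float) -> str:
    """Map a 0-100 severity score to one of the four tiers listed above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 80:
        return "Extreme"
    if score >= 60:
        return "Very Severe"
    if score >= 40:
        return "Severe"
    # Everything under 40 is surfaced for context only.
    return "Watch"
```

Checking from the highest cutoff downward keeps each boundary in exactly one place, so a recalibrated cutoff is a one-line change.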

The cutoffs are not arbitrary. They were calibrated against five years of paired warning + claim data, looking for the thresholds where claim probability per affected parcel makes a non-linear jump. The biggest jump sits between Severe and Very Severe — which is why the heatmap recolors hard at that line.

The three inputs that move the score the most

Most of the 0–100 number is driven by three inputs, in roughly this order of weight:

1. Peak hail size. Hail is the dominant roof-damage mechanism in the central and southern storm belts, and it scales non-linearly with diameter. A 2″ stone has roughly eight times the mass of a 1″ stone, and therefore roughly eight times the kinetic energy at the same fall speed; because larger stones also fall faster, the real gap is wider still. The model reflects that: the hail-size contribution to the score is non-linear, with steep slopes at the 1.75″, 2.5″, and 3″ thresholds where roof material failure modes change.

2. Peak wind gust. Wind drives the secondary damage layer — torn shingles, lifted ridge caps, blown flashing — plus the bulk of commercial roofing exposure (built-up roofs and single-ply membranes care more about wind than they do about hail, up to a point). Gusts at 70 mph and above pull the score up materially; gusts above 90 mph push almost any warning into the top tier on their own.

3. IBW damage threat tags. The National Weather Service began adding explicit Impact-Based Warning damage threat tags to severe thunderstorm warnings in 2021, and they have become the single highest-signal field in the feed. A “Destructive” tag is a forecaster saying, in plain language, that this event will damage structures. We weight that tag heavily, enough that a Destructive tag alone can push a warning into Extreme territory regardless of the raw hail and wind numbers.

Hail size and wind speed alone are not enough to score a warning correctly. The IBW damage tags are the closest thing we have to a forecaster’s expert judgment about how bad the event will actually be — and the model treats them that way.
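
One way to picture how the three inputs interact is the minimal sketch below. It is not the production model. The knot values, the `max` combination rule, and the names `HAIL_KNOTS`, `WIND_KNOTS`, `IBW_FLOOR`, and `severity_score` are all assumptions, chosen only so the results line up with the tier descriptions above. The hail curve is piecewise-linear with kinks at 1.75″, 2.5″, and 3″, and an IBW tag acts as a floor under the magnitude score.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical (input value, points) knots. The real curves are not published here.
HAIL_KNOTS = [(0.0, 0.0), (1.0, 40.0), (1.75, 52.0), (2.5, 76.0), (3.0, 94.0), (3.5, 100.0)]
WIND_KNOTS = [(0.0, 0.0), (58.0, 40.0), (70.0, 55.0), (75.0, 60.0), (90.0, 80.0), (110.0, 100.0)]

# Assumed floors: a tag guarantees at least the bottom of its tier.
IBW_FLOOR = {"Considerable": 60.0, "Destructive": 80.0}


def interpolate(knots: List[Tuple[float, float]], x: float) -> float:
    """Piecewise-linear interpolation, clamped to the first and last knot."""
    xs = [k[0] for k in knots]
    if x <= xs[0]:
        return knots[0][1]
    if x >= xs[-1]:
        return knots[-1][1]
    i = bisect_right(xs, x)  # index of the first knot strictly to the right of x
    (x0, y0), (x1, y1) = knots[i - 1], knots[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def severity_score(hail_in: float, wind_mph: float, ibw_tag: Optional[str] = None) -> float:
    """Take the stronger of the hail and wind contributions, then apply the IBW floor."""
    magnitude = max(interpolate(HAIL_KNOTS, hail_in), interpolate(WIND_KNOTS, wind_mph))
    return max(magnitude, IBW_FLOOR.get(ibw_tag, 0.0))
```

Under these assumed knots, a warning with 1″ hail and 60 mph wind lands in the Severe band on magnitude alone, while the same warning with a “Destructive” tag is lifted to at least 80.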

The inputs that fine-tune it

Beyond the big three, the model considers several secondary factors that move the score by a few points in either direction:

Event type. Tornado warnings, flash flood warnings, and severe thunderstorm warnings carry different base scores even before the magnitude inputs are applied. A tornado emergency starts much higher than a generic severe thunderstorm warning.
Polygon geometry. Larger warning footprints implicate more parcels but typically average lower per-parcel intensity. The model normalizes for this so a half-county warning is not artificially inflated relative to a tightly-drawn tornado warning.
Time of day. Nighttime severe warnings get a small upweight because nocturnal storms historically correlate with higher casualty and structural-damage rates for the same magnitude inputs.
Source confidence. Warnings with confirmed spotter reports, radar-indicated TVS signatures, or MRMS MESH agreement get a small confidence boost. Warnings issued on weaker signals lose a couple of points.
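
The secondary factors can be pictured as small offsets applied on top of the magnitude score, as in the sketch below. Every number and name here (`EVENT_OFFSET`, `NIGHT_BONUS`, `CONFIDENCE_DELTA`, `apply_adjustments`) is a hypothetical placeholder rather than a real model weight, and polygon normalization is left out because its formula is not described above.

```python
from dataclasses import dataclass

# Hypothetical offsets in score points; the real weights are not stated in this article.
EVENT_OFFSET = {"tornado_emergency": 15.0, "tornado": 5.0, "severe_thunderstorm": 0.0}
NIGHT_BONUS = 2.0
CONFIDENCE_DELTA = 2.0


@dataclass
class WarningContext:
    event_type: str
    is_night: bool
    confirmed: bool  # spotter report, radar TVS signature, or MRMS MESH agreement


def apply_adjustments(magnitude_score: float, ctx: WarningContext) -> float:
    """Nudge a magnitude-based score by event type, time of day, and source confidence."""
    adjusted = magnitude_score + EVENT_OFFSET.get(ctx.event_type, 0.0)
    if ctx.is_night:
        adjusted += NIGHT_BONUS
    adjusted += CONFIDENCE_DELTA if ctx.confirmed else -CONFIDENCE_DELTA
    # Keep the result on the 0-100 scale.
    return max(0.0, min(100.0, adjusted))
```

Clamping at the end means stacked bonuses can never push a warning past 100, which keeps scores comparable across events.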

Why a single number matters

Plenty of weather products show you raw hail size, raw wind gust, raw warning text. The advantage of collapsing all of that into a single 0–100 number is operational: it lets the whole product reason about “how bad was this” consistently, no matter who is looking. Sales managers use the score to triage which storms get the team scrambled. Crews use the tier color on the heatmap to decide which neighborhood to canvass first. Our own pipelines use the score to decide which storms are worth saving snapshots for and which ones are not.

The other reason is comparability across time. Because the same classifier scores every warning archived in the system, you can ask questions that are otherwise impossible — questions like “how many Extreme-tier events did the Plains see this season versus last?” or “has the ratio of Extreme to Severe warnings shifted in the last five years?” The score is what makes the historical record coherent.
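
A season-over-season question like the ones above reduces to counting tiers per season once every archived warning carries a score. The sketch below assumes the archive can be read as `(season, score)` pairs. That record shape, the function names, and the sample records are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def tier_for(score: float) -> str:
    """Same cutoffs as the tier list: 80+, 60+, 40+, otherwise Watch."""
    if score >= 80:
        return "Extreme"
    if score >= 60:
        return "Very Severe"
    if score >= 40:
        return "Severe"
    return "Watch"


def tier_counts_by_season(archive: Iterable[Tuple[int, float]]) -> Dict[int, Counter]:
    """Count warnings per tier for each season in the archive."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for season, score in archive:
        counts[season][tier_for(score)] += 1
    return counts


def extreme_to_severe_ratio(tiers: Counter) -> Optional[float]:
    """Ratio of Extreme to Severe warnings, or None if there were no Severe ones."""
    severe = tiers["Severe"]
    return tiers["Extreme"] / severe if severe else None


# Made-up records for illustration only: (season year, peak score).
sample = [(2023, 85), (2023, 45), (2023, 50), (2024, 90), (2024, 82), (2024, 41)]
by_season = tier_counts_by_season(sample)
for season in sorted(by_season):
    print(season, dict(by_season[season]), extreme_to_severe_ratio(by_season[season]))
```

The comparison only holds because the same cutoffs are applied to every record, old or new.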

What the score is not

To be clear: the score is a severity estimate, not a damage guarantee. A high-tier warning over an empty rural area produces fewer claims than a mid-tier warning over a dense suburban corridor. A storm scoring 92 over houses with brand-new roofs will still produce far less claim volume than the same storm over a neighborhood of 15-year-old asphalt. The score tells you the storm was bad. Combining it with parcel data, roof age signals, and recent permit density is what tells you the opportunity was real.
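
The rural-versus-suburban comparison can be made concrete with a toy calculation. The `opportunity_index` formula below is purely hypothetical, as are the field names and the example numbers. It ignores permit density and is meant only to show why a lower score over more vulnerable roofs can outrank a higher score over very few.

```python
from dataclasses import dataclass


@dataclass
class Footprint:
    score: float            # 0-100 severity score for the warning
    parcels: int            # parcels inside the warning polygon
    aged_roof_share: float  # assumed fraction of roofs old enough to be vulnerable, 0-1


def opportunity_index(fp: Footprint) -> float:
    """Hypothetical ranking index: severity scaled by the number of vulnerable roofs exposed."""
    return (fp.score / 100.0) * fp.parcels * fp.aged_roof_share


# Illustrative numbers only.
rural_extreme = Footprint(score=92, parcels=40, aged_roof_share=0.5)
suburban_very_severe = Footprint(score=65, parcels=3000, aged_roof_share=0.6)

ranked = sorted(
    [("rural_extreme", rural_extreme), ("suburban_very_severe", suburban_very_severe)],
    key=lambda item: opportunity_index(item[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```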

That is the layered model the rest of the product is built on. The score is the front door — everything downstream depends on it being right.

See the model in action

Every warning on the live map is colored by tier — yellow for Severe, orange for Very Severe, red for Extreme — and every saved storm in the system carries its peak score with it. If you want to see what tier the events near you have been hitting, the live map shows the last 24 hours scored in real time, and the storm history shows everything older scored against the same model.

The whole point of building this in-house, instead of leaning on a third-party severity feed, was so the score could be tuned for what roofers actually care about — not what an insurance underwriter or a power utility cares about. The tiers, the cutoffs, and the input weights were all set with one question in mind: does this warning produce roof claims? That is the only question the model is built to answer.

If you want to put it to work, signing up gets you 5 credits on the house and full access to the live tiered map. Or, if you want a quick walk-through of how the score flows from a warning to a property list, our how it works page covers the full path end to end.

Inside the Storm Scoring Model