Thunderstorm forecasting with GAMs

A new paper in Monthly Weather Review (MWR) presents boosted binary generalized additive models (GAMs), with stability selection and corresponding MCMC-based credible intervals, as a probabilistic method for forecasting the occurrence of thunderstorms.


Thorsten Simon, Peter Fabsic, Georg J. Mayr, Nikolaus Umlauf, Achim Zeileis (2018). “Probabilistic Forecasting of Thunderstorms in the Eastern Alps.” Monthly Weather Review, 146(9), 2999–3009. doi:10.1175/MWR-D-17-0366.1


A probabilistic forecasting method to predict thunderstorms in the European eastern Alps is developed. A statistical model links lightning occurrence from the ground-based Austrian Lightning Detection and Information System (ALDIS) detection network to a large set of direct and derived variables from a numerical weather prediction (NWP) system. The NWP system is the high-resolution run (HRES) of the European Centre for Medium-Range Weather Forecasts (ECMWF) with a grid spacing of 16 km. The statistical model is a generalized additive model (GAM) framework, which is estimated by Markov chain Monte Carlo (MCMC) simulation. Gradient boosting with stability selection serves as a tool for selecting a stable set of potentially nonlinear terms. Three grids from 64 × 64 to 16 × 16 km² and five forecast horizons from 5 days to 1 day ahead are investigated to predict thunderstorms during afternoons (1200–1800 UTC). Frequently selected covariates for the nonlinear terms are variants of convective precipitation, convective potential available energy, relative humidity, and temperature in the midlayers of the troposphere, among others. All models, even for a lead time of 5 days, outperform a forecast based on climatology in an out-of-sample comparison. An example case illustrates that coarse spatial patterns are already successfully forecast 5 days ahead.


Case study

Predicting thunderstorms in complex terrain (like the Austrian Alps) is challenging because one of the main forecasting tools, NWP systems, cannot fully resolve convective processes or the circulations and exchange processes over complex topography. However, with a boosted binary GAM based on a broad range of NWP outputs, useful forecasts can be obtained up to 5 days ahead. As an illustration, lightning activity for the afternoon of 2015-07-22 is shown in the top-left panel below, indicating thunderstorms in many areas in the west but not in the east. While the corresponding baseline climatology (top middle) has a low probability of thunderstorms for the entire region, the NWP-based probabilistic forecasts (bottom row) highlight increased probabilities already 5 days ahead and become much more clear-cut when moving to 3 days and 1 day ahead.

Figure: Observed and forecasted occurrence of thunderstorms on 2015-07-22.

More precisely, the probability of thunderstorms is predicted with a binary logit GAM that allows for potentially nonlinear smooth effects of all NWP variables considered. The relevant variables are selected by gradient boosting coupled with stability selection. The effects and 95% credible intervals of the model for day 1 are then estimated via MCMC sampling and shown below (on the logit scale). The number in the bottom-right corner of each panel indicates the absolute range of the effect. The x-axes are cropped at the 1% and 99% quantiles of the respective covariate to make the plots easier to read.
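
The selection step can be illustrated with a minimal Python sketch. It is not the code used in the paper: cubic polynomial base learners stand in for smooth terms, and the function names (poly_basis, boost_first_q, stability_selection), the subsample size of n/2, the number of terms q, the 0.8 threshold, and the simulated data are all illustrative assumptions. The idea is to run componentwise gradient boosting on many random subsamples, record which covariates enter the model first, and keep only those that are selected in a large share of the subsamples.

```python
import numpy as np


def poly_basis(x):
    """Intercept plus cubic polynomial in standardized x (crude stand-in for a spline basis)."""
    z = (x - x.mean()) / x.std()
    return np.column_stack([np.ones_like(z), z, z**2, z**3])


def boost_first_q(X, y, q, nu=0.1, max_iter=500):
    """Componentwise gradient boosting for a binary logit model.

    Stops as soon as q distinct covariates have been selected and returns their indices.
    """
    n, p = X.shape
    bases = [poly_basis(X[:, j]) for j in range(p)]
    p0 = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    eta = np.full(n, np.log(p0 / (1 - p0)))  # start from the intercept-only model
    selected = set()
    for _ in range(max_iter):
        # Negative gradient of the Bernoulli negative log-likelihood w.r.t. eta
        grad = y - 1.0 / (1.0 + np.exp(-eta))
        best_j, best_fit, best_rss = -1, None, np.inf
        for j, B in enumerate(bases):
            coef = np.linalg.lstsq(B, grad, rcond=None)[0]
            fit = B @ coef
            rss = np.sum((grad - fit) ** 2)
            if rss < best_rss:
                best_j, best_fit, best_rss = j, fit, rss
        eta += nu * best_fit  # update only the best-fitting term, shrunk by step length nu
        selected.add(best_j)
        if len(selected) >= q:
            break
    return selected


def stability_selection(X, y, n_subsamples=50, q=3, threshold=0.8, seed=0):
    """Selection frequencies over random half-samples and the covariates above the threshold."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        for j in boost_first_q(X[idx], y[idx], q=q):
            counts[j] += 1
    freq = counts / n_subsamples
    return freq, np.flatnonzero(freq >= threshold)


# Simulated example: covariates 0 (linear) and 1 (quadratic) matter, the rest are noise.
rng = np.random.default_rng(42)
n, p = 1000, 6
X = rng.standard_normal((n, p))
eta_true = 2.0 * X[:, 0] + 2.0 * (X[:, 1] ** 2 - 1.0)
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

freq, stable = stability_selection(X, y, n_subsamples=20, q=2, threshold=0.8)
print("selection frequencies:", np.round(freq, 2))
print("stable covariates:", stable)
```

In this sketch, q limits how many distinct terms each boosting run may select, so that noise covariates rarely reach a high selection frequency. Both q and the threshold are tuning choices here and are not taken from the paper.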

Figure: Stability-selected effects of the boosted binary logit GAM.
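
How pointwise credible intervals for an effect arise from MCMC draws can be sketched in Python as follows. This is a simplified illustration, not the sampler used in the paper: it assumes a single covariate, a quadratic basis in place of a penalized spline, a Gaussian prior with a hypothetical variance of 10, and plain random-walk Metropolis updates. The names log_posterior and random_walk_metropolis, the tuning values, and the simulated data are made up for the example.

```python
import numpy as np


def log_posterior(beta, B, y, prior_var=10.0):
    """Bernoulli (logit link) log-likelihood plus an independent Gaussian log-prior."""
    eta = B @ beta
    log_lik = np.sum(y * eta - np.logaddexp(0.0, eta))
    log_prior = -0.5 * np.sum(beta ** 2) / prior_var
    return log_lik + log_prior


def random_walk_metropolis(B, y, n_draws=5000, step=0.05, seed=0):
    """Random-walk Metropolis sampler for the coefficients of a logit model."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(B.shape[1])
    lp = log_posterior(beta, B, y)
    draws = np.empty((n_draws, beta.size))
    for t in range(n_draws):
        proposal = beta + step * rng.standard_normal(beta.size)
        lp_prop = log_posterior(proposal, B, y)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < lp_prop - lp:
            beta, lp = proposal, lp_prop
        draws[t] = beta
    return draws


# Simulated data with one covariate and a smooth nonlinear effect on the logit scale.
rng = np.random.default_rng(1)
x = rng.standard_normal(500)
eta_true = -0.5 + 1.5 * x - 0.5 * x ** 2
y = (rng.uniform(size=x.size) < 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

B = np.column_stack([np.ones_like(x), x, x ** 2])
draws = random_walk_metropolis(B, y)
kept = draws[1000:]  # discard burn-in

# Evaluate the effect (without intercept) on a grid cropped at the 1% and 99% quantiles.
grid = np.linspace(np.quantile(x, 0.01), np.quantile(x, 0.99), 50)
B_grid = np.column_stack([grid, grid ** 2])
effect = kept[:, 1:] @ B_grid.T                 # one curve per posterior draw
effect -= effect.mean(axis=1, keepdims=True)    # center each curve over the grid

post_mean = effect.mean(axis=0)
lower, upper = np.quantile(effect, [0.025, 0.975], axis=0)  # pointwise 95% interval
```

Plotting post_mean together with lower and upper against grid gives an effect display on the logit scale. Each draw yields a whole curve, and the interval at each grid point is read off the 2.5% and 97.5% quantiles across draws.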

(Note: As the data cannot be shared freely, the customary replication materials unfortunately cannot be provided.)