Paper in Stat: Stagewise Boosting Distributional Regression

Research

Statistical Modeling

Computational Statistics

Published

January 1, 2026

Modern probabilistic forecasting requires modeling not only the mean of a response but its full distribution. Distributional regression models, such as generalized additive models for location, scale and shape (GAMLSS), provide this flexibility, yet reliable variable selection and stable estimation remain challenging in high-dimensional and complex settings. In our new paper, we introduce Stagewise Boosting Distributional Regression, a robust alternative to classical gradient boosting that avoids vanishing gradient problems and improves scalability.

Key Contributions

Semiconstant Stagewise Updates: We adapt forward stagewise regression to distributional regression using sign-based, semiconstant step updates. This prevents vanishing gradients, a common issue in complex distributions (e.g., zero-adjusted negative binomial models), and stabilizes learning.
Best-Subset Updating Across Distributional Parameters: Instead of updating only one parameter per iteration, the proposed method allows simultaneous updates of optimal subsets of distributional parameters, leading to more balanced and efficient estimation.
Correlation Filtering (CF) for Variable Selection: A novel correlation-based early stopping and selection criterion replaces costly cross-validation. CF effectively reduces false positives while maintaining strong predictive performance.
Batchwise Variants for Big Data: A stochastic, batchwise extension enables scalable estimation for massive datasets, avoiding local optima and drastically reducing computation time.

Figure: Vanishing gradient problem in distributional regression. The top panels show classical gradient boosting updates, while the bottom panels display stagewise boosting with a semiconstant step size. For the normal distribution (left), both methods converge quickly to the optimal scale parameter. For the negative binomial model (right), gradient boosting stalls in flat likelihood regions due to vanishing gradients, whereas stagewise boosting continues to make stable progress toward the optimum.
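The contrast between gradient-scaled and sign-based updates can be reproduced in miniature. The following is a minimal sketch, not the paper's algorithm: it fits only the dispersion parameter theta of a negative binomial model (on the log scale) to a small made-up sample, with the mean held fixed. The data, the starting value, the step length of 0.1, and in particular the rule of halving the step whenever the gradient sign flips are all assumptions chosen for illustration; the paper's semiconstant step may be defined differently.

```python
import math

# Toy overdispersed counts (made up for illustration); the mean is held fixed.
y = [0, 0, 0, 1, 0, 5, 0, 2, 0, 12]
mu = sum(y) / len(y)


def nll(eta):
    """Average negative log-likelihood of NB(mu, theta), theta = exp(eta)."""
    theta = math.exp(eta)
    total = 0.0
    for yi in y:
        total += (math.lgamma(yi + theta) - math.lgamma(theta)
                  - math.lgamma(yi + 1)
                  + theta * math.log(theta / (theta + mu))
                  + yi * math.log(mu / (theta + mu)))
    return -total / len(y)


def grad(eta):
    """Exact derivative of nll with respect to eta = log(theta)."""
    theta = math.exp(eta)
    total = 0.0
    for yi in y:
        # digamma(yi + theta) - digamma(theta) is this finite sum for integer yi
        d = sum(1.0 / (theta + k) for k in range(yi))
        d += -math.log1p(mu / theta) + (mu - yi) / (theta + mu)
        total += d
    return -theta * total / len(y)


def gradient_steps(eta, nu=0.1, iters=200):
    """Classical update: step proportional to the gradient."""
    for _ in range(iters):
        eta -= nu * grad(eta)
    return eta


def stagewise_steps(eta, step=0.1, iters=200):
    """Sign-based update with a constant step, halved when the sign flips
    (the halving rule is an assumption of this sketch)."""
    prev = 0
    for _ in range(iters):
        g = grad(eta)
        s = (g > 0) - (g < 0)
        if prev != 0 and s != prev:
            step /= 2
        eta -= step * s
        prev = s
    return eta


# Start deep in the flat region, where the model is nearly Poisson.
eta_gd = gradient_steps(8.0)
eta_sw = stagewise_steps(8.0)
print("gradient steps :", eta_gd, nll(eta_gd))
print("stagewise steps:", eta_sw, nll(eta_sw))
```

In the flat region the gradient is tiny, so the gradient-scaled update barely moves from its starting value, while the sign-based update travels a fixed distance per iteration and reaches the neighborhood of the optimum.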

Simulation Results

An extensive simulation study compares the proposed approach with state-of-the-art methods including gradient boosting, stability selection, and adaptive step-length boosting. Across normal, Gamma, and zero-adjusted negative binomial distributions, the proposed stagewise variants:

Achieve lower false positive rates in variable selection.
Provide competitive or superior predictive performance (CRPS).
Avoid vanishing gradient problems in complex models.
Are dramatically faster in large-scale settings (up to 100x faster in high-dimensional scenarios).
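For readers unfamiliar with the continuous ranked probability score (CRPS) used above, here is a minimal sketch of how it can be computed for a count-valued forecast, using the standard sum-of-squared-differences form between the forecast CDF and the step function at the observation. The function names, the truncation point `k_max`, and the Poisson example are illustrative assumptions, not taken from the paper or its software.

```python
import math


def crps_count(pmf, y, k_max=1000):
    """CRPS of a forecast on {0, 1, 2, ...} for an observed count y.

    Computes sum_k (F(k) - 1{k >= y})^2, truncated at k_max.
    Lower values indicate a better forecast.
    """
    cdf = 0.0
    score = 0.0
    for k in range(k_max + 1):
        cdf += pmf(k)
        indicator = 1.0 if k >= y else 0.0
        score += (cdf - indicator) ** 2
    return score


def poisson_pmf(lam):
    """Return the Poisson(lam) probability mass function (example forecast)."""
    return lambda k: math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


# A forecast centered on the observation scores better than one far from it.
good = crps_count(poisson_pmf(4.0), y=4)
poor = crps_count(poisson_pmf(10.0), y=4)
print(good, poor)
```

Because the score compares whole distributions with the observation, it rewards forecasts that are both well centered and appropriately sharp.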

In challenging zero-inflated count models, classical gradient boosting struggles due to flat likelihood regions, whereas stagewise boosting remains stable and continues to identify relevant predictors.

Application: Lightning Forecasting in Austria

To demonstrate scalability and practical relevance, we model hourly lightning counts across Austria using:

9.1 million observations,
672 candidate covariates (including transformations),
a zero-adjusted negative binomial model.

The stagewise boosting algorithm selects a balanced subset of meteorological predictors and delivers well-calibrated probabilistic forecasts. The approach handles rare-event data efficiently (only 2.65% of observations have positive counts) using a batchwise strategy with intercept correction.

The resulting model provides spatially resolved probabilistic forecasts of lightning intensity, illustrating the method’s suitability for large-scale environmental applications.

Lightning nowcast example (ZANBI model) for Austria on July 27, 2019 (1-4 PM). The top row shows observed lightning counts; rows 2-4 show model-based exceedance probabilities for three thresholds: >0 (at least 1 strike), >9 (at least 10 strikes), and >19 (at least 20 strikes).
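Exceedance probabilities like those in the figure follow directly from the fitted distribution. The sketch below assumes the ZANBI parameterization used in gamlss.dist, where nu is the probability of a zero count and positive counts follow a zero-truncated negative binomial with mean parameter mu and dispersion sigma. The function names and the parameter values are made up for illustration and are not fitted values from the paper.

```python
import math


def nbi_pmf(k, mu, sigma):
    """Negative binomial type I pmf with variance mu + sigma * mu**2."""
    size = 1.0 / sigma
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mu))
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)


def zanbi_exceedance(threshold, mu, sigma, nu):
    """P(Y > threshold) under a zero-adjusted NB model, threshold >= 0.

    P(Y = 0) = nu; for y > 0 the mass is (1 - nu) times the
    zero-truncated negative binomial probability.
    """
    p0 = nbi_pmf(0, mu, sigma)
    nb_cdf = sum(nbi_pmf(k, mu, sigma) for k in range(threshold + 1))
    return (1.0 - nu) * (1.0 - nb_cdf) / (1.0 - p0)


# Illustrative parameters for one grid cell and hour (not from the paper).
mu, sigma, nu = 5.0, 2.0, 0.97
p_any = zanbi_exceedance(0, mu, sigma, nu)    # at least 1 strike
p_10 = zanbi_exceedance(9, mu, sigma, nu)     # at least 10 strikes
p_20 = zanbi_exceedance(19, mu, sigma, nu)    # at least 20 strikes
print(p_any, p_10, p_20)
```

With a threshold of 0 the expression reduces to 1 - nu, so the probability of any lightning is governed entirely by the zero-adjustment parameter.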

Enabling Scalable and Stable Distributional Regression

Distributional regression models are essential for modern probabilistic modeling in areas such as climate science, environmental risk assessment, epidemiology, and economics. However, classical boosting approaches can fail in complex or high-dimensional settings due to vanishing gradients and expensive tuning procedures.

Our stagewise boosting framework:

Enhances numerical stability,
Improves variable selection,
Reduces computational burden,
Scales to millions of observations,
And maintains full probabilistic interpretability.

The methods are implemented in the R package stagewise, supporting all distributions from gamlss.dist.