Paper in Journal of Computational and Graphical Statistics: Scalable Estimation for Structured Additive Distributional Regression
Probabilistic models are crucial in many applications, but fitting distributional regression models to large datasets remains computationally demanding. This study proposes a novel batchwise backfitting algorithm that combines ideas from stochastic gradient descent (SGD) with classical backfitting, allowing estimation on arbitrarily large datasets, even on conventional laptops.
Key Contributions
- A new batchwise backfitting algorithm for structured additive distributional regression, designed for scalability.
- The method leverages stochastic gradient descent and Hessian-based learning rates, ensuring fast convergence with minimal manual tuning.
- It allows for automatic selection of variables and smoothing parameters, reducing overfitting risks.
- Performance is demonstrated through:
  - Simulation studies comparing computation time and accuracy against traditional methods (e.g., gradient boosting).
  - A large-scale case study predicting lightning counts in Austria using 9 million observations and 80 covariates.
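The core idea behind batchwise backfitting can be sketched in a few lines: cycle over mini-batches, and within each batch update every additive term against its partial residuals, with the step implicitly scaled by the batch Hessian. The following is a minimal illustration only, assuming a Gaussian response, a cubic polynomial as a stand-in for spline bases, simulated data, and a fixed damping factor `nu`; none of these specifics come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: the response depends nonlinearly on two covariates.
n = 20_000
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
y = np.sin(np.pi * x1) + 0.5 * x2**2 + rng.normal(0.0, 0.3, n)

def basis(x):
    # Cubic polynomial basis as a simple stand-in for a spline basis.
    return np.column_stack([x, x**2, x**3])

X = [basis(x1), basis(x2)]          # one design matrix per additive term
beta = [np.zeros(3) for _ in X]     # term coefficients
intercept = y.mean()

nu, batch_size = 0.1, 1_000        # damping factor, batch size
for epoch in range(5):
    for start in range(0, n, batch_size):
        idx = slice(start, start + batch_size)
        # Current additive predictor on this batch.
        eta = intercept + sum(Xj[idx] @ bj for Xj, bj in zip(X, beta))
        for j, (Xj, bj) in enumerate(zip(X, beta)):
            Xb = Xj[idx]
            r = y[idx] - (eta - Xb @ bj)      # partial residual for term j
            # Batch Newton target: for a Gaussian likelihood the Hessian is
            # Xb' Xb, so the per-batch step size needs no manual tuning.
            H = Xb.T @ Xb + 1e-6 * np.eye(Xb.shape[1])
            target = np.linalg.solve(H, Xb.T @ r)
            beta[j] = (1 - nu) * bj + nu * target   # SGD-style damped update
            eta += Xb @ (beta[j] - bj)

pred = intercept + sum(Xj @ bj for Xj, bj in zip(X, beta))
mse = float(np.mean((y - pred) ** 2))
```

Each term's update pulls its coefficients toward the batch-level Newton solution, so the "learning rate" is set by the local curvature rather than a hand-tuned schedule, which is the intuition behind the Hessian-based step sizes mentioned above.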
Why This Matters
The proposed algorithm enables researchers and practitioners to fit complex probabilistic models efficiently, overcoming memory limitations that typically restrict structured additive distributional regression models. This is particularly beneficial for fields that require high-dimensional, large-scale data modeling, such as climate science, epidemiology, and econometrics.
Example Application: Lightning Count Prediction
A key application of the method was demonstrated using Austrian lightning detection data. The study models hourly lightning counts across Austria, identifying significant weather-related predictors. The findings highlight the effectiveness of the approach in:
- Handling massive datasets (millions of observations).
- Selecting relevant predictors automatically.
- Providing a full probabilistic forecast rather than just binary classification.
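To make "full probabilistic forecast" concrete: a fitted count model returns an entire distribution per hour and grid cell, from which event probabilities and quantiles follow directly. A minimal sketch using a negative binomial distribution; the parameter values below are invented for illustration and are not the paper's fitted estimates.

```python
import math

def nbinom_pmf(k, mu, theta):
    # Negative binomial (NB2) pmf with mean mu and dispersion theta.
    return math.exp(
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )

# Hypothetical fitted parameters for one hour and grid cell; in
# distributional regression both parameters depend on covariates.
mu = math.exp(-1.2)      # expected lightning count (hypothetical)
theta = math.exp(0.5)    # dispersion (hypothetical)

p_zero = nbinom_pmf(0, mu, theta)
p_any = 1.0 - p_zero     # probability of at least one strike
```

Unlike a binary classifier, the same fitted distribution also yields probabilities for "two or more strikes", expected counts, and prediction intervals without refitting.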