Schlosser L, Hothorn T, Zeileis A (2019). “The Power of Unbiased Recursive Partitioning: A Unifying View of CTree, MOB, and GUIDE”, arXiv:1906.10179, arXiv.org E-Print Archive. https://arXiv.org/abs/1906.10179
A core step of every algorithm for learning regression trees is the selection of the best splitting variable from the available covariates and the corresponding split point. Early tree algorithms (e.g., AID, CART) employed greedy search strategies, directly comparing all possible split points in all available covariates. However, subsequent research showed that this is biased towards selecting covariates with more potential split points. Therefore, unbiased recursive partitioning algorithms have been suggested (e.g., QUEST, GUIDE, CTree, MOB) that first select the covariate based on statistical inference using p-values that are adjusted for the possible split points. In a second step, a split point optimizing some objective function is selected in the chosen split variable. However, different unbiased tree algorithms obtain these p-values from different inference frameworks and their relative advantages and disadvantages are not yet well understood. Therefore, three different popular approaches are considered here: classical categorical association tests (as in GUIDE), conditional inference (as in CTree), and parameter instability tests (as in MOB). First, these are embedded into a common inference framework encompassing parametric model trees, in particular linear model trees. Second, it is assessed how different building blocks from this common framework affect the power of the algorithms to select the appropriate covariates for splitting: the observation-wise goodness-of-fit measure (residuals vs. model scores), dichotomization of residuals/scores at zero, and binning of possible split variables. This shows that specifically the goodness-of-fit measure is crucial for the power of the procedures, with model scores without dichotomization performing much better in many scenarios.
CRAN package: https://CRAN.R-project.org/package=partykit
Development version with some extensions enabled: partykit_1.2-4.2.tar.gz
Replication materials: simulation.zip
The manuscript compares three so-called unbiased recursive partitioning algorithms that employ statistical inference to adjust for the number of possible splits in a split variable: GUIDE (Loh 2002), CTree (Hothorn et al. 2006), and MOB (Zeileis et al. 2008).
First, the similarities and differences between the algorithms are pointed out, specifically with respect to split variable selection through statistical tests. Second, the power of these tests is studied for a “stump”, i.e., a single split only. Third, the capability of the entire algorithm (including a pruning strategy) to recover the correct partition in a “tree” with two splits is investigated.
In all cases, the three algorithms are employed to learn model-based trees where in each leaf of the tree a linear regression model is fitted with intercept β_{0} and slope β_{1}. The simulations then vary whether only the intercept β_{0}, only the slope β_{1}, or both differ in the data.
All three algorithms proceed by first fitting the model (here: linear regression by OLS) in a given subgroup (or node) of the tree. Then they extract some kind of goodness-of-fit measure (either residuals or full model scores) and test whether this measure is associated with any of the split variables. The variable with the highest association (i.e., lowest p-value) is employed for splitting and then the procedure is repeated recursively in the resulting subgroups.
For “pruning” the tree to the right size, one can either first grow a larger tree and then prune away those splits that are not relevant enough (post-pruning), or the algorithm can stop splitting as soon as the association test is no longer significant (pre-pruning).
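As a concrete illustration of this workflow, a linear model tree in the MOB tradition can be grown with lmtree() from partykit (a minimal sketch with simulated data, not one of the simulation designs from the manuscript; alpha controls the significance-based pre-pruning):

```r
library("partykit")

## simulated data: the slope of x depends on the binary split variable z1,
## while z2 is pure noise
set.seed(1)
d <- data.frame(x = runif(500), z1 = gl(2, 250), z2 = runif(500))
d$y <- ifelse(d$z1 == "1", 1, 3) * d$x + rnorm(500, sd = 0.5)

## linear model tree: y ~ x is fitted in every node, z1 and z2 are the
## candidate split variables, and pre-pruning stops splitting when no
## (Bonferroni-adjusted) parameter instability test is significant at alpha
tr <- lmtree(y ~ x | z1 + z2, data = d, alpha = 0.05)
width(tr)   ## number of terminal nodes
```

Here the tree should split on z1 (where the slope really changes) but not on the noise variable z2.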
The default combinations of fitted model type, test type, and pruning strategy for the three algorithms are given in the following table.
| Algorithm | Fit | Test | Pruning |
|-----------|-----|------|---------|
| CTree | Nonparametric | Conditional inference | Pre |
| MOB | Parametric | Score-based fluctuation | Pre (or post with AIC/BIC) |
| GUIDE | Parametric | Residual-based chi-squared | Post (cost-complexity pruning) |
Thus, the main difference is the testing strategy, but the pruning is relevant as well. While at first sight the tests come from very different motivations, they are actually not that different. When assessing the association with the split variable, the following three properties are most relevant:

- the observation-wise goodness-of-fit measure (residuals vs. full model scores),
- whether this measure is dichotomized at zero,
- whether the possible split variables are categorized (binned).
An overview of the corresponding settings for the three algorithms is given in the following table. Additionally, the tests differ somewhat in how they aggregate across the possible splits considered: either in a sum-of-squares statistic or in a maximally-selected statistic.
| Algorithm | Scores | Dichotomization | Categorization | Statistic |
|-----------|--------|-----------------|----------------|-----------|
| CTree | Model scores | – | – | Sum of squares |
| MOB | Model scores | – | – | Maximally selected |
| GUIDE | Residuals | X | X | Sum of squares |
Subsequently, these algorithms are compared in two simulation studies. More details and more simulation studies can be found in the manuscript. In addition to the three default algorithms, a modified GUIDE algorithm using model scores instead of residuals (GUIDE+scores) is considered.
Clearly, the different choices made in the construction influence the inference properties of the significance tests. Hence, in a first step we investigate the power properties of the tests when there is only one split in one of the split variables (among further noise variables). The split can pertain either to the intercept β_{0} only, to the slope β_{1} only, or to both.
The plot below shows the probability of selecting the true split variable (Z_{1}) with the minimal p-value against the magnitude of the difference in the regression coefficients (δ). For a split in the middle of the data (50%) pertaining only to the intercept β_{0} (top left panel), all tests perform almost equivalently. However, if the split only affects the slope β_{1} (middle column), it is much better to use score-based tests rather than residual-based tests (as in GUIDE), which cannot pick up changes that do not affect the conditional mean. Moreover, if the split occurs not in the middle (50% quantile, top row) but in the tails (90% quantile, bottom row), it is better to use a maximally-selected statistic (as in MOB) rather than a sum-of-squares statistic.
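The intuition for why residual-based tests fail for pure slope changes can be checked in a small stand-alone sketch (simulated data, not the manuscript's simulation design): under a slope change that leaves the conditional mean unaffected, the residuals of the pooled model carry hardly any group signal, whereas the score component for the slope (x times residual) does.

```r
set.seed(42)
n <- 400
z <- gl(2, n/2)                      # true split variable
x <- runif(n, -1, 1)
y <- ifelse(z == "1", 1, -1) * x + rnorm(n, sd = 0.5)

m <- lm(y ~ x)                       # pooled model ignoring the split
r <- residuals(m)

## residual-based association with z: essentially no power, because
## the group means of the residuals are both (close to) zero
p_resid <- t.test(r ~ z)$p.value

## score-based association: the empirical score for the slope is
## x * residual, whose group means differ clearly
s <- x * r
p_score <- t.test(s ~ z)$p.value

c(residual = p_resid, score = p_score)
```

The score-based p-value is many orders of magnitude smaller here, mirroring the middle column of the power plot.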
One could argue that the power properties of the tests may be crucial when pre-pruning (based on statistical significance) is used. However, when combined with cost-complexity post-pruning it may not be so important to have particularly high power. As long as the power for the true split variables is higher than for the noise variables, it might be sufficient to select the correct split variable.
This is assessed in a simulation for a tree with two splits, both depending on differences of magnitude δ in the two regression coefficients, respectively. The adjusted Rand index is used to assess how well the partition found by the tree conforms with the true partition. The columns of the display below are for splits that occur in the middle of the data vs. later in the sample (left to right).
And indeed it can be shown that post-pruning (bottom row) mitigates many of the power deficits of the testing strategies compared to significance-based pre-pruning (top row). However, it is still clearly better to use a score-based test (as in CTree, MOB, and GUIDE+scores) than a residual-based test (as in GUIDE). Also, pre-pruning may even lead to slightly better results than post-pruning when based on a powerful test.
Using several simulation setups we have shown that in many circumstances CTree, MOB, and GUIDE perform very similarly for recursive partitioning based on linear regression models. However, in some settings score-based tests clearly outperform residual-based tests (the latter may even lack power altogether). To some extent cost-complexity post-pruning can mitigate power deficits of the testing strategy, but pre-pruning typically works just as well as long as the significance test works well.
Furthermore, other simulations in the manuscript show that dichotomization of residuals/scores should be avoided as it reduces the power of the tests. Note that this is very easy to do in GUIDE: instead of chi-squared tests one can simply use one-way ANOVA tests. Finally, in the appendix of the manuscript it is shown that maximally-selected statistics (as in MOB) work better for abrupt splits late in the sample while the sum-of-squares statistics (from CTree and GUIDE) work better for smooth(er) transitions.
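The effect of dichotomization can be illustrated directly in a small sketch (simulated residuals, not GUIDE's exact categorization scheme): a chi-squared test on the signs of the residuals discards information that a one-way ANOVA on the raw residuals retains.

```r
set.seed(7)
n <- 200
zcat <- gl(4, n/4)                        # categorized split variable
r <- rnorm(n) + 0.8 * (zcat == "4")       # residuals shifted in one category

## GUIDE-style: dichotomize residuals at zero, then chi-squared test
p_chisq <- chisq.test(table(r > 0, zcat))$p.value

## alternative without dichotomization: one-way ANOVA on raw residuals
p_anova <- oneway.test(r ~ zcat)$p.value

c(chisq = p_chisq, anova = p_anova)
```

Both tests can detect the shift, but the ANOVA typically yields a considerably smaller p-value because it uses the full residuals rather than only their signs.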
The forecast is based on a hybrid random forest learner that combines three main sources of information: an ability estimate for every team based on historic matches; an ability estimate for every team based on odds from 18 bookmakers; and further team covariates (e.g., age, team structure) and country-specific socioeconomic factors (population, GDP). The random forest is learned using the FIFA Women’s World Cups in 2011 and 2015 as training data and then applied to current information to obtain a forecast for the 2019 FIFA Women’s World Cup. The random forest actually provides the predicted number of goals for each team in all possible matches in the tournament so that a bivariate Poisson distribution can be used to compute the probabilities for a win, draw, or loss in such a match. Based on these match probabilities the entire tournament can be simulated 100,000 times, yielding winning probabilities for each team. The results show that defending champions United States are the clear favorite with a winning probability of 28.1%, followed by host France with a winning probability of 14.3%, England with 13.3%, and Germany with 12.9%. The winning probabilities for all teams are shown in the bar chart below with more information linked in the interactive full-width version.
Interactive full-width graphic
The full study is available in a recent working paper conducted by an international team of researchers: Andreas Groll, Christophe Ley, Gunther Schauberger, Hans Van Eetvelde, and Achim Zeileis. It provides a hybrid approach that combines three state-of-the-art forecasting methods:
Historic match abilities:
An ability estimate is obtained for every team based on “retrospective” data, namely 3418 historic matches of 167 international women’s teams over the last 8 years. A bivariate Poisson model with teamspecific fixed effects is fitted to the number of goals scored by both teams in each match. However, rather than equally weighting all matches to obtain average team abilities (or team strengths) over the entire history period, an exponential weighting scheme is employed. This assigns more weight to more recent results and thus yields an estimate of current team abilities. More details can be found in Ley, Van de Wiele, Van Eetvelde (2019).
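The exponential weighting idea can be sketched as follows (the half period of three years is a made-up illustration, not the value calibrated in Ley, Van de Wiele, Van Eetvelde 2019): a match loses half of its weight with every half period that has passed since it was played.

```r
## weight of a match as a function of its age in days, with a half
## period (in days) after which the weight is halved (hypothetical value)
match_weight <- function(age_days, half_period = 3 * 365.25) {
  0.5^(age_days / half_period)
}

match_weight(0)            # current match: weight 1
match_weight(3 * 365.25)   # three years old: weight 0.5
```

These weights would then enter the likelihood of the bivariate Poisson model, so that recent matches dominate the ability estimates.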
Bookmaker consensus abilities:
Another ability estimate for every team is obtained based on “prospective” data, namely the odds of 18 international bookmakers that reflect their expert expectations for the tournament. Using the bookmaker consensus model of Leitner, Zeileis, Hornik (2010) the bookmaker odds are first adjusted for the bookmakers’ profit margins (“overround”) and then averaged (on a logit scale) to obtain a consensus for the winning probability of each team. To adjust for the effects of the tournament draw (that might have led to easier or harder groups for some teams), an “inverse” simulation approach is used to infer which team abilities are most likely to lead up to these winning probabilities.
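A minimal sketch of the first two steps (made-up odds and a flat 25% overround; the actual model estimates bookmaker-specific margins and additionally adjusts for the tournament draw):

```r
## made-up decimal odds for one team winning the tournament,
## as quoted by three hypothetical bookmakers
odds <- c(4.5, 5.0, 4.8)

## step 1: remove the bookmakers' profit margin from the naive
## inverse-odds probabilities (flat 25% overround assumed here)
p_adj <- (1/odds) / 1.25

## step 2: average on the logit scale and transform back
consensus <- plogis(mean(qlogis(p_adj)))
consensus
```

The consensus probability is a geometric-type average of the margin-adjusted bookmaker probabilities, which is less sensitive to single outlying odds than a plain arithmetic mean.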
Hybrid random forest:
Finally, machine learning is used to combine the two ability estimates above along with a broad range of further relevant covariates, yielding refined probabilistic forecasts for each match. Specifically, the hybrid random forest approach of Groll, Ley, Schauberger, Van Eetvelde (2019) is used to combine the two highly informative ability estimates with further team-specific information that may or may not be relevant to the team’s performance. The covariates considered comprise team-specific details (e.g., FIFA rank, average age, confederation, team structure, …) as well as country-specific socioeconomic factors (population and GDP per capita). By learning a large ensemble of 5,000 regression trees, the relative importances of all the covariates can be inferred automatically. The resulting predicted number of goals for each team (averaged over all trees) can then finally be used to simulate the entire tournament 100,000 times.
Using the hybrid random forest an expected number of goals is obtained for both teams in each possible match. The covariate information used for this is the difference between the two teams in each of the variables listed above, i.e., the difference in historic match abilities (on a log scale), the difference in bookmaker consensus abilities (on a log scale), difference in mean age of the teams, etc. Assuming a bivariate Poisson distribution with the expected numbers of goals for both teams, we can compute the probability that a certain match ends in a win, a draw, or a loss.
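As a simplified sketch of this last step (two independent Poisson distributions instead of the bivariate Poisson with correlation used in the paper, and made-up expected goals):

```r
## win/draw/loss probabilities for a single match, simplified to two
## independent Poisson goal distributions
match_probs <- function(lambda_a, lambda_b, max_goals = 20) {
  pa <- dpois(0:max_goals, lambda_a)
  pb <- dpois(0:max_goals, lambda_b)
  joint <- outer(pa, pb)                  # P(A scores i, B scores j)
  c(win  = sum(joint[lower.tri(joint)]),  # i > j
    draw = sum(diag(joint)),
    loss = sum(joint[upper.tri(joint)]))  # i < j
}

## hypothetical expected goals of 2.1 vs. 0.8
match_probs(2.1, 0.8)
```

Summing the joint probabilities below, on, and above the diagonal yields the win, draw, and loss probabilities, respectively, which can then feed the tournament simulation.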
The following heatmap shows the win probabilities in each possible match between a pair of teams with green vs. pink signalling probabilities above vs. below 50%, respectively. The corresponding loss probability is displayed when changing the roles of the teams (i.e., switching rows and columns in the matrix below). The tooltips for each match in the interactive version of the graphic also print the three win, draw, and loss probabilities.
Interactive full-width graphic
As every single match can be simulated with the pairwise probabilities above, it is also straightforward to simulate the entire tournament (here: 100,000 times), providing “survival” probabilities for each team across the different stages.
Interactive full-width graphic
All our forecasts are probabilistic, clearly below 100%, and thus by no means certain, even if the favorite United States has clearly the highest winning probability of all participating teams. However, recall that a single poor performance in the playoffs is sufficient to drop out of the tournament. For example, this happened to host and clear favorite Germany in 2011 (with a winning probability of almost 40% according to the bookmakers) when they lost to Japan 0-1 in extra time in the quarterfinals. Japan then went on to become FIFA Women’s World Champion for the first time.
Another interesting observation is that the bookmakers see both the United States and France almost on par with bookmaker consensus probabilities of 18.1% and 18.7%, respectively. Clearly, the bookmakers (and presumably their customers) expect that France’s home advantage will play an important role. In contrast, our hybrid random forest does not find the home advantage to be an important factor and hence forecasts a much higher winning probability for the United States (28.1%) than for France (14.3%). This is due to the home advantage not having played an important role in our learning data: Germany in 2011 and Canada in 2015 both dropped out in the quarterfinals.
Finally, when considering the bookmaker consensus, it is also worth pointing out that the bookmakers seem to be less confident about their odds for the Women’s World Cup compared to the Men’s World Cup. This is reflected by the increased overround that assures the bookmakers’ profit margins. While for men’s tournaments this overround is typically around 15% (which the bookmakers keep and do not pay out), for the FIFA Women’s World Cup 2019 it is a sizeable 25% on average and thus ten percentage points higher.
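For a single market, the overround can be computed directly from the quoted odds (made-up numbers for illustration): the naive inverse-odds "probabilities" sum to more than 1, and the excess is the bookmaker's margin.

```r
## made-up decimal odds for win/draw/loss in a single match
odds <- c(1.8, 3.9, 4.2)

## inverse odds sum to more than 1; the excess is the overround
overround <- sum(1/odds) - 1
round(100 * overround, 1)   # margin in percent, about 5% here
```

For tournament-winner markets like the ones used above, the same computation is applied to the quoted winning odds of all participating teams.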
This overround is also the main reason why we recommend against betting based on the results presented here. It assures that the best chances of making money from sports betting lie with the bookmakers. Instead we recommend betting only privately among friends and colleagues, or simply enjoying the exciting matches we are surely about to see in France!
Groll A, Ley C, Schauberger G, Van Eetvelde H, Zeileis A (2019). “Hybrid Machine Learning Forecasts for the FIFA Women’s World Cup 2019”, arXiv:1906.01131, arXiv.org E-Print Archive. https://arXiv.org/abs/1906.01131
Heidi Seibold, Achim Zeileis, Torsten Hothorn (2019). “model4you: An R Package for Personalised Treatment Effect Estimation.” Journal of Open Research Software, 7(17), 1-6. doi:10.5334/jors.219
Typical models estimating treatment effects assume that the treatment effect is the same for all individuals. Model-based recursive partitioning allows to relax this assumption and to estimate stratified treatment effects (model-based trees) or even personalised treatment effects (model-based forests). With model-based trees one can compute treatment effects for different strata of individuals. The strata are found in a data-driven fashion and depend on characteristics of the individuals. Model-based random forests allow for a similarity estimation between individuals in terms of model parameters (e.g., intercept and treatment effect). The similarity measure can then be used to estimate personalised models. The R package model4you implements these stratified and personalised models in the setting with two randomly assigned treatments, with a focus on ease of use and interpretability. Clinicians and other users can take the model they usually use for the estimation of the average treatment effect and, with a few lines of code, obtain a visualisation that is easy to understand and interpret.
https://CRAN.R-project.org/package=model4you
The correlation between exam group and exam performance in an introductory mathematics exam (for business and economics students) is investigated using treebased stratified and personalized treatment effects. Group 1 took the exam in the morning and group 2 started the exam with slightly different exercises after the first group finished. Potential sources of heterogeneity in the group effect include gender, field of study, whether the exam was taken (and failed) previously, and prior performance in online “tests” earlier in the semester. Performance in both the written exam and the online tests is captured by percentage of correctly solved exercises.
Overall, it seems that the split into two different exam groups was fair: The second group had only a slightly lower performance by around 2 or 3 percentage points, suggesting that the exam in the second group was only very slightly more difficult. However, when investigating the heterogeneity of this group effect with a model-based tree, it turns out that this distinguishes the students by their performance in the online tests. The largest difference between the two exam groups is in the students who did very well in the online tests (more than 92.3 percent correct), where the second-group students performed worse by 13.3 percentage points. So the split into the two exam groups seems not to have been fully fair for those very good students.
To refine the assessment further, a model-based forest can be estimated. This reveals that the dependence of the group effect on the performance in the online tests is even more pronounced. This is shown in the dependence plots and beeswarm plots below with the group treatment effect on the y-axis and the performance in the online tests on the x-axis.
To fit the simple linear base model in R, lm() can be used. The subsequent tree based on this model can be obtained with pmtree() from model4you and the forest with pmforest(). Example code is shown below; the full replication code for the entire analysis and graphics is included in the manuscript.
bmod_math <- lm(pcorrect ~ group, data = MathExam)
tr_math <- pmtree(bmod_math, control = ctree_control(maxdepth = 2))
forest_math <- pmforest(bmod_math)
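The personalised effects underlying the plots above are obtained by evaluating the forest for every individual. A stand-alone sketch with simulated data (pmforest() and pmodel() are the actual model4you functions; the data, effect sizes, and tree number here are made up for illustration):

```r
library("model4you")

## toy data: randomly assigned binary treatment group, outcome y,
## and a covariate z that moderates the treatment effect
set.seed(1)
d <- data.frame(z = runif(400),
                group = factor(sample(1:2, 400, replace = TRUE)))
d$y <- 0.5 - 0.2 * (d$group == "2") * (d$z > 0.5) + rnorm(400, sd = 0.1)

## base model, forest, and one personalised coefficient vector
## (intercept and group effect) per individual
bmod <- lm(y ~ group, data = d)
frst <- pmforest(bmod, ntree = 50)
pm <- pmodel(frst)
head(pm)
```

Plotting the personalised group effect from pm against z would then reproduce the kind of dependence plot discussed above.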
The go-to palette in many software packages is (or used to be until rather recently) the so-called rainbow: a palette created by changing the hue in highly saturated RGB colors. This has been widely recognized as having a number of disadvantages, including: abrupt shifts in brightness, misleading for viewers with color vision deficiencies, too flashy to look at for a longer time. As part of our R software project colorspace we therefore started collecting typical (ab)uses of the RGB rainbow palette on our web site http://colorspace.R-Forge.R-project.org/articles/endrainbow.html and suggest better HCL-based color palettes.
Here, we present the most recent addition to that example collection, a map of influenza severity in Germany, published by the influenza working group of the Robert Koch-Institut. Along with the original map and its poor choice of colors we provide a desaturated grayscale version, an emulation of green-deficient vision, and an alternative based on a sequential HCL palette.
The shaded map below was taken from the web site of the Robert Koch-Institut (Arbeitsgemeinschaft Influenza) and it shows the severity of influenza in Germany in week 8, 2019. The original color palette (left) is the classic rainbow ranging from “normal” (blue) to “strongly increased” (red). As all colors in the palette are very flashy and highly saturated, it is hard to grasp intuitively which areas are most affected by influenza. Also, the least interesting “normal” areas stand out as blue is the darkest color in the palette.
As an alternative, a proper multi-hue sequential HCL palette is used on the right. This has smooth gradients and the overall message can be grasped quickly, giving focus to the high-risk regions depicted with dark/colorful colors. However, the extremely sharp transitions between “normal” and “strongly increased” areas (e.g., in the North and the East) might indicate some overfitting in the underlying smoothing for the map.
Converting all colors to grayscale brings out even more clearly why the overall picture is so hard to grasp with the original palette: the gradients are discontinuous, switching several times between bright and dark. Thus, it is hard to identify the high-risk regions, while this is more natural and straightforward with the HCL-based sequential palette.
Emulating green-deficient vision (deuteranopia) emphasizes the same problems as the desaturated version above but shows even more problems with the original palette: the wrong areas in the map “pop out”, making the map extremely hard to use for viewers with red-green deficiency. The HCL-based palette on the other hand is equally accessible for color-deficient viewers as for those with full color vision.
The desaturated and deuteranope versions of the original image influenzarainbow.png (a screenshot of the RKI web page) are relatively easy to produce using the colorspace function cvd_emulator("influenzarainbow.png"). Internally, this reads the RGB colors for all pixels in the PNG, converts them with the colorspace functions desaturate() and deutan(), respectively, and saves the PNG again. Below we also do this “by hand”.
What is more complicated is the replacement of the original rainbow palette with a properly balanced HCL palette (without access to the underlying data). Luckily, the image contains a legend from which the original palette can be extracted. Subsequently, it is possible to index all colors in the image, replace them, and write out the PNG again.
As a first step we read the original PNG image using the R package png, returning a height x width x 4 array containing the three RGB (red/green/blue) channels plus a channel for alpha transparency. Then, this is turned into a height x width matrix containing color hex codes using the base rgb() function:
img <- png::readPNG("influenzarainbow.png")
img <- matrix(
  rgb(img[,,1], img[,,2], img[,,3]),
  nrow = nrow(img), ncol = ncol(img)
)
Using a manual search we find a column of pixels from the palette legend (column 630) and thin it to obtain only 99 colors:
pal_rain <- img[96:699, 630]
pal_rain <- pal_rain[seq(1, length(pal_rain), length.out = 99)]
For replacement we use a slightly adapted sequential_hcl() palette that was suggested by Stauffer et al. (2015) for a precipitation warning map. The "Purple-Yellow" palette is currently only in version 1.4-1 of the package on R-Forge, but other sequential HCL palettes could also be used here.
library("colorspace")
pal_hcl <- sequential_hcl(99, "Purple-Yellow", p1 = 1.3, c2 = 20)
Now for replacing the RGB rainbow colors with the sequential colors, the following approach is taken: The original image is indexed by matching the color of each pixel to the closest of the 99 colors from the rainbow palette. Furthermore, to preserve the black borders and the gray shadows, 50 shades of gray are also offered for the indexing. To match pixel colors to palette colors a simple Manhattan distance (sum of absolute distances) is used in the CIELUV color space:
# 50 shades of gray
pal_gray <- gray(0:50/50)
## HCL coordinates for image and palette
img_luv <- coords(as(hex2RGB(as.vector(img)), "LUV"))
pal_luv <- coords(as(hex2RGB(c(pal_rain, pal_gray)), "LUV"))
## Manhattan distance matrix
dm <- matrix(NA, nrow = nrow(img_luv), ncol = nrow(pal_luv))
for(i in 1:nrow(pal_luv)) dm[, i] <- rowSums(abs(t(t(img_luv) - pal_luv[i,])))
idx <- apply(dm, 1, which.min)
Now each element of the img hex color matrix can easily be replaced by indexing a new palette with 99 colors (plus 50 shades of gray) using the idx vector. This is what the pal_to_png() function below does, writing the resulting matrix to a PNG file. The function is somewhat quick and dirty, makes no sanity checks, and assumes img and idx are in the calling environment.
pal_to_png <- function(pal = pal_hcl, file = "influenza.png", rev = FALSE) {
  ret <- img
  pal <- if(rev) c(rev(pal), rev(pal_gray)) else c(pal, pal_gray)
  ret[] <- pal[idx]
  ret <- coords(hex2RGB(ret))
  dim(ret) <- c(dim(img), 3)
  png::writePNG(ret, target = file)
}
With this function, we can easily produce the PNG graphics with the desaturated palette and the deuteranope version:
pal_to_png(desaturate(pal_rain), "influenzarainbowgray.png")
pal_to_png( deutan(pal_rain), "influenzarainbowdeutan.png")
The analogous graphics for the HCL-based "Purple-Yellow" palette are generated by:
pal_to_png( pal_hcl, "influenzapurpleyellow.png")
pal_to_png(desaturate(pal_hcl), "influenzapurpleyellowgray.png")
pal_to_png( deutan(pal_hcl), "influenzapurpleyellowdeutan.png")
Given that we have now extracted the pal_rain palette and set up the pal_hcl alternative, we can also use the colorspace function specplot() to understand how the perceptual properties of the colors change across the two palettes. For the HCL-based palette, hue/chroma/luminance change smoothly from a dark, colorful purple to a light yellow. In contrast, in the original RGB rainbow, chroma and, more importantly, luminance change non-monotonically and rather abruptly:
specplot(pal_rain)
specplot(pal_hcl)
Given that the colors in the image are indexed now and the gray shades are in a separate subvector, we can also easily reverse the order in both subvectors. This yields a black background with white letters, and we can use the "Inferno" palette that works well on dark backgrounds:
pal_to_png(sequential_hcl(99, "Inferno"), "influenzainferno.png", rev = TRUE)
For more details on the limitations of the rainbow palette and further pointers see “The End of the Rainbow” by Hawkins et al. (2014) or “Somewhere over the Rainbow: How to Make Effective Use of Colors in Meteorological Visualizations” by Stauffer et al. (2015) as well as the #endrainbow hashtag on Twitter.
The web page http://hclwizard.org/ had originally been started to accompany the manuscript “Somewhere over the Rainbow: How to Make Effective Use of Colors in Meteorological Visualizations” by Stauffer et al. (2015, Bulletin of the American Meteorological Society) to facilitate the adoption of color palettes using the HCL (Hue-Chroma-Luminance) color model. It was realized using the R package colorspace in combination with shiny.
After the major update of the colorspace package, http://hclwizard.org/ has also just been relaunched, now hosting all three shiny color apps from the package:
Palette creator: This app allows designing new palettes interactively: qualitative palettes, sequential palettes with single or multiple hues, and diverging palettes (composed of two single-hue sequential palettes). The underlying HCL coordinates can be modified, starting out from a wide range of predefined palettes. The resulting palette can be assessed in various kinds of displays and exported in different formats.
Color vision deficiency emulator: This app allows assessing how well the colors in an uploaded graphics file (png/jpg/jpeg) work for viewers with color vision deficiencies. Different kinds of color blindness can be emulated: deuteranope (green deficient), protanope (red deficient), tritanope (blue deficient), monochrome (grayscale).
HCL color picker: In addition to the palette creator app described above, this app provides a more traditional color picker. Sets of individual colors can be selected (and exported) by navigating different views of the HCL color space.
Martin Wagner, Achim Zeileis (2019). “Heterogeneity and Spatial Dependence of Regional Growth in the EU: A Recursive Partitioning Approach.” German Economic Review, 20(1), 67-82. doi:10.1111/geer.12146 [ pdf ]
We use model-based recursive partitioning to assess heterogeneity of growth and convergence processes based on economic growth regressions for 255 European Union NUTS2 regions from 1995 to 2005. Spatial dependencies are taken into account by augmenting the model-based regression tree with a spatial lag. The starting point of the analysis is a human-capital-augmented Solow-type growth equation similar in spirit to Mankiw et al. (1992, The Quarterly Journal of Economics, 107, 407-437). Initial GDP and the share of highly educated in the working age population are found to be important for explaining economic growth, whereas the investment share in physical capital is only significant for coastal regions in the PIIGS countries. For all considered spatial weight matrices, recursive partitioning leads to a regression tree with four terminal nodes, with partitioning according to (i) capital regions, (ii) non-capital regions in or outside the so-called PIIGS countries, and (iii) within the PIIGS regions, furthermore between coastal and non-coastal regions. The choice of the spatial weight matrix clearly influences the spatial lag parameter while the estimated slope parameters are very robust to it. This indicates that accounting for heterogeneity is an important aspect of modeling regional economic growth and convergence.
https://CRAN.R-project.org/package=lagsarlmtree
The growth model to be assessed for heterogeneity is a linear regression model for the average growth rate of real GDP per capita (ggdpcap) as the dependent variable with the following regressors: (log) initial real GDP per capita (gdpcap0), the share of gross fixed capital formation in GDP (shgfcf), and the shares of highly and medium educated in the working age population (shsh and shsm).
Thus, a human-capital-augmented version of the Solow model is employed, inspired by the by now classical work of Mankiw et al. (1992). The well-known data sets from Sala-i-Martin et al. (2004) and Fernandez et al. (2001) are employed below for estimation.
To assess whether a single growth regression model with stable parameters across all EU regions is sufficient, splitting the data by the following partitioning variables is considered: initial real GDP per capita (gdpcap0), rail and road accessibility (accessrail, accessroad), and dummies for capital regions, border regions, coastal regions, Objective 1 regions, CEE countries, and PIIGS countries.
To adjust for spatial dependencies a spatial lag term with inverse distance weights is considered here. Other weight specifications lead to very similar estimated tree structures and regression coefficients, though.
library("lagsarlmtree")
data("GrowthNUTS2", package = "lagsarlmtree")
data("WeightsNUTS2", package = "lagsarlmtree")
tr <- lagsarlmtree(ggdpcap ~ gdpcap0 + shgfcf + shsh + shsm |
  gdpcap0 + accessrail + accessroad + capital + regboarder + regcoast + regobj1 + cee + piigs,
  data = GrowthNUTS2, listw = WeightsNUTS2$invw, minsize = 12, alpha = 0.05)
print(tr)
## Spatial lag model tree
##
## Model formula:
## ggdpcap ~ gdpcap0 + shgfcf + shsh + shsm | gdpcap0 + accessrail +
##     accessroad + capital + regboarder + regcoast + regobj1 +
##     cee + piigs
##
## Fitted party:
## [1] root
## |   [2] capital in no
## |   |   [3] piigs in no: n = 176
## |   |       (Intercept)   gdpcap0    shgfcf      shsh      shsm
## |   |           0.11055  -0.01171  -0.00208   0.02195   0.00179
## |   |   [4] piigs in yes
## |   |   |   [5] regcoast in no: n = 13
## |   |   |       (Intercept)  gdpcap0   shgfcf     shsh     shsm
## |   |   |            0.1606  -0.0159  -0.0469   0.0789  -0.0234
## |   |   |   [6] regcoast in yes: n = 39
## |   |   |       (Intercept)  gdpcap0   shgfcf     shsh     shsm
## |   |   |           0.07348 -0.01106  0.09156  0.11668  0.00942
## |   [7] capital in yes: n = 27
## |       (Intercept)  gdpcap0   shgfcf     shsh     shsm
## |            0.2056  -0.0223  -0.0075   0.0411   0.0528
##
## Number of inner nodes: 3
## Number of terminal nodes: 4
## Number of parameters per node: 5
## Objective function (residual sum of squares): 0.0155
##
## Rho (from lagsarlm model):
## rho
## 0.837
The resulting linear regression tree can be visualized with p-values from the parameter stability tests displayed in the inner nodes and a scatter plot of GDP per capita growth (ggdpcap) vs. (log) initial real GDP per capita (gdpcap0) in the terminal nodes:
plot(tr, tp_args = list(which = 1))
It is most striking that the speed of β-convergence is much higher for the 27 capital regions. More details about differences in the other regressors are shown in the table below. Finally, it is of interest which variables were not selected for splitting in the tree, i.e., are not associated with significant parameter instabilities: initial income, the border dummy, and Objective 1 regions, among others.
Node    n   Partitioning variables     Regressor variables (standard errors in parentheses)
            capital  piigs  regcoast   (Const.)       gdpcap0           shgfcf            shsh            shsm
3     176   no       no     –          0.111 (0.016)  -0.0117 (0.0016)  -0.0021 (0.0077)  0.022 (0.011)   0.0018 (0.0068)
5      13   no       yes    no         0.161 (0.128)  -0.0159 (0.0135)  -0.0469 (0.0815)  0.079 (0.059)  -0.0234 (0.0660)
6      39   no       yes    yes        0.073 (0.056)  -0.0111 (0.0059)   0.0916 (0.0420)  0.117 (0.029)   0.0094 (0.0218)
7      27   yes      –      –          0.206 (0.031)  -0.0223 (0.0029)  -0.0075 (0.0259)  0.041 (0.020)   0.0528 (0.0117)
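The faster β-convergence of the capital regions noted above can be quantified with a back-of-the-envelope calculation; the formula and the 10-year horizon used below are standard Solow-model algebra rather than taken from the manuscript, so treat the numbers as illustrative:

```r
# For an average-growth regression over Tyrs years with coefficient b on (log)
# initial GDP per capita, the implied annual convergence speed is
# lambda = -log(1 + Tyrs * b) / Tyrs (assumption: standard Solow-type algebra).
Tyrs <- 10                                # 1995-2005
b <- c(node3 = -0.0117, node7 = -0.0223)  # non-capital (node 3) vs. capital (node 7)
lambda <- -log(1 + Tyrs * b) / Tyrs
round(100 * lambda, 2)                    # percent per year: capitals converge faster
```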
For more details see the full manuscript. Replication materials for the entire analysis from the manuscript are available as a demo in the package:
demo("GrowthNUTS2", package = "lagsarlmtree")
The colorspace package provides a broad toolbox for selecting individual colors or color palettes, manipulating these colors, and employing them in various kinds of visualizations. Version 1.4.0 has just been released on CRAN, containing many new features and contributions from new co-authors. A new web site presenting and documenting the package has been launched at http://colorspace.R-Forge.R-project.org/.
At the core of the package there are various utilities for computing with color spaces (as the name conveys). Thus, the package helps to map various three-dimensional representations of color to each other. A particularly important mapping is the one from the perceptually-based and device-independent color model HCL (Hue-Chroma-Luminance) to standard Red-Green-Blue (sRGB), which is the basis for color specifications in many systems based on the corresponding hex codes (e.g., in HTML but also in R). For completeness further standard color models are included as well in the package: polarLUV() (= HCL), LUV(), polarLAB(), LAB(), XYZ(), RGB(), sRGB(), HLS(), HSV().
The HCL space (= polar coordinates in CIELUV) is particularly useful for specifying individual colors and color palettes as its three axes match those of the human visual system very well: Hue (= type of color, dominant wavelength), chroma (= colorfulness), luminance (= brightness).
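For instance, an individual color can be picked by its HCL coordinates and converted to a hex code with the package's polarLUV() and hex() functions (the particular coordinates below are arbitrary):

```r
# Pick a medium-light, moderately colorful green by its HCL coordinates and
# convert it to an sRGB hex code for use in R graphics.
library("colorspace")
col <- polarLUV(L = 70, C = 50, H = 120)  # luminance, chroma, hue
hex(col)                                  # "#RRGGBB" string
```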
The colorspace package provides three types of palettes based on the HCL model:
- qualitative_hcl()
- sequential_hcl()
- diverging_hcl()

To aid choice and application of these palettes there are: scales for use with ggplot2; shiny (and tcltk) apps for interactive exploration; visualizations of palette properties; accompanying manipulation utilities (like desaturation, lighten/darken, and emulation of color vision deficiencies).
More detailed overviews and examples are provided in the articles:
The stable release version of colorspace is hosted on the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=colorspace and can be installed via
install.packages("colorspace")
The development version of colorspace is hosted on R-Forge at https://R-Forge.R-project.org/projects/colorspace/ in a Subversion (SVN) repository. It can be installed via
install.packages("colorspace", repos = "http://R-Forge.R-project.org")
For Python users a beta re-implementation of the full colorspace package in Python 2/Python 3 is also available, see https://github.com/retostauffer/python-colorspace.
The colorspace package ships with a wide range of predefined color palettes, specified through suitable trajectories in the HCL (hue-chroma-luminance) color space. A quick overview can be gained easily with hcl_palettes():
library("colorspace")
hcl_palettes(plot = TRUE)
Using the names from the plot above and a desired number of colors in the palette a suitable color vector can be easily computed, e.g.,
q4 <- qualitative_hcl(4, "Dark 3")
q4
## [1] "#E16A86" "#909800" "#00AD9A" "#9183E6"
The functions sequential_hcl() and diverging_hcl() work analogously. Additionally, their hue/chroma/luminance parameters can be modified, thus allowing to easily customize each palette. Moreover, the choose_palette()/hclwizard() apps provide convenient user interfaces to do the customization interactively. Finally, even more flexible diverging HCL palettes are provided by divergingx_hcl().
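As a small sketch of such a customization (the particular values are arbitrary), the chroma and luminance arguments of qualitative_hcl() can be overridden to obtain a lighter, more muted variant of a predefined palette:

```r
# Override chroma (c) and luminance (l) of the predefined "Dark 3" palette.
library("colorspace")
qualitative_hcl(4, palette = "Dark 3")                  # original palette
qualitative_hcl(4, palette = "Dark 3", c = 40, l = 80)  # lighter, less colorful
```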
The color vectors returned by the HCL palette functions can usually be passed directly to most base graphics functions, typically through the col argument. Here, the q4 vector created above is used in a time series display:
plot(log(EuStockMarkets), plot.type = "single", col = q4, lwd = 2)
legend("topleft", colnames(EuStockMarkets), col = q4, lwd = 3, bty = "n")
As another example for a sequential palette, a spine plot is created below, displaying the proportion of Titanic passengers that survived per class. The Purples 3 palette is used, which is quite similar to the ColorBrewer.org palette Purples. Here, only two colors are employed, yielding a dark purple and light gray.
ttnc <- margin.table(Titanic, c(1, 4))[, 2:1]
spineplot(ttnc, col = sequential_hcl(2, "Purples 3"))
To plug the HCL color palettes into ggplot2 graphics, suitable discrete and/or continuous ggplot2 color scales are provided. The scales are called via the scheme scale_<aesthetic>_<datatype>_<colorscale>(), where <aesthetic> is the name of the aesthetic (fill, color, colour), <datatype> is the type of the variable plotted (discrete or continuous), and <colorscale> sets the type of the color scale used (qualitative, sequential, diverging, divergingx).
To illustrate their usage two simple examples are shown using the qualitative Dark 3 and sequential Purples 3 palettes that were also employed above. First, semi-transparent shaded densities of the sepal length from the iris data are shown, grouped by species.
library("ggplot2")
ggplot(iris, aes(x = Sepal.Length, fill = Species)) + geom_density(alpha = 0.6) +
scale_fill_discrete_qualitative(palette = "Dark 3")
The sequential palette is used to code the cut levels in a scatter of price by carat in the diamonds data (or rather a small subsample thereof). The scale function first generates six colors but then drops the first color because the light gray is too light in this display. (Alternatively, the chroma and luminance parameters could also be tweaked.)
dsamp <- diamonds[1 + 1:1000 * 50, ]
ggplot(dsamp, aes(carat, price, color = cut)) + geom_point() +
scale_color_discrete_sequential(palette = "Purples 3", nmax = 6, order = 2:6)
The colorspace package also provides a number of functions that aid visualization and assessment of its palettes:
- demoplot() can display a palette (with arbitrary number of colors) in a range of typical and somewhat simplified statistical graphics.
- hclplot() converts the colors of a palette to the corresponding hue/chroma/luminance coordinates and displays them in HCL space with one dimension collapsed. The collapsed dimension is the luminance for qualitative palettes and the hue for sequential/diverging palettes.
- specplot() also converts the colors to hue/chroma/luminance coordinates but draws the resulting spectrum in a line plot.

For the qualitative Dark 3 palette from above the following plots can be obtained.
demoplot(q4, "bar")
hclplot(q4)
specplot(q4, type = "o")
A bar plot would be another typical application for a qualitative palette (instead of the time series and density plot used above). However, a lighter and less colorful palette might be preferable in this situation (e.g., Pastel 1 or Set 3).
The other two displays show that luminance is (almost) constant in the palette while the hue changes linearly along the color “wheel”. Ideally, chroma would have also been constant to completely balance the colors. However, at this luminance the maximum chroma differs across hues so that the palette is fixed up to use less chroma for the yellow and green elements.
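This gamut restriction can be inspected directly with the package's max_chroma() function, which returns the maximum representable chroma for given hue/luminance combinations (the hue values below are rough stand-ins for red, yellow, green, and blue):

```r
# Maximum chroma attainable at luminance 65 for a few hues; the differing values
# illustrate why some palette elements must be "fixed up" to use less chroma.
library("colorspace")
max_chroma(h = c(0, 90, 135, 260), l = 65)
```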
Subsequently, the same assessment is carried out for the sequential Purples 3 palette as employed above.
s9 <- sequential_hcl(9, "Purples 3")
demoplot(s9, "heatmap")
hclplot(s9)
specplot(s9, type = "o")
Here, a heatmap (based on the wellknown Maunga Whau volcano data) is used as a typical application for a sequential palette. The elevation of the volcano is brought out clearly, focusing with the dark colors on the higher elevations.
The other two displays show that hue is constant in the palette while luminance and chroma vary. Luminance increases monotonically from dark to light (as required for a proper sequential palette). Chroma is triangular-shaped, which allows the middle colors of the palette to be distinguished more easily (compared to a monotonic chroma trajectory).
Manuel Gebetsberger, Jakob W. Messner, Georg J. Mayr, Achim Zeileis (2018). “Estimation Methods for Nonhomogeneous Regression Models: Minimum Continuous Ranked Probability Score versus Maximum Likelihood.” Monthly Weather Review, 146(12), 4323–4338. doi:10.1175/MWR-D-17-0364.1
Nonhomogeneous regression models are widely used to statistically post-process numerical ensemble weather prediction models. Such regression models are capable of forecasting full probability distributions and correcting for ensemble errors in the mean and variance. To estimate the corresponding regression coefficients, minimization of the continuous ranked probability score (CRPS) has widely been used in meteorological post-processing studies and has often been found to yield more calibrated forecasts compared to maximum likelihood estimation. From a theoretical perspective, both estimators are consistent and should lead to similar results, provided the distributional assumption about the empirical data is correct. Differences between the estimated values indicate a misspecification of the regression model. This study compares the two estimators for probabilistic temperature forecasting with nonhomogeneous regression, where results show discrepancies for the classical Gaussian assumption. The heavy-tailed logistic and Student's t distributions can improve forecast performance in terms of sharpness and calibration, and lead to only minor differences between the estimators employed. Finally, a simulation study confirms the importance of appropriate distributional assumptions and shows that for a correctly specified model the maximum likelihood estimator is slightly more efficient than the CRPS estimator.
https://CRAN.R-project.org/package=crch
The function crch() provides heteroscedastic (or nonhomogeneous) regression models for "gaussian" (i.e., normally distributed), "logistic", or "student" (i.e., t-distributed) response variables. Additionally, responses may be censored or truncated. Estimation methods include maximum likelihood (type = "ml", the default) and minimum CRPS (type = "crps"). Boosting can also be employed for model fitting (instead of full optimization). CRPS computations leverage the excellent scoringRules package.
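A minimal sketch of comparing the two estimation methods (on simulated data, not the study's forecasts) could look as follows; under a correctly specified model the two sets of coefficients should nearly coincide:

```r
# Fit the same heteroscedastic Gaussian regression by maximum likelihood (ml)
# and by minimum CRPS (crps) and compare the estimated coefficients.
library("crch")
set.seed(1)
d <- data.frame(x = runif(500))
d$y <- rnorm(500, mean = 1 + 2 * d$x, sd = exp(-1 + d$x))
m_ml   <- crch(y ~ x | x, data = d, dist = "gaussian", type = "ml")
m_crps <- crch(y ~ x | x, data = d, dist = "gaussian", type = "crps")
cbind(ML = coef(m_ml), CRPS = coef(m_crps))
```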
The plots below show histograms of the PIT (probability integral transform) for various nonhomogeneous regression models yielding probabilistic 1-day-ahead temperature forecasts at an Alpine site (Innsbruck). When the probabilistic forecasts are perfectly calibrated to the actual observations, the PIT histograms should form a straight line at density 1. The gray area illustrates the 95% consistency interval around perfect calibration, and binning is based on 5% intervals.
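How such PIT values arise can be sketched in base R (simulated, correctly calibrated Gaussian forecasts, not the Innsbruck data): the predictive CDF is evaluated at each observation, and for calibrated forecasts the resulting values are uniform on [0, 1].

```r
# PIT for Gaussian predictive distributions: evaluate each forecast CDF at the
# corresponding observation; calibrated forecasts yield uniform PIT values.
set.seed(1)
mu  <- rnorm(2000)                           # predicted means
y   <- rnorm(2000, mean = mu, sd = 1)        # matching observations
pit <- pnorm(y, mean = mu, sd = 1)           # probability integral transform
hist(pit, breaks = 0:20 / 20, freq = FALSE)  # 5% bins, as in the figure
```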
When a normally distributed or Gaussian response is assumed (left panel), it is shown that the maximum-likelihood model (solid line) is not well calibrated as the tails are not heavy enough. (The legend denotes this “LS” because maximizing the likelihood is equivalent to minimizing the so-called log-score.) In contrast, the minimum-CRPS model is reasonably well calibrated.
When assuming a Student-t response (right panel) there is little deviation between the two estimation techniques and both are well calibrated.
Thus, the source of the differences between CRPS- and ML-based estimation with a Gaussian response is the assumption of a distribution whose tails are not heavy enough. In this situation, minimum CRPS yields the somewhat more robust model fit, while both estimation techniques lead to very similar results if a more suitable response distribution is adopted. In the latter case ML is slightly more efficient than minimum CRPS.
Heidi Seibold, Torsten Hothorn, Achim Zeileis (2018). “Generalised Linear Model Trees with Global Additive Effects.” Advances in Data Analysis and Classification. Forthcoming. doi:10.1007/s11634-018-0342-1, arXiv
Model-based trees are used to find subgroups in data which differ with respect to model parameters. In some applications it is natural to keep some parameters fixed globally for all observations while asking if and how other parameters vary across subgroups. Existing implementations of model-based trees can only deal with the scenario where all parameters depend on the subgroups. We propose partially additive linear model trees (PALM trees) as an extension of (generalised) linear model trees (LM and GLM trees, respectively), in which the model parameters are specified a priori to be estimated either globally from all observations or locally from the observations within the subgroups determined by the tree. Simulations show that the method has high power for detecting subgroups in the presence of global effects and reliably recovers the true parameters. Furthermore, treatment-subgroup differences are detected in an empirical application of the method to data from a mathematics exam: the PALM tree is able to detect a small subgroup of students that had a disadvantage in an exam with two versions while adjusting for overall ability effects.
https://CRAN.R-project.org/package=palmtree
PALM trees are employed to investigate treatment differences in a mathematics 101 exam (for first-year business and economics students) at Universität Innsbruck. Due to limited availability of seats in the exam room, students could self-select into one of two exam tracks that were conducted back to back with slightly different questions on the same topics. The question is whether this “treatment” of splitting the students into two tracks was fair in the sense that it is on average equally difficult for the two groups. To investigate the question, the data are loaded from the psychotools package, points are scaled to the percent achieved in [0, 100], and the subset of variables for the analysis is selected:
data("MathExam14W", package = "psychotools")
MathExam14W$tests <- 100 * MathExam14W$tests/26
MathExam14W$pcorrect <- 100 * MathExam14W$nsolved/13
MathExam <- MathExam14W[, c("pcorrect", "group", "tests", "study",
  "attempt", "semester", "gender")]
A naive check could be whether the percentage of correct points (pcorrect) differs between the two groups:
ci <- function(object) cbind("Coefficient" = coef(object), confint(object))
ci(lm(pcorrect ~ group, data = MathExam))
##             Coefficient 2.5 % 97.5 %
## (Intercept)       57.60  55.1  60.08
## group2            -2.33  -5.7   1.03
This shows that the second group achieved on average 2.33 percentage points less than the first group. But the corresponding confidence interval conveys that this difference is not significant.
However, it is conceivable that stronger (or weaker) students selected themselves more into one of the two groups. And if the assignment had been random, then the “treatment effect” might have been larger or even smaller. Luckily, an independent measure of the students’ ability is available, namely the percentage of points achieved in the online tests conducted during the semester prior to the exam. Adjusting for that increases the treatment effect to a decrease of 4.37 percentage points, which is now significant. This is due to weaker students self-selecting into the second group. Moreover, the tests coefficient signals that 1 more percentage point from the online tests leads on average to 0.855 more percentage points in the written exam.
ci(lm(pcorrect ~ group + tests, data = MathExam))
##             Coefficient   2.5 %  97.5 %
## (Intercept)      -5.846 -13.521   1.828
## group2           -4.366  -7.231  -1.502
## tests             0.855   0.756   0.955
Finally, PALM trees are used to assess whether there are subgroups of differential group treatment effects when adjusting for a global additive tests effect. Potential subgroups can be formed from the covariates tests, type of study (three-year bachelor vs. four-year diploma), the number of times the students attempted the exam, the number of semesters, and gender. Using palmtree this can be easily carried out:
library("palmtree")
palmtree_math <- palmtree(pcorrect ~ group | tests | tests +
  study + attempt + semester + gender, data = MathExam)
print(palmtree_math)
## Partially additive linear model tree
##
## Model formula:
## pcorrect ~ group | tests + study + attempt + semester + gender
##
## Fitted party:
## [1] root
## |   [2] attempt <= 1
## |   |   [3] tests <= 92.3: n = 352
## |   |       (Intercept)  group2
## |   |             -7.09   -3.00
## |   |   [4] tests > 92.3: n = 79
## |   |       (Intercept)  group2
## |   |              14.0   -14.5
## |   [5] attempt > 1: n = 298
## |       (Intercept)  group2
## |              2.33   -1.70
##
## Number of inner nodes: 2
## Number of terminal nodes: 3
## Number of parameters per node: 2
## Objective function (residual sum of squares): 253218
##
## Linear fixed effects (from palm model):
## tests
## 0.787
A somewhat enhanced version of plot(palmtree_math) is shown below:
This indicates that for most students the group treatment effect is indeed negligible. However, for the subgroup of “good” students (with high percentage correct in the online tests) in the first attempt, the exam in the second group was indeed more difficult. On average the students in the second group obtained 14.5 percentage points less than in the first group.
ci(palmtree_math$palm)
##               Coefficient   2.5 %  97.5 %
## (Intercept)        -7.088 -16.148   1.971
## .tree4             21.069  13.348  28.791
## .tree5              9.421   5.168  13.673
## tests               0.787   0.671   0.903
## .tree3:group2      -2.997  -6.971   0.976
## .tree4:group2     -14.494 -22.921  -6.068
## .tree5:group2      -1.704  -5.965   2.557
The absolute size of this group difference is still moderate, though, corresponding to about half an exercise out of 13.
In addition to the empirical case study the manuscript also provides an extensive simulation study comparing the performance of PALM trees in treatment-subgroup scenarios to standard linear model (LM) trees, optimal treatment regime (OTR) trees (following Zhang et al. 2012), and the STIMA algorithm (simultaneous threshold interaction modeling algorithm). The study evaluates the methods with respect to (1) finding the correct subgroups, (2) not splitting when there are no subgroups, (3) finding the optimal treatment regime, and (4) correctly estimating the treatment effect.
Here we just briefly highlight the results for question (1): Are the correct subgroups found? The figure below shows the mean number of subgroups (over 150 simulated data sets) and the mean adjusted Rand index (ARI) for increasing treatment effect differences Δ_{β} and numbers of observations n.
This shows that PALM trees perform increasingly well and somewhat better with respect to these metrics than the competitors. More details on the different scenarios and corresponding evaluations can be found in the manuscript, and replication materials are provided along with it on the publisher’s web page.
Thorsten Simon, Peter Fabsic, Georg J. Mayr, Nikolaus Umlauf, Achim Zeileis (2018). “Probabilistic Forecasting of Thunderstorms in the Eastern Alps.” Monthly Weather Review, 146(9), 2999–3009. doi:10.1175/MWR-D-17-0366.1
A probabilistic forecasting method to predict thunderstorms in the European eastern Alps is developed. A statistical model links lightning occurrence from the ground-based Austrian Lightning Detection and Information System (ALDIS) detection network to a large set of direct and derived variables from a numerical weather prediction (NWP) system. The NWP system is the high-resolution run (HRES) of the European Centre for Medium-Range Weather Forecasts (ECMWF) with a grid spacing of 16 km. The statistical model is a generalized additive model (GAM), which is estimated by Markov chain Monte Carlo (MCMC) simulation. Gradient boosting with stability selection serves as a tool for selecting a stable set of potentially nonlinear terms. Three grids from 64 x 64 to 16 x 16 km^2 and five forecast horizons from 5 days to 1 day ahead are investigated to predict thunderstorms during afternoons (1200–1800 UTC). Frequently selected covariates for the nonlinear terms are variants of convective precipitation, convective available potential energy, relative humidity, and temperature in the mid-layers of the troposphere, among others. All models, even for a lead time of 5 days, outperform a forecast based on climatology in an out-of-sample comparison. An example case illustrates that coarse spatial patterns are already successfully forecast 5 days ahead.
https://CRAN.R-project.org/package=bamlss
Predicting thunderstorms in complex terrain (like the Austrian Alps) is a challenging task since one of the main forecasting tools, NWP systems, cannot fully resolve convective processes or circulations and exchange processes over complex topography. However, using a boosted binary GAM based on a broad range of NWP outputs, useful forecasts can be obtained up to 5 days ahead. As an illustration, lightning activity for the afternoon of 2015-07-22 is shown in the top-left panel below, indicating thunderstorms in many areas in the west but not the east. While the corresponding baseline climatology (top middle) has a low probability of thunderstorms for the entire region, the NWP-based probabilistic forecasts (bottom row) highlight increased probabilities already 5 days ahead, becoming much more clear-cut when moving to 3 days and 1 day ahead.
More precisely, the probability of thunderstorms is predicted with a binary logit GAM that allows for potentially nonlinear smooth effects in all NWP variables considered. The relevant variables are selected by gradient boosting coupled with stability selection. Effects and 95% credible intervals of the model for day 1 are estimated via MCMC sampling and shown below (on the logit scale). The number in the bottom-right corner of each panel indicates the absolute range of the effect. The x-axes are cropped at the 1% and 99% quantiles of the respective covariate to enhance graphical representation.
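Since the data cannot be shared, a runnable stand-in for this model class can only be sketched with simulated data; the snippet below fits a binary logit GAM with smooth covariate effects using mgcv rather than the bamlss boosting/MCMC machinery actually used in the paper, and the covariate names are purely illustrative:

```r
# Binary logit GAM sketch: smooth effects of hypothetical NWP covariates on
# thunderstorm occurrence (simulated data; mgcv stands in for bamlss here).
library("mgcv")
set.seed(1)
d <- data.frame(cape = rgamma(1000, shape = 2), rh = runif(1000))
d$storm <- rbinom(1000, size = 1, prob = plogis(-2 + 0.8 * sqrt(d$cape) + 2 * d$rh))
m <- gam(storm ~ s(cape) + s(rh), family = binomial, data = d)
p <- predict(m, type = "response")  # predicted thunderstorm probabilities
range(p)
```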
(Note: As the data cannot be shared freely, the customary replication materials unfortunately cannot be provided.)