The go-to palette in many software packages is (or used to be until rather recently) the so-called rainbow: a palette created by changing the hue in highly saturated RGB colors. This has been widely recognized as having a number of disadvantages, including abrupt shifts in brightness, misleading colors for viewers with color vision deficiencies, and being too flashy to look at for a longer time. As part of our R software project colorspace we therefore started collecting typical (ab)uses of the RGB rainbow palette on our web site http://colorspace.R-Forge.R-project.org/articles/endrainbow.html and suggest better HCL-based color palettes.
Here, we present the most recent addition to that example collection: a map of influenza severity in Germany, published by the influenza working group of the Robert Koch-Institut. Along with the original map and its poor choice of colors we show how to construct a better HCL-based alternative and how to emulate desaturated and color-deficient versions of both palettes.
The shaded map below was taken from the web site of the Robert Koch-Institut (Arbeitsgemeinschaft Influenza) and shows the severity of influenza in Germany in week 8, 2019. The original color palette (left) is the classic rainbow, ranging from “normal” (blue) to “strongly increased” (red). As all colors in the palette are very flashy and highly saturated, it is hard to grasp intuitively which areas are most affected by influenza. Also, the least interesting “normal” areas stand out, as blue is the darkest color in the palette.
As an alternative, a proper multi-hue sequential HCL palette is used on the right. This has smooth gradients and the overall message can be grasped quickly, giving focus to the high-risk regions depicted in dark and colorful colors. However, the extremely sharp transitions between “normal” and “strongly increased” areas (e.g., in the North and the East) might indicate some overfitting in the underlying smoothing for the map.
Converting all colors to grayscale brings out even more clearly why the overall picture is so hard to grasp with the original palette: the gradients are discontinuous, switching several times between bright and dark. Thus, it is hard to identify the high-risk regions, while this is much more natural and straightforward with the HCL-based sequential palette.
Emulating green-deficient vision (deuteranopia) emphasizes the same problems as the desaturated version above but reveals even more problems with the original palette: the wrong areas in the map “pop out”, making the map extremely hard to use for viewers with red-green deficiency. The HCL-based palette, on the other hand, is equally accessible for color-deficient viewers as for those with full color vision.
The desaturated and deuteranope versions of the original image influenza-rainbow.png (a screenshot of the RKI web page) are relatively easy to produce using the colorspace function cvd_emulator("influenza-rainbow.png"). Internally, this reads the RGB colors for all pixels in the PNG, converts them with the colorspace functions desaturate() and deutan(), respectively, and saves the PNG again. Below we also do this “by hand”.
What is more complicated is the replacement of the original rainbow palette with a properly balanced HCL palette (without access to the underlying data). Luckily, the image contains a legend from which the original palette can be extracted. Subsequently, it is possible to index all colors in the image, replace them, and write out the PNG again.
As a first step we read the original PNG image using the R package png, returning a height x width x 4 array containing the three RGB (red/green/blue) channels plus a channel for alpha transparency. Then, this is turned into a height x width matrix of color hex codes using the base rgb() function:
img <- png::readPNG("influenza-rainbow.png")
img <- matrix(
  rgb(img[, , 1], img[, , 2], img[, , 3]),
  nrow = nrow(img), ncol = ncol(img)
)
Using a manual search we find a column of pixels from the palette legend (column 630) and thin it to obtain only 99 colors:
pal_rain <- img[96:699, 630]
pal_rain <- pal_rain[seq(1, length(pal_rain), length.out = 99)]
For the replacement we use a slightly adapted sequential_hcl() palette that was suggested by Stauffer et al. (2015) for a precipitation warning map. The "Purple-Yellow" palette is currently only in version 1.4-1 of the package on R-Forge, but other sequential HCL palettes could also be used here.
library("colorspace")
pal_hcl <- sequential_hcl(99, "Purple-Yellow", p1 = 1.3, c2 = 20)
For replacing the RGB rainbow colors with the sequential colors, the following approach is taken: the original image is indexed by matching the color of each pixel to the closest of the 99 colors from the rainbow palette. Furthermore, to preserve the black borders and the gray shadows, 50 shades of gray are also offered for the indexing. To match pixel colors to palette colors, a simple Manhattan distance (sum of absolute distances) is used in the CIELUV color space:
## 50 shades of gray
pal_gray <- gray(0:50/50)
## CIELUV coordinates for image and palette
img_luv <- coords(as(hex2RGB(as.vector(img)), "LUV"))
pal_luv <- coords(as(hex2RGB(c(pal_rain, pal_gray)), "LUV"))
## Manhattan distance matrix
dm <- matrix(NA, nrow = nrow(img_luv), ncol = nrow(pal_luv))
for(i in 1:nrow(pal_luv)) dm[, i] <- rowSums(abs(t(t(img_luv) - pal_luv[i, ])))
idx <- apply(dm, 1, which.min)
Now each element of the img hex color matrix can easily be replaced by indexing a new palette with 99 colors (plus 50 shades of gray) using the idx vector. This is what the pal_to_png() function below does, writing the resulting matrix to a PNG file. The function is somewhat quick and dirty: it makes no sanity checks and assumes that img and idx are in the calling environment.
pal_to_png <- function(pal = pal_hcl, file = "influenza.png", rev = FALSE) {
  ret <- img
  pal <- if(rev) c(rev(pal), rev(pal_gray)) else c(pal, pal_gray)
  ret[] <- pal[idx]
  ret <- coords(hex2RGB(ret))
  dim(ret) <- c(dim(img), 3)
  png::writePNG(ret, target = file)
}
With this function, we can easily produce the PNG graphics with the desaturated and the deuteranope version of the original rainbow palette:
pal_to_png(desaturate(pal_rain), "influenza-rainbow-gray.png")
pal_to_png(deutan(pal_rain), "influenza-rainbow-deutan.png")
The analogous graphics for the HCL-based "Purple-Yellow" palette are generated by:
pal_to_png(pal_hcl, "influenza-purpleyellow.png")
pal_to_png(desaturate(pal_hcl), "influenza-purpleyellow-gray.png")
pal_to_png(deutan(pal_hcl), "influenza-purpleyellow-deutan.png")
Given that we have now extracted the pal_rain palette and set up the pal_hcl alternative, we can also use the colorspace function specplot() to understand how the perceptual properties of the colors change across the two palettes. In the HCL-based palette, hue/chroma/luminance change smoothly from a dark, colorful purple to a light yellow. In contrast, in the original RGB rainbow chroma and, more importantly, luminance change non-monotonically and rather abruptly:
specplot(pal_rain)
specplot(pal_hcl)
Given that the colors in the image are now indexed and the gray shades are in a separate subvector, we can also easily reverse the order in both subvectors. This yields a black background with white letters, so we can use the "Inferno" palette, which works well on dark backgrounds:
pal_to_png(sequential_hcl(99, "Inferno"), "influenza-inferno.png", rev = TRUE)
For more details on the limitations of the rainbow palette and further pointers see “The End of the Rainbow” by Hawkins et al. (2014) or “Somewhere over the Rainbow: How to Make Effective Use of Colors in Meteorological Visualizations” by Stauffer et al. (2015) as well as the #endrainbow hashtag on Twitter.
The web page http://hclwizard.org/ had originally been started to accompany the manuscript “Somewhere over the Rainbow: How to Make Effective Use of Colors in Meteorological Visualizations” by Stauffer et al. (2015, Bulletin of the American Meteorological Society) in order to facilitate the adoption of color palettes based on the HCL (Hue-Chroma-Luminance) color model. It was realized using the R package colorspace in combination with shiny.
After the major update of the colorspace package, http://hclwizard.org/ has also just been relaunched, now hosting all three shiny color apps from the package:
This app allows you to design new palettes interactively: qualitative palettes, sequential palettes with single or multiple hues, and diverging palettes (composed of two single-hue sequential palettes). The underlying HCL coordinates can be modified, starting out from a wide range of predefined palettes. The resulting palette can be assessed in various kinds of displays and exported in different formats.
This app allows you to assess how well the colors in an uploaded graphics file (png/jpg/jpeg) work for viewers with color vision deficiencies. Different kinds of color blindness can be emulated: deuteranope (green deficient), protanope (red deficient), tritanope (blue deficient), and monochrome (grayscale).
In addition to the palette creator app described above, this app provides a more traditional color picker. Sets of individual colors can be selected (and exported) by navigating different views of the HCL color space.
Martin Wagner, Achim Zeileis (2019). “Heterogeneity and Spatial Dependence of Regional Growth in the EU: A Recursive Partitioning Approach.” German Economic Review, 20(1), 67-82. doi:10.1111/geer.12146 [pdf]
We use model-based recursive partitioning to assess heterogeneity of growth and convergence processes based on economic growth regressions for 255 European Union NUTS2 regions from 1995 to 2005. Spatial dependencies are taken into account by augmenting the model-based regression tree with a spatial lag. The starting point of the analysis is a human-capital-augmented Solow-type growth equation similar in spirit to Mankiw et al. (1992, The Quarterly Journal of Economics, 107, 407-437). Initial GDP and the share of highly educated in the working age population are found to be important for explaining economic growth, whereas the investment share in physical capital is only significant for coastal regions in the PIIGS countries. For all considered spatial weight matrices recursive partitioning leads to a regression tree with four terminal nodes, with partitioning according to (i) capital regions, (ii) non-capital regions in or outside the so-called PIIGS countries, and (iii) inside the respective PIIGS regions furthermore between coastal and non-coastal regions. The choice of the spatial weight matrix clearly influences the spatial lag parameter while the estimated slope parameters are very robust to it. This indicates that accounting for heterogeneity is an important aspect of modeling regional economic growth and convergence.
https://CRAN.R-project.org/package=lagsarlmtree
The growth model to be assessed for heterogeneity is a linear regression model for the average growth rate of real GDP per capita (ggdpcap) as the dependent variable with the following regressors: (log) initial real GDP per capita (gdpcap0), the share of gross fixed capital formation in GDP (shgfcf), and the shares of highly and medium educated in the working age population (shsh and shsm).
Thus, a human-capital-augmented version of the Solow model is employed, inspired by the by now classical work of Mankiw et al. (1992). The well-known data sets from Sala-i-Martin et al. (2004) and Fernández et al. (2001) are employed below for estimation.
To assess whether a single growth regression model with stable parameters across all EU regions is sufficient, splitting the data by the following partitioning variables is considered: initial real GDP per capita (gdpcap0), accessibility by rail and road (accessrail, accessroad), and indicators for capital regions (capital), border regions (regboarder), coastal regions (regcoast), Objective 1 regions (regobj1), Central and Eastern European countries (cee), and PIIGS countries (piigs).
To adjust for spatial dependencies a spatial lag term with inverse distance weights is considered here. Other weight specifications lead to very similar estimated tree structures and regression coefficients, though.
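To sketch what an inverse-distance specification amounts to, the following base R snippet builds a row-standardized inverse-distance weight matrix from hypothetical region centroids. This is only an illustration of the weighting idea; the actual analysis below uses the precomputed listw objects shipped in WeightsNUTS2.

```r
## Hedged sketch: row-standardized inverse-distance weights for
## hypothetical region centroids (not the actual NUTS2 data).
set.seed(1)
xy <- matrix(runif(10 * 2), ncol = 2)  # 10 made-up centroids
d <- as.matrix(dist(xy))               # Euclidean distance matrix
w <- 1 / d                             # inverse distances
diag(w) <- 0                           # no self-neighbors
w <- w / rowSums(w)                    # row-standardization
## Each row of w now sums to 1, as is common for spatial lag terms
```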
library("lagsarlmtree")
data("GrowthNUTS2", package = "lagsarlmtree")
data("WeightsNUTS2", package = "lagsarlmtree")
tr <- lagsarlmtree(ggdpcap ~ gdpcap0 + shgfcf + shsh + shsm | gdpcap0 +
  accessrail + accessroad + capital + regboarder + regcoast + regobj1 + cee + piigs,
  data = GrowthNUTS2, listw = WeightsNUTS2$invw, minsize = 12, alpha = 0.05)
print(tr)
## Spatial lag model tree
##
## Model formula:
## ggdpcap ~ gdpcap0 + shgfcf + shsh + shsm | gdpcap0 + accessrail +
##     accessroad + capital + regboarder + regcoast + regobj1 +
##     cee + piigs
##
## Fitted party:
## [1] root
## |   [2] capital in no
## |   |   [3] piigs in no: n = 176
## |   |       (Intercept)  gdpcap0   shgfcf     shsh     shsm
## |   |           0.11055 -0.01171 -0.00208  0.02195  0.00179
## |   |   [4] piigs in yes
## |   |   |   [5] regcoast in no: n = 13
## |   |   |       (Intercept)  gdpcap0   shgfcf     shsh     shsm
## |   |   |            0.1606  -0.0159  -0.0469   0.0789  -0.0234
## |   |   |   [6] regcoast in yes: n = 39
## |   |   |       (Intercept)  gdpcap0   shgfcf     shsh     shsm
## |   |   |           0.07348 -0.01106  0.09156  0.11668  0.00942
## |   [7] capital in yes: n = 27
## |       (Intercept)  gdpcap0   shgfcf     shsh     shsm
## |            0.2056  -0.0223  -0.0075   0.0411   0.0528
##
## Number of inner nodes:    3
## Number of terminal nodes: 4
## Number of parameters per node: 5
## Objective function (residual sum of squares): 0.0155
##
## Rho (from lagsarlm model):
##   rho
## 0.837
The resulting linear regression tree can be visualized with p-values from the parameter stability tests displayed in the inner nodes and a scatter plot of GDP per capita growth (ggdpcap) vs. (log) initial real GDP per capita (gdpcap0) in the terminal nodes:
plot(tr, tp_args = list(which = 1))
It is most striking that the speed of β-convergence is much higher for the 27 capital regions. More details about differences in the other regressors are shown in the table below. Finally, it is also of interest which variables were not selected for splitting in the tree, i.e., are not associated with significant parameter instabilities: initial income, the border dummy, and Objective 1 regions, among others.
Partitioning variables: capital, piigs, regcoast. Regressor variables: (Const.), gdpcap0, shgfcf, shsh, shsm (values in parentheses).

| Node | n   | capital | piigs | regcoast | (Const.)      | gdpcap0          | shgfcf           | shsh          | shsm             |
|-----:|----:|:--------|:------|:---------|:--------------|:-----------------|:-----------------|:--------------|:-----------------|
| 3    | 176 | no      | no    | –        | 0.111 (0.016) | –0.0117 (0.0016) | –0.0021 (0.0077) | 0.022 (0.011) | 0.0018 (0.0068)  |
| 5    | 13  | no      | yes   | no       | 0.161 (0.128) | –0.0159 (0.0135) | –0.0469 (0.0815) | 0.079 (0.059) | –0.0234 (0.0660) |
| 6    | 39  | no      | yes   | yes      | 0.073 (0.056) | –0.0111 (0.0059) | 0.0916 (0.0420)  | 0.117 (0.029) | 0.0094 (0.0218)  |
| 7    | 27  | yes     | –     | –        | 0.206 (0.031) | –0.0223 (0.0029) | –0.0075 (0.0259) | 0.041 (0.020) | 0.0528 (0.0117)  |
For more details see the full manuscript. Replication materials for the entire analysis from the manuscript are available as a demo in the package:
demo("GrowthNUTS2", package = "lagsarlmtree")
The colorspace package provides a broad toolbox for selecting individual colors or color palettes, manipulating these colors, and employing them in various kinds of visualizations. Version 1.4-0 has just been released on CRAN, containing many new features and contributions from new coauthors. A new web site presenting and documenting the package has been launched at http://colorspace.R-Forge.R-project.org/.
At the core of the package are various utilities for computing with color spaces (as the name conveys). Thus, the package helps to map various three-dimensional representations of color to each other. A particularly important mapping is the one from the perceptually-based and device-independent color model HCL (Hue-Chroma-Luminance) to standard Red-Green-Blue (sRGB), which is the basis for color specifications in many systems based on the corresponding hex codes (e.g., in HTML but also in R). For completeness further standard color models are included as well in the package: polarLUV() (= HCL), LUV(), polarLAB(), LAB(), XYZ(), RGB(), sRGB(), HLS(), HSV().
The HCL space (= polar coordinates in CIELUV) is particularly useful for specifying individual colors and color palettes, as its three axes match those of the human visual system very well: hue (= type of color, dominant wavelength), chroma (= colorfulness), and luminance (= brightness).
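The effect of the three axes can be explored even without the colorspace package: base R's grDevices function hcl() also maps polar CIELUV coordinates to sRGB hex codes. A minimal illustration (not part of the colorspace API):

```r
## Three balanced colors: same chroma and luminance, hues 120 degrees apart
hcl(h = c(0, 120, 240), c = 50, l = 70)

## Varying only luminance yields a light-to-dark sequence of a single hue
hcl(h = 260, c = 50, l = c(90, 70, 50, 30))
```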
The colorspace package provides three types of palettes based on the HCL model:

- qualitative_hcl() for qualitative palettes,
- sequential_hcl() for sequential palettes,
- diverging_hcl() for diverging palettes.

To aid choice and application of these palettes there are: scales for use with ggplot2; shiny (and tcltk) apps for interactive exploration; visualizations of palette properties; accompanying manipulation utilities (like desaturation, lighten/darken, and emulation of color vision deficiencies).
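As a quick sketch of the manipulation utilities just mentioned (assuming the colorspace package is installed), each function takes a vector of hex colors and returns a modified vector of the same length:

```r
library("colorspace")
pal <- qualitative_hcl(4, "Dark 3")
desaturate(pal)    # remove chroma, yielding shades of gray
lighten(pal, 0.3)  # shift towards higher luminance
darken(pal, 0.3)   # shift towards lower luminance
deutan(pal)        # emulate deuteranope (green-deficient) vision
```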
More detailed overviews and examples are provided in the articles:
The stable release version of colorspace is hosted on the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=colorspace and can be installed via
install.packages("colorspace")
The development version of colorspace is hosted on R-Forge at https://R-Forge.R-project.org/projects/colorspace/ in a Subversion (SVN) repository. It can be installed via
install.packages("colorspace", repos = "http://R-Forge.R-project.org")
For Python users a beta re-implementation of the full colorspace package in Python 2/Python 3 is also available, see https://github.com/retostauffer/python-colorspace.
The colorspace package ships with a wide range of predefined color palettes, specified through suitable trajectories in the HCL (hue-chroma-luminance) color space. A quick overview can be gained easily with hcl_palettes():
library("colorspace")
hcl_palettes(plot = TRUE)
Using the names from the plot above and a desired number of colors in the palette, a suitable color vector can be easily computed, e.g.,
q4 <- qualitative_hcl(4, "Dark 3")
q4
## [1] "#E16A86" "#909800" "#00AD9A" "#9183E6"
The functions sequential_hcl() and diverging_hcl() work analogously. Additionally, their hue/chroma/luminance parameters can be modified, making it easy to customize each palette. Moreover, the choose_palette()/hclwizard() apps provide convenient user interfaces for performing the customization interactively. Finally, even more flexible diverging HCL palettes are provided by divergingx_hcl().
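For illustration, here is a hedged sketch of such a customization, starting from the predefined "Purples 3" palette and overriding individual HCL trajectory parameters (the particular values are arbitrary, chosen only to show the mechanism):

```r
library("colorspace")
## Default palette vs. a tweaked version, overriding the chroma
## parameter c1 and the final luminance l2 (illustrative values only)
sequential_hcl(5, palette = "Purples 3")
sequential_hcl(5, palette = "Purples 3", c1 = 70, l2 = 80)
```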
The color vectors returned by the HCL palette functions can usually be passed directly to most base graphics functions, typically through the col argument. Here, the q4 vector created above is used in a time series display:
plot(log(EuStockMarkets), plot.type = "single", col = q4, lwd = 2)
legend("topleft", colnames(EuStockMarkets), col = q4, lwd = 3, bty = "n")
As another example for a sequential palette, a spine plot is created below, displaying the proportion of Titanic passengers that survived per class. The Purples 3 palette is used, which is quite similar to the ColorBrewer.org palette Purples. Here, only two colors are employed, yielding a dark purple and a light gray.
ttnc < margin.table(Titanic, c(1, 4))[, 2:1]
spineplot(ttnc, col = sequential_hcl(2, "Purples 3"))
To plug the HCL color palettes into ggplot2 graphics, suitable discrete and/or continuous ggplot2 color scales are provided. The scales are called via the scheme scale_&lt;aesthetic&gt;_&lt;datatype&gt;_&lt;colorscale&gt;(), where &lt;aesthetic&gt; is the name of the aesthetic (fill, color, colour), &lt;datatype&gt; is the type of the variable plotted (discrete or continuous), and &lt;colorscale&gt; sets the type of the color scale used (qualitative, sequential, diverging, divergingx).
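For instance, a continuous sequential fill scale would be selected as follows. This is a hedged sketch (assuming ggplot2 and colorspace are installed) that maps the volcano elevation data to the sequential "Purples 3" palette:

```r
library("ggplot2")
library("colorspace")

## Continuous analogue of the discrete scales: volcano elevations
## mapped to a continuous sequential fill scale
df <- data.frame(expand.grid(x = 1:87, y = 1:61), z = c(volcano))
p <- ggplot(df, aes(x, y, fill = z)) + geom_raster() +
  scale_fill_continuous_sequential(palette = "Purples 3")
```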
To illustrate their usage, two simple examples are shown using the qualitative Dark 3 and sequential Purples 3 palettes that were also employed above. First, semi-transparent shaded densities of the sepal length from the iris data are shown, grouped by species.
library("ggplot2")
ggplot(iris, aes(x = Sepal.Length, fill = Species)) + geom_density(alpha = 0.6) +
scale_fill_discrete_qualitative(palette = "Dark 3")
The sequential palette is used to code the cut levels in a scatter plot of price by carat in the diamonds data (or rather a small subsample thereof). The scale function first generates six colors but then drops the first color because the light gray is too light in this display. (Alternatively, the chroma and luminance parameters could also be tweaked.)
dsamp <- diamonds[1 + 1:1000 * 50, ]
ggplot(dsamp, aes(carat, price, color = cut)) + geom_point() +
scale_color_discrete_sequential(palette = "Purples 3", nmax = 6, order = 2:6)
The colorspace package also provides a number of functions that aid visualization and assessment of its palettes:

- demoplot() displays a palette (with an arbitrary number of colors) in a range of typical and somewhat simplified statistical graphics.
- hclplot() converts the colors of a palette to the corresponding hue/chroma/luminance coordinates and displays them in HCL space with one dimension collapsed. The collapsed dimension is the luminance for qualitative palettes and the hue for sequential/diverging palettes.
- specplot() also converts the colors to hue/chroma/luminance coordinates but draws the resulting spectrum in a line plot.

For the qualitative Dark 3 palette from above the following plots can be obtained.
demoplot(q4, "bar")
hclplot(q4)
specplot(q4, type = "o")
A bar plot would be another typical application for a qualitative palette (instead of the time series and density plot used above). However, a lighter and less colorful palette might be preferable in this situation (e.g., Pastel 1 or Set 3).
The other two displays show that luminance is (almost) constant in the palette while the hue changes linearly along the color “wheel”. Ideally, chroma would have also been constant to completely balance the colors. However, at this luminance the maximum chroma differs across hues so that the palette is fixed up to use less chroma for the yellow and green elements.
Subsequently, the same assessment is carried out for the sequential Purples 3 palette as employed above.
s9 <- sequential_hcl(9, "Purples 3")
demoplot(s9, "heatmap")
hclplot(s9)
specplot(s9, type = "o")
Here, a heatmap (based on the well-known Maunga Whau volcano data) is used as a typical application for a sequential palette. The elevation of the volcano is brought out clearly, focusing with the dark colors on the higher elevations.
The other two displays show that hue is constant in the palette while luminance and chroma vary. Luminance increases monotonically from dark to light (as required for a proper sequential palette). Chroma is triangular-shaped, which allows the middle colors in the palette to be distinguished better (compared to a monotonic chroma trajectory).
Manuel Gebetsberger, Jakob W. Messner, Georg J. Mayr, Achim Zeileis (2018). “Estimation Methods for Nonhomogeneous Regression Models: Minimum Continuous Ranked Probability Score versus Maximum Likelihood.” Monthly Weather Review, 146(12), 4323-4338. doi:10.1175/MWR-D-17-0364.1
Nonhomogeneous regression models are widely used to statistically postprocess numerical ensemble weather prediction models. Such regression models are capable of forecasting full probability distributions and correcting for ensemble errors in the mean and variance. To estimate the corresponding regression coefficients, minimization of the continuous ranked probability score (CRPS) has widely been used in meteorological postprocessing studies and has often been found to yield more calibrated forecasts compared to maximum likelihood estimation. From a theoretical perspective, both estimators are consistent and should lead to similar results, provided the correct distribution assumption about empirical data. Differences between the estimated values indicate a wrong specification of the regression model. This study compares the two estimators for probabilistic temperature forecasting with nonhomogeneous regression, where results show discrepancies for the classical Gaussian assumption. The heavy-tailed logistic and Student's t distributions can improve forecast performance in terms of sharpness and calibration, and lead to only minor differences between the estimators employed. Finally, a simulation study confirms the importance of appropriate distribution assumptions and shows that for a correctly specified model the maximum likelihood estimator is slightly more efficient than the CRPS estimator.
https://CRAN.R-project.org/package=crch
The function crch() provides heteroscedastic (or nonhomogeneous) regression models for "gaussian" (i.e., normally distributed), "logistic", or "student" (i.e., t-distributed) response variables. Additionally, responses may be censored or truncated. Estimation methods include maximum likelihood (type = "ml", the default) and minimum CRPS (type = "crps"). Boosting can also be employed for model fitting (instead of full optimization). CRPS computations leverage the excellent scoringRules package.
The plots below show histograms of the PIT (probability integral transform) for various nonhomogeneous regression models yielding probabilistic 1-day-ahead temperature forecasts at an Alpine site (Innsbruck). When the probabilistic forecasts are perfectly calibrated to the actual observations, the PIT histograms should form a straight line at density 1. The gray area illustrates the 95% consistency interval around perfect calibration, and binning is based on 5% intervals.
When a normally distributed or Gaussian response is assumed (left panel), the maximum-likelihood model (solid line) is not well calibrated, as its tails are not heavy enough. (The legend denotes this “LS” because maximizing the likelihood is equivalent to minimizing the so-called log-score.) In contrast, the minimum-CRPS model is reasonably well calibrated.
When assuming a Student-t response (right panel), there is little deviation between the two estimation techniques and both are well calibrated.
Thus, the differences between CRPS- and ML-based estimation with a Gaussian response stem from assuming a distribution whose tails are not heavy enough. In this situation, minimum CRPS yields the somewhat more robust model fit, while both estimation techniques lead to very similar results if a more suitable response distribution is adopted. In the latter case ML is slightly more efficient than minimum CRPS.
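To make the comparison concrete, here is a small self-contained sketch in base R (deliberately not using crch itself) that estimates a Gaussian location-scale model both by maximum likelihood and by minimizing the average CRPS, using the closed-form CRPS of the normal distribution. For a correctly specified model, as here, the two estimates nearly coincide:

```r
set.seed(1)
y <- rnorm(5000, mean = 2, sd = 1.5)  # simulated "observations"

## Closed-form CRPS of a Gaussian forecast N(mu, sigma^2)
crps_norm <- function(y, mu, sigma) {
  z <- (y - mu) / sigma
  sigma * (z * (2 * pnorm(z) - 1) + 2 * dnorm(z) - 1 / sqrt(pi))
}

## Negative log-likelihood ("log-score") of the same forecast
logscore_norm <- function(y, mu, sigma) {
  -dnorm(y, mean = mu, sd = sigma, log = TRUE)
}

## Both estimators minimize an average score over the sample
fit <- function(score) {
  par <- optim(c(0, 0), function(p) mean(score(y, p[1], exp(p[2]))))$par
  c(mu = par[1], sigma = exp(par[2]))
}

fit(logscore_norm)  # ML estimate, close to the true (2, 1.5)
fit(crps_norm)      # minimum-CRPS estimate, very similar here
```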
Heidi Seibold, Torsten Hothorn, Achim Zeileis (2018). “Generalised Linear Model Trees with Global Additive Effects.” Advances in Data Analysis and Classification, forthcoming. doi:10.1007/s11634-018-0342-1 [arXiv]
Model-based trees are used to find subgroups in data which differ with respect to model parameters. In some applications it is natural to keep some parameters fixed globally for all observations while asking if and how other parameters vary across subgroups. Existing implementations of model-based trees can only deal with the scenario where all parameters depend on the subgroups. We propose partially additive linear model trees (PALM trees) as an extension of (generalised) linear model trees (LM and GLM trees, respectively), in which the model parameters are specified a priori to be estimated either globally from all observations or locally from the observations within the subgroups determined by the tree. Simulations show that the method has high power for detecting subgroups in the presence of global effects and reliably recovers the true parameters. Furthermore, treatment-subgroup differences are detected in an empirical application of the method to data from a mathematics exam: the PALM tree is able to detect a small subgroup of students that had a disadvantage in an exam with two versions while adjusting for overall ability effects.
https://CRAN.R-project.org/package=palmtree
PALM trees are employed to investigate treatment differences in a mathematics 101 exam (for first-year business and economics students) at Universität Innsbruck. Due to limited availability of seats in the exam room, students could self-select into one of two exam tracks that were conducted back to back with slightly different questions on the same topics. The question is whether this “treatment” of splitting the students into two tracks was fair in the sense that it was on average equally difficult for the two groups. To investigate this question, the data are loaded from the psychotools package, points are scaled to percent achieved in [0, 100], and the subset of variables for the analysis is selected:
data("MathExam14W", package = "psychotools")
MathExam14W$tests <- 100 * MathExam14W$tests/26
MathExam14W$pcorrect <- 100 * MathExam14W$nsolved/13
MathExam <- MathExam14W[, c("pcorrect", "group", "tests", "study",
  "attempt", "semester", "gender")]
A naive check could be whether the percentage of correct points (pcorrect) differs between the two groups:
ci <- function(object) cbind("Coefficient" = coef(object), confint(object))
ci(lm(pcorrect ~ group, data = MathExam))
##             Coefficient 2.5 % 97.5 %
## (Intercept)       57.60  55.1  60.08
## group2            -2.33  -5.7   1.03
This shows that the second group achieved on average 2.33 percentage points less than the first group. But the corresponding confidence interval conveys that this difference is not significant.
However, it is conceivable that stronger (or weaker) students selected themselves more into one of the two groups. Had the assignment been random, the “treatment effect” might have been larger or even smaller. Luckily, an independent measure of the students’ ability is available, namely the percentage of points achieved in the online tests conducted during the semester prior to the exam. Adjusting for this increases the treatment effect to a decrease of 4.37 percentage points, which is now also significant. The reason is that weaker students self-selected into the second group. Moreover, the tests coefficient signals that 1 more percentage point in the online tests leads on average to 0.855 more percentage points in the written exam.
ci(lm(pcorrect ~ group + tests, data = MathExam))
##             Coefficient   2.5 %  97.5 %
## (Intercept)      -5.846 -13.521   1.828
## group2           -4.366  -7.231  -1.502
## tests             0.855   0.756   0.955
Finally, PALM trees are used to assess whether there are subgroups with differential group treatment effects when adjusting for a global additive tests effect. Potential subgroups can be formed from the covariates tests, type of study (three-year bachelor vs. four-year diploma), the number of times the students attempted the exam (attempt), the number of semesters (semester), and gender. Using palmtree this can be easily carried out:
library("palmtree")
palmtree_math <- palmtree(pcorrect ~ group | tests | tests +
  study + attempt + semester + gender, data = MathExam)
print(palmtree_math)
## Partially additive linear model tree
##
## Model formula:
## pcorrect ~ group | tests + study + attempt + semester + gender
##
## Fitted party:
## [1] root
## |   [2] attempt <= 1
## |   |   [3] tests <= 92.3: n = 352
## |   |       (Intercept)  group2
## |   |             -7.09   -3.00
## |   |   [4] tests > 92.3: n = 79
## |   |       (Intercept)  group2
## |   |              14.0   -14.5
## |   [5] attempt > 1: n = 298
## |       (Intercept)  group2
## |              2.33   -1.70
##
## Number of inner nodes:    2
## Number of terminal nodes: 3
## Number of parameters per node: 2
## Objective function (residual sum of squares): 253218
##
## Linear fixed effects (from palm model):
## tests
## 0.787
A somewhat enhanced version of plot(palmtree_math) is shown below:
This indicates that for most students the group treatment effect is indeed negligible. However, for the subgroup of “good” students (with a high percentage correct in the online tests) taking the exam in their first attempt, the exam in the second group was indeed more difficult. On average, the students in the second group obtained 14.5 percentage points less than those in the first group.
ci(palmtree_math$palm)
##               Coefficient   2.5 %  97.5 %
## (Intercept)        -7.088 -16.148   1.971
## .tree4             21.069  13.348  28.791
## .tree5              9.421   5.168  13.673
## tests               0.787   0.671   0.903
## .tree3:group2      -2.997  -6.971   0.976
## .tree4:group2     -14.494 -22.921  -6.068
## .tree5:group2      -1.704  -5.965   2.557
The absolute size of this group difference is still moderate, though, corresponding to about half an exercise out of 13.
In addition to the empirical case study the manuscript also provides an extensive simulation study comparing the performance of PALM trees in treatmentsubgroup scenarios to standard linear model (LM) trees, optimal treatment regime (OTR) trees (following Zhang et al. 2012), and the STIMA algorithm (simultaneous threshold interaction modeling algorithm). The study evaluates the methods with respect to (1) finding the correct subgroups, (2) not splitting when there are no subgroups, (3) finding the optimal treatment regime, and (4) correctly estimating the treatment effect.
Here we just briefly highlight the results for question (1): are the correct subgroups found? The figure below shows the mean number of subgroups (over 150 simulated data sets) and the mean adjusted Rand index (ARI) for increasing treatment effect differences Δβ and numbers of observations n.
This shows that PALM trees perform increasingly well and somewhat better with respect to these metrics than the competitors. More details on the different scenarios and corresponding evaluations can be found in the manuscript. Replication materials are provided along with the manuscript on the publisher’s web page.
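For readers unfamiliar with the adjusted Rand index used as an evaluation metric above, a minimal base-R sketch (with made-up partitions, not the simulated data from the study) is:

```r
## Adjusted Rand index (Hubert & Arabie 1985) for two partitions,
## computed from the contingency table of the cluster labels.
ari <- function(x, y) {
  tab <- table(x, y)
  a <- sum(choose(tab, 2))            # pairs grouped together in both
  b <- sum(choose(rowSums(tab), 2))   # pairs grouped together in x
  c <- sum(choose(colSums(tab), 2))   # pairs grouped together in y
  n <- sum(tab)
  expected <- b * c / choose(n, 2)
  (a - expected) / ((b + c) / 2 - expected)
}

## Identical partitions (up to relabeling) yield ARI = 1.
true_groups <- rep(1:3, each = 10)
ari(true_groups, true_groups)  # 1
```

The ARI is invariant to permuting the group labels, which is why it is a natural measure of whether a tree recovers the correct subgroups regardless of how its terminal nodes are numbered.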
Thorsten Simon, Peter Fabsic, Georg J. Mayr, Nikolaus Umlauf, Achim Zeileis (2018). “Probabilistic Forecasting of Thunderstorms in the Eastern Alps.” Monthly Weather Review, 146(9), 2999-3009. doi:10.1175/MWR-D-17-0366.1
A probabilistic forecasting method to predict thunderstorms in the European eastern Alps is developed. A statistical model links lightning occurrence from the ground-based Austrian Lightning Detection and Information System (ALDIS) detection network to a large set of direct and derived variables from a numerical weather prediction (NWP) system. The NWP system is the high-resolution run (HRES) of the European Centre for Medium-Range Weather Forecasts (ECMWF) with a grid spacing of 16 km. The statistical model is a generalized additive model (GAM), which is estimated by Markov chain Monte Carlo (MCMC) simulation. Gradient boosting with stability selection serves as a tool for selecting a stable set of potentially nonlinear terms. Three grids from 64 x 64 to 16 x 16 km^{2} and five forecast horizons from 5 days to 1 day ahead are investigated to predict thunderstorms during afternoons (1200–1800 UTC). Frequently selected covariates for the nonlinear terms are variants of convective precipitation, convective available potential energy, relative humidity, and temperature in the mid-layers of the troposphere, among others. All models, even for a lead time of 5 days, outperform a forecast based on climatology in an out-of-sample comparison. An example case illustrates that coarse spatial patterns are already successfully forecast 5 days ahead.
https://CRAN.R-project.org/package=bamlss
Predicting thunderstorms in complex terrain (like the Austrian Alps) is a challenging task since one of the main forecasting tools, NWP systems, cannot fully resolve convective processes or circulations and exchange processes over complex topography. However, using a boosted binary GAM based on a broad range of NWP outputs, useful forecasts can be obtained up to 5 days ahead. As an illustration, lightning activity for the afternoon of 2015-07-22 is shown in the top-left panel below, indicating thunderstorms in many areas in the west but not the east. While the corresponding baseline climatology (top middle) has a low probability of thunderstorms for the entire region, the NWP-based probabilistic forecasts (bottom row) highlight increased probabilities already 5 days ahead, becoming much more clear-cut when moving to 3 days and 1 day ahead.
More precisely, the probability of thunderstorms is predicted based on a binary logit GAM that allows for potentially nonlinear smooth effects in all NWP variables considered. It selects the relevant variables by gradient boosting coupled with stability selection. Effects and 95% credible intervals of the model for day 1 are estimated via MCMC sampling and shown below (on the logit scale). The number in the bottom-right corner of each panel indicates the absolute range of the effect. The x-axes are cropped at the 1% and 99% quantiles of the respective covariate to enhance graphical representation.
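As a rough illustration of the model class only (not the authors' actual bamlss code, which additionally uses boosting with stability selection and MCMC), a binary logit GAM with smooth covariate effects can be sketched with the recommended mgcv package, using made-up covariate names and simulated data:

```r
library("mgcv")  # GAM fitting; the paper itself uses the bamlss package

## Toy data standing in for NWP covariates (cape, rh are illustrative
## names for convective available potential energy and relative humidity)
## and binary lightning occurrence.
set.seed(1)
d <- data.frame(cape = rexp(500, 1/300), rh = runif(500, 20, 100))
d$lightning <- rbinom(500, 1, plogis(-4 + 0.004 * d$cape + 0.02 * d$rh))

## Binary logit GAM with potentially nonlinear smooth effects.
m <- gam(lightning ~ s(cape) + s(rh), family = binomial, data = d)
head(predict(m, type = "response"))  # predicted thunderstorm probabilities
```

On the logit scale, plot(m) would display the estimated smooth effects with pointwise confidence bands, analogous in spirit to the effect panels shown below.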
(Note: As the data cannot be shared freely, the customary replication materials unfortunately cannot be provided.)
Florian Wickelmaier, Achim Zeileis (2018). “Using Recursive Partitioning to Account for Parameter Heterogeneity in Multinomial Processing Tree Models.” Behavior Research Methods, 50(3), 1217-1233. doi:10.3758/s13428-017-0937-z
In multinomial processing tree (MPT) models, individual differences between the participants in a study can lead to heterogeneity of the model parameters. While subject covariates may explain these differences, it is often unknown in advance how the parameters depend on the available covariates, that is, which variables play a role at all, interact, or have a nonlinear influence, etc. Therefore, a new approach for capturing parameter heterogeneity in MPT models is proposed based on the machine learning method MOB for model-based recursive partitioning. This procedure recursively partitions the covariate space, leading to an MPT tree with subgroups that are directly interpretable in terms of effects and interactions of the covariates. The pros and cons of MPT trees as a means of analyzing the effects of covariates on MPT model parameters are discussed based on simulation experiments as well as on two empirical applications from memory research. Software that implements MPT trees is provided via the mpttree function in the psychotree package in R.
https://CRAN.R-project.org/package=psychotree
To highlight how MPT trees can capture the influence of covariates on the parameters in MPT models, data from a source monitoring experiment conducted at the Department of Psychology, University of Tübingen, are analyzed.
Study: Participants were presented with items from two different sources (labeled A vs. B). In the experiment the two sources were controlled such that half of the respondents had to read the presented items either quietly (A = think) or aloud (B = say). The other half wrote them down (A = write) or read them aloud (B = say). Items were presented on a computer screen at a self-paced rate. In the final memory test, the studied items and distractor items were shown intermixed and had to be classified as either A, B, or new (N) by pressing a button on the screen.
Model: To infer the cognitive processes, a well-known MPT model is employed that was established by the late Bill Batchelder (who passed away earlier this month) and David Riefer for the source monitoring paradigm:
Explanation: Consider the paths from the root to an A response for a Source A item (left). With probability D1, a respondent detects an item as old. If, in a second step, he/she is able to discriminate the item from a Source B item (d1), then the response will correctly be A; else, if discrimination fails (1 - d1), a correct A response can only be guessed with probability a. If the item was not detected as old in the first place (1 - D1), the response will be A only if there are both a response bias for “old” (b) and a guess for the item being Source A (g). The remaining paths in the left tree lead to classification errors (B, N). The trees for Source B and new items work analogously. Moreover, a = g is assumed for identifiability and discriminability is assumed to be equal for both sources (d1 = d2) as in a similar example in Batchelder and Riefer (1990).
Question: Do these probabilities in the source monitoring model (D1, D2, d, b, g) depend on the source condition (think-say vs. write-say), or on the gender or age of the participants?
Answer: The MPT-based model tree (MOB) finds a highly significant difference between the think-say and write-say source conditions. Furthermore, there is an age difference in the think-say condition that is significant at a Bonferroni-corrected 5% level. Gender is not found to play a significant role.
Probabilities: For the think-say sources (Nodes 3 and 4), probability D2 exceeds D1 indicating an advantage of say items over think items with respect to detectability. For the write-say sources (Node 5), D2 and D1 are about the same indicating that for these sources no such advantage exists. The think-say subgroup is further split by age with the older participants having lower values on D1 and d, which suggests lower detectability of think items and lower discriminability as compared to the younger participants. This age effect seems to depend on the type of sources as there is no such effect for the write-say sources. In addition, there are only small effects for the bias parameters b and g, which are psychologically less interesting. Some of the differences in the probabilities across groups/nodes can be brought out even more clearly by parameter estimates and corresponding 95% Wald confidence intervals:
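The path probabilities described in the explanation above can be written out directly. A small base-R sketch (with made-up parameter values, using the a = g and d1 = d2 restrictions mentioned earlier) computes the response probabilities for a Source A item:

```r
## Response probabilities for a Source A item in the source monitoring MPT:
## detect as old (D1), discriminate source (d), guess Source A (a = g),
## response bias for "old" (b).
mpt_sourceA <- function(D1, d, b, g) {
  a <- g  # identifiability restriction a = g
  c(A = D1 * d + D1 * (1 - d) * a + (1 - D1) * b * g,
    B = D1 * (1 - d) * (1 - a) + (1 - D1) * b * (1 - g),
    N = (1 - D1) * (1 - b))
}

## Made-up parameter values for illustration only.
p <- mpt_sourceA(D1 = 0.7, d = 0.5, b = 0.4, g = 0.5)
p
sum(p)  # the three response probabilities sum to 1
```

Summing over the branches of the tree always gives 1, which is the defining property of an MPT model: each parameter vector induces a proper multinomial distribution over the response categories.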
Most of the improvements and new features pertain to clustered covariances, which had been introduced to the sandwich package last year in version 2.4-0. For this my PhD student Susanne Berger and myself (= Achim Zeileis) teamed up with Nathaniel Graham, the maintainer of the multiwayvcov package. With the new version 2.5-0 almost all features from multiwayvcov have been ported to sandwich, mostly implemented from scratch along with generalizations, extensions, speedups, etc.
The full list of changes can be seen in the NEWS file. The most important changes are:
The manuscript vignette("sandwichCL", package = "sandwich") has been significantly improved based on very helpful and constructive reviewer feedback. See also below.
The cluster argument for the vcov*() functions can now be a formula, simplifying its usage (see below). NA handling has been added as well.
Clustered bootstrap covariances have been reimplemented and extended in vcovBS(). A dedicated method for lm objects is considerably faster now and also includes various wild bootstraps.
Convenient parallelization for bootstrap covariances is now available.
Bugs reported by James Pustejovsky and Brian Tsay, respectively, have been fixed.
Susanne Berger, Nathaniel Graham, Achim Zeileis: Various Versatile Variances: An Object-Oriented Implementation of Clustered Covariances in R
Clustered covariances or clustered standard errors are very widely used to account for correlated or clustered data, especially in economics, political sciences, or other social sciences. They are employed to adjust the inference following estimation of a standard leastsquares regression or generalized linear model estimated by maximum likelihood. Although many publications just refer to “the” clustered standard errors, there is a surprisingly wide variety of clustered covariances, particularly due to different flavors of bias corrections. Furthermore, while the linear regression model is certainly the most important application case, the same strategies can be employed in more general models (e.g. for zeroinflated, censored, or limited responses).
In R, functions for covariances in clustered or panel models have been somewhat scattered or available only for certain modeling functions, notably the (generalized) linear regression model. In contrast, an object-oriented approach to “robust” covariance matrix estimation, applicable beyond lm() and glm(), is available in the sandwich package but has been limited to the case of cross-section or time series data. Now, this shortcoming has been corrected in sandwich (starting from version 2.4-0): Based on methods for two generic functions (estfun() and bread()), clustered and panel covariances are now provided in vcovCL(), vcovPL(), and vcovPC(). Moreover, clustered bootstrap covariances, based on update() for models on bootstrap samples of the data, are provided in vcovBS(). These are directly applicable to models from many packages, e.g., including MASS, pscl, countreg, betareg, among others. Some empirical illustrations are provided as well as an assessment of the methods’ performance in a simulation study.
To show how easily the clustered covariances from sandwich can be applied in practice, two short illustrations from the manuscript/vignette are used. In addition to the sandwich package, the lmtest package is employed to easily obtain Wald tests of all coefficients:
library("sandwich")
library("lmtest")
options(digits = 4)
First, a Poisson model with clustered standard errors from Aghion et al. (2013, American Economic Review) is replicated. To investigate the effect of institutional ownership on innovation (as captured by citationweighted patent counts) they employ a (pseudo)Poisson model with industry/year fixed effects and standard errors clustered by company, see their Table I(3):
data("InstInnovation", package = "sandwich")
ii <- glm(cites ~ institutions + log(capital/employment) + log(sales) + industry + year,
  data = InstInnovation, family = poisson)
coeftest(ii, vcov = vcovCL, cluster = ~ company)[2:4, ]
##                          Estimate Std. Error z value  Pr(>|z|)
## institutions             0.009687   0.002406   4.026 5.682e-05
## log(capital/employment)  0.482884   0.135953   3.552 3.826e-04
## log(sales)               0.820318   0.041523  19.756 7.187e-87
Second, a simple linear regression model with double-clustered standard errors is replicated using the well-known Petersen data from Petersen (2009, Review of Financial Studies):
data("PetersenCL", package = "sandwich")
p <- lm(y ~ x, data = PetersenCL)
coeftest(p, vcov = vcovCL, cluster = ~ firm + year)
## t test of coefficients:
##
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.0297     0.0651    0.46     0.65
## x             1.0348     0.0536   19.32   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
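To make the sandwich structure behind these standard errors concrete, here is a minimal base-R sketch of the basic clustered (CL0) estimator for a linear model. This is a simplified illustration with made-up data, without the finite-sample bias corrections offered by vcovCL():

```r
## CL0 clustered covariance: bread %*% meat %*% bread, where the
## estimating-function scores are summed within clusters before
## taking outer products.
vcov_cl0 <- function(model, cluster) {
  X <- model.matrix(model)
  res <- residuals(model)
  bread <- solve(crossprod(X))
  scores <- X * res                     # rows of the estimating function
  g_scores <- rowsum(scores, cluster)   # sum the scores per cluster
  meat <- crossprod(g_scores)
  bread %*% meat %*% bread
}

## Toy example: 20 clusters of 5 observations with a cluster random effect.
set.seed(1)
cl <- rep(1:20, each = 5)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(20)[cl] + rnorm(100)
sqrt(diag(vcov_cl0(lm(y ~ x), cl)))     # clustered standard errors
```

The scores within a cluster are summed first, so correlation within clusters inflates the meat matrix and hence the standard errors, which is exactly what the conventional (independence-based) covariance misses.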
In addition to the description of the methods and the software, the manuscript/vignette also contains a simulation study that investigates the properties of clustered covariances. In particular, this assesses how well the methods perform in models beyond linear regression but also compares different types of bias adjustments (HC0-HC3) and alternative estimation techniques (generalized estimating equations, mixed effects).
The detailed results are presented in the manuscript; here we just show the results from one of the simulation experiments: The empirical coverage of 95% Wald confidence intervals is depicted for a beta regression, zero-inflated Poisson, and zero-truncated Poisson model. With increasing correlation within the clusters, the conventional “standard” errors and “basic” robust sandwich standard errors become too small, thus leading to a drop in empirical coverage. However, both clustered HC0 standard errors (CL0) and clustered bootstrap standard errors (BS) perform reasonably well, leading to empirical coverages close to the nominal 0.95.
Details: Data sets were simulated with 100 clusters of 5 observations each. The cluster correlation (on the x-axis) was generated with a Gaussian copula. The only regressor had a correlation of 0.25 with the clustering variable. Empirical coverages were computed from 10,000 replications.
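The Gaussian copula construction for within-cluster correlation can be sketched in base R. This is a simplified equicorrelated version; the cluster sizes and parameter names are illustrative, not the study's exact setup:

```r
## Equicorrelated Gaussian copula: a shared cluster draw plus idiosyncratic
## noise gives standard normal margins with within-cluster correlation rho.
set.seed(1)
G <- 100; m <- 5; rho <- 0.5
cl <- rep(1:G, each = m)
z <- sqrt(rho) * rnorm(G)[cl] + sqrt(1 - rho) * rnorm(G * m)
u <- pnorm(z)  # uniform margins; feed through any quantile function,
               # e.g. qpois(u, lambda) for cluster-correlated counts
```

Applying a quantile function to u yields cluster-correlated responses with any desired marginal distribution, which is how non-Gaussian models (e.g., the zero-inflated Poisson above) can be simulated with controlled cluster correlation.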
Last week France won the 2018 FIFA World Cup in a match against Croatia in Russia, thus delivering an entertaining final to an eventful tournament. Many perceived the course of the tournament as very unexpected and surprising because many of the “usual” favorites like Brazil, Germany, Spain, or Argentina did not even make it to the semifinals. In contrast, teams like host Russia and finalist Croatia proceeded further than expected. However, does this really mean that the expectations of experts and fans were so wrong? Or, how surprising was the result given the pre-tournament predictions?
Therefore, we want to take a critical look back at our own Probabilistic Forecast for the 2018 FIFA World Cup based on the bookmaker consensus model that aggregated the expert judgments of 26 bookmakers and betting exchanges. A set of presentation slides (in PDF format) with explanations of the model and its evaluation is available to accompany this blog post: slides.pdf
Despite some surprises in the tournament, the probabilistic bookmaker consensus forecast fitted reasonably well. It is hard to evaluate probabilistic forecasts with only one realization of the tournament, but by and large most outcomes do not deviate systematically from the probabilities assigned to them.
However, there is one notable exception: Expectations about defending champion Germany were clearly wrong. “Die Mannschaft” was predicted to advance from the group stage to the round of 16 with probability 89.1%, yet they not only failed to do so but instead came in last in their group.
Other events that were perceived as surprising were not so unlikely to begin with; e.g., for Argentina it was more likely to get eliminated before the quarter finals (predicted probability: 51%) than to proceed further. Or they were not unlikely conditional on previous tournament events. Examples of the latter are the pre-tournament predictions for Belgium beating Brazil in a match (40%) or Russia beating Spain (33%). Of course, another outcome of those matches was more likely, but compared with these predictions the results were maybe not as surprising as perceived by many. Finally, the pre-tournament probability of Croatia making it to the final was only 6%, but conditional on the events from the round of 16 (especially with Spain being eliminated) this increased to 27% (only surpassed by England with 36%).
The animated GIF below shows the pre-tournament predictions for each team winning the 2018 FIFA World Cup. In the animation the teams that “survived” over the course of the tournament are highlighted. This clearly shows that the elimination of Germany (winning probability: 15.8%) was the big surprise in the group stage, but otherwise almost all of the teams expected to proceed also did so. Afterwards, two of the other main favorites, Brazil (16.6%) and Spain (12.5%), dropped out, but eventually the fourth team with a double-digit winning probability (France, 12.1%) prevailed.
Compared to other rankings of the teams in the tournament, the bookmaker consensus model did quite well. To illustrate this we compute the Spearman rank correlation of the observed partial tournament ranking (1 FRA, 2 CRO, 3 BEL, 4 ENG, 6.5 URU, 6.5 BRA, …) with the bookmaker consensus model as well as the Elo and FIFA ratings.
Method               Correlation
Bookmaker consensus        0.704
Elo rating                 0.592
FIFA rating                0.411
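The Spearman rank correlation underlying this comparison is simply the Pearson correlation of the ranks. A quick base-R sketch, using the observed partial ranking from above together with made-up rating ranks (not the actual bookmaker/Elo/FIFA values):

```r
## Spearman correlation = Pearson correlation of the ranks (ties get mid-ranks).
tournament <- c(FRA = 1, CRO = 2, BEL = 3, ENG = 4, URU = 6.5, BRA = 6.5)
rating     <- c(FRA = 3, CRO = 10, BEL = 2, ENG = 7, URU = 12, BRA = 1)  # made up

cor(tournament, rating, method = "spearman")
all.equal(cor(tournament, rating, method = "spearman"),
          cor(rank(tournament), rank(rating)))
```

Using ranks rather than raw ratings makes the comparison invariant to the very different scales of the three ranking methods.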
As there is no good way to assess the predicted winning probabilities for winning the title with only one realization of the tournament, we at least (roughly) assess the quality of the predicted probabilities for the individual matches. To do so, we split the 63 matches into three groups, depending on the winning probability of the stronger team.
This gives us matches that were predicted to be almost even (50-58%), had moderate advantages for the stronger team (58-72%), or clear advantages for the stronger team (72-85%). It turns out that in the latter two groups the average predicted probabilities (dashed red line) match the actual observed proportions quite well. Only in the “almost even” group, the stronger teams won slightly more often than expected.
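This grouping of matches by the favorite's predicted probability can be sketched with cut(). The probabilities and outcomes below are simulated for illustration, not the actual match data:

```r
## Bin matches by the predicted probability of the stronger team and
## compare the mean prediction with the observed winning proportion per bin.
set.seed(1)
prob <- runif(63, 0.5, 0.85)   # favorite's predicted winning probability
won  <- rbinom(63, 1, prob)    # simulated match outcomes
bin  <- cut(prob, breaks = c(0.50, 0.58, 0.72, 0.85), include.lowest = TRUE)
data.frame(predicted = tapply(prob, bin, mean),
           observed  = tapply(won,  bin, mean))
```

For well-calibrated forecasts the predicted and observed columns should roughly agree in each bin, which is the comparison shown in the figure.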
As already mentioned above, there was only one big surprise in the group stage, with Germany being eliminated. As the tables below show, most other results from the group rankings conformed quite well with the predicted probabilities to “survive” the group stage.
Group A
  Rank  Team  Prob. (in %)
     1  URU   68.1
     2  RUS   64.2
     3  KSA   19.2
     4  EGY   39.3

Group B
  Rank  Team  Prob. (in %)
     1  ESP   85.9
     2  POR   66.3
     3  IRN   26.5
     4  MAR   27.3

Group C
  Rank  Team  Prob. (in %)
     1  FRA   87.0
     2  DEN   46.7
     3  PER   31.7
     4  AUS   25.2

Group D
  Rank  Team  Prob. (in %)
     1  CRO   58.7
     2  ARG   78.7
     3  NGA   41.2
     4  ISL   30.9

Group E
  Rank  Team  Prob. (in %)
     1  BRA   89.9
     2  SUI   45.4
     3  SRB   39.0
     4  CRC   22.6

Group F
  Rank  Team  Prob. (in %)
     1  SWE   44.5
     2  MEX   45.2
     3  KOR   26.8
     4  GER   89.1

Group G
  Rank  Team  Prob. (in %)
     1  BEL   81.7
     2  ENG   75.6
     3  TUN   23.5
     4  PAN   23.2

Group H
  Rank  Team  Prob. (in %)
     1  COL   64.6
     2  JPN   36.3
     3  SEN   37.9
     4  POL   57.9