Title: Modern Missing-Data Methods Implemented in the R Package semTools

Author: Terrence D. Jorgensen

Affiliation: University of Amsterdam

Abstract:
Missing data are ubiquitous in the social and behavioral sciences. If data are
missing completely at random (a very restrictive assumption), traditional
deletion methods yield unbiased point estimates but can lower power
considerably. If data are merely missing at random conditional on the variables
in the model (a more realistic assumption), deletion methods yield biased point
estimates, and single-imputation methods yield inflated Type I error rates due to
downwardly biased standard errors (SEs). Modern methods for missing data analysis include multiple imputation
and maximum likelihood (ML) estimation, which yield unbiased point and SE
estimates when data are missing at random. Although multiple imputation can be
used prior to analysis using any parametric model, specifying even ANOVA and
regression models as structural equation models (SEMs) allows users to take
advantage of ML methods as well. The R package semTools is part of the lavaan
ecosystem, providing additional functionality available in popular commercial
SEM software (e.g., Mplus, EQS) that is not available in lavaan itself. The
semTools package provides a suite of missing data tools, allowing users (a) to
diagnose the potential effect of missing data on inferences, (b) to
automatically incorporate auxiliary variables using the "saturated
correlates" approach when fitting a model using lavaan's full-information ML
estimator, (c) to easily implement multiple imputation or two-stage ML, and (d)
to appropriately implement the Bollen-Stine bootstrap for partially observed
data. I will introduce the functionality and options available in semTools,
using applied examples as a guide.
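
The contrast drawn above between the two missingness assumptions can be illustrated with a small simulation. What follows is a minimal sketch in Python, chosen only for illustration; it does not use lavaan or semTools and is not taken from the talk. It assumes a hypothetical setup in which an outcome y (true mean 0) depends on a fully observed covariate x, and y is deleted either completely at random or with a probability that depends on x. All names (`simulate`, `fit_line`), the slope of 0.7, and the missingness rates are assumptions. The regression-based estimate at the end is a simple stand-in for conditioning on the cause of missingness, not full-information ML or multiple imputation.

```python
import random


def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def simulate(n=20000, seed=1):
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical data-generating model: the true mean of y is 0.
    y = [0.7 * xi + rng.gauss(0, 0.7) for xi in x]

    # MCAR: every y value has the same 40% chance of being missing.
    y_mcar = [yi for yi in y if rng.random() > 0.4]

    # MAR: y is missing more often when the observed covariate x is positive.
    kept = [(xi, yi) for xi, yi in zip(x, y)
            if rng.random() > (0.7 if xi > 0 else 0.1)]
    x_obs = [xi for xi, _ in kept]
    y_obs = [yi for _, yi in kept]

    # Condition on x: fit y on x among complete cases, then average the
    # predictions over all n cases (x is fully observed).
    a, b = fit_line(x_obs, y_obs)
    adjusted = a + b * mean(x)

    return {
        "complete_data": mean(y),
        "deletion_mcar": mean(y_mcar),
        "deletion_mar": mean(y_obs),
        "adjusted_mar": adjusted,
    }


if __name__ == "__main__":
    for label, estimate in simulate().items():
        print(f"{label:>14}: {estimate: .3f}")
```

Under these assumptions, the deletion estimate stays near the true mean of 0 when data are missing completely at random, drifts clearly below 0 when missingness depends on x, and returns to roughly 0 once x is taken into account.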