Title: Estimating Statistical Models on Data Streams

Authors: Lianne Ippel, Maurits C. Kaptein, Jeroen K. Vermunt

Affiliation: Tilburg University

Abstract: 
Tracking people over a long period of time has never been as easy as it is in
the current Internet era. Continuous tracking of individuals presents us with
opportunities to study human behavior, attitudes, and emotions in greater
detail: for example, using experience sampling, we can collect data on people’s
behavior and feelings in real time. However, continuous tracking also presents
us with new challenges: how can we analyze these ever-growing streams of data
for our research? In my PhD project, I focus on estimating statistical
models on data streams. While traditional estimation methods often revisit older
data points to compute up-to-date estimates of model parameters, online learning,
or row-by-row estimation, updates model parameters without revisiting or storing
older data. Several R functions that allow the online estimation of
statistical models, such as online linear regression, can be found on
github.com/L-Ippel/Methodology. My project mainly focuses on estimating
multilevel models on data streams. While many statistical models hardly pose any
problem when estimated online, models whose likelihood function is maximized
iteratively are more difficult to fit online, because the traditional estimation
procedures are already approximate and revisit the entire dataset repeatedly. We
have developed an online algorithm in R to estimate multilevel models on data
streams.
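
To make the idea of row-by-row estimation concrete, the following is a minimal sketch of online simple linear regression (one predictor). It is written in Python for illustration and is not the authors' R code from the repository above; the class name OnlineSimpleRegression and its methods are hypothetical. The sketch keeps only a handful of running summary statistics, so each new data point updates the estimates without any older data being stored or revisited.

```python
class OnlineSimpleRegression:
    """Row-by-row least squares for y = intercept + slope * x.

    Hypothetical illustration: only five summary statistics are kept,
    and no data point is stored after it has been processed.
    """

    def __init__(self):
        self.n = 0          # number of data points seen so far
        self.mean_x = 0.0   # running mean of x
        self.mean_y = 0.0   # running mean of y
        self.s_xx = 0.0     # running sum of squared deviations of x
        self.s_xy = 0.0     # running sum of cross-deviations of x and y

    def update(self, x, y):
        """Fold one new (x, y) observation into the summary statistics."""
        self.n += 1
        dx = x - self.mean_x                  # deviation from the OLD mean of x
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.s_xx += dx * (x - self.mean_x)   # old deviation times new deviation
        self.s_xy += dx * (y - self.mean_y)   # uses the NEW mean of y

    def coefficients(self):
        """Return (intercept, slope), or None if they are not yet identified."""
        if self.n < 2 or self.s_xx == 0:
            return None
        slope = self.s_xy / self.s_xx
        intercept = self.mean_y - slope * self.mean_x
        return intercept, slope


# Usage: data arrive one row at a time and are discarded after each update.
model = OnlineSimpleRegression()
for x, y in [(0, 2), (1, 5), (2, 8), (3, 11)]:
    model.update(x, y)
intercept, slope = model.coefficients()
```

Because the ordinary least squares solution depends on the data only through these sums, the online estimates agree with the batch estimates up to floating-point error. Models that are fitted iteratively do not reduce to such simple summaries, which is why they are harder to estimate on a stream.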