Title: eatDB: A Spreadsheet Interface to Relational Data Bases
Authors: Benjamin Becker
Affiliation: Humboldt University Berlin

Abstract: Educational large-scale assessments often collect a substantial amount of data. Imputing missing responses and estimating person parameters via plausible values increase the amount of data even further. Storing this kind of hierarchical data (e.g. pupils nested in classes/schools, imputations nested in persons) in a common two-dimensional spreadsheet format (e.g. as .csv, .sav, or .RData) is very inefficient. R must hold the complete two-dimensional data set in working memory, even if only a subset of the data is needed for an analysis. Therefore, data sets from educational large-scale assessments like PISA, PIAAC, or the German Bildungstrend often cannot be loaded into R on common hardware setups.

In other areas, relational data bases are often used to store similar kinds of hierarchical data tidily (Wickham, 2014) or, in relational data base terms, normalized. This optimizes storage efficiency and allows easier and more efficient querying of the data. However, relational data base management systems (RDBMS) are rarely used in the educational large-scale assessment context, probably in part because they require users to learn SQL. R interfaces like dplyr (Wickham, François, Henry, & Müller, 2018) exist, but the initial creation of the data base and the later joining of data frames remain rather cumbersome.

The R package eatDB is meant to bridge this gap. It provides a simple R interface for creating data bases and for extracting data from data bases created via eatDB. It builds on SQLite3 (SQLite Development Team, 2018), the R driver RSQLite (Müller, Wickham, James, & Falcon, 2018), and the R driver framework DBI (R Special Interest Group on Databases, Wickham, & Müller, 2018).
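
The storage idea described above can be illustrated with a minimal sketch. It uses Python's standard-library sqlite3 module rather than eatDB's own R interface, which this abstract does not show, so it should be read as an illustration of normalized storage in SQLite and not as the package's implementation. All table names, column names, and values (persons, pvs, id_stud, imp, score) are hypothetical.

```python
import sqlite3

# Hypothetical toy data: two pupils, three plausible values (imputations) each.
persons = [(1, "A", "f"), (2, "B", "m")]            # (id_stud, school, sex)
pvs = [(1, 1, 480.0), (1, 2, 495.0), (1, 3, 470.0),
       (2, 1, 510.0), (2, 2, 525.0), (2, 3, 505.0)]  # (id_stud, imp, score)

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # let SQLite enforce the key relation

# Normalized layout: person-level variables are stored once per pupil,
# imputation-level variables once per pupil and imputation.
con.execute(
    "CREATE TABLE persons (id_stud INTEGER PRIMARY KEY, school TEXT, sex TEXT)"
)
con.execute(
    "CREATE TABLE pvs ("
    " id_stud INTEGER REFERENCES persons(id_stud),"
    " imp INTEGER,"
    " score REAL,"
    " PRIMARY KEY (id_stud, imp))"
)
con.executemany("INSERT INTO persons VALUES (?, ?, ?)", persons)
con.executemany("INSERT INTO pvs VALUES (?, ?, ?)", pvs)

# Pull only the subset needed for one analysis: the join rebuilds a flat,
# spreadsheet-like result for the selected columns and rows.
rows = con.execute(
    "SELECT p.id_stud, p.sex, v.imp, v.score "
    "FROM persons AS p JOIN pvs AS v ON v.id_stud = p.id_stud "
    "WHERE p.school = ? ORDER BY v.imp",
    ("A",),
).fetchall()
con.close()

for row in rows:
    print(row)
```

In a flat two-dimensional layout, school and sex would be repeated on each of the six imputation rows. Here they are stored once per pupil, and only the three rows for the selected school are read into memory.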
As a result, extracting data from large hierarchical data sets becomes substantially faster and requires less working memory. The package performs exhaustive checks to guarantee the integrity of the data base. In my presentation, I would like to give a short introduction to the ideas behind eatDB, show how these ideas are implemented in the package, and demonstrate how eatDB can be used in practice. Furthermore, I would like to show some small benchmark examples illustrating the reduction in working memory and the increase in efficiency that this approach yields compared to common alternatives.

References:
Müller, K., Wickham, H., James, D. A., & Falcon, S. (2018). RSQLite: 'SQLite' interface for R [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=RSQLite (R package version 2.1.1)
R Special Interest Group on Databases, Wickham, H., & Müller, K. (2018). DBI: R database interface [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=DBI (R package version 1.0.0)
SQLite Development Team. (2018). SQLite [Computer software manual]. Retrieved from https://www.sqlite.org/index.html (Version 3.26.0)
Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10). Retrieved from http://www.jstatsoft.org/v59/i10/
Wickham, H., François, R., Henry, L., & Müller, K. (2018). dplyr: A grammar of data manipulation [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=dplyr (R package version 0.7.8)