Multiple imputation of missing data in multilevel research
Dr. Simon Grund, Research Associate, IPN
Multilevel models have become some of the most frequently used statistical models for analyzing multilevel data. Such data occur in many fields of psychology when observations (Level 1) are clustered within higher-level units (Level 2). Examples include students nested in schools, employees nested in work teams, patients nested in clinics, and longitudinal data, in which observations are nested within persons. Unfortunately, multilevel data often contain missing values, for example, when participants omit certain items in a questionnaire or drop out before the end of a study. If treated improperly, missing data can severely distort parameter estimates and may compromise statistical decision making. For this reason, it is often recommended to rely on principled methods for dealing with missing data, such as multiple imputation (MI) or maximum likelihood estimation (ML). These procedures have the advantage that they take all the available data into account, thus improving statistical power and the validity of the conclusions that can be drawn from the data.
In the present dissertation, I consider different procedures for the treatment of missing data with an emphasis on multilevel MI. In multilevel research, it is important that the imputation model takes the structure of the data and the features of the substantive analysis model into account. However, many open questions remain about how this can be achieved in practice. To address these questions, I consider a variety of applications of multilevel models as well as different implementations of multilevel MI. In multiple studies, I examined how the multilevel structure is represented in different implementations of multilevel MI, how different representations may affect the results obtained from MI, and how missing data can be treated in multilevel models with random intercepts, random slopes, interaction effects, continuous and categorical data, and missing data at Level 2.
In addition, the present dissertation is concerned with the analysis of multiply imputed data sets. In this context, I examined different procedures for pooling the results obtained from multiply imputed data sets with an emphasis on multiparameter tests (e.g., model comparisons). This includes applications in traditional research designs with the analysis of variance (ANOVA) as well as applications in multilevel models with hypothesis tests about fixed effects and variance components. Finally, the dissertation presents the R package mitml, which is intended to provide researchers with a set of practical tools for conducting multilevel MI in research practice. This includes tools for the specification of the imputation model, convergence diagnostics, managing and analyzing multiply imputed data sets, and pooling methods for single- and multiparameter tests, along with a tutorial article that illustrates these features and provides a nontechnical introduction to multilevel MI.
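To make the idea of pooling concrete, the following is a minimal sketch of the standard combining rules for a single parameter (Rubin's rules), written in Python for illustration only. It is not taken from the dissertation or from mitml; the function name `pool_rubin` and the example numbers are hypothetical. It covers only the single-parameter case, not the multiparameter tests emphasized above.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets (Rubin's rules).

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and matching variances")

    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b  # total variance

    # Classical large-sample degrees of freedom; infinite if the
    # estimates do not vary across imputations.
    if b == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2

    return q_bar, math.sqrt(t), df


# Hypothetical estimates and squared standard errors from m = 3 imputations
est, se, df = pool_rubin([0.50, 0.60, 0.40], [0.04, 0.05, 0.03])
print(f"estimate = {est:.3f}, SE = {se:.3f}, df = {df:.1f}")
```

The total variance adds the average sampling variance to the variance of the estimates across imputations, inflated by the factor 1 + 1/m, so that the pooled standard error reflects the uncertainty due to the missing data.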
Reference: Grund, S. (2017). Multiple imputation of missing data in multilevel research [Christian-Albrechts-Universität zu Kiel]. https://macau.uni-kiel.de/receive/diss_mods_00022800
Supervisor: Prof. Dr. Oliver Lüdtke