Comparison of Imputation Strategies for Incomplete Longitudinal Data in Lifecourse Epidemiology.

Year of Publication	2023
Author	Crystal Shaw, Yingyan Wu, Scott Zimmerman, Eleanor Hayes-Larson, Thomas Belin, Melinda Power, Maria Glymour, Elizabeth Mayeda
Journal	Am J Epidemiol
ISSN Number	1476-6256
Abstract	Incomplete longitudinal data are common in lifecourse epidemiology and may induce bias leading to incorrect inference. Multiple imputation (MI) is increasingly preferred for handling missing data, but few studies explore MI method performance and feasibility in real data settings. We compared three MI methods using real data under nine missing data scenarios, representing combinations of 10%, 20%, and 30% missingness and missing completely at random, at random, and not at random. Using data from Health and Retirement Study (HRS) participants, we introduced record-level missingness to a sample of participants with complete data on depressive symptoms (1998-2008), mortality (2008-2018), and relevant covariates. We then imputed missing data using three MI methods (normal linear regression, predictive mean matching, variable-tailored specification), and fit Cox proportional hazards models to estimate effects of four operationalizations of longitudinal depressive symptoms on mortality. We compared bias in hazard ratios, root mean square error (RMSE), and computation time for each method. Bias was similar across MI methods and results were consistent across operationalizations of the longitudinal exposure variable. However, our results suggest predictive mean matching may be an appealing strategy for imputing lifecourse exposure data given consistently low RMSE, competitive computation times, and few implementation challenges.
DOI	10.1093/aje/kwad139
PMID	37338987