Methods for Improving Efficiency of Planned Missing Data Designs
University of Michigan
Ann Arbor, MI
Keywords: data designs, planned missing data, split questionnaire design, two-phase sampling
A planned missing data design is any survey deliberately constructed so that at least some variables are unobserved for a subset of participants; the missing data are an intentional feature of the study. Planned missing data designs can potentially reduce costs, improve data quality, and reduce unplanned missing data, and advances in missing data methodology and multiple imputation software make them more attractive than before. Two commonly used planned missing data designs are two-phase sampling and the split questionnaire design. Two-phase sampling is used to improve the efficiency of an estimate for a single outcome that is costly to measure, while the split questionnaire design is primarily used to reduce survey length.
First, we propose new methods for selecting the Phase II sample in two-phase surveys to reduce the variance of the resulting estimate. When the outcome variable is continuous, the data collected in Phase I can be used to select the Phase II sample so as to increase the precision of the estimates. For other cases, we propose an adaptive sampling method for selecting Phase II samples that improves estimation of the quantity of interest.
Next, we examine the performance of several design allocations for implementing a split questionnaire survey in a longitudinal study. While many papers have examined the administration of split questionnaire designs in cross-sectional studies, research on applying these methods to longitudinal studies has been limited. We focus on the commonly used 3-form design and propose several methods for administering the 3-form split questionnaire in a longitudinal study. Using simulations and data from the Health and Retirement Study, we compare the performance of each proposed design under several correlation structures.
Finally, we propose a method for improving variable allocation in split questionnaire designs. We establish a criterion for determining which variable allocations minimize the loss of information due to missing data: the Kullback-Leibler divergence between the posterior distribution of the parameters with missing data and the posterior distribution without missing data. Because enumerating all possible designs becomes difficult as the number of variables grows, we then propose a search algorithm to find optimal variable allocations.
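To make the Kullback-Leibler criterion concrete, the following is a minimal sketch, not the implementation used in this work. It assumes both posteriors are approximated by multivariate normal distributions, for which the divergence has a closed form, and it takes the complete-data posterior as the reference distribution; the direction of the divergence is an assumption, since it is not specified here. The function name `kl_mvn`, the design labels, and all covariance values are hypothetical and chosen only for illustration.

```python
import numpy as np


def kl_mvn(mean_p, cov_p, mean_q, cov_q):
    """Closed-form KL(P || Q) for multivariate normal P and Q."""
    mean_p = np.asarray(mean_p, dtype=float)
    mean_q = np.asarray(mean_q, dtype=float)
    cov_p = np.asarray(cov_p, dtype=float)
    cov_q = np.asarray(cov_q, dtype=float)

    k = mean_p.shape[0]
    diff = mean_q - mean_p
    # Solve linear systems instead of forming the inverse of cov_q explicitly.
    trace_term = np.trace(np.linalg.solve(cov_q, cov_p))
    quad_term = diff @ np.linalg.solve(cov_q, diff)
    # slogdet is more stable than log(det(...)) for larger matrices.
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (trace_term + quad_term - k + logdet_q - logdet_p)


# Hypothetical complete-data posterior (assumed normal approximation).
mean_full = np.zeros(2)
cov_full = np.eye(2)

# Hypothetical posterior covariances under two candidate variable
# allocations; missing data inflates the posterior variance.
candidate_covs = {
    "design_a": 2.0 * np.eye(2),
    "design_b": np.diag([1.2, 3.0]),
}

# Score each candidate by its divergence from the complete-data posterior
# and keep the allocation that loses the least information.
scores = {
    name: kl_mvn(mean_full, cov_full, mean_full, cov)
    for name, cov in candidate_covs.items()
}
best = min(scores, key=scores.get)
```

With only two candidates the comparison is exhaustive. With many variables the set of allocations is too large to enumerate, and a search procedure would score candidates with the same criterion.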