Novel Applications and Extensions for Bayesian Additive Regression Trees (BART) in Prediction, Imputation, and Causal Inference

Year of Publication	2018
Author	Yaoyuan Tan
Academic Department	Biostatistics
Degree	PhD
Number of Pages	201
ISBN Number	9780438885981
Abstract	Bayesian additive regression trees (BART) is a method proposed by Chipman et al. (2010) that can handle non-linear main effects and multi-way interaction effects for independent continuous or binary outcomes. It has enjoyed much success in areas such as causal inference, economics, environmental sciences, and genomics. However, extensions of BART and applications of these extensions are limited. This thesis discusses three novel applications and extensions for BART. We first discuss how BART can be extended to clustered outcomes by adding a random intercept. This work was motivated by the need to accurately predict driver behavior using observable speed and location information, with application to communicating key human-driver intentions to nearby vehicles in traffic. Although our extension can be considered a special case of the spatial BART (Zhang et al., 2007), our approach differs by providing a relatively simple algorithm that allows application to clustered binary outcomes. We next focus on the use of BART in missing data settings. Doubly robust (DR) methods allow consistent estimation of population means when either the non-response propensity model or the model for the mean of the outcome is correctly specified. Kang and Schafer (2007) showed that DR methods produce biased and inefficient estimates when both the propensity and mean models are misspecified. We consider the use of BART for modeling means and/or propensities to provide a "robust-squared" estimator that reduces bias and improves efficiency. We demonstrate this result, using simulations, for two commonly used DR methods: Augmented Inverse Probability Weighting (AIPWT, Robins et al., 1994) and penalized splines of propensity prediction (PSPP, Zhang and Little, 2009). We successfully applied our proposed model to two national crash datasets to impute missing change in deceleration values (delta-v) and missing Blood Alcohol Concentration (BAC) levels, respectively.
Our final effort considers how a negative wealth shock (a sudden large decline in wealth) affects the cognitive outcome of late-middle-aged US adults using the Health and Retirement Study, a longitudinal study of US adults enrolled at age 50 and older and surveyed biennially since 1992. Our analysis faced three issues: lack of randomization, confounding by indication, and censoring of the cognitive outcome by a substantial number of deaths among our subjects. Marginal structural models (MSM), a commonly used method for dealing with censoring by death, are arguably inappropriate because they upweight subjects who are more likely to die, creating a pseudo-population that resembles one where death is absent. We propose to compare the negative wealth shock effect only among subjects who would survive under both sets of treatment regimens, a special case of principal stratification (Frangakis and Rubin, 2002). Because the counterfactual survival status is unobserved, we imputed it and restricted the analysis to subjects who were observed and predicted to survive under both treatment regimens. We used a modified version of penalized spline of propensity methods in treatment comparisons (PENCOMP, Zhou et al., 2018) to obtain a robust imputation of the counterfactual cognitive outcomes. Finally, we consider several possible extensions of these efforts for future work.
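The doubly robust property mentioned in the abstract can be illustrated with a minimal sketch of an augmented inverse probability weighting estimate of a population mean. This is not the thesis's implementation and does not use BART: the function name `aipw_mean`, the simulated data, and the deliberately wrong constant models are all assumptions made for illustration. The mean and propensity predictions are simply passed in as arrays, so any model (BART included) could in principle supply them.

```python
import numpy as np


def aipw_mean(y, observed, mean_pred, propensity_pred):
    """AIPW estimate of E[Y] when some outcomes are missing.

    y               : outcomes (may be NaN where missing)
    observed        : 1/True if the outcome was observed, else 0/False
    mean_pred       : outcome-model predictions m(x) for every unit
    propensity_pred : predicted response probabilities pi(x) for every unit
    """
    y = np.asarray(y, dtype=float)
    r = np.asarray(observed).astype(bool)
    m = np.asarray(mean_pred, dtype=float)
    p = np.asarray(propensity_pred, dtype=float)
    # Residuals only contribute for units whose outcome was observed.
    resid = np.where(r, y - m, 0.0)
    return float(np.mean(m + resid / p))


# Hypothetical simulated data: the true population mean of Y is 1.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
y_full = 1.0 + 2.0 * x + rng.normal(size=n)
true_propensity = 1.0 / (1.0 + np.exp(-(0.5 + x)))
observed = rng.random(n) < true_propensity
y = np.where(observed, y_full, np.nan)

# Complete-case mean ignores the missing-data mechanism and is biased here.
naive = float(np.nanmean(y))

# Correct propensity, deliberately wrong (constant zero) mean model.
est_good_propensity = aipw_mean(y, observed, np.zeros(n), true_propensity)

# Correct mean model, deliberately wrong (constant 0.5) propensity.
est_good_mean = aipw_mean(y, observed, 1.0 + 2.0 * x, np.full(n, 0.5))
```

In this sketch either estimate should land close to the true mean of 1 even though one of its two working models is wrong, while the complete-case mean does not. The case the abstract targets, where both models are misspecified, is not covered by this guarantee, which is the motivation given for flexible models such as BART.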
Thesis Type	PhD
URL	https://deepblue.lib.umich.edu/handle/2027.42/147594
University	University of Michigan
City	Ann Arbor, MI