Machine learning approaches to the social determinants of health in the Health and Retirement study.

Title: Machine learning approaches to the social determinants of health in the Health and Retirement study.
Publication Type: Journal Article
Year of Publication: 2018
Authors: Seligman, B, Tuljapurkar, S, Rehkopf, DH
Journal: SSM Popul Health
ISSN Number: 2352-8273
Keywords: Biomarkers, Computer science, Machine learning, Neural network, Social Factors, Social Support

Background: Social and economic factors are important predictors of health and of recognized importance for health systems. However, machine learning, used elsewhere in the biomedical literature, has not been extensively applied to study relationships between society and health. We investigate how machine learning may add to our understanding of social determinants of health using data from the Health and Retirement Study.

Methods: A linear regression of age and gender, and a parsimonious theory-based regression additionally incorporating income, wealth, and education, were used to predict systolic blood pressure, body mass index, waist circumference, and telomere length. Prediction, fit, and interpretability were compared across four machine learning methods: linear regression, penalized regressions, random forests, and neural networks.
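As a rough illustration of the comparison described in the Methods, the sketch below fits a baseline regression (age and gender) and an extended regression (adding one socioeconomic predictor), with and without an L2 penalty, and scores each on held-out data. It is not the authors' code. The data are synthetic, and the coefficients, sample sizes, penalty value, and function names are all assumptions chosen for illustration. Random forests and neural networks are omitted to keep the example dependency-free.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(X, y, penalty=0.0):
    """Least squares with an optional L2 penalty (penalty=0 is plain OLS)."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    for i in range(1, p):  # the intercept is left unpenalized
        xtx[i][i] += penalty
    return solve(xtx, xty)


def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], x)) for x in X]


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def make_data(n, seed):
    """Synthetic stand-in data (assumed effect sizes, not HRS values)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        age = rng.gauss(0, 1)
        gender = float(rng.random() < 0.5)
        education = rng.gauss(0, 1)
        X.append([age, gender, education])
        y.append(0.5 * age + 0.3 * gender + 0.8 * education + rng.gauss(0, 1))
    return X, y


def select(X, columns):
    return [[x[c] for c in columns] for x in X]


def held_out_r2(train, test, columns, penalty=0.0):
    """Fit on the training split, report R^2 on the test split."""
    (X_tr, y_tr), (X_te, y_te) = train, test
    coef = fit_ridge(select(X_tr, columns), y_tr, penalty)
    return r_squared(y_te, predict(coef, select(X_te, columns)))


train = make_data(400, seed=0)
test = make_data(200, seed=1)
results = {
    "baseline (age, gender)": held_out_r2(train, test, [0, 1]),
    "extended (+ education)": held_out_r2(train, test, [0, 1, 2]),
    "extended, ridge penalty": held_out_r2(train, test, [0, 1, 2], penalty=10.0),
}
for name, score in results.items():
    print(f"{name}: held-out R^2 = {score:.3f}")
```

Scoring every model on the same held-out split is what makes the comparison across methods meaningful: in-sample fit can rise with model flexibility even when out-of-sample prediction does not.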

Results: All models had poor out-of-sample prediction. Most machine learning models performed similarly to the simpler models. However, neural networks greatly outperformed the three other methods. Neural networks also had good fit to the data (R² between 0.4 and 0.6, versus <0.3 for all others). Across machine learning models, nine variables were frequently selected or highly weighted as predictors: dental visits, current smoking, self-rated health, serial-seven subtractions, probability of receiving an inheritance, probability of leaving an inheritance of at least $10,000, number of children ever born, African-American race, and gender.

Discussion: Some of the machine learning methods do not improve prediction or fit beyond simpler models; however, neural networks performed well. The predictors identified across models suggest underlying social factors that are important predictors of biological indicators of chronic disease, and that the non-linear and interactive relationships between variables fundamental to the neural network approach may be important to consider.

Alternate Journal: SSM Popul Health
Citation Key: 9517
PubMed ID: 29349278
PubMed Central ID: PMC5769116
Grant List: K01 AG047280 / AG / NIA NIH HHS / United States; R24 AG039345 / AG / NIA NIH HHS / United States