Health State Risk Categorization: A Machine Learning Clustering Approach Using Health and Retirement Study Data
Year of Publication
Tan, F, Mehta, D
The Journal of Financial Data Science
health state risk, Machine learning
For countries such as the United States, which lacks a universal health care system, future health care costs can create significant uncertainty that a retirement investment strategy must be built to manage. One of the most important factors determining health care costs is the individual's health status. Hence, categorizing individuals into meaningful health risk types is an essential task. The conventional approach is to use individuals' self-rated health state categorization. In this work, the authors provide an objective, data-driven machine learning (ML)-based approach to categorizing health state risk by using one of the most widely used US household surveys of older Americans, the Health and Retirement Study (HRS). The authors propose employing the K-modes clustering method to algorithmically cluster individuals on an exhaustive list of categorical health-related variables in the HRS. The resulting clusters are shown to provide an objective, interpretable, and practical health state risk categorization. The authors then compare and contrast the ML-based and self-rated health state categorizations and discuss the implications of the differences. They also illustrate the difficulty of predicting out-of-pocket costs based on self-rated health status and show how ML-based categorizations can generate more accurate health care cost estimates for personalized retirement planning. The results in this article open different avenues of research, including behavioral science analysis for health and retirement study.
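To make the clustering idea concrete, the following is a minimal sketch of K-modes on categorical data, not the authors' implementation. It assumes the common formulation: records are compared by simple matching dissimilarity (the count of attributes that differ), and each cluster is summarized by the per-attribute mode of its members. The three yes/no fields and the six records are invented for illustration and are not HRS variables or data; the function names `hamming`, `column_modes`, and `k_modes` are likewise hypothetical.

```python
import random
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each attribute across the given rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, k, max_iter=20, seed=0):
    """Cluster categorical records; returns (labels, modes)."""
    rng = random.Random(seed)
    # Initialise the k modes from distinct records (a simple assumed scheme).
    distinct = list(dict.fromkeys(records))
    modes = rng.sample(distinct, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each record to its nearest mode.
        new_labels = [
            min(range(k), key=lambda j: hamming(r, modes[j])) for r in records
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each mode from its current members.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy records: (has_diabetes, smokes, difficulty_walking).
records = [
    ("no", "no", "no"),
    ("no", "no", "no"),
    ("no", "yes", "no"),
    ("yes", "yes", "yes"),
    ("yes", "no", "yes"),
    ("yes", "yes", "yes"),
]

labels, modes = k_modes(records, k=2)
for record, label in zip(records, labels):
    print(label, record)
print("cluster modes:", modes)
```

Because each cluster is described by a mode, that is, a typical answer pattern, the resulting groups can be read directly as health profiles, which is one reason a mode-based method suits survey data with many categorical variables.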