|Title||Statistical Significance of Hyperparameter Tuning for Varying Levels of Class Imbalance|
|Publication Type||Conference Paper|
|Year of Publication||2021|
|Authors||Hancock, J, Khoshgoftaar, TM, Landset, S|
|Conference Name||26th ISSAT International Conference on Reliability and Quality in Design, RQD 2021|
|Keywords||Class imbalance, Cognition, Logistic Regression, Machine learning, Random forest|
Researchers experimenting with classification tasks in Machine Learning can choose between optimized and default values for their algorithms' hyperparameters. Our contribution is to conduct experiments with balanced and imbalanced datasets showing that hyperparameter tuning has a significant, positive impact on classification results regardless of class ratio. To the best of our knowledge, this is the first study to investigate whether hyperparameter tuning has a statistically significant impact on the classification of balanced and imbalanced datasets derived from the Health and Retirement Study. We conduct a series of experiments with three classifiers and five datasets. The classifiers are well known and widely used in Machine Learning research. The datasets are derived from a survey of cognition in human subjects. Three of the datasets are balanced and two are imbalanced. We perform Analysis of Variance (ANOVA) and Tukey's Honestly Significant Difference (HSD) tests to determine the effect of hyperparameter tuning. Our results show that, regardless of class imbalance, using optimized hyperparameter values yields statistically significantly better results.
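The ANOVA step can be illustrated with a minimal sketch of a one-way ANOVA F statistic with a single factor (default vs. tuned hyperparameters). The abstract does not state the performance metric, the full set of factors, or the scores, so the function name `one_way_anova_f` and the score lists below are hypothetical and for illustration only. They are not results from the paper. The sketch computes only the F statistic. A p-value and the follow-up Tukey HSD comparisons would normally come from a statistics library, for example SciPy's `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd`.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups.

    Assumes at least two groups and non-zero within-group variance.
    """
    k = len(groups)                                  # number of factor levels
    n = sum(len(g) for g in groups)                  # total observations
    grand = mean(x for g in groups for x in g)       # grand mean over all scores
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: spread of each score around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical, illustrative scores only; not data or results from the study.
default_scores = [0.70, 0.72, 0.71, 0.69, 0.73]
tuned_scores = [0.76, 0.78, 0.77, 0.75, 0.79]

f_stat, df_b, df_w = one_way_anova_f([default_scores, tuned_scores])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # prints "F(1, 8) = 36.00"
```

A large F means the gap between group means is large relative to the variation within each group. With only two levels, ANOVA already answers "do the groups differ?" Tukey's HSD becomes useful when a factor has more than two levels and one needs to know which specific pairs differ.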