Comparison of Linear Regression, Decision Tree Regression, and Random Forest Regression Algorithms in Predicting Baldness Risk

Sebastianus Adi Santoso Mola, Alfonsus Maria De Liguori Goru, Christian Jaquelino Lamapaha, Yoseph Kurubingan Bekayo

Abstract


Baldness is a common condition affecting both men and women, caused primarily by age, hormones, and genetics. Predicting baldness risk supports early diagnosis and the prevention of further hair loss. This study compares the performance of Linear Regression (LR), Decision Tree Regression (DTR), and Random Forest Regression (RFR) in predicting baldness risk from variables such as age, gender, occupation, stress level, and other lifestyle factors. A dataset of 5925 samples was processed through normalization, hyperparameter tuning, cross-validation, and residual analysis. Random Forest Regression outperformed the other models, with the lowest MSE (0.0979) and the highest R² (0.9056) on both training and testing data, followed by Decision Tree Regression and Linear Regression. Hyperparameter optimization using Grid Search significantly improved model performance. In conclusion, Random Forest Regression is the most suitable model for predicting baldness risk on complex datasets, while Linear Regression remains a viable alternative for simpler datasets.
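
For readers unfamiliar with the two reported metrics and with residual analysis, the following minimal sketch shows how MSE, R², and residuals are conventionally computed from a model's predictions. It is not the authors' code: the function names and the four sample values are hypothetical and serve only to illustrate the definitions.

```python
from statistics import fmean


def residuals(y_true, y_pred):
    """Per-sample errors (actual minus predicted), the input to residual analysis."""
    return [t - p for t, p in zip(y_true, y_pred)]


def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals; lower is better."""
    return fmean(r * r for r in residuals(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot; closer to 1 is better."""
    mean_true = fmean(y_true)
    ss_res = sum(r * r for r in residuals(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical risk scores, invented for illustration (not taken from the dataset).
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.25, 0.35, 0.65, 0.75]

mse = mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(f"MSE = {mse:.4f}, R^2 = {r2:.4f}")
```

Under these definitions, a model that ranks best on both metrics has the smallest average squared error and explains the largest share of the variance in the target.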

Keywords


Baldness, Linear Regression, Decision Tree Regression, Random Forest Regression






DOI: https://doi.org/10.18860/mat.v17i2.30256





Copyright (c) 2025 Sebastianus Adi Santoso Mola, Alfonsus Maria De Liguori Goru, Christian Jaquelino Lamapaha, Yoseph Kurubingan Bekayo

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

© All rights reserved 2015. MATICS, ISSN: 1978-161X | e-ISSN: 2477-2550