Obesity Prediction Using Synthetic Minority Oversampling Technique for Nominal and Continuous Features and XGBoost Approaches

Tiara Azahra Wika Putri, Umu Sa’adah, Ummu Habibah

Abstract


This study investigates the effect of applying SMOTE-NC with the XGBoost algorithm to obesity prediction. Its main objectives are to determine how SMOTE-NC affects prediction performance and to identify the features that most influence the prediction. Combining SMOTE-NC with XGBoost is expected to improve obesity prediction performance. The data were obtained from the obesity dataset in the UCI Machine Learning Repository. The results show that applying SMOTE-NC can improve the accuracy of XGBoost-based obesity prediction, with the best accuracy in this study reaching 98.30%. Further analysis identifies Weight, Height, and Age as influential features in the prediction process. These results are expected to contribute to further research. Overall, this research underlines the importance of maintaining health and avoiding obesity by keeping body weight within normal limits.
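The abstract does not describe how SMOTE-NC creates synthetic records. As a rough illustration only, the sketch below follows the recipe described by Chawla et al. (2002) for generating one synthetic minority-class row: distances use Euclidean distance on the continuous features plus the squared median of the continuous features' standard deviations for every nominal mismatch; continuous values are interpolated between the chosen row and one of its k nearest neighbours; nominal values take the majority value among those neighbours. The helper name `smote_nc_sample`, the toy columns and rows, the use of population standard deviation, and the single shared interpolation gap are assumptions for illustration, not details of this study.

```python
import random
import statistics
from collections import Counter
from math import sqrt


def smote_nc_sample(minority, cont_idx, nom_idx, k=5, rng=None):
    """Generate one synthetic minority-class row in the style of SMOTE-NC.

    Hypothetical helper for illustration; needs at least two minority rows.
    minority: list of rows (lists) from the minority class
    cont_idx: column indices of continuous features
    nom_idx:  column indices of nominal features
    """
    rng = rng or random.Random(0)

    # Median of the (population) standard deviations of the continuous
    # features; it acts as the penalty for each nominal mismatch.
    med = statistics.median(
        [statistics.pstdev([row[j] for row in minority]) for j in cont_idx]
    )

    def dist(a, b):
        # Squared Euclidean distance over the continuous features ...
        d2 = sum((a[j] - b[j]) ** 2 for j in cont_idx)
        # ... plus med^2 for every nominal feature on which the rows differ
        d2 += med ** 2 * sum(1 for j in nom_idx if a[j] != b[j])
        return sqrt(d2)

    # Pick a base row and find its k nearest minority-class neighbours
    i = rng.randrange(len(minority))
    base = minority[i]
    others = minority[:i] + minority[i + 1:]
    neighbours = sorted(others, key=lambda r: dist(base, r))[:k]
    partner = rng.choice(neighbours)

    synthetic = list(base)  # copy, so the input rows are not modified
    gap = rng.random()  # one random gap in [0, 1) shared by all continuous features
    for j in cont_idx:
        # Continuous features: interpolate between the base row and one neighbour
        synthetic[j] = base[j] + gap * (partner[j] - base[j])
    for j in nom_idx:
        # Nominal features: most frequent value among the k nearest neighbours
        synthetic[j] = Counter(r[j] for r in neighbours).most_common(1)[0][0]
    return synthetic


# Toy minority-class rows: [age, weight_kg, gender, family_history]
# (made-up values, not taken from the UCI obesity dataset)
rows = [
    [23.0, 85.0, "Female", "yes"],
    [25.0, 90.0, "Female", "yes"],
    [30.0, 95.0, "Male", "yes"],
    [22.0, 88.0, "Female", "no"],
    [28.0, 92.0, "Male", "yes"],
]
print(smote_nc_sample(rows, cont_idx=[0, 1], nom_idx=[2, 3], k=3, rng=random.Random(42)))
```

For real experiments, the `imbalanced-learn` package provides a `SMOTENC` class (its `categorical_features` parameter marks the nominal columns) whose `fit_resample` output could be passed to `xgboost.XGBClassifier`; the exact settings used in this study are not stated in the abstract. A common precaution is to oversample only the training split, so that synthetic rows do not leak into the evaluation data.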

Keywords


Obesity; Ensemble Learning; SMOTE-NC; XGBoost.


DOI: https://doi.org/10.18860/cauchy.v10i1.30818


Copyright (c) 2025 Tiara Azahra Wika Putri

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Editorial Office
Mathematics Department,
Universitas Islam Negeri Maulana Malik Ibrahim Malang
Gajayana Street 50 Malang, East Java, Indonesia 65144
Fax (+62) 341 558933
e-mail: cauchy@uin-malang.ac.id

CAUCHY: Jurnal Matematika Murni dan Aplikasi is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.