Random Forest Classification of Infant Mortality Rate in Indonesia: A Gini-Based Analysis

Ria Dhea Layla Nur Karisma, Usman Pagalay, Muhammad Khudzaifah

Abstract


The Infant Mortality Rate (IMR) is one of the indicators used to measure the success of development programs in Indonesia. It is a sensitive indicator that reflects maternal and child health problems in a country. Random forest is an ensemble machine learning method that combines multiple decision trees through bootstrap aggregation to improve the prediction accuracy and robustness of the model. It can be applied to both classification and regression, and it handles high-dimensional, complex data and non-linear relationships. In this study, random forest is used to classify IMR cases in Indonesia in a way that is easy to interpret and relevant to policy. The aim is to identify the factors that predict infant mortality, using the Gini Index to rank variable importance so that policymakers can focus interventions on the areas most in need of improvement. The model is evaluated using out-of-bag estimation and k-fold cross-validation. It achieves an overall accuracy of 99.97%, with a sensitivity of 99.87% and a specificity of 100%, indicating excellent performance. The most important variables are breastfeeding, type of birth (single or twin), and birth weight. Breastfeeding is the parent node of the tree: infants who are breastfed have a greater chance of survival than infants who are not.
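As a rough illustration of the quantities named in the abstract, the sketch below computes the Gini impurity of a node, the impurity decrease produced by one binary split, and the accuracy, sensitivity, and specificity of a set of predictions. It is not the authors' implementation. The function names, the toy labels, the split on a "breastfed" flag, and the choice of "dead" as the positive class are all assumptions made for illustration, and none of the numbers come from the study's data.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


def classification_metrics(y_true, y_pred, positive):
    """Return (accuracy, sensitivity, specificity) for a binary problem."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


# Hypothetical toy node: 6 surviving infants and 4 deaths (not study data).
parent = ["alive"] * 6 + ["dead"] * 4
# Hypothetical split of that node on a binary "breastfed" flag.
breastfed = ["alive"] * 5 + ["dead"] * 1
not_breastfed = ["alive"] * 1 + ["dead"] * 3

print("Gini impurity of parent node:", gini_impurity(parent))
print("Gini decrease from the split:",
      gini_decrease(parent, breastfed, not_breastfed))

# Hypothetical predictions for the same 10 cases; "dead" is assumed positive.
y_true = ["dead"] * 4 + ["alive"] * 6
y_pred = ["dead", "dead", "dead", "alive"] + ["alive"] * 5 + ["dead"]
print("accuracy, sensitivity, specificity:",
      classification_metrics(y_true, y_pred, positive="dead"))
```

In a random forest, a variable's Gini importance is typically obtained by summing such impurity decreases over every split that uses the variable and averaging across trees, which is why a variable chosen at the parent node tends to rank highly.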

Keywords


Accuracy; Gini Index; Infant Mortality Rate; Random Forest; Sensitivity; Specificity

DOI: https://doi.org/10.18860/cauchy.v10i2.29508


Copyright (c) 2025 Ria Dhea Layla Nur Karisma

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

CAUCHY: Jurnal Matematika Murni dan Aplikasi is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.