Comparison Of Methods For Handling Imbalanced Datasets In Improving Classification Algorithm Performance
Abstract
Big data refers to data with a large number of observations and dimensions. Processing such data raises several problems, one of which is class imbalance, a common challenge in classification modeling. When the classes are imbalanced, predictions tend to be accurate for the majority class and inaccurate for the minority class. Three families of solutions have been researched extensively: data-level, algorithm-level, and ensemble approaches. The data-level methods considered here are SMOTE, undersampling, and oversampling; the algorithm-level method is NWKNN; and the ensemble methods are UnderBagging, RUSBoosting, SMOTEBoost, and SMOTEBagging. The goal of this study is to determine the best method for each of three imbalance cases: mild, moderate, and extreme. A simulation study was conducted for each case to evaluate the performance of each method. Based on the AUC value, SMOTEBagging is the best method for mild imbalance (AUC 0.9581) and for moderate imbalance (AUC 0.9033), while UnderBagging gives the best performance for extreme imbalance.
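The building blocks named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the neighbour count `k`, and the toy data are assumptions made here for illustration, and the study's actual settings are not given in this abstract. The sketch shows random undersampling of the majority class, SMOTE-style interpolation between minority points, and AUC computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
import random


def random_undersample(X, y, minority=1, seed=0):
    """Drop majority rows at random until both classes have the same size."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    kept = rng.sample(majority_idx, len(minority_idx))
    idx = sorted(minority_idx + kept)
    return [X[i] for i in idx], [y[i] for i in idx]


def smote_like(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority points, each on the line segment
    between a minority point and one of its k nearest minority neighbours.
    Assumes X_min holds at least two points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(X_min)
        # Sort the other minority points by squared Euclidean distance.
        others = sorted(
            (p for p in X_min if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def auc(y_true, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly;
    tied scores count one half."""
    pos = [s for s, t in zip(scores, y_true) if t == positive]
    neg = [s for s, t in zip(scores, y_true) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): six majority rows (label 0), two minority rows (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [0.2, 0.4], [0.4, 0.2], [2.0, 2.0], [2.2, 1.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_undersample(X, y)
X_new = smote_like([row for row, label in zip(X, y) if label == 1], n_new=4)
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the bagging-based ensembles, a resampling step of this kind is applied to each bootstrap sample before a base classifier is fitted: undersampling for UnderBagging and SMOTE for SMOTEBagging. Because AUC depends only on how positives are ranked relative to negatives, it is less dominated by the majority class than plain accuracy is.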
DOI: https://doi.org/10.18860/cauchy.v11i1.35780
Copyright (c) 2026 Winalia Agwil, Dyah Setyo Rini, Dian Agustina, Ahmad Famuji

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.