Enhancing Binary Classification Performance in Biomedical Datasets: Regularized ELM with SMOTE and Quantile Transforms Focused on Breast Cancer Analysis

Brilliant Friezka Aina, Meta Kallista, Ig. Prasetya Dwi Wibawa, Ginaldi Ari Nugroho, Ivana Meiska, Syifa Melinda Naf’an

Abstract


This study addresses the problem of imbalanced data in binary classification tasks on microarray datasets. The objective is to improve classification performance by adding regularization to the Extreme Learning Machine (ELM), using the Synthetic Minority Over-sampling Technique (SMOTE) for over-sampling and the Quantile Transformer for scaling. The study began by gathering biological datasets from reputable sources such as UCI and Kaggle, including Pima Indian Diabetes, Heart Disease, and Wisconsin Breast Cancer. During dataset preparation, SMOTE was applied to correct the class imbalance. The data were then split into training (80%) and testing (20%) sets and scaled with the Quantile Transformer, which maps numerical input variables to a Gaussian or uniform probability distribution. ELMs were then trained, with an emphasis on regularization to improve accuracy. Regularized ELM (R-ELM) achieves a higher AUC than ELM, although ELM has the shorter computation time. The choice of the regularization parameter (C) in R-ELM affects both the model's performance and its computation time. Overall, R-ELM with SMOTE produces encouraging results in classifying biological datasets. Further investigation and validation on additional datasets are, however, necessary to establish its generalizability and robustness.
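The regularized ELM step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a sigmoid hidden layer with random, untrained weights and the standard ridge-regularized closed-form output weights, beta = (I/C + H^T H)^{-1} H^T T. The function names, the hidden-layer size, the value of C, and the synthetic two-cluster data are all hypothetical, and the SMOTE and quantile-scaling steps are omitted.

```python
import numpy as np

def hidden_layer(X, W, b):
    # Sigmoid activations of the random hidden layer (an assumed choice).
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def train_relm(X, y, n_hidden=50, C=1.0, seed=0):
    # Hypothetical helper: fit a regularized ELM for labels in {0, 1}.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = hidden_layer(X, W, b)
    T = np.where(y == 1, 1.0, -1.0)              # encode targets as -1 / +1
    # Ridge-regularized least squares: (I/C + H^T H) beta = H^T T
    A = np.eye(n_hidden) / C + H.T @ H
    beta = np.linalg.solve(A, H.T @ T)
    return W, b, beta

def predict_relm(model, X):
    W, b, beta = model
    # A positive score is read as class 1, otherwise class 0.
    return (hidden_layer(X, W, b) @ beta > 0).astype(int)

# Synthetic stand-in data (assumption): two Gaussian clusters in 2-D.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=-2.0, size=(100, 2)),
               rng.normal(loc=2.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# 80/20 split, mirroring the proportions stated in the abstract.
idx = rng.permutation(len(y))
train, test = idx[:160], idx[160:]

model = train_relm(X[train], y[train], n_hidden=50, C=1.0)
accuracy = float(np.mean(predict_relm(model, X[test]) == y[test]))
print(f"test accuracy: {accuracy:.3f}")
```

In this formulation a larger C weakens the penalty, so the solution approaches the unregularized least-squares ELM, while a smaller C shrinks the output weights more strongly.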


Keywords


Regularized Extreme Learning Machine (R-ELM); Biomedical; Synthetic Minority Over-sampling Technique (SMOTE); Quantile Transformer.






DOI: https://doi.org/10.18860/ca.v9i2.28785



Copyright (c) 2024 Meta Kallista

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


CAUCHY: Jurnal Matematika Murni dan Aplikasi is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.