KFCM-PSOTD: An Imputation Technique for Missing Values in Incomplete Data Classification
Abstract
Data mining is an important process for extracting interpretable knowledge from data, and data preprocessing is one of its crucial steps. Missing values are one of the primary issues in data preprocessing. They are commonly handled with mean or median imputation because these techniques are easy to implement. However, their use is not recommended because they ignore the variance of the data. This research develops Kernel Fuzzy C-Means Optimized by the Particle Swarm Optimizer with Two Differential Mutations (KFCM-PSOTD). KFCM imputation is applied to obtain better estimates because of its proven ability to recognize patterns in data. In addition, the PSOTD algorithm is used as an optimization tool to boost the performance of KFCM. PSOTD is adopted because it balances exploration and exploitation better than classical PSO. Datasets imputed with KFCM-PSOTD are classified using the Decision Tree algorithm. The results are evaluated using accuracy, precision, recall, and F1 score to assess the quality of the imputed values. The outcomes demonstrate that KFCM-PSOTD performs better than the other imputation techniques considered, with evaluation scores up to 10% higher.
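The remark that mean imputation ignores the data variance can be illustrated with a small sketch. It is not part of the KFCM-PSOTD method. The column values and the helper name `mean_impute` are hypothetical, chosen only to show the effect on a single feature.

```python
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical feature column; None marks a missing value.
column = [2.0, 4.0, None, 8.0, None, 10.0]

observed = [v for v in column if v is not None]
imputed = mean_impute(column)

# Population variance before and after imputation.
var_observed = statistics.pvariance(observed)
var_imputed = statistics.pvariance(imputed)

print(f"variance of observed values: {var_observed:.2f}")
print(f"variance after mean imputation: {var_imputed:.2f}")
```

Every imputed entry sits exactly at the mean, so it adds nothing to the sum of squared deviations while increasing the sample count. In this toy column the variance falls from 10 to about 6.67, which is the kind of distortion that motivates pattern-based imputation such as clustering.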
DOI: https://doi.org/10.18860/ca.v9i1.25138
Copyright (c) 2024 Muhaimin Ilyas, Syaiful Anam, Trisilowati Trisilowati
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Editorial Office
Mathematics Department,
Universitas Islam Negeri Maulana Malik Ibrahim Malang
Gajayana Street 50 Malang, East Java, Indonesia 65144
Facsimile (+62) 341 558933
e-mail: cauchy@uin-malang.ac.id
CAUCHY: Jurnal Matematika Murni dan Aplikasi is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.