Improving Random Forest Performance for Botnet Attack Detection in IoT Big Data Using Remove Frequent Values Filter

Imam Marzuki, Mas Ahmad Baihaqi, Hartawan Abdillah, Dwi Iryaning Handayani, Nurhidayati Nurhidayati

Abstract


This research aims to enhance the performance of the Random Forest algorithm in classifying big data within the Internet of Things (IoT) domain, specifically for detecting botnet attacks. The study uses the N-BaIoT dataset, comprising 150,000 instances of IoT network traffic labeled as either normal or anomalous (botnet). To optimize classification outcomes, a preprocessing technique, the “remove frequent values” filter, is applied to reduce redundancy and improve computational efficiency. Model performance is evaluated using accuracy, precision, recall, and F1-score. Experimental results show that the filter improves classification accuracy from 99.976% to 99.998% (a gain of 0.022 percentage points), with precision, recall, and F1-score all reaching 1.000. Cross-validation is used to check the robustness of these results. These findings suggest that even lightweight preprocessing techniques can enhance machine learning performance in IoT big data classification tasks.
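The abstract does not define the filter or report a confusion matrix, so the following Python sketch is illustrative only. It assumes one possible reading of a “remove frequent values” step: an instance filter that keeps rows whose value in a chosen nominal column is among the n most (or least) frequent values of that column. It also shows how the four reported metrics follow from confusion-matrix counts, and how many misclassifications each accuracy figure would imply if both were measured over all 150,000 instances (the abstract does not say how many instances remain after filtering). The function names, the `device` column, the toy rows, and the class split are all hypothetical.

```python
from collections import Counter


def keep_rows_by_value_frequency(rows, column, num_values, keep_least=False):
    """Keep rows whose value in `column` is among the `num_values` most
    (or, with keep_least=True, least) frequent values. Hypothetical helper."""
    counts = Counter(row[column] for row in rows)
    ranked = [value for value, _ in counts.most_common()]  # most frequent first
    if keep_least:
        ranked.reverse()
    kept = set(ranked[:num_values])
    return [row for row in rows if row[column] in kept]


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def implied_errors(accuracy_percent, total_instances):
    """Misclassified instances implied by an accuracy given in percent."""
    return round(total_instances * (1 - accuracy_percent / 100))


# Toy rows, invented for illustration (not taken from N-BaIoT).
rows = [
    {"device": "camera", "label": "botnet"},
    {"device": "camera", "label": "botnet"},
    {"device": "camera", "label": "normal"},
    {"device": "doorbell", "label": "normal"},
    {"device": "doorbell", "label": "botnet"},
    {"device": "thermostat", "label": "normal"},
]
filtered = keep_rows_by_value_frequency(rows, "device", num_values=2)
print(len(rows), "rows before filtering,", len(filtered), "after")

# Assumption: both accuracies are measured over all 150,000 instances.
print(implied_errors(99.976, 150_000), "errors before the filter")
print(implied_errors(99.998, 150_000), "errors after the filter")

# Invented class split with 3 errors in total (1 false positive, 2 false negatives).
metrics = binary_metrics(tp=74_999, fp=1, fn=2, tn=74_998)
print({name: round(value, 3) for name, value in metrics.items()})
```

Under these assumptions, 99.976% corresponds to roughly 36 misclassified instances and 99.998% to roughly 3, which is consistent with precision, recall, and F1-score each rounding to 1.000 at three decimal places.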

 


Keywords


Random Forest; Internet of Things (IoT); Botnet Detection; Data Preprocessing; Machine Learning






DOI: https://doi.org/10.18860/ijeie.v1i1.34533



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

IJEIE : International Journal of Electrical and Intelligent Engineering
