BERTopic-Based Multi-Class Topic Classification on Indonesian Shopee E-commerce Reviews Using Ensemble Learning
DOI: https://doi.org/10.18860/cauchy.v11i1.37941
Copyright (c) 2026 Kevin Alifviansyah

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

CAUCHY: Jurnal Matematika Murni dan Aplikasi is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.