BERTopic-Based Multi-Class Topic Classification on Indonesian Shopee E-commerce Reviews Using Ensemble Learning
DOI: https://doi.org/10.18860/cauchy.v11i1.37941
Copyright (c) 2026 Kevin Alifviansyah

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Editorial Office
Mathematics Department,
Universitas Islam Negeri Maulana Malik Ibrahim Malang
Gajayana Street 50 Malang, East Java, Indonesia 65144
Facsimile (+62) 341 558933
e-mail: cauchy@uin-malang.ac.id

CAUCHY: Jurnal Matematika Murni dan Aplikasi is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.







