Optimizing K-Means Clustering through Distance Metric Simulation for Strategic Enrollment Segmentation in Private Universities

Regita Putri Permata, Amalia Nur Alifah, I Made Wisnu Adi Sanjaya

Abstract


K-Means clustering is a widely used unsupervised learning technique for identifying patterns and grouping data based on feature similarities. However, the effectiveness of K-Means depends significantly on the choice of distance metric. This study conducts a comprehensive simulation to evaluate and compare four distance metrics (Euclidean, Cityblock/Manhattan, Canberra, and Mahalanobis) in the context of strategic market segmentation for private universities. The dataset combines simulated and institutional data and covers variables such as account creation, registration, graduation, student performance (social, science, and scholastic scores), income, and geographic distance. The results indicate that Euclidean and Cityblock distances yield efficient and interpretable clusters at low computational cost, whereas Mahalanobis distance, despite its ability to account for covariance among features, introduces computational overhead without a proportional improvement in segmentation quality. Interestingly, Canberra distance produces compact clusters but offers no significant gain in separability. From the resulting segmentation, two clusters emerge as high-potential targets for marketing strategies: Cluster 0 (high-income students living far from campus) and Cluster 1 (students with diverse academic and socioeconomic profiles). The findings highlight the importance of aligning distance metric selection with specific clustering objectives and offer practical insights for data-driven strategic enrollment planning in private higher education institutions.
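
To make the comparison concrete, the sketch below shows how the assignment step of K-Means can be run under each of the four metrics. It is a minimal illustration rather than the authors' implementation: it assumes synthetic feature data standing in for the income, distance, and score variables, uses scipy.spatial.distance.cdist so the metric can be swapped, supplies the inverse covariance matrix that the Mahalanobis metric requires, and updates centroids with the arithmetic mean, which is exact only for squared Euclidean distance and is a common heuristic for the other metrics.

```python
import numpy as np
from scipy.spatial.distance import cdist

def kmeans_with_metric(X, k, metric, n_iter=100, seed=0):
    """Lloyd-style K-Means whose assignment step uses a pluggable SciPy metric."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    # Mahalanobis distance needs the inverse covariance matrix of the data.
    kwargs = {"VI": np.linalg.inv(np.cov(X, rowvar=False))} if metric == "mahalanobis" else {}

    for _ in range(n_iter):
        # Assign each point to the nearest centroid under the chosen metric.
        labels = cdist(X, centroids, metric=metric, **kwargs).argmin(axis=1)
        # Mean update; a centroid is kept unchanged if its cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical stand-in for standardized enrollment features
# (income, geographic distance, social/science/scholastic scores, ...).
rng = np.random.default_rng(1)
X = rng.random((500, 6))

for m in ["euclidean", "cityblock", "canberra", "mahalanobis"]:
    labels, _ = kmeans_with_metric(X, k=3, metric=m)
    print(f"{m:12s} cluster sizes: {np.bincount(labels, minlength=3)}")
```

Note that scikit-learn's KMeans is hard-wired to Euclidean distance, which is why a small custom loop such as this one is typically needed to reproduce the Cityblock, Canberra, and Mahalanobis variants.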


Keywords


statistics






DOI: https://doi.org/10.18860/cauchy.v10i2.33089



Copyright (c) 2025 Regita Putri Permata, Amalia Nur Alifah

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
