Cross-Dataset Evaluation of Support Vector Machines: A Reproducible, Calibration-Aware Baseline for Tabular Classification
Abstract
Keywords
Full Text:
PDF
DOI: https://doi.org/10.18860/jrmm.v4i6.33438