Ensemble Bagging in Binary Logistic Regression for Transportation Mode Selection
Abstract
This study examines the choice between train and bus on the Malang–Blitar route using binary logistic regression combined with ensemble bagging. Data from 100 respondents were split into 80% for training and 20% for testing, with k-fold cross-validation. Variables included travel cost differences, time, safety, comfort, and ease of access. Bagging was selected over other ensemble methods because it reduces variance and overfitting on small datasets. Standard logistic regression achieved 85% accuracy on the test data, while ensemble bagging with 200 replications improved accuracy to 90.83% (confidence interval: 90.379%–91.187%). McNemar's test confirmed that the improvement was statistically significant (p < 0.01). Under equivalent conditions, 20.6% of respondents preferred trains while 79.4% chose buses. Ease of access emerged as the primary decision factor, outweighing cost and time. The optimal number of replications was 200; exceeding 300 replications decreased model performance. This research contributes an optimized ensemble methodology for transportation mode prediction in developing countries and indicates that accessibility infrastructure influences passenger preferences more strongly than traditional economic factors.
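The procedure summarized in the abstract (bootstrap replications of a binary logistic regression, aggregated and then compared with the single model) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins for the 100 survey responses, the function names, the small ridge penalty, the majority-vote aggregation, and the exact form of McNemar's test are all assumptions made for illustration, and the k-fold cross-validation step is omitted. The accuracies it prints will not match those reported in the paper.

```python
from math import comb

import numpy as np


def fit_logistic(X, y, n_iter=25, ridge=1.0):
    """Fit binary logistic regression by Newton-Raphson (intercept first).

    The ridge term is an assumption added for numerical stability on
    small bootstrap samples; it also shrinks the intercept.
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        z = np.clip(Xb @ beta, -30.0, 30.0)  # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        grad = Xb.T @ (y - p) - ridge * beta
        hess = (Xb * (p * (1.0 - p))[:, None]).T @ Xb + ridge * np.eye(len(beta))
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def predict(beta, X):
    """Predict class 1 when the fitted probability exceeds 0.5."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return (Xb @ beta > 0).astype(int)


def bagged_predict(X_train, y_train, X_test, n_replications=200, seed=0):
    """Bagging: refit on bootstrap resamples, then take a majority vote."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = np.zeros(len(X_test))
    for _ in range(n_replications):
        idx = rng.integers(0, n, size=n)  # sample n rows with replacement
        votes += predict(fit_logistic(X_train[idx], y_train[idx]), X_test)
    return (votes / n_replications >= 0.5).astype(int)  # ties go to class 1


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value from the discordant pairs."""
    a_ok, b_ok = pred_a == y_true, pred_b == y_true
    b = int(np.sum(a_ok & ~b_ok))  # only model A correct
    c = int(np.sum(~a_ok & b_ok))  # only model B correct
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Synthetic stand-in data: 100 "respondents", 5 predictors (hypothetical).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
logits = X @ np.array([0.5, -0.5, 0.3, 0.3, 1.5]) - 1.0
y = (rng.random(100) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# 80% training, 20% testing (rows are already in random order).
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

single_pred = predict(fit_logistic(X_train, y_train), X_test)
bagged_pred = bagged_predict(X_train, y_train, X_test, n_replications=200)

print("single-model accuracy:", np.mean(single_pred == y_test))
print("bagged accuracy:", np.mean(bagged_pred == y_test))
print("McNemar exact p-value:", mcnemar_exact(y_test, single_pred, bagged_pred))
```

Changing `n_replications` in the call to `bagged_predict` is one way to explore how the number of bootstrap replications affects test accuracy, which is the kind of comparison behind the reported optimum of 200.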
DOI: https://doi.org/10.18860/cauchy.v10i2.32241
Copyright (c) 2025 Nuzulul Laili Nabila, Sobri Abusini, Umu Sa'adah

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Editorial Office
Mathematics Department,
Universitas Islam Negeri Maulana Malik Ibrahim Malang
Gajayana Street 50 Malang, East Java, Indonesia 65144
Facsimile (+62) 341 558933
e-mail: cauchy@uin-malang.ac.id

CAUCHY: Jurnal Matematika Murni dan Aplikasi is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.







