Estimating Missing Panel Data with Regression and Multivariate Imputation by Chained Equations (MICE)

Budi Susetyo, Anwar Fitrianto

Abstract


Missing data occur in many types of research. Regression and multiple imputation by chained equations (MICE) are two methods that can be used to estimate missing values in panel data. This study compares the accuracy of missing panel data estimation using the regression and MICE methods. The data are 161 randomly sampled senior high schools and vocational schools in DKI province for the years 2016-2020. Based on the Chow, Hausman, and Lagrange Multiplier tests for panel data regression, the appropriate model for the student-teacher ratio (X5) is a random effects model, the appropriate model for the percentage of teachers who have an educator certificate (X6) is a fixed effects model with individual-school and time-specific effects, and the appropriate model for the percentage of teachers who hold a bachelor's degree (X7) is a fixed effects model with individual-specific effects. Missing data are then estimated from these models. The accuracy of the estimates is assessed by comparing the MAPE, MAE, and RMSE values. The results show that the MICE method is quite good for estimating missing data in X5, quite feasible for X6, and very good for X7. In general, MICE is more accurate than panel data regression.
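The three accuracy measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions of MAPE, MAE, and RMSE; the function name `imputation_errors` and the sample values are hypothetical and are not taken from the study.

```python
import math


def imputation_errors(actual, imputed):
    """Return (MAPE in percent, MAE, RMSE) for paired true and imputed values.

    Assumes the standard textbook definitions; MAPE requires every true
    value to be nonzero.
    """
    if len(actual) != len(imputed) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, imputed)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return mape, mae, rmse


# Hypothetical values: observations that were masked, then re-estimated.
true_values = [10.0, 20.0, 40.0]
imputed_values = [11.0, 18.0, 40.0]
mape, mae, rmse = imputation_errors(true_values, imputed_values)
print(f"MAPE={mape:.2f}%  MAE={mae:.2f}  RMSE={rmse:.2f}")
```

In a comparison of this kind, lower values of all three measures for the same masked observations would indicate the more accurate imputation method.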

Keywords


Missing data, Panel data, Imputation, Regression, Multiple imputation by chained equations


DOI: https://doi.org/10.18860/ca.v9i1.24824


Copyright (c) 2024 Budi Susetyo

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Editorial Office
Mathematics Department,
Universitas Islam Negeri Maulana Malik Ibrahim Malang
Gajayana Street 50 Malang, East Java, Indonesia 65144
Fax (+62) 341 558933
e-mail: cauchy@uin-malang.ac.id

Creative Commons License
CAUCHY: Jurnal Matematika Murni dan Aplikasi is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.