Comparing Several Missing Data Estimation Methods in Linear Regression; Real Data Example and A Simulation Study

Anwar Fitrianto, Jap Ee Jia, Budi Susetyo, La Ode Abdul Rahman

Abstract


Analysis of incomplete data can lead to biased estimates when standard statistical procedures are used, since they ignore the missing observations. A further disadvantage of ignoring missing data is that the researcher might not have enough data to conduct an analysis. The main objective of this study is to compare the performance of the listwise deletion (LD), mean substitution (MS), and multiple imputation (MI) methods in estimating parameters. Performance is measured through the bias, standard error, and 95% confidence interval of the estimates of interest when 10% of the observations are missing. A complete empirical data set was used and treated as population data. Ten percent of the observations in the population were set as missing arbitrarily by generating random numbers from a uniform distribution. Then, the bias and confidence intervals of the parameter estimates were calculated to compare the three methods. A Monte Carlo simulation using simulated random numbers was also carried out to investigate the properties of the missing data methods. The simulation used 1000 sampled data sets with 20, 50, and 100 observations, each set to have 10% missing observations. Standard statistical analyses were run on each incomplete data set, and the parameter estimates were averaged to calculate the bias and standard error for every missing data method. The analysis was conducted using SAS version 9.2. The MI method provided the smallest bias and standard error of the parameter estimates and a narrower confidence interval than the LD and MS methods. Meanwhile, the LD method gave a smaller bias and standard error of the parameter estimates for small sample sizes. The MS method is strongly discouraged for handling missing data because it results in large bias and standard error of the parameter estimates.
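The simulation design described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that are not taken from the paper: a simple linear regression with true coefficients (1.0, 2.0), standard normal predictor and errors, missingness applied to the response only, and a simplified MI that imputes from a fitted regression with fixed parameters and pools by averaging. It is not the authors' SAS 9.2 code, and all function names are hypothetical. The "standard error" here is the empirical standard deviation of the estimates across replications.

```python
import numpy as np

def ols(x, y):
    """Least-squares fit of y on x; returns array [intercept, slope]."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def make_mcar(y, rate, rng):
    """Copy y and set round(rate * n) randomly chosen values to NaN (MCAR)."""
    y_miss = y.copy()
    n_miss = int(round(rate * y.size))
    idx = rng.choice(y.size, size=n_miss, replace=False)
    y_miss[idx] = np.nan
    return y_miss

def listwise_deletion(x, y):
    """Drop every case with a missing response, then fit."""
    keep = ~np.isnan(y)
    return ols(x[keep], y[keep])

def mean_substitution(x, y):
    """Replace each missing response with the observed mean, then fit."""
    filled = np.where(np.isnan(y), np.nanmean(y), y)
    return ols(x, filled)

def multiple_imputation(x, y, rng, m=5):
    """Simplified MI (assumption): impute y from x plus random noise m times,
    fit each completed data set, and average the m estimates."""
    obs = ~np.isnan(y)
    a = ols(x[obs], y[obs])
    resid = y[obs] - (a[0] + a[1] * x[obs])
    sd = resid.std(ddof=2)  # residual standard deviation
    estimates = []
    for _ in range(m):
        filled = y.copy()
        noise = rng.normal(0.0, sd, size=int((~obs).sum()))
        filled[~obs] = a[0] + a[1] * x[~obs] + noise
        estimates.append(ols(x, filled))
    return np.mean(estimates, axis=0)

def simulate(n, reps=1000, rate=0.10, beta=(1.0, 2.0), seed=0):
    """Return {method: (bias, empirical SE)} for [intercept, slope]."""
    rng = np.random.default_rng(seed)
    results = {name: np.empty((reps, 2)) for name in ("LD", "MS", "MI")}
    for r in range(reps):
        x = rng.normal(size=n)
        y = beta[0] + beta[1] * x + rng.normal(size=n)
        y_miss = make_mcar(y, rate, rng)
        results["LD"][r] = listwise_deletion(x, y_miss)
        results["MS"][r] = mean_substitution(x, y_miss)
        results["MI"][r] = multiple_imputation(x, y_miss, rng)
    summary = {}
    for name, est in results.items():
        bias = est.mean(axis=0) - np.asarray(beta)
        se = est.std(axis=0, ddof=1)
        summary[name] = (bias, se)
    return summary

if __name__ == "__main__":
    for n in (20, 50, 100):
        for name, (bias, se) in simulate(n).items():
            print(n, name, "bias:", bias.round(3), "SE:", se.round(3))
```

A fuller multiple imputation would also draw the regression parameters for each imputation and combine within- and between-imputation variances using Rubin's rules to obtain confidence intervals. The sketch omits both steps, so its output should not be read as a reproduction of the reported results.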

Keywords


incomplete, MCAR, missing, regression, simulation


DOI: https://doi.org/10.18860/ca.v7i4.20548



Copyright (c) 2023 Anwar Fitrianto, Jap Ee Jia, Budi Susetyo, La Ode Abdul Rahman

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Editorial Office
Mathematics Department,
Universitas Islam Negeri Maulana Malik Ibrahim Malang
Gajayana Street 50 Malang, East Java, Indonesia 65144
Facsimile (+62) 341 558933
e-mail: cauchy@uin-malang.ac.id

Creative Commons License
CAUCHY: Jurnal Matematika Murni dan Aplikasi is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.