






Vol. 4, No. 5, Publication Date: Dec. 6, 2017, Pages: 49-57
Jochen Hardt, Medical Psychology and Medical Sociology, Clinic and Policlinic for Psychosomatic Medicine and Psychotherapy, University Medicine Mainz, Mainz, Germany.
The way authors deal with missing data in health care research is often still far from optimal, even though modern computers are powerful and software is available that would handle the problem much better. In the present paper, various ways of dealing with missing data are described, together with their pros and cons. Among the methods that involve no imputation or only single imputation, complete case analysis (CC), pairwise deletion (PD), mean imputation, regression imputation, full information maximum likelihood (FIML), restricted maximum likelihood (REML), hot-deck imputation, the missing value indicator method, last observation carried forward (LOCF), Yates' method, propensity score imputation and worst case imputation are described. Among the multiple imputation methods, hot-deck, propensity score, expectation maximization (EM), data augmentation (DA), multiple imputation by chained equations (MICE) and predictive mean matching (PMM) are described. Finally, recommendations are given as to which method can be applied to which kind of data.
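To make some of the single-imputation strategies listed above concrete, here is a minimal Python sketch of complete case analysis, mean imputation, the missing value indicator method, LOCF, worst case imputation and a random hot-deck. It assumes that missing values are coded as `None` and that each variable is a plain list; all function names are hypothetical illustrations, not code from the paper.

```python
"""Sketch of several single-imputation strategies (illustrative only).

Assumptions: missing values are coded as None, each variable is a plain
Python list, and the "worst" value for worst case imputation is chosen by
the analyst. None of these helpers are taken from the paper.
"""
import random
from statistics import mean, variance


def complete_case(rows):
    """Complete case analysis: drop every row that has any missing entry."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_impute(column):
    """Replace each missing value with the mean of the observed values."""
    m = mean(v for v in column if v is not None)
    return [m if v is None else v for v in column]


def missing_indicator(column, fill=0.0):
    """Missing value indicator method: constant fill plus a 0/1 indicator."""
    filled = [fill if v is None else v for v in column]
    indicator = [1 if v is None else 0 for v in column]
    return filled, indicator


def locf(series):
    """Last observation carried forward along one subject's repeated measures.

    Leading missing values stay None because no earlier observation exists.
    """
    out, last = [], None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out


def worst_case(column, worst_value):
    """Fill every missing value with the least favourable plausible value."""
    return [worst_value if v is None else v for v in column]


def random_hot_deck(column, rng):
    """Random hot-deck: draw each missing value from the observed donors."""
    donors = [v for v in column if v is not None]
    return [rng.choice(donors) if v is None else v for v in column]


if __name__ == "__main__":
    x = [2.0, 4.0, None, 6.0, None]
    observed = [v for v in x if v is not None]
    print(variance(observed))        # 4.0 on the observed cases
    print(variance(mean_impute(x)))  # 2.0: mean imputation shrinks the spread
    print(locf([5.0, None, 7.0, None, None]))
    print(random_hot_deck(x, random.Random(1)))
```

The demo illustrates the precision problem of mean imputation: filling the two gaps with the observed mean leaves the mean unchanged but halves the sample variance (from 4.0 to 2.0 here), so standard errors computed from the imputed data are too small.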
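All of the multiple imputation methods listed above end in the same step: the analysis model is fitted to each of the m completed data sets and the m results are combined. A common way to do this is Rubin's rules, sketched below in Python for a single scalar parameter, assuming each completed-data estimate is approximately normal. The names `pool_rubin` and `PooledEstimate` are hypothetical; imputation software usually performs this step for you.

```python
"""Rubin's rules for pooling one scalar parameter over m imputations (sketch)."""
from dataclasses import dataclass
from math import sqrt
from statistics import mean, variance


@dataclass
class PooledEstimate:
    estimate: float  # Q-bar: average of the m completed-data estimates
    within: float    # W: average within-imputation variance
    between: float   # B: between-imputation variance (divisor m - 1)
    total: float     # T = W + (1 + 1/m) * B
    df: float        # Rubin's large-sample degrees of freedom

    @property
    def std_error(self) -> float:
        return sqrt(self.total)


def pool_rubin(estimates, variances):
    """Combine m point estimates and their squared standard errors."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates, each with a variance")
    q_bar = mean(estimates)
    w = mean(variances)
    b = variance(estimates)
    inflated_b = (1 + 1 / m) * b
    total = w + inflated_b
    # nu = (m - 1) * (1 + W / ((1 + 1/m) * B))^2; infinite when B == 0
    df = float("inf") if b == 0 else (m - 1) * (1 + w / inflated_b) ** 2
    return PooledEstimate(q_bar, w, b, total, df)


if __name__ == "__main__":
    pooled = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(pooled.estimate, pooled.between, round(pooled.total, 4))
```

With three imputations giving estimates 1, 2 and 3 and within-imputation variances of 0.5 each, the pooled estimate is 2, B = 1 and T is about 1.83, so here the between-imputation spread dominates the total variance. A small-sample adjustment of the degrees of freedom exists and is not implemented in this sketch.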
Keywords
Bias, Precision, Missing at Random, Unit Level Missing, Item Level Missing