Bias in regression coefficient estimates upon different treatments of systematically missing data
In this simulation study, bias in regression coefficient estimates was investigated in a four-predictor multiple regression model under four missing data treatments (MDTs): expectation maximization (EM), mean substitution (MS), pairwise deletion (PD) and regression imputation (RI). The treatments were compared under different conditions of percent missing, pattern of systematically missing data and non-normality.
Findings indicate no significant difference across the four MDTs in the bias for b2, b3 and b4, the regression coefficients of X2, X3 and X4, respectively. Therefore, only the bias in b1, the regression coefficient of the variable with no missing data, is reported.
Overall, bias under the monotonic missing pattern was lower than under the non-monotonic missing pattern. For the monotonic pattern, the lowest bias occurred under regression imputation (RI), followed by expectation maximization (EM). For this pattern, mean substitution (MS), pairwise deletion (PD) and regression imputation (RI) consistently overestimated the population parameter, regardless of percent missing and level of non-normality, and the overestimation increased consistently with percent missing.
The findings under the non-monotonic pattern indicate that RI had the lowest bias, followed by EM. MS and PD consistently overestimated the population parameter, whereas EM and RI tended to underestimate it.
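The kind of simulation summarized above can be illustrated with a minimal sketch. It is not the authors' design: it assumes only two predictors rather than four, normally distributed data, a single treatment (mean substitution), and one hypothetical form of systematic missingness in which the largest values of X2 are deleted. The function name `simulate_bias_b1`, the coefficient values, the predictor correlation and the sample sizes are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_bias_b1(pct_missing=0.2, n=200, reps=500, seed=0):
    """Estimate the bias of b1 (coefficient of the fully observed X1)
    when X2 is systematically missing and filled by mean substitution."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 0.5, 0.5])           # intercept, b1, b2 (hypothetical)
    cov = np.array([[1.0, 0.5], [0.5, 1.0]])   # assumed predictor covariance
    estimates = np.empty(reps)

    for r in range(reps):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        y = beta[0] + X @ beta[1:] + rng.normal(size=n)

        # Systematic missingness (assumed): delete the k largest X2 values.
        x2 = X[:, 1].copy()
        k = int(round(pct_missing * n))
        mask = np.zeros(n, dtype=bool)
        if k > 0:
            mask[np.argsort(x2)[-k:]] = True
            # Mean substitution: fill with the mean of the observed values.
            x2[mask] = x2[~mask].mean()

        # Ordinary least squares on the completed data set.
        design = np.column_stack([np.ones(n), X[:, 0], x2])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        estimates[r] = coef[1]

    # Bias = mean estimate across replications minus the true parameter.
    return estimates.mean() - beta[1]

if __name__ == "__main__":
    for pct in (0.0, 0.1, 0.3):
        print(f"percent missing = {pct:.0%}, bias in b1 = {simulate_bias_b1(pct):+.4f}")
```

Running the function over several values of `pct_missing` gives a rough picture of how bias in b1 changes with percent missing under this one treatment. Comparing EM, PD and RI, or adding non-normal data and different missing patterns, would require extending the sketch.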
Keywords: Missing data, bias, regression, percent missing, non-normality, missing pattern
East African Journal of Statistics Vol. 1 (2) 2006: pp. 185-197