TB6 - Diagnostics and Transformations for MLR
6.4 - Multicollinearity
Important issues arise when strong correlations exist among the predictor variables (often referred to as multicollinearity). In particular, regression coefficients can have the wrong sign, and many of the predictor variables can fail to be individually statistically significant even though the overall F-test is highly significant.
Consider a multiple regression model with two predictors:
\[Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e\]
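Let \(r_{12}\) denote the correlation between \(x_1\) and \(x_2\), let \(S_{x_j}\) denote the standard deviation of \(x_j\), let \(\sigma^2\) denote the error variance, and let \(n\) denote the sample size. Then it can be shown that
\[\mathrm{Var}(\hat\beta_j) = \frac{1}{1 - r^2_{12}} \times \frac{\sigma^2}{(n - 1) S^2_{x_j}}, \qquad j = 1, 2\]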
Note how the variance of \(\hat\beta_j\) gets larger as the absolute value of \(r_{12}\) increases. Thus, the correlation amongst the predictors increases the variance of the estimated regression coefficients.
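To get a feel for how quickly this happens, the short sketch below evaluates the factor \(\frac{1}{1 - r^2_{12}}\), which multiplies the variance of \(\hat\beta_j\), at a few correlation values. The values of `r12` are arbitrary and chosen only for illustration.

```r
# Illustrative correlations between x1 and x2 (assumed values, not from data)
r12 <- c(0, 0.5, 0.9, 0.95, 0.99)

# Factor by which Var(beta_hat_j) is multiplied relative to uncorrelated predictors
inflation <- 1 / (1 - r12^2)

print(data.frame(r12 = r12, inflation = round(inflation, 2)))
```

The factor equals 1 when the predictors are uncorrelated and grows without bound as \(|r_{12}|\) approaches 1; a correlation of about 0.95 already pushes it past 10.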
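The term \(\frac{1}{1 - r^2_{12}}\) is the variance inflation factor (VIF). With more than two predictors, the VIF for the \(j\)-th predictor generalizes to \(\frac{1}{1 - R^2_j}\), where \(R^2_j\) is the coefficient of determination from regressing \(x_j\) on the other predictors. A commonly used cutoff is 10: VIF values above this indicate that the associated regression coefficients are poorly estimated because of multicollinearity.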
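library(car)  # assumption: vif() here is the one provided by the car package
vif(model)    # model is a fitted lm object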
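As a check on what `vif(model)` reports, the same quantity can be computed by hand in base R. The following is a minimal sketch on simulated data; the data, the coefficient values, and the helper `vif_by_hand` are all hypothetical and introduced only for illustration. It assumes a model with at least two numeric predictors and an intercept.

```r
# Simulated data; every name and value here is an illustrative assumption.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)  # built to be strongly correlated with x1
y  <- 1 + 2 * x1 - x2 + rnorm(n)

model <- lm(y ~ x1 + x2)

# VIF for predictor j: regress x_j on the other predictors, then 1 / (1 - R_j^2).
vif_by_hand <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]  # drop the intercept column
  sapply(colnames(X), function(j) {
    others <- X[, colnames(X) != j, drop = FALSE]
    r2 <- summary(lm(X[, j] ~ others))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_by_hand(model)
print(vifs)

# With exactly two predictors, both VIFs reduce to 1 / (1 - r12^2).
r12 <- cor(x1, x2)
print(1 / (1 - r12^2))
```

With two predictors the two VIFs are identical, because regressing \(x_1\) on \(x_2\) and \(x_2\) on \(x_1\) give the same \(R^2 = r^2_{12}\). With more predictors the values generally differ, and each one is compared against the cutoff separately.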