The Variance Inflation Factor (VIF) measures the degree of multicollinearity among the predictors in a multiple regression. For each predictor, it quantifies how much the variance of that predictor's estimated regression coefficient is increased by its linear dependence on the other predictors.
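To derive the VIF, we start with the variance of the OLS estimator of the j-th coefficient in a multiple regression that includes an intercept (derived in full below):

$$\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{S_{XX_j}\,(1 - R_j^2)}$$

If Xj were uncorrelated with all other predictors (R²j = 0), this would reduce to σ²/SXXj, the same formula as in a simple regression on Xj alone. The VIF is the ratio of the actual variance to this baseline.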
Therefore, the VIF for the j-th predictor is:

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}$$

where R²j is the coefficient of determination when Xj is regressed on all other predictors (with an intercept).
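A minimal pure-Python sketch of this definition, computing VIFj by actually running the auxiliary regression of Xj on the other predictors plus an intercept. The helper names (`dot`, `gram`, `invert`, `project_out`, `vif`) and the toy data are hypothetical, chosen only for illustration; the design matrix is stored as a list of columns to keep the code short.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def gram(cols):
    """X'X for a matrix stored as a list of columns."""
    return [[dot(ci, ck) for ck in cols] for ci in cols]


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment each row with the matching row of the identity: [A | I].
    m = [[float(v) for v in row] + [1.0 if k == i else 0.0 for k in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    # The left half is now I, so the right half holds A^{-1}.
    return [row[n:] for row in m]


def project_out(others, y):
    """Residuals y - X(X'X)^{-1}X'y of the OLS regression of y on the columns in `others`."""
    xty = [dot(c, y) for c in others]
    coef = [dot(row, xty) for row in invert(gram(others))]
    return [y[t] - sum(b * c[t] for b, c in zip(coef, others)) for t in range(len(y))]


def vif(predictors, j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
    on all other predictors plus an intercept column."""
    xj = predictors[j]
    n = len(xj)
    others = [[1.0] * n] + [c for k, c in enumerate(predictors) if k != j]
    resid = project_out(others, xj)
    mean = sum(xj) / n
    s_xx = sum((v - mean) ** 2 for v in xj)  # total sum of squares of X_j
    r2 = 1.0 - dot(resid, resid) / s_xx
    return 1.0 / (1.0 - r2)


# Hypothetical toy predictors.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
print(vif([x1, x2], 0))  # about 3.19 (= 306.25 / 96 for this toy data)
```

With only two predictors, R²j is the squared Pearson correlation between them, so both predictors get the same VIF. If statsmodels is available, `variance_inflation_factor(exog, exog_idx)` in `statsmodels.stats.outliers_influence` computes the same quantity; it regresses the chosen column on the other columns of `exog` exactly as given, so `exog` should include a constant column to match the intercept-based R²j used here.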
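The VIF values are interpreted as follows. Since 0 ≤ R²j < 1, VIFj is always at least 1. A value of exactly 1 means R²j = 0: Xj is uncorrelated with the other predictors and the variance of β̂j is not inflated at all. As R²j approaches 1, that is, as Xj approaches an exact linear combination of the other predictors, VIFj grows without bound.

More specifically, VIFj is the factor by which the variance of β̂j is multiplied, so √VIFj is the factor by which its standard error is multiplied. Widely used rules of thumb treat a VIF above 5, or above 10, as a sign of problematic multicollinearity; these cutoffs are conventions rather than formal tests.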
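For a concrete sense of scale, take two hypothetical values of R²j and apply VIFj = 1/(1 − R²j); the square root gives the corresponding inflation of the standard error:

$$R_j^2 = 0.8 \;\Rightarrow\; \mathrm{VIF}_j = \frac{1}{1 - 0.8} = 5, \qquad \sqrt{5} \approx 2.24$$

$$R_j^2 = 0.9 \;\Rightarrow\; \mathrm{VIF}_j = \frac{1}{1 - 0.9} = 10, \qquad \sqrt{10} \approx 3.16$$

So a predictor with R²j = 0.9 has a coefficient variance ten times, and a standard error a little over three times, what it would be if it were uncorrelated with the other predictors.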
The Variance Inflation Factor is a valuable tool for detecting and quantifying multicollinearity in multiple regression. By expressing how much the variance of a coefficient is inflated due to linear dependencies with other predictors, VIF provides insights into the reliability of regression coefficients and guides decisions in model refinement.
We'll derive the expression for the variance of the j-th regression coefficient in multiple linear regression. This derivation is key to understanding the Variance Inflation Factor (VIF).
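In matrix notation, assuming homoskedastic and uncorrelated errors, the variance-covariance matrix of the OLS estimator β̂ is:

$$\operatorname{Var}(\hat\beta) = \sigma^2 (X'X)^{-1}$$

where X is the design matrix and σ² is the error variance.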
The variance of the j-th coefficient, β̂j, is the j-th diagonal element of this matrix:

$$\operatorname{Var}(\hat\beta_j) = \sigma^2 \left[(X'X)^{-1}\right]_{jj}$$
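The matrix formula can be evaluated directly. This pure-Python sketch stores the design matrix as a list of columns (an assumption made to keep the code short), forms X'X, inverts it by Gauss-Jordan elimination, and returns σ²[(X'X)⁻¹]jj for every column. The function names and toy data are hypothetical, and σ² is treated as known, whereas in practice it would be replaced by its estimate from the residuals.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def gram(cols):
    """X'X for a matrix stored as a list of columns."""
    return [[dot(ci, ck) for ck in cols] for ci in cols]


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment each row with the matching row of the identity: [A | I].
    m = [[float(v) for v in row] + [1.0 if k == i else 0.0 for k in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    # The left half is now I, so the right half holds A^{-1}.
    return [row[n:] for row in m]


def coef_variances(cols, sigma2):
    """sigma^2 * [(X'X)^{-1}]_jj for every column j of X (X stored as a list of columns)."""
    inv = invert(gram(cols))
    return [sigma2 * inv[j][j] for j in range(len(cols))]


# Hypothetical design: an intercept column plus two predictors.
ones = [1.0] * 6
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
print(coef_variances([ones, x1, x2], sigma2=1.0))
```

Only the diagonal is returned here; the off-diagonal entries of σ²(X'X)⁻¹ are the coefficient covariances, which the VIF derivation does not need.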
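Partition X into Xj (the j-th predictor) and X-j (all other predictors, including the intercept column). Reordering the columns of X so that Xj comes first only permutes the rows and columns of (X'X)⁻¹ and does not change its j-th diagonal element, so we can write:

$$X = \begin{bmatrix} X_j & X_{-j} \end{bmatrix}, \qquad X'X = \begin{bmatrix} X_j'X_j & X_j'X_{-j} \\ X_{-j}'X_j & X_{-j}'X_{-j} \end{bmatrix}$$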
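Using the partitioned (block) inverse formula, the top-left element of (X'X)⁻¹ is the reciprocal of the Schur complement of X'-jX-j, which here is a scalar because Xj is a single column. We can therefore express [(X'X)⁻¹]jj as:

$$\left[(X'X)^{-1}\right]_{jj} = \left(X_j'X_j - X_j'X_{-j}(X_{-j}'X_{-j})^{-1}X_{-j}'X_j\right)^{-1} = \frac{1}{X_j' M_{-j} X_j}$$

where M-j = I − X-j(X'-jX-j)⁻¹X'-j is the projection matrix onto the orthogonal complement of the space spanned by X-j.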
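The partitioned-inverse identity can be checked numerically. This sketch (hypothetical helper names, made-up toy columns) computes [(X'X)⁻¹]jj once from the full inverse and once as 1/(X'jM-jXj), applying M-j to Xj as a residual computation rather than forming the n × n matrix M-j.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def gram(cols):
    """X'X for a matrix stored as a list of columns."""
    return [[dot(ci, ck) for ck in cols] for ci in cols]


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if k == i else 0.0 for k in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def project_out(others, y):
    """M y = y - X(X'X)^{-1}X'y, where X holds the columns in `others`."""
    xty = [dot(c, y) for c in others]
    coef = [dot(row, xty) for row in invert(gram(others))]
    return [y[t] - sum(b * c[t] for b, c in zip(coef, others)) for t in range(len(y))]


def diag_full(cols, j):
    """[(X'X)^{-1}]_jj read off the full inverse."""
    return invert(gram(cols))[j][j]


def diag_partitioned(cols, j):
    """1 / (x_j' M_{-j} x_j), the partitioned-inverse expression for the same element."""
    xj = cols[j]
    others = [c for k, c in enumerate(cols) if k != j]
    return 1.0 / dot(xj, project_out(others, xj))


# Hypothetical design: intercept plus three linearly independent predictors.
cols = [[1.0] * 6, [1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5], [3, 1, 2, 5, 4, 6]]
for j in range(len(cols)):
    print(j, diag_full(cols, j), diag_partitioned(cols, j))
```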
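M-jXj represents the residuals from regressing Xj on all other predictors. We can write:

$$M_{-j}X_j = X_j - \hat{X}_j, \qquad \hat{X}_j = X_{-j}(X_{-j}'X_{-j})^{-1}X_{-j}'X_j$$

where X̂j is the predicted Xj from its regression on the other predictors. Because M-j is symmetric and idempotent (M'-jM-j = M-j), the quantity X'jM-jXj is the residual sum of squares of that regression:

$$X_j' M_{-j} X_j = (M_{-j}X_j)'(M_{-j}X_j) = (X_j - \hat{X}_j)'(X_j - \hat{X}_j)$$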
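A short numerical check of the residual interpretation, using the same kind of pure-Python helpers and hypothetical toy data: the vector M-jXj is orthogonal to every column of X-j, adds to X̂j to give back Xj, and satisfies X'jM-jXj = (M-jXj)'(M-jXj), which follows from M-j being symmetric and idempotent.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def gram(cols):
    """X'X for a matrix stored as a list of columns."""
    return [[dot(ci, ck) for ck in cols] for ci in cols]


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if k == i else 0.0 for k in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def project_out(others, y):
    """M y = y - X(X'X)^{-1}X'y, where X holds the columns in `others`."""
    xty = [dot(c, y) for c in others]
    coef = [dot(row, xty) for row in invert(gram(others))]
    return [y[t] - sum(b * c[t] for b, c in zip(coef, others)) for t in range(len(y))]


# Hypothetical toy data: regress x_j on X_{-j} = [intercept, x2, x3].
xj = [1, 2, 3, 4, 5, 6]
others = [[1.0] * 6, [2, 1, 4, 3, 6, 5], [3, 1, 2, 5, 4, 6]]
resid = project_out(others, xj)                 # M_{-j} x_j
fitted = [a - e for a, e in zip(xj, resid)]     # x_hat_j = x_j - M_{-j} x_j

# Residuals are orthogonal to every column of X_{-j} (zero up to rounding) ...
print([dot(c, resid) for c in others])
# ... so x_j' M_{-j} x_j equals the residual sum of squares e'e.
print(dot(xj, resid), dot(resid, resid))
```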
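The R-squared from regressing Xj on the other predictors is:

$$R_j^2 = 1 - \frac{(X_j - \hat{X}_j)'(X_j - \hat{X}_j)}{S_{XX_j}}$$

where SXXj = Σ (x_ij − x̄_j)² is the sum of squared deviations of Xj from its mean. This is the usual centered R², which is appropriate because the model includes an intercept, so X-j contains the intercept column. Rearranging the R²j equation:

$$X_j' M_{-j} X_j = (X_j - \hat{X}_j)'(X_j - \hat{X}_j) = S_{XX_j}\,(1 - R_j^2)$$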
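As a numerical check of the rearranged identity, the sketch below computes R²j independently, as the explained sum of squares divided by SXXj (valid because the auxiliary regression includes an intercept), and compares SXXj(1 − R²j) with the residual sum of squares X'jM-jXj. Helper names and data are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def gram(cols):
    """X'X for a matrix stored as a list of columns."""
    return [[dot(ci, ck) for ck in cols] for ci in cols]


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if k == i else 0.0 for k in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def project_out(others, y):
    """M y = y - X(X'X)^{-1}X'y, where X holds the columns in `others`."""
    xty = [dot(c, y) for c in others]
    coef = [dot(row, xty) for row in invert(gram(others))]
    return [y[t] - sum(b * c[t] for b, c in zip(coef, others)) for t in range(len(y))]


def sxx_r2_rss(xj, others):
    """(S_XXj, R_j^2, x_j' M_{-j} x_j) for regressing x_j on `others` plus an intercept."""
    n = len(xj)
    resid = project_out([[1.0] * n] + list(others), xj)  # M_{-j} x_j
    fitted = [a - e for a, e in zip(xj, resid)]          # x_hat_j
    mean = sum(xj) / n
    s_xx = sum((v - mean) ** 2 for v in xj)              # total sum of squares
    ess = sum((f - mean) ** 2 for f in fitted)           # explained sum of squares
    return s_xx, ess / s_xx, dot(xj, resid)


s_xx, r2, rss = sxx_r2_rss([1, 2, 3, 4, 5, 6], [[2, 1, 4, 3, 6, 5], [3, 1, 2, 5, 4, 6]])
print(rss, s_xx * (1 - r2))  # the two numbers agree up to rounding
```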
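Substituting back into the variance formula:

$$\operatorname{Var}(\hat\beta_j) = \sigma^2\left[(X'X)^{-1}\right]_{jj} = \frac{\sigma^2}{X_j' M_{-j} X_j} = \frac{\sigma^2}{S_{XX_j}\,(1 - R_j^2)}$$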
We have derived the expression:

$$\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{S_{XX_j}} \cdot \frac{1}{1 - R_j^2}$$
This formula shows how the variance of β̂j is inflated by a factor of 1/(1-R²j) due to multicollinearity, which is precisely the Variance Inflation Factor (VIF).
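Putting the pieces together, this sketch compares the two sides of the final result for each non-intercept column of a toy design: σ²[(X'X)⁻¹]jj from the full inverse, and (σ²/SXXj)·VIFj with VIFj computed from the auxiliary regression. The value σ² = 1.5, the data, and all helper names are arbitrary, hypothetical choices for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def gram(cols):
    """X'X for a matrix stored as a list of columns."""
    return [[dot(ci, ck) for ck in cols] for ci in cols]


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if k == i else 0.0 for k in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def project_out(others, y):
    """M y = y - X(X'X)^{-1}X'y, where X holds the columns in `others`."""
    xty = [dot(c, y) for c in others]
    coef = [dot(row, xty) for row in invert(gram(others))]
    return [y[t] - sum(b * c[t] for b, c in zip(coef, others)) for t in range(len(y))]


def sxx(x):
    """Sum of squared deviations of x from its mean."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x)


def vif(predictors, j):
    """1 / (1 - R_j^2) from regressing predictor j on the others plus an intercept."""
    xj = predictors[j]
    others = [[1.0] * len(xj)] + [c for k, c in enumerate(predictors) if k != j]
    resid = project_out(others, xj)
    r2 = 1.0 - dot(resid, resid) / sxx(xj)
    return 1.0 / (1.0 - r2)


def var_direct(predictors, j, sigma2):
    """sigma^2 [(X'X)^{-1}]_jj with X = [1, predictors]; predictor j is column j + 1."""
    cols = [[1.0] * len(predictors[0])] + list(predictors)
    return sigma2 * invert(gram(cols))[j + 1][j + 1]


def var_via_vif(predictors, j, sigma2):
    """(sigma^2 / S_XXj) * VIF_j, the right-hand side of the derived formula."""
    return sigma2 / sxx(predictors[j]) * vif(predictors, j)


preds = [[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5], [3, 1, 2, 5, 4, 6]]
for j in range(len(preds)):
    print(j, var_direct(preds, j, 1.5), var_via_vif(preds, j, 1.5))
```

Both columns of output should match; the second form makes explicit that multicollinearity enters only through the VIF factor, since σ²/SXXj does not depend on the other predictors.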