Variance Inflation Factor (VIF) in Multicollinearity

1. Concept of VIF

The Variance Inflation Factor (VIF) measures how strongly each predictor in a multiple regression is linearly related to the other predictors. For a given predictor, it quantifies how much the variance of its estimated regression coefficient is inflated relative to the case in which that predictor is uncorrelated with the others.

Key points:

2. Mathematical Derivation of VIF

To derive the VIF, we start with the variance of the OLS estimator in multiple regression:

  1. In a multiple regression model, the variance of the j-th coefficient is:
    \[ Var(\hat{\beta}_j) = \frac{\sigma^2}{(1-R_j^2)S_{XX_j}} \]
    where σ² is the error variance, R_j² is the R-squared from regressing X_j on all other predictors (with an intercept), and S_{XX_j} = Σ_i (x_ij - x̄_j)² is the sum of squared deviations of X_j from its mean.
  2. In the absence of multicollinearity (R_j² = 0), the variance would be:
    \[ Var(\hat{\beta}_j)_{no\_collinearity} = \frac{\sigma^2}{S_{XX_j}} \]
  3. The VIF is defined as the ratio of these variances:
    \[ VIF_j = \frac{Var(\hat{\beta}_j)}{Var(\hat{\beta}_j)_{no\_collinearity}} = \frac{1}{1-R_j^2} \]

Therefore, the VIF for the j-th predictor is:

\[ VIF_j = \frac{1}{1-R_j^2} \]

where R_j² is the coefficient of determination when X_j is regressed on all other predictors.
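
The definition above can be sketched numerically by running one auxiliary regression per predictor. This is a minimal illustration, not a reference implementation: the function name `vif`, the simulated variables `x1`, `x2`, `x3`, and the noise scale are all assumptions made for the example, and the predictor matrix is assumed to exclude the intercept column (an intercept is added inside each auxiliary regression).

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (n samples x p predictors, no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Auxiliary regression: X_j on an intercept plus all other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        # R_j^2 uses the sum of squared deviations of X_j as the total sum of squares.
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical data: x3 is nearly a copy of x1, x2 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + 0.1 * rng.normal(size=500)
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

In this setup the VIFs of `x1` and `x3` should come out large, while the VIF of `x2` should stay close to 1, since `x2` is generated independently of the other two.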

3. Interpretation of VIF Values

The interpretation of VIF values is as follows:

Specific interpretations:

4. Implications and Use of VIF

  1. Identifying Multicollinearity: High VIF values indicate which variables are involved in multicollinearity.
  2. Variable Selection: VIF can guide the removal of highly collinear predictors to improve model stability.
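  3. Model Interpretation: High VIFs indicate that the affected coefficients are estimated imprecisely: their standard errors are large, and the estimates can change substantially with small changes in the data.
  4. Prediction Accuracy: Multicollinearity does not by itself reduce the overall fit of the model, but it makes the estimated contribution of individual predictors unstable and harder to interpret.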
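
As a rough illustration of points 2 and 3, the sketch below compares the standard error of one coefficient with and without a nearly collinear companion predictor. Everything here is assumed for the example: the helper `ols_standard_errors`, the simulated data, and the true model (in which `x2` has no effect of its own, so dropping it does not bias the remaining coefficient; with real data that is not guaranteed).

```python
import numpy as np

def ols_standard_errors(X, y):
    """Return OLS coefficient estimates and their estimated standard errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2_hat = resid @ resid / (n - k)        # unbiased estimate of the error variance
    cov = sigma2_hat * np.linalg.inv(X.T @ X)   # estimated Var(beta_hat)
    return beta, np.sqrt(np.diag(cov))

# Hypothetical data: x2 is almost identical to x1 and plays no role in y.
rng = np.random.default_rng(4)
n = 400
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

ones = np.ones(n)
_, se_full = ols_standard_errors(np.column_stack([ones, x1, x2]), y)
_, se_reduced = ols_standard_errors(np.column_stack([ones, x1]), y)

# Standard error of the x1 coefficient with and without the collinear predictor.
print(se_full[1], se_reduced[1])
```

The ratio of the two standard errors should be roughly the square root of the VIF of `x1` in the full model, which is why removing a redundant predictor can stabilise the remaining estimates.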

5. Limitations

6. Conclusion

The Variance Inflation Factor is a valuable tool for detecting and quantifying multicollinearity in multiple regression. By expressing how much the variance of a coefficient is inflated due to linear dependencies with other predictors, VIF provides insights into the reliability of regression coefficients and guides decisions in model refinement.

Appendix: Derivation of Var(β̂j) = σ²/((1-R²j)SXXj)

We'll derive the expression for the variance of the j-th regression coefficient in multiple linear regression. This derivation is key to understanding the Variance Inflation Factor (VIF).

Step 1: Start with the General Form of Var(β̂)

In matrix notation, the variance of the OLS estimator β̂ is given by:

\[ Var(\hat{\boldsymbol{\beta}}) = \sigma^2(X'X)^{-1} \]

where X is the design matrix and σ² is the error variance.

Step 2: Focus on the j-th Coefficient

The variance of the j-th coefficient, β̂j, is the j-th diagonal element of this matrix:

\[ Var(\hat{\beta}_j) = \sigma^2[(X'X)^{-1}]_{jj} \]

Step 3: Partition the X Matrix

Partition X into Xj (the j-th predictor) and X-j (all other predictors):

\[ X = [X_{-j} \quad X_j] \]

Step 4: Apply the Partitioned Inverse Formula

Using the partitioned inverse formula, we can express [(X'X)⁻¹]jj as:

\[ [(X'X)^{-1}]_{jj} = \frac{1}{X_j'M_{-j}X_j} \]

where M-j = I - X-j(X'-jX-j)⁻¹X'-j is the projection matrix onto the orthogonal complement of the space spanned by X-j.
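
The identity in Step 4 can be checked numerically. The following is a small sketch on simulated data; the design matrix, the choice of `j`, and the induced correlation are assumptions made purely for illustration. The j-th predictor is taken to be the last column, matching the partition in Step 3.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Hypothetical design matrix: intercept plus three predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
X[:, 3] += 0.8 * X[:, 2]            # make the last predictor correlated with another one

j = 3                                # index of X_j (last column)
Xj = X[:, j]
X_others = np.delete(X, j, axis=1)   # X_{-j}

# M_{-j} = I - X_{-j} (X_{-j}' X_{-j})^{-1} X_{-j}'
M = np.eye(n) - X_others @ np.linalg.solve(X_others.T @ X_others, X_others.T)

lhs = np.linalg.inv(X.T @ X)[j, j]   # [(X'X)^{-1}]_{jj}
rhs = 1.0 / (Xj @ M @ Xj)            # 1 / (X_j' M_{-j} X_j)
print(lhs, rhs)
```

The two printed values should agree up to floating-point error.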

Step 5: Interpret M-jXj

M-jXj represents the residuals from regressing Xj on all other predictors. We can write:

\[ M_{-j}X_j = X_j - \hat{X}_j \]

where X̂j is the predicted Xj from its regression on other predictors.
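
To make Step 5 concrete, the sketch below compares M-jXj with the residuals from an explicit least-squares fit of Xj on the other predictors. The simulated variables `others` and `xj` are assumptions for the example only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 150
others = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # X_{-j}, including an intercept
xj = 0.5 * others[:, 1] + rng.normal(size=n)                     # X_j, partly explained by X_{-j}

# Residuals via the matrix M_{-j}
M = np.eye(n) - others @ np.linalg.solve(others.T @ others, others.T)
resid_M = M @ xj

# Residuals via an explicit least-squares regression of X_j on X_{-j}
coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
resid_fit = xj - others @ coef

# Both routes give the same vector, and it is orthogonal to every column of X_{-j}.
print(np.allclose(resid_M, resid_fit))
print(np.allclose(others.T @ resid_M, 0.0, atol=1e-8))
```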

Step 6: Relate to R²j

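The R-squared from regressing Xj on the other predictors compares the residual sum of squares with the total sum of squares of Xj about its mean. Assuming X-j contains an intercept column, that total is SXXj, the sum of squared deviations of Xj from its mean (it equals X'jXj only when Xj is mean-centered). Because M-j is symmetric and idempotent, the residual sum of squares is (M-jXj)'(M-jXj) = X'jM-jXj, so:

\[ R_j^2 = 1 - \frac{(X_j - \hat{X}_j)'(X_j - \hat{X}_j)}{S_{XX_j}} = 1 - \frac{X_j'M_{-j}X_j}{S_{XX_j}} \]

Step 7: Solve for X'j M-j Xj

Rearranging the R²j equation:

\[ X_j'M_{-j}X_j = (1-R_j^2)S_{XX_j} \]

where SXXj = Σi (xij - x̄j)² is the sum of squared deviations of Xj from its mean.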

Step 8: Combine Results

Substituting back into the variance formula:

\[ Var(\hat{\beta}_j) = \sigma^2[(X'X)^{-1}]_{jj} = \sigma^2 \frac{1}{X_j'M_{-j}X_j} = \frac{\sigma^2}{(1-R_j^2)S_{XX_j}} \]

Conclusion

We have derived the expression:

\[ Var(\hat{\beta}_j) = \frac{\sigma^2}{(1-R_j^2)S_{XX_j}} \]

This formula shows how the variance of β̂j is inflated by a factor of 1/(1-R²j) due to multicollinearity, which is precisely the Variance Inflation Factor (VIF).
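
The final result can be verified end to end on simulated data. The sketch below is illustrative only: the data, the value of `sigma2`, and the helper `r_squared_aux` are assumptions for the example. It compares the variance taken directly from σ²(X'X)⁻¹ with the derived formula, and also checks the well-known shortcut that the VIFs equal the diagonal of the inverse correlation matrix of the predictors.

```python
import numpy as np

def r_squared_aux(Z, j):
    """R-squared from regressing column j of Z on an intercept and the other columns."""
    n = Z.shape[0]
    y = Z[:, j]
    A = np.column_stack([np.ones(n), np.delete(Z, j, axis=1)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
n, p = 300, 3
Z = rng.normal(size=(n, p))
Z[:, 2] += 0.9 * Z[:, 0]                  # make predictor 2 correlated with predictor 0
X = np.column_stack([np.ones(n), Z])      # design matrix with intercept in column 0
sigma2 = 1.5                              # assumed error variance, for illustration only

XtX_inv = np.linalg.inv(X.T @ X)
r2 = np.array([r_squared_aux(Z, j) for j in range(p)])
s_xx = np.sum((Z - Z.mean(axis=0)) ** 2, axis=0)

var_direct = sigma2 * np.diag(XtX_inv)[1:]       # sigma^2 [(X'X)^{-1}]_{jj}, skipping the intercept
var_formula = sigma2 / ((1.0 - r2) * s_xx)       # sigma^2 / ((1 - R_j^2) S_XXj)

vif_aux = 1.0 / (1.0 - r2)                                        # VIF from auxiliary regressions
vif_corr = np.diag(np.linalg.inv(np.corrcoef(Z, rowvar=False)))   # VIF from the correlation matrix

print(np.allclose(var_direct, var_formula), np.allclose(vif_aux, vif_corr))
```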