Weighted Least Squares (WLS) and White's Standard Errors

1. Weighted Least Squares (WLS)

Weighted Least Squares is a generalization of Ordinary Least Squares (OLS) for settings where the standard assumption of constant error variance (homoscedasticity) does not hold.

Key Idea:

In WLS, we give each observation a weight that is inversely proportional to its variance. Observations with higher variance (less reliable) receive lower weight, while observations with lower variance (more reliable) receive higher weight. For example, an observation whose error variance is four times that of another receives one quarter of its weight.

Mathematical Formulation:

Consider the linear model:

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \]

where \(\text{Var}(\boldsymbol{\varepsilon}) = \boldsymbol{\Omega} = \text{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)\)

The WLS estimator is given by:

\[ \hat{\boldsymbol{\beta}}_{WLS} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{y} \]

where \(\mathbf{W} = \boldsymbol{\Omega}^{-1} = \text{diag}(1/\sigma_1^2, 1/\sigma_2^2, \ldots, 1/\sigma_n^2)\)
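
As a concrete illustration, here is a minimal Python sketch that computes the estimator from the closed form above and cross-checks it against statsmodels, whose WLS takes weights proportional to \(1/\sigma_i^2\). The data are simulated and the variances are assumed known; all names are illustrative:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(1, 10, size=200)
    X = sm.add_constant(x)                              # design matrix [1, x]

    # Heteroscedastic errors: variance grows with x (taken as known here)
    sigma2 = 0.5 * x**2
    y = X @ np.array([2.0, 3.0]) + rng.normal(0, np.sqrt(sigma2))

    # Closed form: (X' W X)^{-1} X' W y with W = diag(1/sigma_i^2)
    W = np.diag(1.0 / sigma2)
    beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

    # statsmodels' WLS with weights 1/sigma_i^2 should agree
    res = sm.WLS(y, X, weights=1.0 / sigma2).fit()
    print(beta_wls, res.params)

(For large n, forming the dense diagonal \(\mathbf{W}\) is wasteful; practical implementations instead scale the rows of \(\mathbf{X}\) and \(\mathbf{y}\) by \(1/\sigma_i\), as in the Appendix.)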

Properties:

  1. WLS is the Best Linear Unbiased Estimator (BLUE) when the weights are correctly specified.
  2. The variance of the WLS estimator is \(\text{Var}(\hat{\boldsymbol{\beta}}_{WLS}) = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\) (a one-line derivation follows this list).
  3. WLS can be seen as transforming the original model to achieve homoscedasticity (proved in the Appendix below).
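
Property 2 follows from a short calculation: substituting the model into the estimator gives \(\hat{\boldsymbol{\beta}}_{WLS} = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\varepsilon}\), so

\[ \begin{aligned} \text{Var}(\hat{\boldsymbol{\beta}}_{WLS}) &= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\Omega}\mathbf{W}\mathbf{X}(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \\ &= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{X}(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \\ &= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \end{aligned} \]

where the middle step uses \(\mathbf{W}\boldsymbol{\Omega}\mathbf{W} = \mathbf{W}\).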

Challenges:

The main challenge in applying WLS is knowing the correct weights. In practice the variances \(\sigma_i^2\) are rarely known and must be estimated, which introduces additional uncertainty; this two-step approach is often called feasible WLS. One common recipe is sketched below.
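
Below is a minimal sketch of one standard feasible-WLS recipe: fit OLS, regress the log of the squared residuals on the regressors to estimate a variance function, then reweight. This is one common choice among several, and the data and names are illustrative:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.uniform(1, 10, size=500)
    X = sm.add_constant(x)
    y = X @ np.array([2.0, 3.0]) + rng.normal(0, 0.7 * x)   # variance grows with x

    # Step 1: OLS to obtain residuals
    ols = sm.OLS(y, X).fit()

    # Step 2: estimate a variance function by regressing log(e_i^2) on X
    aux = sm.OLS(np.log(ols.resid**2), X).fit()
    sigma2_hat = np.exp(aux.fittedvalues)

    # Step 3: rerun WLS with the estimated weights
    fwls = sm.WLS(y, X, weights=1.0 / sigma2_hat).fit()
    print(fwls.params, fwls.bse)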

2. White's Standard Errors

White's standard errors, also known as heteroscedasticity-consistent (HC) standard errors, were introduced by Halbert White in 1980. They provide a way to obtain consistent estimates of standard errors when heteroscedasticity is present, without having to specify the form of the heteroscedasticity.

Key Idea:

Instead of trying to model the heteroscedasticity explicitly (as in WLS), White's approach uses the observed residuals to estimate the variance of the OLS estimator under heteroscedasticity.

Mathematical Formulation:

The variance-covariance matrix of the OLS estimator under heteroscedasticity is:

\[ \text{Var}(\hat{\boldsymbol{\beta}}_{OLS}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\Omega}\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \]
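
This sandwich form follows from substituting the model into the estimator, \(\hat{\boldsymbol{\beta}}_{OLS} = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\), and taking the variance. Under homoscedasticity, \(\boldsymbol{\Omega} = \sigma^2\mathbf{I}\), it collapses to the familiar \(\sigma^2(\mathbf{X}'\mathbf{X})^{-1}\).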

White's estimator replaces \(\boldsymbol{\Omega}\) with a diagonal matrix of squared OLS residuals:

\[ \widehat{\text{Var}}(\hat{\boldsymbol{\beta}}_{OLS}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\text{diag}(\hat{\varepsilon}_1^2, \hat{\varepsilon}_2^2, \ldots, \hat{\varepsilon}_n^2)\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \]

where \(\hat{\varepsilon}_i\) are the OLS residuals.
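
As an illustration, here is a minimal Python sketch that builds this sandwich directly and cross-checks it against statsmodels, which exposes the same quantity as the HC0_se attribute of an OLS fit (the data and names are illustrative):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = rng.uniform(1, 10, size=300)
    X = sm.add_constant(x)
    y = X @ np.array([2.0, 3.0]) + rng.normal(0, 0.5 * x)   # heteroscedastic errors

    ols = sm.OLS(y, X).fit()
    e = ols.resid

    # (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}, avoiding the explicit n x n diagonal matrix
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (e**2)[:, None])                      # X' diag(e^2) X via row scaling
    cov_white = XtX_inv @ meat @ XtX_inv

    print(np.sqrt(np.diag(cov_white)))                      # White's standard errors
    print(ols.HC0_se)                                       # should match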

Properties:

  1. White's standard errors are consistent even under heteroscedasticity of unknown form.
  2. They allow for valid inference (t-tests, confidence intervals) in the presence of heteroscedasticity.
  3. They are widely implemented in statistical software packages.
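
For example, statsmodels lets you request a robust covariance at fit time, so that the reported t-statistics and confidence intervals use it automatically. A minimal sketch with illustrative data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    x = rng.uniform(1, 10, size=200)
    X = sm.add_constant(x)
    y = X @ np.array([2.0, 3.0]) + rng.normal(0, 0.5 * x)

    # cov_type='HC0' is White's original estimator; 'HC1'-'HC3' add small-sample corrections
    res = sm.OLS(y, X).fit(cov_type='HC0')
    print(res.bse)                            # robust standard errors
    print(res.summary())                      # inference tables use the robust covariance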

Limitations:

  1. In small samples, White's standard errors can be biased downwards; finite-sample corrections (the HC1, HC2, and HC3 variants) are commonly used to mitigate this.
  2. They may be less efficient than correctly specified WLS if the true form of heteroscedasticity is known.

Comparison: WLS vs. White's Standard Errors

  1. Point estimates: WLS changes the coefficient estimates themselves, whereas White's approach keeps the OLS point estimates and corrects only their standard errors.
  2. Assumptions: WLS requires the form of the heteroscedasticity (the weights) to be specified or estimated; White's standard errors require no such specification.
  3. Efficiency: when the weights are correctly specified, WLS is BLUE and therefore more efficient; White's approach trades efficiency for robustness to heteroscedasticity of unknown form.

In short, WLS is preferable when a credible model for the error variances is available, while White's standard errors are the safer default when it is not.

Appendix: Proof that WLS can be transformed into OLS

This proof demonstrates that a Weighted Least Squares problem can be transformed into an Ordinary Least Squares problem through a simple transformation of variables.

Given:

Consider the linear model:

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \]

where \(\text{Var}(\boldsymbol{\varepsilon}) = \boldsymbol{\Omega} = \text{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)\)

Proof:

  1. Define the weight matrix \(\mathbf{W}\) as the inverse of the variance matrix:

    \[ \mathbf{W} = \boldsymbol{\Omega}^{-1} = \text{diag}(1/\sigma_1^2, 1/\sigma_2^2, \ldots, 1/\sigma_n^2) \]
  2. Define a transformation matrix \(\mathbf{P}\) such that \(\mathbf{P}'\mathbf{P} = \mathbf{W}\). Since \(\mathbf{W}\) is diagonal, we can choose:

    \[ \mathbf{P} = \text{diag}(1/\sigma_1, 1/\sigma_2, \ldots, 1/\sigma_n) \]

    which satisfies \(\mathbf{P}'\mathbf{P} = \mathbf{P}^2 = \mathbf{W}\) because \(\mathbf{P}\) is diagonal and symmetric.
  3. Transform the original model by pre-multiplying both sides by \(\mathbf{P}\):

    \[ \mathbf{P}\mathbf{y} = \mathbf{P}\mathbf{X}\boldsymbol{\beta} + \mathbf{P}\boldsymbol{\varepsilon} \]
  4. Define new variables:

    \[ \mathbf{y}^* = \mathbf{P}\mathbf{y}, \quad \mathbf{X}^* = \mathbf{P}\mathbf{X}, \quad \boldsymbol{\varepsilon}^* = \mathbf{P}\boldsymbol{\varepsilon} \]
  5. The transformed model is now:

    \[ \mathbf{y}^* = \mathbf{X}^*\boldsymbol{\beta} + \boldsymbol{\varepsilon}^* \]
  6. Check the variance of the transformed error term:

    \[ \begin{aligned} \text{Var}(\boldsymbol{\varepsilon}^*) &= \text{Var}(\mathbf{P}\boldsymbol{\varepsilon}) \\ &= \mathbf{P}\text{Var}(\boldsymbol{\varepsilon})\mathbf{P}' \\ &= \mathbf{P}\boldsymbol{\Omega}\mathbf{P}' \\ &= \mathbf{P}\mathbf{W}^{-1}\mathbf{P}' \\ &= \mathbf{P}(\mathbf{P}'\mathbf{P})^{-1}\mathbf{P}' \\ &= \mathbf{P}\mathbf{P}^{-1}(\mathbf{P}')^{-1}\mathbf{P}' \\ &= \mathbf{I} \end{aligned} \]
  7. The transformed model now has homoscedastic errors (constant variance).

  8. Apply OLS to the transformed model:

    \[ \begin{aligned} \hat{\boldsymbol{\beta}}_{OLS} &= (\mathbf{X}^{*'}\mathbf{X}^*)^{-1}\mathbf{X}^{*'}\mathbf{y}^* \\ &= ((\mathbf{P}\mathbf{X})'(\mathbf{P}\mathbf{X}))^{-1}(\mathbf{P}\mathbf{X})'\mathbf{P}\mathbf{y} \\ &= (\mathbf{X}'\mathbf{P}'\mathbf{P}\mathbf{X})^{-1}\mathbf{X}'\mathbf{P}'\mathbf{P}\mathbf{y} \\ &= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{y} \end{aligned} \]
  9. This final expression is identical to the WLS estimator.
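
A quick numerical check of this equivalence (a minimal sketch with simulated data; all names are illustrative):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 100
    X = np.column_stack([np.ones(n), rng.uniform(1, 10, size=n)])
    sigma = rng.uniform(0.5, 2.0, size=n)                   # per-observation std. deviations
    y = X @ np.array([2.0, 3.0]) + rng.normal(0, sigma)

    # Direct WLS: (X' W X)^{-1} X' W y
    W = np.diag(1.0 / sigma**2)
    beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

    # OLS on the transformed model: y* = P y, X* = P X with P = diag(1/sigma_i)
    y_star = y / sigma
    X_star = X / sigma[:, None]
    beta_ols_star, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

    print(np.allclose(beta_wls, beta_ols_star))             # True: the estimators coincide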

Conclusion:

We have shown that by applying an appropriate transformation, a WLS problem can be converted into an OLS problem. The OLS estimator of the transformed model is identical to the WLS estimator of the original model. This proof demonstrates why WLS is the Best Linear Unbiased Estimator (BLUE) when the weights are correctly specified, as it reduces to OLS in a transformed space where the Gauss-Markov assumptions hold.