Distribution of OLS Estimator: Normal Errors vs. i.i.d. Errors

Part 1: Normality of OLS Estimator with Normal Errors

Theorem: Given a linear regression model with normally distributed errors, the Ordinary Least Squares (OLS) estimator follows a normal distribution.

Proof:

1. Consider the linear regression model:

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \]

where \(\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})\).

2. The OLS estimator is given by:

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \]

3. Substituting the model equation:

\[ \begin{aligned} \hat{\boldsymbol{\beta}} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}) \\ &= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon} \end{aligned} \]

4. The term \((\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\) is a linear transformation of \(\boldsymbol{\varepsilon}\): each of its components is a linear combination of the normally distributed elements of \(\boldsymbol{\varepsilon}\).

5. A key property of the multivariate normal distribution is that any linear transformation of a normal random vector is itself normally distributed.

6. Therefore, \((\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\) follows a multivariate normal distribution.

7. We can characterize this distribution:

\[ \begin{aligned} \mathbb{E}[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}] &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbb{E}[\boldsymbol{\varepsilon}] = \mathbf{0} \\ \text{Var}[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}] &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\text{Var}(\boldsymbol{\varepsilon})\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \\ &= \sigma^2(\mathbf{X}'\mathbf{X})^{-1} \end{aligned} \]

8. Thus, \((\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2(\mathbf{X}'\mathbf{X})^{-1})\)

9. Since \(\hat{\boldsymbol{\beta}}\) is the sum of a constant vector \(\boldsymbol{\beta}\) and a normally distributed random vector, it follows a normal distribution with a shifted mean:

\[ \hat{\boldsymbol{\beta}} \sim \mathcal{N}(\boldsymbol{\beta}, \sigma^2(\mathbf{X}'\mathbf{X})^{-1}) \]

Conclusion: The OLS estimator \(\hat{\boldsymbol{\beta}}\) is normally distributed around the true parameter \(\boldsymbol{\beta}\), with variance-covariance matrix \(\sigma^2(\mathbf{X}'\mathbf{X})^{-1}\), given that the errors are normally distributed.
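This finite-sample result can be checked by simulation. Below is a minimal NumPy sketch; the design matrix, true coefficients, error standard deviation, and replication count are illustrative assumptions, not part of the derivation. It repeatedly draws normal errors for a fixed design, recomputes \(\hat{\boldsymbol{\beta}}\), and compares the empirical covariance of the estimates with \(\sigma^2(\mathbf{X}'\mathbf{X})^{-1}\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (assumed for the demo): fixed design with an intercept column
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, -2.0, 0.5])   # true coefficients
sigma = 1.5

reps = 20_000
betas = np.empty((reps, k))
for r in range(reps):
    eps = rng.normal(0.0, sigma, size=n)          # normal errors
    y = X @ beta + eps
    betas[r] = np.linalg.solve(X.T @ X, X.T @ y)  # OLS estimate

theory = sigma**2 * np.linalg.inv(X.T @ X)  # Var(beta_hat) = sigma^2 (X'X)^{-1}
print("empirical mean of beta_hat:", betas.mean(axis=0))                 # ~ beta
print("max |empirical cov - theory|:", np.abs(np.cov(betas.T) - theory).max())
```

Because the errors are normal, each coordinate of the simulated \(\hat{\boldsymbol{\beta}}\) is exactly normal at this fixed \(n\), not merely approximately so.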

Part 2: Distribution of OLS Estimator with i.i.d. Errors (not necessarily normal)

When we only assume that the errors are independent and identically distributed (i.i.d.) but not necessarily normal, the OLS estimator is no longer exactly normal in finite samples. Instead, the Central Limit Theorem (CLT) characterizes its asymptotic distribution.

Assumptions:

  1. The errors \(\varepsilon_i\) are i.i.d. with \(\mathbb{E}[\varepsilon_i] = 0\) and finite variance \(\text{Var}(\varepsilon_i) = \sigma^2 < \infty\).
  2. The matrix \(\frac{1}{n}\mathbf{X}'\mathbf{X}\) converges to a finite, positive definite matrix \(\mathbf{Q}\) as \(n \to \infty\).

Result:

Under these assumptions, the OLS estimator is asymptotically normal:

\[ \sqrt{n}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \xrightarrow{d} \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{Q}^{-1}) \]

where \(\mathbf{Q} = \lim_{n \to \infty} \frac{1}{n}\mathbf{X}'\mathbf{X}\) and \(\xrightarrow{d}\) denotes convergence in distribution.

Explanation:

  1. The OLS estimator can still be written as \(\hat{\boldsymbol{\beta}} = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\).
  2. Writing \(\sqrt{n}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = \left(\frac{1}{n}\mathbf{X}'\mathbf{X}\right)^{-1}\frac{1}{\sqrt{n}}\mathbf{X}'\boldsymbol{\varepsilon}\), the second factor is the normalized sum \(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\mathbf{x}_i\varepsilon_i\), where \(\mathbf{x}_i\) is the \(i\)-th row of \(\mathbf{X}\) and the terms \(\mathbf{x}_i\varepsilon_i\) are independent with mean zero.
  3. By the CLT, this normalized sum converges in distribution to \(\mathcal{N}(\mathbf{0}, \sigma^2\mathbf{Q})\) as the sample size increases; since \(\left(\frac{1}{n}\mathbf{X}'\mathbf{X}\right)^{-1} \to \mathbf{Q}^{-1}\), Slutsky's theorem yields the stated limit.
  4. The asymptotic normality holds regardless of the specific distribution of the errors, as long as they are i.i.d. with finite variance; the simulation sketch below illustrates this.
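To see the CLT at work, here is a minimal sketch under one arbitrary non-normal choice, centered exponential errors; the design, sample sizes, and replication count are illustrative assumptions. It standardizes the slope estimates by their theoretical standard deviation and reports their skewness, which should shrink toward zero (the normal value) as \(n\) grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def standardized_slopes(n, reps, rng):
    """OLS slope estimates under i.i.d. centered-exponential errors,
    standardized by the theoretical sd sqrt(sigma^2 [(X'X)^{-1}]_{11})."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])  # illustrative fixed design
    beta = np.array([1.0, 2.0])
    sd = np.sqrt(np.linalg.inv(X.T @ X)[1, 1])  # sigma^2 = 1 for exponential(1) - 1
    z = np.empty(reps)
    for r in range(reps):
        eps = rng.exponential(1.0, size=n) - 1.0  # i.i.d., mean 0, variance 1, skewed
        y = X @ beta + eps
        z[r] = (np.linalg.solve(X.T @ X, X.T @ y)[1] - beta[1]) / sd
    return z

for n in (10, 1000):
    z = standardized_slopes(n, 10_000, rng)
    # Third central moment of z (~ skewness, since Var(z) is ~ 1 by construction)
    print(n, "skewness:", round(float(((z - z.mean()) ** 3).mean()), 3))
```

At small \(n\) the exact distribution of \(\hat{\boldsymbol{\beta}}\) inherits the skew of the errors; only the limit is normal.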

Key Differences:

  1. Finite Sample vs. Asymptotic: With normal errors, the OLS estimator is exactly normally distributed for any sample size. With i.i.d. errors, normality is an asymptotic result.
  2. Convergence Rate: The \(\sqrt{n}\) scaling reflects that \(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\) shrinks at rate \(n^{-1/2}\); the difference must be inflated by \(\sqrt{n}\) to obtain a non-degenerate limiting distribution.
  3. Robustness: The asymptotic result is more robust as it holds for a wider class of error distributions.
  4. Inference: For small samples with non-normal errors, standard inferential procedures based on normality may not be reliable; the coverage sketch below gives one way to probe this.
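The following sketch (the design, error law, and sample sizes are all illustrative assumptions) estimates how often a nominal 95% t-interval for the slope covers the true value when the errors are skewed rather than normal. By the asymptotic result, the empirical coverage should approach 0.95 as \(n\) grows, while at small \(n\) it need not match the nominal level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def coverage(n, reps=20_000):
    """Empirical coverage of a nominal 95% t-interval for the slope
    under i.i.d. centered-exponential (skewed, non-normal) errors."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])  # illustrative design
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = np.array([0.0, 1.0])
    tcrit = stats.t.ppf(0.975, df=n - 2)
    hits = 0
    for _ in range(reps):
        eps = rng.exponential(1.0, size=n) - 1.0
        y = X @ beta + eps
        bhat = xtx_inv @ X.T @ y
        s2 = np.sum((y - X @ bhat) ** 2) / (n - 2)   # unbiased sigma^2 estimate
        se = np.sqrt(s2 * xtx_inv[1, 1])
        hits += abs(bhat[1] - beta[1]) <= tcrit * se
    return hits / reps

for n in (10, 500):
    print(n, coverage(n))
```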

Appendix: Proof of Variance of OLS Estimator

Theorem: For the OLS estimator \(\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\) with \(\text{Var}(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}\), the variance is given by:

\[ \text{Var}(\hat{\boldsymbol{\beta}}) = \text{Var}[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\text{Var}(\boldsymbol{\varepsilon})\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1} \]

Proof:

  1. Recall that \(\hat{\boldsymbol{\beta}} = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\).
  2. Since \(\boldsymbol{\beta}\) is a constant, \(\text{Var}(\hat{\boldsymbol{\beta}}) = \text{Var}[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}]\).
  3. Let \(\mathbf{A} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\). We need to find \(\text{Var}(\mathbf{A}\boldsymbol{\varepsilon})\).
  4. For a random vector \(\mathbf{z}\) and a constant matrix \(\mathbf{B}\), we have the property: \(\text{Var}(\mathbf{B}\mathbf{z}) = \mathbf{B}\text{Var}(\mathbf{z})\mathbf{B}'\).
  5. Applying this property:
    \[ \begin{aligned} \text{Var}(\mathbf{A}\boldsymbol{\varepsilon}) &= \mathbf{A}\text{Var}(\boldsymbol{\varepsilon})\mathbf{A}' \\ &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\text{Var}(\boldsymbol{\varepsilon})\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \end{aligned} \]
  6. We assume \(\text{Var}(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}\), where \(\mathbf{I}\) is the identity matrix. Substituting this:
    \[ \begin{aligned} \text{Var}(\mathbf{A}\boldsymbol{\varepsilon}) &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\sigma^2\mathbf{I}\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \\ &= \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \\ &= \sigma^2(\mathbf{X}'\mathbf{X})^{-1} \end{aligned} \]

Conclusion: We have proven that:

\[ \text{Var}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\text{Var}(\boldsymbol{\varepsilon})\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1} \]

This result is crucial in determining the precision of our OLS estimates and in constructing confidence intervals and hypothesis tests for the regression coefficients.
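As a final sanity check, the collapse of the sandwich form \(\mathbf{A}\,\sigma^2\mathbf{I}\,\mathbf{A}' = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\) can be confirmed numerically for any full-column-rank design; the sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

n, k = 30, 4
X = rng.normal(size=(n, k))      # any full-column-rank design works
sigma2 = 2.0

A = np.linalg.inv(X.T @ X) @ X.T             # A = (X'X)^{-1} X'
sandwich = A @ (sigma2 * np.eye(n)) @ A.T    # A Var(eps) A'
direct = sigma2 * np.linalg.inv(X.T @ X)     # sigma^2 (X'X)^{-1}
print(np.allclose(sandwich, direct))         # True (up to floating point)
```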