Theorem: Given a linear regression model with normally distributed errors, the Ordinary Least Squares (OLS) estimator follows a normal distribution.
Proof:
1. Consider the linear regression model:
\[\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},\]
where \(\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})\).
2. The OLS estimator is given by:
\[\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}.\]
3. Substituting the model equation:
\[\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}) = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}.\]
4. The term \((\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\) is a linear combination of the elements of \(\boldsymbol{\varepsilon}\), which are normally distributed.
5. A key property of the multivariate normal distribution is that any linear transformation of a jointly normal random vector is also normally distributed.
6. Therefore, \((\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\) follows a normal distribution.
7. We can characterize this distribution by its mean and variance:
\[\mathbb{E}\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\right] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\mathbb{E}[\boldsymbol{\varepsilon}] = \mathbf{0},\]
\[\operatorname{Var}\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\right] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\sigma^2\mathbf{I})\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}.\]
8. Thus, \((\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2(\mathbf{X}'\mathbf{X})^{-1})\)
9. Since \(\hat{\boldsymbol{\beta}} = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\) is the sum of the constant vector \(\boldsymbol{\beta}\) and a normally distributed random vector, it follows a normal distribution with a shifted mean:
\[\hat{\boldsymbol{\beta}} \sim \mathcal{N}\left(\boldsymbol{\beta}, \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\right).\]
Conclusion: The OLS estimator \(\hat{\boldsymbol{\beta}}\) is normally distributed around the true parameter \(\boldsymbol{\beta}\), with variance-covariance matrix \(\sigma^2(\mathbf{X}'\mathbf{X})^{-1}\), given that the errors are normally distributed.
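The theorem can be checked numerically. The sketch below (a Monte Carlo simulation; the design matrix, coefficient values, and sample sizes are illustrative choices, not taken from the proof) repeatedly draws normal errors with a fixed \(\mathbf{X}\) and compares the empirical covariance of the OLS estimates to \(\sigma^2(\mathbf{X}'\mathbf{X})^{-1}\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed design matrix: n observations, intercept plus one regressor.
# All numeric values here are illustrative assumptions.
n, n_sims = 200, 5000
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
beta = np.array([1.0, 2.0])   # true coefficients
sigma = 0.5                   # error standard deviation

XtX_inv = np.linalg.inv(X.T @ X)
theoretical_cov = sigma**2 * XtX_inv  # sigma^2 (X'X)^{-1}

# Draw fresh normal errors and recompute the OLS estimate each time.
betas = np.empty((n_sims, 2))
for s in range(n_sims):
    y = X @ beta + rng.normal(0.0, sigma, n)
    betas[s] = XtX_inv @ X.T @ y

print("mean of estimates:", betas.mean(axis=0))        # close to beta
print("empirical cov:\n", np.cov(betas, rowvar=False))  # close to theory
print("theoretical cov:\n", theoretical_cov)
```

With 5,000 replications the empirical covariance matrix should match \(\sigma^2(\mathbf{X}'\mathbf{X})^{-1}\) to within simulation noise, consistent with the exact normality result above.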
When we only assume that the errors are independent and identically distributed (i.i.d.) but not necessarily normal, the distribution of the OLS estimator changes. We can use the Central Limit Theorem (CLT) to characterize its asymptotic distribution.
Assumptions:
1. The errors \(\varepsilon_i\) are i.i.d. with \(\mathbb{E}[\varepsilon_i] = 0\) and \(\operatorname{Var}(\varepsilon_i) = \sigma^2 < \infty\).
2. The limit \(\mathbf{Q} = \lim_{n \to \infty} \frac{1}{n}\mathbf{X}'\mathbf{X}\) exists and is positive definite.
Result:
Under these assumptions, the OLS estimator is asymptotically normal:
\[\sqrt{n}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \xrightarrow{d} \mathcal{N}\left(\mathbf{0}, \sigma^2\mathbf{Q}^{-1}\right),\]
where \(\mathbf{Q} = \lim_{n \to \infty} \frac{1}{n}\mathbf{X}'\mathbf{X}\) and \(\xrightarrow{d}\) denotes convergence in distribution.
Explanation: Writing \(\sqrt{n}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = \left(\frac{1}{n}\mathbf{X}'\mathbf{X}\right)^{-1} \frac{1}{\sqrt{n}}\mathbf{X}'\boldsymbol{\varepsilon}\), the first factor converges to \(\mathbf{Q}^{-1}\), while the CLT applied to \(\frac{1}{\sqrt{n}}\mathbf{X}'\boldsymbol{\varepsilon}\) gives convergence in distribution to \(\mathcal{N}(\mathbf{0}, \sigma^2\mathbf{Q})\). Combining the two via Slutsky's theorem yields the result.
Key Differences: With normal errors, \(\hat{\boldsymbol{\beta}}\) is exactly normal at every sample size \(n\); with merely i.i.d. errors, normality holds only asymptotically, as an approximation that improves as \(n\) grows.
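The asymptotic claim can also be illustrated by simulation. The sketch below (again with illustrative parameter values) deliberately uses skewed, non-normal errors, namely centered exponential draws, so that exact normality fails, and checks that the standardized slope estimates nevertheless behave like \(\mathcal{N}(0,1)\) draws at a moderately large \(n\):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: centered Exp(1) errors (mean 0, variance 1, skewed),
# so the finite-sample distribution of beta-hat is not exactly normal.
n, n_sims = 500, 4000
X = np.column_stack([np.ones(n), rng.normal(0, 1, n)])
beta = np.array([0.5, -1.0])
sigma2 = 1.0  # variance of the centered exponential errors

XtX_inv = np.linalg.inv(X.T @ X)
asymptotic_cov = sigma2 * XtX_inv  # finite-n version of (sigma^2/n) Q^{-1}

betas = np.empty((n_sims, 2))
for s in range(n_sims):
    eps = rng.exponential(1.0, n) - 1.0
    betas[s] = XtX_inv @ X.T @ (X @ beta + eps)

# Standardize the slope estimates; by the CLT they should be roughly N(0, 1).
z = (betas[:, 1] - beta[1]) / np.sqrt(asymptotic_cov[1, 1])
print("mean of z:", z.mean())                          # roughly 0
print("std of z:", z.std())                            # roughly 1
print("P(|z| <= 1.96):", np.mean(np.abs(z) <= 1.96))   # roughly 0.95
```

Despite the skewed errors, the standardized estimates have mean near 0, standard deviation near 1, and about 95% of them fall within \(\pm 1.96\), in line with the asymptotic normal approximation.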
Theorem: For the OLS estimator \(\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\), the variance is given by:
\[\operatorname{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}.\]
Proof:
1. Substituting \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\) gives \(\hat{\boldsymbol{\beta}} = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\).
2. Since \(\boldsymbol{\beta}\) is a constant vector, \(\operatorname{Var}(\hat{\boldsymbol{\beta}}) = \operatorname{Var}\left((\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\right)\).
3. Applying \(\operatorname{Var}(\mathbf{A}\boldsymbol{\varepsilon}) = \mathbf{A}\operatorname{Var}(\boldsymbol{\varepsilon})\mathbf{A}'\) with \(\mathbf{A} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\) and \(\operatorname{Var}(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}\):
\[\operatorname{Var}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\sigma^2\mathbf{I})\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}.\]
Conclusion: We have proven that:
\[\operatorname{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}.\]
This result is crucial in determining the precision of our OLS estimates and in constructing confidence intervals and hypothesis tests for the regression coefficients.
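In practice \(\sigma^2\) is unknown and is replaced by the unbiased residual-based estimate \(s^2 = \frac{\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}}{n-k}\). The sketch below (a minimal example; all data-generating values are illustrative assumptions) plugs \(s^2\) into \(\operatorname{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\) to obtain standard errors and approximate 95% confidence intervals:

```python
import numpy as np

rng = np.random.default_rng(2)

# One simulated data set (illustrative values, not from the text).
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
beta_true = np.array([2.0, 0.3])
sigma = 1.5
y = X @ beta_true + rng.normal(0, sigma, n)

# OLS fit.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Estimate sigma^2 from the residuals (dividing by n - k for unbiasedness),
# then plug into Var(beta-hat) = sigma^2 (X'X)^{-1}.
k = X.shape[1]
resid = y - X @ beta_hat
s2 = resid @ resid / (n - k)
cov_hat = s2 * XtX_inv
se = np.sqrt(np.diag(cov_hat))

# Approximate 95% confidence intervals using the normal critical value 1.96.
lower, upper = beta_hat - 1.96 * se, beta_hat + 1.96 * se
for j in range(k):
    print(f"beta_{j}: {beta_hat[j]:.3f}  SE {se[j]:.3f}  "
          f"95% CI [{lower[j]:.3f}, {upper[j]:.3f}]")
```

The square roots of the diagonal of \(s^2(\mathbf{X}'\mathbf{X})^{-1}\) are exactly the coefficient standard errors reported by standard regression software, which is why this variance result underpins the usual confidence intervals and hypothesis tests.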