Proof of OLS Optimality for IID Data

1. Theorem

For the linear model below with independent and identically distributed (IID) errors, no unbiased estimator formed as a weighted combination of cumulative (prefix) sums of the observations can have smaller variance than the OLS estimator, even when the weights are chosen optimally.

2. Setup

Consider the simple linear regression model:

\[ Y_i = \beta X_i + \varepsilon_i, \quad i = 1, ..., n \]

where the ε_i are IID with mean 0 and variance σ², and the X_i are treated as fixed (all expectations and variances below are conditional on the X_i).
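
To make the setup concrete, here is a minimal simulation sketch. The uniform design, the Gaussian noise, and all parameter values are invented for illustration; the proof itself only uses that the errors have mean 0 and variance σ².

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not from the text): true slope, noise scale, sample size.
beta, sigma, n = 2.0, 1.5, 50

X = rng.uniform(1.0, 3.0, size=n)     # fixed design, kept the same across replications
eps = rng.normal(0.0, sigma, size=n)  # IID errors with mean 0 and variance sigma^2
Y = beta * X + eps                    # the regression model Y_i = beta * X_i + eps_i
```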

3. OLS Estimator

The OLS estimator is:

\[ \hat{\beta}_{OLS} = \frac{\sum_{i=1}^n X_i Y_i}{\sum_{i=1}^n X_i^2} \]
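
As a sanity check, the sketch below computes β̂_OLS over many simulated replications and compares the empirical variance with σ²/Σ X_i². The design, parameter values, and the helper name `beta_ols` are illustrative assumptions, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(1)
beta, sigma, n, reps = 2.0, 1.5, 50, 20_000   # illustrative values

X = rng.uniform(1.0, 3.0, size=n)             # fixed design across replications

def beta_ols(X, Y):
    # OLS estimator: sum(X_i * Y_i) / sum(X_i^2)
    return np.sum(X * Y) / np.sum(X ** 2)

estimates = np.empty(reps)
for r in range(reps):
    Y = beta * X + rng.normal(0.0, sigma, size=n)
    estimates[r] = beta_ols(X, Y)

print("empirical variance:   ", estimates.var())
print("sigma^2 / sum(X_i^2): ", sigma ** 2 / np.sum(X ** 2))
```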

4. Weighted Cumulative Sum Estimator

Consider an estimator of the form:

\[ \hat{\beta}_{W} = \frac{\sum_{i=1}^n w_i S_i}{\sum_{i=1}^n X_i^2} \]

where S_i = Σ_{j=1}^{i} X_j Y_j is the cumulative (prefix) sum of the first i terms and the w_i are fixed weights.
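
Because each S_i contains all earlier terms, the order of summation can be swapped: Σ_i w_i S_i = Σ_j c_j X_j Y_j, where c_j = Σ_{i≥j} w_i is the tail sum of the weights. The sketch below checks this identity numerically; the particular weights, design, and variable names are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(2)
beta, sigma, n = 2.0, 1.5, 50                   # illustrative values
X = rng.uniform(1.0, 3.0, size=n)
Y = beta * X + rng.normal(0.0, sigma, size=n)
w = rng.dirichlet(np.ones(n))                   # arbitrary nonnegative weights (here summing
                                                # to 1, though the identity holds for any weights)

S = np.cumsum(X * Y)                            # S_i = sum_{j<=i} X_j Y_j
beta_weighted = np.sum(w * S) / np.sum(X ** 2)  # the weighted cumulative sum estimator

# Swapping the summation order: sum_i w_i S_i = sum_j c_j X_j Y_j with c_j = sum_{i>=j} w_i.
c = np.cumsum(w[::-1])[::-1]                    # tail sums of the weights
same_thing = np.sum(c * X * Y) / np.sum(X ** 2)
print(np.isclose(beta_weighted, same_thing))    # True: the two expressions agree
```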

5. Proof

  1. Interchanging the order of summation rewrites the numerator of β̂_W as an ordinary linear combination:
    \[ \sum_{i=1}^n w_i S_i = \sum_{i=1}^n w_i \sum_{j=1}^i X_j Y_j = \sum_{j=1}^n c_j X_j Y_j, \quad \text{where } c_j = \sum_{i=j}^n w_i \]
    so the coefficient on X_j Y_j is the tail sum c_j of the weights. (The S_i overlap and are therefore correlated; working with the c_j avoids that complication.)
  2. Unbiasedness: E[β̂_OLS] = β always, while
    \[ E[\hat{\beta}_{W}] = \beta \, \frac{\sum_{j=1}^n c_j X_j^2}{\sum_{j=1}^n X_j^2} \]
    so β̂_W is unbiased exactly when Σ_j c_j X_j² = Σ_j X_j².
  3. Variance of the OLS estimator:
    \[ Var(\hat{\beta}_{OLS}) = \frac{\sigma^2}{\sum_{i=1}^n X_i^2} \]
  4. Variance of the weighted estimator (the ε_j are independent, so the cross terms vanish):
    \[ Var(\hat{\beta}_{W}) = \frac{\sigma^2 \sum_{j=1}^n c_j^2 X_j^2}{\left(\sum_{i=1}^n X_i^2\right)^2} \]
  5. The difference in variances is:
    \[ Var(\hat{\beta}_{W}) - Var(\hat{\beta}_{OLS}) = \frac{\sigma^2}{\left(\sum_{i=1}^n X_i^2\right)^2} \left(\sum_{j=1}^n c_j^2 X_j^2 - \sum_{j=1}^n X_j^2\right) \]
  6. By the Cauchy-Schwarz inequality, applied to the vectors (c_j X_j) and (X_j):
    \[ \left(\sum_{j=1}^n c_j X_j^2\right)^2 \leq \left(\sum_{j=1}^n c_j^2 X_j^2\right) \left(\sum_{j=1}^n X_j^2\right) \]
  7. Substituting the unbiasedness condition Σ_j c_j X_j² = Σ_j X_j² into the left-hand side and dividing by Σ_j X_j² gives:
    \[ \sum_{j=1}^n c_j^2 X_j^2 \geq \sum_{j=1}^n X_j^2 \]
  8. Therefore, Var(β̂_W) - Var(β̂_OLS) ≥ 0 for any unbiased choice of weights, with equality only when every c_j equals 1 (for X_j ≠ 0), i.e. when w_n = 1 and w_1 = ... = w_{n-1} = 0, in which case β̂_W is exactly the OLS estimator. No weighted cumulative sum estimator can do better; a numerical check follows below.
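
As a rough numerical confirmation, the sketch below draws many coefficient vectors c_j, rescales each to satisfy the unbiasedness condition from step 2, and checks that the resulting variance never falls below the OLS variance. The design and parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, n = 1.5, 50                                 # illustrative values
X = rng.uniform(1.0, 3.0, size=n)
sum_x2 = np.sum(X ** 2)
var_ols = sigma ** 2 / sum_x2                      # variance of the OLS estimator

worst_gap = np.inf
for _ in range(10_000):
    c = rng.uniform(0.0, 2.0, size=n)              # arbitrary tail-sum coefficients c_j
    c *= sum_x2 / np.sum(c * X ** 2)               # rescale so sum_j c_j X_j^2 = sum_j X_j^2
    var_w = sigma ** 2 * np.sum(c ** 2 * X ** 2) / sum_x2 ** 2
    worst_gap = min(worst_gap, var_w - var_ols)

# Never negative (up to floating-point rounding), as the proof predicts.
print("smallest Var(beta_W) - Var(beta_OLS) found:", worst_gap)
```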