In multiple linear regression with an intercept, the model for observation i is yᵢ = β₀ + β₁xᵢ₁ + ⋯ + βₖxᵢₖ + εᵢ, where k is the number of predictors.
The sum of squared errors (SSE) is minimized: SSE = Σᵢ (yᵢ - β₀ - β₁xᵢ₁ - ⋯ - βₖxᵢₖ)².
Taking the partial derivative of SSE with respect to β₀ and setting it to zero gives ∂SSE/∂β₀ = -2 Σᵢ (yᵢ - ŷᵢ) = 0, where ŷᵢ is the fitted value for observation i.
This implies: Σᵢ eᵢ = 0, i.e. when an intercept is included, the residuals eᵢ = yᵢ - ŷᵢ sum to zero.
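Spelled out, the first-order condition for the intercept reads as follows; this only restates the step above in display form, with hats marking the values at the minimum:

\[
\begin{aligned}
\mathrm{SSE}(\beta_0,\beta_1,\dots,\beta_k) &= \sum_{i=1}^{n}\bigl(y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik}\bigr)^2,\\
\frac{\partial\,\mathrm{SSE}}{\partial \beta_0}\bigg|_{\hat\beta} &= -2\sum_{i=1}^{n}\bigl(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_k x_{ik}\bigr) = -2\sum_{i=1}^{n} e_i = 0
\quad\Longrightarrow\quad \sum_{i=1}^{n} e_i = 0.
\end{aligned}
\]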
The normal equations for OLS are: X'Xβ̂ = X'y, equivalently X'(y - Xβ̂) = 0.
The term (y - Xβ̂) is the residual vector e, so the normal equations can be written as X'e = 0. Including a vector of ones in X for the intercept means the first of these equations is 1'e = 0, which is exactly Σᵢ eᵢ = 0.
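The same conclusion in matrix form: partitioning X into the intercept column and the k predictor columns (the partition is only notation), the first component of X'e = 0 is precisely the sum of the residuals:

\[
X = \bigl[\,\mathbf{1}\;\;x_1\;\;\cdots\;\;x_k\,\bigr],
\qquad
X'e = \begin{pmatrix} \mathbf{1}'e \\ x_1'e \\ \vdots \\ x_k'e \end{pmatrix} = \mathbf{0}
\quad\Longrightarrow\quad
\mathbf{1}'e = \sum_{i=1}^{n} e_i = 0 .
\]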
To prove that the residuals lie in the orthogonal complement of the column space of X, we need to show that they are orthogonal to all columns of X.
1. Start with the normal equations: X'(y - Xβ̂) = 0.
2. Recognize that (y - Xβ̂) is the residual vector e, so the normal equations become X'e = 0.
3. This equation means that e is orthogonal to every column of X, as the dot product of e with each column of X is zero.
4. In geometric terms, this means that the residual vector e is perpendicular to the subspace spanned by the columns of X.
5. Therefore, e lies in the orthogonal complement of the column space of X (this step is made precise just below).
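Steps 4 and 5 can be made precise in one line: every vector in the column space of X can be written as Xa for some coefficient vector a (the symbol a is introduced here only for this argument), and any such vector is orthogonal to e:

\[
v = Xa \;\Longrightarrow\; v'e = a'X'e = a'\mathbf{0} = 0,
\qquad\text{so}\quad e \perp \operatorname{col}(X), \quad\text{i.e.}\quad e \in \operatorname{col}(X)^{\perp}.
\]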
This orthogonality property has important implications: the fitted values ŷ = Xβ̂ lie in the column space of X, so they too are orthogonal to the residuals, which underlies the decomposition of the sum of squares (a short sketch follows below).
Together, these proofs demonstrate that when an intercept is included the residuals sum to zero, and that the residual vector is orthogonal to every column of X.
These properties are fundamental to understanding the behavior of linear regression models and are crucial for various model diagnostics and inferential procedures.
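A short sketch of that consequence, using only X'e = 0; the Pythagorean split of y'y is a standard corollary rather than something stated explicitly above:

\[
\hat{y}'e = (X\hat\beta)'e = \hat\beta'X'e = 0,
\qquad
y = \hat{y} + e \;\Longrightarrow\; y'y = \hat{y}'\hat{y} + e'e .
\]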
Let's start with the linear regression model: y = Xβ + ε,
where y is the n×1 vector of responses, X is the n×p matrix of predictors (including a column of ones for the intercept), β is the p×1 vector of coefficients, and ε is the n×1 vector of errors.
The goal is to find β̂ that minimizes the sum of squared residuals: S(β) = (y - Xβ)'(y - Xβ).
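Expanding the quadratic form makes the differentiation in the next step mechanical; the cross terms combine because β'X'y is a scalar and therefore equals its transpose y'Xβ:

\[
S(\beta) = (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta .
\]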
Steps to derive the normal equations: differentiate S(β) with respect to β, set the gradient to zero, and rearrange; this yields X'Xβ̂ = X'y (the gradient computation is worked out below).
Solving for β̂, assuming X'X is invertible, gives us the OLS estimator: β̂ = (X'X)⁻¹X'y.
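The gradient computation itself, using the standard matrix-calculus identities ∂(β'a)/∂β = a and ∂(β'Aβ)/∂β = 2Aβ for symmetric A:

\[
\frac{\partial S}{\partial \beta} = -2X'y + 2X'X\beta = 0
\quad\Longrightarrow\quad
X'X\hat\beta = X'y
\quad\Longrightarrow\quad
X'(y - X\hat\beta) = X'e = 0 .
\]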
Now that we have derived the normal equations, we can prove that the residuals sum to zero when an intercept is included: the first column of X is the vector of ones, so the first component of X'e = 0 reads 1'e = Σᵢ eᵢ = 0.
The normal equations in the form X'e = 0 also prove that the residuals are orthogonal to all columns of X: each component of X'e is the dot product of one column of X with e, and every one of them is zero.
This orthogonality has important implications: with an intercept included, the residuals are orthogonal to the fitted values, the total sum of squares splits into the explained and residual sums of squares, and the sample correlation between the residuals and each predictor is exactly zero. These properties are easy to check numerically, as in the sketch below.
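A minimal numerical check of these facts; numpy and synthetic data are assumed here, so the specific numbers are only illustrative:

```python
# Verify numerically that OLS residuals sum to zero and are orthogonal to
# every column of X when an intercept column is included.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# OLS fit via least squares (equivalent to solving the normal equations).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat  # residual vector

print("sum of residuals:", e.sum())      # ~0: intercept column forces 1'e = 0
print("X'e:", X.T @ e)                   # ~0 vector: e is orthogonal to each column of X
print("yhat'e:", (X @ beta_hat) @ e)     # ~0: fitted values are orthogonal to e
```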