In multiple linear regression with an intercept, the model for observation i is yᵢ = β₀ + β₁xᵢ₁ + ⋯ + βₖxᵢₖ + εᵢ, where k is the number of predictors.
The sum of squared errors (SSE) is minimized: SSE = Σᵢ (yᵢ - β₀ - β₁xᵢ₁ - ⋯ - βₖxᵢₖ)².
Taking the partial derivative of SSE with respect to β₀ and setting it to zero gives ∂SSE/∂β₀ = -2 Σᵢ (yᵢ - ŷᵢ) = 0, where ŷᵢ is the fitted value for observation i.
This implies: Σᵢ eᵢ = 0, i.e. when an intercept is included, the residuals eᵢ = yᵢ - ŷᵢ sum to zero.
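Spelled out, the first-order condition for the intercept reads as follows; this only restates the step above in display form, with hats marking the values at the minimum:

\[
\begin{aligned}
\mathrm{SSE}(\beta_0,\beta_1,\dots,\beta_k) &= \sum_{i=1}^{n}\bigl(y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik}\bigr)^2,\\
\frac{\partial\,\mathrm{SSE}}{\partial \beta_0}\bigg|_{\hat\beta} &= -2\sum_{i=1}^{n}\bigl(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_k x_{ik}\bigr) = -2\sum_{i=1}^{n} e_i = 0
\quad\Longrightarrow\quad \sum_{i=1}^{n} e_i = 0.
\end{aligned}
\]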
The normal equations for OLS are: X'Xβ̂ = X'y, equivalently X'(y - Xβ̂) = 0.
The term (y - Xβ̂) is the residual vector e, so the normal equations can be written as X'e = 0. Including a vector of ones in X for the intercept means the first of these equations is 1'e = 0, which is exactly Σᵢ eᵢ = 0.
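The same conclusion in matrix form: partitioning X into the intercept column and the k predictor columns (the partition is only notation), the first component of X'e = 0 is precisely the sum of the residuals:

\[
X = \bigl[\,\mathbf{1}\;\;x_1\;\;\cdots\;\;x_k\,\bigr],
\qquad
X'e = \begin{pmatrix} \mathbf{1}'e \\ x_1'e \\ \vdots \\ x_k'e \end{pmatrix} = \mathbf{0}
\quad\Longrightarrow\quad
\mathbf{1}'e = \sum_{i=1}^{n} e_i = 0 .
\]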
To prove that the residuals lie in the orthogonal complement of the column space of X, we need to show that they are orthogonal to all columns of X.
1. Start with the normal equations: X'(y - Xβ̂) = 0.
2. Recognize that (y - Xβ̂) is the residual vector e, so the normal equations become X'e = 0.
3. This equation means that e is orthogonal to every column of X, as the dot product of e with each column of X is zero.
4. In geometric terms, this means that the residual vector e is perpendicular to the subspace spanned by the columns of X.
5. Therefore, e lies in the orthogonal complement of the column space of X (this step is made precise just below).
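Steps 4 and 5 can be made precise in one line: every vector in the column space of X can be written as Xa for some coefficient vector a (the symbol a is introduced here only for this argument), and any such vector is orthogonal to e:

\[
v = Xa \;\Longrightarrow\; v'e = a'X'e = a'\mathbf{0} = 0,
\qquad\text{so}\quad e \perp \operatorname{col}(X), \quad\text{i.e.}\quad e \in \operatorname{col}(X)^{\perp}.
\]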
This orthogonality property has important implications: the fitted values ŷ = Xβ̂ lie in the column space of X, so they too are orthogonal to the residuals, which underlies the decomposition of the sum of squares (a short sketch follows below).
Together, these proofs demonstrate that when an intercept is included the residuals sum to zero, and that the residual vector is orthogonal to every column of X.
These properties are fundamental to understanding the behavior of linear regression models and are crucial for various model diagnostics and inferential procedures.
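A short sketch of that consequence, using only X'e = 0; the Pythagorean split of y'y is a standard corollary rather than something stated explicitly above:

\[
\hat{y}'e = (X\hat\beta)'e = \hat\beta'X'e = 0,
\qquad
y = \hat{y} + e \;\Longrightarrow\; y'y = \hat{y}'\hat{y} + e'e .
\]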
Let's start with the linear regression model: y = Xβ + ε,
where y is the n×1 vector of responses, X is the n×p matrix of predictors (including a column of ones for the intercept), β is the p×1 vector of coefficients, and ε is the n×1 vector of errors.
The goal is to find β̂ that minimizes the sum of squared residuals: S(β) = (y - Xβ)'(y - Xβ).
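Expanding the quadratic form makes the differentiation in the next step mechanical; the cross terms combine because β'X'y is a scalar and therefore equals its transpose y'Xβ:

\[
S(\beta) = (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta .
\]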
Steps to derive the normal equations: differentiate S(β) with respect to β, set the gradient to zero, and rearrange; this yields X'Xβ̂ = X'y (the gradient computation is worked out below).
Solving for β̂, assuming X'X is invertible, gives us the OLS estimator: β̂ = (X'X)⁻¹X'y.
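The gradient computation itself, using the standard matrix-calculus identities ∂(β'a)/∂β = a and ∂(β'Aβ)/∂β = 2Aβ for symmetric A:

\[
\frac{\partial S}{\partial \beta} = -2X'y + 2X'X\beta = 0
\quad\Longrightarrow\quad
X'X\hat\beta = X'y
\quad\Longrightarrow\quad
X'(y - X\hat\beta) = X'e = 0 .
\]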
Now that we have derived the normal equations, we can prove that the residuals sum to zero when an intercept is included: the first column of X is the vector of ones, so the first component of X'e = 0 reads 1'e = Σᵢ eᵢ = 0.
The normal equations in the form X'e = 0 also prove that the residuals are orthogonal to all columns of X: each component of X'e is the dot product of one column of X with e, and every one of them is zero.
This orthogonality has important implications: with an intercept included, the residuals are orthogonal to the fitted values, the total sum of squares splits into the explained and residual sums of squares, and the sample correlation between the residuals and each predictor is exactly zero. These properties are easy to check numerically, as in the sketch below.
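A minimal numerical check of these facts; numpy and synthetic data are assumed here, so the specific numbers are only illustrative:

```python
# Verify numerically that OLS residuals sum to zero and are orthogonal to
# every column of X when an intercept column is included.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# OLS fit via least squares (equivalent to solving the normal equations).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat  # residual vector

print("sum of residuals:", e.sum())      # ~0: intercept column forces 1'e = 0
print("X'e:", X.T @ e)                   # ~0 vector: e is orthogonal to each column of X
print("yhat'e:", (X @ beta_hat) @ e)     # ~0: fitted values are orthogonal to e
```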