Hat Matrix in Linear Regression: Geometric Interpretation and Leverage Points

1. Definition of the Hat Matrix

In linear regression, the hat matrix H is defined as:

\[ \mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \]

where X is the n × p design matrix of predictors (including a column of ones when the model has an intercept); the inverse exists provided X has full column rank.

The hat matrix relates the observed response values y to the fitted values ŷ (hence the name: it "puts the hat on" y):

\[ \hat{\mathbf{y}} = \mathbf{H}\mathbf{y} \]
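
As an illustrative sketch (not from the original text; the simulated data and variable names are made up for the example), the following NumPy snippet builds H for a small design matrix and checks that Hy reproduces the ordinary least-squares fitted values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Small simulated dataset: intercept column plus one predictor.
n = 10
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])              # n x p design matrix
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Fitted values two ways: through H and through the OLS coefficients.
y_hat_via_H = H @ y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat_via_beta = X @ beta_hat

print(np.allclose(y_hat_via_H, y_hat_via_beta))   # True
```

Forming the full n × n matrix H explicitly is only reasonable for small n; the snippet is meant for illustration, not as a recipe for large data sets.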

2. Geometric Interpretation

The hat matrix has several important geometric interpretations:

2.1 Projection Matrix

H is a projection matrix: it projects the response vector y onto the column space of X. Geometrically, this means that ŷ = Hy is the point in that column space closest to y in Euclidean distance, and the residual vector e = y - ŷ = (I - H)y is orthogonal to every column of X.
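
A one-line check of this orthogonality follows directly from the definition of H:

\[ \mathbf{X}'\mathbf{e} = \mathbf{X}'(\mathbf{I} - \mathbf{H})\mathbf{y} = \left(\mathbf{X}' - \mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right)\mathbf{y} = (\mathbf{X}' - \mathbf{X}')\mathbf{y} = \mathbf{0} \]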

2.2 Properties of H

As a projection matrix, H has several defining properties:

- H is symmetric: H' = H.
- H is idempotent: H^2 = H, so projecting a second time changes nothing.
- trace(H) = p, the number of columns of X (equivalently, the number of estimated coefficients), so the diagonal elements h_ii sum to p.
- I - H is also symmetric and idempotent; it projects onto the orthogonal complement of the column space of X and yields the residuals e = (I - H)y.
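
The idempotency claim, for example, follows in one step from the definition:

\[ \mathbf{H}^2 = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{H} \]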

2.3 Geometric Meaning of h_ii

The diagonal elements h_ii of H have a special interpretation: h_ii = x_i'(X'X)^{-1}x_i, where x_i' is the i-th row of X, measures how far observation i lies from the centre of the predictor data, and it is the weight that y_i receives in its own fitted value, since ŷ_i = Σ_j h_ij y_j. For a model with an intercept, 1/n ≤ h_ii ≤ 1.
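
For simple linear regression (one predictor plus an intercept), the diagonal elements have a familiar closed form that makes the distance-from-the-centre reading explicit:

\[ h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2} \]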

3. Leverage Points

Leverage points are observations whose values in the predictor space are extreme relative to the bulk of the data; because of this, they have the potential to exert a large influence on the fitted regression model.

3.1 Identification of Leverage Points

Leverage points are identified using the diagonal elements h_ii of the hat matrix. Since the average leverage is p/n (the h_ii sum to p), a common rule of thumb flags observation i as a potential leverage point when h_ii > 2p/n; a more conservative variant uses 3p/n.
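
A minimal sketch of this rule in NumPy (the cutoff and the artificial extreme point are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Twenty ordinary predictor values plus one deliberately extreme one.
x = np.append(rng.normal(size=20), 8.0)
n = x.size
X = np.column_stack([np.ones(n), x])   # design matrix with intercept, p = 2
p = X.shape[1]

# Leverages are the diagonal elements of H = X (X'X)^{-1} X'.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

# Flag observations that exceed the 2p/n rule of thumb.
cutoff = 2 * p / n
flagged = np.where(leverage > cutoff)[0]
print(cutoff, flagged)   # index 20 (the extreme x-value) exceeds the cutoff
```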

3.2 Geometric Interpretation of Leverage

Geometrically, leverage points are observations whose predictor vectors lie far from the centroid of the x-data. Because of that remoteness, the fitted regression surface is pulled toward their responses: as h_ii approaches 1, the fitted value ŷ_i is determined almost entirely by y_i and the fit passes nearly through the point.
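
One way to see this pulling effect in formulas: under the standard model assumptions, the i-th residual has variance

\[ \operatorname{Var}(e_i) = \sigma^2 (1 - h_{ii}) \]

so as h_ii approaches 1, the residual at a high-leverage point is forced toward zero, however discrepant y_i may be.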

3.3 Impact of Leverage Points

A high-leverage observation can dominate the estimated coefficients, shift the fitted values and standard errors, and, because its own residual is shrunk (see the variance formula above), partly mask how poorly it is fit. Whether it actually distorts the fit also depends on its response value, which is why leverage is usually examined alongside an influence measure such as Cook's distance. A small numerical illustration follows.
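
As an illustrative sketch (my own construction, not from the original text), the snippet below fits a simple regression with and without a single high-leverage point whose response does not follow the trend, and prints the two slope estimates:

```python
import numpy as np

rng = np.random.default_rng(2)

# Thirty points that follow y = 1 + 2x, plus one high-leverage point that does not.
x = rng.uniform(0.0, 1.0, size=30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=30)
x_all = np.append(x, 10.0)   # extreme predictor value
y_all = np.append(y, 1.0)    # response far below the trend at x = 10

def ols_slope(x, y):
    """Slope of the least-squares line with an intercept."""
    X = np.column_stack([np.ones(x.size), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

print(ols_slope(x, y))          # close to the true slope, 2
print(ols_slope(x_all, y_all))  # dragged far from 2 by the single added point
```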

4. Relationship Between Hat Matrix and Leverage Points

The hat matrix provides a direct way to quantify the leverage of each observation: the leverage of observation i is exactly h_ii, the i-th diagonal element of H. The same matrix that maps y to the fitted values therefore also reports how much weight each observation carries in determining its own fit.
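
For larger data sets it is wasteful to build the full n × n hat matrix just to read off its diagonal. A standard shortcut (sketched below under the usual full-column-rank assumption; the helper name `leverages` is my own) uses a thin QR decomposition X = QR: then H = QQ', so h_ii is the squared norm of the i-th row of Q.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix, computed without forming H explicitly."""
    Q, _ = np.linalg.qr(X, mode="reduced")   # Q is n x p with orthonormal columns
    return np.sum(Q * Q, axis=1)             # row-wise squared norms give h_ii

# Quick check: the leverages sum to p, the number of columns of X.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
print(np.isclose(leverages(X).sum(), X.shape[1]))   # True
```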

In essence, the hat matrix bridges the algebraic formulation of linear regression and its geometric interpretation, and leverage points are a key concept for understanding how much influence individual observations have on the fitted model.