Conditions for Equivalence of MAP and Ridge Regression Estimates

1. Introduction

We explore the conditions under which the Maximum A Posteriori (MAP) estimate in Bayesian inference coincides with the Ridge Regression estimate from frequentist statistics.

2. Ridge Regression

Ridge Regression minimizes the following objective function:

\[ L(\beta) = \|y - X\beta\|^2 + \lambda\|\beta\|^2 \]

The Ridge estimate is given by:

\[ \hat{\beta}_{Ridge} = (X'X + \lambda I)^{-1}X'y \]

where λ > 0 is the regularization parameter and I is the identity matrix.
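
As a minimal sketch (the data and λ below are illustrative, not from the text), the closed form can be computed with NumPy; solving the linear system avoids forming an explicit inverse:

```python
import numpy as np

def ridge_estimate(X, y, lam):
    """Closed-form Ridge estimate (X'X + lam*I)^{-1} X'y.

    Solves the linear system rather than inverting X'X + lam*I,
    which is cheaper and numerically more stable.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Illustrative usage with synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta_true = np.array([1.0, 0.0, -2.0, 0.5, 3.0])
y = X @ beta_true + rng.normal(scale=0.5, size=100)
print(ridge_estimate(X, y, lam=1.0))
```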

3. Bayesian Framework

In the Bayesian framework, we seek to maximize the posterior probability:

\[ P(\beta|y) \propto P(y|\beta)P(\beta) \]

where P(β|y) is the posterior, P(y|β) is the likelihood, and P(β) is the prior.
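
Since the logarithm is monotone, maximizing the posterior is the same as maximizing its logarithm, which separates into a likelihood term and a prior term plus a constant that does not depend on β:

\[ \log P(\beta|y) = \log P(y|\beta) + \log P(\beta) + \text{const.} \]

This is the form used in the proof below.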

4. Conditions for Equivalence

MAP estimates are equal to Ridge Regression estimates under the following conditions:

4.1 Likelihood

The likelihood is Gaussian:

\[ P(y|\beta) \propto \exp\left(-\frac{1}{2\sigma^2}\|y - X\beta\|^2\right) \]

This assumes the errors are independent and normally distributed with mean zero and common variance σ².
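
As a small sketch (with hypothetical arrays X, y, beta and a noise level sigma), the sum of per-observation Gaussian log-densities agrees with the scaled residual sum of squares up to a constant that does not depend on β:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
beta = np.array([0.5, -1.0, 2.0])
sigma = 0.8
y = X @ beta + rng.normal(scale=sigma, size=50)

# Sum of per-observation Gaussian log-densities ...
loglik = norm.logpdf(y, loc=X @ beta, scale=sigma).sum()

# ... equals -||y - X beta||^2 / (2 sigma^2) plus a beta-independent constant
n = len(y)
const = -0.5 * n * np.log(2 * np.pi * sigma**2)
rss_term = -np.sum((y - X @ beta) ** 2) / (2 * sigma**2)
print(np.isclose(loglik, rss_term + const))  # True
```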

4.2 Prior

The prior on β is also Gaussian:

\[ P(\beta) \propto \exp\left(-\frac{\lambda}{2\sigma^2}\|\beta\|^2\right) \]

This is equivalent to assuming β ~ N(0, σ²/λ · I).
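
A brief sketch (all values illustrative) confirming that this kernel is the density of N(0, (σ²/λ)·I) up to a normalizing constant:

```python
import numpy as np
from scipy.stats import multivariate_normal

sigma2, lam, p = 0.64, 2.0, 3          # illustrative values
beta = np.array([0.3, -0.7, 1.1])

# Log-density of N(0, (sigma^2/lambda) I) at beta ...
logprior = multivariate_normal.logpdf(beta, mean=np.zeros(p),
                                      cov=(sigma2 / lam) * np.eye(p))

# ... matches the kernel -lambda ||beta||^2 / (2 sigma^2) up to a constant
kernel = -lam * np.sum(beta**2) / (2 * sigma2)
const = -0.5 * p * np.log(2 * np.pi * sigma2 / lam)
print(np.isclose(logprior, kernel + const))  # True
```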

4.3 Relationship between λ and prior variance

The regularization parameter λ in Ridge Regression corresponds to the ratio of error variance to prior variance in the Bayesian setting:

\[ \lambda = \frac{\sigma^2}{\tau^2} \]

where τ² is the prior variance of each component of β. Equivalently, τ² = σ²/λ, which matches the prior written in Section 4.2.
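
A quick numerical check of this correspondence, using illustrative values of σ² and τ²: the closed-form posterior mode under a N(0, τ²I) prior coincides with the Ridge estimate at λ = σ²/τ²:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 4))
y = X @ np.array([1.0, -0.5, 0.0, 2.0]) + rng.normal(scale=0.6, size=80)

sigma2, tau2 = 0.36, 0.18
lam = sigma2 / tau2                     # the claimed correspondence

p = X.shape[1]
# Ridge closed form with lambda = sigma^2 / tau^2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
# MAP closed form: posterior mode under a N(0, tau^2 I) prior
beta_map = np.linalg.solve(X.T @ X / sigma2 + np.eye(p) / tau2, X.T @ y / sigma2)
print(np.allclose(beta_ridge, beta_map))  # True
```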

5. Proof of Equivalence

Under these conditions, the posterior distribution is:

\[ P(\beta|y) \propto \exp\left(-\frac{1}{2\sigma^2}(\|y - X\beta\|^2 + \lambda\|\beta\|^2)\right) \]

The MAP estimate maximizes this posterior. Since the logarithm is monotone and the factor 1/(2σ²) is a positive constant, this is equivalent to minimizing:

\[ \|y - X\beta\|^2 + \lambda\|\beta\|^2 \]

This is exactly the Ridge Regression objective function.
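
As a sanity check, here is a small sketch (synthetic data; all values illustrative) that minimizes the negative log posterior numerically and compares the result to the closed-form Ridge estimate:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.7, size=60)
sigma2, lam = 0.49, 1.5

def neg_log_posterior(beta):
    # Up to an additive constant: (||y - X beta||^2 + lam ||beta||^2) / (2 sigma^2)
    return (np.sum((y - X @ beta) ** 2) + lam * np.sum(beta**2)) / (2 * sigma2)

beta_map = minimize(neg_log_posterior, x0=np.zeros(3)).x
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print(np.allclose(beta_map, beta_ridge, atol=1e-4))  # True to optimizer tolerance
```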

6. Implications and Interpretations

The equivalence gives the regularization parameter a Bayesian reading: λ = σ²/τ² measures how strongly the prior shrinks β toward zero relative to the noise in the data. A large λ corresponds to a tight prior (small τ²), while λ → 0 corresponds to an increasingly diffuse prior and recovers ordinary least squares when X'X is invertible.

7. Limitations

The equivalence concerns only the point estimate: the full Bayesian posterior also quantifies uncertainty about β, which the Ridge estimate alone does not. It also depends on both distributional choices. If the Gaussian prior is replaced by a Laplace prior, for example, the MAP estimate instead solves the Lasso problem, as sketched below.
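
Concretely, with a Laplace prior scaled to match the notation above, the ℓ2 penalty becomes an ℓ1 penalty:

\[ P(\beta) \propto \exp\left(-\frac{\lambda}{2\sigma^2}\|\beta\|_1\right) \quad\Rightarrow\quad \hat{\beta}_{MAP} = \arg\min_{\beta}\left(\|y - X\beta\|^2 + \lambda\|\beta\|_1\right) \]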