Starting Point: Linear Regression Model
We begin with the linear regression model:
\[
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
\]
where:
- \(\mathbf{y}\) is an \(n \times 1\) vector of observations
- \(\mathbf{X}\) is an \(n \times p\) matrix of predictors
- \(\boldsymbol{\beta}\) is a \(p \times 1\) vector of coefficients
- \(\boldsymbol{\varepsilon}\) is an \(n \times 1\) vector of errors
Assumptions
We assume that the errors are jointly normal with mean zero and a common variance \(\sigma^2\), and that \(\mathbf{X}\) is treated as fixed:
\[
\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})
\]
This means each \(\varepsilon_i\) is independently and identically distributed as \(\mathcal{N}(0, \sigma^2)\).
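To make the model and its error assumption concrete, here is a minimal simulation sketch. The sizes `n` and `p`, the coefficient values in `beta`, and the value of `sigma` are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n, p = 200, 3                       # illustrative sizes (assumed)
sigma = 0.5                         # illustrative error standard deviation (assumed)
beta = np.array([1.0, -2.0, 0.5])   # illustrative coefficients (assumed)

X = rng.normal(size=(n, p))             # n x p matrix of predictors
eps = rng.normal(0.0, sigma, size=n)    # errors: iid N(0, sigma^2)
y = X @ beta + eps                      # observations: y = X beta + eps

# Rearranging the model recovers the errors exactly: eps = y - X beta
residual = y - X @ beta
print(y.shape, X.shape, residual.std())
```

The sample standard deviation of `residual` should land near `sigma`, though it will not match it exactly for finite `n`.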
Deriving the Likelihood Function
1. Given \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\), we can write:
\[
\boldsymbol{\varepsilon} = \mathbf{y} - \mathbf{X}\boldsymbol{\beta}
\]
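2. Since \(\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})\) and \(\mathbf{X}\boldsymbol{\beta}\) is a fixed vector once \(\mathbf{X}\) and \(\boldsymbol{\beta}\) are given, \(\mathbf{y}\) is simply \(\boldsymbol{\varepsilon}\) shifted by \(\mathbf{X}\boldsymbol{\beta}\). Hence \(\mathbf{y}\) given \(\boldsymbol{\beta}\) and \(\mathbf{X}\) (with \(\sigma^2\) treated as fixed) follows a multivariate normal distribution: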
\[
\mathbf{y} | \boldsymbol{\beta}, \mathbf{X} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})
\]
3. The probability density function of an \(n\)-dimensional multivariate normal distribution \(\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\), with \(\boldsymbol{\Sigma}\) positive definite, is given by:
\[
f(\mathbf{x}) = (2\pi)^{-n/2} |\boldsymbol{\Sigma}|^{-1/2} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)
\]
4. In our case:
- \(\mathbf{x} = \mathbf{y}\)
- \(\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta}\)
- \(\boldsymbol{\Sigma} = \sigma^2\mathbf{I}\)
5. Note that \(|\sigma^2\mathbf{I}| = (\sigma^2)^n\), because the determinant of a diagonal matrix is the product of its \(n\) diagonal entries, and \((\sigma^2\mathbf{I})^{-1} = \frac{1}{\sigma^2}\mathbf{I}\).
6. Substituting these into the multivariate normal PDF:
\[
\begin{aligned}
P(\mathbf{y}|\boldsymbol{\beta}, \mathbf{X}) &= (2\pi)^{-n/2} |\sigma^2\mathbf{I}|^{-1/2} \exp\left(-\frac{1}{2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T (\sigma^2\mathbf{I})^{-1} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\right) \\[10pt]
&= (2\pi)^{-n/2} (\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\right) \\[10pt]
&= (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\right)
\end{aligned}
\]
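The substitution in steps 3 to 6 can be checked numerically. The sketch below evaluates the general multivariate normal density with \(\boldsymbol{\Sigma} = \sigma^2\mathbf{I}\) and compares it with the simplified form. It also compares both with the sum of \(n\) univariate normal log-densities, which should agree because the errors are independent. Everything is done on the log scale, since the density itself underflows quickly as \(n\) grows. The function names, the simulated data, and the parameter values are assumptions made for illustration.

```python
import math
import numpy as np

def log_mvn_general(x, mu, Sigma):
    """Log of the general N(mu, Sigma) density from step 3."""
    n = x.shape[0]
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)      # log|Sigma|, computed stably
    quad = d @ np.linalg.solve(Sigma, d)      # d^T Sigma^{-1} d without forming the inverse
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * logdet - 0.5 * quad

def log_mvn_simplified(y, X, beta, sigma2):
    """Log of the simplified form reached in step 6."""
    n = y.shape[0]
    r = y - X @ beta
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - (r @ r) / (2.0 * sigma2)

rng = np.random.default_rng(1)
n, p, sigma2 = 50, 3, 0.8                     # illustrative values (assumed)
X = rng.normal(size=(n, p))
beta = np.array([0.5, 1.5, -1.0])             # illustrative coefficients (assumed)
y = X @ beta + rng.normal(0.0, math.sqrt(sigma2), size=n)

Sigma = sigma2 * np.eye(n)

# Step 5 identities: |sigma^2 I| = (sigma^2)^n and (sigma^2 I)^{-1} = I / sigma^2
det_ok = np.isclose(np.linalg.slogdet(Sigma)[1], n * math.log(sigma2))
inv_ok = np.allclose(np.linalg.inv(Sigma), np.eye(n) / sigma2)

general = log_mvn_general(y, X @ beta, Sigma)
simplified = log_mvn_simplified(y, X, beta, sigma2)

# Independence check: sum of n univariate N(x_i^T beta, sigma^2) log-densities
r = y - X @ beta
product_form = (-0.5 * np.log(2.0 * np.pi * sigma2) - r**2 / (2.0 * sigma2)).sum()

print(det_ok, inv_ok)
print(general, simplified, product_form)
```

The three printed log-densities should agree up to floating-point rounding.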
Final Result
Thus, we arrive at the likelihood function:
\[
P(\mathbf{y}|\boldsymbol{\beta}, \mathbf{X}) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\right)
\]
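Viewed as a function of \(\boldsymbol{\beta}\) for fixed data, this expression is the likelihood. It gives the probability density (not a probability, since \(\mathbf{y}\) is continuous) of the observed \(\mathbf{y}\) given the parameters \(\boldsymbol{\beta}\) and the predictors \(\mathbf{X}\), under the assumption of independent, normally distributed errors with variance \(\sigma^2\) treated as fixed.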
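As a rough illustration of the likelihood as a function of \(\boldsymbol{\beta}\), the sketch below evaluates its logarithm at a few candidate coefficient vectors for one simulated data set. The least-squares fit from `np.linalg.lstsq` is used only as a convenient reference point: because the exponent depends on \(\boldsymbol{\beta}\) only through the residual sum of squares, the least-squares coefficients give the largest value. The data, the candidate vectors, and the helper name `log_likelihood` are assumptions for illustration.

```python
import numpy as np

def log_likelihood(beta, y, X, sigma2):
    """Log of (2 pi sigma^2)^(-n/2) exp(-(y - X beta)^T (y - X beta) / (2 sigma^2))."""
    n = y.shape[0]
    r = y - X @ beta
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) - (r @ r) / (2.0 * sigma2)

rng = np.random.default_rng(2)
n, p, sigma2 = 100, 2, 1.0                    # illustrative values (assumed)
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0]) + rng.normal(0.0, np.sqrt(sigma2), size=n)

# Least-squares coefficients: the minimizer of the residual sum of squares
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

candidates = {
    "least squares": beta_ls,
    "shifted by 0.5": beta_ls + 0.5,
    "all zeros": np.zeros(p),
}
values = {name: log_likelihood(b, y, X, sigma2) for name, b in candidates.items()}
for name, value in values.items():
    print(f"{name}: {value:.3f}")
```

Working with the logarithm does not change which \(\boldsymbol{\beta}\) scores highest, and it avoids the numerical underflow that the raw density runs into for moderate \(n\).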