4. Fundamentals of Hypothesis Testing
4.1 Null and Alternative Hypotheses
Hypothesis testing involves two hypotheses:
- Null Hypothesis (H₀): A statement of no effect or no difference
- Alternative Hypothesis (H₁ or H_A): A statement of an effect or a difference
Example:
\[
\begin{aligned}
H_0: \mu &= \mu_0 \\
H_1: \mu &\neq \mu_0
\end{aligned}
\]
4.2 Type I and Type II Errors
- Type I Error (probability α): Rejecting H₀ when it is true
- Type II Error (probability β): Failing to reject H₀ when it is false
Relationship to power:
\[
\text{Power} = 1 - \beta = P(\text{Reject } H_0 \mid H_1 \text{ is true})
\]
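The error rates and power can be checked by simulation. The Python sketch below is illustrative only. It assumes a two-sided one-sample z-test with known σ = 1, n = 25 and α = 0.05, and it uses a hypothetical true mean of 0.5 for the "H₀ false" case. The function name simulate_rejection_rate is made up for this example. When H₀ is true, the rejection rate estimates α. When H₀ is false, it estimates power, and β = 1 - power.

```python
import random
from statistics import NormalDist

def simulate_rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                            trials=10_000, seed=1):
    """Fraction of simulated two-sided z-tests that reject H0: mu = mu0.

    Hypothetical helper: data are drawn from N(true_mu, sigma^2) with sigma known.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        z = (xbar - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

# H0 true: the rejection rate estimates the Type I error rate (near alpha).
type_1 = simulate_rejection_rate(true_mu=0.0)
# H0 false (true mean 0.5): the rejection rate estimates power = 1 - beta.
power = simulate_rejection_rate(true_mu=0.5)
beta = 1 - power
print(f"Type I rate ~ {type_1:.3f}, power ~ {power:.3f}, beta ~ {beta:.3f}")
```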
4.3 Power of a Test
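Power is the probability of correctly rejecting a false null hypothesis. It increases with sample size, effect size and significance level, and decreases as the population standard deviation grows. It depends on:
- Sample size (n)
- Effect size (δ)
- Significance level (α)
- Population standard deviation (σ)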
For a two-sided z-test:
\[
\text{Power} = \Phi\left(\frac{\delta\sqrt{n}}{\sigma} - z_{1-\alpha/2}\right) + \Phi\left(-\frac{\delta\sqrt{n}}{\sigma} - z_{1-\alpha/2}\right)
\]
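Where \(\Phi\) is the standard normal cumulative distribution function, \(\delta = \mu - \mu_0\) is the difference between the true mean and the hypothesized mean, and \(z_{1-\alpha/2}\) is the \((1-\alpha/2)\) quantile of the standard normal distribution.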
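A minimal sketch that evaluates the power formula above with Python's standard-library statistics.NormalDist (cdf plays the role of Φ, inv_cdf gives z_{1-α/2}). The function name z_test_power and the example values (δ = 0.5, σ = 1) are hypothetical.

```python
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test when the true mean is mu0 + delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # z_{1 - alpha/2}
    shift = delta * n ** 0.5 / sigma             # delta * sqrt(n) / sigma
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Power grows with sample size for a fixed effect size.
for n in (10, 25, 50, 100):
    print(n, round(z_test_power(delta=0.5, sigma=1.0, n=n), 3))
```

Setting δ = 0 returns α, because when H₀ is true the rejection probability is just the Type I error rate.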
4.4 p-values and Significance Levels
The p-value is the probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true.
Decision rule: Reject H₀ if p-value < α (significance level)
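To connect the definition to a computation, here is a sketch assuming the test statistic is standard normal under H₀ (as for the z-test) and a hypothetical observed value z = 2.1. It computes the two-sided p-value exactly, and also estimates it by simulating the null distribution and counting statistics at least as extreme as the observed one, which is the definition read literally. Both function names are hypothetical.

```python
import random
from statistics import NormalDist

def p_value_two_sided(z_obs):
    """Exact two-sided p-value for a statistic that is N(0, 1) under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z_obs)))

def p_value_by_simulation(z_obs, trials=200_000, seed=0):
    """Share of statistics drawn under H0 that are at least as extreme as z_obs."""
    rng = random.Random(seed)
    extreme = sum(1 for _ in range(trials) if abs(rng.gauss(0, 1)) >= abs(z_obs))
    return extreme / trials

alpha = 0.05
z_obs = 2.1  # hypothetical observed z statistic
p = p_value_two_sided(z_obs)
print(round(p, 4), round(p_value_by_simulation(z_obs), 4))
print("reject H0" if p < alpha else "fail to reject H0")
```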
5. Parametric Tests
5.1 Z-test
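Used when the population standard deviation σ is known and either the population is normal or the sample is large enough for the Central Limit Theorem to make \(\bar{X}\) approximately normal.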
Test statistic:
\[
Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}
\]
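A short sketch of the z statistic computed from summary data, also returning the two-sided p-value from the standard normal distribution. The helper name and the numbers (x̄ = 103, μ₀ = 100, σ = 15, n = 100) are hypothetical.

```python
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n):
    """Return (Z, two-sided p-value) for H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / n ** 0.5)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical summary statistics.
z, p = one_sample_z_test(xbar=103, mu0=100, sigma=15, n=100)
print(f"Z = {z:.2f}, p = {p:.4f}")
```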
5.2 T-test
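Used when the population standard deviation is unknown and is estimated by the sample standard deviation s. The test assumes approximately normal data, which matters most for small samples; as n grows, the t distribution approaches the standard normal.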
One-sample t-test statistic:
\[
t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}
\]
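A sketch computing the one-sample t statistic from raw observations. statistics.stdev uses the n - 1 denominator, matching s. The data and μ₀ = 12.0 are hypothetical, and the p-value step is omitted because Python's standard library has no t distribution (SciPy provides one, for example scipy.stats.ttest_1samp).

```python
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mu = mu0."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (mean(sample) - mu0) / (s / n ** 0.5)
    return t, n - 1

data = [12.1, 11.8, 12.6, 12.4, 11.9, 12.7, 12.3, 12.5]  # hypothetical measurements
t, df = one_sample_t(data, mu0=12.0)
print(f"t = {t:.3f}, df = {df}")
```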
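Two-sample t-test statistic (equal variances):
\[
t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
\]
Where \(s_p\) is the pooled standard deviation,
\[
s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}},
\]
and the test has \(n_1 + n_2 - 2\) degrees of freedom. When \(n_1 = n_2 = n\), the denominator reduces to \(s_p \sqrt{2/n}\).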
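A sketch of the equal-variance (pooled) two-sample t statistic, assuming H₀: μ₁ - μ₂ = 0. It uses s_p² = ((n₁ - 1)s₁² + (n₂ - 1)s₂²)/(n₁ + n₂ - 2) and standard error s_p·√(1/n₁ + 1/n₂), which equals s_p·√(2/n) when both samples have size n. The helper name and sample values are hypothetical.

```python
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Equal-variance two-sample t statistic for H0: mu1 - mu2 = 0."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    return (mean(sample1) - mean(sample2)) / se, n1 + n2 - 2

# Hypothetical samples.
t, df = pooled_t([5, 6, 7], [1, 2, 3])
print(f"t = {t:.3f}, df = {df}")
```

If SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=True) should report the same statistic.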
5.3 ANOVA (Analysis of Variance)
Used to compare means of three or more groups.
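F-statistic for one-way ANOVA:
\[
F = \frac{\text{Between-group variability}}{\text{Within-group variability}} = \frac{MS_B}{MS_W} = \frac{SS_B/(k-1)}{SS_W/(N-k)}
\]
Where k is the number of groups, N is the total number of observations, \(SS_B = \sum_j n_j (\bar{X}_j - \bar{X})^2\) and \(SS_W = \sum_j \sum_i (X_{ij} - \bar{X}_j)^2\). Under H₀ (all group means equal), F follows an F distribution with \((k-1, N-k)\) degrees of freedom.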
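A sketch of the one-way ANOVA F statistic, using MS_B = SS_B/(k - 1) and MS_W = SS_W/(N - k) for k groups and N observations in total. The helper name and the three groups are hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_b, df_w = k - 1, n_total - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

f, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"F = {f:.2f} with ({df_b}, {df_w}) degrees of freedom")
```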
5.4 Chi-square Tests
Used for categorical data analysis.
Chi-square statistic for goodness-of-fit:
\[
\chi^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i}
\]
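Where \(O_i\) are the observed frequencies and \(E_i\) the expected frequencies under H₀ for each of the k categories. When no parameters are estimated from the data, the statistic is compared against a \(\chi^2\) distribution with \(k - 1\) degrees of freedom.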
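A sketch of the goodness-of-fit statistic with hypothetical die-roll counts (60 rolls, so a fair die expects 10 per face). It assumes no distribution parameters were estimated from the data, so the degrees of freedom are k - 1. The helper name is hypothetical.

```python
def chi_square_gof(observed, expected):
    """Pearson chi-square statistic and degrees of freedom (k - 1)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

observed = [8, 12, 9, 11, 6, 14]  # hypothetical counts
expected = [10] * 6
stat, df = chi_square_gof(observed, expected)
print(f"chi-square = {stat:.2f}, df = {df}")
```

If SciPy is available, scipy.stats.chisquare(observed, expected) computes the same statistic together with a p-value.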
Interview-Style Question
Question: A company claims that its new process reduces the mean production time from 50 minutes to 45 minutes. A sample of 36 products produced with the new process has a mean production time of 47 minutes with a standard deviation of 6 minutes. At a 5% significance level, can we conclude that the new process has reduced the mean production time?
Solution:
- Set up hypotheses:
\[
\begin{aligned}
H_0: \mu &= 50 \\
H_1: \mu &< 50
\end{aligned}
\]
- Choose test: a one-sample, lower-tailed t-test, since the population standard deviation is unknown and is estimated by s = 6
- Calculate t-statistic:
\[
t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} = \frac{47 - 50}{6 / \sqrt{36}} = -3
\]
- Degrees of freedom: df = n - 1 = 35
- Find critical value (lower tail, α = 0.05, df = 35): t_critical ≈ -1.690
- Decision: Since t = -3 < t_critical = -1.690, we reject H₀
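- Alternatively, calculate the p-value (df = 35):
\[
p\text{-value} = P(T_{35} \leq -3) \approx 0.0025
\]
- Since the p-value is well below α = 0.05, this approach leads to the same decision: reject H₀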
Conclusion: At a 5% significance level, we have sufficient evidence to conclude that the new process has reduced the mean production time.
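The arithmetic in the solution can be reproduced with Python's standard library alone. This sketch is an illustration under stated assumptions, not a production routine: it computes the Student t CDF by Simpson's rule on the t density and finds the critical value by bisection, and the helper names t_pdf, t_cdf and t_quantile are hypothetical. With SciPy, scipy.stats.t.cdf and scipy.stats.t.ppf do the same job.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|] plus symmetry about 0 (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(prob, df, lo=-50.0, hi=50.0):
    """Inverse CDF by bisection; works because t_cdf is increasing."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Figures from the interview question.
xbar, mu0, s, n = 47, 50, 6, 36
t_stat = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1
t_crit = t_quantile(0.05, df)   # lower-tail critical value
p_value = t_cdf(t_stat, df)     # lower-tailed p-value: P(T <= t)
print(f"t = {t_stat:.2f}, df = {df}, t_crit = {t_crit:.3f}, p = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```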