Fundamentals of Hypothesis Testing and Parametric Tests

4. Fundamentals of Hypothesis Testing

4.1 Null and Alternative Hypotheses

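Hypothesis testing involves two hypotheses: the null hypothesis \(H_0\), a statement of no effect or no difference that is assumed true until the data provide evidence against it, and the alternative hypothesis \(H_1\), the claim that is supported when \(H_0\) is rejected.

Example (two-sided test of a population mean \(\mu\) against a reference value \(\mu_0\)):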

\[ \begin{aligned} H_0: \mu &= \mu_0 \\ H_1: \mu &\neq \mu_0 \end{aligned} \]

4.2 Type I and Type II Errors

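A Type I error is rejecting \(H_0\) when it is actually true; its probability is the significance level \(\alpha\). A Type II error is failing to reject \(H_0\) when it is actually false; its probability is \(\beta\).

Relationship between \(\beta\) and power: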

\[ \text{Power} = 1 - \beta = P(\text{Reject } H_0 \mid H_1 \text{ is true}) \]
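The meaning of a Type I error rate can be checked by simulation. The sketch below is illustrative only: it assumes normally distributed data with a known standard deviation and a two-sided z-test, and the function name, default parameters, and seed are hypothetical choices, not part of the original notes. It draws many samples for which H₀ is true and counts how often the test rejects anyway.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_type_i_rate(mu0=0.0, sigma=1.0, n=30, alpha=0.05,
                         trials=5000, seed=0):
    """Estimate how often a two-sided z-test rejects H0 when H0 is true."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Data are generated with true mean mu0, so H0 holds by construction.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a false rejection, i.e. a Type I error
    return rejections / trials


type_i_rate = simulate_type_i_rate()
print(f"Estimated Type I error rate: {type_i_rate:.3f}")
```

The estimated rate should land close to the chosen α, up to simulation noise.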

4.3 Power of a Test

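Power is the probability of correctly rejecting a false null hypothesis. It depends on the significance level \(\alpha\), the sample size \(n\), the effect size \(\delta\) (the difference between the true mean and \(\mu_0\)), and the population standard deviation \(\sigma\). Power increases with larger \(\alpha\), \(n\), and \(|\delta|\), and decreases with larger \(\sigma\).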

For a two-sided z-test:

\[ \text{Power} = \Phi\left(\frac{\delta\sqrt{n}}{\sigma} - z_{1-\alpha/2}\right) + \Phi\left(-\frac{\delta\sqrt{n}}{\sigma} - z_{1-\alpha/2}\right) \]

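Where Φ is the standard normal cumulative distribution function, \(\delta = \mu - \mu_0\) is the true difference in means, and \(z_{1-\alpha/2}\) is the \(1-\alpha/2\) quantile of the standard normal distribution.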
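The power formula for the two-sided z-test translates directly into code. This is a minimal sketch using only the Python standard library; the function name `z_test_power` and the example inputs are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a two-sided z-test for a true mean shift of `delta`."""
    std_normal = NormalDist()                      # standard normal: mean 0, sd 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)     # z_{1 - alpha/2}
    shift = delta * sqrt(n) / sigma                # delta * sqrt(n) / sigma
    # Sum of the two rejection-region probabilities in the formula.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical inputs: half a standard deviation of shift, 32 observations.
example_power = z_test_power(delta=0.5, sigma=1.0, n=32)
print(f"Power: {example_power:.3f}")
```

A quick sanity check on the formula: with `delta=0` the null hypothesis is true, and the function returns α, the Type I error rate.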

4.4 p-values and Significance Levels

The p-value is the probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true.

Decision rule: Reject H₀ if the p-value is less than the significance level α; otherwise, fail to reject H₀.

5. Parametric Tests

5.1 Z-test

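Used when the population standard deviation σ is known and either the population is normally distributed or the sample size is large enough (commonly n ≥ 30) for the sample mean to be approximately normal.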

Test statistic:

\[ Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \]
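As an illustration, the Z statistic and its two-sided p-value can be computed with the standard library alone. The function name and the numbers below are hypothetical and serve only to show the calculation.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n):
    """Return the Z statistic and two-sided p-value for a one-sample z-test."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: probability of a |Z| at least this large under H0.
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical data: known sigma = 15, n = 100, observed mean 103 vs. mu0 = 100.
z_stat, p_value = z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(f"Z = {z_stat:.2f}, p = {p_value:.4f}")
```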

5.2 T-test

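Used when the population standard deviation is unknown and is estimated by the sample standard deviation \(s\). The distinction from the z-test matters most for small samples; as the sample size grows, the t distribution approaches the standard normal. Under \(H_0\), the one-sample statistic follows a t distribution with \(n - 1\) degrees of freedom.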

One-sample t-test statistic:

\[ t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \]
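A short sketch of the one-sample statistic computed from raw data follows. The helper name `one_sample_t` and the measurements are made up for illustration; `statistics.stdev` uses the n − 1 denominator, which matches the sample standard deviation s in the formula.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # s / sqrt(n)
    t = (mean(sample) - mu0) / standard_error
    return t, n - 1


# Hypothetical measurements tested against mu0 = 5.0.
data = [5.1, 4.9, 5.6, 5.8, 5.2, 5.5]
t_stat, df = one_sample_t(data, mu0=5.0)
print(f"t = {t_stat:.3f}, df = {df}")
```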

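Two-sample t-test statistic (equal variances):

\[ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]

Where \(s_p\) is the pooled standard deviation, \(s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\), and \(\mu_1 - \mu_2\) is the difference hypothesized under \(H_0\) (usually 0). The statistic has \(n_1 + n_2 - 2\) degrees of freedom. When both samples have the same size \(n\), the denominator simplifies to \(s_p \sqrt{2/n}\).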
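The pooled calculation can be sketched as below. It assumes the standard pooled variance, in which each sample variance is weighted by its degrees of freedom, and allows the two sample sizes to differ. The function name and both data sets are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2, hypothesized_diff=0.0):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(sample1)
                  + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
    t = (mean(sample1) - mean(sample2) - hypothesized_diff) / standard_error
    return t, n1 + n2 - 2


# Hypothetical groups of unequal size.
group_a = [12.0, 14.0, 11.0, 13.0]
group_b = [10.0, 9.0, 11.0, 10.0, 10.0]
t_stat, df = pooled_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df}")
```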

5.3 ANOVA (Analysis of Variance)

Used to compare means of three or more groups.

F-statistic for one-way ANOVA:

\[ F = \frac{\text{Between-group variability}}{\text{Within-group variability}} = \frac{MS_B}{MS_W} \]
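To make the ratio concrete, the sketch below computes F from raw groups using the usual definitions, which are not spelled out above and are assumed here: MS_B is the between-group sum of squares divided by k − 1, and MS_W is the within-group sum of squares divided by N − k, for k groups and N observations in total. The function name and data are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for one-way ANOVA on a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Between-group variability: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group variability: observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical data: three groups of three observations each.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_stat:.2f}")
```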

5.4 Chi-square Tests

Used for categorical data analysis.

Chi-square statistic for goodness-of-fit:

\[ \chi^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i} \]

Where \(O_i\) are the observed frequencies and \(E_i\) are the expected frequencies in each of the \(k\) categories.
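A minimal sketch of the goodness-of-fit statistic is shown below. The die-roll counts are invented for illustration, and the function name is an assumption.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 120 rolls of a die tested against a fair-die model.
observed = [18, 22, 20, 25, 15, 20]
expected = [20] * 6
chi2 = chi_square_statistic(observed, expected)
print(f"chi-square = {chi2:.2f}")
```

The statistic is then compared with a chi-square critical value; for a fully specified model such as the fair die, the degrees of freedom are k − 1.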

Interview-Style Question

Question: A company claims that its new process reduces the mean production time from 50 minutes to 45 minutes. A sample of 36 products produced with the new process has a mean production time of 47 minutes with a sample standard deviation of 6 minutes. At a 5% significance level, can we conclude that the new process has reduced the mean production time?

Solution:

  1. Set up hypotheses:
    \[ \begin{aligned} H_0: \mu &= 50 \\ H_1: \mu &< 50 \end{aligned} \]
  2. Choose test: We'll use a one-sample t-test (unknown population standard deviation)
  3. Calculate t-statistic:
    \[ t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} = \frac{47 - 50}{6 / \sqrt{36}} = -3 \]
  4. Degrees of freedom: df = n - 1 = 35
  5. Find critical value: t_critical (0.05, 35) ≈ -1.690 (one-tailed)
  6. Decision: Since t = -3 < t_critical ≈ -1.690, the statistic falls in the rejection region and we reject H₀
  7. Alternatively, calculate p-value, which is well below α = 0.05 and leads to the same decision:
    \[ p\text{-value} = P(T \leq -3) \approx 0.0025 \]

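Conclusion: At a 5% significance level, we have sufficient evidence to conclude that the new process has reduced the mean production time below 50 minutes. Note that this test does not by itself confirm the claimed mean of 45 minutes; it only shows that the mean is lower than 50.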
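The arithmetic in the solution can be reproduced with a short script. This is a sketch under stated assumptions: the Python standard library has no t distribution, so the helper `t_cdf` below (a hypothetical name) integrates the t density numerically with Simpson's rule. In practice a library routine such as SciPy's `scipy.stats.t.cdf` would normally be used instead.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    upper = abs(x)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                   # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


# Values taken from the question.
x_bar, mu0, s, n = 47.0, 50.0, 6.0, 36
t_stat = (x_bar - mu0) / (s / sqrt(n))
df = n - 1
p_value = t_cdf(t_stat, df)                # one-tailed (lower tail)
t_critical = -1.690                        # critical value quoted in the solution
reject_h0 = t_stat < t_critical

print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

Both routes agree: the statistic lies below the critical value, and the one-tailed p-value is far smaller than 0.05.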