Analysis of variance (ANOVA) is the collective name of an array of methods that find wide applications in inferential statistics. In essence, we compare groups of observations in order to check if there are significant differences between them, which may be attributed to the impact of underlying factors. One such case occurs when we compare sample means taken from m populations, in order to test the hypothesis that the respective expected values are all the same. Note that, so far, we only considered two populations; with ANOVA we may check an arbitrary number of populations. The ability to analyze the impact of factors is also useful to assess the significance of statistical models, as well as to design statistical experiments. The approach relies on the comparison of different estimates of variance, which should not be significantly different, if factors are not relevant; if we find a statistically significant difference in estimates, then we may reject the hypothesis that factors have no impact.
In this section, we take a somewhat limited view, which is nevertheless able to convey the essentials of ANOVA. We consider two simple and specific cases:
- One-way ANOVA, whereby we assume that there is one factor at work.
- Two-way ANOVA, whereby we assume that there are two factors at work.
We will take another look at ANOVA in the context of linear regression in Section 10.3.4.
9.6.1 One-way ANOVA
In Section 9.4.1 we considered a test of the hypothesis that the means of two (normal) populations are the same:

H0: μ1 = μ2
It is easy to imagine situations in which we want to check a similar claim for more than two populations. To set the stage for the following treatment, let us assume that we have m normal populations, i = 1,…, m, and that we take a sample of n elements from each population. If the number of observations is the same for each population, we have a balanced design; otherwise, we have an unbalanced design. Formally, we are considering the following random variables:

Xij,  i = 1,…, m,  j = 1,…, n,
where the subscripts i and j refer to populations and observations, respectively. As usual, all observations are assumed independent. We denote by μi the unknown expected value of population i. Formally, the null hypothesis we want to test is

H0: μ1 = μ2 = ⋯ = μm
against the alternative Ha that not all expected values are the same. Another key assumption concerns the population variances. They are unknown, but it is assumed that they all have the same value σ2. This might seem a bold assumption, but keep in mind that we want to check the equality of the expected values or, more informally, whether there is any significant difference among the populations; hence, under the null hypothesis, it is natural to assume a common variance.
Since we have m samples of size n, we have a grand total of nm independent, normally distributed observations. If we standardize, square, and add all of them, we obtain a chi-square random variable with nm degrees of freedom:

Σi Σj (Xij − μi)2/σ2    (9.33)
Since the expected values are unknown, we should replace them with the sample mean of each population:

X̄i· = (1/n) Σj Xij,  i = 1,…, m,
where the notation points out that this is a sample mean obtained by summing over the second subscript j. If we plug these sample means into Eq. (9.33), we get the random variable

SSw = Σi Σj (Xij − X̄i·)2,

known as the sum of squares within samples, since deviations are taken with respect to the sample mean within each population. The ratio SSw/σ2 is again a chi-square variable, but with nm − m degrees of freedom, since we have estimated m expected values. Given the properties of chi-square variables, we obtain

E[SSw/σ2] = nm − m,
which means that SSw/(nm − m) is an unbiased estimator of σ2, regardless of whether the null hypothesis H0 is true.
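This property is easy to check numerically. The following Python sketch (with arbitrary, made-up parameters) simulates m samples whose means are deliberately different, and shows that SSw/(nm − m) still estimates σ2 accurately:

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 3, 6                          # number of populations, sample size
mu = np.array([10.0, 12.0, 15.0])    # deliberately different means
sigma = 2.0                          # common standard deviation

# Average the within-samples estimator over many replications
estimates = []
for _ in range(5000):
    X = rng.normal(mu[:, None], sigma, size=(m, n))   # one row per population
    SSw = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum()
    estimates.append(SSw / (n * m - m))

print(np.mean(estimates))    # close to sigma**2 = 4.0
```

Note that the estimator stays on target even though the means differ, since each deviation is taken with respect to its own sample mean.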
Now we build another estimator of σ2, which is unbiased only if H0 holds, i.e., if the expected values are all the same: μi = μ, for i = 1,…, m. In such a case, we could estimate μ by taking the overall sample mean

X̄ = (1/(nm)) Σi Σj Xij = (1/m) Σi X̄i·    (9.35)
Then, to estimate σ2, we could take a different route. Let us define the sum of squares between samples:

SSb = n Σi (X̄i· − X̄)2
To see the rationale behind this definition, observe that, under the null hypothesis, each sample mean X̄i· is normal with expected value μ and variance σ2/n, so that the variables

(X̄i· − μ)/(σ/√n),  i = 1,…, m,
are standard normal. If we square and sum these variables, we get a chi-square variable with m degrees of freedom:

Σi n(X̄i· − μ)2/σ2
If we plug Eq. (9.35) into the sum above to replace the unknown expected value μ, we lose one degree of freedom and obtain, under the null hypothesis,

SSb/σ2 = Σi n(X̄i· − X̄)2/σ2 ~ chi-square with m − 1 degrees of freedom
Table 9.5 Sample data for one-way ANOVA.

This implies that, under H0:
- E[SSb]/σ2 = m − 1
- SSb/(m − 1) is an unbiased estimator of σ2
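A similar numerical check works for SSb/(m − 1), but only under H0: in the Python sketch below (again with made-up parameters), all means are equal, and the average of the between-samples estimator is close to σ2; making the means unequal would inflate it.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6
mu, sigma = 10.0, 2.0        # H0 true: all populations share the same mean

estimates = []
for _ in range(5000):
    X = rng.normal(mu, sigma, size=(m, n))
    row_means = X.mean(axis=1)
    SSb = n * ((row_means - X.mean()) ** 2).sum()
    estimates.append(SSb / (m - 1))

print(np.mean(estimates))    # close to sigma**2 = 4.0
```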
To summarize, we have two estimators of the unknown variance: SSw/(nm − m) is always unbiased, whereas SSb/(m − 1) is unbiased only if the means are all the same. Hence, under the null hypothesis, the ratio of the two estimators should be close to 1. Moreover, it can be shown that SSb/(m − 1) tends to overestimate σ2 if H0 is not true. Thus, we consider the following test statistic:

TS = [SSb/(m − 1)] / [SSw/(nm − m)],
which, under H0, is an F variable with m − 1 and nm − m degrees of freedom. We reject the hypothesis when the test statistic is too large. More precisely, if F1−α,m−1,nm−m is the (1 − α)-quantile of the F distribution, we obtain a test with significance level α by rejecting H0 when TS > F1−α,m−1,nm−m.
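Putting the pieces together, the whole test can be sketched in a few lines of Python; the three samples below are made-up numbers (not the data of Table 9.5), and scipy is used for the F quantile:

```python
import numpy as np
from scipy import stats

# Hypothetical balanced data: m = 3 samples of size n = 6
X = np.array([
    [12.1, 11.8, 13.0, 12.5, 11.9, 12.7],
    [13.2, 12.9, 13.8, 14.1, 13.5, 12.8],
    [11.5, 12.0, 11.2, 12.3, 11.8, 11.6],
])
m, n = X.shape

row_means = X.mean(axis=1)                       # sample means for each population
grand_mean = X.mean()                            # overall sample mean

SSw = ((X - row_means[:, None]) ** 2).sum()      # sum of squares within samples
SSb = n * ((row_means - grand_mean) ** 2).sum()  # sum of squares between samples

TS = (SSb / (m - 1)) / (SSw / (n * m - m))       # test statistic
crit = stats.f.ppf(0.95, m - 1, n * m - m)       # quantile for alpha = 5%
p_value = stats.f.sf(TS, m - 1, n * m - m)

print(TS, crit, p_value)
```

The same statistic and p-value can be obtained directly with `scipy.stats.f_oneway(*X)`, which is a convenient cross-check.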
Example 9.24 Let us apply one-way ANOVA to the data listed in Table 9.5, where we have three samples of size n = 6, taken from m = 3 populations. The first step is to calculate the sums
We observe that the three sample means do look rather different. Now we should test the null hypothesis

H0: μ1 = μ2 = μ3
We proceed by calculating the following sums of squares:
Thus, we find the following alternative estimates of the unknown variance σ2:
which do look different, at first sight. The test statistic is
and, assuming a significance level α = 5%, it should be compared with the following quantile of the F distribution with 2 and 15 degrees of freedom:
We see that the test statistic does not fall into the rejection region and, therefore, the apparent difference in sample means is not statistically significant. Actually, using the CDF of the F distribution, we obtain the p-value
which is pretty large. In order to reject the null hypothesis, we should accept a very large probability of a type I error.
This procedure is easily adapted to the unbalanced case, where the m samples do not all have the same size. It is often argued, however, that a balanced design is preferable, as the resulting test is somewhat more robust to lack of normality in the populations.
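In practice, library routines handle the unbalanced case automatically. For instance, `scipy.stats.f_oneway` accepts samples of different sizes (the numbers below are made up):

```python
from scipy import stats

# Hypothetical unbalanced samples
a = [12.1, 11.8, 13.0, 12.5]
b = [13.2, 12.9, 13.8, 14.1, 13.5]
c = [11.5, 12.0, 11.2]

F, p = stats.f_oneway(a, b, c)
print(F, p)   # F statistic and p-value
```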