In one-way ANOVA we test whether observations from different populations have different means; the population they come from is the single factor affecting the observations. In two-way ANOVA we consider the possibility that two factors affect observations. As a first step, it is useful to reconsider one-way ANOVA in a slightly different light. What we are implicitly assuming is that each random variable can be expressed as the sum of an unknown value plus a random disturbance

Xij = μi + εij,    i = 1, …, m,  j = 1, …, n,

where E[εij] = 0. Then, E[Xij] = μi, which is the only factor affecting the expected value of the observations. If we denote the average expected value by μ, where

μ = (1/m) Σi μi,
we may write

Xij = μ + αi + εij,

where αi = μi − μ and Σi αi = 0. Hence, the average value of αi is zero, but the null hypothesis of one-way ANOVA is much stronger, since it amounts to saying that there is no effect due to αi, and this is true if αi = 0 for all i.
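As a small numerical sketch, the decomposition μi = μ + αi can be checked in a few lines of Python; the three group means below are invented purely for illustration:

```python
import numpy as np

# Hypothetical expected values μ_i for m = 3 populations.
mu_i = np.array([10.0, 12.0, 17.0])

mu = mu_i.mean()        # average expected value μ
alpha = mu_i - mu       # deviations α_i = μ_i − μ

print(mu)               # 13.0
print(alpha)            # [-3. -1.  4.]
print(alpha.sum())      # 0.0: the α_i always sum to zero by construction
```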
We may generalize the idea and consider two factors:

Xij = μ + αi + βj + εij,    i = 1, …, m,  j = 1, …, n,

where

Σi αi = 0,    Σj βj = 0.
In this case, we are taking into consideration the presence of two factors, which are not interacting. If we want to account for interaction, we should extend the model to

Xij = μ + αi + βj + γij + εij.
If we organize observations in rows indexed by i and columns indexed by j, we may test the following hypotheses:
- There is no row effect, i.e., αi = 0, for all i.
- There is no column effect, i.e., βj = 0, for all j.
- There is no effect due to interaction, i.e., γij = 0, for all i and j.
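To make the additive model concrete, the following sketch simulates a table of observations from Xij = μ + αi + βj + εij without interaction; all of the numbers (grand mean, effects, noise level, table size) are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 3, 4                                  # rows (factor A levels) × columns (factor B levels)
mu = 13.0                                    # grand mean (hypothetical)
alpha = np.array([-3.0, -1.0, 4.0])          # row effects, constrained to sum to zero
beta = np.array([2.0, -1.0, 0.5, -1.5])      # column effects, constrained to sum to zero

# One observation per cell: X_ij = μ + α_i + β_j + ε_ij, with ε_ij ~ N(0, 1)
X = mu + alpha[:, None] + beta[None, :] + rng.normal(0.0, 1.0, size=(m, n))

print(X.shape)        # (3, 4)
print(X.round(2))
```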
Let us consider the first case, assuming that there is no interaction and that variance is σ², for all i and j:

E[Xij] = μ + αi + βj,    Var(Xij) = σ².
As in one-way ANOVA, we build different estimators of σ², one of which is unbiased only if the null hypothesis is true. To obtain an estimator that is always valid, let us consider

Σi Σj (Xij − μ − αi − βj)² / σ².    (9.36)
This is a chi-square variable with nm degrees of freedom, if observations are normal and independent. To estimate the unknown parameters, we consider the appropriate sample means

X̄i. = (1/n) Σj Xij,    X̄.j = (1/m) Σi Xij,    X̄.. = (1/(nm)) Σi Σj Xij,

which yield the estimators

μ̂ = X̄..,    α̂i = X̄i. − X̄..,    β̂j = X̄.j − X̄..
We should recall that, since the sum of the parameters αi is zero, we need to estimate only m − 1 of them; by the same token, we need to estimate only n − 1 parameters βj. So, counting μ as well, we need to estimate a grand total of

(m − 1) + (n − 1) + 1 = m + n − 1
parameters. Then, if we plug the above estimators into Eq. (9.36), we find that

Σi Σj (Xij − X̄i. − X̄.j + X̄..)² / σ²
is chi-square with

nm − (m + n − 1) = (m − 1)(n − 1)
degrees of freedom. Then, if we define the sum of squared errors as

SSE = Σi Σj (Xij − X̄i. − X̄.j + X̄..)²,
we have

E[ SSE / ((m − 1)(n − 1)) ] = σ².
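This computation is easy to carry out directly on the data matrix; here is a minimal sketch, where the 3×4 table of observations is made up for illustration:

```python
import numpy as np

# Hypothetical 3×4 table of observations, one per cell.
X = np.array([[12.1,  9.3, 10.8,  8.9],
              [14.2, 10.9, 12.4, 11.1],
              [19.0, 15.8, 17.3, 15.9]])
m, n = X.shape

row_mean = X.mean(axis=1, keepdims=True)    # X̄_i.
col_mean = X.mean(axis=0, keepdims=True)    # X̄_.j
grand_mean = X.mean()                       # X̄_..

resid = X - row_mean - col_mean + grand_mean
SSE = (resid ** 2).sum()
var_hat = SSE / ((m - 1) * (n - 1))         # unbiased estimate of σ²

print(round(SSE, 4), round(var_hat, 4))
```

Note that the residuals Xij − X̄i. − X̄.j + X̄.. sum to zero along every row and every column, which is another way of seeing why only (m − 1)(n − 1) degrees of freedom remain.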
Therefore, we have built an unbiased estimator of variance. Now, we build another estimator, which is unbiased only under the null hypothesis. In fact, under H0 (αi = 0 for all i), we have:

E[X̄i.] = μ,    Var(X̄i.) = σ²/n,    i = 1, …, m.
Since X̄i. is a normal variable with variance σ²/n, the sum of squared standardized variables

Σi (X̄i. − μ)² / (σ²/n) = (n/σ²) Σi (X̄i. − μ)²
is a chi-square variable with m degrees of freedom, if the null hypothesis is true. Replacing μ by its estimator X̄.., we lose one degree of freedom. So, if we define the row sum of squares

SSR = n Σi (X̄i. − X̄..)²,
we have

E[ SSR / (m − 1) ] = σ².
Therefore, we have another estimator of variance, but this one is unbiased only under the null hypothesis. When H0 is not true, this estimator tends to overestimate σ². Then, we may run a test based on the test statistic

TS = [ SSR / (m − 1) ] / [ SSE / ((m − 1)(n − 1)) ],
which, under H0, has F distribution with (m − 1) and (m − 1)(n − 1) degrees of freedom. Given a significance level α, we reject the null hypothesis that there is no row effect if

TS > F(α; m − 1, (m − 1)(n − 1)),

where F(α; d1, d2) denotes the upper α critical value of the F distribution with d1 and d2 degrees of freedom.
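Putting the pieces together, the row-effect test can be sketched as follows; the 3×4 data matrix is invented for illustration, and SciPy's F quantile function is assumed to be available to supply the critical value:

```python
import numpy as np
from scipy.stats import f

# Hypothetical 3×4 data matrix (one observation per cell).
X = np.array([[12.1,  9.3, 10.8,  8.9],
              [14.2, 10.9, 12.4, 11.1],
              [19.0, 15.8, 17.3, 15.9]])
m, n = X.shape

row_mean = X.mean(axis=1)                   # X̄_i.
grand_mean = X.mean()                       # X̄_..

SSR = n * ((row_mean - grand_mean) ** 2).sum()
SSE = ((X - row_mean[:, None] - X.mean(axis=0)[None, :] + grand_mean) ** 2).sum()

# Test statistic: ratio of the two variance estimators.
TS = (SSR / (m - 1)) / (SSE / ((m - 1) * (n - 1)))

sig = 0.05
crit = f.ppf(1 - sig, m - 1, (m - 1) * (n - 1))   # upper critical value of F
print("TS =", round(TS, 2), " critical value =", round(crit, 2))
print("reject 'no row effect':", TS > crit)
```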
Clearly, a similar route can be taken to check for column effects, where we define a column sum of squares

SSC = m Σj (X̄.j − X̄..)²,
which is related to a chi-square variable with n − 1 degrees of freedom, and we reject the null hypothesis that there is no column effect if

[ SSC / (n − 1) ] / [ SSE / ((m − 1)(n − 1)) ] > F(α; n − 1, (m − 1)(n − 1)).
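The column test follows the same pattern as the row test; here is the corresponding sketch, on the same invented data matrix:

```python
import numpy as np
from scipy.stats import f

# Same hypothetical 3×4 data matrix as before.
X = np.array([[12.1,  9.3, 10.8,  8.9],
              [14.2, 10.9, 12.4, 11.1],
              [19.0, 15.8, 17.3, 15.9]])
m, n = X.shape

col_mean = X.mean(axis=0)                   # X̄_.j
grand_mean = X.mean()                       # X̄_..

SSC = m * ((col_mean - grand_mean) ** 2).sum()
SSE = ((X - X.mean(axis=1)[:, None] - col_mean[None, :] + grand_mean) ** 2).sum()

TS = (SSC / (n - 1)) / (SSE / ((m - 1) * (n - 1)))
crit = f.ppf(0.95, n - 1, (m - 1) * (n - 1))      # 5% significance level

print("reject 'no column effect':", TS > crit)
```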
The case with interactions is a bit trickier, but it follows the same conceptual path.