In one-way ANOVA we are testing if observations from different populations have a different mean, which can be considered as the one factor affecting such observations. In two-way ANOVA we consider the possibility that two factors affect observations. As a first step, it is useful to reconsider one-way ANOVA in a slightly different light. What we are implicitly assuming is that each random variable can be expressed as the sum of an unknown value plus a random disturbance:

$$X_{ij} = \mu_i + \epsilon_{ij},$$

where $E[\epsilon_{ij}] = 0$. Then, $E[X_{ij}] = \mu_i$, which is the only factor affecting the expected value of the observations. If we denote the average expected value by $\mu$, where

$$\mu = \frac{1}{m}\sum_{i=1}^{m} \mu_i,$$

we may write

$$X_{ij} = \mu + \alpha_i + \epsilon_{ij},$$

where $\alpha_i = \mu_i - \mu$ and $\sum_{i=1}^{m} \alpha_i = 0$. Hence, the average value of $\alpha_i$ is zero, but the null hypothesis of one-way ANOVA is much stronger, since it amounts to saying that there is no effect due to $\alpha_i$, and this is true if $\alpha_i = 0$ for all $i$.
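As a small worked example with hypothetical means $\mu_1 = 10$, $\mu_2 = 12$, $\mu_3 = 17$:

$$\mu = \frac{10 + 12 + 17}{3} = 13, \qquad \alpha_1 = -3, \quad \alpha_2 = -1, \quad \alpha_3 = 4,$$

so that $\sum_i \alpha_i = 0$, yet the null hypothesis $\alpha_i = 0$ for all $i$ clearly fails.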
We may generalize the idea and consider two factors:

$$X_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij},$$

where

$$\sum_{i=1}^{m} \alpha_i = 0, \qquad \sum_{j=1}^{n} \beta_j = 0.$$

In this case, we are taking into consideration the presence of two factors, which are not interacting. If we want to account for interaction, we should extend the model to

$$X_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ij}.$$
If we organize observations in rows indexed by i = 1, …, m and columns indexed by j = 1, …, n, we may test the following hypotheses (a simulation sketch of the model is given after the list):
- There is no row effect, i.e., αi = 0, for all i.
- There is no column effect, i.e., βj = 0, for all j.
- There is no effect due to interaction, i.e., γij = 0, for all i and j.
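As a concrete illustration, here is a minimal Python sketch that generates data from the no-interaction model; the grand mean, the row and column effects, and the noise level are all hypothetical values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical model parameters (m = 3 rows, n = 4 columns)
mu = 13.0                                # grand mean
alpha = np.array([-3.0, -1.0, 4.0])      # row effects, summing to zero
beta = np.array([-2.0, -1.0, 1.0, 2.0])  # column effects, summing to zero
sigma = 1.5                              # common standard deviation

# X_ij = mu + alpha_i + beta_j + eps_ij, one observation per cell
X = mu + alpha[:, None] + beta[None, :] + rng.normal(0.0, sigma, size=(3, 4))
print(X)
```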
Let us consider the first case, assuming that there is no interaction and that variance is $\sigma^2$, for all $i$ and $j$:

$$X_{ij} \sim N(\mu + \alpha_i + \beta_j, \sigma^2), \qquad H_0: \alpha_i = 0 \quad \text{for all } i.$$
As in one-way ANOVA, we build different estimators of $\sigma^2$, one of which is unbiased only if the null hypothesis is true. To obtain an estimator that is always valid, let us consider

$$\sum_{i=1}^{m}\sum_{j=1}^{n} \frac{\left(X_{ij} - \mu - \alpha_i - \beta_j\right)^2}{\sigma^2}. \tag{9.36}$$

This is a chi-square variable with nm degrees of freedom, if observations are normal and independent. To estimate the unknown parameters, we consider the appropriate sample means

$$\overline{X}_{i\cdot} = \frac{1}{n}\sum_{j=1}^{n} X_{ij}, \qquad \overline{X}_{\cdot j} = \frac{1}{m}\sum_{i=1}^{m} X_{ij}, \qquad \overline{X}_{\cdot\cdot} = \frac{1}{nm}\sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij},$$

which yield the estimators $\hat{\mu} = \overline{X}_{\cdot\cdot}$, $\hat{\alpha}_i = \overline{X}_{i\cdot} - \overline{X}_{\cdot\cdot}$, and $\hat{\beta}_j = \overline{X}_{\cdot j} - \overline{X}_{\cdot\cdot}$.
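In code, these sample means and the resulting estimates are one-liners with numpy, continuing the hypothetical array `X` from the sketch above:

```python
# Sample means: rows, columns, and grand mean
row_means = X.mean(axis=1)   # X-bar_{i.}
col_means = X.mean(axis=0)   # X-bar_{.j}
grand_mean = X.mean()        # X-bar_{..}

# Estimates of the model parameters
mu_hat = grand_mean
alpha_hat = row_means - grand_mean
beta_hat = col_means - grand_mean
```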
We should recall that, since the sum of the parameters $\alpha_i$ is zero, we need to estimate only $m - 1$ of them; by the same token, we need to estimate only $n - 1$ parameters $\beta_j$. So, we need to estimate a grand total of

$$1 + (m - 1) + (n - 1) = n + m - 1$$

parameters. Then, if we plug the above estimators into Eq. (9.36), we find that

$$\sum_{i=1}^{m}\sum_{j=1}^{n} \frac{\left(X_{ij} - \overline{X}_{i\cdot} - \overline{X}_{\cdot j} + \overline{X}_{\cdot\cdot}\right)^2}{\sigma^2}$$

is chi-square with

$$nm - (n + m - 1) = (m - 1)(n - 1)$$

degrees of freedom. Then, if we define the sum of squared errors as

$$SS_E = \sum_{i=1}^{m}\sum_{j=1}^{n} \left(X_{ij} - \overline{X}_{i\cdot} - \overline{X}_{\cdot j} + \overline{X}_{\cdot\cdot}\right)^2,$$

we have

$$E\left[\frac{SS_E}{(m-1)(n-1)}\right] = \sigma^2.$$
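Continuing the hypothetical numpy example, the following sketch computes $SS_E$ and the corresponding variance estimate:

```python
m, n = X.shape

# Residuals after removing the estimated row and column effects
resid = X - row_means[:, None] - col_means[None, :] + grand_mean
SSE = (resid ** 2).sum()

# Unbiased estimate of sigma^2, whether or not H0 holds
sigma2_hat = SSE / ((m - 1) * (n - 1))
```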
Therefore, we have built an unbiased estimator of variance. Now, we build another estimator, which is unbiased only under the null hypothesis. In fact, under $H_0$, we have:

$$\overline{X}_{i\cdot} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right).$$

Since $\mathrm{Var}(\overline{X}_{i\cdot}) = \sigma^2/n$, the sum of squared standardized variables

$$\sum_{i=1}^{m} \frac{\left(\overline{X}_{i\cdot} - \mu\right)^2}{\sigma^2/n} = \frac{n}{\sigma^2}\sum_{i=1}^{m} \left(\overline{X}_{i\cdot} - \mu\right)^2$$

is a chi-square variable with $m$ degrees of freedom, if the null hypothesis is true. Replacing $\mu$ by its estimator $\overline{X}_{\cdot\cdot}$, we lose one degree of freedom. So, if we define the row sum of squares

$$SS_R = n\sum_{i=1}^{m} \left(\overline{X}_{i\cdot} - \overline{X}_{\cdot\cdot}\right)^2,$$

we have

$$E\left[\frac{SS_R}{m-1}\right] = \sigma^2 \qquad \text{under } H_0.$$
Therefore, we have another estimator of variance, but this one is unbiased only under the null hypothesis. When $H_0$ is not true, this estimator tends to overestimate $\sigma^2$. Then, we may run a test based on the test statistic

$$F = \frac{SS_R/(m-1)}{SS_E/\left[(m-1)(n-1)\right]},$$

which, under $H_0$, has F distribution with $(m - 1)$ and $(m - 1)(n - 1)$ degrees of freedom. Given a significance level $\alpha$, we reject the null hypothesis that there is no row effect if

$$F > F_{\alpha,\, m-1,\, (m-1)(n-1)}.$$
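Again as a sketch on the hypothetical data above, the row test can be carried out with scipy's F distribution:

```python
from scipy import stats

# Row sum of squares and the F statistic for the row effect
SSR = n * ((row_means - grand_mean) ** 2).sum()
F_row = (SSR / (m - 1)) / (SSE / ((m - 1) * (n - 1)))

# Reject H0 (no row effect) at level alpha if F exceeds the critical value
alpha_level = 0.05
F_crit = stats.f.ppf(1.0 - alpha_level, m - 1, (m - 1) * (n - 1))
print(F_row > F_crit)
```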
Clearly, a similar route can be taken to check for column effects, where we define a column sum of squares

$$SS_C = m\sum_{j=1}^{n} \left(\overline{X}_{\cdot j} - \overline{X}_{\cdot\cdot}\right)^2,$$

which is related to a chi-square variable with $n - 1$ degrees of freedom, and we reject the null hypothesis that there is no column effect if

$$\frac{SS_C/(n-1)}{SS_E/\left[(m-1)(n-1)\right]} > F_{\alpha,\, n-1,\, (m-1)(n-1)}.$$
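The corresponding sketch for the column test mirrors the row test on the same hypothetical data:

```python
# Column sum of squares and the F statistic for the column effect
SSC = m * ((col_means - grand_mean) ** 2).sum()
F_col = (SSC / (n - 1)) / (SSE / ((m - 1) * (n - 1)))

F_crit_col = stats.f.ppf(1.0 - alpha_level, n - 1, (m - 1) * (n - 1))
print(F_col > F_crit_col)
```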
The case with interactions is a bit trickier, but it follows the same conceptual path.