In one-way ANOVA we are testing if observations from different populations have a different mean, which can be considered as the one factor affecting such observations. In two-way ANOVA we consider the possibility that two factors affect observations. As a first step, it is useful to reconsider one-way ANOVA in a slightly different light. What we are implicitly assuming is that each random variable can be expressed as the sum of an unknown value plus a random disturbance:

$$X_{ij} = \mu_i + \epsilon_{ij},$$

where $E[\epsilon_{ij}] = 0$. Then, $E[X_{ij}] = \mu_i$, which is the only factor affecting the expected value of the observations. If we denote the average expected value by $\mu$, where

$$\mu = \frac{1}{m}\sum_{i=1}^{m} \mu_i,$$

we may write

$$X_{ij} = \mu + \alpha_i + \epsilon_{ij},$$

where $\alpha_i = \mu_i - \mu$ and $\sum_{i=1}^{m} \alpha_i = 0$. Hence, the average value of $\alpha_i$ is zero, but the null hypothesis of one-way ANOVA is much stronger, since it amounts to saying that there is no effect due to $\alpha_i$, and this is true if $\alpha_i = 0$ for all $i$.
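As a small worked example with hypothetical means $\mu_1 = 10$, $\mu_2 = 12$, $\mu_3 = 17$:

$$\mu = \frac{10 + 12 + 17}{3} = 13, \qquad \alpha_1 = -3, \quad \alpha_2 = -1, \quad \alpha_3 = 4,$$

so that $\sum_i \alpha_i = 0$, yet the null hypothesis $\alpha_i = 0$ for all $i$ clearly fails.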
We may generalize the idea and consider two factors:

$$X_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij},$$

where

$$\sum_{i=1}^{m} \alpha_i = 0, \qquad \sum_{j=1}^{n} \beta_j = 0.$$

In this case, we are taking into consideration the presence of two factors, which are not interacting. If we want to account for interaction, we should extend the model to

$$X_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ij}.$$
If we organize observations in rows indexed by i = 1, …, m and columns indexed by j = 1, …, n, we may test the following hypotheses (a simulation sketch of the model is given after the list):
- There is no row effect, i.e., αi = 0, for all i.
- There is no column effect, i.e., βj = 0, for all j.
- There is no effect due to interaction, i.e., γij = 0, for all i and j.
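As a concrete illustration, here is a minimal Python sketch that generates data from the no-interaction model; the grand mean, the row and column effects, and the noise level are all hypothetical values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical model parameters (m = 3 rows, n = 4 columns)
mu = 13.0                                # grand mean
alpha = np.array([-3.0, -1.0, 4.0])      # row effects, summing to zero
beta = np.array([-2.0, -1.0, 1.0, 2.0])  # column effects, summing to zero
sigma = 1.5                              # common standard deviation

# X_ij = mu + alpha_i + beta_j + eps_ij, one observation per cell
X = mu + alpha[:, None] + beta[None, :] + rng.normal(0.0, sigma, size=(3, 4))
print(X)
```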
Let us consider the first case, assuming that there is no interaction and that variance is $\sigma^2$, for all $i$ and $j$:

$$X_{ij} \sim N(\mu + \alpha_i + \beta_j, \sigma^2), \qquad H_0: \alpha_i = 0 \quad \text{for all } i.$$
As in one-way ANOVA, we build different estimators of $\sigma^2$, one of which is unbiased only if the null hypothesis is true. To obtain an estimator that is always valid, let us consider

$$\sum_{i=1}^{m}\sum_{j=1}^{n} \frac{\left(X_{ij} - \mu - \alpha_i - \beta_j\right)^2}{\sigma^2}. \tag{9.36}$$

This is a chi-square variable with nm degrees of freedom, if observations are normal and independent. To estimate the unknown parameters, we consider the appropriate sample means

$$\overline{X}_{i\cdot} = \frac{1}{n}\sum_{j=1}^{n} X_{ij}, \qquad \overline{X}_{\cdot j} = \frac{1}{m}\sum_{i=1}^{m} X_{ij}, \qquad \overline{X}_{\cdot\cdot} = \frac{1}{nm}\sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij},$$

which yield the estimators $\hat{\mu} = \overline{X}_{\cdot\cdot}$, $\hat{\alpha}_i = \overline{X}_{i\cdot} - \overline{X}_{\cdot\cdot}$, and $\hat{\beta}_j = \overline{X}_{\cdot j} - \overline{X}_{\cdot\cdot}$.
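In code, these sample means and the resulting estimates are one-liners with numpy, continuing the hypothetical array `X` from the sketch above:

```python
# Sample means: rows, columns, and grand mean
row_means = X.mean(axis=1)   # X-bar_{i.}
col_means = X.mean(axis=0)   # X-bar_{.j}
grand_mean = X.mean()        # X-bar_{..}

# Estimates of the model parameters
mu_hat = grand_mean
alpha_hat = row_means - grand_mean
beta_hat = col_means - grand_mean
```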
We should recall that, since the sum of the parameters $\alpha_i$ is zero, we need to estimate only $m - 1$ of them; by the same token, we need to estimate only $n - 1$ parameters $\beta_j$. So, we need to estimate a grand total of

$$1 + (m - 1) + (n - 1) = n + m - 1$$

parameters. Then, if we plug the above estimators into Eq. (9.36), we find that

$$\sum_{i=1}^{m}\sum_{j=1}^{n} \frac{\left(X_{ij} - \overline{X}_{i\cdot} - \overline{X}_{\cdot j} + \overline{X}_{\cdot\cdot}\right)^2}{\sigma^2}$$

is chi-square with

$$nm - (n + m - 1) = (m - 1)(n - 1)$$

degrees of freedom. Then, if we define the sum of squared errors as

$$SS_E = \sum_{i=1}^{m}\sum_{j=1}^{n} \left(X_{ij} - \overline{X}_{i\cdot} - \overline{X}_{\cdot j} + \overline{X}_{\cdot\cdot}\right)^2,$$

we have

$$E\left[\frac{SS_E}{(m-1)(n-1)}\right] = \sigma^2.$$
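Continuing the hypothetical numpy example, the following sketch computes $SS_E$ and the corresponding variance estimate:

```python
m, n = X.shape

# Residuals after removing the estimated row and column effects
resid = X - row_means[:, None] - col_means[None, :] + grand_mean
SSE = (resid ** 2).sum()

# Unbiased estimate of sigma^2, whether or not H0 holds
sigma2_hat = SSE / ((m - 1) * (n - 1))
```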
Therefore, we have built an unbiased estimator of variance. Now, we build another estimator, which is unbiased only under the null hypothesis. In fact, under $H_0$, we have:

$$\overline{X}_{i\cdot} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right).$$

Since $\mathrm{Var}(\overline{X}_{i\cdot}) = \sigma^2/n$, the sum of squared standardized variables

$$\sum_{i=1}^{m} \frac{\left(\overline{X}_{i\cdot} - \mu\right)^2}{\sigma^2/n} = \frac{n}{\sigma^2}\sum_{i=1}^{m} \left(\overline{X}_{i\cdot} - \mu\right)^2$$

is a chi-square variable with $m$ degrees of freedom, if the null hypothesis is true. Replacing $\mu$ by its estimator $\overline{X}_{\cdot\cdot}$, we lose one degree of freedom. So, if we define the row sum of squares

$$SS_R = n\sum_{i=1}^{m} \left(\overline{X}_{i\cdot} - \overline{X}_{\cdot\cdot}\right)^2,$$

we have

$$E\left[\frac{SS_R}{m-1}\right] = \sigma^2 \qquad \text{under } H_0.$$
Therefore, we have another estimator of variance, but this one is unbiased only under the null hypothesis. When $H_0$ is not true, this estimator tends to overestimate $\sigma^2$. Then, we may run a test based on the test statistic

$$F = \frac{SS_R/(m-1)}{SS_E/\left[(m-1)(n-1)\right]},$$

which, under $H_0$, has F distribution with $(m - 1)$ and $(m - 1)(n - 1)$ degrees of freedom. Given a significance level $\alpha$, we reject the null hypothesis that there is no row effect if

$$F > F_{\alpha,\, m-1,\, (m-1)(n-1)}.$$
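Again as a sketch on the hypothetical data above, the row test can be carried out with scipy's F distribution:

```python
from scipy import stats

# Row sum of squares and the F statistic for the row effect
SSR = n * ((row_means - grand_mean) ** 2).sum()
F_row = (SSR / (m - 1)) / (SSE / ((m - 1) * (n - 1)))

# Reject H0 (no row effect) at level alpha if F exceeds the critical value
alpha_level = 0.05
F_crit = stats.f.ppf(1.0 - alpha_level, m - 1, (m - 1) * (n - 1))
print(F_row > F_crit)
```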
Clearly, a similar route can be taken to check for column effects, where we define a column sum of squares

$$SS_C = m\sum_{j=1}^{n} \left(\overline{X}_{\cdot j} - \overline{X}_{\cdot\cdot}\right)^2,$$

which is related to a chi-square variable with $n - 1$ degrees of freedom, and we reject the null hypothesis that there is no column effect if

$$\frac{SS_C/(n-1)}{SS_E/\left[(m-1)(n-1)\right]} > F_{\alpha,\, n-1,\, (m-1)(n-1)}.$$
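The corresponding sketch for the column test mirrors the row test on the same hypothetical data:

```python
# Column sum of squares and the F statistic for the column effect
SSC = m * ((col_means - grand_mean) ** 2).sum()
F_col = (SSC / (n - 1)) / (SSE / ((m - 1) * (n - 1)))

F_crit_col = stats.f.ppf(1.0 - alpha_level, n - 1, (m - 1) * (n - 1))
print(F_col > F_crit_col)
```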
The case with interactions is a bit trickier, but it follows the same conceptual path.