It is easy to prove that the sample variance $S^2$ is an unbiased estimator of the variance $\sigma^2$, but if we want a confidence interval for the variance, we need distributional results on $S^2$, which depend on the underlying population. For a normal population we may take advantage of Theorem 9.4. In particular, we recall that the sample variance is related to the chi-square distribution as follows:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},$$
where $\chi^2_{n-1}$ denotes a chi-square distribution with $n-1$ degrees of freedom. The conceptual path we should follow is the same as the one we followed for the expected value, with a slight difference: unlike the t distribution, the chi-square distribution is not symmetric and has only positive support (see Fig. 7.18). To build a confidence interval with confidence level $(1-\alpha)$, we need the two quantiles $\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$, defined by the conditions

$$P\left\{X \le \chi^2_{\alpha/2,\,n-1}\right\} = \frac{\alpha}{2}, \qquad P\left\{X \le \chi^2_{1-\alpha/2,\,n-1}\right\} = 1 - \frac{\alpha}{2},$$

where $X \sim \chi^2_{n-1}$. Both quantiles are positive, and $\chi^2_{\alpha/2,\,n-1} < \chi^2_{1-\alpha/2,\,n-1}$. Then we have

$$P\left\{\chi^2_{\alpha/2,\,n-1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{1-\alpha/2,\,n-1}\right\} = 1 - \alpha,$$
which after rearranging leads to the confidence interval

$$\frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}}. \tag{9.20}$$
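As an illustration, here is a minimal Python sketch of interval (9.20), assuming NumPy and SciPy are available; the function name `variance_confidence_interval` and the sample data are made up for the example.

```python
import numpy as np
from scipy.stats import chi2

def variance_confidence_interval(sample, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for the variance of a
    normal population, based on (n - 1) * S^2 / sigma^2 ~ chi2(n - 1)."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    s2 = x.var(ddof=1)                        # sample variance S^2
    lo_q = chi2.ppf(alpha / 2, df=n - 1)      # chi2_{alpha/2, n-1}
    hi_q = chi2.ppf(1 - alpha / 2, df=n - 1)  # chi2_{1-alpha/2, n-1}
    return (n - 1) * s2 / hi_q, (n - 1) * s2 / lo_q

# Example with arbitrary data (not the sample used later in the text):
print(variance_confidence_interval([9.2, 11.5, 8.7, 10.1, 12.3,
                                    9.8, 10.6, 11.0, 8.9, 10.4]))
```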
We may also test a hypothesis about variance, just as we did for the expected value:

$$H_0: \sigma^2 = \sigma_0^2 \qquad \text{against} \qquad H_a: \sigma^2 \neq \sigma_0^2.$$
The distributional result above implies that, under the null hypothesis, we have

$$\chi^2 = \frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1}. \tag{9.21}$$
From Section 7.7.2 we also recall that the expected value of a chi-square variable with $n-1$ degrees of freedom is $n-1$. Then, if the test statistic takes a value considerably smaller or considerably larger than $n-1$, we should question the null hypothesis. The test procedure is the following: reject $H_0$ at significance level $\alpha$ if

$$\chi^2 < \chi^2_{\alpha/2,\,n-1} \qquad \text{or} \qquad \chi^2 > \chi^2_{1-\alpha/2,\,n-1}.$$
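The corresponding two-tailed test can be sketched in the same way; `chi2_variance_test` is a hypothetical helper, not a SciPy function.

```python
import numpy as np
from scipy.stats import chi2

def chi2_variance_test(sample, sigma0_sq, alpha=0.05):
    """Two-tailed test of H0: sigma^2 = sigma0_sq for a normal sample,
    based on the statistic chi2 = (n - 1) * S^2 / sigma0_sq."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    stat = (n - 1) * x.var(ddof=1) / sigma0_sq
    lo = chi2.ppf(alpha / 2, df=n - 1)
    hi = chi2.ppf(1 - alpha / 2, df=n - 1)
    reject = stat < lo or stat > hi
    return stat, reject
```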
Example 9.18 The following sample:
was generated by a pseudorandom variate generator; the underlying distribution was normal, with $\mu = 10$ and $\sigma = 20$. Now let us forget what we know and build a confidence interval for the standard deviation, with confidence level 95%. The sample standard deviation is
To apply Eq. (9.20) we need the following quantiles from the chi-square distribution with 9 degrees of freedom:

$$\chi^2_{0.025,\,9} = 2.7004, \qquad \chi^2_{0.975,\,9} = 19.0228.$$
These quantiles, unlike those from the normal distribution, are not symmetric and yield the confidence interval
Taking square roots, we notice that the confidence interval for the standard deviation
does not include the true value $\sigma = 20$, which may happen with probability 5%. If we want to test the null hypothesis $H_0: \sigma = 20$
against the alternative hypothesis $H_a: \sigma \neq 20$, with significance level $\alpha = 0.05$, we calculate the test statistic according to Eq. (9.21):
This value looks pretty large. In fact, it exceeds the upper quantile $\chi^2_{0.975,\,9} = 19.0228$, and the null hypothesis is (incorrectly) rejected. Since this was a two-tail test, we could have equivalently observed that $\sigma_0 = 20$ is not included in the confidence interval above. Again, we must be aware that type I errors are a real possibility.
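The original sample is not reproduced above, so the following sketch simply regenerates ten observations from a normal distribution with $\mu = 10$ and $\sigma = 20$ (with an arbitrary seed) and repeats the computations; the numbers will differ from those in the example, but the mechanics are identical.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(42)                  # arbitrary seed, for illustration only
sample = rng.normal(loc=10.0, scale=20.0, size=10)

n = sample.size
s2 = sample.var(ddof=1)
lo_q, hi_q = chi2.ppf([0.025, 0.975], df=n - 1)  # 2.7004 and 19.0228

# 95% confidence interval for sigma (square roots of the variance interval)
ci_sigma = (np.sqrt((n - 1) * s2 / hi_q), np.sqrt((n - 1) * s2 / lo_q))

# Test statistic for H0: sigma = 20, rejected if outside [lo_q, hi_q]
stat = (n - 1) * s2 / 20.0 ** 2
print(ci_sigma, stat, stat < lo_q or stat > hi_q)
```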
If we have to compare the variances of two populations, we should run a test such as

$$H_0: \sigma_1^2 = \sigma_2^2 \qquad \text{against} \qquad H_a: \sigma_1^2 \neq \sigma_2^2.$$
If both populations are normal and we take two independent samples of sizes $n_1$ and $n_2$, respectively, from Theorem 9.4 we know that

$$\frac{(n_1-1)S_1^2}{\sigma_1^2} \sim \chi^2_{n_1-1}, \qquad \frac{(n_2-1)S_2^2}{\sigma_2^2} \sim \chi^2_{n_2-1},$$
where $S_1^2$ and $S_2^2$ are the sample variances for the two samples. Hence, we have two independent chi-square variables. In Section 7.7.2 we have seen that the ratio of two independent chi-square variables, each divided by its degrees of freedom, is related to the F distribution; more precisely,

$$\frac{\chi^2_{n_1-1}/(n_1-1)}{\chi^2_{n_2-1}/(n_2-1)}$$

has F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. Then, under the null hypothesis, we have

$$F = \frac{S_1^2}{S_2^2} \sim F_{n_1-1,\,n_2-1}.$$
Using the familiar logic, we should run the following test: reject $H_0$ at significance level $\alpha$ if

$$F < F_{\alpha/2,\,n_1-1,\,n_2-1} \qquad \text{or} \qquad F > F_{1-\alpha/2,\,n_1-1,\,n_2-1}.$$
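A minimal sketch of this test in Python, again assuming SciPy is available; `f_variance_test` is a made-up helper name.

```python
import numpy as np
from scipy.stats import f

def f_variance_test(sample1, sample2, alpha=0.10):
    """Two-tailed F test of H0: sigma1^2 = sigma2^2 for two independent
    normal samples, using F = S1^2 / S2^2 ~ F(n1 - 1, n2 - 1) under H0."""
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)
    stat = x1.var(ddof=1) / x2.var(ddof=1)
    d1, d2 = x1.size - 1, x2.size - 1
    lo = f.ppf(alpha / 2, d1, d2)        # F_{alpha/2, n1-1, n2-1}
    hi = f.ppf(1 - alpha / 2, d1, d2)    # F_{1-alpha/2, n1-1, n2-1}
    return stat, (stat < lo or stat > hi)
```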
Example 9.19 A reliable production process should yield items with low variability of key measures related to their quality. Imagine that we compare two technologies to check if they are significantly different in terms of such variability. A sample of $n_1 = 10$ items obtained by process 1 yields a sample variance $S_1^2$, whereas process 2 yields $S_2^2$, for a sample size $n_2 = 12$. Can we say that there is a significant difference in variance?
Let us choose $\alpha = 10\%$. The test statistic is

$$F = \frac{S_1^2}{S_2^2} = 0.5172.$$
By using suitable statistical software, we find the following quantiles of the F distribution with 9 and 11 degrees of freedom:

$$F_{0.05,\,9,\,11} = 0.3223, \qquad F_{0.95,\,9,\,11} = 2.8962.$$
Since $F = 0.5172 \in [0.3223, 2.8962]$, we cannot reject the null hypothesis. Superficially, it seems that there is quite a difference between the two variances, as one is almost twice as large as the other, but the sample sizes are too small to draw a reliable conclusion. If we had $n_1 = 100$ and $n_2 = 120$, then we would use the quantiles of the F distribution with 99 and 119 degrees of freedom; these are much closer to 1, and we could reject the null hypothesis.
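The effect of the larger samples can be checked directly by comparing the quantiles for the two pairs of degrees of freedom, for instance with SciPy:

```python
from scipy.stats import f

for d1, d2 in [(9, 11), (99, 119)]:
    lo, hi = f.ppf([0.05, 0.95], d1, d2)
    # With (9, 11) degrees of freedom the acceptance region is roughly
    # [0.32, 2.90]; with (99, 119) it shrinks toward 1, so F = 0.5172
    # would fall outside it and H0 would be rejected.
    print(d1, d2, round(lo, 4), round(hi, 4))
```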
Once again, we see that if we have distributional results on relevant statistics, we can follow the usual drill to come up with confidence intervals, hypothesis tests, p-values, etc.