Estimating and testing variance

It is easy to prove that the sample variance $S^2$ is an unbiased estimator of the variance $\sigma^2$, but if we want a confidence interval for the variance, we need distributional results on $S^2$, which depend on the underlying population. For a normal population we may take advantage of Theorem 9.4. In particular, we recall that the sample variance is related to the chi-square distribution as follows:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},$$
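This distributional result is easy to check by simulation: for repeated normal samples, $(n-1)S^2/\sigma^2$ should behave like a chi-square variable with $n-1$ degrees of freedom, whose mean is $n-1$. A minimal Monte Carlo sketch (the sample size and the normal parameters below are hypothetical illustration values):

```python
# Monte Carlo check: (n-1)S^2/sigma^2 should have mean n-1 for normal samples.
# The sample size, mean, and standard deviation here are hypothetical.
import numpy as np

rng = np.random.default_rng(42)
n, sigma, reps = 10, 2.0, 200_000

samples = rng.normal(loc=5.0, scale=sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)       # sample variance S^2 of each sample
stat = (n - 1) * s2 / sigma**2

print(stat.mean())   # should be close to n - 1 = 9
```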

where $\chi^2_{n-1}$ denotes a chi-square distribution with $n-1$ degrees of freedom. The conceptual path we should follow is the same that we have seen for the expected value, with a slight difference: unlike the t distribution, the chi-square distribution is not symmetric and has only positive support (see Fig. 7.18). To build a confidence interval with confidence level $1-\alpha$, we need the two quantiles, $\chi^2_{1-\alpha/2,n-1}$ and $\chi^2_{\alpha/2,n-1}$, defined by the conditions

$$P\left\{\chi^2_{n-1} \le \chi^2_{1-\alpha/2,n-1}\right\} = \frac{\alpha}{2}, \qquad P\left\{\chi^2_{n-1} \ge \chi^2_{\alpha/2,n-1}\right\} = \frac{\alpha}{2},$$

where $\chi^2_{n-1}$ here denotes a random variable with that distribution. Both quantiles are positive, and $\chi^2_{1-\alpha/2,n-1} < \chi^2_{\alpha/2,n-1}$. Then we have

$$P\left\{\chi^2_{1-\alpha/2,n-1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2,n-1}\right\} = 1-\alpha,$$

which after rearranging leads to the confidence interval

$$\left[\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}\right]. \tag{9.20}$$
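The interval above is straightforward to compute numerically. A minimal sketch using SciPy's chi-square quantile function; the sample size and sample variance below are hypothetical illustration values, not from the text:

```python
# Confidence interval for the variance of a normal population, following
# Eq. (9.20). The sample size n and sample variance s2 are hypothetical.
from scipy.stats import chi2

n = 25       # hypothetical sample size
s2 = 4.0     # hypothetical sample variance S^2
alpha = 0.05

q_low = chi2.ppf(alpha / 2, n - 1)       # chi^2_{1-alpha/2, n-1} (lower tail)
q_high = chi2.ppf(1 - alpha / 2, n - 1)  # chi^2_{alpha/2, n-1} (upper tail)

# The larger quantile gives the lower bound, the smaller one the upper bound.
ci = ((n - 1) * s2 / q_high, (n - 1) * s2 / q_low)
print(ci)
```

Note that, because of the asymmetry of the chi-square distribution, the interval is not symmetric around $S^2$.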

We may also test a hypothesis about variance, just as we did for the expected value:

$$H_0: \sigma^2 = \sigma_0^2 \quad \text{against} \quad H_a: \sigma^2 \neq \sigma_0^2.$$

The distributional result above implies that, under the null hypothesis, we have

$$\chi^2 = \frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1}. \tag{9.21}$$

From Section 7.7.2 we also recall that the expected value of a $\chi^2_{n-1}$ variable is $n-1$. Then, if the test statistic takes a value considerably smaller or considerably larger than $n-1$, we should question the null hypothesis. The test procedure is the following:

Reject $H_0$ at significance level $\alpha$ if
$$\chi^2 < \chi^2_{1-\alpha/2,n-1} \quad \text{or} \quad \chi^2 > \chi^2_{\alpha/2,n-1}.$$
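As a small illustration of the procedure, here is a sketch in Python with SciPy; the sample size, sample variance, and $\sigma_0^2$ below are hypothetical, not taken from the text:

```python
# Two-tail chi-square test for H0: sigma^2 = sigma0^2, following the
# rejection rule above. All of the numbers below are hypothetical.
from scipy.stats import chi2

n, s2, sigma0_sq, alpha = 20, 30.0, 25.0, 0.05

stat = (n - 1) * s2 / sigma0_sq          # test statistic, Eq. (9.21)
q_low = chi2.ppf(alpha / 2, n - 1)       # chi^2_{1-alpha/2, n-1}
q_high = chi2.ppf(1 - alpha / 2, n - 1)  # chi^2_{alpha/2, n-1}

reject = stat < q_low or stat > q_high
print(stat, reject)
```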

Example 9.18 The following sample:

images

was generated by a pseudorandom variate generator; the underlying distribution was normal, with $\mu = 10$ and $\sigma = 20$. Now let us forget what we know and build a confidence interval for the standard deviation, with confidence level 95%. The sample standard deviation is

images

To apply Eq. (9.20) we need the following quantiles from the chi-square distribution with 9 degrees of freedom:

$$\chi^2_{0.975,9} = 2.7004, \qquad \chi^2_{0.025,9} = 19.0228.$$
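These quantile values can be reproduced with any statistical software; for instance, with SciPy:

```python
# The two quantiles of the chi-square distribution with 9 degrees of
# freedom used in the example, computed with SciPy.
from scipy.stats import chi2

q_low = chi2.ppf(0.025, 9)    # chi^2_{0.975, 9}: leaves 2.5% in the lower tail
q_high = chi2.ppf(0.975, 9)   # chi^2_{0.025, 9}: leaves 2.5% in the upper tail
print(round(q_low, 4), round(q_high, 4))
```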

These quantiles, unlike those from the normal distribution, are not symmetric and yield the confidence interval

images

Taking square roots, we notice that the confidence interval for the standard deviation

images

does not include the true value σ = 20, which may happen with probability 5%. If we want to test the null hypothesis

$$H_0: \sigma = 20$$

against the alternative hypothesis Ha : σ ≠ 20, with significance level α = 0.05, we calculate the test statistic according to Eq. (9.21):

images

This value looks pretty large. In fact, it exceeds $\chi^2_{0.025,9} = 19.0228$, and the null hypothesis is (incorrectly) rejected. Since this was a two-tail test, we could have equivalently observed that $\sigma_0 = 20$ was not included in the confidence interval above. Again, we must be aware that type I errors are a real possibility.

If we have to compare the variances of two populations, we should run a test such as

$$H_0: \sigma_1^2 = \sigma_2^2 \quad \text{against} \quad H_a: \sigma_1^2 \neq \sigma_2^2.$$

If both populations are normal and we take two independent samples of size $n_1$ and $n_2$, respectively, from Theorem 9.4 we know that

$$\frac{(n_1-1)S_1^2}{\sigma_1^2} \sim \chi^2_{n_1-1}, \qquad \frac{(n_2-1)S_2^2}{\sigma_2^2} \sim \chi^2_{n_2-1},$$

where $S_1^2$ and $S_2^2$ are the sample variances for the two samples. Hence, we have two independent chi-square variables. In Section 7.7.2 we have seen that the ratio of two independent chi-square variables, each divided by its degrees of freedom, is related to the F distribution; more precisely, $\dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$ has F distribution with $n_1-1$ and $n_2-1$ degrees of freedom. Then, under the null hypothesis, we have

$$F = \frac{S_1^2}{S_2^2} \sim F_{n_1-1,\,n_2-1}.$$

Using the familiar logic, we should run the following test:

Reject $H_0$ at significance level $\alpha$ if
$$F < F_{1-\alpha/2,\,n_1-1,\,n_2-1} \quad \text{or} \quad F > F_{\alpha/2,\,n_1-1,\,n_2-1}.$$
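The F test, too, admits a compact numerical sketch; the sample sizes and sample variances below are hypothetical illustration values:

```python
# Two-tail F test for equality of two normal variances, following the
# rejection rule above. All of the numbers below are hypothetical.
from scipy.stats import f

n1, n2 = 15, 18
s1_sq, s2_sq = 2.5, 1.8     # hypothetical sample variances
alpha = 0.10

F = s1_sq / s2_sq
q_low = f.ppf(alpha / 2, n1 - 1, n2 - 1)       # F_{1-alpha/2, n1-1, n2-1}
q_high = f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)  # F_{alpha/2, n1-1, n2-1}

reject = F < q_low or F > q_high
print(F, reject)
```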

Example 9.19 A reliable production process should yield items with low variability of key measures related to their quality. Imagine that we compare two technologies to check if they are significantly different in terms of such variability. A sample of $n_1 = 10$ items obtained by process 1 yields sample variance $S_1^2$, whereas process 2 yields $S_2^2$, for sample size $n_2 = 12$. Can we say that there is a significant difference in variance?

Let us choose α = 10%. The test statistic is

$$F = \frac{S_1^2}{S_2^2} = 0.5172.$$

By using suitable statistical software, we find the following quantiles of the F distribution with 9 and 11 degrees of freedom:

$$F_{0.95,\,9,\,11} = 0.3223, \qquad F_{0.05,\,9,\,11} = 2.8962.$$

Since $F = 0.5172 \in [0.3223, 2.8962]$, we cannot reject the null hypothesis. Superficially, it seems that there is quite a difference between the two variances, as one is almost twice the other, but the sample sizes are too small to draw a reliable conclusion. If we had $n_1 = 100$ and $n_2 = 120$, then we would use the quantiles

$$F_{0.95,\,99,\,119} \quad \text{and} \quad F_{0.05,\,99,\,119}$$

of the F distribution with 99 and 119 degrees of freedom. For samples of this size both quantiles are much closer to 1, $F = 0.5172$ falls below the lower one, and we could reject the null hypothesis.
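The conclusions of Example 9.19 can be checked numerically; the sketch below compares the F statistic 0.5172 from the example against the relevant quantiles for both sample-size scenarios:

```python
# Check of Example 9.19: the F statistic against the 5% and 95% quantiles,
# for the small (10, 12) and large (100, 120) sample sizes.
from scipy.stats import f

F = 0.5172
q_low_small = f.ppf(0.05, 9, 11)     # 0.3223 in the example
q_high_small = f.ppf(0.95, 9, 11)    # 2.8962 in the example
q_low_large = f.ppf(0.05, 99, 119)

print(q_low_small < F < q_high_small)  # small samples: cannot reject H0
print(F < q_low_large)                 # large samples: reject H0
```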

Once again, we see that if we have distributional results on relevant statistics, we can follow the usual drill to come up with confidence intervals, hypothesis tests, p-values, etc.

