It is easy to prove that the sample variance $S^2$ is an unbiased estimator of the variance $\sigma^2$, but if we want a confidence interval for the variance, we need distributional results on $S^2$, which depend on the underlying population. For a normal population we may take advantage of Theorem 9.4. In particular, we recall that the sample variance is related to the chi-square distribution as follows:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},$$
where $\chi^2_{n-1}$ denotes a chi-square distribution with $n-1$ degrees of freedom. The conceptual path we should follow is the same as the one we followed for the expected value, with a slight difference: unlike the t distribution, the chi-square distribution is not symmetric and has only positive support (see Fig. 7.18). To build a confidence interval with confidence level $(1-\alpha)$, we need the two quantiles $\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$, defined by the conditions

$$P\left\{X \le \chi^2_{\alpha/2,\,n-1}\right\} = \frac{\alpha}{2}, \qquad P\left\{X \le \chi^2_{1-\alpha/2,\,n-1}\right\} = 1 - \frac{\alpha}{2},$$

where $X \sim \chi^2_{n-1}$. Both quantiles are positive, and $\chi^2_{\alpha/2,\,n-1} < \chi^2_{1-\alpha/2,\,n-1}$. Then we have

$$P\left\{\chi^2_{\alpha/2,\,n-1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{1-\alpha/2,\,n-1}\right\} = 1 - \alpha,$$
which after rearranging leads to the confidence interval

$$\frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}}. \tag{9.20}$$
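As an illustration, here is a minimal Python sketch of interval (9.20), assuming NumPy and SciPy are available; the function name `variance_confidence_interval` and the sample data are made up for the example.

```python
import numpy as np
from scipy.stats import chi2

def variance_confidence_interval(sample, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for the variance of a
    normal population, based on (n - 1) * S^2 / sigma^2 ~ chi2(n - 1)."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    s2 = x.var(ddof=1)                        # sample variance S^2
    lo_q = chi2.ppf(alpha / 2, df=n - 1)      # chi2_{alpha/2, n-1}
    hi_q = chi2.ppf(1 - alpha / 2, df=n - 1)  # chi2_{1-alpha/2, n-1}
    return (n - 1) * s2 / hi_q, (n - 1) * s2 / lo_q

# Example with arbitrary data (not the sample used later in the text):
print(variance_confidence_interval([9.2, 11.5, 8.7, 10.1, 12.3,
                                    9.8, 10.6, 11.0, 8.9, 10.4]))
```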
We may also test a hypothesis about variance, just as we did for the expected value:

$$H_0: \sigma^2 = \sigma_0^2 \qquad \text{against} \qquad H_a: \sigma^2 \neq \sigma_0^2.$$
The distributional result above implies that, under the null hypothesis, we have

$$\chi^2 = \frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1}. \tag{9.21}$$
From Section 7.7.2 we also recall that the expected value of a chi-square variable with $n-1$ degrees of freedom is $n-1$. Then, if the test statistic takes a value considerably smaller or considerably larger than $n-1$, we should question the null hypothesis. The test procedure is the following: reject $H_0$ at significance level $\alpha$ if

$$\chi^2 < \chi^2_{\alpha/2,\,n-1} \qquad \text{or} \qquad \chi^2 > \chi^2_{1-\alpha/2,\,n-1}.$$
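The corresponding two-tailed test can be sketched in the same way; `chi2_variance_test` is a hypothetical helper, not a SciPy function.

```python
import numpy as np
from scipy.stats import chi2

def chi2_variance_test(sample, sigma0_sq, alpha=0.05):
    """Two-tailed test of H0: sigma^2 = sigma0_sq for a normal sample,
    based on the statistic chi2 = (n - 1) * S^2 / sigma0_sq."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    stat = (n - 1) * x.var(ddof=1) / sigma0_sq
    lo = chi2.ppf(alpha / 2, df=n - 1)
    hi = chi2.ppf(1 - alpha / 2, df=n - 1)
    reject = stat < lo or stat > hi
    return stat, reject
```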
Example 9.18 The following sample:
was generated by a pseudorandom variate generator; the underlying distribution was normal, with $\mu = 10$ and $\sigma = 20$. Now let us forget what we know and build a confidence interval for the standard deviation, with confidence level 95%. The sample standard deviation is
To apply Eq. (9.20) we need the following quantiles from the chi-square distribution with 9 degrees of freedom:

$$\chi^2_{0.025,\,9} = 2.7004, \qquad \chi^2_{0.975,\,9} = 19.0228.$$
These quantiles, unlike those from the normal distribution, are not symmetric and yield the confidence interval
Taking square roots, we notice that the confidence interval for the standard deviation
does not include the true value $\sigma = 20$, which may happen with probability 5%. If we want to test the null hypothesis $H_0: \sigma = 20$
against the alternative hypothesis $H_a: \sigma \neq 20$, with significance level $\alpha = 0.05$, we calculate the test statistic according to Eq. (9.21):
This value looks pretty large. In fact, it exceeds the upper quantile $\chi^2_{0.975,\,9} = 19.0228$, and the null hypothesis is (incorrectly) rejected. Since this was a two-tail test, we could have equivalently observed that $\sigma_0 = 20$ is not included in the confidence interval above. Again, we must be aware that type I errors are a real possibility.
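The original sample is not reproduced above, so the following sketch simply regenerates ten observations from a normal distribution with $\mu = 10$ and $\sigma = 20$ (with an arbitrary seed) and repeats the computations; the numbers will differ from those in the example, but the mechanics are identical.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(42)                  # arbitrary seed, for illustration only
sample = rng.normal(loc=10.0, scale=20.0, size=10)

n = sample.size
s2 = sample.var(ddof=1)
lo_q, hi_q = chi2.ppf([0.025, 0.975], df=n - 1)  # 2.7004 and 19.0228

# 95% confidence interval for sigma (square roots of the variance interval)
ci_sigma = (np.sqrt((n - 1) * s2 / hi_q), np.sqrt((n - 1) * s2 / lo_q))

# Test statistic for H0: sigma = 20, rejected if outside [lo_q, hi_q]
stat = (n - 1) * s2 / 20.0 ** 2
print(ci_sigma, stat, stat < lo_q or stat > hi_q)
```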
If we have to compare the variances of two populations, we should run a test such as

$$H_0: \sigma_1^2 = \sigma_2^2 \qquad \text{against} \qquad H_a: \sigma_1^2 \neq \sigma_2^2.$$
If both populations are normal and we take two independent samples of sizes $n_1$ and $n_2$, respectively, from Theorem 9.4 we know that

$$\frac{(n_1-1)S_1^2}{\sigma_1^2} \sim \chi^2_{n_1-1}, \qquad \frac{(n_2-1)S_2^2}{\sigma_2^2} \sim \chi^2_{n_2-1},$$
where $S_1^2$ and $S_2^2$ are the sample variances for the two samples. Hence, we have two independent chi-square variables. In Section 7.7.2 we have seen that the ratio of two independent chi-square variables, each divided by its degrees of freedom, is related to the F distribution; more precisely,

$$\frac{\chi^2_{n_1-1}/(n_1-1)}{\chi^2_{n_2-1}/(n_2-1)}$$

has F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. Then, under the null hypothesis, we have

$$F = \frac{S_1^2}{S_2^2} \sim F_{n_1-1,\,n_2-1}.$$
Using the familiar logic, we should run the following test: reject $H_0$ at significance level $\alpha$ if

$$F < F_{\alpha/2,\,n_1-1,\,n_2-1} \qquad \text{or} \qquad F > F_{1-\alpha/2,\,n_1-1,\,n_2-1}.$$
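A minimal sketch of this test in Python, again assuming SciPy is available; `f_variance_test` is a made-up helper name.

```python
import numpy as np
from scipy.stats import f

def f_variance_test(sample1, sample2, alpha=0.10):
    """Two-tailed F test of H0: sigma1^2 = sigma2^2 for two independent
    normal samples, using F = S1^2 / S2^2 ~ F(n1 - 1, n2 - 1) under H0."""
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)
    stat = x1.var(ddof=1) / x2.var(ddof=1)
    d1, d2 = x1.size - 1, x2.size - 1
    lo = f.ppf(alpha / 2, d1, d2)        # F_{alpha/2, n1-1, n2-1}
    hi = f.ppf(1 - alpha / 2, d1, d2)    # F_{1-alpha/2, n1-1, n2-1}
    return stat, (stat < lo or stat > hi)
```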
Example 9.19 A reliable production process should yield items with low variability of key measures related to their quality. Imagine that we compare two technologies to check if they are significantly different in terms of such variability. A sample of $n_1 = 10$ items obtained by process 1 yields a sample variance $S_1^2$, whereas process 2 yields $S_2^2$, for a sample size $n_2 = 12$. Can we say that there is a significant difference in variance?
Let us choose $\alpha = 10\%$. The test statistic is

$$F = \frac{S_1^2}{S_2^2} = 0.5172.$$
By using suitable statistical software, we find the following quantiles of the F distribution with 9 and 11 degrees of freedom:

$$F_{0.05,\,9,\,11} = 0.3223, \qquad F_{0.95,\,9,\,11} = 2.8962.$$
Since $F = 0.5172 \in [0.3223, 2.8962]$, we cannot reject the null hypothesis. Superficially, it seems that there is quite a difference between the two variances, as one is almost twice as large as the other, but the sample sizes are too small to draw a reliable conclusion. If we had $n_1 = 100$ and $n_2 = 120$, then we would use the quantiles of the F distribution with 99 and 119 degrees of freedom; these are much closer to 1, and we could reject the null hypothesis.
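The effect of the larger samples can be checked directly by comparing the quantiles for the two pairs of degrees of freedom, for instance with SciPy:

```python
from scipy.stats import f

for d1, d2 in [(9, 11), (99, 119)]:
    lo, hi = f.ppf([0.05, 0.95], d1, d2)
    # With (9, 11) degrees of freedom the acceptance region is roughly
    # [0.32, 2.90]; with (99, 119) it shrinks toward 1, so F = 0.5172
    # would fall outside it and H0 would be rejected.
    print(d1, d2, round(lo, 4), round(hi, 4))
```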
Once again, we see that if we have distributional results on relevant statistics, we can follow the usual drill to come up with confidence intervals, hypothesis tests, p-values, etc.