Sometimes, we have to run a test concerning two (or more) populations. For instance, we could wonder whether two markets for a given product are really different in terms of expected demand. Alternatively, after the re-engineering of a business process, we could wonder whether the new performance measures are significantly different from the old ones. In both cases, the rationalization of the problem calls for assessing the difference between two expected values, μ1 − μ2, where μ1 and μ2 are the expected values of two random variables. As we have seen, finding a confidence interval and hypothesis testing are both related to the distributional properties of a relevant statistic. Given our aim, it is quite natural to take samples from the two populations, with respective sizes n1 and n2, and exploit the statistic

$$\bar{X}_1 - \bar{X}_2, \tag{9.18}$$

i.e., the difference between the two sample means. Exactly what we should do depends on a number of questions:
- Is the number of observations, from both populations, large or small?
- Are the two variances known? If they are not, can we assume that they are equal?
- Are the samples from the two populations independent?
- Are the two populations normal?
In the following, we consider a subset of the possible cases, assuming normal populations, in order to illustrate the issues involved.
The case of large and independent samples
If the two samples are both large and mutually independent, the statistic (9.18) is, at least approximately, normally distributed. If the populations are normal, it is in fact exactly normally distributed. Furthermore, independence allows us to estimate the standard deviation of the difference by

$$\hat{\sigma}_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}, \tag{9.19}$$

where S1² and S2² are the two sample variances. The standardized statistic

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

is (approximately) standard normal. Applying the same reasoning that we have used with a single normal population, the following confidence interval can be built:

$$(\bar{X}_1 - \bar{X}_2) \pm z_{1-\alpha/2} \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}.$$
On the basis of these estimates, it is also easy to test if the two populations are significantly different; in this case, the test boils down to checking whether the origin lies within the confidence interval.
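As a minimal sketch of the procedure just outlined (not part of the original text), the following Python fragment computes the standardized statistic, a two-sided p-value, and a confidence interval for μ1 − μ2 from summary data; the function name and the illustrative inputs are our own, and scipy is assumed to be available for the normal quantile and CDF.

```python
import math
from scipy.stats import norm  # assumed available; used only for normal quantile/CDF

def two_mean_z_test(mean1, s1, n1, mean2, s2, n2, alpha=0.05):
    """Large-sample test and CI for mu1 - mu2 with independent samples."""
    diff = mean1 - mean2
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # estimated std. dev. of the difference, Eq. (9.19)
    z = diff / se                             # standardized statistic under H0: mu1 - mu2 = 0
    p_value = 2 * (1 - norm.cdf(abs(z)))      # two-sided p-value
    q = norm.ppf(1 - alpha / 2)               # quantile z_{1-alpha/2}
    ci = (diff - q * se, diff + q * se)
    return z, p_value, ci

# Illustrative (hypothetical) summary data
print(two_mean_z_test(31000, 3000, 55, 28000, 5000, 65))
```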
Example 9.16 We want to compare the average yearly wage for two groups of professionals. Two independent samples are taken, with sizes n1 = 55 and n2 = 65, respectively; the observed sample means are $\bar{X}_1$ and $\bar{X}_2$, and the sample standard deviations are S1 = €3,000 and S2 = €5,000. Can we say that the observed difference is statistically significant?
The first step is calculating the estimated standard deviation of the difference according to Eq. (9.19):

$$\hat{\sigma}_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{3000^2}{55} + \frac{5000^2}{65}} \approx 740.44.$$

Then, we specify the null hypothesis

$$H_0: \mu_1 - \mu_2 = 0$$

and the alternative

$$H_a: \mu_1 - \mu_2 \neq 0.$$

Accordingly, the test statistic is

$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{740.44}.$$

Since Z > 3, the null hypothesis is rejected at any sensible significance level. For instance, if we select α = 5%, the relevant quantile is z0.975 = 1.96, which is smaller than the test statistic. If we want a 95% confidence interval for the difference between the two population means, we obtain

$$(\bar{X}_1 - \bar{X}_2) \pm z_{0.975}\, \hat{\sigma}_{\bar{X}_1 - \bar{X}_2} = (\bar{X}_1 - \bar{X}_2) \pm 1.96 \times 740.44 = (\bar{X}_1 - \bar{X}_2) \pm 1451.26.$$
We see that 0 is not included in the confidence interval, suggesting again that the difference is statistically significant.
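For reference, the arithmetic in this example can be checked with a few lines of Python, using only the given sample sizes and standard deviations (the sample means are not needed for the standard error and the interval half-width):

```python
import math

s1, n1 = 3000.0, 55   # first group: sample standard deviation and size
s2, n2 = 5000.0, 65   # second group
se = math.sqrt(s1**2 / n1 + s2**2 / n2)
print(round(se, 2))          # estimated std. dev. of the difference: about 740.44
print(round(1.96 * se, 2))   # half-width of the 95% confidence interval: about 1451.26
```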
The case of small and independent samples
With small samples (say, n1, n2 < 30), the procedure is not so simple, unless we know the two population variances σ1² and σ2². Since this is hardly ever the case, we cannot rely on the normality of the test statistic. A relatively easy case occurs when we may assume that the variances of the two populations are the same, since this allows us to pool the observations to estimate the common standard deviation as

$$S_p = \sqrt{\frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}}.$$

Note that the pooled estimator is based on a weighted combination of the two sample variances, where the weights are related to the respective degrees of freedom, n1 − 1 and n2 − 1. Then, we use the estimated standard deviation of the statistic (9.18), i.e.,

$$S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

to build the confidence interval

$$(\bar{X}_1 - \bar{X}_2) \pm t_{1-\alpha/2,\, n_1+n_2-2}\, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
We may test hypotheses in a similar way. Here, we rely on a t distribution, which requires that the two populations be normal, and the total number of degrees of freedom is n1 + n2 − 2, since we estimate two means.
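A minimal sketch of the pooled procedure, under the equal-variance assumption, is given below; the helper name and the observations are hypothetical, and scipy.stats.ttest_ind with equal_var=True performs the corresponding test.

```python
import math
from scipy import stats

def pooled_t_interval(x, y, alpha=0.05):
    """CI for mu1 - mu2 with small independent samples and a common variance."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    s1_sq = sum((v - m1) ** 2 for v in x) / (n1 - 1)   # sample variances
    s2_sq = sum((v - m2) ** 2 for v in y) / (n2 - 1)
    # Pooled estimate: sample variances weighted by their degrees of freedom
    sp = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    se = sp * math.sqrt(1 / n1 + 1 / n2)               # std. dev. of the mean difference
    q = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2)     # t quantile, n1 + n2 - 2 dof
    diff = m1 - m2
    return diff - q * se, diff + q * se

x = [10.2, 9.8, 11.1, 10.5, 9.9]        # hypothetical observations
y = [9.1, 9.6, 10.0, 8.8, 9.4, 9.7]
print(pooled_t_interval(x, y))
print(stats.ttest_ind(x, y, equal_var=True))  # library counterpart of the pooled t test
```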
If the two variances are different, we could again try to resort to the t distribution, at least as a reasonable approximation, but it is not clear how many degrees of freedom we should use. A (nontrivial) distributional result justifies the following estimate:

$$\nu = \frac{\left( \dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2} \right)^2}{\dfrac{\left( S_1^2 / n_1 \right)^2}{n_1 - 1} + \dfrac{\left( S_2^2 / n_2 \right)^2}{n_2 - 1}}.$$

Since in general ν is not an integer, we may round it down (which makes sense, because with fewer degrees of freedom a confidence interval is wider and more conservative) and build the confidence interval

$$(\bar{X}_1 - \bar{X}_2) \pm t_{1-\alpha/2,\, \nu} \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}.$$
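This estimate is commonly known as the Welch–Satterthwaite approximation; a short sketch of the degrees-of-freedom computation (the function name is ours) may help:

```python
import math

def approx_dof(s1_sq, n1, s2_sq, n2):
    """Approximate dof for the t distribution when the two variances differ."""
    a, b = s1_sq / n1, s2_sq / n2
    nu = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return math.floor(nu)   # round down: fewer dof gives a wider, more conservative interval

# Example: sample variances 4.0 and 9.0 with 12 and 15 observations, respectively
print(approx_dof(4.0, 12, 9.0, 15))
```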
The case of paired observations: paired t testing
All of the procedures described above rely on the independence between the two samples. Now assume, on the contrary, that the samples are strictly related. Such a case occurs when the observations are actually paired. For instance, assume that we sample random financial scenarios, indexed by k, and we evaluate the performance of two portfolio management policies on each scenario, resulting in observations X1k and X2k. In this case, we cannot say that the two observations are independent; arguably, both policies could result in a bad performance when applied in a recessionary scenario. However, if we are just interested in checking whether one of the two policies has a significant advantage over the other, we can work directly with the observed differences

$$D_k = X_{1k} - X_{2k}, \qquad k = 1, \ldots, n,$$
Table 9.4 Testing the effectiveness of a preventive maintenance policy.
and the statistics

$$\bar{D} = \frac{1}{n} \sum_{k=1}^{n} D_k, \qquad S_D^2 = \frac{1}{n - 1} \sum_{k=1}^{n} \left( D_k - \bar{D} \right)^2.$$
We see that, by pairing observations, we are back to the case of a single population, and a confidence interval for the difference is

$$\bar{D} \pm t_{1-\alpha/2,\, n-1} \frac{S_D}{\sqrt{n}}.$$
We may also test the difference, running a test which is aptly called the paired t test.
Example 9.17 A large corporation runs 10 production plants around the world, which suffer from excessive downtime, i.e., time wasted because of machine breakdowns. A new preventive maintenance policy is applied, and the corporation would like to check whether it has been effective. To this aim, the data illustrated in Table 9.4 are collected. For each plant, we know the monthly number of hours actually lost, before and after the introduction of the new preventive maintenance policy. The last column shows the reduction in lost hours, where a negative sign implies that more production capacity was actually lost after the new policy was implemented. Of course, this may occasionally happen and does not imply that the policy is ineffective. If we denote by μ1 the expected lost hours before and by μ2 the expected lost hours after changing the maintenance process, we should test the null hypothesis

$$H_0: \mu_1 - \mu_2 \le 0$$

versus the alternative

$$H_a: \mu_1 - \mu_2 > 0.$$

The test statistic is

$$TS = \frac{\bar{D}}{S_D / \sqrt{n}},$$

where the differences Dk are the reductions reported in the last column of Table 9.4.
It is easy to see that we cannot reject the null hypothesis at any safe significance level. For instance, if we choose α = 0.1, the relevant quantile is t0.9,9 = 1.3830. The rejection region is the right tail, but TS < 1.3830. Hence, the p-value, P(T9 ≥ TS), is larger than 10%,
which is definitely too large to conclude that the new maintenance procedure has been effective.
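Since the figures in Table 9.4 are not reproduced here, the following sketch only shows how the paired t test of Example 9.17 could be carried out on before/after data of the same shape; the downtime values below are placeholders, not the actual table entries.

```python
import math
from scipy import stats

# Placeholder monthly downtimes (hours lost) for 10 plants, before and after the new policy
before = [120, 95, 110, 88, 132, 105, 98, 141, 90, 115]
after  = [112, 97, 101, 90, 125, 100, 99, 130, 92, 108]

diffs = [b - a for b, a in zip(before, after)]   # reductions in lost hours
n = len(diffs)
d_bar = sum(diffs) / n                                           # sample mean of the differences
s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))  # sample std. dev. of the differences
ts = d_bar / (s_d / math.sqrt(n))                                # paired t statistic, n - 1 dof
p_value = 1 - stats.t.cdf(ts, df=n - 1)                          # one-sided (right-tail) p-value
print(ts, p_value)

# Library counterpart (two-sided by default): stats.ttest_rel(before, after)
```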