Testing with p-values

In the manufacturing example of the previous section we found such a large value for the test statistic that we are quite confident that the null hypothesis should be rejected, whatever significance level we choose. In other cases, finding a suitable value of α can be tricky. Recall that the larger the value of α, the easier it is to reject the null hypothesis. This happens because the rejection region, whether one- or two-tail, increases with α. We could find a case in which we “accept” the null hypothesis if α = 0.05, but we reject it if α = 0.06. This is clearly a critical situation, because the right confidence level is nowhere engraved on a rock. A useful concept from this perspective is the p-value.
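The dependence of the rejection threshold on α can be seen numerically. A minimal sketch using SciPy's t-quantile function (here for a one-tail test with n = 30, the sample size used in the next example):

```python
from scipy.stats import t as t_dist  # Student's t distribution

df = 29  # degrees of freedom, n - 1 for n = 30
for alpha in (0.01, 0.05, 0.06, 0.10):
    # one-tail critical value t_{1-alpha, n-1}; it shrinks as alpha grows,
    # so a larger alpha makes it easier to reject the null hypothesis
    crit = t_dist.ppf(1 - alpha, df)
    print(f"alpha = {alpha:.2f}: reject H0 if TS > {crit:.4f}")
```

The printed thresholds decrease as α increases, which is exactly why a test may fail to reject at α = 0.05 yet reject at α = 0.06.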

It is easier to understand the concept by referring to a one-tail test of H0: μ ≤ μ0 vs. Ha: μ > μ0. The rejection region, as shown in Fig. 9.5(b), is the right tail. If the value of the test statistic is TS = t, the p-value is defined as

p = P(Tn−1 > t)    (9.16)

where Tn−1 is a t variable with n − 1 degrees of freedom. It is important to realize that we compute the p-value after having observed the sample: The random variable TS has been realized, and its numeric value is t. It is easy to see that we would reject the null hypothesis for any significance level α > p, and we would fail to reject it for any value α < p. Hence, calculating a p-value is a way to draw the line between rejection and failure to reject.
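Numerically, this one-tail p-value is just the upper-tail probability of the t distribution at the realized statistic t. A minimal sketch using SciPy's survival function (the helper name one_tail_p is ours, not a standard API):

```python
import math

from scipy.stats import t as t_dist  # Student's t distribution


def one_tail_p(xbar, mu0, s, n):
    """One-tail p-value for H0: mu <= mu0 vs. Ha: mu > mu0."""
    ts = (xbar - mu0) / (s / math.sqrt(n))  # realized test statistic t
    return t_dist.sf(ts, n - 1)             # survival function: P(T_{n-1} > t)
```

Comparing the returned p with a chosen α then draws the line between rejection (α > p) and failure to reject (α < p).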

Example 9.15 Let us consider again the manufacturing case above, where H0: μ ≤ 1250, Ha: μ > 1250, S = 70, and n = 30. But now let us assume that the sample mean turns out to be X̄ = 1260. In this case, the value of the test statistic TS is

TS = (1260 − 1250) / (70/√30) = 0.7825

Table 9.3 Hypothesis testing about the mean of a normal population, when variance is unknown (TS = test statistic; α = significance level).

H0: μ = μ0,  Ha: μ ≠ μ0:  reject H0 if |TS| > t1−α/2, n−1
H0: μ ≤ μ0,  Ha: μ > μ0:  reject H0 if TS > t1−α, n−1
H0: μ ≥ μ0,  Ha: μ < μ0:  reject H0 if TS < −t1−α, n−1

where TS = (X̄ − μ0)/(S/√n).

By using software or statistical tables of the CDF for a t distribution with 29 degrees of freedom, we obtain P(Tn−1 ≤ 0.7825) = 0.7799. Hence

p = P(T29 > 0.7825) = 1 − 0.7799 = 0.2201

This means that, in order to reject, we should accept a large probability, 22.01%, of committing a type I error. This seems too large; indeed, the difference between 1260 and 1250 does not seem that significant for this sample size and this variability.

If the sample mean were X̄ = 1290, repeating the above calculation would yield p = 0.002. Hence, we would reject the null hypothesis for any significance level α > 0.2%. This is strong evidence that the difference between 1290 and 1250 is statistically significant and cannot be attributed to sampling variability alone. That said, we should pause and reflect on the difference between what is statistically significant and what is significant from a business perspective. If the values we are comparing refer to the useful life of some product, we should not take it for granted that the typical customer will notice the difference. The decision whether to switch to the new manufacturing process depends on the cost of the improved process relative to the old one, and on how aware customers are of the difference.
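Both calculations in Example 9.15 can be reproduced directly. A minimal sketch using SciPy, with the data from the text (μ0 = 1250, S = 70, n = 30):

```python
import math

from scipy.stats import t as t_dist  # Student's t distribution

mu0, s, n = 1250, 70, 30
df = n - 1

for xbar in (1260, 1290):                   # the two sample means considered above
    ts = (xbar - mu0) / (s / math.sqrt(n))  # test statistic
    p = t_dist.sf(ts, df)                   # one-tail p-value P(T_29 > ts)
    print(f"xbar = {xbar}: TS = {ts:.4f}, p-value = {p:.4f}")
```

This reproduces p ≈ 0.2201 for X̄ = 1260 and p ≈ 0.002 for X̄ = 1290, matching the values in the text.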

To complete the overall picture, with two-sided tests we reject in two cases: when TS < −t1−α/2, n−1 and when TS > t1−α/2, n−1. Hence, the p-value is the probability that the absolute value of a random variable Tn−1 is larger than the absolute value of the test statistic. Exploiting the symmetry of the t distribution, we can write this probability as

p = P(|Tn−1| > |t|) = 2P(Tn−1 > |t|)    (9.17)

when TS = t. The remaining one-tail test is easy to figure out, and Table 9.3 summarizes what we have found so far about hypothesis testing.
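For the two-sided test, the symmetry argument above amounts to doubling the one-tail probability. A minimal sketch (again, the helper name two_tail_p is ours):

```python
import math

from scipy.stats import t as t_dist  # Student's t distribution


def two_tail_p(xbar, mu0, s, n):
    """Two-tail p-value for H0: mu = mu0 vs. Ha: mu != mu0."""
    ts = (xbar - mu0) / (s / math.sqrt(n))  # realized test statistic t
    return 2 * t_dist.sf(abs(ts), n - 1)    # 2 P(T_{n-1} > |t|)
```

With the data of Example 9.15 and X̄ = 1260, this gives twice the one-tail value, about 0.44, so the two-sided test is even further from rejecting.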

A remark on p-values In closing this section, it is important to point out a common misunderstanding related to p-values, due to an incorrect reading of (9.16) and (9.17). Since p-values are evaluated using probabilities, it is tempting to regard them as probabilities. However, this is wrong: p-values are random variables, not probabilities. True, we do calculate p-values using probabilities, but they depend on the numerical realization t of the test statistic TS, which is a random variable. If we take two random samples, we will find different p-values. So, we cannot regard them as probabilities of type I errors, which are given by the significance level α, specified before taking the sample. What p-values do provide is a feeling for statistical significance. When a p-value is very small, there is really strong evidence against the null hypothesis. In fact, many statistical software tools print something like P > |t| = 0.000 or Corresponding p-value < 0.005. These indicate strong reasons for rejecting the null hypothesis, as the test statistic falls in the far tails of the distribution. A large p-value suggests that we would need a large value of α to reject the null hypothesis, implying a large probability of committing a type I error. Since this is not safe, in such a case it is wise to admit that we cannot reject H0.

