Size and power of a test

In the elementary theory of hypothesis testing we consider a null hypothesis such as

$$H_0 : \mu = \mu_0$$

against the alternative Ha : μ ≠ μ0. Given a sample X = (X1, …, Xn), we consider a rejection region C related to the two tails of a standard normal or t distribution. In this case, it is quite easy to evaluate the probability of a type I error:

$$\alpha = P_{\mu_0}(X \in C)$$

This is just the probability that, even though the null hypothesis is true, the test statistic falls in the rejection region. The notation Pμ0 refers to the fact that we associate a probability distribution with the unique value μ0 considered in the null hypothesis. But which probability distribution should we consider when the null hypothesis is H0 : μ ≤ μ0?
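Before generalizing, the simple-null case can be checked numerically. The sketch below estimates the type I error probability by simulation, assuming a two-sided z-test on a normal sample with known standard deviation; the function names and the values of mu0, sigma, n, and alpha are illustrative assumptions, not taken from the text.

```python
import math
import random
from statistics import NormalDist, fmean


def rejects_two_sided_z(sample, mu0, sigma, alpha):
    """Return True if the two-sided z-test rejects H0: mu = mu0 (sigma assumed known)."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The rejection region C consists of the two tails |z| > z_crit.
    return abs(z) > z_crit


def estimate_type_one_error(mu0=10.0, sigma=2.0, n=25, alpha=0.05,
                            replications=20_000, seed=42):
    """Fraction of samples drawn under H0 (true mean = mu0) that fall in C."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(replications):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        rejections += rejects_two_sided_z(sample, mu0, sigma, alpha)
    return rejections / replications


print(estimate_type_one_error())
```

With the null hypothesis true by construction, the printed rejection frequency should be close to the nominal alpha, up to simulation noise.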

To generalize the analysis, let us consider a vector of parameters θ and a null hypothesis of the form

$$H_0 : \theta \in \Theta_0$$

versus the alternative

$$H_a : \theta \in \Theta_0^c$$

Here Θ0 is an arbitrary region, and $\Theta_0^c$ is its complement. The vector of parameters could be subject to additional restrictions; for instance, a subset of parameters could take only nonnegative values. Let us denote the set of feasible values of parameters by Θ; clearly, $\Theta_0 \subseteq \Theta$. In the first case above the set Θ0 is a singleton, Θ0 = {μ0}, and we speak of a simple hypothesis; otherwise, we have a composite hypothesis.

DEFINITION 9.17 (Size of a test) We say that a test with rejection region C has size α if

$$\max_{\theta \in \Theta_0} P_\theta(X \in C) = \alpha$$
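As a rough illustration of this definition, consider the composite null H0 : μ ≤ μ0 raised earlier. The sketch below assumes a one-sided z-test with known standard deviation that rejects when the standardized sample mean exceeds the 1 − α quantile of the standard normal; it evaluates the rejection probability on a finite grid of values in Θ0 and takes the largest. The function names, the grid, and the numerical values are illustrative assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rejection_probability(mu, mu0, sigma, n, alpha):
    """P_mu(X in C) for the one-sided z-test that rejects when z > z_{1-alpha}."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # Under a true mean mu, the test statistic is normal with mean `shift` and unit variance.
    shift = (mu - mu0) * math.sqrt(n) / sigma
    return 1 - STD_NORMAL.cdf(z_crit - shift)


def approximate_size(mu0, sigma, n, alpha, width=5.0, points=501):
    """Largest rejection probability over a finite grid in Theta_0 = (-inf, mu0]."""
    # The grid runs from mu0 (k = 0) down to mu0 - width.
    grid = [mu0 - width * k / (points - 1) for k in range(points)]
    return max(rejection_probability(mu, mu0, sigma, n, alpha) for mu in grid)


print(approximate_size(mu0=10.0, sigma=2.0, n=25, alpha=0.05))
```

Under these assumptions the rejection probability increases with μ, so the worst case within Θ0 is the boundary value μ0, and the size coincides with the significance level used to set the critical value.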

Of course, the size of a test is essentially another name for the significance level. From its definition, we see that α is related to the worst-case distribution in terms of type I errors. Since we want to be conservative, it is natural to give priority to type I errors by keeping α reasonably small. However, given a test size, we should find a test minimizing the probability of a type II error. If $\theta \in \Theta_0^c$, the probability of a type II error is

$$P_\theta(X \in C^c) = 1 - P_\theta(X \in C)$$

Note that

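$$P_\theta(X \in C) = \begin{cases} \text{probability of a type I error} & \text{if } \theta \in \Theta_0 \\ 1 - \text{probability of a type II error} & \text{if } \theta \in \Theta_0^c \end{cases}$$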

DEFINITION 9.18 (Power of a test) The power function for a test with rejection region C is a function of θ, defined as

$$\beta(\theta) = P_\theta(X \in C)$$

Note that the power of a test is a function, as it depends on the true value of the unknown parameter vector; hence, in a practical setting, we are not able to find a single value giving the power of the test. Still, we may observe that the ideal power function is 0 for θ ∈ Θ0 and 1 for $\theta \in \Theta_0^c$. This ideal power function cannot be obtained in practice, but we can look for tests that have maximal power when $\theta \in \Theta_0^c$. The theory of optimal, or most powerful, tests is developed in the context of mathematical statistics, and it relies on systematic ways to devise testing procedures. We outline one of them in the next section.
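To see how far an actual power function is from the ideal one, the following sketch evaluates the power of a two-sided z-test of H0 : μ = μ0 at several values of the true mean. It assumes a normal sample with known standard deviation; the function name and the numerical values are illustrative assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sided_z(mu, mu0, sigma, n, alpha):
    """P_mu(X in C) for the two-sided z-test of H0: mu = mu0 (sigma assumed known)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Under a true mean mu, the test statistic is normal with mean delta and unit variance.
    delta = (mu - mu0) * math.sqrt(n) / sigma
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - delta)   # P(Z > z_crit)
    lower_tail = STD_NORMAL.cdf(-z_crit - delta)      # P(Z < -z_crit)
    return upper_tail + lower_tail


# Tabulate the power function on a few candidate values of the true mean.
for mu in (9.0, 9.5, 10.0, 10.5, 11.0):
    print(mu, round(power_two_sided_z(mu, mu0=10.0, sigma=2.0, n=25, alpha=0.05), 4))
```

At μ = μ0 the function equals α rather than 0, and it approaches 1 only as the true mean moves away from μ0, which is why a single number cannot summarize the power of the test.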

