Adapting statistical inference procedures

The core topics in statistical inference are point and interval parameter estimation, hypothesis testing, and analysis of variance. Some of the related procedures are conceptually easy to adapt to a multivariate setting. For instance, maximum likelihood estimation is not conceptually different, even though it proves computationally more challenging, requiring numerical optimization methods to maximize the likelihood function. In other cases, things are not that easy, and we may need to introduce new classes of multivariate probability distributions to characterize the data. The following example shows that a straightforward extension of single-variable (univariate) methods may not be appropriate.

Fig. 15.2 A bidimensional hypothesis test.

Example 15.1 Let us consider a hypothesis test concerning the mean of a multivariate probability distribution. We want to check the hypothesis that the expected value of a jointly normal random vector Y ∼ N(μ, Σ) is μ0. Therefore, we test the null hypothesis

\[ H_0 : \mu = \mu_0 \]

against the alternative hypothesis

\[ H_a : \mu \neq \mu_0 \]

The random vector Y has components Y1 and Y2; let us denote the two components of the vector μ0 by μ01 and μ02, respectively. One possible approach would be to calculate the sample mean of each component, $\bar{Y}_1$ and $\bar{Y}_2$, and run two univariate tests, one for μ01 and one for μ02. More precisely, we could test the null hypothesis H0 : μ1 = μ01 on the basis of the sample mean $\bar{Y}_1$, and H0 : μ2 = μ02 on the basis of the sample mean $\bar{Y}_2$. Then, if we reject either of the two univariate null hypotheses, we reject the multivariate hypothesis as well.

Unfortunately, this approach may fail, as illustrated in Fig. 15.2. The figure shows an ellipse, which is a level curve of the joint PDF, and two possible sample means, corresponding to the vectors $\bar{\mathbf{Y}}_a$ and $\bar{\mathbf{Y}}_b$. The rotation of the ellipse corresponds to a positive correlation between Y1 and Y2. If we account for the nature of the PDF of jointly normal variables (see Section 8.4), it turns out that the acceptance region for the test above should be an ellipse. Let us assume that the ellipse in Fig. 15.2 is the acceptance region for the test; then, in the case of $\bar{\mathbf{Y}}_a$, the null hypothesis should be rejected, whereas in the case of $\bar{\mathbf{Y}}_b$ we do not reject H0. By contrast, testing the two hypotheses separately results in a rectangular acceptance region around μ0 (a square, if the two standard deviations are equal). Note that, along each dimension, the distance between μ0 and $\bar{\mathbf{Y}}_a$ and the distance between μ0 and $\bar{\mathbf{Y}}_b$ are exactly the same in absolute value. Hence, if we run separate tests, we either reject or accept the null hypothesis for both sample means, which is not correct. The point is that a rectangular acceptance region, unlike the elliptical one in Fig. 15.2, does not account for correlation.
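To see the geometry numerically, here is a minimal Python sketch of the effect described above; all numbers (the covariance matrix, μ0, and the two points standing in for $\bar{\mathbf{Y}}_a$ and $\bar{\mathbf{Y}}_b$) are hypothetical choices meant only to mimic the situation in Fig. 15.2. The squared Mahalanobis distance computed below is exactly what an elliptical acceptance region measures: a larger value places the point on a more extreme level curve of the joint PDF.

```python
import numpy as np

# Hypothetical setup mimicking Fig. 15.2: positive correlation tilts the
# level curves (ellipses) of the joint PDF along the main diagonal.
mu0 = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

Y_a = np.array([1.5, -1.5])   # against the ellipse's long axis
Y_b = np.array([1.5,  1.5])   # along the ellipse's long axis

for name, ybar in (("Y_a", Y_a), ("Y_b", Y_b)):
    diff = ybar - mu0
    d2 = diff @ Sigma_inv @ diff   # squared Mahalanobis distance from mu0
    print(f"{name}: squared Mahalanobis distance = {d2:.2f}")

# Y_a yields 22.5 while Y_b yields 2.5: Y_a lies far outside any elliptical
# acceptance region containing Y_b, even though |Y1 - mu01| and |Y2 - mu02|
# are identical for both points. A rectangular region cannot tell them apart.
```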

Incidentally, a proper analysis of the testing problem in Example 15.1 leads us to consider the test statistic

\[ T^2 = n \, (\bar{Y} - \mu_0)^{\mathsf{T}} \, S^{-1} \, (\bar{Y} - \mu_0) \]

where $\bar{Y}$ is the vector of sample means and $S^{-1}$ is the inverse of the sample covariance matrix. Hence, the distributional results for univariate statistics, which involve a t distribution, must be extended by introducing a new class of probability distributions, namely, Hotelling's $T^2$ distribution.
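To make the mechanics concrete, the following sketch implements the test in Python; the helper name hotelling_t2_test and the simulated data are hypothetical, not taken from the text. It relies on the standard distributional result that, under H0, the rescaled statistic $(n - p)\,T^2 / \big(p\,(n - 1)\big)$ follows an F distribution with p and n − p degrees of freedom, where n is the sample size and p the dimension.

```python
import numpy as np
from scipy import stats

def hotelling_t2_test(Y, mu0, alpha=0.05):
    """Test H0: mu = mu0 with Hotelling's T^2 statistic.

    Y is an (n, p) array of observations; mu0 is a length-p vector.
    """
    n, p = Y.shape
    ybar = Y.mean(axis=0)
    S = np.cov(Y, rowvar=False)                 # sample covariance matrix
    diff = ybar - mu0
    t2 = n * diff @ np.linalg.solve(S, diff)    # T^2 = n (Ybar - mu0)' S^-1 (Ybar - mu0)
    # Under H0: (n - p) T^2 / (p (n - 1)) ~ F(p, n - p)
    f_stat = (n - p) * t2 / (p * (n - 1))
    p_value = stats.f.sf(f_stat, p, n - p)
    return t2, p_value, p_value < alpha

# Hypothetical usage with simulated correlated bivariate normal data
rng = np.random.default_rng(42)
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
Y = rng.multivariate_normal(mean=[0.3, 0.3], cov=Sigma, size=50)
t2, p_value, reject = hotelling_t2_test(Y, mu0=np.array([0.0, 0.0]))
print(f"T^2 = {t2:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Note how the quadratic form in $S^{-1}$ plays the same role as the elliptical acceptance region of Example 15.1: it weights deviations from μ0 according to the estimated variances and correlations, rather than treating each component in isolation.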

