Adapting statistical inference procedures

The core topics in statistical inference are point and interval parameter estimation, hypothesis testing, and analysis of variance. Some of the related procedures are conceptually easy to adapt to a multivariate setting. For instance, maximum likelihood estimation is not conceptually different, even though it proves computationally more challenging, requiring numerical optimization methods to maximize the likelihood function. In other cases, things are not that easy, and we may need to introduce new classes of multivariate probability distributions to characterize the data. The following example shows that a straightforward extension of single-variable (univariate) methods may not be appropriate.

Fig. 15.2 A bidimensional hypothesis test.

Example 15.1 Let us consider a hypothesis test concerning the mean of a multivariate probability distribution. We want to check the hypothesis that the expected value of a jointly normal random vector Y ∼ N(μ, Σ) is μ0. Therefore, we test the null hypothesis

\[ H_0 : \mu = \mu_0 \]

against the alternative hypothesis

\[ H_a : \mu \neq \mu_0 \]

The random vector Y has components Y1 and Y2; let us denote the two components of the vector μ0 by μ01 and μ02, respectively. One possible approach would be to calculate the sample mean of each component, $\bar{Y}_1$ and $\bar{Y}_2$, and run two univariate tests, one for μ01 and one for μ02. More precisely, we could test the null hypothesis H0 : μ1 = μ01 on the basis of the sample mean $\bar{Y}_1$, and H0 : μ2 = μ02 on the basis of the sample mean $\bar{Y}_2$. Then, if we reject either of the two univariate null hypotheses, we reject the multivariate hypothesis as well.

Unfortunately, this approach may fail, as illustrated in Fig. 15.2. The figure shows an ellipse, which is a level curve of the joint PDF, and two possible sample means, corresponding to the vectors $\bar{\mathbf{Y}}_a$ and $\bar{\mathbf{Y}}_b$. The rotation of the ellipse corresponds to a positive correlation between Y1 and Y2. If we account for the nature of the PDF of jointly normal variables (see Section 8.4), it turns out that the acceptance region for the test above should be an ellipse. Let us assume that the ellipse in Fig. 15.2 is the acceptance region for the test; then, in the case of $\bar{\mathbf{Y}}_a$, the null hypothesis should be rejected, whereas in the case of $\bar{\mathbf{Y}}_b$ we do not reject H0. By contrast, testing the two hypotheses separately results in a rectangular acceptance region around μ0 (a square, if the two standard deviations are equal). Note that, along each dimension, the distance between μ0 and $\bar{\mathbf{Y}}_a$ and the distance between μ0 and $\bar{\mathbf{Y}}_b$ are exactly the same in absolute value. Hence, if we run separate tests, we either reject or accept the null hypothesis for both sample means, which is not correct. The point is that a rectangular acceptance region, unlike the elliptical one in Fig. 15.2, does not account for correlation.
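To see the geometry numerically, here is a minimal Python sketch of the effect described above; all numbers (the covariance matrix, μ0, and the two points standing in for $\bar{\mathbf{Y}}_a$ and $\bar{\mathbf{Y}}_b$) are hypothetical choices meant only to mimic the situation in Fig. 15.2. The squared Mahalanobis distance computed below is exactly what an elliptical acceptance region measures: a larger value places the point on a more extreme level curve of the joint PDF.

```python
import numpy as np

# Hypothetical setup mimicking Fig. 15.2: positive correlation tilts the
# level curves (ellipses) of the joint PDF along the main diagonal.
mu0 = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

Y_a = np.array([1.5, -1.5])   # against the ellipse's long axis
Y_b = np.array([1.5,  1.5])   # along the ellipse's long axis

for name, ybar in (("Y_a", Y_a), ("Y_b", Y_b)):
    diff = ybar - mu0
    d2 = diff @ Sigma_inv @ diff   # squared Mahalanobis distance from mu0
    print(f"{name}: squared Mahalanobis distance = {d2:.2f}")

# Y_a yields 22.5 while Y_b yields 2.5: Y_a lies far outside any elliptical
# acceptance region containing Y_b, even though |Y1 - mu01| and |Y2 - mu02|
# are identical for both points. A rectangular region cannot tell them apart.
```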

Incidentally, a proper analysis of the testing problem in Example 15.1 leads us to consider the test statistic

\[ T^2 = n \, (\bar{Y} - \mu_0)^{\mathsf{T}} \, S^{-1} \, (\bar{Y} - \mu_0) \]

where $\bar{Y}$ is the vector of sample means and $S^{-1}$ is the inverse of the sample covariance matrix. Hence, the distributional results for univariate statistics, which involve a t distribution, must be extended by introducing a new class of probability distributions, namely, Hotelling's $T^2$ distribution.
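To make the mechanics concrete, the following sketch implements the test in Python; the helper name hotelling_t2_test and the simulated data are hypothetical, not taken from the text. It relies on the standard distributional result that, under H0, the rescaled statistic $(n - p)\,T^2 / \big(p\,(n - 1)\big)$ follows an F distribution with p and n − p degrees of freedom, where n is the sample size and p the dimension.

```python
import numpy as np
from scipy import stats

def hotelling_t2_test(Y, mu0, alpha=0.05):
    """Test H0: mu = mu0 with Hotelling's T^2 statistic.

    Y is an (n, p) array of observations; mu0 is a length-p vector.
    """
    n, p = Y.shape
    ybar = Y.mean(axis=0)
    S = np.cov(Y, rowvar=False)                 # sample covariance matrix
    diff = ybar - mu0
    t2 = n * diff @ np.linalg.solve(S, diff)    # T^2 = n (Ybar - mu0)' S^-1 (Ybar - mu0)
    # Under H0: (n - p) T^2 / (p (n - 1)) ~ F(p, n - p)
    f_stat = (n - p) * t2 / (p * (n - 1))
    p_value = stats.f.sf(f_stat, p, n - p)
    return t2, p_value, p_value < alpha

# Hypothetical usage with simulated correlated bivariate normal data
rng = np.random.default_rng(42)
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
Y = rng.multivariate_normal(mean=[0.3, 0.3], cov=Sigma, size=50)
t2, p_value, reject = hotelling_t2_test(Y, mu0=np.array([0.0, 0.0]))
print(f"T^2 = {t2:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Note how the quadratic form in $S^{-1}$ plays the same role as the elliptical acceptance region of Example 15.1: it weights deviations from μ0 according to the estimated variances and correlations, rather than treating each component in isolation.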

