Given a random vector X with expected value μ, the covariance matrix can be expressed as

\[
\Sigma = \mathrm{Cov}(\mathbf{X}) = \mathrm{E}\left[ (\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T \right]
\]

Note that inside the expectation we are multiplying a column vector ($p \times 1$) and a row vector ($1 \times p$), which does result in a square $p \times p$ matrix. It may also be worth noting that there is a slight inconsistency of notation: we denote variance in the scalar case by $\sigma^2$, but we do not use $\Sigma^2$ here, as this would be somewhat confusing. The element in row $i$ and column $j$ of matrix $\Sigma$, $[\Sigma]_{ij}$, is the covariance $\sigma_{ij}$ between $X_i$ and $X_j$. Consistently, we should regard the variance of $X_i$ as $\mathrm{Cov}(X_i, X_i) = \sigma_{ii}$. We may also express the covariance matrix as

\[
\Sigma = \mathrm{E}\left[ \mathbf{X}\mathbf{X}^T \right] - \boldsymbol{\mu}\boldsymbol{\mu}^T
\]

This is just a vector generalization of Eq. (8.5). If we consider a linear combination Z of variables X, i.e.,

\[
Z = \sum_{i=1}^{p} w_i X_i = \mathbf{w}^T \mathbf{X}
\]

then the variance of Z is

\[
\mathrm{Var}(Z) = \mathbf{w}^T \Sigma \, \mathbf{w}
\]

where Σ is the covariance matrix of X. By a similar token, let us consider a linear transformation from random vector X to random vector Z, represented by the matrix A, i.e.,

\[
\mathbf{Z} = A \mathbf{X}
\]

It turns out6 that the covariance matrix of Z is

\[
\mathrm{Cov}(\mathbf{Z}) = \mathrm{Cov}(A\mathbf{X}) = A \Sigma A^T \tag{15.3}
\]

By recalling that a linear combination of jointly normal variables is normal, the following theorem can be immediately understood.

THEOREM 15.1 Let X be a vector of n jointly normal variables with expected value μ and covariance matrix Σ. Given a matrix $A \in \mathbb{R}^{m \times n}$, the transformed vector AX, taking values in $\mathbb{R}^{m}$, has a jointly normal distribution with expected value Aμ and covariance matrix $A\Sigma A^T$.
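As a quick illustration (not part of the theorem itself), the following sketch simulates a jointly normal vector and checks numerically that the transformed vector AX has sample mean close to Aμ and sample covariance close to AΣA^T. The particular values chosen for μ, Σ, and A are arbitrary, and NumPy is assumed to be available.

import numpy as np

rng = np.random.default_rng(42)

# Arbitrary mean vector and covariance matrix for a trivariate normal (n = 3)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])

# An arbitrary 2 x 3 matrix A, mapping R^3 into R^2
A = np.array([[1.0, 0.0, 2.0],
              [0.5, -1.0, 1.0]])

# Simulate a large sample of X and transform it: Z = A X
X = rng.multivariate_normal(mu, Sigma, size=200_000)   # shape (200000, 3)
Z = X @ A.T                                            # shape (200000, 2)

# Sample statistics of Z should be close to A mu and A Sigma A^T
print("sample mean of Z:", Z.mean(axis=0))
print("A mu:            ", A @ mu)
print("sample covariance of Z:\n", np.cov(Z, rowvar=False))
print("A Sigma A^T:\n", A @ Sigma @ A.T)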

The above properties refer to covariance matrices, i.e., to probabilistic concepts. The same results carry over to the sample covariance matrix, which we denote by S. Again, there is a bit of notational inconsistency with respect to the sample variance $S^2$ of the scalar case, but we will think of sample variance in terms of sample covariance, $S_i^2 = S_{ii}$, and adopt this notation, which is consistent with the use of Σ for a covariance matrix. The sample covariance matrix may be expressed in terms of the random observation vectors $\mathbf{X}^{(k)}$, $k = 1, \ldots, n$:

\[
S = \frac{1}{n-1} \sum_{k=1}^{n} \left( \mathbf{X}^{(k)} - \bar{\mathbf{X}} \right) \left( \mathbf{X}^{(k)} - \bar{\mathbf{X}} \right)^T
  = \frac{1}{n-1} \left[ \sum_{k=1}^{n} \mathbf{X}^{(k)} \left( \mathbf{X}^{(k)} \right)^T - n \bar{\mathbf{X}} \bar{\mathbf{X}}^T \right] \tag{15.4}
\]

The expression in Eq. (15.4) is a multivariable generalization of the familiar way of rewriting sample variance; see Eq. (9.7). It is also fairly easy to show that we may write the sample covariance matrix in a very compact form using the data matrix $\mathcal{X}$. The sum in Eq. (15.4) can be expressed as $\mathcal{X}^T \mathcal{X}$, and by rewriting the vector of sample means as in Eq. (15.1) we obtain

\[
S = \frac{1}{n-1} \left( \mathcal{X}^T \mathcal{X} - \frac{1}{n} \, \mathcal{X}^T \mathbf{1} \mathbf{1}^T \mathcal{X} \right) \tag{15.5}
\]

From a computational point of view, Eq. (15.5) may not be quite convenient; however, these ways of rewriting the sample covariance matrix may come in handy when proving theorems and analyzing data manipulations. If the data are already centered, then expressing the sample covariance matrix is immediate:

\[
S = \frac{1}{n-1} \, \mathcal{X}^T \mathcal{X}
\]
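To make the different expressions for S concrete, here is a minimal sketch on synthetic data that computes the sample covariance matrix by the definition-based sum of Eq. (15.4), by the compact data-matrix form of Eq. (15.5), and by a library routine; all three should coincide. NumPy is assumed, and the variable names are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.normal(size=(n, p))           # data matrix (n observations of p variables)
xbar = X.mean(axis=0)                 # vector of sample means

# Definition-based form, Eq. (15.4): sum of outer products of deviations
S_sum = sum(np.outer(x - xbar, x - xbar) for x in X) / (n - 1)

# Compact data-matrix form, Eq. (15.5)
ones = np.ones((n, 1))
S_compact = (X.T @ X - (X.T @ ones @ ones.T @ X) / n) / (n - 1)

# Library routine for comparison
S_np = np.cov(X, rowvar=False)

print(np.allclose(S_sum, S_compact), np.allclose(S_sum, S_np))   # True True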

We should also note the following properties, which generalize what we are familiar with in the scalar case (here $S_{\mathbf{X}}$ denotes the sample covariance matrix of X, and $S^2_{\mathbf{b}^T \mathbf{X}}$ the sample variance of the linear combination $\mathbf{b}^T \mathbf{X}$):

\[
S_{\mathbf{X} + \mathbf{b}} = S_{\mathbf{X}}
\]
\[
S^2_{\mathbf{b}^T \mathbf{X}} = \mathbf{b}^T S_{\mathbf{X}} \, \mathbf{b}
\]

where b is an arbitrary vector of real numbers.
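Assuming the two properties are read as above (a constant shift by b leaves the sample covariance matrix unchanged, and the sample variance of the combination b^T X is b^T S b), a short numerical check on synthetic data looks as follows; NumPy is assumed.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))         # synthetic data: 200 observations of 4 variables
b = np.array([2.0, -1.0, 0.5, 3.0])   # an arbitrary vector of real numbers

S = np.cov(X, rowvar=False)

# Shifting every observation by b does not change the sample covariance matrix
print(np.allclose(np.cov(X + b, rowvar=False), S))         # True

# The sample variance of Z = b^T X equals b^T S b
Z = X @ b
print(np.isclose(np.var(Z, ddof=1), b @ S @ b))            # True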

If we need the sample correlation matrix R, consisting of the sample correlation coefficients $R_{ij}$ between $X_i$ and $X_j$, we may introduce the diagonal matrix of sample standard deviations

\[
D = \mathrm{diag}\left( \sqrt{S_{11}}, \sqrt{S_{22}}, \ldots, \sqrt{S_{pp}} \right)
\]

and then let

\[
R = D^{-1} S D^{-1}
\]
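The construction of D and R can be checked numerically. The sketch below, on synthetic data, builds D from the square roots of the diagonal elements of S and verifies that D^{-1} S D^{-1} matches the correlation matrix returned by NumPy; the specific numbers are arbitrary.

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3)) @ np.array([[1.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 1.0]])   # induce some correlation

S = np.cov(X, rowvar=False)
D = np.diag(np.sqrt(np.diag(S)))      # diagonal matrix of sample standard deviations

R = np.linalg.inv(D) @ S @ np.linalg.inv(D)
print(np.allclose(R, np.corrcoef(X, rowvar=False)))   # True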

15.3.2 Measuring distance and the Mahalanobis transformation

In Section 3.3.1 we defined the concept of vector norm, which can be used to measure the distance between points in $\mathbb{R}^n$. We might also define the distance between observed vectors in the same way, but in statistics we typically want to account for the covariance structure as well. As an introduction, consider the distance between the realization of a random variable X and its expected value μ, or between two realizations $X_1$ and $X_2$. Does a distance measure based on a plain difference, such as $|X - \mu|$ or $|X_1 - X_2|$, make sense? Such a measure is highly debatable from a statistical perspective, as it disregards dispersion altogether. A suitable measure should be expressed in terms of the number of standard deviations, which leads to the standardized distances

\[
\frac{|X - \mu|}{\sigma}, \qquad \frac{|X_1 - X_2|}{\sigma}
\]

Alternatively, we may consider the squared distances

\[
\frac{(X - \mu)^2}{\sigma^2}, \qquad \frac{(X_1 - X_2)^2}{\sigma^2}
\]

To generalize the idea to the distance between observation vectors $\mathbf{X}^{(1)}$ and $\mathbf{X}^{(2)}$, we may rely on the covariance matrix and define the squared distance

\[
d^2\!\left( \mathbf{X}^{(1)}, \mathbf{X}^{(2)} \right) = \left( \mathbf{X}^{(1)} - \mathbf{X}^{(2)} \right)^T \Sigma^{-1} \left( \mathbf{X}^{(1)} - \mathbf{X}^{(2)} \right)
\]

Fig. 15.6 Illustration of Mahalanobis distance.

where $\Sigma^{-1}$ is the inverse of the covariance matrix. More often than not, we do not know the underlying covariance matrix, and we have to replace it with the sample covariance matrix S. We may also express the distance with respect to the expected value in the same way:

\[
d^2(\mathbf{X}, \boldsymbol{\mu}) = (\mathbf{X} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{X} - \boldsymbol{\mu})
\]

The last expression should be familiar, since it is related to the argument of the exponential function that defines the joint PDF of a multivariable normal distribution.7 We also recall that the level curves of this PDF are ellipses, whose shape depends on the correlation between variables. This is very helpful in understanding the rationale behind the definition of the distances described above, which are known as Mahalanobis distances. Consider the two points A and B in Fig. 15.6. The figure is best interpreted in terms of a bivariate normal distribution with expected value μ; the ellipse is a level curve of its PDF. Geometrically, in terms of the standard Euclidean distance, points A and B are not equally distant from μ. However, if we account for covariance by using the Mahalanobis distance, we see that the two points are at the same distance from μ. Strictly speaking, we cannot compare the probabilities of outcomes A and B, as they are both zero; nevertheless, the two points lie on the same “isodensity” curve and are, in a loose sense, equally likely.
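The following sketch (illustrative only; the helper name mahalanobis_sq is not from the text) computes squared Mahalanobis distances with respect to the sample mean, using the inverse of the sample covariance matrix S in place of the unknown Σ. With positively correlated variables, two points at the same Euclidean distance from the mean can have very different Mahalanobis distances, which is the mirror image of the situation depicted in Fig. 15.6.

import numpy as np

rng = np.random.default_rng(3)

# Correlated bivariate sample; its mean and sample covariance play the roles of mu and Sigma
Sigma_true = np.array([[4.0, 3.0],
                       [3.0, 4.0]])
X = rng.multivariate_normal([0.0, 0.0], Sigma_true, size=5000)
mu_hat = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))

def mahalanobis_sq(x, center, inv_cov):
    """Squared Mahalanobis distance (x - center)^T inv_cov (x - center)."""
    d = x - center
    return d @ inv_cov @ d

# Two points at the same Euclidean distance from the mean, one along the main
# axis of the ellipse and one across it, have very different Mahalanobis distances
a = mu_hat + np.array([2.0, 2.0])
b = mu_hat + np.array([2.0, -2.0])
print(mahalanobis_sq(a, mu_hat, S_inv), mahalanobis_sq(b, mu_hat, S_inv))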

Mahalanobis distance can also be interpreted as a Euclidean distance modified by a suitably chosen weight matrix, which changes the relative importance of dimensions. Measuring distances is essential in clustering algorithms, as we will see in Section 17.4.1. Finally, Mahalanobis distance may also be interpreted in terms of a transformation, called the Mahalanobis transformation. Consider the square root of the covariance matrix, i.e., a symmetric matrix $\Sigma^{1/2}$ such that

\[
\Sigma^{1/2} \, \Sigma^{1/2} = \Sigma \tag{15.7}
\]

and the transformation

\[
\mathbf{Z} = \Sigma^{-1/2} (\mathbf{X} - \boldsymbol{\mu})
\]

where X is a random vector with expected value μ and covariance matrix Σ. Clearly, this transformation is just an extension of the familiar standardization of a scalar random variable. The distance between X and μ can be expressed in terms of the standardized variables as follows:

\[
d^2(\mathbf{X}, \boldsymbol{\mu}) = (\mathbf{X} - \boldsymbol{\mu})^T \Sigma^{-1/2} \, \Sigma^{-1/2} (\mathbf{X} - \boldsymbol{\mu}) = \mathbf{Z}^T \mathbf{Z}
\]

Now, using Eqs. (15.3) and (15.7), we find that the covariance matrix of the standardized variables is

\[
\mathrm{Cov}(\mathbf{Z}) = \Sigma^{-1/2} \, \Sigma \, \Sigma^{-1/2} = \Sigma^{-1/2} \, \Sigma^{1/2} \, \Sigma^{1/2} \, \Sigma^{-1/2} = I
\]

Thus, we see that the Mahalanobis transformation yields a set of uncorrelated and standardized variables.
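As a final check, the sketch below computes a symmetric square root $\Sigma^{1/2}$ through the eigendecomposition of Σ (scipy.linalg.sqrtm would be an alternative), applies the Mahalanobis transformation to a simulated sample, and verifies that the sample covariance matrix of the transformed variables is close to the identity; the numbers are arbitrary and NumPy is assumed.

import numpy as np

rng = np.random.default_rng(4)
Sigma = np.array([[4.0, 1.5],
                  [1.5, 2.0]])
mu = np.array([1.0, -1.0])

# Symmetric square root of Sigma via its eigendecomposition: Sigma^(1/2) = V diag(sqrt(lambda)) V^T
lam, V = np.linalg.eigh(Sigma)
Sigma_half = V @ np.diag(np.sqrt(lam)) @ V.T
Sigma_half_inv = np.linalg.inv(Sigma_half)

# Apply the Mahalanobis transformation Z = Sigma^(-1/2) (X - mu) to a simulated sample
X = rng.multivariate_normal(mu, Sigma, size=100_000)
Z = (X - mu) @ Sigma_half_inv.T

print(np.cov(Z, rowvar=False))   # approximately the 2 x 2 identity matrix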

