Principal component analysis in practice is carried out on sampled data, but it may be instructive to consider an example where both the probabilistic and the statistical sides are dealt with. Consider first a random variable with bivariate normal distribution, X ∼ N(0, Σ), where

Σ = [ 1  ρ
      ρ  1 ]
and ρ > 0. Essentially, X1 and X2 are standard normal variables with positive correlation ρ. To find the eigenvalues of Σ, we must find its characteristic polynomial and solve the corresponding equation

det(Σ − λI) = (1 − λ)² − ρ² = 0
This yields the two eigenvalues λ1 = 1 + ρ and λ2 = 1 − ρ. Note that both eigenvalues are positive, since ρ is a correlation coefficient with |ρ| < 1. To find the eigenvector corresponding to λ1, we consider the system of linear equations:

(1 − λ1)u1 + ρu2 = −ρu1 + ρu2 = 0
ρu1 + (1 − λ1)u2 = ρu1 − ρu2 = 0
Clearly, the two equations are linearly dependent, and any vector such that u1 = u2 is an eigenvector. By the same token, any vector such that u1 = −u2 is an eigenvector corresponding to λ2. Two normalized eigenvectors are

u(1) = (1/√2) [1, 1]ᵀ,    u(2) = (1/√2) [1, −1]ᵀ
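The eigenstructure derived above can be checked numerically. The following is a minimal sketch using NumPy, with ρ = 0.85 chosen only to match the figure; `numpy.linalg.eigh` returns the eigenvalues of a symmetric matrix in ascending order:

```python
import numpy as np

rho = 0.85  # illustrative value, matching the figure
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# For a symmetric matrix, eigh returns eigenvalues in ascending order
# and the corresponding normalized eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals)   # [1 - rho, 1 + rho] = [0.15, 1.85]
print(eigvecs)   # all entries equal to ±1/sqrt(2)
```

Up to a sign, the columns of the eigenvector matrix are exactly the u(1) and u(2) found by hand.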
These are the rows of the transformation matrix

P = (1/√2) [ 1   1
             1  −1 ]
Since we are dealing with standard normals, μ = 0 and the first principal component is

Z1 = (X1 + X2)/√2
The second principal component is

Z2 = (X1 − X2)/√2
As a further check, let us compute the variance of the first principal component:

Var(Z1) = Var((X1 + X2)/√2) = (1/2)[Var(X1) + Var(X2) + 2 Cov(X1, X2)] = (1/2)(1 + 1 + 2ρ) = 1 + ρ = λ1
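This variance identity, Var(Z1) = u(1)ᵀ Σ u(1) = 1 + ρ, can be verified in one line of NumPy; the value ρ = 0.85 is again only illustrative:

```python
import numpy as np

rho = 0.85
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])
u1 = np.array([1.0, 1.0]) / np.sqrt(2)  # first normalized eigenvector

# Variance of the projection Z1 = u1' X is the quadratic form u1' Sigma u1
var_z1 = u1 @ Sigma @ u1
print(var_z1)  # 1 + rho = 1.85
```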
Figure 17.2 shows the level curves of the joint density of X when ρ = 0.85.
Since the correlation is positive, the major axis of the ellipses has positive slope. It is easy to see that the largest variability occurs along that direction.
As we noted, practical PCA is carried out on sampled data. Figure 17.3 shows a sample of size 200 from the above bivariate normal distribution, with ρ = 0.85. The sample statistics are
Fig. 17.2 Level curves of a multivariate normal with ρ = 0.85.
Since the sample size is not very large, the estimated parameters are not too close to what is expected. Nevertheless, the observation cloud (scatterplot) displayed in Fig. 17.3(a) clearly shows positive correlation. The matrix of normalized eigenvectors is
Apart from a sign, the values in this matrix, if the estimates were perfect, should be 1/√2 ≈ 0.7071. The eigenvalues of the sample covariance matrix are
and indeed:
Note that the data need to be centered, since the sample means are not zero. The two small plots in Figs. 17.3(b) and (c) show the two principal components, i.e., the projections of the original data. We clearly see that the first principal component accounts for most of the variability, precisely
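The whole sampling experiment can be sketched in a few lines of NumPy. This is a minimal reproduction under assumed details (the seed, and ρ = 0.85 as in the text); the exact sample numbers will of course differ from those reported above:

```python
import numpy as np

rng = np.random.default_rng(42)  # assumed seed, for reproducibility
rho = 0.85
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# Draw a sample of size 200 from the bivariate normal N(0, Sigma)
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=200)

# Center the data, since the sample means are not exactly zero
Xc = X - X.mean(axis=0)

S = np.cov(Xc, rowvar=False)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order

# Project the centered data: the two principal components
Z = Xc @ eigvecs[:, ::-1]             # reorder so column 0 is the main component

# Fraction of total variance accounted for by the first component;
# the theoretical value is (1 + rho)/2 = 0.925
explained = eigvals[-1] / eigvals.sum()
print(explained)
```

The projected components in `Z` are uncorrelated by construction, and for ρ = 0.85 the first one should account for roughly 90% of the total sample variance.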