Another view is obtained by interpreting the first principal component in terms of orthogonal projection. Consider a unit vector $\mathbf{u}$, and imagine projecting the observed vector $\mathbf{X}$ on $\mathbf{u}$. This yields a vector parallel to $\mathbf{u}$, of length $\mathbf{u}^T\mathbf{X}$. Since $\mathbf{u}$ has unit length, the projection of observation $\mathbf{X}^{(k)}$ on $\mathbf{u}$ is

$$\bigl(\mathbf{u}^T\mathbf{X}^{(k)}\bigr)\,\mathbf{u}$$

We are projecting $p$-dimensional observations on just one axis, and of course we would like the approximation to be as good as possible. More precisely, we should find $\mathbf{u}$ in such a way that the distance between the originally observed vector $\mathbf{X}^{(k)}$ and its projection is as small as possible. If we have a sample of $n$ observations, we should minimize the average squared distance

$$\frac{1}{n}\sum_{k=1}^{n}\left\|\mathbf{X}^{(k)} - \bigl(\mathbf{u}^T\mathbf{X}^{(k)}\bigr)\mathbf{u}\right\|^2,$$

which looks much like a least-squares problem. This amounts to an orthogonal projection of the original vectors on $\mathbf{u}$, where the residual $\mathbf{X}^{(k)} - \bigl(\mathbf{u}^T\mathbf{X}^{(k)}\bigr)\mathbf{u}$ is orthogonal to the projection itself.3 Hence, we can apply the Pythagorean theorem to rewrite the problem:

$$\min_{\mathbf{u}}\ \frac{1}{n}\sum_{k=1}^{n}\left\|\mathbf{X}^{(k)}\right\|^2 - \frac{1}{n}\sum_{k=1}^{n}\bigl(\mathbf{u}^T\mathbf{X}^{(k)}\bigr)^2.$$
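Here each term has been split according to the Pythagorean identity; if one prefers not to invoke the theorem, the same decomposition follows by expanding the squared norm and using $\mathbf{u}^T\mathbf{u} = 1$:

$$\left\|\mathbf{X}^{(k)} - \bigl(\mathbf{u}^T\mathbf{X}^{(k)}\bigr)\mathbf{u}\right\|^2 = \left\|\mathbf{X}^{(k)}\right\|^2 - 2\bigl(\mathbf{u}^T\mathbf{X}^{(k)}\bigr)^2 + \bigl(\mathbf{u}^T\mathbf{X}^{(k)}\bigr)^2\,\mathbf{u}^T\mathbf{u} = \left\|\mathbf{X}^{(k)}\right\|^2 - \bigl(\mathbf{u}^T\mathbf{X}^{(k)}\bigr)^2.$$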

Since the first term does not depend on $\mathbf{u}$, minimizing the average squared distance is equivalent to maximizing the second term. Therefore, we essentially want to maximize

$$\sum_{k=1}^{n}\bigl(\mathbf{u}^T\mathbf{X}^{(k)}\bigr)^2$$

subject to the condition $\mathbf{u}^T\mathbf{u} = 1$. Collecting the observations $\mathbf{X}^{(k)}$ as the rows of the data matrix $\boldsymbol{\mathcal{X}}$, the problem can be restated as

$$\max \ \mathbf{u}^T\boldsymbol{\mathcal{X}}^T\boldsymbol{\mathcal{X}}\,\mathbf{u} \qquad \text{s.t.}\ \ \mathbf{u}^T\mathbf{u} = 1.$$

But we know that, assuming data are centered, the sample covariance matrix is $\mathbf{S}_{\mathbf{X}} = \boldsymbol{\mathcal{X}}^T\boldsymbol{\mathcal{X}}/(n-1)$; hence, the problem is equivalent to

$$\max \ \mathbf{u}^T\mathbf{S}_{\mathbf{X}}\,\mathbf{u} \qquad (17.1)$$
$$\text{s.t.}\ \ \mathbf{u}^T\mathbf{u} = 1 \qquad (17.2)$$

In plain English, what we want is to find one dimension onto which the multidimensional data should be projected, in such a way that the variance of the projected data is maximized. This makes sense from a least-squares perspective, but it also has an intuitive appeal: the dimension along which variance is maximized is the one providing the most information.
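As a concrete illustration, the following sketch (using NumPy on synthetic data; the data and variable names are only illustrative, not from the text) evaluates both criteria over many random unit vectors and checks that the direction maximizing the projected variance is also the one minimizing the average squared reconstruction error, mirroring the equivalence derived above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: n = 500 observations in p = 3 dimensions (illustrative only).
X = rng.multivariate_normal(mean=np.zeros(3),
                            cov=[[4.0, 1.5, 0.5],
                                 [1.5, 2.0, 0.3],
                                 [0.5, 0.3, 1.0]],
                            size=500)
Xc = X - X.mean(axis=0)                      # center the data
S = Xc.T @ Xc / (len(Xc) - 1)                # sample covariance matrix S_X
assert np.allclose(S, np.cov(Xc, rowvar=False))

# Evaluate both criteria over many random unit vectors u.
U = rng.normal(size=(10_000, 3))
U /= np.linalg.norm(U, axis=1, keepdims=True)

proj = Xc @ U.T                              # entry (k, j) is u_j^T X^(k)
var_proj = (proj ** 2).sum(axis=0) / (len(Xc) - 1)               # variance of projected data
recon_err = (Xc ** 2).sum() / len(Xc) - (proj ** 2).mean(axis=0) # average squared distance

# The direction with maximum projected variance minimizes reconstruction error.
print(np.argmax(var_proj) == np.argmin(recon_err))               # True
```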

To solve the problem above, we may associate the constraint (17.2) with a Lagrange multiplier λ and augment the objective function (17.1) to obtain the Lagrangian function:4

$$\mathcal{L}(\mathbf{u},\lambda) = \mathbf{u}^T\mathbf{S}_{\mathbf{X}}\,\mathbf{u} + \lambda\bigl(1 - \mathbf{u}^T\mathbf{u}\bigr).$$
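To take the gradient below, it is enough to recall the standard results for a symmetric matrix $\mathbf{S}_{\mathbf{X}}$,

$$\nabla_{\mathbf{u}}\bigl(\mathbf{u}^T\mathbf{S}_{\mathbf{X}}\,\mathbf{u}\bigr) = 2\,\mathbf{S}_{\mathbf{X}}\,\mathbf{u}, \qquad \nabla_{\mathbf{u}}\bigl(\mathbf{u}^T\mathbf{u}\bigr) = 2\,\mathbf{u}.$$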

The gradient of the Lagrangian function with respect to u is

$$\nabla_{\mathbf{u}}\,\mathcal{L}(\mathbf{u},\lambda) = 2\,\mathbf{S}_{\mathbf{X}}\,\mathbf{u} - 2\lambda\,\mathbf{u},$$

and setting it to zero yields the first-order optimality condition

$$\mathbf{S}_{\mathbf{X}}\,\mathbf{u} = \lambda\,\mathbf{u}.$$

This amounts to saying that λ must be an eigenvalue of the sample covariance matrix, but which one? We can rewrite the objective function (17.1) as follows:

$$\mathbf{u}^T\mathbf{S}_{\mathbf{X}}\,\mathbf{u} = \mathbf{u}^T(\lambda\,\mathbf{u}) = \lambda\,\mathbf{u}^T\mathbf{u} = \lambda.$$

Hence, we see that $\lambda$ should be the largest eigenvalue of $\mathbf{S}_{\mathbf{X}}$, that $\mathbf{u}$ is the corresponding normalized eigenvector, and we obtain the same result as in the previous section. Furthermore, we may continue along the same route, asking for another direction along which variance is maximized, subject to the constraint that it be orthogonal to the first direction we found. Since the eigenvectors of a symmetric matrix can be chosen to be orthogonal, we see that indeed we will find all of them, in decreasing order of the corresponding eigenvalues.
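As a minimal computational counterpart (a sketch with NumPy on synthetic data; the function and variable names are ours, not from the text), all principal directions can be obtained at once by computing the eigenvalues and eigenvectors of the sample covariance matrix and sorting them in decreasing order of eigenvalue:

```python
import numpy as np

def principal_components(X):
    """Eigenvalues (decreasing) and matching unit-norm eigenvectors of the
    sample covariance matrix of X (one observation per row)."""
    Xc = X - X.mean(axis=0)                  # center the data
    S = Xc.T @ Xc / (len(Xc) - 1)            # sample covariance matrix S_X
    eigvals, eigvecs = np.linalg.eigh(S)     # symmetric matrix: real spectrum, orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]        # decreasing order of eigenvalue
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))   # correlated synthetic data
lam, W = principal_components(X)
print(lam)          # variances along the principal directions, largest first
print(W.T @ W)      # approximately the identity: the directions are orthonormal
```

The first column of `W` is the direction solving problem (17.1)–(17.2), and `lam[0]` is the corresponding maximal variance.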

