The linear data transformation, including centering, can be written as
$$\mathbf{Z} = \mathbf{A}\bigl(\mathbf{X} - \overline{\mathbf{X}}\bigr),$$
where $\overline{\mathbf{X}}$ is the vector of sample means. We assume that the data have already been centered, in order to ease notation. Hence
$$\mathbf{Z} = \mathbf{A}\mathbf{X}.$$
The $Z_i$ variables are called principal components: $Z_1$ is the first principal component. We recall that the matrix $\mathbf{A}$ rotating the axes is orthogonal:
$$\mathbf{A}\mathbf{A}^T = \mathbf{A}^T\mathbf{A} = \mathbf{I}.$$
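As a concrete illustration, here is a minimal NumPy sketch; the dataset and the rotation matrix are arbitrary examples chosen for this illustration, not taken from the text. It centers a small dataset and applies an orthogonal transformation:

```python
import numpy as np

# Small synthetic dataset: n = 5 observations of p = 2 variables.
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Center the data (subtract the vector of sample means).
Xc = X - X.mean(axis=0)

# An example orthogonal matrix A (rotation by 45 degrees).
theta = np.pi / 4
A = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

# Orthogonality check: A A^T should be the identity matrix.
assert np.allclose(A @ A.T, np.eye(2))

# Transformed variables: each row of Z is A applied to a centered observation.
Z = Xc @ A.T
```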
Now, let us consider the sample covariance matrix of $\mathbf{X}$, i.e., $\mathbf{S}_X$. Since we assume centered data, we recall from Section 15.3.1 that this matrix is given as follows:
$$\mathbf{S}_X = \frac{1}{n-1}\sum_{k=1}^{n}\mathbf{x}_k\mathbf{x}_k^T,$$
where $\mathbf{x}_k$, $k = 1,\ldots,n$, are the centered observations.
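Continuing the sketch above (reusing `X` and `Xc`), the sample covariance matrix can be computed directly and cross-checked against NumPy's built-in estimator:

```python
n = Xc.shape[0]

# Sample covariance of centered data: equivalent to the sum of outer
# products x_k x_k^T divided by (n - 1).
S_X = Xc.T @ Xc / (n - 1)

# Cross-check: np.cov with rowvar=False treats columns as variables.
assert np.allclose(S_X, np.cov(X, rowvar=False))
```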
Now, we may also find the corresponding sample covariance matrix for $\mathbf{Z}$, $\mathbf{S}_Z$, taking advantage of the results of Section 15.3.1. However, we would like to find a matrix $\mathbf{A}$ such that the resulting principal components are uncorrelated; in other words, $\mathbf{S}_Z$ should be diagonal:
$$\mathbf{S}_Z = \mathbf{A}\mathbf{S}_X\mathbf{A}^T = \operatorname{diag}\bigl(s_{Z_1}^2,\ldots,s_{Z_p}^2\bigr),$$
where $s_{Z_i}^2$ is the sample variance of each principal component. The matrix $\mathbf{A}$ should diagonalize the sample covariance matrix $\mathbf{S}_X$, and we have already seen such a diagonalization in Eq. (3.16). To diagonalize $\mathbf{S}_X$, we should consider the product
$$\mathbf{P}^T\mathbf{S}_X\mathbf{P} = \boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1,\ldots,\lambda_p),$$
where matrix $\mathbf{P}$ is orthogonal and its columns consist of the normalized eigenvectors of the sample covariance matrix; since this matrix is symmetric, its eigenvectors are indeed orthogonal. The diagonalized matrix $\boldsymbol{\Lambda}$ consists of the eigenvalues $\lambda_i$, $i = 1,\ldots,p$, of the sample covariance matrix $\mathbf{S}_X$. Putting everything together, we see that the rows $\mathbf{a}_i^T$, $i = 1,\ldots,p$, of matrix $\mathbf{A}$ should be the normalized eigenvectors of the sample covariance matrix:
$$\mathbf{A} = \mathbf{P}^T.$$
We also see that the sample variances of the principal components $Z_i$ are the eigenvalues of $\mathbf{S}_X$:
$$s_{Z_i}^2 = \lambda_i, \qquad i = 1,\ldots,p.$$
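The diagonalization is straightforward to carry out numerically. A sketch continuing the example (`np.linalg.eigh` is the appropriate routine here because $\mathbf{S}_X$ is symmetric):

```python
# Eigendecomposition of the symmetric matrix S_X.
# eigh returns eigenvalues in ascending order; reorder to decreasing.
eigvals, P = np.linalg.eigh(S_X)
order = np.argsort(eigvals)[::-1]
eigvals, P = eigvals[order], P[:, order]

# Rows of A are the normalized eigenvectors: A = P^T
# (this A replaces the arbitrary rotation used earlier).
A = P.T

# S_Z = A S_X A^T is diagonal, with the eigenvalues on the diagonal.
S_Z = A @ S_X @ A.T
assert np.allclose(S_Z, np.diag(eigvals))

# Principal components; their sample variances equal the eigenvalues.
Z = Xc @ A.T
assert np.allclose(Z.var(axis=0, ddof=1), eigvals)
```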
If we sort the eigenvalues in decreasing order, we see that $Z_1$ is indeed the first principal component, accounting for the largest share of variability. Then, the second principal component $Z_2$ is orthogonal to $Z_1$ and ranks second in explained variance. The fraction of variance explained by the first $q$ components is
$$\frac{\sum_{i=1}^{q}\lambda_i}{\sum_{i=1}^{p}\lambda_i}.$$
By taking the first few components, we can account for most of the variability and reduce the problem dimension by replacing the original variables with the principal components.
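Continuing the sketch, the explained-variance fraction follows directly from the sorted eigenvalues; the 90% cutoff below is an assumed threshold chosen purely for illustration, not a rule from the text:

```python
# Fraction of variance explained by the first q components, for each q.
explained = np.cumsum(eigvals) / eigvals.sum()

# Smallest q whose cumulative fraction reaches the chosen threshold.
q = int(np.searchsorted(explained, 0.90) + 1)

# Reduced-dimension representation: first q principal components.
Z_reduced = Xc @ A[:q].T
print(f"q = {q}, explained fraction = {explained[q - 1]:.3f}")
```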