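Principal component analysis (PCA) is a data reduction method. Technically, we take a vector of random variables $X \in \mathbb{R}^p$ and transform it into another vector $Z \in \mathbb{R}^p$ by a linear transformation represented by a square matrix $A \in \mathbb{R}^{p \times p}$, i.e., $Z = AX$. In more detail, we have

$$
Z_i = a_{i1} X_1 + a_{i2} X_2 + \cdots + a_{ip} X_p, \qquad i = 1, \ldots, p.
$$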
These equations should not be confused with regression equations. The transformed Zi variables are not observed and used in a fitting procedure; indeed, there is no error term. They are just transformations of the original variables, which are not classified as dependent or independent. Hence, PCA is an interdependence technique, aimed at metric data, and used for exploratory purposes. In Section 17.2 we show that, by taking suitable combinations, we may find a small subset of Zi variables, the principal components, that explain most of the variability in the original variables Xi. By disregarding the less relevant components, we reduce data dimensionality without losing a significant portion of information.
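To make the idea concrete, here is a minimal NumPy sketch of the transformation on synthetic data. It assumes one standard way of picking the combinations, which this passage does not spell out: the rows of the matrix A are the eigenvectors of the sample covariance matrix, ordered by decreasing eigenvalue, and the data are centered first. The function name `pca`, the variable names, and the synthetic dataset are illustrative choices, not taken from the text.

```python
import numpy as np

def pca(X):
    """Sketch of PCA via the sample covariance matrix (an assumed, standard choice).

    X has one observation per row and one variable per column.
    Returns the eigenvalues in descending order, the matrix A whose rows
    are the corresponding eigenvectors, and the transformed data Z.
    """
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals = eigvals[order]
    A = eigvecs[:, order].T                  # rows of A are the eigenvectors
    Z = Xc @ A.T                             # each row of Z is A applied to a centered observation
    return eigvals, A, Z

# Synthetic data: two strongly correlated variables plus one independent variable.
rng = np.random.default_rng(0)
n = 500
common = rng.normal(size=n)
X = np.column_stack([
    common + 0.1 * rng.normal(size=n),
    common + 0.1 * rng.normal(size=n),
    0.5 * rng.normal(size=n),
])

eigvals, A, Z = pca(X)
explained = eigvals / eigvals.sum()          # share of total variance per component
print("Share of variance per component:", np.round(explained, 3))

# Keep only the first component to reduce dimensionality from 3 to 1.
Z_reduced = Z[:, :1]
```

With this choice of A, the transformed variables are uncorrelated and their variances equal the eigenvalues, so the variance shares indicate how many components are worth keeping. Because the first two columns here are nearly copies of each other, most of the variability should concentrate in the first component.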