The rationale behind factor analysis may be best understood by a small numerical example.
Example 17.2 Consider observations in ℝ⁵ and the correlation matrix of their component variables X1, X2, …, X5:
Does this suggest some structure? We see that X1 and X2 seem to be strongly correlated with each other, whereas they seem only weakly correlated with X3, X4, and X5. The latter variables, on the contrary, seem to be strongly correlated with one another. This suggests that the components of the random vector X have some latent structure. In particular, we may imagine the existence of two factors, f1 and f2, that explain the two groups of variables. Some additional “noise” component must exist, otherwise we would have a perfectly block-structured matrix, but it seems that we might reduce dimensionality from 5 to 2, without losing much information.
The example displays an obvious structure, which in more general cases may require some work on the data to be discovered. Factor analysis (FA) is somewhat related to principal component analysis (PCA), as both may be regarded as data reduction procedures; however:
- In PCA we build linear combinations of observable variables; in FA we look for unobservable underlying factors.
- In PCA we want to explain most of the observed variance; in FA we work with covariances and correlations.
To express FA in formulas, let us introduce a vector f of m factors fj, j = 1,…,m, where m < p. These factors are common to all of the p components Xi of the random vector X; each component is also associated with a specific factor εi, i = 1,…,p, resulting in the following set of relationships:

Xi = μi + λi1 f1 + λi2 f2 + ⋯ + λim fm + εi,   i = 1,…,p.
The coefficients λij are called factor loadings and are related to the impact of common factor fj on component Xi. The following conditions are typically assumed:
- E[fj] = 0, Var(fj) = 1
- E[εi] = 0, Var(εi) = ψi (specific variance)
- Cov(fj, fk) = 0 for j ≠ k (uncorrelated factors)
- Cov(εi, εk) = 0 for i ≠ k (uncorrelated specific components)
These assumptions imply that E[Xi] = μi and all of the involved factors, common and specific, are mutually uncorrelated; in other words, the model assumes that the factors fj represent whatever is “common” between the components. Since common factors are uncorrelated, we have an orthogonal factor model. Since specific factors are uncorrelated as well, we may also speak of a diagonal model.7 This way of writing FA bears some resemblance to multiple linear regression, but we should note some key differences:
- We are relating factors to many variables Xi simultaneously, not a single one.
- In linear regression we use observable explanatory variables; factors fj are latent (unobservable).
We may also express FA in matrix form:

X = μ + Λf + ε,   (17.3)

where μ is the vector of expected values μi and the loading matrix Λ collects the factor loadings λij. In order to make the idea more precise and operational, we need to express the covariance matrix of X:

Σ = Cov(X) = ΛΛᵀ + Ψ,

where Ψ is the covariance matrix of the vector ε of specific factors.
An important consequence of the above assumptions is that the matrix Ψ is diagonal; in fact, its diagonal contains the specific variances ψi.
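To see where this decomposition comes from, the following is a short derivation sketch, using nothing beyond the assumptions stated above (unit-variance, mutually uncorrelated common factors, uncorrelated with the specific factors):

\[
\Sigma \;=\; \mathrm{Cov}(\boldsymbol{\mu} + \Lambda \mathbf{f} + \boldsymbol{\varepsilon})
\;=\; \Lambda\,\mathrm{Cov}(\mathbf{f})\,\Lambda^{\mathsf T} + \mathrm{Cov}(\boldsymbol{\varepsilon})
\;=\; \Lambda\Lambda^{\mathsf T} + \Psi ,
\]

since Cov(f) = I and Cov(ε) = Ψ. On the diagonal this reads Var(Xi) = λi1² + ⋯ + λim² + ψi, i.e., a part explained by the common factors (often called the communality) plus the specific variance ψi.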
The main task of FA is to find the loading matrix Λ. There is a host of approaches for doing so, including methods based on maximum-likelihood estimation, and commercial software packages for multivariate analysis offer the user plenty of choices. A most important point to notice is that, whatever method we use, the factors are not unique. To see this, consider an orthogonal matrix T, representing a vector rotation; since TTᵀ = I, we may rewrite (17.3) as

X = μ + ΛTTᵀf + ε = μ + Λ*f* + ε,
where Λ* = ΛT and f* = Tᵀf. This amounts to rotating the factors, and it is easy to see that the covariance matrix Σ can be expressed in terms of Λ* as well:

Σ = Λ*Λ*ᵀ + Ψ = ΛTTᵀΛᵀ + Ψ = ΛΛᵀ + Ψ.
This shows that the choice of factors is not unique. Software tools also offer many factor rotation strategies, which may help in finding a sensible interpretation of the factors. Generally, finding such an interpretation is difficult when many factor loadings are large for all of the variables; factor rotation may be used to find a meaningful structure. Doing so is not trivial at all, as it requires experience and domain-specific knowledge.
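As a small numerical illustration of this invariance, here is a minimal sketch; the loadings, specific variances, and rotation angle below are made up for illustration and are not taken from the text. It builds a small orthogonal factor model, rotates the loadings, and checks that the implied covariance matrix does not change.

```python
import numpy as np

# Hypothetical loadings (p = 5 variables, m = 2 factors) and specific variances;
# these numbers are purely illustrative.
Lambda = np.array([[0.9, 0.1],
                   [0.8, 0.2],
                   [0.1, 0.7],
                   [0.2, 0.8],
                   [0.1, 0.9]])
psi = np.array([0.3, 0.4, 0.5, 0.3, 0.2])

# Covariance matrix implied by the orthogonal factor model: Sigma = Lambda Lambda' + Psi
Sigma = Lambda @ Lambda.T + np.diag(psi)

# Any orthogonal matrix T (here a plane rotation by 30 degrees) gives an
# equally valid loading matrix Lambda* = Lambda T.
theta = np.pi / 6
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Lambda_star = Lambda @ T

# The implied covariance matrix is unchanged, since T T' = I.
Sigma_star = Lambda_star @ Lambda_star.T + np.diag(psi)
print(np.allclose(Sigma, Sigma_star))   # True
```

The rotated loadings fit the data exactly as well as the original ones, which is why rotation strategies are judged by interpretability rather than by fit.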
Example 17.3 Consider the data-generating model of Eq. (17.3), where:
The two factors f1 and f2 are independent standard normal variables, and the five specific factors εi, i = 1,…,5, are independent normal variables with expected value 0 and standard deviation 5. Of course, in real life we do not know the underlying data-generating process, but let us see what we can recover by sampling n = 1000 observations by Monte Carlo and applying factor analysis to the resulting data. By using a commercial software package,8 we find the following estimates:
This looks disappointing at first sight, as the estimate Λ̂ does not look quite like Λ. However, we should take two points into account:
- Factors may need to be rotated to find a meaningful pattern.
- For numerical convenience, factor analysis is applied to standardized observations.
The last consideration suggests that we may check the estimates by comparing the estimated correlation matrix from the factor model with the straightforward sample correlation matrix R. The sample correlation matrix for the random sample was
We urge the reader to compare this estimate with the true correlation matrix for the data-generating model. The correlation matrix from the factor model, due to standardization, is just the covariance matrix Λ̂Λ̂ᵀ + Ψ̂:
We see that indeed factor analysis recreates the correlation matrix rather well. With a smaller sample and a stronger impact of specific factors (their standard deviation is only 5 in our little experiment), the results can be less reassuring. Furthermore, the task of finding the right rotation and a useful interpretation remains a challenge.
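Since the loading matrix and the software output of the example are not reproduced here, the following sketch only mirrors the spirit of the experiment with made-up loadings: it simulates n = 1000 observations from a two-factor model with specific factors of standard deviation 5, fits a two-factor model with scikit-learn (one possible tool; the text used a commercial package), and compares the correlation matrix reconstructed from the estimates with the sample correlation matrix.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(42)
n, p, m = 1000, 5, 2

# Hypothetical loading matrix (NOT the one used in the book's example).
Lambda = np.array([[10.0,  1.0],
                   [ 9.0,  2.0],
                   [ 1.0,  8.0],
                   [ 2.0,  9.0],
                   [ 1.0, 10.0]])

# Data-generating model: X = Lambda f + eps, with independent standard normal
# factors and specific factors with standard deviation 5.
F = rng.standard_normal((n, m))
eps = 5.0 * rng.standard_normal((n, p))
X = F @ Lambda.T + eps

# Standardize the observations, as factor analysis is typically applied
# to standardized data.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

fa = FactorAnalysis(n_components=m)
fa.fit(Z)
Lambda_hat = fa.components_.T          # estimated loadings (p x m)
psi_hat = fa.noise_variance_           # estimated specific variances

# Correlation matrix implied by the fitted factor model vs. sample correlation.
R_model = Lambda_hat @ Lambda_hat.T + np.diag(psi_hat)
R_sample = np.corrcoef(Z, rowvar=False)
print(np.round(R_model, 2))
print(np.round(R_sample, 2))
```

With loadings of this kind, the two matrices should agree closely, echoing the observation above; the estimated loadings themselves, however, may look quite different from the generating ones until a suitable rotation is applied.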