Author: haroonkhan
-
Structural equation models with latent variables
Consider the relationship between the following variables: The assumption that these variables are somehow related makes sense, but unfortunately they are not directly observable; they are latent variables. Nevertheless, imagine that we wish to build a model expressing the dependence between latent variables. For instance, we may consider the structural equation where ζ and ξ are latent variables, ν is an error term,…
-
Discriminant analysis
Consider a firm that, on the basis of a set of variables measuring customer attributes, wishes to discriminate between purchasers and nonpurchasers of a product or service. Concretely, the firm has collected a sample of consumers and, given their attributes and observed behavior, wants to find a way to classify them. Two-group discriminant analysis…
-
Canonical correlation
Consider two sets of variables that are collected in vectors X and Y, respectively, and imagine that we would like to study the relationship between the two sets. One way of doing so is to form two linear combinations, Z = a^T X and W = b^T Y, in such a way that the correlation ρ_{Z,W} is maximized. This is what is accomplished by canonical correlation, or canonical analysis. Essentially,…
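The idea of choosing a and b so that the correlation between Z = a^T X and W = b^T Y is maximized can be illustrated with a minimal NumPy sketch. It is not taken from the post: the function name `first_canonical_pair`, the whitening-plus-SVD route, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def first_canonical_pair(X, Y):
    """Return (a, b, rho): weights and correlation of the first canonical pair.

    X is an (n, p) data matrix and Y an (n, q) data matrix, rows = observations.
    Hypothetical helper, written for illustration only.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Sample covariance blocks
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    # Largest singular value of M is the maximal correlation
    U, s, Vt = np.linalg.svd(M)
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0, :])
    return a, b, s[0]

# Synthetic data (an assumption): the two sets share one common signal
rng = np.random.default_rng(0)
n = 500
common = rng.normal(size=n)
X = np.column_stack([common + rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([rng.normal(size=n),
                     common + rng.normal(size=n),
                     rng.normal(size=n)])

a, b, rho = first_canonical_pair(X, Y)
Z, W = X @ a, Y @ b
print("canonical correlation:", rho)
print("sample correlation of Z and W:", np.corrcoef(Z, W)[0, 1])
```

In this sketch the two printed values should agree up to rounding, and the canonical correlation cannot be smaller in absolute value than any pairwise correlation between a single component of X and a single component of Y.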
-
Cluster analysis
The aim of cluster analysis is categorization, i.e., the creation of groups of objects according to their similarities. The idea is hinted at in Fig. 15.3. There are other methods, such as discriminant analysis, essentially aimed at separating groups of observations. However, they differ in the underlying approach, and some can only deal with metric data.…
-
Factor analysis
Factor analysis is another interdependence technique, which shares some theoretical background with PCA, as we show in Section 17.3. Factor analysis can be used for data reduction, too, but it should not be confused with PCA, as in factor analysis we are looking for hidden factors that may explain common sources of variance between variables. Formally,…
-
Principal component analysis
Principal component analysis (PCA) is a data reduction method. Technically, we take a vector of random variables and transform it into another vector by a linear transformation represented by a square matrix. In more detail we have These equations should not be confused with regression equations. The transformed Zi variables are not observed and used in…
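The linear transformation described above can be sketched in NumPy. The symbols are assumptions, since the excerpt does not show them: `X` holds the observed variables, `A` is the square matrix whose columns are eigenvectors of the sample covariance matrix, and `Z` holds the transformed variables. This is one common way to compute PCA, not necessarily the construction used in the post.

```python
import numpy as np

def pca_transform(X):
    """Return (Z, A, variances) for an (n, p) data matrix X.

    Columns of A are eigenvectors of the sample covariance matrix, sorted by
    decreasing eigenvalue; Z = (X - mean) @ A are the principal components.
    Hypothetical helper, written for illustration only.
    """
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    # eigh is appropriate for symmetric matrices; eigenvalues come back ascending
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    variances, A = eigvals[order], eigvecs[:, order]
    Z = Xc @ A
    return Z, A, variances

# Synthetic correlated data (an assumption made for the demo)
rng = np.random.default_rng(1)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(300, 3)) @ mixing

Z, A, variances = pca_transform(X)
print("share of variance per component:", variances / variances.sum())
print("covariance of Z (should be diagonal):")
print(np.round(np.cov(Z, rowvar=False), 6))
```

The transformed variables are uncorrelated and ordered by variance, so data reduction amounts to keeping only the first few columns of `Z`.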
-
Multiple regression models
In regression models there is a clear separation between the regressed variable and the regressors (explanatory variables): This does not necessarily mean that there is a causal relationship, but it is enough to classify regression models as dependence models. Regression models arise naturally for dealing with metric variables, but we may use binary variables to…
-
AN OVERVIEW OF MULTIVARIATE METHODS
Multivariate methods can be classified along different features: In the following sections we outline some multivariate methods, suggesting a classification along the above dimensions. We do not aim at being comprehensive; the idea is to appreciate the richness of this field of statistics, and to see the classification above in concrete terms.
-
Missing data and outliers
Outliers and wrong data are quite common in data analysis. If data are collected automatically, and they are engineering measurements, this may not be a tough issue; however, when people are involved, either because we are collecting data using questionnaires, or because we are investigating a social system, things may turn out to be a…
-
Adapting statistical inference procedures
The core topics in statistical inference are point and interval parameter estimation, hypothesis testing, and analysis of variance. Some of the related procedures are conceptually easy to adapt to a multivariate case. For instance, maximum likelihood estimation is not very different, even though it is going to prove computationally more challenging, thus requiring numerical optimization…