Category: Introduction to Multivariate Analysis

  • Covariance matrices

    Given a random vector X with expected value μ, the covariance matrix can be expressed as Σ = E[(X − μ)(X − μ)ᵀ]. Note that inside the expectation we are multiplying a p × 1 column vector by a 1 × p row vector, which does result in a square p × p matrix. It may also be worth noting that there is a slight inconsistency of notation, since we denote…

  • Matrix algebra and multivariate analysis

    In this section we discuss a few more concepts that are useful in multivariate analysis. Unfortunately, when moving to multivariate statistics, we run out of notation. As usual, capital letters will refer to random quantities, with boldface reserved for random vectors such as X and Z; elements of these vectors will be denoted by X_i and Z_i, and scalar random variables will…

  • Correspondence analysis

    Correspondence analysis is a graphical technique for representing the information included in a two-way contingency table containing frequency counts. For example, Table 15.2 lists the number of times an attribute (crispy, sugar-free, good with coffee, etc.) is used by consumers to describe a snack (cookies, candies, muffins, etc.). The method deals with two categorical or discrete quantitative variables…

  • Multidimensional scaling

    Multidimensional scaling is a family of procedures that aim at producing a low-dimensional representation of object similarity/dissimilarity. Consider n brands and a dissimilarity matrix, whose entry d_ij measures the distance between brands i and j, as perceived by consumers. This matrix is a direct input of multidimensional scaling, whereas other methods aim at computing distances. Then, we want to find a representation…

  • Structural equation models with latent variables

    Consider the relationship between the following variables: The assumption that these variables are somehow related makes sense, but unfortunately they are not directly observable; they are latent variables. Nevertheless, imagine that we wish to build a model expressing the dependence between latent variables. For instance, we may consider the structural equation where ζ and ξ are latent variables, ν is an error term,…

  • Discriminant analysis

    Consider a firm that, on the basis of a set of variables measuring customer attributes, wishes to discriminate between purchasers and nonpurchasers of a product or service. Concretely, the firm has collected a sample of consumers and, given their attributes and observed behavior, wants to find a way to classify them. Two-group discriminant analysis…

  • Canonical correlation

    Consider two sets of variables that are collected in vectors X and Y, respectively, and imagine that we would like to study the relationship between the two sets. One way of doing so is by forming two linear combinations, Z = aᵀX and W = bᵀY, in such a way that the correlation ρ_{Z,W} is maximized. This is what is accomplished by canonical correlation, or canonical analysis. Essentially,…

  • Cluster analysis

    The aim of cluster analysis is categorization, i.e., the creation of groups of objects according to their similarities. The idea is hinted at in Fig. 15.3. There are other methods, such as discriminant analysis, essentially aimed at separating groups of observations. However, they differ in the underlying approach, and some can only deal with metric data.…

  • Factor analysis

    Factor analysis is another interdependence technique, which shares some theoretical background with PCA, as we show in Section 17.3. Factor analysis can be used for data reduction, too, but it should not be confused with PCA, as in factor analysis we are looking for hidden factors that may explain common sources of variance between variables. Formally,…

  • Principal component analysis

    Principal component analysis (PCA) is a data reduction method. Technically, we take a vector of random variables X and transform it into another vector Z by a linear transformation represented by a square matrix A. In more detail, we have Z_i = a_i1 X_1 + a_i2 X_2 + ⋯ + a_ip X_p for i = 1, …, p. These equations should not be confused with regression equations. The transformed Z_i variables are not observed and used in…
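
As a rough illustration of the first and last entries in this list, the Python sketch below estimates a covariance matrix as an average of outer products of centered observations, then finds the direction of largest variance, which is the first principal component. It is only a sketch under stated assumptions: the data are simulated, the function names `covariance_matrix` and `first_principal_component` are hypothetical, and power iteration is used here for simplicity rather than because any of the posts prescribes it.

```python
import math
import random


def covariance_matrix(rows):
    """Sample covariance: sum of outer products (x - m)(x - m)^T, divided by n - 1."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mean[j] for j in range(p)]  # centered observation (p x 1)
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j] / (n - 1)  # entry (i, j) of the outer product
    return cov


def first_principal_component(cov, iterations=500):
    """Power iteration: unit vector of largest variance, and that variance."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit-norm starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, variance


random.seed(0)
# Simulated (hypothetical) data: two positively correlated variables.
data = []
for _ in range(200):
    x1 = random.gauss(0.0, 1.0)
    data.append([x1, 0.8 * x1 + random.gauss(0.0, 0.3)])

S = covariance_matrix(data)
a1, var1 = first_principal_component(S)
total_variance = S[0][0] + S[1][1]

print("covariance matrix:", S)
print("first component weights:", a1)
print("share of total variance:", var1 / total_variance)
```

With strongly correlated variables, most of the total variance should be captured by the first component, which is the sense in which PCA reduces data. In practice a library eigenvalue routine would normally replace the hand-written power iteration.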