In this section we discuss a few more concepts that are useful in multivariate analysis. Unfortunately, when moving to multivariate statistics, we run out of notation. As usual, capital letters will refer to random quantities, with boldface reserved for random vectors such as $\mathbf{X}$ and $\mathbf{Z}$; elements of these vectors will be denoted by $X_i$ and $Z_i$, and scalar random variables will be denoted by $Y$ as usual. Lowercase letters, such as $x$ and $\mathbf{x}$, refer to numbers or specific realizations of the random quantities $X$ and $\mathbf{X}$, respectively. We will also use matrices such as $\Sigma$, $\mathbf{S}$, and $\mathbf{A}$; usually, there is no ambiguity between matrices and random vectors. However, we also need to represent the whole set of observations in matrix form. Observation $k$ is a vector $\mathbf{X}_k = [X_{k1}, X_{k2}, \ldots, X_{kp}]^T$, with elements $X_{kj}$, $j = 1,\ldots,p$, corresponding to single variables or dimensions. Observations are typically collected into matrices, where columns correspond to single variables and rows to their joint realizations (observations). The whole dataset will be denoted by $\chi$, to avoid confusion with the vector $\mathbf{X}$. The element $[\chi]_{kj}$ in row $k$ and column $j$ of the data matrix is element $j$ of observation $k$, i.e.,

$$[\chi]_{kj} = X_{kj}.$$
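To make the row/column convention concrete, here is a minimal NumPy sketch; the array name and toy values are our own illustration, not part of the text:

```python
import numpy as np

# Toy data matrix chi with n = 4 observations (rows) and p = 2
# variables (columns); the values are made up for illustration.
chi = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0],
                [4.0, 40.0]])

n, p = chi.shape  # n = 4 observations, p = 2 variables

# NumPy indexes from 0 while the text indexes from 1, so the
# element [chi]_{kj} of the text is chi[k-1, j-1] here.
k, j = 3, 2
print(chi[k - 1, j - 1])  # variable j = 2 of observation k = 3 -> 30.0
```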
For instance, by using the data matrix $\chi$, we may express the column vector of sample means in the compact form

$$\bar{\mathbf{X}} = \frac{1}{n}\,\chi^{T}\mathbf{1}_n.$$

Here, $\mathbf{1}_n$ is a column vector with $n$ elements set to 1, not to be confused with the identity matrix $\mathbf{I}_n$.
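As a quick sanity check, the following sketch (reusing the toy matrix above) verifies that the compact formula reproduces the column-by-column sample means computed directly:

```python
import numpy as np

chi = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0],
                [4.0, 40.0]])
n = chi.shape[0]

ones = np.ones((n, 1))        # the column vector 1_n
means = chi.T @ ones / n      # (1/n) chi^T 1_n, a p x 1 column vector

print(means.ravel())                                 # [ 2.5 25. ]
print(np.allclose(means.ravel(), chi.mean(axis=0)))  # True
```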
A useful matrix is

$$\mathbf{H}_n \doteq \mathbf{I}_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^{T}.$$
Example 15.3 (The centering matrix) When we premultiply a vector $\mathbf{X}$, consisting of univariate observations $X_1,\ldots,X_n$, by the matrix

$$\mathbf{H}_n = \begin{bmatrix} 1-\frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -\frac{1}{n} & 1-\frac{1}{n} & \cdots & -\frac{1}{n} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{n} & -\frac{1}{n} & \cdots & 1-\frac{1}{n} \end{bmatrix},$$
we are subtracting the sample mean $\bar{X}$ from all elements of $\mathbf{X}$:

$$\mathbf{H}_n\mathbf{X} = \mathbf{X} - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^{T}\mathbf{X} = \mathbf{X} - \bar{X}\,\mathbf{1}_n = \begin{bmatrix} X_1 - \bar{X} \\ X_2 - \bar{X} \\ \vdots \\ X_n - \bar{X} \end{bmatrix},$$

since $\mathbf{1}_n^{T}\mathbf{X} = \sum_{i=1}^{n} X_i = n\bar{X}$.
Not surprisingly, the matrix $\mathbf{H}_n$ is called the centering matrix and may be used with a data matrix $\chi$ in order to obtain the matrix of centered data

$$\tilde{\chi} = \mathbf{H}_n\,\chi.$$
To understand how this last formula works, you should think of the data matrix as a bundle of column vectors, each one corresponding to a single variable.
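Here is a small sketch of the centering matrix at work on the same toy data; premultiplying by $\mathbf{H}_n$ centers every column of $\chi$ at once, exactly as the column-bundle view suggests:

```python
import numpy as np

chi = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0],
                [4.0, 40.0]])
n = chi.shape[0]

# Centering matrix H_n = I_n - (1/n) 1_n 1_n^T
H = np.eye(n) - np.ones((n, n)) / n

chi_centered = H @ chi  # subtracts each column's sample mean from that column

print(chi_centered)
print(np.allclose(chi_centered.mean(axis=0), 0.0))  # each column now has mean 0
```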