In this section we discuss a few more concepts that are useful in multivariate analysis. Unfortunately, when moving to multivariate statistics, we run out of notation. As usual, capital letters will refer to random quantities, with boldface reserved for random vectors such as $\mathbf{X}$ and $\mathbf{Z}$; elements of these vectors will be denoted by $X_i$ and $Z_i$, and scalar random variables will be denoted by $Y$ as usual. Lowercase letters, such as $x$ and $\mathbf{x}$, refer to numbers or specific realizations of the random quantities $X$ and $\mathbf{X}$, respectively. We will also use matrices such as $\Sigma$, $\mathbf{S}$, and $\mathbf{A}$; usually, there is no ambiguity between matrices and random vectors. However, we also need to represent the whole set of observations in matrix form. Observation $k$ is a vector $\mathbf{X}_k = [X_{k1}, X_{k2}, \ldots, X_{kp}]^T$, with elements $X_{kj}$, $j = 1,\ldots,p$, corresponding to single variables or dimensions. Observations are typically collected into matrices, where columns correspond to single variables and rows to their joint realizations (observations). The whole dataset will be denoted by $\chi$, to avoid confusion with the vector $\mathbf{X}$. The element $[\chi]_{kj}$ in row $k$ and column $j$ of the data matrix is element $j$ of observation $k$, i.e.,

$$[\chi]_{kj} = X_{kj}.$$
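To make the row/column convention concrete, here is a minimal NumPy sketch; the array name and toy values are our own illustration, not part of the text:

```python
import numpy as np

# Toy data matrix chi with n = 4 observations (rows) and p = 2
# variables (columns); the values are made up for illustration.
chi = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0],
                [4.0, 40.0]])

n, p = chi.shape  # n = 4 observations, p = 2 variables

# NumPy indexes from 0 while the text indexes from 1, so the
# element [chi]_{kj} of the text is chi[k-1, j-1] here.
k, j = 3, 2
print(chi[k - 1, j - 1])  # variable j = 2 of observation k = 3 -> 30.0
```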
For instance, by using the data matrix $\chi$, we may express the column vector of sample means in the compact form

$$\bar{\mathbf{X}} = \frac{1}{n}\,\chi^{T}\mathbf{1}_n.$$

Here, $\mathbf{1}_n$ is a column vector with $n$ elements set to 1, not to be confused with the identity matrix $\mathbf{I}_n$.
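As a quick sanity check, the following sketch (reusing the toy matrix above) verifies that the compact formula reproduces the column-by-column sample means computed directly:

```python
import numpy as np

chi = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0],
                [4.0, 40.0]])
n = chi.shape[0]

ones = np.ones((n, 1))        # the column vector 1_n
means = chi.T @ ones / n      # (1/n) chi^T 1_n, a p x 1 column vector

print(means.ravel())                                 # [ 2.5 25. ]
print(np.allclose(means.ravel(), chi.mean(axis=0)))  # True
```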
A useful matrix is

$$\mathbf{H}_n \doteq \mathbf{I}_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^{T}.$$
Example 15.3 (The centering matrix) When we premultiply a vector $\mathbf{X}$, consisting of univariate observations $X_1,\ldots,X_n$, by the matrix

$$\mathbf{H}_n = \begin{bmatrix} 1-\frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -\frac{1}{n} & 1-\frac{1}{n} & \cdots & -\frac{1}{n} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{n} & -\frac{1}{n} & \cdots & 1-\frac{1}{n} \end{bmatrix},$$
we are subtracting the sample mean $\bar{X}$ from all elements of $\mathbf{X}$:

$$\mathbf{H}_n\mathbf{X} = \mathbf{X} - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^{T}\mathbf{X} = \mathbf{X} - \bar{X}\,\mathbf{1}_n = \begin{bmatrix} X_1 - \bar{X} \\ X_2 - \bar{X} \\ \vdots \\ X_n - \bar{X} \end{bmatrix},$$

since $\mathbf{1}_n^{T}\mathbf{X} = \sum_{i=1}^{n} X_i = n\bar{X}$.
Not surprisingly, the matrix $\mathbf{H}_n$ is called the centering matrix and may be used with a data matrix $\chi$ in order to obtain the matrix of centered data

$$\tilde{\chi} = \mathbf{H}_n\,\chi.$$
To understand how this last formula works, you should think of the data matrix as a bundle of column vectors, each one corresponding to a single variable.
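Here is a small sketch of the centering matrix at work on the same toy data; premultiplying by $\mathbf{H}_n$ centers every column of $\chi$ at once, exactly as the column-bundle view suggests:

```python
import numpy as np

chi = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0],
                [4.0, 40.0]])
n = chi.shape[0]

# Centering matrix H_n = I_n - (1/n) 1_n 1_n^T
H = np.eye(n) - np.ones((n, n)) / n

chi_centered = H @ chi  # subtracts each column's sample mean from that column

print(chi_centered)
print(np.allclose(chi_centered.mean(axis=0), 0.0))  # each column now has mean 0
```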