MULTIPLE LINEAR REGRESSION BY LEAST SQUARES

Running a linear regression with multiple explanatory variables is a rather straightforward extension of the single-regressor case, especially if we assume fixed, deterministic values of the regressors. The underlying statistical model is

$$
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q + \epsilon.
$$
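As a concrete illustration, here is a minimal Python/NumPy sketch that simulates a sample from such a model with q = 2 regressors, hypothetical coefficient values, and normally distributed errors:

```python
import numpy as np

rng = np.random.default_rng(42)

n, q = 50, 2                        # sample size and number of regressors
beta = np.array([3.0, 1.5, -2.0])   # hypothetical [beta_0, beta_1, beta_2]

X = rng.uniform(0.0, 10.0, size=(n, q))   # fixed, deterministic regressor values
eps = rng.normal(0.0, 1.0, size=n)        # error terms

# Y_i = beta_0 + beta_1 * x_i1 + beta_2 * x_i2 + eps_i
Y = beta[0] + X @ beta[1:] + eps
```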

We avoid using α to denote a constant term, so that we may group parameters into a vector β = (β0, β1, …, βq). The model is estimated on the basis of a sample of n observations,

$$
\left( Y_i, x_{i1}, x_{i2}, \ldots, x_{iq} \right), \qquad i = 1, \ldots, n,
$$

which are collected in vector Y and matrix 𝒳:

$$
Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \qquad
\mathcal{X} = \begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1q} \\
1 & x_{21} & x_{22} & \cdots & x_{2q} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nq}
\end{bmatrix}.
$$

The data matrix 𝒳 collects observed values of the regressor variables, and it includes a leading column of ones. This makes notation more uniform; we may think that coefficient β0 is associated with a stream of constant observations xi0 = 1. To estimate β, we apply ordinary least squares, on the basis of the following regression equations:

$$
Y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_q x_{iq} + e_i, \qquad i = 1, \ldots, n,
$$

where ei is the residual for observation i, and bj is the estimate of parameter βj, j = 0, …, q. If we collect residuals in vector e and coefficients in vector b, the regression equations may be rewritten in the following convenient matrix form:

$$
Y = \mathcal{X} b + e. \tag{16.1}
$$
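In code, setting up the matrix form amounts to prepending a column of ones to the raw regressor data; the following sketch (synthetic data and an arbitrary trial coefficient vector, NumPy assumed) builds 𝒳 and computes the residual vector e = Y − 𝒳b for a trial b:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 50, 2
X_raw = rng.uniform(0.0, 10.0, size=(n, q))               # observed regressor values
Y = 3.0 + X_raw @ np.array([1.5, -2.0]) + rng.normal(size=n)

# Data matrix with a leading column of ones (for the intercept b_0)
X = np.column_stack([np.ones(n), X_raw])                   # shape (n, q + 1)

b_trial = np.array([2.0, 1.0, -1.0])                       # any candidate coefficient vector
e = Y - X @ b_trial                                        # residuals, rearranging Eq. (16.1)
print(X.shape, e.shape)                                    # (50, 3) (50,)
```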

Using least squares, we aim at minimizing the sum of squared residuals, which is just the squared norm of vector e:

$$
\min_{b} \; \| e \|^2 = e^{\mathsf{T}} e = \left( Y - \mathcal{X} b \right)^{\mathsf{T}} \left( Y - \mathcal{X} b \right). \tag{16.2}
$$
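The objective is thus a scalar function of the candidate coefficient vector b. A small sketch (synthetic data, NumPy assumed) makes this explicit and confirms that the squared norm eᵀe is just the sum of squared residuals:

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 50, 2
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, size=(n, q))])
Y = X @ np.array([3.0, 1.5, -2.0]) + rng.normal(size=n)

def ssr(b):
    """Sum of squared residuals, Eq. (16.2), as a function of b."""
    e = Y - X @ b
    return e @ e                              # same value as np.sum(e**2)

print(ssr(np.array([3.0, 1.5, -2.0])))        # near the true coefficients: small
print(ssr(np.array([0.0, 0.0,  0.0])))        # far from them: much larger
```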

Now we only have to follow the familiar least-squares drill, but in matrix terms. The concepts of Section 3.9.1, concerning derivatives of quadratic forms, come in quite handy here. To see why, let us rewrite Eq. (16.2):

$$
\left( Y - \mathcal{X} b \right)^{\mathsf{T}} \left( Y - \mathcal{X} b \right)
= Y^{\mathsf{T}} Y - 2\, Y^{\mathsf{T}} \mathcal{X} b + b^{\mathsf{T}} \mathcal{X}^{\mathsf{T}} \mathcal{X}\, b. \tag{16.3}
$$

This is a function of the vector of coefficients b and includes a constant term, a linear term, and a quadratic form. Furthermore, the matrix 𝒳ᵀ𝒳 is square, symmetric, and positive semidefinite, implying that the associated quadratic form is convex. Hence, we are minimizing a convex function, and stationarity conditions are sufficient for optimality; we just have to take the gradient, i.e., the vector of partial derivatives with respect to each coefficient bj, and set it to zero. From Section 3.9.1, we recall the following rules for the gradient of linear and quadratic functions of multiple variables:

$$
\nabla_x \left( a^{\mathsf{T}} x \right) = a, \qquad
\nabla_x \left( x^{\mathsf{T}} A x \right) = \left( A + A^{\mathsf{T}} \right) x,
$$

for a column vector a and a square matrix A; when A is symmetric, the second rule boils down to 2Ax. By applying these rules to (16.3), and observing that 𝒳ᵀ𝒳 is symmetric, we immediately get the optimality conditions:

$$
\nabla_b \left( Y^{\mathsf{T}} Y - 2\, Y^{\mathsf{T}} \mathcal{X} b + b^{\mathsf{T}} \mathcal{X}^{\mathsf{T}} \mathcal{X}\, b \right)
= -2\, \mathcal{X}^{\mathsf{T}} Y + 2\, \mathcal{X}^{\mathsf{T}} \mathcal{X}\, b = \mathbf{0}
\quad\Longrightarrow\quad
\mathcal{X}^{\mathsf{T}} \mathcal{X}\, b = \mathcal{X}^{\mathsf{T}} Y.
$$
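A quick sanity check of the gradient formula underlying these conditions can be done with finite differences; the sketch below (synthetic data, NumPy assumed) compares the analytic gradient −2𝒳ᵀY + 2𝒳ᵀ𝒳b with a numerical approximation:

```python
import numpy as np

rng = np.random.default_rng(2)
n, q = 50, 2
X = np.column_stack([np.ones(n), rng.uniform(size=(n, q))])
Y = rng.normal(size=n)
b = rng.normal(size=q + 1)

def ssr(b):
    r = Y - X @ b
    return r @ r                               # objective of Eq. (16.2)

grad_analytic = -2 * X.T @ Y + 2 * X.T @ X @ b

# Central finite differences, one coordinate of b at a time
h = 1e-6
grad_numeric = np.array([
    (ssr(b + h * np.eye(q + 1)[j]) - ssr(b - h * np.eye(q + 1)[j])) / (2 * h)
    for j in range(q + 1)
])
print(np.max(np.abs(grad_analytic - grad_numeric)))   # should be tiny
```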

These optimality conditions are just a system of linear equations; the reader is urged to check the size of each matrix involved and to verify that all of the sizes match; in particular, the square matrix 𝒳ᵀ𝒳 belongs to the space ℝ^{(q+1)×(q+1)}. To solve this system, formally, we just have to invert a matrix:

$$
b = \left( \mathcal{X}^{\mathsf{T}} \mathcal{X} \right)^{-1} \mathcal{X}^{\mathsf{T}} Y. \tag{16.4}
$$
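Numerically, it is preferable to solve the system 𝒳ᵀ𝒳b = 𝒳ᵀY rather than to form the inverse explicitly; the sketch below (synthetic data, NumPy assumed) does so and checks the result against a library least-squares routine, as well as the condition 𝒳ᵀe = 0:

```python
import numpy as np

rng = np.random.default_rng(3)
n, q = 200, 2
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, size=(n, q))])
beta_true = np.array([3.0, 1.5, -2.0])
Y = X @ beta_true + rng.normal(size=n)

# Eq. (16.4): solve the normal equations instead of inverting explicitly
b = np.linalg.solve(X.T @ X, X.T @ Y)

# Same result from a QR/SVD-based least-squares routine
b_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(b, np.allclose(b, b_lstsq))

# At the optimum the residual vector satisfies X^T e = 0
e = Y - X @ b
print(np.allclose(X.T @ e, 0.0))
```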

Can something go wrong with this matrix inversion? The answer is “definitely yes,” and it is fairly easy to see why, by a proper interpretation of the regression equation (16.1). What we are doing is trying to express a vector Y ∈ ℝⁿ as a linear combination of q + 1 vectors:

$$
Y = b_0 \mathbf{1}_n + b_1 x_1 + b_2 x_2 + \cdots + b_q x_q + e,
$$

where xj, j = 1, …, q, is a column vector collecting the n observations xij of variable j, and 1_n ∈ ℝⁿ is a vector consisting of elements equal to 1.

Since n > q + 1, there is little hope of succeeding, and we must settle for an optimal approximation, whereby we project the vector Y onto the subspace spanned by these q + 1 vectors, in such a way as to minimize the norm of the residual vector e = Y − 𝒳b. In general, we cannot take for granted that these q + 1 vectors are linearly independent; if they are not, even the coefficients in this approximation will not be well defined, since one of the columns can be expressed as a linear combination of the others. So, in order to avoid trouble, the vectors 1_n and xj should be linearly independent, which amounts to saying that the data matrix 𝒳 has full rank.¹ If so, it turns out that the matrix 𝒳ᵀ𝒳 is nonsingular and Eq. (16.4) makes sense.

Actually, there is an even subtler issue: Even if the columns of 𝒳 are linearly independent, some regressor variables could be strongly correlated. In such a case, random sampling is unlikely to produce truly linearly dependent columns; however, 𝒳ᵀ𝒳 could be close to singular, resulting in unstable estimates of the regression parameters. This issue is called multicollinearity and is outlined in Section 16.2.1.
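A small numerical experiment (synthetic data, NumPy assumed) makes the issue tangible: when the second regressor is almost a copy of the first, the condition number of 𝒳ᵀ𝒳 is huge, and the individual coefficient estimates swing wildly across replications of the errors even though their sum stays stable:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.uniform(0.0, 10.0, size=n)
x2 = x1 + rng.normal(0.0, 1e-3, size=n)       # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])

print(np.linalg.cond(X.T @ X))                # huge condition number

# Coefficient estimates across a few replications of the errors
for _ in range(3):
    Y = 3.0 + 1.5 * x1 - 2.0 * x2 + rng.normal(size=n)
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    print(b)                                  # b[1], b[2] vary wildly; b[1] + b[2] is stable
```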

