A GLIMPSE OF STOCHASTIC REGRESSORS AND HETEROSKEDASTIC ERRORS

In this section we outline what happens when we slightly relax our assumptions about the underlying statistical model:

$$Y_i = \alpha + \beta X_i + \epsilon_i, \qquad i = 1, \ldots, n \tag{10.24}$$

The first thing to note is that now the explanatory variable is random. This is certainly going to make things a tad more complicated, but we do not want to change our assumptions substantially, which is possible by taking advantage of the law of iterated expectations.14 The trick is to restate assumptions conditionally on Xi, whenever necessary:

  • (Xi, Yi) are i.i.d. realizations of a joint probability distribution; this also implies that errors are independent (but it does not imply that they are identically distributed).
  • $E[\epsilon_i \mid X_i] = 0$, i.e., the conditional expectation of errors is zero; note that by the law of iterated expectations this implies that the unconditional expectation of the errors is zero: $E[\epsilon_i] = E\big[E[\epsilon_i \mid X_i]\big] = 0$, but the converse is not true (see the example after this list).
  • A technical assumption, which we will disregard in the following but that is needed to derive some results concerning estimators, is that outliers are unlikely, in the sense that (Xi, Yi) have finite fourth-order moments; we recall that fat tails are measured by kurtosis, which is related to fourth-order moments.
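
To see why the converse fails, here is a simple (hypothetical) counterexample, added for illustration: suppose the error happens to coincide with the centered regressor,

$$\epsilon_i = X_i - \mu_X \quad\Longrightarrow\quad E[\epsilon_i] = 0 \quad \text{but} \quad E[\epsilon_i \mid X_i] = X_i - \mu_X \neq 0 \text{ in general,}$$

where $\mu_X = E[X_i]$; a zero unconditional mean does not imply a zero conditional mean.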

The assumption of homoskedasticity should be expressed in terms of conditional variance

$$\operatorname{Var}(\epsilon_i \mid X_i) = \sigma_\epsilon^2, \qquad i = 1, \ldots, n$$

but we do not want to take this for granted.

This framework looks more complicated, but the essential results we have shown in the simpler setting still hold under the assumptions mentioned above. To get a feel for the technicalities involved, let us see how unbiasedness of the estimator of the slope can be proved. Here we concentrate on the slope, since it is usually the parameter of interest, but the approach can be used to deal with the intercept as well. The starting point is again Eq. (10.12), which should now be written as

$$b = \beta + \frac{\sum_{i=1}^{n} (X_i - \bar{X})\,\epsilon_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \tag{10.25}$$

We would like to prove that E(W) = 0, where W is the ratio of the two sums in the equation above. If we take expectations, we cannot just move the explanatory variables outside the expectation, since they are no longer numbers. However, we can use the assumptions and the law of iterated expectations, by conditioning on Xi, i = 1,…,n:

$$E[W] = E\Big[\, E[W \mid X_1, \ldots, X_n] \,\Big]$$

The conditional expectation allows us to treat explanatory variables as numbers, performing the same tricks again:

$$E[W \mid X_1, \ldots, X_n] = \frac{\sum_{i=1}^{n} (X_i - \bar{X})\, E[\epsilon_i \mid X_1, \ldots, X_n]}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = 0$$

Note that we take Xi outside the conditional expectation, and then use the fact that

$$E[\epsilon_i \mid X_1, \ldots, X_n] = E[\epsilon_i \mid X_i] = 0$$

due to assumptions concerning the independence between observations and the conditional expectation of errors. An equivalent way of seeing this result is that the assumptions imply the conditional unbiasedness of the estimator

$$E[b \mid X_1, \ldots, X_n] = \beta$$

which in turn implies unbiasedness by the application of iterated expectations.
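
As a quick numerical sanity check (an illustration added here, not part of the formal argument), the following Python sketch simulates a model with a stochastic regressor and errors whose variance depends on Xi; all data-generating values are arbitrary assumptions. Averaging the OLS slope over many replications gives a value close to the true β, consistent with unbiasedness.

```python
import numpy as np

# Monte Carlo check: OLS slope with a stochastic regressor and
# heteroskedastic errors (all parameter values are illustrative assumptions).
rng = np.random.default_rng(42)
alpha, beta = 2.0, 0.5      # assumed "true" intercept and slope
n, n_rep = 200, 5000        # sample size and number of replications

slopes = np.empty(n_rep)
for r in range(n_rep):
    X = rng.uniform(1.0, 10.0, size=n)   # stochastic explanatory variable
    eps = rng.normal(0.0, 0.3 * X)       # error std grows with X (heteroskedastic)
    Y = alpha + beta * X + eps
    Xc = X - X.mean()
    slopes[r] = np.sum(Xc * Y) / np.sum(Xc ** 2)   # OLS slope estimate

print(f"true slope = {beta}, average estimate = {slopes.mean():.4f}")
```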

Things are not that easy when we consider standard errors without assuming homoskedasticity. To see why, consider the variance of the numerator of W:

$$\operatorname{Var}\!\left( \sum_{i=1}^{n} (X_i - \bar{X})\,\epsilon_i \right)$$

Since the regressors are stochastic, we cannot take them outside the variance operator. We cannot factorize the product and simplify the expression, since $\epsilon_i$ need not be independent of Xi. Moreover, we cannot automatically assume that the variance of a ratio is the ratio of the variances. If we want to evaluate SE(b), we must settle for more complicated formulas. An important asymptotic result is the normality of the estimator of the slope.
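
To make the difficulty concrete, here is a small worked step (an addition for illustration, under the stated assumptions): conditioning on the regressors and using the independence between observations,

$$\operatorname{Var}\!\left( \sum_{i=1}^{n} (X_i - \bar{X})\,\epsilon_i \,\Big|\, X_1, \ldots, X_n \right) = \sum_{i=1}^{n} (X_i - \bar{X})^2\, \operatorname{Var}(\epsilon_i \mid X_i),$$

which collapses to $\sigma_\epsilon^2 \sum_{i=1}^{n} (X_i - \bar{X})^2$ only under homoskedasticity; in general each term keeps its own conditional variance, which is exactly what the robust formulas below account for.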

THEOREM 10.4 The asymptotic distribution of the estimator b, under the previously stated assumptions, is characterized by the following limit:

$$\sqrt{n}\,\big(b - \beta\big) \;\xrightarrow{d}\; N\!\left( 0,\; \frac{\operatorname{Var}\big[(X_i - \mu_X)\,\epsilon_i\big]}{\big[\operatorname{Var}(X_i)\big]^2} \right)$$

where $\mu_X = E[X_i]$ is the expected value of the explanatory variable and $\xrightarrow{d}$ refers to convergence in distribution.

PROOF A fully detailed and rigorous proof would be somewhat technical and tedious, but we may at least appreciate the role of stochastic convergence concepts, including Slutsky’s theorem, which we illustrated in Section 9.8.5. Equation (10.25) implies

$$\sqrt{n}\,\big(b - \beta\big) = \frac{\dfrac{1}{\sqrt{n}} \displaystyle\sum_{i=1}^{n} v_i}{s_X^2} \;-\; \frac{\big(\bar{X} - \mu_X\big)\, \dfrac{1}{\sqrt{n}} \displaystyle\sum_{i=1}^{n} \epsilon_i}{s_X^2} \tag{10.26}$$

where $v_i = (X_i - \mu_X)\,\epsilon_i$ and $s_X^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$.

To take advantage of Slutsky’s theorem, we need to assess which terms above converge, in probability or in distribution, to some relevant quantity. What we know about sample mean and sample variance implies

$$\bar{X} \;\xrightarrow{p}\; \mu_X, \qquad s_X^2 \;\xrightarrow{p}\; \sigma_X^2 = \operatorname{Var}(X_i)$$

The central limit theorem tells us that

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \epsilon_i \;\xrightarrow{d}\; N\big(0, \operatorname{Var}(\epsilon_i)\big)$$

Then, applying Slutsky’s theorem, we see that the second term in (10.26) converges in probability to zero. Applying the central limit theorem to the numerator of the first term yields

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} v_i \;\xrightarrow{d}\; N\big(0, \sigma_v^2\big)$$

where σv is the standard deviation of vi. Then, we also see that

$$\frac{\dfrac{1}{\sqrt{n}} \displaystyle\sum_{i=1}^{n} v_i}{s_X^2} \;\xrightarrow{d}\; N\!\left( 0,\; \frac{\sigma_v^2}{\sigma_X^4} \right)$$

from which the theorem follows immediately. We should note that this proof is not quite rigorous, as applying the central limit theorem requires finiteness of variance, which can be ensured by proper assumptions.

The theorem implies that, for a large sample, the estimator is unbiased, asymptotically normal, and consistent. As usual, we do not really know the variance of b in the statement of the theorem, but we can estimate it by the following formula, based on observed residuals and the substitution of variances with their sample counterparts:

$$\widehat{\sigma_b^2} = \frac{1}{n} \times \frac{\dfrac{1}{n-2} \displaystyle\sum_{i=1}^{n} (X_i - \bar{X})^2\, e_i^2}{\left[ \dfrac{1}{n} \displaystyle\sum_{i=1}^{n} (X_i - \bar{X})^2 \right]^2}$$

When drawing statistical inferences, we use the same procedure as in the homoskedastic case, but we use the standard error $SE(b) = \sqrt{\widehat{\sigma_b^2}}$. Many software packages implement these formulas, which are robust to heteroskedasticity and require a minimal set of assumptions.
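
To illustrate how the formula above may be applied, here is a minimal Python sketch (with assumed, simulated data) that computes the OLS slope, the residuals, and the heteroskedasticity-robust standard error directly from the sample counterparts; statistical packages report closely related quantities as HC-type robust standard errors.

```python
import numpy as np

# Heteroskedasticity-robust standard error of the OLS slope,
# following the sample-counterpart formula above (simulated, assumed data).
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(1.0, 10.0, size=n)
eps = rng.normal(0.0, 0.3 * X)           # error variance increases with X
Y = 2.0 + 0.5 * X + eps

Xc = X - X.mean()
b = np.sum(Xc * Y) / np.sum(Xc ** 2)     # OLS slope
a = Y.mean() - b * X.mean()              # OLS intercept
e = Y - a - b * X                        # observed residuals

# sample counterpart of Var[(X_i - mu_X) eps_i] / [Var(X_i)]^2, scaled by 1/n
num = np.sum(Xc ** 2 * e ** 2) / (n - 2)
den = (np.sum(Xc ** 2) / n) ** 2
se_b = np.sqrt(num / den / n)
print(f"b = {b:.4f}, robust SE(b) = {se_b:.4f}")
```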

An alternative to these robust formulas is obtained if we assume some more specific structure on the nature of heteroskedasticity. For instance, let us assume that

$$\operatorname{Var}(\epsilon_i \mid X_i) = \frac{\sigma^2}{w_i}$$

i.e., the variances of the errors are known up to a proportionality constant $\sigma^2$. Then we may rewrite (10.24), multiplying both sides of the equation by $\sqrt{w_i}$:

$$\sqrt{w_i}\, Y_i = \alpha \sqrt{w_i} + \beta\, \sqrt{w_i}\, X_i + \sqrt{w_i}\,\epsilon_i$$

Using this trick, we see that $\operatorname{Var}\big(\sqrt{w_i}\,\epsilon_i \mid X_i\big) = \sigma^2$, i.e., we are back to the homoskedastic case. Now the sum of squared residuals is

$$\sum_{i=1}^{n} w_i \big( Y_i - a - b X_i \big)^2$$

The resulting approach is called weighted least squares and should be contrasted with ordinary least squares (OLS). The result is, after all, rather intuitive: We should attribute more weight to observations with a large value of wi, i.e., observations affected by a smaller noise.
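
As an illustration, the following Python sketch applies weighted least squares under the assumption that the conditional variances, and hence the weights wi, are known; the specific variance pattern and parameter values are hypothetical.

```python
import numpy as np

# Weighted least squares: minimize sum_i w_i * (Y_i - a - b*X_i)^2,
# with weights assumed known (hypothetical variance pattern).
rng = np.random.default_rng(1)
n = 200
X = rng.uniform(1.0, 10.0, size=n)
sigma2_i = 0.09 * X ** 2                  # Var(eps_i | X_i), assumed known
Y = 2.0 + 0.5 * X + rng.normal(0.0, np.sqrt(sigma2_i))
w = 1.0 / sigma2_i                        # large weight = small noise

# Equivalent trick from the text: multiply the model by sqrt(w_i)
# and apply ordinary least squares to the transformed variables.
sw = np.sqrt(w)
A = np.column_stack([sw, sw * X])         # columns: sqrt(w_i), sqrt(w_i)*X_i
a_wls, b_wls = np.linalg.lstsq(A, sw * Y, rcond=None)[0]
print(f"WLS estimates: a = {a_wls:.3f}, b = {b_wls:.3f}")
```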

What we have accomplished by introducing weighted least squares may sound purely academic, as it seems quite hard to have detailed knowledge of the variances $\sigma_i^2 = \operatorname{Var}(\epsilon_i \mid X_i)$ or of the weights wi. However, there are more general approaches that can be used to approximate this knowledge. One such procedure is the estimation of an equation describing variance, i.e., a model relating $\sigma_i^2$ to one or more variables by a regression equation. The procedure may be sketched as follows (a code sketch is given after the list):

  1. Fit a regression model using ordinary least squares and evaluate residuals ei.
  2. Regress the squared residuals on the explanatory variable to obtain an equation predicting the variance of each observation as a function of the explanatory variable, e.g., $\hat{\sigma}_i^2 = \hat{\gamma}_0 + \hat{\gamma}_1 X_i$. In the case of multiple regression, analysis of residuals may suggest the most appropriate variables to use in estimating such a variance function.
  3. Given the estimated variance function, obtain estimates of the weights, $\hat{w}_i = 1/\hat{\sigma}_i^2$, $i = 1, \ldots, n$.
  4. Apply weighted least squares.
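
The following Python sketch puts the four steps together (the linear specification of the variance function in step 2 is just one possible assumption, and all simulated values are illustrative).

```python
import numpy as np

# Feasible weighted least squares, following the four steps above
# (illustrative data; the linear variance function is an assumed specification).
rng = np.random.default_rng(2)
n = 300
X = rng.uniform(1.0, 10.0, size=n)
Y = 2.0 + 0.5 * X + rng.normal(0.0, 0.3 * X)   # heteroskedastic errors

def ols(x, y):
    """Return the OLS intercept and slope of y regressed on x."""
    xc = x - x.mean()
    slope = np.sum(xc * y) / np.sum(xc ** 2)
    return y.mean() - slope * x.mean(), slope

# Step 1: fit by ordinary least squares and evaluate residuals.
a0, b0 = ols(X, Y)
e = Y - a0 - b0 * X

# Step 2: regress squared residuals on X to estimate the variance function.
g0, g1 = ols(X, e ** 2)
sigma2_hat = np.clip(g0 + g1 * X, 1e-6, None)  # guard against nonpositive fits

# Step 3: estimated weights are the reciprocals of the estimated variances.
w = 1.0 / sigma2_hat

# Step 4: apply weighted least squares via the transformed model.
sw = np.sqrt(w)
A = np.column_stack([sw, sw * X])
a_wls, b_wls = np.linalg.lstsq(A, sw * Y, rcond=None)[0]
print(f"OLS : a = {a0:.3f}, b = {b0:.3f}")
print(f"FWLS: a = {a_wls:.3f}, b = {b_wls:.3f}")
```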
