In this section we outline what happens when we relax our assumptions about the underlying statistical model a bit.
The first thing to note is that now the explanatory variable is random. This is certainly going to make things a tad more complicated, but we do not want to change our assumptions substantially, which is possible by taking advantage of the law of iterated expectations.14 The trick is to restate assumptions conditionally on $X_i$, whenever necessary:
- $(X_i, Y_i)$ are i.i.d. realizations of a joint probability distribution; this also implies that the errors are independent (but it does not imply that they are identically distributed conditionally on the corresponding $X_i$).
- $E[\epsilon_i \mid X_i] = 0$, i.e., the conditional expectation of errors is zero; note that by the law of iterated expectations this implies that the unconditional expectation of the errors is zero, $E[\epsilon_i] = E\big[E[\epsilon_i \mid X_i]\big] = 0$, but the converse is not true.
- A technical assumption, which we will disregard in the following but that is needed to derive some results concerning estimators, is that outliers are unlikely, in the sense that $(X_i, Y_i)$ have finite fourth-order moments; we recall that fat tails are measured by kurtosis, which is related to fourth-order moments.
The assumption of homoskedasticity should be expressed in terms of conditional variance, $\mathrm{Var}(\epsilon_i \mid X_i) = \sigma_\epsilon^2$, but we do not want to take this for granted.
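To make the setting concrete, here is a minimal Python sketch that generates i.i.d. pairs $(X_i, Y_i)$ consistent with these assumptions; the linear model, the parameter values, and the way the conditional standard deviation depends on $X_i$ are hypothetical choices made only for illustration:

```python
import numpy as np

# Hypothetical data-generating process: Y_i = alpha + beta * X_i + eps_i,
# with E[eps_i | X_i] = 0 but a conditional standard deviation that depends on X_i.
rng = np.random.default_rng(42)
n, alpha, beta = 200, 1.0, 2.0

X = rng.normal(loc=5.0, scale=2.0, size=n)   # random explanatory variable
sigma_i = 0.5 + 0.4 * np.abs(X)              # Var(eps_i | X_i) depends on X_i
eps = rng.normal(0.0, sigma_i)               # zero conditional mean, heteroskedastic
Y = alpha + beta * X + eps                   # i.i.d. pairs (X_i, Y_i)
```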
This framework looks more complicated, but the essential results we have shown in the simpler setting hold with the assumptions mentioned above. To see an example of the technicalities involved, let us see how unbiasedness of the estimate of slope can be proved. Here we concentrate on the slope, since it is usually the parameter of interest, but the approach can be used to deal with the intercept as well. The starting point is again Eq. (10.12), which should be written as
$$b = \beta + \frac{\sum_{i=1}^n (X_i - \bar X)\,\epsilon_i}{\sum_{i=1}^n (X_i - \bar X)^2}.$$
We would like to prove that $E[W] = 0$, where $W$ is the ratio of the two sums in the equation above. If we take expectations, we cannot just move the explanatory variables outside in this case, since they are not numbers anymore. However, we can use the assumptions and the law of iterated expectations, by conditioning on $X_i$, $i = 1,\ldots,n$:
$$E[W] = E\big[E[W \mid X_1, \ldots, X_n]\big].$$
The conditional expectation allows us to treat the explanatory variables as numbers, performing the same tricks again:
$$E[W \mid X_1, \ldots, X_n] = \frac{\sum_{i=1}^n (X_i - \bar X)\,E[\epsilon_i \mid X_1, \ldots, X_n]}{\sum_{i=1}^n (X_i - \bar X)^2} = 0.$$
Note that we take $X_i$ outside the conditional expectation, and then use the fact that
$$E[\epsilon_i \mid X_1, \ldots, X_n] = E[\epsilon_i \mid X_i] = 0,$$
due to the assumptions concerning the independence between observations and the conditional expectation of errors. An equivalent way of seeing this result is that the assumptions imply the conditional unbiasedness of the estimator,
$$E[b \mid X_1, \ldots, X_n] = \beta,$$
which in turn implies unbiasedness by the application of iterated expectations.
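As a quick numerical sanity check of this argument (not a substitute for the proof), the following Monte Carlo sketch in Python, based on the same kind of hypothetical data-generating process as above, averages the OLS slope over many replications; the average should be close to the true slope even though the regressors are random and the errors heteroskedastic:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, n, n_rep = 1.0, 2.0, 100, 20_000   # hypothetical values

slopes = np.empty(n_rep)
for r in range(n_rep):
    X = rng.normal(5.0, 2.0, size=n)
    eps = rng.normal(0.0, 0.5 + 0.4 * np.abs(X))   # E[eps|X] = 0, heteroskedastic
    Y = alpha + beta * X + eps
    # OLS slope: b = sum((X_i - Xbar) * Y_i) / sum((X_i - Xbar)^2)
    Xc = X - X.mean()
    slopes[r] = Xc @ Y / (Xc @ Xc)

print(slopes.mean())   # close to beta = 2, in line with E[b] = beta
```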
Things are not that easy when we consider standard errors without assuming homoskedasticity. To see why, imagine that we consider the variance of the numerator in $W$:
$$\mathrm{Var}\!\left[\sum_{i=1}^n (X_i - \bar X)\,\epsilon_i\right].$$
Since the regressors are stochastic, we cannot take them outside the variance. We cannot factorize the product and simplify the expression, since $\epsilon_i$ need not be independent of $X_i$. Moreover, we cannot automatically assume that the variance of a ratio is the ratio of the variances. If we want to evaluate $SE(b)$, we must settle for more complicated formulas. An important asymptotic result is the normality of the estimator of the slope.
THEOREM 10.4 The asymptotic distribution of the estimator $b$, under the previously stated assumptions, is characterized by the following limit:
$$\sqrt{n}\,(b - \beta) \xrightarrow{d} N\!\left(0, \frac{\mathrm{Var}\big[(X_i - \mu_X)\,\epsilon_i\big]}{\big[\mathrm{Var}(X_i)\big]^2}\right),$$
where $\mu_X = E[X_i]$ and $\xrightarrow{d}$ refers to convergence in distribution.
PROOF A fully detailed and rigorous proof would be somewhat technical and tedious, but we may at least appreciate the role of stochastic convergence concepts, including Slutsky’s theorem, which we illustrated in Section 9.8.5. Equation (10.25) implies
$$\sqrt{n}\,(b - \beta) = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^n v_i}{S_X^2} - \frac{(\bar X - \mu_X)\,\sqrt{n}\,\bar\epsilon}{S_X^2}, \qquad (10.26)$$
where $v_i = (X_i - \mu_X)\,\epsilon_i$ and $S_X^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2$.
To take advantage of Slutsky’s theorem, we need to assess which terms above converge, in probability or in distribution, to some relevant quantity. What we know about the sample mean and the sample variance implies
$$\bar X \xrightarrow{p} \mu_X, \qquad S_X^2 \xrightarrow{p} \sigma_X^2,$$
where $\sigma_X^2 = \mathrm{Var}(X_i)$.
The central limit theorem tells us that
$$\sqrt{n}\,\bar\epsilon \xrightarrow{d} N\big(0, \sigma_\epsilon^2\big),$$
where $\sigma_\epsilon^2 = \mathrm{Var}(\epsilon_i)$.
Then, applying Slutsky’s theorem, we see that the second term in (10.26) converges in probability to zero. Applying the central limit theorem to the numerator of the first term yields
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n v_i \xrightarrow{d} N\big(0, \sigma_v^2\big),$$
where $\sigma_v$ is the standard deviation of $v_i$. Then, we also see that
$$\frac{\frac{1}{\sqrt{n}}\sum_{i=1}^n v_i}{S_X^2} \xrightarrow{d} N\!\left(0, \frac{\sigma_v^2}{\sigma_X^4}\right),$$
from which the theorem follows immediately. We should note that this proof is not quite rigorous, as applying the central limit theorem requires finiteness of variance, which can be ensured by proper assumptions.
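The limit stated in the theorem can also be checked by simulation. In the sketch below, which again relies on a hypothetical data-generating process, the variance of $\sqrt{n}\,(b - \beta)$ across replications is compared with the asymptotic value $\mathrm{Var}(v_i)/[\mathrm{Var}(X_i)]^2$, the latter being approximated by a large auxiliary sample:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, n, n_rep = 1.0, 2.0, 500, 10_000   # hypothetical values

scaled_err = np.empty(n_rep)
for r in range(n_rep):
    X = rng.normal(5.0, 2.0, size=n)
    eps = rng.normal(0.0, 0.5 + 0.4 * np.abs(X))
    Y = alpha + beta * X + eps
    Xc = X - X.mean()
    b = Xc @ Y / (Xc @ Xc)                      # OLS slope
    scaled_err[r] = np.sqrt(n) * (b - beta)

# Asymptotic variance Var(v_i) / [Var(X_i)]^2, with v_i = (X_i - mu_X) * eps_i,
# approximated on a large auxiliary sample (mu_X = 5, Var(X_i) = 4 by construction)
X = rng.normal(5.0, 2.0, size=1_000_000)
eps = rng.normal(0.0, 0.5 + 0.4 * np.abs(X))
v = (X - 5.0) * eps
print(scaled_err.var(), v.var() / 4.0**2)       # the two numbers should be close
```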
The theorem implies that, for a large sample, the estimator is unbiased, asymptotically normal, and consistent. As usual, we do not really know the variance of $b$ in the statement of the theorem, but we can estimate it by the following formula, based on the observed residuals and the substitution of variances with their sample counterparts:
$$\widehat{\mathrm{Var}}(b) = \frac{1}{n}\cdot\frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2\, e_i^2}{\left[\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2\right]^2}.$$
When drawing statistical inferences, we use the same procedure as in the homoskedastic case, but with the standard error $SE(b) = \sqrt{\widehat{\mathrm{Var}}(b)}$. Many software packages implement these formulas, which are robust to heteroskedasticity and require a minimal set of assumptions.
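As an illustration, the following Python sketch computes the OLS slope and its heteroskedasticity-robust standard error directly from the sample counterparts above; the simulated data and all numerical values are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, n = 1.0, 2.0, 400                 # hypothetical values

# Simulated heteroskedastic data
X = rng.normal(5.0, 2.0, size=n)
eps = rng.normal(0.0, 0.5 + 0.4 * np.abs(X))
Y = alpha + beta * X + eps

# OLS estimates and observed residuals
Xc = X - X.mean()
b = Xc @ Y / (Xc @ Xc)
a = Y.mean() - b * X.mean()
e = Y - a - b * X

# Heteroskedasticity-robust variance estimate of b (sample-counterpart formula)
s2_X = np.mean(Xc**2)                          # (1/n) * sum (X_i - Xbar)^2
var_b = np.mean(Xc**2 * e**2) / (n * s2_X**2)
se_b = np.sqrt(var_b)

print(b, se_b)                                 # slope close to 2, with its robust SE
```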
An alternative to these robust formulas is obtained if we assume some more specific structure on the nature of heteroskedasticity. For instance, let us assume that
$$\mathrm{Var}(\epsilon_i \mid X_i) = \sigma_i^2 = \frac{\sigma^2}{w_i},$$
i.e., the variances of errors are known up to a proportionality constant $\sigma^2$. Then we may rewrite (10.24), multiplying both sides of the equation by $\sqrt{w_i}$:
$$\sqrt{w_i}\,Y_i = \alpha\,\sqrt{w_i} + \beta\,\sqrt{w_i}\,X_i + \sqrt{w_i}\,\epsilon_i.$$
Using this trick, we see that $\mathrm{Var}(\sqrt{w_i}\,\epsilon_i \mid X_i) = w_i\,\sigma_i^2 = \sigma^2$, i.e., we are back to the homoskedastic case. Now the sum of squared residuals is
$$\sum_{i=1}^n w_i\,(Y_i - a - b X_i)^2.$$
The resulting approach is called weighted least squares and should be contrasted with ordinary least squares (OLS). The result is, after all, rather intuitive: we should attribute more weight to observations with a large value of $w_i$, i.e., observations affected by less noise.
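A minimal Python sketch of weighted least squares, assuming that weights proportional to $1/\sigma_i^2$ are known exactly (which is rarely the case in practice), solves the transformed regression described above:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, beta, n = 1.0, 2.0, 400                 # hypothetical values

X = rng.normal(5.0, 2.0, size=n)
sigma_i = 0.5 + 0.4 * np.abs(X)                # known error standard deviations
Y = alpha + beta * X + rng.normal(0.0, sigma_i)
w = 1.0 / sigma_i**2                           # weights: large w_i <-> small noise

# Minimize sum_i w_i * (Y_i - a - b X_i)^2 via the transformed regression
sw = np.sqrt(w)
A = np.column_stack([sw, sw * X])              # regressors of the transformed model
a_wls, b_wls = np.linalg.lstsq(A, sw * Y, rcond=None)[0]

print(a_wls, b_wls)
```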
What we have accomplished by introducing weighted least squares may sound purely academic, as it seems quite hard to have detailed knowledge of the variances $\sigma_i^2$ or the weights $w_i$. However, there are more general approaches that can be used to approximate this knowledge. One such procedure is the estimation of an equation describing variance, i.e., a model relating $\sigma_i^2$ to one or more variables by a regression equation. The procedure may be sketched as follows (a small numerical illustration is given after the list):
- Fit a regression model using ordinary least squares and evaluate the residuals $e_i$.
- Regress the squared residuals $e_i^2$ on the explanatory variable to obtain an equation predicting the variance $\hat\sigma_i^2$ of each observation as a function of the explanatory variable; in the case of multiple regression, the analysis of residuals may suggest the most appropriate variables to use to estimate such a variance function.
- Given the estimated variance function, obtain estimates of the weights, $\hat w_i = 1/\hat\sigma_i^2$.
- Apply weighted least squares.
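The whole procedure can be turned into a few lines of Python; the specific variance function assumed below (squared residuals regressed on the explanatory variable and its square) is only one possible choice for the illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, beta, n = 1.0, 2.0, 400                       # hypothetical values
X = rng.normal(5.0, 2.0, size=n)
Y = alpha + beta * X + rng.normal(0.0, 0.5 + 0.4 * np.abs(X))

# Step 1: ordinary least squares and residuals
A = np.column_stack([np.ones(n), X])
a_ols, b_ols = np.linalg.lstsq(A, Y, rcond=None)[0]
e = Y - a_ols - b_ols * X

# Step 2: regress squared residuals on X and X^2 to estimate a variance function
V = np.column_stack([np.ones(n), X, X**2])
gamma = np.linalg.lstsq(V, e**2, rcond=None)[0]
sigma2_hat = np.clip(V @ gamma, 1e-6, None)          # predicted variances, kept positive

# Steps 3-4: weights and weighted least squares on the transformed model
w = 1.0 / sigma2_hat
sw = np.sqrt(w)
a_fwls, b_fwls = np.linalg.lstsq(np.column_stack([sw, sw * X]), sw * Y, rcond=None)[0]

print(b_ols, b_fwls)                                 # OLS vs. feasible WLS slope
```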