In this section we want to tackle a few statistical issues concerning the estimation of the unknown parameters of the data-generating model, featuring a nonstochastic regressor and homoskedastic errors:

$$ Y_i = \alpha + \beta x_i + \epsilon_i, \qquad i = 1, \ldots, n. $$
As we said, the values $x_i$ are numbers and the errors $\epsilon_i$ are independent and identically distributed random variables satisfying the following assumptions:
- Their expected value is zero: $\mathrm{E}[\epsilon_i] = 0$.
- The variance is the same for all errors and does not depend on $x_i$, i.e., $\mathrm{Var}(\epsilon_i) = \sigma_\epsilon^2$.
- When needed, we will also assume that errors are normally distributed: $\epsilon_i \sim \mathrm{N}(0, \sigma_\epsilon^2)$. In such a case, independence can be substituted by lack of correlation: $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$, for $i \neq j$.
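To make this setup concrete, here is a minimal simulation sketch in Python, assuming purely illustrative values for $\alpha$, $\beta$, and $\sigma_\epsilon$ (none of them come from the text): it generates responses from the model for a fixed set of $x_i$ with i.i.d. normal errors and computes the least-squares estimates a and b.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameters of the data-generating model (illustrative only)
alpha, beta, sigma = 2.0, 0.5, 1.0   # true intercept, slope, error std dev
x = np.linspace(0.0, 10.0, 30)       # nonstochastic regressor values x_i

# Y_i = alpha + beta * x_i + eps_i, with i.i.d. N(0, sigma^2) errors
eps = rng.normal(0.0, sigma, size=x.size)
Y = alpha + beta * x + eps

# Least-squares estimates of intercept and slope (a and b in the text)
x_bar, Y_bar = x.mean(), Y.mean()
b = np.sum((x - x_bar) * (Y - Y_bar)) / np.sum((x - x_bar) ** 2)
a = Y_bar - b * x_bar
print(f"a = {a:.3f} (true alpha = {alpha}),  b = {b:.3f} (true beta = {beta})")
```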
Our task mirrors what we did when estimating an expected value by the sample mean:
- We want to assess the properties of least-squares estimators a and b in terms of bias and estimation error.
- We want to find confidence intervals for estimators.
- We want to test hypotheses about the regression parameters.
- We want to assess the performance of our model in terms of explanatory power and prediction capability.
Since the results that we obtain depend on the assumptions above, it is important to check them a posteriori by analyzing the residuals $e_i = Y_i - (a + b x_i)$, which are a proxy for the unobservable errors. Simple graphical checks are illustrated in Section 10.3.5.
Properties of estimators
The least-squares estimators a and b, as given by (10.1) and (10.3), are random variables, since they depend on the observed response variables $Y_i$, which in turn depend on the errors. Given the underlying assumptions about regressors and errors, we see that

$$ \mathrm{E}[Y_i] = \alpha + \beta x_i, \qquad \mathrm{Var}(Y_i) = \sigma_\epsilon^2. $$
In order to assess the viability of the estimators, we must assess some features of their probability distribution, which in turn requires rewriting them in terms of the underlying random variables $\epsilon_i$. Let us start with the estimator of the slope parameter β. Using Eq. (10.7), where we plug the random variables $Y_i$ in place of the numbers $y_i$, we get the (ordinary) least-squares estimator b:

$$ b = \frac{\sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^n (x_i - \bar{x})^2}. $$
Given the underlying model, we see that the sample mean of the response variables is given by

$$ \bar{Y} = \alpha + \beta \bar{x} + \bar{\epsilon}, $$
where $\bar{x}$ is the average of the $x_i$ and $\bar{\epsilon}$ is the sample mean of the errors, over the n observations. Then we rewrite b as

$$ b = \beta + \frac{\sum_{i=1}^n (x_i - \bar{x})(\epsilon_i - \bar{\epsilon})}{\sum_{i=1}^n (x_i - \bar{x})^2}. \tag{10.12} $$
We see that b is given by the sum of β and a random term depending on the errors $\epsilon_i$. To prove unbiasedness, we need to show that the expected value of this random term is zero:

$$ \mathrm{E}[b] = \beta + \frac{\sum_{i=1}^n (x_i - \bar{x})\bigl(\mathrm{E}[\epsilon_i] - \mathrm{E}[\bar{\epsilon}]\bigr)}{\sum_{i=1}^n (x_i - \bar{x})^2} = \beta. $$
In the manipulations above, we have used the fact that β and the $x_i$ are numbers and can be taken outside the expectation; then we have relied on the assumption that the expected value of the errors is zero, as is the expected value of their sample mean.
The same line of reasoning can be adopted to prove unbiasedness of a. We rewrite (10.1) relying on the assumptions:

$$ a = \bar{Y} - b\bar{x} = \alpha + \beta\bar{x} + \bar{\epsilon} - b\bar{x} = \alpha - (b - \beta)\bar{x} + \bar{\epsilon}. \tag{10.14} $$
We see that a can be broken down into the sum of three pieces; taking the expected value yields

$$ \mathrm{E}[a] = \alpha - \bar{x}\,\mathrm{E}[b - \beta] + \mathrm{E}[\bar{\epsilon}] = \alpha. $$
In these manipulations we have broken down the expected value of a sum into the sum of the expected values, as usual, and we have taken numbers outside the expectation. The second term is zero because b is an unbiased estimator of β and the last term is zero because of our assumptions about the errors. We should note that these results do not depend on the specific distribution of errors, which need not be normal.
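These unbiasedness results can be probed numerically with a small Monte Carlo sketch: repeatedly simulate the model, compute a and b on each sample, and check that their averages are close to α and β. The parameter values and the design below are arbitrary illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 2.0, 0.5, 1.0   # hypothetical true parameters
x = np.linspace(0.0, 10.0, 30)       # fixed (nonstochastic) design
x_bar = x.mean()
Sxx = np.sum((x - x_bar) ** 2)

n_rep = 20000
a_hat = np.empty(n_rep)
b_hat = np.empty(n_rep)
for r in range(n_rep):
    Y = alpha + beta * x + rng.normal(0.0, sigma, size=x.size)
    b_hat[r] = np.sum((x - x_bar) * (Y - Y.mean())) / Sxx
    a_hat[r] = Y.mean() - b_hat[r] * x_bar

# Averages across replications should be close to alpha and beta (unbiasedness)
print(f"mean of a over replications: {a_hat.mean():.4f}  (alpha = {alpha})")
print(f"mean of b over replications: {b_hat.mean():.4f}  (beta  = {beta})")
```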
Having unbiased estimators is good news, but we also need to assess their variability. We are concerned about the estimation errors (b − β) and (a − α). Generally speaking, the variability of an estimator can be measured by the standard error of the estimate, which we denote by SE(·):

$$ \mathrm{SE}(b) = \sqrt{\mathrm{E}\bigl[(b - \beta)^2\bigr]}, \qquad \mathrm{SE}(a) = \sqrt{\mathrm{E}\bigl[(a - \alpha)^2\bigr]}. $$
Since our estimators are unbiased, we can recast SE into a more familiar form; recalling that $\mathrm{E}[Z^2] = \mathrm{Var}(Z) + \mathrm{E}^2[Z]$, for any random variable Z, we get

$$ \mathrm{SE}^2(b) = \mathrm{E}\bigl[(b - \beta)^2\bigr] = \mathrm{Var}(b - \beta) + \mathrm{E}^2[b - \beta] = \mathrm{Var}(b), $$
since β is a number. We see that, because of unbiasedness, the standard error of the estimate is just the standard deviation of the estimator. A similar relationship holds for SE(a), a, and α.
Let us evaluate the SE of the least-squares estimators, starting with b. We may rewrite Eq. (10.12) slightly:

$$ b = \beta + \frac{\sum_{i=1}^n (x_i - \bar{x})\,\epsilon_i}{\sum_{i=1}^n (x_i - \bar{x})^2}. \tag{10.15} $$
This holds because $\sum_{i=1}^n (x_i - \bar{x})\,\bar{\epsilon} = \bar{\epsilon} \sum_{i=1}^n (x_i - \bar{x}) = 0$. Then we may calculate the variance of b directly:

$$ \mathrm{Var}(b) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2\,\mathrm{Var}(\epsilon_i)}{\Bigl[\sum_{i=1}^n (x_i - \bar{x})^2\Bigr]^2} = \frac{\sigma_\epsilon^2}{\sum_{i=1}^n (x_i - \bar{x})^2}. $$
In the manipulations above we have taken advantage of the nature of the $x_i$ (numbers) and of the errors (mutually independent and with constant standard deviation $\sigma_\epsilon$). So, the standard error of b is

$$ \mathrm{SE}(b) = \frac{\sigma_\epsilon}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}}. \tag{10.16} $$
[Fig. 10.3: It is difficult to tell the right slope when data are concentrated.]
An apparent missing piece of information in this formula is $\sigma_\epsilon$; however, we may estimate it on the basis of the residuals, as shown later in Section 10.3.2. Now it is useful to interpret the result we have obtained.
- As expected, the reliability of our estimate of the slope depends on the intrinsic variability of the phenomenon we are modeling. If the noisy contribution from the errors is low, then the n observations are very close to the line Y = α + βx, and estimating the slope is a fairly easy task. Indeed, we see that SE(b) is proportional to $\sigma_\epsilon$.
- Another fairly intuitive observation is that the more observations we have, the better. This is pretty evident in the standard deviation of the sample mean, where a factor $1/\sqrt{n}$ pops up. We do not see here an explicit contribution of the number of observations, but nevertheless each additional observation adds a squared contribution $(x_i - \bar{x})^2$ to the denominator of the ratio, reducing SE(b).
- A less obvious observation is that our ability to estimate the slope depends on where the observations are located. The denominator of the ratio includes a term looking like a variance, and it is in fact a measure of the (nonrandom) variability of the observations $x_i$. If the points $x_i$ are close to each other, i.e., close to their average $\bar{x}$, we have a small denominator. It is difficult to see the impact of small variations of x on Y, because this effect is “buried” in background noise. If the observed range of x is wide enough, assessing the impact of x on Y is easier. This is illustrated intuitively in Fig. 10.3.
Formula (10.16) suggests that, in order to get a good estimate of the slope, we should have observations over a large range of the explanatory variable x. However, it is worth noting that a linear relationship might just be an acceptable approximation of a nonlinear phenomenon over a limited range of values. Hence, by taking a wide sample we might run into a different kind of trouble, namely a poor fit resulting from the nonlinearity of the observed phenomenon.
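As a quick numerical illustration of this point, the sketch below evaluates formula (10.16) for two hypothetical designs with the same number of observations and the same error standard deviation, one with concentrated and one with spread-out values of x; the numbers are illustrative assumptions only.

```python
import numpy as np

sigma = 1.0                                 # assumed error standard deviation
x_concentrated = np.linspace(4.5, 5.5, 20)  # x_i packed in a narrow range
x_spread = np.linspace(0.0, 10.0, 20)       # x_i covering a wide range

def se_slope(x, sigma):
    """SE(b) as in Eq. (10.16): sigma / sqrt(sum of (x_i - x_bar)^2)."""
    return sigma / np.sqrt(np.sum((x - x.mean()) ** 2))

print("SE(b), concentrated design:", se_slope(x_concentrated, sigma))
print("SE(b), spread-out design:  ", se_slope(x_spread, sigma))
# The wide design yields a much smaller SE(b), in line with the discussion above.
```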
In order to assess the standard estimation error for a, we may follow the same route that we took for the slope. We use (10.14) to express the variance of the estimator:

$$ \mathrm{Var}(a) = \bar{x}^2\,\mathrm{Var}(b) + \mathrm{Var}(\bar{\epsilon}) - 2\bar{x}\,\mathrm{E}\bigl[(b - \beta)\,\bar{\epsilon}\bigr] = \bar{x}^2\,\mathrm{SE}^2(b) + \frac{\sigma_\epsilon^2}{n} - 2\bar{x}\,\mathrm{E}\bigl[(b - \beta)\,\bar{\epsilon}\bigr]. \tag{10.17} $$
This expression looks quite complex, but it is easy to read. The first term relates SE(a) to SE(b); the second term is related to the variance of the sample mean of the errors, $\bar{\epsilon}$; finally, we can show that the last term is zero. To see this, let us rewrite b − β using (10.15) to make the contribution of the errors explicit:

$$ \mathrm{E}\bigl[(b - \beta)\,\bar{\epsilon}\bigr] = \mathrm{E}\!\left[ \frac{\sum_{i=1}^n (x_i - \bar{x})\,\epsilon_i}{\sum_{i=1}^n (x_i - \bar{x})^2} \cdot \frac{1}{n}\sum_{j=1}^n \epsilon_j \right] = \frac{\sigma_\epsilon^2}{n} \cdot \frac{\sum_{i=1}^n (x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2} = 0, $$
where we have exploited the mutual independence of the errors and the fact that their variance does not depend on the observations. Now, plugging in the expression of SE(b), we obtain

$$ \mathrm{SE}(a) = \sqrt{\bar{x}^2\,\mathrm{SE}^2(b) + \frac{\sigma_\epsilon^2}{n}} = \sigma_\epsilon \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n (x_i - \bar{x})^2}}. \tag{10.18} $$
This formula, too, lends itself to a useful interpretation, which is apparent when looking at the two nonzero terms in Eq. (10.17).
- The term involving SE(b) tells us that getting the wrong slope will influence the error that we make in estimating the intercept, but this influence depends on the average value of the explanatory variable. This may be understood by looking at Fig. 10.4. If $\bar{x} = 0$, rotating the regression line around the barycenter of the data has no impact on the intercept, whereas the impact is large when $\bar{x}$ is large.

[Fig. 10.4: Schematics of the impact of slope on estimating the intercept.]
- The second term is related to a shift in the intercept due to the variability of the sample mean of the errors. If $\bar{\epsilon} = 0$, then there is no such shift; otherwise the contribution to the estimation error depends on the sign of this sample mean. The magnitude of this effect depends on the intrinsic variability, measured by $\sigma_\epsilon^2$, and on the sample size n. A small numerical check of formula (10.18) is sketched after this list.
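Here is the numerical check announced above: under the same kind of illustrative assumptions as before, it evaluates SE(a) from Eq. (10.18) for a design centered at zero and for the same design shifted away from zero, isolating the effect of $\bar{x}$.

```python
import numpy as np

sigma = 1.0                              # assumed error standard deviation
x_centered = np.linspace(-5.0, 5.0, 20)  # design with x_bar = 0
x_shifted = x_centered + 20.0            # same spread, but large x_bar

def se_intercept(x, sigma):
    """SE(a) as in Eq. (10.18): sigma * sqrt(1/n + x_bar^2 / sum of (x_i - x_bar)^2)."""
    n = x.size
    x_bar = x.mean()
    Sxx = np.sum((x - x_bar) ** 2)
    return sigma * np.sqrt(1.0 / n + x_bar ** 2 / Sxx)

print("SE(a), design centered at zero:", se_intercept(x_centered, sigma))
print("SE(a), shifted design:         ", se_intercept(x_shifted, sigma))
# Shifting the design leaves SE(b) unchanged but inflates SE(a)
# through the x_bar^2 * SE^2(b) term.
```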
Now that we know the expected value and the variance of our least-squares estimators, a natural question is how they are distributed. The answer is fairly straightforward in our setting (nonstochastic regressors and homoskedastic errors), provided that we also assume that the errors are normally distributed, i.e., $\epsilon_i \sim \mathrm{N}(0, \sigma_\epsilon^2)$. A look at Eqs. (10.12) and (10.14) shows that, in our setting, the least-squares estimators are essentially linear combinations of the errors, i.e., linear combinations of normal variables. Since a linear combination of normal variables is itself a normal variable, we immediately see that the estimators are normally distributed, too. We have proved the following theorem.
THEOREM 10.1 If the regressor variables $x_i$ are nonstochastic and the errors are i.i.d. normal variables, then the least-squares estimators are normal random variables, $b \sim \mathrm{N}\bigl(\beta, \mathrm{SE}^2(b)\bigr)$ and $a \sim \mathrm{N}\bigl(\alpha, \mathrm{SE}^2(a)\bigr)$, with standard errors given by (10.16) and (10.18).
This result can actually be generalized. If we do not assume that the errors are normally distributed, we can invoke the central limit theorem to show that the least-squares estimators are asymptotically normal. In Section 10.5 we also see that this holds for stochastic regressors and for heteroskedastic errors, but the formulas involved, as well as the underlying theory, are a bit more complicated.
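This asymptotic-normality claim can also be explored by simulation. The sketch below, with arbitrary illustrative parameters, draws the errors from a markedly non-normal (centered exponential) distribution, compares the empirical standard deviation of b with SE(b) from (10.16), and checks that the standardized estimation errors fall within ±1.96 roughly 95% of the time, as the normal approximation predicts.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma = 2.0, 0.5, 1.0   # hypothetical true parameters
x = np.linspace(0.0, 10.0, 50)       # fixed design
x_bar = x.mean()
Sxx = np.sum((x - x_bar) ** 2)

n_rep = 20000
b_hat = np.empty(n_rep)
for r in range(n_rep):
    # Markedly non-normal errors: centered exponential with std dev sigma
    eps = rng.exponential(sigma, size=x.size) - sigma
    Y = alpha + beta * x + eps
    b_hat[r] = np.sum((x - x_bar) * (Y - Y.mean())) / Sxx

se_b = sigma / np.sqrt(Sxx)          # SE(b) from Eq. (10.16)
z = (b_hat - beta) / se_b            # standardized estimation errors
print("empirical std dev of b:", b_hat.std(), " vs SE(b):", se_b)
print("fraction with |z| < 1.96:", np.mean(np.abs(z) < 1.96), " (normal: about 0.95)")
```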