Now we are armed with the necessary knowledge to draw statistical inferences about the regression parameters. Mirroring what we did with the estimation of the expected value, we should:
- calculate confidence intervals for the slope and the intercept;
- check the significance of the regression coefficients by testing suitable hypotheses.
The technicalities involved here are essentially the same as those involved in the estimation of the expected value, so we avoid repeating the reasoning. In the case of the expected value, everything revolves around the statistic
$$ T = \frac{\bar{X} - \mu}{S/\sqrt{n}}, $$
where $S/\sqrt{n}$ is the standard deviation of the estimator $\bar{X}$ or, in other words, its standard error, since the estimator is unbiased. In the case of the slope in a regression model, the relevant statistic is
$$ T = \frac{b - \beta}{\mathrm{SE}(b)}. $$
Given our distributional results so far, it is no surprise that this statistic has a t distribution with n − 2 degrees of freedom, assuming that the errors are normally distributed; a similar result applies to the intercept. Furthermore, for a large sample, courtesy of the central limit theorem, it can be shown that the least-squares estimators are approximately normally distributed even if the errors are nonnormal, provided that the other assumptions hold.
To compute confidence intervals, we fix a confidence level 1 − α, take the corresponding quantile $t_{1-\alpha/2,\,n-2}$ from the t distribution, and compute
$$ b \pm t_{1-\alpha/2,\,n-2}\,\mathrm{SE}(b) $$
for the slope, and
$$ a \pm t_{1-\alpha/2,\,n-2}\,\mathrm{SE}(a) $$
for the intercept, relying on formulas (10.16), (10.18), and (10.19).
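To make the computation concrete, here is a minimal sketch in Python of the confidence-interval recipe just described, using the standard least-squares formulas for the estimates and their standard errors. The arrays x and y and the function name are illustrative placeholders, not anything from the book.

```python
import numpy as np
from scipy import stats

def regression_confidence_intervals(x, y, conf=0.95):
    """Confidence intervals for the slope and intercept of a simple linear
    regression, using the standard least-squares formulas (i.i.d. normal errors)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    s_xx = np.sum((x - x_bar) ** 2)

    # Least-squares estimates of slope (b) and intercept (a)
    b = np.sum((x - x_bar) * (y - y_bar)) / s_xx
    a = y_bar - b * x_bar

    # Residual variance estimate with n - 2 degrees of freedom
    residuals = y - (a + b * x)
    s2 = np.sum(residuals ** 2) / (n - 2)

    # Standard errors of the two estimators
    se_b = np.sqrt(s2 / s_xx)
    se_a = np.sqrt(s2 * (1.0 / n + x_bar ** 2 / s_xx))

    # Quantile t_{1 - alpha/2, n - 2} and the resulting intervals
    t_quant = stats.t.ppf(1 - (1 - conf) / 2, n - 2)
    ci_b = (b - t_quant * se_b, b + t_quant * se_b)
    ci_a = (a - t_quant * se_a, a + t_quant * se_a)
    return (a, se_a, ci_a), (b, se_b, ci_b)
```

With only a handful of observations, as in the examples below (n − 2 = 4 degrees of freedom), the quantile is about 2.78 at the 95% level, which is part of the reason why the resulting intervals turn out to be so wide.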
Example 10.7 Let us compute confidence intervals for the regression parameters for the data in Fig. 10.2. We already know that
furthermore = 8.3333. Using the formulas for standard errors, we find
and
If the confidence level is 95%, the quantile we need is
$$ t_{0.975,4} = 2.7764. $$
The confidence interval for the slope is
and the confidence interval for the intercept is
We see that these are pretty large confidence intervals; even worse, they contain the origin, and we are not really even sure about the sign of the underlying parameters! This is no surprise, given the extremely scarce and noisy data, but it is what we can honestly say (always keeping the underlying assumptions in mind). We urge the reader to check that for the less noisy data of Fig. 10.1, we get confidence intervals
for slope and intercept, respectively. Also in this case, with very few data, we cannot trust the regression model too much, but at least we have a clear idea of the sign of the effect of the explanatory variable on the response variable.
Hypothesis testing proceeds much along the usual lines. The most common test we carry out is a t test concerning the slope, i.e., the significance of the effect of an explanatory variable. Even if an explanatory variable has no effect on the response variable, i.e., its slope coefficient is β = 0, the estimated value will not be zero because of random errors. It is then natural to test the null hypothesis
$$ H_0: \beta = 0 $$
against the alternative one
$$ H_a: \beta \neq 0. $$
It is customary to run a two-tail test, although we could use a one-tail test if we have a clear idea about the sign of the effect. Then the test statistic boils down to
$$ T = \frac{b}{\mathrm{SE}(b)}, $$
which is calculated by any software package implementing linear regression. The p-value of the t test on the slope, given that the test statistic T takes the value t, is
$$ \text{p-value} = P\{|T_{n-2}| \ge |t|\} = 2\,P\{T_{n-2} \ge |t|\}, $$
where $T_{n-2}$ is a t random variable with n − 2 degrees of freedom. Note that we are assuming a two-tail test, and that the result is exact if the errors are normal.
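As a small companion to the formulas above, the following sketch computes the test statistic and the two-tail p-value with scipy; the function name and its arguments are illustrative, not part of any standard package.

```python
from scipy import stats

def slope_t_test(b, se_b, n):
    """Two-tail t test of H0: beta = 0 for the slope of a simple regression."""
    t_stat = b / se_b                            # T = b / SE(b)
    df = n - 2
    p_value = 2 * stats.t.sf(abs(t_stat), df)    # P(|T_{n-2}| >= |t|)
    return t_stat, p_value
```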
Example 10.8 Continuing Example 10.7, we see that the test statistic for the noisy dataset of Fig. 10.2 is
The quantile for a two-tail test with α = 5% is the same quantile we used to calculate the confidence intervals, $t_{0.975,4} = 2.7764$,
from which we immediately see that the test statistic does not fall in the rejection region and we cannot reject the null hypothesis that the actual slope β is zero. The p-value of the test is
where T4 is a t variable with 4 degrees of freedom. We could reject the null hypothesis if we accepted a probability of type I error that is just a little less than 20%. Hence, we cannot really say that the effect of the explanatory variable is significant on the basis of the sparse and noisy data we have. For the less noisy first dataset, the test statistic is t = 7.0772 and the p-value is 0.0021, less than 1%.
Testing the intercept requires the same conceptual framework, provided we use the standard error SE(a). Usually, we are more interested in testing slopes, as they measure the impact of an explanatory variable on the response. However, there are situations in which testing the intercept is even more important.
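In practice, regression software reports estimates, standard errors, t statistics, and p-values for both coefficients at once. Here is a minimal sketch with statsmodels, using purely hypothetical data in place of the observations of the examples above.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data standing in for the six observations of the examples
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([3.1, 4.4, 6.3, 7.4, 9.8, 11.1])

X = sm.add_constant(x)                 # adds the intercept column
results = sm.OLS(y, X).fit()

print(results.params)                  # [intercept a, slope b]
print(results.bse)                     # standard errors SE(a), SE(b)
print(results.tvalues)                 # t statistics for H0: coefficient = 0
print(results.pvalues)                 # two-tail p-values
print(results.conf_int(alpha=0.05))    # 95% confidence intervals
```

The full table printed by results.summary() collects the same quantities, one row per coefficient.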
Example 10.9 (Capital asset pricing model) In Example 9.21 we considered a factor model for the returns of a financial asset. This is essentially a regression model that can be cast in the following form:
$$ R_k - r_f = \alpha_k + \beta_k\,(R_m - r_f) + \epsilon_k. $$
Here $R_k$ is the (random) return from holding the risky asset k over some holding period, $R_m$ is the (random) return from holding the market portfolio m over the same period, and $r_f$ is the risk-free return over the same period (a number). The model is expressed in terms of excess returns, i.e., the differences between a random return and the risk-free return.
A fundamental piece of financial theory is the capital asset pricing model (CAPM). This is much more than a regression model, as it is an equilibrium model concerning expected returns. Essentially, the model states that, at equilibrium,
$$ \mathrm{E}[R_k] - r_f = \beta_k\,\bigl(\mathrm{E}[R_m] - r_f\bigr). $$
In other words, according to CAPM, $\alpha_k = 0$. The practical implication, if we believe the model, is that the risk premium from holding asset k, i.e., the expected excess return, depends only on the systematic risk, i.e., the risk from holding the market portfolio. The unsystematic risk, i.e., the specific risk related to firm k, is not rewarded, as it can be diversified away by holding a properly diversified portfolio. Then, the risk is just measured by the asset beta:
$$ \beta_k = \frac{\mathrm{Cov}(R_k, R_m)}{\mathrm{Var}(R_m)}. $$
We cannot discuss the exact conditions leading to CAPM, but it is natural to consider an empirical test of the theory. In this case, we are interested in testing the null hypothesis
$$ H_0: \alpha_k = 0 $$
against the alternative $H_a: \alpha_k \neq 0$, to see whether empirical data reject the theory. On the surface, it would seem easy to check this by running suitable regressions on returns observed over consecutive time periods. Unfortunately, this is not so easy, and in fact the empirical testing of CAPM has raised a fair share of controversy. From a financial perspective, it is not so obvious what constitutes a "market" portfolio, even though one could try to proxy it with a broad market index like the S&P 500. From a statistical perspective, there is a quite critical point: who says that the errors in consecutive time periods are independent?
The last remark of this example is a good reminder that assumptions should never be taken for granted. More sophisticated regression approaches have been devised to cope with correlated errors.
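Setting aside the choice of the market proxy and the possible correlation of errors over time, the mechanics of the test are easy to sketch: regress the asset's excess returns on the market's excess returns and examine the t test on the intercept. The returns below are simulated purely for illustration; nothing here is actual market data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

# Simulated monthly returns (hypothetical numbers, not market data)
r_f = 0.002                                    # risk-free rate per period
r_m = rng.normal(0.008, 0.04, size=120)        # market portfolio returns
r_k = r_f + 1.2 * (r_m - r_f) + rng.normal(0.0, 0.02, size=120)  # asset returns

# Regress excess asset returns on excess market returns
results = sm.OLS(r_k - r_f, sm.add_constant(r_m - r_f)).fit()
alpha_hat, beta_hat = results.params
p_alpha = results.pvalues[0]                   # two-tail p-value for H0: alpha_k = 0

print(f"alpha = {alpha_hat:.5f}, beta = {beta_hat:.3f}, p-value for alpha = {p_alpha:.3f}")
```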