Now we are armed with the necessary knowledge to draw statistical inferences about the regression parameters. Mirroring what we did with the estimation of the expected value, we should:
- calculate confidence intervals for the slope and the intercept;
- check the significance of the regression coefficients by testing suitable hypotheses.
The technicalities involved here are essentially the same as those involved in the estimation of the expected value, so we avoid repeating the reasoning. In the case of the expected value, everything revolves around the statistic
$$ T = \frac{\bar{X} - \mu}{S/\sqrt{n}}, $$
where $S/\sqrt{n}$ is the standard deviation of the estimator $\bar{X}$ or, in other words, its standard error, since the estimator is unbiased. In the case of the slope in a regression model, the relevant statistic is
$$ T = \frac{b - \beta}{\mathrm{SE}(b)}. $$
Given our distributional results so far, it is no surprise that this statistic has a t distribution with n − 2 degrees of freedom, assuming that the errors are normally distributed; a similar result applies to the intercept. Furthermore, for a large sample, courtesy of the central limit theorem, it can be shown that the least-squares estimators are approximately normally distributed even if the errors are nonnormal, provided that the other assumptions hold.
To compute confidence intervals, we fix a confidence level 1 − α, take the corresponding quantile $t_{1-\alpha/2,\,n-2}$ from the t distribution, and compute
$$ b \pm t_{1-\alpha/2,\,n-2}\,\mathrm{SE}(b) $$
for the slope, and
$$ a \pm t_{1-\alpha/2,\,n-2}\,\mathrm{SE}(a) $$
for the intercept, relying on formulas (10.16), (10.18), and (10.19).
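To make the computation concrete, here is a minimal sketch in Python of the confidence-interval recipe just described, using the standard least-squares formulas for the estimates and their standard errors. The arrays x and y and the function name are illustrative placeholders, not anything from the book.

```python
import numpy as np
from scipy import stats

def regression_confidence_intervals(x, y, conf=0.95):
    """Confidence intervals for the slope and intercept of a simple linear
    regression, using the standard least-squares formulas (i.i.d. normal errors)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    s_xx = np.sum((x - x_bar) ** 2)

    # Least-squares estimates of slope (b) and intercept (a)
    b = np.sum((x - x_bar) * (y - y_bar)) / s_xx
    a = y_bar - b * x_bar

    # Residual variance estimate with n - 2 degrees of freedom
    residuals = y - (a + b * x)
    s2 = np.sum(residuals ** 2) / (n - 2)

    # Standard errors of the two estimators
    se_b = np.sqrt(s2 / s_xx)
    se_a = np.sqrt(s2 * (1.0 / n + x_bar ** 2 / s_xx))

    # Quantile t_{1 - alpha/2, n - 2} and the resulting intervals
    t_quant = stats.t.ppf(1 - (1 - conf) / 2, n - 2)
    ci_b = (b - t_quant * se_b, b + t_quant * se_b)
    ci_a = (a - t_quant * se_a, a + t_quant * se_a)
    return (a, se_a, ci_a), (b, se_b, ci_b)
```

With only a handful of observations, as in the examples below (n − 2 = 4 degrees of freedom), the quantile is about 2.78 at the 95% level, which is part of the reason why the resulting intervals turn out to be so wide.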
Example 10.7 Let us compute confidence intervals for the regression parameters for the data in Fig. 10.2. We already know that
furthermore = 8.3333. Using the formulas for standard errors, we find
and
If the confidence level is 95%, the quantile we need is
$$ t_{0.975,4} = 2.7764. $$
The confidence interval for the slope is
and the confidence interval for the intercept is
We see that these are pretty large confidence intervals; even worse, they contain the origin, and we are not really even sure about the sign of the underlying parameters! This is no surprise, given the extremely scarce and noisy data, but it is what we can honestly say (always keeping the underlying assumptions in mind). We urge the reader to check that for the less noisy data of Fig. 10.1, we get confidence intervals
for slope and intercept, respectively. Also in this case, with very few data, we cannot trust the regression model too much, but at least we have a clear idea of the sign of the effect of the explanatory variable on the response variable.
Hypothesis testing proceeds much along the usual lines. The most common test we carry out is a t test concerning the slope, i.e., the significance of the effect of an explanatory variable. Even if an explanatory variable has no effect on the response variable, i.e., its slope coefficient is β = 0, the estimated value will not be zero because of random errors. It is then natural to test the null hypothesis
$$ H_0: \beta = 0 $$
against the alternative one
$$ H_a: \beta \neq 0. $$
It is customary to run a two-tail test, although we could use a one-tail test if we have a clear idea about the sign of the effect. Then the test statistic boils down to
$$ T = \frac{b}{\mathrm{SE}(b)}, $$
which is calculated by any software package implementing linear regression. The p-value of the t test on the slope, given that the test statistic T takes the value t, is
$$ \text{p-value} = P\{|T_{n-2}| \ge |t|\} = 2\,P\{T_{n-2} \ge |t|\}, $$
where $T_{n-2}$ is a t random variable with n − 2 degrees of freedom. Note that we are assuming a two-tail test, and that the result is exact if the errors are normal.
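As a small companion to the formulas above, the following sketch computes the test statistic and the two-tail p-value with scipy; the function name and its arguments are illustrative, not part of any standard package.

```python
from scipy import stats

def slope_t_test(b, se_b, n):
    """Two-tail t test of H0: beta = 0 for the slope of a simple regression."""
    t_stat = b / se_b                            # T = b / SE(b)
    df = n - 2
    p_value = 2 * stats.t.sf(abs(t_stat), df)    # P(|T_{n-2}| >= |t|)
    return t_stat, p_value
```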
Example 10.8 Continuing Example 10.7, we see that the test statistic for the noisy dataset of Fig. 10.2 is
The quantile for a two-tail test with α = 5% is the same quantile we used to calculate the confidence intervals, $t_{0.975,4} = 2.7764$,
from which we immediately see that the test statistic does not fall in the rejection region and we cannot reject the null hypothesis that the actual slope β is zero. The p-value of the test is
where T4 is a t variable with 4 degrees of freedom. We could reject the null hypothesis if we accepted a probability of type I error that is just a little less than 20%. Hence, we cannot really say that the effect of the explanatory variable is significant on the basis of the sparse and noisy data we have. For the less noisy first dataset, the test statistic is t = 7.0772 and the p-value is 0.0021, less than 1%.
Testing the intercept requires the same conceptual framework, provided we use the standard error SE(a). Usually, we are more interested in testing slopes, as they measure the impact of an explanatory variable on the response. However, there are situations in which testing the intercept is even more important.
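In practice, regression software reports estimates, standard errors, t statistics, and p-values for both coefficients at once. Here is a minimal sketch with statsmodels, using purely hypothetical data in place of the observations of the examples above.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data standing in for the six observations of the examples
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([3.1, 4.4, 6.3, 7.4, 9.8, 11.1])

X = sm.add_constant(x)                 # adds the intercept column
results = sm.OLS(y, X).fit()

print(results.params)                  # [intercept a, slope b]
print(results.bse)                     # standard errors SE(a), SE(b)
print(results.tvalues)                 # t statistics for H0: coefficient = 0
print(results.pvalues)                 # two-tail p-values
print(results.conf_int(alpha=0.05))    # 95% confidence intervals
```

The full table printed by results.summary() collects the same quantities, one row per coefficient.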
Example 10.9 (Capital asset pricing model) In Example 9.21 we considered a factor model for the returns of a financial asset. This is essentially a regression model that can be cast in the following form:
$$ R_k - r_f = \alpha_k + \beta_k\,(R_m - r_f) + \epsilon_k. $$
Here $R_k$ is the (random) return from holding the risky asset k over some holding period, $R_m$ is the (random) return from holding the market portfolio m over the same period, and $r_f$ is the risk-free return over the same period (a number). The model is expressed in terms of excess returns, i.e., the differences between a random return and the risk-free return.
A fundamental piece of financial theory is the capital asset pricing model (CAPM). This is much more than a regression model, as it is an equilibrium model concerning expected returns. Essentially, the model states that, at equilibrium,
$$ \mathrm{E}[R_k] - r_f = \beta_k\,\bigl(\mathrm{E}[R_m] - r_f\bigr). $$
In other words, according to CAPM, $\alpha_k = 0$. The practical implication, if we believe the model, is that the risk premium from holding asset k, i.e., the expected excess return, depends only on the systematic risk, i.e., the risk from holding the market portfolio. The unsystematic risk, i.e., the specific risk related to firm k, is not rewarded, as it can be diversified away by holding a properly diversified portfolio. Then, the risk is just measured by the asset beta:
$$ \beta_k = \frac{\mathrm{Cov}(R_k, R_m)}{\mathrm{Var}(R_m)}. $$
We cannot discuss the exact conditions leading to CAPM, but it is natural to consider an empirical test of the theory. In this case, we are interested in testing the null hypothesis
$$ H_0: \alpha_k = 0 $$
against the alternative $H_a: \alpha_k \neq 0$, to see whether empirical data reject the theory. On the surface, it would seem easy to check this by running suitable regressions on returns observed over consecutive time periods. Unfortunately, this is not so easy, and in fact the empirical testing of CAPM has raised a fair share of controversy. From a financial perspective, it is not so obvious what constitutes a "market" portfolio, even though one could try to proxy it with a broad market index like the S&P 500. From a statistical perspective, there is a quite critical point: who says that the errors in consecutive time periods are independent?
The last remark of this example is a good reminder that assumptions should never be taken for granted. More sophisticated regression approaches have been devised to cope with correlated errors.
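Setting aside the choice of the market proxy and the possible correlation of errors over time, the mechanics of the test are easy to sketch: regress the asset's excess returns on the market's excess returns and examine the t test on the intercept. The returns below are simulated purely for illustration; nothing here is actual market data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

# Simulated monthly returns (hypothetical numbers, not market data)
r_f = 0.002                                    # risk-free rate per period
r_m = rng.normal(0.008, 0.04, size=120)        # market portfolio returns
r_k = r_f + 1.2 * (r_m - r_f) + rng.normal(0.0, 0.02, size=120)  # asset returns

# Regress excess asset returns on excess market returns
results = sm.OLS(r_k - r_f, sm.add_constant(r_m - r_f)).fit()
alpha_hat, beta_hat = results.params
p_alpha = results.pvalues[0]                   # two-tail p-value for H0: alpha_k = 0

print(f"alpha = {alpha_hat:.5f}, beta = {beta_hat:.3f}, p-value for alpha = {p_alpha:.3f}")
```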