We extend the concepts of simple linear regression introduced earlier. The first, quite natural idea is to build a linear regression model involving more than one regressor. Finding the parameters by ordinary least squares (OLS) is a rather straightforward exercise, as we see in Section 16.1. What is much less straightforward is the statistical side of the coin, since the presence of multiple variables introduces some new issues. In Section 16.2 we consider the problems of testing a multiple regression model, selecting regressor variables, and assessing forecasting uncertainty. We do so for the simpler case of nonstochastic regressors and under restrictive assumptions about the errors, i.e., independence, homoskedasticity, and normality. Even within this limited framework we may appreciate issues like bias from omitted variables and multicollinearity. An understanding of their impact is essential from the user's point of view. In fact, given the computational power of statistical software, it is tempting to build a huge model encompassing a rich set of explanatory variables. In practice, this may be a dangerous route, and a sound parsimony principle should always be kept in mind.
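To make the mechanics of OLS with multiple regressors concrete, here is a minimal sketch in Python (using NumPy; the data, sizes, and variable names are invented for illustration, not taken from the chapter): the coefficient estimates solve the normal equations obtained by minimizing the sum of squared residuals.

```python
import numpy as np

# Hypothetical synthetic data: n observations, an intercept column,
# and two regressors.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# OLS solves the normal equations (X'X) b = X'y.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# In practice, a least-squares solver is numerically safer than
# forming X'X explicitly.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_ols)    # both estimates should be close to [1.0, 2.0, -0.5]
print(beta_lstsq)
```

Either route gives the same estimator; the least-squares solver is preferred when the regressors are nearly collinear, which is precisely the multicollinearity issue discussed in Section 16.2.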
Multiple linear regression models often include categorical regressors, which are typically accounted for by dummy, or binary, variables. When it is the regressed variable that is categorical, a crude application of regression modeling may lead to nonsensical results. As an example, we could try to model purchasing decisions in discrete terms (yes/no); a regression model can be adopted, estimating the purchase probability of a consumer as a function of explanatory variables related to product features and the consumer's profile. However, a standard linear regression model could well predict probabilities that are smaller than zero or larger than one. In Section 16.3 we consider logistic regression, a possible approach to coping with a categorical regressed variable, based on a nonlinear transformation of the output of a linear regression model. There are many settings in which nonlinearity in data must be explicitly recognized, leading to nonlinear regression. This is quite a difficult topic, but in Section 16.4 we introduce some modeling tricks to transform a nonlinear regression model into a linear one.
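As a small illustration of the transformation underlying logistic regression (a sketch with made-up scores, not the chapter's own code), the logistic function squashes the unbounded output of a linear model into the interval (0, 1), so it can be read as a purchase probability:

```python
import numpy as np

def logistic(z):
    # The logistic (sigmoid) function maps any real score to (0, 1),
    # so the model output can be interpreted as a probability.
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear scores beta0 + beta1*x1 + ... from a fitted model:
scores = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
probs = logistic(scores)
print(probs)  # roughly [0.047, 0.269, 0.5, 0.731, 0.953]
```

The linearization tricks of Section 16.4 work in the opposite direction: for instance, taking logarithms of the exponential model y = a e^(bx) yields ln y = ln a + bx, which is linear in the transformed parameter ln a and in b.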