Author: haroonkhan
-
Applications of PCA
Principal component analysis can be applied in a marketing setting when questionnaires are administered to potential customers asking for a quantitative evaluation along many dimensions. Many such questions are, or are perceived as, redundant. Spotting the few principal components may help in assessing which product features, or combination thereof, are most important. They can also tell…
-
A small numerical example
Principal component analysis in practice is carried out on sampled data, but it may be instructive to consider an example where both the probabilistic and the statistical sides are dealt with. Consider first a random variable with bivariate normal distribution, X ∼ N(0, Σ), where Σ = [[1, ρ], [ρ, 1]] and ρ > 0. Essentially X1 and X2 are standard normal variables with positive correlation ρ. To find the eigenvalues of Σ,…
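As a quick numerical check of this example, here is a minimal sketch. It assumes Σ has unit variances and off-diagonal entries ρ, which is what the description of X1 and X2 as standard normals with correlation ρ implies. The value ρ = 0.8 and the helper name `correlation_eigen` are illustrative assumptions, not taken from the text.

```python
import numpy as np

def correlation_eigen(rho):
    """Eigenvalues (descending) and matching eigenvectors of [[1, rho], [rho, 1]]."""
    sigma = np.array([[1.0, rho], [rho, 1.0]])
    # eigh is intended for symmetric matrices; it returns eigenvalues in ascending order
    values, vectors = np.linalg.eigh(sigma)
    order = np.argsort(values)[::-1]
    return values[order], vectors[:, order]

rho = 0.8  # illustrative value only
values, vectors = correlation_eigen(rho)
print(values)         # analytically these are 1 + rho and 1 - rho
print(vectors[:, 0])  # leading direction, proportional to (1, 1) up to sign
```

For this 2 × 2 matrix the characteristic equation (1 − λ)² − ρ² = 0 gives λ = 1 ± ρ, so the numerical result can be compared directly against the closed form.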
-
Another view of PCA
Another view is obtained by interpreting the first principal component in terms of orthogonal projection. Consider a unit vector u ∈ ℝ^p, and imagine projecting the observed vector X on u. This yields a vector parallel to u, of length uᵀX. Since u has unit length, the projection of observation X(k) on u is (uᵀX(k)) u. We are projecting p-dimensional observations on just one axis, and of course we would like to…
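The projection view can be sketched numerically. The example below uses synthetic, centered data (an assumption made purely for illustration) and compares the sample variance of the projected scores along two unit vectors: the eigenvector of the sample covariance matrix with the largest eigenvalue, and an arbitrary coordinate axis. The helper name `project` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
# Synthetic observations, one per row (n = 500, p = 3), then centered
mix = np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.3, 0.2]])
X = rng.normal(size=(500, 3)) @ mix
X = X - X.mean(axis=0)

def project(X, u):
    """Return the scores u^T X(k) and the projected vectors (u^T X(k)) u."""
    u = u / np.linalg.norm(u)  # make sure u has unit length
    scores = X @ u
    return scores, np.outer(scores, u)

S = np.cov(X, rowvar=False)           # sample covariance matrix
values, vectors = np.linalg.eigh(S)   # eigenvalues in ascending order
u_best = vectors[:, -1]               # eigenvector of the largest eigenvalue

scores, projected = project(X, u_best)
var_best = scores.var(ddof=1)
var_other = project(X, np.array([1.0, 0.0, 0.0]))[0].var(ddof=1)
print(var_best, var_other)
```

With centered data, the sample variance of the scores along u equals uᵀSu, so the variance along the leading eigenvector matches the largest eigenvalue of S and no other unit vector can exceed it.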
-
A geometric view of PCA
The linear data transformation, including centering, can be written as Z = A(X − X̄), where A is a p × p matrix. We assume that data have already been centered, in order to ease notation. Hence Z = AX. The Zi variables are called principal components: Z1 is the first principal component. We recall that the matrix A rotating axes is orthogonal: AᵀA = AAᵀ = I. Now, let us consider the sample covariance matrix of X, i.e., SX. Since we assume centered…
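The rotation described here can be illustrated with a short sketch. It assumes centered data and takes the rows of A to be unit eigenvectors of the sample covariance matrix, sorted by decreasing eigenvalue, which is the usual construction but is stated here as an assumption since the excerpt is truncated. The synthetic data and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic bivariate sample with positive correlation, one observation per row
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=1000)
X = X - X.mean(axis=0)  # centering

S_X = np.cov(X, rowvar=False)
values, vectors = np.linalg.eigh(S_X)
order = np.argsort(values)[::-1]   # largest eigenvalue first
A = vectors[:, order].T            # rows of A are the eigenvectors

Z = X @ A.T                        # each row is Z = A X for one observation
S_Z = np.cov(Z, rowvar=False)      # equals A S_X A^T

print(np.round(A @ A.T, 6))  # identity: A is orthogonal
print(np.round(S_Z, 6))      # diagonal: the principal components are uncorrelated
```

The diagonal entries of the transformed covariance matrix are the eigenvalues of SX, so the first principal component Z1 carries the largest share of the sample variance.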
-
THE NEED FOR DATA REDUCTION
Consider a sample of observations X(k), k = 1,…, n. Each observation X(k) consists of a vector of p elements. If p = 2, visualizing observations is easy, but this is certainly no piece of cake for large values of p. Hence, we need some way to reduce data dimensionality, by mapping observations in ℝ^p to observations in a lower-dimensional space ℝ^q, where q is possibly much smaller than p. Reducing data…
-
Introduction
This is certainly an age in which we do not suffer from scarcity of data. Using information infrastructures and the Web, we may collect plenty of observations of many variables, resulting in rich datasets waiting for analysis, maybe too rich. We sometimes need to simplify data in order to visualize them, to discover patterns, and to make…
-
Polynomial regression
A good starting point is polynomial regression. When facing a clearly nonlinear data pattern, like the one in Fig. 16.3(a), we may try to come up with a suitable approximation of the nonlinear function relating data. In principle, polynomials provide us with an arbitrary degree of flexibility. Let us take a closer look at a model of polynomial…
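A polynomial model is nonlinear in x but linear in its coefficients, so it can be fitted by ordinary least squares on powers of x. The sketch below assumes a quadratic model and synthetic data generated from known coefficients; the degree, the noise level, and the variable names are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-2.0, 2.0, 50)
# Synthetic response: y = 1 - 2x + 0.5x^2 plus small Gaussian noise
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=x.size)

degree = 2
# Design matrix with columns 1, x, x^2
V = np.vander(x, degree + 1, increasing=True)
# Least-squares estimate of the coefficients (constant term first)
beta, *_ = np.linalg.lstsq(V, y, rcond=None)

y_hat = V @ beta
print(beta)                       # estimates close to (1, -2, 0.5)
print(np.mean((y - y_hat) ** 2))  # mean squared residual
```

Raising the degree always lowers the in-sample residual, which is why the flexibility of polynomials needs to be used with care.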
-
A GLANCE AT NONLINEAR REGRESSION
Logistic regression introduces a nonlinear transformation to account for the qualitative nature of the response variable. But even when considering a quantitative response, we may be forced to consider nonlinearity. Figure 16.3 shows two examples. Given these introductory examples, we should be convinced that sometimes a nonlinear regression model is warranted. Unfortunately, this may be a sort…
-
A digression: logit and probit choice models
The concepts behind logistic regression and the logit function have also been proposed as a tool to model brand choice in marketing applications. Since choice models are a good way to see integrated use of decision and statistical models, we outline the approach in this section. Consider an individual who chooses between two brands. Ideally,…
-
LOGISTIC REGRESSION
Consider the following questions: All of these questions could be addressed by building a statistical model, possibly a regression model, but they have a troubling feature in common: The response variable is either 0 or 1, where we interpret 0 as “it did not happen” and 1 as “it happened.” So far, we have considered…