If two random variables are not independent, it is natural to investigate their degree of dependence, which means finding a way to measure it and to take advantage of it. The second task leads to statistical modeling, which we will investigate later in the simplest case of linear regression. The first task is not as easy as it may seem; the joint density would tell us the whole story, but it is difficult to manage. More importantly, it is difficult to estimate from empirical data. We would like to come up with a limited set of summary measures that are easy to estimate, but fully capturing dependence with a single number is a tricky issue, as we will learn shortly.
A clue about how we might measure dependence can be obtained by checking Table 8.1 again. We observed that in most joint realizations of the two random variables, both values tend to be either larger or smaller than their respective averages. This observation leads to the definition of covariance.
DEFINITION 8.3 (Covariance) The covariance between random variables X and Y is defined as

Cov(X, Y) = E[(X − E[X]) · (Y − E[Y])]
Quite often, the covariance between two random variables is denoted by σXY.
Covariance is the expected value of the product of two deviations from the mean, and its sign depends on the signs of the two factors. We have positive covariance when the events {X > E[X]} and {Y > E[Y]} tend to occur together, as well as the events {X < E[X]} and {Y < E[Y]}, because the signs of the two factors in the product tend to be the same. If the signs tend to be different, we have a negative covariance.
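As an illustration, the following Python sketch computes a covariance directly from the definition for a small discrete joint distribution. The probability table is made up for illustration (it is not the one in Table 8.1), but it is built so that X and Y tend to deviate from their means in the same direction, which yields a positive covariance.

```python
# Hypothetical joint pmf over pairs (x, y); the numbers are made up for
# illustration. Large values of X tend to occur together with large values of Y.
pmf = {
    (10, 100): 0.30, (10, 200): 0.10,
    (20, 100): 0.10, (20, 200): 0.50,
}

# Means E[X] and E[Y] from the joint pmf
ex = sum(p * x for (x, y), p in pmf.items())
ey = sum(p * y for (x, y), p in pmf.items())

# Covariance as the expected product of the deviations from the means
cov = sum(p * (x - ex) * (y - ey) for (x, y), p in pmf.items())

print(ex, ey, cov)  # cov is approximately 140 > 0: the deviations tend to agree in sign
```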
Example 8.3 If two products are complements, it is natural to expect positive covariance between their demands; negative covariance can be expected if they are substitutes. Similarly, if we observe over time the demand for an item whose long- or midterm consumption is steady, a period of high demand should typically be followed by a period of low demand, which results in negative covariance between demands in consecutive periods. As a concrete example, consider the weekly demand for diapers after a week of intense promotional sales.
From a computational perspective, it is very handy to express covariance as follows:

Cov(X, Y) = E[XY] − E[X] · E[Y]    (8.5)
We easily see that if two variables are independent, then their covariance is zero, since independence implies E[XY] = E[X] · E[Y], courtesy of Theorem 8.2. However, the converse is not true in general, as we may see from the following counterexample.
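A quick numerical check of both the shortcut formula and the fact that independence implies zero covariance is sketched below; the marginal distributions are assumed arbitrarily, and the joint pmf is constructed as the product of the marginals, so X and Y are independent by construction.

```python
import itertools

# Assumed marginal pmfs (arbitrary numbers); the joint pmf is the product of
# the marginals, so X and Y are independent by construction.
px = {0: 0.2, 1: 0.5, 2: 0.3}
py = {-1: 0.4, 1: 0.6}
pmf = {(x, y): px[x] * py[y] for x, y in itertools.product(px, py)}

ex  = sum(p * x for (x, y), p in pmf.items())
ey  = sum(p * y for (x, y), p in pmf.items())
exy = sum(p * x * y for (x, y), p in pmf.items())

cov_definition = sum(p * (x - ex) * (y - ey) for (x, y), p in pmf.items())
cov_shortcut   = exy - ex * ey

# Both values agree and are (up to rounding) zero, since X and Y are independent.
print(cov_definition, cov_shortcut)
```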
Fig. 8.2 A counterexample about covariance.
Example 8.4 (Two dependent random variables may have zero covariance) Let us consider a uniform random variable X on the interval [−1, 1]; its expected value is zero and on its support the density function is constant and given by 1/2. Now, define the random variable Y as

Y = √(1 − X²)
Clearly, there is a very strong interdependence between X and Y because, given the realization of X, Y is perfectly predictable. However, their covariance is zero! We have seen that

Cov(X, Y) = E[XY] − E[X] · E[Y],

but E[X] = 0 and

E[XY] = ∫₋₁¹ x · √(1 − x²) · (1/2) dx = 0
because of the symmetry of the integrand function, which is an odd function, in the sense that f(−x) = −f(x). One intuitive way to explain the weird finding of this example is the following. First note that points with coordinates (X, Y) lie on the upper half of the unit circumference X² + Y² = 1. But if Y < E[Y], we may have either X > E[X] or X < E[X]. This is illustrated in Fig. 8.2. A similar consideration applies when Y > E[Y].
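A small Monte Carlo sketch of this counterexample follows: we draw X uniformly on [−1, 1], set Y = √(1 − X²), and observe that the sample covariance is close to zero even though Y is a deterministic function of X. The sample size and seed are arbitrary choices.

```python
import math
import random

# Monte Carlo sketch of Example 8.4 (sample size and seed are arbitrary):
# X is uniform on [-1, 1] and Y = sqrt(1 - X^2) is a deterministic function
# of X, yet the sample covariance turns out to be close to zero.
random.seed(42)
n = 100_000
xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
ys = [math.sqrt(1.0 - x * x) for x in xs]

mx = sum(xs) / n
my = sum(ys) / n
sample_cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

print(sample_cov)  # close to 0 despite the perfect functional dependence
```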
The example shows that covariance is not really a perfect measure of dependence, as it may be zero in cases in which there is a very strong dependence. In fact, covariance is rather a measure of concordance between a pair of random variables. In other words, covariance measures a linear association between random variables. A strong nonlinear link, such as the one in Fig. 8.2, may not be detected at all, or only partially. This point will be much clearer when we deal with simple linear regression.
The essential properties of covariance are the following:
Property 1. Cov(X, X) = Var(X). This property shows that covariance is a generalization of variance and explains its name. Using shorthand notation, σXX = σX².
Property 2. Cov(X, Y) = Cov(Y, X). This property points out an important issue: Covariance is a measure of association, but it has nothing to do with cause–effect relationships. This is an important point to keep in mind when building statistical models based on empirical data. Causality is not necessarily proved by statistical techniques which just exploit associations.
Property 3. Cov(αX, Y) = α Cov(X, Y), where α is any number. This property states that numbers can be “taken outside” the covariance. It is instructive to note that, by applying this property twice, we obtain

Var(αX) = Cov(αX, αX) = α² Cov(X, X) = α² Var(X),

as expected.
Property 4. Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z). This is a sort of “distributive” property that comes in handy when dealing with sums of random variables. While the first three properties are trivial to prove, it is instructive to prove this last one using (8.5) and the linearity of expectation:

Cov(X, Y + Z) = E[X(Y + Z)] − E[X] · E[Y + Z]
             = E[XY] + E[XZ] − E[X] · E[Y] − E[X] · E[Z]
             = (E[XY] − E[X] · E[Y]) + (E[XZ] − E[X] · E[Z])
             = Cov(X, Y) + Cov(X, Z).

A numerical check of Properties 3 and 4 is sketched below.
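The following Python sketch checks Properties 3 and 4 numerically on a hypothetical joint distribution of three random variables; the support points, the probabilities, and the value of α are arbitrary and serve only to illustrate the identities.

```python
# Hypothetical joint pmf over triples (x, y, z); values and probabilities are
# arbitrary and only serve to check Properties 3 and 4 numerically.
pmf = {
    (0, 1, 2): 0.15, (0, 2, 1): 0.10, (1, 1, 1): 0.25,
    (1, 3, 2): 0.20, (2, 2, 3): 0.30,
}

def expect(f):
    """Expected value of f(x, y, z) under the joint pmf."""
    return sum(p * f(x, y, z) for (x, y, z), p in pmf.items())

def cov(f, g):
    """Covariance of f and g, computed via Cov = E[FG] - E[F]E[G]."""
    return expect(lambda x, y, z: f(x, y, z) * g(x, y, z)) - expect(f) * expect(g)

X = lambda x, y, z: x
Y = lambda x, y, z: y
Z = lambda x, y, z: z
alpha = 3.0

# Property 3: Cov(alpha X, Y) = alpha Cov(X, Y)
print(cov(lambda x, y, z: alpha * x, Y), alpha * cov(X, Y))

# Property 4: Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)
print(cov(X, lambda x, y, z: y + z), cov(X, Y) + cov(X, Z))
```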