In order to fully appreciate the issues involved in characterizing the dependence of random variables, as well as to appreciate the role of independence, we should have some understanding of how to characterize the joint distribution of random variables. For the sake of simplicity, we will deal only with the case of two random variables with a joint distribution, leaving the general case as a relatively straightforward extension. The pathway to defining all the relevant concepts for two jointly distributed random variables X and Y is similar to the case of a single random variable. The unifying concept is the cumulative distribution function (CDF), as it applies to both discrete and continuous variables. If we have two random variables, we consider joint events such as {X ≤ x} ∩ {Y ≤ y}, and associate a probability measure with them.
DEFINITION 8.1 (Joint cumulative distribution function) The joint CDF for random variables X and Y is defined as

F_{X,Y}(x, y) = P({X ≤ x} ∩ {Y ≤ y}).
In the following, we will often use the streamlined notation P(X ≤ x, Y ≤ y) to denote a joint event.
The joint CDF is a function of two variables, x and y, and it fully characterizes the joint distribution of the random variables X and Y.
We may also define the joint probability mass function (PMF) for discrete variables, and the joint probability density function (PDF) for continuous variables. For instance, let us refer to a pair of discrete variables, where X may take values x_i, i = 1, 2, 3, …, and Y may take values y_j, j = 1, 2, 3, …. The joint PMF is

p_{X,Y}(x_i, y_j) = P(X = x_i, Y = y_j).
We see immediately that, given the PMF, we may recover the CDF:

F_{X,Y}(x, y) = Σ_{i : x_i ≤ x} Σ_{j : y_j ≤ y} p_{X,Y}(x_i, y_j).
Going the other way around, we may find the PMF given the CDF:

p_{X,Y}(x_i, y_j) = F_{X,Y}(x_i, y_j) − F_{X,Y}(x_{i−1}, y_j) − F_{X,Y}(x_i, y_{j−1}) + F_{X,Y}(x_{i−1}, y_{j−1}).   (8.1)
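To make these relations concrete, here is a minimal sketch in Python, using a made-up joint PMF table (the values are illustrative only): it accumulates the PMF into the joint CDF and then recovers the PMF through the sums and differences in Eq. (8.1).

```python
import numpy as np

# A made-up joint PMF for X in {x1, x2, x3} and Y in {y1, y2}; entries sum to 1.
pmf = np.array([[0.10, 0.20],
                [0.25, 0.15],
                [0.05, 0.25]])        # pmf[i, j] = P(X = x_i, Y = y_j)

# Joint CDF: cumulative sums along both axes, so cdf[i, j] = P(X <= x_i, Y <= y_j).
cdf = pmf.cumsum(axis=0).cumsum(axis=1)

# Recover the PMF from the CDF via the sums and differences of Eq. (8.1);
# padding with a leading row/column of zeros supplies the boundary terms.
F = np.pad(cdf, ((1, 0), (1, 0)))
pmf_back = F[1:, 1:] - F[:-1, 1:] - F[1:, :-1] + F[:-1, :-1]

print(np.allclose(pmf_back, pmf))     # True
```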
We also recall that a PMF makes sense only in the discrete case, where the probabilities of individual values are well defined; in the continuous case they are all zero. When dealing with continuous random variables, we introduced a PDF, which allows us to express probabilities associated with sets, rather than single values; to do so, we integrate the PDF over the set of interest. The idea can be generalized to jointly distributed random variables, but in this case we integrate over a two-dimensional domain. Hence, we have a joint PDF f_{X,Y}(x, y) such that

P((X, Y) ∈ A) = ∫∫_A f_{X,Y}(x, y) dx dy,   (8.2)

for any (reasonable) set A in the plane.
In general, we cannot be sure that such a function exists. To be precise, we should say that the two random variables are jointly continuous if the joint PDF exists. Given the joint PDF, we may find the joint CDF by integration:

F_{X,Y}(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_{X,Y}(u, v) du dv.   (8.3)
Example 8.1 It is instructive to see the connection between Eqs. (8.1) and (8.2), in the context of jointly distributed, continuous random variables. Consider the rectangular area

C = {(x, y) : x_{i−1} < x ≤ x_i, y_{j−1} < y ≤ y_j},
which is depicted as the darkest rectangle in Fig. 8.1. The probability

P(C) = P(x_{i−1} < X ≤ x_i, y_{j−1} < Y ≤ y_j)
is the area below the joint PDF, over the rectangle. How can we find this area in terms of the joint CDF? Looking at Eq. (8.3), we see that the CDF gives the area below the PDF over the infinite region to the southwest of each point. These regions are displayed as quadrants in the figure. It is easy to see that we can find the probability of C by taking the right sums and differences of the areas over the quadrants. Let us denote the quadrant to the southwest of point (x, y) by Q(x, y). Then, in terms of set operations (difference, union, and intersection), we may write

C = Q(x_i, y_j) \ [Q(x_{i−1}, y_j) ∪ Q(x_i, y_{j−1})],

where, of course, Q(x_{i−1}, y_j) ∩ Q(x_i, y_{j−1}) = Q(x_{i−1}, y_{j−1}), which is depicted as the quadrant with the intermediate shading in the figure. Translating set operations into sums and differences, we get (8.1).

Fig. 8.1 Interpreting Eq. (8.1).
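The bookkeeping in the example can also be checked numerically. The sketch below assumes, purely for illustration, a pair of jointly continuous variables with joint PDF f(x, y) = e^−x e^−y on the positive quadrant, whose joint CDF is F(x, y) = (1 − e^−x)(1 − e^−y); it compares the rectangle probability obtained by integrating the PDF, as in Eq. (8.2), with the quadrant sums and differences of Eq. (8.1).

```python
from math import exp
from scipy.integrate import dblquad

# Illustrative joint PDF and CDF (two independent unit exponentials).
f = lambda y, x: exp(-x) * exp(-y)              # dblquad expects f(y, x)
F = lambda x, y: (1 - exp(-x)) * (1 - exp(-y))

# Rectangle C = (x0, x1] x (y0, y1].
x0, x1, y0, y1 = 0.5, 1.5, 0.2, 0.8

# Probability of C by integrating the joint PDF over the rectangle, as in Eq. (8.2).
p_pdf, _ = dblquad(f, x0, x1, lambda x: y0, lambda x: y1)

# The same probability from the joint CDF, via sums and differences over quadrants.
p_cdf = F(x1, y1) - F(x0, y1) - F(x1, y0) + F(x0, y0)

print(p_pdf, p_cdf)   # the two values agree, up to numerical integration error
```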
In the example, we have seen how to obtain a probability in terms of the CDF, for jointly continuous random variables. To find the PDF in terms of the CDF, we should consider the limit case of a rectangle whose edges shrink to zero. Doing so yields

f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y) / ∂x ∂y.
In the single-variable case, we can find the PDF by taking the derivative of the CDF; since double integrals are involved in the bidimensional case, it should not come as a surprise that a second-order, mixed derivative is involved here.
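As a quick symbolic check of this last relationship, consider again the illustrative exponential CDF used above: differentiating it once with respect to each variable returns the joint PDF.

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

# Illustrative joint CDF of two independent unit exponentials.
F = (1 - sp.exp(-x)) * (1 - sp.exp(-y))

# The second-order mixed derivative recovers the joint PDF.
f = sp.diff(F, x, y)
print(sp.simplify(f))   # exp(-x - y), i.e., exp(-x)*exp(-y)
```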
Having defined joint characterizations of random variables, a first question is: Can we relate the joint CDF, PMF, and PDF to the analogous functions describing the single variables? In general, whatever refers to a single variable, within the context of a multivariate distribution, is called marginal. So, to be more specific, given the joint CDF F_{X,Y}(x, y), how can we find the marginal CDFs F_X(x) and F_Y(y) pertaining to each individual variable? In principle, the task for the CDF is fairly easy. We obtain the marginal CDF of X as follows:

F_X(x) = P(X ≤ x) = P({X ≤ x} ∩ {Y < +∞}) = F_{X,Y}(x, +∞).
By the same token, F_Y(y) = F_{X,Y}(+∞, y).
In the discrete case, to obtain the marginal PMFs from the joint PMF, we just use the total probability theorem and the fact that the events {Y = y_j} are disjoint:

p_X(x_i) = P(X = x_i) = Σ_j P(X = x_i, Y = y_j) = Σ_j p_{X,Y}(x_i, y_j).
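In code, marginalizing a joint PMF is just a row or column sum. Continuing with the made-up PMF table of the earlier sketch:

```python
import numpy as np

# The same made-up joint PMF as before: pmf[i, j] = P(X = x_i, Y = y_j).
pmf = np.array([[0.10, 0.20],
                [0.25, 0.15],
                [0.05, 0.25]])

pX = pmf.sum(axis=1)   # marginal PMF of X: sum over the values of Y
pY = pmf.sum(axis=0)   # marginal PMF of Y: sum over the values of X

print(pX)              # [0.3 0.4 0.3]  (up to floating-point rounding)
print(pY)              # [0.4 0.6]
```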
If we want to obtain marginal PDFs from the joint PDF, we may work in much the same way as in the discrete case:

P(X ∈ A) = P({X ∈ A} ∩ {−∞ < Y < +∞}) = ∫_A [ ∫_{−∞}^{+∞} f_{X,Y}(x, y) dy ] dx.
If we introduce the marginal density

f_X(x) = ∫_{−∞}^{+∞} f_{X,Y}(x, y) dy,

we have

P(X ∈ A) = ∫_A f_X(x) dx.
The marginal PDF for Y is obtained in the same way, integrating with respect to x.
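The continuous case can be checked symbolically as well. As an illustration, take the (made-up) joint density f(x, y) = x + y on the unit square; integrating out y yields the marginal PDF of X.

```python
import sympy as sp

x, y = sp.symbols('x y')

# Made-up joint PDF on the unit square (zero elsewhere).
f_xy = x + y

# Marginal PDF of X: integrate the joint PDF over all values of y.
f_x = sp.integrate(f_xy, (y, 0, 1))
print(f_x)                            # x + 1/2

# Sanity check: the marginal integrates to 1 over [0, 1].
print(sp.integrate(f_x, (x, 0, 1)))   # 1
```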
We see that, given a joint distribution, we may find the marginals. It is tempting to think that, given the two marginals, we may recover the joint distribution. This is not true, as the marginal distributions do not say anything about the link between the random variables. In Example 7.16 we have seen that, in the context of a discrete-time stochastic process X_t, quite different processes may share the same marginal distribution for all time periods t. The following example, taking advantage of the concepts we have just learned, shows that quite different joint distributions may yield the same pair of marginals.
Example 8.2 Consider the following two PDFs (in a moment, we will check that they are legitimate densities):

f_{X,Y}(x, y) = 1,                      0 ≤ x ≤ 1,  0 ≤ y ≤ 1,
g_{X,Y}(x, y) = 1 + (2x − 1)(2y − 1),   0 ≤ x ≤ 1,  0 ≤ y ≤ 1,

where both densities are zero outside the unit square.
They look quite different, but it is not too difficult to see that they yield the same marginals. The first case is easy:

f_X(x) = ∫_0^1 f_{X,Y}(x, y) dy = ∫_0^1 1 dy = 1,   0 ≤ x ≤ 1.
By symmetry, we immediately see that f_Y(y) = 1 as well. Hence, the two marginals are uniform distributions on the unit interval [0, 1]. Now let us tackle the second case. As we learned in Section 3.9.3, when integrating with respect to y, we should just treat x as a constant:

g_X(x) = ∫_0^1 [1 + (2x − 1)(2y − 1)] dy = 1 + (2x − 1) [y² − y]_0^1 = 1 + (2x − 1) · 0 = 1,   0 ≤ x ≤ 1.
As before, it is easy to see by symmetry that g_Y(y) = 1 as well. Again, the two marginals are uniform distributions, but the link between the two random variables is quite different.
Before closing the example, it is easy to see that f_{X,Y}(x, y) is a legitimate density, as it is never negative and

∫_0^1 ∫_0^1 f_{X,Y}(x, y) dx dy = ∫_0^1 ∫_0^1 1 dx dy = 1.
Actually, this is just the area of the unit square. Checking the legitimacy of g_{X,Y}(x, y) is a bit more difficult. The easy part is checking that the integral over the unit square is 1. Given the marginals above, we may write

∫_0^1 ∫_0^1 g_{X,Y}(x, y) dy dx = ∫_0^1 g_X(x) dx = ∫_0^1 1 dx = 1.
But we should also check that the function is never negative. One way of doing so would be to find its minimum over the unit square. The optimization of a function of multiple variables is a topic in its own right, but what we know from Section 3.9.1 suggests that a starting point is the stationarity conditions

∂g_{X,Y}(x, y)/∂x = 2(2y − 1) = 0,   ∂g_{X,Y}(x, y)/∂y = 2(2x − 1) = 0.
These conditions imply x = y = 0.5 but, unfortunately, this point is neither a minimum nor a maximum. We invite the reader to check that the Hessian has two eigenvalues of opposite sign, so that the stationary point is a saddle point. However, with a little intuition, we may see that this density involves the product of the two linear terms 2x − 1 and 2y − 1, each of which ranges over the interval [−1, 1], given the bounds on x and y. Hence, their product is never smaller than −1, and the overall PDF is never negative (its minimum is 0, attained for x = 1, y = 0 and for x = 0, y = 1).
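The claims made in the example are easy to verify symbolically. The sketch below works with the density g(x, y) = 1 + (2x − 1)(2y − 1) on the unit square, as given above, checking the marginals, the normalization, the saddle point, and nonnegativity.

```python
import sympy as sp

x, y = sp.symbols('x y')

# Candidate joint PDF on the unit square (zero elsewhere).
g = 1 + (2*x - 1)*(2*y - 1)

# Both marginals are uniform on [0, 1] ...
print(sp.integrate(g, (y, 0, 1)))             # 1
print(sp.integrate(g, (x, 0, 1)))             # 1
# ... hence the density integrates to 1 over the unit square.
print(sp.integrate(g, (y, 0, 1), (x, 0, 1)))  # 1

# Stationarity conditions give x = y = 1/2, which is a saddle point.
print(sp.solve([sp.diff(g, x), sp.diff(g, y)], [x, y]))   # {x: 1/2, y: 1/2}
print(sp.hessian(g, (x, y)).eigenvals())                  # {-4: 1, 4: 1}

# Minimum over the unit square is 0, attained at (1, 0) and (0, 1).
print(g.subs({x: 1, y: 0}), g.subs({x: 0, y: 1}))         # 0 0
```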
The example points out that there is a gap between two marginals and a joint distribution. The missing link is exactly what characterizes the dependence between the two random variables. This missing link is the subject of a whole branch of probability theory, called copula theory; a copula is a function that captures the essential nature of the dependence between random variables, separating it from the marginal distributions. Copula theory is beyond the scope of this book; in the following, we will just rely on a simple and partial characterization of dependence between random variables, based on covariance and correlation.