We are already familiar with the concept of conditional probability for events. When dealing with random variables X and Y, we might wonder whether knowing something about Y, possibly even its realized value, can help us predict the value of X. To introduce the concepts in the simplest way, it is a good idea to work with a pair of discrete random variables with finite support. So, let us consider a variable X that can take values $x_i$, $i = 1, \ldots, k$, and a variable Y that can take values $y_j$, $j = 1, \ldots, l$. Given the joint PMF, we know all of the relevant probabilities
$$p_{X,Y}(x_i, y_j) = P(X = x_i, Y = y_j), \qquad i = 1, \ldots, k, \quad j = 1, \ldots, l,$$
and we may also consider conditional probabilities, such as
$$P(X = x_i \mid Y = y_j) = \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)},$$
assuming, of course, that $P(Y = y_j) \neq 0$. Generalizing a bit, let us define the conditional PMF:
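$$p_{X|Y}(x, y) \equiv \frac{p_{X,Y}(x, y)}{p_Y(y)}, \qquad \text{where } p_Y(y) = \sum_{i=1}^{k} p_{X,Y}(x_i, y).$$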
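The definition can be turned into a few lines of Python. This is a minimal sketch, assuming the joint PMF is stored as a dictionary mapping pairs (x, y) to probabilities; the function names `marginal_y` and `conditional_pmf` and the numbers in `joint` are hypothetical, chosen only for illustration. Exact fractions avoid floating-point noise in the ratios.

```python
from fractions import Fraction

# Hypothetical joint PMF of (X, Y), stored as {(x, y): P(X = x, Y = y)}.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(4, 10),
}

def marginal_y(joint):
    """Marginal PMF of Y: sum the joint PMF over all values of X."""
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0) + p
    return p_y

def conditional_pmf(joint, y):
    """Conditional PMF of X given Y = y: p_{X,Y}(x, y) / p_Y(y)."""
    p_y = marginal_y(joint).get(y, 0)
    if p_y == 0:
        raise ValueError("conditioning requires P(Y = y) != 0")
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}

print(conditional_pmf(joint, 0))
```

For each admissible y, the values of the returned dictionary sum to one, as any PMF must.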
It is essential to note that, if the two random variables are independent, then
$$p_{X|Y}(x, y) = p_X(x) \quad \text{for all } x \text{ and all } y \text{ with } p_Y(y) \neq 0,$$
i.e., knowledge of Y is of no use in predicting X. If the two variables are not independent, a natural question concerns the expected value of X when we know that $Y = y_j$. This conditional expectation is obtained as follows:
$$\mathrm{E}[X \mid Y = y_j] = \sum_{i=1}^{k} x_i \, p_{X|Y}(x_i, y_j).$$
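As a sketch under the same assumptions (a dictionary-based joint PMF with hypothetical numbers; the helper names `marginal`, `conditional_expectation`, and `independent` are illustrative, not from the text), the conditional expectation and the independence condition can be checked directly:

```python
from fractions import Fraction

# Hypothetical joint PMF of (X, Y): {(x, y): P(X = x, Y = y)}.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(4, 10),
}

def marginal(joint, axis):
    """Marginal PMF of X (axis=0) or of Y (axis=1)."""
    m = {}
    for pair, p in joint.items():
        m[pair[axis]] = m.get(pair[axis], 0) + p
    return m

def conditional_expectation(joint, y):
    """E[X | Y = y] = sum over x of x * p_{X,Y}(x, y) / p_Y(y)."""
    p_y = marginal(joint, 1)[y]
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y

def independent(joint):
    """True if p_{X|Y}(x, y) = p_X(x) for every x and every y with p_Y(y) > 0."""
    p_x, p_y = marginal(joint, 0), marginal(joint, 1)
    return all(
        joint.get((x, y), 0) / p_y[y] == p_x[x]
        for x in p_x for y in p_y if p_y[y] > 0
    )

print(conditional_expectation(joint, 0), conditional_expectation(joint, 1))
print(independent(joint))
```

With these made-up numbers the two conditional expectations differ (3/4 versus 2/3), so the variables cannot be independent; a joint PMF built as a product of marginals passes the check.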
Example 8.6 Let X and Y be two binary random variables whose joint distribution is characterized by the PMF:


Let us find the distribution of X conditional on Y = 0 or Y = 1. The first step is computing the marginal PMF of Y:


Then we find $p_{X|Y}(x, 0)$:


By the same token:


Now we may compute the conditional expected values:


Incidentally, the unconditional expected value of X is


We see that knowledge of Y does change our expectation about X; hence, the two random variables are not independent.
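The steps of Example 8.6 can be replayed in Python for two binary variables. This is a sketch that uses a hypothetical joint PMF (the numbers are made up and are not those of the example); the variable names are illustrative.

```python
from fractions import Fraction as F

# Hypothetical joint PMF of two binary variables: rows index x in {0, 1},
# columns index y in {0, 1}, so p[x][y] = P(X = x, Y = y).
# Because the values are 0 and 1, each value doubles as its list index.
p = [[F(2, 10), F(1, 10)],
     [F(3, 10), F(4, 10)]]
values = [0, 1]

# Step 1: marginal PMF of Y (sum each column over x).
p_Y = [sum(p[x][y] for x in values) for y in values]

# Step 2: conditional PMFs p_{X|Y}(x, y) = p[x][y] / p_Y[y].
p_X_given_Y = [[p[x][y] / p_Y[y] for y in values] for x in values]

# Step 3: conditional expectations E[X | Y = y].
E_X_given_Y = [sum(x * p_X_given_Y[x][y] for x in values) for y in values]

# Unconditional expectation E[X], from the marginal PMF of X (row sums).
E_X = sum(x * sum(p[x]) for x in values)

print(p_Y)
print(E_X_given_Y, E_X)
```

For these hypothetical numbers, E[X | Y = 0] = 3/5 and E[X | Y = 1] = 4/5, both different from E[X] = 7/10, so here too knowledge of Y changes our expectation about X.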

The case of two jointly continuous random variables is conceptually similar and relies on the definition of the following conditional PDF:
$$f_{X|Y}(x, y) \equiv \frac{f_{X,Y}(x, y)}{f_Y(y)},$$
for y such that $f_Y(y) \neq 0$. Unsurprisingly, we cannot divide by the probability P(Y = y), which is zero for every y when Y is continuous; nevertheless, the concept is quite similar to the discrete case.
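A numerical sketch of the continuous case, assuming a hypothetical joint density $f_{X,Y}(x, y) = x + y$ on the unit square (it integrates to one there). Integrals are approximated with a simple midpoint rule; `f_XY`, `integrate`, and `conditional_pdf` are illustrative names. For this density, $f_Y(y) = y + 1/2$ and $\mathrm{E}[X \mid Y = y] = (2 + 3y)/(3 + 6y)$, which the code should approximately reproduce.

```python
def f_XY(x, y):
    """Hypothetical joint PDF on the unit square: f(x, y) = x + y."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f_Y(y):
    """Marginal PDF of Y, obtained by integrating the joint PDF over x."""
    return integrate(lambda x: f_XY(x, y), 0.0, 1.0)

def conditional_pdf(y):
    """Return x -> f_{X|Y}(x, y) = f_{X,Y}(x, y) / f_Y(y), for f_Y(y) != 0."""
    fy = f_Y(y)  # computed once, then reused for every x
    if fy == 0:
        raise ValueError("conditioning requires f_Y(y) != 0")
    return lambda x: f_XY(x, y) / fy

for y in (0.0, 0.5, 1.0):
    g = conditional_pdf(y)
    total = integrate(g, 0.0, 1.0)                  # should be close to 1
    mean = integrate(lambda x: x * g(x), 0.0, 1.0)  # approximates E[X | Y = y]
    print(y, round(total, 6), round(mean, 6))
```

Computing $f_Y(y)$ once per conditioning value, rather than inside the integrand, keeps the cost linear in the number of quadrature points.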

Conditioning is a useful concept that can be exploited, among other things:

  1. To simplify the calculations of expectations
  2. To characterize properties of some probability distributions
  3. To characterize properties of certain stochastic processes

We illustrate these points in the following sections.

