Consider throwing a die twice. If we know that the result of the first throw is 4, does this change our probability assessment for the second one? If the die is fair, and there is no cheating on the part of the person throwing it, the answer should be no. The two throws are independent. In other cases, however, knowing that an event has occurred does tell us something about another event. We have seen such a case when dealing with the growth option example in Section 1.2.3. To formalize this, we should draw the line between two concepts:
- The a priori or unconditional probability of an event, which applies when we have no information about the occurrence of related events
- The conditional probability, which results from a reassessment after collecting some (partial) knowledge represented by the occurrence of related events
DEFINITION 5.3 (Conditional probability) The probability of event E, conditional on an event G such that P(G) > 0, is denoted by P(E | G) and is defined as

P(E | G) = P(E ∩ G) / P(G)
Example 5.6 In dice throwing we know that, a priori, P({1}) = P({2}) = 1/6. But if we know that the event EVEN took place, we should update the unconditional probabilities, getting conditional probabilities. For instance, P({1} | EVEN) = 0, as 1 is an odd number and is ruled out if we know that the event EVEN happened. We may also see intuitively that P({2} | EVEN) should be 1/3, as 2 is just one possible outcome out of three, if we know that the event EVEN occurred.
We can obtain these results in a more systematic manner using the definition. For instance,

P({2} | EVEN) = P({2} ∩ EVEN) / P(EVEN) = P({2}) / P(EVEN) = (1/6) / (1/2) = 1/3
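As a quick check, here is a minimal Python sketch (not part of the original text) that computes conditional probabilities by direct enumeration, using exact fractions to avoid rounding; it reproduces the values of Example 5.6.

```python
from fractions import Fraction

# Sample space for one throw of a fair die: each outcome has probability 1/6.
prob = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

def p(event):
    """Probability of an event, i.e., a subset of the sample space."""
    return sum(prob[outcome] for outcome in event)

def p_cond(e, g):
    """Conditional probability P(E | G) = P(E ∩ G) / P(G)."""
    return p(e & g) / p(g)

EVEN = {2, 4, 6}
print(p_cond({1}, EVEN))  # 0
print(p_cond({2}, EVEN))  # 1/3
```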
The definition of conditional probability may look a bit weird at first, but it can be justified on the grounds of the following intuition.
- A priori, whatever happens must lie in the sample space Ω, and P(Ω) = 1. If we know that G occurred, this is the new sample space, whose probability a priori was P(G). In the example above, G = {2, 4, 6} is the new sample space, if we know that event EVEN occurred. Dividing by the probability of G, which is typically less than 1, amounts to increasing all of the probabilities by a sensible renormalization factor. Indeed, such a renormalization implies that P(G | G) = 1, as should be the case. This explains the term P(G) in the denominator.
- If G is the new sample space, event E may occur only if the intersection E ∩ G occurs. This explains the term P(E ∩ G) in the numerator.
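The renormalization argument can be verified directly: dividing the probabilities of the outcomes in G by P(G) yields a proper probability measure on the new sample space G. A small sketch along these lines, with illustrative names of my choosing:

```python
from fractions import Fraction

prob = {outcome: Fraction(1, 6) for outcome in range(1, 7)}
G = {2, 4, 6}  # the event EVEN, acting as the new sample space
p_G = sum(prob[outcome] for outcome in G)

# Renormalizing by P(G) makes the probabilities on G sum to one,
# which is exactly the statement P(G | G) = 1.
conditional = {outcome: prob[outcome] / p_G for outcome in G}
print(sum(conditional.values()))  # 1, i.e., P(G | G) = 1
print(conditional[2])             # 1/3, matching P({2} | EVEN)
```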
There are cases in which the unconditional and the conditional probabilities are quite different. In other cases, information on an event G tells us nothing about another event E. In other words, the two events are independent.
DEFINITION 5.4 (Independence of two events) Two events E and G are said to be independent if

P(E ∩ G) = P(E) · P(G)
In other words, for independent events the joint probability can be expressed as the product of the individual probabilities.
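As a concrete instance of the factorization, consider again the two throws of a fair die from the opening paragraph. The sketch below (the event definitions are illustrative choices of mine) verifies by enumeration that "the first throw gives 4" and "the second throw is even" are independent.

```python
from fractions import Fraction
from itertools import product

# Two throws of a fair die: 36 equally likely ordered pairs.
omega = set(product(range(1, 7), repeat=2))
prob = {outcome: Fraction(1, 36) for outcome in omega}

def p(event):
    return sum(prob[outcome] for outcome in event)

E = {(first, second) for (first, second) in omega if first == 4}
G = {(first, second) for (first, second) in omega if second % 2 == 0}

# P(E ∩ G) = 1/12 = (1/6) · (1/2) = P(E) · P(G)
print(p(E & G) == p(E) * p(G))  # True
```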
This definition might seem a bit unrelated to conditional probability. However, it is easy to see that if E and G are independent events, then

P(E | G) = P(E ∩ G) / P(G) = P(E) · P(G) / P(G) = P(E)
and, by the very same token, P(G | E) = P(G). Hence, for independent events, unconditional and conditional probabilities are exactly the same, and the occurrence of one event does not provide any useful information about the other one. Now it is a good idea to check your understanding of independence with a couple of questions:
Q1. Are two disjoint events independent?
Q2. If G ⊂ F, are the two events independent?
Please: Answer before reading further!
When I ask students the first question, I typically emphasize the fact that the two events are disjoint and that they “have nothing to do with each other.” This is usually enough to trick them into answering “Yes, disjoint events must be independent!” A bit of reflection should tell you that this is plain wrong. If E and G are disjoint and we know that G occurred, then we may rule out E. Hence, they cannot be independent. More formally (assuming that the two events have strictly positive probabilities):

P(E | G) = P(E ∩ G) / P(G) = P(∅) / P(G) = 0 ≠ P(E)
The second question is a bit easier. If G is included in F, then the occurrence of G implies the occurrence of F. Formally (ruling out events with zero probability again), we can state

P(F | G) = P(F ∩ G) / P(G) = P(G) / P(G) = 1

which differs from P(F) whenever P(F) < 1; hence, the two events are not independent in that case.
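Both answers can be checked numerically on the fair-die sample space. The sketch below (the specific events are illustrative choices) works through Q1 with the disjoint events E = {1, 2} and G = {3, 4}, and Q2 with G = {2} ⊂ F = {2, 4, 6}.

```python
from fractions import Fraction

prob = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

def p(event):
    return sum(prob[outcome] for outcome in event)

# Q1: disjoint events are NOT independent.
E, G = {1, 2}, {3, 4}
print(p(E & G), p(E) * p(G))  # 0 vs 1/9: factorization fails
print(p(E & G) / p(G), p(E))  # P(E | G) = 0 vs P(E) = 1/3

# Q2: nested events are NOT independent (unless P(F) = 1).
G2, F = {2}, {2, 4, 6}
print(p(G2 & F), p(G2) * p(F))  # 1/6 vs 1/12: factorization fails
print(p(F & G2) / p(G2), p(F))  # P(F | G) = 1 vs P(F) = 1/2
```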
The moral of the story is that, even though the definition of independence concerns the possibility of factoring a joint probability into the product of the individual probabilities, you have to think in terms of information to fully appreciate the issues involved. There is a good reason, though, to phrase the definition of independent events in terms of a product: It generalizes immediately to more than two events.
DEFINITION 5.5 (Independence of multiple events) Consider a family of events {E1, E2, …, En}. The events E1, E2, …, En are said to be independent if, given any arbitrary subset Ej1, Ej2, …, Ejm of the family, with m ≤ n, we have

P(Ej1 ∩ Ej2 ∩ ⋯ ∩ Ejm) = P(Ej1) · P(Ej2) ⋯ P(Ejm)    (5.2)
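In computational terms, the definition asks us to check the factorization over every subset of the family, not just the family as a whole. Here is a minimal sketch of such a check, assuming events are represented as sets of outcomes and p is a function mapping an event to its probability (the function name is mine):

```python
from itertools import combinations

def is_mutually_independent(events, p):
    """Check Eq. (5.2) for every subset of the family with m >= 2 events."""
    for m in range(2, len(events) + 1):
        for subset in combinations(events, m):
            joint = p(set.intersection(*subset))
            product_of_probs = 1
            for event in subset:
                product_of_probs *= p(event)
            if joint != product_of_probs:
                return False
    return True
```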
Definition 5.5 might look overly involved. Is it not enough to require that the factorization condition applies to the whole family of n events and be done with it? What the definition aims to capture is that, with n independent events, knowledge about any subset of events should not tell us anything about the remaining ones. However, the following conjectures do seem plausible:
- If all of the events in a family are pairwise independent, can we say that they are independent in the sense of Definition 5.5?
- If Eq. (5.2) holds for the whole family, can we say that this implies a similar condition for the subsets of the family?
Intuition may be misleading, at times, and in fact the answer is no in both cases, as the following counterexamples show.
Example 5.7 Consider a random experiment consisting of the draw of an integer number between 1 and 4, where the four outcomes are equally likely. We see that the events A ≡ {1, 2}, B ≡ {1, 3}, C ≡ {1, 4} have the same probability, 1/2. It is also easy to see that these events are pairwise independent:

P(A ∩ B) = P({1}) = 1/4 = P(A) · P(B)
P(A ∩ C) = P({1}) = 1/4 = P(A) · P(C)
P(B ∩ C) = P({1}) = 1/4 = P(B) · P(C)
However,

P(A ∩ B ∩ C) = P({1}) = 1/4 ≠ 1/8 = P(A) · P(B) · P(C)
To really get the point, it is useful to reason in terms of information and conditional probabilities. For instance, P(A | B) = 1/2 = P(A), because knowing that B occurred does not provide us with any additional information about the occurrence of event A. However, P(A | B ∩ C) = 1 ≠ P(A), because if we know that event B ∩ C occurred, then necessarily the number 1 has been drawn, so A occurred for sure.
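The claims of Example 5.7 are easy to verify by enumeration, as the following sketch shows; the last line reuses the is_mutually_independent sketch given after Definition 5.5.

```python
from fractions import Fraction

# Draw of an integer between 1 and 4, all outcomes equally likely.
prob = {outcome: Fraction(1, 4) for outcome in range(1, 5)}

def p(event):
    return sum(prob[outcome] for outcome in event)

A, B, C = {1, 2}, {1, 3}, {1, 4}
print(p(A & B) == p(A) * p(B))             # True: pairwise independent
print(p(A & C) == p(A) * p(C))             # True
print(p(B & C) == p(B) * p(C))             # True
print(p(A & B & C) == p(A) * p(B) * p(C))  # False: 1/4 != 1/8
print(is_mutually_independent([A, B, C], p))  # False (sketch defined above)
```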
The example above shows that if we have three events, and all pairs are independent, the three of them are not necessarily independent. The next example goes the other way around, showing that even if the joint probability of three events factors into the product of their individual probabilities, they are not necessarily pairwise independent, so they cannot be considered independent.
Example 5.8 Consider a three-dimensional space and events corresponding to a ball being placed at a point characterized by three coordinates (X, Y, Z). The possible points are (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1), with probabilities 1/8, 1/8, 3/8, 1/4, and 1/8, respectively. We see immediately that

P({X = 1}) = P({Y = 1}) = P({Z = 1}) = 1/2
and

P({X = 1} ∩ {Y = 1} ∩ {Z = 1}) = P({(1, 1, 1)}) = 1/8 = P({X = 1}) · P({Y = 1}) · P({Z = 1})
So, the probability of the joint event does factor into the product of the individual probabilities. However, the events are not pairwise independent:

P({X = 1} ∩ {Y = 1}) = 3/8 ≠ 1/4 = P({X = 1}) · P({Y = 1})
P({X = 1} ∩ {Z = 1}) = 1/8 ≠ 1/4 = P({X = 1}) · P({Z = 1})
P({Y = 1} ∩ {Z = 1}) = 1/8 ≠ 1/4 = P({Y = 1}) · P({Z = 1})
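Example 5.8 can be verified by enumeration as well; the sketch below uses the probability assignment stated above, which is consistent with the marginal and joint probabilities the example requires.

```python
from fractions import Fraction

# Probability of each point (x, y, z) in Example 5.8.
prob = {
    (1, 0, 0): Fraction(1, 8),
    (0, 1, 0): Fraction(1, 8),
    (0, 0, 1): Fraction(3, 8),
    (1, 1, 0): Fraction(1, 4),
    (1, 1, 1): Fraction(1, 8),
}

def p(event):
    return sum(prob[point] for point in event)

points = set(prob)
X1 = {pt for pt in points if pt[0] == 1}  # event {X = 1}
Y1 = {pt for pt in points if pt[1] == 1}  # event {Y = 1}
Z1 = {pt for pt in points if pt[2] == 1}  # event {Z = 1}

print(p(X1 & Y1 & Z1) == p(X1) * p(Y1) * p(Z1))  # True: 1/8 = (1/2)^3
print(p(X1 & Y1), p(X1) * p(Y1))  # 3/8 vs 1/4: not independent
print(p(X1 & Z1), p(X1) * p(Z1))  # 1/8 vs 1/4: not independent
print(p(Y1 & Z1), p(Y1) * p(Z1))  # 1/8 vs 1/4: not independent
```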
A note on notation. In the last example we stuck to a notation inspired by set theory. As you can see, this can become a bit of a burden when events linked to numerical values of variables are involved. Since these are the kinds of events that we are mostly interested in, we can use a streamlined notation like P(X = 1, Y = 1, Z = 1). In other words, joint events can also be denoted by getting rid of the intersection operator:

P(X = 1, Y = 1, Z = 1) ≡ P({X = 1} ∩ {Y = 1} ∩ {Z = 1})