Conditional probability is a very important and powerful concept. In this section we see how we may tackle problems like the one in Example 5.2, which we use as a guideline. To frame the problem clearly, let us define the following events:
- isM: the child is a male.
- isF: the child is a female.
- TM: the test predicts a male.
- TF: the test predicts a female.
Now the first question is: What do we know and what would we like to know? The problem statement provides us with the conditional probabilities P(TM | isM) and P(TF | isF), i.e., the probabilities that the test is correct given the actual gender, from which we may also infer

P(TF | isM) = 1 − P(TM | isM),   P(TM | isF) = 1 − P(TF | isF).
Using conditional probabilities, we also see that what we need is, in a sense, to invert the conditional information above, as we need to compare the two conditional probabilities P(isM | TM) and P(isF | TF).
To see how we can accomplish our task, let us abstract a little and consider two events E and F. Intersection is a commutative operation:

E ∩ F = F ∩ E.

Using the definition of conditional probability, we may write

P(E ∩ F) = P(E | F) P(F),   P(F ∩ E) = P(F | E) P(E),

but since the two left-hand sides are the same, we may also conclude that

P(E | F) P(F) = P(F | E) P(E).
We have proved the following theorem.
THEOREM 5.6 (Bayes’ theorem) Given two events E and F, we have

P(F | E) = P(E | F) P(F) / P(E),
provided that P(E) ≠ 0.
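As a quick numerical sanity check, the sketch below verifies the theorem on a small joint distribution over two binary events; the joint probabilities are arbitrary, chosen only for illustration:

```python
# Verify Bayes' theorem, P(F | E) = P(E | F) P(F) / P(E),
# on an arbitrary joint distribution (illustrative values only).
p_joint = {  # P(E and F) for each combination (E occurs?, F occurs?)
    (True, True): 0.30,
    (True, False): 0.20,
    (False, True): 0.15,
    (False, False): 0.35,
}

p_E = p_joint[(True, True)] + p_joint[(True, False)]   # P(E)
p_F = p_joint[(True, True)] + p_joint[(False, True)]   # P(F)
p_F_given_E = p_joint[(True, True)] / p_E              # by definition
p_E_given_F = p_joint[(True, True)] / p_F              # by definition

# Bayes' theorem recovers P(F | E) from the "inverted" conditional:
assert abs(p_F_given_E - p_E_given_F * p_F / p_E) < 1e-12
print(p_F_given_E)  # 0.30 / 0.50 = 0.6
```

The assertion holds for any joint distribution with P(E) ≠ 0; the check simply retraces the derivation through P(E ∩ F).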
Immediate application to our problem yields

P(isM | TM) = P(TM | isM) P(isM) / P(TM),   P(isF | TF) = P(TF | isF) P(isF) / P(TF).
Let us assume for simplicity that P(isM) = P(isF) = 0.5. In the relationship above, we need the probabilities of events TM and TF. Let us focus on the first one: In how many ways can the result of the test predict a male? Well, there are two cases:
- The child is indeed a male, and the test predicts the correct result; this happens with probability P(TM | isM)P(isM).
- The child is in fact a female, but the test is wrong; this happens with probability P(TM | isF)P(isF).
Since the two events are mutually exclusive, we may add the two probabilities to obtain

P(TM) = P(TM | isM) P(isM) + P(TM | isF) P(isF).
A similar result holds for event TF. Let us pause a moment and generalize the result.
THEOREM 5.7 (Total probability theorem) Given a sample space Ω and a family of mutually exclusive and collectively exhaustive events H1, H2, …, Hn, the probability of event E can be expressed as

P(E) = P(E | H1) P(H1) + P(E | H2) P(H2) + ··· + P(E | Hn) P(Hn).
The events H1, H2, …, Hn form a partition of the sample space, as illustrated in Fig. 5.6. Mutually exclusive means that all of them are disjoint:

Hi ∩ Hj = ∅ for any i ≠ j.
Fig. 5.6 An illustration of the total probability theorem.
Collectively exhaustive means that their union yields the whole sample space:

H1 ∪ H2 ∪ ··· ∪ Hn = Ω.
Given such a partition, we see that we may cut the event E into a collection of n mutually exclusive “slices” that, when patched together, yield back event E. The total probability theorem is a very convenient way to decompose the calculation of probabilities when we may slice the relevant event into disjoint pieces, as suggested in Fig. 5.6, and the conditional probabilities are easy to compute.
If we put Bayes’ and total probability theorems together, we see that if H1, H2, H3, …, Hn is a partition of the sample space, then for an event E we have the following equation:

P(Hi | E) = P(E | Hi) P(Hi) / [ P(E | H1) P(H1) + ··· + P(E | Hn) P(Hn) ].
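The combined formula lends itself to a direct implementation. The following Python sketch (the function and variable names are our own, not part of the text) takes the prior probabilities P(Hi) and the likelihoods P(E | Hi) over a partition and returns the posterior probabilities P(Hi | E):

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a partition H1, ..., Hn.

    priors[i]      = P(Hi), with the priors summing to 1
    likelihoods[i] = P(E | Hi)
    Returns the list of posterior probabilities P(Hi | E).
    """
    # Total probability theorem: P(E) = sum_i P(E | Hi) P(Hi)
    p_E = sum(p * l for p, l in zip(priors, likelihoods))
    # Bayes' theorem, term by term:
    return [p * l / p_E for p, l in zip(priors, likelihoods)]
```

For instance, with priors [0.5, 0.5] and (hypothetical) likelihoods [0.8, 0.3], the function returns approximately [0.727, 0.273].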
Let us apply what we came up with to the gender prediction problem. The probability that Mary’s child is indeed a male is

P(isM | TM) = P(TM | isM) P(isM) / [ P(TM | isM) P(isM) + P(TM | isF) P(isF) ].
By the same token, the probability that Frances’ child is indeed a female is

P(isF | TF) = P(TF | isF) P(isF) / [ P(TF | isM) P(isM) + P(TF | isF) P(isF) ].
So, we see that Frances is the one who should be more confident about the gender of her child. We urge the reader to apply Bayes’ theorem to the illness problem of Section 1.2.2 and find the result that we obtained there by informal reasoning.
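Since the numerical data of Example 5.2 are not repeated in this section, the sketch below uses hypothetical test accuracies, chosen purely to illustrate the computation; with these assumed figures the test turns out to be more informative for Frances, consistent with the conclusion above:

```python
# Hypothetical accuracies (assumptions for illustration, not the
# actual figures of Example 5.2):
p_TM_given_isM = 0.8          # P(TM | isM), test correct for a male
p_TF_given_isF = 0.7          # P(TF | isF), test correct for a female
p_isM = p_isF = 0.5           # uniform prior on the gender

p_TM_given_isF = 1 - p_TF_given_isF   # P(TM | isF)
p_TF_given_isM = 1 - p_TM_given_isM   # P(TF | isM)

# Total probability theorem:
p_TM = p_TM_given_isM * p_isM + p_TM_given_isF * p_isF
p_TF = p_TF_given_isM * p_isM + p_TF_given_isF * p_isF

# Bayes' theorem:
p_isM_given_TM = p_TM_given_isM * p_isM / p_TM   # Mary's confidence
p_isF_given_TF = p_TF_given_isF * p_isF / p_TF   # Frances' confidence
print(round(p_isM_given_TM, 3), round(p_isF_given_TF, 3))  # ≈ 0.727 0.778
```

Plugging in the actual figures of Example 5.2 in place of the assumed accuracies reproduces the result discussed in the text.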
Bayes’ theorem is fundamental in working with information, and it is the starting point of a whole branch of statistics, which we touch on later. To conclude the section, we consider a rather well-known puzzle.
Example 5.9 Consider a dumb but quite popular TV program, in which the participant sits in front of three boxes A, B, and C. One of the boxes contains a prize and the guy, who has no clue where the prize is, has to choose one. Say that he chooses A. The presenter knows where the prize is; he opens box C, showing that it does not contain the prize; then, he offers the participant the possibility of giving up the previous choice and switching to box B. Should the participant accept the offer?
When handed this question, the class typically divides into two camps:
- One school of thought maintains that there is no point in switching from box A to box B. A priori, the probability of finding the prize was 1/3; now, with two boxes remaining, the two probabilities are just 1/2. Others go as far as to suggest that the presenter is cheating and trying to lure the participant into switching, in order to save the prize.
- Another school of thought maintains that indeed the probabilities were symmetric a priori, but now the probability “mass” associated with box C should shift to box B; then, the probability that the prize is in box B is now 2/3, and the participant would double the odds of winning by accepting the offer.
Students hinting at the possibility that the presenter is cheating do have a point: we must state clearly the assumptions behind his behavior. In real games like this one, there are in fact many boxes with different prizes, and one would think that there is an incentive to try stealing the big one from the lucky participant. However, perhaps a bigger incentive is to create suspense, in order to keep the audience and make the game last longer, so that a few more juicy advertising spots can be slipped into the program. Therefore, let us assume that the presenter has no malicious intent and that his aim is just to stretch the game a little bit. Of course, whatever we conclude is only as valid as this assumption, but this is a good feature of a formal analysis: any assumption is stated clearly, and we may assess its impact on our conclusions.
The first step in tackling the problem is finding a sensible formalization. We are dealing with the following events:
- A, the prize is in box A.
- B, the prize is in box B.
- C, the prize is in box C.
- opC, the presenter opens box C after participant’s choice.
What we need to do is evaluate the conditional probability P(A | opC); note that

P(A | opC) + P(B | opC) = 1,

since, after box C is opened, the prize must be in either box A or box B; so calculating one of the two probabilities is quite enough.
The next step is to clearly state what we know, or assume to know:
- A priori, the participant has no reason to believe that one box is more likely to contain the prize than the other ones:

  P(A) = P(B) = P(C) = 1/3.
- The presenter is not cheating and knows where the prize is. Then, we can evaluate the following conditional probabilities:
- P(opC | A) = 1/2, because in such a case he could open either box B or box C and nothing would change. So, let us assume that he chooses one of the two possibilities purely at random.
- P(opC | B) = 1, because this is the only available option to him. He cannot open box A, because it is the selected one; he cannot open box B, because it would spoil the game.
- P(opC | C) = 0, because he would necessarily open box B in this case, to avoid spoiling the game.
Now we are ready to apply Bayes’ theorem:

P(A | opC) = P(opC | A) P(A) / P(opC).
What we miss in this expression is just P(opC), which can be found by the total probability theorem:

P(opC) = P(opC | A) P(A) + P(opC | B) P(B) + P(opC | C) P(C) = (1/2)(1/3) + 1 · (1/3) + 0 · (1/3) = 1/2.
If we put everything together, we obtain

P(A | opC) = (1/2)(1/3) / (1/2) = 1/3,   P(B | opC) = 1 − P(A | opC) = 2/3.
Hence, the participant should switch to box B, since the odds of winning the prize would be 2/3, rather than just 1/3.
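The analytical result can also be checked by simulation. The sketch below (our own code, with assumed names) plays the game many times under the stated assumptions, i.e., the prize is placed uniformly at random and the presenter opens a random empty box other than the chosen one:

```python
import random

def play(switch, trials=100_000, seed=42):
    """Estimate the probability of winning with or without switching."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        boxes = ["A", "B", "C"]
        prize = rng.choice(boxes)          # uniform prize placement
        chosen = "A"                       # the participant picks box A
        # The presenter opens, at random, an empty box other than A:
        openable = [b for b in boxes if b != chosen and b != prize]
        opened = rng.choice(openable)
        if switch:
            # Switch to the remaining closed box:
            chosen = next(b for b in boxes if b not in (chosen, opened))
        wins += (chosen == prize)
    return wins / trials

print(play(switch=False))  # close to 1/3
print(play(switch=True))   # close to 2/3
```

With 100,000 trials the estimates settle well within a percentage point of 1/3 and 2/3, confirming the calculation above.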
We should note that the conclusion of the example depends on all of the assumptions we made. This is a strength of a formal analysis, not a limitation: by stating a problem clearly, we point out which assumptions are critical, as well as if and how our conclusions depend on them. Uncertainty about the assumptions is no good reason to avoid considering their role explicitly.
Problems
5.1 Consider two events E and G, such that E ⊆ G. Then prove that P(E) ≤ P(G).
5.2 Assume that P(A) = P(B) for two events A and B. Then prove that, given another event E,

P(A | E) / P(B | E) = P(E | A) / P(E | B).

Find an interpretation of the result as a probability inversion formula.
5.3 In Example 5.9 we assumed that the presenter opens box C knowing where the prize is. Now, let us assume that he has no information on where the prize is. Does this change our conclusions?