Binomial distribution

The binomial distribution arises as yet another variation on Bernoulli trials. We run n independent and identical experiments and let X be a random variable counting the number of successes. The support of the resulting random variable is {0, 1, 2, …, n}, since there may be no success at all, and its probability distribution depends on two parameters: the probability of success p and the number of experiments n. Since the trials are independent, it should be an easy affair to multiply the probabilities of k successes and n − k failures to get the PMF. However, there is an additional twist, which is best illustrated by a simple example.

Example 6.12 A random experiment consists of three Bernoulli trials with success probability p. What is the probability of getting exactly one success? Since experiments are independent, the probability of a pattern in which we have one success and two failures is (1 − p)²p, but this is not the answer to the question. In the geometric distribution we know that the success must occur in the last trial, but here any one of the three can be the success. Indeed, there are three outcomes for which X = 1:

$$(S, F, F), \qquad (F, S, F), \qquad (F, F, S),$$

where F and S denote failure and success, respectively. Hence, we see that (1 − p)²p is just the probability of one pattern in which there is one success, but since there are three, the correct probability is P(X = 1) = 3(1 − p)²p.
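The counting argument of Example 6.12 can be checked by brute force. The following is a minimal Python sketch, not part of the original text: the function name `prob_exactly_k` and the value p = 0.3 are illustrative assumptions. It enumerates every success/failure pattern and adds up the probabilities of those with exactly one success.

```python
from itertools import product

def prob_exactly_k(n, k, p):
    """Add the probabilities of all S/F patterns of length n with exactly k successes."""
    total = 0.0
    for pattern in product("SF", repeat=n):
        if pattern.count("S") == k:
            # Every such pattern has the same probability, by independence.
            total += p ** k * (1 - p) ** (n - k)
    return total

p = 0.3  # illustrative value, not taken from the text

# The patterns of three trials with exactly one success.
patterns = ["".join(s) for s in product("SF", repeat=3) if s.count("S") == 1]
print(patterns)

# Brute-force probability versus the closed form 3(1 - p)^2 p.
print(prob_exactly_k(3, 1, p), 3 * (1 - p) ** 2 * p)
```

Enumeration is only practical for small n, since the number of patterns is 2ⁿ; this is what motivates the counting formula for the general case.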

In the example above, there is an easy solution, since there are three sequences of three experiments such that there is exactly one success. But how many sequences of, say, 50 experiments may result in 18 successes? The answer is provided by binomial coefficients, which were introduced back in Section 2.2.4, when dealing with combinatorial analysis and permutations. Given n trials, the number of sequences containing k successes is given by the following binomial coefficient:

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$

In the example above,

$$\binom{3}{1} = \frac{3!}{1!\,2!} = 3.$$
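For larger cases such as 50 trials with 18 successes, the count is best left to a computer. The sketch below is an illustration rather than part of the original text: the helper `n_sequences` is a hypothetical name that evaluates the factorial formula n!/(k!(n − k)!), and the result is compared with `math.comb` from the Python standard library (available from Python 3.8).

```python
from math import comb, factorial

def n_sequences(n, k):
    """Number of length-n sequences with exactly k successes: n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# Three trials, one success: the three patterns of Example 6.12.
print(n_sequences(3, 1))

# Fifty trials, 18 successes: far too many sequences to list by hand.
print(n_sequences(50, 18), comb(50, 18))
```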

Then, the PMF of a binomial random variable with parameters p and n is

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n.$$

Using properties of the binomial coefficients, it is easy to see that these probabilities add up to 1: by the binomial theorem, their sum is (p + (1 − p))ⁿ = 1. An interesting feature of the binomial distribution is that it depends on two parameters. Figure 6.5 shows two PMFs for n = 30; plot (a) refers to the case p = 0.2 and plot (b) to the case p = 0.4. The support is the same for both distributions, but we see how increasing p shifts the PMF to the right.

The binomial distribution can be used as a model of uncertainty even when there is no underlying experiment based on Bernoulli trials. Indeed, the two parameters can be fine-tuned to fit empirical data. One such example is modeling demand for items that are sold in small amounts; when sales volume is high, a continuous random variable may be a simpler model.
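To see the effect of p numerically, the two PMFs for n = 30 can be tabulated directly. This is a minimal sketch under stated assumptions: the names `binomial_pmf`, `pmf_table`, and `most_likely` are illustrative helpers, not functions from the text or from any library. It checks that each PMF sums to 1 and reports where its peak lies.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with parameters n and p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def pmf_table(n, p):
    """List of P(X = k) for k = 0, 1, ..., n."""
    return [binomial_pmf(k, n, p) for k in range(n + 1)]

def most_likely(n, p):
    """Value of k with the largest probability."""
    table = pmf_table(n, p)
    return max(range(n + 1), key=lambda k: table[k])

n = 30
for p in (0.2, 0.4):
    table = pmf_table(n, p)
    print(f"p = {p}: sum of PMF = {sum(table):.6f}, most likely k = {most_likely(n, p)}")
```

The peak moves to larger values of k as p grows, while the support {0, 1, …, 30} stays the same.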

A binomial random variable X can be regarded as the sum of n independent and identically distributed Bernoulli variables Y_i with parameter p. This is most useful to find expected value and variance by direct application of Property 6.7 and Eq. (6.9):

$$E[X] = \sum_{i=1}^{n} E[Y_i] = np, \qquad \mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(Y_i) = np(1-p).$$
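The standard results E[X] = np and Var(X) = np(1 − p) can be cross-checked against the definition of expected value and variance applied to the PMF. The sketch below is illustrative only: `binomial_moments` is a hypothetical helper name, and the parameters n = 30, p = 0.2 are borrowed from Figure 6.5.

```python
from math import comb

def binomial_moments(n, p):
    """Mean and variance computed directly from the binomial PMF."""
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, var

n, p = 30, 0.2
mean, var = binomial_moments(n, p)
print(mean, n * p)            # mean from the PMF versus np
print(var, n * p * (1 - p))   # variance from the PMF versus np(1 - p)
```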

Fig. 6.5 The PMF of two binomial distributions with parameter n = 30 and (a) p = 0.2, (b) p = 0.4.

The following example illustrates the role and the limitations of binomial random variables for an interesting practical application, namely, overbooking strategies for airlines.²⁰

Example 6.13 An airline observes that 5% of booked passengers do not show up at check-in. Hence, they adopt an overbooking strategy, accepting a number of reservations that exceeds the number of available seats. When some passenger with a reservation cannot be accommodated, there is a cost, since the overbooked passenger must be rerouted, and maybe offered overnight accommodation.

If an aircraft has 50 seats, and the airline accepts 52 reservations, what is the probability that all of the passengers checking in can be accommodated on the aircraft?
If each overbooked passenger costs $300, what is the expected cost of the policy?

As a first check, we compute the expected number of passengers checking in if there are 52 reservations:

$$52 \times 0.95 = 49.4.$$

This means that the policy is sensible, since the average number of actual passengers is less than the aircraft capacity. However, this does not mean that there will never be trouble. By the same token, it would be a gross mistake to say that the expected overbooking cost is zero. Remember that the expected value of a function is not, in general, the function of the expected value.

To analyze the problem we must model the underlying uncertainty. We may associate each booked passenger with a Bernoulli trial: She will show up (a success) with probability p = 0.95, and she will not show up with probability 1 − p = 0.05. Here, we are using the information we have about a population of passengers as the probability that each single passenger does not check in. In doing so, the implicit assumption is that passengers cancel independently of one another; we immediately see that this is a simplification, since we are not taking behavior of families or groups into account. Still, doing so allows us to apply the binomial distribution to get a first feeling for the involved numbers.

Let us denote the number of passengers checking in by X, a binomial random variable with parameters p = 0.95 and n = 52. There will be no overbooking problem if the number of passengers checking in does not exceed aircraft capacity, and this happens with probability

$$P(X \le 50) = \sum_{k=0}^{50} P(X = k).$$

Calculating the desired probability by summing the PMF over all values from 0 to 50 requires plenty of calculations. Since probabilities add up to one, it is definitely better to compute it through the complementary event:

$$P(X \le 50) = 1 - P(X = 51) - P(X = 52).$$

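Using the PMF of a binomial random variable, we find

$$P(X = 51) = \binom{52}{51} (0.95)^{51} (0.05) \approx 0.1901, \qquad P(X = 52) = (0.95)^{52} \approx 0.0694,$$

so that

$$P(X \le 50) \approx 1 - 0.1901 - 0.0694 = 0.7405.$$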

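By the same token, the expected cost of the overbooking strategy, in dollars, is

$$300 \cdot P(X = 51) + 600 \cdot P(X = 52) \approx 300 \times 0.1901 + 600 \times 0.0694 \approx 98.7,$$

since one passenger is left without a seat when X = 51 and two passengers when X = 52.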

While the cost may seem relatively small, the probability of an overbooking is fairly large: In one case out of four the airline has to manage a situation, and this may be detrimental for its image. Actually, this really depends on how busy that flight is. Indeed, to be precise, we did not compute the unconditional probability P(OK), but the conditional probability P(OK | res = 52), i.e., the probability that there is no overbooking situation if there are 52 reservations.

A more sensible analysis should consider the number of reservations we receive, as well as the structure of reservation cancellations, which may involve pairs or larger groups of passengers, and not only single ones. In real life, airlines also price different fares with different rights to change a reservation. Proper capacity management in airlines is one of the most fruitful domains for quantitative analysis.²¹
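The figures of Example 6.13 can be reproduced with a few lines of Python. This is a sketch under the same simplifying assumption used in the example, namely that passengers show up independently with probability 0.95; the function names `binomial_pmf` and `overbooking_summary` are illustrative and do not come from the text.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with parameters n and p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def overbooking_summary(seats, reservations, p_show, cost_per_passenger):
    """Return (P(no passenger is left out), expected overbooking cost).

    Assumes independent show-ups, as in the example; only the cases with
    more passengers than seats contribute to the complement and to the cost.
    """
    p_ok = 1.0
    expected_cost = 0.0
    for k in range(seats + 1, reservations + 1):
        prob = binomial_pmf(k, reservations, p_show)
        p_ok -= prob
        expected_cost += cost_per_passenger * (k - seats) * prob
    return p_ok, expected_cost

p_ok, cost = overbooking_summary(seats=50, reservations=52, p_show=0.95,
                                 cost_per_passenger=300)
print(f"P(all passengers accommodated) = {p_ok:.4f}")
print(f"Expected overbooking cost      = {cost:.2f}")
```

Changing the `reservations` argument gives a quick feeling for how the probability of trouble and the expected cost grow as more reservations are accepted. It does not address the dependence among passengers traveling in groups, which the independence assumption ignores.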
