The geometric distribution is a generalization of the Bernoulli random variable. The underlying conceptual mechanism is the same, but the idea now is to repeat identical and independent Bernoulli trials until we obtain the first success. The number of trials needed to stop the sequence is a random variable X, with unbounded support 1, 2, 3, …. Finding its PMF is easy and is best illustrated by an example.
Example 6.11 The star of a horror movie is being chased by a killer monster, but she somehow manages to get home. The only problem is that she has n keys, and it goes without saying that she does not remember which one opens the door. She starts picking a key at random and trying it, until she finds the right one. What is the probability that the door will open after k trials?
The answer depends on how cool our hero is after the long chase, and whether she is lucid enough to set the nonworking keys apart. If she is not that lucid, then we are within the framework of the geometric distribution. The probability of success is p = 1/n and, since wrong keys are put back in the bunch with the other ones, all of the trials are identical and independent. If the door opens after k trials, this means that she failed k − 1 times. Since the trials are independent, we just take the product of the individual probabilities. Denoting the number of trials by X, we obtain

$$P(X = k) = (1 - p)^{k-1}\, p = \left(1 - \frac{1}{n}\right)^{k-1} \frac{1}{n}, \qquad k = 1, 2, 3, \ldots$$
Note that, in principle, there is no upper bound on the number of trials.
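To make this concrete, here is a minimal Python sketch (not from the text; the function name, the choice n = 5, and the number of replications are illustrative assumptions) that simulates the forgetful strategy, where wrong keys go back in the bunch, and compares the empirical frequencies with (1 − p)^{k−1} p:

```python
import random

def trials_until_door_opens(n):
    """Forgetful strategy: at every attempt a key is drawn at random
    from the full bunch of n keys (wrong keys are put back)."""
    attempts = 0
    while True:
        attempts += 1
        if random.randrange(n) == 0:  # let key 0 be the right key
            return attempts

n, num_sims = 5, 100_000
p = 1 / n
counts = {}
for _ in range(num_sims):
    k = trials_until_door_opens(n)
    counts[k] = counts.get(k, 0) + 1

for k in range(1, 9):
    empirical = counts.get(k, 0) / num_sims
    theoretical = (1 - p) ** (k - 1) * p
    print(f"k={k}: empirical {empirical:.4f}  theoretical {theoretical:.4f}")
```

With many replications the empirical frequencies settle close to the geometric probabilities, including a small but nonzero mass on values of k larger than n.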
If our hero keeps cool and discards the wrong keys, we should carry out a more careful analysis, since now there is some memory in the process. Denoting by F_i the event that trial i fails and by S_k the event that trial k succeeds, the first failure has probability

$$P(F_1) = \frac{n-1}{n}.$$
The second failure has probability

$$P(F_2 \mid F_1) = \frac{n-2}{n-1},$$

since now there is one key less. The pattern is the same for the next trials, and the last failure, at trial k − 1, has probability

$$P(F_{k-1} \mid F_1 \cap \cdots \cap F_{k-2}) = \frac{n-k+1}{n-k+2}.$$
Finally, the probability that trial k is a success, given the previous failures, is

$$P(S_k \mid F_1 \cap \cdots \cap F_{k-1}) = \frac{1}{n-k+1}.$$
Putting everything together, by the multiplication rule the desired probability is

$$P(X = k) = \frac{n-1}{n} \cdot \frac{n-2}{n-1} \cdots \frac{n-k+1}{n-k+2} \cdot \frac{1}{n-k+1} = \frac{1}{n}, \qquad k = 1, \ldots, n,$$

since the product telescopes.
In this case, the distribution of the number of trials boils down to a uniform distribution, which may look a bit disappointing after all of this work. In fact, a much smarter approach is to realize that, if wrong keys are not reinserted in the bunch, the underlying random mechanism is equivalent to throwing the n keys into n bins at random: the probability that the right key lands in any given bin is just 1/n. Note that, without reinserting the keys, the support of the distribution is bounded by a worst-case outcome of n trials.
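The uniform result is also easy to check by simulation. The sketch below (again an illustrative assumption, with a hypothetical function name and n = 5) tries the keys in a random order and discards the wrong ones:

```python
import random

def trials_with_discarding(n):
    """Lucid strategy: try the keys in a random order, setting wrong
    keys apart, so the door opens in at most n attempts."""
    keys = list(range(n))
    random.shuffle(keys)
    return keys.index(0) + 1  # position of the right key (key 0)

n, num_sims = 5, 100_000
counts = [0] * (n + 1)
for _ in range(num_sims):
    counts[trials_with_discarding(n)] += 1

for k in range(1, n + 1):
    print(f"k={k}: empirical {counts[k] / num_sims:.4f}  theoretical {1 / n:.4f}")
```

Every empirical frequency hovers around 1/n = 0.2, and no realization exceeds n, in line with the bounded support noted above.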
The example shows that the PMF of a geometric random variable with parameter p is

$$P(X = k) = p (1 - p)^{k-1}, \qquad k = 1, 2, 3, \ldots$$
Very large values are quite unlikely, but not impossible. Figure 6.4 shows the PMF of a geometric variable for p = 0.2; we see that large values are associated with very small probabilities. Empirically, you will only observe finite realizations, but allowing for extreme, however unlikely, values is important for risk management. By the way, the careful reader should wonder whether the probabilities really add up to one for the geometric distribution. Indeed, recalling the properties of the geometric series, we see that

$$\sum_{k=1}^{\infty} p (1 - p)^{k-1} = p \sum_{j=0}^{\infty} (1 - p)^{j} = \frac{p}{1 - (1 - p)} = 1.$$
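A quick numerical check of this identity, truncating the infinite series at a few illustrative values of K (a sketch, not part of the text):

```python
p = 0.2
for K in (10, 50, 200):
    partial = sum(p * (1 - p) ** (k - 1) for k in range(1, K + 1))
    print(f"sum up to k={K}: {partial:.10f}")
```

The partial sums equal 1 − (1 − p)^K, so they approach one geometrically fast.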
Fig. 6.4 PMF of a geometric distribution with parameter p = 0.2.
We could tackle another series to calculate the expected value of a geometric random variable. A straightforward application of the definition requires calculating

$$E[X] = \sum_{k=1}^{\infty} k\, p (1 - p)^{k-1}.$$
This is a somewhat tedious calculation and is left as an exercise. We will discover a clever trick based on conditional expectation, allowing us to find the result, E[X] = 1/p, quite easily. Using the same trick, it will prove very easy to show that the variance of a geometric random variable is

$$\mathrm{Var}(X) = \frac{1 - p}{p^2}.$$
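Pending that derivation, a Monte Carlo sanity check is straightforward; the sketch below (with an illustrative function name and settings) draws geometric variates by repeating Bernoulli trials and compares the sample mean and variance with 1/p and (1 − p)/p²:

```python
import random
import statistics

def sample_geometric(p):
    """Draw one geometric variate: count Bernoulli trials until the first success."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

p, num_sims = 0.2, 200_000
samples = [sample_geometric(p) for _ in range(num_sims)]
print("mean:    ", statistics.fmean(samples), "  theory:", 1 / p)
print("variance:", statistics.pvariance(samples), "  theory:", (1 - p) / p**2)
```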