The most natural way to characterize a discrete distribution is by its PMF, which can be depicted as a set of bars whose heights are the probabilities of the corresponding values. What happens when we consider a random variable that may take any real value on an interval? A good starting point for building intuition is to go back to descriptive statistics and relative frequency histograms. Imagine taking a sample of values, which are naturally continuous, and plotting the corresponding histogram of relative frequencies. The appearance of the histogram depends on how large the bins are. If they are rather coarse intervals, the histogram will look extremely jagged, like the one in Fig. 7.1(a). If we shrink the bins, we get thinner bars, like those in histogram (b) of the figure. Please note that all of the histograms in Fig. 7.1 refer to the same set of data. You may also notice that the relative frequencies in the second case are lower than in the first one; this happens because there are more bins, and fewer outcomes are assigned to each one. In the limit, if the bins get smaller and smaller and the sample size is large, we get something like histogram (c) in the figure. This looks much like a continuous function, describing where sampled values are more likely to fall, whereas the PMF is just a set of values on a discrete set of points.
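The effect of shrinking the bins can be checked numerically. The following is a minimal sketch (sample size, bin counts, and function names are our own choices, not from the text): the same sample is binned coarsely and finely, and the relative frequencies in each bin drop as the number of bins grows, while they still sum to (almost) 1.

```python
# Illustrative sketch: relative-frequency histograms of one sample,
# with coarse and fine bins (choices of ours, not from the text).
import random

random.seed(42)
sample = [random.gauss(0.0, 1.0) for _ in range(100_000)]

def relative_frequencies(data, n_bins, lo=-4.0, hi=4.0):
    """Relative frequency of the sample in each of n_bins equal bins on [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        if lo <= x < hi:
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1
    return [c / len(data) for c in counts]

coarse = relative_frequencies(sample, 8)   # few wide bins, jagged like Fig. 7.1(a)
fine = relative_frequencies(sample, 80)    # many thin bins, smoother like Fig. 7.1(c)

# More bins -> fewer outcomes per bin -> lower bars, exactly as noted above.
print(max(coarse), max(fine))
```

Both lists of relative frequencies sum to roughly 1 (only the negligible mass outside [−4, 4] is lost), but the tallest fine bar is much lower than the tallest coarse bar.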
To make this idea a bit more precise, let us consider a continuous uniform distribution, which is arguably the simplest distribution we can think of. We got acquainted with the discrete uniform distribution in Section 6.5.2. If we consider a continuous uniform variable on the interval [a, b], we should have a “uniform probability” over that interval, as depicted in Fig. 7.2. Given a point x in the interval, what is the probability that the random variable X takes that value, i.e., P(X = x)? Whatever this value is, it must be the same for all of the points in the interval. We know from Section 6.5.2 that in the discrete case p = 1/n, where n is the number of values in the support, but here we have an infinite number of values within the bounded interval [a, b]. Intuitively, if n → +∞, then p = 1/n → 0. Moreover, if we assign any strictly positive value to p, the sum of probabilities will go to infinity, but we know that probabilities should add up to 1. It is tempting to think that the root of the trouble is that we are dealing with a support consisting of infinitely many possible values. However, this is not really the case. In Section 6.5.6, we considered the Poisson distribution, which does have an infinite support. However, since probabilities vanish for large values of the random variable, their sum does converge to 1. This is not possible here, as we are considering a uniform variable. It seems that there is no way to assign a meaningful probability value to a single outcome in the continuous case.
Fig. 7.1 Frequency histograms for shrinking bin widths.
Fig. 7.2 A uniform distribution on the interval [a, b].
However, there is a way out of the dilemma. We can assign sensible probabilities to intervals, rather than single values. To be concrete, consider a uniform random variable on the interval [0, 10], which is denoted by X ∼ U(0, 10). Common sense suggests the following results:

P(3 ≤ X ≤ 6) = 0.3,    P(5 ≤ X ≤ 8) = 0.3,
Fig. 7.3 A bell-shaped, nonuniform distribution.
as in both cases we are considering an interval whose length is 3, i.e., 30% of the whole support. Notice that this probability depends on the width of the interval we consider, not on its location; indeed, this is what makes a distribution uniform, after all. More generally, it seems that if we consider an interval of width w included in [0, 10], the probability that X falls there should be the ratio between w and the width of the whole support: w/10. By the way, we recall from elementary geometry that a point has “length” zero; hence, we begin to feel that in fact P(X = x) = 0 for any value x.
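The rule P = w/10 is easy to check by simulation. The sketch below (the sample size, seed, and choice of intervals are ours) draws from U(0, 10) and estimates the probability of two intervals of width 3 at different locations; both estimates should land near 0.3.

```python
# Quick numerical check that, for X ~ U(0, 10), the probability of an
# interval depends only on its width w: P = w/10 (sketch of ours).
import random

random.seed(0)
draws = [random.uniform(0.0, 10.0) for _ in range(200_000)]

def estimate(lo, hi):
    """Fraction of the simulated draws falling in [lo, hi]."""
    return sum(lo <= x <= hi for x in draws) / len(draws)

p1 = estimate(3, 6)   # width 3 -> should be near 3/10 = 0.3
p2 = estimate(5, 8)   # same width, different location -> also near 0.3
print(p1, p2)
```

The two estimates agree with each other and with w/10, regardless of where the interval sits inside the support.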
So far, so good, but what about a nonuniform distribution, like that in Fig. 7.3? If we consider several intervals of the same width, we cannot associate the same probability with them. Probability should be related to the height of the distribution, which is not constant. Hence, the length of subintervals will not do the trick. Nevertheless, to keep the shape of the distribution duly into account, we may associate probability of an interval with the area under the distribution, over that interval. The concept is illustrated in Fig. 7.4. If we shift the interval [a, b] in the uniform case, we always get the same area, provided that the interval is a subset of the whole support; if we do the same in the bell-shaped case, we get quite different results. We start seeing that probabilities in the continuous case
- Are distributed, whereas they are concentrated at a discrete set of points in the discrete case; since P(X = x) = 0 for every single value x, we cannot work with a probability mass function associating values with probabilities.

Fig. 7.4 Linking probability to areas.
- May be associated with areas below a function that replaces the PMF, but plays a similar role; this function is the probability density function (PDF). We will denote the PDF of random variable X by f_X(x).
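The link between probability and area can be made concrete with a little numerical integration. In this sketch (the density, intervals, and tolerances are our own stand-ins, not from the text) we use the standard normal density as the bell curve of Fig. 7.3 and approximate areas with the trapezoidal rule: two intervals of the same width get very different probabilities depending on where they sit.

```python
# Sketch: probability of an interval as the area under a bell-shaped PDF.
# The standard normal density is our stand-in for the curve in Fig. 7.3.
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def area_under(f, a, b, n=10_000):
    """Trapezoidal approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

# Same-width intervals, different locations: very different areas.
near_center = area_under(normal_pdf, -0.5, 0.5)   # tall part of the curve
in_tail = area_under(normal_pdf, 2.5, 3.5)        # low part of the curve
total_area = area_under(normal_pdf, -8.0, 8.0)    # whole support, essentially
print(near_center, in_tail, total_area)
```

The central interval captures a large area, the tail interval a tiny one, and the area under the whole curve is (numerically) 1, anticipating the normalization condition discussed next.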
To wrap up our intuitive reasoning, we should state one fundamental property of the PDF. When dealing with discrete variables, we know that probabilities add up to 1:

∑_i p_X(x_i) = 1.    (7.1)
For continuous variables, it must be the case that
P(−∞ < X < +∞) = 1,
implying that the overall area below the PDF must be 1. But we also recall from Section 2.13 that this area can be expressed by an integral. Therefore, condition (7.1) should be replaced by

∫_{−∞}^{+∞} f_X(x) dx = 1.    (7.2)
If we are dealing with a uniform variable with support [a, b], the PDF vanishes outside that interval, so that

∫_a^b f_X(x) dx = 1.
This is just a condition on the area of a rectangle with one edge of length (b − a), and the other one corresponding to the value of the PDF. Therefore:

f_X(x) = 1/(b − a), for x ∈ [a, b] (and zero outside).
If the support is the interval [0, 10], then

f_X(x) = 1/10 = 0.1,
as expected. With more general distributions, we have some more technical difficulties in calculating areas, but there are plenty of statistical tables and software packages taking care of this task for us.
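For the uniform case, no tables or packages are needed: the PDF is a constant and interval probabilities are rectangle areas. A minimal sketch (function names are ours):

```python
# Sketch: the uniform PDF on [a, b] is the constant 1/(b - a), and
# interval probabilities are rectangle areas: height * width.
def uniform_pdf(x, a, b):
    return 1.0 / (b - a) if a <= x <= b else 0.0

a, b = 0.0, 10.0
height = uniform_pdf(5.0, a, b)     # 1/10 = 0.1, as derived above
total_area = height * (b - a)       # the whole rectangle: must be 1

def interval_prob(c, d, a=0.0, b=10.0):
    """P(c <= X <= d) for an interval [c, d] inside the support [a, b]."""
    return uniform_pdf(0.5 * (c + d), a, b) * (d - c)  # height * width

print(height, total_area, interval_prob(3, 6))
```

With this density, P(3 ≤ X ≤ 6) comes out as 0.1 × 3 = 0.3, matching the w/10 rule obtained earlier from common sense.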