Median and quantiles for continuous random variables

Roughly speaking, the median is a value splitting a dataset into two equal parts. When dealing with continuous random variables, we find that the median is a value mX such that

$$P\{X \le m_X\} = P\{X \ge m_X\} = 0.5$$

Geometrically, the median splits the area under the PDF into two parts, each with area 0.5. In descriptive statistics, the median can be regarded as a specific case of a percentile, the one corresponding to a 50% probability. In probability theory, the term percentile is usually replaced by quantile.
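As a minimal numerical sketch of this geometric reading, consider an exponential distribution with rate `lam = 2.0`. The distribution and the rate are illustrative assumptions, not taken from the text. Its median has the closed form ln 2 / lam, and a simple midpoint-rule integration (the helper `area` is hypothetical) checks that the area under the PDF is about 0.5 on each side.

```python
import math

lam = 2.0  # assumed rate of an exponential distribution (illustrative choice)

def pdf(x):
    """PDF of the exponential distribution with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def cdf(x):
    """CDF of the exponential distribution with rate lam."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

# Solving 1 - exp(-lam * m) = 0.5 for m gives the median in closed form.
median = math.log(2.0) / lam

def area(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

left_area = area(pdf, 0.0, median)
# The upper limit 40.0 stands in for +infinity; the tail beyond it is negligible.
right_area = area(pdf, median, 40.0)

print(f"median     = {median:.6f}")
print(f"left area  = {left_area:.6f}")
print(f"right area = {right_area:.6f}")
```

Both areas should come out very close to 0.5, up to the integration error.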

DEFINITION 7.1 (Quantiles of continuous random variable) Given the CDF FX(x) of a continuous random variable and a probability level α ∈ [0, 1], we define the quantile xα of the distribution as the number satisfying the equation

$$F_X(x_\alpha) = \alpha$$

Fig. 7.7 Probability and quantiles for a continuous random variable.


Geometrically, the quantile xα is a number leaving an area α to its left, under the PDF. Conceptually, computing a quantile requires inversion of the CDF, as illustrated in Fig. 7.7. Be sure to understand this figure, as quantiles play a prominent role in many applications to follow:

  1. Given a value xβ, we may find the corresponding probability β = P{X ≤ xβ} by evaluating the CDF FX(xβ).
  2. Given a probability α, we may find the corresponding quantile $x_\alpha = F_X^{-1}(\alpha)$, which is a value, by inverting the CDF.
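The two directions listed above can be sketched with Python's standard library, where `statistics.NormalDist` provides both `cdf` and `inv_cdf`. The standard normal distribution and the particular values 1.0 and 0.95 are assumptions chosen only for illustration.

```python
from statistics import NormalDist

# Assumed example distribution: standard normal (mean 0, standard deviation 1).
X = NormalDist(mu=0.0, sigma=1.0)

# Direction 1: from a value to a probability, by evaluating the CDF.
x_beta = 1.0
beta = X.cdf(x_beta)

# Direction 2: from a probability to a quantile, by inverting the CDF.
alpha = 0.95
x_alpha = X.inv_cdf(alpha)

print(f"P(X <= {x_beta}) = {beta:.4f}")
print(f"quantile at level {alpha} = {x_alpha:.4f}")

# Round trip: evaluating the CDF at the quantile recovers the probability.
print(f"F(x_alpha) = {X.cdf(x_alpha):.4f}")
```

Note that `inv_cdf` accepts only probabilities strictly between 0 and 1 and raises `statistics.StatisticsError` otherwise, which is consistent with the quantiles at levels 0 and 1 being infinite for a distribution with unbounded support.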

A natural question is whether the CDF is in fact an invertible function. In most cases, when dealing with continuous random variables, the CDF is a strictly increasing and continuous function; hence, inverting it poses no difficulty. When the support is unbounded, we cannot really find quantiles corresponding to probabilities 0 and 1, and we should set x0 = −∞ and x1 = +∞. In general, however, there is no guarantee of finding a unique quantile, as the CDF may be a nondecreasing function that is constant on certain intervals, rather than a strictly increasing function. This may happen if the support of the distribution consists of disjoint intervals.
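When the CDF is continuous and strictly increasing but its inverse is not available in closed form, the quantile can be found numerically. The following is a minimal sketch using bisection; the function `quantile_by_bisection`, the bracketing interval, and the logistic CDF used as a test case are all illustrative assumptions rather than anything prescribed by the text.

```python
import math

def quantile_by_bisection(cdf, alpha, lo, hi, tol=1e-10):
    """Solve cdf(x) = alpha on [lo, hi] by bisection.

    Assumes cdf is continuous and strictly increasing on [lo, hi]
    and that the interval brackets the quantile.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not cdf(lo) <= alpha <= cdf(hi):
        raise ValueError("the interval [lo, hi] does not bracket the quantile")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < alpha:
            lo = mid  # the quantile lies to the right of mid
        else:
            hi = mid  # the quantile lies at mid or to its left
    return 0.5 * (lo + hi)

def logistic_cdf(x):
    """CDF of the standard logistic distribution (assumed test case)."""
    return 1.0 / (1.0 + math.exp(-x))

x_90 = quantile_by_bisection(logistic_cdf, 0.90, -50.0, 50.0)

# For the logistic distribution the exact quantile is ln(alpha / (1 - alpha)).
print(f"bisection: {x_90:.6f}, exact: {math.log(9.0):.6f}")
```

The logistic distribution is convenient here only because its quantile is known exactly, so the numerical answer can be checked.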

Example 7.3 Consider values xa < xb < xc < xd and a continuous random variable X whose support consists of the disjoint intervals [xa, xb] and [xc, xd]. Since X cannot assume values between xb and xc, the CDF is constant on the interval [xb, xc], and FX(xb) = FX(xc) = α, for some probability value α. The quantile xα is then not uniquely defined, since every point of [xb, xc] satisfies FX(x) = α and the CDF cannot be inverted on that interval.
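To make Example 7.3 concrete, the sketch below assumes the specific values xa = 0, xb = 1, xc = 2, xd = 3 and a density equal to 0.5 on each of the two intervals. These numbers are hypothetical and chosen only so that the flat portion of the CDF sits at α = 0.5.

```python
def cdf(x):
    """CDF for an assumed density of 0.5 on [0, 1] and on [2, 3], zero elsewhere."""
    if x < 0.0:
        return 0.0
    if x <= 1.0:
        return 0.5 * x                 # increasing on the first interval
    if x < 2.0:
        return 0.5                     # flat: no probability mass between 1 and 2
    if x <= 3.0:
        return 0.5 + 0.5 * (x - 2.0)   # increasing on the second interval
    return 1.0

# Several distinct points share the same CDF value 0.5.
candidates = [1.0, 1.25, 1.5, 2.0]
values = [cdf(x) for x in candidates]

for x, v in zip(candidates, values):
    print(f"F({x}) = {v}")
```

Every candidate satisfies F(x) = 0.5, so the equation defining the quantile at level 0.5 has no unique solution, whereas a level such as 0.25 is attained only at x = 0.5.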

The example may look somewhat pathological, but in fact this is what happens with discrete random variables. This is why quantiles need to be defined in a more general way.

Table 7.1 PMF and CDF for the discrete probability distribution of Example 7.4.


Fig. 7.8 The CDF for a discrete random variable is not invertible.

