Median and quantiles for continuous random variables

Roughly speaking, the median is a value splitting a dataset into two equal parts. For a continuous random variable X, the median is a value m_X such that

P{X ≤ m_X} = P{X ≥ m_X} = 0.5

Geometrically, the median splits the area under the PDF into two parts, each with area 0.5. In descriptive statistics, the median can be regarded as a specific percentile, the one corresponding to a 50% probability. In probability theory, the term percentile is usually replaced by quantile.
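As a small illustration of the condition defining the median, the sketch below takes an exponential distribution, which is not part of the original text and is chosen only because its CDF, F_X(x) = 1 − e^(−λx), can be inverted by hand. The rate value and the function names are hypothetical.

```python
import math

def exp_cdf(x, lam):
    """CDF of an exponential distribution with rate lam (illustrative choice)."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def exp_median(lam):
    """Solve F_X(m) = 0.5 in closed form, which gives m = ln(2) / lam."""
    return math.log(2.0) / lam

lam = 2.0                      # hypothetical rate parameter
m = exp_median(lam)
print(m, exp_cdf(m, lam))      # the second number should be 0.5 up to rounding
```

Evaluating the CDF at the computed median returns 0.5, which is the "area 0.5 on each side" statement in numerical form.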

DEFINITION 7.1 (Quantiles of a continuous random variable) Given the CDF F_X(x) of a continuous random variable and a probability level α ∈ [0, 1], we define the quantile x_α of the distribution as the number satisfying the equation

F_X(x_α) = α

Fig. 7.7 Probability and quantiles for a continuous random variable.

Geometrically, the quantile x_α is a number leaving an area α to its left, under the PDF. Conceptually, computing a quantile requires inversion of the CDF, as illustrated in Fig. 7.7. Be sure to understand this figure, as quantiles play a prominent role in many applications to follow:

Given a value x_β, we may find the corresponding probability β = P{X ≤ x_β} by evaluating the CDF F_X(x_β).
Given a probability α, we may find the corresponding quantile x_α by inverting the CDF, i.e., by solving F_X(x_α) = α for x_α.
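The two directions can be tried out with Python's standard library. The sketch below assumes a standard normal distribution, which the text does not specify; it is used here only because `statistics.NormalDist` provides both the CDF and its inverse.

```python
from statistics import NormalDist

# Standard normal distribution, chosen purely for illustration
X = NormalDist(mu=0.0, sigma=1.0)

# From value to probability: beta = P{X <= x_beta} = F_X(x_beta)
x_beta = 1.0
beta = X.cdf(x_beta)

# From probability to quantile: x_alpha solves F_X(x_alpha) = alpha
alpha = 0.95
x_alpha = X.inv_cdf(alpha)

print(f"beta = {beta:.4f}, x_alpha = {x_alpha:.4f}")
```

Applying `X.cdf` to `x_alpha` recovers `alpha` up to rounding, which confirms that the two operations are inverses of each other.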

A natural question is whether the CDF is in fact an invertible function. In most cases, when dealing with continuous random variables, the CDF is a strictly increasing and continuous function; hence, inverting the function poses no difficulty. When the support is infinite, we cannot really find quantiles corresponding to probabilities 0 and 1, and we should set x₀ = −∞ and x₁ = +∞. In general, however, there is no guarantee of finding a unique quantile, as the CDF may be a nondecreasing function that is constant on certain intervals, rather than a strictly increasing function. This may happen if the support of the distribution consists of disjoint intervals.
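When the CDF is strictly increasing and continuous but has no convenient closed-form inverse, the quantile can be found numerically. The following is a minimal bisection sketch, not a method prescribed by the text. The logistic CDF, the function names, and the search bracket [−50, 50] are assumptions made for illustration; the bracket must contain the quantile being sought.

```python
import math

def logistic_cdf(x):
    """Strictly increasing, continuous CDF with infinite support (illustrative)."""
    return 1.0 / (1.0 + math.exp(-x))

def quantile(cdf, alpha, lo=-50.0, hi=50.0, tol=1e-10):
    """Solve cdf(x) = alpha by bisection on the assumed bracket [lo, hi]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # With infinite support, the extreme quantiles are set by convention
    if alpha == 0.0:
        return -math.inf
    if alpha == 1.0:
        return math.inf
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < alpha:
            lo = mid      # quantile lies to the right of mid
        else:
            hi = mid      # quantile lies at mid or to its left
    return 0.5 * (lo + hi)

print(quantile(logistic_cdf, 0.5))   # median of the logistic distribution
print(quantile(logistic_cdf, 0.9))   # compare with math.log(9)
```

Bisection relies only on monotonicity, so it converges whenever the CDF is strictly increasing on the bracket. If the CDF is flat at level α, it still returns a value, but that value is just one of many solutions.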

Example 7.3 Consider values x_a < x_b < x_c < x_d and a continuous random variable X whose support consists of the disjoint intervals [x_a, x_b] and [x_c, x_d]. Since X cannot assume values between x_b and x_c, the CDF is constant on the interval [x_b, x_c], and F_X(x_b) = F_X(x_c) = α, for some probability value α. The quantile x_α is then not uniquely defined: every point of [x_b, x_c] satisfies F_X(x) = α, so the CDF cannot be inverted at that level.
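A concrete instance of this situation can be checked numerically. The sketch below assumes hypothetical values x_a = 0, x_b = 1, x_c = 2, x_d = 3 and a density equal to 0.5 on each of the two intervals, so that α = 0.5; none of these numbers come from the text.

```python
def cdf(x):
    """CDF of a variable with density 0.5 on [0, 1] and on [2, 3] (hypothetical)."""
    if x < 0.0:
        return 0.0
    if x <= 1.0:
        return 0.5 * x                  # first piece of the support
    if x < 2.0:
        return 0.5                      # gap: no probability mass, CDF is flat
    if x <= 3.0:
        return 0.5 + 0.5 * (x - 2.0)    # second piece of the support
    return 1.0

# Every point in [x_b, x_c] = [1, 2] solves F_X(x) = 0.5
candidates = [1.0, 1.25, 1.5, 2.0]
print([cdf(x) for x in candidates])
```

All candidate points return the same CDF value, so the equation F_X(x) = 0.5 does not single out one quantile.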

The example may look somewhat pathological, but in fact this is what happens with discrete random variables. This is why quantiles need to be defined in a more general way.

Table 7.1 PMF and CDF for the discrete probability distribution of Example 7.4.

Fig. 7.8 The CDF for a discrete random variable is not invertible.
