Location measures: mean, median, and mode

We are all familiar with the idea of taking averages. Indeed, the most natural location measure is the mean.

DEFINITION 4.5 (Mean for a sample and a population) The mean for a population of size n is defined as

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

The mean for a sample of size n is

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

The two definitions above may seem somewhat puzzling, since they look identical. However, there is an essential difference between the two concepts. The mean of the population is a well-defined number, which we often denote by μ. If collecting information about the whole population is not feasible, we take a sample, resulting in a mean $\bar{X}$. But if we take two different samples, possibly random ones, we will get different values for the mean. We will discover that the population mean is related to the concept of expected value in probability theory, whereas the sample mean is used in inferential statistics as a way to estimate the (unknown) expected value. The careful reader might also have noticed that we have used a lowercase letter $x_i$ when defining the mean of a population and an uppercase letter $X_i$ for the mean of a sample. Again, this is to reinforce the conceptual difference between them: we will use lowercase letters to denote numbers and uppercase letters to denote random variables. Observations in a random sample are, indeed, random variables.

Example 4.7 We want to estimate the mean number of cars entering a parking lot every 10 minutes. The following 10 observations have been gathered over 10 nonoverlapping time periods of 10 minutes: 10, 22, 31, 9, 24, 27, 29, 9, 23, 12. The sample mean is

$$\bar{X} = \frac{10 + 22 + 31 + 9 + 24 + 27 + 29 + 9 + 23 + 12}{10} = \frac{196}{10} = 19.6$$

Note that the mean of integer numbers can be a fractional number. Also note that a single large observation can affect the sample mean considerably. If, for some odd reason, the first observation were 1000, then

$$\bar{X} = \frac{1000 + 22 + 31 + 9 + 24 + 27 + 29 + 9 + 23 + 12}{10} = \frac{1186}{10} = 118.6$$
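As a quick check of these computations, here is a minimal Python sketch (the data are those of Example 4.7; the variable names are our own):

```python
# Observations from Example 4.7: cars entering a parking lot per 10 minutes.
data = [10, 22, 31, 9, 24, 27, 29, 9, 23, 12]

mean = sum(data) / len(data)
print(mean)  # 19.6

# Replace the first observation with an extreme value, as in the text.
data_outlier = [1000] + data[1:]
print(sum(data_outlier) / len(data_outlier))  # 118.6
```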

The previous example illustrates the definition of the mean, but with a large dataset it may be convenient to use frequencies or relative frequencies. If we are given n observations, grouped into C classes with frequencies $f_k$, the sample mean is

$$\bar{X} = \frac{1}{n} \sum_{k=1}^{C} f_k y_k \qquad (4.1)$$

Here, $y_k$ is a value representative of class k. Note that $y_k$ need not be an observed value. In fact, when dealing with continuous variables, $y_k$ might be the midpoint of each bin; clearly, in such a case grouping data results in a loss of information and should be avoided when possible. When variables are integer, a single value can be associated with each class, and no difficulty arises.

Example 4.8 Consider the data in Table 4.6, which contains days of unjustified absence per year for a group of employees. Then: C = 6, $y_k = k - 1$ for $k = 1, \ldots, 6$, $n = \sum_{k=1}^{6} f_k = 1440$, and

$$\bar{X} = \frac{0 \cdot 410 + 1 \cdot 430 + 2 \cdot 290 + 3 \cdot 180 + 4 \cdot 110 + 5 \cdot 20}{1440} = \frac{2090}{1440} \approx 1.4514$$
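A small Python sketch of Eq. (4.1) applied to the Table 4.6 data (names of our own choosing) reproduces this value:

```python
# Days of unjustified absence (Table 4.6): class values and frequencies.
values = [0, 1, 2, 3, 4, 5]             # y_k
freqs = [410, 430, 290, 180, 110, 20]   # f_k

n = sum(freqs)                          # 1440 employees
mean = sum(f * y for f, y in zip(freqs, values)) / n
print(n, round(mean, 4))                # 1440 1.4514
```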

If relative frequencies $p_k = f_k / n$ are given, the mean is calculated as

$$\bar{X} = \sum_{k=1}^{C} p_k y_k$$

where again $y_k$ is the value associated with class k. It is easy to see that this is equivalent to Eq. (4.1). In this case, we are computing a weighted average of values, where the weights are nonnegative and add up to one.
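The equivalence is also easy to check numerically; the following sketch, again on the Table 4.6 data, computes the weighted average with relative frequencies and recovers the same mean:

```python
values = [0, 1, 2, 3, 4, 5]
freqs = [410, 430, 290, 180, 110, 20]
n = sum(freqs)

# Relative frequencies p_k = f_k / n; nonnegative and summing to one.
probs = [f / n for f in freqs]
mean = sum(p * y for p, y in zip(probs, values))
print(round(mean, 4))  # 1.4514, the same value as with Eq. (4.1)
```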

The median, sometimes denoted by m, is another measure of central tendency. Informally, it is the value of the middle term in a dataset that has been ranked in increasing order.

Table 4.6 Data for Example 4.8.

Days of absence    Frequency
0                        410
1                        430
2                        290
3                        180
4                        110
5                         20

Example 4.9 Consider the dataset: 10, 5, 19, 8, 3. Ranking the dataset (3, 5, 8, 10, 19), we see that the median is 8.

More generally, with a dataset of size n, where n is odd, the median is the order statistic

$$m = X_{\left( \frac{n+1}{2} \right)}$$

An obvious question is: What happens if we have an even number of elements? In such a case, we take the average of the two middle terms, i.e., the elements in positions n/2 and n/2 + 1.
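A minimal sketch of this rule (a helper of our own, not a library function; note the shift from the 1-based positions in the text to Python's 0-based indexing):

```python
def median(sample):
    """Median via the order-statistic rule: the middle element for odd n,
    the average of the two middle elements for even n."""
    x = sorted(sample)                       # rank in increasing order
    n = len(x)
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]           # position (n+1)/2, 0-based index
    return (x[n // 2 - 1] + x[n // 2]) / 2   # positions n/2 and n/2 + 1

print(median([10, 5, 19, 8, 3]))             # 8, as in Example 4.9
```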

Example 4.10 Consider the ordered observations

$$X_{(1)} = 74.1 \le X_{(2)} \le \cdots \le X_{(12)} \qquad (4.2)$$

We have n = 12 observations; since (n + 1)/2 = 6.5, we take the average of the sixth and seventh observations:

$$m = \frac{X_{(6)} + X_{(7)}}{2}$$

The median is less sensitive than the mean to extreme data (possibly outliers). To see this, consider the dataset (4.2) and imagine replacing the smallest observation, $X_{(1)} = 74.1$, with a very small number. The mean is likely to be affected significantly, as the sample size is very small, but the median does not change. The same happens if we change $X_{(12)}$, i.e., the largest observation in the sample. This is useful when the sample is small and chances are that an outlier enters the dataset. Generally speaking, some statistics are more robust than others, and they should be considered when we have a small dataset possibly contaminated by outliers.
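Since the individual observations of (4.2) are not reproduced here, the following sketch illustrates the same point on the small dataset of Example 4.9:

```python
data = [10, 5, 19, 8, 3]

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    x = sorted(xs)
    n = len(x)
    return x[n // 2] if n % 2 else (x[n // 2 - 1] + x[n // 2]) / 2

print(mean(data), median(data))    # 9.0 8

# Replace the smallest observation with a very small number.
data2 = [10, 5, 19, 8, -1000]
print(mean(data2), median(data2))  # -191.6 8: the mean shifts, the median does not
```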

The median can also be used to measure skewness. Observing the histograms in Fig. 4.5, we may notice that:

  • For a perfectly symmetric distribution, the mean and the median are the same.
  • For a right-skewed distribution [see histogram (a) in Fig. 4.5], the mean is larger than the median (and we speak of a positively skewed distribution); this happens because we have rather unlikely, but very large values that bias the mean to the right with respect to the median.
  • By the same token, for a left-skewed distribution [see histogram (b) in Fig. 4.5], the mean is smaller than the median (and we speak of a negatively skewed distribution).

Fig. 4.5 Bar charts illustrating right- and left-skewed distributions.

In descriptive statistics there is no standard definition of skewness, but one possible definition, suggested by K. Pearson, is

$$\text{skew} = \frac{3 (\bar{X} - m)}{\sigma}$$

where m is the median and σ is the standard deviation, a measure of dispersion defined in the next section. This definition indeed shows how the difference between mean and median can be used to quantify skewness.11
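A sketch of this computation on an illustrative right-skewed sample of our own (we use the sample standard deviation from Python's statistics module; the choice of divisor is discussed with dispersion measures in the next section):

```python
import statistics

# A right-skewed sample: a few unusually large values pull the mean up.
data = [2, 3, 3, 4, 4, 5, 5, 6, 15, 20]

x_bar = statistics.mean(data)    # 6.7
m = statistics.median(data)      # 4.5
sigma = statistics.stdev(data)   # sample standard deviation (divisor n - 1)

skew = 3 * (x_bar - m) / sigma
print(round(skew, 3))            # 1.114, positive: mean > median
```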

Finally, another summary measure is the mode, which corresponds to the most frequent value. In the histograms of Fig. 4.5 the mode corresponds to the highest bar in the plot. In some cases, mean, mode, and median are the same. This happens in histogram (a) of Fig. 4.6. It might be tempting to generalize and say that the three measures are the same for a symmetric distribution, but a quick glance at Fig. 4.6(b) shows that this need not be the case.
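For discrete data, the mode can be found by simple counting; a minimal sketch using collections.Counter on the Table 4.6 data:

```python
from collections import Counter

# Days of absence from Table 4.6, expanded into raw observations.
freqs = {0: 410, 1: 430, 2: 290, 3: 180, 4: 110, 5: 20}
observations = [day for day, f in freqs.items() for _ in range(f)]

counts = Counter(observations)
mode, count = counts.most_common(1)[0]
print(mode, count)  # 1 430: one day of absence is the most frequent value
```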


Fig. 4.6 Single and bimodal distributions.


Fig. 4.7 A bimodal distribution.

Example 4.11 The histogram in Fig. 4.6(b) is somewhat pathological, as it has two modes. A more common occurrence is illustrated in Fig. 4.7, where there is one true mode (the “globally maximum” frequency) but also a secondary mode (a “locally maximum” frequency). A situation like this might be the result of sampling variability, in which case the secondary mode is just noise. In other cases, it might be the effect of a complex phenomenon, and simply “smoothing away” the secondary mode would be a mistake. We may list a few practical examples in which a secondary mode might arise:

  • The delivery lead time from a supplier, i.e., the time elapsing between issuing an order and receiving the shipment. Lead time may feature a little variability because of transportation times, but a rather long lead time may occur when the supplier runs out of stock. Ignoring this additional uncertainty may result in poor customer service.
  • Consider the repair time of a piece of manufacturing equipment. We may typically observe ordinary faults that take only a little time to repair, but occasionally we may have a major fault that takes much more time to fix.
  • Quite often, in order to compare student grades across universities in different countries, histograms are prepared for each university and they are somehow matched in order to define fair conversion rules. Usually, this is done by implicitly assuming that there is a “standard” grade, to which some variability is superimposed. The truth is that the student population is far from uniform; we may have a secondary mode for the subset of more skilled students, who actually constitute a different population than ordinary students.12
