Location measures: mean, median, and mode

We are all familiar with the idea of taking averages. Indeed, the most natural location measure is the mean.

DEFINITION 4.5 (Mean for a sample and a population) The mean for a population of size n is defined as

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

The mean for a sample of size n is

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

The two definitions above may seem somewhat puzzling, since they look identical. However, there is an essential difference between the two concepts. The mean of the population is a well-defined number, which we often denote by μ. If collecting information about the whole population is not feasible, we take a sample, resulting in a mean $\bar{X}$. But if we take two different samples, possibly random ones, we will get different values for the mean. We will discover that the population mean is related to the concept of expected value in probability theory, whereas the sample mean is used in inferential statistics as a way to estimate the (unknown) expected value. The careful reader might also have noticed that we have used a lowercase letter $x_i$ when defining the mean of a population and an uppercase letter $X_i$ for the mean of a sample. Again, this is to reinforce the conceptual difference between them: we will use lowercase letters to denote numbers and uppercase letters to denote random variables. Observations in a random sample are, indeed, random variables.

Example 4.7 We want to estimate the mean number of cars entering a parking lot every 10 minutes. The following 10 observations have been gathered over 10 nonoverlapping time periods of 10 minutes: 10, 22, 31, 9, 24, 27, 29, 9, 23, 12. The sample mean is

$$\bar{X} = \frac{10 + 22 + 31 + 9 + 24 + 27 + 29 + 9 + 23 + 12}{10} = \frac{196}{10} = 19.6$$

Note that the mean of integer numbers can be a fractional number. Also note that a single large observation can affect the sample mean considerably. If, for some odd reason, the first observation were 1000, then

$$\bar{X} = \frac{1000 + 22 + 31 + 9 + 24 + 27 + 29 + 9 + 23 + 12}{10} = \frac{1186}{10} = 118.6$$
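As a quick check of these computations, here is a minimal Python sketch (the data are those of Example 4.7; the variable names are our own):

```python
# Observations from Example 4.7: cars entering a parking lot per 10 minutes.
data = [10, 22, 31, 9, 24, 27, 29, 9, 23, 12]

mean = sum(data) / len(data)
print(mean)  # 19.6

# Replace the first observation with an extreme value, as in the text.
data_outlier = [1000] + data[1:]
print(sum(data_outlier) / len(data_outlier))  # 118.6
```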

The previous example illustrates the definition of the mean, but with a large dataset it may be convenient to use frequencies or relative frequencies. If we are given n observations, grouped into C classes with frequencies $f_k$, the sample mean is

$$\bar{X} = \frac{1}{n} \sum_{k=1}^{C} f_k y_k \qquad (4.1)$$

Here, $y_k$ is a value representative of class k. Note that $y_k$ need not be an observed value. In fact, when dealing with continuous variables, $y_k$ might be the midpoint of each bin; clearly, in such a case grouping data results in a loss of information and should be avoided when possible. When variables are integer, a single value can be associated with each class, and no difficulty arises.

Example 4.8 Consider the data in Table 4.6, which contains days of unjustified absence per year for a group of employees. Then: C = 6, $y_k = k - 1$ for $k = 1, \ldots, 6$, $n = \sum_{k=1}^{6} f_k = 1440$, and

$$\bar{X} = \frac{0 \cdot 410 + 1 \cdot 430 + 2 \cdot 290 + 3 \cdot 180 + 4 \cdot 110 + 5 \cdot 20}{1440} = \frac{2090}{1440} \approx 1.4514$$
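A small Python sketch of Eq. (4.1) applied to the Table 4.6 data (names of our own choosing) reproduces this value:

```python
# Days of unjustified absence (Table 4.6): class values and frequencies.
values = [0, 1, 2, 3, 4, 5]             # y_k
freqs = [410, 430, 290, 180, 110, 20]   # f_k

n = sum(freqs)                          # 1440 employees
mean = sum(f * y for f, y in zip(freqs, values)) / n
print(n, round(mean, 4))                # 1440 1.4514
```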

If relative frequencies $p_k = f_k / n$ are given, the mean is calculated as

$$\bar{X} = \sum_{k=1}^{C} p_k y_k$$

where again $y_k$ is the value associated with class k. It is easy to see that this is equivalent to Eq. (4.1). In this case, we are computing a weighted average of values, where the weights are nonnegative and add up to one.
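The equivalence is also easy to check numerically; the following sketch, again on the Table 4.6 data, computes the weighted average with relative frequencies and recovers the same mean:

```python
values = [0, 1, 2, 3, 4, 5]
freqs = [410, 430, 290, 180, 110, 20]
n = sum(freqs)

# Relative frequencies p_k = f_k / n; nonnegative and summing to one.
probs = [f / n for f in freqs]
mean = sum(p * y for p, y in zip(probs, values))
print(round(mean, 4))  # 1.4514, the same value as with Eq. (4.1)
```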

The median, sometimes denoted by m, is another measure of central tendency. Informally, it is the value of the middle term in a dataset that has been ranked in increasing order.

Table 4.6 Data for Example 4.8.

Days of absence    Frequency
0                        410
1                        430
2                        290
3                        180
4                        110
5                         20

Example 4.9 Consider the dataset: 10, 5, 19, 8, 3. Ranking the dataset (3, 5, 8, 10, 19), we see that the median is 8.

More generally, with a dataset of size n, where n is odd, the median is the order statistic

$$m = X_{\left( \frac{n+1}{2} \right)}$$

An obvious question is: What happens if we have an even number of elements? In such a case, we take the average of the two middle terms, i.e., the elements in positions n/2 and n/2 + 1.
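A minimal sketch of this rule (a helper of our own, not a library function; note the shift from the 1-based positions in the text to Python's 0-based indexing):

```python
def median(sample):
    """Median via the order-statistic rule: the middle element for odd n,
    the average of the two middle elements for even n."""
    x = sorted(sample)                       # rank in increasing order
    n = len(x)
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]           # position (n+1)/2, 0-based index
    return (x[n // 2 - 1] + x[n // 2]) / 2   # positions n/2 and n/2 + 1

print(median([10, 5, 19, 8, 3]))             # 8, as in Example 4.9
```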

Example 4.10 Consider the ordered observations

$$X_{(1)} = 74.1 \le X_{(2)} \le \cdots \le X_{(12)} \qquad (4.2)$$

We have n = 12 observations; since (n + 1)/2 = 6.5, we take the average of the sixth and seventh observations:

$$m = \frac{X_{(6)} + X_{(7)}}{2}$$

The median is less sensitive than the mean to extreme data (possibly outliers). To see this, consider the dataset (4.2) and imagine replacing the smallest observation, $X_{(1)} = 74.1$, with a very small number. The mean is likely to be affected significantly, as the sample size is very small, but the median does not change. The same happens if we change $X_{(12)}$, i.e., the largest observation in the sample. This is useful when the sample is small and chances are that an outlier enters the dataset. Generally speaking, some statistics are more robust than others, and they should be considered when we have a small dataset possibly contaminated by outliers.
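Since the individual observations of (4.2) are not reproduced here, the following sketch illustrates the same point on the small dataset of Example 4.9:

```python
data = [10, 5, 19, 8, 3]

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    x = sorted(xs)
    n = len(x)
    return x[n // 2] if n % 2 else (x[n // 2 - 1] + x[n // 2]) / 2

print(mean(data), median(data))    # 9.0 8

# Replace the smallest observation with a very small number.
data2 = [10, 5, 19, 8, -1000]
print(mean(data2), median(data2))  # -191.6 8: the mean shifts, the median does not
```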

The median can also be used to measure skewness. Observing the histograms in Fig. 4.5, we may notice that:

  • For a perfectly symmetric distribution, the mean and the median are the same.
  • For a right-skewed distribution [see histogram (a) in Fig. 4.5], the mean is larger than the median (and we speak of a positively skewed distribution); this happens because we have rather unlikely, but very large values that bias the mean to the right with respect to the median.
  • By the same token, for a left-skewed distribution [see histogram (b) in Fig. 4.5], the mean is smaller than the median (and we speak of a negatively skewed distribution).

Fig. 4.5 Bar charts illustrating right- and left-skewed distributions.

In descriptive statistics there is no standard definition of skewness, but one possible definition, suggested by K. Pearson, is

$$\text{skew} = \frac{3 (\bar{X} - m)}{\sigma}$$

where m is the median and σ is the standard deviation, a measure of dispersion defined in the next section. This definition indeed shows how the difference between mean and median can be used to quantify skewness.11
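A sketch of this computation on an illustrative right-skewed sample of our own (we use the sample standard deviation from Python's statistics module; the choice of divisor is discussed with dispersion measures in the next section):

```python
import statistics

# A right-skewed sample: a few unusually large values pull the mean up.
data = [2, 3, 3, 4, 4, 5, 5, 6, 15, 20]

x_bar = statistics.mean(data)    # 6.7
m = statistics.median(data)      # 4.5
sigma = statistics.stdev(data)   # sample standard deviation (divisor n - 1)

skew = 3 * (x_bar - m) / sigma
print(round(skew, 3))            # 1.114, positive: mean > median
```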

Finally, another summary measure is the mode, which corresponds to the most frequent value. In the histograms of Fig. 4.5 the mode corresponds to the highest bar in the plot. In some cases, mean, mode, and median are the same. This happens in histogram (a) of Fig. 4.6. It might be tempting to generalize and say that the three measures are the same for a symmetric distribution, but a quick glance at Fig. 4.6(b) shows that this need not be the case.
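For discrete data, the mode can be found by simple counting; a minimal sketch using collections.Counter on the Table 4.6 data:

```python
from collections import Counter

# Days of absence from Table 4.6, expanded into raw observations.
freqs = {0: 410, 1: 430, 2: 290, 3: 180, 4: 110, 5: 20}
observations = [day for day, f in freqs.items() for _ in range(f)]

counts = Counter(observations)
mode, count = counts.most_common(1)[0]
print(mode, count)  # 1 430: one day of absence is the most frequent value
```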


Fig. 4.6 Single and bimodal distributions.


Fig. 4.7 A bimodal distribution.

Example 4.11 The histogram in Fig. 4.6(b) is somewhat pathological, as it has two modes. A more common occurrence is illustrated in Fig. 4.7, where there is one true mode (the “globally maximum” frequency) but also a secondary mode (a “locally maximum” frequency). A situation like this might be the result of sampling variability, in which case the secondary mode is just noise. In other cases, it might be the effect of a complex phenomenon, and simply “smoothing away” the secondary mode would be a mistake. We may list a few practical examples in which a secondary mode might arise:

  • The delivery lead time from a supplier, i.e., the time elapsing between issuing an order and receiving the shipment. Lead time may feature a little variability because of transportation times, but a rather long lead time may occur when the supplier runs out of stock. Ignoring this additional uncertainty may result in poor customer service.
  • Consider the repair time of a piece of manufacturing equipment. We may typically observe ordinary faults that take only a little time to repair, but occasionally we may have a major fault that takes much more time to fix.
  • Quite often, in order to compare student grades across universities in different countries, histograms are prepared for each university and they are somehow matched in order to define fair conversion rules. Usually, this is done by implicitly assuming that there is a “standard” grade, to which some variability is superimposed. The truth is that the student population is far from uniform; we may have a secondary mode for the subset of more skilled students, who actually constitute a different population than ordinary students.12
