Category: Descriptive Statistics On the Way to Elementary Probability
-
MULTIDIMENSIONAL DATA
So far, we have considered the organization and representation of data in one dimension, but in applications we often observe multidimensional data. Of course, we may list summary measures for each single variable, but this would miss an important point: the relationship between different variables, which raises issues concerning independence, correlation, etc. Here we want to…
-
Quartiles and boxplots
Among the many percentiles, a particular role is played by the quartiles, denoted by Q1, Q2, and Q3, corresponding to 25%, 50%, and 75%, respectively. Clearly, Q2 is simply the median. A look at these values and the mean tells a lot about the underlying distribution. Indeed, the interquartile range has been proposed as a measure of dispersion, and an alternative measure…
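As a minimal sketch of the quartiles and the interquartile range Q3 − Q1, the snippet below uses Python's standard `statistics` module. The sample values and the helper name `quartiles_and_iqr` are hypothetical, and the "inclusive" interpolation method is only one of several quartile conventions, so other tools may return slightly different values on the same data.

```python
import statistics


def quartiles_and_iqr(data):
    """Return (Q1, Q2, Q3, IQR) for a list of numbers.

    Assumption: quartiles are computed with the 'inclusive' method of
    statistics.quantiles; other conventions exist and may differ.
    """
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1


# Hypothetical sample, used only for illustration.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q1, q2, q3, iqr = quartiles_and_iqr(sample)
print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3, "IQR =", iqr)
```

Under this convention Q2 coincides with the median of the sample, in line with the remark that Q2 is simply the median.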
-
CUMULATIVE FREQUENCIES AND PERCENTILES
The median m is a value such that 50% of the observed values are smaller than or equal to it. In this section we generalize the idea to an arbitrary percentage. We could ask which value is such that 80% of the observations are smaller than or equal to it. Or, seeing things the other way around,…
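The idea of reading a percentile off the cumulative frequencies can be sketched as follows. This assumes one simple convention: the percentile at level p is taken to be the smallest observed value whose cumulative relative frequency is at least p. The data and the function names are hypothetical.

```python
from collections import Counter


def cumulative_relative_frequencies(values):
    """Return a list of (value, cumulative relative frequency) pairs,
    ordered by increasing value."""
    counts = Counter(values)
    n = len(values)
    running = 0
    table = []
    for x in sorted(counts):
        running += counts[x]
        table.append((x, running / n))
    return table


def percentile(values, p):
    """Smallest observed value x such that at least a fraction p of the
    observations are smaller than or equal to x (0 < p <= 1).

    Assumption: this is one common convention, not the only one.
    """
    for x, cumulative in cumulative_relative_frequencies(values):
        if cumulative >= p:
            return x
    raise ValueError("values must be non-empty and p must lie in (0, 1]")


# Hypothetical observations, used only for illustration.
observations = [0, 1, 1, 2, 2, 2, 3, 3, 4, 6]
print(cumulative_relative_frequencies(observations))
print("80% level:", percentile(observations, 0.8))
print("50% level:", percentile(observations, 0.5))
```

With these observations, 80% of the values are smaller than or equal to 3, so the function returns 3 at level 0.8.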
-
Dispersion measures
Location measures do not tell us anything about the dispersion of data. We may have two distributions that share the same mean, median, and mode, yet are quite different. Figure 4.8 illustrates the importance of dispersion in discerning the difference between distributions sharing location measures. One possible way to characterize dispersion is by measuring the range X(n) − X(1), i.e.,…
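A small numerical illustration of this point: the two hypothetical samples below share the same mean, median, and mode, yet their ranges X(n) − X(1) differ considerably. The helper name `data_range` is an assumption made for this sketch.

```python
import statistics


def data_range(values):
    """Range X(n) - X(1): largest observation minus smallest one."""
    ordered = sorted(values)
    return ordered[-1] - ordered[0]


# Two hypothetical samples with identical location measures.
narrow = [4, 5, 5, 5, 6]
wide = [1, 5, 5, 5, 9]

for name, values in (("narrow", narrow), ("wide", wide)):
    print(
        name,
        "mean:", statistics.mean(values),
        "median:", statistics.median(values),
        "mode:", statistics.mode(values),
        "range:", data_range(values),
    )
```

The location measures cannot tell the two samples apart, whereas the range does.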
-
Location measures: mean, median, and mode
We are all familiar with the idea of taking averages. Indeed, the most natural location measure is the mean. DEFINITION 4.5 (Mean for a sample and a population) The mean for a population of size n is defined as $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$. The mean for a sample of size n is $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. The two definitions above may seem somewhat puzzling,…
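The three location measures named in the title can be computed on a small hypothetical sample as sketched below. The mean is written out as the sum of the observations divided by their number, and the median and mode come from Python's standard `statistics` module; the data and the function name `sample_mean` are assumptions for illustration.

```python
import statistics


def sample_mean(values):
    """Mean as the sum of the observations divided by their number."""
    return sum(values) / len(values)


# Hypothetical observations, used only for illustration.
data = [2, 3, 3, 5, 7]

mean_value = sample_mean(data)              # (2 + 3 + 3 + 5 + 7) / 5
median_value = statistics.median(data)      # middle value of the sorted data
mode_value = statistics.mode(data)          # most frequent value

print("mean:", mean_value, "median:", median_value, "mode:", mode_value)
```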
-
SUMMARY MEASURES
A look at a frequency histogram tells us many things about the distribution of values of a variable of interest within a population or a sample. However, it would be quite useful to have a set of numbers capturing some essential features quantitatively; this is certainly necessary if we have to compare two histograms, since…
-
ORGANIZING AND REPRESENTING RAW DATA
We have introduced the basic concepts of frequencies and histograms in Section 1.2.1. Here we treat the same concepts in a slightly more systematic way, illustrating a few potential difficulties that may occur even with these very simple ideas. Imagine a car insurance agent who has collected the weekly number of accidents that occurred during the last…
-
WHAT IS STATISTICS?
A rather general answer to this question is that statistics is a group of methods to collect, analyze, present, and interpret data (and possibly to make decisions). We often consider statistics as a branch of mathematics, but this is the result of a more recent tendency. From a historical perspective, the term “statistics” stems from…
-
Introduction
Some fundamental concepts of descriptive statistics, like frequencies, relative frequencies, and histograms, have been introduced informally. Here we want to illustrate and expand those concepts in a slightly more systematic way. Our treatment will be rather brief since, within this framework, descriptive statistics is essentially a tool for building some intuition. We introduce basic…