Empirical distributions

Empirical distributions have the closest link with descriptive statistics, since their PMF is typically estimated from the empirical relative frequencies observed in a sample. For instance, if we consider a sample of 10 observations of a random variable X, in which X = 1 occurs three times, X = 2 five times, and X = 3 twice, we may estimate

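$$P\{X = 1\} = \frac{3}{10} = 0.3, \qquad P\{X = 2\} = \frac{5}{10} = 0.5, \qquad P\{X = 3\} = \frac{2}{10} = 0.2$$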
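The relative-frequency estimate above can be reproduced in a few lines. This is a minimal sketch: the function name `empirical_pmf` and the particular ordering of the sample are illustrative assumptions, chosen only so that the counts match the example (three 1s, five 2s, two 3s). Exact fractions are used so that the estimated probabilities sum to exactly one.

```python
from collections import Counter
from fractions import Fraction

def empirical_pmf(sample):
    """Estimate a PMF by the relative frequency of each observed value (hypothetical helper)."""
    n = len(sample)
    counts = Counter(sample)
    return {value: Fraction(count, n) for value, count in sorted(counts.items())}

# Hypothetical sample with the counts from the text: three 1s, five 2s, two 3s.
sample = [1, 2, 2, 3, 1, 2, 2, 1, 3, 2]
pmf = empirical_pmf(sample)
for value, prob in pmf.items():
    print(f"P(X = {value}) = {float(prob):.1f}")
```

Running it prints P(X = 1) = 0.3, P(X = 2) = 0.5 and P(X = 3) = 0.2, matching the estimates above. Values that never appear in the sample simply receive no entry, i.e., zero probability.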

Empirical distributions feature the largest degree of flexibility and may also be used with qualitative data. Furthermore, they may reflect quite complicated random phenomena, leading to possibly multimodal distributions, whereas we shall see that theoretical distributions typically have a single mode. Yet, this flexibility may backfire when it results in model overfitting, i.e., when the probability model reflects peculiarities in the sampled data that do not carry over to the overall population.

Another point that cannot be overemphasized is that empirical distributions have finite support by definition, as they assign zero probability to values below the smallest observation and above the largest one. This may lead to an underestimation of risk, since extreme outcomes that did not happen to appear in the sample are treated as impossible. Sometimes, an empirical distribution is adjusted by appending “tails” derived from a theoretical model in order to avoid this problem, but this patch requires some ad hoc reasoning.
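The finite-support issue is easy to see by evaluating the empirical CDF, F(x) = (number of observations ≤ x) / n. The sketch below reuses the same hypothetical 10-observation sample as the earlier example; the helper name `empirical_cdf` is an illustrative assumption, and `bisect_right` on the sorted data is used to count the observations not exceeding x.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Return F(x) = fraction of observations <= x (hypothetical helper)."""
    data = sorted(sample)
    n = len(data)

    def F(x):
        # bisect_right counts how many sorted observations are <= x
        return bisect_right(data, x) / n

    return F

# Same hypothetical sample: three 1s, five 2s, two 3s.
sample = [1, 2, 2, 3, 1, 2, 2, 1, 3, 2]
F = empirical_cdf(sample)
print(F(0.5))     # 0.0: no mass below the smallest observation
print(F(3))       # 1.0: all mass at or below the largest observation
print(1 - F(3))   # 0.0: probability of exceeding the sample maximum
```

Whatever the true population looks like, this model states that values outside [1, 3] cannot occur, which is precisely the misleading perception of risk discussed above.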
