The normal distribution is by far the most common, and most misused, distribution in the theory of probability. It is also known as the Gaussian distribution, but the term "normal" illustrates its central role quite aptly. Its PDF has a seemingly awkward form,

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\} \qquad (7.14)$$

depending on two parameters, μ and σ². Actually, we met such a function a while ago,15 and we noted its peculiar bell shape. Figure 7.14 shows two PDFs for μ = 0, with σ = 1 and σ = 3.

Fig. 7.14 PDF of two normal distributions.

It is quite easy to interpret the PDF (7.14):
- The factor in front is just a normalization constant, and its only role is to ensure that the area below the PDF is 1.
- The expected value is just the parameter μ; indeed, this parameter has the effect of shifting the PDF left or right.
- The variance is just the parameter σ²; indeed, this parameter has the effect of changing the scale, i.e., spreading or concentrating the bell, as we can see in Fig. 7.14 (and as checked numerically in the sketch after this list).
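As a quick sanity check of these three claims, the following minimal sketch integrates the PDF (7.14) numerically; it assumes SciPy is available, and the values μ = 0, σ = 3 are arbitrary illustrative choices.

```python
# Minimal numerical check of the three claims above (SciPy assumed);
# mu = 0 and sigma = 3 are arbitrary illustrative values.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 0.0, 3.0
pdf = lambda x: norm.pdf(x, loc=mu, scale=sigma)

area, _ = quad(pdf, -np.inf, np.inf)                              # total area under the PDF
mean, _ = quad(lambda x: x * pdf(x), -np.inf, np.inf)             # E[X]
var, _ = quad(lambda x: (x - mu) ** 2 * pdf(x), -np.inf, np.inf)  # Var(X)

print(area)  # ~1.0: the factor in front normalizes the area
print(mean)  # ~0.0: equals mu
print(var)   # ~9.0: equals sigma**2
```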
We often use the notation X ∼ N(μ, σ²) to indicate that X has normal distribution; note that the second parameter corresponds to variance, rather than standard deviation. It is very easy to see that, for the normal distribution, expected value (mean), mode, and median are just the same. The PDF is clearly symmetric with respect to the expected value, so skewness is zero. By contrast, a somewhat surprising fact is that kurtosis for a normal variable is κ = 3, and it does not depend on the specific value of the parameters. Indeed, in some books the definition of kurtosis, which we gave in Definition 7.4, is replaced by

$$\kappa_e = \frac{E[(X - \mu)^4]}{\sigma^4} - 3 = \kappa - 3$$
This is a surprising definition for the uninitiated, and we prefer the alternative one. The point is that the tail behavior of the normal distribution is a sort of benchmark, and it may be useful to express the kurtosis of other distributions with reference to this base case. The appropriate name for κe is excess kurtosis.
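A minimal check of this fact, assuming SciPy is available; note that scipy.stats reports *excess* kurtosis, so the printed value is 0, i.e., κ = 3, whatever the parameters.

```python
# Checking that normal kurtosis is 3 regardless of the parameters
# (SciPy assumed); scipy.stats reports excess kurtosis, kappa - 3.
from scipy.stats import norm

for mu, sigma in [(0, 1), (5, 0.5), (-2, 10)]:
    mean, var, skew, ex_kurt = norm.stats(loc=mu, scale=sigma, moments='mvsk')
    print(mu, sigma, float(skew), float(ex_kurt))  # skewness 0.0, excess kurtosis 0.0
```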
The last point shows that all of the possible normal distributions are essentially the same in terms of tail behavior. In fact, there is something more to notice. We can transform any normal random variable into any other normal variable, with different parameters, just by an affine transformation. Consider a generic normal X ∼ N(μ, σ²), and consider the variable

$$Z = \frac{X - \mu}{\sigma} \qquad (7.15)$$
In terms of PDF, we are just shifting the graph and changing its scale, without changing its basic form. Using the familiar rules concerning expected values and variance, we observe the following:

$$E[Z] = \frac{E[X] - \mu}{\sigma} = 0, \qquad \operatorname{Var}(Z) = \frac{\operatorname{Var}(X)}{\sigma^2} = 1$$
A normal variable Z ∼ N(0, 1), with zero expected value and unit variance, is called standard normal. The transformation (7.15) is called standardization. Actually, it applies to any distribution, as it yields a variable with zero expected value and unit variance, but it plays an especially important role for the normal distribution. We may also go the other way around: given a standard normal Z, we may invert (7.15) to get an arbitrary normal by destandardization:

$$X = \mu + \sigma Z$$
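The round trip is easy to see on simulated data; here is a minimal sketch with NumPy, where μ = 3, σ = 4 and the sample size are arbitrary choices.

```python
# Standardization and destandardization on simulated data (NumPy assumed);
# mu = 3, sigma = 4, and the sample size are arbitrary choices.
import numpy as np

rng = np.random.default_rng(42)       # fixed seed for reproducibility
mu, sigma = 3.0, 4.0
x = rng.normal(mu, sigma, size=1_000_000)

z = (x - mu) / sigma                  # standardization, Eq. (7.15)
print(z.mean(), z.var())              # ~0 and ~1

x_back = mu + sigma * z               # destandardization
print(np.allclose(x_back, x))         # True: we recover the original sample
```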
The normal distribution has many nice properties, which we will discover in the following text and which justify its popularity. One unpleasant feature, though, is that its CDF cannot be calculated analytically: as we know, computing the CDF requires integrating the density (7.14), i.e., finding its antiderivative. As it turns out, this antiderivative cannot be expressed in terms of elementary functions, and we must resort to numerical methods to evaluate the integral and, therefore, the CDF. This poses no practical difficulty, as plenty of software is available to carry out this task efficiently and with more than adequate precision. We should mention that, traditionally, any text involving probability and statistics provides the reader with tables to carry out calculations by hand.16 The trouble is that we cannot have a set of tables for every possible normal distribution. However, we can easily carry out the job once, for the standard normal, and then apply standardization and destandardization to work with an arbitrary normal.17 Tables for the standard normal provide us with values of the following CDF:

$$\Phi(z) \equiv P(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2} \, dt$$
Sometimes, only the right area is tabulated:

$$1 - \Phi(z) = P(Z > z) = \frac{1}{\sqrt{2\pi}} \int_{z}^{+\infty} e^{-t^2/2} \, dt$$
Of course, this does not change anything because of the symmetry of the PDF. Given a way to compute Φ(z), we can deal with probabilities for an arbitrary normal variable X ∼ N(μ, σ²). To find the probability P(X ≤ β), we should just apply standardization:

$$P(X \le \beta) = P\left( \frac{X - \mu}{\sigma} \le \frac{\beta - \mu}{\sigma} \right) = \Phi\left( \frac{\beta - \mu}{\sigma} \right)$$
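In practice, software evaluates Φ through the error function erf, for which fast numerical routines exist. The sketch below, assuming SciPy is available, compares this route with a direct library call; μ = 3, σ = 4, β = 7 are illustrative values.

```python
# How software evaluates Phi: via the error function erf,
# Phi(z) = (1 + erf(z / sqrt(2))) / 2, and then P(X <= beta) by
# standardization. mu = 3, sigma = 4, beta = 7 are illustrative values.
import math
from scipy.stats import norm

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, beta = 3.0, 4.0, 7.0
print(Phi((beta - mu) / sigma))             # standardization: ~0.8413
print(norm.cdf(beta, loc=mu, scale=sigma))  # direct call, same value
```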
Example 7.7 Consider X ∼ N(3, 16), i.e., a normal variable with expected value 3 and standard deviation 4 (recall that the second parameter is the variance). Let us compute P(2 < X < 7):

$$P(2 < X < 7) = P\left( \frac{2 - 3}{4} < Z < \frac{7 - 3}{4} \right) = \Phi(1) - \Phi(-0.25)$$
When using statistical tables, we cannot carry out the above calculation directly, as typically we are provided with values Φ(z) only for z ≥ 0. However, we may easily take advantage of symmetry to compute Φ(−0.25):
- We need the area of the PDF to the left of z = −0.25.
- Because of symmetry with respect to the expected value E[Z] = 0, this is just the area to the right of z = 0.25.
- But this is just the probability P(Z > 0.25) = 1 − Φ(0.25) = 1 − 0.5987 = 0.4013.
- Hence:

$$P(2 < X < 7) = \Phi(1) - \Phi(-0.25) = 0.8413 - 0.4013 = 0.4400$$
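The whole example can be verified in one line of SciPy, sidestepping the table gymnastics; a minimal sketch:

```python
# One-line verification of Example 7.7 (SciPy assumed): X ~ N(3, 16).
from scipy.stats import norm

mu, sigma = 3.0, 4.0
p = norm.cdf(7, loc=mu, scale=sigma) - norm.cdf(2, loc=mu, scale=sigma)
print(p)  # ~0.4400, matching Phi(1) - Phi(-0.25)
```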
Gimmicks of this kind are not required anymore if you have a decent piece of software, but they are still worth learning to really know the ropes of working with normal variables. This is also important because one of the most common tasks in statistics is the use of quantiles of normal distributions. Numerical inversion of the CDF for the standard normal, or reading statistical tables the other way around, yields the quantiles

$$z_q = \Phi^{-1}(q)$$
for a probability level q ∈ (0, 1). Actually, the usual notation in statistical applications is z1−α, where α is a rather small number, like 0.1 or 0.05; geometrically, the quantile z1−α leaves an area 1 − α of the PDF to its left, and α is the area of the right tail. This is illustrated in Fig. 7.15.

Fig. 7.15 Using quantiles of the standard normal distribution.

From the figure, we also see that if we want to leave two symmetric tails on the left and on the right, such that their total area is α, we should consider the quantile z1−α/2 and observe that

$$P\left( -z_{1-\alpha/2} \le Z \le z_{1-\alpha/2} \right) = 1 - \alpha$$
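A small sketch of both uses of the quantiles, assuming SciPy; norm.ppf is its numerical inverse of Φ, and α = 0.05 is an illustrative choice.

```python
# Quantiles by numerical inversion of the CDF (SciPy assumed);
# norm.ppf is the inverse of Phi, and alpha = 0.05 is illustrative.
from scipy.stats import norm

alpha = 0.05
z_right = norm.ppf(1 - alpha)     # z_{1-alpha}: area alpha in the right tail
z_both = norm.ppf(1 - alpha / 2)  # z_{1-alpha/2}: two symmetric tails

print(z_right)                               # ~1.6449
print(z_both)                                # ~1.9600
print(norm.cdf(z_both) - norm.cdf(-z_both))  # ~0.95 = 1 - alpha
```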
Now, we know that there is a way to find quantiles zq for the standard normal, but how can we find a quantile xq for a generic normal variable? The quick-and-dirty recipe mirrors destandardization:

$$x_q = \mu + \sigma z_q$$
To see why this works, observe the following:

$$P(X \le x_q) = P(\mu + \sigma Z \le \mu + \sigma z_q) = P(Z \le z_q) = q$$
Example 7.8 Consider a normal variable X with expected value μ = 100 and standard deviation σ = 20. What is its 95% quantile? We are looking for a number x0.95 such that

$$P(X \le x_{0.95}) = 0.95$$
Statistical software provides us with the corresponding quantile for the standard normal distribution: z0.95 = 1.6449. Hence

$$x_{0.95} = \mu + \sigma z_{0.95} = 100 + 20 \times 1.6449 \approx 132.90$$
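The same number can be obtained either by destandardization or by a direct library call; a minimal SciPy sketch:

```python
# Example 7.8 by destandardization and by a direct call (SciPy assumed).
from scipy.stats import norm

mu, sigma = 100.0, 20.0
x95_destd = mu + sigma * norm.ppf(0.95)          # 100 + 20 * 1.6449
x95_direct = norm.ppf(0.95, loc=mu, scale=sigma)
print(x95_destd, x95_direct)                     # both ~132.90
```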
Example 7.9 (A well-known rule for the normal distribution) Given a normal variable X ∼ N(μ, σ²), we might wonder how many realizations are expected to fall in an interval of the form μ ± kσ. By standardization and symmetry, we find

$$P(\mu - k\sigma \le X \le \mu + k\sigma) = 2\Phi(k) - 1 \approx \begin{cases} 0.6827, & k = 1 \\ 0.9545, & k = 2 \\ 0.9973, & k = 3 \end{cases}$$
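These coverage probabilities are easy to reproduce, e.g., with SciPy:

```python
# Reproducing the k-sigma coverage probabilities (SciPy assumed).
from scipy.stats import norm

for k in (1, 2, 3):
    print(k, 2 * norm.cdf(k) - 1)  # ~0.6827, ~0.9545, ~0.9973
```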
We see that almost all of the realizations are expected to fall “within three standard deviations of the mean.” In other words, the width of the interval including almost all of them is six standard deviations; indeed, a managerial philosophy has been called six sigma because of this.
This also shows that the normal distribution has rather thin tails, and this is why it serves as a benchmark in terms of kurtosis. If we observe events that go much beyond the three-sigma wall, we should question the applicability of a model based on the normal distribution. A well-known example is the stock market crash of October 19, 1987. This date did deserve the name of "Black Monday," as the Dow Jones Industrial Average index dropped from 2246 to 1738, a decline of almost 23% in one day. Fitting a normal distribution to index returns shows that this event was about 20 standard deviations below average. In fact, it is rather common to observe such extreme events on financial markets. On the one hand, alternative distributions have been proposed, with fatter tails, to better account for such phenomena. On the other hand, more radical approaches have been proposed, modeling the dynamic behavior of stock market participants, who are not completely rational decision makers. The very applicability of probability modeling to this kind of system has been questioned.18
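To appreciate just how extreme a 20-sigma event is under normality, consider this small SciPy sketch:

```python
# How unlikely is a 20-sigma drop if returns were truly normal? (SciPy assumed)
from scipy.stats import norm

print(norm.cdf(-3))   # ~1.35e-3: already beyond the three-sigma wall
print(norm.cdf(-20))  # ~2.75e-89: essentially impossible under normality
```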
Fig. 7.16 The CDF for an empirical distribution.