Any Monte Carlo approach relies on the ability to generate variables that look reasonably random. Clearly, no computer algorithm can be truly random, but all we need is a way of generating pseudorandom variables that can trick statistical tests into believing that they are truly random. The starting point of any such strategy is the generation of uniform variables on the unit interval [0, 1]; then, there is a wide array of methods to transform uniform variables into whatever distribution we need.
In the past, the standard textbook method used to generate U(0, 1) variates was based on linear congruential generators (LCGs). An LCG generates a sequence of nonnegative integer numbers Zi as follows: given an integer number Zi−1, we generate the next number in the sequence by computing

Zi = (a Zi−1 + c) mod m,

where a (the multiplier), c (the shift), and m (the modulus) are properly chosen parameters and "mod" denotes the remainder of integer division (e.g., 15 mod 6 = 3). Then, to generate a uniform variate on the unit interval, we return the number Ui = Zi/m. Clearly, we also need an initial number Z0 to start the sequence; this number is the seed of the sequence.
Table 9.6 Sample sequence generated by a linear congruential generator.

| i | Zi | Ui = Zi/16 |
|---|----|------------|
| 1 | 6 | 0.3750 |
| 2 | 1 | 0.0625 |
| 3 | 8 | 0.5000 |
| 4 | 11 | 0.6875 |
| 5 | 10 | 0.6250 |
| 6 | 5 | 0.3125 |
| 7 | 12 | 0.7500 |
| 8 | 15 | 0.9375 |
| 9 | 14 | 0.8750 |
| 10 | 9 | 0.5625 |
| 11 | 0 | 0.0000 |
| 12 | 3 | 0.1875 |
| 13 | 2 | 0.1250 |
| 14 | 13 | 0.8125 |
| 15 | 4 | 0.2500 |
| 16 | 7 | 0.4375 |
Example 9.27 Let us take a = 5, c = 3, and m = 16. If we start from the seed Z0 = 7, the first integer number we generate is

Z1 = (5 · 7 + 3) mod 16 = 38 mod 16 = 6

Hence, the first uniform variate we return is

U1 = Z1/16 = 6/16 = 0.3750

Then we obtain

Z2 = (5 · 6 + 3) mod 16 = 33 mod 16 = 1,   U2 = 1/16 = 0.0625
Proceeding this way, we get the sequence illustrated in Table 9.6.
It is clear that there is nothing random in the sequence generated by a LCG. Starting the sequence from the same seed will always yield the same sequence. Furthermore, we actually generate rational numbers, rather than real ones; this is not a serious problem, provided that m is large enough. But there is another reason to choose a large value for m: The generator is periodic! In fact, we may generate at most m distinct integer numbers Zi, in the range from 0 to m − 1, and whenever we repeat a previously generated number, the sequence repeats itself (which is not very random at all). We see from the previous example that we return to the initial seed Z0 = 7 after 16 steps. Since the maximum possible period is m, we should choose m very large in order to obtain a long period. The proper choice of a and c ensures that the period is maximized and that the sequence looks random. Designing a good random-number generator is not easy; luckily, when we purchase good numerical software, someone has already solved the issue for us. Actually, recent developments have led to better generators, but all we need is an idea of what seeds are and how they influence random-number generation.
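To make the mechanics concrete, here is a minimal Python sketch of an LCG (the function name lcg is an illustrative choice, not a library routine). With the parameters of Example 9.27 it reproduces the sequence of Table 9.6 and shows that the generator returns to the seed after m = 16 steps.

```python
def lcg(a, c, m, seed):
    """Yield an endless stream of pairs (Z_i, U_i = Z_i / m) from the
    linear congruential recursion Z_i = (a * Z_{i-1} + c) mod m."""
    z = seed
    while True:
        z = (a * z + c) % m
        yield z, z / m

# Parameters of Example 9.27: a = 5, c = 3, m = 16, seed Z0 = 7.
gen = lcg(a=5, c=3, m=16, seed=7)
for i, (z, u) in enumerate(gen, start=1):
    print(f"Z_{i} = {z:2d}   U_{i} = {u:.4f}")
    if z == 7:          # back to the seed: the period is m = 16
        break
```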
The second issue we have to tackle is how to generate a pseudorandom variable with an arbitrary distribution. One general approach is based on the inverse transform method. Suppose that we are given the CDF F(x) = P{X ≤ x}, and that we want to generate random variables according to F. If we are able to invert F(x) easily, we may apply the following approach:
- Draw a random number U ∼ U(0, 1).
- Return X = F⁻¹(U).
It is easy to see that the random variate X generated by this method is actually characterized by the distribution function F:

P{X ≤ x} = P{F⁻¹(U) ≤ x} = P{U ≤ F(x)} = F(x),
where we have used the monotonicity of F and the fact that U is uniformly distributed.
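The recipe above can be wrapped in a small reusable function. The following sketch (the name inverse_transform_sample and the toy CDF F(x) = x² on [0, 1] are illustrative choices, not from the text) draws variates by evaluating the inverse CDF at uniform random numbers.

```python
import numpy as np

def inverse_transform_sample(inv_cdf, n, rng=None):
    """Draw n variates with CDF F by returning X = F^{-1}(U), U ~ U(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=n)   # step 1: draw U ~ U(0, 1)
    return inv_cdf(u)         # step 2: return X = F^{-1}(U)

# Toy example: F(x) = x^2 on [0, 1], so F^{-1}(u) = sqrt(u) and E[X] = 2/3.
x = inverse_transform_sample(np.sqrt, n=100_000)
print(x.mean())   # should be close to 2/3
```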
Example 9.28 A typical distribution that can be simulated easily by the inverse transform method is the exponential distribution. If X ∼ exp(μ), where 1/μ is the expected value of X, its distribution function is

F(x) = 1 − exp(−μx),   x ≥ 0.

Direct application of the inverse transform yields

X = −ln(1 − U)/μ.
Since the distributions of U and (1 − U) are actually the same, it is customary to generate exponential variates by drawing a random number U and returning −ln(U)/μ.
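As a quick numerical check (a sketch; the value μ = 2 and the sample size are arbitrary illustrative choices), the snippet below compares the exact inverse −ln(1 − U)/μ with the customary shortcut −ln(U)/μ; both sample means should be close to 1/μ.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
mu = 2.0                          # rate parameter; E[X] = 1/mu = 0.5
u = rng.uniform(size=100_000)

x_exact = -np.log(1.0 - u) / mu   # exact inverse of F(x) = 1 - exp(-mu*x)
x_short = -np.log(u) / mu         # same distribution, since 1 - U ~ U(0, 1)

print(x_exact.mean(), x_short.mean())   # both should be close to 1/mu = 0.5
```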
The inverse transform method is only one of the general recipes that are used in practice. Sometimes, ad hoc methods are applied to specific distributions. As far as we are concerned, we just need to know that statistical software is available to sample from virtually any conceivable distribution.
9.7.3 Methodology of a simulation study
Building a simulation model requires much more than programming events and generating pseudorandom variables. A simulation study involves the following steps as well:
Input analysis. This phase requires devising probability distributions that match empirical data, if available. We should not just assume that, say, a random time is exponentially distributed; we should check the fit of any candidate distribution, e.g., by running a nonparametric test such as the chi-square goodness-of-fit test (a code sketch is given after this list).
Model verification. Model verification means checking that the program does what it is supposed to do, i.e., that there are no bugs. Note that this does not imply that the model is a good one; it simply means checking that the program behaves according to its specification.
Model validation. This is a check on the model itself. Even when the program works as specified, that is no guarantee of success if the specifications themselves are flawed. We should check that the inputs, as well as the business rules encoded in the simulation program, are realistic. Model validation is fairly easy if we are simulating an existing system: we just compare observed and predicted performance. In other cases, things are not that easy.
Output analysis. The simulation output is analyzed along the lines of confidence intervals for the performance measures of interest. The caveats of Section 9.2.1 apply.
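As an example of the input-analysis check mentioned above, the sketch below fits an exponential distribution to a sample and runs a chi-square goodness-of-fit test with scipy.stats.chisquare; the synthetic "service time" data, the choice of 10 bins, and the seed are purely illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
data = rng.exponential(scale=0.5, size=500)   # stand-in for observed service times

# Fit an exponential distribution: estimate the rate from the sample mean.
rate = 1.0 / data.mean()

# Bin the observations (10 bins with roughly equal counts) and compute the
# expected counts under the fitted distribution, letting the last bin cover the tail.
edges = np.quantile(data, np.linspace(0.0, 1.0, 11))
edges[0] = 0.0
observed, _ = np.histogram(data, bins=edges)
cdf = 1.0 - np.exp(-rate * edges)
cdf[-1] = 1.0
expected = len(data) * np.diff(cdf)

# One degree of freedom is lost for the estimated rate, hence ddof=1.
chi2_stat, p_value = stats.chisquare(observed, expected, ddof=1)
print(chi2_stat, p_value)   # a large p-value means we cannot reject the fitted model
```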
Finally, we should mention the role of properly designed experiments to check the influence of design parameters. ANOVA techniques do play a major role here.