Almost-sure convergence

The last type of convergence that we consider is a strong one in the sense that it implies convergence in probability, which in turn implies convergence in distribution.

DEFINITION 9.12 (Almost-sure convergence) The sequence of random variables X1, X2, …, converges almost surely to random variable X if, for every ε > 0, the following condition holds:

$$\mathrm{P}\Bigl(\lim_{n\to\infty} |X_n - X| < \epsilon\Bigr) = 1.$$

Almost-sure convergence is denoted by $X_n \xrightarrow{\text{a.s.}} X$. Sometimes, almost-sure convergence is referred to as convergence with probability 1; correspondingly, the notation "w.p.1" rather than "a.s." may be found.
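To anticipate the comparison below, the two conditions can be written side by side; modulo minor variations in how definition 9.7 is stated, the standard forms are:

$$\text{in probability:}\quad \lim_{n\to\infty} \mathrm{P}\bigl(|X_n - X| < \epsilon\bigr) = 1 \quad \text{for every } \epsilon > 0,$$

$$\text{almost surely:}\quad \mathrm{P}\Bigl(\lim_{n\to\infty} |X_n - X| < \epsilon\Bigr) = 1 \quad \text{for every } \epsilon > 0.$$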

Comparing definition 9.12 of almost-sure convergence with definition 9.7 of convergence in probability may be confusing, as they seem quite similar. The difference is a swap of the limit operator with the probability operator. This commutation is not innocuous at all, and it does make a difference. To understand the definition, it is useful to recall the definition of random variables as functions X(ω) from the sample space Ω to real numbers and to reflect on the concept of pointwise convergence of functions. For instance, consider a sequence of deterministic functions fn(x), where x ∈ [0, 1]. We say that this sequence of functions converges pointwise to function f(x) if, for all x ∈ [0, 1], we have

$$\lim_{n\to\infty} f_n(x) = f(x).$$

Definition 9.12 weakens this condition a bit, in the sense that we require that the condition

$$\lim_{n\to\infty} X_n(\omega) = X(\omega)$$

holds only for almost every ω ∈ Ω. The set of outcomes ω for which the sequence does not converge must be a set of null measure; equivalently, we have convergence on a set of probability measure 1. The following nice examples illustrate these concepts.

Example 9.31 Let the sample space be Ω = [0, 1], where probability is uniform. Consider the sequence of random variables

$$X_n(\omega) = \omega + \omega^n,$$

and the random variable X(ω) = ω. If ω ∈ [0, 1), we see that

$$\lim_{n\to\infty} X_n(\omega) = \lim_{n\to\infty} \bigl(\omega + \omega^n\bigr) = \omega = X(\omega).$$

This does not happen for ω = 1, since Xn(1) = 2 for any n. Hence, we have convergence of Xn(ω) to X(ω) for all outcomes except a single one, which is a set of null measure. Hence

$$X_n \xrightarrow{\text{a.s.}} X.$$
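As a quick numerical illustration (a minimal sketch in Python; the function names are ours), we can tabulate Xn(ω) = ω + ω^n for a few outcomes and watch all of them except ω = 1 approach X(ω) = ω:

```python
import numpy as np

# Example 9.31: X_n(omega) = omega + omega**n on Omega = [0, 1].
def X(n, omega):
    return omega + omega**n

omegas = np.array([0.0, 0.5, 0.9, 0.99, 1.0])
for n in (1, 10, 100, 1000):
    print(f"n={n:4d}:", np.round(X(n, omegas), 4))

# Every omega < 1 converges to X(omega) = omega because omega**n -> 0;
# the single outcome omega = 1 is stuck at 2, but {1} has measure zero
# under the uniform distribution, so X_n -> X almost surely.
```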

Example 9.32 Let the sample space be Ω = [0, 1], again with uniform probability, as in the previous example. The indicator function associated with an interval is denoted by I[a,b](x) and takes value 1 if x ∈ [a, b], 0 otherwise.

Now define a sequence of random variables as follows:

$$X_1(\omega) = \omega + I_{[0,1]}(\omega),$$
$$X_2(\omega) = \omega + I_{[0,\frac{1}{2}]}(\omega), \qquad X_3(\omega) = \omega + I_{[\frac{1}{2},1]}(\omega),$$
$$X_4(\omega) = \omega + I_{[0,\frac{1}{3}]}(\omega), \qquad X_5(\omega) = \omega + I_{[\frac{1}{3},\frac{2}{3}]}(\omega), \qquad X_6(\omega) = \omega + I_{[\frac{2}{3},1]}(\omega), \qquad \ldots$$

To see the logic behind this sequence, notice that the interval [0, 1] is sliced in two parts to define X2 and X3, then in three parts to define X4, X5, and X6, and so on. These slices, for increasing n, get smaller and smaller. Whatever ω we choose, X1(ω) = ω + 1. The other variables in the sequence take either value ω or value ω + 1, depending on whether ω is in the interval of the indicator function associated with each variable in the sequence.

Now let us consider the random variable X(ω) = ω. The sequence Xn converges in probability to X, i.e., $X_n \xrightarrow{\mathrm{P}} X$. To see this, consider the probability

$$\mathrm{P}\bigl(|X_n - X| \ge \epsilon\bigr). \tag{9.37}$$

The random variables Xn(ω) and X(ω) differ on a subinterval of Ω that gets smaller and smaller as n increases; in other words, the measure of this interval goes to zero. In fact, for a suitably small ε > 0, the probability in Eq. (9.37) is just the probability that ω falls in this subinterval; since this probability goes to zero, Xn converges in probability to X. However, the sequence Xn does not converge almost surely to X. To see this, fix an arbitrary ω: for increasing n, the sequence takes the value ω + 1 infinitely often (at least once for each slicing of the interval), so the values keep jumping between ω and ω + 1, and there is no pointwise convergence.
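The mechanics of this example are easy to check numerically. The sketch below (with our own indexing of the slices, not taken from the text) reconstructs the interval carried by each Xn, shows that its length, and hence the probability in Eq. (9.37), vanishes, and shows that any fixed trajectory keeps jumping back to ω + 1:

```python
def interval(n):
    """Interval [a, b] on which X_n adds 1: within the k-th block of
    indices, the unit interval is sliced into k equal parts."""
    k, start = 1, 1
    while start + k <= n:   # skip complete blocks of k variables
        start += k
        k += 1
    j = n - start           # position of X_n inside block k
    return j / k, (j + 1) / k

def X(n, omega):
    a, b = interval(n)
    return omega + (a <= omega <= b)

# Length of the interval where X_n differs from X(omega) = omega:
# it equals 1/k and shrinks to zero -> convergence in probability.
for n in (1, 2, 3, 10, 100, 1000):
    a, b = interval(n)
    print(f"n={n:4d}: interval=[{a:.3f}, {b:.3f}], length={b - a:.3f}")

# Yet any fixed trajectory hits omega + 1 at least once in every
# block, i.e., infinitely often, so it never converges pointwise.
omega = 0.3
print([n for n in range(1, 60) if X(n, omega) == omega + 1])
```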

The counterexample shows that convergence in probability does not imply almost-sure convergence, whereas it can be shown that almost-sure convergence does imply convergence in probability. Not surprisingly, if we apply almost-sure convergence, we get a version of the law of large numbers that is stronger than the one in Theorem 9.8.

THEOREM 9.13 (Strong law of large numbers) Let X1, X2, … be a sequence of i.i.d. random variables, with E[Xi] = μ and finite variance Var(Xi) = σ² < +∞. Then, we have

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{\text{a.s.}} \mu.$$

A comparison of this theorem with the similar Theorem 9.8 about the weak law of large numbers is puzzling, as the two statements do look the same. Actually, finiteness of variance is a stronger condition than necessary, and alternative statements can be found in the literature; for instance, we could just require E[|Xi|] < ∞. The price we pay for relaxing assumptions is a more involved proof, which depends on the type of stochastic convergence involved. We leave such technicalities aside and just remark that the convergence concept in the strong law of large numbers is indeed stronger than the one used in the weak law.
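To see what the stronger convergence concept buys us, the following sketch (our own setup, not the book's: i.i.d. exponential variables with μ = 1) simulates a few individual trajectories of the sample mean; the strong law says that, with probability 1, each single trajectory settles at μ, not merely that the probability of large deviations vanishes at each fixed n:

```python
import numpy as np

rng = np.random.default_rng(42)

# Five independent trajectories of the sample mean of i.i.d.
# exponential(1) variables: mu = 1, finite variance, so both the
# weak law (Theorem 9.8) and the strong law (Theorem 9.13) apply.
n_max, paths = 100_000, 5
samples = rng.exponential(scale=1.0, size=(paths, n_max))
means = np.cumsum(samples, axis=1) / np.arange(1, n_max + 1)

for n in (100, 1_000, 10_000, 100_000):
    print(f"n={n:6d}:", np.round(means[:, n - 1], 4))

# Each row (trajectory) drifts toward mu = 1; the strong law asserts
# that this trajectory-by-trajectory convergence holds w.p.1.
```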

