The last type of convergence that we consider is a strong one in the sense that it implies convergence in probability, which in turn implies convergence in distribution.
DEFINITION 9.12 (Almost-sure convergence) The sequence of random variables X1, X2,…, converges almost surely to the random variable X if, for every ε > 0, the following condition holds:

$$P\left( \lim_{n\to\infty} |X_n - X| < \epsilon \right) = 1.$$
Almost-sure convergence is denoted by $X_n \xrightarrow{\text{a.s.}} X$. Sometimes, almost-sure convergence is referred to as convergence with probability 1; correspondingly, the notation “w.p.1” rather than “a.s.” may be found.
Comparing Definition 9.12 of almost-sure convergence with Definition 9.7 of convergence in probability may be confusing, as they seem quite similar. The difference is a swap of the limit operator with the probability operator: in convergence in probability the limit is taken outside the probability, whereas here it is taken inside. This commutation is not innocuous at all, and it does make a difference. To understand the definition, it is useful to recall the definition of random variables as functions X(ω) from the sample space Ω to the real numbers, and to reflect on the concept of pointwise convergence of functions. For instance, consider a sequence of deterministic functions fn(x), where x ∈ [0, 1]. We say that this sequence of functions converges pointwise to the function f(x) if, for all x ∈ [0, 1], we have

$$\lim_{n\to\infty} f_n(x) = f(x).$$
Definition 9.12 weakens this condition a bit, in the sense that we require that the condition

$$\lim_{n\to\infty} X_n(\omega) = X(\omega)$$
holds only for almost every ω ∈ Ω. The set of outcomes ω for which the sequence does not converge must be a set of null measure; equivalently, we have convergence on a set with probability measure 1. The following nice examples illustrate these concepts.24
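A compact way to express the same requirement is that the set of outcomes on which the sequence converges pointwise has probability 1:

$$P\left( \left\{ \omega \in \Omega : \lim_{n\to\infty} X_n(\omega) = X(\omega) \right\} \right) = 1.$$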
Example 9.31 Let the sample space be Ω = [0, 1], where probability is uniform. Consider the sequence of random variables

$$X_n(\omega) = \omega + \omega^n, \qquad n = 1, 2, \ldots,$$
and the random variable X(ω) = ω. If ω ∈ [0, 1), we see that

$$\lim_{n\to\infty} X_n(\omega) = \lim_{n\to\infty} \left( \omega + \omega^n \right) = \omega = X(\omega).$$
This does not happen for ω = 1, since Xn(1) = 2 for any n. Hence, we have convergence of Xn(ω) to X(ω) for all outcomes except a single one, which is a set of null measure. Therefore,

$$X_n \xrightarrow{\text{a.s.}} X.$$
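As a quick numerical illustration (a minimal sketch, assuming the form Xn(ω) = ω + ω^n used above; the helper name X_n and the chosen sample points are illustrative choices), the following Python snippet evaluates the sequence at a few outcomes:

```python
# Minimal numerical check of Example 9.31 (illustrative sketch, not from the text).
# Assumes X_n(omega) = omega + omega**n on Omega = [0, 1] and X(omega) = omega.

def X_n(omega: float, n: int) -> float:
    """Realization of the n-th random variable at outcome omega."""
    return omega + omega ** n

for omega in (0.3, 0.9, 0.999, 1.0):
    values = [X_n(omega, n) for n in (1, 10, 100, 1000)]
    print(f"omega = {omega}:", [round(v, 4) for v in values])

# For omega < 1 the term omega**n dies out and the values approach X(omega) = omega
# (slowly when omega is close to 1); for omega = 1 every term equals 2, so pointwise
# convergence fails only on the single point omega = 1, a set of null measure.
```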
Example 9.32 Let the sample space be Ω = [0, 1], again with uniform probability, as in the previous example. The indicator function associated with an interval is denoted by I[a,b](x) and takes value 1 if x ∈ [a, b], 0 otherwise.
Now define a sequence of random variables as follows:

$$
\begin{aligned}
X_1(\omega) &= \omega + I_{[0,1]}(\omega), \\
X_2(\omega) &= \omega + I_{[0,\frac{1}{2}]}(\omega), \qquad
X_3(\omega) = \omega + I_{[\frac{1}{2},1]}(\omega), \\
X_4(\omega) &= \omega + I_{[0,\frac{1}{3}]}(\omega), \qquad
X_5(\omega) = \omega + I_{[\frac{1}{3},\frac{2}{3}]}(\omega), \qquad
X_6(\omega) = \omega + I_{[\frac{2}{3},1]}(\omega), \\
&\;\,\vdots
\end{aligned}
$$
To see the logic behind this sequence, notice that the interval [0, 1] is sliced in two parts to define X2 and X3, then in three parts to define X4, X5, and X6, and so on. These slices get smaller and smaller as n increases. Whatever ω we choose, X1(ω) = ω + 1. Each of the other variables in the sequence takes either the value ω or the value ω + 1, depending on whether ω falls in the interval of the indicator function associated with that variable.
Now let us consider the random variable X(ω) = ω. The sequence Xn converges in probability to X, i.e., $X_n \xrightarrow{P} X$. To see this, consider the probability

$$P\bigl\{ |X_n - X| \ge \epsilon \bigr\}. \tag{9.37}$$
The random variables Xn(ω) and X(ω) differ on a subinterval of Ω that becomes smaller and smaller as n increases; in other words, the measure of this subinterval goes to zero. In fact, for a suitably small ε > 0, the probability in Eq. (9.37) is just the probability that ω falls in this subinterval; since this probability goes to zero, Xn converges in probability to X. However, the sequence Xn does not converge almost surely to X. To see this, fix an arbitrary ω: as n increases, the sequence Xn(ω) keeps returning to the value ω + 1, however far we go, so there is no pointwise convergence.
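A small simulation sketch can make the contrast concrete (assuming the interval-slicing scheme described above; the indexing helper `interval` below is our own bookkeeping, not notation from the text):

```python
# Sketch of Example 9.32: X_n(omega) = omega + indicator of the n-th subinterval.
# Block k = 1, 2, 3, ... slices [0, 1] into k equal closed subintervals.

def interval(n: int) -> tuple:
    """Return the endpoints (a, b) of the subinterval attached to index n."""
    k, first = 1, 1
    while n >= first + k:      # locate the block containing index n
        first += k
        k += 1
    j = n - first              # position within block k (0-based)
    return j / k, (j + 1) / k

def X_n(omega: float, n: int) -> float:
    a, b = interval(n)
    return omega + (1.0 if a <= omega <= b else 0.0)

omega = 0.37
jumps = [n for n in range(1, 200) if X_n(omega, n) == omega + 1.0]
print(jumps)   # one index per block: the value omega + 1 recurs forever,
               # so X_n(omega) has no pointwise limit.

# On the other hand, the n-th subinterval has width 1/k, so for 0 < eps <= 1 we get
# P(|X_n - X| >= eps) = 1/k, which goes to zero: convergence in probability holds.
```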
This counterexample shows that convergence in probability does not imply almost-sure convergence, whereas it can be shown that almost-sure convergence does imply convergence in probability. Not surprisingly, by relying on almost-sure convergence we obtain a version of the law of large numbers that is stronger than the one in Theorem 9.8.
THEOREM 9.13 (Strong law of large numbers) Let X1, X2,… be a sequence of i.i.d. random variables, with E[Xi] = μ and finite variance Var(Xi) = σ² < +∞. Then, we have

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \xrightarrow{\text{a.s.}} \mu.$$
A comparison of this theorem with Theorem 9.8, the weak law of large numbers, may be puzzling, as the two statements look essentially the same; the only difference is the mode of convergence. Actually, finiteness of the variance is a stronger condition than necessary, and alternative statements can be found in the literature; for instance, we could just require E[|Xn|] < ∞. The price we pay for relaxing the assumptions is a considerably more involved proof, which depends on the type of stochastic convergence involved. We leave such technicalities aside and just remark that the convergence concept in the strong law of large numbers is indeed stronger than the one used in the weak law.
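As a numerical complement (a minimal simulation sketch; the choice of exponential random variables with mean μ = 2, the seed, and the checkpoints are arbitrary illustrative choices), the running sample mean along a single simulated path settles down to μ, in line with almost-sure convergence:

```python
# Simulation sketch of the strong law of large numbers (illustrative only).
# The running sample mean of i.i.d. exponential variables with mean mu = 2
# is tracked along one sample path.

import random

random.seed(42)
mu = 2.0
running_sum = 0.0

for n in range(1, 100_001):
    running_sum += random.expovariate(1.0 / mu)   # i.i.d. draw with E[X_i] = mu
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n = {n:>6}: sample mean = {running_sum / n:.4f}")

# The printed path-wise averages approach mu = 2 as n grows; the strong law says
# this happens for (almost) every sample path, not just on average.
```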