The last type of convergence that we consider is a strong one in the sense that it implies convergence in probability, which in turn implies convergence in distribution.
DEFINITION 9.12 (Almost-sure convergence) The sequence of random variables X1, X2,…, converges almost surely to the random variable X if, for every ε > 0, the following condition holds:

$$P\left( \lim_{n\to\infty} |X_n - X| < \epsilon \right) = 1.$$
Almost-sure convergence is denoted by $X_n \xrightarrow{\text{a.s.}} X$. Sometimes, almost-sure convergence is referred to as convergence with probability 1; correspondingly, the notation “w.p.1” rather than “a.s.” may be found.
Comparing Definition 9.12 of almost-sure convergence with Definition 9.7 of convergence in probability may be confusing, as they seem quite similar. The difference is a swap of the limit operator with the probability operator: in convergence in probability the limit is taken outside the probability, whereas here it is taken inside. This commutation is not innocuous at all, and it does make a difference. To understand the definition, it is useful to recall the definition of random variables as functions X(ω) from the sample space Ω to the real numbers, and to reflect on the concept of pointwise convergence of functions. For instance, consider a sequence of deterministic functions fn(x), where x ∈ [0, 1]. We say that this sequence of functions converges pointwise to the function f(x) if, for all x ∈ [0, 1], we have

$$\lim_{n\to\infty} f_n(x) = f(x).$$
Definition 9.12 weakens this condition a bit, in the sense that we require that the condition

$$\lim_{n\to\infty} X_n(\omega) = X(\omega)$$
holds only for almost every ω ∈ Ω. The set of outcomes ω for which the sequence does not converge must be a set of null measure; equivalently, we have convergence on a set with probability measure 1. The following nice examples illustrate these concepts.24
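A compact way to express the same requirement is that the set of outcomes on which the sequence converges pointwise has probability 1:

$$P\left( \left\{ \omega \in \Omega : \lim_{n\to\infty} X_n(\omega) = X(\omega) \right\} \right) = 1.$$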
Example 9.31 Let the sample space be Ω = [0, 1], where probability is uniform. Consider the sequence of random variables

$$X_n(\omega) = \omega + \omega^n, \qquad n = 1, 2, \ldots,$$
and the random variable X(ω) = ω. If ω ∈ [0, 1), we see that

$$\lim_{n\to\infty} X_n(\omega) = \lim_{n\to\infty} \left( \omega + \omega^n \right) = \omega = X(\omega).$$
This does not happen for ω = 1, since Xn(1) = 2 for any n. Hence, we have convergence of Xn(ω) to X(ω) for all outcomes except a single one, which is a set of null measure. Therefore,

$$X_n \xrightarrow{\text{a.s.}} X.$$
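As a quick numerical illustration (a minimal sketch, assuming the form Xn(ω) = ω + ω^n used above; the helper name X_n and the chosen sample points are illustrative choices), the following Python snippet evaluates the sequence at a few outcomes:

```python
# Minimal numerical check of Example 9.31 (illustrative sketch, not from the text).
# Assumes X_n(omega) = omega + omega**n on Omega = [0, 1] and X(omega) = omega.

def X_n(omega: float, n: int) -> float:
    """Realization of the n-th random variable at outcome omega."""
    return omega + omega ** n

for omega in (0.3, 0.9, 0.999, 1.0):
    values = [X_n(omega, n) for n in (1, 10, 100, 1000)]
    print(f"omega = {omega}:", [round(v, 4) for v in values])

# For omega < 1 the term omega**n dies out and the values approach X(omega) = omega
# (slowly when omega is close to 1); for omega = 1 every term equals 2, so pointwise
# convergence fails only on the single point omega = 1, a set of null measure.
```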
Example 9.32 Let the sample space be Ω = [0, 1], again with uniform probability, as in the previous example. The indicator function associated with an interval is denoted by I[a,b](x) and takes value 1 if x ∈ [a, b], 0 otherwise.
Now define a sequence of random variables as follows:

$$
\begin{aligned}
X_1(\omega) &= \omega + I_{[0,1]}(\omega), \\
X_2(\omega) &= \omega + I_{[0,\frac{1}{2}]}(\omega), \qquad
X_3(\omega) = \omega + I_{[\frac{1}{2},1]}(\omega), \\
X_4(\omega) &= \omega + I_{[0,\frac{1}{3}]}(\omega), \qquad
X_5(\omega) = \omega + I_{[\frac{1}{3},\frac{2}{3}]}(\omega), \qquad
X_6(\omega) = \omega + I_{[\frac{2}{3},1]}(\omega), \\
&\;\,\vdots
\end{aligned}
$$
To see the logic behind this sequence, notice that the interval [0, 1] is sliced in two parts to define X2 and X3, then in three parts to define X4, X5, and X6, and so on. These slices get smaller and smaller as n increases. Whatever ω we choose, X1(ω) = ω + 1. Each of the other variables in the sequence takes either the value ω or the value ω + 1, depending on whether ω falls in the interval of the indicator function associated with that variable.
Now let us consider the random variable X(ω) = ω. The sequence Xn converges in probability to X, i.e., $X_n \xrightarrow{P} X$. To see this, consider the probability

$$P\bigl\{ |X_n - X| \ge \epsilon \bigr\}. \tag{9.37}$$
The random variables Xn(ω) and X(ω) differ on a subinterval of Ω that becomes smaller and smaller as n increases; in other words, the measure of this subinterval goes to zero. In fact, for a suitably small ε > 0, the probability in Eq. (9.37) is just the probability that ω falls in this subinterval; since this probability goes to zero, Xn converges in probability to X. However, the sequence Xn does not converge almost surely to X. To see this, fix an arbitrary ω: as n increases, the sequence Xn(ω) keeps returning to the value ω + 1, however far we go, so there is no pointwise convergence.
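A small simulation sketch can make the contrast concrete (assuming the interval-slicing scheme described above; the indexing helper `interval` below is our own bookkeeping, not notation from the text):

```python
# Sketch of Example 9.32: X_n(omega) = omega + indicator of the n-th subinterval.
# Block k = 1, 2, 3, ... slices [0, 1] into k equal closed subintervals.

def interval(n: int) -> tuple:
    """Return the endpoints (a, b) of the subinterval attached to index n."""
    k, first = 1, 1
    while n >= first + k:      # locate the block containing index n
        first += k
        k += 1
    j = n - first              # position within block k (0-based)
    return j / k, (j + 1) / k

def X_n(omega: float, n: int) -> float:
    a, b = interval(n)
    return omega + (1.0 if a <= omega <= b else 0.0)

omega = 0.37
jumps = [n for n in range(1, 200) if X_n(omega, n) == omega + 1.0]
print(jumps)   # one index per block: the value omega + 1 recurs forever,
               # so X_n(omega) has no pointwise limit.

# On the other hand, the n-th subinterval has width 1/k, so for 0 < eps <= 1 we get
# P(|X_n - X| >= eps) = 1/k, which goes to zero: convergence in probability holds.
```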
This counterexample shows that convergence in probability does not imply almost-sure convergence, whereas it can be shown that almost-sure convergence does imply convergence in probability. Not surprisingly, by relying on almost-sure convergence we obtain a version of the law of large numbers that is stronger than the one in Theorem 9.8.
THEOREM 9.13 (Strong law of large numbers) Let X1, X2,… be a sequence of i.i.d. random variables, with E[Xi] = μ and finite variance Var(Xi) = σ² < +∞. Then, we have

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \xrightarrow{\text{a.s.}} \mu.$$
A comparison of this theorem with Theorem 9.8, the weak law of large numbers, may be puzzling, as the two statements look essentially the same; the only difference is the mode of convergence. Actually, finiteness of the variance is a stronger condition than necessary, and alternative statements can be found in the literature; for instance, we could just require E[|Xn|] < ∞. The price we pay for relaxing the assumptions is a considerably more involved proof, which depends on the type of stochastic convergence involved. We leave such technicalities aside and just remark that the convergence concept in the strong law of large numbers is indeed stronger than the one used in the weak law.
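As a numerical complement (a minimal simulation sketch; the choice of exponential random variables with mean μ = 2, the seed, and the checkpoints are arbitrary illustrative choices), the running sample mean along a single simulated path settles down to μ, in line with almost-sure convergence:

```python
# Simulation sketch of the strong law of large numbers (illustrative only).
# The running sample mean of i.i.d. exponential variables with mean mu = 2
# is tracked along one sample path.

import random

random.seed(42)
mu = 2.0
running_sum = 0.0

for n in range(1, 100_001):
    running_sum += random.expovariate(1.0 / mu)   # i.i.d. draw with E[X_i] = mu
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n = {n:>6}: sample mean = {running_sum / n:.4f}")

# The printed path-wise averages approach mu = 2 as n grows; the strong law says
# this happens for (almost) every sample path, not just on average.
```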