Taken from Real Analysis and Probability by R.M. Dudley.

For a sequence of n repeated, independent trials of an experiment, some probability distributions and random variables converge as n tends to infinity. In proving such limit theorems, it is useful to be able to construct a probability space on which a sequence of independent random variables is defined in a natural way, namely as the coordinates of a countable Cartesian product.

The Cartesian product of finitely many \sigma-finite measure spaces gives a \sigma-finite measure space. For example, Cartesian products of Lebesgue measure on the line give Lebesgue measure on finite-dimensional Euclidean spaces. But suppose we take a measure space \{0,1\} with two points each having measure 1, \mu(\{0\})=1=\mu(\{1\}), and form a countable Cartesian product of copies of this space, so that the measure of any countable product of sets equals the product of their measures. Then we would get an uncountable space in which every singleton has measure 1, that is, the measure usually called counting measure. An uncountable set with counting measure is not \sigma-finite, even though in this example it arose as a countable product of finite measure spaces. By contrast, a countable product of probability spaces will again be a probability space. Here are some definitions.

For each n=1,2,\dots let (\Omega_n,\mathcal{S}_n,P_n) be a probability space. Let \Omega be the Cartesian product \displaystyle \prod_{n\geq 1}\Omega_n, that is, the set of all sequences \{\omega_n\}_{n\geq 1} with \omega_n\in\Omega_n for all n. For each n, let \pi_n be the natural projection of \Omega onto \Omega_n: \pi_n\left(\{\omega_m\}_{m\geq 1}\right)=\omega_n. Let \mathcal{S} be the smallest \sigma-algebra of subsets of \Omega such that for all m, \pi_m is measurable from (\Omega,\mathcal{S}) to (\Omega_m,\mathcal{S}_m). In other words, \mathcal{S} is the smallest \sigma-algebra containing all sets \pi_n^{-1}(A) for all n and all A\in\mathcal{S}_n.

Let \mathcal{R} be the collection of all sets \displaystyle \prod_{n\geq 1}A_n\subset\Omega where A_n\in \mathcal{S}_n for all n and A_n=\Omega_n except for at most finitely many values of n. Elements of \mathcal{R} will be called rectangles. Recall the notion of a semiring; \mathcal{R} has this property.


Definition: For any set X, a collection \mathcal{D}\subset 2^X is called a semiring if \emptyset\in\mathcal{D} and for any A and B in \mathcal{D}, we have A\cap B\in\mathcal{D} and \displaystyle A\backslash B=\bigcup_{j=1}^n C_j for some finite n and disjoint C_j\in\mathcal{D}.
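For finite examples the semiring axioms can be checked mechanically. The following is a minimal brute-force sketch, assuming the collection is a finite Python set of frozensets of integers; the function names is_semiring and is_disjoint_union_of are hypothetical, not from the book. The disjoint-union test is an exact-cover search: the smallest uncovered point must lie in exactly one chosen piece. Half-open integer intervals give the standard example of a semiring that is not a ring, since \{0,1,2,3\}\backslash\{1,2\}=\{0,3\} is not an interval.

```python
def is_disjoint_union_of(target, members):
    """Can `target` be written as a disjoint union of sets from `members`?

    Exact-cover search: the smallest point of `target` must lie in exactly one
    chosen piece, so it is enough to branch on pieces containing that point.
    """
    if not target:
        return True  # the empty union
    point = min(target)
    for piece in members:
        if point in piece and piece <= target:
            if is_disjoint_union_of(target - piece, members):
                return True
    return False


def is_semiring(collection):
    """Brute-force check of the semiring axioms for a finite set of frozensets."""
    if frozenset() not in collection:
        return False
    for a in collection:
        for b in collection:
            if a & b not in collection:
                return False  # not closed under intersection
            if not is_disjoint_union_of(a - b, collection):
                return False  # A \ B is not a finite disjoint union of members
    return True


# Half-open integer intervals {a, ..., b-1} inside X = {0, 1, 2, 3}.
intervals = {frozenset(range(a, b)) for a in range(5) for b in range(a, 5)}
print(is_semiring(intervals))  # True

bad = {frozenset(), frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 1, 2})}
print(is_semiring(bad))  # False: {0, 1} & {1, 2} = {1} is missing
```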


Proposition: The collection \mathcal{R} of rectangles in the infinite product \Omega is a semiring. The algebra \mathcal{A} generated by \mathcal{R} is the collection of finite disjoint unions of elements of \mathcal{R}.

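Proof : If C and D are any two rectangles, then C\cap D is a rectangle: intersections are taken coordinatewise, and the n-th factor of C\cap D differs from \Omega_n only if the n-th factor of C or of D does, which happens for only finitely many n. In a product of two spaces, the collection of rectangles is a semiring (p.95, Proposition 3.2.2). Specifically, a difference of two rectangles is a union of two disjoint rectangles: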

\displaystyle (A\times B)\backslash (E\times F)=((A\backslash E)\times B)\cup ((A\cap E)\times(B\backslash F)).

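It follows by induction that in any finite Cartesian product, any difference C\backslash D of rectangles is a finite disjoint union of rectangles. For C,D\in\mathcal{R}, all but finitely many factors of both equal \Omega_n, so for some m we can write C=C'\times\prod_{n>m}\Omega_n and D=D'\times\prod_{n>m}\Omega_n with C',D' rectangles in \Omega_1\times\cdots\times\Omega_m; then C\backslash D=(C'\backslash D')\times\prod_{n>m}\Omega_n, and the finite case applies. Thus \mathcal{R} is a semiring. We have \Omega\in\mathcal{R}, so the ring generated by \mathcal{R} is an algebra (p.96, Proposition 3.2.3). Since every algebra is a ring, this ring is \mathcal{A}, the algebra generated by \mathcal{R}. By Proposition 3.2.3, \mathcal{A} consists of all finite disjoint unions of elements of \mathcal{R}. \bullet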
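The inductive step can be made concrete on finite sets. In this sketch (an illustration, not the book's notation) a rectangle in a finite product is a tuple of Python sets, one per coordinate, and the hypothetical function rect_difference writes C\backslash D as a disjoint union by cutting at the first coordinate k where a point of C leaves D. For two coordinates it reduces exactly to the displayed identity ((A\backslash E)\times B)\cup((A\cap E)\times(B\backslash F)).

```python
from itertools import product


def rect_points(rect):
    """All points of a rectangle given as a tuple of finite sets (one per coordinate)."""
    return set(product(*rect))


def rect_difference(c, d):
    """Split C \\ D into disjoint rectangles, for rectangles C, D in a finite product.

    Piece k uses C_i & D_i on coordinates i < k, C_k - D_k on coordinate k, and
    C_i on coordinates i > k. A point of C \\ D lies in the piece indexed by the
    first coordinate where it leaves D, so the pieces are disjoint.
    """
    pieces = []
    for k in range(len(c)):
        diff_k = c[k] - d[k]
        if not diff_k:
            continue  # this piece would be empty
        piece = tuple(c[i] & d[i] for i in range(k)) + (diff_k,) + tuple(c[k + 1:])
        pieces.append(piece)
    return pieces


C = ({0, 1, 2}, {0, 1}, {0, 1, 2})
D = ({1, 2}, {1}, {0, 2})
pieces = rect_difference(C, D)
lhs = rect_points(C) - rect_points(D)
union = set().union(*(rect_points(p) for p in pieces))
print(union == lhs)                                           # True
print(sum(len(rect_points(p)) for p in pieces) == len(lhs))   # True, so disjoint
```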

Now for A=\prod_n A_n\in\mathcal{R}, let P(A):=\prod_nP_n(A_n). The product converges since all but finitely many factors are 1. Here is the main theorem to be proved in the rest of this section:

Theorem: Existence Theorem for Infinite Product Probabilities

P on \mathcal{R} extends uniquely to a (countably additive) probability measure on \mathcal{S}.
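Before any extension, P is concrete on rectangles: a rectangle constrains only finitely many coordinates, so P(A)=\prod_nP_n(A_n) is effectively a finite product. A minimal sketch, assuming hypothetically that every \Omega_n=\{0,1\} and that P_n is supplied by a function marginals(n) returning a dict of point masses; a rectangle is a dict mapping each constrained index n to A_n, and the names bernoulli, rectangle_probability, fair_coin and biased_coin are illustrative only.

```python
from fractions import Fraction
from math import prod


def bernoulli(p):
    """P_n on Omega_n = {0, 1} with P_n({1}) = p (a hypothetical choice of factor)."""
    return {0: 1 - p, 1: p}


def rectangle_probability(rect, marginals):
    """P(prod_n A_n) = prod_n P_n(A_n) for a rectangle given as {n: A_n}.

    Coordinates missing from `rect` have A_n = Omega_n and contribute the
    factor P_n(Omega_n) = 1, so only finitely many factors are computed.
    """
    return prod(sum(marginals(n)[x] for x in a_n) for n, a_n in rect.items())


def fair_coin(n):
    return bernoulli(Fraction(1, 2))


def biased_coin(n):
    # P_n({1}) = 1/(n+1): the factors need not be identical
    return bernoulli(Fraction(1, n + 1))


# {omega : omega_1 = 1 and omega_3 = 0}; all other coordinates are free
print(rectangle_probability({1: {1}, 3: {0}}, fair_coin))    # 1/4
print(rectangle_probability({}, fair_coin))                  # 1 (the whole space)
print(rectangle_probability({1: {1}, 3: {0}}, biased_coin))  # 3/8
```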

Proof : For each A\in\mathcal{A}, write A as a finite disjoint union of sets in \mathcal{R}, say

\displaystyle A=\bigcup_{r=1}^N B_r,

and define

\displaystyle P(A):=\sum_{r=1}^NP(B_r).

Let us first show that P is well-defined and finitely additive on \mathcal{A}. Each B_r is a product of sets A_{rn}\in\mathcal{S}_n with A_{rn}=\Omega_n for all n\geq n(r), for some finite n(r). Let m be the maximum of the n(r) for r=1,\dots,N. Then since all the A_{rn} equal \Omega_n for n\geq m, properties of P on such sets are equivalent to properties of the finite product measure on \Omega_1\times\cdots\times\Omega_m. To show that P is well-defined, suppose a set has two different representations as finite disjoint unions of sets in \mathcal{R}; we can take the maximum of the values of m for the two unions and still work in a finite product, where both sums equal the finite product measure of the set. So P is well-defined and finitely additive on \mathcal{A} by the finite product measure theorem (p.139, Theorem 4.4.6).
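Well-definedness can be sanity-checked numerically. This sketch assumes, hypothetically, that every factor is a fair coin on \{0,1\}; it writes the set \{\omega_1=1\}\cup\{\omega_2=1\} as two different disjoint unions of rectangles and compares the two sums defining P with the finite product measure computed by enumerating \Omega_1\times\cdots\times\Omega_m. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import prod

OMEGA_N = (0, 1)                              # hypothetical: every Omega_n = {0, 1}
P_N = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # hypothetical: every P_n a fair coin


def p_rect(rect):
    """P of a rectangle {n: A_n}; coordinates not listed are Omega_n (factor 1)."""
    return prod(sum(P_N[x] for x in a) for a in rect.values())


def p_via_union(rects):
    """The definition of P on the algebra: sum of P over a finite *disjoint* union."""
    return sum(p_rect(r) for r in rects)


def p_by_enumeration(rects, m):
    """Finite product measure of the union on Omega_1 x ... x Omega_m (m >= every index used)."""
    total = Fraction(0)
    for x in product(OMEGA_N, repeat=m):  # x[n - 1] is the n-th coordinate
        if any(all(x[n - 1] in a for n, a in r.items()) for r in rects):
            total += prod(P_N[c] for c in x)
    return total


# The set {w1 = 1} union {w2 = 1}, written as two different disjoint unions.
rep_a = [{1: {1}}, {1: {0}, 2: {1}}]  # {w1 = 1} and {w1 = 0, w2 = 1}
rep_b = [{2: {1}}, {1: {1}, 2: {0}}]  # {w2 = 1} and {w1 = 1, w2 = 0}

print(p_via_union(rep_a), p_via_union(rep_b))                  # 3/4 3/4
print(p_by_enumeration(rep_a, 2), p_by_enumeration(rep_b, 3))  # 3/4 3/4
```

Enumerating with m=3 for rep_b, although only two coordinates are constrained, mirrors the step in the proof where the larger of the two values of m is used.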

If P is countably additive on \mathcal{A}, then by the Carathéodory Extension Theorem it has a unique countably additive extension to the \sigma-algebra generated by \mathcal{A}, which is \mathcal{S}. So it is enough to prove countable additivity on \mathcal{A}. Equivalently, if A_j\in \mathcal{A}, A_1\supset A_2\supset\cdots, and \cap_j A_j=\emptyset, we want to prove P(A_j)\downarrow 0 (“P is continuous at \emptyset”; p.86, Theorem 3.1.1). In other words, if A_j is a decreasing sequence of sets in \mathcal{A} and for some \varepsilon>0, P(A_j)\geq\varepsilon for all j, we must show \cap_jA_j\neq\emptyset.

Let P^{(0)}:=P on \mathcal{A}. For each n\geq 1, let

\displaystyle \Omega^{(n)}:=\prod_{m>n}\Omega_m.

Let \mathcal{A}^{(n)} and P^{(n)} be defined on \Omega^{(n)} just as \mathcal{A} and P were on \Omega. For each E\subset \Omega and x_i\in\Omega_i, i=1,\dots,n, let

\displaystyle E^{(n)}(x_1,\dots,x_n):=\left\{\{x_m\}_{m>n}\in\Omega^{(n)}:x=\{x_i\}_{i\geq 1}\in E\right\}.

For a set A in a product space X\times Y and x\in X, let A_x:=\{y\in Y:(x,y)\in A\}. If A is in a product \sigma-algebra \mathcal{S}\otimes\mathcal{T}, then A_x\in\mathcal{T} (as shown in the proof of Theorem 4.4.3, p. 135). For any E\in\mathcal{A} there is an N large enough so that

\displaystyle E=F\times\prod_{n>N}\Omega_n for some \displaystyle F\subset \prod_{n\leq N}\Omega_n.

Indeed, E is a finite disjoint union of rectangles, each of which has this form for some N, so we can take N to be the maximum of these values. Then

\displaystyle F=\bigcup_{k=1}^m F_k,

where the F_k are disjoint and

\displaystyle F_k=\prod_{i=1}^NF_{ki}

for some F_{ki}\in\mathcal{S}_i, i=1,\dots,N, k=1,\dots,m. Now for any n<N and x_i\in\Omega_i, i=1,\dots,n, we have E^{(n)}(x_1,\dots,x_n)=G\times \Omega^{(N)}, where G is the union of those sets

\displaystyle \prod_{i=n+1}^NF_{ki} such that x_i\in F_{ki} for all i=1,\dots,n.
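The recipe for E^{(n)}(x_1,\dots,x_n) translates directly into code. In this sketch (a hypothetical representation, with binary coordinates for the check) E is a list of rectangles \{i: F_{ki}\} with i\leq N, and the function section keeps the rectangles whose constraints on coordinates i\leq n are met by x_1,\dots,x_n, dropping those coordinates. The loop compares the result with the definition of E^{(n)} point by point. For n=N the section is either empty or a list containing the unconstrained rectangle {}, that is, all of \Omega^{(N)}.

```python
from itertools import product


def section(rects, xs):
    """E^{(n)}(x_1, ..., x_n) for E given as a finite union of rectangles {i: F_ki}.

    xs = (x_1, ..., x_n). A rectangle contributes iff x_i is in F_ki for every
    constrained coordinate i <= n; its constraints with i > n form a rectangle
    in Omega^{(n)}. An empty dict {} stands for all of Omega^{(n)}.
    """
    n = len(xs)
    return [
        {i: f for i, f in r.items() if i > n}
        for r in rects
        if all(xs[i - 1] in f for i, f in r.items() if i <= n)
    ]


def contains(rects, point):
    """Is the point (coordinates 1, ..., N as a tuple) in the union of the rectangles?"""
    return any(all(point[i - 1] in f for i, f in r.items()) for r in rects)


# Hypothetical example with N = 4 binary coordinates: two disjoint rectangles.
N = 4
E = [{1: {1}, 3: {0}}, {1: {0}, 2: {1}, 4: {1}}]

for n in range(N + 1):
    for xs in product((0, 1), repeat=n):
        for tail in product((0, 1), repeat=N - n):
            # Definition: tail is in E^{(n)}(xs) iff (xs, tail) is in E. The
            # section only constrains coordinates > n, so testing it at
            # xs + tail looks only at the tail.
            assert contains(section(E, xs), xs + tail) == contains(E, xs + tail)

print(section(E, (1,)))    # [{3: {0}}]
print(section(E, (0, 1)))  # [{4: {1}}]
```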

Thus E^{(n)}(x_1,\dots,x_n)\in \mathcal{A}^{(n)}, so P^{(n)} of it is defined. Then by the Tonelli-Fubini theorem applied to the finite product \left(\prod_{i=1}^n\Omega_i\right)\times\prod_{i=n+1}^N\Omega_i, we have

\displaystyle P(E)=\int P^{(n)}\left(E^{(n)}(x_1,\dots,x_n)\right)\,\prod_{j=1}^ndP_j(x_j).
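The identity can be checked exactly on a small finite example. The sketch below assumes three hypothetical factors \Omega_j=\{0,1,2\} with the point masses listed in MARGINALS (0-based index j in the code is coordinate j+1 in the text), takes E=\{x:x_1+x_2+x_3\geq 3\}, and computes the right-hand side for every n=0,\dots,3 by summing P^{(n)}\left(E^{(n)}(x_1,\dots,x_n)\right) against the product masses of x_1,\dots,x_n. Exact Fraction arithmetic turns the comparison into an equality test.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Hypothetical factors Omega_j = {0, 1, 2}, each with its own point masses.
MARGINALS = [
    {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)},
    {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)},
    {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)},
]
N = len(MARGINALS)


def mass(coords, first):
    """Product of point masses of coords, placed at 0-based coordinates first, first + 1, ..."""
    return prod(MARGINALS[first + k][c] for k, c in enumerate(coords))


def tail_measure(E, xs):
    """P^{(n)}(E^{(n)}(x_1, ..., x_n)) with n = len(xs), summing over the remaining coordinates."""
    n = len(xs)
    tails = product(*(MARGINALS[j] for j in range(n, N)))
    return sum((mass(t, n) for t in tails if xs + t in E), Fraction(0))


def iterated_integral(E, n):
    """Integral of the tail measure against P_1 x ... x P_n."""
    heads = product(*(MARGINALS[j] for j in range(n)))
    return sum((mass(xs, 0) * tail_measure(E, xs) for xs in heads), Fraction(0))


E = {x for x in product(range(3), repeat=N) if sum(x) >= 3}
p_E = tail_measure(E, ())  # n = 0 gives P(E) itself
for n in range(N + 1):
    assert iterated_integral(E, n) == p_E
print(p_E)
```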

For \varepsilon with P(A_j)\geq\varepsilon for all j, let

\displaystyle F_j:=\left\{x_1\in\Omega_1:P^{(1)}\left(A_j^{(1)}(x_1)\right)>\varepsilon/2\right\}.

For each j, apply the formula for P to E=A_j and n=1. Then

\displaystyle\varepsilon\leq P(A_j)=\int P^{(1)}\left(A_j^{(1)}(x_1)\right)\,dP_1(x_1)

\displaystyle =\left(\int_{F_j}+\int_{\Omega_1\backslash F_j}\right)P^{(1)}\left(A_j^{(1)}(x_1)\right)\,dP_1(x_1)

\displaystyle \leq P_1(F_j)+\varepsilon/2,

since the integrand is at most 1 on F_j and at most \varepsilon/2 on \Omega_1\backslash F_j.

Thus P_1(F_j)\geq\varepsilon/2 for all j. As j increases, the sets A_j decrease; thus so do the sections A_j^{(1)}(x_1) for each x_1, and hence so do the F_j.
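The step from P(A_j)\geq\varepsilon to P_1(F_j)\geq\varepsilon/2 uses only that the integrand takes values in [0,1]: for any such g and threshold t\geq 0, \int g\,dP_1\leq P_1(g>t)+t. A quick randomized check on finite spaces with exact arithmetic, as a sketch (the helper name threshold_bound_holds is hypothetical):

```python
import random
from fractions import Fraction


def threshold_bound_holds(weights, g, t):
    """Check  sum_x w(x) g(x) <= P(g > t) + t  on a finite space.

    weights: point masses summing to 1; g: values in [0, 1]; t: threshold >= 0.
    On {g > t} the integrand is at most 1, and on {g <= t} it is at most t.
    """
    integral = sum(w * v for w, v in zip(weights, g))
    tail = sum(w for w, v in zip(weights, g) if v > t)
    return integral <= tail + t


rng = random.Random(0)
for _ in range(1000):
    raw = [rng.randint(1, 10) for _ in range(6)]
    weights = [Fraction(r, sum(raw)) for r in raw]
    g = [Fraction(rng.randint(0, 20), 20) for _ in range(6)]
    t = Fraction(rng.randint(0, 20), 20)
    assert threshold_bound_holds(weights, g, t)
```

With t=\varepsilon/2 and an integral of at least \varepsilon, the bound forces P_1(g>\varepsilon/2)\geq\varepsilon/2, which is the statement about F_j; the same bound with t=\varepsilon/4 handles the G_j in the next step.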

Since P_1 is countably additive and the F_j decrease,

\displaystyle P_1\left(\bigcap_{j}F_j\right)=\lim_{j\to\infty}P_1(F_j)\geq \varepsilon/2>0

(by monotone convergence of indicator functions), so \cap_j F_j\neq\emptyset. Take any y_1\in\cap_j F_j. For x\in\Omega_2, let f_j(y_1,x):=P^{(2)}\left(A_j^{(2)}(y_1,x)\right), and let G_j:=\{x\in\Omega_2:f_j(y_1,x)>\varepsilon/4\}. Then G_j decreases as j increases,

\displaystyle \varepsilon/2<P^{(1)}\left(A_j^{(1)}(y_1)\right)=\int f_j(y_1,x)\,dP_2(x) for all j,

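where the inequality holds because y_1\in F_j. By the same argument as for the F_j, \varepsilon/2<\int f_j(y_1,x)\,dP_2(x)\leq P_2(G_j)+\varepsilon/4, so P_2(G_j)>\varepsilon/4 for all j. Since the G_j decrease and P_2 is countably additive, P_2\left(\bigcap_j G_j\right)\geq\varepsilon/4>0, so the intersection of all the G_j is non-empty in \Omega_2 and we can choose y_2 in it.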

Inductively, by the same argument there are y_n\in\Omega_n for all n such that P^{(n)}\left(A_j^{(n)}(y_1,\dots,y_n)\right)>\varepsilon/2^n for all j and n. Let y:=\{y_n\}_{n\geq 1}\in\Omega. To prove that y\in A_j for each j, choose n large enough (depending on j) so that for all x_1,\dots,x_n, A_j^{(n)}(x_1,\dots,x_n)=\emptyset or \Omega^{(n)}. This is possible since A_j\in\mathcal{A} depends on only finitely many coordinates. Since P^{(n)}\left(A_j^{(n)}(y_1,\dots,y_n)\right)>0, this section is not empty, so A_j^{(n)}(y_1,\dots,y_n)=\Omega^{(n)}, and hence y\in A_j. Hence \cap_j A_j\neq\emptyset. \bullet
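In a finite product \Omega_1\times\cdots\times\Omega_N the same coordinate-by-coordinate choice can be run directly. The sketch below is a finite-horizon illustration only: the infinite case has no last coordinate, which is why the proof needs the final step about sections being \emptyset or \Omega^{(n)}. It assumes a finite decreasing family A_1\supset A_2\supset A_3 of sets of tuples with P(A_j)\geq\varepsilon, and the helper names cond_prob and choose_point are hypothetical. At step n it picks y_n so that P^{(n)}\left(A_j^{(n)}(y_1,\dots,y_n)\right)>\varepsilon/2^n for every j.

```python
from fractions import Fraction
from itertools import product
from math import prod


def cond_prob(A, prefix, factors):
    """P^{(n)}(A^{(n)}(y_1, ..., y_n)) on a finite product, with n = len(prefix)."""
    n = len(prefix)
    total = Fraction(0)
    for tail in product(*factors[n:]):
        if prefix + tail in A:
            total += prod(factors[n + k][c] for k, c in enumerate(tail))
    return total


def choose_point(sets, eps, factors):
    """Pick y_1, ..., y_N one coordinate at a time, as in the proof.

    Assumes `sets` is a decreasing family with P(A_j) >= eps for all j.
    Keeps P^{(n)}(A_j^{(n)}(y_1..y_n)) > eps / 2**n for every j.
    """
    y = ()
    for n in range(1, len(factors) + 1):
        threshold = eps / 2 ** n
        candidates = [
            x for x in factors[n - 1]
            if all(cond_prob(A, y + (x,), factors) > threshold for A in sets)
        ]
        # The argument in the text shows this set has positive P_n-measure.
        assert candidates, "no admissible coordinate: check the assumptions"
        y += (candidates[0],)
    return y


coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
factors = [coin] * 4
cube = set(product((0, 1), repeat=4))
A1 = {x for x in cube if sum(x) >= 2}  # P = 11/16
A2 = {x for x in A1 if x[0] == 1}      # P = 7/16
A3 = {x for x in A2 if x[3] == 0}      # P = 3/16

y = choose_point([A1, A2, A3], Fraction(3, 16), factors)
print(y)  # (1, 0, 1, 0)
```

After the last coordinate, each tail measure is 1 or 0 according to whether y lies in A_j, so the strict positivity kept at every step forces y into every A_j.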

Actually, this theorem holds for arbitrary (not necessarily countable) products of probability spaces. The proof needs no major change, since each set in the \sigma-algebra \mathcal{S} depends only on countably many coordinates. In other words, given a product \prod_{i\in I}\Omega_i, where I is a possibly uncountable index set, for each set A\in\mathcal{S} there is a countable subset J of I and a measurable set B\subset\prod_{i\in J}\Omega_i such that A=B\times\prod_{i\not\in J}\Omega_i.