Recap

This post is a follow-up to my earlier post on the falling moment generating function

$$\Phi_X(t) = E(1+t)^X = \sum_{k=0}^\infty E[X^{(k)}] \, \frac{t^k}{k!},$$

where $X^{(k)} = X(X-1)\cdots(X-k+1)$ is the falling factorial.

I argued that the FMGF is a natural discrete equivalent to the moment generating function (MGF) $M_X(t) = E e^{tX}$. I also argued that although it is very similar to the probability generating function (PGF) $G_X(t) = E t^X$, since $\Phi_X(t) = G_X(1+t)$, it seems to give slightly more pleasant expressions for the common distributions and a slightly simpler proof of the “law of small numbers”.
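To make the definition concrete, here is a small Python sketch (the pmf is an arbitrary choice of mine, purely for illustration) checking that the coefficient of $t^k/k!$ in $E(1+t)^X$ really is the $k$-th falling moment: by the binomial theorem, $(1+t)^x = \sum_k \binom{x}{k} t^k$, so the coefficient of $t^k$ is $E\binom{X}{k} = E[X^{(k)}]/k!$.

```python
from math import comb, factorial

# Hand-picked pmf, purely as an illustration: P(X = x)
pmf = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}

def falling_moment(p, k):
    """E[X(X-1)...(X-k+1)] for a pmf given as {value: probability}."""
    total = 0.0
    for x, prob in p.items():
        term = 1.0
        for j in range(k):
            term *= x - j
        total += prob * term
    return total

# Coefficient of t^k in E(1+t)^X is E[C(X,k)]; multiplying by k!
# should recover the k-th falling moment.
for k in range(4):
    coeff = sum(prob * comb(x, k) for x, prob in pmf.items())
    print(k, coeff * factorial(k), falling_moment(pmf, k))
```

The two printed columns agree for every $k$, which is exactly the expansion above.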

After that blogpost, Oliver Johnson pointed out an even better argument for the FMGF, based on the idea of “thinning”. (Olly is the author of the Substack “Logging the World” and the book “Numbercrunch”, and was my PhD supervisor back in the day.)

Scaling and thinning

All the “GFs” work well with adding up independent random variables – to find the GF of the independent sum $X+Y$, we just multiply the GF for $X$ and the GF for $Y$. So with the FMGF, for example, we have $\Phi_{X+Y}(t) = \Phi_X(t)\,\Phi_Y(t)$, because

$$E(1+t)^{X+Y} = E(1+t)^X \, E(1+t)^Y.$$

The MGF works particularly well with scaling random variables. By scaling, I just mean multiplying by a constant, so going from $X$ to $aX$. We have $M_{aX}(t) = M_X(at)$, since

$$E e^{t(aX)} = E e^{(at)X}.$$

But, while scaling is a natural operation for continuous random variables, it doesn’t really make sense for discrete random variables. (From now on, when I say “discrete”, I’ll specifically mean random variables that take values in the non-negative integers $\{0, 1, 2, 3, \dots\}$.) Scaling a continuous random variable that takes values in $\mathbb{R}$ by a factor of $a = 0.7$ gives you another continuous random variable that takes values in $\mathbb{R}$; but scaling a discrete random variable that takes values in $\{0, 1, 2, 3, \dots\}$ by a factor of $a = 0.7$ gives you a random variable that takes values in $\{0, 0.7, 1.4, 2.1, \dots\}$, which is not really a comparable thing.

Instead, for discrete random variables, a more natural operation is thinning. To thin a discrete random variable $X$ by a constant $a \in [0,1]$, we can think of $X$ objects as trying to arrive, but each one only making it with probability $a$ and getting lost with probability $1-a$. Formally, if we write $a \circ X$ for the $a$-thinning of $X$, we have

$$a \circ X = \sum_{i=1}^X B_i,$$

where the $B_i$ are IID Bernoulli($a$) random variables.

Like scaling, thinning reduces the expectation $EX = \mu$ to $E(a \circ X) = a\mu$; but unlike scaling, thinning makes sure we still only get non-negative integer outcomes. So, in this sense, thinning is the natural discrete equivalent of scaling.
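Thinning is easy to simulate directly from the Bernoulli-sum definition. Here is a minimal sketch (the uniform distribution for $X$ and the value $a = 0.3$ are arbitrary choices for illustration): the empirical mean of the $a$-thinned samples should come out close to $a\mu$.

```python
import random

rng = random.Random(0)

def thin(x, a):
    """a-thinning of the outcome x: each of the x objects survives
    independently with probability a (a sum of x Bernoulli(a) trials)."""
    return sum(1 for _ in range(x) if rng.random() < a)

# X uniform on {0, ..., 10} has mu = 5, so the empirical mean of the
# 0.3-thinned samples should be close to 0.3 * 5 = 1.5.
a, trials = 0.3, 100_000
samples = [thin(rng.randrange(11), a) for _ in range(trials)]
print(sum(samples) / trials)  # close to 1.5
```

Note that every sample is still a non-negative integer, unlike the scaled variable $0.3 X$.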

So, what happens to the FMGF if we thin a random variable? Well, it’s not too difficult to check that we get

$$\Phi_{a \circ X}(t) = E(1+t)^{a \circ X} = E\big(E\big((1+t)^{a \circ X} \mid X\big)\big) = E(1+at)^X = \Phi_X(at),$$

where we used the fact that, conditional on $X$, the thinning $a \circ X$ is Binomial$(X, a)$, whose conditional FMGF is $(1+at)^X$.

So we see that the way the FMGF behaves under thinning, $\Phi_{a \circ X}(t) = \Phi_X(at)$, is exactly the way that the MGF behaves under scaling, $M_{aX}(t) = M_X(at)$. (This would not work nearly as well for the PGF, which has the very awkward expression $G_{a \circ X}(t) = G_X(at - a + 1)$.)

Comparing coefficients of $t^k$ in $\Phi_{a \circ X}(t) = \Phi_X(at)$ (or, more formally, by differentiating $k$ times and taking $t = 0$), we learn how thinning changes the falling moments: specifically, we get $E\big[(a \circ X)^{(k)}\big] = a^k \, E[X^{(k)}]$. This is the discrete equivalent of how scaling changes the moments: $E(aX)^k = a^k \, EX^k$.
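The thinning identity $\Phi_{a \circ X}(t) = \Phi_X(at)$ can also be checked exactly, without simulation, by computing the thinned pmf from the conditional Binomial law. A small sketch (the pmf and the values of $a$ and $t$ are arbitrary choices of mine):

```python
from math import comb

# Hand-picked pmf and parameters, purely as an illustration
pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # P(X = x)
a, t = 0.4, 0.7

def fmgf(p, t):
    """FMGF Phi(t) = E(1+t)^X of a pmf given as {value: probability}."""
    return sum(prob * (1 + t) ** x for x, prob in p.items())

# pmf of the a-thinning: conditional on X = x, the thinning is Binomial(x, a)
thinned = {}
for x, prob in pmf.items():
    for j in range(x + 1):
        thinned[j] = thinned.get(j, 0.0) + prob * comb(x, j) * a**j * (1 - a) ** (x - j)

print(fmgf(thinned, t), fmgf(pmf, a * t))  # equal, up to floating-point rounding
```

Both sides evaluate to the same number, as the identity predicts.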

Large numbers

The way that the MGF works so well with adding independent random variables and scaling random variables allows us to prove a very important result called the law of large numbers.

Let $X_1, X_2, \dots$ be IID random variables with expectation $\mu$. Then consider the summed-and-scaled random variable

$$Y_n = \frac{1}{n}(X_1 + X_2 + \cdots + X_n).$$

The law of large numbers says that $Y_n \to \mu$ as $n \to \infty$, by which we mean that $Y_n$ tends in distribution to the point mass at $\mu$.

The law of large numbers can be proved using the MGF. Note that by the properties of the MGF we’ve discussed in this post, we have $M_{Y_n}(t) = M_X(t/n)^n$. If we take a Taylor expansion

$$M_X(t) = 1 + \mu t + \cdots$$

of the MGF of X, then the MGF of Yn is

$$M_{Y_n}(t) = M_X\left(\frac{t}{n}\right)^n = \left(1 + \frac{\mu t}{n} + \cdots\right)^n \to e^{\mu t}.$$

But $e^{\mu t}$ is the MGF of the point mass at $\mu$, which proves the result.
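The convergence $M_X(t/n)^n \to e^{\mu t}$ is easy to watch happen numerically. A quick sketch, taking $X \sim$ Bernoulli$(p)$ as an arbitrary example (so $M_X(t) = 1 - p + pe^t$ and $\mu = p$):

```python
from math import exp

# For X ~ Bernoulli(p): M_X(t) = 1 - p + p e^t, and mu = p,
# so M_X(t/n)^n should approach e^{p t} as n grows.
p, t = 0.3, 1.0
target = exp(p * t)
for n in (1, 10, 100, 10_000):
    approx = (1 - p + p * exp(t / n)) ** n
    print(n, approx, target)
```

The printed values close in on $e^{pt}$ as $n$ increases, just as the Taylor-expansion argument says they should.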

Thin numbers

But we can now follow exactly the same argument for discrete random variables, with thinning instead of scaling, and the FMGF instead of the MGF.

Let $X_1, X_2, \dots$ be IID discrete random variables with expectation $\mu$. Then consider the summed-and-thinned random variable

$$Y_n = \frac{1}{n} \circ (X_1 + X_2 + \cdots + X_n).$$

The “law of thin numbers” (as Harremoës, Johnson and Kontoyiannis call it) says that $Y_n \to \mathrm{Po}(\mu)$ as $n \to \infty$, by which we mean that $Y_n$ tends in distribution to the Poisson distribution with rate $\mu$.

The law of thin numbers can be proved using the FMGF. Note that by the properties of the FMGF we’ve discussed in this post, we have $\Phi_{Y_n}(t) = \Phi_X(t/n)^n$. If we take a Taylor expansion

$$\Phi_X(t) = 1 + \mu t + \cdots$$

of the FMGF of X, then the FMGF of Yn is

$$\Phi_{Y_n}(t) = \Phi_X\left(\frac{t}{n}\right)^n = \left(1 + \frac{\mu t}{n} + \cdots\right)^n \to e^{\mu t}.$$

But $e^{\mu t}$ is the FMGF of the Poisson distribution with rate $\mu$, which proves the result.
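The law of thin numbers can also be seen directly by simulation. Here is a rough Monte Carlo sketch (the choice of $X_i$ uniform on $\{0,1,2,3\}$, and the values of $n$ and the number of trials, are arbitrary assumptions of mine): summing $n$ copies, thinning by $1/n$, and comparing the empirical distribution with the $\mathrm{Po}(\mu)$ pmf.

```python
import random
from math import exp, factorial

rng = random.Random(1)

# The X_i are uniform on {0, 1, 2, 3}, so mu = 1.5, and the thinned sum
# Y_n should be approximately Po(1.5) for moderately large n.
n, trials, mu = 100, 20_000, 1.5
counts = {}
for _ in range(trials):
    s = sum(rng.randrange(4) for _ in range(n))           # X_1 + ... + X_n
    y = sum(1 for _ in range(s) if rng.random() < 1 / n)  # (1/n)-thinning
    counts[y] = counts.get(y, 0) + 1

# Compare empirical frequencies with the Poisson pmf e^{-mu} mu^k / k!
for k in range(5):
    empirical = counts.get(k, 0) / trials
    poisson = exp(-mu) * mu**k / factorial(k)
    print(k, round(empirical, 3), round(poisson, 3))
```

The two columns match closely, which is the law of thin numbers in action: the shape of the individual $X_i$ washes out, and only $\mu$ survives.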