Previously: Falling moments I, Falling and thinning (Falling moments II).

The Hermite distribution, introduced by Kemp & Kemp, is defined like this. It is a discrete distribution on the non-negative integers. It has two parameters, $\mu$ and $\delta$, where $\mu \geq \delta \geq 0$. Now, let $Y \sim \operatorname{Po}(\mu - \delta)$ and $Z \sim \operatorname{Po}(\delta/2)$ be independent Poisson distributions; then $X = Y + 2Z$ has the Hermite distribution $X \sim \operatorname{Herm}(\mu, \delta)$.

(I’m using a different parametrisation from Kemp & Kemp and Wikipedia that I think will work much more nicely in what follows.)

It is known that a Poisson distribution $W \sim \operatorname{Po}(\mu)$ is “equidispersed”, in that its expectation $\mathbb EW = \mu$ and its variance $\operatorname{Var}(W) = \mu$ are equal. But often in real life, count data is actually “overdispersed”, in that its variance is bigger than its mean. The Hermite distribution is an attempt to model data that is similar to a Poisson but slightly overdispersed. Note that if $X \sim \operatorname{Herm}(\mu, \delta)$, then

\[\begin{gather*} \mathbb EX = \mathbb EY + 2\, \mathbb EZ = (\mu - \delta) + 2 \big(\frac{\delta}{2}\big) = \mu , \\ \operatorname{Var}(X) = \operatorname{Var}(X) + 2^2 \, \operatorname{Var}(Y) = (\mu - \delta) + 4 \big(\frac{\delta}{2}\big) = \mu + \delta . \end{gather*}\]

So the variance $\mu + \delta$ is bigger than the expectation $\mu$. Note the restriction $\delta \leq \mu$ means the distribution can only be “slightly overdispersed” – the variance can only be at most twice as big as the expectation.

But the reason that I’m interested in the Hermite distribution is that it seems to behave in many ways like a discrete (non-negative integer) version of the normal distribution. Let’s look at some properties.

Normal Hermite
Let $X \sim \operatorname{N}(\mu, \sigma^2)$ be normally distributed with expectation $\mu$ and variance $\sigma^2$. Let $X \sim \operatorname{Herm}(\mu, \delta)$ be Hermite distributed with expectation $\mu$ and dispersion $\delta$.
The expectation (first moment) is $\mathbb EX = \mu$. The expectation (first falling moment) is $\mathbb EX = \mu$.
The second moment is $\mathbb EX^2 = \mu^2 + \sigma^2$. The second falling moment is $\mathbb EX^{\underline{2}} = \mu^2 + \delta$.
The variance $\operatorname{Var}(X) = \mathbb EX^2 - (\mathbb EX)^2$ is $\operatorname{Var}(X) = \sigma^2$. The dispersion $\operatorname{Disp}(X) = \mathbb EX^{\underline{2}} - (\mathbb EX)^2$ is $\operatorname{Disp}(X) = \delta$.
If we set the variance $\sigma^2$ to 0, then $X \sim \operatorname{N}(\mu, 0)$ is a point mass with expectation $\mu$. If we set the dispersion $\delta$ to 0, then $X \sim \operatorname{Herm}(\mu, 0)$ is a Poisson distribution with expectation $\mu$.
If we scale $X$ by a scaling factor $a$, we get $aX \sim \operatorname{N}(a\mu, a^2 \sigma^2)$, where the expectation is scaled by $a$ and the variance by $a^2$. If we thin $X$ by a thinning factor $a$, we get $a \circ X \sim \operatorname{Herm}(a\mu, a^2 \delta)$, where the expectation is scaled by $a$ and the dispersion by $a^2$.
If we shift $X$ by adding an independent point mass with expectation $b$, we get $X + b \sim \operatorname{N}(\mu + b, \sigma^2)$, where the expectation is shifted by $b$ and the variance stays the same. If we shift $X$ by adding an independent Poisson distribution with expectation $b$, we get $X + \operatorname{Po}(b) \sim \operatorname{Herm}(\mu + b, \delta)$, where the expectation is shifted by $b$ and the dispersion stays the same.
If we add together two independent normal distributions $X_1 \sim \operatorname{N}(\mu_1, \sigma^2_1)$ and $X_2 \sim \operatorname{N}(\mu_2, \sigma^2_2)$, we get another normal distribution $X_1 + X_2 \sim \operatorname{N}(\mu_1 + \mu_2, \sigma^2_1 + \sigma^2_2)$ where the parameters have been added. If we add together two independent Hermite distributions $X_1 \sim \operatorname{Herm}(\mu_1, \delta_1)$ and $X_2 \sim \operatorname{Herm}(\mu_2, \delta_2)$, we get another Hermite distribution $X_1 + X_2 \sim \operatorname{Herm}(\mu_1 + \mu_2, \delta_1 + \delta_2)$ where the parameters have been added.
$X$ is infinitely divisible $X$ is infinitely divisible
The MGF of $X$ is $M_X(t) = \exp\big(\mu t + \tfrac12\sigma^2 t^2\big)$. The FMGF of $X$ is $\Phi_X(t) = \exp\big(\mu t + \tfrac12\delta t^2\big)$.
The normal distribution features as the limiting distribution in the central limit theorem. ???

Most of these are straightforward to check – perhaps the trickiest (and most surprising!) is the thinning result.

If we use the characterisation $X = Y + 2Z$, then $a \circ X = a \circ Y + a \circ (2Z)$. It’s straightforward that $a \circ Y \sim \operatorname{Po}(a(\mu - \delta))$. What about $a \circ (2Z)$? We can think of $2Z$ as representing pairs of items. With proability $a^2$, both items in a pair are lost; so from those pairs of items that are both lost or both arrive, we get $2\,\operatorname{Po}(a^2 \delta/ 2)$. With probability $2a(1-a)$, one of the pair arrives and the other is lost; so from the pairs of items where we lose just one, we get $\operatorname{Po}(2a(1-a)\delta/2)$. So, all together, we have

\[\begin{align*} a \circ X &= \operatorname{Po}\big(a(\mu - \delta)\big) + \operatorname{Po}\big(2a(1-a)\delta/2\big) + 2\,\operatorname{Po}(a^2 \delta/ 2) \\ &= \operatorname{Po}(a\mu - a\delta) + \operatorname{Po}(a\delta - a^2\delta) + 2\,\operatorname{Po}(a^2 \delta/ 2) \\ &= \operatorname{Po}(a\mu - a\delta + a\delta - a^2\delta) + 2\,\operatorname{Po}(a^2 \delta/ 2) \\ &= \operatorname{Po}(a\mu - a^2\delta) + 2\,\operatorname{Po}(a^2 \delta/ 2) \\ &\sim \operatorname{Herm}(a\mu, a^2\delta) . \end{align*}\]

For the FMGF, note that

\[\Phi_{2Z}(t) = \mathbb E(1+t)^{2Z} = \mathbb E \big((1+t)^2\big)^Z = \mathbb E(1 + 2t + t^2)^Z = \Phi_Z(2t + t^2) .\]

Thus we have

\[\begin{align*} \Phi_X(t) &= \Phi_Y(t)\,\Phi_Z(2t + t^2) \\ &= \exp\big((\mu - \delta)t\big) \, \exp\big(\frac{\delta}{2}(2t + t^2) \big) \\ &= \exp\big((\mu - \delta)t + \delta t + \tfrac12 \delta t^2 \big) \\ &= \exp\big(\mu t + \tfrac12 \delta t^2 \big) . \end{align*}\]

The FMGF gives an alternative proof of the thinning results: we have

\[\Phi_{a\circ X}(t) = \Phi_X(at) = \exp\big(a\mu t + \tfrac12 a^2\delta t^2 \big) ,\]

but I admit that I didn’t actually believe it until I’d checked it “by hand”.

(This post was largely inspired by this paper of Jørgensen & Kokonendji from 2015, which also, it turns out, covers most, if not everything, in my previous two posts on this topic.)