Fractional Moments of the Geometric Distribution

In one of my papers, I needed an upper bound on the fractional moments of the geometric distribution. The integer moments have a nice-looking upper bound involving factorials. In addition, Mathematica showed that the same upper bound holds for fractional moments once we use Gamma function instead of factorials. The problem was, I could not prove it. Worse, I could not find a proof after much looking on the Internet.

This post contains a proof that for any real \lambda > 1, the \lambdath moment of a geometric random variable with success probability \displaystyle p \geq 1/2 is at most \displaystyle \Gamma(\lambda + 1)/p^\lambda.

Geometric distribution and its moments

 

Definition 1 (Geometric distributions {\mathcal{G}_\alpha^+} and {\mathcal{G}_\alpha}) Let {\alpha \in (0, 1/2)}. Define {\mathcal{G}_\alpha^+} as the distribution on positive integers {k} satisfying {\Pr_{X \sim \mathcal{G}_\alpha^+}[X = k] = \alpha^{k-1}(1 - \alpha)}. Define {\mathcal{G}_\alpha} as the distribution on non-negative integers {k} satisfying {\Pr_{Y \sim \mathcal{G}_\alpha}[Y = k] = \alpha^k(1 - \alpha)}.

Let {X \sim \mathcal{G}_\alpha^+} and {Y \sim \mathcal{G}_\alpha}. Semantically, given an ordered list of independent and identical Bernoulli trials, {X} counts the number of trials to get the first success while {Y} counts the number of failures before the first success.

For any real {\lambda > 0}, we can express the moments of {X} in terms of polylogarithm functions, as follows:

\displaystyle \mathop\mathbb{E}[X^\lambda] = \sum_{x=1}^\infty{ x^\lambda (1-\alpha) \alpha^{x-1} } = \frac{1-\alpha}{\alpha} \sum_{x=1}^\infty{ \frac{\alpha^{x}}{x^{-\lambda} } } = \frac{1-\alpha}{\alpha}\mathrm{Li}_{-\lambda}(\alpha) \ \ \ \ \ (1)

where {\mathrm{Li}_{\lambda}} is the polylogarithm function of order {\lambda}. It is defined as

\displaystyle \mathrm{Li}_{\lambda}(\alpha) \triangleq \sum_{k=1}^\infty{ \frac{\alpha^k}{k^\lambda}} \qquad \text{for}\quad \alpha, \lambda \in \mathbb{R} \,. \ \ \ \ \ (2)

Polylogarithm functions of non-positive integer orders can be found using the recursive rule

\displaystyle \mathrm{Li}_1(\alpha) = -\log(1-\alpha) \, , \qquad \text{and} \qquad \dfrac{\partial}{\partial \alpha}{\mathrm{Li}_{\lambda}(\alpha)} = \frac{\mathrm{Li}_{\lambda-1}(\alpha)}{\alpha} \, ,\quad \lambda \in 0, 1, 2, 3, \ldots \, .

This allows us to compute

\displaystyle \mathrm{Li}_{-1}(\alpha) = \frac{\alpha}{(1-\alpha)^2}\, , \qquad \text{and} \qquad \mathrm{Li}_{-2}(\alpha) = \frac{\alpha (1+\alpha)}{(1-\alpha)^3} \, . \ \ \ \ \ (3)

Recall from~(1) that for any real {\lambda > 0}, {\mathop\mathbb{E}[X^\lambda] = \frac{1-\alpha}{\alpha}\mathrm{Li}_{-\lambda}(\alpha)}. It follows that

\displaystyle \mathop\mathbb{E} [X] = \frac{1}{1-\alpha} \quad\text{and}\quad \mathop\mathbb{E}[X^2] = \frac{1 + \alpha}{(1-\alpha)^2} \, . \ \ \ \ \ (4)

It is also easy to check that

\displaystyle \mathop\mathbb{E} Y^\lambda = \mathop\mathbb{E}[(X-1)^\lambda] = \alpha\,\mathop\mathbb{E}[X^\lambda] \,. \ \ \ \ \ (5)

Main Proposition

Proposition For any real {\lambda \geq 1, \alpha \in (0, 1/2)}, and {X \sim \mathcal{G}_\alpha^+},

\displaystyle \mathop\mathbb{E}[X^\lambda] \leq \frac{\Gamma(1+\lambda)}{(1-\alpha)^\lambda} \, , \ \ \ \ \ (6)

and an equality is achieved with {\lambda = 1} and {\alpha = 1/2}.

 

Proof.

Case: {\lambda} is an integer. (This special case has a short proof which we record here for completeness. Note, however, that this case is subsumed by the following cases.)

For any real {a > 0} and {\alpha \in (0, 1)} such that {\alpha e^a < 1}, the moment generating function of {X} is

\displaystyle \begin{array}{rcl} M_X(a) &=& \mathop\mathbb{E}[e^{a X}] = \sum_{\lambda=0}^\infty{\frac{a^\lambda \mathop\mathbb{E}[X^\lambda]}{\lambda!}} \nonumber \\ &=& \frac{1-\alpha}{\alpha} \sum_{k \geq 1} \alpha^k \sum_{\lambda \geq 0} \frac{a^\lambda k^\lambda }{\lambda!} = \frac{1-\alpha}{\alpha} \sum_{k \geq 1} (\alpha e^a)^k \\ &=& \frac{1-\alpha}{\alpha} (1 - \frac{1}{1-\alpha e^a}) \,, \quad\text{or} \end{array}

\displaystyle M_X(a) = \frac{(1-\alpha) e^a}{1-\alpha e^a}\,. \ \ \ \ \ (7)

We can differentiate the moment generating function {M_X(a)} from (7) to the order {\lambda} and utilize the fact {\alpha < 1/2} to get

\displaystyle \begin{array}{rcl} \frac{d^\lambda}{d a^\lambda}M_X(a) &=& \frac{(1-\alpha)e^a}{(1-\alpha e^a)^{\lambda+1}} \sum_{k=0}^{\lambda-1}{\genfrac<>{0pt}{}{t}{k} \alpha^k} \\ &\leq& \frac{(1-\alpha)e^a}{(1-\alpha e^a)^{\lambda+1}} \cdot \frac{\lambda!}{2} \cdot \frac{1}{1-\alpha} \\ &\leq& \frac{e^a \lambda!}{(1-\alpha e^a)^{\lambda}} \end{array}

where {\genfrac<>{0pt}{}{\lambda}{k}} are the Eulerian numbers satisfying {\genfrac<>{0pt}{}{\lambda}{k} \leq \lambda!/2} for any {0 \leq k \leq \lambda-1}. It follows that

\displaystyle \mathop\mathbb{E}[X^\lambda] =\frac{d^\lambda}{d a^\lambda}M_X(a)\bigg\vert_{a=0} = \frac{1}{(1-\alpha )^{\lambda}} \sum_{k=0}^{\lambda-1}{\genfrac<>{0pt}{}{t}{k}\alpha^k} \,, \ \ \ \ \ (8)

and

\displaystyle \mathop\mathbb{E}[X^\lambda] \leq \frac{\lambda!}{(1-\alpha)^{\lambda}} \,. \ \ \ \ \ (9)

Case: {\lambda \geq 2} is a real. The proof for a real {\lambda} builds on the analytic properties of {\Gamma(1+\lambda)} and {\mathrm{Li}_{-\lambda}(\alpha)}. Let us focus on the functions

\displaystyle f_\alpha(\lambda) \triangleq \frac{(1-\alpha)^{\lambda+1} }{\alpha} \mathrm{Li}_{-\lambda}(\alpha)\,, \qquad\text{and} \ \ \ \ \ (10)

\displaystyle f(\lambda) \triangleq f_{1/2}(\lambda) = 2^{-\lambda} \mathrm{Li}_{-\lambda}(1/2) \,. \ \ \ \ \ (11)

We claim that {f_\alpha(\lambda) \leq f(\lambda)} for a fixed {\lambda}. Specifically, observe that {f_\alpha(\lambda)} is a product of {1/\alpha, (1-\alpha)^{1+\lambda}}, and {\mathrm{Li}_{-\lambda}(\alpha)}; since each of these functions is convex in {\alpha}, so is {f_\alpha(\lambda)}. Moreover, since {f_0(\lambda) < f_{1/2}(\lambda)}, we conclude that {f_{\alpha}(\lambda) \leq f(\lambda)}. Thus the claim is equivalent to showing {f(\lambda) \leq \Gamma(1+\lambda)} for the specified regime of {\lambda}. Suppose {\lambda \in [a, a+1]} for an integer {a \geq 2}. Let {g(\lambda)} be the first-order approximation of {\Gamma(1+\lambda)} around {\lambda = a}. Since the derivatives {\Gamma^{(n)}(1+\lambda)} are positive for every {n \geq 0}, {g(1+\lambda)} is a lower bound for {\Gamma(1+\lambda)}. Recall that {\Gamma^\prime(1+\lambda) = \Gamma(1+\lambda) \Psi(1+\lambda)} where {\Psi} is the Digamma function defined as {\Psi(s) \triangleq \Gamma^\prime(s)/\Gamma(s)}. Since {\Psi(1+\lambda) > \log(1/2+\lambda)} (cf. (3) in \cite{diamond2016bounds}), it follows that

\displaystyle \Gamma(1+\lambda) > g(\lambda; a) \triangleq \Gamma(1+a) + (\lambda - a) \Gamma(1+a) \log(1/2 + a) \, .

On the other hand, the straight-line segment {\phi(\lambda)} determined by the points {(a, f(a))} and {(a+1, f(a+1))} serves as an upper bound on the convex function {f(\lambda)} in the interval {[a, a+1]}. This line is

\displaystyle \phi(\lambda; a) \triangleq f(a) + (\lambda - a) ( f(a+1) - f(a) ) \, .

Observe that the line {\phi(\lambda; a)} stays below the line {g(\lambda; a)} for {\lambda \in [a, a+1], a = 2, 3, 4, \cdots} since we can use (1) and (9) to show that

\displaystyle \phi(a;a) = f(a) = 2^{-a} \mathrm{Li}_{-a}(1/2) \leq 2^{-a} \cdot \frac{1/2}{1-1/2} \frac{a!}{(1-1/2)^a} = a! = \Gamma(1+a) = g(a; a) \, .

This proves the claim for every real {\lambda \geq 2}.

Case: {\lambda \in [1, 2]} is a real. Here, as before, we seek to show that {f(\lambda) \leq \Gamma(1+\lambda)} for {\lambda \in [1,2]}. The straight-line bounds from the previous case does not work in this case because {g(\lambda; 1) < f(\lambda; 1)} for {\lambda \in [1, 2]}. Our argument, instead, relies on the following expression of the negative-order Polylogarithms in term of the Gamma function (cf. (13.1) in \cite{wood1992polylog}):

\displaystyle \mathrm{Li}_{-\lambda}(\alpha) = \Gamma(1+\lambda) \sum_{k = -\infty}^\infty{ (- \log(\alpha) + 2 \pi k i)^{1+\lambda}} \, , \qquad \lambda > 0 \, . \ \ \ \ \ (12)

Using {\alpha = 1/2} in (12) and further simplification, we can write (11) as

\displaystyle f(\lambda) = 2^{-\lambda}\Gamma(1+\lambda) \left( T_0(\lambda) + T(\lambda) \right) \, , \ \ \ \ \ (13)

where

\displaystyle \begin{array}{rcl} T(\lambda) &=& \sum_{k \geq 1}{T_k(\lambda)}\, , \\ T_k(\lambda) &\triangleq& z_k^{-(1+\lambda)} + (z_k^*)^{-(1+\lambda)}\, , \qquad \text{and} \\ z_k &\triangleq& \log(2) + 2\pi k i\, , \qquad z_k \in \mathbb{C} \\ T_0(\lambda) &=& \frac{1}{(\log(2)^{1+\lambda}} \, . \end{array}

Our plan is to show that {T(\lambda) < 0} for {\lambda \in [1,2]} since it would imply that

\displaystyle \begin{array}{rcl} f(\lambda) &\leq& 2^{-\lambda} T_0(\lambda) \Gamma(1+\lambda) = \frac{2^{-\lambda} \Gamma(1+\lambda)}{ (\log(2))^{1+\lambda} } \\ &=& \frac{\Gamma(1+\lambda)}{(\log(4))^\lambda \log(2)} \leq \frac{\Gamma(1+\lambda)}{\log(8)} \\ &<& \Gamma(1+\lambda) \end{array}

using {\lambda \geq 1}.

It remains to show that {T(\lambda) < 0} for {\lambda \in [1,2]}. First, observe that we can explicitly compute {T(\lambda)} for {\lambda = 1,2} using (13). In particular, we can use (3) to compute {\mathrm{Li}_{-1}(1/2) = (1/2) / (1/4) = 2}, and {\mathrm{Li}_{-2}(1/2) = (1/2)(3/2)/(1/8) = 6}, and further using (13), we get

\displaystyle \begin{array}{rcl} T(1) &=& \frac{f(1)}{2^{-1}\Gamma(1+1)} - T_0(1) = \frac{\mathrm{Li}_{-1}(1/2)}{1} - \frac{1}{(\log(2))^2} \\ &=& 2 - \frac{1}{(\log(2))^2} \approx -0.08 \, , \quad\text{and}\\ T(2) &=& \frac{f(2)}{2^{-2}\Gamma(1+2)} - T_0(2) = \frac{\mathrm{Li}_{-2}(1/2)}{2} - \frac{1}{(\log(2))^3} \\ &=& 3 - \frac{1}{(\log(2))^3} \approx -3.0027 \, , \end{array}

and, in particular, {T(1) < T(2) < 0}. Thus to show that {T(\lambda) < 0} for {\lambda \in [1,2]}, it suffices to show that {\partial T_k(\lambda)/\partial \lambda} is strictly positive for every {k \geq 1}. The rest of the proof is devoted to this task.

Let {r_k \triangleq \vert z_k \vert = \sqrt{\ln(2)^2 + (2 \pi k)^2}} and {\theta_k \triangleq \arg{z_k} = \arctan(2 \pi k/\log(2) )}. Then {z_k = r_k e^{i \theta_k}, z_k^* = r_k e^{-i \theta_k}, 1/z_k = e^{-i \theta_k}/r_k}, and {1/z_k^* = e^{i \theta_k}/r_k}. Thus

\displaystyle \begin{array}{rcl} \frac{\partial}{\partial \lambda}T_k(\lambda) &=& \frac{\partial}{\partial \lambda}\left[ \left(\frac{1}{z_k} \right)^{1+\lambda} + \left(\frac{1}{z_k^*}\right)^{1+\lambda} \right] \nonumber \\ &=& \left(\frac{1}{z_k} \right)^{1+\lambda}\ln \frac{1}{z_k} + \left(\frac{1}{z_k^*}\right)^{1+\lambda} \ln \frac{1}{z_k^*} \nonumber \\ &=&\left( \frac{1}{r_k}\right)^{1+\lambda}\left( e^{-i \theta_k(1+\lambda)} \ln \frac{e^{-i \theta_k}}{r_k} + e^{i \theta_k(1+\lambda)} \ln \frac{e^{i \theta_k}}{r_k} \right)\,. \end{array}

Thus

\displaystyle \frac{\partial}{\partial \lambda}T_k(\lambda) = -2 \left( \frac{1}{r_k}\right)^{1+\lambda}\left( \ln(r_k) \cos(\theta_k(1+\lambda)) + \theta_k \sin (\theta_k(1+\lambda)) \right) \,. \ \ \ \ \ (14)

Since {r_k \geq r_1 = \log(2) > 0} and {\theta_k \in [0, 2\pi)}, the signs of the trigonometric terms decide the sign of the right-hand side above. We claim that

\displaystyle \ln(r_k) \cos(\theta_k(1+\lambda)) + \theta_k \sin (\theta_k(1+\lambda)) < 0 \ \ \ \ \ (15)

which, in turn, implies that the right-hand side of (14) is strictly positive, as desired. Recall that

\displaystyle \arctan(x) = \pi/2 - \arctan(1/x)

if {x > 1}, and

\displaystyle \arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \cdots < x

if {\vert x \vert < 1}. Define {\beta_k \triangleq 2 \pi k/\log(2)}. Since {\beta_k > 0} for all {k \geq 1}, we have

\displaystyle \theta_k = \arctan(\beta_k) = \frac{\pi}{2} - \arctan(1/\beta_k) \in \left[ \pi/2 - 1/\beta_k, \pi/2 \right] \, ,

and consequently, since {\lambda \in [1,2]},

\displaystyle \theta_k(1+\lambda) \in \left[ \pi - 2/\beta_k, 3\pi/2 \right] \, .

If we have { \theta_k(1+\lambda) \in (\pi, 3 \pi/2)} then both {\sin(\theta_k(1+\lambda)) } and {\cos(\theta_k(1+\lambda))} are strictly negative, satisfying (15). Otherwise, if {\theta_k(1+\lambda) \in \{\pi, 3\pi/2\}}, exactly one of these terms is strictly negative while the other one is zero; (15) is satisfied in this case as well.

The only remaining case is {\theta_k(1+\lambda) \in [\pi - 2/\beta_k, \pi)} where {2/\beta_k = \ln(2)/\pi k < \pi/2}; in this case, the cosine term is negative but the sine term is positive. Let {a(k, \lambda) \triangleq \ln(r_k) \cos(\theta_k(1+\lambda))} and {b(k, \lambda) \triangleq \theta_k \sin(\theta_k(1+\lambda))}. If we can show that {\vert a(k, \lambda) \vert > \vert b(k, \lambda) \vert}, (15) will be satisfied. With this in mind, observe that {k \geq 1} implies {\log(r_k) = (1/2) \log(r_k^2) = (1/2) \log( (\log(2)^2 + 4 \pi^2 k^2) > 3/2}, and that {\theta_k \leq \pi/2}. Using this, we calculate

\displaystyle \begin{array}{rcl} | a(k, \lambda) | - | b(k, \lambda)| &>& (3/2) |\cos(\pi - 2/\beta_k)| - (\pi/2) |\sin(\pi - 2/\beta_k)| = (3/2) \cos(2\beta_k) - (\pi/2)\sin(2\beta_k) \\ &>& (3/2) \cos(\ln(2)/\pi) - (\pi/2)\sin(\ln(2)/\pi) \\ &>& 1 \end{array}

by a direct evaluation, and~(15) is satisfied, as desired. \Box

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s