# Forkable Strings are Rare

In a blockchain protocol such as Bitcoin, the users see the world as a sequence of states. A simple yet functional view of this world, for the purpose of analysis, is a Boolean string $w = w_1, w_2, \cdots$ of zeros and ones, where each bit is independently set to $1$ — the symbol favoring the “bad guys” — with a fixed probability slightly below one half.

A bad guy is activated when $w_t = 1$ for some $t$. He may try to present the good guys with a conflicting view of the world, such as presenting multiple candidate blockchains of equal length. This view is called a “fork”. A string $w$ that allows the bad guy to fork (with nonnegligible probability) is called a “forkable string”. Naturally, we would like to show that forkable strings are rare: that the manipulative power of the bad guys over the good guys is negligible.

Claim ([1], Bound 2). Suppose $w =w_1, \cdots, w_n$ is a Boolean string, with every bit independently set to $1$ with probability $(1-\epsilon)/2$ for some $0 < \epsilon < 1$. The probability that $w$ is forkable is at most $\exp(-\epsilon^3n/2)$.

In this post, we present a commentary on the proof that forkable strings are rare. I like the proof because it uses simple facts about random walks, generating functions, and stochastic domination to bound an apparently difficult random process.

The Biased Random Walk, $W$. Let $w := w_1w_2\cdots \in \{0,1\}^*$ be a Boolean string with

$\displaystyle \Pr[w_t = 1] = \frac{1-\epsilon}{2}.$

Define $W$ to be the $(p,q)$-biased random walk on the two-dimensional grid, starting at the origin, where $p := (1-\epsilon)/2$ and $q := 1 - p = (1+\epsilon)/2$; note that $q > p$. The horizontal axis is indexed by nonnegative “time” steps. The vertical axis is indexed by the integers. The walk $W$ takes a step “up” with probability $p$ and a step “down” with probability $q$. In this note, we refer to this walk simply as the random walk.

Two More Random Walks, $\lambda$ and $\mu$. Define two more random walk variables $\lambda(t), \mu(t)$ that depend on $w$. Both $\lambda$ and $\mu$ are zero at $t = 0$. If $w_t = 1$, both $\lambda$ and $\mu$ take a step “up”, i.e., their value increases by one. When $w_t = 0$, they take a step “down” with the following exceptions:

• If  $\lambda(t-1) = 0$ then $\lambda(t) \leftarrow 0$
• If $\mu(t-1) = 0$ and $\lambda(t-1) > 0$, then $\mu(t) \leftarrow 0$

These rules imply that

$\displaystyle \lambda(t) \geq 0 \text{ and } \lambda(t) \geq \mu(t) \text{ for all } t,$

and that once $\lambda$ separates from $\mu$, the two walks can meet again only at the value zero.
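These update rules are easy to mechanize. Here is a small Python sketch (the code and the function name are mine, purely for illustration):

```python
def walks(w):
    """Run the lambda/mu walks over a 0/1 sequence w; return (lambda, mu) at the end."""
    lam = mu = 0
    for bit in w:
        if bit == 1:
            lam, mu = lam + 1, mu + 1                        # both walks step up
        else:
            new_lam = lam - 1 if lam > 0 else 0              # lambda never descends below zero
            new_mu = mu if (mu == 0 and lam > 0) else mu - 1  # mu waits at zero while lambda > 0
            lam, mu = new_lam, new_mu
    return lam, mu
```

For example, `walks([0, 1, 0])` returns `(0, 0)`: after the $1$, we have $\lambda = 1$ and $\mu = 0$; on the final $0$, $\mu$ waits at zero while $\lambda$ descends.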

The Divergence of $\mu$. Imagine a stream of incoming zero-one symbols $\cdots, w_t, \cdots$ which determine the walks. Let $\tau$ be the last time $\mu$ is nonnegative. We say $\mu$ “diverges” at time $\tau$.

Forkability. The string $w$ is “forkable” if $\mu(w) \geq 0$.

$\displaystyle \Pr[w \text{ is forkable}] \leq \Pr[\mu \text{ diverges at or after } n].$

Let $Q$ be an upper bound on this probability.
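Since the pair $(\lambda, \mu)$ evolves as a Markov chain, the exact probability $\Pr[\mu(w) \geq 0]$ can be computed by dynamic programming over $(\lambda, \mu)$ states — a useful sanity check on everything that follows. A Python sketch (all names are mine), cross-checked against brute-force enumeration for small $n$:

```python
from itertools import product

def step(lam, mu, bit):
    """One step of the lambda/mu walks, per the rules above."""
    if bit == 1:
        return lam + 1, mu + 1
    return (lam - 1 if lam > 0 else 0,
            mu if (mu == 0 and lam > 0) else mu - 1)

def forkable_prob(n, eps):
    """Exact Pr[mu(w) >= 0] over strings of length n, by DP over (lambda, mu) states."""
    p1 = (1 - eps) / 2                      # Pr[w_t = 1]
    states = {(0, 0): 1.0}
    for _ in range(n):
        nxt = {}
        for (lam, mu), pr in states.items():
            for bit, pb in ((1, p1), (0, 1 - p1)):
                s = step(lam, mu, bit)
                nxt[s] = nxt.get(s, 0.0) + pr * pb
        states = nxt
    return sum(pr for (_, mu), pr in states.items() if mu >= 0)

def forkable_prob_bruteforce(n, eps):
    """Same quantity, by summing over all 2^n strings (small n only)."""
    p1 = (1 - eps) / 2
    total = 0.0
    for w in product((0, 1), repeat=n):
        lam = mu = 0
        for bit in w:
            lam, mu = step(lam, mu, bit)
        if mu >= 0:
            k = sum(w)
            total += (p1 ** k) * ((1 - p1) ** (n - k))
    return total
```

The two computations agree exactly, and for $n = 1$ the forkable probability is simply $\Pr[w_1 = 1] = (1-\epsilon)/2$.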

Want to Show:

$Q = \exp( -\epsilon^3 (1 - O(\epsilon)) n/2 ).$

Two Kinds of Epochs.  Every time $t$ when $\lambda(t) = \mu(t) = 0$ marks the beginning of an epoch. If the first symbol $w_t$ in an epoch is 1, we call this an “up-epoch”; otherwise, we call it a “down-epoch”. Inside an up-epoch, it is always the case that $\lambda = \mu > 0$.

The down-epochs are more interesting, however:

1. At the beginning of a down-epoch, only $\mu$ takes the step down while $\lambda$ stays at zero.
2. $\lambda$ remains at zero, and $\mu$ keeps descending, as long as the incoming symbols $w_t$ are all zeros.
3. Whenever $w_t = 1$, both $\lambda$ and $\mu$ move up in unison. Let $\Delta := \lambda(t) - \mu(t)$.
4. Next, two things can happen. Either $\lambda$ descends back to zero before $\mu$ does; then we go back to step 2.
5. Otherwise, $\mu$ reaches zero when $\lambda$ is at $\Delta > 0$. This completes the first part of a down-epoch.
6. Now begins the second part of a down-epoch: $\mu$ waits at zero while $\lambda$ makes $\Delta$ descents to reach zero again.
7. When $\lambda$ eventually reaches zero, a down-epoch is complete.

In order to bound the probability of eventual divergence, we need to bound the probability of an epoch-completion. Let us begin with some definitions.

First epoch completion probability. Let $m_t$ be the probability that the first epoch completes at time $t$.

Divergence probability. Let $\ell_t$ be the probability that $\mu$ diverges at $t$.

Our goal is to bound the forkability:

$\Pr[w \text{ is forkable}] \leq \sum_{t \geq n} \ell_t$.

Before we lay out the plan, let us introduce some key ingredients which we will use. These are:

1. Generating functions
2. Convergence of GFs
3. Stochastic dominance of GFs
4. Gambler’s Ruin
5. Facts about a biased random walk

Generating Functions, or GF in short. A generating function $\mathsf{G}(Z)$ is a formal power series $\sum_{t}{g_t Z^t}$ which “generates” the sequence $G=\{g_t\}$ as its coefficients. “A generating function is a clothesline on which we hang up a sequence of numbers for display,” says Herbert Wilf. We identify the GF $\mathsf{G}$ with its coefficients $G$. If the coefficients of a GF are probabilities, then it is called a probability generating function, or PGF in short. These GFs satisfy $\mathsf{G}(1)=\sum_t{g_t} = 1$. We use the notation $[G]_k$ to indicate the $k$th coefficient $g_k$.

Radius of Convergence of a GF. Suppose the generating function $\mathsf{G}(Z) = \sum_{t}{g_t Z^t}$ has radius of convergence $R$. This means $\mathsf{G}(z)$ converges for every $\vert z \vert < R$, but there is some $z$ with $\vert z \vert = R$ at which $\mathsf{G}(z)$ diverges. When $\mathsf{G}(Z)$ converges with radius $R$, its coefficients satisfy

$\displaystyle g_n = O(1/r^n) \text{ for every } r < R;$

we will write this, a little loosely, as $g_n = O(1/R^n)$.

Thin Tail of a Converging GF. Consequently, when $R > 1$,

$\displaystyle \sum_{t \geq n}{g_t} = O(1/R^n),$

because the tail is dominated by a geometric series: $\sum_{t \geq n}{R^{-t}} = R^{-n}/(1 - 1/R) = O(1/R^n)$.

Stochastic Dominance of PGFs.  The probability distribution $A = \{a_t\}$ “stochastically dominates” another distribution $B = \{b_t\}$ if, for every $T$, it holds that $\sum_{t \geq T}{a_t} \geq \sum_{t \geq T}{b_t}$. We express this fact as $\mathsf{B} \preceq \mathsf{A}$. Because $\mathsf{A}(1) = \mathsf{B}(1) = 1$, the above also means $\sum_{t < T}{b_t} \geq \sum_{t < T}{a_t}$ for every $T$.

Gambler’s Ruin. Suppose the biased walk is at $0$ at present. The probability that it ever reaches a height $z > 0$ is $(p/q)^z$. This is a classical fact from the analysis of the Gambler’s Ruin problem.
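This fact is easy to verify numerically: the probability of reaching height $z$ within $T$ steps approaches $(p/q)^z$ as $T$ grows, and the convergence is fast because the hitting-time distribution has an exponentially thin tail. A small finite-horizon dynamic program (the code is mine, an approximation by truncation):

```python
def hit_prob(p, z, T):
    """Pr[(p,q)-biased walk from 0 reaches height z within T steps], by DP over positions."""
    q = 1 - p
    lo = -T                              # the walk cannot go below -T in T steps
    f = [0.0] * (z - lo + 1)             # f[i] = Pr[hit z in remaining steps | position lo + i]
    f[z - lo] = 1.0                      # already at z
    for _ in range(T):
        g = f[:]
        for pos in range(lo + 1, z):     # interior positions
            i = pos - lo
            g[i] = p * f[i + 1] + q * f[i - 1]
        f = g
    return f[-lo]                        # value at the starting position 0
```

With $p = 0.3$, the estimate converges to $p/q = 3/7$ for $z = 1$ and to $(3/7)^2$ for $z = 2$.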

Let us define some properties of the biased random walk $W$.

First Ascent Time. Let $\{a_t\}$ be the probabilities that the $(p,q)$-biased random walk $W$, starting at the origin, visits (or hits) $1$ for the first time at step $t$. Let $\mathsf{A}(Z)$ be the corresponding generating function. Conditioning on the first step — either it is an up-step, or it is a down-step after which the walk must make two first ascents — we can see that

$\displaystyle \mathsf{A}(Z) = p Z + q Z \mathsf{A}(Z)^2.$

Observe that $\mathsf{A}$ is not a PGF, since the gambler’s ruin tells us that there is a probability of $1 - p/q$ that the first ascent never happens, giving $\mathsf{A}(1) = p/q$.

First Descent Time. Let $\{d_t\}$ be the probability that the $(p,q)$-biased random walk $W$, starting at the origin, visits (or hits) $-1$ for the first time at step $t$. Let $\mathsf{D}(Z)$ be the corresponding generating function. We can see that

$\displaystyle \mathsf{D}(Z) = q Z + p Z \mathsf{D}(Z)^2.$

Since $q > p$, $\mathsf{D}$ is indeed a PGF.
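The quadratic relations determine every coefficient from the earlier ones: $[\mathsf{A}]_t$ (respectively $[\mathsf{D}]_t$) picks up the $(t-1)$st coefficient of $\mathsf{A}^2$ (respectively $\mathsf{D}^2$), scaled by the appropriate step probability. A quick numerical check (the helper name is mine) that also confirms $\mathsf{D}(1) = 1$ while $\mathsf{A}(1) = p/q$:

```python
def gf_coeffs(linear, quad, n):
    """Coefficients c_0..c_n of the GF C(Z) satisfying C(Z) = linear*Z + quad*Z*C(Z)^2."""
    c = [0.0] * (n + 1)
    c[1] = linear
    for t in range(2, n + 1):
        # [C]_t = quad * [C^2]_{t-1}, a convolution of earlier coefficients
        c[t] = quad * sum(c[i] * c[t - 1 - i] for i in range(1, t - 1))
    return c

p = 0.3
q = 1 - p
a = gf_coeffs(p, q, 400)   # first-ascent probabilities:  A(Z) = pZ + qZ A(Z)^2
d = gf_coeffs(q, p, 400)   # first-descent probabilities: D(Z) = qZ + pZ D(Z)^2
```

The truncated sums agree with $\mathsf{D}(1) = 1$ and $\mathsf{A}(1) = p/q$ up to the exponentially small truncation error, and the even-indexed coefficients vanish by parity.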

$t$-Descent Time. Let $\{d^{(t)}_k\}$ be the probability that the $(p,q)$-biased random walk $W$, starting at the origin, visits (or hits) $-t$ for the first time at step $k$. The convolution rule for GFs tells us that the GF for this sequence is $\mathsf{D}(Z)^t$.

We observe that $a_0 = d_0 = 0$, $a_1 = p$, $d_1 = q$, and $d^{(t)}_k = 0$ for $k < t$.
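The convolution rule itself can be sanity-checked: the coefficients of $\mathsf{D}(Z)^2$ must match the first-passage probabilities to $-2$ computed directly by dynamic programming over walk positions (the code and names are mine):

```python
def descent_coeffs(p, n):
    """d_0..d_n of D(Z) = qZ + pZ D(Z)^2, the first-descent probabilities."""
    q = 1 - p
    d = [0.0] * (n + 1)
    d[1] = q
    for t in range(2, n + 1):
        d[t] = p * sum(d[i] * d[t - 1 - i] for i in range(1, t - 1))
    return d

def convolve(x, y):
    """Truncated product of two power series given as coefficient lists."""
    out = [0.0] * len(x)
    for i, xi in enumerate(x):
        for j in range(len(x) - i):
            out[i + j] += xi * y[j]
    return out

def first_hit_below(p, depth, n):
    """out[k] = Pr[walk from 0 first visits -depth at step k], by DP over positions."""
    q = 1 - p
    probs = {0: 1.0}                 # positions of walks that have not yet hit -depth
    out = [0.0] * (n + 1)
    for k in range(1, n + 1):
        nxt = {}
        for pos, pr in probs.items():
            for npos, pb in ((pos + 1, p), (pos - 1, q)):
                if npos == -depth:
                    out[k] += pr * pb
                else:
                    nxt[npos] = nxt.get(npos, 0.0) + pr * pb
        probs = nxt
    return out

d = descent_coeffs(0.3, 40)
d2 = convolve(d, d)                    # coefficients of D(Z)^2
direct = first_hit_below(0.3, 2, 40)   # first passage to -2, computed directly
```

The two coefficient lists agree term by term; for instance, $[\mathsf{D}^2]_2 = q^2$, the probability of two immediate down-steps.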

GF for Epoch-Completion and Divergence. Let $\mathsf{M}$ and $\mathsf{L}$ be the GFs of the sequences $M = \{m_t\}$ and $L=\{\ell_t\}$, respectively.

Epoch-Completion Probability. The GF $\mathsf{L}$ is indeed a PGF: since each epoch fails to complete with a fixed positive probability, some epoch eventually fails, and so the divergence happens with probability $1$. However, $\mathsf{M}$ is not a PGF: if $\mu(1) = -1$, the probability that $\mu$ ever reaches zero again is $p/q$. On the other hand, if $\mu >0$ (respectively, $\lambda >0$) the future event $\mu = 0$ (respectively, $\lambda = 0$) happens with probability $1$ since $q > p$. Together, the probability that an epoch is ever complete is

$\displaystyle p \cdot \Pr[\text{up-epoch completes}] + q \cdot \Pr[\text{down-epoch completes}]$
$\displaystyle = p \cdot 1 + q \cdot \left( \Pr[\mu \text{ ever reaches up to zero} ] \cdot \Pr[\lambda \text{ ever reaches down to zero}]\right)$
$\displaystyle = p + q (p/q) \cdot 1$
$= 2p = 1 - \epsilon.$

This means $\mathsf{M}(1) = \sum_t{m_t} = 1-\epsilon$; there is an $\epsilon$ probability that no epoch ever completes. This is why $\mathsf{M}$ is not a probability generating function.

The Plan. Recall that we want to get an upper bound $Q$ on the quantity $\sum_{t \geq n}{\ell_t}$. Since the $\ell_t$ are probabilities, it suffices to get a lower bound $R$ on the radius of convergence of $\mathsf{L}$. This will give us $Q = O(1/R^n)$. However, we need to express and solve the generating function $\mathsf{L}$ before we can talk about its radius of convergence.

Expressing $\mathsf{L}$ in terms of $\mathsf{M}$. Observe that we can describe the combined random walk as the regular expression

$\displaystyle \text{Walk = Epoch}^*\text{ Divergence}$.

Recall that a given epoch never completes — this is the divergence — with probability $\epsilon$. We make two observations: (1) A divergence at $t$ can happen in many ways: zero epochs then divergence, or one epoch then divergence, or two epochs then divergence, and so on; and (2) An epoch is independent of the previous epochs. Now we can express $\mathsf{L}$ in terms of $\mathsf{M}$ as follows:

$\displaystyle \mathsf{L}(Z) = \epsilon \left( 1 + \mathsf{M}(Z) + \mathsf{M}(Z)^2 + \cdots \right) = \frac{\epsilon}{1-\mathsf{M}(Z)}\,.$

A Problem with the Down-Epoch. The generating function $\mathsf{M}$ seems problematic to analyze: in particular, the length of the second part of a down-epoch depends on the quantity $\Delta$, the position of $\lambda$ at the end of the first part of that epoch. However, we observe that $\Delta$ is at most the length of the first part of a down-epoch. This allows us to express a “relaxed” epoch as described next.

Let $\{\hat{m}_t\}$ be the sequence of “relaxed” epoch-completion probabilities, and let $\hat{\mathsf{M}}$ be the corresponding GF. Let $\hat{\mathsf{L}}$ be the corresponding divergence GF if we replace $\mathsf{M}$ with $\hat{\mathsf{M}}$ in the expression of $\mathsf{L}$ above. The relaxation happens in the sense that a relaxed epoch takes at least as long to complete as an actual epoch: for every $T$, the probability that a relaxed epoch completes within $T$ steps is never larger than the probability that an actual epoch does. This means, $\displaystyle \mathsf{M} \preceq \hat{\mathsf{M}}$, and consequently, $\displaystyle \mathsf{L} \preceq \hat{\mathsf{L}}$.

Therefore, we have a new plan.

The Revised Plan. Recall that our goal is to upper-bound the quantity $\sum_{t \geq n}{\ell_t}$. Our original plan was to get a lower bound on the radius of convergence of $\mathsf{L}$. Now it suffices to get a lower bound $R$ on the radius of convergence of $\hat{\mathsf{L}}$, because the tail of the sequence $\{\hat{\ell}_t\}$ is thicker than the tail of the sequence $\{\ell_t\}$. Therefore, the quantity we seek to bound, $Q$, will be $O(1/R^n)$.

Now we have a plan of action:

1. Find an expression of $\hat{\mathsf{M}}$
2. Find the radius of convergence $R$ for $\hat{\mathsf{M}}$
3. Show that $\hat{\mathsf{M}}(z) < 1$ when $z < R$
4. Since $\mathsf{M} \preceq \hat{\mathsf{M}}$, it would follow that $\mathsf{M}(z) < 1$ for $z < R$
5. This would imply, in turn, that $\mathsf{L}(Z) = \epsilon/(1-\mathsf{M}(Z))$ converges with the radius of convergence $R$
6. This would imply $Q = O(1/R^n)$ when $n \rightarrow \infty$

We begin by defining the relaxed epoch.

The “Relaxed” Epoch.

1. An epoch starts with an up-step of $\mu$, in which case $\mu$ has to descend one step to complete the epoch
2. Alternatively, an epoch starts with a down-step of $\mu$, so that now $\mu = -1$ and the gap between $\lambda$ and $\mu$ is $1$
3. From here, $\mu$ reaches zero after some number of time steps, say $t$ — a first ascent. This completes the first part of the down-epoch
4. Later on, $\lambda$ will have to descend one vertical step for each of these $t$ time steps
5. $\lambda$ will have to descend one more step to cover the gap mentioned in step 2. This completes the second part of the down-epoch

Defining the GF $\hat{\mathsf{M}}$. How do we express these “relaxed” epoch-completion probabilities? Of course, we will use a generating function in the variable $Z$. Recall the definitions of $\mathsf{A}, \mathsf{D},$ and $\mathsf{D}^t$. Step 1 would be simply $p Z \mathsf{D}(Z)$. Steps 2 and 5 together would be $q Z \mathsf{D}(Z)$. Steps 3 and 4 are tricky.

For Steps 3 and 4, consider the event that the first ascent takes $t$ steps and the subsequent $t$-position descent takes another $k$ steps. After some head scratching, we write its probability as

$\displaystyle g_{t,k} := a_t \, d^{(t)}_k$

where $a_t$ is the probability that the biased walk $W$, starting at the origin, visits $1$ for the first time at step $t$, and $d^{(t)}_k$ is the probability that the biased walk $W$, starting at the origin, visits $-t$ for the first time at step $k$. The corresponding GF is

$\displaystyle \ \sum_{t \geq 0}{ \sum_{k \geq 0}{g_{t,k} Z^{t+k} } }$

$\displaystyle =\sum_{t \geq 0}{a_t Z^t \sum_{k \geq 0}{d^{(t)}_k} Z^k}$

$\displaystyle = \sum_{t \geq 0}{a_t Z^t \mathsf{D}(Z)^t }$

$\displaystyle = \mathsf{A}\big(Z \mathsf{D}(Z) \big).$

Putting these all together,

The GF for the “relaxed” epoch is

$\displaystyle \hat{\mathsf{M}}(Z) = pZ \mathsf{D}(Z) + qZ \mathsf{D}(Z) \mathsf{A}\left(Z \mathsf{D}(Z) \right).$

The Radius of Convergence for $\mathsf{A}$ and $\mathsf{D}$. By solving the quadratic expressions of $\mathsf{A}(z)$ and $\mathsf{D}(z)$, we see that

$\displaystyle \mathsf{D}(z) = \frac{1-\sqrt{1-4pq z^2} }{2p z}$, and

$\displaystyle \mathsf{A}(z) = \frac{1-\sqrt{1-4pq z^2} }{2q z}$.

Hence the GFs $\mathsf{A, D}$ converge as long as the discriminant is positive, i.e., $1-4pq z^2 > 0$, or

$\displaystyle z^2 < \frac{1}{4pq} = \frac{1}{1-\epsilon^2}$

which is equivalent to

$\displaystyle \vert z \vert < 1 + \epsilon^2/2 +O(\epsilon^4)$.
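At this point, the closed forms — together with the expression for $\hat{\mathsf{M}}$ from the previous section — can be checked numerically at a convenient point such as $z = 1$; in particular, $\hat{\mathsf{M}}(1)$ should equal the epoch-completion probability $2p = 1-\epsilon$ computed earlier. A sketch (the function names are mine):

```python
import math

def D(z, p):
    """Closed form of the first-descent GF D(z)."""
    q = 1 - p
    return (1 - math.sqrt(1 - 4 * p * q * z * z)) / (2 * p * z)

def A(z, p):
    """Closed form of the first-ascent GF A(z)."""
    q = 1 - p
    return (1 - math.sqrt(1 - 4 * p * q * z * z)) / (2 * q * z)

def M_hat(z, p):
    """The relaxed-epoch GF: pZ D(Z) + qZ D(Z) A(Z D(Z))."""
    q = 1 - p
    return p * z * D(z, p) + q * z * D(z, p) * A(z * D(z, p), p)
```

With $\epsilon = 0.2$ (so $p = 0.4$), one finds $\mathsf{D}(1) = 1$, $\mathsf{A}(1) = p/q$, $\hat{\mathsf{M}}(1) = 0.8 = 1-\epsilon$, and hence $\hat{\mathsf{L}}(1) = \epsilon/(1-\hat{\mathsf{M}}(1)) = 1$, as a PGF should.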

The Radius of Convergence for $\hat{\mathsf{M}}$. The second term in the expression of $\hat{\mathsf{M}}(z)$ is $q z \mathsf{D}(z) \mathsf{A}\left( z \mathsf{D}(z) \right)$. After expanding this expression, the discriminant of the inner $\mathsf{A}$ has to be positive for convergence. That is,

$\displaystyle 1 - (1-\epsilon^2) \left( \frac{1- \sqrt{1 - (1-\epsilon^2) z^2} }{1-\epsilon} \right)^2 > 0.$

This boils down to

$\displaystyle \vert z \vert < R = 1 + \epsilon^3/2 + O(\epsilon^4)$.
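The asymptotics can be confirmed numerically by bisecting for the point where the expression above vanishes, and comparing $R - 1$ against $\epsilon^3/2$ (the bisection routine is mine):

```python
import math

def conv_radius(eps, tol=1e-12):
    """Find the z > 1 where 1 - (1-eps^2) * ((1 - sqrt(1-(1-eps^2)z^2)) / (1-eps))^2 vanishes."""
    def disc(z):
        # max(0, .) guards against tiny negative arguments from rounding
        s = math.sqrt(max(0.0, 1 - (1 - eps ** 2) * z * z))
        return 1 - (1 - eps ** 2) * ((1 - s) / (1 - eps)) ** 2
    lo, hi = 1.0, 1.0 / math.sqrt(1 - eps ** 2)   # disc(lo) = eps^2 > 0 and disc(hi) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if disc(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For $\epsilon = 0.1$ this gives $R \approx 1.00045$, in line with $1 + \epsilon^3/2 = 1.0005$ up to the $O(\epsilon^4)$ correction.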

An Upper Bound on $\hat{\mathsf{M}}$. When $p z \mathsf{D}(z)$ converges, it converges to a value less than $1/2$: indeed, $p z \mathsf{D}(z) = \big(1-\sqrt{1-4pq z^2}\big)/2$. The same is true for $q z \mathsf{A}(z)$. In particular,

$\displaystyle q \hat{z} \mathsf{A}(\hat{z}) < \frac{1}{2}$ where $\hat{z} := z \mathsf{D}(z)$.

This means, $\vert \hat{\mathsf{M}}(z) \vert < 1$ when $\vert z \vert < R$, since each of the two terms of $\hat{\mathsf{M}}(z)$ is less than $1/2$. In turn, this implies $\hat{\mathsf{L}}(z) = \epsilon / (1-\hat{\mathsf{M}}(z) )$ converges for $\vert z \vert < R$.

Bounding the Tail of $\mathsf{L}$. Using the Taylor series expansion $\log(1+x) = x + O(x^2)$, we get $R = \exp \left( \epsilon^3/2 + O(\epsilon^4) \right)$.

The quantity we desire is $Q = O(1/R^n)$. Our final result, then, is

$\displaystyle \Pr[w \text{ is forkable}] \leq \exp \left( -\epsilon^3(1 - O(\epsilon))n/2 \right).$

References