Forkable Strings are Rare

In a blockchain protocol such as Bitcoin, the users see the world as a sequence of states. A simple yet functional view of this world, for the purpose of analysis, is a Boolean string w = w_1, w_2, \cdots of zeros and ones, where each bit is independently biased towards 1 favoring the “bad guys.”

A bad guy is activated when w_t = 1 for some t. He may try to present the good guys with a conflicting view of the world, such as presenting multiple candidate blockchains of equal length. This view is called a “fork”. A string w that allows the bad guy to fork (with nonnegligible probability) is called a “forkable string”. Naturally, we would like to show that forkable strings are rare: that the manipulative power of the bad guys over the good guys is negligible.

Claim ([1], Bound 2). Suppose w =w_1, \cdots, w_n is a Boolean string, with every bit independently set to 1 with probability (1-\epsilon)/2 for some \epsilon < 1. The probability that w is forkable is at most \exp(-\epsilon^3n/2).

In this post, we present a commentary on the proof that forkable strings are rare. I like the proof because it uses simple facts about random walks, generating functions, and stochastic domination to bound an apparently difficult random process.

The Biased Random Walk, W. Let w := w_1w_2\cdots \in \{0,1\}^* be a Boolean string with

\displaystyle \Pr[w_t = 1] = \frac{1-\epsilon}{2}.

Define W to be the (p,q)-biased random walk on the two-dimensional grid, starting at the origin, where p = (1-\epsilon)/2 and q = 1 - p = (1+\epsilon)/2; note that q > p. The horizontal axis is indexed by nonnegative “time” steps. The vertical axis is indexed by the integers. The walk W takes a step “up” with probability p and a step “down” with probability q. In this note, we refer to this walk as the random walk.

Two More Random Walks, \lambda and \mu. Define two more random-walk variables \lambda(t), \mu(t) that depend on w. Both \lambda and \mu are zero at t = 0. If w_t = 1, both \lambda and \mu take a step “up,” i.e., their value increases by one. When w_t = 0, they take one step “down,” with the following exceptions:

  • If  \lambda(t-1) = 0 then \lambda(t) \leftarrow 0
  • If \mu(t-1) = 0 and \lambda(t-1) > 0, then \mu(t) \leftarrow 0

These rules imply that

\displaystyle \lambda(t) \geq 0 \text{ and } \lambda(t) \geq \mu(t) \text{ for all } t,

and once \lambda and \mu take different values, they can become equal again only at the value zero.
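These update rules are easy to simulate. Below is a small Python sketch (the function name is mine) that runs both walks on a bit string and spot-checks the invariants above:

```python
import random

def lam_mu_walk(w):
    """Run the lambda/mu walks on a 0/1 string w; return the full trajectories."""
    lam, mu = 0, 0
    lams, mus = [0], [0]
    for bit in w:
        if bit == 1:
            lam, mu = lam + 1, mu + 1                        # both step up
        else:
            new_lam = 0 if lam == 0 else lam - 1             # lambda is reflected at zero
            new_mu = 0 if (mu == 0 and lam > 0) else mu - 1  # mu waits at zero while lambda > 0
            lam, mu = new_lam, new_mu
        lams.append(lam)
        mus.append(mu)
    return lams, mus

# quick check of the invariants lam >= 0 and lam >= mu
random.seed(1)
for _ in range(1000):
    w = [random.randint(0, 1) for _ in range(200)]
    lams, mus = lam_mu_walk(w)
    assert all(l >= 0 for l in lams)
    assert all(l >= m for l, m in zip(lams, mus))
```

Note that the exception for \mu reads the pre-update value of \lambda, which is why the two new values are computed before either variable is overwritten.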

The Divergence of \mu. Imagine a stream of incoming zero-one symbols \cdots, w_t, \cdots which determine the walks. Let \tau be the last time \mu was nonnegative. We say \mu “diverges” at time \tau.

Forkability. The string w is “forkable” if \mu(w) \geq 0.

\displaystyle \Pr[w \text{ is forkable}] \leq \Pr[\mu \text{ diverges at or after } n].

Let Q be an upper bound on this probability.

Want to Show:

Q = \exp( -\epsilon^3 (1 + O(\epsilon)) n/2 ).

Two Kinds of Epochs.  Every time t when \lambda(t) = \mu(t) = 0 marks the beginning of an epoch. If the first symbol w_t in an epoch is 1, we call this an “up-epoch”; otherwise, we call it a “down-epoch”. Inside an up-epoch, it is always the case that \lambda = \mu > 0.

The down-epochs are more interesting, however:

  1. At the beginning of a down-epoch, only \mu takes the step down while \lambda stays at zero.
  2. \lambda remains at zero, while \mu keeps descending, as long as the incoming symbols w_t are all zeros.
  3. Whenever w_t = 1, both \lambda and \mu move up in unison. Let \Delta := \lambda(t) - \mu(t).
  4. Next, two things can happen. Either \lambda reaches zero before \mu does; then we go back to step 2.
  5. Otherwise, \mu reaches zero while \lambda is at \Delta > 0. This completes the first part of the down-epoch.
  6. Now begins the second part of the down-epoch: \mu waits at zero while \lambda makes \Delta descents to reach zero again.
  7. When \lambda eventually reaches zero, the down-epoch is complete.

In order to bound the probability of eventual divergence, we need to bound the probability of an epoch-completion. Let us begin with some definitions.

First epoch completion probability. Let m_t be the probability that the first epoch completes at time t.

Divergence probability. Let \ell_t be the probability that \mu diverges at t.

Our goal is to bound the forkability:

\Pr[w \text{ is forkable}] \leq \sum_{t \geq n} \ell_t.

Before we lay out the plan, let us introduce some key ingredients which we will use. These are:

  1. Generating functions
  2. Convergence of GFs
  3. Stochastic dominance of GFs
  4. Gambler’s Ruin
  5. Facts about a biased random walk

Generating Functions, or GF in short. A generating function \mathsf{G}(Z) is a formal power series \sum_{t}{g_t Z^t} which “generates” the sequence G=\{g_t\} as its coefficients. “A generating function is a clothesline on which we hang up a sequence of numbers for display,” says Herbert Wilf. We identify the GF \mathsf{G} with its coefficient sequence G. If the coefficients of a GF are probabilities, then it is called a probability generating function, or PGF in short. A PGF satisfies \mathsf{G}(1)=\sum_t{g_t} = 1. We use the notation [G]_k to indicate the kth coefficient g_k.

Radius of Convergence of a GF. Suppose the generating function \mathsf{G}(Z) = \sum_{t}{g_t Z^t} has radius of convergence R. This means \mathsf{G}(z) converges for every \vert z \vert < R and diverges for every \vert z \vert > R. By the root test, the coefficients of such a GF satisfy

\displaystyle \limsup_{n \rightarrow \infty} \vert g_n \vert^{1/n} = \frac{1}{R}, \quad \text{i.e.,} \quad g_n = O(1/r^n) \text{ for every } r < R.

With a slight abuse of notation, we will write this decay as g_n = O(1/R^n).

Thin Tail of a Converging GF. When R > 1, the tail of the coefficient sequence is geometrically thin:

\displaystyle \sum_{t \geq n}{g_t} = O(1/R^n),

because the geometric tail sums to \sum_{t \geq n} O(1/R^t) = O(1/R^n) \cdot \frac{1}{1-1/R} = O(1/R^n).

Stochastic Dominance of PGFs. The probability distribution A = \{a_t\} “stochastically dominates” another distribution B = \{b_t\} if there is a threshold T such that a_t \geq b_t for all t \geq T. We express this fact as \mathsf{B} \preceq \mathsf{A}. Because \mathsf{A}(1) = \mathsf{B}(1) = 1, this also means that B places at least as much mass below T as A does: \sum_{t < T} b_t \geq \sum_{t < T} a_t.

Gambler’s Ruin. Suppose \mu = 0 at present. The probability that \mu ever reaches a given height z > 0 is (p/q)^z. This is the classical Gambler’s Ruin calculation for a biased walk.
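The gambler's-ruin probability (p/q)^z is simple to confirm by simulation. A Monte Carlo sketch (helper name and the cutoff `floor` are my own devices):

```python
import random

def hit_prob(z, p, trials=50_000, floor=-40, seed=0):
    """Monte Carlo estimate of the probability that a walk stepping +1 w.p. p
    and -1 w.p. q = 1 - p, started at 0, ever reaches height z > 0.
    A walk that falls to `floor` is declared a miss; the chance it would
    still recover is (p/q)^(|floor| + z), which is negligible here."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pos = 0
        while floor < pos < z:
            pos += 1 if rng.random() < p else -1
        hits += (pos == z)
    return hits / trials

p, q = 0.35, 0.65            # corresponds to epsilon = 0.3
for z in (1, 2, 3):
    print(z, hit_prob(z, p), (p / q) ** z)   # estimate vs (p/q)^z
```

The absorbing floor keeps each trial finite: a walk with downward drift either hits z fairly soon or falls away for good.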

Let us define some properties of the biased random walk W.

First Ascent Time. Let \{a_t\} be the probabilities that the (p,q)-biased random walk W, starting at the origin, visits (or hits) 1 for the first time at step t. Let \mathsf{A}(Z) be the corresponding generating function. We can see that

\displaystyle \mathsf{A}(Z) = p Z + q Z \mathsf{A}(Z)^2.

Observe that \mathsf{A} is not a PGF: the gambler’s ruin tells us that the first ascent happens with probability p/q, i.e., it never happens with probability 1 - p/q, giving \mathsf{A}(1) = p/q.

First Descent Time. Let \{d_t\} be the probability that the (p,q)-biased random walk W, starting at the origin, visits (or hits) -1 for the first time at step t. Let \mathsf{D}(Z) be the corresponding generating function. We can see that

\displaystyle \mathsf{D}(Z) = q Z + p Z \mathsf{D}(Z)^2.

Since q > p, \mathsf{D} is indeed a PGF.

t-Descent Time. Let \{d^{(t)}_k\} be the probability that the (p,q)-biased random walk W, starting at the origin, visits (or hits) -t for the first time at step k. The convolution rule for GFs tells us that the GF for this sequence is \mathsf{D}(Z)^t.

We observe that a_0 = d_0 = 0, a_1 = p, d_1 = q, and d^{(t)}_k = 0 for k < t.
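These facts, along with \mathsf{D}(1) = 1 and \mathsf{A}(1) = p/q, can be checked by computing the coefficients of \mathsf{A} and \mathsf{D} directly from their quadratic equations. The sketch below (helper name mine) iterates the defining equation on a truncated power series:

```python
def series_fixpoint(c1, c2, N=200):
    """Coefficients g_0..g_N of the power series F(Z) = c1*Z + c2*Z*F(Z)^2.
    Iterating the defining equation from F = 0 fixes at least one more
    coefficient per pass, so N+1 passes settle all coefficients up to Z^N."""
    f = [0.0] * (N + 1)
    for _ in range(N + 1):
        sq = [0.0] * (N + 1)                 # truncated F(Z)^2
        for i in range(N + 1):
            if f[i] == 0.0:
                continue
            for j in range(N + 1 - i):
                sq[i + j] += f[i] * f[j]
        new = [0.0] * (N + 1)
        new[1] = c1                          # the c1*Z term
        for k in range(N):                   # add c2 * Z * F(Z)^2
            new[k + 1] += c2 * sq[k]
        f = new
    return f

eps = 0.2
p, q = (1 - eps) / 2, (1 + eps) / 2
a = series_fixpoint(p, q)   # first-ascent GF:  A = pZ + qZ A^2
d = series_fixpoint(q, p)   # first-descent GF: D = qZ + pZ D^2

print(a[0], a[1], d[1])     # a_0 = 0, a_1 = p, d_1 = q
print(sum(d))               # ≈ 1    (D is a PGF)
print(sum(a))               # ≈ p/q  (the first ascent may never happen)
```

Truncation at N = 200 leaves only a tiny tail, since the coefficients decay geometrically inside the radius of convergence.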

GF for Epoch-Completion and Divergence. Let \mathsf{M} and \mathsf{L} be the GFs of the sequences M = \{m_t\} and L=\{\ell_t\}, respectively.

Epoch-Completion Probability. The GF \mathsf{L} is indeed a PGF, since divergence eventually occurs with probability one. However, \mathsf{M} is not a PGF: if \mu(1) = -1, the probability that \mu ever reaches zero again is p/q. On the other hand, if \mu > 0 (respectively, \lambda > 0), the future event \mu = 0 (respectively, \lambda = 0) happens with probability 1 since q > p. Together, the probability that an epoch is ever complete is

\displaystyle p \cdot \Pr[\text{up-epoch completes}] + q \cdot \Pr[\text{down-epoch completes}]
\displaystyle = p \cdot 1 + q \cdot \left( \Pr[\mu \text{ ever reaches up to zero} ] \cdot \Pr[\lambda \text{ ever reaches down to zero}]\right)
\displaystyle = p + q (p/q) \cdot 1
\displaystyle = 2p = 1 - \epsilon.

This means \mathsf{M}(1) = \sum_t{m_t} = 1-\epsilon; there is an \epsilon probability that no epoch ever completes. This is why \mathsf{M} is not a probability generating function.
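The value \mathsf{M}(1) = 1 - \epsilon can be spot-checked by simulation: run the \lambda/\mu update rules from the epoch start \lambda = \mu = 0 until both walks return to zero, and count how often that ever happens. A sketch (helper name mine; the floor cutoff approximates “never completes”):

```python
import random

def epoch_completes(p, rng, floor=-60):
    """Simulate one epoch of the lambda/mu walks from lambda = mu = 0.
    Returns True once both walks return to zero; if mu falls to `floor`,
    the epoch is declared incomplete (a comeback from there has probability
    (p/q)^|floor|, which is negligible)."""
    lam, mu = 0, 0
    while True:
        if rng.random() < p:                             # w_t = 1: both step up
            lam, mu = lam + 1, mu + 1
        else:                                            # w_t = 0: down, with exceptions
            new_lam = 0 if lam == 0 else lam - 1
            new_mu = 0 if (mu == 0 and lam > 0) else mu - 1
            lam, mu = new_lam, new_mu
        if lam == 0 and mu == 0:
            return True                                  # epoch complete
        if mu <= floor:
            return False                                 # (almost surely) never completes

eps = 0.3
p = (1 - eps) / 2
rng = random.Random(5)
trials = 40_000
done = sum(epoch_completes(p, rng) for _ in range(trials))
print(done / trials, 1 - eps)    # empirical epoch-completion rate vs 1 - eps
```

With \epsilon = 0.3 the empirical rate lands close to 0.7, matching the computation above.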

The Plan. Recall that we want an upper bound Q on the quantity \sum_{t \geq n}{\ell_t}. Since the \ell_t are probabilities, it suffices to get a lower bound R > 1 on the radius of convergence of \mathsf{L}: this will give us Q = O(1/R^n). However, we need to express and solve the generating function \mathsf{L} before we can talk about its radius of convergence.

Expressing \mathsf{L} in terms of \mathsf{M}. Observe that we can describe the combined random walk as the regular expression

\displaystyle \text{Walk = Epoch}^*\text{ Divergence}.

The probability that \lambda never meets \mu for t >0 is \epsilon. We make two observations: (1) A divergence at t can happen in many ways: zero epoch then divergence, or one epoch then divergence, or two epochs then divergence, and so on; and (2) An epoch is independent of the previous epoch. Now we can express \mathsf{L} in terms of \mathsf{M} as follows:

\displaystyle \mathsf{L}(Z) = \epsilon \left( 1 + \mathsf{M}(Z) + \mathsf{M}(Z)^2 + \cdots \right) = \frac{\epsilon}{1-\mathsf{M}(Z)}\,.

A Problem with the Down-Epoch. The generating function \mathsf{M} seems problematic to analyze: in particular, the length of the second part of a down-epoch depends on the quantity \Delta, the position of \lambda at the end of the first part of that epoch. However, we observe that \Delta is at most the length of the first part of a down-epoch. This allows us to express a “relaxed” epoch as described next.

Let \{\hat{m}_t\} be the sequence of “relaxed” epoch-completion probabilities, and let \hat{\mathsf{M}} be the corresponding GF. Let \hat{\mathsf{L}} be the divergence GF obtained by replacing \mathsf{M} with \hat{\mathsf{M}} in the expression for \mathsf{L} above. The relaxation only lengthens an epoch: a relaxed epoch never completes sooner than the actual epoch it relaxes, so the relaxed completion time stochastically dominates the actual one. This means \displaystyle \mathsf{M} \preceq \hat{\mathsf{M}}, and consequently, \displaystyle \mathsf{L} \preceq \hat{\mathsf{L}}.

Therefore, we have a new plan.

The Revised Plan. Recall that our goal is to upper-bound the quantity \sum_{t \geq n}{\ell_t}. Our original plan was to get a lower bound on the radius of convergence of \mathsf{L}. Now it suffices to get a lower bound R on the radius of convergence of \hat{\mathsf{L}}, because the tail of the sequence \{\hat{\ell}_t\} is thicker than the tail of the sequence \{\ell_t\}. The quantity we seek to bound, Q, will then be O(1/R^n).

Now we have a plan of action:

  1. Find an expression for \hat{\mathsf{M}}
  2. Find the radius of convergence R of \hat{\mathsf{M}}
  3. Show that \hat{\mathsf{M}}(z) < 1 when \vert z \vert < R
  4. Since \mathsf{M} \preceq \hat{\mathsf{M}}, it would follow that \mathsf{M}(z) < 1 for \vert z \vert < R
  5. This would imply, in turn, that \mathsf{L}(Z) = \epsilon/(1-\mathsf{M}(Z)) converges whenever \vert z \vert < R
  6. This would imply Q = O(1/R^n) as n \rightarrow \infty

We begin by defining the relaxed epoch.

The “Relaxed” Epoch.

  1. An epoch starts with an up-step of \mu, in which case \mu has to descend one step to complete the epoch
  2. Alternatively, an epoch starts with a down-step of \mu, so that now \mu = -1 and the gap between \lambda and \mu is 1
  3. From here, \mu reaches zero after an unspecified number of time steps. This completes the first part of the down-epoch
  4. Later on, \lambda will have to descend one vertical step for each of these time steps; this is the relaxation, since the actual descent \Delta is at most the number of time steps in the first part
  5. \lambda will have to descend one more step to cover the gap mentioned in step 2. This completes the second part of the down-epoch


Defining the GF \hat{\mathsf{M}}. How do we express these “relaxed” epoch-completion probabilities? Of course, we will use a generating function in the variable Z. Recall the definitions of \mathsf{A}, \mathsf{D}, and \mathsf{D}(Z)^t. Step 1 contributes simply p Z \mathsf{D}(Z). Steps 2 and 5 together contribute q Z \mathsf{D}(Z). Steps 3 and 4 are tricky.

For Steps 3 and 4, let g_s be the probability that the random walk makes its first ascent at some time t and then descends t vertical positions, with the two parts together taking s steps. After some head scratching, we write the contribution of each decomposition s = t + k as

\displaystyle g_{t+k} = a_t d^{(t)}_k,

where a_t is the probability that the biased walk W, starting at the origin, visits 1 for the first time at time t, and d^{(t)}_k is the probability that the biased walk W, starting at the origin, visits -t for the first time at time k. The corresponding GF is

\displaystyle \  \sum_{t \geq 0}{ \sum_{k \geq 0}{g_{t+k} Z^{t+k} } }

\displaystyle =\sum_{t \geq 0}{a_t Z^t \sum_{k \geq 0}{d^{(t)}_k Z^k}}

\displaystyle = \sum_{t \geq 0}{a_t Z^t \mathsf{D}(Z)^t }

\displaystyle = \mathsf{A}\big(Z \mathsf{D}(Z) \big).

Putting these all together,

The GF for the “relaxed” epoch is

\displaystyle \hat{\mathsf{M}}(Z) = pZ \mathsf{D}(Z) + qZ \mathsf{D}(Z) \mathsf{A}\left(Z \mathsf{D}(Z) \right).

The Radius of Convergence for \mathsf{A} and \mathsf{D}. By solving the quadratic expressions of \mathsf{A}(z) and \mathsf{D}(z), we see that

\displaystyle \mathsf{D}(z) = \frac{1-\sqrt{1-4pq z^2} }{2p z}, and

\displaystyle \mathsf{A}(z) = \frac{1-\sqrt{1-4pq z^2} }{2q z}.

Hence the GFs \mathsf{A}, \mathsf{D} converge as long as the discriminant satisfies 1-4pq z^2 > 0, or

\displaystyle z^2 < \frac{1}{4pq} = \frac{1}{1-\epsilon^2}

which is equivalent to

\displaystyle \vert z \vert < 1 + \epsilon^2/2 +O(\epsilon^4).
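The closed forms above can be sanity-checked numerically: inside the radius they should satisfy the defining quadratics, and they should take the boundary values \mathsf{D}(1) = 1 and \mathsf{A}(1) = p/q computed earlier. For instance:

```python
import math

eps = 0.2
p, q = (1 - eps) / 2, (1 + eps) / 2

def D(z):  # first-descent GF, closed form
    return (1 - math.sqrt(1 - 4 * p * q * z * z)) / (2 * p * z)

def A(z):  # first-ascent GF, closed form
    return (1 - math.sqrt(1 - 4 * p * q * z * z)) / (2 * q * z)

z = 1.01   # inside the radius 1/sqrt(1 - eps^2) ≈ 1.0206
assert abs(D(z) - (q * z + p * z * D(z) ** 2)) < 1e-9   # D = qZ + pZ D^2
assert abs(A(z) - (p * z + q * z * A(z) ** 2)) < 1e-9   # A = pZ + qZ A^2
print(D(1.0), A(1.0))   # ≈ 1 and ≈ p/q = 2/3
```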

The Radius of Convergence for \hat{\mathsf{M}}. The second term in the expression of \hat{\mathsf{M}}(z) is q z \mathsf{D}(z)\, \mathsf{A}\left( z \mathsf{D}(z) \right). After expanding this expression, the discriminant under the inner square root has to be positive for convergence. That is,

\displaystyle 1 - (1-\epsilon^2) \left( \frac{1- \sqrt{1 - (1-\epsilon^2) z^2} }{1-\epsilon} \right)^2 > 0.

This boils down to

\displaystyle \vert z \vert < R = 1 + \epsilon^3/2 + O(\epsilon^4).
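The claimed radius can be confirmed numerically by locating where the inner discriminant changes sign. A small bisection sketch (helper names are mine):

```python
import math

def disc(z, eps):
    """Inner discriminant for A(z*D(z)); positive strictly inside the radius."""
    inner = 1 - (1 - eps ** 2) * z * z
    if inner < 0:
        return -1.0      # outer square root already fails
    return 1 - (1 - eps ** 2) * ((1 - math.sqrt(inner)) / (1 - eps)) ** 2

def radius(eps, iters=80):
    """Bisect for the sign change of disc on [1, 1/sqrt(1 - eps^2)]."""
    lo, hi = 1.0, 1.0 / math.sqrt(1 - eps ** 2)   # disc > 0 at lo, < 0 at hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if disc(mid, eps) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for eps in (0.05, 0.1, 0.2):
    print(eps, radius(eps), 1 + eps ** 3 / 2)   # numeric radius vs 1 + eps^3/2
```

As \epsilon shrinks, the numeric root approaches 1 + \epsilon^3/2, with the gap vanishing at the O(\epsilon^4) rate.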

An Upper Bound on \hat{\mathsf{M}}. When p z \mathsf{D}(z) converges, it converges to a value less than 1/2, since p z \mathsf{D}(z) = \left(1-\sqrt{1-4pqz^2}\right)/2. The same is true for q z \mathsf{A}(z). In particular, writing \hat{z} := z \mathsf{D}(z),

\displaystyle q \hat{z} \mathsf{A}(\hat{z}) < \frac{1}{2}.

This means \vert \hat{\mathsf{M}}(z) \vert < 1 when \vert z \vert < R. In turn, this implies that \hat{\mathsf{L}}(z) = \epsilon / (1-\hat{\mathsf{M}}(z) ) converges for \vert z \vert < R, and hence so does \mathsf{L}(z) = \epsilon / (1-\mathsf{M}(z) ) by the dominance \mathsf{M} \preceq \hat{\mathsf{M}}.

Bounding the Tail of \mathsf{L}. Using the Taylor series expansion \log(1+x) = x + O(x^2), we get \log R = \epsilon^3/2 + O(\epsilon^4), that is, R = \exp \left( \epsilon^3/2 + O(\epsilon^4) \right).

The quantity we desire is Q = O(1/R^n). Our final result, then, is

\displaystyle \Pr[w \text{ is forkable}] \leq \exp \left( -\epsilon^3(1+ O(\epsilon))n/2 \right).
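Finally, the bound can be compared against simulation, using the post's characterization that w is forkable iff \mu(w) \geq 0. A Monte Carlo sketch (the parameters \epsilon = 0.3, n = 100 are chosen arbitrarily for the experiment):

```python
import math
import random

def mu_final(w):
    """mu(w): run the lambda/mu update rules over the whole string w."""
    lam, mu = 0, 0
    for bit in w:
        if bit == 1:
            lam, mu = lam + 1, mu + 1
        else:
            new_lam = 0 if lam == 0 else lam - 1
            new_mu = 0 if (mu == 0 and lam > 0) else mu - 1
            lam, mu = new_lam, new_mu
    return mu

eps, n, trials = 0.3, 100, 20_000
p = (1 - eps) / 2
rng = random.Random(7)
hits = sum(
    mu_final([1 if rng.random() < p else 0 for _ in range(n)]) >= 0
    for _ in range(trials)
)
frac = hits / trials
print(frac, math.exp(-eps ** 3 * n / 2))   # empirical forkable rate vs the claimed bound
```

The empirical frequency lands comfortably below \exp(-\epsilon^3 n/2), as the claim requires; the bound is not tight in this regime.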


[1] Forkable Strings are Rare

[2] Ouroboros: A Provably Secure Proof-of-Stake Blockchain Protocol

[3] Ouroboros Praos: An Adaptively-Secure, Semi-Synchronous Proof-of-Stake Blockchain

