Random Walk on Integers

Imagine that a particle is walking in a two-dimensional space, starting at the origin $(0, 0)$. At every time-step (or “epoch”) $n = 0, 1, 2, 3, \cdots$ it takes a $\pm 1$ vertical step. At every step, the particle either moves up by $+1$, or down by $-1$. This walk is “unbiased” in the sense that the up/down steps are equiprobable.

In this post, we will discuss some natural questions about this “unbiased” or “simple” random walk. For example, how long will it take for the particle to return to zero? What is the probability that it will ever reach +1? When will it touch $a$ for the first time? Contents of this post are a summary of the Chapter “Random Walks” from the awesome “Introduction to Probability” (Volume I) by William Feller. Also, this is an excellent reference.

Edit: the Biased Case. The biased walk follows

$\displaystyle p:=\Pr[ \text{down}]\, ,\qquad q:=\Pr[ \text{up} ]\, ,\qquad p + q = 1$.

The lazy walk follows

$\displaystyle p:=\Pr[ \text{down}]\, ,\qquad q:=\Pr[ \text{up} ]\, , \qquad \Pr[stay] = r\, ,\qquad p + q = 1$.

In the exposition, we’ll first state the unbiased case and when appropriate, follow it up with the biased case.

Definitions. Let $S_n$ be the position of the particle at time $n$ if it starts at the origin. The probability that $S_n = r$ is

$\displaystyle p_{n,r} := \Pr[ S_n = r] = {n \choose \frac{n+r}{2}} 2^{-n}$ .

To enrich our vocabulary, we also define

$\mathop{\Pr_k}[\text{event}]$

for the probability of an event when the walk starts at height $k$.

Define

$\displaystyle N_n(a,b) = \text{\# of paths from } (0,a) \text{ to } (n,b)$.

Clearly, $\displaystyle N_n(0, r) = {n \choose r}$.

Define

$\displaystyle N_n^0(a,b) = \text{\# of paths from } (0,a) \text{ to } (n,b) \text{ that touches or crosses zero}$.

The hitting time, or the first passage time, is defined as

$\displaystyle \tau_a = \text{Time to first hit the level }$ .

In particular, $\tau_0$ is the time to hit the origin for the first time.

Expressing $\mathop{\Pr_0}[S_n = k]$ without $k$For simple random walks,

$\mathop{\Pr_0}[S_n = k] \leq \dfrac{3}{\sqrt{n}}$ .

$\mathop{\Pr_0}[S_{2n} = 2k] \leq \dfrac{1 + o(1)}{\sqrt{\pi n}}$ .

Notice that the right-hand sides do not have $k$. The trick, in short, is to observe that

• a $0 \rightarrow k$ path of length $n$ contains exactly $(n + k)/2$ up moves and $(n - k)/2$ down moves; and
• since the up/down probabilities are equal, the quantity of interest is $\displaystyle {n \choose n/2 + k/2}$ which is at most $\displaystyle {n \choose n/2}$. Stirling’s approximation does the rest.

The Reflection Principle. Consider a simple (possibly lazy) random walk. Let $a, b$ be positive integers. The following two quantities are equal:

1. the number of $a \rightarrow b$ paths that never touch or cross the origin, and
2. the number of paths from $a$ to $-b$ .

In short,

$\displaystyle N_n^0(a,b) = N_n(-a,b)$ .

Since there is a bijection between the paths and the probabilities,

$\mathop{\Pr_a}[\tau_0 < n, S_n = b] = \mathop{\Pr_a}[S_n = -b]$.

This also implies that

$\mathop{\Pr_a}[S_n > 0, \tau_0 < n] = \mathop{\Pr_a}[S_n < 0]$ .

That is, the probability that $S_n$ is negative after $n$ steps is the same as the probability that $S_n$ is positive _but_ has touched or crossed the origin.

Proof: For every path that starts at $a$ and touches zero for the first time at $c$, there is another—symmetric—path that starts at $-a$ and touches zero for the first time at $c$.

An Application of the Reflection Principle: the Ballot Theorem. The reflection principle can be used to give a closed-form expression for the probability that a path stays strictly above the origin all along.

The Ballot Theorem. The number of $0 \rightarrow r$ paths of length $\displaystyle n$ that never touches or crosses zero is

$| \{ \text{paths from }1 \rightarrow r \}| - | \{ \text{paths from } 1 \rightarrow -r \}|$

$= {n-1 \choose r-1} - {n-1 \choose r+1} = \frac{r}{n} {n \choose r}$ .

A Return to the Origin. Suppose a walk, starting at the origin, returns to the origin at epoch $2k$. It is clear that this path takes $k$ steps up and $k$ steps down, in some order. The probability that this happens is

$\displaystyle u_{2k}:=\Pr[S_{2k} = 0]={2k \choose k} 2^{-2k} \sim \dfrac{1}{\sqrt{\pi k}}$ as $k \rightarrow \infty$.

The First Return to the Origin. Let $f_{2k}$ denotes the probability that the path returns to the origin for the first time at epoch $2k$. Suppose a (possibly not the first) return to the origin happens at epoch $2n$. Then, the first return could have happened at epochs $2, 4, 6, \cdots, 2n$. Thus

$\displaystyle u_{2n} = f_{2}u_{2n - 2} + f_{4}u_{2n - 4} + \cdots + f_{2n}u_{0}$ .

The Relation between $\tau_0$ and $S_n$ for simple or lazy simple random walks.

$\mathop{\Pr_k[\tau_0 > r]} = \mathop{\Pr_0}[-k < S_r \leq k]$ .

Using the bound $\mathop{\Pr_0}[S_n = k] \leq 3/\sqrt{n}$ and integrating in $(-k, k]$, we get

$\mathop{\Pr_k[\tau_0 > r]} \leq \dfrac{6k}{\sqrt{n}}$ .

No Return to the Origin.  For simple random walks, the probability that a return to the origin happens at $2n$ is the same as the probability that a return never occurs in $2n$ steps.

$u_{2n} = \mathop{\Pr_0}[\text{ path returns at } 2n] = \mathop{\Pr_0}[\text{ path never returns within } 2n]\, ,$

$= \mathop{\Pr_0}[S_{2k} \neq 0]\, , \qquad 1 \leq k \leq n$ .

The above two observations, together, imply that

For a simple (possibly lazy) random walk,

$\mathop{\Pr_k}[ \text{no return within time } n] = \mathop{\Pr_k}[ \text{a return within time } n]$
$= \mathop{\Pr_k}[0 < S_n \leq 2k] = \mathop{\Pr_0}[-k < S_n \leq k]$ .

$\mathop{\Pr_0}[\tau_0 = 2n] = \displaystyle f_{2n} = u_{2n-2} - u_{2n}$ .

This is equivalent to saying that the number of non-returning paths up to epoch $2n - 2$ are of two kinds: those that return at $2n$ (for the first time), and those that do not return at $2n$. Using $u_{2n} = {2n \choose n} (pq)^n$, we can show that

$\mathop{\Pr_0}[\tau_0 = 2n] = \displaystyle f_{2n} = \frac{u_{2n}}{2n-1} \sim \frac{1}{(2n - 1)\sqrt{\pi n}}$ as $n \rightarrow \infty$.

Integrating in $[n, \infty]$ for large $n$, we get that for a simple walk,

$\mathop{\Pr_0}[\tau_0 \geq r] \approx \displaystyle \sqrt{2/\pi} \int_r^\infty{x^{-3/2} dx} = \sqrt{2/\pi} [\dfrac{-2}{\sqrt{x}} ]_r^\infty$
$= \sqrt{2/\pi} \dfrac{2}{\sqrt{r}} \leq \dfrac{2}{\sqrt{r}}$ .

Contrast this with what we already know for a simple walk:

$\mathop{\Pr_k}[\tau_0 > r] \leq \dfrac{6k}{\sqrt{r}}$ .

First Return to the Origin after Starting from One. The reflection principle can be used to show that

$\mathop{\Pr_1}[ \tau_0 = 2n + 1] = \dfrac{ {2n \choose n} }{(n+1) 2^{2n + 1}} = \dfrac{c_n}{2^{2n + 1}}$

where $c_n$ is the $n$th Catalan number.

The Last Return to the Origin. The probability that up to and including epoch $2n$ the last visit to the origin occurs at epoch $2k$ is given by

$\displaystyle \alpha_{2k, 2n} = u_{2k} u_{2n-2k} \approx \dfrac{1}{\pi \sqrt{k(n-k)} }$ ,

where $k = 0, 1, \cdots, n$.

The first factor is the probability that a return occurs at epoch $2k$, and the second factor is the probability that a path of length $2n - 2k$ never returns to the origin. (Recall that $u_n$ is also the probability of no return.)

Using $k = nx$ where $x \in [0,1]$, we have

$\displaystyle \alpha{2n, 2nx} \approx \approx \dfrac{1}{n}\cdot \dfrac{1}{\pi \sqrt{x (1-x)} }$ .

Below is the plot of $\displaystyle 1/(\pi \sqrt{x (1-x)})$ (in blue) and its integral from zero to $x$ (in green). We remark that the integral equals $(2/\pi) \arcsin(\sqrt{x})$.

Staying on One Side for a Long Time. An integration of the above shows that for a fixed $0 < x < 1$ and $n$ sufficiently large,

$\displaystyle \Pr[\text{ last return within first }x\text{ fraction of the path } ]$

$\displaystyle = \sum_{k \leq x n} \approx \frac{2}{\pi} \arcsin \sqrt{x}$ .

The curve looks like the following:

This means an unbiased random walk is “more likely” to diverge either early or late (than diverging in the middle sections). In particular, the symmetry of the above curve implies that the probability that a path of length $2n$ spends a $2k$ time on one side—either at the beginning or at the end—is

$\displaystyle \alpha_{2k, 2n} \approx \dfrac{1}{n}\cdot \dfrac{1}{\pi \sqrt{x (1-x)} }$.

Number of Sign-Changes. The number of lead-changes in an $n$ trial game is proportional to $\sqrt{n}$. This is surprising because ordinarily, one would think that the number of side-changes is proportional to the length of the path. Specifically,

Suppose there are exactly $r$ sign-changes in a path of length $2n+1$. The corresponding probability is

$\displaystyle \xi_{r, 2n+1} = 2 \Pr[S_{2n+1} = 2r + 1] = 2 p_{2n+1, 2r+1} = {2n+1 \choose n+r+1}2^{-(2n+1)}$ .

Observe that this probability decreases as $r$ increases with $n$ fixed. This means that a path is more likely to experience few sign-changes.

Suppose $N = 2n$ and $n$ is large. We know that if $r = x\sqrt{N})$ for some positive real $x$, then $\Pr[S_{2n+1} = 2r + 1] \approx \Pr[S_{N} = 2x\sqrt{N}] \approx \mathcal{R}(2x) - 1/2$ where $\mathcal{R}(x)$ is the area of the standard normal distribution in the interval $[-\infty, x]$.

The probability that there are at most $x \sqrt{n}$ sign-changes in $n$ epochs, tends to $2 \mathcal{R}(2x) - 1$.

Supremum. Let $k \leq r$. The probability that a path of length $n$ has a maximum at most $r$ equals

$\displaystyle p_{n, 2r-1} = {n \choose \frac{n-1}{2}+1 }2^{-n}$.

This probability, obtained via the reflection principle, is the same as the probability that the said path touched or crossed the line $r$.

Maximum. Let $Z_n$ be a random variable denoting the maximum of a path of length $n$. Let $n \oplus r = 1$ if $n,r$ have different parity, and zero otherwise. Then

$\Pr[ Z_n = r ] = p_{n,r + n \oplus r}$ .

First Passage Time/Hitting Time.  For a simple random walk, the probability that the first passage through $r$ occurs at epoch $n$ is given by

$\displaystyle \phi_{r,n} = \frac{1}{2}\left[ p_{n-1,r-1} - p_{n-1, r+1} \right] = \frac{r}{n}p_{n,r}$ .

Moreover, as $r \rightarrow \infty$, the probability that the first passage through $r$ occurs before epoch $tr^2$ (for fixed $t$) tends to

$2\left(1-\mathcal{R}\left( 1/ \sqrt{t} \right)\right)$.

This means, the waiting time for the first passage through $r$ scales as $r^2$.

The Ascent/Descent Generating Functions. Let us define

$a_n := \mathop{\Pr_0}[S_n = 1 \text{ and }S_m \leq 0 \text{ for all } m < n]$ .

That is, $a_n = \Pr[\text{first ascent at time }n]$ .

Analogously, define

$d_n := \mathop{\Pr_0}[S_n = -1 \text{ and }S_m \geq 0 \text{ for all } m < n]$ .

The generating function for the probabilities $\{a_n\}$ is

$\displaystyle A(Z) := \frac{1 - \sqrt{ 1- 4pqZ^2} }{2 q Z}$

where $p = \Pr[\text{up}]$ and $q = \Pr[\text{down}]$. Likewise, the generating function for the probabilities $\{d_n\}$ is

$\displaystyle D(Z) := \frac{1 - \sqrt{ 1- 4pqZ^2} }{2 p Z}$ .

Suppose $p < q$ (so that the walk has a stronger down-pull), and in particular, let $p = (1-\epsilon)/2$ and $q = (1+\epsilon)/2$ for some $\epsilon$ so that $| p - q | = \epsilon$. Then

$\mathop{\Pr_0}[\text{walk ever makes an ascent}] = A(1) = \dfrac{1 - \epsilon}{1+\epsilon} = \dfrac{p}{q}$ .

On the other hand, if $p \geq q$ (so that the walk has a stronger up-pull) then $A(1) = (1-\epsilon)/(1-\epsilon) = 1$; this means that the walk eventually makes an ascent with certainty.

When $p \geq q$, we know that the walk will eventually make the first ascent. But how long does it take to make it? Observe that

$A^\prime(Z) = \dfrac{d A(Z)}{d Z} = \dfrac{-1 + (1-4pqZ^2)^{-1/2} }{2q Z^2}$ .

Thus

$\mathbb{E}[\text{time to make the first ascent}] = A^\prime(1) = \dfrac{1}{p-q} = \dfrac{1}{\epsilon}$ .

On the same note,

$\displaystyle \mathbb{E}[\text{time to make the }r\text{th ascent}] = \dfrac{dA(Z)^r}{d Z}\Big\vert_{Z = 1} = \dfrac{r}{p-q} = \dfrac{r}{\epsilon}$ .

The $r$th Return to the Origin. An $r$th return at epoch $n$ has the same probability as a first passage through $r$ and epoch $n-r$, which equals $\phi{r, n-r}$ defined above.

Time Spent on the Positive Side. The number of paths of length $2n$ such that $S_{2n}=0$ and exactly $2k$ of its sides are above the axis, is independent of $k$, and equal to

$\displaystyle \frac{2^{2n} u_{2n} }{n+1} = 2^{2n+1} f_{2n+2}$

where $k = 0, 1, 2, \cdots, n$.