# Bounding the Supremum of a Gaussian Process: Talagrand’s Generic Chaining (Part 1)

This post is part of a series which answers a certain question about the supremum of a Gaussian process. I am going to write, as I have understood, a proof given in Chapter 1 of the book “Generic Chaining” by Michel Talagrand. I recommend the reader to take a look at the excellent posts by James Lee on this matter. (I am a beginner, James Lee is a master.)

Let $(T,d)$ be a finite metric space. Let $\{X_t\}$ be a Gaussian process where each $X_t$ is a zero-mean Gaussian random variable. The distance between two points $s,t\in T$ is the square-root of the covariance between $X_s$ and $X_t$. In this post, we are interested in upper-bounding $Q$.

Question: How large can the quantity $Q := \mathbb{E} \sup_{t\in T} X_t$ be?

In this post we are going to prove the following fact:

$\boxed{\displaystyle \mathbb{E}\sup_{t\in T}X_t \leq O(1)\cdot \sup_{t\in T}\sum_{n\geq 1}{2^{n/2}d(t,T_{n-1})} ,}$

where $(t, A)$ is the distance between the point $X_t$ from the set $A$, and $\{T_i\}$ is a specific sequence of sets with $T_i\subset T$. Constructions of these sets will be discussed in a subsequent post.

Notice that the metric structure (i.e., the pairwise distances) does not change if we translate all distances by a constant. Hence, let us pick an arbitrary point $t_0 \in T$ and see that $Q=\mathbb{E} \sup_{t\in T} X_t$ equals $\mathbb{E} \sup_{t\in T} (X_t-X_{t_0})$ by the linearity of expectation. This formulation has the advantage that the random quantity $\sup_{t\in T} (X_t-X_{t_0})$ is clearly non-negative unless all $X_t$ equals $X_{t_0}$.

Let $Y=\sup_{t\in T} (X_t-X_{t_0})$. Recall that by the definition of expectation, $\mathbb{E}Y=\int_0^\infty{\Pr(Y \geq u) du}$. Looks like we need to upper-bound the quantity $\Pr\big(\sup_{t\in T}(X_t-X_{t_0}) \geq u\big)$ first. An immediate thought is to use a union bound to show that a sum is at least as large as the max:

$\Pr\big(\sup_{t\in T}(X_t-X_{t_0}) \geq u\big) \leq \sum_{t\in T}{\Pr\big(X_t-X_{t_0} \geq u\big) }.$

However, this could potentially burn us if either there are “too many” points (albeit each with a small contribution), or if the points are “too close” to each other (so that each point makes a large contribution). What can we do to prevent this?

Maybe we should try to group nearby points together in clusters. Here is an idea: we will do a “gradual clustering”, as described below.

At steps $0,1,2,\cdots n$ we will maintain a set $T_n$ containing some “representative points”. We would try to spread these points across the entire space as best as we can. At step $n$, every point $t$ will use its nearest point in $T_n$ as its representative.  Let us call this point $\pi_n(t)$ . Let us initialize $T_0=\{t_0\}$ with the (arbitrary) point we selected in the above discussion.

When $n = 1$, note that $X_t-X_{t_0}=X_t-X_{\pi_1(t)}+X_{\pi_1(t)}-X_{t_0}$. The first two points are close (as we had planned), and hence the supremum, when taken across all points $t\in T$, should be small and more controllable. As for the last two terms, note that the number of representative points should be small compared to the total number of points (we need to carefully control it, though), and hence we hope that the union bound on this part will not hurt us much.

After $n$ steps of choosing representative sets $\{T_n\}_{n\geq 0}$, we can write the telescoping sum $X_t-X_{t_0}=\sum_{n\geq 1}{X_{\pi_n(t)} - X_{\pi_{n-1}(t)}}$ provided we can make sure that $\pi_n(t)=t$ when $n$ is sufficiently large. (That is, each point would form its own cluster.)

We control the size of the sets $T_n$; in particular, we ensure that $\vert T_n \vert \leq N_n$ where $N_n=2^{2^n}$. (We’ll show later how we do it.) Because $\{X_t\}$ are Gaussian random variables, it follows that for $u \geq 0$, we have

$\displaystyle p_t(u,n):=\Pr\big(|X_{\pi_n(t)}-X_{\pi_{n-1}(t)}| \geq u\big)\leq 2\cdot \exp\big(-\frac{u^2}{2d(\pi_n(t),\pi_{n-1}(t))}\big)$.

Let us denote by $\Omega_u$ the event that for all $n \geq 1$ and all points $t\in T$,

$\displaystyle |X_{\pi_n(t)}-X_{\pi_{n-1}(t)}| \leq u 2^{n/2} d(\pi_n(t),\pi_{n-1}(t)).$

Think of $\Omega_u$ as a good event, which means successive approximations of points $X_t$ by representitives $\pi_n(t)$ are “close”, and hence possibly converging.  By combining the two previous equations, we have

$\displaystyle \Pr\big(\Omega_u\big)\leq 2\cdot \exp\big(-\frac{u^2\cdot 2^n}{2}\big)$.

Let $\overline{\Omega}_u$ be the complement of the event $\Omega_u$; we think of $\overline{\Omega}_u$ as a bad event which means that for each point $X_t$, all successive approximations to it are “far apart”. We can find the probability of this bad event by a union bound over all $n\geq 1$ and all point-pairs between $T_{n-1}$ and $T_n$.

At the $n$th step, there are

$\displaystyle \vert T_n \vert \times \vert T_{n-1} \vert = 2^{2^{n-1}+2^n} \leq 2^{2\cdot 2^n}=2^{2^{n+1}}$

representative pairs between $T_{n-1}$ and $T_n$. Therefore,

$\displaystyle p(u) := \Pr\big(\bar{\Omega_u}\big) \leq \sum_{n\geq 1}{|T_{n-1}|\cdot |T_n|\cdot p_t(u,n)} = \sum_{n\geq 1}{2\cdot 2^{2^{n+1}}}\exp\big(-u^22^n/2\big)$
$= \sum_{n\geq 1}{\frac{ 2^{1+2^{n+1}} }{e^{2^{n-1}}}e^{-u^2}}.$

Because the high-order terms in the sum in $p(u)$ fall off quickly when $u \geq 4$, we can write $p(u) \leq L\cdot e^{-u^2}$ where $L$ is a constant.

When $\Omega_u$ occurs, it means for any point, all its successive approximations are “close”. In this case, we see (by telescoping ) that $\sup_{t\in T} |X_t-X_{t_0}|=\sup_{t\in T}\sum_{n\geq 1}{X_{\pi_n(t)}-X_{\pi_n(t)}} \leq u\cdot S$ where $S:=\sup_{t\in T}\sum_{n\geq 1}{2^{n/2}d(\pi_n(t),\pi_{n-1}(t))}$.

For any point $t\in T$, using triangle inequality, $d(\pi_n(t),\pi_{n-1}(t)) \leq d(t,\pi_{n-1}(t)) + d(t,\pi_n(t))$ $\leq 2d(t,\pi_{n-1}(t))=2d(t,T_{n-1})$ since between $\pi_n(t)$ and $\pi_{n-1}(t)$, $\pi_{n-1}(t)$ is farther from $t$ by construction. Thus, $S \leq \sup_{t\in T}\sum_{n\geq 1}{2\cdot 2^{n/2}d(t,T_{n-1})}$.

However, when $\Omega_u$ does not occur, the following holds: $\displaystyle \sup_{t\in T}|X_t-X_{t_0}| > uS$. The probability of this is exactly $\Pr\big(\bar{\Omega_u}\big)\leq p(u)$. In other words, $\displaystyle \Pr\Big(\sup_{t\in T}|X_t-X_{t_0}| > uS\Big) \leq p(u) \leq O(1) e^{-u^2}.$

By integrating, we get that $\mathbb{E}\sup_{t\in T}(X_t-X_{t_0})$=$\int_0^\infty{\Pr\Big(\sup_{t\in T}|X_t-X_{t_0}| > uS\Big) d(uS)}$$\leq LS\int_0^\infty{e^{-u^2/2}}=O(1)\cdot S$ since $S$ does not depend on $u$, and the integral of a Gaussian is bounded by a constant.

This completes our proof of the following fact:

$\boxed{\mathbb{E}\sup_{t\in T}X_t \leq O(1)\cdot \sup_{t\in T}\sum_{n\geq 1}{2^{n/2}d(t,T_{n-1})}.}$