Bounding the Supremum of a Gaussian Process: Talagrand’s Generic Chaining (Part 1)

This post is part of a series which answers a certain question about the supremum of a Gaussian process. I am going to write up, as I have understood it, a proof given in Chapter 1 of the book “The Generic Chaining” by Michel Talagrand. I recommend that the reader also take a look at the excellent posts by James Lee on this matter. (I am a beginner; James Lee is a master.)

Let (T,d) be a finite metric space. Let \{X_t\}_{t\in T} be a Gaussian process where each X_t is a zero-mean Gaussian random variable, and the distance between two points s,t\in T is the canonical one: d(s,t) = \sqrt{\mathbb{E}(X_s-X_t)^2}. In this post, we are interested in upper-bounding the quantity Q defined below.

Question: How large can the quantity Q := \mathbb{E} \sup_{t\in T} X_t be?
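To make Q concrete before proving anything, here is a minimal Monte Carlo sketch (the covariance matrix and sample count are my own arbitrary illustration choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# A small zero-mean Gaussian process on |T| = 4 points, specified by an
# (arbitrary, for illustration) positive-definite covariance matrix.
cov = np.array([[1.0, 0.8, 0.2, 0.1],
                [0.8, 1.0, 0.3, 0.2],
                [0.2, 0.3, 1.0, 0.7],
                [0.1, 0.2, 0.7, 1.0]])

# Canonical distance: d(s,t) = sqrt(E (X_s - X_t)^2)
#                            = sqrt(cov[s,s] + cov[t,t] - 2 cov[s,t]).
d = np.sqrt(np.add.outer(np.diag(cov), np.diag(cov)) - 2 * cov)

# Estimate Q = E sup_t X_t by averaging the max over many samples.
samples = rng.multivariate_normal(np.zeros(4), cov, size=100_000)
print("estimated Q:", samples.max(axis=1).mean())
```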

In this post we are going to prove the following fact:

\boxed{\displaystyle \mathbb{E}\sup_{t\in T}X_t \leq O(1)\cdot  \sup_{t\in T}\sum_{n\geq 1}{2^{n/2}d(t,T_{n-1})} ,}

where d(t, A) := \min_{s\in A} d(t,s) is the distance from the point t to the set A, and \{T_n\} is a specific sequence of subsets of T. The construction of these sets will be discussed in a subsequent post.

Notice that the metric structure (i.e., the pairwise distances) does not change if we subtract the same random variable from every X_t. Hence, let us pick an arbitrary point t_0 \in T and observe that Q=\mathbb{E} \sup_{t\in T} X_t equals \mathbb{E} \sup_{t\in T} (X_t-X_{t_0}): since X_{t_0} does not depend on t, it pulls out of the supremum, and \mathbb{E}X_{t_0}=0 by linearity of expectation. This formulation has the advantage that the random quantity \sup_{t\in T} (X_t-X_{t_0}) is clearly non-negative (take t=t_0).

Let Y=\sup_{t\in T} (X_t-X_{t_0}), a non-negative random variable. Recall that for a non-negative random variable, \mathbb{E}Y=\int_0^\infty{\Pr(Y \geq u)\, du}. So we first need to upper-bound the quantity \Pr\big(\sup_{t\in T}(X_t-X_{t_0}) \geq u\big). An immediate thought is to use a union bound, since the supremum exceeds u only if some individual difference does:

\Pr\big(\sup_{t\in T}(X_t-X_{t_0}) \geq u\big) \leq \sum_{t\in T}{\Pr\big(X_t-X_{t_0} \geq u\big) }.

However, this could burn us if either there are “too many” points (each contributing a small probability that nonetheless adds up), or the points are “too close” to each other (the events X_t-X_{t_0} \geq u are then nearly identical, and summing their probabilities grossly overcounts). What can we do to prevent this?
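As a sanity check (a standard example, not from the text): if the X_t are N i.i.d. standard Gaussians, the union bound together with the tail estimate \Pr(X_t \geq u) \leq e^{-u^2/2} gives

\displaystyle \Pr\Big(\sup_{t\in T} X_t \geq u\Big) \leq N e^{-u^2/2},

which is meaningful only once u \geq \sqrt{2\ln N}; integrating yields \mathbb{E}\sup_{t\in T} X_t \leq O(\sqrt{\log N}). This is essentially tight for independent points, but very lossy for highly correlated ones, which is exactly the failure mode described above.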

Maybe we should try to group nearby points together in clusters. Here is an idea: we will do a “gradual clustering”, as described below.

At steps n = 0,1,2,\ldots we will maintain a set T_n containing some “representative points”, which we try to spread across the entire space as best we can. At step n, every point t will use its nearest point in T_n as its representative; let us call this point \pi_n(t). We initialize T_0=\{t_0\} with the (arbitrary) point we selected in the above discussion.
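To make the bookkeeping concrete, here is a small sketch (my own illustration, not from the book; it assumes the points are embedded so that Euclidean distance coincides with the canonical distance, and it takes the nets T_n as given since their construction is deferred to the next post):

```python
import numpy as np

def representative_maps(points, nets):
    """pi[n][t] = index (into points) of the nearest point of the net
    T_n to point t, i.e. the representative pi_n(t)."""
    pi = []
    for net in nets:                       # net = list of indices forming T_n
        net_pts = points[net]              # coordinates of the representatives
        # distances from every point to every representative in T_n
        dists = np.linalg.norm(points[:, None, :] - net_pts[None, :, :], axis=2)
        pi.append([net[j] for j in dists.argmin(axis=1)])
    return pi

# Example: T_0 = {t_0} is a single arbitrary point; the last net is all of T,
# so at the final level every point is its own representative.
pts = np.random.default_rng(0).normal(size=(8, 2))
print(representative_maps(pts, nets=[[0], [0, 3, 5], list(range(8))]))
```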

When n = 1, note that X_t-X_{t_0}=(X_t-X_{\pi_1(t)})+(X_{\pi_1(t)}-X_{t_0}). The first difference involves two close points (as we had planned), and hence its supremum across all points t\in T should be small and controllable. As for the second difference, the number of representative points is small compared to the total number of points (we need to control it carefully, though), and hence we hope that a union bound on this part will not hurt us much.

After choosing representative sets \{T_n\}_{n\geq 0} for every step, we can write the telescoping sum X_t-X_{t_0}=\sum_{n\geq 1}{(X_{\pi_n(t)} - X_{\pi_{n-1}(t)})}, provided we make sure that \pi_n(t)=t once n is sufficiently large. (Since T is finite, this happens as soon as T_n = T, i.e., each point becomes its own representative.)

We control the size of the sets T_n; in particular, we ensure that \vert T_n \vert \leq N_n, where N_0=1 and N_n=2^{2^n} for n\geq 1. (We’ll show in the next post how to do this.) Because each increment X_{\pi_n(t)}-X_{\pi_{n-1}(t)} is a zero-mean Gaussian random variable, it follows that for u \geq 0, we have

\displaystyle p_t(u,n):=\Pr\big(|X_{\pi_n(t)}-X_{\pi_{n-1}(t)}| \geq u\big)\leq 2\cdot \exp\big(-\frac{u^2}{2\,d(\pi_n(t),\pi_{n-1}(t))^2}\big).
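To see where this comes from (a standard fact, spelled out here for completeness): the increment X_{\pi_n(t)}-X_{\pi_{n-1}(t)} is a zero-mean Gaussian whose standard deviation, by the very definition of the canonical distance, is exactly \sigma = d(\pi_n(t),\pi_{n-1}(t)); and for a standard Gaussian g,

\displaystyle \Pr\big(|g| \geq v\big) \leq 2e^{-v^2/2} \quad\Longrightarrow\quad \Pr\big(|\sigma g| \geq u\big) \leq 2e^{-u^2/(2\sigma^2)}.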

Let us denote by \Omega_u the event that for all n \geq 1 and all points t\in T,

\displaystyle |X_{\pi_n(t)}-X_{\pi_{n-1}(t)}| \leq u 2^{n/2} d(\pi_n(t),\pi_{n-1}(t)).

Think of \Omega_u as a good event: it says that the successive approximations of each X_t by its representatives \pi_n(t) are “close”, and hence under control. Substituting the threshold u 2^{n/2} d(\pi_n(t),\pi_{n-1}(t)) into the tail bound above, each individual inequality in the definition of \Omega_u fails with probability at most

\displaystyle \Pr\big(|X_{\pi_n(t)}-X_{\pi_{n-1}(t)}| > u 2^{n/2} d(\pi_n(t),\pi_{n-1}(t))\big)\leq 2\cdot \exp\big(-\frac{u^2\cdot 2^n}{2}\big).

Let \overline{\Omega}_u be the complement of the event \Omega_u; this is the bad event that for some point t and some step n, the consecutive representatives of t are “far apart”. We can bound the probability of this bad event by a union bound over all n\geq 1 and all pairs of representatives between T_{n-1} and T_n.

At the nth step, there are at most

\displaystyle \vert T_n \vert \times \vert T_{n-1} \vert \leq 2^{2^n+2^{n-1}} \leq 2^{2\cdot 2^n}=2^{2^{n+1}}

representative pairs between T_{n-1} and T_n. Therefore,

\displaystyle p(u) := \Pr\big(\overline{\Omega}_u\big) \leq \sum_{n\geq 1}{|T_{n-1}|\cdot |T_n|\cdot 2\exp\big(-u^2 2^n/2\big)} \leq \sum_{n\geq 1}{2\cdot 2^{2^{n+1}}\exp\big(-u^2 2^n/2\big)}.

For u \geq 3 we have u^2 2^n/2 \geq u^2/2 + 2^{n+1} for every n\geq 1 (check n=1 with u^2=9; the constraint only gets weaker as n grows), while 2^{2^{n+1}} = e^{(\ln 2)\, 2^{n+1}} with \ln 2 < 1. So the nth term is at most 2e^{-u^2/2}\, e^{-(1-\ln 2)2^{n+1}}, a rapidly convergent series, and we can write p(u) \leq L\cdot e^{-u^2/2} for all u \geq 3, where L is a universal constant.
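Here is a quick numerical sanity check of this claim (my own, not from the book); the sum is evaluated in log space because the factor 2^{2^{n+1}} is doubly exponential in n:

```python
import math

def p_upper(u, n_max=20):
    """sum_{n>=1} 2 * 2^(2^(n+1)) * exp(-u^2 * 2^n / 2), for u >= 3."""
    total = 0.0
    for n in range(1, n_max + 1):
        # log of the nth term; large-n terms underflow harmlessly to 0.0
        log_term = math.log(2.0) * (1 + 2 ** (n + 1)) - u * u * 2 ** n / 2.0
        total += math.exp(log_term)
    return total

for u in (3.0, 4.0, 6.0):
    print(f"u={u}: sum={p_upper(u):.3e}  exp(-u^2/2)={math.exp(-u*u/2):.3e}")
```

At u = 3 the computed sum is already below e^{-u^2/2}, consistent with a small universal constant L.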

When \Omega_u occurs, all successive approximations of every point are “close”. In this case, by telescoping and the triangle inequality, \sup_{t\in T} |X_t-X_{t_0}| = \sup_{t\in T}\big|\sum_{n\geq 1}{(X_{\pi_n(t)}-X_{\pi_{n-1}(t)})}\big| \leq \sup_{t\in T}\sum_{n\geq 1}{|X_{\pi_n(t)}-X_{\pi_{n-1}(t)}|} \leq u\cdot S, where S:=\sup_{t\in T}\sum_{n\geq 1}{2^{n/2}d(\pi_n(t),\pi_{n-1}(t))}.

For any point t\in T, the triangle inequality gives d(\pi_n(t),\pi_{n-1}(t)) \leq d(t,\pi_n(t)) + d(t,\pi_{n-1}(t)) \leq 2d(t,\pi_{n-1}(t)) = 2d(t,T_{n-1}), where the second step holds because, by construction, the approximation only improves with n: d(t,\pi_n(t)) = d(t,T_n) \leq d(t,T_{n-1}) = d(t,\pi_{n-1}(t)). Thus, S \leq \sup_{t\in T}\sum_{n\geq 1}{2\cdot 2^{n/2}d(t,T_{n-1})}.
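Continuing the earlier sketch (again my own toy illustration, with nested nets so that the approximation indeed improves with n), one can compute both sides of this inequality:

```python
import numpy as np

def chaining_sums(points, nets):
    """Return (S, bound): S = sup_t sum_n 2^(n/2) d(pi_n(t), pi_{n-1}(t)),
    bound = 2 * sup_t sum_n 2^(n/2) d(t, T_{n-1})."""
    dist = lambda a, b: np.linalg.norm(points[a] - points[b])
    S = bound = 0.0
    for t in range(len(points)):
        s1 = s2 = 0.0
        prev = min(nets[0], key=lambda j: dist(t, j))      # pi_0(t)
        for n in range(1, len(nets)):
            cur = min(nets[n], key=lambda j: dist(t, j))   # pi_n(t)
            w = 2 ** (n / 2)
            s1 += w * dist(cur, prev)                      # d(pi_n, pi_{n-1})
            s2 += w * min(dist(t, j) for j in nets[n - 1]) # d(t, T_{n-1})
            prev = cur
        S, bound = max(S, s1), max(bound, 2 * s2)
    return S, bound

pts = np.random.default_rng(1).normal(size=(8, 2))
print(chaining_sums(pts, [[0], [0, 3, 5], list(range(8))]))  # S <= bound
```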

Conversely, we have just shown that \Omega_u implies \sup_{t\in T}|X_t-X_{t_0}| \leq uS; taking the contrapositive, if \sup_{t\in T}|X_t-X_{t_0}| > uS then \Omega_u must fail. In other words, \displaystyle \Pr\Big(\sup_{t\in T}|X_t-X_{t_0}| > uS\Big) \leq \Pr\big(\overline{\Omega}_u\big) = p(u) \leq L\cdot e^{-u^2/2} for u \geq 3.

By integrating this tail bound and substituting v = uS, we get \mathbb{E}\sup_{t\in T}(X_t-X_{t_0}) \leq \int_0^\infty{\Pr\Big(\sup_{t\in T}|X_t-X_{t_0}| > v\Big)\, dv} = S\int_0^\infty{\Pr\Big(\sup_{t\in T}|X_t-X_{t_0}| > uS\Big)\, du} \leq S\int_0^\infty{\min\big(1,\, L e^{-u^2/2}\big)\, du} = O(1)\cdot S, since S does not depend on u and the Gaussian tail integral is bounded by a constant.
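Spelling out that last integral, with the split at u = 3 where our tail bound becomes available:

\displaystyle \int_0^\infty \min\big(1,\, L e^{-u^2/2}\big)\, du \;\leq\; 3 + L\int_0^\infty e^{-u^2/2}\, du \;=\; 3 + L\sqrt{\pi/2} \;=\; O(1).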

Since S \leq 2\sup_{t\in T}\sum_{n\geq 1}{2^{n/2}d(t,T_{n-1})} and the factor 2 is absorbed into the O(1), this completes our proof of the following fact:

\boxed{\mathbb{E}\sup_{t\in T}X_t \leq O(1)\cdot  \sup_{t\in T}\sum_{n\geq 1}{2^{n/2}d(t,T_{n-1})}.}
