This post is part of a series which answers a certain question about the supremum of a Gaussian process. I am going to write, as I have understood, a proof given in Chapter 1 of the book “Generic Chaining” by Michel Talagrand. I recommend the reader to take a look at the excellent posts by James Lee on this matter. (I am a beginner, James Lee is a master.)

Let be a finite metric space. Let be a Gaussian process where each is a zero-mean Gaussian random variable. The distance between two points is the square-root of the covariance between and . In this post, we are interested in upper-bounding .

**Question:** How large can the quantity be?

In this post we are going to prove the following fact:

where is the distance between the point from the set , and is a specific sequence of sets with . Constructions of these sets will be discussed in a subsequent post.

Notice that the metric structure (i.e., the pairwise distances) does not change if we translate all distances by a constant. Hence, let us pick an arbitrary point and see that equals by the linearity of expectation. This formulation has the advantage that the random quantity is clearly non-negative unless all equals .

Let . Recall that by the definition of expectation, . Looks like we need to upper-bound the quantity first. An immediate thought is to use a union bound to show that a sum is at least as large as the max:

However, this could potentially burn us if either there are “too many” points (albeit each with a small contribution), or if the points are “too close” to each other (so that each point makes a large contribution). What can we do to prevent this?

Maybe we should try to group nearby points together in clusters. Here is an idea: we will do a “gradual clustering”, as described below.

At steps we will maintain a set containing some “representative points”. We would try to spread these points across the entire space as best as we can. At step , every point will use its nearest point in as its representative. Let us call this point . Let us initialize with the (arbitrary) point we selected in the above discussion.

When , note that . The first two points are close (as we had planned), and hence the supremum, when taken across all points , should be small and more controllable. As for the last two terms, note that the number of representative points should be small compared to the total number of points (we need to carefully control it, though), and hence we hope that the union bound on this part will not hurt us much.

After steps of choosing representative sets , we can write the telescoping sum provided we can make sure that when is sufficiently large. (That is, each point would form its own cluster.)

We control the size of the sets ; in particular, we ensure that where . (We’ll show later how we do it.) Because are Gaussian random variables, it follows that for , we have

.

Let us denote by the event that for all and all points ,

Think of as a good event, which means successive approximations of points by representitives are “close”, and hence possibly converging. By combining the two previous equations, we have

.

Let be the complement of the event ; we think of as a bad event which means that for each point , all successive approximations to it are “far apart”. We can find the probability of this bad event by a union bound over all and all point-pairs between and .

At the th step, there are

representative pairs between and . Therefore,

Because the high-order terms in the sum in fall off quickly when , we can write where is a constant.

When occurs, it means for any point, all its successive approximations are “close”. In this case, we see (by telescoping ) that where .

For any point , using triangle inequality, since between and , is farther from by construction. Thus, .

However, when does not occur, the following holds: . The probability of this is exactly . In other words,

By integrating, we get that = since does not depend on , and the integral of a Gaussian is bounded by a constant.

This completes our proof of the following fact: