This post is part of a series which answers a certain question about the supremum of a Gaussian process. I am going to write up, as I have understood it, a proof given in Chapter 1 of the book "The Generic Chaining" by Michel Talagrand. I recommend that the reader also take a look at the excellent posts by James Lee on this matter. (I am a beginner, James Lee is a master.)
Let $(T, d)$ be a finite metric space. Let $(X_t)_{t \in T}$ be a Gaussian process where each $X_t$ is a zero-mean Gaussian random variable. The distance between two points $s, t \in T$ is the square root of the expected squared difference between $X_s$ and $X_t$, that is, $d(s, t) = \sqrt{\mathbb{E}(X_s - X_t)^2}$. In this post, we are interested in upper-bounding $\mathbb{E} \sup_{t \in T} X_t$.
Question: How large can the quantity $\mathbb{E} \sup_{t \in T} X_t$ be?
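Before starting the proof, here is a small numerical aside (my own illustration, not from the book): a Python sketch that builds a toy zero-mean Gaussian process on a finite index set and estimates $\mathbb{E} \sup_{t \in T} X_t$ by Monte Carlo. The covariance matrix, the number of points, and the sample count are all arbitrary choices made only for this illustration.

```python
import numpy as np

# A toy zero-mean Gaussian process on a finite index set T = {0, ..., m-1}.
# The covariance (an RBF kernel on points of a line) is an arbitrary choice,
# made only so that we have a concrete, valid example.
rng = np.random.default_rng(0)
m = 50
points = np.linspace(0.0, 1.0, m)
cov = np.exp(-0.5 * (points[:, None] - points[None, :]) ** 2 / 0.1 ** 2)
cov += 1e-10 * np.eye(m)  # tiny jitter for numerical stability

# Canonical distance of the process: d(s, t) = sqrt(E (X_s - X_t)^2).
var = np.diag(cov)
dist = np.sqrt(np.maximum(var[:, None] + var[None, :] - 2 * cov, 0.0))

# Monte Carlo estimate of E sup_t X_t.
samples = rng.multivariate_normal(np.zeros(m), cov, size=20000)
print("estimated E sup_t X_t =", samples.max(axis=1).mean())
```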
In this post we are going to prove the following fact:

$$\mathbb{E} \sup_{t \in T} X_t \;\le\; O(1) \cdot \sup_{t \in T} \sum_{n \ge 0} 2^{n/2}\, d(t, T_n),$$

where $d(t, T_n) = \min_{s \in T_n} d(t, s)$ is the distance between the point $t$ and the set $T_n$, and $(T_n)_{n \ge 0}$ is a specific sequence of subsets of $T$ with $|T_0| = 1$ and $|T_n| \le 2^{2^n}$. Constructions of these sets will be discussed in a subsequent post.
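To get a feel for the quantity on the right-hand side, here is a small Python sketch (again my own illustration) that evaluates $\sup_{t \in T} \sum_{n \ge 0} 2^{n/2}\, d(t, T_n)$ for a given distance matrix and a given list of sets; the helper name `chaining_functional` and the toy sets below are hypothetical and not the sets the proof will eventually construct.

```python
import numpy as np

def chaining_functional(dist, sets):
    """Evaluate sup_t sum_n 2^(n/2) * d(t, T_n), given a distance matrix
    `dist` (dist[s, t] = d(s, t)) and a list `sets` of index lists T_0, T_1, ...
    Hypothetical helper, for illustration only."""
    total = np.zeros(dist.shape[0])
    for n, T_n in enumerate(sets):
        d_to_set = dist[:, T_n].min(axis=1)      # d(t, T_n) for every point t
        total += 2 ** (n / 2) * d_to_set
    return total.max()

# Toy example: 16 points on a line with Euclidean distance, and arbitrary
# sets satisfying |T_n| <= 2^(2^n).
points = np.linspace(0.0, 1.0, 16)
dist = np.abs(points[:, None] - points[None, :])
sets = [[0], [0, 15], [0, 5, 10, 15], list(range(16))]
print("sup_t sum_n 2^(n/2) d(t, T_n) =", chaining_functional(dist, sets))
```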
Notice that the metric structure (i.e., the pairwise distances $d(s, t)$) does not change if we translate all the variables $X_t$ by the same random variable. Hence, let us pick an arbitrary point $t_0 \in T$ and observe that $\mathbb{E} \sup_{t \in T} X_t$ equals $\mathbb{E} \sup_{t \in T} (X_t - X_{t_0})$ by the linearity of expectation, since $\mathbb{E} X_{t_0} = 0$. This formulation has the advantage that the random quantity $\sup_{t \in T} (X_t - X_{t_0})$ is clearly non-negative (the choice $t = t_0$ contributes zero), and it is zero only when every $X_t$ equals $X_{t_0}$.
Let $Z = \sup_{t \in T} (X_t - X_{t_0})$. Recall that, by the definition of expectation for a non-negative random variable, $\mathbb{E} Z = \int_0^\infty \Pr[Z > u]\, du$. Looks like we need to upper-bound the quantity $\Pr[\sup_{t \in T} (X_t - X_{t_0}) > u]$ first. An immediate thought is to use a union bound, since a sum of tail probabilities is at least as large as the tail probability of the max:

$$\Pr\Big[\sup_{t \in T} (X_t - X_{t_0}) > u\Big] \;\le\; \sum_{t \in T} \Pr[X_t - X_{t_0} > u].$$

However, this could potentially burn us if either there are "too many" points (albeit each with a small contribution), or if the points are "too close" to each other (so that the events above largely coincide and the sum grossly overcounts). What can we do to prevent this?
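As a quick sanity check on why the plain union bound can be wasteful (my own aside, not part of the proof), the following sketch compares the union-bound estimate $\sum_t \Pr[X_t - X_{t_0} > u]$ with a Monte Carlo estimate of $\Pr[\sup_t (X_t - X_{t_0}) > u]$ in a toy setting where many of the $X_t$ are nearly identical.

```python
import numpy as np
from math import erfc, sqrt

# Toy setting: X_{t0} = 0 and X_t = g + small independent noise for t = 1..m,
# so all the differences X_t - X_{t0} are nearly identical N(0, sigma^2)
# variables. All sizes below are arbitrary.
rng = np.random.default_rng(1)
m, u, eps = 200, 2.0, 0.05
g = rng.standard_normal(20000)
diffs = g[:, None] + eps * rng.standard_normal((20000, m))   # X_t - X_{t0}

sigma = sqrt(1 + eps ** 2)
union_bound = m * 0.5 * erfc(u / (sigma * sqrt(2)))   # sum_t P[X_t - X_{t0} > u]
monte_carlo = (diffs.max(axis=1) > u).mean()          # P[sup_t (X_t - X_{t0}) > u]
print("union bound:", union_bound, " vs. estimated probability:", monte_carlo)
```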
Maybe we should try to group nearby points together in clusters. Here is an idea: we will do a “gradual clustering”, as described below.
At steps $n = 0, 1, 2, \ldots$ we will maintain a set $T_n \subseteq T$ containing some "representative points". We will try to spread these points across the entire space as well as we can. At step $n$, every point $t \in T$ will use its nearest point in $T_n$ as its representative. Let us call this point $\pi_n(t)$, so that $d(t, \pi_n(t)) = d(t, T_n)$. Let us initialize $T_0 = \{t_0\}$, where $t_0$ is the (arbitrary) point we selected in the above discussion.
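The actual construction of the sets $T_n$ is deferred to a later post. Purely to make the picture concrete, here is a minimal Python sketch of one natural heuristic, greedy farthest-point selection, that spreads at most $2^{2^n}$ representatives over the space at step $n$; this heuristic is my own illustrative assumption and not the construction used in the proof.

```python
import numpy as np

def greedy_representatives(dist, t0, n_max):
    """Greedy farthest-point heuristic: T_0 = {t0}, and T_n extends T_{n-1}
    up to size 2^(2^n) by repeatedly adding the point farthest from the
    current set. Illustrative only; not the construction from the book."""
    m = dist.shape[0]
    chosen = [t0]
    d_to_set = dist[t0].copy()            # d(t, current set) for every t
    sets = [list(chosen)]                 # T_0
    for n in range(1, n_max + 1):
        while len(chosen) < min(2 ** (2 ** n), m):
            far = int(np.argmax(d_to_set))            # farthest point so far
            chosen.append(far)
            d_to_set = np.minimum(d_to_set, dist[far])
        sets.append(list(chosen))         # T_n
    return sets

# Toy example: 32 points on a line, t0 = 0.
points = np.linspace(0.0, 1.0, 32)
dist = np.abs(points[:, None] - points[None, :])
for n, T_n in enumerate(greedy_representatives(dist, t0=0, n_max=3)):
    print(f"|T_{n}| = {len(T_n)}")
```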
When $n = 1$, note that

$$X_t - X_{t_0} \;=\; \big(X_t - X_{\pi_1(t)}\big) + \big(X_{\pi_1(t)} - X_{t_0}\big).$$

The first two points, $t$ and $\pi_1(t)$, are close (as we had planned), and hence the supremum of the first difference, when taken across all points $t \in T$, should be small and more controllable. As for the last two terms, $X_{\pi_1(t)}$ and $X_{t_0}$, note that the number of representative points should be small compared to the total number of points (we need to carefully control it, though), and hence we hope that the union bound on this part will not hurt us much.
After choosing representative sets $T_0, T_1, T_2, \ldots$ step by step, we can write the telescoping sum

$$X_t - X_{t_0} \;=\; \sum_{n \ge 1} \big(X_{\pi_n(t)} - X_{\pi_{n-1}(t)}\big),$$

provided we can make sure that $\pi_n(t) = t$ when $n$ is sufficiently large. (That is, each point would eventually form its own cluster; since $T$ is finite, this happens once $T_n = T$.)
We control the size of the sets $T_n$; in particular, we ensure that $|T_n| \le N_n$, where $N_0 = 1$ and $N_n = 2^{2^n}$ for $n \ge 1$. (We'll show later how we do it.) Because the increments $X_{\pi_n(t)} - X_{\pi_{n-1}(t)}$ are zero-mean Gaussian random variables with standard deviation $d(\pi_n(t), \pi_{n-1}(t))$, it follows that for $u > 0$ we have

$$\Pr\Big[\big|X_{\pi_n(t)} - X_{\pi_{n-1}(t)}\big| > u\, d\big(\pi_n(t), \pi_{n-1}(t)\big)\Big] \;\le\; 2 \exp(-u^2/2).$$
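For concreteness, here is a quick numerical check (not part of the proof) of the standard Gaussian tail bound we are using, $\Pr[|g| > u] \le 2\exp(-u^2/2)$ for a standard Gaussian $g$; the increment $X_{\pi_n(t)} - X_{\pi_{n-1}(t)}$ is exactly such a Gaussian scaled by $d(\pi_n(t), \pi_{n-1}(t))$.

```python
from math import erfc, exp, sqrt

# Exact tail P[|g| > u] of a standard Gaussian g versus the bound 2*exp(-u^2/2).
for u in [0.5, 1.0, 2.0, 3.0, 4.0]:
    exact = erfc(u / sqrt(2))             # P[|g| > u] = erfc(u / sqrt(2))
    bound = 2 * exp(-u ** 2 / 2)
    print(f"u = {u}: P[|g| > u] = {exact:.6f} <= {bound:.6f}")
```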
Let us denote by $\Omega_u$ the event that for all $n \ge 1$ and all points $t \in T$,

$$\big|X_{\pi_n(t)} - X_{\pi_{n-1}(t)}\big| \;\le\; u\, 2^{n/2}\, d\big(\pi_n(t), \pi_{n-1}(t)\big).$$

Think of $\Omega_u$ as a good event, which means successive approximations of points $t$ by representatives $\pi_n(t)$ are "close", and hence possibly converging. By combining the two previous equations (applying the tail bound with $u\, 2^{n/2}$ in place of $u$), we have

$$\Pr\Big[\big|X_{\pi_n(t)} - X_{\pi_{n-1}(t)}\big| > u\, 2^{n/2}\, d\big(\pi_n(t), \pi_{n-1}(t)\big)\Big] \;\le\; 2 \exp\big(-u^2 2^n / 2\big).$$
Let $\Omega_u^c$ be the complement of the event $\Omega_u$; we think of $\Omega_u^c$ as a bad event, which means that for some point $t$ and some step $n$, two successive approximations to $t$ are "far apart". We can bound the probability of this bad event by a union bound over all $n$ and all point-pairs between $T_{n-1}$ and $T_n$.
At the $n$-th step, there are at most $N_{n-1} N_n \le 2^{2^{n+1}}$ representative pairs between $T_{n-1}$ and $T_n$. Therefore,

$$\Pr[\Omega_u^c] \;\le\; \sum_{n \ge 1} 2^{2^{n+1}} \cdot 2 \exp\big(-u^2 2^n / 2\big).$$

Because the higher-order terms in this sum fall off quickly when $u$ is larger than a fixed constant, we can write

$$\Pr[\Omega_u^c] \;\le\; C \exp(-u^2/2),$$

where $C$ is a constant. (For smaller $u$ the bound holds trivially, since we may take $C$ large enough that the right-hand side exceeds $1$.)
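To see numerically that the series really does collapse to a single Gaussian-type tail, here is a small sketch (my own check, with an arbitrary truncation of the sum and an arbitrary constant $C = 32$) comparing $\sum_{n \ge 1} 2^{2^{n+1}} \cdot 2\exp(-u^2 2^n/2)$ against $C\exp(-u^2/2)$ for a few values of $u$.

```python
from math import exp, log

# Truncated sum over n of (number of pairs) * (tail bound at u * 2^(n/2)),
# i.e. sum_{n>=1} 2^(2^(n+1)) * 2 * exp(-u^2 * 2^n / 2), with each term
# computed in log space to avoid huge intermediate numbers. The truncation
# at n = 30 and the constant C = 32 are arbitrary illustrative choices.
C = 32.0
for u in [3.0, 4.0, 5.0, 6.0]:
    total = sum(2 * exp(2 ** (n + 1) * log(2) - u ** 2 * 2 ** n / 2)
                for n in range(1, 31))
    print(f"u = {u}: sum = {total:.3e} <= C*exp(-u^2/2) = {C * exp(-u ** 2 / 2):.3e}")
```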
When $\Omega_u$ occurs, it means that for any point, all its successive approximations are "close". In this case, we see (by telescoping $X_t - X_{t_0} = \sum_{n \ge 1} (X_{\pi_n(t)} - X_{\pi_{n-1}(t)})$ and summing the bounds guaranteed by $\Omega_u$) that

$$\sup_{t \in T} (X_t - X_{t_0}) \;\le\; u\, S, \qquad \text{where} \quad S = \sup_{t \in T} \sum_{n \ge 1} 2^{n/2}\, d\big(\pi_n(t), \pi_{n-1}(t)\big).$$
For any point $t$, using the triangle inequality,

$$d\big(\pi_n(t), \pi_{n-1}(t)\big) \;\le\; d\big(t, \pi_n(t)\big) + d\big(t, \pi_{n-1}(t)\big) \;\le\; 2\, d\big(t, \pi_{n-1}(t)\big) \;=\; 2\, d(t, T_{n-1}),$$

since, between $\pi_{n-1}(t)$ and $\pi_n(t)$, the point $\pi_{n-1}(t)$ is the one farther from $t$ by construction (the nearest representative of $t$ only gets closer as $n$ grows). Thus, shifting the summation index,

$$S \;\le\; 2 \sup_{t \in T} \sum_{n \ge 1} 2^{n/2}\, d(t, T_{n-1}) \;=\; 2\sqrt{2}\, \sup_{t \in T} \sum_{n \ge 0} 2^{n/2}\, d(t, T_n) \;=\; O(1) \cdot \sup_{t \in T} \sum_{n \ge 0} 2^{n/2}\, d(t, T_n).$$
However, if $\sup_{t \in T} (X_t - X_{t_0}) > u S$ holds, then $\Omega_u$ cannot have occurred, so this can happen with probability at most $\Pr[\Omega_u^c]$. In other words,

$$\Pr\Big[\sup_{t \in T} (X_t - X_{t_0}) > u\, S\Big] \;\le\; C \exp(-u^2/2).$$
By integrating (and substituting $v = u S$), we get that

$$\mathbb{E} \sup_{t \in T} (X_t - X_{t_0}) \;=\; \int_0^\infty \Pr\Big[\sup_{t \in T} (X_t - X_{t_0}) > v\Big]\, dv \;\le\; S \int_0^\infty \min\big(1,\, C \exp(-u^2/2)\big)\, du \;=\; O(1) \cdot S,$$

since $S$ does not depend on $u$, and the integral of a Gaussian tail is bounded by a constant.
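As a last numerical aside (my own check), the constant hidden in the $O(1)$ above is just $\int_0^\infty \min(1, C e^{-u^2/2})\, du$; the following sketch evaluates it for the illustrative value $C = 32$ used earlier.

```python
from math import exp

# Riemann-sum approximation of \int_0^infty min(1, C * exp(-u^2/2)) du.
# The step size, the cutoff at u = 20, and C = 32 are arbitrary choices.
C, h = 32.0, 1e-4
integral = sum(min(1.0, C * exp(-(k * h) ** 2 / 2)) * h for k in range(int(20 / h)))
print("integral of min(1, C*exp(-u^2/2)) over [0, inf) ≈", round(integral, 3))
```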
This completes our proof of the following fact:

$$\mathbb{E} \sup_{t \in T} X_t \;\le\; O(1) \cdot \sup_{t \in T} \sum_{n \ge 0} 2^{n/2}\, d(t, T_n).$$