In this post, I will discuss my own understanding of the spectral sparsification paper by Daniel Spielman and Nikhil Srivastava (STOC 2008). I will assume the following:
- The reader is a beginner, like me, and has already glanced through the Spielman-Srivastava paper (from now on, the SS paper).
- The reader has, like me, a basic understanding of spectral sparsification and the associated concepts of matrix analysis. I will assume that she has read and understood Section 2 (Preliminaries) of the SS paper.
- The reader holds a copy of the SS paper while reading my post.
First, I will mention the main theorems (actually, I will mention only what they “roughly” say).
The Main Theorems
SS Theorem 1. It is possible to construct a spectral sparsifier $\tilde{G}$ of the input graph $G$ if we sample (with replacement) each edge $e$ (of weight $w_e$) with probability $p_e$ proportional to $w_e R_e$, where $R_e$ is its effective resistance. We will need to take $q = O(n \log n / \epsilon^2)$ samples, and the resultant sparsifier will have the following property with probability at least $1/2$:

$(1-\epsilon)\, x^T L x \le x^T \tilde{L} x \le (1+\epsilon)\, x^T L x \quad \forall x \in \mathbb{R}^n$,

where
- $x$, an $n \times 1$ vector, is a real-valued function defined over the vertices of $G$
- $L$ and $\tilde{L}$ are the Laplacian matrices of $G$ and $\tilde{G}$, respectively
The above theorem says that the Laplacian quadratic form of $\tilde{G}$ (i.e., $x^T \tilde{L} x$) can be made very close to the Laplacian quadratic form of $G$ (i.e., $x^T L x$), with probability at least $1/2$. Note that this theorem needs the effective resistance of each edge already computed. The following theorem shows how to approximate the effective resistances efficiently.
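Before turning to that, here is a minimal sketch of the sampling scheme of [SS Theorem 1] in Python, on a toy graph of my own. It computes the effective resistances naively via the pseudoinverse (which Theorem 1 assumes given) and checks that the quadratic forms of $G$ and $\tilde{G}$ come out close; none of the names or numbers below come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weighted graph (my own example): edges as (u, v, weight).
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (0, 2, 3.0), (1, 3, 1.0)]
n, m = 4, len(edges)
B = np.zeros((m, n))                        # signed incidence matrix
w = np.array([we for (_, _, we) in edges])
for i, (u, v, _) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0
L = B.T @ np.diag(w) @ B                    # weighted Laplacian of G
Lp = np.linalg.pinv(L)                      # naive pseudoinverse; fine for a toy

# Effective resistances R_e = b_e^T L^+ b_e, assumed precomputed here.
R = np.array([B[i] @ Lp @ B[i] for i in range(m)])
p = (w * R) / np.sum(w * R)                 # sampling probabilities, p_e ∝ w_e R_e

q = 2000                                    # number of samples (toy value)
w_tilde = np.zeros(m)
for e in rng.choice(m, size=q, p=p):        # sample edges with replacement
    w_tilde[e] += w[e] / (q * p[e])         # each sample contributes w_e / (q p_e)

L_tilde = B.T @ np.diag(w_tilde) @ B        # Laplacian of the sparsifier
x = rng.standard_normal(n)
print(x @ L @ x, x @ L_tilde @ x)           # the two quadratic forms are close
```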
SS Theorem 2. It is possible to approximately compute the effective resistances of all edges in $\tilde{O}(m \log r / \epsilon^2)$ time, where $r = w_{max}/w_{min}$, through the approximation of a special matrix $Z$, which we will discuss later.
Is an approximation of $R_e$ good enough for [SS Theorem 1] to still hold? As it turns out, it is, as confirmed by the following corollary.
SS Corollary 6. Approximate effective resistances are good enough for obtaining a spectral sparsifier.
The Dots, Waiting to be Connected
The main questions are:
- How do we prove the existence of such a probability distribution which lets us generate a spectral sparsifier?
- How do we efficiently compute/approximate these probabilities?
The answer to these questions can be found through weaving elegant connections among some apparently disconnected/loosely-connected dots. I will mention these dots as D1, D2, … and walk the reader through the connections.
D1. The effective resistance of edge $e$ can be defined in terms of the corresponding row $b_e$ of the incidence matrix $B$ as follows:

$R_e = b_e^T L^+ b_e$,

where $L^+$ is the Moore-Penrose pseudoinverse of the Laplacian $L = B^T W B$. Consequently, we can see that the diagonal entries of the matrix $B L^+ B^T$ are the effective resistances $R_e$.
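A quick numerical sanity check of D1 (my own toy example; the dense pseudoinverse used here would of course be far too slow for large graphs):

```python
import numpy as np

# Toy weighted graph (my own example).
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (0, 2, 3.0)]
n, m = 4, len(edges)
B = np.zeros((m, n))                        # row b_e of B corresponds to edge e
w = np.array([we for (_, _, we) in edges])
for i, (u, v, _) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0
L = B.T @ np.diag(w) @ B                    # weighted Laplacian
Lp = np.linalg.pinv(L)                      # Moore-Penrose pseudoinverse of L

R_diag = np.diag(B @ Lp @ B.T)              # diagonal of B L^+ B^T ...
R_rows = np.array([B[i] @ Lp @ B[i] for i in range(m)])  # ... equals b_e^T L^+ b_e
assert np.allclose(R_diag, R_rows)
print(R_diag)
```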
D2. (depends on D1) We can construct the matrix $\Pi$, which is similar to $B L^+ B^T$, as follows:

$\Pi = W^{1/2} B L^+ B^T W^{1/2}$.
It was proved in [SS Lemma 3] that
- $\Pi$ is a projection matrix, and hence $\Pi^2 = \Pi$.
- The rank of $\Pi$ is $n-1$.
- All eigenvalues of $\Pi$ are either zero or one. The multiplicity of 1 is $n-1$, and the multiplicity of 0 is $m-n+1$.
- Each diagonal element of $\Pi$ is equal to the squared 2-norm of the corresponding column. That is, $\Pi(e,e) = \|\Pi(\cdot,e)\|^2$.
D3. (depends on D2) The diagonal entries of $\Pi$ are therefore

$\Pi(e,e) = w_e\, b_e^T L^+ b_e = w_e R_e$.

Thus we are interested in computing (either exactly or approximately) the diagonal entries of $\Pi$.
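All of the above (the projection properties from D2 and the diagonal identity from D3) are easy to verify numerically; here is a sketch of my own on a toy graph:

```python
import numpy as np

# Toy weighted graph (my own example).
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (0, 2, 3.0), (1, 3, 1.0)]
n, m = 4, len(edges)
B = np.zeros((m, n))
w = np.array([we for (_, _, we) in edges])
for i, (u, v, _) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0
Whalf = np.diag(np.sqrt(w))
L = B.T @ np.diag(w) @ B
Lp = np.linalg.pinv(L)

Pi = Whalf @ B @ Lp @ B.T @ Whalf
assert np.allclose(Pi @ Pi, Pi)             # projection: Pi^2 = Pi
assert np.isclose(np.trace(Pi), n - 1)      # eigenvalues are 0/1, so trace = rank = n-1
R = np.array([B[i] @ Lp @ B[i] for i in range(m)])
assert np.allclose(np.diag(Pi), w * R)      # Pi(e,e) = w_e R_e
assert np.allclose(np.diag(Pi), np.sum(Pi**2, axis=0))  # = squared column norms
```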
D4. Now consider an $m \times m$ diagonal matrix $S$ whose diagonal entries denote the “scaling factor” introduced by the sparsification algorithm. Namely,

$S(e,e) = \dfrac{\tilde{w}_e}{w_e} = \dfrac{\#\{\text{times } e \text{ is sampled}\}}{q\, p_e}$,

where
- Edge $e$ is sampled (with replacement) with probability $p_e$
- $q$ is the number of samples taken to generate the sparsifier
- $w_e$ is the weight of edge $e$ in $G$
- $\tilde{w}_e$ is the weight of edge $e$ in $\tilde{G}$
D5. (depends on D4) It can be noted that the number of times an edge $e$ is actually sampled within $q$ trials is expected to be equal to $q\, p_e$. Therefore, $E[S(e,e)] = 1$, and thus $E[S] = I$ and $E[\tilde{L}] = L$.
D6. (depends on D4) Since $L = B^T W B$, the Laplacian of $\tilde{G}$ can be written as

$\tilde{L} = B^T W^{1/2} S\, W^{1/2} B$.

Note that we want to make $x^T \tilde{L} x$, the Laplacian quadratic form of $\tilde{G}$, very close to $x^T L x$, the Laplacian quadratic form of $G$.
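A sketch of D4–D6 on a toy graph (my own example): build $S$ from an actual sampling outcome and form $\tilde{L} = B^T W^{1/2} S W^{1/2} B$; with many samples, $\tilde{L}$ lands close to $L$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy weighted graph (my own example).
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (0, 2, 3.0), (1, 3, 1.0)]
n, m = 4, len(edges)
B = np.zeros((m, n))
w = np.array([we for (_, _, we) in edges])
for i, (u, v, _) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0
Whalf = np.diag(np.sqrt(w))
L = B.T @ np.diag(w) @ B
Lp = np.linalg.pinv(L)
R = np.array([B[i] @ Lp @ B[i] for i in range(m)])
p = (w * R) / np.sum(w * R)                 # p_e ∝ w_e R_e, as in Theorem 1

q = 5000
counts = np.bincount(rng.choice(m, size=q, p=p), minlength=m)
S = np.diag(counts / (q * p))               # S(e,e) = (#times e sampled) / (q p_e)

# E[S] = I, so E[L_tilde] = L; with many samples L_tilde is close to L.
L_tilde = B.T @ Whalf @ S @ Whalf @ B
print(np.linalg.norm(L_tilde - L) / np.linalg.norm(L))
```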
D7. ([SS Lemma 5] Rudelson-Vershynin Lemma) Roughly, this lemma says the following. Suppose we draw vectors $y_1, \ldots, y_q$ from a certain probability distribution and build the corresponding rank-1 matrices $y_i y_i^T$. Then the expectation of the 2-norm of the difference between (1) the average value of these rank-1 matrices, $\frac{1}{q}\sum_i y_i y_i^T$, and (2) the expected value $E[y y^T]$ of such a rank-1 matrix is bounded. This bound depends on the number of samples $q$ and the maximum 2-norm of the vectors $y_i$.
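To see the flavor of the lemma, here is a toy illustration of my own (uniform sampling over a fixed set of unit vectors, which is much simpler than the distribution used in the paper): the spectral-norm deviation shrinks as $q$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
Y = rng.standard_normal((10, d))            # a fixed finite set of vectors
Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # bounded 2-norm, as the lemma needs
expected = Y.T @ Y / len(Y)                 # E[y y^T] under uniform sampling

for q in (10, 100, 1000, 10000):
    idx = rng.integers(0, len(Y), size=q)   # q i.i.d. samples
    avg = Y[idx].T @ Y[idx] / q             # (1/q) * sum_i y_i y_i^T
    print(q, np.linalg.norm(avg - expected, 2))  # spectral-norm deviation shrinks
```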
D8. (depends on D7, D2, D3) Suppose the probability $p_e$ with which each edge $e$ is sampled is proportional to the corresponding diagonal entry of the projection matrix $\Pi$. That is,

$p_e = \dfrac{\Pi(e,e)}{\operatorname{tr}(\Pi)} = \dfrac{w_e R_e}{n-1}$.

Then, using the Rudelson-Vershynin lemma (D7), it can be proved that

$\|\Pi S \Pi - \Pi\| \le \epsilon$

with probability at least $1/2$.
(This is proved in the SS paper after stating the Rudelson-Vershynin lemma.) Observe that

$\Pi S \Pi - \Pi = \Pi S \Pi - \Pi\Pi = \Pi (S - I) \Pi$,

and hence $S$ does not distort $\Pi$ too much.
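Here is an empirical check of this bound on a toy graph (my own example, with $q$ large enough that the deviation is visibly small):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy weighted graph (my own example).
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (0, 2, 3.0), (1, 3, 1.0)]
n, m = 4, len(edges)
B = np.zeros((m, n))
w = np.array([we for (_, _, we) in edges])
for i, (u, v, _) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0
Whalf = np.diag(np.sqrt(w))
L = B.T @ np.diag(w) @ B
Pi = Whalf @ B @ np.linalg.pinv(L) @ B.T @ Whalf

p = np.diag(Pi) / np.trace(Pi)              # p_e = Pi(e,e) / (n-1)
q = 5000
counts = np.bincount(rng.choice(m, size=q, p=p), minlength=m)
S = np.diag(counts / (q * p))
print(np.linalg.norm(Pi @ S @ Pi - Pi, 2))  # small for large q
```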
D9. [SS Lemma 4] (depends on D8) This lemma says that, given D8, that is, if $S$ is a non-negative diagonal matrix satisfying

$\|\Pi S \Pi - \Pi\Pi\| \le \epsilon$,

then

$\forall x \in \mathbb{R}^n: \quad (1-\epsilon)\, x^T L x \le x^T \tilde{L} x \le (1+\epsilon)\, x^T L x$.

Essentially, this proves [SS Theorem 1].
D10. (depends on D9) If the probability distribution used to sparsify $G$ is approximated, and each estimated probability is (roughly speaking) within a factor $\alpha$ of the true $p_e$, then this approximation only weakens the approximation error in D9 by a factor of roughly $\sqrt{\alpha}$ (a constant when $\alpha$ is a constant), thus preserving the guarantee of spectral sparsification [SS Theorem 1].
This proves [SS Corollary 6].
Now that we have proved [SS Theorem 1], we are left to prove [SS Theorem 2].
D11. (depends on D1) Let $\chi_v$ be the characteristic vector of vertex $v$, that is, zero in all rows except row $v$, which contains a one. Now note that

$R_{uv} = (\chi_u - \chi_v)^T L^+ (\chi_u - \chi_v) = \| W^{1/2} B L^+ (\chi_u - \chi_v) \|^2$.

Suppose $Z = W^{1/2} B L^+$, an $m \times n$ matrix. If we know $Z$ (or a good approximation $\hat{Z}$ of it), then $R_{uv}$ can be directly calculated as the squared norm of the difference of two columns of $Z$. Therefore, we now focus on computing $Z$.
Actually, we will compute $\tilde{Z} = QZ$, a $O(\log n / \epsilon^2) \times n$ matrix (see D14).
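A numerical check of this identity on a toy graph (my own example, using the exact pseudoinverse):

```python
import numpy as np

# Toy weighted graph (my own example).
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (0, 2, 3.0)]
n, m = 4, len(edges)
B = np.zeros((m, n))
w = np.array([we for (_, _, we) in edges])
for i, (u, v, _) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0
L = B.T @ np.diag(w) @ B
Lp = np.linalg.pinv(L)
Z = np.diag(np.sqrt(w)) @ B @ Lp            # the m x n matrix Z = W^{1/2} B L^+

for (u, v, _) in edges:
    chi = np.zeros(n)
    chi[u], chi[v] = 1.0, -1.0
    R_uv = chi @ Lp @ chi                   # effective resistance of (u, v)
    # R_uv equals the squared distance between columns u and v of Z.
    assert np.isclose(R_uv, np.sum((Z[:, u] - Z[:, v]) ** 2))
print("R_uv equals squared column distances of Z")
```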
D12. This is a special version (due to Achlioptas) of the famous Johnson-Lindenstrauss theorem. Roughly speaking, it tells us that the members of any fixed set of $n$ $d$-dimensional vectors can be projected into dimension $k = O(\log n / \epsilon^2)$ through a random $k \times d$ matrix $Q$, such that their distances are closely preserved. The entries of $Q$ are Bernoulli random values taken from $\{\pm 1/\sqrt{k}\}$.
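A sketch of this projection (my own toy data and dimensions; the constants are not tuned to any failure probability):

```python
import numpy as np

rng = np.random.default_rng(4)
d, npts, k = 1000, 20, 200                  # toy dimensions (my own choice)
X = rng.standard_normal((d, npts))          # a fixed set of points in R^d
Q = rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)  # ±1/sqrt(k) entries
Y = Q @ X                                   # projected points in R^k

# Compare a few pairwise distances before and after projection.
for (a, b) in [(0, 1), (2, 3), (4, 5)]:
    before = np.linalg.norm(X[:, a] - X[:, b])
    after = np.linalg.norm(Y[:, a] - Y[:, b])
    print(before, after)                    # ratios close to 1
```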
D13. This is the Spielman-Teng linear system solver (STsolver), which can be used to generate approximate solutions to systems of linear equations $Lx = b$ involving Laplacians. This algorithm runs in expected time $\tilde{O}(m \log(1/\delta))$, where
- $m$ is the number of non-zero entries in $L$, and
- $\delta$ is a predetermined approximation error which is introduced into the generated solution $\tilde{x}$. This error parameter can be set as an input to the solver.
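The actual Spielman-Teng solver is a sophisticated recursive algorithm and not something one can call as a simple library routine; as a stand-in for experimentation, conjugate gradients also solves Laplacian systems $Lx = b$ approximately (a sketch of my own, with none of STsolver's running-time guarantees):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import cg

# Toy weighted graph (my own example).
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (0, 2, 3.0)]
n, m = 4, len(edges)
B = np.zeros((m, n))
w = np.array([we for (_, _, we) in edges])
for i, (u, v, _) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0
L = csr_matrix(B.T @ np.diag(w) @ B)        # sparse Laplacian

b = np.array([1.0, -1.0, 0.0, 0.0])         # right-hand side, orthogonal to ones
x_approx, info = cg(L, b)                   # approximate solution to L x = b
assert info == 0                            # 0 means CG converged
print(np.linalg.norm(L @ x_approx - b))     # small residual
```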
D14. (uses D12) We use D12 to project the $m$-dimensional columns of the matrix $Z = W^{1/2} B L^+$ into $k = O(\log n / \epsilon^2)$ dimensions. Specifically, we want to compute the matrix

$\tilde{Z} = Q W^{1/2} B L^+$,

where $Q$ is a random $k \times m$ matrix. First, we compute the matrix $Y = Q W^{1/2} B$. This takes $O(m \log n / \epsilon^2)$ time because $B$ has $2m$ non-zero entries.
D15. (depends on D14, uses D13) Note that $Y = Q W^{1/2} B$ has $k = O(\log n / \epsilon^2)$ rows and $n$ columns. Let $y_i$ be any row of $Y$, so that the length of $y_i$ is $n$. Now we use D13 to approximately solve the system of equations $L z_i = y_i$, which gives us an approximate solution $\hat{z}_i$ to a single row of the matrix $\tilde{Z} = Y L^+$. Each call to STsolver takes $\tilde{O}(m \log(1/\delta))$ time since $L$ has only $O(m)$ non-zero entries. If we call STsolver $k$ times, once for each row of $Y$, we will be able to approximate the matrix $\tilde{Z}$ with a matrix $\hat{Z}$ in total time

$\tilde{O}\!\left(\dfrac{m \log(1/\delta) \log n}{\epsilon^2}\right)$,

where we absorb the polylogarithmic factors under the $\tilde{O}$ notation.
D16. (depends on D11, D15) Now we pick an appropriate value of $\delta$, the error introduced by the Spielman-Teng solver. [SS Lemma 9], with the help of [SS Proposition 10], proves that if we pick a specific value of $\delta$, the matrix $\hat{Z}$ obtained in D15 can be used to compute the effective resistances as outlined in D11. It turns out that the appropriate value of $\delta$ involves the minimum-to-maximum weight ratio $w_{min}/w_{max}$, so that $\log(1/\delta) = O(\log(n r / \epsilon))$ with $r = w_{max}/w_{min}$, which makes the final running time of approximating the effective resistances

$\tilde{O}(m \log r / \epsilon^2)$.
Therefore, we have proved [SS Theorem 2].
We Are Done
To summarize,
- First we showed that we can obtain a spectral sparsifier if we sample edges from a certain probability distribution (see D4).
- Then we showed that these probabilities are proportional to the diagonal entries of the projection matrix $\Pi$ (see D8).
- We also showed that the diagonal entries of the projection matrix $\Pi$ actually encode the effective resistances of the corresponding edges, scaled by the corresponding weights (see D3).
- Next we showed that our reweighting scheme does not distort the quadratic form of $L$ too much, and hence produces the spectral guarantee. This is the proof of [SS Theorem 1] (see D9).
- We also showed that approximating the effective resistances (as opposed to exactly computing them) does not hurt the sparsifier too much [SS Corollary 6] (see D10).
- So now we ask:
- (a) How do we approximate the effective resistances?
- (b) How fast?
- We showed that the effective resistance of any given edge is equal to the squared norm of the difference of two columns of a certain matrix $Z$ (see D11).
- We used a special version of the Johnson-Lindenstrauss lemma (due to Achlioptas) to project the columns of this matrix into lower dimensions. This projection is guaranteed to preserve pairwise distances (that is, the norms of the differences between the columns). This lower-dimensional projection is the matrix $\tilde{Z}$ (see D14).
- Then we approximated this matrix with $\hat{Z}$ using the Spielman-Teng linear system solver for Laplacian systems (see D15).
- Lastly, we showed that if we set the approximation parameter $\delta$ of the previous step to a specific value (determined by [SS Lemma 9]), we can approximate the effective resistances in reasonable time (see D16).
That’s all for today, folks. Please let me know if you spot any gap/mistake in the explanations.
Very well-written indeed! We need more from you.
Thanks a lot. I have been reading the Kelner-Levin paper and the Koutis-Miller-Peng paper that built on top of these ideas. Insha-Allah, I will write about them later.