Spectral Sparsification by Spielman and Srivastava: How They Connected the Dots

In this post, I will discuss my own understanding of the spectral sparsification paper by Daniel Spielman and Nikhil Srivastava (STOC 2008). I will assume the following:

  1. The reader is a beginner, like me, and have already glanced through the Spielman-Srivastava paper (from now on, the SS paper).
  2. The reader has, like me, a basic understanding of spectral sparsification and associated concepts of matrix analysis. I will assume that she has read and understood the Section 2 (Preliminaries) of the SS paper.
  3. The reader holds a copy of the SS paper while reading my post.

First, I will mention the main theorems (actually, I will mention only what they “roughly” say).

The Main Theorems

SS Theorem 1. It is possible to construct a spectral sparsifier H of the input graph G if we sample (with replacement) each edge e (of weight w_e) with probability p_e proportional to its effective resistance R_e. We will need to take O(n\log{n}/\epsilon^2) samples, and the resultant sparsifier will have the following property with probability at least \frac{1}{2}:

\forall x \in \mathbb{R}^n \ \ \ (1-\epsilon)x^TLx\leq x^T\tilde{L}x\leq (1+\epsilon)x^TLx\ .

where

  • x, an n\times 1 vector, is a real valued function defined over the vertices of G
  • L and $\tilde{L}$) are the Laplacian matrix of G and H, respectively

The above theorem says, the Laplacian quadratic form of H (i.e., x^T\tilde{L}x) can be made very close to the Laplacian quadratic form of G, with probability greater than \frac{1}{2}. Note that this theorem needs the effective resistance R_e of each edge already computed. The following theorem shows how to approximate R_e efficiently.

SS Theorem 2. It is possible to approximately compute the effective resistances R_e of each edge e in time O(m (\log{r})/\epsilon^2) through the approximation of a special matrix Z, which we will discuss later.

Is an approximation of R_e good enough for [SS Theorem 1] to still hold? As it turns out, it is, confirmed by the following lemma.

SS Corollary 6. Approximate effective resistances R_e are good enough for obtaining a spectral sparsifier.

The Dots, Waiting to be Connected

The main questions are:

  1. How do we prove the existence of such a probability distribution which lets us generate a spectral sparsifier?
  2. How do we efficiently compute/approximate these probabilities?

The answer to these questions can be found through weaving elegant connections among some apparently disconnected/loosely-connected dots. I will mention these dots as D1,D2,… and walk the reader through the connections.

D1. Effective resistance R_e of edge e can be defined in terms of the corresponding row b_e of the m\times n incidence matrix B as follows:

R_e=b_eL^+b_e^T,

where L^+ is the Moore-Penrose pseudoinverse of the Laplacian L. Consequently, we can see that the (e,e) diagonal entries of the m\times m matrix BL^+B^T are the effective resistances R_e.

D2. (depends on D1) We can construct the m\times m matrix $\Pi$ which is similar to BL^+B^T as follows:

\Pi=W^{1/2}BL^{+}B^TW^{1/2}

It was proved in [SS Lemma 3] that

  1. \Pi is a projection matrix, and hence \Pi^2=\Pi.
  2. Rank of \Pi is n-1.
  3. All eigenvalues of \Pi are either zero or one. Multiplicity of 1 is n-1, and multiplicity of 0 is m-n+1.
  4. Diagonal elements of \Pi is equal to the squared-2-norm that column. That is, \Pi(e,e)=\lVert \Pi(.,e)\rVert^2

D3. (depends on D2) The diagonal entries of \Pi are therefore

\Pi(e,e)=\sqrt{W(e,e)}R_e\sqrt{W(e,e)}=w_eR_e

Thus we are interested in computing (either exactly or approximately) the diagonal entries of \Pi.

D4. Now consider an m\times m diagonal matrix S whose diagonal entries denote the “scaling factor” introduced by the sparsification algorithm. Namely,

S(e,e)=\frac{\tilde{w}_e}{w_e}

=\frac{ \text{number of times e was sampled}}{qp_e}

where

  1. Edge e is sampled (with replacement) with probability p_e
  2. q is the number of samples taken to generate the sparsifier H
  3. w_e is the weight of the edge e in G
  4. \tilde{w}_e=S(e,e)w_e is the weight of the edge e in H

D5. (depends on D4) It can be noted that the number of times an edge was actually sampled within q trials is expected to be equal to p_e. Therefore, \mathbb{E}[\tilde{w}_e]= w_e, and thus \mathbb{E}[S]= I and \mathbb{E}[\tilde{L}]=L.

D6. (depends on D4) Since \tilde{w}_e=S(e,e)w_e, the Laplacian of H can be written as

\tilde{L}=B^TWB=B^TW^{1/2}SW^{1/2}B

Note that we want to make x^T\tilde{L}x, the Laplacian quadratic form of  H, very close to  x^TLx, the Laplacian quadratic form of G.

D7. ([SS Lemma 5] Rudelson-Vershynin Lemma) Roughly, this lemma says the following. Suppose we draw q vectors y_i from certain probability distribution, and build corresponding rank-1 matrices y_iy_i^T. Then, the expectation of the 2-norm of the difference between (1) the average value of these rank-1 matrices and (2) the expected value of such a rank-1 matrix is bounded. This bound depends on the number of samples taken and maximum of the 2-norm of the vectors y_i.

D8. (depends on D7, D2, D3) Suppose the probability with with each edge e is sampled is proportional to the corresponding diagonal entry of the projection matrix \Pi. That is,

p_e=\Pi(e,e)=w_eR_e.\ \ \text{using D2,D3}

Then, using the Rudelson-Vershynin lemma (D7), it can be proved that

\lVert \Pi S\Pi - I\rVert_2 <\epsilon with probability \geq \frac{1}{2}.

 (This is proved in the SS paper after stating the Rudelson-Vershynin lemma.) Observe that

y^T\Pi(S-I)\Pi y=y^T\Pi S\Pi y-y^T\Pi\Pi y=y^T\Pi S\Pi y-y^T\Pi y\ \ \text{(from D2)}

and hence S does not distort y^T\Pi y too much.

D9. [SS Lemma 4] (depends on D8) This lemma says that, given D8, that is, if S is a non-negative diagonal matrix satisfying

\lVert \Pi S\Pi - I\rVert_2 <\epsilon,

then

\forall x \in \mathbb{R}^n \ \ \ (1-\epsilon)x^TLx\leq x^T\tilde{L}x\leq (1+\epsilon)x^TLx\ .

Essentially, this proves [SS Theorem 1].

D10. (depends on D9) If the probability distribution used to sparsify is approximated, and the estimate is (roughly speaking) bounded by certain factor \alpha, then this approximation only changes the approximation error \epsilon in D9 to \alpha \epsilon, thus preserving the guarantee of spectral sparsification [SS Theorem 1].

This proves [SS Corollary 6].

Now that we have proved [SS Theorem 1], we  are left to prove [SS Theorem 2].

D11. (depends on D1) Let \chi_u be the characteristic vector of u, that is, zero in all the rows except the u^{\text{th}} row which contains one. Now note that

R_{uv}=b_{uv}L^+b_{uv}^T

=(\chi_u-\chi_v)L^+(\chi_u-\chi_v)^T

=(\chi_u-\chi_v)L^+LL^+(\chi_u-\chi_v)^T

=(\chi_u-\chi_v)L^+(B^TWB)L^+(\chi_u-\chi_v)^T\ \ \text{since } L=B^TWB

=(\chi_u-\chi_v)L^+B^TW^{1/2}W^{1/2}BL^+(\chi_u-\chi_v)^T

=\lVert W^{1/2}BL^+(\chi_u-\chi_v) \rVert^2

Suppose \hat{Z}=W^{1/2}BL^+ be an m\times n matrix. If we know the $\latex \hat{Z}$, then $R_{uv}$ can be directly calculated from the difference of two columns of \hat{Z}. Therefore, now we focus on computing \hat{Z}.

Actually, we will compute \hat{Z}^T, a n\times m matrix.

D12. This a special version (due to Achlioptas) of the famous Johnson-Lindenstrauss theorem. Roughly speaking, this tells that members of any fixed set of n dimensional vectors can be projected in O(\log{n}/\epsilon^2) dimension through a k\times d random matrix Q where k=O(\log{n}/\epsilon^2), such that their distances are closely preserved. Entries of Q are Bernoulli random values taken from \{\pm 1/\sqrt{k}\}.

D13. This is the Spielman-Teng linear system solver (STsolver), which can be used to generate approximate solutions to system of linear equations involving Laplacians. This algorithm runs in expected time O(m\log{1/\delta})=\tilde{O}(m), where

  1. m is the number of non-zero entries in L, and
  2. \delta is a predetermined approximation error which is introduced to the generated solution x. This error parameter \delta can be set as an input to the solver.

D14. (uses D12) We use D12 to project the n dimensional columns of the n\times m matrix \hat{Z}^T into k=O(\log{n}/\epsilon^2) dimensions. Specifically, we want to compute the matrix

Z=Q\hat{Z}=QW^{1/2}BL^+,

where Q is a k\times m matrix. First, we compute the k\times n matrix Y=QW^{1/2}B. This takes O(m(\log{n})/\epsilon^2) time because B has 2m non-zero entries.

D15. (depends on D14, uses D13) Note that Y has k=O(\log{n}/\epsilon^2) rows and n columns. Let y^T be any row of Y so that length of y is n. Now we use D13 to approximately solve the system of equations Lx=y, which gives us approximate solution x=L^+y to a single row of the k\times n) matrix Z=YL^+. Each call to STsolver takes time \tilde{O}(m) since L has only 2m non-zero entries. If we call STsolver k times, once for each row of Y, we will be able to approximate the matrix Z with a matrix \tilde{Z} in total time

k\times \tilde{O}(m \log{1/\delta}))

=O(\log{n}/\epsilon^2)\times \tilde{O}(m\log{1/\delta})

\tilde{O}(m\log{(1/\delta)}/\epsilon^2),

where we absorb the \log{n} term under the \tilde{O} notation.

D16. (depends on D11, D15) Now we pick an appropriate value of \delta, the error introduced by the Spielman-Teng solver. [SS Lemma 9], with the help of [SS Proposition 10], proves that if we pick a specific value of \delta, the matrix Z obtained in D15 can be used to compute the effective resistances as outlined in D11. It turns out that the appropriate value of \delta involves the minimum-to-maximum weight ratio

r=\frac{w_{min}}{w_{max}},

which makes the final running time of approximating the effective resistances to
\tilde{O}(m\log{r}/\epsilon^2).

Therefore, we have proved [SS Theorem 1].

We Are Done

To summarize,

  1. First we showed that we can obtain a spectral sparsifer if we sample from a certain probability distribution (see D4).
  2. Then we showed that these probabilities are proportional to the diagonal entries of the projection matrix \Pi (see D8).
  3. We also showed that the diagonal entries of the projection matrix \Pi actually encode the effective resistances of the corresponding edges scaled by corresponding weights (see D3).
  4. Next we showed that our reweighting scheme does not distort the quadratic form of \Pi, and hence produces the spectral guarantee. This is the proof of the [SS Theorem 1] (see D9).
  5. We also showed that approximating the effective resistances (as opposed to exactly computing them) does not hurt the sparsifier too much [SS Corollary 6] (see D10).
  6. So now we ask:
    • (a) How do we approximate the effective resistances?
    • (b) How fast?
  7. We show that the effective resistance of any given edge is equal to the norm of the difference of two columns of a certain matrix \hat{Z} (see D11).
  8. We used a special version of the Johnson-Lindenstrauss lemma (due to Achlioptas) to project the columns of this matrix into lower dimensions. This projection is guaranteed to preserve pairwise distances (that is, norms of the differences between the columns). This lower-dimensional projection is the matrix Z (see D14).
  9. Then we approximated this matrix using Spielman-Teng linear system solver for Laplacian systems (see D15).
  10. Lastly we show that if we set the value the approximation parameter \delta of the previous step equal to some specified value (determined by [SS Lemma 9]), we can approximate the effective resistances in reasonable time (see D16).

That’s all for today, folks. Please let me know if you spot any gap/mistake in the explanations.

Advertisements

2 thoughts on “Spectral Sparsification by Spielman and Srivastava: How They Connected the Dots

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s