**Statement**

Let $A$ be an $n \times n$ complex Hermitian matrix, which means $A = A^*$, where $*$ denotes the conjugate transpose operation. Let $\lambda \neq \mu$ be two different eigenvalues of $A$. Let $x, y$ be the two eigenvectors of $A$ corresponding to the two eigenvalues $\lambda$ and $\mu$, respectively.

Then the following is true:

$\langle x, y \rangle = 0.$

Here $\langle \cdot, \cdot \rangle$ denotes the usual inner product of two vectors, i.e.,

$\langle x, y \rangle = \sum_i x_i \overline{y_i}$.

**Proof**

It is given that

$Ax = \lambda x$,

$Ay = \mu y$.

Since $A = A^*$, it follows that

$\langle Ax, y \rangle = \langle x, A^* y \rangle = \langle x, Ay \rangle$.

However, we have

$\langle Ax, y \rangle = \lambda \langle x, y \rangle$ and $\langle x, Ay \rangle = \bar{\mu} \langle x, y \rangle = \mu \langle x, y \rangle$, using the fact that the eigenvalues of a Hermitian matrix are real.

Therefore,

$\lambda \langle x, y \rangle = \mu \langle x, y \rangle$.

Therefore, $(\lambda - \mu) \langle x, y \rangle = 0$,

and since $\lambda \neq \mu$, it must be that $\langle x, y \rangle = 0$.

Thus the eigenvectors corresponding to different eigenvalues of a Hermitian matrix are orthogonal. Additionally, the eigenvalues corresponding to a pair of non-orthogonal eigenvectors are equal.

Filed under: Elementary, Expository, Mathematics, Matrix Analysis, Spectral Graph Theory Tagged: eigenvalue, eigenvector, hermitian matrix, matrix

In this post, I am going to explain how I understand the proof of the hardcore lemma presented in the Arora-Barak complexity book (here). Because the formal proof can be found in the book, I intend to write in an informal way. I think some subtleties are involved in turning the context of the lemma into a suitable two-player zero-sum game. Doing so enables one to use von Neumann’s minimax theorem to effectively “exchange the quantifiers” in the contrapositive statement of the lemma. Although the Arora-Barak proof mentions these subtleties, I am going to explore them in more detail and in a way more accessible to a beginner like me.

*Please let me know about any mistakes in this post. I make a lot of mistakes.*

This post is organized as follows. First, we informally present the idea of hard-on-average functions. Then we present John von Neumann’s minimax theorem for two-player zero-sum games and mention some of its implications. Next, we mention the hardcore lemma and proceed to prove it by contradiction. The contradictory assumption would lead us to construct a two-player zero-sum game between a circuit player and an input player, both playing with pure (deterministic) strategies. Then we allow the players to use mixed strategies (which are distributions on the pure strategies). This enables us to use von Neumann’s minimax theorem to show that there exists a certain distribution on small circuits. Circuits drawn from this distribution perform well on average on any suitable input distribution. By combining these circuits, we can construct a slightly larger circuit that would exactly compute the supposedly average-case hard function on a large fraction of inputs, contradicting the hardness assumption on the said function.

Consider a function $f$ that is $\delta$-hard on average: that is, every small circuit intended to compute $f$ makes an error on at least a $\delta$ fraction of inputs. Why small circuits? Because if we are allowed to have arbitrarily large circuits, we can compute $f$ exactly (for instance, by hardwiring a lookup table).

The hardness quantified in the above manner is called “average-case hardness”.

In contrast, if every small circuit fails to compute $f$ on at least one input, $f$ is called “worst-case hard”. Keep in mind that both average-case and worst-case hardness are parameterized by an upper bound on the circuit size, which determines the largest size of the circuit being considered.

Why is $f$ hard? Or rather, given a fixed small circuit, why does it succeed in computing $f$ on some inputs but fail on the other inputs? How is the “hardness” of $f$ distributed over its inputs? When we are restricted to small circuits to compute $f$, are all inputs equally hard? Or are some inputs harder than others? An answer to these questions is given by Impagliazzo’s hardcore lemma.

Suppose two players $A$ and $B$ are playing a game. $A$ can pick a move $a$ whereas $B$ can pick a move $b$. They pick their moves without knowing the opponent’s choice. The “value” of the game, after $A$ and $B$ have made their respective choices $a$ and $b$, is denoted by $v(a, b)$. $A$ always prefers a high value, so he is the “max player”. On the other hand, $B$ always prefers a low value, and hence he is the “min player”. If $A$ wins a certain value, it means $B$ loses the same value. This can be modeled as $A$ trying to maximize $v(a, b)$ whereas $B$ tries to minimize $v(a, b)$. If $A$ has chosen move $a$, the minimum gain he can expect is $\min_b v(a, b)$. The worst-case gain for $A$ for the entire game is then $v_A = \max_a \min_b v(a, b)$. Similarly, the worst-case loss for $B$ is $v_B = \min_b \max_a v(a, b)$. The game is said to be in equilibrium if $v_A = v_B$, which means no player would have any incentive to change his move as long as the other player does not change his move. While it is not obvious whether an equilibrium in pure strategies should exist, von Neumann’s 1928 theorem says that an equilibrium always exists if the two players are allowed to select their moves from **any** distribution on the candidate moves instead of always selecting a particular move. That is, suppose $\mathcal{D}_A$ is a distribution on $A$’s moves and $\mathcal{D}_B$ is a distribution on $B$’s moves. Let $E[v]$ denote the expected value of $v$ when $a \sim \mathcal{D}_A$ and $b \sim \mathcal{D}_B$. Then the minimax theorem says:

$\max_{\mathcal{D}_A} \min_{\mathcal{D}_B} E[v] = \min_{\mathcal{D}_B} \max_{\mathcal{D}_A} E[v].$

Some points to note:

- The minimax theorem says that both players have an optimal mixed strategy: a distribution on moves that best limits the other player’s amount of success.
- It also says that the equilibrium value of the game for both players is the same regardless of who plays first.
- The equilibrium is attained at specific (optimal) distributions; what matters for existence is that each player is allowed to mix, i.e., to put positive probability on multiple moves.
- The minimax theorem is equivalent to the strong duality of linear programming. In fact, both sides of the above equation can be formulated as linear programs, each the dual of the other. The proof of equality then directly follows from the fact that the optimum values attained by the primal and the dual programs are equal.
- The domain of the distributions $\mathcal{D}_A$ and $\mathcal{D}_B$ has to be finite. In other words, the players must choose from a finite number of moves. This is because the existence of a Nash equilibrium is proved only for finite games, which means both the number of players and the number of moves available to them have to be finite.

Notice that the minimax principle is a worst-case loss-aversion policy. Suppose the first player is the min player. From his perspective, the value of the game would be

$\min_{\mathcal{D}_B} \max_{\mathcal{D}_A} E[v].$

This is a minimization problem for the first (min) player over all possible choices of the second (max) player. However, the inner maximization problem is bounded and feasible because of the finite number of choices and the bounded payoff function. Hence the payoff function achieves its optimum value at some vertex of the convex feasibility polytope, whose vertices are the candidate pure strategies of the second player. Thus we can safely assume that the second player would ultimately choose a pure strategy as opposed to selecting a convex combination of multiple pure strategies. Hence

$\min_{\mathcal{D}_B} \max_{\mathcal{D}_A} E[v] = \min_{\mathcal{D}_B} \max_{a} E[v].$

Similarly, the equilibrium value of the second (max) player would be

$\max_{\mathcal{D}_A} \min_{\mathcal{D}_B} E[v] = \max_{\mathcal{D}_A} \min_{b} E[v].$

**Fact.** Even if the first player knows the distribution used by the second player, he does not gain any extra information or advantage from it.

Informally, the hardcore lemma says that given an average-case hard function $f$, there is always a “large” set of inputs on which no small circuit can do much better than random guessing. In other words, inputs from this set are hard for all small circuits that intend to compute $f$. This is why this set is called the *hard core* (or *hardcore*) of $f$.

If a probability distribution assigns probability at most $\frac{1}{\delta N}$ to every object in its domain of size $N$, it is called a **$\delta$-density distribution**. If a probability distribution is uniform on exactly a $\delta$ fraction of the objects in its domain (and assigns zero probability elsewhere), it is called a **$\delta$-flat distribution**, or just a $\delta$-flat in short. Note that for a fixed $\delta$ and a finite domain, there can be only a finite number of $\delta$-flats.

Suppose we are dealing with functions that take an $n$-bit input to a single-bit output. Suppose there is a size bound $S$ such that we refer to circuits of size at most $S$ as **small circuits**. Additionally, we refer to circuits of size at most $S' = S/\mathrm{poly}(n)$ as **smaller circuits**. The idea is that if we combine a polynomial number of smaller circuits, we get a small circuit.

Recall that a function $f$ **is $\delta$-hard on average** if every small circuit intended to compute $f$ makes an error on at least a $\delta$ fraction of inputs $x \in \{0,1\}^n$.

If a circuit $C$ is able to compute $f$ on inputs drawn from a distribution $H$ with probability at least $1/2 + \epsilon$ for a fixed $\epsilon$, we say $C$ **performs well** on $H$. Otherwise, we say $C$ **performs poorly** on $H$. For the following discussion, let us fix an arbitrary positive real $\epsilon < 1/2$.

Consider the two statements below.

**Statement $P$.** $f$ is $\delta$-hard for small circuits, which means every small circuit fails to compute $f$ on at least a $\delta$ fraction of inputs.

**Statement $Q$.** There exists a $\delta$-density distribution $H$ on $n$-bit strings such that all **smaller circuits** *perform poorly* on $H$.

The hardcore lemma states that $P \Rightarrow Q$.

See the definitions above for the phrases “performs poorly” and “small / smaller circuits”.

We are going to prove $P \Rightarrow Q$ by contradiction. That is, we are going to assume that $P$ is true but $Q$ is false, and then proceed to show that $P$ must be false, a contradiction.

Consider the statement $\neg Q$, which is the negation of the statement $Q$, as follows.

**Statement $\neg Q$.** For every $\delta$-density distribution $H$ there exists some smaller circuit $C$ which performs well on $H$.

Keep in mind that for the sake of contradiction, we are assuming that Statement $\neg Q$ is true.

Statements $P$ and $\neg Q$ smell a lot like a game between an input player and a circuit player. The input player moves first and selects an input distribution $H$. The circuit player moves second and selects a circuit $C$. The input player then pays the circuit player the amount

$\Pr_{x \sim H}[C(x) = f(x)]$

(recall the fixed positive $\epsilon < 1/2$ in the definition of performing well). Whatever the circuit player wins, the input player loses. Therefore, we can think of it as a two-player zero-sum game.

Consider selecting a single $\delta$-density distribution $H$ for the input player. This is a pure strategy for him, but there exist uncountably many $\delta$-density distributions on the set of inputs $\{0,1\}^n$. This seemingly implies that the number of choices is not finite. However, there is an important fact relating every $\delta$-density distribution to the set of all possible $\delta$-flat distributions.

**Fact 1.** Every $\delta$-density distribution is a convex combination of $\delta$-flat distributions.

See the section “Omitted Proofs” for a proof.

Thus selecting a particular $H$ is equivalent to taking a convex combination of a finite number of $\delta$-flats, which is a finite operation.

Now the game is finite since both the number of smaller circuits and the number of $\delta$-flat distributions are finite. The strategies (moves) are pure because both players select their moves deterministically.

Statement $\neg Q$ implies that for every $H$ produced by the input player, there exists a circuit $C$ that performs well on $H$. The payoff for this particular circuit is

$\Pr_{x \sim H}[C(x) = f(x)] \geq 1/2 + \epsilon$.

Since the circuit player is maximizing across all suitable circuits, he can guarantee himself a payoff of at least $1/2 + \epsilon$.

What happens if we allow the input player to have randomized (so-called mixed) strategies? Suppose we allow the input player to select an arbitrary distribution $\mathcal{H}$ over all possible $\delta$-density distributions. After that, he can draw a specific $H \sim \mathcal{H}$. The circuit player still holds the advantage since, thanks to the statement $\neg Q$, he has *some* well-performing circuit for every $H$ that the input player chooses.

The value attained by the circuit player at this stage is

$\min_{\mathcal{H}} \max_{C} \mathbb{E}_{H \sim \mathcal{H}}\!\left[\Pr_{x \sim H}[C(x) = f(x)]\right] \geq 1/2 + \epsilon$.

However, there is a problem: a mixed strategy is a distribution on all possible $\delta$-density distributions, and the number of distinct $\delta$-density distributions is uncountably infinite. How does the input player choose his mixed strategy from an infinite set of pure strategies? Luckily, the following fact saves the day.

**Fact 2.** Any distribution over $\delta$-density distributions induces a $\delta$-density distribution on the underlying domain.

See the section “Omitted Proofs” for a proof.

Thus selecting a mixed strategy over $\delta$-density distributions is the same as selecting a single $\delta$-density distribution which, according to Fact 1, is equivalent to taking a convex combination of a finite number of $\delta$-flat distributions.

Next, suppose we allow the circuit player to select an arbitrary distribution $\mathcal{C}$ on smaller circuits. Then he can draw a circuit $C \sim \mathcal{C}$ to play. Now both players are using mixed strategies and hence von Neumann’s minimax theorem applies. It tells us that an equilibrium in mixed strategies exists for both players, and moreover, the value of the game is the same no matter which player plays first.

Recall the fact that knowing the distribution used by the second player yields no advantage to the first player when both players use mixed strategies in a two-player zero-sum game. Since the circuit player is playing second and he is picking the circuit from a certain distribution $\mathcal{C}$, the value of the game would be the same as the value of the game where he used only pure strategies, i.e., picking single circuits instead of drawing $C \sim \mathcal{C}$. In other words,

$\min_{H} \max_{\mathcal{C}} \mathbb{E}_{C \sim \mathcal{C}}\!\left[\Pr_{x \sim H}[C(x) = f(x)]\right] = \min_{H} \max_{C} \Pr_{x \sim H}[C(x) = f(x)]$.

This gives us the following fact.

**Fact 3.** Assuming statement $\neg Q$ is true, for every distribution put forth by the input player, the circuit player always has some circuit that performs well on it. In other words, the circuit player always wins a payoff of at least $1/2 + \epsilon$ playing second.

We can also see it the following way. When the circuit player picks a distribution instead of a single circuit, he has more power and since we are maximizing, his payoff cannot decrease from what it was when he was restricted to only pure strategies.

By von Neumann’s minimax theorem, when both players use the mixed strategies that lead to the equilibrium, the value obtained by the circuit player playing first would be the same as the value he would have obtained playing second. But statement $\neg Q$ implies that he would always win at least $1/2 + \epsilon$ playing second. Therefore, he would also get the same payoff playing first. *Isn’t it magic?*

**Fact 4.** Assuming statement $\neg Q$ is true, there exists a distribution $\mathcal{C}$ on smaller circuits for the circuit player such that for every mixed strategy used by the input player (equivalently, by Facts 1 and 2, for every $\delta$-density distribution $H$), circuits drawn from $\mathcal{C}$ perform well on $H$ in expectation.

Suppose the circuit player has a distribution $\mathcal{C}$ according to Fact 4 above. We can *independently* sample $k$ smaller circuits $C_1, \ldots, C_k$ from $\mathcal{C}$ and combine them (by taking a majority vote) into a slightly larger circuit $C^*$.

If a circuit drawn from $\mathcal{C}$ computes $f(x)$ with probability at least $1/2 + \epsilon/2$, we say that the input $x$ is **good** for the distribution $\mathcal{C}$. Otherwise, we say that the input $x$ is **bad** for the distribution $\mathcal{C}$. Note that being good or bad is a property of an input and the circuit distribution; it does not depend on any specific input distribution.

**Fact 5.** Assuming $\neg Q$ is true, the total number of bad inputs must be strictly less than $\delta 2^n$.

*Proof.* Because otherwise, for every circuit distribution $\mathcal{C}$, the input player could construct a $\delta$-flat distribution $H$ which is uniform on $\delta 2^n$ of the bad inputs for $\mathcal{C}$. Then no circuit drawn from $\mathcal{C}$ would perform well on $H$. This would mean that the value attained by the circuit player would be strictly less than $1/2 + \epsilon$, contradicting the value promised by the minimax theorem under the assumption $\neg Q$.

Let $x$ be a good input. Then with probability at least $1/2 + \epsilon/2$, each sampled circuit $C_i$ will succeed in computing $f(x)$. Then by a Chernoff bound, taking $k = O(n/\epsilon^2)$ samples, the probability that $C^*(x) \neq f(x)$ on a good $x$ is strictly less than $2^{-n}$. Because there can be at most $2^n$ good inputs $x$, by the union bound, the probability that $C^*(x) \neq f(x)$ for some good $x$ is strictly less than $1$. This means with positive probability there exists a circuit $C^*$ which correctly computes $f$ on all good inputs. However, because the number of bad inputs is strictly less than $\delta 2^n$, the fraction of inputs on which the small circuit $C^*$ succeeds in correctly computing $f$ is strictly larger than $1 - \delta$. But this violates the given statement $P$ that $f$ is $\delta$-hard on average.

Thus we have reached a contradiction: the statement $\neg Q$ cannot be true, and hence the claim of the hardcore lemma follows.

- Arora-Barak proof of the hardcore lemma from the book “Computational Complexity – A Modern Approach (2009)”
- Mikio Nakayama’s excellent proof of von Neumann’s minimax theorem from strong LP Duality
- Tinne Hof Kjeldsen’s exposition on von Neumann’s proofs of the minimax theorem
- Jiri Matousek and Bernd Gartner’s proof of the strong LP duality from Farkas’ lemma
- Avner Magen’s lecture notes on how the strong LP duality implies von Neumann’s minimax theorem
- Jacob Fox’s lecture notes on the proof of Brouwer’s fixed point theorem from Sperner’s lemma

**Fact 1.** *Every $\delta$-density distribution is a convex combination of $\delta$-flat distributions.*

*Proof.* We will only outline the proof, which is given as a hint on an exercise in the Arora-Barak book. Consider an embedding of the distributions on $\{0,1\}^n$ as points in an $N$-dimensional space, where $N = 2^n$. Every $\delta$-density distribution must be contained inside the hypercube of side $\frac{1}{\delta N}$ located in the all-positive orthant. Consider the simplex $\Delta$ formed by all the $\delta$-flat distributions; this simplex is a proper subset of the said hypercube. All points inside the simplex are convex combinations of its vertices, and thus these points are distributions themselves.

Our claim states that every $\delta$-density distribution must occur inside the convex hull of the simplex $\Delta$.

Suppose not. Then there exists a $\delta$-density distribution $P$ occurring outside the simplex $\Delta$. By the separating hyperplane theorem (which follows from Farkas’s lemma), every point must either occur inside a convex body, or there exists a hyperplane separating it from the convex body. Let $w$ be the normal vector to that separating hyperplane. Then there must exist a scalar $c$ such that $\langle w, q \rangle \leq c$ for all $q \in \Delta$, and $\langle w, P \rangle > c$.

Consider the inner product $\langle w, P \rangle = \sum_i w_i P_i$. Note that although $P_i \geq 0$ for all $i$, the $w_i$s could attain any real value. Now let us reorder the terms in the above sum in decreasing order of $w_i$ and scan the terms from large values to small values. If we find a coordinate $i$ appearing before a coordinate $j$ such that $P_i < \frac{1}{\delta N}$ and $P_j > 0$, we “move” some weight $t$ from $P_j$ to $P_i$ so that the weight in $P_i$ becomes $\frac{1}{\delta N}$ (or $P_j$ becomes $0$). Specifically, we do the following: $P_i \leftarrow P_i + t$ and $P_j \leftarrow P_j - t$. This is equivalent to moving along the surface of the norm-1 ball, so the moved point still remains a distribution. However, its inner product with $w$ cannot decrease, because of our descending ordering, meaning the point still remains separated from the simplex $\Delta$. By doing this repeatedly, $P$ will become a $\delta$-flat distribution: every non-zero coordinate will have weight $\frac{1}{\delta N}$, and because we always stayed on the norm-1 ball, the sum of all weights is still $1$. Hence we have found a $\delta$-flat distribution outside the simplex $\Delta$, contradicting the fact that $\Delta$ was constructed from all the $\delta$-flat distributions to begin with.

**Fact 2.** *Any distribution on all -density distributions is itself a -density distribution.*

*Proof.* Let $H_1, H_2, \ldots$ be $\delta$-density distributions on the strings $\{0,1\}^n$. Consequently, we have $H_i(x) \leq \frac{1}{\delta 2^n}$ for all $i$ and all $x$. Let us consider the following strategy for sampling a single string $x$. First, we fix an arbitrary distribution $\mathcal{D}$ on the $H_i$s. Then we select a distribution $H_i \sim \mathcal{D}$. Finally, we select $x \sim H_i$. The probability that a particular $x$ will be drawn from this process is $\sum_i \mathcal{D}(H_i) H_i(x) \leq \frac{1}{\delta 2^n} \sum_i \mathcal{D}(H_i) = \frac{1}{\delta 2^n}$. Thus sampling in this way gives rise to a $\delta$-density distribution. Because $\mathcal{D}$ was an arbitrary distribution, our claim follows.

Filed under: Computer Science, Expository, Game Theory, Mathematics, Theoretical Computer Science, Uncategorized Tagged: Arora-Barak, Game Theory, Hardcore Lemma, Hardness Amplification, Impagliazzo, Minimax Theorem, von Neumann

— “Hey, Saad!” exclaimed a low voice. I looked to my right. He just came out of that blue door of the weight room.

— “You used to be my TA!” he said. It always feels good to meet your past students, especially ones that still remember you. I looked hard. Remembering my students’ names used to be my forte, but probably not anymore.

— “Hey! I completely forgot your name. Which course was it?”

— “You were my TA in that intro programming course, about 5 years ago. I am Jason.” I still could not remember him. My memory sucks.

— “Jason! It’s great to meet you! What are you up to now? Grad school?”

— “No, I am in my junior year.” Which means he has about a year more in the college, I figured. A little surprising, I thought, because most undergrads finish college in 3-4 years. He seemed to be in a hurry. I looked down and didn’t find any words. Any appropriate words, to be precise.

— “See ya later!” We broke off.

*** *** ***

Yesterday was a Sunday night. Not too many people in the gym. I was pushing myself hard on the elliptical cross trainer with my orange headphones on. My sweat was dripping from the forehead to the gray floor, making small black patches. It felt good. At one point I looked up and there was Jason, walking in front of me. He smiled. I smiled back and waved my hand. Now I remembered.

Jason got onto the treadmill in front of me, slightly to my left to be exact. I was watching him, bottom to top. He had a pair of orange Nike sneakers on, then black legs, white shorts and a white tee, then a head full of shiny black hair. He started to walk slowly on the machine before picking up a little pace. I was watching. He had impressive shoulders and upper back as well as biceps and triceps. As time wore on I noticed that he was not really running, but walking with steady gaits. His palms kept clutching the handles in front of him. I was watching. A small red light kept blinking on the black background of the back of his left leg.

He was done after about half an hour. Getting off the treadmill he walked off, once again smiling and waving at me. I kept watching.

*** *** ***

When you are running for an hour by yourself all kinds of thoughts come across your mind. Things you got to do. Things you should have done. Things you almost grasped for yourself. Things you can never have. Things you will never have. Things your peers have but you don’t. Things you always have planned to do. Things. A lot of things.

For me there is always an excuse for everything I missed.

Jason doesn’t have time for any excuses, though. He is full of life. He has both his legs amputated right from the middle of his thighs. He has his black, artificial legs, complete with knee joints and a pair of Nike sneakers. He is back after two years. In the change room I saw that he has a five-inch-long, deep, brown, thick surgery mark right in the middle of his chest. He lifts weights. He runs. He is full of life. A small red light blinks from the back of his left leg, just where his calves should have been. He is full of life. He will finish college in about a year. He doesn’t put up excuses. He is full of life and I know that he will be whatever he wants to be. He will capture every single one of his dreams. He doesn’t put up excuses.

What is your excuse today?

Filed under: Life, Uncategorized

Here goes the code. It’s simple.

```cpp
int main() {
    int arr[] = {1, 2, 3, 4};        // data
    const int *p1 = &arr[0];         // non-const ptr to const data
    int const *p2 = &arr[0];         // non-const ptr to const data
    int *const p3 = &arr[0];         // const ptr to non-const data
    const int *const p4 = &arr[0];   // const ptr to const data

    p1++;      // ok
    p1[0]++;   // compile error: modifying const data
    p2++;      // ok
    p2[0]++;   // compile error: modifying const data
    p3++;      // compile error: modifying const ptr
    p3[0]++;   // ok
    p4++;      // compile error: modifying const ptr
    p4[0]++;   // compile error: modifying const data
}
```

Now, let’s see what the g++ compiler (GCC 4.8.1) has to say about this code. You can try it yourself at http://ideone.com/AJUBcD.

```
prog.cpp: In function ‘int main()’:
prog.cpp:11:7: error: increment of read-only location ‘* p1’
  p1[0]++;   // compile error: modifying const data
       ^
prog.cpp:14:7: error: increment of read-only location ‘* p2’
  p2[0]++;   // compile error: modifying const data
       ^
prog.cpp:16:4: error: increment of read-only variable ‘p3’
  p3++;      // compile error: modifying const ptr
    ^
prog.cpp:19:4: error: increment of read-only variable ‘p4’
  p4++;      // compile error: modifying const ptr
    ^
prog.cpp:20:7: error: increment of read-only location ‘*(const int*)p4’
  p4[0]++;   // compile error: modifying const data
       ^
```

Filed under: Code, Computer Science, Uncategorized

First, recall the power series $e^x = \sum_{i \geq 0} \frac{x^i}{i!}$. For $x = k > 0$, this sum is definitely larger than its $k$-th term alone. In other words, $e^k > \frac{k^k}{k!}$, which implies $k! > \left(\frac{k}{e}\right)^k$. Now,

$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} \leq \frac{n^k}{k!} < \frac{n^k}{(k/e)^k} = \left(\frac{en}{k}\right)^k.$

This proof is extracted from the StackExchange discussion here. For more inequalities/bounds for binomial coefficients, see Wikipedia.

Filed under: Combinatorics, Computer Science, Elementary, Expository, Mathematics

Recall that the Laplacian matrix is a symmetric, positive semidefinite matrix. For the complete graph $K_n$ on $n$ vertices, its Laplacian matrix is as follows:

$L = nI - J,$

where $I$ is the identity matrix and $J$ is the all-ones matrix; that is, $L$ has $n-1$ on the diagonal and $-1$ everywhere else.

The following proof is taken from the lecture notes of Prof. Jonathan Kelner in MIT opencourseware.

**Proof that $0$ is an eigenvalue with multiplicity $1$**

Since the sum of entries along any row/column of $L$ is $0$, we have $L\mathbf{1} = \mathbf{0}$, and hence $0$ must be an eigenvalue of any Laplacian matrix. The corresponding eigenvector is any constant vector, that is, a vector whose entries are all equal. Thus we have

$L\mathbf{1} = 0 \cdot \mathbf{1}$ and $\lambda_1 = 0$.

It can be easily verified that the rank of $L$ is $n-1$, because we cannot obtain the zero vector by summing fewer than $n$ row/column vectors. Using the rank-nullity theorem, this implies that the dimension of the nullspace of $L$ is $1$, which means the vector $\mathbf{1}$ spans the nullspace of $L$ and the nullspace contains no vector that is not parallel to $\mathbf{1}$. Hence, $0$ can appear only once as an eigenvalue of $L$.

**Proof that $n$ is an eigenvalue with multiplicity $n-1$**

Suppose $\lambda$ is a non-zero eigenvalue of the Laplacian matrix $L$ of the undirected, unweighted complete graph with $n$ vertices, $K_n$. Let $v$ be the associated eigenvector. Since $L$ is symmetric and $\lambda \neq 0$, the eigenvector $v$ must be perpendicular to the constant vector, which means the dot product of

$v = (v_1, v_2, \ldots, v_n)$

and

$\mathbf{1} = (1, 1, \ldots, 1)$

must be $0$. This, in turn, implies that

$\sum_{i=1}^{n} v_i = 0. \qquad (1)$

That is, the sum of the entries of $v$ will be zero.

Now we want to find out the vector $Lv$ (which is equal to $\lambda v$ because $\lambda$ is a non-zero eigenvalue of $L$ with eigenvector $v$).

Consider the $i$-th entry of $Lv$, which is obtained by multiplying the $i$-th row of $L$ with $v$. Since the graph is unweighted and complete, this row vector will have $n-1$ at the $i$-th position and $-1$ everywhere else. Therefore,

$(Lv)_i = (n-1) v_i - \sum_{j \neq i} v_j.$

Further simplification shows that

$(Lv)_i = n v_i - \sum_{j=1}^{n} v_j.$

According to Equation $(1)$, the sum on the right-hand side is $0$. Hence, we see that

$(Lv)_i = n v_i$ for every $i$,

and therefore

$\lambda = n$.

Thus we showed that every non-zero eigenvalue of $L$ must be $n$. Since the eigenvalue $0$ has multiplicity $1$, the eigenvalue $n$ must have multiplicity $n-1$.

Filed under: Computer Science, Expository, Graph Theory, Mathematics, Matrix Analysis, Spectral Graph Theory, Uncategorized Tagged: eigenvalue, eigenvector, laplacian

Suppose $L$ is the Laplacian of a connected simple graph $G$ with $n$ vertices. Then the number of spanning trees in $G$, denoted by $\tau(G)$, is the following:

$\tau(G) = \frac{1}{n} \lambda_2 \lambda_3 \cdots \lambda_n,$

the product of all non-zero eigenvalues of $L$, divided by $n$.

Before we proceed to the proof, let us give some definitions. We assume that some elementary concepts of linear algebra (e.g., rank, determinant, eigenvalue/eigenvector, minor/cofactor, etc.) are known to the reader. However, we use some known “facts” that we do not prove in this article.

The proof has two main parts:

**Part 1:** We prove that the number of spanning trees in a connected simple graph is equal to *any* cofactor of the Laplacian matrix of that graph. This is the heart of the theorem/proof.

**Part 2:** Next we use standard linear algebra results to relate the cofactors of any matrix to its eigenvalues. Since the matrix we use is the graph Laplacian, this readily relates the eigenvalues (of the Laplacian) with the number of spanning trees (using the result from the previous part). This part can be found in any standard text in matrix analysis; for example, see Theorem 1.2.12, and Definitions 1.2.5 and 1.2.9 in the Horn and Johnson book [7].

Now we mention some preliminaries.

*(For concepts that are considered basics in linear algebra, we provide hyperlinks to detailed exposition.)*

Let $G = (V, E)$ be a graph on $n$ vertices and $m$ edges. $A$ is the adjacency matrix of $G$ with zeros on the main diagonal, the $(i, j)$ entry being $1$ whenever the edge $(i, j)$ exists and $0$ when the edge does not exist.

Let $D$ be the diagonal matrix whose $(i, i)$ entry is the degree of the vertex $i$. The *combinatorial Laplacian*, or simply *Laplacian*, is the matrix $L = D - A$. Note that the sum of all entries in any row of $L$ is $0$ (why?); that is why the constant vector $\mathbf{1}$ is an eigenvector of $L$ with eigenvalue $0$. Since all eigenvalues of $L$ are non-negative, $0$ must be the smallest eigenvalue of $L$.

Let $B$ be the $n \times m$ incidence matrix where columns correspond to edges and rows correspond to vertices, and any column $e$, corresponding to the edge $(u, v)$, contains zeros at all rows except at two positions: $+1$ at row $u$ and $-1$ at row $v$ (after fixing an arbitrary orientation of each edge). Note that $L$ can be written as the “square” of the incidence matrix; that is, $L = BB^T$.

Let $\mathrm{cof}_i(M)$ be the cofactor of the $i$-th diagonal element of a square matrix $M$. Let $M_i$ be the principal submatrix of $M$ obtained by deleting its $i$-th row and $i$-th column; $\det(M_i)$ is the corresponding principal minor.

Now let us mention some facts of which we will not present proofs.

**Fact F1.** Let $B_i$ be the submatrix obtained from the incidence matrix $B$ by removing its $i$-th row. Then, the cofactor of the $i$-th diagonal element of the Laplacian is the following:

$\mathrm{cof}_i(L) = \det(B_i B_i^T).$

**Proof:** See the “Omitted Proofs” section.

**Fact F2.** The Cauchy-Binet formula for determinants, which computes the determinant of the product of two (rectangular) matrices, say $\det(XY)$, from the determinants of the square submatrices of $X$ and $Y$.

**Fact F3.** (Unused)

**Fact F4.** The product of all non-zero eigenvalues of the combinatorial Laplacian $L$ is equal to the sum of all principal minors of order $n-1$ of $L$. That is,

$\lambda_2 \lambda_3 \cdots \lambda_n = \sum_{i=1}^{n} \det(L_i),$

where $L_i$ is the principal submatrix of $L$ obtained by deleting its $i$-th row and column.

**Proof:** See the “Omitted Proofs” section below.

**F5.** Diagonal cofactors of an $n \times n$ square matrix are equal to the corresponding principal minors of order $n-1$. That is,

$\mathrm{cof}_i(M) = \det(M_i).$

**Proof** *(of F5):* Immediate, since the cofactor of the $(i, i)$ entry is $(-1)^{i+i} \det(M_i) = \det(M_i)$.

Now we will establish some relationship between the incidence matrix and the connectedness of .

Note that each column of the incidence matrix $B$ has exactly one $+1$ and one $-1$. Let us define the **row sum** of a column of any matrix as the sum of all entries in that column. Similarly, let the row sum of a matrix be the sum of all its entries. Now note that each column of the incidence matrix contains exactly two non-zero entries, $+1$ and $-1$, and thus the row sum at each column is zero. Now, this implies that the sum of all row vectors of $B$ is the zero vector, and hence any row can be expressed as a linear combination of all the other rows. Therefore, the rows of $B$ are linearly dependent.

**Proposition P1.** *If $G$ is connected and $B$ is its incidence matrix, then $\mathrm{rank}(B) \leq n-1$.*

Now consider a subgraph $H$ of $G$ on $k < n$ vertices, and let $B_H$ be the $k$ rows of $B$ corresponding to the vertices of $H$. Suppose $\mathrm{rank}(B_H) = k$. This means all rows of $B_H$ are linearly independent, and thus there is at least one column in $B_H$ for which the row-sum is not zero. This implies that there is at least one edge which connects a vertex in $H$ to a vertex outside $H$. In other words, $H$ cannot be a connected component of $G$. Therefore, we have the following claim:

**Proposition P2.** *If the rank of the rows of the incidence matrix corresponding to a $k$-vertex subgraph $H$ of $G$ is $k$, then $H$ is not a separate connected component, i.e., it is connected to the rest of $G$.*

Now we will prove an important fact about the determinant of any square submatrix of .

**Lemma L0.** *The determinant of any square submatrix of $B$ is either $0$ or $\pm 1$.*

**Proof:** See the “Omitted Proofs” section.

Let $B_i$ be a non-empty submatrix obtained from $B$ by removing exactly one row (say the $i$-th) from $B$. Note that at least one column of $B_i$ will have a non-zero row sum. When $G$ is connected, the rows of $B_i$ are linearly independent, so $\mathrm{rank}(B_i)$ must be equal to the number of rows of $B_i$. Since $B_i$ has $n-1$ rows, it follows that $\mathrm{rank}(B_i) = n-1$.

However, the rank of the incidence matrix $B$ must be greater than or equal to the rank of any row-subset of $B$, such as $B_i$. Since the largest such row-subset of $B$ has $n-1$ rows, we can have the following claim:

**Proposition P3.** *If $G$ is connected and $B$ is its incidence matrix, then $\mathrm{rank}(B) \geq n-1$.*

Now we will prove the following lemma.

**Lemma L1.** (Rank of the incidence matrix of a connected graph) *Let $B$ be the incidence matrix of an $n$-vertex graph $G$. Then, $G$ is connected $\iff \mathrm{rank}(B) = n-1$.*

**Proof:** See the “Omitted Proofs” section below.

The subgraph $T$ will be a spanning tree of the graph $G$ if it has the following properties:

- $T$ has $n$ vertices and $n-1$ edges.
- $T$ is connected.
- The incidence matrix $B_T$, whose $n-1$ columns correspond to the edges of $T$, has $n$ rows.

Let us consider $B_T$, which is an $n \times (n-1)$ matrix. Since $T$ is connected, there will be no row with all zero entries, and according to **Lemma L1** we know that $\mathrm{rank}(B_T) = n-1$. Moreover, since the rows of $B_T$ sum to the zero vector, *any* collection of $n-1$ rows of $B_T$ is linearly independent. Consequently, any row-subset of $B_T$ containing $n-1$ rows (and all columns) will have rank $n-1$. Now, since such a row-subset is a square matrix with $n-1$ rows and columns and rank $n-1$, it follows that its determinant is non-zero. Therefore we have proved the following lemma:

**Lemma L2.** (Determinant of an order- square submatrix of )

*If is the incidence matrix of a spanning tree of , every square submatrix of is non-singular.*

The above Lemma also leads us to the following elegant result.

**Lemma L3.** (Spanning tree of and incidence matrix ) *Let be a graph with vertices, and let be its incidence matrix. Let be a subgraph of with edges, and let be its incidence matrix. Then, is a spanning tree of every square submatrix of is non-singular.*

**Proof: **See the “Omitted Proofs” section below.

We can readily get the following corollary.

**Corollary C1.**

*Each non-singular square submatrix of corresponds to a spanning tree of induced by its columns.*

**Proof:** See the “Omitted Proofs” section.

The above corollary leads us to the following simple yet beautiful result.

**Lemma L4.** (Number of spanning tree and the incidence matrix )

*Let be a submatrix of derived by removing the row from . Thus has rows and columns. Then, the number of spanning trees in , denoted by , is equal to the number of non-singular square submatrices of .*

**Proof:** See the “Omitted Proofs” section below.

Let us call the *Reduced Incidence Matrix*, derived by removing the row from . Thus the question of computing , the number of spanning trees in , is equivalent to computing the number of non-singular square submatrices of the reduced incidence matrix .

Now we can proceed to the proof of the main theorem.

First, let us briefly state what we are going to prove. Let be the eigenvalues (counting multiplicities) of the combinatorial Laplacian of the graph . We want to prove that

.

We will proceed as follows.

**Step 1: (Submatrix from )** Let be an submatrix obtained from the incidence matrix by deleting the row of . Using Fact F1, we see that

.

Note that has rows and columns. We want

**Step 2: (Square submatrices of of order )** Now we will build all submatrices from as follows. Let be any ordered sequence of integers picked from such that . Let be the set of all such sequences . Now, given a particular sequence , let be the submatrix of obtained by keeping only those columns from whose indices are in .

**Step 3: (Apply Fact F2)** We want to evaluate the determinant from Step 1 using the Cauchy-Binet formula, as follows:

**Step 4: (Apply Lemma L0)** must be either or , and hence the above equation becomes the following:

**Step 5: (Apply Lemma L4)** By Lemma L4, the number of spanning trees in is the same as the number of non-singular square submatrices of . Therefore, the above equation becomes

.
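As a numerical sanity check (not part of the proof), the Cauchy-Binet identity used in Steps 3-5 can be verified directly with NumPy. The matrix `A` below is an arbitrary small example, not the incidence matrix of any particular graph:

```python
import itertools

import numpy as np

# Cauchy-Binet (Fact F2): for an r x m real matrix A with r <= m,
#   det(A A^T) = sum over all r-element column subsets S of det(A[:, S])^2.
rng = np.random.default_rng(0)
r, m = 3, 5
A = rng.integers(-2, 3, size=(r, m)).astype(float)

lhs = np.linalg.det(A @ A.T)
rhs = sum(np.linalg.det(A[:, list(S)]) ** 2
          for S in itertools.combinations(range(m), r))
assert abs(lhs - rhs) < 1e-6
```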

**Step 6:** Note that the value of does not depend on , and thus each diagonal cofactor of the combinatorial Laplacian is equal to .

.

**Step 7:** **(Use Facts F4, F5)** Restating Fact F4, we have,

Thus, we have proved Kirchhoff’s Matrix-Tree theorem.
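To make the statement concrete, here is a small numerical check of the theorem (an illustration, not part of the proof) on the complete graph on 4 vertices, whose spanning trees are counted by Cayley's formula as 4^(4-2) = 16:

```python
import numpy as np

# Laplacian of the complete graph K_n: degree n-1 on the diagonal, -1 elsewhere.
n = 4
L = n * np.eye(n) - np.ones((n, n))

# Any diagonal cofactor (determinant of the reduced Laplacian) counts spanning trees.
t_cofactor = round(np.linalg.det(L[1:, 1:]))

# The eigenvalue form: product of the nonzero Laplacian eigenvalues, divided by n.
eig = np.linalg.eigvalsh(L)          # sorted ascending; eig[0] is (numerically) 0
t_eigen = round(np.prod(eig[1:]) / n)

assert t_cofactor == t_eigen == 16   # Cayley: 4^(4-2) = 16
```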

**Fact F1.** Let be the submatrix obtained from the incidence matrix by removing its row. Then, the cofactor of the diagonal element of the Laplacian is the following:

*Proof sketch (of F1):* First note that for unweighted graphs, and hence, every entry of comes from the dot product of the row and column of . Now, the matrix will not have any contribution from the row and column of . This is exactly what happens when computing : first we take the principal minor by removing the row/column from and then take the determinant. (The sign of will be , which is positive.)

**Fact F4.** The product of all non-zero eigenvalues of the combinatorial Laplacian is equal to the sum of all principal minors of . That is,

where is the principal minor of .

*Proof sketch (of F4):* Since eigenvalues are solutions to the characteristic polynomial of a square matrix , they can be related to the coefficients of by using elementary symmetric polynomials. Next, the coefficients of can be expressed as the sum of principal minors of . Details can be found in any standard matrix analysis textbook, e.g., Chapter 0 of [7].

**Lemma L0.** *The determinant of any square submatrix of is either 0 or .*

The following proof is taken from the Master's thesis of Boomen [6].

**Proof:** The proof is by induction.

(Base case) First observe that each column of contains only two nonzero entries: one is and the other is . Now consider any submatrix of where each entry is either 0 or and the nonzero entries in a column are different. It can be checked that the value of the determinant of this submatrix can only be one of . This forms the basis of our induction.

(Induction hypothesis) Assume that the determinant of every square submatrix of of order smaller than is either 0 or .

(Induction) Suppose is a square submatrix of . We want to show that .

Consider the Laplace expansion of the determinant of . There can be only three cases:

- Case 1: If there is any row/column with all zeros, then , and we are done.
- Case 2: If any row/column of contains exactly one nonzero entry (), we expand along that row/column (which will have only one cofactor); by applying the inductive hypothesis, the result is , which is either 0 or , and we are done.
- Case 3: If all columns of contain exactly two nonzero entries (one 1 and the other -1), then adding all the rows gives the zero vector. Hence and therefore , and we are done.
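Lemma L0 can also be checked by brute force on a small example. The sketch below (illustrative only; the edge orientations are chosen arbitrarily) enumerates every square submatrix of an oriented incidence matrix and confirms the determinant is always 0 or ±1:

```python
import itertools

import numpy as np

# Oriented incidence matrix of a small graph: +1 at the tail of each edge,
# -1 at its head, 0 elsewhere (rows = vertices, columns = edges).
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
n, m = 4, len(edges)
A = np.zeros((n, m))
for j, (u, v) in enumerate(edges):
    A[u, j], A[v, j] = 1.0, -1.0

# Lemma L0: every square submatrix has determinant 0, +1 or -1.
for k in range(1, n + 1):
    for rows in itertools.combinations(range(n), k):
        for cols in itertools.combinations(range(m), k):
            d = round(np.linalg.det(A[np.ix_(rows, cols)]))
            assert d in (-1, 0, 1)
```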

**Lemma L1.** (Rank of the Incidence matrix of a connected graph) *Let be the incidence matrix of graph . Then, is connected .*

**Proof:**

*(The part.)* It is given that . Therefore, there exists a submatrix of , call it , whose rank is . Suppose is the incidence matrix of the subgraph of on vertices. Since , we can apply **Proposition P2**, and hence is connected.

**Lemma L3.** (Spanning tree of and incidence matrix ) *Let be a graph with vertices, and let be its incidence matrix. Let be a subgraph of with edges, and let be its incidence matrix. Then, is a spanning tree of every square submatrix of is non-singular.*

**Proof:**

**Corollary C1.**

*Each non-singular square submatrix of corresponds to a spanning tree of induced by its columns.*

**Proof:** Let be a connected graph. Let be a non-singular square submatrix of and let be the subgraph for which is the incidence matrix. Assume, without loss of generality, that . Since (because is non-singular), at least one of its columns will have non-zero row-sum, and thus one of its edges must be incident to the vertex such that . Thus the subgraph is connected, has vertices and edges, and hence it is a spanning tree of whose edges correspond to the columns of .

**Lemma L4.** (Number of spanning tree and the incidence matrix )

*Let be a submatrix of derived by removing the row from . Thus has rows and columns. Then, the number of spanning trees in , denoted by , is equal to the number of non-singular square submatrices of .*

**Proof:** By Lemma L3, every spanning tree of gives a non-singular square submatrix of , and by Corollary C1, every non-singular square submatrix of corresponds to a spanning tree of . This correspondence is one-to-one, so the two counts are equal.

[1] Kirchhoff, G. Über die Auflösung der Gleichungen, auf welche man bei der untersuchung der linearen verteilung galvanischer Ströme geführt wird. *Ann. Phys. Chem.* **72**, 497-508, 1847.

[2] Deo, N., Graph Theory with Applications to Engineering and Computer Science, Prentice Hall, Englewood Cliffs, New Jersey, 1974.

[3] Lancaster, P., Tismenetsky, M., The Theory of Matrices: With Applications, 2nd edition, Academic Press, 1985.

[4] Butler, S. K., Eigenvalues and Structures of Graphs, PhD Dissertation, University of California at San Diego, 1-9, 2008.

[5] Brualdi, R. A., The Mutually Beneficial Relationship of Graphs and Matrices, American Mathematical Society, 21-22, 2011.

[6] Boomen, J., The Matrix Tree Theorem, Masters thesis, Radboud Universiteit Nijmegen, Netherlands, 2007.

[7] Horn, R. A., Johnson, C. R., Matrix Analysis, Cambridge University Press, 1985.

[8] Srivastava, Nikhil. Matrix Tree Theorems (Lecture Notes). http://www.cs.yale.edu/homes/spielman/561/2009/lect18-09.pdf

[9] Oveis Gharan, Shayan. Random Spanning Trees (Lecture Notes). https://homes.cs.washington.edu/~shayan/courses/cse599/adv-approx-1.pdf

Filed under: Computer Science, Expository, Graph Theory, Mathematics, Matrix Analysis, Spectral Graph Theory Tagged: kirchhoff, matrix tree theorem, spanning tree

- The reader is a beginner, like me, and has already glanced through the Spielman-Srivastava paper (from now on, the SS paper).
- The reader has, like me, a basic understanding of spectral sparsification and associated concepts of matrix analysis. I will assume that she has read and understood Section 2 (Preliminaries) of the SS paper.
- The reader holds a copy of the SS paper while reading my post.

First, I will mention the main theorems (actually, I will mention only what they “roughly” say).

**SS Theorem 1.** *It is possible to construct a spectral sparsifier of the input graph if we sample (with replacement) each edge (of weight ) with probability proportional to its effective resistance . We will need to take samples, and the resultant sparsifier will have the following property with probability at least :*

*where*

*, an vector, is a real-valued function defined over the vertices of , and and $\tilde{L}$ are the Laplacian matrices of and , respectively.*

The above theorem says that the Laplacian quadratic form of (i.e., ) can be made very close to the Laplacian quadratic form of , with probability greater than . Note that this theorem needs the effective resistance of each edge to be already computed. The following theorem shows how to *approximate* efficiently.

**SS Theorem 2.** *It is possible to approximately compute the effective resistances of each edge in time through the approximation of a special matrix , which we will discuss later.*

Is an approximation of good enough for [SS Theorem 1] to still hold? As it turns out, it is, as confirmed by the following corollary.

**SS Corollary 6.** *Approximate effective resistances are good enough for obtaining a spectral sparsifier.*

The main questions are:

- How do we prove the existence of such a probability distribution which lets us generate a spectral sparsifier?
- How do we efficiently compute/approximate these probabilities?

The answer to these questions can be found by weaving elegant connections among some apparently disconnected or loosely-connected dots. I will label these dots D1, D2, … and walk the reader through the connections.

**D1.** Effective resistance of edge can be defined in terms of the corresponding row of the incidence matrix as follows:

where is the Moore-Penrose pseudoinverse of the Laplacian . Consequently, we can see that the diagonal entries of the matrix are the effective resistances .
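This definition translates directly into code. The following sketch (unit edge weights assumed) computes the effective resistance from the pseudoinverse of the Laplacian for a 3-vertex path, where resistances in series simply add:

```python
import numpy as np

# Effective resistance via the Laplacian pseudoinverse (D1):
# R_uv = (chi_u - chi_v)^T L^+ (chi_u - chi_v).
# Example: path graph 0-1-2 with unit weights, so R_02 = 1 + 1 = 2.
edges = [(0, 1), (1, 2)]
n = 3
L = np.zeros((n, n))
for u, v in edges:
    L[u, u] += 1; L[v, v] += 1
    L[u, v] -= 1; L[v, u] -= 1

Lp = np.linalg.pinv(L)

def eff_res(u, v):
    chi = np.zeros(n); chi[u], chi[v] = 1, -1
    return chi @ Lp @ chi

assert abs(eff_res(0, 2) - 2.0) < 1e-9   # series resistances add
assert abs(eff_res(0, 1) - 1.0) < 1e-9
```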

**D2.** (depends on D1) We can construct the matrix $\Pi$ which is similar to as follows:

It was proved in **[SS Lemma 3]** that

- is a projection matrix, and hence .
- Rank of is .
- All eigenvalues of are either zero or one. Multiplicity of 1 is , and multiplicity of 0 is .
- Diagonal elements of are equal to the squared 2-norm of the corresponding column. That is,

**D3.** (depends on D2) The diagonal entries of are therefore

Thus we are interested in computing (either exactly or approximately) the diagonal entries of .
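The properties listed in D2 are easy to verify numerically for a small unweighted graph (so that the weight matrix is the identity and the projection is simply B L⁺ Bᵀ). This is only an illustration of the lemma, not a proof:

```python
import numpy as np

# The projection matrix Pi = B L^+ B^T for an unweighted graph (D2/D3),
# where B is the m x n signed edge-vertex incidence matrix and L = B^T B.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]   # a triangle with a pendant edge
n, m = 4, len(edges)
B = np.zeros((m, n))
for i, (u, v) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0

L = B.T @ B
Pi = B @ np.linalg.pinv(L) @ B.T

assert np.allclose(Pi @ Pi, Pi)          # projection: Pi^2 = Pi
assert round(np.trace(Pi)) == n - 1      # rank = n - 1 (the graph is connected)
# Diagonal entries are the effective resistances of the edges;
# the pendant edge (2, 3) is a bridge, so its resistance is exactly 1.
assert abs(Pi[3, 3] - 1.0) < 1e-9
```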

**D4.** Now consider an diagonal matrix whose diagonal entries denote the “scaling factor” introduced by the sparsification algorithm. Namely,

where

- Edge is sampled (with replacement) with probability
- is the number of samples taken to generate the sparsifier
- is the weight of the edge in
- is the weight of the edge in

**D5.** (depends on D4) It can be noted that the number of times an edge was actually sampled within trials is *expected* to be equal to . Therefore, , and thus and .

**D6.** (depends on D4) Since , the Laplacian of can be written as

Note that we want to make , the Laplacian quadratic form of , very close to , the Laplacian quadratic form of .
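The reweighting of D4-D6 can be simulated on a toy example. The sketch below samples edges of the complete graph on 6 vertices with probability proportional to w_e R_e (all weights 1 here) and loosely checks that the sparsifier's quadratic form stays close to the original; the sample count and tolerance are arbitrary choices, not the values from the theorem:

```python
import numpy as np

# Toy version of the SS sampling scheme: sample edges with probability
# p_e proportional to w_e * R_e, add w_e / (q * p_e) to the sampled edge's
# weight, and compare the two Laplacian quadratic forms.
rng = np.random.default_rng(7)

n = 6
edges = [(u, v) for u in range(n) for v in range(u + 1, n)]  # K_6, unit weights
m = len(edges)

def laplacian(edge_weights):
    L = np.zeros((n, n))
    for (u, v), w in edge_weights.items():
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

L = laplacian({e: 1.0 for e in edges})
Lp = np.linalg.pinv(L)

def R(u, v):
    chi = np.zeros(n); chi[u], chi[v] = 1, -1
    return chi @ Lp @ chi

p = np.array([R(u, v) for (u, v) in edges])
p /= p.sum()                      # p_e proportional to w_e * R_e

q = 4000                          # number of samples (with replacement)
weights = {}
for i in rng.choice(m, size=q, p=p):
    weights[edges[i]] = weights.get(edges[i], 0.0) + 1.0 / (q * p[i])

Lt = laplacian(weights)
x = rng.standard_normal(n)
ratio = (x @ Lt @ x) / (x @ L @ x)
assert 0.7 < ratio < 1.3          # loose sanity band, not the formal bound
```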

**D7.** (**[SS Lemma 5] **Rudelson-Vershynin Lemma) Roughly, this lemma says the following. Suppose we draw vectors from a certain probability distribution, and build corresponding rank-1 matrices . Then, the *expectation* of the *2-norm* of the difference between (1) the average value of these rank-1 matrices and (2) the expected value of such a rank-1 matrix is *bounded*. This bound depends on the number of samples taken and the maximum 2-norm of the vectors .

**D8.** (depends on D7, D2, D3) Suppose the probability with which each edge is sampled is proportional to the corresponding diagonal entry of the projection matrix . That is,

Then, using the Rudelson-Vershynin lemma (D7), it can be proved that

with probability

(This is proved in the SS paper after stating the Rudelson-Vershynin lemma.) Observe that

and hence does not distort too much.

**D9. [SS Lemma 4]** (depends on D8) This lemma says that, given D8 — that is, if is a non-negative diagonal matrix satisfying

,

then

Essentially, **this proves [SS Theorem 1]**.

**D10.** (depends on D9) If the probability distribution used to sparsify is approximated, and the estimate is (roughly speaking) bounded by a certain factor , then this approximation only changes the approximation error in D9 to , thus preserving the guarantee of spectral sparsification **[SS Theorem 1]**.

This proves **[SS Corollary 6]**.

Now that we have proved [SS Theorem 1], we are left to prove **[SS Theorem 2]**.

**D11.** (depends on D1) Let be the characteristic vector of , that is, zero in all rows except the row, which contains a one. Now note that

Suppose is an matrix. If we know $\hat{Z}$, then $R_{uv}$ can be directly calculated from the difference of two columns of . Therefore, we now focus on computing .

Actually, we will compute , a matrix.

**D12.** This is a special version (due to Achlioptas) of the famous Johnson-Lindenstrauss theorem. Roughly speaking, it tells us that the members of any *fixed* set of dimensional vectors can be projected into dimensions through a random matrix where , such that their distances are closely preserved. Entries of are Bernoulli random values taken from .
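A quick illustration of D12, using sign-only entries scaled by 1/√k (one common variant; the dimensions and tolerance below are arbitrary choices, not the constants from the theorem):

```python
import numpy as np

# Random projection: map d-dimensional vectors into k dimensions with a
# k x d matrix of +-1/sqrt(k) entries, which preserves distances in
# expectation and concentrates tightly for k not too small.
rng = np.random.default_rng(42)
d, k = 2000, 400
Q = rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)

x = rng.standard_normal(d)
y = rng.standard_normal(d)

# The pairwise distance is preserved up to a small multiplicative error.
orig = np.linalg.norm(x - y)
proj = np.linalg.norm(Q @ (x - y))
assert abs(proj / orig - 1.0) < 0.3   # loose sanity band
```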

**D13.** This is the Spielman-Teng linear system solver (STsolver), which can be used to generate *approximate* solutions to systems of linear equations involving Laplacians. This algorithm runs in *expected* time , where

- is the number of non-zero entries in , and
- is a predetermined approximation error which is introduced to the generated solution . This error parameter can be set as an input to the solver.

**D14.** (uses D12) We use D12 to project the dimensional columns of the matrix into dimensions. Specifically, we want to compute the matrix

,

where is a matrix. First, we compute the matrix . This takes time because has non-zero entries.

**D15.** (depends on D14, uses D13) Note that has rows and columns. Let be any row of , so that the length of is . Now we use D13 to approximately solve the system of equations , which gives us an *approximate* solution to a *single row* of the matrix . Each call to STsolver takes time since has only non-zero entries. If we call STsolver times, once for each row of , we will be able to approximate the matrix with a matrix in total time

,

where we absorb the term under the notation.

**D16.** (depends on D11, D15) Now we pick an appropriate value of , the error introduced by the Spielman-Teng solver. **[SS Lemma 9]**, with the help of **[SS Proposition 10]**, proves that if we pick a specific value of , the matrix obtained in D15 can be used to compute the effective resistances as outlined in D11. It turns out that the appropriate value of involves the minimum-to-maximum weight ratio

,

which makes the final running time of approximating the effective resistances $\tilde{O}(m\log{r}/\epsilon^2)$.

Therefore, **we have proved [SS Theorem 2]**.

To summarize,

- First we showed that we can obtain a spectral sparsifier if we sample from a *certain* probability distribution (see D4).
- Then we showed that these probabilities are proportional to the diagonal entries of the projection matrix (see D8).
- We also showed that the diagonal entries of the projection matrix actually encode the effective resistances of the corresponding edges scaled by corresponding weights (see D3).
- Next we showed that our reweighting scheme does not distort the quadratic form of , and hence produces the spectral guarantee. This is the proof of **[SS Theorem 1]** (see D9).
- We also showed that approximating the effective resistances (as opposed to exactly computing them) does not hurt the sparsifier too much **[SS Corollary 6]** (see D10).
- So now we ask:
  - (a) How do we approximate the effective resistances?
  - (b) How fast?
- We showed that the effective resistance of any given edge is equal to the norm of the difference of two columns of a certain matrix (see D11).
- We used a special version of the Johnson-Lindenstrauss lemma (due to Achlioptas) to project the columns of this matrix into lower dimensions. This projection is guaranteed to preserve pairwise distances (that is, norms of the differences of the columns). This lower-dimensional projection is the matrix (see D14).
- Then we approximated this matrix using the Spielman-Teng linear system solver for Laplacian systems (see D15).
- Lastly, we showed that if we set the approximation parameter of the previous step to a specific value (determined by **[SS Lemma 9]**), we can approximate the effective resistances in reasonable time (see D16).

That’s all for today, folks. Please let me know if you spot any gap/mistake in the explanations.

Filed under: Approximate Algorithms, Computer Science, Expository, Graph Theory, Mathematics, Matrix Analysis, Probability, Randomized Algorithms, Sparsification, Spectral Graph Theory Tagged: laplacian, sparsification, sparsifier, spectral graph theory, spectral sparsification, spielman

Since the Stirling numbers of the second kind are more intuitive, we will start with them.

Suppose you have different objects labeled . You also have identical boxes so you cannot order them, or differentiate between them. The question is: in how many ways can you put these different objects into these identical boxes? You cannot leave an empty box. Let’s say the solution is . What are the values of for different and ? These numbers, namely , are called the Stirling numbers of the second kind.

Now think about it. Let's start by observing the object. No matter how you throw different objects into identical boxes, you will end up in one of the two following scenarios:

(1) The object is all alone in some box.

(2) The box containing the object also contains one or more objects.

Consider case (1). If we delete the object along with the box, it is evident that the rest of the objects (there are of them) were arranged in the rest of the boxes (there are of them). In other words, the number of ways we could end up in case (1) is .

Now consider case (2). This time, let's delete only the object but leave the box and the neighbors as they were. If we want to put the deleted object back into the mix, where do we put it? Boxes are identical and each contains at least one of the elements . It looks like we could put the object back into any of the boxes and the resulting configuration will still be in case (2). Since there are exactly ways of putting the object back into any of the boxes, the number of ways of generating a configuration of case (2) is .

Since there are no other cases, we have

The numbers are called the Stirling numbers of the second kind, which essentially tells us (now you see) the number of ways different objects can be arranged into identical boxes such that no box is left empty.
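The recurrence derived above, S(n, k) = k·S(n-1, k) + S(n-1, k-1), translates directly into a few lines of Python (a straightforward memoized implementation, not optimized):

```python
from functools import lru_cache

# Stirling numbers of the second kind via the recurrence:
# S(n, k) = k * S(n-1, k) + S(n-1, k-1).
@lru_cache(maxsize=None)
def stirling2(n, k):
    if n == k:
        return 1            # each object in its own box
    if k == 0 or k > n:
        return 0            # no way to fill the boxes
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

assert stirling2(4, 2) == 7   # e.g. {1}{234}, {2}{134}, {3}{124}, {4}{123},
                              #      {12}{34}, {13}{24}, {14}{23}
assert stirling2(5, 3) == 25
```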

You can see Wikipedia for closed form solution and other properties of the Stirling number of the second kind.

The combinatorial idea of the Stirling numbers of the first kind is similar to the previous case, except that now the ordering of the objects inside each box matters.

Suppose you have numbers . Let us consider the following permutation of these numbers when :

| position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|----------|---|---|---|---|---|---|---|---|---|
| value    | 3 | 1 | 5 | 6 | 7 | 4 | 2 | 8 | 9 |

Now let’s find cycles. We observe that

— position 1 contains 3

position 3 contains 5,

position 5 contains 7,

position 7 contains 2,

position 2 contains 1 (cycle of length 5)

— position 4 contains 6,

position 6 contains 4 (cycle of length 2)

— position 8 contains 8 (cycle of length 1)

— position 9 contains 9 (cycle of length 1)

In other words, there are 4 cycles in the above permutation:

(35721)(64)(8)(9)
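The cycle-chasing above is mechanical, so here is a short sketch that recovers the same decomposition. (Cycles are reported starting from their smallest position, so the first cycle appears as (1 3 5 7 2) rather than (3 5 7 2 1) — the same cycle, written from a different starting point.)

```python
# Find the cycles of a permutation given as a list: position i holds perm[i].
# The worked example in the text is 1-indexed, so we report 1-indexed values.
def cycles(perm):
    seen, result = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i + 1)       # report 1-indexed positions
            i = perm[i] - 1           # follow: position i contains perm[i]
        result.append(tuple(cycle))
    return result

perm = [3, 1, 5, 6, 7, 4, 2, 8, 9]
assert cycles(perm) == [(1, 3, 5, 7, 2), (4, 6), (8,), (9,)]
```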

Now we are ready to ask the following question: in how many ways can you arrange the numbers so that you end up with exactly cycles? Suppose the answer to this question is . What are the values of for different and ? As you may have guessed, the numbers are indeed the Stirling numbers of the first kind.

So how can we derive a recursive relation for it? We will go back to our objects-and-boxes analogy. We want to throw different objects into exactly identical boxes such that elements inside any given box form a cycle. How many ways to do it? Like before, all admissible permutations of objects fall into one of two cases:

(1) The number forms a cycle by itself.

(2) The number is part of a larger cycle.

In case (1), the number of ways we can end up here is the same as arranging the rest of the numbers “properly” into cycles. Therefore, there are different ways of arranging numbers into cycles so that they all fall in case (1).

In case (2), once again we delete the number without touching the other elements in its cycle. After we delete , observe that the rest of the numbers are already arranged in cycles, which can be done in ways. So how many ways could we have deleted ? Observe that this number, , could have appeared before any of the existing numbers, so there were ways we could have deleted from the permutation. Therefore, there are different ways of arranging numbers into cycles so that they all fall in case (2).

Since there are no other cases, we have

The numbers are called the Stirling numbers of the first kind, which essentially tell us (now you see) the number of permutations of objects having exactly cycles.
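As with the second kind, the recurrence c(n, k) = (n-1)·c(n-1, k) + c(n-1, k-1) gives a direct memoized implementation (illustrative, unsigned version):

```python
from functools import lru_cache

# Unsigned Stirling numbers of the first kind via the recurrence:
# c(n, k) = (n-1) * c(n-1, k) + c(n-1, k-1).
@lru_cache(maxsize=None)
def stirling1(n, k):
    if n == k:
        return 1            # every element a fixed point: one way
    if k == 0 or k > n:
        return 0
    return (n - 1) * stirling1(n - 1, k) + stirling1(n - 1, k - 1)

assert stirling1(4, 2) == 11
# sanity check: summing over all cycle counts recovers all n! permutations
assert sum(stirling1(5, k) for k in range(1, 6)) == 120
```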

You can see Wikipedia for closed form solutions and other properties of the Stirling numbers of the first kind.

Filed under: Combinatorics, Expository, Mathematics Tagged: stirling numbers

The words *recursive* and *computable* mean the same thing, and can be used interchangeably.

A *partial function* is defined on only some inputs. A *total function* is defined on all inputs. Here, inputs are natural numbers. If a function is not defined on some input , we say that *diverges*, denoted by . Otherwise, if we can compute , we say that *converges*, denoted by .

Every partial computable function is computable by a Turing Machine (TM). This is the famous Church-Turing Thesis. On the inputs where this function is undefined, the Turing Machine will produce no answer. Hence there exists a TM program for every partial computable function. The entire description of this program can be uniquely converted to a natural number. Thus the partial computable function which is computed by the TM program with description is denoted by , where is a natural number.

Consider any partial recursive function . The S-m-n theorem (in its simplest form) tells us that there exists a total computable function such that for all . In other words, it is possible to find (by a TM) a partial recursive function which has the same input-output mapping as for all inputs . You can think about it as hard-wiring one or more input-arguments of any function into its index.

Now we are ready to state the recursion theorem.

Kleene’s Recursion Theorem tells us that for every total computable function which takes a natural number as input and gives another natural number as output, there exists a particular input such that the two partial computable functions and have the same input-output characteristics.

In other words,

- Pick *any* computable function that is total.
- Then, there *exists* some natural number such that
- If we grab the two partial functions and , we will see that
- Wow! They have the same input-output mapping for all inputs, that is,
  - If diverges for some , so does .
  - If converges for some , so does , and moreover, .

This is the same as saying that every total computable function has an input element such that the two partial functions indexed by and will behave identically on all inputs. In this sense, is called the *fixed-point* for the total computable function .
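A classic concrete consequence of the recursion theorem is that programs can, in effect, obtain their own description — for example, quines (programs that print their own source code) exist in any reasonable programming language. Here is a minimal Python quine; the particular string-formatting trick is just one of many ways to write one:

```python
# A quine: the two code lines below (excluding these comments) form a
# program whose output is exactly its own source code. Its existence
# follows from the recursion theorem with f = identity: there is a
# program that has access to its own description.
src = 'src = %r\nprint(src %% src)'
print(src % src)
```

Here `%r` substitutes the `repr` of the string into itself and `%%` becomes a literal `%`, so the printed text reproduces both lines verbatim.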

The proof is elegant and short. However, like most recursion theory proofs, it contains self-references and is therefore sometimes hard to visualize for a beginner. Below we will outline the proof presented in our class lecture by Professor Johanna Franklin.

**Step 1.** Consider a total, one-to-one function with one input. We will define this function later. This function maps the given natural number to another natural number.

**Step 2.** Consider *any* total computable function , which also maps a given natural number to another natural number.

**Step 3.** Now consider the composition of , that is, the result of successively applying and on some input. Clearly, since both and are total, the composition will also be total. Let be the total function that computes this composition. Therefore, for all . Note that is a fixed number which must exist although we do not know its exact value. Also note that since is total, it follows that .

**Step 4.** Let . Since is a total function, such an must exist. Now we have .

**Step 5.** Now we will define a partial function on two inputs, as follows. First, we will evaluate (remember that is an input to our function ). If converges to some value, say, , then gives the same output as . Note that since is partial, this output may or may not converge. Otherwise, if , then our function diverges. This description is given as follows:

**Step 6.** Now it is time to define the total one-to-one function mentioned in Step 1. We will do it as follows. We will take the partial function defined in the previous step, then use the S-m-n theorem to show the existence of another partial function which has the same input-output behavior as .

Therefore, is a function which maps a natural number to another natural number. Since the parameter of the function can be any natural number, is defined for all inputs, and hence total. Moreover, as S-m-n theorem tells us, is computable as well.

**Step 7.** Now we have all the pieces of the puzzle. Here we are interested in only one particular number, namely from Step 3. In Step 3 we showed that is total, and hence . Therefore, for any input parameter , we have the following.

Filed under: Computability/Logic, Expository, Mathematics Tagged: computability, kleene's recursion theorem, recursion theory