*(For me only) *Edit

## Number Theory

Chinese Remainder Theorem.Given a (pairwise co)prime factorization of an integer , there is a bijection between and the Cartesian product where is the set of units (i.e., invertible elements) modulo i.e., the set of integers that are coprime to .

*More.* This bijection takes an element to the tuple . Moreover, given a tuple , the corresponding element in is

where . Notice that $\overline{m_i} \bmod m_i \neq 0$ since the are coprime to each other.

Fermat’s Integer Factorization of .Try to find two integers such that and . Then must divide the product . Consequently, the gcd of and at least one of these factors must be non-trivial. This gcd divides .

End Number Theory Edit | Back to top

**Algebra**

Finite Abelian Groups.Every finite Abelian group can be writen as a direct productwhere and are powers of (possibly repeating) primes and divides , divides and so on.

Cyclic Groups.These are abelian groups which are generated by a single element. is cyclic. For every positive divisor of , the quotient group has precisely one subgroup of order , the one generated by the residue class of . The number of generators of equals the number of elements coprime to . This number is given by Euler’s totient function . The order of the element in is . The direct product of two cyclic groups and is cyclic if and only if and are coprime.Every finite abelian group can be expressed as “the” direct sum of cyclic subgroups of prime-power order, and these orders are uniquely determined.

Pontryagin Dual.Let be an additive finite Abelian group and let be the unit torus with multipliciation as the group operation. Let be a group homomorphism defined as . Then, the set forms an Abelian group with the group operation being “point-wise multiplication”,

and the identity operation being the constant function

.

is called the Pontryagin dual of , and the character is called the trivial character.

End Algebra Edit | Back to top

**Coding Theory**

Reed-Muller Code.The order- Reed-Muller code is the set of Boolean polynomials on variables and degree at most .

Reed-Solomon Code.In the Reed-Solomon code , a degree- polynomial over — the message — is encoded into a degree- polynomial over — the codeword — where the coefficients of the codeword are obtained by evaluating on a fixed set .Since can have at most zeros at any field, at most of the s can be zero. This means the minimum distance is at least which matches the singleton bound. By definition, the

Reed-Solomon code is MDS.

Singleton Bound.For a linear code, the distance is at most . Any code matching this bound is called MDS (maximum distance separable).

*Proof:* Suppose we have a -ary code with block length , dimension , and distance . This means each pair of codewords has Hamming distance at least . If we puncture the code by deleting (without loss of generality) the first symbols from each codeword, the resulting pairwise-distance among codewords is still at least and we still have a valid code as long as . This condition is written as

.

Plotkin Bound.If the distance of a code is strictly larger than half the block length , that is, if , the number of codewords is at most .

Johnson Bound.Consider a code whose relative distance is “large” i.e., at least . Then, for every codeword and every “large” , only a “small” number of strings are “close” to a codeword. In particular, for any codeword , the number of strings (in the target space) that are within a relative distance from x is at most .

Gilbert-Vershamov Bound.For every and all sufficiently large , there exists a code with relative distance on bit strings with block length . Here, is the binary entropy function.

Hadamard Matrix.Let . Thenis the Hadamard matrix of order .

- The rows/columns of are indexed by -bit Boolean strings. Let denote the inner-product modulo . Then
.

- The map is a linear code.
- The volume enclosed by the (unit-length) columns of (i.e., the determinant) is the largest possible for any real matrix of the same order.

Discrete Fourier Transform.See here.

End Coding Theory Edit | Back to top

## Probability

Expectation.Suppose a random variable takes its value from the domain ..

Additionally, if ,

.

Variance.In general,.

If are independent, then

.

Moments.For any random variable , its th moment is simply . These moments are given by the (exponential) moment generating function.

Given an mgf , we can calculate the th moment of by taking the th derivative of w.r.t. and evaluate it at .

The th factorial moment of is defined as

.

The regular moments and the factorial moments are related by

where are the Stirling numbers of the second kind.

Tower Property of Conditional Expectation.Let be a random variable and the sets denote events.

where denotes set complement.

Using ,

.

In particular, if ,

.

Proof: here.

The Birthday Paradox.Suppose . When you randomly draw times from an -set with replacement, the resulting -subset will contain a repeated element with constant probability.

*Explanation.* Suppose we choose independent samples from the numbers . For any , the probability that is . The expected number of collisions in is at most . This is a constant whenever .

Coupon Collector’s Problem.Suppose you want to color all items in a set of size as follows: at round , you pick a random element from ; if it is still uncolored, you color it red, put it back, and repeat. This process will require rounds to color all items.In particular, let be the number of rounds till the game ends. For any positive ,

.

There is also an asymptotically matching lower bound.

Balls in Bins.If balls are thrown into bins independently and uniformly at random, the “heaviest” bin contains at most balls with high probability. However, if the balls are placed sequentially—each into the “lightest” among bins sampled uniformly at random—then the largest load is at most . Thus even if , the bins are highly balanced in the sequential case.

Poisson Distribution.Let be a discrete random variable. If it has the Poisson distribution , its probability mass function at point is the probability that is given . That is,The expectation and variance of is . The expected deviation of from its mean is . A binomial distribution tends to a Poisson distribution when .

Note: The events and must be independent.

Hypergeometric Distribution.This is the binomial distribution without replacement. In particular, let be the probability that exactly “good” items are obtained in sample-without-replacement trials from a population of items containing exactly “good” items in total..

The mean of this distribution is, surprisingly, , which equals . The variance is

if .

Stochastic Dominance (First Order).Consider two random variables . stochastically dominates if “always” has a “fatter right tail” than that of . To be precise, for all in the (totally-ordered) domain of the variables, with strict inequality for some . having a fatter right tail is equivalent to having a thinner left tail.

The Gambler’s Ruin Problem.See here.

The Unbiased Random Walk.See here.

Shearer’s Inequality.Let be coordinate-sets such that each coordinate is covered by at least different s. Let be a collection of independent random variables. The entropy of is at most times the sum of the entropies of its restriction to each coordinate set. That is,

.

Total Variation Distance of two distributions. The total variation distance of two distributions over a set is equal to

- the maximum absolute difference over all subsets.
- half the sum of absolute pointwise differences.
- the sum of one-sided (i.e., where ) signed pointwise differences.
That is,

.

,

where and .

See here for a proof.

For two fixed distributions , the “disagreement probability” for any random variables is at least the tv-distance between and . That is,

.

Pinsker’s Inequality.The Kullback-Leibler divergence of two distributions is at least twice the square of their total-variaion distance. Namely,.

Kullback-Leibler Divergence.Given a probability distribution on a discrete sample space , let denote the “amount of surprise” associated with the event according to the distribution . TheEntropyis the expected amount of surprise, namely.

The

KL Divergenceof with respect to is

- the expected amount of surprise “remaining” in after revealing .
- the entropy of relative to .
- the expectation (in ) of the pointwise-difference of the log-probabilities.
.

Markov Chains.

- A Markov chain is aperiodic if it has self-loops. It is irreducible (i.e., connected) if every state is reachable. It is
ergodicif it has a uniquestationary distribution, which happens if and only if the chain is aperiodic and irreducible. Let be the transition probability matrix of the chain. The chain isreversibleif for all . A random walk islazyif it stays at the same state with a constant probability. Thus a lazy (i.e., aperiodic and reversible) and irreducible random walk is ergodic.- The
mixing timeof a random walk with an initial distribution is the number of steps it takes so that the distribution after steps is very close to its stationary distribution.An ergodic chain mixes rapidly.In particular, for any initial distribution ,where is the second largest (absolute) eigenvalue. The mixing time in the above case is

.

For a -regular graph, this simplifies to .

Mixing time in .The probability mass at any state is within an of its stationary value after onlysteps where is the spectral gap of the transition matrix . In particular, after steps, for any state ,

.

See here and here for nice expositions.

Ergodic Theorem.Asymptotically, the fraction of times the walk stays at a specific state equals the stationary mass at that state.

Mixing Time via Coupling.Given two identical walks on the same state-space starting from different initial states and , let be the time when they meet. Then.

- For the lazy- biased random walk on the
Hypercube, the mixing time is.

- For the lazy biased walk on the
Cyclewith,

.

- A
birth and death processis a lazy Markov chain on with possibly different up/down/stay probabilities for each state. Let denote the coupling time for two walks starting at and . Then.

A special case is when and are identical for each state. Assuming ,

.

In particular, this means

.

If , however,

.

Doob Martingale.For any bounded function and a sequence of random variables , define , , and,

, and

.

Then, the sequence is a martingale with respect to the sequence , i.e.,

for every .

Proof: Using , using to denote complement in the set , the left hand side is

using the def. of .

(since ) .

Using the tower property of conditional expectation, this quantity is nothing but

.

The Exponential Distribution. An exponential distribution with parameter is the continuous analog of the geometric distribution.

End Probability Edit | Back to top

## Combinatorics

Pigeonhole Principle.

- Let are two integers. If balls are placed in bins, at least one bin receives two balls.
- If a quantity is distributed into parts, then some part receives at least (at most) .
- Let be a probability distribution on a set . Then there must be some element with probability at least (at most) .

Binomial Identities.Triangle. .

Hockey Stick. .

Probability Generating Functions.Let be a formal power series with . If then “generates” the probability distribution . Suppose generates the distribution . It is easy to see that the convolution generates the probability distribution whose mass function is.

Composition of Generating Functions.Let and denote the number of ways to build two structures, respectively of types 1 and 2, on a -element set where and . Let be the number of ways to divide the set into an unspecified number of intervals, build a type-1 structure on each interval, then build a type-2 structure on theset of intervals. Set . Denote by the generating functions of the sequences . ThenWe can expand as

where

.

If are probability generating functions, we can interpret as the generating function for the sum of independent draws from where is distributed according to .

Sum of first binomial coefficients.Suppose . Then,.

This bound is due to Michael Lugo. (Aside: This implies, if , the sum is at most .)

The above bound is within a factor of as long as and . When , this multiplicative factor increases.

The following bound is due to Lovasz. It works better when ; it stays within a factor of for .

.

Sigma Field.A -field consists of a sample space and a collection of subsets satisfying the following conditions:

- .
- .
- .
The last condition, coupled with the second condition, implies the closure under countable union and intersection.

Ramsey’s Theorem.Let . Every edge-two-coloring of induces a monochromatic subgraph which is either a or a .Generally, given integers and , there exists a number —called the Ramsey number—such that for every and every edge--coloring of there will be a monochromatic for some .

Generalized Arithmetic Progression.Let be an additive group, i.e., an abelian group with the operation . Let .The set

is called a (generalized) arithmetic progression in where

- is the rank of
- N = (N_1, \cdots, N_d) \subset \mathbb{Z}_+^d$ is the dimension of ,
- is the starting point of
- are the basis vectors of
- the volume of
- the function is the usual dot product.
The progression is

if the map is injective (i.e., one-to-one) on . Equivalently, . An improper progession may result if the basis vectors are linearly dependent over . The progression is symmetric if .proper

Hales-Jewett Theorem.Let be an alphabet of cardinality . For a “sufficiently large” dimension and a -partition of the words , there must be a part that contains a “combinatorial line.”In an arithmetic setting, suppose , and . Then there is an integer latex such that if is partitioned into non-empty sets, then at least one of these parts must contain a proper arithmetic progression of the form

where and .

Like Szemeredi’s Theorem—the one-dimensional specialization of the Hales-Jewett theorem—the Hales-Jewett theorem has a density version.

Van der Waerden’s Theorem.Given integers , there exists an integer $latex N(m, k)$—called the van der Waerden number—such that if we color a proper arithmetic progression of length at least N \geq N(m, k)$ using different colors, there will be a monochromatic proper sub-progression of length at least . The smallest is called the Van der Waerden number. The most natural case is when .Equivalently, in an arbitrary partition of into subsets, at least one subset contains an arithmetic progression of length .

Szemeredi’s Theorem.Informally, it says that dense subsets of integers contain long arithmetic progressions.Let . Let be the size of the largest subset of an additive set avoiding an arithmetic progession of length . The following are equivalent:

- Every “sufficiently large” subset of must contain infinitely many arithmetic progressions of length for every . Here, “sufficiently large” means having a positive upper density.
- Every subset of of size at least must contain an progression of length .
- .
Gowers (2001) proved that

.

Setting in the second formulation, Szemeredi’s theorem implies van der Waerden’s theorem.

Roth’s Theorem.For any finite additive group of odd order we have.

Triangle Removal Lemma.If a graph has triangles, it is possible to remove all triangles by removing only edges.

Fibonacci Numbers Generating Function..

Harmonic Numbers Generating Function..

Binomial Coefficients Generating Function.for fixed .

for fixed .

Bell Numbers Exponential Generating Function.The th Bell number is the number of ways to partition ..

Catalan Numbers.The th Catalan number can be interpreted as the number of ways parentheses can be matched. The solution is.

They satisfy the recurrence relation

.

This reveals that the generating function for Catalan numbers, , must satisfy

.

End Combinatorics Edit | Back to top

## Graph Theory

Independent Set and Vertex Cover.A largest independent set is the complement to a minimum vertex cover. The vertex cover problem is the dual of the maximum matching problem. In a bipartite graph, the size of the maximum matching equals the size of the minimum vertex cover (Konig’s theorem).

Crossing Number.is the minimum number of edge crossing in a plane drawing of a graph . If , then . This is the crossing number inequality.

Szemeredi-Trotter Theorem.Suppose we have points and lines in a two-dimensional plane. Consider any bipartite graph where an edge implies that the point is on line . Then.

Szemeredi Regularity Lemma.Informally,

- Given a positive constant and a graph with a “large enough” vertex set, one can find a “nearly-uniform” partition of the vertices, into “not too many” parts, so that “most” of the part-pairs are -regular.
- Every graph has a constant size approximate representation where the size of the representation depends only on the quality of the approximation.
- All graphs are mostly composed of random-looking bipartite graphs.
More formally,

- We are given an and a positive integer .
- The partition-size is .
- At least a fraction of the part-pairs are -regular.

Here, a “nearly-uniform” partition is a partition where the part-sizes differ by at most one. The edge density of two disjoint vertex-sets is defined as

.

The disjoint vertex-sets are -regular if

for all “large” subsets .

Informally, a bipartite graph is -regular if its edges are dispersed like the edges of a random graph.

End Graph Theory Edit | Back to top

## Algebraic Graph Theory

Expander Mixing Lemma.Let be a -regular expander graph on vertices and edges. Suppose the second-largest eigenvalue of its adjacency matrix is at most . For any pair of vertex-sets , let be the set of “cross edges” between and . Let denote the cardinality of the set . Then, for all ,

In the left-hand side, the first term is the number of cross edges. The second term is the expected number of cross edges in a random graph with edge-probability . The left-hand side thus measures how much deviates from this expectation. The expander mixing lemma says that if is small, will be nearly random in this sense.

Random Walk on an Expander.Suppose is a -regular expander whose second-largest eigenvalue is at most in magnitude. Let be a set containing a fraction of vertices. The fraction of random walks of length , on the vertices of , that stay entirely within , is small; in particular, it is at most.

Isoperimetric Number (Edge Expansion).For any vertex-subset , its boundary is denoted by . Its isoperimetric ratio is defined asThe isoperimetric number of an -vertex graph is the smallest over all of size at least .

One can show that

where and is the second smallest eigenvalue of the Laplacian of .

Cheeger Inequality.Suppose is an -expander i.e., a -regular graph on vertices with the second-largest eigenvalue of its adjacency matrix being . Since the largest eigenvalue is , the quantity is thespectral gap. For every ,.

where . Equivalently,

.

The Laplacian Matrix.Let be the adjacency matrix of the graph on vertices and edges. Let be the diagonal degree sequence matrix. Then the matrix is called the (combinatorial) Laplacian matrix of $latex G$. Let the eigenvalues of be.

Let the eigenvalues of be

.

The Laplacian matrix has some amazing properties.

- The rank of is where is the number of connected components of . For a connected graph, .
- Since the rows of sum to one, the all-ones vector is an eigenvector of with eigenvalue zero.
- Since is a real symmetric matrix, all eigenvalues are real.
- If is -regular, for .
- The diameter of is strictly less than the number of distinct eigenvalues of .
- If we remove a cut-vertex from , the size of the largest component is at least .
Edge Addition.If we add an edge to , the non-zero eigenvalues never decreases. In particular, for all where are the new eigenvalues.- Let be the vertex- and edge-connectivity of , respectively. Then
.

- , where and . We can think as “baby Laplacians” where the two diagonal entries are one and the two off-diagonal entries are minus one.
- For any vector ,
.

- Let be the characteristic vector for a vertex-set . If the graph is unweighted i.e., for all , then
where is the boundary of , i.e., the number of edges with exactly one endpoint in .

- One can show that where . This means a graph with a large has a high edge expansion, or “well-connected.”
- If then for all .
- Let and be two weighted graphs that differ in only their weights. Then
.

- For path graphs,
.

- and .

The Spectrum of a Graph Product.Suppose has eigenvalue-eigenvector pairs . Similarly, suppose has eigenvalue-eigenvector pairs . Then, for each , the Cartesian product will have an eigenvalue associated with the eigenvector defined asfor .

Hoffman’s Bound on the Size of an Independent Set.The size of an independent set of a -regular graph on vertices is at most.

End Algebraic Graph Theory Edit | Back to top

## Linear Algebra

A complex matrix is

unitaryif . A real, unitary matrix isorthogonal. isHermitianif . It isnormalif . A Hermitian matrix isPositive Semidefiniteif all its eigenvalues are non-negative.

Cramer’s Rule.Suppose we want to solve a matrix-vector equation . Let denotes the matrix obtained by replacing the th column of the matrix by the . Then,

Spectral Decomposition.A matrix isHermitiani.e., , if and only if has an orthonormal set of eigenvectors with real eigenvalues i.e., where are an eigenvector-eigenvalue pair.

See here for a proof of the Hermitian case: orthogonal eigenvectors and real eigenvalues.

Normal Matrices.A matrix is normal if it commutes with its conjugate transpose, i.e., . This happens if and only if is unitarily diagonalizable, i.e., there exists a diagonal and unitary such that .A block upper-triangular matrix is normal if and only if all on-diagonal blocks are normal and all off-diagonal blocks are zero.

Polar Decomposition.Every matrix can be written as where is unitary, are positive semidefinite, and . Moreover, if is invertible then is unique.

Singular Value Decomposition.Every square matrix can be written as where are unitary and has non-negative real diagonal entries.

Proof: By polar decomposition, where is unitary and is positive semidefinite. By spectral theorem, for some unitary and real diagonal . Moreover, since is PSD, its eigenvalues are non-negative; hence has non-negative real entries. We set .

Cholesky Factorization.Every positive semidefinite complex matrix can be factored into where is lower-trialgular with real, positive diagonal entries.

Schur Decomposition.Any complex square matrix can be written as where is unitary and is upper triangular.

Square Root.Every positive semidefinite matrix (with bounded largest eigenvalue) can be factored into where is also positive semidefinite.

Hilbert-Schmidt Norm and Inner Product.Let denote the -th column of the matrix . Then,

.

.

Trace.Trace of a matrix is the sum of its eigenvalues.

.

.

Projection Matrices.Any projection matrix can be written as where is a basis of the target subspace. has eigenvalues . is always Hermitian i.e. , as evidenced by the sum formulation. satisfies .

Frobenius Norm.A generalization of the usual norm for matrices..

This norm, like the norm, is invariant under rotation.

Tensor Inner Product.The tensor inner product on the space is defined as follows:,

where are scalars and , and .

Conjugate Gradient Method(for Solving Symmetric Linear Systems).Suppose we would like to solve for where is a real symmetric Positive Semidefinite matrix. The matrix defines the inner product . We can find a set of basis vectors spanning that are mutually orthogonal according to this inner product. Then, the solution can be written as a linear combination of these basis vectors.In practice, the vectors and corresponding can be found using a process similar to the Gram-Schmidt orthonormalization. Any subset of these basis vectors gives an approximation of .

Facts about the Matrix Exponential.If commute, then . In addition, if commute then commute, as do .Moreover, for any invertible , . Next, . If then .

Golden-Thompson inequality.For any square matrices ,.

Lieb’s Theorem.Let be Hermitian and positive definite. The functionis concave.

Corollary (Tropp).Let be fixed, be random, both Hermitian. Then.

See Joel Tropp’s paper (Section 3.4).

Perron-Frobenius Theorem.Suppose is a real matrix with strictly positive entries. Let be its largest eigenvalue with corresponding eigenvector .

- is a positive real with multiplicity one. Every other eigenvalue has a strictly smaller absolute value.
- is no smaller than the minimum row-sum of , and it is no larger than the maximum column-sum.
- Every component of is a positive real. There is no other positive real eigenvector.

Weyl Inequalities.

Eigenvalues after a Hermitian Update.Let be order- Hermitian matrices with eigenvalues , respectively, for , arranged in increasing order. Then.

Eigenvalues after a PSD Update.Let be Hermitian and be positive semidefinite. Thenfor all .

Eigenvalues after a Rank-one Update.Let be Hermitian and $B = A + zz^*$ where is a vector. Let be their eigenvalues, arranged in increasing order. Then.

Eigenvalues after a Rank- Update.In general, if and the eigenvalues of , then.

Additive Bounds.If such that , then.

Eigenvalues by Bordering.Letwhere is Hermitian, an arbitrary complex vector, and an arbitrary real. Let be their eigenvalues arranged in increasing order. Then

.

Eigenvalues of a Submatrix.Let be the order- principal submatrix of an order- Hermitian matrix . Let be their eigenvalues, arranged in increasing order. Thenfor .

Majorizing Eigenvalues.The diagonal elements of a Hermitian matrix majorizes its eigenvalues.

Given a set , let be the minimum of the sum of every -subset of . **majorizes** $B$ if for every with equality for .

Sylvester’s Law of Inertia.If is symmetric and is non-singular, then the matrix has the same number of positive, negative, and zero eigenvalues as those of .

Rayleigh-Ritz Theorem.For an order- Hermitian matrix with eigenvalues and all , letbe the Rayleigh-Ritz ratio. Then

,

, and

.

Courant-Fischer Theorem.For every of unit length, the th largest eigenvalue of an order- Hermitian matrix is,

.

Let be the corresponding eigenvector and let be the Rayleigh-Ritz coefficient for a vector . Intuitively, the first statement says that

- is orthogonal to the span of the previous eigenvectors
- For every disjoint partition where , first we minimize ; let be this value
- Then, is the vector that achieves the largest value in for all dimensional subspace
- Consequently, is smaller than all subsequent eigenvalues and larger than all previous eigenvalues.

Since we can choose the dimensional subspace first, we have the second statement.

End Linear Algebra Edit | Back to top

## Randomized Algorithms

Generating Pairwise Independent Random Variables.Choose a field of size . Sample two uniformly random elements using uniformly random bits. Then the following random variables are uniform over and pairwise independent:See here for details.

Pairwise Independent Hash Functions.Let be a family of functions, each having input bits and output bits. Suppose the following holds: that if we pick at random, the probability that it maps two arbitrary but distinct input strings to the same output string is exactly . Then is a family of pairwise independent hash functions. In other words,.

Equivalently,the probability that two arbitrary strings will collide under a random is the square of the uniform probability on the output space.Intuitively,if we pick a random then for arbitrary in the domain, the strings are independent and uniformly distributed.

Valiant-Vazirani Lemma.Let be a collection of pairwise independent hash functions from . Let such that where . Then, with constant probability,the string has a unique preimagein under a randomly selected .

Leftover Hash Lemma.Suppose an -bit string has entropy at least . The leftover hash lemma says that it is possible to extract bits from that will be almost uniformly distributed. The way to do it is to use the index to choose a member from a pairwise-independent hash family . If a string is randomly chosen from an arbitrary subset , the distribution of the concatenated string will be at most away from the uniform distribution where .

Hitters.Fix a Boolean function on -bit inputs. Let be the set on inputs on which outputs , and suppose . An efficient, randomized algorithm , given only oracle access to , is ahitterfor if with probability at least , it needs to sample at most $m$ points from before it “hits ,” that is, it outputs an element from . We can construct hitters, using random walks on expanders, which takes samples and uses only random bits. In contrast, if we sample uniformly at random, it requires $mn$ random bits. See the appendix of Goldreich’s survey for details.

End Randomized Algorithms Edit | Back to top

**Deviation/Moment Inequalities**

Markov’s Inequality.Any nonnegative with mean satisfies for any positive .

Chebyshev’s Inequality.Any random variable with mean and variance satisfiesfor any positive . Setting , we get

.

Note: When studying the sum of a set of random variables, the Chernoff-Hoeffding inequality requires **mutual independence** of the random variables while for Chebyshev’s inequality, it suffices that they have only **pairwise independence** (so that the covariances are zero). See Nick Harvey’s notes here.

Mean-Median Inequality.The median of a random variable is within one standard deviation of its mean.

See here for a proof using Chebyshev’s inequality.

Paley-Zygmund Inequality.Let be a non-negative random variable with a finite variance. If then.

Note: The Paley-Zygmund inequality complements the Chebyshev inequality. While the latter says that “right tail is thin,” the former says that “the left tail is thin.”

Hoeffding’s Tail Inequality, Additive Deviation, Symmetric.The probability that the sum of independent and identically distributed -bounded random variables deviates at least away from its expectation is no more than .

Chernoff Bound, Multiplicative Deviation, Asymmetric.Let be the sum of many independent Bernoulli trials. Let and let be any positive number between zero and one. The probability that is below the mean is at most . However, The probability that is above the mean is at most .

(Hoeffding-like statement.)Equivalently, the probability that is below the mean is at most . However, The probability that is above the mean is at most .

*Note*. When is large, the normal approximation of the binomial distribution is less accurate for the positive tail. See the appendix of Alon-Spencer for a commentary.

Above are the plots of the Chernoff-tail and the Hoeffding-tail for both 0-1 and random variables. The plotted quantity is the bound for the probability that the sum deviates $t n$ away from the expectation, for and .

Rudelson-Vershynin Theorem on the Operator Norm of Random Projections.Let be identical copies of a random vector with and magnitude at most . Let . For any ,

Note: We can think as a “scaled” projection operator along multiplied by . Then acts as the mean of these projection operators. The theorem says that the largest distortion inflicted by on any vector is no more than with high probability. Notice also the similarity with Hoeffding’s tail inequality, which would give the exponent as for one-dimensional case. We lost a factor of 8 somewhere.

Azuma-Hoeffding Inequality for Martingales.Let be real-valued random variables with the (super)martingale property, i.e., , and the bounded-difference property, i.e., where may depend on . Then, for all and ,.

In particular, if for all ,

.

The one-sided statement, i.e., for non-absolute difference, is as you have hoped. See Motwani-Raghavan Chapter 4, Theorem 4.16 for details.

Khintchine Inequality.Let be uniformly random in and let be arbitrary. For any bounded , let . Let be the th moment of . Define . Then,,

where the constants hidden by the notation depend only on .

Marcinkiewicz-Zygmund Inequality.Let be independent random variables with zero mean and bounded th moment for any . Let be the th moment of . Then,,

where the constants hidden under the notation depend only on .

End Deviation/Moment Inequalities Edit | Back to top

## Complexity Theory

Goldreich-Levin Theorem.Every injective one-way function has a corresponding boolean function which is hard to predict; predicting is effectively inverting , which is unlikely when is an injective one-way function.

Lipton’s Random Self-Reducibility Theorem.A problem is called “random self-reducible” if you can solve on an arbitrary input by evaluating on a sequence of uniformly random inputs. Suppose you have a black-box to solve on a random input with high probability. Then you can construct a randomized algorithm that can solve on every input with high probability.

Yao’s Min-Max Lemma.You and I are playing a two-player game. You are the input player and I am the algorithm player. Your move is to select a distribution over inputs . My move is to select a distribution over algorithms . Both have finite size. Your goal is to make high, while my goal is to make it as low as possible. The best value for you is,

while the best value for me is

.

Yao’s min-max lemma says that the two values are equal.

BPP P/poly.

*Proof:* Suppose there is an algorithm which decides a language in BPP using a random string of length . We can construct a majority-vote algorithm which uses a random string of length $latex poly(n)$ but the probability that it errs on a given input string of length is at most , thanks to the Chernoff bound. For a fixed , the probability that is incorrect on some input is at most . Then by the probabilistic method, there exists a random string so that is correct on every input.

Communication Complexity for Equality.Each part must communicate at least bits to compute a Boolean predicate , where .

SAT is reducible to 3SAT.It suffices to formulate a 3-CNF formula equivalent to a clause with four variables, say . We introduce a dummy variable and write . Clearly, if there is an assignment which makes true, we can choose an assignment to the variable so that both clauses of are true. On the other hand, if an assignment makes false, then at least one clause of will be false regardless of which value we choose for .

Every -variable Boolean Function has a CNF Formula of size .Consider an -variable Boolean function . For every assignment , we construct an “indicator clause” that is false for but true for every other assignment. For example, if , the clause . Then, the desired CNF formula is the AND of all the indicator clauses for all “bad assignments” which makes . This formula outputs zero whenever , and one otherwise. This uses at most clauses, each with OR symbols; these clauses are collected using AND symbols. Thus the total number of symbols is at most .

Impagliazzo’s Hardcore Lemma.If a boolean function is “hard to compute on average” by small circuits, then there exists a set of inputs on which the same function is “extremely hard to compute on average” by slightly smaller circuits.

Yao’s XOR Lemma.Suppose a function is -hard for size , i.e., every circuit of that size fails to compute on at least a fraction of inputs. Define . If , then is -hard for circuits of “smaller” size, i.e., size at most.

See here for a proof of the XOR lemma from the hardcore lemma.

End Complexity Theory Edit | Back to top

## Cryptography

Shamir’s -Threshold Secret Sharing.Suppose you have a secret . If you have a degree- polynomial over any field , you can set and create a set of arbitrary evaluations of this polynomial in . Any of these evaluations can reconstruct via Lagrange interpolation and thus recover . Any smaller subset of evaluations cannot reconstruct , and thus get no information about the secret .

*Note.* See the similarity with the Reed-Solomon code.

Feldman’s Verifiable Secret Sharing.Suppose you want to share a secret .

Initialization: Make public where generates the field .Share generation: Like Shamir’s secrete sharing scheme, choose a (random) polynomial of degree , set , the secret, and distribute shares to each user .

Commitments: Create commitments to each coefficient of , and make them public.

Verification: The user can verify whether

because

Reconstruction: Lagrange interpolation from any shares.

Discrete-Log Setting .Let be a generator of a multiplicative group of prime order . An easy way to do it is to select a prime such that . Then, has a subgroup of order , and every element of is a generator. In particular, we can take to be a random element in .

Discrete-Log Assumption.Given and , it is hard to find .

Decisional Diffie-Hellman (DDH) Assumption.Given , it is hard to distinguish from random.

Computational Diffie-Hellman (CDH) Assumption.Given , it is hard to compute .

Bit Commitment from One-Way Permutations.Suppose you want to commit to a bit . Initialization: Let be a hard-core predicate for a strong one-way permutation . Commit: You sample a random string and send to the receiver. Reveal: You send and .

This scheme is “computationally hiding” because to predict one has to predict the hard-core using , which is “difficult” by definition. This is “perfectly binding” because there can be no such that .

Bit Commitment from One-Way Functions.Suppose you want to commit to a bit .

Initialization: Let be PRG. The receiver sends you a random string .Commit: You sample a random -bit string and a random -bit string . If you send to the receiver; otherwise, you send .

Reveal: You send to the receiver.

Recall that one-way functions imply PRG. The definition of a PRG implies that this scheme is **computationally hiding**. To cheat, you will have to find a such that . The number of pairs is while the size of the PRG’s output space is . Hence the probability that you’ll succeed is at most . Hence this scheme is **perfectly binding**.

Pedersen’s Bit Commitment Scheme.Suppose a sender wants to commit to a bit .

Initialization: Everyone agrees on a Discrete-Log setting . The receiver sends a random .Commit: The sender selects a random . The commitment is .

Reveal: The sender sends to the receiver, and he verifies whether .

The scheme is **information-theoretically hiding**; since is random, is also random regardless of . The scheme is **computatinally binding**: to cheat, one has to find two different opening pairs, say and , such that where and . Suppose you can extract such a pair from the commitments and the common knowledge . Since for some , you can solve the equation , or , or , or . But this contradicts the Discrete-Log assumption, since now you know , the discrete log of with respect to . See here for more schemes. Here’s the original paper.

Chaum-Pedersen Discrete Log Proof-of-Knowledge Scheme.

Setup: Fix a multiplicative group of a prime order . Suppose you know some element and two elements such that and for two independently chosen generators . Publish .Goal: you want to convince someone about your knowledge of without revealing what is.

Interaction: You select a random element from and give the verifier two elements from for . Then the verifier sends you a random element as a “challenge”. You then reply with a “response” .

Verification: The verifier then verifies whether .

Note: The Fiat-Shamir scheme turns a public-coin interactive proof-of-knowledge of the discrete log into a non-interactive proof-of-knowledge where the prover, instead of receiving random coins from the verifier, queries a random oracle to get the random bits himself. However, Goldwasser and Kalai have shown if Fiat-Shamir was used to make digital signatures, substituting the random oracle with any “hash function” results in an insecure digital signature scheme.

RSA Encryption Scheme.

Setup: Select two large primes and compute . Let . Pick an arbitrary element in the multiplicative group . This means, is coprime to and invertible modulo . Let be this inverse. is the public key, is the private key.Encryption: message is encrypted as .

Decryption: since .

Diffie-Hellman Key Exchange.You and I publicly agree on a prime and a generator of the group . I randomly pick a private value and you randomly pick your private value . I send you in public, while you send me in public. I construct while you construct . This is our shared secret assuming the discrete log problem is hard.

ElGamal Encryption Scheme.First, you and I publicly agree on a prime and a generator of the group . Then we construct a shared secret via the Diffie-Hellman key exchange protocol. You encrypt a message as . Since is cyclic, I can compute the inverse of , call it . Finally, I can decrypt your cyphertext as .

ElGamal Digital Signature Scheme.Initialization: Select a large prime and a generator of the group . Select a random secret element and set . Let be a cryptographically secure hash function.

Signning: To sign a message , randomly select some such that is invertible modulo . Let . Compute an such that . If , start over. The tuple is the signature of .

Verification: Check that and . For a correct signature, the LHS is .

Schnorr Digital Signature Scheme.

Setup: Everyone agrees on the Discrete-Log setting . The signer’s secret signing key is some , and his public verification key is . Everyone has access to a hash function .Signing: To sign the message , the signer chooses a random , computes , computes the hash where denotes concatenation. The pair is the signature for where, recall, that is the secret signing key.

Verification: Given and a purported signature , one computes ; and verifies that .

Hard-Core Predicate.An easy-to-compute boolean predicate is a hard-core for a function if no efficient algorithm can predict with non-negligible success even if they know .

Weak One-Way Functions Imply Storng Ones.If is a weak one-way function, then is a strong one-way function.

Every Strong One-Way Function has a Hard-Core Predicate.The Boolean inner product is a hard-core predicate of the function where is a strong one-way function and is arbitrary.

Intuitively, you have to invert to predict the inner product.

Oblivious Transfer.The context is a 1-out-of-N multiplexer. The sender holds values . The receiver holds an index . The receiver wants to know without the sender knowing . The sender wants that the receiver knows nothing other than .

1-out-of-2 Oblivious Transfer via Public Key Encryption.This is a variant of Even-Goldreich-Lempel. The sender chooses a random . The receiver chooses a valid key-pair and an “orphan” public key . The idea is that the sender, upon receiving , makes sure that the “other” public key, , is forced to a random value chosen by the sender. Next, the sender encrypts two values: for , he sends to the receiver. Then the receiver can find .

This gives computational security against the sender (breaking the encryption scheme) and the receiver as well (recovering the secret key from a given public key).

1-out-of-2 Oblivious Transfer via DDH Assumption.The sender and receiver agree on two generators . The receiver chooses a random exponent . He then sends the following to the sender:.

The sender samples four random exponents . Then he creates two “encryptions” where

and

.

The receiver, upon receiving , can recover

.

Note that if , then .

On the other hand, if , then .

This gives information-theoretic security against the receiver since all structure is lost in a . On the other hand, this gives computational security (i.e., the DDH assumption) against the sender since the sender can break if he can compute the discrete log from .

End Cryptography Edit | Back to top

## Distributed Computing

Byzantine Agreement.To deterministically achieve Byzantine agreement (consensus) among processors in a fully connected network in which processors may be faulty, where satisfies , exactly fcommunication rounds are required [Pease, Shostak, and Lamport]. This result was proved under the assumption that the network topology is knowna priorito each processor, i.e., the identifiers of the processors and of the links between processors are known.Micali’s BA protocol works on a completely

asynchronousnetwork when at least of the users are honest, and halts after 9 rounds in expectation. It is player-replaceable in the sense that players can be removed/corrupted immediately after sending a message.The Micali-Vaikuntnathan protocol works on a

synchronousnetwork, and allows the honest fraction to be as low as . With probability , the number of rounds is at most

The Impossibility of Byzantine Agreement.It is impossible to attain even a Boolean Byzantine Agreement among deterministic players in an asynchronous network if only one player is faulty [Fischer-Lynch-Paterson].

End Distributed Computing Edit | Back to top

## Optimization

**Newton’s Method.** Suppose you want to find a root of a locally-convex function in the vicinity of the initial point . The root will be approached in a sequence of points . Suppose you already know . You make a linear approximation of at as follows: . Because you want to find a root of , you set and solve for

.

Because you are doing a linear approximation, the error term decreases quadratically as you approach the true root . In particular, Taylor expansion of around tells us that the approximation error is

End Optimizaiton Edit | Back to top

## Series / Identities

Taylor Seriesof a real function around the point .

If , then

By definition,

.

**Fractional Powers of a Fraction.**

End Series Edit | Back to top

## Inequalities

when

The exponential function is steep above zero and flat below zero.for .

Concave functions are subadditive.Suppose is concave and . Then,

.

Cauchy-Schwartz Inequality.The inner product is at most the product of individual -lengths.

Holder’s Inequality.Let for real . Let . Then.

vs. Norm Inequality.Let .using in Cauchy-Schwartz inequality.

using in Holder’s inequality.

AM-GM Inequality.Let denote a convex combination i.e., . Then.

Equality occurs if .

Power Mean Inequality.For a vector and positive real , define . For positive real ,.

In other words, is “not too short:” for any ,

.

A Logarithmic Inequality.Let . Then.

Proof: Let be a convex function satisfying . Let be two points in the domain so that for some . By Jensen, we have since by assumption.

Now consider the function which is convex in . For this function, i.e., . This means for any , we are free to choose any so that . It follows that , or . Writing yields the above inequality.

End Inequalities Edit | Back to top

## Quantum Computing

State Vectors.qubits can specify indices, one for each coordinate of an -dimensional state vector in . An orthonormal basis of an -qubit system is the so-called “computational basis” , where is simply the standard basis vector .

Pauli Matrices/Gates.Let where.

For each , it holds that

.

and .

Their commutators have a nice structure:

.

The eigenvalues of these matrices are . The corresponding eigenvectors are:

.

.

.

Hadamard Gate..

.

Specifically, . Notice that

where we use the notation

where for .

Let , and . An -qubit Hadamard gate is

,

where denotes the Boolean inner product of two strings. It can also be written as the tensor product

.

Other Quantum Gates.

- Rotate by an angle of : .
- Phase gate i.e., Rotate by : .
- Axis-specific rotations: for and ,

Measurements.A measurement apparatus satisfies the completeness relation, i.e., it is a decomposition of the identity . The probability of observing the outcome after measuring is.

The state of the system after the measurement is .

Projective Measurements.The components of a projective measurement correspond to the spectral decomposition in that each is a projection operator for the eigenspace :.

This implies, the probability of observing a vector after measuring a quantum state is the squared modulus of the (complex) inner product of these two vectors:

.

The

expectationandvarianceof a projective measurement has a friendly form:.

,

where is the “standard deviation.”

The Heisenberg Uncertainty Principle.Let be two projective measurements. Then, with respect to any quantum state , the product of the standard deviations (across a large number of trials) is “large”:.

Entangled State.A quantum state in a -bit system is entangled if it cannot be written using a single -bit index. For example, suppose . The -bit integers index the basis states. We need two “independent” basis indices (namely 1 and 4) to specify the (unnormalized) state ; hence it is an entangled state.

The Bell Basis.Each of the following states is entangled. They form an orthonormal basis of .Put in another way, let denote the four Bell states denoted by the bits . In particular, corresponds to , and corresponds to . Then,

Bloch Sphere Representation.A single qubit can be represented by the tuple , as follows: can be identified with a point on the 2-sphere in 3-dimensions, with the vector at the North pole. Then is the angle of the vector $late v$ with the -axis, and is the angle between the -axis and the projection of on the -plane. TheBloch vectorfor is.

This is equivalent to setting .

Quantum Fourier Transform.Let , the imaginary unit, and . Let be the th power of the th root of unity. The QFT is a linear operator that takes.

The right hand side equals

.

End Quantum Computing Edit | Back to top

## Discrete Geometry

Radon’s Theorem.Any set of points in can be partitioned into red-blue points so that the respective convex hulls intersect.

Point-Line Incidences.Let be any field. Consider a set of points in the plane . The number of point-line incidences is .

End Discrete Geometry Edit | Back to top

## Analysis

Gamma function.The integral,

converges absolutely if . It is the unique function that satisfies . In particular, . Euler proved that

.

Weierstrass showed that

,

where is the Euler-Mascheroni constant. Euler’s reflection formula says that

if .

The duplication formula says that

.

Moreover, . The integral derivatives are

.

Fractional Differentiation of a monomial.Let for . Then for any real ,.

This agrees with the usual integral differentiation if .

Cauchy’s Repeated Integration. Let be a Lebesgue-integrable function defined in . Define the integration operator.

A repeated application of this operator can be expressed as

.

Riemann-Liouville Fractional Integral.The natural extension of Cauchy’s repeated integration’s order from an integer to a positive real is the Riemann-Liouville fractional integral:where is a locally-integrable function and .

It can be showed that for ,

,

where denotes the Fourier transform.

Riemann-Liouville Fractional Derivative.To differentiate to the order , we cannot directly use as the order in Riemann-Liouville integral. Instead, integrate to the order and then differentiate to the integral order ..

Note that the value of the derivative depends on all values of for , whereas the value of the classical derivatives are “local” in that they depend only on .

It can be showed that for ,

,

where denotes the Fourier transform.

Fourier Transform.Let . Then, for any function , its Fourier transform is a function defined as.

The inverse transform is defined as

.

The transform of a convolution is the product of the transforms:

.

The transform of a derivative is a scaling of the transform:

.

The

Plancherel theoremsays that the Fourier transform (of a square-integrable function) preserves the squared- norm of the function:.

This is generalized by

Parceval’s theoremfor two square-integrable functions.

Laplace Transform.Very similar to the Fourier transform, except the coefficients are complex and the integration runs only on the non-negative reals. For functions ,.

The inverse transform is given by Fourier-Mellin’s inverse formula aka the Bromwich line integral:

.

Laplace transform is linear. Moreover,

.

.

.

.

.

,

where is the integration operator denoting and is the Laplace transform.

If is a random variable with density $f$, then for any complex , .

Tonelli’s Theorem.If a two-variable function is non-negative, then we can change the order of integration in a double integral, i.e., if then.

Fubini’s Theoremhas the same conclusion as Tonelli’s theorem, but it stipulates that has a finite integral over .

Differentiation under the Integral.Also called “the Leibniz rule.”.

If are constants, the first two terms are zero.

Polylogarithm Function.For complex order and argument ,.

It can be expressed as a repeated integral of itself:

.

Moreover,

,

where and the notation denotes the Eulerian numbers. This expression is strikingly similar to the th moment of a geometrically distributed random variable with success parameter for some . Then

.

The Hypergeometric Series.For a complex and positive integers ,,

where is the rising Pochhammer symbol. If either or is non-positive, the series reduces to the polynomial

,

Laguerre Polynomials..

They satisfy the differential equation

.

They are orthogonal with respect to the inner product

=\delta_m(n).

Brouwer Fixed-Point Theorem.Let be a compact convex space, and let is continuous. Then there exists some with .

**Example 1:** If is a real interval, then every continuous curve must cross the diagonal straight line at some point.

**Example 2:** Consider the -dimensional space of probability distributions on the set , and let us call this space . This space is convex, closed, and bounded (hence compact). A right-stochastic real matrix (whose rows sum to one) has a trivial left-eigenvector with eigenvalue 1. Moreover, by the Gershgorin disk theorem, it is the largest possible eigenvalue (in absolute value). Recall that if is a left-eigenvalue, it is also a right-eigenvalue. The matrix , via matrix-multiplication from the left, acts as a continuous function on . Hence, must have a left-eigenvector, too. See here for details.

Prime Number Facts.

End Analysis Edit | Back to top

## Interesting Results

The union of two random spanning trees of any graph is an expander. Moreover, it is a linear-size cut sparsifier for random graphs. Goyal-Vempala 2004

It suffices to sample only elements from any distribution to test its entropy and support size. See here.

Let be an expander. Let contain an fraction of all edges. Consider a random walk that starts at a random edge from . The probability that it uses an edge from at the -th step is at most . See here.

For certain self-reducible problems, approximate counting is equivalent to almost uniform sampling. See here.

End Interesting Results Edit | Back to top