This page is always under construction.
(For me only)
Chinese Remainder Theorem. Given a (pairwise co)prime factorization of an integer , there is a bijection between and the Cartesian product where is the set of units modulo i.e., the set of integers that are coprime to . These are also the invertible elements modulo .
More. This bijection takes an element to the tuple . Moreover, given a tuple , the corresponding element in is where and is the multiplicative inverse of modulo .
Fermat’s Integer Factorization of . Try to find two integers such that and . Then must divide the product . Consequently, the gcd of and at least one of these factors must be non-trivial. This gcd divides .
Finite Abelian Groups. Every finite Abelian group can be writen as a direct product
where and are powers of (possibly repeating) primes and divides , divides and so on.
Cyclic Groups. These are abelian groups which are generated by a single element. is cyclic. For every positive divisor of , the quotient group has precisely one subgroup of order , the one generated by the residue class of . The number of generators of equals the number of elements coprime to . This number is given by Euler’s totient function . The order of the element in is . The direct product of two cyclic groups and is cyclic if and only if and are coprime.
Every finite abelian group can be expressed as “the” direct sum of cyclic subgroups of prime-power order, and these orders are uniquely determined.
Pontryagin Dual. Let be an additive finite Abelian group and let be the unit torus with multipliciation as the group operation. Let be a group homomorphism defined as . Then, the set forms an Abelian group with the group operation being “point-wise multiplication”
and the identity operation being the constant function
is called the Pontryagin dual of , and the character is called the trivial character.
Reed-Muller Code. The order- Reed-Muller code is the set of Boolean polynomials on variables and degree at most .
Reed-Solomon Code. In the Reed-Solomon code , a degree- polynomial over — the message — is encoded into a degree- polynomial over — the codeword — where the coefficients of the codeword are obtained by evaluating on a fixed set .
Since can have at most zeros at any field, at most of the s can be zero. This means the minimum distance is at least which matches the singleton bound. By definition, the Reed-Solomon code is MDS.
Singleton Bound. For a linear code, the distance is at most . Any code matching this bound is called MDS (maximum distance separable).
Plotkin Bound. If the distance of a code is strictly larger than half the block length , that is, if , the number of codewords is at most .
Johnson Bound. Consider a code whose relative distance is “large” i.e., at least . Then, for every codeword and every “large” , only a “small” number of strings are “close” to a codeword. In particular, for any codeword , the number of strings (in the target space) that are within a relative distance from x is at most .
Gilbert-Vershamov Bound. For every and all sufficiently large , there exists a code with relative distance on bit strings with block length . Here, is the binary entropy function.
Hadamard Matrix. Let . Then
is the Hadamard matrix of order .
- The rows/columns of are indexed by -bit Boolean strings. Let denote the inner-product modulo . Then
- The map is a linear code.
- The volume enclosed by the (unit-length) columns of (i.e., the determinant) is the largest possible for any real matrix of the same order.
Discrete Fourier Transform. See here.
Probability and Combinatorics
Expectation. Suppose a random variable takes its value from the domain .
Additionally, if ,
Variance. In general,
If are independent, then
Tower Law of Conditional Expectation.
In particular, if ,
The Birthday Paradox. Suppose . When you randomly draw times from an -set with replacement, the resulting -subset will contain a repeated element with constant probability.
Explanation. Suppose we choose independent samples from the numbers . For any , the probability that is . The expected number of collisions in is at most . This is a constant whenever .
Coupon Collector’s Problem. Suppose you want to color all items in a set of size as follows: at round , you pick a random element from ; if it is still uncolored, you color it red, put it back, and repeat. This process will require rounds to color all items.
Balls in Bins. If balls are thrown into bins independently and uniformly at random, the “heaviest” bin contains at most balls with high probability. However, if the balls are placed sequentially—each into the “lightest” among bins sampled uniformly at random—then the largest load is at most . Thus even if , the bins are highly balanced in the sequential case.
Poisson Distribution. Let be a discrete random variable. If it has the Poisson distribution , its probability mass function at point is the probability that is given . That is,
The expectation and variance of is . The expected deviation of from its mean is . A binomial distribution tends to a Poisson distribution when .
Note: The events and must be independent.
Hypergeometric Distribution. This is the binomial distribution without replacement. In particular, let be the probability that exactly “good” items are obtained in sample-without-replacement trials from a population of items containing exactly “good” items in total.
The mean of this distribution is, surprisingly, , which equals . The variance is
Probability Generating Functions. Let be a formal power series with . If then “generates” the probability distribution . Suppose generates the distribution . It is easy to see that the convolution generates the probability distribution whose mass function is
Composition of Generating Functions. Let and denote the number of ways to build two structures, respectively of types 1 and 2, on a -element set where and . Let be the number of ways to divide the set into an unspecified number of intervals, build a type-1 structure on each interval, then build a type-2 structure on the set of intervals. Set . Denote by the generating functions of the sequences . Then
Stochastic Dominance (First Order). Consider two random variables . stochastically dominates if “always” has a “fatter right tail” than that of . To be precise, for all in the (totally-ordered) domain of the variables, with strict inequality for some . having a fatter right tail is equivalent to having a thinner left tail.
The Gambler’s Ruin Problem. See here.
The Unbiased Random Walk. See here.
Sum of first binomial coefficients. Suppose . Then,
This bound is due to Michael Lugo. (Aside: This implies, if , the sum is at most .)
The above bound is within a factor of as long as and . When , this multiplicative factor increases.
Shearer’s Inequality. Let be coordinate-sets such that each coordinate is covered by at least different s. Let be a collection of independent random variables. The entropy of is at most times the sum of the entropies of its restriction to each coordinate set. That is,
Total Variation Distance of two distributions is their largest difference over all subsets of the domain . That is,
See here for a proof.
Pinsker’s Inequality. The Kullback-Leibler divergence of two distributions is at least twice the square of their total-variaion distance. Namely,
Kullback-Leibler Divergence. Given a probability distribution on a discrete sample space , let denote the “amount of surprise” associated with the event according to the distribution . The Entropy is the expected amount of surprise, namely
The KL Divergence of with respect to is
- the expected amount of surprise “remaining” in after revealing .
- the entropy of relative to .
- the expectation (in ) of the pointwise-difference of the log-probabilities.
- A Markov chain is aperiodic if it has self-loops. It is connected if every state is reachable. It is ergodic if it has a unique stationary distribution , which happens if and only if the chain is aperiodic and connected. Let be the transition probability matrix of the chain. The chain is reversible if for all . A random walk is lazy if it does not move, with probability . A lazy, connected random walk is ergodic.
- The mixing time of a random walk with an initial distribution is the number of steps it takes so that the distribution after steps is very close to its stationary distribution.
- An ergodic chain mixes rapidly: In particular, if , then
- The mixing time in the above case is . For a -regular graph, this simplifies to .
Sigma Field. A -field consists of a sample space and a collection of subsets satisfying the following conditions.
The last condition, coupled with the second condition, implies the closure under countable union and intersection.
Doob Martingale. For any bounded function and a sequence of random variables , define , , and
Then, the sequence is a martingale with respect to the sequence , i.e.,
for every .
Proof: By the tower property of conditional expectation,
Independent Set and Vertex Cover. A largest independent set is the complement to a minimum vertex cover. The vertex cover problem is the dual of the maximum matching problem. In a bipartite graph, the size of the maximum matching equals the size of the minimum vertex cover (Konig’s theorem).
Crossing Number. is the minimum number of edge crossing in a plane drawing of a graph . If , then . This is the crossing number inequality.
Szemeredi-Trotter Theorem. Suppose we have points and lines in a two-dimensional plane. Consider any bipartite graph where an edge implies that the point is on line . Then
Algebraic Graph Theory
Expander Mixing Lemma. Let be a -regular expander graph on vertices and edges. Suppose the second-largest eigenvalue of its adjacency matrix is at most . For any pair of vertex-sets , let be the set of “cross edges” between and . Let denote the cardinality of the set . Then, for all ,
In the left-hand side, the first term is the number of cross edges. The second term is the expected number of cross edges in a random graph with edge-probability . The left-hand side thus measures how much deviates from this expectation. The expander mixing lemma says that if is small, will be nearly random in this sense.
Random Walk on an Expander. Suppose is a -regular expander whose second-largest eigenvalue is at most in magnitude. Let be a set containing a fraction of vertices. The fraction of random walks of length , on the vertices of , that stay entirely within , is small; in particular, it is at most
Cheeger Inequality. Suppose is an -expander i.e., a -regular graph on vertices with the second-largest eigenvalue of its adjacency matrix being . Since the largest eigenvalue is , the quantity is the spectral gap. The boundary of a vertex set is denoted by . The edge expansion of is defined as
A complex matrix is unitary if . A real, unitary matrix is orthogonal. is Hermitian if . It is normal if . A Hermitian matrix is Positive Semidefinite if all its eigenvalues are non-negative.
Cramer’s Rule. Suppose we want to solve a matrix-vector equation . Let denotes the matrix obtained by replacing the th column of the matrix by the . Then,
Spectral Decomposition. A matrix is Hermitian i.e., , if and only if has an orthonormal set of eigenvectors with real eigenvalues i.e., where are an eigenvector-eigenvalue pair.
Normal Matrices. A matrix is normal if it commutes with its conjugate transpose, i.e., . This happens if and only if is unitarily diagonalizable, i.e., there exists a diagonal and unitary such that .
A block upper-triangular matrix is normal if and only if all on-diagonal blocks are normal and all off-diagonal blocks are zero.
Polar Decomposition. Every matrix can be written as where is unitary, are positive semidefinite, and . Moreover, if is invertible then is unique.
Singular Value Decomposition. Every square matrix can be written as where are unitary and has non-negative real diagonal entries.
Proof: By polar decomposition, where is unitary and is positive semidefinite. By spectral theorem, for some unitary and real diagonal . Moreover, since is PSD, its eigenvalues are non-negative; hence has non-negative real entries. We set .
Cholesky Factorization. Every positive semidefinite complex matrix can be factored into where is lower-trialgular with real, positive diagonal entries.
Schur Decomposition. Any complex square matrix can be written as where is unitary and is upper triangular.
Square Root. Every positive semidefinite matrix (with bounded largest eigenvalue) can be factored into where is also positive semidefinite.
Hilbert-Schmidt Norm and Inner Product.
Let denote the -th column of the matrix . Then,
Trace of a matrix is the sum of its eigenvalues.
Projection Matrices. Any projection matrix can be written as where is a basis of the target subspace. has eigenvalues . is always Hermitian i.e. , as evidenced by the sum formulation. satisfies .
Frobenius Norm. A generalization of the usual norm for matrices.
This norm, like the norm, is invariant under rotation.
Tensor Inner Product. The tensor inner product on the space is defined as follows:
where are scalars and , and .
Conjugate Gradient Method (for Solving Symmetric Linear Systems). Suppose we would like to solve for where is a real symmetric Positive Semidefinite matrix. The matrix defines the inner product . We can find a set of basis vectors spanning that are mutually orthogonal according to this inner product. Then, the solution can be written as a linear combination of these basis vectors.
In practice, the vectors and corresponding can be found using a process similar to the Gram-Schmidt orthonormalization. Any subset of these basis vectors gives an approximation of .
Facts about the Matrix Exponential. If commute, then . In addition, if commute then commute, as do .
Moreover, for any invertible , . Next, . If then .
Golden-Thompson inequality. For any square matrices ,
Lieb’s Theorem. Let be Hermitian and positive definite. The function
Corollary (Tropp). Let be fixed, be random, both Hermitian. Then
See Joel Tropp’s paper (Section 3.4).
Generating Pairwise Independent Random Variables. Choose a field of size . Sample two uniformly random elements using uniformly random bits. Then the following random variables are uniform over and pairwise independent:
See here for details.
Pairwise Independent Hash Functions. Let be a family of functions, each having input bits and output bits. Suppose the following holds: that if we pick at random, the probability that it maps two arbitrary but distinct input strings to the same output string is exactly . Then is a family of pairwise independent hash functions. In other words,
Equivalently, the probability that two arbitrary strings will collide under a random is the square of the uniform probability on the output space. Intuitively, if we pick a random then for arbitrary in the domain, the strings are independent and uniformly distributed.
Valiant-Vazirani Lemma. Let be a collection of pairwise independent hash functions from . Let such that where . Then, with constant probability, the string has a unique preimage in under a randomly selected .
Leftover Hash Lemma. Suppose an -bit string has entropy at least . The leftover hash lemma says that it is possible to extract bits from that will be almost uniformly distributed. The way to do it is to use the index to choose a member from a pairwise-independent hash family . If a string is randomly chosen from an arbitrary subset , the distribution of the concatenated string will be at most away from the uniform distribution where .
Hitters. Fix a Boolean function on -bit inputs. Let be the set on inputs on which outputs , and suppose . An efficient, randomized algorithm , given only oracle access to , is a hitter for if with probability at least , it needs to sample at most $m$ points from before it “hits ,” that is, it outputs an element from . We can construct hitters, using random walks on expanders, which takes samples and uses only random bits. In contrast, if we sample uniformly at random, it requires $mn$ random bits. See the appendix of Goldreich’s survey for details.
Markov’s Inequality. Any nonnegative with mean satisfies for any positive .
Chebyshev’s Inequality. Any random variable with mean and variance satisfies
for any positive . Setting , we get
Note: When studying the sum of a set of random variables, the Chernoff-Hoeffding inequality requires mutual independence of the random variables while for Chebyshev’s inequality, it suffices that they have only pairwise independence (so that the covariances are zero). See Nick Harvey’s notes here.
Mean-Median Inequality. The median of a random variable is within one standard deviation of its mean.
See here for a proof using Chebyshev’s inequality.
Paley-Zygmund Inequality. Let be a non-negative random variable with a finite variance. If then
Note: The Paley-Zygmund inequality complements the Chebyshev inequality. While the latter says that “right tail is thin,” the former says that “the left tail is thin.”
Hoeffding’s Tail Inequality, Additive Deviation, Symmetric. The probability that the sum of independent and identically distributed -bounded random variables deviates at least away from its expectation is no more than .
Chernoff Bound, Multiplicative Deviation, Asymmetric. Let be the sum of many independent Bernoulli trials. Let and let be any positive number between zero and one. The probability that is below the mean is at most . However, The probability that is above the mean is at most .
(Hoeffding-like statement.) Equivalently, the probability that is below the mean is at most . However, The probability that is above the mean is at most .
Note. When is large, the normal approximation of the binomial distribution is less accurate for the positive tail. See the appendix of Alon-Spencer for a commentary.
Above are the plots of the Chernoff-tail and the Hoeffding-tail for both 0-1 and random variables. The plotted quantity is the bound for the probability that the sum deviates $t n$ away from the expectation, for and .
Rudelson-Vershynin Theorem on the Operator Norm of Random Projections. Let be identical copies of a random vector with and magnitude at most . Let . For any ,
Note: We can think as a “scaled” projection operator along multiplied by . Then acts as the mean of these projection operators. The theorem says that the largest distortion inflicted by on any vector is no more than with high probability. Notice also the similarity with Hoeffding’s tail inequality, which would give the exponent as for one-dimensional case. We lost a factor of 8 somewhere.
Azuma-Hoeffding Inequality for Martingales. Let be real-valued random variables with the (super)martingale property, i.e., , and the bounded-difference property, i.e., where may depend on . Then, for all and ,
In particular, if for all ,
The one-sided statement, i.e., for non-absolute difference, is as you have hoped. See Motwani-Raghavan Chapter 4, Theorem 4.16 for details.
Khintchine Inequality. Let be uniformly random in and let be arbitrary. For any bounded , let . Let be the th moment of . Define . Then,
where the constants hidden by the notation depend only on .
Marcinkiewicz-Zygmund Inequality. Let be independent random variables with zero mean and bounded th moment for any . Let be the th moment of . Then,
where the constants hidden under the notation depend only on .
Goldreich-Levin Theorem. Every injective one-way function has a corresponding boolean function which is hard to predict; predicting is effectively inverting , which is unlikely when is an injective one-way function.
Lipton’s Random Self-Reducibility Theorem. A problem is called “random self-reducible” if you can solve on an arbitrary input by evaluating on a sequence of uniformly random inputs. Suppose you have a black-box to solve on a random input with high probability. Then you can construct a randomized algorithm that can solve on every input with high probability.
Yao’s Min-Max Lemma. You and I are playing a two-player game. You are the input player and I am the algorithm player. Your move is to select a distribution over inputs . My move is to select a distribution over algorithms . Both have finite size. Your goal is to make high, while my goal is to make it as low as possible. The best value for you is
while the best value for me is
Yao’s min-max lemma says that the two values are equal.
Proof: Suppose there is an algorithm which decides a language in BPP using a random string of length . We can construct a majority-vote algorithm which uses a random string of length $latex poly(n)$ but the probability that it errs on a given input string of length is at most , thanks to the Chernoff bound. For a fixed , the probability that is incorrect on some input is at most . Then by the probabilistic method, there exists a random string so that is correct on every input.
Communication Complexity for Equality. Each part must communicate at least bits to compute a Boolean predicate , where .
SAT is reducible to 3SAT. It suffices to formulate a 3-CNF formula equivalent to a clause with four variables, say . We introduce a dummy variable and write . Clearly, if there is an assignment which makes true, we can choose an assignment to the variable so that both clauses of are true. On the other hand, if an assignment makes false, then at least one clause of will be false regardless of which value we choose for .
Every -variable Boolean Function has a CNF Formula of size . Consider an -variable Boolean function . For every assignment , we construct an “indicator clause” that is false for but true for every other assignment. For example, if , the clause . Then, the desired CNF formula is the AND of all the indicator clauses for all “bad assignments” which makes . This formula outputs zero whenever , and one otherwise. This uses at most clauses, each with OR symbols; these clauses are collected using AND symbols. Thus the total number of symbols is at most .
Shamir’s -Threshold Secret Sharing. Suppose you have a secret . If you have a degree- polynomial over any field , you can set and create a set of arbitrary evaluations of this polynomial in . Any of these evaluations can reconstruct via Lagrange interpolation and thus recover . Any smaller subset of evaluations cannot reconstruct , and thus get no information about the secret .
Note. See the similarity with the Reed-Solomon code.
Feldman’s Verifiable Secret Sharing. Suppose you want to share a secret .
Initialization: Make public where generates the field .
Share generation: Like Shamir’s secrete sharing scheme, choose a (random) polynomial of degree , set , the secret, and distribute shares to each user .
Commitments: Create commitments to each coefficient of , and make them public.
Verification: The user can verify whether
Reconstruction: Lagrange interpolation from any shares.
Discrete-Log Setting . Let be a generator of a multiplicative group of prime order . An easy way to do it is to select a prime such that . Then, has a subgroup of order , and every element of is a generator. In particular, we can take to be a random element in .
Discrete-Log Assumption. Given and , it is hard to find .
Decisional Diffie-Hellman (DDH) Assumption. Given , it is hard to distinguish from random.
Computational Diffie-Hellman (CDH) Assumption. Given , it is hard to compute .
Bit Commitment from One-Way Permutations. Suppose you want to commit to a bit . Initialization: Let be a hard-core predicate for a strong one-way permutation . Commit: You sample a random string and send to the receiver. Reveal: You send and .
This scheme is “computationally hiding” because to predict one has to predict the hard-core using , which is “difficult” by definition. This is “perfectly binding” because there can be no such that .
Bit Commitment from One-Way Functions. Suppose you want to commit to a bit .
Initialization: Let be PRG. The receiver sends you a random string .
Commit: You sample a random -bit string and a random -bit string . If you send to the receiver; otherwise, you send .
Reveal: You send to the receiver.
Recall that one-way functions imply PRG. The definition of a PRG implies that this scheme is computationally hiding. To cheat, you will have to find a such that . The number of pairs is while the size of the PRG’s output space is . Hence the probability that you’ll succeed is at most . Hence this scheme is perfectly binding.
Pedersen’s Bit Commitment Scheme. Suppose a sender wants to commit to a bit .
Initialization: Everyone agrees on a Discrete-Log setting . The receiver sends a random .
Commit: The sender selects a random . The commitment is .
Reveal: The sender sends to the receiver, and he verifies whether .
The scheme is information-theoretically hiding; since is random, is also random regardless of . The scheme is computatinally binding: to cheat, one has to find two different opening pairs, say and , such that where and . Suppose you can extract such a pair from the commitments and the common knowledge . Since for some , you can solve the equation , or , or , or . But this contradicts the Discrete-Log assumption, since now you know , the discrete log of with respect to . See here for more schemes. Here’s the original paper.
Chaum-Pedersen Discrete Log Proof-of-Knowledge Scheme.
Setup: Fix a multiplicative group of a prime order . Suppose you know some element and two elements such that and for two independently chosen generators . Publish .
Goal: you want to convince someone about your knowledge of without revealing what is.
Interaction: You select a random element from and give the verifier two elements from for . Then the verifier sends you a random element as a “challenge”. You then reply with a “response” .
Verification: The verifier then verifies whether .
Note: The Fiat-Shamir scheme turns a public-coin interactive proof-of-knowledge of the discrete log into a non-interactive proof-of-knowledge where the prover, instead of receiving random coins from the verifier, queries a random oracle to get the random bits himself. However, Goldwasser and Kalai have shown if Fiat-Shamir was used to make digital signatures, substituting the random oracle with any “hash function” results in an insecure digital signature scheme.
RSA Encryption Scheme.
Setup: Select two large primes and compute . Let . Pick an arbitrary element in the multiplicative group . This means, is coprime to and invertible modulo . Let be this inverse. is the public key, is the private key.
Encryption: message is encrypted as .
Decryption: since .
Diffie-Hellman Key Exchange. You and I publicly agree on a prime and a generator of the group . I randomly pick a private value and you randomly pick your private value . I send you in public, while you send me in public. I construct while you construct . This is our shared secret assuming the discrete log problem is hard.
ElGamal Encryption Scheme. First, you and I publicly agree on a prime and a generator of the group . Then we construct a shared secret via the Diffie-Hellman key exchange protocol. You encrypt a message as . Since is cyclic, I can compute the inverse of , call it . Finally, I can decrypt your cyphertext as .
ElGamal Digital Signature Scheme.
Initialization: Select a large prime and a generator of the group . Select a random secret element and set . Let be a cryptographically secure hash function.
Signning: To sign a message , randomly select some such that is invertible modulo . Let . Compute an such that . If , start over. The tuple is the signature of .
Verification: Check that and . For a correct signature, the LHS is .
Schnorr Digital Signature Scheme.
Setup: Everyone agrees on the Discrete-Log setting . The signer’s secret signing key is some , and his public verification key is . Everyone has access to a hash function .
Signing: To sign the message , the signer chooses a random , computes , computes the hash where denotes concatenation. The pair is the signature for where, recall, that is the secret signing key.
Verification: Given and a purported signature , one computes ; and verifies that .
Hard-Core Predicate. An easy-to-compute boolean predicate is a hard-core for a function if no efficient algorithm can predict with non-negligible success even if they know .
Weak One-Way Functions Imply Storng Ones. If is a weak one-way function, then is a strong one-way function.
Every Strong One-Way Function has a Hard-Core Predicate. The Boolean inner product is a hard-core predicate of the function where is a strong one-way function and is arbitrary.
Intuitively, you have to invert to predict the inner product.
Oblivious Transfer. The context is a 1-out-of-N multiplexer. The sender holds values . The receiver holds an index . The receiver wants to know without the sender knowing . The sender wants that the receiver knows nothing other than .
1-out-of-2 Oblivious Transfer via Public Key Encryption. This is a variant of Even-Goldreich-Lempel. The sender chooses a random . The receiver chooses a valid key-pair and an “orphan” public key . The idea is that the sender, upon receiving , makes sure that the “other” public key, , is forced to a random value chosen by the sender. Next, the sender encrypts two values: for , he sends to the receiver. Then the receiver can find .
This gives computational security against the sender (breaking the encryption scheme) and the receiver as well (recovering the secret key from a given public key).
1-out-of-2 Oblivious Transfer via DDH Assumption. The sender and receiver agree on two generators . The receiver chooses a random exponent . He then sends the following to the sender:
The sender samples four random exponents . Then he creates two “encryptions” where
The receiver, upon receiving , can recover
Note that if , then .
On the other hand, if , then .
This gives information-theoretic security against the receiver since all structure is lost in a . On the other hand, this gives computational security (i.e., the DDH assumption) against the sender since the sender can break if he can compute the discrete log from .
Byzantine Agreement.Micali’s BA protocol works on a completely asynchronous network when at least of the users are honest, and halts after 9 rounds in expectation. It is player-replaceable in the sense that players can be removed/corrupted immediately after sending a message.
The Micali-Vaikuntnathan protocol works on a synchronous network, and allows the honest fraction to be as low as . With probability , the number of rounds is at most
The Impossibility of Byzantine Agreement. It is impossible to attain Byzantine Agreement among deterministic players in an asynchronous network.
Newton’s Method. Suppose you want to find a root of a locally-convex function in the vicinity of the initial point . The root will be approached in a sequence of points . Suppose you already know . You make a linear approximation of at as follows: . Because you want to find a root of , you set and solve for
Because you are doing a linear approximation, the error term decreases quadratically as you approach the true root . In particular, Taylor expansion of around tells us that the approximation error is
Taylor Series of a real function around the point .
The exponential function is steep above zero and flat below zero.
Concave functions are subadditive. Suppose is concave and . Then
Cauchy-Schwartz Inequality. The inner product is at most the product of individual lengths.
AM-GM Inequality. Let denote a convex combination i.e., . Then
Equality occurs if .
Power Mean Inequality. For a vector and positive real , define . For positive real ,
In other words, is “not too short:” for any ,
A Logarithmic Inequality. Let . Then
Proof: Let be a convex function satisfying . Let be two points in the domain so that for some . By Jensen, we have since by assumption.
Now consider the function which is convex in . For this function, i.e., . This means for any , we are free to choose any so that . It follows that , or . Writing yields the above inequality.
State Vectors. qubits can specify indices, one for each coordinate of an -dimensional state vector in . An orthonormal basis of an -qubit system is the so-called “computational basis” , where is simply the standard basis vector .
Pauli Matrices/Gates. Let where
For each , it holds that
Their commutators have a nice structure:
The eigenvalues of these matrices are . The corresponding eigenvectors are:
Specifically, . Notice that
where we use the notation
where for .
Let , and . An -qubit Hadamard gate is
where denotes the Boolean inner product of two strings. It can also be written as the tensor product
Other Quantum Gates.
- Rotate by an angle of : .
- Phase gate i.e., Rotate by : .
- Axis-specific rotations: for and ,
Measurements. A measurement apparatus satisfies the completeness relation, i.e., it is a decomposition of the identity . The probability of observing the outcome after measuring is
The state of the system after the measurement is .
Projective Measurements. The components of a projective measurement correspond to the spectral decomposition in that each is a projection operator for the eigenspace :
This implies, the probability of observing a vector after measuring a quantum state is the squared modulus of the (complex) inner product of these two vectors:
The expectation and variance of a projective measurement has a friendly form:
where is the “standard deviation.”
The Heisenberg Uncertainty Principle. Let be two projective measurements. Then, with respect to any quantum state , the product of the standard deviations (across a large number of trials) is “large”:
Entangled State. A quantum state in a -bit system is entangled if it cannot be written using a single -bit index. For example, suppose . The -bit integers index the basis states. We need two “independent” basis indices (namely 1 and 4) to specify the (unnormalized) state ; hence it is an entangled state.
The Bell Basis. Each of the following states is entangled. They form an orthonormal basis of .
Put in another way, let denote the four Bell states denoted by the bits . In particular, corresponds to , and corresponds to . Then,
Bloch Sphere Representation. A single qubit can be represented by the tuple , as follows: can be identified with a point on the 2-sphere in 3-dimensions, with the vector at the North pole. Then is the angle of the vector $late v$ with the -axis, and is the angle between the -axis and the projection of on the -plane. The Bloch vector for is
This is equivalent to setting .
Quantum Fourier Transform. Let , the imaginary unit, and . Let be the th power of the th root of unity. The QFT is a linear operator that takes
The right hand side equals
Radon’s Theorem. Any set of points in can be partitioned into red-blue points so that the respective convex hulls intersect.
Gamma function. The integral
converges absolutely if . It is the unique function that satisfies . In particular, . Euler proved that
Weierstrass showed that
where is the Euler-Mascheroni constant. Euler’s reflection formula says that
The duplication formula says that
Moreover, . The integral derivatives are
Fractional Differentiation of a monomial. Let for . Then for any real ,
This agrees with the usual integral differentiation if .
Cauchy’s Repeated Integration. Let be a Lebesgue-integrable function defined in . Define the integration operator
A repeated application of this operator can be expressed as
Riemann-Liouville Fractional Integral. The natural extension of Cauchy’s repeated integration’s order from an integer to a positive real is the Riemann-Liouville fractional integral:
where is a locally-integrable function and .
It can be showed that for ,
where denotes the Fourier transform.
Riemann-Liouville Fractional Derivative. To differentiate to the order , we cannot directly use as the order in Riemann-Liouville integral. Instead, integrate to the order and then differentiate to the integral order .
Note that the value of the derivative depends on all values of for , whereas the value of the classical derivatives are “local” in that they depend only on .
It can be showed that for ,
where denotes the Fourier transform.
Fourier Transform. Let . Then, for any function , its Fourier transform is a function defined as
The inverse transform is defined as
The transform of a convolution is the product of the transforms:
The transform of a derivative is a scaling of the transform:
The Plancherel theorem says that the Fourier transform (of a square-integrable function) preserves the squared- norm of the function:
This is generalized by Parceval’s theorem for two square-integrable functions
Laplace Transform. Very similar to the Fourier transform, except the coefficients are complex and the integration runs only on the non-negative reals. For functions ,
The inverse transform is given by Fourier-Mellin’s inverse formula aka the Bromwich line integral:
Laplace transform is linear. Moreover,
where is the integration operator denoting and is the Laplace transform.
If is a random variable with density $f$, then for any complex , .
Tonelli’s Theorem. If a two-variable function is non-negative, then we can change the order of integration in a double integral, i.e., if then
Fubini’s Theorem has the same conclusion as Tonelli’s theorem, but it stipulates that has a finite integral over .
Differentiation under the Integral. Also called “the Leibniz rule.”
If are constants, the first two terms are zero.
Polylogarithm Function. For complex order and argument ,
It can be expressed as a repeated integral of itself:
where and the notation denotes the Eulerian numbers. This expression is strikingly similar to the th moment of a geometrically distributed random variable with success parameter for some . Then
The Hypergeometric Series. For a complex and positive integers ,
where is the rising Pochhammer symbol. If either or is non-positive, the series reduces to the polynomial
They satisfy the differential equation
They are orthogonal with respect to the inner product
Brouwer Fixed-Point Theorem. Let be a compact convex space, and let is continuous. Then there exists some with .
Example 1: If is a real interval, then every continuous curve must cross the diagonal straight line at some point.
Example 2: Consider the -dimensional space of probability distributions on the set , and let us call this space . This space is convex, closed, and bounded (hence compact). A right-stochastic real matrix (whose rows sum to one) has a trivial left-eigenvector with eigenvalue 1. Moreover, by the Gershgorin disk theorem, it is the largest possible eigenvalue (in absolute value). Recall that if is a left-eigenvalue, it is also a right-eigenvalue. The matrix , via matrix-multiplication from the left, acts as a continuous function on . Hence, must have a left-eigenvector, too. See here for details.
The union of two random spanning trees of any graph is an expander. Moreover, it is a linear-size cut sparsifier for random graphs. Goyal-Vempala 2004
It suffices to sample only elements from any distribution to test its entropy and support size. See here.
Let be an expander. Let contain an fraction of all edges. Consider a random walk that starts at a random edge from . The probability that it uses an edge from at the -th step is at most . See here.
For certain self-reducible problems, approximate counting is equivalent to almost uniform sampling. See here.