This page is always under construction.

Number Theory
Coding Theory
Algebraic Graph Theory
Linear Algebra
Randomized Algorithms
Deviation Inequalities
Complexity Theory

Number Theory

Chinese Remainder Theorem. Given a (pairwise co)prime factorization of an integer M=m_1m_2, there is a bijection between \mathbb{Z}_M and the Cartesian product U_1 \times U_2 where U_i is the set of units modulo m_i i.e., the set of integers z that are coprime to m_i. These are also the invertible elements modulo m_i.

More. This bijection takes an element m \bmod M to the tuple (m \bmod m_1, m\bmod m_2). Moreover, given a tuple (a_1 \bmod m_1, \cdots, a_n \bmod m_2), the corresponding element in \mathbb{Z}_M is \sum{a_i N_i H_i} \bmod M where  N_i = (\prod{m_j})/m_i and H_i is the multiplicative inverse of N_i modulo m_i.

Fermat’s Integer Factorization of N. Try to find two integers $latex a, b$ such that a \neq \pm b \bmod N and a^2 \equiv b^2 \bmod N. Then N must divide the product (a+b)(a-b). Consequently, the gcd of N and at least one of these factors must be non-trivial. This gcd divides N.

Coding Theory

Reed-Muller Code. The order-r Reed-Muller code R(r,m) is the set of Boolean polynomials on m variables and degree at most r.

Why Reed-Solomon Code is MDS. In Reed-Solomon code, a degree-(k-1)  polynomial m(x) (the message) is encoded into a degree-(n-1) polynomial c(x) (the codeword) where the n coefficients c_0, \cdots, c_{n-1} of the codeword are obtained by evaluating m(x) at \alpha_0, \cdots, \alpha_{n-1}. Since m(x) can have at most k-1 zeros at any field, at most k-1 of the c_js can be zero. This means the minimum distance d is at least n - (k-1) which matches the singleton bound.

Singleton Bound. For a linear [n,k,d] code, the distance d is at most n-k+1.

Plotkin Bound. If the distance d of a code is strictly larger than half the block length n, that is, if d > n/2 + \alpha, the number of codewords is at most \displaystyle \lfloor d/ \alpha \rfloor.

Hadamard Matrix. Let H_1 = \left( \begin{array}{cc} 1 & 1 \\ 1 & -1 \end{array}\right) . Then

H := H_{2^n} = \left( \begin{array}{cc} H_{2^{n-1}} & H_{2^{n-1}} \\ H_{2^{n-1}} & -H_{2^{n-1}} \end{array}\right)

is the Hadamard matrix of order 2^n. Let us interprete x, y \in \{0,1\}^n as both strings and (corresponding) integers. Then H(x,y) = (-1)^{x \odot y} where \odot is the inner-product modulo 2. The map H : \{0,1\}^n \mapsto \{0,1\}^{2^n} is an [2^n, n, 2^{n-1}] linear code. The volume enclosed by the (unit-length) columns of H (i.e., the determinant) is the largest possible for any real matrix of the same order.


Birthday Paradox. Suppose k = \Omega(\sqrt{n}). When you randomly draw k times from an n-set with replacement, the resulting k-subset will contain a repeated element with constant probability.

Explanation. Suppose we choose k independent samples X=\{ x_1, \cdots, x_k \} from the numbers 1, \cdots, n. For any i\neq j, the probability that x_i=x_j is 1/n. The expected number of collisions in X is at most {k \choose 2}/n. This is a constant whenever k = \Omega(\sqrt{n}).

Coupon Collector’s Problem. Suppose you want to color all items in a set S of size n as follows: at round 1, 2, 3, \cdots, you pick a random element from S; if it is still uncolored, you color it red, put it back, and repeat. This process will require \Theta(n \log n) rounds to color all items.

Poisson Distribution. Let X \in \{0, 1, 2, \cdots\} be a discrete random variable. If it has the Poisson distribution P(\mu), its probability mass function at point k is the probability that X is k given \mathbf{E}\,X = \mu. That is,

f(k;\mu) := = \frac{\mu^k }{k! e^\mu}.

The expectation and variance of X is \mu. The expected deviation of X from its mean is f( \lfloor \mu \rfloor; \mu). A binomial distribution B(n,p) tends to a Poisson distribution when p \rightarrow 0.

Note: The events X = a and X = b must be independent.

Probability Generating Functions. Let \mathsf A(X) = \sum_{t=0}^\infty{a_t X^t} be a formal power series with a_t \geq 0. If \mathsf{A}(1) = \sum_t{a_t} = 1 then \mathsf{A} “generates” the probability distribution A(t) := \Pr[A = t] =  a_t. Suppose \mathsf{B} generates the distribution B(t). It is easy to see that the convolution A \ast B generates the probability distribution C = A + B whose mass function is

\displaystyle C(t) := \sum_{t = x + y}{A(x) + B(y)}.

Composition of Generating Functions. Let a_n and b_n denote the number of ways to build two structures, respectively of types 1 and 2, on a n-element set where a_0 = 0 and b_0 = 1. Let c_n be the number of ways to divide the set [n] into an unspecified number of intervals, build a type-1 structure on each interval, then build a type-2 structure on the set of intervals. Set c_0 = 1. Denote by \mathsf{A}(x), \mathsf{B}(x),\mathsf{C}(x) the generating functions of the sequences \{a_n\}, \{b_n\}, \{c_n\}. Then \mathsf{C}(x) = \mathsf{B}\left(\mathsf{A}(x) \right).

The Gambler’s Ruin Problem. See here.

The Unbiased Random Walk. See here.


Sum of first binomial coefficients. Suppose k \leq n/2. Then,

\displaystyle \sum_{i=0}^k {n \choose i} \leq {n \choose k} \frac{n - k+1}{n - 2k + 1} .

This bound is due to Michael Lugo. (Aside: This implies, if k \leq n/3, the sum is at most {n \choose k} .)

The above bound is within a factor of 1.5 as long as k \leq 0.49 n and n \approx 5000. When k \rightarrow n/2, this multiplicative factor increases.

The following bound is due to Lovasz. It works better when k \rightarrow n/2; it stays within a factor of 2 for n \approx 5000.

\displaystyle  \sum_{i=0}^{k-1} {n \choose i} \leq 2^{n-1} \exp\left(-\frac{(n - 2k - 2)^2}{4(n - k - 1)} \right) .


Shearer’s Inequality. Let S_i \subseteq [N], i \leq M be coordinate-sets such that each coordinate i \in [N] is covered by at least r different S_is. Let X = (X_1, \cdots, X_N) be a collection of independent random variables. The entropy of X is at most 1/r times the sum of the entropies of its restriction to each coordinate set. That is,

\displaystyle H(X_1, \cdots, X_N) \leq \frac{1}{r} \sum_{i=1}^M{ H(X\rvert_{S_i}} .


Algebraic Graph Theory

Expander Mixing Lemma. Let G be a d-regular expander graph on n vertices and m edges. Suppose the second-largest eigenvalue of its adjacency matrix is at most \lambda. For any pair of vertex-sets A,B, let E_{A,B} be the set of “cross edges” between A and B. Let \#A denote the cardinality of the set A. Then,

\displaystyle \left \vert \frac{\# E_{A,B}}{m} - \frac{\# A}{n} \cdot \frac{\#B}{n} \right \vert \leq \frac{\lambda \sqrt{\# A \cdot \# B}}{dn} < \frac{\lambda}{d}\,.

Random Walk on an Expander. Suppose G is a d-regular expander whose second-largest eigenvalue is at most \lambda in magnitude. Let W be a set containing a \rho fraction of vertices. The fraction of random walks of length \ell, on the vertices of G, that stay entirely within W, is small; in particular, it is at most

\displaystyle \rho \left( \rho + (1 - \rho) \frac{\lambda}{d} \right)^\ell\, .

Linear Algebra

Cramer’s Rule. Suppose we want to solve a matrix-vector equation Ax = b. Let A_i denotes the matrix obtained by replacing the ith column of the matrix A by the b. Then,

\displaystyle x_i = \frac{\text{det}(A_i)}{\text{det} (A)}\, .

Randomized Algorithms

Generating Pairwise Independent Random Variables. Choose a field F of size 2^n. Sample two uniformly random elements x_1, x_2 \in F using 2n uniformly random bits. Then the following 2^n random variables are uniform over F and pairwise independent:

\displaystyle Y_i := x_1 + i \cdot x_2 \qquad \forall i \in \{0, \cdots, 2^n - 1\}

See here for details.

Pairwise Independent Hash Functions. Let H be a family of functions, each having n input bits and k output bits. Suppose the following holds: that if we pick h \in H at random, the probability that it maps two arbitrary but distinct input strings to the same output string is exactly 1/2^{2k}. Then H is a family of pairwise independent hash functions. In other words,

\forall x,x^\prime \in \{0,1\}^n, y, y^\prime \in \{0,1\}^k \forall h \in_R H \Pr[h(x)=y \wedge h(x^\prime) = y^\prime]

= \Pr[h(x)=y] \Pr[h(x^\prime) = y^\prime] = \frac{1}{\vert A \vert \vert B \vert} = 2^{-2k}.

Equivalently, the probability that two arbitrary strings will collide under a random h \in H is the square of the uniform probability on the output space. Intuitively, if we pick a random h \in H then for arbitrary x, y in the domain, the strings h(x), h(y) are independent and uniformly distributed.

Hitters. Fix a Boolean function \sigma on n-bit inputs. Let S be the set on inputs on which \sigma outputs 1, and suppose \vert S \vert = \epsilon 2^n. An efficient, randomized algorithm A, given only oracle access to \sigma, is a hitter for \sigma if with probability at least 1 - \delta, it needs to sample at most $m$ points from \{0, 1\}^n before it “hits S,” that is, it outputs an element from S. We can construct hitters, using random walks on expanders, which takes m = O(\epsilon^{-1} log(1/\delta)) samples and uses only n + O(1/\delta) random bits. In contrast, if we sample uniformly at random, it requires $mn$ random bits. See the appendix of Goldreich’s survey for details.

Deviation Inequalities

Markov’s Inequality. Any nonnegative X with mean \mu satisfies \Pr[X \geq \alpha \mu ] \leq 1/\alpha for any positive \alpha.

Chebyshev’s Inequality. Any random variable X with mean \mu and variance \sigma^2 satisfies \Pr[ \vert X -\mu \vert > \alpha \sigma] \leq 1/\alpha^2 for any positive \alpha.

Note: While Chernoff-Hoeffding inequality requires mutual independence of all random variables, it suffices to have pairwise independence for Chebyshev’s inequality. See Nick Harvey’s notes here.

Hoeffding’s Tail Inequality, Additive Deviation, Symmetric. The probability that the sum of n independent and identically distributed [a,b]-bounded random variables deviates at least \lambda away from its expectation is no more than \displaystyle \exp\left( - 2 \lambda^2 / n(a-b)^2\right).

Chernoff Bound, Multiplicative Deviation, Asymmetric. Let S be the sum of many independent Bernoulli trials. Let \mu = E[S] and let \delta be any positive number between zero and one. The probability that S is \delta \mu below the mean is at most \exp(-\delta^2\mu/2). However, The probability that S is \delta\mu above the mean is at most \exp(-\delta^2\mu/3).

(Hoeffding-like statement.) Equivalently, the probability that S is \lambda below the mean is at most \exp(-\lambda^2/2\mu). However, The probability that S is \lambda above the mean is at most \exp(-\lambda^2/3\mu).

Note. When \lambda is large, the normal approximation of the binomial distribution is less accurate for the positive tail. See the appendix of Alon-Spencer for a commentary.


Above are the plots of the Chernoff-tail and the Hoeffding-tail for both 0-1 and \pm 1 random variables. The plotted quantity is the bound for the probability that the sum deviates $t n$ away from the expectation, for t \in [-0.1, 0.1] and n=1000.

Rudelson-Vershynin Theorem on the Operator Norm of Random Projections.  Let x_1, \cdots, x_n be n identical copies of a random vector x \in \mathbf{R}^d with \mathbf{E} x^Tx = I and magnitude \Vert x \Vert_2 at most R. Let \mu := (\sum_i{x_ix_i^T})/n. For any \lambda \in(0,1),

\displaystyle \Pr\left[  \left\Vert \mu - I \right\Vert_2 > \lambda \right] < 2d \exp \left( -n\lambda^2/4R^2 \right).

Note: We can think xx^T as a “scaled” projection operator along x multiplied by 1/\Vert x \Vert_2. Then \mu acts as the mean of these projection operators. The theorem says that the largest distortion inflicted by \mu on any vector is no more than \lambda with high probability. Notice also the similarity with Hoeffding’s tail inequality, which would give the exponent as -2(n\lambda)^2/nR^2=-2n\lambda^2/R^2 for one-dimensional case. We lost a factor of 8 somewhere.

Azuma-Hoeffding Inequality for Martingales. Let X_0, \cdots, X_n be real-valued random variables with the martingale property i.e., \mathbf{E} \left[ X_{t+1} \mid X_0, \cdots, X_t \right] \leq X_t and the bounded-difference property i.e., \vert X_t-X_{t-1} \vert \leq c for some absolute constant c. Then, the probability that the difference X_n - X_0 is more than \lambda is at most \displaystyle \exp \left( -\lambda^2/2nc^2 \right).

Complexity Theory

Goldreich-Levin Theorem. Every injective one-way function f has a corresponding boolean function b which is hard to predict; predicting b is effectively inverting f, which is unlikely when f is an injective one-way function.

Lipton’s Random Self-Reducibility Theorem. A problem P is called “random self-reducible” if you can solve P on an arbitrary input by evaluating P on a sequence of uniformly random inputs. Suppose you have a black-box to solve P on a random input with high probability. Then you can construct a randomized algorithm that can solve P on every input with high probability.

Yao’s Min-Max Lemma. You and I are playing a two-player game. You are the input player and I am the algorithm player. Your move is to select a distribution \mathcal{D} over inputs \mathcal{X} = \{x\}. My move is to select a distribution \mathcal{R} over algorithms \mathcal{A} = \{A\}. Both \mathcal{A}, \mathcal{X} have finite size. Your goal is to make cost(A,x) high, while my goal is to make it as low as possible. The best value for you is

\displaystyle \max_{\mathcal{D}} \min_{A \in \mathcal{A}} \underset{x \sim \mathcal{D}}{\mathbb{E} } cost(A, x) ,

while the best value for me is

\displaystyle \min_{ \mathcal{R} } \max_{x \in \mathcal{X}} \underset{A \sim \mathcal{R} }{\mathbb{E} } cost(A, x).

Yao’s min-max lemma says that the two values are equal.


BPP \subseteq P/poly.

Proof: Suppose there is an algorithm A^\prime which decides a language in BPP using a random string r of length poly(n). We can construct a majority-vote algorithm A which uses a random string R of length $latex poly(n)$ but the probability that it errs on a given input string of length n is at most 1/e^n, thanks to the Chernoff bound. For a fixed R, the probability that A(R,.) is incorrect on some input is at most 2^n/e^n < 1. Then by the probabilistic method, there exists a random string R^* so that A(x, R^*) is correct on every input.


Shamir’s (t,n)-Threshold Secret Sharing. Suppose you have a secret \sigma. If you have a degree-(t-1) polynomial P(x) = \sum_i{a_i x^i} over any field F, you can set a_0=\sigma and create a set of n arbitrary evaluations of this polynomial in F. Any t of these evaluations can reconstruct P via Lagrange interpolation and thus recover \sigma = a_0. Any smaller subset of evaluations cannot reconstruct P, and thus get no information about the secret \sigma.

Note. See the similarity with the Reed-Solomon code.

Feldman’s Verifiable (t,n)-Secret Sharing. Suppose you want to share a secret \sigma \in \mathbb{Z}_p^*. Initialization: Make p, g public where g generates the field \mathbb{Z}_p^*. Share generation: Like Shamir’s secrete sharing scheme, choose a (random) polynomial P(x) = \sum_i{a_i x^i} of degree t-1, set a_0=\sigma, the secret, and distribute shares \sigma_i = P(i) \bmod p to each user i \in [n]. Commitments: Create commitments c_i = g^{a_i} to each coefficient of P(x), and make them public. Verification: The user i can verify whether

g^{P(i)} \overset{?}{=}\prod_j{c_j^{i^j}}\, because

g^{P(i)} = g^{\sum_j{a_j i^j}} = \prod_j{g^{a_j i^j}} = \prod_j{c_j^{i^j}}.

Reconstruction: Lagrange interpolation from any t shares.

Bit Commitment from One-Way Permutations. Suppose you want to commit to a bit b. Initialization: Let h be a hard-core predicate for a strong one-way permutation f. Commit: You sample a random string r and send \big( f(r), h(r) \oplus b \big) to the receiver. Reveal: You send b and r.

This scheme is “computationally hiding” because to predict b one has to predict the hard-core h(r) using f(r), which is “difficult” by definition. This is “perfectly binding” because there can be no r^\prime such that f(r) = f(r^\prime).

Bit Commitment from One-Way Functions. Suppose you want to commit to a bit b. Initialization: Let G be PRG. The receiver sends you a random string r. Commit: You sample a random n-bit string s and a random 3n-bit string  r^\prime. If b == 0 you send \big( G(s) \oplus r \big) to the receiver; otherwise, you send \big( r^\prime \oplus r \big). Reveal: You send b, r^\prime, s to the receiver.

Recall that one-way functions imply PRG. The definition of a PRG implies that this scheme is “computationally hiding.” To cheat, you will have to find a s^\prime such that G(s) = G(s^\prime). The number of  (s, s^\prime) pairs is 2^{2n} while the size of the PRG’s output space is 2^{3n}. Hence the probability that you’ll succeed is at most 2^{-n}. Hence this scheme is “perfectly binding.”

Pedersen’s Bit Commitment Scheme. Suppose a sender wants to commit to a bit b. Initialization: Everyone agrees on a prime q, another prime p = 2q + 1. Let g \in \mathbb{Z}_p^* be any element with ord(g) = q. The receiver sends a random y \in \mathbb{Z}_p^*Commit: The sender selects a random r \in \mathbb{Z}_q. The commitment is c = g^b y^r \bmod p. Open: The sender sends (r,b) to the receiver, and he verifies whether c \overset{=}{?} g^b y^r \bmod p.

The scheme is “hiding”; since y^r is random, c is also random regardless of b. The scheme is binding: to cheat, one has to find two different opening pairs, say (r_0, 0) and (r_1, 1), which open the same commitment c = g^{r_0} = g^{r_1}y \bmod p. If one can find such r_0, r_1, one has also solved the discrete log problem y = g^{r_0 - r_1} \bmod p. See here for more schemes. Here’s the original paper.

Chaum-Pedersen Discrete Log Proof-of-Knowledge Scheme. Setup: Fix a group G=\mathbf{Z}_q for some prime q.  Suppose you know some element \alpha  and two elements h_1, h_2 such that h_1 = g_1^\alpha and h_2 = g_2^\alpha for two independently chosen generators g_1, g_2. Publish g_1, h_1, g_2, h_2Goal: you wanto to convince someone about your knowledge of alpha without revealing what \alpha is.

Here’s what you do. Interaction: You select a random element w from F and give the verifier two elements a_i = g_i^w from F for i \in [2]. Then the verifier sends you a random element c as a “challenge”. You then reply with a “response” r = w - \alpha c \bmod q. Verification: The verifier then verifies whether a_i = g_i^rh_i^c = g_i^{r+\alpha c}.

RSA Encryption Scheme. Select two large primes p,q and compute N = pq. Let \phi(N) = ord(\mathbb{Z}_N^*) = (p-1)(q-1). Pick an arbitrary element e in the multiplicative group \mathbb{Z}_N^*. This means, e is coprime to and invertible modulo \phi(N). Let d be this inverse. (N,e) is the public key, d is the private key. Encryption: message m \bmod N is encrypted as c = m^e \bmod N. Decryption: c \mapsto c^d \bmod N = m^{ed} \bmod N = m \bmod N since ed = 1 \bmod \phi(N).

Decisional Diffie-Hellman (DDH) Problem . Given g, g^a, g^b, distinguish g^{ab} from random.

Computational Diffie-Hellman (CDH) Problem. Given g, g^a, g^b, compute g^{ab}.

Diffie-Hellman Key Exchange. You and I publicly agree on a prime p and a generator g of the group G=\mathbb{Z}_p^*. I randomly pick a private value a \in G and you randomly pick your private value b \in G.  I send you A = g^a \bmod p in public, while you send me B = g^b \bmod p in public. I construct S = B^a = g^{ab} while you construct A^b = g^{ab}=S. This S is our shared secret assuming the discrete log problem is hard.

ElGamal Encryption Scheme. First, you and I publicly agree on a prime p and a generator g of the group G=\mathbb{Z}_p^*. Then we construct a shared secret s via the Diffie-Hellman key exchange protocol. You encrypt a message m \in G as c = m\cdot s. Since G is cyclic, I can compute the inverse of s, call it s^{-1}. Finally, I can decrypt your cyphertext as c \mapsto cs^{-1} = mss^{-1} = m.

ElGamal Digital Signature Scheme.Initialization: Select a large prime p and a generator g of the group G = \mathbb{Z}_p^*. Select a random secret element x \in \{2, \cdots, p-3 \} and set y := g^x. Let H be a cryptographically secure hash function. Signning a message m: randomly select k \in \{2,\cdots, p-2\} such that k is invertible modulo p-1. Let r:=g^k \bmod p. Compute an s such that H(m) \equiv xr + ks \bmod p-1. If s=0, start over. The tuple (r,s) is the signature of m. Verification: Check that 0 < r < p-1, 1 < s < p-2 and r^sy^r \equiv g^{H(m)} \bmod p-1. For a correct signature, the LHS is r^sy^r = g^{ks}g^{xr} = g^{H(m)}.

Hard-Core Predicate. An easy-to-compute boolean predicate b : \{0,1\}^n \rightarrow \{0,1\} is a hard-core for a function f if no efficient algorithm can predict b(x) with non-negligible success even if they know f(x).

Weak One-Way Functions Imply Storng Ones. If f is a weak one-way function, then g(x,y) = \big( f(x), f(y) \big) is a strong one-way function.

Every Strong One-Way Function has a Hard-Core Predicate. The Boolean inner product x \odot r is a hard-core predicate of the function g(x,r) := \big( f(x), r \big) where f is a strong one-way function and r is arbitrary.

Intuitively, you have to invert f to predict the inner product.


Newton’s Method. Suppose you want to find a root of a locally-convex function f(x) in the vicinity of the initial point x_0. The root will be approached in a sequence of points x_0, x_1, x_2, \cdots. Suppose you already know x_n. You make a linear approximation of f(x) at x = x_n as follows: y = f(x_n) + (x_{n+1} - x_n) f^\prime(x_n). Because you want to find a root of f, you set y = 0 and solve for

\displaystyle x_{n+1} = x_{n} - \frac{f(x_n)}{f^\prime(x_n)} .

Because you are doing a linear approximation, the error term decreases quadratically as you approach the true root x = \alpha. In particular, Taylor expansion of f around x_n tells us that the approximation error is (\alpha - x_n)^2 f^{\prime \prime}(x_n)/2! + O((\alpha - x_n)^3)




Taylor Series of a real function f(x) around the point a.

\displaystyle f(x) = \sum_{k=0}^{\infty}  f^{(k)}(a) (x-a)^k / k! .

\displaystyle \log (1-x) = -x -x^2/2-x^3/3-\cdots
\displaystyle \log(1+x) = x - x^2/2 + x^3/3 + \cdots


(1-x)^{1-x} > \exp(-x+x^2/2) when x \in (0, 1)

The exponential function is steep above zero and flat below zero.

e^x \geq 1+x for x\in \mathbb{R}.

Concave functions are subadditive. Suppose f(x) is concave and \alpha > 0. Then

f(\alpha x) \geq \alpha x ,
f(a)+f(b) \geq f(a+b) .