# One-Liners

(For me only)

## Number Theory

Chinese Remainder Theorem. Given a factorization of an integer $M=m_1m_2$ into pairwise coprime factors, there is a bijection between $\mathbb{Z}_M$ and the Cartesian product $\mathbb{Z}_{m_1} \times \mathbb{Z}_{m_2}$. This bijection restricts to one between the units modulo $M$ and $U_1 \times U_2$, where $U_i$ is the set of units modulo $m_i$, i.e., the set of integers coprime to $m_i$. These are also the invertible elements modulo $m_i$.

More. This bijection takes an element $m \bmod M$ to the tuple $(m \bmod m_1, m\bmod m_2)$. More generally, for pairwise coprime moduli $m_1, \cdots, m_n$, given a tuple $(a_1 \bmod m_1, \cdots, a_n \bmod m_n)$, the corresponding element in $\mathbb{Z}_M$ is $\sum{a_i N_i H_i} \bmod M$ where $N_i = (\prod_j{m_j})/m_i$ and $H_i$ is the multiplicative inverse of $N_i$ modulo $m_i$.
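The reconstruction formula above can be sketched in Python (the moduli and residues in the example are illustrative):

```python
from math import prod

def crt(residues, moduli):
    """Reconstruct x mod M from residues a_i mod m_i (pairwise coprime m_i),
    using x = sum(a_i * N_i * H_i) mod M with N_i = M/m_i, H_i = N_i^{-1} mod m_i."""
    M = prod(moduli)
    x = 0
    for a_i, m_i in zip(residues, moduli):
        N_i = M // m_i
        H_i = pow(N_i, -1, m_i)   # multiplicative inverse of N_i modulo m_i
        x += a_i * N_i * H_i
    return x % M

# Example: the unique x mod 105 with x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7)
print(crt([2, 3, 2], [3, 5, 7]))  # -> 23
```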

Fermat’s Integer Factorization of $N$. Try to find two integers $a,b$ such that $a \not\equiv \pm b \pmod N$ and $a^2 \equiv b^2 \pmod N$. Then $N$ divides the product $(a+b)(a-b)$ but divides neither factor alone. Consequently, the gcd of $N$ with each of these factors is non-trivial and divides $N$.
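A minimal sketch of extracting a factor from such a pair $(a, b)$; the modulus $N = 91$ and the pair are illustrative:

```python
from math import gcd

def factor_from_squares(a, b, N):
    """Given a^2 = b^2 (mod N) with a != +/- b (mod N),
    gcd(a - b, N) (or gcd(a + b, N)) is a nontrivial factor of N."""
    assert (a * a - b * b) % N == 0
    d = gcd(a - b, N)
    if 1 < d < N:
        return d
    return gcd(a + b, N)

# Example with N = 91 = 7 * 13: 10^2 = 100 = 9 = 3^2 (mod 91), and 10 != +/-3 (mod 91)
print(factor_from_squares(10, 3, 91))  # -> 7
```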

## Algebra

Finite Abelian Groups. Every finite Abelian group $G$ can be written as a direct product

$\displaystyle G \cong \bigoplus_{i=1}^t \mathbb{Z}_{q_i} \cong \bigoplus_{i=1}^s \mathbb{Z}_{k_i}$

where $0 < s \leq t < \infty$, the $q_i$ are powers of (possibly repeating) primes, and $k_1$ divides $k_2$, $k_2$ divides $k_3$, and so on.

Cyclic Groups. These are abelian groups generated by a single element. $\mathbb{Z}_n$ is cyclic for every $n$. For every positive divisor $d$ of $n$, the quotient group $G_n := \mathbb{Z}/(n)$ has precisely one subgroup of order $d$, namely the one generated by the residue class of $n/d$. The number of generators of $G_n$ equals the number of elements coprime to $n$; this number is given by Euler’s totient function $\phi(n)$. The order of the element $d$ in $G_n$ is $n/\gcd(d, n)$. The direct product of two cyclic groups $G_n$ and $G_m$ is cyclic if and only if $n$ and $m$ are coprime.

Every finite abelian group can be expressed as “the” direct sum of cyclic subgroups of prime-power order, and these orders are uniquely determined.

Pontryagin Dual. Let $G$ be an additive finite Abelian group and let $T$ be the unit torus with multiplication as the group operation. Let $\chi : G \rightarrow T$ be a group homomorphism, i.e., $\chi(0) = 1$ and $\chi(x+y) = \chi(x) \chi(y)$. Then the set $\hat{G} = \{\chi\}$ of all such characters forms an Abelian group with the group operation being “point-wise multiplication”

$(\chi \cdot \rho)(x) = \chi(x) \rho(x)\,, \qquad x \in G$ ,

and the identity element being the constant function

$1_G: x \mapsto 1\, , \qquad x \in G$ .

$\hat{G}$ is called the Pontryagin dual of $G$, and the character $1_G$ is called the trivial character.

## Coding Theory

Reed-Muller Code. The order-$r$ Reed-Muller code $R(r,m)$ is the set of evaluation vectors of Boolean polynomials in $m$ variables of degree at most $r$.

Reed-Solomon Code. In the Reed-Solomon code $RS_{q, \mathcal{E} }(n,k)$, a polynomial $m(x)$ of degree at most $k-1$ over $GF(q)$ — the message — is encoded into a vector of $n$ symbols $c_0, \cdots, c_{n-1}$ — the codeword — obtained by evaluating $m(x)$ on a fixed set $\mathcal{E} = \{ \alpha_0, \cdots, \alpha_{n-1} \}$ of distinct field elements.

Since a nonzero $m(x)$ can have at most $k-1$ zeros over any field, at most $k-1$ of the $c_j$s of a nonzero codeword can be zero. This means the minimum distance $d$ is at least $n - (k-1)$, which matches the Singleton bound. By definition, the Reed-Solomon code is therefore MDS.
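A small sketch of the encoding, assuming a prime field $GF(p)$ for simplicity; the parameters $p=7$, $n=6$, $k=3$ are illustrative:

```python
def rs_encode(msg, points, p):
    """Encode message coefficients m_0..m_{k-1} (a degree-<k polynomial over GF(p))
    as its evaluations at the n fixed points of the evaluation set."""
    def eval_poly(x):
        acc = 0
        for c in reversed(msg):       # Horner's rule
            acc = (acc * x + c) % p
        return acc
    return [eval_poly(a) for a in points]

# RS over GF(7) with n = 6, k = 3: message polynomial m(x) = 1 + 2x + 3x^2
codeword = rs_encode([1, 2, 3], [0, 1, 2, 3, 4, 5], 7)
print(codeword)  # -> [1, 6, 3, 6, 1, 2]
# A nonzero message has at most k-1 = 2 zero evaluations, so d >= n - k + 1 = 4.
```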

Singleton Bound. For a linear $[n,k,d]$ code, the distance $d$ is at most $n-k+1$. Any code matching this bound is called MDS (maximum distance separable).

Plotkin Bound. Suppose the distance $d$ of a binary code exceeds half the block length $n$, say $d \geq n/2 + \alpha$ for some $\alpha > 0$. Then the number of codewords is at most $\displaystyle \lfloor d/ \alpha \rfloor$.

Johnson Bound. Consider a binary code whose relative distance is “large,” i.e., at least $1/2 - \epsilon$. Then, for every $\alpha > \sqrt{\epsilon}$, every Hamming ball of relative radius $1/2 - \alpha$ contains only a “small” number of codewords. In particular, for any string $x$ (in the target space), the number of codewords within relative distance $1/2 - \alpha$ of $x$ is at most $1/(2\alpha^2)$.

Gilbert-Varshamov Bound. For every $\delta < 1/2$ and all sufficiently large $n$, there exists a code with relative distance $\delta$ encoding $n$-bit messages with block length $n/(1-H(\delta))$. Here, $H$ is the binary entropy function.

Hadamard Matrix. Let $H_2 = \left( \begin{array}{cc} 1 & 1 \\ 1 & -1 \end{array}\right)$ . Then

$H := H_{2^n} = \left( \begin{array}{cc} H_{2^{n-1}} & H_{2^{n-1}} \\ H_{2^{n-1}} & -H_{2^{n-1}} \end{array}\right)$

is the Hadamard matrix of order $2^n$.

• The rows/columns of $H$ are indexed by $n$-bit Boolean strings. Let $\odot$ denote the inner-product modulo $2$. Then

$H(x,y) = (-1)^{x \odot y}$.

• The map $H : \{0,1\}^n \rightarrow \{0,1\}^{2^n}$ sending $x$ to the $x$-th row of $H$ (with $\pm 1$ mapped to $0/1$) is a $[2^n, n, 2^{n-1}]$ linear code, the Hadamard code.
• The volume enclosed by the (unit-length) columns of $H$ (i.e., the determinant) is the largest possible for any real matrix of the same order.
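The recursive (Sylvester) construction and the inner-product formula can be checked with a few lines of Python:

```python
def hadamard(n):
    """Sylvester construction of H_{2^n}; entry (x, y) equals (-1)^(x . y mod 2)."""
    H = [[1]]
    for _ in range(n):
        # double: [[H, H], [H, -H]]
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def dot2(x, y):
    """Inner product mod 2 of the bit strings of x and y."""
    return bin(x & y).count("1") % 2

H = hadamard(3)
assert all(H[x][y] == (-1) ** dot2(x, y) for x in range(8) for y in range(8))
```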

Discrete Fourier Transform. See here.

## Probability/Combinatorics

Expectation. Suppose a random variable $X$ takes its value from the domain $\chi$.

$\displaystyle E[X] = \sum_{x \in \chi}{x \Pr[X = x]}$ .

Additionally, if $X$ takes values in the non-negative integers,

$\displaystyle E[X] = \sum_{x \geq 1}{ \Pr[X \geq x]}$ .

Variance. In general,

$\displaystyle Var(X) = E[X^2] - E[X]^2$ .

If $X_1, \cdots, X_n$ are independent, then

$\displaystyle Var\Big(\prod_i X_i\Big) = \prod_i E[X_i^2] - \prod_i \big(E[X_i]\big)^2$ .

The Birthday Paradox. Suppose $k = \Omega(\sqrt{n})$. When you randomly draw $k$ times from an $n$-set with replacement, the resulting $k$-subset will contain a repeated element with constant probability.

Explanation. Suppose we choose $k$ independent samples $X=\{ x_1, \cdots, x_k \}$ from the numbers $1, \cdots, n$. For any $i\neq j$, the probability that $x_i=x_j$ is $1/n$. By linearity, the expected number of collisions in $X$ is ${k \choose 2}/n$, which is a positive constant whenever $k = \Theta(\sqrt{n})$.
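The exact collision probability and the expected number of colliding pairs can be computed directly; the values $n=365$, $k=23$ are the classic illustrative choice:

```python
from math import comb, prod

def prob_collision(n, k):
    """Exact probability that k uniform draws from an n-set (with replacement)
    contain a repeat: 1 - n(n-1)...(n-k+1)/n^k."""
    no_collision = prod((n - i) / n for i in range(k))
    return 1 - no_collision

n, k = 365, 23                          # k is about 1.2 * sqrt(365)
print(prob_collision(n, k))             # classic birthday setting: exceeds 1/2
print(comb(k, 2) / n)                   # expected number of colliding pairs
```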

Coupon Collector’s Problem. Suppose you want to color all items in a set $S$ of size $n$ as follows: at round $1, 2, 3, \cdots$, you pick a random element from $S$; if it is still uncolored, you color it red, put it back, and repeat. This process will require $\Theta(n \log n)$ rounds to color all items.

Balls in Bins. If $n$ balls are thrown into $n$ bins independently and uniformly at random, the “heaviest” bin contains at most $\displaystyle \frac{\ln n}{\ln \ln n} \big(1+o(1) \big)$ balls with high probability. However, if the balls are placed sequentially—each into the “lightest” among $d$ bins sampled uniformly at random—then the largest load is at most $\displaystyle \frac{\ln \ln n}{\ln d} + \Theta(1)$. Thus even if $d=2$, the bins are highly balanced in the sequential case.

Poisson Distribution. Let $X \in \{0, 1, 2, \cdots\}$ be a discrete random variable. If it has the Poisson distribution $P(\mu)$, its probability mass function at point $k$ is the probability that $X$ is $k$ given $\mathbf{E}\,X = \mu$. That is,

$\displaystyle f(k;\mu) := \frac{\mu^k }{k!\, e^\mu}.$

The expectation and the variance of $X$ both equal $\mu$. The mean absolute deviation of $X$ from its mean is $2\mu f( \lfloor \mu \rfloor; \mu)$. A binomial distribution $B(n,p)$ tends to the Poisson distribution $P(\mu)$ when $n \rightarrow \infty$ and $p \rightarrow 0$ with $np = \mu$ held fixed.

Note: In a Poisson process, the arrival counts in disjoint intervals are independent.

Hypergeometric Distribution. This is the binomial distribution without replacement. In particular, let $h(k;N,K,n)$ be the probability that exactly $k$ “good” items are obtained in $n$ sample-without-replacement trials from a population of $N$ items containing exactly $K$ “good” items in total.

$\displaystyle h(k;N,K,n) = \frac{ {K \choose k} {N-K \choose n-k} }{ {N \choose n} }$ .

The mean of this distribution is, perhaps surprisingly, $nK/N$, which equals $\mathbf{E}_\mathrm{binomial}$. The variance is

$\displaystyle n\, \frac{K}{N} \cdot \frac{N-K}{N} \cdot \frac{N-n}{N-1} = \mathbf{Var}_\mathrm{binomial} \cdot (1- o(1) )$ if $n = o(N)$.
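A quick numerical check of the pmf and its mean, using illustrative parameters:

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Probability of exactly k good items in n draws without replacement
    from N items of which K are good."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 20, 5, 8
mean = sum(k * hypergeom_pmf(k, N, K, n) for k in range(0, min(K, n) + 1))
assert abs(mean - n * K / N) < 1e-9     # mean matches n*K/N = E_binomial
```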

Probability Generating Functions. Let $\mathsf A(X) = \sum_{t=0}^\infty{a_t X^t}$ be a formal power series with $a_t \geq 0$. If $\mathsf{A}(1) = \sum_t{a_t} = 1$ then $\mathsf{A}$ “generates” the probability distribution $A(t) := \Pr[A = t] = a_t$. Suppose $\mathsf{B}$ generates the distribution $B(t)$. It is easy to see that the convolution $A \ast B$ generates the probability distribution $C = A + B$ whose mass function is

$\displaystyle C(t) := \sum_{x + y = t}{A(x)\, B(y)}$.

Composition of Generating Functions. Let $a_n$ and $b_n$ denote the number of ways to build two structures, respectively of types 1 and 2, on a $n$-element set where $a_0 = 0$ and $b_0 = 1$. Let $c_n$ be the number of ways to divide the set $[n]$ into an unspecified number of intervals, build a type-1 structure on each interval, then build a type-2 structure on the set of intervals. Set $c_0 = 1$. Denote by $\mathsf{A}(x), \mathsf{B}(x),\mathsf{C}(x)$ the generating functions of the sequences $\{a_n\}, \{b_n\}, \{c_n\}$. Then $\mathsf{C}(x) = \mathsf{B}\left(\mathsf{A}(x) \right).$
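A sketch that verifies the composition identity on truncated power series. The example takes $a_n = 1$ for $n \geq 1$ and $b_n = 1$ for all $n$, so $c_n$ counts the ways to split $[n]$ into intervals, i.e., compositions of $n$, giving $c_n = 2^{n-1}$:

```python
def compose(B, A, terms):
    """Coefficients of C(x) = B(A(x)) up to x^terms, given coefficient lists
    A = [a_0, a_1, ...] (with a_0 = 0) and B = [b_0, b_1, ...]."""
    assert A[0] == 0
    C = [0] * (terms + 1)
    power = [1] + [0] * terms          # A(x)^j, starting with j = 0
    for b in B[: terms + 1]:
        for t in range(terms + 1):
            C[t] += b * power[t]
        # multiply 'power' by A(x), truncating at x^terms
        power = [sum(power[i] * A[t - i] for i in range(t + 1))
                 for t in range(terms + 1)]
    return C

T = 6
A = [0] + [1] * T                      # a_n = 1 for n >= 1
B = [1] * (T + 1)                      # b_n = 1 for all n
print(compose(B, A, T))                # -> [1, 1, 2, 4, 8, 16, 32]
```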

The Gambler’s Ruin Problem. See here.

The Unbiased Random Walk. See here.

Sum of first binomial coefficients. Suppose $k \leq n/2$. Then,

$\displaystyle \sum_{i=0}^k {n \choose i} \leq {n \choose k} \frac{n - k+1}{n - 2k + 1}$ .

This bound is due to Michael Lugo. (Aside: this implies that if $k \leq n/3$, the sum is at most $2{n \choose k}$ .)

The above bound is within a factor of $1.5$ as long as $k \leq 0.49 n$ and $n \approx 5000$. When $k \rightarrow n/2$, this multiplicative factor increases.

The following bound is due to Lovasz. It works better when $k \rightarrow n/2$; it stays within a factor of $2$ for $n \approx 5000$.

$\displaystyle \sum_{i=0}^{k-1} {n \choose i} \leq 2^{n-1} \exp\left(-\frac{(n - 2k - 2)^2}{4(n - k - 1)} \right)$ .
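Both bounds can be sanity-checked numerically; the choice $n=100$ and the values of $k$ below are illustrative:

```python
from math import comb, exp

def lhs(n, k):
    """Sum of the first binomial coefficients, i = 0..k."""
    return sum(comb(n, i) for i in range(k + 1))

def lugo(n, k):
    """C(n,k) * (n-k+1)/(n-2k+1); valid for k <= n/2."""
    return comb(n, k) * (n - k + 1) / (n - 2 * k + 1)

def lovasz(n, k):
    """Bounds sum_{i=0}^{k-1} C(n,i)."""
    return 2 ** (n - 1) * exp(-((n - 2 * k - 2) ** 2) / (4 * (n - k - 1)))

n = 100
for k in (10, 25, 40, 49):
    assert lhs(n, k) <= lugo(n, k)
    assert lhs(n, k - 1) <= lovasz(n, k)
```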

Shearer’s Inequality. Let $S_i \subseteq [N], i \leq M$ be coordinate-sets such that each coordinate $i \in [N]$ is covered by at least $r$ different $S_i$s. Let $X = (X_1, \cdots, X_N)$ be a collection of (not necessarily independent) random variables. The entropy of $X$ is at most $1/r$ times the sum of the entropies of its restrictions to the coordinate sets. That is,

$\displaystyle H(X_1, \cdots, X_N) \leq \frac{1}{r} \sum_{i=1}^M{ H\big(X\rvert_{S_i}\big)}$ .

Total Variation Distance of two distributions $f, g$ is their largest difference over all subsets of the domain $X$. That is,

$\displaystyle \Vert f - g \Vert_{tv} = \sup_{S \subseteq X} \vert f(S) - g(S) \vert$.

Moreover,

$\displaystyle \Vert f - g \Vert_{tv} = \frac{1}{2} \Vert f - g \Vert_1 = \frac{1}{2}\sum_{x \in X}{\vert f(x) - g(x) \vert}$.

See here for a proof.

Pinsker’s Inequality. The Kullback-Leibler divergence of two distributions is at least twice the square of their total-variation distance. Namely,

$\displaystyle D_\text{KL}(f \Vert g) \geq 2\Vert f - g \Vert_{tv}^2$.

Kullback-Leibler Divergence. Given a probability distribution $p$ on a discrete sample space $X$, let $s(i ; p) := \log \frac{1}{p(i)}$ denote the “amount of surprise” associated with the event $i$ according to the distribution $p$. The Entropy is the expected amount of surprise, namely

$\displaystyle H(p) = \sum_{i \in X}{p(i) s(i; p)}$.

The KL Divergence of $q$ with respect to $p$ is

• the expected amount of surprise “remaining” in $q$ after revealing $p$.
• the entropy of $q$ relative to $p$.
• the expectation (in $p$) of the pointwise-difference of the log-probabilities.

$\displaystyle D_\text{KL}(p \Vert q) = \sum_i{ p(i) \left(s(i;q) - s(i;p) \right) } = \sum_i{ p(i) \log \frac{p(i)}{q(i)}}$.
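A numerical check of Pinsker’s inequality above; the distributions $p, q$ are illustrative, and logarithms are natural, so the divergence is in nats:

```python
from math import log

def kl(p, q):
    """KL divergence (in nats) between two discrete distributions."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tv(p, q):
    """Total variation distance = (1/2) * L1 distance."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
assert kl(p, q) >= 2 * tv(p, q) ** 2          # Pinsker's inequality
```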

Markov Chains.

• A Markov chain is aperiodic if, for instance, every state has a self-loop. It is connected (irreducible) if every state is reachable from every other. It is ergodic — i.e., it converges to a unique stationary distribution $\pi$ from every starting distribution — if and only if it is aperiodic and connected. Let $P = (p_{ij})$ be the transition probability matrix of the chain. The chain is reversible if $\pi_i p_{ij} = \pi_j p_{ji}$ for all $i,j$. A random walk is lazy if, at each step, it stays put with probability $1/2$. A lazy, connected random walk is ergodic.
• The mixing time $\tau$ of a random walk with an initial distribution $x$ is the number of steps $t$ it takes so that the distribution after $t$ steps is very close to its stationary distribution.
• An ergodic chain mixes rapidly: In particular, if $\lambda = \max\{\lambda_2(P), \vert \lambda_n(P) \vert\}$, then

$\displaystyle \Vert P^t x - \pi \Vert_1 \leq \lambda^t \Vert x \Vert_2$.

• The mixing time in the above case is $\displaystyle O\left( \frac{\log \max_i (1/\pi_i) }{1-\lambda}\right)$. For a $d$-regular graph, this simplifies to $O\big((\log n) / (1-\lambda)\big)$.

See here and here for nice expositions.
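A small simulation of the one-liners above: a lazy random walk on a 4-cycle is aperiodic and connected, hence ergodic, and converges to the uniform stationary distribution:

```python
def step(dist, P):
    """One step of the chain: multiply the row vector dist by transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Lazy random walk on a 4-cycle: stay with prob 1/2, else move to a uniform neighbor.
n = 4
P = [[0.5 if i == j else (0.25 if abs(i - j) % n in (1, n - 1) else 0.0)
      for j in range(n)] for i in range(n)]

dist = [1.0, 0.0, 0.0, 0.0]            # start at vertex 0
for _ in range(50):
    dist = step(dist, P)
# Aperiodic (lazy) + connected => ergodic; the stationary distribution is uniform.
assert all(abs(d - 1 / n) < 1e-6 for d in dist)
```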

## Graph Theory

Independent Set and Vertex Cover. A largest independent set is the complement to a minimum vertex cover. The vertex cover problem is the dual of the maximum matching problem. In a bipartite graph, the size of the maximum matching equals the size of the minimum vertex cover (Konig’s theorem).

Crossing Number. $cr(G)$ is the minimum number of edge crossings over all plane drawings of a graph $G$ with $n$ vertices and $m$ edges. If $m > 7n$, then $cr(G) \geq m^3/(39n^2)$. This is the crossing number inequality.

Szemeredi-Trotter Theorem. Suppose we have $n$ points $P$ and $m$ lines $L$ in a two-dimensional plane. Consider any bipartite graph $G=(P,L,E)$ where an edge $e =(p,\ell)$ implies that the point $p$ is on line $\ell$. Then

$\displaystyle \vert E \vert = O\big( ( m^2n^2 )^{1/3} + m + n \big)$ .

## Algebraic Graph Theory

Expander Mixing Lemma. Let $G$ be a $d$-regular expander graph on $n$ vertices. Suppose the second-largest eigenvalue of its adjacency matrix is at most $\lambda$ in absolute value. For any pair of vertex-sets $A,B$, let $E_{A,B}$ be the set of “cross edges” between $A$ and $B$. Let $\#A$ denote the cardinality of the set $A$. Then, for all $A, B \subset V$,

$\displaystyle \left \vert \# E_{A,B} - \frac{d}{n} \cdot \# A \#B \right \vert \leq \lambda \sqrt{\# A \cdot \# B} < \lambda n\,.$

In the left-hand side, the first term is the number of cross edges. The second term is the expected number of cross edges in a random graph with edge-probability $d/n$. The left-hand side thus measures how much $G$ deviates from this expectation. The expander mixing lemma says that if $\lambda$ is small, $G$ will be nearly random in this sense.

Random Walk on an Expander. Suppose $G$ is a $d$-regular expander whose second-largest eigenvalue is at most $\lambda$ in magnitude. Let $W$ be a set containing a $\rho$ fraction of vertices. The fraction of random walks of length $\ell$, on the vertices of $G$, that stay entirely within $W$, is small; in particular, it is at most

$\displaystyle \rho \left( \rho + (1 - \rho) \frac{\lambda}{d} \right)^\ell\,$.

Cheeger Inequality. Suppose $G$ is an $(n,d,\lambda_2)$-expander i.e., a $d$-regular graph on $n$ vertices with the second-largest eigenvalue of its adjacency matrix being $\lambda_2$. Since the largest eigenvalue is $d$, the quantity $d-\lambda_2$ is the spectral gap. The boundary of a vertex set $S$ is denoted by $\partial S = E_{S, \bar{S}}$. The edge expansion of $G$ is defined as

$\displaystyle h(G) := \min_{S \subset V, \vert S \vert \leq n/2}\frac{\vert \partial S \vert}{\vert S \vert}$.

Then,

$\displaystyle \frac{d-\lambda_2}{2} \leq h(G) \leq \sqrt{ 2d (d-\lambda_2) }$.

Equivalently,

$\displaystyle \frac{h(G)^2}{2d} \leq d- \lambda_2 \leq 2 h(G)$.

## Linear Algebra

A complex matrix $A$ is unitary if $A^* = A^{-1}$. A real, unitary matrix is orthogonal. $A$ is Hermitian if $A^* = A$. It is normal if $A^*A = AA^*$. A Hermitian matrix is Positive Semidefinite if all its eigenvalues are non-negative.

Cramer’s Rule. Suppose we want to solve a matrix-vector equation $Ax = b$ where $A$ is invertible. Let $A_i$ denote the matrix obtained by replacing the $i$th column of $A$ by $b$. Then,

$\displaystyle x_i = \frac{\text{det}(A_i)}{\text{det} (A)}\, .$

Spectral Decomposition.  A matrix $A$ is Hermitian i.e., $A = A^*$, if and only if $A$ has an orthonormal set of eigenvectors with real eigenvalues i.e., $A = \sum{\lambda_i v_i v_i^*}$ where $(v_i, \lambda_i)$ are an eigenvector-eigenvalue pair.

See here for a proof of the Hermitian case: orthogonal eigenvectors and real eigenvalues.

Normal Matrices. A matrix $A$ is normal if it commutes with its conjugate transpose, i.e., $A A^* = A^* A$. This happens if and only if $A$ is unitarily diagonalizable, i.e., there exists a diagonal $D$ and unitary $U$ such that $A = U D U^*$.

A block upper-triangular matrix is normal if and only if all on-diagonal blocks are normal and all off-diagonal blocks are zero.

Polar Decomposition. Every matrix $A$ can be written as $A = UR = LU$ where $U$ is unitary, $R,L$ are positive semidefinite, and $L = \sqrt{AA^*}, R = \sqrt{A^* A}$. Moreover, if $A$ is invertible then $U$ is unique.

Singular Value Decomposition. Every square matrix $A$ can be written as $A = U D V$ where $U,V$ are unitary and $D$ has non-negative real diagonal entries.

Proof: By polar decomposition, $A = LR$ where $L$ is unitary and $R$ is positive semidefinite. By spectral theorem, $R = T D T^*$ for some unitary $T$ and real diagonal $D$. Moreover, since $R$ is PSD, its eigenvalues are non-negative; hence $D$ has non-negative real entries. We set $U := LT, V := T^*$.

Cholesky Factorization. Every positive definite complex matrix $A$ can be factored into $A = L L^*$ where $L$ is lower-triangular with real, positive diagonal entries. (For positive semidefinite $A$, the factorization exists with non-negative diagonal entries.)

Schur Decomposition. Any complex square matrix $A$ can be written as $A = U T U^*$ where $U$ is unitary and $T$ is upper triangular.

Square Root. Every positive semidefinite matrix $A$ (with bounded largest eigenvalue) can be factored into $A = B^* B$ where $B$ is also positive semidefinite.

Hilbert-Schmidt Norm and Inner Product.

Let $a_i$ denote the $i$-th column of the matrix $A$. Then,

$\displaystyle \langle A, B \rangle_{HS} := \mathrm{tr}(A^* B) = \sum_{i}{\langle a_i, b_i \rangle}$.

$\displaystyle \Vert A \Vert_{HS}^2 = \Vert A \Vert_{F}^2 := \mathrm{tr}(A^*A)$.

Trace.

Trace of a matrix is the sum of its eigenvalues.

$\displaystyle \mathrm{tr}(AB) = \mathrm{tr}(BA)$.

$\displaystyle \mathrm{tr}(A x x^*) = \mathrm{tr}(x^* A x) = \langle x \vert A \vert x \rangle$.

Projection Matrices. Any orthogonal projection matrix $P$ can be written as $P = \sum{x x^*}$ where $\{x\}$ is an orthonormal basis of the target subspace. $P$ has eigenvalues in $\{0,1\}$. $P$ is always Hermitian, i.e., $P^* = P$, as evidenced by the sum formulation. $P$ satisfies $P^2 = P$ and $P^*P = P$.

Frobenius Norm. A generalization of the usual $\ell_2$ norm for matrices.

$\displaystyle \Vert A \Vert_F^2 = \sum_{ij}{\vert a_{ij} \vert^2} = \mathrm{tr}(A^*A)$.

This norm, like the $\ell_2$ norm, is invariant under rotation.

Tensor Inner Product. The tensor inner product on the space $V \otimes W$ is defined as follows:

$\displaystyle \left\langle \sum_i{\alpha_i v_i \otimes w_i} \bigg\vert \sum_j{\beta_j v^\prime_j \otimes w^\prime_j} \right\rangle \quad := \quad \sum_{ij}{ \alpha_i^* \beta_j \langle v_i \vert v^\prime_j \rangle \langle w_i \vert w^\prime_j \rangle }$ ,

where $\alpha_i, \beta_j$ are scalars and $v_i, v^\prime_j \in V$, and $w_i, w^\prime_j \in W$.

Conjugate Gradient Method (for Solving Symmetric Linear Systems). Suppose we would like to solve $Ax = b$ for $x$ where $A$ is a real symmetric positive definite matrix. The matrix $A$ defines the inner product $\langle x, y \rangle_A = \langle x \vert A \vert y \rangle$. We can find a set of basis vectors $P = \{p_1, \cdots, p_n\}$ spanning $\mathbb{R}^n$ that are mutually orthogonal according to this inner product. Then, the solution $x^*$ can be written as a linear combination of these basis vectors with coefficients $\{ \alpha_i \}$.

In practice, the vectors $\{p_i\}$ and corresponding $\alpha_i$ can be found using a process similar to the Gram-Schmidt orthonormalization. Any subset of these basis vectors gives an approximation of $x^*$.
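A bare-bones sketch of the method in plain Python (no libraries); the $2\times 2$ system is illustrative:

```python
def conjugate_gradient(A, b, iters):
    """Solve Ax = b for symmetric positive definite A (nested lists).
    Builds A-orthogonal search directions on the fly; at most n iterations needed."""
    n = len(b)
    matvec = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = b[:]                            # residual b - Ax for x = 0
    p = r[:]                            # first search direction
    rs = dot(r, r)
    for _ in range(iters):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)         # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < 1e-20:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]  # keep p's A-orthogonal
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]            # symmetric positive definite
b = [1.0, 2.0]
x = conjugate_gradient(A, b, 2)         # exact solution is (1/11, 7/11)
```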

Facts about the Matrix Exponential. If $X,Y$ commute, then $e^X e^Y = e^{X+Y}$. In addition, if $X,Y$ commute then $e^X, e^Y$ commute, as do $\log X, \log Y$.

Moreover, for any invertible $Y$, $\displaystyle e^{Y X Y^{-1}} = Y e^X Y^{-1}$. Next, $det(e^X) = e^{tr(X)}$. If $D=diag(a,b,\cdots)$ then $e^D = diag(e^a, e^b, \cdots)$.

Golden-Thompson inequality. For any Hermitian matrices $X,Y$,

$\displaystyle \mathrm{tr}(e^{X+Y}) \leq \mathrm{tr}(e^X e^Y) = \mathrm{tr}\big(\sqrt{e^X}\, e^Y \sqrt{e^X}\,\big)$.

Lieb’s Theorem. Let $H$ be Hermitian and $A$ positive definite. The function

$\displaystyle A \mapsto \mathrm{tr} \exp \big[H + \log A \big]$

is concave.

Corollary (Tropp). Let $H$ be fixed, $X$ be random, both Hermitian. Then

$\displaystyle \mathbb{E} \mathrm{tr} \exp(H+X) \leq \mathrm{tr} \exp(H + \log \left[ \mathbb{E} e^X \right])$.

See Joel Tropp’s paper (Section 3.4).

## Randomized Algorithms

Generating Pairwise Independent Random Variables. Choose a field $F$ of size $2^n$. Sample two uniformly random elements $x_1, x_2 \in F$ using $2n$ uniformly random bits. Then the following $2^n$ random variables are uniform over $F$ and pairwise independent:

$\displaystyle Y_i := x_1 + i \cdot x_2 \qquad \forall i \in \{0, \cdots, 2^n - 1\}$

See here for details.
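A sketch of the construction, substituting a small prime field $GF(5)$ for $GF(2^n)$ to keep the arithmetic simple (an assumption for illustration; the construction is the same):

```python
from itertools import product

p = 5                                  # prime field standing in for GF(2^n)

def Y(i, x1, x2):
    """The i-th output Y_i = x1 + i * x2 over GF(p), from seed (x1, x2)."""
    return (x1 + i * x2) % p

# Pairwise independence: over uniform seeds (x1, x2), every pair (Y_i, Y_j)
# with i != j takes each value pair with probability exactly 1/p^2.
for i, j in [(0, 1), (1, 3), (2, 4)]:
    counts = {}
    for x1, x2 in product(range(p), repeat=2):
        pair = (Y(i, x1, x2), Y(j, x1, x2))
        counts[pair] = counts.get(pair, 0) + 1
    assert len(counts) == p * p and all(c == 1 for c in counts.values())
```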

Pairwise Independent Hash Functions. Let $H$ be a family of functions, each having $n$ input bits and $k$ output bits. Suppose the following holds: if we pick $h \in H$ at random, then for any two distinct input strings and any pair of output strings, the probability that the two inputs map to those two outputs is exactly $1/2^{2k}$. Then $H$ is a family of pairwise independent hash functions. In other words, for all $x \neq x^\prime \in \{0,1\}^n$ and all $y, y^\prime \in \{0,1\}^k$,

$\displaystyle \Pr_{h \in_R H}[h(x)=y \wedge h(x^\prime) = y^\prime] = \Pr[h(x)=y]\, \Pr[h(x^\prime) = y^\prime] = 2^{-2k}$.

Summing over the $2^k$ choices of $y = y^\prime$, two distinct strings collide under a random $h \in H$ with probability exactly $2^{-k}$, just as under a uniformly random function. Intuitively, if we pick a random $h \in H$ then for arbitrary distinct $x, y$ in the domain, the strings $h(x), h(y)$ are independent and uniformly distributed.

Valiant-Vazirani Lemma. Let $\mathcal{H}$ be a collection of pairwise independent hash functions from $\{0,1\}^n \rightarrow \{0,1\}^k$. Let $S \subseteq \{0,1\}^n$ such that $\vert S \vert \in [K/4,K/2]$ where $K = 2^k$. Then, with constant probability, the string $0^k$ has a unique preimage in $S$ under a randomly selected $h \in \mathcal{H}$.

Leftover Hash Lemma. Suppose a string $x$ is drawn uniformly from an arbitrary subset $W \subseteq \{0,1\}^n$, so that $x$ has min-entropy $\log \vert W \vert$. Pick a uniformly random seed $s$ and use it to choose a member $h_s : \{0,1\}^n \rightarrow \{0,1\}^k$ of a pairwise-independent hash family $H$. The leftover hash lemma says that the distribution of the concatenated string $s \Vert h_s(x)$ is at most $\epsilon$ away from uniform, where $\epsilon = \frac{1}{2}\sqrt{ 2^k / \vert W \vert}$. In other words, we can extract nearly $\log \vert W \vert$ almost-uniform bits from $x$.

Hitters. Fix a Boolean function $\sigma$ on $n$-bit inputs. Let $S$ be the set of inputs on which $\sigma$ outputs $1$, and suppose $\vert S \vert = \epsilon 2^n$. An efficient, randomized algorithm $A$, given only oracle access to $\sigma$, is a hitter for $\sigma$ if, with probability at least $1 - \delta$, it needs to sample at most $m$ points from $\{0, 1\}^n$ before it “hits $S$,” that is, before it outputs an element of $S$. Using random walks on expanders, we can construct hitters that take $m = O(\epsilon^{-1} \log(1/\delta))$ samples and use only $n + O(\epsilon^{-1}\log(1/\delta))$ random bits. In contrast, sampling uniformly at random requires $mn$ random bits. See the appendix of Goldreich’s survey for details.

## Deviation/Moment Inequalities

Markov’s Inequality. Any nonnegative $X$ with mean $\mu$ satisfies $\Pr[X \geq \alpha \mu ] \leq 1/\alpha$ for any positive $\alpha$.

Chebyshev’s Inequality. Any random variable $X$ with mean $\mu$ and variance $\sigma^2$ satisfies $\Pr[ \vert X -\mu \vert > \alpha \sigma] \leq 1/\alpha^2$ for any positive $\alpha$.

Note: While Chernoff-Hoeffding inequality requires mutual independence of all random variables, it suffices to have pairwise independence for Chebyshev’s inequality. See Nick Harvey’s notes here.

Paley-Zygmund Inequality. Let $Z$ be a non-negative random variable with a finite variance. If $\theta \in [0,1]$ then

$\displaystyle \Pr\bigg[ Z > \theta \mathbb{E} Z \bigg] \geq (1-\theta)^2 \frac{(\mathbb{E} Z)^2}{\mathbb{E} Z^2}$.

Note: The Paley-Zygmund inequality complements the Chebyshev inequality. While the latter says that “right tail is thin,” the former says that “the left tail is thin.”

Hoeffding’s Tail Inequality, Additive Deviation, Symmetric. The probability that the sum of $n$ independent and identically distributed $[a,b]$-bounded random variables deviates at least $\lambda$ away from its expectation is no more than $\displaystyle \exp\left( - 2 \lambda^2 / n(a-b)^2\right)$.

Chernoff Bound, Multiplicative Deviation, Asymmetric. Let $S$ be the sum of many independent Bernoulli trials. Let $\mu = E[S]$ and let $\delta$ be any positive number between zero and one. The probability that $S$ is $\delta \mu$ below the mean is at most $\exp(-\delta^2\mu/2)$. However, The probability that $S$ is $\delta\mu$ above the mean is at most $\exp(-\delta^2\mu/3)$.

(Hoeffding-like statement.) Equivalently, the probability that $S$ is $\lambda$ below the mean is at most $\exp(-\lambda^2/2\mu)$. However, The probability that $S$ is $\lambda$ above the mean is at most $\exp(-\lambda^2/3\mu)$.

Note. When $\lambda$ is large, the normal approximation of the binomial distribution is less accurate for the positive tail. See the appendix of Alon-Spencer for a commentary.

(Figure omitted: plots of the Chernoff tail and the Hoeffding tail for both 0-1 and $\pm 1$ random variables. The plotted quantity is the bound on the probability that the sum deviates $t n$ from the expectation, for $t \in [-0.1, 0.1]$ and $n=1000$.)

Rudelson-Vershynin Theorem on the Operator Norm of Random Projections.  Let $x_1, \cdots, x_n$ be $n$ independent copies of a random vector $x \in \mathbf{R}^d$ with $\mathbf{E}\, xx^T = I$ and magnitude $\Vert x \Vert_2$ at most $R$. Let $\mu := (\sum_i{x_ix_i^T})/n$. For any $\lambda \in(0,1)$,

$\displaystyle \Pr\left[ \left\Vert \mu - I \right\Vert_2 > \lambda \right] < 2d \exp \left( -n\lambda^2/4R^2 \right).$

Note: We can think of $xx^T$ as the projection operator along $x$, scaled by $\Vert x \Vert_2^2$. Then $\mu$ acts as the mean of these scaled projection operators. The theorem says that the largest distortion inflicted by $\mu$, relative to the identity, is no more than $\lambda$ with high probability. Notice also the similarity with Hoeffding’s tail inequality, which would give the exponent as $-2(n\lambda)^2/nR^2=-2n\lambda^2/R^2$ in the one-dimensional case. We lost a factor of 8 somewhere.

Azuma-Hoeffding Inequality for Martingales. Let $X_0, \cdots, X_n$ be real-valued random variables with the martingale property, i.e., $\mathbf{E} \left[ X_{t+1} \mid X_0, \cdots, X_t \right] = X_t$, and the bounded-difference property, i.e., $\vert X_t-X_{t-1} \vert \leq c$ for some absolute constant $c$. Then the probability that the difference $X_n - X_0$ is more than $\lambda$ is at most $\displaystyle \exp \left( -\lambda^2/2nc^2 \right)$.

Khintchine Inequality. Let $\displaystyle \varepsilon$ be uniformly random in $\displaystyle \{\pm 1 \}^n$ and let $\displaystyle x \in \mathbb{C}^n$ be arbitrary. For any bounded $p > 0$, let $\Vert x \Vert_p^p := \sum_i{ \vert x_i \vert }^p$. Let $m(Z,p)$ be the $p$th moment of $Z$. Define $X := \vert \sum_i{\varepsilon_i x_i} \vert$.  Then,

$\displaystyle \bigg( m(X, p) \bigg)^{1/p} = \Theta\bigg( \Vert x \Vert_2 \bigg)$,

where the constants hidden by the $\Theta$ notation depend only on $p$.

Marcinkiewicz-Zygmund Inequality. Let $x_1, \cdots, x_n \in \mathbb{C}$ be independent random variables with zero mean and bounded $p$th moment for any $1 \leq p < \infty$. Let $m(Z,p)$ be the $p$th moment of $Z$.  Then,

$\displaystyle m\bigg(\sum_i{x_i}, p \bigg) = \Theta\bigg(m\left( \Vert x \Vert_2,p \right) \bigg)$ ,

where the constants hidden under the $\Theta$ notation depend only on $p$.

## Complexity Theory

Goldreich-Levin Theorem. Every injective one-way function $f$ has a corresponding Boolean predicate $b$ — a hard-core predicate — which is hard to predict; predicting $b$ is effectively inverting $f$, which is infeasible when $f$ is an injective one-way function.

Lipton’s Random Self-Reducibility Theorem. A problem $P$ is called “random self-reducible” if you can solve $P$ on an arbitrary input by evaluating $P$ on a sequence of uniformly random inputs. Suppose you have a black-box to solve $P$ on a random input with high probability. Then you can construct a randomized algorithm that can solve $P$ on every input with high probability.

Yao’s Min-Max Lemma. You and I are playing a two-player game. You are the input player and I am the algorithm player. Your move is to select a distribution $\mathcal{D}$ over inputs $\mathcal{X} = \{x\}$. My move is to select a distribution $\mathcal{R}$ over algorithms $\mathcal{A} = \{A\}$. Both $\mathcal{A}, \mathcal{X}$ have finite size. Your goal is to make $cost(A,x)$ high, while my goal is to make it as low as possible. The best value for you is

$\displaystyle \max_{\mathcal{D}} \min_{A \in \mathcal{A}} \underset{x \sim \mathcal{D}}{\mathbb{E} } cost(A, x)$ ,

while the best value for me is

$\displaystyle \min_{ \mathcal{R} } \max_{x \in \mathcal{X}} \underset{A \sim \mathcal{R} }{\mathbb{E} } cost(A, x)$.

Yao’s min-max lemma says that the two values are equal.

BPP $\subseteq$ P/poly.

Proof: Suppose there is an algorithm $A^\prime$ which decides a language in BPP using a random string $r$ of length $poly(n)$. We can construct a majority-vote algorithm $A$ which uses a random string $R$ of length $poly(n)$ but errs on a given input string of length $n$ with probability at most $e^{-n}$, thanks to the Chernoff bound. For a fixed $R$, the probability that $A(\cdot, R)$ is incorrect on some input is at most $2^n/e^n < 1$. Then, by the probabilistic method, there exists a string $R^*$ such that $A(x, R^*)$ is correct on every input $x$.

Communication Complexity of Equality. The two parties must communicate at least $\Omega(k)$ bits to deterministically compute the Boolean predicate $f(x,y) : x \overset{?}{=} y$, where $x, y \in \{0,1\}^k$.

SAT is reducible to 3SAT. It suffices to formulate a 3-CNF formula equivalent to a clause with four variables, say $C = (a \vee b \vee c \vee d)$. We introduce a dummy variable $e$ and write $C^\prime = (a \vee b \vee e) \wedge (c \vee d \vee \bar{e})$. Clearly, if there is an assignment which makes $C$ true, we can choose an assignment to the variable $e$ so that both clauses of $C^\prime$ are true. On the other hand, if an assignment makes $C$ false, then at least one clause of $C^\prime$ will be false regardless of which value we choose for $e$.

Every $n$-variable Boolean Function has a CNF Formula of size $n 2^n$. Consider an $n$-variable Boolean function $f(x), x \in \{0,1\}^n$. For every assignment $a \in \{0,1\}^n$, we construct an “indicator clause” $C_a$ that is false for $a$ but true for every other assignment. For example, if $a = 010$, the clause is $C_a = (x_1 \vee \bar{x_2} \vee x_3)$. Then, the desired CNF formula $C$ is the AND of the indicator clauses of all “bad” assignments $a$, those with $f(a) = 0$. This formula outputs zero whenever $f(x) = 0$, and one otherwise. It uses at most $2^n$ clauses, each with $n-1$ OR symbols; these clauses are joined by at most $2^n - 1$ AND symbols. Thus the total number of symbols is at most $(n-1)2^n + (2^n - 1) = n 2^n - 1$.

## Cryptography

Shamir’s $(t,n)$-Threshold Secret Sharing. Suppose you have a secret $\sigma$. Pick a degree-$(t-1)$ polynomial $P(x) = \sum_i{a_i x^i}$ over any field $F$ with random coefficients, set $a_0=\sigma$, and create $n$ evaluations of this polynomial at distinct nonzero points of $F$. Any $t$ of these evaluations can reconstruct $P$ via Lagrange interpolation and thus recover $\sigma = a_0$. Any smaller subset of evaluations cannot reconstruct $P$, and thus gets no information about the secret $\sigma$.

Note. See the similarity with the Reed-Solomon code.

Feldman’s Verifiable $(t,n)$-Secret Sharing. Suppose you want to share a secret $\sigma \in \mathbb{Z}_q$.
Initialization: Make $p, q, g$ public, where $p = 2q+1$ and $g$ generates the order-$q$ subgroup of $\mathbb{Z}_p^*$.

Share generation: Like Shamir’s secret sharing scheme, choose a (random) polynomial $P(x) = \sum_j{a_j x^j}$ of degree $t-1$ over $\mathbb{Z}_q$, set $a_0=\sigma$, the secret, and distribute shares $\sigma_i = P(i) \bmod q$ to each user $i \in [n]$.

Commitments: Create commitments $c_j = g^{a_j}$ to each coefficient of $P(x)$, and make them public.

Verification: The user $i$ can verify whether

$g^{P(i)} \overset{?}{=}\prod_j{c_j^{i^j}}\,$ because

$g^{P(i)} = g^{\sum_j{a_j i^j}} = \prod_j{g^{a_j i^j}} = \prod_j{c_j^{i^j}}.$

Reconstruction: Lagrange interpolation from any $t$ shares.
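A toy numeric sketch of the share-and-verify flow. The parameters ($p = 23$, $q = 11$, $g = 2$) and the fixed polynomial are deliberately tiny, insecure, and purely illustrative:

```python
# Toy setting (illustrative only): p = 2q + 1 with q prime,
# and g generates the order-q subgroup of Z_p^*.
p, q, g = 23, 11, 2
a = [5, 3, 7]              # P(x) = 5 + 3x + 7x^2 over Z_q; the secret is a[0]
t, n = 3, 5

shares = {i: sum(c * i**j for j, c in enumerate(a)) % q
          for i in range(1, n + 1)}
commitments = [pow(g, c, p) for c in a]        # public: c_j = g^{a_j} mod p

def verify(i, sigma_i):
    """Check g^{sigma_i} == prod_j c_j^{i^j}  (mod p)."""
    rhs = 1
    for j, c in enumerate(commitments):
        rhs = rhs * pow(c, i**j, p) % p
    return pow(g, sigma_i, p) == rhs
```

The check works because all exponents only matter modulo $q$, the order of $g$.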

Discrete-Log Setting $(G,g,q,p)$. Let $g$ be a generator of a multiplicative group $G$ of prime order $q$. An easy way to construct such a group is to select a prime $q$ such that $p = 2q+1$ is also prime. Then, $\mathbb{Z}_p^*$ has a subgroup $G$ of order $q$, and every element of $G$ other than the identity is a generator. In particular, we can take $g$ to be a random non-identity element of $G$.

Discrete-Log Assumption. Given $g$ and $g^a$, it is hard to find $a$.

Decisional Diffie-Hellman (DDH) Assumption. Given $g, g^a, g^b$, it is hard to distinguish $g^{ab}$ from random.

Computational Diffie-Hellman (CDH) Assumption. Given $g, g^a, g^b$, it is hard to compute $g^{ab}$.

Bit Commitment from One-Way Permutations. Suppose you want to commit to a bit $b$. Initialization: Let $h$ be a hard-core predicate for a strong one-way permutation $f$. Commit: You sample a random string $r$ and send $\big( f(r), h(r) \oplus b \big)$ to the receiver. Reveal: You send $b$ and $r$.

This scheme is “computationally hiding” because to predict $b$ one has to predict the hard-core $h(r)$ using $f(r)$, which is “difficult” by definition. It is “perfectly binding” because, $f$ being a permutation, there can be no $r^\prime \neq r$ such that $f(r) = f(r^\prime)$.

Bit Commitment from One-Way Functions (Naor’s scheme). Suppose you want to commit to a bit $b$.
Initialization: Let $G$ be a PRG stretching $n$ bits to $3n$ bits. The receiver sends you a random $3n$-bit string $r$.

Commit: You sample a random $n$-bit seed $s$. If $b == 0$ you send $G(s)$ to the receiver; otherwise, you send $\big( G(s) \oplus r \big)$.

Reveal: You send $b$ and $s$ to the receiver, who recomputes the commitment.

Recall that one-way functions imply PRGs. The definition of a PRG implies that this scheme is computationally hiding. To cheat, you would have to find a pair $(s, s^\prime)$ such that $G(s) \oplus G(s^\prime) = r$. The number of $(s, s^\prime)$ pairs is $2^{2n}$ while $r$ ranges over a space of size $2^{3n}$. Hence the probability, over the receiver’s choice of $r$, that any cheating pair exists is at most $2^{-n}$. Hence this scheme is statistically binding.

Pedersen’s Bit Commitment Scheme. Suppose a sender wants to commit to a bit $b$.
Initialization: Everyone agrees on a Discrete-Log setting $(G,g,q,p)$. The receiver sends a random $h \in G$.

Commit: The sender selects a random $r \in \mathbb{Z}_q$. The commitment is $c = g^r h^b \bmod p$.

Reveal: The sender sends $(r,b)$ to the receiver, and he verifies whether $c \overset{?}{=} g^r h^b \bmod p$.

The scheme is information-theoretically hiding: since $r$ is uniform, $g^r$ is a uniformly random element of $G$, so $c$ is distributed identically regardless of $b$. The scheme is computationally binding: to cheat, one has to find two different opening pairs, say $(r_0, 0)$ and $(r_1, 1)$, such that $c_0 = c_1$ where $c_0 = g^{r_0} \bmod p$ and $c_1 = g^{r_1} h \bmod p$. Suppose you can extract such a pair $r_0, r_1$. Since $h = g^a$ for some $a \in \mathbb{Z}_q$, you can solve the equation $c_0 = c_1$, or $h g^{r_1} = g^{r_0}$, or $h=g^{r_0-r_1}$, or $a = r_0 - r_1 \bmod q$. But this contradicts the Discrete-Log assumption, since now you know $a$, the discrete log of $h$ with respect to $g$.
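A toy sketch with deliberately tiny, insecure parameters (all numbers are illustrative). The last assertion illustrates perfect hiding: knowing the trapdoor $a$, any commitment to 1 also opens to 0.

```python
# Toy discrete-log setting (illustrative only): p = 2q + 1, and g
# generates the order-q subgroup of Z_p^*.  h = g^a for a trapdoor
# exponent a that, in a real run, nobody would know.
p, q, g = 23, 11, 2
a = 6
h = pow(g, a, p)                  # the receiver's random group element

def commit(b, r):
    """Pedersen commitment to bit b under randomness r in Z_q."""
    return pow(g, r, p) * pow(h, b, p) % p

def verify(c, b, r):
    return c == commit(b, r)

# perfect hiding: a commitment to 1 is also a valid commitment to 0
# under the shifted randomness r + a, so c reveals nothing about b
assert commit(1, 9) == commit(0, (9 + a) % q)
```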

Chaum-Pedersen Discrete Log Proof-of-Knowledge Scheme.
Setup: Fix a multiplicative group $G$ of prime order $q$. Suppose you know some element $\alpha \in \mathbb{Z}_q$ and two elements $h_1, h_2 \in G$ such that $h_1 = g_1^\alpha$ and $h_2 = g_2^\alpha$ for two independently chosen generators $g_1, g_2$. Publish $g_1, h_1, g_2, h_2$.

Goal: you want to convince someone about your knowledge of $\alpha$ without revealing what $\alpha$ is.

Interaction: You select a random element $w \in \mathbb{Z}_q$ and give the verifier two elements $a_i = g_i^w$ of $G$ for $i \in [2]$. Then the verifier sends you a random element $c \in \mathbb{Z}_q$ as a “challenge”. You then reply with a “response” $r = w - \alpha c \bmod q$.

Verification: The verifier then checks, for each $i$, whether $a_i \overset{?}{=} g_i^rh_i^c = g_i^{r+\alpha c} = g_i^w$.
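One deterministic run of the protocol with toy parameters (the group, the secret, the nonce, and the challenge are all tiny illustrative values):

```python
# Toy setting (illustrative only): p = 2q + 1; g1, g2 generate the
# order-q subgroup of Z_p^*.
p, q = 23, 11
g1, g2 = 2, 3
alpha = 4                                        # prover's secret
h1, h2 = pow(g1, alpha, p), pow(g2, alpha, p)    # public

w = 6                                  # prover's random nonce in Z_q
a1, a2 = pow(g1, w, p), pow(g2, w, p)  # "commitment" message
c = 9                                  # verifier's random challenge in Z_q
r = (w - alpha * c) % q                # prover's response

# verification: g_i^r h_i^c = g_i^{w - alpha*c + alpha*c} = g_i^w = a_i
ok = (pow(g1, r, p) * pow(h1, c, p) % p == a1 and
      pow(g2, r, p) * pow(h2, c, p) % p == a2)
```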

Note: The Fiat-Shamir transform turns a public-coin interactive proof-of-knowledge of the discrete log into a non-interactive proof-of-knowledge where the prover, instead of receiving random coins from the verifier, queries a random oracle to get the random bits himself. However, Goldwasser and Kalai have shown that there are signature schemes built via Fiat-Shamir that are insecure no matter which concrete “hash function” is substituted for the random oracle.

RSA Encryption Scheme.
Setup: Select two large primes $p,q$ and compute $N = pq$. Let $\phi(N) = \vert \mathbb{Z}_N^* \vert = (p-1)(q-1)$. Pick an element $e$ that is coprime to, and hence invertible modulo, $\phi(N)$. Let $d$ be this inverse. $(N,e)$ is the public key, $d$ is the private key.

Encryption: message $m \bmod N$ is encrypted as $c = m^e \bmod N$.

Decryption: $c \mapsto c^d \bmod N = m^{ed} \bmod N = m \bmod N$ since $ed = 1 \bmod \phi(N)$.
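The whole pipeline with textbook-sized primes (wholly insecure; real keys use primes of ~1024 bits and padding):

```python
from math import gcd

# Toy key generation (illustrative primes only)
p, q = 61, 53
N = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # must be coprime to phi(N)
assert gcd(e, phi) == 1
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

encrypt = lambda m: pow(m, e, N)
decrypt = lambda c: pow(c, d, N)
```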

Diffie-Hellman Key Exchange. You and I publicly agree on a prime $p$ and a generator $g$ of the group $G=\mathbb{Z}_p^*$. I randomly pick a private exponent $a \in \{1, \cdots, p-2\}$ and you randomly pick your private exponent $b$. I send you $A = g^a \bmod p$ in public, while you send me $B = g^b \bmod p$ in public. I construct $S = B^a = g^{ab}$ while you construct $A^b = g^{ab}=S$. This $S$ is our shared secret, assuming the discrete log problem is hard.

ElGamal Encryption Scheme. First, you and I publicly agree on a prime $p$ and a generator $g$ of the group $G=\mathbb{Z}_p^*$. Then we construct a shared secret $s$ via the Diffie-Hellman key exchange protocol. You encrypt a message $m \in G$ as $c = m\cdot s$. Since $s$ is a group element, it has an inverse $s^{-1}$, which I can compute. Finally, I can decrypt your ciphertext as $c \mapsto cs^{-1} = mss^{-1} = m$.
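A toy run of the exchange followed by the ElGamal-style encryption (all parameters are tiny illustrative values; $5$ generates $\mathbb{Z}_{23}^*$):

```python
# Toy public parameters (illustrative only)
p, g = 23, 5
a, b = 6, 15                         # my secret exponent and yours

A, B = pow(g, a, p), pow(g, b, p)    # exchanged in public
s = pow(B, a, p)                     # my copy of the shared secret g^{ab}
assert s == pow(A, b, p)             # your copy agrees

# ElGamal-style encryption under the shared secret
m = 20
c = m * s % p                        # you encrypt
recovered = c * pow(s, -1, p) % p    # I decrypt with s^{-1}
```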

ElGamal Digital Signature Scheme.
Initialization: Select a large prime $p$ and a generator $g$ of the group $G = \mathbb{Z}_p^*$. Select a random secret element $x \in \{2, \cdots, p-3 \}$ and set $y := g^x$. Let $H$ be a cryptographically secure hash function.

Signing: To sign a message $m$, randomly select some $k \in \{2,\cdots, p-2\}$ such that $k$ is invertible modulo $p-1$. Let $r:=g^k \bmod p$. Compute an $s$ such that $H(m) \equiv xr + ks \bmod (p-1)$. If $s=0$, start over. The tuple $(r,s)$ is the signature of $m$.

Verification: Check that $0 < r < p$ and $0 < s < p-1$, and that $r^sy^r \equiv g^{H(m)} \bmod p$. For a correct signature, the LHS is $r^sy^r = g^{ks}g^{xr} = g^{H(m)}$.

Schnorr Digital Signature Scheme.
Setup: Everyone agrees on the Discrete-Log setting $(G,g,q,p)$. The signer’s secret signing key is some $x \in \mathbb{Z}_q$, and his public verification key is $v = g^x$. Everyone has access to a hash function $H$.

Signing: To sign the message $M$, the signer chooses a random $k \in \mathbb{Z}_q$, computes $r = g^k$, computes the hash $h = H(r \Vert M)$ where $\Vert$ denotes concatenation. The pair $(k - x h, h)$ is the signature for $M$ where, recall, that $x$ is the secret signing key.

Verification: Given $M$ and a purported signature $(s, h^\prime)$, one computes $\displaystyle r^\prime = g^s v^{h^\prime} = g^{s + x h^\prime}$; and verifies that $\displaystyle h^\prime \overset{?}{=} H(r^\prime \Vert M)$.
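A sketch with toy parameters, using SHA-256 reduced mod $q$ as a stand-in for $H$ (both the parameters and that hash choice are assumptions for illustration):

```python
import hashlib

# Toy discrete-log setting (illustrative only): p = 2q + 1, ord(g) = q
p, q, g = 23, 11, 2
x = 7                                # secret signing key
v = pow(g, x, p)                     # public verification key

def H(r, M):
    """Hash r || M into Z_q (illustrative stand-in for a real hash)."""
    d = hashlib.sha256(f"{r}|{M}".encode()).digest()
    return int.from_bytes(d, "big") % q

def sign(M, k):
    """k is the per-signature random nonce in Z_q; never reuse it."""
    r = pow(g, k, p)
    h = H(r, M)
    return (k - x * h) % q, h

def verify(M, s, h):
    r = pow(g, s, p) * pow(v, h, p) % p      # = g^{s + x h} = g^k
    return h == H(r, M)
```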

Hard-Core Predicate. An easy-to-compute Boolean predicate $b : \{0,1\}^n \rightarrow \{0,1\}$ is a hard-core for a function $f$ if no efficient algorithm, given $f(x)$, can predict $b(x)$ with probability non-negligibly better than $1/2$.

Weak One-Way Functions Imply Strong Ones. If $f$ is a weak one-way function, then for a suitably large polynomial $t(n)$, the direct product $g(x_1, \cdots, x_t) = \big( f(x_1), \cdots, f(x_t) \big)$ on independent inputs is a strong one-way function.

Every Strong One-Way Function has a Hard-Core Predicate (Goldreich-Levin). The Boolean inner product $x \odot r$ is a hard-core predicate of the function $g(x,r) := \big( f(x), r \big)$ where $f$ is a strong one-way function and $r$ is a uniformly random string of the same length as $x$.

Intuitively, you have to invert $f$ to predict the inner product.

Oblivious Transfer. The context is a 1-out-of-$N$ multiplexer. The sender holds $N$ values $\{x_i\}$. The receiver holds an index $b$. The receiver wants to learn $x_b$ without the sender learning $b$. The sender wants the receiver to learn nothing other than $x_b$.

1-out-of-2 Oblivious Transfer via Public Key Encryption. This is a variant of Even-Goldreich-Lempel. The sender chooses a random group element $r$ and sends it over. The receiver chooses a valid key-pair $(sk_b, pk_b) \leftarrow (x, g^x)$ and an “orphan” public key $pk_{1-b} \leftarrow r / pk_b$. The idea is that the sender, upon receiving $pk_0, pk_1$, checks that $pk_0 \cdot pk_1 = r$; this forces the “other” public key to a value the receiver cannot have chosen, so the receiver can know at most one of the two secret keys. Next, the sender encrypts two values: for $i \in \{0,1\}$, he sends $c_i \leftarrow Enc_{pk_i}(x_i)$ to the receiver. Then the receiver can find $x_b \leftarrow Dec_{sk_b}(c_b)$.

This gives computational security against the sender (breaking the encryption scheme) and the receiver as well (recovering the secret key from a given public key).

1-out-of-2 Oblivious Transfer via DDH Assumption. The sender and receiver agree on two generators $g_0, g_1$. The receiver chooses a random exponent $r$. He then sends the following to the sender:

$h_0 \leftarrow g_0^r \qquad h_1 \leftarrow g_1^{r+b}$ .

The sender samples four random exponents $a_0, a_1, b_0, b_1$. Then he creates two “encryptions” $c_i \leftarrow (A_i, B_i C_i)$ for $i \in \{0,1\}$, where

$A_i = g_0^{a_i} g_1^{b_i} \qquad B_i = h_0^{a_i}$ and

$C_0 = h_1^{b_0} x_0 \qquad C_1 = h_1^{b_1}g_1^{-b_1}x_1$ .

The receiver, upon receiving $c_b = (A_b, B_b C_b)$, can recover

$x_b \leftarrow B_b C_b A_b^{-r}$ .

Note that if $b = 0$, then $B_0 C_0 A_0^{-r} = h_0^{a_0} \left( h_1^{b_0} x_0 \right) \left( g_0^{-r a_0} g_1^{-r b_0} \right)$ $=g_0^{r a_0} g_1^{r b_0} x_0 g_0^{-r a_0} g_1^{-r b_0} = x_0$ .

On the other hand, if $b = 1$, then $B_1 C_1 A_1^{-r} = h_0^{a_1} \left( h_1^{b_1} g_1^{-b_1} x_1 \right) \left( g_0^{-r a_1} g_1^{-r b_1} \right)$ $= g_0^{r a_1} \left( g_1^{(r+1)b_1} g_1^{-b_1} x_1 \right) \left( g_0^{-r a_1} g_1^{-r b_1} \right) = x_1$ .

This gives information-theoretic security against the receiver, since all structure in $c_{1-b}$ is lost under the random blinding. On the other hand, it gives computational security against the sender: given $h_0 = g_0^r$, telling $h_1 = g_1^r$ apart from $h_1 = g_1^{r+1}$ is exactly the DDH problem.
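A toy run of the protocol with tiny illustrative parameters and fixed “random” exponents (all numbers are my own choices for the demo):

```python
# Toy setting (illustrative only): the order-q subgroup of Z_p^*,
# with two independent generators g0 and g1.
p, q = 23, 11
g0, g1 = 2, 3
x0, x1 = 9, 16                    # the sender's two messages (group elements)

# receiver, holding choice bit b:
b, r = 1, 5
h0, h1 = pow(g0, r, p), pow(g1, r + b, p)      # sent to the sender

# sender: four random exponents, two "encryptions" (A_i, B_i * C_i)
rand = {0: (3, 8), 1: (6, 2)}                  # (a_i, b_i), fixed for the demo
msgs = {}
for i, (ai, bi) in rand.items():
    A = pow(g0, ai, p) * pow(g1, bi, p) % p
    B = pow(h0, ai, p)
    C = pow(h1, bi, p) * (x0 if i == 0 else pow(g1, -bi, p) * x1 % p) % p
    msgs[i] = (A, B * C % p)

# receiver recovers x_b from (A_b, B_b C_b):
A_b, BC_b = msgs[b]
recovered = BC_b * pow(A_b, -r, p) % p
```

The other ciphertext decrypts to a blinded, useless value under the receiver's $r$.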

## Distributed Computing

Byzantine Agreement. Micali’s BA protocol works on a completely asynchronous network when at least $2/3$ of the users are honest, and halts after 9 rounds in expectation. It is player-replaceable in the sense that players can be removed/corrupted immediately after sending a message.

The Micali-Vaikuntanathan protocol works on a synchronous network, and allows the honest fraction to be as low as $\alpha = 1/2 + \delta, \delta> 0$. With probability $1 - \epsilon$, the number of rounds is at most

$\displaystyle \frac{\log 1/\epsilon}{\log \frac{2}{2-\alpha}}\, .$

The Impossibility of Byzantine Agreement. It is impossible to attain Byzantine Agreement among deterministic players in an asynchronous network, even if at most one player may fail (Fischer-Lynch-Paterson).

## Optimization

Newton’s Method. Suppose you want to find a root of a differentiable function $f(x)$ in the vicinity of the initial point $x_0$. The root will be approached in a sequence of points $x_0, x_1, x_2, \cdots$. Suppose you already know $x_n$. You make a linear approximation of $f(x)$ at $x = x_n$ as follows: $y = f(x_n) + (x - x_n) f^\prime(x_n)$. Because you want to find a root of $f$, you set $y = 0$ and solve for $x$, calling the solution $x_{n+1}$:

$\displaystyle x_{n+1} = x_{n} - \frac{f(x_n)}{f^\prime(x_n)}$ .

Because you are doing a linear approximation, the error decreases quadratically as you approach a simple root $x = \alpha$. In particular, the Taylor expansion of $f$ around $x_n$ tells us that the term neglected by the linear approximation is $(\alpha - x_n)^2 f^{\prime \prime}(x_n)/2! + O((\alpha - x_n)^3)$.
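The iteration in code, applied to $f(x) = x^2 - 2$ (the function, starting point, and iteration count are illustrative):

```python
def newton(f, fprime, x0, iters=8):
    """Iterate x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(iters):
        x = x - f(x) / fprime(x)
    return x

# root of f(x) = x^2 - 2 near x0 = 1.5, i.e., sqrt(2)
root = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.5)
```

Quadratic convergence roughly doubles the number of correct digits per step, so a handful of iterations reach machine precision here.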

## Series

Taylor Series of a real function $f(x)$ around the point $a$.

$\displaystyle f(x) = \sum_{k=0}^{\infty} f^{(k)}(a) (x-a)^k / k! .$

$\displaystyle \log (1-x) = -x -x^2/2-x^3/3-\cdots$
$\displaystyle \log(1+x) = x - x^2/2 + x^3/3 - x^4/4 + \cdots$

## Inequalities

$(1-x)^{1-x} > \exp(-x+x^2/2)$ when $x \in (0, 1)$

The exponential function is steep above zero and flat below zero.

$e^x \geq 1+x$ for $x\in \mathbb{R}$.

Concave functions are subadditive. Suppose $f(x)$ is concave with $f(0) \geq 0$, and let $\alpha \in [0,1]$. Then

$f(\alpha x) \geq \alpha f(x)$ ,
$f(a)+f(b) \geq f(a+b)$ for $a, b \geq 0$ .

Cauchy-Schwarz Inequality. The absolute value of the inner product is at most the product of the individual lengths.

$\displaystyle \vert \langle x \vert y \rangle \vert \leq \Vert x \Vert_2 \cdot \Vert y \Vert_2.$

AM-GM Inequality.  Let $\{w_i\}$ denote a convex combination i.e., $\sum{w_i}=1, w_i \geq 0$, and let $a_i \geq 0$. Then

$\displaystyle \sum_i{w_i a_i} \geq \prod_i{a_i^{w_i}}$ .

Equality occurs iff all the $a_i$ with $w_i > 0$ are equal.

Power Mean Inequality. For a vector $x \in \mathbb{R}^n$ with non-negative entries and a positive real $p$, define $\displaystyle \Vert x \Vert_p = \left(\sum_i{x_i^p}\right)^{1/p}$. For positive reals $p \geq q$,

$\displaystyle \left( \frac{\Vert x \Vert_p^p}{n} \right)^{1/p} \geq \left( \frac{\Vert x \Vert_q^q}{n} \right)^{1/q}$.

In other words, $\Vert x \Vert_p$ is “not too short:” for any $t > 0$,

$\displaystyle \Vert x \Vert_{p+t} \leq \Vert x \Vert_p \leq \Vert x \Vert_{p+t} \ n^{\frac{t}{p(p+t)}}$.

A Logarithmic Inequality. Let $f:x \mapsto \log \dfrac{1}{1-x}$. Then

$\displaystyle y f(x) \leq x f(y) \qquad 0 < x \leq y \leq 1$.

Proof: Let $f$ be a convex function satisfying $f(a) = 0$. Let $x \in [a, b]$ be a point of the domain, so that $x = \alpha a + \beta b$ for some $\alpha, \beta \geq 0, \alpha + \beta = 1$. By Jensen, we have $f(x) \leq \alpha f(a) + \beta f(b) = \beta f(b)$ since $f(a)=0$ by assumption.

Now consider the function $f : x \mapsto -\log(1-x)$ which is convex in $[0,1)$. For this function, $a=0$ i.e., $f(0)=0$. This means for any $x$, we are free to choose any $b \geq x$ so that $x = \alpha \cdot 0 + (x/b)\cdot b$. It follows that $f(x) \leq (x/b) f(b)$, or $b f(x) \leq x f(b)$. Writing $y \leftarrow b$ yields the above inequality.

## Quantum Computing

State Vectors. $n$ qubits can specify $N=2^n$ indices, one for each coordinate of an $N$-dimensional state vector in $\mathbb{C}^N$. An orthonormal basis of an $n$-qubit system is the so-called “computational basis” $\{ \vert k \rangle\}, k \in \{0,1,2,\cdots, N-1\}$, where $\vert k \rangle$ is simply the standard basis vector $e_k \in \{0,1\}^N$.

Pauli Matrices/Gates. Let $\mathcal{P} = \{X,Y,Z\}$ where

$X = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \qquad Y = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}\qquad Z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$.

For each $M \in \mathcal{P}$, it holds that

$\displaystyle M^* = M \text{ (Hermitian)},\qquad M^2 = I \text{ (involutory)}, \qquad M^* M = I\text{ (unitary)}$ .

$\mathrm{tr}(M) = 0$ and $\mathrm{det}(M) = -1$.

Their commutators have a nice structure:

$[X,Y] = 2iZ \qquad [Y,Z] = 2iX \qquad [Z,X] = 2iY$.

The eigenvalues of these matrices are $\lambda(M) = \pm 1$. The corresponding eigenvectors $v_M(\lambda)$ are:

$\displaystyle v_X(\lambda) = \begin{pmatrix} \lambda \\ 1 \end{pmatrix}/\sqrt{2}$ .

$\displaystyle v_Y(\lambda) = \begin{pmatrix} -\lambda i \\ 1 \end{pmatrix}/\sqrt{2}$ .

$\displaystyle v_Z(\lambda) = \begin{pmatrix} 1+\lambda \\ 1-\lambda \end{pmatrix}/2$ .

Hadamard Gate.

$\displaystyle \qquad H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = (X+Z)/\sqrt{2}$.

$\displaystyle H : \vert b \rangle \mapsto \big( \vert 0 \rangle + (-1)^b \vert 1 \rangle \big)/\sqrt{2}\,, \qquad b \in \{0,1\}$.

Specifically, $\displaystyle H \vert 0 \rangle = \big(\vert 0 \rangle + \vert 1 \rangle \big)/\sqrt{2} = \big( \vert 0 \rangle + \exp(2 \pi i) \vert 1 \rangle \big)/\sqrt{2}$ . Notice that

$\displaystyle (-1)^b = \exp( \pi i b )= \exp( 2 \pi i b/2 ) = \exp( 2 \pi i 0.b )$

where we use the notation

$0.x_1x_2\cdots x_k = \sum_{j=1}^k{x_j/2^j} = x/2^k$, where $x$ is the integer with binary representation $x_1 x_2 \cdots x_k$, $x_i \in \{0,1\}$ .

Let $x = x_1 \cdots x_n \in \{0,1\}^n$, and $N = 2^n$. An $n$-qubit Hadamard gate is

$\displaystyle H^{\otimes n} : \vert x \rangle \quad \mapsto \quad \frac{1}{\sqrt{N}} \sum_{z \in \{0,1\}^n}{ (-1)^{x \odot z} \vert z \rangle }$ ,

where $\odot$ denotes the Boolean inner product of two strings. It can also be written as the tensor product

$\displaystyle H^{\otimes n} : \vert x \rangle \quad \mapsto \quad \frac{1}{\sqrt{N}} \bigotimes_{\ell=1}^{n}{ \big[ \vert 0 \rangle + \exp(2 \pi i x_\ell/2) \vert 1 \rangle \big] }$ .
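A direct sketch of this map on amplitude vectors (no quantum library; a $\Theta(N^2)$ double loop, fine for tiny $n$):

```python
def hadamard(amps):
    """Apply H^{tensor n} to a length-2^n amplitude vector:
    out[z] = sum_x (-1)^{x . z} amps[x] / sqrt(N),
    where x . z is the Boolean inner product (parity of x AND z)."""
    N = len(amps)
    scale = N ** -0.5
    return [sum((-1) ** bin(x & z).count("1") * amps[x] for x in range(N)) * scale
            for z in range(N)]

out = hadamard([0, 0, 1, 0])     # H^{tensor 2} applied to |x> with x = 10
```

Applying it twice returns the original vector, since $H^{\otimes n}$ is its own inverse.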

Other Quantum Gates.

• Rotate by an angle of $2^{-k}\cdot 2\pi$: $\displaystyle \qquad R_{k} = \begin{bmatrix} 1 & 0\\ 0 & \exp(2\pi i 2^{-k}) \end{bmatrix}$.
• Phase gate i.e., Rotate by $90^\circ$: $\displaystyle \qquad S = R_{2} = \begin{bmatrix} 1 & 0\\ 0 & i \end{bmatrix}$.
• Axis-specific rotations: for $m \in \{x,y,z\}$ and the corresponding $M \in \{X,Y,Z\}$, $\displaystyle R_m(\theta) = \exp(-i\theta M/2) = \cos(\theta/2)\, I - i \sin(\theta/2)\, M$

Measurements. A measurement apparatus $M = \{ M_x \}$ satisfies the completeness relation, i.e., it is a decomposition of the identity $I = \sum_x{M_x^* M_x}$. The probability of observing the outcome $z$ after measuring $\vert \phi \rangle$ is

$\displaystyle p(z) = p(z; \vert \phi \rangle, M) = \langle M_z \phi \vert M_z \phi \rangle = \langle \phi \vert M_z^* M_z \vert \phi \rangle = \Vert M_z \vert \phi \rangle \Vert_2^2$.

The state of the system after the measurement is $\displaystyle \frac{M_z \vert \phi \rangle}{\Vert M_z \vert \phi \rangle \Vert_2} = \frac{M_z \vert \phi \rangle}{\sqrt{p(z)}}$.

Projective Measurements. The components of a projective measurement $M = \{M_x\}$ correspond to the spectral decomposition $\{(e_x, \lambda_x)\}_x$ in that each $M_x = e_x e_x^*$ is the projection operator onto the eigenspace of $\lambda_x$:

$\displaystyle M = \sum_x{\lambda_x M_x} = \sum_x{\lambda_x e_x e_x^*}$.

This implies that the probability of observing a vector $\vert z \rangle$ after measuring a quantum state $\vert \phi \rangle$ is the squared modulus of the (complex) inner product of these two vectors:

$\displaystyle p(z) = p(z; \vert \phi \rangle, M) = \langle \phi \vert M_z \vert \phi \rangle = \vert \langle \phi \vert e_z \rangle \vert^2$.

The expectation and variance of a projective measurement has a friendly form:

$\langle M \rangle_\phi := \mathbb{E}(M; \vert \phi \rangle) = \sum_x{\lambda_x\, p(x ; \vert \phi \rangle)} = \langle \phi \vert M \vert \phi \rangle$.

$\displaystyle [\Delta(M)]^2 := \langle (M - \langle M \rangle)^2 \rangle = \langle M^2 \rangle - \langle M \rangle^2$,

where $\Delta(.)$ is the “standard deviation.”

The Heisenberg Uncertainty Principle. Let $C,D$ be two projective measurements. Then, with respect to any quantum state $\vert \phi \rangle$, the product of the standard deviations (across a large number of trials) is “large”:

$\displaystyle \Delta(C; \vert \phi \rangle) \Delta(D; \vert \phi \rangle) \geq \frac{\big\vert \langle \phi \vert [C, D] \vert \phi \rangle \big\vert}{2}$.

Entangled State. A state of a $k$-qubit system is entangled if it cannot be written as a tensor product of single-qubit states. For example, suppose $k=2$. The (unnormalized) state $\vert 00 \rangle + \vert 11 \rangle$ cannot be factored as $\big( a \vert 0 \rangle + b \vert 1 \rangle \big) \otimes \big( c \vert 0 \rangle + d \vert 1 \rangle \big)$: such a factorization would require $ad = bc = 0$ while $ac, bd \neq 0$. Hence it is an entangled state.

The Bell Basis. Each of the following states is entangled. They form an orthonormal basis of $\mathbb{C}^4$.

$\displaystyle \Phi^{\pm}:=\frac{1}{\sqrt{2}}\left(\vert 00 \rangle \pm \vert 11 \rangle\right)\,, \qquad \Psi^{\pm}:=\frac{1}{\sqrt{2}}\left(\vert 01 \rangle \pm \vert 10 \rangle \right)\, .$

Put another way, let $\vert \phi_{xy} \rangle$ denote the four Bell states indexed by the bits $x,y$. In particular, $y = 0$ corresponds to $\Phi^{\pm}$, and $y = 1$ corresponds to $\Psi^{\pm}$. Then,

$\displaystyle \vert \phi_{xy} \rangle = \big( \vert 0,y \rangle + (-1)^x \vert 1, 1\oplus y \rangle \big)/\sqrt{2}$

Bloch Sphere Representation. A single qubit $q := a \vert 0 \rangle + b \vert 1 \rangle$ can be represented by the tuple $(\theta, \phi)$, as follows: $q$ can be identified with a point $v$ on the 2-sphere in 3 dimensions, with the vector $\vert 0 \rangle$ at the North pole. Then $\theta$ is the angle of the vector $v$ with the $z$-axis, and $\phi$ is the angle between the $x$-axis and the projection of $v$ on the $xy$-plane. The Bloch vector for $v$ is

$\displaystyle v = (\sin \theta \cos \phi, \sin \theta \sin \phi, \cos \theta)$ .

This is equivalent to setting $\displaystyle a := \cos(\theta/2), b := \exp(i \phi) \sin(\theta/2)$ .

Quantum Fourier Transform. Let $j, k \in \{0, 1, \cdots, N-1\}$ where $N = 2^n$, and let $i$ denote the imaginary unit. Let $\omega_j = \exp(2 \pi ij /N)$ be the $j$th power of the $N$th root of unity. The QFT is a linear operator that takes

$\displaystyle \vert j \rangle \quad \mapsto \quad \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \omega_j^k \vert k \rangle$ .

The right hand side equals

$\displaystyle \frac{1}{\sqrt{N}} \bigotimes_{\ell = 1}^n {\big[ \vert 0 \rangle + \underbrace{e^{2\pi i j/2^\ell }}_{\omega_j^{N/2^\ell}} \vert 1 \rangle \big]}$ .
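A numeric check that the tensor-product form matches the defining sum. The qubit ordering is the assumption to watch: the $\ell = 1$ factor is taken as the most significant bit of $k$.

```python
import cmath

def qft(amps):
    """QFT of a length-2^n amplitude vector: |j> -> sum_k w^{jk} |k> / sqrt(N)."""
    N = len(amps)
    w = cmath.exp(2j * cmath.pi / N)
    return [sum(amps[j] * w ** (j * k) for j in range(N)) / N ** 0.5
            for k in range(N)]

def qft_factored(j, n):
    """Tensor-product form: the l-th factor (l = 1 is the most significant
    qubit of k) is (|0> + exp(2 pi i j / 2^l) |1>) / sqrt(2)."""
    vec = [1.0 + 0j]
    for l in range(1, n + 1):
        phase = cmath.exp(2j * cmath.pi * j / 2 ** l)
        vec = [v * f for v in vec for f in (1, phase)]   # new factor = next bit
    return [v / 2 ** (n / 2) for v in vec]
```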

## Discrete Geometry

Radon’s Theorem. Any set of $d+2$ points in $\mathbf{R}^d$ can be partitioned into two sets (say, red and blue points) whose convex hulls intersect.

## Analysis

Gamma function. The integral

$\displaystyle \Gamma(z) = \int_0^\infty{ x^{z-1} e^{-x} dx}$,

converges absolutely if $z \in \mathbb{C}, \mathrm{Re}(z) > 0$.  By the Bohr-Mollerup theorem, it is the unique log-convex function on the positive reals satisfying $\Gamma(1) = 1, \Gamma(z+1) = z \Gamma(z)$. In particular, $\Gamma(1/2) = \sqrt{\pi}$. Euler proved that

$\displaystyle \Gamma(z) = \prod_{n=1}^\infty \bigg[ \frac{ \big(1+1/n \big)^z }{ 1+z/n } \bigg]$.

Weierstrass showed that

$\displaystyle \Gamma(z) = \frac{e^{-\gamma z}}{z} \prod_{n=1}^\infty \bigg[ \frac{ e^{z/n} }{ 1+z/n } \bigg]$,

where $\displaystyle \gamma \approx 0.577216$ is the Euler-Mascheroni constant. Euler’s reflection formula says that

$\displaystyle \Gamma(z) \Gamma(1-z) = \frac{ \pi }{\sin (\pi z)}$ if $z \not \in \mathbb{Z}$ .

The duplication formula says that

$\displaystyle \Gamma(z) \Gamma( z + 1/2) = \frac{2 \sqrt{\pi}}{2^{2z}} \Gamma(2z)$ .

Moreover, $\overline{\Gamma(z)} = \Gamma(\overline{z})$. The derivatives are

$\displaystyle \Gamma^{(n)}(z) = \int_0^\infty{ x^{z-1} e^{-x} (\ln x)^n dx }$ .

Fractional Differentiation of a monomial. Let $y(x) = x^\mu$ for $\mu \in \mathbb{R}$. Then for any real $s > 0$,

$\displaystyle \frac{d^s y}{dx^s} = \frac{\Gamma(\mu + 1)}{\Gamma(\mu - s + 1)} x^{\mu - s}$ .

This agrees with the usual (integer-order) derivative if $s \in \mathbb{N}$.

Cauchy’s Repeated Integration. Let $f$ be a Lebesgue-integrable function defined in $[a,b]$. Define the integration operator

$\displaystyle I_a f (x) := \int_a^x{f(t) dt}$ .

A repeated application of this operator can be expressed as

$\displaystyle I_a^{(n)} f (x) = \int_a^x \int_a^{t_n} \cdots \int_a^{t_2}{ f(t_1) dt_1 \cdots dt_n}$

$\displaystyle = \frac{1}{(n-1)!} \int_a^x{ (x-t)^{n-1} f(t) dt }$.

Riemann-Liouville Fractional Integral. The natural extension of the order of Cauchy’s repeated integration from an integer $n$ to a positive real $s$ is the Riemann-Liouville fractional integral:

$\displaystyle I_a^{(s)} f(x) = \frac{1}{\Gamma(s)} \int_a^x{f(t) (x-t)^{s-1} dt }$

where $f$ is a locally-integrable function and $s \in \mathbb{C}, \mathrm{Re}(s) > 0$.

It can be shown that for $x\in \mathbb{R}$,

$\displaystyle \widehat{I_{-\infty}^s f}(x) = (i x)^{-s} \hat{f}(x)$ ,

where $\hat{\cdot}$ denotes the Fourier transform.

Riemann-Liouville Fractional Derivative. To differentiate to the order $s \in (k-1,k]$, we cannot directly use $-s$ as the order in Riemann-Liouville integral. Instead, integrate to the order $k-s$ and then differentiate to the integral order $k$.

$\displaystyle D_a^{s} f(x) = D_a^{k} I_a^{k-s} f(x) = \frac{1}{\Gamma(k-s)} \frac{d^k}{dx^k} \int_a^x{f(t) (x-t)^{k-s-1} dt }$.

Note that the value of the derivative depends on all values of $f(t)$ for $t \in [a,x]$, whereas the classical derivative is “local” in that it depends only on the behavior of $f$ near $x$.

It can be shown that for $x\in \mathbb{R}$,

$\displaystyle \widehat{D_{-\infty}^s f}(x) = (i x)^s \hat{f}(x)$ ,

where $\hat{\cdot}$ denotes the Fourier transform.

Fourier Transform. Let $e_s(x) = e^{-2 \pi i s x}$. Then, for any function $f : \mathbb{R} \rightarrow \mathbb{C}$, its Fourier transform is a function $\hat{f} : \mathbb{R} \rightarrow \mathbb{C}$ defined as

$\displaystyle (\mathcal{F} f)(s) = \hat{f}(s) = \int_{-\infty}^\infty{ f(x) e_s(x) dx}$.

The inverse transform is defined as

$\displaystyle (\mathcal{F}^{-1} \hat{f})(x) = f(x) = \int_{-\infty}^\infty{ \hat{f}(s) e_{s}(-x) ds}$.

The transform of a convolution is the product of the transforms:

$\displaystyle \mathcal{F}(f \star g)(s) = \mathcal{F} f (s) \cdot \mathcal{F}g (s)$ .

The transform of a derivative is a scaling of the transform:

$\displaystyle \widehat{f^{(n)}}(s) = (2 \pi i s)^n \hat{f}(s)$ .

The Plancherel theorem says that the Fourier transform (of a square-integrable function) preserves  the squared-$\ell_2$ norm of the function:

$\displaystyle \int{\vert f(x) \vert^2 dx} = \int{\vert \hat{f}(s) \vert^2 ds}$ .

This is generalized by Parseval’s theorem for two square-integrable functions $f,g$:

$\displaystyle \int{ f(x) \overline{g(x)} dx} = \int{ \hat{f}(s) \overline{\hat{g}(s)} ds}$ .

Laplace Transform. Very similar to the Fourier transform, except the coefficients are complex and the integration runs only on the non-negative reals. For functions $f : \mathbb{R}_+ \rightarrow \mathbb{C}$,

$\displaystyle \hat{f}(s) = \int_0^\infty{ f(t) e^{-st} dt} \, , \qquad s \in \mathbb{C}$ .

The inverse transform is given by Fourier-Mellin’s inverse formula aka the Bromwich line integral:

$\displaystyle f(t) = \frac{1}{2\pi i} \lim_{T \rightarrow \infty} \int_{\gamma - iT}^{\gamma + iT}{ \hat{f} (s) e^{st} ds}$ .

Laplace transform is linear. Moreover,

$\displaystyle \widehat{e^{ct}}(s) = \frac{1}{s-c}$ .

$\displaystyle \widehat{e^{ct} f(t) }(s) = \hat{f}(s-c)$ .

$\displaystyle \widehat{f(t - T) }(s) =e^{-Ts} \hat{f}(s)$ .

$\displaystyle \widehat{ t f(t) }(s) = - \frac{d}{ds} \hat{f}(s)$ .

$\displaystyle \widehat{f^\prime (t) }(s) = s \hat{f}(s) - f(0)$ .

$\displaystyle [\mathcal{L} I_0^t f ](s) =\frac{1}{s} \mathcal{L} f(s)$,

where $\displaystyle I_0^t$ is the integration operator denoting $\displaystyle I_0^t f := \int_{x=0}^t{f(x) dx}$ and $\mathcal{L}$ is the Laplace transform.

If $X$ is a random variable with density $f$, then for any complex $s$, $\hat{f}(s) = \mathbb{E}\ e^{-sX}$ .

Tonelli’s Theorem. If a two-variable function $f$ is non-negative, then we can change the order of integration in a double integral, i.e., if $f(x,y) \geq 0$ then

$\displaystyle \int_{X,Y} f(x,y)dxdy = \int_X \bigg( \int_Y f(x,y) dy \bigg) dx = \int_Y \bigg( \int_X f(x,y) dx \bigg) dy$ .

Fubini’s Theorem has the same conclusion as Tonelli’s theorem, but instead of non-negativity it stipulates that $\vert f \vert$ has a finite integral over $X \times Y$.

Differentiation under the Integral. Also called “the Leibniz rule.”

$\displaystyle \frac{d}{dx} \int_{a(x)}^{b(x)} f(x,t) dt = f(x, b(x) ) b^\prime(x) - f(x, a(x) ) a^\prime(x) + \int_{a(x)}^{b(x)} \frac{\partial }{\partial x} f(x,t) dt$ .

If $a(x), b(x)$ are constants, the first two terms are zero.

Polylogarithm Function. For complex order $s$ and argument $z$,

$\displaystyle \mathrm{Li}_s(z) = \sum_{k=1}^\infty{ \frac{z^k}{k^s}}$ .

It can be expressed as a repeated integral of itself:

$\displaystyle \mathrm{Li}_{s+1}(z) = \int_0^z{\frac{\mathrm{Li}_s(t)}{t} dt}$ .

Moreover,

$\displaystyle \mathrm{Li}_{-n}(z) = {1 \over (1-z)^{n+1}} \sum_{k=0}^{n-1} \left\langle {n \atop k} \right\rangle z^{n-k}$ ,

where $n=1,2,3,\ldots$ and the $\left\langle {n \atop k} \right\rangle$ notation denotes the Eulerian numbers. This expression is strikingly similar to the $n$th moment of a geometrically distributed random variable $X$ with success probability $1-\alpha$, $\alpha \in (0,1)$:

$\displaystyle\mathbb{E}\ X^n = \frac{1}{(1-\alpha)^n} \sum_{k=0}^{n-1} \left\langle {n \atop k} \right\rangle \alpha^{k}$.

The Hypergeometric Series. For a complex $z, \vert z \vert < 1$ and complex parameters $a, b, c$ (with $c$ not a non-positive integer),

$\displaystyle {}_2F_1(a, b; c; z) = \sum_{n=0}^\infty \frac{ (a)_n (b)_n }{(c)_n} \frac{z^n}{n!}$ ,

where $(a)_n = a(a+1)\cdots (a+n-1)$ is the rising Pochhammer symbol. If either $a$ or $b$ is a non-positive integer, say $a = -m$, the series terminates, reducing to the polynomial

$\displaystyle {}_2F_1(-m, b; c; z) = \sum_{n=0}^m (-1)^n {m \choose n} \frac{ (b)_n }{(c)_n} z^n$ .

Laguerre Polynomials.

$\displaystyle L_n(x) = \sum_{k=0}^n{ {n \choose k} \frac{(-1)^k x^k}{k!} }$ .

They satisfy the differential equation

$\displaystyle x L_n^{(2)}(x) + (1-x) L_n^{(1)}(x) + n L_n(x) = 0$ .

They are orthogonal with respect to the inner product

$\displaystyle \langle L_m, L_n \rangle = \int_0^\infty e^{-x} L_m(x) L_n(x)\, dx = \delta_{mn}$ .

## Interesting Results

The union of two random spanning trees of any graph is an expander. Moreover, it is a linear-size cut sparsifier for random graphs. Goyal-Vempala 2004

It suffices to sample only $\Theta(n / \log n)$ elements from any distribution to test its entropy and support size.

Let $G$ be an $(n, d, \lambda)$ expander. Let $F \subset E$ contain an $\alpha$ fraction of all edges. Consider a random walk that starts at a random edge from $F$. The probability that it uses an edge from $F$ at the $t$-th step is at most $\alpha + ( \lambda/d)^{t-1}$.

For certain self-reducible problems, approximate counting is equivalent to almost uniform sampling.