# Notes on the PCP Theorem and the Hardness of Approximation: Part 1

In this note, we are going to state the PCP theorem and its relation to the hardness of approximating some NP-hard problem.

## PCP Theorem: the Interactive Proof View

Intuitively, a PCP (Probabilistically Checkable Proof) system is an interactive proof system where the verifier is given $r(n)$ random bits and he is allowed to look into the proof $\pi$ in $q(n)$ many locations. If the string $x \in \{0,1\}^n$ is indeed in the language, then there exists a proof so that the verifier always accepts. However, if $x$ is not in the language, no prover can convince this verier with probability more than $1/2$. The proof has to be short i.e., of size at most $q(n) 2^{r(n)}$. This class of language is designated as PCP[r(n), q(n)].

Theorem A (PCP theorem). Every NP language has a highly efficient PCP verifier. In particular,

$NP = PCP[O(\log n), O(1)]$.

# Forkable Strings are Rare

In a blockchain protocol such as Bitcoin, the users see the world as a sequence of states. A simple yet functional view of this world, for the purpose of analysis, is a Boolean string $w = w_1, w_2, \cdots$ of zeros and ones, where each bit is independently biased towards $1$ favoring the “bad guys.”

A bad guy is activated when $w_t = 1$ for some $t$. He may try to present the good guys with a conflicting view of the world, such as presenting multiple candidate blockchains of equal length. This view is called a “fork”. A string $w$ that allows the bad guy to fork (with nonnegligible probability) is called a “forkable string”. Naturally, we would like to show that forkable strings are rare: that the manipulative power of the bad guys over the good guys is negligible.

Claim ([1], Bound 2). Suppose $w =w_1, \cdots, w_n$ is a Boolean string, with every bit independently set to $1$ with probability $(1-\epsilon)/2$ for some $\epsilon < 1$. The probability that $w$ is forkable is at most $\exp(-\epsilon^3n/2)$.

In this post, we present a commentary on the proof that forkable strings are rare. I like the proof because it uses simple facts about random walks, generating functions, and stochastic domination to bound an apparently difficult random process.

# A Publicly Verifiable Secret Sharing Scheme

We all know Shamir’s $(t,n)$-Secret Sharing Scheme: A dealer fixes a group $\mathbf{Z}_p$ and selects a random polynomial $P(x)$ of degree at most $t-1$ over this group. He sets the constant coefficient equal to your secret $\sigma$, then distributes $n$ arbitrary evaluations of this polynomial as shares to the $n$ parties. Any subset of $t$ parties can pool their shares and reconstruct the polynomial via Lagrange interpolation.

What if we don’t want to trust the dealer nor the parties? Feldman’s Verifiable Secret Sharing Scheme allows the dealer to publish, in addition to the regular Shamir-shares, a generator $g$ of the group $\mathbf{Z}_p^*$ along with $t$ proofs or commitments $c_j$ for each coefficient of the polynomial $P(x)$. Each party can verify that his share is correct by testing whether

$\displaystyle g^{P(i)}\, \overset{?}{=} \, \prod_j{c_j^{i^j}} \bmod p$.

However, the shares $p(i)$ are still private. What if we want everyone to be able to verify the correctness of all shares without knowing what the shares are? This is achieved by a Publicly Verifiable Secret Sharing Scheme, such as the one developed by Berry Schoenmaker, assuming the discrete log problem is hard.

# Vandermonde Submatrices and Arithmetic Progressions

[This post, which is based on an ongoing discussion with Alex Russell and Ravi Sundaram, contains some unpublished results.]

Currently, we are asking whether all submatrices of the order-$p$ Vandermonde matrix over a finite extension of $GF(2)$ are invertible where $p$ is prime. The answer is “no” in general: there are examples of fields where the Vandermonde matrix has a singular submatrix.

We can ask an easier(?) question, though. What happens if we randomly sample a set of columns and look into submatrices formed by a subset of the sampled columns. With a touch of beautiful insight, Professor Russell has connected Szemeredi’s theorem on arithmetic progressions with this question.

Let $AP_k$ denote an arithmetic progression of length $latek k$. Let $[N] := \{1, 2, \cdots, N\}$ for $N \in \mathbb{N}$.

The Szemerédi theorem says, any “sufficiently dense” subset $S \subset [N]$ contains infinitely many $AP_k$ for all $k \in \mathbb{N}$. A finitary version says: Fix your favourite $k \in \mathbb{N}, \delta \in [0, 1]$. Then,  there exists a natural $N := N_{k, \delta}$ such that if you look any subset $S \subset [N]$ of size at least $\delta N$, you will find an $AP_k$. Yet another version says:

Szemerédi’s Theorem. The size of the largest subset $S \subset [N]$ without an $AP_k$ cannot be too large; in particular, it is $o(N)$.

Recall that a function $f(x)$ is $o(g)$ if it grows too slow compared to $g(x)$, so that $\lim_{N\rightarrow \infty}{f(x)/g(x) = 0}$.

# When does the Discrete Fourier Transform Matrix have Nonsingular Submatrices?

I am studying a coding theory problem. The question is this:

Open Question: Is there a prime $p$ and a positive integer $d$ such that all submatrices of the $p\times p$ Discrete Fourier Transform matrix over the field $GF(2^d)$ are nonsingular?

Currently, I have only counterexamples: Let $d$ be the degree of the smallest extension over $GF(2)$ which contains a nontrivial $p$th root of unity. Then, I know a lot of primes $p$ for which the matrix $V$ has a singular submatrix.

In this post, I am going to show a failed attempt to answer this question using the results in this paper by Evra, Kowalski, and Lubotzky.

# Bounding the Supremum of a Gaussian Process: Talagrand’s Generic Chaining (Part 1)

This post is part of a series which answers a certain question about the supremum of a Gaussian process. I am going to write, as I have understood, a proof given in Chapter 1 of the book “Generic Chaining” by Michel Talagrand. I recommend the reader to take a look at the excellent posts by James Lee on this matter. (I am a beginner, James Lee is a master.)

Let $(T,d)$ be a finite metric space. Let $\{X_t\}$ be a Gaussian process where each $X_t$ is a zero-mean Gaussian random variable. The distance between two points $s,t\in T$ is the square-root of the covariance between $X_s$ and $X_t$. In this post, we are interested in upper-bounding $Q$.

Question: How large can the quantity $Q := \mathbb{E} \sup_{t\in T} X_t$ be?

In this post we are going to prove the following fact:

$\boxed{\displaystyle \mathbb{E}\sup_{t\in T}X_t \leq O(1)\cdot \sup_{t\in T}\sum_{n\geq 1}{2^{n/2}d(t,T_{n-1})} ,}$

where $(t, A)$ is the distance between the point $X_t$ from the set $A$, and $\{T_i\}$ is a specific sequence of sets with $T_i\subset T$. Constructions of these sets will be discussed in a subsequent post.

# Impagliazzo’s Hardcore Lemma: a Proof

Informally speaking, Impagliazzo’s hardcore lemma says that if a boolean function is “hard to compute on average” by small circuits, then there exists a set of inputs on which the same function is “extremely hard to compute on average” by slightly smaller circuits.

In this post, I am going to explain how I understand the proof of the hardcore lemma presented in the Arora-Barak complexity book (here). Because the formal proof can be found in the book I intend to write in an informal way. I think some subtleties are involved in turning the context of the lemma into a suitable two-player zero-sum game. Doing so enables one to use von Neumann’s minimax theorem to effectively “exchange the quantifiers” in the contrapositive statement of the lemma. Although the Arora-Barak proof mentions these subtleties, I am going to explore these in more detail and in a more accessible way for a beginner like me.