# Notes on the PCP Theorem and the Hardness of Approximation: Part 1

In this note, we are going to state the PCP theorem and its relation to the hardness of approximating some NP-hard problem.

## PCP Theorem: the Interactive Proof View

Intuitively, a PCP (Probabilistically Checkable Proof) system is an interactive proof system where the verifier is given $r(n)$ random bits and he is allowed to look into the proof $\pi$ in $q(n)$ many locations. If the string $x \in \{0,1\}^n$ is indeed in the language, then there exists a proof so that the verifier always accepts. However, if $x$ is not in the language, no prover can convince this verier with probability more than $1/2$. The proof has to be short i.e., of size at most $q(n) 2^{r(n)}$. This class of language is designated as PCP[r(n), q(n)].

Theorem A (PCP theorem). Every NP language has a highly efficient PCP verifier. In particular,

$NP = PCP[O(\log n), O(1)]$.

## PCP Theorem: the Hardness of Approximation View

Given a Boolean formula $\phi$ with $m$ clauses on $n$ variables, let $val(\phi)$ be the maximum fraction of satisfiable clauses, taken over all variables assignments. The MAX-3SAT problem asks you to find $val(\phi)$ for a given $\phi$. A $\rho$-approximation $A$ to the MAX-3SAT problem, for $0 < \rho \leq 1$, outputs a number $A(\phi) \geq \rho val(\phi)$.

It is easy to conceive a greedy $1/2$-approximation algorithm for MAX-3SAT: for the $i$th variable, choose the assignment that satisfies at least half of the remaining clauses, remove the satisfied clauses as well as the current variable from consideration, and repeat with the next variable. The celebrated algorithm of Goemans-Willamson (based on Semidefinite Programming) gives a $7/8-\epsilon$-approximation.

Theorem B (Approximating MAX-3SAT is NP-hard). There exists a $\rho^* < 1$ such that for every language $L$ in NP, there is an efficient mapping $f$ from the instances of the language to the instances of MAX-3SAT, so that if $x \in L$ then $val( f(x) ) = 1$, and otherwise $val( f(x) ) < \rho^*$.

The above theorem immediately implies that there is a constant $\rho^*$ such that if we have an efficient $\rho^*$-approximation algorithm for MAX-3SAT, then it could be used to decide every NP language. In other words, P = NP.

The qCSP problem is a generalization of the 3SAT problem where $3$-variable Boolean clauses are replaced by $q$-variable Boolean functions called constraints. The $\rho$-GAP$q$CSP problem is a decision problem where given a Boolean formula $\phi$, we have to decide whether $val(\phi)$ is one, or strictly below $\rho$.

Theorem C. There exists constants $\rho$ and $q$ such that the $\rho$-GAP$q$-CSP problem is NP-hard.

We claim that Theorem B is equivalent to the PCP Theorem (A), via Theorem C.

Theorem D. Theorems A, B, and C are equivalent.

Proof sketch: PCP implies GAP-CSP. Suppose NP equals $PCP[\log n, 1]$. Here, given any NP language $L$, the trick is to imagine a $q$-CSP constraint as a Boolean predicate $V_{x,r}$ which is true if there exists a $PCP[c \log n, q]$ verifier which accepts $x$. By the soundness and completeness of the PCP verifier, we have constructed a $1/2$-GAP$q$CSP instance. If one can decide this instance, he can also decide any NP language.

Proof sketch: GAP-CSP implies PCP. This is easy. Suppose you can decide an NP-hard $\rho$-GAP$q$CSP instance with $m$ clauses. You can construct a $PCP[\log m, q]$ verifier as follows: randomly choose a constraint $i \in [m]$ and query the $q$ literals in this clause. This verifier has completeness 1 and soundness $\rho$. The soundness can be boosted down to $1/2$ with sequential repetition.

Proof sketch: Approximate MAX-3SAT implies GAP-3CSP. This is easy, just treat each 3-CNF clause as a 3CSP constraint.

Proof sketch: GAP-3CSP implies Approximate MAX-3SAT. The idea is to treat a constraint as an AND of $q$-variable Boolean clauses, and then convert this formula into a 3-SAT formula.

## INDSET is Fundamentally Harder than MIN-VERTEX-COVER

A Minimum Vertex Cover of a graph is a minimal set of vertices such that every other vertex is a neighbor to a member of this set. Computing the size of the minimum vertex cover of a graph is the MIN-VERTEX-COVER problem. A $\rho$-approximation to MIN-VERTEX-COVER, for $\rho \geq 1$, outputs a set whose cardinality is at most a $1/\rho$ times larger than the size of the minimal cover.

Consider the following approximation algorithm for the minimum vertex cover in a graph $G$: let $S$ be the empty set. Add an arbitrary edge $e$ in $S$ (that is, its endpoints), then delete its endpoints from $G$ as well as all edges adjacent to $e$. Then repeat until no more edges can be included in $S$. Since the edges in $S$ is a matching, any vertex cover $C$ must contain at least one vertex for every edge in $S$; hence $\vert C \vert \geq \vert S \vert / 2$. Hence this algorithm is a $1/2$-approximation for MIN-VERTEX-COVER.

Recall that an Independent Set of a graph is a set of vertices that have no edges among themselves.

Fact (due to Tibor Gallai). The complement of the largest independent set in a  graph is a minimum vertex cover.

Proof sketch: Fix the largest independent set $S$ in any connected graph $G=(V,E)$. Each of the remaining vertices $u \in = \bar{S} = V \setminus S$ must have at least one edge with vertices in $S$; otherwise, $u$ would have been in $S$. The size of any minimal vertex cover must be at least $\vert \bar{S} \vert$: Clearly, $cover(S)=\bar{S}$ is a valid vertex cover, and if we exclude some $u \in \bar{S}$, we have to compensate by picking at least one vertex from $S$. If we use a different independent set $S^\prime$, the size of $cover(S^\prime)$ cannot be any smaller than that of $cover(S)$ since $S$ is the largest independent set.

Lemma (Approximating MIN-VERTEX-COVER is hard for some $\rho^\prime$).  There exists a constant $\rho^\prime < 1$ such that computing a $\rho^\prime$-approximation to the minimum vertex cover problem is NP-hard.

Proof sketch: Theorem B tells us that obtaining a $\rho$-approximation to the MAX-3SAT problem is NP-hard. Given a 3-CNF formula $\phi$, consider its conflict graph $G$: every clause is associated with seven vertices, one for each candidate assignments to its literals. (One of the eight possible assignments for a clause can never satisfy the clause: namely, the assignment culminating in a Boolean OR of three zeros.) There is an edge in the graph (between vertices associated with different clauses) if and only if the two clauses conflict on the same variable. For example, the two clauses $(a \vee \bar{b} \vee c)$ and $(b \vee d \vee e)$ conflict on the variable $b$; no assignment can satisfy them simultaneously. Suppose $G$ has $n$ vertices, in which case $\phi$ will have $N = n/7$ clauses.

The minimum vertex cover of $G$ has size $n - val(\phi)N$, since the largest independent set in $G$ has size $val(\phi)N$.

• Suppose $val(\phi) < \rho$. Then the size of the minimum vertex cover is at least $n - n\rho/7$ since $N = n/7$.
• Suppose $val(\phi) = 1$. A $\rho^\prime$ approximation to the minimum vertex cover would give us a vertex cover of size $(n-N)/\rho^\prime = 6n/7\rho^\prime$ since $N = n/7$. If we set $\rho^\prime = 6/(7-\rho)$, then this size is at most $(n/7)(7-\rho) = n - n\rho/7$.

Thus a $\rho^\prime$-approximation to the minimum vertex cover on $G$ would allows us to decide whether $val(\phi) < \rho$ which, by Theorem B, is NP-hard. Thus a $\rho^\prime$-approximation to the minimum vertex cover problem is NP-hard as well. (End proof.)

Computing the size of the largest independent set of a graph is called the INDSET problem.

Lemma (Approximating INDSET is hard for some $\rho$). There exists a real constant $\rho < 1$ such that a $\rho$-approximation to INDSET is NP-hard.

Proof sketch: Consider the graph $G$ in the preceding proof. In particular, the largest independent set of $G$ has size $val(\phi) N$. If we can make a $\rho$-approximation to the largest independent set problem on $G$, we can use it to obtain a $\rho$-approximation to the MAX-3CNF problem; but this is NP-hard by Theorem B. Hence a $\rho$-approximation to the largest independent set problem on $G$ is NP-hard as well. (End Proof.)