Informally speaking, Impagliazzo’s hardcore lemma says that if a boolean function is “hard to compute on average” by small circuits, then there exists a set of inputs on which the same function is “extremely hard to compute on average” by slightly smaller circuits.

In this post, I am going to explain how I understand the proof of the hardcore lemma presented in the Arora-Barak complexity book (here). Because the formal proof can be found in the book, I intend to write in an informal way. I think some subtleties are involved in turning the setting of the lemma into a suitable two-player zero-sum game. Doing so enables one to use von Neumann’s minimax theorem to effectively “exchange the quantifiers” in the contrapositive statement of the lemma. Although the Arora-Barak proof mentions these subtleties, I am going to explore them in more detail and in a way that is more accessible to a beginner like me.

Please let me know about any mistakes in this post. I make a lot of mistakes.

This post is organized as follows. First, we informally present the idea of hard-on-average functions. Then we present John von Neumann’s minimax theorem for two-player zero-sum games and mention some of its implications. Next, we state the hardcore lemma and proceed to prove it by contradiction. The contradictory assumption leads us to construct a two-player zero-sum game between a circuit player and an input player, both playing with pure (deterministic) strategies. Then we allow the players to use mixed strategies (which are distributions over the pure strategies). This enables us to use von Neumann’s minimax theorem to show that there exists a certain distribution on smaller circuits: circuits drawn from this distribution perform well on average against any suitable input distribution. By combining these circuits, we can construct a slightly larger circuit that correctly computes the supposedly average-case hard function on more than a 1-\delta fraction of inputs, contradicting the hardness assumption on the said function.

Average-Case Hard Functions

Consider a function f that is \delta-hard on average: that is, every small circuit C intended to compute f makes an error on at least a \delta fraction of inputs. Why small circuits? Because if we are allowed arbitrarily large circuits, we can always compute f exactly (for instance, by a circuit that hard-wires the entire truth table of f).

The hardness quantified in the above manner is called “average-case hardness”.
In contrast, if every small circuit fails to compute f on at least one input, f is called “worst-case hard”. Keep in mind that both average-case and worst-case hardness are parameterized by an upper bound on the circuit size, which determines the largest circuits being considered.

Why is f hard? Or rather, given a fixed circuit, why does it succeed in computing f on some inputs but fail on others? How is the “hardness” of f distributed over its inputs? When we are restricted to small circuits for computing f, are all inputs equally hard? Or are some inputs harder than others? An answer to these questions is given by Impagliazzo’s hardcore lemma.

von Neumann’s Minimax Theorem on Two-Player Zero-Sum Games

Suppose two players A and B are playing a game. A can pick a move i \in \{1, 2, 3, \cdots, m\} whereas B can pick a move j \in \{1, 2, 3, \cdots, n\}. They pick their moves without knowing the opponent’s choice. The “value” of the game, after A and B have made their respective choices i and j, is denoted by P(i,j). A always prefers a high value, so he is the “max player”. On the other hand, B always prefers a low value, and hence he is the “min player”. If A wins a certain value, it means B loses the same value. This can be modeled as A trying to maximize P(i,j) while B tries to minimize it. If A has chosen move i, the minimum gain he can expect is \min_j P(i,j). The worst-case gain for A over the entire game is then v_A=\max_i \min_j P(i,j). Similarly, the worst-case loss for B is v_B = \min_j \max_i P(i,j). The game is said to be in equilibrium if v_A = v_B, which means no player has any incentive to change his move as long as the other player does not change his. While it is not obvious whether such an equilibrium must exist when the players pick single moves deterministically, von Neumann’s 1928 theorem says that an equilibrium always exists once the two players are allowed to randomize, that is, to select their moves from distributions (so-called mixed strategies) over the candidate moves instead of always selecting one particular move. That is, suppose \mathcal{I} is a distribution on 1, 2, \cdots, m and \mathcal{J} is a distribution on 1, 2, \cdots, n. Let \mathbf{E}_{x}[f(x)] denote the expected value of f over a random input x. Then the minimax theorem says:

\max_{\mathcal{I}} \min_{\mathcal{J}} \mathbf{E}_{i\sim \mathcal{I},j\sim\mathcal{J}}[P(i,j)] =  \min_{ \mathcal{J}} \max_{\mathcal{I}} \mathbf{E}_{i\sim \mathcal{I},j\sim\mathcal{J}}[P(i,j)].

Some points to note:

  1. The minimax theorem says that each player has an optimal mixed strategy, namely a distribution over his moves that best limits the other player’s achievable payoff.
  2. It also says that the equilibrium value of the game for both players is the same regardless of who plays first.
  3. For an equilibrium to exist, it does not matter what the optimal distributions actually look like; they may even turn out to be concentrated on a single move. What matters is that each player is allowed to randomize over all of his moves.
  4. The minimax theorem is equivalent to the strong duality of linear programming. In fact, both sides of the above equation can be formulated as linear programs, each the dual of the other. The equality then follows directly from the fact that the optimum values attained by the primal and the dual programs are equal. (A small numerical illustration appears right after this list.)
  5. The domains of the distributions \mathcal{I} and \mathcal{J} have to be finite. In other words, the players must choose from a finite number of moves. This is because the existence of an equilibrium is proved only for finite games, which means both the number of players and the number of moves available to them have to be finite.
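To see point 4 in action, here is a minimal sketch in Python (assuming numpy and scipy are available; the payoff matrix P is arbitrary and chosen only for illustration). It solves the two dual linear programs, one for each player, and checks that both sides of the minimax identity coincide.

```python
import numpy as np
from scipy.optimize import linprog

# An arbitrary payoff matrix: rows are the max player's moves, columns are the
# min player's moves, and P[i, j] is the amount the min player pays the max player.
P = np.array([[3.0, 1.0, 4.0],
              [1.0, 5.0, 2.0]])
m, n = P.shape

def max_player_value(P):
    # Variables: the mixed strategy x_1..x_m and the guaranteed value v.
    # Maximize v subject to  sum_i x_i P[i, j] >= v  for every column j,
    # sum_i x_i = 1 and x >= 0.  linprog minimizes, so we minimize -v.
    c = np.concatenate([np.zeros(m), [-1.0]])
    A_ub = np.hstack([-P.T, np.ones((n, 1))])        # v - sum_i P[i, j] x_i <= 0
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return -res.fun

def min_player_value(P):
    # The dual program: minimize w subject to  sum_j P[i, j] y_j <= w  for every
    # row i, sum_j y_j = 1 and y >= 0.
    c = np.concatenate([np.zeros(n), [1.0]])
    A_ub = np.hstack([P, -np.ones((m, 1))])          # sum_j P[i, j] y_j - w <= 0
    A_eq = np.concatenate([np.ones(n), [0.0]]).reshape(1, -1)
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m),
                  A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.fun

print(max_player_value(P), min_player_value(P))      # the two values coincide
```

Both calls return the same number, which is exactly the equality asserted by the minimax theorem (and by strong LP duality).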

Knowing the Opponent’s Mixed Strategy is No Advantage

Notice that the minimax principle is a worst-case loss-aversion policy. Suppose the first player is the min player. From his perspective, the value of the game would be

v= \min_{\mathcal{J}} \max_{\mathcal{I}} \mathbf{E}_{i\sim \mathcal{I},j\sim\mathcal{J}}[P(i,j)].

This is a minimization problem for the first (min) player over the best responses of the second (max) player. The inner maximization is of a linear function (an expectation) over the convex polytope of the second player’s mixed strategies, and this polytope is bounded and nonempty because there are only finitely many moves. Hence the payoff achieves its optimum at some vertex of this polytope, and the vertices are exactly the pure strategies of the second player. Thus we can safely assume that the second player would ultimately choose a pure strategy as opposed to a proper convex combination of multiple pure strategies. Hence

v = \min_{\mathcal{J}} \max_{\mathcal{I}} \mathbf{E}_{i\sim \mathcal{I},j\sim\mathcal{J}}[P(i,j)] = \min_{\mathcal{J}} \max_{1 \leq i \leq m} \mathbf{E}_{j\sim\mathcal{J}}[P(i,j)].

Similarly, if instead the first player is the max player (so the second player is the min player), the value of the game would be

v = \max_{\mathcal{I}} \min_{\mathcal{J}} \mathbf{E}_{i\sim \mathcal{I},j\sim\mathcal{J}}[P(i,j)] = \max_{\mathcal{I}} \min_{1 \leq j \leq n} \mathbf{E}_{i\sim \mathcal{I}}[P(i,j)].

Fact. Against a known mixed strategy of the opponent, a player always has a pure best response; randomizing cannot improve on it. In particular, even if the first player knows the distribution used by the second player, he does not gain any extra advantage from it.
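As a quick numerical illustration of the reduction above (a sketch only; the payoff matrix and the fixed mixed strategy below are arbitrary), note that the expected payoff is linear in the responding player’s mixed strategy, so the best response to a known mixed strategy can always be taken to be pure:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((4, 5))     # arbitrary payoff matrix; rows belong to the max player
y = np.array([0.2, 0.1, 0.3, 0.25, 0.15])   # a known mixed strategy of the min player

# Expected payoff of each pure row i against y, i.e. E_{j~y}[P(i, j)].
row_payoffs = P @ y
best_pure = row_payoffs.max()

# Any mixed response x gives x @ P @ y, a convex combination of row_payoffs,
# so it can never beat the best pure row.
x = rng.dirichlet(np.ones(4))
assert x @ P @ y <= best_pure + 1e-12
print(best_pure, x @ P @ y)
```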

Back to the Hardcore Lemma

Informally, the hardcore lemma says that given an average-case hard function f, there is always a “large” set of inputs on which no small circuit can do much better than guessing. In other words, inputs from this set are hard for all small circuits that intend to compute f. This is why this set is called the hard core (or hardcore) of f.

Definitions

If a probability distribution H_\delta assigns at most 1/(\delta |D|) probability to any object in its domain D, it is called a \delta-density distribution. If a probability distribution is uniform on exactly a \delta fraction of the objects in its domain, it is called a \delta-flat distribution, or just a \delta-flat for short. Note that for a fixed \delta and a finite domain, there can be only a finite number of \delta-flats.
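To make the two definitions concrete, here is a minimal Python sketch on a toy domain (the helper names is_delta_density and delta_flat are mine, not standard): it checks the density cap 1/(\delta |D|) and builds a \delta-flat supported on a chosen subset.

```python
import numpy as np
from itertools import combinations

delta = 0.25
D = 16                          # a toy domain size; think of it as 2^n for n = 4

def is_delta_density(p, delta):
    # A delta-density distribution puts at most 1/(delta * |D|) mass on any point.
    return np.isclose(p.sum(), 1.0) and p.max() <= 1.0 / (delta * len(p)) + 1e-12

def delta_flat(support, size):
    # Uniform on exactly a delta fraction of the domain.
    p = np.zeros(size)
    p[list(support)] = 1.0 / len(support)
    return p

flat = delta_flat(range(int(delta * D)), D)   # uniform on the first 4 points
print(is_delta_density(flat, delta))          # True: every delta-flat is delta-dense

# For a fixed delta and a finite domain there are only finitely many delta-flats:
print(sum(1 for _ in combinations(range(D), int(delta * D))))   # C(16, 4) = 1820
```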

Suppose we are dealing with functions f that map an n-bit input to a single-bit output. Suppose there is a size bound S(n) such that we refer to circuits of size at most S(n) as small circuits. Additionally, we refer to circuits of size at most S(n)/\mathrm{poly}(n) as smaller circuits. The idea is that if we combine a polynomial number of smaller circuits, we get a small circuit.

Recall that a function f is \delta-hard on average if every small circuit C intended to compute f makes an error on at least a \delta fraction of the inputs x\in \{0,1\}^n.

For the following discussion, let us fix an arbitrary positive real \epsilon < 1/2. If a circuit C is able to compute f on inputs x drawn from a distribution H_\delta with probability \geq 1/2 + \epsilon, we say C performs well on H_\delta. Otherwise, we say C performs poorly on H_\delta.

Statement of the Hardcore Lemma

Consider the two statements below.

Statement A. f is \delta-hard for small circuits, which means every small circuit fails to compute f on at least a \delta fraction of inputs.

Statement B. There exists a \delta-density distribution H_\delta on n-bit strings such that all smaller circuits C perform poorly on H_\delta.

The hardcore lemma states that A \implies B.

See the definitions above for the phrases “performs poorly” and “small / smaller circuits”.

Proof of the Hardcore Lemma

We are going to prove A\implies B by contradiction. That is, we are going to assume that A is true but B is false, and then show that \bar{B} \implies \bar{A}, which contradicts the assumption that A holds.

Consider the statement \bar{B}, the negation of statement B, as follows.

Statement \bar{B}. For every \delta-density distribution H_\delta there exists some smaller circuit C which performs well on H_\delta.

Keep in mind that for the sake of contradiction, we are assuming that Statement \bar{B} is true.

A Finite Two-Player Zero-Sum Game on Pure Strategies

Statements B and \bar{B} smell a lot like a game between an input player and a circuit player. The input player moves first and selects an input distribution H_\delta. The circuit player moves second and selects a smaller circuit C. The input player then pays the circuit player the amount

P(H_\delta, C) = \mathbf{Pr}_{x\sim H_\delta} [C(x)=f(x)]

The circuit player prefers this amount to be large (he is the max player), while the input player prefers it to be small (he is the min player). Whatever one player gains, the other loses. Therefore, we can think of it as a two-player zero-sum game.

Consider selecting a single \delta-density distribution for the input player. This is a pure strategy for him, but there exist uncountably many \delta-density distributions on the set of inputs \{0, 1\}^n. This seemingly implies that the number of choices is not finite. However, there is an important fact relating every \delta-density distribution to the set of all possible \delta-flat distributions.

Fact 1. Every \delta-density distribution is a convex combination of the \delta-flat distributions.

See the section “Omitted Proofs” for a proof.

Thus selecting a particular H_\delta is the same as playing a convex combination of the finitely many \delta-flats. In game terms, we can take the \delta-flats to be the pure strategies of the input player; an arbitrary H_\delta then corresponds to a mixed strategy over them.

Now the game is finite, since both the number of smaller circuits and the number of \delta-flats are finite. The strategies (moves) are pure when the players deterministically select a single \delta-flat and a single circuit, respectively.

Statement \bar{B} implies that for every H_\delta produced by the input player, there exists a circuit C^* that performs well on H_\delta. In other words, playing second, the circuit player can always secure a payoff

v^* \geq 1/2 + \epsilon.

Since the circuit player is maximizing across all smaller circuits, he can guarantee himself a payoff of at least v^*, no matter which H_\delta the input player puts forth.

When the Input Player Uses a Mixed Strategy

What happens if we allow the input player to have randomized (so-called mixed) strategies? Suppose we allow the input player to select an arbitrary distribution \mathcal{H} over all possible \delta-density distributions. After that, he draws a specific H_\delta \sim \mathcal{H}. The circuit player still holds the advantage: thanks to statement \bar{B}, he has some well-performing circuit C for every H_\delta that the input player might draw.

The value attained by the circuit player at this stage is

v =  \min_{\mathcal{H}} \max_{C} \mathbf{Pr}_{H_\delta\sim \mathcal{H}, x\sim H_\delta} [C(x)=f(x)] = v^* \geq 1/2 + \epsilon.

However, there is a problem: a mixed strategy \mathcal{H} is a distribution on all possible \delta-density distributions, and the number of distinct H_\delta is uncountably infinite. How does the input player choose a mixed strategy over an infinite set of pure strategies? Luckily, the following fact saves the day.

Fact 2. Drawing a \delta-density distribution from any distribution over \delta-density distributions, and then drawing an input x from it, yields a \delta-density distribution on the inputs.

See the section “Omitted Proofs” for a proof.

Thus the two-stage process of selecting H_\delta \sim \mathcal{H} and then drawing x \sim H_\delta amounts to selecting a single \delta-density distribution, which, according to Fact 1, is in turn equivalent to taking a convex combination of the finitely many \delta-flat distributions. The input player’s mixed strategies therefore never take us outside the finite game on \delta-flats.

When Both Players Use Mixed Strategies

Next, suppose the circuit player also selects a distribution \mathcal{C} on the smaller circuits and then draws a circuit C\sim\mathcal{C} to play. Now both players are using mixed strategies and hence von Neumann’s minimax theorem applies. It tells us that an equilibrium in mixed strategies exists, and moreover, the value of the game is the same no matter which player plays first.

Value of the Game when the Circuit Player Plays Second

Recall the fact from the earlier section: against a known mixed strategy, the responding player always has a pure best response, so letting him randomize does not change the value. Since the circuit player is playing second and picking the circuit C from a certain distribution \mathcal{C}, the value of the game is the same as the value of the game in which he used only pure strategies, i.e., picked single circuits C instead of drawing C \sim \mathcal{C}. In other words,

v=\min_{ \mathcal{H}} \max_{\mathcal{C}} \mathbf{Pr}_{H_\delta\sim \mathcal{H}, x\sim H_\delta, C\sim \mathcal{C}} [C(x)=f(x)]

= \min_{ \mathcal{H}} \max_{C} \mathbf{Pr}_{H_\delta\sim \mathcal{H}, x\sim H_\delta} [C(x)=f(x)] = v^* \geq 1/2 + \epsilon.

This gives us the following fact.

Fact 3. Assuming statement \bar{B} is true, for every distribution H_\delta put forth by the input player, the circuit player always has some circuit C that performs well on it. In other words, the circuit player always wins a payoff of at least 1/2 + \epsilon playing second.

We can also see it the following way. When the circuit player picks a distribution instead of a single circuit, he has more power, and since he is maximizing, his payoff cannot decrease from what it was when he was restricted to only pure strategies.

Value of the Game when the Circuit Player Plays First

By von Neumann’s minimax theorem, when both players use the mixed strategies that lead to the equilibrium, the value obtained by the circuit player playing first is the same as the value he would have obtained playing second. But statement \bar{B} implies that he always wins at least v^* \geq 1/2 + \epsilon playing second. Therefore, he can secure the same payoff playing first. Isn’t it magic?

Fact 4. Assuming statement \bar{B} is true, there exists a distribution \mathcal{C} on smaller circuits such that for every \delta-density distribution H_\delta the input player may put forth, a circuit drawn as C\sim \mathcal{C} computes f correctly on x\sim H_\delta with probability at least 1/2 + \epsilon. In other words, the distribution \mathcal{C} performs well on average against every H_\delta.

Constructing a Small Circuit which Correctly Computes the Hard Function on a Large Fraction of Inputs

Suppose the circuit player has a distribution \mathcal{C} as promised by Fact 4 above. We can independently sample t = \mathrm{poly}(n) smaller circuits C_1, C_2, \cdots, C_t from \mathcal{C} and combine them into a slightly larger (but still small) circuit C(x) = \mathrm{majority}(C_1(x), C_2(x), \cdots, C_t(x)).
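Here is a minimal sketch of this combining step in Python. The sampled circuits are stood in for by noisy copies of a toy function f; the names f and sample_circuit and the value t = 201 are placeholders of my own, not part of the proof.

```python
import random

def majority_circuit(circuits):
    # Combine t smaller circuits into one circuit that outputs their majority vote.
    def C(x):
        votes = sum(Ci(x) for Ci in circuits)
        return 1 if 2 * votes > len(circuits) else 0
    return C

# Hypothetical stand-ins: in the proof, sample_circuit() would draw C_i ~ C
# (the distribution from Fact 4); here we just fabricate noisy copies of a toy f.
def f(x):
    return x % 2

def sample_circuit(eps=0.1):
    def Ci(x):
        # toy model: agrees with f(x) with probability 1/2 + eps on each call
        return f(x) if random.random() < 0.5 + eps else 1 - f(x)
    return Ci

t = 201                                        # in the proof, t = poly(n)
C = majority_circuit([sample_circuit() for _ in range(t)])
print(sum(C(x) == f(x) for x in range(1000)) / 1000)   # close to 1
```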

If a circuit C \sim \mathcal{C} computes f(x) correctly with probability at least 1/2 + \epsilon (over the choice of C), we say that the input x is good for the distribution \mathcal{C}. Otherwise, we say that x is bad for \mathcal{C}. Note that being good or bad is a property of an input and the circuit distribution \mathcal{C}; it does not depend on any specific input distribution.

Fact 5. Assuming \bar{B} is true, the total number of bad inputs must be strictly less than \delta 2^n.

Proof. Suppose otherwise. Then the input player can construct a \delta-flat distribution H^*_\delta which is uniform on \delta 2^n of the bad inputs for \mathcal{C}. Since every input in its support is bad, we have \mathbf{Pr}_{C\sim \mathcal{C}, x\sim H^*_\delta}[C(x)=f(x)] strictly less than 1/2 + \epsilon. This would mean that the value attained by the circuit player is strictly less than 1/2 + \epsilon, contradicting the value promised by the minimax theorem and the assumption \bar{B} (Fact 4). \square

Let x be a good input. Then each sampled circuit C_i computes f(x) correctly with probability \geq 1/2 + \epsilon, independently over the draws from \mathcal{C}. By a Chernoff bound, the probability that the majority vote C(x) \neq f(x) on this good x is at most e^{-2\epsilon^2 t}, which we can make smaller than 2^{-n} by choosing a suitable t = \mathrm{poly}(n) (see the calculation below). Because there are at most 2^n good inputs, by the union bound the probability that C(x) \neq f(x) for some good x is strictly less than 1. This means that with positive probability the sampled circuits give a circuit C which correctly computes f(x) on all good x; in particular, such a small circuit C exists. Moreover, because the number of bad x is strictly less than \delta 2^n, the fraction of inputs on which this small circuit C correctly computes f is strictly larger than 1-\delta. But this violates the given statement that f is \delta-hard on average.
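For concreteness, here is one way to fill in the parameters (a sketch; the constants are not optimized). On a good x, each C_i(x) independently agrees with f(x) with probability at least 1/2 + \epsilon, so by a Chernoff (Hoeffding) bound,

\mathbf{Pr}_{C_1, \cdots, C_t \sim \mathcal{C}}[C(x) \neq f(x)] \leq e^{-2\epsilon^2 t}.

Choosing t = n/\epsilon^2, which is \mathrm{poly}(n) for our fixed \epsilon, this is at most e^{-2n} < 2^{-n}. The union bound over at most 2^n good inputs then gives

\mathbf{Pr}[C(x) \neq f(x) \text{ for some good } x] \leq 2^n \cdot e^{-2n} = (2/e^2)^n < 1.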

Thus we have reached a contradiction: The statement \bar{B} cannot be true and hence the claim of the hardcore lemma follows. \square

References

  1. Arora-Barak proof of the hardcore lemma from the book “Computational Complexity – A Modern Approach (2009)”
  2. Mikio Nakayama’s excellent proof of von Neumann’s minimax theorem from strong LP Duality
  3. Tinne Hof Kjeldsen’s exposition on von Neumann’s proofs of the minimax theorem
  4. Jiri Matousek and Bernd Gartner’s proof of the strong LP duality from Farkas’ lemma
  5. Avner Magen’s lecture notes on how the strong LP duality implies von Neumann’s minimax theorem
  6. Jacob Fox’s lecture notes on the proof of Brouwer’s fixed point theorem from Sperner’s lemma

Omitted Proofs

Fact 1. Every \delta-density distribution is a convex combination of the \delta-flat distributions.

Proof. We will only outline the proof, which is given as a hint to an exercise in the Arora-Barak book. Consider an embedding of the distributions on \{0,1\}^n in 2^n-dimensional Euclidean space. Every such point lies inside the unit hypercube in the all-positive orthant. Consider the convex hull C of all the \delta-flat distributions, a convex polytope whose vertices are the \delta-flats; this polytope is a proper subset of the said hypercube. All points inside C are convex combinations of its vertices, and thus are distributions themselves.

Our claim states that every \delta-density distribution must lie inside this polytope C.

Suppose not. Then there exists a \delta-density distribution \bar{p} lying outside the polytope C. By the separating hyperplane theorem (closely related to Farkas’ lemma), a point either lies inside a closed convex body or there is a hyperplane separating it from the body. Let \bar{r} be the normal vector of that separating hyperplane. Then there exists a scalar t such that \langle \bar{x}, \bar{r} \rangle \leq t for all \bar{x} \in C, while \langle \bar{p}, \bar{r}\rangle > t.

Consider the inner product \langle \bar{p}, \bar{r}\rangle = \sum_{1 \leq i \leq 2^n}{p_i r_i}. Note that although 0 \leq p_i \leq 1/(\delta 2^n) for all i, the r_i s could attain any real value. Now let us reorder the coordinates so that the r_i s appear in decreasing order, and scan them from large to small. Let j = 1, 2, \cdots, 2^n index the coordinates in this order. If we find a coordinate with p_j < 1/(\delta 2^n), we “move” weight from the later coordinates (starting with p_{j+1}) into p_j until p_j reaches the cap: with d = 1/(\delta 2^n) - p_j, we transfer d units of weight forward. Doing so keeps \bar{p} inside the probability simplex: all weights remain nonnegative and their sum remains 1, so the moved point is still a distribution. Moreover, its inner product with \bar{r} cannot decrease, because weight only moves toward coordinates with larger (or equal) r values; hence the point remains separated from the polytope C. By repeating this, \bar{p} becomes a \delta-flat distribution: every nonzero coordinate has weight exactly 1/(\delta 2^n), and since the total weight stays 1, exactly \delta 2^n coordinates are nonzero. Hence we have found a \delta-flat distribution lying outside C, contradicting the fact that C is the convex hull of all the \delta-flat distributions to begin with. \square
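The weight-moving argument can be collapsed into a single greedy move and checked numerically. The sketch below (Python; building a random \delta-density distribution out of a few random \delta-flats is just for illustration) pushes all of the mass onto the \delta 2^n coordinates where \bar{r} is largest and verifies that the inner product with \bar{r} does not decrease.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
N = 2 ** n                       # size of the domain {0,1}^n
delta = 0.25
k = int(delta * N)               # a delta-flat is uniform on exactly k points
cap = 1.0 / (delta * N)          # the density cap 1/(delta * 2^n), here 1/k

def random_delta_flat():
    q = np.zeros(N)
    q[rng.choice(N, size=k, replace=False)] = 1.0 / k
    return q

# A delta-density distribution p, built as a convex combination of delta-flats
# (every coordinate stays below the cap, so p is indeed delta-dense).
weights = rng.dirichlet(np.ones(3))
p = sum(w * random_delta_flat() for w in weights)

r = rng.normal(size=N)           # an arbitrary direction (the separating normal)

# The weight-moving argument, collapsed into one greedy step: push all of the
# mass onto the k coordinates where r is largest.  The result is a delta-flat.
flat = np.zeros(N)
flat[np.argsort(r)[-k:]] = 1.0 / k

# Its inner product with r is at least as large as p's, so it would lie on the
# same (far) side of the separating hyperplane as p, which is the contradiction.
assert flat @ r >= p @ r - 1e-12
print(p @ r, flat @ r)
```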


Fact 2. Drawing a \delta-density distribution from any distribution over \delta-density distributions, and then drawing an input x from it, yields a \delta-density distribution on the inputs.

Proof. Let D_1, D_2, D_3, \cdots be \delta-density distributions on the strings \{0, 1\}^n (say, the ones in the support of the mixed strategy). By definition we have D_i(x) \leq 1/(\delta 2^n) for every x and every i. Suppose \mu is a distribution on D_1, D_2, D_3, \cdots, so that \mu(D_i) \leq 1 and \sum_i{\mu(D_i)}=1. Consider the following two-stage strategy for sampling a single string x \in \{0, 1\}^n. First, we select a distribution D_i \sim \mu. Then we select x\sim D_i. The probability that a particular x is drawn from this process is \sum_i{\mu(D_i)D_i(x)} \leq \frac{1}{\delta 2^n} \sum_i{\mu(D_i)} = \frac{1}{\delta 2^n}. Thus sampling in this way gives rise to a \delta-density distribution. Because \mu was an arbitrary distribution, our claim follows. \square
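A quick numerical sanity check of Fact 2 (a sketch with made-up parameters): mixing several \delta-density distributions according to an arbitrary \mu never pushes any point’s probability above the cap 1/(\delta 2^n).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
N = 2 ** n
delta = 0.25
cap = 1.0 / (delta * N)          # no point may receive more than this probability
k = int(delta * N)

def random_delta_density():
    # Average a few random delta-flats; the result respects the cap.
    flats = []
    for _ in range(4):
        q = np.zeros(N)
        q[rng.choice(N, size=k, replace=False)] = 1.0 / k
        flats.append(q)
    w = rng.dirichlet(np.ones(len(flats)))
    return sum(wi * qi for wi, qi in zip(w, flats))

# mu is an arbitrary distribution over several delta-density distributions D_i.
Ds = [random_delta_density() for _ in range(5)]
mu = rng.dirichlet(np.ones(len(Ds)))

# The two-stage sampling "draw D_i ~ mu, then x ~ D_i" induces this mixture:
mixture = sum(m * D for m, D in zip(mu, Ds))

print(np.isclose(mixture.sum(), 1.0), mixture.max() <= cap + 1e-12)  # True True
```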
