Informally speaking, Impagliazzo’s hardcore lemma says that if a boolean function is “hard to compute on average” by small circuits, then there exists a set of inputs on which the same function is “extremely hard to compute on average” by slightly smaller circuits.
In this post, I am going to explain how I understand the proof of the hardcore lemma presented in the Arora-Barak complexity book (here). Because the formal proof can be found in the book I intend to write in an informal way. I think some subtleties are involved in turning the context of the lemma into a suitable two-player zero-sum game. Doing so enables one to use von Neumann’s minimax theorem to effectively “exchange the quantifiers” in the contrapositive statement of the lemma. Although the Arora-Barak proof mentions these subtleties, I am going to explore these in more detail and in a more accessible way for a beginner like me.
Please let me know about any mistakes in this post. I make a lot of mistakes.
This post is organized as follows. First, we informally present the idea of hard-on-average functions. Then we present John von Neumann’s minimax theorem for two-player zero-sum games and mention some of its implications. Next, we mention the hardcore lemma and proceed to prove it by contradiction. The contradictory assumption would lead us to construct a two-player zero-sum game between a circuit player and an input player, both playing with pure (deterministic) strategies. Then we allow the players to use mixed strategies (which are distributions on the pure strategies). This enables us to use von Neumann’s minimax theorem to show that there exists a certain distribution on small circuits. Circuits drawn from this distribution perform well on average on any suitable input distribution. By combining these circuits, we can construct a slightly larger circuit that would exactly compute the supposedly average-case hard function on a large fraction of inputs, contradicting the hardness assumption on the said function.
Average-Case Hard Functions
Consider the function that is
-hard on average: that is, every small circuit
intended to compute
makes an error on at least
fraction of inputs. Why small circuits? Because if we are allowed to have arbitrarily large circuits we can compute
exactly.
The hardness quantified in the above manner is called “average-case hardness”.
In contrast, if every small circuit fails to compute on every input,
is called “worst-case hard”. Keep in mind that both average-case and worst-case hardness is parameterized with an upper bound on the circuit size, which determines the largest size of the circuit being considered.
Why is hard? Or rather, given a fixed circuit, why does it succeed in computing
on some inputs but fails on the other inputs? How is the “hardness” of
distributed on its inputs? When we are restricted with small circuits to compute
, are all inputs equally hard? Or some inputs harder than others? An answer to these questions is given by Impagliazzo’s hardcore lemma.
von Neumann’s Minimax Theorem on Two-Player Zero-Sum Games
Suppose two players and
are playing a game.
can pick a move
whereas
can pick a move
. They pick their moves this without knowing the opponent’s choice. The “value” of the game, after
and
has made their respective choices
and
, is denoted by
.
always prefers a high value, so he is the “max player”. On the other hand,
always prefers a low value, and hence he is the “min player”. If
wins a certain value, it means
loses the same value. This can be modeled as
trying to maximize
whereas
trying to minimize
. If
has chosen move
, the minimum gain he can expect is
. The worst case gain for
for the entire game is then
. Similarly, the worst case loss for
is
. The game is said to be in equilibrium if $v_A = v_B$, which means no player would have any incentive to change his move as long as the other player does not change his move. While it is not obvious whether an equilibrium should exist, von Neumann’s 1928 theorem says that an equilibrium would exist if the two players select their move from any distribution (that is not a Dirac delta function) on the candidate moves instead of always selecting a particular move. That is, suppose
is a distribution on
and
is a distribution on
. Let
denote the expected value of
on its inputs,
. Then minimax theorem says:
Some points to note:
- The minimax theorem says that both players have a winning move as long as they pick from their own distributions. Here, a winning move is a move that best limits the other player’s amount of success.
- It also says that the equilibrium value of the game for both players is the same regardless of who plays first.
- For an equilibrium to exist, it does not matter what the actual distributions are. All that matters is that for each player, there exist multiple moves with positive probability.
- The minimax theorem is equivalent to the strong duality of linear programming. In fact, both sides of the above equation can be formulated as linear programs, each dual of the other. The proof of equality then directly follows from the fact that the optimum values attained by both the primal and the dual programs are equal.
- Lacks explanation. The domain of the distributions
and
has to be finite. In other words, the players must choose from a finite number of moves. This is because the existence of Nash equilibrium is proved only for finite games, which means both the number of players and the number of moves available to them have to be finite.
Knowing the Opponent’s Mixed Strategy is No Advantage
Notice that the minimax principle is a worst-case loss-aversion policy. Suppose the first player is the min player. From his perspective, the value of the game would be
.
This is a minimization problem for the first (min) player over all possible choices of the second (max) player. However, the inner maximization problem is bounded and feasible because of the finite number of choices and the bounded payoff function. Hence the payoff function would achieve its optimum value at some vertex of the convex feasibility polytope whose vertices are all the candidate pure strategies of the second player. Thus we can safely assume that the second player would ultimately choose a pure strategy as opposed to selecting a convex combination of multiple pure strategies. Hence
Similarly, the equilibrium value of the second (max) player would be
Fact. Even if the first player knows the distribution used by the second player, he does not gain any extra information or advantage from it.
Back to the Hardcore Lemma
Informally, the hardcore lemma says that given an average-case hard function , there is always a “large” set of inputs on which no small circuit can do much better than guessing. In other word, inputs from this set are hard for all small circuits that intend to compute
. This is why this set is called the hard core (or hardcore) of
.
Definitions
If a probability distribution assigns at most
probability to any objects in its domain
, it is called a
-density distribution. If a probability distribution is uniform on exactly
fraction of the objects in its domain, it is called a
-flat distribution, or just
-flat in short. Note that for a fixed
and a finite domain, there can be only a finite number of
-flats.
Suppose we are dealing with functions that take an
bit input to a single bit output. Suppose there exists a function
such that we mention circuits of size at most
as small circuits. Additionally, we mention circuits of size at most
as smaller circuits. The idea is that if we combine a polynomial number of smaller circuits, we get a small circuit.
Recall that a function is
-hard on average if every small circuit
intended to compute
makes an error on at least
fraction of inputs
.
If a circuit is able to compute
on the inputs
drawn from a distribution
with probability
for any fixed
, we say
performs well on
. Otherwise, we say
performs poorly on
. For the following discussion, let us fix an arbitrary positive real
.
Statement of the Hardcore Lemma
Consider the two statements below.
Statement .
is
-hard for small circuits, which means every small circuit fails to compute
on at least
fraction of inputs.
Statement . There exists a
-density distribution
on
-bit strings such that all smaller circuits
perform poorly on
.
The hardcore lemma states that .
See the definitions above for the phrases “performs poorly” and “small / smaller circuits”.
Proof of the Hardcore Lemma
We are going to prove by contradiction. That is, we are going to assume that
is true but
is false, and then proceed to show that
, a contradiction.
Consider the statement , which is the contrapositive of the statement
, as follows.
Statement . For every
-density distribution
there exists some smaller circuit
which performs well on
.
Keep in mind that for the sake of contradiction, we are assuming that Statement is true.
A Finite Two-Player Zero-Sum Game on Pure Strategies
Statements and
smells a lot like a game between an input player and a circuit player. The input player moves first and selects an input distribution
. The circuit player moves second and selects a circuit
. If
performs well on
, the input player pays the circuit player an amount of
for a fixed positive $\epsilon < 1/2$. Otherwise, the circuit player pays to the input player the above amount. Therefore, we can think of it as a two-player zero-sum game.
Consider selecting a single -density distribution for the input player. This is a pure strategy for him, but there exists uncountably many
-density distributions on the set of inputs
. This seemingly implies that the number of choices is not finite. However, there is an important fact relating every
-density distribution to the set of all possible
-flat distributions.
Fact 1. Every -density distribution is a convex combination of the
-flat distributions.
See the section “Omitted Proofs” for a proof.
Thus selecting a particular is equivalent to taking a convex combination of a finite number of
-flats, which is a finite operation.
Now the game is finite since both the number of small circuits and the number of -density distributions are bounded. The strategies (moves) are pure because both players select their moves deterministically.
Statement implies that for every
produced by the input player, there exists a circuit
that performs well of
. Suppose the payoff for this particular circuit is
.
Since the circuit player is maximizing across all suitable circuits, he can guarantee himself a payoff of at least .
When the Input Player Uses a Mixed Strategy
What happens if we allow the input player to have randomized (so-called mixed) strategies? Suppose we allow the input player to select an arbitrary distribution over all possible
-distributions. After that, he can draw a specific
. The circuit player still holds the advantage since thanks to the statement
he has some well-performing circuit
for every
that the input player chooses.
The value attained by the circuit player at this stage is
.
However, there is a problem: a mixed strategy is a distribution on all possible
-density distributions. We can easily see that the number
= the number of distinct
is uncountably infinite. How does the input player choose his mixed strategy from an infinite set of pure strategies? Luckily, the following fact saves the day.
Fact 2. Any distribution on all -density distributions is itself a
-density distribution.
See the section “Omitted Proofs” for a proof.
Thus selecting a -density distribution
is the same as selecting an arbitrary
-density distribution which, according to Fact 1, is equivalent to taking a convex combination of a finite number of
-flat distributions.
When Both Players Use Mixed Strategies
Next, suppose we force the circuit player to select an arbitrary distribution on smaller circuits. Then he can draw a circuit
to play. Now both players are using mixed strategies and hence von Neumann’s minimax theorem applies. It tells us that an equilibrium with mixed strategies exists for both players, and moreover, the value of the game is the same no matter which player plays first.
Value of the Game when the Circuit Player Plays Second
Recall the fact that knowing the distribution used by the second player yields no advantage to the first player when both players use mixed strategies in a two-player zero-sum game. Since the circuit player is playing second and he is picking the circuit from certain distribution
, the value of the game would be the same as the value of the game where he used only pure strategies i.e., picking single circuits
instead of drawing
. In other words,
.
This gives us the following fact.
Fact 3. Assuming statement is true, for every distribution
put forth by the input player, the circuit player always has some circuit
that performs well on it. In other words, the circuit player always wins a payoff at least
playing second.
We can also see it the following way. When the circuit player picks a distribution instead of a single circuit, he has more power and since we are maximizing, his payoff cannot decrease from what it was when he was restricted to only pure strategies.
Value of the Game when the Circuit Player Plays First
By von Neumann’s minimax theorem, when both players use the mixed strategies that lead to the equilibrium, the value obtained by the circuit player playing first would be the same as the value he would have obtained playing second. But statement implies that he would always have at least
amount playing second. Therefore, he would also have the same payoff playing first. Isn’t it magic?
Fact 4. Assuming statement is true, there exists a distribution
on small circuits for the circuit player such that for every mixed strategy
used by the input player, the circuit player can select a circuit
which always performs well on any distribution
.
Constructing a Small Circuit which Correctly Computes the Hard Function on a Large Fraction of Inputs
Suppose the circuit player has a distribution according to Fact 4 above. We can independently sample
number of small circuits
from
and construct a silghtly larger circuit
.
If the probability that all circuits are able to compute
with probability at least
for any positive
, we say that the input
is good for the distribution
. Otherwise, we say that the input
is bad for the distribution
. Note that being good or bad is a property of an input and the circuit distribution
and it does not depend on any specific input distribution.
Fact 3. Assuming is true, the total number of bad inputs must be strictly less than
.
Proof. Because otherwise for every circuit distribution the input player can construct a
equivalent to a
-flat distribution
which would be uniform on the bad inputs for
. Then no circuit
would be able to perform well on
. This would mean that the value attained by the circuit player would be strictly less than
, contradicting the value promised by the minimax theorem and the assumption
.
Let be a good input. Then with probability
, each smaller circuit
will succeed in computing
. Then by a Chernoff bound, the probability that
on a good
is strictly less than $2^{-O(n)}$ because
. Because there can be at most
good
, by union bound, the probability that
for a good
is strictly less than
. This means with positive probability there exists a circuit
which correctly computes
on all good
. However, because the number of bad
is strictly less than
, the fraction of inputs on which the small circuit
succeeds in correctly computing
is strictly larger than
. But this violates the given statement that
is
-hard on average.
Thus we have reached a contradiction: The statement cannot be true and hence the claim of the hardcore lemma follows.
References
- Arora-Barak proof of the hardcore lemma from the book “Computational Complexity – A Modern Approach (2009)”
- Mikio Nakayama’s excellent proof of von Neumann’s minimax theorem from strong LP Duality
- Tinne Hof Kjeldsen’s exposition on von Neumann’s proofs of the minimax theorem
- Jiri Matousek and Bernd Gartner’s proof of the strong LP duality from Farkas’ lemma
- Avner Magen’s lecture notes on how the strong LP duality implies von Neumann’s minimax theorem
- Jacob Fox’s lecture notes on the proof of Brouwer’s fixed point theorem from Sperner ‘s lemma
Omitted Proofs
Fact 1. Every -density distribution is a convex combination of the
-flat distributions.
Proof. We will only outline the proof, which is given as a hint on an exercise in the Arora-Barak book. Consider an embedding of the distributions on i in a
dimensional metric space. Every point must be contained inside the hypercube of side
located at the all-positive orthant. Consider the simplex
formed by all the
-flat distributions; this simplex is a proper subset of the said hypercube. All points inside the simplex
are convex combinations of the vertices. Thus these points are distributions themselves.
Our claim states that every -density distribution must occur inside the simplex including its convex hull.
Suppose not. Then there exists a -density distribution
occurring outside the simplex
. By Farkas’s lemma, every point must occur either inside a convex body, or there exists a hyperplane separating it from the convex body. Let
be the normal vector to that separating plane. Then there must exist a nonnegative scalar
such that
for all
, and
.
Consider the inner product . Note tha although
for all
, the
s could attain any real value. Now let us reorder the terms in the above sum in a decreasing order and scan the terms from large values to small values. Let
index the terms in this order. If we find a
terms such that
, we “move” some weight from
to
so that the weight in
becomes
. Sepcifically, we do the following:
. This is equivalent to moving
along the surface of the norm-1 ball centered at the origin. Doing so would ensure that the moved point still remains a distribution. However, its inner product with
will strictly increase because of our descending ordering, meaning the point will still be separated from the simplex
. By doing this repeatedly
will become a
-flat distribution because every non-zero coordinate will have weight
and because we always stayed on the norm-1 ball, the sum of all weights must be 1. Hence we found a
-flat distribution which is outside the simplex
, contradicting the statement that
is the simplex constructed from all the
-flat distributions to begin with.
Fact 2. Any distribution on all -density distributions is itself a
-density distribution.
Proof. Let are all the
-density distributions on the strings
. Consequently we have
for all
. Suppose
is a distribution on
such that we have
. Let us consider the following sampling strategy for sampling a single string string
as follows. First, we fix an arbitrary distribution
on the
s. Then we select a distribution
. Finally we select
. The probability that a particular
will be drawn from this process is
. Thus sampling in this way gives rise to a
-density distribution. Because
was an arbitrary distribution, our claim follows.