What’s the simplest nontrivial algorithm for Max-2AND?

Let’s say I barely want to do more work than flipping a coin for each variable…

Monday, September 12, 2022

tcs streaming-csps

$$ % math operators \newcommand{\poly}{\operatorname{poly}} \newcommand{\polylog}{\operatorname{polylog}} \newcommand{\rank}{\operatorname{rank}} \newcommand{\sign}{\operatorname{sign}} % functions \newcommand{\And}{\textsf{And}} \newcommand{\Or}{\textsf{Or}} \newcommand{\Xor}{\textsf{Xor}} \newcommand{\Sat}{\textsf{Sat}} \newcommand{\qed}{\blacksquare} % some letters \newcommand{\BA}{\mathbb A} \newcommand{\BB}{\mathbb B} \newcommand{\BC}{\mathbb C} \newcommand{\BD}{\mathbb D} \newcommand{\BE}{\mathbb E} \newcommand{\BF}{\mathbb F} \newcommand{\BG}{\mathbb G} \newcommand{\BH}{\mathbb H} \newcommand{\BI}{\mathbb I} \newcommand{\BJ}{\mathbb J} \newcommand{\BK}{\mathbb K} \newcommand{\BL}{\mathbb L} \newcommand{\BM}{\mathbb M} \newcommand{\BN}{\mathbb N} \newcommand{\BO}{\mathbb O} \newcommand{\BP}{\mathbb P} \newcommand{\BQ}{\mathbb Q} \newcommand{\BR}{\mathbb R} \newcommand{\BS}{\mathbb S} \newcommand{\BT}{\mathbb T} \newcommand{\BU}{\mathbb U} \newcommand{\BV}{\mathbb V} \newcommand{\BW}{\mathbb W} \newcommand{\BX}{\mathbb X} \newcommand{\BY}{\mathbb Y} \newcommand{\BZ}{\mathbb Z} \newcommand{\CA}{\mathcal A} \newcommand{\CB}{\mathcal B} \newcommand{\CC}{\mathcal C} \newcommand{\CD}{\mathcal D} \newcommand{\CE}{\mathcal E} \newcommand{\CF}{\mathcal F} \newcommand{\CG}{\mathcal G} \newcommand{\CH}{\mathcal H} \newcommand{\CI}{\mathcal I} \newcommand{\CJ}{\mathcal J} \newcommand{\CK}{\mathcal K} \newcommand{\CL}{\mathcal L} \newcommand{\CM}{\mathcal M} \newcommand{\CN}{\mathcal N} \newcommand{\CO}{\mathcal O} \newcommand{\CP}{\mathcal P} \newcommand{\CQ}{\mathcal Q} \newcommand{\CR}{\mathcal R} \newcommand{\CS}{\mathcal S} \newcommand{\CT}{\mathcal T} \newcommand{\CU}{\mathcal U} \newcommand{\CV}{\mathcal V} \newcommand{\CW}{\mathcal W} \newcommand{\CX}{\mathcal X} \newcommand{\CY}{\mathcal Y} \newcommand{\veca}{\mathbf a} \newcommand{\vecb}{\mathbf b} \newcommand{\vecc}{\mathbf c} \newcommand{\vecd}{\mathbf d} \newcommand{\vece}{\mathbf e} \newcommand{\vecf}{\mathbf f} \newcommand{\vecg}{\mathbf g} \newcommand{\vech}{\mathbf h} \newcommand{\veci}{\mathbf i} \newcommand{\vecj}{\mathbf j} \newcommand{\veck}{\mathbf k} \newcommand{\vecl}{\mathbf l} \newcommand{\vecm}{\mathbf m} \newcommand{\vecn}{\mathbf n} \newcommand{\veco}{\mathbf o} \newcommand{\vecp}{\mathbf p} \newcommand{\vecq}{\mathbf q} \newcommand{\vecr}{\mathbf r} \newcommand{\vecs}{\mathbf s} \newcommand{\vect}{\mathbf t} \newcommand{\vecu}{\mathbf u} \newcommand{\vecv}{\mathbf v} \newcommand{\vecw}{\mathbf w} \newcommand{\vecx}{\mathbf x} \newcommand{\vecy}{\mathbf y} \newcommand{\vecz}{\mathbf z} \newcommand{\veczero}{\mathbf 0} \newcommand{\vecone}{\mathbf 1} % math \renewcommand{\tilde}{\widetilde} \renewcommand{\hat}{\widehat} \newcommand{\eqdef}{\;\stackrel{\small \text{def}}{=}} \newcommand{\tpose}{\top} % probability \newcommand{\Exp}{\mathop{\mathbb E}} % complexity \newcommand{\P}{\mathbf P} \newcommand{\NP}{\mathbf{NP}} \newcommand{\NC}{\mathbf{NC}} \newcommand{\AC}{\mathbf{AC}} % algebra \newcommand{\im}{\operatorname{im}} \newcommand{\dist}{\operatorname{dist}} \newcommand{\wt}{\operatorname{wt}} $$
$$ \newcommand{\mtwoand}{\textsf{Max-2AND}} \newcommand{\mthreeand}{\textsf{Max-3AND}} \newcommand{\mkand}{\textsf{Max-}k\textsf{AND}} \newcommand{\mdcut}{\textsf{Max-DiCut}} \newcommand{\mcsp}{\textsf{Max-CSP}} \newcommand{\value}{\textsf{value}} \newcommand{\bias}{\textsf{bias}} \newcommand{\sbias}{\textsf{bias}^\pm} \newcommand{\pos}{\textsf{pos}} \newcommand{\neg}{\textsf{neg}} $$

I’m writing this post to describe some very simple approximation algorithms for the \(\mtwoand\) problem. These are based on recent joint work with Joanna Boyland, Michael Hwang, Tarun Prasad, and Santhoshini Velusamy [1].1 I’d like to stress that these algorithms aren’t exactly “new”: They are simplified versions of recent algorithms by Chou, Golovnev, and Velusamy [4], and do not quantitatively improve the performance. Instead, the main contribution of our paper [1] is showing that this performance can be achieved by algorithms which are “as simple as possible”, in a precise sense — and exactly why this should be the case is still a mystery.

The original motivation behind the work of Chou et al. [4], and our subsequent work [1], was designing algorithms for \(\mtwoand\) which can be implemented efficiently in the so-called “streaming model”. I’ll stick to the standard (“classical”) model for now, and defer talking about the streaming model until the last section below.


I’ll start with a brief teaser of what’s to come — feel free to skip to the definition of the \(\mtwoand\) problem below.

In an instance \(\Psi\) of \(\mtwoand\), say a variable \(x_i\) is “positively-biased” if the positive literal \(x_i\) is more common than the negative literal \(\overline{x_i}\) in \(\Psi\). Consider an algorithm which “searches” for good assignments in \(\Psi\). We say the algorithm is super-oblivious2 if it behaves in the following way: It selects some probability \(\frac12 \leq p \leq 1\), and randomly assigns positively-biased variables to \(1\) w.p. \(p\) and \(0\) w.p. \(1-p\) (and, correspondingly, negatively-biased variables to \(0\) w.p. \(p\) and \(1\) w.p. \(1-p\)). Thus, super-oblivious algorithms interpolate between the “greedy” algorithm which assigns all positively-biased variables to \(1\) and all negatively-biased vertices to \(0\), and the “trivial” algorithm which assigns all variables to \(1\)/\(0\) uniformly at random. We’ll show that the best approximation factor a super-oblivious algorithm can achieve is \(4/9\), and this corresponds to always picking \(p = 2/3\).

We also consider the (seemingly) weaker “measurement” task of estimating the value \(\value_\Psi\) of the best assignment. Given an instance \(\Psi\), we consider a scalar parameter called the bias of \(\psi\), denoted \(\bias_\Psi\), which measures the typical magnitude of the difference between the number of positive and negative appearances of a variable. We say an algorithm is bias-based if its output (an estimate for \(\value_\Psi\)) is a function only of \(\bias_\Psi\). We show that the optimal bias-based algorithm simply outputs a linear function of the bias and also achieves a \(4/9\) ratio.

The analyses for these two classes of algorithms, bias-based and super-oblivious, are intertwined, and rely only on very elementary “discrete math” tools. However, it’s not apparent (to me at least) why the numbers should “magically” work out to give tight bounds! Is there some “structural” property of the \(\mtwoand\) problem that suggests why this should be the case?

In our joint work [1], we prove an analogous results for several families of Boolean CSPs, including \(\mkand\) for all \(k \geq 3\). However, from the streaming perspective, we don’t seem to know how to prove lower bounds for \(\mkand\) for any \(k \geq 3\), and I hope that further understanding the situation for \(k=2\) may shed some light on this.

Defining the problem

An “instance” of \(\mtwoand\) is defined on some set of “variables”, and consists of a bunch of “constraints”; each constraint identifies two variables and declares that each must take some value. For instance, consider the following instance \(\Psi\) of \(\mtwoand\) on some variables \(x_1,\ldots,x_4\):

\[\begin{align*} x_1 &\wedge \overline{x_3} \\ x_2 &\wedge x_3 \\ x_3 &\wedge x_4 \\ x_2 &\wedge \overline{x_4} \end{align*}\]

Here, \(\wedge\) denotes Boolean conjunction and \(\overline{}\) denotes Boolean negation. Hence, the first constraint is satisfied if “\(x_1 = 1\) and \(x_3 = 0\).” Some quick vocab: Within a constraint, each variable or negated variable is called a “literal”, and we refer to these as “positive” and “negative” literals, respectively.

If we want to satisfy \(\Psi\), we could start out by “listening to” the first one and setting \(x_1 = 1\) and \(x_3 = 0\). This already means we can’t satisfy the second or third constraints. The fourth can still be satisfied: We set \(x_2 = 1\) and \(x_4 = 0\). Overall, this gives an “assignment” \(\vecx = 1100\) satisfying \(\frac12\) of the constraints. If \(\Psi\) denotes the above instance, we say “\(\vecx\) has value \(\frac12\) on \(\Psi\)”, and denote this by \(\value_\Psi(\vecx)=\frac12\).

Can we do better with a different assignment \(\vecx \in \{0,1\}^4\)? I wrote the following Python code to check:

from itertools import product
for x_1, x_2, x_3, x_4 in product([0,1], repeat=4):
    val = 0
    val += x_1 and not x_3
    val += x_2 and x_3
    val += x_3 and x_4
    val += x_2 and not x_4
    print("value of {}{}{}{} is {}/4".format(x_1, x_2, x_3, x_4, val))

According to the code, the answer is no, but this naïve approach is time-consuming: If we had \(n\) variables, we’d have to try all \(2^n\) possible assignments. Unfortunately, nobody knows any algorithm for finding the best solution which run much faster than this.


Consider an arbitrary instance \(\Psi\) of \(\mtwoand\), which consists of pairs of literals.3 Let \(\value_\Psi = \max_{\vecx \in \{0,1\}^n} \value_\Psi(\vecx)\) denote the value of the best possible assignment(s) for \(\Psi\). Ideally, we’d like algorithms which find optimal assignments for \(\Psi\) (call this the “search” version of \(\mtwoand\)), or at least calculate its value \(\value_\Psi\) (the “measurement” version of \(\mtwoand\)). But since these problems don’t seem to be tractable, we instead consider “relaxed” versions of the problems requiring only “approximately correct” solutions.

For instance, for the search problem, we want to find an assignment whose value “isn’t too much worse” than the value of the optimal assignment. Namely, given some \(0 \leq \alpha \leq 1\), we want to find an assignment \(\vecx\) such that

\[\alpha \; \value_\Psi \leq \value_\Psi(\vecx) \leq \value_\Psi.\]

In this case, we say we’re “\(\alpha\)-approximating” \(\value_\Psi\).

Similarly, in the measurement version, we want to “estimate” \(\value_\Psi\), by finding some estimate \(0 \leq v \leq 1\) such that \(\alpha \; \value_\Psi \leq v \leq \value_\Psi\).

The trivial approximation

Given an instance \(\Psi\) of \(\mtwoand\) on \(n\) variables, consider sampling an assignment \(\vecx \in \{0,1\}^n\) uniformly at random. Any particular constraint is satisfied with probability \(\frac14\), since we have to “get both variables right”. Thus, by linearity of expectation, we have

\[\Exp_\vecx[\value_\Psi(\vecx)] = \frac14.\]

This implies, by the probabilistic method, that every instance of \(\mtwoand\) has value \(\value_\Psi \geq \frac14\). In turn, this gives us an algorithm for approximate measurement: Given an instance \(\Psi\), ignore \(\Psi\) and output \(\frac14\). This is by definition a \(\frac14\)-approximation. Conversely, for every \(\epsilon > 0\) there are instances of \(\mtwoand\) with value \(\leq \frac14+\epsilon\); this means that \(\frac14\) is the best approximation we can get if we completely ignore the particular instance under consideration. Indeed, it’s a good exercise to check that a uniformly random instance on a sufficiently large number of variables will do the trick: first fix an assignment \(\vecx \in \{0,1\}^n\) and use Chernoff’s inequality to bound the probability that \(\value_\Psi(\vecx) > \frac14+\epsilon\), then take a union bound over all assignments \(\vecx \in \{0,1\}^n\). Hence, we term the \(\frac14\)-approximation the trivial approximation for \(\mtwoand\).

Similarly, for approximate search, for every \(\epsilon > 0\), the above algorithm outputs a \((\frac14-\epsilon)\)-approximation on sufficiently large instances.

Can we do better?

Setting up the algorithm

Bias of a variable

Let \(\pos_\Psi(i)\) and \(\neg_\Psi(i)\) denote the number of occurences of the literals \(x_i\) and \(\overline{x_i}\) in \(\Psi\), respectively. Then consider the quantity

\[\bias_\Psi(i) \eqdef \pos_\Psi(i) - \neg_\Psi(i).\]

If, for instance, \(\bias_\Psi(i)\) is a large positive number, that means \(x_i\) is much more common than \(\overline{x_i}\) in \(\Psi\), suggesting that we should assign \(x_i = 1\). Similarly, if \(\bias_\Psi(i)\) is a large negative number, we should assign it to \(0\). On the other hand, if \(\bias_\Psi(i)\) is small in magnitude, it may be “challenging to assign”, as \(x_i\) and \(\overline{x_i}\) are roughly equally common and so no matter what, we’ll end up violating close to half of the constraints involving \(x_i\) or its negation.

Total bias of an instance

Hence, it makes sense to consider the following quantity:

\[\bias_\Psi \eqdef \frac1{2m} \sum_{i=1}^n \lvert\bias_\Psi(i)\rvert,\]

where \(m\) is the total number of constraints in \(\Psi\). We have \(0 \leq \bias_\Psi \leq 1\); in particular, the upper-bound follows from the fact that even in the “best case” where every variable is “perfectly biased” (i.e., either \(\pos_\Psi(i)=0\) or \(\neg_\Psi(i)=0\)), then \(\sum_{i=1}^n \lvert\bias_\Psi(i)\rvert\) simply counts each constraint twice times. Note also that \(\bias_\Psi\) is easy to measure, using (say) \(\tilde{O}(m)\) time and \(\tilde{O}(n)\) space.

Intuitively, if \(\bias_\Psi \approx 1\) then we should expect \(\value_\Psi \approx 1\), as each variable is “almost perfectly biased” and therefore “easy to assign”. On the other hand, we can hope that if \(\bias_\Psi \approx 0\), then we can’t do much better than randomly assigning each variable, and thus that \(\value_\Psi(i) \approx \frac14\).

As I said in the teaser above, a “bias-based” algorithm for (the measurement version of) \(\mtwoand\) estimates \(\value_\Psi\) only using \(\bias_\Psi\). To analyze bias-based algorithms (and in particular to formalize the intuition that instances with high bias also have high value), we turn next to considering algorithms for the search version of \(\mtwoand\) which “treat variables the same if their biases have the same sign”.

Noisy greedy assignments

Given an instance \(\Psi\) and some \(0 \leq p \leq 1\), consider the following random assignment \(\vecz^p = (z_1^p,\ldots,z_n^p) \in \{0,1\}^n\). Each variable will be assigned independently. If \(\bias_\Psi(i) \geq 0\), then we assign \(z_i^p = 1\) w.p. \(p\) and \(z_i^p = 0\) w.p. \(1-p\). Similarly, if \(\bias_\Psi(i) < 0\), we assign \(z_i^p = 0\) w.p. \(p\) and \(z_i^p = 1\) w.p. \(1-p\). Again, as in the teaser, an algorithm for (the search version of) \(\mtwoand\) is super-oblivious if it outputs the assignment \(\vecz^p\) for some choice of \(p\). We denote the expected value of this assignment by

\[\lambda_\Psi(p) \eqdef \Exp_\vecz[\value_\Psi(\vecz)].\]

The algorithms and their analysis

The main goal now is to prove the following theorem, which states that bias-based and super-oblivious algorithms can achieve decent performance (almost twice the trivial threshold of \(\frac14\)!) for \(\mtwoand\).

Theorem (Chou-Golovnev-Velusamy [4], Boyland-Hwang-Prasad-Singer-Velusamy [1]). For every instance \(\Psi\) of \(\mtwoand\),

\[\frac49 \value_\Psi \leq \frac29(1+\bias_\Psi) \leq \lambda_\Psi(2/3) \leq \value_\Psi.\]

Thus, there is a super-oblivious algorithm which \(\frac49\)-approximates the search version of \(\mtwoand\), and a bias-based algorithm which \(\frac49\)-approximates the measurement version of \(\mtwoand\).

A quick note: Chou, Golovnev, and Velusamy [4] were the first to prove that super-oblivious and bias-based algorithms can \(\frac49\)-approximate \(\mtwoand\) (i.e., the second sentence of the theorem). However, their algorithm was more complicated: The value of \(p\) used in the noisy greedy assignment could depend on the bias of the instance. A consequence of our joint work [1] is that it’s sufficient to take \(p = 2/3\) for all instances (in fact, we show that a phenomenon like this holds for many CSPs, including \(\mkand\) for all \(k\)). In our paper we call this the “max-min property”. Roughly, this corresponds to the fact that for the approximation factor achieved by the best possible super-oblivious algorithm:

\[\alpha = \min_{\text{instance }\Psi} \max_{0 \leq p \leq 1} \frac{\lambda_\Psi(p)}{\value_\Psi},\]

the “max-min inequality” (which swaps the min and the max) is tight:

\[\alpha = \max_{0 \leq p \leq 1} \min_{\text{instance }\Psi} \frac{\lambda_\Psi(p)}{\value_p} = \min_{\text{instance }\Psi} \frac{\lambda_\Psi(2/3)}{\value_\Psi} = \frac49.\]

(This is implied by the theorem above and a counterexample below.)

Earlier, Feige and Jozeph [5] showed that the super-oblivious algorithm using the “greedy” assignment corresponding to \(p=1\) gives a \(2/5\)-approximation. Guruswami, Velingker, and Velusamy [6] similarly built a bias-based algorithm which gets a \(2/5\)-approximation for the measurement version of the problem.


Let’s prove the theorem. First, some more notation — sorry! For \(t \in \{0,\ldots,2\}\), let \(p_\Psi(t)\) denote the fraction of constraints in \(\Psi\) with exactly \(t\) positive literals. Also, define a “signed variant” of bias:

\[\sbias_\Psi = \frac1{2m} \sum_{i=1}^n \bias_\Psi(i).\]

We have \(-1 \leq \sbias_\Psi \leq 1\), and \(\sbias_\Psi \leq \bias_\Psi\), with equality iff every variable in \(\Psi\) is nonnegatively-biased; call this “Property P”. A priori, \(\sbias_\Psi\) isn’t useful for calculating \(\value_\Psi\) — it may be \(0\) even in instances with value \(1\). However, we do have the following useful formula:

\[\sbias_\Psi = -p_\Psi(0) + p_\Psi(2),\]

which follows from double-counting: Clauses with \(t\) positive literals and \(2-t\) negative literals contribute \(t - (2-t) = 2(t-1)\) to the sum \(\sum_{i=1}^n \bias_\Psi(i)\). Since \(p_\Psi(0)+p_\Psi(1)+p_\Psi(2) = 1\), adding \(1\) to both sides gives

\[1 + \sbias_\Psi = p_\Psi(1) + 2p_\Psi(2).\]

Proving the lower bound (in a special case)

Suppose \(\Psi\) satisfies Property P: All its variables have nonnegative bias. Then since \(\sbias_\Psi = \bias_\Psi\), we can write

\[\frac29(1 + \bias_\Psi ) = \frac29 p_\Psi(1) + \frac49 p_\Psi(2).\]

On the other hand, note that if \(\Psi\) satisfies Property P, then in the random assignment \(\vecx\), each variable is assigned to \(1\) w.p. \(\frac23\) and \(0\) w.p. \(\frac13\). Therefore, a clause with e.g. \(1\) positive and \(1\) negative literal will be satisfied by \(\vecx\) with probability \(\frac{2}3 \cdot \frac13 = \frac4{9}\), and overall we can write

\[\Exp_\vecx[\value_\Psi(\vecx)] = \frac1{9} p_\Psi(0) + \frac29 p_\Psi(1) + \frac49 p_\Psi(2).\]

Note that for each \(t \in \{0,1,2\}\), the coefficient of \(p_\Psi(t)\) in this expression exceeds that in \(1+\bias_\Psi\), and \(p_\Psi(t) \geq 0\); thus, we conclude

\[\frac29(1+\bias_\Psi) \leq \Exp_\vecx[\value_\Psi(\vecx)].\]

Proving the upper bound (in a special case)

In general, since \(\bias_\Psi \geq \sbias_\Psi\) we have

\[\frac12 (1+\bias_\Psi) \geq \frac12 p_\Psi(1) + p_\Psi(2).\]

Suppose that \(\Psi\) satisfies “Property O”: It’s optimally satisfied by the all-\(1\)’s assignment \(\vecone = (1,\ldots,1)\), i.e., \(\value_\Psi = \value_\Psi(\vecone)\). Then we have

\[\value_\Psi = p_\Psi(2),\]

since \(\vecone\) only satisfies constraints where both literals are positive. Again, by comparing coefficients, we conclude that

\[\value_\Psi \leq \frac12 (1+\bias_\Psi).\]

Putting it all together

The final bounds we proved in the previous two subsections are precisely what we want to prove the theorem, but a priori they only hold when \(\Psi\) satisfies Property P or Property O, respectively. However, we can take care of this with the following observation. Consider “flipping” some subset of variables in \(\Psi\) to get a new instance \(\Psi'\) (replacing a literal \(x_i\) with \(\overline{x_i}\) and vice versa). This operation preserves several important quantities:

\[\value_\Psi = \value_{\Psi'}, \bias_\Psi = \bias_{\Psi'}, \lambda_\Psi(2/3) = \lambda_{\Psi'}(2/3).\]

(However, \(\sbias_\Psi\) is not in general equal to \(\sbias_{\Psi'}\).) We can choose the variables we flip so that \(\Psi'\) satisfies Property P, and similarly make a \(\Psi''\) satisfying Property O, completing the proof of the theorem.

(As an aside, I find the “flipping” operation, which was introduced formally by Chou, Golovnev, Sudan, and Velusamy [2], helpful in interpreting bias-based algorithms for \(\mtwoand\). In particular, observe that for an instance \(\Psi\), \(\value_\Psi\) is simply the maximum value of \(p_{\Psi'}(2)\) over all “flips” \(\Psi'\) of \(\Psi\). Similarly, \(\bias_\Psi\) is the maximum value of \(p_{\Psi'}(2)-p_{\Psi'}(0)\) over all flips. Somehow the proof boils down to showing that — unsurprisingly — these two quantities are mutually informative.)

Discussion: Could we have done better?

In this section, I’m going to prove that no bias-biased algorithm or super-oblivious algorithm can achieve a ratio better than \(4/9\). The result for bias-based algorithms is due to Chou, Golovnev, and Velusamy [4], while the result for super-oblivious algorithm — as far as I know — hasn’t appeared before in the literature, but follows from an easy modification to a counterexample of Feige and Jozeph [5].

Lower bound against bias-based algorithms

The following construction is due to Chou, Golovnev, and Velusamy [4] (formalized in their later work with Sudan [2]).

For an instance \(\Psi\) of \(\mtwoand\), we can write a vector \(\vecp_\Psi = (p_\Psi(0),p_\Psi(1),p_\Psi(2)) \in \Delta_2\), where \(\Delta_2\) denotes the \(2\)-dimensional simplex \(\{(p_0,p_1,p_2) \in \BR^3 : p_i \geq 0 \; \forall i, p_0+p_1+p_2=1\}\). Now for arbitrarily large \(n\), consider two \(n\)-variable instances \(\Psi_N\) and \(\Psi_Y\) which satisfy Property P (i.e., every variable is positively-biased), such that \(\vecp_{\Psi_N} = (0,\frac45,\frac15)\) and \(\vecp_{\Psi_Y} = (\frac25,0,\frac35)\).

We have \(\bias_{\Psi_N} = \bias_{\Psi_Y} = \frac15\), so bias-based algorithms cannot distinguish \(\Psi_Y\) and \(\Psi_N\). Further, we always have \(\value_{\Psi_Y} \geq \value_{\Psi_Y}(\vecone) = \frac35\). On the other hand, suppose we construct \(\Psi_N\) in an appropriately “symmetric” way — say, we randomly sample a linear number of constraints, each of which consists of two uniformly random (distinct) variables; then, we negate one variable in \(4/5\) of the constraints. One can verify that in such an instance, the expected value of any fixed assignment \(\vecx \in \{0,1\}^n\) is roughly the same as the expected value of the “noisy greedy assignment” \(\vecz^p\), where \(p = \lvert\vecx\rvert/n\) is the fraction of \(1\)’s in \(\vecx\), and then again apply a Chernoff + union bound argument. The best such assignment will have expected value \(\max_{p \in [0,1]} \lambda_{\Psi_N}(p) = \max_{p \in [0,1]} \frac45 p(1-p) + \frac15p^2\). Now we can check that this expression has derivative \(\frac{-2}5 (3p-2)\) and is thus maximized at \(p = 2/3\), giving value \(\frac4{15}\). Hence, the best ratio a bias-based algorithm can achieve is \(\frac{4/15}{3/5} = \frac49\).

Lower bound against super-oblivious algorithms

Similarly, we can build instances which show that super-oblivious algorithms can’t achieve ratios better than \(4/9\). The key observation here is that super-oblivious algorithms ignore the magnitude of the bias of variables — e.g., they don’t distinguish between variables with small positive bias and variables with large positive bias. Thus, we should look for instances where in optimal assignments these two kinds of variables are assigned very differently.

Here’s one such example (a modification of an example on p. 416 of [5]) on three variables. Here, the constraints are weighted via nonnegative numbers (marked in parentheses), and \(\delta > 0\) is a small positive constant:

\[\begin{align*} x_1 &\wedge \overline{x_2} \quad (2+\delta) \\ \overline{x_1} &\wedge x_1 \quad (2) \\ x_1 &\wedge \overline{x_3} \quad (1) \end{align*}\]

This instance has total weight \(5 + \delta\). There exists an assignment with value \((3+\delta)/(5+\delta) \approx 3/5\): Assign \(x_1 = 1, x_2 = 0, x_3=0\). However, \(x_2\) has bias \(\delta\) which is small but positive, so any super-oblivious algorithm will assign both \(x_1\) and \(x_2\) to \(1\) w.p. \(p\) (and \(x_3\) to \(0\) w.p. \(p\)). So, the best assignment a super-oblivious algorithm will find has value \(\max_{p \in [0,1]} \frac{4+\delta}{5+\delta} p(1-p) + \frac1{5+\delta} p^2\). As \(\delta \to 0\), this approximates our old friend \(\max_{p \in [0,1]} \frac45 p(1-p) + \frac15p^2 = \frac4{15}\), again at \(p = \frac23\)! This gives exactly the same “gap” as in the previous section.

Implications for streaming

The original motivation for the bias-based \(4/9\)-approximation of Chou, Golovnev, and Velusamy [4] was coming up with a small-space streaming algorithm for the \(\mtwoand\) problem. Informally, a “space-\(s(n)\) streaming algorithm” is one that satisfies the following requirements:

  1. On instance with \(n\) variables, the algorithm uses \(s(n)\) space.
  2. The algorithm can only access the input data (i.e., the list of constraints) via a single, sequential pass.

Guruswami, Velingker, and Velusamy [6] observed that the bias of an instance is easy to measure in the streaming setting! Indeed, imagine a “bias vector” \(\vecb = (\bias_\Psi(1),\ldots,\bias_\Psi(n))\) containing the bias of each variable. We can view this vector as arising “dynamically” as constraints arrive in the stream: e.g., given a constraint \(\overline{x_i} \wedge x_j\), we increment \(\bias_\Psi(j)\) and decrement \(\bias_\Psi(j)\). Then \(\bias_\Psi\) equals the (normalized) \(1\)-norm of the bias vector \(\vecb\), and can be approximated using standard algorithms in the streaming literature, namely the \(O(\log n)\)-space \(\ell_1\)-norm estimation algorithms due to Indyk [7]. These norm estimation algorithms are so-called sketching algorithms; these are streaming algorithms with the special property that, roughly, their behavior cannot depend on the current position in the stream.

Thus, Chou, Golovnev, and Velusamy [4] showed that for all \(\epsilon > 0\), there is an \(O(\log n)\)-space sketching algorithm which \((\frac49-\epsilon)\)-approximates (the measurement version of) \(\mtwoand\). (The loss of a constant is due to error in the \(\ell_1\)-norm estimation.) They also “lifted” the counterexample describe in the previous section to prove that streaming algorithms which \((\frac49+\epsilon)\)-approximate \(\mtwoand\) need (at least) \(\Omega(\sqrt n)\) space.

Building on this, Chou, Golovnev, Sudan, and Velusamy [2] proved a general dichotomy theorem for Boolean CSPs (which they subsequently extended to CSPs over arbitrary alphabets [3]). They consider the problem \(\mcsp(f)\), where constraints are given by an arbitrary predicate \(f : \{0,1\}^k \to \{0,1\}\); specifically, they consider the \((\beta,\gamma)\)-gapped version, where the goal is to distinguish whether \(\value_\Psi \leq \beta\) or \(\value_\Psi \geq \gamma\) given an instance \(\Psi\) promised to satisfy one of the two. Very roughly, they show that such a problem has either an \(O(\log n)\)-space sketching algorithm or requires \(\Omega(\sqrt n)\)-space to sketch, and moreover, the \(O(\log n)\)-space sketching algorithm (if it exists) is a bias-based algorithm whose analysis also relies on noisy greedy assignments.

Importantly, for technical reasons their \(\Omega(\sqrt n)\)-space lower bound holds in general only against sketching algorithms, though in “special cases” (which include \(\mtwoand\)) it can be extended to streaming algorithms (and thus recovers the result of Chou et al. [4]).

Modifying for \(\mthreeand\)

In our paper [1], we perform an analogous analysis to the above one \(\mtwoand\) for \(\mkand\) for all \(k\), establishing its sketching approximability according to the dichotomy theorem of Chou et al. [2].

In particular, consider the \(\mthreeand\) problem (where \(\wedge\)’s are applied on three literals). We can consider the vector \((p_\Psi(0),\ldots,p_\Psi(3))\); the exact same arguments give the formulas \(\sbias_\Psi = -p_\Psi(0) -\frac13 p_\Psi(1) + \frac13 p_\Psi(2) + p_\Psi(3)\), \(\lambda_\Psi(2/3) = \frac1{27} p_\Psi(0) + \frac2{27} p_\Psi(1) + \frac4{27} p_\Psi(2) + \frac8{27} p_\Psi(3)\) when \(\Psi\) satisfies property P, and \(1+\bias_\Psi \geq \frac23 p_\Psi(1) + \frac43 p_\Psi(2) + 2 p_\Psi(3)\) with equality iff \(\Psi\) satisfies Property O. An analogous “coefficient comparison” implies that bias-based algorithms can \(\frac29\)-approximate \(\mthreeand\). We also find the right inputs to the dichotomy theorem in [2] showing that this is the best possible for \(o(\sqrt n)\)-space sketching algorithms; yet we show that the “special case” in [2] fails to extend this bound to hold against streaming algorithms.

Open questions

We conclude with some questions. First, here are some which would further clarify the “picture” I described in this post, though the answers may already be hidden in disguise somewhere:

  1. Suppose we’re given two vectors \(\vecq = (q(0),q(1),q(2)), \vecr = (r(0),r(1),r(2)) \in \Delta_2\). When can these arise as \(\vecq = \vecp_\Psi, \vecr = \vecp_{\Psi'}\) where \(\Psi\) and \(\Psi'\) are instances of \(\mtwoand\) related by a “flip”? What are necessary or sufficient conditions, and is there a sense in which the algorithms described above are taking advantage of this information? How about for \(\mkand\)?
  2. We presented a lower bound against super-oblivious algorithms for \(\mtwoand\). Is there a similar lower bound for \(\mthreeand\)? I’m particularly interested in this because the hard instance for \(\mtwoand\) seems to take advantage of the “decomposition” condition required by the [2] technique for streaming lower bounds (see the definition of “padded one-wise pairs” in §2.3 in that paper).
  3. In a similar vein, given \(0 \leq \beta < \gamma \leq 1\), consider the \((\beta,\gamma)\)-gapped version of \(\mkand\). Suppose that a bias-based algorithm can solve this problem. Can a super-oblivious algorithm always solve this problem (or more precisely, given an instance of value at least \(\gamma\), always produce an assignment of value exceeding \(\gamma\))? How about the other way around?

And here are two more ambitious questions:

  1. From a more “philosophical” perspective, why should the sort of analysis we present for \(\mtwoand\) and \(\mthreeand\) work out at all? Why should the “max-min property” be satisfied? (That is, why does picking \(p=2/3\) give the optimal super-oblivious algorithm?) Could this give us some more insight towards proving a streaming lower bound for \(\mthreeand\)?
  2. In recent joint work Raghuvansh Saxena, Madhu Sudan, and Santhoshini Velusamy [9], we show that we can beat the \(\frac49\) factor for \(\mdcut\) (the special case of \(\mtwoand\) where all constraints have exactly one negative literal) in a number of more relaxed versions of the streaming model. We overcome the limitations of “super-oblivious” algorithms by using the magnitudes of variables’ bias as well as their sign, and in particular, we implement the so-called “oblivious” algorithms of Feige and Jozeph [5]. However, Feige and Jozeph [5] show that these algorithms still cannot achieve a \(\frac12\)-approximation. What might be an alternative route towards a \(\frac12\)-approximation?


  1. I present some similar material in §4.1 of my undergraduate thesis [8] as well as my upcoming talk at APPROX 2022. 

  2. This is a play on the term oblivious in the paper of Feige and Jozeph [5], which there (roughly) means that the algorithm can’t distinguish between variables of the same bias. 

  3. It is natural to consider a generalization of this problem where the constraints can have nonnegative weights, and our goal is to find the maximum satisfiable fraction of weights. For now, we’ll stick with the unweighted version for simplicity, though the algorithm we describe works equally well in the weighted case. 


[1] Joanna Boyland, Michael Hwang, Tarun Prasad, Noah Singer, and Santhoshini Velusamy. "On sketching approximations for symmetric Boolean CSPs". In: APPROX 2022.
[2] Chi-Ning Chou, Alexander Golovnev, Madhu Sudan, and Santhoshini Velusamy. "Approximability of all Boolean CSPs with linear sketches". URL: https://arxiv.org/abs/2102.12351.
[3] Chi-Ning Chou, Alexander Golovnev, Madhu Sudan, and Santhoshini Velusamy. "Approximability of all finite CSPs with linear sketches". In: FOCS 2021. DOI: 10.1109/FOCS52979.2021.00117.
[4] Chi-Ning Chou, Alexander Golovnev, and Santhoshini Velusamy. "Optimal Streaming Approximations for all Boolean Max-2CSPs and Max-\(k\)SAT". In: FOCS 2020. DOI: 10.1109/FOCS46700.2020.00039.
[5] Uriel Feige and Shlomo Jozeph. "Oblivious Algorithms for the Maximum Directed Cut Problem". In: Algorithmica 71.2 (2015), pp. 409-428. DOI: 10.1007/s00453-013-9806-z.
[6] Venkatesan Guruswami, Ameya Velingker, and Santhoshini Velusamy. "Streaming Complexity of Approximating Max 2CSP and Max Acyclic Subgraph". In: APPROX 2017. DOI: 10.4230/LIPIcs.APPROX-RANDOM.2017.8.
[7] Piotr Indyk. "Stable distributions, pseudorandom generators, embeddings, and data stream computation". In: Journal of the ACM 53.3 (2006), pp. 307–323. DOI: 10.1145/1147954.1147955.
[8] Noah Singer. "On streaming approximation algorithms for constraint satisfaction problems". URL: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37371750.
[9] Raghuvansh R. Saxena, Noah Singer, Madhu Sudan, and Santhoshini Velusamy. "Streaming complexity of CSPs with randomly ordered constraints". URL: https://arxiv.org/abs/2207.07158.