What’s the simplest nontrivial algorithm for Max2AND?
Let’s say I barely want to do more work than flipping a coin for each variable…
Monday, September 12, 2022
I’m writing this post to describe some very simple approximation algorithms for the \(\mtwoand\) problem. These are based on recent joint work with Joanna Boyland, Michael Hwang, Tarun Prasad, and Santhoshini Velusamy [1].^{1} I’d like to stress that these algorithms aren’t exactly “new”: They are simplified versions of recent algorithms by Chou, Golovnev, and Velusamy [4], and do not quantitatively improve the performance. Instead, the main contribution of our paper [1] is showing that this performance can be achieved by algorithms which are “as simple as possible”, in a precise sense — and exactly why this should be the case is still a mystery.
The original motivation behind the work of Chou et al. [4], and our subsequent work [1], was designing algorithms for \(\mtwoand\) which can be implemented efficiently in the socalled “streaming model”. I’ll stick to the standard (“classical”) model for now, and defer talking about the streaming model until the last section below.
Teaser
I’ll start with a brief teaser of what’s to come — feel free to skip to the definition of the \(\mtwoand\) problem below.
In an instance \(\Psi\) of \(\mtwoand\), say a variable \(x_i\) is “positivelybiased” if the positive literal \(x_i\) is more common than the negative literal \(\overline{x_i}\) in \(\Psi\). Consider an algorithm which “searches” for good assignments in \(\Psi\). We say the algorithm is superoblivious^{2} if it behaves in the following way: It selects some probability \(\frac12 \leq p \leq 1\), and randomly assigns positivelybiased variables to \(1\) w.p. \(p\) and \(0\) w.p. \(1p\) (and, correspondingly, negativelybiased variables to \(0\) w.p. \(p\) and \(1\) w.p. \(1p\)). Thus, superoblivious algorithms interpolate between the “greedy” algorithm which assigns all positivelybiased variables to \(1\) and all negativelybiased vertices to \(0\), and the “trivial” algorithm which assigns all variables to \(1\)/\(0\) uniformly at random. We’ll show that the best approximation factor a superoblivious algorithm can achieve is \(4/9\), and this corresponds to always picking \(p = 2/3\).
We also consider the (seemingly) weaker “measurement” task of estimating the value \(\value_\Psi\) of the best assignment. Given an instance \(\Psi\), we consider a scalar parameter called the bias of \(\psi\), denoted \(\bias_\Psi\), which measures the typical magnitude of the difference between the number of positive and negative appearances of a variable. We say an algorithm is biasbased if its output (an estimate for \(\value_\Psi\)) is a function only of \(\bias_\Psi\). We show that the optimal biasbased algorithm simply outputs a linear function of the bias and also achieves a \(4/9\) ratio.
The analyses for these two classes of algorithms, biasbased and superoblivious, are intertwined, and rely only on very elementary “discrete math” tools. However, it’s not apparent (to me at least) why the numbers should “magically” work out to give tight bounds! Is there some “structural” property of the \(\mtwoand\) problem that suggests why this should be the case?
In our joint work [1], we prove an analogous results for several families of Boolean CSPs, including \(\mkand\) for all \(k \geq 3\). However, from the streaming perspective, we don’t seem to know how to prove lower bounds for \(\mkand\) for any \(k \geq 3\), and I hope that further understanding the situation for \(k=2\) may shed some light on this.
Defining the problem
An “instance” of \(\mtwoand\) is defined on some set of “variables”, and consists of a bunch of “constraints”; each constraint identifies two variables and declares that each must take some value. For instance, consider the following instance \(\Psi\) of \(\mtwoand\) on some variables \(x_1,\ldots,x_4\):
\[\begin{align*} x_1 &\wedge \overline{x_3} \\ x_2 &\wedge x_3 \\ x_3 &\wedge x_4 \\ x_2 &\wedge \overline{x_4} \end{align*}\]Here, \(\wedge\) denotes Boolean conjunction and \(\overline{}\) denotes Boolean negation. Hence, the first constraint is satisfied if “\(x_1 = 1\) and \(x_3 = 0\).” Some quick vocab: Within a constraint, each variable or negated variable is called a “literal”, and we refer to these as “positive” and “negative” literals, respectively.
If we want to satisfy \(\Psi\), we could start out by “listening to” the first one and setting \(x_1 = 1\) and \(x_3 = 0\). This already means we can’t satisfy the second or third constraints. The fourth can still be satisfied: We set \(x_2 = 1\) and \(x_4 = 0\). Overall, this gives an “assignment” \(\vecx = 1100\) satisfying \(\frac12\) of the constraints. If \(\Psi\) denotes the above instance, we say “\(\vecx\) has value \(\frac12\) on \(\Psi\)”, and denote this by \(\value_\Psi(\vecx)=\frac12\).
Can we do better with a different assignment \(\vecx \in \{0,1\}^4\)? I wrote the following Python code to check:
According to the code, the answer is no, but this naïve approach is timeconsuming: If we had \(n\) variables, we’d have to try all \(2^n\) possible assignments. Unfortunately, nobody knows any algorithm for finding the best solution which run much faster than this.
Approximations
Consider an arbitrary instance \(\Psi\) of \(\mtwoand\), which consists of pairs of literals.^{3} Let \(\value_\Psi = \max_{\vecx \in \{0,1\}^n} \value_\Psi(\vecx)\) denote the value of the best possible assignment(s) for \(\Psi\). Ideally, we’d like algorithms which find optimal assignments for \(\Psi\) (call this the “search” version of \(\mtwoand\)), or at least calculate its value \(\value_\Psi\) (the “measurement” version of \(\mtwoand\)). But since these problems don’t seem to be tractable, we instead consider “relaxed” versions of the problems requiring only “approximately correct” solutions.
For instance, for the search problem, we want to find an assignment whose value “isn’t too much worse” than the value of the optimal assignment. Namely, given some \(0 \leq \alpha \leq 1\), we want to find an assignment \(\vecx\) such that
\[\alpha \; \value_\Psi \leq \value_\Psi(\vecx) \leq \value_\Psi.\]In this case, we say we’re “\(\alpha\)approximating” \(\value_\Psi\).
Similarly, in the measurement version, we want to “estimate” \(\value_\Psi\), by finding some estimate \(0 \leq v \leq 1\) such that \(\alpha \; \value_\Psi \leq v \leq \value_\Psi\).
The trivial approximation
Given an instance \(\Psi\) of \(\mtwoand\) on \(n\) variables, consider sampling an assignment \(\vecx \in \{0,1\}^n\) uniformly at random. Any particular constraint is satisfied with probability \(\frac14\), since we have to “get both variables right”. Thus, by linearity of expectation, we have
\[\Exp_\vecx[\value_\Psi(\vecx)] = \frac14.\]This implies, by the probabilistic method, that every instance of \(\mtwoand\) has value \(\value_\Psi \geq \frac14\). In turn, this gives us an algorithm for approximate measurement: Given an instance \(\Psi\), ignore \(\Psi\) and output \(\frac14\). This is by definition a \(\frac14\)approximation. Conversely, for every \(\epsilon > 0\) there are instances of \(\mtwoand\) with value \(\leq \frac14+\epsilon\); this means that \(\frac14\) is the best approximation we can get if we completely ignore the particular instance under consideration. Indeed, it’s a good exercise to check that a uniformly random instance on a sufficiently large number of variables will do the trick: first fix an assignment \(\vecx \in \{0,1\}^n\) and use Chernoff’s inequality to bound the probability that \(\value_\Psi(\vecx) > \frac14+\epsilon\), then take a union bound over all assignments \(\vecx \in \{0,1\}^n\). Hence, we term the \(\frac14\)approximation the trivial approximation for \(\mtwoand\).
Similarly, for approximate search, for every \(\epsilon > 0\), the above algorithm outputs a \((\frac14\epsilon)\)approximation on sufficiently large instances.
Can we do better?
Setting up the algorithm
Bias of a variable
Let \(\pos_\Psi(i)\) and \(\neg_\Psi(i)\) denote the number of occurences of the literals \(x_i\) and \(\overline{x_i}\) in \(\Psi\), respectively. Then consider the quantity
\[\bias_\Psi(i) \eqdef \pos_\Psi(i)  \neg_\Psi(i).\]If, for instance, \(\bias_\Psi(i)\) is a large positive number, that means \(x_i\) is much more common than \(\overline{x_i}\) in \(\Psi\), suggesting that we should assign \(x_i = 1\). Similarly, if \(\bias_\Psi(i)\) is a large negative number, we should assign it to \(0\). On the other hand, if \(\bias_\Psi(i)\) is small in magnitude, it may be “challenging to assign”, as \(x_i\) and \(\overline{x_i}\) are roughly equally common and so no matter what, we’ll end up violating close to half of the constraints involving \(x_i\) or its negation.
Total bias of an instance
Hence, it makes sense to consider the following quantity:
\[\bias_\Psi \eqdef \frac1{2m} \sum_{i=1}^n \lvert\bias_\Psi(i)\rvert,\]where \(m\) is the total number of constraints in \(\Psi\). We have \(0 \leq \bias_\Psi \leq 1\); in particular, the upperbound follows from the fact that even in the “best case” where every variable is “perfectly biased” (i.e., either \(\pos_\Psi(i)=0\) or \(\neg_\Psi(i)=0\)), then \(\sum_{i=1}^n \lvert\bias_\Psi(i)\rvert\) simply counts each constraint twice times. Note also that \(\bias_\Psi\) is easy to measure, using (say) \(\tilde{O}(m)\) time and \(\tilde{O}(n)\) space.
Intuitively, if \(\bias_\Psi \approx 1\) then we should expect \(\value_\Psi \approx 1\), as each variable is “almost perfectly biased” and therefore “easy to assign”. On the other hand, we can hope that if \(\bias_\Psi \approx 0\), then we can’t do much better than randomly assigning each variable, and thus that \(\value_\Psi(i) \approx \frac14\).
As I said in the teaser above, a “biasbased” algorithm for (the measurement version of) \(\mtwoand\) estimates \(\value_\Psi\) only using \(\bias_\Psi\). To analyze biasbased algorithms (and in particular to formalize the intuition that instances with high bias also have high value), we turn next to considering algorithms for the search version of \(\mtwoand\) which “treat variables the same if their biases have the same sign”.
Noisy greedy assignments
Given an instance \(\Psi\) and some \(0 \leq p \leq 1\), consider the following random assignment \(\vecz^p = (z_1^p,\ldots,z_n^p) \in \{0,1\}^n\). Each variable will be assigned independently. If \(\bias_\Psi(i) \geq 0\), then we assign \(z_i^p = 1\) w.p. \(p\) and \(z_i^p = 0\) w.p. \(1p\). Similarly, if \(\bias_\Psi(i) < 0\), we assign \(z_i^p = 0\) w.p. \(p\) and \(z_i^p = 1\) w.p. \(1p\). Again, as in the teaser, an algorithm for (the search version of) \(\mtwoand\) is superoblivious if it outputs the assignment \(\vecz^p\) for some choice of \(p\). We denote the expected value of this assignment by
\[\lambda_\Psi(p) \eqdef \Exp_\vecz[\value_\Psi(\vecz)].\]The algorithms and their analysis
The main goal now is to prove the following theorem, which states that biasbased and superoblivious algorithms can achieve decent performance (almost twice the trivial threshold of \(\frac14\)!) for \(\mtwoand\).
Theorem (ChouGolovnevVelusamy [4], BoylandHwangPrasadSingerVelusamy [1]). For every instance \(\Psi\) of \(\mtwoand\),
\[\frac49 \value_\Psi \leq \frac29(1+\bias_\Psi) \leq \lambda_\Psi(2/3) \leq \value_\Psi.\]Thus, there is a superoblivious algorithm which \(\frac49\)approximates the search version of \(\mtwoand\), and a biasbased algorithm which \(\frac49\)approximates the measurement version of \(\mtwoand\).
A quick note: Chou, Golovnev, and Velusamy [4] were the first to prove that superoblivious and biasbased algorithms can \(\frac49\)approximate \(\mtwoand\) (i.e., the second sentence of the theorem). However, their algorithm was more complicated: The value of \(p\) used in the noisy greedy assignment could depend on the bias of the instance. A consequence of our joint work [1] is that it’s sufficient to take \(p = 2/3\) for all instances (in fact, we show that a phenomenon like this holds for many CSPs, including \(\mkand\) for all \(k\)). In our paper we call this the “maxmin property”. Roughly, this corresponds to the fact that for the approximation factor achieved by the best possible superoblivious algorithm:
\[\alpha = \min_{\text{instance }\Psi} \max_{0 \leq p \leq 1} \frac{\lambda_\Psi(p)}{\value_\Psi},\]the “maxmin inequality” (which swaps the min and the max) is tight:
\[\alpha = \max_{0 \leq p \leq 1} \min_{\text{instance }\Psi} \frac{\lambda_\Psi(p)}{\value_p} = \min_{\text{instance }\Psi} \frac{\lambda_\Psi(2/3)}{\value_\Psi} = \frac49.\](This is implied by the theorem above and a counterexample below.)
Earlier, Feige and Jozeph [5] showed that the superoblivious algorithm using the “greedy” assignment corresponding to \(p=1\) gives a \(2/5\)approximation. Guruswami, Velingker, and Velusamy [6] similarly built a biasbased algorithm which gets a \(2/5\)approximation for the measurement version of the problem.
Analysis
Let’s prove the theorem. First, some more notation — sorry! For \(t \in \{0,\ldots,2\}\), let \(p_\Psi(t)\) denote the fraction of constraints in \(\Psi\) with exactly \(t\) positive literals. Also, define a “signed variant” of bias:
\[\sbias_\Psi = \frac1{2m} \sum_{i=1}^n \bias_\Psi(i).\]We have \(1 \leq \sbias_\Psi \leq 1\), and \(\sbias_\Psi \leq \bias_\Psi\), with equality iff every variable in \(\Psi\) is nonnegativelybiased; call this “Property P”. A priori, \(\sbias_\Psi\) isn’t useful for calculating \(\value_\Psi\) — it may be \(0\) even in instances with value \(1\). However, we do have the following useful formula:
\[\sbias_\Psi = p_\Psi(0) + p_\Psi(2),\]which follows from doublecounting: Clauses with \(t\) positive literals and \(2t\) negative literals contribute \(t  (2t) = 2(t1)\) to the sum \(\sum_{i=1}^n \bias_\Psi(i)\). Since \(p_\Psi(0)+p_\Psi(1)+p_\Psi(2) = 1\), adding \(1\) to both sides gives
\[1 + \sbias_\Psi = p_\Psi(1) + 2p_\Psi(2).\]Proving the lower bound (in a special case)
Suppose \(\Psi\) satisfies Property P: All its variables have nonnegative bias. Then since \(\sbias_\Psi = \bias_\Psi\), we can write
\[\frac29(1 + \bias_\Psi ) = \frac29 p_\Psi(1) + \frac49 p_\Psi(2).\]On the other hand, note that if \(\Psi\) satisfies Property P, then in the random assignment \(\vecx\), each variable is assigned to \(1\) w.p. \(\frac23\) and \(0\) w.p. \(\frac13\). Therefore, a clause with e.g. \(1\) positive and \(1\) negative literal will be satisfied by \(\vecx\) with probability \(\frac{2}3 \cdot \frac13 = \frac4{9}\), and overall we can write
\[\Exp_\vecx[\value_\Psi(\vecx)] = \frac1{9} p_\Psi(0) + \frac29 p_\Psi(1) + \frac49 p_\Psi(2).\]Note that for each \(t \in \{0,1,2\}\), the coefficient of \(p_\Psi(t)\) in this expression exceeds that in \(1+\bias_\Psi\), and \(p_\Psi(t) \geq 0\); thus, we conclude
\[\frac29(1+\bias_\Psi) \leq \Exp_\vecx[\value_\Psi(\vecx)].\]Proving the upper bound (in a special case)
In general, since \(\bias_\Psi \geq \sbias_\Psi\) we have
\[\frac12 (1+\bias_\Psi) \geq \frac12 p_\Psi(1) + p_\Psi(2).\]Suppose that \(\Psi\) satisfies “Property O”: It’s optimally satisfied by the all\(1\)’s assignment \(\vecone = (1,\ldots,1)\), i.e., \(\value_\Psi = \value_\Psi(\vecone)\). Then we have
\[\value_\Psi = p_\Psi(2),\]since \(\vecone\) only satisfies constraints where both literals are positive. Again, by comparing coefficients, we conclude that
\[\value_\Psi \leq \frac12 (1+\bias_\Psi).\]Putting it all together
The final bounds we proved in the previous two subsections are precisely what we want to prove the theorem, but a priori they only hold when \(\Psi\) satisfies Property P or Property O, respectively. However, we can take care of this with the following observation. Consider “flipping” some subset of variables in \(\Psi\) to get a new instance \(\Psi'\) (replacing a literal \(x_i\) with \(\overline{x_i}\) and vice versa). This operation preserves several important quantities:
\[\value_\Psi = \value_{\Psi'}, \bias_\Psi = \bias_{\Psi'}, \lambda_\Psi(2/3) = \lambda_{\Psi'}(2/3).\](However, \(\sbias_\Psi\) is not in general equal to \(\sbias_{\Psi'}\).) We can choose the variables we flip so that \(\Psi'\) satisfies Property P, and similarly make a \(\Psi''\) satisfying Property O, completing the proof of the theorem.
(As an aside, I find the “flipping” operation, which was introduced formally by Chou, Golovnev, Sudan, and Velusamy [2], helpful in interpreting biasbased algorithms for \(\mtwoand\). In particular, observe that for an instance \(\Psi\), \(\value_\Psi\) is simply the maximum value of \(p_{\Psi'}(2)\) over all “flips” \(\Psi'\) of \(\Psi\). Similarly, \(\bias_\Psi\) is the maximum value of \(p_{\Psi'}(2)p_{\Psi'}(0)\) over all flips. Somehow the proof boils down to showing that — unsurprisingly — these two quantities are mutually informative.)
Discussion: Could we have done better?
In this section, I’m going to prove that no biasbiased algorithm or superoblivious algorithm can achieve a ratio better than \(4/9\). The result for biasbased algorithms is due to Chou, Golovnev, and Velusamy [4], while the result for superoblivious algorithm — as far as I know — hasn’t appeared before in the literature, but follows from an easy modification to a counterexample of Feige and Jozeph [5].
Lower bound against biasbased algorithms
The following construction is due to Chou, Golovnev, and Velusamy [4] (formalized in their later work with Sudan [2]).
For an instance \(\Psi\) of \(\mtwoand\), we can write a vector \(\vecp_\Psi = (p_\Psi(0),p_\Psi(1),p_\Psi(2)) \in \Delta_2\), where \(\Delta_2\) denotes the \(2\)dimensional simplex \(\{(p_0,p_1,p_2) \in \BR^3 : p_i \geq 0 \; \forall i, p_0+p_1+p_2=1\}\). Now for arbitrarily large \(n\), consider two \(n\)variable instances \(\Psi_N\) and \(\Psi_Y\) which satisfy Property P (i.e., every variable is positivelybiased), such that \(\vecp_{\Psi_N} = (0,\frac45,\frac15)\) and \(\vecp_{\Psi_Y} = (\frac25,0,\frac35)\).
We have \(\bias_{\Psi_N} = \bias_{\Psi_Y} = \frac15\), so biasbased algorithms cannot distinguish \(\Psi_Y\) and \(\Psi_N\). Further, we always have \(\value_{\Psi_Y} \geq \value_{\Psi_Y}(\vecone) = \frac35\). On the other hand, suppose we construct \(\Psi_N\) in an appropriately “symmetric” way — say, we randomly sample a linear number of constraints, each of which consists of two uniformly random (distinct) variables; then, we negate one variable in \(4/5\) of the constraints. One can verify that in such an instance, the expected value of any fixed assignment \(\vecx \in \{0,1\}^n\) is roughly the same as the expected value of the “noisy greedy assignment” \(\vecz^p\), where \(p = \lvert\vecx\rvert/n\) is the fraction of \(1\)’s in \(\vecx\), and then again apply a Chernoff + union bound argument. The best such assignment will have expected value \(\max_{p \in [0,1]} \lambda_{\Psi_N}(p) = \max_{p \in [0,1]} \frac45 p(1p) + \frac15p^2\). Now we can check that this expression has derivative \(\frac{2}5 (3p2)\) and is thus maximized at \(p = 2/3\), giving value \(\frac4{15}\). Hence, the best ratio a biasbased algorithm can achieve is \(\frac{4/15}{3/5} = \frac49\).
Lower bound against superoblivious algorithms
Similarly, we can build instances which show that superoblivious algorithms can’t achieve ratios better than \(4/9\). The key observation here is that superoblivious algorithms ignore the magnitude of the bias of variables — e.g., they don’t distinguish between variables with small positive bias and variables with large positive bias. Thus, we should look for instances where in optimal assignments these two kinds of variables are assigned very differently.
Here’s one such example (a modification of an example on p. 416 of [5]) on three variables. Here, the constraints are weighted via nonnegative numbers (marked in parentheses), and \(\delta > 0\) is a small positive constant:
\[\begin{align*} x_1 &\wedge \overline{x_2} \quad (2+\delta) \\ \overline{x_1} &\wedge x_1 \quad (2) \\ x_1 &\wedge \overline{x_3} \quad (1) \end{align*}\]This instance has total weight \(5 + \delta\). There exists an assignment with value \((3+\delta)/(5+\delta) \approx 3/5\): Assign \(x_1 = 1, x_2 = 0, x_3=0\). However, \(x_2\) has bias \(\delta\) which is small but positive, so any superoblivious algorithm will assign both \(x_1\) and \(x_2\) to \(1\) w.p. \(p\) (and \(x_3\) to \(0\) w.p. \(p\)). So, the best assignment a superoblivious algorithm will find has value \(\max_{p \in [0,1]} \frac{4+\delta}{5+\delta} p(1p) + \frac1{5+\delta} p^2\). As \(\delta \to 0\), this approximates our old friend \(\max_{p \in [0,1]} \frac45 p(1p) + \frac15p^2 = \frac4{15}\), again at \(p = \frac23\)! This gives exactly the same “gap” as in the previous section.
Implications for streaming
The original motivation for the biasbased \(4/9\)approximation of Chou, Golovnev, and Velusamy [4] was coming up with a smallspace streaming algorithm for the \(\mtwoand\) problem. Informally, a “space\(s(n)\) streaming algorithm” is one that satisfies the following requirements:
 On instance with \(n\) variables, the algorithm uses \(s(n)\) space.
 The algorithm can only access the input data (i.e., the list of constraints) via a single, sequential pass.
Guruswami, Velingker, and Velusamy [6] observed that the bias of an instance is easy to measure in the streaming setting! Indeed, imagine a “bias vector” \(\vecb = (\bias_\Psi(1),\ldots,\bias_\Psi(n))\) containing the bias of each variable. We can view this vector as arising “dynamically” as constraints arrive in the stream: e.g., given a constraint \(\overline{x_i} \wedge x_j\), we increment \(\bias_\Psi(j)\) and decrement \(\bias_\Psi(j)\). Then \(\bias_\Psi\) equals the (normalized) \(1\)norm of the bias vector \(\vecb\), and can be approximated using standard algorithms in the streaming literature, namely the \(O(\log n)\)space \(\ell_1\)norm estimation algorithms due to Indyk [7]. These norm estimation algorithms are socalled sketching algorithms; these are streaming algorithms with the special property that, roughly, their behavior cannot depend on the current position in the stream.
Thus, Chou, Golovnev, and Velusamy [4] showed that for all \(\epsilon > 0\), there is an \(O(\log n)\)space sketching algorithm which \((\frac49\epsilon)\)approximates (the measurement version of) \(\mtwoand\). (The loss of a constant is due to error in the \(\ell_1\)norm estimation.) They also “lifted” the counterexample describe in the previous section to prove that streaming algorithms which \((\frac49+\epsilon)\)approximate \(\mtwoand\) need (at least) \(\Omega(\sqrt n)\) space.
Building on this, Chou, Golovnev, Sudan, and Velusamy [2] proved a general dichotomy theorem for Boolean CSPs (which they subsequently extended to CSPs over arbitrary alphabets [3]). They consider the problem \(\mcsp(f)\), where constraints are given by an arbitrary predicate \(f : \{0,1\}^k \to \{0,1\}\); specifically, they consider the \((\beta,\gamma)\)gapped version, where the goal is to distinguish whether \(\value_\Psi \leq \beta\) or \(\value_\Psi \geq \gamma\) given an instance \(\Psi\) promised to satisfy one of the two. Very roughly, they show that such a problem has either an \(O(\log n)\)space sketching algorithm or requires \(\Omega(\sqrt n)\)space to sketch, and moreover, the \(O(\log n)\)space sketching algorithm (if it exists) is a biasbased algorithm whose analysis also relies on noisy greedy assignments.
Importantly, for technical reasons their \(\Omega(\sqrt n)\)space lower bound holds in general only against sketching algorithms, though in “special cases” (which include \(\mtwoand\)) it can be extended to streaming algorithms (and thus recovers the result of Chou et al. [4]).
Modifying for \(\mthreeand\)
In our paper [1], we perform an analogous analysis to the above one \(\mtwoand\) for \(\mkand\) for all \(k\), establishing its sketching approximability according to the dichotomy theorem of Chou et al. [2].
In particular, consider the \(\mthreeand\) problem (where \(\wedge\)’s are applied on three literals). We can consider the vector \((p_\Psi(0),\ldots,p_\Psi(3))\); the exact same arguments give the formulas \(\sbias_\Psi = p_\Psi(0) \frac13 p_\Psi(1) + \frac13 p_\Psi(2) + p_\Psi(3)\), \(\lambda_\Psi(2/3) = \frac1{27} p_\Psi(0) + \frac2{27} p_\Psi(1) + \frac4{27} p_\Psi(2) + \frac8{27} p_\Psi(3)\) when \(\Psi\) satisfies property P, and \(1+\bias_\Psi \geq \frac23 p_\Psi(1) + \frac43 p_\Psi(2) + 2 p_\Psi(3)\) with equality iff \(\Psi\) satisfies Property O. An analogous “coefficient comparison” implies that biasbased algorithms can \(\frac29\)approximate \(\mthreeand\). We also find the right inputs to the dichotomy theorem in [2] showing that this is the best possible for \(o(\sqrt n)\)space sketching algorithms; yet we show that the “special case” in [2] fails to extend this bound to hold against streaming algorithms.
Open questions
We conclude with some questions. First, here are some which would further clarify the “picture” I described in this post, though the answers may already be hidden in disguise somewhere:
 Suppose we’re given two vectors \(\vecq = (q(0),q(1),q(2)), \vecr = (r(0),r(1),r(2)) \in \Delta_2\). When can these arise as \(\vecq = \vecp_\Psi, \vecr = \vecp_{\Psi'}\) where \(\Psi\) and \(\Psi'\) are instances of \(\mtwoand\) related by a “flip”? What are necessary or sufficient conditions, and is there a sense in which the algorithms described above are taking advantage of this information? How about for \(\mkand\)?
 We presented a lower bound against superoblivious algorithms for \(\mtwoand\). Is there a similar lower bound for \(\mthreeand\)? I’m particularly interested in this because the hard instance for \(\mtwoand\) seems to take advantage of the “decomposition” condition required by the [2] technique for streaming lower bounds (see the definition of “padded onewise pairs” in §2.3 in that paper).
 In a similar vein, given \(0 \leq \beta < \gamma \leq 1\), consider the \((\beta,\gamma)\)gapped version of \(\mkand\). Suppose that a biasbased algorithm can solve this problem. Can a superoblivious algorithm always solve this problem (or more precisely, given an instance of value at least \(\gamma\), always produce an assignment of value exceeding \(\gamma\))? How about the other way around?
And here are two more ambitious questions:
 From a more “philosophical” perspective, why should the sort of analysis we present for \(\mtwoand\) and \(\mthreeand\) work out at all? Why should the “maxmin property” be satisfied? (That is, why does picking \(p=2/3\) give the optimal superoblivious algorithm?) Could this give us some more insight towards proving a streaming lower bound for \(\mthreeand\)?
 In recent joint work Raghuvansh Saxena, Madhu Sudan, and Santhoshini Velusamy [9], we show that we can beat the \(\frac49\) factor for \(\mdcut\) (the special case of \(\mtwoand\) where all constraints have exactly one negative literal) in a number of more relaxed versions of the streaming model. We overcome the limitations of “superoblivious” algorithms by using the magnitudes of variables’ bias as well as their sign, and in particular, we implement the socalled “oblivious” algorithms of Feige and Jozeph [5]. However, Feige and Jozeph [5] show that these algorithms still cannot achieve a \(\frac12\)approximation. What might be an alternative route towards a \(\frac12\)approximation?
Footnotes

I present some similar material in §4.1 of my undergraduate thesis [8] as well as my upcoming talk at APPROX 2022. ↩

This is a play on the term oblivious in the paper of Feige and Jozeph [5], which there (roughly) means that the algorithm can’t distinguish between variables of the same bias. ↩

It is natural to consider a generalization of this problem where the constraints can have nonnegative weights, and our goal is to find the maximum satisfiable fraction of weights. For now, we’ll stick with the unweighted version for simplicity, though the algorithm we describe works equally well in the weighted case. ↩
Bibliography
[1]  Joanna Boyland, Michael Hwang, Tarun Prasad, Noah Singer, and Santhoshini Velusamy. "On sketching approximations for symmetric Boolean CSPs". In: APPROX 2022. 

[2]  ChiNing Chou, Alexander Golovnev, Madhu Sudan, and Santhoshini Velusamy. "Approximability of all Boolean CSPs with linear sketches". URL: https://arxiv.org/abs/2102.12351. 
[3]  ChiNing Chou, Alexander Golovnev, Madhu Sudan, and Santhoshini Velusamy. "Approximability of all finite CSPs with linear sketches". In: FOCS 2021. DOI: 10.1109/FOCS52979.2021.00117. 
[4]  ChiNing Chou, Alexander Golovnev, and Santhoshini Velusamy. "Optimal Streaming Approximations for all Boolean Max2CSPs and Max\(k\)SAT". In: FOCS 2020. DOI: 10.1109/FOCS46700.2020.00039. 
[5]  Uriel Feige and Shlomo Jozeph. "Oblivious Algorithms for the Maximum Directed Cut Problem". In: Algorithmica 71.2 (2015), pp. 409428. DOI: 10.1007/s004530139806z. 
[6]  Venkatesan Guruswami, Ameya Velingker, and Santhoshini Velusamy. "Streaming Complexity of Approximating Max 2CSP and Max Acyclic Subgraph". In: APPROX 2017. DOI: 10.4230/LIPIcs.APPROXRANDOM.2017.8. 
[7]  Piotr Indyk. "Stable distributions, pseudorandom generators, embeddings, and data stream computation". In: Journal of the ACM 53.3 (2006), pp. 307–323. DOI: 10.1145/1147954.1147955. 
[8]  Noah Singer. "On streaming approximation algorithms for constraint satisfaction problems". URL: https://nrs.harvard.edu/URN3:HUL.INSTREPOS:37371750. 
[9]  Raghuvansh R. Saxena, Noah Singer, Madhu Sudan, and Santhoshini Velusamy. "Streaming complexity of CSPs with randomly ordered constraints". URL: https://arxiv.org/abs/2207.07158. 