Dependency graph

Legend

Boxes: definitions
Ellipses: theorems and lemmas
Blue border: the statement of this result is ready to be formalized; all prerequisites are done
Orange border: the statement of this result is not ready to be formalized; the blueprint needs more work
Blue background: the proof of this result is ready to be formalized; all prerequisites are done
Green border: the statement of this result is formalized
Green background: the proof of this result is formalized
Dark green background: the proof of this result and all its ancestors are formalized
Dark green border: this is in Mathlib

Corollary 40 Hardness of LSD and its variants, Corollary 3.2

Assuming SETH or OVC, for every \(\varepsilon {\gt}0\) there exists a constant \(c{\gt}0\) such that \(\mathsf{LSD}_{n,\ell }\) cannot be solved in \(O(n^{2-\varepsilon })\) time when \(\ell \ge c\log n\). Moreover, the same lower bound holds for bichromatic \(\gamma \text{-}\mathsf{LSD}_{n,\ell }\) for all \(\gamma \ge 1\), and for \(\mathsf{LSD}_{n,\ell ,t}\) for some \(t \in [0,1]\).

LaTeX

Corollary 42 Hardness of MSD and its variants, Corollary 3.4

Assuming SETH or OVC, for every \(\varepsilon {\gt}0\) there exists a constant \(c{\gt}0\) such that \(\mathsf{MSD}_{n,\ell }\) cannot be solved in \(O(n^{2-\varepsilon })\) time when \(\ell \ge (\log n)^{\frac{c\log n}{(\log \log n)^{2}}}\). Moreover, the same lower bound holds for bichromatic \(\gamma \text{-}\mathsf{MSD}_{n,\ell }\) for all \(1 \le \gamma \le (1 + \tfrac {1}{\log \log n})^{\frac{\log n}{(\log \log n)^{2}}}\), and for \(\mathsf{MSD}_{n,\ell ,t}\) for some \(t \in [0,1]\).

LaTeX

Definition 44 Bichromatic \(\gamma \)-Additive-Max-IP

For \(A, B \subseteq \{ 0,1\} ^{\ell }\) with \(|A| = |B| = n\) and a threshold \(\alpha \), the bichromatic \(\gamma \)-Additive-Max-IP problem (\(\gamma \)-Additive-BMax-IP) asks to distinguish:

Yes: there exists \((a,b)\in A\times B\) with \(\langle a, b \rangle \ge \alpha \);
No: for every \((a,b)\in A\times B\), \(\langle a, b \rangle {\lt} \alpha - \gamma \).

LaTeX

Definition 43 Bichromatic \(\gamma \)-Additive-MSD

For \(A, B \subseteq \{ 0,1\} ^{\ell }\) with \(|A| = |B| = n\) and \(\alpha \in [0,1]\), the bichromatic \(\gamma \)-Additive-MSD problem asks to distinguish:

Yes: there exists \((a,b) \in A \times B\) with \(\dfrac {\langle a, b \rangle }{\lVert a \rVert \lVert b \rVert } \ge \alpha \);
No: for every \((a,b) \in A \times B\), \(\dfrac {\langle a, b \rangle }{\lVert a \rVert \lVert b \rVert } {\lt} \alpha - \gamma \).

LaTeX

Definition 8 Attention, Definition 2.1

For input dimension \(d_{\mathrm{in}} \in \mathbb {N}\), output dimension \(d_{\mathrm{out}} \in \mathbb {N}\), embedding dimension \(m \in \mathbb {N}\), and matrices \(Q, K \in \mathbb {R}^{d_{\mathrm{in}} \times m}\) and \(V \in \mathbb {R}^{d_{\mathrm{in}} \times d_{\mathrm{out}}}\), an attention is the mapping \(A_{Q,K,V} : \mathbb {R}^{n \times d_{\mathrm{in}}} \to \mathbb {R}^{n \times d_{\mathrm{out}}}\) defined by

\[ A_{Q,K,V}(X) = \operatorname {softmax}\! \left(X Q K^{\top } X^{\top }\right) X V. \]

We write \(\mathcal A_{d_{\mathrm{in}}, m, d_{\mathrm{out}}}\) for the set of all such attentions.
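The formula above can be sketched in pure Python, assuming matrices are stored as lists of rows. The helper names `matmul`, `transpose` and `softmax_rows` are illustrative choices, not part of the source; the softmax is applied row-wise as in Definition 7.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def softmax_rows(M):
    """Row-wise softmax; subtracting the row maximum only avoids overflow."""
    result = []
    for row in M:
        shift = max(row)
        exps = [math.exp(x - shift) for x in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result

def attention(X, Q, K, V):
    """A_{Q,K,V}(X) = softmax(X Q K^T X^T) X V for X of shape n x d_in."""
    scores = matmul(matmul(matmul(X, Q), transpose(K)), transpose(X))  # n x n
    return matmul(matmul(softmax_rows(scores), X), V)                  # n x d_out

# Toy instance: n = 2, d_in = 2, m = 2, d_out = 1.
X = [[1.0, 0.0], [0.0, 1.0]]
Q = [[0.0, 0.0], [0.0, 0.0]]   # zero scores, so every softmax row is uniform
K = [[0.0, 0.0], [0.0, 0.0]]
V = [[2.0], [4.0]]
output = attention(X, Q, K, V)
print(output)
```

With \(Q = K = 0\) every row of the softmax matrix is uniform, so each output row is the average of the rows of \(XV\).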

LaTeX

Definition 33 Bichromatic Max-IP

Given two sets \(A, B\) of vectors in \(\{ 0,1\} ^{\ell }\), bichromatic \(\mathsf{Max\text{-}IP}_{n,\ell }\) (resp. its \(\gamma \)-approximate and decision versions) asks to find \(i,j\) achieving (resp. \(\gamma \)-approximating, deciding) the maximal \(\langle a_i, b_j \rangle \) over \(a_i \in A\), \(b_j \in B\).

LaTeX

Definition 29 Bichromatic Min-IP

Given two sets \(A, B\) of vectors in \(\{ 0,1\} ^{\ell }\), bichromatic \(\mathsf{Min\text{-}IP}_{n,\ell }\) (resp. its \(\gamma \)-approximate and decision versions) asks to find \(i,j\) achieving (resp. \(\gamma \)-approximating, deciding) the minimal \(\langle a_i, b_j \rangle \) over \(a_i \in A\), \(b_j \in B\).

LaTeX

Definition 23 Bichromatic Orthogonal Vectors

Given two sets \(A = \{ a_1, \ldots , a_n\} \) and \(B = \{ b_1, \ldots , b_n\} \) of vectors in \(\{ 0,1\} ^{\ell }\), bichromatic \(\mathsf{OV}_{n,\ell }\) asks whether there exist \(i,j\) with \(\langle a_i, b_j \rangle = 0\).

LaTeX

Definition 12 Bag-of-words embedding

Fix a list of \(\ell \) key words. The bag-of-words embedding of a document \(D\) is the vector \(\operatorname {BOW}(D) \in \{ 0,1\} ^{\ell }\) whose \(i\)-th entry is \(1\) iff the \(i\)-th key word occurs in \(D\). (E.g. a document containing only "There are ten apples on the apple tree", with key words "apple", "tree", "computer", "ten", has embedding \((1,1,0,1)\).)
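A minimal sketch of this embedding in Python. It assumes that "occurs" means an exact, case-insensitive match against a word token of the document; the source does not fix a tokenization, and the function name `bag_of_words` is illustrative.

```python
import re

def bag_of_words(document, key_words):
    """Return the 0/1 vector whose i-th entry is 1 iff key_words[i] occurs in document."""
    # Assumption: a key word "occurs" if it equals some lowercase word token.
    tokens = set(re.findall(r"[a-z]+", document.lower()))
    return [1 if word.lower() in tokens else 0 for word in key_words]

embedding = bag_of_words(
    "There are ten apples on the apple tree",
    ["apple", "tree", "computer", "ten"],
)
print(embedding)  # [1, 1, 0, 1]
```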

LaTeX

Definition 13 Cosine similarity

For nonzero document embeddings \(v, w \in \mathbb {R}^{d}\), their cosine similarity is \(\dfrac {\langle v, w \rangle }{\lVert v \rVert \cdot \lVert w \rVert }\). When \(v,w \in \{ 0,1\} ^d\) it lies in \([0,1]\); the value \(1\) means complete similarity and \(0\) means no similarity. (It is used instead of the raw inner product so that two vectors with similar directions but large magnitude are still considered close.)
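For concreteness, a small Python sketch of cosine similarity for vectors given as lists of numbers. The names `inner` and `cosine_similarity` are illustrative, and the zero vector is rejected because the quotient is undefined there.

```python
import math

def inner(v, w):
    """Standard inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(v, w))

def cosine_similarity(v, w):
    norm_v = math.sqrt(inner(v, v))
    norm_w = math.sqrt(inner(w, w))
    if norm_v == 0 or norm_w == 0:
        raise ValueError("cosine similarity is undefined for the zero vector")
    return inner(v, w) / (norm_v * norm_w)

# Identical binary vectors have similarity 1 (up to rounding);
# binary vectors with disjoint supports have similarity 0.
same_direction = cosine_similarity([1, 1, 0, 1], [1, 1, 0, 1])
disjoint = cosine_similarity([1, 0, 0, 0], [0, 1, 1, 0])
```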

LaTeX

Definition 18 \(\gamma \)-LSD, Definition A.15

Given \(v_1, \ldots , v_n \in \{ 0,1\} ^{\ell }\), \(\gamma \text{-}\mathsf{LSD}_{n,\ell }\) asks to find \(i^{*} \ne j^{*}\) such that

\[ \min _{1 \le i \ne j \le n} \dfrac {\langle v_i, v_j \rangle }{\lVert v_i \rVert \cdot \lVert v_j \rVert } \le \dfrac {\langle v_{i^{*}}, v_{j^{*}} \rangle }{\lVert v_{i^{*}} \rVert \cdot \lVert v_{j^{*}} \rVert } \le \gamma \cdot \min _{1 \le i \ne j \le n} \dfrac {\langle v_i, v_j \rangle }{\lVert v_i \rVert \cdot \lVert v_j \rVert }. \]

LaTeX

Definition 31 \(\gamma \)-Max-IP, Definition A.10

Given \(v_1, \ldots , v_n \in \{ 0,1\} ^{\ell }\), \(\gamma \text{-}\mathsf{Max\text{-}IP}_{n,\ell }\) asks to find \(i \ne j\) such that \(\langle v_i, v_j \rangle \) is a \(\gamma \)-approximation of the maximal inner product.

LaTeX

Definition 27 \(\gamma \)-Min-IP, Definition 2.8 / A.6

Given \(v_1, \ldots , v_n \in \{ 0,1\} ^{\ell }\), \(\gamma \text{-}\mathsf{Min\text{-}IP}_{n,\ell }\) asks to find \(i \ne j\) such that \(\langle v_i, v_j \rangle \) is a \(\gamma \)-approximation of the minimal inner product.

LaTeX

Definition 15 \(\gamma \)-MSD, Definition 2.11

Given \(v_1, \ldots , v_n \in \{ 0,1\} ^{\ell }\), \(\gamma \text{-}\mathsf{MSD}_{n,\ell }\) asks to find \(i^{*}, j^{*}\) with \(i^{*}\ne j^{*}\) such that

\[ \frac{1}{\gamma }\cdot \max _{1 \le i \ne j \le n} \dfrac {\langle v_i, v_j \rangle }{\lVert v_i \rVert \cdot \lVert v_j \rVert } \le \dfrac {\langle v_{i^{*}}, v_{j^{*}} \rangle }{\lVert v_{i^{*}} \rVert \cdot \lVert v_{j^{*}} \rVert } \le \max _{1 \le i \ne j \le n} \dfrac {\langle v_i, v_j \rangle }{\lVert v_i \rVert \cdot \lVert v_j \rVert }, \]

i.e. a pair whose cosine similarity is within a factor \(\gamma \) of the maximum.

LaTeX

Definition 1 Inner product of binary/real vectors

For \(v, w \in \mathbb {R}^{\ell }\) we write \(\langle v, w \rangle = \sum _{i=1}^{\ell } v[i]\, w[i]\) for their (standard) inner product, where \(v[i]\) denotes the \(i\)-th entry of \(v\). For binary vectors \(v,w \in \{ 0,1\} ^{\ell }\) this counts the coordinates on which both are \(1\). Recalled only so later nodes can depend on it.

LaTeX Lean

Definition 3 Kronecker (tensor) product of vectors

For \(v \in \mathbb {R}^{a}\) and \(w \in \mathbb {R}^{b}\), the Kronecker product \(v \otimes w \in \mathbb {R}^{ab}\) is the vector whose entries are all the products of an entry of \(v\) and an entry of \(w\). We write \(v^{\otimes q}\) for the \(q\)-fold Kronecker power \(v \otimes \cdots \otimes v\). Standard.

LaTeX Lean

Definition 20 \(k\)-SAT

\(k\mathsf{SAT}\) is the satisfiability problem for Boolean formulas in conjunctive normal form in which every clause has at most \(k\) literals; \(n\) denotes the number of variables. Recalled to state SETH.

LaTeX

Definition 2 \(\ell _2\) norm

For \(v \in \mathbb {R}^{\ell }\), \(\lVert v \rVert = \sqrt{\langle v, v \rangle }\) is the \(\ell _2\) norm. We also use the \(\ell _1\) norm \(\lVert v \rVert _1 = \sum _{i=1}^{\ell } |v[i]|\); for a binary vector \(\lVert v \rVert _1\) is its number of \(1\) entries. Standard.

LaTeX Lean

Definition 17 Least Similar Document, \(\mathsf{LSD}_{n,\ell }\), Definition A.14

Given nonzero \(v_1, \ldots , v_n \in \{ 0,1\} ^{\ell }\), \(\mathsf{LSD}_{n,\ell }\) asks to find \(i \ne j\) minimizing \(\dfrac {\langle v_i, v_j \rangle }{\lVert v_i \rVert \cdot \lVert v_j \rVert }\).

LaTeX

Definition 19 LSD decision version, \(\mathsf{LSD}_{n,\ell ,t}\), Definition A.16

Given \(v_1, \ldots , v_n \in \{ 0,1\} ^{\ell }\) and \(t \in [0,1]\), \(\mathsf{LSD}_{n,\ell ,t}\) asks to determine whether there exist \(i \ne j\) with \(\dfrac {\langle v_i, v_j \rangle }{\lVert v_i \rVert \cdot \lVert v_j \rVert } \le t\).

LaTeX

Definition 30 Maximum Inner Product, \(\mathsf{Max\text{-}IP}_{n,\ell }\), Definition A.9

Given \(v_1, \ldots , v_n \in \{ 0,1\} ^{\ell }\), \(\mathsf{Max\text{-}IP}_{n,\ell }\) asks to find a pair \(i \ne j\) such that \(\langle v_i, v_j \rangle \) is maximal.

LaTeX

Definition 32 Max-IP decision version, \(\mathsf{Max\text{-}IP}_{n,\ell ,t}\), Definition A.11

Given \(v_1, \ldots , v_n \in \{ 0,1\} ^{\ell }\) and \(0 \le t \le \ell \), \(\mathsf{Max\text{-}IP}_{n,\ell ,t}\) asks to determine whether there exist \(i \ne j\) with \(\langle v_i, v_j \rangle \ge t\).

LaTeX

Definition 26 Minimum Inner Product, \(\mathsf{Min\text{-}IP}_{n,\ell }\), Definition 2.7 / A.5

Given \(v_1, \ldots , v_n \in \{ 0,1\} ^{\ell }\), \(\mathsf{Min\text{-}IP}_{n,\ell }\) asks to find a pair \(i \ne j\) such that \(\langle v_i, v_j \rangle \) is minimum.

LaTeX

Definition 28 Min-IP decision version, \(\mathsf{Min\text{-}IP}_{n,\ell ,t}\), Definition 2.9 / A.7

Given \(v_1, \ldots , v_n \in \{ 0,1\} ^{\ell }\) and \(0 \le t \le \ell \), \(\mathsf{Min\text{-}IP}_{n,\ell ,t}\) asks to determine whether there exists a pair \(i \ne j\) with \(\langle v_i, v_j \rangle \le t\).

LaTeX

Definition 9 Multi-layer perceptron, Definition 2.2

A multi-layer perceptron (MLP) is represented by some continuous function \(\varphi : \mathbb {R}^{a} \to \mathbb {R}^{b}\) for positive integers \(a,b\) (modelling, following the universal approximation theorem, any function approximable by a neural network). It is applied to a matrix row-wise: for \(X \in \mathbb {R}^{n \times a}\), \(\varphi (X) = (\varphi (X_1), \ldots , \varphi (X_n)) \in \mathbb {R}^{n \times b}\).

LaTeX

Definition 14 Most Similar Document, \(\mathsf{MSD}_{n,\ell }\), Definition 2.10

Given \(n\) document embeddings \(v_1, \ldots , v_n \in \{ 0,1\} ^{\ell }\) (assumed nonzero), \(\mathsf{MSD}_{n,\ell }\) asks to find \(1 \le i,j \le n\), \(i \ne j\), maximizing \(\dfrac {\langle v_i, v_j \rangle }{\lVert v_i \rVert \cdot \lVert v_j \rVert }\). Note \(\mathsf{MSD}\) differs from \(\mathsf{Max\text{-}IP}\) because of the normalization by \(\lVert v_i \rVert \lVert v_j \rVert \).
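To illustrate the difference from \(\mathsf{Max\text{-}IP}\), here is a brute-force quadratic-time sketch of both problems in Python. It is only a baseline for small inputs, not an algorithm from the source. It compares squared cosine similarities as exact fractions, which preserves the ordering because inner products of binary vectors are nonnegative; the function names and the sample vectors are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def inner(v, w):
    return sum(x * y for x, y in zip(v, w))

def most_similar_pair(vectors):
    """Brute-force MSD: the pair i < j maximizing cosine similarity (nonzero binary vectors)."""
    def squared_cosine(pair):
        v, w = vectors[pair[0]], vectors[pair[1]]
        return Fraction(inner(v, w) ** 2, inner(v, v) * inner(w, w))
    return max(combinations(range(len(vectors)), 2), key=squared_cosine)

def max_inner_product_pair(vectors):
    """Brute-force Max-IP: the pair i < j maximizing the raw inner product."""
    return max(
        combinations(range(len(vectors)), 2),
        key=lambda pair: inner(vectors[pair[0]], vectors[pair[1]]),
    )

docs = [
    [1, 1, 1, 1, 0, 0],  # two "long" documents sharing two key words
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],  # two "short" documents sharing one key word
    [0, 0, 0, 0, 1, 1],
]
msd_pair = most_similar_pair(docs)          # (2, 3): squared cosine 1/2
max_ip_pair = max_inner_product_pair(docs)  # (0, 1): inner product 2
```

On this instance the two problems pick different pairs: the long documents have the largest inner product, but after normalization the short documents are closer.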

LaTeX

Definition 16 MSD decision version, \(\mathsf{MSD}_{n,\ell ,t}\), Definition 2.12

Given \(v_1, \ldots , v_n \in \{ 0,1\} ^{\ell }\) and \(t \in [0,1]\), \(\mathsf{MSD}_{n,\ell ,t}\) asks to determine whether there exist \(i \ne j\) with \(\dfrac {\langle v_i, v_j \rangle }{\lVert v_i \rVert \cdot \lVert v_j \rVert } \ge t\).

LaTeX

Definition 22 Orthogonal Vectors, \(\mathsf{OV}_{n,\ell }\), Definition 2.5 / A.2

Given binary vectors \(v_1, \ldots , v_n \in \{ 0,1\} ^{\ell }\), \(\mathsf{OV}_{n,\ell }\) asks to determine whether there exists a pair \(i \ne j\) with \(\langle v_i, v_j \rangle = 0\).
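As a reference point, the naive algorithm checks all pairs in \(O(n^{2} \ell )\) time. A Python sketch, with the illustrative name `has_orthogonal_pair`:

```python
from itertools import combinations

def has_orthogonal_pair(vectors):
    """Decide OV by brute force: is there a pair i != j with inner product 0?"""
    return any(
        all(x * y == 0 for x, y in zip(v, w))
        for v, w in combinations(vectors, 2)
    )

yes_instance = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]   # first two vectors are orthogonal
no_instance = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]    # every pair shares a coordinate
```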

LaTeX

Definition 7 Softmax

For a vector \(v \in \mathbb {R}^{n}\),

\[ \operatorname {softmax}(v) = \frac{(\exp (v[1]), \ldots , \exp (v[n]))}{\sum _{i=1}^{n} \exp (v[i])} \in \mathbb {R}^{n}. \]

For a matrix \(A \in \mathbb {R}^{n \times n}\) the softmax operator is applied row-wise: \(\operatorname {softmax}(A)_{i,:} = \operatorname {softmax}(A_{i,:})\).
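A short Python sketch of both forms. Subtracting the maximum entry before exponentiating is a standard numerical safeguard that does not change the value of the quotient; it is an implementation choice here, not part of the definition.

```python
import math

def softmax(v):
    """Softmax of a vector given as a list of floats."""
    shift = max(v)  # exp(x - shift) / sum(exp(y - shift)) equals exp(x) / sum(exp(y))
    exps = [math.exp(x - shift) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_rows(matrix):
    """Apply softmax to each row of a matrix given as a list of rows."""
    return [softmax(row) for row in matrix]

uniform = softmax([0.0, 0.0])               # [0.5, 0.5]
rows = softmax_rows([[1.0, 2.0], [1000.0, 1000.0]])
```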

LaTeX

Definition 11 A transformer solving a decision problem

A transformer \(\mathsf{TF}\) solves a (decision) problem if, on every instance \(E \in \mathbb {R}^{n \times d}\), the output \(\mathsf{TF}(E)\) equals the answer to the problem. Concretely, for a problem on vectors \(v_1, \ldots , v_n\) with a Boolean answer, \(\mathsf{TF}\) solves it if, for every input \(E\) with \(E_{i,:} = v_i\) for all \(i\), one has \(\mathsf{TF}(E) = 1\) when the answer is "yes" and \(\mathsf{TF}(E) = 0\) otherwise.

LaTeX

Definition 10 Transformer, Definition 2.3

A transformer is a mapping \(\mathsf{TF}: \mathbb {R}^{n \times d} \to \mathbb {R}\) specified by an attention unit \(A_{Q,K,V}\) and two MLPs \(\varphi _1 : \mathbb {R}^{n \times d} \to \mathbb {R}^{n \times d_{\mathrm{in}}}\) and \(\varphi _2 : \mathbb {R}^{n \times d_{\mathrm{out}}} \to \mathbb {R}\). On an embedding matrix \(E \in \mathbb {R}^{n \times d}\) it outputs

\[ \mathsf{TF}(E) = \varphi _2\! \left(A_{Q,K,V}(\varphi _1(E))\right). \]

This models a single-attention-unit transformer (first MLP, then the attention unit, then the second MLP).

LaTeX

Definition 6 Truly subquadratic time

An algorithm for a problem on inputs of size parameter \(n\) runs in truly subquadratic time if it runs in time \(O(n^{2-\varepsilon })\) for some fixed constant \(\varepsilon {\gt} 0\) (independent of the instance). Throughout, "cannot be solved in \(O(n^{2-\varepsilon })\) time" is understood with \(\varepsilon \) quantified as in each statement. We say a problem \(\mathcal P\) reduces to \(\mathcal Q\) if a truly subquadratic algorithm for \(\mathcal Q\) yields one for \(\mathcal P\); the two are subquadratic equivalent if each reduces to the other.

LaTeX

Lemma 47 Additive-MSD is equivalent to Additive-Max-IP, Lemma B.5

Bichromatic \(\frac{\log n}{\ell }\)-Additive-\(\mathsf{MSD}_{n,\ell }\) with \(\ell = O(\log n)\) and bichromatic \((\log n)\)-Additive-Max-IP\(_{n,\ell '}\) with \(\ell ' = O(\log n)\) are subquadratic equivalent.

LaTeX

Lemma 5 Continuity is closed under sums, products and composition

Finite sums, finite products and compositions of continuous functions between finite-dimensional real vector spaces are continuous, as are constant maps, coordinate projections, and the maps \(x \mapsto \max (x,c)\), \(x \mapsto \min (x,c)\). Recalled so the piecewise-linear MLP constructions below can be shown continuous.

LaTeX Lean

Lemma 4 Multiplicativity of inner products under Kronecker product

For \(v, w \in \mathbb {R}^{a}\) and \(x, y \in \mathbb {R}^{b}\) we have \(\langle v \otimes x, w \otimes y \rangle = \langle v, w \rangle \cdot \langle x, y \rangle \). In particular \(\langle v^{\otimes q}, w^{\otimes q} \rangle = \langle v, w \rangle ^{q}\) and \(\lVert v^{\otimes q} \rVert = \lVert v \rVert ^{q}\). Standard property of tensor products.
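The identity can be checked numerically on small integer vectors. The sketch below is a sanity check rather than a proof; `kron` and `kron_power` are illustrative helper names, and the entry ordering chosen for `kron` is one of several equivalent conventions.

```python
def inner(v, w):
    return sum(x * y for x, y in zip(v, w))

def kron(v, w):
    """Kronecker product: all products of an entry of v with an entry of w."""
    return [a * b for a in v for b in w]

def kron_power(v, q):
    """q-fold Kronecker power of v (the length-1 vector [1] for q = 0)."""
    result = [1]
    for _ in range(q):
        result = kron(result, v)
    return result

v, w = [1, 2], [3, -1]
x, y = [0, 1, 2], [1, 1, 1]
lhs = inner(kron(v, x), kron(w, y))   # <v (x) x, w (x) y>
rhs = inner(v, w) * inner(x, y)       # <v, w> * <x, y>

a, b = [1, 1, 0], [1, 1, 1]
power_lhs = inner(kron_power(a, 3), kron_power(b, 3))
power_rhs = inner(a, b) ** 3
```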

LaTeX Lean

Lemma 37 Bichromatic Max-IP reduction, Lemma A.12

Suppose there is an algorithm \(\mathcal A\) for bichromatic \((\gamma \text{-})\mathsf{Max\text{-}IP}_{n,\ell }\) running in time \(O(n^{2-\varepsilon }\operatorname {poly}(\ell ))\) for some \(\varepsilon {\gt}0\). Then there is an algorithm for \((\gamma \text{-})\mathsf{Max\text{-}IP}_{n,\ell }\) running in time \(O(n^{2-\varepsilon '}\operatorname {poly}(\ell ))\) for some \(\varepsilon '{\gt}0\).

LaTeX

Lemma 38 Max-IP is at least as hard as OV, Lemma A.13

Suppose there is an algorithm \(\mathcal A\) for bichromatic \(\mathsf{Max\text{-}IP}_{n,\ell }\) running in time \(O(n^{2-\varepsilon }\operatorname {poly}(\ell ))\) for some \(\varepsilon {\gt}0\). Then there is an algorithm for \(\mathsf{OV}_{n,\ell }\) running in time \(O(n^{2-\varepsilon }\operatorname {poly}(\ell ))\).

LaTeX

Lemma 36 Bichromatic Min-IP reduction, Lemma A.8

Suppose there is an algorithm \(\mathcal A\) for bichromatic \(\mathsf{Min\text{-}IP}_{n,\ell }\) running in time \(O(n^{2-\varepsilon }\operatorname {poly}(\ell ))\) for some \(\varepsilon {\gt}0\). Then there is an algorithm for \(\mathsf{Min\text{-}IP}_{n,\ell }\) running in time \(O(n^{2-\varepsilon '}\operatorname {poly}(\ell ))\) for some \(\varepsilon '{\gt}0\). The same holds for \(\gamma \text{-}\mathsf{Min\text{-}IP}_{n,\ell }\) and the decision version \(\mathsf{Min\text{-}IP}_{n,\ell ,t}\).

LaTeX

Lemma 35 Min-IP is at least as hard as OV

For any \(\gamma \ge 1\), \(\mathsf{Min\text{-}IP}_{n,\ell }\) and \(\gamma \text{-}\mathsf{Min\text{-}IP}_{n,\ell }\) are both at least as hard as \(\mathsf{OV}_{n,\ell }\). Consequently, assuming OVC, for any \(\varepsilon {\gt}0\) there is \(c{\gt}0\) such that \(\mathsf{Min\text{-}IP}_{n,c\log n}\) cannot be solved in \(O(n^{2-\varepsilon })\) time.
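The idea behind the first claim can be sketched in Python: inner products of binary vectors are nonnegative integers, so an orthogonal pair exists exactly when the minimum inner product is \(0\), and a multiplicative \(\gamma \)-approximation of \(0\) is again \(0\). The brute-force solver below is only a stand-in for a hypothetical fast \(\mathsf{Min\text{-}IP}\) algorithm; all function names are illustrative.

```python
from itertools import combinations

def inner(v, w):
    return sum(x * y for x, y in zip(v, w))

def min_inner_product_pair(vectors):
    """Stand-in Min-IP solver (brute force): a pair i < j of minimum inner product."""
    return min(
        combinations(range(len(vectors)), 2),
        key=lambda pair: inner(vectors[pair[0]], vectors[pair[1]]),
    )

def has_orthogonal_pair_via_min_ip(vectors, min_ip_solver=min_inner_product_pair):
    """Decide OV with one call to a Min-IP solver plus one inner product."""
    i, j = min_ip_solver(vectors)
    return inner(vectors[i], vectors[j]) == 0

yes_answer = has_orthogonal_pair_via_min_ip([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
no_answer = has_orthogonal_pair_via_min_ip([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
```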

LaTeX

Lemma 54 Continuous coordinate-appending MLP, Lemma D.2

There exists a continuous function \(f : \mathbb {R}^{\ell } \to \mathbb {R}^{\ell +1}\) such that

\[ f(x) = \begin{cases} (x, 1) & \text{if } x[\ell ] \le 1,\\ (0, x) & \text{otherwise.} \end{cases} \]

LaTeX

Lemma 55 Continuous normalizing MLP, Lemma D.3

There exists a continuous function \(f : \mathbb {R}^{\ell } \to \mathbb {R}^{\ell +1}\) such that

\[ f(x) = \begin{cases} \left(\frac{x}{\lVert x \rVert _1}, 1\right) & \text{if } x[\ell ] \le 1,\\ \left(0, \frac{x}{\lVert x \rVert _1}\right) & \text{otherwise.} \end{cases} \]

LaTeX

Lemma 53 Continuous thresholding MLP, Lemma D.1

For any \(a, b \in \mathbb {R}\) with \(b {\gt} a\), there exists a continuous function \(f : \mathbb {R}^{\ell } \to \mathbb {R}\) such that

\[ f(x) = \begin{cases} 1 & \text{if } x[i] \ge b \text{ for all } 1 \le i \le \ell ,\\ 0 & \text{if } x[i] {\lt} a \text{ for all } 1 \le i \le \ell . \end{cases} \]
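One function with these properties, given here as a hedged sketch and not necessarily the construction used in the source, is \(f(x) = \min (1, \max (0, (\min _i x[i] - a)/(b-a)))\). It is a composition of continuous maps, equals \(1\) when every entry is at least \(b\), and equals \(0\) when every entry is below \(a\). In Python, with the illustrative name `threshold`:

```python
def threshold(x, a, b):
    """Continuous map that is 1 if all entries are >= b and 0 if all entries are < a."""
    if not b > a:
        raise ValueError("requires b > a")
    # Affine rescaling sends a to 0 and b to 1; the clamp keeps the value in [0, 1].
    t = (min(x) - a) / (b - a)
    return max(0.0, min(1.0, t))

all_large = threshold([2.0, 3.0], a=0.0, b=1.0)     # every entry >= b
all_small = threshold([-1.0, -2.0], a=0.0, b=1.0)   # every entry < a
in_between = threshold([0.5, 2.0], a=0.0, b=1.0)    # value not prescribed by the lemma
```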

LaTeX

Lemma 45 Tensoring amplifies the MSD approximation factor, Lemma B.2

Suppose there is an algorithm for \(\gamma \text{-}\mathsf{MSD}_{n,\ell }\) with \(\ell = (\log n)^{\frac{c\log n}{(\log \log n)^{2}}}\) for any constant \(c{\gt}0\) and \(\gamma \le (1 + \tfrac {1}{\log \log n})^{\frac{\log n}{(\log \log n)^{2}}}\) that runs in \(O(n^{2-\varepsilon })\) time for some \(\varepsilon {\gt}0\). Then there is an algorithm for \((1 + \tfrac {1}{\log \log n})\text{-}\mathsf{MSD}_{n, (\log n)^{k}}\) with running time \(O(n^{2-\varepsilon })\) for any constant \(k{\gt}0\).

LaTeX

Lemma 48 Bichromatic Additive-MSD reduces to Additive-Max-IP, Lemma B.3

Suppose there is an algorithm for \((1 + \tfrac {1}{\log \log n})\text{-}\mathsf{MSD}_{n, (\log n)^{k}}\) for any constant \(k{\gt}0\) with \(O(n^{2-\varepsilon })\) running time. Then there is an algorithm for bichromatic \(\frac{\log n}{\ell }\)-Additive-\(\mathsf{MSD}_{n, c\log n}\) for any \(c{\gt}0\) in \(O(n^{2-\varepsilon '})\) time for some \(\varepsilon '{\gt}0\).

LaTeX

Lemma 34 OV is subquadratic equivalent to bichromatic OV, Lemma A.4

There exists an algorithm for \(\mathsf{OV}_{n,\ell }\) running in time \(O(n^{2-\varepsilon }\cdot \operatorname {poly}(\ell ))\) for some \(\varepsilon {\gt}0\) if and only if there exists an algorithm for bichromatic \(\mathsf{OV}_{n,\ell }\) running in time \(O(n^{2-\varepsilon '}\cdot \operatorname {poly}(\ell ))\) for some \(\varepsilon '{\gt}0\).
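One direction of this equivalence can be illustrated with a standard tagging trick, sketched below in Python. It is not necessarily the argument used in the source. Appending \((1,0)\) to every vector of \(A\) and \((0,1)\) to every vector of \(B\) forces same-colour pairs to have inner product at least \(1\) and leaves cross pairs unchanged, so a monochromatic \(\mathsf{OV}\) solver on the \(2n\) tagged vectors of dimension \(\ell + 2\) answers the bichromatic question. The brute-force solver and all names are illustrative.

```python
from itertools import combinations

def has_orthogonal_pair(vectors):
    """Stand-in monochromatic OV solver (brute force)."""
    return any(
        all(x * y == 0 for x, y in zip(v, w))
        for v, w in combinations(vectors, 2)
    )

def bichromatic_to_monochromatic(A, B):
    """Tag the two colours so that only cross pairs can be orthogonal."""
    tagged_a = [list(a) + [1, 0] for a in A]
    tagged_b = [list(b) + [0, 1] for b in B]
    return tagged_a + tagged_b

def bichromatic_ov(A, B, ov_solver=has_orthogonal_pair):
    return ov_solver(bichromatic_to_monochromatic(A, B))

A = [[1, 0, 0], [0, 1, 0]]   # orthogonal to each other, but both are in A
no_cross_pair = bichromatic_ov(A, [[1, 1, 0], [1, 1, 1]])
cross_pair = bichromatic_ov(A, [[0, 0, 1], [1, 1, 1]])
```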

LaTeX

Theorem 49 Additive-BMax-IP refutes OVC, Lemma B.6 (Chen 2020)

Suppose there is an algorithm for \((\log n)\)-Additive-BMax-IP\(_{n, c\log n}\) for any constant \(c{\gt}0\) running in \(O(n^{2-\varepsilon })\) time for some \(\varepsilon {\gt}0\). Then there is an algorithm for \(\mathsf{OV}_{n, c'\log n}\) for any constant \(c'{\gt}0\) running in \(O(n^{2-\varepsilon '})\) time for some \(\varepsilon '{\gt}0\), refuting SETH and OVC.

LaTeX

Theorem 46 Karthik–Manurangsi reduction, Theorem B.4

Suppose there is an algorithm for \((1 + \tfrac {1}{\log \log n})\text{-}\mathsf{MSD}_{n, (\log n)^{k}}\) for any constant \(k{\gt}0\) running in \(O(n^{2-\varepsilon })\) time for some \(\varepsilon {\gt}0\). Then there is an algorithm for bichromatic \((\log n)\)-Additive-Max-IP\(_{n, c\log n}\) for any constant \(c{\gt}0\) running in \(O(n^{2-\varepsilon '})\) time for some \(\varepsilon '{\gt}0\).

LaTeX

Theorem 51 An attention unit solves Max-IP and Min-IP decision, Theorem C.1

An attention unit with input and output MLPs and parameters \(d = \ell \), \(d_{\mathrm{in}} = \ell + 1\), \(d_{\mathrm{out}} = 1\), \(m \ge \ell + 1\) can solve \(\mathsf{Max\text{-}IP}_{n,\ell ,t}\) and \(\mathsf{Min\text{-}IP}_{n,\ell ,t}\) for \(1 \le t \le \ell \).

LaTeX

Theorem 52 An attention unit solves MSD and LSD decision, Theorem C.2

An attention unit with input and output MLPs and parameters \(d = \ell \), \(d_{\mathrm{in}} = \ell + 1\), \(d_{\mathrm{out}} = 1\), \(m \ge \ell + 1\) can solve \(\mathsf{MSD}_{n,\ell ,t}\) and \(\mathsf{LSD}_{n,\ell ,t}\) for any \(t \in [0,1]\).

LaTeX

Theorem 50 An attention unit solves OV, Theorem 4.1

An attention unit with input and output MLPs and parameters \(d = \ell \), \(d_{\mathrm{in}} = \ell \), \(d_{\mathrm{out}} = 1\), \(m \ge \ell + 1\) can solve \(\mathsf{OV}_{n,\ell }\).

LaTeX

Theorem 39 Hardness of \(\gamma \)-LSD, Theorem 3.1

Assuming SETH or OVC, for every \(\varepsilon {\gt}0\) there exists a constant \(c{\gt}0\) such that \(\gamma \text{-}\mathsf{LSD}_{n,\ell }\) cannot be solved in \(O(n^{2-\varepsilon })\) time for any \(\gamma \ge 1\) when \(\ell = c\log n\).

LaTeX

Theorem 41 Hardness of \(\gamma \)-MSD, Theorem 3.3 / B.1

Assuming SETH or OVC, for every \(\varepsilon {\gt}0\) there exists a constant \(c{\gt}0\) such that \(\gamma \text{-}\mathsf{MSD}_{n,\ell }\) cannot be solved in \(O(n^{2-\varepsilon })\) time when

\[ \ell \ge (\log n)^{\frac{c\log n}{(\log \log n)^{2}}} \quad \text{and}\quad \gamma \le \left(1 + \frac{1}{\log \log n}\right)^{\frac{\log n}{(\log \log n)^{2}}} = 2^{(\log n)^{1-o(1)}}. \]
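As a sanity check on the last equality, here is a short derivation. It assumes logarithms are base \(2\) and \(n \ge 4\), so that \(L := \log \log n \ge 1\); the symbol \(L\) is introduced only for this calculation.

\[ \log \left(1 + \frac{1}{L}\right)^{\frac{\log n}{L^{2}}} = \frac{\log n}{L^{2}} \cdot \log \left(1 + \frac{1}{L}\right), \qquad \frac{1}{L} \le \log \left(1 + \frac{1}{L}\right) \le \frac{1}{L \ln 2}. \]

The lower bound holds because \(x \mapsto \log (1+x)\) is concave and agrees with \(x\) at \(x = 0\) and \(x = 1\); the upper bound is \(\ln (1+x) \le x\). Hence

\[ \frac{\log n}{L^{3}} \le \log \left(1 + \frac{1}{L}\right)^{\frac{\log n}{L^{2}}} \le \frac{1}{\ln 2} \cdot \frac{\log n}{L^{3}}, \qquad L^{3} = (\log n)^{\frac{3 \log L}{L}} = (\log n)^{o(1)}, \]

so the exponent is \((\log n)^{1-o(1)}\), which gives the stated bound \(2^{(\log n)^{1-o(1)}}\).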

LaTeX

Theorem 25 SETH implies OVC, Williams (2005)

If SETH (Assumption 21) holds, then OVC (Conjecture 24) holds. Consequently all hardness results below can be stated under SETH or OVC.

LaTeX