1 Preliminaries: notation and library facts
For \(v, w \in \mathbb {R}^{\ell }\) we write \(\langle v, w \rangle = \sum _{i=1}^{\ell } v[i]\, w[i]\) for their (standard) inner product, where \(v[i]\) denotes the \(i\)-th entry of \(v\). For binary vectors \(v,w \in \{ 0,1\} ^{\ell }\) this counts the coordinates on which both are \(1\). Recalled only so later nodes can depend on it.
For \(v \in \mathbb {R}^{\ell }\), \(\lVert v \rVert = \sqrt{\langle v, v \rangle }\) is the \(\ell _2\) norm. We also use the \(\ell _1\) norm \(\lVert v \rVert _1 = \sum _{i=1}^{\ell } |v[i]|\); for a binary vector \(\lVert v \rVert _1\) is its number of \(1\) entries. Standard.
For \(v \in \mathbb {R}^{a}\) and \(w \in \mathbb {R}^{b}\), the Kronecker product \(v \otimes w \in \mathbb {R}^{ab}\) is the vector whose entries are all the products of an entry of \(v\) and an entry of \(w\). We write \(v^{\otimes q}\) for the \(q\)-fold Kronecker power \(v \otimes \cdots \otimes v\). Standard.
For \(v, w \in \mathbb {R}^{a}\) and \(x, y \in \mathbb {R}^{b}\) we have \(\langle v \otimes x, w \otimes y \rangle = \langle v, w \rangle \cdot \langle x, y \rangle \). In particular \(\langle v^{\otimes q}, w^{\otimes q} \rangle = \langle v, w \rangle ^{q}\) and \(\lVert v^{\otimes q} \rVert = \lVert v \rVert ^{q}\). Standard property of tensor products.
Finite sums, finite products and compositions of continuous functions between finite-dimensional real vector spaces are continuous, as are constant maps, coordinate projections, and the maps \(x \mapsto \max (x,c)\), \(x \mapsto \min (x,c)\). Recalled so the piecewise-linear MLP constructions below can be shown continuous.
An algorithm for a problem on inputs of size parameter \(n\) runs in truly subquadratic time if it runs in time \(O(n^{2-\varepsilon })\) for some fixed constant \(\varepsilon {\gt} 0\) (independent of the instance). Throughout, "cannot be solved in \(O(n^{2-\varepsilon })\) time" is understood with \(\varepsilon \) quantified as in each statement. We say a problem \(\mathcal P\) reduces to \(\mathcal Q\) if a truly subquadratic algorithm for \(\mathcal Q\) yields one for \(\mathcal P\); the two are subquadratic equivalent if each reduces to the other.