Fundamental Limitations on Subquadratic Alternatives to Transformers

9 MLP constructions (Appendix D)

Lemma 53 Continuous thresholding MLP, Lemma D.1

For any \(a, b \in \mathbb {R}\) with \(b {\gt} a\), there exists a continuous function \(f : \mathbb {R}^{\ell } \to \mathbb {R}\) such that

\[ f(x) = \begin{cases} 1 & \text{if } x[i] \ge b \text{ for all } 1 \le i \le \ell ,\\ 0 & \text{if } x[i] {\lt} a \text{ for all } 1 \le i \le \ell . \end{cases} \]
Proof

Define \(g : \mathbb {R}\to \mathbb {R}\) by

\[ g(x) = \begin{cases} 1 & \text{if } x \ge b,\\ \frac{1}{b-a}(x - a) & \text{if } a \le x {\lt} b,\\ 0 & \text{if } x {\lt} a, \end{cases} \]

which is continuous (it is piecewise linear and the pieces agree at the breakpoints \(a, b\)). Let \(f(x) = 1 - \prod _{i=1}^{\ell }(1 - g(x[i]))\). By Lemma 5, \(f\) is continuous (a finite product and difference of continuous maps). If \(x[i] \ge b\) for all \(i\) then every \(g(x[i]) = 1\), so the product is \(0\) and \(f(x) = 1\). If \(x[i] {\lt} a\) for all \(i\) then every \(g(x[i]) = 0\), so the product is \(1\) and \(f(x) = 0\).

Lemma 54 Continuous coordinate-appending MLP, Lemma D.2

There exists a continuous function \(f : \mathbb {R}^{\ell } \to \mathbb {R}^{\ell +1}\) such that

\[ f(x) = \begin{cases} (x, 1) & \text{if } x[\ell ] \le 1,\\ (0, x) & \text{otherwise.} \end{cases} \]
Proof

Define \(g : \mathbb {R}\to \mathbb {R}\) by \(g(x) = 1\) if \(x \le 1\), \(g(x) = 2 - x\) if \(1 {\lt} x \le 2\), and \(g(x) = 0\) if \(x {\gt} 2\), which is continuous. Let \(f_1(x) = (x, 1)\) and \(f_2(x) = (0, x)\) in \(\mathbb {R}^{\ell +1}\), both continuous. Set

\[ f(x) = g(x[\ell ])\cdot f_1(x) + (1 - g(x[\ell ]))\cdot f_2(x). \]

By Lemma 5, \(f\) is continuous, and one checks directly that it equals \((x,1)\) when \(x[\ell ] \le 1\) and \((0,x)\) when \(x[\ell ] {\gt} 2\) (and interpolates continuously in between).

Lemma 55 Continuous normalizing MLP, Lemma D.3

There exists a continuous function \(f : \mathbb {R}^{\ell } \to \mathbb {R}^{\ell +1}\) such that

\[ f(x) = \begin{cases} \left(\frac{x}{\lVert x \rVert _1}, 1\right) & \text{if } x[d] \le 1,\\ \left(0, \frac{x}{\lVert x \rVert _1}\right) & \text{otherwise.} \end{cases} \]
Proof

Let \(g\) be the same continuous function as in Lemma 54, and let \(f_1(x) = \left(\frac{x}{\lVert x \rVert _1}, 1\right)\) and \(f_2(x) = \left(0, \frac{x}{\lVert x \rVert _1}\right)\), both continuous on the relevant domain (nonzero inputs). Then

\[ f(x) = g(x[\ell ])\cdot f_1(x) + (1 - g(x[\ell ]))\cdot f_2(x) \]

is continuous by Lemma 5 and satisfies the required piecewise definition.