4 What Can and Cannot Be Done – I

In this chapter we study the trade-off between the rate \(R\) and the relative distance \(\delta \) of a code: if we fix \(\delta \), what is the best rate \(R\) we can hope to achieve? Section 4.1 recalls the Hamming bound in its asymptotic (rate vs. relative distance) form. Section 4.2 proves the Gilbert–Varshamov bound, our only positive (existential) result in this chapter. Sections 4.3 and 4.4 prove two further upper bounds on \(R\): the Singleton bound and the (stronger, for binary codes) Plotkin bound.

Asymptotically good code families

Definition 103 Asymptotically good family of codes

An infinite family \(\mathscr {C}\) of \(q\)-ary codes (for \(q\) either a fixed constant or allowed to grow with the block length) is called asymptotically good if both its rate \(R = R(\mathscr {C})\) and its relative distance \(\delta = \delta (\mathscr {C})\) are bounded away from \(0\), i.e., \(R(\mathscr {C}) {\gt} 0\) and \(\delta (\mathscr {C}) {\gt} 0\). The results of this chapter (in particular the Gilbert–Varshamov bound, Theorem 105, and the Plotkin bound, Theorem 111) determine for which pairs \((R,\delta )\) such families can and cannot exist.

4.1 Asymptotic Version of the Hamming Bound

We convert the (non-asymptotic) Hamming bound, already established as Theorem 1.7.2 (thm:c01-hamming-bound), into a statement purely about the rate \(R\) and relative distance \(\delta \) of an infinite family of codes.

Proposition 104 Asymptotic Hamming Bound, Prop 4.1.1

Let \(\mathscr {C}\) be an infinite family of \(q\)-ary codes with rate \(R = R(\mathscr {C})\) and relative distance \(\delta = \delta (\mathscr {C})\). Then

\[ R \le 1 - H_q\left(\frac{\delta }{2}\right). \]

Proof ▶

Fix a \(q\)-ary code \(C\) of block length \(n\), dimension \(k\), distance \(d\), rate \(R = k/n\) and relative distance \(\delta = d/n\). By the (non-asymptotic) Hamming bound (Theorem 1.7.2, thm:c01-hamming-bound),

\[ R = \frac{k}{n} \le 1 - \frac{\log _q Vol_q\! \left(\left\lfloor \frac{d-1}{2}\right\rfloor , n\right)}{n}. \]

By the volume lower bound on Hamming balls (Proposition 3.3.3 of Chapter 3),

\[ Vol_q\! \left(\left\lfloor \frac{d-1}{2}\right\rfloor , n\right) \ge q^{H_q(\delta /2) n - o(n)}. \]

Taking \(\log _q\) of both sides and dividing by \(n\) shows that \(\log _q Vol_q(\lfloor (d-1)/2\rfloor ,n)/n \ge H_q(\delta /2) - o(1)\), where the \(o(1)\) term tends to \(0\) as \(n\to \infty \). Substituting into the displayed inequality above gives, for a \(q\)-ary code \(C\) of rate \(R\), relative distance \(\delta \) and block length \(n\),

\[ R \le 1 - H_q\left(\frac{\delta }{2}\right) + o(1), \]

where the \(o(1)\) term tends to \(0\) as \(n\to \infty \). Taking \(n \to \infty \) along an infinite family \(\mathscr {C}\) of rate \(R = R(\mathscr {C})\) and relative distance \(\delta =\delta (\mathscr {C})\) yields the claimed asymptotic bound \(R \le 1 - H_q(\delta /2)\).

4.2 Gilbert-Varshamov Bound

We now prove our only positive (existential) lower bound on \(R\) in terms of \(\delta \) in this book.

Theorem 105 Gilbert-Varshamov Bound, Thm 4.2.1

Let \(q \ge 2\). For every \(0 \le \delta {\lt} 1 - \frac{1}{q}\) there exists a family of \(q\)-ary codes \(\mathscr {C}\) with rate \(R(\mathscr {C}) \ge 1 - H_q(\delta )\) and relative distance \(\delta (\mathscr {C}) \ge \delta \). If \(q\) is a prime power then there exists such a \(q\)-ary family of linear codes. Furthermore, for every \(0 \le \varepsilon \le 1 - H_q(\delta )\) and integer \(n\), if a matrix \(\mathbf{G}\) is picked uniformly at random from \(\mathbb {F}_q^{k \times n}\) for \(k = n(1 - H_q(\delta ) - \varepsilon )\), then \(\mathbf{G}\) generates a code of rate \(1 - H_q(\delta ) - \varepsilon \) and relative distance at least \(\delta \) with probability strictly greater than \(1 - q^{-\varepsilon n}\).

Proof ▶

The existence of a (possibly non-linear) family of codes with rate \(\ge 1 - H_q(\delta )\) and relative distance \(\ge \delta \) follows from Lemma 108. The existence of a linear family with the stated (stronger, high-probability) property, for \(q\) a prime power, follows from Lemma 109; taking \(\varepsilon = 0\) in that lemma in particular gives an infinite family of linear codes with rate \(\ge 1 - H_q(\delta )\) and relative distance \(\ge \delta \).

4.2.1 Greedy Construction

Algorithm 106 Gilbert’s Greedy Code Construction, Algorithm 4.2.1

Input: \(n, q, d\).
Output: A code \(C \subseteq [q]^n\) of distance \(d \ge 1\).

\(C \leftarrow \emptyset \).
While there exists \(\mathbf{v} \in [q]^n\) such that \(\Delta (\mathbf{v}, \mathbf{c}) \ge d\) for every \(\mathbf{c} \in C\), do:
Add \(\mathbf{v}\) to \(C\).
Return \(C\).

Lemma 107 Correctness of Algorithm 4.2.1

Algorithm 106 (Algorithm 4.2.1) terminates in \(q^{O(n)}\) time and outputs a code \(C \subseteq [q]^n\) with \(\Delta (C) \ge d\). Moreover, upon termination,

\[ \bigcup _{\mathbf{c} \in C} B(\mathbf{c}, d-1) = [q]^n. \]

Proof ▶

Distance. The while-loop condition in Step 2 guarantees that we never add a vector \(\mathbf{v}\) to \(C\) (in Step 3) unless \(\Delta (\mathbf{v},\mathbf{c}) \ge d\) for every \(\mathbf{c}\) already in \(C\); hence at every point in the execution \(\Delta (C) \ge d\).

Termination. If, at some iteration, a vector \(\mathbf{v} \in [q]^n\) fails the condition of Step 2 (i.e., it is “rejected” because \(\Delta (\mathbf{v},\mathbf{c}) {\lt} d\) for some \(\mathbf{c} \in C\) at that point), then \(\mathbf{v}\) will continue to be rejected in all later iterations, since \(C\) only grows and \(\mathbf{c}\) remains in \(C\). Hence once a vector is not added it will never be added later, so the algorithm can add each of the (at most) \(q^n\) vectors of \([q]^n\) at most once, and it must terminate. A naive implementation checks, in each iteration, all (at most) \(q^n\) vectors \(\mathbf{v} \in [q]^n\) against all (at most) \(q^n\) vectors \(\mathbf{c} \in C\), giving total time \(q^{O(n)}\); fixing an ordering of \([q]^n\) once and for all and scanning through it a single time (using the observation above that a rejected vector need never be reconsidered) improves this to \(O(nq^{2n}) = q^{O(n)}\).

Covering property. Suppose for contradiction that upon termination \(\bigcup _{\mathbf{c}\in C} B(\mathbf{c},d-1) \ne [q]^n\). Then there is a vector \(\mathbf{v} \in [q]^n \setminus C\) with \(\Delta (\mathbf{v},\mathbf{c}) \ge d\) for every \(\mathbf{c} \in C\), so \(\mathbf{v}\) satisfies the condition of Step 2 and could still be added to \(C\), contradicting termination of the algorithm. Hence \(\bigcup _{\mathbf{c}\in C} B(\mathbf{c},d-1) = [q]^n\).

Lemma 108 Lemma 4.2.2

For every pair of positive integers \(n,q\) and real \(\delta \in [0,1]\), writing \(d = \delta n\), there exists an \((n,k,d)_q\) code satisfying

\[ q^k \ge \frac{q^n}{Vol_q(d-1,n)}. \]

In particular, for every positive integer \(q\) and real \(\delta \in [0, 1-1/q]\) there exists an infinite family of \(q\)-ary codes \(\mathscr {C}\) of rate \(R\) and relative distance \(\delta \) satisfying \(R \ge 1 - H_q(\delta )\).

Proof ▶

Fix \(n,q,d=\delta n\) and run Algorithm 106 to obtain a code \(C\). By Lemma 107, \(C\) has distance at least \(d\) and, upon termination,

\[ \bigcup _{\mathbf{c}\in C} B(\mathbf{c},d-1) = [q]^n, \]

and hence

\[ \left|\bigcup _{\mathbf{c}\in C} B(\mathbf{c},d-1)\right| = q^n. \]

Since always \(\sum _{\mathbf{c}\in C} |B(\mathbf{c},d-1)| \ge \left|\bigcup _{\mathbf{c}\in C} B(\mathbf{c},d-1)\right|\), we get

\[ \sum _{\mathbf{c}\in C} |B(\mathbf{c},d-1)| \ge q^n. \]

As the volume of a Hamming ball is translation invariant, every term in the sum equals \(Vol_q(d-1,n)\), so

\[ \sum _{\mathbf{c}\in C} Vol_q(d-1,n) = Vol_q(d-1,n)\cdot |C| \ge q^n, \]

giving \(|C| \ge q^n / Vol_q(d-1,n)\), i.e., \(q^k \ge q^n/Vol_q(d-1,n)\) for \(C\) an \((n,k,d)_q\) code, as desired.

For the asymptotic statement, fix \(\delta \in [0,1-1/q]\). By the volume upper bound (Proposition 3.3.3 of Chapter 3),

\[ Vol_q(d-1,n) \le Vol_q(\delta n,n) \le q^{n H_q(\delta )}. \]

Combining with the bound above,

\[ q^k \ge \frac{q^n}{Vol_q(d-1,n)} \ge \frac{q^n}{q^{nH_q(\delta )}} = q^{n(1-H_q(\delta ))}, \]

so \(k \ge n(1-H_q(\delta ))\), i.e., for every \(n,q,\delta \) there is a code of rate at least \(1 - H_q(\delta )\) and relative distance at least \(\delta \). Applying this for a sequence of \(n \to \infty \) produces the desired infinite family of rate \(R \ge 1 - H_q(\delta )\) and relative distance \(\delta \).

4.2.2 Linear Code Construction

Lemma 109 Linear Gilbert–Varshamov via random matrices, §4.2.2

Let \(q\) be a prime power, \(0 \le \delta {\lt} 1-\frac{1}{q}\) and \(0 \le \varepsilon \le 1 - H_q(\delta )\). Let \(n\) be an integer and \(k = n(1-H_q(\delta )-\varepsilon )\). If a \(k \times n\) matrix \(\mathbf{G}\) over \(\mathbb {F}_q\) is picked uniformly at random, then with probability strictly greater than \(1 - q^{-\varepsilon n}\) the matrix \(\mathbf{G}\) has full rank and generates a linear code of rate \(1-H_q(\delta )-\varepsilon \) and relative distance at least \(\delta \).

Proof ▶

Write \(d = \delta n\). Pick a random \(k \times n\) matrix \(\mathbf{G}\) where each of the \(kn\) entries is chosen uniformly and independently at random from \(\mathbb {F}_q\). Fix \(\mathbf{m} \in \mathbb {F}_q^k \setminus \{ 0\} \). For a random \(\mathbf{G}\), the vector \(\mathbf{m}\mathbf{G}\) is a uniformly random vector of \(\mathbb {F}_q^n\) (Lemma 3.1.14 of Chapter 3). Thus

\[ \Pr _{\mathbf{G}}[wt(\mathbf{m}\mathbf{G}) {\lt} d] = \frac{Vol_q(d-1,n)}{q^n} \le \frac{q^{nH_q(\delta )}}{q^n} = q^{-k}\cdot q^{-\varepsilon n}, \]

where the first equality uses that \(wt(\mathbf{m}\mathbf{G}) {\lt} d\) is equivalent to \(\mathbf{m}\mathbf{G} \in B(\mathbf{0},d-1)\) together with uniformity of \(\mathbf{m}\mathbf{G}\), and the inequality is the volume upper bound \(Vol_q(d-1,n) \le q^{nH_q(\delta )}\) (Proposition 3.3.3 of Chapter 3).

There are \(q^k - 1\) non-zero vectors \(\mathbf{m}\); taking the union over all of them (union bound, Lemma 3.1.5 of Chapter 3),

\[ \Pr _{\mathbf{G}}\left[\exists \text{ non-zero } \mathbf{m} : wt(\mathbf{m}\mathbf{G}) {\lt} d\right] \le (q^k-1)\cdot q^{-k}\cdot q^{-\varepsilon n} {\lt} q^{-\varepsilon n}. \]

So with probability strictly greater than \(1 - q^{-\varepsilon n}\), every non-zero \(\mathbf{m}\) satisfies \(wt(\mathbf{m}\mathbf{G}) \ge d\). In particular the probability of this event is (strictly) positive, so such a \(\mathbf{G}\) exists.

We now argue such a \(\mathbf{G}\) has full rank and generates a code with the claimed parameters. By Proposition 2.3.6 of Chapter 2, the code generated by \(\mathbf{G}\) has distance at least \(d\) if and only if for every non-zero \(\mathbf{m}\), \(wt(\mathbf{m}\mathbf{G}) \ge d\); this is exactly the event above. If \(\mathbf{G}\) did not have full rank there would be a non-zero \(\mathbf{m}\) with \(\mathbf{m}\mathbf{G} = \mathbf{0}\), giving \(wt(\mathbf{m}\mathbf{G}) = 0 {\lt} d\), contradicting the property just established; hence \(\mathbf{G}\) has full rank \(k\) and generates an \([n,k,d]_q\) code, i.e., a linear code of rate \(k/n = 1-H_q(\delta )-\varepsilon \) and relative distance (at least) \(\delta \), as desired.

4.3 Singleton Bound

Theorem 110 Singleton Bound, Thm 4.3.1

For every \((n,k,d)_q\) code,

\[ k \le n - d + 1. \]

Consequently, if \(\mathscr {C}\) is an infinite family of codes of rate \(R\) and relative distance \(\delta \), then \(R \le 1 - \delta \).

Proof ▶

Let \(\mathbf{c}_1,\ldots ,\mathbf{c}_M\) be the codewords of an \((n,k,d)_q\) code \(C\) (so \(M = q^k\)); we show \(M \le q^{n-d+1}\). For \(i \in [M]\), let \(\mathbf{c}_i'\) be the prefix of \(\mathbf{c}_i\) of length \(n-d+1\).

We claim that for every \(i \ne j\), \(\mathbf{c}_i' \ne \mathbf{c}_j'\). If instead \(\mathbf{c}_i' = \mathbf{c}_j'\) for some \(i \ne j\), then \(\mathbf{c}_i\) and \(\mathbf{c}_j\) agree in all of the first \(n-d+1\) positions, so \(\Delta (\mathbf{c}_i,\mathbf{c}_j) \le n - (n-d+1) = d-1\), contradicting the fact that \(C\) has distance \(d\).

Hence \(M\) equals the number of distinct prefixes (of length \(n-d+1\)) of codewords of \(C\), and since there are only \(q^{n-d+1}\) strings of length \(n-d+1\) over \([q]\), we get \(M \le q^{n-d+1}\), i.e., \(q^k \le q^{n-d+1}\), so \(k \le n-d+1\) as desired.

For the asymptotic statement, suppose for contradiction that some infinite family of codes \(\mathscr {C}\) has rate \(R = R(\mathscr {C}) = 1-\delta + \varepsilon \) for some \(\varepsilon {\gt} 0\). Then there exists \(n {\gt} 2/ \varepsilon \) and a code \(C_n \in \mathscr {C}\) that is an \((n,k,d)_q\) code with \(k \ge n(1-\delta +\varepsilon )\) and \(d \ge \delta n\). By the choice of \(n\) we then have

\[ k \ge n(1-\delta +\varepsilon ) = n - \delta n + \varepsilon n \ge n - d + 2, \]

contradicting the non-asymptotic bound \(k \le n-d+1\) proved above.

4.4 Plotkin Bound

Theorem 111 Plotkin Bound, Thm 4.4.1

The following hold for any code \(C \subseteq [q]^n\) with distance at least \(d\):

If \(d = \left(1-\frac{1}{q}\right)n\), then \(|C| \le 2qn\).
If \(d {\gt} \left(1-\frac{1}{q}\right)n\), then \(|C| \le \dfrac {qd}{qd-(q-1)n}\).

Proof ▶

Let \(C = \{ \mathbf{c}_1,\ldots ,\mathbf{c}_m\} \) be a \(q\)-ary code of block length \(n\) and distance (at least) \(d\). Let \(f : [q]^n \to \mathbb {R}^{nq}\) be the function guaranteed by Lemma 115 (Mapping Lemma). For every \(i\), \(f(\mathbf{c}_i)\) is a unit-length vector in \(\mathbb {R}^{nq}\); and for every \(i \ne j\), since \(\Delta (\mathbf{c}_i,\mathbf{c}_j)\ge d\), Lemma 115 gives

\[ \langle f(\mathbf{c}_i), f(\mathbf{c}_j) \rangle \le 1 - \left(\frac{q}{q-1}\right)\frac{d}{n}. \]

Thus \(f(\mathbf{c}_1),\ldots ,f(\mathbf{c}_m)\) are unit vectors in \(\mathbb {R}^{nq}\) to which we can apply the Geometric Lemma (Lemma 113) to bound \(m = |C|\).

Part 1. If \(d = \left(1-\frac{1}{q}\right)n = \frac{(q-1)n}{q}\), then for all \(i \ne j\),

\[ \langle f(\mathbf{c}_i), f(\mathbf{c}_j) \rangle \le 1 - \left(\frac{q}{q-1}\right)\frac{d}{n} = 1 - 1 = 0. \]

So by part 1 of Lemma 113 (with \(N=nq\)), \(m \le 2nq\), as desired.

Part 2. If \(d {\gt} \left(\frac{q-1}{q}\right)n\), then for all \(i \ne j\),

\[ \langle f(\mathbf{c}_i), f(\mathbf{c}_j) \rangle \le 1 - \left(\frac{q}{q-1}\right)\frac{d}{n} = -\left(\frac{qd-(q-1)n}{(q-1)n}\right). \]

Let \(\varepsilon := \frac{qd-(q-1)n}{(q-1)n} {\gt} 0\) (positive since \(d {\gt} \left(\frac{q-1}{q}\right)n\)). Applying part 2 of Lemma 113 to \(f(\mathbf{c}_1),\ldots , f(\mathbf{c}_m)\) and this \(\varepsilon \) gives

\[ m \le 1 + \frac{1}{\varepsilon } = 1 + \frac{(q-1)n}{qd-(q-1)n} = \frac{qd-(q-1)n + (q-1)n}{qd-(q-1)n} = \frac{qd}{qd-(q-1)n}, \]

as desired.

Corollary 112 Corollary 4.4.2

Let \(\mathscr {C}\) be an infinite family of \(q\)-ary codes with relative distance \(0 \le \delta \le 1 - \frac{1}{q}\) and rate \(R\). Then

\[ R \le 1 - \left(\frac{q}{q-1}\right)\delta . \]

Proof ▶

Assume for contradiction that \(\mathscr {C}\) is an infinite family of \(q\)-ary codes with rate \(R = 1-\left(\frac{q}{q-1}\right)\delta + \varepsilon \) for some \(\varepsilon {\gt} 0\). Let \(C \in \mathscr {C}\) be a code of block length \(n \ge \frac{3}{\varepsilon }\cdot \log \left(\frac{1}{ \varepsilon }\right)\) with distance \(d \le \delta n\) and message length \(k \ge Rn\).

Partition the codewords of \(C\) so that codewords within a partition agree on the first \(n-n'\) symbols, where \(n' = \left\lfloor \frac{qd}{q-1} \right\rfloor - 1\). For every \(\mathbf{x} \in [q]^{n-n'}\), define the prefix code

\[ C_{\mathbf{x}} = \{ (c_{n-n'+1},\ldots ,c_n) : (c_1\ldots c_N) \in C, (c_1\ldots c_{n-n'}) = \mathbf{x}\} , \]

i.e., \(C_{\mathbf{x}}\) consists of the \(n'\)-length suffixes of all codewords of \(C\) that start with \(\mathbf{x}\).

By definition \(C_{\mathbf{x}}\) is a \(q\)-ary code of block length \(n' = \left\lfloor \frac{qd}{q-1}\right\rfloor -1\). We claim it also has distance at least \(d\) for every \(\mathbf{x}\): if for some \(\mathbf{x}\), \(\mathbf{c}_1 \ne \mathbf{c}_2 \in C_{\mathbf{x}}\) had \(\Delta (\mathbf{c}_1,\mathbf{c}_2) {\lt} d\), this would yield two codewords of \(C\), namely \((\mathbf{x},\mathbf{c}_1)\) and \((\mathbf{x},\mathbf{c}_2)\), at Hamming distance less than \(d\) from each other, contradicting the assumption \(\Delta (C) \ge d\).

Since \(n' {\lt} \left(\frac{q}{q-1}\right) d\) (by definition of \(n'\)), we have \(d {\gt} \left(1-\frac{1}{q}\right)n'\). Applying part 2 of Theorem 111 to \(C_{\mathbf{x}}\) gives

\[ |C_{\mathbf{x}}| \le \frac{qd}{qd-(q-1)n'} \le qd \le qn, \]

where the second inequality uses that \(qd - (q-1)n'\) is a positive integer and the third is immediate from \(d \le n\).

We now use the bound on \(|C_{\mathbf{x}}|\), for all \(\mathbf{x}\), to bound \(|C|\). Since \(|C| = \sum _{\mathbf{x} \in [q]^{n-n'}} |C_{\mathbf{x}}|\), this gives

\[ |C| \le \sum _{\mathbf{x}\in [q]^{n-n'}} qn = q^{n-n'+1}\cdot n \le q^{n-n'+1+\log n/\log q} \le q^{n - \frac{q}{q-1}d + 1 + \log n} \le q^{n\left(1-\delta \cdot \frac{q}{q-1}+\varepsilon \right)}, \]

where the first inequality uses the definition of \(n'\) and the final inequality uses that \(n \ge \frac{3}{\varepsilon }\cdot \log \left(\frac{1}{\varepsilon }\right)\). We conclude that \(R \le 1-\left(\frac{q}{q-1}\right)\delta +\varepsilon \). Since this holds for every \(\varepsilon {\gt} 0\), the corollary follows.

Lemma 113 Geometric Lemma, Lemma 4.4.3

Let \(\mathbf{v}_1,\ldots ,\mathbf{v}_m \in \mathbb {R}^N\) be non-zero vectors.

If \(\langle \mathbf{v}_i,\mathbf{v}_j \rangle \le 0\) for all \(i \ne j\), then \(m \le 2N\).
Let \(\mathbf{v}_i\) be unit vectors for \(1 \le i \le m\). Further, if \(\langle \mathbf{v}_i,\mathbf{v}_j \rangle \le -\varepsilon {\lt} 0\) for all \(i \ne j\), then \(m \le 1 + \frac{1}{\varepsilon }\).

Proof ▶

Part 1. We first focus on a subset of the \(m\) vectors that has positive inner product with some fixed vector \(\mathbf{u}\). Pick \(\mathbf{u} \in \mathbb {R}^N\) generic so that \(\langle \mathbf{u}, \mathbf{v}_i\rangle \ne 0\) for every \(i\); such a \(\mathbf{u}\) exists since each hyperplane \(\{ \mathbf{x} : \langle \mathbf{x},\mathbf{v}_i\rangle =0\} \) is a proper subspace of \(\mathbb {R}^N\) (as \(\mathbf{v}_i \ne 0\)), and a finite union of proper subspaces cannot cover \(\mathbb {R}^N\) (Lemma 114).

Without loss of generality (replacing \(\mathbf{u}\) by \(-\mathbf{u}\) if necessary) at least half of the \(\mathbf{v}_i\)’s have positive inner product with \(\mathbf{u}\); renumbering, assume these are \(\mathbf{v}_1,\ldots ,\mathbf{v}_\ell \) with \(\ell \ge m/2\). We show \(\mathbf{v}_1,\ldots ,\mathbf{v}_\ell \) are linearly independent, which suffices since it implies \(\ell \le N\) and thus \(m \le 2\ell \le 2N\).

Suppose for contradiction there is a linear dependency among \(\mathbf{v}_1,\ldots ,\mathbf{v}_\ell \): there exist \(\alpha _1,\ldots , \alpha _\ell \), not all zero, with \(\sum _{i\in [\ell ]} \alpha _i \mathbf{v}_i = 0\). Negating all \(\alpha _i\) if necessary, assume at least one \(\alpha _i\) is positive; by renumbering assume there is \(k\ge 1\) with \(\alpha _1,\ldots ,\alpha _k {\gt} 0\) and \(\alpha _{k+1},\ldots ,\alpha _\ell \le 0\).

Let \(\mathbf{w} = \sum _{i=1}^k \alpha _i \mathbf{v}_i\); by definition of the \(\alpha _i\)’s, \(\mathbf{w} = -\sum _{j=k+1}^\ell \alpha _j \mathbf{v}_j\). We first argue \(\mathbf{w}\ne 0\): since

\[ \langle \mathbf{u},\mathbf{w}\rangle = \sum _{i=1}^k \alpha _i \langle \mathbf{u},\mathbf{v}_i\rangle \ge \alpha _1\langle \mathbf{u}, \mathbf{v}_1\rangle {\gt} 0 \]

(using \(\langle \mathbf{u},\mathbf{v}_i\rangle {\gt} 0\) and \(\alpha _i {\gt} 0\) for \(i \le k\)), \(\mathbf{w}\) has non-zero inner product with \(\mathbf{u}\) and hence is not the zero vector.

But then

\[ 0 {\lt} \langle \mathbf{w},\mathbf{w}\rangle = \left\langle \sum _{i=1}^k \alpha _i \mathbf{v}_i, -\sum _{j=k+1}^\ell \alpha _j \mathbf{v}_j \right\rangle = -\sum _{i=1,j=k+1}^{k,\ell } \alpha _i\alpha _j\langle \mathbf{v}_i,\mathbf{v}_j\rangle \le 0, \]

since every term in the sum satisfies \(\alpha _i \ge 0\), \(\alpha _j \le 0\) and \(\langle \mathbf{v}_i,\mathbf{v}_j\rangle \le 0\), so \(-\alpha _i\alpha _j\langle \mathbf{v}_i,\mathbf{v}_j\rangle \le 0\). This is a contradiction, so \(\mathbf{v}_1,\ldots ,\mathbf{v}_\ell \) are linearly independent, proving part 1. (An alternate proof of part 1, by induction on \(N\), is also given in the text.)

Part 2. Let \(\mathbf{z} = \mathbf{v}_1 + \ldots + \mathbf{v}_m\). Then

\[ \| \mathbf{z}\| ^2 = \sum _{i=1}^m \| \mathbf{v}_i\| ^2 + 2\sum _{i{\lt}j} \langle \mathbf{v}_i,\mathbf{v}_j\rangle \le m + 2\binom {m}{2}\cdot (-\varepsilon ) = m(1-\varepsilon m + \varepsilon ), \]

using that each \(\mathbf{v}_i\) is a unit vector and \(\langle \mathbf{v}_i, \mathbf{v}_j\rangle \le -\varepsilon \) for all \(i \ne j\). Since \(\| \mathbf{z}\| ^2 \ge 0\), we get \(m(1-\varepsilon m+\varepsilon ) \ge 0\); since \(m \ge 1\), dividing by \(m\) gives \(1 - \varepsilon m + \varepsilon \ge 0\), i.e., \(\varepsilon m \le 1+\varepsilon \), i.e., \(m \le 1 + \frac{1}{\varepsilon }\), as desired.

Lemma 114 Finite union of proper subspaces

✓

A finite union of proper linear subspaces of \(\mathbb {R}^N\) is not all of \(\mathbb {R}^N\).

Lemma 115 Mapping Lemma, Lemma 4.4.4

For every \(q\) and \(n\), there exists a function \(f : [q]^n \longrightarrow \mathbb {R}^{nq}\) such that for every \(\mathbf{c}_1,\mathbf{c}_2 \in [q]^n\) we have

\[ \langle f(\mathbf{c}_1), f(\mathbf{c}_2)\rangle = 1 - \left(\frac{q}{q-1}\right)\left(\frac{\Delta (\mathbf{c}_1, \mathbf{c}_2)}{n}\right). \]

Consequently:

For every \(\mathbf{c} \in [q]^n\), \(\| f(\mathbf{c})\| = 1\).
If \(\Delta (\mathbf{c}_1,\mathbf{c}_2) \ge d\) then \(\langle f(\mathbf{c}_1),f(\mathbf{c}_2)\rangle \le 1 - \left(\frac{q}{q-1}\right)\left(\frac{d}{n}\right)\).

Proof ▶

We first define a map \(\phi : [q] \to \mathbb {R}^q\) satisfying the requirements of the lemma for the case \(n=1\) (up to normalization), and then apply \(\phi \) coordinate-wise to build \(f\).

Let \(\mathbf{e}_i \in \mathbb {R}^q\) denote the unit vector along the \(i\)th direction, \(\mathbf{e}_i = \langle 0,\ldots ,0,1,0,\ldots ,0\rangle \) with the \(1\) in position \(i\), and let \(\bar{\mathbf{e}} = \frac{1}{q}\sum _{i\in [q]} \mathbf{e}_i = \langle 1/q,\ldots ,1/q\rangle \). Then \(\langle \mathbf{e}_i, \mathbf{e}_j\rangle = 1\) if \(i=j\) and \(0\) otherwise, and \(\langle \bar{\mathbf{e}},\mathbf{e}_i\rangle = \langle \bar{\mathbf{e}}, \bar{\mathbf{e}}\rangle = 1/q\) for every \(i\).

Define \(\phi : [q] \to \mathbb {R}^q\) by \(\phi (i) = \mathbf{e}_i - \bar{\mathbf{e}}\). For every pair \(i,j \in [q]\),

\[ \langle \phi (i),\phi (j)\rangle = \langle \mathbf{e}_i-\bar{\mathbf{e}}, \mathbf{e}_j-\bar{\mathbf{e}}\rangle = \langle \mathbf{e}_i,\mathbf{e}_j \rangle - \langle \mathbf{e}_i,\bar{\mathbf{e}}\rangle - \langle \bar{\mathbf{e}},\mathbf{e}_j\rangle + \langle \bar{\mathbf{e}}, \bar{\mathbf{e}}\rangle = \langle \mathbf{e}_i,\mathbf{e}_j\rangle - 1/q. \]

Thus for every \(i \in [q]\), \(\| \phi (i)\| ^2 = \langle \mathbf{e}_i, \mathbf{e}_i\rangle - 1/q = \frac{q-1}{q}\), and for every \(i \ne j \in [q]\), \(\langle \phi (i),\phi (j)\rangle = -1/q\).

We now define \(f : [q]^n \to \mathbb {R}^{nq}\). For \(\mathbf{c} = (c_1,\ldots ,c_n) \in [q]^n\), define

\[ f(\mathbf{c}) = \sqrt{\frac{q}{n(q-1)}} \cdot \big(\phi (c_1),\phi (c_2), \ldots ,\phi (c_n)\big). \]

(The multiplicative factor \(\sqrt{q/(n(q-1))}\) ensures \(f(\mathbf{c})\) is a unit vector, as we verify below.)

Condition 1 (unit norm).

\[ \| f(\mathbf{c})\| ^2 = \frac{q}{(q-1)n}\cdot \sum _{i=1}^n \| \phi (c_i)\| ^2 = \frac{q}{(q-1)n}\cdot n \cdot \frac{q-1}{q} = 1. \]

Condition 2 (inner product formula). Write \(\mathbf{c}_1 = (x_1,\ldots ,x_n)\) and \(\mathbf{c}_2 = (y_1,\ldots ,y_n)\). Then

\begin{align*} \langle f(\mathbf{c}_1),f(\mathbf{c}_2)\rangle & = \sum _{\ell =1}^n \langle f(x_\ell ),f(y_\ell )\rangle \\ & = \left[\sum _{\ell : x_\ell \ne y_\ell } \langle \phi (x_\ell ), \phi (y_\ell )\rangle + \sum _{\ell : x_\ell = y_\ell } \langle \phi (x_\ell ),\phi (y_\ell )\rangle \right]\cdot \frac{q}{n(q-1)} \\ & = \left[\sum _{\ell : x_\ell \ne y_\ell } \left(\frac{-1}{q}\right) + \sum _{\ell : x_\ell = y_\ell } \left(\frac{q-1}{q}\right)\right]\cdot \frac{q}{n(q-1)} \\ & = \left[\Delta (\mathbf{c}_1,\mathbf{c}_2)\left(\frac{-1}{q}\right) + (n-\Delta (\mathbf{c}_1,\mathbf{c}_2))\left(\frac{q-1}{q}\right) \right]\cdot \frac{q}{n(q-1)} \\ & = 1 - \Delta (\mathbf{c}_1,\mathbf{c}_2)\left(\frac{q}{n(q-1)}\right) \left[\frac{1}{q}+\frac{q-1}{q}\right] \\ & = 1 - \left(\frac{q}{q-1}\right)\left(\frac{\Delta (\mathbf{c}_1, \mathbf{c}_2)}{n}\right), \end{align*}

as desired, where we used \(\| \phi (i)\| ^2=(q-1)/q\) and \(\langle \phi (i), \phi (j)\rangle =-1/q\) for \(i\ne j\) established above.

Conditions 1 and 2 (unit norm, and the exact inner product formula) are exactly the two consequences claimed; condition 2 of the lemma statement (\(\Delta (\mathbf{c}_1,\mathbf{c}_2)\ge d \Rightarrow \langle f(\mathbf{c}_1), f(\mathbf{c}_2)\rangle \le 1-\left(\frac{q}{q-1}\right)\frac{d}{n}\)) follows immediately from the exact formula, since \(\Delta (\mathbf{c}_1, \mathbf{c}_2) \ge d\) and the coefficient \(-q/((q-1)n)\) of \(\Delta (\mathbf{c}_1, \mathbf{c}_2)\) in the formula is negative.