8 What Cannot be Done – II

In this brief interlude we revisit the trade-offs between rate and relative distance for codes. So far the best lower bound on the rate \(R\) that we have seen is the Gilbert–Varshamov (GV) bound (105), and the best upper bound is a combination of the Plotkin (111) and Hamming (32) bounds. In this chapter we prove one more upper bound on \(R\), due to Elias and Bassalygo, and then present the linear programming (LP) method for upper-bounding the rate, culminating in the (McEliece–Rodemich–Rumsey–Welch) MRRW bound. We close with a short recap of the state of the art.

8.1 Elias-Bassalygo bound

We begin with a new upper bound on the rate, the Elias-Bassalygo bound. It uses the \(q\)-ary Johnson bound function \(J_q(\cdot )\) from Theorem 154, together with the \(q\)-ary entropy function, to bound the rate of a code in terms of its relative distance.

Lemma 165 Existence of a dense Hamming ball, Lemma 8.1.2

Given a \(q\)-ary code \(C \subseteq [q]^n\) and \(0 \le e \le n\), there exists a Hamming ball of radius \(e\) containing at least \(\dfrac {|C|\, \mathrm{Vol}_q(e,n)}{q^n}\) codewords of \(C\).

Proof ▶

We prove the existence of the required Hamming ball by the probabilistic method. Pick a received word \(\mathbf y \in [q]^n\) uniformly at random. The expected value of \(|B(\mathbf y, e) \cap C|\) equals \(\dfrac {|C|\, \mathrm{Vol}_q(e,n)}{q^n}\); indeed \(\mathbb {E}[|B(\mathbf y,e)\cap C|] = \sum _{\mathbf c \in C} \Pr [\mathbf c \in B(\mathbf y,e)] = \sum _{\mathbf c \in C} \dfrac {\mathrm{Vol}_q(e,n)}{q^n} = \dfrac {|C|\, \mathrm{Vol}_q(e,n)}{q^n}\), since for a fixed codeword \(\mathbf c\), \(\Pr _{\mathbf y}[\Delta (\mathbf c,\mathbf y) \le e] = \mathrm{Vol}_q(e,n)/q^n\). By the probabilistic method, there thus exists a \(\mathbf y \in [q]^n\) such that \(|B(\mathbf y,e) \cap C| \ge \dfrac {|C|\, \mathrm{Vol}_q(e,n)}{q^n}\), as desired.

Theorem 166 Elias-Bassalygo bound, Theorem 8.1.1

Every \(q\)-ary code of rate \(R\), relative distance \(\delta \), and large enough block length \(n\) satisfies

\[ R \le 1 - H_q\bigl(J_q(\delta )\bigr) + o(1) . \]

Proof ▶

Let \(C \subseteq [q]^n\) be any code with relative distance \(\delta \), and let \(d\) denote its (absolute) minimum distance. Define \(e = nJ_q(\delta ) - 1\). By Lemma 165, there exists a Hamming ball containing \(\mathscr B\) codewords of \(C\) with

\[ \mathscr B \ge \frac{|C|\, \mathrm{Vol}_q(e,n)}{q^n} . \tag {8.1} \]

On the other hand, by our choice of \(e\) and the Johnson bound (Theorem 154), a Hamming ball of radius \(e = nJ_q(\delta )-1\) around any point contains at most \(qdn\) codewords of \(C\), i.e.

\[ \mathscr B \le qdn . \tag {8.2} \]

Combining the lower bound (8.1) and the upper bound (8.2),

\[ |C| \le qdn \cdot \frac{q^n}{\mathrm{Vol}_q(e,n)} \le q^{n(1 - H_q(J_q(\delta )) + o(1))} , \]

where the second inequality uses the lower bound on the volume of a Hamming ball (Ch 3, Prop 3.3.3) together with \(qdn \le qn^2 \le q^{o(n)}\) for large enough \(n\). Taking logarithms base \(q\) and dividing by \(n\), this implies that the rate \(R\) of \(C\) satisfies

\[ R \le 1 - H_q\bigl(J_q(\delta )\bigr) + o(1) , \]

as desired.

Figure 8.1 in the text illustrates that, for binary codes, the Elias-Bassalygo bound is strictly tighter than all of the Hamming, Plotkin and Singleton upper bounds seen so far (32, 111, 110).

8.2 The linear programming bounds

We now turn to the strongest known method for upper-bounding the rate of codes, dating back to 1977: the (first) linear programming (LP) bound, also called the first MRRW bound after its discoverers McEliece, Rodemich, Rumsey and Welch. We restrict ourselves to binary codes throughout this section.

Krawtchouk polynomials and the linear program

Definition 167 Krawtchouk polynomials and functions, Eq. (8.5)

Fix a block length \(n\). For \(\ell = 0,1,\dots ,n\), the Krawtchouk polynomial \(K_\ell (X)\) is the real polynomial

\[ K_\ell (X) = \sum _{j=0}^{\ell } (-1)^j \binom {X}{j}\binom {n-X}{\ell - j} , \]

where \(\binom {X}{j}\) denotes the polynomial \(X(X-1)\cdots (X-j+1)/j!\) (so that plugging in \(X = m\) for an integer \(m\) recovers the usual binomial coefficient \(\binom {m}{j}\)), and similarly for \(\binom {n-X}{\ell -j}\). The Krawtchouk function \(K_\ell (i)\), for \(i = 0,1,\dots ,n\), is the evaluation of \(K_\ell (X)\) at \(X = i\).

Proposition 168 Krawtchouk basis property, Eq. (8.6)

For each \(\ell = 0,\dots ,n\), the polynomial \(K_\ell (X)\) has degree exactly \(\ell \). Consequently \(\{ K_\ell (X)\} _{\ell =0}^n\) is a basis of the space of real polynomials of degree at most \(n\), and the functions \(K_\ell : \{ 0,1,\dots ,n\} \to \mathbb R\), \(\ell = 0,\dots ,n\), form a basis of all real-valued functions on \(\{ 0,1,\dots ,n\} \). Consequently every function \(g : \{ 0,\dots ,n\} \to \mathbb R\) can be expressed uniquely as

\[ g(x) = \sum _{\ell =0}^n \hat g(\ell ) K_\ell (x) \]

for real coefficients \(\hat g(0),\dots ,\hat g(n)\).

Proof ▶

Each summand \((-1)^j \binom {X}{j}\binom {n-X}{\ell -j}\) is, as a polynomial in \(X\), of degree \(j\) (from \(\binom {X}{j}\)) plus degree \(\ell - j\) (from \(\binom {n-X}{\ell -j}\), itself a degree-\((\ell -j)\) polynomial in \(X\)), hence of degree exactly \(\ell \). One checks, e.g. by induction on \(\ell \) using the standard three-term recurrence for Krawtchouk polynomials, that the coefficient of \(X^\ell \) in the sum does not vanish, so \(\deg K_\ell (X) = \ell \) exactly. Thus \(K_0(X),\dots ,K_n(X)\) are \(n+1\) polynomials of pairwise distinct degrees \(0,1,\dots ,n\), hence linearly independent, and so they form a basis of the \((n+1)\)-dimensional space of real polynomials of degree at most \(n\).

Since every function on the finite set \(\{ 0,1,\dots ,n\} \) agrees with a unique interpolating polynomial of degree at most \(n\), restricting the basis \(K_0(X),\dots ,K_n(X)\) to \(\{ 0,1,\dots ,n\} \) again gives a basis, now of the space of all real-valued functions on \(\{ 0,1,\dots ,n\} \). The displayed expansion of an arbitrary \(g\) is exactly the (unique) expression of \(g\) in this basis.

Lemma 169 Character-sum form of the Krawtchouk function, Eq. (8.7)

For \(i, \ell \in \{ 0,1,\dots ,n\} \),

\[ K_\ell (i) = \sum _{z : \mathrm{wt}(z) = \ell } (-1)^{a \cdot z} , \]

where \(a\) is an arbitrary vector in \(\{ 0,1\} ^n\) of Hamming weight \(i\) (the sum on the right does not depend on the choice of such \(a\), by symmetry).

Definition 170 The first LP bound: the linear program, Eqs. (8.3)–(8.4)

Fix a block length \(n\) and a target distance \(d\). Consider the linear program in real variables \(A_0,\dots ,A_n\):

\begin{align*} \text{Maximize} \quad & \sum _{i=0}^n A_i \\ \text{s.t.} \quad & A_0 = 1, \\ & A_i \ge 0, \qquad i = 1,\dots ,n, \\ & A_i = 0, \qquad i = 1,\dots ,d-1, \\ & \sum _{i=0}^n K_\ell (i) A_i \ge 0, \qquad \ell = 1,\dots ,n . \end{align*}

We write \(\mathrm{LP}(n,d)\) for the optimum value of this program. (For a linear code \(C\) of distance \(d\), the intended meaning of \(A_i\) is the number of weight-\(i\) codewords of \(C\); the first three families of constraints are then immediate, while the last family follows from the MacWilliams identities relating the weight distribution of \(C\) to that of its dual code. )

Definition 171 Normalized distance distribution

For a (not necessarily linear) binary code \(C\) and \(i = 0,1,\dots ,n\), define

\[ A_i^C = \frac{\# \{ (x,y) \in C^2 : \Delta (x,y) = i\} }{|C|} , \]

the normalized distance distribution function of \(C\).

Theorem 172 The first LP bound, Section 8.2

Let \(C \subseteq \{ 0,1\} ^n\) be any (not necessarily linear) code of minimum distance at least \(d\). Then \(|C| \le \mathrm{LP}(n,d)\).

Proof ▶

We show that the normalized distance distribution \(A_i^C\) of Definition 171 is a feasible solution of the linear program of Definition 170, with objective value \(|C|\); this proves the theorem since \(\mathrm{LP}(n,d)\) is the maximum objective value over all feasible solutions.

The first three families of constraints are immediate from the definitions: \(A_0^C = 1\) (taking \(x=y\)), \(A_i^C \ge 0\) for all \(i\), and \(A_i^C = 0\) for \(1 \le i {\lt} d\) since \(C\) has minimum distance at least \(d\).

For the last family of constraints, fix \(\ell \in \{ 1,\dots ,n\} \). Using the character-sum expression (Lemma 169) for \(K_\ell (i)\) with \(a = x-y\) (a vector of weight \(\Delta (x,y) = i\)),

\begin{align*} \sum _{i=0}^n A_i^C K_\ell (i) & = \frac{1}{|C|}\sum _{i=0}^n \sum _{\substack {(x,y)\in C^2 \\ \begin{bgroup} \Delta (x,y)=i \end{bgroup}}} K_\ell (i) \\ & = \frac{1}{|C|}\sum _{i=0}^n \sum _{\substack {(x,y)\in C^2 \\ \begin{bgroup} \Delta (x,y)=i \end{bgroup}}} \Bigl(\sum _{z:\mathrm{wt}(z)=\ell } (-1)^{(x-y)\cdot z}\Bigr) \\ & = \frac{1}{|C|} \sum _{z:\mathrm{wt}(z)=\ell } \Bigl(\sum _{(x,y)\in C^2} (-1)^{(x-y)\cdot z}\Bigr) \\ & = \frac{1}{|C|} \sum _{z:\mathrm{wt}(z)=\ell } \Bigl(\sum _{x\in C} (-1)^{x\cdot z}\Bigr)\Bigl(\sum _{y\in C} (-1)^{y\cdot z}\Bigr) \\ & = \frac{1}{|C|} \sum _{z:\mathrm{wt}(z)=\ell } \Bigl(\sum _{x\in C} (-1)^{x\cdot z}\Bigr)^2 \; \ge \; 0 , \end{align*}

as required (here \(\mathrm{wt}(z) = \ell \), \(x - y\) over \(\mathbb F_2\) equals \(x+y\), and \((-1)^{(x-y)\cdot z} = (-1)^{x\cdot z}(-1)^{y\cdot z}\)).

Finally the objective value of \(A^C\) is

\[ \sum _{i=0}^n A_i^C = \frac{1}{|C|}\sum _{i=0}^n \sum _{\substack {(x,y)\in C^2 \\ \Delta (x,y)=i}} 1 = \frac{1}{|C|} \sum _{(x,y)\in C^2} 1 = |C| . \]

Hence \(A^C\) is feasible with objective value \(|C|\), so \(|C| \le \mathrm{LP}(n,d)\).

Dual view and the MRRW bound

We now upper bound \(\mathrm{LP}(n,d)\) itself, i.e. the maximum value the linear program of Definition 170 can take. Since it is a maximization program, we bound it via weak duality: any feasible solution of the dual program gives a valid upper bound.

Lemma 173 Weak duality for the LP bound, Eqs. (8.8)–(8.10)

Let \(d \le n\), and suppose \(\beta _1,\dots ,\beta _n \ge 0\) are real numbers such that

\[ -\sum _{\ell =1}^n \beta _\ell K_\ell (i) \ge 1 \qquad \text{for every } i = d,d+1,\dots ,n . \]

Then every binary code \(C \subseteq \{ 0,1\} ^n\) of minimum distance at least \(d\) satisfies

\[ |C| \le 1 + \sum _{\ell =1}^n \beta _\ell K_\ell (0) . \]

Proof ▶

Let \(A_i = A_i^C\) be the normalized distance distribution of \(C\) (Definition 171), which by the proof of Theorem 172 is feasible for the linear program, with objective value \(|C|\). Since \(A_0 = 1\) and \(A_i = 0\) for \(1 \le i {\lt} d\), the constraint \(\sum _{i=0}^n K_\ell (i) A_i \ge 0\) may be rewritten as

\[ -\sum _{i=d}^n K_\ell (i) A_i \le K_\ell (0) \qquad (\ell = 1,\dots ,n) . \tag {8.8} \]

Multiplying (8.8) by \(\beta _\ell \ge 0\) and summing over \(\ell = 1,\dots ,n\) gives

\[ \sum _{i=d}^n A_i \Bigl(-\sum _{\ell =1}^n \beta _\ell K_\ell (i)\Bigr) \le \sum _{\ell =1}^n \beta _\ell K_\ell (0) . \]

By hypothesis \(-\sum _{\ell =1}^n \beta _\ell K_\ell (i) \ge 1\) for \(i \ge d\), and \(A_i \ge 0\), so the left side is at least \(\sum _{i=d}^n A_i\). Hence

\[ \sum _{i=d}^n A_i \le \sum _{\ell =1}^n \beta _\ell K_\ell (0) . \tag {8.9} \]

Adding \(A_0 = 1\) to both sides,

\[ \sum _{i=0}^n A_i \le 1 + \sum _{\ell =1}^n \beta _\ell K_\ell (0) . \tag {8.10} \]

The left side equals \(|C|\) (the objective value of \(A^C\)), giving the claim.

Definition 174 The dual polynomial, Eq. (8.11)

Given nonnegative reals \(\beta _1,\dots ,\beta _n\), set \(\beta _0 = 1\) and define the degree-\(n\) polynomial (in the Krawtchouk basis of Proposition 168)

\[ \beta (X) := \beta _0 + \sum _{\ell =1}^n \beta _\ell K_\ell (X) . \]

Corollary 175 Reformulated dual bound

Suppose \(\beta _1,\dots ,\beta _n \ge 0\) are such that the polynomial \(\beta (X)\) of Definition 174 satisfies \(\beta (i) \le 0\) for all \(i = d,d+1,\dots ,n\). Then every binary code of minimum distance at least \(d\) has size at most \(\beta (0)\).

Proof ▶

The hypothesis \(\beta (i) \le 0\) for \(i \ge d\) says exactly \(-\sum _{\ell =1}^n \beta _\ell K_\ell (i) \ge \beta _0 = 1\) for \(i = d,\dots ,n\), which is the hypothesis of Lemma 173. Its conclusion is \(|C| \le 1 + \sum _{\ell =1}^n \beta _\ell K_\ell (0) = \beta _0 + \sum _{\ell =1}^n \beta _\ell K_\ell (0) = \beta (0)\).

Definition 176 The dual linear program

By Corollary 175, the tightest upper bound on \(A(n,d)\) (the largest size of a binary code of length \(n\) and distance \(d\)) obtainable by this method is the optimum value of the following optimization problem, which is the dual of the linear program of Definition 170:

\begin{align*} \text{Minimize} \quad & \beta (0) \\ \text{s.t.} \quad & \beta _\ell \ge 0, \qquad \ell = 1,2,\dots ,n, \\ & \beta (j) \le 0, \qquad j = d,\dots ,n . \end{align*}

Theorem 177 The MRRW bound, Section 8.2.1

For a family of binary codes of relative distance \(\delta \in (0,1)\), the rate \(R\) satisfies

\[ R \le R_{\mathrm{MRRW}}(\delta ) + o(1), \qquad \text{where} \qquad R_{\mathrm{MRRW}}(\delta ) := h\Bigl(\tfrac 12 - \sqrt{\delta (1-\delta )}\Bigr) , \]

and \(h(\cdot )\) denotes the binary entropy function.

Proof ▶

One constructs, for a suitable choice of the symmetric functions \(\phi , \Gamma : \{ 0,1\} ^n \to \mathbb R\) and a positive integer parameter \(m\), a dual-feasible solution \(\beta (X)\) (via Corollary 175 and Definition 176) whose value \(\beta (0)\), evaluated asymptotically as \(n \to \infty \) with \(d/n \to \delta \), equals \(2^{n(R_{\mathrm{MRRW}}(\delta ) + o(1))}\). The book takes \(\phi (x) = (n-2|x|)^m - (n-2d)^m\), so that \(\phi (x) \le 0\) whenever \(|x| \ge d\), and sets \(g(x) = \phi (x)\Gamma (x)^2\) (so \(g(x) \le 0\) whenever \(|x|\ge d\), since \(\Gamma (x)^2 \ge 0\)), but the choice of \(\Gamma \), the verification that the resulting \(\beta _\ell \)’s are all nonnegative, and the final asymptotic evaluation are not given on the assigned pages.

In the regime \(\delta = \tfrac 12 - \varepsilon \) with \(\varepsilon \to 0\), the GV bound (105) gives a lower bound on \(R\) of \(\Omega (\varepsilon ^2)\), the Elias-Bassalygo bound (166) gives an upper bound of \(O(\varepsilon )\), and the MRRW bound (177) gives an upper bound of \(O(\varepsilon ^2 \log (1/\varepsilon ))\) – much closer to the GV bound, though a gap of \(O(\log (1/\varepsilon ))\) between the GV lower bound and the MRRW upper bound remains open to this day.

8.3 A Breather

We close the chapter with a recap of what has been established so far for binary codes, in both Shannon’s and Hamming’s worlds, under both unique and list decoding.

For \(\mathrm{BSC}_p\): the capacity is \(1 - H(p)\) (144); the natural open question is to obtain the capacity result with explicit codes together with efficient encoding and decoding algorithms.

For Hamming’s world under unique decoding: for large enough alphabets, Reed-Solomon codes (128) meet the Singleton bound (110) and are strongly explicit; whether Reed-Solomon codes can be decoded up to half their distance remains open. For binary codes there is a gap between the best-known lower bound (GV, 105) and the best-known upper bound (MRRW, 177) on the rate at a given relative distance, and no explicit construction of a binary code lying on the GV bound is known; whether asymptotically good codes (103) can be explicitly constructed, and efficiently decoded, remains open.

For list decoding, the list-decoding capacity is \(1 - H_q(p)\) (158); the natural open questions are whether this capacity can be achieved with explicit codes (153) and efficient list decoding algorithms.

These open questions set the agenda for the remainder of the book.