7 Bridging the Gap Between Shannon and Hamming: List Decoding
Given \(0 \leq \rho \leq 1\) and \(L \geq 1\), a code \(C \subseteq \Sigma ^n\) is \((\rho , L)\)-list decodable if for every received word \(\mathbf{y} \in \Sigma ^n\),
A list-decoding algorithm for \(C\), given an error parameter \(\rho \), a received word \(\mathbf{y}\), is required to output all codewords of \(C\) within relative Hamming distance \(\rho \) of \(\mathbf{y}\); in particular, if the fraction of errors that occurred is at most \(\rho \), the transmitted codeword is guaranteed to appear in the output list. If \(C\) is \((\rho , L)\)-list decodable then any such algorithm outputs at most \(L\) codewords. The list-decodability of an infinite family of codes \(\mathscr {C}\) (with polynomial-sized lists) is the limit of \(\rho \) over all polynomials \(L\) such that all sufficiently long codes \(C \in \mathscr {C}\) are \((\rho , L)\)-list decodable.
Let \(C \subseteq [q]^n\) be a code of distance \(d\). If \(\rho {\lt} J_q\! \left(\frac{d}{n}\right)\), then \(C\) is a \((\rho , qdn)\)-list decodable code, where the function \(J_q(\delta )\) is defined as
We prove the theorem here only for the binary case, \(q = 2\); the general case is left by the book as an exercise. We use double counting: we bound the same quantity above and below, and derive the desired bound from the resulting inequality.
For notational convenience, let \(e = \rho n\). We must show that for every binary code \(C \subseteq \{ 0,1\} ^n\) with distance \(d\) (i.e., \(\Delta (\mathbf{c}_1, \mathbf{c}_2) \geq d\) for all distinct \(\mathbf{c}_1 \neq \mathbf{c}_2 \in C\)) and every \(\mathbf{y} \in \{ 0,1\} ^n\),
Fix arbitrary \(C\) and \(\mathbf{y}\). Let \(\mathbf{c}_1, \ldots , \mathbf{c}_M \in B(\mathbf{y}, e)\); we show \(M \leq 2dn\). Define \(\mathbf{c}_i' = \mathbf{c}_i - \mathbf{y}\) for \(1 \leq i \leq M\), and let
We bound \(S\) above and below.
Lower bound on \(S\). Since \(\mathbf{c}_i \in B(\mathbf{y}, e)\), we have \(\mathrm{wt}(\mathbf{c}_i') \leq e\) for all \(1 \leq i \leq M\). Since \(\Delta (\mathbf{c}_i, \mathbf{c}_j) \geq d\) for \(i \neq j\), subtracting \(\mathbf{y}\) from both coordinates preserves Hamming distance, so \(\Delta (\mathbf{c}_i', \mathbf{c}_j') \geq d\) for every \(i \neq j\). Hence
Upper bound on \(S\). Consider the \(n \times M\) matrix whose columns are \(\mathbf{c}_1'^{\, T}, \ldots , \mathbf{c}_M'^{\, T}\). Let \(m_i\) be the number of \(1\)’s in row \(i\), for \(1 \leq i \leq n\). Row \(i\) contributes \(m_i(M - m_i)\) to \(S\), since this is exactly the number of \(0\)-\(1\) pairs in that row (each such pair contributes exactly \(1\) to the Hamming distance sum \(S\)). Hence
Define \(\bar e\) by \(\bar e M = \sum _{i=1}^n m_i\). Since \(\sum _{i=1}^n m_i = \sum _{j=1}^M \mathrm{wt}(\mathbf{c}_j') \leq eM\), we get \(\bar e \leq e\). By the Cauchy-Schwarz inequality (taking \(\mathbf{x} = (m_1, \ldots , m_n)\), \(\mathbf{z} = (1/n, \ldots , 1/n)\)),
i.e., \(\sum _{i=1}^n m_i^2 \geq (\bar e M)^2 / n\). Thus
Combining the two bounds,
which, dividing by \(M {\gt} 0\) and rearranging, gives
where the middle equality is a direct algebraic identity, and the last inequality holds because \(\bar e \leq e \leq n/2\), so \((n - 2\bar e)^2 \geq (n-2e)^2\) and hence the denominator only shrinks.
Finally, since \(\rho = e/n {\lt} J_2(d/n) = \frac12\left(1 - \sqrt{1 - 2d/n}\right)\), we get \(n - 2e {\gt} \sqrt{n(n-2d)}\), i.e., \((n-2e)^2 {\gt} n(n-2d)\). Since \(n, e\) are integers, \((n-2e)^2 - n(n-2d) \geq 1\), and therefore \(M \leq 2dn/1 = 2dn\), as required.
Let \(q \geq 2\) be an integer and let \(0 \leq x \leq 1 - \frac1q\). Then the following inequalities hold:
where the second inequality is strict for \(x {\gt} 0\).
We first prove
Both sides are zero at \(x=0\). The derivative of the left-hand side is \(\frac{1}{2\sqrt{1 - xq/(q-1)}}\) and of the right-hand side is \(\frac{1}{2\sqrt{1-x}}\); the former is always at least as large as the latter (since \(xq/(q-1) \geq x\)), so the left-hand side increases at least as rapidly as the right-hand side as \(x\) increases from \(0\), which proves the inequality (this is exactly \(J_q(x) \geq 1 - \sqrt{1-x}\)).
For the second inequality, since \(x \geq 0\),
which implies \(\left(1 - \frac{x}{2}\right)^2 \geq 1-x\), i.e., \(1 - \frac{x}{2} \geq \sqrt{1-x}\) (both sides nonnegative for \(x \leq 1\)), which in turn gives \(1 - \sqrt{1-x} \geq x/2\). Both inequalities used here are strict for \(x{\gt}0\), so \(1 - \sqrt{1-x} {\gt} x/2\) for every \(x {\gt} 0\), as desired.
For every \(q\)-ary code with block length \(n\) and distance \(d\), if \(e \leq n - \sqrt{n(n-d)}\), then the code is \((e/n, qnd)\)-list decodable.
Let \(\delta = d/n\) and \(\rho = e/n\). The hypothesis \(e \leq n - \sqrt{n(n-d)}\) is equivalent to
By Lemma 155 (with \(x = \delta \)), \(J_q(\delta ) \geq 1 - \sqrt{1-\delta } \geq \rho \). Hence \(\rho \leq J_q(\delta ) = J_q(d/n)\), so by the Johnson Bound (Theorem 154), the code is \((\rho , qdn)\)-list decodable, i.e., \((e/n, qnd)\)-list decodable.
Let \(q \geq 2\), \(0 \leq \rho {\lt} 1 - \frac1q\), and \(\varepsilon {\gt} 0\) be a small enough real. Then the following holds for codes of large enough block length \(n\):
If \(R \leq 1 - H_q(\rho ) - \varepsilon \), then there exists a \(\big(\rho , O(\frac1\varepsilon )\big)\)-list decodable code.
If \(R \geq 1 - H_q(\rho ) + \varepsilon \), every \((\rho , L)\)-list decodable code has \(L \geq q^{\Omega (\varepsilon n)}\).
Immediate from Theorem 161 by taking \(L = \lceil 1/\varepsilon \rceil \): part (i) of Theorem 161 then gives a \((\rho , L)\)-list decodable code, i.e., a \((\rho , O(1/\varepsilon ))\)-list decodable code, of rate \(R \leq 1 - H_q(\rho ) - 1/L \leq 1 - H_q(\rho ) - \varepsilon /2\) (for \(\varepsilon \) small enough this is at most \(1 - H_q(\rho ) - \varepsilon \) up to renaming constants), giving (i); and part (ii) of Theorem 161 directly gives (ii).
The list-decoding capacity (of the \(q\)-ary worst-case error channel), as a function of the fraction of errors \(\rho \), is \(1 - H_q(\rho )\).
Let \(q \geq 2\) be an integer, \(0 {\lt} \rho {\lt} 1 - \frac1q\) a real number, and \(L \geq 1\) an integer. Then there exists a \((\rho , L)\)-list decodable code with rate
We use the probabilistic method: we pick a random code and show it satisfies the required property with nonzero probability, by showing the probability of a “bad event” is small.
Pick a code \(C\) at random where \(|C| = q^k\) with \(k \leq \left(1 - H_q(\rho ) - \frac1L\right) n\), i.e., for every message \(\mathbf{m} \in [q]^k\), choose \(C(\mathbf{m})\) uniformly and independently at random from \([q]^n\).
Given \(\mathbf{y} \in [q]^n\) and distinct \(\mathbf{m}_0, \ldots , \mathbf{m}_L \in [q]^k\), say the tuple \((\mathbf{y}, \mathbf{m}_0, \ldots , \mathbf{m}_L)\) defines a bad event if
A code is \((\rho ,L)\)-list decodable if and only if no bad event occurs (a bad event witnesses \(L+1\) codewords all within distance \(\rho n\) of some \(\mathbf{y}\), exceeding the list-size bound \(L\)).
Fix \(\mathbf{y}\) and \(\mathbf{m}_0, \ldots , \mathbf{m}_L\). For each fixed \(i\), by the random choice of \(C\),
Since the random choices of codewords for distinct messages are independent,
By the union bound over the \(q^n\) choices of \(\mathbf{y}\) and the \(\binom {q^k}{L+1}\) choices of \(L+1\) distinct messages,
using \(\binom {a}{b} \leq a^b\) and \(k = Rn\). By assumption \(R \leq 1 - H_q(\rho ) - 1/L\), so \(1 - H_q(\rho ) - R \geq 1/L\), hence
and so
Since the probability of a bad event is strictly less than \(1\), by the probabilistic method there exists a code \(C\) with no bad event, i.e., a \((\rho , L)\)-list decodable code of rate \(R \leq 1 - H_q(\rho ) - 1/L\).
Let \(q \geq 2\) be an integer and \(0 {\lt} \rho {\lt} 1 - \frac1q\) a real number. For every \((\rho , L)\) code \(C\) of rate \(1 - H_q(\rho ) + \varepsilon \), it is necessary that \(L \geq 2^{\Omega (\varepsilon n)}\).
We show, via the probabilistic method, that there exists \(\mathbf{y} \in [q]^n\) such that \(|C \cap B(\mathbf{y}, \rho n)|\) is exponentially large (in particular \(q^{\Omega (\varepsilon n)}\)); since a \((\rho , L)\)-list decodable code must have \(|C \cap B(\mathbf{y}, \rho n)| \leq L\) for every \(\mathbf{y}\), this forces \(L \geq q^{\Omega (\varepsilon n)} = 2^{\Omega (\varepsilon n)}\) (as \(q \geq 2\) is a constant).
Pick \(\mathbf{y} \in [q]^n\) uniformly at random. Fix \(\mathbf{c} \in C\); then
By linearity of expectation,
using \(|C| = q^{Rn}\) and \(R \geq 1 - H_q(\rho ) + \varepsilon \). Since the expectation of \(|C \cap B(\mathbf{y}, \rho n)|\) over uniformly random \(\mathbf{y}\) is at least \(q^{\Omega (\varepsilon n)}\), there must exist a specific \(\mathbf{y}\) with \(|C \cap B(\mathbf{y}, \rho n)| \geq q^{\Omega (\varepsilon n)}\), as desired.
Let \(q \geq 2\) be an integer, and \(0 {\lt} \rho {\lt} 1 - \frac1q\) be a real number.
Let \(L \geq 1\) be an integer; then there exists a \((\rho , L)\)-list decodable code with rate \(R \leq 1 - H_q(\rho ) - \frac1L\).
For every \((\rho , L)\) code of rate \(1 - H_q(\rho ) + \varepsilon \), it is necessary that \(L \geq 2^{\Omega (\varepsilon n)}\).
Given a code \(C\), a fixed codeword \(\mathbf{c} \in C\), and an error pattern \(\mathbf{e}\) such that some codeword of \(C\) lies within Hamming distance \(\rho n\) of \(\mathbf{c} + \mathbf{e}\), the error pattern \(\mathbf{e}\) is associated with a codeword \(c(\mathbf{e})\), namely the closest codeword of \(C\) that lies within Hamming distance \(\rho n\) from \(\mathbf{c} + \mathbf{e}\).
Let \(C\) be an \((n,k,d)_q\) code. If we fix the values in \(n-d+1\) out of the \(n\) positions in a possible codeword, then at most one codeword in \(C\) can agree with the fixed values.
Suppose two distinct codewords \(\mathbf{c}_1 \neq \mathbf{c}_2 \in C\) both agree with the fixed values on the same set of \(n-d+1\) positions. Then \(\mathbf{c}_1\) and \(\mathbf{c}_2\) agree with each other on those \(n-d+1\) positions, so they can disagree on at most \(n - (n-d+1) = d-1\) positions, i.e., \(\Delta (\mathbf{c}_1, \mathbf{c}_2) \leq d - 1 {\lt} d\). This contradicts the minimum distance of \(C\) being \(d\). Hence at most one codeword of \(C\) can agree with the fixed values on any fixed set of \(n-d+1\) positions.
Let \(\varepsilon {\gt} 0\) be a real and \(q \geq 2^{\Omega (1/\varepsilon )}\) be an integer. Then the following is true for every \(0 {\lt} \delta {\lt} 1 - 1/q\) and large enough \(n\). Let \(C \subseteq \{ 0, 1, \ldots , q-1\} ^n\) be a code with relative distance \(\delta \) and let \(S \subseteq [n]\) be such that \(|S| = (1-\rho )n\), where \(0 {\lt} \rho \leq \delta - \varepsilon \).
Then, for all \(\mathbf{c} \in C\) and all but a \(q^{-\Omega (\varepsilon n)}\) fraction of error patterns \(\mathbf{e} \in \{ 0, 1, \ldots , q-1\} ^n\) such that
the only codeword within Hamming distance \(\rho n\) of \(\mathbf{c} + \mathbf{e}\) is \(\mathbf{c}\) itself.
Fix \(\mathbf{c} \in C\) for the rest of the proof, and let \(d = \delta n\). Let \(\mathcal{E}_S\) be the set of all error patterns \(\mathbf{e}\) with \(\mathbf{e}_S = \mathbf{0}\) and \(\mathrm{wt}(\mathbf{e}) = \rho n\); since \(\mathbf{e}_S = \mathbf{0}\) and \(\mathrm{wt}(\mathbf{e}) = \rho n = |[n] \setminus S|\), every position of \([n] \setminus S\) must be nonzero in \(\mathbf{e}\), and each such position has \(q-1\) nonzero choices, so
Call \(\mathbf{e} \in \mathcal{E}_S\) bad if there exists another codeword \(\mathbf{c}' \neq \mathbf{c}\) with \(\Delta (\mathbf{c}', \mathbf{c} + \mathbf{e}) \leq \rho n\). We must show the number of bad error patterns is at most \(q^{-\Omega (\varepsilon n)}|\mathcal{E}_S|\).
For a bad \(\mathbf{e}\), let \(c(\mathbf{e}) \neq \mathbf{c}\) be its associated codeword (Definition 162), and let \(A \subseteq [n]\) be the set of positions where \(c(\mathbf{e})\) agrees with \(\mathbf{c} + \mathbf{e}\). Write \(|A| = \alpha n\). Since \(\Delta (c(\mathbf{e}), \mathbf{c}+\mathbf{e}) \leq \rho n\), they agree on at least \((1-\rho )n\) positions, so
Fix \(A\) with \(|A| = \alpha n\); we count how many bad \(\mathbf{e}\) map to this \(A\), and later aggregate over all (at most \(2^n\)) choices of \(A\). Let \(A_1 = A \cap S\) and \(A_2 = A \setminus A_1\), and write \(|A_1| = \beta n\), so \(|A_2| = (\alpha - \beta ) n\) and \(\beta \leq \alpha \).
We overestimate the number of \(\mathbf{e}\) mapping to \((A_1, A_2)\) as follows.
First, the values of \(\mathbf{e}\) on \([n] \setminus (S \cup A_2)\) must all be nonzero (since \(\mathbf{e}\) is supported exactly on \([n]\setminus S\)), so the number of possible values of \(\mathbf{e}_{[n]\setminus (S \cup A_2)}\) is at most
\[ (q-1)^{n - |S| - |A_2|} \leq q^{\, n - (1-\rho )n - (\alpha -\beta )n}. \]Fix a nonzero choice \(\mathbf{x}\) for \(\mathbf{e}_{[n]\setminus (S \cup A_2)}\). Since \(A_1 \subseteq S\), we have \(\mathbf{e}_{A_1} = \mathbf{0}\), so \((\mathbf{c} + \mathbf{e})_{A_1} = \mathbf{c}_{A_1}\); since \(A_1 \subseteq A\), \(c(\mathbf{e})\) agrees with \(\mathbf{c}+\mathbf{e}\) on \(A_1\), so \(c(\mathbf{e})_{A_1} = \mathbf{c}_{A_1}\) is already determined (as \(\mathbf{c}\) is fixed). By Lemma 163 applied to \(C\) (an \((n, k, d)_q\) code), fixing the values of \(c(\mathbf{e})\) on any \(n - d + 1 = (1-\delta )n + 1\) positions determines \(c(\mathbf{e})\) completely; since \(|A_1| = \beta n\) positions of \(c(\mathbf{e})\) are already fixed (equal to \(\mathbf{c}_{A_1}\)), fixing a further \((1-\delta )n + 1 - \beta n\) positions of \(c(\mathbf{e})_{A_2}\) determines \(c(\mathbf{e})\) completely, and hence determines \(\mathbf{e}\) completely (via \(\mathbf{e} = c(\mathbf{e}) - \mathbf{c}\) on \(A\), together with the already-fixed values elsewhere). Thus the number of choices for \(\mathbf{e}\) compatible with this \(\mathbf{x}\) is at most
\[ q^{(1-\delta )n + 1 - \beta n}. \]
Combining, the number of bad \(\mathbf{e}\) mapping to \((A_1, A_2)\) is at most
where the inequality uses \(\alpha \geq 1 - \delta + \varepsilon \) (so \(-\alpha n \leq -(1 - \delta + \varepsilon ) n\), and \(\rho n - \alpha n + (1-\delta )n \leq \rho n - \varepsilon n\)), and the last equality uses \(|\mathcal{E}_S| = (q-1)^{\rho n} \leq q^{\rho n}\) together with absorbing the base change into the asymptotic notation (as in the book).
Finally, summing over all (at most \(2^n\)) choices of \(A = (A_1, A_2)\), the total number of bad error patterns is at most
where the last inequality holds because for \(q \geq \Omega (1/\varepsilon )\) and \(n\) large enough, \(n/\log _2 q + 1 {\lt} 3\varepsilon n /4\). Hence the fraction of bad error patterns in \(\mathcal{E}_S\) is at most \(q^{-\Omega (\varepsilon n)}\), i.e., for all but a \(q^{-\Omega (\varepsilon n)}\) fraction of \(\mathbf{e} \in \mathcal{E}_S\), \(\mathbf{e}\) is not bad, which means \(\mathbf{c}\) is the unique codeword within Hamming distance \(\rho n\) of \(\mathbf{c} + \mathbf{e}\), as required.