6 What Happens When the Noise is Stochastic: Shannon’s Theorem

Definition 137 Stochastic channel

A (memoryless) stochastic channel consists of a finite input alphabet $\mathcal{X}$, a finite output alphabet $\mathcal{Y}$, and a transition matrix $\mathbf{M}$ indexed by $\mathcal{X}\times \mathcal{Y}$, where for every $(x,y)\in \mathcal{X}\times \mathcal{Y}$, $\mathbf{M}(x,y)=\Pr (y\mid x)$ denotes the probability that $y$ is output by the channel when $x$ is input to the channel; in particular $\sum _{y\in \mathcal{Y}}\mathbf{M}(x,y)=1$ for every $x\in \mathcal{X}$. The channel is memoryless in the sense that the noise acts independently on each symbol of a transmitted string.

Definition 138 Binary Symmetric Channel, $\mathrm{BSC}_p$

Let $0\le p\le 1$. The Binary Symmetric Channel with crossover probability $p$, denoted $\mathrm{BSC}_p$, is the stochastic channel with $\mathcal{X}=\mathcal{Y}=\{ 0,1\} $ and transition matrix

\[ \mathbf{M}(x,y)= \begin{cases} 1-p & \text{if } y=x,\\ p & \text{if } y\ne x. \end{cases} \]

That is, every transmitted bit is flipped independently with probability $p$. (One may restrict attention to $0\le p\le \tfrac 12$, since reliable communication over $\mathrm{BSC}_p$ for $p\le \tfrac 12$ implies reliable communication over $\mathrm{BSC}_{p'}$ for $p'{\gt}\tfrac 12$.) For the rest of the chapter, $\mathbf{e}\sim \mathrm{BSC}_p$ denotes an error pattern $\mathbf{e}\in \{ 0,1\} ^n$ drawn according to the error distribution induced by $\mathrm{BSC}_p$.

Definition 139 $q$-ary Symmetric Channel, $q\mathrm{SC}_p$

Let $q\ge 2$ and $0\le p\le 1-\tfrac 1q$. The $q$-ary Symmetric Channel with crossover probability $p$, denoted $q\mathrm{SC}_p$, is the stochastic channel with $\mathcal{X}=\mathcal{Y}=[q]$ and transition matrix

\[ \mathbf{M}(x,y)= \begin{cases} 1-p & \text{if } y=x,\\ \dfrac {p}{q-1} & \text{if } y\ne x. \end{cases} \]

That is, every symbol is retained with probability $1-p$ and distorted to each of the $q-1$ other symbols with equal probability $\tfrac {p}{q-1}$.

Definition 140 Binary Erasure Channel, $\mathrm{BEC}_\alpha $

Let $0\le \alpha \le 1$. The Binary Erasure Channel with erasure probability $\alpha $, denoted $\mathrm{BEC}_\alpha $, is the stochastic channel with $\mathcal{X}=\{ 0,1\} $, $\mathcal{Y}=\{ 0,1,{?}\} $ (where ${?}$ denotes an erasure), and transition matrix given by

\[ \mathbf{M}(0,0)=1-\alpha ,\quad \mathbf{M}(0,{?})=\alpha ,\quad \mathbf{M}(1,1)=1-\alpha ,\quad \mathbf{M}(1,{?})=\alpha , \]

and $\mathbf{M}(x,y)=0$ for every other pair $(x,y)$. That is, every transmitted bit is erased with probability $\alpha $ and left unchanged with probability $1-\alpha $.

Definition 141 Binary Input Additive Gaussian White Noise Channel, $\mathrm{BIAGWN}_\sigma $

Let $\sigma \ge 0$. The Binary Input Additive Gaussian White Noise Channel with standard deviation $\sigma $, denoted $\mathrm{BIAGWN}_\sigma $, is the (continuous) channel with input alphabet $\mathcal{X}=\{ -1,1\} $ and output alphabet $\mathcal{Y}=\mathbb {R}$, in which for $(x,y)\in \{ -1,1\} \times \mathbb {R}$ the noise $y-x$ is distributed according to the Gaussian distribution with mean $0$ and standard deviation $\sigma $, i.e.

\[ \Pr (y\mid x)=\frac{1}{\sigma \sqrt{2\pi }}\cdot \exp \! \left(-\frac{(y-x)^2}{2\sigma ^2}\right). \]

Definition 142 Decoding error probability

Fix a stochastic channel and a pair of encoding and decoding functions $(E,D)$ for a code of block length $n$. The decoding error probability of $(E,D)$ is a function $f(n)$ such that for every message $\mathbf{m}$, the decoding algorithm recovers $\mathbf{m}$ with probability at least $1-f(n)$ (the probability taken over the randomness of the channel noise), where $\lim _{n\to \infty }f(n)=0$, i.e. $f(n)=o(1)$. Ideally one would like $f(n)=2^{-\Omega (n)}$.

Definition 143 Channel capacity

Given a stochastic channel, its capacity is a real number $C\ge 0$ such that reliable communication is possible over the channel if and only if the rate is less than $C$. That is: for every rate $R{\lt}C$, there is a family of codes (with associated encoding and decoding functions) of rate $R$ whose decoding error probability is negligible (indeed $2^{-\Omega (n)}$); and for every rate $R{\gt}C$, for every choice of encoding and decoding functions of rate $R$, there is some message for which the decoding error probability is bounded below by a constant.

Theorem 144 Shannon’s Capacity Theorem for BSC, Thm 6.3.1

For real numbers $p,\varepsilon $ such that $0\le p{\lt}\tfrac 12$ and $0\le \varepsilon \le \tfrac 12-p$, the following statements are true for large enough $n$:

There exists a real $\delta {\gt}0$, an encoding function $E:\{ 0,1\} ^k\to \{ 0,1\} ^n$ and a decoding function $D:\{ 0,1\} ^n\to \{ 0,1\} ^k$ where $k\le \lfloor (1-H(p+\varepsilon ))n\rfloor $, such that the following holds for every $\mathbf{m}\in \{ 0,1\} ^k$:

\[ \Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\le 2^{-\delta n}. \]
If $k\ge \lceil (1-H(p)+\varepsilon )n\rceil $ then for every pair of encoding and decoding functions $E:\{ 0,1\} ^k\to \{ 0,1\} ^n$ and $D:\{ 0,1\} ^n\to \{ 0,1\} ^k$, there exists $\mathbf{m}\in \{ 0,1\} ^k$ such that

\[ \Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]{\gt}\frac12. \]

Proof ▶

Part (1) is Lemma 149 (proved via the probabilistic method with expurgation) and part (2) is Lemma 146 (proved via a packing argument).

Corollary 145 Capacity of $\mathrm{BSC}_p$

For $0\le p{\lt}\tfrac 12$, the capacity of $\mathrm{BSC}_p$ is $1-H(p)$.

Proof ▶

Fix a rate $R{\lt}1-H(p)$ and choose $\varepsilon {\gt}0$ small enough that $R\le 1-H(p+\varepsilon )$. By part (1) of Theorem 144, for every large enough $n$ there is a code of dimension $k\le \lfloor (1-H(p+\varepsilon ))n\rfloor $, hence of rate at least (a value tending to) $R$, and decoding error probability $2^{-\Omega (n)}$; this shows every rate $R{\lt}1-H(p)$ is achievable. Conversely, fix $R{\gt}1-H(p)$ and let $\varepsilon =R-(1-H(p)){\gt}0$; then $k=Rn\ge \lceil (1-H(p)+\varepsilon )n\rceil $ for large enough $n$, so by part (2) of Theorem 144, for every choice of encoding and decoding functions of rate $R$ there is a message with decoding error probability greater than $1/2$, so the decoding error probability is bounded below by a constant. Hence $1-H(p)$ satisfies the requirements of Definition 143 for the capacity of $\mathrm{BSC}_p$.

Lemma 146 Converse of Shannon’s capacity theorem for BSC, part (2) of Thm 6.3.1

Let $0\le p{\lt}\tfrac 12$ and $0\le \varepsilon \le \tfrac 12-p$. For large enough $n$, if $k\ge \lceil (1-H(p)+\varepsilon )n\rceil $ then for every pair of encoding and decoding functions $E:\{ 0,1\} ^k\to \{ 0,1\} ^n$ and $D:\{ 0,1\} ^n\to \{ 0,1\} ^k$, there exists $\mathbf{m}\in \{ 0,1\} ^k$ such that

\[ \Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]{\gt}\frac12. \]

Proof ▶

We may assume $p{\gt}0$: if $p=0$ then $1-H(p)+\varepsilon =1+\varepsilon {\gt}1$, so the hypothesis $k\ge \lceil (1-H(p)+\varepsilon )n\rceil $ forces $k{\gt}n$, in which case $E$ cannot be injective and no decoding function can be correct on every message even with zero noise; the claim is then immediate.

For the sake of contradiction, assume that for every $\mathbf{m}\in \{ 0,1\} ^k$,

\[ \Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\le \frac12. \]

Define $D_{\mathbf{m}}=\{ \mathbf{y}\mid D(\mathbf{y})=\mathbf{m}\} $. Since $D$ is a function, the sets $D_{\mathbf{m}}$, $\mathbf{m}\in \{ 0,1\} ^k$, partition $\{ 0,1\} ^n$.

Fix $\gamma {\gt}0$ (to be chosen at the end of the proof) and let $S_{\mathbf{m}}$ be the shell of inner radius $(1-\gamma )pn$ and outer radius $(1+\gamma )pn$ around $E(\mathbf{m})$, i.e.

\[ S_{\mathbf{m}}=B\big(E(\mathbf{m}),(1+\gamma )pn\big)\setminus B\big(E(\mathbf{m}),(1-\gamma )pn\big). \]

Fix $\mathbf{m}\in \{ 0,1\} ^k$. By assumption,

\[ \Pr [E(\mathbf{m})+\mathbf{e}\notin D_{\mathbf{m}}]\le \frac12. \]

By the multiplicative Chernoff bound applied to $wt(\mathbf{e})$, a sum of $n$ i.i.d. Bernoulli$(p)$ random variables of mean $pn$,

\[ \Pr [E(\mathbf{m})+\mathbf{e}\notin S_{\mathbf{m}}]\le 2^{-\Omega (\gamma ^2n)}. \]

By the union bound,

\[ \Pr [E(\mathbf{m})+\mathbf{e}\notin D_{\mathbf{m}}\cap S_{\mathbf{m}}]\le \frac12+2^{-\Omega (\gamma ^2n)}, \]

so, for large enough $n$,

\[ \Pr [E(\mathbf{m})+\mathbf{e}\in D_{\mathbf{m}}\cap S_{\mathbf{m}}]\ge \frac12-2^{-\Omega (\gamma ^2n)}\ge \frac14. \]

On the other hand, since all error patterns of the same Hamming weight occur with the same probability under $\mathrm{BSC}_p$,

\[ \Pr [E(\mathbf{m})+\mathbf{e}\in D_{\mathbf{m}}\cap S_{\mathbf{m}}]\le |D_{\mathbf{m}}\cap S_{\mathbf{m}}|\cdot p_{\max }, \qquad p_{\max }=\max _{d\in [(1-\gamma )pn,(1+\gamma )pn]}p^d(1-p)^{n-d}. \]

Since $p\le \tfrac 12$, the function $p^d(1-p)^{n-d}$ is decreasing in $d$, so the maximum is attained at $d=(1-\gamma )pn$, giving

\[ p_{\max }=p^{(1-\gamma )pn}(1-p)^{n-(1-\gamma )pn}=\left(\frac{1-p}{p}\right)^{\gamma pn}2^{-nH(p)}. \]

Combining the last three displays,

\[ |D_{\mathbf{m}}\cap S_{\mathbf{m}}|\ge \frac14\left(\frac{1-p}{p}\right)^{-\gamma pn}2^{nH(p)} =\frac14\left(\frac1p-1\right)^{-\gamma pn}2^{nH(p)}. \]

Summing over all $2^k$ messages $\mathbf{m}$, using disjointness of the $D_{\mathbf{m}}$,

\[ 2^n=\sum _{\mathbf{m}\in \{ 0,1\} ^k}|D_{\mathbf{m}}|\ge \sum _{\mathbf{m}\in \{ 0,1\} ^k}|D_{\mathbf{m}}\cap S_{\mathbf{m}}| \ge 2^k\cdot \frac14\left(\frac1p-1\right)^{-\gamma pn}2^{nH(p)} = 2^{k-2}\cdot 2^{H(p)n-\gamma p\log (1/p-1)n}. \]

Choosing $\gamma =\dfrac {\varepsilon }{2p\log (1/p-1)}$ (so $\gamma =\Theta (\varepsilon )$, using $0{\lt}p{\lt}\tfrac 12$), for large enough $n$ we get

\[ 2^n{\gt}2^{k+H(p)n-\varepsilon n}. \]

This implies $k{\lt}(1-H(p)+\varepsilon )n$, contradicting $k\ge \lceil (1-H(p)+\varepsilon )n\rceil $. This proves the lemma.

The same proof, with a suitable choice of $\gamma $, shows that the conclusion of Lemma 146 still holds if the hypothesis is weakened to decoding error probability at most $1-2^{-\beta n}$ (instead of $1/2$) for small enough $\beta =\beta (\varepsilon ){\gt}0$.

Lemma 147 Step 1: expected decoding error probability is small

Let $0\le p{\lt}\tfrac 12$, $0{\lt}\varepsilon \le \tfrac 12-p$, and let $\varepsilon '=\varepsilon /2$. Let $k\le \lfloor (1-H(p+\varepsilon ))n\rfloor $. Suppose $E:\{ 0,1\} ^k\to \{ 0,1\} ^n$ is chosen at random by picking $E(\mathbf{m}')$ independently and uniformly at random from $\{ 0,1\} ^n$ for every $\mathbf{m}'\in \{ 0,1\} ^k$, and let $D$ be the maximum likelihood decoding (MLD) function for $E$. Then there is $\delta '=\delta '(\varepsilon ,p){\gt}0$ such that for every fixed $\mathbf{m}\in \{ 0,1\} ^k$ and large enough $n$,

\[ \mathbb {E}_E\Big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\Big]\le 2^{-\delta 'n}. \]

Proof ▶

Write $\mathbf{y}=E(\mathbf{m})+\mathbf{e}$ for the received word. Splitting the sum defining the decoding error probability according to whether $\mathbf{y}$ lies in the Hamming ball $B(E(\mathbf{m}),(p+\varepsilon ')n)$,

\[ \Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(\mathbf{y})\ne \mathbf{m}] =\sum _{\mathbf{y}\in B(E(\mathbf{m}),(p+\varepsilon ')n)}\Pr [\mathbf{y}\mid E(\mathbf{m})]\cdot \mathbb 1_{D(\mathbf{y})\ne \mathbf{m}} +\sum _{\mathbf{y}\notin B(E(\mathbf{m}),(p+\varepsilon ')n)}\Pr [\mathbf{y}\mid E(\mathbf{m})]\cdot \mathbb 1_{D(\mathbf{y})\ne \mathbf{m}}. \]

The second sum is at most

\[ \sum _{\mathbf{y}\notin B(E(\mathbf{m}),(p+\varepsilon ')n)}\Pr [\mathbf{y}\mid E(\mathbf{m})] =\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[wt(\mathbf{e}){\gt}(p+\varepsilon ')n]\le e^{-(\varepsilon ')^2n/2}, \]

by the additive Chernoff bound applied to $wt(\mathbf{e})$, a sum of $n$ i.i.d. Bernoulli$(p)$ random variables. Hence

\[ \Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}] \le \sum _{\mathbf{y}\in B(E(\mathbf{m}),(p+\varepsilon ')n)}\Pr [\mathbf{y}\mid E(\mathbf{m})]\cdot \mathbb 1_{D(\mathbf{y})\ne \mathbf{m}}+e^{-(\varepsilon ')^2n/2}. \]

Taking the expectation of both sides over the random choice of $E$ and using linearity of expectation,

\[ \mathbb {E}_E\Big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\Big] \le e^{-(\varepsilon ')^2n/2}+\sum _{\mathbf{y}\in B(E(\mathbf{m}),(p+\varepsilon ')n)}\Pr [\mathbf{y}\mid E(\mathbf{m})]\cdot \mathbb {E}_E\big[\mathbb 1_{D(\mathbf{y})\ne \mathbf{m}}\big]. \]

Fix a received word $\mathbf{y}$ with $\Delta (\mathbf{y},E(\mathbf{m}))\le (p+\varepsilon ')n$. Since $D$ is MLD, $D(\mathbf{y})\ne \mathbf{m}$ can happen only if there is $\mathbf{m}'\ne \mathbf{m}$ with $\Delta (E(\mathbf{m}'),\mathbf{y})\le \Delta (E(\mathbf{m}),\mathbf{y})\le (p+\varepsilon ')n$. By the union bound, and using that each $E(\mathbf{m}')$, $\mathbf{m}'\ne \mathbf{m}$, is independent of $E(\mathbf{m})$ and uniform on $\{ 0,1\} ^n$,

\[ \mathbb {E}_E\big[\mathbb 1_{D(\mathbf{y})\ne \mathbf{m}}\big] \le \sum _{\mathbf{m}'\ne \mathbf{m}}\Pr \big[E(\mathbf{m}')\in B(\mathbf{y},(p+\varepsilon ')n)\big] =\sum _{\mathbf{m}'\ne \mathbf{m}}\frac{|B(\mathbf{y},(p+\varepsilon ')n)|}{2^n}. \]

By the upper bound on the volume of a Hamming ball, $|B(\mathbf{y},(p+\varepsilon ')n)|\le 2^{H(p+\varepsilon ')n}$, so, using $k\le (1-H(p+\varepsilon ))n$,

\[ \mathbb {E}_E\big[\mathbb 1_{D(\mathbf{y})\ne \mathbf{m}}\big] \le 2^{k}\cdot 2^{H(p+\varepsilon ')n-n} {\lt}2^{n(1-H(p+\varepsilon ))-n(1-H(p+\varepsilon '))} =2^{-n(H(p+\varepsilon )-H(p+\varepsilon '))}. \]

Substituting back, and using $\sum _{\mathbf{y}\in B(E(\mathbf{m}),(p+\varepsilon ')n)}\Pr [\mathbf{y}\mid E(\mathbf{m})]\le 1$,

\[ \mathbb {E}_E\Big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\Big] \le e^{-(\varepsilon ')^2n/2}+2^{-n(H(p+\varepsilon )-H(p+\varepsilon '))}. \]

Taking $\varepsilon '=\varepsilon /2$, since $\varepsilon {\gt}0$, both $e^{-(\varepsilon ')^2n/2}$ and $2^{-n(H(p+\varepsilon )-H(p+\varepsilon '))}$ are $2^{-\Omega (n)}$, so for a suitable $\delta '=\delta '(\varepsilon ,p){\gt}0$ and large enough $n$ the right-hand side is at most $2^{-\delta 'n}$.

Lemma 148 Averaging argument, Claim 6.3.3

Let $E^*:\{ 0,1\} ^k\to \{ 0,1\} ^n$ and $D^*:\{ 0,1\} ^n\to \{ 0,1\} ^k$ satisfy

\[ \mathbb {E}_{\mathbf{m}}\Big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D^*(E^*(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\Big]\le 2^{-\delta 'n}, \]

where $\mathbf{m}$ is uniform on $\{ 0,1\} ^k$. List the messages as $\mathbf{m}_1,\ldots ,\mathbf{m}_{2^k}$ and let $P_i=\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D^*(E^*(\mathbf{m}_i)+\mathbf{e})\ne \mathbf{m}_i]$; assume $P_1\le P_2\le \cdots \le P_{2^k}$. Then $P_{2^{k-1}}\le 2\cdot 2^{-\delta 'n}$.

Proof ▶

By definition of $P_i$ and the hypothesis on $E^*,D^*$,

\[ \frac{1}{2^k}\sum _{i=1}^{2^k}P_i=\mathbb {E}_{\mathbf{m}}\Big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D^*(E^*(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\Big]\le 2^{-\delta 'n}. \]

Suppose, for the sake of contradiction, that $P_{2^{k-1}}{\gt}2\cdot 2^{-\delta 'n}$. Since $P_1\le \cdots \le P_{2^k}$, we have $P_i\ge P_{2^{k-1}}$ for every $i{\gt}2^{k-1}$, so, dropping half the summands from the sum,

\[ \frac{1}{2^k}\sum _{i=1}^{2^k}P_i\ge \frac{1}{2^k}\sum _{i=2^{k-1}+1}^{2^k}P_i {\gt} \frac{2\cdot 2^{-\delta 'n}\cdot 2^{k-1}}{2^k}=2^{-\delta 'n}. \]

This contradicts $\frac{1}{2^k}\sum _{i=1}^{2^k}P_i\le 2^{-\delta 'n}$. Hence $P_{2^{k-1}}\le 2\cdot 2^{-\delta 'n}$.

Lemma 149 Positive part of Shannon’s capacity theorem for BSC, part (1) of Thm 6.3.1

Let $0\le p{\lt}\tfrac 12$ and $0{\lt}\varepsilon \le \tfrac 12-p$. For large enough $n$, there exists a real $\delta {\gt}0$, an encoding function $E:\{ 0,1\} ^k\to \{ 0,1\} ^n$ and a decoding function $D:\{ 0,1\} ^n\to \{ 0,1\} ^k$ where $k\le \lfloor (1-H(p+\varepsilon ))n\rfloor $, such that the following holds for every $\mathbf{m}\in \{ 0,1\} ^k$:

\[ \Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\le 2^{-\delta n}. \]

Proof ▶

We use the probabilistic method. Let $k_0=\lfloor (1-H(p+\varepsilon ))n\rfloor +1$. Choose $E:\{ 0,1\} ^{k_0}\to \{ 0,1\} ^n$ at random by picking $E(\mathbf{m})$ independently and uniformly at random from $\{ 0,1\} ^n$ for every $\mathbf{m}\in \{ 0,1\} ^{k_0}$, and let $D$ be the MLD function for $E$.

By Lemma 147 (Step 1), there is $\delta '=\delta '(\varepsilon ,p){\gt}0$ such that for every fixed $\mathbf{m}\in \{ 0,1\} ^{k_0}$,

\[ \mathbb {E}_E\Big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\Big]\le 2^{-\delta 'n}. \]

Taking the expectation of both sides over $\mathbf{m}$ uniform on $\{ 0,1\} ^{k_0}$ and switching the (finite, independent) order of the expectations over $E$ and $\mathbf{m}$,

\[ \mathbb {E}_E\Big[\mathbb {E}_{\mathbf{m}}\big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\big]\Big]\le 2^{-\delta 'n}. \]

By the probabilistic method, there is a choice $E^*$ of $E$ (and corresponding $D^*$) such that

\[ \mathbb {E}_{\mathbf{m}}\Big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D^*(E^*(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\Big]\le 2^{-\delta 'n}. \tag {$*$} \]

Apply Lemma 148 to $(*)$: order the messages $\mathbf{m}_1,\ldots ,\mathbf{m}_{2^{k_0}}$ so that $P_1\le \cdots \le P_{2^{k_0}}$, where $P_i=\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D^*(E^*(\mathbf{m}_i)+\mathbf{e})\ne \mathbf{m}_i]$. Then $P_{2^{k_0-1}}\le 2\cdot 2^{-\delta 'n}$, and hence $P_i\le 2\cdot 2^{-\delta 'n}$ for every $i\le 2^{k_0-1}$.

Define the new code with message set $\{ \mathbf{m}_1,\ldots ,\mathbf{m}_{2^{k_0-1}}\} $: fixing an arbitrary bijection of this set with $\{ 0,1\} ^{k_0-1}$, let $E$ be the restriction of $E^*$ to this message set (composed with the bijection), and let $D$ be $D^*$ composed with the same bijection on its output (and mapping any output outside the restricted message set to an arbitrary fixed message, say). This new code has dimension $k=k_0-1=\lfloor (1-H(p+\varepsilon ))n\rfloor $, and every message has decoding error probability at most $2\cdot 2^{-\delta 'n}=2^{-(\delta '-1/n)n}$. Setting $\delta =\delta '-1/n$, which is positive for large enough $n$, completes the proof.

Definition 150 Explicit code, Def 6.3.4

A code $C$ of block length $n$ is called explicit if there exists a $\mathrm{poly}(n)$-time algorithm that computes a succinct description of $C$ given $n$. For linear codes, such a succinct description could be a generator matrix or a parity check matrix.

Definition 151 Strongly explicit code, Def 6.3.5

A linear $[n,k]$ code $C$ is called strongly explicit if, given any index pair $(i,j)\in [k]\times [n]$, there is a $\mathrm{poly}(\log n)$-time algorithm that outputs $G_{i,j}$, where $G$ is a generator matrix of $C$.

Proposition 152 Prop 6.4.1

Let $0\le p{\lt}\tfrac 12$ and $0{\lt}\varepsilon \le \tfrac 12-p$. If an algorithm $A$ can handle $p+\varepsilon $ fraction of worst case errors (i.e. correctly recovers the transmitted message from up to $(p+\varepsilon )n$ adversarially chosen errors, for codewords of block length $n$), then $A$ can be used for reliable communication over $\mathrm{BSC}_p$.

Proof ▶

By the additive Chernoff bound, with probability at least $1-e^{-\varepsilon ^2n/2}$, the fraction of errors introduced by $\mathrm{BSC}_p$ on a transmitted codeword is at most $p+\varepsilon $. By the assumption on $A$, whenever this event occurs $A$ correctly recovers the transmitted message. Hence $A$, used as the decoding algorithm, achieves decoding error probability at most $e^{-\varepsilon ^2n/2}=2^{-\Omega (n)}$ over $\mathrm{BSC}_p$, i.e. it can be used for reliable communication over $\mathrm{BSC}_p$.