← All blueprints

Essential Coding Theory — Blueprint

6 What Happens When the Noise is Stochastic: Shannon’s Theorem

Definition 137 Stochastic channel
#

A (memoryless) stochastic channel consists of a finite input alphabet \(\mathcal{X}\), a finite output alphabet \(\mathcal{Y}\), and a transition matrix \(\mathbf{M}\) indexed by \(\mathcal{X}\times \mathcal{Y}\), where for every \((x,y)\in \mathcal{X}\times \mathcal{Y}\), \(\mathbf{M}(x,y)=\Pr (y\mid x)\) denotes the probability that \(y\) is output by the channel when \(x\) is input to the channel; in particular \(\sum _{y\in \mathcal{Y}}\mathbf{M}(x,y)=1\) for every \(x\in \mathcal{X}\). The channel is memoryless in the sense that the noise acts independently on each symbol of a transmitted string.

Definition 138 Binary Symmetric Channel, \(\mathrm{BSC}_p\)
#

Let \(0\le p\le 1\). The Binary Symmetric Channel with crossover probability \(p\), denoted \(\mathrm{BSC}_p\), is the stochastic channel with \(\mathcal{X}=\mathcal{Y}=\{ 0,1\} \) and transition matrix

\[ \mathbf{M}(x,y)= \begin{cases} 1-p & \text{if } y=x,\\ p & \text{if } y\ne x. \end{cases} \]

That is, every transmitted bit is flipped independently with probability \(p\). (One may restrict attention to \(0\le p\le \tfrac 12\), since reliable communication over \(\mathrm{BSC}_p\) for \(p\le \tfrac 12\) implies reliable communication over \(\mathrm{BSC}_{p'}\) for \(p'{\gt}\tfrac 12\).) For the rest of the chapter, \(\mathbf{e}\sim \mathrm{BSC}_p\) denotes an error pattern \(\mathbf{e}\in \{ 0,1\} ^n\) drawn according to the error distribution induced by \(\mathrm{BSC}_p\).

Definition 139 \(q\)-ary Symmetric Channel, \(q\mathrm{SC}_p\)
#

Let \(q\ge 2\) and \(0\le p\le 1-\tfrac 1q\). The \(q\)-ary Symmetric Channel with crossover probability \(p\), denoted \(q\mathrm{SC}_p\), is the stochastic channel with \(\mathcal{X}=\mathcal{Y}=[q]\) and transition matrix

\[ \mathbf{M}(x,y)= \begin{cases} 1-p & \text{if } y=x,\\ \dfrac {p}{q-1} & \text{if } y\ne x. \end{cases} \]

That is, every symbol is retained with probability \(1-p\) and distorted to each of the \(q-1\) other symbols with equal probability \(\tfrac {p}{q-1}\).

Definition 140 Binary Erasure Channel, \(\mathrm{BEC}_\alpha \)
#

Let \(0\le \alpha \le 1\). The Binary Erasure Channel with erasure probability \(\alpha \), denoted \(\mathrm{BEC}_\alpha \), is the stochastic channel with \(\mathcal{X}=\{ 0,1\} \), \(\mathcal{Y}=\{ 0,1,{?}\} \) (where \({?}\) denotes an erasure), and transition matrix given by

\[ \mathbf{M}(0,0)=1-\alpha ,\quad \mathbf{M}(0,{?})=\alpha ,\quad \mathbf{M}(1,1)=1-\alpha ,\quad \mathbf{M}(1,{?})=\alpha , \]

and \(\mathbf{M}(x,y)=0\) for every other pair \((x,y)\). That is, every transmitted bit is erased with probability \(\alpha \) and left unchanged with probability \(1-\alpha \).

Definition 141 Binary Input Additive Gaussian White Noise Channel, \(\mathrm{BIAGWN}_\sigma \)
#

Let \(\sigma \ge 0\). The Binary Input Additive Gaussian White Noise Channel with standard deviation \(\sigma \), denoted \(\mathrm{BIAGWN}_\sigma \), is the (continuous) channel with input alphabet \(\mathcal{X}=\{ -1,1\} \) and output alphabet \(\mathcal{Y}=\mathbb {R}\), in which for \((x,y)\in \{ -1,1\} \times \mathbb {R}\) the noise \(y-x\) is distributed according to the Gaussian distribution with mean \(0\) and standard deviation \(\sigma \), i.e.

\[ \Pr (y\mid x)=\frac{1}{\sigma \sqrt{2\pi }}\cdot \exp \! \left(-\frac{(y-x)^2}{2\sigma ^2}\right). \]
Definition 142 Decoding error probability
#

Fix a stochastic channel and a pair of encoding and decoding functions \((E,D)\) for a code of block length \(n\). The decoding error probability of \((E,D)\) is a function \(f(n)\) such that for every message \(\mathbf{m}\), the decoding algorithm recovers \(\mathbf{m}\) with probability at least \(1-f(n)\) (the probability taken over the randomness of the channel noise), where \(\lim _{n\to \infty }f(n)=0\), i.e. \(f(n)=o(1)\). Ideally one would like \(f(n)=2^{-\Omega (n)}\).

Definition 143 Channel capacity
#

Given a stochastic channel, its capacity is a real number \(C\ge 0\) such that reliable communication is possible over the channel if and only if the rate is less than \(C\). That is: for every rate \(R{\lt}C\), there is a family of codes (with associated encoding and decoding functions) of rate \(R\) whose decoding error probability is negligible (indeed \(2^{-\Omega (n)}\)); and for every rate \(R{\gt}C\), for every choice of encoding and decoding functions of rate \(R\), there is some message for which the decoding error probability is bounded below by a constant.

Theorem 144 Shannon’s Capacity Theorem for BSC, Thm 6.3.1

For real numbers \(p,\varepsilon \) such that \(0\le p{\lt}\tfrac 12\) and \(0\le \varepsilon \le \tfrac 12-p\), the following statements are true for large enough \(n\):

  1. There exists a real \(\delta {\gt}0\), an encoding function \(E:\{ 0,1\} ^k\to \{ 0,1\} ^n\) and a decoding function \(D:\{ 0,1\} ^n\to \{ 0,1\} ^k\) where \(k\le \lfloor (1-H(p+\varepsilon ))n\rfloor \), such that the following holds for every \(\mathbf{m}\in \{ 0,1\} ^k\):

    \[ \Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\le 2^{-\delta n}. \]
  2. If \(k\ge \lceil (1-H(p)+\varepsilon )n\rceil \) then for every pair of encoding and decoding functions \(E:\{ 0,1\} ^k\to \{ 0,1\} ^n\) and \(D:\{ 0,1\} ^n\to \{ 0,1\} ^k\), there exists \(\mathbf{m}\in \{ 0,1\} ^k\) such that

    \[ \Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]{\gt}\frac12. \]
Proof

Part (1) is Lemma 149 (proved via the probabilistic method with expurgation) and part (2) is Lemma 146 (proved via a packing argument).

Corollary 145 Capacity of \(\mathrm{BSC}_p\)

For \(0\le p{\lt}\tfrac 12\), the capacity of \(\mathrm{BSC}_p\) is \(1-H(p)\).

Proof

Fix a rate \(R{\lt}1-H(p)\) and choose \(\varepsilon {\gt}0\) small enough that \(R\le 1-H(p+\varepsilon )\). By part (1) of Theorem 144, for every large enough \(n\) there is a code of dimension \(k\le \lfloor (1-H(p+\varepsilon ))n\rfloor \), hence of rate at least (a value tending to) \(R\), and decoding error probability \(2^{-\Omega (n)}\); this shows every rate \(R{\lt}1-H(p)\) is achievable. Conversely, fix \(R{\gt}1-H(p)\) and let \(\varepsilon =R-(1-H(p)){\gt}0\); then \(k=Rn\ge \lceil (1-H(p)+\varepsilon )n\rceil \) for large enough \(n\), so by part (2) of Theorem 144, for every choice of encoding and decoding functions of rate \(R\) there is a message with decoding error probability greater than \(1/2\), so the decoding error probability is bounded below by a constant. Hence \(1-H(p)\) satisfies the requirements of Definition 143 for the capacity of \(\mathrm{BSC}_p\).

Lemma 146 Converse of Shannon’s capacity theorem for BSC, part (2) of Thm 6.3.1

Let \(0\le p{\lt}\tfrac 12\) and \(0\le \varepsilon \le \tfrac 12-p\). For large enough \(n\), if \(k\ge \lceil (1-H(p)+\varepsilon )n\rceil \) then for every pair of encoding and decoding functions \(E:\{ 0,1\} ^k\to \{ 0,1\} ^n\) and \(D:\{ 0,1\} ^n\to \{ 0,1\} ^k\), there exists \(\mathbf{m}\in \{ 0,1\} ^k\) such that

\[ \Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]{\gt}\frac12. \]
Proof

We may assume \(p{\gt}0\): if \(p=0\) then \(1-H(p)+\varepsilon =1+\varepsilon {\gt}1\), so the hypothesis \(k\ge \lceil (1-H(p)+\varepsilon )n\rceil \) forces \(k{\gt}n\), in which case \(E\) cannot be injective and no decoding function can be correct on every message even with zero noise; the claim is then immediate.

For the sake of contradiction, assume that for every \(\mathbf{m}\in \{ 0,1\} ^k\),

\[ \Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\le \frac12. \]

Define \(D_{\mathbf{m}}=\{ \mathbf{y}\mid D(\mathbf{y})=\mathbf{m}\} \). Since \(D\) is a function, the sets \(D_{\mathbf{m}}\), \(\mathbf{m}\in \{ 0,1\} ^k\), partition \(\{ 0,1\} ^n\).

Fix \(\gamma {\gt}0\) (to be chosen at the end of the proof) and let \(S_{\mathbf{m}}\) be the shell of inner radius \((1-\gamma )pn\) and outer radius \((1+\gamma )pn\) around \(E(\mathbf{m})\), i.e.

\[ S_{\mathbf{m}}=B\big(E(\mathbf{m}),(1+\gamma )pn\big)\setminus B\big(E(\mathbf{m}),(1-\gamma )pn\big). \]

Fix \(\mathbf{m}\in \{ 0,1\} ^k\). By assumption,

\[ \Pr [E(\mathbf{m})+\mathbf{e}\notin D_{\mathbf{m}}]\le \frac12. \]

By the multiplicative Chernoff bound applied to \(wt(\mathbf{e})\), a sum of \(n\) i.i.d. Bernoulli\((p)\) random variables of mean \(pn\),

\[ \Pr [E(\mathbf{m})+\mathbf{e}\notin S_{\mathbf{m}}]\le 2^{-\Omega (\gamma ^2n)}. \]

By the union bound,

\[ \Pr [E(\mathbf{m})+\mathbf{e}\notin D_{\mathbf{m}}\cap S_{\mathbf{m}}]\le \frac12+2^{-\Omega (\gamma ^2n)}, \]

so, for large enough \(n\),

\[ \Pr [E(\mathbf{m})+\mathbf{e}\in D_{\mathbf{m}}\cap S_{\mathbf{m}}]\ge \frac12-2^{-\Omega (\gamma ^2n)}\ge \frac14. \]

On the other hand, since all error patterns of the same Hamming weight occur with the same probability under \(\mathrm{BSC}_p\),

\[ \Pr [E(\mathbf{m})+\mathbf{e}\in D_{\mathbf{m}}\cap S_{\mathbf{m}}]\le |D_{\mathbf{m}}\cap S_{\mathbf{m}}|\cdot p_{\max }, \qquad p_{\max }=\max _{d\in [(1-\gamma )pn,(1+\gamma )pn]}p^d(1-p)^{n-d}. \]

Since \(p\le \tfrac 12\), the function \(p^d(1-p)^{n-d}\) is decreasing in \(d\), so the maximum is attained at \(d=(1-\gamma )pn\), giving

\[ p_{\max }=p^{(1-\gamma )pn}(1-p)^{n-(1-\gamma )pn}=\left(\frac{1-p}{p}\right)^{\gamma pn}2^{-nH(p)}. \]

Combining the last three displays,

\[ |D_{\mathbf{m}}\cap S_{\mathbf{m}}|\ge \frac14\left(\frac{1-p}{p}\right)^{-\gamma pn}2^{nH(p)} =\frac14\left(\frac1p-1\right)^{-\gamma pn}2^{nH(p)}. \]

Summing over all \(2^k\) messages \(\mathbf{m}\), using disjointness of the \(D_{\mathbf{m}}\),

\[ 2^n=\sum _{\mathbf{m}\in \{ 0,1\} ^k}|D_{\mathbf{m}}|\ge \sum _{\mathbf{m}\in \{ 0,1\} ^k}|D_{\mathbf{m}}\cap S_{\mathbf{m}}| \ge 2^k\cdot \frac14\left(\frac1p-1\right)^{-\gamma pn}2^{nH(p)} = 2^{k-2}\cdot 2^{H(p)n-\gamma p\log (1/p-1)n}. \]

Choosing \(\gamma =\dfrac {\varepsilon }{2p\log (1/p-1)}\) (so \(\gamma =\Theta (\varepsilon )\), using \(0{\lt}p{\lt}\tfrac 12\)), for large enough \(n\) we get

\[ 2^n{\gt}2^{k+H(p)n-\varepsilon n}. \]

This implies \(k{\lt}(1-H(p)+\varepsilon )n\), contradicting \(k\ge \lceil (1-H(p)+\varepsilon )n\rceil \). This proves the lemma.

The same proof, with a suitable choice of \(\gamma \), shows that the conclusion of Lemma 146 still holds if the hypothesis is weakened to decoding error probability at most \(1-2^{-\beta n}\) (instead of \(1/2\)) for small enough \(\beta =\beta (\varepsilon ){\gt}0\).

Lemma 147 Step 1: expected decoding error probability is small

Let \(0\le p{\lt}\tfrac 12\), \(0{\lt}\varepsilon \le \tfrac 12-p\), and let \(\varepsilon '=\varepsilon /2\). Let \(k\le \lfloor (1-H(p+\varepsilon ))n\rfloor \). Suppose \(E:\{ 0,1\} ^k\to \{ 0,1\} ^n\) is chosen at random by picking \(E(\mathbf{m}')\) independently and uniformly at random from \(\{ 0,1\} ^n\) for every \(\mathbf{m}'\in \{ 0,1\} ^k\), and let \(D\) be the maximum likelihood decoding (MLD) function for \(E\). Then there is \(\delta '=\delta '(\varepsilon ,p){\gt}0\) such that for every fixed \(\mathbf{m}\in \{ 0,1\} ^k\) and large enough \(n\),

\[ \mathbb {E}_E\Big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\Big]\le 2^{-\delta 'n}. \]
Proof

Write \(\mathbf{y}=E(\mathbf{m})+\mathbf{e}\) for the received word. Splitting the sum defining the decoding error probability according to whether \(\mathbf{y}\) lies in the Hamming ball \(B(E(\mathbf{m}),(p+\varepsilon ')n)\),

\[ \Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(\mathbf{y})\ne \mathbf{m}] =\sum _{\mathbf{y}\in B(E(\mathbf{m}),(p+\varepsilon ')n)}\Pr [\mathbf{y}\mid E(\mathbf{m})]\cdot \mathbb 1_{D(\mathbf{y})\ne \mathbf{m}} +\sum _{\mathbf{y}\notin B(E(\mathbf{m}),(p+\varepsilon ')n)}\Pr [\mathbf{y}\mid E(\mathbf{m})]\cdot \mathbb 1_{D(\mathbf{y})\ne \mathbf{m}}. \]

The second sum is at most

\[ \sum _{\mathbf{y}\notin B(E(\mathbf{m}),(p+\varepsilon ')n)}\Pr [\mathbf{y}\mid E(\mathbf{m})] =\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[wt(\mathbf{e}){\gt}(p+\varepsilon ')n]\le e^{-(\varepsilon ')^2n/2}, \]

by the additive Chernoff bound applied to \(wt(\mathbf{e})\), a sum of \(n\) i.i.d. Bernoulli\((p)\) random variables. Hence

\[ \Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}] \le \sum _{\mathbf{y}\in B(E(\mathbf{m}),(p+\varepsilon ')n)}\Pr [\mathbf{y}\mid E(\mathbf{m})]\cdot \mathbb 1_{D(\mathbf{y})\ne \mathbf{m}}+e^{-(\varepsilon ')^2n/2}. \]

Taking the expectation of both sides over the random choice of \(E\) and using linearity of expectation,

\[ \mathbb {E}_E\Big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\Big] \le e^{-(\varepsilon ')^2n/2}+\sum _{\mathbf{y}\in B(E(\mathbf{m}),(p+\varepsilon ')n)}\Pr [\mathbf{y}\mid E(\mathbf{m})]\cdot \mathbb {E}_E\big[\mathbb 1_{D(\mathbf{y})\ne \mathbf{m}}\big]. \]

Fix a received word \(\mathbf{y}\) with \(\Delta (\mathbf{y},E(\mathbf{m}))\le (p+\varepsilon ')n\). Since \(D\) is MLD, \(D(\mathbf{y})\ne \mathbf{m}\) can happen only if there is \(\mathbf{m}'\ne \mathbf{m}\) with \(\Delta (E(\mathbf{m}'),\mathbf{y})\le \Delta (E(\mathbf{m}),\mathbf{y})\le (p+\varepsilon ')n\). By the union bound, and using that each \(E(\mathbf{m}')\), \(\mathbf{m}'\ne \mathbf{m}\), is independent of \(E(\mathbf{m})\) and uniform on \(\{ 0,1\} ^n\),

\[ \mathbb {E}_E\big[\mathbb 1_{D(\mathbf{y})\ne \mathbf{m}}\big] \le \sum _{\mathbf{m}'\ne \mathbf{m}}\Pr \big[E(\mathbf{m}')\in B(\mathbf{y},(p+\varepsilon ')n)\big] =\sum _{\mathbf{m}'\ne \mathbf{m}}\frac{|B(\mathbf{y},(p+\varepsilon ')n)|}{2^n}. \]

By the upper bound on the volume of a Hamming ball, \(|B(\mathbf{y},(p+\varepsilon ')n)|\le 2^{H(p+\varepsilon ')n}\), so, using \(k\le (1-H(p+\varepsilon ))n\),

\[ \mathbb {E}_E\big[\mathbb 1_{D(\mathbf{y})\ne \mathbf{m}}\big] \le 2^{k}\cdot 2^{H(p+\varepsilon ')n-n} {\lt}2^{n(1-H(p+\varepsilon ))-n(1-H(p+\varepsilon '))} =2^{-n(H(p+\varepsilon )-H(p+\varepsilon '))}. \]

Substituting back, and using \(\sum _{\mathbf{y}\in B(E(\mathbf{m}),(p+\varepsilon ')n)}\Pr [\mathbf{y}\mid E(\mathbf{m})]\le 1\),

\[ \mathbb {E}_E\Big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\Big] \le e^{-(\varepsilon ')^2n/2}+2^{-n(H(p+\varepsilon )-H(p+\varepsilon '))}. \]

Taking \(\varepsilon '=\varepsilon /2\), since \(\varepsilon {\gt}0\), both \(e^{-(\varepsilon ')^2n/2}\) and \(2^{-n(H(p+\varepsilon )-H(p+\varepsilon '))}\) are \(2^{-\Omega (n)}\), so for a suitable \(\delta '=\delta '(\varepsilon ,p){\gt}0\) and large enough \(n\) the right-hand side is at most \(2^{-\delta 'n}\).

Lemma 148 Averaging argument, Claim 6.3.3

Let \(E^*:\{ 0,1\} ^k\to \{ 0,1\} ^n\) and \(D^*:\{ 0,1\} ^n\to \{ 0,1\} ^k\) satisfy

\[ \mathbb {E}_{\mathbf{m}}\Big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D^*(E^*(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\Big]\le 2^{-\delta 'n}, \]

where \(\mathbf{m}\) is uniform on \(\{ 0,1\} ^k\). List the messages as \(\mathbf{m}_1,\ldots ,\mathbf{m}_{2^k}\) and let \(P_i=\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D^*(E^*(\mathbf{m}_i)+\mathbf{e})\ne \mathbf{m}_i]\); assume \(P_1\le P_2\le \cdots \le P_{2^k}\). Then \(P_{2^{k-1}}\le 2\cdot 2^{-\delta 'n}\).

Proof

By definition of \(P_i\) and the hypothesis on \(E^*,D^*\),

\[ \frac{1}{2^k}\sum _{i=1}^{2^k}P_i=\mathbb {E}_{\mathbf{m}}\Big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D^*(E^*(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\Big]\le 2^{-\delta 'n}. \]

Suppose, for the sake of contradiction, that \(P_{2^{k-1}}{\gt}2\cdot 2^{-\delta 'n}\). Since \(P_1\le \cdots \le P_{2^k}\), we have \(P_i\ge P_{2^{k-1}}\) for every \(i{\gt}2^{k-1}\), so, dropping half the summands from the sum,

\[ \frac{1}{2^k}\sum _{i=1}^{2^k}P_i\ge \frac{1}{2^k}\sum _{i=2^{k-1}+1}^{2^k}P_i {\gt} \frac{2\cdot 2^{-\delta 'n}\cdot 2^{k-1}}{2^k}=2^{-\delta 'n}. \]

This contradicts \(\frac{1}{2^k}\sum _{i=1}^{2^k}P_i\le 2^{-\delta 'n}\). Hence \(P_{2^{k-1}}\le 2\cdot 2^{-\delta 'n}\).

Lemma 149 Positive part of Shannon’s capacity theorem for BSC, part (1) of Thm 6.3.1

Let \(0\le p{\lt}\tfrac 12\) and \(0{\lt}\varepsilon \le \tfrac 12-p\). For large enough \(n\), there exists a real \(\delta {\gt}0\), an encoding function \(E:\{ 0,1\} ^k\to \{ 0,1\} ^n\) and a decoding function \(D:\{ 0,1\} ^n\to \{ 0,1\} ^k\) where \(k\le \lfloor (1-H(p+\varepsilon ))n\rfloor \), such that the following holds for every \(\mathbf{m}\in \{ 0,1\} ^k\):

\[ \Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\le 2^{-\delta n}. \]
Proof

We use the probabilistic method. Let \(k_0=\lfloor (1-H(p+\varepsilon ))n\rfloor +1\). Choose \(E:\{ 0,1\} ^{k_0}\to \{ 0,1\} ^n\) at random by picking \(E(\mathbf{m})\) independently and uniformly at random from \(\{ 0,1\} ^n\) for every \(\mathbf{m}\in \{ 0,1\} ^{k_0}\), and let \(D\) be the MLD function for \(E\).

By Lemma 147 (Step 1), there is \(\delta '=\delta '(\varepsilon ,p){\gt}0\) such that for every fixed \(\mathbf{m}\in \{ 0,1\} ^{k_0}\),

\[ \mathbb {E}_E\Big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\Big]\le 2^{-\delta 'n}. \]

Taking the expectation of both sides over \(\mathbf{m}\) uniform on \(\{ 0,1\} ^{k_0}\) and switching the (finite, independent) order of the expectations over \(E\) and \(\mathbf{m}\),

\[ \mathbb {E}_E\Big[\mathbb {E}_{\mathbf{m}}\big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D(E(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\big]\Big]\le 2^{-\delta 'n}. \]

By the probabilistic method, there is a choice \(E^*\) of \(E\) (and corresponding \(D^*\)) such that

\[ \mathbb {E}_{\mathbf{m}}\Big[\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D^*(E^*(\mathbf{m})+\mathbf{e})\ne \mathbf{m}]\Big]\le 2^{-\delta 'n}. \tag {$*$} \]

Apply Lemma 148 to \((*)\): order the messages \(\mathbf{m}_1,\ldots ,\mathbf{m}_{2^{k_0}}\) so that \(P_1\le \cdots \le P_{2^{k_0}}\), where \(P_i=\Pr _{\mathbf{e}\sim \mathrm{BSC}_p}[D^*(E^*(\mathbf{m}_i)+\mathbf{e})\ne \mathbf{m}_i]\). Then \(P_{2^{k_0-1}}\le 2\cdot 2^{-\delta 'n}\), and hence \(P_i\le 2\cdot 2^{-\delta 'n}\) for every \(i\le 2^{k_0-1}\).

Define the new code with message set \(\{ \mathbf{m}_1,\ldots ,\mathbf{m}_{2^{k_0-1}}\} \): fixing an arbitrary bijection of this set with \(\{ 0,1\} ^{k_0-1}\), let \(E\) be the restriction of \(E^*\) to this message set (composed with the bijection), and let \(D\) be \(D^*\) composed with the same bijection on its output (and mapping any output outside the restricted message set to an arbitrary fixed message, say). This new code has dimension \(k=k_0-1=\lfloor (1-H(p+\varepsilon ))n\rfloor \), and every message has decoding error probability at most \(2\cdot 2^{-\delta 'n}=2^{-(\delta '-1/n)n}\). Setting \(\delta =\delta '-1/n\), which is positive for large enough \(n\), completes the proof.

Definition 150 Explicit code, Def 6.3.4
#

A code \(C\) of block length \(n\) is called explicit if there exists a \(\mathrm{poly}(n)\)-time algorithm that computes a succinct description of \(C\) given \(n\). For linear codes, such a succinct description could be a generator matrix or a parity check matrix.

Definition 151 Strongly explicit code, Def 6.3.5
#

A linear \([n,k]\) code \(C\) is called strongly explicit if, given any index pair \((i,j)\in [k]\times [n]\), there is a \(\mathrm{poly}(\log n)\)-time algorithm that outputs \(G_{i,j}\), where \(G\) is a generator matrix of \(C\).

Proposition 152 Prop 6.4.1

Let \(0\le p{\lt}\tfrac 12\) and \(0{\lt}\varepsilon \le \tfrac 12-p\). If an algorithm \(A\) can handle \(p+\varepsilon \) fraction of worst case errors (i.e. correctly recovers the transmitted message from up to \((p+\varepsilon )n\) adversarially chosen errors, for codewords of block length \(n\)), then \(A\) can be used for reliable communication over \(\mathrm{BSC}_p\).

Proof

By the additive Chernoff bound, with probability at least \(1-e^{-\varepsilon ^2n/2}\), the fraction of errors introduced by \(\mathrm{BSC}_p\) on a transmitted codeword is at most \(p+\varepsilon \). By the assumption on \(A\), whenever this event occurs \(A\) correctly recovers the transmitted message. Hence \(A\), used as the decoding algorithm, achieves decoding error probability at most \(e^{-\varepsilon ^2n/2}=2^{-\Omega (n)}\) over \(\mathrm{BSC}_p\), i.e. it can be used for reliable communication over \(\mathrm{BSC}_p\).