3 Extensions to More General Channels

Definition 29 Dobrushin channel (Def. 4.1)

Fix probability distributions \(\mathcal{D}_0,\mathcal{D}_1\) over \(\{ 0,1\} ^*\). A \((\mathcal{D}_0,\mathcal{D}_1)\)-Dobrushin channel \(\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\) acts independently on each input bit by \((\mathrm{DC}\, 0)\sim \mathcal{D}_0\) and \((\mathrm{DC}\, 1)\sim \mathcal{D}_1\), and concatenates the outputs. It is square-integrable if for \(Y_i\sim \mathcal{D}_i\) we have \(\mathbb {E}|Y_i|^2{\lt}\infty \), \(i=0,1\). Repeat channels are the special case where \(\mathcal{D}_0\) is supported on all-zero strings, \(\mathcal{D}_1\) on all-one strings, with matching induced length distributions.

Definition 30 Biased Dobrushin channel (Def. 4.2)

A Dobrushin channel \(\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\) is biased if for \(Y_0\sim \mathcal{D}_0\), \(Y_1\sim \mathcal{D}_1\) we have \(\mathbb {E}|Y_0|=\mathbb {E}|Y_1|{\lt}\infty \) and \(\mathbb {E}[w(Y_0)]{\lt}\tfrac 12\mathbb {E}|Y_0|\) and \(\mathbb {E}[w(Y_1)]{\gt}\tfrac 12\mathbb {E}|Y_1|\). The expected fraction of ones in the output of a long zero-run is \(f:=\mathbb {E}[w(Y_0)]/\mathbb {E}|Y_0|{\lt}\tfrac 12\).

Definition 31 Trimming Dobrushin channel (Def. 4.4)

Given \(n\) and distributions \(\mathcal{T}_\ell ,\mathcal{T}_r\) over \(\mathbb {N}\) (may depend on \(n\)), let \(\mathrm{TRIM}_{\mathcal{T}_\ell ,\mathcal{T}_r}\) act on \(x\in \{ 0,1\} ^n\) by \(\mathrm{TRIM}_{\mathcal{T}_\ell ,\mathcal{T}_r}\, x=x_{t_\ell }^{n-t_r}\) where \(t_\ell \sim \mathcal{T}_\ell \), \(t_r\sim \mathcal{T}_r\) are independent (the empty string if \(t_2{\lt}t_1\)). For a Dobrushin channel \(\mathrm{DC}=\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\), the \((\mathcal{T}_\ell ,\mathcal{T}_r)\)-trimming Dobrushin channel is \(\mathrm{TDC}_{\mathcal{D}_0,\mathcal{D}_1,\mathcal{T}_\ell ,\mathcal{T}_r} :=\mathrm{TRIM}_{\mathcal{T}_\ell ,\mathcal{T}_r}\circ \mathrm{DC}\).

Lemma 32 Trimming preserves the Dobrushin rate (Lemma 4.5 / A.3)

Let \(\mathrm{DC}\) be a square-integrable Dobrushin channel and let \(\mathrm{TDC}_{\mathcal{T}_\ell ,\mathcal{T}_r}\) be the trimming version with \(\mathcal{T}_\ell ,\mathcal{T}_r\) each bounded by \(\nu \eta m\) for inputs of length \(m\). Then the information rate of \(\mathrm{TDC}_{\mathcal{T}_\ell ,\mathcal{T}_r}\) is at least \(\mathrm{Cap}(\mathrm{DC})-2\nu \eta \).

Proof ▶

Following the proof of Lemma 25, let \(T_\ell \sim \mathcal{T}_\ell \), \(T_r\sim \mathcal{T}_r\) be the numbers of bits trimmed off the left and right of the output \(Y=\mathrm{DC}\, X\) (with \(X\) on \(\{ 0,1\} ^m\)), and set \(L=T_\ell \), \(R=m-T_r\). We prove the analogue of Claim 26 with the modified second part

\[ \lim _{m\to \infty }\frac1m\sup _X I(X;Y\mid \widetilde L,\widetilde R) \le \lim _{m\to \infty }\frac1m\sup _X I(X;Y')+2\nu \eta , \]

\(Y'=\mathrm{TDC}_{\mathcal{T}_\ell ,\mathcal{T}_r}\, X\). The proof of the first equation is identical to Claim 26. For the second, by the same chain-rule argument (Lemma 12) we reach \(I(X;Y\mid \widetilde L,\widetilde R)\le I(X;Y_{L+1}^{R-1}\mid \widetilde L, \widetilde R)+H(Y_1^L,Y_R^m)\), but now the trimmed ends are no longer all-zero, so we can only bound \(H(Y_1^L,Y_R^m)\le \log ((2^{m\nu \eta })^2)=2m\nu \eta \). The rest of the argument proceeds as in Lemma 25, giving the stated bound.

Lemma 33 Received buffers stay zero-heavy (Lemma A.2)

Let \(c\) be a codeword of length \(m\) from the near-capacity-achieving code for a biased square-integrable Dobrushin channel \(\mathrm{DC}=\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\) (the analogue of Proposition 28). The probability that any length-\(\nu \eta m\) substring of \(\mathrm{DC}\, c\) has a fraction of zeros less than \(f+\kappa \) (where \(f=\mathbb {E}[w(Y_0)]/\mathbb {E}|Y_0|\), \(Y_0\sim \mathcal{D}_0\)) vanishes as \(m\to \infty \), for any \(\nu ,\eta {\gt}0\) and all \(\kappa {\gt}0\) small enough.

Proof ▶

Let \(Y^1,\dots ,Y^t\) be the outputs of \(\mathrm{DC}\) on the individual bits of \(c\) fully contained in the first \(\nu \eta m\) output bits. Up to \(o(m)\) leftover bits (negligible with high probability), the window equals \(Y^1Y^2\cdots Y^t\), and by a union bound over \(O(1/\nu \eta )\) blocks it suffices to treat this prefix window. We must show

\[ \mathbb {P}\Big(\frac{\sum _{j=1}^t w(Y^j)}{\sum _{j=1}^t|Y^j|}{\lt}f+\kappa \Big) = \mathbb {P}\Big((f+\kappa )\sum _{j=1}^t|Y^j|-\sum _{j=1}^t w(Y^j){\gt}0\Big)\to 0. \]

Conditioning on \(t\) and using \(\mathbb {E}|Y_0|=\mathbb {E}|Y_1|\),

\[ \mathbb {E}\Big[(f+\kappa )\sum _j|Y^j|-\sum _j w(Y^j)\Big] = (f+\kappa )(\mathbb {E}t)\mathbb {E}|Y_0|-\mathbb {E}\Big[\sum _j \mathbb {E}[w(Y^j)]\Big]. \]

By the balance property of \(c\) (analogue of Proposition 28), each long substring of \(c\) has at least a \(\gamma \)-fraction of ones, so for \(t\ge \tfrac 12\mathbb {E}t\) (whp),

\[ (f+\kappa )(\mathbb {E}t)\mathbb {E}|Y_0|-\big(\gamma (\mathbb {E}t)\mathbb {E}[w(Y_1)]+(1-\gamma )(\mathbb {E}t)\mathbb {E}[w(Y_0)]\big) \le (\mathbb {E}t)\mathbb {E}|Y_0|\, \big(f+\kappa -(\gamma /2+(1-\gamma )f)\big). \]

Since \(f{\lt}\tfrac 12\) and \(\gamma {\gt}0\), choosing \(\kappa \) small makes this \(-C\, \mathbb {E}t\) for a constant \(C{\gt}0\), i.e. \(-\Omega (m)\). By concentration (variances are \(o(m^2)\), via Chebyshev’s inequality, Theorem 5), the probability vanishes as \(m\to \infty \).

Theorem 34 Codes for biased Dobrushin channels (Thm. 4.3)

Fix a biased square-integrable Dobrushin channel \(\mathrm{DC}\). For every \(\varepsilon {\gt}0\) there exists a sound code \(\mathcal{C}\) with rate \(R\) for the \(\mathrm{DC}\) with \(R\ge \mathrm{Cap}(\mathrm{DC})-\varepsilon \) and linear-time encoding, quasi-linear-time decoding. Moreover the decoder has probability of failure \(e^{-\Omega (n)}\).

Proof ▶

We adapt the construction of Definition 24, changing only buffer identification: declare any contiguous block of \(\nu \eta m\) bits with a fraction of ones less than \(f+\kappa \) to be part of a buffer (implemented with a running window of size \(\nu \eta m\); a buffer starts when the fraction drops below \(f+\kappa \) and ends when it rises above). The rate is as in Theorem 23 except \(n=km/(\mathrm{Cap}(\mathrm{DC})-\nu \eta -\psi (\varepsilon ,\delta ,\nu ,k,m))\) with \(\psi \to 0\); using Lemma 32 (information rate of the trimming Dobrushin channel is \(\ge \mathrm{Cap}(\mathrm{DC})-2\nu \eta \)) and taking \(\varepsilon ,\delta ,\eta ,\nu \) small and \(m\) large makes \(R\ge \mathrm{Cap}(\mathrm{DC})-\varepsilon \).

The error analysis mirrors the proof of Theorem 23; two additional claims are needed: (1) the probability that any length-\(\nu \eta m\) substring of a received buffer has a fraction of ones higher than \(f+\kappa \) vanishes as \(m\to \infty \) (an easier version of Lemma 33, with \(\nu \) small enough to identify buffers whp), and (2) the probability that any length-\(\nu \eta m\) substring of a received codeword has a fraction of ones lower than \(f+\kappa \) vanishes (Lemma 33). As in Theorem 23 each error type’s frequency is a binomial variable, so by the Chernoff bound (Theorem 8) and a union bound (Theorem 6) the failure probability is \(e^{-\Omega (n)}\). Encoding is linear and buffer identification (running window) is \(O(n)\), so decoding is quasi-linear.