3 Extensions to More General Channels
Fix probability distributions \(\mathcal{D}_0,\mathcal{D}_1\) over \(\{ 0,1\} ^*\). A \((\mathcal{D}_0,\mathcal{D}_1)\)-Dobrushin channel \(\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\) acts independently on each input bit by \((\mathrm{DC}\, 0)\sim \mathcal{D}_0\) and \((\mathrm{DC}\, 1)\sim \mathcal{D}_1\), and concatenates the outputs. It is square-integrable if for \(Y_i\sim \mathcal{D}_i\) we have \(\mathbb {E}|Y_i|^2{\lt}\infty \), \(i=0,1\). Repeat channels are the special case where \(\mathcal{D}_0\) is supported on all-zero strings, \(\mathcal{D}_1\) on all-one strings, with matching induced length distributions.
A Dobrushin channel \(\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\) is biased if for \(Y_0\sim \mathcal{D}_0\), \(Y_1\sim \mathcal{D}_1\) we have \(\mathbb {E}|Y_0|=\mathbb {E}|Y_1|{\lt}\infty \) and \(\mathbb {E}[w(Y_0)]{\lt}\tfrac 12\mathbb {E}|Y_0|\) and \(\mathbb {E}[w(Y_1)]{\gt}\tfrac 12\mathbb {E}|Y_1|\). The expected fraction of ones in the output of a long zero-run is \(f:=\mathbb {E}[w(Y_0)]/\mathbb {E}|Y_0|{\lt}\tfrac 12\).
Given \(n\) and distributions \(\mathcal{T}_\ell ,\mathcal{T}_r\) over \(\mathbb {N}\) (may depend on \(n\)), let \(\mathrm{TRIM}_{\mathcal{T}_\ell ,\mathcal{T}_r}\) act on \(x\in \{ 0,1\} ^n\) by \(\mathrm{TRIM}_{\mathcal{T}_\ell ,\mathcal{T}_r}\, x=x_{t_\ell }^{n-t_r}\) where \(t_\ell \sim \mathcal{T}_\ell \), \(t_r\sim \mathcal{T}_r\) are independent (the empty string if \(t_2{\lt}t_1\)). For a Dobrushin channel \(\mathrm{DC}=\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\), the \((\mathcal{T}_\ell ,\mathcal{T}_r)\)-trimming Dobrushin channel is \(\mathrm{TDC}_{\mathcal{D}_0,\mathcal{D}_1,\mathcal{T}_\ell ,\mathcal{T}_r} :=\mathrm{TRIM}_{\mathcal{T}_\ell ,\mathcal{T}_r}\circ \mathrm{DC}\).
Let \(\mathrm{DC}\) be a square-integrable Dobrushin channel and let \(\mathrm{TDC}_{\mathcal{T}_\ell ,\mathcal{T}_r}\) be the trimming version with \(\mathcal{T}_\ell ,\mathcal{T}_r\) each bounded by \(\nu \eta m\) for inputs of length \(m\). Then the information rate of \(\mathrm{TDC}_{\mathcal{T}_\ell ,\mathcal{T}_r}\) is at least \(\mathrm{Cap}(\mathrm{DC})-2\nu \eta \).
Following the proof of Lemma 25, let \(T_\ell \sim \mathcal{T}_\ell \), \(T_r\sim \mathcal{T}_r\) be the numbers of bits trimmed off the left and right of the output \(Y=\mathrm{DC}\, X\) (with \(X\) on \(\{ 0,1\} ^m\)), and set \(L=T_\ell \), \(R=m-T_r\). We prove the analogue of Claim 26 with the modified second part
\(Y'=\mathrm{TDC}_{\mathcal{T}_\ell ,\mathcal{T}_r}\, X\). The proof of the first equation is identical to Claim 26. For the second, by the same chain-rule argument (Lemma 12) we reach \(I(X;Y\mid \widetilde L,\widetilde R)\le I(X;Y_{L+1}^{R-1}\mid \widetilde L, \widetilde R)+H(Y_1^L,Y_R^m)\), but now the trimmed ends are no longer all-zero, so we can only bound \(H(Y_1^L,Y_R^m)\le \log ((2^{m\nu \eta })^2)=2m\nu \eta \). The rest of the argument proceeds as in Lemma 25, giving the stated bound.
Let \(c\) be a codeword of length \(m\) from the near-capacity-achieving code for a biased square-integrable Dobrushin channel \(\mathrm{DC}=\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\) (the analogue of Proposition 28). The probability that any length-\(\nu \eta m\) substring of \(\mathrm{DC}\, c\) has a fraction of zeros less than \(f+\kappa \) (where \(f=\mathbb {E}[w(Y_0)]/\mathbb {E}|Y_0|\), \(Y_0\sim \mathcal{D}_0\)) vanishes as \(m\to \infty \), for any \(\nu ,\eta {\gt}0\) and all \(\kappa {\gt}0\) small enough.
Let \(Y^1,\dots ,Y^t\) be the outputs of \(\mathrm{DC}\) on the individual bits of \(c\) fully contained in the first \(\nu \eta m\) output bits. Up to \(o(m)\) leftover bits (negligible with high probability), the window equals \(Y^1Y^2\cdots Y^t\), and by a union bound over \(O(1/\nu \eta )\) blocks it suffices to treat this prefix window. We must show
Conditioning on \(t\) and using \(\mathbb {E}|Y_0|=\mathbb {E}|Y_1|\),
By the balance property of \(c\) (analogue of Proposition 28), each long substring of \(c\) has at least a \(\gamma \)-fraction of ones, so for \(t\ge \tfrac 12\mathbb {E}t\) (whp),
Since \(f{\lt}\tfrac 12\) and \(\gamma {\gt}0\), choosing \(\kappa \) small makes this \(-C\, \mathbb {E}t\) for a constant \(C{\gt}0\), i.e. \(-\Omega (m)\). By concentration (variances are \(o(m^2)\), via Chebyshev’s inequality, Theorem 5), the probability vanishes as \(m\to \infty \).
Fix a biased square-integrable Dobrushin channel \(\mathrm{DC}\). For every \(\varepsilon {\gt}0\) there exists a sound code \(\mathcal{C}\) with rate \(R\) for the \(\mathrm{DC}\) with \(R\ge \mathrm{Cap}(\mathrm{DC})-\varepsilon \) and linear-time encoding, quasi-linear-time decoding. Moreover the decoder has probability of failure \(e^{-\Omega (n)}\).
We adapt the construction of Definition 24, changing only buffer identification: declare any contiguous block of \(\nu \eta m\) bits with a fraction of ones less than \(f+\kappa \) to be part of a buffer (implemented with a running window of size \(\nu \eta m\); a buffer starts when the fraction drops below \(f+\kappa \) and ends when it rises above). The rate is as in Theorem 23 except \(n=km/(\mathrm{Cap}(\mathrm{DC})-\nu \eta -\psi (\varepsilon ,\delta ,\nu ,k,m))\) with \(\psi \to 0\); using Lemma 32 (information rate of the trimming Dobrushin channel is \(\ge \mathrm{Cap}(\mathrm{DC})-2\nu \eta \)) and taking \(\varepsilon ,\delta ,\eta ,\nu \) small and \(m\) large makes \(R\ge \mathrm{Cap}(\mathrm{DC})-\varepsilon \).
The error analysis mirrors the proof of Theorem 23; two additional claims are needed: (1) the probability that any length-\(\nu \eta m\) substring of a received buffer has a fraction of ones higher than \(f+\kappa \) vanishes as \(m\to \infty \) (an easier version of Lemma 33, with \(\nu \) small enough to identify buffers whp), and (2) the probability that any length-\(\nu \eta m\) substring of a received codeword has a fraction of ones lower than \(f+\kappa \) vanishes (Lemma 33). As in Theorem 23 each error type’s frequency is a binomial variable, so by the Chernoff bound (Theorem 8) and a union bound (Theorem 6) the failure probability is \(e^{-\Omega (n)}\). Encoding is linear and buffer identification (running window) is \(O(n)\), so decoding is quasi-linear.