- Boxes
- definitions
- Ellipses
- theorems and lemmas
- Blue border
- the statement of this result is ready to be formalized; all prerequisites are done
- Orange border
- the statement of this result is not ready to be formalized; the blueprint needs more work
- Blue background
- the proof of this result is ready to be formalized; all prerequisites are done
- Green border
- the statement of this result is formalized
- Green background
- the proof of this result is formalized
- Dark green background
- the proof of this result and all its ancestors are formalized
- Dark green border
- this is in Mathlib
A Dobrushin channel \(\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\) is biased if for \(Y_0\sim \mathcal{D}_0\), \(Y_1\sim \mathcal{D}_1\) we have \(\mathbb {E}|Y_0|=\mathbb {E}|Y_1|{\lt}\infty \) and \(\mathbb {E}[w(Y_0)]{\lt}\tfrac 12\mathbb {E}|Y_0|\) and \(\mathbb {E}[w(Y_1)]{\gt}\tfrac 12\mathbb {E}|Y_1|\). The expected fraction of ones in the output of a long zero-run is \(f:=\mathbb {E}[w(Y_0)]/\mathbb {E}|Y_0|{\lt}\tfrac 12\).
Fix a channel \(\mathsf{Ch}\) and let \(\mathcal{S}\) be the collection of all codes sound with respect to \(\mathsf{Ch}\). The capacity of \(\mathsf{Ch}\) is \(\mathrm{Cap}(\mathsf{Ch}) := \sup _{\mathcal{C}\in \mathcal{S}} R(\mathcal{C})\).
For \(\Omega \) a probability space, a binary communication channel is a map \(\mathsf{Ch}: \Omega \times \{ 0,1\} ^* \to \{ 0,1\} ^*\). For \(x \in \{ 0,1\} ^*\) we write \(\mathsf{Ch}\, x\) for the random variable \(\omega \mapsto \mathsf{Ch}(\omega ,x)\). Here \(\{ 0,1\} ^n\) is the set of bit strings of length \(n\) (for \(n\in \mathbb {N}\cup \{ \infty \} \)), \(\{ 0,1\} ^* = \bigcup _{n\in \mathbb {N}}\{ 0,1\} ^n\), \(x_j^k\) denotes the substring of \(x\) from index \(j\) to \(k\) inclusive (and \(x_i := x_i^i\)), \([n] := \{ 1,\dots ,n\} \), \(|x|\) is the length of \(x\), \(xy\) is concatenation, and \((x)^k\) is the \(k\)-fold self-concatenation with \((x)^0\) the empty string.
A code is a family \(\mathcal{C}= \{ \mathcal{C}_n\} _{n\ge 1}\) of subsets \(\mathcal{C}_n \subseteq \{ 0,1\} ^n\); its rate is \(R_n := \tfrac 1n \log |\mathcal{C}_n|\), and \(R = R(\mathcal{C}) := \lim _{n\to \infty } R_n\).
Fix the inner blocklength \(m=O(1)\), set \(|\Sigma |=2^m\), and let \(\mathcal{C}_{in}\subseteq \{ 0,1\} ^m\) be an approximately balanced inner code for the \(\mathrm{TRC}_\mathcal{D}\) of rate \(R-\varepsilon \) with (not necessarily efficient) encoder \(\mathsf{Enc}_{\mathrm{in}}\) and decoder \(\mathsf{Dec}_{\mathrm{in}}\) (existence by Proposition 28; the \(O(1)\)-size code is found by exhaustive search in \(O(1)\) time). The encoder \(\mathsf{Enc}:\{ 0,1\} ^{km}\to \{ 0,1\} ^n\) acts on \(x\in \{ 0,1\} ^{km}\) as follows.
Split \(x\) into \(x_1,\dots ,x_k\) with \(|x_j|=m\), viewing each \(x_j\) as a letter of \(\Sigma \), so \(x\in \Sigma ^k\). Apply the outer encoder \(\mathsf{Enc}_{\mathrm{out}}\) of Theorem 22 to get \(\widetilde x=\mathsf{Enc}_{\mathrm{out}}(x)\in \Sigma ^{k/(1-\delta -\varepsilon )}\).
Split \(\widetilde x\) into \(\widetilde x_1,\dots ,\widetilde x_{k'}\) with \(k'=k/(1-\delta -\varepsilon )\), view each \(\widetilde x_j\in \{ 0,1\} ^m\), and encode it with the inner code to get \(\widehat x_j=\mathsf{Enc}_{\mathrm{in}}(\widetilde x_j)\in \{ 0,1\} ^{m/(R-\varepsilon )}\) (the inner rate is \(R-\varepsilon \) for \(m\) large).
Concatenate the \(\widehat x_j\) with all-zero buffers \(0^b\), \(b=b(m)=\eta m\):
\[ \mathsf{Enc}(x) = \widehat x_1\, 0^b\, \widehat x_2\, 0^b\cdots 0^b\, \widehat x_{k'}, \]where \(b\) is a constant independent of \(n=k'\cdot (\tfrac {m}{R-\varepsilon }+b) = km/(\mathrm{Cap}(\mathrm{RC}_\mathcal{D})-\psi (\varepsilon ,\delta ,\eta ,k,m))\) with \(\psi \to 0\).
The decoder \(\mathsf{Dec}\) on a received string \(y\in \{ 0,1\} ^*\) reverses the steps:
Identify buffers: interpret any maximal contiguous block of \(\ge \tfrac {n}{2}\eta m\) zeros as a buffer; remove buffers to get the received inner strings \(y_1,\dots ,y_\ell \).
Decode each \(y_j\) with the inner code: \(\widetilde y_j=\mathsf{Dec}_{\mathrm{in}}(y_j)\in \{ 0,1\} ^m\) for \(j\le \ell \).
Interpret each \(\widetilde y_j\) as a letter of \(\Sigma \) and decode the concatenation \(\widetilde y=\widetilde y_1\cdots \widetilde y_\ell \in \Sigma ^\ell \) with the outer decoder: \(\mathsf{Dec}(y)=\mathsf{Dec}_{\mathrm{out}}(\widetilde y)\in \{ 0,1\} ^{km}\).
Fix probability distributions \(\mathcal{D}_0,\mathcal{D}_1\) over \(\{ 0,1\} ^*\). A \((\mathcal{D}_0,\mathcal{D}_1)\)-Dobrushin channel \(\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\) acts independently on each input bit by \((\mathrm{DC}\, 0)\sim \mathcal{D}_0\) and \((\mathrm{DC}\, 1)\sim \mathcal{D}_1\), and concatenates the outputs. It is square-integrable if for \(Y_i\sim \mathcal{D}_i\) we have \(\mathbb {E}|Y_i|^2{\lt}\infty \), \(i=0,1\). Repeat channels are the special case where \(\mathcal{D}_0\) is supported on all-zero strings, \(\mathcal{D}_1\) on all-one strings, with matching induced length distributions.
For \(x \in \{ 0,1\} ^n\), \(b \in \{ 0,1\} \) and an index \(j \in [n]\), an insertion is the transformation \(x \mapsto x_1^j b\, x_{j+1}^n\) and a deletion is \(x \mapsto x_1^{j-1} x_{j+1}^n\) (Definition 2.8 of the paper). The edit distance between \(x,y \in \{ 0,1\} ^*\) is the minimum number of insertions and/or deletions needed to convert \(x\) into \(y\).
For a probability distribution \(\mathcal{D}\) over \(\mathbb {R}\), \(\mu (\mathcal{D}) := \mathbb {E}[\mathcal{D}]\) denotes its expectation (mean). When \(\mathcal{D}\) is clear from context we write \(\mu \). (Standard; recalled so later nodes may depend on it.)
For a probability distribution \(\mathcal{D}\) over \(\mathbb {N}\), let \(\Omega = \mathbb {N}^\infty \) with the infinite product measure \(\mathcal{D}^\infty \). The binary \(\mathcal{D}\)-repeat channel is
i.e. the \(i\)-th input bit is repeated \(\omega _i \sim \mathcal{D}\) times, independently. It is square-integrable if \(\mu (\mathcal{D}){\lt}\infty \) and \(\sigma ^2(\mathcal{D}){\lt}\infty \). The origin (input index) of the \(j\)-th output bit is well defined as \(\min \{ i \ge 1 : \sum _{k=1}^i \omega _k \ge j\} \). Special cases: the binary deletion channel (\(\mathcal{D}=\mathrm{Bernoulli}\)), the Poisson repeat channel (\(\mathcal{D}=\mathrm{Poisson}\)), and sticky channels.
An encoding algorithm for \(\mathcal{C}\) is a family of maps \(\mathsf{Enc}: \{ 0,1\} ^{nR_n} \to \mathcal{C}_n\), and a decoding algorithm is a family \(\mathsf{Dec}: \{ 0,1\} ^* \to \{ 0,1\} ^{nR_n}\). The code \(\mathcal{C}\) is sound with respect to a channel \(\mathsf{Ch}\) if there exist \(\mathsf{Enc},\mathsf{Dec}\) such that \(\mathbb {P}\big(\mathsf{Dec}(\mathsf{Ch}\, \mathsf{Enc}(X)) \ne X\big) \to 0\) as \(n\to \infty \), where \(X\) is uniform on \(\{ 0,1\} ^{nR_n}\).
A stochastic process \(\{ X_j\} _{j\ge 1}\) is stationary if for every \(j,N\in \mathbb {N}\), \((X_1,\dots ,X_N) \overset {\mathcal{D}}{=} (X_{j+1},\dots ,X_{j+N})\). It is stationary ergodic if, in addition, Birkhoff’s pointwise ergodic theorem (Theorem 13) holds: for every \(f\in L^1\), almost surely \(\mathbb {E}f(X_1) = \lim _{n\to \infty }\tfrac 1n\sum _{j=1}^n f(X_j)\).
Let \(\mathrm{TRIM}\) be the deterministic channel acting on \(x\in \{ 0,1\} ^n\) by deleting the longest all-zero prefix and suffix:
(the empty string if \(x\) is all zeros). The trimming \(\mathcal{D}\)-repeat channel is the composition \(\mathrm{TRC}_\mathcal{D}:= \mathrm{TRIM}\circ \mathrm{RC}_\mathcal{D}\).
Given \(n\) and distributions \(\mathcal{T}_\ell ,\mathcal{T}_r\) over \(\mathbb {N}\) (may depend on \(n\)), let \(\mathrm{TRIM}_{\mathcal{T}_\ell ,\mathcal{T}_r}\) act on \(x\in \{ 0,1\} ^n\) by \(\mathrm{TRIM}_{\mathcal{T}_\ell ,\mathcal{T}_r}\, x=x_{t_\ell }^{n-t_r}\) where \(t_\ell \sim \mathcal{T}_\ell \), \(t_r\sim \mathcal{T}_r\) are independent (the empty string if \(t_2{\lt}t_1\)). For a Dobrushin channel \(\mathrm{DC}=\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\), the \((\mathcal{T}_\ell ,\mathcal{T}_r)\)-trimming Dobrushin channel is \(\mathrm{TDC}_{\mathcal{D}_0,\mathcal{D}_1,\mathcal{T}_\ell ,\mathcal{T}_r} :=\mathrm{TRIM}_{\mathcal{T}_\ell ,\mathcal{T}_r}\circ \mathrm{DC}\).
Let \(c\) be a codeword of length \(m\) from the near-capacity-achieving code for a biased square-integrable Dobrushin channel \(\mathrm{DC}=\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\) (the analogue of Proposition 28). The probability that any length-\(\nu \eta m\) substring of \(\mathrm{DC}\, c\) has a fraction of zeros less than \(f+\kappa \) (where \(f=\mathbb {E}[w(Y_0)]/\mathbb {E}|Y_0|\), \(Y_0\sim \mathcal{D}_0\)) vanishes as \(m\to \infty \), for any \(\nu ,\eta {\gt}0\) and all \(\kappa {\gt}0\) small enough.
Assume \(\mathcal{D}\) is bounded by \(Bn\). Let \(Y=\mathrm{RC}_\mathcal{D}\, X\) and \(Y'=\mathrm{TRC}_\mathcal{D}\, X\), and let \(\widetilde L,\widetilde R\) be as in Lemma 25. Then
Let \(\mathsf{Ch}_1,\mathsf{Ch}_2:\Omega \times \{ 0,1\} ^*\to \{ 0,1\} ^*\) be two channels. Suppose there is a sequence \(p_n\ge 0\) with \(p_n\to 1\) such that for every \(x\in \{ 0,1\} ^n\) there is a set \(A_x\subseteq \Omega \) with: conditioned on \(A_x\), the distributions of \(\mathsf{Ch}_1\, x\) and \(\mathsf{Ch}_2\, x\) are the same, and \(\mathbb {P}(A_x)\ge p_n\). Then the information rates of \(\mathsf{Ch}_1\) and \(\mathsf{Ch}_2\) are equal.
Let \(\mathrm{DC}\) be a square-integrable Dobrushin channel and let \(\mathrm{TDC}_{\mathcal{T}_\ell ,\mathcal{T}_r}\) be the trimming version with \(\mathcal{T}_\ell ,\mathcal{T}_r\) each bounded by \(\nu \eta m\) for inputs of length \(m\). Then the information rate of \(\mathrm{TDC}_{\mathcal{T}_\ell ,\mathcal{T}_r}\) is at least \(\mathrm{Cap}(\mathrm{DC})-2\nu \eta \).
Let \(\mathrm{RC}_\mathcal{D}\) be a square-integrable repeat channel. Then the information rate of the \(\mathrm{TRC}_\mathcal{D}\) is \(\mathrm{Cap}(\mathrm{RC}_\mathcal{D})\).
Fix a square-integrable \(\mathcal{D}\)-repeat channel \(\mathsf{Ch}=\mathrm{RC}_\mathcal{D}\), or its trimming version \(\mathsf{Ch}=\mathrm{TRC}_\mathcal{D}\), with information rate \(\mathcal{I}\). For every \(\zeta ,\varepsilon \in (0,1)\) there exists \(\gamma \in (0,\tfrac 12)\) and a family of codes \(\mathcal{C}_n\subseteq \{ 0,1\} ^n\) for \(\mathsf{Ch}\) with rate \(R\ge \mathcal{I}-\varepsilon \) such that for every codeword \(c\in \mathcal{C}_n\) and every \(i\in [n-\zeta n]\),
i.e. every length-\(\zeta n\) substring of every codeword is approximately balanced.
Let \(Z \sim \mathrm{Binomial}(t,p)\). For any \(a {\gt} p\, t\) there is a constant \(c{\gt}0\) with \(\mathbb {P}(Z \ge a) \le e^{-c\, t}\); in particular, a \(\mathrm{Binomial}(t,p)\) exceeds \(\tfrac {a}{t}\cdot t\) for \(a/t {\gt} p\) only with probability \(e^{-\Omega (t)}\).
Fix a biased square-integrable Dobrushin channel \(\mathrm{DC}\). For every \(\varepsilon {\gt}0\) there exists a sound code \(\mathcal{C}\) with rate \(R\) for the \(\mathrm{DC}\) with \(R\ge \mathrm{Cap}(\mathrm{DC})-\varepsilon \) and linear-time encoding, quasi-linear-time decoding. Moreover the decoder has probability of failure \(e^{-\Omega (n)}\).
Consider a channel \(\mathsf{Ch}\) that acts independently on each input bit and appends the corresponding outputs, i.e. \(\mathsf{Ch}\, x \overset {\mathcal{D}}{=} (\mathsf{Ch}\, x_1)(\mathsf{Ch}\, x_2)\cdots (\mathsf{Ch}\, x_n)\) for \(x\in \{ 0,1\} ^n\), and suppose \(\mathbb {E}|\mathsf{Ch}\, b|{\lt}\infty \) for \(b\in \{ 0,1\} \). Then
the supremum over random variables \(X^n\) supported on \(\{ 0,1\} ^n\) with \(Y^n=\mathsf{Ch}\, X^n\). Moreover the capacity is attained by a stationary ergodic input process. When the right-hand limit exists for a channel not necessarily satisfying these hypotheses (e.g. a trimming repeat channel), it is called the information rate of the channel.
Fix a square-integrable repeat channel \(\mathrm{RC}_\mathcal{D}\). For every \(\varepsilon {\gt}0\) there exists a sound code \(\mathcal{C}\) with rate \(R\) for the \(\mathrm{RC}_\mathcal{D}\) such that \(R \ge \mathrm{Cap}(\mathrm{RC}_\mathcal{D})-\varepsilon \), with linear-time encoding and quasi-linear-time decoding. Moreover the decoder has probability of failure \(e^{-\Omega (n)}\).
For every \(\varepsilon ,\delta \in (0,1)\) there exists a family of codes \(\mathcal{C}_n\) of rate \(1-\delta -\varepsilon \) over an alphabet \(\Sigma \) of size \(O_\varepsilon (1)\) that can deterministically correct insertion/deletion (worst-case) errors of edit distance at most \(\delta n\). Moreover the \(\mathcal{C}_n\) have encoding and decoding algorithms running in linear and quasi-linear time, respectively.