1 Preliminaries
For a probability distribution \(\mathcal{D}\) over \(\mathbb {R}\), \(\mu (\mathcal{D}) := \mathbb {E}[\mathcal{D}]\) denotes its expectation (mean). When \(\mathcal{D}\) is clear from context we write \(\mu \). (Standard; recalled so later nodes may depend on it.)
For a probability distribution \(\mathcal{D}\) over \(\mathbb {R}\), \(\sigma ^2(\mathcal{D}) := \mathbb {E}\big[(\mathcal{D}-\mu (\mathcal{D}))^2\big]\) is its variance; written \(\sigma ^2\) when \(\mathcal{D}\) is understood.
For \(x \in \{ 0,1\} ^*\), the Hamming weight \(w(x) \in \mathbb {N}\) is the number of ones in \(x\).
For \(x \in \{ 0,1\} ^n\), \(b \in \{ 0,1\} \) and an index \(j \in [n]\), an insertion is the transformation \(x \mapsto x_1^j b\, x_{j+1}^n\) and a deletion is \(x \mapsto x_1^{j-1} x_{j+1}^n\) (Definition 2.8 of the paper). The edit distance between \(x,y \in \{ 0,1\} ^*\) is the minimum number of insertions and/or deletions needed to convert \(x\) into \(y\).
For a random variable \(X\) with finite variance and any \(c{\gt}0\), \(\mathbb {P}\big(|X-\mathbb {E}X| \ge c\big) \le \mathrm{Var}[X]/c^2\).
For events \(A_1,\dots ,A_t\), \(\mathbb {P}\big(\bigcup _i A_i\big) \le \sum _i \mathbb {P}(A_i)\).
\(\mathrm{Binomial}(t,p)\) is the distribution of the number of successes in \(t\) independent trials each succeeding with probability \(p\).
Let \(Z \sim \mathrm{Binomial}(t,p)\). For any \(a {\gt} p\, t\) there is a constant \(c{\gt}0\) with \(\mathbb {P}(Z \ge a) \le e^{-c\, t}\); in particular, a \(\mathrm{Binomial}(t,p)\) exceeds \(\tfrac {a}{t}\cdot t\) for \(a/t {\gt} p\) only with probability \(e^{-\Omega (t)}\).
For a discrete random variable \(X\), \(H(X) := -\sum _x \mathbb {P}(X=x)\log _2 \mathbb {P}(X=x)\) is its (base-2) Shannon entropy. All logs in this blueprint are base 2.
For discrete random variables \(X,Y\), \(H(X\mid Y) := H(X,Y) - H(Y)\) is the conditional entropy of \(X\) given \(Y\).
For discrete random variables \(X,Y\), the mutual information is \(I(X;Y) := H(X) - H(X\mid Y) = H(Y) - H(Y\mid X)\). Conditional mutual information \(I(X;Y\mid Z)\) is defined analogously.
\(H(X,Y) = H(Y) + H(X\mid Y)\), and more generally the chain rule \(I(X;Y,Z) = I(X;Z) + I(X;Y\mid Z)\) holds.
For an ergodic measure-preserving system and every \(f \in L^1\), the time averages converge almost surely: \(\lim _{n\to \infty } \tfrac 1n \sum _{j=1}^n f(X_j) = \mathbb {E}f(X_1)\) a.s.
For \(\Omega \) a probability space, a binary communication channel is a map \(\mathsf{Ch}: \Omega \times \{ 0,1\} ^* \to \{ 0,1\} ^*\). For \(x \in \{ 0,1\} ^*\) we write \(\mathsf{Ch}\, x\) for the random variable \(\omega \mapsto \mathsf{Ch}(\omega ,x)\). Here \(\{ 0,1\} ^n\) is the set of bit strings of length \(n\) (for \(n\in \mathbb {N}\cup \{ \infty \} \)), \(\{ 0,1\} ^* = \bigcup _{n\in \mathbb {N}}\{ 0,1\} ^n\), \(x_j^k\) denotes the substring of \(x\) from index \(j\) to \(k\) inclusive (and \(x_i := x_i^i\)), \([n] := \{ 1,\dots ,n\} \), \(|x|\) is the length of \(x\), \(xy\) is concatenation, and \((x)^k\) is the \(k\)-fold self-concatenation with \((x)^0\) the empty string.
For a probability distribution \(\mathcal{D}\) over \(\mathbb {N}\), let \(\Omega = \mathbb {N}^\infty \) with the infinite product measure \(\mathcal{D}^\infty \). The binary \(\mathcal{D}\)-repeat channel is
i.e. the \(i\)-th input bit is repeated \(\omega _i \sim \mathcal{D}\) times, independently. It is square-integrable if \(\mu (\mathcal{D}){\lt}\infty \) and \(\sigma ^2(\mathcal{D}){\lt}\infty \). The origin (input index) of the \(j\)-th output bit is well defined as \(\min \{ i \ge 1 : \sum _{k=1}^i \omega _k \ge j\} \). Special cases: the binary deletion channel (\(\mathcal{D}=\mathrm{Bernoulli}\)), the Poisson repeat channel (\(\mathcal{D}=\mathrm{Poisson}\)), and sticky channels.
A code is a family \(\mathcal{C}= \{ \mathcal{C}_n\} _{n\ge 1}\) of subsets \(\mathcal{C}_n \subseteq \{ 0,1\} ^n\); its rate is \(R_n := \tfrac 1n \log |\mathcal{C}_n|\), and \(R = R(\mathcal{C}) := \lim _{n\to \infty } R_n\).
An encoding algorithm for \(\mathcal{C}\) is a family of maps \(\mathsf{Enc}: \{ 0,1\} ^{nR_n} \to \mathcal{C}_n\), and a decoding algorithm is a family \(\mathsf{Dec}: \{ 0,1\} ^* \to \{ 0,1\} ^{nR_n}\). The code \(\mathcal{C}\) is sound with respect to a channel \(\mathsf{Ch}\) if there exist \(\mathsf{Enc},\mathsf{Dec}\) such that \(\mathbb {P}\big(\mathsf{Dec}(\mathsf{Ch}\, \mathsf{Enc}(X)) \ne X\big) \to 0\) as \(n\to \infty \), where \(X\) is uniform on \(\{ 0,1\} ^{nR_n}\).
Fix a channel \(\mathsf{Ch}\) and let \(\mathcal{S}\) be the collection of all codes sound with respect to \(\mathsf{Ch}\). The capacity of \(\mathsf{Ch}\) is \(\mathrm{Cap}(\mathsf{Ch}) := \sup _{\mathcal{C}\in \mathcal{S}} R(\mathcal{C})\).
Let \(\mathrm{TRIM}\) be the deterministic channel acting on \(x\in \{ 0,1\} ^n\) by deleting the longest all-zero prefix and suffix:
(the empty string if \(x\) is all zeros). The trimming \(\mathcal{D}\)-repeat channel is the composition \(\mathrm{TRC}_\mathcal{D}:= \mathrm{TRIM}\circ \mathrm{RC}_\mathcal{D}\).
A stochastic process \(\{ X_j\} _{j\ge 1}\) is stationary if for every \(j,N\in \mathbb {N}\), \((X_1,\dots ,X_N) \overset {\mathcal{D}}{=} (X_{j+1},\dots ,X_{j+N})\). It is stationary ergodic if, in addition, Birkhoff’s pointwise ergodic theorem (Theorem 13) holds: for every \(f\in L^1\), almost surely \(\mathbb {E}f(X_1) = \lim _{n\to \infty }\tfrac 1n\sum _{j=1}^n f(X_j)\).
Consider a channel \(\mathsf{Ch}\) that acts independently on each input bit and appends the corresponding outputs, i.e. \(\mathsf{Ch}\, x \overset {\mathcal{D}}{=} (\mathsf{Ch}\, x_1)(\mathsf{Ch}\, x_2)\cdots (\mathsf{Ch}\, x_n)\) for \(x\in \{ 0,1\} ^n\), and suppose \(\mathbb {E}|\mathsf{Ch}\, b|{\lt}\infty \) for \(b\in \{ 0,1\} \). Then
the supremum over random variables \(X^n\) supported on \(\{ 0,1\} ^n\) with \(Y^n=\mathsf{Ch}\, X^n\). Moreover the capacity is attained by a stationary ergodic input process. When the right-hand limit exists for a channel not necessarily satisfying these hypotheses (e.g. a trimming repeat channel), it is called the information rate of the channel.
For every \(\varepsilon ,\delta \in (0,1)\) there exists a family of codes \(\mathcal{C}_n\) of rate \(1-\delta -\varepsilon \) over an alphabet \(\Sigma \) of size \(O_\varepsilon (1)\) that can deterministically correct insertion/deletion (worst-case) errors of edit distance at most \(\delta n\). Moreover the \(\mathcal{C}_n\) have encoding and decoding algorithms running in linear and quasi-linear time, respectively.