Dependency graph

Legend

Boxes: definitions
Ellipses: theorems and lemmas
Blue border: the statement of this result is ready to be formalized; all prerequisites are done
Orange border: the statement of this result is not ready to be formalized; the blueprint needs more work
Blue background: the proof of this result is ready to be formalized; all prerequisites are done
Green border: the statement of this result is formalized
Green background: the proof of this result is formalized
Dark green background: the proof of this result and all its ancestors are formalized
Dark green border: this is in Mathlib

Definition 30Biased Dobrushin channel (Def. 4.2)

A Dobrushin channel \(\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\) is biased if for \(Y_0\sim \mathcal{D}_0\), \(Y_1\sim \mathcal{D}_1\) we have \(\mathbb {E}|Y_0|=\mathbb {E}|Y_1|{\lt}\infty \) and \(\mathbb {E}[w(Y_0)]{\lt}\tfrac 12\mathbb {E}|Y_0|\) and \(\mathbb {E}[w(Y_1)]{\gt}\tfrac 12\mathbb {E}|Y_1|\). The expected fraction of ones in the output of a long zero-run is \(f:=\mathbb {E}[w(Y_0)]/\mathbb {E}|Y_0|{\lt}\tfrac 12\).

LaTeX

Definition 7Binomial distribution

\(\mathrm{Binomial}(t,p)\) is the distribution of the number of successes in \(t\) independent trials each succeeding with probability \(p\).

LaTeX Lean

Definition 18Capacity (Def. 2.4)

Fix a channel \(\mathsf{Ch}\) and let \(\mathcal{S}\) be the collection of all codes sound with respect to \(\mathsf{Ch}\). The capacity of \(\mathsf{Ch}\) is \(\mathrm{Cap}(\mathsf{Ch}) := \sup _{\mathcal{C}\in \mathcal{S}} R(\mathcal{C})\).

LaTeX

Definition 14Binary communication channel (Def. 2.1)

For \(\Omega \) a probability space, a binary communication channel is a map \(\mathsf{Ch}: \Omega \times \{ 0,1\} ^* \to \{ 0,1\} ^*\). For \(x \in \{ 0,1\} ^*\) we write \(\mathsf{Ch}\, x\) for the random variable \(\omega \mapsto \mathsf{Ch}(\omega ,x)\). Here \(\{ 0,1\} ^n\) is the set of bit strings of length \(n\) (for \(n\in \mathbb {N}\cup \{ \infty \} \)), \(\{ 0,1\} ^* = \bigcup _{n\in \mathbb {N}}\{ 0,1\} ^n\), \(x_j^k\) denotes the substring of \(x\) from index \(j\) to \(k\) inclusive (and \(x_i := x_i^i\)), \([n] := \{ 1,\dots ,n\} \), \(|x|\) is the length of \(x\), \(xy\) is concatenation, and \((x)^k\) is the \(k\)-fold self-concatenation with \((x)^0\) the empty string.

LaTeX

Definition 16Code and rate (Def. 2.3, part 1)

A code is a family \(\mathcal{C}= \{ \mathcal{C}_n\} _{n\ge 1}\) of subsets \(\mathcal{C}_n \subseteq \{ 0,1\} ^n\); its rate is \(R_n := \tfrac 1n \log |\mathcal{C}_n|\), and \(R = R(\mathcal{C}) := \lim _{n\to \infty } R_n\).

LaTeX

Definition 10Conditional entropy

For discrete random variables \(X,Y\), \(H(X\mid Y) := H(X,Y) - H(Y)\) is the conditional entropy of \(X\) given \(Y\).

LaTeX Lean

Definition 24Construction: encoder \(\mathsf{Enc}\) and decoder \(\mathsf{Dec}\) (Sec. 3.1)

Fix the inner blocklength \(m=O(1)\), set \(|\Sigma |=2^m\), and let \(\mathcal{C}_{in}\subseteq \{ 0,1\} ^m\) be an approximately balanced inner code for the \(\mathrm{TRC}_\mathcal{D}\) of rate \(R-\varepsilon \) with (not necessarily efficient) encoder \(\mathsf{Enc}_{\mathrm{in}}\) and decoder \(\mathsf{Dec}_{\mathrm{in}}\) (existence by Proposition 28; the \(O(1)\)-size code is found by exhaustive search in \(O(1)\) time). The encoder \(\mathsf{Enc}:\{ 0,1\} ^{km}\to \{ 0,1\} ^n\) acts on \(x\in \{ 0,1\} ^{km}\) as follows.

Split \(x\) into \(x_1,\dots ,x_k\) with \(|x_j|=m\), viewing each \(x_j\) as a letter of \(\Sigma \), so \(x\in \Sigma ^k\). Apply the outer encoder \(\mathsf{Enc}_{\mathrm{out}}\) of Theorem 22 to get \(\widetilde x=\mathsf{Enc}_{\mathrm{out}}(x)\in \Sigma ^{k/(1-\delta -\varepsilon )}\).
Split \(\widetilde x\) into \(\widetilde x_1,\dots ,\widetilde x_{k'}\) with \(k'=k/(1-\delta -\varepsilon )\), view each \(\widetilde x_j\in \{ 0,1\} ^m\), and encode it with the inner code to get \(\widehat x_j=\mathsf{Enc}_{\mathrm{in}}(\widetilde x_j)\in \{ 0,1\} ^{m/(R-\varepsilon )}\) (the inner rate is \(R-\varepsilon \) for \(m\) large).
Concatenate the \(\widehat x_j\) with all-zero buffers \(0^b\), \(b=b(m)=\eta m\):

\[ \mathsf{Enc}(x) = \widehat x_1\, 0^b\, \widehat x_2\, 0^b\cdots 0^b\, \widehat x_{k'}, \]

where \(b\) is a constant independent of \(n=k'\cdot (\tfrac {m}{R-\varepsilon }+b) = km/(\mathrm{Cap}(\mathrm{RC}_\mathcal{D})-\psi (\varepsilon ,\delta ,\eta ,k,m))\) with \(\psi \to 0\).

The decoder \(\mathsf{Dec}\) on a received string \(y\in \{ 0,1\} ^*\) reverses the steps:

Identify buffers: interpret any maximal contiguous block of \(\ge \tfrac {n}{2}\eta m\) zeros as a buffer; remove buffers to get the received inner strings \(y_1,\dots ,y_\ell \).
Decode each \(y_j\) with the inner code: \(\widetilde y_j=\mathsf{Dec}_{\mathrm{in}}(y_j)\in \{ 0,1\} ^m\) for \(j\le \ell \).
Interpret each \(\widetilde y_j\) as a letter of \(\Sigma \) and decode the concatenation \(\widetilde y=\widetilde y_1\cdots \widetilde y_\ell \in \Sigma ^\ell \) with the outer decoder: \(\mathsf{Dec}(y)=\mathsf{Dec}_{\mathrm{out}}(\widetilde y)\in \{ 0,1\} ^{km}\).

LaTeX

Definition 29Dobrushin channel (Def. 4.1)

Fix probability distributions \(\mathcal{D}_0,\mathcal{D}_1\) over \(\{ 0,1\} ^*\). A \((\mathcal{D}_0,\mathcal{D}_1)\)-Dobrushin channel \(\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\) acts independently on each input bit by \((\mathrm{DC}\, 0)\sim \mathcal{D}_0\) and \((\mathrm{DC}\, 1)\sim \mathcal{D}_1\), and concatenates the outputs. It is square-integrable if for \(Y_i\sim \mathcal{D}_i\) we have \(\mathbb {E}|Y_i|^2{\lt}\infty \), \(i=0,1\). Repeat channels are the special case where \(\mathcal{D}_0\) is supported on all-zero strings, \(\mathcal{D}_1\) on all-one strings, with matching induced length distributions.

LaTeX

Definition 4Edit distance via insertions and deletions

For \(x \in \{ 0,1\} ^n\), \(b \in \{ 0,1\} \) and an index \(j \in [n]\), an insertion is the transformation \(x \mapsto x_1^j b\, x_{j+1}^n\) and a deletion is \(x \mapsto x_1^{j-1} x_{j+1}^n\) (Definition 2.8 of the paper). The edit distance between \(x,y \in \{ 0,1\} ^*\) is the minimum number of insertions and/or deletions needed to convert \(x\) into \(y\).

LaTeX Lean

Definition 9Entropy

For a discrete random variable \(X\), \(H(X) := -\sum _x \mathbb {P}(X=x)\log _2 \mathbb {P}(X=x)\) is its (base-2) Shannon entropy. All logs in this blueprint are base 2.

LaTeX Lean

Definition 3Hamming weight

For \(x \in \{ 0,1\} ^*\), the Hamming weight \(w(x) \in \mathbb {N}\) is the number of ones in \(x\).

LaTeX Lean

Definition 1Expectation \(\mu (\mathcal{D})\)

For a probability distribution \(\mathcal{D}\) over \(\mathbb {R}\), \(\mu (\mathcal{D}) := \mathbb {E}[\mathcal{D}]\) denotes its expectation (mean). When \(\mathcal{D}\) is clear from context we write \(\mu \). (Standard; recalled so later nodes may depend on it.)

LaTeX Lean

Definition 11Mutual information

For discrete random variables \(X,Y\), the mutual information is \(I(X;Y) := H(X) - H(X\mid Y) = H(Y) - H(Y\mid X)\). Conditional mutual information \(I(X;Y\mid Z)\) is defined analogously.

LaTeX Lean

Definition 15Square-integrable binary \(\mathcal{D}\)-repeat channel (Def. 2.2)

For a probability distribution \(\mathcal{D}\) over \(\mathbb {N}\), let \(\Omega = \mathbb {N}^\infty \) with the infinite product measure \(\mathcal{D}^\infty \). The binary \(\mathcal{D}\)-repeat channel is

\[ \mathrm{RC}_\mathcal{D}(\omega ,x) = (x_1)^{\omega _1}(x_2)^{\omega _2}\cdots (x_n)^{\omega _n} \qquad (x \in \{ 0,1\} ^n), \]

i.e. the \(i\)-th input bit is repeated \(\omega _i \sim \mathcal{D}\) times, independently. It is square-integrable if \(\mu (\mathcal{D}){\lt}\infty \) and \(\sigma ^2(\mathcal{D}){\lt}\infty \). The origin (input index) of the \(j\)-th output bit is well defined as \(\min \{ i \ge 1 : \sum _{k=1}^i \omega _k \ge j\} \). Special cases: the binary deletion channel (\(\mathcal{D}=\mathrm{Bernoulli}\)), the Poisson repeat channel (\(\mathcal{D}=\mathrm{Poisson}\)), and sticky channels.

LaTeX

Definition 17Encoder, decoder, soundness (Def. 2.3, part 2)

An encoding algorithm for \(\mathcal{C}\) is a family of maps \(\mathsf{Enc}: \{ 0,1\} ^{nR_n} \to \mathcal{C}_n\), and a decoding algorithm is a family \(\mathsf{Dec}: \{ 0,1\} ^* \to \{ 0,1\} ^{nR_n}\). The code \(\mathcal{C}\) is sound with respect to a channel \(\mathsf{Ch}\) if there exist \(\mathsf{Enc},\mathsf{Dec}\) such that \(\mathbb {P}\big(\mathsf{Dec}(\mathsf{Ch}\, \mathsf{Enc}(X)) \ne X\big) \to 0\) as \(n\to \infty \), where \(X\) is uniform on \(\{ 0,1\} ^{nR_n}\).

LaTeX

Definition 20Stationary ergodic process (Def. 2.6)

A stochastic process \(\{ X_j\} _{j\ge 1}\) is stationary if for every \(j,N\in \mathbb {N}\), \((X_1,\dots ,X_N) \overset {\mathcal{D}}{=} (X_{j+1},\dots ,X_{j+N})\). It is stationary ergodic if, in addition, Birkhoff’s pointwise ergodic theorem (Theorem 13) holds: for every \(f\in L^1\), almost surely \(\mathbb {E}f(X_1) = \lim _{n\to \infty }\tfrac 1n\sum _{j=1}^n f(X_j)\).

LaTeX

Definition 19Trimming repeat channel (Def. 2.5)

Let \(\mathrm{TRIM}\) be the deterministic channel acting on \(x\in \{ 0,1\} ^n\) by deleting the longest all-zero prefix and suffix:

\[ \mathrm{TRIM}\, x = x_{\min \{ i\in [n]:x_i=1\} }^{\max \{ i\in [n]:x_i=1\} } \]

(the empty string if \(x\) is all zeros). The trimming \(\mathcal{D}\)-repeat channel is the composition \(\mathrm{TRC}_\mathcal{D}:= \mathrm{TRIM}\circ \mathrm{RC}_\mathcal{D}\).

LaTeX

Definition 31Trimming Dobrushin channel (Def. 4.4)

Given \(n\) and distributions \(\mathcal{T}_\ell ,\mathcal{T}_r\) over \(\mathbb {N}\) (may depend on \(n\)), let \(\mathrm{TRIM}_{\mathcal{T}_\ell ,\mathcal{T}_r}\) act on \(x\in \{ 0,1\} ^n\) by \(\mathrm{TRIM}_{\mathcal{T}_\ell ,\mathcal{T}_r}\, x=x_{t_\ell }^{n-t_r}\) where \(t_\ell \sim \mathcal{T}_\ell \), \(t_r\sim \mathcal{T}_r\) are independent (the empty string if \(t_2{\lt}t_1\)). For a Dobrushin channel \(\mathrm{DC}=\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\), the \((\mathcal{T}_\ell ,\mathcal{T}_r)\)-trimming Dobrushin channel is \(\mathrm{TDC}_{\mathcal{D}_0,\mathcal{D}_1,\mathcal{T}_\ell ,\mathcal{T}_r} :=\mathrm{TRIM}_{\mathcal{T}_\ell ,\mathcal{T}_r}\circ \mathrm{DC}\).

LaTeX

Definition 2Variance \(\sigma ^2(\mathcal{D})\)

For a probability distribution \(\mathcal{D}\) over \(\mathbb {R}\), \(\sigma ^2(\mathcal{D}) := \mathbb {E}\big[(\mathcal{D}-\mu (\mathcal{D}))^2\big]\) is its variance; written \(\sigma ^2\) when \(\mathcal{D}\) is understood.

LaTeX Lean

Lemma 33Received buffers stay zero-heavy (Lemma A.2)

Let \(c\) be a codeword of length \(m\) from the near-capacity-achieving code for a biased square-integrable Dobrushin channel \(\mathrm{DC}=\mathrm{DC}_{\mathcal{D}_0,\mathcal{D}_1}\) (the analogue of Proposition 28). The probability that any length-\(\nu \eta m\) substring of \(\mathrm{DC}\, c\) has a fraction of zeros less than \(f+\kappa \) (where \(f=\mathbb {E}[w(Y_0)]/\mathbb {E}|Y_0|\), \(Y_0\sim \mathcal{D}_0\)) vanishes as \(m\to \infty \), for any \(\nu ,\eta {\gt}0\) and all \(\kappa {\gt}0\) small enough.

LaTeX

Lemma 26Conditioning on the trimmed indices is cheap (Claim 3.3)

Assume \(\mathcal{D}\) is bounded by \(Bn\). Let \(Y=\mathrm{RC}_\mathcal{D}\, X\) and \(Y'=\mathrm{TRC}_\mathcal{D}\, X\), and let \(\widetilde L,\widetilde R\) be as in Lemma 25. Then

\begin{align} |I(X;Y)-I(X;Y\mid \widetilde L,\widetilde R)| & = o(n), \\ \lim _{n\to \infty }\frac1n\sup _X I(X;Y\mid \widetilde L,\widetilde R) & = \lim _{n\to \infty }\frac1n\sup _X I(X;Y’). \end{align}

LaTeX

Lemma 12Chain rule for entropy / mutual information

\(H(X,Y) = H(Y) + H(X\mid Y)\), and more generally the chain rule \(I(X;Y,Z) = I(X;Z) + I(X;Y\mid Z)\) holds.

LaTeX Lean

Lemma 27Coupled channels have equal information rate (Lemma A.1)

Let \(\mathsf{Ch}_1,\mathsf{Ch}_2:\Omega \times \{ 0,1\} ^*\to \{ 0,1\} ^*\) be two channels. Suppose there is a sequence \(p_n\ge 0\) with \(p_n\to 1\) such that for every \(x\in \{ 0,1\} ^n\) there is a set \(A_x\subseteq \Omega \) with: conditioned on \(A_x\), the distributions of \(\mathsf{Ch}_1\, x\) and \(\mathsf{Ch}_2\, x\) are the same, and \(\mathbb {P}(A_x)\ge p_n\). Then the information rates of \(\mathsf{Ch}_1\) and \(\mathsf{Ch}_2\) are equal.

LaTeX

Lemma 32Trimming preserves the Dobrushin rate (Lemma 4.5 / A.3)

Let \(\mathrm{DC}\) be a square-integrable Dobrushin channel and let \(\mathrm{TDC}_{\mathcal{T}_\ell ,\mathcal{T}_r}\) be the trimming version with \(\mathcal{T}_\ell ,\mathcal{T}_r\) each bounded by \(\nu \eta m\) for inputs of length \(m\). Then the information rate of \(\mathrm{TDC}_{\mathcal{T}_\ell ,\mathcal{T}_r}\) is at least \(\mathrm{Cap}(\mathrm{DC})-2\nu \eta \).

LaTeX

Lemma 25Trimming preserves the rate (Lemma 3.2)

Let \(\mathrm{RC}_\mathcal{D}\) be a square-integrable repeat channel. Then the information rate of the \(\mathrm{TRC}_\mathcal{D}\) is \(\mathrm{Cap}(\mathrm{RC}_\mathcal{D})\).

LaTeX

Proposition 28Approximately balanced capacity-achieving inner code (Prop. 3.4)

Fix a square-integrable \(\mathcal{D}\)-repeat channel \(\mathsf{Ch}=\mathrm{RC}_\mathcal{D}\), or its trimming version \(\mathsf{Ch}=\mathrm{TRC}_\mathcal{D}\), with information rate \(\mathcal{I}\). For every \(\zeta ,\varepsilon \in (0,1)\) there exists \(\gamma \in (0,\tfrac 12)\) and a family of codes \(\mathcal{C}_n\subseteq \{ 0,1\} ^n\) for \(\mathsf{Ch}\) with rate \(R\ge \mathcal{I}-\varepsilon \) such that for every codeword \(c\in \mathcal{C}_n\) and every \(i\in [n-\zeta n]\),

\[ \gamma \zeta n \le w\big(c_i^{i+\zeta n}\big) \le (1-\gamma )\zeta n, \]

i.e. every length-\(\zeta n\) substring of every codeword is approximately balanced.

LaTeX

Theorem 13Birkhoff’s pointwise ergodic theorem

For an ergodic measure-preserving system and every \(f \in L^1\), the time averages converge almost surely: \(\lim _{n\to \infty } \tfrac 1n \sum _{j=1}^n f(X_j) = \mathbb {E}f(X_1)\) a.s.

LaTeX Lean

Theorem 5Chebyshev’s inequality

For a random variable \(X\) with finite variance and any \(c{\gt}0\), \(\mathbb {P}\big(|X-\mathbb {E}X| \ge c\big) \le \mathrm{Var}[X]/c^2\).

LaTeX Lean

Theorem 8Chernoff / Hoeffding concentration bound

Let \(Z \sim \mathrm{Binomial}(t,p)\). For any \(a {\gt} p\, t\) there is a constant \(c{\gt}0\) with \(\mathbb {P}(Z \ge a) \le e^{-c\, t}\); in particular, a \(\mathrm{Binomial}(t,p)\) exceeds \(\tfrac {a}{t}\cdot t\) for \(a/t {\gt} p\) only with probability \(e^{-\Omega (t)}\).

LaTeX Lean

Theorem 34Codes for biased Dobrushin channels (Thm. 4.3)

Fix a biased square-integrable Dobrushin channel \(\mathrm{DC}\). For every \(\varepsilon {\gt}0\) there exists a sound code \(\mathcal{C}\) with rate \(R\) for the \(\mathrm{DC}\) with \(R\ge \mathrm{Cap}(\mathrm{DC})-\varepsilon \) and linear-time encoding, quasi-linear-time decoding. Moreover the decoder has probability of failure \(e^{-\Omega (n)}\).

LaTeX

Theorem 21Generalized Shannon’s Theorem, Dobrushin (Thm. 2.7)

Consider a channel \(\mathsf{Ch}\) that acts independently on each input bit and appends the corresponding outputs, i.e. \(\mathsf{Ch}\, x \overset {\mathcal{D}}{=} (\mathsf{Ch}\, x_1)(\mathsf{Ch}\, x_2)\cdots (\mathsf{Ch}\, x_n)\) for \(x\in \{ 0,1\} ^n\), and suppose \(\mathbb {E}|\mathsf{Ch}\, b|{\lt}\infty \) for \(b\in \{ 0,1\} \). Then

\[ \mathrm{Cap}(\mathsf{Ch}) = \lim _{n\to \infty }\frac1n \sup _{X^n} I(X^n;Y^n), \]

the supremum over random variables \(X^n\) supported on \(\{ 0,1\} ^n\) with \(Y^n=\mathsf{Ch}\, X^n\). Moreover the capacity is attained by a stationary ergodic input process. When the right-hand limit exists for a channel not necessarily satisfying these hypotheses (e.g. a trimming repeat channel), it is called the information rate of the channel.

LaTeX

Theorem 23Main theorem (Thm. 3.1)

Fix a square-integrable repeat channel \(\mathrm{RC}_\mathcal{D}\). For every \(\varepsilon {\gt}0\) there exists a sound code \(\mathcal{C}\) with rate \(R\) for the \(\mathrm{RC}_\mathcal{D}\) such that \(R \ge \mathrm{Cap}(\mathrm{RC}_\mathcal{D})-\varepsilon \), with linear-time encoding and quasi-linear-time decoding. Moreover the decoder has probability of failure \(e^{-\Omega (n)}\).

LaTeX

Theorem 22Worst-case insertion/deletion outer codes, HS17/HRS19 (Thm. 2.9)

For every \(\varepsilon ,\delta \in (0,1)\) there exists a family of codes \(\mathcal{C}_n\) of rate \(1-\delta -\varepsilon \) over an alphabet \(\Sigma \) of size \(O_\varepsilon (1)\) that can deterministically correct insertion/deletion (worst-case) errors of edit distance at most \(\delta n\). Moreover the \(\mathcal{C}_n\) have encoding and decoding algorithms running in linear and quasi-linear time, respectively.

LaTeX

Theorem 6Union bound

For events \(A_1,\dots ,A_t\), \(\mathbb {P}\big(\bigcup _i A_i\big) \le \sum _i \mathbb {P}(A_i)\).

LaTeX Lean