Dependency graph

Legend

Boxes: definitions
Ellipses: theorems and lemmas
Blue border: the statement of this result is ready to be formalized; all prerequisites are done
Orange border: the statement of this result is not ready to be formalized; the blueprint needs more work
Blue background: the proof of this result is ready to be formalized; all prerequisites are done
Green border: the statement of this result is formalized
Green background: the proof of this result is formalized
Dark green background: the proof of this result and all its ancestors are formalized
Dark green border: this is in Mathlib

Corollary 42Corollary 23: linear time in the logarithmic cost model

The product of an asymptotically good linear code of length \(O(\log n)\) with one of the expander codes (of length \(O(n/\log n)\)) from Section V or VI yields an asymptotically good code that can be decoded in linear time on a RAM in the logarithmic cost model.

LaTeX

Definition 9Asymptotically good family

A family of linear codes is asymptotically good if there are constants \(r{\gt}0\) and \(\alpha {\gt}0\) such that the codes have block length tending to infinity, rate at least \(r\), and minimum relative distance at least \(\alpha \).

LaTeX

Definition 4Binary entropy function

The binary entropy function is \(H(x) = -x\log _2 x - (1-x)\log _2(1-x)\) for \(x \in (0,1)\), with \(H(0)=H(1)=0\). It governs the Gilbert–Varshamov bound and the random-expansion estimates of Appendix II.

LaTeX Lean

Definition 15Unbalanced bipartite \((c,d)\)-regular graph

An unbalanced bipartite graph has its vertices split into two sets with no edges inside a set. It is \((c,d)\)-regular if every vertex on one side has degree \(c\) and every vertex on the other side has degree \(d\). By counting edges, the number of \(c\)-regular vertices exceeds the number of \(d\)-regular vertices by the factor \(d/c\).

LaTeX

Definition 16\((c,d,\epsilon ,\delta )\)-expander

A graph is a \((c,d,\epsilon ,\delta )\)-expander if it is a \((c,d)\)-regular graph in which every subset of at most an \(\epsilon \) fraction of the \(c\)-regular vertices expands (into the \(d\)-regular side) by a factor of at least \(\delta \). We use families of \((c,d,\epsilon ,\delta )\)-expanders in which \(c,d,\epsilon ,\delta \) are constant as the number of vertices grows.

LaTeX

Definition 13Boolean circuit: size and depth

An (acyclic) Boolean circuit has gates each computing a Boolean function of two input wires; a gate output may fan out to many wires. The size is the number of wires and the depth is the length of the longest directed path. This is the model used to analyze the parallel decoders.

LaTeX

Definition 8Correcting an \(\epsilon \) fraction of error

An algorithm corrects an \(\epsilon \) fraction of error from a code \(\mathcal{C}\) if, on input a word \(w\) that differs from some \(v \in \mathcal{C}\) in at most \(\epsilon n\) symbols, it outputs \(v\) (no restriction is placed on its behaviour on other inputs).

LaTeX

Definition 20Definition 4: edge-vertex incidence graph

Let \(G\) be a graph with edge set \(E\) and vertex set \(V\). The edge-vertex incidence graph of \(G\) is the bipartite graph with vertex set \(E \cup V\) and edge set \(\{ (e,v) \in E\times V : v \text{ is an endpoint of } e\} \). If \(G\) is \(d\)-regular on \(n\) vertices, this is a \((2,d)\)-regular graph with \(dn/2\) vertices on one side and \(n\) on the other.

LaTeX

Definition 22Definition 6: the expander code \(\mathcal{C}(B,\mathcal{S})\)

Let \(B\) be a \((c,d)\)-regular graph between a set of \(n\) nodes \(\{ v_1,\dots ,v_n\} \), called variables, and a set of \(cn/d\) nodes \(\{ C_1,\dots ,C_{cn/d}\} \), called constraints, with \(d{\gt}c\). Let \(b(i,j)\) be a function so that for each constraint \(C_i\) the variables neighbouring \(C_i\) are \(v_{b(i,1)},\dots ,v_{b(i,d)}\). Let \(\mathcal{S}\) be an error-correcting code of block length \(d\). The expander code \(\mathcal{C}(B,\mathcal{S})\) is the code of block length \(n\) whose codewords are the words \((x_1,\dots ,x_n)\) such that, for \(1 \le i \le cn/d\), the restriction \((x_{b(i,1)},\dots ,x_{b(i,d)})\) is a codeword of \(\mathcal{S}\). Because the restrictions are linear, \(\mathcal{C}(B,\mathcal{S})\) is a linear code.

LaTeX

Definition 14Expansion by a factor of \(\delta \)

Let \(G=(V,E)\) be a graph on \(n\) vertices. We say every set of at most \(m\) vertices expands by a factor of \(\delta \) if for all \(S \subseteq V\) with \(|S| \le m\),

\[ \bigl|\{ \, y : \exists \, x \in S \text{ with } (x,y)\in E\, \} \bigr| {\gt} \delta \, |S|. \]

LaTeX

Definition 1The binary field \(\mathrm{GF}(2)\)

The alphabet \(\{ 0,1\} \) with addition and multiplication modulo \(2\), i.e. the field \(\mathrm{GF}(2)= \mathbb {Z}/2\mathbb {Z}\). All codes in this blueprint are built over \(\mathrm{GF}(2)\) (the constructions generalize to larger fields).

LaTeX Lean

Definition 2Hamming distance

For words \(u,v \in \mathrm{GF}(2)^{\, n}\), the Hamming distance \(\Delta (u,v)\) is the number of coordinates in which \(u\) and \(v\) differ.

LaTeX Lean

Definition 3Hamming weight

The Hamming weight \(\operatorname {wt}(w)\) of \(w \in \mathrm{GF}(2)^{\, n}\) is the number of nonzero coordinates of \(w\); equivalently \(\operatorname {wt}(w) = \Delta (w,0)\).

LaTeX Lean

Definition 6Linear code, block length, rate

A linear code of block length \(n\) and rate \(r\) is a subspace \(\mathcal{C}\subseteq \mathrm{GF}(2)^{\, n}\) of dimension \(rn\). Words have \(n\) symbols, of which \(rn\) are message symbols (freely chosen) and the remaining \((1-r)n\) are determined by the linear constraints defining \(\mathcal{C}\).

LaTeX

Definition 31Simple Parallel Decoding Algorithm

Repeat until no such variable remains: in parallel, flip every variable that is in more unsatisfied than satisfied constraints.

LaTeX

Definition 24The even-weight parity-check code \(\mathcal{P}\)

Let \(\mathcal{P}\) be the code consisting of all even-weight words of length \(d\), i.e. those \((x_1,\dots ,x_d) \in \{ 0,1\} ^d\) with \(\sum _i x_i = 0 \pmod2\). It has rate \((d-1)/d\) and minimum relative distance \(2/d\).

LaTeX

Definition 12Pointer Machine

A Pointer Machine (Kolmogorov–Uspenskii machine) operates on a constant-degree directed graph: its input, output and working memory are nodes; at each step the CPU node inspects the configuration of nodes within distance two of itself and rearranges edges, possibly creating new nodes. Any reasonable model is simulable on a Pointer Machine up to a constant factor.

LaTeX

Definition 41Product (concatenation) of two codes

The product of an outer expander code over an alphabet of \((\log n)/2\) bits with an inner binary code of length \(\log n\) and rate \(1/2\) is the code obtained by encoding each \((\log n)/2\)-bit byte of a codeword with the inner code. Equivalently it is a code over the larger alphabet whose symbols are themselves protected by a short good code.

LaTeX

Definition 11RAM, logarithmic cost model

Under the logarithmic cost model the cost of a basic step is proportional to the bit-length of its arguments: adding two \(n\)-bit numbers costs \(\Theta (n)\), and reading an \(n\)-bit word from an address described by \(n\) bits costs \(2n\).

LaTeX

Definition 10RAM, uniform cost model

A Random Access Machine (RAM) has a central processor with basic operations (addition, subtraction, read/write of a memory cell, branch on zero, branch on positive, etc.). Under the uniform cost model each basic operation costs one unit of time, regardless of operand size.

LaTeX

Definition 7Minimum relative distance

A code \(\mathcal{C}\) has minimum relative distance \(\alpha \) if every pair of distinct words of \(\mathcal{C}\) differ in at least \(\alpha n\) symbols. For a linear code this equals \(\min \{ \operatorname {wt}(w)/n : w \in \mathcal{C},\ w \neq 0\} \).

LaTeX

Definition 26Satisfied and unsatisfied constraints

A constraint \(C_i\) is satisfied by a word \((x_1,\dots ,x_n)\) if its restriction \((x_{b(i,1)},\dots ,x_{b(i,d)})\) is a codeword of \(\mathcal{S}\); otherwise it is unsatisfied. For the even-weight code \(\mathcal{P}\), a constraint is satisfied iff the sum of its neighbouring variables is even.

LaTeX

Definition 27Simple Sequential Decoding Algorithm

Repeat the following until no such variable remains: if there is a variable that is in more unsatisfied than satisfied constraints, then flip its value (a \(0\) becomes \(1\) and a \(1\) becomes \(0\)). On termination, if all constraints are satisfied output the current word; otherwise report “failed to decode”.

LaTeX

Lemma 21Lemma 5: Alon–Chung

Let \(G\) be a \(d\)-regular graph on \(n\) vertices with second-largest eigenvalue \(\lambda \). Let \(X\) be a subset of the vertices of \(G\) of size \(\gamma n\). Then the number of edges contained in the subgraph induced by \(X\) in \(G\) is at most

\[ \frac{dn}{2}\left(\gamma ^2 + \frac{\lambda }{d}\, \gamma (1-\gamma )\right). \]

LaTeX

Lemma 5Azuma’s inequality

Let \(X_0,\dots ,X_m\) be a martingale with bounded differences \(|X_{i+1}-X_i| \le 1\). Then for \(\lambda {\gt} 0\), \(\Pr [\, |X_m - X_0| {\gt} \lambda \sqrt{m}\, ] {\lt} 2e^{-\lambda ^2/2}\).

LaTeX

Lemma 34Lemma 15: distance from eigenvalue separation

If \(\mathcal{S}\) is a linear code of rate \(r\), block length \(d\), and minimum relative distance \(\epsilon \), and \(B\) is the edge-vertex incidence graph of a \(d\)-regular graph with second-largest eigenvalue \(\lambda \), then the code \(\mathcal{C}(B,\mathcal{S})\) has rate at least \(2r-1\) and minimum relative distance at least

\[ \left(\frac{\epsilon -\lambda /d}{1-\lambda /d}\right)^2. \]

LaTeX

Lemma 37Gilbert–Varshamov bound

For every \(\epsilon \) with \(0{\lt}\epsilon {\lt}1/2\) and all sufficiently long block lengths, there exists a binary linear code of minimum relative distance \(\epsilon \) and rate \(1-H(\epsilon )\).

LaTeX

Lemma 39Lemma 21: Kahale path-counting bound

Let \(G_{n,d}\) be a \(d\)-regular graph on \(n\) nodes with second-largest eigenvalue \(\lambda \). Let \(S\) be a subset of the vertices of \(G_{n,d}\) of size \(\gamma n\). Then the number of paths of length \(k\) contained in the subgraph induced by \(S\) in \(G_{n,d}\) is at most

\[ \gamma n d^k\! \left(\gamma + \frac{\lambda }{d}(1-\gamma )\right)^{\! k}. \]

LaTeX

Lemma 35Lemma 17: progress of one parallel round

Let \(\mathcal{S}\) be a linear code of rate \(r\), block length \(d\), and minimum relative distance \(\epsilon \), and let \(B\) be the edge-vertex incidence graph of a \(d\)-regular graph with second-largest eigenvalue \(\lambda \). If a parallel decoding round for \(\mathcal{C}(B,\mathcal{S})\) is given as input a word of relative distance \(\alpha \) from a codeword, then it outputs a word of relative distance at most

\[ \alpha \left(\frac{2}{3} + \frac{16\alpha }{\epsilon ^2} + \frac{4\lambda }{\epsilon d}\right) \]

from that codeword.

LaTeX

Lemma 28Lemma 9: sequential decoding runs in linear time

For \(c\) and \(d\) constant, the implementation of the simple sequential decoding algorithm (Fig. 2 of the paper) runs in linear time on a Pointer Machine, and on a RAM under the uniform cost model.

LaTeX

Proposition 19Proposition 3: polynomial-time construction of Theorem 2 graphs

Each graph described in Theorem 18 can be constructed in time polynomial in its number of vertices. Moreover there is an algorithm that, given a logarithmic-size description of the graph, produces the labels of the neighbours of a node in time polynomial in the length of the labels (i.e. polylogarithmic in the number of vertices).

LaTeX

Proposition 33Proposition 12: parallel decoding as a circuit

For \(c\) and \(d\) constant, the simple parallel decoding algorithm can be implemented as a Boolean circuit of size \(O(n\log n)\) and depth \(O(\log n)\).

LaTeX

Proposition 25Section V example: \(\mathcal{C}(B,\mathcal{P})\)

Let \(B\) be a graph with expansion greater than \(c/2\) on sets of size at most \(\alpha n\), and let \(\mathcal{P}\) be the even-weight code of length \(d\). Then the parity-check matrix of \(\mathcal{C}(B,\mathcal{P})\) is the adjacency matrix of \(B\), and \(\mathcal{C}(B,\mathcal{P})\) has rate \(1-c/d\) and minimum relative distance at least \(\alpha \).

LaTeX

Proposition 17Proposition 1: random graphs are good expanders

Let \(B\) be a randomly chosen \((c,d)\)-regular bipartite graph between \(n\) \(c\)-regular vertices and \((c/d)n\) \(d\)-regular vertices. Then for all \(0{\lt}\alpha {\lt}1\), with high probability all sets of \(\alpha n\) \(c\)-regular vertices in \(B\) have at least

\[ n\! \left(\frac{c}{d}\bigl(1-(1-\alpha )^d\bigr) - \sqrt{\frac{2c\alpha \, H(\alpha )}{\log _2 e}}\right) \]

neighbours, where \(H(\cdot )\) is the binary entropy function.

LaTeX

Proposition 36Proposition 16: one round is a constant-depth circuit

For \(\mathcal{S}\) fixed, a parallel decoding round for \(\mathcal{C}(B,\mathcal{S})\) can be implemented as a linear-size Boolean circuit of constant depth.

LaTeX

Theorem 40Theorem 22: higher-rate generalization

For all integers \(k\ge 2\) and all \(\epsilon \) such that \(1-kH(\epsilon ){\gt}0\), there exists a polynomial-time constructible family of linear codes of rate \(1-kH(\epsilon )\) and minimum relative distance arbitrarily close to \(\epsilon ^{1+1/(k-1)}\), for which a circuit of size \(O(n\log n)\) and logarithmic depth will decode some \(\Omega (\epsilon ^{1+1/(k-1)})\) fraction of error. Moreover this circuit can be simulated in linear time on a sequential machine.

LaTeX

Theorem 44Theorem 26 (= Proposition 1): expansion of random graphs

Let \(B\) be a randomly chosen \((c,d)\)-regular bipartite graph between \(n\) variables and \((c/d)n\) constraints. Then for all \(0{\lt}\alpha {\lt}1\), with exponentially high probability all sets of \(\alpha n\) variables in \(B\) have at least

\[ n\! \left(\frac{c}{d}\bigl(1-(1-\alpha )^d\bigr) - \sqrt{\frac{2c\alpha \, H(\alpha )}{\log _2 e}}\right) \]

neighbours, where \(H(\cdot )\) is the binary entropy function.

LaTeX

Theorem 38Theorem 19: explicit asymptotically good expander codes

For all \(\epsilon \) such that \(1-2H(\epsilon ){\gt}0\), there exists a polynomial-time constructible family of expander codes of rate \(1-2H(\epsilon )\) and minimum relative distance arbitrarily close to \(\epsilon ^2\), in which an \(\alpha {\lt} \epsilon ^2/48\) fraction of error can be corrected by a circuit of size \(O(n\log n)\) and depth \(O(\log n)\). Moreover, the action of this circuit can be simulated in linear time on a Pointer Machine or RAM under the uniform cost model.

LaTeX

Theorem 18Theorem 2: Lubotzky–Phillips–Sarnak, Margulis

For every pair of primes \(p,q\) congruent to \(1\) modulo \(4\) such that \(p\) is a quadratic residue modulo \(q\), there is a \((p+1)\)-regular Cayley graph of \(PSL(2,\mathbb {Z}/q\mathbb {Z})\) with \(q(q^2-1)/2\) vertices whose second-largest eigenvalue is at most \(2\sqrt{p}\). A graph with this eigenvalue separation is a good expander.

LaTeX

Theorem 30Theorem 24: necessity of expansion

Let \(d{\gt}c\), and let \(B\) be a bipartite graph between \(n\) variables of degree \(c\) and \((c/d)n\) constraints of degree \(d\) such that the simple sequential decoding algorithm successfully decodes all sets of at most \(\alpha n\) errors in \(\mathcal{C}(B,\mathcal{P})\). Then every set of \(\alpha n\) variables must have at least

\[ \alpha n\left(1 + \frac{2\, \frac{c-1}{2d}}{3+\frac{c-1}{2d}}\right) \]

neighbours.

LaTeX

Theorem 32Theorem 11: correctness of parallel decoding

Let \(B\) be a \((c,d,\alpha ,(3/4+\epsilon )c)\)-expander, for any \(\epsilon {\gt}0\), and let \(\mathcal{P}\) be the code of all even-weight words of length \(d\). Then the simple parallel decoding algorithm corrects any \(\alpha _0 {\lt} \alpha (1+4\epsilon )/2\) fraction of error after \(\log _{1/(1-4\epsilon )}(\alpha _0 n)\) decoding rounds.

LaTeX

Theorem 23Theorem 7: rate and distance of \(\mathcal{C}(B,\mathcal{S})\)

Let \(B\) be a \((c,d,\alpha ,c/(d\epsilon ))\)-expander and let \(\mathcal{S}\) be an error-correcting code of block length \(d\), rate \(r {\gt} (c-1)/c\), and minimum relative distance \(\epsilon \). Then \(\mathcal{C}(B,\mathcal{S})\) has rate at least \(cr-(c-1)\) and minimum relative distance at least \(\alpha \).

LaTeX

Theorem 29Theorem 10: correctness of sequential decoding

Let \(B\) be a \((c,d,\alpha ,3c/4)\)-expander and let \(\mathcal{P}\) be the code of all even-weight words of length \(d\). Then the simple sequential decoding algorithm corrects an \(\alpha /2\) fraction of error from the code \(\mathcal{C}(B,\mathcal{P})\).

LaTeX

Theorem 43Theorem 25: upper bound on expansion

Let \(B\) be a bipartite graph between \(n\) \(c\)-regular vertices and \((c/d)n\) \(d\)-regular vertices. For all \(0{\lt}\alpha {\lt}1\), there exists a set of \(\alpha n\) \(c\)-regular vertices with at most

\[ n\, \frac{c}{d}\bigl(1-(1-\alpha )^d\bigr) + O(1) \]

neighbours.

LaTeX