- Boxes
- definitions
- Ellipses
- theorems and lemmas
- Blue border
- the statement of this result is ready to be formalized; all prerequisites are done
- Orange border
- the statement of this result is not ready to be formalized; the blueprint needs more work
- Blue background
- the proof of this result is ready to be formalized; all prerequisites are done
- Green border
- the statement of this result is formalized
- Green background
- the proof of this result is formalized
- Dark green background
- the proof of this result and all its ancestors are formalized
- Dark green border
- this is in Mathlib
Fix a linear code \(\mathcal{C}_{\mathrm{in}}\subseteq \mathbb {F}_2^{n_0}\) of dimension \(k_0\) that is \(\tau \)-nice for \(\tau = 1/\sqrt{n_0}\). Let \(\varepsilon \triangleq k_0/n_0\) and \(r \in \mathbb {N}\). Let \(G_0 \in \mathbb {F}_2^{n_0\times k_0}\) have left kernel \(\mathcal{C}_{\mathrm{in}}^\perp \), and set
Then
Suppose \(r \sim \operatorname {Poi}(\tilde c\varepsilon ^2 N)\) and \(g \sim \mathcal{D}_r\). Then the joint distribution of the coordinates \(g_\alpha \) is
In particular the \(g_\alpha \) are independent and identically distributed across \(\alpha \in [n]\), matching the product distribution \(\mathcal{D}^n\) of Theorem 50.
Suppose the constants \(c,\tilde c\) satisfy \(\tilde c \geq 4\ln 2\) and \(c \geq 72\tilde c\), that \(n\) is sufficiently large, \(r \sim \operatorname {Poi}(\tilde c \varepsilon ^2 N)\), and \(n_0 \geq 64/\varepsilon ^4\). Let
Then \(\mathbb {E}_r[(13)] \leq 2^{-\varepsilon ^2 N/2}\).
Throughout, \(\mathcal{C}_{\mathrm{out}}\subseteq \mathbb {F}_q^n\) and \(\mathcal{C}_{\mathrm{in}}\subseteq \mathbb {F}_2^{n_0}\) are linear codes of rate \(\varepsilon \), with \(q = 2^{k_0}\), \(k = \varepsilon n\), \(k_0 = \varepsilon n_0 = \log _2 q\), \(N = n_0 n\) and \(K = k_0 k\). Identifying \(\mathbb {F}_q\cong \mathbb {F}_2^{k_0}\), the concatenated code \(\mathcal{C}= \mathcal{C}_{\mathrm{out}}\circ \mathcal{C}_{\mathrm{in}}\subseteq \mathbb {F}_2^N\) is the encoding map \(\mathcal{C}: \mathbb {F}_2^K \to \mathbb {F}_2^N\) given, for a message \(m\) interpreted in \(\mathbb {F}_q^k\) with \(c = \mathcal{C}_{\mathrm{out}}(m)\), by
Its rate is the product of the rates and its distance is at least the product of the distances of \(\mathcal{C}_{\mathrm{out}}\) and \(\mathcal{C}_{\mathrm{in}}\).
Since \(1 - h_2(1/2 - \varepsilon ) = \Theta (\varepsilon ^2)\) as \(\varepsilon \to 0\), we say a family of low-rate codes \(\mathcal{C}_{N,\varepsilon } \subseteq \mathbb {F}_2^N\) approaches the GV bound if \(\mathcal{C}_{N,\varepsilon }\) has rate \(\Omega (\varepsilon ^2)\) and relative distance \(1/2 - O(\varepsilon )\), the asymptotics being as \(\varepsilon \to 0\) and \(N \to \infty \).
Fix a constant \(c {\gt} 0\). A message \(m \in \mathbb {F}_q^k\setminus \{ 0\} \) is bad if \(|X_m| \geq c\varepsilon N\). If there are no bad messages then every nonzero codeword of \(\mathcal{C}_{\mathrm{out}}\circ \mathcal{C}_{\mathrm{in}}\) has relative weight at least \(\frac{1-c\varepsilon }{2}\).
For \(r \geq 0\) define
For a finite alphabet \(\Sigma \) and \(n \in \mathbb {N}\), a code of length \(n\) is a subset \(\mathcal{C}\subseteq \Sigma ^n\). We also view \(\mathcal{C}\) as an encoding map \(\mathcal{C}: \Sigma ^k \to \Sigma ^n\) where \(k = \log _{|\Sigma |}|\mathcal{C}|\).
Let \(\mathcal{D}_r\) be the distribution on \(g \in \mathbb {F}_q^n\) obtained by sampling \(\mathcal{V}= ((\alpha _1,b_1),\dots ,(\alpha _r,b_r))\) with each \((\alpha _i,b_i) \in [n]\times \Omega \) independent and uniform, and setting \(g_\alpha = \sum _{b \in \mathcal{V}_\alpha } b\). Rewriting Definition 45 as an expectation gives
For a linear code \(\mathcal{C}\subseteq \mathbb {F}^n\) the dual code is
Over \(\mathbb {F}_q\) this uses the trace pairing of Definition 10.
For a word \(c \in \mathbb {F}_q^n\), the empirical distribution \(\mathcal{D}_c\) on \(\mathbb {F}_q\) is given by \(\Pr _{\mathcal{D}_c}[\sigma ] = \Pr _{\alpha \sim [n]}[c_\alpha = \sigma ]\), the fraction of coordinates of \(c\) equal to \(\sigma \).
Identifying \(\mathbb {F}_q\cong \mathbb {F}_2^{k_0}\) via a self-dual basis, for \(\varphi : \mathbb {F}_q^n \to \mathbb {R}\) and \(\omega \in \mathbb {F}_q^n\), the Fourier transform is
For \(x,y \in \mathbb {F}_2^N\) set \(\langle x, y\rangle = \sum _{\alpha =1}^N x_\alpha y_\alpha \in \mathbb {F}_2\). For \(\mathbb {F}_q\) with \(q {\gt} 2\) a power of \(2\) and \(x,y \in \mathbb {F}_q^n\) set \(\langle x, y\rangle = \operatorname {Tr}\! \left(\sum _\alpha x_\alpha y_\alpha \right)\). Fixing a self-dual basis (Lemma 9) to identify \(\mathbb {F}_q\) with \(\mathbb {F}_2^{k_0}\), these agree: \(\operatorname {Tr}(\alpha \cdot \beta ) = \langle \alpha , \beta \rangle \) for \(\alpha ,\beta \in \mathbb {F}_q\cong \mathbb {F}_2^{k_0}\).
For a finite field \(\mathbb {F}\), a code \(\mathcal{C}\subseteq \mathbb {F}^n\) is linear if it is an \(\mathbb {F}\)-linear subspace of \(\mathbb {F}^n\). Its dimension is its dimension as a subspace.
For \(c \in \mathbb {F}_q^n\), the min-entropy of \(\mathcal{D}_c\) is \(H_\infty (c) \triangleq -\log _2 \max _\sigma \Pr _{\mathcal{D}_c}[\sigma ]\), so that \(H_\infty (c) \leq \log _2 q\).
Let \(G_0 \in \mathbb {F}_2^{n_0 \times k_0}\) be a generator matrix of \(\mathcal{C}_{\mathrm{in}}\), so that \(\mathcal{C}_{\mathrm{in}}= \{ G_0 x : x \in \mathbb {F}_2^{k_0}\} \). Let \(\Omega \subseteq \mathbb {F}_2^{k_0}\) be the set of rows of \(G_0\), so \(|\Omega | = n_0\). Identifying \(\mathbb {F}_2^{k_0}\) with \(\mathbb {F}_q\) via a self-dual basis, we also view \(\Omega \subseteq \mathbb {F}_q\). We call \(\Omega \) the set derived from \(\mathcal{C}_{\mathrm{in}}\).
A code \(\mathcal{C}\subseteq \mathbb {F}_2^n\) is a random linear code of dimension \(k\) if it is chosen uniformly among all subspaces of \(\mathbb {F}_2^n\) of dimension \(k\). (More generally over \(\mathbb {F}_q\).)
For a code \(\mathcal{C}\subseteq \Sigma ^n\), the rate is \(R = \frac{\log _{|\Sigma |}|\mathcal{C}|}{n} = \frac{k}{n}\), where \(k = \log _{|\Sigma |}|\mathcal{C}|\).
For a code \(\mathcal{C}\subseteq \Sigma ^n\) with \(|\mathcal{C}| \geq 2\), the (relative) distance is \(\delta = \frac{1}{n}\min _{c \neq c' \in \mathcal{C}} \Delta (c,c')\). For a linear code this equals \(\frac{1}{n}\min _{0 \neq c \in \mathcal{C}}\operatorname {weight}(c)\).
For a smoothness parameter \(\eta {\gt} 0\), the smoothed min-entropy is
where the maximum is over distributions within total variation distance \(\eta \).
Fix \(0 {\lt} \tau {\lt} \varepsilon \). The inner code \(\mathcal{C}_{\mathrm{in}}\subseteq \mathbb {F}_2^{n_0}\) (of rate \(\varepsilon \)) is \(\tau \)-nice if for every \(i \in \{ 1,\dots ,n_0\} \), the number of codewords \(c \in \mathcal{C}_{\mathrm{in}}^\perp \) with \(\operatorname {weight}(c) = i\) is at most
(The \(\varepsilon \) is dropped from the name because it is the common rate of \(\mathcal{C}_{\mathrm{in}}\) and \(\mathcal{C}_{\mathrm{out}}\) throughout.)
Let \(q = 2^{k_0}\) and \(\mathbb {F}_q\) be the field with \(q\) elements, viewed as a degree \(k_0\) extension of \(\mathbb {F}_2\). The field trace is \(\operatorname {Tr}(x) = x + x^2 + \cdots + x^{2^{k_0-1}}\). It is \(\mathbb {F}_2\)-linear with image \(\mathbb {F}_2\).
For a sequence \(\mathcal{V}= ((\alpha _1,b_1),\dots ,(\alpha _r,b_r))\), let \(V = V(\mathcal{V}) \in \mathbb {N}^{n \times n_0}\) be its “unordered” version, where \(V[\alpha ,b]\) is the number of times \((\alpha ,b)\) appears in \(\mathcal{V}\). Let \(B = B(\mathcal{V}) = V \bmod 2\) (entrywise). The weight \(\| B\| \) is the number of \(1\)-entries of \(B\).
For a message \(m \in \mathbb {F}_q^k\), define
Then \(X_m\) is the bias (number of zeros minus number of ones) of the codeword \(\mathcal{C}(m) = (\mathcal{C}_{\mathrm{out}}\circ \mathcal{C}_{\mathrm{in}})(m)\): the weight of \(\mathcal{C}(m)\) is at least \(\frac12 - O(\varepsilon )\) iff \(X_m = O(\varepsilon N)\). For an ordered list \(\mathcal{V}\) of pairs in \([n]\times \Omega \) and \(\alpha \in [n]\), let \(\mathcal{V}_\alpha \) be the multiset \(\{ b : (\alpha ,b) \in \mathcal{V}\} \) (with multiplicity).
For \(\alpha \in [0,1]\) with \(\alpha n \in \mathbb {N}\), \(\binom {n}{\alpha n} \geq \frac{1}{n+1}2^{nh_2(\alpha )} \geq 2^{(h_2(\alpha ) - \frac{\log _2(2n)}{2n})n}\).
For \(\alpha \in [0,1/2]\) and \(n \in \mathbb {N}\), \(\sum _{i=0}^{\lfloor \alpha n\rfloor }\binom {n}{i} \leq 2^{nh_2(\alpha )}\).
For any \(\varphi : \mathbb {F}_q^n \to \mathbb {R}\) and any \(x\), \(\varphi (x) = \sum _{\omega \in \mathbb {F}_q^n}\hat{\varphi }(\omega )(-1)^{\langle x, \omega \rangle }\).
For an \(\mathbb {F}_q\)-subspace \(V \subseteq \mathbb {F}_q^n\) and any \(g \in \mathbb {F}_q^n\),
For \(x \in [0,\tfrac 14]\), \(h_2^{-1}(1 - x) \geq \frac12 - 5x^2\).
For \(x \in [0,1]\), \(h_2^{-1}(1 - x^2) \leq \frac12 - \frac{\sqrt{\ln 2}}{2}x\).
For all \(x \in [0,1]\), \(h_2(x) \leq -x\log _2 x + 2x\).
Suppose \(\Omega \) is derived from \(\mathcal{C}_{\mathrm{in}}\). For every integer \(r \geq 1\),
where \(\mathcal{V}\) ranges over tuples \(((\alpha _1,b_1),\dots ,(\alpha _r,b_r))\) and \(g_\mathcal{V}\in \mathbb {F}_q^n\) is defined by \((g_\mathcal{V})_\alpha = \sum _{b \in \mathcal{V}_\alpha } b\).
Let \(\mathcal{C}\subseteq \mathbb {F}_2^n\) be a random linear code of dimension \(Rn\) and let \(x,y \in \mathbb {F}_2^n \setminus \{ 0\} \) with \(x \neq y\). Then
If \(r \sim \operatorname {Poi}(\lambda )\) then \(\Pr [r \geq \lambda + t] \leq \exp \! \big(\frac{-t^2}{\lambda + t}\big)\). In particular \(\Pr [r \geq \lambda + 100\sqrt{\lambda }] \leq \exp (-100^2/2)\).
If \(Y \sim \operatorname {Poi}(\lambda )\) then \(\Pr [Y \text{ odd}] = \tfrac 12(1 - e^{-2\lambda })\).
If \(r \sim \operatorname {Poi}(\Lambda )\) balls are thrown independently and uniformly into \(M\) bins, then the occupancy counts \(Y_1,\dots ,Y_M\) are independent, each distributed as \(\operatorname {Poi}(\Lambda /M)\).
For \(q = 2^{k_0}\) there exists a self-dual basis \(\nu _1,\dots ,\nu _{k_0}\) of \(\mathbb {F}_q\) over \(\mathbb {F}_2\), i.e. \(\operatorname {Tr}(\nu _i \nu _j) = \mathbf{1}[i=j]\).
The number of ways to choose a multiset of \(s\) even positive integers (an ordered assignment of even values to \(\leq s/2\) chosen positions summing to a prescribed even total \(S\) with \(s = S\)) is at most \(2^{3S/2}\); concretely, the number of nonnegative integer compositions counted in the proof of Claim 4.3 is \(\binom {S/2 + S - 1}{S} \leq 2^{3S/2}\).
Let \(\mathcal{C}_{\mathrm{in}}\subseteq \mathbb {F}_2^{n_0}\) be a random linear code of dimension \(k_0 = \varepsilon n_0\). Then with probability at least \(1 - n_0 \cdot 2^{-\tau n_0}\), the code \(\mathcal{C}_{\mathrm{in}}\) is \(\tau \)-nice.
Fix a self-dual basis \(\nu _1,\dots ,\nu _{k_0}\). If \(\alpha = \sum _i \alpha _i \nu _i\) and \(\beta = \sum _j \beta _j \nu _j\) with \(\alpha _i,\beta _j \in \mathbb {F}_2\), then \(\operatorname {Tr}(\alpha \cdot \beta ) = \langle (\alpha _1,\dots ,\alpha _{k_0}), (\beta _1,\dots ,\beta _{k_0})\rangle \).
For any \(n \in \mathbb {N}\), \(\frac{8}{\sqrt n}\leq \gamma \leq \frac13\), integers \(\frac{24\log n}{\gamma }\leq k\leq \frac{n}{5}\), and \(2^{2k/3}\leq T\leq 2^{(1-\gamma )k}\), a random linear code \(\mathcal{C}\subseteq \mathbb {F}_2^n\) of dimension \(k\) satisfies the following with probability \(1 - 2^{-\Omega (\gamma k + \gamma ^2 n + \sqrt n)}\). For \(0 \leq j \leq n\) let \(\Delta _j\) be the number of codewords of weight \(j\), and let \(j^\star \) be the minimal \(j\) with \(\sum _{i=0}^j \Delta _i \geq T\). Then:
\(j^\star \geq \alpha n\) for some \(\alpha \geq h_2^{-1}\! \big(1 - 2\cdot \frac{k-\log T}{n}\big)\);
\(\dfrac {\sum _{i=0}^{j^\star } i\, \Delta _i}{\sum _{i=0}^{j^\star }\Delta _i} \geq (1 - 2\gamma )\, j^\star \);
\(\dfrac {\Delta _{j^\star +1}}{\sum _{i=0}^{j^\star }\Delta _i}\leq 2^{\sqrt n}\).
For any \(r\), the number of bad messages \(m \in \mathbb {F}_q^k\setminus \{ 0\} \) is at most \(\mathcal{B}_r(\mathcal{C}_{\mathrm{out}},\mathcal{C}_{\mathrm{in}})\).
There exists a linear code \(\mathcal{C}_{\mathrm{out}}\subseteq \mathbb {F}_q^n\) of dimension \(k\) with \(\Pr _{x\sim \mathcal{D}^n}[x\in \mathcal{C}_{\mathrm{out}}^\perp \setminus \{ 0\} ]\leq \frac{1}{q^k}\), satisfying the condition of Theorem 50 with \(\Delta = 0\).
Let \(\mathcal{C}_{\mathrm{out}}\) be a random linear code of dimension \(k = \varepsilon n\) over \(\mathbb {F}_q\) and let \(\gamma = \frac12\varepsilon \), \(q = O(n)\). With high probability the symbols of every nonzero codeword of \(\mathcal{C}_{\mathrm{out}}\) are not contained in any set of size smaller than \(q^{1-\gamma }\); equivalently every nonzero codeword has \(H_\infty (c)\geq (1-\gamma )\log _2 q\) on the support level required.
Let \(\delta \in [0,1/2]\) and \(\eta \in (0, 1 - h_2(\delta ))\). Then for any \(n {\gt} 1/\eta \) there exists a linear code \(\mathcal{C}\subseteq \mathbb {F}_2^n\) with rate \(R \geq 1 - h_2(\delta ) - \eta \) and relative distance at least \(\delta \).
Fix any sufficiently small \(\varepsilon {\gt} 0\). For integers \(k,q \in \mathbb {N}\) with \(q = 2^{k_0}\), let \(n_0 = k_0/\varepsilon \) and \(n = k/\varepsilon \). Let \(\mathcal{C}_{\mathrm{out}}\subseteq \mathbb {F}_q^n\) be an \(\mathbb {F}_2\)-linear code of rate \(\varepsilon \), and let \(\mathcal{C}_{\mathrm{in}}\subseteq \mathbb {F}_2^{n_0}\) be a random linear code of rate \(\varepsilon \). Assume there exist constants \(\bar c_\gamma , \bar c_\eta \) such that for every nonzero \(c \in \mathcal{C}_{\mathrm{out}}\),
and \(n_0 = \Omega _{\bar c_\gamma }\big(\frac{\log (1/\varepsilon )}{\varepsilon ^2}\big)\). Then with probability at least \(1 - 2^{-\Omega _{\bar c_\gamma }(\varepsilon ^2 n_0 + \sqrt{n_0})}\) over \(\mathcal{C}_{\mathrm{in}}\), the concatenated code \(\mathcal{C}= \mathcal{C}_{\mathrm{out}}\circ \mathcal{C}_{\mathrm{in}}\) has relative distance \(\frac12 - O_{\bar c_\gamma ,\bar c_\eta }(\varepsilon )\).
There exist constants \(c, \bar{c}, \tilde{c} {\gt} 0\) such that the following holds. Fix an integer \(k {\gt} 0\), any sufficiently small \(\varepsilon {\gt} 0\) (in terms of \(\bar c\)), and a power of two \(q = 2^{k_0}\). Let \(n_0 = k_0/\varepsilon \), \(n = k/\varepsilon \), \(N = n_0 n\), and suppose \(q \geq 2^{\tilde c/\varepsilon ^3}\). Let \(\mathcal{C}_{\mathrm{in}}\subseteq \mathbb {F}_2^{n_0}\) be a linear code of dimension \(k_0\) that is \(\tau \)-nice for \(\tau = 1/\sqrt{n_0}\), and let \(\mathcal{C}_{\mathrm{out}}\subseteq \mathbb {F}_q^n\) be an independent random linear code of dimension \(k\). Then with probability at least \(1 - 2^{-\varepsilon ^2 N/\bar c}\) over \(\mathcal{C}_{\mathrm{out}}\), the relative distance of \(\mathcal{C}= \mathcal{C}_{\mathrm{out}}\circ \mathcal{C}_{\mathrm{in}}\subseteq \mathbb {F}_2^N\) is at least \(\frac12 - c\varepsilon \).
There are constants \(\tilde c, c {\gt} 0\) so that the following holds. Let \(\varepsilon {\gt} 0\), and let \(\mathcal{C}_{\mathrm{in}}\subseteq \mathbb {F}_2^{n_0}\) be a binary linear code of dimension \(k_0 = \varepsilon n_0\) that is \(\tau \)-nice for \(\tau = 1/\sqrt{n_0}\), with \(n_0 \geq 64/\varepsilon ^4\). Let \(\Omega \subseteq \mathbb {F}_q\) be the set derived from \(\mathcal{C}_{\mathrm{in}}\) (Definition 38). Let \(Y \in \mathbb {F}\) be \(Y = \sum _{b\in \Omega }\zeta _b\cdot b\) where \(\zeta _b \sim \operatorname {Ber}\big(\frac{1-e^{-2\tilde c\varepsilon ^2}}{2}\big)\) are i.i.d., and let \(\mathcal{D}\) be the distribution of \(Y\). Suppose \(\mathcal{C}_{\mathrm{out}}\subseteq \mathbb {F}_q^n\) is a linear code of dimension \(k = \varepsilon n\) with
Then \(\mathcal{C}_{\mathrm{out}}\circ \mathcal{C}_{\mathrm{in}}\) is a binary linear code of rate \(\varepsilon ^2\) with relative distance at least \(\frac{1-c\varepsilon }{2}\).