← All blueprints

Essential Coding Theory — Blueprint

16 Information Theory Strikes Back: Polar Codes

This chapter introduces Polar codes, a class of codes developed from purely information-theoretic insights, and shows how they resolve the question of getting arbitrarily close to capacity for \(\)BSC_p\( with block length and decoding time polynomial in \)1/ε\(. We assume basic notions of Entropy and Conditional Entropy from information theory (the book’s Appendix~ E); the basic facts we use are recorded below as settled leaves. \begin{thmenv}[Entropy, Def.~ E.1.2] \label{def:c16-entropy} \mathlibok For a discrete random variable \end{thmenv} \)X\( supported on a finite set \)Ω\(, the entropy of \)X\( is \[ H(X) = \sum _{x\in \Omega }\Pr [X=x]\log \frac{1}{\Pr [X=x]}, \] (logarithms base \)2\(, with the convention \)0(1/0)=0\(). \)

Definition
#
Definition 346 Conditional entropy, Def. E.2.2
#

For jointly distributed random variables

\(\)X,Y\(, the entropy of \)X\( conditioned on \)Y\( is \[ H(X\mid Y) = \mathbb {E}_{y\sim Y}\big[H(X\mid Y=y)\big]. \] \)

Definition
#
Lemma 347 Chain rule, Thm. E.2.4

For jointly distributed random variables

\(\)X,Y\(: \)H(X,Y) = H(Y) + H(X∣Y)\(. \)

Lemma
#
Lemma 348 Conditioning does not increase entropy, Lem. E.2.6

For jointly distributed random variables

\(\)X,Y\(: \)H(X∣Y)≤H(X)\(. \)

Lemma
#
Lemma 349 Uniform distribution maximizes entropy, Lem. E.1.3

If

\(\)X\( is supported on \)Ω\( then \)H(X)≤|Ω|\(. \)

Lemma
#

16.1 Achieving Gap to Capacity

Recall that for \(\)p[0,1/2)\(, \)BSC_p\( takes input \)X=(X_1,…,X_n)\( and outputs \)Y=(Y_1,…,Y_n)\( where independently for each \)i\(, \)Y_i=X_i\( with probability \)1-p\( and \)Y_i≠X_i\( with probability \)p\(; we write \)Z=(Z_1,…,Z_n)Bern(p)^n\( for the error pattern, so \)Y=X+Z\(. \begin{thmenv}[Main theorem, Thm.~ 16.1.1] \label{thm:c16-main-capacity} \uses{def:c06-bsc, prop:c16-compression-to-code, thm:c16-strong-polarization, prop:c16-encoder-runtime, cor:c16-bpd-correctness, prop:c16-basic-polar-rate} For every \end{thmenv} \)p[0,1/2)\( there is a polynomial \)n_0(·)\( such that for every \)ε>0\( there exist \)k,n\( and an encoder \)E:F_2^kF_2^n\( and decoder \)D:F_2^nF_2^k\( satisfying: \begin{itemize} \item \textbf{Length and Rate.} \end{itemize}\)1/ε≤n≤n_0(1/ε)\( and \)k≥(1-H(p)-ε)·n\(. \item \textbf{Running times.} \)E\( and \)D\( run in time \)O(nn)\( (the \)O(·)\( hides a universal constant independent of \)p,ε\(). \item \textbf{Failure Probability.} For every \)mF_2^k\(, \[ \Pr _{\mathbf{Z}\in \mathrm{Bern}(p)^n}[\mathbf{m}\ne D(E(\mathbf{m})+\mathbf{Z})]\le \varepsilon . \] \)

Theorem
#
Proof

By Proposition 353, it suffices to find, given

\(\)p[0,1/2)\( and \)ε>0\(, an \)(ε,ε/n)\(-polarizing matrix \)PF_2^n×n\( with \)n\( bounded by a polynomial in \)1/ε\(, such that multiplication by \)P\( and decompression both take \)O(nn)\( time. Theorem~ \ref{thm:c16-strong-polarization} applied with parameters \)p,ε\( and \)c=2\( yields \)n\( and the matrix \)P=P_nF_2^n×n\( (Definition~ \ref{def:c16-basic-polarizing-matrix}) that is \)(ε,1/n^2)\(-polarizing. Since Theorem~ \ref{thm:c16-strong-polarization} also guarantees \)ε≥1/n\(, we have that \)P_n\( is \)(ε,ε/n)\(-polarizing. Furthermore there exists a set \)S⊆[n]\( with \)|S|≤(H(p)+ε)n\( such that \)H(W_S∣W_S)≤ε\( when \)W=Z·P\( and \)Z∼Bern(p)^n\(; equivalently (Proposition~ \ref{prop:c16-basic-polar-rate}) the rate of the resulting Basic Polar Code is at most \)H(p)+ε\(. By Proposition~ \ref{prop:c16-encoder-runtime}, compressing \)ZF_2^n\( to \)(Z·P)_S\( takes \)O(nn)\( time. Finally Corollary~ \ref{cor:c16-bpd-correctness} asserts that a decompression \)Z\( can be computed given \)(Z·P)_S\( in time \)O(nn)\(, and that \)Z\( equals \)Z\( with probability at least \)1-ε\(. Composing with the linear-algebraic reduction of Proposition~ \ref{prop:c16-compression-to-code} completes the proof. \)

Proof
Theorem 351 Strengthened main theorem, Thm. 16.1.2
#

There exist polynomially growing functions

\(\)n_0:[0,1]Z^+\( and \)T:Z^+Z^+\( such that for all \)p[0,1]\(, \)ε>0\( there exists \)δ>0\(, a function \)k:Z^+Z^+\( and an ensemble of functions \)E={E_n}_n\( and \)D={D_n}_n\( such that for all \)nZ^+\( with \)n≥n_0(1/ε)\( the following hold. \begin{enumerate} \item The codes are \end{enumerate}\)ε\(-close to capacity: \)k=k(n)≥(1-H(p)-ε)n\(, \)E_n:F_2^kF_2^n\( and \)D_n:F_2^nF_2^k\(. \item The codes correct a \)p\(-fraction of errors with all but exponentially small probability: \[ \Pr _{\mathbf{Z}\sim \mathrm{Bern}(p)^n,\ \mathbf{X}\sim U(\mathbb {F}_2^k)}[D_n(E_n(\mathbf{X})+\mathbf{Z})\ne \mathbf{X}]\le \exp (-\delta n). \] \item Encoding and decoding are efficient: \)E_n\( and \)D_n\( run in time at most \)T(n/ε)\(. \)

Theorem
#

16.2 Reduction to Linear Compression

Definition 352 \(\)τ\(-error linear compression scheme\)
#

For \(\)n≥m\(, a pair \)(H,D)\( with \)HF_2^n×m\( and \)D:F_2^mF_2^n\( forms a \)\(\)τ\(-error linear compression scheme\) for \(\)Bern(p)^n\( if \)H\( has rank \)m\( and \[ \Pr _{\mathbf{Z}\sim \mathrm{Bern}(p)^n}[D(\mathbf{Z}\cdot \mathbf{H})\ne \mathbf{Z}]\le \tau . \] The ratio \)m/n\( is called the \emph{(compression) rate} of the scheme. \)

Proposition 353 Prop. 16.2.1

Let

\(\)(H,D)\( be a \)τ\(-error linear compression scheme for \)Bern(p)^n\( with \)HF_2^n×m\(. Let \)k=n-m\( and let \)GF_2^k×n\( and \)G^*:F_2^n×k\( be full-rank matrices such that \)G·H=0\( and \)G·G^*=I_k\( (the \)k×k\( identity matrix). Then the encoder \)E:F_2^kF_2^n\( given by \)E(X)=X·G\( and the decoder \)D’:F_2^nF_2^k\( given by \)D’(Y)=(Y-D(Y·H))·G^*\( satisfy, for every \)mF_2^k\(, \[ \Pr _{\mathbf{Z}\in \mathbb {F}_2^n}[\mathbf{m}\ne D'(E(\mathbf{m})+\mathbf{Z})]\le \tau . \] \)

Proposition
#
Proof

Suppose

\(\)D(Z·H)=Z\(; we claim decoding is successful. Indeed, \begin{align*} D’(E(\mathbf{m})+\mathbf{Z}) & = (E(\mathbf{m})+\mathbf{Z}-D((E(\mathbf{m})+\mathbf{Z})\cdot \mathbf{H}))\cdot \mathbf{G}^*\\ & = (E(\mathbf{m})+\mathbf{Z}-D(\mathbf{Z}\cdot \mathbf{H}))\cdot \mathbf{G}^*\\ & = (E(\mathbf{m})+\mathbf{Z}-\mathbf{Z})\cdot \mathbf{G}^*\\ & = E(\mathbf{m})\cdot \mathbf{G}^*\\ & = \mathbf{m}. \end{align*} The second equality uses \)E(m)·H=m·G·H=0\(; the third uses the assumption \)D(Z·H)=Z\(; the last uses \)E(m)=m·G\( and \)G·G^*=I\(. Thus the probability of a decoding failure is at most the probability of a decompression failure, which by definition is at most \)τ\(. \)

Proof

16.3 The Polarization Phenomenon

16.3.1 Information Theory Review

Proposition 354 Prop. 16.3.1

Let

\(\)α≥0\(. \begin{enumerate} \item Let \end{enumerate}\)X\( be a random variable with \)H(X)≤α\(. Then there exists \)x\( such that \)_X[X≠x]≤α\(. \item Let \)(X,Y)\( be jointly distributed variables with \)H(X∣Y)≤α\(. Then the function \)A(y)=argmax_x{[X=x∣Y=y]}\( satisfies \)_(X,Y)[X≠A(Y)]≤α\(. \)

Proposition
#
Proof

For part (1), for

\(\)i\( in the support of \)X\( let \)p_i=_X[X=i]\( and let \)x=argmax_i{p_i}\(; let \)γ=1-p_x=_X[X≠x]\(. We show \)γ≤α\(. If \)γ≤1/2\(, \begin{align*} \alpha \ge H(X) = \sum _i p_i\log \frac1{p_i} & \ge \sum _{i\ne x}p_i\log \frac1{p_i} \ge \sum _{i\ne x}p_i\log \frac1{\sum _{j\ne x}p_j}\\ & = \Big(\sum _{i\ne x}p_i\Big)\log \frac1{\sum _{j\ne x}p_j} = \gamma \log \frac1\gamma \ge \gamma , \end{align*} where the first inequality drops the non-negative \)x\(-summand, the second uses \)p_i≤∑_j≠xp_j\( for \)i≠x\(, and the last uses \)γ≤1/2\( so \)(1/γ)≥1\(. If \)γ>1/2\(, then \)α≥H(X)=∑_ip_i(1/p_i)≥∑_ip_i(1/p_x)=(1/p_x)=(1/(1-γ))≥1≥γ\(, using \)p_i≤p_x\( for all \)i\( and \)γ≥1/2\(. For part (2), given \)y\( and \)i\(, let \)p_i,y=_X[X=i∣Y=y]\( and let \)x_y=argmax_i{p_i,y}\(. Let \)γ_y=1-p_x_y,y\(, so \)γ=E_Y[γ_Y]\( where \)γ=_(X,Y)[X≠A(Y)]\(. Let \)α_y=H(X∣Y=y)\(, so \)α=H(X∣Y)=E_Y[α_Y]\(. By part (1) applied to the conditional distribution of \)X\( given \)Y=y\(, we have \)γ_y≤α_y\( for every \)y\(, and hence \)γ=E_Y[γ_Y]≤E_Y[α_Y]=α\(. \)

Proof

16.3.2 Polarized Matrices and Decompression

Definition 355 Polarizing matrix, unpredictable columns, Def. 16.3.2

An invertible matrix

\(\)PF_2^n×n\( is \)\(\)(ε,τ)\(-polarizing for \)Bern(p)^n\(\) if, writing \(\)W=Z·P\( for \)ZBern(p)^n\( and \)W_<i=(W_1,…,W_i-1)\(, the set \[ S = S_\tau = \{ i\in [n]\mid H(W_i\mid \mathbf{W}_{{\lt}i})\ge \tau \} \] satisfies \)|S|≤(H(p)+ε)n\(. We refer to \)S\( as the set of \emph{unpredictable columns} of \)P\( (and \){W_i}_iS\( as the \emph{unpredictable bits} of \)W\(). \)

Definition
#
Algorithm 356 Polar Compressor, Alg. 16.3.1

Input: A polarizing matrix

\(\)PF_2^n×n\(, a string \)ZF_2^n\(, and a subset \)S⊆[n]\(. \textbf{Output:} A compressed string \)WF_2^|S|\(. \textbf{Procedure:} return \)(Z·P)_S\(, i.e., the coordinates of \)Z·P\( indexed by \)S\(. \)

Algorithm
#

If \(\)P_S\( denotes the \)n×|S|\( matrix whose columns correspond to the indices in \)S\(, then the compression of \)Z\( equals \)Z·P_S\(. Thus \)P_S\( plays the role of \)H\( from Proposition~ \ref{prop:c16-compression-to-code}, with \)G=(P^-1)_S\( and \)G^*=P_S\( (where \)S\( is the complement of \)S\(); in particular the complexity of multiplying a vector by \)P\( and \)P^-1\( dominates the encoding/decoding cost. \begin{thmenv}[Successive Cancellation Decompressor, Alg.~ 16.3.2] \label{alg:c16-scd} \uses{def:c16-polarizing-matrix, prop:c16-predictable} \textbf{Input:} \end{thmenv} \)S⊆[n]\(, \)WF_2^S\(, and a polarizing matrix \)PF_2^n×n\(. \textbf{Output:} \)ZF_2^n\( such that \)(Z·P)_S=W\(. \textbf{Procedure} \)SCD(W,P,S)\(: \begin{enumerate} \item For \end{enumerate}\)i=1\( to \)n\(: if \)iS\(, set \)W_iW_i\(; otherwise set \)W_iargmax_bF_2{[W_i=b∣W_<i=W_<i]}\( (using the predictor of Proposition~ \ref{prop:c16-predictable}). \item Return \)ZW·P^-1\(. \)

Algorithm
#
Lemma 358 Lem. 16.3.3

If

\(\)P\( is \)(ε,τ)\(-polarizing for \)Bern(p)^n\( with unpredictable columns \)S\(, then the successive cancellation decoder has failure probability at most \)τn\(: \[ \Pr _{\mathbf{Z}\in \mathrm{Bern}(p)^n}[\mathbf{Z}\ne \mathrm{SCD}((\mathbf{Z}\cdot \mathbf{P})_S,\mathbf{P},S)]\le \tau n. \] Thus, if \)P\( is \)(ε,ε/n)\(-polarizing then the failure probability of the successive cancellation decoder is at most \)ε\(. \)

Lemma
#
Proof

Let

\(\)W=Z·P\(. By Step~ 1 of Algorithm~ \ref{alg:c16-scd}, \)W_i=W_i\( for every \)iS\(. For \)iS\(, applying Proposition~ \ref{prop:c16-predictable}(2) with \)X=W_i\(, \)Y=W_<i\(, \)α=τ\( we get \)_Z[W_i≠A_i(W_<i)]≤τ\( where \)A_i(·)\( is the predictor of Proposition~ \ref{prop:c16-predictable}. By a union bound, \[ \Pr _{\mathbf{Z}}[\exists i \text{ s.t. } W_i\ne A_i(\widetilde{\mathbf{W}}_{{\lt}i})]\le \sum _{i=1}^n\Pr _{\mathbf{Z}}[W_i\ne A_i(\widetilde{\mathbf{W}}_{{\lt}i})]\le \tau n. \] By Step~ 1 and definition of \)A_i(·)\(, \)W_i=A_i(W_<i)\( for \)iS\(. Thus if \)W≠W\( there must exist some \)i\( with \)W_i≠W_i=A_i(W_<i)\(; hence \)[W≠W]≤τn\(. Since \)P\( is invertible, \)W≠W\( iff \)Z=W·P^-1≠W·P^-1=SCD(…)\(, so \)_Z[Z≠SCD((Z·P)_S,P,S)]≤τn\(. \)

Proof

16.3.3 A Polarizing Primitive

We now describe an explicit construction of a polarizing matrix. The basic idea is to start with a simple \(\)2×2\( \emph{polarization} step and iterate it. For two independent bits \)Z_1,Z_2\( with \)H(Z_1)=H(Z_2)=H(p)\(, of the four linear combinations of \)(Z_1,Z_2)\( over \)F_2\(, three are trivial (\)0\(, \)Z_1\(, \)Z_2\(); the only non-trivial new combination is \)Z_1+Z_2\(. This leads to the transform \)P_2:(Z_1,Z_2)↦(W_1,W_2)=(Z_1+Z_2,Z_2)\(, an invertible linear map given by the matrix \)P_2=

1

0

1

1

\(. One can show \)H(W_1,W_2)=2H(p)\( while \)H(W_1)>H(Z_1)=H(Z_2)=H(p)\( (since \)[W_1=1]=2p(1-p)(p,1/2)\( and \)H(·)\( is increasing there), and hence by the chain rule \)H(W_2∣W_1)=H(W_1,W_2)-H(W_1)<H(Z_1),H(Z_2)\(: the transform separates the entropy of two equally-entropic bits into a more entropic bit and a less entropic one. Iterating this \)2×2\( transform recursively on blocks yields, for \)n\( a power of two, a linear transform \)P_n\( on \)n\( bits (Figure~ 16.2 in the source); this is formalized as Definition~ \ref{def:c16-basic-polarizing-matrix} below. \)

16.4 Polar Codes, Encoder and Decoder

16.4.1 The Code and Polarization Claims

Definition 359 Basic polarizing matrix, Def. 16.4.1
#

We define the

\(\)n×n\( polarization matrix \)P_n\( recursively for \)n=2,4,8,…\(, by describing the linear map \)P_n:Z↦Z·P_n\(. For \)n=2\( and \)ZF_2^2\(, \)P_2(Z)=(Z_1+Z_2,Z_2)\(. For \)n=2^t\( and \)Z=(U,V)\( with \)U,VF_2^n/2\(, \[ P_n(\mathbf{Z}) = \big(P_{n/2}(\mathbf{U}+\mathbf{V}),\, P_{n/2}(\mathbf{V})\big). \] \)

Definition
#
Theorem 360 (Polynomially) Strong Polarization, Thm. 16.4.2

Fix

\(\)p(0,1/2)\( and constant \)c\(. There exists a polynomial function \)n_0\( such that for every \)ε>0\(, there exists \)n=2^t\( with \)1/ε≤n≤n_0(1/ε)\( and a set \)E⊆[n]\( with \)|E|≤(ε/2)·n\( such that for every \)iE\(, the conditional entropy \)H(W_i∣W_<i)\( (where \)W=P_n(Z)\(, \)Z∼Bern(p)^n\() is either less than \)n^-c\(, or greater than \)1-n^-c\(. Furthermore, if \)S={i[n]∣H(W_i∣W_<i)≥n^-c}\( then \)|S|≤(H(p)+ε)·n\( and the matrix \)P_n\( is \)(ε,1/n^c)\(-polarizing for \)Bern(p)^n\( with unpredictable columns \)S\(. \)

Theorem
#
Proof

We assume without loss of generality

\(\)c≥1\( (proving the theorem for larger \)c\( implies it for smaller \)c\(). Given \)p\( and \)c≥1\(, let \)γ=2^-c\(. Let \)β<∞\( and \)α<1\( be the constants given by Definition~ \ref{def:c16-strong-polarization-seq} for this choice of \)γ\(, as guaranteed to exist by Lemma~ \ref{lem:c16-local-to-global} applied to the locally polarizing, simple sequence of Lemma~ \ref{lem:c16-local-polarization}. Set \)n_0(x)={8x,2(2βx)^(1/α)}\(, a polynomial in \)x\(. Given \)ε>0\(, let \)t={(8/ε),(2β/ε)/(1/α)}\(, so that \)n=2^t≤{8/ε,2(2β/ε)^1/(1/α)}=n_0(1/ε)\(. This choice gives \)β·α^t≤ε/2\( and \)γ^t=2^-ct=n^-c\(. Also \)n≥4/ε\(, so \)2n^-c≤2n^-1≤ε/2\(. Recall the notation \)(Z_i^(j))\( of Section~ 16.5.1 (see Definition~ \ref{def:c16-jth-siblings} and Lemma~ \ref{lem:c16-pn-via-levels}), where \)W=P_n(Z)=(Z_1^(t),…,Z_n^(t))\(, and let \)X_j=H(Z_i^(j)∣Z_<i^(j))\( for \)i\( chosen uniformly at random from \)[n]\(. By Lemma~ \ref{lem:c16-local-polarization} the sequence \)X_0,…,X_t,…\( is simple and locally polarizing, so by Lemma~ \ref{lem:c16-local-to-global} it strongly polarizes; by the definition of strong polarization (with the above choice of \)γ,α,β,t\(), \begin{align*} \Pr _{i\in [n]}\big[H(W_i\mid \mathbf{W}_{{\lt}i})\in (n^{-c},1-n^{-c})\big] & = \Pr _i\big[H(Z_i^{(t)}\mid \mathbf{Z}_{{\lt}i}^{(t)})\in (\gamma ^t,1-\gamma ^t)\big]\\ & = \Pr [X_t\in (\gamma ^t,1-\gamma ^t)] \le \beta \alpha ^t \le \varepsilon /2. \end{align*} Thus the set \)E={i[n]∣H(W_i∣W_<i)(n^-c,1-n^-c)}\( satisfies \)|E|≤εn/2\(. For the "Furthermore" part, by the chain rule \)nH(p)=∑_i[n]H(W_i∣W_<i)≥∑_iS∖EH(W_i∣W_<i)≥(|S|-|E|)(1-n^-c)\(, where the definitions of \)S,E\( give the last inequality. Rearranging, \[ |S|\le \frac{nH(p)}{1-n^{-c}}+\varepsilon n/2\le nH(p)(1+2n^{-c})+\varepsilon n/2\le nH(p)+\varepsilon n. \] \)

Proof
Definition 361 Basic Polar (Compressing) Code, Def. 16.4.3

Given

\(\)0<p<1/2\( and \)ε≤1/4\(, let \)n\( and \)S⊆[n]\( be as given by Theorem~ \ref{thm:c16-strong-polarization} for \)c=2\(. The Basic Polar Code with parameters \)p\( and \)ε\( maps \)ZF_2^n\( to \)P_n(Z)_S\(. \)

Definition
#
Proposition 362 Rate of the Basic Polar Code, Prop. 16.4.4

For every

\(\)p(0,1/2)\( and \)ε>0\(, the rate of the Basic Polar Code with parameters \)p\( and \)ε\( is at most \)H(p)+ε\(. \)

Proposition
#
Proof

The compression scheme maps

\(\)ZF_2^n\( to \)P_n(Z)_SF_2^|S|\(, so its rate is \)|S|/n\( (Definition~ \ref{def:c16-linear-compression}). By Theorem~ \ref{thm:c16-strong-polarization} (applied with \)c=2\( as in Definition~ \ref{def:c16-basic-polar-code}), \)|S|≤(H(p)+ε)n\(, so the rate is at most \)H(p)+ε\(. \)

Proof

16.4.2 Encoding

Algorithm 363 Basic Polar Encoder, Alg. 16.4.1

Input:

\(\)n\( a power of two, \)ZF_2^n\(, and \)S⊆[n]\(. \textbf{Output:} Compression \)WF_2^S\( of \)Z\( given by \)W=(P_n(Z))_S\(. \textbf{Procedure:} return \)W=(P(n,Z))_S\(, where the recursive function \)P(n,Z)\( is: if \)n=1\( return \)Z\(; else, letting \)U=(Z_1,…,Z_n/2)\( and \)V=(Z_n/2+1,…,Z_n)\(, return \)(P(n/2,U+V), P(n/2,V))\(. \)

Algorithm
#
Proposition 364 Prop. 16.4.5

The runtime of the Basic Polar Encoder algorithm is

\(\)O(nn)\(. \)

Proposition
#
Proof

If

\(\)T(n)\( denotes the runtime of the algorithm on inputs of length \)n\(, then \)T(·)\( satisfies the recurrence \)T(n)=2T(n/2)+O(n)\(, since all operations other than the two recursive calls (namely computing \)U+V\( and assembling the output) can be done in \)O(n)\( time. This recurrence implies \)T(n)=O(nn)\(. \)

Proof

16.4.3 Decoding

Proposition 365 Bias update functions

Let

\(\)Z_1∼Bern(p_1)\( and \)Z_2∼Bern(p_2)\( be independent. Then \)Z_1+Z_2∼Bern(b^+(p_1,p_2))\( where \[ b^+(p_1,p_2) = p_1(1-p_2)+(1-p_1)p_2. \] Moreover, conditioned on \)Z_1+Z_2=a\( for \)a{0,1}\(, \)Z_2\( is a Bernoulli random variable with bias \)b^ℓ(p_1,p_2,a)\( given by \[ b^\ell (p_1,p_2,0)=\frac{p_1p_2}{p_1p_2+(1-p_1)(1-p_2)},\qquad b^\ell (p_1,p_2,1)=\frac{(1-p_1)p_2}{(1-p_1)p_2+p_1(1-p_2)}. \] Both \)b^+\( and \)b^ℓ\( are computable in \)O(1)\( arithmetic operations. \)

Proposition
#
Proof

The formula for

\(\)b^+\( is immediate: \)Z_1+Z_2=1\( iff exactly one of \)Z_1,Z_2\( equals \)1\(, which by independence has probability \)p_1(1-p_2)+(1-p_1)p_2\(. For \)b^ℓ\(, by Bayes' rule and independence of \)Z_1,Z_2\(, for \)a{0,1}\( \[ \Pr [Z_2=1\mid Z_1+Z_2=a] = \frac{\Pr [Z_2=1]\Pr [Z_1=a\ominus 1]}{\Pr [Z_1+Z_2=a]}, \] where \)a⊖1\( denotes \)a-1 2\(; for \)a=0\( this equals \)p_2p_1/(p_1p_2+(1-p_1)(1-p_2))\( (the event \)Z_1+Z_2=0\( splits into \)(Z_1,Z_2)=(0,0)\( or \)(1,1)\() and for \)a=1\( it equals \)p_2(1-p_1)/((1-p_1)p_2+p_1(1-p_2))\( similarly. Each formula is a fixed rational expression in \)p_1,p_2\(, hence computable in \)O(1)\( time. \)

Proof
Algorithm 366 Basic Polar Decoder, Alg. 16.4.2

Input:

\(\)n\( a power of two, \)W(F_2∪{?})^n\(, and \)0≤p<1/2\(. \textbf{Output:} \)ZF_2^n\( such that for every \)i\(, either \)W_i=?\( or \)(Z·P)_i=W_i\(. \textbf{Procedure} \)BPD(W;n,p)\(: let \)(Z,W,ρ)=RPD(W;n,(p,…,p))\( and return \)Z\(, where the recursive subroutine \)RPD(W;n,(p_1,…,p_n))\( (with \)W(F_2∪{?})^n\() is defined by: \begin{itemize} \item If \end{itemize}\)n=1\( and \)W_1F_2\(: return \)(W_1,W_1,p_1)\(. \item If \)n=1\( and \)W_1=?\(: return \)(1,1,p_1)\( if \)p_1≥1/2\(, else \)(0,0,p_1)\(. \item If \)n≥2\(: write \)W=(W^(1),W^(2))\( with \)W^(1)=W[1,…,n/2]\( and \)W^(2)=W[n/2+1,…,n]\(. For \)i=1,…,n/2\( let \)q_i=b^+(p_i,p_n/2+i)\(. Let \)(X,W^1,ρ^1)=RPD(W^(1);n/2,(q_1,…,q_n/2))\(. For \)i=1,…,n/2\( let \)r_i=b^ℓ(p_i,p_n/2+i,X_i)\(. Let \)(Y,W^2,ρ^2)=RPD(W^(2);n/2,(r_1,…,r_n/2))\(. Let \)Z=(X+Y,Y)\(, \)W=(W^1,W^2)\(, \)ρ=(ρ^1,ρ^2)\(; return \)(Z,W,ρ)\(. \)

The output \(\)ρ=(ρ_1,…,ρ_n)\( records, for each \)i\(, the value \)ρ_i=_Z[W_i=1∣W_<i=W_<i]\( when \)Z∼Bern(p_1)××Bern(p_n)\(. \)

Algorithm
#
Lemma 367 Lem. 16.4.6

Let

\(\)Z∼Bern(p_1)××Bern(p_n)\( and \)W=P_n(Z)\(. Let \)S⊆[n]\( and let \)W’=(W_1’,…,W_n’)\( be given by \)W_i’=W_i\( if \)iS\( and \)W_i’=?\( otherwise. Let \)(Z,W,ρ)=RPD(W’;n,(p_1,…,p_n))\(. If \)H(W_i∣W_<i)≤τ\( for every \)iS\(, then: \begin{enumerate} \item For every \end{enumerate}\)i\(, \)_Z[W_i=1∣W_<i=W_<i]=ρ_i\(. \item \)_Z[W≠W]≤τn\(. \item \)_Z[Z≠Z]≤τn\(. \)

Lemma
#
Proof

The main part is part (1); part (2) follows almost immediately from part (1) and Proposition 354; part (3) is then straightforward. We prove them in turn.

Part (1): \(\widetilde{\mathbf{W}}=P_n(\tilde{\mathbf{Z}})\), by induction on \(n\). For

\(\)n=1\(, the base cases of \)RPD\( directly show \)W_1=Z_1\(. For larger \)n\( (writing \)Z=(U,V)\(), by induction \)W^1=P_n/2(X)\( and \)W^2=P_n/2(Y)\(, and the last step of the recursive procedure gives \[ P_n(\tilde{\mathbf{Z}}) = P_n((\tilde{\mathbf{X}}+\tilde{\mathbf{Y}},\tilde{\mathbf{Y}})) = \big(P_{n/2}(\tilde{\mathbf{X}}+\tilde{\mathbf{Y}}+\tilde{\mathbf{Y}}),P_{n/2}(\tilde{\mathbf{Y}})\big) = (\widetilde{\mathbf{W}}^1,\widetilde{\mathbf{W}}^2) = \widetilde{\mathbf{W}}, \] by Definition~ \ref{def:c16-basic-polarizing-matrix} and \)Y+Y=0\(. Now we prove \)ρ_i=_Z[W_i=1∣W_<i=W_<i]\( by induction on \)n\(; the base case \)n=1\( is immediate from the definition of \)RPD\(. For the inductive step, consider two cases. If \)i≤n/2\(: then \)W_i=P_n/2(U+V)_i\( via the definition of \)P_n\(, and \)(U+V)∼Bern(q_1)××Bern(q_n/2)\( by Proposition~ \ref{prop:c16-bias-functions}. By the inductive claim applied to the recursive call \)RPD(W’[1,…,n/2];n/2,(q_1,…,q_n/2))\(, \)ρ_i=ρ_i^(1)=_(U+V)[W_i=1∣W_<i=W_<i] =_Z∼Bern(p_1)××Bern(p_n)[W_i=1∣W_<i=W_<i]\(, where the last equality follows from the definition of \)q_i\( (this uses that \)W_<i\( for \)i≤n/2\( is a deterministic function of \)U+V\( alone). If \)i>n/2\(: the condition \)W[1,…,n/2]=W[1,…,n/2]\( is equivalent to \)U+V=X\( (using \)W^1=P_n/2(X)\(, invertibility of \)P_n/2\(, and part (1) proven above). Conditioning on \)U+V=X\( implies, by Proposition~ \ref{prop:c16-bias-functions}, that \)V∼Bern(r_1)××Bern(r_n/2)\( — exactly how the \)r_i\('s were defined. Since \)W[n/2+1,…,n]=P_n/2(V)\(, we get \[ \Pr _{\mathbf{Z}}[W_i=1\mid \mathbf{W}_{{\lt}i}=\widetilde{\mathbf{W}}_{{\lt}i}] = \Pr _{\mathbf{V}\sim \mathrm{Bern}(r_1)\times \cdots \times \mathrm{Bern}(r_{n/2})}[W_i=1\mid \mathbf{W}_{{\lt}i}=\widetilde{\mathbf{W}}_{{\lt}i}], \] which by induction on the recursive call for \)W^(2)\( equals \)ρ^(2)_i-n/2=ρ_i\(. This concludes part (1). \textbf{Part (2).} If \)iS\(, by the base-case/pass-through structure of \)RPD\( we have \)W_i=W_i\(. Now assume \)iS\(. We claim \)W_i=1\( iff \)ρ_i=_Z[W_i=1∣W_<i=W_<i]≥1/2\(: indeed, when \)RPD\( is called on the base-case input \)(W_i’,1,p_i’)\( with \)W_i’=?\(, part (1) gives \)ρ_i=p_i’\(, and then \)W_i=argmax_bF_2[W_i=b∣W_<i=W_<i]\( by the base-case rule of Algorithm~ \ref{alg:c16-bpd}. By Proposition~ \ref{prop:c16-predictable} applied to \)X=W_i\(, \)Y=W_<i\(, \)α=τ\(: \)[W_i≠W_i∣W_<i=W_<i]≤τ\(. By a union bound over \)i\(, \)[∃i s.t. W_i≠W_i∣W_<i=W_<i]≤τn\(. If \)W≠W\(, there must exist an index \)i\( with \)W_i≠W_i\(, so \)[W≠W]≤τn\(. \textbf{Part (3).} By part (1), \)W=P_n(Z)\(; also \)W=P_n(Z)\(. If \)W=W\( (which holds with probability at least \)1-τn\( by part (2)), invertibility of \)P_n\( gives \)Z=P_n^-1(W)=P_n^-1(W)=Z\(. \)

Proof
Corollary 368 Cor. 16.4.7

For every input

\(\)n=2^t\(, \)p[0,1/2)\(, and \)W(F_2∪{?})^n\(, the Basic Polar Decoder (Algorithm~ \ref{alg:c16-bpd}) runs in time \)O(nn)\( and computes an output \)ZF_2^n\( such that \)P_n(Z)_i=W_i\( holds for every \)i\( for which \)W_i≠?\(. Furthermore, if \begin{enumerate} \item \end{enumerate}\)Z∼Bern(p)^n\(, \item \)W_S=P_n(Z)_S\(, and \item \)H(W_i∣W_<i)≤τ\( for every \)iS\(, \)

then \(\)_Z[Z≠Z]≤τn\(. \)

Corollary
#
Proof

\(\)BPD\( calls \)RPD(W;n,(p,…,p))\(, i.e., Lemma~ \ref{lem:c16-rpd-correctness} with \)p_1==p_n=p\(. The claimed output relation \)P_n(Z)_i=W_i\( for \)W_i≠?\( follows from part (1)/(2) of Lemma~ \ref{lem:c16-rpd-correctness} (taking \)S={i:W_i≠?}\(, so \)W=W\( deterministically on the "given" coordinates by construction of \)RPD\(, and \)W=P_n(Z)\( by part (1) of the lemma's proof). Under hypotheses (1)-(3), part (3) of Lemma~ \ref{lem:c16-rpd-correctness} directly gives \)_Z[Z≠Z]≤τn\(. For the runtime: \)RPD\( satisfies the same divide-and-conquer recursion \)T(n)=2T(n/2)+O(n)\( as the Basic Polar Encoder (Proposition~ \ref{prop:c16-encoder-runtime}), since computing \)q_i=b^+(p_i,p_n/2+i)\( and \)r_i=b^ℓ(p_i,p_n/2+i,X_i)\( for all \)i=1,…,n/2\(, as well as assembling \)Z,W,ρ\(, takes \)O(n)\( time given that \)b^+,b^ℓ\( are computable in \)O(1)\( time (Proposition~ \ref{prop:c16-bias-functions}). Hence \)T(n)=O(nn)\(, as claimed. \)

Proof

16.5 Analysis: Speed of Polarization

16.5.1 Overview of Analysis

Write \(\)n=2^t\(. For \)i[n]\(, \)0≤j≤t\(, let \)Z_i^(0)=Z_i\(, and for \)1≤j≤t\( and \)i\( such that \)i\( and \)i+2^t-j\( lie in the same block at level \)j\(, let \[ (Z_i^{(j)},Z_{i+2^{t-j}}^{(j)}) = P_2(Z_i^{(j-1)},Z_{i+2^{t-j}}^{(j-1)}) = (Z_i^{(j-1)}+Z_{i+2^{t-j}}^{(j-1)},\, Z_{i+2^{t-j}}^{(j-1)}). \] \)

Definition 369 \(\)j\(th-level siblings, Def.~ 16.5.1\)

A pair \(\)(i,i’)\( are \)\(\)j\(th-level siblings\) if they are in the same block at the \(\)j\(th level and \)i’=i+2^t-j\(. For \)i[n=2^t]\( and \)0≤j≤t\(, let \[ S_{i,j} \stackrel{\text{def}}{=} \{ k\in [n]\mid i\equiv k\! \! \pmod{2^{t-j}}\} . \] \)

Lemma 370 Eq. (16.12)

\(\)P_n(Z) = (Z_1^(t),…,Z_n^(t))\(. \)

Lemma
#
Proof

By induction on

\(\)t\( (i.e., on \)n=2^t\(). For \)t=0\( (\)n=1\(), \)P_1(Z)=Z=(Z_1^(0))\( trivially. For \)t≥1\(, write \)Z=(U,V)\( with \)U=(Z_1,…,Z_n/2)\(, \)V=(Z_n/2+1,…,Z_n)\(; by Definition~ \ref{def:c16-basic-polarizing-matrix}, \)P_n(Z)=(P_n/2(U+V),P_n/2(V))\(. By definition, for \)i≤n/2\(, \)Z_i^(1)=Z_i+Z_i+n/2=U_i+V_i\(, i.e., the first block at level \)1\( consists of \)U+V\(; for \)i>n/2\(, \)Z_i^(1)=V_i-n/2\( carries \)V\( unchanged. Applying the inductive hypothesis (for \)t-1\() to \)U+V\( gives \)P_n/2(U+V)=((U+V)_1^(t-1),…,(U+V)_n/2^(t-1))\(, and since level-\)j\( variables (\)j≥1\() for indices \)≤n/2\( are computed entirely from \)Z^(1)|_{1,…,n/2}=U+V\( by the same recursive rule shifted by one level, \)(U+V)_i^(t-1)=Z_i^(t)\( for \)i≤n/2\(. Symmetrically, \)P_n/2(V)_i=Z_n/2+i^(t)\( for \)i≤n/2\(. Combining, \)P_n(Z)=(Z_1^(t),…,Z_n^(t))\(. \)

Proof
Definition 371 Local Polarization, Def. 16.5.2

A sequence of random variables

\(\)X_0,…,X_j,…\( with \)X_j[0,1]\( is \emph{locally polarizing} if: \begin{enumerate} \item \textbf{(Unbiased)} For every \end{enumerate}\)j\( and \)a[0,1]\(: \)E[X_j+1∣X_j=a]=a\(. \item \textbf{(Variance in the middle)} For every \)τ>0\( there exists \)θ=θ(τ)>0\( such that for all \)j\(: if \)X_j(τ,1-τ)\( then \)|X_j+1-X_j|≥θ\(. \item \textbf{(Suction at the ends)} For every \)c<∞\( there exists \)τ=τ(c)>0\( such that (i) if \)X_j≤τ\( then \)[X_j+1≤X_j/c]≥1/2\(; and (ii) if \)1-X_j≤τ\( then \)[1-X_j+1≤(1-X_j)/c]≥1/2\(. \)

The sequence is simple if for every sequence \(\)a_0,…,a_j\(, conditioned on \)X_0=a_0,…,X_j=a_j\(, there are two values \)a^+,a^ℓ\( such that \)X_j+1\( equals \)a^+\( with probability \)1/2\( and \)a^ℓ\( with probability \)1/2\(. \)

Definition 372 (Polynomially) Strong Polarization, Def. 16.5.4
#

A sequence of random variables

\(\)X_0,X_1,…,X_t,…\( with \)X_j[0,1]\( \emph{strongly polarizes} if for all \)γ>0\( there exist \)α<1\( and \)β<∞\( such that for all \)t\(: \)[X_t(γ^t,1-γ^t)]≤β·α^t\(. \)

Definition
#

16.5.2 Local Polarization

Proposition 373 Prop. 16.5.6

For every

\(\)1≤j≤t\( and \)j\(th-level siblings \)i,i’\( with \)i<i’\(, the following hold: \begin{enumerate} \item \end{enumerate}\)S_i,j=S_i’,j=S_i,j-1∪S_i’,j-1\( (a disjoint union). \item \){Z_k^(j)∣kS_i,j}\( is independent of \){Z_k^(j)∣kS_i,j}\(. \item \)H(Z_i^(j-1)∣Z_<i^(j-1)) = HZ_i^(j-1)∣Z^(j-1)_{kS_i,j-1,k<i} = HZ_i^(j-1)∣Z^(j-1)_{kS_i,j,k<i}\(. \item \)H(Z_i’^(j-1)∣Z_<i’^(j-1)) = HZ_i’^(j-1)∣Z^(j-1)_{kS_i’,j-1,k<i’} = HZ_i’^(j-1)∣Z^(j-1)_{kS_i,j,k<i’}\(. \item \)H(Z_i^(j)∣Z_<i^(j)) = HZ_i^(j)∣Z^(j)_{kS_i,j,k<i} = HZ_i^(j)∣Z^(j-1)_{kS_i,j,k<i}\(. \item \)H(Z_i’^(j)∣Z_<i’^(j)) = HZ_i’^(j)∣{Z_i^(j)}∪Z^(j)_{kS_i,j,k<i’} = HZ_i’^(j)∣{Z_i^(j)}∪Z^(j-1)_{kS_i,j,k<i’}\(. \)

Proposition
#
Proof

Part (1) follows from the definitions of

\(\)S_i,j\( and of \)j\(th-level siblings: since \)i’=i+2^t-j\(, \)i≡i’2^t-j\(, giving \)S_i,j=S_i’,j\(. Also, \)k_1≡k_22^t-j+1\( implies \)k_1≡k_22^t-j\(, so \)S_i,j-1,S_i’,j-1⊆S_i,j\(; and if \)k≡i2^t-j+1\( then \)ki’2^t-j\( and vice versa, so \)S_i,j-1∩S_i’,j-1=∅\(. Part (2) follows from Lemma~ \ref{lem:c16-siblings-determined}: the set \){Z_k^(j)∣kS_i,j}\( is determined entirely by \){Z_k∣kS_i,j}\( (and similarly for \)kS_i,j\(, which is determined by \){Z_k∣kS_i,j}\(), and since the original \)Z_k\('s (\)k[n]\() are all independent, these two determined collections are independent. The first equality in part (3) follows immediately from part (2) (conditioning on variables independent of \)Z_i^(j-1)\( and \)Z_<i^(j-1)\( jointly does not change the entropy), and the second uses part (1) and the fact that \)S_i,j-1\( and \)S_i’,j-1\( are disjoint (the elements newly added going from \)S_i,j-1\( to \)S_i,j\( that are \)<i\( lie in \)S_i’,j-1\(, and by part (2) do not affect the conditional entropy of \)Z_i^(j-1)\(). The first equality in part (4) is similar; the second uses additionally that \)S_i’,j-1\( contains no elements between \)i\( and \)i’\(, together with part (2) (which gives that \)Z_i^(j-1)\( is independent of \)Z_i’^(j-1)\( and of \)Z^(j-1)_{kS_i,j-1,k<i}\(), and part (1). The first equalities in parts (5) and (6) are similar to part (3), using in part (6) that \){kS_i’,j∣k<i’}={i}∪{kS_i,j∣k<i}\(. The second equalities in parts (5) and (6) follow from Lemma~ \ref{lem:c16-siblings-bijection} (a one-to-one correspondence between the level-\)(j-1)\( and level-\)j\( variables indexed by \){kS_i,j∣k<i}\(, so conditioning on one set is equivalent to conditioning on the other). \)

Proof
Lemma 374 Lem. 16.5.7

For every

\(\)i\( and \)0≤j≤t\(, the set \){Z_k^(j)∣kS_i,j}\( is determined completely by \){Z_k∣kS_i,j}\(, and the \)Z_k\('s (for \)k[n]\() are all independent. \)

Lemma
#
Proof

The independence of the original

\(\)Z_k\('s is immediate, since \)Z∼Bern(p)^n\(. We prove the determinacy claim by induction on \)j\(. For \)j=0\(, \)Z_k^(0)=Z_k\(, so the claim is trivial. For \)j≥1\(: fix \)i\( and let \)i’=i+2^t-j\( be its \)j\(th-level sibling. Directly from the definition of \)S_i,j\(: \)S_i,j=S_i,j-1⊔S_i’,j-1\( (a disjoint union — both sets consist of indices congruent to \)i\( mod \)2^t-j+1\( or mod \)2^t-j+1\( shifted by \)2^t-j\( respectively, together covering exactly the residue class mod \)2^t-j\(). By the recursive rule \)(Z_k^(j),Z_k+2^t-j^(j))=P_2(Z_k^(j-1),Z_k+2^t-j^(j-1))\( (applied for each \)kS_i,j-1\(, pairing it with its counterpart in \)S_i’,j-1\(), every \)Z_k^(j)\( with \)kS_i,j\( is a function of \){Z_l^(j-1)∣lS_i,j-1∪S_i’,j-1}={Z_l^(j-1)∣lS_i,j}\(. By the inductive hypothesis applied separately to \)S_i,j-1\( and \)S_i’,j-1\( (both of which are sets of the same form \)S_k,j-1\( for a suitable representative \)k\(), this set is determined by \){Z_l∣lS_i,j-1}∪{Z_l∣lS_i’,j-1}={Z_l∣lS_i,j}\(, completing the induction. \)

Proof
Lemma 375 Lem. 16.5.8

There is a one-to-one map from the variables

\(\)Z^(j-1)_{kS_i,j, k<i}\( to the variables \)Z^(j)_{kS_i,j, k<i}\(, and so conditioning on one set is equivalent to conditioning on the other. \)

Lemma
#
Lemma 376 Lem. 16.5.9

\(\)(U,A)\( and \)(V,B)\( are identically distributed, where \)U=Z_i^(j-1)\(, \)V=Z_i’^(j-1)\(, \)A={Z_k^(j-1)∣kS_i,j-1,k<i}\( and \)B={Z_k^(j-1)∣kS_i’,j-1,k<i’}\(, for \)j\(th-level siblings \)i<i’\(. \)

Lemma
#
Lemma 377 Lem. 16.5.10

Let

\(\)p_1,p_2[0,1/2]\( with \)p_1<p_2\( and \)τ(0,1/2)\(, and let \)p_1○p_2p_1(1-p_2)+p_2(1-p_1)\(. Then: \begin{enumerate} \item \end{enumerate}\)H(p_1○p_2)≥H(p_2)\(. \item There exists \)θ=θ(τ)>0\( such that if \)H(p_1),H(p_2)(τ,1-τ)\( then \)H(p_1○p_2)-H(p_2)≥θ\(. \item If \)H(p_1),H(p_2)≤τ\( then \)H(p_1○p_2)≥(1-9/(1/τ))(H(p_1)+H(p_2))\(. In particular, for every \)c<∞\(, if \)τ≤2^-9c\( then \)H(p_1)+H(p_2)-H(p_1○p_2)≤(H(p_1)+H(p_2))/c\(. \item If \)H(p_1),H(p_2)≥1-τ\( and \)τ≤1-H(1/4)\( then \)H(p_1○p_2)≥1-20τ(1-H(p_2))\(. In particular, for every \)c’<∞\(, if \)τ<1/(20c’)\( then \)1-H(p_1○p_2)≤(1-H(p_2))/c’\(. \)

Lemma
#
Proof

Part (1) is immediate from the monotonicity of

\(\)H(·)\( on \)[0,1/2]\( (), since \)p_2≤p_1○p_2≤1/2\( for \)0≤p_1,p_2≤1/2\(, so \)H(p_1○p_2)≥H(p_2)\(. For part (2): let \)H^-1(x)=p[0,1/2]\( be the inverse of \)H\( on \)[0,1/2]\(, and set \)α=α(τ)=H^-1(τ)(1-2H^-1(1-τ))\(, \)β=β(τ)=2H^-1(1-τ)(1-H^-1(1-τ))\(, and \)γ=γ(τ)=((1-β)/β)\(; note \)α>0,β<1/2,γ>0\(. We show \)H(p_1○p_2)-H(p_2)≥αγ=:θ\(. Since \)p_1,p_2(H^-1(τ),H^-1(1-τ))\(, \[ p_1\circ p_2-p_2 = p_2(1-2p_1)+p_1-p_2 = p_1(1-2p_2)\ge H^{-1}(\tau )(1-2H^{-1}(1-\tau ))=\alpha . \] By elementary calculus (writing \)H’(q)=((1-q)/q)\( ), \)_p_2≤q≤p_1○p_2{H’(q)}≥H’(β)=γ\( (using that the maximum value of \)q=p_1○p_2\( over the range is at most \)β\(, achieved using \)p_1,p_2≤H^-1(1-τ)\(, and \)H’\( is decreasing). Hence, by the mean value theorem, \)H(p_1○p_2)-H(p_2)≥(p_1○p_2-p_2)·_p_2≤q≤p_1○p_2{H’(q)}≥αγ=θ\(. For part (3): recall \)H(p)≥p(1/p)\(, and for \)p≤1/2\(, \)-(1-p)(1-p)≤(1/2)(1-p)(p+p^2)≤(1/2)p≤2p\(, so \[ p\log (1/p)\le H(p)\le p\log (1/p)+2p, \qquad (p\le 1/2). \tag {} \]\)*\(\)

\[ \]

We have

\begin{align*} H(p_1)+H(p_2)-H(p_1\circ p_2) & \le p_1(\log (1/p_1)+2)+p_2(\log (1/p_2)+2)-(p_1\circ p_2)\log (1/(p_1\circ p_2))\\ & \le p_1(\log (1/p_1)+2)+p_2(\log (1/p_2)+2)-(p_1+p_2-2p_1p_2)\log (1/(2p_2))\\ & = p_1\log (2p_2/p_1)+p_2\log (2p_2/p_2)+2p_1p_2\log (1/(2p_2))+2(p_1+p_2)\\ & \le p_1\log (p_2/p_1)+2p_1p_2\log (1/(p_2))+6p_2\\ & \le 2p_1H(p_2)+7p_2 \le 2p_1H(p_2)+7H(p_2)/\log (1/p_2) \le 9H(p_2)/\log (1/\tau ), \end{align*}

using \(\)(*)\( throughout, \)p_1○p_2≤p_1+p_2≤2p_2\(, and (from \)(*)\( and \)τ≥H(p_2)\() that \)p_2(1/p_2)≤τ\( hence \)p_2≤τ\( and \)(1/p_2)≥(1/τ)\(, and similarly \)p_1≤τ≤1/(1/τ)\( (using \)τ<1/2\(). The "in particular" follows by taking \)τ≤2^-9c\(. For part (4): let \)H(p_i)=1-y_i\( and \)p_i=1/2-x_i\( for \)i=1,2\(. Since \)τ≤1-H(1/4)\(, \)H(p_i)≥1-τ\( gives \)x_i≤1/4\(. By a standard quadratic bound on \)H\( near \)1/2\( (), from \)1-τ≥H(1/4)\( and \)1-y_i=H(p_i)≥1-τ\( we get \)1-x_i^2≥1-y_i\(, i.e., \)x_i≤y_i\(. Also \)p_1○p_2=1/2-2x_1x_2\(, so \[ H(p_1\circ p_2) = H(1/2-2x_1x_2) \ge 1-20(x_1x_2)^2 \ge 1-20y_1y_2 = 1-20(1-H(p_1))(1-H(p_2)) \ge 1-20\tau (1-H(p_2)), \] using \)x_i≤y_i\( and \)1-H(p_1)≤τ\(. The "in particular" follows by taking \)τ<1/(20c’)\(. \)

Proof
Lemma 378 Lem. 16.5.11
#

If

\(\)(U,A)\( and \)(V,B)\( are identically distributed and independent random variables with \)U,VF_2\( and \)H(U∣A)=H(V∣B)=H(p)\(, then: \begin{enumerate} \item For every \end{enumerate}\)τ>0\( there exists \)θ>0\( such that if \)H(p)(τ,1-τ)\( then \)H(U+V∣A,B)≥H(p)+θ\(. \item For every \)c<∞\( there exists \)τ>0\( such that if \)H(p)≤τ\( then \)H(U+V∣A,B)≥(2-1/c)H(p)\(, and if \)H(p)≥1-τ\( then \)H(U+V∣A,B)≥1-(1-H(p))\(. \)

Lemma
#
Proof

Let

\(\)p_a=[U=1∣A=a]\(, so \)H(p)=H(U∣A)=E_A[H(p_A)]\(; similarly let \)q_b=[V=1∣B=b]\(. \textbf{Part (1).} Let \)θ(·)\( be as in Lemma~ \ref{lem:c16-po-bound}(2), let \)θ_1=θ(τ/2)\(, and set \)θ={θ_1/9,τ^2/36}\(. Let \)r_1=_A[H(p_A)≤τ/2]\(, \)r_2=_A[H(p_A)(τ/2,1-τ/2)]\(, \)r_3=_A[H(p_A)≥1-τ/2]\(; since \)(U,A),(V,B)\( are identically distributed the same holds replacing \)p_A\( by \)q_B\(. As \)r_1+r_2+r_3=1\(, one of them is \)≥1/3\(. If \)r_2≥1/3\(: with probability \)≥1/9\(, both \)H(p_A),H(q_B)(τ/2,1-τ/2)\(. For such \)a,b\(, \)U+V∣(A=a,B=b)∼Bern(p_a○q_b)\( and by Lemma~ \ref{lem:c16-po-bound}(2), \)H(U+V∣A=a,B=b)≥H(p_a)+θ_1\(; and by Lemma~ \ref{lem:c16-po-bound}(1), \)H(U+V∣A=a,B=b)≥H(p_a)\( for all \)a,b\(. Hence \)H(U+V∣A,B)≥E_A[p_A]+θ_1=H(p)+θ_1/9\(. If \)r_3≥1/3\(: since \)1-τ≥H(p)≥(1-r_3-_A[H(p_A)≤1-τ])(1-τ)+r_3(1-τ/2)\(, rearranging gives \)_A[H(p_A)≤1-τ]≥r_3τ/(2(1-τ))≥τ/6\(. So with probability \)≥τ/18\( we have \)A\( with \)H(p_A)≤1-τ\( and \)B\( with \)H(q_B)≥1-τ/2\(; for such \)a,b\(, by Lemma~ \ref{lem:c16-po-bound}(1), \)H(U+V∣A=a,B=b)≥H(q_b)≥H(p_a)+τ/2\(. Hence \)H(U+V∣A,B)≥E_A[H(p_A)]+τ^2/36=H(p)+τ^2/36\(. The case \)r_1≥1/3\( is analogous. In all cases \)H(U+V∣A,B)≥H(p)+θ\(. \textbf{Part 2 (case $H(p)\le \tau $).} Given \)c<∞\(, let \)τ’=τ(4c)\( be the constant from Lemma~ \ref{lem:c16-po-bound}(3) for constant \)4c\(, so \)H(p_1),H(p_2)≤τ’\( implies \)H(p_1○p_2)≥(1-)(H(p_1)+H(p_2))\(. Let \)τ=τ’/(2c)\( and assume \)H(p)≤τ\(. Let \)α=_A[H(p_A)≥τ’]\(; by Markov's inequality, \)α≤1/(2c)\(. Let \)γ=E_A[H(p_A)∣H(p_A)≥τ’]\( and \)δ=E_A[H(p_A)∣H(p_A)<τ’]\(, so \)H(p)=γα+δ(1-α)\(. Splitting on whether \)H(p_A)≥τ’\( and \)H(q_B)≥τ’\( (four cases \)S_11,S_10,S_01,S_00\(): on \)S_10∪S_01∪S_11\( (probability \)2α-α^2\(), by Lemma~ \ref{lem:c16-po-bound}(1) applied to whichever of \)p_a,q_b\( has entropy \)≥τ’\(, \)E[H(U+V∣A,B)∣(A,B)S_10∪S_01∪S_11]≥γ\(; on \)S_00\( (probability \)(1-α)^2\(), by Lemma~ \ref{lem:c16-po-bound}(3), \)E[H(U+V∣A,B)∣S_00]≥(1-)·2δ\(. Combining, \[ H(U+V\mid A,B) \ge (2\alpha -\alpha ^2)\gamma +(1-\alpha )^2\Big(1-\frac1{4c}\Big)2\delta = 2H(p)-\alpha H(p)-\frac1{2c}(1-\alpha )((1-\alpha )\delta ), \] using \)H(p)=γα+δ(1-α)\(. Using \)(1-α)δ≤H(p)\( and \)α≤1/(2c)\(, this gives \)H(U+V∣A,B)≥(2-1/c)H(p)\(. The case \)H(p)≥1-τ\( is symmetric (interchanging the roles of \)0\( and \)1\( throughout). \)

Proof
Lemma 379 Lem. 16.5.12

\(\)(U,A)\( and \)(V,B)\( are identically distributed, in the setting of the proof of Lemma~ \ref{lem:c16-local-polarization} below. \)

Lemma
#
Lemma 380 Local Polarization, Lem. 16.5.3

The sequence

\(\)X_0,…,X_j,…\( with \)X_j=H(Z_i^(j)∣Z_<i^(j))\( (where \)i\( is drawn uniformly from \)[n]\(, independently for each \)j\() is a simple and locally polarizing sequence. \)

Lemma
#
Proof

Fix

\(\)j\( and condition on \)X_j=H(p)\(. For any \)(j+1)\(-level siblings \)i<i’\(, drawing \)I\( uniformly from \)[n]\( is, conditioned on \)I{i,i’}\(, equally likely to give \)I=i\( or \)I=i’\( (since \){i,i’}\( partition \)[n]\( over all sibling pairs as \)I\( ranges over \)[n]\(). Let \)U=Z_i^(j)\(, \)V=Z_i’^(j)\(, \)A={Z_k^(j)∣k<i,kS_i,j}\(, \)B={Z_k^(j)∣k<i’,kS_i’,j}\(. If \)I=i\(, then \)X_j=H(U∣A)\( (Proposition~ \ref{prop:c16-siblings-structure}(3)); if \)I=i’\(, then \)X_j=H(V∣B)\( (Proposition~ \ref{prop:c16-siblings-structure}(4)). Furthermore, if \)I=i\(, \)X_j+1=H(U+V∣A,B)\( (using the recursive definition and Proposition~ \ref{prop:c16-siblings-structure}(1),(5)); and if \)I=i’\(, \)X_j+1=H(V∣A,B,U)\( (using Proposition~ \ref{prop:c16-siblings-structure}(1),(6), and the fact that \)V∣U\( and \)V∣U+V\( have the same distribution, since \)(U,V)↦(U+V,V)\( is a bijection). Since these conditions hold for each conditioning \)I{i,i’}\(, and the pairs \){i,i’}\( partition \)[n]\(, they hold unconditionally for a uniformly random \)I\(. The unbiasedness condition, \)E[X_j+1∣X_j=a]=a\(, follows from the fact that there is a bijection between \)(U,V)\( and \)(U+V,V)\(, so \[ H(U+V\mid A,B)+H(V\mid A,B,U) = H(U\mid A)+H(V\mid B) \] (both sides equal \)H(U,V∣A,B)\( by the chain rule, using independence of \)(U,A)\( and \)(V,B)\( established by Lemma~ \ref{lem:c16-uv-iid}). Writing \)2a\( for the right-hand side and \)2X_j+1\( for the left-hand side (since with probability \)1/2\( each, \)X_j+1\( takes one of these two values), this gives the unbiasedness condition. The variance-in-the-middle condition follows from Lemma~ \ref{lem:c16-conditioned-local-polarization}(1) (applied using the identical-distribution fact of Lemma~ \ref{lem:c16-uv-iid}), and the suction-at-the-ends condition follows from Lemma~ \ref{lem:c16-conditioned-local-polarization}(2). Simplicity holds since with probability \)1/2\(, \)X_j+1=H(U+V∣A,B)\(, and with probability \)1/2\(, \)X_j+1=H(V∣A,B,U)\( — exactly two values, each with probability \)1/2\(, as required by the definition of a simple sequence. \)

Proof

16.5.3 Local vs. Global Polarization

Lemma 381 Lem. 16.5.13

Let

\(\)γ>0\( be given and let \)c={4,γ^8/16}\(; let \)τ=τ(c)\( be from condition (3) of Definition~ \ref{def:c16-local-polarization} and \)θ={1-1/c,θ(τ)}\( from condition (2). For a simple, locally polarizing sequence \)X_0,…,X_t,…\(, let \)φ_j={X_j,1-X_j}\(. Then \)E[φ_j+1∣φ_j=a]≤(1-θ^2/16)·a\(. \)

Lemma
#
Proof

Without loss of generality assume

\(\)X_j≤1/2\( (the case \)X_j>1/2\( is symmetric, by swapping the roles of \)X_j\( and \)1-X_j\(), so \)a=φ_j=X_j\( and \)X_j=a^2\(. By simplicity and unbiasedness, there is \)δ\( such that \)X_j+1=a^2+δ\( with probability \)1/2\( and \)a^2-δ\( with probability \)1/2\(. If \)X_j≤τ\(, by unbiasedness and suction at the ends, \)δ≥(1-1/c)a^2\(; if \)X_j>τ\(, by variance in the middle, \)δ≥θ(τ)≥θ(τ)a^2\( (using \)a^2=X_j≤1\(). In either case \)δ≥θa^2\(. We bound \begin{align*} \mathbb {E}[\phi _{j+1}] & \le \mathbb {E}[\sqrt{X_{j+1}}] = \tfrac 12\sqrt{a^2+\delta }+\tfrac 12\sqrt{a^2-\delta } = \frac{a}{2}\Big(\sqrt{1+\delta /a^2}+\sqrt{1-\delta /a^2}\Big)\\ & \le \frac{a}{2}\Big(1+\frac{\delta }{2a^2}-\frac{\delta ^2}{16a^4}+1-\frac{\delta }{2a^2}-\frac{\delta ^2}{16a^4}\Big) = a\Big(1-\frac{\delta ^2}{16a^4}\Big) \le a\Big(1-\frac{\theta ^2}{16}\Big), \end{align*} where the second inequality uses \)1±x≤1±x/2-x^2/16\( for \)|x|≤1\( () and the last uses \)δ≥θa^2\(. \)

Proof
Lemma 382 Weakly Polynomial Polarization, Lem. 16.5.14
#

There exists

\(\)α_1<1\( such that for all even \)t\(, \)[X_t/2(α_1^t,1-α_1^t)]≤α_1^t\(. \)

Lemma
#
Proof

By induction on

\(\)j\( using Lemma~ \ref{lem:c16-phi-drop}, \)E[φ_j]≤(1-θ^2/16)^j\( (base case \)φ_0≤1\(). Let \)β=1-θ^2/16\( and \)α_1=β<1\(; then \)E[φ_t/2]≤β^t\(. By Markov's inequality, \)[φ_t/2≥α_1^t]≤β^t/α_1^t=α_1^t\(. If \)φ_t/2≤α_1^t\( then \)X_t/2(α_1^2t,1-α_1^2t)\(, hence \)X_t/2(α_1^t,1-α_1^t)\(; so \)[X_t/2(α_1^t,1-α_1^t)]≤α_1^t\(. \)

Proof
Lemma 383 Lem. 16.5.15, cf. Doob’s martingale inequality

For a simple, locally polarizing sequence with

\(\)τ=τ(c)\( as in Lemma~ \ref{lem:c16-phi-drop}: for all \)λ>0\(, if \)X_0≤λ\(, then the probability there exists \)j>0\( such that \)X_j≥τ\( is at most \)λ/τ\(. Similarly if \)X_0≥1-λ\(, then the probability there exists \)j>0\( such that \)X_j≤1-τ\( is at most \)λ/τ\(. \)

Lemma
#
Proof

We give the proof for

\(\)X_0≤λ\(; the case \)X_0≥1-λ\( is symmetric. We show \)[_0≤t≤T{X_t}≥τ]≤λ/τ\( for every integer \)T>0\(. Define \)Y_0=X_0\( and, for \)i≥1\(, \)Y_i=X_i\( if \)Y_i-1<τ\(, else \)Y_i=Y_i-1\(. By simplicity of \)X_i\('s, \)E[Y_i∣Y_i-1=a]=a\( for every \)i,a\( (unbiasedness applies once \)Y_i-1<τ\(, and is trivial once \)Y_i-1≥τ\(). Also \)_0≤t≤T{X_t}≥τ\( iff \)Y_T≥τ\(. Thus, by Markov's inequality, \[ \Pr \Big[\max _{0\le t\le T}\{ X_t\} \ge \tau \Big] = \Pr [Y_T\ge \tau ] \le \frac{\mathbb {E}[Y_T]}{\tau }, \] and \)E[Y_T]=E[Y_T-1]==E[Y_0]=E[X_0]≤λ\(, yielding the lemma. \)

Proof
Lemma 384 Weak to Strong Polarization, Lem. 16.5.16

There exists

\(\)α_2<1\( such that for every \)λ>0\(, if \)X_0(λ,1-λ)\(, then the probability that \)X_t/2(γ^t,1-γ^t)\( is at most \)λ/τ+α_2^t\( (where \)τ=τ(c)\(, \)c≥4\(, is as in the construction preceding Lemma~ \ref{lem:c16-phi-drop}, and \)γ\( is the target parameter). \)

Lemma
#
Proof

We consider

\(\)X_0≤λ\( (the case \)X_0≥1-λ\( is symmetric). Let \)Z_i=1\( if \)X_i<X_i-1\( and \)0\( otherwise; by simplicity, the \)Z_i\('s are i.i.d.\ and equal to \)1\( with probability \)1/2\(. Let \)Z=∑_i=1^t/2Z_i\(. Let \)E_1\( be the event that there exists \)1≤j≤t/2\( with \)X_j≥τ\(, and \)E_2\( the event that \)Z≤t/8\(. By Lemma~ \ref{lem:c16-doob}, \)E_1\( has probability at most \)λ/τ\(; by a Chernoff bound, \)E_2\( has probability at most \)α_2^t\( for some \)α_2<1\(. If \)E_1\( does not occur, then using simplicity: with probability \)1/2\(, \)X_1≤(1/c)X_0≤X_0/4\( (suction at the ends), and with probability \)1/2\(, \)X_1≤2X_0\( (unbiasedness bounds the other branch); iterating, \)X_t/2≤2^t/2·c^-ZX_0≤2^t/2c^-Z\(. If additionally \)E_2\( does not occur, \)X_t/2≤2^t/2·c^-t/8≤γ^t\(, by the choice \)c=(2/γ^2)^4\(. Combining, if neither \)E_1\( nor \)E_2\( occurs then \)X_t/2≤γ^t\(, giving \)[X_t/2≥γ^t] ≤[E_1]+[E_2] ≤λ/τ+α_2^t\(, and (for \)t\( large enough that \)γ^t<1-γ^t\() this bounds \)[X_t/2(γ^t,1-γ^t)]\(. \)

Proof
Lemma 385 Local vs. Global Polarization, Lem. 16.5.5

If a sequence

\(\)X_0,…,X_t,…\( with \)X_t[0,1]\( is simple and locally polarizing, then it is also strongly polarizing. \)

Lemma
#
Proof

Given

\(\)γ>0\(, let \)α_1<1\( be the constant from Lemma~ \ref{lem:c16-weak-poly-polarization} and \)α_2<1\( the constant from Lemma~ \ref{lem:c16-weak-to-strong}. Set \)α={α_1,α_2}<1\( and \)β=2+1/τ<∞\( (with \)τ=τ(c)\( as in Lemma~ \ref{lem:c16-weak-to-strong}, \)c={4,γ^8/16}\(). Let \)E\( be the event that \)X_t/2(α_1^t,1-α_1^t)\(; by Lemma~ \ref{lem:c16-weak-poly-polarization}, \)[E]≤α_1^t\(. Conditioned on \)E\( not occurring, applying Lemma~ \ref{lem:c16-weak-to-strong} with \)λ=α_1^t\( gives \)[X_t(γ^t,1-γ^t)]≤α_1^t/τ+α_2^t\(. Combining, \[ \Pr [X_t\in (\gamma ^t,1-\gamma ^t)] \le \alpha _1^t+\frac{\alpha _1^t}{\tau }+\alpha _2^t \le \alpha ^t+\frac{\alpha ^t}{\tau }+\alpha ^t = \beta \cdot \alpha ^t, \] as desired. \)

Proof

16.7 Summary and Additional Information

The reduction of error-correction to compression (Proposition 353) and the idea of using polarization to build a compression scheme (Definition 355, Theorem 360) together give a complete construction. One remaining subtlety, not addressed in the main construction, is that of efficiently identifying the set \(\)S\( of unpredictable columns; a polynomial-time algorithm for this (which also generalizes the strong-polarization analysis of this chapter beyond the binary case) is known but out of scope here. \begin{thmenv}[Mrs.\ Gerber's Lemma consequence, Lem.~ 16.7.1] \label{lem:c16-gerber} \notready \uses{def:c16-entropy, def:c16-conditional-entropy} If \end{thmenv} \)(U,A)\( and \)(V,B)\( are independent and \)U,V\( are binary-valued random variables with \)H(U∣A)=H(p)\( and \)H(V∣B)=H(q)\(, then \[ H(U+V\mid A,B) \ge H(p(1-q)+q(1-p)). \] \)

Lemma
#