16 Information Theory Strikes Back: Polar Codes

This chapter introduces Polar codes, a class of codes developed from purely information-theoretic insights, and shows how they resolve the question of getting arbitrarily close to capacity for BSC_p$ with block length and decoding time polynomial in $1/ε$. We assume basic notions of Entropy and Conditional Entropy from information theory (the book’s Appendix~ E); the basic facts we use are recorded below as settled leaves. \begin{thmenv}[Entropy, Def.~ E.1.2] \label{def:c16-entropy} \mathlibok For a discrete random variable \end{thmenv} $X$ supported on a finite set $Ω$, the entropy of $X$ is \[ H(X) = \sum _{x\in \Omega }\Pr [X=x]\log \frac{1}{\Pr [X=x]}, \] (logarithms base $2$, with the convention $0(1/0)=0$). $

Definition

Definition 346 Conditional entropy, Def. E.2.2

✓

For jointly distributed random variables

X,Y$, the entropy of $X$ conditioned on $Y$ is \[ H(X\mid Y) = \mathbb {E}_{y\sim Y}\big[H(X\mid Y=y)\big]. \] $

Definition

Lemma 347 Chain rule, Thm. E.2.4

✓

For jointly distributed random variables

X,Y$: $H(X,Y) = H(Y) + H(X∣Y)$. $

Lemma

Lemma 348 Conditioning does not increase entropy, Lem. E.2.6

✓

For jointly distributed random variables

X,Y$: $H(X∣Y)≤H(X)$. $

Lemma

Lemma 349 Uniform distribution maximizes entropy, Lem. E.1.3

✓

X$ is supported on $Ω$ then $H(X)≤|Ω|$. $

Lemma

16.1 Achieving Gap to Capacity

Recall that for p[0,1/2)$, $BSC_p$ takes input $X=(X_1,…,X_n)$ and outputs $Y=(Y_1,…,Y_n)$ where independently for each $i$, $Y_i=X_i$ with probability $1-p$ and $Y_i≠X_i$ with probability $p$; we write $Z=(Z_1,…,Z_n)Bern(p)^n$ for the error pattern, so $Y=X+Z$. \begin{thmenv}[Main theorem, Thm.~ 16.1.1] \label{thm:c16-main-capacity} \uses{def:c06-bsc, prop:c16-compression-to-code, thm:c16-strong-polarization, prop:c16-encoder-runtime, cor:c16-bpd-correctness, prop:c16-basic-polar-rate} For every \end{thmenv} $p[0,1/2)$ there is a polynomial $n_0(·)$ such that for every $ε>0$ there exist $k,n$ and an encoder $E:F_2^kF_2^n$ and decoder $D:F_2^nF_2^k$ satisfying: \begin{itemize} \item \textbf{Length and Rate.} \end{itemize}$1/ε≤n≤n_0(1/ε)$ and $k≥(1-H(p)-ε)·n$. \item \textbf{Running times.} $E$ and $D$ run in time $O(nn)$ (the $O(·)$ hides a universal constant independent of $p,ε$). \item \textbf{Failure Probability.} For every $mF_2^k$, \[ \Pr _{\mathbf{Z}\in \mathrm{Bern}(p)^n}[\mathbf{m}\ne D(E(\mathbf{m})+\mathbf{Z})]\le \varepsilon . \] $

Theorem

Proof ▶

By Proposition 353, it suffices to find, given

p[0,1/2)$ and $ε>0$, an $(ε,ε/n)$-polarizing matrix $PF_2^n×n$ with $n$ bounded by a polynomial in $1/ε$, such that multiplication by $P$ and decompression both take $O(nn)$ time. Theorem~ \ref{thm:c16-strong-polarization} applied with parameters $p,ε$ and $c=2$ yields $n$ and the matrix $P=P_nF_2^n×n$ (Definition~ \ref{def:c16-basic-polarizing-matrix}) that is $(ε,1/n^2)$-polarizing. Since Theorem~ \ref{thm:c16-strong-polarization} also guarantees $ε≥1/n$, we have that $P_n$ is $(ε,ε/n)$-polarizing. Furthermore there exists a set $S⊆[n]$ with $|S|≤(H(p)+ε)n$ such that $H(W_S∣W_S)≤ε$ when $W=Z·P$ and $Z∼Bern(p)^n$; equivalently (Proposition~ \ref{prop:c16-basic-polar-rate}) the rate of the resulting Basic Polar Code is at most $H(p)+ε$. By Proposition~ \ref{prop:c16-encoder-runtime}, compressing $ZF_2^n$ to $(Z·P)_S$ takes $O(nn)$ time. Finally Corollary~ \ref{cor:c16-bpd-correctness} asserts that a decompression $Z$ can be computed given $(Z·P)_S$ in time $O(nn)$, and that $Z$ equals $Z$ with probability at least $1-ε$. Composing with the linear-algebraic reduction of Proposition~ \ref{prop:c16-compression-to-code} completes the proof. $

Proof ▶

Theorem 351 Strengthened main theorem, Thm. 16.1.2

There exist polynomially growing functions

n_0:[0,1]Z^+$ and $T:Z^+Z^+$ such that for all $p[0,1]$, $ε>0$ there exists $δ>0$, a function $k:Z^+Z^+$ and an ensemble of functions $E={E_n}_n$ and $D={D_n}_n$ such that for all $nZ^+$ with $n≥n_0(1/ε)$ the following hold. \begin{enumerate} \item The codes are \end{enumerate}$ε$-close to capacity: $k=k(n)≥(1-H(p)-ε)n$, $E_n:F_2^kF_2^n$ and $D_n:F_2^nF_2^k$. \item The codes correct a $p$-fraction of errors with all but exponentially small probability: \[ \Pr _{\mathbf{Z}\sim \mathrm{Bern}(p)^n,\ \mathbf{X}\sim U(\mathbb {F}_2^k)}[D_n(E_n(\mathbf{X})+\mathbf{Z})\ne \mathbf{X}]\le \exp (-\delta n). \] \item Encoding and decoding are efficient: $E_n$ and $D_n$ run in time at most $T(n/ε)$. $

Theorem

16.2 Reduction to Linear Compression

Definition 352 τ$-error linear compression scheme$

For n≥m$, a pair $(H,D)$ with $HF_2^n×m$ and $D:F_2^mF_2^n$ forms a $τ$-error linear compression scheme$ for Bern(p)^n$ if $H$ has rank $m$ and \[ \Pr _{\mathbf{Z}\sim \mathrm{Bern}(p)^n}[D(\mathbf{Z}\cdot \mathbf{H})\ne \mathbf{Z}]\le \tau . \] The ratio $m/n$ is called the \emph{(compression) rate} of the scheme. $

Proposition 353 Prop. 16.2.1

Let

(H,D)$ be a $τ$-error linear compression scheme for $Bern(p)^n$ with $HF_2^n×m$. Let $k=n-m$ and let $GF_2^k×n$ and $G^*:F_2^n×k$ be full-rank matrices such that $G·H=0$ and $G·G^*=I_k$ (the $k×k$ identity matrix). Then the encoder $E:F_2^kF_2^n$ given by $E(X)=X·G$ and the decoder $D’:F_2^nF_2^k$ given by $D’(Y)=(Y-D(Y·H))·G^*$ satisfy, for every $mF_2^k$, \[ \Pr _{\mathbf{Z}\in \mathbb {F}_2^n}[\mathbf{m}\ne D'(E(\mathbf{m})+\mathbf{Z})]\le \tau . \] $

Proposition

Proof ▶

Suppose

D(Z·H)=Z$; we claim decoding is successful. Indeed, \begin{align*} D’(E(\mathbf{m})+\mathbf{Z}) & = (E(\mathbf{m})+\mathbf{Z}-D((E(\mathbf{m})+\mathbf{Z})\cdot \mathbf{H}))\cdot \mathbf{G}^*\\ & = (E(\mathbf{m})+\mathbf{Z}-D(\mathbf{Z}\cdot \mathbf{H}))\cdot \mathbf{G}^*\\ & = (E(\mathbf{m})+\mathbf{Z}-\mathbf{Z})\cdot \mathbf{G}^*\\ & = E(\mathbf{m})\cdot \mathbf{G}^*\\ & = \mathbf{m}. \end{align*} The second equality uses $E(m)·H=m·G·H=0$; the third uses the assumption $D(Z·H)=Z$; the last uses $E(m)=m·G$ and $G·G^*=I$. Thus the probability of a decoding failure is at most the probability of a decompression failure, which by definition is at most $τ$. $

Proof ▶

16.3 The Polarization Phenomenon

16.3.1 Information Theory Review

Proposition 354 Prop. 16.3.1

Let

α≥0$. \begin{enumerate} \item Let \end{enumerate}$X$ be a random variable with $H(X)≤α$. Then there exists $x$ such that $_X[X≠x]≤α$. \item Let $(X,Y)$ be jointly distributed variables with $H(X∣Y)≤α$. Then the function $A(y)=argmax_x{[X=x∣Y=y]}$ satisfies $_(X,Y)[X≠A(Y)]≤α$. $

Proposition

Proof ▶

For part (1), for

i$ in the support of $X$ let $p_i=_X[X=i]$ and let $x=argmax_i{p_i}$; let $γ=1-p_x=_X[X≠x]$. We show $γ≤α$. If $γ≤1/2$, \begin{align*} \alpha \ge H(X) = \sum _i p_i\log \frac1{p_i} & \ge \sum _{i\ne x}p_i\log \frac1{p_i} \ge \sum _{i\ne x}p_i\log \frac1{\sum _{j\ne x}p_j}\\ & = \Big(\sum _{i\ne x}p_i\Big)\log \frac1{\sum _{j\ne x}p_j} = \gamma \log \frac1\gamma \ge \gamma , \end{align*} where the first inequality drops the non-negative $x$-summand, the second uses $p_i≤∑_j≠xp_j$ for $i≠x$, and the last uses $γ≤1/2$ so $(1/γ)≥1$. If $γ>1/2$, then $α≥H(X)=∑_ip_i(1/p_i)≥∑_ip_i(1/p_x)=(1/p_x)=(1/(1-γ))≥1≥γ$, using $p_i≤p_x$ for all $i$ and $γ≥1/2$. For part (2), given $y$ and $i$, let $p_i,y=_X[X=i∣Y=y]$ and let $x_y=argmax_i{p_i,y}$. Let $γ_y=1-p_x_y,y$, so $γ=E_Y[γ_Y]$ where $γ=_(X,Y)[X≠A(Y)]$. Let $α_y=H(X∣Y=y)$, so $α=H(X∣Y)=E_Y[α_Y]$. By part (1) applied to the conditional distribution of $X$ given $Y=y$, we have $γ_y≤α_y$ for every $y$, and hence $γ=E_Y[γ_Y]≤E_Y[α_Y]=α$. $

Proof ▶

16.3.2 Polarized Matrices and Decompression

Definition 355 Polarizing matrix, unpredictable columns, Def. 16.3.2

An invertible matrix

PF_2^n×n$ is $(ε,τ)$-polarizing for $Bern(p)^n if, writing W=Z·P$ for $ZBern(p)^n$ and $W_<i=(W_1,…,W_i-1)$, the set \[ S = S_\tau = \{ i\in [n]\mid H(W_i\mid \mathbf{W}_{{\lt}i})\ge \tau \} \] satisfies $|S|≤(H(p)+ε)n$. We refer to $S$ as the set of \emph{unpredictable columns} of $P$ (and ${W_i}_iS$ as the \emph{unpredictable bits} of $W$). $

Definition

Algorithm 356 Polar Compressor, Alg. 16.3.1

Input: A polarizing matrix

PF_2^n×n$, a string $ZF_2^n$, and a subset $S⊆[n]$. \textbf{Output:} A compressed string $WF_2^|S|$. \textbf{Procedure:} return $(Z·P)_S$, i.e., the coordinates of $Z·P$ indexed by $S$. $

Algorithm

If P_S$ denotes the $n×|S|$ matrix whose columns correspond to the indices in $S$, then the compression of $Z$ equals $Z·P_S$. Thus $P_S$ plays the role of $H$ from Proposition~ \ref{prop:c16-compression-to-code}, with $G=(P^-1)_S$ and $G^*=P_S$ (where $S$ is the complement of $S$); in particular the complexity of multiplying a vector by $P$ and $P^-1$ dominates the encoding/decoding cost. \begin{thmenv}[Successive Cancellation Decompressor, Alg.~ 16.3.2] \label{alg:c16-scd} \uses{def:c16-polarizing-matrix, prop:c16-predictable} \textbf{Input:} \end{thmenv} $S⊆[n]$, $WF_2^S$, and a polarizing matrix $PF_2^n×n$. \textbf{Output:} $ZF_2^n$ such that $(Z·P)_S=W$. \textbf{Procedure} $SCD(W,P,S)$: \begin{enumerate} \item For \end{enumerate}$i=1$ to $n$: if $iS$, set $W_iW_i$; otherwise set $W_iargmax_bF_2{[W_i=b∣W_<i=W_<i]}$ (using the predictor of Proposition~ \ref{prop:c16-predictable}). \item Return $ZW·P^-1$. $

Algorithm

Lemma 358 Lem. 16.3.3

P$ is $(ε,τ)$-polarizing for $Bern(p)^n$ with unpredictable columns $S$, then the successive cancellation decoder has failure probability at most $τn$: \[ \Pr _{\mathbf{Z}\in \mathrm{Bern}(p)^n}[\mathbf{Z}\ne \mathrm{SCD}((\mathbf{Z}\cdot \mathbf{P})_S,\mathbf{P},S)]\le \tau n. \] Thus, if $P$ is $(ε,ε/n)$-polarizing then the failure probability of the successive cancellation decoder is at most $ε$. $

Lemma

Proof ▶

Let

W=Z·P$. By Step~ 1 of Algorithm~ \ref{alg:c16-scd}, $W_i=W_i$ for every $iS$. For $iS$, applying Proposition~ \ref{prop:c16-predictable}(2) with $X=W_i$, $Y=W_<i$, $α=τ$ we get $_Z[W_i≠A_i(W_<i)]≤τ$ where $A_i(·)$ is the predictor of Proposition~ \ref{prop:c16-predictable}. By a union bound, \[ \Pr _{\mathbf{Z}}[\exists i \text{ s.t. } W_i\ne A_i(\widetilde{\mathbf{W}}_{{\lt}i})]\le \sum _{i=1}^n\Pr _{\mathbf{Z}}[W_i\ne A_i(\widetilde{\mathbf{W}}_{{\lt}i})]\le \tau n. \] By Step~ 1 and definition of $A_i(·)$, $W_i=A_i(W_<i)$ for $iS$. Thus if $W≠W$ there must exist some $i$ with $W_i≠W_i=A_i(W_<i)$; hence $[W≠W]≤τn$. Since $P$ is invertible, $W≠W$ iff $Z=W·P^-1≠W·P^-1=SCD(…)$, so $_Z[Z≠SCD((Z·P)_S,P,S)]≤τn$. $

Proof ▶

16.3.3 A Polarizing Primitive

We now describe an explicit construction of a polarizing matrix. The basic idea is to start with a simple 2×2$ \emph{polarization} step and iterate it. For two independent bits $Z_1,Z_2$ with $H(Z_1)=H(Z_2)=H(p)$, of the four linear combinations of $(Z_1,Z_2)$ over $F_2$, three are trivial ($0$, $Z_1$, $Z_2$); the only non-trivial new combination is $Z_1+Z_2$. This leads to the transform $P_2:(Z_1,Z_2)↦(W_1,W_2)=(Z_1+Z_2,Z_2)$, an invertible linear map given by the matrix $P_2=

$. One can show $H(W_1,W_2)=2H(p)$ while $H(W_1)>H(Z_1)=H(Z_2)=H(p)$ (since $[W_1=1]=2p(1-p)(p,1/2)$ and $H(·)$ is increasing there), and hence by the chain rule $H(W_2∣W_1)=H(W_1,W_2)-H(W_1)<H(Z_1),H(Z_2)$: the transform separates the entropy of two equally-entropic bits into a more entropic bit and a less entropic one. Iterating this $2×2$ transform recursively on blocks yields, for $n$ a power of two, a linear transform $P_n$ on $n$ bits (Figure~ 16.2 in the source); this is formalized as Definition~ \ref{def:c16-basic-polarizing-matrix} below. $

16.4 Polar Codes, Encoder and Decoder

16.4.1 The Code and Polarization Claims

Definition 359 Basic polarizing matrix, Def. 16.4.1

We define the

n×n$ polarization matrix $P_n$ recursively for $n=2,4,8,…$, by describing the linear map $P_n:Z↦Z·P_n$. For $n=2$ and $ZF_2^2$, $P_2(Z)=(Z_1+Z_2,Z_2)$. For $n=2^t$ and $Z=(U,V)$ with $U,VF_2^n/2$, \[ P_n(\mathbf{Z}) = \big(P_{n/2}(\mathbf{U}+\mathbf{V}),\, P_{n/2}(\mathbf{V})\big). \] $

Definition

Theorem 360 (Polynomially) Strong Polarization, Thm. 16.4.2

Fix

p(0,1/2)$ and constant $c$. There exists a polynomial function $n_0$ such that for every $ε>0$, there exists $n=2^t$ with $1/ε≤n≤n_0(1/ε)$ and a set $E⊆[n]$ with $|E|≤(ε/2)·n$ such that for every $iE$, the conditional entropy $H(W_i∣W_<i)$ (where $W=P_n(Z)$, $Z∼Bern(p)^n$) is either less than $n^-c$, or greater than $1-n^-c$. Furthermore, if $S={i[n]∣H(W_i∣W_<i)≥n^-c}$ then $|S|≤(H(p)+ε)·n$ and the matrix $P_n$ is $(ε,1/n^c)$-polarizing for $Bern(p)^n$ with unpredictable columns $S$. $

Theorem

Proof ▶

We assume without loss of generality

c≥1$ (proving the theorem for larger $c$ implies it for smaller $c$). Given $p$ and $c≥1$, let $γ=2^-c$. Let $β<∞$ and $α<1$ be the constants given by Definition~ \ref{def:c16-strong-polarization-seq} for this choice of $γ$, as guaranteed to exist by Lemma~ \ref{lem:c16-local-to-global} applied to the locally polarizing, simple sequence of Lemma~ \ref{lem:c16-local-polarization}. Set $n_0(x)={8x,2(2βx)^(1/α)}$, a polynomial in $x$. Given $ε>0$, let $t={(8/ε),(2β/ε)/(1/α)}$, so that $n=2^t≤{8/ε,2(2β/ε)^1/(1/α)}=n_0(1/ε)$. This choice gives $β·α^t≤ε/2$ and $γ^t=2^-ct=n^-c$. Also $n≥4/ε$, so $2n^-c≤2n^-1≤ε/2$. Recall the notation $(Z_i^(j))$ of Section~ 16.5.1 (see Definition~ \ref{def:c16-jth-siblings} and Lemma~ \ref{lem:c16-pn-via-levels}), where $W=P_n(Z)=(Z_1^(t),…,Z_n^(t))$, and let $X_j=H(Z_i^(j)∣Z_<i^(j))$ for $i$ chosen uniformly at random from $[n]$. By Lemma~ \ref{lem:c16-local-polarization} the sequence $X_0,…,X_t,…$ is simple and locally polarizing, so by Lemma~ \ref{lem:c16-local-to-global} it strongly polarizes; by the definition of strong polarization (with the above choice of $γ,α,β,t$), \begin{align*} \Pr _{i\in [n]}\big[H(W_i\mid \mathbf{W}_{{\lt}i})\in (n^{-c},1-n^{-c})\big] & = \Pr _i\big[H(Z_i^{(t)}\mid \mathbf{Z}_{{\lt}i}^{(t)})\in (\gamma ^t,1-\gamma ^t)\big]\\ & = \Pr [X_t\in (\gamma ^t,1-\gamma ^t)] \le \beta \alpha ^t \le \varepsilon /2. \end{align*} Thus the set $E={i[n]∣H(W_i∣W_<i)(n^-c,1-n^-c)}$ satisfies $|E|≤εn/2$. For the "Furthermore" part, by the chain rule $nH(p)=∑_i[n]H(W_i∣W_<i)≥∑_iS∖EH(W_i∣W_<i)≥(|S|-|E|)(1-n^-c)$, where the definitions of $S,E$ give the last inequality. Rearranging, \[ |S|\le \frac{nH(p)}{1-n^{-c}}+\varepsilon n/2\le nH(p)(1+2n^{-c})+\varepsilon n/2\le nH(p)+\varepsilon n. \] $

Proof ▶

Definition 361 Basic Polar (Compressing) Code, Def. 16.4.3

Given

0<p<1/2$ and $ε≤1/4$, let $n$ and $S⊆[n]$ be as given by Theorem~ \ref{thm:c16-strong-polarization} for $c=2$. The Basic Polar Code with parameters $p$ and $ε$ maps $ZF_2^n$ to $P_n(Z)_S$. $

Definition

Proposition 362 Rate of the Basic Polar Code, Prop. 16.4.4

For every

p(0,1/2)$ and $ε>0$, the rate of the Basic Polar Code with parameters $p$ and $ε$ is at most $H(p)+ε$. $

Proposition

Proof ▶

The compression scheme maps

ZF_2^n$ to $P_n(Z)_SF_2^|S|$, so its rate is $|S|/n$ (Definition~ \ref{def:c16-linear-compression}). By Theorem~ \ref{thm:c16-strong-polarization} (applied with $c=2$ as in Definition~ \ref{def:c16-basic-polar-code}), $|S|≤(H(p)+ε)n$, so the rate is at most $H(p)+ε$. $

Proof ▶

16.4.2 Encoding

Algorithm 363 Basic Polar Encoder, Alg. 16.4.1

Input:

n$ a power of two, $ZF_2^n$, and $S⊆[n]$. \textbf{Output:} Compression $WF_2^S$ of $Z$ given by $W=(P_n(Z))_S$. \textbf{Procedure:} return $W=(P(n,Z))_S$, where the recursive function $P(n,Z)$ is: if $n=1$ return $Z$; else, letting $U=(Z_1,…,Z_n/2)$ and $V=(Z_n/2+1,…,Z_n)$, return $(P(n/2,U+V), P(n/2,V))$. $

Algorithm

Proposition 364 Prop. 16.4.5

The runtime of the Basic Polar Encoder algorithm is

O(nn)$. $

Proposition

Proof ▶

T(n)$ denotes the runtime of the algorithm on inputs of length $n$, then $T(·)$ satisfies the recurrence $T(n)=2T(n/2)+O(n)$, since all operations other than the two recursive calls (namely computing $U+V$ and assembling the output) can be done in $O(n)$ time. This recurrence implies $T(n)=O(nn)$. $

Proof ▶

16.4.3 Decoding

Proposition 365 Bias update functions

Let

Z_1∼Bern(p_1)$ and $Z_2∼Bern(p_2)$ be independent. Then $Z_1+Z_2∼Bern(b^+(p_1,p_2))$ where \[ b^+(p_1,p_2) = p_1(1-p_2)+(1-p_1)p_2. \] Moreover, conditioned on $Z_1+Z_2=a$ for $a{0,1}$, $Z_2$ is a Bernoulli random variable with bias $b^ℓ(p_1,p_2,a)$ given by \[ b^\ell (p_1,p_2,0)=\frac{p_1p_2}{p_1p_2+(1-p_1)(1-p_2)},\qquad b^\ell (p_1,p_2,1)=\frac{(1-p_1)p_2}{(1-p_1)p_2+p_1(1-p_2)}. \] Both $b^+$ and $b^ℓ$ are computable in $O(1)$ arithmetic operations. $

Proposition

Proof ▶

The formula for

b^+$ is immediate: $Z_1+Z_2=1$ iff exactly one of $Z_1,Z_2$ equals $1$, which by independence has probability $p_1(1-p_2)+(1-p_1)p_2$. For $b^ℓ$, by Bayes' rule and independence of $Z_1,Z_2$, for $a{0,1}$ \[ \Pr [Z_2=1\mid Z_1+Z_2=a] = \frac{\Pr [Z_2=1]\Pr [Z_1=a\ominus 1]}{\Pr [Z_1+Z_2=a]}, \] where $a⊖1$ denotes $a-1 2$; for $a=0$ this equals $p_2p_1/(p_1p_2+(1-p_1)(1-p_2))$ (the event $Z_1+Z_2=0$ splits into $(Z_1,Z_2)=(0,0)$ or $(1,1)$) and for $a=1$ it equals $p_2(1-p_1)/((1-p_1)p_2+p_1(1-p_2))$ similarly. Each formula is a fixed rational expression in $p_1,p_2$, hence computable in $O(1)$ time. $

Proof ▶

Algorithm 366 Basic Polar Decoder, Alg. 16.4.2

Input:

n$ a power of two, $W(F_2∪{?})^n$, and $0≤p<1/2$. \textbf{Output:} $ZF_2^n$ such that for every $i$, either $W_i=?$ or $(Z·P)_i=W_i$. \textbf{Procedure} $BPD(W;n,p)$: let $(Z,W,ρ)=RPD(W;n,(p,…,p))$ and return $Z$, where the recursive subroutine $RPD(W;n,(p_1,…,p_n))$ (with $W(F_2∪{?})^n$) is defined by: \begin{itemize} \item If \end{itemize}$n=1$ and $W_1F_2$: return $(W_1,W_1,p_1)$. \item If $n=1$ and $W_1=?$: return $(1,1,p_1)$ if $p_1≥1/2$, else $(0,0,p_1)$. \item If $n≥2$: write $W=(W^(1),W^(2))$ with $W^(1)=W[1,…,n/2]$ and $W^(2)=W[n/2+1,…,n]$. For $i=1,…,n/2$ let $q_i=b^+(p_i,p_n/2+i)$. Let $(X,W^1,ρ^1)=RPD(W^(1);n/2,(q_1,…,q_n/2))$. For $i=1,…,n/2$ let $r_i=b^ℓ(p_i,p_n/2+i,X_i)$. Let $(Y,W^2,ρ^2)=RPD(W^(2);n/2,(r_1,…,r_n/2))$. Let $Z=(X+Y,Y)$, $W=(W^1,W^2)$, $ρ=(ρ^1,ρ^2)$; return $(Z,W,ρ)$. $

The output ρ=(ρ_1,…,ρ_n)$ records, for each $i$, the value $ρ_i=_Z[W_i=1∣W_<i=W_<i]$ when $Z∼Bern(p_1)××Bern(p_n)$. $

Algorithm

Lemma 367 Lem. 16.4.6

Let

Z∼Bern(p_1)××Bern(p_n)$ and $W=P_n(Z)$. Let $S⊆[n]$ and let $W’=(W_1’,…,W_n’)$ be given by $W_i’=W_i$ if $iS$ and $W_i’=?$ otherwise. Let $(Z,W,ρ)=RPD(W’;n,(p_1,…,p_n))$. If $H(W_i∣W_<i)≤τ$ for every $iS$, then: \begin{enumerate} \item For every \end{enumerate}$i$, $_Z[W_i=1∣W_<i=W_<i]=ρ_i$. \item $_Z[W≠W]≤τn$. \item $_Z[Z≠Z]≤τn$. $

Lemma

Proof ▶

The main part is part (1); part (2) follows almost immediately from part (1) and Proposition 354; part (3) is then straightforward. We prove them in turn.

Part (1): $\widetilde{\mathbf{W}}=P_n(\tilde{\mathbf{Z}})$, by induction on $n$. For

n=1$, the base cases of $RPD$ directly show $W_1=Z_1$. For larger $n$ (writing $Z=(U,V)$), by induction $W^1=P_n/2(X)$ and $W^2=P_n/2(Y)$, and the last step of the recursive procedure gives \[ P_n(\tilde{\mathbf{Z}}) = P_n((\tilde{\mathbf{X}}+\tilde{\mathbf{Y}},\tilde{\mathbf{Y}})) = \big(P_{n/2}(\tilde{\mathbf{X}}+\tilde{\mathbf{Y}}+\tilde{\mathbf{Y}}),P_{n/2}(\tilde{\mathbf{Y}})\big) = (\widetilde{\mathbf{W}}^1,\widetilde{\mathbf{W}}^2) = \widetilde{\mathbf{W}}, \] by Definition~ \ref{def:c16-basic-polarizing-matrix} and $Y+Y=0$. Now we prove $ρ_i=_Z[W_i=1∣W_<i=W_<i]$ by induction on $n$; the base case $n=1$ is immediate from the definition of $RPD$. For the inductive step, consider two cases. If $i≤n/2$: then $W_i=P_n/2(U+V)_i$ via the definition of $P_n$, and $(U+V)∼Bern(q_1)××Bern(q_n/2)$ by Proposition~ \ref{prop:c16-bias-functions}. By the inductive claim applied to the recursive call $RPD(W’[1,…,n/2];n/2,(q_1,…,q_n/2))$, $ρ_i=ρ_i^(1)=_(U+V)[W_i=1∣W_<i=W_<i] =_Z∼Bern(p_1)××Bern(p_n)[W_i=1∣W_<i=W_<i]$, where the last equality follows from the definition of $q_i$ (this uses that $W_<i$ for $i≤n/2$ is a deterministic function of $U+V$ alone). If $i>n/2$: the condition $W[1,…,n/2]=W[1,…,n/2]$ is equivalent to $U+V=X$ (using $W^1=P_n/2(X)$, invertibility of $P_n/2$, and part (1) proven above). Conditioning on $U+V=X$ implies, by Proposition~ \ref{prop:c16-bias-functions}, that $V∼Bern(r_1)××Bern(r_n/2)$ — exactly how the $r_i$'s were defined. Since $W[n/2+1,…,n]=P_n/2(V)$, we get \[ \Pr _{\mathbf{Z}}[W_i=1\mid \mathbf{W}_{{\lt}i}=\widetilde{\mathbf{W}}_{{\lt}i}] = \Pr _{\mathbf{V}\sim \mathrm{Bern}(r_1)\times \cdots \times \mathrm{Bern}(r_{n/2})}[W_i=1\mid \mathbf{W}_{{\lt}i}=\widetilde{\mathbf{W}}_{{\lt}i}], \] which by induction on the recursive call for $W^(2)$ equals $ρ^(2)_i-n/2=ρ_i$. This concludes part (1). \textbf{Part (2).} If $iS$, by the base-case/pass-through structure of $RPD$ we have $W_i=W_i$. Now assume $iS$. We claim $W_i=1$ iff $ρ_i=_Z[W_i=1∣W_<i=W_<i]≥1/2$: indeed, when $RPD$ is called on the base-case input $(W_i’,1,p_i’)$ with $W_i’=?$, part (1) gives $ρ_i=p_i’$, and then $W_i=argmax_bF_2[W_i=b∣W_<i=W_<i]$ by the base-case rule of Algorithm~ \ref{alg:c16-bpd}. By Proposition~ \ref{prop:c16-predictable} applied to $X=W_i$, $Y=W_<i$, $α=τ$: $[W_i≠W_i∣W_<i=W_<i]≤τ$. By a union bound over $i$, $[∃i s.t. W_i≠W_i∣W_<i=W_<i]≤τn$. If $W≠W$, there must exist an index $i$ with $W_i≠W_i$, so $[W≠W]≤τn$. \textbf{Part (3).} By part (1), $W=P_n(Z)$; also $W=P_n(Z)$. If $W=W$ (which holds with probability at least $1-τn$ by part (2)), invertibility of $P_n$ gives $Z=P_n^-1(W)=P_n^-1(W)=Z$. $

Proof ▶

Corollary 368 Cor. 16.4.7

For every input

n=2^t$, $p[0,1/2)$, and $W(F_2∪{?})^n$, the Basic Polar Decoder (Algorithm~ \ref{alg:c16-bpd}) runs in time $O(nn)$ and computes an output $ZF_2^n$ such that $P_n(Z)_i=W_i$ holds for every $i$ for which $W_i≠?$. Furthermore, if \begin{enumerate} \item \end{enumerate}$Z∼Bern(p)^n$, \item $W_S=P_n(Z)_S$, and \item $H(W_i∣W_<i)≤τ$ for every $iS$, $

then _Z[Z≠Z]≤τn$. $

Corollary

Proof ▶

BPD$ calls $RPD(W;n,(p,…,p))$, i.e., Lemma~ \ref{lem:c16-rpd-correctness} with $p_1==p_n=p$. The claimed output relation $P_n(Z)_i=W_i$ for $W_i≠?$ follows from part (1)/(2) of Lemma~ \ref{lem:c16-rpd-correctness} (taking $S={i:W_i≠?}$, so $W=W$ deterministically on the "given" coordinates by construction of $RPD$, and $W=P_n(Z)$ by part (1) of the lemma's proof). Under hypotheses (1)-(3), part (3) of Lemma~ \ref{lem:c16-rpd-correctness} directly gives $_Z[Z≠Z]≤τn$. For the runtime: $RPD$ satisfies the same divide-and-conquer recursion $T(n)=2T(n/2)+O(n)$ as the Basic Polar Encoder (Proposition~ \ref{prop:c16-encoder-runtime}), since computing $q_i=b^+(p_i,p_n/2+i)$ and $r_i=b^ℓ(p_i,p_n/2+i,X_i)$ for all $i=1,…,n/2$, as well as assembling $Z,W,ρ$, takes $O(n)$ time given that $b^+,b^ℓ$ are computable in $O(1)$ time (Proposition~ \ref{prop:c16-bias-functions}). Hence $T(n)=O(nn)$, as claimed. $

Proof ▶

16.5 Analysis: Speed of Polarization

16.5.1 Overview of Analysis

Write n=2^t$. For $i[n]$, $0≤j≤t$, let $Z_i^(0)=Z_i$, and for $1≤j≤t$ and $i$ such that $i$ and $i+2^t-j$ lie in the same block at level $j$, let \[ (Z_i^{(j)},Z_{i+2^{t-j}}^{(j)}) = P_2(Z_i^{(j-1)},Z_{i+2^{t-j}}^{(j-1)}) = (Z_i^{(j-1)}+Z_{i+2^{t-j}}^{(j-1)},\, Z_{i+2^{t-j}}^{(j-1)}). \] $

Definition 369 j$th-level siblings, Def.~ 16.5.1$

A pair (i,i’)$ are $j$th-level siblings$ if they are in the same block at the j$th level and $i’=i+2^t-j$. For $i[n=2^t]$ and $0≤j≤t$, let \[ S_{i,j} \stackrel{\text{def}}{=} \{ k\in [n]\mid i\equiv k\! \! \pmod{2^{t-j}}\} . \] $

Lemma 370 Eq. (16.12)

P_n(Z) = (Z_1^(t),…,Z_n^(t))$. $

Lemma

Proof ▶

By induction on

t$ (i.e., on $n=2^t$). For $t=0$ ($n=1$), $P_1(Z)=Z=(Z_1^(0))$ trivially. For $t≥1$, write $Z=(U,V)$ with $U=(Z_1,…,Z_n/2)$, $V=(Z_n/2+1,…,Z_n)$; by Definition~ \ref{def:c16-basic-polarizing-matrix}, $P_n(Z)=(P_n/2(U+V),P_n/2(V))$. By definition, for $i≤n/2$, $Z_i^(1)=Z_i+Z_i+n/2=U_i+V_i$, i.e., the first block at level $1$ consists of $U+V$; for $i>n/2$, $Z_i^(1)=V_i-n/2$ carries $V$ unchanged. Applying the inductive hypothesis (for $t-1$) to $U+V$ gives $P_n/2(U+V)=((U+V)_1^(t-1),…,(U+V)_n/2^(t-1))$, and since level-$j$ variables ($j≥1$) for indices $≤n/2$ are computed entirely from $Z^(1)|_{1,…,n/2}=U+V$ by the same recursive rule shifted by one level, $(U+V)_i^(t-1)=Z_i^(t)$ for $i≤n/2$. Symmetrically, $P_n/2(V)_i=Z_n/2+i^(t)$ for $i≤n/2$. Combining, $P_n(Z)=(Z_1^(t),…,Z_n^(t))$. $

Proof ▶

Definition 371 Local Polarization, Def. 16.5.2

A sequence of random variables

X_0,…,X_j,…$ with $X_j[0,1]$ is \emph{locally polarizing} if: \begin{enumerate} \item \textbf{(Unbiased)} For every \end{enumerate}$j$ and $a[0,1]$: $E[X_j+1∣X_j=a]=a$. \item \textbf{(Variance in the middle)} For every $τ>0$ there exists $θ=θ(τ)>0$ such that for all $j$: if $X_j(τ,1-τ)$ then $|X_j+1-X_j|≥θ$. \item \textbf{(Suction at the ends)} For every $c<∞$ there exists $τ=τ(c)>0$ such that (i) if $X_j≤τ$ then $[X_j+1≤X_j/c]≥1/2$; and (ii) if $1-X_j≤τ$ then $[1-X_j+1≤(1-X_j)/c]≥1/2$. $

The sequence is simple if for every sequence a_0,…,a_j$, conditioned on $X_0=a_0,…,X_j=a_j$, there are two values $a^+,a^ℓ$ such that $X_j+1$ equals $a^+$ with probability $1/2$ and $a^ℓ$ with probability $1/2$. $

Definition 372 (Polynomially) Strong Polarization, Def. 16.5.4

A sequence of random variables

X_0,X_1,…,X_t,…$ with $X_j[0,1]$ \emph{strongly polarizes} if for all $γ>0$ there exist $α<1$ and $β<∞$ such that for all $t$: $[X_t(γ^t,1-γ^t)]≤β·α^t$. $

Definition

16.5.2 Local Polarization

Proposition 373 Prop. 16.5.6

For every

1≤j≤t$ and $j$th-level siblings $i,i’$ with $i<i’$, the following hold: \begin{enumerate} \item \end{enumerate}$S_i,j=S_i’,j=S_i,j-1∪S_i’,j-1$ (a disjoint union). \item ${Z_k^(j)∣kS_i,j}$ is independent of ${Z_k^(j)∣kS_i,j}$. \item $H(Z_i^(j-1)∣Z_<i^(j-1)) = HZ_i^(j-1)∣Z^(j-1)_{kS_i,j-1,k<i} = HZ_i^(j-1)∣Z^(j-1)_{kS_i,j,k<i}$. \item $H(Z_i’^(j-1)∣Z_<i’^(j-1)) = HZ_i’^(j-1)∣Z^(j-1)_{kS_i’,j-1,k<i’} = HZ_i’^(j-1)∣Z^(j-1)_{kS_i,j,k<i’}$. \item $H(Z_i^(j)∣Z_<i^(j)) = HZ_i^(j)∣Z^(j)_{kS_i,j,k<i} = HZ_i^(j)∣Z^(j-1)_{kS_i,j,k<i}$. \item $H(Z_i’^(j)∣Z_<i’^(j)) = HZ_i’^(j)∣{Z_i^(j)}∪Z^(j)_{kS_i,j,k<i’} = HZ_i’^(j)∣{Z_i^(j)}∪Z^(j-1)_{kS_i,j,k<i’}$. $

Proposition

Proof ▶

Part (1) follows from the definitions of

S_i,j$ and of $j$th-level siblings: since $i’=i+2^t-j$, $i≡i’2^t-j$, giving $S_i,j=S_i’,j$. Also, $k_1≡k_22^t-j+1$ implies $k_1≡k_22^t-j$, so $S_i,j-1,S_i’,j-1⊆S_i,j$; and if $k≡i2^t-j+1$ then $ki’2^t-j$ and vice versa, so $S_i,j-1∩S_i’,j-1=∅$. Part (2) follows from Lemma~ \ref{lem:c16-siblings-determined}: the set ${Z_k^(j)∣kS_i,j}$ is determined entirely by ${Z_k∣kS_i,j}$ (and similarly for $kS_i,j$, which is determined by ${Z_k∣kS_i,j}$), and since the original $Z_k$'s ($k[n]$) are all independent, these two determined collections are independent. The first equality in part (3) follows immediately from part (2) (conditioning on variables independent of $Z_i^(j-1)$ and $Z_<i^(j-1)$ jointly does not change the entropy), and the second uses part (1) and the fact that $S_i,j-1$ and $S_i’,j-1$ are disjoint (the elements newly added going from $S_i,j-1$ to $S_i,j$ that are $<i$ lie in $S_i’,j-1$, and by part (2) do not affect the conditional entropy of $Z_i^(j-1)$). The first equality in part (4) is similar; the second uses additionally that $S_i’,j-1$ contains no elements between $i$ and $i’$, together with part (2) (which gives that $Z_i^(j-1)$ is independent of $Z_i’^(j-1)$ and of $Z^(j-1)_{kS_i,j-1,k<i}$), and part (1). The first equalities in parts (5) and (6) are similar to part (3), using in part (6) that ${kS_i’,j∣k<i’}={i}∪{kS_i,j∣k<i}$. The second equalities in parts (5) and (6) follow from Lemma~ \ref{lem:c16-siblings-bijection} (a one-to-one correspondence between the level-$(j-1)$ and level-$j$ variables indexed by ${kS_i,j∣k<i}$, so conditioning on one set is equivalent to conditioning on the other). $

Proof ▶

Lemma 374 Lem. 16.5.7

For every

i$ and $0≤j≤t$, the set ${Z_k^(j)∣kS_i,j}$ is determined completely by ${Z_k∣kS_i,j}$, and the $Z_k$'s (for $k[n]$) are all independent. $

Lemma

Proof ▶

The independence of the original

Z_k$'s is immediate, since $Z∼Bern(p)^n$. We prove the determinacy claim by induction on $j$. For $j=0$, $Z_k^(0)=Z_k$, so the claim is trivial. For $j≥1$: fix $i$ and let $i’=i+2^t-j$ be its $j$th-level sibling. Directly from the definition of $S_i,j$: $S_i,j=S_i,j-1⊔S_i’,j-1$ (a disjoint union — both sets consist of indices congruent to $i$ mod $2^t-j+1$ or mod $2^t-j+1$ shifted by $2^t-j$ respectively, together covering exactly the residue class mod $2^t-j$). By the recursive rule $(Z_k^(j),Z_k+2^t-j^(j))=P_2(Z_k^(j-1),Z_k+2^t-j^(j-1))$ (applied for each $kS_i,j-1$, pairing it with its counterpart in $S_i’,j-1$), every $Z_k^(j)$ with $kS_i,j$ is a function of ${Z_l^(j-1)∣lS_i,j-1∪S_i’,j-1}={Z_l^(j-1)∣lS_i,j}$. By the inductive hypothesis applied separately to $S_i,j-1$ and $S_i’,j-1$ (both of which are sets of the same form $S_k,j-1$ for a suitable representative $k$), this set is determined by ${Z_l∣lS_i,j-1}∪{Z_l∣lS_i’,j-1}={Z_l∣lS_i,j}$, completing the induction. $

Proof ▶

Lemma 375 Lem. 16.5.8

There is a one-to-one map from the variables

Z^(j-1)_{kS_i,j, k<i}$ to the variables $Z^(j)_{kS_i,j, k<i}$, and so conditioning on one set is equivalent to conditioning on the other. $

Lemma

Lemma 376 Lem. 16.5.9

(U,A)$ and $(V,B)$ are identically distributed, where $U=Z_i^(j-1)$, $V=Z_i’^(j-1)$, $A={Z_k^(j-1)∣kS_i,j-1,k<i}$ and $B={Z_k^(j-1)∣kS_i’,j-1,k<i’}$, for $j$th-level siblings $i<i’$. $

Lemma

Lemma 377 Lem. 16.5.10

Let

p_1,p_2[0,1/2]$ with $p_1<p_2$ and $τ(0,1/2)$, and let $p_1○p_2p_1(1-p_2)+p_2(1-p_1)$. Then: \begin{enumerate} \item \end{enumerate}$H(p_1○p_2)≥H(p_2)$. \item There exists $θ=θ(τ)>0$ such that if $H(p_1),H(p_2)(τ,1-τ)$ then $H(p_1○p_2)-H(p_2)≥θ$. \item If $H(p_1),H(p_2)≤τ$ then $H(p_1○p_2)≥(1-9/(1/τ))(H(p_1)+H(p_2))$. In particular, for every $c<∞$, if $τ≤2^-9c$ then $H(p_1)+H(p_2)-H(p_1○p_2)≤(H(p_1)+H(p_2))/c$. \item If $H(p_1),H(p_2)≥1-τ$ and $τ≤1-H(1/4)$ then $H(p_1○p_2)≥1-20τ(1-H(p_2))$. In particular, for every $c’<∞$, if $τ<1/(20c’)$ then $1-H(p_1○p_2)≤(1-H(p_2))/c’$. $

Lemma

Proof ▶

Part (1) is immediate from the monotonicity of

H(·)$ on $[0,1/2]$ (), since $p_2≤p_1○p_2≤1/2$ for $0≤p_1,p_2≤1/2$, so $H(p_1○p_2)≥H(p_2)$. For part (2): let $H^-1(x)=p[0,1/2]$ be the inverse of $H$ on $[0,1/2]$, and set $α=α(τ)=H^-1(τ)(1-2H^-1(1-τ))$, $β=β(τ)=2H^-1(1-τ)(1-H^-1(1-τ))$, and $γ=γ(τ)=((1-β)/β)$; note $α>0,β<1/2,γ>0$. We show $H(p_1○p_2)-H(p_2)≥αγ=:θ$. Since $p_1,p_2(H^-1(τ),H^-1(1-τ))$, \[ p_1\circ p_2-p_2 = p_2(1-2p_1)+p_1-p_2 = p_1(1-2p_2)\ge H^{-1}(\tau )(1-2H^{-1}(1-\tau ))=\alpha . \] By elementary calculus (writing $H’(q)=((1-q)/q)$ ), $_p_2≤q≤p_1○p_2{H’(q)}≥H’(β)=γ$ (using that the maximum value of $q=p_1○p_2$ over the range is at most $β$, achieved using $p_1,p_2≤H^-1(1-τ)$, and $H’$ is decreasing). Hence, by the mean value theorem, $H(p_1○p_2)-H(p_2)≥(p_1○p_2-p_2)·_p_2≤q≤p_1○p_2{H’(q)}≥αγ=θ$. For part (3): recall $H(p)≥p(1/p)$, and for $p≤1/2$, $-(1-p)(1-p)≤(1/2)(1-p)(p+p^2)≤(1/2)p≤2p$, so \[ p\log (1/p)\le H(p)\le p\log (1/p)+2p, \qquad (p\le 1/2). \tag {} \]$*

\[ \]

We have

\begin{align*} H(p_1)+H(p_2)-H(p_1\circ p_2) & \le p_1(\log (1/p_1)+2)+p_2(\log (1/p_2)+2)-(p_1\circ p_2)\log (1/(p_1\circ p_2))\\ & \le p_1(\log (1/p_1)+2)+p_2(\log (1/p_2)+2)-(p_1+p_2-2p_1p_2)\log (1/(2p_2))\\ & = p_1\log (2p_2/p_1)+p_2\log (2p_2/p_2)+2p_1p_2\log (1/(2p_2))+2(p_1+p_2)\\ & \le p_1\log (p_2/p_1)+2p_1p_2\log (1/(p_2))+6p_2\\ & \le 2p_1H(p_2)+7p_2 \le 2p_1H(p_2)+7H(p_2)/\log (1/p_2) \le 9H(p_2)/\log (1/\tau ), \end{align*}

using (*)$ throughout, $p_1○p_2≤p_1+p_2≤2p_2$, and (from $(*)$ and $τ≥H(p_2)$) that $p_2(1/p_2)≤τ$ hence $p_2≤τ$ and $(1/p_2)≥(1/τ)$, and similarly $p_1≤τ≤1/(1/τ)$ (using $τ<1/2$). The "in particular" follows by taking $τ≤2^-9c$. For part (4): let $H(p_i)=1-y_i$ and $p_i=1/2-x_i$ for $i=1,2$. Since $τ≤1-H(1/4)$, $H(p_i)≥1-τ$ gives $x_i≤1/4$. By a standard quadratic bound on $H$ near $1/2$ (), from $1-τ≥H(1/4)$ and $1-y_i=H(p_i)≥1-τ$ we get $1-x_i^2≥1-y_i$, i.e., $x_i≤y_i$. Also $p_1○p_2=1/2-2x_1x_2$, so \[ H(p_1\circ p_2) = H(1/2-2x_1x_2) \ge 1-20(x_1x_2)^2 \ge 1-20y_1y_2 = 1-20(1-H(p_1))(1-H(p_2)) \ge 1-20\tau (1-H(p_2)), \] using $x_i≤y_i$ and $1-H(p_1)≤τ$. The "in particular" follows by taking $τ<1/(20c’)$. $

Proof ▶

Lemma 378 Lem. 16.5.11

(U,A)$ and $(V,B)$ are identically distributed and independent random variables with $U,VF_2$ and $H(U∣A)=H(V∣B)=H(p)$, then: \begin{enumerate} \item For every \end{enumerate}$τ>0$ there exists $θ>0$ such that if $H(p)(τ,1-τ)$ then $H(U+V∣A,B)≥H(p)+θ$. \item For every $c<∞$ there exists $τ>0$ such that if $H(p)≤τ$ then $H(U+V∣A,B)≥(2-1/c)H(p)$, and if $H(p)≥1-τ$ then $H(U+V∣A,B)≥1-(1-H(p))$. $

Lemma

Proof ▶

Let

p_a=[U=1∣A=a]$, so $H(p)=H(U∣A)=E_A[H(p_A)]$; similarly let $q_b=[V=1∣B=b]$. \textbf{Part (1).} Let $θ(·)$ be as in Lemma~ \ref{lem:c16-po-bound}(2), let $θ_1=θ(τ/2)$, and set $θ={θ_1/9,τ^2/36}$. Let $r_1=_A[H(p_A)≤τ/2]$, $r_2=_A[H(p_A)(τ/2,1-τ/2)]$, $r_3=_A[H(p_A)≥1-τ/2]$; since $(U,A),(V,B)$ are identically distributed the same holds replacing $p_A$ by $q_B$. As $r_1+r_2+r_3=1$, one of them is $≥1/3$. If $r_2≥1/3$: with probability $≥1/9$, both $H(p_A),H(q_B)(τ/2,1-τ/2)$. For such $a,b$, $U+V∣(A=a,B=b)∼Bern(p_a○q_b)$ and by Lemma~ \ref{lem:c16-po-bound}(2), $H(U+V∣A=a,B=b)≥H(p_a)+θ_1$; and by Lemma~ \ref{lem:c16-po-bound}(1), $H(U+V∣A=a,B=b)≥H(p_a)$ for all $a,b$. Hence $H(U+V∣A,B)≥E_A[p_A]+θ_1=H(p)+θ_1/9$. If $r_3≥1/3$: since $1-τ≥H(p)≥(1-r_3-_A[H(p_A)≤1-τ])(1-τ)+r_3(1-τ/2)$, rearranging gives $_A[H(p_A)≤1-τ]≥r_3τ/(2(1-τ))≥τ/6$. So with probability $≥τ/18$ we have $A$ with $H(p_A)≤1-τ$ and $B$ with $H(q_B)≥1-τ/2$; for such $a,b$, by Lemma~ \ref{lem:c16-po-bound}(1), $H(U+V∣A=a,B=b)≥H(q_b)≥H(p_a)+τ/2$. Hence $H(U+V∣A,B)≥E_A[H(p_A)]+τ^2/36=H(p)+τ^2/36$. The case $r_1≥1/3$ is analogous. In all cases $H(U+V∣A,B)≥H(p)+θ$. \textbf{Part 2 (case $H(p)\le \tau $).} Given $c<∞$, let $τ’=τ(4c)$ be the constant from Lemma~ \ref{lem:c16-po-bound}(3) for constant $4c$, so $H(p_1),H(p_2)≤τ’$ implies $H(p_1○p_2)≥(1-)(H(p_1)+H(p_2))$. Let $τ=τ’/(2c)$ and assume $H(p)≤τ$. Let $α=_A[H(p_A)≥τ’]$; by Markov's inequality, $α≤1/(2c)$. Let $γ=E_A[H(p_A)∣H(p_A)≥τ’]$ and $δ=E_A[H(p_A)∣H(p_A)<τ’]$, so $H(p)=γα+δ(1-α)$. Splitting on whether $H(p_A)≥τ’$ and $H(q_B)≥τ’$ (four cases $S_11,S_10,S_01,S_00$): on $S_10∪S_01∪S_11$ (probability $2α-α^2$), by Lemma~ \ref{lem:c16-po-bound}(1) applied to whichever of $p_a,q_b$ has entropy $≥τ’$, $E[H(U+V∣A,B)∣(A,B)S_10∪S_01∪S_11]≥γ$; on $S_00$ (probability $(1-α)^2$), by Lemma~ \ref{lem:c16-po-bound}(3), $E[H(U+V∣A,B)∣S_00]≥(1-)·2δ$. Combining, \[ H(U+V\mid A,B) \ge (2\alpha -\alpha ^2)\gamma +(1-\alpha )^2\Big(1-\frac1{4c}\Big)2\delta = 2H(p)-\alpha H(p)-\frac1{2c}(1-\alpha )((1-\alpha )\delta ), \] using $H(p)=γα+δ(1-α)$. Using $(1-α)δ≤H(p)$ and $α≤1/(2c)$, this gives $H(U+V∣A,B)≥(2-1/c)H(p)$. The case $H(p)≥1-τ$ is symmetric (interchanging the roles of $0$ and $1$ throughout). $

Proof ▶

Lemma 379 Lem. 16.5.12

(U,A)$ and $(V,B)$ are identically distributed, in the setting of the proof of Lemma~ \ref{lem:c16-local-polarization} below. $

Lemma

Lemma 380 Local Polarization, Lem. 16.5.3

The sequence

X_0,…,X_j,…$ with $X_j=H(Z_i^(j)∣Z_<i^(j))$ (where $i$ is drawn uniformly from $[n]$, independently for each $j$) is a simple and locally polarizing sequence. $

Lemma

Proof ▶

Fix

j$ and condition on $X_j=H(p)$. For any $(j+1)$-level siblings $i<i’$, drawing $I$ uniformly from $[n]$ is, conditioned on $I{i,i’}$, equally likely to give $I=i$ or $I=i’$ (since ${i,i’}$ partition $[n]$ over all sibling pairs as $I$ ranges over $[n]$). Let $U=Z_i^(j)$, $V=Z_i’^(j)$, $A={Z_k^(j)∣k<i,kS_i,j}$, $B={Z_k^(j)∣k<i’,kS_i’,j}$. If $I=i$, then $X_j=H(U∣A)$ (Proposition~ \ref{prop:c16-siblings-structure}(3)); if $I=i’$, then $X_j=H(V∣B)$ (Proposition~ \ref{prop:c16-siblings-structure}(4)). Furthermore, if $I=i$, $X_j+1=H(U+V∣A,B)$ (using the recursive definition and Proposition~ \ref{prop:c16-siblings-structure}(1),(5)); and if $I=i’$, $X_j+1=H(V∣A,B,U)$ (using Proposition~ \ref{prop:c16-siblings-structure}(1),(6), and the fact that $V∣U$ and $V∣U+V$ have the same distribution, since $(U,V)↦(U+V,V)$ is a bijection). Since these conditions hold for each conditioning $I{i,i’}$, and the pairs ${i,i’}$ partition $[n]$, they hold unconditionally for a uniformly random $I$. The unbiasedness condition, $E[X_j+1∣X_j=a]=a$, follows from the fact that there is a bijection between $(U,V)$ and $(U+V,V)$, so \[ H(U+V\mid A,B)+H(V\mid A,B,U) = H(U\mid A)+H(V\mid B) \] (both sides equal $H(U,V∣A,B)$ by the chain rule, using independence of $(U,A)$ and $(V,B)$ established by Lemma~ \ref{lem:c16-uv-iid}). Writing $2a$ for the right-hand side and $2X_j+1$ for the left-hand side (since with probability $1/2$ each, $X_j+1$ takes one of these two values), this gives the unbiasedness condition. The variance-in-the-middle condition follows from Lemma~ \ref{lem:c16-conditioned-local-polarization}(1) (applied using the identical-distribution fact of Lemma~ \ref{lem:c16-uv-iid}), and the suction-at-the-ends condition follows from Lemma~ \ref{lem:c16-conditioned-local-polarization}(2). Simplicity holds since with probability $1/2$, $X_j+1=H(U+V∣A,B)$, and with probability $1/2$, $X_j+1=H(V∣A,B,U)$ — exactly two values, each with probability $1/2$, as required by the definition of a simple sequence. $

Proof ▶

16.5.3 Local vs. Global Polarization

Lemma 381 Lem. 16.5.13

Let

γ>0$ be given and let $c={4,γ^8/16}$; let $τ=τ(c)$ be from condition (3) of Definition~ \ref{def:c16-local-polarization} and $θ={1-1/c,θ(τ)}$ from condition (2). For a simple, locally polarizing sequence $X_0,…,X_t,…$, let $φ_j={X_j,1-X_j}$. Then $E[φ_j+1∣φ_j=a]≤(1-θ^2/16)·a$. $

Lemma

Proof ▶

Without loss of generality assume

X_j≤1/2$ (the case $X_j>1/2$ is symmetric, by swapping the roles of $X_j$ and $1-X_j$), so $a=φ_j=X_j$ and $X_j=a^2$. By simplicity and unbiasedness, there is $δ$ such that $X_j+1=a^2+δ$ with probability $1/2$ and $a^2-δ$ with probability $1/2$. If $X_j≤τ$, by unbiasedness and suction at the ends, $δ≥(1-1/c)a^2$; if $X_j>τ$, by variance in the middle, $δ≥θ(τ)≥θ(τ)a^2$ (using $a^2=X_j≤1$). In either case $δ≥θa^2$. We bound \begin{align*} \mathbb {E}[\phi _{j+1}] & \le \mathbb {E}[\sqrt{X_{j+1}}] = \tfrac 12\sqrt{a^2+\delta }+\tfrac 12\sqrt{a^2-\delta } = \frac{a}{2}\Big(\sqrt{1+\delta /a^2}+\sqrt{1-\delta /a^2}\Big)\\ & \le \frac{a}{2}\Big(1+\frac{\delta }{2a^2}-\frac{\delta ^2}{16a^4}+1-\frac{\delta }{2a^2}-\frac{\delta ^2}{16a^4}\Big) = a\Big(1-\frac{\delta ^2}{16a^4}\Big) \le a\Big(1-\frac{\theta ^2}{16}\Big), \end{align*} where the second inequality uses $1±x≤1±x/2-x^2/16$ for $|x|≤1$ () and the last uses $δ≥θa^2$. $

Proof ▶

Lemma 382 Weakly Polynomial Polarization, Lem. 16.5.14

There exists

α_1<1$ such that for all even $t$, $[X_t/2(α_1^t,1-α_1^t)]≤α_1^t$. $

Lemma

Proof ▶

By induction on

j$ using Lemma~ \ref{lem:c16-phi-drop}, $E[φ_j]≤(1-θ^2/16)^j$ (base case $φ_0≤1$). Let $β=1-θ^2/16$ and $α_1=β<1$; then $E[φ_t/2]≤β^t$. By Markov's inequality, $[φ_t/2≥α_1^t]≤β^t/α_1^t=α_1^t$. If $φ_t/2≤α_1^t$ then $X_t/2(α_1^2t,1-α_1^2t)$, hence $X_t/2(α_1^t,1-α_1^t)$; so $[X_t/2(α_1^t,1-α_1^t)]≤α_1^t$. $

Proof ▶

Lemma 383 Lem. 16.5.15, cf. Doob’s martingale inequality

For a simple, locally polarizing sequence with

τ=τ(c)$ as in Lemma~ \ref{lem:c16-phi-drop}: for all $λ>0$, if $X_0≤λ$, then the probability there exists $j>0$ such that $X_j≥τ$ is at most $λ/τ$. Similarly if $X_0≥1-λ$, then the probability there exists $j>0$ such that $X_j≤1-τ$ is at most $λ/τ$. $

Lemma

Proof ▶

We give the proof for

X_0≤λ$; the case $X_0≥1-λ$ is symmetric. We show $[_0≤t≤T{X_t}≥τ]≤λ/τ$ for every integer $T>0$. Define $Y_0=X_0$ and, for $i≥1$, $Y_i=X_i$ if $Y_i-1<τ$, else $Y_i=Y_i-1$. By simplicity of $X_i$'s, $E[Y_i∣Y_i-1=a]=a$ for every $i,a$ (unbiasedness applies once $Y_i-1<τ$, and is trivial once $Y_i-1≥τ$). Also $_0≤t≤T{X_t}≥τ$ iff $Y_T≥τ$. Thus, by Markov's inequality, \[ \Pr \Big[\max _{0\le t\le T}\{ X_t\} \ge \tau \Big] = \Pr [Y_T\ge \tau ] \le \frac{\mathbb {E}[Y_T]}{\tau }, \] and $E[Y_T]=E[Y_T-1]==E[Y_0]=E[X_0]≤λ$, yielding the lemma. $

Proof ▶

Lemma 384 Weak to Strong Polarization, Lem. 16.5.16

There exists

α_2<1$ such that for every $λ>0$, if $X_0(λ,1-λ)$, then the probability that $X_t/2(γ^t,1-γ^t)$ is at most $λ/τ+α_2^t$ (where $τ=τ(c)$, $c≥4$, is as in the construction preceding Lemma~ \ref{lem:c16-phi-drop}, and $γ$ is the target parameter). $

Lemma

Proof ▶

We consider

X_0≤λ$ (the case $X_0≥1-λ$ is symmetric). Let $Z_i=1$ if $X_i<X_i-1$ and $0$ otherwise; by simplicity, the $Z_i$'s are i.i.d.\ and equal to $1$ with probability $1/2$. Let $Z=∑_i=1^t/2Z_i$. Let $E_1$ be the event that there exists $1≤j≤t/2$ with $X_j≥τ$, and $E_2$ the event that $Z≤t/8$. By Lemma~ \ref{lem:c16-doob}, $E_1$ has probability at most $λ/τ$; by a Chernoff bound, $E_2$ has probability at most $α_2^t$ for some $α_2<1$. If $E_1$ does not occur, then using simplicity: with probability $1/2$, $X_1≤(1/c)X_0≤X_0/4$ (suction at the ends), and with probability $1/2$, $X_1≤2X_0$ (unbiasedness bounds the other branch); iterating, $X_t/2≤2^t/2·c^-ZX_0≤2^t/2c^-Z$. If additionally $E_2$ does not occur, $X_t/2≤2^t/2·c^-t/8≤γ^t$, by the choice $c=(2/γ^2)^4$. Combining, if neither $E_1$ nor $E_2$ occurs then $X_t/2≤γ^t$, giving $[X_t/2≥γ^t] ≤[E_1]+[E_2] ≤λ/τ+α_2^t$, and (for $t$ large enough that $γ^t<1-γ^t$) this bounds $[X_t/2(γ^t,1-γ^t)]$. $

Proof ▶

Lemma 385 Local vs. Global Polarization, Lem. 16.5.5

If a sequence

X_0,…,X_t,…$ with $X_t[0,1]$ is simple and locally polarizing, then it is also strongly polarizing. $

Lemma

Proof ▶

Given

γ>0$, let $α_1<1$ be the constant from Lemma~ \ref{lem:c16-weak-poly-polarization} and $α_2<1$ the constant from Lemma~ \ref{lem:c16-weak-to-strong}. Set $α={α_1,α_2}<1$ and $β=2+1/τ<∞$ (with $τ=τ(c)$ as in Lemma~ \ref{lem:c16-weak-to-strong}, $c={4,γ^8/16}$). Let $E$ be the event that $X_t/2(α_1^t,1-α_1^t)$; by Lemma~ \ref{lem:c16-weak-poly-polarization}, $[E]≤α_1^t$. Conditioned on $E$ not occurring, applying Lemma~ \ref{lem:c16-weak-to-strong} with $λ=α_1^t$ gives $[X_t(γ^t,1-γ^t)]≤α_1^t/τ+α_2^t$. Combining, \[ \Pr [X_t\in (\gamma ^t,1-\gamma ^t)] \le \alpha _1^t+\frac{\alpha _1^t}{\tau }+\alpha _2^t \le \alpha ^t+\frac{\alpha ^t}{\tau }+\alpha ^t = \beta \cdot \alpha ^t, \] as desired. $

Proof ▶

16.7 Summary and Additional Information

The reduction of error-correction to compression (Proposition 353) and the idea of using polarization to build a compression scheme (Definition 355, Theorem 360) together give a complete construction. One remaining subtlety, not addressed in the main construction, is that of efficiently identifying the set S$ of unpredictable columns; a polynomial-time algorithm for this (which also generalizes the strong-polarization analysis of this chapter beyond the binary case) is known but out of scope here. \begin{thmenv}[Mrs.\ Gerber's Lemma consequence, Lem.~ 16.7.1] \label{lem:c16-gerber} \notready \uses{def:c16-entropy, def:c16-conditional-entropy} If \end{thmenv} $(U,A)$ and $(V,B)$ are independent and $U,V$ are binary-valued random variables with $H(U∣A)=H(p)$ and $H(V∣B)=H(q)$, then \[ H(U+V\mid A,B) \ge H(p(1-q)+q(1-p)). \] $

Lemma