5 A simple example and sequential decoding

Definition 24 The even-weight parity-check code \(\mathcal{P}\)

Let \(\mathcal{P}\) be the code consisting of all even-weight words of length \(d\), i.e. those \((x_1,\dots ,x_d) \in \{ 0,1\} ^d\) with \(\sum _i x_i = 0 \pmod2\). It has rate \((d-1)/d\) and minimum relative distance \(2/d\).

Proposition 25 Section V example: \(\mathcal{C}(B,\mathcal{P})\)

Let \(B\) be a graph with expansion greater than \(c/2\) on sets of size at most \(\alpha n\), and let \(\mathcal{P}\) be the even-weight code of length \(d\). Then the parity-check matrix of \(\mathcal{C}(B,\mathcal{P})\) is the adjacency matrix of \(B\), and \(\mathcal{C}(B,\mathcal{P})\) has rate \(1-c/d\) and minimum relative distance at least \(\alpha \).

Proof ▶

Each constraint requires that the sum of its \(d\) neighbouring variables be even, i.e. that the restriction lie in \(\mathcal{P}\); the matrix of these constraints is exactly the bipartite adjacency matrix of \(B\). The rate \(1-c/d\) follows by the count of Theorem 23 with \(r=(d-1)/d\) (giving \(cr-(c-1)=1-c/d\)). The distance bound is the distance part of Theorem 23; with \(\mathcal{P}\) of relative distance \(2/d\) the required expansion factor \(c/(d\epsilon )=c/2\), matching the hypothesis.

Definition 26 Satisfied and unsatisfied constraints

A constraint \(C_i\) is satisfied by a word \((x_1,\dots ,x_n)\) if its restriction \((x_{b(i,1)},\dots ,x_{b(i,d)})\) is a codeword of \(\mathcal{S}\); otherwise it is unsatisfied. For the even-weight code \(\mathcal{P}\), a constraint is satisfied iff the sum of its neighbouring variables is even.

Definition 27 Simple Sequential Decoding Algorithm

Repeat the following until no such variable remains: if there is a variable that is in more unsatisfied than satisfied constraints, then flip its value (a \(0\) becomes \(1\) and a \(1\) becomes \(0\)). On termination, if all constraints are satisfied output the current word; otherwise report “failed to decode”.

Lemma 28 Lemma 9: sequential decoding runs in linear time

For \(c\) and \(d\) constant, the implementation of the simple sequential decoding algorithm (Fig. 2 of the paper) runs in linear time on a Pointer Machine, and on a RAM under the uniform cost model.

Proof ▶

The implementation runs in two phases. Set-up (linear time): partition the variables into sets \(S_0,\dots ,S_c\), where \(S_i\) holds the variables that appear in exactly \(i\) unsatisfied constraints; computing for each constraint whether it is satisfied and bucketing each variable takes linear time. On a RAM this is immediate; on a Pointer Machine it is linear because the input graph lets one find a variable’s neighbours in constant time rather than the logarithmic time needed for arbitrary memory access.

Loop (constant time per iteration): repeatedly take a variable \(v\) from the highest nonempty \(S_i\) with \(i {\gt} c/2\), flip \(v\), update the status of each of the \(c\) constraints containing \(v\), and move each affected variable to its new bucket. Only a constant number (\(c,d\) constant) of constraints, variables and pointers are touched, all linked in the graph, so each iteration costs \(O(1)\). Each flip strictly decreases the number of unsatisfied constraints, and there are only linearly many constraints, so the loop runs a linear number of times. Hence the total time is linear.

Theorem 29 Theorem 10: correctness of sequential decoding

Let \(B\) be a \((c,d,\alpha ,3c/4)\)-expander and let \(\mathcal{P}\) be the code of all even-weight words of length \(d\). Then the simple sequential decoding algorithm corrects an \(\alpha /2\) fraction of error from the code \(\mathcal{C}(B,\mathcal{P})\).

Proof ▶

Suppose the input is within distance \(\alpha n/2\) of a codeword; call the variables in which the input differs from that codeword corrupt. Track a state \((v,u)\), where \(v\) is the number of corrupt variables and \(u\) the number of unsatisfied constraints; \(u\) is a potential we drive to zero.

First suppose \(v \le \alpha n\). Let \(s\) be the number of satisfied neighbours of the corrupt variables. By expansion of the \(\le \alpha n\) corrupt variables (factor \(3c/4\)), the total number of neighbouring constraints satisfies \(u + s {\gt} (3/4)cv\). Each satisfied neighbour shares at least two edges with the corrupt variables and each unsatisfied neighbour at least one, so \(cv \ge u + 2s\). Combining,

\[ u {\gt} cv/2. \tag {1} \]

Since each unsatisfied constraint shares at least one of the \(cv\) edges leaving the corrupt variables, inequality (1) shows more than half those edges enter unsatisfied constraints; hence some corrupt variable has more unsatisfied than satisfied neighbours, and the algorithm will flip some variable while \(v \le \alpha n\).

It remains to show \(v\) never exceeds \(\alpha n\), given \(v \le \alpha n/2\) initially. The only way to fail is to flip so many uncorrupt variables that \(v\) grows past \(\alpha n\). If this happened there would be a time with \(v=\alpha n\), and then (1) gives \(u {\gt} c\alpha n/2\). This is a contradiction, because \(u\) is initially at most \(c\alpha n/2\) (the input is within distance \(\alpha n/2\), each corrupt variable touches \(c\) constraints) and \(u\) only decreases. So \(v \le \alpha n\) throughout, the algorithm keeps flipping until \(u=0\), and it outputs the codeword. Linear running time is Lemma 28.