1 The Fundamental Question

Definition 1 Code, Def. 1.2.1

A code of block length \(n\) over an alphabet \(\Sigma \) is a subset \(C \subseteq \Sigma ^n\). We use \(q\) to denote the alphabet size \(|\Sigma |\) (which need not be a constant and may depend on \(n\)).

Equivalently, writing \(|C| = M\), we may view \(C\) as a mapping

\[ C : [M] \to \Sigma ^n, \]

where \([M] \overset {\mathrm{def}}{=} \{ 1,2,\dots ,M\} \); this is the convention we use whenever we speak of “the codeword corresponding to message \(\mathbf m\),” written \(C(\mathbf m)\).

Definition 2 Dimension of a code, Def. 1.2.3

Given a code \(C \subseteq \Sigma ^n\) with \(|\Sigma |=q\), its dimension is

\[ k \overset {\mathrm{def}}{=} \log _q |C|. \]

Construction 3 Parity code \(C_\oplus \)

The parity code \(C_\oplus \) is the binary code (\(\Sigma =\{ 0,1\} \)) of block length \(5\) with message length \(4\) defined by

\[ C_\oplus (x_1,x_2,x_3,x_4) = (x_1,x_2,x_3,x_4,\; x_1\oplus x_2\oplus x_3 \oplus x_4), \]

where \(\oplus \) denotes XOR. That is, \(C_\oplus \) appends the parity of the message bits to the message. It has parameters \(q=2\), \(k=4\), \(n=5\), \(R=4/5\).

Construction 4 Repetition code \(C_{3,rep}\)

The repetition code \(C_{3,rep}\) is the binary code of block length \(12\) with message length \(4\) defined by

\[ C_{3,rep}(x_1,x_2,x_3,x_4) = (x_1,x_1,x_1,\, x_2,x_2,x_2,\, x_3,x_3,x_3,\, x_4,x_4,x_4). \]

It has parameters \(q=2\), \(k=4\), \(n=12\), \(R=1/3\).

Definition 5 Rate of a code, Def. 1.2.4

The rate of a code with dimension \(k\) and block length \(n\) is

\[ R \overset {\mathrm{def}}{=} \frac{k}{n}. \]

Since \(k \le n\) (and we always assume \(k{\gt}0\), \(n{\lt}\infty \)), we have \(0 {\lt} R \le 1\).

Definition 6 Encoding function, Def. 1.3.1

Let \(C \subseteq \Sigma ^n\). An encoding function for \(C\) is an injective map \(E : [|C|] \to \Sigma ^n\) whose image is \(C\).

Definition 7 Decoding function, Def. 1.3.2

Let \(C \subseteq \Sigma ^n\) be a code. A mapping \(D : \Sigma ^n \to [|C|]\) is called a decoding function for \(C\).

Definition 8 Hamming distance, Def. 1.3.3

Given two vectors \(\mathbf u, \mathbf v \in \Sigma ^n\), the Hamming distance between \(\mathbf u\) and \(\mathbf v\), denoted \(\Delta (\mathbf u,\mathbf v)\), is the number of positions in which \(\mathbf u\) and \(\mathbf v\) differ. The relative Hamming distance is

\[ \delta (\mathbf u,\mathbf v) \overset {\mathrm{def}}{=} \frac{1}{n}\Delta (\mathbf u,\mathbf v) \in [0,1]. \]

Definition 9 \(t\)-Error Channel, Def. 1.3.4

An \(n\)-symbol \(t\)-Error Channel over the alphabet \(\Sigma \) is a function \(\mathrm{Ch} : \Sigma ^n \to \Sigma ^n\) that satisfies \(\Delta (\mathbf v, \mathrm{Ch}(\mathbf v)) \le t\) for every \(\mathbf v \in \Sigma ^n\).

Definition 10 Error Correcting Code, Def. 1.3.5

Let \(C \subseteq \Sigma ^n\) be a code and let \(t \ge 1\) be an integer. \(C\) is said to be a \(t\)-error-correcting code if there exists a decoding function \(D\) such that for every message \(\mathbf m \in [|C|]\) and every \(t\)-error channel \(\mathrm{Ch}\) we have \(D(\mathrm{Ch}(C(\mathbf m))) = \mathbf m\).

Definition 11 Error detection code, Def. 1.3.6

Let \(C \subseteq \Sigma ^n\) be a code and let \(t \ge 1\) be an integer. \(C\) is said to be a \(t\)-error-detecting code if there exists a detecting procedure \(D\) such that for every message \(\mathbf m\) and every received vector \(\mathbf v \in \Sigma ^n\) satisfying \(\Delta (C(\mathbf m),\mathbf v) \le t\), it holds that \(D\) outputs \(1\) if \(\mathbf v = C(\mathbf m)\) and \(0\) otherwise; i.e.

\[ D(\mathbf v) = \begin{cases} 1 & \text{if } \mathbf v = C(\mathbf m) \\ 0 & \text{otherwise.}\end{cases} \]

Definition 12 \(t\)-Erasure Channel, Def. 1.3.7

An \(n\)-symbol \(t\)-Erasure Channel over the alphabet \(\Sigma \) is a function \(\mathrm{Ch} : \Sigma ^n \to (\Sigma \cup \{ ?\} )^n\) that satisfies \(\Delta (\mathbf v,\mathrm{Ch}(\mathbf v)) \le t\) for every \(\mathbf v \in \Sigma ^n\) (viewing both arguments of \(\Delta (\cdot ,\cdot )\) as elements of \((\Sigma \cup \{ ?\} )^n\)), and such that for every \(i\in [n]\) with \(\mathbf v_i \neq \mathrm{Ch}(\mathbf v)_i\) we have \(\mathrm{Ch}(\mathbf v)_i = {?}\). A coordinate \(i\) with \(\mathrm{Ch}(\mathbf v)_i = {?}\) is called an erasure.

Definition 13 Erasure Correcting Code, Def. 1.3.8

Let \(C \subseteq \Sigma ^n\) be a code and let \(t \ge 1\) be an integer. \(C\) is said to be a \(t\)-erasure-correcting code if there exists a decoding function \(D\) such that for every message \(\mathbf m \in [|C|]\) and every \(t\)-erasure channel \(\mathrm{Ch}\) we have \(D(\mathrm{Ch}(C(\mathbf m))) = \mathbf m\).

Proposition 14 Prop. 1.3.9

\(C_{3,rep}\) is a \(1\)-error-correcting code.

Proof ▶

Consider the decoding function that, given a received word \(\mathbf y \in \{ 0,1\} ^{12}\), splits it into four consecutive blocks \(\mathbf y_1,\mathbf y_2,\mathbf y_3,\mathbf y_4\) of three bits each and outputs, for each block, the majority bit among its three bits as the corresponding message bit.

Suppose a codeword \(C_{3,rep}(x_1,x_2,x_3,x_4)\) is transmitted over a \(1\)-error channel, i.e. the total number of bit flips among the \(12\) transmitted bits is at most \(1\). Since a single error can affect at most one of the four (disjoint) three-bit blocks, at least three of the four blocks are received with no error at all, and hence all three bits in each such block equal the corresponding \(x_i\), so their majority is \(x_i\). The (at most one) remaining block has at most one of its three bits flipped from \(x_i\); since the other two bits of that block still equal \(x_i\), the majority of the block is still \(x_i\). Hence in all cases the decoding function recovers \((x_1,x_2,x_3,x_4)\) exactly, so \(C_{3,rep}\) is \(1\)-error-correcting.

Algorithm 15 Error Detector for Parity Code, Alg. 1.3.1

Input: received word \(\mathbf y = (y_1,y_2,y_3,y_4,y_5)\). Output: \(1\) if \(\mathbf y \in C_\oplus \) and \(0\) otherwise.

\(b \leftarrow y_1 \oplus y_2 \oplus y_3 \oplus y_4 \oplus y_5\).
Return \(1 \oplus b\).

Proposition 16 Prop. 1.3.10

The parity code \(C_\oplus \) can detect an odd number of errors: for every codeword \(\mathbf c \in C_\oplus \) and every received word \(\mathbf y \in \{ 0,1\} ^5\) with \(\Delta (\mathbf c,\mathbf y)\) odd, Algorithm 15 on input \(\mathbf y\) outputs \(0\).

Proof ▶

Write \(\mathbf c = (x_1,x_2,x_3,x_4,x_1\oplus x_2\oplus x_3\oplus x_4)\). The XOR of the five coordinates of \(\mathbf c\) is \(x_1\oplus x_2 \oplus x_3 \oplus x_4 \oplus (x_1\oplus x_2\oplus x_3 \oplus x_4) = 0\). If \(\mathbf y = \mathbf c \oplus \mathbf e\) (coordinatewise XOR) for an error pattern \(\mathbf e \in \{ 0,1\} ^5\) with \(\mathrm{wt}(\mathbf e) = t = \Delta (\mathbf c,\mathbf y)\) non-zero flipped positions, then the XOR of the coordinates of \(\mathbf y\) equals the XOR of the coordinates of \(\mathbf c\) XORed with the XOR of the coordinates of \(\mathbf e\), i.e.

\[ b = y_1\oplus \cdots \oplus y_5 = 0 \oplus (e_1 \oplus \cdots \oplus e_5). \]

The XOR of a \(0/1\)-vector’s coordinates equals the parity of its number of \(1\)’s, so \(b\) equals \(t \bmod 2\). If \(t\) is odd then \(b=1\) and the algorithm returns \(1\oplus b = 0\), as required.

Definition 17 Minimum distance, Def. 1.4.1

Let \(C \subseteq \Sigma ^n\). The minimum distance (or just distance) of \(C\), denoted \(\Delta (C)\), is

\[ \Delta (C) = \min _{\mathbf c_1 \neq \mathbf c_2 \in C} \Delta (\mathbf c_1,\mathbf c_2). \]

The relative minimum distance of \(C\) is

\[ \delta (C) = \min _{\mathbf c_1 \neq \mathbf c_2 \in C} \delta (\mathbf c_1,\mathbf c_2). \]

Definition 18 Maximum Likelihood Decoding

Given a code \(C \subseteq \Sigma ^n\), the maximum likelihood decoding (MLD) function \(D_{MLD} : \Sigma ^n \to C\) outputs, for every received word \(\mathbf y \in \Sigma ^n\), a codeword closest to \(\mathbf y\) in Hamming distance (with ties broken arbitrarily):

\[ D_{MLD}(\mathbf y) = \operatorname *{arg\, min}_{\mathbf c \in C} \Delta (\mathbf c,\mathbf y). \]

Algorithm 19 Naive Maximum Likelihood Decoder, Alg. 1.4.1

Input: received word \(\mathbf y \in \Sigma ^n\). Output: \(D_{MLD}(\mathbf y)\).

Pick an arbitrary \(\mathbf c \in C\) and set \(\mathbf z \leftarrow \mathbf c\).
For every \(\mathbf c' \in C\) with \(\mathbf c' \neq \mathbf c\): if \(\Delta (\mathbf c',\mathbf y) {\lt} \Delta (\mathbf z,\mathbf y)\) then \(\mathbf z \leftarrow \mathbf c'\).
Return \(\mathbf z\).

Proposition 20 Prop. 1.4.2

Given a code \(C \subseteq \Sigma ^n\), the following are equivalent:

\(C\) has minimum distance \(d \ge 2\).
If \(d\) is odd, \(C\) can correct \((d-1)/2\) errors.
\(C\) can detect \(d-1\) errors.
\(C\) can correct \(d-1\) erasures.

Proof ▶

We freely use the elementary fact that Hamming distance satisfies the triangle inequality \(\Delta (\mathbf x,\mathbf z) \le \Delta (\mathbf x,\mathbf y) + \Delta (\mathbf y,\mathbf z)\) for all \(\mathbf x,\mathbf y,\mathbf z \in \Sigma ^n\) (if \(\mathbf x_i \neq \mathbf z_i\) at some position \(i\), then \(\mathbf x_i \neq \mathbf y_i\) or \(\mathbf y_i \neq \mathbf z_i\), so every position counted on the left is counted at least once on the right).

We first show that (1) implies (2), (3) and (4); we then show that if (1) fails, none of (2), (3), (4) can hold.

(1)\(\Rightarrow \)(2). Assume \(C\) has distance \(d=2t+1\). We claim the MLD function \(D_{MLD}\) (Definition 18, e.g. via Algorithm 19) corrects every error pattern of at most \(t\) errors. Suppose not: let \(\mathbf c_1\) be the transmitted codeword and \(\mathbf y\) the received word, so \(\Delta (\mathbf y,\mathbf c_1) \le t\), and suppose \(D_{MLD}(\mathbf y) = \mathbf c_2 \neq \mathbf c_1\). By definition of MLD, \(\Delta (\mathbf y,\mathbf c_2) \le \Delta (\mathbf y,\mathbf c_1)\). Then

\[ \Delta (\mathbf c_1,\mathbf c_2) \le \Delta (\mathbf c_2,\mathbf y) + \Delta (\mathbf c_1,\mathbf y) \le 2\Delta (\mathbf c_1,\mathbf y) \le 2t = d-1, \]

contradicting that \(C\) has distance \(d\).

(1)\(\Rightarrow \)(3). We exhibit a detection procedure for \(d-1\) errors: on input \(\mathbf y\), check (by exhaustive search over \(C\)) whether \(\mathbf y \in C\), output \(1\) if so and \(0\) otherwise. If no error occurred, \(\mathbf y = \mathbf c_1\) (the transmitted codeword) and the algorithm outputs \(1\), as required. If \(1 \le \Delta (\mathbf y,\mathbf c_1) \le d-1\), then since \(C\) has distance \(d\), \(\mathbf y \notin C\) (otherwise \(\mathbf y\) and \(\mathbf c_1\) would be two codewords at distance \({\lt}d\)), so the algorithm outputs \(0\), as required.

(1)\(\Rightarrow \)(4). Let \(\mathbf y \in (\Sigma \cup \{ ?\} )^n\) be a received word with at most \(d-1\) erasures. We claim there is a unique \(\mathbf c=(c_1,\dots ,c_n) \in C\) agreeing with \(\mathbf y\) on every unerased position. If not, there are distinct \(\mathbf c_1,\mathbf c_2 \in C\) both agreeing with \(\mathbf y\) on the unerased positions; then \(\mathbf c_1,\mathbf c_2\) agree on all positions \(i\) with \(y_i \neq {?}\), so \(\Delta (\mathbf c_1,\mathbf c_2) \le |\{ i : y_i = {?}\} | \le d-1\), contradicting that \(C\) has distance \(d\). Given uniqueness, the algorithm that exhaustively searches \(C\) for the codeword agreeing with \(\mathbf y\) on the unerased positions correctly recovers the transmitted codeword.

\(\neg \)(1)\(\Rightarrow \neg \)(2). Suppose \(C\) has distance \(d-1\) (so property (1) fails for the value \(d\) used above). We show no decoding function can correct all error patterns of weight \(\le (d-1)/2\) (with \(d\) odd, so \((d-1)/2\) is an integer). Let \(\mathbf c_1 \neq \mathbf c_2 \in C\) with \(\Delta (\mathbf c_1,\mathbf c_2) = d-1\). Since \(d\) is odd, there is a vector \(\mathbf y\) with \(\Delta ( \mathbf y,\mathbf c_1) = \Delta (\mathbf y,\mathbf c_2) = (d-1)/2\): split the \(d-1\) differing positions between \(\mathbf c_1\) and \(\mathbf c_2\) into two halves of size \((d-1)/2\) each, let \(\mathbf y\) agree with \(\mathbf c_1\) on one half of the differing positions and with \(\mathbf c_2\) on the other half, and agree with both on all positions where they already agree. Since \(\mathbf y\) could have arisen from either \(\mathbf c_1\) or \(\mathbf c_2\) being transmitted with at most \((d-1)/2\) errors, no decoding function can correctly recover the transmitted codeword in both cases.

\(\neg \)(1)\(\Rightarrow \neg \)(3). With \(\mathbf c_1,\mathbf c_2\) as above (\(\Delta (\mathbf c_1,\mathbf c_2)=d-1\)), let \(\mathbf y = \mathbf c_2\). If \(\mathbf c_1\) is transmitted and \(\mathbf y=\mathbf c_2\) is received (which requires \(d-1\) errors, within the claimed detection radius \(d-1\)), then either the algorithm declares "no error" (incorrect, since \(\mathbf y \neq \mathbf c_1\)) or it declares "error" even though, when \(\mathbf c_2\) is transmitted with \(0\) errors, the same received word \(\mathbf y = \mathbf c_2\) must be accepted. So no detection procedure can be correct on both scenarios.

\(\neg \)(1)\(\Rightarrow \neg \)(4). Let \(\mathbf y\) be the received word obtained by erasing exactly the positions where \(\mathbf c_1\) and \(\mathbf c_2\) differ (a total of \(d-1\) erasures). Then \(\mathbf y\) is consistent with both \(\mathbf c_1\) and \(\mathbf c_2\) having been transmitted, so no algorithm correcting \(d-1\) erasures can work in this case.

Proposition 21 Even-distance correction, Rmk. 1.4.3

Let \(C\) be a code with minimum distance \(d\), for \(d\) even. Then \(C\) can correct \(d/2-1\) errors but cannot correct \(d/2\) errors. Consequently, if a code \(C\) is \(t\)-error correctable (for the largest possible such \(t\)), then \(C\) has distance either \(2t+1\) or \(2t+2\).

Proof ▶

\(C\) corrects \(d/2-1\) errors. The argument for (1)\(\Rightarrow \)(2) in the proof of Proposition 20 used only that \(2t \le d-1\) to derive the contradiction \(\Delta (\mathbf c_1, \mathbf c_2) \le 2t \le d-1\); taking \(t = d/2-1\) gives \(2t = d-2 \le d-1\), so the same MLD-based argument shows \(C\) corrects \(d/2-1\) errors.

\(C\) cannot correct \(d/2\) errors. Let \(\mathbf c_1 \neq \mathbf c_2 \in C\) with \(\Delta (\mathbf c_1,\mathbf c_2)=d\). Split the \(d\) differing positions into two halves of size \(d/2\) each (possible as \(d\) is even), and let \(\mathbf y\) agree with \(\mathbf c_1\) on one half of the differing positions and with \(\mathbf c_2\) on the other, and with both elsewhere. Then \(\Delta (\mathbf y,\mathbf c_1) = \Delta (\mathbf y,\mathbf c_2) = d/2\), so \(\mathbf y\) is consistent with at most \(d/2\) errors having occurred regardless of whether \(\mathbf c_1\) or \(\mathbf c_2\) was transmitted, and no decoding function can be correct in both cases.

Consequence. If \(C\) is \(t\)-error correctable for the largest possible \(t\) and \(C\) has distance \(d\), then by Proposition 20 (odd \(d\)) or the first part above (even \(d\)), \(t \ge \lceil d/2 \rceil - [d\text{ even}]\), i.e. \(t = (d-1)/2\) if \(d\) is odd and \(t=d/2-1\) if \(d\) is even; in either case \(d \in \{ 2t+1,2t+2\} \).

Definition 22 Hamming code \(C_H\)

The Hamming code \(C_H\) is the binary code of block length \(7\) with message length \(4\) defined, for \((x_1,x_2,x_3,x_4) \in \{ 0,1\} ^4\), by

\[ C_H(x_1,x_2,x_3,x_4) = (x_1,x_2,x_3,x_4,\; x_2\oplus x_3\oplus x_4,\; x_1\oplus x_3 \oplus x_4,\; x_1\oplus x_2\oplus x_4). \]

It has parameters \(q=2\), \(k=4\), \(n=7\), \(R=4/7\).

Definition 23 Hamming Weight, Def. 1.5.1

Let \(q \ge 2\). Given a vector \(\mathbf v \in \{ 0,1,\dots ,q-1\} ^n\), its Hamming weight, denoted \(\mathrm{wt}(\mathbf v)\), is the number of non-zero symbols in \(\mathbf v\).

Lemma 24 \(\Delta (\mathbf u,\mathbf v) = \mathrm{wt}(\mathbf u+\mathbf v)\)

For any \(\mathbf u,\mathbf v \in \{ 0,1\} ^n\), where \(\mathbf u + \mathbf v\) denotes coordinatewise XOR,

\[ \Delta (\mathbf u,\mathbf v) = \mathrm{wt}(\mathbf u + \mathbf v). \]

Proof ▶

For each coordinate \(i \in [n]\), \(u_i \neq v_i\) if and only if \(u_i \oplus v_i = 1\), i.e. if and only if \((\mathbf u+\mathbf v)_i \neq 0\). Hence the set of positions where \(\mathbf u\) and \(\mathbf v\) differ coincides with the set of non-zero positions of \(\mathbf u+\mathbf v\), so the two sets have the same size: \(\Delta (\mathbf u,\mathbf v) = \mathrm{wt}(\mathbf u+\mathbf v)\).

Proposition 25 Prop. 1.5.2

\(C_H\) has distance \(3\).

Proof ▶

We prove the claimed distance using two facts:

\[ \min _{\mathbf c \in C_H,\, \mathbf c \neq \mathbf0} \mathrm{wt}(\mathbf c) = 3, \tag {a} \]

and

\[ \min _{\mathbf c \in C_H,\, \mathbf c \neq \mathbf0} \mathrm{wt}(\mathbf c) = \min _{\mathbf c_1 \neq \mathbf c_2 \in C_H} \Delta (\mathbf c_1,\mathbf c_2). \tag {b} \]

Proof of (a). Let \(\mathbf x=(x_1,x_2,x_3,x_4)\) be the message vector and write the three parity bits as \(p_1 = x_2\oplus x_3\oplus x_4\), \(p_2 = x_1\oplus x_3\oplus x_4\), \(p_3 = x_1\oplus x_2\oplus x_4\), so \(\mathrm{wt}(C_H(\mathbf x)) = \mathrm{wt}(\mathbf x) + \mathrm{wt} (p_1,p_2,p_3)\).

If \(\mathrm{wt}(\mathbf x)=0\) then \(\mathbf x=\mathbf0\) and \(C_H(\mathbf x)=\mathbf0\), which we exclude.
If \(\mathrm{wt}(\mathbf x)=1\), a direct check of the four basis vectors gives: \(\mathbf x=(1,0,0,0) \Rightarrow (p_1,p_2,p_3) = (0,1,1)\); \(\mathbf x=(0,1,0,0) \Rightarrow (1,0,1)\); \(\mathbf x=(0,0,1,0) \Rightarrow (1,1,0)\); \(\mathbf x=(0,0,0,1) \Rightarrow (1,1,1)\). In every case at least two of \(p_1,p_2,p_3\) equal \(1\), so \(\mathrm{wt}(C_H(\mathbf x)) \ge 1+2 = 3\).
If \(\mathrm{wt}(\mathbf x)=2\), a direct check of the six weight-\(2\) vectors gives: \((1,1,0,0)\Rightarrow (1,1,0)\); \((1,0,1,0)\Rightarrow (1,0,1)\); \((1,0,0,1)\Rightarrow (1,0,0)\); \((0,1,1,0)\Rightarrow (0,1,1)\); \((0,1,0,1)\Rightarrow (0,1,0)\); \((0,0,1,1)\Rightarrow (0,0,1)\). In every case at least one of \(p_1,p_2,p_3\) equals \(1\), so \(\mathrm{wt}(C_H(\mathbf x)) \ge 2+1=3\).
If \(\mathrm{wt}(\mathbf x) \ge 3\) then already \(\mathrm{wt}(C_H(\mathbf x)) \ge \mathrm{wt}(\mathbf x) \ge 3\).

Hence \(\min _{\mathbf c\neq \mathbf0} \mathrm{wt}(\mathbf c) \ge 3\). Since \(C_H(1,0,0,0) = (1,0,0,0,0,1,1)\) has weight \(3\), equality holds, proving (a).

Proof of (b). Let \(\mathbf x = (x_1,x_2,x_3,x_4)\) and \(\mathbf y=(y_1,y_2,y_3,y_4)\) be two messages. Since each coordinate of \(C_H\) is an XOR of a subset of the message coordinates, and XOR is associative and commutative, \(C_H(\mathbf x) + C_H(\mathbf y) = C_H(\mathbf x + \mathbf y)\) (coordinatewise XOR on both sides; e.g. the last coordinate gives \((x_1\oplus x_2\oplus x_4) \oplus (y_1\oplus y_2\oplus y_4) = (x_1\oplus y_1)\oplus (x_2\oplus y_2) \oplus (x_4\oplus y_4)\), which is exactly the last coordinate of \(C_H(\mathbf x+\mathbf y)\), and likewise for the other coordinates). By Lemma 24,

\[ \min _{\mathbf x \neq \mathbf y \in \{ 0,1\} ^4} \Delta (C_H(\mathbf x),C_H(\mathbf y)) = \min _{\mathbf x \neq \mathbf y} \mathrm{wt}(C_H(\mathbf x)+C_H(\mathbf y)) = \min _{\mathbf x \neq \mathbf y} \mathrm{wt}(C_H(\mathbf x + \mathbf y)) = \min _{\mathbf z \neq \mathbf0} \mathrm{wt}(C_H(\mathbf z)), \]

where the last equality holds because \(\{ \mathbf x+\mathbf y \mid \mathbf x \neq \mathbf y \in \{ 0,1\} ^4\} = \{ \mathbf z \in \{ 0,1\} ^4 \mid \mathbf z \neq \mathbf0\} \) (for any nonzero \(\mathbf z\), take \(\mathbf x=\mathbf0,\mathbf y=\mathbf z\); and \(\mathbf x\neq \mathbf y\) forces \(\mathbf x+\mathbf y \neq \mathbf0\)). This proves (b).

Combining (a) and (b), \(C_H\) has distance \(3\).

Definition 26 Binary linear code

A binary code \(C \subseteq \{ 0,1\} ^n\) of dimension \(k\) is called a binary linear code if there is a \(k\times n\) matrix \(G\) over \(\{ 0,1\} \) (addition being XOR, multiplication being AND) such that \(C = \{ \mathbf x \cdot G \mid \mathbf x \in \{ 0,1\} ^k\} \), and, writing \(C(\mathbf x) \overset {\mathrm{def}}{=} \mathbf x \cdot G\) (viewing \(\mathbf x\) as a row vector), the map \(\mathbf x \mapsto C(\mathbf x)\) is a bijection from \(\{ 0,1\} ^k\) onto \(C\). The matrix \(G\) is called a generator matrix for \(C\).

Lemma 27 Lemma 1.5.3

For any binary linear code \(C\) (not necessarily distinct) messages \(\mathbf x,\mathbf y\), \(C(\mathbf x) + C(\mathbf y) = C(\mathbf x + \mathbf y)\) (coordinatewise XOR throughout).

Proof ▶

Let \(G\) be a generator matrix for \(C\). Using distributivity of AND over XOR and associativity/commutativity of XOR,

\[ C(\mathbf x) + C(\mathbf y) = \mathbf x \cdot G + \mathbf y \cdot G = (\mathbf x + \mathbf y)\cdot G = C(\mathbf x + \mathbf y). \qedhere \]

Proposition 28 Prop. 1.5.4

For any binary linear code \(C\), its minimum distance equals the minimum Hamming weight of any non-zero codeword:

\[ \Delta (C) = \min _{\mathbf c \in C,\, \mathbf c\neq \mathbf0} \mathrm{wt}(\mathbf c). \]

Proof ▶

Let \(C\) have dimension \(k\). Exactly as in the proof of part (b) of Proposition 25, but using Lemma 27 in place of the direct XOR computation for \(C_H\): for messages \(\mathbf x,\mathbf y \in \{ 0,1\} ^k\), Lemma 27 gives \(C(\mathbf x)+C(\mathbf y) = C(\mathbf x + \mathbf y)\), and Lemma 24 gives \(\Delta (C(\mathbf x),C(\mathbf y)) = \mathrm{wt}(C(\mathbf x)+C(\mathbf y))\). Hence

\[ \Delta (C) = \min _{\mathbf x \neq \mathbf y} \Delta (C(\mathbf x),C(\mathbf y)) = \min _{\mathbf x \neq \mathbf y} \mathrm{wt} (C(\mathbf x + \mathbf y)) = \min _{\mathbf z \neq \mathbf0} \mathrm{wt}(C(\mathbf z)) = \min _{\mathbf c \in C,\, \mathbf c \neq \mathbf0} \mathrm{wt}(\mathbf c), \]

where the third equality uses that \(\{ \mathbf x + \mathbf y \mid \mathbf x \neq \mathbf y \in \{ 0,1\} ^k\} = \{ \mathbf z \in \{ 0,1\} ^k \mid \mathbf z \neq \mathbf0\} \) and the last equality uses that \(\mathbf x \mapsto C(\mathbf x)\) is a bijection \(\{ 0,1\} ^k \to C\) (Definition 26).

Definition 29 Hamming Ball, Def. 1.6.1

For any vector \(\mathbf x \in [q]^n\) and integer \(e \ge 0\), the Hamming ball of radius \(e\) centered at \(\mathbf x\) is

\[ B(\mathbf x,e) = \{ \mathbf y \in [q]^n \mid \Delta (\mathbf x,\mathbf y) \le e\} . \]

Theorem 30 Hamming bound for \(d=3\), Thm. 1.6.2

Every binary code with block length \(n\), dimension \(k\), distance \(d=3\) satisfies

\[ k \le n - \log _2(n+1). \]

Proof ▶

Let \(C\) have distance \(3\). First, for any two distinct codewords \(\mathbf c_1,\mathbf c_2 \in C\), \(B(\mathbf c_1,1) \cap B(\mathbf c_2,1) = \varnothing \): if \(\mathbf y\) were in both, then \(\Delta ( \mathbf y,\mathbf c_1) \le 1\) and \(\Delta (\mathbf y,\mathbf c_2) \le 1\), so by the triangle inequality \(\Delta (\mathbf c_1,\mathbf c_2) \le 2 {\lt} 3\), a contradiction.

Next, for every \(\mathbf x \in \{ 0,1\} ^n\), \(|B(\mathbf x,1)| = n+1\): \(B(\mathbf x,1)\) consists of \(\mathbf x\) itself together with the \(n\) vectors obtained by flipping exactly one coordinate of \(\mathbf x\), and these \(n+1\) vectors are pairwise distinct.

The union \(\bigcup _{\mathbf c \in C} B(\mathbf c,1)\) is a subset of \(\{ 0,1\} ^n\), so \(\left|\bigcup _{\mathbf c \in C} B(\mathbf c,1)\right| \le 2^n\). Since the balls are pairwise disjoint,

\[ \left|\bigcup _{\mathbf c \in C} B(\mathbf c,1)\right| = \sum _{\mathbf c \in C} |B(\mathbf c,1)| = \sum _{\mathbf c \in C} (n+1) = 2^k(n+1), \]

using \(|C| = 2^k\). Hence \(2^k(n+1) \le 2^n\), i.e. \(2^k \le \frac{2^n}{n+1}\), and taking \(\log _2\) gives \(k \le n - \log _2(n+1)\).

Definition 31 \((n,k,d)_\Sigma \) code, Def. 1.7.1

A code \(C \subseteq \Sigma ^n\) with dimension \(k\) and distance \(d\) is called an \((n,k,d)_\Sigma \) code, also written an \((n,k,d)_{|\Sigma |}\) code.

Theorem 32 Hamming Bound for any \(d\), Thm. 1.7.2

For every \((n,k,d)_q\) code,

\[ k \le n - \log _q\left( \sum _{i=0}^{\lfloor (d-1)/2 \rfloor } \binom n i (q-1)^i \right). \]

Proof ▶

Let \(e = \lfloor (d-1)/2 \rfloor \) and let \(C\) be an \((n,k,d)_q\) code. As in the proof of Theorem 30, for any two distinct codewords \(\mathbf c_1,\mathbf c_2 \in C\), \(B(\mathbf c_1,e) \cap B(\mathbf c_2,e) = \varnothing \): otherwise \(\mathbf y \in B( \mathbf c_1,e) \cap B(\mathbf c_2,e)\) would give, by the triangle inequality, \(\Delta (\mathbf c_1,\mathbf c_2) \le 2e \le d-1\), contradicting that \(C\) has distance \(d\).

We claim that for all \(\mathbf x \in [q]^n\),

\[ |B(\mathbf x,e)| = \sum _{i=0}^{e} \binom n i (q-1)^i. \]

Indeed, any vector in \(B(\mathbf x,e)\) differs from \(\mathbf x\) in exactly \(i\) positions for some \(0 \le i \le e\); there are \(\binom n i\) ways to choose which \(i\) positions differ, and for each such choice, in each differing position the vector may take any of the \(q-1\) values other than the corresponding coordinate of \(\mathbf x\), giving \((q-1)^i\) choices; summing over \(i\) gives the claim.

Now \(\bigcup _{\mathbf c \in C} B(\mathbf c,e) \subseteq [q]^n\), so \(\left|\bigcup _{\mathbf c \in C} B(\mathbf c,e)\right| \le q^n\). Since the balls are pairwise disjoint and \(|C| = q^k\),

\[ \left|\bigcup _{\mathbf c \in C} B(\mathbf c,e)\right| = \sum _{\mathbf c \in C} |B(\mathbf c,e)| = q^k \sum _{i=0}^{e} \binom n i (q-1)^i. \]

Combining, \(q^k \sum _{i=0}^{e}\binom n i (q-1)^i \le q^n\), and taking \(\log _q\) of both sides gives the desired bound:

\[ k \le n - \log _q \left( \sum _{i=0}^{e} \binom n i (q-1)^i \right). \qedhere \]

Corollary 33 Hamming bound on the rate

Every code of distance \(d\) over an alphabet of size \(q\) has rate \(R\) at most

\[ 1 - \frac{\log _q\left( \sum _{i=0}^{e} \binom n i (q-1)^i \right)}{n}, \]

where \(e = \lfloor (d-1)/2 \rfloor \).

Proof ▶

Divide both sides of the inequality of Theorem 32 by \(n\) and use \(R = k/n\) (Definition 5).

Definition 34 Perfect code, Def. 1.7.3

A code that meets the Hamming bound of Theorem 32 with equality is called a perfect code: i.e. an \((n,k,d)_q\) code \(C\) is perfect if

\[ k = n - \log _q\left( \sum _{i=0}^{e} \binom n i (q-1)^i \right), \qquad e = \left\lfloor \frac{d-1}{2} \right\rfloor . \]

Equivalently, the Hamming balls of radius \(e\) centered at the codewords of \(C\) partition the entire ambient space \([q]^n\). The \((7,4,3)_2\) Hamming code \(C_H\) (Definition 22) is an example of a perfect code, since \(2^4 = 2^7/(7+1)\).

Definition 35 Code families, Rate and Distance, Def. 1.8.1

Let \(\{ n_i\} _{i\ge 1}\) be an increasing sequence of block lengths and suppose there exist sequences \(\{ k_i\} _{i\ge 1}\), \(\{ d_i\} _{i \ge 1}\) and \(\{ q_i\} _{i\ge 1}\) such that for all \(i \ge 1\) there exists an \((n_i,k_i,d_i)_{q_i}\) code \(C_i\). Then the sequence \(\mathscr C = \{ C_i\} _{i\ge 1}\) is a family of codes. The rate of \(\mathscr C\) is

\[ R(\mathscr C) = \lim _{i\to \infty } \left\{ \frac{k_i}{n_i} \right\} , \]

when the limit exists. The relative distance of \(\mathscr C\) is

\[ \delta (\mathscr C) = \lim _{i\to \infty } \left\{ \frac{d_i}{n_i} \right\} , \]

when the limit exists. If \(q_i = q\) for all \(i \ge 1\), then \(\mathscr C\) is referred to as a family of \(q\)-ary codes.

Definition 36 Efficient algorithm

An algorithm related to a code of block length \(n\) is called efficient if it runs in time polynomial in \(n\). (Throughout the book, unless mentioned otherwise, a “code” implicitly refers to a family of codes, so that asymptotic run time as \(n \to \infty \) is meaningful.)