3 Probability as Fancy Counting and the q-ary Entropy Function
In this chapter we set up the probabilistic toolkit that will be used throughout the book to prove existence results for codes via the “probabilistic method,” and we introduce the \(q\)-ary entropy function \(H_q\), whose close relationship to the volume of a Hamming ball makes it central to both existence and non-existence results for codes.
3.1 A Crash Course on Probability
Throughout, we only consider probability distributions defined over finite domains.
Given a finite domain \(\mathbb {D}\), a probability distribution is a function
An event \(\mathbb {E}\) is a predicate on \(\mathbb {D}\), equivalently a subset of \(\mathbb {D}\).
The uniform distribution over a finite domain \(\mathbb {D}\), denoted \(\mathscr {U}_{\mathbb {D}}\), is given by
Let \(\mathbb {D}\) be a finite domain, \(I \subset \mathbb {R}\) a finite subset, and \(p\) a probability distribution over \(\mathbb {D}\). A random variable is a function \(V : \mathbb {D} \to I\). Its expectation is defined as
For an event \(E\) over \(\mathbb {D}\), its indicator variable \(\mathbb {1}_E : \mathbb {D} \to \{ 0,1\} \) is defined by \(\mathbb {1}_E(x) = 1\) if \(x \in E\) and \(0\) otherwise.
Let \(E\) be any event. Then
Given random variables \(V_1, \dots , V_m\) defined over the same domain \(\mathbb {D}\) and with the same probability distribution \(p\),
Given \(m\) binary random variables \(A_1, \dots , A_m\),
Let \(V\) be a non-negative random variable. Then for any \(t {\gt} 0\),
In particular, for any \(a \ge 1\), \(\Pr [V \ge a \cdot \mathbb {E}[V]] \le \frac{1}{a}\).
Let \(V\) be a random variable. Its variance is defined as
and its standard deviation is \(\sigma [V] = \sqrt{\mathrm{Var}[V]}\).
Let \(V\) be a random variable such that \(\mathrm{Var}[V] \ne 0\). Then for any \(t {\gt} 0\),
Two random variables \(A\) and \(B\) are called independent if for every \(a, b\) in the ranges of \(A\) and \(B\) respectively,
Given two events \(A\) and \(B\) defined over the same domain and probability distribution, the probability of \(A\) conditioned on \(B\) is
For any two events \(A\) and \(B\) defined on the same domain and probability distribution,
Let \(X_1, \dots , X_m\) be independent binary random variables and define \(X = \sum _i X_i\). Then the multiplicative Chernoff bound states that for \(0 {\lt} \varepsilon \le 1\),
and the additive Chernoff bound states that
Given probability distributions \(p_1\) and \(p_2\) over domains \(\mathbb {D}_1\) and \(\mathbb {D}_2\) respectively, the product distribution \(p_1 \times p_2\) over \(\mathbb {D}_1 \times \mathbb {D}_2\) is the distribution under which every element \((x,y) \in \mathbb {D}_1 \times \mathbb {D}_2\) is picked by choosing \(x\) from \(\mathbb {D}_1\) according to \(p_1\) and \(y\) is picked independently from \(\mathbb {D}_2\) according to \(p_2\).
For any \(m \ge 1\), the distribution \(\mathscr {U}_{\mathbb {D}_1 \times \mathbb {D}_2 \times \cdots \times \mathbb {D}_m}\) is identical to the distribution \(\mathscr {U}_{\mathbb {D}_1} \times \mathscr {U}_{\mathbb {D}_2} \times \cdots \times \mathscr {U}_{\mathbb {D}_m}\).
Given a non-zero vector \(\mathbf{m} \in \mathbb {F}_q^k\) and a uniformly random \(k \times n\) matrix \(G\) over \(\mathbb {F}_q\), the vector \(\mathbf{m} \cdot G\) is uniformly distributed over \(\mathbb {F}_q^n\).
Let \(g_{ji}\) (\(1 \le j \le k\), \(1 \le i \le n\)) denote the \((j,i)\)-th entry of \(G\). Since \(G\) is a uniformly random \(k \times n\) matrix over \(\mathbb {F}_q\), by Lemma 91 each of the \(g_{ji}\) is an independent uniformly random element of \(\mathbb {F}_q\). It therefore suffices to show that for every \(1 \le i \le n\), the \(i\)-th entry \(b_i\) of \(\mathbf{m} \cdot G\) is an independent uniformly random element of \(\mathbb {F}_q\): since \(b_i\) and \(b_j\) for \(i \ne j\) are sums over disjoint sets of entries of \(G\), they are automatically independent once each \(b_i\) is shown to be uniform.
Write \(\mathbf{m} = (m_1, \dots , m_k)\); then \(b_i = \sum _{j=1}^k m_j g_{ji}\). Since \(\mathbf{m} \ne \mathbf{0}\), some \(m_j \ne 0\); without loss of generality \(m_1 \ne 0\). Write \(b_i = m_1 g_{1i} + \sum _{j=2}^k m_j g_{ji}\). Fix any assignment of values to \(g_{2i}, \dots , g_{ki}\) (there are \(q^{k-1}\) such assignments). As \(g_{1i}\) ranges over the \(q\) elements of \(\mathbb {F}_q\) (using \(m_1 \ne 0\)), \(b_i\) takes each of the \(q\) values of \(\mathbb {F}_q\) exactly once. Hence, over all \(q^{k-1}\) assignments to \(g_{2i}, \dots , g_{ki}\), each value of \(\mathbb {F}_q\) is taken by \(b_i\) exactly \(q^{k-1}\) times out of \(q^k\) total assignments, i.e. \(b_i\) is uniformly distributed over \(\mathbb {F}_q\), as claimed.
3.2 The Probabilistic Method
The probabilistic method proves the existence of an object \(\mathscr {C}\) with a property \(\mathscr {P}\) by defining a distribution \(\mathscr {D}\) over the space of candidate objects and showing
which (being a strictly positive probability) proves such a \(\mathscr {C}\) must exist. A common strategy is to decompose \(\mathscr {P} = P_1 \wedge \cdots \wedge P_m\) into sub-properties and use the union bound (Proposition 82) to show \(\Pr [\overline{P_i}] {\lt} 1/m\) for every \(i\). This section contains no further numbered definitions, lemmas, propositions, theorems, corollaries, constructions or algorithms; the method itself is illustrated in the book on the running example of whether a \([2,2,1]_2\) code exists (Question 3.0.1), using the tools of Section 3.1 above (Definitions 78 and 79, and Proposition 82).
3.3 The q-ary Entropy Function
Let \(q\) be an integer with \(q \ge 2\) and let \(x\) be a real number with \(0 \le x \le 1\). The \(q\)-ary entropy function is defined by
For the special case \(q = 2\) we drop the subscript and write \(H(x) = H_2(x) = -x \log x - (1-x)\log (1-x)\), where \(\log \) denotes \(\log _2\) (the convention used throughout the book). The maximum value of \(H_q\) is \(1\), achieved at \(x = 1 - 1/q\).
Volume of Hamming Balls
Let \(q \ge 2\) and \(n \ge r \ge 1\) be integers. The volume of a Hamming ball of radius \(r\) is given by
(The choice of \(\mathbf{0}\) as center is immaterial: the volume of a Hamming ball is independent of its center, as the sum on the right makes clear, so any point may serve as center.)
Let \(q \ge 2\) be an integer and \(0 \le p \le 1 - \frac{1}{q}\) be a real number. Then:
\(\mathrm{Vol}_q(pn, n) \le q^{H_q(p)n}\); and
for large enough \(n\), \(\mathrm{Vol}_q(pn,n) \ge q^{H_q(p)n - o(n)}\).
Proof of (i). Starting from the binomial expansion of \(1 = (p + (1-p))^n\),
Since \(0 \le p \le 1 - 1/q\) implies \(\frac{p}{(q-1)(1-p)} \le 1\), and \(i \le pn\) in the sum, raising a number in \([0,1]\) to a larger power can only decrease it, so \(\left(\frac{p}{(q-1)(1-p)}\right)^i \ge \left(\frac{p}{(q-1)(1-p)}\right)^{pn}\) for every \(i \le pn\). Hence
where the last line uses \(q^{-H_q(p)n} = \left(\frac{p}{q-1}\right)^{pn} (1-p)^{(1-p)n}\), which follows directly from the definition of \(H_q\). Rearranging gives \(\mathrm{Vol}_q(pn,n) \le q^{H_q(p)n}\), which proves (i).
Proof of (ii). By Stirling’s approximation for \(n!\),
where \(\ell (n) = \dfrac {e^{\lambda _1(n) - \lambda _2(pn) - \lambda _2((1-p)n)}}{\sqrt{2\pi p(1-p)n}}\) (with \(\lambda _1, \lambda _2\) the error terms of Stirling’s approximation). Then, keeping only the last term of the sum defining \(\mathrm{Vol}_q(pn,n)\),
where the last inequality follows from the definition of \(H_q(\cdot )\) and the fact that \(\ell (n) = q^{-o(n)}\) for large enough \(n\). This proves (ii) and completes the proof.
Other Properties of the q-ary Entropy Function
For small enough \(\varepsilon \), the inequality
holds if and only if \(q\) is \(2^{\Omega (1/\varepsilon )}\).
First note that, by definition of \(H_q(\rho )\) and \(H(\rho )\) (writing \(\log \) for \(\log _2\)),
(\(\Leftarrow \)) Suppose \(q \ge 2^{1/\varepsilon }\). Then \(\log _q(q-1) \le 1\) and \(H(\rho ) \le 1\) for every \(\rho \), and \(\log _2 q \ge 1/\varepsilon \), so
hence \(1 - H_q(\rho ) \ge 1 - \rho - \varepsilon \) for every \(0 {\lt} \rho \le 1 - 1/q\), as desired.
(\(\Rightarrow \)) Suppose \(q = 2^{o(1/\varepsilon )}\). We first claim that for small enough \(\varepsilon \), if \(q \ge 1/\varepsilon ^2\) then \(\log _q(q-1) \ge 1 - \varepsilon \): indeed,
which is at least \(1-\varepsilon \) once \(q \ge 1/\varepsilon ^2\) (and \(\varepsilon \) small enough). Next, if \(q = 2^{o(1/\varepsilon )}\), then for fixed \(\rho \),
Combining, for \(q = 2^{o(1/\varepsilon )}\) with \(q \ge 1/\varepsilon ^2\),
which implies \(1 - H_q(\rho ) {\lt} 1 - \rho - \varepsilon \), as desired. Finally, for \(q \le 1/\varepsilon ^2\), Lemma 97 (applied with \(q\) and \(q^m = 1/\varepsilon ^2\)) shows \(1-H_q(\rho ) \le 1 - H_{1/\varepsilon ^2}(\rho ) {\lt} 1 - \rho - \varepsilon \), using the case already established for the base \(1/\varepsilon ^2 \ge 2^{1/\varepsilon }\). This completes the proof of the “only if” direction.
Let \(q \ge 2\) be an integer and let \(0 \le \rho \le 1 - 1/q\). Then for any real \(m \ge 1\) such that
we have \(H_q(\rho ) \ge H_{q^m}(\rho )\).
Since \(H_q(0) = H_{q^m}(0) = 0\), we may assume \(\rho \in (0, 1-1/q]\). As noted in the proof of Proposition 96, \(H_q(\rho ) = \rho \cdot \frac{\log (q-1)}{\log q} + H(\rho ) \cdot \frac{1}{\log q}\) (all logs base 2). Hence
Multiplying through by \(\frac{1}{\rho } m \log q {\gt} 0\),
where the inequality in the second line uses that \(H(\rho )/\rho \) is a decreasing function of \(\rho \) on \((0,1)\) together with \(\rho \le 1-1/q\) (indeed \(H(\rho )/\rho = \log (1/\rho ) - (1/\rho - 1)\log (1-\rho )\), and one checks both terms are decreasing in \(\rho \)), and the final inequality is exactly the hypothesis \(q^{m-1} \ge (1+1/(q-1))^{q-1}\), since \((q-1)\cdot q^{(m-1)/(q-1)} \ge q\) is equivalent to it after raising both sides to the power \(q-1\). This shows \(H_q(\rho ) - H_{q^m}(\rho ) \ge 0\), as desired.
Let \(q \ge 3\) be an integer and \(0 \le \rho \le 1 - 1/q\). Then for any integer \(m \ge 2\),
For small enough \(\varepsilon {\gt} 0\),
where \(c_q\) is a constant that depends only on \(q\).
Since \(H_q'(x) = 0\) at \(x = 1-1/q\), the intuition is that in the Taylor expansion of \(H_q(1-1/q-\varepsilon )\) around \(\varepsilon = 0\) the linear term vanishes; we make this precise assuming \(\varepsilon {\lt} 1/q\). Writing \(x = 1-1/q-\varepsilon \) (so \(1-x = 1/q+\varepsilon \)), by definition of \(H_q\),
Expanding \(\ln (1+x) = x - x^2/2 + x^3/3 - \cdots \) for each logarithm above and collecting the \(\varepsilon ^3\) and smaller terms into \(o(\varepsilon ^2)\), one obtains after simplification
for \(\varepsilon \) small enough. Taking \(c_q = \frac{q^2}{4\ln q\, (q-1)}\) completes the proof.
For small enough \(\varepsilon {\gt} 0\),
By definition,
Since every term on the right is non-negative,
which gives the lower bound (the \(\Omega \) half of the claim). For the upper bound, one has \((1-\varepsilon )\log _q(1/(1-\varepsilon )) \le 2\varepsilon /\ln q\) for small enough \(\varepsilon \), so
Together, the two displayed bounds give \(H_q(\varepsilon ) = \Theta \! \left(\frac{1}{\log q}\cdot \varepsilon \log (1/\varepsilon )\right)\), as claimed.
Since \(H_q(\cdot )\) restricted to the domain \([0, 1-1/q]\) is a bijective map onto \([0,1]\), we define \(H_q^{-1}(y) = x\) for the unique \(x \in [0, 1-1/q]\) such that \(H_q(x) = y\).
For every \(0 {\lt} y \le 1 - 1/q\) and for every small enough \(\varepsilon {\gt} 0\),
where \(c'_q \ge 1\) is a constant that depends only on \(q\).
It is straightforward to check that \(H_q^{-1}(y)\) is a strictly increasing, convex function of \(y\) on \([0,1]\). In particular its derivative is increasing, so \((H_q^{-1})'(1) \ge (H_q^{-1})'(y)\) for every \(y \in [0,1]\), and hence for every \(0 \le y \le 1\) and small enough \(\delta {\gt} 0\),
By Proposition 99 (applied with \(\varepsilon = \sqrt{\delta /c_q}\), recalling \(H_q^{-1}(1) = 1-1/q\)), for small enough \(\delta \),
Combining the two displays, and setting \(c'_q = \max (1, 1/c_q)\) and \(\delta = \varepsilon ^2/c'_q\), we get
since \(c'_q c_q \ge 1\). Rearranging gives \(H_q^{-1}(y - \varepsilon ^2/c'_q) \ge H_q^{-1}(y) - \varepsilon \), as desired.