3 Probability as Fancy Counting and the q-ary Entropy Function

In this chapter we set up the probabilistic toolkit that will be used throughout the book to prove existence results for codes via the “probabilistic method,” and we introduce the \(q\)-ary entropy function \(H_q\), whose close relationship to the volume of a Hamming ball makes it central to both existence and non-existence results for codes.

3.1 A Crash Course on Probability

Throughout, we only consider probability distributions defined over finite domains.

Definition 77 Probability distribution

✓

Given a finite domain \(\mathbb {D}\), a probability distribution is a function

\[ p : \mathbb {D} \to [0,1] \quad \text{such that} \quad \sum _{x \in \mathbb {D}} p(x) = 1 . \]

An event \(\mathbb {E}\) is a predicate on \(\mathbb {D}\), equivalently a subset of \(\mathbb {D}\).

Definition 78 Uniform Distribution, Def 3.1.1

✓

The uniform distribution over a finite domain \(\mathbb {D}\), denoted \(\mathscr {U}_{\mathbb {D}}\), is given by

\[ \Pr _{\mathscr {U}_{\mathbb {D}}}[x] = \frac{1}{|\mathbb {D}|} \quad \text{for every } x \in \mathbb {D}. \]

Definition 79 Random Variable, Def 3.1.2

✓

Let \(\mathbb {D}\) be a finite domain, \(I \subset \mathbb {R}\) a finite subset, and \(p\) a probability distribution over \(\mathbb {D}\). A random variable is a function \(V : \mathbb {D} \to I\). Its expectation is defined as

\[ \mathbb {E}[V] = \sum _{x \in \mathbb {D}} p(x) \cdot V(x). \]

For an event \(E\) over \(\mathbb {D}\), its indicator variable \(\mathbb {1}_E : \mathbb {D} \to \{ 0,1\} \) is defined by \(\mathbb {1}_E(x) = 1\) if \(x \in E\) and \(0\) otherwise.

Lemma 80 Expectation of an indicator, Lemma 3.1.3

✓

Let \(E\) be any event. Then

\[ \mathbb {E}[\mathbb {1}_E] = \Pr [E \text{ is true}]. \]

Proposition 81 Linearity of Expectation, Prop 3.1.4

✓

Given random variables \(V_1, \dots , V_m\) defined over the same domain \(\mathbb {D}\) and with the same probability distribution \(p\),

\[ \mathbb {E}\left[\sum _{i=1}^m V_i\right] = \sum _{i=1}^m \mathbb {E}[V_i]. \]

Proposition 82 Union Bound, Prop 3.1.5

✓

Given \(m\) binary random variables \(A_1, \dots , A_m\),

\[ \Pr \left[\left(\bigvee _{i=1}^m A_i\right) = 1\right] \le \sum _{i=1}^m \Pr [A_i = 1]. \]

Lemma 83 Markov Bound, Lemma 3.1.6

✓

Let \(V\) be a non-negative random variable. Then for any \(t {\gt} 0\),

\[ \Pr [V \ge t] \le \frac{\mathbb {E}[V]}{t}. \]

In particular, for any \(a \ge 1\), \(\Pr [V \ge a \cdot \mathbb {E}[V]] \le \frac{1}{a}\).

Definition 84 Variance, Def 3.1.7

✓

Let \(V\) be a random variable. Its variance is defined as

\[ \mathrm{Var}[V] = \mathbb {E}\left[\left(V - \mathbb {E}[V]\right)^2\right], \]

and its standard deviation is \(\sigma [V] = \sqrt{\mathrm{Var}[V]}\).

Lemma 85 Chebyshev Bound, Lemma 3.1.8

✓

Let \(V\) be a random variable such that \(\mathrm{Var}[V] \ne 0\). Then for any \(t {\gt} 0\),

\[ \Pr [|V - \mathbb {E}[V]| \ge t] \le \frac{\mathrm{Var}[V]}{t^2}. \]

Definition 86 Independence, Def 3.1.9

✓

Two random variables \(A\) and \(B\) are called independent if for every \(a, b\) in the ranges of \(A\) and \(B\) respectively,

\[ \Pr [A = a \wedge B = b] = \Pr [A = a] \cdot \Pr [B = b]. \]

Definition 87 Conditional Probability, Def 3.1.10

✓

Given two events \(A\) and \(B\) defined over the same domain and probability distribution, the probability of \(A\) conditioned on \(B\) is

\[ \Pr [A \mid B] = \frac{\Pr [A \text{ and } B]}{\Pr [B]}. \]

Lemma 88 Law of Total Probability, Lemma 3.1.11

✓

For any two events \(A\) and \(B\) defined on the same domain and probability distribution,

\[ \Pr [A] = \Pr [A \mid B] \cdot \Pr [B] + \Pr [A \mid \neg B] \cdot \Pr [\neg B]. \]

Theorem 89 Chernoff Bound, Thm 3.1.12

Let \(X_1, \dots , X_m\) be independent binary random variables and define \(X = \sum _i X_i\). Then the multiplicative Chernoff bound states that for \(0 {\lt} \varepsilon \le 1\),

\[ \Pr [|X - \mathbb {E}(X)| {\gt} \varepsilon \mathbb {E}(X)] {\lt} 2e^{-\varepsilon ^2 \mathbb {E}(X)/3}, \]

and the additive Chernoff bound states that

\[ \Pr [|X - \mathbb {E}(X)| {\gt} \varepsilon m] {\lt} 2e^{-\varepsilon ^2 m/2}. \]

Proof ▶

Definition 90 Product Distribution

✓

Given probability distributions \(p_1\) and \(p_2\) over domains \(\mathbb {D}_1\) and \(\mathbb {D}_2\) respectively, the product distribution \(p_1 \times p_2\) over \(\mathbb {D}_1 \times \mathbb {D}_2\) is the distribution under which every element \((x,y) \in \mathbb {D}_1 \times \mathbb {D}_2\) is picked by choosing \(x\) from \(\mathbb {D}_1\) according to \(p_1\) and \(y\) is picked independently from \(\mathbb {D}_2\) according to \(p_2\).

Lemma 91 Uniform distribution over a product space, Lemma 3.1.13

✓

For any \(m \ge 1\), the distribution \(\mathscr {U}_{\mathbb {D}_1 \times \mathbb {D}_2 \times \cdots \times \mathbb {D}_m}\) is identical to the distribution \(\mathscr {U}_{\mathbb {D}_1} \times \mathscr {U}_{\mathbb {D}_2} \times \cdots \times \mathscr {U}_{\mathbb {D}_m}\).

Lemma 92 Image of a non-zero vector under a random linear map, Lemma 3.1.14

Given a non-zero vector \(\mathbf{m} \in \mathbb {F}_q^k\) and a uniformly random \(k \times n\) matrix \(G\) over \(\mathbb {F}_q\), the vector \(\mathbf{m} \cdot G\) is uniformly distributed over \(\mathbb {F}_q^n\).

Proof ▶

Let \(g_{ji}\) (\(1 \le j \le k\), \(1 \le i \le n\)) denote the \((j,i)\)-th entry of \(G\). Since \(G\) is a uniformly random \(k \times n\) matrix over \(\mathbb {F}_q\), by Lemma 91 each of the \(g_{ji}\) is an independent uniformly random element of \(\mathbb {F}_q\). It therefore suffices to show that for every \(1 \le i \le n\), the \(i\)-th entry \(b_i\) of \(\mathbf{m} \cdot G\) is an independent uniformly random element of \(\mathbb {F}_q\): since \(b_i\) and \(b_j\) for \(i \ne j\) are sums over disjoint sets of entries of \(G\), they are automatically independent once each \(b_i\) is shown to be uniform.

Write \(\mathbf{m} = (m_1, \dots , m_k)\); then \(b_i = \sum _{j=1}^k m_j g_{ji}\). Since \(\mathbf{m} \ne \mathbf{0}\), some \(m_j \ne 0\); without loss of generality \(m_1 \ne 0\). Write \(b_i = m_1 g_{1i} + \sum _{j=2}^k m_j g_{ji}\). Fix any assignment of values to \(g_{2i}, \dots , g_{ki}\) (there are \(q^{k-1}\) such assignments). As \(g_{1i}\) ranges over the \(q\) elements of \(\mathbb {F}_q\) (using \(m_1 \ne 0\)), \(b_i\) takes each of the \(q\) values of \(\mathbb {F}_q\) exactly once. Hence, over all \(q^{k-1}\) assignments to \(g_{2i}, \dots , g_{ki}\), each value of \(\mathbb {F}_q\) is taken by \(b_i\) exactly \(q^{k-1}\) times out of \(q^k\) total assignments, i.e. \(b_i\) is uniformly distributed over \(\mathbb {F}_q\), as claimed.

3.2 The Probabilistic Method

The probabilistic method proves the existence of an object \(\mathscr {C}\) with a property \(\mathscr {P}\) by defining a distribution \(\mathscr {D}\) over the space of candidate objects and showing

\[ \Pr _{\mathscr {C} \sim \mathscr {D}}[\mathscr {C} \text{ has property } \mathscr {P}] {\gt} 0, \]

which (being a strictly positive probability) proves such a \(\mathscr {C}\) must exist. A common strategy is to decompose \(\mathscr {P} = P_1 \wedge \cdots \wedge P_m\) into sub-properties and use the union bound (Proposition 82) to show \(\Pr [\overline{P_i}] {\lt} 1/m\) for every \(i\). This section contains no further numbered definitions, lemmas, propositions, theorems, corollaries, constructions or algorithms; the method itself is illustrated in the book on the running example of whether a \([2,2,1]_2\) code exists (Question 3.0.1), using the tools of Section 3.1 above (Definitions 78 and 79, and Proposition 82).

3.3 The q-ary Entropy Function

Definition 93 \(q\)-ary Entropy Function, Def 3.3.1

Let \(q\) be an integer with \(q \ge 2\) and let \(x\) be a real number with \(0 \le x \le 1\). The \(q\)-ary entropy function is defined by

\[ H_q(x) = x \log _q(q-1) - x \log _q(x) - (1-x) \log _q(1-x). \]

For the special case \(q = 2\) we drop the subscript and write \(H(x) = H_2(x) = -x \log x - (1-x)\log (1-x)\), where \(\log \) denotes \(\log _2\) (the convention used throughout the book). The maximum value of \(H_q\) is \(1\), achieved at \(x = 1 - 1/q\).

Volume of Hamming Balls

Definition 94 Volume of a Hamming Ball, Def 3.3.2

Let \(q \ge 2\) and \(n \ge r \ge 1\) be integers. The volume of a Hamming ball of radius \(r\) is given by

\[ \mathrm{Vol}_q(r,n) = |B_q(\mathbf{0}, r)| = \sum _{i=0}^r \binom {n}{i} (q-1)^i. \]

(The choice of \(\mathbf{0}\) as center is immaterial: the volume of a Hamming ball is independent of its center, as the sum on the right makes clear, so any point may serve as center.)

Proposition 95 Entropy bounds on the volume of a Hamming ball, Prop 3.3.3

Let \(q \ge 2\) be an integer and \(0 \le p \le 1 - \frac{1}{q}\) be a real number. Then:

\(\mathrm{Vol}_q(pn, n) \le q^{H_q(p)n}\); and
for large enough \(n\), \(\mathrm{Vol}_q(pn,n) \ge q^{H_q(p)n - o(n)}\).

Proof ▶

Proof of (i). Starting from the binomial expansion of \(1 = (p + (1-p))^n\),

\begin{align*} 1 & = \sum _{i=0}^n \binom {n}{i} p^i (1-p)^{n-i} \\ & = \sum _{i=0}^{pn} \binom {n}{i} p^i (1-p)^{n-i} + \sum _{i=pn+1}^{n} \binom {n}{i} p^i (1-p)^{n-i}\\ & \ge \sum _{i=0}^{pn} \binom {n}{i} p^i (1-p)^{n-i} \\ & = \sum _{i=0}^{pn} \binom {n}{i} (q-1)^i \left(\frac{p}{q-1}\right)^i (1-p)^{n-i}\\ & = \sum _{i=0}^{pn} \binom {n}{i} (q-1)^i (1-p)^n \left(\frac{p}{(q-1)(1-p)}\right)^i . \end{align*}

Since \(0 \le p \le 1 - 1/q\) implies \(\frac{p}{(q-1)(1-p)} \le 1\), and \(i \le pn\) in the sum, raising a number in \([0,1]\) to a larger power can only decrease it, so \(\left(\frac{p}{(q-1)(1-p)}\right)^i \ge \left(\frac{p}{(q-1)(1-p)}\right)^{pn}\) for every \(i \le pn\). Hence

\begin{align*} 1 & \ge \sum _{i=0}^{pn} \binom {n}{i} (q-1)^i (1-p)^n \left(\frac{p}{(q-1)(1-p)}\right)^{pn}\\ & = \sum _{i=0}^{pn} \binom {n}{i} (q-1)^i \left(\frac{p}{q-1}\right)^{pn} (1-p)^{(1-p)n} \\ & \ge \mathrm{Vol}_q(pn,n) \cdot q^{-H_q(p)n}, \end{align*}

where the last line uses \(q^{-H_q(p)n} = \left(\frac{p}{q-1}\right)^{pn} (1-p)^{(1-p)n}\), which follows directly from the definition of \(H_q\). Rearranging gives \(\mathrm{Vol}_q(pn,n) \le q^{H_q(p)n}\), which proves (i).

Proof of (ii). By Stirling’s approximation for \(n!\),

\[ \binom {n}{pn} = \frac{n!}{(pn)!((1-p)n)!} {\gt} \frac{(n/e)^n}{(pn/e)^{pn}((1-p)n/e)^{(1-p)n}} \cdot \frac{e^{\lambda _1(n) - \lambda _2(pn) - \lambda _2((1-p)n)}}{\sqrt{2\pi p(1-p) n}} = \frac{1}{p^{pn}(1-p)^{(1-p)n}} \cdot \ell (n), \]

where \(\ell (n) = \dfrac {e^{\lambda _1(n) - \lambda _2(pn) - \lambda _2((1-p)n)}}{\sqrt{2\pi p(1-p)n}}\) (with \(\lambda _1, \lambda _2\) the error terms of Stirling’s approximation). Then, keeping only the last term of the sum defining \(\mathrm{Vol}_q(pn,n)\),

\[ \mathrm{Vol}_q(pn,n) \ge \binom {n}{pn}(q-1)^{pn} {\gt} \frac{(q-1)^{pn}}{p^{pn}(1-p)^{(1-p)n}} \cdot \ell (n) \ge q^{H_q(p)n - o(n)}, \]

where the last inequality follows from the definition of \(H_q(\cdot )\) and the fact that \(\ell (n) = q^{-o(n)}\) for large enough \(n\). This proves (ii) and completes the proof.

Other Properties of the q-ary Entropy Function

Proposition 96 Entropy vs. relative distance for large \(q\), Prop 3.3.4

For small enough \(\varepsilon \), the inequality

\[ 1 - H_q(\rho ) \ge 1 - \rho - \varepsilon \qquad \text{for every } 0 {\lt} \rho \le 1 - 1/q \]

holds if and only if \(q\) is \(2^{\Omega (1/\varepsilon )}\).

Proof ▶

First note that, by definition of \(H_q(\rho )\) and \(H(\rho )\) (writing \(\log \) for \(\log _2\)),

\[ H_q(\rho ) = \rho \log _q(q-1) - \rho \log _q \rho - (1-\rho )\log _q(1-\rho ) = \rho \log _q(q-1) + \frac{H(\rho )}{\log _2 q}. \]

(\(\Leftarrow \)) Suppose \(q \ge 2^{1/\varepsilon }\). Then \(\log _q(q-1) \le 1\) and \(H(\rho ) \le 1\) for every \(\rho \), and \(\log _2 q \ge 1/\varepsilon \), so

\[ H_q(\rho ) \le \rho \log _q(q-1) + \frac{H(\rho )}{\log _2 q} \le \rho + \varepsilon , \]

hence \(1 - H_q(\rho ) \ge 1 - \rho - \varepsilon \) for every \(0 {\lt} \rho \le 1 - 1/q\), as desired.

(\(\Rightarrow \)) Suppose \(q = 2^{o(1/\varepsilon )}\). We first claim that for small enough \(\varepsilon \), if \(q \ge 1/\varepsilon ^2\) then \(\log _q(q-1) \ge 1 - \varepsilon \): indeed,

\[ \log _q(q-1) = 1 + \frac{1}{\ln q}\ln (1 - 1/q) = 1 - O\! \left(\frac{1}{q \ln q}\right), \]

which is at least \(1-\varepsilon \) once \(q \ge 1/\varepsilon ^2\) (and \(\varepsilon \) small enough). Next, if \(q = 2^{o(1/\varepsilon )}\), then for fixed \(\rho \),

\[ \frac{H(\rho )}{\log _2 q} = \varepsilon \cdot \omega (1). \]

Combining, for \(q = 2^{o(1/\varepsilon )}\) with \(q \ge 1/\varepsilon ^2\),

\[ \rho \log _q(q-1) + \frac{H(\rho )}{\log _2 q} \ge \rho - \varepsilon + \varepsilon \cdot \omega (1) {\gt} \rho + \varepsilon , \]

which implies \(1 - H_q(\rho ) {\lt} 1 - \rho - \varepsilon \), as desired. Finally, for \(q \le 1/\varepsilon ^2\), Lemma 97 (applied with \(q\) and \(q^m = 1/\varepsilon ^2\)) shows \(1-H_q(\rho ) \le 1 - H_{1/\varepsilon ^2}(\rho ) {\lt} 1 - \rho - \varepsilon \), using the case already established for the base \(1/\varepsilon ^2 \ge 2^{1/\varepsilon }\). This completes the proof of the “only if” direction.

Lemma 97 Entropy is monotone decreasing in the base, Lemma 3.3.5

Let \(q \ge 2\) be an integer and let \(0 \le \rho \le 1 - 1/q\). Then for any real \(m \ge 1\) such that

\[ q^{m-1} \ge \left(1 + \frac{1}{q-1}\right)^{q-1}, \]

we have \(H_q(\rho ) \ge H_{q^m}(\rho )\).

Proof ▶

Since \(H_q(0) = H_{q^m}(0) = 0\), we may assume \(\rho \in (0, 1-1/q]\). As noted in the proof of Proposition 96, \(H_q(\rho ) = \rho \cdot \frac{\log (q-1)}{\log q} + H(\rho ) \cdot \frac{1}{\log q}\) (all logs base 2). Hence

\[ H_q(\rho ) - H_{q^m}(\rho ) = \rho \left(\frac{\log (q-1)}{\log q} - \frac{\log (q^m-1)}{m \log q}\right) + H(\rho )\left(\frac{1}{\log q} - \frac{1}{m \log q}\right). \]

Multiplying through by \(\frac{1}{\rho } m \log q {\gt} 0\),

\begin{align*} \frac{m \log q}{\rho }\left(H_q(\rho ) - H_{q^m}(\rho )\right) & = \log (q-1)^m - \log (q^m - 1) + \frac{H(\rho )}{\rho }(m-1) \\ & \ge \log (q-1)^m - \log (q^m-1) + \frac{H(1-1/q)}{1-1/q}(m-1)\\ & = \log (q-1)^m - \log (q^m-1) + (m-1)\left(\log \frac{q}{q-1} + \frac{\log q}{q-1}\right) \\ & = \log \left(\frac{(q-1)^m}{q^m - 1}\cdot \left(\frac{q}{q-1}\right)^{m-1}\cdot q^{\frac{m-1}{q-1}}\right)\\ & = \log \left(\frac{(q-1)\cdot q^{m-1}\cdot q^{\frac{m-1}{q-1}}}{q^m - 1}\right)\\ & \ge 0, \end{align*}

where the inequality in the second line uses that \(H(\rho )/\rho \) is a decreasing function of \(\rho \) on \((0,1)\) together with \(\rho \le 1-1/q\) (indeed \(H(\rho )/\rho = \log (1/\rho ) - (1/\rho - 1)\log (1-\rho )\), and one checks both terms are decreasing in \(\rho \)), and the final inequality is exactly the hypothesis \(q^{m-1} \ge (1+1/(q-1))^{q-1}\), since \((q-1)\cdot q^{(m-1)/(q-1)} \ge q\) is equivalent to it after raising both sides to the power \(q-1\). This shows \(H_q(\rho ) - H_{q^m}(\rho ) \ge 0\), as desired.

Corollary 98 Entropy is monotone decreasing in the base, Cor 3.3.6

Let \(q \ge 3\) be an integer and \(0 \le \rho \le 1 - 1/q\). Then for any integer \(m \ge 2\),

\[ H_q(\rho ) \ge H_{q^m}(\rho ). \]

Proof ▶

Since \((1+1/x)^x \le e\) for \(x {\gt} 0\), the hypothesis of Lemma 97, \(q^{m-1} \ge (1+1/(q-1))^{q-1}\), is satisfied whenever \(m \ge 1 + \frac{1}{\ln q}\). One checks directly that this in turn holds for every integer \(m \ge 2\) once \(q \ge 3\). The claim then follows by applying Lemma 97.

Proposition 99 Entropy near its maximum, Prop 3.3.7

For small enough \(\varepsilon {\gt} 0\),

\[ H_q\left(1 - \frac{1}{q} - \varepsilon \right) \le 1 - c_q \varepsilon ^2, \]

where \(c_q\) is a constant that depends only on \(q\).

Proof ▶

Since \(H_q'(x) = 0\) at \(x = 1-1/q\), the intuition is that in the Taylor expansion of \(H_q(1-1/q-\varepsilon )\) around \(\varepsilon = 0\) the linear term vanishes; we make this precise assuming \(\varepsilon {\lt} 1/q\). Writing \(x = 1-1/q-\varepsilon \) (so \(1-x = 1/q+\varepsilon \)), by definition of \(H_q\),

\begin{align*} H_q(1-1/q-\varepsilon ) & = -\left(1-\frac1q-\varepsilon \right)\log _q\! \left(\frac{1-1/q-\varepsilon }{q-1}\right) - \left(\frac1q+\varepsilon \right)\log _q\! \left(\frac1q+\varepsilon \right)\\ & = -\log _q\! \left(\frac1q\left(1-\varepsilon q\right)\right) + \left(\frac1q+\varepsilon \right)\log _q\! \left(\frac{1-(\varepsilon q)/(q-1)}{1+\varepsilon q}\right)\\ & = 1 - \frac{1}{\ln q}\left[\ln \! \left(1-\frac{\varepsilon q}{q-1}\right) - \left(\frac1q+\varepsilon \right)\ln \! \left(\frac{1-(\varepsilon q)/(q-1)}{1+\varepsilon q}\right)\right]. \end{align*}

Expanding \(\ln (1+x) = x - x^2/2 + x^3/3 - \cdots \) for each logarithm above and collecting the \(\varepsilon ^3\) and smaller terms into \(o(\varepsilon ^2)\), one obtains after simplification

\[ H_q(1-1/q-\varepsilon ) = 1 - \frac{\varepsilon ^2 q^2}{2\ln q\, (q-1)} + o(\varepsilon ^2) \le 1 - \frac{\varepsilon ^2 q^2}{4 \ln q\, (q-1)} \]

for \(\varepsilon \) small enough. Taking \(c_q = \frac{q^2}{4\ln q\, (q-1)}\) completes the proof.

Proposition 100 Entropy near zero, Prop 3.3.8

For small enough \(\varepsilon {\gt} 0\),

\[ H_q(\varepsilon ) = \Theta \left(\frac{1}{\log q} \cdot \varepsilon \log \left(\frac{1}{\varepsilon }\right)\right). \]

Proof ▶

By definition,

\[ H_q(\varepsilon ) = \varepsilon \log _q(q-1) + \varepsilon \log _q(1/\varepsilon ) + (1-\varepsilon )\log _q\! \left(\frac{1}{1-\varepsilon }\right). \]

Since every term on the right is non-negative,

\[ H_q(\varepsilon ) \ge \varepsilon \log (1/\varepsilon )/\log q, \]

which gives the lower bound (the \(\Omega \) half of the claim). For the upper bound, one has \((1-\varepsilon )\log _q(1/(1-\varepsilon )) \le 2\varepsilon /\ln q\) for small enough \(\varepsilon \), so

\[ H_q(\varepsilon ) \le \frac{2+\ln (q-1)}{\ln q}\cdot \varepsilon + \frac{1}{\ln q}\cdot \varepsilon \ln (1/\varepsilon ). \]

Together, the two displayed bounds give \(H_q(\varepsilon ) = \Theta \! \left(\frac{1}{\log q}\cdot \varepsilon \log (1/\varepsilon )\right)\), as claimed.

Definition 101 Inverse \(q\)-ary Entropy Function

Since \(H_q(\cdot )\) restricted to the domain \([0, 1-1/q]\) is a bijective map onto \([0,1]\), we define \(H_q^{-1}(y) = x\) for the unique \(x \in [0, 1-1/q]\) such that \(H_q(x) = y\).

Lemma 102 Lower bound on the inverse entropy function, Lemma 3.3.9

For every \(0 {\lt} y \le 1 - 1/q\) and for every small enough \(\varepsilon {\gt} 0\),

\[ H_q^{-1}(y - \varepsilon ^2/c'_q) \ge H_q^{-1}(y) - \varepsilon , \]

where \(c'_q \ge 1\) is a constant that depends only on \(q\).

Proof ▶

It is straightforward to check that \(H_q^{-1}(y)\) is a strictly increasing, convex function of \(y\) on \([0,1]\). In particular its derivative is increasing, so \((H_q^{-1})'(1) \ge (H_q^{-1})'(y)\) for every \(y \in [0,1]\), and hence for every \(0 \le y \le 1\) and small enough \(\delta {\gt} 0\),

\[ \frac{H_q^{-1}(y) - H_q^{-1}(y-\delta )}{\delta } \le \frac{H_q^{-1}(1) - H_q^{-1}(1-\delta )}{\delta }. \]

By Proposition 99 (applied with \(\varepsilon = \sqrt{\delta /c_q}\), recalling \(H_q^{-1}(1) = 1-1/q\)), for small enough \(\delta \),

\[ H_q^{-1}(1-\delta ) \ge 1 - \frac1q - \sqrt{\delta /c_q} = H_q^{-1}(1) - \sqrt{\delta /c_q}. \]

Combining the two displays, and setting \(c'_q = \max (1, 1/c_q)\) and \(\delta = \varepsilon ^2/c'_q\), we get

\[ H_q^{-1}(y) - H_q^{-1}(y-\varepsilon ^2/c'_q) \le \sqrt{\delta /c_q} = \sqrt{\varepsilon ^2/(c'_q c_q)} \le \varepsilon , \]

since \(c'_q c_q \ge 1\). Rearranging gives \(H_q^{-1}(y - \varepsilon ^2/c'_q) \ge H_q^{-1}(y) - \varepsilon \), as desired.