Improved Batch Code Lower Bounds

3 Proof of the main theorem

Definition 13 Good simple tensor

Fix an integer \(t \ge 1\) and work in the standard tensor basis of \(\mathbb {F}^{N^{2t}}\). A simple tensor \(e_{i'_1} \otimes \cdots \otimes e_{i'_{2t}}\) is called good if the multiset \(\{ i'_1, \dots , i'_{2t}\} \subset [N]\) contains exactly \(t\) distinct elements, each appearing exactly twice.

Lemma 14 Batch codes are downward closed in \(k\)

If \(C\) is a \(k\)-batch code and \(k' \le k\), then \(C\) is also a \(k'\)-batch code.

Proof

Let \(\{ i_1, \dots , i_{k'}\} \subset [n]\) be a multiset of size \(k'\). Extend it to a multiset \(\{ i_1, \dots , i_{k'}, i_{k'+1}, \dots , i_k\} \) of size \(k\) by appending arbitrary indices (e.g. repeating \(i_1\)). Since \(C\) is a \(k\)-batch code, there exist mutually disjoint sets \(R_1, \dots , R_k\) and functions \(g_1, \dots , g_k\) with \(g_j(c|_{R_j}) = x_{i_j}\) for all \(j \in [k]\). Restricting to \(j \in [k']\), the sets \(R_1, \dots , R_{k'}\) are still mutually disjoint and satisfy \(g_j(c|_{R_j}) = x_{i_j}\), witnessing that \(C\) is a \(k'\)-batch code.

Lemma 15 Pairs of dual codewords meeting in a single index

Let \(C \le \mathbb {F}^N\) be a linear \(k\)-batch code with a systematic encoding and set \(t := \lfloor k/3 \rfloor \). For every \(t\)-tuple \((i_1, \dots , i_t) \in [n]^t\) there exist sets \(R_{j,\ell } \subseteq [N]\) (for \(j \in [t]\), \(\ell \in [3]\)) that are pairwise disjoint over all \((j,\ell )\), together with, for each \(j \in [t]\), two dual codewords \(c^{(j,1)}, c^{(j,2)} \in C^{\perp }\) such that

\[ \operatorname{supp}\bigl(c^{(j,\ell )}\bigr) \subseteq \{ i_j\} \cup R_{j,\ell } \quad (\ell = 1,2), \qquad \operatorname{supp}\bigl(c^{(j,1)}\bigr) \cap \operatorname{supp}\bigl(c^{(j,2)}\bigr) = \{ i_j\} . \]

We write \(\mathcal{D}_{i_1, \dots , i_t} := (c^{(j,\ell )})_{1 \le j \le t,\ \ell =1,2}\).

Proof

Since \(t = \lfloor k/3 \rfloor \) we have \(3t \le k\), so by Lemma 14 the code \(C\) is a \(3t\)-batch code. Consider the \(3t\)-multiset of information symbols \(\bigcup _{j=1}^{t} \{ i_j, i_j, i_j\} \) (three copies of each \(i_j\)). By the \(3t\)-batch property there exist mutually disjoint recovery sets \(R_{j,1}, R_{j,2}, R_{j,3}\) (\(j \in [t]\)) and linear functions \(g_{j,\ell }\) with \(g_{j,\ell }(c|_{R_{j,\ell }}) = x_{i_j} = c_{i_j}\) for all codewords \(c\) (the last equality by the systematic encoding). In particular all the \(R_{j,\ell }\) are pairwise disjoint.

Fix \(j \in [t]\). Since \(R_{j,1}, R_{j,2}, R_{j,3}\) are pairwise disjoint, the index \(i_j\) lies in at most one of them; hence at least two of the three sets, say \(R_{j,\ell }\) for \(\ell \) in a two-element set, do not contain \(i_j\). Relabel these two as \(R_{j,1}, R_{j,2}\). For each such \(\ell \), applying Lemma 7 with \(i = i_j\), \(R = R_{j,\ell }\), and \(g = g_{j,\ell }\) yields a nonzero dual codeword \(c^{(j,\ell )}\) with \(\operatorname{supp}(c^{(j,\ell )}) \subseteq R_{j,\ell } \cup \{ i_j\} \) and \(i_j \in \operatorname{supp}(c^{(j,\ell )})\). Because \(R_{j,1}\) and \(R_{j,2}\) are disjoint, the only index common to \(\operatorname{supp}(c^{(j,1)})\) and \(\operatorname{supp}(c^{(j,2)})\) is \(i_j\), so \(\operatorname{supp}(c^{(j,1)}) \cap \operatorname{supp}(c^{(j,2)}) = \{ i_j\} \).

Definition 16 Good map
#

A good map is a bijection \(\pi : [2t] \to [t] \times [2]\). We write \(\pi _1 : [2t] \to [t]\) for the first coordinate of \(\pi \) and \(\pi _2 : [2t] \to [2]\) for the second coordinate, so \(\pi (j) = (\pi _1(j), \pi _2(j))\).

Definition 17 The simple tensor \(e_{i_1,\dots ,i_t,\pi }\)

For a good map \(\pi \) and indices \(i_1, \dots , i_t\), define the simple tensor

\[ e_{i_1, \dots , i_t, \pi } := \bigotimes _{j=1}^{2t} e_{i_{\pi _1(j)}} \in \mathbb {F}^{N^{2t}} . \]

Note that this definition does not depend on \(\pi _2\).

Lemma 18 \(e_{i_1,\dots ,i_t,\pi }\) is good

For every good map \(\pi \) and indices \(i_1, \dots , i_t\), the simple tensor \(e_{i_1, \dots , i_t, \pi }\) is a good simple tensor.

Proof

Since \(\pi \) is a bijection onto \([t] \times [2]\), for each value \(a \in [t]\) there are exactly two indices \(j \in [2t]\) with \(\pi _1(j) = a\) (namely the preimages of \((a,1)\) and \((a,2)\)). Hence in the product \(\bigotimes _{j=1}^{2t} e_{i_{\pi _1(j)}}\) each \(e_{i_a}\) occurs exactly twice. Assuming \(i_1, \dots , i_t\) are distinct, the multiset of indices appearing is \(\{ i_1, i_1, \dots , i_t, i_t\} \): exactly \(t\) distinct values, each twice. By Definition 13 the tensor is good.

Definition 19 The tensor \(w_{i_1,\dots ,i_t,\pi }\)

Let \(\mathcal{D}_{i_1, \dots , i_t} = (c^{(j,\ell )})_{1 \le j \le t,\ \ell =1,2}\) be the dual codewords of Lemma 15. For a good map \(\pi \) and indices \(i_1, \dots , i_t\), define the tensor

\[ w_{i_1, \dots , i_t, \pi } := \bigotimes _{j=1}^{2t} c^{(\pi _1(j),\, \pi _2(j))} . \]

Since each \(c^{(\pi _1(j),\pi _2(j))} \in V = C^{\perp }\), we have \(w_{i_1, \dots , i_t, \pi } \in W = V^{\otimes 2t}\).

Lemma 20 Claim 2.1: few good simple tensors

Each \(w_{i_1, \dots , i_t, \pi }\), written in the standard basis of \(\mathbb {F}^{N^{2t}}\), contains at most \(3^t\) good simple tensors.

Proof

Recall from Lemma 15 that \(\operatorname{supp}(c^{(j,\ell )}) \subseteq \{ i_j\} \cup R_{j,\ell }\) and that the sets \(R_{j,\ell }\) are pairwise disjoint. Consequently:

  • each of \(i_1, \dots , i_t\) lies in at most three of the supports \(\operatorname{supp}(c^{(j,\ell )})\): an index \(i_j\) lies in \(\operatorname{supp}(c^{(j,1)})\) and \(\operatorname{supp}(c^{(j,2)})\), and otherwise can lie in \(\operatorname{supp}(c^{(j',\ell )})\) only when \(i_j \in R_{j',\ell }\), which holds for at most one pair \((j',\ell )\) by pairwise disjointness of the \(R_{j',\ell }\);

  • any index \(i \in [N] \setminus \{ i_1, \dots , i_t\} \) lies in at most one set \(R_{j,\ell }\), hence in at most one support \(\operatorname{supp}(c^{(j,\ell )})\).

In the standard basis representation of \(w_{i_1, \dots , i_t, \pi } = \bigotimes _{j=1}^{2t} c^{(\pi _1(j),\pi _2(j))}\), a simple tensor \(e_{i'_1} \otimes \cdots \otimes e_{i'_{2t}}\) can occur only if, for each position \(j\), the index \(i'_j\) lies in \(\operatorname{supp}(c^{(\pi _1(j),\pi _2(j))})\). By the two bullet points, across the \(2t\) positions each of \(e_{i_1}, \dots , e_{i_t}\) can appear at most three times and any other \(e_i\) at most once.

Now suppose \(e_{i'_1} \otimes \cdots \otimes e_{i'_{2t}}\) is moreover a good simple tensor: then every value in its index multiset appears exactly twice. A value other than \(i_1, \dots , i_t\) appears at most once, so it cannot appear (exactly twice is impossible); hence the only values are \(i_1, \dots , i_t\), and each \(e_{i_j}\) appears exactly twice. For a fixed \(j\), the positions \(p \in [2t]\) at which \(e_{i_j}\) can occur are: the two positions \(p\) with \(\pi _1(p) = j\) (where \(c^{(\pi _1(p),\pi _2(p))} = c^{(j,\cdot )}\) has \(i_j\) in its support), plus at most one position \(p\) with \(R_{\pi _1(p),\pi _2(p)} \ni i_j\). So there are at most three admissible positions, and we must choose which two of them carry \(e_{i_j}\): at most \(\binom {3}{2} = 3\) choices. Independently over \(j \in [t]\), the number of good simple tensors is therefore at most \(3^t\).

Lemma 21 Claim 2.2: \(w_{i_1,\dots ,i_t,\pi }\) contains \(e_{i_1,\dots ,i_t,\pi }\)

Each \(w_{i_1, \dots , i_t, \pi }\), written in the standard basis of \(\mathbb {F}^{N^{2t}}\), contains the simple tensor \(e_{i_1, \dots , i_t, \pi }\).

Proof

By Lemma 15, for each \(j \in [2t]\) the dual codeword \(c^{(\pi _1(j),\pi _2(j))}\) has \(i_{\pi _1(j)}\) in its support, i.e. a nonzero coordinate at index \(i_{\pi _1(j)}\). By the definition of the tensor product (Definition 8), the coordinate of \(w_{i_1, \dots , i_t, \pi } = \bigotimes _{j=1}^{2t} c^{(\pi _1(j),\pi _2(j))}\) at the tuple \((i_{\pi _1(1)}, \dots , i_{\pi _1(2t)})\) equals \(\prod _{j=1}^{2t} c^{(\pi _1(j),\pi _2(j))}_{i_{\pi _1(j)}} \neq 0\). Hence the standard basis representation of \(w_{i_1, \dots , i_t, \pi }\) contains \(\bigotimes _{j=1}^{2t} e_{i_{\pi _1(j)}} = e_{i_1, \dots , i_t, \pi }\).

Lemma 22 Size of the set of good simple tensors

Let \(E\) be the set of all simple tensors \(e_{i_1, \dots , i_t, \pi }\) with \(i_1 {\gt} \cdots {\gt} i_t\) (from \([n]\)) and \(\pi \) a good map. Then

\[ |E| = \binom {n}{t} \cdot \binom {2t}{2, 2, \dots , 2} = \binom {n}{t} \cdot \frac{(2t)!}{2^t}, \]

where the multinomial coefficient has \(t\) twos.

Proof

By Definition 17, \(e_{i_1, \dots , i_t, \pi }\) depends only on \((i_1, \dots , i_t)\) and on \(\pi _1\) (not on \(\pi _2\)). The strictly decreasing tuples \(i_1 {\gt} \cdots {\gt} i_t\) from \([n]\) are in bijection with \(t\)-subsets of \([n]\), giving \(\binom {n}{t}\) choices. Given the tuple, the data \(e_{i_1, \dots , i_t, \pi }\) is determined by \(\pi _1 : [2t] \to [t]\), the first coordinate of a good map; since \(\pi \) is a bijection onto \([t] \times [2]\), the map \(\pi _1\) takes each value in \([t]\) exactly twice. The number of such maps \(\pi _1\) is the number of ways to partition the \(2t\) positions into \(t\) ordered groups of size \(2\), namely \(\binom {2t}{2,2,\dots ,2} = (2t)!/2^t\). Distinct \(\pi _1\) give distinct simple tensors (the assignment of indices to positions differs), so \(|E| = \binom {n}{t} \cdot (2t)!/2^t\).

Proof of Theorem 3

Let \(C \le \mathbb {F}^N\) be a linear \(k\)-batch code with systematic encoding, and set \(t := \lfloor k/3 \rfloor \). Define \(V := C^{\perp }\) and \(W := V^{\otimes 2t}\). By Lemma 6 the redundancy of \(C\) equals \(\dim V\), so it suffices to lower bound \(\dim V\).

For every strictly decreasing tuple \(i_1 {\gt} \cdots {\gt} i_t\) from \([n]\), Lemma 15 furnishes the dual codewords \(\mathcal{D}_{i_1, \dots , i_t}\), and for each good map \(\pi \) (Definition 16) we obtain the simple tensor \(e_{i_1, \dots , i_t, \pi }\) (Definition 17), which is good by Lemma 18, and the tensor \(w_{i_1, \dots , i_t, \pi } \in W\) (Definition 19).

Let \(E\) be the set of good simple tensors \(e_{i_1, \dots , i_t, \pi }\) with \(i_1 {\gt} \cdots {\gt} i_t\) and \(\pi \) a good map. We greedily build a sequence \((e^{(1)}, w^{(1)}), (e^{(2)}, w^{(2)}), \dots \in E \times W\): given \((e^{(1)}, w^{(1)}), \dots , (e^{(r)}, w^{(r)})\), choose a good simple tensor \(e^{(r+1)} = e_{i_1, \dots , i_t, \pi } \in E\) that does not appear in the standard basis representation of any of \(w^{(1)}, \dots , w^{(r)}\), and set \(w^{(r+1)} := w_{i_1, \dots , i_t, \pi }\); by Lemma 21 the representation of \(w^{(r+1)}\) contains \(e^{(r+1)}\). Each \(w^{(r)}\) contains at most \(3^t\) good simple tensors by Lemma 20, so after \(r\) steps at most \(r \cdot 3^t\) elements of \(E\) are excluded; hence the greedy choice succeeds for at least

\[ D := \lceil |E| / 3^t \rceil \]

steps. By construction \(w^{(r)}\) contains \(e^{(r)}\) (Lemma 21) and, for \(r' {\gt} r\), \(e^{(r')}\) was chosen to avoid \(w^{(r)}\), so \(w^{(r)}\) contains none of \(e^{(r+1)}, \dots , e^{(D)}\). Thus the hypotheses of Lemma 10 hold (with \(s = 2t\)), and \(w^{(1)}, \dots , w^{(D)}\) are linearly independent in \(W\).

Therefore, using Lemma 12 and Lemma 22,

\[ (\dim V)^{2t} = \dim W \ge D \ge \frac{|E|}{3^t} = \binom {n}{t} \cdot \frac{(2t)!}{2^t} \cdot \frac{1}{3^t} \ge \Bigl(\frac{n}{t}\Bigr)^t \cdot \frac{(2t)^{2t}/e^{2t}}{2^t \cdot 3^t} = \frac{n^t t^t}{(3e^2/2)^t}, \]

where we used \(\binom {n}{t} \ge (n/t)^t\) and \((2t)! \ge (2t/e)^{2t} = (2t)^{2t}/e^{2t}\), and simplified \((2t)^{2t} = 4^t t^{2t}\) so that \(4^t/(2^t 3^t e^{2t}) = (2/3)^t/e^{2t} = (3e^2/2)^{-t}\). Taking \(2t\)-th roots,

\[ \dim V \ge \Bigl(\frac{n^t t^t}{(3e^2/2)^t}\Bigr)^{1/(2t)} = \sqrt{\frac{n t}{3e^2/2}} = \Omega (\sqrt{nt}) = \Omega (\sqrt{nk}), \]

since \(t = \lfloor k/3 \rfloor = \Theta (k)\).

Finally we convert \(\Omega (\sqrt{nk})\) into \(\Omega (\sqrt{Nk})\). If \(n = \Omega (N)\) then \(\sqrt{nk} = \Omega (\sqrt{Nk})\) and we are done. Otherwise \(n {\lt} N/2\) (for large \(N\)), so the redundancy is \(N - n {\gt} N/2\); since \(k \le N\) gives \(\sqrt{Nk} \le N\), we again have redundancy \(\ge N/2 = \Omega (\sqrt{Nk})\). In all cases the redundancy is \(\Omega (\sqrt{Nk})\), completing the proof.