22 Finding Defectives: Group Testing
Given
\(\)N\( individuals indexed by \)[N]\(, a \emph{query} (or \emph{test}) \)S\( is a subset \)S ⊆[N]\(. If \)x = (x_1,…,x_N) {0,1}^N\( is the (unknown) indicator vector of the individuals possessing a given biomarker (so \)x_i = 1\( iff individual \)i\( is defective), the answer to the query \)S\( is \[ A(S) = \begin{cases} 1 & \text{if } \sum _{i \in S} x_i \ge 1, \\ 0 & \text{otherwise.} \end{cases} \] Equivalently, \)A(S) = _i S x_i\( is the logical-OR of the bits of \)x\( indexed by \)S\(. The \emph{group testing problem} is: given an upper bound \)d\( on the number of defectives (i.e.\ \)wt(x) ≤d\(), compute \)x\( while minimizing the number of tests/queries used. \)
A group testing scheme is adaptive if each test may be chosen based on the outcomes of all previously performed tests. A group testing scheme is non-adaptive if the entire collection of tests is fixed in advance, before any test outcome is observed.
Given a subset of \(\)N\( items with \)d\( defects represented as \)x {0,1}^N\( with \)wt(x) ≤d\(, the minimum number of \emph{non-adaptive} tests that one would have to make (over all valid non-adaptive schemes that correctly determine \)x\() is defined as \)t(d,N)\(. \)
Given a set of \(\)N\( items with \)d\( defects, \)t^a(d,N)\( is defined as the minimum number of \emph{adaptive} tests that one would have to make to detect all the defective items. \)
For every
\(\)1 ≤d ≤N\(, we have \[ 1 \le t^a(d,N) \le t(d,N) \le N. \] \)
The inequality
\(\)t(d,N) ≤N\( follows from the naive scheme of testing all individuals with singleton tests \){1},…,{N}\( (which is trivially non-adaptive). The inequality \)1 ≤t^a(d,N)\( is trivial (if \)N ≥1\( and \)d ≥1\( at least one test is needed to distinguish the all-zero input from a weight-one input). Finally, \)t^a(d,N) ≤t(d,N)\( holds because any non-adaptive test schedule can be simulated by an adaptive scheme that simply runs all of the non-adaptive tests as its first round of tests and ignores the intermediate outcomes when choosing later tests; adaptive schemes can only do at least as well since later tests may be chosen using information revealed by earlier ones. \)
For
\(\)S ⊆[N]\(, let \)χ_S {0,1}^N\( denote its indicator vector, i.e.\ \)χ_S(i) = 1\( iff \)i S\(. A non-adaptive group testing scheme with \)t\( tests \)S_1,…,S_t\( is represented by the \)t ×N\( \emph{testing matrix} \)M = (M_ij)_i [t], j [N]\( whose \)i\(-th row is \)χ_S_i\(, i.e.\ \)M_ij = 1\( if \)j S_i\( and \)M_ij=0\( otherwise. (Testing each individual as a singleton set corresponds to \)M\( being the \)N ×N\( identity matrix.) Interpreting multiplication as logical AND (\)∧\() and addition as logical OR (\)∨\(), we write \)M ⊙x\( for the vector \)r {0,1}^t\( of test outcomes, i.e.\ \)r = M ⊙x\( where \)r_i = 1\( iff row \)i\( of \)M\( and \)x\( share a common \)1\(-position. For a column index \)j [N]\( we also write \)M^j {0,1}^t\( for the \)j\(-th column of \)M\(, thought of interchangeably as a subset of \)[t]\( (namely, the set of rows in which it has a \)1\(). The goal of non-adaptive group testing is to design \)M\( with as few rows \)t\( as possible so that \)x\( can be recovered from \)M ⊙x\(. \)
For every
\(\)1 ≤d ≤N\(, \[ t^a(d,N) \ge d \log \frac{N}{d}. \] \)
Fix any valid adaptive group testing scheme with
\(\)t\( tests. Observe that if \)x ≠y {0,1}^N\( with \)wt(x), wt(y) ≤d\(, then \)r(x) ≠r(y)\(, where \)r(x)\( denotes the vector of test results obtained when running the (adaptive) scheme on input \)x\(, and similarly for \)r(y)\(: two valid inputs cannot produce the same result vector, since otherwise the scheme could not distinguish \)x\( from \)y\(. This observation implies that the number of distinct test-outcome vectors is at least the number of distinct binary vectors of Hamming weight at most \)d\( in \){0,1}^N\(, i.e.\ \)Vol_2(d,N)\( (the volume of a Hamming ball of radius \)d\(, Definition~ \ref{def:c01-hamming-ball}). On the other hand, the number of possible distinct \)t\(-bit outcome vectors is at most \)2^t\(. Combining the two facts, \[ 2^t \ge \mathrm{Vol}_2(d,N), \] and hence \[ t \ge \log \mathrm{Vol}_2(d,N). \] Recall that \[ \mathrm{Vol}_2(d,N) \ge \binom {N}{d} \ge \left(\frac{N}{d}\right)^d, \] where the first inequality is the definition of the volume of a Hamming ball as a sum of binomial coefficients (of which \)Nd\(, the term for weight exactly \)d\(, is one summand) and the second is the standard bound \)Nd ≥(N/d)^d\(. Combining these, \)t ≥d (N/d)\(, as desired, and since this holds for every valid adaptive scheme with \)t\( tests, \)t^a(d,N) ≥d(N/d)\(. \)
We prove the upper bound by exhibiting a matrix that solves non-adaptive group testing for
\(\)d=1\(. Let the group testing matrix \)M\( be the parity check matrix for the \)[2^m-1,2^m-m-1,3]_2\( Hamming code (Definition~ \ref{def:c01-hamming-code}), i.e.\ \)H_m\(, where the \)i\(-th column is the binary representation of \)i\(. This works because if \)x\( has \)wt(x) ≤1\( and \)x = e_i\( for some \)i [N]\(, then \)M ⊙x = M^i\(, the \)i\(-th column of \)M\(, and by construction \)M^i\( is exactly the binary representation of \)i\(; hence \)i\( (and thus \)x\() can be read off directly from \)M ⊙x\(. If instead \)wt(x) = 0\(, then \)M ⊙x = 0\(, which is exactly \)x\( and is not equal to the binary representation of any \)i [N]\( (since \)i ≥1\(). Hence \)M ⊙x\( uniquely identifies \)x\( whenever \)wt(x) ≤1\(. If \)N ≠2^m - 1\( for any \)m\(, take the matrix \)H_m\( for the (unique) \)m\( with \)2^m-1 - 1 < N < 2^m - 1\( and pad \)x\( with \)0\(s at the end to length \)2^m-1\( before applying the above scheme; decoding remains trivial since a binary representation is directly read off. In either case the number of tests (rows of \)M\() is at most \)m = (N+1) + 1\(, which completes the proof. \)
A \(\)t ×N\( matrix \)M\( is \)\(\)d\(-separable\) if and only if for every \(\)S_1 ≠S_2 ⊆[N]\( such that \)|S_1|,|S_2| ≤d\(, we have \[ \bigcup _{j \in S_1} M^j \ne \bigcup _{i \in S_2} M^i. \] \)
Input: Result vector
\(\)r\( and a \)d\(-separable matrix \)M\(. \textbf{Output:} \)x\( if \)r = M ⊙x\(, else \texttt{Fail}. \begin{enumerate} \item \end{enumerate} \)R ←{i ∣r_i = 1}\(. \item For every \)T ⊆[N]\( with \)|T| ≤d\(: \begin{enumerate} \item \end{enumerate}\)S_T ←_i T M^i\(. \item If \)R = S_T\(: set \)x {0,1}^N\( with \)x_i = 1\( iff \)i T\(, and return \)x\(. \)
Return Fail.
Correctness follows directly from Definition 468: the set \(\)T\( found (if any) satisfies \)_i T M^i = R = _i supp(x) M^i\(, and \)d\(-separability guarantees that no other set of size \)≤d\( has the same union, so \)T = supp (x)\(. The algorithm runs in time \)N^Θ(d)\(, since it enumerates all subsets \)T ⊆[N]\( of size at most \)d\(. \)
A \(\)t ×N\( matrix \)M\( is \)\(\)d\(-disjunct\) if and only if for every \(\)S ⊂[N]\( with \)|S| ≤d\( and for every \)j S\(, there exists \)i [t]\( such that \)M_ij = 1\( but \)M_ik = 0\( for all \)k S\(. Or, equivalently, \[ M^j \not\subseteq \bigcup _{k \in S} M^k. \] \)
Any
\(\)d\(-disjunct matrix is also \)d\(-separable. \)
For contradiction, assume
\(\)M\( is \)d\(-disjunct but not \)d\(-separable. Since \)M\( is not \)d\(-separable, there exist two distinct subsets \)S ≠T ⊂[N]\( of size at most \)d\( each such that \[ \bigcup _{k \in S} M^k = \bigcup _{k \in T} M^k. \] Since \)S ≠T\(, there exists \)j T ∖S\(. But then \[ M^j \subseteq \bigcup _{k \in T} M^k = \bigcup _{k \in S} M^k, \] where the equality is the displayed equality above. However, since by assumption \)j S\(, this directly contradicts the fact that \)M\( is \)d\(-disjunct (Definition~ \ref{def:c22-disjunct}). This contradiction completes the proof. \)
Input: Result vector
\(\)r\( and a \)d\(-disjunct matrix \)M\(. \textbf{Output:} \)x\( if \)M ⊙x = r\(, else \texttt{Fail}. \begin{enumerate} \item Initialize \end{enumerate} \)x {0,1}^N\( to be the all-ones vector. \item For every \)i [t]\(: if \)r_i = 0\(, then for every \)j [N]\( with \)M_ij = 1\(, set \)x_j ←0\(. \item If \)M ⊙x = r\(, return \)x\(; else return \texttt{Fail}. \)
There exists an
\(\)O(tN)\(-time decoding algorithm for any \)t ×N\( matrix that is \)d\(-disjunct. \)
The proof follows from two observations, which together justify Algorithm 472 (Algorithm 22.3.2).
First, say we have a matrix
\(\)M\(, a vector \)x\(, and \)r = M ⊙x\( such that \)r_i = 1\(. Then there exists a column \)j\( that made it possible, i.e.\ if \)r_i = 1\( then there is a \)j\( such that \)M_ij = 1\( and \)x_j = 1\(. Second, let \)T = {ℓ∣x_ℓ= 1}\( with \)|T| ≤d\(, and let \)j\( be a column not in \)T\(. Consider the row \)i\( such that \)T\( has all zeros in the \)i\(-th row and \)M_ij=1\(; such a row exists by \)d\(-disjunctness (Definition~ \ref{def:c22-disjunct}, applied with \)S=T\(), and for this row \)r_i = 0\(. Conversely, if \)r_i = 0\(, then for every \)j [N]\( with \)M_ij=1\(, it must be the case that \)x_j = 0\( (else, by the first observation applied to row \)i\(, we would need \)r_i=1\(). These two observations show that Algorithm~ \ref{alg:c22-disjunct-decoder} correctly sets to \)0\( exactly the coordinates \)x_j\( with \)j T\(, and hence correctly recovers \)x\(. The algorithm runs in time \)O(tN)\(: Step 2 examines each of the \)t\( rows once, and for each row with \)r_i=0\( scans all \)N\( columns. \)
Let
\(\)1 ≤d ≤N\( be an integer and \)M\( be a \)t ×N\( matrix such that \begin{enumerate} \item[(i)]for every \end{enumerate}\)j [N]\(, the \)j\(-th column has at least \)w_\( ones in it, i.e.\ \)|M^j| ≥w_\(; and \item [(ii)] for every \)i ≠j [N]\(, the \)i\(-th and \)j\(-th columns have at most \)a_\( ones in common, i.e.\ \)|M^i ∩M^j| ≤a_\(, \)
for some integers \(\)a_ ≤w_ ≤t\(. Then \)M\( is a \) \(-disjunct matrix. \)
For notational convenience, define
\(\)d = \(. Fix an arbitrary \)S ⊂[N]\( with \)|S| ≤d\( and a \)j S\(. We must show that \[ M^j \not\subseteq \bigcup _{i \in S} M^i, \] or equivalently \[ M^j \not\subseteq \bigcup _{i \in S} \left(M^i \cap M^j\right). \] We prove this by showing that \[ \left| M^j \setminus \bigcup _{i \in S} \left(M^i \cap M^j\right) \right| {\gt} 0. \] Indeed, consider the following sequence of relations: \begin{align*} \left| M^j \setminus \bigcup _{i \in S} \left(M^i \cap M^j\right)\right| & = |M^j| - \left|\bigcup _{i \in S}\left(M^i \cap M^j\right)\right| \\ & \ge |M^j| - \sum _{i \in S} \left|\left(M^i \cap M^j\right)\right| \\ & \ge w_{\min } - |S| \cdot a_{\max } \\ & \ge w_{\min } - d \cdot a_{\max } \\ & \ge w_{\min } - \frac{w_{\min }-1}{a_{\max }}\cdot a_{\max } \\ & = 1. \end{align*} Here the first equality holds because \)M^j\( is the disjoint union of the set on the right and the set being subtracted. The first inequality follows from the fact that the size of a union of sets is at most the sum of their sizes. The second inequality follows from the definitions of \)w_\( and \)a_\( (hypotheses (i) and (ii)). The third follows from \)|S| ≤d\(. The fourth follows from the definition of \)d\(. This shows the displayed set has size at least \)1\(, so it is nonempty, which completes the proof. \)
Let
\(\)C ⊆[q]^t\( be a code with codewords \)C = {c_1, …,c_N}\(. The matrix \)M_C\( associated to \)C\( is the \)t ×N\( matrix whose \)i\(-th column is \)c_i\(, i.e. \[ M_C = \begin{bmatrix} \uparrow & \uparrow & & \uparrow \\ \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_N \\ \downarrow & \downarrow & & \downarrow \end{bmatrix}. \] \)
\(\) \(-disjunct matrix it suffices to design a binary code \)C^* ⊆{0,1}^t\( such that (i) for every \)c C^*\(, \)wt(c) ≥w_\(, and (ii) for every \)c^1 ≠c^2 C^*\(, \)|{i ∣c^1_i = c^2_i = 1}| ≤a_\(, and then take \)M = M_C^*\( (Definition~ \ref{def:c22-code-to-matrix}). \)
22.4.1 Kautz–Singleton Construction
Let
\(\)C^* = C_out ○C_in\( be a concatenated code (in the sense of code concatenation, Ch.\ 10), where \)C_out\( is a \)[q,k,q-k+1]_q\( Reed--Solomon code (Definition~ \ref{def:c05-reed-solomon}) and the inner code \)C_in : F_q {0,1}^q\( is defined by \)C_in(i) = e_i\( for every \)i F_q\( (identified with \)[q]\(). Note that \)M_C_in\( is the \)q ×q\( identity matrix, and that \)N = q^k\( and \)t = q^2\(. The group testing matrix produced by this construction is \)M_C^*\( (Definition~ \ref{def:c22-code-to-matrix}). \)
For every integer
\(\)d ≥1\( and large enough \)N ≥d\(, there exists a \)t ×N\( matrix that is \)d\(-disjunct where \[ t = O\! \left(d^2 (\log _d N)^2\right). \] \)
We instantiate the Kautz–Singleton construction (Construction 477) and analyze the resulting matrix
\(\)M = M_C^*\( via Lemma~ \ref{lem:c22-sufficient-disjunct}. Recall \)N = q^k\( and \)t = q^2\(. Every column of \)M\( has exactly \)q\( ones in it (since the inner encoding of any symbol is a weight-one vector \)e_i {0,1}^q\(, and every codeword of \)C_out\( has \)q\( (i.e.\ all) coordinates over \)F_q\( each contributing one block of length \)q\(). In other words, \)w_ = q\(. Next we estimate \)a_\(. Divide the \)t = q^2\( rows into \)q\(-sized chunks, and index them by pairs \)(i,j) [q] ×[q]\(. For \)(i,j) [q]×[q]\( and a column index \)ℓ[N]\( (corresponding to a codeword \)c_ℓC^*\(, which in turn corresponds to a message \)m F_q^k\(), we have \)M_(i,j),ℓ = 1\( if and only if \)c_ℓ(j) = i\( (i.e.\ the \)j\(-th coordinate of \)C_out (m)\( equals \)i\(, under a fixed bijection \)F_q ↔[q]\(). Hence, the number of rows in which columns \)ℓ_1\( and \)ℓ_2\( both have a \)1\( is exactly the number of coordinate positions \)j\( where \)c_ℓ_1(j) = c_ℓ_2(j)\(, which equals \)q - Δ(c_ℓ_1,c_ℓ_2)\(. Since \)C_out\( is a \)[q,k,q-k+1]_q\( code (Definition~ \ref{def:c05-reed-solomon}), any two distinct codewords of \)C_out\( agree in at most \)k-1\( positions. Hence any two distinct columns of \)M\( agree (have a common \)1\() in at most \)k-1\( rows, i.e.\ \)a_ = k-1\(. By Lemma~ \ref{lem:c22-sufficient-disjunct}, \)M\( is \)d\(-disjunct for \[ d = \left\lfloor \frac{q-1}{k-1} \right\rfloor . \] We now pick \)q\( and \)k\( (in terms of \)d\() so that this holds. Note that \)d = Θ((q-1)/(k-1))\( gives \)q = O(kd)\(. Since \)N = q^k\(, we have \)k = _q N\(, so \)q = O(d _q N)\(, i.e.\ \)q q = O(d N)\(, which gives \[ q = O(d \log _d N). \] Recalling that \)t = q^2\( completes the proof. \)
Let
\(\)k=1\( and \)q=3\(. By the choice of the \)[3,1]_3\( Reed--Solomon code, \)C_ out = {(0,0,0),(1,1,1),(2,2,2)}\(, so \[ M_{C_{\mathrm{out}}} = \begin{bmatrix} 0 & 1 & 2 \\ 0 & 1 & 2 \\ 0 & 1 & 2 \end{bmatrix}. \] Concatenating each entry with the inner code \)C_in(i) = e_i {0,1}^3\( produces the \)9 ×3\( matrix \)M_C^*\(, where every group of three consecutive rows of \)M_C^*\( corresponds to one row of \)M_C_out\(. \)
A data stream algorithm is an algorithm satisfying the following four requirements.
The algorithm makes one sequential pass over the input.
The algorithm uses poly-log space (in particular, it cannot store the entire input).
The algorithm has poly-log update time, i.e. whenever a new item appears, the algorithm updates its internal state in poly-log time.
The algorithm has poly-log reporting time, i.e. at any point of time the algorithm can return the required answers in poly-log time.
Given a stream of pairs
\(\)(i_1,u_1),…,(i_m,u_m)\( with \)i_ℓ[N]\( and \)u_ℓZ\(, let \)f_ℓ\( denote the total count for item (stock) id \)ℓ\(. Initially \)f_ℓ= 0\(; upon seeing a pair \)(ℓ,u_ℓ)\(, update \)f_ℓ←f_ℓ+ u_ℓ\(. \)
Given
\(\)N\( different items, and \)m\( input pairs of data \)(i_ℓ,u_ℓ)\( for \)1 ≤ℓ≤m\(, where \)i_ℓ[N]\( indicates the item index and \)u_ℓ\( indicates the corresponding count, the \emph{hot items problem} requires updating the count \)f_ℓ\( (\)1 ≤ℓ≤N\() for each item, and outputting all item indices \)j\( such that \[ f_j {\gt} \frac{\sum _{\ell =1}^{N} u_\ell }{d}. \] Any such item is called a \emph{hot item}. (The hot items problem is also called the \emph{heavy hitters problem}.) \)
Computing hot items exactly by a deterministic one-pass algorithm needs
\(\)Ω(N)\( space (even allowing exponential time). \)
22.5.1 Connection to Group Testing
We assume throughout this subsection the heavy-tail distribution assumption on the input data, namely
Output: initialize the counters.
\(\)m ←0\(. \item For every \)j [t]\(: \)C_j ←0\(. \)
Input: pair
\(\)(i,u)\(, \)i [N]\(, \)u Z\(. \textbf{Output:} updates the counters. \begin{enumerate} \item \end{enumerate} \)m ←m+1\(. \item For every \)j [t]\(: if \)M_ij = 1\(, then \)C_j ←C_j + u\(. \)
Here \(\)M\( is a fixed \)t ×N\( matrix and \)C_1,…,C_t\( are counters, \)C_i\( maintaining the total count of all items present in the \)i\(-th row of \)M\(. \)
If
\(\)j\( is a hot item and \)M_ij=1\(, then \)C_i > \(. \)
Let
\(\)i [t]\( be such that \)M_ij=1\(. Then at any point of time, \[ C_i = \sum _{k:\, M_{ik}=1} f_k \ge f_j, \] where the equality follows by an easy induction on Algorithm~ \ref{alg:c22-hot-items-update} (each update to \)f_k\( for \)M_ik=1\( is mirrored by an equal update to \)C_i\(). Since \)j\( is a hot item, we have \)f_j > m/d\( (Definition~ \ref{def:c22-hot-items}, applied with \)∑_ℓu_ℓ= m\(), which completes the proof. \)
For any
\(\)1 ≤i ≤t\(, if all \)j\( with \)M_ij=1\( are not hot items, then \)C_i ≤\(. \)
Fix
\(\)i [t]\( such that every \)j [N]\( with \)M_ij=1\( is not a hot item. Then by the same argument as in the proof of Lemma~ \ref{lem:c22-hot-item-obs1}, \[ C_i = \sum _{k:\, M_{ik}=1} f_k. \] Since every \)k\( with \)M_ik=1\( is not hot, this sum is a sub-sum of \)∑_ℓ: not hot f_ℓ≤m/d\( (the heavy-tail assumption), so \)C_i ≤m/d\(. \)
Input: counters
\(\)m\( and \)C_1,…,C_t\(. \textbf{Output:} the heavy (hot) items. \begin{enumerate} \item For every \end{enumerate} \)j [t]\(: set \)r_j ←1\( if \)C_j > m/d\(, else \)r_j ←0\(. \item Let \)x\( be the result of decoding (for group testing) \)r = (r_1,…,r_t)\( against \)M\(. \item Return \){i ∣x_i = 1}\(. \)
Define \(\)x = (x_1,…,x_N) {0,1}^N\( with \)x_j = 1\( iff \)j\( is a hot item. By Lemmas~ \ref{lem:c22-hot-item-obs1} and~ \ref{lem:c22-hot-item-obs2}, \)r_i = 1\( if and only if \)C_i > m/d\(, which happens if and only if \)_j: M_ij=1 x_j = 1\(; that is, \)r = M ⊙x\(. Since \)wt(x) < d\( (there are at most \)d\( hot items), reporting the hot items reduces exactly to decoding the group testing instance \)(M,r)\(. \)
There exists a data streaming algorithm that computes the
\(\)d\( hot items with one pass, \)O(t N)\( space for \)t = O(d^2 _d^2 N)\(, \)O(t polylog t)\( update time, and \)poly(t)\( reporting time. \)
We take
\(\)M\( to be the \)d\(-disjunct matrix of Theorem~ \ref{thm:c22-kautz-singleton} (Kautz--Singleton construction), with \)t = O(d^2 _d^2 N)\(, and use Algorithms~ \ref{alg:c22-hot-items-init},~ \ref{alg:c22-hot-items-update}, and~ \ref{alg:c22-report-heavy-items} to initialize, update, and report, respectively. We check the four requirements of Definition~ \ref{def:c22-data-stream-algorithm}. \emph{One-pass.} Using non-adaptive group testing, the algorithm above can be implemented in one pass: Algorithm~ \ref{alg:c22-hot-items-update} is invoked once per input pair, and the choice of \)M\( (which is fixed in advance) satisfies the non-adaptivity requirement. \emph{Poly-log space.} We must maintain the counters \)C_1,…,C_t\( and \)m\(. Each counter is at most \)mN\( in absolute value, so it can be represented in \)O(N + m)\( bits, giving a total of \)O((N + m) t)\( bits, i.e.\ \)O(t N)\( space (treating \)m = O(N)\(, e.g.\ if \)m = poly(N)\(). Since \)t = O(d^2 _d^2 N)\(, this is poly-logarithmic in \)N\( provided \)d = poly N\(. We do not store the matrix \)M\( itself (which would need \)Ω(tN)\( space); instead \)M\(’s entries are computed on the fly (see next item). \emph{Poly-log update time.} Recall \)M = M_C^*\( with \)C^* = C_ out ○C_in\(, \)C_out\( a \)[q,k,q-k+1]_q\( Reed--Solomon code (Definition~ \ref{def:c05-reed-solomon}) and \)M_C_in\( the \)q ×q\( identity matrix, so \)N = q^k\(, \)t = q^2\(. Column \)M^j\(, for a message \)m F_q^k\( corresponding to item \)j\(, equals (a suitable encoding of) the codeword \)C_out(m)\(: since \)C_out\( is a linear code (Definition~ \ref{def:c02-linear-code}), \)C_out(m)\( can be computed in \)O(q^2 ×polylog q)\( time by a matrix-vector product against the (implicitly stored, low-space) generator matrix, and hence \)M^j\( can be computed in \)O(q^2 ×polylog q) = O(t ×polylog t)\( time. Step 2 of Algorithm~ \ref{alg:c22-hot-items-update} therefore runs in \)O(t ×polylog t)\( time overall. \emph{Poly(t) reporting time.} The run time of Algorithm~ \ref{alg:c22-report-heavy-items} is dominated by Step 2 (decoding). The only decoding algorithm exhibited in this chapter for a general \)d\(-disjunct matrix is Algorithm~ \ref{alg:c22-disjunct-decoder}~ /~ Lemma~ \ref{lem:c22-disjunct-decoding-time}, which runs in time \)Ω(tN)\(, not \)poly(t)\(. Granting the existence of such a \)poly(t)\(-time decoder for this specific \)M\( (which uses the algebraic structure of the Reed--Solomon outer code), Step 2 runs in \)poly(t)\( time, completing the proof. \)
The lower bound
\(\)t^a(d,N) = Ω(d(N/d))\( is exactly Proposition~ \ref{prop:c22-adaptive-lower-bound}. \notready \)
The upper bound term
\(\)O(d^2 _d^2 N)\( follows directly from Theorem~ \ref{thm:c22-kautz-singleton} (Kautz--Singleton). \notready \)
By Theorem 491,
\(\)t(d,N) ≥Ω(d^2 _d N)\(, and by Theorem~ \ref{thm:c22-adaptive-bound-summary}, \)t^a(d,N) = Θ(d(N/d)) = O(d N)\(. Hence \[ \frac{t(d,N)}{t^a(d,N)} \ge \Omega \! \left(\frac{d^2 \log _d N}{d \log N} \right) = \Omega \! \left(\frac{d \log _d N}{\log N}\right). \] Writing \)_d N = N / d\(, the right-hand side equals \)Ω(d / d)\(, which is the claimed bound. \)