Approximate Gradient Coding with Optimal Decoding

5 The decoding error under random stragglers

Let \(G = (V,E)\) be any vertex-transitive graph with \(n\) vertices, \(m\) edges, and spectral expansion \(\lambda \), and let \(A\) be the graph assignment matrix of \(G\). Suppose for some positive \(\epsilon \):

  1. \(\lambda {\gt} 1.5\);

  2. \((\tfrac {\lambda }{2}+1)(1 - p e^{1/\lambda }) \ge 1 + \epsilon \);

  3. \(\Big(1 - \tfrac {e^2 p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-pe^{1/\lambda })^2}\Big) \Big(\lambda - (2d-\lambda )\tfrac {e^2 p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-pe^{1/\lambda })^2}\Big) \ge \epsilon \);

  4. \(\tfrac {n}{\log (n)^2} \ge \tfrac {4(1+\epsilon )}{\epsilon ^2} \big(2 - 2\log (\epsilon ) + 2\log (1+\epsilon ) - \log (1 - pe^{1/\lambda })\big)\).

If each machine straggles independently with probability \(p\), then

  1. \(\mathbb {E}[\alpha ^*] = r\mathbb {1}\) for some \(r \ge 1 - \tfrac {6}{n} - t\);

  2. for all \(i\), \(\mathbb {E}[(\alpha ^*_i - r)^2] \le \tfrac {6}{n} + t\);

  3. \(\big|\mathbb {E}[(\alpha ^* - r\mathbb {1})(\alpha ^* - r\mathbb {1})^T]\big|_2 \le 2k^2\big(t + \tfrac {6}{n}\big)^2 + 24\),

where \(t = \tfrac {e^2 p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-pe^{1/\lambda })^2}\) and \(k = \tfrac {2(1+\epsilon )}{\epsilon ^2} \big(2\log (n) - 2\log (\epsilon ) + 2\log (1+\epsilon ) - \log (1-p)\big)\), and all expectations are over the random stragglers.

Proof

Because \(G\) is vertex-transitive, the distribution of \(\alpha ^*_i\) is the same for every \(i\), so \(\mathbb {E}[\alpha ^*]\) is a multiple of \(\mathbb {1}\). By Corollary 37, with probability at least \(1 - \tfrac {6}{n}\) at least \((1-t)n\) vertices lie in non-bipartite components, where \(t\) is as above; by Lemma 24 those vertices have \(\alpha ^*_i = 1\).

First statement. Since \(\alpha ^*_i \ge 0\),

\[ \mathbb {E}[\alpha ^*_i] \ge \Pr [\alpha ^*_i = 1] \ge \Pr [\text{vertex } i \text{ in a non-bipartite component}] \ge 1 - \tfrac {6}{n} - t . \]

Second statement. Using \(|1 - \alpha ^*_i| \le 1\) always,

\[ \mathbb {E}[(\alpha ^*_i - \mathbb {E}[\alpha ^*_i])^2] \le \mathbb {E}[(\alpha ^*_i - 1)^2] \le 1 - \Pr [\alpha ^*_i = 1] \le \tfrac {6}{n} + t . \]

Third statement. This is Lemma 28.

Lemma 28 Covariance norm bound [Lemma IV.5 / Lemma C.1]

Under the assumptions of Theorem 27,

\[ \big|\mathbb {E}[(\alpha ^* - r\mathbb {1})(\alpha ^* - r\mathbb {1})^T]\big|_2 \le 2k^2\big(t + \tfrac {6}{n}\big)^2 + 24, \]

where \(r, t, k\) are as in Theorem 27.

Proof

Bound the operator norm by the sum of absolute entries of a row:

\[ \big|\mathbb {E}[(\alpha - r\mathbb {1})(\alpha - r\mathbb {1})^T]\big|_2 \le \sum _j \big|\mathbb {E}[\alpha _i \alpha _j] - \mathbb {E}[\alpha _i]\mathbb {E}[\alpha _j]\big| . \]

For \(S \subset V\) let \(E_S\) be the event that \(S\) is exactly a connected component of \(G(p)\), let \(\mathcal{E}(i) = \{ E_S : i \in S\} \), and for \(E \in \mathcal{E}(i)\) let \(\alpha _i(E)\) be the value of \(\alpha _i\) conditioned on \(E\). Since \(1 - \alpha _i\) is determined by which component contains \(i\) and whether it is bipartite,

\begin{align*} \big|\mathbb {E}[\alpha _i \alpha _j] - \mathbb {E}[\alpha _i]\mathbb {E}[\alpha _j]\big| & = \Big| \sum _{E_S \in \mathcal{E}(i),\, E_T \in \mathcal{E}(j)} (\Pr [E_S \cap E_T] - \Pr [E_S]\Pr [E_T])(1 - \alpha _i(E_S))(1 - \alpha _j(E_T)) \Big| \\ & \le \sum _{E_S, E_T :\ \alpha _i(E_S) \neq 1,\, \alpha _j(E_T) \neq 1} |\Pr [E_S \cap E_T] - \Pr [E_S]\Pr [E_T]| \\ & \le 2 \sum _{\substack {E_S, E_T :\ \alpha _i(E_S) \neq 1,\, \alpha _j(E_T) \neq 1 \\ \begin{bgroup} \Pr [E_S]\Pr [E_T] {\gt} \Pr [E_S \cap E_T] \end{bgroup}}} \Pr [E_S]\Pr [E_T] , \end{align*}

where the second line uses \(|1-\alpha _i| \le 1\). Note \(\Pr [E_S]\Pr [E_T] {\gt} \Pr [E_S \cap E_T]\) exactly when the two events are incompatible, i.e. \(S \neq T\) and \(S \cap T \neq \emptyset \); define \(I(S,T) = 1\) in that case and \(0\) otherwise.

Using vertex transitivity, for a fixed \(i\) with \(|S| \le k\) one rewrites, with \(A_j \subset \operatorname {Aut}(G)\) the automorphisms sending \(j \mapsto 1\) and using \(\Pr [E_T] = \Pr [E_{\sigma (T)}]\),

\[ \sum _{j}\sum _{\substack {E_T \in \mathcal{E}(j),\, |T| \le k}} \Pr [E_T] I(S,T) = n \sum _{\substack {E_T \in \mathcal{E}(1),\, |T| \le k}} \Pr [E_T]\, \mathbb {E}_{\sigma \sim \operatorname {Aut}(G)}[I(S, \sigma (T))] . \]

Bounding \(I(S,\sigma (T)) \le \mathbf{1}[S \cap \sigma (T) \neq \emptyset ]\) and using that \(\sigma (u)\) is uniform on \(V\) (vertex transitivity), so \(\mathbb {E}_{\sigma }[|S \cap \sigma (T)|] = \tfrac {|S||T|}{n}\), gives

\[ \sum _{j}\sum _{E_T} \Pr [E_T] I(S,T) \le n \sum _{E_T} \Pr [E_T]\tfrac {|S||T|}{n} \le k|S|\Pr [|C(1)| \le k] \le k|T|\big(t + \tfrac {6}{n}\big), \]

the last step using Corollary 37 (a vertex is in a giant component of size \({\gt} n/2\) with probability \(\ge 1 - t - \tfrac {6}{n}\)). Summing over all \(j\), splitting into pairs with \(\max (|S|,|T|) \le k\) and \(\max (|S|,|T|) {\gt} k\), and applying Claim 35 for the large pairs,

\[ \sum _j \big|\mathbb {E}[\alpha _i\alpha _j] - \mathbb {E}[\alpha _i]\mathbb {E}[\alpha _j]\big| \le 2 k^2 \big(t + \tfrac {6}{n}\big)^2 + 4n\cdot \tfrac {6}{n} \le 2k^2\big(t + \tfrac {6}{n}\big)^2 + 24 . \qedhere \]
Theorem 29 Giant component of a sparsified expander [Theorem IV.3]

Let \(G\) be any \(d\)-regular \(\lambda \)-spectral expander on \(n\) vertices and suppose for some positive \(\epsilon \): (1) \(\lambda {\gt} 1.5\); (2) \((\tfrac {\lambda }{2}+1)(1-p) \ge 1 + \epsilon \); (3) \(\tfrac {n}{\log (n)^2} \ge \tfrac {2(3+\epsilon )(1+\epsilon )}{\epsilon ^2} \big(2 - 2\log (\epsilon ) + 2\log (1+\epsilon ) - \log (1-p)\big)\). Let \(G(p)\) be the random sparsification deleting each edge with probability \(p\). With probability at least \(1 - \tfrac {5}{n}\):

  1. \(G(p)\) has a giant component of size at least \(n\Big(1 - \tfrac {e p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-p)^2}\Big)\);

  2. every vertex is either in a component of size at most \(k\), where \(k\) is as in Theorem 27, or in a component of size greater than \(n/2\).

Proof

Consider the growing process of Algorithm 30 (a generalization of the standard component-exploration process), started from a vertex set \(V_0\): it reveals edges of \(G\) one at a time, including an edge in \(G(p)\) with probability \(1-p\) (variable \(X_t\)), maintaining the explored vertices \(V_t\), the explored \(G(p)\)-edges \(S_t\), the explored non-\(G(p)\) edges \(B_t\), and the frontier \(F_t = \partial (V_t)\setminus B_t\) of unexplored edges. Since the explored \(G(p)\)-edges form a forest, \(|V_t| = |V_0| + |S_t|\) and \(|B_t| = t - |S_t|\). Claim 31 lower bounds the frontier size, and Claim 32 shows every vertex is in a component of size \(\le k\) or \({\gt} n/2\) with probability \(1 - \tfrac {4}{n}\). Since there is at most one component of size \({\gt} n/2\), all such vertices lie in one giant component. Claim 33 bounds the number of vertices in small components, yielding the giant-component size bound and the failure probability \(1 - \tfrac {5}{n}\).

Lemma 30 Growing process [Algorithm 1]

The growing process on input vertex set \(V_0\) explores \(\bigcup _{v \in V_0} C_v\) (the \(G(p)\)-components of \(V_0\)) as follows. Initialize \(t \leftarrow 0\), \(S_0, B_0 \leftarrow \emptyset \), \(F_0 \leftarrow \partial (V_0)\). While \(|F_t| {\gt} 0\): pick \(e \in F_t\) arbitrarily; set \(t \leftarrow t+1\) and draw \(X_t \sim \mathrm{Bernoulli}(1-p)\). If \(X_t = 1\) then \(S_t \leftarrow S_{t-1} \cup e\), \(V_t \leftarrow V_{t-1} \cup \delta (e)\), \(B_t \leftarrow B_{t-1}\); else \(B_t \leftarrow B_{t-1}\cup e\), \(S_t \leftarrow S_{t-1}\), \(V_t \leftarrow V_{t-1}\). Update \(F_t \leftarrow \partial (V_t) \setminus B_t\). Return \(V_t\). Here \(S_t\) are the explored edges in \(G(p)\), \(V_t\) the explored vertices, \(B_t\) the explored edges not in \(G(p)\), and \(F_t\) the frontier.

Lemma 31 Frontier lower bound [Claim IV.7]

At the end of any step \(t\) with \(|V_t| \le n/2\),

\[ |F_t| \ge \sum _{i=1}^t (X_i(\alpha _t + 1) - 1) + |V_0|\alpha _t, \qquad \alpha _t = \lambda \max \! \Big(\tfrac 12,\ 1 - \tfrac {t+|V_0|}{n}\Big). \]
Proof

By the Expander Mixing Lemma (Lemma 9) applied to \(V_t\) and its complement, \(|\partial (V_t)| \ge \lambda |V_t|\big(1 - \tfrac {|V_t|}{n}\big) \ge \alpha _t|V_t|\). Hence, using \(|F_t| = |\partial (V_t)\setminus B_t| \ge \alpha _t|V_t| - |B_t|\), \(|V_t| = |S_t| + |V_0|\) and \(|B_t| = t - |S_t|\),

\[ |F_t| \ge \alpha _t(|S_t| + |V_0|) - (t - |S_t|) = (\alpha _t + 1)|S_t| + |V_0|\alpha _t - t = \sum _{i=1}^t (X_i(\alpha _t+1) - 1) + |V_0|\alpha _t, \]

since \(|S_t| = \sum _{i=1}^t X_i\).

Lemma 32 Dichotomy of component sizes [Claim IV.8]

With probability at least \(1 - \tfrac {4}{n}\), simultaneously for all vertices \(v\), either the component of \(v\) has size less than \(k\) or has size strictly greater than \(n/2\), where \(k = \tfrac {2(1+\epsilon )}{\epsilon ^2} (2\log (n) - 2\log (\epsilon ) + 2\log (1+\epsilon ) - \log (1-p))\).

Proof

Conditioned on \(v\) in a component of size \(\ge k\), the growing process on \(\{ v\} \) takes at least \(k(1+\alpha _k)\) steps. By the third assumption, \(k \le \tfrac {n}{(3+\epsilon )\log n}\), so \(\alpha _t \le \tfrac 12\) for the relevant \(t\). Using the frontier bound (Lemma 31) the component stays \(\le n/2\) only if the random walk \(\sum _{i=1}^t(X_i(\alpha _t+1)-1)\) becomes nonpositive, so

\[ \Pr [|C(v)| \le n/2 \mid |C(v)| \ge k] \le \sum _{t = k(1+\alpha _k)}^{\infty } \Pr \Big[\sum _{i=1}^t(X_i(\alpha _t+1) - 1) \le -\alpha _t\Big]. \]

By the Chernoff bound (Lemma 4) for Bernoulli\((1-p)\) variables,

\[ \Pr \Big[\sum _{i=1}^t(X_i(\alpha _t+1)-1) \le -\alpha _t\Big] \le \exp \! \Big(-\tfrac {(1-p)t}{2}\big(1 - \tfrac {1}{(\alpha _t+1)(1-p)}\big)^2\Big). \]

Summing the geometric series and using the second assumption and the definition of \(k\) bounds the total by \(\dfrac {\exp (-k\epsilon ^2/(2(1+\epsilon )))}{\frac{1-p}{4}(1-\frac{1}{1+\epsilon })^2} = \tfrac {4}{n^2}\). A union bound over the \(n\) vertices gives the claim.

Lemma 33 Few vertices in small components [Claim IV.9]

Let \(I_v = \mathbf{1}[|C_v| \le k]\) and \(Y = \sum _v I_v\). Then

\[ \Pr \! \left[Y \ge \frac{en\, p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-p)^2}\right] \le \frac{1}{n}. \]

Consequently, with probability \(\ge 1 - \tfrac {4}{n} - \tfrac 1n\), there is a giant component of size \(\ge n\big(1 - \tfrac {e p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-p)^2}\big)\) and no vertices in components of size between \(k\) and \(n/2\) (establishing Theorem 29).

Proof

By Claim 35, the first \(\log (n)\) moments of \(Y\) are upper bounded by those of \(\mathrm{Binomial}(n,q)\) with \(q = \tfrac {p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-p)^2}\). By Proposition 36 and Markov’s inequality (Lemma 2) applied to the \(\log (n)\)-th centralized moment,

\[ \Pr [Y \ge enq] = \Pr [(Y - \mathbb {E}[Y])^{\log n} \ge (neq - nq)^{\log n}] \le \frac{(2q\sqrt{n\log n})^{\log n}}{(nq(e-1))^{\log n}} \le \Big(\tfrac 1e\Big)^{\log n} = \tfrac 1n . \qedhere \]
Lemma 34 Random-walk extinction probability [Lemma IV.10]

Let \(S(\alpha ) = \sum _{i=1}^{\infty } X_i\) be a random walk where \(X_i = \alpha \) with probability \(1-p\) and \(X_i = -1\) with probability \(p\). Let \(S^{(\beta )}(\alpha ) = \beta + S(\alpha )\). If \(\alpha {\gt} 1\), then for any positive integer \(c\) the probability that \(S^{(c\alpha )}(\alpha )\) ever goes below zero is at most \(\Big(\tfrac {p^{\alpha }}{(1-p)^2}\Big)^c\).

Proof

For \(\beta \) let \(d_\beta (\alpha )\) be the extinction probability of \(S^{(\beta )}(\alpha )\). We compare \(S^{(\beta )}(\alpha )\) with the integer walk \(S(\lfloor \alpha \rfloor )\): \(d_\beta (\alpha ) \le d_{\lceil \beta \rceil }(\lfloor \alpha \rfloor ) = d_1(\lfloor \alpha \rfloor )^{\lceil \beta \rceil }\), and a first-step analysis gives \(d_1(\lfloor \alpha \rfloor ) = p + (1-p)d_1(\lfloor \alpha \rfloor )^{\lfloor \alpha \rfloor +1}\), whence \(d_1(\lfloor \alpha \rfloor ) \le \tfrac {p}{(1-p)^{1/\lfloor \alpha \rfloor }}\). Therefore for any positive integer \(c\),

\[ d_{c\alpha }(\alpha ) \le d_{c\alpha }(\lfloor \alpha \rfloor ) \le \Big(\tfrac {p}{(1-p)^{1/\lfloor \alpha \rfloor }}\Big)^{\lceil c\alpha \rceil } \le \Big(\tfrac {p^{\alpha }}{(1-p)^{\frac{\alpha }{\lfloor \alpha \rfloor }}}\Big)^c \le \Big(\tfrac {p^{\alpha }}{(1-p)^2}\Big)^c . \qedhere \]
Lemma 35 Moment control of small-component sets [Claim IV.11]

For any set \(S\) of \(c \le \log (n)\) vertices, the probability that every vertex of \(S\) lies in a component of size at most \(k\) is at most \(\Big(\tfrac {p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-p)^2}\Big)^c\).

Proof

We have \(\Pr [\max _{v\in S}|C_v| \le k] \le \Pr [|\bigcup _{v\in S}C_v| \le ck]\). This is the probability that the growing process started at \(V_0 = S\) terminates before reaching size \(ck\). From \(k \le n(1 - \tfrac {1}{3+\epsilon })\) (for \(c \le \log n\)), the process terminates early only if the random walk \(\sum _{i=1}^{\infty }\big(X_i(\lambda (1-\tfrac 1{3+\epsilon })+1) - 1\big) + c\lambda (1-\tfrac 1{3+\epsilon })\) becomes extinct. Since \(\lambda (1-\tfrac 1{3+\epsilon }) {\gt} 1\) by the first assumption, Lemma 34 (with \(\alpha = \lambda (1-\tfrac 1{3+\epsilon })\), \(\beta = c\alpha \)) gives the stated bound.

Proposition 36 Centered binomial moment bound [Proposition IV.12]

For any \(n\), \(q\) and \(c \le \log (n)\),

\[ \mathbb {E}\big[(\mathrm{Binomial}(n,q) - nq)^c\big] \le \big(2q\sqrt{nc}\big)^c . \]
Proof

Let \(X_i, Y_i\) be i.i.d. Bernoulli\((q)\). By Jensen’s inequality (Lemma 3) applied to the convex map \(x \mapsto x^c\),

\[ \mathbb {E}[(\mathrm{Binomial}(n,q) - nq)^c] = \mathbb {E}\Big[\big(\sum _i X_i - \sum _i \mathbb {E}[Y_i]\big)^c\Big] \le \mathbb {E}\Big[\big(\sum _i (X_i - Y_i)\big)^c\Big]. \]

Write \(X_i - Y_i = r_i Z_i\) with \(r_i\) i.i.d. Rademacher and \(Z_i \sim \mathrm{Bernoulli}(2q(1-q))\). Since even moments of a standard Gaussian \(g_i\) dominate those of a Rademacher and odd moments vanish for both, \(\mathbb {E}[(\sum _i r_i Z_i)^c] \le \mathbb {E}[(\sum _i g_i Z_i)^c]\). As \(\mathbb {E}[Z_i^k] = 2q(1-q)\) for all \(k \ge 1\), comparing moments gives \(\sum _i g_i Z_i \sim 2q(1-q)\sum _i g_i \sim 2q(1-q)\mathcal{N}(0,n)\), so

\[ \mathbb {E}\Big[\big(\sum _i g_i Z_i\big)^c\Big] = (4nq^2(1-q)^2)^{c/2}\frac{c!}{2^{c/2}(c/2)!} \le (4nq^2(1-q)^2 c)^{c/2} \le (2q\sqrt{nc})^c , \]

where the first inequality uses Stirling (Lemma 5).

Corollary 37 Non-bipartite giant component [Corollary IV.4]

Let \(G\) be any \(d\)-regular \(\lambda \)-spectral expander on \(n\) vertices and suppose for some positive \(\epsilon \) assumptions (1)–(4) of Theorem 27 hold. Then with probability at least \(1 - \tfrac {6}{n}\):

  1. \(G(p)\) has a non-bipartite giant component of size at least \(n\Big(1 - \tfrac {e^2 p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-pe^{1/\lambda })^2}\Big)\);

  2. every vertex is either in a component of size at most \(k\) (with \(k\) as in Theorem 27) or in a component of size \({\gt} n/2\).

Proof

Use “edge sprinkling”: let \(q = pe^{1/\lambda }\). Create \(G(p)\) in two steps: Step 1 forms \(G(q)\) by deleting edges with probability \(q\); Step 2 adds back each absent edge of \(G\) with probability \(1 - \tfrac {p}{q}\). By Theorem 29 applied with \(q\), with probability \(\ge 1 - \tfrac 5n\) the graph \(G(q)\) has a giant component \(C\) of size \(n(1-s)\), \(s = \tfrac {e^2 p^{\lambda (1-\frac1{3+\epsilon })}}{(1-pe^{1/\lambda })^2}\). If \(C\) were bipartite with sides \(L \supseteq R\), \(r = |R|/n \ge \tfrac {1-s}{2}\), then by the Expander Mixing Lemma (Lemma 9) there are at least \(n(dr^2 - (d-\lambda )r(1-r))\) edges of \(G\) inside \(R\); assumption (3) makes this \(\ge \tfrac {n\epsilon }{4}\). The probability Step 2 adds no edge inside \(R\) is at most \((p/q)^{n\epsilon /4} \le \exp (-\tfrac {n\epsilon }{4\lambda }) \le \tfrac 1n\). Hence with probability \(\ge 1 - \tfrac 5n - \tfrac 1n\) there is a non-bipartite giant component, and the second statement follows from Theorem 29.

Proposition 38 Lower bound on random error \(\geq p^d/(1-p^d)\) [Remark IV.2]

For a graph assignment scheme with replication factor \(d\), \(\tfrac 1n\mathbb {E}[|\overline{\alpha ^*} - \mathbb {1}|_2^2] \ge p^d\), because \(p^d\) is the probability that a fixed data block is stored only at straggling machines. Theorem 27 therefore implies the variance of \(\alpha ^*\) shrinks exponentially in \(d\); for Ramanujan graphs (\(\lambda \ge d - 2\sqrt{d-1}\)) the bound is tight up to factors in the exponent.