- Boxes
- definitions
- Ellipses
- theorems and lemmas
- Blue border
- the statement of this result is ready to be formalized; all prerequisites are done
- Orange border
- the statement of this result is not ready to be formalized; the blueprint needs more work
- Blue background
- the proof of this result is ready to be formalized; all prerequisites are done
- Green border
- the statement of this result is formalized
- Green background
- the proof of this result is formalized
- Dark green background
- the proof of this result and all its ancestors are formalized
- Dark green border
- this is in Mathlib
Algorithm GCOD\((A, p, \theta _0, \gamma , \{ f_i\} , k)\): in the distribution phase draw a uniform permutation \(\rho \sim \mathrm{Uniform}(\mathcal{S}_n)\) and send \(f_{\rho (i)}\) to all machines \(j\) with \(A_{ij} \neq 0\). In the computation phase, for \(t = 1,\dots ,k\): the server sends \(x_{t-1}\); each machine \(j\) straggles with \(B_j \sim \mathrm{Bernoulli}(p)\) and, if not straggling, returns \(g_j = \sum _i A_{ij}\nabla f_{\rho (i)}(\theta _{t-1})\); the server computes \(w^* \in \arg \min _{w :\ w_j = 0 \text{ if } B_j = 0}|Aw - \mathbb {1}|_2\) and updates \(\theta _t = \theta _{t-1} - \gamma \sum _{j : B_j = 1} w^*_j g_j\).
The growing process on input vertex set \(V_0\) explores \(\bigcup _{v \in V_0} C_v\) (the \(G(p)\)-components of \(V_0\)) as follows. Initialize \(t \leftarrow 0\), \(S_0, B_0 \leftarrow \emptyset \), \(F_0 \leftarrow \partial (V_0)\). While \(|F_t| {\gt} 0\): pick \(e \in F_t\) arbitrarily; set \(t \leftarrow t+1\) and draw \(X_t \sim \mathrm{Bernoulli}(1-p)\). If \(X_t = 1\) then \(S_t \leftarrow S_{t-1} \cup e\), \(V_t \leftarrow V_{t-1} \cup \delta (e)\), \(B_t \leftarrow B_{t-1}\); else \(B_t \leftarrow B_{t-1}\cup e\), \(S_t \leftarrow S_{t-1}\), \(V_t \leftarrow V_{t-1}\). Update \(F_t \leftarrow \partial (V_t) \setminus B_t\). Return \(V_t\). Here \(S_t\) are the explored edges in \(G(p)\), \(V_t\) the explored vertices, \(B_t\) the explored edges not in \(G(p)\), and \(F_t\) the frontier.
Algorithm SGD-ALG\((P_\beta , \theta _0, \gamma , \{ f_i\} , k)\): draw a uniform permutation \(\rho \); for \(t = 1,\dots ,k\), draw \(\beta \sim P_\beta \) and update \(\theta _t = \theta _{t-1} - \gamma \sum _i \beta _i \nabla f_{\rho (i)}(\theta _{t-1})\). For an unbiased scheme with \(\alpha ^* = A(p)(A(p)^TA(p))^{\dagger }A(p)^T\mathbb {1}\) (Definition 14) and \(P_{\overline{\alpha ^*}}\) the distribution of the normalized \(\overline{\alpha ^*}\), the algorithm GCOD\((A,p,x_0,\gamma ,\{ f_i\} ,k)\) is stochastically equivalent to SGD-ALG\((P_{\overline{\alpha ^*}}, x_0, \gamma \mathbb {E}[\alpha _1], \{ f_i\} , k)\).
Let \(A\) be a graph assignment scheme for a \(d\)-regular graph \(G\) with spectral gap \(\lambda = d - o(d)\). For any set of \(\lfloor pm\rfloor \) stragglers there exist decoding coefficients \(w\) with
Let \(A\) be a graph assignment scheme for a \(d\)-regular graph \(G\) with spectral expansion \(\lambda \). For any set of \(\lfloor pm\rfloor \) stragglers there exist decoding coefficients \(w\) with
Let \(f, f_i, x^*, \sigma ^2\) be as in Proposition 50. Perform \(\theta _{k+1} = \theta _k - \gamma \sum _i \alpha _i^{(k)}\nabla f_i\) with \(|\alpha ^{(k)} - \mathbb {1}|_2^2 \le r\) at each iteration and assume \(\mu {\gt} rL'\). Then there is a step size \(\gamma \) such that after
steps, \(|\theta _k - \theta ^*|_2^2 \le \tfrac {4r\sigma ^2}{(\mu - \sqrt{\mu rL'})^2}\), where \(\epsilon _0 = |\theta _0 - \theta ^*|^2\). Combining with the adversarial error bound of Corollary 41, coded gradient descent converges to a noise floor of \(\tfrac {4\sigma ^2 n(2d-\lambda )p}{\mu (\sqrt{2d(1-p)} - \sqrt{n(2d-\lambda )pL'})^2}\).
Let \(G\) be any \(d\)-regular \(\lambda \)-spectral expander on \(n\) vertices and suppose for some positive \(\epsilon \) assumptions (1)–(4) of Theorem 27 hold. Then with probability at least \(1 - \tfrac {6}{n}\):
\(G(p)\) has a non-bipartite giant component of size at least \(n\Big(1 - \tfrac {e^2 p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-pe^{1/\lambda })^2}\Big)\);
every vertex is either in a component of size at most \(k\) (with \(k\) as in Theorem 27) or in a component of size \({\gt} n/2\).
For any desired accuracy \(\epsilon \), choosing \(\gamma = \tfrac {\mu \epsilon }{2\mu \epsilon (sL’+L) + 2r(1+\frac1{n-1})\sigma ^2}\) gives, after
steps, \(\mathbb {E}[|x_k - x^*|_2^2] \le \epsilon \), where \(\epsilon _0 = |x_0 - x^*|^2\).
Given \(A \in \mathbb {R}^{N \times m}\), the adversarial decoding error under a \(p\) fraction of adversarial stragglers is
For a straggler rate \(p\), let \(A(p)\) be the random matrix obtained from \(A\) by replacing each column independently with the zero column with probability \(p\) (a straggling machine). Then \(\alpha ^*\) is the orthogonal projection of \(\mathbb {1}\) onto the column space of \(A(p)\),
where \(M^{\dagger }\) denotes the Moore–Penrose pseudoinverse of \(M\).
An assignment matrix for \(N\) data points distributed over \(m\) worker machines is a matrix \(A \in \mathbb {R}^{N \times m}\) with \(A_{ij} \neq 0\) if and only if the \(i\)th data point is held by machine \(j\). Writing \(f_i(\theta ) := L(x_i, y_i; \theta )\) for the per-point loss, each non-straggling machine \(j\) returns the single vector \(g_j := \sum _{i=1}^N A_{ij} \nabla f_i(\theta )\) to the parameter server.
At each round \(t\) the parameter server broadcasts \(\theta _t\), each non-straggling machine \(j\) returns \(g_j\), and the server picks a decoding coefficient vector \(w \in \mathbb {R}^m\) with \(w_j = 0\) whenever machine \(j\) straggles, and updates
If a scheme can always achieve \(\alpha = \mathbb {1}\) (the all-ones vector) it recovers full-batch gradient descent; otherwise we are in the regime of approximate gradient coding.
A differentiable function \(f : \mathbb {R}^k \to \mathbb {R}\) is convex if \(f(y) \ge f(x) + \langle \nabla f(x), y - x\rangle \) for all \(x,y\); it is \(\mu \)-strongly convex if \(f - \tfrac {\mu }{2}|\cdot |_2^2\) is convex; and it has an \(L\)-Lipschitz gradient if \(|\nabla f(x) - \nabla f(y)|_2 \le L|x-y|_2\) for all \(x,y\).
A graph assignment scheme corresponding to a graph \(G\) with \(n\) vertices and \(m\) edges is the matrix \(A \in \{ 0,1\} ^{n \times m}\) with \(A_{ij} = 1\) iff the \(j\)th edge of \(G\) has vertex \(i\) as an endpoint. Here the \(n\) vertices are the data blocks and the \(m\) edges are the machines, so each machine holds exactly two data blocks. Each block consists of \(\tfrac {dN}{2m}\) data points, the number of blocks is \(n = \tfrac {2m}{d}\), the computational load (points per machine) is \(\ell = \tfrac {dN}{m}\), and the replication factor is \(d = \tfrac {2m}{n}\), independent of block size. Results are stated in terms of the \(n \times m\) block-to-machine matrix.
Given an assignment matrix \(A\) and a set of straggling machines, the optimal decoding coefficient vector is any
and we write \(\alpha ^* := Aw^*\). Here \(|\cdot |_2\) is the Euclidean norm. The vector \(w^*\) need not be unique but \(\alpha ^*\) is.
Given an assignment matrix \(A \in \mathbb {R}^{N \times m}\), the random decoding error under a \(p\) fraction of random stragglers is
where \(S \subseteq [m]\) is a random subset including each index independently with probability \(p\). We drop the subscript and write \(\mathbb {E}\) for the expectation over the random straggler set.
The replication factor of an assignment matrix \(A \in \mathbb {R}^{N \times m}\) is the average number of times a data point is replicated, that is, the number of non-zero entries of \(A\) divided by \(N\).
For a graph assignment scheme on \(G = (V,E)\) and straggler probability \(p\), let \(G(p)\) be the random subgraph in which each edge of \(G\) is deleted independently with probability \(p\) (a straggling machine). The decoding vector \(w^*\) has one (possibly nonzero) coordinate \(w^*_e\) per edge \(e\) of \(G(p)\), and \(\alpha ^*_v = \sum _{e \ni v} w^*_e\) is the sum of weights of edges incident to \(v\).
Let \(G = (V,E)\) be a \(d\)-regular graph with adjacency matrix \(\mathcal{A}(G)\), whose eigenvalues are \(\lambda _1(\mathcal{A}(G)) = d \ge \lambda _2(\mathcal{A}(G)) \ge \cdots \). The spectral expansion (spectral gap) of \(G\) is \(\lambda := \lambda _1(\mathcal{A}(G)) - \lambda _2(\mathcal{A}(G)) = d - \lambda _2(\mathcal{A}(G))\). \(G\) is a \(\lambda \)-spectral expander if its spectral expansion is at least \(\lambda \).
A coding scheme is unbiased if \(\mathbb {E}[\alpha ] = c\mathbb {1}\) for some constant \(c\) (so that \(\mathbb {E}\sum _i \alpha _i \nabla f_i(\theta ) = c\nabla f(\theta )\) where \(f := \sum _i f_i\)). For an unbiased scheme we define the normalized vector \(\overline{\alpha } := \alpha /c\), so that \(\mathbb {E}[\overline{\alpha }] = \mathbb {1}\).
For a graph \(G = (V,E)\) let \(\operatorname {Aut}(G) \subset \mathcal{S}_{|V|}\) be its group of automorphisms. \(G\) is vertex transitive if for all \(u, v \in V\) there is an automorphism \(\sigma \in \operatorname {Aut}(G)\) with \(\sigma (u) = v\).
In the setting of Proposition 50, for any step size \(\gamma \),
For any edge \(e = (u,v)\) of \(G(p)\), the optimal \(\alpha ^*\) satisfies \(\alpha ^*_u + \alpha ^*_v = 2\).
If a connected component \(C = L \cup R\) of \(G(p)\) is bipartite with \(|L| \ge |R|\), then \(\alpha ^*_u = 1 + \tfrac {|L|-|R|}{|L|+|R|}\) for \(u \in R\) and \(\alpha ^*_v = 1 - \tfrac {|L|-|R|}{|L|+|R|}\) for \(v \in L\).
Let \(X_1, \dots , X_t\) be i.i.d. Bernoulli random variables with mean \(q\) and let \(\delta \in (0,1)\). Then
With probability at least \(1 - \tfrac {4}{n}\), simultaneously for all vertices \(v\), either the component of \(v\) has size less than \(k\) or has size strictly greater than \(n/2\), where \(k = \tfrac {2(1+\epsilon )}{\epsilon ^2} (2\log (n) - 2\log (\epsilon ) + 2\log (1+\epsilon ) - \log (1-p))\).
For any permutation \(\rho \in \mathcal{S}_n\), with \(G(x)\) as above,
With \(\sigma ^2 = \sum _i|\nabla f_i(x^*)|_2^2\) and \(r,s\) as above,
With \(G = G(x_k)\), \(|G^TG\mathbb {1}|_2 \le LL'|y_k|_2^2 + L\sigma |y_k|_2\).
With \(G = G(x_k)\),
With \(G = G(x_k)\), \(\sigma _1(G^TG) \le 2|y_k|_2^2(L')^2 + 2\sigma ^2\).
At the end of any step \(t\) with \(|V_t| \le n/2\),
For any set \(S\) of \(c \le \log (n)\) vertices, the probability that every vertex of \(S\) lies in a component of size at most \(k\) is at most \(\Big(\tfrac {p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-p)^2}\Big)^c\).
Let \(I_v = \mathbf{1}[|C_v| \le k]\) and \(Y = \sum _v I_v\). Then
Consequently, with probability \(\ge 1 - \tfrac {4}{n} - \tfrac 1n\), there is a giant component of size \(\ge n\big(1 - \tfrac {e p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-p)^2}\big)\) and no vertices in components of size between \(k\) and \(n/2\) (establishing Theorem 29).
For a convex function \(f\) whose gradient has Lipschitz constant \(L\),
For all vertices \(v\) in a single connected component of \(G(p)\), the value \(|1 - \alpha ^*_v|\) is the same.
Let \(G\) be a \(d\)-regular graph on \(n\) vertices with spectral expansion \(\lambda \) (Definition 18). For any sets of vertices \(S\) and \(T\),
where \(E(S,T)\) denotes the set of edges between \(S\) and \(T\).
If \(\varphi \) is convex and \(X\) is an integrable random variable, then \(\varphi (\mathbb {E}[X]) \le \mathbb {E}[\varphi (X)]\).
Given the set of non-straggling machines, \(w^*\) can be computed in \(O(m)\) time: perform a breadth-first search of \(G(p)\) to find connected components and the sides \(L,R\) of any bipartite component; set \(\alpha ^*_v\) on each component using Lemmas 24–25; then a depth-first search labels each edge with \(w^*_e\) so that the incident edge weights sum to \(\alpha ^*_v\).
If \(X \ge 0\) is a nonnegative random variable and \(a {\gt} 0\), then \(\Pr [X \ge a] \le \mathbb {E}[X]/a\).
If a connected component of \(G(p)\) contains an odd cycle (is not bipartite), then \(\alpha ^*_v = 1\) for all vertices \(v\) in that component.
Let \(S(\alpha ) = \sum _{i=1}^{\infty } X_i\) be a random walk where \(X_i = \alpha \) with probability \(1-p\) and \(X_i = -1\) with probability \(p\). Let \(S^{(\beta )}(\alpha ) = \beta + S(\alpha )\). If \(\alpha {\gt} 1\), then for any positive integer \(c\) the probability that \(S^{(c\alpha )}(\alpha )\) ever goes below zero is at most \(\Big(\tfrac {p^{\alpha }}{(1-p)^2}\Big)^c\).
For every positive integer \(k\), \(\left(\dfrac {k!}{2^{k/2}(k/2)!}\right)^{1/k} \le \sqrt{k}\).
If \(A\) and \(B\) are positive semidefinite matrices, then \(\operatorname {Tr}(AB) \le |A|_2 \, \operatorname {Tr}(B)\), where \(|A|_2\) is the operator norm.
Let \(A \in \mathbb {R}^{N \times m}\) be any assignment matrix on \(N\) data points in which each data point is replicated exactly \(d\) times and each machine holds exactly \(\ell \) data points. Let \(\sigma _2\) be the second largest singular value of \(A\). For any set \(S\) of \(s\) stragglers there exist decoding coefficients \(w = w(S)\) such that
For any graph assignment scheme with \(\lfloor pm\rfloor \) stragglers and replication factor \(d\), the stragglers can be chosen so that at least \(\tfrac {mp}{d}\) data blocks are held only at straggling machines, whence for any decoding coefficients, \(\tfrac 1n|\alpha - \mathbb {1}|_2^2 \ge \tfrac {mp}{dn} = \tfrac {p}{2}\), using \(nd = 2m\).
Consider any assignment scheme \(A\) with \(m\) machines, \(n\) data blocks and \(n \le \operatorname {nnz}(A) \le dn\). Suppose a decoding algorithm yields an unbiased gradient, \(\mathbb {E}[\alpha ] = c\mathbb {1}\). Then
For any \(n\), \(q\) and \(c \le \log (n)\),
Let \(f = \sum _i f_i\) be \(\mu \)-strongly convex with \(L\)-Lipschitz gradient, each \(f_i\) convex with \(L'\)-Lipschitz gradient, \(x^*\) the minimizer, and \(\sigma ^2 = \sum _i|\nabla f_i(x^*)|_2^2\). Perform gradient descent with \(x_{k+1} = x_k - \gamma \sum _i \alpha _i^{(k)}\nabla f_i(x_k)\) where at each iteration \(|\alpha ^{(k)} - \mathbb {1}|_2^2 \le r^2\). Let \(a = 1 - \tfrac {r\sqrt{L'}}{\sqrt\mu }\) and suppose \(a {\gt} 0\). For any \(0 {\lt} \epsilon \le 1\), with constant step size \(\gamma = \tfrac {\epsilon a\mu }{4(L^2 + 2rLL’ + 4r^2(L’)^2)}\), after
iterations, \(|x_k - x_*|_2 \le (1 + \epsilon )\tfrac {r\sigma }{a\mu }\).
Let \(f = \sum _{i=1}^N f_i\) be \(\mu \)-strongly convex with \(L\)-Lipschitz gradient, each \(f_i\) convex with \(L'\)-Lipschitz gradient, \(\theta ^*\) the minimizer, \(\sigma ^2 = \sum _i|\nabla f_i(\theta ^*)|_2^2\). Suppose a possibly biased coding scheme with computational load \(\ell \) has \(\tfrac 1N\mathbb {E}[|\alpha - \mathbb {1}|_2^2] \le \zeta \) for some \(\zeta \le \tfrac 14\). Debias it via Proposition 59 (load \(\le 2\ell \)) and run gradient coding as in Algorithm 43 with \(w\) from the original scheme, with straggling probability \(p\). Then for any \(\epsilon \) there is a step size \(\gamma \) such that after
steps, \(\mathbb {E}[|\theta _k - \theta ^*|_2^2] \le \epsilon \), where \(\epsilon _0 = |\theta _0 - \theta ^*|^2\).
Let \(f = \sum _{i=1}^n f_i\) be \(\mu \)-strongly convex with \(L\)-Lipschitz gradient, each \(f_i\) convex with \(L'\)-Lipschitz gradient, \(\theta ^*\) the minimizer, and \(\sigma ^2 = \sum _i|\nabla f_i(\theta ^*)|_2^2\). Perform gradient coding with optimal decoding (Algorithm 43) with an assignment matrix for a \(d\)-regular vertex-transitive graph of spectral gap \(d - o(d)\), so \(m = \tfrac {nd}{2}\), and machine straggling probability \(p\). Then for any accuracy \(\epsilon \) there is a step size \(\gamma \) such that after
steps, \(\mathbb {E}[|\theta _k - \theta ^*|_2^2] \le \epsilon \), where \(\epsilon _0 = |\theta _0 - \theta ^*|^2\).
Let \(f = \sum _{i=1}^n f_i\) be \(\mu \)-strongly convex with \(L\)-Lipschitz gradient, each \(f_i\) convex with \(L'\)-Lipschitz gradient, and let \(x^*\) be the minimizer. Run SGD-ALG\((P_\beta , x_0, \gamma , \{ f_i\} , k)\) from \(x_0\) for \(k\) iterations with step size \(\gamma \le \tfrac {1}{sL’+L}\) and a distribution \(P_\beta \) with \(\mathbb {E}[\beta ] = \mathbb {1}\). Let \(\sigma ^2 = \sum _i|\nabla f_i(x^*)|_2^2\), \(r = \tfrac 1n\mathbb {E}[|\beta - \mathbb {1}|_2^2]\), and \(s = |\mathbb {E}[(\beta - \mathbb {1})(\beta - \mathbb {1})^T]|_2\). Then
the expectation being over \(\rho \) and \(\{ \beta ^{(j)} : j {\lt} k\} \).
Suppose there is an assignment matrix \(A\) with computational load \(\ell \) on \(m\) machines and \(N\) data blocks, and a decoding strategy \(w\) with \(\alpha \) such that \(\mathbb {E}[|\alpha - \mathbb {1}|_2^2] \le \epsilon N\). Then there exists an assignment matrix \(\hat A\) with computational load at most \(2\ell \) on \(m\) machines and \(N\) data blocks, and a decoding strategy \(\hat w\) with \(\hat\alpha \) such that
Consider any assignment scheme \(A\) with \(m\) machines, \(n\) data blocks, and \(n \le \operatorname {nnz}(A) \le dn\). Suppose a fixed decoding scheme is used that yields an unbiased gradient: (1) for some \(\hat w\), \(w_j = \hat w_j\) if machine \(j\) does not straggle and \(0\) otherwise; (2) \(\mathbb {E}[\alpha ] = c\mathbb {1}\). Then
(For graph schemes \(m = dn/2\), so the second bound is \(\ge \tfrac {2p}{d(1-p)}\).)
The lower bound of Proposition 57 holds even if the distributed algorithm uses a more complicated coding strategy than the linear scheme of the introduction, including non-linear or coordinate-wise coding of the gradients (e.g. multiplying gradients by a matrix).
For a graph assignment scheme with replication factor \(d\), \(\tfrac 1n\mathbb {E}[|\overline{\alpha ^*} - \mathbb {1}|_2^2] \ge p^d\), because \(p^d\) is the probability that a fixed data block is stored only at straggling machines. Theorem 27 therefore implies the variance of \(\alpha ^*\) shrinks exponentially in \(d\); for Ramanujan graphs (\(\lambda \ge d - 2\sqrt{d-1}\)) the bound is tight up to factors in the exponent.
Let \(G = (V,E)\) be any vertex-transitive graph with \(n\) vertices, \(m\) edges, and spectral expansion \(\lambda \), and let \(A\) be the graph assignment matrix of \(G\). Suppose for some positive \(\epsilon \):
\(\lambda {\gt} 1.5\);
\((\tfrac {\lambda }{2}+1)(1 - p e^{1/\lambda }) \ge 1 + \epsilon \);
\(\Big(1 - \tfrac {e^2 p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-pe^{1/\lambda })^2}\Big) \Big(\lambda - (2d-\lambda )\tfrac {e^2 p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-pe^{1/\lambda })^2}\Big) \ge \epsilon \);
\(\tfrac {n}{\log (n)^2} \ge \tfrac {4(1+\epsilon )}{\epsilon ^2} \big(2 - 2\log (\epsilon ) + 2\log (1+\epsilon ) - \log (1 - pe^{1/\lambda })\big)\).
If each machine straggles independently with probability \(p\), then
\(\mathbb {E}[\alpha ^*] = r\mathbb {1}\) for some \(r \ge 1 - \tfrac {6}{n} - t\);
for all \(i\), \(\mathbb {E}[(\alpha ^*_i - r)^2] \le \tfrac {6}{n} + t\);
\(\big|\mathbb {E}[(\alpha ^* - r\mathbb {1})(\alpha ^* - r\mathbb {1})^T]\big|_2 \le 2k^2\big(t + \tfrac {6}{n}\big)^2 + 24\),
where \(t = \tfrac {e^2 p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-pe^{1/\lambda })^2}\) and \(k = \tfrac {2(1+\epsilon )}{\epsilon ^2} \big(2\log (n) - 2\log (\epsilon ) + 2\log (1+\epsilon ) - \log (1-p)\big)\), and all expectations are over the random stragglers.
Let \(G\) be any \(d\)-regular \(\lambda \)-spectral expander on \(n\) vertices and suppose for some positive \(\epsilon \): (1) \(\lambda {\gt} 1.5\); (2) \((\tfrac {\lambda }{2}+1)(1-p) \ge 1 + \epsilon \); (3) \(\tfrac {n}{\log (n)^2} \ge \tfrac {2(3+\epsilon )(1+\epsilon )}{\epsilon ^2} \big(2 - 2\log (\epsilon ) + 2\log (1+\epsilon ) - \log (1-p)\big)\). Let \(G(p)\) be the random sparsification deleting each edge with probability \(p\). With probability at least \(1 - \tfrac {5}{n}\):
\(G(p)\) has a giant component of size at least \(n\Big(1 - \tfrac {e p^{\lambda (1-\frac{1}{3+\epsilon })}}{(1-p)^2}\Big)\);
every vertex is either in a component of size at most \(k\), where \(k\) is as in Theorem 27, or in a component of size greater than \(n/2\).