1 Introduction

1.1 Background and Motivation

In this paper we investigate the typical structure of the largest component after supercritical percolation in a certain class of high-dimensional graphs. Of particular interest, both in their own right and as a tool for studying other structural properties, are the isoperimetric properties of the largest component, which have proven to be key to understanding the large-scale structure of the giant component in many percolation models. Unsurprisingly, in order to understand the likely isoperimetric properties of the giant component, it is first essential to study the isoperimetric properties of the host graph.

Very generally, for any space which is endowed with a notion of volume and boundary, the isoperimetric problem is to determine which sets of fixed volume have the smallest boundary. In the case of graphs, a natural notion of boundary to consider is the edge-boundary. Given a graph \(G=(V,E)\) and a subset of the vertices \(S\subseteq V(G)\), we write \(\partial (S)\) for the edge-boundary of S, that is, the set of edges with one endpoint in S and one endpoint in \(V(G)\setminus S\). The isoperimetric problem is then equivalent to determining, for each \(k \in \mathbb {N}\), the parameter

$$\begin{aligned} i_k(G) {:}{=}\min _{S\subseteq V(G), |S|=k}\left\{ \frac{|\partial (S)|}{k}\right\} , \end{aligned}$$

and characterising the sets which achieve this minimum. Of particular interest is the edge-isoperimetric constant of G, given by

$$\begin{aligned} i(G){:}{=}\min _{k\le |V(G)|/2} \{i_k(G)\}. \end{aligned}$$

This is also sometimes called the Cheeger constant, as it can be viewed as a discrete analogue of the Cheeger isoperimetric constant of a compact Riemannian manifold [25]. It turns out that the Cheeger constant is a fundamental graph parameter, and can be used to demonstrate deep links between the combinatorial, geometric, spectral and stochastic properties of graphs. For this reason expander graphs, roughly speaking, graphs whose Cheeger constant is bounded from below by an absolute constant, have turned out to be very important in diverse areas of discrete mathematics and computer science. We refer the reader to [47] for a comprehensive survey on expander graphs and their applications.
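To make the definitions above concrete, the following minimal Python sketch (our own illustration, not part of the paper's results) computes \(i_k(G)\) and the edge-isoperimetric constant \(i(G)\) of a small graph by brute-force enumeration of vertex subsets; it is only feasible for very small graphs.

```python
from itertools import combinations

def edge_boundary(edges, S):
    """Size of the edge-boundary: edges with exactly one endpoint in S."""
    S = set(S)
    return sum((u in S) != (v in S) for u, v in edges)

def i_k(vertices, edges, k):
    """i_k(G): minimum of |boundary(S)| / k over all S of size k."""
    return min(edge_boundary(edges, S) for S in combinations(vertices, k)) / k

def cheeger_constant(vertices, edges):
    """i(G): minimum of i_k(G) over 1 <= k <= |V(G)|/2."""
    return min(i_k(vertices, edges, k) for k in range(1, len(vertices) // 2 + 1))

# Example: the 4-cycle has edge-isoperimetric constant 1.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(cheeger_constant(V, E))  # 1.0
```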

Whilst in general it is NP-hard to determine even the edge-isoperimetric constant of an arbitrary graph [38], much is known about the isoperimetric properties of particularly well-structured graph classes. In particular, a classical result of Harper solves the isoperimetric problem on the d-dimensional (binary) hypercube \(Q^d\), whose vertex set is \(\{0,1\}^d\), and in which two vertices are adjacent if and only if their Hamming distance is one. Harper’s result implies the following isoperimetric inequality:

Theorem 1.1

([40], see also [11, 43, 57]). Let \(d \in \mathbb {N}\). For every \(k\in \left[ 2^d\right] \)

$$\begin{aligned} i_k\left( Q^d\right) \ge d-\log _2 k. \end{aligned}$$

Furthermore, the only sets which achieve equality in the above estimate are subcubes.
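As a sanity check (again our own illustration, under the encoding below), Harper's bound can be verified exhaustively for small d; for k a power of two the minimum is attained by subcubes, in line with Theorem 1.1.

```python
from itertools import combinations
from math import log2

def hypercube(d):
    """Q^d with vertices encoded as d-bit integers; edges flip a single bit."""
    V = list(range(2 ** d))
    E = [(u, u | (1 << i)) for u in V for i in range(d) if not u & (1 << i)]
    return V, E

def i_k(V, E, k):
    return min(sum((u in S) != (v in S) for u, v in E)
               for S in map(set, combinations(V, k))) / k

d = 4
V, E = hypercube(d)
for k in (1, 2, 4, 8):
    print(k, i_k(V, E, k), d - log2(k))   # for these k, i_k(Q^d) equals d - log2(k)
```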

The isoperimetric problem has also been solved, at least asymptotically, in many other classes of lattice-like graphs, such as grids [1, 19], Cartesian powers of graphs [15, 24], and Abelian Cayley graphs [8, 9, 55]. For further background, we refer the reader to the surveys [13, 14, 42] on discrete isoperimetric problems.

On the other hand, the isoperimetric properties of particularly ‘unstructured’ graphs, that is, graphs without any clear geometric structure, have also been well-studied. It is known that Erdős-Rényi (binomial) random graphs [36, 48] and random d-regular graphs [16] typically have good expansion properties, and one can view the well-known Expander Mixing Lemma, due to Alon and Chung [5], as a bound on the edge-isoperimetric constant of pseudo-random \((n,d,\lambda )\)-graphs (see also [6]). Furthermore, the isoperimetric properties of such graphs have been a key tool in the study of their structural properties.

In this paper, we consider a mixture of these two paradigms. We study properties of random subgraphs of graphs coming from a family of graphs which are quite structured—arising from high-dimensional products of bounded graphs. As in other percolation models, it turns out that the isoperimetric properties of these random subgraphs are key to understanding their large-scale structure, and that in order to understand the likely isoperimetric properties in the percolated subgraphs, it is useful first to study the isoperimetric problem in the underlying product graphs.

Given a sequence of graphs \(G^{(1)},\ldots , G^{(t)}\), the Cartesian product of \(G^{(1)},\ldots , G^{(t)}\), denoted by \(G=G^{(1)}\square \cdots \square G^{(t)}\) or \(G=\square _{j=1}^{t}G^{(j)}\), is the graph with the vertex set

$$\begin{aligned} V(G)=\left\{ v=(v_1,v_2,\ldots ,v_t) :v_j\in V(G^{(j)}) \text { for all } j \in [t]\right\} , \end{aligned}$$

and the edge set

$$\begin{aligned} E(G)=\left\{ uv :\begin{array}{l} \text {there is some } j\in [t] \text { such that } u_jv_j\in E\left( G^{(j)}\right) \\ \text { and } u_m=v_m \text { for all } m \ne j \end{array}\right\} . \end{aligned}$$

We call the graphs \(G^{(j)}\) the base graphs of G. Note that if each \(G^{(j)}\) is \(d_j\)-regular, then G is d-regular with \(d {:}{=}\sum _{j=1}^t d_j\). Many well-studied families of graphs arise in this manner. For example, the t-dimensional hypercube \(Q^t\) is the t-fold Cartesian product of a single edge. Other examples include tori, grids, Hamming graphs, and many examples of Cayley graphs of groups arising from direct products.
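For illustration, the Cartesian product can be built directly from the definition; the short sketch below (our own, with a deliberately naive construction) recovers \(Q^3\) as the three-fold product of a single edge.

```python
from itertools import product

def cartesian_product(base_graphs):
    """Cartesian product of graphs given as (vertex list, edge list) pairs."""
    V = list(product(*(vs for vs, _ in base_graphs)))
    E = []
    for u in V:
        for j, (_, edges_j) in enumerate(base_graphs):
            for a, b in edges_j:
                if u[j] == a:                       # change only the j-th coordinate
                    E.append((u, u[:j] + (b,) + u[j + 1:]))
    return V, E

# Q^3 as the 3-fold Cartesian product of a single edge K_2: 8 vertices, 12 edges.
K2 = ([0, 1], [(0, 1)])
V, E = cartesian_product([K2, K2, K2])
print(len(V), len(E))  # 8 12
```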

We will be interested in properties of random subgraphs of high-dimensional product graphs, that is, we consider bond percolation on these graphs. Percolation theory was initiated in 1957 by Broadbent and Hammersley [23] in order to model the flow of fluid through a medium with randomly blocked channels, and has become a major area of research. In (bond) percolation, given a host graph G and a probability \(p\in [0,1]\), we form the random subgraph \(G_p\) by including every edge of G independently with probability p. Percolation has been studied extensively on various geometric ‘lattice-like’ classes of graphs, and in particular on many of the families of graphs which arise naturally as high-dimensional product graphs such as high-dimensional hypercubes [3, 18], tori [44, 45], or Hamming graphs [21, 22] (see [46, Chap. 13] for a survey on many important results in these models). We refer the reader to the monographs [20, 39, 49] for a more comprehensive background on percolation theory.
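The percolation model just described is simple to simulate; the following minimal sketch (ours, not part of the paper's arguments) keeps each edge independently with probability p and reads off the components of the resulting subgraph with a union-find pass.

```python
import random

def percolate(edges, p, seed=None):
    """Bond percolation: keep each edge independently with probability p."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < p]

def components(vertices, edges):
    """Connected components of (vertices, edges) via union-find."""
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in edges:
        parent[find(u)] = find(v)
    comps = {}
    for v in vertices:
        comps.setdefault(find(v), []).append(v)
    return sorted(comps.values(), key=len, reverse=True)

# Example: percolation on a cycle of length 100 with p = 0.9; print the largest component.
n = 100
V = list(range(n))
E = [(i, (i + 1) % n) for i in range(n)]
print(len(components(V, percolate(E, 0.9, seed=0))[0]))
```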

There is an intrinsic connection between the phase transition in percolated graphs, and the isoperimetric properties of the host graph. This connection can be seen, albeit implicitly, already in the classical phase transition result of Erdős and Rényi [33]. In the case of percolated expander graphs, this connection is explicit in the work of Alon, Benjamini and Stacey [4], and in the case of percolated pseudo-random graphs in the work of Frieze, Krivelevich and Martin [37]. Ajtai, Komlós, and Szemerédi [3] proved that \(Q^d_p\) undergoes a phase transition quantitatively similar to the one which occurs in G(n,p), and their work was later extended by Bollobás, Kohayakawa, and Łuczak [18]—both of which explicitly rely on the isoperimetric properties of the hypercube.

Furthermore, above the percolation threshold the connection between the isoperimetric properties of the host graph G, the expansion properties of the percolated graph \(G_p\), and the combinatorial properties of the resulting giant component in \(G_p\) has been made explicit in several works. To mention a few, Fountoulakis and Reed [34, 35], and, independently, Benjamini, Kozma and Wormald [10] study the asymptotic mixing time of a random walk on the giant component of G(n,p) using the likely expansion properties of connected sets (and, implicitly, the isoperimetric properties of the complete graph); Riordan and Wormald [59] utilise likely expansion properties in the giant component of G(n,p) in order to bound its typical diameter; and Erde, Kang and Krivelevich [32] use the isoperimetric properties of \(Q^d\) to show typical expansion properties of the giant component of \(Q^d_p\), and derive from them the current best known bounds on its likely circumference (that is, the length of a longest cycle), typical diameter and asymptotic mixing time.

Recently, generalising the results of [3, 18] on \(Q^d\), the authors showed that any high-dimensional product graph whose base graphs are bounded in order and regular undergoes a phase transition in terms of its component structure around \(p=\frac{1}{d}\), where d is the degree of the product graph, and that this phase transition is quantitatively similar to that of G(n,p). Given a constant \(\epsilon >0\), let us define \(y{:}{=}y(\epsilon )\) to be the unique solution in (0, 1) of the equation

$$\begin{aligned} y=1-\exp \left( -(1+\epsilon )y\right) . \end{aligned}$$
(1)
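Numerically, \(y(\epsilon )\) is easy to evaluate by fixed-point iteration, since the map \(y\mapsto 1-\exp (-(1+\epsilon )y)\) is a contraction near its positive fixed point; a small sketch of our own:

```python
from math import exp

def y_of_eps(eps, tol=1e-12, max_iter=10_000):
    """Unique solution in (0,1) of y = 1 - exp(-(1+eps)*y), by fixed-point iteration."""
    y = 1.0                          # start above the fixed point and iterate down to it
    for _ in range(max_iter):
        y_next = 1 - exp(-(1 + eps) * y)
        if abs(y_next - y) < tol:
            break
        y = y_next
    return y

print(y_of_eps(0.1))   # approximately 0.176: the asymptotic fraction of vertices in the giant
```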

Theorem 1.2

(Theorem 2 in [29]). Let \(C>1\) be a constant and let \(\epsilon >0\) be sufficiently small. For all \(j\in [t]\), let \(G^{(j)}\) be a connected regular graph of degree \(d_j\) such that \(1<\big |V\left( G^{(j)}\right) \big |\le C\). Let \(G=\square _{j=1}^{t}G^{(j)}\), let \(n{:}{=}|V(G)|\) and let \(p=\frac{1+\epsilon }{d}\), where \(d{:}{=}d(G) =\sum _{j=1}^{t}d_j\) is the degree of G. Then, with high probability (whp), there exists a unique component of order \(\left( 1+o(1)\right) yn\) in \(G_p\), where \(y=y(\epsilon )\) is defined as in (1). Furthermore, whp, all the remaining components of \(G_p\) are of order \(O_{\epsilon , C}(d)\).

This can perhaps be viewed as an example of the universality of the phase transition that G(n,p) undergoes—in many percolation models various aspects of the phase transition close to the critical point seem to behave in a quantitatively similar manner, under the right rescaling, independently of the host graph (see, for example, [46]). In this case, the proportion y of the host graph G which is covered by the giant component is the same as arises in G(n,p) [33], but also in supercritical percolation in the hypercube [3, 18], pseudo-random graphs [37], and many other percolation models, see for example [17, 58].

Whilst the internal structure of the giant component in G(n,p) is reasonably well understood, in other percolation models, such as hypercube percolation, many basic questions about the structure of the giant component remain unanswered, although in light of this universality phenomenon there are natural conjectures suggested by the structure in G(n,p). Since the expansion properties of the giant component in G(n,p) have been key to understanding its likely structural properties, in order to better understand the structure of the giant component in percolated high-dimensional product graphs it is natural to ask about its expansion properties, and in order to answer this question it seems crucial to understand first the isoperimetric properties of general high-dimensional product graphs.

A well-known result of Chung and Tetali [26] (see also Tillich [60]) shows that, at least on a broad scale, the isoperimetric properties of a product graph are closely related to those of the base graphs.

Theorem 1.3

(Theorem 2 of [26]). Let \(G^{(1)},\ldots , G^{(t)}\) be graphs such that \(|V(G^{(j)})|>1\) for all \(j\in [t]\). Let \(G = \square _{j=1}^{t}G^{(j)}\). Then

$$\begin{aligned} \min _j \left\{ i\left( G^{(j)}\right) \right\} \ge i(G) \ge \frac{1}{2}\min _j \left\{ i\left( G^{(j)}\right) \right\} . \end{aligned}$$

However, on a finer scale we might expect smaller sets in a product graph to expand by a larger factor than is suggested by Theorem 1.3. Indeed, in the case of the hypercube, Theorem 1.3 gives a much weaker bound on the expansion of small sets than is implied by Theorem 1.1, where the expansion of small sets is asymptotically optimal, and this optimal expansion is critical to understanding the distribution of small percolation clusters in \(Q^d_p\). It is thus natural to ask whether similar isoperimetric results hold on a finer scale for arbitrary product graphs.

1.2 Main Results

Our first main results are two edge-isoperimetric inequalities for high-dimensional product graphs, under mild assumptions on the base graphs. The first concerns high-dimensional product graphs whose base graphs are bounded and regular.

Theorem 1

Let \(C>1\) be an integer. For all \(j \in [t]\), let \(G^{(j)}\) be a \(d_j\)-regular graph with \(1<|V(G^{(j)})|\le C\). Let \(G=\square _{j=1}^tG^{(j)}\), let \(n{:}{=}|V(G)|\) and let \(d {:}{=}\sum _{j=1}^t d_j\). Then for any \(k\in [n]\),

$$\begin{aligned} i_k(G) \ge d-(C-1)\log _2 k. \end{aligned}$$

Observe that if \(\log _2 k\ll d\), then Theorem 1 implies that \(i_k(G) \ge (1-o(1))d\) which, since G is d-regular, is asymptotically optimal. Furthermore, in the particular case of \(Q^d\) we have that \(C=2\), and this result recovers the tight bound for the hypercube (Theorem 1.1). Note, however, that when the base graphs are larger, there are \(k \in [n]\) with \((C-1)\log _2 k > d\), for which Theorem 1 gives a trivial bound for \(i_k(G)\).

The second isoperimetric inequality holds for high-dimensional product graphs whose base graphs are bounded and connected (and not necessarily regular), and gives an effective bound for larger values of k.

Theorem 2

Let \(C>1\) be an integer. For all \(j \in [t]\), let \(G^{(j)}\) be a connected graph with \(1< |V(G^{(j)})|\le C\). Let \(G=\square _{j=1}^tG^{(j)}\) and let \(n{:}{=}|V(G)|\). Then for any \(k\in [n]\),

$$\begin{aligned} i_k(G) \ge \frac{1}{C-1} \log _C \left( \frac{n}{k}\right) . \end{aligned}$$

Note that, taking \(C=2\), Theorem 2 also implies the classical edge-isoperimetric bound for the hypercube. Furthermore, in general Theorem 2 implies that \(i_k(G)=\Omega \left( \ln \left( \frac{n}{k}\right) \right) \) for all \(k\in [n]\), which recovers the asymptotic result of Tillich [60] on high-dimensional Cartesian powers of graphs, which was proved using analytic methods inspired by isoperimetric problems in Riemannian geometry. Let us also mention a related result of Lev [55] which shows that \(i_k(G)\) has the same asymptotic growth rate in any Abelian Cayley graph, where the implicit constant depends on the exponent of the underlying group. Moreover, for many different types of product graphs where the isoperimetric problem has been studied, among them Hamming graphs [14] and the d-dimensional torus graphs [24], the bound given by Theorem 2 is known to be asymptotically tight up to a multiplicative constant. In fact, as we will discuss in more detail in Sect. 7, it can be shown that Theorem 2 is asymptotically tight for any high-dimensional product graph all of whose base graphs are isomorphic.
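As an illustration of both inequalities (a brute-force sketch of our own, feasible only in tiny cases), one can compare \(i_k(G)\) with the bounds of Theorems 1 and 2 on a product of three triangles, where \(C=3\) and \(d=6\).

```python
from itertools import combinations, product
from math import log, log2

def triangle_power(t):
    """The t-fold Cartesian product of K_3; vertices are vectors over {0,1,2}."""
    V = list(product(range(3), repeat=t))
    E = [(u, u[:j] + ((u[j] + 1) % 3,) + u[j + 1:]) for u in V for j in range(t)]
    return V, E

t, C = 3, 3
d = 2 * t
V, E = triangle_power(t)
n = len(V)
for k in (2, 3, 4):
    ik = min(sum((u in S) != (v in S) for u, v in E)
             for S in map(set, combinations(V, k))) / k
    thm1 = d - (C - 1) * log2(k)          # bound of Theorem 1
    thm2 = log(n / k, C) / (C - 1)        # bound of Theorem 2
    print(k, ik, thm1, thm2)
```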

Using these new isoperimetric inequalities, we are able to derive several likely expansion properties of the giant component after percolation in a high-dimensional product graph whose base graphs are regular and of bounded order. The typical expansion properties which we present, and their consequences, not only generalise but also substantially improve upon the known typical bounds in \(Q^d_p\) given in [32]. We note that while we present the results in the supercritical regime, that is, when \(\epsilon >0\) is a small constant and \(p=\frac{1+\epsilon }{d}\), the results naturally extend (with slight adaptations in the statements) to the sparse regime, that is, when \(p=\frac{c}{d}\) for constant \(c>1\).

Given a graph G, a subset \(S\subseteq V(G)\) and \(r\in \mathbb {N}\), we denote by \(N^{r}_G(S)\) the r-th external neighbourhood of S in G, that is, the set of vertices in \(V(G)\setminus S\) which are at distance at most r from S in G. When \(r=1\), we omit the superscript.
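In code (a small sketch of our own, with the graph given as an adjacency dictionary), the r-th external neighbourhood is obtained by r rounds of breadth-first expansion.

```python
def external_neighbourhood(adj, S, r=1):
    """N^r_G(S): vertices outside S at distance at most r from S in the graph adj."""
    S = set(S)
    reached, frontier = set(S), set(S)
    for _ in range(r):
        frontier = {w for v in frontier for w in adj[v]} - reached
        reached |= frontier
    return reached - S

# Example: on the path 0-1-2-3-4, the 2nd external neighbourhood of {0} is {1, 2}.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(external_neighbourhood(path, {0}, r=2))  # {1, 2}
```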

Theorem 3

Let \(C>1\) be an integer. For all \(j\in [t]\), let \(G^{(j)}\) be a \(d_j\)-regular connected graph with \(1<|V(G^{(j)})|\le C\). Let \(G=\square _{j=1}^tG^{(j)}\), let \(n{:}{=}|V(G)|\) and let \(d {:}{=}\sum _{j=1}^t d_j\). Let \(\epsilon >0\) be a small enough constant and let \(p=\frac{1+\epsilon }{d}\). Let \(L_1\) be the largest component in \(G_p\). Then, there exists a positive constant \(c=c(\epsilon )\) such that whp,

  1. (a)

    for all \(k\le \frac{3\epsilon n}{2}\) and all subsets \(S\subseteq V(L_1)\) with \(|S|=k\),

    $$\begin{aligned} |\partial _{G_p}(S)|\ge \frac{c|S|}{d\ln d}; \end{aligned}$$
  2. (b)

    for all \(\epsilon ^2n\le k \le \frac{3\epsilon n}{2}\) and all subsets \(S\subseteq V(L_1)\) with \(|S|=k\),

    $$\begin{aligned} |N_{G_p}(S)|\ge \frac{c|S|}{d\ln d}. \end{aligned}$$

We note that, since G is d-regular, Theorem 3(a) implies a lower bound of \(\Omega \left( \frac{1}{d^2\ln d}\right) \) on the vertex-expansion of arbitrary subsets of \(L_1\). Theorem 3(b) then improves this by a factor of d for linear-sized sets.

If we make the additional assumption that our subset \(S\subseteq V(L_1)\) is connected, that is, \(G_p[S]\) is connected, then we are able to give stronger bounds on the expansion.

Theorem 4

Let \(C>1\) be an integer. For all \(j\in [t]\), let \(G^{(j)}\) be a \(d_j\)-regular connected graph with \(1<|V(G^{(j)})|\le C\). Let \(G=\square _{j=1}^tG^{(j)}\), let \(n{:}{=}|V(G)|\) and let \(d {:}{=}\sum _{j=1}^t d_j\). Let \(\epsilon >0\) be a small enough constant and let \(p=\frac{1+\epsilon }{d}\). Let \(L_1\) be the largest component in \(G_p\). Then, there exists a positive constant \(c=c(\epsilon )\) such that whp for all subsets \(S\subseteq V(L_1)\) with \(|S|=k\) and \(G_p[S]\) connected,

  1. (a)

    for all \(\frac{9\ln C \cdot d}{\epsilon ^2}\le k \le n^{\epsilon ^5}\),

    $$\begin{aligned} |N_{G_p}(S)|\ge c|S|; \end{aligned}$$
  2. (b)

    for all \(n^{\epsilon ^5}\le k\le \frac{3\epsilon n}{2}\),

    $$\begin{aligned} |\partial _{G_p}(S)|\ge \frac{c|S|\ln \left( \frac{n}{|S|}\right) }{d\ln d}. \end{aligned}$$

One interesting interpretation of Theorem 4, noting that the bound in Theorem 4(a) implies the bound in Theorem 4(b) for the same range of k, is as a sparsification of Theorem 2, and so in the particular case of the hypercube a sparsification of Harper’s theorem. In other words, recalling that we are interested in percolation with probability \(p=\Theta \left( \frac{1}{d}\right) \), broadly Theorem 4 tells us that, if we restrict ourselves to connected subsets which are not too small, then the naive isoperimetric inequality that holds in expectation in \(G_p\) by Theorem 2 for a given set, actually holds whp up to a logarithmic factor for all sets simultaneously. We note that the restriction to large connected sets here is necessary, due to the likely existence in \(G_p\) of bare paths of length \(\Theta (d)\), which can be shown by elementary arguments; such paths are connected but exhibit poor expansion, and in fact whp there is a disjoint family of such paths of large total volume.

Let us make a few clarifying remarks about Theorems 3 and 4. We note first that Theorem 4 implies Theorem 3(a). Indeed, given such a (not necessarily connected) set S, each component K of \(G_p[S]\) either has order at least \(\frac{9d \ln C}{\epsilon ^2}\), and hence by Theorem 4 whp has edge-boundary at least \(c\frac{|K|}{d\ln d}\), or has size at most \(\frac{9d \ln C}{\epsilon ^2}\) and at least one edge in its boundary, since \(L_1\) is connected, and hence has edge-boundary of order \(\Omega \left( \frac{|K|}{d}\right) \). Since the edge-boundaries for different components are disjoint, the claim follows.

We note further that the results in Theorems 3 and 4 are (almost-)optimal for a wide range of choices of k. Indeed, since \(|N_G(S)| \le |S|d\) for all subsets S, a simple first-moment calculation shows that Theorem 4(a) is optimal up to the constant factor. Moreover, Theorem 4(b) (and hence also Theorem 3(b)) are optimal up to the logarithmic factor in d. Indeed, consider the particular example of \(Q^d_p\) and let \(Q'\) be the subcube of \(Q^d\) obtained by fixing the first \(\log _2 x\) coordinates to be 0, noting that \(|V(Q')|=\frac{n}{x}{:}{=}k\), and that every vertex in \(Q'\) is adjacent to at most \(\log _2 x =\log _2\left( \frac{n}{k}\right) \) vertices in \(Q^d\setminus Q'\). Therefore, by a Chernoff-type bound, whp the edge-boundary of \(V(Q')\) (and hence its vertex-boundary) in \(Q^d_p\) has order \(O\Big (\frac{k \ln \left( \frac{n}{k}\right) }{d}\Big )\). In particular, if \(\log _2x\ll \epsilon d\), then \(Q'_p\) is supercritical and contains a connected subset S of order \(\Theta (k)\), whose edge-boundary (and hence vertex-boundary) has size at most that of \(V(Q')\), and hence is of order \(O\Big (\frac{k \ln \left( \frac{n}{k}\right) }{d}\Big ) = O\Big (\frac{ |S|\ln \left( \frac{n}{|S|}\right) }{d}\Big )\).

Finally, it is worth comparing Theorem 3 with the expansion properties of the giant component of \(Q^d_p\), as given in [32]. There, it was shown that for any set \(S\subseteq V(L_1)\), whp \(|N_{G_p}(S)|=\Omega \left( \frac{|S|}{d^5}\right) \), and for linear-sized subsets S whp \(|N_{G_p}(S)|=\Omega \left( \frac{|S|}{d^2\ln d}\right) \). In comparison, as mentioned above, it follows from Theorem 3 that whp for any set \(S\subseteq V(L_1)\), \(|\partial _{G_p}(S)|=\Omega \left( \frac{|S|}{d\ln d}\right) \) and \({|N_{G_p}(S)|=\Omega \left( \frac{|S|}{d^2\ln d}\right) }\), and for linear-sized subsets, whp \(|N_{G_p}(S)|=\Omega \left( \frac{|S|}{d\ln d}\right) \).

A particularly interesting consequence that we can derive from Theorem 3(b) is that typically \(L_1\) contains a linear-sized subgraph which is a good expander at all scales.

Theorem 5

Let \(C>1\) be an integer. For all \(j\in [t]\), let \(G^{(j)}\) be a \(d_j\)-regular connected graph with \(1<|V(G^{(j)})|\le C\). Let \(G=\square _{j=1}^tG^{(j)}\), let \(n{:}{=}|V(G)|\) and let \(d {:}{=}\sum _{j=1}^t d_j\). Let \(\epsilon >0\) be a small enough constant and let \(p=\frac{1+\epsilon }{d}\). Let \(L_1\) be the largest component in \(G_p\). Then, there exists a positive constant \(c=c(\epsilon )\) such that whp the following holds. There exists a subgraph \(H\subseteq L_1\) such that \(|V(H)|\ge \frac{3\epsilon n}{2}\), and for every \(S\subseteq V(H)\) with \(|S|\le \frac{|V(H)|}{2}\),

$$\begin{aligned} |N_{H}(S)|\ge \frac{c|S|}{d\ln d}. \end{aligned}$$

Remark 1.4

The fraction \(\frac{3}{2}\) in Theorem 5 can be replaced by any constant strictly smaller than 2. In particular, since whp \(|V(L_1)| = \left( 2\epsilon - O(\epsilon ^2)\right) n\), we can choose an H which covers almost all of the vertices of \(L_1\).

We note that, in the case of the hypercube, as shown in [32, Claim 5.2], whp every linear-sized subgraph of the giant component in a supercritical \(Q^d_p\) has edge-expansion \(O\left( \frac{1}{d}\right) \), and thus Theorem 5 is optimal up to the logarithmic factor in d. In the case of G(n,p), Benjamini, Kozma and Wormald [10], and Krivelevich [50] showed that in the supercritical regime there is typically a linear-sized subgraph H of the giant component with constant edge- and vertex-expansion (see also [28]). This result, and the accompanying structural description of the giant component in terms of this expanding subgraph given by Benjamini, Kozma and Wormald [10], can be used to determine the asymptotic order of many important structural parameters of the giant component in G(n,p). An analogous description of the structure of the giant component in a percolated high-dimensional product graph is likely to be useful for determining its finer structure.

Using Theorems 4 and 3(b), we can obtain several interesting consequences on the typical structure of \(L_1\).

Theorem 6

Let \(C>1\) be an integer. For all \(j\in [t]\), let \(G^{(j)}\) be a \(d_j\)-regular connected graph with \(1<|V(G^{(j)})|\le C\). Let \(G=\square _{j=1}^tG^{(j)}\), let \(n{:}{=}|V(G)|\) and let \(d {:}{=}\sum _{j=1}^t d_j\). Let \(\epsilon >0\) be a small enough constant, let \(p=\frac{1+\epsilon }{d}\), and let \(L_1\) be the largest component of \(G_p\). Then whp,

  1. (a)

    the diameter of \(L_1\) is \(O(d\ln ^2 d)\);

  2. (b)

    the mixing time of a lazy random walk on \(L_1\) is \(O(d^2\ln ^2d)\);

  3. (c)

    the circumference of \(L_1\) is \(\Omega \left( \frac{n}{d \ln d}\right) \).

The bounds given in Theorem 6(a), (b), and (c) are close to optimal, up to a multiplicative factor of \(\ln ^2d\) in the first two cases and of \(d \ln d\) in the last case. In the case of Theorem 6(c) this is immediate, and in the other two cases, this follows from the likely existence in \(G_p\) of a bare path of length \(\Omega (d)\). Furthermore, note that these typical bounds not only generalise but also improve substantially upon the typical bounds in \(Q^d_p\) given in [32].

The structure of the paper is as follows. In Sect. 2, we provide an outline of the proofs of our main results, stressing also the main challenges one needs to overcome, our approach towards them and the key novelties of this paper. In Sect. 3, we present and establish several lemmas that will be useful for us throughout the paper. In Sect. 4, we prove Theorems 1 and 2 (the reader who is interested in the implications of our results for the hypercube can recall Harper’s inequality: \(i_{k}(Q^d)\ge d-\log _2k\), think of our base graphs as \(K_2\), and skip Sect. 4). In Sect. 5 we prove Theorems 3, 4 and 5. In Sect. 6 we prove Theorems 6(c), (a) and (b). Finally, in Sect. 7 we mention some questions and open problems.

2 Outline of the Proofs

For the proof of Theorem 1, since we assume that the graph is regular, it suffices to bound from above the density of any set of a given size. To that end, we can use the product structure of G to decompose it into disjoint projections of lower dimension. Then, given a subset \(S\subseteq V(G)\), this decomposition of G induces a partition of S, and we can express the density of S as a function of the density inside each partition class and the density between the partition classes. Since each partition class lives in a lower dimensional projection, we can bound its density inductively. However, whilst our desired bound is subadditive, we require a stronger inequality (Corollary 4.2) to account for the cross-partition density, which we prove using a novel entropic argument (Lemma 4.1).

The proof of Theorem 2 also utilises the entropy function. More explicitly, given a subset \(S \subseteq V(G)\), we consider a uniformly chosen random vertex in S, which we can consider as a random vector in the product space V(G). It can be shown that the entropy of suitable projections of this random vector can be bounded in terms of the edge boundary of S in a fixed direction. We can then combine these individual bounds into a bound for \(\partial (S)\) in terms of |S| using Shearer’s inequality.

Moving to our results on typical expansion properties of the giant component \(L_1\), for small enough sets, we can combine our almost tight isoperimetric inequality (Theorem 1) with good bounds on the number of connected subsets of G (Lemma 3.3) to argue via a first-moment calculation that it is unlikely that any small connected subset of \(L_1\) does not expand well. This allows one to derive Theorem 4(a). In the proof of Theorem 4(a) there is a trade-off, in (9), between the enumerative bound of the number of connected sets of size k, and the probability bound that these sets have small expansion, which is related to the isoperimetric inequality. For larger sets, the strategy of Theorem 4(a) is ineffective because of the limitations of the isoperimetric inequality, leading to a weaker probability bound, and for disconnected sets the strategy is ineffective due to a weaker enumerative bound, as there are many more disconnected sets than connected sets.

Thus, our key improvements come in the proof of Theorem 4(b) and Theorem 3, and therein lie several novel techniques, embedded in two key lemmas: Lemma 5.3 and Lemma 5.7. We argue via a two-round exposure. Setting \(\delta =\delta (\epsilon )\ll \epsilon \) we define \(p_2=\frac{\delta }{d}\) and let \(p_1\) be such that \((1-p_1)(1-p_2)=1-p\), so that \(G_p\) has the same distribution as \(G_{p_1}\cup G_{p_2}\), noting that \(p_1\ge \frac{1+\epsilon -\delta }{d}\). By Theorem 1.2, we know that whp \(G_{p_1}\) already contains a giant component of linear order, which we denote by \(L_1'\). Furthermore, note that whp, \(L_1'\) will be a subgraph of \(L_1\), the giant component of \(G_p\) (in fact, typically it will cover most of the vertices of \(L_1\)). We thus informally refer to \(L_1'\) as the early giant.

Key ideas of [32], which we generalise to the setting of high-dimensional product graphs, use an isoperimetric inequality (Theorem 2) to give a strong probability bound for the event that a given subset of \(L_1'\) does not expand well after sprinkling (see Lemma 5.2 and [32, Lemma 3.4]). However, a naive enumerative bound on the number of subsets of \(L'_1\) is too weak to conclude that whp every subset of \(L_1'\) expands well, using a union bound.

An essential contribution here is then a novel double counting argument to improve this enumerative bound. Indeed, we only need to demonstrate an expansion property for the subsets of \(L_1'\) which do not already expand inside \(L_1'\), where the required expansion factor is \(o_d(1)\). In particular, for each such set S the size of its boundary B in \(L_1'\) is significantly smaller than the size of S, and so naively, enumerating over the set of possible boundaries should be more effective than enumerating over the sets themselves. Of course, there may be many sets S with the same boundary, but we will see that again the assumption that S does not expand well will allow us to give an effective bound on the number of relevant S with boundary B (see Lemma 5.3).

Naturally, the subsets we consider can contain many vertices from the residue \(L_1-L_1'\), and thus showing good expansion of subsets of the early giant \(L_1'\) in \(L_1\) does not immediately imply good expansion of arbitrary subsets of \(L_1\). Our second key contribution then lies in the analysis of the typical structure of subsets in the residue, and in particular their likely expansion into the early giant (see Lemmas 5.4 and 5.7). Having all these tools at hand, we prove Theorems 3 and 4.

The proof of Theorem 5 uses ideas from [51] to move from expansion at a fixed scale to expansion at all scales, together with our expansion result on large sets (Theorem 3(b)). Having found a large expander subgraph, one can then derive the existence of a long cycle (Theorem 6(c)) using techniques from [52]. For Theorem 6(a), we analyse the growth rate of a ball of given radius. To obtain tight results, we use the edge-expansion of connected sets given in Theorem 4, together with the fact that, typically, connected subsets of the random subgraph \(G_p\) are not dense, and thus edge-expansion is tightly connected to vertex expansion (see [54] for similar ideas). Finally, Theorem 6(b) follows from a careful analysis of the method of Fountoulakis and Reed together with our results on the expansion of connected sets (see [31, 35, 54] for somewhat similar implementations).

3 Preliminary Lemmas

We will use the following standard Chernoff-type bound on the tail probabilities of the binomial distribution (see, for example, Appendix A in [7]):

Lemma 3.1

Let \(N\in \mathbb {N}\), let \(p\in [0,1]\), and let \(X\sim Bin(N,p)\). Then for any \(b>0\),

$$\begin{aligned} \mathbb {P}\left( X\ge bNp\right) \le \left( \frac{e}{b}\right) ^{bNp}. \end{aligned}$$

We will also use the well-known Azuma–Hoeffding inequality (see, for example, Chap. 7 in [7]):

Lemma 3.2

Let \(X = (X_1,X_2,\ldots , X_m)\) be a random vector with independent entries and range \(\Lambda = \prod _{i \in [m]} \Lambda _i\), and let \(f:\Lambda \rightarrow \mathbb {R}\) be such that there exists \(D \in \mathbb {R}_+\) such that for every \(x,x' \in \Lambda \) which differ only in the jth coordinate,

$$\begin{aligned} |f(x)-f(x')|\le D. \end{aligned}$$

Then, for every \(b\ge 0\),

$$\begin{aligned} \mathbb {P}\left[ \big |f(X)-\mathbb {E}\left[ f(X)\right] \big |\ge b\right] \le 2\exp \left( -\frac{b^2}{2mD^2}\right) . \end{aligned}$$

We require the following bound on the number of k-vertex trees in a d-regular graph G, which follows immediately from Lemma 2 in [12].

Lemma 3.3

Let \(k \in \mathbb {N}\) and let \(t_k(G)\) be the number of trees on k vertices which are subgraphs of an n-vertex d-regular graph G. Then

$$\begin{aligned} t_k(G)\le n(ed)^{k-1}. \end{aligned}$$

In certain situations it will be useful to decompose a tree into connected parts of roughly equal size. In [32, Lemma 2.2] and [53, Proposition 4.5], such a result is given where the tree is decomposed into vertex-disjoint subsets, but where the gap between the sizes of the subsets grows with the maximum degree of the tree. For our purposes, we will require a similar result with tighter control over the size of the parts. To do so, we instead decompose into edge-disjoint subsets, which allows us to bound the difference in the sizes of the subsets independently of the tree.

Lemma 3.4

Let \(\ell >0\) be an integer. Let T be a tree with \(|V(T)|\ge \ell \). Then, there exist vertex sets \(A_1,\ldots , A_s\) such that:

  1. (a)

    \(V(T)=\bigcup _{1\le i \le s}A_{i}\);

  2. (b)

    \(T[A_i]\) is connected for all \(i\in [s]\);

  3. (c)

    \(|A_i\cap \left( \bigcup _{j\in ([s]\setminus \{i\})}A_{j}\right) |\le 1\); and

  4. (d)

    \(\ell \le |A_{i}|\le 3\ell \) for all \(i\in [s]\).

Proof

We prove the result by induction on \(m=|V(T)|\). If \(\ell \le m\le 3\ell \), the trivial partition \(A_1=V(T)\) satisfies the conclusion of the lemma. Suppose then that \(m>3\ell \), and that the statement holds for all trees \(T'\) where \(\ell \le |V(T')|<m\). Let us choose an arbitrary root \(w\in V(T)\) for T. For each \(v \in V(T)\), we write \(T_v\) for the subtree of T rooted at v.

Let v be a vertex of maximal distance from w such that \(|V(T_{v})|\ge \ell \). Note that by our choice of v, \(|V(T_x)|<\ell \) for every child x of v. Then, there exists a subset of the children of v, \(X_0\subseteq V(T)\), such that \(\ell -1\le \sum _{x\in X_0}|V(T_x)|\le 2\ell -2\). Set \(A_1=\{v\}\cup \bigcup _{x\in X_0}V(T_{x})\), and note that \(\ell \le |A_1| \le 2\ell -1\) and that \(T[A_1]\) is connected. Set \(T'=T\setminus \bigcup _{x\in X_0}V(T_x)\), and note that \(T'\) is connected with \(|T|>|T'|\ge |T|-(2\ell -2)\ge \ell \). We may thus apply the induction hypothesis to \(T'\), producing \(A_2,\ldots , A_s\) satisfying properties (a) through (d) with respect to \(T'\).

Consider the sets \(A_1, A_2, \ldots , A_s\) with respect to T. Properties (a), (b) and (d) are clear from the above construction. Since \(A_1 \cap V(T') = \{v\}\), v is the only vertex that can be shared by \(A_1\) and any \(A_j\) with \(j>1\), and so property (c) is satisfied as well. \(\square \)
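The decomposition above is constructive, and the recursion can be carried out directly; the following sketch (our own, with hypothetical helper names, for a tree given as an adjacency dictionary of sets) follows the proof of Lemma 3.4.

```python
def decompose_tree(adj, root, ell):
    """Split a tree (dict: vertex -> set of neighbours) into connected parts,
    pairwise overlapping in at most one vertex, each of size in [ell, 3*ell],
    following the inductive proof of Lemma 3.4. The input dict is consumed."""

    def sizes_and_order(root):
        # BFS order from the root, parents, and subtree sizes
        parent, order = {root: None}, [root]
        for v in order:
            for w in adj[v]:
                if w != parent[v]:
                    parent[w] = v
                    order.append(w)
        size = {v: 1 for v in order}
        for v in reversed(order):
            if parent[v] is not None:
                size[parent[v]] += size[v]
        return parent, order, size

    def subtree(x, parent_of_x):
        # all vertices of the subtree rooted at x (entered from parent_of_x)
        out, stack = [], [(x, parent_of_x)]
        while stack:
            y, py = stack.pop()
            out.append(y)
            stack.extend((z, y) for z in adj[y] if z != py)
        return out

    parts = []
    while len(adj) > 3 * ell:
        parent, order, size = sizes_and_order(root)
        # deepest vertex v whose subtree still contains at least ell vertices
        v = next(x for x in reversed(order) if size[x] >= ell)
        chosen, total = [], 0
        for x in adj[v]:
            if x != parent[v] and total < ell - 1:   # greedily pick children subtrees
                chosen.append(x)
                total += size[x]
        removed = [u for x in chosen for u in subtree(x, v)]
        parts.append([v] + removed)                  # a part of size between ell and 2*ell - 1
        for u in removed:                            # delete the chosen subtrees, keep v
            for w in adj.pop(u):
                if w in adj:
                    adj[w].discard(u)
    parts.append(list(adj))                          # the remaining tree, size between ell and 3*ell
    return parts

# Example: a path on 10 vertices with ell = 3.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 10} for i in range(10)}
print(decompose_tree(path, 0, 3))   # e.g. [[7, 8, 9], [0, 1, ..., 7]]
```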

The following theorem will allow us to deduce the existence of a long cycle in a graph with good vertex-expansion.

Theorem 3.5

[52, Theorem 1] Let \(a\ge 1, b\ge 2\) be integers. Let G be a graph on more than a vertices satisfying

$$\begin{aligned} |N(S)|\ge b, \quad \text {for every } S\subseteq V(G) \text { with } \frac{a}{2}\le |S|\le a. \end{aligned}$$

Then G contains a cycle of length at least \(b+1\).

Given a discrete random variable X taking values in some range \(\mathcal {X}\), the entropy of X is given by

$$\begin{aligned} H(X){:}{=}\sum _{x\in \mathcal {X}}-p(x)\log _2p(x), \end{aligned}$$

where \(p(x){:}{=}\mathbb {P}(X=x)\) and we follow the convention that \(x\log _2x=0\) for \(x=0\). Given discrete random variables \(X_1,X_2,\ldots , X_t\), the joint entropy \(H(X_1,X_2,\ldots , X_t)\) is defined to be the entropy of the random vector \((X_1,X_2,\ldots , X_t)\). We denote by

$$\begin{aligned} H\left( X_1,X_2,\ldots , X_t|X_{t+1}\right) {:}{=}H\left( X_1,X_2,\ldots , X_{t+1}\right) -H\left( X_{t+1}\right) \end{aligned}$$

the conditional entropy of \((X_1,\ldots , X_t)\) given \(X_{t+1}\).

Remark 3.6

Observe that if \(X_1\) determines \(X_2\), then by definition

$$\begin{aligned} H(X_1|X_2)=H(X_1,X_2)-H(X_2)=H(X_1)-H(X_2). \end{aligned}$$
(2)

Furthermore, if \(X_3\) determines \(X_2\), then

$$\begin{aligned} H(X_1,X_2|X_3)=H(X_1,X_2,X_3)-H(X_3)=H(X_1,X_3)-H(X_3)=H(X_1|X_3). \end{aligned}$$
(3)

Finally, given a random vector \(X=(X_1,\ldots , X_t)\) and \(I\subseteq [t]\), we denote by \(X_I\) the random vector \((X_i)_{i\in I}\).

We will require the following property of the entropy function due to Shearer (see, for example, [7, Chap. 7]):

Lemma 3.7

(Shearer’s inequality). Let \(X_1, \ldots , X_t\) be discrete random variables and let \(\mathcal {A}\) be a collection of (not necessarily distinct) subsets of [t], such that each \(i\in [t]\) is in at least m members of \(\mathcal {A}\). Then

$$\begin{aligned} H(X_1,\ldots , X_t)\le \frac{1}{m}\sum _{A\in \mathcal {A}}H(X_A). \end{aligned}$$
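These notions are easy to experiment with numerically. The sketch below (our own illustration) computes empirical entropies and checks Shearer's inequality with the cover \(\mathcal {A}=\{[t]\setminus \{i\}: i\in [t]\}\) (so \(m=t-1\)), which is exactly the cover used later in the proof of Theorem 2.

```python
import random
from collections import Counter
from math import log2

def entropy(outcomes):
    """Base-2 entropy of the empirical distribution of a list of outcomes."""
    n = len(outcomes)
    return -sum((c / n) * log2(c / n) for c in Counter(outcomes).values())

# Shearer with A = {[t] \ {i} : i in [t]}: H(X_1,...,X_t) <= (1/(t-1)) * sum_i H(X_{-i}).
t = 4
random.seed(0)
samples = [tuple(random.randint(0, 1) for _ in range(t)) for _ in range(500)]
lhs = entropy(samples)
rhs = sum(entropy([x[:i] + x[i + 1:] for x in samples]) for i in range(t)) / (t - 1)
print(lhs, rhs, lhs <= rhs + 1e-9)   # the inequality holds for any joint distribution
```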

Throughout the rest of the paper, unless explicitly mentioned otherwise, we assume that \(C>1\) and \({G=\square _{j=1}^tG^{(j)}}\) is a high-dimensional product graph, whose base graphs \(G^{(j)}\) are connected and \(d_j\)-regular with \(1<|V(G^{(j)})|\le C\). Without loss of generality we can assume that \(C{:}{=}C\left( G\right) =\max _{j\in [t]}\{|V(G^{(j)})|\}.\)

We follow the notation regarding product graphs as in [30]. Given a product graph \(G=\square _{j=1}^tG^{(j)}\), we call the \(G^{(j)}\) the base graphs of G. Given a vertex \(u = (u_1,u_2, \ldots , u_t)\) in V(G) and \(j \in [t]\), we call the vertex \(u_j\in V(G^{(j)})\) the j-th coordinate of u. Whenever confusion may arise, we will clarify whether the subscript stands for the enumeration of the vertices of the set, or for their coordinates. When \(G^{(j)}\) is a graph on a single vertex, that is, \(G^{(j)}=\left( \{u\},\varnothing \right) \), we call it trivial (and non-trivial, otherwise). We define the dimension of \(G=\square _{j=1}^tG^{(j)}\) to be the number of base graphs \(G^{(j)}\) of G which are non-trivial (we note that the dimension of G is not an invariant of G, and in fact depends on the choice of the base graphs). We note that G is also regular, and we write \(d{:}{=}\sum _{j=1}^t d_j\), which can be seen to be the degree of G, and let \(n{:}{=}|V(G)|\). Furthermore, we assume in what follows that \(\epsilon >0\) is a small enough constant, and let \(p=\frac{1+\epsilon }{d}\). We denote by \(G_p\) the graph obtained by retaining every edge of G independently with probability p.

Given a subgraph \(H\subseteq G\), we denote by d(H) the average degree of the subgraph H. Given two subsets \(A, B\subseteq V(G)\) with \(A\cap B=\varnothing \), we denote by e(A, B) the number of edges between A and B. Furthermore, given a subset \(A\subseteq V(G)\), we let \(e(A){:}{=}|E(G[A])|\). Finally, given a vertex \(v\in V(G)\) and a subset \(A\subseteq V(G)\), we denote by \(d_A(v)\) the number of neighbours of v in A.

We close this section with two lemmas about the structure of percolated product graphs. The first one is about large matchings in a random edge-subset, and is a fairly straightforward generalisation of Lemma 2.9 in [32].

Lemma 3.8

Let G be a d-regular graph. Let \(c_1>0\) and \(0<\delta <1\) be constants. Let \(s\ge c_1d\). Let \(F\subseteq E(G)\) be such that \(|F|\ge s\) and let \(q=\frac{\delta }{d}\). Then, there exists a constant \(c_2=c_2(c_1,\delta )\) such that \(F_{q}\), a random subset of F obtained by retaining each edge independently with probability q, contains a matching of size at least \(\frac{c_2s}{d}\) with probability at least \(1-\exp \left( -\frac{c_2s}{d}\right) \).

Proof

We may assume \(|F|=s\). If the matching number of \(F_q\) is less than \(\frac{c_2s}{d}\), then \(F_q\) contains a maximal (by inclusion) matching of size \(\ell < \frac{c_2s}{d}\). Let us then consider the number of maximal matchings in \(F_q\) of size \(\ell <\frac{c_2s}{d}\).

There are at most \(\left( {\begin{array}{c}|F|\\ \ell \end{array}}\right) =\left( {\begin{array}{c}s\\ \ell \end{array}}\right) \) matchings of size \(\ell \) in F. Given a fixed matching M of size \(\ell \) in F, in order for it to be a maximal matching in \(F_q\), its edges have to be retained, which happens with probability \(q^\ell \), and no edge of F which is disjoint from M may appear in \(F_q\). Since G is d-regular, there are at most \(2\ell d\) edges which share a vertex with edges in M. Hence, there is a set of at least \(|F|-2\ell d\) edges of F, none of which appears in \(F_q\), which happens with probability at most

$$\begin{aligned} (1-q)^{|F|-2\ell d}\le \exp \left( -\frac{\delta s(1-2c_2)}{d}\right) . \end{aligned}$$

Therefore, by the union bound, the probability that \(F_q\) contains a maximal matching of size \(\ell <\frac{c_2s}{d}\) is at most

$$\begin{aligned} \sum _{\ell =0}^{\frac{c_2s}{d}}\left( {\begin{array}{c}s\\ \ell \end{array}}\right) \left( \frac{\delta }{d}\right) ^\ell \exp \left( -\frac{\delta s(1-2c_2)}{d}\right)&\le \exp \left( -\frac{\delta s(1-2c_2)}{d}\right) \left( 1+\sum _{\ell =1}^{\frac{c_2s}{d}}\left( \frac{e\delta s}{d\ell }\right) ^{\ell }\right) . \end{aligned}$$

Since \(s\ge c_1d\), for \(c_2=c_2(c_1,\delta )\) small enough in terms of \(c_1\) and \(\delta \) the ratio of consecutive terms \(\left( \frac{e\delta s}{d\ell }\right) ^{\ell }\) is at least 2, and hence the sum is dominated by the final term. Therefore,

$$\begin{aligned} \exp \left( -\frac{\delta s(1-2c_2)}{d}\right) \left( 1+\sum _{\ell =1}^{\frac{c_2s}{d}}\left( \frac{e\delta s}{d\ell }\right) ^{\ell }\right)&\le 3\exp \left( -\frac{\delta s(1-2c_2)}{d}\right) \left( \frac{e\delta }{c_2}\right) ^{\frac{c_2s}{d}}\\&\le \exp \left( -\frac{c_2s}{d}\right) , \end{aligned}$$

for small enough \(c_2\). \(\square \)

The second result bounds the typical number of high-degree vertices in \(G_p\).

Lemma 3.9

Whp, there are at most \(\frac{n}{d^4}\) vertices of degree at least \(\ln d\) in \(G_p\).

Proof

Fix a vertex \(v\in V(G)\). The degree of v in \(G_p\) is distributed according to Bin(d, p). Thus, by Lemma 3.1,

$$\begin{aligned} \mathbb {P}\left( d_{G_p}(v)\ge \ln d\right) \le \left( \frac{e(1+\epsilon )}{\ln d}\right) ^{\ln d}\le d^{-\frac{\ln \ln d}{2}}. \end{aligned}$$

Hence, the expected number of vertices in \(G_p\) with degree at least \(\ln d\) is at most \(nd^{-\frac{\ln \ln d}{2}}\). Therefore, by Markov’s inequality, whp there are at most \(\frac{n}{d^4}\) vertices of degree at least \(\ln d\) in \(G_p\). \(\square \)

4 Isoperimetric Inequalities

The proofs of Theorems 1 and 2 will both use discrete entropy as a tool, but in quite different ways. For the proof of Theorem 1, we require the following lemma bounding the entropy of a random variable from below.

Lemma 4.1

Let \(C\ge 2\) be an integer and let X be a random variable supported on [C]. For each \(i\in [C]\), let \(p(i){:}{=}\mathbb {P}(X=i)\). Assume without loss of generality that \(p(1) \le p(2) \le \ldots \le p(C)\). Then

$$\begin{aligned} \frac{C}{C-1}\left( 1 - p(C)\right) \le H(X). \end{aligned}$$

Proof

We prove the result by induction on C. For \(C=2\) we note that \(0 \le p(1) \le p(2)\) and \(p(1) + p(2) =1\), and so in particular \(p(1)p(2) \le \frac{1}{4}\). It follows that

$$\begin{aligned} H(X)&= p(1) \log _2 \frac{1}{p(1)} + p(2)\log _2 \frac{1}{p(2)} \ge p(1) \left( \log _2 \frac{1}{p(1)} + \log _2 \frac{1}{p(2)}\right) \\&= p(1) \left( \log _2 \frac{1}{p(1)p(2)}\right) \ge p(1) \log _2 4 \ge 2p(1) = 2\left( 1-p(2)\right) . \end{aligned}$$

Suppose that \(C >2\). Let Y be the indicator random variable of the event that \(X=C\). Note that because X determines Y, by Remark 3.6,

$$\begin{aligned} H(X)=H(X,Y)=H(Y)+H(X|Y). \end{aligned}$$

Let \(q(1){:}{=}\mathbb {P}(Y=1) = p(C)\) and \(q(0){:}{=}\mathbb {P}(Y=0) = \sum _{i=1}^{C-1} p(i)\).

If \(q(1) \ge q(0)\), then by the induction hypothesis applied to Y we can conclude that

$$\begin{aligned} H(X) \ge H(Y) \ge \frac{C}{C-1}\left( 1 - q(1)\right) = \frac{C}{C-1}\left( 1-p(C)\right) , \end{aligned}$$

as claimed.

Otherwise, again by the induction hypothesis applied to Y, we have that

$$\begin{aligned} H(Y)\ge \frac{C}{C-1}\left( 1-q(0)\right) . \end{aligned}$$

Thus, we obtain that

$$\begin{aligned} H(X)&= H(Y) + H(X|Y) \\&\ge \frac{C}{C-1}(1-q(0)) + \mathbb {P}(Y=1)H(X|Y=1) + \mathbb {P}(Y=0)H(X|Y=0). \end{aligned}$$

However, on the event \(\{Y=1\}\) we have \(X=C\), and so the second term is 0, and by the induction hypothesis applied to the random variable X conditional on \(\{Y=0\}\), which is supported on \([C-1]\), we can conclude that

$$\begin{aligned} H(X|Y=0) \ge \frac{C-1}{C-2}\left( 1 - \frac{p(C-1)}{q(0)}\right) . \end{aligned}$$

It follows that

$$\begin{aligned} H(X)&\ge \frac{C}{C-1}\left( 1-q(0)\right) + \frac{C-1}{C-2}q(0)\left( 1 - \frac{p(C-1)}{q(0)}\right) \\&=\frac{C}{C-1}-\frac{C}{C-1}q(0) +\frac{C-1}{C-2}q(0)-\frac{C-1}{C-2}p(C-1)\\&\ge \frac{C}{C-1}\left( 1-p(C)\right) . \end{aligned}$$

where the last inequality follows since \(p(C-1)\le p(C)\) and \(p(C-1)\le q(0)\). \(\square \)

An immediate corollary of Lemma 4.1 is the following inequality which is key to the proof of Theorem 1.

Corollary 4.2

Let \(C\ge 2\) be an integer and let \(0\le k_1\le \cdots \le k_C\) and \(k = \sum _{i=1}^C k_i\). Then

$$\begin{aligned} \frac{C}{C-1}\left( k - k_C\right) + \sum _{i=1}^C k_i \log _2 k_i\le k \log _2 k. \end{aligned}$$

Proof

Let X be a random variable supported on [C] with \(p(i) =\mathbb {P}(X=i)= \frac{k_i}{k}\) for each \(i\in [C]\). Then, by the previous lemma,

$$\begin{aligned} \frac{C}{C-1}\left( 1 - \frac{k_C}{k}\right) \le H(X)&= \sum _{i=1}^C \frac{k_i}{k} \log _2 \frac{k}{k_i} \\&= \sum _{i=1}^C \frac{k_i}{k} \log _2 k - \sum _{i=1}^C \frac{k_i}{k} \log _2 k_i\\&= \log _2 k - \sum _{i=1}^C \frac{k_i}{k} \log _2 k_i, \end{aligned}$$

which rearranges to give the claimed inequality. \(\square \)

As will be seen in the proof of Theorem 1, the inequality proven in Corollary 4.2 allows us to inductively bound the density of certain sets by considering an appropriate collection of projections. Using the regularity of the graph we can relate this density bound to an isoperimetric inequality.

Proof of Theorem 1

Let \(S\subseteq V(G)\) be a set of size \(k{:}{=}|S|\); since the bound is trivial for \(k=1\), we may assume that \(k\ge 2\). We claim that

$$\begin{aligned} \sum _{v\in S}d_{G[S]}(v)\le (C-1)k\log _2k. \end{aligned}$$
(4)

Then, assuming that (4) holds, since G is d-regular we obtain that

$$\begin{aligned} |\partial S|=|S|\left( d-d(G[S])\right) \ge k(d-(C-1)\log _2k), \end{aligned}$$

as required.

We prove (4) by induction on the dimension t of the product graph G. For \(t=1\), since \(2\le k \le C\), we indeed have that

$$\begin{aligned} \sum _{v\in S}d_{G[S]}(v)\le k(k-1)\le (C-1)k\log _2k. \end{aligned}$$

Assume that (4) holds for all graphs of dimension \(t'<t\). We may assume that, without loss of generality, \(V(G^{(1)})=\{v_1,\ldots ,v_C\}\). Let \(H_1,\ldots , H_C\) be pairwise disjoint projections of G, such that \(H_i\) is obtained by fixing the first coordinate of G to be \(v_i\in V(G^{(1)})\). Let \(S_i=S\cap V(H_i)\) and set \(k_i{:}{=}|S_i|\). Note that we have \(\sum _{i=1}^Ck_i=k\), and we may assume without loss of generality that \(k_1\le k_2 \le \ldots \le k_C\). Since each \(H_i\) has dimension \(t-1\), by the induction hypothesis, for all \(1\le i \le C\),

$$\begin{aligned} \sum _{v\in S_i}d_{G[S_i]}(v)=\sum _{v\in S_i}d_{H_i[S_i]}(v)\le (C-1)k_i\log _2k_i. \end{aligned}$$

Furthermore, observe that each vertex in \(H_i\) has at most one neighbour in each \(H_j\) for \(j\ne i\). In particular, since \(k_1\le k_2\le \ldots \le k_C\), it follows that \(e(S_i, S_j)\le k_i\) whenever \(i\le j\). Thus,

$$\begin{aligned} \sum _{v\in S}d_{G[S]}(v)&=\sum _{i=1}^C\left( \sum _{v\in S_i}d_{G[S_i]}(v)+\sum _{j\ne i}e(S_i,S_j)\right) \\&\le \sum _{i=1}^C\left( (C-1)k_i\log _2k_i+(C-i)k_i+\sum _{j< i}k_j\right) \\&\le \sum _{i=1}^C\Bigg ((C-1)k_i\log _2k_i+(C-i)k_i+(i-1)k_{i-1}\Bigg )\\&\le C\left( k-k_C\right) +(C-1)\sum _{i=1}^Ck_i\log _2k_i. \end{aligned}$$

Therefore, we have by the above and by Corollary 4.2 that

$$\begin{aligned} \sum _{v\in S}d_{G[S]}(v)\le (C-1)\left( \frac{C}{C-1}\left( k-k_C\right) +\sum _{i=1}^Ck_i\log _2k_i\right) \le (C-1)k\log _2k, \end{aligned}$$

as claimed. \(\square \)

The proof of Theorem 2 will also utilise the entropy function, specifically Shearer’s Lemma (Lemma 3.7) in a key way.

Proof of Theorem 2

Given \(S\subseteq V(G)\), let X be a uniformly distributed random variable on S, so that \({H(X)=\log _2|S|}\). Observe that we may consider X as a random vector \(X=(X_1,\ldots , X_t)\), where the random variables \(X_i\) are given by the coordinates of the vertex \(X\in V(G^{(1)})\times \cdots \times V(G^{(t)})\). For each \(i\in [t]\) let \(A_{-i}{:}{=}[t]\setminus \{i\}\) and let us set

$$\begin{aligned} X_{-i}{:}{=}X_{A_{-i}} = (X_1,\ldots , X_{i-1},X_{i+1},\ldots , X_t). \end{aligned}$$

Note that each \(i\in [t]\) appears in exactly \(t-1\) members of the family \(\mathcal {A}=\left\{ A_{-i}:i\in [t]\right\} \).

Thus, by Lemma 3.7,

$$\begin{aligned} H(X)\le \frac{1}{t-1}\sum _{i=1}^tH(X_{-i}). \end{aligned}$$
(5)

Therefore, observing that X determines \(X_{-i}\) and \(X_i\), we have by the above and by Remark 3.6 that

$$\begin{aligned} H(X)&{\mathop {\ge }\limits ^{(5)}} \sum _{i=1}^t\left( H(X)-H(X_{-i})\right) {\mathop {=}\limits ^{(2)}}\sum _{i=1}^tH(X|X_{-i})= \sum _{i=1}^tH(X_i,X_{-i}|X_{-i})\nonumber \\&{\mathop {=}\limits ^{(3)}}\sum _{i=1}^tH(X_i|X_{-i}). \end{aligned}$$
(6)

By definition,

$$\begin{aligned} H(X_i|X_{-i})=\sum _{x_{-i}}\mathbb {P}(X_{-i}=x_{-i})H(X_i|X_{-i}=x_{-i}) {=}{:}\sum _{x_{-i}} w(x_{-i}), \end{aligned}$$
(7)

where the sum ranges over the vectors \(x_{-i}\) in the range of \(X_{-i}\).

Given such a point \(x_{-i}\), there are \(1\le r(x_{-i})\le C_i{:}{=}|V(G^{(i)})|\) vertices in S whose projection is \(x_{-i}\), where \(\mathbb {P}(X_{-i}=x_{-i})=\frac{r(x_{-i})}{|S|}\). Then, since X is uniformly distributed on S,

$$\begin{aligned} H(X_i|X_{-i}=x_{-i})=\log _2r(x_{-i}). \end{aligned}$$

It follows that for each \(x_{-i}\),

$$\begin{aligned} w(x_{-i}) = \frac{r(x_{-i})\log _2r(x_{-i})}{|S|} \le \frac{r(x_{-i})\log _2C_i}{|S|} {=}{:}w'(x_{-i}), \end{aligned}$$

with equality if and only if \(r(x_{-i})=C_i\).

However, since each \(G^{(i)}\) is connected, for each \(x_{-i}\) in the range of \(X_{-i}\) where \(r(x_{-i})<C_i\) there is at least one edge in the edge-boundary of S in direction i. In particular, there are at most \(|\partial _i(S)|\) many vectors \(x_{-i}\) such that \(r(x_{-i})<C_i\), where \(\partial _i(S)\) denotes the edges in the edge-boundary of S in the ith direction, that is, that are obtained by changing the ith coordinate of some \(v\in S\). Furthermore, for each \(x_{-i}\) with \(r(x_{-i})<C_i\),

$$\begin{aligned} w'(x_{-i}) - w(x_{-i}) \le w'(x_{-i}) \le \frac{(C-1)\log _2C}{|S|}. \end{aligned}$$

Thus, by (7)

$$\begin{aligned} H(X_i|X_{-i})&=\sum _{x_{-i}} w(x_{-i}) \nonumber \\&= \sum _{x_{-i}} w'(x_{-i}) + \sum _{\begin{array}{c} x_{-i} \\ r(x_{-i}) < C_i \end{array}} \big ( w(x_{-i}) - w'(x_{-i})\big ) \nonumber \\&\ge \log _2C_i-|\partial _i(S)|\frac{(C-1)\log _2C}{|S|}. \end{aligned}$$
(8)

Therefore, by (6) and (8),

$$\begin{aligned} \log _2|S|=H(X)&\ge \sum _{i=1}^tH(X_i|X_{-i})\\&\ge \sum _{i=1}^t\left( \log _2C_i-|\partial _i(S)|\frac{(C-1)\log _2C}{|S|}\right) \\&\ge \log _2|V(G)|-|\partial (S)|\frac{(C-1)\log _2C}{|S|}. \end{aligned}$$

Rearranging, we obtain

$$\begin{aligned} \frac{|\partial (S)|}{|S|}\ge \frac{\log _2|V(G)|-\log _2|S|}{(C-1)\log _2C}=\frac{1}{C-1}\log _C\left( \frac{|V(G)|}{|S|}\right) , \end{aligned}$$

as claimed. \(\square \)

5 Expansion and Expanders

We begin with the proof of the first part of Theorem 4. We note that the proof includes several elements similar to the proof of Lemma 3.8.

Proof of Theorem 4(a)

We will assume that \(c \le \epsilon ^4\). Given \(\frac{7Cd}{\epsilon ^2}\le k \le n^{\epsilon ^5}\), let \(\mathcal {A}_k\) be the event that there exists a set \(S \subseteq V(L_1)\) of order k such that S is connected in \(G_p\) and \(|N_{G_p}(S)| < c |S|\). Since S is connected in \(G_p\) it contains a spanning tree. Therefore, if \(\mathcal {A}_k\) occurs, then there is some tree T whose vertex set is S, all of whose edges are in \(G_p\). By Lemma 3.3, there are at most \(n(ed)^{k-1}\) ways to choose the tree T, and the edges of T are present in \(G_p\) with probability \(p^{k-1}\).

Now, consider the auxiliary random bipartite graph \(\Gamma (S,p)\), with one side S and the other side \(N_G(S)\), where we retain every edge of G between S and \(N_G(S)\) in \(\Gamma (S,p)\) independently with probability p. We then have that \(|N_{G_p}(S)|\ge \nu \left( \Gamma (S,p)\right) \), where \(\nu (H)\) is the matching number of H. Thus, it suffices to bound the probability that a maximum matching in \(\Gamma (S,p)\) is smaller than \(\epsilon ^4k\), that is,

$$\begin{aligned} \mathbb {P}\left( \mathcal {A}_k\right)&\le \sum _{\begin{array}{c} S \subseteq V(G), |S|=k\\ T \text { a tree} ,V(T)=S \end{array}}\mathbb {P}\left( \left( E(T)\subseteq E(G_p)\right) \wedge \left( \nu \left( \Gamma (S,p)\right) \le \epsilon ^4k\right) \right) \nonumber \\&\le n(edp)^{k-1}\mathbb {P}\left( \nu \left( \Gamma (S,p)\right) \le \epsilon ^4k\right) . \end{aligned}$$
(9)

Let us first bound the probability that \(\nu \left( \Gamma (S,p)\right) =i\). This is, at most, the probability that \(\Gamma (S,p)\) has an inclusion-maximal matching of size i. We have at most \(\left( {\begin{array}{c}kd\\ i\end{array}}\right) \) ways to choose a matching M of size i, and we then need to include the edges of the matching, which occurs with probability \(p^i\). Due to the maximality of M, every edge of G between S and \(N_G(S)\) disjoint from M is not in \(\Gamma (S,p)\). Thus, we have at least \(|\partial (S)|-2id\) edges that do not fall into \(\Gamma (S,p)\). Since \(n \le C^d\), by Theorem 1

$$\begin{aligned} |\partial (S)| \ge k\left( d-(C-1)\log _2k\right) \ge k\left( d-(C-1)\cdot \log _2C\cdot \epsilon ^5d\right) \ge (1-\epsilon ^4)kd. \end{aligned}$$

Hence, by the union bound,

$$\begin{aligned} \mathbb {P}\left( \nu \left( \Gamma (S,p)\right) =i\right) \le \left( {\begin{array}{c}kd\\ i\end{array}}\right) p^i(1-p)^{(1-\epsilon ^4)kd-2id}. \end{aligned}$$

All in all, we obtain that

$$\begin{aligned} \mathbb {P}\left( \mathcal {A}_k\right)&\le n(ed)^{k-1}p^{k-1}\sum _{i=0}^{\epsilon ^4k}\left( {\begin{array}{c}kd\\ i\end{array}}\right) p^i(1-p)^{(1-\epsilon ^4)kd-2id}\\&=n(edp)^{k-1}(1-p)^{(1-\epsilon ^4)kd}\sum _{i=0}^{\epsilon ^4k}\left( {\begin{array}{c}kd\\ i\end{array}}\right) p^i(1-p)^{-2id}\\&\le n\left( (1+\epsilon )\exp \left( 1-(1+\epsilon )(1-\epsilon ^4)\right) \right) ^k\\ {}&\quad \times \left( 1+\sum _{i=1}^{\epsilon ^4k}\left( \frac{k(1+\epsilon )e}{i}\right) ^i\exp \left( 2(1+\epsilon )i\right) \right) \\&\le n\left( (1+\epsilon )\exp \left( -\epsilon +2\epsilon ^4\right) \right) ^k\left( 1+\sum _{i=1}^{\epsilon ^4k}\left( \frac{e^4k}{i}\right) ^i\right) . \end{aligned}$$

Observe that the ratio of consecutive terms of \(\left( \frac{e^4k}{i}\right) ^i\) is at least 2, and hence the sum is dominated by the last term. That is,

$$\begin{aligned} \mathbb {P}\left( \mathcal {A}_k\right)&\le 2n\left( (1+\epsilon )\exp \left( -\epsilon +2\epsilon ^4\right) \right) ^k\left( \frac{e^4}{\epsilon ^4}\right) ^{\epsilon ^4k}\\&\le 2n\left( (1+\epsilon )\exp \left( -\epsilon +\epsilon ^3\right) \right) ^k. \end{aligned}$$

Using \(1+x\le \exp \left( x-\frac{x^2}{3}\right) \) for small enough \(x>0\), together with \(\ln n \le \ln C \cdot d\) (since \(n\le C^t \le C^d\)) and our assumption that \(k\ge \frac{9\ln C \cdot d}{\epsilon ^2}\), we obtain that

$$\begin{aligned} \mathbb {P}\left( \mathcal {A}_k\right) \le 3n\exp \left( -\frac{\epsilon ^2k}{4}\right) =o(1/n). \end{aligned}$$

Taking a union bound over the less than n different values of k completes the proof. \(\square \)

Throughout the rest of the section, we assume that \(\epsilon >0\) is a small enough constant and let \(\delta =\delta (\epsilon )\le \epsilon ^3\) be a positive constant. We define \(p_2=\frac{\delta }{d}\) and let \(p_1\) be such that \((1-p_1)(1-p_2)=1-p\). We form \(G_{p_i}\), \(i\in \{1,2\}\), by including every edge of G independently and with probability \(p_i\). We set \(G_1=G_{p_1}\) and \(G_2=G_{p_2}\cup G_1\), so that \(G_2\) has the same distribution as \(G_p\). We note that by Theorem 1.2, whp \(G_{p_1}\) has a unique giant component, which we denote by \(L_1'\), and that whp \(G_{p}\) has a unique giant component which we denote by \(L_1\), where \(L_1'\subseteq L_1\).

5.1 Expansion of Subsets of the Early Giant

We begin by showing likely expansion properties of subsets of the early giant. We will require the following density result.

Lemma 5.1

(Lemma 4.7 in [29], rephrased). There exists a constant \(c=c(\epsilon )>0\) such that whp every \(v\in V(G)\) is at distance (in G) at most two from at least \(cd^2\) vertices in \(L_1'\).

The following lemma, which uses Lemma 5.1 together with an edge-isoperimetric inequality for G (Theorem 2) and a result on large matchings in a random edge-subset of G (Lemma 3.8), gives a good bound on the probability that subsets of the early giant expand well after sprinkling.

Lemma 5.2

There exists a constant \(c=c(\delta )>0\) such that the following holds. Let \(A\cup B=V(L_1')\) be a partition of \(V(L_1')\) with \(\min \left\{ |A|,|B|\right\} =k\). Then, with probability at least

$$\begin{aligned} 1-\exp \left( -\frac{ck\ln \left( \frac{n}{k}\right) }{d}\right) , \end{aligned}$$

there exists a family of at least \(\frac{ck\ln \left( \frac{n}{k}\right) }{d}\) vertex-disjoint \(A-B\) paths of length at most five in \(G_{p_2}\).

Proof

By Lemma 5.1, there exists a constant \(c'>0\) such that whp every \(v\in V(G)\) is at distance (in G) at most two from at least \(c'd^2\) vertices in \(L_1'\), and we work on this event for the rest of the proof.

Throughout the proof, we will introduce constants \(c_1\) up to \(c_8\), under the assumption that each \(c_i\) is sufficiently small in terms of \(\delta \) and all \(c_j\) with \(j <i\).

By assumption, every \(v\in V\) is at distance at most two from at least \(c'd^2\) vertices in \(L_1'\). Let us now define four sets inductively:

$$\begin{aligned} A_1&{:}{=}\left\{ v\in V\setminus (B\cup A):d_A(v)\ge \frac{c'd}{10}\right\} , \\ B_1&{:}{=}\left\{ v\in V\setminus (B\cup A\cup A_1):d_B(v)\ge \frac{c'd}{10}\right\} ,\\ A_2&{:}{=}\left\{ v\in V\setminus (B\cup A\cup A_1 \cup B_1):d_{A_1}(v)\ge \frac{c'd}{10}\right\} ,\\ B_2&{:}{=}\left\{ v\in V\setminus (B\cup A\cup A_1 \cup B_1 \cup A_2):d_{B_1}(v)\ge \frac{c'd}{10}\right\} . \end{aligned}$$
Fig. 1

An illustration of the sets and matchings in Lemma 5.2. The matchings \(M_1\) through \(M_5\), in purple, are ordered according to the order they are constructed in the proof. In dark blue, one can see the properties of vertices in \(A_2, B_2, A_1\) and \(B_1\), with respect to their set of neighbours in \(A_1, B_1, A\) and B, respectively. Observe that if the first matching \(M_1\) had many endpoints in \(A'\setminus A_2\) (or \(B'\setminus B_2\)), we could continue in the same manner with fewer matchings required

Let us set \(A'=A\cup A_1\cup A_2\), and \(B'=B\cup B_1\cup B_2\). Observe that \(V=A'\sqcup B'\). Indeed, it is clear by the definition of the sets that \(A'\cap B'=\varnothing \). Suppose towards contradiction that \(v\notin A'\cup B'\), and let us consider the number of vertices in \(L_1'\) that are the endpoints of paths of length at most two starting from v. There are at most d vertices in \(L_1'\) that are neighbours of v. As for paths of length exactly two, they are of the form vux. If \(u\in A_1\), then since \(v\notin A'\), and in particular \(v\notin A_2\), we have at most \(\frac{c'd^2}{10}\) possible choices for the path. Similarly, if \(u\in B_1\), then since \(v\notin B'\), and in particular \(v\notin B_2\), we have at most \(\frac{c'd^2}{10}\) possible choices for the path. Finally, if \(u\notin A_1\cup B_1\), since \(v\notin A_1\cup B_1\), we have at most \(\frac{c'd^2}{5}\) possible choices for the path. Altogether, we have at most \(\frac{2c'd^2}{5}+d<c'd^2\) vertices in \(L_1'\) that are at distance at most two from v—a contradiction.

Since \(A'\sqcup B'=V\), by Theorem 2,

$$\begin{aligned} e(A',B')\ge \frac{k}{C-1} \log _C \left( \frac{n}{k}\right) \ge c_1k\ln \left( \frac{n}{k}\right) {=}{:}s. \end{aligned}$$
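Here, the second inequality holds for any choice of \(c_1\le \frac{1}{(C-1)\ln C}\), since \(\log _C x=\frac{\ln x}{\ln C}\), so that

$$\begin{aligned} \frac{k}{C-1}\log _C \left( \frac{n}{k}\right) =\frac{k\ln \left( \frac{n}{k}\right) }{(C-1)\ln C}\ge c_1k\ln \left( \frac{n}{k}\right) . \end{aligned}$$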

By Lemma 3.8, with probability at least \(1-\exp \left( -\frac{c_2s}{d}\right) \), there exists a matching of size at least \(\frac{c_2s}{d}\) between \(A'\) and \(B'\) in \(G_{p_2}\). We continue under the assumption that at least \(\frac{c_2s}{3d}\) of the edges in the matching have endpoints in both \(A_2\) and \(B_2\), as the other cases follow more easily, with fewer matching edges required (see Figure 1 for an illustration). Let us denote these endpoints of the matching by \(\tilde{A}_2\) and \(\tilde{B}_2\), respectively.

Now, every \(v\in A_2\), and in particular in \(\tilde{A}_2\), has at least \(\frac{c'd}{10}\) neighbours in \(A_1\). Hence, with probability at least \(1-\exp \left( -\frac{c_2s}{d}\right) \) we have a set of at least \(\frac{c_2s}{3d}\cdot \frac{c'd}{10}=c_3s\) edges between \(\tilde{A}_2\) and \(A_1\). Thus, by Lemma 3.8, with probability at least \(1-\exp \left( -\frac{c_4s}{d}\right) \) there exists a matching of size at least \(\frac{c_4s}{d}\) between \(\tilde{A}_2\) and \(A_1\). Abusing notation slightly, denote by \(\tilde{A}_2\) and \(\tilde{A}_1\) the sets of endpoints of this matching in \(\tilde{A}_2\) and \(A_1\), respectively. Since every \(v\in A_1\), and in particular in \(\tilde{A}_1\), has at least \(\frac{c'd}{10}\) neighbours in A, with probability at least \(1-\exp \left( -\frac{c_4s}{d}\right) \) there are at least \(\frac{c_4s}{d}\cdot \frac{c'd}{10}=c_5s\) edges between \(\tilde{A}_1\) and A. Once again, by Lemma 3.8, with probability at least \(1-\exp \left( -\frac{c_6s}{d}\right) \), there exists a matching of size at least \(\frac{c_6s}{d}\) between \(\tilde{A}_1\) and A. Denote the endpoints of this matching in A by \(\tilde{A}\). Altogether, we obtain with probability at least \(1-\exp \left( -\frac{c_{7}s}{d}\right) \) a family of at least \(\frac{c_7s}{d}\) vertex-disjoint paths of length three between \(\tilde{B}_2\subseteq B_2\) and \(\tilde{A}\subseteq A\).

Working similarly in \(B'\), we define \(\tilde{B}_1\subseteq B_1\) and \(\tilde{B}\subseteq B\), and find with probability at least \(1-\exp \left( -\frac{c_8s}{d}\right) \) a family of at least \(\frac{c_8s}{d}\) vertex-disjoint paths of length at most five, starting from \(\tilde{A}\subseteq A\), going through \(\tilde{A}_1\subseteq A_1\), \(\tilde{A}_2\subseteq A_2\), \(\tilde{B}_2\subseteq B_2\), and \(\tilde{B}_1\subseteq B_1\) to \(\tilde{B}\subseteq B\) (see Figure 1 for an illustration). Choosing \(c\le c_8\) completes the proof. \(\square \)

The following lemma is then key to the proof of Theorem 3. We effectively bound the number of subsets of \(L'_1\) which do not expand well, using a novel double-counting argument which enumerates them in terms of their boundaries, which by assumption are significantly smaller than the sets themselves. This allows us to apply the probability bound from Lemma 5.2 to conclude that whp all subsets of \(L'_1\) expand relatively well after sprinkling.

Lemma 5.3

There exists a constant \(c=c(\delta )>0\) such that whp for any \(S\subseteq V(L_1')\) the following hold.

  1. (a)

    If \(\frac{n}{d}\le |S|\le \frac{3\epsilon n}{2}\), then either

    $$\begin{aligned} |N_{L_1'}(S)|\ge \frac{c|S|}{d\ln d}, \end{aligned}$$

    or there exists a family of at least \(\frac{c|S|}{d}\) vertex disjoint paths of length at most five between S and \(V(L_1')\setminus S\) in \(G_{p_2}\);

  2. (b)

    If \(|S|=\omega (d)\) and \(|S|\le \frac{3\epsilon n}{2}\), then either

    $$\begin{aligned} |\partial _{L_1'}(S)|\ge \frac{c|S|\ln \left( \frac{n}{|S|}\right) }{d\ln d}, \end{aligned}$$

    or there exists a family of at least \(\frac{c|S|}{d}\) vertex disjoint paths of length at most five between S and \(V(L_1')\setminus S\) in \(G_{p_2}\).

We note that the assumption that \(|S|=\omega (d)\) in Lemma 5.3(b) is not optimal; however, it suffices for our purposes and allows for a simpler proof.

Proof

We argue via two-round exposure, beginning by exposing \(G_{p_1}\). By Lemma 5.1, whp every \(v\in V(G)\) is at distance at most two from at least \(c'd^2\) vertices in \(L_1'\), for some \(c'=c'(\epsilon ,\delta )>0\). Furthermore, by Lemma 3.9, whp there are at most \(\frac{n}{d^4}\) vertices with degree larger than \(\ln d\). We continue assuming that these properties hold deterministically.

We begin with part (a). Given \(\frac{n}{d}\le |S|\le \frac{3\epsilon n}{2}\), let \(k{:}{=}|S|\) and let \(b_1{:}{=}|N_{L_1'}(S)|\). As we aim to bound the expansion of the set S, we may assume that \(b_1< \frac{ck}{d\ln d}\), as otherwise the claim holds. In order to facilitate a union bound argument, let us estimate the number of subsets S of size k in \(L_1'\) such that \(|N_{L_1'}(S)|=b_1\). Let \(e_1 = \partial _{L_1'}(N_{L_1'}(S))\). Since there are at most \(\frac{n}{d^4}\) vertices with degree larger than \(\ln d\), we have that \(e_1\le \frac{n}{d^3}+b_1\ln d\). Since \(L_1'\) is connected, there are at most \(e_1+1\) components in \(L_1'\setminus N_{L_1'}(S)\). Furthermore, since S has no neighbours outside \(N_{L_1'}(S)\), it must be the union of components in \(L_1'\setminus N_{L_1'}(S)\). Hence, the number of ways to choose such an S is at most \(\left( {\begin{array}{c}n\\ b_1\end{array}}\right) \cdot 2^{e_1+1}\). Thus, there are at most

$$\begin{aligned} \sum _{b_1=1}^{\frac{ck}{d\ln d}}\left( {\begin{array}{c}n\\ b_1\end{array}}\right) 2^{\frac{n}{d^3}+b_1\ln d+1}&\le \left( \frac{en}{\frac{ck}{d\ln d}}\right) ^{\frac{ck}{d\ln d}}2^{\frac{2ck}{d}}\\&\le \exp \left( \frac{ck}{d\ln d}\left( \ln \left( \frac{end\ln d}{ck}\right) +2\ln d\right) \right) \\&\le \exp \left( \frac{2ck}{d\ln d}\left( \ln \left( \frac{n}{k}\right) +2\ln d\right) \right) \le \exp \left( \frac{6ck}{d}\right) \end{aligned}$$

sets \(S\subseteq V(L_1')\) with \(|N_{L_1'}(S)|<\frac{ck}{d\ln d}\), where we used the fact that \(k\ge \frac{n}{d}\) in the first and last inequalities.
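For instance, the final inequality in the display above uses the assumption \(k\ge \frac{n}{d}\) as follows:

$$\begin{aligned} k\ge \frac{n}{d}\implies \ln \left( \frac{n}{k}\right) \le \ln d\implies \frac{2ck}{d\ln d}\left( \ln \left( \frac{n}{k}\right) +2\ln d\right) \le \frac{2ck}{d\ln d}\cdot 3\ln d=\frac{6ck}{d}. \end{aligned}$$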

We now set up a union bound argument for part (b). Given \(S\subseteq V(L_1')\) with \(|S|=k\le \frac{3\epsilon n}{2}\), we may assume that \(|\partial _{L_1'}(S)|<\frac{ck\ln \left( \frac{n}{k}\right) }{d\ln d}\), as otherwise the claim holds. Let us then estimate the number of sets S such that \(|\partial _{L_1'}(S)|<\frac{ck\ln \left( \frac{n}{k}\right) }{d\ln d}\).

Let \(e_2{:}{=}|\partial _{L_1'}(S)|<\frac{ck\ln \left( \frac{n}{k}\right) }{d\ln d}\) and let \(b_1{:}{=}|N_{L_1'}(S)|\) as before. If we write m for the number of components in \(G_{p_1}[S]\), then, since \(L_1'\) is connected, \(m \le e_2+1\). Moreover, since S has no neighbours outside \(N_{L_1'}(S)\), it must be the union of precisely m components of \(L'_1 \setminus N_{L_1'}(S)\).

Hence, since \(L'_1 \setminus N_{L_1'}(S)\) has at most n components, the number of ways to choose such an S is at most \(\left( {\begin{array}{c}n\\ b_1\end{array}}\right) \left( {\begin{array}{c}n\\ m\end{array}}\right) \). Thus, since \(b_1 \le e_2\), there are at most

$$\begin{aligned} \sum _{b_1=1}^{e_2}\sum _{m=1}^{e_2+1}\left( {\begin{array}{c}n\\ b_1\end{array}}\right) \left( {\begin{array}{c}n\\ m\end{array}}\right)&\le \left( \frac{en}{\frac{ck\ln \left( \frac{n}{k}\right) }{d\ln d}}\right) ^{2\frac{ck\ln \left( \frac{n}{k}\right) }{d\ln d}}\le \exp \left( \frac{2ck\ln \left( \frac{n}{k}\right) }{d\ln d}\ln \left( \frac{end\ln d}{ck\ln \left( \frac{n}{k}\right) }\right) \right) \\&\le \exp \left( \frac{2ck}{d}\cdot \frac{\ln \left( \frac{n}{k}\right) }{\ln d}\cdot \left( \ln \left( \frac{n}{k}\right) +2\ln \left( \frac{d\ln d}{\ln \left( \frac{n}{k}\right) }\right) \right) \right) \\&\le \exp \left( \frac{3ck}{d}\right) \end{aligned}$$

sets \(S\subseteq V(L_1')\) with \(|\partial _{L_1'}(S)|<\frac{ck\ln \left( \frac{n}{k}\right) }{d\ln d}\).

Fix \(S\subseteq V(L_1')\) with \(|S|=k\). By Lemma 5.2, with probability at least \(1-\exp \left( -\frac{c_1k\ln \left( \frac{n}{k}\right) }{d}\right) \), there exists a family of at least \(\frac{c_1k\ln \left( \frac{n}{k}\right) }{d}\) vertex disjoint paths of length at most five between S and \(V(L_1')\setminus S\) in \(G_{p_2}\), where \(c_1\) is the constant from Lemma 5.2. We note that we used our assumption that every \(v\in V(G)\) is at distance at most two from at least \(c'd^2\) vertices in \(L_1'\) in order to invoke Lemma 5.2.

Recalling that \(k\le \frac{3\epsilon n}{2}\), the probability there is a set S violating the statement of part (a) is then at most

$$\begin{aligned} \exp \left( \frac{6ck}{d}-\frac{c_1k\ln \left( \frac{n}{k}\right) }{d}\right) \le \exp \left( \frac{k}{d}\left( 6c-c_1\right) \right) . \end{aligned}$$

Once again, the probability that there is a set S violating the statement of part (b) is at most

$$\begin{aligned} \exp \left( \frac{3ck}{d}-\frac{c_1k}{d}\right)&=\exp \left( \frac{k}{d}\left( 3c-c_1\right) \right) . \end{aligned}$$

Under our assumption that \(k{:}{=}|S|=\omega (d)\) and for c small enough with respect to \(c_1\), by the union bound, the probability of having a set S violating the statement of part (a) or (b) is o(1). \(\square \)

5.2 Structure of Subsets in the Residue

As we mentioned, we also require some control over the typical structure of subsets in the residue \(L_1 - L'_1\) and their likely expansion into the early giant after sprinkling. Let us begin with the following lemma, showing how the vertices in \(L_1'\) are embedded in \(L_1\), which generalises Lemma 3.2 in [32]. Given a vertex \(v\in V(L_1')\), let \(C_v\) be the set of vertices contained in those components of \(L_1-L_1'\) which contain a vertex adjacent to v in \(G_2\). Also, given a subset \(S\subseteq V(L_1')\), we write \(C_{S}{:}{=}\cup _{v\in S}C_v\).

Lemma 5.4

There exists a constant \(K_2{:}{=}K_2(C,\epsilon )>0\) such that whp \(|C_v|\le K_2d\) for every \(v\in V(L_1')\).

Proof

Note that \(G_2\) has the same distribution as \(G_p\), and that \(p_1=\frac{1+\epsilon -\delta +o(1)}{d}\). Furthermore, observe that by Theorem 1.2, there exists a constant \(K_1{:}{=}K_1(C,\epsilon )\) such that whp every component of \(G_{p_1}\), besides \(L_1'\), is of order at most \(K_1d\) (although technically the \(K_1\) given by Theorem 1.2 might depend on \(\delta \), it is easy to check from the proof that, since \(\delta \ll \epsilon \), we may choose \(K_1\) only as a function of \(\epsilon \) and C).

Suppose that there is some \(v\in V(L_1')\) such that \(|C_v|\ge K_2d\). Note that \(C_v\cup \{v\}\) is connected in \(G_2\), and that \(C_v\) is the disjoint union of some sets \(\left\{ C_1,\ldots , C_r\right\} \) where \(C_i\) is the vertex set of some component of \(G_1\), each of which has order at most \(K_1d\). It follows there must be some subset \(\hat{C}\subseteq C_v\) such that \(\hat{C}\cup \{v\}\) is connected in \(G_2\), \(\hat{C}\) is the union of some subset of \(\{C_1,\ldots , C_r\}\) and \(K_2d\le |\hat{C}|\le (K_2+K_1)d\).

In particular, there is some spanning tree T of \(\hat{C}\cup \{v\}\), all of whose edges are in \(G_2\), and no edge in the edge-boundary of \(V(T)\setminus \{v\}\) is present in \(G_1\).

Let us bound the probability that such a tree of order k exists in \(G_2\) for each \(K_2d+1\le k \le (K_2+K_1)d+1\). By Lemma 3.3, there are at most \(n(ed)^{k-1}\) such trees. A spanning tree T has \(k-1\) edges in \(G_2\), which happens with probability at most \(p^{k-1}\). Furthermore, since \(|V(T)\setminus \{v\}|=k-1\), by Theorem 1 there are at least \((k-1)\left( d-(C-1)\log _2(k-1)\right) \) edges in the edge-boundary of \(V(T)\setminus \{v\}\), none of which are in \(G_1\), which happens with probability at most \({(1-p_1)^{(k-1)\left( d-(C-1)\log _2(k-1)\right) }}\). Whilst these two events are not necessarily independent, they are negatively correlated. Thus, by the union bound, the probability that such a tree of order k exists in \(G_2\) is at most

$$\begin{aligned} n(ed)^{k-1}p^{k-1}(1-p_1)^{(k-1)\left( d-(C-1)\log _2(k-1)\right) }. \end{aligned}$$
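To see how this expression leads to the bound below, note that \((edp)^{k-1}=\exp \left( (k-1)\left( 1+\ln (1+\epsilon )\right) \right) \), while, since \(k=O(d)\) (so that \((C-1)\log _2(k-1)=o(d)\)) and \(\delta \le \epsilon ^3\),

$$\begin{aligned} (1-p_1)^{(k-1)\left( d-(C-1)\log _2(k-1)\right) }\le \exp \left( -p_1(k-1)(1-o(1))d\right) \le \exp \left( -(k-1)\left( 1+\epsilon -2\epsilon ^3\right) \right) , \end{aligned}$$

for d sufficiently large.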

Therefore, the probability that such a tree exists for \(k\in I{:}{=}[K_2d+1, (K_2+K_1)d+1]\) is at most

$$\begin{aligned}&n\sum _{k\in I}\exp \left( (k-1)\left( 1+\ln (1+\epsilon )-(1+\epsilon -2\epsilon ^3)\right) \right) \\&\quad \le n\sum _{k\in I}\exp \left( -\epsilon ^3(k-1)\right) =o(1), \end{aligned}$$

where we used the fact that \(\ln (1+\epsilon )\le \epsilon -3\epsilon ^3\) for small enough \(\epsilon >0\), and we assume that \(K_2\ge \frac{2\ln C}{\epsilon ^3}\), recalling that \(n\le C^t\le C^d\) and hence \(\ln n\le \ln C \cdot d\). \(\square \)

In order to obtain our results, we will require further information about the likely expansion of subsets in the residue into the early giant. We will require the following density lemma.

Lemma 5.5

(Lemma 4.6 of [29], rephrased). There exists a constant \(c_2>0\) such that for any fixed constants \(K,c_1>0\), whp every subset \(M \subseteq V(G)\), with \(|M|=Kd\) and G[M] connected, contains at most \(c_1d\) vertices \(v \in M\) such that \(|N_G(v)\cap V(L_1')|<c_2d\).

We will make use of the following probabilistic lemma, which utilises Lemma 3.4.

Lemma 5.6

There exist positive constants \(K, K'{:}{=}K'(K)\) and \(c{:}{=}c(\delta )\) such that the following holds. Let \(S\subseteq V(L_1')\) and \(B\subseteq V(G)\setminus V(L_1')\) be such that \(|S\cup B|\ge Kd\) and \(G[S\cup B]\) is connected and \(|B|\ge K'|S|\). Then, there exists a matching in \(G_{p_2}\) of size at least c|B| between B and \(V(L_1')\setminus S\) with probability at least \(1-\exp \left( -c|B|\right) \).

Proof

By Lemma 5.5, there exists a constant \(c_2>0\) such that for any fixed constants \(K,c_1>0\), whp every subset \(M \subseteq V(G)\), with \(|M|=Kd\) and G[M] connected, contains at most \(c_1d\) vertices \(v \in M\) such that \(|N_G(v)\cap V(L_1')|<c_2d\). Let us fix some \(K\gg c_1\) and continue assuming the above holds deterministically for the corresponding \(c_2\).

Since \(G[S\cup B]\) is connected, it has a spanning tree T. By Lemma 3.4, applied with \(\ell = Kd\), there exist subsets \(A_1,\ldots , A_s\subseteq V(T)\) satisfying properties (a)–(d) of that lemma. In particular, since for all \(i\in [s]\) we have \(Kd \le |A_i|\le 3Kd\), by Theorem 1 we have that \(e(A_i) \le (C-1)3Kd\log _2(3Kd)\le 6CKd\log _2d\). Thus, by our assumption, for all \(i\in [s]\) we have that

$$\begin{aligned} e_G(A_i, L_1'\setminus A_i)\ge (Kd-c_1d)c_2d-12CKd\log _2d. \end{aligned}$$

Thus, defining \(\hat{A}_i{:}{=}A_i\setminus \left( \bigcup _{j\in ([s]\setminus \{i\})} A_j\right) \), we have that \(e(\hat{A}_i, L_1'\setminus \hat{A}_i)\ge e(A_i, L_1'\setminus A_i)-d\), and the edge sets \(E(\hat{A}_1, L_1'\setminus \hat{A}_1), \ldots , E(\hat{A}_s, L_1'\setminus \hat{A}_s)\) are disjoint. Hence, since K is sufficiently large with respect to \(c_1\), we can choose \(c_3{:}{=}c_3(c_1,c_2)>0\) small enough such that

$$\begin{aligned} e\left( S\cup B, L_1'\setminus (S\cup B)\right)&\ge \frac{|S|+|B|}{3Kd}\left( (Kd-c_1d)c_2d-12CKd\log _2d\right) \\&\quad -\frac{|S|+|B|}{Kd}\cdot d\\&\ge \left( |S|+|B|\right) c_3d. \end{aligned}$$

Therefore, as long as \(K' {:}{=}K'(c_1,c_2)\) is large enough, there exists \(c_4 {:}{=}c_4(c_1,c_2) > 0\) such that

$$\begin{aligned} e(B, L_1'\setminus S)\ge |B|c_3d-2|S|d\ge \left( c_3 - \frac{2}{K'} \right) |B|d \ge c_4|B|d. \end{aligned}$$

Thus, by Lemma 3.8, there exists a constant \(c(\delta )>0\) such that with probability at least \(1-\exp \left( -c|B|\right) \) there exists a matching M in \(G_{p_2}\) of size at least c|B| between B and \(L_1'\setminus S\). \(\square \)

From Lemmas 5.6 and 5.5, we can derive the following statement, complementing Lemma 5.3.

Lemma 5.7

There exist constants \(K, K', c>0\) such that whp, for every \(S_1\subseteq V(L_1'), S_1\ne \varnothing \), and for every \(S_2\subseteq C_{S_1}\) such that \(|S_2|\ge K'|S_1|\), \(|S_1\cup S_2|\ge Kd\) and \(G[S_1\cup S_2]\) connected, the following holds. Either

$$\begin{aligned} |N_{L_1'}(S_1)|\ge \frac{c|S_2|}{d}, \quad \text {or} \quad |N_{G_2}(S_1\cup S_2)|\ge \frac{c|S_2|}{d}. \end{aligned}$$

Proof

We begin by exposing \(G_{p_1}\), and let us fix \(\varnothing \ne S_1\subseteq V(L_1')\). Let us now expose all the edges in \(G_{p_2}\) which are either inside \(V(G)\setminus V(L_1')\) or lie between \(S_1\) and \(V(G)\setminus V(L_1')\). Denote by \(G_1'\) the graph \(G_1\) together with these edges, noting that \(G_1 \subseteq G'_1 \subseteq G_2\) and that \(G'_1\) determines \(C_{S_1}\).

Let us choose a small enough constant \(c_1>0\), and a sufficiently large constant K. Then by Lemma 5.5 there exists \(c_2>0\) such that whp every connected subset \(M\subseteq V(G)\) of size Kd has at most \(c_1d\) vertices with fewer than \(c_2d\) neighbours in \(L_1'\). Furthermore, by Lemma 5.6 there exist constants \(K',c'>0\) such that the conclusion of that lemma holds for \(K,c_1,c_2\) (with \(c_2\) as given by Lemma 5.5) and \(S_1\), noting that the event in the lemma depends only on edges in \(G_{p_2}\) between \(C_{S_1}\) and \(V(L_1')\setminus S_1\), which we have not yet exposed. We further note that we may choose \(K'\) sufficiently large. We continue assuming these properties hold deterministically.

Let us fix \(S_2\subseteq C_{S_1}\) satisfying the conditions of the lemma and let \(k_1{:}{=}|S_1|\) and \(k_2{:}{=}|S_2|\). By Lemma 5.6, the probability that \(S_2\) has fewer than \(c'|S_2|\) neighbours in \(V(L_1')\setminus S_1\) in \(G_{p_2}\) is at most \(\exp \left( -c'|S_2|\right) \). Furthermore, the event that \(S_2\) has at least \(c'|S_2|\) neighbours in \(V(L_1')\setminus S_1\) in \(G_{p_2}\) clearly implies that \(|N_{G_2}(S_1\cup S_2)|\ge \frac{c|S_2|}{d}\), for any constant \(c>0\).

Let us now facilitate a union bound argument. Let us choose \(c{:}{=}c(C,c')\) sufficiently small and suppose that \(b_1{:}{=}|N_{L_1'}(S_1)|<\frac{ck_2}{d}\), as otherwise the claim holds. Let us further fix \(k_2\) for now. Let us write m for the number of components in \(G_{1}\setminus N_{L_1'}(S_1)\). Since \(L_1'\) is connected and G is d-regular, we have \(m\le d\cdot b_1+1\).

Hence, since \(S_1\) has no neighbours in \(G_1\) outside \(N_{L_1'}(S_1)\), it must be the union of some components of \(G_{1}\setminus N_{L_1'}(S_1)\), and so the number of ways to choose such an \(S_1\) is at most \(\left( {\begin{array}{c}n\\ b_1\end{array}}\right) 2^{m}\). Thus, there are at most

$$\begin{aligned} \sum _{b_1=1}^{\frac{ck_2}{d}}\sum _{m=1}^{ck_2}\left( {\begin{array}{c}n\\ b_1\end{array}}\right) 2^{m}&\le \left( \frac{en}{\frac{ck_2}{d}}\right) ^{\frac{ck_2}{d}}\cdot 2^{ck_2+1}\le \exp \left( \frac{ck_2}{d}\left( \ln \left( \frac{end}{ck_2}\right) +2d\right) \right) \\ {}&\le \exp \left( 5\ln C\cdot ck_2\right) \end{aligned}$$

sets \(S_1\subseteq V(L_1')\) with \(|N_{L_1'}(S_1)|<\frac{ck_2}{d}\), where we used the assumption that \(\ln n\le \ln C \cdot d\) and that \(k_2\ge d\), since we may choose \(K\ge 2\) and \(K'\) large enough.

Now, let us consider the number of ways to choose \(S_2\subseteq C_{S_1}\), noting that having determined \(S_1\), choosing \(S_1\cup S_2\) determines \(S_2\). We may assume that \(b_2{:}{=}|N_{G_1'}(S_1\cup S_2)|\le \frac{ck_2}{d}\), since \(G_1'\subseteq G_2\). Since \(G_1'[S_1\cup S_2]\) is connected, and has all its neighbours in \(N_{G_1'}(S_1\cup S_2)\), exactly one of the at most n components in \(G_1'\setminus N_{G_1'}(S_1\cup S_2)\) is \(S_1\cup S_2\). Since \(S_1\) is fixed, we can identify this component. Hence, the number of ways to choose \(S_1\cup S_2\) with \(|N_{G_1'}(S_1\cup S_2)|\le \frac{ck_2}{d}\) is at most the number of ways to choose a set of size at most \(\frac{ck_2}{d}\) in \(V(G_1')\). That is at most

$$\begin{aligned} \sum _{b_2=1}^{\frac{ck_2}{d}}\left( {\begin{array}{c}n\\ b_2\end{array}}\right) \le \left( \frac{en}{\frac{ck_2}{d}}\right) ^{\frac{ck_2}{d}}\le \exp \left( 4\ln C\cdot ck_2\right) . \end{aligned}$$
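Indeed, for the last inequality, since \(k_2\ge d\), \(\ln n\le \ln C\cdot d\) and \(C\ge 2\), we have, for d large enough,

$$\begin{aligned} \left( \frac{en}{\frac{ck_2}{d}}\right) ^{\frac{ck_2}{d}}=\exp \left( \frac{ck_2}{d}\ln \left( \frac{end}{ck_2}\right) \right) \le \exp \left( \frac{ck_2}{d}\left( \ln C\cdot d+\ln \left( \frac{e}{c}\right) \right) \right) \le \exp \left( 4\ln C\cdot ck_2\right) . \end{aligned}$$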

Therefore, for fixed \(k_2\), the probability of an event violating the statement of the lemma is at most

$$\begin{aligned} \exp \left( 9\ln C\cdot ck_2\right) \exp \left( -c'k_2\right) =o(1/n), \end{aligned}$$

for c small enough in terms of C and \(c'\). Union bound over the at most n choices of \(k_2\) completes the proof. \(\square \)

5.3 Proof of Theorems 4(b) and 3(b)

Proof of Theorem 4(b) and 3(b)

Let \(S_1=S\cap V(L_1')\) and \(S_2=S\cap \left( V(L_1)\setminus V(L_1')\right) \). Let \(c_{(5.3)}\) be the constant whose existence is asserted in Lemma 5.3, and let \(K_{(5.7)}, K'_{(5.7)}\) and \(c_{(5.7)}\) be the constants whose existence is asserted in Lemma 5.7. Let \(c>0\) be sufficiently small in terms of \(c_{(5.3)}\), \(K'_{(5.7)}\) and \(c_{(5.7)}\).

4(b):

Recall that we assume that \(K_{(5.7)}d \le n^{\epsilon ^5}\le |S|\le \frac{3\epsilon n}{2}\) and \(G_p[S]\) is connected. Suppose \(|S_1|\ge \frac{|S_2|}{K'_{(5.7)}}\). Then, \(|S_1|=\Omega (d\ln d)=\omega (d)\) and so by Lemma 5.3(b) whp either

$$\begin{aligned} |\partial _{L_1'}(S_1)|\ge \frac{c_{(5.3)}|S_1|\ln \left( \frac{n}{|S_1|}\right) }{d\ln d}\ge \frac{c|S|\ln \left( \frac{n}{|S|}\right) }{d\ln d}, \end{aligned}$$

or there is a family of at least \(\frac{c_{(5.3)}|S_1|}{d}\ge \frac{c|S|}{d}\) vertex-disjoint paths from \(S_1\) to \(V(L_1')\setminus S_1\subseteq V(L_1)\setminus S\). However, since each such path contributes a unique vertex to the neighbourhood of S in \(V(L_1)\) (the first vertex along the path which is not in S), in the latter case \(|N_{G_p}(S)|\ge \frac{c|S|}{d}\), and so the result follows.

Otherwise, \(|S_2|\ge K'_{(5.7)}|S_1|\) and so by Lemma 5.7, whp either \(|N_{L_1'}(S_1)|\ge \frac{c_{(5.7)}|S_2|}{d}\), or \(|N_{L_1}(S)|\ge \frac{c_{(5.7)}|S_2|}{d}\). In the first case, \(|\partial _{L_1}(S)|\ge |N_{L_1'}(S_1)|\ge \frac{c|S|}{d}\) and, similarly to before, in the second case \(|\partial _{L_1}(S)|\ge |N_{L_1}(S)|\ge \frac{c|S|}{d}\).

3(b):

We now assume that \(K_{(5.7)}d \le \epsilon ^2 n\le |S|\le \frac{3\epsilon n}{2}\). Note that, since \(\big ||V(L_1)|-|V(L_1')|\big |\le 4\epsilon ^3n\), it follows that \(|S_1|\ge \frac{2|S|}{3}\). Thus, by Lemma 5.3(a), whp either

$$\begin{aligned} |N_{L_1'}(S_1)|\ge \frac{c_{(5.3)}|S_1|}{d\ln d}\ge \frac{c|S|}{d\ln d}, \end{aligned}$$

or there is a family of at least \(\frac{c_{(5.3)}|S_1|}{d}\ge \frac{c|S|}{d}\) vertex-disjoint paths from \(S_1\) to \(V(L_1')\setminus S_1 \subseteq V(L_1) \setminus S\), and each such path contributes a unique vertex to the neighbourhood of S in \(L_1\). As before, in either case \(|N_{G_p}(S)|\ge \frac{c|S|}{d\ln d}\).

\(\square \)

The proof of Theorem 5 will follow from key ideas from [51] together with our expansion result on large sets (Theorem 3(b)).

Proof of Theorem 5

Let c be the constant whose existence is asserted in Theorem 4. Let \(M\subseteq V(L_1)\) be a maximal set such that \(|M|\le \frac{\epsilon n}{10}\) and \(|N_{G_p}(M)|\le \frac{c|M|}{d\ln d}\). Let \(H=L_1- M\). Assume that there is some subset \(B\subseteq V(H)\) such that \(|B|\le \frac{|V(H)|}{2}\) and \(|N_{H}(B)|\le \frac{c|B|}{d\ln d}\). Then,

$$\begin{aligned} |N_{G_p}(M\cup B)| \le |N_{G_p}(M)| + |N_{H}(B)| <\frac{c|M|}{d\ln d}+\frac{c|B|}{d\ln d}=\frac{c|M\cup B|}{d\ln d}. \end{aligned}$$

Thus, by the maximality of M, we obtain that \(|M\cup B|\ge \frac{\epsilon n}{10}\). However, by Theorem 3(b), every subset \(S\subseteq V(L_1)\) with \(\epsilon ^2n\le |S|\le \frac{3\epsilon n}{2}\) has \(|N_{G_p}(S)|\ge \frac{c|S|}{d\ln d}\). Hence, \(|M\cup B|\ge \frac{3\epsilon n}{2}\).

On the other hand, by our choice of B and M, we have that

$$\begin{aligned} |M\cup B|\le |M|+\frac{|V(L_1)|-|M|}{2}= \frac{|V(L_1)|+|M|}{2}\le \frac{|V(L_1)|}{2}+\frac{\epsilon n}{20}. \end{aligned}$$

By Theorem 1.2, whp \(|V(L_1)|\le 2\epsilon n\), and hence \(|M\cup B|\le \frac{21\epsilon n}{20}<\frac{3\epsilon n}{2}\)—a contradiction. Hence, whp H has the desired expansion properties. Furthermore, by Theorem 1.2, whp

$$\begin{aligned} |V(H)|=|V(L_1)|-|M|\ge \frac{19\epsilon n}{10}-\frac{\epsilon n}{10}\ge \frac{3\epsilon n}{2}. \end{aligned}$$

\(\square \)

6 Consequences of Expansion in the Giant Component

We begin with the likely existence of a long cycle, which follows immediately from Theorem 3(b) together with Theorem 3.5.

Proof of Theorem 6(c)

By Theorem 3(b), there exists a constant \(c>0\) such that whp for all \(\epsilon ^2n\le k\le \frac{3\epsilon n}{2}\) and all subsets \(S\subseteq V(L_1)\) with \(|S|=k\),

$$\begin{aligned} |N_{G_p}(S)|\ge \frac{c|S|}{d\ln d}. \end{aligned}$$

Thus, applying Theorem 3.5 with \(a=\frac{3\epsilon n}{2}\) and \(b=\frac{3c\epsilon n}{4d\ln d}\), we obtain that whp \(L_1\) contains a cycle of length \(\Omega \left( \frac{n}{d\ln d}\right) \). \(\square \)

Note that, due to the comment after Theorem 5, up to a logarithmic factor in d this is the best bound that can be given with such an argument based solely on the expansion of \(L_1\).

We now turn to Theorem 6(a) and (b). For these two theorems, the following two lemmas will be useful. The first is a variant of a lemma from [31], bounding the typical number of edges incident to connected subsets in \(G_p\), whose proof we include for completeness.

Lemma 6.1

Whp, for all \(S\subseteq V(L_1)\) such that \(G_p[S]\) is connected,

$$\begin{aligned} e_{G_p}(S)+e_{G_p}(S,S^C)\le \max \left\{ 10|S|, 20\ln C \cdot d\right\} . \end{aligned}$$
(10)

Proof

Let us begin by considering connected sets S such that \(|S|=k\ge \ln C \cdot d\). Since any connected set in \(G_p\) has a spanning tree, it is sufficient to show that (10) holds whenever S is the vertex set of a tree of order \(k \ge \ln C \cdot d\) in \(G_p\). By Lemma 3.3, there are at most \(n(ed)^{k-1}\) trees on k vertices in G, and the probability that each such tree is contained in \(G_p\) is \(p^{k-1}\). Since each set of k vertices is incident to at most kd edges in G, there are at most \(\left( {\begin{array}{c}kd\\ 9k\end{array}}\right) \) ways to choose an additional 9k edges incident to this set of vertices, and these edges are in \(G_p\) with probability \(p^{9k}\). Hence, by the union bound, the probability that (10) fails to hold is at most

$$\begin{aligned} n(ed)^{k-1}p^{k-1}\left( {\begin{array}{c}kd\\ 9k\end{array}}\right) p^{9k}&\le n\cdot \left( 2e\right) ^{k-1}\left( \frac{2e}{9}\right) ^{9k}\le n\exp (-2k)=o(1/n), \end{aligned}$$

since \(k\ge \ln C \cdot d \ge \ln n\). Taking a union bound over the at most n possible values of k, it follows that whp for all subsets \(S\subseteq V(L_1)\) with \(|S|\ge \ln C \cdot d\) and \(G_p[S]\) connected, \(e(S)+e(S, S^C)\le 10|S|\).
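For completeness, the first two steps in the display above can be seen as follows: since \(p=\frac{1+\epsilon }{d}\le \frac{2}{d}\) and \(\left( {\begin{array}{c}kd\\ 9k\end{array}}\right) \le \left( \frac{ekd}{9k}\right) ^{9k}\),

$$\begin{aligned} n(ed)^{k-1}p^{k-1}\left( {\begin{array}{c}kd\\ 9k\end{array}}\right) p^{9k}\le n(2e)^{k-1}\left( \frac{2e}{9}\right) ^{9k}\le n\left( 2e\left( \frac{2e}{9}\right) ^{9}\right) ^{k}\le n\exp (-2k), \end{aligned}$$

where the last step uses the numerical estimate \(2e\left( \frac{2e}{9}\right) ^{9}\le e^{-2}\).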

We now turn to connected sets S with \(|S|<\ln C\cdot d\). Since \(L_1\) is connected, and by Theorem 1.2 we have that whp \(|V(L_1)|\ge \epsilon n\), there exists a connected set \(S'\supseteq S\) such that \(|S'|=\ln C\cdot d\). Note that \(2e(S)+e(S,S^C)\le 2e(S')+e(S',S'^C)\), and so in particular by the above whp

$$\begin{aligned} e(S)+e(S, S^C)\le 2e(S)+e(S,S^C)\le 2\left( e(S')+e(S', S'^C)\right) \le 20\ln C \cdot d, \end{aligned}$$

completing the proof. \(\square \)

We also require a bound on the typical number of edges in \(L_1\). While this can be calculated quite accurately, the following naive, yet simple to prove bound will suffice for our goals, and utilises the Depth First Search (DFS) algorithm (see [52] for definition and application of the DFS algorithm in random graphs). Recall that the excess of a connected graph H is defined as \(|E(H)|-(|V(H)|-1)\).

Lemma 6.2

Whp, \(e(L_1)<3\epsilon n\).

Proof

We begin by running a DFS algorithm with \(\frac{nd}{2}\) random bits \(X_i\), to expose a spanning forest of \(G_p\). We first claim that if there is a connected component S of order k with \(k\ge d^2\), then we have queried at least \(\frac{2kd}{3}\) of the edges incident to S. Indeed, otherwise, there would have been an interval of length at most \(\frac{2kd}{3}\) where we receive k positive answers. By a typical Chernoff-type bound, the probability that a fixed interval of length \(\frac{2kd}{3}\) contains k positive answers is at most

$$\begin{aligned} \mathbb {P}\left( Bin\left( \frac{2kd}{3},\frac{1+\epsilon }{d}\right) \ge k\right) \le \exp \left( -\frac{k}{30}\right) \le \exp \left( -\frac{d^2}{30}\right) . \end{aligned}$$
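This bound can be verified, for instance, as follows. Assuming, say, \(\epsilon \le \frac{1}{20}\), the binomial has mean \(\mu =\frac{2(1+\epsilon )k}{3}\in \left[ \frac{2k}{3},\frac{7k}{10}\right] \), so the event above requires a multiplicative deviation of at least \(\gamma =\frac{3}{7}\) above the mean, and the standard Chernoff bound \(\mathbb {P}\left( X\ge (1+\gamma )\mu \right) \le \exp \left( -\frac{\gamma ^2\mu }{3}\right) \), valid for \(0<\gamma \le 1\), gives

$$\begin{aligned} \mathbb {P}\left( Bin\left( \frac{2kd}{3},\frac{1+\epsilon }{d}\right) \ge k\right) \le \exp \left( -\frac{1}{3}\cdot \left( \frac{3}{7}\right) ^2\cdot \frac{2k}{3}\right) =\exp \left( -\frac{2k}{49}\right) \le \exp \left( -\frac{k}{30}\right) . \end{aligned}$$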

In particular, taking a union bound over the at most nd intervals of length \(\frac{2kd}{3}\) and at most n different values of k completes the proof of the claim.

By Theorem 1.2, whp this algorithm discovered a unique giant component \(L_1\), with \(|V(L_1)|< 2\epsilon n\), and in doing so queried at least \(\frac{2|V(L_1)|d}{3}\) of the at most \(|V(L_1)|d\) edges incident to \(V(L_1)\). However, since we exposed a spanning tree of \(L_1\), at most \(|V(L_1)|-1\) edges of \(L_1\) were exposed during the algorithm. Since there are at most \(\frac{|V(L_1)|d}{3}\) queries left and whp \(|V(L_1)|<2\epsilon n\), the number of excess edges in \(L_1\) is stochastically dominated by a binomial random variable \(\text {Bin}\left( \frac{2\epsilon n d}{3}, \frac{1+\epsilon }{d}\right) \). In particular, by a standard Chernoff-type bound, whp \(L_1\) has at most \(\epsilon n\) excess edges and hence in total \(e(L_1) \le |V(L_1)| -1 + \epsilon n < 3\epsilon n\). \(\square \)

6.1 Proof of Theorem 6(a)

Proof of Theorem 6(a)

We note that by Theorem 1.2 and Lemma 6.2, whp, \(\epsilon n< |E(L_1)|, |V(L_1)|< 3\epsilon n\), and we assume in what follows that this holds. Given a vertex \(v \in V(L_1)\), let \(B(v,r)\) denote the ball of radius r around v in \(L_1\). Since \(L_1\) is connected and has size at least \(\epsilon n\), for any \(v\in V(L_1)\) we have that \(|B(v,d\ln d)|\ge d\ln d\). Furthermore, by Lemma 6.1, whp for any \(B(v,r)\subseteq V(L_1)\) with \(|B(v,r)|\ge \ln C \cdot d\),

$$\begin{aligned} \frac{e\left( B(v,r)\right) }{10} \le |B(v,r)|\le e(B(v,r)) +1, \end{aligned}$$

where the upper bound holds since \(B(v,r)\) is connected and the lower bound follows from Lemma 6.1. By Theorem 4(a) and (b), whp for any \(B(v,r)\subseteq V(L_1)\) with \(|B(v,r)|\ge d\ln d\),

$$\begin{aligned} e(B(v,r+1))&\ge e(B(v,r)) + \left| \partial _{G_p}(B(v,r))\right| \\&\ge \min \left\{ \frac{3\epsilon n}{2}-1,e(B(v,r)) +\frac{c\ln \left( \frac{n}{|B(v,r)|}\right) }{d\ln d}|B(v,r)|\right\} . \end{aligned}$$

By the above, whp

$$\begin{aligned} e(B(v,r)) +\frac{c\ln \left( \frac{n}{|B(v,r)|}\right) }{d\ln d}|B(v,r)|&\ge \left( 1+\frac{c\ln \left( \frac{n}{e(B(v,r))+1}\right) }{10d\ln d}\right) e(B(v,r))\\&\ge \left( 1+\frac{c'\ln \left( \frac{n}{e(B(v,r))}\right) }{10d\ln d}\right) e(B(v,r)), \end{aligned}$$

for some constants \(c,c'>0\), and hence whp

$$\begin{aligned} e(B(v,r+1))\ge \min \left\{ \frac{3\epsilon n}{2}-1,\left( 1+\frac{c'\ln \left( \frac{n}{e(B(v,r))}\right) }{10d\ln d}\right) e(B(v,r))\right\} . \end{aligned}$$
(11)

We continue assuming the above holds deterministically.

Let v be an arbitrary vertex in \(L_1\). We let \(B_0{:}{=}B(v,d\ln d)\), and define inductively \(B_i{:}{=}B(v,d\ln d +i)\).

Let \(C'>0\) be such that \(n=\exp \left( C'd\right) \). Given \(\frac{1}{d}<\alpha \le 1\), we define

$$\begin{aligned} I(\alpha ){:}{=}\left\{ i\in \mathbb {N}:\exp \left( (1-\alpha )C'd\right) \le e(B_i)\le \exp \left( \left( 1-\frac{\alpha }{2}\right) C'd\right) \right\} . \end{aligned}$$

Using (11) we can bound the size of \(I(\alpha )\). For each \(i\in I(\alpha )\), we have that \(\frac{c'\ln \left( \frac{n}{e(B_i)}\right) }{10d\ln d}\ge \frac{c' C'\alpha }{20\ln d} {=}{:}\frac{c''\alpha }{\ln d}\). Thus by (11),

$$\begin{aligned} |I(\alpha )|\le \log _{1+\frac{c''\alpha }{\ln d}}\left( \frac{\exp \left( \left( 1-\frac{\alpha }{2}\right) C'd\right) }{\exp \left( (1-\alpha )C'd\right) }\right) =\frac{\alpha C'd}{2\ln \left( 1+\frac{c''\alpha }{\ln d}\right) }=O(d\ln d). \end{aligned}$$
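In particular, the final estimate is uniform in \(\alpha \): using \(\ln (1+x)\ge \frac{x}{2}\) for \(0\le x\le 1\), and assuming d is large enough that \(\frac{c''\alpha }{\ln d}\le 1\),

$$\begin{aligned} \frac{\alpha C'd}{2\ln \left( 1+\frac{c''\alpha }{\ln d}\right) }\le \frac{\alpha C'd}{\frac{c''\alpha }{\ln d}}=\frac{C'd\ln d}{c''}=O(d\ln d). \end{aligned}$$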

Let \(i_{\textrm{max}}\) be the smallest index such that \(e(B_i)>\frac{3\epsilon n}{2}-1\), let \(\alpha _0=1\) and let \(\alpha _j=\frac{\alpha _0}{2^j}\). Then, there is a smallest index \(j_{\textrm{max}}\) such that

$$\begin{aligned}{}[i_{\textrm{max}}]=\bigcup _{j=1}^{j_{\textrm{max}}}I(\alpha _j). \end{aligned}$$

Furthermore, there is some constant \(C''\) such that if we let \(\alpha _{\textrm{max}}=\frac{C''\ln \left( \frac{1}{\epsilon }\right) }{d}\), then \(\exp \left( (1-\alpha _{\textrm{max}})C'd\right) =\frac{3\epsilon n}{2}\). Since \(\alpha _j=\frac{\alpha _0}{2^{j}}\), it follows that

$$\begin{aligned} j_{\textrm{max}} \le \Bigg \lceil \log _2\left( \frac{d}{C''\ln \left( \frac{1}{\epsilon }\right) }\right) \Bigg \rceil =O(\ln d). \end{aligned}$$

Thus,

$$\begin{aligned} i_{\textrm{max}} \le j_{\textrm{max}} \cdot \max _{j \le j_{\textrm{max}}} |I(\alpha _j)| = O(d \ln ^2 d). \end{aligned}$$

Therefore it follows that there is some constant \(K>0\) such that for every \(v\in V(L_1)\),

$$\begin{aligned} e\left( B(v,Kd\ln ^2d)\right) \ge e\left( B(v,(K-1)d\ln ^2d)\right) \ge \frac{3\epsilon n}{2}-1 \ge \frac{|E(L_1)|}{2}. \end{aligned}$$

Since \(L_1\) is connected, we have that \(e\left( B(v,Kd\ln ^2d+1)\right) >\frac{|E(L_1)|}{2}\).

Thus, we can cover more than half of \(E(L_1)\) within a ball of radius \(O(d\ln ^2d)\) from any vertex \(v\in V(L_1)\). In particular, any two such balls must share an edge, and therefore the diameter of \(L_1\) is \(O(d\ln ^2d)\). \(\square \)

6.2 Proof of Theorem 6(b)

We start with some definitions and brief background (see [56] for a more comprehensive introduction to Markov chains and mixing time). Given a graph G, the lazy simple random walk on G is a Markov chain starting at a vertex \(v_0\) chosen according to some distribution \(\sigma \), such that for any vertex \(v\in V(G)\) the walk stays at v with probability \(\frac{1}{2}\), and otherwise moves to a uniformly chosen random neighbour u of v. Hence, the transition probability from v to u satisfies \(\mathbb {P}(v\rightarrow u)=\frac{1}{2d(v)}\). If G is connected, then this Markov chain is irreducible and ergodic, and as such has a unique stationary distribution \(\pi \), which can be seen to be given by \(\pi (v)=\frac{d(v)}{2e(G)}\) for each \(v\in V(G)\). We are interested in estimating how quickly this Markov chain converges to its limit distribution. For that, recall that the total variation distance \(d_{TV}\) between two distributions \(p_1\) and \(p_2\) on V(G) is defined by

$$\begin{aligned} d_{TV}(p_1,p_2):=\max _{A\subset V(G)}\bigg |p_1(A)-p_2(A)\bigg |. \end{aligned}$$

Let \(P^t(v,\cdot )\) denote the distribution on V(G) given by starting the lazy random walk at \(v\in V(G)\) and running for t steps. If we define \(d(t):=\max _{v\in V(G)}d_{TV}\left( P^t(v,\cdot ),\pi \right) \), then the mixing time of the lazy random walk is then defined as \(t_{mix}:=\min \left\{ t:d(t)\le \frac{1}{4}\right\} .\) Now, for any \(S\subseteq V(G)\), let

$$\begin{aligned} \pi (S)&:=\sum _{v\in S}\pi (v)=\frac{2e(S)+e(S,S^C)}{2e(G)} \qquad \text { and } \\ Q(S)&:=\sum _{v\in S, u\in S^C}\pi (v)\mathbb {P}(v\rightarrow u)=\frac{e(S,S^C)}{4e(G)}. \end{aligned}$$

The conductance \(\Phi (S)\) of S is then given by

$$\begin{aligned} \Phi (S):=\frac{Q(S)}{\pi (S)\pi (S^C)}=\frac{e(S, S^C)}{2\left( 2e(S)+e(S,S^C)\right) \pi (S^C)}, \end{aligned}$$

where we note that since \(Q(S)=Q(S^C)\), we have that \(\Phi (S)=\Phi (S^C)\). Let \(\pi _{\min }=\min _{v\in V(G)}\pi (v)\). For \(\rho >\pi _{\min }\), we define

$$\begin{aligned} \Phi (\rho ):=\min \left\{ \Phi (S): S\subseteq V(G), \rho /2\le \pi (S)\le \rho , \text {S is connected in } G\right\} , \end{aligned}$$

where we set \(\Phi (\rho )=1\) if there is no such subset S. The following theorem, due to Fountoulakis and Reed [34], bounds the mixing time through the conductance of connected sets:

Theorem 6.3

(Theorem 1 of [34]). There exists an absolute constant \(K>0\) such that

$$\begin{aligned} t_{mix}\le K\sum _{j=1}^{\log _2\pi _{\min }^{-1}}\Phi ^{-2}\left( 2^{-j}\right) . \end{aligned}$$

Throughout the rest of this section, we consider the mixing time of the lazy random walk on the giant component \(L_1\) of \(G_p\). Below, e(S) will stand for \(e_{G_p}(S)\) and \(e\left( S,S^C\right) \) will stand for \(\big |\partial _{G_p}(S)\big |\).

We now aim to bound \(\Phi (\rho )\). We begin with the following simple observation.

Lemma 6.4

Whp, for any \(S\subseteq V(L_1)\) such that \(G_p[S]\) is connected and \(\pi (S)\ge \frac{100\ln C \cdot d}{\epsilon ^3n}\), we have that \(|S|\ge \frac{10d \ln C}{\epsilon ^2}\).

Proof

Given S satisfying the conditions of the lemma, it follows that \(2e(S)+e(S,S^C) = 2e(L_1)\pi (S)\ge \frac{200\ln C \cdot d}{\epsilon ^3n}e(L_1)\). Since \(L_1\) is connected, by Theorem 1.2, whp \(e(L_1)\ge |V(L_1)|-1\ge \epsilon n\). In particular, whp \(2e(S)+e(S,S^C)\ge \frac{200\ln C \cdot d}{\epsilon ^2}\), and so by Lemma 6.1, whp

$$\begin{aligned} \frac{200\ln C \cdot d}{\epsilon ^2}\le 2e(S)+e(S,S^C)\le 2\left( e(S)+e(S,S^C)\right) \le \max \{ 20|S|, 40\ln C \cdot d\}. \end{aligned}$$

Since \(\epsilon \) is sufficiently small, the maximum above must be \(20|S|\), and hence \(|S|\ge \frac{10\ln C \cdot d}{\epsilon ^2}\), as required. \(\square \)

We now show that for wide ranges of \(\rho \), we can apply Theorem 4(a) and (b). We begin by relating bounds on \(\pi (S)\) to those on \(\Phi (S)\).

Lemma 6.5

There exists a constant \(c>0\) such that whp, for every \(S\subseteq V(L_1)\) with \(G_p[S]\) connected and \(\frac{100\ln C \cdot d}{\epsilon ^3 n}\le \pi (S)\le \frac{1}{2}\),

$$\begin{aligned} \Phi (S)\ge \frac{c\ln \left( \frac{n}{|S|}\right) }{d\ln d}. \end{aligned}$$

Proof

Since \(\pi (S)=\frac{2e(S)+e(S,S^C)}{2e(L_1)}\le \frac{1}{2}\), it follows that \(e(S) \le \frac{e(L_1)}{2}\), as otherwise we have \(\pi (S)>\frac{1}{2}\). Furthermore, by Lemma 6.2, whp \(e(L_1)<3\epsilon n\) and thus whp \(e(S)<\frac{3\epsilon n}{2}\). Since \(G_p[S]\) is connected, we have that \(|S|\le 1+e(S)\). Therefore, whp \(|S|\le \frac{3\epsilon n}{2}\). On the other hand, since \(\pi (S)\ge \frac{100\ln C\cdot d}{\epsilon ^3n}\), by Lemma 6.4, whp \(|S|\ge \frac{10d \ln C}{\epsilon ^2}\).

Altogether, we have that whp \(\frac{10\ln C \cdot d}{\epsilon ^2}\le |S| \le \frac{3\epsilon n}{2}\). Thus, by Theorem 4(a) and (b), there exists a constant \(c'>0\) such that whp \(e(S,S^C)\ge \frac{c'\ln \left( \frac{n}{|S|}\right) |S|}{d\ln d}\), and by Lemma 6.1 we have that whp \(2e(S)+e(S,S^C)\le 2\left( e(S)+e(S,S^C)\right) \le 20|S|\). Therefore, since \(\pi (S^C)\le 1\), with \(c=\frac{c'}{40}\), whp

$$\begin{aligned} \Phi (S)=\frac{e(S,S^C)}{2\left( 2e(S)+e(S,S^C)\right) \pi (S^C)}\ge \frac{c'\ln \left( \frac{n}{|S|}\right) |S|}{2\cdot 20|S|\cdot d\ln d}\ge \frac{c\ln \left( \frac{n}{|S|}\right) }{d\ln d}. \end{aligned}$$

\(\square \)

Before applying Theorem 6.3, we estimate \(\Phi (2^{-j})\) for wide ranges of values of j using Lemma 6.5.

Lemma 6.6

Let j be an integer such that \(\frac{200 \ln C \cdot d}{\epsilon ^3 n}\le 2^{-j}\le \frac{1}{2}\). Then there exists a constant \(c>0\) such that whp

$$\begin{aligned} \Phi (2^{-j})\ge \frac{cj}{d\ln d}. \end{aligned}$$

Proof

Let \(\mathcal {S} = \left\{ S\subseteq V(L_1): 2^{-j-1}\le \pi (S)\le 2^{-j}, L_1[S] \text { is connected}\right\} \). Since \(2^{-j} \ge \frac{200 \ln C \cdot d}{\epsilon ^3 n}\), for all \(S \in \mathcal {S}\), \(\pi (S) \ge \frac{100 \ln C \cdot d}{\epsilon ^3 n}\) and so by Lemma 6.5, whp

$$\begin{aligned} \Phi (S) \ge \frac{c'\ln \left( \frac{n}{|S|}\right) }{d\ln d}, \end{aligned}$$

for \(c'{:}{=}c_{6.5}\), where \(c_{6.5}\) is the constant whose existence is guaranteed by Lemma 6.5. It follows that whp

$$\begin{aligned} \Phi \left( 2^{-j}\right) = \min \left\{ \Phi (S): S \in \mathcal {S} \right\} \ge \min \left\{ \frac{c'\ln \left( \frac{n}{|S|}\right) }{d\ln d}: S \in \mathcal {S} \right\} . \end{aligned}$$
(12)

However, for all \(S \in \mathcal {S}\), since \(\pi (S) \ge \frac{100 \ln C \cdot d}{\epsilon ^3 n}\) it follows from Lemma 6.4 that \(|S| \ge \frac{10 d \ln C}{\epsilon ^2}\). Moreover, since every vertex of \(L_1\) has degree at least one, \(2e(S)+e(S,S^C)\ge |S|\), and by Lemma 6.2 whp \(e(L_1)<3\epsilon n\). Hence, whp for all \(S \in \mathcal {S}\), \(\pi (S)=\frac{2e(S)+e(S,S^C)}{2e(L_1)}\ge \frac{|S|}{6\epsilon n}\) and so

$$\begin{aligned} |S| \le 6\epsilon n \pi (S) \le 6\cdot 2^{-j}\epsilon n. \end{aligned}$$
(13)

Therefore, by (12) and (13), and since \(\epsilon \le \frac{1}{6}\), whp

$$\begin{aligned} \Phi \left( 2^{-j}\right) \ge \frac{c'\ln \left( \frac{2^{j}}{6\epsilon }\right) }{d\ln d} \ge \frac{c'\ln 2\cdot j}{d\ln d}{=}{:}\frac{cj}{d\ln d}. \end{aligned}$$

\(\square \)

We are now ready to prove Theorem 6(b).

Proof of Theorem 6(b)

By Theorem 6.3, we have that there exists an absolute constant \(K>0\) such that

$$\begin{aligned} t_{mix}\le K\sum _{j=1}^{\log _2\pi _{\min }^{-1}}\Phi ^{-2}\left( 2^{-j}\right) . \end{aligned}$$
(14)

Let \(j_{\textrm{max}}\) be the largest integer such that \(2^{-j_{\textrm{max}}}\ge \frac{200d \ln C}{\epsilon ^3 n}\), noting that \(j_{\textrm{max}}\le \log _2(\epsilon ^3n)\le \log _2C\cdot d\). Then by Lemma 6.6, whp for \(1\le j \le j_{\textrm{max}}\), we have that \(\Phi ^{-2}\left( 2^{-j}\right) \le \frac{d^2\ln ^2d}{c^2j^2}\). Thus,

$$\begin{aligned} \sum _{j=1}^{\log _2\pi _{\min }^{-1}}\Phi ^{-2}\left( 2^{-j}\right)&\le \sum _{j=1}^{j_{\textrm{max}}}\Phi ^{-2}\left( 2^{-j}\right) + \sum _{j=j_{\textrm{max}}}^{\log _2\pi _{\min }^{-1}}\Phi ^{-2}\left( 2^{-j}\right) \nonumber \\&\le \sum _{j=1}^{\log _2C\cdot d}\frac{d^2\ln ^2d}{c^2j^2}+\sum _{j=j_{\textrm{max}}}^{\log _2\pi _{\min }^{-1}}\Phi ^{-2}\left( 2^{-j}\right) . \end{aligned}$$
(15)

We note that

$$\begin{aligned} \sum _{j=1}^{\log _2C\cdot d}\frac{d^2\ln ^2d}{c^2j^2}=O(d^2\ln ^2d), \end{aligned}$$
(16)

since \(\sum _{j\ge 1}\frac{1}{j^2}=O(1)\). Let us now estimate \(\sum _{j=j_{\textrm{max}}}^{\log _2\pi _{\min }^{-1}}\Phi ^{-2}\left( 2^{-j}\right) \). Since \(L_1\) is connected, and by Lemma 6.2, whp \(e(L_1) < 3\epsilon n\), whp for every nonempty proper subset \(S\subseteq V(L_1)\) we have that

$$\begin{aligned} \Phi (S)=\Phi (S^c)\ge \frac{1}{4e(L_1)\pi (S)} \ge \frac{1}{12\epsilon n \pi (S)} . \end{aligned}$$

Hence, whp for any S with \(\pi (S) \le 2^{-j}\), \(\Phi (S) \ge \frac{2^j}{12\epsilon n}\), and so \(\Phi \left( 2^{-j}\right) \ge \frac{2^j}{12\epsilon n}\). Therefore, since by the maximality of \(j_{\textrm{max}}\) we have \(2^{-j_{\textrm{max}}}\le \frac{400d \ln C}{\epsilon ^3 n}\), whp

$$\begin{aligned} \sum _{j=j_{\textrm{max}}}^{\log _2\pi _{\min }^{-1}}\Phi ^{-2}\left( 2^{-j}\right) \le 2 \left( \frac{12\epsilon n}{2^{j_{\textrm{max}}}}\right) ^2 \le 2\left( \frac{12 \epsilon n \cdot 400 d\ln C}{\epsilon ^3 n}\right) ^2=O(d^2). \end{aligned}$$
(17)

Altogether, by (14), (15), (16) and (17) we obtain

$$\begin{aligned} t_{mix}\le K\left( \sum _{j=1}^{\log _2C\cdot d}\frac{d^2\ln ^2d}{c^2j^2}+\sum _{j=j_{\textrm{max}}}^{\log _2\pi _{\min }^{-1}}\Phi ^{-2}\left( 2^{-j}\right) \right) =O(d^2\ln ^2d)+O(d^2)=O(d^2\ln ^2d). \end{aligned}$$

\(\square \)

7 Discussion and Open Questions

In this paper, we give edge-isoperimetric bounds for high-dimensional product graphs, from which we are able to derive almost-tight bounds on the likely expansion properties of the giant component in supercritical percolation on these graphs, as well as several almost-tight structural consequences of these expansion properties. However, many interesting open questions remain, both in terms of the isoperimetric properties of these graphs and in terms of the typical structure of the giant component, and we mention a few of these below.

7.1 Isoperimetry in Product Graphs

As mentioned in the introduction, Theorems 1 and 2 generalise the edge-isoperimetric inequality of the hypercube, and are tight in this case for sets of size \(2^k\). In fact, more generally, Theorem 1 is tight for small sets up to a \((1+o(1))\) multiplicative factor, and the consequence of Theorem 2 that \(i_k(G)=\Omega \left( \ln \left( \frac{n}{k}\right) \right) \) recovers, up to a constant multiplicative factor, known tight isoperimetric inequalities for many of the families of product graphs for which the edge-isoperimetric problem has been studied (see [14]).

Moreover, it is not too hard to see that, under the assumption that the base graphs are all isomorphic, \(i_k(G)=\Theta \left( \ln \left( \frac{n}{k}\right) \right) \) for all k. Indeed, it is easy to verify that for \(k=C^i\), i-dimensional projections of G—that is, induced subgraphs on a vertex set of the form \(V_1 \times V_2 \times \cdots \times V_t\) where each \(V_j\) is either \(V(G^{(j)})\) or a singleton \(\{v_j\} \subseteq V(G^{(j)})\)—will have order k and edge-boundary of order \(O\left( k\ln \left( \frac{n}{k}\right) \right) \). With a slightly more careful inductive argument, it can be shown that such a bound holds for intermediate values of k as well. It is thus natural to ask about the leading constant.
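To illustrate the first claim: if S is such a projection, with the coordinates in a set \(J\subseteq [t]\) fixed and the remaining coordinates free, then \(k=|S|=\prod _{j\notin J}|V(G^{(j)})|\) and \(\frac{n}{k}=\prod _{j\in J}|V(G^{(j)})|\ge 2^{|J|}\). Since every edge of G changes exactly one coordinate, the edges leaving S are precisely those changing a fixed coordinate, and hence, writing \(d_j\) for the degree of the regular base graph \(G^{(j)}\),

$$\begin{aligned} |\partial (S)|=|S|\sum _{j\in J}d_j\le (C-1)|J|\cdot k\le (C-1)k\log _2\left( \frac{n}{k}\right) =O\left( k\ln \left( \frac{n}{k}\right) \right) . \end{aligned}$$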

Question 7.1

Let H be a connected regular graph, and for all \(j\in [t]\), let \(G^{(j)}=H\). Let \(G = \square _{j=1}^tG^{(j)}\) and let \(n{:}{=}|V(G)|\). Are there constants \(c{:}{=}c(H)\) and \(K {:}{=}K(H)\) such that for all \(1\le k \le \frac{n}{2}\),

$$\begin{aligned} i_k(G)= (1+o(1))c\log _2 \left( \frac{n}{k}\right) \pm K? \end{aligned}$$

A natural conjecture, given the edge-boundary of i-dimensional projections of G, would be that we can take \(c = d(H)\), which would agree with the known bounds in the case of the hypercube.

More generally, and very ambitiously, since we are interested in the asymptotics as \(t \rightarrow \infty \), and for any fixed C there is only a finite set \(\{H_1,\ldots , H_m\}\) of graphs on at most C vertices, we could ask the analogue of Question 7.1 in the limit as the proportion of the number of base graphs \(G^{(i)}\) that are equal to a particular graph \(H_i\) converges to a limit \(\alpha _i\) for each i, although it seems likely that this is a difficult optimisation problem.

In the case of the hypercube the edge-isoperimetric problem has in fact been fully solved—for each \(k \le 2^d\) it is known precisely which k-sets S minimise its edge-boundary \(\partial (S)\), and it is even known that one can choose a nested sequence of optimal sets, which then interpolate between subcubes of dimension k for each \(k \le d\). This is known to hold more generally for many other product graphs, see [14], although there are examples, such as the d-dimensional torus for cycles of length larger than five [24], where there is no nested sequence of optimisers.

For more general high-dimensional product graphs, again restricting ourselves first to the case of identical base graphs for simplicity’s sake, it is natural to ask if optimal sets are given again by appropriately chosen projections of G.

Question 7.2

Let H be a connected regular graph and for all \(j\in [t]\), let \(G^{(j)}=H\). Let \(G = \square _{j=1}^tG^{(j)}\). Given \(k\le t\), under what conditions on H is there a choice of vertices \(v_{k,1},\ldots ,v_{k,k}\) such that the minimal edge-boundary of a subset of size \(|V(H)|^{t-k}\) in G is achieved by a set of the form

$$\begin{aligned} S_k = \{v_{k,1}\} \times \{v_{k,2}\} \times \cdots \times \{v_{k,k}\} \times V(H) \times V(H) \times \cdots \times V(H)? \end{aligned}$$

Furthermore, under what conditions on H can the vertices \(\{v_{k,j}:j\le k\}\) be chosen such that \(v_{k,j} = v_{k',j}\) for all \(k,k'\ge j\), so that the \(S_k\) form a nested family?

Finally, the vertex-isoperimetric problem has also been fully solved in the hypercube, see [41], where optimal sets are given by Hamming balls. It is less easy to give an explicit lower bound for the vertex-boundary of a set of size k as in Theorem 1.1, but roughly the vertex-expansion factor is a decreasing function of k, which is \(\Omega (d)\) for small sets and shrinks to \(\Omega \left( \frac{1}{\sqrt{d}}\right) \) for linear-sized sets. It would be interesting to determine if the solution to the vertex-isoperimetric problem in high-dimensional regular product graphs has similar asymptotic behaviour.
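For comparison, the \(\frac{1}{\sqrt{d}}\) order of magnitude for linear-sized sets in the hypercube is witnessed, for instance, by a Hamming ball S containing exactly half of the vertices (of radius \(\frac{d-1}{2}\), for odd d), whose vertex-boundary is the middle layer, so that, by Stirling's approximation,

$$\begin{aligned} \frac{|N(S)|}{|S|}=\frac{\left( {\begin{array}{c}d\\ \frac{d+1}{2}\end{array}}\right) }{2^{d-1}}=\Theta \left( \frac{1}{\sqrt{d}}\right) . \end{aligned}$$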

Question 7.3

Let \(C>1\) be an integer. For all \(j \in [t]\), let \(G^{(j)}\) be a \(d_j\)-regular connected graph with \(1<|V(G^{(j)})|\le C\). Let \(G=\square _{j=1}^tG^{(j)}\), let \(n{:}{=}|V(G)|\) and let \(d {:}{=}\sum _{j=1}^t d_j\).

  • Is it true that for all sets \(S \subseteq V(G)\) of size \(|S| \le \frac{n}{2}\), \(|N_G(S)| = \Omega \left( \frac{|S|}{\sqrt{d}}\right) \)?

  • How does the function \(\displaystyle \hat{i}_k(G) := \min _{S \subseteq V(G), |S|=k} \left\{ \frac{|N_G(S)|}{|S|} \right\} \) behave for general k?

7.2 Percolation in High-Dimensional Product Graphs

Moving on to the topic of percolation, as mentioned in the introduction, it has been shown [29] that for a large class of high-dimensional product graphs the phase transition that they undergo around the percolation threshold is quantitatively similar to that which occurs in the binomial random graph G(n,p), a phenomenon that has been observed in many other random subgraph models and which can be viewed as a sort of universality property of G(n,p). Using the standard notation of \(\tilde{\Theta }\) to denote the \(\Theta \) Landau notation while suppressing logarithmic factors, in this paper we show that, as in the giant component of G(n,p), in percolation on a high-dimensional product graph with degree d and order n the typical mixing time of a lazy random walk on \(L_1\) is \(\tilde{\Theta }(d^2) = \tilde{\Theta }((\log n)^2)\), and the likely diameter of \(L_1\) is \(\tilde{\Theta }(d)= \tilde{\Theta }(\log n)\). From this point of view it is natural to ask what other parameters of these models, when appropriately scaled, resemble those in G(n,p). In particular, a well-known result of Ajtai, Komlós, and Szemerédi [2] states that whp a supercritical binomial random graph G(n,p) contains a path and cycle of length \(\Omega (n)\). Indeed, in a recent work, it was shown [27] that \(Q^d_{\frac{1}{2}+\epsilon }\) contains whp a Hamiltonian cycle. Finding a cycle spanning a linear fraction of the vertices in the case of a supercritical subgraph of the hypercube remains open. Note that [27] poses several questions about the typical maximum length of a cycle in \(Q^d_p\) for various regimes of \(p{:}{=}p(d)\).

Question 7.4

Let \(G = \square _{j=1}^tG^{(j)}\) be a product graph all of whose base graphs are connected, regular and of bounded order. Let \(d{:}{=}d(G)\), \(n{:}{=}|V(G)|\), \(\epsilon >0\) and let \(p= \frac{1+\epsilon }{d}\). Does \(G_p\) whp contain a cycle or a path of length \(\Theta (n)\)?

Remark 7.5

We note that finding a path of length \(\Theta (n)\) in \(Q^d_p\) implies the likely existence of a cycle of the same order of magnitude in \(Q^d_p\). Indeed, one can start by taking a path \(P_0\) of length \(\Theta (n)\) in the giant component of \(\left( Q_0\right) _p\), where \(Q_0\) is the subcube of \(Q^d\) obtained by fixing the first coordinate to be 0. Considering the projection of the first and last \(\frac{|P_0|}{10}\) vertices of this path into the subcube of \(Q^d\) obtained by fixing the first coordinate to be 1, \(Q_1\), one can utilise similar methods to Lemmas 5.1 and 5.2 to show that at least one of the first \(\frac{|P_0|}{10}\) vertices and one of the last \(\frac{|P_0|}{10}\) vertices of this path will belong whp to the giant component in \((Q_1)_p\), and thus will have a path connecting them, closing a cycle of length \(\Theta (n)\) with most of the vertices of \(P_0\). This argument generalises easily to a product graph all of whose base graphs are connected, regular and of bounded order.

Theorem 6(c) shows that \(L_1\) contains whp a cycle of length \(\Omega (nd^{-1}\log ^{-1}d)\), and by the comment after Theorem 5, up to the logarithmic factor in d, this result is the best possible that one can derive directly from the expansion properties of \(L_1\). It seems likely that to settle this question, even in the case of the hypercube, new methods will be required.

Finally, it would be interesting to determine whether the logarithmic factors in d that appear in our bounds for the asymptotic mixing time and the likely diameter are necessary, or whether they can be improved, or even removed entirely, thus mirroring the picture in the supercritical G(n,p). It is worth noting that, unlike in the application of the methods of [35] in G(n,p), in randomly perturbed graphs [54], and in pseudo-random graphs [31], the bottleneck in our bound on the mixing time here comes from our bound on the typical expansion of large connected subsets, rather than small subsets.