You live on the volcanic archipelago of Riddleria. Your archipelago is connected via a network of bridges, forming one unified community. In an effort to conserve resources, the ancient Riddlerians who built this network opted not to build bridges between any two islands that were already connected to the community otherwise. Hence, there is exactly one path from any one island to any other island.

Each island contains exactly one volcano. You know that if a volcano erupts, the subterranean pressure change will be so great that the volcano will collapse in on itself, causing its island — and any connected bridges — to crumble into the ocean. Remarkably, other islands will be spared unless their own volcanoes erupt. But if enough bridges go down, your once-unified archipelagic community could split into several smaller, disjointed communities.

If there were N islands in the archipelago originally and each volcano erupts independently with probability p, how many disjointed communities can you expect to find when you return? What value of p maximizes this number?

Here is an example of a possible archipelago with $N=9$ islands:

Of course, there are many possible ways to connect the bridges between the islands, but it turns out, this does not actually change the answer to the question! A useful way of thinking about this problem is using the language of graph theory, where we imagine that the archipelago is a graph where the islands are the vertices and the bridges are the edges. Here is the problem, rewritten using graph terminology:

We are given a tree. Each vertex is removed with probability $p$ and when a vertex is removed, so are all incident edges. What is the expected number of connected components in the resulting forest?

We can now leverage a key property of forests:

\[

(\text{# of edges}) + (\text{# of connected components}) = (\text{# of vertices})

\]In particular, if a forest has $N$ vertices and $N-1$ edges, it must have a single connected component, i.e. it’s a tree! You can check this with the tree shown above; it has $9$ vertices and $8$ edges.

This property is very powerful because it tells us exactly what happens when we delete edges and/or vertices from a tree: deleting an edge increases the number of connected components by one, and if we delete an isolated vertex (a vertex with no incident edges), the number of connected components decreases by one.

We now have everything we need to solve the problem. We can write:

\begin{align}

&\mathbb{E}(\text{connected components}) \\

&= \mathbb{E}(\text{vertices})-\mathbb{E}(\text{edges}) \\

&= (N-\mathbb{E}(\text{vertices removed}))-(N-1-\mathbb{E}(\text{edges removed}))\\

&= 1+\mathbb{E}(\text{edges removed})-\mathbb{E}(\text{vertices removed})\\

&= 1+(N-1)\mathrm{Prob}(\text{one edge removed})-N\,\mathrm{Prob}(\text{one vertex removed})\\

&= 1+(N-1)(2p-p^2)-Np \\

&= (1-p)(1-p+Np)

\end{align}In the third-to-last line, we used linearity of expectation. Each vertex has the same probability of being removed ($p$), and each edge also has the same probability of being removed, which is the probability that either of its adjacent vertices of removed, or one minus the probability neither of its vertices are removed: $1-(1-p)^2 = 2p-p^2$.

You might be thinking: but the probability that any two edges disappear is correlated! And it also depends on the connectivity pattern! Correct! but this does not affect the expectation. Linearity of expectation holds for any sum of random variables, even when those variables are correlated.

So here is the final result:

$\displaystyle

\mathbb{E}(\text{# of communities}) = (1-p)(1-p+Np)

$

### Maximizing the number of communities

First of all, does this question even make sense? Let’s do a thought experiment. When $p$ is low, there are very few volcanoes, so the archipelago will not be disturbed much (will remain highly connected), so we should obtain a small number of communities. When $p$ is high, most islands disappear, so again we are left with a small number of communities. One might expect that with an intermediate value of $p$, we can maximize the number of communities. The formula we derived before is a concave quadratic function of $p$, and we can maximize it with respect to $p$ by setting the first derivative equal to zero (or completing the square). We obtain:

$\displaystyle

\underset{p}{\mathrm{max}}\,\, \mathbb{E}(\text{# of communities}) = \frac{N^2}{4(N-1)},

\qquad p_{\text{max}} = \frac{N-2}{2(N-1)}

$

As $N$ gets very large, the limiting behavior looks like:

$\displaystyle

\underset{p}{\mathrm{max}}\,\, \mathbb{E}(\text{# of communities}) \approx \frac{N}{4},

\qquad p_{\text{max}} \approx \frac{1}{2}

$

So for a very large archipelago, the probability of volcanic eruption that maximizes the number of communities is $\tfrac{1}{2}$, and the expected number of communities will grow linearly with the number of islands we had initially.

### What about distributions rather than expectation?

The natural question to ask after we’ve computed the expected number of communities is to ask about the *distribution* of the communities. In other words, what are the probabilities associated with obtaining communities of different sizes? It turns out this question doesn’t make any sense unless we know how the islands are connected. For example, consider the simple case of four islands, and two possible ways of connecting them:

In the graph on the right, if we remove vertex $A$, we will have $3$ communities remaining. But in the graph on the left, no matter which combinations of vertices we remove, it’s impossible to obtain $3$ communities. This simple example shows that the distribution of communities must depend on the particular way in which the vertices are connected. Amazingly, though, the expected number of communities is the same no matter how the graph is connected.

Neat! I reached the same answer with a very different method, considering the original spanning tree, picking an arbitrary node as the root, creating indicator random variables for each node (1 if it’s a root in the post-eruption spanning forest, zero if not), and using linearity of expectation. The only wrinkle is that you have to take special care with the root of the original tree. But it’s nice in that you can think only about the vertices. 🙂