Suppose people walk into a room, one at a time. Their birthdays happen to be randomly distributed throughout the 365 days of the year (and no one was born on a leap day). The moment two people in the room have the same birthday, no more people enter the room and everyone inside celebrates by eating cake, regardless of whether that common birthday happens to be today.

On average, what is the expected number of people in the room when they eat cake?

*Extra credit:* Suppose everyone eats cake the moment three people in the room have the same birthday. On average, what is this expected number of people?

Suppose there are $n$ days in a year, so $n$ different possible birthdays. If there are $k$ people in the room when they eat cake, then this means the first $k-1$ people had different birthdays, and the $k^\text{th}$ person had a birthday that matched one of the first $k-1$ birthdays. The number of ways this can happen is:

\[

\underbrace{n (n-1) (n-2) \cdots (n-k+2)}_{k-1\text{ terms}} \cdot (k-1)

= \frac{n!(k-1)}{(n-k+1)!}

\]Note that we must have $1 \leq k \leq n+1$ because if $k$ is any larger than $n+1$, the first $k-1$ people cannot have distinct birthdays and the product above is zero. Meanwhile, the total number of birthday combinations is $n^k$. So the expected number of people in the room when they eat cake is given by:

$\displaystyle

E_2 = \sum_{k=1}^{n+1} \frac{n!\cdot k(k-1)}{(n-k+1)! \cdot n^k}

$

Unfortunately, this sum does not have a closed-form solution, but we can evaluate it numerically. When $n=365$, we find that the expected number of people in the room is $24.61659$.

This particular sum was studied by Ramanujan, and he found that the sum has an asymptotic expansion (as a function of $n$) that grows like

\[

\sum_{k=1}^{n+1} \frac{n!\cdot k(k-1)}{(n-k+1)! \cdot n^k}

\approx \sqrt{\frac{\pi n}{2}} + \frac{2}{3} + \frac{1}{12}\sqrt{\frac{\pi}{2n}} + \cdots

\]This approximation is quite good. In fact, when $n=365$, we have:

\[

\sqrt{\frac{\pi n}{2}} + \frac{2}{3} = 24.6112

\quad\text{and}\quad

\sqrt{\frac{\pi n}{2}} + \frac{2}{3} + \frac{1}{12}\sqrt{\frac{\pi}{2n}} = 24.6167

\]which are both close to the true value.

### Extra credit

When we look for 3 matched birthdays, the counting exercise becomes a lot more complicated. Let’s suppose that there are $\ell$ pairs of matched birthdays before the $k^\text{th}$ person enters the room. To count the number of ways this can happen, we have:

- $\binom{n}{\ell}$ ways to pick the $\ell$ days of the year that will have paired birthdays.
- $\binom{k-1}{2\ell}$ ways to pick the $2\ell$ people that have a paired birthday.
- $\binom{2\ell}{\underbrace{2,\dots,2}_{\ell\text{ times}}} = \frac{(2\ell)!}{2^\ell}$ ways to assign the $\ell$ pairs of birthdays to $2\ell$ people.
- $\underbrace{(n-\ell)(n-\ell-1)\cdots(n+\ell-k+2)}_{k-1-2\ell\text{ terms}} = \frac{(n-\ell)!}{(n+\ell-k+1)!}$ ways to assign unique birthdays to the rest of the first $k-1$ people.
- $\ell$ ways for the $k^\text{th}$ person to match one of the $\ell$ pairs.

In terms of sum ranges, we must have $1 \leq \ell \leq \frac{k-1}{2}$, since there must be at least one matched pair in the first $k-1$ people, and $2\ell \leq k-1$ because these paired birthdays cannot outnumber the total number of people. Therefore, the expected number of people in the room when there are 3 matched birthdays is:

\[

E_3 = \sum_{k=1}^{n+1}\sum_{\ell=1}^{\left\lfloor\frac{k-1}{2}\right\rfloor}

\binom{n}{\ell}

\binom{k-1}{2\ell}

\frac{k\ell\cdot (n-\ell)!(2\ell)!}{(n+\ell-k+1)!\cdot 2^\ell n^k}

\]With a bit of effort, we can expand the binomials and simplify some of the factorial terms, and we obtain

$\displaystyle

E_3 = \sum_{k=1}^{n+1}\sum_{\ell=1}^{\left\lfloor\frac{k-1}{2}\right\rfloor}

\frac{n!\,k!}{(\ell-1)!\,(k-1-2\ell)!\,(n+\ell-k+1)!\cdot 2^\ell n^k}

$

As you might expect, this expression does not have a closed form either, but again, we can evaluate it numerically, and when $n=365$, we obtain $88.7389$.

### Numerical verification

I wrote a simple Python script to simulate the game and see if we can confirm our findings above.

from random import randint
from operator import countOf
import numpy as np
N = 100000 # number of trials
n = 365 # number of days
m = 2 # we celebrate when there are m repeated birthdays
lengths = []
for trials in range(N):
x = []
for k in range(1,n+2):
z = randint(1,n)
if countOf(x,z) == m-1:
lengths.append(k)
break
else:
x.append(z)
a = np.mean(lengths) # empirical mean
sem = np.std(lengths) / np.sqrt(N) # standard error
print(f'The expected number of people is {a} ± {1.96 * sem:.3f} (95% confidence)')

Running the above with $m=2$, I obtained $24.55923 ± 0.075$.

Running the above with $m=3$, I obtained $88.69134 ± 0.204$. The plus/minus represents a 95% confidence interval. Both simulations agree with the analytical expressions derived above.