This week’s Riddler Classic is a challenging counting problem about shared birthdays.
Suppose people walk into a room, one at a time. Their birthdays happen to be randomly distributed throughout the 365 days of the year (and no one was born on a leap day). The moment two people in the room have the same birthday, no more people enter the room and everyone inside celebrates by eating cake, regardless of whether that common birthday happens to be today.
On average, what is the expected number of people in the room when they eat cake?
Extra credit: Suppose everyone eats cake the moment three people in the room have the same birthday. On average, what is this expected number of people?
My solution:
[Show Solution]
Suppose there are $n$ days in a year, so $n$ different possible birthdays. If there are $k$ people in the room when they eat cake, then this means the first $k-1$ people had different birthdays, and the $k^\text{th}$ person had a birthday that matched one of the first $k-1$ birthdays. The number of ways this can happen is:
\[
\underbrace{n (n-1) (n-2) \cdots (n-k+2)}_{k-1\text{ terms}} \cdot (k-1)
= \frac{n!(k-1)}{(n-k+1)!}
\]Note that we must have $1 \leq k \leq n+1$ because if $k$ is any larger than $n+1$, the first $k-1$ people cannot have distinct birthdays and the product above is zero. Meanwhile, the total number of birthday combinations is $n^k$. So the expected number of people in the room when they eat cake is given by:
$\displaystyle
E_2 = \sum_{k=1}^{n+1} \frac{n!\cdot k(k-1)}{(n-k+1)! \cdot n^k}
$
Unfortunately, this sum does not have a closed-form solution, but we can evaluate it numerically. When $n=365$, we find that the expected number of people in the room is $24.61659$.
This particular sum was studied by Ramanujan, and he found that the sum has an asymptotic expansion (as a function of $n$) that grows like
\[
\sum_{k=1}^{n+1} \frac{n!\cdot k(k-1)}{(n-k+1)! \cdot n^k}
\approx \sqrt{\frac{\pi n}{2}} + \frac{2}{3} + \frac{1}{12}\sqrt{\frac{\pi}{2n}} + \cdots
\]This approximation is quite good. In fact, when $n=365$, we have:
\[
\sqrt{\frac{\pi n}{2}} + \frac{2}{3} = 24.6112
\quad\text{and}\quad
\sqrt{\frac{\pi n}{2}} + \frac{2}{3} + \frac{1}{12}\sqrt{\frac{\pi}{2n}} = 24.6167
\]which are both close to the true value.
Extra credit
When we look for 3 matched birthdays, the counting exercise becomes a lot more complicated. Let’s suppose that there are $\ell$ pairs of matched birthdays before the $k^\text{th}$ person enters the room. To count the number of ways this can happen, we have:
- $\binom{n}{\ell}$ ways to pick the $\ell$ days of the year that will have paired birthdays.
- $\binom{k-1}{2\ell}$ ways to pick the $2\ell$ people that have a paired birthday.
- $\binom{2\ell}{\underbrace{2,\dots,2}_{\ell\text{ times}}} = \frac{(2\ell)!}{2^\ell}$ ways to assign the $\ell$ pairs of birthdays to $2\ell$ people.
- $\underbrace{(n-\ell)(n-\ell-1)\cdots(n+\ell-k+2)}_{k-1-2\ell\text{ terms}} = \frac{(n-\ell)!}{(n+\ell-k+1)!}$ ways to assign unique birthdays to the rest of the first $k-1$ people.
- $\ell$ ways for the $k^\text{th}$ person to match one of the $\ell$ pairs.
In terms of sum ranges, we must have $1 \leq \ell \leq \frac{k-1}{2}$, since there must be at least one matched pair in the first $k-1$ people, and $2\ell \leq k-1$ because these paired birthdays cannot outnumber the total number of people. Therefore, the expected number of people in the room when there are 3 matched birthdays is:
\[
E_3 = \sum_{k=1}^{n+1}\sum_{\ell=1}^{\left\lfloor\frac{k-1}{2}\right\rfloor}
\binom{n}{\ell}
\binom{k-1}{2\ell}
\frac{k\ell\cdot (n-\ell)!(2\ell)!}{(n+\ell-k+1)!\cdot 2^\ell n^k}
\]With a bit of effort, we can expand the binomials and simplify some of the factorial terms, and we obtain
$\displaystyle
E_3 = \sum_{k=1}^{n+1}\sum_{\ell=1}^{\left\lfloor\frac{k-1}{2}\right\rfloor}
\frac{n!\,k!}{(\ell-1)!\,(k-1-2\ell)!\,(n+\ell-k+1)!\cdot 2^\ell n^k}
$
As you might expect, this expression does not have a closed form either, but again, we can evaluate it numerically, and when $n=365$, we obtain $88.7389$.
Numerical verification
I wrote a simple Python script to simulate the game and see if we can confirm our findings above.
from random import randint from operator import countOf import numpy as np N = 100000 # number of trials n = 365 # number of days m = 2 # we celebrate when there are m repeated birthdays lengths = [] for trials in range(N): x = [] for k in range(1,n+2): z = randint(1,n) if countOf(x,z) == m-1: lengths.append(k) break else: x.append(z) a = np.mean(lengths) # empirical mean sem = np.std(lengths) / np.sqrt(N) # standard error print(f'The expected number of people is {a} ± {1.96 * sem:.3f} (95% confidence)')
Running the above with $m=2$, I obtained $24.55923 ± 0.075$.
Running the above with $m=3$, I obtained $88.69134 ± 0.204$. The plus/minus represents a 95% confidence interval. Both simulations agree with the analytical expressions derived above.