This holiday-themed Riddler problem is about probability:

In Santa’s workshop, elves make toys during a shift each day. On the overhead radio, Christmas music plays, with a program randomly selecting songs from a large playlist.

During any given shift, the elves hear 100 songs. A cranky elf named Cranky has taken to throwing snowballs at everyone if he hears the same song twice. This has happened during about half of the shifts. One day, a mathematically inclined elf named Mathy tires of Cranky’s sodden outbursts. So Mathy decides to use what he knows to figure out how large Santa’s playlist actually is.

Help Mathy out: How large is Santa’s playlist?

Here is my solution:

[Show Solution]

This problem is short and sweet. The probability in question seems complicated, because there are so many different ways in which a song (or several songs) could repeat. This is made much easier by using complementarity. Namely:

\[

\mathrm{Prob}(\text{at least one song repeats}) = 1-\mathrm{Prob}(\text{no songs repeat})

\]And the probability that no songs repeat is simply the probability that all songs are different. This is much easier to calculate! Let’s call $p$ the probability that at least one song gets played twice and let’s say there are $n$ different songs on the playlist. Then we have:

\begin{align}

p &= 1-\mathrm{Prob}(\text{all }100\text{ songs are different})\\

&= 1-\frac{(\text{number of ways of picking }100\text{ different songs})}{(\text{number of ways of picking }100\text{ songs})}\\

&= 1-\frac{n(n-1)(n-2)\cdots(n-99)}{n^{100}} \\

&= 1-\binom{n}{100}\frac{100!}{n^{100}}

\end{align}Of course, this formula is only valid if $n\ge 100$. If there aren’t at least $100$ songs, then by the Pigeonhole Principle, a song getting picked twice is inevitable. The formula for the probability $p$ of hearing at least one repeated song as a function of the number of different songs $n$ is therefore:

$\displaystyle

p = \begin{cases}

1 & \text{if }1 \le n \le 99\\

1-\binom{n}{100}\frac{100!}{n^{100}} & \text{if }n \ge 100

\end{cases}

$

We can make a plot of this function to see how it changes as a function of $n$:

As expected, the probability remains at or near $1$ for awhile, and then drops once the number of songs on the playlist gets very large. Searching near $p=0.5$ as stated in the problem, the closest candidate $n$ values are:

\begin{align}

n=7174\quad&\implies\quad p=0.50002836 \\

n=7175\quad&\implies\quad p=0.49997983

\end{align}The closest to $p=0.5$ is attained by $n=7175$, so we’ll go with that!

I got the same answer, but I was surprised it was so high.

I really enjoy reading your solutions, Laurent. Happy holidays to you.

I arrived at this from the opposite end. After the first song plays, there is an $\frac{N – 1}{N}$ probability of a duplicate song playing; after the second, the probability of a duplicate song playing is $\frac{N-2}{N}$ and so on. Once $m$ songs have played, the probability of a duplicate song having played is thus $P_{dup} = \Pi_{k=0}^{m-1} \frac{N – k}{N}$. This doesn’t account for the fact that Grumpy throws a snowball at the radio if a duplicate plays, but that doesn’t matter for our purposes here.

It turns out that $\Pi_{k=0}^{m-1}(N-k)$ is defined as the “falling factorial” $(N)_m = N(N – 1)(N – 2) \cdots (N – (m – 1)$, which can be expressed in terms of the Gamma function as $(N)_k = \dfrac{\Gamma[N + 1]}{\Gamma[N – (m – 1)]}$. So our duplicate probability becomes $P_{dup} = \left(\frac{1}{N}\right)^m \dfrac{\Gamma[N + 1]}{\Gamma[N – m + 1]}$.

The nice thing about the Gamma function is that it retains its validity even for $N < m$ and will properly calculate the resultant zero probability of this case. If we want to simplify further, though, we can swap in factorials to end up with $P_{dup} = \left(\frac{1}{N}\right)^m \dfrac{N!}{(N – m)!}$ for $N \geq m$, which aligns with the complement of your result.

The slight downside here is that I had to rely on multi-precision arithmetic to actually compute these quantities (in Python) whereas a direct numerical evaluation of the product series gave the correct answer quickly and painlessly. Can't win them all.

My solution is like Laurent’s. I wrote code in Go to find the answer and run simulations to check it.

This is basically a twist on the oft-misunderstood birthday paradox, where it is mathematically proven that a random selection of 50 people has a 98%+ chance of having two people who share a birthday)

Since this requires such large numbers, I’ll pull up a spreadsheet and show my work later today or tomorrow. I think to solve this I’ll be adding the log10 of lots of numbers…

But the answer is x, where (x^100)/(x! / (x-100)!) = 0.5

This is exactly how I attacked the problem. Then, I used Stirling’s approximation to get rid of the factorials, and did some log manipulations to reduced it to a equation that was easily solved using a calculator and trial and error.