This Riddler puzzle is about the popular Secret Santa gift exchange game. Can we guess who our Secret Santa is?
The 41 FiveThirtyEight staff members have decided to send gifts to each other as part of a Secret Santa program. Each person is randomly assigned one of the other 40 people on the masthead to give a gift to, and they can’t give to themselves. After the Secret Santa is over, everybody naturally wants to find out who gave them their gift. So, each of them decides to ask up to 20 people who they were a Secret Santa for. If they can’t find the person who gave them the gift within 20 tries, they give up. (Twenty co-workers is a lot of co-workers to talk to, after all.) Each person asks and answers individually — they don’t tell who anyone else’s Secret Santa is. Also, nobody asks any question other than “Who were you Secret Santa for?”
If each person asks questions optimally, giving themselves the best chance to unmask their Secret Santa, what is the probability that everyone finds out who their Secret Santa was? And what is this optimal strategy? (Asking randomly won’t work, because only half the people will find their Secret Santa that way on average, and there’s about a 1-in-2 trillion chance that everyone will know.)
Here is my solution:
[Show Solution]
In order to generalize the problem, let’s assume there are $2n+1$ staff members in total, and each one is allowed to ask $n$ co-workers who they were Secret Santa for. The problem asks about $n=20$, but we’ll examine the general case here.
A key aspect of this problem is that there is no communication among co-workers. Each staff member independent seeks to discover their Secret Santa and can’t use information gleaned by other co-workers. This assumption actually simplifies the problem greatly; from the point of view of the individual asking questions, there are two classes of co-workers: (1) those we know nothing about and (2) those for whom we already know either their Secret Santa or who they are Secret Santa for. A strategy consists of a sequence of decisions where we must either query somebody from group (1) or somebody from group (2). The lack of collusion among co-workers means that each staff member must use the same strategy. Consequently, there aren’t actually that many different possible strategies! Let’s look at each one in turn.
Purely random strategy
The simplest strategy is to choose $n$ co-workers at random. Each co-worker has a $\tfrac{1}{2n}$ chance of being your Secret Santa, so the probability that you will find your Secret Santa is $\tfrac{1}{2}$, regardless of the number of people. Since everybody must use the same strategy, the probability of everybody finding their Secret Santa is:
\[
P_\text{rand} = \frac{1}{2^{2n+1}}
\]This gets small very quickly as $n$ increases. When $n=20$, we have $P_\text{rand} = 4.55\times10^{-13}$ (about 1 in 2 trillion, as mentioned in the problem statement). A very small probability indeed. The random strategy maximizes an individual’s chance of finding their Secret Santa, but it’s a horrible choice if the goal is to maximize the chance that everybody finds their Secret Santa.
Deterministic strategy
If not picking randomly, there is only one alternative: to ask the co-worker you gifted! Once they reveal who they gifted, then you ask that person, and so on. You can imagine a graph where each node is a staff member, and you draw an arrow connecting each Secret Santa to their target. Once this is done, each node has exactly one incoming arrow (from their Secret Santa) and one outgoing arrow (to their target). An example of such a graph is shown below.
Notice that the nodes form closed loops. There could be a single loop or multiple smaller loops, as shown above. Before we tackle the general case, let’s see what happens if the group is small.
Small group ($n=1$ or $n=2$): In the case $n=1$, there are three staff members, and they must necessarily be in a single cycle of length 3. In this case, we immediately know our Secret Santa; it’s the person we didn’t give a gift to! We don’t even need to ask any questions to know this. In the case $n=2$, there are five staff members. If they form a cycle of length 5, we can follow the Santa until there is a single co-worker left out. Since there can be no cycles of length 4, the co-worker that was left out must be our Secret Santa! Once again, we can deduce our Secret Santa with perfect accuracy even if we don’t directly ask them the question.
Larger group ($n\ge 2$): If the group is larger, the deductive arguments used previously no longer work because there are too many co-workers left over after we’ve asked all our questions. Each staff member will find their Secret Santa only if all the loops in the are sufficiently small. Since $n$ questions are asked, each loop must have at most $n+1$ members. It turns out counting big loops is easier than counting small ones. So the probability we’re after is:
\begin{align}
P_\text{determ}
&= \frac{\text{Assignments where each loop has at most }n+1\text{ nodes}}{\text{Total number of assignments}}\\
&= 1-\frac{\text{Assignments containing a loop with at least }n+2\text{ nodes}}{\text{Total number of assignments}}\\
&= 1-\frac{\sum_{k=n+2}^{2n+1}\left(\text{Assignments containing a loop with }k\text{ nodes}\right)}{\text{Total number of assignments}}
\end{align}The total number of assignments is the number of permutations of $\{1,\dots,2n+1\}$ such that there are no fixed points, since nobody is allowed to be their own Secret Santa (wouldn’t be much of a secret). This is precisely the number of derangements $D_{2n+1}$, a concept we discussed in the Lonesome King puzzle.
For the numerator, we must count how many ways we can have a loop of length $k$. Since $k \ge n+2$, there are at most $n-1$ nodes left over. Therefore there can be no more than one loop of length $k$. There is ${2n+1\choose k}$ ways of picking our loop, $(k-1)!$ ways of ordering the elements in the loop, and $D_{2n+1-k}$ ways of arranging the remaining nodes that weren’t part of the loop. Note that we don’t risk double counting in this manner because there aren’t enough nodes remaining to produce loops of length $k$ or greater. The final result is:
\[
P_\text{determ} = 1-\frac{\sum_{k=n+2}^{2n+1}{2n+1 \choose k}(k-1)!\,D_{2n+1-k}}{D_{2n+1}}
\]In the case $n=20$, this evaluates to
\begin{align}
P_\text{determ} &= \tfrac{97939699570365313025758987280981560791683250767}{307662419905587585654556899376849633804439730767}\\
&\approx 0.318335
\end{align}so a probability of about 31.8%. Not bad! Here is a plot of how the probability changes as a function of $n$.
As discussed above, the probability is equal to 1 when $n=1$ or $n=2$, since we can deduce the Secret Santa in those cases even if we ask the wrong co-workers. Curiously, the case $n=3$ (7 people) has a lower probability than the case $n=4$ (9 people). As $n\to\infty$, the probability tends to a finite limit. To find this limit, we can use the derangement formula $D_n = \left[\frac{n!}{e}\right]$, where $[x]$ is the nearest integer to $x$. We then have:
\begin{align}
P_\text{determ} &= 1-\frac{\sum_{k=n+2}^{2n+1}{2n+1 \choose k}(k-1)!\,D_{2n+1-k}}{D_{2n+1}} \\
&\approx 1-\frac{\sum_{k=n+2}^{2n+1}{2n+1 \choose k}(k-1)!(2n+1-k)!}{(2n+1)!}\\
&= 1-\sum_{k=n+2}^{2n+1}\frac{1}{k}\\
&\approx 1-\log(2)\\
&\approx 0.306853
\end{align}As $n\to\infty$, this approximation becomes exact and the limiting probability is $1-\log(2)$, or about 30.7%. This can be shown rigorously by using upper and lower bounds for the derangements; i.e. $\frac{k!}{e}-\tfrac12 \le D_k \le \frac{k!}{e}+\tfrac12$.
In the derivation above, we calculated the probability that everybody wins when everybody uses the deterministic strategy. If we’re interested in the probability that an individual wins using this strategy, we can observe that an individual wins as long as they don’t belong to the large chain of length $k$ in the summation above. So the probability that an individual wins is given by:
\begin{align}
P_{\text{determ}}^{\text{individual}} &= 1-\frac{\sum_{k=n+2}^{2n+1}{2n+1 \choose k}(k-1)!\,D_{2n+1-k}\tfrac{k}{2n+1}}{D_{2n+1}}
\end{align}Here is what a plot of this looks like as a function of $n$:
When $n > 2$, the probability of an individual winning is always less than 50%. When $n=20$, the probability that an individual wins is about 0.487805, or 48.8%. The limit as $n\to\infty$ can be computed as before and we obtain $\tfrac{1}{2}$ this time. So in conclusion, the deterministic strategy is slightly worse than random guessing from the individual’s perspective, but it yields the best possible chance of everybody finding their Secret Santa.
Hybrid strategy
The nice thing about the purely random strategy is that no matter what the assignment looks like, there is always some chance that everyone will find their Secret Santa. In fact, the chance is the same for every assignment. Unfortunately, that chance is very low. In contrast, the deterministic strategy’s success depends heavily on the assignment. In some cases it works for everybody and in other cases it only works for some.
In principle, one could design a hybrid strategy, e.g. using a deterministic strategy for some number of moves and then playing the remaining moves randomly. The potential of a hybrid strategy is that it gives every assignment a winning chance even if that chance is small. Unfortunately, such strategies are always worse than the purely deterministic one for this problem.
Nice job, however I get:
For n=1, P=0, and for n=2, P=5/11. I didn’t check the others.
Are you using D(0)=1?
You’re absolutely right. I used D(0)=0 by accident. I updated my post and plot. I decided to keep P=1 for the cases n=1 and n=2 because you can deduce your Secret Santa perfectly in these cases even if you don’t ask the correct co-worker (see my explanation in the updated post).
I think your P_determ equation is correct but for n = 20, it should evaluate to 0.318335.
From my Monte Carlo code, I obtained 0.318323 ± 0.000045 (95% confidence interval).
https://gist.github.com/ShiangYong/60efb97508282e23ba97fc16dae47935
Thanks, indeed you are correct. I accidentally used D(0)=0 in my code, when it should be D(0)=1. I updated my post and the plot.
I think your evaluation is missing the k=2n+1 term. n=2 should still evaluate to 100%, as link length of 5 is fully determined from first 3 permutations.
Yep. Fixed it! Check out the updated plot and post.
It seems that you have solved a different problem than asked, though likely the one that was intended.
“If each person asks questions optimally, giving themselves the best chance to unmask their Secret Santa…” (the original problem) is not the same as, “If each person asks questions optimally, so as to give the entire group the best chance to unmask all of the Secret Santas…” (the question you answered).
To find our own Secret Santa we can do better than guessing randomly… proving we have an optimal strategy is difficult, however.
– JZGYK
I agree — since we were asked about the probability that everybody wins, I was assuming this was the quantity to be maximized.
If we interpret the question literally and each individual greedily attempts to maximize the chance that they will find their own Secret Santa, I don’t think you can do any better than guessing randomly (assuming $n > 2$). If you think it’s possible to do better than randomly guessing, I’d love to hear your thoughts!
This is another reason I interpreted the question the way I did; because the literal interpretation, as far as I could tell, led to random guessing as the only possible choice and this isn’t an interesting scenario!
If you use the “follow the Santa” strategy, there is a 31.8% chance that everybody finds their Secret Santa, but there is actually a 48.8% chance that any one individual will find their Secret Santa. Therefore this strategy doesn’t beat random guessing, where the chance of winning is 50% for each individual. I think I’ll update my solution and include some of these comments.
There is another interpretation that leads to a perfect (but clearly unintended) strategy. Even though nobody can share the results of their questioning, if everyone in the office conspires to ask exactly one question, (and they each ask their own target), then by the end of the day each person will know who their secret Santa is because their secret Santa was the only one to ask them anything.
I came to the same results as everyone else: 31.8% for everybody using “Follow The Santa [FTS]”, 48.8% for me individually if I use FTS and 50% if I “guess randomly”. I note here that I only count a “win” if someone actually says, “I am your Secret Santa.”; there is always the possibility of guessing correctly if no one names me.
What struck me as odd about this was: “If I am guessing randomly, I might Follow The Santa by pure chance.” But this would give me a less than the 50% chance of winning that I would get if I guessed randomly. So what if I guess randomly but re-guess if I FTS? By throwing out the “bad” (less than 50%) approach of FTS, what is left must be something greater than 50%.
An example:
n = 3 is the first interesting case so let’s start there. There are 1854 derangements and the individual finds their Secret Santa 774/1854 = 41.75% of the time.
Consider an anti-Follow The Santa strategy ([aFTS]). We are player #1 and we ask the lowest numbered player who has not yet been named. That is, if we were the Secret Santa for player #4 then the lowest numbered player who has not yet been named is player #2 and we ask them who they bought for. If they say player #3 then we have named players #1, #2, #3, and #4 so the lowest numbered player who has not yet been named is #5, etc..
After 1 question, 3-FTS has 264 winning cases, 3-Rnd has 309 and 3-aFTS has 318.
After 2 questions, 3-FTS has 534 winning cases, 3-Rnd has 618 and 3-aFTS has 654.
After 3 questions, 3-FTS has 774 winning cases, 3-Rnd has 927 and 3-aFTS has 1038.
Assuming I have made no mistakes, 3-FTS = 41.8%, 3-Rnd = 50.0% and 3-aFTS = 56.0%
Proving that this is the optimal individual strategy is another thing entirely.
– JZGYK
On an unrelated note, whenever I try to use the less than or greater than symbols, your site misinterprets them.
ah neat — I guess I was assuming the staff members weren’t allowed to cooperate ahead of time (by agreeing on a specific ordering, for example). Looks like this sort of cooperation can lead to better-than-guessing strategies, as you pointed out!
Math symbols (and arbitrary equations, in general) can be displayed by using a dollar sign before and after the equation (LaTeX code). I think the issue with $<$ and $>$ is that these symbols are interpreted as html tags.
No collusion is required and I can get this result as the sole Santa-seeker. The results are unchanged if I make a list or if I simply choose randomly from people who are not named. But I don’t know that this is optimal in general, or even close.
– JZGYK
Nice writeup!
If you want to avoid using enormous numbers (say, because you are using a spreadsheet for the computations, or any finite-precision language), the number of derangements with a cycle of length $k>n/2$, $\binom{n}{k} (k-1)! D_{n-k}$, can be rewritten as $n!\frac{(k-1)!}{k!}\frac{D_{n-k}}{(n-k)!}=\frac{n!}{k}\frac{D_{n-k}}{(n-k)!}$, and the proportion of such derangements out of all possible derangements of size $n$ is $\frac{1}{k}\frac{n!}{D_n}\frac{D_{n-k}}{(n-k)!}$. So, the proportion of derangements on 41 items with a cycle of length (exactly) 22 is $\frac{1}{22} \frac{41!}{D_{41}} \frac{D_{19}}{19!} \approx 0.045455$.
The function $\frac{D_n}{n!}$ converges to $1/e$ quickly. The exact formula is $\frac{D_n}{n!} = \sum_{i=0}^n \frac{(-1)^i}{i!}$, and, if we accept the approximation $\frac{41!}{D_{41}} = e$ (the difference is far below machine precision), we can sum up the contributions of each of the different cycle lengths for $k=22,\ldots,41$ and get that the total proportion of derangements with a cycle of length at least 22 is approximately $0.681665$. Therefore, the probability that our strategy succeeds is approximately $0.318335$, which matches your approximation to the exact ratio you found.