This week’s Riddler Classic is about finding the “luckiest” coin!
I have in my possession 1 million fair coins. I first flip all 1 million coins simultaneously, discarding any coins that come up tails. I flip all the coins that come up heads a second time, and I again discard any of these coins that come up tails. I repeat this process, over and over again. If at any point I am left with one coin, I declare that to be the “luckiest” coin.
But getting to one coin is no sure thing. For example, I might find myself with two coins, flip both of them and have both come up tails. Then I would have zero coins, never having had exactly one coin.
What is the probability that I will at some point have exactly one “luckiest” coin?
Here is my solution:
[Show Solution]
Let’s define $p_n$ to be the probability that we will end up with exactly one “luckiest” coin, given that we start with $n$ coins. Our terminal conditions are that $p_0=0$ and $p_1=1$, since we lose if we end up with zero and we win if we end up with one. We can define the rest of the $p_n$ recursively as follows. If we flip the $n$ coins and obtain $k$ heads, then the probability we will win is $p_k$. The probability of obtaining $k$ heads is $\binom{n}{k}\frac{1}{2^n}$. Therefore,
\[
p_n = \sum_{k=0}^n \frac{1}{2^n} \binom{n}{k} p_k\qquad\text{for }n=2,3,\dots
\]Note that $p_n$ appears on both sides of the equation, since it’s possible that we flip all Heads, and we remain with $n$ coins at the next turn. We can solve for $p_n$ by rearranging the equation, and we obtain the complete recursion:
$\displaystyle
\begin{aligned}
p_0&=1,\quad p_1=1,\\
p_n &= \frac{1}{2^n-1} \sum_{k=0}^{n-1}\binom{n}{k} p_k\qquad\text{for }n=2,3,\dots
\end{aligned}
$
Let’s start with a simple simulation to see what we’re dealing with. Here is what we get for $n$ up to 50:
Seems like we’re done! The sequences appears to settle quickly. A numerical evaluation shows the limit to be approximately 0.72135, or 72.1%.
Down the rabbit hole
Appearances can be deceiving! First, we can expect the number of coins to roughly drop by a factor of 2 at each turn, so we should expect that $p_n \approx p_{2n}$ for large $n$. This suggests that we should plot $n$ on a log scale. Let’s see what happens if we make the same plot as before, except this time go up to $2^{12} = 4{,}096$, and we ignore the initial large swings in values to just focus on the tail end.
Would you look at that! The plot tends to a sinusoid. And although the amplitude decays rapidly at first, it does not decay to zero! If you have been following the Riddler for some time, this may seem familiar to you. In fact, a very similar pattern of decaying sinusoids on a log scale that don’t quite decay to zero was observed in the puzzle of The lonesome king, published about five and a half years ago!
It might be possible to actually compute all the way up to $10^6$, but it would likely take quite awhile (on my laptop, anyway). Computing up to $n=5000$ already took over one minute. And the scaling is not linear! Computing each $p_n$ involves all previous $p_k$, so the computation scales like $n^2$. Moreover, the binomial coefficients become so large that computing them exactly would take a lot of time and memory, and approximating them would further complicate the solution.
To approximate the probability when $n=10^6$, the easiest way is to leverage the periodicity of the function and reduce the quantity of interest to one that is within the range of values we have already computed. We have that:
\[
p_{10^6} = p_{10^6/2^8} \approx p_{3096} \approx 0.72134685.
\]Granted, the limiting amplitude of the sinusoid is so small that even if we use our original guess of 0.72135, we will be correct to 5 decimal places for any $n$ sufficiently large. So at the end of the day, we don’t gain much by doing all this extra work.
As with the Lonesome King puzzle, we are once again faced with this puzzling phenomenon of marginally stable oscillations. There are several questions that seem natural to ask:
- Is there a closed-form expression for the limiting amplitude?
- Is there a closed-form expression for the limiting mean?
- Is the limiting waveform actually a true sinusoid?
To answer these questions, we have to go deeper…
Deeper down the rabbit hole
After more thought (and many great comments here and on Twitter), I decided to take another crack at the problem. I started with the approach used by Allen Gu, where he derived a non-recursive formula for $p_n$:
\[
p_n = \frac{n}{2}\sum_{t=0}^\infty 2^{-t} \left( 1-2^{-t} \right)^{n-1}
\]Define the inside function as
\[
g_n(t) = \frac{n}{2}2^{-t} \left( 1-2^{-t} \right)^{n-1}
\]Here is what $g_n(t)$ looks like for different values of $n$:
It appears that $g_n(t)$ gets shifted to the right as we increase $n$. The height and width of the peak appears to remain constant. To confirm this mathematically, we can solve $\frac{\mathrm{d}}{\mathrm{d}t}g_n(t) = 0$ and we find that the peak occurs at $t = \log_2(n)$. Since our plan is to sum $g_n(t)$ over all $t$ anyway, then we might as well shift it so that it is centered at its peak. If the offset from the peak is $\tau$, then the shifted function is:
\begin{align}
h_n(\tau) &= g_n(\log_2(n) + \tau) \\
&= 2^{-\tau-1} \left(1-\frac{2^{-\tau}}{n}\right)^{n-1}
\end{align}We can evaluate the limit as $n\to\infty$, and we see that:
\[
h(\tau) = \lim_{n\to\infty}h_n(\tau) = 2^{-\tau-1} e^{-2^{-\tau}}
\]It follows that in the limit of large $n$, the probability $p_n$ will take on all values in the range
\[
p(x) = \sum_{\tau=-\infty}^\infty 2^{-\tau-1+x} e^{-2^{-\tau+x}}\quad\text{for }x\in[0,1]
\]Of course, due to the translation invariance of the sum, this must be a periodic function. Here is what it looks like:
It looks very close to a sinusoid. However, it is not. If it were a true sinusoid, then $p(x)+p(x+\tfrac{1}{2})$ would be constant for all $x$. We can verify that this is not the case by doing an accurate numerical evaluation of the sums for two different values of $x$. We find:
\begin{align}
\tfrac{1}{2}(p(0.3)+p(0.8)) &\approx 0.72134752045108504298 \\
\tfrac{1}{2}(p(0.5)+p(1.0)) &\approx 0.72134752043912493759
\end{align}So they differ in the $11^\text{th}$ decimal digit.
Given that the function $p(x)$ is not a true sinusoid, we have to re-examine what we mean by “limiting value”. One way to define this would be the average value over one period. Thankfully, this integral can be evaluated exactly, and we obtain:
\begin{align}
\int_0^1 p(x)\,\mathrm{d}x &= \int_0^1 \sum_{\tau=-\infty}^\infty 2^{-\tau+x-1} e^{-2^{-\tau+x}}\,\mathrm{d}x \\
&= \sum_{\tau=-\infty}^\infty \int_0^1 2^{-\tau+x-1} e^{-2^{-\tau+x}}\,\mathrm{d}x \\
&= \sum_{\tau=-\infty}^\infty \frac{e^{2^{-\tau}}-e^{-2^{-(\tau-1)}}}{2\log (2)} \\
&= \lim_{M,N\to\infty} \sum_{\tau=-N}^M \frac{e^{2^{-\tau}}-e^{-2^{-(\tau-1)}}}{2\log (2)} \\
&= \frac{1}{2\log(2)} \left[ \lim_{M\to\infty} e^{-2^{-M}}-\lim_{N\to\infty}e^{-2^{N+1}} \right]\\
&= \frac{1}{2\log(2)}
\end{align}The bi-infinite sum is actually a telescoping series, which is pretty nice!
In conclusion, we showed that the limiting function is periodic (peak-to-peak is about $1.4\times 10^{-5}$), it is not a true sinusoid (but very close), and its mean value is exactly $\frac{1}{2\log(2)}$.
Rabbits all the way down
Our previous expression for the limiting distribution was:
\[
p(x) = \sum_{\tau=-\infty}^\infty 2^{-\tau+x-1} e^{-2^{-\tau+x}}\quad\text{for }x\in[0,1]
\]This is a periodic function, so why not find its Fourier series? To refresh your memory, since $p(x)$ is periodic with period $1$, we can write
\[
p(x) = \sum_{n=-\infty}^\infty c_n e^{2\pi i n x},\quad\text{where:}\quad
c_n = \int_0^1 p(x) e^{-2\pi i n x}\,\mathrm{d}x
\]Amazingly, we can evaluate $c_n$ in (somewhat) closed form:
\begin{align}
c_n &= \int_0^1 p(x) e^{-2\pi i n x} \\
&= \int_0^1 \sum_{\tau=-\infty}^\infty 2^{-\tau+x-1} e^{-2^{-\tau+x}} e^{-2\pi i n x}\,\mathrm{d}x \\
&= \sum_{\tau=-\infty}^\infty \int_0^1 2^{-\tau+x-1} e^{-2^{-\tau+x}} e^{-2\pi i n x}\,\mathrm{d}x
\end{align}Substitute $u = 2^{-\tau+x}$, so $\mathrm{d}u = u\log(2) \mathrm{d}x$, and $x=\tau+\log_2(u)$:
\begin{align}
c_n &= \sum_{\tau=-\infty}^\infty \frac{1}{2\log(2)}\int_{2^{-\tau}}^{2^{-\tau+1}} e^{-u}e^{-2\pi in(\tau+\log_2(u))}\,\mathrm{d}u \\
&= \frac{1}{2\log(2)} \sum_{\tau=-\infty}^\infty \int_{2^{-\tau}}^{2^{-\tau+1}} e^{-u}u^{-2\pi in/\log(2)}\,\mathrm{d}u \\
&= \frac{1}{2\log(2)} \int_{0}^{\infty} e^{-u}u^{-2\pi i n/\log(2)}\,\mathrm{d}u \\
&= \frac{\Gamma\bigl(1-\tfrac{2\pi i n}{\log(2)}\bigr)}{2\log(2)},
\end{align}where $\Gamma(z)$ is the gamma function. Therefore, the Fourier series of $p(x)$ is
\[
p(x) = \sum_{n=-\infty}^\infty \frac{\Gamma\bigl(1-\tfrac{2\pi i n}{\log(2)}\bigr)}{2\log(2)} e^{2\pi i n x}
\]Define $\alpha_n := \Gamma\bigl(1-\tfrac{2\pi i n}{\log(2)}\bigr) = |\alpha_n|e^{i \arg(\alpha_n)}$, and we can write the series in a more familiar way using standard trig functions:
\[
p(x) = \frac{1}{2\log(2)} + \frac{1}{\log(2)}\sum_{n=1}^\infty |\alpha_n| \cos\bigl( 2\pi n x+\arg(\alpha_n) \bigr)
\]There is actually a closed-form expression for the magnitude of the gamma function when the real part of the argument is an integer (see here). Specifically, we have $|\Gamma(1+ib)|^2 = \frac{\pi b}{\sinh(\pi b)}$ for all $b\in \mathbb{R}$. Therefore, we can write the Fourier series for $p(x)$ as:
\[
p(x) = \frac{1}{2\log(2)} + \sum_{n=1}^\infty \sqrt{ \frac{2 \pi^2 n \,\text{csch}\bigl(\tfrac{2 \pi ^2 n}{\log (2)}\bigr)}{\log^{3}(2)}
} \cos\bigl( 2\pi n x + \arg(\alpha_n) \bigr)
\]where $\text{csch}(z) := \frac{2}{e^z-e^{-z}}$ is the hyperbolic cosecant. The Fourier coefficients $|\alpha_n|$ decrease very rapidly with increasing $n$. In fact, we have:
\begin{align}
c_n &= \sqrt{ \frac{2 \pi^2 n \,\text{csch}\bigl(\tfrac{2 \pi ^2 n}{\log (2)}\bigr)}{\log^{3}(2)}} \\
c_1 &\approx 7.1301\times 10^{-6} \\
c_2 &\approx 6.60339\times 10^{-12} \\
c_3 &\approx 5.29624\times 10^{-18} \\
\end{align}As we increase $n$, we have $c_n \sim \exp\bigl(-\tfrac{n \pi^2}{\log(2)}\bigr) \approx 10^{-6.184 n}$. This is why the amplitude of $p(x)$ is so small, and why it looks so close to being a sinusoid when we plot it.
Using this information, we can return to the original problem and write out an exact expression for the steady-state function in terms of the original $n$ (number of coins). This yields the expression:
\[
p_n \to \frac{1}{2\log(2)} + \sum_{k=1}^\infty \sqrt{ \frac{2 \pi^2 k \,\text{csch}\bigl(\tfrac{2 \pi ^2 k}{\log (2)}\bigr)}{\log^{3}(2)}
} \cos\Bigl( 2\pi k \log_2(n) + \arg \Gamma\bigl(1-\tfrac{2\pi i k}{\log(2)}\bigr) \Bigr)
\]
The first few terms of the series are:
\begin{multline}
p_n \approx 0.72134752044448170368 \\
+ \left(7.130117\times 10^{-6}\right) \cos\bigl( 2\pi \log_2(n) +0.872711\bigr) \\
+ \left(6.603386\times 10^{-12}\right) \cos\bigl( 4\pi \log_2(n) +2.51702\bigr) + \dots
\end{multline}
Finally, here is the numerical data overlaid with the first harmonic approximation (only keeping the first term of the sum). Looks pretty good!
Equipped with this series, we can now evaluate the probability of finding a “luckiest coin” when there are $n=10^6$ coins. Here is a very accurate approximation (all digits significant)
\[
p_{1,000,000} \approx 0.7213539630726652117823954768
\]
Hi Laurent,
I solved it essentially the same way, but to get out to n=1,000,000 I evaluate the expression
which you write for p_n using the following two tricks:
1) past n=48 I don’t bother subtracting 1 in the denominator for p_n, which is beyond double-precision anyway
2) I calculate the sum term by term. For the n-choose-k coefficient, I calculate it by modifying
the previous coefficient. Every time this coefficient becomes large, I divide the sum-so-far,
the coefficient, and the 2^n factor in front by a common rescaling, to keep them within double-precision range.
Doing that I was able to get to n=1,000,000 where I found p = 0.721353963324
Regarding the oscillations — as you point out, we have that, approximately
p(2n) = p(n)
which ensures periodicity in log_2(n) (log base 2 of n).
Therefore one expects that, for large(ish) n,
p(n) = p_0 + p_1 cos(2 pi log_2(n) – phi_1) + p_2 cos(4 pi log_2(n) – phi_2) + …
let’s try to understand the corrections to this formula — how much does p(2n) differ from p(n)?
At the next-closest level of approximation, p(2n) depends not only on p(n) but on
a narrow range around p(n), with a Gaussian envelope according to the central limit theorem.
The variance (squared width) of the envelope is the variance of 2n coin flips — each
coin flip gives a variance of 1/4, so the total variance is sigma^2 = 2n/4 = n/2.
Therefore p(n’) is sampled in a neighborhood about n’=n with width sqrt(n/2).
Approximating p(n’) with a Taylor series about n’=n,
p(n’) = p(n) + (n’-n) p'(n) + (n’-n)^2/2 p”(n)
we find = 0 but <(n'-n)^2 = (n/2) the variance of the sampling distribution, so
p(2n) = p(n) + (n/4) p''(n)
where p'' means the second derivative, as usual.
The second derivative of the first cosine term gives
2 pi/(n^2 ln(2)) sin(…) – 4 pi^2/(n^2 ln^2(2)) cos(…)
The sin(…) term slightly shifts the phase phi_1, while the -4pi^2/(n^2 ln^2(2)) cos(…)
acts to damp the cosine. Multiplying by the n/4 coefficient in front, the damping is:
new coefficient = old coefficient ( 1 – pi^2 / n ln^2(2) )
Since the damping has a 1/n factor and n doubles with each step, the damping rather quickly
becomes irrelevant — it's order-1 for n = pi^2/ln^2(2) = 20
but it's small for, say, n=100.
On the other hand, the next term in the Fourier series has a damping coefficient
with 4 times the size, so it gets damped strongly for two more oscillations.
This is why I expect that the true form is nearly but not perfectly sinusoidal.
My figure, going out to n=1,000,000, is at:
https://theorie.ikp.physik.tu-darmstadt.de/~guymoore/luckycoin.ps
This paper was mentioned in the Riddler’s Lonesome King solution:
https://arxiv.org/pdf/1507.03805.pdf
Quoting from the first paragraph on p.2: “Suppose we have n coins, each of which lands heads up with probability p. Flip all the coins independently and throw out the coins that show heads. Repeat the procedure with the remaining coins until 0 or 1 coins are left. The probability of ending with 0 coins does not converge as n → ∞ and becomes asymptotically periodic and continuous on the log n scale [6, 11].”
Furthermore, cite 11 gives the limiting mean (assuming I’m reading it correctly) as -p / ln(1-p). In our case, that’s 1/ln(4) = 0.72134752044….
https://www.jstor.org/stable/2325134
Thanks! — I didn’t quite understand the approach of the paper you linked, but I played around with the equations a bit more and I found another way to arrive at the result. I added a new section to the blog post.
For a quick “derivation” of the limiting mean:
We can equivalently compute the probability that, if we flip the coins until they are all tails, there is a single coin that survives the longest. For a specific coin out of k total coins, the probability that it survived for n flips (i.e. got n heads followed by a tails) and was the longest surviving coin is:
(1/2)^(n+1) * (1 – (1/2)^(n))^(k – 1)
And so we have a closed form as an infinite sum:
p(k) = k * sum_(n = 1)^(infty) (1/2)^(n+1)* (1 – (1/2)^(n))^(k-1)
If we replace the sum with an integral, taking the limit as k -> infty gives the 1/ln(4) number referenced elsewhere.
Of course, “derivation” is in quotes because the “replacing the sum with an integral” part isn’t really justified.
Right — I saw a similar argument was posted by Allen Gu. I don’t think one can simply replace the sum by an integral.
I added a new section to my write-up, where I prove that the limiting periodic function is not a true sinusoid (but it’s very close), and its asymptotic average value is exactly $\frac{1}{2\log(2)}$.
I went out to 5M and found the same curve you did for the oscillating behaviour and a 1 + 1/n behaviour for the magnitude of the sine.
https://github.com/davidlewistoronto/riddler220218/blob/main/plots.pdf
I got Allen Gu’s formula for p_n with a generating functions approach: using the recurrence, we can prove that P(x) = sum ((1 – p_(n+2)) x^n / (n+2)!) satisfies 4P(2x) = (e^x – x – 1)/x^2 + P(x)e^x, and we let P(x) = Q(x)e^x/x^2 to get Q(2x) – Q(x) = e^(-2x) (e^x – x – 1). Now it is easy to find a power series solution and equate coefficients.
Consider S(N) = sum (2/n)*p_n from n=1 to N. Assuming p = lim(p_n) exists, by the harmonic series we should have S(N)/log(N) -> 2p. On the other hand, with the formula we have a telescoping series giving S(N) = sum (1 – (1 – 2^(-k))^N) from k = 0 to infinity. But for large N, the summand is essentially 1 for k < log_2(N) and essentially 0 for k > log_2(N). [Around k = log_2(N), the sequence will be approximately: …, 1 – 1/e^4, 1 – 1/e^2, 1 – 1/e, 1 – 1/e^(1/2), 1 – 1/e^(1/4), …] So the sum is close to log_2(N) and certainly we will have S(N)/log(N) -> 1/log(2).
Hence 2p = 1/log(2) so in some sense the limit of the average value of p_n is indeed 1/(2log(2)).