This week’s Riddler Classic concerns a paradoxical coin-flipping game:
You have two fair coins, labeled A and B. When you flip coin A, you get 1 point if it comes up heads, but you lose 1 point if it comes up tails. Coin B is worth twice as much — when you flip coin B, you get 2 points if it comes up heads, but you lose 2 points if it comes up tails.
To play the game, you make a total of 100 flips. For each flip, you can choose either coin, and you know the outcomes of all the previous flips. In order to win, you must finish with a positive total score. In your eyes, finishing with 2 points is just as good as finishing with 200 points — any positive score is a win. (By the same token, finishing with 0 or −2 points is just as bad as finishing with −200 points.)
If you optimize your strategy, what percentage of games will you win? (Remember, one game consists of 100 coin flips.)
Extra credit: What if coin A isn’t fair (but coin B is still fair)? That is, if coin A comes up heads with probability p and you optimize your strategy, what percentage of games will you win?
Here is my solution:
[Show Solution]
At first, the problem appears to be asking the impossible. How can flipping fair coins over and over lead to anything other than winning half the time? The key here is that “winning” only requires getting a positive score. If we looked at the “expected total score” after 100 flips, this would be 0 no matter what strategy was used (you can see this by applying linearity of expectation). But since we’re counting number of positive outcomes rather than total score, it’s possible to win more than half of the time on average.
As a simpler example of how this can happen, consider a game where you roll a die. If you get a 1, you lose \$5. But if you get anything else, you win \$1. The expected score in the game is $\tfrac{1}{6}(-5) + \tfrac{5}{6}(+1) = 0$. However, if “winning” simply means obtaining a positive score, then we win $\tfrac{5}{6}$ of the time. The flipping game is similar; our total score can average out to zero, even though it ends up being positive more than half of the time. Practically speaking, this means that when we do lose, we lose big.
Probability of winning assuming optimal play
Let’s suppose we are doing $N$ flips, and we are choosing between:
- Coin $A$: wins $+1$ with prob $a$ and wins $-1$ with prob $1-a$.
- Coin $B$: wins $+2$ with prob $b$ and wins $-2$ with prob $1-b$.
Our task is to determine the percentage of games that we will win if we optimize our strategy. One way to approach this is using dynamic programming. The basic idea is that our best move at any point in the game should only depend on (1) our current score and (2) how many moves we have left. Therefore, we define:
\[
V_t(s) = \left\{\begin{aligned}
&\text{Probability that we win if we play optimally}\\
&\text{from here on out, given that our current score}\\
&\text{is }s\text{ and we are currently at turn }t\text{ of }N
\end{aligned}\right.
\]We can view each $V_t$ as a vector indexed by $s \in \{\dots,-1,0,1,\dots\}$, which is our current score. The idea is to solve for each $V_t$ by working our way backwards from the final step until we reach $V_0$, which is our probability of winning for each different starting score. Ultimately, the question is asking us to compute $V_0(0)$.
The terminal distribution is simply given by our probability of winning given that we have achieved a given final score. The result in this case is simple, since at that point there are no moves left to make:
\[
V_N(s) = \begin{cases}
0 & \text{if }s \leq 0\\
1 & \text{if }s > 0
\end{cases}
\]At each previous step, we pick the coin that will give us our highest chance of winning. This leads to the recursive formula for $t=n-1,\dots,2,1$:
\[
V_{t-1}(s) = \max\left\{ a V_t(s+1)+(1-a) V_t(s-1),\, b V_t(s+2)+(1-b) V_t(s-2) \right\}
\]I couldn’t find a way to evaluate this expression in closed form, but it is rather straightforward to evaluate it numerically. Here is efficient Python code that evaluates $V_0(0)$ for any choice of $(a,b,N)$.
import numpy as np # probability of winning if coin A has prob a of +1 and (1-a) of -1, # coin B has prob b of +2 and (1-b) of -2, and we do N flips in total def compute_win_prob( a, b, N ): # shift the indices so that a score of "s" is at index v[s+2*N] v = np.zeros(4*N+1) # initialize (time=N) v[:2*N] = 0 v[2*N+1:] = 1 # overwrite results as we recurse backwards to time=0 # this code is efficient: uses vectorization and overwrites past results for t in range(N): coin_A = a*v[3:4*N] + (1-a)*v[1:4*N-2] coin_B = b*v[4:4*N+1] + (1-b)*v[0:4*N-3] v[2:4*N-1] = np.maximum(coin_A, coin_B) # return the probability of winning assuming we start with a score of 0 return v[2*N]
When we run this script using $a=b=0.5$ and $N=100$, we obtain a probability of winning of about $0.6403$. We can get the exact answer as well using the following Mathematica code:
ComputeWinProb[a_,b_,n_]:=( v = ConstantArray[0,4n+1]; v[[2n+2;;4n+1]] = 1; For[t=0, t<n, t++, CoinA = a v[[4;;4n]] + (1-a) v[[2;;4n-2]]; CoinB = b v[[5;;4n+1]] + (1-b) v[[1;;4n-3]]; v[[3;;4n-1]] = MapThread[Max,{CoinA,CoinB}]; ]; v[[2n+1]]) ComputeWinProb[1/2,1/2,100]
This gives us the exact result:
\[
\text{Prob}(\text{win})=\tfrac{811698796376000066208208781649}{1267650600228229401496703205376} \approx 0.640317
\]
Does doing more flips help?
We can win 64% of the time if we do 100 flips, but What if we did 1000 flips, or more? Could we keep increasing our chances of winning? It doesn’t appear to be the case. As we do more and more flips (increase $N$), the probability of winning increases slightly, but only up to a limit. To see this, I solved the problem for $N$ ranging up to 100,000 flips. Here is what I found:
Given how slowly the function is growing, it appears that there should be a finite limit, and that this limit is around $\frac{2}{3}$ (the last point on the plot is at $0.6658$). But is there another way we can compute this limit?
Limiting behavior for a large number of flips
The optimal strategy for the case $a=b=\tfrac{1}{2}$ is to flip coin A when our score is positive and to flip coin B when our score is negative. How often will we win the game as $N\to\infty$ if we use this strategy?
We can think of the coin-flipping problem in the context of random walks. It is well-known that random walks on the 1D line (with equal probability of moving left or right at each step) will visit every integer an infinite number of times, given enough time. It is also known that the expected number of steps before a walk first returns to its starting point is infinite. Therefore, as $N$ gets large, we can expect that the the score will spend only an infinitesimal fraction of the time near 0.
We can model this process as a Markov chain with states:
\[
\dots,-8,-6,-4,-2,0,1,2,3,4,\dots
\]When our score is $\leq 0$, we will flip coin B and move $\pm 2$. When our score is $\gt 0$, we will flip coin A and move $\pm 1$. Note that the states $-3,-5,-7,\dots$ can never be reached because we only move by $\pm 2$ when our score is negative. Based on the random walk argument above, we can expect the Markov chain to spend most of its time with a large-magnitude score (either on the positive or negative side).
As an approximation, we can imagine that once we hit $+2$, we will spend a large amount of time on the positive side (with a winning score) before we return to $+1$. Likewise, once we hit $0$, we will spend a large amount of time on the negative side (with a losing score) before we return. Since the positive and negative halves of this infinite Markov chain have identical probability distributions, we can expect the time spent on both halves to be equally large. We model this situation by truncating the Markov chain and assigning a transition probability of $\gamma \gg 0$ of remaining in a winning or losing state once we get there. Here is the resulting truncated Markov chain:
The transition matrix for this Markov chain is:
\[
A = \begin{bmatrix}
\gamma & \frac{1}{2} & 0 \\
0 & 0 & 1-\gamma \\
1-\gamma & \frac{1}{2} & \gamma
\end{bmatrix}
\]The steady-state distribution (right eigenvector corresponding to the eigenvalue of $1$) is given by: $\begin{bmatrix} \tfrac{1}{2} & 1-\gamma & 1\end{bmatrix}$. As anticipated, as $\gamma \to 1$, less and less probability is associated with the intermediate state 1. In the limit, the fraction of time spent in the winning state is
\[
\text{Prob}(\text{win as }N\to\infty) = \frac{1}{\tfrac{1}{2}+1} = \frac{2}{3}
\]So this justifies why we found the probability of winning tending to $\tfrac{2}{3}$ in the numerical simulation above.
As an aside, @electricvishnu with some help from @DarkAdonisSA found a formula that computes the probability of winning the game as a function of $N$. This formula is:
$\displaystyle
\text{Prob}(\text{win in }N\text{ steps}) = \frac{2}{3} + \frac{(-1)^N}{2^N}
\left[ \frac{1}{3} + \sum_{k=1}^{N-1} (-1)^k \binom{k}{\lfloor k/2 \rfloor} \right]
$
We don’t have a proof of formula as of yet, but it does produce the exact answer when $N=100$, and it also clearly reveals the limit of $\tfrac{2}{3}$ since the second term tends to zero as $N\to\infty$. There does not appear to be a meaningful way to simplify this formula either.
Changing the probabilities
We can also examine what happens when we try varying the probabilities away from $a=b=\tfrac{1}{2}$. Using the Python function shown above, this is a straightforward exercise. Here is what we obtain when we assume coin B is fair but coin A is not fair:
As expected, when coin A has a low probability, the optimal strategy is to flip coin B a lot more, which just leads to a probability of 0.5 of winning. As coin A’s probability increases, as do our chances of winning. We can also see what happens as we increase either $a$ or $b$ (neither coin is a fair coin). Here is what we obtain:
Once again, I struggled to find closed-form expressions for any of the above cases ($a$ and/or $b$ varying). Even for simpler instances, the solutions get complicated in a hurry. For example, the probability of winning as a function of $a$, with $b=\tfrac{1}{2}$, when the game only lasts 4 flips is:
\[
\text{Prob}(\text{win}) = \begin{cases}
\frac{1}{8}(a+4) & 0 \leq a \leq \tfrac{1}{2} \\
-\frac{1}{8} \left(16 a^4-32 a^3+20 a^2-11 a-1\right) & \tfrac{1}{2} \leq a \leq \gamma \\
-\frac{1}{2} a \left(6 a^3-11 a^2+6 a-3\right) & \gamma \leq a \leq 1
\end{cases}
\]where $\gamma \approx 0.7328$ is one of the roots of $8\gamma^3-4\gamma^2-1=0$. (this was found using the Mathematica function I included above). Each time we increase the number of flips, the number of parts in the piecewise polynomial can potentially double. So it seems highly unlikely that we’ll have a simple closed-form expression by the time we arrive at $N=100$!
I have some intuition for the 2/3 asymptote. It seems to me you should place the small bet if and only if you have positive points. I don’t have a rigorous proof of this, but I was able to show it programatically: https://github.com/joshuameisel/joshuameisel.github.io/blob/master/code/flipping_game.ipynb
The hand-wavy reasoning is that if you’re ahead, if you can stay ahead for the rest of the game you win, which is maximized by betting small. If you do end up back at 0, you’re essentially resetting. When you’re losing, you need to eventually recoup your losses and “reset,” and the best way to guarantee this happens is by betting big.
With this strategy you start off betting big for the amount of time it takes a random walk to reach 1. At this point you will have 2 points and start betting small, for the amount of time it takes a random walk to reach -2, at which point you start over betting big with 0 points again. The expected amount of time betting small is sort of twice the expected amount of time betting big, as you need to reach -1 in your random walk and then -1 again. This again is hand-wavy since both expectations are actually infinite, but maybe hints at why you’re twice as likely to be betting small (i.e. winning) at the end.
Here’s my write-up. I give a proof that you should bet small only when you’re ahead (curious if there’s a simpler proof!) and show that the solution is P(M_100 != 0 mod 3) where M_n is the running maximum of a random walk.
I also give an alternative proof of the 1/3 limit, which I think is a bit more rigorous (the object is technically not a Markov chain. For instance, you have a 1/2 chance of initially returning to 1 but this decreases over time).
https://joshuameisel.github.io/Flip-To-Victory/
Nice! I actually edited my solution a couple days ago to include a proof of the 2/3 limit by treating it as a Markov chain. Though your proof may be more rigorous than mine…
I did this with pen and paper for up to 9 flips, and some interesting patterns emerged. As for strategy, in the vast majority of cases it does not matter if you choose coin A or coin B. The only times it matters:
If points>1 and flips+points = 1 (mod 2) you need to flip coin A
If points=0 point with possibility converging towards 2/3 AND ending at =0 points with possibility =N/2). The first is probably a classic proof about random walks.
Both of these paths hinge on proving W_N(1) => 2/3. Joshuas observation about random walks seems to be useful for this purpose
Hi Laurent,
So instead of looking for the chance of WINNING, look instead for the chance of NOT LOSING.
This is the same as assigning a value of 1, rather than 0, to ending with a score of 0.
Then for a game of length L, the outcome, when both coins are fair, is:
L=0: 1
L=1: 1 – 1/2
L=2: 1 – 1/2 + 1/4
L=3: 1 – 1/2 + 1/4 – 1/8
L=4: 1 – 1/2 + 1/4 – 1/8 + 1/16
L=5: 1 – 1/2 + 1/4 – 1/8 + 1/16 – 1/32 = sum(i=0..L) (-1)^i / 2^i
and the series exponentially converges to 2/3.
The probability to draw goes to zero as the length of the game gets longer
(though not exponentially, only as a half-power), so the chance to win converges
towards the chance to not lose. Which converges rapidly towards 2/3.
Now we just need a proof, rather than a mere observation, of the above behavior.
I posted an update to include an explanation for why the limit as $N\to\infty$ is $\tfrac{2}{3}$. I also included a formula that was discovered and posted on Twitter. We still don’t have a proof for it, though it does match the analytical expression at $N=100$ exactly so it’s probably correct.
An intuitive and almost rigorous argument for the 2/3 limit is this: imagine that you have infinite tosses. On average, you spend twice as much time with a positive score as you do with a negative one (since by tossing A you’re effectively moving half as fast as you do when tossing B). Now think of the end time as randomly distributed. Then it should have twice as much probability of catching you when you’re positive as when you’re negative, so you’re probability of winning is 2/3 (there’s still a bit of work to do to convert it to a statement about a fixed time but it’s close).
@Bora, I don’t think this argument works. Just because you’re moving “twice as fast” when you’re negative doesn’t necessarily mean you spend half as much time there.
Consider for example the probability of returning to your starting point for the first time after $2m$ flips. This is known as a “first return” in random walks, and it turns out to be $\tfrac{1}{2^{2m}(2m-1)}\binom{2m}{m}$. This probability does not depend on whether you’re flipping coin A (moving $\pm 1$) or flipping coin B (moving $\pm 2$); it’s the same in both cases!
The reason for the discrepancy in time comes from what happens near zero. When we transition from a losing score to a winning score, we jump to $+2$ (so we have to lose the next two flips in a row before we have a losing score again). But when we transition from a winning score to a losing score, we jump to $0$ (so we only have to win the next flip before we have a winning score again). This is the reason why we end up spending more time on the winning side. It has nothing to do with the fact that we take bigger jumps on one side.
Hi Laurent,
I know your work through the Riddler puzzles, and find it very impressive.
However, I am wondering if I am missing something, or you are making one incorrect claim above.
If Coin A has 0 probability of being heads, then you only flip coin B.
Coin B will give 50 heads (losing) with a probability of about .08 . (you can compute it exactly or get a good estimate of sqrt(2/pi) using Stirling’s approximation of the factorial.
This means you only win about 46% of the time when flipping a single fair coin.
It seems that after 100 flips of a single fair coin, a score of +N is just as likely as a score of -N. However, since you only win if the score is positive, the score of 0 is like the green space on a roulette wheel. It ensures that, in the long run, the house always comes out ahead.
Hi Aaron,
The problem with your argument is that, if you are winning by enough, it’s best to flip the A coin EVEN IF it always loses. If you have M flips left to make, and your score is at least M+1, then you can only flip coin A for the rest of the time and win with 100% certainty, rather than risk that a series of bad flips will push you negative.
The simplest example is if the game is only 2 flips long.
If you always flip the 2-point coin, you have: 1/4 chance to end with +4, 1/2 chance to end with 0, and 1/4 chance to end with -4, total (1/4) chance to win.
If you flip each coin once, you have (1/2) chance to end with +1 and (1/2) chance to end with -3. So you win 1/2 of the time. Which is clearly better, even though the expected score is worse.
There is a very nice closed form expression for the probability of winning with if you start with a score of 1 rather than a score of 0.
p = 1 – 1/2 + 1/4 – 1/8 + 1/16 – 1/32 … = 2/3
The Nth partial sum is the probability of winning with N turns.
Not completely sure why yet. It’s a much nicer expression than the one starting at 0, but they converge to the same thing. Trying to figure out why, and how to map one on to the other.
Just realized my solution is pretty much the same as Guy Moore above.
The solution for starting at 0 ends up looking like
p = 1/2 + 0/4 – 0/8 + c1/16 – c1/32 + (c1+c2)/64 – (c1+c2)/128 + (c1+c2+c3)/256 – ….
where c_i are the Catalan numbers. This would make sense as being related to the number of ways of getting back to zero.
Just realized my solution is pretty much the same as Guy Moore above.
The solution for starting at 0 ends up looking like
p = 1/2 + 0/4 – 0/8 + c1/16 – c1/32 + (c1+c2)/64 – (c1+c2)/128 + (c1+c2+c3)/256 – ….
where c_i are the Catalan numbers. This would make sense as being related to the number of ways of getting back to zero.
So here is a different way of approaching the problem (where each coin is fair), and a proof that the chance to win or tie, after f flips, is
(2/3) + (-1/2)^f / 3
We adopt the optimal strategy of flipping for 2 points if our current score is 0 or less, and flipping for 1 point if our score is positive. Rather than work from the end backwards, I will work from the first flip forward, and ask:
After f flips, how many ways can I end up with a score of s? I call the answer W(f,s)
The probability to have a score of s after f flips is then W(f,s) / 2^f.
The initial conditions are W(0,0)=1, W(0,s)=0 if s is not 0.
Our strategy tells us that
A) W(f,-2m+1) = 0 you never get an odd negative score,
B) W(f+1,-2m) = W(f,-2m+2) + W(f,-2m-2) for all positive integers m
C) W(f+1,0) = W(f,1) + W(f,-2) (You get to 0 either from +1 or from -2)
D) W(f+1,1) = W(f,2) the ONLY way to get to score=1 is to come down from score=2
E) W(f+1,2) = W(f,3) + W(f,1) + W(f,0) (you get to score=2 from 3, from 1, OR from 0)
F) W(f+1,m+2) = W(f,m+1) + W(f,m+3) for positive m.
We now prove the following inductively:
1) W(f+1,1) = W(f,2)
2) W(f+1,-2m) = W(f,m) for m >= 2
3) W(f,0) = W(f,-2) + (-1)^f
Property 1 is the same as D)
Property 2 follows inductively very easily for m >= 3. For m=2, we note that:
W(f+1,-4) = W(f,-6) + W(f,-2) by B)
= W(f-1,3) + W(f-1,-4) + W(f-1,0) by 2) and B)
= W(f-1,3) + W(f-2,2) + W(f-1,0) by 2)
= W(f-1,3) + W(f-1,1) + W(f-1,0) by 1)
= W(f,2) by E)
which completes the inductive proof of B).
To prove C), note that
W(f+1,0) = W(f,1) + W(f,-2)
while
W(f+1,-2) = W(f,-4) + W(f,0) = W(f-1,2) + W(f,0) = W(f,1) + W(f,0)
The coefficients switch places and increase by W(f,1).
Since W(0,0) starts out 1 larger than W(0,-2), they will alternate being 1 larger,
completing the proof of C)
Now consider the total number of ways to get a nonnegative score, MINUS TWICE the number of ways to get a negative score:
A(f) = sum(m=2,..) W(f,m) + W(f,1) + W(f,0) – 2 W(f,-2) – 2 sum(m=2,..) W(f,-2m)
= ( Probability to win or tie – 2*probability to lose ) * 2^f
For shorthand, write Probability to win or tie as Pwt and probability to lose as Pl = 1 – Pwt.
Now let’s massage the form of A(f):
Replace W(f,2) = W(f,4) + W(f,1) + W(f,0) and W(f,m>2) with W(f-1,m-1) + W(f-1,m+1):
sum(m=2,..) W(f,m) = 2 sum(m=2,..) W(f-1,m) – W(f-1,2) + W(f-1,1) + W(f-1,0)
Because of B), 2 sum(m=2,..) W(f-1,m) = 2 sum(m=2,..) W(f,-2m) so these cancel, leaving
A(f) = – W(f-1,2) + W(f-1,1) + W(f-1,0) + W(f,1) + W(f,0) – 2 W(f,-2)
But W(f-1,2) = W(f,1) by 1), so these terms cancel:
A(f) = W(f-1,1) + W(f-1,0) + W(f,0) – 2 W(f,-2)
Use W(f-1,1) = W(f-2,2) = W(f-1,-4) from 1) and 2):
A(f) = W(f-1,-4) + W(f-1,0) + W(f,0) – 2 W(f,-2)
But W(f-1,-4) + W(f-1,0) = W(f,-2) by B), so I get
A(f) = W(f,0) – W(f,-2) = (-1)^f by C)
2^f (Pwt – 2*Pl) = (-1)^f
But Pl = 1 – Pwt, so
3 Pwt – 2 = (-1)^f/2^f
Pwt = (2/3) + (-1/2)^f / 3
The probability to win or tie converges to (2/3) exponentially, with alternating corrections.