This week’s Riddler Classic concerns a paradoxical coin-flipping game:
You have two fair coins, labeled A and B. When you flip coin A, you get 1 point if it comes up heads, but you lose 1 point if it comes up tails. Coin B is worth twice as much — when you flip coin B, you get 2 points if it comes up heads, but you lose 2 points if it comes up tails.
To play the game, you make a total of 100 flips. For each flip, you can choose either coin, and you know the outcomes of all the previous flips. In order to win, you must finish with a positive total score. In your eyes, finishing with 2 points is just as good as finishing with 200 points — any positive score is a win. (By the same token, finishing with 0 or −2 points is just as bad as finishing with −200 points.)
If you optimize your strategy, what percentage of games will you win? (Remember, one game consists of 100 coin flips.)
Extra credit: What if coin A isn’t fair (but coin B is still fair)? That is, if coin A comes up heads with probability p and you optimize your strategy, what percentage of games will you win?
Here is my solution:
[Show Solution]
At first, the problem appears to be asking the impossible. How can flipping fair coins over and over lead to anything other than winning half the time? The key here is that “winning” only requires getting a positive score. If we looked at the “expected total score” after 100 flips, this would be 0 no matter what strategy was used (you can see this by applying linearity of expectation). But since we’re counting number of positive outcomes rather than total score, it’s possible to win more than half of the time on average.
As a simpler example of how this can happen, consider a game where you roll a die. If you get a 1, you lose \$5. But if you get anything else, you win \$1. The expected score in the game is $\tfrac{1}{6}(-5) + \tfrac{5}{6}(+1) = 0$. However, if “winning” simply means obtaining a positive score, then we win $\tfrac{5}{6}$ of the time. The flipping game is similar; our total score can average out to zero, even though it ends up being positive more than half of the time. Practically speaking, this means that when we do lose, we lose big.
Probability of winning assuming optimal play
Let’s suppose we are doing $N$ flips, and we are choosing between:
- Coin $A$: wins $+1$ with prob $a$ and wins $-1$ with prob $1-a$.
- Coin $B$: wins $+2$ with prob $b$ and wins $-2$ with prob $1-b$.
Our task is to determine the percentage of games that we will win if we optimize our strategy. One way to approach this is using dynamic programming. The basic idea is that our best move at any point in the game should only depend on (1) our current score and (2) how many moves we have left. Therefore, we define:
\[
V_t(s) = \left\{\begin{aligned}
&\text{Probability that we win if we play optimally}\\
&\text{from here on out, given that our current score}\\
&\text{is }s\text{ and we are currently at turn }t\text{ of }N
\end{aligned}\right.
\]We can view each $V_t$ as a vector indexed by $s \in \{\dots,-1,0,1,\dots\}$, which is our current score. The idea is to solve for each $V_t$ by working our way backwards from the final step until we reach $V_0$, which is our probability of winning for each different starting score. Ultimately, the question is asking us to compute $V_0(0)$.
The terminal distribution is simply given by our probability of winning given that we have achieved a given final score. The result in this case is simple, since at that point there are no moves left to make:
\[
V_N(s) = \begin{cases}
0 & \text{if }s \leq 0\\
1 & \text{if }s > 0
\end{cases}
\]At each previous step, we pick the coin that will give us our highest chance of winning. This leads to the recursive formula for $t=n-1,\dots,2,1$:
\[
V_{t-1}(s) = \max\left\{ a V_t(s+1)+(1-a) V_t(s-1),\, b V_t(s+2)+(1-b) V_t(s-2) \right\}
\]I couldn’t find a way to evaluate this expression in closed form, but it is rather straightforward to evaluate it numerically. Here is efficient Python code that evaluates $V_0(0)$ for any choice of $(a,b,N)$.
import numpy as np # probability of winning if coin A has prob a of +1 and (1-a) of -1, # coin B has prob b of +2 and (1-b) of -2, and we do N flips in total def compute_win_prob( a, b, N ): # shift the indices so that a score of "s" is at index v[s+2*N] v = np.zeros(4*N+1) # initialize (time=N) v[:2*N] = 0 v[2*N+1:] = 1 # overwrite results as we recurse backwards to time=0 # this code is efficient: uses vectorization and overwrites past results for t in range(N): coin_A = a*v[3:4*N] + (1-a)*v[1:4*N-2] coin_B = b*v[4:4*N+1] + (1-b)*v[0:4*N-3] v[2:4*N-1] = np.maximum(coin_A, coin_B) # return the probability of winning assuming we start with a score of 0 return v[2*N]
When we run this script using $a=b=0.5$ and $N=100$, we obtain a probability of winning of about $0.6403$. We can get the exact answer as well using the following Mathematica code:
ComputeWinProb[a_,b_,n_]:=( v = ConstantArray[0,4n+1]; v[[2n+2;;4n+1]] = 1; For[t=0, t<n, t++, CoinA = a v[[4;;4n]] + (1-a) v[[2;;4n-2]]; CoinB = b v[[5;;4n+1]] + (1-b) v[[1;;4n-3]]; v[[3;;4n-1]] = MapThread[Max,{CoinA,CoinB}]; ]; v[[2n+1]]) ComputeWinProb[1/2,1/2,100]
This gives us the exact result:
\[
\text{Prob}(\text{win})=\tfrac{811698796376000066208208781649}{1267650600228229401496703205376} \approx 0.640317
\]
Does doing more flips help?
We can win 64% of the time if we do 100 flips, but What if we did 1000 flips, or more? Could we keep increasing our chances of winning? It doesn’t appear to be the case. As we do more and more flips (increase $N$), the probability of winning increases slightly, but only up to a limit. To see this, I solved the problem for $N$ ranging up to 100,000 flips. Here is what I found:
Given how slowly the function is growing, it appears that there should be a finite limit, and that this limit is around $\frac{2}{3}$ (the last point on the plot is at $0.6658$). But is there another way we can compute this limit?
Limiting behavior for a large number of flips
The optimal strategy for the case $a=b=\tfrac{1}{2}$ is to flip coin A when our score is positive and to flip coin B when our score is negative. How often will we win the game as $N\to\infty$ if we use this strategy?
We can think of the coin-flipping problem in the context of random walks. It is well-known that random walks on the 1D line (with equal probability of moving left or right at each step) will visit every integer an infinite number of times, given enough time. It is also known that the expected number of steps before a walk first returns to its starting point is infinite. Therefore, as $N$ gets large, we can expect that the the score will spend only an infinitesimal fraction of the time near 0.
We can model this process as a Markov chain with states:
\[
\dots,-8,-6,-4,-2,0,1,2,3,4,\dots
\]When our score is $\leq 0$, we will flip coin B and move $\pm 2$. When our score is $\gt 0$, we will flip coin A and move $\pm 1$. Note that the states $-3,-5,-7,\dots$ can never be reached because we only move by $\pm 2$ when our score is negative. Based on the random walk argument above, we can expect the Markov chain to spend most of its time with a large-magnitude score (either on the positive or negative side).
As an approximation, we can imagine that once we hit $+2$, we will spend a large amount of time on the positive side (with a winning score) before we return to $+1$. Likewise, once we hit $0$, we will spend a large amount of time on the negative side (with a losing score) before we return. Since the positive and negative halves of this infinite Markov chain have identical probability distributions, we can expect the time spent on both halves to be equally large. We model this situation by truncating the Markov chain and assigning a transition probability of $\gamma \gg 0$ of remaining in a winning or losing state once we get there. Here is the resulting truncated Markov chain:
The transition matrix for this Markov chain is:
\[
A = \begin{bmatrix}
\gamma & \frac{1}{2} & 0 \\
0 & 0 & 1-\gamma \\
1-\gamma & \frac{1}{2} & \gamma
\end{bmatrix}
\]The steady-state distribution (right eigenvector corresponding to the eigenvalue of $1$) is given by: $\begin{bmatrix} \tfrac{1}{2} & 1-\gamma & 1\end{bmatrix}$. As anticipated, as $\gamma \to 1$, less and less probability is associated with the intermediate state 1. In the limit, the fraction of time spent in the winning state is
\[
\text{Prob}(\text{win as }N\to\infty) = \frac{1}{\tfrac{1}{2}+1} = \frac{2}{3}
\]So this justifies why we found the probability of winning tending to $\tfrac{2}{3}$ in the numerical simulation above.
As an aside, @electricvishnu with some help from @DarkAdonisSA found a formula that computes the probability of winning the game as a function of $N$. This formula is:
$\displaystyle
\text{Prob}(\text{win in }N\text{ steps}) = \frac{2}{3} + \frac{(-1)^N}{2^N}
\left[ \frac{1}{3} + \sum_{k=1}^{N-1} (-1)^k \binom{k}{\lfloor k/2 \rfloor} \right]
$
We don’t have a proof of formula as of yet, but it does produce the exact answer when $N=100$, and it also clearly reveals the limit of $\tfrac{2}{3}$ since the second term tends to zero as $N\to\infty$. There does not appear to be a meaningful way to simplify this formula either.
Changing the probabilities
We can also examine what happens when we try varying the probabilities away from $a=b=\tfrac{1}{2}$. Using the Python function shown above, this is a straightforward exercise. Here is what we obtain when we assume coin B is fair but coin A is not fair:
As expected, when coin A has a low probability, the optimal strategy is to flip coin B a lot more, which just leads to a probability of 0.5 of winning. As coin A’s probability increases, as do our chances of winning. We can also see what happens as we increase either $a$ or $b$ (neither coin is a fair coin). Here is what we obtain:
Once again, I struggled to find closed-form expressions for any of the above cases ($a$ and/or $b$ varying). Even for simpler instances, the solutions get complicated in a hurry. For example, the probability of winning as a function of $a$, with $b=\tfrac{1}{2}$, when the game only lasts 4 flips is:
\[
\text{Prob}(\text{win}) = \begin{cases}
\frac{1}{8}(a+4) & 0 \leq a \leq \tfrac{1}{2} \\
-\frac{1}{8} \left(16 a^4-32 a^3+20 a^2-11 a-1\right) & \tfrac{1}{2} \leq a \leq \gamma \\
-\frac{1}{2} a \left(6 a^3-11 a^2+6 a-3\right) & \gamma \leq a \leq 1
\end{cases}
\]where $\gamma \approx 0.7328$ is one of the roots of $8\gamma^3-4\gamma^2-1=0$. (this was found using the Mathematica function I included above). Each time we increase the number of flips, the number of parts in the piecewise polynomial can potentially double. So it seems highly unlikely that we’ll have a simple closed-form expression by the time we arrive at $N=100$!