This Riddler puzzle is a game theory problem: how should each player play the game to maximize their own winnings?

Ariel, Beatrice and Cassandra — three brilliant game theorists — were bored at a game theory conference (shocking, we know) and devised the following game to pass the time. They drew a number line and placed \$1 on the 1, \$2 on the 2, \$3 on the 3 and so on to \$10 on the 10.

Each player has a personalized token. They take turns — Ariel first, Beatrice second and Cassandra third — placing their tokens on one of the money stacks (only one token is allowed per space). Once the tokens are all placed, each player gets to take every stack that her token is on or is closest to. If a stack is midway between two tokens, the players split that cash.

How will this game play out? How much is it worth to go first?

A grab bag of extra credits: What if the game were played not on a number line but on a clock, with values of \$1 to \$12? What if Desdemona, Eleanor and so on joined the original game? What if the tokens could be placed anywhere on the number line, not just the stacks?

Here are the details of how I approached the problem:

[Show Solution]

This problem is different from previous game theory problems I’ve written about such as the war game or the dodgeball duel, where all players in the game act simultaneously. In such games, you can assume a strategy for each player, and then look for the strategy that yields a Nash equilibrium, which is a set of strategies such that no player has any incentive to change their actions. Sometimes, the optimal strategy is a mixed strategy, which means it involves randomness. This is the case in rock-paper-scissors, for example, where the Nash-optimal strategy is to choose randomly.

What makes the present problem quite different is that it’s a sequential game in which each player gets to see the previous players’ moves when it comes their turn to play. This means that there can be no advantage to playing randomly. The standard way to solve such problems is via backward induction (also known as dynamic programming). I will illustrate the approach for the basic version of the problem: 10 stacks and 3 players. A *strategy* is a set of moves $(a,b,c)$ where $a$ is Ariel’s move, $b$ is Beatrice’s move, and $c$ is Cassandra’s move. We proceed by backward induction as follows:

- Given a strategy $(a,b,c)$, let’s call Ariel’s payout $P_A(a,b,c)$. Similarly, we’ll call Beatrice’s and Cassandra’s payouts $P_B$ and $P_C$, respectively.
- Imagine it’s Cassandra’s turn. She observes Ariel and Beatrice’s moves $a$ and $b$. For each potential choice of $c$, we can compute Cassandra’s payout $P_C(a,b,c)$ and then select the most profitable choice of $c$. Let’s call Cassandra’s strategy $K_C(a,b)$. Mathematically, we’re defining:

\[

K_C(a,b) = \arg\max_{c} P_C(a,b,c)

\] - Now imagine it’s Beatrice’s turn. She observes Ariel’s move ($a$). She also knows that if she chooses $b$, then Cassandra will follow up with $K_C(a,b)$. So she must choose the best $b$ in anticipation of what Cassandra will do. If we call Beatrice’s strategy $K_B(a)$, then we have:

\[

K_B(a) = \arg\max_{b} P_B(a,b,K_C(a,b))

\] - Finally, imagine it’s Ariel’s turn. Although she’s the first to move, no matter which $a$ she picks, she can be assured Beatrice will choose $b = K_B(a)$ and Cassandra will follow-up with $c = K_C(a,b)$. So Ariel can examine all of her options and predict her payout. Ultimately, her decision will be $K_A$, which is given by:

\[

K_A = \arg\max_{a} P_A(a,K_B(a),K_C(a,K_B(a)))

\]

### Computational details

Writing code to implement the backward induction above was quite a challenge. I will include my finished code later so you can play with it, but for now I’ll just highlight two of the major hurdles I had to overcome.

- A naive implementation of the above steps quickly becomes computationally intractable as we make the problem larger. If there are $N$ money stacks and $M$ players, the number of possible strategies is $N(N-1)\cdots(N-M+1)$, which can be quite large. So if $N=12$ and $M=9$, the $K$ function for the last player will depend on how the first $8$ players moved, and there are about 20 million possibilities there. Things deteriorate rapidly as we further increase $N$ or $M$. I observed that the best move for any player does not depend on the order in which the previous players moved. So for the intent of making the functions $K_A$, $K_B$, etc., we may assume the inputs are sorted. This reduces the number of possible strategies to $\binom{N}{M}$. So in the example above, the 20 million possibilities reduces to 500, a far more manageable number.
- The optimal solution may not be unique. At each step, there may be several choices that produce identical payouts for the deciding player. Keeping track of all possibilities is tedious because it means that each $\arg\max$ can return many solutions, and each of those must be propagated back through the other functions. I implemented the recursion using lists of strategies, so at every step I was passing around lists of optimal strategies rather than just one strategy.
- Extensions and bonus questions. After some thought, I was able to write the code in a way that solves the bonus questions with only minor modifications. Writing generic code in terms of $N$ and $M$ allows the easy addition of more stacks or more players. Writing the number line so it wraps around (on a clock) just amounts to redefining the function that computes the payouts for each strategy. Solving the continuous version amounts to adding extra slots in between the stacks and suitably redefining the payout function.
- Bugs. So many bugs! This was the trickiest piece of code I’ve written in awhile, and tracking down indexing errors in the recursion was a challenge to say the least. I think I got everything working in the end and I’m quite please with it!

The results from my simulation are in the next section.

And here are the answers:

[Show Solution]

### Adding more players

Let’s call $N$ the number of money stacks and $M$ the number of players. Here is a table showing the case $N=10$ as we add more players.

Players | Optimal strategies (number line) | Optimal payouts (in dollars) |
---|---|---|

$2$ | $(7,8)$ | $(28,27)$ |

$3$ | $(5, 9, 8)$ | $(21, 19, 15)$ |

$4$ | $(7, 4, 9, 10)$ | $(17, 15, 13, 10)$ |

$5$ | $(4, 6, 8, 10, 9)$ | $(12.5, 12, 11.5, 10, 9)$ |

$6$ | $(4, 10, 9, 6, 8, 7)$ | $(12.5, 10, 9, 8.5, 8, 7)$ |

$7$ | $(10, 9, 3, 8, 5, 7, 6)$ $(10, 9, 3, 8, 7, 5, 6)$ $(10, 9, 8, 3, 5, 7, 6)$ $(10, 9, 8, 3, 7, 5, 6)$ |
$(10, 9, 8, 8, 7, 7, 6)$ |

$8$ | $(10, 9, 8, 7, 3, 6, 5, 4)$ $(10, 9, 8, 7, 6, 3, 5, 4)$ |
$(10, 9, 8, 7, 6, 6, 5, 4)$ |

$9$ | $(10, 9, 8, 7, 6, 5, 4, 2, 3)$ $(10, 9, 8, 7, 6, 5, 4, 3, 2)$ |
$(10, 9, 8, 7, 6, 5, 4, 3, 3)$ |

$10$ | $(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)$ | $(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)$ |

At the onset, it wasn’t clear whether it would be best to move first in order to stake claim to the best spots or to move later on and have the advantage of playing reactively. As we can see from the table above, the first player always has the highest payout and it gets worse for subsequent players. Some of the rows contain multiple strategies. These are cases with multiple Nash equilibria. All produce identical payoffs.

In the original problem with $M=3$ players, it’s worth \$21 to go first. Of course, this assumes everybody is playing in a greedy fashion and trying to maximize their own payoffs. If Beatrice and Cassandra conspired to ruin Ariel’s day, they could reduce Ariel’s winnings to \$5 (by placing tokens at 4 and 6) but they would have to trust each other enough to split their \$50 winnings evenly after the game is over!

### Playing around the clock

If we wrap the number line around a clock, the game changes slightly, but the code only requires minor modification. All we need to do is define how payoffs are computed, since the “closest” token may be either clockwise or counterclockwise from a given stack. Here are the results with $N=12$ as we add more players.

Players | Optimal strategies (clock) | Optimal payouts (in dollars) |
---|---|---|

$2$ | $(9, 10)$ $(10, 9)$ |
$(39, 39)$ |

$3$ | $(7, 11, 10)$ | $(31.5, 27.5, 19)$ |

$4$ | $(6, 9, 12, 11)$ | $(23.5, 22, 16.5, 16)$ |

$5$ | $(8, 5, 12, 10, 11)$ | $(19.5, 18, 15, 14.5, 11)$ |

$6$ | $(5, 12, 7, 9, 11, 10)$ $(12, 5, 7, 9, 11, 10)$ |
$(15, 15, 14, 13, 11, 10)$ |

$7$ | $(12, 9, 7, 11, 4, 10, 6)$ $(12, 9, 11, 7, 4, 10, 6)$ |
$(14, 13, 11, 11, 10.5, 10, 8.5)$ |

$8$ | $(12, 11, 4, 10, 9, 6, 8, 7)$ | $(14, 11, 10.5, 10, 9, 8.5, 8, 7)$ |

$9$ | $(12, 11, 10, 9, 8, 3, 5, 7, 6)$ $(12, 11, 10, 9, 8, 3, 7, 5, 6)$ $(12, 11, 10, 9, 8, 7, 3, 5, 6)$ |
$(13, 11, 10, 9, 8, 7, 7, 7, 6)$ |

$10$ | $(12, 11, 10, 9, 8, 7, 6, 3, 5, 4)$ $(12, 11, 10, 9, 8, 7, 6, 5, 3, 4)$ |
$(13, 11, 10, 9, 8, 7, 6, 5, 5, 4)$ |

$11$ | $(12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2)$ | $(12.5, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2.5)$ |

$12$ | $(12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)$ | $(12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)$ |

As in the number line case, there are many instances with multiple Nash equilibria and each case leads to the same payoff.

### Continuous number line

I adapted my code to handle the continuous case by adding intermediate slots in between each stack. This allows players to play “in between” the stacks if they so choose. If I return to $N=10$ on the number line add 4 slots between each stack for $M=2$ and $M=3$. Here are the results:

Players | Optimal strategies (continuous) | Optimal payouts (in dollars) |
---|---|---|

$2$ | $(7.0, 7.2)$ $(7.0, 7.4)$ $(7.0, 7.6)$ $(7.0, 7.8)$ $(7.0, 8.0)$ $(7.0, 8.2)$ $(7.0, 8.4)$ $(7.0, 8.6)$ $(7.0, 8.8)$ |
$(28, 27)$ |

$3$ | $(5.0, 9.0, 7.2)$ $(5.0, 9.0, 7.4)$ $(5.0, 9.0, 7.6)$ $(5.0, 9.0, 7.8)$ $(5.0, 9.0, 8.0)$ $(5.0, 9.0, 8.2)$ $(5.0, 9.0, 8.4)$ $(5.0, 9.0, 8.6)$ $(5.0, 9.0, 8.8)$ |
$(21, 19, 15)$ |

Here it’s clear what’s going on; given the option, the first players do not choose to deviate. Ultimately, the last player has freedom to move their token in the range $7 \lt c \lt 9$ and the payoff doesn’t change, and it matches the solution in the integer case (where $c=8$ is optimal). So at least in these cases, making the problem continuous doesn’t change a thing.

This isn’t always true. The solution gets more complex if we examine the case $M=4$. Here, I tried solving the problem with different discretizations and I found the following:

Players | Optimal strategies (continuous) | Optimal payouts (in dollars) |
---|---|---|

$4$ | Steps of $0.50$: $(9.00, 3.50, 6.50, 7.50)$ Steps of $0.25$: $(9.00, 3.25, 6.75, 7.25)$ Steps of $0.20$: $(9.00, 3.20, 6.80, 7.20)$ Steps of $0.10$: $(9.00, 3.10, 6.90, 7.10)$ |
$(19, 12.5, 12, 11.5)$ |

This case highlights when using a discretization breaks down. We would never expect any stacks to be evenly split in the continuous case because you can always nudge your token a little bit in order to be a little closer to either stack and win it all. It also appears that as we reduce the discretization to $\epsilon$, the unique optimal strategy is $(9, 3+\epsilon, 7-\epsilon, 7+\epsilon)$. This solution is an artifact of our discretization. Indeed, the last player could do better by picking $7+\tfrac{1}{2}\epsilon$.

One way around this would be to give each successive player a finer discretization to choose from. This would prevent the case above from occurring. I didn’t code this up so I didn’t verify that the Nash-optimal solution for the discrete case is the same in the continuous case. Future work!

### My code

I wrote my code in Julia (version 0.6.2.1). For those that don’t know about Julia, it’s a fast language for scientific computing that is syntactically similar to Matlab and Python. If you are interested in seeing my code, here is my Jupyter notebook.