Another game theory problem from the Riddler. This game is a simplified version of poker, but captures some interesting behaviors!
Baby poker is played by two players, each holding a single die in a cup. The game starts with each player anteing \$1. Then both shake their die, roll it, and look at their own die only. Player A can then either “call,” in which case both dice are shown and the player with the higher number wins the \$2 on the table, or Player A can “raise,” betting one more dollar. If A raises, then B has the option to either “call” by matching A’s second dollar, after which the higher number wins the \$4 on the table, or B can “fold,” in which case A wins but B is out only his original \$1. No other plays are made, and if the dice match, a called pot is split equally.
What is the optimal strategy for each player? Under those strategies, how much is a game of baby poker worth to Player A? In other words, how much should A pay B beforehand to make it a fair game?
If you’re interested in the derivation (and maybe learning about some game theory along the way), you can read my full solution here:
[Show Solution]
Our first task will be to figure out the payout of the game as a function of the players’ strategies. For each possible roll, Player A must decide whether to call ($0$) or raise ($1$). Similarly, Player B must decide whether to call ($0$) or fold ($1$). An example strategy for Player A is:
\[
p = \begin{bmatrix} 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix}
\]This corresponds to Player A calling if they roll $\{1,2,3,4\}$ and raising if they roll $\{5,6\}.$ This is an example of a pure strategy because it completely determines what the player does as a function of the information available. We can encode each pure strategy as a six-digit binary number. There are $2^6 = 64$ possible pure strategies for each player and we can number them from $0$ to $63$ based on the decimal representation. For example, strategy $44$ corresponds to $101100$ in binary, i.e. strategy $p = \begin{bmatrix}1 & 0 & 1 & 1 & 0 & 0\end{bmatrix}$. We can also consider mixed strategies, where actions are probabilistic. An example of a mixed strategy for Player A is:
\[
p = \begin{bmatrix} 0.3 & 0 & 1 & 1 & 0 & 0 \end{bmatrix}
\]This strategy is identical to the previous example, except if Player A rolls a $1$, they will raise $30$% of the time and call $70$% of the time. Mixed strategies allow players to play randomly if they so desire! Pure strategies are special cases of mixed strategies in which the probabilities are either $0$ or $1$ (deterministic).
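To make the encoding concrete, here is a small Python sketch (my own, not from the original post) that converts a pure-strategy number in $\{0,\dots,63\}$ into its six-entry $0/1$ vector, with the most significant bit corresponding to a roll of $1$ as in the example above.

```python
def pure_strategy(index):
    """Return the 0/1 strategy vector for a pure-strategy number 0..63."""
    bits = format(index, "06b")    # e.g. 44 -> '101100'
    return [int(b) for b in bits]  # -> [1, 0, 1, 1, 0, 0]

print(pure_strategy(44))           # [1, 0, 1, 1, 0, 0]
```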
Let’s call $0 \le p_i \le 1$ the strategy of Player A if they roll $i \in \{1,\dots,6\}.$ Likewise, let’s call $0 \le q_j \le 1$ the strategy of Player B if they roll $j \in \{1,\dots,6\}.$ Define the $6\times 6$ matrix $E$ as:
\[
E_{ij} = \begin{cases}
1 &\text{if }i > j \\
0 &\text{if }i = j \\
-1 &\text{if }i < j
\end{cases}
\]This matrix tells us how much Player A wins if they call with an $i$ and the opponent rolls $j.$ Player A raises with probability $p_i$ and calls with probability $1-p_i.$ Similarly, Player B folds with probability $q_j$ and calls with probability $1-q_j.$ Since each pair $(i,j)$ is equally likely, the expected winnings for Player A are:
\[
W = \frac{1}{36} \sum_{i=1}^6 \sum_{j=1}^6 \bigl( (1-p_i)E_{ij} + p_i\left( 2(1-q_j) E_{ij} + q_j \right) \bigr)
\]The last term looks the way it does because if Player B folds, Player A wins $+1$ with certainty. If Player B calls, then we use the $E_{ij}$ matrix to determine the payoff, but with a factor of 2 because the stakes were doubled.
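The expected-winnings formula is also easy to evaluate numerically. Here is a minimal Python sketch (again my own, not part of the original solution) that builds $E$ and computes $W$ for any pair of mixed strategies $p$ and $q$; for pure strategies it returns one entry of the payoff matrix constructed in the next section.

```python
import numpy as np

# E[i, j] = +1 if Player A's roll beats Player B's, 0 on a tie, -1 otherwise.
rolls = np.arange(1, 7)
E = np.sign(rolls[:, None] - rolls[None, :]).astype(float)

def winnings(p, q):
    """Expected winnings of Player A, given raise probabilities p (Player A)
    and fold probabilities q (Player B), each a length-6 array in [0, 1]."""
    p = np.asarray(p, dtype=float)[:, None]  # column: Player A's roll
    q = np.asarray(q, dtype=float)[None, :]  # row:    Player B's roll
    per_pair = (1 - p) * E + p * (2 * (1 - q) * E + q)
    return per_pair.sum() / 36.0
```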
Pure strategy equilibria
Define $P_{ab}$ to be the winnings of Player A from the $W$ formula above when players adopt pure strategies $a,b \in \{0,1,\dots,63\}$ respectively. This yields the $64 \times 64$ matrix $P$ corresponding to the winnings for all possible combinations of pure strategies. Here is what $P$ looks like:
[Figure: heatmap of the $64\times 64$ payoff matrix $P$.]
Player A is trying to maximize $P_{ab}$ while Player B is trying to minimize it (it’s a zero-sum game, so minimizing the winnings of one player is equivalent to maximizing the losses of the other player). Put another way, Player A selects a row of the matrix $P$ (corresponding to a choice among 64 pure strategies), while Player B selects a column (again, corresponding to a choice among pure strategies). The number in the chosen cell determines how much Player A will win. A Nash Equilibrium is a cell $P_{ab}$ that satisfies
\[
P_{\bar a b} \le P_{ab} \le P_{a \bar b}\qquad \text{for all }\bar a,\bar b \in \{0,1,\dots,63\}
\]In other words, $P_{ab}$ is the largest element in its column (so Player A has no incentive to pick a different row) and it’s also the smallest element in its row (so Player B has no incentive to pick a different column).
A natural question to ask is: is there an optimal pair of pure strategies? This amounts to searching the above $P$ matrix for an element that is simultaneously the largest in its column and the smallest in its row. A quick search reveals that no such element exists. In other words, there are no pure equilibria for this game!
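For completeness, here is how one might carry out that search in Python (a sketch reusing the `pure_strategy` and `winnings` helpers from above).

```python
import numpy as np
from itertools import product

# Payoff for every pair of pure strategies (Player A picks the row,
# Player B picks the column).
P = np.array([[winnings(pure_strategy(a), pure_strategy(b))
               for b in range(64)] for a in range(64)])

# A pure Nash equilibrium would be an entry that is simultaneously the
# largest in its column and the smallest in its row.
saddle_points = [(a, b) for a, b in product(range(64), repeat=2)
                 if P[a, b] == P[:, b].max() and P[a, b] == P[a, :].min()]
print(saddle_points)  # -> [] : no pure equilibrium exists
```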
Mixed strategy equilibria
Every mixed strategy can be expressed as a convex combination of pure strategies. For example:
\[
\begin{bmatrix}1\\\tfrac{1}{3}\\0\\0\\\tfrac{3}{4}\\1\end{bmatrix} =
\frac{1}{3}\begin{bmatrix}1\\1\\0\\0\\1\\1\end{bmatrix} +
\frac{5}{12}\begin{bmatrix}1\\0\\0\\0\\1\\1\end{bmatrix} +
\frac{1}{4}\begin{bmatrix}1\\0\\0\\0\\0\\1\end{bmatrix}
\]This follows from the fact that each mixed strategy lies in the convex hull of all pure strategies. Also, since the winnings $W$ are bilinear in $p$ and $q$, the winnings for a convex combination of strategies are the convex combination of the corresponding winnings.
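If you want to convince yourself of this numerically, here is a quick spot check (a sketch reusing the `winnings` helper from above): mixing two of Player A's strategies mixes the winnings in the same proportions, for any fixed strategy of Player B.

```python
import numpy as np

rng = np.random.default_rng(0)
p1, p2, q = rng.random(6), rng.random(6), rng.random(6)  # arbitrary strategies
lam = 1 / 3
p_mix = lam * p1 + (1 - lam) * p2

lhs = winnings(p_mix, q)
rhs = lam * winnings(p1, q) + (1 - lam) * winnings(p2, q)
assert np.isclose(lhs, rhs)  # winnings are affine in p for fixed q
```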
Now let $u \in [0,1]^{64}$ and $v \in [0,1]^{64}$ be the coefficients of the convex combinations of the pure strategies for Players A and B respectively. The winnings of Player A can be written compactly as $u^\textsf{T} P v$. Player A would like to select $u$ in order to maximize this quantity, while Player B would like to select $v$ in order to minimize it. All this is subject to the constraint that $0 \le u_i \le 1$ for all $i$ and $u_1 + \dots + u_{64} = 1$, and similarly for $v$. Note that in the special case where $u$ and $v$ each contain a single “1” and the rest are “0”, we recover the case of seeking pure strategies.
Nash famously proved that every game with finitely many players and actions has a mixed strategy equilibrium. So all that’s left to do is find it! The set of Nash equilibria can be expressed as the optimal set of a linear program. In particular, the optimal strategies $u,v$ are solutions to the dual linear programs:
\[
\begin{aligned}
\underset{u,\mu}{\max}\quad& \mu \\
\text{s.t.}\quad& u\ge 0,\quad \mathbf{1}^\textsf{T} u = 1 \\
& P^\textsf{T} u \ge \mu \mathbf{1}
\end{aligned}
\qquad
\begin{aligned}
= \qquad
\underset{v,t}{\min}\quad& t \\
\text{s.t.}\quad& v\ge 0,\quad \mathbf{1}^\textsf{T} v = 1 \\
& P v \le t \mathbf{1}
\end{aligned}
\]The first program (Player A) has a unique solution, which is the mixture:
\[
p_\text{opt} =
\frac{1}{3}\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} +
\frac{2}{3}\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} =
\begin{bmatrix} \tfrac{2}{3} \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}
\]For the second program (Player B), the solution is not unique, and is rather complicated. As with any linear program, the solution set is a polytope. In this case, the polytope has 15 vertices and lives in a 7-dimensional subspace of $\mathbb{R}^{64}$ (neat, huh?). Projected back down to $\mathbb{R}^6$ where $q$ lives, the optimal solutions fill out a 2-dimensional polytope (three free coordinates tied together by a single linear constraint). Specifically,
\[
q_\text{opt} \sim
\begin{bmatrix} 1 \\ x \\ y \\ z \\ 0 \\ 0 \end{bmatrix}
\]where $0 \le x,y \le 1$ and $0 \le z \le \tfrac{2}{3}$ and $x+y+z =\tfrac{4}{3}$.
Each optimal solution has the same optimal value (i.e. the optimal value of $\mu$ or $t$ in the linear programs for $u$ and $v$). The optimal value is $\tfrac{5}{54} \approx 0.0926$. So on average, Player A comes out ahead to the tune of about 9.26 cents when both players are playing optimally.
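Here is a sketch of how the first linear program could be solved numerically with `scipy.optimize.linprog`, reusing the $64\times 64$ matrix `P` and the `pure_strategy` helper from the sketches above (variable order: the 64 mixture weights $u$, then the scalar $\mu$).

```python
import numpy as np
from scipy.optimize import linprog

n = 64
c = np.zeros(n + 1); c[-1] = -1.0                      # maximize mu
A_ub = np.hstack([-P.T, np.ones((n, 1))])              # P^T u >= mu * 1
b_ub = np.zeros(n)
A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])  # 1^T u = 1
b_eq = np.array([1.0])
bounds = [(0, 1)] * n + [(None, None)]                 # u in [0,1], mu free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
u_opt, game_value = res.x[:n], res.x[-1]
print(game_value)                            # ~ 5/54 = 0.0926

# Project the mixture weights back down to Player A's strategy in R^6:
p_opt = u_opt @ np.array([pure_strategy(a) for a in range(64)])
print(np.round(p_opt, 3))                    # ~ [0.667, 0, 0, 0, 1, 1]
```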
This alternate solution was proposed by a commenter named Chris. Same answer, but a simpler argument!
[Show Solution]
The solution proceeds much like the previous solution, but this time, we rewrite the cost as a quadratic form in the vectors $p$ and $q$:
\begin{align}
W &= \frac{1}{36} \sum_{i=1}^6 \sum_{j=1}^6 \bigl( (1-p_i)E_{ij} + p_i\left( 2(1-q_j) E_{ij} + q_j \right) \bigr) \\
&= \frac{1}{36} \left( p^\textsf{T}(E \mathbf{1}) + p^\textsf{T}(\textbf{1}\textbf{1}^\textsf{T}-2E)q \right) \\
&= p^\textsf{T} b + p^\textsf{T} A q
\end{align}where $b = \tfrac{1}{36}E\mathbf{1}$, $A = \tfrac{1}{36}\left(\textbf{1}\textbf{1}^\textsf{T}-2E\right)$, and $\textbf{1}$ is the vector of all ones. (The leftover $\tfrac{1}{36}\sum_{i,j} E_{ij}$ term vanishes because $E$ is antisymmetric.) We can now directly write linear programs for $p$ and $q$. Player A is trying to maximize $W$ under the assumption that Player B is trying to minimize $W$. In other words, we should solve:
\begin{align}
&\phantom{=}\underset{\textbf{0} \le p \le \textbf{1}}{\max}\quad\underset{\textbf{0} \le q \le \textbf{1}}{\min}\quad p^\textsf{T} b + p^\textsf{T} A q\\
&= \underset{\textbf{0} \le p \le \textbf{1}}{\max} \left( p^\textsf{T} b + \sum_{j=1}^6 \underset{0\le q_j \le 1}{\min} q_j (A^\textsf{T}p)_j \right) \\
&= \underset{\textbf{0} \le p \le \textbf{1}}{\max} \left( p^\textsf{T} b + \sum_{j=1}^6 \min\left( 0, (A^\textsf{T}p)_j \right) \right) \\
&= \left\{\begin{aligned}
\underset{p,t}{\text{maximize}}&\quad p^\textsf{T} b + t^\textsf{T} \textbf{1} \\
\text{subject to:}&\quad \textbf{0} \le p \le \textbf{1} \\
& t \le \textbf{0},\quad t \le A^\textsf{T} p
\end{aligned}\right.
\end{align}Likewise, Player B is trying to minimize $W$ under the assumption that Player A is trying to maximize $W$. Proceeding in a similar manner to the above, we obtain:
\begin{align}
&\phantom{=}\underset{\textbf{0} \le q \le \textbf{1}}{\min}\quad\underset{\textbf{0} \le p \le \textbf{1}}{\max}\quad p^\textsf{T} b + p^\textsf{T} A q\\
&= \underset{\textbf{0} \le q \le \textbf{1}}{\min} \left( \sum_{i=1}^6 \max_{0\le p_i \le 1} p_i(Aq+b)_i \right) \\
&= \underset{\textbf{0} \le q \le \textbf{1}}{\min} \left( \sum_{i=1}^6 \max\left( 0, (Aq+b)_i \right) \right) \\
&= \left\{\begin{aligned}
\underset{q,\mu}{\text{minimize}}&\quad \mu^\textsf{T} \textbf{1} \\
\text{subject to:}&\quad \textbf{0} \le q \le \textbf{1} \\
& \mu \ge \textbf{0},\quad \mu \ge A q + b
\end{aligned}\right.
\end{align}Solving these two linear programs yields the same optimal $p$ and $q$ vectors as in the previous approach, but this time our linear programs have far fewer variables! The objective values of both LPs match as well and are both equal to $\tfrac{5}{54} \approx 0.0926$, as before. In fact, we can check algebraically that the two LPs are dual to one another. Since both are clearly feasible (it is easy to check that each feasible set is compact and nonempty), strong duality holds and the two LPs must have the same optimal value. This is in essence von Neumann’s celebrated minimax theorem (the zero-sum special case of Nash’s result): the minimax and maximin formulations are equivalent!
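For reference, here is how the compact LP for Player B might be set up with `scipy.optimize.linprog` (a sketch reusing the matrix `E` defined in the earlier sketch; the decision variables are $q \in \mathbb{R}^6$ followed by $\mu \in \mathbb{R}^6$). Player A's compact LP can be set up analogously.

```python
import numpy as np
from scipy.optimize import linprog

one = np.ones(6)
A = (np.outer(one, one) - 2 * E) / 36          # A = (1 1^T - 2E) / 36
b = E @ one / 36                               # b = E 1 / 36

c = np.concatenate([np.zeros(6), np.ones(6)])  # minimize 1^T mu
A_ub = np.hstack([A, -np.eye(6)])              # A q + b <= mu
b_ub = -b
bounds = [(0, 1)] * 6 + [(0, None)] * 6        # 0 <= q <= 1, mu >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
q_opt, game_value = res.x[:6], res.fun
print(game_value)              # ~ 5/54 = 0.0926
print(np.round(q_opt, 3))      # one point of the optimal set: q_1 = 1,
                               # q_5 = q_6 = 0, middle entries summing to 4/3
```

Since Player B's optimal set is a polytope rather than a single point, different solvers (or solver settings) may return different, equally optimal, $q$ vectors.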
If you’d just like to know the answer along with a brief explanation, here is the tl;dr version:
[Show Solution]
Player A’s optimal strategy:
- If you roll $\{5,6\}$ then always raise.
- If you roll $\{2,3,4\}$ then always call.
- If you roll a $1$, then employ the following random strategy: call with probability $\tfrac{1}{3}$ and raise with probability $\tfrac{2}{3}$.
This makes sense; if you have a strong hand you should raise because you stand to win more that way. If your hand is weak you should call because you’ll likely lose and this minimizes your losses. However, if your hand is very weak, you should occasionally bluff to trick your opponent into thinking you have a strong hand!
Player B’s optimal strategy:
- If you roll $\{5,6\}$ then always call.
- If you roll a $1$ then always fold.
- If you roll $\{2,3,4\}$ then employ a random strategy where you assign probabilities of folding of $x,y,z$ to $2,3,4$ respectively, where $x+y+z = \tfrac{4}{3}$ and $z \le \tfrac{2}{3}$. One simple viable strategy is to fold with probability $\tfrac{4}{9}$ whenever $\{2,3,4\}$ is rolled.
This makes sense; if your hand is strong you should call, if your hand is weak you should fold, and if you’re in no-man’s land, either choice can be part of an optimal strategy. Why might we expect there to be multiple viable strategies? When Player A raises using the optimal strategy, they either have a strong hand with $\{5,6\}$, or a very weak hand with $\{1\}$. Given this information, if Player B rolls a $3$, it’s no better or worse than rolling a $2$ or a $4$! Among these infinitely many viable strategies, each has the same expected payoff, so it doesn’t matter which one is chosen.
When both players play optimally, Player A is expected to win $\tfrac{5}{54}$ dollars, or about 9.26 cents on average. So this is what Player A should pay to Player B beforehand in order to make the game fair.
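As a final sanity check, here is a short Monte Carlo simulation (a sketch, not part of the original post) that plays many games using Player A's optimal strategy and the simple fold-with-probability-$\tfrac{4}{9}$ strategy for Player B; the average winnings should land near $\tfrac{5}{54} \approx 0.0926$.

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([2/3, 0, 0, 0, 1, 1])      # Player A: probability of raising
q = np.array([1, 4/9, 4/9, 4/9, 0, 0])  # Player B: probability of folding

N = 1_000_000
a = rng.integers(1, 7, N)               # Player A's rolls
b = rng.integers(1, 7, N)               # Player B's rolls
a_raises = rng.random(N) < p[a - 1]
b_folds = rng.random(N) < q[b - 1]
sign = np.sign(a - b)                   # +1 if A's roll is higher, etc.

# A calls: win/lose $1 (0 on a split pot). A raises: win $1 if B folds,
# otherwise the called pot is worth double.
payoff = np.where(~a_raises, sign, np.where(b_folds, 1.0, 2.0 * sign))
print(payoff.mean())                    # ~ 0.0926
```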