Another game theory problem from the Riddler. This game is a simplified version of poker, but captures some interesting behaviors!

Baby poker is played by two players, each holding a single die in a cup. The game starts with each player anteing \$1. Then both shake their die, roll it, and look at their own die only. Player A can then either “call,” in which case both dice are shown and the player with the higher number wins the \$2 on the table, or Player A can “raise,” betting one more dollar. If A raises, then B has the option to either “call” by matching A’s second dollar, after which the higher number wins the \$4 on the table, or B can “fold,” in which case A wins but B is out only his original \$1. No other plays are made, and if the dice match, a called pot is split equally.

What is the optimal strategy for each player? Under those strategies, how much is a game of baby poker worth to Player A? In other words, how much should A pay B beforehand to make it a fair game?

If you’re interested in the derivation (and maybe learning about some game theory along the way), you can read my full solution here:

[Show Solution]

Our first task will be to figure out the payout of the game as a function of the players’ strategies. For each possible dice roll, the Player A must decide whether to call ($0$) or raise ($1$). Similarly, Player B must decide whether to call ($0$) or fold ($1$). An example strategy for Player A is:

\[

p = \begin{bmatrix} 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix}

\]This corresponds to Player A calling if they roll $\{1,2,3,4\}$ and raising if they roll $\{5,6\}.$ This is an example of a pure strategy because it completely determines what the player does as a function of the information available. We can encode each pure strategy as a six-digit binary number. There are $2^6 = 64$ possible pure strategies for each player and we can number them from $0$ to $63$ based on the decimal representation. For example, strategy $44$ corresponds to $101100$ in binary, i.e. strategy $p = \begin{bmatrix}1 & 0 & 1 & 1 & 0 & 0\end{bmatrix}$. We can also consider mixed strategies, where actions are probabilistic. An example of a mixed strategy for Player A is:

\[

p = \begin{bmatrix} 0.3 & 0 & 1 & 1 & 0 & 0 \end{bmatrix}

\]This strategy is identical to the previous example, except if Player A rolls a $1$, they will raise $30$% of the time and call $70$% of the time. Mixed strategies allows players to play randomly if they so desire! Pure strategies are special cases of mixed strategies in which the probabilities are either $0$ or $1$ (deterministic).

Let’s call $0 \le p_i \le 1$ the strategy of Player A if they roll $i \in \{1,\dots,6\}.$ Likewise, let’s call $0 \le q_j \le 1$ the strategy of Player B if they roll $j \in \{1,\dots,6\}.$ Define the $6\times 6$ matrix $E$ as:

\[

E_{ij} = \begin{cases}

1 &\text{if }i > j \\

0 &\text{if }i = j \\

-1 &\text{if }i < j
\end{cases}
\]This matrix tells us how much Player A wins if they call with an $i$ and the opponent rolls $j.$ Player A raises with probability $p_i$ and calls with probability $1-p_i.$ Similarly, Player B folds with probability $q_j$ and calls with probability $1-q_j.$ Since each pair $(i,j)$ is equally likely, the expected winnings for Player A are:
\[
W = \frac{1}{36} \sum_{i=1}^6 \sum_{j=1}^6 \bigl( (1-p_i)E_{ij} + p_i\left( 2(1-q_j) E_{ij} + q_j \right) \bigr)
\]The last term looks the way it does because if Player B folds, Player A wins $+1$ with certainty. If Player B calls, then we use the $E_{ij}$ matrix to determine the payoff, but with a factor of 2 because the stakes were doubled.

### Pure strategy equilibria

Define $P_{ab}$ to be the winnings of Player A from the $W$ formula above when players adopt pure strategies $a,b \in \{0,1,\dots,63\}$ respectively. This yields the $64 \times 64$ matrix $P$ corresponding to the winnings for all possible combinations of pure strategies. Here is what $P$ looks like:

Player A is trying to maximize $P_{ab}$ while Player B is trying to minimize it (it’s a zero-sum game, so minimizing the winnings of one player is equivalent to maximizing the losses of the other player). Put another way, Player A selects a row of the matrix $P$ (corresponding to a choice among 64 pure strategies), while Player B selects a column (again, corresponding to a choice among pure strategies). The number in the chosen cell determines how much Player A will win. A Nash Equilibrium is a cell $P_{ab}$ that satisfies

\[

P_{\bar a b} \le P_{ab} \le P_{a \bar b}\qquad \text{for all }\bar a,\bar b \in \{1,\dots,64\}

\]In other words, $P_{ab}$ is the largest element in its column (so Player A has no incentive to pick a different row) and it’s also the smallest element in its row (so Player B has no incentive to pick a different column).

A natural question is to ask is: is there an optimal pair of pure strategies? This amounts to searching the above $P$ matrix for an element that is simultaneously the largest in its column and smallest in its row. A quick search reveals that no such element exists. In other words, there are no pure equilibria for this game!

### Mixed strategy equilibria

Every mixed strategy can be expressed as a convex combination of pure strategies. For example:

\[

\begin{bmatrix}1\\\tfrac{1}{3}\\0\\0\\\tfrac{3}{4}\\1\end{bmatrix} =

\frac{1}{3}\begin{bmatrix}1\\1\\0\\0\\1\\1\end{bmatrix} +

\frac{5}{12}\begin{bmatrix}1\\0\\0\\0\\1\\1\end{bmatrix} +

\frac{1}{4}\begin{bmatrix}1\\0\\0\\0\\0\\1\end{bmatrix}

\]This follows from the fact that each mixed strategies lies in the convex hull of all pure strategies. Also, since the winnings $W$ are bilinear in $p$ and $q$, the winnings for a convex combination of strategies is the convex combination of the corresponding winnings.

Now let $u \in [0,1]^{64}$ and $v \in [0,1]^{64}$ be the coefficients of the convex combinations of the pure strategies for Players 1 and 2 respectively. The winnings of Player A can be written compactly as $u^\textsf{T} P v$. Player A would like to select $u$ in order to maximize this quantity, while Player B would like to select $v$ in order to minimize it. All this is subject to the constraint that $0 \le u_i \le 1$ for all $i$ and $u_1 + \dots + u_{64} = 1$, and similarly for $v$. Note that in the special case where $u$ and $v$ contain a single “1” and the rest is “0”, we recover the case of seeking pure strategies.

Nash famously proved that every game with finitely many players and actions has a mixed strategy equilibrium. So all that’s left to do is find it! The set of Nash equilibria can be expressed as the optimal set of a linear program. In particular, the optimal strategies $u,v$ are solutions to the dual linear programs:

\[

\begin{aligned}

\underset{u,\mu}{\max}\quad& \mu \\

\text{s.t.}\quad& u\ge 0,\quad \mathbf{1}^\textsf{T} u = 1 \\

& P^\textsf{T} u \ge \mu \mathbf{1}

\end{aligned}

\qquad

\begin{aligned}

= \qquad

\underset{v,t}{\min}\quad& t \\

\text{s.t.}\quad& v\ge 0,\quad \mathbf{1}^\textsf{T} v = 1 \\

& P v \le t \mathbf{1}

\end{aligned}

\]The first program (Player A) has a unique solution, which is the mixture:

\[

p_\text{opt} =

\frac{1}{3}\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} +

\frac{2}{3}\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} =

\begin{bmatrix} \tfrac{2}{3} \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}

\]The second program (Player B) the solution is not unique, and is rather complicated. As with any linear program, the solution set is a polytope. In this case, the polytope has 15 vertices and lives in a 7-dimensional subspace of $\mathbb{R}^{64}$ (neat, huh?). Projected back down to $\mathbb{R}^6$ where $q$ lives, every optimal solution occupies a 3-dimensional subspace. Specifically,

\[

q_\text{opt} \sim

\begin{bmatrix} 1 \\ x \\ y \\ z \\ 0 \\ 0 \end{bmatrix}

\]where $0 \le x,y \le 1$ and $0 \le z \le \tfrac{2}{3}$ and $x+y+z =\tfrac{4}{3}$.

Each optimal solution has the same optimal value (i.e. the optimal value of $\mu$ or $t$ in the linear programs for $u$ and $v$). The optimal value is $\tfrac{5}{54} \approx 0.0926$. So on average, Player A comes out ahead to the tune of about 9.26 cents when both players are playing optimally.

This alternate solution was proposed by a commenter named Chris. Same answer, but a simpler argument!

[Show Solution]

The solution proceeds much like the previous solution, but this time, we rewrite the cost as a quadratic form in the vectors $p$ and $q$:

\begin{align}

W &= \frac{1}{36} \sum_{i=1}^6 \sum_{j=1}^6 \bigl( (1-p_i)E_{ij} + p_i\left( 2(1-q_j) E_{ij} + q_j \right) \bigr) \\

&= \frac{1}{36} \left( p^\textsf{T}(E \mathbf{1}) + p^\textsf{T}(\textbf{1}\textbf{1}^\textsf{T}-2E)q \right) \\

&= p^\textsf{T} b + p^\textsf{T} A q

\end{align}where $b = \tfrac{1}{36}E\mathbf{1}$, and $A = \tfrac{1}{36}\left(\textbf{1}\textbf{1}^\textsf{T}-2E\right)$, and $\textbf{1}$ is the vector of all ones. We can now *directly* write linear programs for $p$ and $q$. The first player is trying to maximize $W$ under the assumption that the second player is trying to minimize $W$. In other words, we should solve:

\begin{align}

&\phantom{=}\underset{\textbf{0} \le p \le \textbf{1}}{\max}\quad\underset{\textbf{0} \le q \le \textbf{1}}{\min}\quad p^\textsf{T} b + p^\textsf{T} A q\\

&= \underset{\textbf{0} \le p \le \textbf{1}}{\max} \left( p^\textsf{T} b + \sum_{j=1}^6 \underset{0\le q_j \le 1}{\min} q_j (A^\textsf{T}p)_j \right) \\

&= \underset{\textbf{0} \le p \le \textbf{1}}{\max} \left( p^\textsf{T} b + \sum_{j=1}^6 \min\left( 0, (A^\textsf{T}p)_j \right) \right) \\

&= \left\{\begin{aligned}

\underset{p,t}{\text{maximize}}&\quad p^\textsf{T} b + t^\textsf{T} \textbf{1} \\

\text{subject to:}&\quad \textbf{0} \le p \le \textbf{1} \\

& t \le \textbf{0},\quad t \le A^\textsf{T} p

\end{aligned}\right.

\end{align}Likewise, the second player is trying to minimize $W$ under the assumption that the first player is trying to maximize $W$. We can proceed in a similar manner to above and we obtain:

\begin{align}

&\phantom{=}\underset{\textbf{0} \le q \le \textbf{1}}{\min}\quad\underset{\textbf{0} \le p \le \textbf{1}}{\max}\quad p^\textsf{T} b + p^\textsf{T} A q\\

&= \underset{\textbf{0} \le q \le \textbf{1}}{\min} \left( \sum_{i=1}^6 \max_{0\le p_i \le 1} p_i(Aq+b)_i \right) \\

&= \underset{\textbf{0} \le q \le \textbf{1}}{\min} \left( \sum_{i=1}^6 \max\left( 0, (Aq+b)_i \right) \right) \\

&= \left\{\begin{aligned}

\underset{q,\mu}{\text{minimize}}&\quad \mu^\textsf{T} \textbf{1} \\

\text{subject to:}&\quad \textbf{0} \le q \le \textbf{1} \\

& \mu \ge \textbf{0},\quad \mu \ge A q + b

\end{aligned}\right.

\end{align}Solving these two linear programs yields the same optimal $p$ and $q$ vectors as in the previous approach, but this time our linear programs have far fewer variables! The objective values of both LPs match as well and are both equal to $\tfrac{5}{54} \approx 0.926$, as before. In fact, we can check algebraically that the two LPs are dual to one another. Since both are clearly feasible (easy to check that the feasible set is compact and nonempty), strong duality holds and the two LPs *must have the same optimal value*. This is in essence Nash’s celebrated result; that the minimax and maximin formulations are equivalent!

If you’d just like to know the answer along with a brief explanation, here is the tl;dr version:

[Show Solution]

**Player A’s optimal strategy:**

- If you roll $\{5,6\}$ then
*always raise*. - If you roll $\{2,3,4\}$ then
*always call*. - If you roll a $1$, then employ the following random strategy: call with probability $\tfrac{1}{3}$ and raise with probability $\tfrac{2}{3}$.

This makes sense; if you have a strong hand you should raise because you stand to win more that way. If your hand is weak you should call because you’ll likely lose and this minimizes your losses. However, if your hand is very weak, you should occasionally bluff to trick your opponent into thinking you have a strong hand!

**Player B’s optimal strategy:**

- If you roll $\{5,6\}$ then
*always call*. - If you roll a $1$ then
*always fold*. - If you roll $\{2,3,4\}$ then employ a random strategy where you assign probabilities of folding of $x,y,z$ to $2,3,4$ respectively, where $x+y+z = \tfrac{4}{3}$ and $z \le \tfrac{2}{3}$. One simple viable strategy is to fold with probability $\tfrac{4}{9}$ whenever $\{2,3,4\}$ is rolled.

This makes sense; if your hand is strong you should call, if your hand is weak you should fold, and if you’re in no-man’s land, then you should go one way or the other. Why might we expect there to be multiple viable strategies? When Player A raises using the optimal strategy, he either has a strong hand with $\{4,5,6\}$, or a very weak hand with $\{1\}$. Given this information, if Player B rolls a $3$, it’s no better or worse than rolling a $2$! Among these infinitely many viable strategies, each has the same expected payoff so it doesn’t matter which one is chosen.

When both players play optimally, Player A is expected to win $\tfrac{5}{54}$ dollars, or about **9.26 cents** on average. So this is what Player A should pay to Player B beforehand in order to make the game fair.

Wonderful, thank you! I’d figured out that there was no pure-strategy solution (there’s a quick argument that an optimal pure B strategy against any pure or mixed A strategy has to be a threshold for calling–there are only 7 such strategies–and that the best threshold strategy against A’s best strategy against a threshold strategy is never the same threshold strategy). But I was out of my depth looking for mixed strategies non-numerically. Would you mind giving a brief description of how you found the solutions to the dual LPs?

BTW, the solution for A given in your tl;dr version differs from that in the extended version. I am sure the extended one (always call with a 4) is correct.

Also, I have trouble wrapping my head around the exact significance for NE to rational choice in a one-shot game like this. If B adopts any of the equilibrium strategies you identify, then A’s expectation for a strategy of the form [p 0 0 0 1 1] is 5/54 independently of p. It’s true that B’s best strategies against A include those equilibrium ones only for p = 2/3, which is what makes that A’s NE strategy. But that makes me doubt that it follows from a strategy being part of a NE that a player has more reason to follow it than non-NE strategies. If A knows for sure that B will choose a NE strategy, maximizing expectation only motivates choosing some [p 0 0 0 1 1] strategy or other. And if A doesn’t know for sure that B will go NE, it’s not clear in what sense it is optimal for A to do so. Sure, he won’t wish he’d chosen a different one if B happens to choose the best counter-strategy to his, as might happen if he picks a non-NE strategy, but why should an expectation-maximizer be especially worried about that?

Thanks for the comment; I fixed the typo.

I square it as follows: as you pointed out, A has several possible best responses to B’s Nash strategies, i.e. by choosing a different p. But if A made such a choice, B would have an incentive to deviate from its Nash strategy and improve their expected winnings. The only way A can be assured that they will win 5/54 on average is if they play p=2/3. It’s true that A doesn’t know for sure if B will play a Nash strategy, but if A plays p=2/3, then it’s in B’s best interest to play a Nash strategy as well. If B does anything else, then A will win even more. Likewise, if B plays a Nash strategy, they are guaranteed to lose no more than 5/54 on average. There is no guarantee that A will follow suit and play Nash, but if A plays anything else, B will win even more. I hope that makes sense; I apologize if I misunderstood your question!

Thanks. What you say is correct, but my point is that it’s a minimax justification rather than a maximize-expectation one, and I would think that what we mean by the “optimal” strategy for a player is the one that maximizes expectation, not the one that minimizes possible losses. Maybe there’s an argument for the principle “where expectation is not defined, minimize possible losses”, but it obviously can’t be an argument based on the expected benefit of adopting it!

I second this: “Would you mind giving a brief description of how you found the solutions to the dual LPs?”

I’d be interested in knowing what tool(s) you used (open source solver?).

Maybe that’s what’s in Baby poker revisited?

Thanks!

Hi Mark,

Baby Poker Revisited is a different problem altogether — I’m hiding it for now and will probably post at a later date.

I used the JuMP modeling language in Julia together with the CLP open source solver to solve the LPs. I used a random perturbation method to determine whether solutions were unique and to find the full set of solutions in cases where it was not.

I didn’t give more details in my post because it was already getting a little long!

If there is interest I could post code/etc.

Thanks. After posting that request and doing some web searching, I downloaded JuMP and got a single result that fell into the range you specified.

Great post! I followed a similar approach but instead formed the LP in the space of the strategy vectors. Basically, I used the function W as the objective and then maximized with respect to p and minimized with respect to q, with p and q each constrained to [0,1]^6. (The order of min and max still doesn’t matter because of duality.) I rewrote the inner problem using LP duality and then was left with a single LP. I think this can generalize a little better because it’s a smaller LP than when you enumerate all the possible pure strategies, so you can solve it quickly if you use n-sided dice. I thought it might be instructive in case anyone’s interested. You get the same answer either way, obviously. Thanks again for your post!

That’s an excellent point! I somehow convinced myself that I had to work in the augmented space because the p’s and q’s don’t sum to 1. In fact, all I had to do was remove the constraint that they sum to 1! That’s quite nice… and you’re right, much easier to generalize. Much easier to interpret as well, since I had to do all this extra work to tease out the multiple solutions in the higher dimensional space. Thanks for the comment.

I updated my solution to include your approach with the more efficient LPs. Thanks again! I learned something today.

You’re welcome! Thank you for your blog posts! You give great explanations.