In a previous post, I took a look at “baby poker”, a game in which two players each roll a six-sided die. The higher number wins, but players may elect to raise, call, or fold depending on their own number (which only they can see). In this post, I’ll take a look at the continuous version of the problem (which also appeared in a recent Riddler post!). Here is the full text of the problem:
Toddler poker is played by two players. Each is dealt a “card,” which is actually a number randomly chosen uniformly from the interval [0,1]. (It could be 0.1, or 0.9234781, or 1/π, and so on.) The game starts with each player anteing \$1. Player A can then either “call,” in which case both numbers are shown and the player with the higher number wins the \$2 on the table, or “raise,” betting one more dollar. If A raises, B then has the option to either “call” by matching A’s second dollar, after which the higher number wins the \$4 on the table, or “fold,” in which case A wins but B is out only his original \$1. No other plays are made.
What is the optimal strategy for each player? Under those strategies, how much is a game of toddler poker worth to Player A?
Extra credit: What if the value of the raise is \$k — i.e., players stand to profit \$k instead of \$2 after the raise?
Here is my derivation:
[Show Solution]
Let’s call Player A’s number $x \in [0,1]$ and Player B’s number $y \in [0,1]$. We’ll assume a general mixed strategy for each player and compute each player’s best response. This approach is similar to the one I used in the war game puzzle, but the solution is more complicated this time.
For this solution, I’ll use notation and conventions similar to those in my solution to baby poker (the discrete version of toddler poker). Define the players’ strategies as follows:
- $p(x)$: probability that Player A will raise if their number is $x$.
- $q(y)$: probability that Player B will fold if their number is $y$.
Let’s call $E(x,y)$ the payoff for Player A when both numbers are revealed:
\[
E(x,y) = \begin{cases}
1&\text{if }x > y \\
-1&\text{if }x < y
\end{cases}
\]We don't consider the case $x=y$ because that case has a zero probability of occurring. If we let $W(x,y)$ be the winnings for Player A, we can compute this quantity as we did for the discrete problem:
\[
W(x,y) = (1-p(x))E(x,y) + p(x)\bigl( k(1-q(y))E(x,y) + q(y) \bigr)
\]Of course, $A$'s expected winnings averaged over all random numbers $x,y$ is simply the integral $\bar W = \int_0^1\int_0^1 W(x,y)\,dx\,dy$.
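Before diving into the optimization, it’s easy to sanity-check this model numerically. Here is a minimal Monte Carlo sketch (my own addition; the function name is illustrative) that estimates $\bar W$ for arbitrary strategy functions $p$ and $q$:

```python
import random

def expected_winnings(p, q, k, trials=200_000, seed=0):
    """Monte Carlo estimate of A's expected winnings W-bar, where p(x) is
    A's probability of raising and q(y) is B's probability of folding."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        E = 1.0 if x > y else -1.0       # payoff when numbers are revealed
        if rng.random() < p(x):          # A raises...
            if rng.random() < q(y):      # ...and B folds: A wins $1
                total += 1.0
            else:                        # ...and B calls: stakes are $k
                total += k * E
        else:                            # A calls: stakes are $1
            total += E
    return total / trials

# If A never raises, the game is symmetric, so it should be worth about $0:
print(expected_winnings(lambda x: 0.0, lambda y: 1.0, k=2))
```

This matches the intuition that with no raising, toddler poker is a fair game.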
Player B’s best response
Let’s suppose that Player A uses strategy $p(x)$ and Player B somehow knows this in advance and gets to play the best possible response. What should this response be? For each $y$, $q(y)$ should be chosen to minimize A’s expected winnings. In other words, we should solve:
\[
q(y) = \arg\underset{q}{\min} \int_0^1 \biggl[
(1-p(x))E(x,y) + p(x)\bigl( k(1-q)E(x,y) + q \bigr) \biggr]\,dx
\]The expression on the right is linear in $q$ and the constant terms don’t affect the argmin. So we conclude that
\[
q(y) = \begin{cases}
1 & \text{if } \int_0^1 p(x)(1-kE(x,y))\,dx < 0 \\
0 & \text{otherwise}
\end{cases}
\]Splitting the integral for $x\in[0,1]$ into $x\in[0,y]$ and $x\in[y,1]$, we can substitute the definition of $E(x,y)$ and obtain:
\begin{align}
\int_0^1 p(x)(1-kE(x,y))\,dx
&= \int_0^1 p(x)dx + k\left( \int_0^y p(x) dx - \int_y^1 p(x) dx \right) \\
&= (1-k)\int_0^1 p(x)dx + 2k \int_0^y p(x)dx
\end{align}So our final formula for $q(y)$ is:
$\displaystyle
q(y) = \begin{cases}
1 & \text{if } \int_0^y p(x)dx < \frac{k-1}{2k}\int_0^1 p(x)dx \\
0 & \text{otherwise}
\end{cases}
$
This formula already tells us a lot. If $k \le 1$, the inequality never holds, so $q(y)=0$ (always call). If $k > 1$, then $0 < \frac{k-1}{2k} < \tfrac{1}{2}$. Since $\int_0^y p(x)dx$ is a nondecreasing function of $y$ no matter what $p$ is, the inequality holds below some threshold value of $y$ and fails above it. We deduce that $q(y)$ must be a threshold strategy:
\[
q(y) = \begin{cases}
1 & \text{if } 0 \le y < c \\
0 & \text{if } c < y \le 1
\end{cases}
\]where $c$ is chosen such that $\int_0^c p(x)dx = \frac{k-1}{2k}\int_0^1 p(x)dx$. So fold if your hand is bad, and call if your hand is good. Makes sense!
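Given any raise strategy $p$ for Player A, the threshold $c$ can be computed numerically from the condition $\int_0^c p(x)dx = \frac{k-1}{2k}\int_0^1 p(x)dx$. Here is a simple Riemann-sum sketch (my own; the function name is illustrative):

```python
def fold_threshold(p, k, n=100_000):
    """Best-response fold threshold c for Player B: the first y at which
    the running integral of p reaches the fraction (k-1)/(2k) of its total.
    (If p vanishes near the crossing, any y in that flat stretch works.)"""
    dx = 1.0 / n
    total = sum(p((i + 0.5) * dx) for i in range(n)) * dx
    target = (k - 1) / (2 * k) * total
    cum = 0.0
    for i in range(n):
        cum += p((i + 0.5) * dx) * dx
        if cum >= target:
            return (i + 1) * dx
    return 1.0

# Example: if A raises only when x > 0.7 and k = 2, then c must satisfy
# c - 0.7 = (1/4)(0.3), i.e. c = 0.775:
print(fold_threshold(lambda x: 1.0 if x > 0.7 else 0.0, k=2))
```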
Player A’s best response
Let’s suppose that Player B uses strategy $q(y)$ and Player A somehow knows this in advance and gets to play the best possible response. What should this response be? For each $x$, $p(x)$ should be chosen to maximize A’s expected winnings. In other words, we should solve:
\[
p(x) = \arg\underset{p}{\max} \int_0^1 \biggl[
(1-p)E(x,y) + p\bigl( k(1-q(y))E(x,y) + q(y) \bigr) \biggr]\,dy
\]The expression on the right is linear in $p$ and the constant terms don’t affect the argmax. So we conclude that
\[
p(x) = \begin{cases}
1 & \text{if } \int_0^1 \bigl( -E(x,y) + k(1-q(y))E(x,y) + q(y) \bigr) \,dy > 0 \\
0 & \text{otherwise}
\end{cases}
\]Splitting the integral up as $[0,1] = [0,x] \cup [x,1]$ as we did when we computed Player B’s best response and simplifying the algebra, we obtain a more complicated formula than last time:
$\displaystyle
p(x) = \begin{cases}
1 & \text{if }\,\, \frac{k-1}{k}(\tfrac{1}{2}-x) + \int_0^x q(y)dy < \frac{k+1}{2k}\int_0^1 q(y)dy \\
0 & \text{otherwise}
\end{cases}
$
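In case the intermediate algebra is helpful, here is my expansion of the “simplifying the algebra” step. Using $\int_0^1 E(x,y)\,dy = 2x-1$ and $\int_0^1 q(y)E(x,y)\,dy = 2\int_0^x q(y)dy - \int_0^1 q(y)dy$, we have
\begin{align}
\int_0^1 \bigl( (k-1)E(x,y) - k\,q(y)E(x,y) + q(y) \bigr)\,dy
&= (k-1)(2x-1) - k\left( 2\int_0^x q(y)dy - \int_0^1 q(y)dy \right) + \int_0^1 q(y)dy \\
&= (k-1)(2x-1) - 2k\int_0^x q(y)dy + (k+1)\int_0^1 q(y)dy
\end{align}Requiring this quantity to be positive and dividing through by $2k$ gives precisely the inequality in the formula for $p(x)$.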
This is a bit trickier than last time because the left-hand side of the inequality isn’t a simple increasing function in $x$. It contains both an increasing and a decreasing part! So $A$’s best response might be more complicated than a simple threshold strategy. However, we can leverage the fact that we have a formula for $q(y)$…
Combining both best responses
Substituting Player B’s threshold response into the formula for Player A’s best response, we obtain:
\[
p(x) = \begin{cases}
1 & \text{if }\,\, \frac{k-1}{k}(\tfrac{1}{2}-x) + \min(x,c) < \frac{k+1}{2k}c \\
0 & \text{otherwise}
\end{cases}
\]Working out the cases $ x < c $ and $ x > c $ separately, we deduce that:
\[
p(x) = \begin{cases}
1 & \text{if }\,\, 0 < x < \frac{k+1}{2}c-\frac{k-1}{2} \\
0 & \text{if }\,\, \frac{k+1}{2}c-\frac{k-1}{2} < x < \frac{c+1}{2} \\
1 & \text{if }\,\, \frac{c+1}{2} < x < 1
\end{cases}
\]So Player A still plays a threshold strategy... but with two thresholds rather than one! We can now solve for $c$ by substituting $p(x)$ back into the formula $\int_0^c p(x)dx = \frac{k-1}{2k}\int_0^1 p(x)dx$ we derived earlier. This is relatively easy to do because $c$ always lies in the middle portion of the interval, i.e. $p(c)=0$. The result is:
\[
\left( \tfrac{k+1}{2}c-\tfrac{k-1}{2} \right) = \tfrac{k-1}{2k}\left[ \left(\tfrac{k+1}{2}c-\tfrac{k-1}{2}\right) + \left(1-\tfrac{c+1}{2}\right)\right]
\]After simplifications, we obtain:
\[
c = \frac{(k-1)(k+2)}{k(k+3)}
\]We can go back and compute the expected winnings of Player A by integrating $W(x,y)$ using the optimal policies we derived. Upon doing this, we find that the expected winnings for Player A are:
\[
\bar W = \frac{k-1}{k(k+3)}
\]
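To double-check the whole derivation, here is a short Monte Carlo sketch (my own) that plays the game with the derived threshold strategies and compares Player A’s average winnings against the closed form:

```python
import random

def simulate(k, trials=500_000, seed=1):
    """Play toddler poker with the derived optimal thresholds and return
    Player A's average winnings per game."""
    c = (k - 1) * (k + 2) / (k * (k + 3))   # B folds below c
    lo = (k + 1) / 2 * c - (k - 1) / 2      # A bluffs (raises) below lo
    hi = (c + 1) / 2                        # A raises above hi
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        E = 1.0 if x > y else -1.0
        if x < lo or x > hi:                    # A raises
            total += 1.0 if y < c else k * E    # B folds / B calls
        else:                                   # A calls
            total += E
    return total / trials

for k in (2, 3, 5):
    print(k, simulate(k), (k - 1) / (k * (k + 3)))
```

For $k=2$, both numbers come out near \$0.10, in agreement with the formula.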
If you’d like the tl;dr instead:
[Show Solution]
Optimal policies
The optimal policy for Player A is:
\[
\text{Player A: } \begin{cases}
\text{raise} & \text{if } 0 < x < \frac{k-1}{k(k+3)} \\
\text{call} & \text{if } \frac{k-1}{k(k+3)} < x < \frac{k^2+2k-1}{k(k+3)} \\
\text{raise} & \text{if } \frac{k^2+2k-1}{k(k+3)} < x < 1
\end{cases}
\]The optimal policy for Player B is:
\[
\text{Player B: } \begin{cases}
\text{fold} & \text{if } 0 < y < \frac{(k-1)(k+2)}{k(k+3)} \\
\text{call} & \text{if } \frac{(k-1)(k+2)}{k(k+3)} < y < 1
\end{cases}
\]The expected payout for Player A is given by the expression:
\[
\text{Expected payout for Player A:}\quad \frac{k-1}{k(k+3)}\quad\text{dollars}.
\]Here are plots that show the optimal strategies:
For the case $k=2$, Player A should raise if $x>0.7$ or if $x<0.1$ (a bluff). Meanwhile, Player B should call if $y > 0.4$ and fold otherwise. On average, Player A wins \$0.10 per game. A fascinating twist is that as $k$ increases, Player A bluffs more aggressively at first, but then bluffs less and less, with the bluffing threshold shrinking toward zero.
The case $k=3$ is special; it corresponds to when Player A is most aggressive (bluffing happens whenever $x < \tfrac{1}{9} \approx 0.111$). This also coincides with when the game is most advantageous to Player A; the expected winnings are also \$0.111. Put another way, if Player A gets to choose the value of the raise, they should choose \$3! When $k>3$, Player A becomes increasingly conservative as $k$ grows, raising only very rarely (when a win is all but assured), and the expected payout for Player A decreases monotonically, converging to \$0 as $k\to\infty$.
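To see why $k=3$ is the sweet spot, note that
\[
\frac{d}{dk}\left[\frac{k-1}{k(k+3)}\right] = \frac{k(k+3)-(k-1)(2k+3)}{k^2(k+3)^2} = \frac{-(k-3)(k+1)}{k^2(k+3)^2},
\]which is positive for $1<k<3$ and negative for $k>3$, so Player A’s expected winnings are maximized at $k=3$, where they equal $\tfrac{1}{9} \approx \$0.111$. The bluffing threshold $\frac{k-1}{k(k+3)}$ is the very same expression, which explains why maximal bluffing and maximal value occur at the same $k$.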
Hi Laurent,
I’m the guy who submitted the “baby poker” problem to the Riddler. I think that the continuous “toddler poker” version is a nice, somewhat more elegant version. I just found out (I probably should have known earlier) that “toddler poker” is the same version of poker that Von Neumann solves in his field-starting book “Theory of Games and Economic Behavior”.
Back to your solution, I think that your result for the game expectation is incorrect, as it should be (k-1)/(k*(k+3)) instead of (k-1)*(k+2)/(k*(k+3)). The players’ policy thresholds are however correct. Interestingly, this means that, as k->infinity, the value of the game goes to 0. Even more interestingly, if player A is allowed to choose how much to raise, then the optimum is k=3, which corresponds exactly to the pot size (two) at the time of the raise. The value of toddler poker is then 1/10, which is close to the 5/54 in the discrete version. One last observation is that toddler poker is, in a certain sense, simpler as it has pure strategies while baby poker has mixed strategies.
Thank you for your fun problem solving blog & Best Regards,
Dan
Thanks Dan! It’s a wonderful problem and I really enjoyed working on it. You’re absolutely right; I found the bug in my code and corrected my post. Should be right now. I’m glad it didn’t affect any of the plots I made!
It would actually be interesting to see K=.35, .5, .75 as these are common raise sizes in real games.
I think that would be much more interesting than any k > 10, which would never happen.
Thanks again for sharing
Sorry just realized I was thinking about different scenario 🙂
Great analysis.
Is it me or are the charts off in the x-axis? It looks like the first fifth of the axis is shorter than the others.
The tick marks are correct. I started the x-axis at k=1 because if k<1 it's not a raise at all 🙂
My conclusion when solving this, was that when B has a card between A’s two threshold values, there’s no incentive in either direction to call or fold. B can always call in between the thresholds, always fold, flip a coin, whatever. In short, any time A raises and B has a card in the middle, the expected value of a call is equal to the expected value of a fold. Intuitively it makes sense to me that there would be no unique threshold value for B, because we know A can’t have a card in the middle, so there’s nothing to distinguish any value in the middle from any other for B.
Am I missing some subtlety, and you’re arguing that B must actually pick this particular threshold value?
You’re correct that if A is playing optimally, then B may change their strategy as you said and there will be no detrimental effect. The expected winnings for A will still be \$0.10. In other words, the best response for B isn’t unique.
However, you have to look at the problem both ways. If B does alter their strategy, then A could respond by making their own adjustments and win more than \$0.10! Here is an example: suppose B changes their threshold to 0.5. As stated previously, A’s expected winnings are still \$0.10. But if A then changes their strategy so that they bluff when $x < 0.2$ and keep the rest of the strategy intact, A's expected winnings jump to \$0.12. What the Nash-optimal strategy tells you is that if A plays optimally, then they are guaranteed to win \$0.10 on average no matter what B does. Similarly, if B plays optimally, then they are guaranteed never to lose more than \$0.10 on average no matter what A does. While it's true that A's best response to B might not be unique, and B's best response to A might not be unique, this problem has a unique pair of strategies such that A and B's strategies are best responses to one another.
Since A plays before B, I assumed a priori that B makes the best response to A’s strategy. So in that line of thinking, if A is raising all the way up to .2 (and above .7), then B can respond by calling all the way down to .125.
But of course you’re right about the Nash equilibrium. Because of the A-before-B nature of the problem it hadn’t occurred to me to consider the hypothetical of B playing a fixed strategy that A can respond to.
In the solution to toddler poker with a payout of $2, the optimal strategy is:
Optimal strategy:
Numbers 0 – 0.1 = Bluff
Numbers 0.1 – 0.7 = Call
Numbers 0.7 – 1= Raise
How can this possibly be better than the following strategy?
Alternative strategy:
Numbers 0 – 0.6 = Call
Numbers 0.6 – 0.7 = Bluff
Numbers 0.7 – 1 = Raise
I just don’t understand the intuition. With the alternative strategy, you have a higher chance of winning when player B calls your bluff. Otherwise, they are very similar…
Great question! Just so everybody is clear — Player A only has two choices: call or raise. So “bluff” is the same as “raise”. The reason I give them different names is to distinguish the case where you’re raising with a strong hand vs raising with a weak hand.
In your alternative strategy, instead of raising with 0 – 0.1 and 0.7 – 1 as in my strategy, you propose to raise with 0.6 – 1. You’ll be raising just as often with your strategy as with mine (40% of the time). And we can calculate the expected winnings for both strategies:
With your strategy, assuming Player B plays the same optimal strategy I mentioned in my solution, Player A raises 40% of the time and wins \$0.80 on average when this happens. Player A also calls 60% of the time and loses \$0.40 on average when this happens. The net expected winnings for your strategy is therefore \$0.08.
With my strategy, again assuming the same strategy for Player B, Player A also raises 40% of the time but only wins \$0.55 on average when this happens. So you’re right! Your strategy does win more on average when Player A raises… But when Player A calls (60% of the time), they only lose \$0.20 on average, which is less than the \$0.40 lost with your strategy. The net expected winnings for my strategy ends up being \$0.10, which is a bit higher than with your strategy.
Finding the optimal strategy isn’t just about maximizing your winnings when you win. It must be balanced by also minimizing your losses when you lose!
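If you’d like to check these numbers yourself, here’s a quick simulation sketch (for $k=2$, assuming Player B folds below 0.4 as in my solution; the function name is my own):

```python
import random

def avg_winnings(raise_zones, trials=400_000, seed=2):
    """Player A's average winnings with k = 2 when A raises exactly when
    x lies in one of the given intervals, against B's optimal strategy
    (fold below 0.4, call otherwise)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        E = 1.0 if x > y else -1.0
        if any(a <= x <= b for a, b in raise_zones):
            total += 1.0 if y < 0.4 else 2 * E   # B folds / B calls
        else:
            total += E
    return total / trials

print(avg_winnings([(0.0, 0.1), (0.7, 1.0)]))  # my strategy: about 0.10
print(avg_winnings([(0.6, 1.0)]))              # your strategy: about 0.08
```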
If B’s strategy is based on its own number and not on A’s behavior, I don’t see how “bluffing” could matter. B will fold or call based on the 0.4 threshold whether A raises or not, correct? I’d understand if B’s strategy was something like “call with 0.4 or higher, or 0.5 or higher if A raised.” (I’m also somewhat surprised B’s strategy can’t be improved in such a manner).
Is the simple existence of bluffing “baked in” to B’s strategy such that the threshold would be lower (I think?) than 0.4 if B knew that A wouldn’t bluff (i.e., wouldn’t raise if below its one and only threshold)? If B adopted a cautionary response to raising (or an aggressive response to calling), would that just mean A ought to bluff more and it balances out?
B only gets to play if A raises. When A calls, then the game ends right away. When A raises, B has the choice of either calling or folding.
B could adapt their strategy and improve their expected winnings if they knew ahead of time that A was using a threshold policy, i.e. not bluffing. Likewise, if B didn’t use the optimal 0.4 threshold strategy, then A could also adapt their strategy to improve their expected winnings (by bluffing more aggressively, for example). The Nash equilibrium strategy is optimal in the sense that no player could improve their strategy even if they knew the opponent’s strategy. Similarly, either player’s expected winnings get worse if they deviate from their strategy while the other player keeps their strategy fixed to the Nash optimal.