Toddler poker

In a previous post, I took a look at “baby poker”, a game in which two players each roll a six-sided die. The higher number wins, but players may elect to raise, call, or fold depending on their number (which only they can see). In this post, I’ll take a look at the continuous version of the problem, which also appeared in a recent Riddler post! Here is the full text of the problem:

Toddler poker is played by two players. Each is dealt a “card,” which is actually a number randomly chosen uniformly from the interval [0,1]. (It could be 0.1, or 0.9234781, or 1/π, and so on.) The game starts with each player anteing $1. Player A can then either “call,” in which case both numbers are shown and the player with the higher number wins the $2 on the table, or “raise,” betting one more dollar. If A raises, B then has the option to either “call” by matching A’s second dollar, after which the higher number wins the $4 on the table, or “fold,” in which case A wins but B is out only his original $1. No other plays are made.

What is the optimal strategy for each player? Under those strategies, how much is a game of toddler poker worth to Player A?

Extra credit: What if the value of the raise is $k, i.e., players stand to profit $k instead of $2 after the raise?
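To make the rules concrete, here is a minimal Python sketch of a single hand plus a Monte Carlo estimate of A's expected winnings. It assumes both players use threshold strategies: A raises when x < a (a bluff) or x > b and otherwise calls, and B calls a raise when y > c. The function names and the default thresholds (a = 0.1, b = 0.7, c = 0.4, the equilibrium values quoted in the comments) are illustrative assumptions, not the post's own code.

```python
import random

def play_hand(x, y, a=0.1, b=0.7, c=0.4, k=2):
    """Net dollars won by A in one hand of toddler poker.

    Assumed threshold strategies (illustrative defaults):
      A raises if x < a (bluff) or x > b (value), otherwise calls.
      B calls a raise if y > c, otherwise folds.
    k is the winner's profit after a called raise (k = 2 in the original game).
    """
    if a <= x <= b:              # A calls: showdown for the $2 pot
        return 1 if x > y else -1
    if y <= c:                   # A raises and B folds: A wins B's $1 ante
        return 1
    return k if x > y else -k   # A raises and B calls: showdown after the raise

def estimate_value(n=200_000, seed=1, **strategy):
    """Monte Carlo estimate of A's expected winnings per hand."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()   # each card is uniform on [0, 1)
        total += play_hand(x, y, **strategy)
    return total / n

print(f"estimated value to A: {estimate_value():.3f}")
```

With the default thresholds the estimate should come out close to $0.10 per hand; passing other values of a, b, c or k lets you try other strategies.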

Here is my derivation:
[Show Solution]

If you’d like the tl;dr instead:
[Show Solution]
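One way to cross-check the equilibrium numbers is to solve the three indifference conditions of a threshold strategy exactly. This is a sketch under the strategy shape described in the docstring, using closed forms I obtained by solving those conditions (valid for k > 1); it is not a reproduction of the derivation itself, and the function names are hypothetical.

```python
from fractions import Fraction as F

def equilibrium(k=2):
    """Threshold equilibrium (a, b, c) when a called raise is worth k to the winner.

    Assumed strategy shape: A raises if x < a (bluff) or x > b (value), else calls;
    B calls a raise if y > c, else folds. The closed forms solve:
      B indifferent at y = c:  k*a - k*(1 - b) = -(a + 1 - b)
      A indifferent at x = a:  c - k*(1 - c) = 2*a - 1
      A indifferent at x = b:  c + k*(b - c) - k*(1 - b) = 2*b - 1
    """
    k = F(k)
    a = (k - 1) / (k * (k + 3))
    c = (k - 1) * (k + 2) / (k * (k + 3))
    b = (1 + c) / 2
    return a, b, c

def residuals(k=2):
    """Left side minus right side of each indifference condition (all should be 0)."""
    k = F(k)
    a, b, c = equilibrium(k)
    return (
        k * a - k * (1 - b) + (a + 1 - b),
        c - k * (1 - c) - (2 * a - 1),
        c + k * (b - c) - k * (1 - b) - (2 * b - 1),
    )

a, b, c = equilibrium(2)
print(f"A bluffs below {a}, raises above {b}; B calls above {c}")
```

For k = 2 this gives a = 1/10, b = 7/10 and c = 2/5, and residuals(k) returns three zeros for any k > 1.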

13 thoughts on “Toddler poker”

  1. Hi Laurent,

    I’m the guy who submitted the “baby poker” problem to the Riddler. I think the continuous “toddler poker” version is a nice, somewhat more elegant variant. I just found out (I probably should have known earlier) that “toddler poker” is the same version of poker that von Neumann solves in his field-founding book with Oskar Morgenstern, “Theory of Games and Economic Behavior”.

    Back to your solution: I think your result for the game’s expected value is incorrect; it should be (k-1)/(k*(k+3)) instead of (k-1)*(k+2)/(k*(k+3)). The players’ policy thresholds, however, are correct. Interestingly, this means that as k grows to infinity, the value of the game goes to 0. Even more interestingly, if player A is allowed to choose how much to raise, then the optimum is k=3, which corresponds to raising exactly the pot size ($2) at the time of the raise, for a value of 2/18 = 1/9. In the standard game (k=2), the value of toddler poker is 1/10, which is close to the 5/54 of the discrete version. One last observation is that toddler poker is, in a certain sense, simpler, as it has a pure-strategy solution while baby poker requires mixed strategies.

    Thank you for your fun problem solving blog & Best Regards,
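Dan's corrected formula can be checked with exact rational arithmetic: plug the equilibrium thresholds into A's expected payoff and integrate over both cards. The sketch assumes the closed-form thresholds a = (k-1)/(k(k+3)), c = (k-1)(k+2)/(k(k+3)) and b = (1+c)/2 obtained from the indifference conditions; the integral is exact because, for a fixed strategy, A's payoff is linear in x between the breakpoints a, b and c. The helper names are hypothetical.

```python
from fractions import Fraction as F

def equilibrium(k):
    """Assumed closed-form thresholds: A bluffs below a and value-raises above b;
    B calls a raise above c. Valid for k > 1."""
    k = F(k)
    a = (k - 1) / (k * (k + 3))
    c = (k - 1) * (k + 2) / (k * (k + 3))
    return a, (1 + c) / 2, c

def game_value(a, b, c, k):
    """Exact expected winnings of A, averaged over both uniform cards."""
    def payoff(x):  # A's expected payoff holding card x, averaged over B's card y
        if a < x < b:                                   # A calls: +/-1 showdown
            return 2 * x - 1
        # A raises: +1 if B folds (y < c), +/-k showdown if B calls
        return c + k * max(0, x - c) - k * (1 - max(x, c))
    # payoff is linear in x between breakpoints, so the midpoint rule is exact
    pts = sorted({F(0), a, b, c, F(1)})
    return sum((hi - lo) * payoff((lo + hi) / 2) for lo, hi in zip(pts, pts[1:]))

def dan_value(k):
    """Dan's corrected value of the game to A: (k-1) / (k(k+3))."""
    k = F(k)
    return (k - 1) / (k * (k + 3))

for k in [2, 3, 4, 10]:
    v = game_value(*equilibrium(k), k)
    print(f"k={k}: integrated value {v}, formula agrees: {v == dan_value(k)}")

# Search raise sizes k = 1.1, 1.2, ..., 10.0 for the one that is best for A
best_k = max((F(n, 10) for n in range(11, 101)), key=dan_value)
print(f"best k = {best_k}, value = {dan_value(best_k)}")
```

Every k in the loop should report agreement, and the grid search should pick k = 3 with value 1/9. A side effect visible in the code: the value of the game equals A's bluffing threshold a.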

    1. Thanks Dan! It’s a wonderful problem and I really enjoyed working on it. You’re absolutely right; I found the bug in my code and corrected my post. It should be right now. I’m glad it didn’t affect any of the plots I made!

      1. It would actually be interesting to see K = 0.35, 0.5, and 0.75, as these are common raise sizes in real games.
        I think that is much more interesting than anything above 10, which would never happen.

        Thanks again for sharing
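Dan's formula only makes sense for k > 1, so K = 0.35, 0.5 and 0.75 presumably refer to raise sizes as a fraction of the pot. Under that assumption (mine, not the commenter's), a raise of r times the $2 pot means A puts in 2r more dollars, so the winner of a called raise profits k = 1 + 2r: r = 0.5 is the original game (k = 2) and r = 1 is the pot-sized raise (k = 3). The sketch plugs these into (k-1)/(k(k+3)); the mapping function is hypothetical.

```python
from fractions import Fraction as F

def dan_value(k):
    """Value of the game to A when a called raise is worth k (valid for k > 1)."""
    k = F(k)
    return (k - 1) / (k * (k + 3))

def k_for_pot_fraction(r):
    """Hypothetical mapping: A raises r times the $2 pot, so a called raise pays 1 + 2r."""
    return 1 + 2 * F(r)

for r in ["0.35", "0.5", "0.75", "1"]:
    k = k_for_pot_fraction(r)
    print(f"raise = {r} x pot -> k = {float(k):.2f}, value to A = {float(dan_value(k)):.4f}")
```

Under this reading, the values come out to roughly $0.088, $0.100, $0.109 and $0.111 per hand for r = 0.35, 0.5, 0.75 and 1.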

  2. Great analysis.
    Is it just me, or are the charts off on the x-axis? It looks like the first fifth is shorter than the others.

    My conclusion when solving this was that when B has a card between A’s two threshold values, there’s no incentive in either direction to call or fold. B can always call in between the thresholds, always fold, flip a coin, whatever. In short, any time A raises and B has a card in the middle, the expected value of a call is equal to the expected value of a fold. Intuitively, it makes sense to me that there would be no unique threshold value for B, because we know A can’t be raising with a card in the middle, so there’s nothing to distinguish any value in the middle from any other for B.

    Am I missing some subtlety, and you’re arguing that B must actually pick this particular threshold value?

    1. You’re correct that if A is playing optimally, then B may change their strategy as you said and there will be no detrimental effect. The expected winnings for A will still be $0.10. In other words, the best response for B isn’t unique.

      However, you have to look at the problem both ways. If B does alter their strategy, then A could respond by making their own adjustments and win more than $0.10! Here is an example: Suppose B changes their threshold to 0.5. As stated previously, A’s expected winnings are still $0.10. But if A now changes their strategy so that they bluff when x < 0.2, keeping the rest of the strategy intact, then A's expected winnings jump to $0.12. What the Nash-optimal strategy tells you is that if A plays optimally, then they are guaranteed to win at least $0.10 on average no matter what B does. Similarly, if B plays optimally, then they are guaranteed never to lose more than $0.10 on average no matter what A does. While it's true that A's best response to B might not be unique, and B's best response to A might not be unique, this problem has a unique pair of strategies such that A and B's strategies are best responses to one another.
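Both claims in this reply can be checked exactly. The sketch assumes the same threshold model (A raises when x < a or x > b, B calls a raise when y > c) and integrates A's piecewise-linear payoff exactly with fractions; game_value is a hypothetical helper name.

```python
from fractions import Fraction as F

def game_value(a, b, c, k=2):
    """Exact expected winnings of A when A raises on x < a or x > b and B calls on y > c."""
    def payoff(x):  # A's expected payoff holding x, averaged over B's uniform y
        if a < x < b:
            return 2 * x - 1                             # A calls: +/-1 showdown
        return c + k * max(0, x - c) - k * (1 - max(x, c))  # A raises
    # payoff is linear between breakpoints, so the midpoint rule is exact
    pts = sorted({F(0), a, b, c, F(1)})
    return sum((hi - lo) * payoff((lo + hi) / 2) for lo, hi in zip(pts, pts[1:]))

A_EQ = (F(1, 10), F(7, 10))   # A bluffs below 0.1, value-raises above 0.7
B_EQ = F(2, 5)                # B calls a raise above 0.4

# Any B threshold between 0.1 and 0.7 leaves A at exactly 1/10 ...
for c in [F(1, 10), F(2, 5), F(1, 2), F(7, 10)]:
    print("B calls above", c, "->", game_value(*A_EQ, c))

# ... but against c = 1/2, bluffing up to 0.2 pays A more,
print("A bluffs below 0.2 vs c = 1/2:", game_value(F(1, 5), F(7, 10), F(1, 2)))
# while the same deviation against B's equilibrium threshold pays A less.
print("A bluffs below 0.2 vs c = 2/5:", game_value(F(1, 5), F(7, 10), B_EQ))
```

B thresholds anywhere from 0.1 to 0.7 all give A exactly 1/10; the bluff-to-0.2 deviation earns 3/25 = $0.12 against c = 0.5 but only 9/100 = $0.09 against B's equilibrium c = 0.4, which is the guarantee described above in action.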

      1. Since A plays before B, I assumed a priori that B makes the best response to A’s strategy. So in that line of thinking, if A is raising all the way up to .2 (and above .7), then B can respond by calling all the way down to .125.

        But of course you’re right about the Nash equilibrium. Because of the A-before-B nature of the problem it hadn’t occurred to me to consider the hypothetical of B playing a fixed strategy that A can respond to.
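The 0.125 figure can be reproduced with a short best-response computation. This sketch assumes A raises exactly when x < a or x > b and computes B's gain from calling rather than folding as a function of B's card y; since that gain never decreases in y, bisection finds the call threshold. The function names are hypothetical.

```python
def call_minus_fold(y, a, b, k=2):
    """B's gain from calling rather than folding with hand y, given that A
    raises exactly when x < a or x > b (x uniform on [0, 1])."""
    beats = min(y, a) + max(0.0, y - b)          # raising hands that y beats
    loses = (a - min(y, a)) + (1 - max(y, b))    # raising hands that beat y
    call = k * beats - k * loses                 # showdown for +/-k
    fold = -(a + 1 - b)                          # B loses the $1 ante to every raise
    return call - fold

def best_call_threshold(a, b, k=2, tol=1e-12):
    """Smallest y at which calling beats folding, by bisection
    (the gain is nondecreasing in y)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if call_minus_fold(mid, a, b, k) > 0:
            hi = mid
        else:
            lo = mid
    return hi

print(best_call_threshold(0.2, 0.7))  # A bluffs below 0.2, value-raises above 0.7
```

With a = 0.2 and b = 0.7 this returns a value within about 1e-12 of 0.125. Here calling strictly beats folding for every y between 0.2 and 0.7, so B's best response is unique, unlike at the equilibrium.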

  4. In the solution to toddler poker with a payout of $2, the optimal strategy is:

    Optimal strategy:
    Numbers 0 – 0.1 = Bluff
    Numbers 0.1 – 0.7 = Call
    Numbers 0.7 – 1 = Raise

    How can this possibly be better than the following strategy?

    Alternative strategy:
    Numbers 0 – 0.6 = Call
    Numbers 0.6 – 0.7 = Bluff
    Numbers 0.7 – 1 = Raise

    I just don’t understand the intuition. With the alternative strategy, you have a higher chance of winning when player B calls your bluff. Otherwise, they are very similar…

    1. Great question! Just so everybody is clear — Player A only has two choices: call or raise. So “bluff” is the same as “raise”. The reason I give them different names is to distinguish the case where you’re raising with a strong hand vs raising with a weak hand.

      In your alternative strategy, instead of raising with 0 – 0.1 and 0.7 – 1 as in my strategy, you propose to raise with 0.6 – 1. You’ll be raising just as often with your strategy as with mine (40% of the time). And we can calculate the expected winnings for both strategies:

      With your strategy, assuming Player B plays the same optimal strategy I mentioned in my solution, Player A raises 40% of the time and wins $0.80 on average when this happens. Player A also calls 60% of the time and loses $0.40 on average when this happens. The net expected winnings for your strategy are therefore 0.4 × $0.80 - 0.6 × $0.40 = $0.08.

      With my strategy, again assuming the same strategy for Player B, Player A also raises 40% of the time but only wins $0.55 on average when this happens. So you’re right! Your strategy does win more on average when Player A raises… But when Player A calls (60% of the time), they only lose $0.20 on average, a smaller loss than with your strategy. The net expected winnings for my strategy end up being 0.4 × $0.55 - 0.6 × $0.20 = $0.10, which is a bit higher than with your strategy.

      Finding the optimal strategy isn’t just about maximizing your winnings when you win. It must be balanced by also minimizing your losses when you lose!
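The arithmetic in this reply can be reproduced exactly. The sketch assumes B keeps the 0.4 calling threshold and splits A's expected winnings into the part earned when raising and the part earned when calling; split_value and report are hypothetical names.

```python
from fractions import Fraction as F

def split_value(a, b, c=F(2, 5), k=2):
    """A's expected winnings split into (raise part, call part).
    A calls when a < x < b and raises otherwise; B calls a raise when y > c."""
    pts = sorted({F(0), a, b, c, F(1)})
    raise_ev = call_ev = F(0)
    for lo, hi in zip(pts, pts[1:]):
        x, w = (lo + hi) / 2, hi - lo   # payoff is linear on each piece: midpoint is exact
        if a < x < b:
            call_ev += w * (2 * x - 1)
        else:
            raise_ev += w * (c + k * max(0, x - c) - k * (1 - max(x, c)))
    return raise_ev, call_ev

def report(name, a, b):
    raise_ev, call_ev = split_value(a, b)
    p_raise = a + (1 - b)
    print(f"{name}: avg when raising {float(raise_ev / p_raise):+.2f}, "
          f"avg when calling {float(call_ev / (1 - p_raise)):+.2f}, "
          f"total {float(raise_ev + call_ev):+.2f}")

report("post's strategy (bluff < 0.1, raise > 0.7)", F(1, 10), F(7, 10))
report("alternative (raise > 0.6, no bluff)", F(0), F(6, 10))
```

The post's strategy should show +0.55 / -0.20 / +0.10 and the alternative +0.80 / -0.40 / +0.08, matching the figures in the reply.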

  5. If B’s strategy is based on its own number and not on A’s behavior, I don’t see how “bluffing” could matter. B will fold or call based on the 0.4 threshold whether A raises or not, correct? I’d understand if B’s strategy was something like “call with 0.4 or higher, or 0.5 or higher if A raised.” (I’m also somewhat surprised B’s strategy can’t be improved in such a manner).

    Is the simple existence of bluffing “baked in” to B’s strategy such that the threshold would be lower (I think?) than 0.4 if B knew that A wouldn’t bluff (i.e., wouldn’t raise if below its one and only threshold)? If B adopted a cautionary response to raising (or an aggressive response to calling), would that just mean A ought to bluff more and it balances out?

    1. B only gets to play if A raises. When A calls, then the game ends right away. When A raises, B has the choice of either calling or folding.

      B could adapt their strategy and improve their expected winnings if they knew ahead of time that A was using a threshold policy, i.e., not bluffing. Likewise, if B didn’t use the optimal 0.4 threshold strategy, then A could also adapt their strategy to improve their expected winnings (by bluffing more aggressively, for example). The Nash equilibrium strategy is optimal in the sense that no player could improve their expected winnings even if they knew the opponent’s strategy. Put another way, neither player can increase their expected winnings by deviating from their strategy while the other player sticks to the Nash-optimal one.
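On the question of whether bluffing is “baked in” to B's 0.4 threshold: a quick best-response computation suggests the threshold would move up, not down, if B knew A never bluffs. The sketch assumes A raises exactly when x < a or x > b, with a = 0 meaning no bluffing; the helper names are hypothetical.

```python
from fractions import Fraction as F

def call_minus_fold(y, a, b, k=2):
    """B's gain from calling rather than folding with hand y, given that A
    raises exactly when x < a or x > b (x uniform on [0, 1])."""
    beats = min(y, a) + max(0, y - b)            # raising hands that y beats
    loses = (a - min(y, a)) + (1 - max(y, b))    # raising hands that beat y
    return k * (beats - loses) + (a + 1 - b)     # call result minus fold result

def best_call_threshold(a, b, k=2, steps=60):
    """Bisection for the smallest y at which calling beats folding."""
    lo, hi = F(0), F(1)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if call_minus_fold(mid, a, b, k) > 0:
            hi = mid
        else:
            lo = mid
    return hi

# A never bluffs and raises only with x > 0.7
print(float(best_call_threshold(F(0), F(7, 10))))

# A bluffs below 0.1 as at the equilibrium: B's gain is exactly 0 on [0.1, 0.7]
print({call_minus_fold(F(n, 100), F(1, 10), F(7, 10)) for n in range(10, 71)})
```

Against a non-bluffing A who raises above 0.7, B's best call threshold comes out at 0.775. Against the equilibrium bluffing range (a = 0.1), B's gain from calling is exactly zero for every card between 0.1 and 0.7, which is why 0.4 is only one of many best responses there.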
