This week’s Riddler Classic is about the viral word game Wordle.

Find a strategy that maximizes your probability of winning Wordle in at most three guesses.

Here is my solution:


There have been a variety of solutions proposed on the internet, most of which involve some form of *greedy heuristic*, which means that we only look to optimize the state of the game immediately after we make our next move. In essence, this is like playing chess but only looking one move ahead. The reason for this concession is that the search space grows exponentially, and it would be computationally intractable to look farther ahead.

This week’s Riddler, however, is a bit different. If our goal is to maximize the probability of winning in at most 3 guesses, then we only need to devise the best *short-term* strategy. So, there is hope that an optimal strategy could be found… And indeed, we will find it! Let $W$ be the set of admissible guess words (contains 12,972 words), and let $S$ be the set of possible solution words (contains 2,315 words). Let us define the following value function:

\[
V_k(T) = \left\{ \begin{array}{l}\text{Maximum probability that we win in at most }k\text{ turns,} \\
\text{assuming the solution is contained in the set }T \\
\text{and we can use any guesses from the set }W.
\end{array}\right\}
\]

Note that the Riddler problem is asking us to find $V_3(S)$. Our general approach will be to use dynamic programming, working our way up from $k=1$ to $k=3$. Let’s start with the simplest case, $k=1$.

When $k=1$, we are asking to maximize the probability that we win in one move. Given that every word is equally likely to be chosen, if we know our solution lies in $T$, our best bet is to pick any word in $T$, and we obtain

\[
V_1(T) = \frac{1}{|T|},
\]

where $|T|$ is the number of words contained in $T$. Therefore, our strategy on the *last move* should be to pick any word from $T$, the list of valid words remaining. For larger $k$, we can use the Bellman equation, which is the fundamental idea behind dynamic programming. In our context, it is:

\[
V_{k}(T) = \max_{w \in W} \frac{1}{|T|} \sum_{s \in T} V_{k-1}(f(w,s,T))\qquad\text{for }k=2,3,\dots
\]

where $f(w,s,T)$ is the set of words that are still admissible if we guess $w$ and the true word is $s\in T$, where $T$ is the known list of possible words remaining. This makes intuitive sense: we consider every possible move, and for each one we compute the probability, averaged over all possible solution words, that we will finish in at most $k-1$ more moves.
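To make the update $f(w,s,T)$ concrete, here is a minimal Python sketch (the function names are my own, not necessarily those used in the linked solver code). It scores a guess the way Wordle does, including the duplicate-letter rule, and then filters the candidate set to words consistent with the observed response:

```python
from collections import Counter

def feedback(guess, solution):
    """Wordle's response to `guess`: 2 = green, 1 = yellow, 0 = gray.
    Greens are marked first; yellows are then drawn from the leftover
    (non-green) solution letters, which handles repeated letters."""
    result = [0] * len(guess)
    leftover = Counter()
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            result[i] = 2          # green
        else:
            leftover[s] += 1       # this solution letter is still unmatched
    for i, g in enumerate(guess):
        if result[i] != 2 and leftover[g] > 0:
            result[i] = 1          # yellow
            leftover[g] -= 1
    return tuple(result)

def f(w, s, T):
    """f(w, s, T): the words still admissible after guessing w when the
    true solution is s -- all t in T that would give the same response."""
    r = feedback(w, s)
    return {t for t in T if feedback(w, t) == r}
```

For example, `f("TRACE", "BIRCH", {"BIRCH", "LURCH", "PORCH", "ZONAL"})` returns `{"BIRCH", "LURCH", "PORCH"}`, since those three words all produce the same yellow-R/green-C response to TRACE.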

For $k=2$, the Bellman equation simplifies. Suppose we are contemplating a guess $w \in W$. This guess splits the set $T$ into $M_w$ groups, corresponding to the $M_w$ different patterns of green and yellow tiles we could receive in response to $w$. In other words,

\[
|T| = n_1 + n_2 + \cdots + n_{M_w},
\]

where $n_i$ is the number of words in our reduced pool of possible solutions if we receive response $i$. If we receive response $i$, then the probability of winning on the next move is $1/n_i$, as discussed above. Therefore, we can apply Bellman’s equation and obtain:

\begin{align}
V_{2}(T) &= \max_{w\in W} \frac{1}{|T|}\biggl( \underbrace{\tfrac{1}{n_1}+\cdots+\tfrac{1}{n_1}}_{n_1\text{ times}} + \cdots + \underbrace{\tfrac{1}{n_{M_w}}+\cdots+\tfrac{1}{n_{M_w}}}_{n_{M_w}\text{ times}}\biggr)\\
&= \max_{w\in W}\frac{M_w}{|T|}.
\end{align}

So whatever word $w\in W$ we pick, our probability of winning in two moves or fewer is determined by the number of “splits” $M_w$: the number of sets into which $T$ is partitioned after our guess. Consequently, the optimal strategy for the *second-to-last move* is to pick the word that maximizes the number of splits.
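In code, the second-to-last move reduces to counting distinct responses. Here is a short sketch (helper names are mine; `feedback` is a standard duplicate-aware Wordle scorer, restated so the snippet is self-contained):

```python
from collections import Counter

def feedback(guess, solution):
    """Wordle response as a tuple: 2 = green, 1 = yellow, 0 = gray."""
    res = [2 if g == s else 0 for g, s in zip(guess, solution)]
    left = Counter(s for g, s in zip(guess, solution) if g != s)
    for i, g in enumerate(guess):
        if res[i] == 0 and left[g] > 0:
            res[i], left[g] = 1, left[g] - 1
    return tuple(res)

def num_splits(w, T):
    """M_w: how many distinct responses guess w can elicit from solutions in T."""
    return len({feedback(w, t) for t in T})

def best_penultimate_guess(W, T):
    """Optimal second-to-last move: maximize splits, giving V_2(T) = M_w / |T|."""
    w = max(W, key=lambda g: num_splits(g, T))
    return w, num_splits(w, T) / len(T)
```

On the small example discussed later in the post, `best_penultimate_guess(["BIRCH", "LURCH", "PORCH", "ZONAL"], ["BIRCH", "LURCH", "PORCH"])` picks ZONAL with $V_2 = 3/3 = 1$.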

We have now figured out the optimal strategy for the last two moves. All that remains is the first move. Unfortunately, the argument above does not appear to generalize beyond $k=2$, so we must turn to a brute force search. However, since we already know how to act optimally for the second and third moves, this can be carried out pretty efficiently. The result we find is that the optimal first word is TRACE, and the associated probability of winning in 3 moves or fewer is $\frac{1388}{2315} \approx 0.599568$; just shy of 60%.
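The brute-force search over first guesses can be organized as follows. This is a toy-scale sketch (names are mine, and it is shown on tiny word lists): with the full 12,972-word guess list you would want to cache the `feedback` calls, but the structure of the computation is the same.

```python
from collections import Counter, defaultdict

def feedback(guess, solution):
    """Wordle response as a tuple: 2 = green, 1 = yellow, 0 = gray."""
    res = [2 if g == s else 0 for g, s in zip(guess, solution)]
    left = Counter(s for g, s in zip(guess, solution) if g != s)
    for i, g in enumerate(guess):
        if res[i] == 0 and left[g] > 0:
            res[i], left[g] = 1, left[g] - 1
    return tuple(res)

def v2(W, T):
    """V_2(T) = max_w M_w / |T|: best probability of winning in <= 2 moves."""
    return max(len({feedback(w, t) for t in T}) for w in W) / len(T)

def v3(W, S):
    """Brute-force V_3(S): try every first guess w, partition S by response,
    and score each resulting bucket with the exact two-move value v2."""
    best_p, best_w = 0.0, None
    for w in W:
        buckets = defaultdict(list)
        for s in S:
            buckets[feedback(w, s)].append(s)
        # P(win in <= 3 | first guess w) = sum over buckets B of (|B|/|S|) * V_2(B)
        p = sum(len(B) * v2(W, B) for B in buckets.values()) / len(S)
        if p > best_p:
            best_p, best_w = p, w
    return best_w, best_p
```

Run on the real $W$ and $S$, this search is what produces TRACE and $1388/2315$.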

Based on previous numerical simulations performed with the help of Vincent Tjeng (our code can be found on GitHub), we established that the optimal first word when maximizing splits (as a greedy heuristic) is also TRACE. Therefore, for this problem the optimal strategy to win in three moves or fewer is actually to maximize the number of splits on each of the first two moves. As far as I can tell, this is a coincidence; with a different word list, for example, the optimal first move might differ.

**Note:** As far as I can tell, the proof that the max-splits heuristic is optimal can only be applied to the second-last move (move two, in our case). The fact that it turned out to be optimal for the first move as well appears to be a coincidence. Indeed, if we extend this problem to more moves, or use different word lists, I suspect that the max-splits heuristic will no longer be optimal for the first move.

### Interesting anomaly

If we apply the “max-splits” heuristic at every turn, here is the distribution of how long it takes to win:

[Histogram: 1 solution word is guessed in 1 turn, 74 in 2 turns, 1233 in 3 turns, and the remainder take 4 or more.]

So based on this approach, we should win in 3 turns or fewer with probability $\frac{1+74+1233}{2315} = \frac{1308}{2315} \approx 0.565$. Why does this not match the $\frac{1388}{2315}$ found above? The reason is that while “max-splits” is the correct strategy for the first two moves, it is not the correct strategy for the final move. Max-splits often picks a word that is not among the remaining possible solutions, and such a guess can never win on that turn!

To illustrate how this can happen, consider the following example, contributed by Vincent Tjeng. Suppose we begin with the usual first guess of TRACE, and we receive the response ⬜🟨⬜🟩⬜ (a yellow R and a green C; all other letters gray). Based on this response, the solution must be one of: {BIRCH, LURCH, PORCH}. Our task is to maximize our probability of winning in at most two more moves (3 moves total). It turns out that picking one of the three possible solution words as our second guess is the wrong strategy! Here is what happens for different choices of our second guess.

Guess | Prob. game lasts 2 turns total | Prob. 3 turns | Prob. 4 turns
---|---|---|---
BIRCH | 1/3 | 1/3 | 1/3
LURCH | 1/3 | 1/3 | 1/3
PORCH | 1/3 | 1/3 | 1/3
ZONAL | 0 | 1 | 0

If we guess BIRCH, we will get lucky with probability 1/3 and win on the spot. But if we don’t guess correctly, we will be left with two words, and it could take one or two more guesses to win, each with probability 1/3. So the probability that we win in at most 3 total moves is $\tfrac{1}{3}+\tfrac{1}{3}=\tfrac{2}{3}$. In this case, there are two “splits” (different possible responses) and three total words in the list, which is why the probability is 2/3.

However, if we guess ZONAL, which is not even on the list of remaining words, we sacrifice the possibility of winning on the spot, but we guarantee a win on the next move. This is because each possible solution provides a different response, allowing us to perfectly distinguish them. There are three “splits” and three total words in the list, which is why the probability is 3/3 = 1.
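We can verify this split-counting directly (again, a small self-contained sketch with my own helper names):

```python
from collections import Counter

def feedback(guess, solution):
    """Wordle response as a tuple: 2 = green, 1 = yellow, 0 = gray."""
    res = [2 if g == s else 0 for g, s in zip(guess, solution)]
    left = Counter(s for g, s in zip(guess, solution) if g != s)
    for i, g in enumerate(guess):
        if res[i] == 0 and left[g] > 0:
            res[i], left[g] = 1, left[g] - 1
    return tuple(res)

T = ["BIRCH", "LURCH", "PORCH"]
for guess in ["BIRCH", "ZONAL"]:
    splits = {feedback(guess, t) for t in T}
    print(guess, "->", len(splits), "splits")
    # BIRCH -> 2 splits (LURCH and PORCH both answer _ _ R C H in green)
    # ZONAL -> 3 splits (each solution gives a distinct response)
```

With 3 splits on a 3-word pool, ZONAL identifies the solution exactly, so the third guess always wins.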

### Why does the solution not generalize for $k > 2$?

When $k=2$, the solution is relatively simple because it depends on the solution at $k=1$, which is just $1/|T|$, so a clean formula falls out. But when $k=3$, the solution depends on the solution at $k=2$, and once you plug that in, the expression no longer simplifies as it did in the previous steps. To find the maximum over $w\in W$, you now have to test each $w$ and see which one gives the largest score.
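To see the obstruction concretely, substitute the $k=2$ result into the Bellman equation, grouping the solutions $s$ by which response bucket $T_i$ (of size $n_i$) they fall into, and writing $M_{w'}(T_i)$ for the number of splits of $T_i$ under guess $w'$:

\begin{align}
V_3(T) &= \max_{w\in W} \frac{1}{|T|} \sum_{i=1}^{M_w} n_i\, V_2(T_i) \\
&= \max_{w\in W} \frac{1}{|T|} \sum_{i=1}^{M_w} n_i \cdot \max_{w'\in W} \frac{M_{w'}(T_i)}{n_i} \\
&= \max_{w\in W} \frac{1}{|T|} \sum_{i=1}^{M_w} \max_{w'\in W} M_{w'}(T_i).
\end{align}

The inner maximization is a separate search for each bucket $T_i$, so the sum no longer collapses to a single count like $M_w$; each candidate first guess $w$ must be evaluated numerically.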