Wordle is a popular daily word game that has taken the world by storm. The rules are simple: Guess the hidden 5-letter “solution word” using at most 6 guesses. Every time you make a guess, the letter tiles of your guess word are colored to provide a hint. Wordle was recently acquired by the New York Times, and remains free, though now at a new website.
What is the best first guess to use in Wordle? Is it always possible to win in 5 moves or fewer? what about 4 moves? How much harder is Hard Mode? In this article, I will answer all these questions and more!
Summary of findings
- We find a strategy that is guaranteed to win Hard Mode in 5 moves or fewer, using only common words. This strategy is not easy to find; only two words (SCAMP and SCOWL) can work as the first guess. Since this strategy works for Hard Mode, it also works for regular mode.
- We prove there does not exist any strategy (regular mode or Hard Mode) that guarantees a win in 4 moves or fewer.
- Most first guesses allow you to win in 3-4 moves on average, provided you choose your next guesses wisely. This is possible even you restrict yourself to using only common words. We provide several strategies that perform well and serve to illustrate a fundamental trade-off: strategies that perform better on average tend to perform worse in the worst case, and vice versa. So which strategy is “best” depends on how much risk you’re willing to take!
Before I begin, I want to acknowledge my collaborator Vincent Tjeng, a software engineer at Google. Vincent made significant contributions to the code for this project. In particular, Vincent found speedups that reduced the runtime from days down to minutes in certain cases! I would also like to thank Evan Sparks, vice president in AI and High Performance Computing at Hewlett Packard Enterprise. For some of the more computationally intensive tasks, Evan graciously ran our code on a computer cluster. In one case, this reduced the computation time from 7 months down to a couple days!
Preliminaries
Contents
- A gray tile A means the solution word does not contain A.
- A yellow tile A means the solution word contains A in a different position.
- A green tile A means the solution word contains A in the correct position.
Wordle uses two word lists. The solution word for each day is drawn from a list of 2,315 common words. In addition to the 2,315 possible solutions, you can guess any word from a list of 10,675 additional 5-letter words, for a total of 12,947 valid guess words.
Edit: Since acquisition by the New York Times, there are still two word lists, but they have been trimmed slightly (mostly to remove vulgar or offensive words). The new solution list contains 2,309 words and the new additional list contains 10,638 words. From this point on, I will assume we are playing with the new lists.Here is a simple example of what can happen on the first turn. Suppose we start by guessing STUMP. Depending on the solution word, there are 90 possible responses we could receive. Here are some possible responses and their ensuing implications:
- S T U M P : None of the letters were correct. This still gives us useful information; we now know the solution does not contain any of the letters S, T, U, M, P. This narrows our original list of 2,309 down to 726 words.
- S T U M P : The solution word contains S but in a different position. It does not contain T, U, M, P. This narrows down the possible solutions to 87 words.
- S T U M P : The U is in the correct position, and the S and P are in the wrong position. The solution does not contain T or M. This narrows down the possible solutions to only two words (PAUSE and PLUSH).
- S T U M P : The T is in the correct position, and the S is in the wrong position. Although this seems less informative than the previous case, there is only a single word that fits this description! (ETHOS).
In the work that follows, we assume prior knowledge of both word lists. In principle, this is not really a limitation, since any sort of systematic approach to the game must assume some word list and the strategies we derived could have been derived using any word list of our choosing.
Why guess words that can’t win?
You might be wondering why there are two word lists in the first place. What possible benefit could there be to guessing a word that we know can never be the solution? Let’s look at an example. Suppose we begin with a first guess of TRACE, and we get the following response:
Guess 1 | Response | Possible solution words |
TRACE | T R A C E | (BIRCH, LURCH, PORCH) |
Based on the response we received, there are only three possible solutions words that fit. For our next move, we could try guessing one of these words, or we could guess something entirely different… Here are some possible scenarios.
Response if the solution word is… | Probability the game lasts a total of | ||||||
BIRCH | LURCH | PORCH | 2 turns | 3 turns | 4 turns | ||
Guess 2 | BIRCH | B I R C H | B I R C H | B I R C H | 1/3 | 1/3 | 1/3 |
LURCH | L U R C H | L U R C H | L U R C H | 1/3 | 1/3 | 1/3 | |
PORCH | P O R C H | P O R C H | P O R C H | 1/3 | 1/3 | 1/3 | |
ZONAL | Z O N A L | Z O N A L | Z O N A L | 0 | 1 | 0 |
If we guess a solution word, we will guess correctly 1/3 of the time, but if we’re wrong, there is no way to tell which of the other two words are correct since both produce the same response. We may end up needing two more guesses, for a total of 4 guesses to find the solution word! However, if we guess ZONAL (which is not even on the list of remaining words), we sacrifice the possibility of winning on the spot, but we guarantee a win on the next turn. This is because BIRCH, LURCH, and PORCH each provide a different response to the guess ZONAL, allowing us to perfectly distinguish them. Note that all strategies above win in 3 moves on average, but if we guess ZONAL, we always win in 3 moves.
In general, the remaining solution words are sometimes uninformative when used as guesses, so having the flexibility to pick words outside the solution set allows us to pick more informative guesses, which can have a benefit in the long run.
Hard Mode
Wordle also has a “Hard Mode”. In Hard Mode, we are forced to use guess words that are consistent with all past responses. For example, if we learn that the solution starts with S, then all our guesses from that point on must start with S. Many people do this instinctively; they only guess words that have a chance of winning. If we played the example above in Hard Mode, we would not be allowed to play ZONAL at all, and we could not guarantee a win in 3 moves.
Hard Mode is uniquely challenging because each guess we make can restrict our future guesses. If we guess poorly, things can go downhill in spectacular fashion. For example, suppose our first guess and response is N I G H T. This seems like we got lucky, but we’re actually in a lot of trouble. Now all our subsequent guesses are forced to end in IGHT, since we know that these letters are correct, and unfortunately, there are 8 possible solution words (EIGHT, FIGHT, LIGHT, MIGHT, RIGHT, SIGHT, TIGHT, WIGHT). All we can do is try them one by one, and if we’re unlucky, we will find the solution in a total of 9 moves. In regular mode, however, we can eliminate more options by picking a word like FERMS. This leads to finding the correct word in at most 5 total moves. So Hard Mode is far less forgiving than regular mode.
Greedy heuristics
In general, it would take too long to exhaustively check all possible Wordle strategies (more on this later). One possible alternative is to take a greedy approach, which roughly means that we only look one move ahead. Every guess word we choose determines a partition of the 2,309 solution words into what I’ll call splits. In the first example above, the word STUMP partitioned the solution word list into 90 splits, each one associated with a different possible response, and each containing a different subset of the original word list. We can visualize this by plotting a bar chart. Each bar is a different possible response, and the bar height corresponds to the number of words that could produce the given response (sorted from smallest to largest). The sum of all bar heights is 2,309 (the total number of solution words).
If each of the 2,309 solution words is equally likely, we can also interpret this bar charts as a probability distribution. Since the largest bar has a height of 726, there is a $\frac{726}{2{,}306} \approx 31.5\%$ chance we will be stuck with 726 possible words on our second turn. Each different starting word leads to a different distribution plot like the one above. Here is another version of the same plot, this time comparing several different starting words.
Each distribution has an equal area under the curve, since they just correspond to different ways of splitting the 2,309 words into groups. So which one should we pick? Roughly, we want distributions that are more “spread out” and “flatter” because this means we are likelier to have a smaller set of solution words after we make our first move. Here are three intuitive ways to achieve this:
- Max-size prioritization: One possible strategy is worry about the “worst possible case”. This is achieved by making sure the tallest bar (the peak of the distribution) is as short as possible. It turns out there are two words that do this equally well: RAISE and ARISE. Both have a peak of 167 (much better than the 726 we had with STUMP!), which means that after this first guess, you are guaranteed to reduce the list of possible solutions to no more than 167. RAISE has the lowest peak in the plot above.
- Max-splits prioritization: Another possible strategy is to maximize the number of splits (distinct possible responses) that we could receive. This is the same as picking the distribution that is widest, or in probability jargon, the distribution with maximum support. Intuitively, more spread-out distributions lead to fewer words per split, which can help quickly eliminate easy options, potentially lowering the average number of guesses required to win. The best first guess using this approach is TRACE, which has 150 possible responses. TRACE has the widest distribution in the plot above.
- Max-entropy prioritization: Finally, we could look at the entropy, which is an information-theoretic concept that measures an average notion of uncertainty. Distributions with sharp peaks tend to have low entropy, and distributions that are more uniform have high entropy. By maximizing entropy, we are attempting to make all possible outcomes equally bad (i.e., none of them are too bad). The best first guess using this approach is SOARE.
If you’re curious about letter frequency, here is a chart showing how frequently each letter occurs in the Wordle solution words. As you can see, the three recommended start words above (RAISE, TRACE, SOARE) all use common letters. It’s actually possible to use all five most common letters; the words ROATE and ORATE are valid first guesses! (their distributions are very similar to that of SOARE).
Performance comparison
How well do these different guess-selection heuristics work when we apply them repeatedly? We can test each strategy on all 2,309 possible solution words and see how many turns are required to win. We can also see how each strategy fares when used in Hard Mode. Here are the results:
In regular mode, the Max-Size strategy is guaranteed to win in 5 moves or fewer, while the other two strategies sometimes require 6 moves. However, the Max-Size strategy takes more moves to win, on average, than both other strategies. So there is a trade-off; the strategy you should pick depends on whether you want to do well on average, or whether you want to protect yourself against the worst case. This is analogous to the trade-offs present in portfolio management: strategies that have higher expected return also tend to have higher risk.
If each solution word is equally likely to occur, these plots can be interpreted as probability distributions. For example, the Max-Size strategy in regular mode will take 5 turns to win $\frac{73}{2{,}315} \approx 3.15\%$ of the time.
The strategies presented above are fundamentally greedy; each decision is based solely on the state of the game one move from now. Perhaps it is not surprising that these strategies do not fare well in Hard Mode (they each have a worst case of 8 moves). By only looking one move ahead, we don’t protect ourselves against bad cases such as NIGHT discussed above. To do well in Hard Mode, we need to look farther ahead!
Does the first guess matter?
In this section, we will show that no matter which first guess is used, if we make reasonable guesses afterwards, we will be successful most of the time. In our simulations, we used the Max-Splits heuristic for all guesses beyond the first. In an effort to simulate a more “human” strategy, we also restricted all the guesses to just the 2,309 solution words. This means only relatively common words were ever used. For all 2,309 possible first guesses, we compared the worst-case and average case number of moves required to win. Here is what we found.
In the left plot, we see that with this simple strategy (Max-Splits, using common words only), most starting words win in 6 moves or fewer, and 141 starting words guarantee a win in 5 moves or fewer. In the right plot, we see that the average number of moves required to win is consistently between 3.4 and 4.1. All of this to say: the first word does not matter much, and even with a limited vocabulary and a crude strategy, we can do quite well. We ran the same experiment in Hard Mode, and here is what we found.
In Hard Mode, the worst case is never good! This supports our previous claim that greedy strategies have bad worst-case performance in Hard Mode. By only looking one move ahead, we can paint ourselves into a corner rather easily. There is some good news, however. Based on the right plot, we see that these “bad cases” are rare. The average number of moves required is only a bit worse than in regular mode. So even in Hard Mode, we should expect simple strategies to do well most of the time. In the next section, we’ll explore the limits of what is possible if we don’t restrict ourselves to using greedy strategies.
Optimal strategies
Greedy strategies may underperform because they only look one move ahead, so in this section, we will explore what happens if we stop being greedy and start being optimal! First, we must acknowledge that a naïve brute-force search of all possible strategies is not feasible. There are 12,947 possible guess words and dozens of possible responses to each guess. Assuming we win on the fifth move and there are 20 different responses per guess, then there would be roughly $12{,}972 \times 20 \times 12{,}972 \times 20 \times 12{,}972 \times 20 \times 12{,}972 \approx 2.3\times 10^{20}$ different strategies. For each strategy, we would have to try it with each of the 2,309 possible solution words. This would likely take years on a supercomputer. Nevertheless, it is still possible to draw some useful conclusions for certain special cases.
Solving Hard Mode
Hard Mode is difficult for humans (and for greedy heuristics) because each guess has downstream consequences that are difficult to anticipate. However, this fact actually makes the problem easier from the point of view of a brute-force computer search! Every time we make a move, the number of possible future moves is reduced dramatically. So by the time we get to turn 3, there will typically only be a few words we can play. This makes it possible to use a computer to search all possible Hard Mode strategies. Our approach is to find the smallest number $n$ such that we can guarantee a win in $n$ moves or fewer. We then search for a strategy guaranteeing a win in the fewest average number of guesses among all strategies guaranteeing a win in $n$ moves or fewer. By posing the question in this way, we can save time by eliminating from consideration any strategy that has not narrowed down the possible solution to a single word after $n-1$ guesses. When all is said and done, the full search took roughly 5 minutes on a laptop. We found that:
- It is possible to guarantee a win in Hard Mode in 5 moves, but not in 4. This is a vast improvement over the greedy heuristics above, which could require 7+ moves in the worst case.
- We can win in 5 moves in Hard Mode even if we further restrict ourselves to only using guess words that belong to the solution word list! In this case, there are only two starting words that work: SCAMP and SCOWL. SCAMP has a slightly better performance in terms of average number of turns (3.71546 for SCAMP vs. 3.75227 for SCOWL).
Interestingly, the optimal first words for Hard Mode do not use common letters! In fact, if we refer to the “distribution of solution words for different initial guesses” plot above, we can see that SCAMP performs poorly on all greedy metrics we used (it has few splits, a high peak, and low entropy). However, SCAMP is a good starting word for Hard Mode! Here is a plot showing the performance of SCAMP in Hard mode, using only guess words from the solution list.
Optimal Hard Mode strategy Initial guess: SCAMP |
Average: 3.7155 moves, Worst case: 5 moves |
If you want to try this strategy yourself, you can find it here. We also tested the strategy against Absurdle, an adversarial Wordle clone that changes the solution word every turn to be consistent with previous responses in an effort to prolong the game as much as possible. As anticipated, the optimized Hard Mode strategy can defeat Absurdle in 5 moves:
Side note: Absurdle is not optimally adversarial, because this would require looking many moves ahead, and would be just as difficult as solving Wordle in the first place! Instead, Absurdle uses a greedy approach similar to the Max-Size strategy described above in order to constantly maximize the number of possible solution words remaining. For this reason, it may be possible to solve Absurdle in fewer than 5 moves.
Solving regular mode
Since guaranteeing a solution in 5 moves or fewer is possible in Hard Mode, this means it is also possible in regular mode. In fact, we showed above that it can be done using greedy heuristics, a variety of start words, and even with a restricted word list! If our goal is to optimize the average number of moves required, the Max-Splits heuristic does quite well, with an average of 3.4320. Finding the strategy that minimizes the average number of moves requires an exhaustive search of all strategies, and we already saw that this is not feasible. Nevertheless, many other people have come up with clever non-greedy strategies that work quite well. For example, 3Blue1Brown made an excellent video, in which he presents an entropy-based approach that looks 2 moves ahead. Using the starting word CRANE, he obtains an average of 3.43 moves. Jonathan Olson’s beautiful Wordle site also presents several non-greedy strategies, among which 3.42 is the best average. Although this is a difficult problem to solve computationally, I suspect there is not much to be gained by optimizing further, since we were already able to achieve results in the ballpark of 3.43 using only common words and greedy heuristics!
We cannot optimize for average moves because the search space is too large, but we can reduce the search space if we ask the right sorts of questions. For example, we could ask: what strategy has the highest probability of winning in 3 moves or fewer? This question was posed by Zach Wissner-Gross of 538’s Riddler blog. It turns out the way to achieve this is to start with TRACE, then use the Max-Splits heuristic for the second move, and then pick a word at random among possible remaining solution words on the third turn. This leads to a win in 3 moves for roughly 60% of the 2,309 words. For more details on this strategy, see my blog post on the topic.
Another question within our reach is to see whether it is possible to guarantee a win in regular mode in 4 moves. To answer this question, we can exclude any strategy that would take more than 4 turns. The resulting search space is still very large, but manageable, particularly since the search is easily parallelizable over the choice of initial guess. We were able to perform an exhaustive search in about 5000 CPU-hours, which took a couple days on a computer cluster. What did we find? There is no strategy that is guaranteed to win Wordle in 4 moves or fewer.
Edit: We did not re-run the computation after the New York Times made minor changes to the word list. We strongly suspect our result will be the same with the new word list, since only a few solution words were removed in the new list.
Concluding remarks
It may be impossible to guarantee a win in 4 moves of fewer, but as long as you’re not playing against Absurdle, most reasonable strategies will typically finish in 4 moves or fewer. In fact, the choice of first word does not affect the outcome that much, and you don’t even need to resort to fancy words to do well. The question of “which strategy is best?” is a matter of personal preference.
Hard Mode, however, is a different story. Each guess can have severe downstream consequences, so if you want to guarantee you won’t lose, it is important to choose wisely, and not be greedy. In fact, in order to guarantee a win in 5 moves or fewer in Hard Mode while restricting your guesses to possible solution words only, you must start with either SCAMP or SCOWL, which both use rather uncommon letters.
The code that produced all results above can be found in our Github repo. Thanks for reading, and happy Wordling!
Pingback: Wordle solvers – updated – Mike Vermeulen's web
Pingback: Excel Doing Billions Of Calculations To Study Wordle
Success in 3 tries using stump and trace. Beginners luck or brilliant information from your site.
Pingback: How to Win at Wordle Every Time - News @ Northeastern - Northeastern University - Tech Me Now
Your list of heuristics should include average-split-size minimization. RAISE’s first-test 132 splits have max-size 167 and average-size 60.744 (averaged over the 2309 equiprobable solutions), the lowest such average of all the 12947 testers. By minimizing average-split-size I frequently get the answer in 3 tries, rarely 5. My program likes to minimize average-split-size, but prefers minimizing average-remaining-guesses. Any partition with a max-split-size less than 4 has an easily computed average-remaining-guesses, which can be minimized e.g. by preferring a good triple (like BUNNY, FUNKY, FUNNY, average remaining guesses 1 2/3) to a bad triple (BIRCH, LURCH, PORCH, remaining guesses 2).