In the spirit of the Kentucky Derby, this Riddler puzzle is about a peculiar type of horse race.
The bugle sounds, and 20 horses make their way to the starting gate for the first annual Lucky Derby. These horses, all trained at the mysterious Riddler Stables, are special. Each second, every Riddler-trained horse takes one step. Each step is exactly one meter long. But what these horses exhibit in precision, they lack in sense of direction. Most of the time, their steps are forward (toward the finish line) but the rest of the time they are backward (away from the finish line). As an avid fan of the Lucky Derby, you’ve done exhaustive research on these 20 competitors. You know that Horse One goes forward 52 percent of the time, Horse Two 54 percent of the time, Horse Three 56 percent, and so on, up to the favorite filly, Horse Twenty, who steps forward 90 percent of the time. The horses’ steps are taken independently of one another, and the finish line is 200 meters from the starting gate.
Handicap this race and place your bets! In other words, what are the odds (a percentage is fine) that each horse wins?
Here is my full derivation (long!):
Let’s take a look at a particular horse. Each step, suppose it moves forward with probability $q > \frac{1}{2}$ or backward with probability $1-q$. The horse starts at position $0$ and we would like to know the probability that it will reach position $2n$ in exactly $2k$ steps. The reason for the factor of $2$ is that to reach an even-numbered position (such as $200$, in the problem statement), the horse must take an even number of steps. Let’s call this probability $P(n,k,q)$.
This scenario is known as a 1D random walk, and past posts that deal with similar ideas include The Deadly Board Game and Gambler’s Ruin. This problem is a bit different, however, since we care about the distributions of the number of moves required (not just the expected number of moves).
Back to the task at hand: determining $P(n,k,q)$. If we reach position $2n$ in $2k$ moves, then clearly we have $k \ge n$. Next, we can deduce that we must have moved forward $(k+n)$ times and backward $(k-n)$ times. The probability of any one such sequence of steps occurring is $q^{k+n}(1-q)^{k-n}$. The tricky part is to deal with the order in which the steps are taken. The last step must always be forward, so we restrict our attention to arranging the remaining $2k-1$ steps. There are a total of ${2k-1 \choose k-n}$ ways to do this, but this is not the number we’re looking for! Some of these paths reach $2n$ too early, so we will count those paths and subtract them.
The quantity we’re counting is closely related to the Catalan Numbers, and we can borrow a similar proof technique (namely André’s reflection method) to arrive at the solution. Here is the argument: imagine the path laid out as a “mountain range” where each of the $2k-1$ steps is a move to the right and either up or down depending on whether the step was forward or backward. The diagram below illustrates a valid path for $n=2$ and $k=5$. Note that we are allowed to go negative!
We would like to exclude “bad paths” that reach $2n$ at some earlier point. Consider the first point at which this occurs, and reflect the remainder of the path about the line $y=2n$. This produces a new path that has $(k+n)$ forward steps, $(k-n-1)$ backward steps, and ends at $2n+1$ instead of $2n-1$. See the figure below for an illustration of the reflection method.
There is a one-to-one correspondence between bad paths and unconstrained paths of length $2k-1$ that reach $2n+1$. So the total number of valid paths (total paths minus bad paths) is given by:
\[
{2k-1 \choose k-n}-{2k-1 \choose k-n-1}=\frac{n}{k}{2k \choose k-n}
\]The simplification follows from standard binomial coefficient identities, but can also be derived directly using algebra. Putting everything together, the probability of a horse first reaching $2n$ in exactly $2k$ steps is given by the formula:
$\displaystyle
P(n,k,q) = \frac{n}{k}{2k \choose k-n} q^{k+n} (1-q)^{k-n},\quad\text{for }k=n,n+1,\dots
$
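Before moving on, here is a quick Python sanity check (the function names are my own, purely for illustration) comparing $P(n,k,q)$ against a brute-force enumeration of paths for a small case, and verifying that the probabilities sum to one:

```python
from itertools import product
from math import comb

def first_passage_prob(n, k, q):
    """P(n, k, q): probability of first reaching position 2n on step 2k."""
    if k < n:
        return 0.0
    return (n / k) * comb(2 * k, k - n) * q ** (k + n) * (1 - q) ** (k - n)

def brute_force(n, k, q):
    """Enumerate all +/-1 paths of length 2k and keep those whose
    first visit to position 2n happens on the very last step."""
    total = 0.0
    for steps in product([+1, -1], repeat=2 * k):
        pos, hit_early = 0, False
        for s in steps[:-1]:
            pos += s
            if pos == 2 * n:
                hit_early = True
                break
        if not hit_early and pos + steps[-1] == 2 * n:
            total += q ** steps.count(+1) * (1 - q) ** steps.count(-1)
    return total

print(first_passage_prob(1, 3, 0.9), brute_force(1, 3, 0.9))      # both 0.032805
print(sum(first_passage_prob(2, k, 0.7) for k in range(2, 500)))  # close to 1
```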
We can plot the distributions for our case of interest, which is $n=100$, with $q$ ranging from $0.52$ (for Horse $1$) up to $0.90$ (for Horse $20$). Here is a plot of the distributions for some of the horses:
As we can see, the distributions are normal to a very good approximation. Although these are challenging expressions to manipulate, the task is not beyond the capabilities of Mathematica! In fact, we can verify that $P$ is a legitimate probability mass function, and we can also compute its mean and variance for the case $\tfrac{1}{2} < q \le 1$:
\begin{align}
\sum_{k=n}^\infty P(n,k,q) &= 1 & & \text{(sums to 1)}\\
\sum_{k=n}^\infty k\, P(n,k,q) &= \tfrac{n}{2q-1} & & \text{(computing the mean)}\\
\sum_{k=n}^\infty \bigl(k-\tfrac{n}{2q-1}\bigr)^2 P(n,k,q) &= \tfrac{2nq(1-q)}{(2q-1)^3} & & \text{(computing the variance)}
\end{align}Here are the mean and standard deviation of the number of steps for a few of the horses:
\begin{align}
\text{Horse 4} &\approx 1250 \pm 218\\
\text{Horse 8} &\approx 625 \pm 74\\
\text{Horse 12} &\approx 417 \pm 37\\
\text{Horse 16} &\approx 313 \pm 21\\
\text{Horse 20} &\approx 250 \pm 12
\end{align}Note: the formulas above are in terms of $k$, but the actual number of steps is $2k$ (odd step counts are impossible), so the means and standard deviations must both be doubled. When making a continuous approximation, we want those gaps filled in!
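These values follow directly from the formulas above; here is a short Python sketch (purely illustrative, with Horse $i$ stepping forward with probability $0.50 + 0.02\,i$, as in the problem statement) that reproduces the table up to rounding:

```python
from math import sqrt

n = 100  # the finish line is at 2n = 200 meters
for horse in (4, 8, 12, 16, 20):
    q = 0.50 + 0.02 * horse                                       # forward-step probability
    mean_steps = 2 * n / (2 * q - 1)                              # 2 * E[k]
    sd_steps = 2 * sqrt(2 * n * q * (1 - q) / (2 * q - 1) ** 3)   # 2 * sqrt(Var[k])
    print(f"Horse {horse}: {mean_steps:.1f} ± {sd_steps:.1f} steps")
```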
One thing we notice right away is that the distributions are very well separated (it’s a log scale!). The only horses that stand any chance of beating Horse 20 are the horses that are close behind. Let’s plot the distributions one more time, on a linear scale, but only for the top horses:
If horse $i$ finishes in $h_i$ steps, computing the probability that horse $i$ wins amounts to finding the probability that $h_i < h_j$ for all $j \ne i$. We can approximate this as:
\begin{align}
\mathbb{P}(h_i < h_j\,\,\forall j \ne i)
&\approx \int_{-\infty}^\infty \mathbb{P}(h_i = x) \prod_{j \ne i} \mathbb{P}(h_j > x)\, \mathrm{d} x\\
&= \int_{-\infty}^\infty \frac{1}{\sigma_i}\varphi(z_i)\prod_{j \ne i} \left( 1-\Phi(z_j)\right) \, \mathrm{d} x
\end{align}where $\varphi$ and $\Phi$ are the probability density function (pdf) and cumulative distribution function (cdf) of the standard normal distribution, and $z_j = \tfrac{x-\mu_j}{\sigma_j}$ is the standard normal deviate for the $j^\text{th}$ horse ($\mu_j$ and $\sigma_j$ are the means and standard deviations computed earlier).
This is a very messy integral for which there does not exist a closed-form solution, so we must resort to evaluating it numerically. For the solution, read on!
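One way to carry out the numerical evaluation is sketched below in Python with SciPy (purely illustrative: a plain Riemann sum on a hand-picked grid, not the exact code used to produce the table that follows, though it should roughly reproduce it):

```python
import numpy as np
from scipy.stats import norm

# Normal approximation for the number of steps each horse needs to finish.
# Horse i steps forward with probability q_i = 0.50 + 0.02*i.
q = 0.50 + 0.02 * np.arange(1, 21)
mu = 200.0 / (2 * q - 1)                                          # mean number of steps
sigma = 2.0 * np.sqrt(2 * 100 * q * (1 - q) / (2 * q - 1) ** 3)   # standard deviation

x = np.linspace(150.0, 600.0, 40001)   # grid of finishing times covering the fast horses
dx = x[1] - x[0]

win = np.zeros(20)
for i in range(20):
    # pdf of horse i finishing at time x, times the probability that every
    # other horse finishes later than x (survival function = 1 - CDF)
    integrand = norm.pdf(x, mu[i], sigma[i])
    for j in range(20):
        if j != i:
            integrand = integrand * norm.sf(x, mu[j], sigma[j])
    win[i] = np.sum(integrand) * dx    # simple Riemann sum

for i in reversed(range(20)):
    print(f"Horse {i + 1:2d}: {100 * win[i]:.4f}%")
```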
For the tl;dr, the answer is:
The following table shows the probability of each horse winning the race.
| Horse number | Probability of winning the race |
| --- | --- |
| 20 | 71.27% |
| 19 | 21.54% |
| 18 | 5.605% |
| 17 | 1.268% |
| 16 | 0.2595% |
| 15 | 0.05085% |
| 14 | 0.01018% |
| 13 | 0.002212% |
As we can see, it doesn’t look good for horses 1 through 17. The top few horses are the only ones with a legitimate shot of winning!
Note: Since these probabilities were computed using a continuous approximation, I didn’t account for ties at all. If a horse must finish clearly ahead in order to be declared the winner, then the probabilities all shift down. The effect is greatest for the lead horse, Horse 20 (its probability of winning drops by about 5%), but it is far smaller for the other horses (Horse 18’s probability only drops by about 0.5%).
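To quantify the effect of ties, here is a naive Monte Carlo sketch in Python/NumPy (slow and purely illustrative) that tracks how often ties occur and splits a tied win evenly among the horses that reach the finish line on the same step:

```python
import numpy as np

rng = np.random.default_rng(2017)
q = 0.50 + 0.02 * np.arange(1, 21)    # forward-step probabilities for Horses 1..20

def simulate(n_races=10_000, finish=200):
    wins = np.zeros(20)
    ties = 0
    for _ in range(n_races):
        pos = np.zeros(20, dtype=int)
        while pos.max() < finish:
            pos += np.where(rng.random(20) < q, 1, -1)   # every horse steps each second
        leaders = np.flatnonzero(pos >= finish)
        if len(leaders) > 1:
            ties += 1
        wins[leaders] += 1.0 / len(leaders)              # split a tied win evenly
    return wins / n_races, ties / n_races

win_prob, tie_freq = simulate()
print("tie frequency:", tie_freq)
print("win probabilities (%):", np.round(100 * win_prob, 2))
```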
Hi Laurent,
This is a very pretty approach. But it doesn’t account for ties, which occur 4.394% of the time.
I looked up the rules of horse racing, and a tie between N horses is treated as 1/N of a win for each horse. Treating ties that way, and evolving the discrete Fokker-Planck equations for the probability of each horse being at each position as a function of time, I got slightly different results:
horse 20, 72.394775%
horse 19, 21.69022%
horse 18, 4.97523%
horse 17, 0.832475%
horse 16, 0.0986%
… horse 1, 1.87×10^{-33}%
see http://theorie.ikp.physik.tu-darmstadt.de/qcd/horse.c
For what it’s worth, my Monte Carlo simulation, which treated ties exactly as you describe, Guy, gave almost exactly the same probabilities as your analytic approach. Very nice!
Hi Mike,
In your Monte Carlo approach, how did you accurately simulate the rare events in which one of horses 1–15 (using Guy’s notation) wins?
I wrote a naive simulation and quickly realized that I wasn’t going to be able to distinguish the winning probabilities of those long-shot horses from 0.
Thanks!
–Alden Stowe
The other way to come up with the formula is to recognize that the final two steps must be forward. Hence, the other 2k-2 steps must be forward and backward steps that net to 2n-2 forward steps. This implies the number of forward “successes” is k+n-2 and backward failures is k-n. Hence,
P(k,n,q) = combin(2*k-2,n+k-2)*q^(n+k-2)*(1-q)^(k-n)*q^2
The last $q^2$ accounts for the final two steps. I think this is equivalent to your expression.
I solved the problem using lattice Green functions, as discussed in the book by Hughes, “Random Walks and Random Environments, Vol. I”. The following generating function, when expanded as a Taylor series about zero, gives coefficients of $z^n$ that are the probabilities that a random walker first reaches the 200 meter finish line after exactly $n$ steps:
\[
\left(\frac{1-\sqrt{1-4p(1-p)z^2}}{2(1-p)z}\right)^{200}\qquad\text{where }p=0.52,0.54,\dots,0.90.
\]Using this methodology I incorporated ties up to the fourth order and exactly replicated the probabilities given by Guy Moore.
Neat! Thanks! I took the liberty of reformatting your comment using $\LaTeX$. Let me know if there are any typos.
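For anyone who wants to check the generating function numerically, here is a short SymPy sketch (using a toy 4-meter finish line instead of 200 meters, purely illustrative) that expands the series and compares its coefficients against the $P(n,k,q)$ formula from the main derivation:

```python
import sympy as sp

z = sp.symbols('z')
p = sp.Rational(9, 10)   # Horse 20: forward with probability 0.9
finish = 4               # toy finish line of 4 meters (the real race uses 200)

# First-passage generating function, raised to the finish distance.
F = ((1 - sp.sqrt(1 - 4 * p * (1 - p) * z**2)) / (2 * (1 - p) * z)) ** finish
series = sp.series(F, z, 0, 12).removeO()

# Compare the coefficient of z^(2k) with P(n, k, q), where n = finish/2.
n = finish // 2
for k in range(n, 6):
    coeff = series.coeff(z, 2 * k)
    P = sp.Rational(n, k) * sp.binomial(2 * k, k - n) * p ** (k + n) * (1 - p) ** (k - n)
    print(k, sp.simplify(coeff - P) == 0)    # expect True for every k
```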