This Riddler puzzle concerns a random process and sequential decision-making.
A grizzly bear stands in the shallows of a river during salmon spawning season. Precisely once every hour, a fish swims within its reach. The bear can either catch the fish and eat it, or let it swim past to safety. This grizzly is, as many grizzlies are, persnickety. It’ll only eat fish that are at least as big as every fish it ate before.
Each fish weighs some amount, randomly and uniformly distributed between 0 and 1 kilogram. (Each fish’s weight is independent of the others, and the skilled bear can tell how much each weighs just by looking at it.) The bear wants to maximize its intake of salmon, as measured in kilograms. Suppose the bear’s fishing expedition is two hours long. Under what circumstances should it eat the first fish within its reach? What if the expedition is three hours long?
Here is my solution:
Recurrence relation
The bear’s optimal decision of whether or not to eat the current salmon depends on only three things: the weight of the current salmon, the weight of the heaviest salmon eaten so far, and the number of salmon remaining. Therefore, this problem is ideally suited for dynamic programming. We’ll write a recursion that solves the general case, and then we’ll specialize to the case of a two-hour or three-hour horizon.
Define $w_k(x,t)$ to be the expected weight of all of the salmon eaten from now on, assuming there are $k$ salmon remaining, the heaviest salmon eaten so far weighed $t$ kg, and the current salmon weighs $x$ kg. If $x \le t$, the bear cannot eat the salmon because it’s too light. If $x > t$, the bear must decide whether to eat the salmon or not. We therefore have the following recursion for expected future weight:
\[
w_{k+1}(x,t) = \begin{cases}
\mathbb{E}_\xi w_k(\xi,t) & \text{if }x \le t \\
\max\left\{ \mathbb{E}_\xi w_k(\xi,t) , \mathbb{E}_\xi \left( x + w_k(\xi,x) \right) \right\} & \text{if }x > t
\end{cases}
\]
In the case $x > t$, the bear makes the decision that maximizes the expected total salmon weight. The first and second choices correspond to letting the salmon go or eating it, respectively. Let’s now define the expected future salmon weight $f_k(t) = \mathbb{E}_x w_k(x,t)$, which is averaged over all possible current salmon weights. Expanding the recursion,
\begin{align}
f_{k+1}(t) &= \mathbb{E}_x w_{k+1}(x,t) \\
&= \int_0^1 w_{k+1}(x,t)\,dx \\
&= \int_0^t f_k(t)\,dx + \int_t^1 \max\left\{ f_k(t) ,\, x + f_k(x)\right\}\,dx \\
&= t f_k(t) + \int_t^1 \max\left\{ f_k(t) ,\, x + f_k(x)\right\}\,dx \\
&= f_k(t) + \int_t^1 \max\left\{ 0 ,\, x + f_k(x)-f_k(t) \right\}\,dx
\end{align}
The trivial base case is $f_0(t) = 0$, since with no more salmon the future weight is necessarily zero. With this recursion, we can now solve for the cases asked for in the problem statement, namely $f_2(0)$ and $f_3(0)$.
Greedy grizzly case
Let’s start with the simple case where the grizzly always eats every salmon whenever possible. The recursion then simplifies to:
\begin{align}
f_{k+1}(t) &= t f_k(t) + \int_t^1 \left( x + f_k(x) \right)\,dx \\
&= t f_k(t) + \tfrac{1}{2}(1-t^2) + \int_t^1 f_k(x) \,dx
\end{align}
Taking the derivative with respect to $t$ and simplifying, we obtain:
\[
f_{k+1}'(t) = t f_k'(t)-t
\]
Substituting in $f_{0}'(t)=0$ and iterating, we observe:
\[
f_{1}'(t) = -t,\qquad f_{2}'(t) = -t-t^2,\qquad f_{3}'(t) = -t-t^2-t^3,\qquad\ldots
\]
The pattern is clear: $f_{k}'(t) = -t-t^2-\dots-t^k$. Now integrate from $t$ to $1$, use the fact that $f_k(1)=0$ (the bear can’t eat any more salmon once it has eaten one of weight 1 kg), and obtain
\[
f_k(t) = \frac{1-t^2}{2}+\frac{1-t^3}{3}+\dots+\frac{1-t^{k+1}}{k+1}
\]
So when the grizzly is greedy, we have $f_{k}^{\prime\prime}(t) = -1-2t-3t^2-\dots-kt^{k-1} \le 0$. Therefore $f_k(t)$ is a concave function. We’ll make use of this fact later… Starting with an empty stomach, the greedy grizzly’s expected total salmon weight is given by the formula
\[
f_k(0) = \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{k+1}
\]
This is a harmonic sum, which grows like $\log(k)$. So given infinite time, the persnickety (and greedy) grizzly can expect to eat an infinite quantity of salmon.
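Since each $f_k$ is a polynomial in $t$, the greedy recursion can be iterated exactly in rational arithmetic. Here is a minimal sketch (the coefficient-list representation and helper name are my own, not from the derivation above):

```python
from fractions import Fraction

def greedy_step(p):
    """One step of the greedy recursion, acting on polynomial coefficients
    (p[i] is the coefficient of t^i in f_k):
        f_{k+1}(t) = t*f_k(t) + (1 - t^2)/2 + integral of f_k from t to 1."""
    q = [Fraction(0)] + list(p)                                  # t * f_k(t)
    F = [Fraction(0)] + [c / (i + 1) for i, c in enumerate(p)]   # antiderivative of f_k
    out = q + [Fraction(0)] * max(0, 3 - len(q))
    out[0] += Fraction(1, 2) + sum(F)    # constant terms: 1/2 + F(1)
    out[2] -= Fraction(1, 2)             # the -t^2/2 term
    for i, c in enumerate(F):            # subtract F(t)
        out[i] -= c
    return out

f, vals = [Fraction(0)], []              # base case f_0 = 0
for k in range(1, 6):
    f = greedy_step(f)
    vals.append(f[0])                    # f_k(0) = 1/2 + 1/3 + ... + 1/(k+1)
```

Running this reproduces the harmonic sums exactly: `vals` starts with $1/2$, $5/6$, $13/12$.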
When is it best to be greedy?
It turns out the grizzly’s optimal strategy is to be greedy when there is 1, 2, or 3 salmon remaining. But if there are 4 or more salmon, the optimal strategy changes. To see why, recall our recurrence relation:
\[
f_{k+1}(t) = f_k(t) + \int_t^1 \max\left\{ 0 ,\, x + f_k(x)-f_k(t) \right\}\,dx
\]
If $x + f_k(x)-f_k(t) \ge 0$ for all $0\le t \le x \le 1$, then we should always be greedy for the $(k+1)^\text{st}$ salmon. Here, we’ll make use of the previously derived fact that when the grizzly is greedy, $f_k$ is concave. This means that $x + f_k(x)-f_k(t)$ is also concave in $x$, and nonnegativity can be verified by checking the endpoints $x=t$ and $x=1$. It’s always true for $x=t$, and for $x=1$ it’s equivalent to $1-f_k(t) \ge 0$ because $f_k(1)=0$ (can’t eat any more salmon if we’ve already had one of weight 1 kg).
Let’s examine each $k$ sequentially and see whether $1-f_k(t) \ge 0$ holds…
For 1 salmon: Recall that $f_0(t)=0$. Therefore:
\[
1-f_0(t) = 1 \ge 0
\]
So the grizzly is greedy with 1 salmon left, and therefore the previously derived formula holds: $f_1(t) = \frac{1-t^2}{2}$. The expected salmon weight is $f_1(0)=\tfrac{1}{2}$. This isn’t surprising; it’s the expected weight of one salmon!
For 2 salmon: Substitute $f_1(t) = \frac{1-t^2}{2}$ and find:
\[
1-f_1(t) = \tfrac{1}{2}+\tfrac{t^2}{2} \ge 0
\]
Therefore, the grizzly is greedy with 2 salmon left, and $f_2(t) = \frac{1-t^2}{2}+\frac{1-t^3}{3}$. The expected salmon weight is $f_2(0)=\tfrac{5}{6}$.
For 3 salmon: Substitute $f_2(t) = \frac{1-t^2}{2}+\frac{1-t^3}{3}$ and find:
\[
1-f_2(t) = \tfrac{1}{6}+\tfrac{t^2}{2}+\tfrac{t^3}{3} \ge 0
\]
Therefore, the grizzly is greedy with 3 salmon left, and $f_3(t) = \frac{1-t^2}{2}+\frac{1-t^3}{3}+\tfrac{1-t^4}{4}$. The expected salmon weight is $f_3(0)=\tfrac{13}{12}$.
For 4 salmon: Substitute $f_3(t) = \frac{1-t^2}{2}+\frac{1-t^3}{3}+\tfrac{1-t^4}{4}$ and find:
\[
1-f_3(t) = -\tfrac{1}{12}+\tfrac{t^2}{2}+\tfrac{t^3}{3}+\tfrac{t^4}{4} \ngeq 0
\]
Aha! This function is no longer nonnegative for all $t$ (take $t$ near zero). So in this case, if the heaviest salmon eaten so far weighed $t$ kg and the current salmon weighs $x$ kg, the bear should eat the current salmon if $x + f_3(x)-f_3(t) \ge 0$ and let it swim past otherwise. Here is what the decision boundary looks like:
The green region represents when the bear should eat the current fish, assuming there are four fish remaining including this one. The bottom red triangle in the picture is forbidden because the fish doesn’t weigh enough. The small top red region is forbidden because the fish is too heavy, i.e. $x + f_3(x)-f_3(t) < 0$.
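Using the closed form for $f_3$, the decision rule with four fish remaining is easy to evaluate directly. A small sketch (the function names are mine):

```python
def f3(t):
    """Greedy value with 3 fish left (greedy is optimal for k <= 3)."""
    return (1 - t**2) / 2 + (1 - t**3) / 3 + (1 - t**4) / 4

def should_eat(x, t):
    """With 4 fish remaining: eat iff the fish is heavy enough (x > t) but
    not so heavy that passing is better (x + f3(x) - f3(t) >= 0)."""
    return x > t and x + f3(x) - f3(t) >= 0

# With an empty stomach (t = 0), even a 1 kg fish should be passed up,
# since 1 - f3(0) = -1/12 < 0:
print(should_eat(1.0, 0.0))   # False
print(should_eat(0.5, 0.0))   # True
```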
The general case
Since the strategy for four salmon is no longer greedy, we must return to the original recursion in order to find $f_k$ for $k\ge 4$. It’s no longer possible to find analytic solutions, so we resort to a numerical approximation. Upon doing this, I found the optimal decision rules for the bear as a function of how many salmon remain. Here is the result in a single plot:
As before, we always reject fish below the dotted line. Each solid line corresponds to a decision boundary for a different number of fish remaining. Above the line, let the fish go; below the line, eat the fish. The uppermost blue curve ($k=4$ fish remaining) is the same as the one plotted earlier. As one might expect, the more fish lie ahead, the pickier we must be at first in order to maximize the final weight.
Now you might be wondering… How much worse is the greedy strategy? We can get a clear picture if we plot the expected total weight of salmon as a function of how many salmon remain:
Note that the greedy solution is simply the harmonic sum we derived earlier. As expected, the optimal strategy and the greedy strategy perfectly coincide when $k=1,2,3$. They’re very close when $k=4$, but not quite the same. When $k\ge 5$, we begin to see a noticeable difference. The curves keep getting farther apart as $k$ gets larger. Here is a zoomed out version of the same plot:
The greedy curve grows like $\log(k)$ while the optimal curve grows like $\sqrt{k}$, which is much faster indeed!
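For readers who want to reproduce these curves, here is a minimal numerical sketch of the general recursion (the uniform grid and trapezoidal quadrature are my own discretization choices):

```python
import numpy as np

def optimal_values(n, grid=2001):
    """Iterate f_{k+1}(t) = f_k(t) + int_t^1 max(0, x + f_k(x) - f_k(t)) dx
    on a uniform grid. Returns [f_0(0), f_1(0), ..., f_n(0)]."""
    t = np.linspace(0.0, 1.0, grid)
    h = t[1] - t[0]
    f = np.zeros(grid)
    out = [0.0]
    for _ in range(n):
        new = np.empty(grid)
        for i in range(grid):
            g = np.maximum(0.0, t[i:] + f[i:] - f[i])        # integrand on [t_i, 1]
            new[i] = f[i] + h * (g.sum() - 0.5 * (g[0] + g[-1]))  # trapezoid rule
        f = new
        out.append(f[0])
    return out

vals = optimal_values(5)
greedy = [sum(1 / j for j in range(2, k + 2)) for k in range(6)]
# vals agrees with the harmonic sums for k <= 3 and pulls ahead for k >= 4
```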
Very nice. Double-check f2(0), though?
thanks! fixed it.
While there is no closed-form solution for non-small k, as the number of rounds approaches infinity, the asymptotically optimal strategy (matching the leading term of the expected sum) approaches a closed form:
threshold = w + sqrt(2/3)*sqrt(1-w^3)/w / sqrt(k), where w is the minimum weight allowed, and k is the number of remaining rounds, except for a correction for very small w to avoid divergence; the following correction is good enough: if w<k^(-1/4) set threshold to w+k^(-1/4).
Assuming the initial minimum weight is 0 and the number of rounds is k, the approximate total sum is sqrt(2/3*k), and at time t, the threshold will be approximately (t/k)^(1/3).
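A quick way to sanity-check this asymptotic claim is to simulate the threshold policy directly. The sketch below is my own implementation of the rule, taking the threshold at time t to be (t/k)^(1/3); the average total should come out near sqrt(2/3*k):

```python
import random

def threshold_policy_total(k, rng):
    """One fishing trip of k rounds: eat any legal fish (x >= w) that lies
    below the running threshold (t/k)^(1/3)."""
    w, eaten = 0.0, 0.0
    for t in range(1, k + 1):
        x = rng.random()
        if w <= x < (t / k) ** (1 / 3):
            eaten += x
            w = x
    return eaten

rng = random.Random(42)
k = 10_000
avg = sum(threshold_policy_total(k, rng) for _ in range(100)) / 100
# avg should land in the vicinity of sqrt(2*k/3), about 81.6 for k = 10,000
```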
Can someone please post a solution for the less mathematically mature? Dmytro, any chance you can accept my Facebook request? I wanted to discuss some of these problems with you. Spasibo!
In brief, here is my solution as the number of rounds k approach infinity.
At time t, the optimal threshold will be around (t/k)^(1/3).
Justification: as the threshold increases by a factor of c, the time needed to increment the threshold by ε is reduced by a factor of c^2, which is as desired, since the benefit of extra time (if total time is flexible) spent increasing the threshold by ε (which, by optimality, should be the same at both thresholds) is proportional to threshold/sqrt(ε_increment_time). (If the time is increased by a factor of c^2, we can reduce the threshold increment by a factor of c and increase the number of numbers taken by a factor of c.)
d(threshold/dt) = 1/3*t^(-2/3)*k^(-1/3)
average_threshold_increment = (threshold-last_number)/2 = 1/(2*increment_time)
average_threshold_increment = d(threshold)/dt * increment_time
Thus,
1/(2*increment_time^2) = d(threshold)/dt,
increment_time = sqrt(1/(2*d(threshold)/dt)) = sqrt(3/2)*t^(1/3)*k^(1/6)
(*) total_sum = integral(threshold/increment_time,t,0,k) = integral(sqrt(2/3)/sqrt(k),t,0,k) = sqrt(2/3*k).
Let the current number be w.
t = w^3*k
threshold - w = 1/increment_time = sqrt(2/3)*t^(-1/3)*k^(-1/6) = sqrt(2/3)/w/sqrt(k)
Let k_r be the remaining number of rounds.
k = k_r/(1-w^3)
(*) threshold = w + sqrt(2/3)*sqrt(1-w^3)/w / sqrt(k_r), except for a correction for very small w to avoid unreasonable thresholds; the following correction is good enough: if w<k_r^(-1/4) set threshold to w+k_r^(-1/4).
Here’s a solution from the less mathematically mature:
https://www.sharelatex.com/project/57a8e6f9885374c221961e76
Here is a simple explicit strategy that achieves O(sqrt(n)), albeit with a lower-than-optimal constant:
Divide the meal into sqrt(n) “courses” of sqrt(n) fish each. (I’ll just assume sqrt(n) is an integer; otherwise, one can make some minor adjustments that don’t affect the asymptotics.) During the kth “course”, let the bear eat any fish he can of weight less than k/sqrt(n). As a result, if at least one fish of weight between (k-1)/sqrt(n) and k/sqrt(n) appears during the kth course, then the bear will certainly eat at least one such fish during that course. The probability that a fish in that range appears during the kth course is approximately 1-1/e, since the number of such fish is roughly Poisson-distributed — pun noted — with a mean of 1. Thus, the bear’s expected consumption is approximately bounded below by (1-1/e)(1+2+3+…+(n-1))/sqrt(n), which is O(sqrt(n)). I say “bounded below” because this analysis does not account for the possibility that the bear gets to eat more than one fish during the kth course, or gets to eat a fish lighter than (k-1)/sqrt(n) during the kth course. These possibilities improve the expected consumption by some constant factor which I am too lazy to work out.
Oops, typo: (1-1/e)(1+2+3+…+(n-1))/sqrt(n) should have been (1-1/e)(1+2+3+…+(sqrt(n)-1))/sqrt(n).
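The course-by-course strategy (with the corrected bound) is easy to simulate. Here is a quick sketch, assuming n is a perfect square as in the comment above; the function names are mine:

```python
import math
import random

def courses_total(n, rng):
    """One run of the 'courses' strategy: m = sqrt(n) courses of m fish each;
    during course k, eat any legal fish weighing less than k/m."""
    m = math.isqrt(n)
    heaviest, eaten = 0.0, 0.0
    for k in range(1, m + 1):
        for _ in range(m):
            x = rng.random()
            if heaviest <= x < k / m:
                eaten += x
                heaviest = x
    return eaten

rng = random.Random(0)
n = 10_000                       # m = 100 courses of 100 fish
avg = sum(courses_total(n, rng) for _ in range(200)) / 200
bound = (1 - 1 / math.e) * sum(range(1, 100)) / 100   # about 31.3
# avg exceeds the (1 - 1/e) lower bound but falls short of the
# optimal value, roughly sqrt(2*n/3) ~ 81.6
```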
very cool, thanks!
btw $\LaTeX$ works in the comments with standard syntax, and I think you should also be able to edit your own posts if you want to make a correction.
You can also put a bound on the order of growth of the best case by noting that the most salmon the bear can eat is the length of the longest increasing subsequence, which is known to have an expected value of 2*sqrt(n) – in other words, if the bear knew the weight of all future salmon, his best case would be O(sqrt(n)), which the strategy above achieves (albeit not with the same coefficient).
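The 2*sqrt(n) bound is also easy to probe numerically with patience sorting (a quick sketch of my own; note that the finite-n mean sits somewhat below 2*sqrt(n) because of a known lower-order correction):

```python
import bisect
import random

def lis_length(seq):
    """Length of the longest strictly increasing subsequence, via patience
    sorting: tails[i] holds the smallest tail of an increasing run of length i+1."""
    tails = []
    for x in seq:
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

rng = random.Random(0)
n = 400
avg = sum(lis_length([rng.random() for _ in range(n)]) for _ in range(500)) / 500
# avg is close to, and a bit below, 2*sqrt(n) = 40
```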
Here’s an extension I suggested to Ollie that you might enjoy: suppose there are 2 bears, alpha and beta. They’re each greedy, and want to eat any salmon bigger than they’ve eaten before, but alpha-bear always gets first choice. Which bear ends up with more? (Hint: it’s not always alpha!)
Easy way to see it’s a harmonic series for the greedy case. The probability that fish N is the maximum-weight fish in a sequence of N fish is 1/N. The expected weight of the maximum-weight fish out of N fish, for IID uniformly [0,1] distributed weights, is N/(N+1). So the expected amount eaten at step N = 1/N * N/(N+1) = 1/(N+1).
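This clean argument checks out in simulation: the greedy bear eats fish N exactly when it is the running maximum. A quick Monte Carlo sketch (my own, with a fixed seed):

```python
import random

def step_contribution(N, trials=300_000, seed=0):
    """Expected weight the greedy bear eats at step N: nonzero only when
    fish N is the maximum of the first N fish."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        fish = [rng.random() for _ in range(N)]
        if fish[-1] == max(fish):     # fish N is the running maximum
            total += fish[-1]
    return total / trials

# step_contribution(N) should come out close to 1/(N+1)
```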
Very nice! Linearity of expectation wins again!
Very nice solution!
Did you mean to write $f’_{k+1}(t)$ rather than $f’_{k+1}(x)$ ?
Yes, thanks! Corrected.