This Riddler puzzle concerns a random process and sequential decision-making.

A grizzly bear stands in the shallows of a river during salmon spawning season. Precisely once every hour, a fish swims within its reach. The bear can either catch the fish and eat it, or let it swim past to safety. This grizzly is, as many grizzlies are, persnickety. It’ll only eat fish that are at least as big as every fish it ate before.

Each fish weighs some amount, randomly and uniformly distributed between 0 and 1 kilogram. (Each fish’s weight is independent of the others, and the skilled bear can tell how much each weighs just by looking at it.) The bear wants to maximize its intake of salmon, as measured in kilograms. Suppose the bear’s fishing expedition is two hours long. Under what circumstances should it eat the first fish within its reach? What if the expedition is three hours long?

Here is my solution:

[Show Solution]

### Recurrence relation

The bear’s optimal decision of whether to eat the current salmon or not only depends on three things: the weight of the current salmon, the weight of the largest salmon caught so far, and the number of salmon remaining. Therefore, this problem is ideally suited for dynamic programming. We’ll write a recursion that solves the general case, and then we’ll specialize to the case of a two-hour or three-hour horizon.

Define $w_k(x,t)$ to be the expected weight of all of the salmon eaten from now on, assuming there are $k$ salmon remaining, the heaviest salmon eaten so far weighed $t$ kg, and the current salmon weighs $x$ kg. If $x \le t$, the bear cannot eat the salmon because it’s too light. If $x > t$, the bear must decide whether to eat the salmon or not. We therefore have the following recursion for expected future weight:

\[

w_{k+1}(x,t) = \begin{cases}

\mathbb{E}_\xi w_k(\xi,t) & \text{if }x \le t \\

\max\left\{ \mathbb{E}_\xi w_k(\xi,t) , \mathbb{E}_\xi \left( x + w_k(\xi,x) \right) \right\} & \text{if }x > t

\end{cases}

\]

In the case $x > t$, the bear makes the decision that maximizes the expected total salmon weight. The first and second choices correspond to letting the salmon go or eating it, respectively. Let’s now define the expected future salmon weight $f_k(t) = \mathbb{E}_x w_k(x,t)$, which is averaged over all possible current salmon weights. Expanding the recursion,

\begin{align}

f_{k+1}(t) &= \mathbb{E}_x w_{k+1}(x,t) \\

&= \int_0^1 w_{k+1}(x,t)\,dx \\

&= \int_0^t f_k(t)\,dx + \int_t^1 \max\left\{ f_k(t) ,\, x + f_k(x)\right\}\,dx \\

&= t f_k(t) + \int_t^1 \max\left\{ f_k(t) ,\, x + f_k(x)\right\}\,dx \\

&= f_k(t) + \int_t^1 \max\left\{ 0 ,\, x + f_k(x)-f_k(t) \right\}\,dx

\end{align}

The trivial base case is $f_0(t) = 0$, since with no more salmon the future weight is necessarily zero. With this recursion, we can now solve for the cases asked for in the problem statement, namely $f_2(0)$ and $f_3(0)$.

### Greedy grizzly case

Let’s start with the simple case where the grizzly *always* eats every salmon whenever possible. The recursion then simplifies to:

\begin{align}

f_{k+1}(t) &= t f_k(t) + \int_t^1 \left( x + f_k(x) \right)\,dx \\

&= t f_k(t) + \tfrac{1}{2}(1-t^2) + \int_t^1 f_k(x) \,dx

\end{align}

Taking the derivative with respect to $t$ and simplifying, we obtain:

\[

f_{k+1}'(t) = t f_k'(t)-t

\]

Substituting in $f_{0}'(t)=0$ and iterating, we observe:

\[

f_{1}'(t) = -t,\qquad f_{2}'(t) = -t-t^2,\qquad f_{3}'(t) = -t-t^2-t^3,\qquad\ldots

\]

The pattern is clear, and $f_{k}'(t) = -t-t^2-\dots-t^k$. Now integrate from $1$ to $t$ and use the fact that $f_k(1)=0$ (can’t eat any more salmon if we’ve already had one of weight 1 kg) and obtain

\[

f_k(t) = \frac{1-t^2}{2}+\frac{1-t^3}{3}+\dots+\frac{1-t^{k+1}}{k+1}

\]

So when the grizzly is greedy, we have: $f_{k}^{\prime\prime}(t) = -1-2t-3t^2-\dots-kt^{k-1} \le 0$. Therefore $f_k(t)$ is a concave function. We’ll make use of this fact later… Starting with an empty stomach and a greedy grizzly, the expected salmon weight eaten is given by the formula

\[

f_k(0) = \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{k+1}

\]

This is a harmonic sum, which grows like $\log(k)$. So given infinite time, the persnickety (and greedy) grizzly can expect to eat an infinite quantity of salmon.

### When is it best to be greedy?

It turns out the grizzly’s optimal strategy is to be greedy when there is 1, 2, or 3 salmon remaining. But if there are 4 or more salmon, the optimal strategy changes. To see why, recall our recurrence relation:

\[

f_{k+1}(t) = f_k(t) + \int_t^1 \max\left\{ 0 ,\, x + f_k(x)-f_k(t) \right\}\,dx

\]

If $x + f_k(x)-f_k(t) \ge 0$ for all $0\le t \le x \le 1$, then we should always be greedy for the $(k+1)^\text{st}$ salmon. Here, we’ll make use of the previously derived fact that when the grizzly is greedy, $f_k$ is concave. This means that $x + f_k(x)-f_k(t)$ is also concave in $x$, and nonnegativity can be verified by checking the endpoints $x=t$ and $x=1$. It’s always true for $x=t$, and for $x=1$ it’s equivalent to $1-f_k(t) \ge 0$ because $f_k(1)=0$ (can’t eat any more salmon if we’ve already had one of weight 1 kg).

Let’s examine each $k$ sequentially and see whether $1-f_k(t) \ge 0$ holds…

**For 1 salmon:** Recall that $f_0(t)=0$. Therefore:

\[

1-f_0(t) = 1 \ge 0

\]

So the grizzly is greedy with 1 salmon left, and therefore the previously derived formula holds: $f_1(t) = \frac{1-t^2}{2}$. The expected salmon weight is $f_1(0)=\tfrac{1}{2}$. This isn’t surprising; it’s the expected weight of one salmon!

**For 2 salmon:** Substitute $f_1(t) = \frac{1-t^2}{2}$ and find:

\[

1-f_1(t) = \tfrac{1}{2}+\tfrac{t^2}{2} \ge 0

\]

Therefore, the grizzly is greedy with 2 salmon left, and $f_2(t) = \frac{1-t^2}{2}+\frac{1-t^3}{3}$. The expected salmon weight is $f_2(0)=\tfrac{5}{6}$.

**For 3 salmon:** Substitute $f_2(t) = \frac{1-t^2}{2}+\frac{1-t^3}{3}$ and find:

\[

1-f_2(t) = \tfrac{1}{6}+\tfrac{t^2}{2}+\tfrac{t^3}{3} \ge 0

\]

Therefore, the grizzly is greedy with 3 salmon left, and $f_3(t) = \frac{1-t^2}{2}+\frac{1-t^3}{3}+\tfrac{1-t^4}{4}$. The expected salmon weight is $f_3(0)=\tfrac{13}{12}$.

**For 4 salmon:** Substitute $f_3(t) = \frac{1-t^2}{2}+\frac{1-t^3}{3}+\tfrac{1-t^4}{4}$ and find:

\[

1-f_3(t) = -\tfrac{1}{12}+\tfrac{t^2}{2}+\tfrac{t^3}{3}+\tfrac{t^4}{4} \ngeq 0

\]

Aha! this function is no longer nonnegative for all $t$ (take $t$ near zero). So in this case, if the heaviest salmon eaten weighed $t$ kg and the current salmon weighs $x$ kg, then the bear should eat the current salmon if $x + f_3(x)-f_3(t) \ge 0$ and leave the salmon alone otherwise. Here is what the decision boundary looks like:

The green region represents when the bear should eat the current fish, assuming there are four fish remaining including this one. The bottom red triangle in the picture is forbidden because the fish doesn’t weigh enough. The small top red region is forbidden because the fish is too heavy, i.e. $x + f_3(x)-f_3(t) < 0$.

### The general case

Since the strategy for four salmon is no longer greedy, we must return to the original recursion in order to find $f_k$ for $k\ge 4$. It’s no longer possible to find analytic solutions, so we resort to a numerical approximation. Upon doing this, I found the optimal decision rules for the bear as a function of how many salmon remain. Here is the result in a single plot:

As before, we always reject fish below the dotted line. Each solid line corresponds to a decision boundary for a different number of fish remaining. Above the line, let the fish go. Below the line, eat the fish. The upper-most blue curve ($k=4$ fish remaining) is the same as the one plotted earlier. As one might expect, the more fish lie ahead, the pickier we must be at first in order to maximize final weight.

Now you might be wondering… How much worse is the greedy strategy? We can get a clear picture if we plot the expected total weight of salmon as a function of how many salmon remain:

Note that the greedy solution is simply the harmonic sum we derived earlier. As expected, the optimal strategy and the greedy strategy perfectly coincide when $k=1,2,3$. They’re very close when $k=4$, but not quite the same. When $k\ge 5$, we begin to see a noticeable difference. The curves keep getting farther apart as $k$ gets larger. Here is a zoomed out version of the same plot:

The greedy curve grows like $\log(k)$ while the optimal curve grows like $\sqrt{k}$, which is much faster indeed!