Tank counting

This week’s Riddler Classic is a simple-sounding problem about statistics. But it’s not so simple! Here is a paraphrased version of the problem:

There are $n$ tanks, labeled $\{1,2,\dots,n\}$. We are presented with a sample of $k$ tanks, chosen uniformly at random, and we observe that the smallest label among the sample is $22$ and the largest label is $114$. Both $n$ and $k$ are unknown. What is our best estimate of $n$?

Here is a short introduction on parameter estimation, for the curious:
[Show Solution]

Here is my solution to the problem:
[Show Solution]

5 thoughts on “Tank counting”

  1. If we assume a uniform normalizable distribution of N = [114,some_cutoff] is it legit to integrate P(22,114,k,N)*N over N=[114,some_cutoff] and k=[2,93] to get an expectation of N? [

    1. Yes, that works. Mathematically, we are given $P(z\mid\theta)$, where $z=(x,y)$ is our data (the min and max tank numbers observed), and $\theta=(n,k)$ are the unknown parameters. This is the distribution of the min and max values assuming we know the parameters. If we imagine that our data $z$ is fixed, then $P(z\mid\theta)$ (viewed as a function of $\theta$) is called the “likelihood”.

      The maximum likelihood estimator (MLE) simply maximizes likelihood, i.e. what is the value of the parameter $\theta$ that maximizes $P(z\mid\theta)$.

      If you have a “prior” distribution on $\theta$, e.g. if you know that $n$ is uniformly distributed on [114,some_cutoff], as you suggested, then we can call this $P(\theta)$. Then, we can compute the “posterior distribution”, which is the distribution of your parameters conditioned on your data. Using Bayes’ theorem, we’re saying that the posterior satisfies:
      \[
      P(\theta\mid z) \propto P(z\mid\theta)P(\theta)
      \]The maximum a posteriori estimator (MAP) is to choose the value of $\theta$ that maximizes this posterior distribution.

      What you’re suggesting, which is to take the expectation of the posterior distribution (instead of finding the maximum) also has a name. It’s called the minimum mean squared error estimator (MMSE). This is a specific example of a more general class of estimators called Bayes Estimators. Again, no wrong answer here, just different potential choices for estimators.

      Note: In your formula, you’re not only assuming $n$ is uniformly distributed, but that $k$ is as well! You have to remember to normalize the probability distributions for $n$ and $k$, so that would mean dividing your integral (which is actually a sum) by $(\text{cutoff}-114)(93-2)$.

  2. Ah, but German tanks go into combat and some percentage of them are destroyed by the allies. Since it takes days to build a tank, not all tanks seen on any particular day have been on the battlefield the same number of days. The earlier built tanks, thus see more days of battle and are more likely to be destroyed. For example, tanks number 1 through 22 are more likely to have been destroyed which means you cannot observe their serial number, than tanks 125 through 135.

    But all is not lost. Because of the above, the estimate of 135 is more likely too high than too low. Which means the allies most likely would have fewer German tanks to battle.

    Oh, but you say these are not German tanks. It is still more likely that the earlier built tanks are out of commission, having either broken down or in need of routing maintenance.

Leave a Reply

Your email address will not be published. Required fields are marked *