# Card collection completion

This Riddler puzzle is a classic probability problem: how long can one expect to wait until the entire set of cards is collected?

My son recently started collecting Riddler League football cards and informed me that he planned on acquiring every card in the set. It made me wonder, naturally, how much of his allowance he would have to spend in order to achieve his goal. His favorite set of cards is Riddler Silver; a set consisting of 100 cards, numbered 1 to 100. The cards are only sold in packs containing 10 random cards, without duplicates, with every card number having an equal chance of being in a pack.

Each pack can be purchased for \$1. If his allowance is \$10 a week, how long would we expect it to take before he has the entire set?

What if he decides to collect the more expansive Riddler Gold set, which has 300 different cards?

Here is my solution:
[Show Solution]

## 13 thoughts on “Card collection completion”

1. Jason Weisman says:

Laurent, did you have a chance to attempt the Riddler Express – competitive coin flipping puzzle – from last week? The solution provided yesterday said “according to the simulations of many solvers… the Red Team wins this game about 59 percent of the time.” Seems to me this problem can be solved exactly, but my answer was 5/8=62.5%. Anyway, just wondering if anyone else got the same result or if there may be a flaw in my analysis. See –

1. John Snyder says:

I got the answer 5/8 also. I believe that you are exactly correct.

1. Jason Weisman says:

John, appreciate the confirmation. The closed-form solution didn’t seem very complicated. Surprising that wasn’t the solution on 538 Riddler site. 3.5% difference between 59% answer provided and 62.5% isn’t negligible.

2. I didn’t look at that problem, but it does seem like there should be a closed-form solution. Your approach looks correct o me.

2. John Snyder says:

Laurent, I got exactly the same solutions using what is essentially the same methodology as you employed-but I used Mathematica. I then went on to find the probability generating function for the Riddler Silver 100 card case. The mean is, of course, 49.94457; the exact value being a rational number whose numerator in lowest terms contains 569 digits and whose denominator contains 567 digits. The standard deviation is 11.9989. The mode is 44 and the median is 47. The upper 90%, 95%, 99% and 99.9% points of the cumulative probability distribution are, respectively, 66, 72, 88 and 110. I would have enclosed a plot of the probability density function, but I can’t figure out how to attach an image file to this message.

1. Cool! I didn’t think to compute the other moments of the distribution, but I’d love to see how you went about it! Any chance you could post a write-up?

1. John Snyder says:

If you can send me some kind of link I’ll send you back a PDF with a simple example of how I did it.

1. you can email me at laurent.lessard at gmail dot com.

3. David Kravitz says:

This is how I did it except since the same binoms were being called over and over, and never with a “bottom” term more than 10, I just generated all of them first. This sped things up quite a bit.

NCards = 100

# make B, a 2D array holding B[n][k] for all n between 0 and NCards and for all k between 0 and 10
B = [[0]*11 for i in range(0,NCards+1)]
for n in range(0,NCards+1):
for k in range(0,11):
if(k==n or k==0): B[n][k]=1
else: B[n][k] = B[n-1][k] + B[n-1][k-1]

# now C(n,k) can make use of B very quickly
def C(n,k):
if(kn): return 0
else: return B[n][k]

Like John Snyder, I came up with 47 for 100 cards. And 179 for 300 cards. Those were the numbers for which 50% of the 100 and 300 cards would be found. It’s pretty simple math…here’s the 100 one that I did. The MAX 62815650955529472000 is 100 * 99 * 98 * 97 * 96 * 95 * 94 * 93 * 92 * 91. That’s the number of possible 10 *different* cards to get. I grant that the ten lines of code are a bit longer than usual 🙂

#include
#include
#include
#include

#define TOTAL 100

#define MAX 62815650955529472000.

int
main(int ac, char **av)
{
double n_cards[TOTAL + 1], new_cards[TOTAL + 1];
int i, n, i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, new, count, card_count;
for(i = 0; i <= TOTAL; i++)
n_cards[i] = 0;

n_cards[10] = 1.;
for(n = 0; n_cards[TOTAL] < .5;n++) {
printf("n = %d\n", n);
for(i = 0; i <= TOTAL; i++)
new_cards[i] = 0;
for(i = 0; i <= TOTAL ; i++) {
if(n_cards[i] != 0) {
new_cards[i + 10] += n_cards[i] * (TOTAL – i – 0)*(TOTAL – i – 1)*(TOTAL – i – 2)*(TOTAL – i – 3)*(TOTAL – i – 4)*(TOTAL – i – 5)*(TOTAL – i – 6)*(TOTAL – i – 7)*(TOTAL – i – 8)*(TOTAL – i – 9) / MAX;
new_cards[i + 9] += n_cards[i] * (TOTAL – i – 0)*(TOTAL – i – 1)*(TOTAL – i – 2)*(TOTAL – i – 3)*(TOTAL – i – 4)*(TOTAL – i – 5)*(TOTAL – i – 6)*(TOTAL – i – 7)*(TOTAL – i – 8)*i*(10) / MAX;
new_cards[i + 8] += n_cards[i] * (TOTAL – i – 0)*(TOTAL – i – 1)*(TOTAL – i – 2)*(TOTAL – i – 3)*(TOTAL – i – 4)*(TOTAL – i – 5)*(TOTAL – i – 6)*(TOTAL – i – 7)*i*(i – 1)*(10 * 9 / 2.) / MAX;
new_cards[i + 7] += n_cards[i] * (TOTAL – i – 0)*(TOTAL – i – 1)*(TOTAL – i – 2)*(TOTAL – i – 3)*(TOTAL – i – 4)*(TOTAL – i – 5)*(TOTAL – i – 6)*i*(i – 1)*(i – 2)*(10 * 9 * 8 / 2. / 3.) / MAX;
new_cards[i + 6] += n_cards[i] * (TOTAL – i – 0)*(TOTAL – i – 1)*(TOTAL – i – 2)*(TOTAL – i – 3)*(TOTAL – i – 4)*(TOTAL – i – 5)*i*(i – 1)*(i – 2)*(i – 3)*(10 * 9 * 8 * 7 / 2. / 3. / 4.) / MAX;
new_cards[i + 5] += n_cards[i] * (TOTAL – i – 0)*(TOTAL – i – 1)*(TOTAL – i – 2)*(TOTAL – i – 3)*(TOTAL – i – 4)*i*(i – 1)*(i – 2)*(i – 3)*(i – 4)*(10 * 9 * 8 * 7 * 6 / 2. / 3. / 4. / 5.) / MAX;
new_cards[i + 4] += n_cards[i] * (TOTAL – i – 0)*(TOTAL – i – 1)*(TOTAL – i – 2)*(TOTAL – i – 3)*i*(i – 1)*(i – 2)*(i – 3)*(i – 4)*(i – 5)*(10 * 9 * 8 * 7 * 6 * 5 / 2. / 3. / 4. / 5. / 6.) / MAX;
new_cards[i + 3] += n_cards[i] * (TOTAL – i – 0)*(TOTAL – i – 1)*(TOTAL – i – 2)*i*(i – 1)*(i – 2)*(i – 3)*(i – 4)*(i – 5)*(i – 6)*(10 * 9 * 8 * 7 * 6 * 5 * 4 / 2. / 3. / 4. / 5. / 6. / 7.) / MAX;
new_cards[i + 2] += n_cards[i] * (TOTAL – i – 0)*(TOTAL – i – 1)*i*(i – 1)*(i – 2)*(i – 3)*(i – 4)*(i – 5)*(i – 6)*(i – 7)*(10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 / 2. / 3. / 4. / 5. / 6. / 7. / 8.) / MAX;
new_cards[i + 1] += n_cards[i] * (TOTAL – i – 0)*i*(i – 1)*(i – 2)*(i – 3)*(i – 4)*(i – 5)*(i – 6)*(i – 7)*(i – 8)*(10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 / 2. / 3. / 4. / 5. / 6. / 7. / 8. / 9.) / MAX;
new_cards[i + 0] += n_cards[i] * i*(i – 1)*(i – 2)*(i – 3)*(i – 4)*(i – 5)*(i – 6)*(i – 7)*(i – 8)*(i – 9)*(10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1 / 2. / 3. / 4. / 5. / 6. / 7. / 8. / 9. /10.) / MAX;
}
}
for(i = 0; i .000001)
printf(” %3d: %12.10lf\n”, i, n_cards[i]);
}
}

printf(“n = %d\n”, n);
}

5. Jimmy Waters says:

I like your solution and explanation! I solved it a similar way (and used Julia as well). I was a little disappointed that most of the solutions I saw simply simulated the process over and over and averaged the results.

But – I don’t think this answer is *quite* right! I know it was the accepted solution, and the calculated expected packs are definitely correct, but the question asked for the expected number of weeks, which wouldn’t just be packs/10, since the packs are not distributed evenly throughout the week (at least that was not my interpretation – it seems much more reasonable that he is buying all ten at once one time per week, rather than one every 16.8 hours). For example if the packs happened to be distributed so that 25% of the time it took 49, 50% of the time it took 50, and 25% of the time it took 51, the expected number of packs would be 50, but the expected number of weeks would be .25*5 + .5*5 + .25*6 = 5.25, since packs 49 and 50 occur at week 5, but pack 51 occurs at week 6.

6. D says:

by the way, to complement your final analysis, it occurs to me some basic error bounds are useful.

The underlying idea relates to stopping trials (or overshoots in random walks).

Suppose we model this as having a draw of one card at a time but in fact get $m$ in a pack. Then for any sample path, once we’ve recovered an entire collection, we stop collecting more cards and we have at a minimum of an overshoot of 0 and at a maximum an overshoot of $m-1$ cards (i.e. all the cards in the pack are unused except the ‘first’ one).

so in terms of numbers of cards

$\text{single card solution} \leq \text{expected cards for actual problem} \leq \text{single card solution} + (m-1)\leq \text{single card solution} +m$

and dividing by $m$ cards per pack we can change the units to get
======================

$\text{(single cards solution divided by m) }\leq \text{expected actual solution in terms of packs} \leq \text{(single cards solution divided by m)} + 1$

======================
edit: Hah! I had this idea a week or two after looking at the problem and I forgot that there are no duplicates in the packs… in effect dealing $m$ cards at a time for a given draw / pack without replacement. The math of course is easier with replacement, and there are some standard hypergeometric bounds we could use, but anyway the above idea is a bit incomplete I guess.