{"id":2437,"date":"2018-09-08T00:44:22","date_gmt":"2018-09-08T05:44:22","guid":{"rendered":"https:\/\/laurentlessard.com\/bookproofs\/?p=2437"},"modified":"2018-09-08T13:46:48","modified_gmt":"2018-09-08T18:46:48","slug":"card-collection-completion","status":"publish","type":"post","link":"https:\/\/laurentlessard.com\/bookproofs\/card-collection-completion\/","title":{"rendered":"Card collection completion"},"content":{"rendered":"<p>This <a href=\"https:\/\/fivethirtyeight.com\/features\/id-like-to-use-my-riddler-lifeline\/\">Riddler puzzle<\/a> is a classic probability problem: how long can one expect to wait until the entire set of cards is collected?<\/p>\n<blockquote><p>\nMy son recently started collecting Riddler League football cards and informed me that he planned on acquiring every card in the set. It made me wonder, naturally, how much of his allowance he would have to spend in order to achieve his goal. His favorite set of cards is Riddler Silver; a set consisting of 100 cards, numbered 1 to 100. The cards are only sold in packs containing 10 random cards, without duplicates, with every card number having an equal chance of being in a pack.<\/p>\n<p>Each pack can be purchased for \\$1. If his allowance is \\$10 a week, how long would we expect it to take before he has the entire set?<\/p>\n<p>What if he decides to collect the more expansive Riddler Gold set, which has 300 different cards?\n<\/p><\/blockquote>\n<p>Here is my solution:<br \/>\n<a href=\"javascript:Solution('soln_couponcollector','toggle_couponcollector')\" id=\"toggle_couponcollector\">[Show Solution]<\/a><\/p>\n<div id=\"soln_couponcollector\", style=\"display: none\">\n<p>This problem is a twist on the classical <a href=\"https:\/\/en.wikipedia.org\/wiki\/Coupon_collector%27s_problem\">Coupon Collector Problem<\/a>. In this problem, there are $n$ different coupons in a basket. For one dollar, we get to pick a coupon at random from the basket. We then return the coupon to the basket. We keep doing this until we&#8217;ve seen each of the $n$ different coupons at least once. How many dollars will we spend on average?<\/p>\n<h3>Simpler version: packs of one<\/h3>\n<p>Let&#8217;s start by solving the standard coupon collector problem. First, we&#8217;ll need the following fact: if a coin comes up Heads with probability $p$, then we have to flip it on average $1\/p$ times before we obtain our first Heads. To see why, let $x$ be the expected number of flips. If the first flip is Heads, (probability $p$), then we have performed 1 flip and we stop. If the first flip is Tails, (probability $1-p$), then we have performed 1 flip but we will need to perform $x$ more on average. Mathematically, this amounts to:<br \/>\n\\[<br \/>\nx = p\\cdot 1 + (1-p)\\cdot (1+x)<br \/>\n\\]Solving for $x$ yields $x=1\/p$. You can also solve this using recursion, as in my solution to the <a href=\"https:\/\/laurentlessard.com\/bookproofs\/monsters-gems\/\">Monsters&#8217; gems puzzle<\/a>.<\/p>\n<p>We can think of the coupon collection problem by asking how many draws are required before we pick our next <em>new<\/em> coupon. Our first coupon is always new (probability 1) and will take 1 draw. The probability of picking our next new coupon is $\\tfrac{n-1}{n}$ since any of the $n-1$ remaining coupons will do, and this will take on average $\\tfrac{n}{n-1}$ draws. Next, our probability is $\\tfrac{n-2}{n}$, which takes on average $\\tfrac{n}{n-2}$ draws, and so on. Therefore, the expected number of draws $C_n$ in order to pick all $n$ coupons is:<br \/>\n\\begin{align}<br \/>\nC_n<br \/>\n&#038;= \\frac{n}{n} + \\frac{n}{n-1} + \\frac{n}{n-2} + \\cdots + \\frac{n}{1} \\\\<br \/>\n&#038;= n \\left( 1 + \\frac{1}{2} + \\cdots + \\frac{1}{n} \\right) \\\\<br \/>\n&#038;\\approx n (\\log n + \\gamma)<br \/>\n\\end{align}Where $\\gamma \\approx 0.5772$ is the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Euler%E2%80%93Mascheroni_constant\">Euler-Mascheroni constant<\/a> and the approximation becomes exact as $n\\to\\infty$. This is a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Harmonic_series_(mathematics)#Partial_sums\">harmonic sum<\/a> and it has surfaced in several past problems (<a href=\"https:\/\/laurentlessard.com\/bookproofs\/should-the-grizzly-eat-the-salmon\/\">bears<\/a>, <a href=\"https:\/\/laurentlessard.com\/bookproofs\/how-many-bananas-can-the-camel-carry\/\">camels<\/a>, <a href=\"https:\/\/laurentlessard.com\/bookproofs\/where-will-the-seven-dwarfs-sleep-tonight\/\">dwarfs<\/a>).<\/p>\n<h3>Full version: multi-packs<\/h3>\n<p>Let&#8217;s now assume that we draw $m$ coupons at a time from the basket, which contains $n$ distinct coupons. This is precisely the original problem (with $m=10$ and $n=100$). Let&#8217;s call $X_k$ the expected number of additional draws required to obtain all $n$ coupons given that we have already collected $k$ coupons. We could get lucky and obtain several new coupons in our draw, or we could strike out and receive all duplicates. In general, there are $\\binom{k}{m-i}\\binom{n-k}{i}$ ways to obtain $i$ new coupons (out of $\\binom{n}{m}$ total possible draws), since we can choose $i$ out of the $n-k$ possible new coupons remaining and $m-i$ out of the $k$ already-seen coupons. Upon drawing our $i$ new coupons, we will have to make $X_{k+i}$ further draws. We therefore have the recursion:<br \/>\n\\[<br \/>\nX_k = 1 + \\sum_{i=0}^m \\frac{\\binom{k}{m-i}\\binom{n-k}{i}}{\\binom{n}{m}} X_{k+i}<br \/>\n\\]Noting that $X_k$ occurs on both sides, we can simplify and obtain:<br \/>\n\\[<br \/>\nX_k = \\frac{1}{\\binom{n}{m}-\\binom{k}{m}}\\left( \\binom{n}{m} + \\sum_{i=1}^m \\binom{k}{m-i}\\binom{n-k}{i} X_{k+i} \\right)<br \/>\n\\]Now each $X_k$ depends on $X_{k+1},X_{k+2},\\dots$, and we have the terminal condition $X_n = 0$. So we can work backwards, first evaluating $X_{n-1}$, then $X_{n-2}$, and so on until we arrive at $X_0$, which is the final answer we seek. As a side note, the fact that the probabilities above sum to $1$ is a consequence of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Vandermonde%27s_identity\">Vandermonde&#8217;s Identity<\/a>, which I discussed in my post on <a href=\"https:\/\/laurentlessard.com\/bookproofs\/double-counting-part-2\/\">double counting<\/a>.<\/p>\n<p>It doesn&#8217;t appear that there is a nice closed-form expression for $X_0$, however it&#8217;s straightforward to do this numerically, as long as care is taken with the binomials causing integer overflow when $n$ and $m$ get large. We can also approximate the solution decently well: the $m=1$ case (one card per pack) requires approximately $n(\\log(n)+\\gamma)$ packs to collect all cards, so one might expect that with $m$ cards per pack, one could collect all cards roughly $m$ times faster. This turns out to be a good approximation!<\/p>\n<p>In the plot below, I computed the exact expectation and also showed the approximation $\\tfrac{n}{m}(\\log(n)+\\gamma)$.<\/p>\n<p><a href=\"https:\/\/laurentlessard.com\/bookproofs\/wp-content\/uploads\/2018\/09\/card_collection.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/laurentlessard.com\/bookproofs\/wp-content\/uploads\/2018\/09\/card_collection.png\" alt=\"\" width=\"640\" height=\"480\" class=\"aligncenter size-full wp-image-2440\" srcset=\"https:\/\/laurentlessard.com\/bookproofs\/wp-content\/uploads\/2018\/09\/card_collection.png 640w, https:\/\/laurentlessard.com\/bookproofs\/wp-content\/uploads\/2018\/09\/card_collection-300x225.png 300w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px\" \/><\/a><\/p>\n<h3>Numerical solutions<\/h3>\n<p>The original problem statement asked about the case of $m=10$ cards per pack, with either $n=100$ or $n=300$ total different cards. Here is <a href=\"https:\/\/julialang.org\/\">Julia<\/a> code that evaluates the solution rather quickly:<\/p>\n<pre>\r\n# Julia 0.6.4\r\n# define binomial for large values of n (use infinite precision)\r\nbinom(n,m) = binomial(big(n),m)\r\nfunction expected_draws(n,m)\r\n    X = zeros(n+m+1)\r\n    for k = n-1:-1:0\r\n        q0 = binom(n,m)\/(binom(n,m)-binom(k,m))\r\n        q = [ binom(k,m-i)*binom(n-k,i)\/(binom(n,m)-binom(k,m)) for i in 1:m ]\r\n        X[k+1] = q0 + sum(q .* X[k+2:k+1+m])\r\n    end\r\n    X[1]\r\nend\r\nprintln(\"100 cards, packs of 10: Number of packs = \", expected_draws(100,10))\r\nprintln(\"300 cards, packs of 10: Number of draws = \", expected_draws(300,10))\r\n<\/pre>\n<p>and the output of the above code is:<\/p>\n<pre>\r\n100 cards, packs of 10: Number of packs = 49.94456605666412\r\n300 cards, packs of 10: Number of draws = 186.0851712198894\r\n<\/pre>\n<p>Therefore, it requires about 50 packs (5 weeks allowance) on average to collect all cards in the 100-set, and about 186 packs (18.6 weeks allowance) for the 300-card set. If we use the approximation $\\tfrac{n}{m}(\\log(n)+\\gamma)$, we actually come pretty close to the exact answer:<\/p>\n<pre>\r\n100\/10 * (log(100) + \u03b3) = 51.8238585088962\r\n300\/10 * (log(300) + \u03b3) = 188.429944186732\r\n<\/pre>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>This Riddler puzzle is a classic probability problem: how long can one expect to wait until the entire set of cards is collected? My son recently started collecting Riddler League football cards and informed me that he planned on acquiring every card in the set. It made me wonder, naturally, how much of his allowance &hellip; <a href=\"https:\/\/laurentlessard.com\/bookproofs\/card-collection-completion\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Card collection completion&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":2440,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[7],"tags":[29,18,8,2],"class_list":["post-2437","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-riddler","tag-classics","tag-linearity-of-expectation","tag-probability","tag-riddler"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/laurentlessard.com\/bookproofs\/wp-json\/wp\/v2\/posts\/2437","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/laurentlessard.com\/bookproofs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/laurentlessard.com\/bookproofs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/laurentlessard.com\/bookproofs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/laurentlessard.com\/bookproofs\/wp-json\/wp\/v2\/comments?post=2437"}],"version-history":[{"count":4,"href":"https:\/\/laurentlessard.com\/bookproofs\/wp-json\/wp\/v2\/posts\/2437\/revisions"}],"predecessor-version":[{"id":2443,"href":"https:\/\/laurentlessard.com\/bookproofs\/wp-json\/wp\/v2\/posts\/2437\/revisions\/2443"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/laurentlessard.com\/bookproofs\/wp-json\/wp\/v2\/media\/2440"}],"wp:attachment":[{"href":"https:\/\/laurentlessard.com\/bookproofs\/wp-json\/wp\/v2\/media?parent=2437"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/laurentlessard.com\/bookproofs\/wp-json\/wp\/v2\/categories?post=2437"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/laurentlessard.com\/bookproofs\/wp-json\/wp\/v2\/tags?post=2437"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}