Sniff out the spies

This interesting problem appeared on the Riddler blog. Here it goes:

There are N agents and K of them are spies. Your job is to identify all the spies. You can send a given number of agents to a “retreat” on a remote island. If all K spies are present at the retreat, they will meet to strategize. If even one spy is missing, this spy meeting will not take place. The only information you get from a retreat is whether or not the spy meeting happened. You can send as many agents as you like to the retreat, and the retreat can happen as many times as needed. You know the values of N and K.

What’s the minimum number of retreats needed to guarantee you can identify all K spies? If each retreat costs \$1,000 per person, what is the total cost to identify all K spies?

To begin with, let’s assume you know that N = 1,024 and K = 17.

Here is my solution for $K=1$:
[Show Solution]

And here is a partial solution for $K \gt 1$:
[Show Solution]

23 thoughts on “Sniff out the spies”

  1. Let’s assume we try groups of size 2^m to start and test all N/2^m non-intersecting groups, replacing ones that have spies on the next retreat, and discarding those that don’t. Worst case we have K groups left of size 2^m after this round. If we divide the group size in half each round, until size =1, we will have at most m+1 total rounds and the maximum total number of retreats will be R=2*k*m +N/m^2 -1. We can optimize over m and find that m=ln((Nln(2)+1)/2K)/ln(2). For N=1024, k=17, m=4 (rounded) (initial group size is 16, 64 groups) and we see a testing program like:

    Total Groups Members Left
    1024 64 16 272
    272 34 8 136
    136 34 4 68
    68 34 2 34
    34 34 1 17

    $1,533,000.00 199

    Where we subtract one from total number of tests for last round where group size is one.

    1. Perhaps I’m misunderstanding, but I don’t think this strategy works. You don’t get to know whether there is spy at the retreat or not — the only information you get is whether all spies were at the retreat or not. So when no meeting takes place at the retreat, you don’t actually know whether your retreat group had any spies in it or not. It could have $0,1,\dots,k-1$ spies and all would give the same answer of “no meeting”.

      1. So you start with all N. Then you systematically extract groups and hold a meeting opportunity. If the meeting takes place you discard the extracted group for the next retreat and move on to extracting the next group, if the meeting does not take place you replace the extracted group and move on to extracting the next group.

  2. The reason you do this is exactly as you say. You can only get information by extracting groups and seeing if not all spies are there. If there are spies in your test group you need to replace them to test the next group.

    1. Ah I see! I was confused because you wrote about “testing” a group of size 2^m. So what you mean by this is that you leave this group home and send everyone else to the retreat? If so then I think I follow. I’ll think about this some more and update my solution (w credit) if I can work it out!

  3. Jim is correct that Laurent’s answer is too high, and Laurent is correct that Jim’s answer is too low. Reason is that with Jim’s scenario, he is running 5 sets of retreats but only counting each participant once per retreat, when really you need at least K people at each retreat and thus need to keep re-sending people within each set.

    My answer came in at 56.08 million dollars. I was wondering if I could get it even lower. Here’s how I got it: Like Jim, I’m running multiple sets of retreats, with each set of retreats having a goal of clearing some number of suspects. Each set has a number of retreats R = ceiling(N/X), where X is the number of people excluded and N is the total number of suspects. In the worst case, each of the first K-1 retreats has exactly one spy in the excluded group, the next R-K retreats feature a meeting of spies (allowing me to clear the excluded group and leave them out of all subsequent retreats), and the last retreat has exactly one spy in the excluded group. This means that I have N-X participants in each of the first K-1 retreats, and N-iK participants in each of the subsequent retreats (except the last one), where i increases from 1 to R – (K-1). For the last retreat of the set, I’m probably excluding less than X (because N/R is probably not an integer). Instead I’m excluding x’, so I have (N-(R-(K-1))X)-x’ participants. At the end of a set, I have cleared N-((K-1)X+x’) people at a cost of $1,000 per aggregate number of retreat participants. I can add up these terms for decreasing values of X until I find the X that yields the minimal cost per cleared person.

    For N = 1024, I found X=26 and R=40 for a cost (33134) per clear (598) of 55.4. I now only have 426 suspects. I repeat with X=10 to get to 166 suspects, X=4 to get to 66 suspects, and X=2 to get to 34 suspects. At 34 suspects, I run 34 retreats clearing one person at a time. Total cost is 33134 + 14528 + 5564 + 1870 + 984 = 56,080 retreat participants or 56,080,000 dollars.

    I minimized X for each set of retreats. My next step would be to see if slightly less efficient values of X would lead to a lower over-all cost, but since the submit deadline has past, I guess I’ll stop here. Curious to hear your thoughts on this approach.

    1. Re-reading my post, I noticed a couple typographical errors. The phrase “only counting each participant once per retreat” should have read “only counting each participant once per set of retreats.” The statement “N-iK participants in each of the subsequent retreats” should have read “N-iX participants in each of the subsequent retreats”

      I also should have been clear that the strategy was to maximize the number of people cleared of being a spy at the lowest per-clearance cost. I wasn’t sure how to mathematically do this minimization, but the numbers were small, so I wrote a snippet of code to run all the possible integer values and return the values that produced the lowest per-clearance cost.

      The number of retreats in each of the phases are 40, 43, 42, 33, and 34. The phases clear 598, 260, 100, 32, and 17 people, respectively. Because each retreat within a phase would omit people cleared in earlier retreats in the phase, the number of participants per retreat goes down within the phase as I described above. The total number of retreat participants in each phase is 33134, 14528, 5564, 1870, and 984. Thus the cost is 56,080,000 dollars.

      I used a greedy algorithm that assumed minimizing the cost of each phase would minimize the total cost. I have not proven that assumption to be correct, and I’m not entirely sure that it is. The question I didn’t resolve is whether paying more to clear a different number of people earlier would pay off with a lower cost to clear people later. For example, I could have cleared 644 people using 45 retreats in the first phase at the cost of paying for 36052 retreat participants. If I did that, and then used my original scheme for the remaining 380, my cost goes up to 56,309. But maybe there’s some more optimization that would be better?

  4. Hi – same kind of idea. Your group size / group number is just a bit non integral. So you have 26/10/4/2/1 members (I have 16/8/4/2/1, really picked just cause the numbers are integers) and your total number of retreats I see as ~193 vs 199 (maybe). Not sure if you work out the non-integer stuff what happens but really same idea! I think your 50mm is not right? I see yours as 1.737mm.


  5. If you look at my optimal expression up top:


    It comes out to m=~4.4, which would be an initial group size of 21.5. I just rounded it to 4 and picked 16 because the numbers were easy!


    1. I see # of retreats with 16/8/4/2/1 as 199 and with non-integral or different (group) numbers of 21/10/4/2/1 as 194 ish.

  6. Optimizing just over intial size of groups (see above):


    For N=1024, K = 17 gives m = 4.42, initial group size = 21.5. I just picked round numbers….

  7. Thank you all for the insightful comments! I updated my solution to reflect the improved strategy of adaptively choosing $m$ at every step. I found a minimum of 191 retreats at a cost of \$65.6 million.
    I used exact integers when computing the number of retreats, so that number should be exact, though I used an approximation to compute the cost. Some of the comments note that we can bring the cost lower… This is true, but only at the cost of having more retreats. The problem asks about minimizing retreats, not minimizing dollars. I tried minimizing dollars instead and I found 229 retreats at a cost of \$57.17 million (again using approximate cost in the minimization).

    I updated my solution — comments welcome!

    1. I find Julia to have more intuitive syntax (similar to Matlab), yet it’s just as fast as Python — often much faster. It’s also compatible with Jupyter notebooks, which I really like.

      The main downside to using Julia, in my opinion, is that it’s still a very new language. So it doesn’t have any of the nice things that Matlab does, like a debugger or a profiler, or a decent IDE… The language is also constantly changing and there isn’t much in the way of online resources to get help.

  8. Laurent, I think you can do slightly better by noticing that for each round with m groups you only need to organize at most m-1 retreats. The the final retreat m is unnecessary since by that time if you have not yet learned about the Kth spy meeting in that round you will automatically know that it would occur without actually holding the final retreat. So in the given example with N=1024 and K=17 the minimum number of retreats necessary to guarantee you can identify all the spies is 37+50+50+50=187.

    1. I’m not so sure about that. When you keep a group home and send everybody else on a retreat, if there is a meeting, you can eliminate the group that stayed home, since it can’t contain a spy. If there is no meeting, then you know the group that stayed home has at least one spy in it, so you keep it around.

      It’s true that if “no meeting” occurs k times, then you can stop and immediately go to the next round (and pick a new m), since you’ve definitely identified k groups with at least one spy in each. Since there are only k spies, that means each group contains exactly one spy. But what if one of your groups has two spies in it? Then there will be fewer than k retreats with “no meeting”! In fact, if after m-1 retreats you find that “no meeting” occurred fewer than k times, you still can’t conclude anything about the final retreat because you don’t know how many spies are in each of your groups.

      1. I still don’t think you should ever hold the final m’th retreat in any round.

        This is easy to see in the last round where, since there is only one agent per “group”, if you have not identified the K’th agent after the (m-1)th retreat in that round then you automatically know the final agent/m’th group is a spy.

        Similarly, as you suggest assume there are two spies in one of the groups. After the (m-1)th retreat in that round it is correct that you still can’t conclude whether or not there is a spy in the last group. However, irrespective of this the optimal strategy is to keep the final group and refrain from holding the m’th retreat in that round.

        In preferable configurations where, for example, you have identified (K-2) groups containing agents with still two groups remaining in the round, follow the same strategy – keep the final two groups and reduce the number of retreats held by two.

        Given the goal of minimizing the number of retreats guaranteeing identification of all the spies even in the worst case configuration – I think the best you can do for certain is reduce by one the number of retreats in each round versus your solution. So for N=1024, K=17 it should never be necessary to hold more than 187 retreats.

        1. You’re absolutely right. There is never a need to do the mth retreat; we can just skip directly to the next round. Interestingly, I obtained 186 retreats with this strategy (not 187). The difference might be because I assumed that we would order our groups such that we save the two smallest groups for last. That way, in the worst case, the $k-1$ largest groups and the $2^\text{nd}$ smallest group graduate to the next round. This makes a small difference in the outcome, since it avoids having to send the $k$ largest groups to the next round.

          I updated my solution (again!) and included a plot comparing the upper and lower bounds. Thanks again for all the help and insight!

  9. My intuition tells me that the optimal number of trips will be at least “a few” above 122.

    Consider the case of N=6 agents, K=2 spies. Six choose two is 15, and 15 < 16, so the theoretical lower bound is 4 retreats. BUT, how many agents will you send on the first retreat? There is no option that guarantees the search space will be reduced to 8 or fewer possible spy pairs. The most information you can guarantee you will gain is ~0.737 bits.

    Even with the much less "blocky" numbers of N=1024,K=17, the most information you can guarantee from the first retreat is ~0.9898 bits. That's a loss of 0.0102 bits, and even if you manage to keep the "waste" to such low levels for each decision, by the back of the envelope that would still tip us over 123 guesses.

Leave a Reply

Your email address will not be published. Required fields are marked *