This Riddler puzzle is about a topic near and dear to many hearts: Pokémon!
Your neighborhood park is full of Pokéstops — places where you can restock on Pokéballs to, yes, catch more Pokémon! You are at one of them right now and want to visit them all. The Pokéstops are located at points whose (x, y) coordinates are integers on a fixed coordinate system in the park.
For any given pair of Pokéstops in your park, it is possible to walk from one to the other along a path that always goes from one Pokéstop to another Pokéstop adjacent to it. (Two Pokéstops are considered adjacent if they are at points that are exactly 1 unit apart. For example, Pokéstops at (3, 4) and (4, 4) would be considered adjacent.)
You’re an ambitious and efficient Pokémon trainer, who is also a bit of a homebody: You wish to visit each Pokéstop and return to where you started, while traveling the shortest possible total distance. In this open park, it is possible to walk in a straight line from any point to any other point — you’re not confined to the coordinate system’s grid. It turns out that this is a really hard problem, so you seek an approximate solution.
If there are N Pokéstops in total, find the upper and lower bounds on the total length of the optimal walk. (Your objective is to find bounds whose ratio is as close to 1 as possible.)
Advanced extra credit: For solvers who prefer a numerical question with this theme, suppose that the Pokéstops are located at every point with coordinates (x, y), where x and y are relatively prime positive integers less than or equal to 1,000. Find upper and lower bounds for the length of the optimal walk, again seeking bounds whose ratio is as close to 1 as possible.
The problem of visiting a set of locations while minimizing total distance traveled is known as a Traveling Salesman Problem (TSP), and it is indeed a famous and notoriously difficult problem in computer science. That being said, bounding the solution to a particular TSP instance can be easy if we take advantage of its structure.
Here is my solution to the first part:
[Show Solution]
To clarify, we know there are $N$ Pokéstops but not how they are arranged. The question asks us to find the smallest and largest possible value for the minimum tour that visits each Pokéstop and returns to the starting point.
Upper bound
The worst-case arrangement of Pokéstops is to have them lie in a line, which leads to a path length of $2(N-1)$. Proving that this is indeed the worst possible arrangements of stops is a bit more involved, so read on if you’re interested.
Let $T$ be a minimum spanning tree for the set of Pokéstops. Any tree with $N$ nodes has $N-1$ edges, so $T$ has $N-1$ edges. It also follows from the adjacency property of Pokéstops that each edge of $T$ must have length $1$. One way to visit all the nodes is by using a depth-first tree traversal:
- Imagine all edges of $T$ are initially colored black.
- Pick a node at random to be the current node.
- Find a black edge emanating from the current node. Paint that edge blue and follow it to the next node, which becomes the new current node. If there are no more black edges, follow a blue edge instead and paint it red.
- Repeat until all edges are painted red.
This process clearly visits all the nodes and visits each edge exactly twice, for a total tour length of $2(N-1)$. This construction works for any arrangement of Pokéstops. Of course this is merely an upper bound; the optimal path may be shorter. By arranging the $N$ stops in a line, we attain our upper bound of $2(N-1)$, so it must be a tight bound.
Lower bound
Since the Pokéstops lie at integer coordinates, the shortest possible distance from one to the next is $1$. So a path visiting $N$ Pokéstops and then returning to the start has length at least $N$. Does there exist an arrangement of Pokéstops that achieves this minimum? If $N$ is even, the answer is yes; arrange the stops in a $2\times N$ grid:
If the number of stops is odd, however, it’s impossible to arrange them in a way that creates a tour of length $N$. To see why, imagine that the stops are on a checkerboard. Each minimum-length move takes you from black square to a white square or vice versa. Therefore, we can’t return to where we started after an odd number of moves. The second-shortest move has length $\sqrt{2}$, so the best we can do is:
Therefore, a lower bound for the length of the optimal walk is given by:
\[
\text{Lower Bound} = \begin{cases}
0 & \text{if }N = 1 \\
N & \text{if }N\text{ is even} \\
N-1+\sqrt{2} & \text{if }N\text{ is odd and }N \ge 3
\end{cases}
\]
Putting it all together
So in the end, the smallest possible ratio of bounds assuming there are at least two Pokéstops is given by:
$\displaystyle
\text{Bound Ratio} = \begin{cases}
2-\tfrac{2}{N} & \text{if }N\text{ is even} \\
2-\tfrac{2\sqrt{2}}{N-1+\sqrt{2}} & \text{if }N\text{ is odd}
\end{cases}
$
The smallest bound ratio that works for all $N$ is $2$, so this is the ratio we should use if we don’t know how many Pokéstops there are.
Here is my solution to the second part:
[Show Solution]
For the second part of the Riddle, we are given a very specific arrangement of Pokéstops: the set of points $1 \le i,j \le M$ such that $\mathrm{gcd}(i,j) = 1$ (greatest common divisor is $1$). The problem asks for $M=1000$, but we’ll keep $M$ as a parameter for now.
Upper bounds
Any tour that visits all the Pokéstop and returns to the starting point gives us an upper bound. With this many nodes, it’s impossible to check every tour, so we’ll compare different heuristics for choosing admissible paths and see how they compare.
Trivial upper bound: An obvious way to hit all the stops is to simply visit every point with integer coordinates in the $M\times M$ square. We know how to do this optimally from the first part of the problem; the optimal tour has length $M^2$ if $M$ is even and $M^2+\sqrt{2}-1$ if $M$ is odd. See the illustration below for example constructions:
So the trivial bound for the case of interest is 1,000,000. We can definitely do better!
Sierpiński bound: One way to find a reasonable tour is to use a space-filling curve, which maps every point in the square to a point on a curve. This gives us a way to order the stops based on where they lie on the curve. We then visit the stops in that order. This heuristic was pioneered by Prof. John Bartholdi III and others, and does very well in many cases.
The Sierpiński curve is often used here, and it’s what I used to find an upper bound. Here are some examples of solutions for different $M$ values.
These paths have lengths 313.9 and 1828.8, respectively. Using this method, our upper bound improves to 815,234 for the $M=1000$ case, and this was computed in 0.35 seconds.
Columnwise bound: The Sierpiński approach takes advantage of the pseudouniform distribution of points, but it doesn’t leverage the fact that points often appear in lines. For example, the rows and columns corresponding to prime coordinates are always full. Another heuristic, suggested by Diarmuid Early in a comment, is to proceed in a lawnmower-like pattern, but two columns at a time. So as we move up or down a pair of columns, we pick the next nearest point from the pair of available points at each step. Here are examples for the same grid sizes.
These paths are slightly shorter than the Sierpiński ones, with lengths of 308.3 and 1721.8, respectively. Using this method, our upper bound improves to 735,394 for the $M=1000$ case, and was computed in 2 seconds.
More on upper bounds: Other notable heuristics for upper-bounding TSP problems include the nearest neighbor algorithm and the Christofides algorithm. These don’t work well for our instance because they require computing all pairwise distances, a prohibitively expensive task when we have roughly 600,000 Pokéstops. The Sierpiński and columnwise algorithms have complexity $\mathcal{O}(n)$, where $n$ is the number of stops, whereas computing pairwise distances is $\mathcal{O}(n^2)$. This is a significant difference because $n$ is so large — if an $\mathcal{O}(n)$ algorithm takes 1 second then $\mathcal{O}(n^2)$ will take roughly 7 days!
Lower bounds
Trivial lower bound: One possibility is to count how many total stops there are, and assume we can find a path consisting entirely of adjacent stops. This isn’t possible of course, but it yields a lower bound. It’s relatively easy to write computer code to count this quantity, or we can use the fact that the density of stops in the $M \times M$ square approaches $6/\pi^2 \approx 0.6079$ as $M\to\infty$ (source). For the case $M=1000$, the lower bound turns out to be 608,383. Lots of room for improvement!
Averaging bound: Another observation, again courtesy of Diarmuid Early, is that each Pokéstop has two edges emanating from it. So if we assume each stop is connected to its two nearest stops, we will get another lower bound on the shortest possible path. Computing this quantity for the case $M=1000$, we obtain a much-improved 643,811.
Held-Karp relaxation: This final bounding method is significantly more advanced. Finding a set of edges (i.e. assigning each edge a label 0 or 1) that yields a valid tour requires two things:
- each node has exactly two incident edges labeled 1
- there are no disjoint cycles (it’s a connected path)
What makes this problem so hard is the second constraint. There are exponentially many cycles, so it’s not feasible to rule each one out. If we relax the problem by removing the disjoint cycles constraint, and allow the label on each edge to be any number between 0 and 1, the problem turns into a linear program (LP), which can be solved very efficiently. Successive refinements can then be made to improve the solution by ruling out particular cycles. This is known as the Held-Karp LP relaxation. Unfortunately, this still requires computing all pairwise distances, which we already decided was too costly (see above). The LP also has $\mathcal{O}(n^2)$ variables, since we need a variable for each edge. So I made two simplifying assumptions:
- Each node may only be connected to one of its $k$ nearest neighbors. This is a relaxation to the GTSP, (traveling salesmsan problem on a graph). This means the number of variables, constraints, and distances we need to compute are now $\mathcal{O}(kn)$.
- The final path will be symmetric ($x=y$). This allows us to use half as many variables and constraints and greatly speeds up computation.
To find an appropriate $k$, I solved the relaxed LP with $k$ values of 4, 8, 12, 16, 20. The solutions were 695251, 694956, 694954, 694952, 694952. So it seems there is no use in adding any more edges. For those that are interested, I implemented the solution in Julia using JuMP with Gurobi. On my laptop, the LPs took 30–90 seconds to solve.
Putting everything together
Here is a plot comparing all the bounds above.
So ultimately, our best bound for the case $M=1000$ is
\[
694952 \le L^\star \le 735394
\]
for an approximation ratio of about $1.06$.
It’s not easy to evaluate this without a computer, but one way to get a higher lower bound is to sum over all pairs the average of the distances to its two nearest neighbors (since every point needs to be connected to 2 other distinct points). I don’t have the exact number here, but that got me to about 709k.
For the upper bound, any actual path will do. One example that comes fairly close to the LB above is to start at (1,1), go up the first column, then going down and up pairs of alternating columns (e.g. down cols 2/3, up 4/5, down 6/7, etc, then end by going down 998/999/1000 together) excluding the bottom row, then finally go back along the bottom row. Because many of the columns are fairly sparse, going up and down in pairs takes a lot less distance than going up and down each one (but groups of 3 or more columns seem to give too big a trade-off with extra over-and-back horizontally). That got me to an upper bound of about 735k.
I can’t think of any way to get further than that without using a lot more computing power – but will be interested to see what you come up with! Thanks for another good solution.
(Also, as a small niggle, I think you’ve got the bounding inequality for L*(M) the wrong way around!)
Very nice! My upper bound isn’t as good as yours but I think my lower bound is better. I’m still working on the write-up. Should be done soon. Would you mind checking your numbers again? When I tried your suggestion of averaging the two nearest neighbors for each node, I got 643,811 instead of 709k. I’ll include both your solution and mine in the write-up, since that will give the best possible approximation ratio.
Ah – what a shame… I thought the bounds seemed surprisingly close together, I guess I should have realized it was too close to be true! I don’t have the original file I did it in to hand, but yes, when I reran it, I got your value for the LB (643,811.507). Thanks for double-checking!
Thanks for doing all this work! Are the curves straight if you use M^2 as the axis? I would guess that they are apart from a little steepening for M under 100 where coprimes get a bit more dense.
The optimum for M=50 is 1773.05, which can be computed in a few seconds with MAOS:
https://github.com/wiomax/MAOS-TSP
So a reasonable bet is that the optimium for 1000 is a little less than 400 times that (709220). A little less because the coprimes are about 1% more dense for M=50 than for M=1000.
My computer runs out of its 8G of memory by M=100, but it wouldn’t be too hard to compute and splice together 400 50×50 paths (or maybe 100 100×100 ones with a roomier computer) to get a rather tight upper bound.
Your intuition is correct about the shape of the curves. All the curves look like scaled versions of $M^2$.