TL;DR: How TSP can now be solved in time $2^n n^2 / 2^{\Omega(\sqrt{\log n})}$, faster than the classical $O(2^n n^2)$ (pre-print).
[xkcd comic on the Travelling Salesman Problem. Original: https://xkcd.com/399/]
It's summer and you've been thinking of making a round trip through your country. There is one problem: you're still a student, and that ain't cheap. You have two options: (a) just pick some route that visits all the cities you care about, or (b) figure out the shortest possible trip.
Actually, let's do a small poll.
Ok, so if you opted for (a), this blog post might be boring for you 😴. But if you opted for (b), you'll probably enjoy what follows.
The problem of finding the shortest trip - so what you chose in (b) - is also known as the Traveling Salesman Problem (you probably knew this already, otherwise you wouldn't have landed on this post). The "formal" definition would be:
Given a list of your favorite cities and the distances between these, what is the shortest trip so that each city is visited exactly once and you return to your home city?
While there are multiple ways to go about solving this (see here for a nice Medium post), we'll focus on the dynamic programming solution. This is also what you saw in the xkcd at the beginning.
If you're into algorithms / computer science, you've probably seen this in your introductory courses. It's an algorithm devised in 1962 by Bellman, and independently by Held and Karp. The algorithm goes as follows:
You have a dynamic programming (DP) table $\mathrm{dp}$ that is indexed by subsets of the cities and by the last city visited in that subset.
Let's outline the DP-recursion. In the following, let $n$ be the number of cities and $c$ be the $n\times n$ cost matrix - so $c_{i,j}$ is the distance between cities $i$ and $j$. Then,
$\begin{align} \mathrm{dp}(S, k) &= \displaystyle\min_{j \in S \setminus \{k\}}\left(\mathrm{dp}(S \setminus \{k\}, j) + c_{j,k}\right), \quad k \in S \setminus \{1\},\\ \mathrm{dp}(\{1\}, 1) &= 0, \end{align}$
where $S \subseteq [n] := \{1, \ldots, n\}$ is a subset containing the home city $1$. At the end, the final solution is computed as
$\displaystyle\min_{j \neq 1}\left(\mathrm{dp}([n], j) + c_{j,1}\right)$.
A convenient way to compute the $\mathrm{dp}$-table is to fix a cardinality $\ell \in \{2, \ldots, n\}$ and compute the $\mathrm{dp}$-values for all subsets of cardinality $\ell$. In other words, the $\mathrm{dp}$-table is computed layer-wise.
The algorithm runs in time $O(2^n n^2)$ since for each state $(S, k)$ we have to loop over all $j$ in $S \setminus \{k\}$.
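To make this concrete, here is a minimal Python sketch of the layer-wise DP described above (subsets are encoded as bitmasks, and the home city $1$ becomes index $0$). It illustrates the recursion; it is not the proof-of-concept notebook linked at the end.

```python
from itertools import combinations

def held_karp(c):
    """c[i][j] = distance between cities i and j; returns the optimal tour length."""
    n = len(c)
    # dp[(S, k)] = length of the shortest path that starts at city 0 (the home
    # city, called city 1 in the text), visits exactly the cities in the
    # bitmask S, and ends at city k.
    dp = {(1, 0): 0}  # base case: dp({1}, 1) = 0
    for ell in range(2, n + 1):                      # layer-wise, by cardinality
        for rest in combinations(range(1, n), ell - 1):
            S = 1 | sum(1 << city for city in rest)  # every subset contains city 0
            for k in rest:
                prev = S ^ (1 << k)                  # S \ {k}
                dp[(S, k)] = min(dp[(prev, j)] + c[j][k]
                                 for j in range(n) if (prev, j) in dp)
    full = (1 << n) - 1
    return min(dp[(full, j)] + c[j][0] for j in range(1, n))

# A tiny example with 4 cities:
c = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
print(held_karp(c))  # 18, attained e.g. by the tour 1 -> 2 -> 4 -> 3 -> 1
```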
You'd be surprised to learn that, apart from some special TSP instances, this is the best we can do in the general case.
Better said: this was the best we could do. Next, we show how you can actually boost it.
Let's take a step back and contemplate the equation that consumed half of your time in high school:
$C_{i,j} = \displaystyle\sum_{k} A_{i,k} B_{k,j}$.
You probably don't need an explanation of what this stands for. Now, take another look at the DP-recursion for TSP. Then come back to this equation. Do you see a pattern?
"Wait, is the DP-recursion a matrix multiplication?"
Indeed, the shape of the DP-recursion resembles that of a matrix multiplication: the entry $\mathrm{dp}(S \setminus \{k\}, j)$ interacts with the entry $c_{j,k}$. So this is nothing else but a matrix multiplication in the $(\min, +)$ semi-ring. Ok, this notation might raise some questions. We're all used to the $(+, \times)$ ring, where you multiply the entries pairwise and then sum up the products. In the $(\min, +)$ semi-ring, instead of multiplying the entries you add them, and instead of summing the results you take their minimum.
Formally, a matrix multiplication (product) in the $(\min, +)$ semi-ring is defined as:
$C_{i,j} = \displaystyle\min_{k}\:(A_{i,k} + B_{k,j})$.
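In code, a naive implementation of this product might look as follows (a minimal NumPy sketch; the function name `min_plus_product` is my own choice, not notation from the pre-print):

```python
import numpy as np

def min_plus_product(A, B):
    """Naive product in the (min, +) semi-ring: C[i, j] = min_k (A[i, k] + B[k, j])."""
    C = np.full((A.shape[0], B.shape[1]), np.inf)
    for k in range(A.shape[1]):
        # "relax" every entry C[i, j] with the candidate sum A[i, k] + B[k, j]
        C = np.minimum(C, A[:, k, None] + B[None, k, :])
    return C
```

This is still the cubic-time algorithm, just written with one vectorized pass per inner index $k$.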
If you paid attention, you'll have noticed that computing $\mathrm{dp}(S, k)$ is not exactly a matrix product, but rather an inner product in the $(\min, +)$ semi-ring between the DP-row $\mathrm{dp}(S \setminus \{k\}, :)$ and the column $c_{:, k}$ of the cost matrix.
To actually get the matrix product we want, we just need to group the rows of the last DP-layer, $n$ at a time, into square matrices and take their product with the cost matrix. Setting $\mathrm{dp}(S, j) = \infty$ for $j \notin S$ makes the unrestricted minimum over $j$ agree with the one in the recursion, and the entries computed for $k \in S$ are simply discarded. And that's it.
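Here is a minimal sketch of how one layer could be produced from the previous one this way. The helper name `next_layer` and the dictionary representation of a layer are my own choices for illustration; the function reuses the `min_plus_product` sketch from above.

```python
import numpy as np

def next_layer(layer, c):
    """One DP-layer update via batched (min, +) products.

    `layer` maps each bitmask S of cardinality ell - 1 to the row dp(S, :),
    a length-n array with dp(S, j) = inf for j not in S. Returns the
    analogous dictionary for cardinality ell."""
    n = c.shape[0]
    subsets = list(layer)
    new_layer = {}
    for i in range(0, len(subsets), n):               # group the rows, n at a time,
        batch = subsets[i:i + n]                      # into one (<= n) x n matrix
        D = np.stack([layer[S] for S in batch])
        P = min_plus_product(D, c)                    # one (min, +) product per batch
        for row, S in enumerate(batch):
            for k in range(n):
                if not (S >> k) & 1:                  # keep only extensions by k not in S
                    T = S | (1 << k)
                    new_layer.setdefault(T, np.full(n, np.inf))[k] = P[row, k]
    return new_layer
```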
You probably wonder why we need all this. Well, what if I told you that matrix products in the $(\min, +)$ semi-ring can be computed faster than in the naive $O(n^3)$ time?
So we just saw that the standard Bellman-Held-Karp algorithm is actually doing the min-plus matrix product between the previous layer of the DP-table and the cost matrix in a very naive manner.
You may ask: but does this change the time complexity? At first sight, the new perspective just computes each DP-layer in a different manner. Let's count, though: there are $2^n$ subsets in total across all layers, and we process their rows in groups of $n$, i.e., with roughly $2^n / n$ matrix products. So if a min-plus product of two $n\times n$ matrices takes $T(n)$ time, our new algorithm takes time
$\displaystyle O\left(\frac{2^n}{n}\, T(n)\right)$.
If we use the naive algorithm for the min-plus matrix product, that is $T(n) = O(n^3)$, we're still in the regime of $O(2^n n^2)$, since $\frac{2^n}{n} \cdot n^3 = 2^n n^2$. So the question is:
Are there faster algorithms for this kind of product?
Fortunately, there are. Indeed, there has been a long line of research on this problem, starting already in 1976 with Fredman's work on All-Pairs Shortest Paths (APSP), which is in itself a very interesting problem in algorithm design. The fastest known algorithm is due to Williams and runs in the following (complicated-looking) time:
$n^3 / 2^{\Omega(\sqrt{\log n})}$.
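Just to spell out the arithmetic of putting the two pieces together: substituting Williams' bound for $T(n)$ in our running time from above gives

$\displaystyle\frac{2^n}{n} \cdot \frac{n^3}{2^{\Omega(\sqrt{\log n})}} = \frac{2^n n^2}{2^{\Omega(\sqrt{\log n})}}$.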
So we obtain a faster algorithm for TSP that runs in time $2^n n^2 / 2^{\Omega(\sqrt{\log n})}$. This improves over the decades-old algorithm taught in every introductory dynamic programming lecture. Feel free to use this blog post in your teaching. For more details, check out the pre-print. You can also use this proof-of-concept notebook: it implements the algorithm using the naive evaluation of the min-plus product. Figures to better visualize the algorithm for this blog post are more than welcome!
Last update: June 7, 2024