# Construct a minimum spanning tree covering a specific subset of the vertices

I have an undirected, positive-edge-weight graph *(V,E)* for which I want a minimum spanning tree covering a subset *k* of vertices *V* (the Steiner tree problem).

I'm not limiting the size of the spanning tree to *k* vertices; rather I know exactly *which* *k* vertices must be included in the MST.

Starting from the entire MST I could pare down edges/nodes until I get the smallest MST that contains all *k*.

I can use Prim's algorithm to get the entire MST, and start deleting edges/nodes while the MST of subset k is not destroyed; alternatively I can use Floyd-Warshall to get all-pairs shortest paths and somehow union the paths. Are there better ways to approach this?

## Answers

There's a lot of confusion going on here. Based on what the OP says:

I'm not limiting the size of the spanning tree to k vertices; rather I know exactly which *k* vertices must be included in the MST.

**This is the Steiner tree problem on graphs.** *This is not the k-MST problem.* The Steiner tree problem is defined as such:

Given a weighted graph G = (V, E), a subset S ⊆ V of the vertices, and a root r ∈ V , we want to find a minimum weight tree which connects all the vertices in S to r. 1

As others have mentionned, this problem is NP-hard. Therefore, you can use an approximation algorithm.

**Early/Simple Approximation Algorithms**

Two famous methods are **Takahashi's method** and **Kruskal's method** (both of which have been extended/improved by Rayward-Smith):

- Takahashi H, Matsuyama A: An approximate solution for the Steiner problem in graphs. Math. Jap 1980, 24:573–577.
- Kruskal JB: On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. In Proceedings of the American Mathematical Society, Volume 7. ; 1956:48–50.
- Rayward-Smith VJ, Clare A: On finding Steiner vertices. Networks 1986, 16:283–294.

*Shortest path approximation by Takahashi (with modification by Rayward-Smith)*

*Kruskal's approximation algorithm (with modification by Rayward-Smith)*

**Modern/More Advanced Approximation Algorithms**

In biology, more recent approaches have treated the problem using the cavity method, which has led to a "modified belief propagation" method that has shown good accuracy on large data sets:

- Bayati, M., Borgs, C., Braunstein, A., Chayes, J., Ramezanpour, A., Zecchina, R.: Statistical mechanics of steiner trees. Phys. Rev. Lett. 101(3), 037208 (2008) 15.
- For an application: Steiner tree methods for optimal sub-network identification: an empirical study. BMC Bioinformatics. BMC Bioinformatics 2013 30;14:144. Epub 2013 Apr 30.

In the context of search engine problems, approaches have focused on efficiency for very large data sets that can be pre-processed to some degree.

- G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword Searching and Browsing in Databases using BANKS. In ICDE, pages 431–440.
- G. Kasneci, M. Ramanath, M. Sozio, F. M. Suchanek, and G. Weikum. STAR: Steiner-tree approximation in relationship graphs. In ICDE’09, pages 868–879, 2009

The problem you stated is a famous NP-hard problem, called Steiner tree in graphs. There are no known solutions in polynomial time and many believe no such solutions exist.

Run Prim's algorithm on the restricted graph (*k*, *E'*) where *E'* = {(*x*, *y*) ∈ *V* : *x* ∈ *k* and *y* ∈ *k*}). Constructing that graph takes O(|*E*|).