# Construct a minimum spanning tree covering a specific subset of the vertices

I have an undirected, positive-edge-weight graph (V,E) for which I want a minimum spanning tree covering a subset k of vertices V (the Steiner tree problem).

I'm not limiting the size of the spanning tree to k vertices; rather I know exactly which k vertices must be included in the MST.

Starting from the entire MST I could pare down edges/nodes until I get the smallest MST that contains all k.

I can use Prim's algorithm to get the entire MST, and start deleting edges/nodes while the MST of subset k is not destroyed; alternatively I can use Floyd-Warshall to get all-pairs shortest paths and somehow union the paths. Are there better ways to approach this?

There's a lot of confusion going on here. Based on what the OP says:

I'm not limiting the size of the spanning tree to k vertices; rather I know exactly which k vertices must be included in the MST.

This is the Steiner tree problem on graphs. This is not the k-MST problem. The Steiner tree problem is defined as such:

Given a weighted graph G = (V, E), a subset S ⊆ V of the vertices, and a root r ∈ V , we want to find a minimum weight tree which connects all the vertices in S to r. 1

As others have mentionned, this problem is NP-hard. Therefore, you can use an approximation algorithm.

Early/Simple Approximation Algorithms

Two famous methods are Takahashi's method and Kruskal's method (both of which have been extended/improved by Rayward-Smith):

• Takahashi H, Matsuyama A: An approximate solution for the Steiner problem in graphs. Math. Jap 1980, 24:573–577.
• Kruskal JB: On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. In Proceedings of the American Mathematical Society, Volume 7. ; 1956:48–50.
• Rayward-Smith VJ, Clare A: On finding Steiner vertices. Networks 1986, 16:283–294.

Shortest path approximation by Takahashi (with modification by Rayward-Smith) Kruskal's approximation algorithm (with modification by Rayward-Smith) In biology, more recent approaches have treated the problem using the cavity method, which has led to a "modified belief propagation" method that has shown good accuracy on large data sets:

• Bayati, M., Borgs, C., Braunstein, A., Chayes, J., Ramezanpour, A., Zecchina, R.: Statistical mechanics of steiner trees. Phys. Rev. Lett. 101(3), 037208 (2008) 15.
• For an application: Steiner tree methods for optimal sub-network identification: an empirical study. BMC Bioinformatics. BMC Bioinformatics 2013 30;14:144. Epub 2013 Apr 30.

In the context of search engine problems, approaches have focused on efficiency for very large data sets that can be pre-processed to some degree.

• G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword Searching and Browsing in Databases using BANKS. In ICDE, pages 431–440.
• G. Kasneci, M. Ramanath, M. Sozio, F. M. Suchanek, and G. Weikum. STAR: Steiner-tree approximation in relationship graphs. In ICDE’09, pages 868–879, 2009

The problem you stated is a famous NP-hard problem, called Steiner tree in graphs. There are no known solutions in polynomial time and many believe no such solutions exist.

Run Prim's algorithm on the restricted graph (k, E') where E' = {(x, y) ∈ V : xk and yk}). Constructing that graph takes O(|E|).