Hadoop MapReduce implementation of shortest PATH in a graph, not just the distance
I have been looking for "MapReduce implementation of Shortest path search algorithms".
However, all the instances I could find "computed the shortest distance form node x to y", and none actually output the "actual shortest path like x-a-b-c-y".
As for what am I trying to achieve is that I have graphs with hundreds of 1000s of nodes and I need to perform frequent pattern analysis on shortest paths among the various nodes. This is for a research project I am working on.
It would be a great help if some one could point me to some implementation (if it exists) or give some pointers as to how to hack the existing SSSP implementations to generate the paths along with the distances.
Basically these implementations work with some kind of messaging. So messages are send to HDFS between map and reduce stage.
In the reducer they are grouped and filtered by distance, the lowest distance wins. When the distance is updated in this case, you have to set the vertex (well, some ID probably) where the message came from.
So you have additional space requirement per vertex, but you can reconstruct every possible shortest path in the graph.
Based on your comment:
I will need to write another class of the vertex object to hold this additional information. Thanks for the tip, though it would be very helpful if you could point out where and when I can capture this information of where the minimum weight came from, anything from your blog maybe :-)
Yea, could be a quite cool theme, also for Apache Hama. Most of the implementations are just considering the costs not the real path. In your case (from the blog you've linked above) you will have to extract a vertex class which actually holds the adjacent vertices as LongWritable (maybe a list instead of this split on the text object) and simply add a parent or source id as field (of course also LongWritable). You will set this when propagating in the mapper, that is the for loop that is looping over the adjacent vertices of the current key node.
In the reducer you will update the lowest somewhere while iterating over the grouped values, there you will have to set the source vertex in the key vertex by the vertex that updated to the minimum.
Maybe it helps you, it is quite unmaintained so please come back to me if you have a specific question.
Here is the same algorithm in BSP with Apache Hama: