GraphX is Apache Spark's API for distributed graph computing, based on the Pregel programming model. In this talk we'll give a brief introduction to Pregel, then focus on transforming standard graph algorithms into their distributed counterparts with GraphX to speed up execution in a distributed environment.
Max value implementation
Results:
Graph initial state
Node [1]: 3
Node [2]: 6
Node [3]: 2
Node [4]: 1
Graph final state
Node [1]: 6
Node [2]: 6
Node [3]: 6
Node [4]: 6
Max value of the graph is 6.
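The propagation behind these results can be sketched without a Spark cluster. The block below is a plain-Scala simulation of Pregel-style supersteps, not the GraphX code from the talk. The edge list is an assumption (the slides give the vertex values but not the topology), and the names `MaxValueSketch`, `run`, `initial` and `edges` are hypothetical.

```scala
object MaxValueSketch {
  type VertexId = Long

  // Vertex values taken from the "Graph initial state" slide.
  val initial: Map[VertexId, Int] = Map(1L -> 3, 2L -> 6, 3L -> 2, 4L -> 1)

  // ASSUMPTION: a bidirectional chain 1 - 2 - 3 - 4; the slides do not show the edges.
  val edges: Seq[(VertexId, VertexId)] =
    Seq((1L, 2L), (2L, 1L), (2L, 3L), (3L, 2L), (3L, 4L), (4L, 3L))

  /** Runs supersteps until no message is sent.
    * Returns the final vertex values and the number of supersteps that sent messages. */
  def run(values: Map[VertexId, Int],
          edges: Seq[(VertexId, VertexId)]): (Map[VertexId, Int], Int) = {
    @annotation.tailrec
    def loop(state: Map[VertexId, Int], steps: Int): (Map[VertexId, Int], Int) = {
      // "send message": a source notifies its destination only if it holds a larger value
      val sent = edges.collect {
        case (src, dst) if state(src) > state(dst) => dst -> state(src)
      }
      if (sent.isEmpty) (state, steps) // no active vertex left: the computation halts
      else {
        // "merge messages": keep the largest value addressed to each vertex
        val inbox = sent.groupBy(_._1).map { case (dst, msgs) => dst -> msgs.map(_._2).max }
        // "vertex program": each vertex keeps the max of its own value and its inbox
        val next = state.map { case (id, v) => id -> math.max(v, inbox.getOrElse(id, v)) }
        loop(next, steps + 1)
      }
    }
    loop(values, 0)
  }
}
```

Calling `MaxValueSketch.run(MaxValueSketch.initial, MaxValueSketch.edges)` on this assumed topology leaves every vertex holding 6, matching the final state above. The three commented stages correspond to the send, merge and vertex-program functions that a Pregel computation is built from.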
Dijkstra's algorithm implementation
Types definitions:
type VertexId = scala.Long
case class City(
  name: String,
  id: VertexId
)
case class VertexAttribute(
  cityName: String,
  distance: Double,
  path: List[City]
)
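Dijkstra's algorithm implementation
import org.apache.spark.graphx.EdgeTriplet

// Relax an edge: message the destination only when going through the source
// yields a strictly shorter distance than the one the destination currently holds.
val sendMsg = (edgeTriplet: EdgeTriplet[VertexAttribute, Double]) => {
  if (edgeTriplet.srcAttr.distance < (edgeTriplet.dstAttr.distance - edgeTriplet.attr)) {
    Iterator((
      edgeTriplet.dstId,
      VertexAttribute(
        edgeTriplet.dstAttr.cityName,
        // candidate distance: distance to the source plus the edge weight
        edgeTriplet.srcAttr.distance + edgeTriplet.attr,
        // candidate path: the source's path extended with the destination city
        edgeTriplet.srcAttr.path :+ City(edgeTriplet.dstAttr.cityName, edgeTriplet.dstId)
      )
    ))
  }
  else Iterator.empty
}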
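A Pregel computation also needs a vertex program and a message-merging function alongside `sendMsg`; neither appears on these slides. The block below is a minimal sketch of what they could look like for this shortest-path setup, written as pure functions so that it runs without Spark. `vprog`, `mergeMsg`, `initialMsg` and the enclosing `DijkstraPregelSketch` object are hypothetical names, and the infinite-distance initial message is an assumption rather than the talk's actual code.

```scala
object DijkstraPregelSketch {
  // Same alias as in the types slide; redefined here so the sketch has no Spark dependency.
  type VertexId = Long

  case class City(name: String, id: VertexId)
  case class VertexAttribute(cityName: String, distance: Double, path: List[City])

  // ASSUMPTION: the vertex program adopts an incoming candidate only when it is
  // strictly shorter than the distance the vertex already holds.
  val vprog: (VertexId, VertexAttribute, VertexAttribute) => VertexAttribute =
    (_, current, candidate) =>
      if (candidate.distance < current.distance) candidate else current

  // ASSUMPTION: when several neighbours message the same vertex in one superstep,
  // only the shortest candidate is kept.
  val mergeMsg: (VertexAttribute, VertexAttribute) => VertexAttribute =
    (a, b) => if (a.distance <= b.distance) a else b

  // ASSUMPTION: an initial message with infinite distance, so that the first
  // superstep leaves every vertex attribute unchanged.
  val initialMsg: VertexAttribute = VertexAttribute("", Double.PositiveInfinity, Nil)
}
```

With a `Graph[VertexAttribute, Double]` in scope, these would plug into GraphX as `graph.pregel(DijkstraPregelSketch.initialMsg)(vprog, sendMsg, mergeMsg)`. Using the strict `<` in both `sendMsg` and the vertex program means a vertex is never updated, and never re-sends, for an equally long alternative path, which lets the iteration terminate.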
Dijkstra's algorithm implementation
Results:
Going from Washington to Chicago has a distance of 105.0 km.
Path is: Washington [1] => Baltimore [2] => Detroit [3] => NewYork [5] => Chicago [4]
Going from Washington to Washington has a distance of 0.0 km.
Path is: Washington [1]
Going from Washington to Philadelphia has a distance of 91.0 km.
Path is: Washington [1] => Baltimore [2] => Detroit [3] => NewYork [5] => Philadelphia [6]
Going from Washington to Detroit has a distance of 62.0 km.
Path is: Washington [1] => Baltimore [2] => Detroit [3]
Going from Washington to NewYork has a distance of 76.0 km.
Path is: Washington [1] => Baltimore [2] => Detroit [3] => NewYork [5]
Going from Washington to Baltimore has a distance of 27.0 km.
Path is: Washington [1] => Baltimore [2]