GraphX
GraphX
GraphX
What is a graph
GraphX
Examples of graph computations
● Finding common friends
● Finding the page rank
● And Many more…
GraphX
Finding common friends
Examples of graph computations
GraphX
Finding Page Rank
Examples of graph computations
GraphX
● Unifies Graph Computation
○ ETL
○ Exploratory analysis
○ Iterative
● View the same Data as Graph and Collections
● Transform and join graphs with RDDs efficiently
● Extends the Spark RDD by introducing a new Graph
abstraction
GraphX
GraphX
GraphX
○ PageRank
■ If important pages link you, you are more important
○ Connected Components
■ Clusters amongst your facebook friends
○ Triangle Counting
■ Triangles passing through each vertex => measure of
clustering.
○ Label propagation
○ SVD++
○ Strongly connected components
Has library of algorithms
GraphX
GraphX
● subgraph
● joinVertices
● aggregateMessages
● And more….
Provides set of fundamental operations
https://spark.apache.org/docs/latest/graphx-programming-guide.html
GraphX
GraphX - Pagerank
1. BarackObama
2. Lady Gaga
3. John Resig
4. Justin Bieber
6. Matei Zaharia
6. Martin Odersky
7. anonsys
PR(A) = 0.15 + 0.85 * ( rank of node / outgoing)
GraphX
GraphX - Pagerank
$ hadoop fs -cat /data/spark/graphx/followers.txt
2 1
4 1
1 2
6 3
7 3
7 6
6 7
3 7
https://github.com/cloudxlab/bigdata/blob/master/spark/examples/graphx/pagerank.scala
GraphX
GraphX - Pagerank
import org.apache.spark.graphx.GraphLoader
// Load the edges as a graph
val graph = GraphLoader.edgeListFile(sc, "/data/spark/graphx/followers.txt")
// Run PageRank
val ranks = graph.pageRank(0.0001).vertices
// Join the ranks with the usernames
val users = sc.textFile("/data/spark/graphx/users.txt").map { line =>
val fields = line.split(",")
(fields(0).toLong, fields(1))
}
val ranksByUsername = users.join(ranks).map {
case (id, (username, rank)) => (username, rank)
}
// Print the result
println(ranksByUsername.collect().mkString("n"))
See more
GraphX
GraphX - Pagerank
1. BarackObama
2. Lady Gaga
3. John Resig
4. Justin Bieber
6. Matei Zaharia
6. Martin Odersky
7. anonsys
0.15
0.70
1.39
1.46
1.0
1.3
Thank you!
GraphX
reachus@cloudxlab.com

Introduction to GraphX | Big Data Hadoop Spark Tutorial | CloudxLab

  • 1.
  • 2.
  • 3.
    GraphX Examples of graphcomputations ● Finding common friends ● Finding the page rank ● And Many more…
  • 4.
  • 5.
    GraphX Finding Page Rank Examplesof graph computations
  • 6.
    GraphX ● Unifies GraphComputation ○ ETL ○ Exploratory analysis ○ Iterative ● View the same Data as Graph and Collections ● Transform and join graphs with RDDs efficiently ● Extends the Spark RDD by introducing a new Graph abstraction GraphX
  • 7.
    GraphX GraphX ○ PageRank ■ Ifimportant pages link you, you are more important ○ Connected Components ■ Clusters amongst your facebook friends ○ Triangle Counting ■ Triangles passing through each vertex => measure of clustering. ○ Label propagation ○ SVD++ ○ Strongly connected components Has library of algorithms
  • 8.
    GraphX GraphX ● subgraph ● joinVertices ●aggregateMessages ● And more…. Provides set of fundamental operations https://spark.apache.org/docs/latest/graphx-programming-guide.html
  • 9.
    GraphX GraphX - Pagerank 1.BarackObama 2. Lady Gaga 3. John Resig 4. Justin Bieber 6. Matei Zaharia 6. Martin Odersky 7. anonsys PR(A) = 0.15 + 0.85 * ( rank of node / outgoing)
  • 10.
    GraphX GraphX - Pagerank $hadoop fs -cat /data/spark/graphx/followers.txt 2 1 4 1 1 2 6 3 7 3 7 6 6 7 3 7 https://github.com/cloudxlab/bigdata/blob/master/spark/examples/graphx/pagerank.scala
  • 11.
    GraphX GraphX - Pagerank importorg.apache.spark.graphx.GraphLoader // Load the edges as a graph val graph = GraphLoader.edgeListFile(sc, "/data/spark/graphx/followers.txt") // Run PageRank val ranks = graph.pageRank(0.0001).vertices // Join the ranks with the usernames val users = sc.textFile("/data/spark/graphx/users.txt").map { line => val fields = line.split(",") (fields(0).toLong, fields(1)) } val ranksByUsername = users.join(ranks).map { case (id, (username, rank)) => (username, rank) } // Print the result println(ranksByUsername.collect().mkString("n")) See more
  • 12.
    GraphX GraphX - Pagerank 1.BarackObama 2. Lady Gaga 3. John Resig 4. Justin Bieber 6. Matei Zaharia 6. Martin Odersky 7. anonsys 0.15 0.70 1.39 1.46 1.0 1.3
  • 13.