Graphs

Paolo Castagna talks about Graphs on Hadoop

Published in: Technology

  1. Graph Algorithms and MapReduce. Paolo Castagna. The words and opinions expressed here are my own and do not, in any way, represent the views of my employer.
  2. Why graphs?
  3. I am an infornographer! See: http://en.wikipedia.org/wiki/Infornography
  4. ... addicted to RDF
  5. RDF is (just) a directed labeled multigraph
  6. RDF is (just) a directed labeled multigraph
  7. RDF is (just) a directed labeled multigraph (diagram labels: URI2, URI1, URI3)
  8. RDF is (just) a directed labeled multigraph (diagram labels: URI2, URI1, URI3, URI4)
  9. RDF processing
  10. RDF parallel processing
  11. MapReduce?
  12. “... almost no descriptions of graph algorithms appear in the literature, with the exception of a simplified PageRank calculation and a naive implementation of finding distances from a specified node.” Graph Twiddling in a MapReduce World, Jonathan Cohen
  13. RDF processing: inference [1]. Rules: (?x p ?y) (?y q r) -> (?x rdf:type t); (?x p ?y) (?y p ?z) -> (?x p ?z). [1] Using a rule engine with forward rules only and a total materialization strategy.
  14. Transitive closure
  15. Transitive closure
  16. MapReduce?
  17. Transitive closure. Adjacency list: 1: 4, 6, 7; 2: 5; 3: 2, 4, 7; 4: 1, 3, 6; 5: 2, 3; 6: 5; 7: 3, 5. Map on the record "1: 4, 6, 7" emits: (1, >4), (1, >6), (1, >7), (4, <1), (6, <1), (7, <1).
  18. Transitive closure. Adjacency list: 1: 4, 6, 7; 2: 5; 3: 2, 4, 7; 4: 1, 3, 6; 5: 2, 3; 6: 5; 7: 3, 5. Reduce on key 6 receives: (6, <1), (6, <4), (6, >5) and emits the new edges: 1: 5, 4: 5.
  19. Transitive closure. WARNINGS: thinking in progress! Not implemented (yet)! Stop when no new edges are found.
  20. Transitive reduction
  21. Transitive reduction
  22. MapReduce?
  23. PageRank: lessons learned
  24. #1 Adjacency list
  25. #2 Moving the graph around at each iteration is not ideal
  26. #3 To communicate with all the vertices, use the configuration parameters of a subsequent MapReduce job
  27. “Pregel computes over large graphs much faster than alternatives, and the application programming interface is easy to use. Implementing PageRank, for example, takes only about 15 lines of code...” Official Google Research Blog, Grzegorz Czajkowski
  28. “Pregel computes over large graphs much faster than alternatives, and the application programming interface is easy to use. Implementing PageRank, for example, takes only about 15 lines of code...” Official Google Research Blog, Grzegorz Czajkowski
  29. Apache Hamburg?
  30. Graph algorithms: Graph search (Depth First Search, Breadth First Search); Directed (acyclic) graphs (Reachability and Transitive Closure, Topological Sorting); Minimum Spanning Tree; Shortest Paths; Network Flow; ...
  31. Apache Common Graph (dormant)
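
The map and reduce steps shown on the transitive closure slides (17 to 19) can be sketched in plain Python. This is only an illustration under stated assumptions, not the author's implementation (the slides say it was not implemented yet): it simulates the map, shuffle and reduce phases in a single process instead of running on Hadoop, and the names `map_phase`, `shuffle`, `reduce_phase` and `transitive_closure` are hypothetical. The graph is the adjacency list from the slides, and the loop stops when an iteration finds no new edges.

```python
from collections import defaultdict

# Adjacency list taken from the transitive closure slides.
GRAPH = {
    1: {4, 6, 7}, 2: {5}, 3: {2, 4, 7}, 4: {1, 3, 6},
    5: {2, 3}, 6: {5}, 7: {3, 5},
}


def map_phase(graph):
    """For every edge u -> v emit (u, ('>', v)) and (v, ('<', u))."""
    for u, targets in graph.items():
        for v in targets:
            yield u, (">", v)  # u has an outgoing edge to v
            yield v, ("<", u)  # v has an incoming edge from u


def shuffle(pairs):
    """Group the emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Join incoming with outgoing edges: a -> key and key -> b give a -> b."""
    incoming = [node for direction, node in values if direction == "<"]
    outgoing = [node for direction, node in values if direction == ">"]
    for a in incoming:
        for b in outgoing:
            yield a, b


def transitive_closure(graph):
    """Repeat map/shuffle/reduce until an iteration adds no new edge."""
    closure = {u: set(vs) for u, vs in graph.items()}
    while True:
        found_new_edge = False
        grouped = shuffle(map_phase(closure))
        for key, values in grouped.items():
            for a, b in reduce_phase(key, values):
                if b not in closure.setdefault(a, set()):
                    closure[a].add(b)
                    found_new_edge = True
        if not found_new_edge:
            return closure
```

On the first iteration, the reduce call for key 6 receives the incoming edges from 1 and 4 and the outgoing edge to 5, and yields the pairs (1, 5) and (4, 5), matching slide 18. Because this sketch joins every incoming edge with every outgoing edge, cycles produce self-edges such as 1 -> 1; whether those should be kept is a design choice the slides do not address.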