Paolo Castagna talks about Graphs on Hadoop

Published in:
Technology


- 1. Graph Algorithms and MapReduce. Paolo Castagna. The words and opinions expressed here are my own and do not, in any way, represent the views of my employer.
- 2. Why graphs?
- 3. I am an infornographer! See: http://en.wikipedia.org/wiki/Infornography
- 4. ... addicted to RDF
- 5. RDF is (just) a directed labeled multigraph
- 6. RDF is (just) a directed labeled multigraph
- 7. RDF is (just) a directed labeled multigraph URI2 URI1 URI3
- 8. RDF is (just) a directed labeled multigraph URI2 URI1 URI3 URI4
- 9. RDF processing
- 10. RDF parallel processing
- 11. MapReduce?
- 12. “... almost no descriptions of graph algorithms appear in the literature, with the exception of a simplified PageRank calculation and a naive implementation of finding distances from a specified node.” Graph Twiddling in a MapReduce World, Jonathan Cohen
- 13. RDF processing: inference [1]. Rules: (?x p ?y) (?y q r) -> (?x rdf:type t); (?x p ?y) (?y p ?z) -> (?x p ?z). [1] Using a rule engine with forward rules only and a total materialization strategy.
- 14. Transitive closure
- 15. Transitive closure
- 16. MapReduce?
- 17. Transitive closure (map). Input adjacency lists: 1: 4, 6, 7 | 2: 5 | 3: 2, 4, 7 | 4: 1, 3, 6 | 5: 2, 3 | 6: 5 | 7: 3, 5. For the record 1: 4, 6, 7, map emits (1, >4), (1, >6), (1, >7), (4, <1), (6, <1), (7, <1).
- 18. Transitive closure (reduce). Input adjacency lists: 1: 4, 6, 7 | 2: 5 | 3: 2, 4, 7 | 4: 1, 3, 6 | 5: 2, 3 | 6: 5 | 7: 3, 5. For key 6, reduce receives (6, <1), (6, <4), (6, >5) and emits the edges 1: 5 and 4: 5.
- 19. Transitive closure. Warnings: thinking in progress! Not implemented (yet)! Stop when no new edges are found.
- 20. Transitive reduction
- 21. Transitive reduction
- 22. MapReduce?
- 23. PageRank: lessons learned
- 24. #1 adjacency list
- 25. #2 moving the graph around at each iteration is not ideal
- 26. #3 to communicate with all the vertices, use the configuration parameters of a subsequent MapReduce job
- 27. “Pregel computes over large graphs much faster than alternatives, and the application programming interface is easy to use. Implementing PageRank, for example, takes only about 15 lines of code...” Official Google Research Blog, Grzegorz Czajkowski
- 28. “Pregel computes over large graphs much faster than alternatives, and the application programming interface is easy to use. Implementing PageRank, for example, takes only about 15 lines of code...” Official Google Research Blog, Grzegorz Czajkowski
- 29. Apache Hamburg?
- 30. Graph algorithms: graph search (depth-first search, breadth-first search); directed (acyclic) graphs (reachability and transitive closure, topological sorting); minimum spanning tree; shortest paths; network flow; ...
- 31. Apache Common Graph (dormant)
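
The transitive closure idea on slides 17 to 19 can be sketched in plain Python. This is only an in-memory simulation of one possible reading of those slides, not the author's implementation (slide 19 says it was not implemented yet). It assumes that map emits `(u, >v)` and `(v, <u)` for every edge u -> v, that reduce joins every incoming edge with every outgoing edge at a key, and that rounds repeat until no new edges are found. The function names `map_phase`, `reduce_phase`, `closure_step` and `transitive_closure` are hypothetical.

```python
from collections import defaultdict

# Adjacency lists copied from slide 17: vertex -> set of successors.
GRAPH = {
    1: {4, 6, 7},
    2: {5},
    3: {2, 4, 7},
    4: {1, 3, 6},
    5: {2, 3},
    6: {5},
    7: {3, 5},
}

def map_phase(vertex, successors):
    """Emit (key, (direction, other)) pairs for one adjacency-list record."""
    for s in successors:
        yield vertex, (">", s)  # vertex has an outgoing edge to s
        yield s, ("<", vertex)  # s has an incoming edge from vertex

def reduce_phase(key, values):
    """Join edges at `key`: u -> key and key -> v together give u -> v."""
    values = list(values)
    incoming = sorted(other for direction, other in values if direction == "<")
    outgoing = sorted(other for direction, other in values if direction == ">")
    for u in incoming:
        for v in outgoing:
            yield u, v

def closure_step(graph):
    """Simulate one map/shuffle/reduce round; return (new_graph, edges_added)."""
    shuffled = defaultdict(list)  # stands in for the shuffle: group by key
    for vertex, successors in graph.items():
        for key, value in map_phase(vertex, successors):
            shuffled[key].append(value)
    new_graph = {v: set(s) for v, s in graph.items()}
    added = 0
    for key, values in shuffled.items():
        for u, v in reduce_phase(key, values):
            if v not in new_graph.setdefault(u, set()):
                new_graph[u].add(v)
                added += 1
    return new_graph, added

def transitive_closure(graph):
    """Repeat rounds until a round adds no new edges (slide 19's stop rule)."""
    rounds = 0
    while True:
        graph, added = closure_step(graph)
        rounds += 1
        if added == 0:
            return graph, rounds

if __name__ == "__main__":
    closure, rounds = transitive_closure(GRAPH)
    for vertex in sorted(closure):
        print(vertex, sorted(closure[vertex]))
    print("rounds:", rounds)
```

In this sketch a vertex that lies on a cycle ends up with an edge to itself, because u -> k -> u is joined like any other path. A real Hadoop job would also have to carry the existing adjacency lists through each round and count new edges, for example with a job counter; both are left out here.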
