Large-Scale Graph Processing〜Introduction〜(完全版)

For large-scale graph processing, computation models more efficient than MapReduce have been proposed and are being implemented in projects such as Google Pregel, Giraph, Hama, and GoldenOrb; Hama and Giraph are also being adapted to NextGen Apache Hadoop MapReduce. This lightning talk explains what "Large-Scale Graph Processing" is by comparing it with MapReduce, and closes with the distinguishing features of each project.

  • Author's note: this is the full version of the deck; the light version is available at http://www.slideshare.net/doryokujin/largescale-graph-processingintroductionlt

Transcript

  • 1. http://www.catehuston.com/blog/2009/11/02/touchgraph/
  • 2. Hadoop MapReduceデザインパターン: MapReduceによる大規模テキストデータ処理 (Hadoop MapReduce Design Patterns: Large-Scale Text Data Processing with MapReduce). Written by Jimmy Lin and Chris Dyer; supervising editors 神林飛志 and 野村直之; translated by 玉川竜司. Scheduled for release October 1, 2011; 210 pages; list price ¥2,940.
  • 3. [Diagram: in iterative MapReduce, every step from iteration i to i+1 pays for a shuffle & barrier plus a job start/shutdown; a runnable sketch of this driver loop appears after the transcript.]
  • 4–5. [Example: a weighted graph with vertices A–G and edge weights. Slide 5 highlights one relaxation, min(6, 4) = 4, carried from iteration i to iteration i+1.]
  • 6. A super step (Bulk Synchronous Parallel), see http://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel; a sketch of the superstep loop appears after the transcript.
  • 7–9. [Further diagrams illustrating a super step in the Bulk Synchronous Parallel model.]
  • 10–21. [Worked example: single-source shortest paths on the A–G graph. Distances are initialized to +∞ (0 at the source) and relaxed step by step until no value changes, at which point the computation ends.]
  • 22. class ShortestPathMapper(Mapper):
            def map(self, node_id, Node):
                # send graph structure
                emit node_id, Node
                # get node value and add it to edge distance
                dist = Node.get_value()
                for neighbour_node_id in Node.get_adjacency_list():
                    dist_to_nbr = Node.get_distance(node_id, neighbour_node_id)
                    emit neighbour_node_id, dist + dist_to_nbr
  • 23. class ShortestPathReducer(Reducer):
            def reduce(self, node_id, dist_list):
                min_dist = sys.maxint
                for dist in dist_list:
                    # dist_list contains the Node itself as well as candidate distances
                    if is_node(dist):
                        Node = dist
                    elif dist < min_dist:
                        min_dist = dist
                Node.set_value(min_dist)
                emit node_id, Node
  • 24. # In-Mapper Combiner
        class ShortestPathMapper(Mapper):
            def __init__(self):
                self.buffer = {}

            def check_and_put(self, key, value):
                if key not in self.buffer or value < self.buffer[key]:
                    self.buffer[key] = value

            def check_and_emit(self):
                if is_exceed_limit_buffer_size(self.buffer):
                    for key, value in self.buffer.items():
                        emit key, value
                    self.buffer = {}

            def close(self):
                for key, value in self.buffer.items():
                    emit key, value
  • 25. # ...continued from slide 24
            def map(self, node_id, Node):
                # send graph structure
                emit node_id, Node
                # get node value and add it to edge distance
                dist = Node.get_value()
                for nbr_node_id in Node.get_adjacency_list():
                    dist_to_nbr = Node.get_distance(node_id, nbr_node_id)
                    dist_nbr = dist + dist_to_nbr
                    self.check_and_put(nbr_node_id, dist_nbr)
                self.check_and_emit()
  • 26. # Shimmy trick
        class ShortestPathReducer(Reducer):
            def __init__(self):
                P.open_graph_partition()

            def emit_precede_node(self, node_id):
                for pre_node_id, Node in P.read():
                    if node_id == pre_node_id:
                        return Node
                    else:
                        emit pre_node_id, Node
  • 27. # (...continued from slide 26)
            def reduce(self, node_id, dist_list):
                Node = self.emit_precede_node(node_id)
                min_dist = sys.maxint
                for dist in dist_list:
                    if dist < min_dist:
                        min_dist = dist
                Node.set_value(min_dist)
                emit node_id, Node
  • 28–39. [The worked example traced again step by step: distances start at +∞ (0 at the source) and are relaxed each iteration until no value changes and the computation ends.]
  • 40. class ShortestPathVertex:
            def compute(self, msgs):
                min_dist = 0 if self.is_source() else sys.maxint
                # get values from all incoming edges
                for msg in msgs:
                    min_dist = min(min_dist, msg.get_value())
                if min_dist < self.get_value():
                    # update current value (state)
                    self.set_current_value(min_dist)
                    # send new value along every outgoing edge
                    out_edge_iterator = self.get_out_edge_iterator()
                    for out_edge in out_edge_iterator:
                        recipient = out_edge.get_other_element(self.get_id())
                        self.send_message(recipient.get_id(),
                                          min_dist + out_edge.get_distance())
                self.vote_to_halt()
  • 41. Pregel
  • 42. [Excerpt from the HAMA paper (Sangwon Seo, Edward J. Yoon, Jaehong Kim, Seongwook Jin, Jin-Soo Kim, and Seungryoul Maeng; KAIST and Sungkyunkwan University, South Korea). HAMA is a distributed framework on Hadoop for massive matrix and graph computations; it aims to be a powerful tool for various scientific applications, providing basic primitives for developers and researchers through simple APIs, and is being incubated as a subproject of Hadoop by the Apache Software Foundation. Figure 1 shows the overall architecture: the HAMA API, Core, and Shell sit on a computation engine with pluggable backends (MapReduce, BSP, Dryad), use Zookeeper for distributed locking, and run over storage systems such as HBase, HDFS, RDBMS, and files. http://wiki.apache.org/hama/Articles]
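
The MapReduce version (slides 22–27) must run one complete job per relaxation round, which is exactly the shuffle & barrier and job start/shutdown overhead pictured on slide 3. Below is a minimal, self-contained Python simulation of that driver loop; the graph literal and the helper names (map_phase, reduce_phase, shortest_paths) are illustrative assumptions, not the deck's actual code or any framework's API.

    import sys
    from collections import defaultdict

    # Illustrative adjacency list (not the exact graph from the slides):
    # node -> [(neighbour, edge distance), ...]
    GRAPH = {
        'A': [('B', 1), ('C', 3)],
        'B': [('D', 4), ('E', 5)],
        'C': [('D', 2), ('F', 5)],
        'D': [('E', 1), ('F', 4)],
        'E': [('G', 5)],
        'F': [('G', 3)],
        'G': [],
    }
    INF = sys.maxsize

    def map_phase(distances):
        # emit each node's own distance plus a tentative distance for every neighbour
        emitted = defaultdict(list)
        for node, dist in distances.items():
            emitted[node].append(dist)
            if dist == INF:
                continue
            for nbr, edge in GRAPH[node]:
                emitted[nbr].append(dist + edge)
        return emitted

    def reduce_phase(emitted):
        # keep the minimum distance seen for every node
        return {node: min(values) for node, values in emitted.items()}

    def shortest_paths(source='A'):
        distances = {node: INF for node in GRAPH}
        distances[source] = 0
        while True:
            # each pass stands in for one full MapReduce job:
            # job start-up, map, shuffle & barrier, reduce, job shutdown
            new_distances = reduce_phase(map_phase(distances))
            if new_distances == distances:   # nothing changed -> converged
                return distances
            distances = new_distances

    print(shortest_paths())

Because convergence can only be detected after a whole job has finished, even a round that changes nothing still pays the full job overhead; this is the motivation for the BSP/Pregel model sketched next.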

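By contrast, in the BSP/Pregel model of slides 6–9 and 40 the graph stays loaded across supersteps and only messages cross the barrier. Below is a minimal single-process sketch of that superstep loop around the slide-40 compute logic; the Vertex class, the send callback, and the halting test are simplified assumptions for illustration, not Pregel's, Giraph's, or Hama's actual API.

    import sys

    class Vertex:
        # a tiny stand-in for a Pregel-style vertex (illustrative only)
        def __init__(self, vid, out_edges, is_source=False):
            self.id = vid
            self.out_edges = out_edges        # [(target id, edge distance), ...]
            self.value = sys.maxsize
            self.is_source = is_source
            self.active = True

        def compute(self, msgs, send):
            # same logic as ShortestPathVertex.compute on slide 40
            min_dist = 0 if self.is_source else sys.maxsize
            for m in msgs:
                min_dist = min(min_dist, m)
            if min_dist < self.value:
                self.value = min_dist
                for target, edge in self.out_edges:
                    send(target, min_dist + edge)   # delivered in the NEXT superstep
            self.active = False                     # vote to halt

    def run_supersteps(vertices):
        # one barrier-synchronised loop: compute, exchange messages, repeat
        inbox = {vid: [] for vid in vertices}
        superstep = 0
        while True:
            outbox = {vid: [] for vid in vertices}
            send = lambda target, value: outbox[target].append(value)
            for v in vertices.values():
                if v.active or inbox[v.id]:          # messages reactivate halted vertices
                    v.compute(inbox[v.id], send)
            superstep += 1
            inbox = outbox                           # barrier: messages become visible now
            if not any(inbox.values()):              # all halted, no messages in flight
                return superstep

    graph = {
        'A': [('B', 1), ('C', 3)], 'B': [('D', 4), ('E', 5)],
        'C': [('D', 2), ('F', 5)], 'D': [('E', 1), ('F', 4)],
        'E': [('G', 5)], 'F': [('G', 3)], 'G': [],
    }
    vertices = {vid: Vertex(vid, edges, is_source=(vid == 'A'))
                for vid, edges in graph.items()}
    steps = run_supersteps(vertices)
    print({vid: v.value for vid, v in vertices.items()}, 'after', steps, 'supersteps')

The resulting distances match the MapReduce simulation above, but the per-round cost is only message exchange and a barrier rather than a full job start, shuffle, and shutdown.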