Large-Scale Graph Processing 〜Introduction〜 (Complete Edition)


Computation models more efficient than MapReduce have been proposed for large-scale graph processing, and implementations are under way in projects such as Google Pregel, Giraph, Hama, and GoldenOrb; work is also in progress to run Hama and Giraph on NextGen Apache Hadoop MapReduce. This lightning talk introduces what "Large Scale Graph Processing" is by comparing it with MapReduce, and closes with the characteristics of each project.

Transcript

  1. http://www.catehuston.com/blog/2009/11/02/touchgraph/
  2. Hadoop MapReduce デザインパターン (Hadoop MapReduce Design Patterns: Large-Scale Text Data Processing with MapReduce), by Jimmy Lin and Chris Dyer; supervising editors 神林飛志 and 野村直之; translated by 玉川竜司. Scheduled for release on October 1, 2011; 210 pages; 2,940 yen.
  3. [Diagram: the iterative MapReduce execution model. A shuffle & barrier sits between the map and reduce phases within each job, and a full job start/shutdown separates iteration i from iteration i+1. A driver-loop sketch of this pattern appears after the transcript.]
  4.-5. [Diagrams: an example weighted graph with nodes A-G, and one MapReduce shortest-path iteration (i to i+1) in which a node's distance is relaxed via min(6, 4) = 4.]
  6. A super step in the Bulk Synchronous Parallel (BSP) model: http://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel
  7.-9. [Diagrams: successive super steps of the BSP model.]
  10.-21. [Diagram sequence: the shortest-path computation run as repeated MapReduce jobs on the example graph. Slide 10 initializes the graph; slides 11-21 show the node distances relaxing from +∞ toward their final values over iterations 1, 2, 3, ... until nothing changes ("end").]
  22. ShortestPathMapper (the map phase of one iteration):

      class ShortestPathMapper(Mapper):
          def map(self, node_id, Node):
              # send graph structure
              emit node_id, Node
              # get node value and add it to edge distance
              dist = Node.get_value()
              for neighbour_node_id in Node.get_adjacency_list():
                  dist_to_nbr = Node.get_distance(node_id, neighbour_node_id)
                  emit neighbour_node_id, dist + dist_to_nbr
  23. ShortestPathReducer (the reduce phase of one iteration):

      class ShortestPathReducer(Reducer):
          def reduce(self, node_id, dist_list):
              min_dist = sys.maxint
              for dist in dist_list:
                  # dist_list also contains the Node structure itself
                  if is_node(dist):
                      Node = dist
                  elif dist < min_dist:
                      min_dist = dist
              Node.set_value(min_dist)
              emit node_id, Node
  24. In-mapper combiner, part 1 (a tiny illustration of the saving appears after the transcript):

      # In-Mapper Combiner
      class ShortestPathMapper(Mapper):
          def __init__(self):
              self.buffer = {}

          def check_and_put(self, key, value):
              # keep only the smallest distance seen so far for each key
              if key not in self.buffer or value < self.buffer[key]:
                  self.buffer[key] = value

          def check_and_emit(self):
              # flush the buffer once it grows past its size limit
              if is_exceed_limit_buffer_size(self.buffer):
                  for key, value in self.buffer.items():
                      emit key, value
                  self.buffer = {}

          def close(self):
              # flush whatever is left when the mapper shuts down
              for key, value in self.buffer.items():
                  emit key, value
  25. In-mapper combiner, part 2 (continuing the same class):

          def map(self, node_id, Node):
              # send graph structure
              emit node_id, Node
              # get node value and add it to edge distance
              dist = Node.get_value()
              for nbr_node_id in Node.get_adjacency_list():
                  dist_to_nbr = Node.get_distance(node_id, nbr_node_id)
                  dist_nbr = dist + dist_to_nbr
                  self.check_and_put(nbr_node_id, dist_nbr)
              self.check_and_emit()
  26. Shimmy trick, part 1 (the reducer streams its own copy of the graph partition instead of receiving the graph structure through the shuffle):

      # Shimmy trick
      class ShortestPathReducer(Reducer):
          def __init__(self):
              # P: this reducer's on-disk graph partition, stored with the
              # same partitioning and sort order as the intermediate keys
              P.open_graph_partition()

          def emit_precede_node(self, node_id):
              # re-emit the nodes that precede node_id unchanged, and
              # return the node that matches it
              for pre_node_id, Node in P.read():
                  if node_id == pre_node_id:
                      return Node
                  else:
                      emit pre_node_id, Node
  27. Shimmy trick, part 2 (continuing the same class):

          def reduce(self, node_id, dist_list):
              Node = self.emit_precede_node(node_id)
              min_dist = sys.maxint
              for dist in dist_list:
                  if dist < min_dist:
                      min_dist = dist
              Node.set_value(min_dist)
              emit node_id, Node
  28.-39. [Diagram sequence: the same shortest-path computation expressed as BSP super steps on the example graph. Node distances relax from +∞ over super steps 1 through 5, after which every vertex has halted ("end").]
  40. ShortestPathVertex, a Pregel-style vertex program (a minimal super-step driver for this vertex appears after the transcript):

      class ShortestPathVertex:
          def compute(self, msgs):
              min_dist = 0 if self.is_source() else sys.maxint
              # get values from all incoming messages
              for msg in msgs:
                  min_dist = min(min_dist, msg.get_value())
              if min_dist < self.get_value():
                  # update current value (state)
                  self.set_current_value(min_dist)
                  # send the new value along every outgoing edge
                  out_edge_iterator = self.get_out_edge_iterator()
                  for out_edge in out_edge_iterator:
                      recipient = out_edge.get_other_element(self.get_id())
                      self.send_message(recipient.get_id(),
                                        min_dist + out_edge.get_distance())
              self.vote_to_halt()
  41. Pregel
  42. [Slide: the first page of a paper on HAMA by researchers at KAIST (Korea Advanced Institute of Science and Technology) and Sungkyunkwan University, South Korea; named authors include Seongwook Jin, Jin-Soo Kim, and Seungryoul Maeng, with contacts such as edwardyoon@apache.org, swseo@calab.kaist.ac.kr, jaehong@calab.kaist.ac.kr, swjin@calab.kaist.ac.kr, jinsookim@skku.edu, and maeng@calab.kaist.ac.kr.]

      Abstract: Various scientific computations have become so complex, and thus computation tools play an important role. In this paper, we explore the state-of-the-art framework providing high-level matrix computation primitives with MapReduce through the case study approach, and demonstrate these primitives with different computation engines to show the performance and scalability. We believe the opportunity for using MapReduce in scientific computation is even more promising than the success to date in the parallel systems literature.

      I. Introduction: As the cloud computing environment emerges, Google has introduced the MapReduce framework to accelerate parallel and distributed computing on more than a thousand inexpensive machines. Google has shown that the MapReduce framework is easy to use and provides massive scalability with extensive fault tolerance [2]. Especially, MapReduce fits well with complex data-intensive computations such as high-dimensional scientific simulation, machine learning, and data mining. Google and Yahoo! are known to operate dedicated clusters for MapReduce applications, each cluster consisting of several thousands of nodes.

      HAMA is a distributed framework on Hadoop for massive matrix and graph computations. HAMA aims at a powerful tool for various scientific applications, providing basic primitives for developers and researchers with simple APIs. HAMA is currently being incubated as one of the subprojects of Hadoop by the Apache Software Foundation [10]. Figure 1 illustrates the overall architecture of HAMA.

      Fig. 1. The overall architecture of HAMA: the HAMA Shell and HAMA API sit on top of the HAMA Core; a pluggable computation engine (MapReduce, BSP, Dryad); ZooKeeper for distributed locking; and storage systems (HBase, HDFS, RDBMS, plain files). http://wiki.apache.org/hama/Articles
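As a companion to slides 3 and 10-23: the sketch below is a minimal, single-process Python rendering of what the iterative MapReduce shortest-path computation does. Each pass of the loop plays the map phase (every reachable node relaxes its outgoing edges) and the reduce phase (every node keeps the minimum distance it received), and another pass, i.e. another full MapReduce job, is started only while some distance still changed. The function name and the tiny example graph are invented for illustration; the graph is not the A-G graph drawn in the deck.

    import sys

    def shortest_path_rounds(graph, source):
        # graph: {node_id: {neighbour_id: edge_weight}}
        dist = {node: sys.maxsize for node in graph}
        dist[source] = 0
        rounds = 0
        while True:
            # "map" phase: every reachable node emits (neighbour, dist + weight)
            emitted = {}
            for node, neighbours in graph.items():
                if dist[node] == sys.maxsize:
                    continue
                for nbr, weight in neighbours.items():
                    emitted.setdefault(nbr, []).append(dist[node] + weight)
            # "reduce" phase: every node keeps the minimum it received
            changed = False
            for node, candidates in emitted.items():
                best = min(candidates)
                if best < dist[node]:
                    dist[node] = best
                    changed = True
            rounds += 1
            if not changed:
                # one extra round is needed just to detect convergence,
                # mirroring the final "end" slide
                return dist, rounds

    graph = {"A": {"B": 1, "C": 5}, "B": {"D": 2}, "C": {"D": 1}, "D": {}}
    print(shortest_path_rounds(graph, "A"))
    # -> ({'A': 0, 'B': 1, 'C': 5, 'D': 3}, 3)

On a real cluster every pass through that loop is a separate Hadoop job, so the computation repeatedly pays the job start/shutdown and shuffle costs highlighted on slide 3; that overhead is exactly what the BSP-based systems in the second half of the deck avoid.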
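The point of the in-mapper combiner on slides 24-25 can be shown in a few lines: candidate distances for the same neighbour collapse to their minimum inside the mapper's buffer before anything is written to the shuffle. The keys and values below are made up for the illustration.

    buffer = {}

    def check_and_put(key, value):
        # keep only the smallest candidate distance seen for each key
        if key not in buffer or value < buffer[key]:
            buffer[key] = value

    for nbr, candidate in [("D", 6), ("D", 3), ("E", 9)]:
        check_and_put(nbr, candidate)

    print(buffer)  # {'D': 3, 'E': 9} -- two records shuffled instead of three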
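Slides 28-40 express the same computation as BSP super steps. The sketch below is a minimal, single-process super-step loop driving vertices whose compute() follows the logic of slide 40's ShortestPathVertex. It illustrates the execution model only and is not the Pregel, Giraph, or Hama API; every class and helper name is invented for the example, and vote_to_halt is approximated by re-running compute() only on vertices that received messages.

    import sys

    class Vertex:
        def __init__(self, vertex_id, out_edges, is_source=False):
            self.id = vertex_id
            self.out_edges = out_edges      # {neighbour_id: edge_distance}
            self.is_source = is_source
            self.value = sys.maxsize
            self.outbox = []                # (recipient_id, distance) messages

        def compute(self, msgs):
            # same logic as ShortestPathVertex.compute on slide 40
            min_dist = 0 if self.is_source else sys.maxsize
            for msg in msgs:
                min_dist = min(min_dist, msg)
            if min_dist < self.value:
                self.value = min_dist
                for nbr_id, distance in self.out_edges.items():
                    self.outbox.append((nbr_id, min_dist + distance))

    def run_supersteps(vertices):
        inbox = {v_id: [] for v_id in vertices}
        superstep = 0
        while True:
            superstep += 1
            # compute phase: vertices with pending messages run compute();
            # in the first super step every vertex runs once
            for v_id, vertex in vertices.items():
                if inbox[v_id] or superstep == 1:
                    vertex.compute(inbox[v_id])
            # barrier + message delivery: outboxes become the next inboxes
            inbox = {v_id: [] for v_id in vertices}
            any_message = False
            for vertex in vertices.values():
                for recipient, value in vertex.outbox:
                    inbox[recipient].append(value)
                    any_message = True
                vertex.outbox = []
            if not any_message:             # nothing in flight: all vertices halt
                return superstep

    vertices = {
        "A": Vertex("A", {"B": 1, "C": 5}, is_source=True),
        "B": Vertex("B", {"D": 2}),
        "C": Vertex("C", {"D": 1}),
        "D": Vertex("D", {}),
    }
    print(run_supersteps(vertices), {v_id: v.value for v_id, v in vertices.items()})
    # -> 3 {'A': 0, 'B': 1, 'C': 5, 'D': 3}

Here the whole graph lives in one process, but the shape matches the deck's argument: the graph structure stays in memory across super steps, only messages cross the barrier, and nothing comparable to a per-iteration MapReduce job start/shutdown is paid.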