
스사모 Tech Talk - GraphX


Slides from a tech talk at 스사모 (스파크 사용자 모임, the Spark user group).
https://www.facebook.com/groups/sparkkoreauser

Published in: Technology


  1. Graph Analytics in Spark. 2015/05/26. 김상우 (lightzeus@gmail.com), 정향민 (jhyangmin@gmail.com)
  2. Social networks: relationships between people, and between people and social networking services (SNS)
  3. World Wide Web: web pages connected by hyperlinks
  4. Graph: Vertices + Edges → Graph. Vertex: a person, a web page, … Edge: a personal relationship, a hyperlink, … Degree: the number of edges connected to a vertex; edges may be directed. [Figure: a vertex labeled "Degree: 4"]
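As a minimal sketch of the degree definition above, the plain-Scala snippet below counts edges per vertex without using Spark. The object name `DegreeSketch` and its helper functions are hypothetical; the edge list is the one used in the deck's later GraphX example, so the total degrees can be compared with the `graph.degrees` output shown there.

```scala
object DegreeSketch {
  // Directed edges as (source, destination) pairs, taken from the deck's
  // GraphX example (3 -> 7, 5 -> 3, 2 -> 5, 5 -> 7).
  val edges: Seq[(Long, Long)] = Seq((3L, 7L), (5L, 3L), (2L, 5L), (5L, 7L))

  // Total degree: every edge counts once for each of its two endpoints.
  def degrees(es: Seq[(Long, Long)]): Map[Long, Int] =
    es.flatMap { case (src, dst) => Seq(src, dst) }
      .groupBy(identity)
      .map { case (v, occurrences) => v -> occurrences.size }

  // With directed edges, the degree splits into in-degree and out-degree.
  def inDegrees(es: Seq[(Long, Long)]): Map[Long, Int] =
    es.groupBy(_._2).map { case (v, incoming) => v -> incoming.size }

  def outDegrees(es: Seq[(Long, Long)]): Map[Long, Int] =
    es.groupBy(_._1).map { case (v, outgoing) => v -> outgoing.size }
}
```

Vertices that have no incoming (or no outgoing) edges simply do not appear in the in-degree (or out-degree) map.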
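  5. PageRank: a method for finding the important vertices in a graph. It identifies the most important website under the assumption that a user reaches sites either by following hyperlinks or by jumping to a site at random.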
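The random-surfer idea can be sketched in plain Scala as a fixed number of update rounds. This is an illustration under stated assumptions, not GraphX's implementation: the object name `PageRankSketch`, the parameter names, and the unnormalized update rule `rank = resetProb + (1 - resetProb) * sum(rank(src) / outDegree(src))` are choices made here for brevity, so the ranks do not sum to 1 and need not reproduce the numbers in the deck's GraphX output.

```scala
object PageRankSketch {
  // resetProb models the surfer jumping to a random page;
  // (1 - resetProb) models the surfer following a hyperlink.
  def pageRank(vertices: Set[Long],
               edges: Seq[(Long, Long)],
               resetProb: Double = 0.15,
               iterations: Int = 20): Map[Long, Double] = {
    val outDegree: Map[Long, Int] =
      edges.groupBy(_._1).map { case (v, es) => v -> es.size }

    // Every vertex starts with rank 1.0 (an assumption of this sketch).
    var ranks: Map[Long, Double] = vertices.map(v => v -> 1.0).toMap

    for (_ <- 1 to iterations) {
      // Each source splits its current rank evenly over its outgoing edges.
      val incoming: Map[Long, Double] = edges
        .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
        .groupBy(_._1)
        .map { case (dst, contribs) => dst -> contribs.map(_._2).sum }

      ranks = vertices.map { v =>
        v -> (resetProb + (1 - resetProb) * incoming.getOrElse(v, 0.0))
      }.toMap
    }
    ranks
  }
}
```

On the four-edge graph from the deck's GraphX example, a vertex with no incoming edges keeps the minimum rank `resetProb`, while the vertex that receives links from two others ends up with the highest rank.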
  7. Triangle Counting: counts the triangles that include the selected vertex.
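A minimal plain-Scala sketch of per-vertex triangle counting follows. It treats edges as undirected and intersects neighbor sets; the object name `TriangleSketch` is hypothetical, and this is not a description of how GraphX computes the count internally.

```scala
object TriangleSketch {
  def triangleCounts(edges: Seq[(Long, Long)]): Map[Long, Int] = {
    // Undirected neighbor sets; self-loops are ignored.
    val neighbors: Map[Long, Set[Long]] = edges
      .filter { case (a, b) => a != b }
      .flatMap { case (a, b) => Seq(a -> b, b -> a) }
      .groupBy(_._1)
      .map { case (v, pairs) => v -> pairs.map(_._2).toSet }

    neighbors.map { case (v, ns) =>
      // For each neighbor u, count the neighbors that v and u share.
      // toSeq matters here: mapping over a Set would drop equal counts.
      val shared = ns.toSeq.map(u => (ns & neighbors(u)).size).sum
      // Each triangle at v is counted twice, once from each other corner.
      v -> shared / 2
    }
  }
}
```

For the edge list in the deck's GraphX example (3-7, 5-3, 2-5, 5-7), vertices 3, 5, and 7 form one triangle and vertex 2 belongs to none, which agrees with the `triangleCount` output shown in the deck.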
  9. How graphs are analyzed: graph analysis looks only at the selected vertex and that vertex's neighbors.
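The "vertex plus its neighbors" pattern can be sketched as a small send-and-merge helper in plain Scala: each edge sends a message derived from its source vertex, and each destination merges the messages it receives. The names `NeighborhoodSketch` and `aggregateFromSources` are hypothetical, and the vertex names and edges are borrowed from the deck's GraphX example.

```scala
object NeighborhoodSketch {
  // For every edge, send a message computed from the source vertex's
  // attribute to the destination, then merge the messages per destination.
  def aggregateFromSources[A, M](attrs: Map[Long, A], edges: Seq[(Long, Long)])
                                (send: A => M)
                                (merge: (M, M) => M): Map[Long, M] =
    edges
      .map { case (src, dst) => dst -> send(attrs(src)) }
      .groupBy(_._1)
      .map { case (dst, msgs) => dst -> msgs.map(_._2).reduce(merge) }

  // Example data: user names and directed edges from the deck's example.
  val names: Map[Long, String] =
    Map(3L -> "rxin", 7L -> "jgonzal", 5L -> "franklin", 2L -> "istoica")
  val edges: Seq[(Long, Long)] = Seq((3L, 7L), (5L, 3L), (2L, 5L), (5L, 7L))

  // For each vertex, the names of the vertices that point at it.
  val inNeighborNames: Map[Long, Set[String]] =
    aggregateFromSources(names, edges)(name => Set(name))(_ ++ _)
}
```

Because the result for a vertex depends only on its direct neighbors, the per-edge work can be done independently and merged afterwards, which is what makes this style of analysis easy to parallelize.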
  11. Table vs. Graph. [Figure: a table whose rows feed a result, next to a dependency graph.] Source: UC Berkeley Lab
  12. Graph storage format. [Figure: a property graph with vertices A to F.] Source: UC Berkeley Lab
  13. Graph storage format. [Figure: the property graph's vertices A to F stored in a vertex table (RDD).] Source: UC Berkeley Lab
  14. Graph storage format. [Figure: the property graph stored as a vertex table (RDD) and an edge table (RDD), split into partitions Part. 1 and Part. 2. Edge table rows: (A, B), (A, C), (C, D), (B, C), (A, E), (A, F), (E, F), (E, D).] Source: UC Berkeley Lab
  15. Graph storage format. [Figure: the same vertex table (RDD) and edge table (RDD) in partitions Part. 1 and Part. 2, plus a routing table (RDD) that lists, for each vertex A to F, the partition numbers 1 and/or 2.] Source: UC Berkeley Lab
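The relationship between the edge table and the routing table can be sketched in plain Scala. The object and function names are hypothetical, and the split of the eight edges into two partitions (first four, last four) is an assumption; it was chosen because it yields the mirror-cache contents the deck shows on its analysis slides (B, C, D, A and D, E, F, A).

```scala
object RoutingTableSketch {
  type Edge = (String, String) // (source, destination)

  // Edge table split into two partitions (the split is an assumption).
  val edgePartitions: Map[Int, Seq[Edge]] = Map(
    1 -> Seq("A" -> "B", "A" -> "C", "C" -> "D", "B" -> "C"),
    2 -> Seq("A" -> "E", "A" -> "F", "E" -> "F", "E" -> "D"))

  // Routing table: for each vertex, the edge partitions that reference it.
  def routingTable(parts: Map[Int, Seq[Edge]]): Map[String, Set[Int]] =
    parts.toSeq
      .flatMap { case (pid, es) =>
        es.flatMap { case (src, dst) => Seq(src -> pid, dst -> pid) }
      }
      .groupBy(_._1)
      .map { case (v, refs) => v -> refs.map(_._2).toSet }

  // Mirror cache: the vertices each edge partition needs a local copy of.
  def mirrorCache(parts: Map[Int, Seq[Edge]]): Map[Int, Set[String]] =
    parts.map { case (pid, es) =>
      pid -> es.flatMap { case (src, dst) => Seq(src, dst) }.toSet
    }
}
```

Under this split, vertices A and D are referenced by both partitions, so a change to either one would have to be shipped to two places, while B, C, E, and F are each needed by only one partition.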
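  16. GraphX analysis process. [Figure: the vertex table (RDD) and the edge table (RDD) with rows (A, B), (A, C), (C, D), (B, C), (A, E), (A, F), (E, F), (E, D); the two edge partitions hold mirror caches containing B, C, D, A and D, E, F, A.] Source: UC Berkeley Lab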
  17. GraphX analysis process. [Figure: the same vertex table, edge table, and mirror caches, with two entries marked "Change".] Source: UC Berkeley Lab
  18. GraphX analysis process. [Figure: the same layout with the steps labeled Scan, Change, and Local Aggregate (one per edge partition), followed by vertices B, C, D, F and the vertex table (RDD).] Source: UC Berkeley Lab
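One way to read the Scan and Local Aggregate steps is as a two-phase aggregation: each edge partition is scanned and aggregated on its own, and only the partial results are merged afterwards. The plain-Scala sketch below illustrates that reading by computing in-degrees; the names `LocalAggregateSketch`, `localAggregate`, and `mergePartials` are hypothetical, and the assignment of edges to partitions is an assumption rather than something the slides state.

```scala
object LocalAggregateSketch {
  type Edge = (String, String) // (source, destination)

  // Two edge partitions (assumed split of the deck's eight edges).
  val partitions: Seq[Seq[Edge]] = Seq(
    Seq("A" -> "B", "A" -> "C", "C" -> "D", "B" -> "C"),
    Seq("A" -> "E", "A" -> "F", "E" -> "F", "E" -> "D"))

  // Phase 1: scan one partition and count incoming edges per destination.
  def localAggregate(edges: Seq[Edge]): Map[String, Int] =
    edges.groupBy(_._2).map { case (dst, es) => dst -> es.size }

  // Phase 2: merge the per-partition partial counts into one result.
  def mergePartials(partials: Seq[Map[String, Int]]): Map[String, Int] =
    partials.flatten
      .groupBy(_._1)
      .map { case (v, counts) => v -> counts.map(_._2).sum }

  val inDegrees: Map[String, Int] = mergePartials(partitions.map(localAggregate))
}
```

Only vertex D receives partial counts from both partitions here, so aggregating locally first keeps the amount of data that has to be combined across partitions small.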
  19. GraphX example (1/3). Vertex IDs must be numeric: GraphX's VertexId type is a 64-bit Long. Each edge consists of a source, a destination, and an attribute.
  20. GraphX example (2/3)

          import org.apache.spark._
          import org.apache.spark.graphx._
          import org.apache.spark.rdd.RDD

          // Create the vertex RDD: (vertex ID, (name, role)).
          // sc is the SparkContext provided by spark-shell.
          val users: RDD[(VertexId, (String, String))] =
            sc.parallelize(Array(
              (3L, ("rxin", "student")),
              (7L, ("jgonzal", "postdoc")),
              (5L, ("franklin", "prof")),
              (2L, ("istoica", "prof"))))

          // Create the edge RDD: Edge(source, destination, attribute).
          val relationships: RDD[Edge[String]] =
            sc.parallelize(Array(
              Edge(3L, 7L, "collab"),
              Edge(5L, 3L, "advisor"),
              Edge(2L, 5L, "colleague"),
              Edge(5L, 7L, "pi")))

          // Build the graph from the two RDDs.
          val graph = Graph(users, relationships)
  21. GraphX example (3/3)

          graph.degrees.collect
          Array[(org.apache.spark.graphx.VertexId, Int)] = Array((2,1), (3,2), (5,3), (7,2))

          graph.edges.collect
          Array[org.apache.spark.graphx.Edge[String]] = Array(Edge(3,7,collab), Edge(5,3,advisor), Edge(2,5,colleague), Edge(5,7,pi))

          graph.vertices.collect
          Array[(org.apache.spark.graphx.VertexId, (String, String))] = Array((2,(istoica,prof)), (3,(rxin,student)), (5,(franklin,prof)), (7,(jgonzal,postdoc)))

          graph.pageRank(0.1, 0.15).vertices.collect
          Array[(org.apache.spark.graphx.VertexId, Double)] = Array((2,0.15), (3,0.2679375), (5,0.2775), (7,0.3954375))

          graph.triangleCount.vertices.collect
          Array[(org.apache.spark.graphx.VertexId, Int)] = Array((2,0), (3,1), (5,1), (7,1))
  22. PageRank Benchmark. [Figure: benchmark chart, annotated "Good!"] Source: UC Berkeley Lab
  23. Graph-parallel analysis
      • Collaborative Filtering: Alternating Least Squares, Stochastic Gradient Descent, Tensor Factorization
      • Graph Analytics: PageRank, Triangle-Counting, Shortest Path
      • Community Detection: Triangle-Counting, K-core Decomposition, K-Truss, Label Propagation
      • Classification: Neural Networks
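  24. GraphX
      • Analyzes graphs made up of vertices and edges: relationship analysis; graph analysis runs faster than on Hadoop or naive Spark
      • Supports analyses more complex than Map/Reduce
      • Future plans: a wider range of algorithms; analysis of graphs that change over time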
  25. Q&A
