Social Graph Data Analysis

  1. 1. 11 8 5
  2. • @kimuras • G(2007 )
  4. Agenda • Introduction • The past work • Introduction to GraphDB • Introduction to Neo4j • Introduction to analysis sample
  5. Introduction
  6. Motivation for social graph analysis
  7. [Chart: mixi member IDs by year; y-axis "# of member id" from 0 to 30,000,000, x-axis "year" from 2007 to 2011]
  8. What is Social Graph?
  20. Feed Back
  24. Approach for SG analysis • Feed Back
  25. The past work
  37. Relational Databases • Dump & Denormalization
      Relational tables:
        from_id  to_id
        1        2
        1        3
        2        3

        id  name    age
        1   Kimura  18
        2   kato    45
        3   ito     21
      After dump & denormalization:
        Key     value
        From:1  2,3
        From:2  3
        Prof:1  Kimuras,18
        Prof:2  Kato,45
      Callouts: reimplementation, maintenance cost, scalability
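The dump-and-denormalize step on the slide above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the talk: the class name `Denormalize`, the method `toKeyValue`, and the use of an in-memory map as a stand-in for the key-value store are all assumptions. Only the `From:<id>` key format and the sample rows come from the slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: the deck does not show how the dump was implemented.
final class Denormalize {

    /** Groups (from_id, to_id) rows into "From:<from_id>" -> list of to_ids. */
    static Map<String, List<Integer>> toKeyValue(int[][] edgeRows) {
        Map<String, List<Integer>> kv = new LinkedHashMap<>();
        for (int[] row : edgeRows) {
            // row[0] is from_id, row[1] is to_id
            kv.computeIfAbsent("From:" + row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return kv;
    }

    public static void main(String[] args) {
        // The three relationship rows shown on the slide.
        int[][] edgeRows = {{1, 2}, {1, 3}, {2, 3}};
        System.out.println(toKeyValue(edgeRows)); // prints {From:1=[2, 3], From:2=[3]}
    }
}
```

Every new kind of query needs another hand-written denormalization like this one, which is presumably what the "reimplementation" and "maintenance cost" callouts refer to.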
  38. Introduction to GraphDB
  46. What is graph • Vertex (node) • Edge • Undirected graph • Directed graph
  54. What is GraphDB
      Vertex (node)  ID: 1, NAME: kimura, PROP: Male, AGE: 18
      Vertex (node)  ID: 2, NAME: ITO, PROP: Female, AGE: 21
      Edge           ID: 3, LABEL: Like, Since: 2011/08/06, OutGoing: 2
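The property-graph model on the "What is GraphDB" slide can be written down as plain data structures. The sketch below is illustrative only: the class and field names are assumptions, and reading "OutGoing: 2" as "the edge points from vertex 1 to vertex 2" is an interpretation of the slide rather than something it states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model; a real graph database stores this on disk.
final class PropertyGraphSketch {

    /** A vertex has an ID and free-form key/value properties. */
    static final class Vertex {
        final long id;
        final Map<String, Object> props = new LinkedHashMap<>();
        Vertex(long id) { this.id = id; }
    }

    /** An edge has its own ID, a label, a direction, and its own properties. */
    static final class Edge {
        final long id;
        final String label;
        final Vertex from;
        final Vertex to;
        final Map<String, Object> props = new LinkedHashMap<>();
        Edge(long id, String label, Vertex from, Vertex to) {
            this.id = id;
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    /** Builds the two vertices and one edge shown on the slide. */
    static Edge example() {
        Vertex kimura = new Vertex(1);
        kimura.props.put("NAME", "kimura");
        kimura.props.put("PROP", "Male");
        kimura.props.put("AGE", 18);

        Vertex ito = new Vertex(2);
        ito.props.put("NAME", "ITO");
        ito.props.put("PROP", "Female");
        ito.props.put("AGE", 21);

        Edge like = new Edge(3, "Like", kimura, ito); // assumed direction: 1 -> 2
        like.props.put("Since", "2011/08/06");
        return like;
    }

    public static void main(String[] args) {
        Edge e = example();
        System.out.println(e.from.props.get("NAME") + " -[" + e.label + "]-> " + e.to.props.get("NAME"));
    }
}
```

The point of the model is that both vertices and edges carry properties, so relationship metadata such as "Since" lives on the edge itself instead of in a join table.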
  55. The implementations for GraphDB http://en.wikipedia.org/wiki/GraphDB
  56. Introduction to Neo4j
  57. GraphDB Neo4j • True ACID transactions • High availability • Scales to billions of nodes and relationships • High speed querying through traversals http://neo4j.org/
                    Single instance (GPLv3)    Multiple instance (AGPLv3)
      Embedded      EmbeddedGraphDatabase      HighlyAvailableGraphDatabase
      Standalone    Neo4j Server               Neo4j Server high availability mode
  63. My other favorite Neo4j features • RESTful APIs • Query language (Cypher) • Full indexing (Lucene) • Implemented graph algorithms (A*, Dijkstra, high-speed traversal) • Gremlin supported (like a query language) http://www.tinkerpop.com/post/4633229547/tinkerpop-graph-stack
  71. Introduction: simple Neo4j use case [Diagram: grid of Single node / Multi node by Embedded / Server, with "Analyses system" labels]
  72. Introduction to simple embedded Neo4j • Insert Vertices & make Relationships • Single node & Embedded • Traversal sample
  79. Insert vertices, make relationship
        import org.neo4j.graphdb.DynamicRelationshipType;
        import org.neo4j.graphdb.GraphDatabaseService;
        import org.neo4j.graphdb.Node;
        import org.neo4j.graphdb.Transaction;
        import org.neo4j.kernel.EmbeddedGraphDatabase;

        public final class InputVertex {
            public static void main(final String[] args) {
                // Open (or create) an embedded database under /tmp/neo4j.
                GraphDatabaseService graphDb = new EmbeddedGraphDatabase("/tmp/neo4j");
                Transaction tx = graphDb.beginTx();
                try {
                    Node firstNode = graphDb.createNode();
                    firstNode.setProperty("Name", "Kimura");
                    Node secondNode = graphDb.createNode();
                    secondNode.setProperty("Name", "Kato");
                    // Directed LIKE relationship from Kimura to Kato.
                    firstNode.createRelationshipTo(secondNode,
                            DynamicRelationshipType.withName("LIKE"));
                    tx.success();   // mark the transaction as successful
                } finally {
                    tx.finish();    // commits if success() was called, otherwise rolls back
                }
                graphDb.shutdown();
            }
        }
      Resulting graph on the slide: (ID: 1, NAME: kimura) -[ID: 3, Relation: Like]-> (ID: 2, NAME: Kato)
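  80. Batch Insert • Not thread-safe, non-transactional • But very fast!
        import java.util.HashMap;
        import java.util.Map;
        import org.neo4j.graphdb.DynamicRelationshipType;
        import org.neo4j.kernel.impl.batchinsert.BatchInserter;
        import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

        public final class Batch {
            public static void main(final String[] args) {
                BatchInserter inserter = new BatchInserterImpl("/tmp/neo4j",
                        BatchInserterImpl.loadProperties("/tmp/neo4j.props"));
                // The same map is reused for each node's properties.
                Map<String, Object> prop = new HashMap<String, Object>();
                prop.put("Name", "Kimura");
                prop.put("Age", 21);
                long node1 = inserter.createNode(prop);
                prop.put("Name", "Kato");
                prop.put("Age", 21);
                long node2 = inserter.createNode(prop);
                // Nodes are referenced by long IDs; null means no relationship properties.
                inserter.createRelationship(node1, node2,
                        DynamicRelationshipType.withName("LIKE"), null);
                inserter.shutdown();   // flushes everything to disk
            }
        }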
  86. Traversal sample
        import org.neo4j.graphdb.Direction;
        import org.neo4j.graphdb.DynamicRelationshipType;
        import org.neo4j.graphdb.GraphDatabaseService;
        import org.neo4j.graphdb.Node;
        import org.neo4j.graphdb.ReturnableEvaluator;
        import org.neo4j.graphdb.StopEvaluator;
        import org.neo4j.graphdb.TraversalPosition;
        import org.neo4j.graphdb.Traverser;
        import org.neo4j.graphdb.Traverser.Order;
        import org.neo4j.kernel.EmbeddedGraphDatabase;

        // The class name is illustrative; the slide shows only main().
        public final class TraversalSample {
            public static void main(final String[] args) {
                GraphDatabaseService graphDB = new EmbeddedGraphDatabase(args[0]);
                Node node = graphDB.getNodeById(1);
                Traverser friends = node.traverse(
                        Order.DEPTH_FIRST,                        // or BREADTH_FIRST
                        StopEvaluator.END_OF_GRAPH,               // or DEPTH_ONE
                        ReturnableEvaluator.ALL_BUT_START_NODE,   // or ALL, or a custom isReturnableNode()
                        DynamicRelationshipType.withName("LIKE"),
                        Direction.OUTGOING);                      // or INCOMING, BOTH
                for (Node nodeBuf : friends) {
                    TraversalPosition currentPosition = friends.currentPosition();
                    System.out.println(nodeBuf.getId() + " at depth " + currentPosition.depth());
                }
                graphDB.shutdown();
            }
        }
  92. Traversal sample • Order.BREADTH_FIRST [Diagram: traversal order, built up step by step]
  98. Traversal sample • Order.DEPTH_FIRST [Diagram: traversal order, built up step by step]
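Since the diagrams for the two traversal orders did not survive, here is a small stand-alone sketch of how breadth-first and depth-first visiting orders differ. It does not use the Neo4j API: the class `TraversalOrder`, its methods, and the five-node sample graph are all made up for illustration, and the start node is left out of the result to mirror `ReturnableEvaluator.ALL_BUT_START_NODE`. The order in which Neo4j itself visits sibling nodes may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, independent of Neo4j.
final class TraversalOrder {

    /** Visits nodes level by level, following outgoing edges only. */
    static List<Integer> breadthFirst(Map<Integer, List<Integer>> out, int start) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            int v = queue.poll();
            if (v != start) {
                order.add(v);
            }
            for (int w : out.getOrDefault(v, Collections.<Integer>emptyList())) {
                if (seen.add(w)) {
                    queue.add(w);
                }
            }
        }
        return order;
    }

    /** Follows each branch to its end before backtracking. */
    static List<Integer> depthFirst(Map<Integer, List<Integer>> out, int start) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        seen.add(start);
        visit(out, start, seen, order);
        return order;
    }

    private static void visit(Map<Integer, List<Integer>> out, int v,
                              Set<Integer> seen, List<Integer> order) {
        for (int w : out.getOrDefault(v, Collections.<Integer>emptyList())) {
            if (seen.add(w)) {
                order.add(w);
                visit(out, w, seen, order);
            }
        }
    }

    /** Sample graph: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 5. */
    static Map<Integer, List<Integer>> sampleGraph() {
        Map<Integer, List<Integer>> out = new HashMap<>();
        out.put(1, Arrays.asList(2, 3));
        out.put(2, Arrays.asList(4));
        out.put(3, Arrays.asList(5));
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> g = sampleGraph();
        System.out.println("BREADTH_FIRST: " + breadthFirst(g, 1)); // [2, 3, 4, 5]
        System.out.println("DEPTH_FIRST:   " + depthFirst(g, 1));   // [2, 4, 3, 5]
    }
}
```

For a "friends of friends" query, breadth-first order returns all direct friends before any second-hop node, which is usually the more natural choice when the traversal is cut off at a fixed depth.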
  99. Neoclipse sample http://wiki.neo4j.org/content/Neoclipse
  100. Experiment
  102. Experiment • mixi Neo4j • Machine: 24-core CPU, 65 GB memory • Neo4j: BatchInsert, community, embedded • Data • 1.5 60 • 513m17sec (about 8.6h)
  103. Network Dataset • Stanford Large Network Dataset Collection • SNAP has a wide variety of graph data: social networks, communication networks, citation networks, collaboration networks, web graphs, product co-purchasing networks, Internet peer-to-peer networks, road networks, autonomous systems graphs, signed networks, Wikipedia networks and metadata, Memetracker and Twitter http://snap.stanford.edu/data/index.html
  104. Introduction to Analysis Sample
  106. Architecture • Service • Database • Analyses • Visualization (Social Graph)
  107. Introduction Analyses Sample • Centrality • Clustering coefficient
  118. Centrality • = Pagerank
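The deck names PageRank as its centrality measure but does not show how it was computed. As a rough sketch only, the standard power-iteration form looks like the code below. The class name, the array-based graph representation, the damping factor of 0.85, the iteration count, and the uniform redistribution of rank from vertices without outgoing edges are all assumptions rather than details from the talk.

```java
import java.util.Arrays;

// Hypothetical helper: a plain power-iteration PageRank on a small in-memory graph.
final class PageRankSketch {

    /**
     * @param out        out[u] lists the targets of u's outgoing edges; vertices are 0..n-1
     * @param damping    probability of following an edge instead of jumping at random
     * @param iterations number of power-iteration steps
     * @return rank per vertex; the ranks sum to 1
     */
    static double[] pageRank(int[][] out, double damping, int iterations) {
        int n = out.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by vertices with no outgoing edges
            for (int u = 0; u < n; u++) {
                if (out[u].length == 0) {
                    dangling += rank[u];
                    continue;
                }
                double share = rank[u] / out[u].length;
                for (int v : out[u]) {
                    next[v] += share;
                }
            }
            for (int v = 0; v < n; v++) {
                // Random jump, plus followed links, plus an even share of the dangling rank.
                next[v] = (1.0 - damping) / n + damping * (next[v] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Edges 0 -> 1, 0 -> 2, 1 -> 2; vertex 2 has no outgoing edges.
        int[][] out = {{1, 2}, {2}, {}};
        System.out.println(Arrays.toString(pageRank(out, 0.85, 50)));
    }
}
```

In this three-vertex example the vertex with the most incoming links ends up with the highest rank, which is the intuition behind using PageRank as a centrality score for a social graph.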
  124. • = Vertex ( ) [Diagram: per-vertex values 2, 1, 1, 4, 2, 1, 2]
  125. mixi • 1000 • summary
       Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
       1.00   3.00     10.00   25.69  30.00    903.00
  126. mixi
  131. • ≒ • Examples: 0/3 = 0, 1/3, 2/3, 3/3 = 1
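The fractions 0/3, 1/3, 2/3 and 3/3 match the local clustering coefficient of a vertex with three neighbors, which has three possible neighbor pairs: the coefficient is the number of links that actually exist among the neighbors divided by the number of possible pairs, k(k-1)/2 for k neighbors. That reading is an assumption, because the slide text itself is lost. The sketch below computes this quantity for an undirected graph; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: local clustering coefficient on an undirected graph.
final class ClusteringCoefficient {

    /** Builds an undirected adjacency map from {a, b} pairs. */
    static Map<Integer, Set<Integer>> graph(int[][] edges) {
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return adj;
    }

    /** Links among v's neighbors divided by the number of neighbor pairs. */
    static double local(Map<Integer, Set<Integer>> adj, int v) {
        Set<Integer> neighborSet = adj.getOrDefault(v, Collections.<Integer>emptySet());
        List<Integer> nb = new ArrayList<>(neighborSet);
        int k = nb.size();
        if (k < 2) {
            return 0.0; // no neighbor pairs; defined as 0 here by convention
        }
        int links = 0;
        for (int i = 0; i < k; i++) {
            Set<Integer> ni = adj.getOrDefault(nb.get(i), Collections.<Integer>emptySet());
            for (int j = i + 1; j < k; j++) {
                if (ni.contains(nb.get(j))) {
                    links++;
                }
            }
        }
        return links / (k * (k - 1) / 2.0);
    }

    public static void main(String[] args) {
        // Vertex 0 has neighbors 1, 2, 3; links among the neighbors are added one at a time.
        System.out.println(local(graph(new int[][] {{0, 1}, {0, 2}, {0, 3}}), 0));
        System.out.println(local(graph(new int[][] {{0, 1}, {0, 2}, {0, 3}, {1, 2}}), 0));
        System.out.println(local(graph(new int[][] {{0, 1}, {0, 2}, {0, 3}, {1, 2}, {2, 3}}), 0));
        System.out.println(local(graph(new int[][] {{0, 1}, {0, 2}, {0, 3}, {1, 2}, {2, 3}, {1, 3}}), 0));
    }
}
```

A value near 1 means a user's friends mostly know each other (a tight group), while a value near 0 means the user connects otherwise unrelated people.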
  132. • 1000 • summary
       Min.   1st Qu.  Median  Mean    3rd Qu.  Max.
       0.00   0.00     0.1157  0.2071  0.2667   1.000
  135. • 25 0.08
  136. • 14 0.17
  137. • 10 0.68
  138. • 4 1
  139. Visualization Sample
  140. • 2hop Social Graph • Edge • Vertex • Gephi http://gephi.org/
  142. • Social Graph • GraphDB • Neo4j • R • Visualization
  143. Thanks!
