
Mining the social graph

An introduction to mining the social graph with Neo4j, R, and Gephi.



  1. Mining the Social Graph (mixi, Inc., Shunya Kimura)
  2. Introduction • Name: Shunya Kimura • Twitter: @kimuras • Job: data mining, software engineering (text mining, graph mining, search engines)
  3. Agenda • Introduction • Past work • Introduction to GraphDB • Introduction to Neo4j • Introduction to analysis samples
  4. Introduction
  5. Motivation for social graph analysis: working at a scale of millions of nodes and hundreds of millions of edges, and supporting a wide variety of graph algorithms by developing distributed processing technology. It is challenging.
  6. Number of users on mixi [Chart: number of member IDs per year, 2007 to 2011]
  7. What is a social graph?
  8. Feedback
  13. Approach for social graph (SG) analysis: feedback
  17. Past work
  18. • Friend recommendation • Community recommendation
  20. Relational databases
      • Friendship table (from_id, to_id): (1, 2), (1, 3), (2, 3)
      • Profile table (id, name, age): (1, Kimura, 18), (2, Kato, 45), (3, Ito, 21)
      • Dump & denormalization into a key-value store: From:1 → 2,3; From:2 → 3; Prof:1 → Kimura,18; Prof:2 → Kato,45
      • Steps: dump & denormalization, then reimplementation
      • Concerns: maintenance cost, scalability
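The dump & denormalization step can be sketched in plain Java as grouping the friendship rows by `from_id` and flattening each profile row into one record, producing the `From:<id>` and `Prof:<id>` keys shown on the slide. This is a minimal illustration under stated assumptions: the class and record names (`Denormalize`, `Friendship`, `Profile`) are hypothetical, an in-memory `TreeMap` stands in for the real key-value store, and records require Java 16+.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Hypothetical sketch: flatten the two relational tables into key-value records. */
public final class Denormalize {
    // One row of the friendship table (from_id, to_id)
    record Friendship(int fromId, int toId) {}
    // One row of the profile table (id, name, age)
    record Profile(int id, String name, int age) {}

    static Map<String, String> toKeyValue(List<Friendship> edges, List<Profile> profiles) {
        Map<String, String> kv = new TreeMap<>();
        // Group edge rows by from_id: "From:<id>" -> comma-separated to_ids
        Map<Integer, StringJoiner> adjacency = new TreeMap<>();
        for (Friendship f : edges) {
            adjacency.computeIfAbsent(f.fromId(), k -> new StringJoiner(","))
                     .add(String.valueOf(f.toId()));
        }
        adjacency.forEach((id, toIds) -> kv.put("From:" + id, toIds.toString()));
        // One record per profile row: "Prof:<id>" -> "name,age"
        for (Profile p : profiles) {
            kv.put("Prof:" + p.id(), p.name() + "," + p.age());
        }
        return kv;
    }

    public static void main(String[] args) {
        List<Friendship> edges = List.of(
                new Friendship(1, 2), new Friendship(1, 3), new Friendship(2, 3));
        List<Profile> profiles = List.of(
                new Profile(1, "Kimura", 18), new Profile(2, "Kato", 45), new Profile(3, "Ito", 21));
        System.out.println(toKeyValue(edges, profiles));
        // {From:1=2,3, From:2=3, Prof:1=Kimura,18, Prof:2=Kato,45, Prof:3=Ito,21}
    }
}
```

Every change in the source tables means regenerating these derived records, which is one way to read the maintenance-cost concern on the slide.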
  29. Introduction to GraphDB
  30. What is a graph? • Vertex (node) • Edge • Undirected graph • Directed graph
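The difference between an undirected and a directed graph shows up directly in how edges are stored. The sketch below uses adjacency sets: an undirected edge is recorded in both directions, while a directed edge is recorded only from its source. The class name `SimpleGraph` and its methods are hypothetical and are not part of Neo4j.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical adjacency-set graph that can be built as directed or undirected. */
public final class SimpleGraph {
    private final Map<Integer, Set<Integer>> adjacency = new TreeMap<>();
    private final boolean directed;

    SimpleGraph(boolean directed) {
        this.directed = directed;
    }

    void addEdge(int from, int to) {
        adjacency.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
        Set<Integer> back = adjacency.computeIfAbsent(to, k -> new TreeSet<>());
        if (!directed) {
            back.add(from); // an undirected edge is stored in both directions
        }
    }

    Set<Integer> neighbors(int vertex) {
        return adjacency.getOrDefault(vertex, Set.of());
    }

    public static void main(String[] args) {
        SimpleGraph undirected = new SimpleGraph(false);
        SimpleGraph directed = new SimpleGraph(true);
        for (SimpleGraph g : new SimpleGraph[] {undirected, directed}) {
            g.addEdge(1, 2);
            g.addEdge(1, 3);
        }
        System.out.println(undirected.neighbors(2)); // [1]
        System.out.println(directed.neighbors(2));   // []
    }
}
```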
  38. What is a GraphDB? Both vertices and edges carry properties:
      • Vertex ID: 1, NAME: kimura, PROP: Male, AGE: 18
      • Vertex ID: 2, NAME: ITO, PROP: Female, AGE: 21
      • Edge ID: 3, LABEL: Like, Since: 2011/08/06, OutGoing: 2
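The property-graph model on this slide can be sketched as two small record types: a vertex is an id plus arbitrary key/value properties, and an edge is a labelled, directed link between two vertex ids with properties of its own. The names `PropertyGraph`, `Vertex`, and `Edge` are hypothetical, records require Java 16+, and the edge is assumed to start at vertex 1 because the slide only states `OutGoing: 2`.

```java
import java.util.Map;

/** Hypothetical sketch of the property-graph model from the slide. */
public final class PropertyGraph {
    // A vertex is an id plus arbitrary key/value properties
    record Vertex(long id, Map<String, Object> props) {}

    // An edge is a directed, labelled link between two vertex ids that can carry its own properties
    record Edge(long id, String label, long from, long to, Map<String, Object> props) {}

    public static void main(String[] args) {
        Vertex kimura = new Vertex(1, Map.of("NAME", "kimura", "PROP", "Male", "AGE", 18));
        Vertex ito = new Vertex(2, Map.of("NAME", "ITO", "PROP", "Female", "AGE", 21));
        // Assumption: the Like edge starts at vertex 1 (the slide only states OutGoing: 2)
        Edge like = new Edge(3, "Like", kimura.id(), ito.id(), Map.of("Since", "2011/08/06"));
        System.out.println(kimura.props().get("NAME") + " -[" + like.label() + " since "
                + like.props().get("Since") + "]-> " + ito.props().get("NAME"));
    }
}
```

Unlike the relational layout, the relationship and its metadata (`Since`) live on the edge itself, so no join table is needed.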
  46. GraphDB implementations: http://en.wikipedia.org/wiki/GraphDB
  47. Introduction to Neo4j
  48. GraphDB Neo4j
      • True ACID transactions
      • High availability
      • Scales to billions of nodes and relationships
      • High-speed querying through traversals
      Deployment options:
      • Embedded: EmbeddedGraphDatabase (single instance, GPLv3); HighlyAvailableGraphDatabase (multiple instances, AGPLv3)
      • Standalone: Neo4j Server (single instance, GPLv3); Neo4j Server in high-availability mode (multiple instances, AGPLv3)
      http://neo4j.org/
  49. My other favorite Neo4j features
      • RESTful APIs
      • Query language (Cypher)
      • Full indexing (Lucene)
      • Built-in graph algorithms: A*, Dijkstra; high-speed traversal
      • Gremlin support (a graph traversal language, similar to a query language)
      http://www.tinkerpop.com/post/4633229547/tinkerpop-graph-stack
  55. Simple Neo4j use cases [Diagram: analysis systems placed on a grid of single node / multi node versus embedded / server deployments]
  63. Introduction to simple embedded Neo4j • Inserting vertices and creating relationships (single node, embedded) • Traversal sample
  64. Inserting vertices and creating a relationship (single node, embedded; Neo4j 1.x API). The slide's diagram shows the result: vertex ID 1 (NAME: kimura), vertex ID 2 (NAME: Kato), and relationship ID 3 (Like) pointing from the first vertex to the second.
```java
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public final class InputVertex {
    public static void main(final String[] args) {
        // Open (or create) an embedded database stored under /tmp/neo4j
        GraphDatabaseService graphDb = new EmbeddedGraphDatabase("/tmp/neo4j");
        Transaction tx = graphDb.beginTx();
        try {
            // Vertex 1: Name = Kimura
            Node firstNode = graphDb.createNode();
            firstNode.setProperty("Name", "Kimura");
            // Vertex 2: Name = Kato
            Node secondNode = graphDb.createNode();
            secondNode.setProperty("Name", "Kato");
            // Directed LIKE relationship from Kimura to Kato
            firstNode.createRelationshipTo(secondNode,
                    DynamicRelationshipType.withName("LIKE"));
            tx.success(); // mark the transaction for commit
        } finally {
            tx.finish(); // commits if success() was called, otherwise rolls back
        }
        graphDb.shutdown();
    }
}
```
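  71. Batch insert • Not thread-safe and non-transactional • But very fast!
```java
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public final class Batch {
    public static void main(final String[] args) {
        // Writes straight to the store files: single thread only, no transactions
        BatchInserter inserter = new BatchInserterImpl("/tmp/neo4j",
                BatchInserterImpl.loadProperties("/tmp/neo4j.props"));
        try {
            Map<String, Object> prop = new HashMap<String, Object>();
            prop.put("Name", "Kimura");
            prop.put("Age", 21);
            long node1 = inserter.createNode(prop); // returns the new node id

            prop = new HashMap<String, Object>(); // fresh property map for the second node
            prop.put("Name", "Kato");
            prop.put("Age", 21);
            long node2 = inserter.createNode(prop);

            // LIKE relationship from node1 to node2, without properties
            inserter.createRelationship(node1, node2,
                    DynamicRelationshipType.withName("LIKE"), null);
        } finally {
            inserter.shutdown(); // flushes everything to disk; required for a usable store
        }
    }
}
```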
  72. Traversal sample • You can specify the traversal criteria:
```java
import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.ReturnableEvaluator;
import org.neo4j.graphdb.StopEvaluator;
import org.neo4j.graphdb.TraversalPosition;
import org.neo4j.graphdb.Traverser;
import org.neo4j.graphdb.Traverser.Order;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public final class TraversalSample {
    public static void main(final String[] args) {
        GraphDatabaseService graphDB = new EmbeddedGraphDatabase(args[0]);
        Node node = graphDB.getNodeById(1);
        Traverser friends = node.traverse(
                Order.DEPTH_FIRST,                        // traversal order: DEPTH_FIRST or BREADTH_FIRST
                StopEvaluator.END_OF_GRAPH,               // termination condition: END_OF_GRAPH or DEPTH_ONE
                ReturnableEvaluator.ALL_BUT_START_NODE,   // nodes to return: ALL_BUT_START_NODE, ALL, or a custom isReturnableNode()
                DynamicRelationshipType.withName("LIKE"), // relationship type to follow
                Direction.OUTGOING);                      // edge direction: OUTGOING, INCOMING or BOTH
        for (Node nodeBuf : friends) {
            // Position of the traverser at the node just returned
            TraversalPosition currentPosition = friends.currentPosition();
            System.out.println(nodeBuf.getId() + " at depth " + currentPosition.depth());
        }
        graphDB.shutdown();
    }
}
```
  78. Traversal sample: Order.BREADTH_FIRST • Breadth-first search
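To make the visiting order behind `Order.BREADTH_FIRST` concrete, here is a minimal breadth-first search over an in-memory adjacency list, independent of Neo4j. A FIFO queue ensures that all nodes at depth 1 are returned before any node at depth 2. The class name, method, and example graph are hypothetical, `List.of`/`Map.of` need Java 9+, and the start node is excluded to mirror `ReturnableEvaluator.ALL_BUT_START_NODE`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of breadth-first traversal order, independent of Neo4j. */
public final class BreadthFirst {
    // Returns the vertices reachable from start, nearest first (start excluded)
    static List<Integer> bfs(Map<Integer, List<Integer>> adjacency, int start) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            int current = queue.poll(); // FIFO: finish one depth level before the next
            if (current != start) {
                order.add(current);
            }
            for (int next : adjacency.getOrDefault(current, List.of())) {
                if (visited.add(next)) { // add() returns false if already seen
                    queue.add(next);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical graph: 1 -> 2, 3; 2 -> 4, 5; 3 -> 6
        Map<Integer, List<Integer>> adjacency = Map.of(
                1, List.of(2, 3), 2, List.of(4, 5), 3, List.of(6));
        System.out.println(bfs(adjacency, 1)); // [2, 3, 4, 5, 6]
    }
}
```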
  84. Traversal sample: Order.DEPTH_FIRST • Depth-first search
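For comparison with `Order.DEPTH_FIRST`, this minimal sketch follows each branch as deep as possible before backtracking. It uses a hypothetical class and the same hypothetical graph as a breadth-first example would, so the two orders are easy to compare. The order among siblings here simply follows the adjacency list; Neo4j's own ordering of relationships may differ. Recursion keeps the sketch short, but on a graph with millions of nodes an explicit stack would be needed to avoid a `StackOverflowError`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of depth-first traversal order, independent of Neo4j. */
public final class DepthFirst {
    static List<Integer> dfs(Map<Integer, List<Integer>> adjacency, int start) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        visited.add(start);
        visit(adjacency, start, visited, order);
        return order; // start excluded, as with ALL_BUT_START_NODE
    }

    // Follow each branch as deep as possible before backtracking
    private static void visit(Map<Integer, List<Integer>> adjacency, int node,
                              Set<Integer> visited, List<Integer> order) {
        for (int next : adjacency.getOrDefault(node, List.of())) {
            if (visited.add(next)) {
                order.add(next);
                visit(adjacency, next, visited, order);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical graph: 1 -> 2, 3; 2 -> 4, 5; 3 -> 6
        Map<Integer, List<Integer>> adjacency = Map.of(
                1, List.of(2, 3), 2, List.of(4, 5), 3, List.of(6));
        System.out.println(dfs(adjacency, 1)); // [2, 4, 5, 3, 6]
    }
}
```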
  90. Neoclipse sample: http://wiki.neo4j.org/content/Neoclipse
  91. Experiment
  92. Experiment
      • Store mixi's social graph in Neo4j
      • Conditions
        • Machine: 24-core CPU, 65 GB memory
        • Neo4j: BatchInsert, community edition, embedded
      • Data
        • Number of nodes: 15 million; number of edges: 600 million
      • Processing time: 513 min 17 s (about 8.6 h)
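As a rough sanity check on the reported time, the arithmetic below converts it to seconds and to an average insert rate. It counts only the 600 million edges and ignores the 15 million node inserts, so it is an approximation derived from the slide's figures, not a measured throughput.

$$
513\ \text{min}\ 17\ \text{s} = 513 \times 60 + 17 = 30{,}797\ \text{s} \approx 8.6\ \text{h},
\qquad
\frac{600 \times 10^{6}\ \text{edges}}{30{,}797\ \text{s}} \approx 1.95 \times 10^{4}\ \text{edges/s}
$$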
  94. Network datasets
      • The Stanford Large Network Dataset Collection (SNAP) has a wide variety of graph data: social networks, communication networks, citation networks, collaboration networks, web graphs, product co-purchasing networks, internet peer-to-peer networks, road networks, autonomous systems graphs, signed networks, Wikipedia networks and metadata, Memetracker and Twitter
      • http://snap.stanford.edu/data/index.html
  95. Introduction to analysis samples
  96. Architecture: Service, Database, Analysis, Visualization (Social Graph)
  98. Introduction to analysis samples • Centrality • Clustering coefficient
  99. Centrality • Centrality measures the importance of each node • Examples: closeness centrality, PageRank, degree centrality, betweenness centrality, eigenvector centrality, centralization
  110. Degree centrality • The simplest measure • Counts the number of edges of each node, i.e., the number of friends [Figure: example graph with each node labelled by its degree]
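Degree centrality as described above is just a count per node. In an undirected friendship list, each edge adds one to the degree of both endpoints. This is a minimal sketch, assuming each friendship appears exactly once in the edge list (duplicates would be double-counted); the class name and example edges are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: degree centrality from an undirected edge list. */
public final class DegreeCentrality {
    // Each undirected edge (friendship) adds one to the degree of both endpoints
    static Map<Integer, Integer> degrees(int[][] edges) {
        Map<Integer, Integer> degree = new TreeMap<>();
        for (int[] edge : edges) {
            degree.merge(edge[0], 1, Integer::sum);
            degree.merge(edge[1], 1, Integer::sum);
        }
        return degree;
    }

    public static void main(String[] args) {
        // Hypothetical friendships: 1-2, 1-3, 1-4, 2-3
        int[][] edges = {{1, 2}, {1, 3}, {1, 4}, {2, 3}};
        System.out.println(degrees(edges)); // {1=3, 2=2, 3=2, 4=1}
    }
}
```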
  116. Degree distribution of mixi
      • Random sample of 1,000 users
      • Summary of the degree distribution: Min. 1.00, 1st Qu. 3.00, Median 10.00, Mean 25.69, 3rd Qu. 30.00, Max. 903.00
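The summary row follows the layout of R's `summary()`, whose quartiles use R's default quantile definition (type 7: linear interpolation between order statistics). The sketch below reproduces that computation in Java for readers who want the same numbers outside R. The class name and the five input values are hypothetical, not the mixi sample; they are chosen so that, as in the mixi data, one very high-degree user pulls the mean well above the median.

```java
import java.util.Arrays;

/** Hypothetical sketch of an R-style summary(): min, quartiles, mean, max. */
public final class Summary {
    // Quantile by linear interpolation between order statistics (R's default, type 7)
    static double quantile(double[] sorted, double p) {
        double h = (sorted.length - 1) * p;
        int lo = (int) Math.floor(h);
        int hi = (int) Math.ceil(h);
        return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
    }

    // Returns {Min, 1st Qu., Median, Mean, 3rd Qu., Max}; values must be non-empty
    static double[] summary(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double mean = Arrays.stream(sorted).average().orElse(Double.NaN);
        return new double[] {
            sorted[0], quantile(sorted, 0.25), quantile(sorted, 0.5),
            mean, quantile(sorted, 0.75), sorted[sorted.length - 1]
        };
    }

    public static void main(String[] args) {
        // Hypothetical degrees of five sampled users (not the mixi data)
        double[] degrees = {30, 1, 903, 10, 3};
        System.out.println(Arrays.toString(summary(degrees)));
        // [1.0, 3.0, 10.0, 189.4, 30.0, 903.0]
    }
}
```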
  117. Degree distribution of mixi
  118. Clustering coefficient • The density of the network around a node • ≈ how densely the node's friends are connected to each other • Examples where three links are possible among the neighbors: 0/3 = 0 (min), 1/3, 2/3, 3/3 = 1 (max)
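The fractions on this slide are the local clustering coefficient: for a node v with k neighbors, count the links that actually exist among those neighbors and divide by the number of possible pairs, C(v) = links / (k(k-1)/2). With k = 3 there are 3 possible pairs, which gives the 0/3 to 3/3 range shown. The sketch below is a minimal Java version of that definition on an undirected graph. The class name and example graph are hypothetical, and returning 0 for nodes with fewer than two neighbors is a convention chosen here (some tools report NaN instead).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the local clustering coefficient on an undirected graph. */
public final class ClusteringCoefficient {
    static void addEdge(Map<Integer, Set<Integer>> adjacency, int a, int b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // C(v) = (links among v's neighbors) / (k(k-1)/2), where k is v's degree
    static double local(Map<Integer, Set<Integer>> adjacency, int v) {
        List<Integer> neighbors = new ArrayList<>(adjacency.getOrDefault(v, Set.of()));
        int k = neighbors.size();
        if (k < 2) {
            return 0.0; // undefined for k < 2; reported as 0 here by convention
        }
        int links = 0;
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                if (adjacency.get(neighbors.get(i)).contains(neighbors.get(j))) {
                    links++;
                }
            }
        }
        return links / (k * (k - 1) / 2.0);
    }

    public static void main(String[] args) {
        // Hypothetical: node 0 has neighbors 1, 2, 3, and only 1-2 are friends
        Map<Integer, Set<Integer>> adjacency = new HashMap<>();
        addEdge(adjacency, 0, 1);
        addEdge(adjacency, 0, 2);
        addEdge(adjacency, 0, 3);
        addEdge(adjacency, 1, 2);
        System.out.println(local(adjacency, 0)); // 0.3333333333333333
    }
}
```

The pairwise check is O(k²) per node, which matters for users with hundreds of friends (the sampled maximum degree was 903).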
  123. Clustering coefficient
      • Random sample of 1,000 users
      • Summary of the clustering coefficient: Min. 0.00, 1st Qu. 0.00, Median 0.1157, Mean 0.2071, 3rd Qu. 0.2667, Max. 1.000
  124. Clustering coefficient
  126. Sample user with a low clustering coefficient • Degree 25, clustering coefficient 0.08
  127. Sample user with a medium clustering coefficient • Degree 14, clustering coefficient 0.17
  128. Sample user with a high clustering coefficient • Degree 10, clustering coefficient 0.68
  129. Sample user with the maximum clustering coefficient • Degree 4, clustering coefficient 1
  130. Visualization sample
  131. • Visualize my social graph on mixi • Edge weighting: amount of communication (color, thickness) • Vertex weighting: clustering coefficient (color, thickness) • Visualization tool: Gephi (http://gephi.org/)
  132. • Motivation for social graph mining • Overview of GraphDB • Introduction to Neo4j • Sample graph analyses with R • Introduction to the visualization tool Gephi
  133. Thanks!
