Mining the Social Graph. mixi.inc, Shunya Kimura
An introduction to mining the social graph with Neo4j, R, and Gephi.

  • Speaker note: TC and MySQL are both still in active use, and I love them.
  • Transcript of "Mining the social graph"

    1. Mining the Social Graph. mixi.inc, Shunya Kimura
    2. Introduction
        • Name: Shunya Kimura
        • Twitter: @kimuras
        • Job: data mining and software engineering (text mining, graph mining, search engines)
    3. Agenda
        • Introduction
        • The past work
        • Introduction to GraphDB
        • Introduction to Neo4j
        • Introduction to analysis sample
    4. Introduction
    5. Motivation for social graph analysis
        • Testing at the scale of millions of nodes and hundreds of millions of edges.
        • The diversity of graph algorithms, driven by the development of distributed processing technology.
        • It is challenging.
    6. Number of users on mixi
        [Chart: number of member IDs (y-axis, 0 to 30,000,000) by year (x-axis, 2007 to 2011)]
    7. What is a social graph?
    8-12. Feedback
    13-16. Approach for SG (social graph) analysis: feedback
    17. The past work
    18-19. Past work
        • Friend recommendation
        • Community recommendation
    20-28. Relational Databases
        Edge table:
            from_id | to_id
            1       | 2
            1       | 3
            2       | 3
        Profile table:
            id | name   | age
            1  | Kimura | 18
            2  | kato   | 45
            3  | ito    | 21
        Dump & denormalization into key-value pairs (slides 21-28):
            Key    | Value
            From:1 | 2,3
            From:2 | 3
            Prof:1 | Kimura,18 (shown as "Kimuras,18" from slide 25 onward)
            Prof:2 | Kato,45
        Labels added on slides 26-28: reimplementation, maintenance cost, scalability
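The dump-and-denormalize step on these slides can be sketched in plain Java. This is a minimal illustration, not the code used at mixi: the class name `EdgeDenormalizer` and its method are hypothetical, and an in-memory map stands in for the key-value store. It groups the rows of the edge table by `from_id` and produces the `From:<id>` entries shown in the key-value table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns edge-table rows into "From:<id>" key-value pairs. */
final class EdgeDenormalizer {

    /** Each row is {from_id, to_id}. Returns e.g. "From:1" -> "2,3". */
    static Map<String, String> denormalize(int[][] edgeRows) {
        // Group the to_id values by their from_id, preserving input order.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (int[] row : edgeRows) {
            grouped.computeIfAbsent("From:" + row[0], k -> new ArrayList<>()).add(row[1]);
        }
        // Serialize each adjacency list as a comma-separated value.
        Map<String, String> store = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            StringBuilder value = new StringBuilder();
            for (int id : entry.getValue()) {
                if (value.length() > 0) {
                    value.append(',');
                }
                value.append(id);
            }
            store.put(entry.getKey(), value.toString());
        }
        return store;
    }

    public static void main(String[] args) {
        int[][] rows = {{1, 2}, {1, 3}, {2, 3}};
        System.out.println(denormalize(rows)); // prints {From:1=2,3, From:2=3}
    }
}
```

Every new query shape needs another dump format like this one, which is one way to read the "reimplementation" and "maintenance cost" labels on the slides.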
    29. Introduction to GraphDB
    30-37. What is a graph?
        • Vertex (node)
        • Edge
        • Undirected graph (slide 33)
        • Directed graph (slide 37)
    38-45. What is GraphDB?
        • Vertex (node) ID: 1
            NAME: kimura, PROP: Male, AGE: 18
        • Vertex (node) ID: 2
            NAME: ITO, PROP: Female, AGE: 21
        • Edge ID: 3
            LABEL: Like, Since: 2011/08/06, OutGoing: 2
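The property-graph model on these slides (vertices and edges that both carry an ID and key-value properties) can be sketched with plain Java classes. This is only an illustration of the data model, not how Neo4j stores data. The class names `PropertyGraphSketch`, `Vertex`, and `Edge` are hypothetical, and the values are the ones shown on the slides.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the property graph drawn on the slides. */
final class PropertyGraphSketch {

    static final class Vertex {
        final long id;
        final Map<String, Object> properties = new HashMap<>();

        Vertex(long id) {
            this.id = id;
        }
    }

    static final class Edge {
        final long id;
        final String label;
        final Vertex from; // tail of the directed edge
        final Vertex to;   // head of the directed edge ("OutGoing: 2" on the slide)
        final Map<String, Object> properties = new HashMap<>();

        Edge(long id, String label, Vertex from, Vertex to) {
            this.id = id;
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    /** Builds the two vertices and the single "Like" edge from the slides. */
    static Edge slideExample() {
        Vertex kimura = new Vertex(1);
        kimura.properties.put("NAME", "kimura");
        kimura.properties.put("PROP", "Male");
        kimura.properties.put("AGE", 18);

        Vertex ito = new Vertex(2);
        ito.properties.put("NAME", "ITO");
        ito.properties.put("PROP", "Female");
        ito.properties.put("AGE", 21);

        Edge like = new Edge(3, "Like", kimura, ito);
        like.properties.put("Since", "2011/08/06");
        return like;
    }

    public static void main(String[] args) {
        Edge e = slideExample();
        System.out.println(e.from.properties.get("NAME") + " -[" + e.label + "]-> "
                + e.to.properties.get("NAME")); // prints kimura -[Like]-> ITO
    }
}
```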
    46. GraphDB implementations
        http://en.wikipedia.org/wiki/GraphDB
    47. Introduction to Neo4j
    48. GraphDB Neo4j
        • True ACID transactions
        • High availability
        • Scales to billions of nodes and relationships
        • High-speed querying through traversals
                   | Single instance (GPLv3) | Multiple instance (AGPLv3)
        Embedded   | EmbeddedGraphDatabase   | HighlyAvailableGraphDatabase
        Standalone | Neo4j Server            | Neo4j Server high availability mode
        http://neo4j.org/
    49-54. My other favorite Neo4j features
        • RESTful APIs
        • Query language (Cypher)
        • Full indexing
            – Lucene
        • Implemented graph algorithms
            – A*, Dijkstra
            – High-speed traversal
        • Gremlin supported
            – Like a query language
        http://www.tinkerpop.com/post/4633229547/tinkerpop-graph-stack
    55-62. Introduction to simple Neo4j use cases
        [2x2 matrix: columns "Single node" and "Multi node", rows "Embedded" and "Server"; "Analysis system" labels are placed in the cells as the slides build up]
    63. Introduction to simple embedded Neo4j
        • Insert vertices & make relationships
            • Single node & embedded
        • Traversal sample
    64-70. Insert vertices, make relationship
        import org.neo4j.graphdb.DynamicRelationshipType;
        import org.neo4j.graphdb.GraphDatabaseService;
        import org.neo4j.graphdb.Node;
        import org.neo4j.graphdb.Transaction;
        import org.neo4j.kernel.EmbeddedGraphDatabase;

        public final class InputVertex {
            public static void main(final String[] args) {
                // Open (or create) an embedded database stored under /tmp/neo4j.
                GraphDatabaseService graphDb = new EmbeddedGraphDatabase("/tmp/neo4j");
                Transaction tx = graphDb.beginTx();
                try {
                    // Vertex "ID: 1, NAME: kimura" in the slide diagram.
                    Node firstNode = graphDb.createNode();
                    firstNode.setProperty("Name", "Kimura");
                    // Vertex "ID: 2, NAME: Kato" in the slide diagram.
                    Node secondNode = graphDb.createNode();
                    secondNode.setProperty("Name", "Kato");
                    // Edge "ID: 3, Relation: Like", directed from the first node to the second.
                    firstNode.createRelationshipTo(secondNode,
                            DynamicRelationshipType.withName("LIKE"));
                    tx.success(); // mark the transaction as successful
                } finally {
                    tx.finish(); // commit if success() was called, otherwise roll back
                }
                graphDb.shutdown();
            }
        }
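    71. Batch Insert
        • Not thread-safe, not transactional
        • But very fast!
        import java.util.HashMap;
        import java.util.Map;
        import org.neo4j.graphdb.DynamicRelationshipType;
        import org.neo4j.kernel.impl.batchinsert.BatchInserter;
        import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

        public final class Batch {
            public static void main(final String[] args) {
                // Open the store directly, using the settings in /tmp/neo4j.props.
                BatchInserter inserter = new BatchInserterImpl("/tmp/neo4j",
                        BatchInserterImpl.loadProperties("/tmp/neo4j.props"));
                Map<String, Object> prop = new HashMap<String, Object>();
                prop.put("Name", "Kimura");
                prop.put("Age", 21);
                long node1 = inserter.createNode(prop);
                // Reuse the map for the second node's properties.
                prop.put("Name", "Kato");
                prop.put("Age", 21);
                long node2 = inserter.createNode(prop);
                // Create a LIKE relationship from node1 to node2 with no properties.
                inserter.createRelationship(node1, node2,
                        DynamicRelationshipType.withName("LIKE"), null);
                inserter.shutdown();
            }
        }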
    72-77. Traversal sample
        • You can specify the traversal criteria
        import org.neo4j.graphdb.Direction;
        import org.neo4j.graphdb.DynamicRelationshipType;
        import org.neo4j.graphdb.GraphDatabaseService;
        import org.neo4j.graphdb.Node;
        import org.neo4j.graphdb.ReturnableEvaluator;
        import org.neo4j.graphdb.StopEvaluator;
        import org.neo4j.graphdb.TraversalPosition;
        import org.neo4j.graphdb.Traverser;
        import org.neo4j.graphdb.Traverser.Order;
        import org.neo4j.kernel.EmbeddedGraphDatabase;

        // The slide shows only main(); the class wrapper and its name are added so the sample compiles.
        public final class TraversalSample {
            public static void main(final String[] args) {
                GraphDatabaseService graphDB = new EmbeddedGraphDatabase(args[0]);
                Node node = graphDB.getNodeById(1);
                Traverser friends = node.traverse(
                        // How to traverse: DEPTH_FIRST or BREADTH_FIRST.
                        Order.DEPTH_FIRST,
                        // Traversal termination condition: END_OF_GRAPH or DEPTH_ONE.
                        StopEvaluator.END_OF_GRAPH,
                        // Which nodes to return: ALL_BUT_START_NODE, ALL,
                        // or a custom evaluator implementing isReturnableNode().
                        ReturnableEvaluator.ALL_BUT_START_NODE,
                        // Relationship type to traverse.
                        DynamicRelationshipType.withName("LIKE"),
                        // Edge direction to follow: OUTGOING, INCOMING, or BOTH.
                        Direction.OUTGOING);
                for (Node nodeBuf : friends) {
                    // Position (depth, previous node, and so on) of the node just returned.
                    TraversalPosition currentPosition = friends.currentPosition();
                }
                graphDB.shutdown();
            }
        }
    78-83. Traversal sample: Order.BREADTH_FIRST
        • Breadth-first search
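The visiting order that `Order.BREADTH_FIRST` produces can be illustrated without Neo4j. The sketch below is a generic breadth-first search over a plain adjacency map, not the Neo4j traverser itself. The class name `BreadthFirstSketch` and the sample graph are hypothetical. Unlike `ReturnableEvaluator.ALL_BUT_START_NODE`, it includes the start node in the result.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical sketch: breadth-first visiting order over outgoing edges. */
final class BreadthFirstSketch {

    static List<Integer> visitOrder(Map<Integer, List<Integer>> outgoing, int start) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Queue<Integer> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            int node = queue.remove(); // oldest discovered node first (FIFO)
            order.add(node);
            for (int next : outgoing.getOrDefault(node, Collections.<Integer>emptyList())) {
                if (seen.add(next)) { // add() returns false for already-seen nodes
                    queue.add(next);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(1, Arrays.asList(2, 3));
        graph.put(2, Arrays.asList(4));
        graph.put(3, Arrays.asList(5));
        System.out.println(visitOrder(graph, 1)); // prints [1, 2, 3, 4, 5]
    }
}
```

All nodes at distance 1 (2 and 3) are visited before any node at distance 2 (4 and 5).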
    84-89. Traversal sample: Order.DEPTH_FIRST
        • Depth-first search
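For comparison, here is a generic depth-first search over a plain adjacency map. As before, this is a hedged sketch rather than the Neo4j traverser: the class name `DepthFirstSketch` and the sample graph are hypothetical, and the start node is included in the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: depth-first (pre-order) visiting order over outgoing edges. */
final class DepthFirstSketch {

    static List<Integer> visitOrder(Map<Integer, List<Integer>> outgoing, int start) {
        List<Integer> order = new ArrayList<>();
        visit(outgoing, start, new HashSet<Integer>(), order);
        return order;
    }

    private static void visit(Map<Integer, List<Integer>> outgoing, int node,
                              Set<Integer> seen, List<Integer> order) {
        if (!seen.add(node)) {
            return; // already visited
        }
        order.add(node);
        // Follow each branch to its end before moving on to the next sibling.
        for (int next : outgoing.getOrDefault(node, Collections.<Integer>emptyList())) {
            visit(outgoing, next, seen, order);
        }
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(1, Arrays.asList(2, 3));
        graph.put(2, Arrays.asList(4));
        graph.put(3, Arrays.asList(5));
        System.out.println(visitOrder(graph, 1)); // prints [1, 2, 4, 3, 5]
    }
}
```

Here node 4 is reached before node 3, because the search exhausts the branch under node 2 first.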
    90. Neoclipse sample
        http://wiki.neo4j.org/content/Neoclipse
    91-93. Experiment
        • Store mixi's social graph in Neo4j
        • Conditions
            • Machine: 24-core CPU, 65 GB memory
            • Neo4j: BatchInsert, community edition, embedded
        • Data
            • Number of nodes: 15 million
            • Number of edges: 600 million
        • Processing time: 513 min 17 s (about 8.6 h)
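The reported processing time can be turned into an approximate import throughput. The sketch below only does the arithmetic on the figures from the slide. The class name `ImportThroughput` is hypothetical, and because the node and edge counts are rounded on the slide, the result is an order-of-magnitude estimate: 30,797 seconds, about 8.55 hours, and roughly 19,500 edges per second.

```java
/** Hypothetical helper: converts the slide's import time into a throughput estimate. */
final class ImportThroughput {

    static long totalSeconds(long minutes, long seconds) {
        return minutes * 60 + seconds;
    }

    static double hours(long totalSeconds) {
        return totalSeconds / 3600.0;
    }

    static double perSecond(long items, long totalSeconds) {
        return (double) items / totalSeconds;
    }

    public static void main(String[] args) {
        long elapsed = totalSeconds(513, 17); // 513 min 17 s from the slide
        long edges = 600_000_000L;            // "600 million" edges, rounded on the slide
        System.out.println("seconds: " + elapsed);
        System.out.println("hours: " + hours(elapsed));
        System.out.println("edges per second: " + perSecond(edges, elapsed));
    }
}
```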
    94. Network Dataset
        • Stanford Large Network Dataset Collection
            • SNAP has a wide variety of graph data!
                Social networks
                Communication networks
                Citation networks
                Collaboration networks
                Web graphs
                Product co-purchasing networks
                Internet peer-to-peer networks
                Road networks
                Autonomous systems graphs
                Signed networks
                Wikipedia networks and metadata
                Memetracker and Twitter
        http://snap.stanford.edu/data/index.html
    95. Introduction to Analysis Sample
    96-97. Architecture
        Service (Social Graph), Database, Analysis, Visualization
    98. Introduction to analysis samples
        • Centrality
        • Clustering coefficient
    99-109. Centrality
        • Centrality
            • Measures the importance of each node
        Measures shown on slides 102-109:
            closeness centrality
            PageRank
            degree centrality
            betweenness centrality
            eigenvector centrality
            centralization
    110-115. Degree centrality
        • The simplest measure.
            • Counts the number of edges of each node.
            • That is, the number of friends.
        [Figure: example graph with nodes labeled by degree: 2, 1, 1, 5, 2, 1, 2]
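Degree centrality is simple enough to compute in a few lines. The sketch below counts edge endpoints in an undirected edge list. The class name `DegreeCentralitySketch` and the sample edges are hypothetical and do not reproduce the graph drawn on the slide.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: degree centrality as the number of edges touching each node. */
final class DegreeCentralitySketch {

    /** Each edge is {a, b} and is treated as undirected. */
    static Map<Integer, Integer> degrees(int[][] undirectedEdges) {
        Map<Integer, Integer> degree = new TreeMap<>();
        for (int[] edge : undirectedEdges) {
            // An undirected edge adds one to the degree of both endpoints.
            degree.merge(edge[0], 1, Integer::sum);
            degree.merge(edge[1], 1, Integer::sum);
        }
        return degree;
    }

    public static void main(String[] args) {
        int[][] friendships = {{1, 2}, {1, 3}, {1, 4}, {2, 3}};
        System.out.println(degrees(friendships)); // prints {1=3, 2=2, 3=2, 4=1}
    }
}
```

In this sample, user 1 has three friends and is the most central node by this measure.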
    116-117. Degree distribution of mixi
        • Random sample of 1,000 users
        • Summary of the degree distribution
            Min  | 1st Qu. | Median | Mean  | 3rd Qu. | Max
            1.00 | 3.00    | 10.00  | 25.69 | 30.00   | 903.00
        [Chart: degree distribution of mixi]
    118-122. Clustering coefficient
        • Network density around a given node.
            • ≈ density of the relationships among its neighbors
        Examples for a node with three neighbors (three possible links between them):
            clustering coefficient = 0 / 3 = 0 (min)
            clustering coefficient = 1 / 3
            clustering coefficient = 2 / 3
            clustering coefficient = 3 / 3 = 1 (max)
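The four examples above follow the usual definition of the local clustering coefficient: the number of links that exist between a node's neighbors, divided by the number of links that could exist, k(k-1)/2 for k neighbors. The sketch below is one straightforward way to compute it and is not the R code used for the analysis. The class name `ClusteringCoefficientSketch` is hypothetical, and returning 0 for nodes with fewer than two neighbors is an assumption of this sketch (other tools may report such nodes as undefined).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: local clustering coefficient on an undirected graph. */
final class ClusteringCoefficientSketch {

    /** Builds a symmetric adjacency map from undirected edges {a, b}. */
    static Map<Integer, Set<Integer>> adjacency(int[][] undirectedEdges) {
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] edge : undirectedEdges) {
            adj.computeIfAbsent(edge[0], k -> new HashSet<>()).add(edge[1]);
            adj.computeIfAbsent(edge[1], k -> new HashSet<>()).add(edge[0]);
        }
        return adj;
    }

    /** Links among the neighbors of node, divided by the k(k-1)/2 possible links. */
    static double local(Map<Integer, Set<Integer>> adj, int node) {
        List<Integer> neighbors =
                new ArrayList<>(adj.getOrDefault(node, Collections.<Integer>emptySet()));
        int k = neighbors.size();
        if (k < 2) {
            return 0.0; // assumption: treat nodes with fewer than two neighbors as 0
        }
        int links = 0;
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                if (adj.get(neighbors.get(i)).contains(neighbors.get(j))) {
                    links++;
                }
            }
        }
        return links / (k * (k - 1) / 2.0);
    }

    public static void main(String[] args) {
        // Node 0 has neighbors 1, 2, 3; only the pair (1, 2) is connected.
        int[][] edges = {{0, 1}, {0, 2}, {0, 3}, {1, 2}};
        System.out.println(local(adjacency(edges), 0)); // one of three possible links
    }
}
```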
    123-125. Clustering coefficient
        • Random sample of 1,000 users
        • Summary of the clustering coefficient
            Min  | 1st Qu. | Median | Mean   | 3rd Qu. | Max
            0.00 | 0.00    | 0.1157 | 0.2071 | 0.2667  | 1.000
        [Charts: clustering coefficient, slides 124-125]
    126. Sample of a user with a low clustering coefficient
        • Degree 25, clustering coefficient 0.08
    127. Sample of a user with a middle clustering coefficient
        • Degree 14, clustering coefficient 0.17
    128. Sample of a user with a high clustering coefficient
        • Degree 10, clustering coefficient 0.68
    129. Sample of a user with the maximum clustering coefficient
        • Degree 4, clustering coefficient 1
    130. Visualization Sample
    131. Visualization of my own social graph on mixi
        • Weighting the edges
            • Amount of communication (color, thickness)
        • Weighting the vertices
            • Clustering coefficient (color, thickness)
        • Visualization tool: Gephi
            http://gephi.org/
    132. Summary
        • Motivation for social graph mining
        • Overview of GraphDB
        • Introduction to Neo4j
        • Samples of graph analysis with R
        • Introduction to the visualization tool Gephi
    133. Thanks!
