5. Motivation for social graph analysis
Testing on graphs with millions of nodes and hundreds of millions of edges.
Supporting a wide variety of graph algorithms by developing distributed processing technology.
It is challenging.
6. Number of users on mixi
[Figure: number of mixi member IDs by year, 2007 to 2011 (y-axis: # of member IDs, 0 to 30,000,000)]
56. GraphDB Neo4j
• True ACID transactions
• High availability
• Scales to billions of nodes and relationships
• High-speed querying through traversals
             Single instance (GPLv3)   Multiple instances (AGPLv3)
Embedded     EmbeddedGraphDatabase     HighlyAvailableGraphDatabase
Standalone   Neo4j Server              Neo4j Server (high availability mode)
http://neo4j.org/
62. My other favorite features of Neo4j
• RESTful APIs
• Query language (Cypher)
• Full-text indexing
– Lucene
• Built-in graph algorithms
– A*, Dijkstra
– High-speed traversal
• Gremlin support
– A query-like graph traversal language
http://www.tinkerpop.com/post/4633229547/tinkerpop-graph-stack
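The slide lists A* and Dijkstra among Neo4j's built-in algorithms. As a rough picture of what a Dijkstra shortest-path search computes, here is a minimal Java sketch over an in-memory weighted adjacency list. It is not Neo4j's API and not its implementation; the class `DijkstraSketch`, its nested `Edge` type, and the list-of-lists graph layout are hypothetical, and edge weights are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative Dijkstra on an in-memory graph; this is NOT Neo4j's graph-algorithm API.
final class DijkstraSketch {
    /** A directed edge to node `to` with a non-negative weight. */
    static final class Edge {
        final int to;
        final double weight;
        Edge(int to, double weight) { this.to = to; this.weight = weight; }
    }

    /** Shortest distance from `source` to every node; +Infinity if unreachable. */
    static double[] shortestDistances(List<List<Edge>> adj, int source) {
        double[] dist = new double[adj.size()];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[source] = 0.0;
        // Queue entries are {tentative distance, node id}, smallest distance first.
        PriorityQueue<double[]> queue = new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));
        queue.add(new double[] {0.0, source});
        while (!queue.isEmpty()) {
            double[] top = queue.poll();
            int u = (int) top[1];
            if (top[0] > dist[u]) continue; // stale entry: u was already settled more cheaply
            for (Edge e : adj.get(u)) {
                double candidate = dist[u] + e.weight;
                if (candidate < dist[e.to]) {
                    dist[e.to] = candidate;
                    queue.add(new double[] {candidate, e.to});
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        List<List<Edge>> adj = new ArrayList<>();
        for (int i = 0; i < 4; i++) adj.add(new ArrayList<>());
        adj.get(0).add(new Edge(1, 4)); // 0 -> 1, weight 4
        adj.get(0).add(new Edge(2, 1)); // 0 -> 2, weight 1
        adj.get(2).add(new Edge(1, 2)); // 2 -> 1, weight 2
        adj.get(1).add(new Edge(3, 1)); // 1 -> 3, weight 1
        System.out.println(Arrays.toString(shortestDistances(adj, 0))); // [0.0, 3.0, 1.0, 4.0]
    }
}
```

The priority queue always settles the closest unsettled node next; stale queue entries are skipped rather than removed, which keeps the sketch short at the cost of a slightly larger queue.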
65. Introduction: simple Neo4j use cases
[Figure: a 2×2 grid of deployment options for analysis systems: single node vs. multiple nodes, embedded vs. server]
71. A simple introduction to embedded Neo4j
• Inserting vertices and creating relationships
• Single node, embedded
• Traversal sample
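72. Inserting vertices and creating a relationship
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public final class InputVertex {
    public static void main(final String[] args) {
        // Open (or create) an embedded database stored under /tmp/neo4j
        GraphDatabaseService graphDb = new EmbeddedGraphDatabase("/tmp/neo4j");
        Transaction tx = graphDb.beginTx();
        try {
            // Two nodes, each with a "Name" property
            Node firstNode = graphDb.createNode();
            firstNode.setProperty("Name", "Kimura");
            Node secondNode = graphDb.createNode();
            secondNode.setProperty("Name", "Kato");
            // A directed LIKE relationship from Kimura to Kato
            firstNode.createRelationshipTo(secondNode,
                    DynamicRelationshipType.withName("LIKE"));
            tx.success(); // mark the transaction as successful
        } finally {
            tx.finish(); // commits if success() was called, otherwise rolls back
            graphDb.shutdown();
        }
    }
}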
Resulting graph:
Node ID 1 (Name: Kimura) --LIKE (relationship ID 3)--> Node ID 2 (Name: Kato)
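79. Batch insert
• Not thread-safe, not transactional
• But very fast
import java.util.HashMap;
import java.util.Map;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public final class Batch {
    public static void main(final String[] args) {
        // Writes straight to the store files in /tmp/neo4j, with no transactions
        BatchInserter inserter = new BatchInserterImpl("/tmp/neo4j",
                BatchInserterImpl.loadProperties("/tmp/neo4j.props"));
        try {
            Map<String, Object> kimura = new HashMap<String, Object>();
            kimura.put("Name", "Kimura");
            kimura.put("Age", 21);
            long node1 = inserter.createNode(kimura);
            Map<String, Object> kato = new HashMap<String, Object>();
            kato.put("Name", "Kato");
            kato.put("Age", 21);
            long node2 = inserter.createNode(kato);
            // LIKE relationship from node1 to node2; null means no properties
            inserter.createRelationship(node1, node2,
                    DynamicRelationshipType.withName("LIKE"), null);
        } finally {
            inserter.shutdown(); // flushes and closes the store; always call it
        }
    }
}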
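85. Traversal sample
• You can specify the traversal criteria
import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.ReturnableEvaluator;
import org.neo4j.graphdb.StopEvaluator;
import org.neo4j.graphdb.TraversalPosition;
import org.neo4j.graphdb.Traverser;
import org.neo4j.graphdb.Traverser.Order;
import org.neo4j.kernel.EmbeddedGraphDatabase;
public final class TraversalSample {
    public static void main(final String[] args) {
        GraphDatabaseService graphDb = new EmbeddedGraphDatabase(args[0]);
        try {
            Node node = graphDb.getNodeById(1);
            Traverser friends = node.traverse(
                    // traversal order: DEPTH_FIRST or BREADTH_FIRST
                    Order.DEPTH_FIRST,
                    // when to stop: END_OF_GRAPH or DEPTH_ONE
                    StopEvaluator.END_OF_GRAPH,
                    // which nodes to return: ALL_BUT_START_NODE, ALL, or a custom isReturnableNode()
                    ReturnableEvaluator.ALL_BUT_START_NODE,
                    // relationship type to follow
                    DynamicRelationshipType.withName("LIKE"),
                    // direction to follow: OUTGOING, INCOMING, or BOTH
                    Direction.OUTGOING);
            for (Node friend : friends) {
                TraversalPosition position = friends.currentPosition();
                System.out.println(position.depth() + "\t" + friend.getProperty("Name", ""));
            }
        } finally {
            graphDb.shutdown();
        }
    }
}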
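To make the five `traverse()` arguments concrete without a Neo4j store, the plain-Java sketch below imitates their meaning on an in-memory list of relationships: FIFO versus LIFO processing for breadth-first versus depth-first order, a `maxDepth` limit standing in for `END_OF_GRAPH` (negative) or `DEPTH_ONE` (1), exclusion of the start node as with `ALL_BUT_START_NODE`, and filtering by relationship type and direction. `TraversalSketch`, `Rel`, and all parameter names are hypothetical; this is not how Neo4j implements traversal, and Neo4j's depth-first order over a node's relationships may differ from the order produced here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory imitation of the traverse() parameters for illustration; NOT the Neo4j API.
final class TraversalSketch {
    enum Direction { OUTGOING, INCOMING, BOTH }

    /** A directed, typed relationship between two node ids (ids assumed to be >= 0). */
    static final class Rel {
        final int from;
        final int to;
        final String type;
        Rel(int from, int to, String type) { this.from = from; this.to = to; this.type = type; }
    }

    /**
     * Nodes reachable from `start` over `type` relationships in `dir`, in breadth-first
     * or depth-first order. maxDepth < 0 acts like END_OF_GRAPH, maxDepth == 1 like
     * DEPTH_ONE. The start node is excluded, as with ALL_BUT_START_NODE.
     */
    static List<Integer> traverse(List<Rel> rels, int start, boolean breadthFirst,
                                  int maxDepth, String type, Direction dir) {
        List<Integer> result = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Deque<int[]> frontier = new ArrayDeque<>(); // entries are {node id, depth}
        frontier.addLast(new int[] {start, 0});
        while (!frontier.isEmpty()) {
            // Queue (FIFO) gives breadth-first order, stack (LIFO) gives depth-first.
            int[] cur = breadthFirst ? frontier.pollFirst() : frontier.pollLast();
            if (!visited.add(cur[0])) continue; // already reached by another path
            if (cur[0] != start) result.add(cur[0]);
            if (maxDepth >= 0 && cur[1] >= maxDepth) continue; // do not expand further
            for (Rel r : rels) {
                if (!r.type.equals(type)) continue;
                int next = -1;
                if (dir != Direction.INCOMING && r.from == cur[0]) next = r.to;
                else if (dir != Direction.OUTGOING && r.to == cur[0]) next = r.from;
                if (next >= 0 && !visited.contains(next)) frontier.addLast(new int[] {next, cur[1] + 1});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Rel> rels = Arrays.asList(
                new Rel(1, 2, "LIKE"), new Rel(1, 3, "LIKE"),
                new Rel(2, 4, "LIKE"), new Rel(5, 1, "LIKE"));
        System.out.println(traverse(rels, 1, true, -1, "LIKE", Direction.OUTGOING)); // [2, 3, 4]
        System.out.println(traverse(rels, 1, true, 1, "LIKE", Direction.BOTH));      // [2, 3, 5]
    }
}
```

Marking a node as visited when it is taken off the frontier, rather than when it is added, keeps the LIFO variant a true depth-first search; a node may sit on the frontier more than once, but it is returned only once.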
101. Experiment
• Storing mixi's social graph in Neo4j
• Conditions
– Machine: 24-core CPU, 65 GB memory
– Neo4j: BatchInsert, Community edition, embedded
• Data
– 15 million nodes, 600 million edges
• Processing time: 513 min 17 s (about 8.6 h)
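As a rough consistency check, the reported time converts to an average insertion rate as follows, assuming the 513 min 17 s covers the whole batch insert and dividing by the edge count only:

```latex
\begin{aligned}
t &= 513 \times 60\ \text{s} + 17\ \text{s} = 30\,797\ \text{s} \approx 8.55\ \text{h} \\
\frac{600 \times 10^{6}\ \text{edges}}{30\,797\ \text{s}} &\approx 1.95 \times 10^{4}\ \text{edges/s}
\end{aligned}
```

That is roughly 19,500 edges per second averaged over the run; the actual rate likely varied as the store grew.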
102. Network dataset
• Stanford Large Network Dataset Collection
• SNAP offers a wide variety of graph data:
– Social networks
– Communication networks
– Citation networks
– Collaboration networks
– Web graphs
– Product co-purchasing networks
– Internet peer-to-peer networks
– Road networks
– Autonomous systems graphs
– Signed networks
– Wikipedia networks and metadata
– Memetracker and Twitter
http://snap.stanford.edu/data/index.html
117. Centrality
• Centrality measures the importance of each node
– Closeness centrality
– PageRank
– Degree centrality
– Betweenness centrality
– Eigenvector centrality
– Centralization
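Of the measures above, closeness centrality is straightforward to sketch. For a connected, unweighted graph with n nodes it is commonly defined as C(v) = (n − 1) / Σ_u d(v, u), where d(v, u) is the shortest-path (hop) distance; other normalizations exist. The Java sketch below runs one breadth-first search per node, so it is only practical on small graphs or samples, not on a 15-million-node graph. The class and method names are hypothetical, and the treatment of disconnected graphs (averaging over reachable nodes only) is one convention among several.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Illustrative closeness centrality via one BFS per node (unweighted, undirected graph).
final class ClosenessSketch {
    /** Hop distances from `source`; -1 marks nodes that cannot be reached. */
    static int[] bfsDistances(int[][] adj, int source) {
        int[] dist = new int[adj.length];
        Arrays.fill(dist, -1);
        dist[source] = 0;
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        queue.add(source);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int w : adj[u]) {
                if (dist[w] < 0) { // first time w is reached: this is its shortest hop count
                    dist[w] = dist[u] + 1;
                    queue.add(w);
                }
            }
        }
        return dist;
    }

    /**
     * Closeness = (number of reachable nodes) / (sum of distances to them).
     * For a connected graph this equals (n - 1) / sum of distances. Isolated nodes get 0.
     */
    static double[] closeness(int[][] adj) {
        double[] result = new double[adj.length];
        for (int v = 0; v < adj.length; v++) {
            long sum = 0;
            int reached = 0;
            for (int d : bfsDistances(adj, v)) {
                if (d > 0) { sum += d; reached++; }
            }
            result[v] = sum == 0 ? 0.0 : (double) reached / sum;
        }
        return result;
    }

    public static void main(String[] args) {
        // Path graph 0 - 1 - 2 - 3, stored as undirected adjacency lists
        int[][] path = {{1}, {0, 2}, {1, 3}, {2}};
        System.out.println(Arrays.toString(closeness(path))); // [0.5, 0.75, 0.75, 0.5]
    }
}
```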
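123. Degree centrality
• The simplest centrality measure
• Counts the number of edges incident to each node
• On mixi, this is the number of friends
[Figure: example graph with each node labeled by its degree (one node of degree 5, three of degree 2, three of degree 1)]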
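Degree centrality needs only one pass over the edge list. The Java sketch below assumes an undirected simple graph (no self-loops or duplicate edges) with node ids 0 to n − 1, given as an array of endpoint pairs; that representation and the class name are chosen here for illustration. The optional normalized form divides by n − 1, the largest degree possible.

```java
// Illustrative degree centrality from an edge list; names and layout are hypothetical.
final class DegreeCentralitySketch {
    /** Degree of each node 0..n-1 in an undirected simple graph. */
    static int[] degrees(int n, int[][] edges) {
        int[] deg = new int[n];
        for (int[] e : edges) {
            deg[e[0]]++; // an undirected edge counts once for each endpoint
            deg[e[1]]++;
        }
        return deg;
    }

    /** Degree divided by n - 1, so a node linked to every other node scores 1.0. */
    static double[] normalized(int n, int[][] edges) {
        int[] deg = degrees(n, edges);
        double[] out = new double[n];
        for (int v = 0; v < n; v++) {
            out[v] = n > 1 ? deg[v] / (double) (n - 1) : 0.0;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical friendships: node 0 knows nodes 1 to 5, and nodes 1 and 2 know each other
        int[][] edges = {{0, 1}, {0, 2}, {0, 3}, {0, 4}, {0, 5}, {1, 2}};
        System.out.println(java.util.Arrays.toString(degrees(6, edges))); // [5, 2, 2, 1, 1, 1]
    }
}
```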
124. Degree distribution of mixi
• Random sample of 1,000 users
• Summary of the degree distribution
Min.   1st Qu.   Median   Mean    3rd Qu.   Max.
1.00   3.00      10.00    25.69   30.00     903.00
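The column labels follow R's `summary()` output, whose quartiles come from R's default (type 7) quantile: for sorted values x_1 ≤ x_2 ≤ ... ≤ x_n, the p-quantile is found by linear interpolation at the 1-based position h = (n − 1)p + 1. A minimal Java sketch of the same six numbers follows; `SummarySketch` is a hypothetical name, the input must be non-empty, and tools that use a different quantile definition can report slightly different quartiles.

```java
import java.util.Arrays;

// Six-number summary in the style of R's summary(); names are hypothetical.
final class SummarySketch {
    /** Type-7 quantile of a sorted, non-empty array: linear interpolation between order statistics. */
    static double quantile(double[] sorted, double p) {
        double h = (sorted.length - 1) * p; // 0-based position, i.e. (n - 1)p
        int lo = (int) Math.floor(h);
        int hi = Math.min(lo + 1, sorted.length - 1);
        return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
    }

    /** Returns {min, 1st quartile, median, mean, 3rd quartile, max}; the input is not modified. */
    static double[] summary(double[] values) {
        double[] s = values.clone();
        Arrays.sort(s);
        double mean = Arrays.stream(s).average().getAsDouble();
        return new double[] {
            s[0], quantile(s, 0.25), quantile(s, 0.5), mean, quantile(s, 0.75), s[s.length - 1]
        };
    }

    public static void main(String[] args) {
        double[] toy = {8, 2, 6, 4}; // toy values, not mixi data
        System.out.println(Arrays.toString(summary(toy))); // [2.0, 3.5, 5.0, 5.0, 6.5, 8.0]
    }
}
```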
131. Clustering coefficient
• Random sample of 1,000 users
• Summary of the clustering coefficient
Min.   1st Qu.   Median   Mean     3rd Qu.   Max.
0.00   0.00      0.1157   0.2071   0.2667    1.000
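The slide does not spell out the definition. Assuming it is the usual local clustering coefficient, a node v with k_v ≥ 2 neighbors scores C(v) = 2T(v) / (k_v(k_v − 1)), where T(v) is the number of edges among v's neighbors: the fraction of pairs of v's friends who are also friends with each other. The Java sketch below uses hypothetical names, assumes an undirected simple graph, and assigns 0 to nodes with fewer than two neighbors, which is a common but not universal convention.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative local clustering coefficient; names and graph layout are hypothetical.
final class ClusteringSketch {
    /** Builds undirected adjacency sets for nodes 0..n-1 from an edge list. */
    static List<Set<Integer>> adjacency(int n, int[][] edges) {
        List<Set<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new HashSet<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    /** Edges among v's neighbors divided by the number of neighbor pairs. */
    static double local(List<Set<Integer>> adj, int v) {
        List<Integer> nbrs = new ArrayList<>(adj.get(v));
        int k = nbrs.size();
        if (k < 2) return 0.0; // convention for nodes of degree 0 or 1
        int links = 0;
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                if (adj.get(nbrs.get(i)).contains(nbrs.get(j))) links++; // friends of v who are friends
            }
        }
        return 2.0 * links / (k * (double) (k - 1));
    }

    public static void main(String[] args) {
        // Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to node 2
        List<Set<Integer>> adj = adjacency(4, new int[][] {{0, 1}, {1, 2}, {0, 2}, {2, 3}});
        for (int v = 0; v < 4; v++) {
            System.out.printf("node %d: %.3f%n", v, local(adj, v));
        }
    }
}
```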
139. Visualizing my social graph on mixi
• Edge weights
– Amount of communication (color, thickness)
• Vertex weights
– Clustering coefficient (color, thickness)
• Visualization tool: Gephi
http://gephi.org/
141.
• Motivation for social graph mining
• Overview of graph databases
• Introduction to Neo4j
• Sample graph analyses with R
• Introduction to the visualization tool Gephi