DATA DAY TEXAS
Networks All Around Us: Discovering Networks in your Domain | 1/5/2015
Russell Jurney
RELATO
MAPS
MARKET
BACKGROUND
Serial Entrepreneur
Contributed code to Apache Druid, Apache Pig, Apache DataFu, Apache Whirr, Azkaban, MongoDB
Apache Committer
Three-time O'Reilly Author
Started & Shipped Product at E8 Security
Ning, LinkedIn, Hortonworks veteran
2009 2010 2011
2012 2014
EXAMPLES OF NETWORKS
FOUNDER NETWORKS
node = company
edge = employment transition, as in people who worked at one startup, then founded another
WEBSITE BEHAVIOR
node = web page
edge = user browses one page, then another
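A rough Python sketch of how such a network might be extracted from browsing logs. It assumes each session is already an ordered list of page paths; the sample sessions and the function name `browse_edges` are made up for illustration.

```python
from collections import Counter

def browse_edges(sessions):
    """Count directed page-to-page transitions across browsing sessions."""
    edges = Counter()
    for pages in sessions:
        # Pair each page with the page viewed immediately after it.
        for src, dst in zip(pages, pages[1:]):
            edges[(src, dst)] += 1
    return edges

# Hypothetical sessions: each list is one user's ordered page views.
sessions = [
    ["/home", "/pricing", "/signup"],
    ["/home", "/blog", "/pricing"],
    ["/home", "/pricing"],
]

edges = browse_edges(sessions)
for (src, dst), weight in sorted(edges.items()):
    print(src, "->", dst, weight)
```

The edge weight is the number of times the transition was observed, which can later feed a weighted centrality.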
ONLINE SOCIAL NETWORKS
node = LinkedIn profile, edge = LinkedIn connection
EMAIL INBOX
node = email address, edge = sent email
MARKETS
node = company, edge = partnership
TYPES OF NETWORKS
TINKERPOP
“Marko Rodriguez is the Doug Cutting of graph analytics.”
—Mark Twain
PROPERTY GRAPHS
A PROPERTY GRAPH IN EVERY DATABASE
PROPERTY GRAPHS IN YOUR DOMAIN
identify entities
identify relationships
specify schema (or not)
populate graph database
learn to think in graph walks
query in batch
query in realtime
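To make the steps above concrete without a graph database, here is a minimal in-memory sketch in Python. The `PropertyGraph` class, the company names, and the `since` property are all invented for illustration; a real deployment would use a graph database instead.

```python
from collections import defaultdict

class PropertyGraph:
    """A tiny in-memory property graph: labeled vertices and labeled edges."""

    def __init__(self):
        self.vertices = {}                   # id -> {"label": ..., "props": {...}}
        self.out_edges = defaultdict(list)   # id -> [(edge_label, target_id, props)]

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = {"label": label, "props": props}

    def add_edge(self, src, edge_label, dst, **props):
        self.out_edges[src].append((edge_label, dst, props))

    def out(self, vids, edge_label):
        """One step of a graph walk: follow edge_label out of every vertex in vids."""
        return [dst
                for v in vids
                for (lbl, dst, _) in self.out_edges[v]
                if lbl == edge_label]

# Entities are companies, relationships are partnerships (hypothetical data).
graph = PropertyGraph()
graph.add_vertex("acme", "company", domain="acme.example")
graph.add_vertex("globex", "company", domain="globex.example")
graph.add_vertex("initech", "company", domain="initech.example")
graph.add_edge("acme", "partnership", "globex", since=2014)
graph.add_edge("globex", "partnership", "initech", since=2015)

# Thinking in graph walks: partners, then partners of partners.
partners = graph.out(["acme"], "partnership")
partners_of_partners = graph.out(partners, "partnership")
print(partners, partners_of_partners)
```

Each call to `out` is one hop, so chaining calls mirrors chaining steps in a traversal.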
POPULATING A PROPERTY GRAPH
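// Assumed setup (not on the original slide): an in-memory TinkerGraph and a
// reader over newline-delimited JSON; 'companies.json' is a hypothetical path
import groovy.json.JsonSlurper
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
graph = TinkerGraph.open()
jsonSlurper = new JsonSlurper()
company_reader = new BufferedReader(new FileReader('companies.json'))
// Add nodes: one 'company' vertex per JSON line
while ((json = company_reader.readLine()) != null)
{
  document = jsonSlurper.parseText(json)
  v = graph.addVertex('company')
  v.property("_id", document._id)
  v.property("domain", document.domain)
  v.property("name", document.name)
}
company_reader.close()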
POPULATING A PROPERTY GRAPH
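// Get a graph traverser
g = graph.traversal()
// Assumed setup (not on the original slide): 'links.json' is a hypothetical path
links_reader = new BufferedReader(new FileReader('links.json'))
while ((json = links_reader.readLine()) != null)
{
  document = jsonSlurper.parseText(json)
  // Look up both endpoints by domain; next() throws if a domain has no vertex
  v1 = g.V().has('domain', document.home_domain).next()
  v2 = g.V().has('domain', document.link_domain).next()
  // Add edges to graph, labeled by link type
  v1.addEdge(document.type, v2)
}
links_reader.close()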
TOOLS OF SNA
SNA = Social Network Analysis
centrality
clustering
block models
cores
dispersion
CENTRALITY
Centrality is a way of measuring how central or important a particular
node is in a social network.
OR
Which nodes should I care about?
SINGLE-RELATIONAL CENTRALITY(S)
// all-links-the-same-type-centrality
g.V().out().groupCount()
// things-humans-walk-centrality
g.V().hasLabel('human').out('walks').groupCount()
// things-dogs-eat-centrality
g.V().hasLabel('dog').out('eats').groupCount()
MULTI-RELATIONAL CENTRALITY(S)
// things-eaten-by-things-humans-walk-centrality
g.V().hasLabel('human').out('walks').out('eats').groupCount()
// things-hated-by-things-humans-pet-centrality
g.V().hasLabel('human').out('pets').out('hates').groupCount()
// things-that-pet-things-that-eat-mice-centrality
g.V().hasLabel('mouse').in('eats').in('pets').groupCount()
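For readers without a Gremlin console, the walk-then-count idea can be sketched in plain Python. The vertices, edges, and the helper names `out` and `walk_centrality` are invented for illustration; the sketch assumes, as with groupCount(), that every walker arriving at a vertex adds one to its count.

```python
from collections import Counter

# Hypothetical toy graph: vertex -> label, plus (source, edge label, target) triples.
labels = {
    "alice": "human", "bob": "human",
    "rex": "dog", "fido": "dog",
    "kibble": "food", "steak": "food",
}
edges = [
    ("alice", "walks", "rex"),
    ("bob", "walks", "rex"),
    ("bob", "walks", "fido"),
    ("rex", "eats", "kibble"),
    ("fido", "eats", "kibble"),
    ("fido", "eats", "steak"),
]

def out(walkers, edge_label):
    """Advance every walker across each matching outgoing edge."""
    return [dst
            for w in walkers
            for (src, lbl, dst) in edges
            if src == w and lbl == edge_label]

def walk_centrality(start_label, *edge_labels):
    """Start one walker on each vertex with start_label, walk, count arrivals."""
    walkers = [v for v, lbl in labels.items() if lbl == start_label]
    for edge_label in edge_labels:
        walkers = out(walkers, edge_label)
    return Counter(walkers)

things_humans_walk = walk_centrality("human", "walks")
eaten_by_walked = walk_centrality("human", "walks", "eats")
print(things_humans_walk)
print(eaten_by_walked)
```

Because rex is walked by two humans, his food is counted twice, which is what makes the multi-relational count a centrality rather than a plain set of reachable vertices.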
CENTRALITIES
degree centrality
closeness centrality
betweenness centrality
eigenvector centrality
DEGREE CENTRALITY
in-degree centrality is nice…
it works even if you’re missing
a node’s outbound links
DEGREE CENTRALITY
# computation
count connections
…it's that simple
in-degree centrality = popularity
out-degree centrality = gregariousness
# meaning
risk of catching cold
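Counting connections can be sketched in a few lines of Python. The edge list below is made up, and the sketch assumes a directed graph given as (source, target) pairs.

```python
from collections import Counter

def degree_centrality(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    in_degree = Counter()
    out_degree = Counter()
    for src, dst in edges:
        out_degree[src] += 1   # gregariousness
        in_degree[dst] += 1    # popularity
    return in_degree, out_degree

# Hypothetical "sent email to" edges.
edges = [
    ("ann", "bob"),
    ("cat", "bob"),
    ("dan", "bob"),
    ("bob", "ann"),
    ("ann", "cat"),
]

in_degree, out_degree = degree_centrality(edges)
print(in_degree.most_common())
print(out_degree.most_common())
```

Note that bob's in-degree can be computed from other people's outbound links alone, even if none of bob's own outbound links had been collected.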
CLOSENESS CENTRALITY
# computation
count hops of the shortest path to every other node
farness = sum of those distances
closeness = reciprocal of farness
# meaning
communication efficiency
spread of information
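A minimal Python sketch of the computation, assuming an unweighted, undirected, connected graph stored as an adjacency dict. It uses the unnormalized form (1 / farness); some libraries rescale by the number of other nodes. The path graph is made up.

```python
from collections import deque

def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def closeness_centrality(adjacency):
    """Closeness = 1 / farness, where farness is the sum of shortest-path hops."""
    scores = {}
    for node in adjacency:
        farness = sum(shortest_path_lengths(adjacency, node).values())
        scores[node] = 1.0 / farness if farness > 0 else 0.0
    return scores

# Hypothetical path graph: a - b - c - d - e
adjacency = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

scores = closeness_centrality(adjacency)
print(scores)
```

The middle node c has the smallest farness (6 hops in total), so it scores highest.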
BETWEENNESS CENTRALITY
# computation
count of times node appears in shortest paths…
…between all pairs of nodes
# meaning
control of communication between other nodes
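Enumerating every shortest path directly gets expensive, so the sketch below uses Brandes' algorithm, a standard technique that is not named on the slide. It assumes an unweighted, undirected graph as an adjacency dict, and splits credit when a pair of nodes has several shortest paths. The example graph is made up.

```python
from collections import deque

def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph."""
    scores = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                scores[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: score / 2.0 for v, score in scores.items()}

# Hypothetical path graph: a - b - c - d - e
path = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
scores = betweenness_centrality(path)
print(scores)
```

On the path graph, c sits on the shortest path of four pairs (a-d, a-e, b-d, b-e), so it controls the most communication.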
EIGENVECTOR CENTRALITY
# computation
counts connections of connected nodes
more connected neighbors matter more
# meaning
influence of one node on others
pagerank is an eigenvector centrality
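As an illustration, PageRank can be approximated with power iteration. This is a simplified sketch: the damping factor of 0.85, the fixed iteration count, and the toy link graph are assumptions, and every node must appear as a key in `out_links`.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Approximate PageRank by power iteration over a dict of out-links."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no out-links is shared evenly across all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                # Each node splits its rank equally among the nodes it links to.
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank

# Hypothetical link graph: a, b and d all link to c; c links back to a.
out_links = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(out_links)
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

Node a has only one inbound link, but it comes from the well-connected c, so a outranks b and d: more connected neighbors matter more.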
CLUSTERING
property based clustering: k-means
graph based clustering: modularity
property graph based clustering: CESNA
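Modularity-based clustering searches for the node-to-cluster assignment with the highest modularity score. The sketch below only computes that score for a given assignment (it does not search), assuming an undirected graph with no self-loops and each edge listed once. The two-triangle graph is made up.

```python
from collections import Counter

def modularity(edges, community):
    """Newman modularity Q of a node -> community assignment."""
    m = len(edges)
    internal = Counter()   # edges with both endpoints in the community
    degree = Counter()     # total degree of the community's nodes
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Fraction of edges inside each community, minus the fraction
    # expected if edges were placed at random with the same degrees.
    return sum(internal[c] / m - (degree[c] / (2.0 * m)) ** 2 for c in degree)

# Hypothetical graph: two triangles joined by a single bridge edge (c - d).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
print(q_two, q_one)
```

Splitting along the bridge scores higher than lumping everything together, which matches the intuition that the triangles are the natural clusters.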
BLOCK MODELS
how much do clusters
connect?
are links reciprocal?
Circos plots are helpful
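Both questions can be answered with simple counts once nodes have cluster labels. The sketch below assumes a directed edge list and a precomputed cluster assignment; the data and the function names `block_counts` and `reciprocity` are made up for illustration.

```python
from collections import Counter

def block_counts(edges, cluster):
    """Count links from each cluster to each cluster."""
    counts = Counter()
    for src, dst in edges:
        counts[(cluster[src], cluster[dst])] += 1
    return counts

def reciprocity(edges):
    """Fraction of directed links whose reverse link also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (src, dst) in edge_set if (dst, src) in edge_set)
    return mutual / len(edge_set)

# Hypothetical directed links and cluster labels.
edges = [
    ("a", "b"), ("b", "a"),
    ("c", "d"), ("d", "c"),
    ("a", "c"), ("b", "d"),
]
cluster = {"a": "X", "b": "X", "c": "Y", "d": "Y"}

blocks = block_counts(edges, cluster)
print(blocks)
print(reciprocity(edges))
```

Here cluster X links to Y twice but Y never links back, the kind of asymmetry a block model view makes visible.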
CORES
DISPERSION
Romantic Partnerships and the Dispersion of Social Ties:
A Network Analysis of Relationship Status on Facebook
Russell Jurney, CEO
rjurney@relato.io
twitter.com/rjurney
404-317-3620

Networks All Around Us: Extracting networks from your problem domain