This talk covers a basic introduction to graphs, NOSQL and graph databases, followed by a number of domain examples and case studies, and a section on how graph databases can be of interest to insurance companies.
Thursday, April 19, 12
10. Q: What are graphs good for?
A: highly connected data
๏ Recommendations
๏ Business intelligence
๏ Social computing
๏ Geospatial
๏ MDM
๏ Systems management
๏ Genealogy
๏ Real Use Cases:
• [A] ACL from Hell
• [B] Timely recommendations
• [C] Global collaboration
19. Trends in BigData & NOSQL
๏ 1. increasing data size (big data)
• “Every 2 days we create as much information as we did up to 2003” (Eric Schmidt)
๏ 2. increasingly connected data (graph data)
• for example, text documents to HTML
๏ 3. semi-structured data
• individualization of data, with common sub-set
๏ 4. architecture - a facade over multiple services
• from monolithic to modular, distributed applications
21. Key-Value Category
๏ “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)
๏ Data model:
• Global key-value mapping
• Big scalable HashMap
• Highly fault tolerant (typically)
๏ Examples:
• Riak, Redis, Voldemort
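The "big scalable HashMap" idea can be sketched as a dictionary whose keys are spread over several partitions by hash. This is only an illustrative toy and not how Riak, Redis or Voldemort work internally; the class `PartitionedStore` and its method names are hypothetical.

```python
import hashlib


class PartitionedStore:
    """Toy key-value store: one dict per partition, keys routed by hash."""

    def __init__(self, partitions: int = 4) -> None:
        self._partitions = [dict() for _ in range(partitions)]

    def _partition_for(self, key: str) -> int:
        # Use a stable hash so the same key always maps to the same partition.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._partitions)

    def put(self, key: str, value: object) -> None:
        self._partitions[self._partition_for(key)][key] = value

    def get(self, key: str, default: object = None) -> object:
        return self._partitions[self._partition_for(key)].get(key, default)


store = PartitionedStore()
store.put("user:42", {"name": "Andreas"})  # made-up key and value
print(store.get("user:42"))
```

Lookups by key stay simple and spread well across partitions, but there is no way to ask how two values relate to each other, which is the weakness the next slide names.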
22. Key-Value: Pros & Cons
๏ Strengths
• Simple data model
• Great at scaling out horizontally
• Scalable
• Available
๏ Weaknesses:
• Simplistic data model
• Poor for complex data
23. Column-Family Category
๏ Google’s “Bigtable: A Distributed Storage System for Structured Data” (2006)
• Column-family stores are essentially Bigtable clones
๏ Data model:
• A big table, with column families
• Map-reduce for querying/processing
๏ Examples:
• HBase, HyperTable, Cassandra
24. Column-Family: Pros & Cons
๏ Strengths
• Data model supports semi-structured data
• Naturally indexed (columns)
• Good at scaling out horizontally
๏ Weaknesses:
• Unsuited for interconnected data
25. Document Database Category
๏ Data model
• Collections of documents
• A document is a key-value collection
• Index-centric, lots of map-reduce
๏ Examples
• CouchDB, MongoDB
26. Document Database: Pros & Cons
๏ Strengths
• Simple, powerful data model (just like SVN!)
• Good scaling (especially if sharding supported)
๏ Weaknesses:
• Unsuited for interconnected data
• Query model limited to keys (and indexes)
• Map reduce for larger queries
27. Graph Database Category
๏ Data model:
• Nodes & Relationships
• Hypergraph, sometimes (edges with multiple endpoints)
๏ Examples:
• Neo4j (of course), OrientDB, InfiniteGraph, AllegroGraph
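34. Living in a NOSQL World
[Figure: database categories plotted with Complexity on one axis and Size on the other. Labels, from top to bottom: Graph Databases, Document Databases, Column Family, then RDBMS and Key-Value Store. An annotation reads "90% of use cases".]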
44. Graph Database: Pros & Cons
๏ Strengths
• Powerful data model, as general as RDBMS
• Fast, for connected data
• Easy to query
๏ Weaknesses:
• Sharding (though they can scale reasonably well)
‣ also, stay tuned for developments here
• Requires conceptual shift
‣ though graph-like thinking becomes addictive
54. Some well-known named graphs
[Figures: diamond, butterfly, star, bull, Franklin, Robertson, Horton, Hall-Janko]
see http://en.wikipedia.org/wiki/Gallery_of_named_graphs
73. Cypher
๏ a pattern-matching query language
๏ declarative grammar with clauses (like SQL)
๏ aggregation, ordering, limits
๏ tabular results
// get node with id 0
start a=node(0) return a
// traverse from node 1
start a=node(1) match (a)-->(b) return b
// return friends of friends
start a=node(1) match (a)--()--(c) return c
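To make concrete what the three queries return, here is a rough Python sketch over a tiny in-memory graph. The node ids, the edges and all function names are made up for illustration. This is not the Neo4j API, and it only approximates Cypher's matching rules.

```python
# Hypothetical node ids and directed edges (source, destination).
NODES = {0, 1, 2, 3, 4, 5}
EDGES = [(1, 2), (1, 3), (2, 4), (5, 3)]


def get_node(node_id):
    """Roughly: start a=node(0) return a"""
    return node_id if node_id in NODES else None


def outgoing(a):
    """Roughly: start a=node(1) match (a)-->(b) return b"""
    return [dst for src, dst in EDGES if src == a]


def neighbours(n):
    """Nodes adjacent to n in either direction (the undirected `--` pattern)."""
    return [dst for src, dst in EDGES if src == n] + [
        src for src, dst in EDGES if dst == n
    ]


def two_hops(a):
    """Roughly: start a=node(1) match (a)--()--(c) return c"""
    result = []
    for middle in neighbours(a):
        for c in neighbours(middle):
            if c != a:  # skip paths that walk straight back to the start node
                result.append(c)
    return result


print(get_node(0))
print(outgoing(1))
print(two_hops(1))
```

The point of the comparison is that the Cypher pattern states the shape of the path, while the hand-written version has to spell out the loops.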
74. Neo4j - the Graph Database
75. Background of Neo4j
๏ 2001 - Windh Technologies, a media asset management company
• CTO Peter, with Emil and Johan, prototyped a proper graph interface
• first SQL-backed, then revised as a full-stack implementation
• (just like Amazon-Dynamo, Facebook-Cassandra)
๏ 2003 - Neo4j went into 24/7 production
๏ 2006-2007 - Neo4j was spun off as an open source project
๏ 2009 - seed funding for the company
๏ 2010 - Neo4j Server was created (previously only an embedded DB)
๏ 2011 - fully funded Silicon Valley start-up, Neo Technology
84. Neo4j is a Graph Database
๏ A Graph Database:
• a Property Graph with Nodes, Relationships and Properties on both
• perfect for complex, highly connected data
๏ A Graph Database:
• reliable with real ACID Transactions
• scalable: 32 Billion Nodes, 32 Billion Relationships, 64 Billion Properties
• Server with REST API, or Embeddable on the JVM
• high-performance with High-Availability (read scaling)
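The property graph model (nodes, relationships, and properties on both) can be sketched with two small Python classes. This is a minimal illustration of the data model only, not Neo4j's embedded or REST API; the names `Node`, `Relationship` and `relate`, and the `note` property, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)  # compare by identity; the structure is cyclic
class Node:
    properties: Dict[str, Any] = field(default_factory=dict)
    relationships: List["Relationship"] = field(default_factory=list)


@dataclass(eq=False)
class Relationship:
    start: Node
    end: Node
    type: str
    properties: Dict[str, Any] = field(default_factory=dict)


def relate(start: Node, end: Node, rel_type: str, **properties: Any) -> Relationship:
    """Create a typed, directed relationship and register it on both nodes."""
    rel = Relationship(start, end, rel_type, dict(properties))
    start.relationships.append(rel)
    end.relationships.append(rel)
    return rel


andreas = Node({"name": "Andreas"})
neo = Node({"company": "Neo Technology"})
# The relationship carries its own properties, just like a node does.
works_with = relate(andreas, neo, "WORKS_WITH", note="illustrative value")
print(works_with.type, works_with.properties)
```

Because each node holds direct references to its relationships, moving from a node to its neighbours needs no separate index lookup in this sketch.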
88. Q: What are graphs good for?
A: highly connected data
๏ Recommendations
๏ Business intelligence
๏ Social computing
๏ Geospatial
๏ MDM
๏ Systems management
๏ Genealogy
๏ Real Use Cases:
• [A] ACL from Hell
• [B] Timely recommendations
• [C] Global collaboration
92. [A] ACL from Hell
๏ Customer: leading consumer utility company with tons and tons of users
๏ Goal: comprehensive access control administration for customers
๏ Benefits:
• Flexible and dynamic architecture
• Exceptional performance
• Extensible data model supports new applications and features
• Low cost
• A reliable access control administration system for 5 million customers, subscriptions and agreements
• Complex dependencies between groups, companies, individuals, accounts, products, subscriptions, services and agreements
• Broad and deep graphs (master customers with 1000s of customers, subscriptions & agreements)
[Figure: example graph. Nodes: name: Andreas; company: Neo Technology; account: 9758352794; agreement: ultimate; subscription: sports; subscription: local; group: graphistas; service: NFL; service: Ravens; promotion: fall. Relationship labels: works with, owns, member of, gets discount on, subscribes to, has plan, includes, provides, discounts, offered.]
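An access check in a graph like this amounts to asking whether a path leads from a customer to a service. The sketch below uses node and relationship names from the slide's figure, but which nodes each relationship connects is a guess, since the figure's layout is not recoverable from the transcript. `can_access` is a hypothetical helper, not part of any product.

```python
from collections import deque

# Assumed edges (start, relationship, end). Labels come from the slide;
# the wiring between them is an assumption made for this example.
EDGES = [
    ("name: Andreas", "owns", "account: 9758352794"),
    ("account: 9758352794", "has plan", "agreement: ultimate"),
    ("agreement: ultimate", "includes", "subscription: sports"),
    ("subscription: sports", "provides", "service: NFL"),
    ("subscription: local", "provides", "service: Ravens"),
]


def can_access(start, target, edges=EDGES):
    """Breadth-first search along outgoing relationships from start to target."""
    adjacency = {}
    for src, _rel, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(can_access("name: Andreas", "service: NFL"))
print(can_access("name: Andreas", "service: Ravens"))
```

In this assumed wiring the first check succeeds through account, agreement and subscription, while the second fails because no path reaches the local subscription.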
97. [B] Timely Recommendations
๏ Customer: a professional social network
• 35 million users, adding 30,000+ each day
๏ Goal: up-to-date recommendations
• Scalable solution with real-time end-user experience
• Low maintenance and reliable architecture
• 8-week implementation
๏ Problem:
• Real-time recommendation imperative to attract new users and maintain positive user retention
• Clustered MySQL solution not scalable or fast enough to support real-time requirements
๏ Upgrade from running a batch job
• initial hour-long batch job
• but then success happened, and it became a day
• then two days
๏ With Neo4j, real-time recommendations
[Figure: example social graph of people linked by "knows" relationships. Nodes: Andreas (job: talking), Tobias (job: coding), Stephen (job: DJ), Peter (job: building), Delia (job: barking), Tiberius (job: dancer), Emil (job: plumber), Allison (job: plumber).]
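A common way to produce "people you may know" suggestions from such a graph is to rank friends of friends by the number of shared contacts. The sketch below assumes a handful of "knows" edges between names from the figure; the actual edges are not recoverable from the transcript, and the customer's real algorithm is not described in the talk.

```python
from collections import Counter

# Assumed undirected "knows" edges between people named on the slide.
KNOWS = [
    ("Andreas", "Tobias"),
    ("Andreas", "Peter"),
    ("Tobias", "Emil"),
    ("Peter", "Emil"),
    ("Peter", "Stephen"),
]


def friends(person, knows=KNOWS):
    """Everyone directly connected to person, in either direction."""
    result = set()
    for a, b in knows:
        if a == person:
            result.add(b)
        elif b == person:
            result.add(a)
    return result


def recommend(person, knows=KNOWS):
    """Friends of friends, ranked by number of mutual friends, then by name."""
    direct = friends(person, knows)
    counts = Counter()
    for friend in direct:
        for candidate in friends(friend, knows):
            if candidate != person and candidate not in direct:
                counts[candidate] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(recommend("Andreas"))
```

Each request only touches the neighbourhood of one person, which is why this kind of query can be answered per request instead of in a batch job over the whole data set.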
102. [C] Collaboration on Global Scale
๏ Customer: a worldwide software leader
• highly collaborative end-users
๏ Goal: offer an online platform for global collaboration
• Highly flexible data analysis
• Sub-second results for large, densely-connected data
• User experience - competitive advantage
• Massive amounts of data tied to members, user groups, member content, etc. all interconnected
• Infer collaborative relationships through user-generated content
• Worldwide Availability
[Figure: Asia, North America, Europe]
109. Q: Why should you care?
A: because you have connected data.
๏ CRM, BI, social graphs
๏ GeoSpatial analytics
๏ Fraud detection
๏ Network management
111. A sample insurance domain setup
[Figure: example insurance graph. Nodes: Home; Building A; Questionnaire: Q1; Customer: C12; Size: 120m2; Coverage: Super; Coverage: Fire; T&C: X; Risk: Building small; Quote: C12; Policy: C12; Agreement: C12; User: U34. Relationship labels: sub_product, concerns, owns, fills_in, attribute, covered_by, contains_question, is_offered, risk_has_attr, covers_risk, sub_cover, has, includes, signed, based_on, made, contains.]
115. Recommendations, BI, Social Computing
๏ enrich your CRM with data from Facebook, Google, Twitter etc.
๏ Recommender systems for products
๏ Find influencers in your customer base for special treatment
116. This is what your CRM sees
Customer1
Peter Neubauer
http://inmaps.linkedinlabs.com/network
118. This is what your CRM doesn’t see.
http://inmaps.linkedinlabs.com/network
122. Geospatial features
๏ Dynamic layers from different sources
• domain standard flood area layer + crime index + fire station + living data -> index
๏ routes of low insurance risks
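One simple reading of "layers -> index" is a weighted average of per-layer scores for a location. The sketch below is only that reading, with made-up layer names, scores and weights. The talk does not say how the index is actually computed, and the "living data" layer is left out for brevity.

```python
def risk_index(layers, weights):
    """Weighted average of per-layer scores in [0, 1]; higher means riskier."""
    total_weight = sum(weights[name] for name in layers)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(layers[name] * weights[name] for name in layers) / total_weight


# Hypothetical weights: the flood layer counts twice as much as the others.
WEIGHTS = {"flood_area": 2.0, "crime_index": 1.0, "fire_station_distance": 1.0}

# Hypothetical scores for one location, each already scaled to [0, 1].
location = {"flood_area": 1.0, "crime_index": 0.5, "fire_station_distance": 0.5}

print(risk_index(location, WEIGHTS))
```

Storing such an index on the nodes or edges of a map graph would then let a shortest-path style search prefer low-risk routes, which is one way to read the "routes of low insurance risks" bullet.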