RDBMS to Graphs
Harnessing the Power of the Graph

September 2015
Ryan Boyd
@ryguyrg
  
Agenda

•  Origins of Neo4j
•  Benefits of Graphs
•  Designing your Graph Model
•  Query time!
•  Fitting Neo4j into your Enterprise Architecture
•  Q&A
  
Neo Technology Overview

Product
•  Neo4j - World’s leading graph database
•  150+ enterprise subscription customers including over 50 of the Global 2000

Company
•  Neo Technology, Creator of Neo4j
•  100 employees with HQ in Silicon Valley, London, Munich, Paris and Malmö
•  $45M in funding
  
Neo4j Adoption by Selected Verticals

•  Financial Services
•  Communications
•  Health & Life Sciences
•  HR & Recruiting
•  Media & Publishing
•  Social Web
•  Industry & Logistics
•  Entertainment
•  Consumer Retail
•  Information Services
•  Business Services

How Customers Use Neo4j

•  Network & Data Center
•  Master Data Management
•  Social
•  Recommendations
•  Identity & Access
•  Search & Discovery
•  GEO

Neo4j Leads the Graph Database Revolution

“Forrester estimates that over 25% of enterprises will be using graph databases by 2017”

“Neo4j is the current market leader in graph databases.”

“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”

IT Market Clock for Database Management Systems, 2014
https://www.gartner.com/doc/2852717/it-market-clock-database-management

TechRadar™: Enterprise DBMS, Q1 2014
http://www.forrester.com/TechRadar+Enterprise+DBMS+Q1+2014/fulltext/-/E-RES106801

Graph Databases – and Their Potential to Transform How We Capture Interdependencies (Enterprise Management Associates)
http://blogs.enterprisemanagement.com/dennisdrogseth/2013/11/06/graph-databasesand-potential-transform-capture-interdependencies/
  
High Business Value in Data Relationships

Data is increasing in volume…
•  New digital processes
•  More online transactions
•  New social networks
•  More devices

… and is getting more connected
Customers, products, processes, devices interact and relate to each other

Using Data Relationships unlocks value
•  Real-time recommendations
•  Fraud detection
•  Master data management
•  Network and IT operations
•  Identity and access management
•  Graph-based search

Early adopters became industry leaders
  
Relational DBs Can’t Handle Relationships Well

•  Cannot model or store data and relationships without complexity
•  Performance degrades with number and levels of relationships, and database size
•  Query complexity grows with need for JOINs
•  Adding new types of data and relationships requires schema redesign, increasing time to market

… making traditional databases inappropriate when data relationships are valuable in real-time

Slow development
Poor performance
Low scalability
Hard to maintain
  
Modeling as a Graph

The Whiteboard Model Is the Physical Model
  
Property Graph Model Components

Nodes
•  The objects in the graph
•  Can have name-value properties
•  Can be labeled

Relationships
•  Relate nodes by type and direction
•  Can have name-value properties

[Figure: example graph]
PERSON  name: “Dan”, born: May 29, 1970, twitter: “@dan”
PERSON  name: “Ann”, born: Dec 5, 1975
CAR  brand: “Volvo”, model: “V70”
Relationships shown: LOVES, LOVES, LIVES WITH
Relationship property shown: since: Jan 10, 2011
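
The components listed above can be sketched in a few lines of Python. This is only an illustration of the property graph idea, not how Neo4j stores data: the `Node` and `Relationship` classes and the `outgoing` helper are hypothetical names, and because the slide text does not say which relationship carries `since` or which way each relationship points, those choices are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph object: a set of labels plus name-value properties."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed link between two nodes, with its own properties."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"PERSON"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"PERSON"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"CAR"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    # Assumption: the directions and the placement of `since` are
    # illustrative; the slide text does not preserve them.
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "LIVES_WITH", ann, {"since": "Jan 10, 2011"}),
]


def outgoing(node, rel_type):
    """Return the end nodes of `rel_type` relationships starting at `node`."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


loved_by_dan = [n.properties["name"] for n in outgoing(dan, "LOVES")]
```

Both nodes and relationships carry a property dictionary, which is the point of the model: a relationship is a first-class record rather than a row in a join table.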
  
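Relational Versus Graph Models

[Figure: the same data shown two ways]
Relational Model: tables Person, Person-Friend, and Friend, with rows ANDREAS, DELIA, TOBIAS, MICA
Graph Model: nodes ANDREAS, TOBIAS, MICA, and DELIA connected by KNOWS relationships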
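
A minimal sketch of the contrast, using Python's built-in `sqlite3` for the relational side and a plain dictionary for the graph side. The table and column names, and the choice that ANDREAS knows the other three people, are assumptions made for illustration.

```python
import sqlite3

# Relational form: a person table plus a person_friend join table.
# Table names, column names, and the friendships are assumptions.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_friend (person_id INTEGER, friend_id INTEGER);
    INSERT INTO person VALUES (1, 'ANDREAS'), (2, 'TOBIAS'), (3, 'MICA'), (4, 'DELIA');
    INSERT INTO person_friend VALUES (1, 2), (1, 3), (1, 4);
""")

# Finding friends takes two JOINs through the join table.
sql_friends = [row[0] for row in db.execute("""
    SELECT f.name
    FROM person p
    JOIN person_friend pf ON pf.person_id = p.id
    JOIN person f ON f.id = pf.friend_id
    WHERE p.name = 'ANDREAS'
    ORDER BY f.name
""")]

# Graph form: each node points directly at its KNOWS neighbours.
knows = {"ANDREAS": ["TOBIAS", "MICA", "DELIA"], "TOBIAS": [], "MICA": [], "DELIA": []}
graph_friends = sorted(knows["ANDREAS"])
```

Both lookups return the same three names; the difference is that the graph form reads the neighbours directly, while the relational form reconstructs them through the join table on every query.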
  
Let’s Model!

Customer, Supplier, and Product (Master Data)
Orders (Activity)
  
The Domain Model

Except…

Northwind Example!

The Quintessential Northwind Example!
NOT JUST ANY

(Northwind)-[:TO]->(Graph)

Building the Graph Model
  
Building Relationships in Graphs

[Figure: an Employee node connected to an Order node by a SOLD relationship]
  
Locate Foreign Keys

(FKs)-[:BECOME]->(Relationships)

Correct Directions

Simple Join Tables Become Relationships

Attributed Join Tables Become Relationships with Properties
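
The three rules above can be sketched as a small Python transformation. The row shapes, column names, and sample values are assumptions standing in for a relational export; the `SOLD` and `PRODUCT` relationship types match the ones used in this deck's queries.

```python
# Hypothetical rows in the shape of a relational export; the column
# names and values are assumptions.
orders = [
    {"orderID": 10248, "employeeID": 5},
    {"orderID": 10249, "employeeID": 6},
]
order_details = [  # attributed join table between orders and products
    {"orderID": 10248, "productID": 11, "quantity": 12},
    {"orderID": 10248, "productID": 42, "quantity": 10},
    {"orderID": 10249, "productID": 14, "quantity": 9},
]


def foreign_keys_to_relationships(orders, order_details):
    """Turn FK columns and join-table rows into (start, type, end, properties)."""
    rels = []
    for o in orders:
        # The employeeID foreign key becomes (Employee)-[:SOLD]->(Order).
        rels.append((("Employee", o["employeeID"]), "SOLD", ("Order", o["orderID"]), {}))
    for d in order_details:
        # Each join-table row becomes (Order)-[:PRODUCT]->(Product);
        # its non-key columns become relationship properties.
        props = {k: v for k, v in d.items() if k not in ("orderID", "productID")}
        rels.append((("Order", d["orderID"]), "PRODUCT", ("Product", d["productID"]), props))
    return rels


rels = foreign_keys_to_relationships(orders, order_details)
```

A join table with no extra columns would simply produce relationships with empty property dictionaries, which is the "simple join table" case.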
  
Working Subset (Today’s Exercise)

Northwind Graph Model

Querying Your Data
  
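Basic Query: Who do people report to?

MATCH (:Employee {firstName: "Steven"})-[:REPORTS_TO]->(:Employee {firstName: "Andrew"})

[Figure: the pattern annotated. Each parenthesized part is a NODE with a LABEL (Employee) and a PROPERTY (firstName); REPORTS_TO is the relationship from Steven to Andrew.]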
  
Basic Query: Who do people report to?

MATCH
  (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN
  *
  
Real Query from a Customer

Find all direct reports and how many people they manage, each up to 3 levels down
  
(SELECT T.directReportees AS directReportees, sum(T.count) AS count
FROM (
SELECT manager.pid AS directReportees, 0 AS count
FROM person_reportee manager
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
UNION
SELECT manager.pid AS directReportees, count(manager.directly_manages) AS
count
FROM person_reportee manager
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
UNION
SELECT manager.pid AS directReportees, count(reportee.directly_manages) AS
count
FROM person_reportee manager
JOIN person_reportee reportee
ON manager.directly_manages = reportee.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
UNION
SELECT manager.pid AS directReportees, count(L2Reportees.directly_manages)
AS count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
ON L1Reportees.directly_manages = L2Reportees.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
) AS T
GROUP BY directReportees)
UNION
(SELECT T.directReportees AS directReportees, sum(T.count) AS count
FROM (
SELECT manager.directly_manages AS directReportees, 0 AS count
FROM person_reportee manager
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
UNION
SELECT reportee.pid AS directReportees,
count(reportee.directly_manages) AS count
FROM person_reportee manager
	
  
JOIN person_reportee reportee
ON manager.directly_manages = reportee.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lN
GROUP BY directReportees
UNION
SELECT depth1Reportees.pid AS directReportees,
count(depth2Reportees.directly_manages) AS count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
ON L1Reportees.directly_manages = L2Reportees.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lN
GROUP BY directReportees
) AS T
GROUP BY directReportees)
UNION
(SELECT T.directReportees AS directReportees, sum(T.count) AS count
OUTER UNIONS
FROM(
SELECT reportee.directly_manages AS directReportees, 0 AS count
FROM person_reportee manager
JOIN person_reportee reportee
ON manager.directly_manages = reportee.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lNam
GROUP BY directReportees
count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
ON L1Reportees.directly_manages = L2Reportees.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lN
GROUP BY directReportees
) AS T
GROUP BY directReportees)
UNION
(SELECT L2Reportees.directly_manages AS directReportees, 0 AS count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
Real Query from a Customer

Find all direct reports and how many people they manage, up to 3 levels down

Cypher Query

MATCH (manager)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(manager)
WHERE boss.name = "John Doe"
RETURN manager.name AS Manager,
       count(report) AS TotalReports
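
To make the meaning of the two variable-length patterns concrete, here is a rough Python equivalent over an in-memory org chart. It is a sketch under stated assumptions: the names other than "John Doe" are invented, the helpers `within` and `total_reports` are hypothetical, and the hierarchy is assumed to be a tree, so each person is reached by exactly one path.

```python
from collections import defaultdict

# Hypothetical org chart: employee -> the person they report to.
reports_to = {
    "Ann": "John Doe", "Bob": "John Doe",
    "Cid": "Ann", "Dee": "Ann",
    "Eve": "Cid",
    "Fay": "Eve",
    "Gus": "Fay",
}

direct_reports = defaultdict(list)
for employee, manager in reports_to.items():
    direct_reports[manager].append(employee)


def within(root, min_depth, max_depth):
    """People between min_depth and max_depth REPORTS_TO hops below root."""
    found, frontier = [], [root]
    for depth in range(max_depth + 1):
        if depth >= min_depth:
            found.extend(frontier)
        frontier = [r for person in frontier for r in direct_reports.get(person, [])]
    return found


def total_reports(boss):
    """Managers 0..3 levels below boss, each with reports 1..3 levels below."""
    totals = {}
    for manager in within(boss, 0, 3):
        count = len(within(manager, 1, 3))
        if count:  # MATCH yields no row for a manager without any reports
            totals[manager] = count
    return totals


totals = total_reports("John Doe")
# totals == {"John Doe": 5, "Ann": 4, "Cid": 3, "Eve": 2}
```

The `*0..3` bound is what lets the boss appear in the result as a manager of themselves at depth 0, while `*1..3` caps how far below each manager the count reaches.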
  
Express Complex Queries Easily with Cypher

Find all direct reports and how many people they manage, up to 3 levels down

Cypher Query

MATCH (sub)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(sub)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate,
       count(report) AS Total

[The slide sets this Cypher query beside the SQL query shown earlier.]
  
“We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10 to 100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.”

Volker Pacher
Senior Developer
  
Who is in Robert’s (direct, upwards) reporting chain?

MATCH
  p=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE
  sub.firstName = 'Robert'
RETURN
  p
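
The unbounded `*` in the pattern above means "follow REPORTS_TO as far as it goes". A minimal Python sketch of that upward walk follows; the Steven to Andrew edge appears in the earlier basic query, while the Robert to Steven edge and the `reporting_chain` helper are assumptions for illustration.

```python
# REPORTS_TO edges as employee -> manager. Steven -> Andrew comes from
# the earlier basic query; Robert -> Steven is an assumption.
reports_to = {"Robert": "Steven", "Steven": "Andrew"}


def reporting_chain(employee):
    """Follow REPORTS_TO upwards until reaching someone with no manager."""
    chain = []
    seen = {employee}
    while employee in reports_to:
        employee = reports_to[employee]
        if employee in seen:  # guard against a cycle in bad data
            break
        seen.add(employee)
        chain.append(employee)
    return chain


chain = reporting_chain("Robert")
# chain == ["Steven", "Andrew"]
```

The last entry in the chain is the person with no outgoing REPORTS_TO relationship.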
  
Who’s the Big Boss?

MATCH
  (e:Employee)
WHERE
  NOT (e)-[:REPORTS_TO]->()
RETURN
  e.firstName AS bigBoss
  
Product Cross-Sell

MATCH
  (choc:Product {productName: 'Chocolade'})
    <-[:PRODUCT]-(:Order)<-[:SOLD]-(employee),
  (employee)-[:SOLD]->(o2)-[:PRODUCT]->(other:Product)
RETURN
  employee.firstName, other.productName, count(DISTINCT o2) AS count
ORDER BY
  count DESC
LIMIT 5;
  
High Performance
  	
  
Cypher vs SQL - Paths

Find Size of John’s 5th degree Network

Cypher

MATCH (u:User)-[:KNOWS*5..5]->(f5)
WHERE u.name = 'John'
RETURN count(f5) AS size;

●  100k Users
●  5M Relationships
●  Query took 5 min, 30s
●  Returns count of 312M

Neo4j config:
page-cache = 512m
heap = 4G
  
Cypher vs SQL - Paths

Find Size of John’s 5th degree Network

SQL

SELECT count(*)
FROM
  user,
  user_friend as uf1,
  user_friend as uf2,
  user_friend as uf3,
  user_friend as uf4,
  user_friend as uf5,
  user as f5
WHERE
  user.name='John' AND
  user.id = uf1.user_1 AND
  uf1.user_2 = uf2.user_1 AND
  uf2.user_2 = uf3.user_1 AND
  uf3.user_2 = uf4.user_1 AND
  uf4.user_2 = uf5.user_1 AND
  uf5.user_2 = f5.id;

●  100k Users
●  5M Connections
●  Query took 1hr 55 mins
●  Returns 312M

MySQL config:
key_buffer = 2G
join_buffer_size = 2G
  
Cypher vs SQL - Paths

Optimize: Only count on JOIN table

SQL

SELECT count(*)
FROM
  user,
  user_friend as uf1,
  user_friend as uf2,
  user_friend as uf3,
  user_friend as uf4,
  user_friend as uf5
WHERE
  user.name='John' AND
  user.id = uf1.user_1 AND
  uf1.user_2 = uf2.user_1 AND
  uf2.user_2 = uf3.user_1 AND
  uf3.user_2 = uf4.user_1 AND
  uf4.user_2 = uf5.user_1;

●  100k Users
●  5M Connections
●  Query took 2 min, 30s
●  Returns count of 312M

MySQL config:
key_buffer = 2G
join_buffer_size = 2G
  
Cypher vs SQL - Paths

Optimize: Only sum degree of last step

Cypher

MATCH (u:User)-[:KNOWS*4..4]->(f4)
WHERE u.name = 'John'
RETURN sum(size((f4)-[:KNOWS]->()))

●  100k Users
●  5M Relationships
●  Query takes 12 sec
●  Returns count of 312M

Neo4j config:
page-cache = 512m
heap = 4G
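
Why summing the degree of the last step gives the same count can be checked on a toy graph. The sketch below is not a benchmark and says nothing about the timings above; the graph, node names, and `path_ends` helper are assumptions, and the graph is deliberately acyclic so that no path can reuse a relationship.

```python
# Hypothetical KNOWS graph as adjacency lists. It is acyclic, so no path
# can reuse a relationship and counting paths is unambiguous.
knows = {
    "John": ["a1", "a2"],
    "a1": ["b1"], "a2": ["b1", "b2"],
    "b1": ["c1"], "b2": ["c1", "c2"],
    "c1": ["d1", "d2"], "c2": ["d2"],
    "d1": ["e1", "e2", "e3"], "d2": ["e1"],
    "e1": [], "e2": [], "e3": [],
}


def path_ends(start, hops):
    """End node of every path of exactly `hops` KNOWS steps from `start`."""
    ends = [start]
    for _ in range(hops):
        ends = [friend for node in ends for friend in knows[node]]
    return ends


# First form: expand all five hops, then count the end nodes.
naive = len(path_ends("John", 5))

# Optimized form: expand four hops, then add up each end node's out-degree.
optimized = sum(len(knows[f4]) for f4 in path_ends("John", 4))
```

Every 5-hop path is a 4-hop path plus one more relationship, so the number of 5-hop paths equals the sum of out-degrees over the 4-hop end nodes. The optimized form never materializes the final, largest expansion.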
  
Neo4j Clustering
Architecture Optimized for Speed & Availability at Scale

Performance Benefits
•  No network hops within queries
•  Real-time operations with fast and consistent response times
•  Cache sharding spreads cache across cluster for very large graphs

Clustering Features
•  Master-slave replication with master re-election and failover
•  Each instance has its own local cache
•  Horizontal scaling & disaster recovery

[Figure: a Load Balancer in front of three Neo4j instances]
  
Getting Data into Neo4j

Cypher-Based “LOAD CSV” Capability
•  Transactional (ACID) writes
•  Initial and incremental loads of up to 10 million nodes and relationships

Command-Line Bulk Loader: neo4j-import
•  For initial database population
•  For loads with 10B+ records
•  Up to 1M records per second

4.58 million things and their relationships… Loads in 100 seconds!
  
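Three Ways to Load Data into Neo4j

[Figure: three application architectures]
•  MIGRATE ALL DATA: the application uses a graph database that holds all data
•  MIGRATE GRAPH DATA: non-graph data stays in the relational database; graph data moves to the graph database
•  DUPLICATE GRAPH DATA: the relational database keeps all data; graph data is also held in the graph database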
  
Polyglot Persistence

Neo4j Fits into Your Enterprise Environment

[Figure: architecture diagram with these elements]
•  Data Storage and Business Rules Execution
•  Data Mining and Aggregation
•  End User, Application
•  Graph Database Cluster (three Neo4j instances)
•  Data Scientist, Ad Hoc Analysis
•  Bulk Analytic Infrastructure: Graph Compute Engine, EDW, …
•  Databases: Relational, NoSQL, Hadoop
  
Neo4j + Mongo!

Users Love Neo4j
  
Learn the Way of the Graph Quickly and Easily

Quick Start in 1 minute
  
Quick Start: Plan Your Project

PROFESSIONAL SERVICES PLAN (timeline: weeks 1 to 8)
•  Learn Neo4j
•  Decide on Architecture
•  Import and Model Data
•  Build Application
•  Test Application

Deploy your app in as little as 8 weeks
  
There Are Lots of Ways to Easily Learn Neo4j

Huge Ecosystem of Graph Enthusiasts

•  1,000,000+ downloads
•  20,000+ education registrants
•  18,000+ Meetup members
•  100+ technology and service partners
•  150+ enterprise subscription customers including 50+ Global 2000 companies
  
Get Started Now

Summary of the Power of the Graph

•  Take relationships and connected data seriously
•  Seriously easy to model
•  Serious performance
•  Fits in with your Enterprise Architecture
•  Easy to get started
•  Fast to reap the benefits
  
RDBMS to Graphs
Harnessing the Power of the Graph

Start of Q&A

Ryan Boyd
@ryguyrg
  
