Cassandra Summit 2014: TitanDB - Scaling Relationship Data and Analysis with Cassandra

Presenter: Matthias Broecheler, Managing Partner at Aurelius LLC

Storing relationship data in Cassandra entails data denormalization or pointer chasing inside the application, which reduces developer productivity, is error-prone, and is slow due to lack of optimization. Titan:db exposes a property graph data model directly atop Cassandra, which makes storing and querying relationship data fast, easy, and scalable to huge graphs. This talk demonstrates how Titan's features enable complex, multi-relational databases in Cassandra and discusses customer use cases for recommendation and personalization engines.
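
To make the contrast in the abstract concrete, here is a minimal sketch in plain Python. It is not Titan or Cassandra code: the names `users`, `products`, `buys`, `app_level_join`, `Graph`, and `build_graph` are hypothetical, and the sample rows are taken from the example tables in the slides. The first function chases pointers across separate tables in the application; the graph version follows stored adjacency directly, which is the idea behind a traversal such as `g.V.has('username','matt').out('buy')`.

```python
# Illustrative sketch only: plain in-memory Python, not Titan or Cassandra code.
# The sample rows mirror the example tables in the slides.
users = {
    "matt": {"email": "matt@", "password": "12345"},
    "john": {"email": "john@", "password": "qwerty"},
    "billy": {"email": "billy@", "password": "abcde"},
}
products = {
    52235: {"name": "cup", "price": 12.55},
    42215: {"name": "spoon", "price": 7.22},
    24529: {"name": "knife", "price": 5.32},
}
buys = [
    ("matt", 52235, "9/5/14"),
    ("billy", 42215, "8/7/14"),
    ("billy", 42215, "8/7/14"),
]


def app_level_join(username):
    """Application-level join: scan the Buy rows, then look up each product."""
    names = []
    for buyer, productid, _time in buys:
        if buyer == username:
            names.append(products[productid]["name"])
    return names


class Graph:
    """Tiny adjacency-list property graph (hypothetical helper)."""

    def __init__(self):
        self.vertices = {}   # vertex id -> properties
        self.adjacency = {}  # (source id, label) -> [(edge properties, target id)]

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties

    def add_edge(self, source, label, target, **properties):
        self.adjacency.setdefault((source, label), []).append((properties, target))

    def out(self, source, label):
        """Follow outgoing edges with the given label and return the target vertices."""
        return [self.vertices[target]
                for _props, target in self.adjacency.get((source, label), [])]


def build_graph():
    """Load the same sample data as vertices and 'buy' edges."""
    graph = Graph()
    for username, props in users.items():
        graph.add_vertex(("user", username), username=username, **props)
    for productid, props in products.items():
        graph.add_vertex(("product", productid), productid=productid, **props)
    for username, productid, time in buys:
        graph.add_edge(("user", username), "buy", ("product", productid), time=time)
    return graph
```

With the graph loaded, `build_graph().out(("user", "matt"), "buy")` returns the product vertices directly, without the application scanning a join table.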

Published in: Technology

  1. AURELIUS (THINKAURELIUS.COM). Titan:db: Scaling Relationship Data with C*. Matthias Broecheler, @mbroecheler. September XI, MMXIV. #CassandraSummit
  2. Storing relationship data in Cassandra entails data denormalization or pointer chasing inside the application, which reduces developer productivity, is error-prone, and is slow due to lack of optimization. Titan:db exposes a property graph data model directly atop Cassandra, which makes storing and querying relationship data fast, easy, and scalable to huge graphs. This talk demonstrates how Titan's features enable complex, multi-relational databases in Cassandra and discusses customer use cases for recommendation and personalization engines.
  3. Multi-Relational Data Structure Graph
  4. Titan = Cassandra + Graph
  5. Titan 0.5
  6. Cassandra: Linear scalability, fault tolerance, open source, multi datacenter, high performance
  7. Key, ColumnA, ColumnB, ColumnC, ColumnD, ColumnE, ColumnF
  8. User table (username, email, password): (matt, matt@, 12345), (john, john@, qwerty), (billy, billy@, abcde). Product table (productid, name, price): (52235, cup, 12.55), (42215, spoon, 7.22), (24529, knife, 5.32). CREATE INDEX ON User.username, User.email, Product.productid
  9. CREATE INDEX ON username(User), email(User), productid(Product). User vertex (username: matt, email: matt@, password: 12345). Product vertex (productid: 52235, name: cup, price: 12.55).
  10. Buy table (username, productid, time): (matt, 52235, 9/5/14), (billy, 42215, 8/7/14), (billy, 42215, 8/7/14). User table (username, email, password): (matt, matt@, 12345), (john, john@, qwerty), (billy, billy@, abcde). Product table (productid, name, price): (52235, cup, 12.55), (42215, spoon, 7.22), (24529, knife, 5.32).
  11. User vertex (username: matt, email: matt@, password: 12345), buy edge (time: 9/5/14), Product vertex (productid: 52235, name: cup, price: 12.55).
  12. What did ‘matt’ buy? Application level join. Buy table (username, productid, time): (matt, 52235, 9/5/14), (billy, 42215, 8/7/14), (billy, 42215, 8/7/14). User table (username, email, password): (matt, matt@, 12345), (john, john@, qwerty), (billy, billy@, abcde). Product table (productid, name, price): (52235, cup, 12.55), (42215, spoon, 7.22), (24529, knife, 5.32).
  13. What did ‘matt’ buy? g.V.has('username','matt').out('buy') User vertex (username: matt, email: matt@, password: 12345), buy edge (time: 9/5/14), Product vertex (productid: 52235, name: cup, price: 12.55).
  14. What did ‘matt’ recently buy? Application level join. Buy table (username, productid, time): (matt, 52235, 9/5/14), (billy, 42215, 8/7/14), (billy, 42215, 8/7/14). User table (username, email, password): (matt, matt@, 12345), (john, john@, qwerty), (billy, billy@, abcde). Product table (productid, name, price): (52235, cup, 12.55), (42215, spoon, 7.22), (24529, knife, 5.32).
  15. What did ‘matt’ recently buy? g.V.has('username','matt').outE('buy').orderBy('time',DESC)[0..9].inV User vertex (username: matt, email: matt@, password: 12345), buy edge (time: 9/5/14), Product vertex (productid: 52235, name: cup, price: 12.55).
  16. What did ‘matt’ recently buy? Slow: g.V.has('username','matt').outE('buy').orderBy('time',DESC)[0..9].inV User vertex (username: matt, email: matt@, password: 12345), buy edge (time: 9/5/14), Product vertex (productid: 52235, name: cup, price: 12.55).
  17. What did ‘matt’ recently buy? Rewrite join logic. Join table (username, productid, time): (matt, 52235, 9/5/14), (billy, 42215, 8/7/14), (billy, 42215, 8/7/14). Join table (username, time, productid): (matt, 9/5/14, 52235), (billy, 8/7/14, 42215), (billy, 8/7/14, 42215). User table (username, email, password): (matt, matt@, 12345), (john, john@, qwerty), (billy, billy@, abcde). Product table (productid, name, price): (52235, cup, 12.55), (42215, spoon, 7.22), (24529, knife, 5.32).
  18. What did ‘matt’ recently buy? CREATE INDEX ON buy edges by time OUT direction. g.V.has('username','matt').outE('buy').orderBy('time',DESC)[0..9].inV User vertex (username: matt, email: matt@, password: 12345), buy edge (time: 9/5/14), Product vertex (productid: 52235, name: cup, price: 12.55).
  19. Who bought ‘52235’? More application joins. Join table (productid, username, time): (52235, matt, 9/5/14), (42215, billy, 8/7/14), (42215, billy, 8/7/14). Join table (productid, time, username): (52235, 9/5/14, matt), (42215, 8/7/14, billy), (42215, 8/7/14, billy). User table (username, email, password): (matt, matt@, 12345), (john, john@, qwerty), (billy, billy@, abcde). Product table (productid, name, price): (52235, cup, 12.55), (42215, spoon, 7.22), (24529, knife, 5.32).
  20. Who bought ‘52235’? g.V.has('productid',52235).in('buy') CREATE INDEX ON buy edges by time IN direction. User vertex (username: matt, email: matt@, password: 12345), buy edge (time: 9/5/14), Product vertex (productid: 52235, name: cup, price: 12.55).
  21. Product join tables won’t scale. Join tables with columns (username, productid, time) and (username, time, productid). User table (username, email, password): (matt, matt@, 12345), (john, john@, qwerty), (billy, billy@, abcde). Product table (productid, name, price): (52235, cup, 12.55), (42215, spoon, 7.22), (24529, knife, 5.32). Join table (productid, username, time): (52235, matt, 9/5/14), (42215, billy, 8/7/14), (42215, billy, 8/7/14). Join table (productid, time, username): (52235, 9/5/14, matt), (42215, 8/7/14, billy), (42215, 8/7/14, billy).
  22. PARTITION Product Vertices. User vertex (username: matt, email: matt@, password: 12345), buy edge (time: 9/5/14), Product vertex (productid: 52235, name: cup, price: 12.55).
  23. Token Ring (BOP). Edge Cut: assigns ids to map vertices into “optimal” token range; maintains virtual partitions.
  24. Vertex Cut
  25. Combined Graph Partitioning
  26. Database Datastore
  27. Transactions: v = g.V.has('username','matt').has('password','12345'); p = g.V.has('productid',52235); e = v.addEdge('buy',p); e.setProperty('time','9/11/2014'); o = g.addVertex([orderid:242343]); o.addEdge('buyer',v); o.addEdge('product',p); g.commit() (unit of work). Atomicity, Consistency, Isolation, Durability.
  28. Transaction Consistency: u = g.addVertex([username:'matt']); p = g.V.has('username','senior'); u.addEdge('father',p); p.setProperty('surname','Jones'); g.commit() Locks acquired to ensure consistency constraints are enforced: Index Uniqueness, Multiplicity Constraints, Cardinality Constraints.
  29. Polyglot Data Architecture © Jay Kreps @ LinkedIn
  30. Titan Event Framework: Transaction modifications logged, Consumers
  31. Use Cases
  32. http://arli.us/magazinaluiza
  33. Security, Fraud: http://arli.us/cisco-sec1
  34. © Sean York @ Pearson Education
  35. http://bit.ly/WPTitanSEAGraph
  36. Music Graph, Knowledge Graph: http://arli.us/musicgraphintro
  37. TitanDB.io Relationships + Cassandra
  38. AURELIUS THINKAURELIUS.COM
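
Slides 16 to 18 contrast a slow "recently bought" traversal with one backed by an index on buy edges by time. The following is a minimal Python sketch of why keeping each vertex's edges sorted helps. It is not how Titan implements its edge indexes: `BuyEdgeIndex` and `recent_without_index` are hypothetical names, and the purchase data in any usage is assumed for illustration. The unindexed version filters and sorts every matching edge on each query, while the indexed version keeps the per-user list ordered at write time and reads the newest entries as a slice.

```python
import bisect
from datetime import date


class BuyEdgeIndex:
    """Per-user list of 'buy' edges kept sorted by time (hypothetical helper)."""

    def __init__(self):
        self._edges = {}  # username -> sorted list of (time, productid)

    def add_buy(self, username, productid, when):
        # insort keeps the list ordered, so reads never need a full sort.
        bisect.insort(self._edges.setdefault(username, []), (when, productid))

    def recent(self, username, limit=10):
        """Return up to `limit` product ids, newest purchase first."""
        if limit <= 0:
            return []
        edges = self._edges.get(username, [])
        # The newest entries sit at the tail of the sorted list.
        return [productid for _when, productid in reversed(edges[-limit:])]


def recent_without_index(all_buys, username, limit=10):
    """Baseline: filter and sort every matching edge on each query."""
    mine = [(when, productid)
            for user, productid, when in all_buys if user == username]
    mine.sort(reverse=True)
    return [productid for _when, productid in mine[:limit]]


# Example usage with assumed purchase dates (only matt's 9/5/14 purchase
# of product 52235 appears in the slides).
sample_buys = [
    ("matt", 52235, date(2014, 9, 5)),
    ("matt", 42215, date(2014, 8, 7)),
    ("matt", 24529, date(2014, 9, 11)),
    ("billy", 42215, date(2014, 8, 7)),
]
index = BuyEdgeIndex()
for user, productid, when in sample_buys:
    index.add_buy(user, productid, when)
```

Both approaches return the same answer; the difference is that the indexed read touches only the requested number of edges, which matters once a single user has many purchases.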
