The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012

Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/top-five-questions-about-nosql/jonathan-ellis

The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012 Presentation Transcript

  • 1. Five questions for your NoSQL solution! Jonathan Ellis, CTO, DataStax; Project Chair, Apache Cassandra
  • 2. how do I model my application? ©2012 DataStax
  • 3. Popular options
    • Key/value
    • Tabular
    • Document
    • Graph?
  • 4. Schema is your friend
    {
      "id": "e451dd42-ece3-11e1-a0a3-34159e154f4c",
      "name": "jbellis",
      "state": "TX",
      "birthdate": "1/1/1976",
      "email_addresses": ["jbellis@gmail", "jbellis@datastax.com"]
    }
  • 5. SQL can be your friend too
    CREATE TABLE users (
      id uuid PRIMARY KEY,
      name text,
      state text,
      birth_date date
    );
    CREATE INDEX ON users(state);
    SELECT * FROM users
    WHERE state='Texas' AND birth_date > '1950-01-01';
  • 6. Collections
    CREATE TABLE users (
      id uuid PRIMARY KEY,
      name text,
      state text,
      birth_date date
    );
    CREATE TABLE users_addresses (
      user_id uuid REFERENCES users,
      email text
    );
    SELECT * FROM users NATURAL JOIN users_addresses;
  • 7. Collections: the same users and users_addresses tables and NATURAL JOIN as slide 6, now crossed out with an X
  • 8. Collections
    CREATE TABLE users (
      id uuid PRIMARY KEY,
      name text,
      state text,
      birth_date date,
      email_addresses set<text>
    );
    UPDATE users
    SET email_addresses = email_addresses + {'jbellis@gmail.com', 'jbellis@datastax.com'};
  • 9. Joins don’t scale
    • No joins
    • No subqueries
    • No aggregation functions* or GROUP BY
    • ORDER BY?
  • 10. SELECT * FROM tweets
    WHERE user_id IN (SELECT follower FROM followers WHERE user_id = 'driftx')
    (Diagram labels: followers, ?, tweets)
  • 11. Clustering in Cassandra
    CREATE TABLE timeline (
      user_id uuid,
      tweet_id timeuuid,
      tweet_author uuid,
      tweet_body text,
      PRIMARY KEY (user_id, tweet_id)
    );
    user_id  tweet_id    _author   _body
    jbellis  3290f9da..  rbranson  lorem
    jbellis  3895411a..  tjake     ipsum
    ...      ...         ...
    driftx   3290f9da..  rbranson  lorem
    driftx   71b46a84..  yzhang    dolor
    ...      ...         ...
    yukim    3290f9da..  rbranson  lorem
    yukim    e451dd42..  tjake     amet
    ...      ...         ...
  • 12. Clustering in Cassandra (same timeline table and rows as slide 11), with the query:
    SELECT * FROM timeline
    WHERE user_id = 'driftx';
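
The clustered timeline table in slides 11 and 12 can be pictured as a map from partition key (user_id) to rows kept sorted by clustering key (tweet_id), so one user's timeline is a single contiguous read rather than a join. What follows is a minimal Python sketch of that idea, not Cassandra's storage code. The `Timeline` class and `post_tweet` helper are hypothetical names, small integers stand in for timeuuid values, and the fan-out on write is an assumption suggested by the same `3290f9da..` row appearing under three users.

```python
import bisect
from collections import defaultdict


class Timeline:
    """Toy model of PRIMARY KEY (user_id, tweet_id); hypothetical, for illustration."""

    def __init__(self):
        # partition key (user_id) -> rows (tweet_id, author, body), sorted by tweet_id
        self._partitions = defaultdict(list)

    def insert(self, user_id, tweet_id, author, body):
        # Keep each partition ordered by the clustering key as rows arrive.
        bisect.insort(self._partitions[user_id], (tweet_id, author, body))

    def select(self, user_id):
        # In the spirit of: SELECT * FROM timeline WHERE user_id = ?
        return list(self._partitions.get(user_id, []))


def post_tweet(timeline, followers, tweet_id, author, body):
    # Assumed denormalization: copy the tweet into every follower's partition.
    for follower in followers:
        timeline.insert(follower, tweet_id, author, body)


timeline = Timeline()
post_tweet(timeline, ["jbellis", "driftx", "yukim"], 1, "rbranson", "lorem")
post_tweet(timeline, ["driftx"], 3, "yzhang", "dolor")
post_tweet(timeline, ["driftx"], 2, "tjake", "ipsum")
driftx_rows = timeline.select("driftx")
```

The sketch does not model overwriting a row that reuses an existing primary key; it only shows why the read needs no join.
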
  • 13. how does it perform?
  • 14. VLDB benchmark
  • 15. Locking
  • 16. Efficiency
  • 17. UPDATE users
    SET email_addresses = email_addresses + {...}
    WHERE user_id = 'jbellis';
  • 18. Durability
  • 19. Log-structured storage engine
    write(k1, c1:v1)
    Memory: Memtable
    Hard drive: Commit log
  • 20. write(k1, c1:v1)
    Memory (Memtable):
      k1  c1:v1
    Hard drive (Commit log):
      k1  c1:v1
  • 21. write(k1, c2:v2)
    Memory:
      k1  c1:v1 c2:v2
    Hard drive:
      k1  c1:v1
      k1  c2:v2
  • 22. write(k2, c1:v1 c2:v2)
    Memory:
      k1  c1:v1 c2:v2
      k2  c1:v1 c2:v2
    Hard drive:
      k1  c1:v1
      k1  c2:v2
      k2  c1:v1 c2:v2
  • 23. write(k1, c1:v4 c3:v3)
    Memory:
      k1  c1:v4 c2:v2 c3:v3
      k2  c1:v1 c2:v2
    Hard drive:
      k1  c1:v1
      k1  c2:v2
      k2  c1:v1 c2:v2
      k1  c1:v4 c3:v3
  • 24. Memory: flush
    Hard drive (SSTable, index / BF; cleanup):
      k1  c1:v4 c2:v2 c3:v3
      k2  c1:v1 c2:v2
  • 25. No random writes
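
The write path in slides 19 to 25 can be replayed in a few lines: every write is appended to a commit log and merged into an in-memory memtable, and a flush writes the memtable out as an immutable SSTable and clears the log. This is a minimal sketch under stated assumptions, not Cassandra's implementation. The class `LogStructuredStore` and its methods are hypothetical, Python lists and dicts stand in for on-disk files, and the index and Bloom filter built at flush time are omitted.

```python
class LogStructuredStore:
    """Toy commit log + memtable + SSTable model; hypothetical, for illustration."""

    def __init__(self):
        self.commit_log = []  # stands in for the append-only log on disk
        self.memtable = {}    # key -> {column: value}, held in memory
        self.sstables = []    # flushed, never-modified tables, oldest first

    def write(self, key, columns):
        # Sequential append for durability; no existing data is rewritten.
        self.commit_log.append((key, dict(columns)))
        # Merge into the in-memory row; the newest value for a column wins.
        self.memtable.setdefault(key, {}).update(columns)

    def flush(self):
        # Write the memtable out in key order as one immutable table.
        sstable = {key: dict(self.memtable[key]) for key in sorted(self.memtable)}
        self.sstables.append(sstable)
        # Cleanup: the flushed writes no longer need the commit log.
        self.memtable = {}
        self.commit_log = []

    def read(self, key):
        # Merge oldest to newest so that later writes override earlier ones.
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable.get(key, {}))
        merged.update(self.memtable.get(key, {}))
        return merged


store = LogStructuredStore()
store.write("k1", {"c1": "v1"})
store.write("k1", {"c2": "v2"})
store.write("k2", {"c1": "v1", "c2": "v2"})
store.write("k1", {"c1": "v4", "c3": "v3"})
log_length_before_flush = len(store.commit_log)
memtable_before_flush = {key: dict(cols) for key, cols in store.memtable.items()}
store.flush()
store.write("k1", {"c1": "v5"})
k1_after_flush = store.read("k1")
```

Because a flush writes a whole new table and an update only appends, nothing on disk is modified in place, which is the point of the "No random writes" slide.
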
  • 26. The gory details
  • 27. Larger than memory datasets
  • 28. how does it handle failure?
  • 29. Classic partitioning with SPOF
    (Diagram labels: client, router, partition 1, partition 2, partition 3, partition 4)
  • 30. Availability
    • “High availability implies that a single fault will not bring down your system. Not ‘we’ll recover quickly.’” -- Ben Coverston: DataStax
    • “The biggest problem with failover is that you’re almost never using it until it really hurts. It’s like backups that you never test.” -- Rick Branson: Instagram
  • 31. Fully distributed, no SPOF
    (Diagram labels: client, p3, p6, p1, p1, p1)
  • 32. Multiple datacenters
  • 33.
  • 34. Self-healing
    1. request (Client to Coordinator)
    2. internal request (Coordinator to Replica)
    3. internal response (Replica to Coordinator)
    4. response (Coordinator to Client)
  • 35. Self-healing (same diagram as slide 34)
  • 36. Self-healing: replica fails
    1. request
    2. internal request, which times out
    4. response
  • 37. Self-healing: replica fails (as slide 36, with an X marking the failure)
  • 38. Self-healing: replica fails
    1. request
    2. internal request, which times out
    3. hint
    4. response
  • 39. Self-healing: replica fails (as slide 38, with an X marking the failure)
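
Slides 36 to 39 show the coordinator keeping a hint when a replica times out, so the missed write can be delivered once the replica is back. The sketch below illustrates that flow under simplifying assumptions: `Replica`, `Coordinator`, and `replay_hints` are hypothetical names, a failed replica is simulated with an `up` flag, and hints are kept in a plain in-memory list. It is not Cassandra's hinted handoff code.

```python
class Replica:
    """Simulated replica; 'up' toggles whether it answers. Hypothetical."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def apply(self, key, value):
        if not self.up:
            # Stands in for the internal request timing out.
            raise TimeoutError(f"{self.name} did not respond")
        self.data[key] = value


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (replica, key, value) writes waiting for redelivery

    def write(self, key, value):
        acks = 0
        for replica in self.replicas:
            try:
                replica.apply(key, value)
                acks += 1
            except TimeoutError:
                # Remember the write instead of failing the whole request.
                self.hints.append((replica, key, value))
        return acks

    def replay_hints(self):
        still_pending = []
        for replica, key, value in self.hints:
            try:
                replica.apply(key, value)
            except TimeoutError:
                still_pending.append((replica, key, value))
        self.hints = still_pending


replica_a, replica_b = Replica("a"), Replica("b")
coordinator = Coordinator([replica_a, replica_b])
replica_b.up = False
acks = coordinator.write("k1", "v1")
hints_while_down = len(coordinator.hints)
replica_b.up = True
coordinator.replay_hints()
```
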
  • 40. Other healing modes
    • AntiEntropyService
    • Read repair
  • 41. Dynamic snitch (dealing with partial failure)
    (Diagram labels: Client, Coordinator, and nodes marked 90% busy, 30% busy, 40% busy)
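
The dynamic snitch diagram suggests a coordinator that steers requests away from a busy node toward less loaded ones. A rough sketch of that routing decision follows. The class name, the `record` and `best` methods, the smoothing factor, and the use of a "busy" fraction as the score are all assumptions made for illustration; the slide does not say how the real snitch scores replicas.

```python
class DynamicSnitchSketch:
    """Tracks a smoothed load score per replica and picks the lowest. Hypothetical."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # weight given to the newest sample (assumed value)
        self.scores = {}    # replica name -> smoothed busy fraction

    def record(self, replica, busy):
        previous = self.scores.get(replica)
        if previous is None:
            self.scores[replica] = busy
        else:
            # Exponentially weighted moving average of the observed load.
            self.scores[replica] = self.alpha * busy + (1 - self.alpha) * previous

    def best(self):
        # Route to the replica with the lowest smoothed score.
        return min(self.scores, key=self.scores.get)


snitch = DynamicSnitchSketch()
snitch.record("replica_a", 0.90)
snitch.record("replica_b", 0.30)
snitch.record("replica_c", 0.40)
first_choice = snitch.best()
snitch.record("replica_b", 0.95)  # replica_b becomes busy
second_choice = snitch.best()
```

With the loads from the slide the 30% busy node is chosen first; once it degrades, traffic moves to the 40% busy node.
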
  • 42. how does it scale?
  • 43. VLDB benchmark
  • 44. Scaling antipatterns
    • Metadata servers
    • Router bottlenecks
    • Overloading existing nodes when adding capacity
  • 45. how flexible is it?
  • 46.
  • 47. Data model: Realtime
    LiveStocks
      stock  last
      GOOG   $95.52
      AAPL   $186.10
      AMZN   $112.98
    Portfolios
      user     stock  shares
      jbellis  GOOG   80
      jbellis  LNKD   20
      yukim    AMZN   100
    StockHist
      stock  date        price
      GOOG   2011-01-01  $8.23
      GOOG   2011-01-02  $6.14
      GOOG   2011-01-03  $7.78
  • 48. Data model: Analytics
    HistLoss
                  worst_date  loss
      Portfolio1  2011-07-23  -$34.81
      Portfolio2  2011-03-11  -$11432.24
      Portfolio3  2011-05-21  -$1476.93
  • 49. Data model: Analytics
    10dayreturns
      stock  rdate       return
      GOOG   2011-07-25  $8.23
      GOOG   2011-07-24  $6.14
      GOOG   2011-07-23  $7.78
      AAPL   2011-07-25  $15.32
      AAPL   2011-07-24  $12.68
    INSERT OVERWRITE TABLE 10dayreturns
    SELECT a.stock, b.date as rdate, b.price - a.price
    FROM StockHist a
    JOIN StockHist b ON (a.stock = b.stock AND date_add(a.date, 10) = b.date);
  • 50. Data model: Analytics
    portfolio_returns
      portfolio   rdate       preturn
      Portfolio1  2011-07-25  $118.21
      Portfolio1  2011-07-24  $60.78
      Portfolio1  2011-07-23  -$34.81
      Portfolio2  2011-07-25  $2143.92
      Portfolio3  2011-07-24  -$10.19
    INSERT OVERWRITE TABLE portfolio_returns
    SELECT portfolio, rdate, SUM(b.return)
    FROM portfolios a
    JOIN 10dayreturns b ON (a.stock = b.stock)
    GROUP BY portfolio, rdate;
  • 51. Data model: Analytics
    HistLoss
                  worst_date  loss
      Portfolio1  2011-07-23  -$34.81
      Portfolio2  2011-03-11  -$11432.24
      Portfolio3  2011-05-21  -$1476.93
    INSERT OVERWRITE TABLE HistLoss
    SELECT a.portfolio, rdate, minp
    FROM (
      SELECT portfolio, min(preturn) as minp
      FROM portfolio_returns
      GROUP BY portfolio
    ) a
    JOIN portfolio_returns b ON (a.portfolio = b.portfolio and a.minp = b.preturn);
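
The three queries in slides 49 to 51 form a small pipeline: ten-day returns per stock, summed per portfolio and date, then reduced to each portfolio's worst date. The same steps can be traced in plain Python on a handful of rows. All prices, dates, and portfolio names below are made up for illustration (the slide data does not include prices ten days apart), and, like the query on slide 50, the sketch ignores the shares column.

```python
from collections import defaultdict
from datetime import date, timedelta

# Made-up inputs: (stock, date) -> price, and (portfolio, stock) holdings.
stock_hist = {
    ("GOOG", date(2011, 1, 1)): 100,
    ("GOOG", date(2011, 1, 2)): 110,
    ("GOOG", date(2011, 1, 11)): 90,
    ("GOOG", date(2011, 1, 12)): 125,
    ("AAPL", date(2011, 1, 1)): 50,
    ("AAPL", date(2011, 1, 11)): 55,
}
portfolios = [("p1", "GOOG"), ("p1", "AAPL"), ("p2", "GOOG")]

# Step 1 (slide 49): self-join StockHist on "same stock, ten days later".
ten_day_returns = {}
for (stock, day), price in stock_hist.items():
    rdate = day + timedelta(days=10)
    later_price = stock_hist.get((stock, rdate))
    if later_price is not None:
        ten_day_returns[(stock, rdate)] = later_price - price

# Step 2 (slide 50): join holdings to returns, then SUM ... GROUP BY portfolio, rdate.
portfolio_returns = defaultdict(int)
for portfolio, stock in portfolios:
    for (ret_stock, rdate), ret in ten_day_returns.items():
        if ret_stock == stock:
            portfolio_returns[(portfolio, rdate)] += ret

# Step 3 (slide 51): keep the date with the minimum return per portfolio.
hist_loss = {}
for (portfolio, rdate), preturn in portfolio_returns.items():
    worst = hist_loss.get(portfolio)
    if worst is None or preturn < worst[1]:
        hist_loss[portfolio] = (rdate, preturn)
```

One difference from the Hive version: when two dates tie for the minimum, the join on slide 51 returns both rows, while this loop keeps only the first one it sees.
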
  • 52.
  • 53. Some Cassandra users
  • 54. Questions?
    Image credits
    • http://www.flickr.com/photos/26817893@N05/2573006312/
    • http://www.flickr.com/photos/rowanbank/7686239548
    • http://www.flickr.com/photos/mervtheswerve/6081933265
    • http://www.flickr.com/photos/dg_pics/2526208830
    • http://www.flickr.com/photos/wainwright/351684037
    • http://www.flickr.com/photos/mikeneilson/1606662529
    • http://www.flickr.com/photos/sbisson/3852905534
    • http://www.flickr.com/photos/breadnbadger/2674928517