The NoSQL Ecosystem
Adam Marcus, MIT CSAIL
marcua@csail.mit.edu / @marcua
HPTS 2011: The NoSQL Ecosystem

My talk for HPTS 2011, summarizing the NoSQL ecosystem and what systems builders can learn from NoSQL's success.

  1. The NoSQL Ecosystem
     Adam Marcus, MIT CSAIL
     marcua@csail.mit.edu / @marcua
  2. About Me
     ● Social Computing + Database Systems
     ● Easily Distracted: Wrote "The NoSQL Ecosystem" in The Architecture of Open Source Applications
       http://www.aosabook.org/en/nosql.html
  3. History, Compressed
     Late 1990s → Today
  4. History, Compressed
     Late 1990s → Today: Buy RAM
  5. History, Compressed
     Late 1990s → Today: Buy RAM, Wrap RDBMSs
  6. History, Compressed
     Late 1990s → Today: Buy RAM, Wrap RDBMSs, NoSQL
  7. History, Compressed
     Late 1990s → Today: Buy RAM, Wrap RDBMSs, NoSQL (callout: Not Only SQL)
  8. History, Compressed
     Late 1990s → Today: Buy RAM, Wrap RDBMSs, NoSQL, Commercialize
  9. History, Compressed
     Late 1990s → Today: Buy RAM, Wrap RDBMSs, NoSQL, Commercialize
  10. The List, So You Don't Yell at Me
      HBase, CouchDB, Neo4j, Cassandra, Riak, InfoGrid, BerkeleyDB, Voldemort, HyperTable, Redis, MongoDB, Sones, AllegroGraph, HyperGraphDB, DEX, FlockDB, VertexDB, Oracle NoSQL, Tokyo Cabinet, MemcacheDB
  11. The List, So You Don't Yell at Me
      HBase, CouchDB, Neo4j, Cassandra, Riak, InfoGrid, BerkeleyDB, Voldemort, HyperTable, Redis, MongoDB, Sones, AllegroGraph, HyperGraphDB, DEX, FlockDB, VertexDB, Oracle NoSQL, Tokyo Cabinet, MemcacheDB
      Marcus' Law of Databases: The number of persistence options doubles every 1.5 years
  12. The List, So You Don't Yell at Me
      HBase, CouchDB, Neo4j, Cassandra, Riak, InfoGrid, BerkeleyDB, Voldemort, HyperTable, Redis, MongoDB, Sones, AllegroGraph, HyperGraphDB, DEX, FlockDB, VertexDB, Oracle NoSQL, Tokyo Cabinet, MemcacheDB
      Marcus' Law of Databases: The number of persistence options doubles every 1.5 years
      Corollary: I'm sorry I missed your persistence option
  13. interesting properties | real-world usage | takeaways | questions
  14. interesting properties | real-world usage | takeaways | questions
  15. Conventional Wisdom on NoSQL
      ● Key-based data model
      ● Sloppy schema
      ● Single-key transactions
      ● In-app joins
      ● Eventually consistent
  16. Exceptions are More Interesting
      ● Data Model
      ● Query Model
      ● Transactions
      ● Consistency
  17. Data Model
      ● Usually key-based...
          ● Column-families
          ● Documents
          ● Data structures
  18. Data Model
      ● Usually key-based...
          ● Column-families
          ● Documents
          ● Data structures
      ● ...but not always
          ● Graph stores
  19. Query Model
      ● Redis: data structure-specific operations
      ● CouchDB, Riak: MapReduce
      ● Cassandra, MongoDB: SQL-like languages, no joins or transactions
  20. Query Model
      ● Redis: data structure-specific operations
      ● CouchDB, Riak: MapReduce
      ● Cassandra, MongoDB: SQL-like languages, no joins or transactions
      ● Third-party
          ● High-level: PigLatin, HiveQL
          ● Library: Cascading, Crunch
          ● Streaming: Flume, Kafka, S4, Scribe
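The MapReduce query style listed for CouchDB and Riak can be sketched in plain Python. This is only an illustration of the map/group/reduce shape, not either system's actual API: the names `map_doc`, `reduce_counts`, and `run_mapreduce`, and the sample documents, are hypothetical.

```python
from collections import defaultdict


def map_doc(doc):
    """Map step: emit one (key, value) pair per tag found in a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(key, values):
    """Reduce step: combine every value that was emitted for one key."""
    return sum(values)


def run_mapreduce(docs, map_fn, reduce_fn):
    """Run map over all documents, group emitted values by key, then reduce."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


# Hypothetical sample documents; the third has no "tags" field at all,
# which a schema-less document store would allow.
docs = [
    {"_id": "a", "tags": ["nosql", "redis"]},
    {"_id": "b", "tags": ["nosql"]},
    {"_id": "c"},
]
tag_counts = run_mapreduce(docs, map_doc, reduce_counts)
```

In a real store the map and reduce functions would run next to the data on each node; here everything runs in one process to keep the sketch self-contained.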
  21. Transactions
      ● Full ACID for single key
      ● Redis: multi-key single-node transactions
  22. Consistency
      ● Strong: Appears that all replicas see all writes
      ● Eventual: Replicas may have different, divergent versions
  23. Consistency
      ● Strong: Appears that all replicas see all writes
      ● Eventual: Replicas may have different, divergent versions
      ● Dynamo: strong or eventual (quorum size)
  24. Consistency
      ● Strong: Appears that all replicas see all writes
      ● Eventual: Replicas may have different, divergent versions
      ● Dynamo: strong or eventual (quorum size)
      ● PNUTS: Timeline consistency
      ● ...many consistency models!
      See: http://www.allthingsdistributed.com/2007/12/eventually_consistent.html
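The "strong or eventual (quorum size)" bullet rests on a counting argument: with N replicas, a read that contacts R of them is guaranteed to overlap a write acknowledged by W of them whenever R + W > N. The sketch below checks that rule and confirms it by brute force. The function names are hypothetical, and it deliberately ignores complications found in real Dynamo-style systems, such as failures or concurrent writes.

```python
from itertools import combinations


def quorums_overlap(n_replicas, read_quorum, write_quorum):
    """Return True when every read quorum must intersect every write quorum."""
    for q in (read_quorum, write_quorum):
        if not 1 <= q <= n_replicas:
            raise ValueError("quorum sizes must be between 1 and n_replicas")
    return read_quorum + write_quorum > n_replicas


def overlap_by_enumeration(n_replicas, read_quorum, write_quorum):
    """Brute-force check: try every possible pair of read and write replica sets."""
    replicas = range(n_replicas)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, read_quorum)
        for write_set in combinations(replicas, write_quorum)
    )


# With 3 replicas, quorums of 2 always share a replica; quorums of 1 may not.
overlap_with_quorum_2 = quorums_overlap(3, 2, 2)
overlap_with_quorum_1 = quorums_overlap(3, 1, 1)
```

When the quorums do not overlap, a read can land entirely on replicas that have not yet seen the latest write, which is the eventual-consistency case.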
  25. interesting properties | real-world usage | takeaways | questions
  26. NoSQL Use-cases
      ● Cassandra
      ● HBase
      ● MongoDB
  27. Cassandra
      ● BigTable data model: key → column family
      ● Dynamo sharding model: consistent hashing
      ● Eventual or strong consistency
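Consistent hashing, named above as the Dynamo sharding model, can be sketched as a hash ring: nodes and keys hash onto the same circle, and a key belongs to the first node point at or after its position. This is a minimal illustration, not Cassandra's implementation. The `HashRing` class, the use of MD5, and the 64 virtual points per node are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(value):
    """Hash a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with several virtual points per node."""

    def __init__(self, nodes, points_per_node=64):
        if not nodes:
            raise ValueError("at least one node is required")
        self._ring = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._positions = [position for position, _ in self._ring]

    def node_for(self, key):
        """Return the node owning the first ring point after the key's hash."""
        index = bisect.bisect_right(self._positions, ring_position(key))
        return self._ring[index % len(self._ring)][1]  # wrap around the ring


# Adding a node only moves keys onto the new node; other keys stay put.
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user:{i}" for i in range(1000)]
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
```

The design point is that growing the cluster relocates only the keys claimed by the new node, whereas a plain `hash(key) % node_count` scheme would reshuffle most keys.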
  28. Cassandra at Netflix
      ● Transitioned from Oracle
      ● Store customer profiles, customer:movie watch log, and detailed usage logging
      Note: no multi-record locking (e.g., bank transfer)
      In-datacenter: 3 replicas, per-app consistency
          read/write quorum = 1, ~1 ms latency
          read/write quorum = 2, ~3 ms latency
      "Replicating Datacenter Oracle with Global Apache Cassandra on AWS" by Adrian Cockcroft
      http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
  29. Cassandra at Netflix
      ● Transitioned from Oracle
      ● Store customer profiles, customer:movie watch log, and detailed usage logging
      ● Note: no multi-record locking (e.g., bank transfer)
      In-datacenter: 3 replicas, per-app consistency
          read/write quorum = 1, ~1 ms latency
          read/write quorum = 2, ~3 ms latency
      "Replicating Datacenter Oracle with Global Apache Cassandra on AWS" by Adrian Cockcroft
      http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
  30. Cassandra at Netflix
      ● Transitioned from Oracle
      ● Store customer profiles, customer:movie watch log, and detailed usage logging
      ● Note: no multi-record locking (e.g., bank transfer)
      ● In-datacenter: 3 replicas, per-app consistency
      "Replicating Datacenter Oracle with Global Apache Cassandra on AWS" by Adrian Cockcroft
      http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
  31. Cassandra at Netflix (cont'd)
      ● Benefit: async inter-datacenter replication
      ● Benefit: no downtime for schema changes
      ● Benefit: hooks for live backups
  32. HBase
      ● Data model: key → column family
      ● Sharding model: range partitioning
      ● Strong consistency
      Applications
          Logging events/crawls, storing analytics
          Twitter: replicate data from MySQL, Hadoop analytics
          Facebook Messages
  33. HBase
      ● Data model: key → column family
      ● Sharding model: range partitioning
      ● Strong consistency
      ● Applications
          ● Logging events/crawls, storing analytics
          ● Twitter: replicate data from MySQL, Hadoop analytics
          ● Facebook Messages
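Range partitioning, the sharding model listed for HBase, keeps keys in sorted order and assigns each contiguous key range to one server. The sketch below shows the lookup with a sorted list of split keys. It is a simplified illustration, not HBase's region management: the `RangePartitioner` class, the split keys, and the server names are hypothetical.

```python
import bisect


class RangePartitioner:
    """Toy range partitioner: split keys divide the sorted key space."""

    def __init__(self, split_keys, servers):
        if list(split_keys) != sorted(split_keys):
            raise ValueError("split_keys must be sorted")
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one more server than split keys")
        self._split_keys = list(split_keys)
        self._servers = list(servers)

    def server_for(self, row_key):
        """Return the server whose range [split, next_split) holds row_key."""
        return self._servers[bisect.bisect_right(self._split_keys, row_key)]

    def servers_for_scan(self, start_key, stop_key):
        """Return the servers touched by a scan over [start_key, stop_key)."""
        first = bisect.bisect_right(self._split_keys, start_key)
        last = bisect.bisect_left(self._split_keys, stop_key)
        return self._servers[first:last + 1]


# Four ranges: keys below "g", "g" up to "n", "n" up to "t", and "t" onward.
partitioner = RangePartitioner(["g", "n", "t"], ["rs1", "rs2", "rs3", "rs4"])
owner = partitioner.server_for("apple")
scan_servers = partitioner.servers_for_scan("h", "p")
```

Because neighbouring keys live on the same server, a range scan touches only a few servers, which is one reason this model suits logging and analytics workloads. The trade-off against hashing is that sequential keys can concentrate load on a single range.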
  34. HBase for Facebook Messages
      ● Cassandra/Dynamo eventual consistency was difficult to program against
      Benefit: simple consistency model
      Benefit: flexible data model
      Benefit: simple sharding, load balancing, replication
  35. HBase for Facebook Messages
      ● Cassandra/Dynamo eventual consistency was difficult to program against
      ● Benefit: simple consistency model
      ● Benefit: flexible data model
      ● Benefit: simple sharding, load balancing, replication
  36. MongoDB
      ● Document-based data model
      ● Range-based partitioning
      ● Consistency depends on how you use it
  37. MongoDB: Two use-cases
      ● Archiving at Craigslist
          ● 2.2B historical posts, semi-structured
          ● Relatively large blobs: avg 2 KB, max > 4 MB
  38. MongoDB: Two use-cases
      ● Archiving at Craigslist
          ● 2.2B historical posts, semi-structured
          ● Relatively large blobs: avg 2 KB, max > 4 MB
      ● Checkins at Foursquare
          ● Geospatial indexing
          ● Small location-based updates, sharded on user
  39. interesting properties | real-world usage | takeaways | questions
  40. Takeaways
      ● Developer accessibility
      ● Ecosystem of reuse
      ● Soft spot
  41. Developer Accessibility
  42. Developer Accessibility
      devops
      @DEVOPS_BORAT
  43. Developer Accessibility
      devops | self-taught dev
      @DEVOPS_BORAT
  44. A Different Five-Minute Rule
      What does a first-time user of your system experience in their first five minutes?
      (idea credit: Justin Sheehy, Basho)
  45. Redis
      http://simonwillison.net/2009/Oct/22/redis/
  46. PostgreSQL
      http://www.postgresql.org/docs/9.1/interactive/sql-createtable.html
  47. Cassandra
      http://www.datastax.com/docs/0.7/getting_started/using_cli
  48. Accessibility is Forever
      Beyond five minutes:
      ● changing schemas
      ● scaling up
      ● modifying topology
  49. Accessibility is Forever
      Beyond five minutes:
      ● changing schemas
      ● scaling up
      ● modifying topology
      An accessible data store will ease first-time users in, and support experienced users as they expand
  50. Ecosystem of Reuse
  51. Spectrum of Reusability
      Monolithic System ... System Kernel
  52. Spectrum of Reusability
      Monolithic System ... System Kernel
      ● MongoDB, Redis: Use as provided
  53. Spectrum of Reusability
      Monolithic System ... System Kernel
      ● MongoDB, Redis: Use as provided
      ● Cassandra, Voldemort: Pick between components
  54. Spectrum of Reusability
      Monolithic System ... System Kernel
      ● MongoDB, Redis: Use as provided
      ● Cassandra, Voldemort: Pick between components
      ● ZooKeeper, LevelDB: Reusable components
  55. Spectrum of Reusability
      Monolithic System ... System Kernel
      ● MongoDB, Redis: Use as provided
      ● Cassandra, Voldemort: Pick between components
      ● ZooKeeper, LevelDB: Reusable components
      ● riak_core: Build your own system
  56. And finally, a soft spot
  57. polyglot persistence = right tool for right task
  58. Polyglot persistence: Where's the data?
      "HBase at Mendeley," Dan Harvey
      http://www.slideshare.net/danharvey/hbase-at-mendeley
  59. Polyglot persistence: Where's the data?
      "HBase at Mendeley," Dan Harvey
      http://www.slideshare.net/danharvey/hbase-at-mendeley
  60. Polyglot persistence: Where's the data?
      Aid developers in reasoning about data consistency across multiple storage engines
      "HBase at Mendeley," Dan Harvey
      http://www.slideshare.net/danharvey/hbase-at-mendeley
  61. interesting properties | real-world usage | takeaways | questions
  62. ● Systems: Polyglot persistence + data consistency?
  63. ● Systems: Polyglot persistence + data consistency?
      ● Operations: Availability/consistency/latency tradeoffs in datacenters?
  64. ● Systems: Polyglot persistence + data consistency?
      ● Operations: Availability/consistency/latency tradeoffs in datacenters?
      ● Accessibility: RDBMS five-minute usability?
  65. ● Systems: Polyglot persistence + data consistency?
      ● Operations: Availability/consistency/latency tradeoffs in datacenters?
      ● Accessibility: RDBMS five-minute usability?
      ● Accessibility: Scale-up wizard for storage systems?
  66. ● Systems: Polyglot persistence + data consistency?
      ● Operations: Availability/consistency/latency tradeoffs in datacenters?
      ● Accessibility: RDBMS five-minute usability?
      ● Accessibility: Scale-up wizard for storage systems?
      ● Comparisons: Design or implementation?
  67. ● Systems: Polyglot persistence + data consistency?
      ● Operations: Availability/consistency/latency tradeoffs in datacenters?
      ● Accessibility: RDBMS five-minute usability?
      ● Accessibility: Scale-up wizard for storage systems?
      ● Comparisons: Design or implementation?
      ● Future: Next-generation NoSQL stores?
  68. Thank You!
      ● Systems: Polyglot persistence + data consistency?
      ● Operations: Availability/consistency/latency tradeoffs in datacenters?
      ● Accessibility: RDBMS five-minute usability?
      ● Accessibility: Scale-up wizard for storage systems?
      ● Comparisons: Design or implementation?
      ● Future: Next-generation NoSQL stores?
      Adam Marcus
      marcua@csail.mit.edu / @marcua
  69. Photo Credits
      http://www.flickr.com/photos/dhannah/4577657955/
      http://www.flickr.com/photos/27316226@N02/3000888100/sizes/m/in/photostream/
      http://www.texample.net/tikz/examples/valentine-heart/
      http://voltdb.com/sites/default/files/VCE_button.png