Your SlideShare is downloading. ×
HPTS 2011: The NoSQL Ecosystem
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

HPTS 2011: The NoSQL Ecosystem

4,677

Published on

My talk for HPTS 2011, summarizing the NoSQL ecosystem and what systems builders can learn from NoSQL's success.

My talk for HPTS 2011, summarizing the NoSQL ecosystem and what systems builders can learn from NoSQL's success.

Published in: Technology
0 Comments
10 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
4,677
On Slideshare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
99
Comments
0
Likes
10
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. The NoSQL Ecosystem Adam Marcus MIT CSAILmarcua@csail.mit.edu / @marcua
  • 2. About Me ● Social Computing + Database Systems ● Easily Distracted: Wrote The NoSQL Ecosystem in The Architecture of Open Source Applications 11 http://www.aosabook.org/en/nosql.html
  • 3. History, CompressedLate 1990s Today
  • 4. History, Compressed Buy RAMLate 1990s Today
  • 5. History, Compressed Buy RAM Wrap RDBMSsLate 1990s Today
  • 6. History, Compressed Buy RAM Wrap RDBMSs NoSQLLate 1990s Today
  • 7. History, Compressed Not Only SQL Buy RAM Wrap RDBMSs NoSQLLate 1990s Today
  • 8. History, Compressed Buy RAM Wrap RDBMSs NoSQL CommercializeLate 1990s Today
  • 9. History, Compressed Buy RAM Wrap RDBMSs NoSQL CommercializeLate 1990s Today
  • 10. The List, So You Dont Yell at Me HBase CouchDB Neo4j Cassandra Riak InfoGridBerkeleyDB Voldemort HyperTable Redis MongoDB SonesAllegroGraph HyperGraphDB DEX FlockDB VertexDB Oracle NoSQLTokyo Cabinet MemcacheDB
  • 11. The List, So You Dont Yell at Me Marcus Law of Databases: HBase CouchDB Neo4j Cassandra Riak InfoGrid The number of persistence HyperTableBerkeleyDB doubles every 1.5 years Voldemort options Redis MongoDB SonesAllegroGraph HyperGraphDB DEX FlockDB VertexDB Oracle NoSQLTokyo Cabinet MemcacheDB
  • 12. The List, So You Dont Yell at Me Marcus Law of Databases: HBase CouchDB Neo4j Cassandra Riak InfoGrid The number of persistence HyperTableBerkeleyDB doubles every 1.5 years Voldemort options Redis MongoDB Sones Corollary:AllegroGraph HyperGraphDB Im sorry I missedDEX FlockDB your persistence optionOracle NoSQL VertexDBTokyo Cabinet MemcacheDB
  • 13. interesting properties real-world usage takeaways questions
  • 14. interesting properties real-world usage takeaways questions
  • 15. Conventional Wisdom on NoSQL● Key-based data model● Sloppy schema● Single-key transactions● In-app joins● Eventually consistent
  • 16. Exceptions are More Interesting● Data Model● Query Model● Transactions● Consistency
  • 17. Data Model● Usually key-based... ● Column-families ● Documents ● Data structures
  • 18. Data Model● Usually key-based... ● Column-families ● Documents ● Data structures● ...but not always ● Graph stores
  • 19. Query Model● Redis: data structure-specific operations● CouchDB, Riak: MapReduce● Cassandra, MongoDB: SQL-like languages, no joins or transactions
  • 20. Query Model● Redis: data structure-specific operations● CouchDB, Riak: MapReduce● Cassandra, MongoDB: SQL-like languages, no joins or transactions● Third-party ● High-level: PigLatin, HiveQL ● Library: Cascading, Crunch ● Streaming: Flume, Kafka, S4, Scribe
  • 21. Transactions● Full ACID for single key● Redis: multi-key single-node transactions
  • 22. Consistency● Strong: Appears that all replicas see all writes● Eventual: Replicas may have different, divergent versions
  • 23. Consistency● Strong: Appears that all replicas see all writes● Eventual: Replicas may have different, divergent versions● Dynamo: strong or eventual (quorum size)
  • 24. Consistency ● Strong: Appears that all replicas see all writes ● Eventual: Replicas may have different, divergent versions ● Dynamo: strong or eventual (quorum size) ● PNUTs: Timeline consistency ● ...many consistency models!See: http://www.allthingsdistributed.com/2007/12/eventually_consistent.html
  • 25. interesting properties real-world usage takeaways questions
  • 26. NoSQL Use-cases● Cassandra● HBase● MongoDB
  • 27. Cassandra● BigTable data model: key column family● Dynamo sharding model: consistent hashing● Eventual or strong consistency
  • 28. Cassandra at Netflix ● Transitioned from Oracle ● Store customer profiles, customer:movie watch log, and detailed usage logging Note: no multi-record locking (e.g., bank transfer) In-datacenter: 3 replicas, per-app consistency read/write quorum = 1, ~1ms latency read/write quorum = 2, ~3ms latency“Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcrofthttp://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
  • 29. Cassandra at Netflix ● Transitioned from Oracle ● Store customer profiles, customer:movie watch log, and detailed usage logging ● Note: no multi-record locking (e.g., bank transfer) In-datacenter: 3 replicas, per-app consistency read/write quorum = 1, ~1ms latency read/write quorum = 2, ~3ms latency“Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcrofthttp://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
  • 30. Cassandra at Netflix ● Transitioned from Oracle ● Store customer profiles, customer:movie watch log, and detailed usage logging ● Note: no multi-record locking (e.g., bank transfer) ● In-datacenter: 3 replicas, per-app consistency“Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcrofthttp://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
  • 31. Cassandra at Netflix (contd)● Benefit: async inter-datacenter replication● Benefit: no downtime for schema changes● Benefit: hooks for live backups
  • 32. HBase● Data model: key column family● Sharding model: range partitioning● Strong consistency Applications Logging events/crawls, storing analytics Twitter: replicate data from MySQL, Hadoop analytics Facebook Messages
  • 33. HBase● Data model: key column family● Sharding model: range partitioning● Strong consistency● Applications ● Logging events/crawls, storing analytics ● Twitter: replicate data from MySQL, Hadoop analytics ● Facebook Messages
  • 34. HBase for Facebook Messages● Cassandra/Dynamo eventual consistency was difficult to program against Benefit: simple consistency model Benefit: flexible data model Benefit: simple sharding, load balancing, replication
  • 35. HBase for Facebook Messages● Cassandra/Dynamo eventual consistency was difficult to program against● Benefit: simple consistency model● Benefit: flexible data model● Benefit: simple sharding, load balancing, replication
  • 36. MongoDB● Document-based data model● Range-based partitioning● Consistency depends on how you use it
  • 37. MongoDB: Two use-cases● Archiving at Craigslist ● 2.2B historical posts, semi-structured ● Relatively large blobs: avg 2KB, max > 4 MB
  • 38. MongoDB: Two use-cases● Archiving at Craigslist ● 2.2B historical posts, semi-structured ● Relatively large blobs: avg 2KB, max > 4 MB● Checkins at Foursquare ● Geospatial indexing ● Small location-based updates, sharded on user
  • 39. interesting properties real-world usage takeaways questions
  • 40. Takeaways● Developer accessibility● Ecosystem of reuse● Soft spot
  • 41. Developer Accessibility
  • 42. Developer Accessibility devops@DEVOPS_BORAT
  • 43. Developer Accessibility devops self-taught dev@DEVOPS_BORAT
  • 44. A Different Five-Minute RuleWhat does a first-time user of your systemexperience in their first five minutes?(idea credit: Justin Sheehy, Basho)
  • 45. Redishttp://simonwillison.net/2009/Oct/22/redis/
  • 46. PostgreSQLhttp://www.postgresql.org/docs/9.1/interactive/sql-createtable.html
  • 47. Cassandrahttp://www.datastax.com/docs/0.7/getting_started/using_cli
  • 48. Accessibility is ForeverBeyond five minutes:● changing schemas● scaling up● modifying topology
  • 49. Accessibility is ForeverBeyond five minutes:● changing schemas● scaling up● modifying topology An accessible data store will ease first-time users in, and support experienced users as they expand
  • 50. Ecosystem of Reuse
  • 51. Spectrum of ReusabilityMonolithic System System Kernel
  • 52. Spectrum of ReusabilityMonolithic System System KernelMongoDB Redis Use asprovided
  • 53. Spectrum of ReusabilityMonolithic System System KernelMongoDB Cassandra Redis Voldemort Use as Pick betweenprovided components
  • 54. Spectrum of ReusabilityMonolithic System System KernelMongoDB Cassandra ZooKeeper Redis Voldemort LevelDB Use as Pick between Reusableprovided components components
  • 55. Spectrum of ReusabilityMonolithic System System KernelMongoDB Cassandra ZooKeeper Redis Voldemort LevelDB riak_core Use as Pick between Reusable Build yourprovided components components own system
  • 56. And finally, a soft spot
  • 57. polyglot persistence =right tool for right task
  • 58. Polyglot persistence: Wheres the data?“HBase at Mendeley,” Dan Harveyhttp://www.slideshare.net/danharvey/hbase-at-mendeley
  • 59. Polyglot persistence: Wheres the data?“HBase at Mendeley,” Dan Harveyhttp://www.slideshare.net/danharvey/hbase-at-mendeley
  • 60. Polyglot persistence: Wheres the data?Aid developers in reasoning about data consistency across multiple storage engines“HBase at Mendeley,” Dan Harveyhttp://www.slideshare.net/danharvey/hbase-at-mendeley
  • 61. interesting properties real-world usage takeaways questions
  • 62. ● Systems: Polyglot persistence + data consistency?
  • 63. ● Systems: Polyglot persistence + data consistency?● Operations: Availability/consistency/latency tradeoffs in datacenters?
  • 64. ● Systems: Polyglot persistence + data consistency?● Operations: Availability/consistency/latency tradeoffs in datacenters?● Accessibility: RDBMS five-minute usability?
  • 65. ● Systems: Polyglot persistence + data consistency?● Operations: Availability/consistency/latency tradeoffs in datacenters?● Accessibility: RDBMS five-minute usability?● Accessibility: Scale-up wizard for storage systems?
  • 66. ● Systems: Polyglot persistence + data consistency?● Operations: Availability/consistency/latency tradeoffs in datacenters?● Accessibility: RDBMS five-minute usability?● Accessibility: Scale-up wizard for storage systems?● Comparisons: Design or implementation?
  • 67. ● Systems: Polyglot persistence + data consistency?● Operations: Availability/consistency/latency tradeoffs in datacenters?● Accessibility: RDBMS five-minute usability?● Accessibility: Scale-up wizard for storage systems?● Comparisons: Design or implementation?● Future: Next-generation NoSQL stores?
  • 68. Thank You!● Systems: Polyglot persistence + data consistency?● Operations: Availability/consistency/latency tradeoffs in datacenters?● Accessibility: RDBMS five-minute usability?● Accessibility: Scale-up wizard for storage systems?● Comparisons: Design or implementation?● Future: Next-generation NoSQL stores? Adam Marcus marcua@csail.mit.edu / @marcua
  • 69. Photo Creditshttp://www.flickr.com/photos/dhannah/4577657955/http://www.flickr.com/photos/27316226@N02/3000888100/sizes/m/in/photostream/http://www.texample.net/tikz/examples/valentine-heart/http://voltdb.com/sites/default/files/VCE_button.png

×