HPTS 2011: The NoSQL Ecosystem

The NoSQL Ecosystem



        Adam Marcus
         MIT CSAIL
marcua@csail.mit.edu / @marcua
About Me
     ●
         Social Computing + Database Systems
     ●
         Easily Distracted: Wrote The NoSQL Ecosystem in
         The Architecture of Open Source Applications   1




1
    http://www.aosabook.org/en/nosql.html
History, Compressed




Late 1990s                         Today
History, Compressed




 Buy RAM

Late 1990s                         Today
History, Compressed




 Buy RAM Wrap RDBMSs

Late 1990s                         Today
History, Compressed




 Buy RAM Wrap RDBMSs   NoSQL

Late 1990s                         Today
History, Compressed


                          Not Only SQL




 Buy RAM Wrap RDBMSs   NoSQL

Late 1990s                          Today
History, Compressed




 Buy RAM Wrap RDBMSs   NoSQL   Commercialize

Late 1990s                            Today
History, Compressed




 Buy RAM Wrap RDBMSs   NoSQL   Commercialize

Late 1990s                            Today
The List, So You Don't Yell at Me

                HBase                  CouchDB
                               Neo4j
     Cassandra          Riak            InfoGrid
BerkeleyDB      Voldemort      HyperTable
       Redis
                         MongoDB        Sones
AllegroGraph HyperGraphDB
                                 DEX    FlockDB
             VertexDB
                                    Oracle NoSQL
Tokyo Cabinet       MemcacheDB
The List, So You Don't Yell at Me
      Marcus' Law of Databases:
           HBase             CouchDB
                             Neo4j
     Cassandra        Riak           InfoGrid
       The number of persistence
                      HyperTable
BerkeleyDB doubles every 1.5 years
            Voldemort
    options
       Redis
                       MongoDB       Sones
AllegroGraph HyperGraphDB
                               DEX   FlockDB
           VertexDB
                                 Oracle NoSQL
Tokyo Cabinet    MemcacheDB
The List, So You Don't Yell at Me
      Marcus' Law of Databases:
           HBase             CouchDB
                           Neo4j
     Cassandra      Riak           InfoGrid
       The number of persistence
                      HyperTable
BerkeleyDB doubles every 1.5 years
            Voldemort
    options
        Redis
                     MongoDB       Sones
                  Corollary:
AllegroGraph HyperGraphDB
          I'm sorry I missedDEX FlockDB
                             your
             persistence optionOracle NoSQL
            VertexDB
Tokyo Cabinet    MemcacheDB
interesting properties
  real-world usage
     takeaways
      questions
interesting properties
  real-world usage
     takeaways
      questions
Conventional Wisdom on NoSQL
●
    Key-based data model
●
    Sloppy schema
●
    Single-key transactions
●
    In-app joins
●
    Eventually consistent
Exceptions are More Interesting
●
    Data Model
●
    Query Model
●
    Transactions
●
    Consistency
Data Model
●
    Usually key-based...
    ●
        Column-families
    ●
        Documents
    ●
        Data structures
Data Model
●
    Usually key-based...
    ●
        Column-families
    ●
        Documents
    ●
        Data structures
●
    ...but not always
    ●
        Graph stores
Query Model
●
    Redis: data structure-specific operations
●
    CouchDB, Riak: MapReduce
●
    Cassandra, MongoDB: SQL-like languages, no
    joins or transactions
Query Model
●
    Redis: data structure-specific operations
●
    CouchDB, Riak: MapReduce
●
    Cassandra, MongoDB: SQL-like languages, no
    joins or transactions
●
    Third-party
    ●
        High-level: PigLatin, HiveQL
    ●
        Library: Cascading, Crunch
    ●
        Streaming: Flume, Kafka, S4, Scribe
Transactions
●
    Full ACID for single key
●
    Redis: multi-key single-node transactions
Consistency
●   Strong: Appears that all replicas see all writes
●   Eventual: Replicas may have different, divergent versions
Consistency
●   Strong: Appears that all replicas see all writes
●   Eventual: Replicas may have different, divergent versions

●   Dynamo: strong or eventual (quorum size)
Consistency
  ●   Strong: Appears that all replicas see all writes
  ●   Eventual: Replicas may have different, divergent versions

  ●   Dynamo: strong or eventual (quorum size)


  ●   PNUTs: Timeline consistency
  ●   ...many consistency models!




See: http://www.allthingsdistributed.com/2007/12/eventually_consistent.html
interesting properties
  real-world usage
     takeaways
      questions
NoSQL Use-cases
●
    Cassandra
●
    HBase
●
    MongoDB
Cassandra
●
    BigTable data model: key   column family
●
    Dynamo sharding model: consistent hashing
●
    Eventual or strong consistency
Cassandra at Netflix
  ●
      Transitioned from Oracle
  ●
      Store customer profiles, customer:movie watch
      log, and detailed usage logging
         Note: no multi-record locking (e.g., bank transfer)
      In-datacenter: 3 replicas, per-app consistency
         read/write quorum = 1, ~1ms latency
         read/write quorum = 2, ~3ms latency


“Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft
http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
Cassandra at Netflix
  ●
      Transitioned from Oracle
  ●
      Store customer profiles, customer:movie watch
      log, and detailed usage logging
      ●
          Note: no multi-record locking (e.g., bank transfer)
      In-datacenter: 3 replicas, per-app consistency
          read/write quorum = 1, ~1ms latency
          read/write quorum = 2, ~3ms latency


“Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft
http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
Cassandra at Netflix
  ●
      Transitioned from Oracle
  ●
      Store customer profiles, customer:movie watch
      log, and detailed usage logging
      ●
          Note: no multi-record locking (e.g., bank transfer)
  ●
      In-datacenter: 3 replicas, per-app consistency




“Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft
http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
Cassandra at Netflix (cont'd)
●
    Benefit: async inter-datacenter replication
●
    Benefit: no downtime for schema changes
●
    Benefit: hooks for live backups
HBase
●
    Data model: key       column family
●
    Sharding model: range partitioning
●
    Strong consistency


    Applications
      Logging events/crawls, storing analytics
      Twitter: replicate data from MySQL, Hadoop analytics
      Facebook Messages
HBase
●
    Data model: key         column family
●
    Sharding model: range partitioning
●
    Strong consistency


●
    Applications
    ●
        Logging events/crawls, storing analytics
    ●
        Twitter: replicate data from MySQL, Hadoop analytics
    ●
        Facebook Messages
HBase for Facebook Messages
●
    Cassandra/Dynamo eventual consistency was
    difficult to program against


    Benefit: simple consistency model
    Benefit: flexible data model
    Benefit: simple sharding, load balancing,
    replication
HBase for Facebook Messages
●
    Cassandra/Dynamo eventual consistency was
    difficult to program against


●
    Benefit: simple consistency model
●
    Benefit: flexible data model
●
    Benefit: simple sharding, load balancing,
    replication
MongoDB
●
    Document-based data model
●
    Range-based partitioning
●
    Consistency depends on how you use it
MongoDB: Two use-cases
●
    Archiving at Craigslist
    ●
        2.2B historical posts, semi-structured
    ●
        Relatively large blobs: avg 2KB, max > 4 MB
MongoDB: Two use-cases
●
    Archiving at Craigslist
    ●
        2.2B historical posts, semi-structured
    ●
        Relatively large blobs: avg 2KB, max > 4 MB
●
    Checkins at Foursquare
    ●
        Geospatial indexing
    ●
        Small location-based updates, sharded on user
interesting properties
  real-world usage
     takeaways
      questions
Takeaways
●
    Developer accessibility
●
    Ecosystem of reuse
●
    Soft spot
Developer Accessibility
Developer Accessibility
    devops




@DEVOPS_BORAT
Developer Accessibility
    devops          self-taught dev




@DEVOPS_BORAT
A Different Five-Minute Rule



What does a first-time user of your system
experience in their first five minutes?
(idea credit: Justin Sheehy, Basho)
Redis
http://simonwillison.net/2009/Oct/22/redis/
PostgreSQL
http://www.postgresql.org/docs/9.1/interactive/sql-createtable.html
Cassandra
http://www.datastax.com/docs/0.7/getting_started/using_cli
Accessibility is Forever
Beyond five minutes:
●
    changing schemas
●
    scaling up
●
    modifying topology
Accessibility is Forever
Beyond five minutes:
●
    changing schemas
●
    scaling up
●
    modifying topology


       An accessible data store will ease
        first-time users in, and support
       experienced users as they expand
Ecosystem of Reuse
Spectrum of Reusability


Monolithic System         System Kernel
Spectrum of Reusability


Monolithic System            System Kernel


MongoDB
 Redis


 Use as
provided
Spectrum of Reusability


Monolithic System            System Kernel


MongoDB      Cassandra
 Redis       Voldemort


 Use as     Pick between
provided    components
Spectrum of Reusability


Monolithic System                System Kernel


MongoDB      Cassandra   ZooKeeper
 Redis       Voldemort    LevelDB


 Use as     Pick between Reusable
provided    components components
Spectrum of Reusability


Monolithic System                System Kernel


MongoDB      Cassandra   ZooKeeper
 Redis       Voldemort    LevelDB     riak_core


 Use as     Pick between Reusable     Build your
provided    components components    own system
And finally, a soft spot
polyglot persistence
            =
right tool for right task
Polyglot persistence: Where's the data?




“HBase at Mendeley,” Dan Harvey
http://www.slideshare.net/danharvey/hbase-at-mendeley
Polyglot persistence: Where's the data?




“HBase at Mendeley,” Dan Harvey
http://www.slideshare.net/danharvey/hbase-at-mendeley
Polyglot persistence: Where's the data?




Aid developers in reasoning about data consistency
          across multiple storage engines
“HBase at Mendeley,” Dan Harvey
http://www.slideshare.net/danharvey/hbase-at-mendeley
interesting properties
  real-world usage
     takeaways
      questions
●
    Systems: Polyglot persistence + data consistency?
●
    Systems: Polyglot persistence + data consistency?
●
    Operations: Availability/consistency/latency tradeoffs in
    datacenters?
●
    Systems: Polyglot persistence + data consistency?
●
    Operations: Availability/consistency/latency tradeoffs in
    datacenters?
●
    Accessibility: RDBMS five-minute usability?
●
    Systems: Polyglot persistence + data consistency?
●
    Operations: Availability/consistency/latency tradeoffs in
    datacenters?
●
    Accessibility: RDBMS five-minute usability?
●
    Accessibility: Scale-up wizard for storage systems?
●
    Systems: Polyglot persistence + data consistency?
●
    Operations: Availability/consistency/latency tradeoffs in
    datacenters?
●
    Accessibility: RDBMS five-minute usability?
●
    Accessibility: Scale-up wizard for storage systems?
●
    Comparisons: Design or implementation?
●
    Systems: Polyglot persistence + data consistency?
●
    Operations: Availability/consistency/latency tradeoffs in
    datacenters?
●
    Accessibility: RDBMS five-minute usability?
●
    Accessibility: Scale-up wizard for storage systems?
●
    Comparisons: Design or implementation?
●
    Future: Next-generation NoSQL stores?
Thank You!
●
    Systems: Polyglot persistence + data consistency?
●
    Operations: Availability/consistency/latency tradeoffs in
    datacenters?
●
    Accessibility: RDBMS five-minute usability?
●
    Accessibility: Scale-up wizard for storage systems?
●
    Comparisons: Design or implementation?
●
    Future: Next-generation NoSQL stores?

                    Adam Marcus
            marcua@csail.mit.edu / @marcua
Photo Credits
http://www.flickr.com/photos/dhannah/457765
7955/
http://www.flickr.com/photos/27316226@N02/3
000888100/sizes/m/in/photostream/
http://www.texample.net/tikz/examples/valenti
ne-heart/
http://voltdb.com/sites/default/files/VCE_button
.png
1 of 69

Recommended

No sql landscape_nosqltips by
No sql landscape_nosqltipsNo sql landscape_nosqltips
No sql landscape_nosqltipsimarcticblue
1.2K views41 slides
Overview of no sql by
Overview of no sqlOverview of no sql
Overview of no sqlSean Murphy
1.3K views20 slides
Demystfying nosql databases by
Demystfying nosql databasesDemystfying nosql databases
Demystfying nosql databasesMike King
67 views18 slides
NoSQL Databases by
NoSQL DatabasesNoSQL Databases
NoSQL DatabasesEduard Tudenhoefner
1K views49 slides
Navigating NoSQL in cloudy skies by
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesshnkr_rmchndrn
863 views55 slides
NoSQL Databases by
NoSQL DatabasesNoSQL Databases
NoSQL DatabasesAshish Karki
1.7K views7 slides

More Related Content

What's hot

Couchbase by
CouchbaseCouchbase
CouchbaseArpit Aggarwal
82 views13 slides
NoSQL databases - An introduction by
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introductionPooyan Mehrparvar
4.9K views53 slides
The Hive Think Tank: Rocking the Database World with RocksDB by
The Hive Think Tank: Rocking the Database World with RocksDBThe Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDBThe Hive
1.4K views36 slides
An Intro to NoSQL Databases by
An Intro to NoSQL DatabasesAn Intro to NoSQL Databases
An Intro to NoSQL DatabasesRajith Pemabandu
2.5K views83 slides
Nosql databases for the .net developer by
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developerJesus Rodriguez
2.3K views33 slides
NoSql by
NoSqlNoSql
NoSqlGirish Khanzode
2.6K views164 slides

What's hot(20)

The Hive Think Tank: Rocking the Database World with RocksDB by The Hive
The Hive Think Tank: Rocking the Database World with RocksDBThe Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDB
The Hive1.4K views
Nosql databases for the .net developer by Jesus Rodriguez
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developer
Jesus Rodriguez2.3K views
NoSQL Seminer by Partha Das
NoSQL SeminerNoSQL Seminer
NoSQL Seminer
Partha Das760 views
Introduction to Cassandra (June 2010) by gdusbabek
Introduction to Cassandra (June 2010)Introduction to Cassandra (June 2010)
Introduction to Cassandra (June 2010)
gdusbabek8K views
Comparative study of modern databases by Anirban Konar
Comparative study of modern databasesComparative study of modern databases
Comparative study of modern databases
Anirban Konar691 views
Four NoSQL Databases You Should Know by Mahmoud Khaled
Four NoSQL Databases You Should KnowFour NoSQL Databases You Should Know
Four NoSQL Databases You Should Know
Mahmoud Khaled10K views
A Seminar on NoSQL Databases. by Navdeep Charan
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
Navdeep Charan1.2K views
No SQL - A Simple Intro by Karthi Keyan
No SQL - A Simple IntroNo SQL - A Simple Intro
No SQL - A Simple Intro
Karthi Keyan1.3K views
First steps to Azure Cosmos DB: Getting Started with MongoDB and NoSQL by Hansamali Gamage
First steps to Azure Cosmos DB: Getting Started with MongoDB and NoSQLFirst steps to Azure Cosmos DB: Getting Started with MongoDB and NoSQL
First steps to Azure Cosmos DB: Getting Started with MongoDB and NoSQL
Hansamali Gamage70 views
Java BigData Full Stack Development (version 2.0) by Alexey Zinoviev
Java BigData Full Stack Development (version 2.0)Java BigData Full Stack Development (version 2.0)
Java BigData Full Stack Development (version 2.0)
Alexey Zinoviev718 views

Similar to HPTS 2011: The NoSQL Ecosystem

Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab by
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
192 views23 slides
Drop acid by
Drop acidDrop acid
Drop acidMike Feltman
1.8K views29 slides
JPoint'15 Mom, I so wish Hibernate for my NoSQL database... by
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...Alexey Zinoviev
1.1K views48 slides
NoSQL by
NoSQLNoSQL
NoSQLTomas Bosak
988 views15 slides
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve... by
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...Felix Gessert
5K views316 slides
No SQL Technologies by
No SQL TechnologiesNo SQL Technologies
No SQL TechnologiesCris Holdorph
1.1K views46 slides

Similar to HPTS 2011: The NoSQL Ecosystem(20)

Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab by CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab192 views
JPoint'15 Mom, I so wish Hibernate for my NoSQL database... by Alexey Zinoviev
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
Alexey Zinoviev1.1K views
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve... by Felix Gessert
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
Felix Gessert5K views
No Sql Introduction by Dingding Ye
No Sql IntroductionNo Sql Introduction
No Sql Introduction
Dingding Ye4.3K views
NoSQL Intro with cassandra by Brian Enochson
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandra
Brian Enochson3.1K views
Introduction to NoSql by Omid Vahdaty
Introduction to NoSqlIntroduction to NoSql
Introduction to NoSql
Omid Vahdaty321 views
NoSQL by dbulic
NoSQLNoSQL
NoSQL
dbulic1.4K views
Using Spring with NoSQL databases (SpringOne China 2012) by Chris Richardson
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
Chris Richardson4.9K views
On Rails with Apache Cassandra by Stu Hood
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache Cassandra
Stu Hood3.4K views
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, Egypt by Chris Richardson
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, EgyptSQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, Egypt
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, Egypt
Chris Richardson1.3K views
Mongo db transcript by foliba
Mongo db transcriptMongo db transcript
Mongo db transcript
foliba698 views
Spring one2gx2010 spring-nonrelational_data by Roger Xia
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
Roger Xia1.3K views
Introduction to NoSQL by balwinders
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
balwinders1.8K views

Recently uploaded

"How we switched to Kanban and how it integrates with product planning", Vady... by
"How we switched to Kanban and how it integrates with product planning", Vady..."How we switched to Kanban and how it integrates with product planning", Vady...
"How we switched to Kanban and how it integrates with product planning", Vady...Fwdays
61 views24 slides
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy by
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy
"Role of a CTO in software outsourcing company", Yuriy NakonechnyyFwdays
40 views21 slides
Java Platform Approach 1.0 - Picnic Meetup by
Java Platform Approach 1.0 - Picnic MeetupJava Platform Approach 1.0 - Picnic Meetup
Java Platform Approach 1.0 - Picnic MeetupRick Ossendrijver
25 views39 slides
GigaIO: The March of Composability Onward to Memory with CXL by
GigaIO: The March of Composability Onward to Memory with CXLGigaIO: The March of Composability Onward to Memory with CXL
GigaIO: The March of Composability Onward to Memory with CXLCXL Forum
126 views12 slides
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad... by
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad..."Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...Fwdays
40 views30 slides
.conf Go 2023 - Data analysis as a routine by
.conf Go 2023 - Data analysis as a routine.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routineSplunk
90 views12 slides

Recently uploaded(20)

"How we switched to Kanban and how it integrates with product planning", Vady... by Fwdays
"How we switched to Kanban and how it integrates with product planning", Vady..."How we switched to Kanban and how it integrates with product planning", Vady...
"How we switched to Kanban and how it integrates with product planning", Vady...
Fwdays61 views
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy by Fwdays
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy
Fwdays40 views
GigaIO: The March of Composability Onward to Memory with CXL by CXL Forum
GigaIO: The March of Composability Onward to Memory with CXLGigaIO: The March of Composability Onward to Memory with CXL
GigaIO: The March of Composability Onward to Memory with CXL
CXL Forum126 views
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad... by Fwdays
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad..."Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...
Fwdays40 views
.conf Go 2023 - Data analysis as a routine by Splunk
.conf Go 2023 - Data analysis as a routine.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine
Splunk90 views
Future of Learning - Khoong Chan Meng by NUS-ISS
Future of Learning - Khoong Chan MengFuture of Learning - Khoong Chan Meng
Future of Learning - Khoong Chan Meng
NUS-ISS31 views
Empathic Computing: Delivering the Potential of the Metaverse by Mark Billinghurst
Empathic Computing: Delivering  the Potential of the MetaverseEmpathic Computing: Delivering  the Potential of the Metaverse
Empathic Computing: Delivering the Potential of the Metaverse
Mark Billinghurst449 views
The details of description: Techniques, tips, and tangents on alternative tex... by BookNet Canada
The details of description: Techniques, tips, and tangents on alternative tex...The details of description: Techniques, tips, and tangents on alternative tex...
The details of description: Techniques, tips, and tangents on alternative tex...
BookNet Canada110 views
"Fast Start to Building on AWS", Igor Ivaniuk by Fwdays
"Fast Start to Building on AWS", Igor Ivaniuk"Fast Start to Building on AWS", Igor Ivaniuk
"Fast Start to Building on AWS", Igor Ivaniuk
Fwdays36 views
"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi by Fwdays
"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi
"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi
Fwdays26 views
Photowave Presentation Slides - 11.8.23.pptx by CXL Forum
Photowave Presentation Slides - 11.8.23.pptxPhotowave Presentation Slides - 11.8.23.pptx
Photowave Presentation Slides - 11.8.23.pptx
CXL Forum126 views
Liqid: Composable CXL Preview by CXL Forum
Liqid: Composable CXL PreviewLiqid: Composable CXL Preview
Liqid: Composable CXL Preview
CXL Forum121 views
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur by Fwdays
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur
Fwdays40 views
Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen... by NUS-ISS
Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...
Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...
NUS-ISS23 views
AMD: 4th Generation EPYC CXL Demo by CXL Forum
AMD: 4th Generation EPYC CXL DemoAMD: 4th Generation EPYC CXL Demo
AMD: 4th Generation EPYC CXL Demo
CXL Forum126 views
Combining Orchestration and Choreography for a Clean Architecture by ThomasHeinrichs1
Combining Orchestration and Choreography for a Clean ArchitectureCombining Orchestration and Choreography for a Clean Architecture
Combining Orchestration and Choreography for a Clean Architecture
ThomasHeinrichs168 views
AI: mind, matter, meaning, metaphors, being, becoming, life values by Twain Liu 刘秋艳
AI: mind, matter, meaning, metaphors, being, becoming, life valuesAI: mind, matter, meaning, metaphors, being, becoming, life values
AI: mind, matter, meaning, metaphors, being, becoming, life values
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV by Splunk
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
Splunk86 views

HPTS 2011: The NoSQL Ecosystem

  • 1. The NoSQL Ecosystem Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua
  • 2. About Me ● Social Computing + Database Systems ● Easily Distracted: Wrote The NoSQL Ecosystem in The Architecture of Open Source Applications 1 1 http://www.aosabook.org/en/nosql.html
  • 4. History, Compressed Buy RAM Late 1990s Today
  • 5. History, Compressed Buy RAM Wrap RDBMSs Late 1990s Today
  • 6. History, Compressed Buy RAM Wrap RDBMSs NoSQL Late 1990s Today
  • 7. History, Compressed Not Only SQL Buy RAM Wrap RDBMSs NoSQL Late 1990s Today
  • 8. History, Compressed Buy RAM Wrap RDBMSs NoSQL Commercialize Late 1990s Today
  • 9. History, Compressed Buy RAM Wrap RDBMSs NoSQL Commercialize Late 1990s Today
  • 10. The List, So You Don't Yell at Me HBase CouchDB Neo4j Cassandra Riak InfoGrid BerkeleyDB Voldemort HyperTable Redis MongoDB Sones AllegroGraph HyperGraphDB DEX FlockDB VertexDB Oracle NoSQL Tokyo Cabinet MemcacheDB
  • 11. The List, So You Don't Yell at Me Marcus' Law of Databases: HBase CouchDB Neo4j Cassandra Riak InfoGrid The number of persistence HyperTable BerkeleyDB doubles every 1.5 years Voldemort options Redis MongoDB Sones AllegroGraph HyperGraphDB DEX FlockDB VertexDB Oracle NoSQL Tokyo Cabinet MemcacheDB
  • 12. The List, So You Don't Yell at Me Marcus' Law of Databases: HBase CouchDB Neo4j Cassandra Riak InfoGrid The number of persistence HyperTable BerkeleyDB doubles every 1.5 years Voldemort options Redis MongoDB Sones Corollary: AllegroGraph HyperGraphDB I'm sorry I missedDEX FlockDB your persistence optionOracle NoSQL VertexDB Tokyo Cabinet MemcacheDB
  • 13. interesting properties real-world usage takeaways questions
  • 14. interesting properties real-world usage takeaways questions
  • 15. Conventional Wisdom on NoSQL ● Key-based data model ● Sloppy schema ● Single-key transactions ● In-app joins ● Eventually consistent
  • 16. Exceptions are More Interesting ● Data Model ● Query Model ● Transactions ● Consistency
  • 17. Data Model ● Usually key-based... ● Column-families ● Documents ● Data structures
  • 18. Data Model ● Usually key-based... ● Column-families ● Documents ● Data structures ● ...but not always ● Graph stores
  • 19. Query Model ● Redis: data structure-specific operations ● CouchDB, Riak: MapReduce ● Cassandra, MongoDB: SQL-like languages, no joins or transactions
  • 20. Query Model ● Redis: data structure-specific operations ● CouchDB, Riak: MapReduce ● Cassandra, MongoDB: SQL-like languages, no joins or transactions ● Third-party ● High-level: PigLatin, HiveQL ● Library: Cascading, Crunch ● Streaming: Flume, Kafka, S4, Scribe
  • 21. Transactions ● Full ACID for single key ● Redis: multi-key single-node transactions
  • 22. Consistency ● Strong: Appears that all replicas see all writes ● Eventual: Replicas may have different, divergent versions
  • 23. Consistency ● Strong: Appears that all replicas see all writes ● Eventual: Replicas may have different, divergent versions ● Dynamo: strong or eventual (quorum size)
  • 24. Consistency ● Strong: Appears that all replicas see all writes ● Eventual: Replicas may have different, divergent versions ● Dynamo: strong or eventual (quorum size) ● PNUTs: Timeline consistency ● ...many consistency models! See: http://www.allthingsdistributed.com/2007/12/eventually_consistent.html
  • 25. interesting properties real-world usage takeaways questions
  • 26. NoSQL Use-cases ● Cassandra ● HBase ● MongoDB
  • 27. Cassandra ● BigTable data model: key column family ● Dynamo sharding model: consistent hashing ● Eventual or strong consistency
  • 28. Cassandra at Netflix ● Transitioned from Oracle ● Store customer profiles, customer:movie watch log, and detailed usage logging Note: no multi-record locking (e.g., bank transfer) In-datacenter: 3 replicas, per-app consistency read/write quorum = 1, ~1ms latency read/write quorum = 2, ~3ms latency “Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
  • 29. Cassandra at Netflix ● Transitioned from Oracle ● Store customer profiles, customer:movie watch log, and detailed usage logging ● Note: no multi-record locking (e.g., bank transfer) In-datacenter: 3 replicas, per-app consistency read/write quorum = 1, ~1ms latency read/write quorum = 2, ~3ms latency “Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
  • 30. Cassandra at Netflix ● Transitioned from Oracle ● Store customer profiles, customer:movie watch log, and detailed usage logging ● Note: no multi-record locking (e.g., bank transfer) ● In-datacenter: 3 replicas, per-app consistency “Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra
  • 31. Cassandra at Netflix (cont'd) ● Benefit: async inter-datacenter replication ● Benefit: no downtime for schema changes ● Benefit: hooks for live backups
  • 32. HBase ● Data model: key column family ● Sharding model: range partitioning ● Strong consistency Applications Logging events/crawls, storing analytics Twitter: replicate data from MySQL, Hadoop analytics Facebook Messages
  • 33. HBase ● Data model: key column family ● Sharding model: range partitioning ● Strong consistency ● Applications ● Logging events/crawls, storing analytics ● Twitter: replicate data from MySQL, Hadoop analytics ● Facebook Messages
  • 34. HBase for Facebook Messages ● Cassandra/Dynamo eventual consistency was difficult to program against Benefit: simple consistency model Benefit: flexible data model Benefit: simple sharding, load balancing, replication
  • 35. HBase for Facebook Messages ● Cassandra/Dynamo eventual consistency was difficult to program against ● Benefit: simple consistency model ● Benefit: flexible data model ● Benefit: simple sharding, load balancing, replication
  • 36. MongoDB ● Document-based data model ● Range-based partitioning ● Consistency depends on how you use it
  • 37. MongoDB: Two use-cases ● Archiving at Craigslist ● 2.2B historical posts, semi-structured ● Relatively large blobs: avg 2KB, max > 4 MB
  • 38. MongoDB: Two use-cases ● Archiving at Craigslist ● 2.2B historical posts, semi-structured ● Relatively large blobs: avg 2KB, max > 4 MB ● Checkins at Foursquare ● Geospatial indexing ● Small location-based updates, sharded on user
  • 39. interesting properties real-world usage takeaways questions
  • 40. Takeaways ● Developer accessibility ● Ecosystem of reuse ● Soft spot
  • 42. Developer Accessibility devops @DEVOPS_BORAT
  • 43. Developer Accessibility devops self-taught dev @DEVOPS_BORAT
  • 44. A Different Five-Minute Rule What does a first-time user of your system experience in their first five minutes? (idea credit: Justin Sheehy, Basho)
  • 48. Accessibility is Forever Beyond five minutes: ● changing schemas ● scaling up ● modifying topology
  • 49. Accessibility is Forever Beyond five minutes: ● changing schemas ● scaling up ● modifying topology An accessible data store will ease first-time users in, and support experienced users as they expand
  • 51. Spectrum of Reusability Monolithic System System Kernel
  • 52. Spectrum of Reusability Monolithic System System Kernel MongoDB Redis Use as provided
  • 53. Spectrum of Reusability Monolithic System System Kernel MongoDB Cassandra Redis Voldemort Use as Pick between provided components
  • 54. Spectrum of Reusability Monolithic System System Kernel MongoDB Cassandra ZooKeeper Redis Voldemort LevelDB Use as Pick between Reusable provided components components
  • 55. Spectrum of Reusability Monolithic System System Kernel MongoDB Cassandra ZooKeeper Redis Voldemort LevelDB riak_core Use as Pick between Reusable Build your provided components components own system
  • 56. And finally, a soft spot
  • 57. polyglot persistence = right tool for right task
  • 58. Polyglot persistence: Where's the data? “HBase at Mendeley,” Dan Harvey http://www.slideshare.net/danharvey/hbase-at-mendeley
  • 59. Polyglot persistence: Where's the data? “HBase at Mendeley,” Dan Harvey http://www.slideshare.net/danharvey/hbase-at-mendeley
  • 60. Polyglot persistence: Where's the data? Aid developers in reasoning about data consistency across multiple storage engines “HBase at Mendeley,” Dan Harvey http://www.slideshare.net/danharvey/hbase-at-mendeley
  • 61. interesting properties real-world usage takeaways questions
  • 62. Systems: Polyglot persistence + data consistency?
  • 63. Systems: Polyglot persistence + data consistency? ● Operations: Availability/consistency/latency tradeoffs in datacenters?
  • 64. Systems: Polyglot persistence + data consistency? ● Operations: Availability/consistency/latency tradeoffs in datacenters? ● Accessibility: RDBMS five-minute usability?
  • 65. Systems: Polyglot persistence + data consistency? ● Operations: Availability/consistency/latency tradeoffs in datacenters? ● Accessibility: RDBMS five-minute usability? ● Accessibility: Scale-up wizard for storage systems?
  • 66. Systems: Polyglot persistence + data consistency? ● Operations: Availability/consistency/latency tradeoffs in datacenters? ● Accessibility: RDBMS five-minute usability? ● Accessibility: Scale-up wizard for storage systems? ● Comparisons: Design or implementation?
  • 67. Systems: Polyglot persistence + data consistency? ● Operations: Availability/consistency/latency tradeoffs in datacenters? ● Accessibility: RDBMS five-minute usability? ● Accessibility: Scale-up wizard for storage systems? ● Comparisons: Design or implementation? ● Future: Next-generation NoSQL stores?
  • 68. Thank You! ● Systems: Polyglot persistence + data consistency? ● Operations: Availability/consistency/latency tradeoffs in datacenters? ● Accessibility: RDBMS five-minute usability? ● Accessibility: Scale-up wizard for storage systems? ● Comparisons: Design or implementation? ● Future: Next-generation NoSQL stores? Adam Marcus marcua@csail.mit.edu / @marcua