Dynamo:
Theme and Variations
@shanley
Riak
150
Services
• Global access
• Multiple machines
• Multiple datacenters
• Scale to peak loads easily
• Tolerance of continuous failure
Traditionally production systems store their state in relational databases. For many of the more common usage patterns of state persistence, however, a relational database is a solution that is far from ideal.

Most of these services only store and retrieve data by primary key and do not require the complex querying and management functionality offered by an RDBMS.

This excess functionality requires expensive hardware and highly skilled personnel for its operation, making it a very inefficient solution.

In addition, the available replication technologies are limited and typically choose consistency over availability.

Although many advances have been made in the recent years, it is still not easy to scale-out databases or use smart partitioning schemes for load balancing.
Dynamo: Amazon’s Highly Available Key-value Store
CAP Theorem
People tend to focus on consistency/availability as the sole driver of
emerging database models because it provides a simple and academic
explanation for more complex evolutionary factors. In fact, CAP
Theorem, according to its original author, “prohibits only a tiny part of
the design space: perfect availability and consistency in the presence of
partitions, which are rare… there is little reason to forfeit C or A when
the system is not partitioned.” In reality, a much larger range of
considerations and tradeoffs has informed the “NoSQL” movement…
Spanner is Google’s scalable, multi-version, globally-distributed, and synchronously-replicated database… It is the first system to distribute data at global scale and support externally-consistent distributed transactions...

Spanner is designed to scale up to millions of machines across hundreds of datacenters and trillions of database rows… Spanner’s main focus is managing cross-datacenter replicated data…

Spanner started… as part of a rewrite of Google’s advertising backend called F1 [35]. This backend was originally based on a MySQL database…

Resharding this revenue-critical database as it grew in the number of customers and their data was extremely costly. The last resharding took over two years of intense effort…

Spanner: Google’s Globally-Distributed Database
Shanley’s Theorem
Database design is driven
by a virtuous tension
between the requirements
of the app, the profile of
developer productivity,
and the limitations of the
operational scenario.
• Stringent latency requirements measured at the 99.9th percentile
• Highly available
• Always writeable
• Modeled as keys/values

• Choice to manage conflict resolution themselves or manage on the data store level
• Simple, primary-key only interface
• No need for relational data model

• Functions on commodity hardware
• Each object must be replicated across multiple DCs
• Can scale out one node at a time with minimal impact on system and operators

• 1995: fewer than 40 million internet users; now: 2.4 billion
• Latency perceived as unavailability
• New types of applications

• Much more data
• Unstructured data
• New kinds of business requirements

• App scales gracefully without high development overhead
• Scale-out design on less expensive hardware
• Ability to easily meet peak loads
• Run efficiently across multiple sites
• Low operational burden

Aspects of the database:
• How to distribute data around the cluster
• Adding new nodes
• Replicating data
• Resolving data conflicts
• Dealing with failure scenarios
• Data model
Bunny Names A-G
Bunny Names H-R
Bunny Names R-Z

    how to distribute data
around the cluster
Bunny Names Disproportionately Trend Towards Bunny, Cuddles, Fluffy,
    Mr. Bunny, Peter Rabbit, Velveteen, Peter Cottontail, and Mitten




    how to distribute data
around the cluster
• Reduce risk of hot spots in the database
• Data is automatically assigned to nodes

how to distribute data
around the cluster
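
The surviving slide text does not name the mechanism, but Dynamo-style systems usually assign keys to nodes by hashing them onto a ring. A minimal sketch of that idea follows; `HashRing`, `ring_position`, the node names, the SHA-1 choice, and the `vnodes` count are all illustrative assumptions rather than details from the talk.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a point on the ring (SHA-1 is an illustrative choice)."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with several points per node."""

    def __init__(self, nodes, vnodes=64):
        # Each node claims `vnodes` points so load spreads more evenly.
        self._points = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [position for position, _ in self._points]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's position,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._points[idx % len(self._points)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
names = ["Bunny", "Cuddles", "Fluffy", "Mr. Bunny", "Peter Rabbit",
         "Velveteen", "Peter Cottontail", "Mitten"]
placement = {name: ring.node_for(name) for name in names}
```

Because placement depends on the hash of the name rather than its first letter, popular names do not pile up on whichever node owns one alphabetical range.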
Bunny Names A-G

“decoupling of partitioning and partition placement”

  adding new nodes
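
One way to read the quoted phrase: keys map to a fixed set of partitions, and a separate table maps partitions to nodes, so adding a node only edits the table. The sketch below assumes a partition count `Q = 32`, round-robin initial placement, and a simple "take from the most loaded node" policy; none of these specifics come from the slides.

```python
import hashlib

Q = 32  # fixed number of partitions (hypothetical value)


def partition_of(key: str) -> int:
    """Partitioning: depends only on the key, never on cluster membership."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % Q


def assign(nodes):
    """Placement: round-robin partitions onto the initial nodes."""
    return {p: nodes[p % len(nodes)] for p in range(Q)}


def add_node(placement, new_node):
    """Return (new_placement, moved) after giving new_node a fair share."""
    placement = dict(placement)
    node_count = len(set(placement.values()) | {new_node})
    target = Q // node_count
    moved = []
    while sum(1 for n in placement.values() if n == new_node) < target:
        counts = {}
        for n in placement.values():
            counts[n] = counts.get(n, 0) + 1
        # Take one partition from the currently most loaded existing node.
        donor = max((n for n in counts if n != new_node),
                    key=lambda n: (counts[n], n))
        p = min(p for p, n in placement.items() if n == donor)
        placement[p] = new_node
        moved.append(p)
    return placement, moved


before = assign(["a", "b", "c"])
after, moved = add_node(before, "d")
```

Only the partitions listed in `moved` change owner; every key still maps to the same partition, so nothing has to be re-split.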
[Diagram: one master and two slaves; successive builds add "writes" and "reads" labels]

replicating data
Availability Clock
1. Time to figure out the master is gone
2. Master election

[Diagram: one master and two slaves]

replicating data
Availability Clock
Consistency > Availability
Unavailable to writes until data can be confirmed correct

[Diagram: one master and two slaves]

replicating data
“Every node in Dynamo should have the same
set of responsibilities as its peers; there should
be no distinguished node or nodes that take
special roles or extra set of responsibilities.”



          replicating data
[Diagram: peer nodes, each labeled with both "reads" and "writes"]

• Clients can read / write to any node
• All updates reach all replicas eventually

         replicating data
w and r values
number of replicas that need to participate in a read/write for a success response

put( )   w=1
only one node needs to be available to complete write request

put( )   w=3

replicating data
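
The effect of `w` and `r` can be sketched with a toy in-memory cluster. `Replica`, `put`, and `get` below are hypothetical names for illustration and are not the API of any real client library.

```python
class Replica:
    """Toy replica: a dict plus an up/down flag."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


def put(replicas, key, value, w):
    """Write to every reachable replica; succeed if at least w acknowledged.

    Note that a failed request may still have landed on some replicas.
    """
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.data[key] = value
            acks += 1
    return acks >= w


def get(replicas, key, r):
    """Return (ok, values); ok means at least r replicas answered."""
    values = [rep.data.get(key) for rep in replicas if rep.up]
    return len(values) >= r, values


cluster = [Replica("a"), Replica("b"), Replica("c")]  # n = 3
cluster[1].up = False
cluster[2].up = False

ok_w1 = put(cluster, "bunny", "Fluffy", w=1)  # one live replica is enough
ok_w3 = put(cluster, "bunny", "Mitten", w=3)  # needs all three, so it fails
```

Lower values favor availability; higher values mean more replicas have seen a write or contributed to a read before the client gets a success response.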
 reads when not all writes have propagated (laggy or down node)

        resolving conflicts
 different clients update at the exact same time

resolving conflicts
a85hYGBgzm
DKBVIsTFUPPmcwJ
TLmsTIcmsJ1nA8qz
K7HcQwqfB0hzNac
xCYWcA1ZIgsA
Whether one object is a direct descendant of the other
   Whether the objects are direct descendants of a common parent
       Whether the objects are unrelated in recent heritage




vector clocks that show relationships between objects


   resolving conflicts
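
The three relationships listed above can be checked by comparing per-actor counters. This is a minimal sketch in which a clock is a plain dict from actor name to counter; the function names and actor names are illustrative assumptions.

```python
def increment(clock, actor):
    """Return a copy of clock with actor's counter bumped by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def relationship(a, b):
    """Classify two clocks as equal, ancestor/descendant, or concurrent."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a descends from b"
    if descends(b, a):
        return "b descends from a"
    return "concurrent"


parent = increment({}, "client-x")
child_y = increment(parent, "client-y")  # one client updates the parent
child_z = increment(parent, "client-z")  # another client updates it too
```

Here `child_y` descends from `parent`, while `child_y` and `child_z` are concurrent: both descend from a common parent, so neither can safely overwrite the other.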
vector clock is updated when objects are updated
last-write wins or conflicts can be resolved on client side

    resolving conflicts
if stale responses are returned as part of the read,
              those replicas are updated

 resolving conflicts
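
A rough sketch of this read-time repair follows. To keep it short, a single integer version stands in for the vector clock, which is a simplifying assumption; `read_with_repair` and the replica layout are hypothetical.

```python
def read_with_repair(replicas, key):
    """Read key from every replica and push the newest value to stale ones.

    Each replica is a dict mapping key -> (version, value).
    """
    responses = [(replica, replica.get(key)) for replica in replicas]
    found = [entry for _, entry in responses if entry is not None]
    if not found:
        return None
    newest = max(found, key=lambda entry: entry[0])
    for replica, entry in responses:
        # A replica is stale if it is missing the key or holds an older version.
        if entry is None or entry[0] < newest[0]:
            replica[key] = newest
    return newest[1]


replicas = [{"bunny": (2, "Fluffy")}, {"bunny": (1, "Cuddles")}, {}]
result = read_with_repair(replicas, "bunny")
```

After the call, all three replicas hold version 2, so the read itself has moved the cluster closer to convergence.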
failure conditions
n = 3


failure conditions
writes & updates




                   n = 3


failure conditions
hinted handoff


failure conditions
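
Hinted handoff can be sketched as: when one of the n intended replicas is down, another node accepts the write along with a hint naming the intended owner, then hands it back once the owner recovers. `Node`, `write`, and `handoff` are hypothetical names, and a real system would compare versions on handoff instead of overwriting.

```python
class Node:
    """Toy node with its own data and a list of hints held for other nodes."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (intended_owner_name, key, value)


def write(preference_list, fallbacks, key, value):
    """Write to each intended owner, or leave a hint on a fallback node."""
    available = iter([node for node in fallbacks if node.up])
    accepted = 0
    for owner in preference_list:
        if owner.up:
            owner.data[key] = value
            accepted += 1
            continue
        stand_in = next(available, None)
        if stand_in is not None:
            stand_in.hints.append((owner.name, key, value))
            accepted += 1
    return accepted


def handoff(holder, nodes_by_name):
    """Deliver hints whose intended owner is reachable again."""
    remaining = []
    for owner_name, key, value in holder.hints:
        owner = nodes_by_name[owner_name]
        if owner.up:
            owner.data[key] = value  # simplification: no version comparison
        else:
            remaining.append((owner_name, key, value))
    holder.hints = remaining


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
c.up = False
accepted = write([a, b, c], [d], "bunny", "Fluffy")  # n = 3, c is down
c.up = True
handoff(d, {"a": a, "b": b, "c": c, "d": d})
```

The write still reaches three nodes while `c` is down, which keeps the store writeable during the failure.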
developing apps
“Most of these services only store and retrieve
 data by primary key and do not require the
 complex querying and management
 functionality offered by an RDBMS.”




developing apps
“schema-less”    more flexibility, agility

developing apps
Use case       Key                     Value

Session        User/Session ID         Session Data

Advertising    Campaign ID             Ad Content

Logs           Date                    Log File

Sensor         Date, Date/Time         Updates

User Data      Login, Email, UUID      User Attributes

Content        Title, Integer, Etc.    Text, JSON, XML

            developing apps
future
• counters
• sets
• server-side structure and conflict resolution policy

    more data types
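
The slides do not say how such types would be built. One well-known approach is a convergent counter, where each node tracks its own count and merging takes the per-node maximum. The sketch below assumes that approach; the function and node names are illustrative.

```python
def increment(counter, actor, amount=1):
    """Return a copy of the counter with actor's own tally increased."""
    new = dict(counter)
    new[actor] = new.get(actor, 0) + amount
    return new


def merge(a, b):
    """Combine two replicas of the counter by taking the per-actor maximum."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0))
            for actor in a.keys() | b.keys()}


def value(counter):
    """The counter's value is the sum of every actor's tally."""
    return sum(counter.values())


# Two replicas accept increments independently, then exchange state.
replica_1 = increment(increment({}, "node-1"), "node-1")
replica_2 = increment({}, "node-2")
merged = merge(replica_1, replica_2)
```

Because the merge is commutative and idempotent, replicas can exchange state in any order and still agree, so the conflict resolution policy lives in the data type instead of in each application.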
there is little reason to forfeit C or A when the
  system is not partitioned




strong consistency
 conditional writes    consistent reads



strong consistency
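
A conditional write can be pictured as "store this value only if the version I read is still current". The single-process sketch below shows just that check; `VersionedStore` is hypothetical, and a distributed store would need replicas to agree on the current version, which is not modeled here.

```python
class VersionedStore:
    """Toy store in which every key carries an integer version."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        """Return (version, value); unseen keys report version 0."""
        return self._items.get(key, (0, None))

    def put_if_version(self, key, value, expected_version):
        """Write only if nobody has written since expected_version was read."""
        current_version, _ = self.get(key)
        if current_version != expected_version:
            return False
        self._items[key] = (current_version + 1, value)
        return True


store = VersionedStore()
seen_by_x, _ = store.get("bunny")  # both clients read version 0
seen_by_y, _ = store.get("bunny")
first = store.put_if_version("bunny", "Fluffy", seen_by_x)   # succeeds
second = store.put_if_version("bunny", "Mitten", seen_by_y)  # rejected
```

The rejected client has to read again and retry, which trades some availability for never silently overwriting a newer value.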
• metadata
• aggregation tasks
• search

other advanced features
In summary…
rapid evolutionary change
significant events
explosion of new systems
evolving into higher-order
         systems
We’re hiring.
shanley@basho.com

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 

Dynamo Systems - QCon SF 2012 Presentation

  • 9. • Global access • Multiple machines • Multiple datacenters • Scale to peak loads easily • Tolerance of continuous failure
  • 11. Traditionally production systems store their state in relational databases. For many of the more common usage patterns of state persistence, however, a relational database is a solution that is far from ideal. Most of these services only store and retrieve data by primary key and do not require the complex querying and management functionality offered by an RDBMS. This excess functionality requires expensive hardware and highly skilled personnel for its operation, making it a very inefficient solution. In addition, the available replication technologies are limited and typically choose consistency over availability. Although many advances have been made in the recent years, it is still not easy to scale-out databases or use smart partitioning schemes for load balancing. (Dynamo: Amazon’s Highly Available Key-value Store)
  • 13. People tend to focus on consistency/availability as the sole driver of emerging database models because it provides a simple and academic explanation for more complex evolutionary factors. In fact, CAP Theorem, according to its original author, “prohibits only a tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare… there is little reason to forfeit C or A when the system is not partitioned.” In reality, a much larger range of considerations and tradeoffs have informed the “NoSQL” movement…
  • 14. Traditionally production systems store their state in relational databases. For many of the more common usage patterns of state persistence, however, a relational database is a solution that is far from ideal. Most of these services only store and retrieve data by primary key and do not require the complex querying and management functionality offered by an RDBMS. This excess functionality requires expensive hardware and highly skilled personnel for its operation, making it a very inefficient solution. In addition, the available replication technologies are limited and typically choose consistency over availability. Although many advances have been made in the recent years, it is still not easy to scale-out databases or use smart partitioning schemes for load balancing. (Dynamo: Amazon’s Highly Available Key-value Store)
  • 15. Spanner is Google’s scalable, multi-version, globally-distributed, and synchronously-replicated database… It is the first system to distribute data at global scale and support externally-consistent distributed transactions... Spanner is designed to scale up to millions of machines across hundreds of datacenters and trillions of database rows… Spanner’s main focus is managing cross-datacenter replicated data… Spanner started… as part of a rewrite of Google’s advertising backend called F1 [35]. This backend was originally based on a MySQL database… Resharding this revenue-critical database as it grew in the number of customers and their data was extremely costly. The last resharding took over two years of intense effort… (Spanner: Google’s Globally-Distributed Database)
  • 17. Database design is driven by a virtuous tension between the requirements of the app, the profile of developer productivity, and the limitations of the operational scenario.
  • 18. Database design is driven by a virtuous tension between the requirements of the app, the profile of developer productivity, and the limitations of the operational scenario. • Stringent latency requirements measured at the 99.9th percentile • Highly available • Always writeable • Modeled as keys/values
  • 19. Database design is driven by a virtuous tension between the requirements of the app, the profile of developer productivity, and the limitations of the operational scenario. • Choice to manage conflict resolution themselves or manage on the data store level • Simple, primary-key only interface • No need for relational data model
  • 20. Database design is driven by a virtuous tension between the requirements of the app, the profile of developer productivity, and the limitations of the operational scenario. • Functions on commodity hardware • Each object must be replicated across multiple DCs • Can scale out one node at a time with minimal impact on system and operators
  • 21. Database design is driven by a virtuous tension between the requirements of the app, the profile of developer productivity, and the limitations of the operational scenario.
  • 22. Database design is driven by a virtuous tension between the requirements of the app, the profile of developer productivity, and the limitations of the operational scenario. • 1995: Fewer than 40 million internet users; now: 2.4 billion • Latency perceived as unavailability • New types of applications
  • 23. Database design is driven by a virtuous tension between the requirements of the app, the profile of developer productivity, and the limitations of the operational scenario. • Much more data • Unstructured data • New kinds of business requirements • App scales gracefully without high development overhead
  • 24. Database design is driven by a virtuous tension between the requirements of the app, the profile of developer productivity, and the limitations of the operational scenario. • Scale-out design on less expensive hardware • Ability to easily meet peak loads • Run efficiently across multiple sites • Low operational burden
  • 25. Aspects of the database: • How to distribute data around the cluster • Adding new nodes • Replicating data • Resolving data conflicts • Dealing with failure scenarios • Data model
  • 26. Database design is driven by a virtuous tension between the requirements of the app, the profile of developer productivity, and the how to distribute data around the cluster
  • 27. Database design is driven by a virtuous tension between the requirements of the app, the profile of developer productivity, and the how to distribute data around the cluster
  • 28. Bunny Names A-G • Bunny Names H-R • Bunny Names S-Z how to distribute data around the cluster
  • 29. Bunny Names Disproportionately Trend Towards Bunny, Cuddles, Fluffy, Mr. Bunny, Peter Rabbit, Velveteen, Peter Cottontail, and Mitten how to distribute data around the cluster
  • 30. how to distribute data around the cluster
  • 31. • Reduce risk of hot spots in the database • Data is automatically assigned to nodes how to distribute data around the cluster
  • 32. how to distribute data around the cluster
  • 33. how to distribute data around the cluster
  • 34. how to distribute data around the cluster
  • 38. A-G Bunny Names A-G Bunny Names A-G Bunny Na adding new nodes
  • 41. • “decoupling of partitioning and partition placement” adding new nodes
  • 43. master slave slave replicating data
  • 44. writes master slave slave replicating data
  • 45. writes master reads slave slave replicating data
  • 46. master slave slave replicating data
  • 47. Availability Clock master 1. Time to figure out the master is gone 2. Master election slave slave replicating data
  • 48. Availability Clock master Consistency > Availability Unavailable to writes until data can be confirmed correct slave slave replicating data
  • 50. “Every node in Dynamo should have the same set of responsibilities as its peers; there should be no distinguished node or nodes that take special roles or extra set of responsibilities.” replicating data
  • 51. writes reads writes writes reads reads writes reads reads reads writes writes reads writes reads replicating data
  • 52. writes reads writes writes reads reads writes reads reads reads writes writes reads writes reads • Clients can read / write to any node • All updates reach all replicas eventually replicating data
  • 53. • w and r values replicating data
  • 54. • number of replicas that need to participate in a read/write for a success response replicating data
  • 55. put( ) • w=1 • only one node needs to be available to complete write request replicating data
  • 56. put( ) • w=3 replicating data
  • 57. • reads when not all writes have propagated (laggy or down node) resolving conflicts
  • 58. • different clients update at the exact same time resolving conflicts
  • 60. Whether one object is a direct descendant of the other Whether the objects are direct descendants of a common parent Whether the objects are unrelated in recent heritage vector clocks that show relationships between objects resolving conflicts
  • 61. vector clock is updated when objects are updated last-write wins or conflicts can be resolved on client side resolving conflicts
  • 62. if stale responses are returned as part of the read, those replicas are updated resolving conflicts
  • 64. n = 3 failure conditions
  • 65. writes & updates n = 3 failure conditions
  • 68. “Most of these services only store and retrieve data by primary key and do not require the complex querying and management functionality offered by an RDBMS.” developing apps
  • 69. “schema-less”: more flexibility, agility developing apps
  • 70. Session User/Session ID Session Data developing apps
  • 71. Session User/Session ID Session Data Advertising Campaign ID Ad Content developing apps
  • 72. Session User/Session ID Session Data Advertising Campaign ID Ad Content Logs Date Log File developing apps
  • 73. Session User/Session ID Session Data Advertising Campaign ID Ad Content Logs Date Log File Sensor Date, Date/Time Updates developing apps
  • 74. Session User/Session ID Session Data Advertising Campaign ID Ad Content Logs Date Log File Sensor Date, Date/Time Updates User Data Login, Email, UUID User Attributes developing apps
  • 75. Session User/Session ID Session Data Advertising Campaign ID Ad Content Logs Date Log File Sensor Date, Date/Time Updates User Data Login, Email, UUID User Attributes Content Title, Integer, Etc. Text, JSON, XML developing apps
  • 80. • counters • sets • server side structure and conflict resolution policy more data types
  • 81. there is little reason to forfeit C or A when the system is not partitioned strong consistency
  • 82. • conditional writes • consistent reads strong consistency
  • 85. • metadata • aggregation tasks • search other advanced features
  • 86. • metadata • aggregation tasks • search other advanced features
  • 91. explosion of new systems
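
The w and r values on slides 53 to 56 can be illustrated with a small sketch. This is not Dynamo's or Riak's actual API: the Replica class and the put and get functions are hypothetical names, nodes are simulated in memory, and versions are plain integers for brevity where a real system would use vector clocks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    """Hypothetical in-memory replica; `up` simulates node availability."""
    up: bool = True
    store: Dict[str, Tuple[int, str]] = field(default_factory=dict)


def put(replicas: List[Replica], key: str, value: str, version: int, w: int) -> bool:
    """Send the write to every reachable replica.

    Returns True only if at least `w` replicas acknowledged. Replicas that
    accepted the write keep it even when the quorum is not met (no rollback).
    """
    acks = 0
    for rep in replicas:
        if rep.up:
            rep.store[key] = (version, value)
            acks += 1
    return acks >= w


def get(replicas: List[Replica], key: str, r: int) -> Optional[str]:
    """Read from every reachable replica; require at least `r` responses.

    Returns the value carrying the highest version, or None if the read
    quorum was not met or no replica holds the key.
    """
    responses = [rep.store.get(key) for rep in replicas if rep.up]
    if len(responses) < r:
        return None
    found = [resp for resp in responses if resp is not None]
    return max(found)[1] if found else None


# n = 3 replicas, one of them down.
nodes = [Replica(), Replica(), Replica(up=False)]
print(put(nodes, "cart:42", "carrots", version=1, w=1))  # low w: tolerates the failure
print(put(nodes, "cart:42", "carrots", version=1, w=3))  # w = n: needs every replica
```

With one of three replicas down, the w=1 write is acknowledged while the w=3 write is not, which is the availability versus correctness trade the slides describe.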
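
The three relationships listed on slide 60 (direct descendant, descendants of a common parent, unrelated) can be made concrete with a minimal vector clock sketch. It assumes a clock is a mapping from an actor id to an update counter; the function names are illustrative and do not come from the Dynamo paper or from Riak.

```python
from typing import Dict

VectorClock = Dict[str, int]  # actor id -> number of updates made by that actor


def increment(clock: VectorClock, actor: str) -> VectorClock:
    """Return a copy of `clock` with `actor`'s counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every update recorded in `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify the relationship between two versions of an object."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a descends from b"
    if descends(b, a):
        return "b descends from a"
    return "concurrent"  # siblings: a conflict that must be resolved


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Clock for a resolved value: the per-actor maximum of both clocks."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


parent = increment({}, "client_x")
left = increment(parent, "client_y")    # one client updates the parent
right = increment(parent, "client_z")   # another client updates the same parent
print(compare(left, parent))
print(compare(left, right))
```

Two versions that share a parent but were updated by different clients compare as concurrent. That is the case where the store must either pick a winner (last write wins) or hand both siblings to the client for resolution.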

Editor's Notes

  1. Twitter.com/shanley, shanley@basho.com
  2. Basho makes a highly available, open source, distributed database called Riak that is based on the principles of the Dynamo paper.
  3. The Dynamo paper was published about 5 years ago and outlines a set of properties for a highly available key/value store that provides an “always-on” experience for end users. Dynamo is an internal service at Amazon that powers parts of AWS, including S3, and the Amazon.com shopping cart.
  4. As the Dynamo paper discusses, a single page request to Amazon can result in requests to over 150 services, many of which have their own dependencies. High availability and low latency must be created across the entire service-oriented architecture to present these properties in the user-facing experience.
  5. Principles from the Dynamo paper have spawned many open-source and commercial projects including Cassandra (came out of Facebook, commercial players include Datastax and Acunu) and Project Voldemort (an open source project coming out of LinkedIn). Basho, launched in 2009, is based on many of Dynamo’s principles for distributed systems and extends the core functionality with full-text search, multi-datacenter replication, MapReduce and other features.
  6. The Amazon shopping cart is a canonical use case of an application requiring the high availability and low latency that Dynamo-based systems provide. Amazon itself serves as a classic example of both a new class of website/application with a different set of business requirements, as well as a new breed of infrastructure provided by its Web Services platform. For both scenarios, unavailability has a direct impact on revenue, as well as on the user trust required for a healthy business.
  7. Beyond availability and low latency, the Dynamo paper cites a number of other requirements for its data store. These are important design considerations that profoundly affect the implementation of the database.
  8. In the past, these problems might have been addressed with a big server and some MySQL. Why hasn’t Amazon taken that route?
  9. MySQL systems are consistent. Consistent systems will not be write-available during failure conditions, focusing on correctness over serving requests. Ensuring write availability is a critical aspect of the Dynamo design.
  10. The CAP theorem states that in the event of a network partition, systems must choose consistency or availability – not both.
  11. However, CAP is only one of many considerations in database design. We are in the early stages of databases that are designed to handle new types of operational environments and application needs / constraints.
  12. As the Dynamo paper shows, many other factors play a role – including the expense of hardware and people to maintain the system, and the need for a scale-out model.
  13. Indeed, even in database designs like Google’s Spanner (in contrast, a consistent distributed database), some of the primary concerns beyond CAP are the ability to scale to multiple datacenters and the operational cost of data growth.
  14. Perhaps it’s time for a new theorem!
  15. Amazon’s application requirements in creating Dynamo.
  16. Amazon’s definition of developer productivity in relation to Dynamo.
  17. Amazon’s operational requirements.
  18. Turns out, Amazon’s needs were echoing trends going on in the larger world. The big shift is not all about CAP – it’s about the big shift to distributed systems. Distributed systems aren’t just about DNS and CDN anymore. The business needs that require a distributed system are relevant to more and more companies. As a result, you’re seeing more and more “old” ideas (databases, storage, monitoring) being re-architected and re-thought for a new, distributed environment.
  19. How requirements for applications have changed.
  20. How the way we define developer productivity has changed.
  21. How the operational environment we are building in has changed.
  22. These are the other hard problems in distributed systems that touch CAP but also speak beyond it – areas the Dynamo paper posits approaches to. Dynamo can really be seen as a collection of technologies and approaches that produce the desired properties of the database… all interconnected in the circle of life. Let’s take a look at how these areas have been handled in a relational world vs how Dynamo addresses them.
  23. In a relational world, you might have thrown all your data (let’s say you’re building a database of bunny pictures and information) on one big machine.
  24. As you grew, you might shard the data across multiple machines through some logical division of data.
  25. Sharding, however, can lead to hot spots in the database. Also, writing and maintaining sharding logic increases the overhead of operating and developing on the database. Significant growth of data or traffic typically means significant, often manual resharding projects. Figuring out how to intelligently split the dataset without negative impacts on performance, operations and development presents a significant challenge.
  26. Two of Dynamo’s goals were to reduce hot spots in the database, and reduce the operational burden of scale.
  27. To accomplish this, Dynamo uses consistent hashing, a technique first innovated at Akamai. In consistent hashing, you start with a 160-bit integer space…
  28. And divide it into equally sized partitions that form a range of possible values.
  29. Keys are hashed onto the integer space. Keys are “owned” by the partition they are hashed onto, and partitions, or virtual nodes, are claimed by physical nodes in the cluster. The even distribution of data created by the hashing function, and the fact that physical nodes share responsibility for partitions, results in a cluster that shares the load.
  30. In a relational world, scaling up might mean getting bigger machines…
  31. Or re-sharding your sharding scheme.
  32. At massive scale though, this might land you in sharding hell.
  33. You might note that the word “shard” appears zero times in the Dynamo paper.
  34. When new nodes are added to a Dynamo cluster, a joining node takes over partitions until responsibility is again equal. Existing nodes hand off data for the appropriate key spaces. Cluster state is shared through a “gossip protocol” and nodes update their view of the cluster as it changes.
  35. This is one of the major innovations in the Dynamo paper. Decoupling the partitioning scheme from how partitions are assigned to physical nodes means that data load can be evenly shared by the cluster and adding nodes doesn’t require manually figuring out where to put data.
  36. Data in a relational database is generally replicated using a master/slave setup to ensure consistency.
  37. Writes are applied through a master node, which ensures that the writes are applied consistently to both master and slave nodes.
  38. Reads can occur at slave or master nodes, but writes must go through the master.
  39. This can be problematic in the event of master failure.
  40. It can take time for slave nodes to realize the master is no longer available and to elect a new master.
  41. In the meantime, that data will be unavailable to writes and updates.
  42. The Dynamo paper states that there is no master that can cause write unavailability, and that all nodes are equal.
  43. In a Dynamo-based system, all nodes responsible for a replica can serve read and write requests. The system uses eventual consistency – as opposed to all replicas seeing an update at the same time, all replicas will see updates eventually – in practice, usually a very small window until all replicas reflect the most recent value.
  44. Dynamo systems also provide w and r values on requests to maintain availability despite failure, and to allow the developer to tune to some extent the “correctness” of reads and writes.
  45. Lower w and r values produce higher availability and lower latency.
  46. With a w or r value equal to the number of replicas, higher correctness is possible.
  47. In any system that uses an eventually consistent model and replicates data, you run the risk of divergent data or “entropy”. Two common scenarios for this are when not all writes have propagated to all nodes…
  48. Or when different clients update the same datum concurrently.
  49. The solution… looks like this.
  50. A vector clock is a piece of metadata attached to each object.
  51. It gives the developer a choice: to resolve a conflict at the data store level (“last write wins”), or let the client resolve it with business logic relevant to the use case.
  52. Dynamo also provides some nice anti-entropy features, including read repair.
  53. Dynamo also has a slick way to maintain write availability during node failure.
  54. If a node becomes unavailable due to hardware failure or network partition, writes/updates for that node will go to a fallback.
  55. This neighboring node will “hand off” data when the original node returns to the cluster.
  56. Another significant point is how Dynamo changes application development and how we define developer productivity.
  57. There is now more unstructured data in the world, more apps that don’t require a strong schema. Using a “schema-less” key/value data model, you can eliminate some of the need for extensive data model “pre-planning”, and change applications or develop new features without changing the underlying data model. It’s simpler and for some applications that fit a key/value model, more productive.
  58. A look at common apps and use cases for Dynamo-like systems and simple approaches to building data on them with a key/value scheme…
  59. What does the future of NoSQL hold?
  60. Many “NoSQL” systems have thrown out consistency in favor of availability…
  61. But maybe we also threw out a lot of useful things in the process. What about higher-level data types, search, aggregation tasks, and the developer-friendly things we love in relational databases? We’ve done a lot so far to address some of the underlying architectural requirements of new apps and platforms, but there is more we can do to enable greater queryability and broader use cases.
  62. There is growing research and practice around ways we can offer more data types on top of distributed systems, data types that are tolerant of an eventually consistent design.
  63. At Basho, we are applying this research to offer more advanced data types like counters and sets in a future release of Riak, and some implementations of this can already be seen in the wild.
  64. And what about consistency? The author of CAP Theorem has stated that CAP doesn’t forbid CA during normal operations…
  65. So can we create systems that can offer both availability and consistency or offer greater guarantees around consistency? Like offering conditional writes - failing a write if intervening requests occur, or if they don’t meet some other requirement? Or offering Paxos-like implementations to ensure strict ordering among replicas? What about consistent reads that always reflect the last written value?
  66. The problem with offering other advanced features – like Search, MapReduce, storing terms and indexes - in distributed systems like Dynamo is that data is ALL AROUND THE CLUSTER… these tasks require finding data and performing tasks on many different nodes in the system.
  67. Oftentimes, the response to date has been to run secondary clusters. However, this requires cluster replication and carries a higher operational burden.
  68. What if we find ways to see data locality / even distribution as more of a spectrum, allowing us to perform advanced tasks without degrading performance or requiring secondary clusters?
  69. Biology tells us that
  70. Rapid emergence of new species and variations
  71. Caused by significant events (like the Dynamo paper, the explosion of commodity hardware, cloud computing, mobile/social revolution)…
  72. Can lead to new types of organisms (NoSQL, NewSQL, combinations of NoSQL and MySQL)…
  73. But it’s our job to evolve them further into more features / functionality than ever before!
  74. Because it’s hard….
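
Notes 27 to 29 and 34 to 35 describe consistent hashing with fixed partitions. A rough sketch of the idea follows, under several assumptions that are not in the notes: SHA-1 is used as the 160-bit hash, the ring has 64 partitions, partitions are first claimed round-robin, and the function names are made up for illustration. Real claim logic is more careful about which partitions a joining node takes.

```python
import hashlib
from collections import Counter
from typing import Dict, List

RING_SIZE = 2 ** 160    # the 160-bit integer space from the notes
NUM_PARTITIONS = 64     # assumed value; fixed when the ring is created


def partition_for(key: str) -> int:
    """Hash a key onto the ring and return the index of the partition that owns it."""
    position = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")
    return position // (RING_SIZE // NUM_PARTITIONS)


def claim(nodes: List[str]) -> Dict[int, str]:
    """Assign every partition to a physical node, round-robin (a simplification)."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def add_node(owners: Dict[int, str], new_node: str) -> Dict[int, str]:
    """Let a joining node take partitions from over-target nodes until shares are even."""
    owners = dict(owners)
    target = NUM_PARTITIONS // (len(set(owners.values())) + 1)
    load = Counter(owners.values())
    taken = 0
    for p in sorted(owners):
        if taken >= target:
            break
        donor = owners[p]
        if load[donor] > target:
            owners[p] = new_node   # the donor would hand off this partition's data
            load[donor] -= 1
            taken += 1
    return owners


before = claim(["node_a", "node_b", "node_c"])
after = add_node(before, "node_d")
moved = [p for p in after if after[p] != before[p]]
print(len(moved), "of", NUM_PARTITIONS, "partitions changed owner")
print("Fluffy lives in partition", partition_for("Fluffy"))
```

The key-to-partition mapping never changes when a node joins; only partition ownership does. That is the decoupling of partitioning from partition placement that note 35 calls out, and it is why no manual resharding step appears anywhere in the sketch.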
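
The fallback and hand-off behavior in notes 53 to 55 can be sketched the same way. This is a toy model, not Dynamo's implementation: the Node class and the write and hand_off functions are hypothetical, a single fallback node is assumed, and hinted data simply overwrites what the returning node holds, where a real system would compare versions first.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical cluster node; `up` simulates availability."""
    name: str
    up: bool = True
    data: Dict[str, str] = field(default_factory=dict)
    # Writes held on behalf of unavailable nodes: (intended owner, key, value)
    hints: List[Tuple[str, str, str]] = field(default_factory=list)


def write(key: str, value: str, preferred: List[Node], fallback: Node) -> None:
    """Write to each preferred node; if one is down, store a hint on the fallback."""
    for node in preferred:
        if node.up:
            node.data[key] = value
        else:
            fallback.hints.append((node.name, key, value))


def hand_off(fallback: Node, returned: Node) -> None:
    """Deliver hinted writes to a node that has rejoined, then drop those hints."""
    remaining = []
    for owner, key, value in fallback.hints:
        if owner == returned.name and returned.up:
            returned.data[key] = value
        else:
            remaining.append((owner, key, value))
    fallback.hints = remaining


a, b, c, d = Node("a"), Node("b", up=False), Node("c"), Node("d")
write("cart:42", "carrots", preferred=[a, b, c], fallback=d)  # b is down, d holds a hint
b.up = True
hand_off(d, b)                                                # b catches up on return
print(b.data)
```

The write never blocks on the failed node, so the cluster stays write-available during the failure, and the returning node receives the writes it missed once it rejoins.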