SlideShare a Scribd company logo
A Global In-memory Data System for MySQL

Daniel Austin, PayPal Technical Staff


Percona Live! MySQL Conference
Santa Clara, April 12th, 2012 v1.3
Intro: Globalizing NDB

          Proposed Architecture

                What We Learned

                         Q&A



Global In-memory MySQL         Confidential and Proprietary   2
Mission YesSQL

 “Develop a globally distributed DB For
 user-related data.”
 • Must Not Fail (99.999%)
 • Must Not Lose Data. Period.
 • Must Support Transactions
 • Must Support (some) SQL
 • Must WriteRead 32-bit integer globally in
   1000ms
 • MinMax Data Volume: 10-100 TB (working)
 • Must Scale Linearly with Costs

                               Confidential and Proprietary
THE FUNDAMENTAL PROBLEM IN
DISTRIBUTED DATA SYSTEMS

“How Do We Manage Reliable
Distribution of Data Across Geographical
Distances?”




                              Confidential and Proprietary
One Approach: The NoSQL Solution


 • NoSQL Systems provide a solution that relaxes
   many of the common constraints of typical
   RDBMS systems
   – Slow - RDBMS has not scaled with CPUs
   – Often require complex data management
     (SOX, SOR)
   – Costly to build and maintain, slow to change and
     adapt
   – Intolerant of CAP models (more on this later)
 • Non-relational models, usually key-value
 • May be batched or streaming
 • Not necessarily distributed geographically

                                        Confidential and Proprietary
Big Data Myth #1: Big Data = NoSQL


 • „Big Data‟ Refers to a Common Set of Problems
    – Large Volumes
    – High Rates of Change
       • Of Data
       • Of Data Models
       • Of Data Presentation and Output
    – Often Require „Fast Data‟ as well as „Big‟
       • Near-real Time Analytics
       • Mapping Complex Structures


 Takeaway: Big Data is the problem, NoSQL is one
 (proposed) solution
                                           Confidential and Proprietary
NoSQL Hype Cycle: Where Are We Now?

   There are currently more than 120+ NoSQL
   databases listed at nosql-database.org!



                              You Are Here ?




As the pace of new technology solutions has slowed, some clear winners have emerged.

                                                          Confidential and Proprietary
3 Essential Kinds of NoSQL Systems


 1. Columnar K-V Systems
    – Hadoop, Hbase, Cassandra, PNUTs
 2. Document-Based
    – MongoDB, TerraCotta
 3. Graph-Based
    – FlockDB, Voldemort

 Takeaway: These were originally designed as
 solutions to specific problems because no
 commercial solution would work.

                                     Confidential and Proprietary
Intro: Globalizing Big Data

       Proposed Architecture

                What We Learned

                         Q&A



Global In-memory MySQL          Confidential and Proprietary   9
OUR APPROACH: THE YESSQL SOLUTION

Use MySQL/NDB

Use AWS



Key Tradeoffs:
  Cost vs. Performance
  HA vs. CAP
  Complexity vs. RDBMS features




                                  Confidential and Proprietary
WHY NDB?

            Pro                       Con
•   True HA by design     •   Some semantic
     – Fast recovery          limitations on fields
•   Supports (some) X-    •   Size constraints (2
    actions                   TB?)
•   Relational Model           – Hardware limits
•   In-memory                    also
    architecture = high   •   Higher cost/byte
    performance           •   Requires reasonable
•   Disk storage for          data partitioning
    non-indexed data      •   Higher complexity
     (since 5.1)
•   APIs, APIs, APIs                 Confidential and Proprietary
How NDB Works in One Slide




Graphics courtesy dev.mysql.com

                                  Confidential and Proprietary
CIRCULAR REPLICATION/FAILOVER




Graphics courtesy O’Reilly OnLamp.com


                                        Confidential and Proprietary
SYSTEM AVAILABILITY DEFINED

• Availability of the entire
  system:
            n   m
Asys = 1 – P(1-Pri)j
           i=1 j=1             V
                               I
                               P

• Number of Parallel
  Components Needed to




                                   Parallel
  Achieve Availability Amin:
                                                      Serial

Nmin =
[ln(1-Amin)/ln(1-r)]

                                              Confidential and Proprietary
Big Data Myth #2: The CAP Theorem Doesn’t
Say What You Think It Does


 • Consistency, Availability, (Network) Partition
    – Pick any two? Not really.
 • The Real Story: These are not Independent
   Variables
 • AP =CP (Um, what? But…A != C )
 • Variations:
    – PACELC (adds latency tolerance)
    – VPAC (adds variability)
 Takeaway: the real story here is about the tradeoffs
 made by designers of different systems, and the
 main tradeoff is between consistency and
 availability, usually in favor of the latter.
                                        Confidential and Proprietary
What about “High Performance”?

  • Maximum lightspeed distance on Earth’s
    Surface: ~67 ms
  • Target: data available worldwide in < 1000 ms

                  Sound Easy?

                   Think Again!




                                     Confidential and Proprietary
AWS Meets NDB

• Why AWS?
  – Cheap and easy infrastructure-in-a-box
     (Or so we thought! Ha!)
• Services Used:
  – EC2 (Centos 5.3, small instances for mgm
    & query nodes, XL for data
  – Elastic IPs/ELB
  – EBS Volumes
  – S3
  – Cloudwatch

                                 Confidential and Proprietary
ARCHITECTURAL TILES
                                  AWS Availability Zones
Tiling Rules
                                  A                         B
• Never separate NDB & SQL
• Ndb:2-SQL:1-MGM:1
• Scale by adding more tiles
• Failover 1st to nearest AZ
• Then to nearest DC
• At least 1 replica/AZ
                                  C                ELB
• Don‟t share nodes
• Mgmt nodes are redundant
Limitations                           Unused (not present in all locations)

• AWS is network-bound @
   250 MBPS – ouch!
• Need specific ACL across AZ
   boundaries
• AZs not uniform!
• No GSLB                              NDB              MGM                  SQL
• Dynamic IPs
• ELB sticky sessions !reliable

                                                  Confidential and Proprietary
Architecture Stack

Scale by Tiling



          A       B   A   B         A   B           A      B
                       A   B
                        A   B




                           A    B           A      B




                                                7 AWS Data Centers:
                                                US-E, US-W, US-W2, TK, EU, AS, SA-B

                                                         Confidential and Proprietary
Other Technologies Considered

 • Paxos
   – Elegant-but-complex consensus-based
     messaging protocol
   – Used in Google Megastore, Bing metadata
 • Java Query Caching
   – Queries as serialized objects
   – Not yet working
 • Multiple Ring Architectures
   – Even more complicated = no way


                                     Confidential and Proprietary
Intro: Globalizing Big Data

          Proposed Architecture

             What We Learned

                         Q&A



Global In-memory MySQL          Confidential and Proprietary   21
SYSTEM READ/WRITE PERFORMANCE (!)

 What we tested:
 • 32 & 256 byte char fields
                                               In-region replication tests
 • Reads, writes, query speed vs. volume
 • Data replication speeds
 Results:
 • Global replication < 350 ms
 • 256 byte writeread < 1000ms worldwide




  06/19/2011    06/20/2011    06/21/2011   06/22/2011                     06/23/2011

                                           Confidential and Proprietary
The Commit Ordering Problem

 • Why does commit ordering matter?
 • Write operators are non-commutative

    [W(d,t1),W(d,t2)] != 0 unless t1=t2

   – Can lead to inconsistency
   – Can lead to timestamp corruption
   – Forcing sequential writes defeats Amdahl‟s
     rule
 • Can show up in GSLB scenarios

                                   Confidential and Proprietary
Dark Side of AWS

 • Deploying NDB at scale on AWS is hard
   – Dynamic IPs (use hostfile)
   – DNS issues
   – Security groups (ec2-authorize)
   – Inconsistent EC2 deployments
      • Are availability zones independent
        network segments?
   – No GSLB (!) (rent or buy)
   Be prepared to struggle a bit!


                                   Confidential and Proprietary
Future Directions for YesSQL

 • Alternate solution using Pacemaker, Heartbeat
    – From Yves Trudeau @ Percona
    – Uses InnoDB, not NDB
 • Implement Memcached plugin
    – To test NoSQL functionality, APIs
 • Add simple connection-based persistence to preserve
   connections during failover
 • Can we increase the number of data nodes?
 • As the number of datacenters goes up, sync times go
   down. What‟s the relation?
 • So far only limited testing of hard configurations
    – Production systems will likely need more RAM for MGMs


                                           Confidential and Proprietary
Hard Lessons, Shared
 • Be Careful…
    – With “Eventual Consistency”-related concepts
    – ACID, CAP are not really as well-defined as we‟d like
      considering how often we invoke them
 • NDB is a good solution
    – Real HA, real performance, real SQL
    – Notable limitations around fields, datatypes
    – Successfully competes with NoSQL systems for most use
      cases – better in many cases
 • AWS makes the hard things possible
    – But not so much the easy things easier

    Takeaway: You can achieve high performance and
    availability without giving up relational models and
    read consistency!
                                               Confidential and Proprietary
“In the long run, we are all dead
eventually consistent.”
Maynard Keynes on NoSQL Databases


Twitter: @daniel_b_austin
Email: daaustin@paypal.com




With apologies and thanks to the real DB experts, Andrew Goodman, Yves
Trudeau, Clement Frazer, Daniel Abadi, and everyone else!

More Related Content

What's hot

My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12
Mat Keep
 
Data Stores @ Netflix
Data Stores @ NetflixData Stores @ Netflix
Data Stores @ Netflix
Vinay Kumar Chella
 
MySQL NDB Cluster 101
MySQL NDB Cluster 101MySQL NDB Cluster 101
MySQL NDB Cluster 101
Bernd Ocklin
 
Cisco UCS Integrated Infrastructure for Big Data with Cassandra
Cisco UCS Integrated Infrastructure for Big Data with CassandraCisco UCS Integrated Infrastructure for Big Data with Cassandra
Cisco UCS Integrated Infrastructure for Big Data with Cassandra
DataStax Academy
 
How LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scale
How LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scaleHow LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scale
How LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scale
LinkedIn
 
Voldemort : Prototype to Production
Voldemort : Prototype to ProductionVoldemort : Prototype to Production
Voldemort : Prototype to Production
Vinoth Chandar
 
Cassandra Summit 2014: Active-Active Cassandra Behind the Scenes
Cassandra Summit 2014: Active-Active Cassandra Behind the ScenesCassandra Summit 2014: Active-Active Cassandra Behind the Scenes
Cassandra Summit 2014: Active-Active Cassandra Behind the Scenes
DataStax Academy
 
Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011
GlusterFS
 
Introduction to IBM Spectrum Scale and Its Use in Life Science
Introduction to IBM Spectrum Scale and Its Use in Life ScienceIntroduction to IBM Spectrum Scale and Its Use in Life Science
Introduction to IBM Spectrum Scale and Its Use in Life Science
Sandeep Patil
 
Introduction to GlusterFS Webinar - September 2011
Introduction to GlusterFS Webinar - September 2011Introduction to GlusterFS Webinar - September 2011
Introduction to GlusterFS Webinar - September 2011
GlusterFS
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Community
 
Yair Hershko - Building Software Defined Storage Cloud Using OpenStack
Yair Hershko - Building Software Defined Storage Cloud Using OpenStackYair Hershko - Building Software Defined Storage Cloud Using OpenStack
Yair Hershko - Building Software Defined Storage Cloud Using OpenStack
Cloud Native Day Tel Aviv
 
White Paper - CEVA-XM4 Intelligent Vision Processor
White Paper - CEVA-XM4 Intelligent Vision ProcessorWhite Paper - CEVA-XM4 Intelligent Vision Processor
White Paper - CEVA-XM4 Intelligent Vision Processor
CEVA, Inc.
 
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB ClusterPercona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
mysqlops
 
Dynomite @ Redis Conference 2016
Dynomite @ Redis Conference 2016Dynomite @ Redis Conference 2016
Dynomite @ Redis Conference 2016
Ioannis Papapanagiotou
 
Developing polyglot persistence applications #javaone 2012
Developing polyglot persistence applications  #javaone 2012Developing polyglot persistence applications  #javaone 2012
Developing polyglot persistence applications #javaone 2012
Chris Richardson
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud Computing
NephoScale
 
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Redis Labs
 

What's hot (20)

My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12
 
Data Stores @ Netflix
Data Stores @ NetflixData Stores @ Netflix
Data Stores @ Netflix
 
MySQL NDB Cluster 101
MySQL NDB Cluster 101MySQL NDB Cluster 101
MySQL NDB Cluster 101
 
Cisco UCS Integrated Infrastructure for Big Data with Cassandra
Cisco UCS Integrated Infrastructure for Big Data with CassandraCisco UCS Integrated Infrastructure for Big Data with Cassandra
Cisco UCS Integrated Infrastructure for Big Data with Cassandra
 
How LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scale
How LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scaleHow LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scale
How LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scale
 
Voldemort : Prototype to Production
Voldemort : Prototype to ProductionVoldemort : Prototype to Production
Voldemort : Prototype to Production
 
Cassandra Summit 2014: Active-Active Cassandra Behind the Scenes
Cassandra Summit 2014: Active-Active Cassandra Behind the ScenesCassandra Summit 2014: Active-Active Cassandra Behind the Scenes
Cassandra Summit 2014: Active-Active Cassandra Behind the Scenes
 
Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011
 
Introduction to IBM Spectrum Scale and Its Use in Life Science
Introduction to IBM Spectrum Scale and Its Use in Life ScienceIntroduction to IBM Spectrum Scale and Its Use in Life Science
Introduction to IBM Spectrum Scale and Its Use in Life Science
 
Methods of NoSQL database systems benchmarking
Methods of NoSQL database systems benchmarkingMethods of NoSQL database systems benchmarking
Methods of NoSQL database systems benchmarking
 
Introduction to GlusterFS Webinar - September 2011
Introduction to GlusterFS Webinar - September 2011Introduction to GlusterFS Webinar - September 2011
Introduction to GlusterFS Webinar - September 2011
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
 
Yair Hershko - Building Software Defined Storage Cloud Using OpenStack
Yair Hershko - Building Software Defined Storage Cloud Using OpenStackYair Hershko - Building Software Defined Storage Cloud Using OpenStack
Yair Hershko - Building Software Defined Storage Cloud Using OpenStack
 
White Paper - CEVA-XM4 Intelligent Vision Processor
White Paper - CEVA-XM4 Intelligent Vision ProcessorWhite Paper - CEVA-XM4 Intelligent Vision Processor
White Paper - CEVA-XM4 Intelligent Vision Processor
 
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB ClusterPercona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
 
Dynomite @ Redis Conference 2016
Dynomite @ Redis Conference 2016Dynomite @ Redis Conference 2016
Dynomite @ Redis Conference 2016
 
Developing polyglot persistence applications #javaone 2012
Developing polyglot persistence applications  #javaone 2012Developing polyglot persistence applications  #javaone 2012
Developing polyglot persistence applications #javaone 2012
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud Computing
 
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
 
Google Compute and MapR
Google Compute and MapRGoogle Compute and MapR
Google Compute and MapR
 

Viewers also liked

Big data comes in small packages v1.2
Big data comes in small packages v1.2Big data comes in small packages v1.2
Big data comes in small packages v1.2
Daniel Austin
 
Managing Performance Globally with MySQL
Managing Performance Globally with MySQLManaging Performance Globally with MySQL
Managing Performance Globally with MySQL
Daniel Austin
 
Reconceiving the Web as a Distributed (NoSQL) Data System
Reconceiving the Web as a Distributed (NoSQL) Data SystemReconceiving the Web as a Distributed (NoSQL) Data System
Reconceiving the Web as a Distributed (NoSQL) Data System
Daniel Austin
 
FLOK-Ecuador Research process presentation by Michel Bauwens
FLOK-Ecuador Research process presentation by Michel BauwensFLOK-Ecuador Research process presentation by Michel Bauwens
FLOK-Ecuador Research process presentation by Michel Bauwens
bbhorne
 
Big Data and the Future of Money 2014
Big Data and the Future of Money 2014Big Data and the Future of Money 2014
Big Data and the Future of Money 2014
Daniel Austin
 
Performance: How Fast is Fast Enough?
Performance: How Fast is Fast Enough?Performance: How Fast is Fast Enough?
Performance: How Fast is Fast Enough?
Daniel Austin
 
Performance analysisclass
Performance analysisclassPerformance analysisclass
Performance analysisclass
Daniel Austin
 
Next generation web protocols
Next generation web protocolsNext generation web protocols
Next generation web protocols
Daniel Austin
 
Web Performance Bootcamp 2014
Web Performance Bootcamp 2014Web Performance Bootcamp 2014
Web Performance Bootcamp 2014
Daniel Austin
 
The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...
The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...
The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...
Daniel Austin
 
Always Offline: Delay-Tolerant Networking for the Internet of Things
Always Offline: Delay-Tolerant Networking for the Internet of ThingsAlways Offline: Delay-Tolerant Networking for the Internet of Things
Always Offline: Delay-Tolerant Networking for the Internet of Things
Daniel Austin
 
Big data and the Future of Money (World Big Data Congress 2013)
Big data and the Future of Money (World Big Data Congress 2013)Big data and the Future of Money (World Big Data Congress 2013)
Big data and the Future of Money (World Big Data Congress 2013)
Daniel Austin
 
Perspectives on the Evolution of HTML
Perspectives on the Evolution of HTMLPerspectives on the Evolution of HTML
Perspectives on the Evolution of HTML
Daniel Austin
 
Designing Delay-tolerant Data Services for the Network of Things
Designing Delay-tolerant Data Services for the Network of ThingsDesigning Delay-tolerant Data Services for the Network of Things
Designing Delay-tolerant Data Services for the Network of Things
Daniel Austin
 

Viewers also liked (14)

Big data comes in small packages v1.2
Big data comes in small packages v1.2Big data comes in small packages v1.2
Big data comes in small packages v1.2
 
Managing Performance Globally with MySQL
Managing Performance Globally with MySQLManaging Performance Globally with MySQL
Managing Performance Globally with MySQL
 
Reconceiving the Web as a Distributed (NoSQL) Data System
Reconceiving the Web as a Distributed (NoSQL) Data SystemReconceiving the Web as a Distributed (NoSQL) Data System
Reconceiving the Web as a Distributed (NoSQL) Data System
 
FLOK-Ecuador Research process presentation by Michel Bauwens
FLOK-Ecuador Research process presentation by Michel BauwensFLOK-Ecuador Research process presentation by Michel Bauwens
FLOK-Ecuador Research process presentation by Michel Bauwens
 
Big Data and the Future of Money 2014
Big Data and the Future of Money 2014Big Data and the Future of Money 2014
Big Data and the Future of Money 2014
 
Performance: How Fast is Fast Enough?
Performance: How Fast is Fast Enough?Performance: How Fast is Fast Enough?
Performance: How Fast is Fast Enough?
 
Performance analysisclass
Performance analysisclassPerformance analysisclass
Performance analysisclass
 
Next generation web protocols
Next generation web protocolsNext generation web protocols
Next generation web protocols
 
Web Performance Bootcamp 2014
Web Performance Bootcamp 2014Web Performance Bootcamp 2014
Web Performance Bootcamp 2014
 
The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...
The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...
The Fastest Possible Search Algorithm: Grover's Search and the World of Quant...
 
Always Offline: Delay-Tolerant Networking for the Internet of Things
Always Offline: Delay-Tolerant Networking for the Internet of ThingsAlways Offline: Delay-Tolerant Networking for the Internet of Things
Always Offline: Delay-Tolerant Networking for the Internet of Things
 
Big data and the Future of Money (World Big Data Congress 2013)
Big data and the Future of Money (World Big Data Congress 2013)Big data and the Future of Money (World Big Data Congress 2013)
Big data and the Future of Money (World Big Data Congress 2013)
 
Perspectives on the Evolution of HTML
Perspectives on the Evolution of HTMLPerspectives on the Evolution of HTML
Perspectives on the Evolution of HTML
 
Designing Delay-tolerant Data Services for the Network of Things
Designing Delay-tolerant Data Services for the Network of ThingsDesigning Delay-tolerant Data Services for the Network of Things
Designing Delay-tolerant Data Services for the Network of Things
 

Similar to Yes sql08 inmemorydb

PayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL ClusterPayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL Cluster
Mat Keep
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...Qian Lin
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DBHeriyadi Janwar
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
shnkr_rmchndrn
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandra
Brian Enochson
 
MySQL Cluster no PayPal
MySQL Cluster no PayPalMySQL Cluster no PayPal
MySQL Cluster no PayPal
MySQL Brasil
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion Queries
Bernd Ocklin
 
Apache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and BasicsApache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and Basics
Oleg Magazov
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
LinkedIn
 
How and when to use NoSQL
How and when to use NoSQLHow and when to use NoSQL
How and when to use NoSQL
Amazon Web Services
 
NoSQL for great good [hanoi.rb talk]
NoSQL for great good [hanoi.rb talk]NoSQL for great good [hanoi.rb talk]
NoSQL for great good [hanoi.rb talk]
Huy Do
 
Applications in the Cloud
Applications in the CloudApplications in the Cloud
Applications in the Cloud
Eberhard Wolff
 
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDBBig Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
BigDataCloud
 
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,..."Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
lisapaglia
 
Webinar: The Future of SQL
Webinar: The Future of SQLWebinar: The Future of SQL
Webinar: The Future of SQL
Crate.io
 
MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0
Ted Wennmark
 
NoSQL
NoSQLNoSQL

Similar to Yes sql08 inmemorydb (20)

PayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL ClusterPayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL Cluster
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandra
 
MySQL Cluster no PayPal
MySQL Cluster no PayPalMySQL Cluster no PayPal
MySQL Cluster no PayPal
 
NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion Queries
 
Apache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and BasicsApache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and Basics
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
How and when to use NoSQL
How and when to use NoSQLHow and when to use NoSQL
How and when to use NoSQL
 
NoSQL for great good [hanoi.rb talk]
NoSQL for great good [hanoi.rb talk]NoSQL for great good [hanoi.rb talk]
NoSQL for great good [hanoi.rb talk]
 
Applications in the Cloud
Applications in the CloudApplications in the Cloud
Applications in the Cloud
 
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDBBig Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
 
My sql tutorial-oscon-2012
My sql tutorial-oscon-2012My sql tutorial-oscon-2012
My sql tutorial-oscon-2012
 
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,..."Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
 
Webinar: The Future of SQL
Webinar: The Future of SQLWebinar: The Future of SQL
Webinar: The Future of SQL
 
MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0
 
NoSQL
NoSQLNoSQL
NoSQL
 

More from Daniel Austin

HTML5, HTTP2, and You 1.1
HTML5, HTTP2, and You 1.1HTML5, HTTP2, and You 1.1
HTML5, HTTP2, and You 1.1
Daniel Austin
 
Web Performance BootCamp 2013
Web Performance BootCamp 2013Web Performance BootCamp 2013
Web Performance BootCamp 2013
Daniel Austin
 
Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...
Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...
Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...
Daniel Austin
 
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Daniel Austin
 
The Fastest Possible Search Algorithm
The Fastest Possible Search AlgorithmThe Fastest Possible Search Algorithm
The Fastest Possible Search Algorithm
Daniel Austin
 
Notes on a High-Performance JSON Protocol
Notes on a High-Performance JSON ProtocolNotes on a High-Performance JSON Protocol
Notes on a High-Performance JSON Protocol
Daniel Austin
 
Wrestling Large Data Volumes to the Ground
Wrestling Large Data Volumes to the GroundWrestling Large Data Volumes to the Ground
Wrestling Large Data Volumes to the Ground
Daniel Austin
 

More from Daniel Austin (7)

HTML5, HTTP2, and You 1.1
HTML5, HTTP2, and You 1.1HTML5, HTTP2, and You 1.1
HTML5, HTTP2, and You 1.1
 
Web Performance BootCamp 2013
Web Performance BootCamp 2013Web Performance BootCamp 2013
Web Performance BootCamp 2013
 
Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...
Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...
Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com...
 
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
 
The Fastest Possible Search Algorithm
The Fastest Possible Search AlgorithmThe Fastest Possible Search Algorithm
The Fastest Possible Search Algorithm
 
Notes on a High-Performance JSON Protocol
Notes on a High-Performance JSON ProtocolNotes on a High-Performance JSON Protocol
Notes on a High-Performance JSON Protocol
 
Wrestling Large Data Volumes to the Ground
Wrestling Large Data Volumes to the GroundWrestling Large Data Volumes to the Ground
Wrestling Large Data Volumes to the Ground
 

Recently uploaded

Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 

Recently uploaded (20)

Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 

Yes sql08 inmemorydb

  • 1. A Global In-memory Data System for MySQL Daniel Austin, PayPal Technical Staff Percona Live! MySQL Conference Santa Clara, April 12th, 2012 v1.3
  • 2. Intro: Globalizing NDB Proposed Architecture What We Learned Q&A Global In-memory MySQL Confidential and Proprietary 2
  • 3. Mission YesSQL “Develop a globally distributed DB For user-related data.” • Must Not Fail (99.999%) • Must Not Lose Data. Period. • Must Support Transactions • Must Support (some) SQL • Must WriteRead 32-bit integer globally in 1000ms • MinMax Data Volume: 10-100 TB (working) • Must Scale Linearly with Costs Confidential and Proprietary
  • 4. THE FUNDAMENTAL PROBLEM IN DISTRIBUTED DATA SYSTEMS “How Do We Manage Reliable Distribution of Data Across Geographical Distances?” Confidential and Proprietary
  • 5. One Approach: The NoSQL Solution • NoSQL Systems provide a solution that relaxes many of the common constraints of typical RDBMS systems – Slow - RDBMS has not scaled with CPUs – Often require complex data management (SOX, SOR) – Costly to build and maintain, slow to change and adapt – Intolerant of CAP models (more on this later) • Non-relational models, usually key-value • May be batched or streaming • Not necessarily distributed geographically Confidential and Proprietary
  • 6. Big Data Myth #1: Big Data = NoSQL • „Big Data‟ Refers to a Common Set of Problems – Large Volumes – High Rates of Change • Of Data • Of Data Models • Of Data Presentation and Output – Often Require „Fast Data‟ as well as „Big‟ • Near-real Time Analytics • Mapping Complex Structures Takeaway: Big Data is the problem, NoSQL is one (proposed) solution Confidential and Proprietary
  • 7. NoSQL Hype Cycle: Where Are We Now? There are currently more than 120+ NoSQL databases listed at nosql-database.org! You Are Here ? As the pace of new technology solutions has slowed, some clear winners have emerged. Confidential and Proprietary
  • 8. 3 Essential Kinds of NoSQL Systems 1. Columnar K-V Systems – Hadoop, Hbase, Cassandra, PNUTs 2. Document-Based – MongoDB, TerraCotta 3. Graph-Based – FlockDB, Voldemort Takeaway: These were originally designed as solutions to specific problems because no commercial solution would work. Confidential and Proprietary
  • 9. Intro: Globalizing Big Data Proposed Architecture What We Learned Q&A Global In-memory MySQL Confidential and Proprietary 9
  • 10. OUR APPROACH: THE YESSQL SOLUTION Use MySQL/NDB Use AWS Key Tradeoffs: Cost vs. Performance HA vs. CAP Complexity vs. RDBMS features Confidential and Proprietary
  • 11. WHY NDB? Pro Con • True HA by design • Some semantic – Fast recovery limitations on fields • Supports (some) X- • Size constraints (2 actions TB?) • Relational Model – Hardware limits • In-memory also architecture = high • Higher cost/byte performance • Requires reasonable • Disk storage for data partitioning non-indexed data • Higher complexity (since 5.1) • APIs, APIs, APIs Confidential and Proprietary
  • 12. How NDB Works in One Slide Graphics courtesy dev.mysql.com Confidential and Proprietary
  • 13. CIRCULAR REPLICATION/FAILOVER Graphics courtesy O’Reilly OnLamp.com Confidential and Proprietary
  • 14. SYSTEM AVAILABILITY DEFINED • Availability of the entire system: n m Asys = 1 – P(1-Pri)j i=1 j=1 V I P • Number of Parallel Components Needed to Parallel Achieve Availability Amin: Serial Nmin = [ln(1-Amin)/ln(1-r)] Confidential and Proprietary
  • 15. Big Data Myth #2: The CAP Theorem Doesn’t Say What You Think It Does • Consistency, Availability, (Network) Partition – Pick any two? Not really. • The Real Story: These are not Independent Variables • AP =CP (Um, what? But…A != C ) • Variations: – PACELC (adds latency tolerance) – VPAC (adds variability) Takeaway: the real story here is about the tradeoffs made by designers of different systems, and the main tradeoff is between consistency and availability, usually in favor of the latter. Confidential and Proprietary
  • 16. What about “High Performance”? • Maximum lightspeed distance on Earth’s Surface: ~67 ms • Target: data available worldwide in < 1000 ms Sound Easy? Think Again! Confidential and Proprietary
  • 17. AWS Meets NDB • Why AWS? – Cheap and easy infrastructure-in-a-box (Or so we thought! Ha!) • Services Used: – EC2 (Centos 5.3, small instances for mgm & query nodes, XL for data – Elastic IPs/ELB – EBS Volumes – S3 – Cloudwatch Confidential and Proprietary
  • 18. ARCHITECTURAL TILES AWS Availability Zones Tiling Rules A B • Never separate NDB & SQL • Ndb:2-SQL:1-MGM:1 • Scale by adding more tiles • Failover 1st to nearest AZ • Then to nearest DC • At least 1 replica/AZ C ELB • Don‟t share nodes • Mgmt nodes are redundant Limitations Unused (not present in all locations) • AWS is network-bound @ 250 MBPS – ouch! • Need specific ACL across AZ boundaries • AZs not uniform! • No GSLB NDB MGM SQL • Dynamic IPs • ELB sticky sessions !reliable Confidential and Proprietary
  • 19. Architecture Stack Scale by Tiling A B A B A B A B A B A B A B A B 7 AWS Data Centers: US-E, US-W, US-W2, TK, EU, AS, SA-B Confidential and Proprietary
  • 20. Other Technologies Considered • Paxos – Elegant-but-complex consensus-based messaging protocol – Used in Google Megastore, Bing metadata • Java Query Caching – Queries as serialized objects – Not yet working • Multiple Ring Architectures – Even more complicated = no way Confidential and Proprietary
  • 21. Intro: Globalizing Big Data Proposed Architecture What We Learned Q&A Global In-memory MySQL Confidential and Proprietary 21
  • 22. SYSTEM READ/WRITE PERFORMANCE (!) What we tested: • 32 & 256 byte char fields In-region replication tests • Reads, writes, query speed vs. volume • Data replication speeds Results: • Global replication < 350 ms • 256 byte writeread < 1000ms worldwide 06/19/2011 06/20/2011 06/21/2011 06/22/2011 06/23/2011 Confidential and Proprietary
  • 23. The Commit Ordering Problem • Why does commit ordering matter? • Write operators are non-commutative [W(d,t1),W(d,t2)] != 0 unless t1=t2 – Can lead to inconsistency – Can lead to timestamp corruption – Forcing sequential writes defeats Amdahl‟s rule • Can show up in GSLB scenarios Confidential and Proprietary
  • 24. Dark Side of AWS • Deploying NDB at scale on AWS is hard – Dynamic IPs (use hostfile) – DNS issues – Security groups (ec2-authorize) – Inconsistent EC2 deployments • Are availability zones independent network segments? – No GSLB (!) (rent or buy) Be prepared to struggle a bit! Confidential and Proprietary
  • 25. Future Directions for YesSQL • Alternate solution using Pacemaker, Heartbeat – From Yves Trudeau @ Percona – Uses InnoDB, not NDB • Implement Memcached plugin – To test NoSQL functionality, APIs • Add simple connection-based persistence to preserve connections during failover • Can we increase the number of data nodes? • As the number of datacenters goes up, sync times go down. What‟s the relation? • So far only limited testing of hard configurations – Production systems will likely need more RAM for MGMs Confidential and Proprietary
  • 26. Hard Lessons, Shared • Be Careful… – With “Eventual Consistency”-related concepts – ACID, CAP are not really as well-defined as we‟d like considering how often we invoke them • NDB is a good solution – Real HA, real performance, real SQL – Notable limitations around fields, datatypes – Successfully competes with NoSQL systems for most use cases – better in many cases • AWS makes the hard things possible – But not so much the easy things easier Takeaway: You can achieve high performance and availability without giving up relational models and read consistency! Confidential and Proprietary
  • 27. “In the long run, we are all dead eventually consistent.” Maynard Keynes on NoSQL Databases Twitter: @daniel_b_austin Email: daaustin@paypal.com With apologies and thanks to the real DB experts, Andrew Goodman, Yves Trudeau, Clement Frazer, Daniel Abadi, and everyone else!

Editor's Notes

  1. Service Reliability. Must be buzzword compliant, as in RFC 2119 Tradeoffs discussed previously.
  2. This is really the problem we want to solve. It’s one of the fundamental problems in computer science and doesn’t have a completely satisfactory solution.
  3. But is this all really necessary? Or just fashion?
  4. This is big myth #1. they are not at all necessarily even related, one could have either or both
  5. Mike’s talk last year, only
  6. For YesSQL, we needed a more complex set of relations, and transactional support.
  7. The rest of this section of our talk will be working through the consequences of these tradeoffs.
  8. NDB has a much higher maturity level than most NoSQL systems
  9. The CAP Theorem is a limited version of the Systemic Qualities model.
  10. Performance is response time.
  11. “Conservation of timestamps”
  12. Also need to consider updates from versions after 5.1