A Global In-memory Data System for MySQL

Daniel Austin, PayPal Technical Staff


Percona Live! MySQL Conference
Santa Clara, April 12th, 2012 v1.3
Intro: Globalizing NDB

          Proposed Architecture

                What We Learned

                         Q&A



Global In-memory MySQL         Confidential and Proprietary   2
Mission YesSQL

 “Develop a globally distributed DB For
 user-related data.”
 • Must Not Fail (99.999%)
 • Must Not Lose Data. Period.
 • Must Support Transactions
 • Must Support (some) SQL
 • Must WriteRead 32-bit integer globally in
   1000ms
 • MinMax Data Volume: 10-100 TB (working)
 • Must Scale Linearly with Costs

                               Confidential and Proprietary
THE FUNDAMENTAL PROBLEM IN
DISTRIBUTED DATA SYSTEMS

“How Do We Manage Reliable
Distribution of Data Across Geographical
Distances?”




                              Confidential and Proprietary
One Approach: The NoSQL Solution


 • NoSQL Systems provide a solution that relaxes
   many of the common constraints of typical
   RDBMS systems
   – Slow - RDBMS has not scaled with CPUs
   – Often require complex data management
     (SOX, SOR)
   – Costly to build and maintain, slow to change and
     adapt
   – Intolerant of CAP models (more on this later)
 • Non-relational models, usually key-value
 • May be batched or streaming
 • Not necessarily distributed geographically

                                        Confidential and Proprietary
Big Data Myth #1: Big Data = NoSQL


 • „Big Data‟ Refers to a Common Set of Problems
    – Large Volumes
    – High Rates of Change
       • Of Data
       • Of Data Models
       • Of Data Presentation and Output
    – Often Require „Fast Data‟ as well as „Big‟
       • Near-real Time Analytics
       • Mapping Complex Structures


 Takeaway: Big Data is the problem, NoSQL is one
 (proposed) solution
                                           Confidential and Proprietary
NoSQL Hype Cycle: Where Are We Now?

   There are currently more than 120+ NoSQL
   databases listed at nosql-database.org!



                              You Are Here ?




As the pace of new technology solutions has slowed, some clear winners have emerged.

                                                          Confidential and Proprietary
3 Essential Kinds of NoSQL Systems


 1. Columnar K-V Systems
    – Hadoop, Hbase, Cassandra, PNUTs
 2. Document-Based
    – MongoDB, TerraCotta
 3. Graph-Based
    – FlockDB, Voldemort

 Takeaway: These were originally designed as
 solutions to specific problems because no
 commercial solution would work.

                                     Confidential and Proprietary
Intro: Globalizing Big Data

       Proposed Architecture

                What We Learned

                         Q&A



Global In-memory MySQL          Confidential and Proprietary   9
OUR APPROACH: THE YESSQL SOLUTION

Use MySQL/NDB

Use AWS



Key Tradeoffs:
  Cost vs. Performance
  HA vs. CAP
  Complexity vs. RDBMS features




                                  Confidential and Proprietary
WHY NDB?

            Pro                       Con
•   True HA by design     •   Some semantic
     – Fast recovery          limitations on fields
•   Supports (some) X-    •   Size constraints (2
    actions                   TB?)
•   Relational Model           – Hardware limits
•   In-memory                    also
    architecture = high   •   Higher cost/byte
    performance           •   Requires reasonable
•   Disk storage for          data partitioning
    non-indexed data      •   Higher complexity
     (since 5.1)
•   APIs, APIs, APIs                 Confidential and Proprietary
How NDB Works in One Slide




Graphics courtesy dev.mysql.com

                                  Confidential and Proprietary
CIRCULAR REPLICATION/FAILOVER




Graphics courtesy O’Reilly OnLamp.com


                                        Confidential and Proprietary
SYSTEM AVAILABILITY DEFINED

• Availability of the entire
  system:
            n   m
Asys = 1 – P(1-Pri)j
           i=1 j=1             V
                               I
                               P

• Number of Parallel
  Components Needed to




                                   Parallel
  Achieve Availability Amin:
                                                      Serial

Nmin =
[ln(1-Amin)/ln(1-r)]

                                              Confidential and Proprietary
Big Data Myth #2: The CAP Theorem Doesn’t
Say What You Think It Does


 • Consistency, Availability, (Network) Partition
    – Pick any two? Not really.
 • The Real Story: These are not Independent
   Variables
 • AP =CP (Um, what? But…A != C )
 • Variations:
    – PACELC (adds latency tolerance)
    – VPAC (adds variability)
 Takeaway: the real story here is about the tradeoffs
 made by designers of different systems, and the
 main tradeoff is between consistency and
 availability, usually in favor of the latter.
                                        Confidential and Proprietary
What about “High Performance”?

  • Maximum lightspeed distance on Earth’s
    Surface: ~67 ms
  • Target: data available worldwide in < 1000 ms

                  Sound Easy?

                   Think Again!




                                     Confidential and Proprietary
AWS Meets NDB

• Why AWS?
  – Cheap and easy infrastructure-in-a-box
     (Or so we thought! Ha!)
• Services Used:
  – EC2 (Centos 5.3, small instances for mgm
    & query nodes, XL for data
  – Elastic IPs/ELB
  – EBS Volumes
  – S3
  – Cloudwatch

                                 Confidential and Proprietary
ARCHITECTURAL TILES
                                  AWS Availability Zones
Tiling Rules
                                  A                         B
• Never separate NDB & SQL
• Ndb:2-SQL:1-MGM:1
• Scale by adding more tiles
• Failover 1st to nearest AZ
• Then to nearest DC
• At least 1 replica/AZ
                                  C                ELB
• Don‟t share nodes
• Mgmt nodes are redundant
Limitations                           Unused (not present in all locations)

• AWS is network-bound @
   250 MBPS – ouch!
• Need specific ACL across AZ
   boundaries
• AZs not uniform!
• No GSLB                              NDB              MGM                  SQL
• Dynamic IPs
• ELB sticky sessions !reliable

                                                  Confidential and Proprietary
Architecture Stack

Scale by Tiling



          A       B   A   B         A   B           A      B
                       A   B
                        A   B




                           A    B           A      B




                                                7 AWS Data Centers:
                                                US-E, US-W, US-W2, TK, EU, AS, SA-B

                                                         Confidential and Proprietary
Other Technologies Considered

 • Paxos
   – Elegant-but-complex consensus-based
     messaging protocol
   – Used in Google Megastore, Bing metadata
 • Java Query Caching
   – Queries as serialized objects
   – Not yet working
 • Multiple Ring Architectures
   – Even more complicated = no way


                                     Confidential and Proprietary
Intro: Globalizing Big Data

          Proposed Architecture

             What We Learned

                         Q&A



Global In-memory MySQL          Confidential and Proprietary   21
SYSTEM READ/WRITE PERFORMANCE (!)

 What we tested:
 • 32 & 256 byte char fields
                                               In-region replication tests
 • Reads, writes, query speed vs. volume
 • Data replication speeds
 Results:
 • Global replication < 350 ms
 • 256 byte writeread < 1000ms worldwide




  06/19/2011    06/20/2011    06/21/2011   06/22/2011                     06/23/2011

                                           Confidential and Proprietary
The Commit Ordering Problem

 • Why does commit ordering matter?
 • Write operators are non-commutative

    [W(d,t1),W(d,t2)] != 0 unless t1=t2

   – Can lead to inconsistency
   – Can lead to timestamp corruption
   – Forcing sequential writes defeats Amdahl‟s
     rule
 • Can show up in GSLB scenarios

                                   Confidential and Proprietary
Dark Side of AWS

 • Deploying NDB at scale on AWS is hard
   – Dynamic IPs (use hostfile)
   – DNS issues
   – Security groups (ec2-authorize)
   – Inconsistent EC2 deployments
      • Are availability zones independent
        network segments?
   – No GSLB (!) (rent or buy)
   Be prepared to struggle a bit!


                                   Confidential and Proprietary
Future Directions for YesSQL

 • Alternate solution using Pacemaker, Heartbeat
    – From Yves Trudeau @ Percona
    – Uses InnoDB, not NDB
 • Implement Memcached plugin
    – To test NoSQL functionality, APIs
 • Add simple connection-based persistence to preserve
   connections during failover
 • Can we increase the number of data nodes?
 • As the number of datacenters goes up, sync times go
   down. What‟s the relation?
 • So far only limited testing of hard configurations
    – Production systems will likely need more RAM for MGMs


                                           Confidential and Proprietary
Hard Lessons, Shared
 • Be Careful…
    – With “Eventual Consistency”-related concepts
    – ACID, CAP are not really as well-defined as we‟d like
      considering how often we invoke them
 • NDB is a good solution
    – Real HA, real performance, real SQL
    – Notable limitations around fields, datatypes
    – Successfully competes with NoSQL systems for most use
      cases – better in many cases
 • AWS makes the hard things possible
    – But not so much the easy things easier

    Takeaway: You can achieve high performance and
    availability without giving up relational models and
    read consistency!
                                               Confidential and Proprietary
“In the long run, we are all dead
eventually consistent.”
Maynard Keynes on NoSQL Databases


Twitter: @daniel_b_austin
Email: daaustin@paypal.com




With apologies and thanks to the real DB experts, Andrew Goodman, Yves
Trudeau, Clement Frazer, Daniel Abadi, and everyone else!

Yes sql08 inmemorydb

  • 1.
    A Global In-memoryData System for MySQL Daniel Austin, PayPal Technical Staff Percona Live! MySQL Conference Santa Clara, April 12th, 2012 v1.3
  • 2.
    Intro: Globalizing NDB Proposed Architecture What We Learned Q&A Global In-memory MySQL Confidential and Proprietary 2
  • 3.
    Mission YesSQL “Developa globally distributed DB For user-related data.” • Must Not Fail (99.999%) • Must Not Lose Data. Period. • Must Support Transactions • Must Support (some) SQL • Must WriteRead 32-bit integer globally in 1000ms • MinMax Data Volume: 10-100 TB (working) • Must Scale Linearly with Costs Confidential and Proprietary
  • 4.
    THE FUNDAMENTAL PROBLEMIN DISTRIBUTED DATA SYSTEMS “How Do We Manage Reliable Distribution of Data Across Geographical Distances?” Confidential and Proprietary
  • 5.
    One Approach: TheNoSQL Solution • NoSQL Systems provide a solution that relaxes many of the common constraints of typical RDBMS systems – Slow - RDBMS has not scaled with CPUs – Often require complex data management (SOX, SOR) – Costly to build and maintain, slow to change and adapt – Intolerant of CAP models (more on this later) • Non-relational models, usually key-value • May be batched or streaming • Not necessarily distributed geographically Confidential and Proprietary
  • 6.
    Big Data Myth#1: Big Data = NoSQL • „Big Data‟ Refers to a Common Set of Problems – Large Volumes – High Rates of Change • Of Data • Of Data Models • Of Data Presentation and Output – Often Require „Fast Data‟ as well as „Big‟ • Near-real Time Analytics • Mapping Complex Structures Takeaway: Big Data is the problem, NoSQL is one (proposed) solution Confidential and Proprietary
  • 7.
    NoSQL Hype Cycle:Where Are We Now? There are currently more than 120+ NoSQL databases listed at nosql-database.org! You Are Here ? As the pace of new technology solutions has slowed, some clear winners have emerged. Confidential and Proprietary
  • 8.
    3 Essential Kindsof NoSQL Systems 1. Columnar K-V Systems – Hadoop, Hbase, Cassandra, PNUTs 2. Document-Based – MongoDB, TerraCotta 3. Graph-Based – FlockDB, Voldemort Takeaway: These were originally designed as solutions to specific problems because no commercial solution would work. Confidential and Proprietary
  • 9.
    Intro: Globalizing BigData Proposed Architecture What We Learned Q&A Global In-memory MySQL Confidential and Proprietary 9
  • 10.
    OUR APPROACH: THEYESSQL SOLUTION Use MySQL/NDB Use AWS Key Tradeoffs: Cost vs. Performance HA vs. CAP Complexity vs. RDBMS features Confidential and Proprietary
  • 11.
    WHY NDB? Pro Con • True HA by design • Some semantic – Fast recovery limitations on fields • Supports (some) X- • Size constraints (2 actions TB?) • Relational Model – Hardware limits • In-memory also architecture = high • Higher cost/byte performance • Requires reasonable • Disk storage for data partitioning non-indexed data • Higher complexity (since 5.1) • APIs, APIs, APIs Confidential and Proprietary
  • 12.
    How NDB Worksin One Slide Graphics courtesy dev.mysql.com Confidential and Proprietary
  • 13.
    CIRCULAR REPLICATION/FAILOVER Graphics courtesyO’Reilly OnLamp.com Confidential and Proprietary
  • 14.
    SYSTEM AVAILABILITY DEFINED •Availability of the entire system: n m Asys = 1 – P(1-Pri)j i=1 j=1 V I P • Number of Parallel Components Needed to Parallel Achieve Availability Amin: Serial Nmin = [ln(1-Amin)/ln(1-r)] Confidential and Proprietary
  • 15.
    Big Data Myth#2: The CAP Theorem Doesn’t Say What You Think It Does • Consistency, Availability, (Network) Partition – Pick any two? Not really. • The Real Story: These are not Independent Variables • AP =CP (Um, what? But…A != C ) • Variations: – PACELC (adds latency tolerance) – VPAC (adds variability) Takeaway: the real story here is about the tradeoffs made by designers of different systems, and the main tradeoff is between consistency and availability, usually in favor of the latter. Confidential and Proprietary
  • 16.
    What about “HighPerformance”? • Maximum lightspeed distance on Earth’s Surface: ~67 ms • Target: data available worldwide in < 1000 ms Sound Easy? Think Again! Confidential and Proprietary
  • 17.
    AWS Meets NDB •Why AWS? – Cheap and easy infrastructure-in-a-box (Or so we thought! Ha!) • Services Used: – EC2 (Centos 5.3, small instances for mgm & query nodes, XL for data – Elastic IPs/ELB – EBS Volumes – S3 – Cloudwatch Confidential and Proprietary
  • 18.
    ARCHITECTURAL TILES AWS Availability Zones Tiling Rules A B • Never separate NDB & SQL • Ndb:2-SQL:1-MGM:1 • Scale by adding more tiles • Failover 1st to nearest AZ • Then to nearest DC • At least 1 replica/AZ C ELB • Don‟t share nodes • Mgmt nodes are redundant Limitations Unused (not present in all locations) • AWS is network-bound @ 250 MBPS – ouch! • Need specific ACL across AZ boundaries • AZs not uniform! • No GSLB NDB MGM SQL • Dynamic IPs • ELB sticky sessions !reliable Confidential and Proprietary
  • 19.
    Architecture Stack Scale byTiling A B A B A B A B A B A B A B A B 7 AWS Data Centers: US-E, US-W, US-W2, TK, EU, AS, SA-B Confidential and Proprietary
  • 20.
    Other Technologies Considered • Paxos – Elegant-but-complex consensus-based messaging protocol – Used in Google Megastore, Bing metadata • Java Query Caching – Queries as serialized objects – Not yet working • Multiple Ring Architectures – Even more complicated = no way Confidential and Proprietary
  • 21.
    Intro: Globalizing BigData Proposed Architecture What We Learned Q&A Global In-memory MySQL Confidential and Proprietary 21
  • 22.
    SYSTEM READ/WRITE PERFORMANCE(!) What we tested: • 32 & 256 byte char fields In-region replication tests • Reads, writes, query speed vs. volume • Data replication speeds Results: • Global replication < 350 ms • 256 byte writeread < 1000ms worldwide 06/19/2011 06/20/2011 06/21/2011 06/22/2011 06/23/2011 Confidential and Proprietary
  • 23.
    The Commit OrderingProblem • Why does commit ordering matter? • Write operators are non-commutative [W(d,t1),W(d,t2)] != 0 unless t1=t2 – Can lead to inconsistency – Can lead to timestamp corruption – Forcing sequential writes defeats Amdahl‟s rule • Can show up in GSLB scenarios Confidential and Proprietary
  • 24.
    Dark Side ofAWS • Deploying NDB at scale on AWS is hard – Dynamic IPs (use hostfile) – DNS issues – Security groups (ec2-authorize) – Inconsistent EC2 deployments • Are availability zones independent network segments? – No GSLB (!) (rent or buy) Be prepared to struggle a bit! Confidential and Proprietary
  • 25.
    Future Directions forYesSQL • Alternate solution using Pacemaker, Heartbeat – From Yves Trudeau @ Percona – Uses InnoDB, not NDB • Implement Memcached plugin – To test NoSQL functionality, APIs • Add simple connection-based persistence to preserve connections during failover • Can we increase the number of data nodes? • As the number of datacenters goes up, sync times go down. What‟s the relation? • So far only limited testing of hard configurations – Production systems will likely need more RAM for MGMs Confidential and Proprietary
  • 26.
    Hard Lessons, Shared • Be Careful… – With “Eventual Consistency”-related concepts – ACID, CAP are not really as well-defined as we‟d like considering how often we invoke them • NDB is a good solution – Real HA, real performance, real SQL – Notable limitations around fields, datatypes – Successfully competes with NoSQL systems for most use cases – better in many cases • AWS makes the hard things possible – But not so much the easy things easier Takeaway: You can achieve high performance and availability without giving up relational models and read consistency! Confidential and Proprietary
  • 27.
    “In the longrun, we are all dead eventually consistent.” Maynard Keynes on NoSQL Databases Twitter: @daniel_b_austin Email: daaustin@paypal.com With apologies and thanks to the real DB experts, Andrew Goodman, Yves Trudeau, Clement Frazer, Daniel Abadi, and everyone else!

Editor's Notes

  • #4 Service Reliability. Must be buzzword compliant, as in RFC 2119 Tradeoffs discussed previously.
  • #5 This is really the problem we want to solve. It’s one of the fundamental problems in computer science and doesn’t have a completely satisfactory solution.
  • #6 But is this all really necessary? Or just fashion?
  • #7 This is big myth #1. they are not at all necessarily even related, one could have either or both
  • #8 Mike’s talk last year, only
  • #9 For YesSQL, we needed a more complex set of relations, and transactional support.
  • #11 The rest of this section of our talk will be working through the consequences of these tradeoffs.
  • #12 NDB has a much higher maturity level than most NoSQL systems
  • #16 The CAP Theorem is a limited version of the Systemic Qualities model.
  • #17 Performance is response time.
  • #24 “Conservation of timestamps”
  • #26 Also need to consider updates from versions after 5.1