SlideShare a Scribd company logo
Big Data is a Big Scam (Most of the Time)

Daniel Austin, PayPal Technical Staff


MySQL Connect Conference
September 30, 2012         v1.2
Today’s Agenda



         Big Myths about Big Data


           YESQL: A Counterexample


                              Q&A



     Global In-memory MySQL         Confidential and Proprietary   2
THE FUNDAMENTAL PROBLEM IN
DISTRIBUTED DATA SYSTEMS

“How Do We Manage Reliable
Distribution of Data Across Geographical
Distances?”




                               Confidential and Proprietary
The NoSQL Solution


 •  NoSQL Systems provide a solution that relaxes
    many of the common constraints of typical
    RDBMS systems
    –  Slow - RDBMS has not scaled with CPUs
    –  Often require complex data management (SOX,
       SOR)
    –  Costly to build and maintain, slow to change and
       adapt
    –  Intolerant of CAP models (more on this later)
 •  Non-relational models, usually key-value
 •  May be batched or streaming
 •  Not necessarily distributed geographically

                                          Confidential and Proprietary
Big Data Myth #1: Big Data = NoSQL


 •  ‘Big Data’ Refers to a Common Set of Problems
    –  Large Volumes
    –  High Rates of Change
        •  Of Data
        •  Of Data Models
        •  Of Data Presentation and Output
    –  Often Require ‘Fast Data’ as well as ‘Big’
        •  Near-real Time Analytics
        •  Mapping Complex Structures

 Takeaway: Big Data is the problem, NoSQL is one
 (proposed) solution
                                            Confidential and Proprietary
3 Kinds of Big Data Systems


 1.  Columnar K-V Systems
    – Hadoop, Hbase, Cassandra, PNUTs
 2.  Document-Based
    – MongoDB, TerraCotta
 3.  Graph-Based
    – FlockDB, Voldemort

 Takeaway: These were originally designed as
 solutions to specific problems because no
 commercial solution would work.

                                 Confidential and Proprietary
Big Data Hype Cycle: Where Are We Now?

   There are currently more than 120+ NoSQL
   databases listed at nosql-databases.com!



                              You Are Here ?




As the pace of new technology solutions has slowed, some clear winners have emerged.

                                                          Confidential and Proprietary
Big Data Myth #2: The CAP Theorem Doesn’t
Say What You Think It Does


 •  Consistency, Availability, (Network) Partition
 •  The Real Story: These are not Independent
    Variables
 •  AP =CP (Um, what? But…A != C )
 •  Variations:
    –  PACELC (adds latency tolerance)


 Takeaway: the real story here is about the tradeoffs
 made by designers of different systems, and the
 main tradeoff is between consistency and
 availability, usually in favor of the latter.

                                         Confidential and Proprietary
Big Data Myth: You Need A Big Data System


 Well, Maybe….But Before You Go There…
 There are essentially two ‘Big Data Problems’:
 “I have too much data and it’s coming in too fast to
 handle with any RDBMS.”

 “I have a lot of data distributed geographically and
 need to be able to read and write from anywhere in
 near real-time.”

 Takeaway: if you have one of these Big Data
 problems, a NoSQL solution might work for you.
 But there are also other alternatives…
                                       Confidential and Proprietary
BIG DATA MYTH #3: BIG DATA AND NOSQL
ARE NEW IDEAS

•  The first and most successful
   such system is DNS, created in
   1983.
•  Began with flat files
•  Currently serves the entire
   Internet (!)
•  DNS is an AP system,
   availability is #1
•  Many extensions complicate a
   simple design
•  Suggests a new term for CAP-
   like ideas: variability
     •  DNS variability is very high,
        often 2-3x the mean


                                        Confidential and Proprietary
Today’s Agenda



            Big Myths About Big Data


       YESQL: A Counterexample


                              Q&A



     Global In-memory MySQL         Confidential and Proprietary   11
Mission YESQL

 “Develop a globally distributed DB For
 user-related data.”
 •  Must Not Fail (99.999%)
 •  Must Not Lose Data. Period.
 •  Must Support Transactions
 •  Must Support (some) SQL
 •  Must WriteRead 32-bit integer globally in
    1000ms
 •  Maximum Data Volume: 100 TB
 •  Must Scale Linearly with Costs

                                  Confidential and Proprietary
What about “High Performance”?

  • Maximum lightspeed distance on Earth’s
    Surface: ~67 ms
  • Target: data available worldwide in < 1000 ms

                  Sound Easy?

                   Think Again!




                                     Confidential and Proprietary
WHY MYSQL CLUSTER?

            Pro                        Con
•  True HA by design     •    Some semantic
    –  Fast recovery          limitations on fields
•  Supports (some) X-    •    Size constraints (2
   actions                    TB?)
•  Relational Model            –  Hardware limits
•  In-memory                      also
   architecture = high   •    Higher cost/byte
   performance           •    Requires reasonable
•  Disk storage for           data partitioning
   non-indexed data      •    Higher complexity
•  APIs, APIs, APIs
                                     Confidential and Proprietary
How MySQL Cluster Works in 1 Slide




Graphics courtesy dev.mysql.com

                                  Confidential and Proprietary
CIRCULAR REPLICATION/FAILOVER




Graphics courtesy O’Reilly OnLamp.com


                                        Confidential and Proprietary
AVAILABILITY DEFINED

•  Availability of the entire
   system:
             n   m
Asys = 1 – Π(1-Πri)j            V
            i=1 j=1             I
                                P

•  Number of Parallel
   Components Needed to
   Achieve Availability Amin:



                                    Parallel
                                                       Serial
Nmin = [ln(1-Amin)/ln(1-r)]



                                               Confidential and Proprietary
AWS Meets MySQL Cluster

 •  Why AWS?
   – Cheap and easy infrastructure-in-a-box
      (Or so I thought! Ha!)
 •  Services Used:
   – EC2 (Centos 5.3, small instances for mgm
     & query nodes, XL for data
   – Elastic IPs/ELB
   – EBS Volumes
   – S3
   – Cloudwatch

                                  Confidential and Proprietary
ARCHITECTURAL TILES
                                   AWS Availability Zones
Tiling Rules
•  Never separate NDB & SQL
                                   A                        B
•  Ndb:2-SQL:1-MGM:1
•  Scale by adding more tiles
•  Failover 1st to nearest AZ
•  Then to nearest DC
•  At least 1 replica/AZ
                                   C                ELB
•  Don’t share nodes
•  Mgmt nodes are redundant
Limitations                            Unused (not present in all locations)
•  AWS is network-bound @
   250 MBPS – ouch!
•  Need specific ACL across AZ           Data                Mgmt                 SQL
   boundaries                            Node                Node                 Node
•  AZs not uniform!
•  No GSLB
•  Dynamic IPs
•  ELB sticky sessions !reliable

                                                   Confidential and Proprietary
Architecture Stack

Scale by Tiling




                  A   B             A      B           A       B
                   A   B
                    A   B




                         A      B               A      B




                     5 AWS Data Centers: US-E, US-W, TK, EU, AS




                                                             Confidential and Proprietary
Other Technologies Considered

 •  Paxos
   – Elegant-but-complex consensus-based
     messaging protocol
   – Used in Google Megastore, Bing metadata
 •  Java Query Caching
   – Queries as serialized objects
   – Not yet working
 •  Multiple Ring Architectures
   – Even more complicated = no way


                                     Confidential and Proprietary
SYSTEM READ/WRITE PERFORMANCE (!)

 What we tested:
 •  32 & 256 byte char fields
                                                 In-region replication tests
 •  Reads, writes, query speed vs. volume
 •  Data replication speeds
 Results:
 •  Global replication < 350 ms
 •  256 byte read < 10ms worldwide




  06/19/2011     06/20/2011     06/21/2011   06/22/2011                     06/23/2011

                                             Confidential and Proprietary
Data Models and Query Optimization


  •  Network Latency is an obvious issue
  •  Data model requires all segments present
     in each geo-region
  •  Parameterized (Linked) Joins
    – Adaptive Query Localization (SIP)
      technique from Clustra (see Clement
      Frazer’s blog for details)




                                   Confidential and Proprietary
Conservation of Timestamps or The
Commit Ordering Problem
 •  Why does commit ordering matter?
 •  Write operators are non-commutative

     [W(d,t1),W(d,t2)] != 0 unless t1=t2

   – Can lead to inconsistency
   – Can lead to timestamp corruption
   – Forcing sequential writes defeats Amdahl’s
     rule
 •  Can show up in GSLB scenarios

                                   Confidential and Proprietary
Hard Lessons, Shared

 •  Be Careful…
     –  With “Eventual Consistency”-related concepts
     –  ACID, CAP are not really as well-defined as we’d like
        considering how often we invoke them
 •  MySQL Cluster is a good solution
     –  Real HA, real SQL
     –  Notable limitations around fields, datatypes
     –  Successfully competes with NoSQL systems for most use
        cases – better in many cases
 •  NoSQL Systems
     –  All have relatively low levels of maturity
     –  More suitable for simpler key-value models
     –  Victim of Tech Fashion


                                            Confidential and Proprietary
Future Directions

 •  Alternate solution using Pacemaker,
    Heartbeat
   – From Yves Trudeau @ Percona
   – Uses InnoDB, not NDB
 •  Implement Memcached plugin
   – To test NoSQL functionality, APIs
 •  Add simple connection-based persistence
    to preserve connections during failover
 •  Better data node distribution
 •  Better testing & monitoring

                                   Confidential and Proprietary
Summing Up On YESQL v0.85


•  It works! Far better than expected.
•  Very fast, very reliable
•  Reduced complexity since v0.7
•  AWS poses challenges that private data
   centers may not experience
•  You can achieve high performance and
   availability without giving up relational
   models and read consistency!


                                  Confidential and Proprietary
The Big Picture on Big Data

 •  Only use Big Data solutions when you have a real
    Big Data problem.
    –  Don’t be a Dedicated Follower of Tech Fashion!
 •  Not all Big Data solutions are created equal
    –  What tradeoffs are most important to you?
    –  Consistency, Fault Tolerance, Availability,
       Performance, Variability
 •  Is your data model a fit for NoSQL?
    –  You don’t have to give up the relational model in
       most cases, so don’t!
 •  You can achieve high performance and availability
    without giving up relational models and read
    consistency! Just say YESQL!

                                           Confidential and Proprietary
“In the long run, we are all dead
eventually consistent.”
Maynard Keynes on NoSQL Databases


Twitter: @daniel_b_austin
Email: daaustin@paypal.com




With apologies and thanks to the real DB experts, Andrew Goodman, Yves
Trudeau, Frazer Clement, Daniel Abadi, Kent Beck, and everyone else who
contributed. It really works!

More Related Content

What's hot

Conference slides: MySQL Cluster Performance Tuning
Conference slides: MySQL Cluster Performance TuningConference slides: MySQL Cluster Performance Tuning
Conference slides: MySQL Cluster Performance Tuning
Severalnines
 
Conference tutorial: MySQL Cluster as NoSQL
Conference tutorial: MySQL Cluster as NoSQLConference tutorial: MySQL Cluster as NoSQL
Conference tutorial: MySQL Cluster as NoSQL
Severalnines
 
MySQL Cluster (NDB) - Best Practices Percona Live 2017
MySQL Cluster (NDB) - Best Practices Percona Live 2017MySQL Cluster (NDB) - Best Practices Percona Live 2017
MySQL Cluster (NDB) - Best Practices Percona Live 2017
Severalnines
 
MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0
Ted Wennmark
 
Choosing a Next Gen Database: the New World Order of NoSQL, NewSQL, and MySQL
Choosing a Next Gen Database: the New World Order of NoSQL, NewSQL, and MySQLChoosing a Next Gen Database: the New World Order of NoSQL, NewSQL, and MySQL
Choosing a Next Gen Database: the New World Order of NoSQL, NewSQL, and MySQL
ScaleBase
 
MySQL Cluster
MySQL ClusterMySQL Cluster
MySQL Cluster
Abel Flórez
 
MOUG17 Keynote: Oracle OpenWorld Major Announcements
MOUG17 Keynote: Oracle OpenWorld Major AnnouncementsMOUG17 Keynote: Oracle OpenWorld Major Announcements
MOUG17 Keynote: Oracle OpenWorld Major Announcements
Monica Li
 
The Real Scoop on Migrating from Oracle Databases
The Real Scoop on Migrating from Oracle DatabasesThe Real Scoop on Migrating from Oracle Databases
The Real Scoop on Migrating from Oracle Databases
EDB
 
MariaDB: Connect Storage Engine
MariaDB: Connect Storage EngineMariaDB: Connect Storage Engine
MariaDB: Connect Storage Engine
Kangaroot
 
MySQL for Software-as-a-Service (SaaS)
MySQL for Software-as-a-Service (SaaS)MySQL for Software-as-a-Service (SaaS)
MySQL for Software-as-a-Service (SaaS)
Mario Beck
 
MySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDBMySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDB
Mario Beck
 
MySQL InnoDB Cluster and NDB Cluster
MySQL InnoDB Cluster and NDB ClusterMySQL InnoDB Cluster and NDB Cluster
MySQL InnoDB Cluster and NDB Cluster
Mario Beck
 
Python Utilities for Managing MySQL Databases
Python Utilities for Managing MySQL DatabasesPython Utilities for Managing MySQL Databases
Python Utilities for Managing MySQL Databases
Mats Kindahl
 
20141011 my sql clusterv01pptx
20141011 my sql clusterv01pptx20141011 my sql clusterv01pptx
20141011 my sql clusterv01pptx
Ivan Ma
 
Severalnines Self-Training: MySQL® Cluster - Part VI
Severalnines Self-Training: MySQL® Cluster - Part VISeveralnines Self-Training: MySQL® Cluster - Part VI
Severalnines Self-Training: MySQL® Cluster - Part VI
Severalnines
 
MariaDB Galera Cluster
MariaDB Galera ClusterMariaDB Galera Cluster
MariaDB Galera ClusterAbdul Manaf
 
What's new in MySQL 5.7, Oracle Virtual Technology Summit, 2016
What's new in MySQL 5.7, Oracle Virtual Technology Summit, 2016What's new in MySQL 5.7, Oracle Virtual Technology Summit, 2016
What's new in MySQL 5.7, Oracle Virtual Technology Summit, 2016Geir Høydalsvik
 
Elastic Scalability in MySQL Fabric Using OpenStack
Elastic Scalability in MySQL Fabric Using OpenStackElastic Scalability in MySQL Fabric Using OpenStack
Elastic Scalability in MySQL Fabric Using OpenStack
Mats Kindahl
 
Avoiding.the.pitfallsof.oracle.migration.2013
Avoiding.the.pitfallsof.oracle.migration.2013Avoiding.the.pitfallsof.oracle.migration.2013
Avoiding.the.pitfallsof.oracle.migration.2013
EDB
 
DB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource ManagerDB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource Manager
Maris Elsins
 

What's hot (20)

Conference slides: MySQL Cluster Performance Tuning
Conference slides: MySQL Cluster Performance TuningConference slides: MySQL Cluster Performance Tuning
Conference slides: MySQL Cluster Performance Tuning
 
Conference tutorial: MySQL Cluster as NoSQL
Conference tutorial: MySQL Cluster as NoSQLConference tutorial: MySQL Cluster as NoSQL
Conference tutorial: MySQL Cluster as NoSQL
 
MySQL Cluster (NDB) - Best Practices Percona Live 2017
MySQL Cluster (NDB) - Best Practices Percona Live 2017MySQL Cluster (NDB) - Best Practices Percona Live 2017
MySQL Cluster (NDB) - Best Practices Percona Live 2017
 
MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0
 
Choosing a Next Gen Database: the New World Order of NoSQL, NewSQL, and MySQL
Choosing a Next Gen Database: the New World Order of NoSQL, NewSQL, and MySQLChoosing a Next Gen Database: the New World Order of NoSQL, NewSQL, and MySQL
Choosing a Next Gen Database: the New World Order of NoSQL, NewSQL, and MySQL
 
MySQL Cluster
MySQL ClusterMySQL Cluster
MySQL Cluster
 
MOUG17 Keynote: Oracle OpenWorld Major Announcements
MOUG17 Keynote: Oracle OpenWorld Major AnnouncementsMOUG17 Keynote: Oracle OpenWorld Major Announcements
MOUG17 Keynote: Oracle OpenWorld Major Announcements
 
The Real Scoop on Migrating from Oracle Databases
The Real Scoop on Migrating from Oracle DatabasesThe Real Scoop on Migrating from Oracle Databases
The Real Scoop on Migrating from Oracle Databases
 
MariaDB: Connect Storage Engine
MariaDB: Connect Storage EngineMariaDB: Connect Storage Engine
MariaDB: Connect Storage Engine
 
MySQL for Software-as-a-Service (SaaS)
MySQL for Software-as-a-Service (SaaS)MySQL for Software-as-a-Service (SaaS)
MySQL for Software-as-a-Service (SaaS)
 
MySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDBMySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDB
 
MySQL InnoDB Cluster and NDB Cluster
MySQL InnoDB Cluster and NDB ClusterMySQL InnoDB Cluster and NDB Cluster
MySQL InnoDB Cluster and NDB Cluster
 
Python Utilities for Managing MySQL Databases
Python Utilities for Managing MySQL DatabasesPython Utilities for Managing MySQL Databases
Python Utilities for Managing MySQL Databases
 
20141011 my sql clusterv01pptx
20141011 my sql clusterv01pptx20141011 my sql clusterv01pptx
20141011 my sql clusterv01pptx
 
Severalnines Self-Training: MySQL® Cluster - Part VI
Severalnines Self-Training: MySQL® Cluster - Part VISeveralnines Self-Training: MySQL® Cluster - Part VI
Severalnines Self-Training: MySQL® Cluster - Part VI
 
MariaDB Galera Cluster
MariaDB Galera ClusterMariaDB Galera Cluster
MariaDB Galera Cluster
 
What's new in MySQL 5.7, Oracle Virtual Technology Summit, 2016
What's new in MySQL 5.7, Oracle Virtual Technology Summit, 2016What's new in MySQL 5.7, Oracle Virtual Technology Summit, 2016
What's new in MySQL 5.7, Oracle Virtual Technology Summit, 2016
 
Elastic Scalability in MySQL Fabric Using OpenStack
Elastic Scalability in MySQL Fabric Using OpenStackElastic Scalability in MySQL Fabric Using OpenStack
Elastic Scalability in MySQL Fabric Using OpenStack
 
Avoiding.the.pitfallsof.oracle.migration.2013
Avoiding.the.pitfallsof.oracle.migration.2013Avoiding.the.pitfallsof.oracle.migration.2013
Avoiding.the.pitfallsof.oracle.migration.2013
 
DB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource ManagerDB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource Manager
 

Viewers also liked

MySQL カジュアル 福岡 03
MySQL カジュアル 福岡 03MySQL カジュアル 福岡 03
MySQL カジュアル 福岡 03
Aya Komuro
 
keyvi the key value index @ Cliqz
keyvi the key value index @ Cliqzkeyvi the key value index @ Cliqz
keyvi the key value index @ Cliqz
Hendrik Muhs
 
DB技術[実践]入門を読んだ
DB技術[実践]入門を読んだDB技術[実践]入門を読んだ
DB技術[実践]入門を読んだYuuki Tan-nai
 
Big Data: It's More Than Volume, Paypal
Big Data: It's More Than Volume, PaypalBig Data: It's More Than Volume, Paypal
Big Data: It's More Than Volume, Paypal
Innovation Enterprise
 
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum ShachamH2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
Sri Ambati
 
Scaling Twitter
Scaling TwitterScaling Twitter
Scaling Twitter
Blaine
 
MySQL Cluster no PayPal
MySQL Cluster no PayPalMySQL Cluster no PayPal
MySQL Cluster no PayPal
MySQL Brasil
 
初心者エンジニアのシステム構築失敗談
初心者エンジニアのシステム構築失敗談初心者エンジニアのシステム構築失敗談
初心者エンジニアのシステム構築失敗談Makoto Haruyama
 
DRBD9とdrbdmanageの概要紹介
DRBD9とdrbdmanageの概要紹介DRBD9とdrbdmanageの概要紹介
DRBD9とdrbdmanageの概要紹介
株式会社サードウェア
 
Driving Sales Execution with Compelling Messaging
Driving Sales Execution with Compelling MessagingDriving Sales Execution with Compelling Messaging
Driving Sales Execution with Compelling Messaging
Russell Scherwin
 
The little engine(s) that could: scaling online social networks
The little engine(s) that could: scaling online social  networksThe little engine(s) that could: scaling online social  networks
The little engine(s) that could: scaling online social networks
Josep M. Pujol
 
Fluentd meetup #2
Fluentd meetup #2Fluentd meetup #2
Fluentd meetup #2
Tomohiro Ikeda
 
fluent-plugin-resque_stat
fluent-plugin-resque_statfluent-plugin-resque_stat
fluent-plugin-resque_statMakoto Haruyama
 
Tracking The Trackers WWW 2016
Tracking The Trackers WWW 2016Tracking The Trackers WWW 2016
Tracking The Trackers WWW 2016
Josep M. Pujol
 
私がMySQLを始めた理由
私がMySQLを始めた理由私がMySQLを始めた理由
私がMySQLを始めた理由yoyamasaki
 
MySQL for Excelの紹介
MySQL for Excelの紹介MySQL for Excelの紹介
MySQL for Excelの紹介yoyamasaki
 
カジュアルにギャップキーロックとネクストキーロック
カジュアルにギャップキーロックとネクストキーロックカジュアルにギャップキーロックとネクストキーロック
カジュアルにギャップキーロックとネクストキーロック株式会社シャーロック
 

Viewers also liked (20)

MySQL カジュアル 福岡 03
MySQL カジュアル 福岡 03MySQL カジュアル 福岡 03
MySQL カジュアル 福岡 03
 
keyvi the key value index @ Cliqz
keyvi the key value index @ Cliqzkeyvi the key value index @ Cliqz
keyvi the key value index @ Cliqz
 
DB技術[実践]入門を読んだ
DB技術[実践]入門を読んだDB技術[実践]入門を読んだ
DB技術[実践]入門を読んだ
 
Big Data: It's More Than Volume, Paypal
Big Data: It's More Than Volume, PaypalBig Data: It's More Than Volume, Paypal
Big Data: It's More Than Volume, Paypal
 
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum ShachamH2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
 
Scaling Twitter
Scaling TwitterScaling Twitter
Scaling Twitter
 
Boo box e my sql
Boo box e my sqlBoo box e my sql
Boo box e my sql
 
MySQL Cluster no PayPal
MySQL Cluster no PayPalMySQL Cluster no PayPal
MySQL Cluster no PayPal
 
初心者エンジニアのシステム構築失敗談
初心者エンジニアのシステム構築失敗談初心者エンジニアのシステム構築失敗談
初心者エンジニアのシステム構築失敗談
 
DRBD9とdrbdmanageの概要紹介
DRBD9とdrbdmanageの概要紹介DRBD9とdrbdmanageの概要紹介
DRBD9とdrbdmanageの概要紹介
 
Driving Sales Execution with Compelling Messaging
Driving Sales Execution with Compelling MessagingDriving Sales Execution with Compelling Messaging
Driving Sales Execution with Compelling Messaging
 
The little engine(s) that could: scaling online social networks
The little engine(s) that could: scaling online social  networksThe little engine(s) that could: scaling online social  networks
The little engine(s) that could: scaling online social networks
 
Fluentd meetup #2
Fluentd meetup #2Fluentd meetup #2
Fluentd meetup #2
 
fluent-plugin-resque_stat
fluent-plugin-resque_statfluent-plugin-resque_stat
fluent-plugin-resque_stat
 
Tracking The Trackers WWW 2016
Tracking The Trackers WWW 2016Tracking The Trackers WWW 2016
Tracking The Trackers WWW 2016
 
私がMySQLを始めた理由
私がMySQLを始めた理由私がMySQLを始めた理由
私がMySQLを始めた理由
 
MySQL for Excelの紹介
MySQL for Excelの紹介MySQL for Excelの紹介
MySQL for Excelの紹介
 
Fluentd in Co-Work
Fluentd in Co-WorkFluentd in Co-Work
Fluentd in Co-Work
 
MySQL de NoSQL Fukuoka
MySQL de NoSQL FukuokaMySQL de NoSQL Fukuoka
MySQL de NoSQL Fukuoka
 
カジュアルにギャップキーロックとネクストキーロック
カジュアルにギャップキーロックとネクストキーロックカジュアルにギャップキーロックとネクストキーロック
カジュアルにギャップキーロックとネクストキーロック
 

Similar to PayPal Big Data and MySQL Cluster

Yes sql08 inmemorydb
Yes sql08 inmemorydbYes sql08 inmemorydb
Yes sql08 inmemorydb
Daniel Austin
 
A Global In-memory Data System for MySQL
A Global In-memory Data System for MySQLA Global In-memory Data System for MySQL
A Global In-memory Data System for MySQL
Daniel Austin
 
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Daniel Austin
 
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,..."Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
lisapaglia
 
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDBBig Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
BigDataCloud
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DBHeriyadi Janwar
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
LinkedIn
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...Qian Lin
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandra
Brian Enochson
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
Rahul Borate
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
Rahul Borate
 
Webinar: The Future of SQL
Webinar: The Future of SQLWebinar: The Future of SQL
Webinar: The Future of SQL
Crate.io
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systemselliando dias
 
Cosmos db
Cosmos dbCosmos db
Cosmos db
Akshat Thakar
 
NoSQL
NoSQLNoSQL
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
shnkr_rmchndrn
 
Apache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and BasicsApache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and Basics
Oleg Magazov
 
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
NETWAYS
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
lucenerevolution
 

Similar to PayPal Big Data and MySQL Cluster (20)

Yes sql08 inmemorydb
Yes sql08 inmemorydbYes sql08 inmemorydb
Yes sql08 inmemorydb
 
A Global In-memory Data System for MySQL
A Global In-memory Data System for MySQLA Global In-memory Data System for MySQL
A Global In-memory Data System for MySQL
 
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)
 
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,..."Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...
 
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDBBig Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandra
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
Webinar: The Future of SQL
Webinar: The Future of SQLWebinar: The Future of SQL
Webinar: The Future of SQL
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systems
 
Cosmos db
Cosmos dbCosmos db
Cosmos db
 
NoSQL
NoSQLNoSQL
NoSQL
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 
Apache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and BasicsApache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and Basics
 
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
 

More from Mat Keep

Blockchain & the IoT
Blockchain & the IoTBlockchain & the IoT
Blockchain & the IoT
Mat Keep
 
10-Step Methodology to Building a Single View with MongoDB
10-Step Methodology to Building a Single View with MongoDB10-Step Methodology to Building a Single View with MongoDB
10-Step Methodology to Building a Single View with MongoDB
Mat Keep
 
MongoDB at Baidu
MongoDB at BaiduMongoDB at Baidu
MongoDB at Baidu
Mat Keep
 
MongoDB_Spark
MongoDB_SparkMongoDB_Spark
MongoDB_SparkMat Keep
 
Business of iot_mongodb_spark
Business of iot_mongodb_sparkBusiness of iot_mongodb_spark
Business of iot_mongodb_spark
Mat Keep
 
Mongo db 2.6_security_architecture
Mongo db 2.6_security_architectureMongo db 2.6_security_architecture
Mongo db 2.6_security_architecture
Mat Keep
 
MySQL HA Solutions
MySQL HA SolutionsMySQL HA Solutions
MySQL HA Solutions
Mat Keep
 
MySQL Cluster NoSQL Memcached API
MySQL Cluster NoSQL Memcached APIMySQL Cluster NoSQL Memcached API
MySQL Cluster NoSQL Memcached API
Mat Keep
 
My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12
Mat Keep
 

More from Mat Keep (9)

Blockchain & the IoT
Blockchain & the IoTBlockchain & the IoT
Blockchain & the IoT
 
10-Step Methodology to Building a Single View with MongoDB
10-Step Methodology to Building a Single View with MongoDB10-Step Methodology to Building a Single View with MongoDB
10-Step Methodology to Building a Single View with MongoDB
 
MongoDB at Baidu
MongoDB at BaiduMongoDB at Baidu
MongoDB at Baidu
 
MongoDB_Spark
MongoDB_SparkMongoDB_Spark
MongoDB_Spark
 
Business of iot_mongodb_spark
Business of iot_mongodb_sparkBusiness of iot_mongodb_spark
Business of iot_mongodb_spark
 
Mongo db 2.6_security_architecture
Mongo db 2.6_security_architectureMongo db 2.6_security_architecture
Mongo db 2.6_security_architecture
 
MySQL HA Solutions
MySQL HA SolutionsMySQL HA Solutions
MySQL HA Solutions
 
MySQL Cluster NoSQL Memcached API
MySQL Cluster NoSQL Memcached APIMySQL Cluster NoSQL Memcached API
MySQL Cluster NoSQL Memcached API
 
My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12
 

Recently uploaded

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 

Recently uploaded (20)

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 

PayPal Big Data and MySQL Cluster

  • 1. Big Data is a Big Scam (Most of the Time) Daniel Austin, PayPal Technical Staff MySQL Connect Conference September 30, 2012 v1.2
  • 2. Today’s Agenda Big Myths about Big Data YESQL: A Counterexample Q&A Global In-memory MySQL Confidential and Proprietary 2
  • 3. THE FUNDAMENTAL PROBLEM IN DISTRIBUTED DATA SYSTEMS “How Do We Manage Reliable Distribution of Data Across Geographical Distances?” Confidential and Proprietary
  • 4. The NoSQL Solution •  NoSQL Systems provide a solution that relaxes many of the common constraints of typical RDBMS systems –  Slow - RDBMS has not scaled with CPUs –  Often require complex data management (SOX, SOR) –  Costly to build and maintain, slow to change and adapt –  Intolerant of CAP models (more on this later) •  Non-relational models, usually key-value •  May be batched or streaming •  Not necessarily distributed geographically Confidential and Proprietary
  • 5. Big Data Myth #1: Big Data = NoSQL •  ‘Big Data’ Refers to a Common Set of Problems –  Large Volumes –  High Rates of Change •  Of Data •  Of Data Models •  Of Data Presentation and Output –  Often Require ‘Fast Data’ as well as ‘Big’ •  Near-real Time Analytics •  Mapping Complex Structures Takeaway: Big Data is the problem, NoSQL is one (proposed) solution Confidential and Proprietary
  • 6. 3 Kinds of Big Data Systems 1.  Columnar K-V Systems – Hadoop, Hbase, Cassandra, PNUTs 2.  Document-Based – MongoDB, TerraCotta 3.  Graph-Based – FlockDB, Voldemort Takeaway: These were originally designed as solutions to specific problems because no commercial solution would work. Confidential and Proprietary
  • 7. Big Data Hype Cycle: Where Are We Now? There are currently more than 120+ NoSQL databases listed at nosql-databases.com! You Are Here ? As the pace of new technology solutions has slowed, some clear winners have emerged. Confidential and Proprietary
  • 8. Big Data Myth #2: The CAP Theorem Doesn’t Say What You Think It Does •  Consistency, Availability, (Network) Partition •  The Real Story: These are not Independent Variables •  AP =CP (Um, what? But…A != C ) •  Variations: –  PACELC (adds latency tolerance) Takeaway: the real story here is about the tradeoffs made by designers of different systems, and the main tradeoff is between consistency and availability, usually in favor of the latter. Confidential and Proprietary
  • 9. Big Data Myth: You Need A Big Data System Well, Maybe….But Before You Go There… There are essentially two ‘Big Data Problems’: “I have too much data and it’s coming in too fast to handle with any RDBMS.” “I have a lot of data distributed geographically and need to be able to read and write from anywhere in near real-time.” Takeaway: if you have one of these Big Data problems, a NoSQL solution might work for you. But there are also other alternatives… Confidential and Proprietary
  • 10. BIG DATA MYTH #3: BIG DATA AND NOSQL ARE NEW IDEAS •  The first and most successful such system is DNS, created in 1983. •  Began with flat files •  Currently serves the entire Internet (!) •  DNS is an AP system, availability is #1 •  Many extensions complicate a simple design •  Suggests a new term for CAP- like ideas: variability •  DNS variability is very high, often 2-3x the mean Confidential and Proprietary
  • 11. Today’s Agenda Big Myths About Big Data YESQL: A Counterexample Q&A Global In-memory MySQL Confidential and Proprietary 11
  • 12. Mission YESQL “Develop a globally distributed DB For user-related data.” •  Must Not Fail (99.999%) •  Must Not Lose Data. Period. •  Must Support Transactions •  Must Support (some) SQL •  Must WriteRead 32-bit integer globally in 1000ms •  Maximum Data Volume: 100 TB •  Must Scale Linearly with Costs Confidential and Proprietary
  • 13. What about “High Performance”? • Maximum lightspeed distance on Earth’s Surface: ~67 ms • Target: data available worldwide in < 1000 ms Sound Easy? Think Again! Confidential and Proprietary
  • 14. WHY MYSQL CLUSTER? Pro Con •  True HA by design •  Some semantic –  Fast recovery limitations on fields •  Supports (some) X- •  Size constraints (2 actions TB?) •  Relational Model –  Hardware limits •  In-memory also architecture = high •  Higher cost/byte performance •  Requires reasonable •  Disk storage for data partitioning non-indexed data •  Higher complexity •  APIs, APIs, APIs Confidential and Proprietary
  • 15. How MySQL Cluster Works in 1 Slide Graphics courtesy dev.mysql.com Confidential and Proprietary
  • 16. CIRCULAR REPLICATION/FAILOVER Graphics courtesy O’Reilly OnLamp.com Confidential and Proprietary
  • 17. AVAILABILITY DEFINED •  Availability of the entire system: n m Asys = 1 – Π(1-Πri)j V i=1 j=1 I P •  Number of Parallel Components Needed to Achieve Availability Amin: Parallel Serial Nmin = [ln(1-Amin)/ln(1-r)] Confidential and Proprietary
  • 18. AWS Meets MySQL Cluster •  Why AWS? – Cheap and easy infrastructure-in-a-box (Or so I thought! Ha!) •  Services Used: – EC2 (Centos 5.3, small instances for mgm & query nodes, XL for data – Elastic IPs/ELB – EBS Volumes – S3 – Cloudwatch Confidential and Proprietary
  • 19. ARCHITECTURAL TILES AWS Availability Zones Tiling Rules •  Never separate NDB & SQL A B •  Ndb:2-SQL:1-MGM:1 •  Scale by adding more tiles •  Failover 1st to nearest AZ •  Then to nearest DC •  At least 1 replica/AZ C ELB •  Don’t share nodes •  Mgmt nodes are redundant Limitations Unused (not present in all locations) •  AWS is network-bound @ 250 MBPS – ouch! •  Need specific ACL across AZ Data Mgmt SQL boundaries Node Node Node •  AZs not uniform! •  No GSLB •  Dynamic IPs •  ELB sticky sessions !reliable Confidential and Proprietary
  • 20. Architecture Stack Scale by Tiling A B A B A B A B A B A B A B 5 AWS Data Centers: US-E, US-W, TK, EU, AS Confidential and Proprietary
  • 21. Other Technologies Considered •  Paxos – Elegant-but-complex consensus-based messaging protocol – Used in Google Megastore, Bing metadata •  Java Query Caching – Queries as serialized objects – Not yet working •  Multiple Ring Architectures – Even more complicated = no way Confidential and Proprietary
  • 22. SYSTEM READ/WRITE PERFORMANCE (!) What we tested: •  32 & 256 byte char fields In-region replication tests •  Reads, writes, query speed vs. volume •  Data replication speeds Results: •  Global replication < 350 ms •  256 byte read < 10ms worldwide 06/19/2011 06/20/2011 06/21/2011 06/22/2011 06/23/2011 Confidential and Proprietary
  • 23. Data Models and Query Optimization •  Network Latency is an obvious issue •  Data model requires all segments present in each geo-region •  Parameterized (Linked) Joins – Adaptive Query Localization (SIP) technique from Clustra (see Clement Frazer’s blog for details) Confidential and Proprietary
  • 24. Conservation of Timestamps or The Commit Ordering Problem •  Why does commit ordering matter? •  Write operators are non-commutative [W(d,t1),W(d,t2)] != 0 unless t1=t2 – Can lead to inconsistency – Can lead to timestamp corruption – Forcing sequential writes defeats Amdahl’s rule •  Can show up in GSLB scenarios Confidential and Proprietary
  • 25. Hard Lessons, Shared •  Be Careful… –  With “Eventual Consistency”-related concepts –  ACID, CAP are not really as well-defined as we’d like considering how often we invoke them •  MySQL Cluster is a good solution –  Real HA, real SQL –  Notable limitations around fields, datatypes –  Successfully competes with NoSQL systems for most use cases – better in many cases •  NoSQL Systems –  All have relatively low levels of maturity –  More suitable for simpler key-value models –  Victim of Tech Fashion Confidential and Proprietary
  • 26. Future Directions •  Alternate solution using Pacemaker, Heartbeat – From Yves Trudeau @ Percona – Uses InnoDB, not NDB •  Implement Memcached plugin – To test NoSQL functionality, APIs •  Add simple connection-based persistence to preserve connections during failover •  Better data node distribution •  Better testing & monitoring Confidential and Proprietary
  • 27. Summing Up On YESQL v0.85 •  It works! Far better than expected. •  Very fast, very reliable •  Reduced complexity since v0.7 •  AWS poses challenges that private data centers may not experience •  You can achieve high performance and availability without giving up relational models and read consistency! Confidential and Proprietary
  • 28. The Big Picture on Big Data •  Only use Big Data solutions when you have a real Big Data problem. –  Don’t be a Dedicated Follower of Tech Fashion! •  Not all Big Data solutions are created equal –  What tradeoffs are most important to you? –  Consistency, Fault Tolerance, Availability, Performance, Variability •  Is your data model a fit for NoSQL? –  You don’t have to give up the relational model in most cases, so don’t! •  You can achieve high performance and availability without giving up relational models and read consistency! Just say YESQL! Confidential and Proprietary
  • 29. “In the long run, we are all dead eventually consistent.” Maynard Keynes on NoSQL Databases Twitter: @daniel_b_austin Email: daaustin@paypal.com With apologies and thanks to the real DB experts, Andrew Goodman, Yves Trudeau, Frazer Clement, Daniel Abadi, Kent Beck, and everyone else who contributed. It really works!