MySQL Cluster no PayPal

Why PayPal chose MySQL as the solution to its Big Data problems


Transcript of "MySQL Cluster no PayPal"

  1. MySQL Cluster @ PayPal
     Henrique Leandro, Senior MySQL Consultant
     henrique.leandro@oracle.com
     Dec-2012
     Wednesday, December 12, 2012
  2. Big Data is a Big Scam (Most of the Time)
     Daniel Austin, PayPal Technical Staff
     MySQL Connect Conference Brazil
     December 04, 2012 v1.3
  3. Today’s Agenda
     Big Myths about Big Data
     YESQL: A Counterexample
     Q&A
     Confidential and Proprietary
  4. THE FUNDAMENTAL PROBLEM IN DISTRIBUTED DATA SYSTEMS
     “How Do We Manage Reliable Distribution of Data Across Geographical Distances?”
  5. Big Data Myth #1: Big Data = NoSQL
     • ‘Big Data’ Refers to a Common Set of Problems
       – Large Volumes
       – High Rates of Change
         • Of Data
         • Of Data Models
         • Of Data Presentation and Output
       – Often Require ‘Fast Data’ as well as ‘Big’
         • Near-real Time Analytics
         • Mapping Complex Structures
     Takeaway: Big Data is the problem, NoSQL is one (proposed) solution
  6. The NoSQL Solution
     • NoSQL systems provide a solution that relaxes many of the common constraints of a typical RDBMS
       – Slow: RDBMS has not scaled with CPUs
       – Often require complex data management (SOX, SOR)
       – Costly to build and maintain, slow to change and adapt
       – Intolerant of CAP models (more on this later)
     • Non-relational models, usually key-value
     • May be batched or streaming
     • Not necessarily distributed geographically
  7. 3 Kinds of Big Data Systems
     1. Columnar K-V Systems
        – Hadoop, HBase, Cassandra, PNUTS
     2. Document-Based
        – MongoDB, Terracotta
     3. Graph-Based
        – FlockDB, Voldemort
     Takeaway: These were originally designed as solutions to specific problems because no commercial solution would work.
  8. Big Data Myth #2: The CAP Theorem Doesn’t Say What You Think It Does
     • Consistency, Availability, (Network) Partition Tolerance
     • The Real Story: These are not Independent Variables
     • AP = CP (Um, what? But… A != C)
     • Variations:
       – PACELC (adds latency tolerance)
     Takeaway: The real story here is about the tradeoffs made by designers of different systems, and the main tradeoff is between consistency and availability, usually in favor of the latter.
  9. Big Data Hype Cycle: Where Are We Now?
     There are currently more than 120 NoSQL databases listed at nosql-databases.com!
     [Hype cycle chart, marked “You Are Here?”]
     As the pace of new technology solutions has slowed, some clear winners have emerged.
  10. BIG DATA MYTH #3: BIG DATA AND NOSQL ARE NEW IDEAS
      • The first and most successful such system is DNS, created in 1983.
      • Began with flat files
      • Currently serves the entire Internet (!)
      • DNS is an AP system, availability is #1
      • Many extensions complicate a simple design
      • Suggests a new term for CAP-like ideas: variability
      • DNS variability is very high, often 2-3x the mean
  11. Big Data Myth: You Need A Big Data System
      Well, Maybe… But Before You Go There…
      There are essentially two ‘Big Data Problems’:
      • “I have too much data and it’s coming in too fast to handle with any RDBMS.”
      • “I have a lot of data distributed geographically and need to be able to read and write from anywhere in near real-time.”
      Takeaway: If you have one of these Big Data problems, a NoSQL solution might work for you. But there are also other alternatives…
  12. Today’s Agenda
      Big Myths About Big Data
      YESQL: A Counterexample
      Q&A
  13. Mission YESQL
      “Develop a globally distributed DB for user-related data.”
      • Must Not Fail (99.999%)
      • Must Not Lose Data. Period.
      • Must Support Transactions
      • Must Support (some) SQL
      • Must Write/Read a 32-bit integer globally in 1000 ms
      • Maximum Data Volume: 100 TB
      • Must Scale Linearly with Costs
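As a rough check on the “Must Not Fail (99.999%)” goal in slide 13, the availability figure can be turned into a downtime budget. The sketch below is illustrative only: the helper name and the 365.25-day year are assumptions made for this example, not part of the original talk.

```python
# Illustrative sketch: convert an availability target into a downtime budget.
# The helper name and the 365.25-day year are assumptions made for this example.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def downtime_budget_seconds(availability: float,
                            period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Return the maximum downtime (in seconds) allowed over the period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_seconds


five_nines_budget = downtime_budget_seconds(0.99999)
print(f"99.999% allows about {five_nines_budget / 60:.2f} minutes of downtime per year")
```

Five nines leaves only a little over five minutes of total downtime per year, which is why the deck treats failover behavior as a first-class design concern.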
  14. What about “High Performance”?
      • Maximum light-speed travel time across Earth’s surface (half the circumference): ~67 ms
      • Target: data available worldwide in < 1000 ms
      Sound Easy? Think Again!
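The ~67 ms figure in slide 14 can be reproduced with a short calculation. This is a sketch under stated assumptions: the constants are standard physical values, while the function name and the two-thirds-of-c factor for optical fibre are a common rule of thumb added here for illustration, not numbers from the talk.

```python
# Illustrative sketch: one-way light travel time between antipodal points.
# The 2/3 fibre factor is a rule-of-thumb assumption, not a figure from the talk.
EARTH_EQUATORIAL_CIRCUMFERENCE_KM = 40_075.0
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBRE_SPEED_FRACTION = 2.0 / 3.0


def travel_time_ms(distance_km: float, speed_fraction: float = 1.0) -> float:
    """Return the one-way travel time in milliseconds at a fraction of c."""
    if distance_km < 0 or not 0.0 < speed_fraction <= 1.0:
        raise ValueError("distance must be >= 0 and speed_fraction in (0, 1]")
    return distance_km / (SPEED_OF_LIGHT_KM_PER_S * speed_fraction) * 1000.0


# The farthest two points on the surface are half a circumference apart.
antipodal_km = EARTH_EQUATORIAL_CIRCUMFERENCE_KM / 2
vacuum_ms = travel_time_ms(antipodal_km)
fibre_ms = travel_time_ms(antipodal_km, FIBRE_SPEED_FRACTION)
print(f"vacuum: {vacuum_ms:.1f} ms, fibre (assumed 2/3 c): {fibre_ms:.1f} ms")
```

Real network paths are longer than the great-circle distance and add routing and replication delays, which is why the 1000 ms target is harder than the raw number suggests.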
  15. WHY MYSQL CLUSTER?
      Pro
      • True HA by design
        – Fast recovery
      • Supports (some) X-actions
      • Relational Model
      • In-memory architecture = high performance
      • Disk storage for non-indexed data (since 5.1)
      • APIs, APIs, APIs
      Con
      • Some semantic limitations on fields
      • Size constraints (2 TB?); hardware limits also
      • Higher cost/byte
      • Requires reasonable data partitioning
      • Higher complexity
  16. How MySQL Cluster Works in One Slide
      Graphics courtesy dev.mysql.com
  18. CIRCULAR REPLICATION/FAILOVER
      Graphics courtesy O’Reilly OnLamp.com
  20. AWS Meets MySQL Cluster
      • Why AWS?
        – Cheap and easy infrastructure-in-a-box (Or so I thought! Ha!)
      • Services Used:
        – EC2 (CentOS 5.3, small instances for mgm & query nodes, XL for data)
        – Elastic IPs/ELB
        – EBS Volumes
        – S3
        – CloudWatch
  21. ARCHITECTURAL TILES
      Tiling Rules
      • Never separate NDB & SQL
      • NDB:2-SQL:1-MGM:1
      • Scale by adding more tiles
      • Failover 1st to nearest AZ
      • Then to nearest DC
      • At least 1 replica/AZ
      • Don’t share nodes
      • Mgmt nodes are redundant
      Limitations
      • AWS is network-bound @ 250 MBPS (ouch!)
      • Need specific ACL across AZ boundaries
      • AZs not uniform!
      • No GSLB
      [Diagram: AWS Availability Zones A, B, and C behind an ELB; legend: NDB, MGM, SQL, Unused (not present in all locations)]
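The tiling rules in slide 21 can be sketched as a small model. This is a hypothetical illustration, not PayPal's implementation: the `Tile` class, the function names, and the inter-data-center distances are all invented for the example, and "nearest AZ" is simplified to "any other AZ in the same data center".

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Node mix per tile, from the slide's "Ndb:2-SQL:1-MGM:1" rule.
NODES_PER_TILE = {"ndb": 2, "sql": 1, "mgm": 1}


@dataclass(frozen=True)
class Tile:
    data_center: str  # e.g. "US-E" (labels taken from the deck)
    zone: str         # availability zone label, e.g. "A"


def node_totals(tiles: List[Tile]) -> Dict[str, int]:
    """Scaling by tiling: every added tile adds the same fixed node mix."""
    return {role: count * len(tiles) for role, count in NODES_PER_TILE.items()}


def failover_order(failed: Tile, tiles: List[Tile],
                   dc_distance_km: Dict[Tuple[str, str], float]) -> List[Tile]:
    """Rank surviving tiles: other AZs in the same DC first, then nearest DC."""
    def rank(tile: Tile) -> Tuple[int, float]:
        if tile.data_center == failed.data_center:
            return (0, 0.0)
        return (1, dc_distance_km[(failed.data_center, tile.data_center)])

    return sorted((t for t in tiles if t != failed), key=rank)


tiles = [Tile("US-E", "A"), Tile("US-E", "B"), Tile("US-W", "A"), Tile("EU", "A")]
# Hypothetical distances, for illustration only.
dc_distance_km = {("US-E", "US-W"): 4000.0, ("US-E", "EU"): 6000.0}

totals = node_totals(tiles)
order = failover_order(tiles[0], tiles, dc_distance_km)
print(totals)
print([f"{t.data_center}/{t.zone}" for t in order])
```

Because each tile carries a fixed node mix, capacity planning reduces to counting tiles, and the failover ranking mirrors the "nearest AZ, then nearest DC" rule.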
  22. Architecture Stack: Scale by Tiling
      [Diagram: repeated pairs of A/B tiles]
      5 AWS Data Centers: US-E, US-W, TK, EU, AS
  23. SYSTEM READ/WRITE PERFORMANCE (!)
      What we tested:
      • 32 & 256 byte char fields
      • Reads, writes, query speed vs. volume
      • Data replication speeds
      Results:
      • Global replication < 350 ms
      • 256 byte read < 1000 ms worldwide
      [Chart: In-region replication tests, 06/19/2011 to 06/23/2011]
  24. Data Models and Query Optimization for NDB
      • Network Latency is an obvious issue
      • Data model requires all segments present in each geo-region
      • Parameterized (Linked) Joins
        – SPJ technique from Clustra (see Clement Frazer’s blog for details)
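Slide 24 ties network latency to join strategy. The following is a deliberately simplified cost model, not a description of NDB internals: it assumes a two-table join where a non-pushed plan pays one round trip for the outer scan plus one per outer row, while a pushed (linked) join is shipped to the data nodes once. The function names and the 0.5 ms round-trip time are hypothetical, and real systems batch lookups, so the non-pushed figure is a worst case.

```python
# Simplified, hypothetical cost model for join round trips between an SQL node
# and the data nodes. Real NDB batches requests; treat the non-pushed count
# as a worst case.


def join_round_trips(outer_rows: int, pushed_down: bool) -> int:
    """Count network round trips for a two-table join under this model."""
    if outer_rows < 0:
        raise ValueError("outer_rows must be non-negative")
    if pushed_down:
        return 1  # the whole linked join is sent once and evaluated near the data
    return 1 + outer_rows  # one outer scan, then one lookup per outer row


def join_network_time_ms(outer_rows: int, rtt_ms: float, pushed_down: bool) -> float:
    """Estimate time spent purely on network round trips."""
    return join_round_trips(outer_rows, pushed_down) * rtt_ms


naive_ms = join_network_time_ms(1000, 0.5, pushed_down=False)
pushed_ms = join_network_time_ms(1000, 0.5, pushed_down=True)
print(f"non-pushed: {naive_ms} ms, pushed: {pushed_ms} ms")
```

Under these assumptions the network cost of the non-pushed plan grows linearly with the number of outer rows, while the pushed plan stays constant, which is the motivation for linked joins when latency dominates.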
  25. Hard Lessons, Shared
      • Be Careful…
        – With “Eventual Consistency”-related concepts
        – ACID, CAP are not really as well-defined as we’d like considering how often we invoke them
      • MySQL Cluster is a good solution
        – Real HA, real SQL
        – Notable limitations around fields, datatypes
        – Successfully competes with NoSQL systems for most use cases, and is better in many
      • NoSQL Systems
        – All have relatively low levels of maturity
        – More suitable for simpler key-value models
        – Victim of Tech Fashion
  28. MySQL Enterprise Edition
  29. MySQL Enterprise Edition
      More productivity, lower risk, and greater capacity for MySQL.
      • Oracle Premier Lifetime Support
      • Oracle Product Certifications/Integrations
      • MySQL Enterprise Monitor/Query Analyzer
      • MySQL Enterprise Backup
      • MySQL Workbench
      • MySQL Enterprise Security
      • MySQL Enterprise Audit
      • MySQL Enterprise Scalability
      • MySQL Enterprise High Availability
  30. Future Directions
      • Alternate solution using Pacemaker, Heartbeat
        – Uses InnoDB, not NDB
      • Implement Memcached plugin
        – To test NoSQL functionality from MySQL
      • Add simple connection-based persistence to preserve connections during failover
      • Better data node distribution
      • Better testing & monitoring
  31. Summing Up On YESQL v0.85
      • It works! Far better than expected.
      • Very fast, very reliable
      • Reduced complexity since v0.7
      • AWS poses challenges that private data centers may not experience
      • You can achieve high performance and availability without giving up relational models and read consistency!
  32. The Big Picture on Big Data
      • Only use Big Data solutions when you have a real Big Data problem.
        – Don’t be a Dedicated Follower of Tech Fashion!
      • Not all Big Data solutions are created equal
        – What tradeoffs are most important to you?
        – Consistency, Fault Tolerance, Availability, Performance, Variability
      • Is your data model a fit for NoSQL?
        – You don’t have to give up the relational model in most cases, so don’t!
      • You can achieve high performance and availability without giving up relational models and read consistency!
      Just say YESQL!
  33. “In the long run, we are all dead eventually consistent.”
      Maynard Keynes on NoSQL Databases
      Twitter: @daniel_b_austin
      Email: daaustin@paypal.com
      With apologies and thanks to the real DB experts, Andrew Goodman, Yves Trudeau, Clement Frazer, Daniel Abadi, Kent Beck, and everyone else who contributed.
      It really works!
  34. Thank you
      Henrique Leandro, Senior MySQL Consultant
      henrique.leandro@oracle.com
      Dec-2012