MySQL Connect Conference Keynote Address
September 30, 2012 v1.2
Big Data is a Big Scam (Most of the Time)
Daniel Austin, PayPal Technical Staff
Confidential and Proprietary
Today’s Agenda
Global In-memory MySQL
Big Myths About Big Data
Preview - YESQL: A Counterexample
THE FUNDAMENTAL PROBLEM IN
DISTRIBUTED DATA SYSTEMS
“How Do We Manage Reliable
Distribution of Data Across Geographical
Distances?”
Big Data Myth #1: Big Data = NoSQL
• ‘Big Data’ Refers to a Common Set of Problems
– Large Volumes
– High Rates of Change
• Of Data
• Of Data Models
• Of Data Presentation and Output
– Often Require ‘Fast Data’ as well as ‘Big’
• Near-real Time Analytics
• Mapping Complex Structures
Takeaway: Big Data is the problem, NoSQL is one
(proposed) solution
Do You Need A Big Data System?
Well, Maybe… But Before You Go There…
There are essentially two ‘Big Data Problems’:
“I have too much data and it’s coming in too fast to
handle with any RDBMS.”
“I have a lot of data distributed geographically and
need to be able to read and write from anywhere in
near real-time.”
Takeaway: if you have one of these Big Data
problems, a NoSQL solution might work for you.
But there are also other alternatives…
The NoSQL Solution
• NoSQL systems offer a solution that relaxes
many of the common constraints of typical
RDBMSs:
– Slow - RDBMS has not scaled with CPUs
– Often require complex data management
(SOX, SOR)
– Costly to build and maintain, slow to change and
adapt
– Intolerant of CAP models (more on this later)
• Non-relational models, usually key-value
• May be batched or streaming
• Not necessarily distributed geographically
Big Data Myth #2: The CAP Theorem Doesn’t
Say What You Think It Does
• Consistency, Availability, (Network) Partition Tolerance
• The Real Story: These are not Independent Variables
• AP = CP (Um, what? But… A != C)
• Variations:
– PACELC (adds latency tolerance)
Takeaway: the real story here is about the tradeoffs
made by designers of different systems, and the
main tradeoff is between consistency and
availability, usually in favor of the latter.
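The tradeoff in this takeaway can be made concrete with a toy model. The sketch below is not from the talk: the `Replica` class, the "CP"/"AP" mode labels, and the `Unavailable` exception are hypothetical names chosen for illustration. It shows a single replica that has been cut off from its peers and must either refuse to answer (favoring consistency) or answer from possibly stale local state (favoring availability).

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a replica refuses to answer during a partition."""


@dataclass
class Replica:
    mode: str                                  # "CP" or "AP" (illustrative labels)
    data: dict = field(default_factory=dict)   # this replica's local copy
    partitioned: bool = False                  # True when cut off from its peers

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk a stale answer.
            raise Unavailable(key)
        # Availability first (or no partition): answer from local state,
        # which may be stale if writes landed on other replicas.
        return self.data.get(key)


def demo():
    results = {}
    for mode in ("CP", "AP"):
        replica = Replica(mode=mode, data={"balance": 100})
        replica.partitioned = True  # imagine a newer write landed elsewhere
        try:
            results[mode] = replica.read("balance")
        except Unavailable:
            results[mode] = "unavailable"
    return results


print(demo())
```

Under these assumptions the CP replica reports itself unavailable while the AP replica returns its local (possibly outdated) value, which is the choice most NoSQL designs make.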
Big Data Hype Cycle: Where Are We Now?
There are currently more than 120 NoSQL
databases listed at nosql-databases.com!
You Are Here ?
As the pace of new technology solutions has slowed, some clear winners have emerged.
BIG DATA MYTH #3: BIG DATA AND NOSQL
ARE NEW IDEAS
• The first and most successful such system is DNS, created in 1983.
• Began with flat files
• Currently serves the entire Internet (!)
• DNS is an AP system; availability is #1
• Many extensions complicate a simple design
• Suggests a new term for CAP-like ideas: variability
• DNS variability is very high, often 2-3x the mean
Today’s Agenda
Global In-memory MySQL
Big Myths About Big Data
Preview: YESQL: A Counterexample
Q&A
Mission YESQL
“Develop a globally distributed DB for user-related data.”
• Must Not Fail (99.999%)
• Must Not Lose Data. Period.
• Must Support Transactions
• Must Support (some) SQL
• Must Write/Read a 32-bit integer globally in 1000 ms
• Maximum Data Volume: 100 TB
• Must Scale Linearly with Costs
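To put the 99.999% target in perspective, it helps to convert an availability percentage into a yearly downtime budget. The short calculation below is an illustrative sketch, not part of the original slides; the function name is made up, and it assumes a 365-day year with downtime counted in minutes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime allowed by an availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99, 99.999):
    budget = downtime_minutes_per_year(target)
    print(f"{target}% -> {budget:.2f} min/year")
```

Under these assumptions "five nines" leaves a little over five minutes of total downtime per year, which is why failover has to be automatic rather than operator-driven.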
What about “High Performance”?
• Maximum lightspeed distance on Earth’s surface: ~67 ms one way (half the circumference at the speed of light in vacuum)
• Target: data available worldwide in < 1000 ms
Sound Easy?
Think Again!
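The ~67 ms figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes an equatorial circumference of about 40,075 km and takes the farthest point on the surface to be half that distance away. The fiber figure is an added assumption (light in optical fiber travels at roughly two thirds of its vacuum speed) and does not come from the slides.

```python
EARTH_CIRCUMFERENCE_KM = 40_075.0  # equatorial, approximate
SPEED_OF_LIGHT_KM_S = 299_792.458  # in vacuum
FIBER_SPEED_FRACTION = 2 / 3       # assumption: typical optical fiber


def one_way_ms(distance_km: float, speed_km_s: float) -> float:
    """Travel time in milliseconds for a signal covering distance_km."""
    return distance_km / speed_km_s * 1000


antipodal_km = EARTH_CIRCUMFERENCE_KM / 2
vacuum_ms = one_way_ms(antipodal_km, SPEED_OF_LIGHT_KM_S)
fiber_ms = one_way_ms(antipodal_km, SPEED_OF_LIGHT_KM_S * FIBER_SPEED_FRACTION)

print(f"vacuum, one way: {vacuum_ms:.1f} ms")
print(f"fiber, one way:  {fiber_ms:.1f} ms")
print(f"fiber, round trip: {2 * fiber_ms:.1f} ms")
```

Even before routing, queuing, and replication overhead, a single acknowledged round trip to the far side of the planet uses a noticeable share of the 1000 ms budget under these assumptions.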
Architecture Stack
[Figure: repeated A/B pairs]
5 AWS Data Centers: US-E, US-W, TK, EU, AS
Scale by Tiling
In The Full Session….
• More Big Data Myths
• YeSQL Architecture
• Failover
• Conservation of Timestamps!
• Join me today at 10:30 AM for the details!
Summing Up: The Big Picture on Big Data
• Only use Big Data solutions when you have a real
Big Data problem.
– Don’t be a Dedicated Follower of Tech Fashion!
• Not all Big Data solutions are created equal
– What tradeoffs are most important to you?
– Consistency, Fault Tolerance, Availability, Performance, Variability
• Is your data model a fit for NoSQL?
– You don’t have to give up the relational model in most cases, so don’t!
• You can achieve high performance and
availability without giving up relational models
and read consistency! Just say YESQL!
Twitter: @daniel_b_austin
Email: daaustin@paypal.com
“In the long run, we are all dead [struck through] eventually consistent.”
Maynard Keynes on NoSQL Databases
With apologies and thanks to the real DB experts, Andrew Goodman, Yves
Trudeau, Clement Frazer, Daniel Abadi, Kent Beck, and everyone else who
contributed. It really works!

Big myths about big data and new SQL alternatives

  • 1. MySQL Connect Conference Keynote Address September 30, 2012 v1.2 Big Data is a Big Scam (Most of the Time) Daniel Austin, PayPal Technical Staff
  • 2. Confidential and Proprietary 2Global In-memory MySQL Big Myths About Big Data Preview - YESQL: A Counterexample Today’s Agenda
  • 3. Confidential and Proprietary THE FUNDAMENTAL PROBLEM IN DISTRIBUTED DATA SYSTEMS “How Do We Manage Reliable Distribution of Data Across Geographical Distances?”
  • 4. Confidential and Proprietary Big Data Myth #1: Big Data = NoSQL • „Big Data‟ Refers to a Common Set of Problems – Large Volumes – High Rates of Change • Of Data • Of Data Models • Of Data Presentation and Output – Often Require „Fast Data‟ as well as „Big‟ • Near-real Time Analytics • Mapping Complex Structures Takeaway: Big Data is the problem, NoSQL is one (proposed) solution
  • 5. Confidential and Proprietary D oYou Need A Big Data System? Well, Maybe….But Before You Go There… There are essentially two „Big Data Problems‟: “I have too much data and it‟s coming in too fast to handle with any RDBMS.” “I have a lot of data distributed geographically and need to be able to read and write from anywhere in near real-time.” Takeaway: if you have one of these Big Data problems, a NoSQL solution might work for you. But there are also other alternatives…
  • 6. The NoSQL Solution • NoSQL Systems provide a solution that relaxes many of the common constraints of typical RDBMS systems – Slow - RDBMS has not scaled with CPUs – Often require complex data management (SOX, SOR) – Costly to build and maintain, slow to change and adapt – Intolerant of CAP models (more on this later) • Non-relational models, usually key-value • May be batched or streaming • Not necessarily distributed geographically
  • 7. Big Data Myth #2: The CAP Theorem Doesn’t Say What You Think It Does • Consistency, Availability, (Network) Partition • The Real Story: These are not Independent Variables • AP = CP (Um, what? But… A != C) • Variations: – PACELC (adds latency tolerance) Takeaway: the real story here is about the tradeoffs made by designers of different systems, and the main tradeoff is between consistency and availability, usually in favor of the latter.
  • 8. Big Data Hype Cycle: Where Are We Now? There are currently more than 120 NoSQL databases listed at nosql-databases.com! [Hype cycle chart, marked “You Are Here?”] As the pace of new technology solutions has slowed, some clear winners have emerged.
  • 9. BIG DATA MYTH #3: BIG DATA AND NOSQL ARE NEW IDEAS • The first and most successful such system is DNS, created in 1983. • Began with flat files • Currently serves the entire Internet (!) • DNS is an AP system, availability is #1 • Many extensions complicate a simple design • Suggests a new term for CAP-like ideas: variability • DNS variability is very high, often 2-3x the mean
  • 10. Global In-memory MySQL. Today’s Agenda: Big Myths About Big Data • Preview: YESQL: A Counterexample • Q&A
  • 11. Mission YESQL: “Develop a globally distributed DB for user-related data.” • Must Not Fail (99.999%) • Must Not Lose Data. Period. • Must Support Transactions • Must Support (some) SQL • Must Write/Read a 32-bit integer globally in 1000 ms • Maximum Data Volume: 100 TB • Must Scale Linearly with Costs
  • 12. What about “High Performance”? • Maximum lightspeed distance on Earth’s Surface: ~67 ms • Target: data available worldwide in < 1000 ms Sound Easy? Think Again!
  • 13. Architecture Stack [diagram of repeated A/B tiles] • 5 AWS Data Centers: US-E, US-W, TK, EU, AS • Scale by Tiling
  • 14. In The Full Session… • More Big Data Myths • YeSQL Architecture • Failover • Conservation of Timestamps! • Join me today at 10:30 AM for the details!
  • 15. Summing Up: The Big Picture on Big Data • Only use Big Data solutions when you have a real Big Data problem. – Don’t be a Dedicated Follower of Tech Fashion! • Not all Big Data solutions are created equal – What tradeoffs are most important to you? – Consistency, Fault Tolerance, Availability, Performance, Variability • Is your data model a fit for NoSQL? – You don’t have to give up the relational model in most cases, so don’t! • You can achieve high performance and availability without giving up relational models and read consistency! Just say YESQL!
  • 16. Twitter: @daniel_b_austin Email: daaustin@paypal.com “In the long run, we are all dead eventually consistent.” Maynard Keynes on NoSQL Databases With apologies and thanks to the real DB experts, Andrew Goodman, Yves Trudeau, Clement Frazer, Daniel Abadi, Kent Beck, and everyone else who contributed. It really works!
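
The “~67 ms” figure on slide 12 can be checked with a little arithmetic. The sketch below is an illustration, not part of the talk: it assumes the “maximum lightspeed distance” means half of Earth’s equatorial circumference (the farthest apart two surface points can be) and uses the speed of light in vacuum. The fiber slowdown factor of about 2/3 is a common rule of thumb added here as an assumption, and all names are hypothetical.

```python
# Back-of-the-envelope lower bound on one-way latency between antipodal points.
# Assumptions (not from the slides): equatorial circumference, vacuum light speed,
# and a rough 2/3 velocity factor for light in optical fiber.
EARTH_CIRCUMFERENCE_KM = 40_075.0
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_VELOCITY_FACTOR = 2.0 / 3.0  # rule-of-thumb value, an assumption


def one_way_latency_ms(distance_km: float, speed_km_s: float) -> float:
    """Return the time in milliseconds to cover distance_km at speed_km_s."""
    return distance_km / speed_km_s * 1000.0


# Farthest surface-to-surface path: half way around the planet.
antipodal_km = EARTH_CIRCUMFERENCE_KM / 2.0

vacuum_ms = one_way_latency_ms(antipodal_km, SPEED_OF_LIGHT_KM_S)
fiber_ms = one_way_latency_ms(antipodal_km, SPEED_OF_LIGHT_KM_S * FIBER_VELOCITY_FACTOR)

print(f"vacuum: {vacuum_ms:.1f} ms, fiber (assumed 2/3 c): {fiber_ms:.1f} ms")
```

Under these assumptions the physical floor is a small fraction of the 1000 ms target, so most of the budget goes to routing, replication, and commit protocols rather than raw propagation.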

Editor's Notes

  1. This is really the problem we want to solve. It’s one of the fundamental problems in computer science and doesn’t have a completely satisfactory solution.
  2. This is big myth #1. They are not necessarily even related; one could have either or both. These are good problems to have!
  3. The CAP Theorem is a limited version of the Systemic Qualities model.
  4. Mike’s talk last year, only
  5. Dr. Paul Mockapetris. Consistency in DNS is a complicated idea.
  6. Service Reliability. Must be buzzword compliant, as in RFC 2119. Tradeoffs discussed previously.
  7. Performance is response time.
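
To make the reliability requirement concrete, the “Must Not Fail (99.999%)” target on the Mission YESQL slide can be converted into a yearly downtime budget. This is a minimal sketch, assuming a 365.25-day year and treating availability as the fraction of time the service is up; the function name is hypothetical.

```python
# Convert an availability percentage into allowed downtime per year.
# Assumption: a year is 365.25 days; availability is the fraction of time the service is up.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(99.999)
three_nines = downtime_minutes_per_year(99.9)

print(f"99.999%: {five_nines:.2f} min/year, 99.9%: {three_nines:.1f} min/year")
```

Five nines leaves only a few minutes of downtime per year, which is why failover appears as its own topic in the full session.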