THE EXPERT GUIDE TO FAST DATA
Why VoltDB is the solution to “Fast”
© 2015 VoltDB
OUR SPEAKERS
Dr. Mike Stonebraker of MIT
Co-founder of VoltDB
John Hugg
Senior Software Engineer, VoltDB
OUTLINE
•  Characteristics of fast data
•  Non-workable solutions
•  VoltDB solution
•  Lambda architecture solution
FAST DATA
•  Comes from humans
•  State management in multi-player internet games
•  E.g., leaderboards
•  Comes from the Internet of Things (IoT)
•  Real-time geo-positioning
•  E.g., Waze
•  Comes from both
•  E.g., stock market transactions
FAST DATA RATES
•  10 messages (transactions) per second
•  Use your cell phone
•  1,000 transactions per second
•  Use RDBMS (or whatever)
•  100,000 transactions per second
•  Now it gets interesting…
•  From now on we will use “transaction” and “message” interchangeably
REQUIREMENTS FOR FAST DATA APPLICATIONS
•  Keep up
•  Obviously
•  And continue to do so when your load changes
•  Only game in town is “scale out”
•  Not “scale up”
•  Avoid pokey products
•  Product 1 executes 1,000 messages per second per core
•  Product 2 executes 25,000 messages per second per core
•  At 100,000 messages per second, that is 100 cores for P1 versus 4 cores for P2
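The core counts on this slide can be checked with a short sketch. It assumes the per-core figures are per second and that throughput scales linearly with core count; the function name `cores_needed` is illustrative and does not come from the talk.

```python
def cores_needed(target_per_sec: int, per_core_per_sec: int) -> int:
    """Smallest whole number of cores that sustains the target rate.

    Assumes perfectly linear scale-out, which real systems only approximate.
    """
    # Ceiling division without floating point.
    return -(-target_per_sec // per_core_per_sec)


if __name__ == "__main__":
    print("Product 1:", cores_needed(100_000, 1_000), "cores")
    print("Product 2:", cores_needed(100_000, 25_000), "cores")
```

A 25x gap in per-core throughput becomes a 25x gap in hardware, which is the point of "avoid pokey products".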
REQUIREMENTS FOR FAST DATA APPLICATIONS
•  High level language
•  SQL!
•  Don’t want to code in “message assembler”
•  Augmented by windowing operations
•  E.g., moving average of IBM stock price over the last 10 trades
•  So-called windowed aggregates
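A windowed aggregate such as the moving average above can be sketched in a few lines. This is only an illustration of the idea in plain Python, not VoltDB's windowing syntax; the class name `MovingAverage` and its methods are hypothetical.

```python
from collections import deque


class MovingAverage:
    """Average of the most recent `size` values (a count-based window)."""

    def __init__(self, size: int = 10):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def add(self, price: float) -> float:
        """Add one trade price and return the current windowed average."""
        if len(self.window) == self.window.maxlen:
            # The oldest price is about to fall out of the window.
            self.total -= self.window[0]
        self.window.append(price)
        self.total += price
        return self.total / len(self.window)


if __name__ == "__main__":
    avg = MovingAverage(size=10)
    for price in [150.0, 151.0, 149.5, 152.0]:
        print(avg.add(price))
```

Keeping a running total makes each update constant time, which matters when the window is recomputed on every incoming message.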
REQUIREMENTS FOR FAST DATA APPLICATIONS
•  High availability (HA)
•  I don’t know anybody who will accept downtime these days
•  Requires a backup machine
•  And real-time failover
•  As well as restore on recovery
REQUIREMENTS FOR FAST DATA APPLICATIONS
•  Never lose my data
•  Unacceptable to lose my airline reservation
•  Or my standing on the leaderboard
•  Requires no data loss during failover
•  Unacceptable to drop transactions on the floor
REQUIREMENTS FOR FAST DATA APPLICATIONS
•  Data Consistency
•  Unacceptable to sell the last widget to multiple customers
•  Or do a money transfer, where only half of it gets done
•  Or produce an incorrect leaderboard
•  Requires standard ACID transactions
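The "half a money transfer" case can be illustrated with any ACID SQL engine. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in (it is not VoltDB's API); the `accounts` table and the function names are hypothetical. The credit is applied first on purpose, so that a failing debit shows the whole transfer being rolled back.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move `amount` from src to dst atomically; return False if rejected."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint when src has too little money.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    bank = open_bank()
    print(transfer(bank, 1, 2, 30), balances(bank))
    print(transfer(bank, 1, 2, 500), balances(bank))
```

After the rejected transfer the credit to the destination account is undone as well, so no reader ever sees money created or lost.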
REQUIREMENTS FOR FAST DATA APPLICATIONS
•  Data Consistency for replicas
•  Unacceptable to sell the last widget to multiple customers during a node failure
•  Or do a money transfer, where only half of it gets done during a node failure
•  Requires standard ACID transactions
•  On replicas as well as data
•  Eventual consistency does not work!
NON-SOLUTIONS FOR FAST DATA
•  RDBMSs (Oracle, MySQL, …)
•  NoSQL (Cassandra, Mongo, …)
NON-SOLUTIONS FOR FAST DATA -- RDBMSs
•  Four major sources of overhead (assuming data sits in main memory)
•  Buffer pool overhead
•  Locking overhead
•  Write-ahead log overhead
•  Threading overhead
•  In aggregate these account for ~90% of the total time
NON-SOLUTIONS FOR FAST DATA -- RDBMSs
•  Slow, slow, slow, slow
•  Disk-based system (buffer pool overhead)
•  Record-level locking too expensive
•  ARIES-style write-ahead logging too expensive
•  Multi-threading latches are a killer
•  Limited to a few thousand transactions per second
•  If you know you will never need to go faster, then this will work
NON-SOLUTIONS FOR FAST DATA -- NOSQL
•  Low level language (message assembler)
•  No ACID!!!!
•  Buffer pool and threading overhead still present
•  Worst of all worlds – low performance and low function
SOLUTIONS FOR FAST DATA
•  High performance main memory SQL-ACID DBMS (VoltDB, Hekaton, Hana, …)
•  Complex event processing engine (CEP) (Storm, Streambase, …)
EXAMPLE OPERATION ON FAST DATA
•  First hedge fund example
•  Find me a strawberry followed within 5 msec by a banana followed within 10 msec by a grape
•  Look for complex patterns in a fire hose
•  CEP is a natural fit here
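The strawberry, banana, grape pattern can be sketched as a small state machine over a time-ordered stream. This is a plain-Python illustration, not the API of any CEP product; the function name, the `(timestamp_ms, name)` event shape, and the choice to consume partial matches once they advance are all assumptions.

```python
def find_matches(events, first_gap_ms=5, second_gap_ms=10):
    """Find strawberry -> banana -> grape sequences within the given gaps.

    `events` is an iterable of (timestamp_ms, name) pairs in time order.
    Returns a list of (t_strawberry, t_banana, t_grape) tuples.
    """
    strawberries = []  # strawberry timestamps still waiting for a banana
    pairs = []         # (t_strawberry, t_banana) still waiting for a grape
    matches = []
    for t, name in events:
        # Drop partial matches whose time window has already closed.
        strawberries = [s for s in strawberries if t - s <= first_gap_ms]
        pairs = [(s, b) for (s, b) in pairs if t - b <= second_gap_ms]
        if name == "strawberry":
            strawberries.append(t)
        elif name == "banana":
            pairs.extend((s, t) for s in strawberries)
            strawberries.clear()
        elif name == "grape":
            matches.extend((s, b, t) for (s, b) in pairs)
            pairs.clear()
    return matches


if __name__ == "__main__":
    stream = [(0, "strawberry"), (1, "kiwi"), (4, "banana"), (14, "grape")]
    print(find_matches(stream))
```

The only state held is the handful of partial matches still inside their windows, which is why this kind of workload is described as "big pattern, little state".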
EXAMPLE OPERATION ON FAST DATA
•  Second hedge fund example
•  In a worldwide trading system
•  Keep the global state on the enterprise
•  For or against every stock in real time (msecs)
•  And ring the red telephone if there is too much risk
•  And don’t lose any messages!!!
•  Sweet spot for SQL-ACID-main-memory
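As a toy illustration of the "big state" side, the sketch below keeps a net exposure per stock and flags when a limit is crossed. It is a single-process stand-in with hypothetical names (`ExposureBook`, `apply`); a real deployment would hold this state in a transactional, replicated store so that no message is lost.

```python
from collections import defaultdict


class ExposureBook:
    """Net exposure per symbol, with a simple absolute limit check."""

    def __init__(self, limit: int):
        self.limit = limit
        self.net = defaultdict(int)  # symbol -> signed exposure

    def apply(self, symbol: str, signed_value: int) -> bool:
        """Apply one trade; return True if the limit is now exceeded."""
        self.net[symbol] += signed_value
        return abs(self.net[symbol]) > self.limit


if __name__ == "__main__":
    book = ExposureBook(limit=1_000)
    for trade in [("IBM", 600), ("IBM", 600), ("IBM", -900)]:
        if book.apply(*trade):
            print("ring the red telephone:", trade[0], book.net[trade[0]])
```

Every message reads and updates shared state, and the answer is only right if every message is applied exactly once, which is where ACID guarantees come in.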
CHARACTERIZATION
•  CEP natural for “big pattern little state” applications
•  Main memory SQL natural for “big state little pattern” applications
•  Note that analytics applications are all in the second bucket
•  Anecdotal evidence that there are 3-4 big state problems for every big pattern problem
•  “an unnamed but reliable source”
VOLTDB SOLUTION
•  SQL plus windows
•  Main memory
•  Scale out on N nodes
•  Very high performance
•  Figure 40,000 messages/transactions per core per second
VOLTDB SOLUTION
•  ACID
•  With a lot of detailed trickery
•  ACID on local replicas
•  With more trickery
•  Optional ACID on remote replicas
•  Nobody is willing to pay the latency cost….
VOLTDB FAST DATA
DEMO
John Hugg
VoltDB Founding Engineering
The Lambda Architecture
Now with an Example
LAMBDA OVERVIEW
•  Batch processing is well understood and robust.
Latency is pretty horrific.
•  Stream processing is immediate.
Complex and not as robust to hardware or user failure.
•  Lambda Architecture says do both in parallel to compensate
Speed Layer & Batch Layer
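The "do both in parallel" idea can be sketched in miniature for a simple event count. This is a single-threaded toy under stated assumptions, not a description of any particular Lambda stack; the class and method names are hypothetical.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda Architecture: batch view plus speed view over one log."""

    def __init__(self):
        self.master_log = []         # append-only record of every event
        self.batch_view = Counter()  # recomputed from the full log
        self.speed_view = Counter()  # incremental counts since last batch

    def ingest(self, key: str) -> None:
        """New events go to the log and are counted at once by the speed layer."""
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self) -> None:
        """Slow but robust: rebuild the batch view from scratch."""
        self.batch_view = Counter(self.master_log)
        self.speed_view = Counter()  # the batch view now covers everything

    def query(self, key: str) -> int:
        """Serving layer: merge the two views."""
        return self.batch_view[key] + self.speed_view[key]


if __name__ == "__main__":
    lam = LambdaCounter()
    for event in ["a", "a", "b"]:
        lam.ingest(event)
    lam.run_batch()
    lam.ingest("a")
    print(lam.query("a"))
```

Even in this toy, the counting logic exists twice (once per layer) and the query has to merge two views, which hints at the operational complexity the rest of the talk addresses.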
EXAMPLE LAMBDA STACK
Speed Layer
Batch Layer
EXAMPLE PROBLEM
HOW MANY PEOPLE USED MY APP TODAY?
HOW MANY UNIQUE USERS INTERACTED WITH MY APP TODAY?
Open Cupcake Time
App Identifier: appid = 87
Unique Device ID: deviceid = 12
The Lambda Architecture
1 MILLION (APPID, DEVICEID) PAIRS PER SECOND
Enter HyperLogLog
A method of estimating cardinality.
blob = update(integer, blob)
integer = estimate(blob)
Fixed blob size.
A few kilobytes to get 99% accuracy.
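The two-function interface above can be sketched directly. What follows is a minimal HyperLogLog under stated assumptions, not the implementation used in the demo: the register count (4096 one-byte registers, so a fixed 4 KB blob), the BLAKE2b hash, and the helper `new_blob` are choices made for this illustration. With 4096 registers the typical relative error of the standard estimator is about 1.04 / sqrt(4096), roughly 1.6%.

```python
import hashlib
import math

P = 12                            # bits of the hash used to pick a register
M = 1 << P                        # 4096 registers, one byte each
ALPHA = 0.7213 / (1 + 1.079 / M)  # standard bias-correction constant


def new_blob() -> bytes:
    """An empty sketch: every register starts at zero."""
    return bytes(M)


def update(value: int, blob: bytes) -> bytes:
    """Return a blob that also accounts for `value` (same size as the input)."""
    digest = hashlib.blake2b(str(value).encode(), digest_size=8).digest()
    h = int.from_bytes(digest, "big")
    idx = h >> (64 - P)                      # top P bits select the register
    rest = h & ((1 << (64 - P)) - 1)         # remaining 52 bits
    rank = (64 - P) - rest.bit_length() + 1  # position of the leftmost 1 bit
    if rank <= blob[idx]:
        return blob                          # nothing new learned
    regs = bytearray(blob)
    regs[idx] = rank
    return bytes(regs)


def estimate(blob: bytes) -> int:
    """Estimated number of distinct values folded into the blob."""
    raw = ALPHA * M * M / sum(2.0 ** -r for r in blob)
    zeros = blob.count(0)
    if raw <= 2.5 * M and zeros:
        # Small-range correction: fall back to linear counting.
        return round(M * math.log(M / zeros))
    return round(raw)


if __name__ == "__main__":
    blob = new_blob()
    for device_id in range(20_000):
        blob = update(device_id, blob)
    print(len(blob), estimate(blob))
```

Because the blob never grows and repeated values leave it unchanged, it can be stored as a single fixed-size column per app.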
  
DECLARE SQL STATEMENTS
PARAMS ARE APP ID & DEVICE ID
GET ROW FOR THIS APP ID FROM STATE
CREATE A HYPERLOGLOG STRUCTURE FROM THE ROW OR CREATE A NEW HLL IF NO ROW
ADD THIS UNIQUE ID TO THE HLL STRUCTURE
UPDATE ROW WITH NEW HLL BYTES AND THE COMPUTED ESTIMATE
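The slides walked through the stored procedure's code, which did not survive in this transcript. The steps named in the captions can be sketched as follows, with Python's built-in `sqlite3` standing in for the database and a compact HyperLogLog inlined so the block is self-contained. This is not the VoltDB procedure itself: the `hll` column, the function names, and the register count are assumptions; only the `estimates` table with `appid` and `devicecount` appears in the deck.

```python
import hashlib
import math
import sqlite3

P = 12                            # 2**12 = 4096 one-byte registers per app
M = 1 << P
ALPHA = 0.7213 / (1 + 1.079 / M)

# Step 1: declare the SQL statements up front.
SELECT_HLL = "SELECT hll FROM estimates WHERE appid = ?"
UPSERT_ROW = (
    "INSERT OR REPLACE INTO estimates (appid, devicecount, hll) VALUES (?, ?, ?)"
)


def hll_add(regs: bytearray, value: int) -> None:
    """Fold one value into the registers in place."""
    digest = hashlib.blake2b(str(value).encode(), digest_size=8).digest()
    h = int.from_bytes(digest, "big")
    idx = h >> (64 - P)
    rest = h & ((1 << (64 - P)) - 1)
    rank = (64 - P) - rest.bit_length() + 1
    regs[idx] = max(regs[idx], rank)


def hll_estimate(regs: bytearray) -> int:
    """Estimated distinct count, with the small-range correction."""
    raw = ALPHA * M * M / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * M and zeros:
        return round(M * math.log(M / zeros))
    return round(raw)


def open_db() -> sqlite3.Connection:
    """In-memory stand-in for the state table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE estimates ("
        " appid INTEGER PRIMARY KEY,"
        " devicecount INTEGER NOT NULL,"
        " hll BLOB NOT NULL)"
    )
    return conn


def count_device(conn: sqlite3.Connection, appid: int, deviceid: int) -> int:
    """Step 2: the parameters are the app ID and the device ID."""
    with conn:  # one transaction per call
        # Step 3: get the row for this app ID from state.
        row = conn.execute(SELECT_HLL, (appid,)).fetchone()
        # Step 4: rebuild the HLL from the row, or start a new one.
        regs = bytearray(row[0]) if row else bytearray(M)
        # Step 5: add this unique ID to the HLL structure.
        hll_add(regs, deviceid)
        # Step 6: write back the new HLL bytes and the computed estimate.
        estimate = hll_estimate(regs)
        conn.execute(UPSERT_ROW, (appid, estimate, bytes(regs)))
    return estimate


if __name__ == "__main__":
    db = open_db()
    for device in range(500):
        count_device(db, 87, device)
    print(db.execute("SELECT appid, devicecount FROM estimates").fetchall())
```

Storing the estimate next to the blob means reporting queries read a plain integer column and never touch the HyperLogLog code.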
ADVANTAGES
LESS COMPLEX OPERATIONALLY
LESS CODE IN FEWER PLACES
•  HyperLogLog code is used entirely within one stored procedure.
•  Client uses SQL + simple schema for queries & reporting.
Less Complex Development
SELECT appid, devicecount
FROM estimates
ORDER BY devicecount DESC
LIMIT 10;
DEMO
WANT TO CELEBRATE MIKE?
Grab your commemorative Stonebraker Turing award t-shirt.
For more details visit: www.voltdb.com/stonebrakershirt
QUESTIONS?
•  Use the chat window to type in your questions
•  Try VoltDB yourself:
Ø  Free trial of the Enterprise Edition:
•  www.voltdb.com/download
Ø  Try VoltDB in the Cloud
Ø  http://voltdb.com/products/cloud
Ø  Try the “Unique Devices” app
Ø  https://github.com/VoltDB/voltdb/tree/master/examples/uniquedevices
Ø  Open source version of VoltDB is available on github.com
THANK YOU!
47

More Related Content

What's hot

Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
NoSQLmatters
 
Building Pinterest Real-Time Ads Platform Using Kafka Streams
Building Pinterest Real-Time Ads Platform Using Kafka Streams Building Pinterest Real-Time Ads Platform Using Kafka Streams
Building Pinterest Real-Time Ads Platform Using Kafka Streams
confluent
 
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
Event Streaming Architecture for Industry 4.0 -  Abdelkrim Hadjidj & Jan Kuni...Event Streaming Architecture for Industry 4.0 -  Abdelkrim Hadjidj & Jan Kuni...
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
Flink Forward
 
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
confluent
 

What's hot (20)

Mike Stonebraker on Designing An Architecture For Real-time Event Processing
Mike Stonebraker on Designing An Architecture For Real-time Event ProcessingMike Stonebraker on Designing An Architecture For Real-time Event Processing
Mike Stonebraker on Designing An Architecture For Real-time Event Processing
 
Pipelining the Heroes with Kafka and Graph
Pipelining the Heroes with Kafka and GraphPipelining the Heroes with Kafka and Graph
Pipelining the Heroes with Kafka and Graph
 
Ingesting IoT data in Food Processing
Ingesting IoT data in Food ProcessingIngesting IoT data in Food Processing
Ingesting IoT data in Food Processing
 
Achieving Real-Time Analytics at Hermes | Zulf Qureshi, HVR and Dr. Stefan Ro...
Achieving Real-Time Analytics at Hermes | Zulf Qureshi, HVR and Dr. Stefan Ro...Achieving Real-Time Analytics at Hermes | Zulf Qureshi, HVR and Dr. Stefan Ro...
Achieving Real-Time Analytics at Hermes | Zulf Qureshi, HVR and Dr. Stefan Ro...
 
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
 
Modernising Change - Lime Point - Confluent - Kong
Modernising Change - Lime Point - Confluent - KongModernising Change - Lime Point - Confluent - Kong
Modernising Change - Lime Point - Confluent - Kong
 
Scylla Summit 2022: Scalable and Sustainable Supply Chains with DLT and ScyllaDB
Scylla Summit 2022: Scalable and Sustainable Supply Chains with DLT and ScyllaDBScylla Summit 2022: Scalable and Sustainable Supply Chains with DLT and ScyllaDB
Scylla Summit 2022: Scalable and Sustainable Supply Chains with DLT and ScyllaDB
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
What does an event mean? Manage the meaning of your data! | Andreas Wombacher...
What does an event mean? Manage the meaning of your data! | Andreas Wombacher...What does an event mean? Manage the meaning of your data! | Andreas Wombacher...
What does an event mean? Manage the meaning of your data! | Andreas Wombacher...
 
Generali connection platform_full
Generali connection platform_fullGenerali connection platform_full
Generali connection platform_full
 
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
 
Real time data processing and model inferncing platform with Kafka streams (N...
Real time data processing and model inferncing platform with Kafka streams (N...Real time data processing and model inferncing platform with Kafka streams (N...
Real time data processing and model inferncing platform with Kafka streams (N...
 
Building Pinterest Real-Time Ads Platform Using Kafka Streams
Building Pinterest Real-Time Ads Platform Using Kafka Streams Building Pinterest Real-Time Ads Platform Using Kafka Streams
Building Pinterest Real-Time Ads Platform Using Kafka Streams
 
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
 
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
Event Streaming Architecture for Industry 4.0 -  Abdelkrim Hadjidj & Jan Kuni...Event Streaming Architecture for Industry 4.0 -  Abdelkrim Hadjidj & Jan Kuni...
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
 
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
 
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
 
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
 
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...
 
Kafka and Kafka Streams in the Global Schibsted Data Platform
Kafka and Kafka Streams in the Global Schibsted Data PlatformKafka and Kafka Streams in the Global Schibsted Data Platform
Kafka and Kafka Streams in the Global Schibsted Data Platform
 

Viewers also liked

Viewers also liked (7)

Building Fast Applications for Streaming Data
Building Fast Applications for Streaming DataBuilding Fast Applications for Streaming Data
Building Fast Applications for Streaming Data
 
Hadoop at Tapad
Hadoop at TapadHadoop at Tapad
Hadoop at Tapad
 
Fast Data – the New Big Data
Fast Data – the New Big DataFast Data – the New Big Data
Fast Data – the New Big Data
 
Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)
 
Sf data mining_meetup
Sf data mining_meetupSf data mining_meetup
Sf data mining_meetup
 
Understanding the Top Four Use Cases for IoT
Understanding the Top Four Use Cases for IoTUnderstanding the Top Four Use Cases for IoT
Understanding the Top Four Use Cases for IoT
 
Fast Data Overview
Fast Data OverviewFast Data Overview
Fast Data Overview
 

Similar to The Expert Guide to Fast Data

SimplifyStreamingArchitecture
SimplifyStreamingArchitectureSimplifyStreamingArchitecture
SimplifyStreamingArchitecture
Maheedhar Gunturu
 
PPCD_And_AmazonRDS
PPCD_And_AmazonRDSPPCD_And_AmazonRDS
PPCD_And_AmazonRDS
Vibhor Kumar
 
Creating an Elastic Platform Using Kafka and Microservices in OpenShift
Creating an Elastic Platform Using Kafka and Microservices in OpenShift Creating an Elastic Platform Using Kafka and Microservices in OpenShift
Creating an Elastic Platform Using Kafka and Microservices in OpenShift
confluent
 

Similar to The Expert Guide to Fast Data (20)

Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksPowering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
 
How to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contendersHow to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contenders
 
SimplifyStreamingArchitecture
SimplifyStreamingArchitectureSimplifyStreamingArchitecture
SimplifyStreamingArchitecture
 
Eat Your Data and Have It Too: Get the Blazing Performance of In-Memory Opera...
Eat Your Data and Have It Too: Get the Blazing Performance of In-Memory Opera...Eat Your Data and Have It Too: Get the Blazing Performance of In-Memory Opera...
Eat Your Data and Have It Too: Get the Blazing Performance of In-Memory Opera...
 
NewSQL vs NoSQL for New OLTP
NewSQL vs NoSQL for New OLTPNewSQL vs NoSQL for New OLTP
NewSQL vs NoSQL for New OLTP
 
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal GemfireIMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
 
NoSQL and ACID
NoSQL and ACIDNoSQL and ACID
NoSQL and ACID
 
Navigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern DatabasesNavigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern Databases
 
Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...
Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...
Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming Aggregations
 
Database Virtualization: The Next Wave of Big Data
Database Virtualization: The Next Wave of Big DataDatabase Virtualization: The Next Wave of Big Data
Database Virtualization: The Next Wave of Big Data
 
Angular Owin Katana TypeScript
Angular Owin Katana TypeScriptAngular Owin Katana TypeScript
Angular Owin Katana TypeScript
 
PPCD_And_AmazonRDS
PPCD_And_AmazonRDSPPCD_And_AmazonRDS
PPCD_And_AmazonRDS
 
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
 
M|18 How Copart Switched to MariaDB and Reduced Costs During Growth
M|18 How Copart Switched to MariaDB and Reduced Costs During GrowthM|18 How Copart Switched to MariaDB and Reduced Costs During Growth
M|18 How Copart Switched to MariaDB and Reduced Costs During Growth
 
The role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial InformaticsThe role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial Informatics
 
Creating an Elastic Platform Using Kafka and Microservices in OpenShift
Creating an Elastic Platform Using Kafka and Microservices in OpenShift Creating an Elastic Platform Using Kafka and Microservices in OpenShift
Creating an Elastic Platform Using Kafka and Microservices in OpenShift
 
Best Practices for Architecting VDI with Flash Storage
Best Practices for Architecting VDI with Flash StorageBest Practices for Architecting VDI with Flash Storage
Best Practices for Architecting VDI with Flash Storage
 
Dubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architectureDubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architecture
 
ReactJS - Re-rendering pages in the age of the mutable DOM
ReactJS - Re-rendering pages in the age of the mutable DOMReactJS - Re-rendering pages in the age of the mutable DOM
ReactJS - Re-rendering pages in the age of the mutable DOM
 

More from VoltDB

More from VoltDB (16)

TripleLift: Preparing for a New Programmatic Ad-Tech World
TripleLift: Preparing for a New Programmatic Ad-Tech WorldTripleLift: Preparing for a New Programmatic Ad-Tech World
TripleLift: Preparing for a New Programmatic Ad-Tech World
 
Transforming Your Business with Fast Data – Five Use Case Examples
Transforming Your Business with Fast Data – Five Use Case ExamplesTransforming Your Business with Fast Data – Five Use Case Examples
Transforming Your Business with Fast Data – Five Use Case Examples
 
VoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big Data
VoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big DataVoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big Data
VoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big Data
 
Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...
Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...
Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...
 
Moving Beyond Batch: Transactional Databases for Real-time Data
Moving Beyond Batch: Transactional Databases for Real-time DataMoving Beyond Batch: Transactional Databases for Real-time Data
Moving Beyond Batch: Transactional Databases for Real-time Data
 
Stored Procedure Superpowers: A Developer’s Guide
Stored Procedure Superpowers: A Developer’s GuideStored Procedure Superpowers: A Developer’s Guide
Stored Procedure Superpowers: A Developer’s Guide
 
Fast Data Choices: 5 Strategies for Evaluating Alternative Business and Techn...
Fast Data Choices: 5 Strategies for Evaluating Alternative Business and Techn...Fast Data Choices: 5 Strategies for Evaluating Alternative Business and Techn...
Fast Data Choices: 5 Strategies for Evaluating Alternative Business and Techn...
 
How First to Value Beats First to Market: Case Studies of Fast Data Success
How First to Value Beats First to Market: Case Studies of Fast Data SuccessHow First to Value Beats First to Market: Case Studies of Fast Data Success
How First to Value Beats First to Market: Case Studies of Fast Data Success
 
Lessons Learned: The Impact of Fast Data for Personalization
Lessons Learned: The Impact of Fast Data for PersonalizationLessons Learned: The Impact of Fast Data for Personalization
Lessons Learned: The Impact of Fast Data for Personalization
 
Fast Data for Competitive Advantage: 4 Steps to Expand your Window of Opportu...
Fast Data for Competitive Advantage: 4 Steps to Expand your Window of Opportu...Fast Data for Competitive Advantage: 4 Steps to Expand your Window of Opportu...
Fast Data for Competitive Advantage: 4 Steps to Expand your Window of Opportu...
 
Understanding the Operational Database Infrastructure for IoT and Fast Data
Understanding the Operational Database Infrastructure for IoT and Fast DataUnderstanding the Operational Database Infrastructure for IoT and Fast Data
Understanding the Operational Database Infrastructure for IoT and Fast Data
 
The Two Generals Problem
The Two Generals ProblemThe Two Generals Problem
The Two Generals Problem
 
How to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top ContendersHow to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top Contenders
 
The 10 MS Rule: Getting to 'Yes' with Fast Data & Hadoop
The 10 MS Rule: Getting to 'Yes' with Fast Data & HadoopThe 10 MS Rule: Getting to 'Yes' with Fast Data & Hadoop
The 10 MS Rule: Getting to 'Yes' with Fast Data & Hadoop
 
The State of Streaming Analytics: The Need for Speed and Scale
The State of Streaming Analytics: The Need for Speed and ScaleThe State of Streaming Analytics: The Need for Speed and Scale
The State of Streaming Analytics: The Need for Speed and Scale
 
Fast Data: Achieving Real-Time Data Analysis Across the Financial Data Continuum
Fast Data: Achieving Real-Time Data Analysis Across the Financial Data ContinuumFast Data: Achieving Real-Time Data Analysis Across the Financial Data Continuum
Fast Data: Achieving Real-Time Data Analysis Across the Financial Data Continuum
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe

The Expert Guide to Fast Data

  • 1. THE EXPERT GUIDE TO FAST DATA: Why VoltDB is the solution to “Fast”
  • 2. © 2015 VoltDB OUR SPEAKERS • Dr. Mike Stonebraker of MIT, Co-founder of VoltDB • John Hugg, Senior Software Engineer, VoltDB
  • 3. OUTLINE • Characteristics of fast data • Non-workable solutions • VoltDB solution • Lambda architecture solution
  • 4. FAST DATA • Comes from humans • State management in multi-player internet games • E.g., leaderboards • Comes from the Internet of Things (IoT) • Real-time geo-positioning • E.g., Waze • Comes from both • E.g., stock market transactions
  • 5. FAST DATA RATES • 10 messages (transactions) per second • Use your cell phone • 1,000 transactions per second • Use an RDBMS (or whatever) • 100,000 transactions per second • Now it gets interesting… • From now on we will use “transaction” and “message” interchangeably
  • 6. REQUIREMENTS FOR FAST DATA APPLICATIONS • Keep up • Obviously • And continue to do so when your load changes • Only game in town is “scale out” • Not “scale up” • Avoid pokey products • Product 1 executes 1,000 messages per core per second • Product 2 executes 25,000 messages per core per second • At 100,000 messages per second, the difference between P1 and P2 is 100 cores versus 4 cores
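The core-count comparison on slide 6 is simple division. A minimal sketch, assuming throughput scales linearly with cores (the slide implies this but does not state it); the function name `cores_needed` is hypothetical:

```python
def cores_needed(target_rate: int, per_core_rate: int) -> int:
    """Smallest number of cores that can sustain target_rate messages/second,
    assuming each core handles per_core_rate messages/second (linear scaling)."""
    # Ceiling division using integers only, so there is no float rounding.
    return -(-target_rate // per_core_rate)


target = 100_000                           # messages per second, from the slide
p1_cores = cores_needed(target, 1_000)     # Product 1
p2_cores = cores_needed(target, 25_000)    # Product 2
print(p1_cores, p2_cores)
```

Under that assumption, the 25x gap in per-core throughput becomes a 25x gap in hardware for the same load.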
  • 7. REQUIREMENTS FOR FAST DATA APPLICATIONS • High-level language • SQL! • Don’t want to code in “message assembler” • Augmented by windowing operations • E.g., moving average of IBM stock price over the last 10 trades • So-called windowed aggregates
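To make the windowed-aggregate example on slide 7 concrete, here is a minimal Python sketch of a moving average over the last N trades. It only illustrates the semantics; it is not how VoltDB or any SQL engine implements windows, and the class name `MovingAverage` is hypothetical.

```python
from collections import deque


class MovingAverage:
    """Average of the most recent `window` prices (a count-based window)."""

    def __init__(self, window: int = 10):
        self.window = window
        self.prices = deque(maxlen=window)  # oldest price is evicted automatically
        self.total = 0.0

    def add(self, price: float) -> float:
        # If the window is full, the oldest price is about to fall out of it.
        if len(self.prices) == self.window:
            self.total -= self.prices[0]
        self.prices.append(price)
        self.total += price
        return self.total / len(self.prices)


ma = MovingAverage(window=10)
for trade_price in (100.0, 101.0, 99.0):
    latest_average = ma.add(trade_price)
```

Keeping a running total makes each update constant-time, which matters when every incoming trade recomputes the aggregate.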
  • 8. REQUIREMENTS FOR FAST DATA APPLICATIONS • High availability (HA) • I don’t know anybody who will tolerate downtime these days • Requires a backup machine • And real-time failover • As well as restore on recovery
  • 9. REQUIREMENTS FOR FAST DATA APPLICATIONS • Never lose my data • Unacceptable to lose my airline reservation • Or my standing on the leaderboard • Requires no data loss during failover • Unacceptable to drop transactions on the floor
  • 10. REQUIREMENTS FOR FAST DATA APPLICATIONS • Data Consistency • Unacceptable to sell the last widget to multiple customers • Or do a money transfer, where only half of it gets done • Or produce an incorrect leaderboard • Requires standard ACID transactions
  • 11. REQUIREMENTS FOR FAST DATA APPLICATIONS • Data Consistency for replicas • Unacceptable to sell the last widget to multiple customers during a node failure • Or do a money transfer, where only half of it gets done during a node failure • Requires standard ACID transactions • On replicas as well as data • Eventual consistency does not work!
  • 12. NON-SOLUTIONS FOR FAST DATA • RDBMSs (Oracle, MySQL, …) • NoSQL (Cassandra, Mongo, …)
  • 13. NON-SOLUTIONS FOR FAST DATA -- RDBMSs • Four major sources of overhead (assuming data sits in main memory) • Buffer pool overhead • Locking overhead • Write-ahead log overhead • Threading overhead • In aggregate these account for ~90% of the total time
  • 14. NON-SOLUTIONS FOR FAST DATA -- RDBMSs • Slow, slow, slow, slow • Disk-based system (buffer pool overhead) • Record-level locking too expensive • ARIES-style write-ahead logging too expensive • Multi-threading latches are a killer • Limited to a few thousand transactions per second • If you know you will never need to go faster, then this will work
  • 15. NON-SOLUTIONS FOR FAST DATA -- NOSQL • Low-level language (message assembler) • No ACID!!!! • Buffer pool and threading overhead still present • Worst of all worlds – low performance and low function
  • 16. SOLUTIONS FOR FAST DATA • High performance main memory SQL-ACID DBMS (VoltDB, Hekaton, Hana, …) • Complex event processing engine (CEP) (Storm, Streambase, …)
  • 17. EXAMPLE OPERATION ON FAST DATA • First hedge fund example • Find me a strawberry followed within 5 msec by a banana followed within 10 msec by a grape • Look for complex patterns in a fire hose • CEP is a natural here
  • 18. EXAMPLE OPERATION ON FAST DATA • Second hedge fund example • In a worldwide trading system • Keep the global state on the enterprise • For or against every stock in real time (msecs) • And ring the red telephone if there is too much risk • And don’t lose any messages!!! • Sweet spot for SQL-ACID-main-memory
  • 19. CHARACTERIZATION • CEP natural for “big pattern little state” applications • Main memory SQL natural for “big state little pattern” applications • Note that analytics applications are all in the second bucket • Anecdotal evidence that there are 3-4 big state problems for every big pattern problem • “an unnamed but reliable source”
  • 20. VOLTDB SOLUTION • SQL plus windows • Main memory • Scale out on N nodes • Very high performance • Figure 40,000 messages/transactions per core per second
  • 21. VOLTDB SOLUTION • ACID • With a lot of detailed trickery • ACID on local replicas • With more trickery • Optional ACID on remote replicas • Nobody is willing to pay the latency cost…
  • 22. VOLTDB FAST DATA DEMO • John Hugg, VoltDB Founding Engineering
  • 23. The Lambda Architecture: Now with an Example
  • 24. LAMBDA OVERVIEW • Batch processing is well understood and robust. Latency is pretty horrific. • Stream processing is immediate. Complex and not as robust to hardware or user failure. • Lambda Architecture says do both in parallel to compensate • [Diagram: Speed Layer & Batch Layer]
  • 25. EXAMPLE LAMBDA STACK • [Diagram: Speed Layer, Batch Layer]
  • 27. HOW MANY PEOPLE USED MY APP TODAY?
  • 28. HOW MANY UNIQUE USERS INTERACTED WITH MY APP TODAY?
  • 29. Open Cupcake Time • App Identifier: appid = 87 • Unique Device ID: deviceid = 12
  • 30. Open Cupcake Time • App Identifier: appid = 87 • Unique Device ID: deviceid = 12 • The Lambda Architecture
  • 31. 1 MILLION (APPID, DEVICEID) PAIRS PER SECOND
  • 32. Enter HyperLogLog • A method of estimating cardinality. • blob = update(integer, blob) • integer = estimate(blob) • Fixed blob size. • A few kilobytes to get 99% accuracy.
  • 33. Open Cupcake Time • App Identifier: appid = 87 • Unique Device ID: deviceid = 12
  • 34. Open Cupcake Time • App Identifier: appid = 87 • Unique Device ID: deviceid = 12
  • 35. DECLARE SQL STATEMENTS
  • 36. PARAMS ARE APP ID & DEVICE ID
  • 37. GET ROW FOR THIS APP ID FROM STATE
  • 38. CREATE A HYPERLOGLOG STRUCTURE FROM THE ROW OR CREATE A NEW HLL IF NO ROW
  • 39. ADD THIS UNIQUE ID TO THE HLL STRUCTURE
  • 40. UPDATE ROW WITH NEW HLL BYTES AND THE COMPUTED ESTIMATE
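The code shown on slides 35 to 40 is not in this transcript, so the following is only a rough Python sketch of the steps the slide titles describe, using the `update`/`estimate` interface from slide 32. It is not the VoltDB stored procedure and uses no VoltDB API. Everything here is an assumption made for illustration: the dict standing in for the state table, the 12-bit register index (a 4 KB blob), the SHA-1 based hash, and all function names.

```python
import hashlib
import math

P = 12        # assumed number of index bits
M = 1 << P    # 4096 one-byte registers, so the blob is a fixed 4 KB


def new_hll() -> bytes:
    """Empty HyperLogLog blob: all registers zero."""
    return bytes(M)


def _hash64(value: int) -> int:
    # Any well-mixed 64-bit hash would do; SHA-1 is used only for convenience.
    return int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")


def update(value: int, blob: bytes) -> bytes:
    """blob = update(integer, blob): record one value, return the new blob."""
    h = _hash64(value)
    idx = h >> (64 - P)                        # top P bits pick a register
    rest = h & ((1 << (64 - P)) - 1)           # remaining 52 bits
    rank = (64 - P) - rest.bit_length() + 1    # leading zeros in `rest`, plus 1
    if rank <= blob[idx]:
        return blob                            # register already at least this high
    regs = bytearray(blob)
    regs[idx] = rank
    return bytes(regs)


def estimate(blob: bytes) -> int:
    """integer = estimate(blob): approximate count of distinct values seen."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in blob)
    zeros = blob.count(0)
    if raw <= 2.5 * M and zeros:               # small-range (linear counting) correction
        raw = M * math.log(M / zeros)
    return round(raw)


def count_device(state: dict, appid: int, deviceid: int) -> int:
    """Mirror of the slide steps; `state` stands in for the per-app table."""
    row = state.get(appid)                     # get row for this app id
    blob = row[0] if row else new_hll()        # HLL from the row, or a new one
    blob = update(deviceid, blob)              # add this unique id
    est = estimate(blob)
    state[appid] = (blob, est)                 # store new bytes and the estimate
    return est


state = {}
count_device(state, appid=87, deviceid=12)     # the values shown on the slides
```

Because the blob size is fixed, the per-app row stays the same size no matter how many devices are seen, and replaying a device id leaves the estimate unchanged. With this register count the typical relative error is roughly 1.04/√4096, about 1.6%.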
  • 43. LESS CODE IN FEWER PLACES • HyperLogLog code is used entirely within one stored procedure. • Client uses SQL + simple schema for queries & reporting. • Less Complex Development • SELECT appid, devicecount FROM estimates ORDER BY devicecount DESC LIMIT 10;
  • 45. WANT TO CELEBRATE MIKE? Grab your commemorative Stonebraker Turing award t-shirt. For more details visit: www.voltdb.com/stonebrakershirt
  • 46. QUESTIONS? • Use the chat window to type in your questions • Try VoltDB yourself: • Free trial of the Enterprise Edition: www.voltdb.com/download • Try VoltDB in the Cloud: http://voltdb.com/products/cloud • Try the “Unique Devices” app: https://github.com/VoltDB/voltdb/tree/master/examples/uniquedevices • Open source version of VoltDB is available on github.com
  • 47. THANK YOU!