© 2016 MapR Technologies
Scaling and Streaming in the Extreme
Jim Scott – Director, Enterprise Strategy & Architecture
@kingmesal #bigdataeverywhere
Topics
•  Background
–  Fundamentals
•  Zeta Architecture overview
•  Messaging platform
–  Benefits
–  Building your applications
•  Including microservices
•  Story time with examples
Background
Data is the Problem
•  Stop talking about “Big Data” and start talking about “Data”
–  People argue over what constitutes “big data”
•  Enterprise Architecture is the solution
–  Your business applications depend on data
•  Size REALLY doesn’t matter
–  I don’t have “big data” right now
–  Stop worrying about when your data qualifies as “big”
–  Build your applications so you do NOT have to re-architect when your data finally qualifies as “big”
•  Prepare for success
All About Scaling
•  The Goal
–  Remove data silos and enable all ANALYTICS in one place
–  Remove the pain of figuring out how to move the data
•  How many servers do you need to run your business…
–  More than one application server?
–  More than one web server?
–  More than one database server?
–  More than one cluster?
•  Scalable resource management and infrastructure
Proper Allocation of Resources
Zeta Architecture
The Next Generation Enterprise Architecture
•  Dynamic compute resources
•  Common storage platform
•  Real-time application support
•  Flexible programming models
•  Deployment management
•  Solution based approach
•  Applications to operate a business
* This is a pluggable architecture
Advertising Platform on Zeta
Simplified Architecture
•  Fewer moving parts
–  Fewer things to go wrong
•  Better resource utilization
–  Scale any application up or down on demand
•  Common deployment model (new isolation model)
–  Repeatability between environments (dev, QA, production)
•  Improved integration testing
–  Listen to production streams in dev and QA (** this is a BIG DEAL! **)
•  Shared file system
–  Get at the data anywhere in the cluster
–  Simplifies business continuity
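Listening to production streams from dev and QA is possible because, in a log-based messaging system, reading a message does not remove it; each reader keeps its own position (offset). The in-memory Python sketch below illustrates that idea with two independent readers of one stream. The `Log` and `Reader` classes are hypothetical stand-ins, not the API of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Append-only event log; reading never removes events."""
    events: list[str] = field(default_factory=list)

    def append(self, event: str) -> int:
        self.events.append(event)
        return len(self.events) - 1  # offset of the new event


@dataclass
class Reader:
    """A consumer that tracks its own position in the log."""
    name: str
    offset: int = 0

    def poll(self, log: Log, max_events: int = 10) -> list[str]:
        batch = log.events[self.offset:self.offset + max_events]
        self.offset += len(batch)  # advance only this reader's position
        return batch


if __name__ == "__main__":
    stream = Log()
    for e in ["click", "view", "purchase"]:
        stream.append(e)

    prod, qa = Reader("production"), Reader("qa")
    print(prod.poll(stream))                # production reads everything
    print(qa.poll(stream, max_events=1))    # QA reads the same events at its own pace
    print(qa.poll(stream))
```

Because the readers share nothing but the log, a QA consumer that falls behind or is restarted has no effect on what production sees.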
Reminder…
Messaging platform
Ability to Handle the “Extreme”
•  1+ Trillion Events
–  per day
•  Millions of Producers
–  Billions of events per second
•  Multiple Consumers
–  Potentially for every event
•  Multiple Data Centers
–  Plan for success
–  Plan for drastic failure
Think that is crazy? Consider having 100 servers and performing:
Monitoring and application logs…
–  100 metrics per server
–  60 samples per minute
–  50 metrics per request
–  1,000 log entries per request (abnormally small; depends on log level)
–  1 million requests per day
~2 billion events per day, for one small(ish) use case
Extreme Average Reality
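The "~2 billion events per day" estimate can be checked with a few lines of arithmetic. This Python sketch assumes the 60 samples per minute apply to each of the 100 metrics on every server; under that assumption the total comes to about 1.9 billion events per day, consistent with the slide.

```python
# Back-of-envelope event count for the 100-server monitoring example.
SERVERS = 100
METRICS_PER_SERVER = 100
SAMPLES_PER_MINUTE = 60          # assumption: per metric, per server
MINUTES_PER_DAY = 24 * 60

METRICS_PER_REQUEST = 50
LOG_ENTRIES_PER_REQUEST = 1_000
REQUESTS_PER_DAY = 1_000_000


def daily_events() -> dict[str, int]:
    """Return per-source and total event counts for one day."""
    server_metrics = SERVERS * METRICS_PER_SERVER * SAMPLES_PER_MINUTE * MINUTES_PER_DAY
    request_metrics = METRICS_PER_REQUEST * REQUESTS_PER_DAY
    log_entries = LOG_ENTRIES_PER_REQUEST * REQUESTS_PER_DAY
    return {
        "server_metrics": server_metrics,
        "request_metrics": request_metrics,
        "log_entries": log_entries,
        "total": server_metrics + request_metrics + log_entries,
    }


if __name__ == "__main__":
    for source, count in daily_events().items():
        print(f"{source:>16}: {count:,}")
```

The request logs (1 billion) and the server metrics (864 million) dominate; the per-request metrics add only 50 million.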
Which products are we discussing?
Logical Dataflow
[Diagram: messaging, stream processors, analytics, consumers]
Considering a Messaging Platform
•  50-100k messages per second used to be good
–  Not good enough to handle decoupled communication between services
•  The Kafka model is BLAZING fast
–  Kafka 0.9 API with 200-byte messages
–  MapR Streams on a 5-node cluster sustained 18 million events/sec
–  Throughput of 3.5 GB/s and over 1.5 trillion events/day
•  Manual sharding is not a “great” solution
–  Adding more servers should be easy and foolproof, not painful
–  Yes, I have lived through this
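The benchmark figures above are mutually consistent, as a quick calculation shows. This Python sketch counts only the 200-byte payload (it ignores keys, headers, and protocol framing, which is an assumption about what the reported throughput measures): 18 million events/sec gives 3.6 GB/s (about 3.35 GiB/s), close to the quoted 3.5 GB/s, and that rate sustained for a day is about 1.56 trillion events.

```python
# Sanity check of the quoted benchmark numbers (payload bytes only).
EVENTS_PER_SECOND = 18_000_000
MESSAGE_BYTES = 200
SECONDS_PER_DAY = 86_400


def payload_bytes_per_second(events_per_sec: int, message_bytes: int) -> int:
    return events_per_sec * message_bytes


def events_per_day(events_per_sec: int) -> int:
    return events_per_sec * SECONDS_PER_DAY


if __name__ == "__main__":
    bps = payload_bytes_per_second(EVENTS_PER_SECOND, MESSAGE_BYTES)
    print(f"payload throughput: {bps / 1e9:.2f} GB/s ({bps / 2**30:.2f} GiB/s)")
    print(f"events per day:     {events_per_day(EVENTS_PER_SECOND):,}")
```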
Easy Scale-out
•  Stream processing engines built to consume via the Kafka API
–  Apache Flink
–  Apache Spark
–  Apache Apex (incubating)
–  Apache Storm
–  Apache Samza
–  Akka Streams (not Apache ;-)
–  StreamSets (effectively a stream processing engine, but different)
•  Build your own (simple API)
Advertising Server Use Case
•  The red line is a message request and response
–  Work distribution
•  1 to 1
•  1 to many
–  RPC Options
•  Manual sharding
•  Could be automated, but not easily
–  Decouple with a message
•  One topic to the ad engine
•  One topic per web server
•  What about exception cases?
–  Web server dies
–  Ad server dies
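The "one topic to the ad engine, one topic per web server" option amounts to request/response over messaging: each request carries the name of the requester's reply topic and a correlation ID. The Python sketch below simulates this with in-memory queues standing in for topics; the topic names, message classes, and ad-selection logic are all hypothetical.

```python
import queue
import uuid
from dataclasses import dataclass, field


@dataclass
class AdRequest:
    page: str
    reply_topic: str  # the requesting web server's own topic
    correlation_id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass
class AdResponse:
    correlation_id: str
    ad: str


# In-memory stand-ins for topics; a real deployment would use a streaming platform.
topics: dict[str, queue.Queue] = {
    "ad-requests": queue.Queue(),
    "web-1-replies": queue.Queue(),
    "web-2-replies": queue.Queue(),
}


def web_server_request(page: str, reply_topic: str) -> str:
    """Publish a request to the shared ad-engine topic; return its correlation ID."""
    req = AdRequest(page=page, reply_topic=reply_topic)
    topics["ad-requests"].put(req)
    return req.correlation_id


def ad_engine_step() -> None:
    """Consume one request and publish the answer to the requester's reply topic."""
    req: AdRequest = topics["ad-requests"].get_nowait()
    ad = f"ad-for-{req.page}"  # hypothetical ad selection
    topics[req.reply_topic].put(AdResponse(req.correlation_id, ad))


def web_server_receive(reply_topic: str) -> AdResponse:
    return topics[reply_topic].get_nowait()


if __name__ == "__main__":
    cid = web_server_request("sports", "web-2-replies")
    ad_engine_step()
    resp = web_server_receive("web-2-replies")
    assert resp.correlation_id == cid
    print(resp.ad)
```

Because requests wait in the shared request topic, another ad-engine instance could pick them up if one dies; a response addressed to a dead web server simply sits in that server's reply topic, so the application still has to decide whether to expire or reroute it.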
Behind the Curtains
[Diagram: producers, activity handler, historical data, interesting data, real-time analysis, results dashboard, anomaly detection]
Story time with examples
Ship picks up containers…
Singapore
Arrives at destination…
Tokyo
While en route to next destination…
Washington
Where does the data live…
Singapore, Tokyo, Washington
Feels like an Analogy
•  Data is generated on the ship
–  Must have an easy (i.e., foolproof) way to move the data off the ship
•  Each port stores the data from the ship
–  Moving data between locations
–  Analytics could happen at any location
•  This is a multi-data center time series data use case
–  Events from sensors = metrics
–  Same concepts as data center monitoring
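The "foolproof" requirement in the analogy usually comes down to making transfers resumable and duplicate-free. One common way to get that, sketched below in Python under the assumption that the ship's data is an append-only event log, is for each destination to remember the offset it has already received. The `Port` class and the sensor readings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    name: str
    events: list[dict] = field(default_factory=list)
    offset: int = 0  # how much of the ship's log this port already holds

    def receive_from(self, ship_log: list[dict]) -> int:
        """Copy only the events this port has not seen yet; safe to re-run."""
        new_events = ship_log[self.offset:]
        self.events.extend(new_events)
        self.offset += len(new_events)
        return len(new_events)


if __name__ == "__main__":
    ship_log: list[dict] = []
    singapore, tokyo = Port("Singapore"), Port("Tokyo")

    ship_log.append({"sensor": "container-7", "temp_c": 4.1})
    singapore.receive_from(ship_log)  # offloaded while docked in Singapore

    ship_log.append({"sensor": "container-7", "temp_c": 4.3})  # recorded en route
    print(tokyo.receive_from(ship_log))  # Tokyo receives both events
    print(tokyo.receive_from(ship_log))  # a repeated transfer sends nothing
    print(singapore.offset, tokyo.offset)
```

A production replication mechanism would also need durable offsets and acknowledgements; the sketch only shows why offset tracking makes a retried transfer harmless.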
[Diagram: sensors, time series data, metrics collector, document DB, analytics]
Story Time Summary
•  Resiliency in the metrics collector
–  Easily scalable regardless of how many sensors are added
•  Replicate events between data centers
–  Security, business continuity, data ownership
•  Perform analytics at the source for different use cases
–  Analytics on the event stream
–  Analytics on aggregated data in the database
–  Maybe you want your event stream to be your database…
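The summary distinguishes analytics on the raw event stream from analytics on aggregated data in a database. A typical bridge between the two is a time-bucketed rollup. This Python sketch (the `Reading` type and the one-minute bucket size are assumptions for illustration) condenses raw sensor readings into per-sensor, per-minute summaries of the kind that might be written to a document database.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    sensor: str
    ts: int  # epoch seconds
    value: float


def minute_rollups(readings: list[Reading]) -> dict:
    """Aggregate raw readings into per-sensor, per-minute count/min/max/mean."""
    buckets: dict[tuple[str, int], list[float]] = defaultdict(list)
    for r in readings:
        minute_start = r.ts - r.ts % 60  # align to the start of the minute
        buckets[(r.sensor, minute_start)].append(r.value)
    return {
        key: {
            "count": len(vals),
            "min": min(vals),
            "max": max(vals),
            "mean": sum(vals) / len(vals),
        }
        for key, vals in sorted(buckets.items())
    }


if __name__ == "__main__":
    raw = [Reading("s1", 0, 1.0), Reading("s1", 30, 3.0), Reading("s1", 61, 5.0)]
    for key, summary in minute_rollups(raw).items():
        print(key, summary)
```

The raw stream stays available for event-level analytics, while the rollups give the database a much smaller, query-friendly view of the same data.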
“The truth is out there.”
– Spock
Wrap up
Q&A
@kingmesal
jscott@mapr.com
Engage with us!
kingmesal

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 

Streaming in the Extreme

  • 1. Scaling and Streaming in the Extreme
      Jim Scott – Director, Enterprise Strategy & Architecture
      @kingmesal #bigdataeverywhere
      © 2016 MapR Technologies
  • 2. Topics
      •  Background
          –  Fundamentals
      •  Zeta Architecture overview
      •  Messaging platform
          –  Benefits
          –  Building your applications
              •  Including microservices
      •  Story time with examples
  • 3. Background
  • 4. Data is the Problem
      •  Stop talking about “Big Data” and start talking about “Data”
          –  People argue over “what constitutes big data?”
      •  Enterprise Architecture is the solution
          –  Your business applications depend on data
      •  Size REALLY doesn’t matter
          –  I don’t have “big data” right now
          –  Stop worrying about when you qualify your data as big
          –  Build your applications so you do NOT have to rearchitect when you finally qualify your data as “big”
      •  Prepare for success
  • 5. All About Scaling
      •  The Goal
          –  Remove data silos and enable all ANALYTICS in one place
          –  Remove the pain from figuring out how to get the data moved
      •  How many servers do you need to run your business…
          –  More than one application server?
          –  More than one web server?
          –  More than one database server?
          –  More than one cluster?
      •  Scalable resource management and infrastructure
  • 6. Proper Allocation of Resources
  • 7. Zeta Architecture
  • 8. The Next Generation Enterprise Architecture
      •  Dynamic compute resources
      •  Common storage platform
      •  Real-time application support
      •  Flexible programming models
      •  Deployment management
      •  Solution based approach
      •  Applications to operate a business
      * This is a pluggable architecture
  • 9. Advertising Platform on Zeta
  • 10. Simplified Architecture
      •  Fewer moving parts
          –  Fewer things to go wrong
      •  Better resource utilization
          –  Scale any application up or down on demand
      •  Common deployment model (new isolation model)
          –  Repeatability between environments (dev, QA, production)
      •  Improved integration testing
          –  Listen to production streams in dev and QA (** this is a BIG DEAL! **)
      •  Shared file system
          –  Get at the data anywhere in the cluster
          –  Simplifies business continuity
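Listening to production streams from dev and QA is safe in the Kafka model because each consumer group tracks its own position (offset) in the same append-only log, so a QA reader never takes events away from the production reader. The sketch below is a toy, in-memory model of that idea, not the Kafka or MapR Streams client API; `InMemoryTopic`, its methods, and the group names are hypothetical.

```python
from collections import defaultdict


class InMemoryTopic:
    """Toy append-only log with per-consumer-group offsets (hypothetical, for illustration)."""

    def __init__(self):
        self._log = []                     # events in arrival order; reading never removes them
        self._offsets = defaultdict(int)   # consumer group -> next offset that group will read

    def produce(self, event):
        self._log.append(event)
        return len(self._log) - 1          # offset assigned to this event

    def poll(self, group, max_events=10):
        start = self._offsets[group]
        batch = self._log[start:start + max_events]
        self._offsets[group] = start + len(batch)   # advance only this group's position
        return batch


topic = InMemoryTopic()
for i in range(5):
    topic.produce({"request_id": i})

prod_events = topic.poll("production-service")
qa_events = topic.poll("qa-replay")        # sees the same events; production is unaffected
print(len(prod_events), len(qa_events))    # 5 5
```

In this sketch a new group always starts at offset 0 and replays everything in the log; a real Kafka consumer with no committed offset picks its starting point from the `auto.offset.reset` setting (`earliest` or `latest`).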
  • 11. Reminder…
  • 12. Messaging platform
  • 13. Ability to Handle the “Extreme”
      •  1+ Trillion Events
          –  per day
      •  Millions of Producers
          –  Billions of events per second
      •  Multiple Consumers
          –  Potentially for every event
      •  Multiple Data Centers
          –  Plan for success
          –  Plan for drastic failure
      Think that is crazy? Consider having 100 servers and performing monitoring and application logging…
          –  100 metrics per server
          –  60 samples per minute
          –  50 metrics per request
          –  1,000 log entries per request (abnormally small; depends on log level)
          –  1 million requests per day
      ~2 billion events per day, for one small(ish) use case
      [Slide panels: Extreme | Average Reality]
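The “average reality” estimate on this slide can be checked directly. The breakdown below is one reading of the slide's numbers, and the interpretation is an assumption: metric samples are taken 60 times a minute for each metric on each server, and the 1 million requests per day are the total across all 100 servers.

```python
# Assumed interpretation of the slide: per-metric, per-server sampling, and
# 1 million requests per day in total across the cluster.
SERVERS = 100
METRICS_PER_SERVER = 100
SAMPLES_PER_MINUTE = 60
MINUTES_PER_DAY = 24 * 60
REQUESTS_PER_DAY = 1_000_000
METRICS_PER_REQUEST = 50
LOG_ENTRIES_PER_REQUEST = 1_000

server_metric_events = SERVERS * METRICS_PER_SERVER * SAMPLES_PER_MINUTE * MINUTES_PER_DAY
request_metric_events = REQUESTS_PER_DAY * METRICS_PER_REQUEST
log_events = REQUESTS_PER_DAY * LOG_ENTRIES_PER_REQUEST

total_events_per_day = server_metric_events + request_metric_events + log_events
print(f"{server_metric_events:,}")   # 864,000,000
print(f"{request_metric_events:,}")  # 50,000,000
print(f"{log_events:,}")             # 1,000,000,000
print(f"{total_events_per_day:,}")   # 1,914,000,000
```

Application logs account for slightly more than half of the total, and the sum of roughly 1.9 billion is what the slide rounds to ~2 billion events per day.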
  • 14. Which products are we discussing?
  • 15. Logical Dataflow [diagram labels: Messaging, Analytics, Consumers, Stream Processors]
  • 16. Considering a Messaging Platform
      •  50-100k messages per second used to be good
          –  Not really good enough to handle decoupled communication between services
      •  Kafka model is BLAZING fast
          –  Kafka 0.9 API with message sizes of 200 bytes
          –  MapR Streams on a 5-node cluster sustained 18 million events / sec
          –  Throughput of 3.5GB/s and over 1.5 trillion events / day
      •  Manual sharding is not a “great” solution
          –  Adding more servers should be easy and foolproof, not painful
          –  Yes, I have lived through this
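The benchmark figures on this slide are consistent with each other. Counting message payload only (an assumption: no protocol, batching, or replication overhead), 18 million 200-byte events per second over a full 86,400-second day give:

```python
MESSAGE_SIZE_BYTES = 200
EVENTS_PER_SECOND = 18_000_000
SECONDS_PER_DAY = 86_400

# Payload bytes only; wire overhead and replication are ignored in this estimate.
bytes_per_second = EVENTS_PER_SECOND * MESSAGE_SIZE_BYTES
events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY

print(bytes_per_second / 1e9)              # 3.6   (decimal GB/s)
print(round(bytes_per_second / 2**30, 2))  # 3.35  (GiB/s)
print(f"{events_per_day:,}")               # 1,555,200,000,000
```

That is 3.6 GB/s in decimal units (about 3.35 GiB/s), in line with the quoted 3.5GB/s, and roughly 1.56 trillion events per day, matching “over 1.5 trillion.”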
  • 17. Easy Scale-out
      •  Stream processing engines built to consume via the Kafka API
          –  Apache Flink
          –  Apache Spark
          –  Apache Apex (incubating)
          –  Apache Storm
          –  Apache Samza
          –  Akka Streams - not Apache ;-)
          –  StreamSets (effectively a stream processing engine, but different)
      •  Build your own (Simple API)
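Scale-out through the Kafka API comes from partitions: a topic is split into partitions, and the consumers in one group divide those partitions among themselves, so adding a consumer means reassigning partitions rather than re-sharding by hand. The sketch below shows a simplified round-robin assignment in the spirit of Kafka's `RoundRobinAssignor`; it is not that class's actual implementation, and the helper and consumer names are hypothetical.

```python
def assign_round_robin(partitions, consumers):
    """Hypothetical helper: deal partitions out to group members like cards.

    Each partition goes to exactly one consumer; members are sorted so every
    consumer in the group computes the same assignment independently.
    """
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


partitions = list(range(6))   # assumed: a topic with 6 partitions

before = assign_round_robin(partitions, ["worker-a", "worker-b"])
after = assign_round_robin(partitions, ["worker-a", "worker-b", "worker-c"])

print(before)  # {'worker-a': [0, 2, 4], 'worker-b': [1, 3, 5]}
print(after)   # {'worker-a': [0, 3], 'worker-b': [1, 4], 'worker-c': [2, 5]}
```

Parallelism within one group is capped by the partition count: a seventh consumer in this example would receive no partitions, which is why partition counts are usually chosen with future growth in mind.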
  • 18. Advertising Server Use Case
      •  The red line is a message request and response
          –  Work distribution
              •  1 to 1
              •  1 to many
          –  RPC options
              •  Manual sharding
              •  Could automate, not easy
          –  Decouple with a message
              •  One topic to the ad engine
              •  One topic per web server
      •  What about exception cases?
          –  Web server dies
          –  Ad server dies
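The decoupled request/response pattern on this slide can be sketched with two kinds of topics: a shared request topic that any ad engine instance reads, and one reply topic per web server, with each request carrying the name of the topic its reply should go to. This is a toy, in-memory illustration with hypothetical topic names, functions, and message fields, not a MapR Streams or Kafka client.

```python
from collections import defaultdict, deque

# In-memory stand-in for durable topics: topic name -> queue of messages.
topics = defaultdict(deque)

AD_REQUESTS = "ad-engine-requests"   # hypothetical name: one topic to the ad engine


def request_ad(server_id, request_id, page):
    """A web server publishes a request that names its own reply topic."""
    topics[AD_REQUESTS].append({
        "request_id": request_id,
        "page": page,
        "reply_to": f"web-server-{server_id}-replies",   # hypothetical: one topic per web server
    })


def serve_one_request():
    """An ad engine instance takes the next request and publishes a response to its reply topic."""
    if not topics[AD_REQUESTS]:
        return False
    request = topics[AD_REQUESTS].popleft()
    ad = f"ad-for-{request['page']}"   # placeholder for real ad selection
    topics[request["reply_to"]].append({"request_id": request["request_id"], "ad": ad})
    return True


request_ad(server_id=1, request_id="r1", page="sports")
request_ad(server_id=2, request_id="r2", page="news")

# Requests wait in the topic until an ad engine instance drains it.
while serve_one_request():
    pass

print(list(topics["web-server-1-replies"]))  # [{'request_id': 'r1', 'ad': 'ad-for-sports'}]
print(list(topics["web-server-2-replies"]))  # [{'request_id': 'r2', 'ad': 'ad-for-news'}]
```

Neither side needs the other's address, and there is no manual sharding: adding ad engine capacity means adding readers of the request topic. The exception cases on the slide are where durability matters; in this toy a message disappears once taken from the `deque`, whereas a persistent topic with committed offsets would let a replacement ad server re-read a request that was consumed but never answered, and let a restarted web server find replies still waiting in its topic.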
  • 19. Behind the Curtains [diagram labels: Producers, Activity Handler, Historical, Interesting Data, Real-time Analysis, Results, Dashboard, Anomaly Detection]
  • 20. Story time with examples
  • 21. Ship picks up containers… Singapore
  • 22. Arrives at destination… Tokyo
  • 23. While en route to next destination… Washington
  • 24. Where does the data live… Singapore, Washington, Tokyo
  • 25. Feels like an Analogy
      •  Data is generated on the ship
          –  Must have an easy (i.e., foolproof) way to move the data off the ship
      •  Each port stores the data from the ship
          –  Moving data between locations
          –  Analytics could happen at any location
      •  This is a multi-data center time series data use case
          –  Events from sensors = metrics
          –  Same concepts as data center monitoring
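One way to make moving data off the ship “foolproof” is to treat the ship's events as an append-only log and have each port copy from the last offset it has stored, so an interrupted transfer simply resumes without gaps or duplicates. This is a minimal, in-memory sketch under that assumption; `PortReplica` and the sensor fields are hypothetical, and real multi-data-center setups would rely on dedicated stream replication tooling rather than code like this.

```python
class PortReplica:
    """Hypothetical port-side copy of a ship's event log that resumes from its last offset."""

    def __init__(self, port):
        self.port = port
        self.events = []
        self.next_offset = 0   # checkpoint: first ship-log offset not yet copied here

    def sync_from(self, ship_log, max_events=None):
        """Copy events after the checkpoint; max_events simulates the link dropping mid-transfer."""
        end = len(ship_log)
        if max_events is not None:
            end = min(end, self.next_offset + max_events)
        for offset in range(self.next_offset, end):
            self.events.append(ship_log[offset])
            self.next_offset = offset + 1   # advance only after the event is stored
        return self.next_offset


# Hypothetical reefer-container temperature readings generated on board.
ship_log = [{"sensor_id": "reefer-7", "temp_c": t} for t in (3.9, 4.1, 4.0, 4.3, 4.2)]

singapore = PortReplica("Singapore")
singapore.sync_from(ship_log, max_events=2)   # connection drops after two events
singapore.sync_from(ship_log)                 # resumes at offset 2: no gaps, no duplicates
print(singapore.next_offset, len(singapore.events))  # 5 5

ship_log.append({"sensor_id": "reefer-7", "temp_c": 4.0})   # more data generated en route
tokyo = PortReplica("Tokyo")
tokyo.sync_from(ship_log)
print(len(tokyo.events))  # 6
```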
  • 26. [diagram labels: Sensors, Metrics Collector, Time series data, Document DB, Analytics]
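A common way to land sensor time series in a document database is for the metrics collector to group raw events into one document per sensor per time window, which keeps the document count manageable and makes per-window analytics cheap. The sketch below assumes hour-long buckets and hypothetical field names (`sensor_id`, `ts`, `value`); it only builds plain Python dicts and does not talk to any particular document DB.

```python
BUCKET_SECONDS = 3600   # assumed: one document per sensor per hour


def bucket_events(events):
    """Group raw sensor events into per-sensor, per-hour documents with simple summaries."""
    docs = {}
    for event in sorted(events, key=lambda e: e["ts"]):
        bucket_start = event["ts"] - event["ts"] % BUCKET_SECONDS
        key = (event["sensor_id"], bucket_start)
        doc = docs.setdefault(key, {
            "_id": f"{event['sensor_id']}:{bucket_start}",
            "sensor_id": event["sensor_id"],
            "bucket_start": bucket_start,
            "samples": [],
        })
        doc["samples"].append({"ts": event["ts"], "value": event["value"]})
    # Precompute per-bucket aggregates so analytics can skip the raw samples.
    for doc in docs.values():
        values = [s["value"] for s in doc["samples"]]
        doc["count"] = len(values)
        doc["min"], doc["max"] = min(values), max(values)
    return list(docs.values())


events = [
    {"sensor_id": "s1", "ts": 7200, "value": 4.0},
    {"sensor_id": "s1", "ts": 7260, "value": 4.4},
    {"sensor_id": "s1", "ts": 10800, "value": 3.9},
    {"sensor_id": "s2", "ts": 7230, "value": 12.5},
]
for doc in bucket_events(events):
    print(doc["_id"], doc["count"], doc["min"], doc["max"])
# s1:7200 2 4.0 4.4
# s2:7200 1 12.5 12.5
# s1:10800 1 3.9 3.9
```

The raw event stream can still feed real-time analytics directly, while these bucketed documents serve the “analytics on aggregated data in the database” side of the story.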
  • 27. Story Time Summary
      •  Resiliency in the metrics collector
          –  Easily scalable regardless of how many sensors are added
      •  Replicate events between data centers
          –  Security, business continuity, data ownership
      •  Perform analytics at the source for different use cases
          –  Analytics on the event stream
          –  Analytics on aggregated data in the database
          –  Maybe you want your event stream to be your database…
  • 28. “The truth is out there.” – Spock
  • 29. Wrap up
  • 30.
  • 31. Q&A
      @kingmesal
      jscott@mapr.com
      Engage with us!