Streaming in the Extreme
Jim Scott, Director, Enterprise Strategy & Architecture, MapR
Have you ever heard of Kafka? Are you ready to start streaming all of the events in your business? What happens to your streaming solution when you outgrow your single data center? What happens when you are at a company that is already running multiple data centers and you need to implement streaming across them? And what about when you need to scale to a trillion events per day? I will discuss technologies like Kafka that can be used to accomplish real-time, lossless messaging that works in both single and multiple globally dispersed data centers. I will also describe how to handle the data coming in through these streams in both batch processes as well as real-time processes.
Video Presentation:
https://youtu.be/Y0vxLgB1u9o
We describe an application of CEP using a microservice-based streaming architecture. We use the Drools business rule engine to apply rules in real time to an event stream of IoT traffic sensor data.
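Drools itself is a Java rule engine, so as a language-neutral sketch of the rule-on-stream pattern described above (the sensor fields, thresholds, and rule names below are invented for illustration), the core idea can be shown in a few lines of Python:

```python
# Minimal complex-event-processing sketch: apply declarative rules to a
# stream of IoT traffic-sensor events. Fields and thresholds are invented.

def over_speed(event):
    return event["avg_speed_kmh"] > 120

def congestion(event):
    return event["vehicle_count"] > 50 and event["avg_speed_kmh"] < 20

RULES = [
    ("OverSpeedAlert", over_speed),
    ("CongestionAlert", congestion),
]

def process(stream):
    """Yield (rule_name, event) pairs for every rule an event triggers."""
    for event in stream:
        for name, condition in RULES:
            if condition(event):
                yield name, event

events = [
    {"sensor": "s1", "avg_speed_kmh": 130, "vehicle_count": 10},
    {"sensor": "s2", "avg_speed_kmh": 15, "vehicle_count": 80},
]
alerts = list(process(events))
```

In a real Drools deployment the conditions live in DRL files and are evaluated by the engine's working memory rather than a Python loop, but the shape — declarative conditions applied to each event as it arrives — is the same.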
Learn what technologies enable a new, modern stream-based architecture to connect everything within application modules or across data centers and public clouds. Combine Kafka-style streaming and stream-processing frameworks like Spark and Flink with microservices, and completely rethink your big data architecture away from state and into data flows.
Slides from Strata+Hadoop Singapore 2016 presenting how Deep Learning can be scaled both vertically and horizontally, when to use CPUs and when to use GPUs.
How Spark is Enabling the New Wave of Converged Cloud Applications - MapR Technologies
Apache Spark has become the de-facto compute engine of choice for data engineers, developers, and data scientists because of its ability to run multiple analytic workloads with a single, general-purpose compute engine.
But is Spark alone sufficient for developing cloud-based big data applications? What are the other required components for supporting big data cloud processing? How can you accelerate the development of applications which extend across Spark and other frameworks such as Kafka, Hadoop, NoSQL databases, and more?
Open Source Innovations in the MapR Ecosystem Pack 2.0 - MapR Technologies
Over the summer, we introduced the MapR Ecosystem Pack (MEP) which is a natural evolution of our existing software update program that decouples open source ecosystem updates from core platform updates. MEP gives our customers quick access to the latest open source innovations while also ensuring cross-project compatibility in any given MEP version.
Big data real-time architectures -
How to do big data processing in real time?
What architectures are out there to support this paradigm?
Which one should we choose?
What advantages and pitfalls do they contain?
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka® - Confluent
Watch this talk here: https://www.confluent.io/online-talks/siem-modernization-build-a-situationally-aware-organization-with-apache-kafka
Of all security breaches, 85% are conducted with compromised credentials, often at the administration level or higher. Many IT groups think “security” means authentication, authorization and encryption (AAE), but these are often tick-boxes that rarely stop breaches. The internal threat surfaces of data streams or disk drives in a RAID set in a data centre are not the threat surface of interest.
Cyber or Threat organizations must conduct internal investigations of IT, subcontractors and supply chains without implicating the innocent. Therefore, they are organizationally air-gapped from IT. Some surveys indicate up to 10% of IT is under investigation at any given time.
Deploying a signal processing platform, such as Confluent Platform, allows organizations to evaluate data as soon as it becomes available enabling them to assess and mitigate risk before it arises. In Cyber or Threat Intelligence, events can be considered signals, and when analysts are hunting for threat actors, these don't appear as a single needle in a haystack, but as a series of needles. In this paradigm, streams of signals aggregate into signatures. This session shows how various sub-systems in Apache Kafka can be used to aggregate, integrate and attribute these signals into signatures of interest.
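The needles-not-a-needle idea can be sketched outside Kafka: group low-level signals by actor and flag a signature only when several distinct signal types line up. (The signal names and the threshold of three distinct types below are invented for illustration; a real pipeline would do this with Kafka Streams aggregations over keyed topics.)

```python
from collections import defaultdict

# Aggregate per-actor signals into "signatures of interest": an actor is
# flagged only when it produces several *distinct* signal types, mirroring
# the series-of-needles idea. Signal names and the threshold are invented.

def signatures(signals, min_distinct=3):
    by_actor = defaultdict(set)
    for actor, signal_type in signals:
        by_actor[actor].add(signal_type)
    return {a: sorted(t) for a, t in by_actor.items() if len(t) >= min_distinct}

signals = [
    ("alice", "failed_login"),
    ("alice", "privilege_escalation"),
    ("alice", "bulk_download"),
    ("bob", "failed_login"),
    ("bob", "failed_login"),   # repeats of one signal type are not a signature
]
flagged = signatures(signals)
```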
In this talk you will learn:
-The current threat landscape
-The difference between Security and Threat Intelligence
-The value of Confluent Platform as an ideal complement to hardware endpoint detection systems and batch-based SIEM warehouses
At BAADER, we design and engineer innovative and holistic solutions that ensure intelligent, safe, efficient and sustainable food processing in all phases, from the handling of live and raw protein materials to the finished food products. As a key player in the food value chain, we aim to take further significant steps toward greater efficiency, traceability, transparency, profitability, and sustainability through new digital solutions.
During our digital transformation we are working on two ends: on the one hand there are many brownfield factories unprepared for the digital journey, and on the other hand we have powerful greenfield technologies like Apache Kafka. Now we have to bring two mindsets together – robust food processing machinery and highly scalable software technologies. In this talk, we will present how we successfully started to ingest various kinds of IoT data into our Kafka cluster, spotlighted from both ends.
"Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at First Derivatives - Dataconomy Media
About the Author:
James is Senior Vice President, Fast Data Solutions at Kx, where he has worked as a developer since 2009. In his career to date, he has worked in the algorithmic trading space at many of the world’s top financial institutions using Kx - a low latency technology for analysing time series data. He is a certified Professional Risk Manager and holds a master's in Quantitative Finance from University College Dublin. In recent years he has built systems for clients ranging from start-ups to blue chip companies in data intensive industries such as pharma, utilities and telco.
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production - Codemotion
What’s important about a technology is what you can use it to do. I’ve looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Threat detection, data warehouse optimization, marketing efficiency, and biometric databases are some of the examples covered in this presentation.
Processing Real-Time Data at Scale: A streaming platform as a central nervous... - Confluent
(Marcus Urbatschek, Confluent)
Presentation during Confluent’s streaming event in Munich. This three-day hands-on course focused on how to build, manage, and monitor clusters using industry best-practices developed by the world’s foremost Apache Kafka™ experts. The sessions focused on how Kafka and the Confluent Platform work, how their main subsystems interact, and how to set up, manage, monitor, and tune your cluster.
Enterprises are increasingly demanding realtime analytics and insights to power use cases like personalization, monitoring and marketing. We will present Pulsar, a realtime streaming system used at eBay which can scale to millions of events per second with high availability and SQL-like language support, enabling realtime data enrichment, filtering and multi-dimensional metrics aggregation.
We will discuss how Pulsar integrates with a number of open source Apache technologies like Kafka, Hadoop and Kylin (Apache incubator) to achieve high scalability, availability and flexibility. We use Kafka to replay unprocessed events to avoid data loss and to stream realtime events into Hadoop, enabling reconciliation of data between realtime and batch. We use Kylin to provide multi-dimensional OLAP capabilities.
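The replay-to-avoid-loss idea rests on Kafka's model of an append-only log plus consumer offsets: a restart resumes from the last committed offset, so anything processed but not yet committed is reprocessed rather than dropped. A plain-Python stand-in for that semantics (this is not the Kafka API, just a toy log and offset store):

```python
# Toy append-only log with consumer offsets: on restart, consumption resumes
# from the last *committed* offset, so events consumed but never committed
# are replayed rather than lost. Plain-Python stand-in for Kafka semantics.

class Log:
    def __init__(self):
        self.entries = []

    def append(self, event):
        self.entries.append(event)

    def read_from(self, offset):
        return self.entries[offset:]

class Consumer:
    def __init__(self, log):
        self.log = log
        self.committed = 0          # durable offset, survives restarts
        self.processed = []

    def poll_and_process(self, commit=True):
        for event in self.log.read_from(self.committed):
            self.processed.append(event)
        if commit:
            self.committed = len(self.log.entries)

log = Log()
for e in ["e1", "e2", "e3"]:
    log.append(e)

crashed = Consumer(log)
crashed.poll_and_process(commit=False)   # crash before committing offsets...
restarted = Consumer(log)                # ...so a restart replays everything
restarted.poll_and_process()
```

The same replay mechanism is what lets a batch consumer re-read the full log into Hadoop for reconciliation against the realtime path.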
Building a real-time, scalable and intelligent programmatic ad buying platform - Jampp
After a brief introduction to programmatic ads and RTB, we go through the evolution of Jampp's data platform to handle the enormous amount of data we need to process.
Scylla Summit 2022: An Odyssey to ScyllaDB and Apache Kafka - ScyllaDB
Will LaForest is the Public Sector CTO for Confluent. In his current position, Will evangelizes how Apache Kafka, event-driven data-in-motion architecture, and open-source software are addressing mission challenges in government. He has spent 25 years wrangling data at massive scale. His technical career spans diverse areas from software engineering, NoSQL, data science, cloud computing, machine learning, and building statistical visualization software, but began with code slinging at DARPA as a teenager. Will holds degrees in mathematics and physics from the University of Virginia.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Data streams take many forms and their velocity is hard to tame. They can be myriads of tiny flows that you can collect and tame with Time-series Databases; continuous massive flows that you cannot stop, to tame with Data Stream Management Systems; continuous numerous flows that can turn into a torrent, to tame with Event-based Systems; and myriads of continuous flows of any size and speed that form an immense delta, to tame with Event-Driven Architectures. Enjoy this introductory talk!
This presentation looks at how to build an architecture for big and fast data. It reviews the Kappa & Lambda architectures and looks at the role Hazelcast Jet & IMDG can play in the Kappa architecture. It then proposes an evolution of the Kappa architecture to provide a transactional big data system.
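The heart of the Kappa architecture mentioned above is that the immutable event log is the system of record, so any view can be rebuilt, even with new logic, simply by replaying the log from the beginning. A minimal sketch (the event types and fold functions are invented for illustration):

```python
# Kappa-style reprocessing: state is a fold over an immutable event log, so
# adding or changing logic means replaying the same log with a new function,
# rather than maintaining a separate batch layer as in Lambda.

events = [("deposit", 100), ("withdraw", 30), ("deposit", 50)]

def balance(log):
    """Original view: fold the log into a running balance."""
    total = 0
    for kind, amount in log:
        total += amount if kind == "deposit" else -amount
    return total

def deposit_count(log):
    """A new view added later, computed by a full replay of the same log."""
    return sum(1 for kind, _ in log if kind == "deposit")
```

In a real deployment the log would be a Kafka topic with long retention and each fold a streaming job (e.g. Hazelcast Jet); the point is that both old and new views derive from one immutable source.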
Processing 19 billion messages in real time and NOT dying in the process - Jampp
Here is an introduction to the Jampp architecture for data processing. We walk through our journey of migrating to systems that allow us to process more data in real time.
Serhii Kholodniuk: What you need to know, before migrating data platform to GCP (Google Cloud Platform) - Lviv Startup Club
AI & BigData Online Day 2022
Website: https://aiconf.com.ua
Youtube: https://www.youtube.com/startuplviv
FB: https://www.facebook.com/aiconf
Building Pinterest Real-Time Ads Platform Using Kafka Streams - Confluent
Building Pinterest Real-Time Ads Platform Using Kafka Streams (Liquan Pei + Boyang Chen, Pinterest) Kafka Summit SF 2018
In this talk, we are sharing the experience of building Pinterest’s real-time Ads Platform utilizing Kafka Streams. The real-time budgeting system is the most mission-critical component of the Ads Platform, as it controls how each ad is delivered to maximize user, advertiser and Pinterest value. The system needs to handle over 50,000 queries per second (QPS) of impressions, requires less than five seconds of end-to-end latency, and recovers within five minutes during outages. It also needs to be scalable to handle the fast growth of Pinterest’s ads business.
The real-time budgeting system is composed of a real-time stream-stream joiner, a real-time spend aggregator and a spend predictor. At Pinterest’s scale, we need to overcome quite a few challenges to make each component work. For example, the stream-stream joiner needs to maintain terabyte-scale state while supporting fast recovery, and the real-time spend aggregator needs to publish to thousands of ads servers while supporting over one million read QPS. We chose Kafka Streams as it provides millisecond-level latency guarantees, scalable event-based processing and easy-to-use APIs. In the process of building the system, we performed extensive tuning of RocksDB, the Kafka Producer and Consumer, and pushed several open source contributions to Apache Kafka. We are also working on adding a remote checkpoint for Kafka Streams state to reduce cold-start time when adding more machines to the application. We believe that our experience can be beneficial to people who want to build real-time streaming solutions at large scale and deeply understand Kafka Streams.
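A windowed stream-stream join like the one described above can be sketched without Kafka Streams: buffer events from one side keyed by the join key, and emit a pair when the matching event arrives within the window. (The ad_id key, the impression/click pairing, and the window size are invented for the sketch; in Kafka Streams the buffered state would live in RocksDB-backed state stores, which is where the terabyte-scale state comes from.)

```python
# Toy windowed stream-stream join: pair an ad impression with a click on the
# same ad_id when both arrive within `window` time units of each other.

def windowed_join(impressions, clicks, window=10):
    """impressions/clicks: lists of (timestamp, ad_id). Returns joined ids."""
    merged = sorted(
        [(t, k, "impression") for t, k in impressions]
        + [(t, k, "click") for t, k in clicks]
    )
    pending = {}                        # ad_id -> impression timestamp
    joined = []
    for t, key, side in merged:
        if side == "impression":
            pending[key] = t
        elif key in pending and t - pending[key] <= window:
            joined.append(key)          # click matched within the window
            del pending[key]
    return joined

impressions = [(0, "ad1"), (2, "ad2")]
clicks = [(5, "ad1"), (20, "ad2")]      # ad2's click misses the window
```

A production joiner additionally evicts expired entries from `pending`, handles out-of-order arrivals, and checkpoints the buffer so it survives restarts.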
Presto: Optimizing Performance of SQL-on-Anything Engine - DataWorks Summit
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, the recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk explores this topic in detail and discusses the best use cases for Presto across several industries. In addition, we present recent Presto advancements such as geospatial analytics at scale, and the project roadmap going forward.
Applying Machine Learning to IoT: End to End Distributed Pipeline... - Carol McDonald
This presentation discusses the architecture of an end-to-end application that combines streaming data with machine learning to analyze and visualize, in real time, where and when Uber cars are clustered, so as to identify the most popular Uber locations.
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall
Modern cars produce data. Lots of data. And Formula 1 cars produce more than their share. I will present a working demonstration of how modern data streaming can be applied to the data acquisition and analysis problem posed by modern motorsports.
Instead of bringing multiple Formula 1 cars to the talk, I will show how we instrumented a high fidelity physics-based automotive simulator to produce realistic data from simulated cars running on the Spa-Francorchamps track. We move data from the cars, to the pits, to the engineers back at HQ.
The result is near real-time visualization and comparison of performance and a great exposition of how to move data using messaging systems like Kafka, and process data in real time with Apache Spark, then analyse data using SQL with Apache Drill.
Code available here: https://github.com/mapr-demos/racing-time-series
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda... - Codemotion
Telecom operators need to find operational anomalies in their networks very quickly. This need, however, is shared with many other industries as well, so there are lessons for all of us here. Spark plus a streaming architecture can solve these problems very nicely. I will present a practical architecture as well as design patterns and some detailed algorithms for detecting anomalies in event streams. These algorithms are simple but quite general and can be applied across a wide variety of situations.
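One of the simple-but-general patterns in this space is a rate-deviation detector: smooth the per-interval event count with an exponentially weighted moving average and flag intervals that stray too far from the baseline. (The smoothing factor and threshold below are invented for illustration; a real deployment, e.g. as a Spark Streaming stage, would tune both.)

```python
# EWMA-based anomaly detector for a stream of per-interval event counts:
# flag an interval whose count deviates from the smoothed baseline by more
# than `threshold` times the baseline. Parameters are illustrative only.

def detect_anomalies(counts, alpha=0.3, threshold=0.5):
    anomalies = []
    ewma = counts[0]                    # seed the baseline with the first point
    for i, c in enumerate(counts[1:], start=1):
        if abs(c - ewma) > threshold * ewma:
            anomalies.append(i)
        ewma = alpha * c + (1 - alpha) * ewma
    return anomalies

# Steady traffic around 100 with one spike and one dropout.
counts = [100, 102, 98, 300, 101, 99, 10, 100]
```

Note one known weakness visible even in this toy: anomalous points are folded into the baseline, temporarily inflating it; robust variants exclude flagged points from the update or use a median-based baseline instead.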
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl... - Mathieu Dumoulin
Examine the unique features of the MapR Converged Data Platform and how they can support production-grade enterprise machine learning. Ends with a live demo using H2O. Presented at Hadoop Summit Tokyo 2016.
How Spark is Enabling the New Wave of Converged Applications - MapR Technologies
Apache Spark has become the de-facto compute engine of choice for data engineers, developers, and data scientists because of its ability to run multiple analytic workloads with a single compute engine. Spark is speeding up data pipeline development, enabling richer predictive analytics, and bringing a new class of applications to market.
Spark and MapR Streams: A Motivating Example - Ian Downard
Businesses are discovering the untapped potential of large datasets and data streams through the use of technologies for big data processing and storage. By leveraging these assets they’re creating a new generation of applications that derive value from data they used to throw away. In this presentation Ian Downard shows how to build operational environments for these types of applications with the MapR Converged Data Platform, and he describes examples of next-generation applications that use Java APIs for MapR Streams, Apache Spark, Apache Hive, and MapR-DB. He shows how these technologies can be used to join and transform unbounded datasets to find signals and derive new data streams for a financial scenario involving real-time algorithmic trading and historical analysis using SQL. He also discusses how MapR enables you to run real-time data applications with the speed, reliability, and security you need for a production environment.
HUG Italy meet-up with Fabian Wilckens, MapR EMEA Solutions Architect, by SpagoWorld
These slides accompanied the talk "Think differently – Stream-based Microservice Architecture for Next-Generation Applications" by Fabian Wilckens (EMEA Solutions Architect, MapR Technologies Inc.) at the HUG Italy meet-up supported by Engineering Group's SpagoBI Labs, which took place in Milan, Italy on March 17th, 2016. Read more: http://bit.ly/1UydNuz
MapR 5.2: Getting More Value from the MapR Converged Community Edition, by MapR Technologies
Please join us to learn about the recent developments during the past year in the MapR Community Edition. In these slides, we will cover the following platform updates:
-Taking cluster monitoring to the next level with the Spyglass Initiative
-Real-time streaming with MapR Streams
-MapR-DB JSON document database and application development with OJAI
-Securing your data with access control expressions (ACEs)
Handling the Extremes: Scaling and Streaming in Finance, by MapR Technologies
Agility is king in the world of finance, and a message-driven architecture is a mechanism for building and managing discrete business functionality that supports that agility. To accommodate rapid innovation, data pipelines must evolve. However, implementing microservices can create management problems, such as tracking the number of instances running in an environment.
Microservices can be leveraged on a message-driven architecture, but the concept must be thoughtfully implemented to show the true value. Jim Scott outlines the core tenets of a message-driven architecture and explains its importance in real-time big data-enabled distributed systems within the realm of finance. Along the way, Jim covers financial use cases dealing with securities management and fraud—starting with ingestion of data from potentially hundreds of data sources to the required fan-out of that data without sacrificing performance—and discusses the pros and cons around operational capabilities and using the same data pipeline to support development and quality assurance practices.
Presented at Strata+Hadoop World NY 2016 by:
Jim Scott
MapR Technologies, Inc.
MapR 5.2: Getting More Value from the MapR Converged Data Platform, by MapR Technologies
End of maintenance for MapR 4.x is coming in January, so now is a good time to plan your upgrade. Please join us to learn about the recent developments during the past year in the MapR Platform that will make the upgrade effort this year worthwhile.
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications! by Tugdual Grall
Lambda Architecture is a useful framework for thinking about the design of big data applications. The framework was initially developed at Twitter. In this presentation you will learn, through concrete examples, how to build and deploy scalable, fault-tolerant applications, with a focus on Big Data and Hadoop.
This presentation was delivered at the OOP conference, Munich, Feb 2016
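The batch/speed/serving split at the heart of the Lambda Architecture can be sketched in a few lines (a toy illustration under assumed names, not code from the talk): the batch layer periodically recomputes views from the full immutable dataset, the speed layer holds increments for events that arrived since the last batch run, and the serving layer merges the two at query time.

```python
def batch_view(events):
    """Batch layer: recompute per-user totals from the full master dataset."""
    totals = {}
    for user, amount in events:
        totals[user] = totals.get(user, 0) + amount
    return totals

def serve(batch, speed, user):
    """Serving layer: merge the precomputed batch view with the speed
    layer's increments for not-yet-batched events."""
    return batch.get(user, 0) + speed.get(user, 0)

master = [("alice", 10), ("bob", 5), ("alice", 7)]   # immutable master dataset
batch = batch_view(master)                           # recomputed periodically
speed = {"alice": 3}                                 # recent, not-yet-batched events
```

Fault tolerance comes from the master dataset being append-only: if either layer produces a bad view, it can simply be recomputed.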
MapR is an ideal scalable platform for data science and specifically for operationalizing machine learning in the enterprise. This presentation gives specific reasons why.