®
© 2014 MapR Technologies 1
Ted Dunning
My Background
•  University, Startups
–  Aptex, MusicMatch, ID Analytics, Veoh
–  big data since before it was big
•  Open source
–  even before the internet
–  Calcite, Datafu, Drill, Kylin, Mahout, Myriad, Samoa, Singa, Storm,
Zookeeper
–  bought the beer at first HUG
•  VP Incubator at Apache Software Foundation
•  MapR
5 Views of Streaming
•  Enterprise teams
•  Micro-services
•  Abstract computing
•  Delivering results
•  Synthesis
File upload
web service
Files
Thumbnail
extraction
Transcoding
uploads
thumbs
recodes
Files
The first story
The enterprise bandit
Evolution Beyond Massive Monolithic Systems
•  In monoliths, complexity of mainframe systems led to
specialization
–  Storage
–  DB
–  Systems analysis
–  Programmers
–  Operations
–  Data entry
•  This made n-tier architectures a natural next step
3-tier Architecture
Web tier
Middle tier
Data tier
3-tier Architecture (essence)
Web tier
Middle tier
Data tier
3-tier, in Practice
Web tier
Middle tier
Data tier
(the same three-tier stack, repeated four times)
To Compensate, Add CONTROL
•  Enterprise service busses evolved to re-establish specialization
and centralize control
•  The key point is advanced processing embedded in a messaging
and control backbone
Enterprise Service Bus
Summary 1
•  Tiering leads to tic-tac-toe architectures
•  ESB leads to control-heavy balls of string
–  Better than balls of mud, but only just barely
•  This isn’t the answer
The second story
The startup Samurai
Time for Some Pendulum Swinging
•  ESBs are still alive and kicking
•  Backlash in progress
–  Google, Facebook, Netflix (after DVD mail), Amazon, LinkedIn (v. 2)
–  And a gazillion less well known companies
–  Heavily associated with SV startup scene
–  Heavily associated with developments in the JavaScript and Python communities
•  Meteor.js, node
•  Swagger
•  Consider transposing n-tier architectures
Start with Partitioning
RPC layer
Logic
Disk
(the same stack, repeated for each of three partitions)
Give Them a Job, and a Way to Communicate
Keep it very
light-weight!
Results Can Be Stunning
•  Companies who adopted this style are associated with stunning
success
–  See previous list
•  Companies that did not are associated with …
•  Of course, this may just be what happens when you hire smart
folk
–  Indeed, may only be possible with crazy smart teams
But …
•  Much of the discussion talks about RPC (call/response) services
•  This is fine, but limiting
•  Key idiom is deferred processing
–  Do something urgently
–  Queue message to complete later
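The deferred-processing idiom can be sketched in a few lines. This is only an in-memory illustration, not code from the talk: `handle_upload`, `transcode_worker`, and the use of Python's in-process `queue.Queue` are hypothetical stand-ins, and a real deployment would put a durable queue in the middle.

```python
import queue

# Hypothetical in-memory stand-in for a message queue (a real one would be durable).
pending = queue.Queue()
stored = {}      # file name -> raw bytes (result of the urgent work)
transcoded = {}  # file name -> result of the deferred work


def handle_upload(name: str, data: bytes) -> str:
    """Do the urgent part now and queue a message so the rest completes later."""
    stored[name] = data              # urgent: keep the upload
    pending.put({"file": name})      # deferred: request transcoding
    return "accepted"                # respond without waiting for the transcoder


def transcode_worker() -> int:
    """Drain the queue whenever the worker happens to be running."""
    done = 0
    while not pending.empty():
        msg = pending.get()
        # Placeholder for real transcoding: just record the input size.
        transcoded[msg["file"]] = len(stored[msg["file"]])
        done += 1
    return done


status = handle_upload("cat.mov", b"\x00" * 1024)
processed = transcode_worker()
```

The caller gets its answer as soon as the message is queued; the expensive step runs on the worker's schedule.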
But …
•  RPC is simple …
–  REST
–  Protobuf
–  Avro
–  Yada yada
•  No infrastructure needed beyond network + DNS + load balancer
–  And you have those already
•  Non-scalable persistence layers are often assumed
For Message Based Services
•  The message receiver may not even be running right now
–  So we need a persistent queue
•  The number of messages is plausibly very high
–  Total number of external requests (x 5-10)
–  Total number of persistence ops (x 2-3)
•  Millions of messages, GB/s of traffic quite plausible
•  Moving this to enterprise from startups adds challenges
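A rough worked example of the multipliers above. The workload figures are assumptions picked only for illustration, and adding the two terms together is one reading of the slide, not something it states; `estimate_message_rate` is a hypothetical helper.

```python
def estimate_message_rate(external_requests_per_s, persistence_ops_per_s,
                          request_fanout=(5, 10), persistence_fanout=(2, 3)):
    """Return a (low, high) estimate of messages per second.

    The fan-out ranges are the multipliers from the slide; summing the two
    contributions is an assumption made for this sketch.
    """
    low = (external_requests_per_s * request_fanout[0]
           + persistence_ops_per_s * persistence_fanout[0])
    high = (external_requests_per_s * request_fanout[1]
            + persistence_ops_per_s * persistence_fanout[1])
    return low, high


# Assumed workload: 100k external requests/s and 50k persistence ops/s.
low, high = estimate_message_rate(100_000, 50_000)
```

With these assumed inputs the estimate lands between 600,000 and 1,150,000 messages per second, which is why "millions of messages" is plausible.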
Summary 2
•  Micro-services requires durable, high-performance message
queues
•  These systems don’t just like durable, high performance queues
•  These systems require durability. And high performance.
•  Old school queues need not apply
The third story
The tired metaphor
The view from the clouds
Not What Turing had in Mind
•  Conventional programs are all about batch
–  Given a finite input, produce a finite output
–  Key parameters are time to complete, cost, halting, correctness
–  OK for batch processing, OK for query/response
•  Stream processing is different
–  Given a finite prefix of an infinite input, produce a prefix of infinite output
–  We are allowed to change our minds about some of the output
–  Key parameters are latency, cost, commitment level
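A minimal sketch of the "prefix in, prefix out" idea, using a running mean as the computation. The function name `running_mean` and the synthetic readings are assumptions for illustration; commitment levels are not modelled here beyond the fact that every emitted value is provisional.

```python
from itertools import count, islice


def running_mean(stream):
    """Yield (n, mean) after every element: one output prefix per input prefix."""
    total = 0.0
    for n, x in enumerate(stream, start=1):
        total += x
        yield n, total / n  # provisional: the next element may revise it


# An unbounded input; any consumer only ever sees a finite prefix of it.
readings = (20.0 + (i % 3) for i in count())
first_four = list(islice(running_mean(readings), 4))
```

The program never halts on its own; what matters is how soon each provisional answer appears and how much it may still change.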
[Figure: Input and Output, annotated with Δt and t_provisional]
Note that the existence of provisional outputs implies we have to
handle provisional inputs as well
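One way a downstream consumer might cope with provisional inputs is to treat each message as an update keyed by window and let later revisions overwrite earlier ones until a result is marked final. This is a sketch under assumptions: the tuple layout `(window, value, is_final)` and the policy of dropping revisions after a commit are invented for illustration.

```python
def apply_updates(updates):
    """Fold a stream of possibly revised results into the current state."""
    state = {}  # window id -> (value, is_final)
    for window, value, is_final in updates:
        previous = state.get(window)
        if previous is not None and previous[1]:
            continue  # already committed; this sketch ignores later revisions
        state[window] = (value, is_final)
    return state


updates = [
    ("12:00", 3, False),
    ("12:00", 5, False),  # revision of a provisional value
    ("12:05", 1, False),
    ("12:00", 6, True),   # upstream commits the 12:00 window
    ("12:00", 7, False),  # arrives after the commit and is dropped
]
state = apply_updates(updates)
```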
More Complications
•  Our latency isn’t the only story
•  We don’t get data instantly
•  So we don’t even start with zero latency
•  In fact, delay is the key problem in flow-based computing
Thought Problem
•  What is the temperature everywhere on earth
–  Right now
–  This is impossible
•  What was the temperature everywhere on earth an hour ago?
–  This is hard
•  What was the temperature everywhere on earth last month?
–  This is pretty easy
•  Does this mean we cannot talk about today’s weather?
The Problem of State
•  The present temperature of Earth may or may not exist
•  Only the delayed temperature can matter to a practical
computation
•  But computations in different places will see different delays
•  (promise me you know that I’m not just talking temperature)
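The temperature thought problem can be put in numbers. The sketch below assumes a handful of sensors whose readings take different amounts of time to arrive (the delays are made up) and asks what fraction of the readings for a given instant is visible once that instant is a certain age; `visible_fraction` is a hypothetical helper.

```python
def visible_fraction(arrival_delays_s, age_s):
    """Fraction of readings for a past instant that have arrived once it is age_s old."""
    arrived = sum(1 for delay in arrival_delays_s if delay <= age_s)
    return arrived / len(arrival_delays_s)


# Assumed per-sensor delivery delays in seconds (illustrative only).
delays = [1, 5, 60, 600, 7200, 86_400 * 3]

now_view = visible_fraction(delays, 0)               # "right now"
hour_view = visible_fraction(delays, 3600)           # "an hour ago"
month_view = visible_fraction(delays, 86_400 * 30)   # "last month"
```

Under these assumptions nothing is visible for "right now", two thirds of the readings are visible for "an hour ago", and everything is visible for "last month". A computation sitting elsewhere would see a different list of delays and therefore a different picture of the same instant.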
Summary 3
•  For important problems, we have to represent distributed
computations as messages and flows
•  This isn’t a matter of convenience
The n-th story
Getting stuff done in the real world
mySQL
mySQL
files
Web-site
Auth
service
Upload
service
Image
extractor
Transcoder
User
profiles
Search
User action
logging
Recommendation
analysis
mySQL
mySQL
mySQL
Oracle
Solr
Elastic
Micro-service Diagram
File upload
web service
Raw
files
Thumbnail
extraction
Transcoding
Video
metadata
Video
files
DB updater
DB
snapshots
Metadata
snaps
Metadata
snaps
Metadata
snap db
Live
metadata
DB
uploads
thumbs
recodes
video
adds
snaps
Image
files
Some Omitted Details
File upload
web service
Files
Thumbnail
extraction
Transcoding
uploads
thumbs
recodes
Files
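A toy version of this diagram, assuming the labels mean that the upload service writes to an `uploads` stream and that thumbnail extraction and transcoding each read it independently and write to `thumbs` and `recodes`. The `Streams` class, the consumer names, and the file suffixes are hypothetical; this is an in-memory sketch, not a real messaging API.

```python
from collections import defaultdict


class Streams:
    """Toy append-only topics; every consumer keeps its own read offset."""

    def __init__(self):
        self.topics = defaultdict(list)
        self.offsets = defaultdict(int)  # (consumer, topic) -> next index to read

    def publish(self, topic, msg):
        self.topics[topic].append(msg)

    def poll(self, consumer, topic):
        start = self.offsets[(consumer, topic)]
        msgs = self.topics[topic][start:]
        self.offsets[(consumer, topic)] = len(self.topics[topic])
        return msgs


bus = Streams()


def upload_service(name):
    bus.publish("uploads", name)


def thumbnail_extraction():
    for name in bus.poll("thumbnailer", "uploads"):
        bus.publish("thumbs", name + ".jpg")


def transcoding():
    for name in bus.poll("transcoder", "uploads"):
        bus.publish("recodes", name + ".mp4")


upload_service("a")
upload_service("b")
thumbnail_extraction()
transcoding()
```

Because each consumer tracks its own offset, the two services read the same uploads without coordinating with each other or with the uploader.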
More Omitted Details
Thumbnail
extraction
uploads
thumbs
metrics
exceptions
checkpoints
Input
Output
Monitoring
Restart
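The extra streams on this slide might look like the following for a single service. This is a sketch under assumptions: `run_thumbnailer`, the `.mov` check, and the use of plain lists for the `thumbs`, `metrics`, `exceptions`, and `checkpoints` streams are all invented for illustration.

```python
def run_thumbnailer(uploads, checkpoint=0):
    """Process uploads[checkpoint:] and return the side streams named on the slide."""
    thumbs, metrics, exceptions, checkpoints = [], [], [], []
    for offset in range(checkpoint, len(uploads)):
        name = uploads[offset]
        try:
            if not name.endswith(".mov"):
                raise ValueError(f"unsupported file: {name}")
            thumbs.append(name[:-4] + ".jpg")   # output
        except ValueError as err:
            exceptions.append(str(err))         # monitoring
        checkpoints.append(offset + 1)          # restart point: next offset to read
    metrics.append({"processed": len(uploads) - checkpoint,
                    "failed": len(exceptions)})  # monitoring
    return thumbs, metrics, exceptions, checkpoints


uploads = ["a.mov", "b.txt", "c.mov"]
thumbs, metrics, exceptions, checkpoints = run_thumbnailer(uploads)

# Simulate a restart: more uploads arrive and we resume from the last checkpoint.
uploads.append("d.mov")
more_thumbs, _, _, more_checkpoints = run_thumbnailer(uploads, checkpoint=checkpoints[-1])
```

After the restart only the new upload is processed, because the checkpoint records how far into the input the previous run got.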
Some Real World Implications
•  Messaging must be durable and infrastructural
–  Can’t depend on sender or receiver actually running
•  Messages aren’t great for everything
–  1TB message?
•  We need (scalable) files
•  We need (scalable) tables
•  We need (scalable) streams
•  We still should isolate persistence if possible
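One common way to combine files and messages for large payloads is to write the payload to a file and send only a small reference through the stream (often called the claim-check pattern). The slide does not prescribe this; the sketch below assumes a local temporary directory in place of a scalable file system and a Python list in place of a stream, and `publish_large` and `consume` are hypothetical names.

```python
import hashlib
import json
import os
import tempfile


def publish_large(payload: bytes, directory: str, stream: list) -> dict:
    """Write the payload to a file and publish only a small reference to it."""
    digest = hashlib.sha256(payload).hexdigest()
    path = os.path.join(directory, digest)
    with open(path, "wb") as f:
        f.write(payload)
    msg = {"path": path, "bytes": len(payload), "sha256": digest}
    stream.append(json.dumps(msg))
    return msg


def consume(stream: list) -> bytes:
    """Read the latest reference, fetch the file, and verify its checksum."""
    msg = json.loads(stream[-1])
    with open(msg["path"], "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != msg["sha256"]:
        raise ValueError("payload does not match the reference")
    return data


with tempfile.TemporaryDirectory() as d:
    stream = []
    sent = publish_large(b"x" * 1_000_000, d, stream)
    received = consume(stream)
    message_size = len(stream[0])
```

The message stays a few hundred bytes no matter how large the file grows, so the stream carries events and the file system carries bulk data.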
Some Real World Notes
•  Durable high-speed queuing is required
–  Kafka does this
–  MapR Streams does this
–  Very little else does it
•  Traditional messaging is not even close
–  With durability, most queues drop to <10k messages/second (MB/s)
–  High performance systems handle >1 M messages/second (GB/s)
•  Latencies less than 1ms are handled by different systems
•  Global scale introduces additional constraints not considered
here
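The message rates and bandwidths quoted above line up if messages are about 1 kB each. That message size is an assumption for this sketch and is not stated on the slide; `bandwidth_mb_per_s` is a hypothetical helper.

```python
def bandwidth_mb_per_s(messages_per_s, message_bytes=1_000):
    """Convert a message rate to MB/s for an assumed message size in bytes."""
    return messages_per_s * message_bytes / 1_000_000


traditional = bandwidth_mb_per_s(10_000)     # durable traditional queue
high_perf = bandwidth_mb_per_s(1_000_000)    # high-performance system
```

At 1 kB per message, 10k messages/second is 10 MB/s and 1 M messages/second is 1000 MB/s, i.e. 1 GB/s.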
Summary
•  Micro-services is the natural evolution
•  Micro-services implies durable, fast, Kafka-esque queuing
•  Macro and micro architectures are required
•  Infrastructure has to include more than just
queues. Must have files and tables also
[Figures repeated from earlier slides: the 3-tier stacks (Web tier, Middle tier, Data tier) and the Thumbnail extraction service (uploads, thumbs, metrics, exceptions, checkpoints; Input, Output, Monitoring, Restart)]
Other Short Books by Ted Dunning & Ellen Friedman
•  For sale from Amazon or O’Reilly
•  Free e-books currently available courtesy of MapR
http://bit.ly/ebook-real-world-hadoop
http://bit.ly/mapr-tsdb-ebook
http://bit.ly/ebook-anomaly
http://bit.ly/recommendation-ebook
Sharing Big Data Safely
by Ted Dunning and Ellen Friedman © Oct 2016 (published by O’Reilly)
Free copies courtesy @MapR
Download eBook
http://bit.ly/mapr-sharing-big-data
Book signing for print book:
Data Day Texas 16 Jan 2016
Q&A
@mapr, @ted_dunning maprtech
tdunning@mapr.tech.com
Engage with us!
MapR
maprtech
mapr-technologies

Modern streaming analytics, flow versus state
