Modern streaming analytics, flow versus state
1.
© 2014 MapR Technologies 1
Ted Dunning
2.
My Background
• University, Startups
– Aptex, MusicMatch, ID Analytics, Veoh
– big data since before it was big
• Open source
– even before the internet
– Calcite, Datafu, Drill, Kylin, Mahout, Myriad, Samoa, Singa, Storm, ZooKeeper
– bought the beer at first HUG
• VP Incubator at Apache Software Foundation
• MapR
3.
5 Views of Streaming
• Enterprise teams
• Micro-services
• Abstract computing
• Delivering results
• Synthesis
4.
[Diagram labels: File upload web service; Files; Thumbnail extraction; Transcoding; uploads; thumbs; recodes]
5.
The first story
The enterprise bandit
6.
Evolution Beyond Massive Monolithic Systems
• In monoliths, complexity of mainframe systems led to
specialization
– Storage
– DB
– Systems analysis
– Programmers
– Operations
– Data entry
• This made n-tier architectures a natural next step
7.
3-tier Architecture
Web tier
Middle tier
Data tier
8.
3-tier Architecture (essence)
Web tier
Middle tier
Data tier
9.
3-tier, in Practice
[Diagram: four Web tier / Middle tier / Data tier stacks]
10.
To Compensate, Add CONTROL
• Enterprise service buses evolved to re-establish specialization
and centralize control
• The key point is advanced processing embedded in a messaging
and control backbone
12.
Summary 1
• Tiering leads to tic-tac-toe architectures
• ESB leads to control-heavy balls of string
– Better than balls of mud, but only just barely
• This isn’t the answer
13.
The second story
The startup Samurai
14.
Time for Some Pendulum Swinging
• ESBs are still alive and kicking
• Backlash in progress
– Google, Facebook, Netflix (after DVD mail), Amazon, LinkedIn (v. 2)
– And a gazillion less well known companies
– Heavily associated with SV startup scene
– Heavily associated with developments in the JavaScript and Python communities
• Meteor.js, Node.js
• Swagger
• Consider transposing n-tier architectures
15.
Start with Partitioning
[Diagram: three stacks, each with its own RPC layer, Logic, and Disk]
16.
Give Them a Job, and a Way to Communicate
Keep it very
light-weight!
17.
Results Can Be Stunning
• Companies who adopted this style are associated with stunning
success
– See previous list
• Companies that did not are associated with …
• Of course, this may just be what happens when you hire smart
folk
– Indeed, may only be possible with crazy smart teams
18.
But …
• Much of the discussion talks about RPC (call/response) services
• This is fine, but limiting
• Key idiom is deferred processing
– Do something urgently
– Queue message to complete later
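The deferred-processing idiom can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: the in-process `queue.Queue` stands in for a real message queue, and `handle_upload`, `worker`, and the `make_thumbnail` task name are hypothetical.

```python
import queue
import threading


def handle_upload(filename, work_queue):
    """Do the urgent part now and queue a message to finish later."""
    # Urgent: accept the request. Deferred: the thumbnail work is only queued.
    work_queue.put({"task": "make_thumbnail", "file": filename})
    return f"accepted {filename}"


def worker(work_queue, completed):
    """Complete deferred work whenever messages arrive."""
    while True:
        message = work_queue.get()
        if message is None:  # sentinel (an assumption of this sketch): stop
            break
        completed.append(f"{message['task']}:{message['file']}")


def run_demo():
    work_queue = queue.Queue()  # stand-in for a durable queue
    completed = []
    thread = threading.Thread(target=worker, args=(work_queue, completed))
    thread.start()
    replies = [handle_upload(name, work_queue) for name in ("a.mp4", "b.mp4")]
    work_queue.put(None)
    thread.join()
    return replies, completed
```

The caller gets its reply as soon as the message is queued; the slow work happens on the worker's schedule.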
19.
But …
• RPC is simple …
– REST
– Protobuf
– Avro
– Yada yada
• No infrastructure needed beyond network + DNS + load balancer
– And you have those already
• Non-scalable persistence layers are often assumed
20.
For Message Based Services
• The message receiver may not even be running right now
– So we need a persistent queue
• The number of messages is plausibly very high
– Total number of external requests (x 5-10)
– Total number of persistence ops (x 2-3)
• Millions of messages, GB/s of traffic quite plausible
• Moving this to enterprise from startups adds challenges
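To make the "receiver may not even be running" point concrete, here is a small sketch of a persistent queue built as an append-only file with reader-side offsets. The `FileLog` class and its methods are hypothetical names for illustration only; a production system such as Kafka is far more involved.

```python
import json
import os
import tempfile


class FileLog:
    """Append-only log on disk; a hypothetical stand-in for a durable queue."""

    def __init__(self, path):
        self.path = path

    def append(self, message):
        # The sender needs only the log, not a running receiver.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the bytes to disk

    def read_from(self, offset):
        """Return (messages, next_offset), starting at message number offset."""
        if not os.path.exists(self.path):
            return [], offset
        with open(self.path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        messages = [json.loads(line) for line in lines[offset:]]
        return messages, offset + len(messages)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        log = FileLog(os.path.join(tmp, "uploads.log"))
        # The sender runs while no receiver exists.
        log.append({"file": "a.mp4"})
        log.append({"file": "b.mp4"})
        # The receiver starts later and catches up from its last offset.
        first, offset = log.read_from(0)
        log.append({"file": "c.mp4"})
        second, offset = log.read_from(offset)
        return first, second, offset
```

Because the receiver tracks its own offset, it can stop, restart, and resume without the sender noticing.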
21.
Summary 2
• Micro-services require durable, high-performance message queues
• These systems don’t just like durable, high performance queues
• These systems require durability. And high performance.
• Old school queues need not apply
22.
The third story
The tired metaphor
23.
The view from the clouds
24.
Not What Turing had in Mind
• Conventional programs are all about batch
– Given a finite input, produce a finite output
– Key parameters are time to complete, cost, halting, correctness
– OK for batch processing, OK for query/response
• Stream processing is different
– Given a finite prefix of an infinite input, produce a prefix of infinite output
– We are allowed to change our minds about some of the output
– Key parameters are latency, cost, commitment level
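One way to picture "a prefix of infinite output" with changeable answers is a windowed count that emits provisional results as events arrive and a final result once it stops accepting late data. The sketch below is an assumption-laden illustration: the window size, the `allowed_lateness` cutoff, and the "provisional"/"final" labels are choices made here, not something the slides specify.

```python
def windowed_counts(events, window=10, allowed_lateness=5):
    """Yield (status, window_start, count) for a stream of event times.

    events may be an unbounded iterator; output is produced as input arrives.
    """
    counts = {}        # window_start -> count so far (still open to revision)
    finalized = set()  # windows we have committed to
    max_seen = None
    for t in events:
        start = (t // window) * window
        if start in finalized:
            continue  # too late: this window is already committed
        counts[start] = counts.get(start, 0) + 1
        yield ("provisional", start, counts[start])
        max_seen = t if max_seen is None else max(max_seen, t)
        watermark = max_seen - allowed_lateness
        # Commit every window that ended at or before the watermark.
        for s in sorted(counts):
            if s + window <= watermark:
                finalized.add(s)
                yield ("final", s, counts.pop(s))
```

For the input `[1, 3, 12, 4, 16, 2, 27]`, the late event at time 4 revises the provisional count for window 0 from 2 to 3 before that window is finalized, while the event at time 2 arrives after the commitment and is dropped. Lowering `allowed_lateness` reduces latency at the cost of dropping more late data.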
25.
[Diagram: Input and Output timelines, with Δt and t_provisional marked]
Note that the existence of provisional outputs implies we have to handle provisional inputs as well
26.
More Complications
• Our latency isn’t the only story
• We don’t get data instantly
• So we don’t even start with zero latency
• In fact, delay is the key problem in flow-based computing
27.
Thought Problem
• What is the temperature everywhere on earth right now?
– This is impossible
• What was the temperature everywhere on earth an hour ago?
– This is hard
• What was the temperature everywhere on earth last month?
– This is pretty easy
• Does this mean we cannot talk about today’s weather?
28.
The Problem of State
• The present temperature of Earth may or may not exist
• Only the delayed temperature can matter to a practical
computation
• But computations in different places will see different delays
• (promise me you know that I’m not just talking temperature)
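A tiny sketch of why different places see different state: each observer only sees readings that are at least as old as its own delay. The function name, the readings, and the delays below are made up for illustration.

```python
def observed_value(readings, now, delay):
    """Latest reading with a timestamp of at most now - delay, or None."""
    visible = [(t, v) for t, v in readings if t <= now - delay]
    return max(visible)[1] if visible else None


# Hypothetical (timestamp in seconds, temperature in degrees C) readings.
readings = [(0, 14.0), (60, 15.5), (120, 17.0)]

nearby = observed_value(readings, now=130, delay=5)     # sees the newest reading
remote = observed_value(readings, now=130, delay=30)    # still sees an older one
isolated = observed_value(readings, now=130, delay=200) # has seen nothing yet
```

All three observers ask the same question at the same instant and get three different answers, none of which is "the" present value.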
29.
Summary 3
• For important problems, we have to represent distributed
computations as messages and flows
• This isn’t a matter of convenience
30.
The n-th story
Getting stuff done in
the real world
31.
[Diagram labels: Web-site; Auth service; Upload service; Image extractor; Transcoder; User profiles; Search; User action logging; Recommendation analysis; data stores: MySQL (several), Oracle, Solr, Elastic, files]
33.
Micro-service Diagram
[Diagram labels: File upload web service; Raw files; Thumbnail extraction; Transcoding; Video metadata; Video files; Image files; DB updater; DB snapshots; Metadata snaps; Live metadata DB; uploads; thumbs; recodes; video adds; snaps]
34.
Some Omitted Details
[Diagram labels: File upload web service; Files; Thumbnail extraction; Transcoding; uploads; thumbs; recodes]
35.
More Omitted details
[Diagram labels: Thumbnail extraction; uploads; thumbs; metrics; exceptions; checkpoints; Input; Output; Monitoring; Restart]
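One plausible reading of these labels is a service that reads uploads as input, writes thumbs as output, reports metrics and exceptions for monitoring, and records checkpoints so it can restart where it left off. The sketch below follows that reading; plain Python lists stand in for the streams, and `run_extractor`, `new_state`, and `fake_thumbnail` are hypothetical names.

```python
def new_state():
    """Empty side streams plus a checkpoint (index of the next input)."""
    return {"thumbs": [], "metrics": [], "exceptions": [], "checkpoint": 0}


def fake_thumbnail(name):
    """Placeholder for real thumbnail extraction."""
    if not name.endswith(".mp4"):
        raise ValueError(f"unsupported file: {name}")
    return name.replace(".mp4", ".jpg")


def run_extractor(uploads, state, make_thumbnail):
    """Process uploads from the last checkpoint onward and update state."""
    processed = 0
    for offset in range(state["checkpoint"], len(uploads)):
        try:
            state["thumbs"].append(make_thumbnail(uploads[offset]))
        except Exception as err:
            # Failures go to their own stream instead of stopping the service.
            state["exceptions"].append({"offset": offset, "error": str(err)})
        # Record progress after each message so a restart resumes here.
        state["checkpoint"] = offset + 1
        processed += 1
    state["metrics"].append({"processed": processed})
    return state


uploads = ["a.mp4", "notes.txt"]
state = run_extractor(uploads, new_state(), fake_thumbnail)
uploads.append("b.mp4")
state = run_extractor(uploads, state, fake_thumbnail)  # simulated restart
```

The second call does not reprocess the first two uploads because the checkpoint tells it where to resume.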
36.
Some Real World Implications
• Messaging must be durable and infrastructural
– Can’t depend on sender or receiver actually running
• Messages aren’t great for everything
– 1TB message?
• We need (scalable) files
• We need (scalable) tables
• We need (scalable) streams
• We still should isolate persistence if possible
37.
Some Real World Notes
• Durable high-speed queuing is required
– Kafka does this
– MapR Streams does this
– Very little else does it
• Traditional messaging is not even close
– With durability, most queues drop to <10k messages/second (MB/s)
– High performance systems handle >1 M messages/second (GB/s)
• Latencies less than 1ms are handled by different systems
• Global scale introduces additional constraints not considered
here
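The message rates and byte rates above line up if messages are around 1,000 bytes each. That size is an assumption made here for the arithmetic; the slide does not state one. A quick check:

```python
def throughput_bytes_per_second(messages_per_second, message_bytes=1000):
    """Byte rate for a message rate, assuming a fixed message size."""
    return messages_per_second * message_bytes


def format_rate(bytes_per_second):
    """Render a byte rate using decimal units."""
    for unit, size in (("GB/s", 10**9), ("MB/s", 10**6), ("kB/s", 10**3)):
        if bytes_per_second >= size:
            return f"{bytes_per_second / size:g} {unit}"
    return f"{bytes_per_second:g} B/s"


traditional = format_rate(throughput_bytes_per_second(10_000))
high_performance = format_rate(throughput_bytes_per_second(1_000_000))
```

Under that assumption, 10k messages/second is about 10 MB/s and 1 M messages/second is about 1 GB/s, which matches the orders of magnitude quoted.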
38.
Summary
• Micro-services are the natural evolution
• Micro-services imply durable, fast, Kafka-esque queuing
• Macro and micro architectures are required
• Infrastructure has to include more than just queues; it must have files and tables as well
40.
Other Short Books by Ted Dunning & Ellen Friedman
• For sale from Amazon or O’Reilly
• Free e-books currently available courtesy of MapR
http://bit.ly/ebook-real-world-hadoop
http://bit.ly/mapr-tsdb-ebook
http://bit.ly/ebook-anomaly
http://bit.ly/recommendation-ebook
41.
Sharing Big Data Safely
by Ted Dunning and Ellen Friedman © Oct 2016 (published by O’Reilly)
Free copies courtesy @MapR
Download eBook
http://bit.ly/mapr-sharing-big-data
Book signing for print book:
Data Day Texas 16 Jan 2016
42.
Q&A
@mapr, @ted_dunning maprtech
tdunning@mapr.tech.com
Engage with us!
MapR
maprtech
mapr-technologies