1. © 2017 MapR Technologies 1
Geo-Distributed Big Data and Analytics:
Data where you need it
Computation where you want it
2. © 2017 MapR Technologies 2
Contact Information
Ted Dunning, PhD
Chief Application Architect, MapR Technologies
Board member, Apache Software Foundation
O’Reilly author
Email tdunning@mapr.com tdunning@apache.org
Twitter @ted_dunning
3. © 2017 MapR Technologies 3
Contact Information
Ellen Friedman, PhD
Principal Technologist, MapR Technologies
Committer Apache Drill & Apache Mahout projects
O’Reilly author
Email efriedman@mapr.com ellenf@apache.org
Twitter @Ellen_Friedman
4. © 2017 MapR Technologies 4
Imagine a future where …
You easily collect, access & analyze big data wherever you need it across
the globe in a seamless system under the same security & administration
5. © 2017 MapR Technologies 5
The future is here: Global Data Fabric
• Global data fabric lets you update business globally
• Local activities coordinate with global analyses
• Do this without requiring huge teams at each site
• Do this affordably, reliably, with low latency and at large scale
We’re here to explain how, but first: a real-world case study
6. © 2017 MapR Technologies 6
MapR customer:
“A year in the bank”
7. © 2017 MapR Technologies 7
A Year of Technical Credit, not Debt
• Streaming media delivery is inherently geo-distributed
• Must measure if you want to manage
– metrics need to be collected and processed locally
– and sent back to a central facility
• How?
Streams!
8. © 2017 MapR Technologies 8
Collect Data
[Diagram: one data center containing two web servers with their web-server logs, two log-stash instances, a log consolidator, and a log_events stream]
9. © 2017 MapR Technologies 9
And Transport to Global Analytics
[Diagram: one data center containing two web servers with their web-server logs, two log-stash instances, a log consolidator, and a log_events stream; GHQ with a log_events stream, an "Elaborate events (log-stash)" step, an events stream, and Aggregate and Signal detection steps]
10. © 2017 MapR Technologies 10
With Many Sources
[Diagram: one data center containing two web servers with their web-server logs, two log-stash instances, a log consolidator, and a log_events stream; GHQ with a log_events stream, an "Elaborate events (log-stash)" step, an events stream, and Aggregate and Signal detection steps]
11. © 2017 MapR Technologies 11
With Many Sources
[Diagram: two data centers, each containing two web servers with their web-server logs, two log-stash instances, a log consolidator, and a log_events stream; GHQ with a log_events stream, an "Elaborate events (log-stash)" step, an events stream, and Aggregate and Signal detection steps]
12. © 2017 MapR Technologies 12
With Many Sources
[Diagram: three data centers, each containing two web servers with their web-server logs, two log-stash instances, a log consolidator, and a log_events stream; GHQ with a log_events stream, an "Elaborate events (log-stash)" step, an events stream, and Aggregate and Signal detection steps]
13. © 2017 MapR Technologies 13
Analytics Doesn’t Care About Location
[Diagram: GHQ only, with a log_events stream, an "Elaborate events (log-stash)" step, an events stream, and Aggregate and Signal detection steps]
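The point of this slide can be sketched in a few lines of Python: once each data center's log_events stream is available at GHQ, the analytics code reads one merged stream and never branches on where an event came from. This is only an illustration. The event fields (`dc`, `ts`, `status`), the sample records, and the function names are hypothetical, and in-memory lists stand in for replicated streams.

```python
import heapq
from collections import Counter
from operator import itemgetter

# Hypothetical per-data-center event lists standing in for replicated
# log_events streams; each list is already ordered by timestamp "ts".
dc1_events = [{"dc": "dc1", "ts": 1, "status": 200}, {"dc": "dc1", "ts": 4, "status": 500}]
dc2_events = [{"dc": "dc2", "ts": 2, "status": 200}, {"dc": "dc2", "ts": 3, "status": 200}]

def merged_events(*streams):
    """Merge time-ordered streams into one time-ordered stream."""
    return heapq.merge(*streams, key=itemgetter("ts"))

def aggregate(events):
    """Count events per HTTP status without looking at the origin site."""
    return Counter(event["status"] for event in events)

status_counts = aggregate(merged_events(dc1_events, dc2_events))
print(status_counts)
```

Adding a third data center means passing one more list to `merged_events`; `aggregate` does not change.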
14. © 2017 MapR Technologies 14
MapR Streams:
Geo-distributed data
plus separation of concerns
16. © 2017 MapR Technologies 16
Mechanism for a Global Data Fabric: Streaming
• Streaming data is becoming mainstream
• Innovative technologies are emerging to handle and process streaming data
• Stream-1st architecture is a powerful approach with surprisingly widespread advantages
17. © 2017 MapR Technologies 17
IoT Data: Sensors & Smart Parts
Images © E. Friedman, © Wes Abrams
18. © 2017 MapR Technologies 18
Revolution in Manufacturing: Smart Tools
• Respond quickly to new requirements
• IoT-enabled “smart tools” on the manufacturing floor
• Reconfigurable factory
• Removes barriers to communication: engineering, design, analysis, and manufacture in one space
Image credit: Bond Bryan Architecture, used with permission
Factory 2050: Boeing AMRC at University of Sheffield
19. © 2017 MapR Technologies 19
Streaming data has value
beyond real-time insights
20. © 2017 MapR Technologies 20
Predictive Maintenance
• Streaming sensor data + long-term maintenance histories
• Machine learning model detects anomalous pattern
• “Failure signature” warns of need for maintenance before damage occurs
Image courtesy of Mtell, used with permission, in Real World Hadoop by Dunning & Friedman © 2015
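As a rough illustration of detecting an anomalous pattern in streaming sensor data, the sketch below flags readings that sit far from the recent rolling mean. It is a deliberately simple stand-in (a rolling z-score), not the machine learning model the slide refers to, and it ignores maintenance histories. The function name, window size, threshold, and sample readings are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev

def detect_anomalies(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling mean."""
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Flag the reading when it is more than `threshold` standard
            # deviations away from the mean of the previous `window` readings.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield i, value
        recent.append(value)

# Hypothetical vibration readings with one obvious spike.
vibration = [1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 0.9, 5.0, 1.0, 1.1]
alerts = list(detect_anomalies(vibration))
print(alerts)
```

A production "failure signature" would be learned from labeled history rather than set by a fixed threshold.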
21. © 2017 MapR Technologies 21
Stream-1st Architecture
[Diagram labels: Medical tests; Medical test results; Real-time analytics; EMR; Patient Facilities management; Insurance audit; use-case classes A, B, and C]
People often start with event data streamed to a real-time application (A)
Image © 2016 Ted Dunning & Ellen Friedman from Chap 1 O’Reilly book Streaming Architecture
used with permission
22. © 2017 MapR Technologies 22
Heart of Stream-1st Architecture: Message Transport
[Diagram labels: Medical tests; Medical test results; Real-time analytics; EMR; Patient Facilities management; Insurance audit; use-case classes A, B, and C]
The right messaging tool supports other classes of use cases (B & C)
Image © 2016 Ted Dunning & Ellen Friedman from Chap 1 O’Reilly book Streaming Architecture
used with permission
23. © 2017 MapR Technologies 23
Message Transport: Apache Kafka & MapR Streams
Key capabilities:
• Highly scalable
• High throughput, low latency
• Multiple producers & consumers: decoupled
• Durable messages
• Geo-distributed replication preserves offsets (unique to MapR Streams)
[Diagram: two producers, a message stream, and three consumer groups]
Image © 2016 Ted Dunning & Ellen Friedman from Chap 2 of O’Reilly book Streaming Architecture
used with permission
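The decoupling of producers from consumers can be sketched with an append-only log in which every consumer group tracks its own read offset. This is a toy in-memory model written for illustration, not the Kafka or MapR Streams API; the `Topic` class and its method names are hypothetical.

```python
class Topic:
    """Append-only message log; each consumer group keeps its own offset."""

    def __init__(self):
        self._log = []       # durable storage in a real system; a list here
        self._offsets = {}   # consumer group name -> next offset to read

    def publish(self, message):
        """Append a message and return its offset."""
        self._log.append(message)
        return len(self._log) - 1

    def poll(self, group, max_messages=10):
        """Return unread messages for a group and advance only its offset."""
        start = self._offsets.get(group, 0)
        batch = self._log[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch

topic = Topic()
for reading in ("a", "b", "c"):
    topic.publish(reading)

first = topic.poll("dashboard", max_messages=2)   # dashboard reads two messages
audit = topic.poll("audit")                       # audit starts from offset 0
rest = topic.poll("dashboard")                    # dashboard resumes where it stopped
```

Because messages stay in the log after being read, a new consumer group can be added later and still see the full history.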
24. © 2017 MapR Technologies 24
Stream transport
supports micro services
25. © 2017 MapR Technologies 25
Stream-1st Architecture: Basis for Micro-Services
Stream instead of database as the shared “truth”
[Diagram labels: POS 1..n; Fraud detector; Last card use; Updater; Card analytics; Other card activity]
Image © 2016 Ted Dunning & Ellen Friedman from Chap 6 of O’Reilly book Streaming Architecture used with permission
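One possible reading of the labels on this slide, sketched in Python: the fraud detector publishes every card swipe to a stream, an updater replays the stream into a "last card use" table, and card analytics consumes the same stream independently. The wiring, the event fields, and the fraud rule (a change of city) are assumptions made for the example, and a plain list stands in for the stream.

```python
from collections import Counter

card_activity = []    # the stream: an append-only list of events
last_card_use = {}    # table maintained by the updater, read by the fraud detector

def fraud_detector(card, city):
    """Flag a swipe whose city differs from the last one seen, then publish it."""
    suspicious = card in last_card_use and last_card_use[card] != city
    card_activity.append({"card": card, "city": city, "suspicious": suspicious})
    return suspicious

def updater(from_offset):
    """Replay stream events into the last-card-use table; return the new offset."""
    for event in card_activity[from_offset:]:
        last_card_use[event["card"]] = event["city"]
    return len(card_activity)

def card_analytics():
    """Independent consumer: count swipes per city from the same stream."""
    return Counter(event["city"] for event in card_activity)

offset = 0
flag1 = fraud_detector("1234", "SF")
offset = updater(offset)
flag2 = fraud_detector("1234", "NY")
offset = updater(offset)
per_city = card_analytics()
```

The table can be rebuilt at any time by replaying the stream from offset 0, which is what makes the stream, rather than the table, the shared record.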
26. © 2017 MapR Technologies 26
MapR Streams:
efficient bi-directional,
multi-master replication
27. © 2017 MapR Technologies 27
With MapR, Geo-Distributed Data Appears Local
stream
Data
source
Consumer
28. © 2017 MapR Technologies 28
With MapR, Geo-Distributed Data Appears Local
stream
stream
Data
source
Consumer
29. © 2017 MapR Technologies 29
With MapR, Geo-distributed Data Appears Local
stream
stream
Data
source
Consumer
Global Data Center
Regional Data Center
30. © 2017 MapR Technologies 30
Unique to MapR: Manage Topics at Stream Level
• Many more topics on a MapR cluster
• Topics are grouped together in a Stream (different from Kafka)
• Policies such as time-to-live and ACEs are set at the Stream level (access control at this level is different from Kafka)
• Geo-distributed stream replication (different from Kafka)
[Diagram: one Stream containing Topic 1, Topic 2, and Topic 3]
Image © 2016 Ted Dunning & Ellen Friedman from Chap 5 of O’Reilly book Streaming Architecture used with permission
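To make the idea of a stream-level policy concrete, here is a small model in which one time-to-live setting governs every topic grouped in a stream. It is a sketch for illustration only: the `Stream` class, its fields, and the example path are hypothetical and do not represent the MapR Streams API.

```python
from dataclasses import dataclass, field

@dataclass
class Stream:
    """Hypothetical model: one TTL policy shared by every topic in the stream."""
    path: str
    ttl_seconds: float
    topics: dict = field(default_factory=dict)   # topic name -> [(timestamp, message)]

    def publish(self, topic, message, now):
        """Append a timestamped message to the named topic."""
        self.topics.setdefault(topic, []).append((now, message))

    def expire(self, now):
        """Drop messages older than the stream-wide TTL from all topics."""
        cutoff = now - self.ttl_seconds
        for name in list(self.topics):
            self.topics[name] = [(ts, m) for ts, m in self.topics[name] if ts >= cutoff]

stream = Stream(path="/apps/telemetry", ttl_seconds=60)
stream.publish("topic1", "old", now=0)
stream.publish("topic2", "old", now=10)
stream.publish("topic1", "new", now=100)
stream.expire(now=100)
```

Setting the policy once on the stream avoids repeating it for each topic, which matters when a stream holds many topics.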
31. © 2017 MapR Technologies 31
Multi-Master Matters: MapR Table Replication
[Diagram: panels A and B, each showing an SF database and an NY database, each with its own source]
Better: bi-directional table replication
Multi-Master Replication Matters
From O’Reilly report “Data Where You Want It: Geo-Distribution of Big Data and
Analytics” © 2017 by Ted Dunning & Ellen Friedman, used with permission
33. © 2017 MapR Technologies 33
MapR: Files, tables, streams in one technology
[Diagram: within a data center, Legacy Applications, Big Data 1.0 Applications, and Next-Gen Applications sit above the Converged Data Platform]
• Data services: Web-Scale Storage, Real-Time Database, Stream Transport
• Platform capabilities: High Availability, Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace
34. © 2017 MapR Technologies 34
Example
[Diagram: a cluster's directory tree containing directories, files, a table, streams, and a volume mount point]
35. © 2017 MapR Technologies 35
Geo-distributed data
with ease of management
36. © 2017 MapR Technologies 36
Remember this? Universal Pathname
[Diagram: a cluster's directory tree containing directories, files, a table, streams, and a volume mount point]
37. © 2017 MapR Technologies 37
Global Namespace: Advantage for Geo-Distribution
• Enables program to refer to data anywhere
– On premise, in cloud, around the world
• Easier to manage: Supports separation of concerns
• Unique to MapR
38. © 2017 MapR Technologies 38
The Need for Cloud
• Fewer machines to purchase, flexibility regarding hardware
• Eases pressure on IT
• Cloud-bursting possible for short-term increase in computing
• Lets you optimize cluster usage (especially if you maintain
cloud neutrality)
39. © 2017 MapR Technologies 39
Cloud Neutrality for Optimization
Burst
Private
On-premise
data center
Core
4x cheaper for base load
4x cheaper for
peak loads
40. © 2017 MapR Technologies 40
Hybrid Cloud On-Premise Architecture
• Optimization of resources, but…
• Too complicated unless you have a good platform for geo-distributed data
• MapR is such a platform
41. © 2017 MapR Technologies 41
The Need for Containers
• To deploy exactly the same thing in many data centers
• To get predictable behavior in production compared with testing
• Result: Docker-style containers are becoming ubiquitous
42. © 2017 MapR Technologies 42
Key to Lightweight Containers: Persist State to MapR
[Diagram: two stateful applications and one stateless application run under a container management system on top of the data platform]
• Works for files, tables, streams
• Run stateful applications in stateless containers
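The idea of running a stateful application in a stateless container can be sketched as follows: the application keeps nothing in the container itself and instead reads and writes its state on a path provided by the data platform. In this illustration a temporary directory stands in for a platform volume mounted into the container; the `VisitCounter` class and the file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path

class VisitCounter:
    """Keeps its count in a file on the data platform, not in the container."""

    def __init__(self, state_file):
        self.state_file = Path(state_file)

    def increment(self):
        """Load the persisted count, add one, persist it, and return it."""
        state = {"count": 0}
        if self.state_file.exists():
            state = json.loads(self.state_file.read_text())
        state["count"] += 1
        self.state_file.write_text(json.dumps(state))
        return state["count"]

# A temporary directory stands in for a platform volume mounted in the container.
volume = Path(tempfile.mkdtemp())
state_file = volume / "counter.json"

first_container = VisitCounter(state_file)
first_container.increment()
first_container.increment()

# Simulate the container being destroyed and a fresh one started elsewhere.
second_container = VisitCounter(state_file)
resumed = second_container.increment()
```

The second instance picks up the count where the first left off, so the container can be discarded and rescheduled freely.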
43. © 2017 MapR Technologies 43
Data where you want it,
compute where you need it…
44. © 2017 MapR Technologies 44
... including processing
at the IoT edge
45. © 2017 MapR Technologies 45
MapR Edge
• Small-footprint cluster for remote processing
• Intended to run right next to the data-producing device
– Unified end-to-end security policy
– Reliable replication to cloud and data center, even with occasional connections
• Small but full MapR data services (files, tables, streams)
– Normal data protection (snapshots, mirroring, replication)
– Normal management capabilities (volumes, fine-grained access control, monitoring)
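Replication over an occasionally connected link is commonly handled with a store-and-forward pattern: records are queued at the edge and drained in order whenever the link is available. The sketch below shows that general pattern under stated assumptions; it is not a description of how MapR Edge implements replication, and `EdgeBuffer`, `send_to_data_center`, and the `link` flag are hypothetical names.

```python
from collections import deque

class EdgeBuffer:
    """Queue records locally and forward them in order when a link is up."""

    def __init__(self, send):
        self._pending = deque()
        self._send = send   # callable that raises ConnectionError when offline

    def record(self, item):
        """Store a record locally; nothing is sent yet."""
        self._pending.append(item)

    def flush(self):
        """Send queued items in order; stop at the first failure. Returns count sent."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break
            self._pending.popleft()   # remove only after a successful send
            sent += 1
        return sent

    def backlog(self):
        """Number of records still waiting at the edge."""
        return len(self._pending)

received = []
link = {"up": False}   # simulated connectivity flag

def send_to_data_center(item):
    if not link["up"]:
        raise ConnectionError("link down")
    received.append(item)

edge = EdgeBuffer(send_to_data_center)
edge.record("r1")
edge.record("r2")
sent_offline = edge.flush()   # link is down, nothing leaves the edge
link["up"] = True
sent_online = edge.flush()    # link is back, backlog drains in order
```

Records are removed from the queue only after a successful send, so an interrupted connection loses nothing.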
46. © 2017 MapR Technologies 46
Why MapR Edge is Useful
[Diagram: four data sources and a report]
• Designed to sit at IoT edge
• Intended to run right next to data-producing device
47. © 2017 MapR Technologies 47
Who needs MapR Edge?
• Connected car industry
• Telecommunications industry
• Hospitals and medical testing facilities
• Anyone who benefits from global learning but needs local
action
Images © Ellen Friedman, © Wes Abrams
48. © 2017 MapR Technologies 48
MapR Edge: Improves Time to Insight
Use case          Before MapR Edge    After MapR Edge
Oil & gas         48 hours            < 2 hours
Medical device    12 hours            < 15 minutes
Car test & dev    24 hours            < 5 minutes
49. © 2017 MapR Technologies 49
Use Case: Telecommunications
Callers
Towers
cdr data
50. © 2017 MapR Technologies 50
Streaming in Telecom
• Data collection & handling happens at different levels (tower, local data center, central data center)
• Batch: Can take 30 minutes per level
• Streaming: Latency drops to seconds or sub-seconds per level
• Ability to respond as events occur
• MapR Streams enables stream replication with offsets across data centers
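A quick back-of-the-envelope calculation shows why per-level latency matters. Assuming the three levels named on the slide are traversed one after another and each adds the same delay, 30 minutes per level in batch mode adds up to 90 minutes end to end. The streaming figure of 1 second per level is an assumed value within the slide's "seconds or sub-seconds" range, and the function name is hypothetical.

```python
LEVELS = ["tower", "local data center", "central data center"]

def end_to_end_latency(seconds_per_level, levels=LEVELS):
    """Total delay if every level adds the same per-level latency in series."""
    return seconds_per_level * len(levels)

batch_total = end_to_end_latency(30 * 60)   # 30 minutes per level, from the slide
streaming_total = end_to_end_latency(1)     # assumed 1 second per level

print(f"batch: {batch_total / 60:.0f} minutes, streaming: {streaming_total} seconds")
```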
51. © 2017 MapR Technologies 51
Telecom Reporting and Logging
[Diagram: a data source at Tower 1 and a data source at Tower 2, with an Aggregate step at HQ]
52. © 2017 MapR Technologies 52
Use Case: Automotive IoT
[Diagram labels: Car; CAN Bus; https; REST GW; REST; Data Center; Raw stream; Dispatcher; Data stream; Metrics; two μ (microservice) components]
53. © 2017 MapR Technologies 53
Global Machine Learning Foundation
[Diagram: GHQ with global analytics, metrics, and models; data center 1 and data center 2, each with metrics topics m1 to m4 and models]
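The pattern of metrics flowing to headquarters and models flowing back to each site can be sketched as below. The "model" here is just a pooled threshold (mean plus k standard deviations), chosen to keep the example short; the function names, the site metrics, and the alert rule are assumptions, not something the slides specify.

```python
from statistics import mean, pstdev

def train_global_model(metrics_by_site, k=2.0):
    """Fit one threshold from the pooled metrics of every site."""
    pooled = [value for values in metrics_by_site.values() for value in values]
    return {"threshold": mean(pooled) + k * pstdev(pooled)}

def act_locally(model, value):
    """Decide at the site, using only the model that was shipped to it."""
    return "alert" if value > model["threshold"] else "ok"

# Hypothetical metrics collected at each data center and sent to GHQ.
metrics_by_site = {
    "data center 1": [10, 11, 9, 10],
    "data center 2": [12, 10, 11, 9],
}
model = train_global_model(metrics_by_site)        # trained once at GHQ
decisions = [act_locally(model, v) for v in (10, 25)]   # evaluated at a site
```

The site needs only the small model object to make a decision, not the pooled data from every other site.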
54. © 2017 MapR Technologies 54
Learn globally, act locally
55. © 2017 MapR Technologies 55
Use Case: Container Shipping
Over 20% of the world’s shipping containers pass through Singapore’s port.
Image © Ellen Friedman 2015, used with permission. From Chap 7 “Streaming Architecture” book. Read free online: http://bit.ly/streams-ebook-ch7
56. © 2017 MapR Technologies 56
IoT Data for Container Shipping
Tokyo: Sensors stream data to on-board cluster that reports to onshore cluster while in port
En route to Singapore: MapR Streams geo-replication sends data to next port before ship arrives.
Problem in Sydney: Real-time insights alert to “high humidity” in some containers
[Map: Tokyo, Singapore, Sydney, and Corporate HQ, with points A, B, and C]
Details in Chapter 7 “Streaming Architecture” book. Read free online here: http://bit.ly/streams-ebook-ch7
Figure used with permission.
57. © 2017 MapR Technologies 57
Additional Resources
O’Reilly report by Ted Dunning & Ellen Friedman © March 2017
Download free pdf courtesy of MapR:
http://bit.ly/mapr-geo-distribution-ebook-pdf
O’Reilly book by Ted Dunning & Ellen Friedman
© March 2016
Read free courtesy of MapR
https://mapr.com/streaming-architecture-using-apache-kafka-mapr-streams/
58. © 2017 MapR Technologies 58
Please support women in tech – help build
girls’ dreams of what they can accomplish
#womenintech #datawomen
© Ellen Friedman 2015
59. © 2017 MapR Technologies 59
Q&A
@mapr
Maprtechnologies
tdunning@mapr.com
ENGAGE WITH US
@ted_dunning
@Ellen_Friedman