© 2017 MapR Technologies 1
Geo-Distributed Big Data and Analytics:
Data where you need it
Computation where you want it
Contact Information
Ted Dunning, PhD
Chief Application Architect, MapR Technologies
Board member, Apache Software Foundation
O’Reilly author
Email: tdunning@mapr.com, tdunning@apache.org
Twitter @ted_dunning
Contact Information
Ellen Friedman, PhD
Principal Technologist, MapR Technologies
Committer Apache Drill & Apache Mahout projects
O’Reilly author
Email: efriedman@mapr.com, ellenf@apache.org
Twitter @Ellen_Friedman
Imagine a future where …
You easily collect, access & analyze big data wherever you need it across
the globe, in a seamless system under the same security & administration
The future is here: Global Data Fabric
• Global data fabric lets you update business globally
• Local activities coordinate with global analyses
• Do this without requiring huge teams at each site
• Do this affordably, reliably, with low latency and at large scale
We’re here to explain how, but first: a real-world case study
MapR customer:
“A year in the bank”
A Year of Technical Credit, not Debt
• Streaming media delivery is inherently geo-distributed
• You must measure if you want to manage
– metrics need to be collected and processed locally
– and distributed back to a central facility
• How?
Streams!
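To illustrate "collect and process locally, then distribute back to a central facility", here is a minimal sketch in which each data center reduces raw log events to per-minute counts and headquarters merges the compact summaries. The event shape `(timestamp, http_status)`, the per-minute granularity, and both function names are hypothetical, not the customer's actual pipeline.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical log event: (unix_timestamp_seconds, http_status)
Event = Tuple[int, int]

def summarize_locally(events: Iterable[Event]) -> Counter:
    """Runs in each data center: reduce raw events to (minute, status) counts."""
    summary: Counter = Counter()
    for ts, status in events:
        summary[(ts // 60, status)] += 1
    return summary

def merge_centrally(summaries: Iterable[Counter]) -> Counter:
    """Runs at headquarters: add the per-site summaries into one global view."""
    total: Counter = Counter()
    for s in summaries:
        total.update(s)  # Counter.update adds counts rather than replacing them
    return total

if __name__ == "__main__":
    site_a = summarize_locally([(0, 200), (30, 200), (61, 500)])
    site_b = summarize_locally([(5, 200), (70, 500), (75, 500)])
    global_view = merge_centrally([site_a, site_b])
    print(global_view[(0, 200)], global_view[(1, 500)])  # 3 3
```

Only the small summaries need to cross the wide-area link, which is the practical payoff of processing locally.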
Collect Data
[Figure: inside one data center, log-stash ships each web server's log to a log consolidator, which writes the entries to the log_events stream]
And Transport to Global Analytics
[Figure: the data center's log_events stream is carried to GHQ, where the events are elaborated (log-stash), aggregated, and passed to signal detection]
With Many Sources
[Figure, built up over three slides: additional data centers, each running the same web server, log-stash, and log consolidator pipeline, feed their own log_events streams into the same GHQ analytics (elaborate, aggregate, signal detection)]
Analytics Doesn’t Care About Location
[Figure: the GHQ side alone: incoming log_events are elaborated (log-stash), aggregated, and passed to signal detection, with no reference to which data center sent them]
MapR Streams:
Geo-distributed data
plus separation of concerns
Why stream?
Mechanism for a Global Data Fabric: Streaming
• Streaming data is becoming mainstream
• Innovative technologies are emerging to handle and process
streaming data
• Stream-1st architecture is a powerful approach with surprisingly
widespread advantages
IoT Data: Sensors & Smart Parts
Images © E. Friedman; © WesAbrams
Revolution in Manufacturing: Smart Tools
• Respond quickly to new requirements
• IoT-enabled “smart tools” on the
manufacturing floor
• Reconfigurable factory
• Removes barriers to communication:
engineering, design, analysis,
manufacture in one space
Image credit Bond Bryan Architecture, used with permission
Factory 2050: Boeing AMRC at University of Sheffield
Streaming data has value
beyond real-time insights
Predictive Maintenance
• Streaming sensor data + long-term maintenance histories
• Machine learning model detects anomalous pattern
• “Failure signature” warns of need for maintenance before
damage occurs
Image courtesy of Mtell, used with permission,
in Real World Hadoop by Dunning & Friedman © 2015
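The slide does not say which model produces the "failure signature". As a deliberately simple stand-in for spotting an anomalous pattern in streaming sensor data, the sketch below flags readings that sit far from the mean of a rolling window (a z-score test). This is a substitute technique chosen for brevity, not the Mtell model; the window size and threshold are arbitrary assumptions.

```python
import math
from collections import deque

class RollingZScore:
    """Flags a reading that deviates strongly from the recent window of readings."""

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.values = deque(maxlen=window)  # keeps only the most recent readings
        self.threshold = threshold

    def update(self, x: float) -> bool:
        """Return True if x looks anomalous relative to the current window."""
        anomalous = False
        if len(self.values) >= 2:
            n = len(self.values)
            mean = sum(self.values) / n
            std = math.sqrt(sum((v - mean) ** 2 for v in self.values) / (n - 1))
            if std > 0 and abs(x - mean) / std > self.threshold:
                anomalous = True
        self.values.append(x)  # the reading joins the window either way
        return anomalous

if __name__ == "__main__":
    detector = RollingZScore(window=10, threshold=3.0)
    readings = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 5.0]
    print([detector.update(r) for r in readings])  # only the final spike is flagged
```

A production system would learn signatures from the long-term maintenance histories as well; this sketch uses only the live stream.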
Stream-1st Architecture
[Figure: a stream of medical test results feeds a real-time analytics application (A); the same stream can also feed other consumers (B, C), such as EMR, patient and facilities management, and insurance audit]
People often start with event
data streamed to a real-time
application (A)
Image © 2016 Ted Dunning & Ellen Friedman from Chap 1 of the O’Reilly book Streaming Architecture,
used with permission
Heart of Stream-1st Architecture: Message Transport
[Figure: the same medical test results stream now feeds consumers B and C in addition to real-time analytics (A)]
The right messaging tool
supports other classes of use
cases (B & C)
Image © 2016 Ted Dunning & Ellen Friedman from Chap 1 of the O’Reilly book Streaming Architecture,
used with permission
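To make the stream-first idea concrete: the stream is a durable, append-only log, and each consumer (A, B, C in the figure) keeps its own read position, so adding a new consumer later does not disturb existing ones. The in-memory `Stream` class below is a hypothetical teaching sketch, not the Kafka or MapR Streams API; starting new consumers at offset 0 is a simplifying assumption.

```python
from typing import Dict, List

class Stream:
    """Toy append-only log with an independent read offset per consumer."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self.offsets: Dict[str, int] = {}

    def produce(self, message: str) -> int:
        """Append a message and return its offset."""
        self.messages.append(message)
        return len(self.messages) - 1

    def poll(self, consumer: str, max_messages: int = 10) -> List[str]:
        """Return unread messages for this consumer and advance only its offset."""
        start = self.offsets.get(consumer, 0)  # simplification: new consumers start at 0
        batch = self.messages[start:start + max_messages]
        self.offsets[consumer] = start + len(batch)
        return batch

if __name__ == "__main__":
    results = Stream()
    results.produce("test-1: normal")
    results.produce("test-2: abnormal")
    print(results.poll("A"))  # both results
    results.produce("test-3: normal")
    print(results.poll("A"))  # only test-3
    print(results.poll("C"))  # a late consumer still sees all three
```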
Message Transport: Apache Kafka & MapR Streams
Key capabilities:
• Highly scalable
• High throughput, low latency
• Multiple producers &
consumers: decoupled
• Durable messages
• Geo-distributed replication
preserves offsets (unique to
MapR Streams)
[Figure: two producers write messages to the transport; three consumer groups read the same messages independently]
Image © 2016 Ted Dunning & Ellen Friedman from Chap 2 of the O’Reilly book Streaming Architecture,
used with permission
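Decoupled consumer groups rest on two mechanics in Kafka-style transports: a message key is hashed to pick a partition (so one key keeps its order), and within one consumer group each partition is read by exactly one member, while every separate group sees all partitions. The sketch below is a simplification under stated assumptions: it hashes with CRC32 (Kafka's default partitioner uses murmur2) and assigns round-robin; both helper names are hypothetical.

```python
import zlib
from typing import Dict, List

def partition_for(key: str, num_partitions: int) -> int:
    """Stable key -> partition mapping (CRC32 here; Kafka's default uses murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def assign_partitions(members: List[str], num_partitions: int) -> Dict[str, List[int]]:
    """Round-robin the partitions over the members of ONE consumer group."""
    assignment: Dict[str, List[int]] = {m: [] for m in members}
    for p in range(num_partitions):
        assignment[members[p % len(members)]].append(p)
    return assignment

if __name__ == "__main__":
    # The same key always lands on the same partition, preserving per-key order.
    print(partition_for("sensor-42", 6) == partition_for("sensor-42", 6))  # True
    # Two groups on the same topic: each group covers all 6 partitions on its own.
    print(assign_partitions(["g1-a", "g1-b"], 6))
    print(assign_partitions(["g2-a", "g2-b", "g2-c"], 6))
```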
Stream transport
supports micro-services
Stream-1st Architecture: Basis for Micro-Services
Stream instead of database as the shared “truth”
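[Figure: card activity from point-of-sale terminals (POS 1..n) flows through a stream to micro-services: a fraud detector, an updater that maintains last card use, and card analytics; other card activity enters the same stream]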
Image © 2016 Ted Dunning & Ellen Friedman from Chap 6 of the O’Reilly book Streaming Architecture, used with permission
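A minimal sketch of two services from the card-activity figure, assuming hypothetical event fields and a made-up fraud rule (same card used in two countries within an hour). Both services consume the same stream. The updater owns the "last card use" table, which is derived entirely from the stream and can be rebuilt by replaying it. For brevity, the detector reads that table in memory; a real deployment would use a database or let the detector rebuild its own copy.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class CardEvent:
    card: str
    country: str
    ts: int  # seconds since epoch

class LastCardUseUpdater:
    """Micro-service 1: consumes card activity and maintains 'last card use'."""

    def __init__(self) -> None:
        self.last_use: Dict[str, CardEvent] = {}

    def on_event(self, e: CardEvent) -> None:
        self.last_use[e.card] = e

class FraudDetector:
    """Micro-service 2: checks each event against the last-use table (read-only)."""

    def __init__(self, last_use: Dict[str, CardEvent], window_s: int = 3600) -> None:
        self.last_use = last_use
        self.window_s = window_s

    def on_event(self, e: CardEvent) -> bool:
        # Hypothetical rule: the same card used in two countries within the window.
        prev = self.last_use.get(e.card)
        return prev is not None and prev.country != e.country and e.ts - prev.ts < self.window_s

if __name__ == "__main__":
    stream: List[CardEvent] = [CardEvent("c1", "US", 0), CardEvent("c1", "FR", 600)]
    updater = LastCardUseUpdater()
    detector = FraudDetector(updater.last_use)
    for event in stream:  # both services read the same stream, in order
        print(event, detector.on_event(event))
        updater.on_event(event)
```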
MapR Streams:
efficient bi-directional,
multi-master replication
With MapR, Geo-Distributed Data Appears Local
[Figure, built up over three slides: a data source in a regional data center writes to a stream that is replicated to a stream in the global data center, where a consumer reads the data as if it were local]
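Why preserved offsets matter for this picture: if the replica keeps each message at the same offset as the original, a consumer that has committed offset N in one data center can switch to the other and resume at N without translating positions. The toy replica below is a hypothetical, one-directional sketch of that property, not the MapR Streams replication protocol.

```python
from typing import List, Tuple

class StreamReplica:
    """Toy stream replica storing (offset, message) pairs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[Tuple[int, str]] = []

    def append(self, message: str) -> int:
        offset = len(self.log)
        self.log.append((offset, message))
        return offset

    def replicate_to(self, target: "StreamReplica") -> None:
        """Copy entries the target lacks, keeping their ORIGINAL offsets."""
        known = {off for off, _ in target.log}
        for off, msg in self.log:
            if off not in known:
                target.log.append((off, msg))
        target.log.sort()

    def read_from(self, offset: int) -> List[str]:
        return [msg for off, msg in self.log if off >= offset]

if __name__ == "__main__":
    regional = StreamReplica("regional")
    for m in ["e0", "e1", "e2"]:
        regional.append(m)
    global_dc = StreamReplica("global")
    regional.replicate_to(global_dc)
    committed = 2  # consumer had processed e0 and e1 before failing over
    print(global_dc.read_from(committed))  # ['e2']
```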
Unique to MapR: Manage Topics at Stream Level
• Many more topics on a MapR cluster
• Topics are grouped together in a Stream (different from Kafka)
• Policies such as time-to-live and ACEs (access control expressions) are set at the Stream level (controlled access at this level is different from Kafka)
• Geo-distributed stream replication (different from Kafka)
[Figure: one Stream containing Topic 1, Topic 2, and Topic 3]
Image © 2016 Ted Dunning & Ellen Friedman from Chap 5 of the O’Reilly book Streaming Architecture, used with permission
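To show what "policies set at the Stream level" buys you, the hypothetical `ToyStream` below holds one time-to-live and one reader list, and both apply to every topic inside it. Topics are referred to as `<stream path>:<topic>`, following the MapR Streams naming style. The reader set is only a stand-in for ACEs, and the class is an illustration, not MapR's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class ToyStream:
    """Hypothetical model: TTL and read access are set once, on the stream."""
    path: str
    ttl_s: int                                        # stream-level time-to-live
    readers: Set[str] = field(default_factory=set)    # stand-in for stream-level ACEs
    topics: Dict[str, List[Tuple[int, str]]] = field(default_factory=dict)

    def produce(self, topic: str, ts: int, msg: str) -> None:
        self.topics.setdefault(topic, []).append((ts, msg))

    def expire(self, now: int) -> None:
        """Apply the single stream-level TTL to every topic in the stream."""
        for name in list(self.topics):
            self.topics[name] = [(ts, m) for ts, m in self.topics[name] if now - ts < self.ttl_s]

    def read(self, user: str, topic: str) -> List[str]:
        """One access check covers every topic, addressed as '<stream>:<topic>'."""
        if user not in self.readers:
            raise PermissionError(f"{user} may not read {self.path}:{topic}")
        return [m for _, m in self.topics.get(topic, [])]

if __name__ == "__main__":
    s = ToyStream("/apps/logs", ttl_s=100, readers={"alice"})
    s.produce("events", 0, "old")
    s.produce("alerts", 150, "new")
    s.expire(now=160)
    print(s.read("alice", "events"), s.read("alice", "alerts"))  # [] ['new']
```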
Multi-Master Matters: MapR Table Replication
[Figure: SF and NY databases, each fed by its own local sources, shown in two configurations, A and B]
Better:
Bi-directional table replication
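A minimal sketch of bi-directional (multi-master) replication between an SF and an NY table. Both sides accept local writes and pull each other's rows. Conflicting writes to the same row are resolved here by last-writer-wins on a `(timestamp, site)` version. That rule is an assumption for illustration, since the slide does not say how MapR resolves conflicts, and the class is hypothetical.

```python
from typing import Dict, Tuple

Version = Tuple[int, str]  # (timestamp, site); the site name breaks timestamp ties

class TableReplica:
    def __init__(self, site: str) -> None:
        self.site = site
        self.rows: Dict[str, Tuple[Version, str]] = {}

    def put(self, key: str, value: str, ts: int) -> None:
        """Local write from this site's own sources."""
        self._apply(key, value, (ts, self.site))

    def _apply(self, key: str, value: str, version: Version) -> None:
        current = self.rows.get(key)
        if current is None or version > current[0]:  # last writer wins
            self.rows[key] = (version, value)

    def sync_from(self, other: "TableReplica") -> None:
        """Pull the other replica's rows; run in both directions for multi-master."""
        for key, (version, value) in other.rows.items():
            self._apply(key, value, version)

    def get(self, key: str) -> str:
        return self.rows[key][1]

if __name__ == "__main__":
    sf, ny = TableReplica("SF"), TableReplica("NY")
    sf.put("user:1", "blue", ts=10)
    ny.put("user:1", "green", ts=12)
    ny.put("user:2", "red", ts=5)
    sf.sync_from(ny)
    ny.sync_from(sf)
    print(sf.get("user:1"), ny.get("user:1"), sf.get("user:2"))  # green green red
```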
Multi-Master Replication Matters
From O’Reilly report “Data Where You Want It: Geo-Distribution of Big Data and
Analytics” © 2017 by Ted Dunning & Ellen Friedman, used with permission
MapR: Files, tables, streams in one technology
[Figure: the Converged Data Platform in the data center, combining web-scale storage, a real-time database, and stream transport, with high availability, real time, unified security, multi-tenancy, disaster recovery, and a global namespace, serving legacy, Big Data 1.0, and next-gen applications]
Example
[Figure: one pathname hierarchy spanning cluster, volume mount point, and directories, down to files, tables, and streams]
Geo-distributed data
with ease of management
Remember this? Universal Pathname
[Figure: one pathname hierarchy spanning cluster, volume mount point, and directories, down to files, tables, and streams]
Global Namespace: Advantage for Geo-Distribution
• Enables programs to refer to data anywhere
– On-premise, in the cloud, around the world
• Easier to manage: Supports separation of concerns
• Unique to MapR
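With a global namespace, the cluster is just one component of a path. In MapR, paths have the form `/mapr/<cluster name>/...`, so the same program can address a file, table, or stream on another cluster by changing that component. The helpers and cluster names below are hypothetical, and resolving such paths is done by the platform, not by application code. This sketch only splits and rewrites path strings.

```python
from pathlib import PurePosixPath
from typing import NamedTuple

class GlobalPath(NamedTuple):
    cluster: str
    path: str  # location inside that cluster (directories, then a file, table, or stream)

def parse_global_path(p: str) -> GlobalPath:
    """Split '/mapr/<cluster>/<rest>' into its cluster and in-cluster path."""
    parts = PurePosixPath(p).parts  # ('/', 'mapr', cluster, ...)
    if len(parts) < 3 or parts[:2] != ("/", "mapr"):
        raise ValueError(f"not a global namespace path: {p}")
    return GlobalPath(cluster=parts[2], path="/" + "/".join(parts[3:]))

def on_cluster(p: str, other_cluster: str) -> str:
    """Address the same in-cluster location on a different cluster."""
    g = parse_global_path(p)
    return str(PurePosixPath("/mapr", other_cluster) / g.path.lstrip("/"))

if __name__ == "__main__":
    p = "/mapr/sf.example.com/apps/logs/events"
    print(parse_global_path(p))
    print(on_cluster(p, "ny.example.com"))  # /mapr/ny.example.com/apps/logs/events
```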
The Need for Cloud
• Fewer machines to purchase, flexibility regarding hardware
• Eases pressure on IT
• Cloud bursting handles short-term increases in computing demand
• Lets you optimize cluster usage (especially if you maintain
cloud neutrality)
Cloud Neutrality for Optimization
[Figure: the core (base) load runs in a private, on-premise data center, about 4x cheaper for base load; bursts go to the cloud, about 4x cheaper for peak loads]
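The two "4x cheaper" labels come from a general pattern. Owned capacity costs the same whether it is busy or idle, while cloud capacity is paid for only while it is used. The toy model below uses made-up rates (not figures from the slide) to show why owning the core and bursting the peak can beat both pure strategies. `compare_strategies` and all its inputs are hypothetical.

```python
from typing import Dict

def compare_strategies(base: float, peak: float, peak_fraction: float,
                       own_rate: float, cloud_rate: float) -> Dict[str, float]:
    """Average cost per hour of three ways to serve base load plus an occasional peak.

    own_rate:   cost per unit of owned capacity per hour (paid whether used or not)
    cloud_rate: cost per unit of cloud capacity per hour (paid only while used)
    """
    peak_in_cloud = peak * peak_fraction * cloud_rate
    return {
        "all_on_premise": (base + peak) * own_rate,  # must own enough for the peak
        "all_cloud": base * cloud_rate + peak_in_cloud,
        "hybrid": base * own_rate + peak_in_cloud,   # own the core, burst the peak
    }

if __name__ == "__main__":
    # Hypothetical: peak doubles capacity for 1/16 of the time; cloud costs 4x per used hour.
    print(compare_strategies(base=100, peak=100, peak_fraction=0.0625,
                             own_rate=1.0, cloud_rate=4.0))
    # {'all_on_premise': 200.0, 'all_cloud': 425.0, 'hybrid': 125.0}
```

With these assumed numbers, owning is 4x cheaper for the base load (1.0 versus 4.0 per unit-hour), and the cloud is 4x cheaper for the peak (25 versus 100 per hour). Only a platform that can span both environments captures both savings.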
Hybrid Cloud/On-Premise Architecture
• Optimization of resources, but…
• Too complicated unless you have a good platform for geo-distributed
data
• MapR is such a platform
The Need for Containers
• To deploy exactly the same thing in many data centers
• To get the same behavior in production as in testing
• Result: Docker-style containers are becoming ubiquitous
Key to Lightweight Containers: Persist State to MapR
[Figure: stateful and stateless applications run under a container management system; the stateful applications keep their state in the data platform rather than inside their containers]
• Works for files, tables, streams
• Run stateful applications in stateless containers
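The following sketch shows what "stateful application, stateless container" means in practice. The process keeps nothing it needs inside the container. It loads its state from a path on the data platform at startup and writes it back after each unit of work, so a replacement container resumes where the old one stopped. The JSON counter and function names are hypothetical. In the demo, a temporary directory stands in for a platform path. Atomic replacement via `os.replace` holds on local POSIX filesystems, and the same guarantee on any particular platform mount is an assumption.

```python
import json
import os
import tempfile
from pathlib import Path

def load_state(path: Path) -> dict:
    """Read the state a previous container instance left behind, if any."""
    if path.exists():
        return json.loads(path.read_text())
    return {"processed": 0}

def save_state(path: Path, state: dict) -> None:
    """Write to a temporary file, then rename it over the old state."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)

def run_once(path: Path, batch: list) -> dict:
    """One container lifetime: resume from persisted state, do work, persist."""
    state = load_state(path)
    state["processed"] += len(batch)
    save_state(path, state)
    return state

if __name__ == "__main__":
    # A temporary directory stands in for storage on the data platform.
    with tempfile.TemporaryDirectory() as d:
        state_file = Path(d) / "state.json"
        run_once(state_file, ["a", "b"])    # first container
        print(run_once(state_file, ["c"]))  # replacement container: {'processed': 3}
```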
Data where you want it,
compute where you need it…
... including processing
at the IoT edge
MapR Edge
• Small-footprint cluster for remote processing
• Intended to run right next to the data-producing device
– Unified end-to-end security policy
– Reliable replication to cloud and data center, even with occasional
connections
• Small but full MapR data services (files, tables, streams)
– Normal data protection (snapshots, mirroring, replication)
– Normal management capabilities (volumes, fine-grained access
control, monitoring)
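"Reliable replication … even with occasional connections" is, at heart, store-and-forward. The edge always writes locally and drains its backlog, in order, whenever the uplink is available. The classes below sketch only that pattern and are hypothetical. MapR Edge provides this through its own replication, not through application code like this.

```python
from collections import deque
from typing import Callable, Deque, List

class StoreAndForward:
    """Buffer records locally; forward them in order when the uplink is available."""

    def __init__(self, send: Callable[[str], bool]) -> None:
        self.send = send  # returns False when the link is down
        self.backlog: Deque[str] = deque()

    def record(self, item: str) -> None:
        self.backlog.append(item)  # the local write always succeeds
        self.flush()

    def flush(self) -> int:
        """Forward as much of the backlog as possible; return how many were sent."""
        sent = 0
        while self.backlog:
            if not self.send(self.backlog[0]):
                break  # link down: keep the item and retry later
            self.backlog.popleft()
            sent += 1
        return sent

class FlakyLink:
    """Hypothetical uplink for the demo: delivers only while `up` is True."""

    def __init__(self) -> None:
        self.up = False
        self.delivered: List[str] = []

    def __call__(self, item: str) -> bool:
        if self.up:
            self.delivered.append(item)
        return self.up

if __name__ == "__main__":
    link = FlakyLink()
    edge = StoreAndForward(link)
    edge.record("r1")
    edge.record("r2")  # both buffered while offline
    link.up = True
    print(edge.flush(), link.delivered)  # 2 ['r1', 'r2']
```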
Why MapR Edge is Useful
[Figure: several data sources feed a nearby MapR Edge cluster, which sends a report onward]
• Designed to sit at IoT edge
• Intended to run right next to the data-producing device
Who needs MapR Edge?
• Connected car industry
• Telecommunications industry
• Hospitals and medical testing facilities
• Anyone who benefits from global learning but needs local
action
Images © Ellen Friedman; © WesAbrams
MapR Edge: Improves Time to Insight
Use case            Before MapR Edge   After MapR Edge
• Oil & gas         48 hours           < 2 hours
• Medical device    12 hours           < 15 minutes
• Car test & dev    24 hours           < 5 minutes
Use Case: Telecommunications
[Figure: callers connect through cell towers, which produce CDR (call detail record) data]
Streaming in Telecom
• Data collection & handling happens at different levels
– tower, local data center, central data center
• Batch: Can take 30 minutes per level
• Streaming: Latency drops to seconds or sub-seconds per level
• Ability to respond as events occur
• MapR Streams enables stream replication across data centers, with offsets preserved
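The per-level figures compound across the three levels named above, because each level waits on the one before it. The quick calculation below uses the slide's 30 minutes per level for batch and an assumed 1 second per level for streaming (the slide says only "seconds or sub-seconds"). It treats the delays as simply additive, which gives the worst case.

```python
LEVELS = ("tower", "local data center", "central data center")

def end_to_end_latency_s(per_level_s: float, levels: int = len(LEVELS)) -> float:
    """Worst case: each level waits for the one before it, so the delays add up."""
    return per_level_s * levels

batch_s = end_to_end_latency_s(30 * 60)  # 30 minutes per level, from the slide
stream_s = end_to_end_latency_s(1.0)     # assumed 1 second per level
print(f"batch: {batch_s / 60:.0f} min, streaming: {stream_s:.0f} s")  # batch: 90 min, streaming: 3 s
```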
Telecom Reporting and Logging
[Figure: data sources at Tower 1 and Tower 2 report to HQ, where the data is aggregated]
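Use Case: Automotive IoT
[Figure: in the car, a microservice (μ) reads the CAN bus into a raw stream; a dispatcher splits it into a data stream and metrics, and data reaches the data center over HTTPS through a REST gateway]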
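The dispatcher's role can be sketched as a router that reads frames from the raw stream and sends each one to the data stream or the metrics stream. The routing here is keyed on the CAN arbitration ID. The specific IDs, the set that counts as "metrics", and the function name are all made up for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple

class CanFrame(NamedTuple):
    can_id: int  # CAN arbitration ID
    payload: bytes

# Hypothetical routing table: which CAN IDs are treated as metrics.
METRIC_IDS = {0x0C4, 0x1F0}

def dispatch(raw: Iterable[CanFrame]) -> Dict[str, List[CanFrame]]:
    """Split the raw stream into a data stream and a metrics stream, preserving order."""
    out: Dict[str, List[CanFrame]] = {"data": [], "metrics": []}
    for frame in raw:
        target = "metrics" if frame.can_id in METRIC_IDS else "data"
        out[target].append(frame)
    return out

if __name__ == "__main__":
    raw_stream = [CanFrame(0x0C4, b"\x10"), CanFrame(0x300, b"\x00"), CanFrame(0x1F0, b"\x55")]
    routed = dispatch(raw_stream)
    print([hex(f.can_id) for f in routed["metrics"]], [hex(f.can_id) for f in routed["data"]])
```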
Global Machine Learning Foundation
[Figure: data centers 1 and 2 each send metrics (m1–m4) to GHQ, where global analytics builds models that are sent back to both data centers]
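The loop in this figure can be sketched in a few lines. Data centers ship metrics to GHQ, GHQ fits one model on the pooled data, and each data center applies that model locally with no round trip. To keep the sketch short, the "model" is just a quantile threshold computed with a simple index-based rule. That choice and the function names are assumptions, not the production models.

```python
from typing import Dict, List

def train_global_model(metrics_by_site: Dict[str, List[float]], q: float = 0.99) -> float:
    """At GHQ: pool every site's metrics and learn one shared alert threshold."""
    pooled = sorted(v for values in metrics_by_site.values() for v in values)
    index = min(len(pooled) - 1, int(q * len(pooled)))  # simple index-based quantile
    return pooled[index]

def act_locally(threshold: float, value: float) -> bool:
    """At each data center: apply the shipped model without contacting GHQ."""
    return value > threshold

if __name__ == "__main__":
    metrics = {"dc1": [float(i) for i in range(50)], "dc2": [float(i) for i in range(50, 100)]}
    model = train_global_model(metrics, q=0.5)
    print(model, act_locally(model, 75.0), act_locally(model, 10.0))  # 50.0 True False
```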
Learn globally, act locally
Use Case: Container Shipping
Over 20% of the world’s shipping containers pass through Singapore’s port.
Image © Ellen Friedman 2015, used with permission. From Chap 7 of the “Streaming Architecture” book. Read free online: http://bit.ly/streams-ebook-ch7
IoT Data for Container Shipping
Tokyo:
Sensors stream data to an on-board cluster that reports to an onshore cluster
while in port
En route to Singapore:
MapR Streams geo-replication sends data to the next port before the ship arrives.
Problem in Sydney:
Real-time insights raise a “high humidity” alert for some containers
[Figure: route map linking Tokyo, Singapore, Sydney, and Corporate HQ, with stages labeled A, B, C]
Details in Chapter 7 of the “Streaming Architecture” book. Read free online here: http://bit.ly/streams-ebook-ch7
Figure used with permission.
Additional Resources
O’Reilly report by Ted Dunning & Ellen Friedman © March 2017
Download free PDF courtesy of MapR:
http://bit.ly/mapr-geo-distribution-ebook-pdf
O’Reilly book by Ted Dunning & Ellen Friedman
© March 2016
Read free courtesy of MapR:
https://mapr.com/streaming-architecture-using-apache-kafka-mapr-streams/
Please support women in tech – help build
girls’ dreams of what they can accomplish
© Ellen Friedman 2015 #womenintech #datawomen
Q&A
@mapr
Maprtechnologies
tdunning@mapr.com
ENGAGE WITH US
@ted_dunning
@Ellen_Friedman
