© 2016 MapR Technologies
Exploring Data Pipelines for Spark Streaming Applications
Carol McDonald, Industry Solutions Architect
2016
What is Streaming Data? Got Some Examples?
Data Collection Devices
•  Smart Machinery
•  Phones and Tablets
•  Home Automation
•  RFID Systems
•  Digital Signage
•  Security Systems
•  Medical Devices
Why Stream Processing?
Batch processing may be too late for some events
6:01 P.M.: 72°
6:02 P.M.: 75°
6:03 P.M.: 77°
6:04 P.M.: 85°
6:05 P.M.: 90°
6:06 P.M.: 85°
6:07 P.M.: 77°
6:08 P.M.: 75°
Analyze -> "It was hot at 6:05 yesterday!"
Why Stream Processing?
It’s becoming important to process events as they arrive
6:05 P.M.: 90° -> Stream topic "Temperature" -> Turn on the air conditioning!
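As a rough illustration of reacting to each event as it arrives, the plain-Java sketch below handles one temperature reading at a time instead of analyzing the whole set afterwards. The class name, the `onEvent` method, and the 90° threshold are assumptions made for this example; they do not come from the slides or from any MapR or Spark API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: react to each reading as it arrives.
class TemperatureStream {
    // Assumed threshold (degrees) at which an action is triggered.
    static final int THRESHOLD = 90;

    // Called once per event; returns an action, or null when nothing needs to happen.
    static String onEvent(String time, int degrees) {
        if (degrees >= THRESHOLD) {
            return time + ": turn on the air conditioning!";
        }
        return null;
    }

    public static void main(String[] args) {
        String[] times = {"6:03 P.M.", "6:04 P.M.", "6:05 P.M.", "6:06 P.M."};
        int[] degrees = {77, 85, 90, 85};
        List<String> actions = new ArrayList<>();
        for (int i = 0; i < times.length; i++) {
            // Each reading is handled immediately, without waiting for the rest.
            String action = onEvent(times[i], degrees[i]);
            if (action != null) {
                actions.add(action);
            }
        }
        System.out.println(actions);
    }
}
```

In a real pipeline the loop would be replaced by a consumer subscribed to the stream topic; the point here is only that the decision is made per event.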
Key to Real Time: Event-based Data Flows
•  Web events
•  Machine sensors
•  Biometrics
•  Mobile events
•  etc…
What if BP had detected problems before the oil hit the water?
•  1M samples/sec
•  High performance at scale is necessary!
Use Case: Time Series Data
[Diagram: sensor time-stamped data -> stream topic -> Spark Streaming (Spark processing) -> data read for real-time monitoring]
Schema
•  All events are stored in column family (CF) data, which could be set to expire old data
•  Filtered alerts are put in CF alerts
•  Daily summaries are put in CF stats
•  The row key contains the oil pump name, the date, and a time stamp

Row key                CF data            CF alerts    CF stats
                       hz      …   psi    psi    …     hz_avg   …   psi_min
COHUTTA_3/10/14_1:01   10.37       84     0
COHUTTA_3/10/14                                        10           0
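The row key layout described above can be sketched in plain Java as follows. The `RowKeys` class and its method names are hypothetical helpers written for this illustration, and the underscore separator is inferred from the sample keys in the table.

```java
// Hypothetical helper for composing row keys like those in the schema table.
class RowKeys {
    // Key for a single event: pump name, date, and time stamp.
    static String eventKey(String pump, String date, String time) {
        return pump + "_" + date + "_" + time;
    }

    // Key for a daily summary row: pump name and date only.
    static String statsKey(String pump, String date) {
        return pump + "_" + date;
    }

    public static void main(String[] args) {
        System.out.println(eventKey("COHUTTA", "3/10/14", "1:01"));
        System.out.println(statsKey("COHUTTA", "3/10/14"));
    }
}
```

Because every event key begins with the pump name and date, the events for one pump and day share a common key prefix, which is what makes them cheap to read together from a store that keeps rows sorted by key.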
What Do We Need to Do?
Data Sources -> Collect Data (?) -> Process Data (?) -> Store Data (?) -> Serve Data (?)
How do we do this with High Performance at Scale?
•  Run operations in parallel and minimize disk read/write time
Collect the Data
•  Data Ingest:
–  File Based: NFS with MapR-FS, HDFS
–  Network Based: MapR Streams, Kafka, Kinesis, Twitter, Sockets...
[Diagram: Source -> Data Ingest -> stream topic and MapR-FS]
MapR Streams Publish Subscribe Messaging
Topics organize events into categories and decouple producers from consumers
Scalable Messaging with MapR Streams
Topics are partitioned for throughput and scalability
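To make the idea of partitioned topics concrete, here is a minimal plain-Java sketch of assigning a message key to a partition. It is a simplification under stated assumptions: the `TopicPartitioner` class is hypothetical and uses `String.hashCode()`, whereas real Kafka-API clients hash the serialized key with their own partitioner. The pump names other than COHUTTA are made up.

```java
// Hypothetical sketch of key-based partition assignment.
class TopicPartitioner {
    // Maps a key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        String[] pumps = {"COHUTTA", "PUMP_A", "PUMP_B"};
        for (String pump : pumps) {
            System.out.println(pump + " -> partition " + partitionFor(pump, 4));
        }
    }
}
```

The same key always lands in the same partition, so messages for one pump stay in order, while different keys spread across partitions that can be written and read in parallel.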
How do we do this with High Performance at Scale?
•  Parallel, partitioned = fast, scalable
–  Messaging with MapR Streams
Process the Data with Spark Streaming
•  Extension of the core Spark API
•  Enables scalable, high-throughput, fault-tolerant stream processing of live data
[Diagram: Collect Data (stream topic) -> Process Data (Spark Streaming) -> MapR-FS]
Processing Spark DStreams
Data stream divided into batches of X milliseconds = DStreams
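The division of a stream into fixed-interval batches can be pictured with the plain-Java sketch below. It is only an analogy: the `MicroBatcher` class is hypothetical, and it buckets events by a timestamp supplied with each event, whereas Spark Streaming forms its batches internally at each batch interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: group events into batches of batchMillis milliseconds.
class MicroBatcher {
    static Map<Long, List<String>> batch(long[] timestampsMillis, String[] events, long batchMillis) {
        // TreeMap keeps the batches ordered by batch index.
        Map<Long, List<String>> batches = new TreeMap<>();
        for (int i = 0; i < events.length; i++) {
            long batchIndex = timestampsMillis[i] / batchMillis;
            batches.computeIfAbsent(batchIndex, k -> new ArrayList<>()).add(events[i]);
        }
        return batches;
    }

    public static void main(String[] args) {
        long[] timestamps = {0, 400, 1000, 2500};
        String[] events = {"a", "b", "c", "d"};
        // With 1000 ms batches: {0=[a, b], 1=[c], 2=[d]}
        System.out.println(batch(timestamps, events, 1000));
    }
}
```

Each list in the result plays the role of one RDD in the DStream: a bounded chunk of the stream that can be processed with ordinary batch operations.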
Spark Resilient Distributed Datasets
Spark revolves around RDDs
•  Read only collection of elements
•  Partitioned across a cluster
•  Operated on in parallel
•  Cached in memory
[Diagram: an RDD partitioned into P1 to P4 across the executors of three worker nodes (W)]
Partition 1: 8213034705, 95, 2.927373, jake7870, 0…
Partition 2: 8213034705, 115, 2.943484, Davidbresler2, 1…
Partition 3: 8213034705, 100, 2.951285, gladimacowgirl, 58…
Partition 4: 8213034705, 117, 2.998947, daysrus, 95…
How do we do this with High Performance at Scale?
•  Parallel, partitioned = fast, scalable
–  Processing with Spark
Processing Spark DStreams
Transformations -> create new RDDs
Two types of operations on DStreams:
•  Transformations:
–  Create new DStreams
–  map, filter, reduceByKey, SQL...
•  Output Operations
[Diagram: each RDD of the input DStream (data from time 0 to 1, 1 to 2, 2 to 3) is transformed into the RDD for the same batch time in a new DStream]
Processing Spark DStreams
Output operations -> trigger computation
Two types of operations on DStreams:
•  Transformations
•  Output Operations: trigger computation
–  Save to File, HBase...
•  saveAsHadoopFiles
•  saveAsHadoopDataset
•  saveAsTextFiles
[Diagram: the RDD of each batch (data from time 0 to 1, 1 to 2, 2 to 3) is mapped and then saved to MapR-FS or MapR-DB]
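The split between lazy transformations and output operations that trigger the work has a close analogue in `java.util.stream`, which the sketch below uses purely as an illustration. This is not Spark code; the `LazyPipeline` class, the psi readings, and the threshold of 5.0 are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical analogy: nothing runs until a terminal (output) operation is called.
class LazyPipeline {
    static List<String> trace(List<Double> psiReadings) {
        List<String> trace = new ArrayList<>();
        // Transformation: only describes the work, it does not execute it yet.
        Stream<Double> lowPsi = psiReadings.stream()
            .filter(psi -> {
                trace.add("filter " + psi);
                return psi < 5.0;
            });
        trace.add("pipeline defined");
        // Terminal operation: this is what actually runs the filter.
        List<Double> alerts = lowPsi.collect(Collectors.toList());
        trace.add("alerts " + alerts);
        return trace;
    }

    public static void main(String[] args) {
        // The "filter ..." entries appear only after "pipeline defined".
        System.out.println(trace(Arrays.asList(84.0, 0.0)));
    }
}
```

In Spark Streaming the same pattern applies per batch: map and filter only define new DStreams, and an output operation such as saveAsTextFiles causes each batch to be computed.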
What Do We Need to Do?
Data Sources -> Collect Data -> Process Data -> Store Data -> Serve Data
[Diagram labels: stream topic, MapR-FS]
MapR-DB (HBase API) is Designed to Scale
Fast Reads and Writes by Key! Data is automatically partitioned by Key Range!
[Diagram: a table with columns Key, colB, colC split into three partitions, one per key range]
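Partitioning by key range can be sketched with a sorted map from each range's start key to the partition that holds it. The `KeyRangeRouter` class, the split points "H" and "P", the partition names, and the pump names other than COHUTTA are all hypothetical; the database chooses and maintains its own split points automatically.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: route a row key to the partition whose key range contains it.
class KeyRangeRouter {
    // Start key of each range -> partition name (assumed split points).
    static TreeMap<String, String> demoRanges() {
        TreeMap<String, String> startKeys = new TreeMap<>();
        startKeys.put("", "partition-1");   // keys before "H"
        startKeys.put("H", "partition-2");  // keys from "H" up to "P"
        startKeys.put("P", "partition-3");  // keys from "P" onwards
        return startKeys;
    }

    static String partitionFor(TreeMap<String, String> startKeys, String rowKey) {
        // floorEntry finds the greatest start key that is <= rowKey.
        Map.Entry<String, String> range = startKeys.floorEntry(rowKey);
        return range == null ? null : range.getValue();
    }

    public static void main(String[] args) {
        TreeMap<String, String> ranges = demoRanges();
        System.out.println(partitionFor(ranges, "COHUTTA_3/10/14_1:01"));
        System.out.println(partitionFor(ranges, "MOJO_3/10/14_1:01"));
        System.out.println(partitionFor(ranges, "THERMALITO_3/10/14_1:01"));
    }
}
```

A lookup by key touches exactly one partition, which is why reads and writes by key stay fast as the table grows.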
Store Lots of Data with NoSQL MapR-DB
Storage Model
•  RDBMS: Normalized schema -> Joins for queries can cause bottleneck
•  MapR-DB: De-Normalized schema -> Data that is read together is stored together
Key to Real Time: Event-based Data Flows
Key to Scale = Parallel Partitioned:
•  Messaging
•  Processing
•  Storage
Use Case Example Code
[Diagram: sensor time-stamped data -> stream topic -> Spark Streaming (Spark processing) -> data read for real-time monitoring]
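KafkaProducer
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// MapR Streams topic, written as <stream path>:<topic name>
String topic = "/streams/pump:warning";
Properties properties = new Properties();
properties.put("key.serializer",
    "org.apache.kafka.common.serialization.StringSerializer");
properties.put("value.serializer",
    "org.apache.kafka.common.serialization.StringSerializer");
// With Apache Kafka brokers (rather than MapR Streams), also set bootstrap.servers
// Instantiate KafkaProducer with properties
KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);
String txt = "msg text";
ProducerRecord<String, String> rec =
    new ProducerRecord<String, String>(topic, txt);
producer.send(rec);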
Create a DStream
DStream: a sequence of RDDs representing a stream of data
val ssc = new StreamingContext(sparkConf, Seconds(5))
val dStream = KafkaUtils.createDirectStream[String, String](ssc, kafkaParams, topicsSet)
[Diagram: dStream as a sequence of batches (time 0 to 1, 1 to 2, 2 to 3), each stored in memory as an RDD]
Process DStream
val sensorDStream = dStream.map(_._2).map(parseSensor)
New RDDs created for every batch
[Diagram: map is applied to each batch RDD of dStream (time 0 to 1, 1 to 2, 2 to 3), producing the matching RDD of sensorDStream]
Message Data to Sensor Object
case class Sensor(resid: String, date: String, time: String,
                  hz: Double, disp: Double, flo: Double, sedPPM: Double,
                  psi: Double, chlPPM: Double)
def parseSensor(str: String): Sensor = {
  val p = str.split(",")
  Sensor(p(0), p(1), p(2), p(3).toDouble, p(4).toDouble, p(5).toDouble,
    p(6).toDouble, p(7).toDouble, p(8).toDouble)
}
DataFrame and SQL Operations
// for Each RDD
sensorDStream.foreachRDD { rdd =>
  val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
  import sqlContext.implicits._ // required for rdd.toDF()
  rdd.toDF().registerTempTable("sensor")
  // Triple-quoted string so that the query can span several lines
  val res = sqlContext.sql(
    """SELECT resid, date,
       max(hz) as maxhz, min(hz) as minhz, avg(hz) as avghz,
       max(disp) as maxdisp, min(disp) as mindisp, avg(disp) as avgdisp,
       max(flo) as maxflo, min(flo) as minflo, avg(flo) as avgflo,
       max(psi) as maxpsi, min(psi) as minpsi, avg(psi) as avgpsi
       FROM sensor GROUP BY resid, date""")
  res.show()
}
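To show what the GROUP BY query computes, the following plain-Java sketch groups comma-separated sensor lines by resid and date and summarizes the hz column. It is not Spark code: the `SensorStats` class is hypothetical, the field order is taken from the Sensor case class, and the sample values apart from hz 10.37 and psi 84 are invented for the example.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch of "max, min, avg of hz GROUP BY resid, date".
class SensorStats {
    static Map<String, DoubleSummaryStatistics> hzStats(List<String> csvLines) {
        return csvLines.stream()
            .map((String line) -> line.split(","))
            .collect(Collectors.groupingBy(
                (String[] p) -> p[0] + "_" + p[1],   // group key: resid and date
                TreeMap::new,
                Collectors.summarizingDouble((String[] p) -> Double.parseDouble(p[3]))));  // hz
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "COHUTTA,3/10/14,1:01,10.37,1.77,43,0.62,84,1.41",
            "COHUTTA,3/10/14,1:02,9.63,1.80,40,0.60,83,1.40");
        hzStats(lines).forEach((key, stats) ->
            System.out.println(key + " max=" + stats.getMax()
                + " min=" + stats.getMin() + " avg=" + stats.getAverage()));
    }
}
```

The Spark SQL version does the same grouping for every column at once and distributes the work across the partitions of each batch RDD.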
Streaming Application Output
Save to HBase
rdd.map(Sensor.convertToPut).saveAsHadoopDataset(jobConfig)
Output operation: persist data to external storage
[Diagram: each batch RDD of the linesRDD DStream (time 0 to 1, 1 to 2, 2 to 3) is mapped into the sensorRDD DStream, then saved as Put objects written to HBase]
Start Receiving Data
sensorDStream.foreachRDD { rdd =>
. . .
}
// Start the computation
ssc.start()
// Wait for the computation to terminate
ssc.awaitTermination()
Building a Complete Data Architecture
MapR Converged Data Platform
•  MapR File System (MapR-FS)
•  MapR Database (MapR-DB)
•  MapR Streams
[Diagram: Sources/Apps feed Stream Processing and Bulk Processing on the platform]
To Learn More:
•  Read an explanation of the code and download it
–  https://www.mapr.com/blog/fast-scalable-streaming-applications-mapr-streams-spark-streaming-and-mapr-db
–  https://www.mapr.com/blog/spark-streaming-hbase
To Learn More:
•  http://learn.mapr.com/
Q&A
@mapr
@caroljmcdonald
https://www.mapr.com/blog/author/carol-mcdonald
Engage with us!
mapr-technologies

Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API and the HBase API

  • 1. Exploring Data Pipelines for Spark Streaming Applications. Carol McDonald, Industry Solutions Architect, 2016. © 2016 MapR Technologies
  • 2. What is Streaming Data? Got Some Examples? Data collection devices: smart machinery, phones and tablets, home automation, RFID systems, digital signage, security systems, medical devices.
  • 3. Why Stream Processing? Temperature readings: 6:01 P.M.: 72°, 6:02 P.M.: 75°, 6:03 P.M.: 77°, 6:04 P.M.: 85°, 6:05 P.M.: 90°, 6:06 P.M.: 85°, 6:07 P.M.: 77°, 6:08 P.M.: 75°. Analyzed afterwards as a batch, the only result is "It was hot at 6:05 yesterday!" (90°). Batch processing may be too late for some events.
  • 4. Why Stream Processing? The 6:05 P.M. reading of 90° is published to a Temperature topic in a stream, and the response is immediate: "Turn on the air conditioning!" It's becoming important to process events as they arrive.
  • 5. Key to Real Time: Event-based Data Flows. Event sources: web events, machine sensors, biometrics, mobile events, etc.
  • 6. What if BP had detected problems before the oil hit the water? At 1M samples/sec, high performance at scale is necessary!
  • 7. Use Case: Time Series Data. Sensor → time-stamped data → Stream topic → read by Spark Streaming → Spark processing → data for real-time monitoring.
  • 8. Schema. All events are stored in column family (CF) data, which could be set to expire data. Filtered alerts are put in CF alerts. Daily summaries are put in CF stats. The row key contains the oil pump name, the date, and a time stamp.
        Columns: CF data (hz … psi), CF alerts (psi …), CF stats (hz_avg … psi_min)
        Row COHUTTA_3/10/14_1:01: data:hz = 10.37, data:psi = 84, alerts:psi = 0
        Row COHUTTA_3/10/14: stats:hz_avg = 10, stats:psi_min = 0
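The row key layout described on the schema slide can be sketched as below. This is a minimal illustration in plain Java, not code from the deck: the class `RowKeys` and its method names are hypothetical, and the `_` separator is inferred from the example keys `COHUTTA_3/10/14_1:01` and `COHUTTA_3/10/14`.

```java
// Hypothetical helper, assuming the key layout <pump>_<date>[_<time>] seen in the slide's example rows.
final class RowKeys {
    private RowKeys() {}

    /** Key for one raw sensor event (CF data / CF alerts), e.g. COHUTTA_3/10/14_1:01. */
    static String eventKey(String pump, String date, String time) {
        return pump + "_" + date + "_" + time;
    }

    /** Key for the daily summary row (CF stats), e.g. COHUTTA_3/10/14. */
    static String dailyKey(String pump, String date) {
        return pump + "_" + date;
    }

    public static void main(String[] args) {
        System.out.println(eventKey("COHUTTA", "3/10/14", "1:01"));
        System.out.println(dailyKey("COHUTTA", "3/10/14"));
    }
}
```

Because every event key for a pump and day starts with that day's summary key, rows for the same pump and date sort next to each other, which is what makes reads by key prefix cheap in a key-ordered store.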
  • 11. What Do We Need to Do? Data Sources → Collect Data → Process Data → Store Data → Serve Data. The technology for each of the four stages is still an open question (? ? ? ?).
  • 12. How do we do this with High Performance at Scale? Run operations in parallel and minimize disk read/write time.
  • 13. Collect the Data. Source → data ingest → Stream topic or MapR-FS. Data ingest options: file based (NFS with MapR-FS, HDFS); network based (MapR Streams, Kafka, Kinesis, Twitter, sockets, ...).
  • 14. MapR Streams Publish-Subscribe Messaging. Topics organize events into categories and decouple producers from consumers.
  • 15. Scalable Messaging with MapR Streams. Topics are partitioned for throughput and scalability.
  • 16. How do we do this with High Performance at Scale? Parallel, partitioned = fast, scalable: messaging with MapR Streams.
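To make "topics are partitioned" concrete, here is a minimal sketch of how a message key can be mapped to one of a topic's partitions. It is an illustration only: `TopicPartitioner` is a hypothetical class, and the use of `String.hashCode()` is an assumption chosen for brevity; the hashing actually used by Kafka or MapR Streams differs.

```java
// Illustrative key-to-partition mapping; not the partitioner used by Kafka or MapR Streams.
final class TopicPartitioner {
    private TopicPartitioner() {}

    /** Returns a partition index in [0, numPartitions) that is stable for a given key. */
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("COHUTTA", 4));
    }
}
```

Messages with the same key always land in the same partition, so per-key ordering is preserved while different keys spread across partitions that can be consumed in parallel.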
  • 17. Process the Data with Spark Streaming. Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data. Pipeline so far: Collect Data (Stream topic, MapR-FS) → Process Data.
  • 18. Processing Spark DStreams. The data stream is divided into batches of X milliseconds; this sequence of batches is a DStream.
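The idea of cutting a stream into batches of X milliseconds can be sketched without Spark. The example below is a simplified stand-in, not Spark's implementation: `MicroBatcher` is a hypothetical class, and it assumes each event carries an arrival time in milliseconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified micro-batching sketch: groups events by the batch interval they arrived in.
final class MicroBatcher {
    private MicroBatcher() {}

    /** Maps each batch start time (ms) to the events that arrived in [start, start + batchMillis). */
    static Map<Long, List<String>> batch(long[] arrivalMillis, String[] events, long batchMillis) {
        Map<Long, List<String>> batches = new TreeMap<>();
        for (int i = 0; i < events.length; i++) {
            // Round the arrival time down to the start of its batch interval.
            long batchStart = Math.floorDiv(arrivalMillis[i], batchMillis) * batchMillis;
            batches.computeIfAbsent(batchStart, k -> new ArrayList<>()).add(events[i]);
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(batch(new long[] {100, 4900, 5000}, new String[] {"a", "b", "c"}, 5000));
    }
}
```

In Spark Streaming each such batch becomes one RDD, and the batch interval is the duration passed to the `StreamingContext`.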
  • 19. Spark Resilient Distributed Datasets. Spark revolves around RDDs: a read-only collection of elements, partitioned across a cluster, operated on in parallel, and cached in memory. Diagram: an RDD split into four partitions (P1 to P4) held by executors on three worker nodes.
        Partition 1: 8213034705, 95, 2.927373, jake7870, 0…
        Partition 2: 8213034705, 115, 2.943484, Davidbresler2, 1…
        Partition 3: 8213034705, 100, 2.951285, gladimacowgirl, 58…
        Partition 4: 8213034705, 117, 2.998947, daysrus, 95…
  • 21. How do we do this with High Performance at Scale? Parallel, partitioned = fast, scalable: processing with Spark.
  • 22. Processing Spark DStreams: transformations → create new RDDs. There are two types of operations on DStreams: transformations and output operations. Transformations create new DStreams (map, filter, reduceByKey, SQL, ...). Diagram: each batch of the input DStream (data from time 0 to 1, 1 to 2, 2 to 3) is an RDD (RDD @ time 1, 2, 3), and a transformation turns each of these RDDs into the matching RDD of the new DStream.
  • 23. Processing Spark DStreams: output operations → trigger computation. Of the two types of operations on DStreams (transformations and output operations), output operations trigger computation and save the results to a file, HBase, etc.: saveAsHadoopFiles, saveAsHadoopDataset, saveAsTextFiles. Diagram: each batch RDD (RDD @ time 1, 2, 3) is mapped and then saved to MapR-FS or MapR-DB.
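The split between lazy transformations and output operations that trigger computation has a close analogy in Java's own `java.util.stream` API, sketched below. This is an analogy only, not Spark code: `LazyPipeline`, `lowPressure`, and `save` are hypothetical names, and the 50.0 psi threshold is an arbitrary value chosen for the illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy with java.util.stream: intermediate operations are lazy, terminal operations run the pipeline.
final class LazyPipeline {
    private LazyPipeline() {}

    /** "Transformations": describes parse + filter, but executes nothing yet. */
    static Stream<Double> lowPressure(List<String> lines, AtomicInteger parseCalls) {
        return lines.stream()
                .map(line -> {
                    parseCalls.incrementAndGet(); // counts how often parsing actually runs
                    return Double.valueOf(line);
                })
                .filter(psi -> psi < 50.0); // hypothetical alert threshold
    }

    /** "Output operation": the terminal collect() is what triggers the computation. */
    static List<Double> save(Stream<Double> readings) {
        return readings.collect(Collectors.toList());
    }
}
```

Until `save` is called, the parse counter stays at zero; in the same way, a chain of DStream transformations does no work until an output operation is registered and the context is started.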
  • 24. What Do We Need to Do? Data Sources → Collect Data → Process Data → Store Data → Serve Data. Components shown so far: Stream topic, MapR-FS.
  • 25. MapR-DB (HBase API) is Designed to Scale. Data is automatically partitioned by key range, which gives fast reads and writes by key. Diagram: three table partitions, each holding the rows (Key, colB, colC) for its own key range.
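Partitioning by key range can be sketched with a sorted map from each partition's start key to the partition that owns it. This is a conceptual sketch, not MapR-DB or HBase code: `KeyRangeRouter` and the region names are hypothetical, and Java's `String` ordering stands in for the byte-wise key ordering a real store would use.

```java
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch of key-range partitioning; names and ordering are assumptions for illustration.
final class KeyRangeRouter {
    // Maps each range's inclusive start key to the name of the partition holding that range.
    private final TreeMap<String, String> regionByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionByStartKey.put(startKey, regionName);
    }

    /** Finds the partition whose start key is the greatest one less than or equal to rowKey. */
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers key " + rowKey);
        }
        return entry.getValue();
    }
}
```

A lookup touches only the one partition that owns the key, which is why reads and writes by key stay fast as more key ranges (and servers) are added.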
  • 26. Store Lots of Data with NoSQL MapR-DB. Storage model, RDBMS: normalized schema → joins for queries can cause a bottleneck. Storage model, MapR-DB: de-normalized schema → data that is read together is stored together.
  • 27. Key to Real Time: Event-based Data Flows. Key to scale = parallel, partitioned: messaging, processing, storage.
  • 29. Use Case Example Code. Sensor → time-stamped data → Stream topic → read by Spark Streaming → Spark processing → data for real-time monitoring.
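  • 31. KafkaProducer
        import java.util.Properties;
        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerRecord;

        // MapR Streams topic path, in the form /stream:topic
        String topic = "/streams/pump:warning";
        Properties properties = new Properties();
        // The Kafka producer API requires a serializer for both keys and values
        properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Instantiate KafkaProducer with properties
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);
        String txt = "msg text";
        // Record with a value only (no key), sent asynchronously
        ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topic, txt);
        producer.send(rec);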
  • 33. Create a DStream. A DStream is a sequence of RDDs representing a stream of data; each batch (time 0 to 1, 1 to 2, 2 to 3) is stored in memory as an RDD.
        // Batch interval of 5 seconds
        val ssc = new StreamingContext(sparkConf, Seconds(5))
        // Direct stream of (key, value) messages from the subscribed topics
        val dStream = KafkaUtils.createDirectStream[String, String](ssc, kafkaParams, topicsSet)
  • 34. Process DStream. New RDDs are created for every batch: map is applied to each RDD of dStream (batch time 0 to 1, 1 to 2, 2 to 3) to produce the matching RDD of sensorDStream.
        // Keep only the message value, then parse it into a Sensor
        val sensorDStream = dStream.map(_._2).map(parseSensor)
  • 35. Message Data to Sensor Object
        case class Sensor(resid: String, date: String, time: String, hz: Double, disp: Double,
                          flo: Double, sedPPM: Double, psi: Double, chlPPM: Double)

        // Parse one comma-separated message into a Sensor
        def parseSensor(str: String): Sensor = {
          val p = str.split(",")
          Sensor(p(0), p(1), p(2), p(3).toDouble, p(4).toDouble, p(5).toDouble,
                 p(6).toDouble, p(7).toDouble, p(8).toDouble)
        }
  • 36. DataFrame and SQL Operations
        // For each RDD (one per batch)
        sensorDStream.foreachRDD { rdd =>
          val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
          import sqlContext.implicits._ // needed for rdd.toDF()
          rdd.toDF().registerTempTable("sensor")
          val res = sqlContext.sql(
            """SELECT resid, date,
                 max(hz) as maxhz, min(hz) as minhz, avg(hz) as avghz,
                 max(disp) as maxdisp, min(disp) as mindisp, avg(disp) as avgdisp,
                 max(flo) as maxflo, min(flo) as minflo, avg(flo) as avgflo,
                 max(psi) as maxpsi, min(psi) as minpsi, avg(psi) as avgpsi
               FROM sensor GROUP BY resid, date""")
          res.show()
        }
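What the GROUP BY query computes for each batch can be shown with a small plain-Java sketch that derives min, max, and average psi per pump and date. It is not the deck's Spark code: `PsiStats` and `Reading` are hypothetical names, only the psi column is included, and the grouping key simply joins `resid` and `date` with `_`.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java equivalent of "SELECT resid, date, max(psi), min(psi), avg(psi) ... GROUP BY resid, date".
final class PsiStats {
    private PsiStats() {}

    /** Cut-down stand-in for the deck's Sensor case class (psi only). */
    static final class Reading {
        final String resid;
        final String date;
        final double psi;

        Reading(String resid, String date, double psi) {
            this.resid = resid;
            this.date = date;
            this.psi = psi;
        }
    }

    /** Groups readings by pump and date and summarizes psi (count, min, max, average). */
    static Map<String, DoubleSummaryStatistics> byPumpAndDate(List<Reading> readings) {
        return readings.stream().collect(Collectors.groupingBy(
                (Reading r) -> r.resid + "_" + r.date,
                Collectors.summarizingDouble((Reading r) -> r.psi)));
    }
}
```

The grouping key has the same shape as the daily summary row key in the schema, so each entry of the result corresponds to one row that could be written to CF stats.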
  • 37. Streaming Application Output
  • 38. Save to HBase. Output operation: persist data to external storage. For each batch (time 0 to 1, 1 to 2, 2 to 3), the RDDs of the lines DStream are mapped to the RDDs of the sensor DStream, converted to Put objects, and written to HBase.
        rdd.map(Sensor.convertToPut).saveAsHadoopDataset(jobConfig)
  • 39. Start Receiving Data
        sensorDStream.foreachRDD { rdd => . . . }
        // Start the computation
        ssc.start()
        // Wait for the computation to terminate
        ssc.awaitTermination()
  • 40. Building a Complete Data Architecture. MapR Converged Data Platform: MapR File System (MapR-FS), MapR Database (MapR-DB), MapR Streams. Sources/Apps feed stream processing and bulk processing.
  • 41. To Learn More: read an explanation of the code and download it: https://www.mapr.com/blog/fast-scalable-streaming-applications-mapr-streams-spark-streaming-and-mapr-db and https://www.mapr.com/blog/spark-streaming-hbase
  • 42. To Learn More: http://learn.mapr.com/
  • 43. Q&A. Engage with us! @mapr, @caroljmcdonald, https://www.mapr.com/blog/author/carol-mcdonald, mapr-technologies