SlideShare a Scribd company logo
1 of 43
®
© 2016 MapR Technologies 1®
© 2016 MapR Technologies 1© 2016 MapR Technologies
®
Exploring Data Pipelines for Spark Streaming Applications
Carol McDonald, Industry Solutions Architect
2016
®
© 2016 MapR Technologies 2®
© 2016 MapR Technologies 2
What is Streaming Data? Got Some Examples?
Data Collection
Devices
Smart Machinery Phones and Tablets Home Automation
RFID Systems Digital Signage Security Systems Medical Devices
®
© 2016 MapR Technologies 3®
© 2016 MapR Technologies 3
It was hot
at 6:05
yesterday
!
Why Stream Processing?
Analyze
6:01 P.M.:
72°
6:02 P.M.:
75°
6:03 P.M.: 77°
6:04 P.M.: 85°
6:05 P.M.: 90°
6:06 P.M.: 85°
6:07 P.M.: 77°
6:08 P.M.: 75°
90°90°
6:01 P.M.: 72°
6:02 P.M.: 75°
6:03 P.M.: 77°
6:04 P.M.: 85°
6:05 P.M.: 90°
6:06 P.M.: 85°
6:07 P.M.: 77°
6:08 P.M.: 75°
Batch processing may be too late for some events
®
© 2016 MapR Technologies 4®
© 2016 MapR Technologies 4
Why Stream Processing?
6:05 P.M.: 90°
To
pic
Stream
Temperature
Turn on the
air
conditioning!
It’s becoming important to process events as they arrive
®
© 2016 MapR Technologies 5®
© 2016 MapR Technologies 5
Key to Real Time: Event-based Data Flows
web events
etc…
machine sensors
Biometrics
Mobile events
®
© 2016 MapR Technologies 6®
© 2016 MapR Technologies 6
What if BP had detected problems before the oil hit
the water ?
•  1M samples/sec
•  High performance at
scale is necessary!
®
© 2016 MapR Technologies 7®
© 2016 MapR Technologies 7
Use Case: Time Series Data
Data for
real-time monitoring
read
Sensor
time-stamped data Spark processing
Spark
Streaming
Stream
Topic
®
© 2016 MapR Technologies 8®
© 2016 MapR Technologies 8
Schema
•  All events stored, CF data could be set to expire data
•  Filtered alerts put in CF alerts
•  Daily summaries put in CF stats
Row key
CF data CF alerts CF stats
hz … psi psi … hz_avg … psi_min
COHUTTA_3/10/14_1:01 10.37 84 0
COHUTTA_3/10/14 10 0
Row Key contains oil
pump name, date, and
a time stamp
®
© 2016 MapR Technologies 9®
© 2016 MapR Technologies 9
Schema
•  All events stored, CF data could be set to expire data
•  Filtered alerts put in CF alerts
•  Daily summaries put in CF stats
Row key
CF data CF alerts CF stats
hz … psi psi … hz_avg … psi_min
COHUTTA_3/10/14_1:01 10.37 84 0
COHUTTA_3/10/14 10 0
®
© 2016 MapR Technologies 10®
© 2016 MapR Technologies 10
Schema
•  All events stored, CF data could be set to expire data
•  Filtered alerts put in CF alerts
•  Daily summaries put in CF stats
Row key
CF data CF alerts CF stats
hz … psi psi … hz_avg … psi_min
COHUTTA_3/10/14_1:01 10.37 84 0
COHUTTA_3/10/14 10 0
®
© 2016 MapR Technologies 11®
© 2016 MapR Technologies 11
Serve DataStore DataCollect Data
What Do We Need to Do ?
Process DataData Sources
? ? ? ?
®
© 2016 MapR Technologies 12®
© 2016 MapR Technologies 12
How do we do this with High Performance at Scale?
•  Parallel operations and minimize disk read/write time
®
© 2016 MapR Technologies 13®
© 2016 MapR Technologies 13
Collect the Data
Data Ingest
MapR-FS
Source
Stream
Topic
•  Data Ingest:
–  File Based: NFS with MapR-FS,
HDFS
–  Network Based: MapR Streams,
Kafka, Kinesis, Twitter, Sockets...
®
© 2016 MapR Technologies 14®
© 2016 MapR Technologies 14
MapR Streams Publish Subscribe Messaging
Topics Organize Events into Categories
and decouple Producers from Consumers
®
© 2016 MapR Technologies 15®
© 2016 MapR Technologies 15
Scalable Messaging with MapR Streams
Topics are partitioned for throughput and scalability
®
© 2016 MapR Technologies 16®
© 2016 MapR Technologies 16
How do we do this with High Performance at Scale?
•  Parallel , Partitioned = fast , scalable
–  Messaging with MapR Streams
®
© 2016 MapR Technologies 17®
© 2016 MapR Technologies 17
Collect Data
Process the Data with Spark Streaming
MapR-FS
Process Data
Stream
Topic
•  Extension of the core Spark AP
•  Enables scalable, high-throughput,
fault-tolerant stream processing of
live data
®
© 2016 MapR Technologies 18®
© 2016 MapR Technologies 18
Processing Spark DStreams
Data stream divided into batches of X milliseconds = DStreams
®
© 2016 MapR Technologies 19®
© 2016 MapR Technologies 19
Spark Resilient Distributed Datasets
RDD
W
Executor
P4
W
Executor
P1 P3
W
Executor
P2
partitioned
Partition 1
8213034705, 95,
2.927373,
jake7870, 0……
Partition 2
8213034705,
115, 2.943484,
Davidbresler2,
1….
Partition 3
8213034705,
100, 2.951285,
gladimacowgirl,
58…
Partition 4
8213034705,
117, 2.998947,
daysrus, 95….
Spark revolves around RDDs
•  Read only collection of elements
•  Partitioned across a cluster
•  Operated on in parallel
•  Cached in memory
®
© 2016 MapR Technologies 20®
© 2016 MapR Technologies 20
Spark Resilient Distributed Datasets
Spark revolves around RDDs
•  Read only collection of elements
•  Partitioned across a cluster
•  Operated on in parallel
•  Cached in memory
®
© 2016 MapR Technologies 21®
© 2016 MapR Technologies 21
How do we do this with High Performance at Scale?
•  Parallel , Partitioned = fast , scalable
–  Processing with Spark
®
© 2016 MapR Technologies 22®
© 2016 MapR Technologies 22
Processing Spark DStreams
transformations à create new RDDs
Two types of operations on DStreams:
•  Transformations:
–  Create new DStreams
–  map, filter, reduceByKey, SQL. . .
•  Output Operations
DStream
RDDs
DStream
RDDs
transform	
  transform	
  
data from
time 0 to 1
RDD @ time 1
data from
time 1 to 2
RDD @ time 2
data from
time 2 to 3
RDD @ time 3
RDD @ time 3
transform	
  
RDD @ time 1 RDD @ time 2
®
© 2016 MapR Technologies 23®
© 2016 MapR Technologies 23
Two types of operations on DStreams
•  Transformations
•  Output Operations: trigger
Computation
–  Save to File, HBase..
•  saveAsHadoopFiles
•  saveAsHadoopDataset
•  saveAsTextFiles
Processing Spark DStreams
Output operations à trigger computation
MapR-FS
MapR-DB
DStream
RDDs
data from
time 0 to 1
data from
time 1 to 2
data from
time 2 to 3
RDD @ time 3RDD @ time 1 RDD @ time 2
mapmap map
savesave save
®
© 2016 MapR Technologies 24®
© 2016 MapR Technologies 24
Serve DataStore DataCollect Data
What Do We Need to Do ?
MapR-FS
Process DataData Sources
MapR-FS
Stream
Topic
®
© 2016 MapR Technologies 25®
© 2016 MapR Technologies 25
MapR-DB (HBase API) is Designed to Scale
Key
Range
xxxx
xxxx
Key
Range
xxxx
xxxx
Key
Range
xxxx
xxxx
Key colB col
C
val val val
xxx val val
Key colB col
C
val val val
xxx val val
Key colB col
C
val val val
xxx val val
Fast Reads and Writes by Key! Data is automatically partitioned
by Key Range!
®
© 2016 MapR Technologies 26®
© 2016 MapR Technologies 26
Store Lots of Data with NoSQL MapR-DB
bottleneck
Key colB col
C
val val val
xxx val val
Key colB col
C
val val val
xxx val val
Key colB col
C
val val val
xxx val val
Storage ModelRDBMS MapR-DB
Normalized schema à Joins for
queries can cause bottleneck De-Normalized schema à Data that
is read together is stored together
®
© 2016 MapR Technologies 27®
© 2016 MapR Technologies 27
Key to Real Time: Event-based Data Flows
Key to Scale = Parallel Partitioned:
•  Messaging
•  Processing
•  Storage
®
© 2016 MapR Technologies 28®
© 2016 MapR Technologies 28
Serve DataStore DataCollect Data
What Do We Need to Do ?
MapR-FS
Process DataData Sources
MapR-FS
Stream
Topic
®
© 2016 MapR Technologies 29®
© 2016 MapR Technologies 29
Use Case Example Code
Data for
real-time monitoring
read
Sensor
time-stamped data Spark processing
Spark
Streaming
Stream
Topic
®
© 2016 MapR Technologies 30®
© 2016 MapR Technologies 30
Use Case Example Code
Data for
real-time monitoring
read
Sensor
time-stamped data Spark processing
Spark
Streaming
Stream
Topic
®
© 2016 MapR Technologies 31®
© 2016 MapR Technologies 31
KafkaProducer
String topic=“/streams/pump:warning”;
public static KafkaProducer producer;
Properties properties = new Properties();
properties.put("value.serializer",
"org.apache.kafka.common.serialization.StringSerializer");
// Instantiate KafkaProducer with properties
producer = new KafkaProducer<String, String>(properties);
String txt = “msg text”;
ProducerRecord<String, String> rec = new
ProducerRecord<String, String>(topic, txt);
producer.send(rec);
®
© 2016 MapR Technologies 32®
© 2016 MapR Technologies 32
Use Case Example Code
Data for
real-time monitoring
read
Sensor
time-stamped data Spark processing
Spark
Streaming
Stream
Topic
®
© 2016 MapR Technologies 33®
© 2016 MapR Technologies 33
Create a DStream
DStream: a sequence of RDDs
representing a stream of data
val ssc = new StreamingContext(sparkConf, Seconds(5))
val dStream = KafkaUtils.createDirectStream[String,
String](ssc, kafkaParams, topicsSet)
batch
time 0 to 1
batch
time 1 to 2
batch
time 2 to 3
dStream
Stored in memory
as an RDD
®
© 2016 MapR Technologies 34®
© 2016 MapR Technologies 34
Process DStream
val sensorDStream = dStream.map(_._2).map(parseSensor)
dStream RDDs
batch
time 2 to 3
batch
time 1 to 2
batch
time 0 to 1
sensorDStream RDDs
New RDDs created
for every batch
map map map
®
© 2016 MapR Technologies 35®
© 2016 MapR Technologies 35
Message Data to Sensor Object
case class Sensor(resid: String, date: String, time: String,
hz: Double, disp: Double, flo: Double, sedPPM: Double,
psi: Double, chlPPM: Double)
def parseSensor(str: String): Sensor = {
val p = str.split(",")
Sensor(p(0), p(1), p(2), p(3).toDouble, p(4).toDouble, p(5).toDouble,
p(6).toDouble, p(7).toDouble, p(8).toDouble)
}
®
© 2016 MapR Technologies 36®
© 2016 MapR Technologies 36
DataFrame and SQL Operations
// for Each RDD
sensorDStream.foreachRDD { rdd =>
val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
rdd.toDF().registerTempTable("sensor")
val res = sqlContext.sql( "SELECT resid, date,
max(hz) as maxhz, min(hz) as minhz, avg(hz) as avghz,
max(disp) as maxdisp, min(disp) as mindisp, avg(disp) as avgdisp,
max(flo) as maxflo, min(flo) as minflo, avg(flo) as avgflo,
max(psi) as maxpsi, min(psi) as minpsi, avg(psi) as avgpsi
FROM sensor GROUP BY resid,date")
res.show()
}
®
© 2016 MapR Technologies 37®
© 2016 MapR Technologies 37
Streaming Application Output
®
© 2016 MapR Technologies 38®
© 2016 MapR Technologies 38
Save to HBase
rdd.map(Sensor.convertToPut).saveAsHadoopDataset(jobConfig)
linesRDD DStream
sensorRDD DStream
output operation: persist
data to external storage
Put objects written
to HBase
batch
time 2-3
batch
time 1 to 2
batch
time 0 to 1
mapmap map
savesave save
®
© 2016 MapR Technologies 39®
© 2016 MapR Technologies 39
Start Receiving Data
sensorDStream.foreachRDD { rdd =>
. . .
}
// Start the computation
ssc.start()
// Wait for the computation to terminate
ssc.awaitTermination()
®
© 2016 MapR Technologies 40®
© 2016 MapR Technologies 40
Stream Processing
Building a Complete Data Architecture
MapR File System
(MapR-FS)
MapR Converged Data Platform
MapR Database
(MapR-DB)
MapR Streams
Sources/Apps Bulk Processing
®
© 2016 MapR Technologies 41®
© 2016 MapR Technologies 41
To Learn More:
•  Read explanation of and Download code
–  https://www.mapr.com/blog/fast-scalable-streaming-applications-mapr-streams-
spark-streaming-and-mapr-db
–  https://www.mapr.com/blog/spark-streaming-hbase
®
© 2016 MapR Technologies 42®
© 2016 MapR Technologies 42
To Learn More:
•  http://learn.mapr.com/
®
© 2016 MapR Technologies 43®
© 2016 MapR Technologies 43
Q&A
@mapr
@caroljmcdonald
https://www.mapr.com/blog/author/carol-mcdonald
Engage with us!
mapr-technologies

More Related Content

What's hot

Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBaseCarol McDonald
 
Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Carol McDonald
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataCarol McDonald
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on HadoopCarol McDonald
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill Carol McDonald
 
NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBMapR Technologies
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to NewMapR Technologies
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningMapR Technologies
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesCarol McDonald
 
Dealing with an Upside Down Internet
Dealing with an Upside Down InternetDealing with an Upside Down Internet
Dealing with an Upside Down InternetMapR Technologies
 
Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop BigDataEverywhere
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient DataCarol McDonald
 
When Streaming Becomes Strategic
When Streaming Becomes StrategicWhen Streaming Becomes Strategic
When Streaming Becomes StrategicMapR Technologies
 
How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications MapR Technologies
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesTed Dunning
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...DataWorks Summit/Hadoop Summit
 
Goto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedGoto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedTed Dunning
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Carol McDonald
 

What's hot (20)

Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBase
 
Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming Data
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on Hadoop
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill
 
NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DB
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to New
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
 
Dealing with an Upside Down Internet
Dealing with an Upside Down InternetDealing with an Upside Down Internet
Dealing with an Upside Down Internet
 
Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient Data
 
When Streaming Becomes Strategic
When Streaming Becomes StrategicWhen Streaming Becomes Strategic
When Streaming Becomes Strategic
 
How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approaches
 
MapR 5.2 Product Update
MapR 5.2 Product UpdateMapR 5.2 Product Update
MapR 5.2 Product Update
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
 
Goto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedGoto amsterdam-2013-skinned
Goto amsterdam-2013-skinned
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
 

Viewers also liked

HBase Sizing Notes
HBase Sizing NotesHBase Sizing Notes
HBase Sizing Noteslarsgeorge
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR Technologies
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guidelarsgeorge
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best PracticesVenu Anuganti
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comknowbigdata
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache SparkAmir Sedighi
 
Zeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureZeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureMapR Technologies
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkDatabricks
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Anton Kirillov
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance TuningLars Hofhansl
 

Viewers also liked (15)

HBase Sizing Notes
HBase Sizing NotesHBase Sizing Notes
HBase Sizing Notes
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document Database
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guide
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.com
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache Spark
 
Apache Spark & Streaming
Apache Spark & StreamingApache Spark & Streaming
Apache Spark & Streaming
 
Zeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureZeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data Architecture
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 

Similar to Spark Streaming Data Pipelines

Map r seattle streams meetup oct 2016
Map r seattle streams meetup   oct 2016Map r seattle streams meetup   oct 2016
Map r seattle streams meetup oct 2016Nitin Kumar
 
Free Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBaseFree Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBaseMapR Technologies
 
How Spark is Enabling the New Wave of Converged Applications
How Spark is Enabling  the New Wave of Converged ApplicationsHow Spark is Enabling  the New Wave of Converged Applications
How Spark is Enabling the New Wave of Converged ApplicationsMapR Technologies
 
Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1Tugdual Grall
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Tugdual Grall
 
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...Facultad de Informática UCM
 
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DBStructured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DBCarol McDonald
 
The Keys to Digital Transformation
The Keys to Digital TransformationThe Keys to Digital Transformation
The Keys to Digital TransformationMapR Technologies
 
Querying Network Packet Captures with Spark and Drill
Querying Network Packet Captures with Spark and DrillQuerying Network Packet Captures with Spark and Drill
Querying Network Packet Captures with Spark and DrillVince Gonzalez
 
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...MapR Technologies
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...Debraj GuhaThakurta
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...Debraj GuhaThakurta
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016Mathieu Dumoulin
 
MapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR Technologies
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainMapR Technologies
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkCloudera, Inc.
 
Cassandra + Spark (You’ve got the lighter, let’s start a fire)
Cassandra + Spark (You’ve got the lighter, let’s start a fire)Cassandra + Spark (You’ve got the lighter, let’s start a fire)
Cassandra + Spark (You’ve got the lighter, let’s start a fire)Robert Stupp
 

Similar to Spark Streaming Data Pipelines (20)

Map r seattle streams meetup oct 2016
Map r seattle streams meetup   oct 2016Map r seattle streams meetup   oct 2016
Map r seattle streams meetup oct 2016
 
Free Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBaseFree Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBase
 
How Spark is Enabling the New Wave of Converged Applications
How Spark is Enabling  the New Wave of Converged ApplicationsHow Spark is Enabling  the New Wave of Converged Applications
How Spark is Enabling the New Wave of Converged Applications
 
Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
 
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
 
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DBStructured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
 
The Keys to Digital Transformation
The Keys to Digital TransformationThe Keys to Digital Transformation
The Keys to Digital Transformation
 
Querying Network Packet Captures with Spark and Drill
Querying Network Packet Captures with Spark and DrillQuerying Network Packet Captures with Spark and Drill
Querying Network Packet Captures with Spark and Drill
 
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in Production
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
Streaming in the Extreme
Streaming in the ExtremeStreaming in the Extreme
Streaming in the Extreme
 
Is Spark Replacing Hadoop
Is Spark Replacing HadoopIs Spark Replacing Hadoop
Is Spark Replacing Hadoop
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016
 
MapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community Edition
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache Spark
 
Cassandra + Spark (You’ve got the lighter, let’s start a fire)
Cassandra + Spark (You’ve got the lighter, let’s start a fire)Cassandra + Spark (You’ve got the lighter, let’s start a fire)
Cassandra + Spark (You’ve got the lighter, let’s start a fire)
 

More from MapR Technologies

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsMapR Technologies
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareMapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR Technologies
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLMapR Technologies
 

More from MapR Technologies (20)

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 

Recently uploaded

Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeCzechDreamin
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftshyamraj55
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfFIDO Alliance
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfFIDO Alliance
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxJennifer Lim
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessUXDXConf
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyUXDXConf
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101vincent683379
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty SecureFemke de Vroome
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfFIDO Alliance
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...FIDO Alliance
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?Mark Billinghurst
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...CzechDreamin
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxDavid Michel
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityScyllaDB
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka DoktorováCzechDreamin
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaCzechDreamin
 

Recently uploaded (20)

Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System Strategy
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty Secure
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 

Spark Streaming Data Pipelines

  • 1. ® © 2016 MapR Technologies 1® © 2016 MapR Technologies 1© 2016 MapR Technologies ® Exploring Data Pipelines for Spark Streaming Applications Carol McDonald, Industry Solutions Architect 2016
  • 2. ® © 2016 MapR Technologies 2® © 2016 MapR Technologies 2 What is Streaming Data? Got Some Examples? Data Collection Devices Smart Machinery Phones and Tablets Home Automation RFID Systems Digital Signage Security Systems Medical Devices
  • 3. ® © 2016 MapR Technologies 3® © 2016 MapR Technologies 3 It was hot at 6:05 yesterday ! Why Stream Processing? Analyze 6:01 P.M.: 72° 6:02 P.M.: 75° 6:03 P.M.: 77° 6:04 P.M.: 85° 6:05 P.M.: 90° 6:06 P.M.: 85° 6:07 P.M.: 77° 6:08 P.M.: 75° 90°90° 6:01 P.M.: 72° 6:02 P.M.: 75° 6:03 P.M.: 77° 6:04 P.M.: 85° 6:05 P.M.: 90° 6:06 P.M.: 85° 6:07 P.M.: 77° 6:08 P.M.: 75° Batch processing may be too late for some events
  • 4. ® © 2016 MapR Technologies 4® © 2016 MapR Technologies 4 Why Stream Processing? 6:05 P.M.: 90° To pic Stream Temperature Turn on the air conditioning! It’s becoming important to process events as they arrive
  • 5. ® © 2016 MapR Technologies 5® © 2016 MapR Technologies 5 Key to Real Time: Event-based Data Flows web events etc… machine sensors Biometrics Mobile events
  • 6. ® © 2016 MapR Technologies 6® © 2016 MapR Technologies 6 What if BP had detected problems before the oil hit the water ? •  1M samples/sec •  High performance at scale is necessary!
  • 7. ® © 2016 MapR Technologies 7® © 2016 MapR Technologies 7 Use Case: Time Series Data Data for real-time monitoring read Sensor time-stamped data Spark processing Spark Streaming Stream Topic
  • 8. ® © 2016 MapR Technologies 8® © 2016 MapR Technologies 8 Schema •  All events stored, CF data could be set to expire data •  Filtered alerts put in CF alerts •  Daily summaries put in CF stats Row key CF data CF alerts CF stats hz … psi psi … hz_avg … psi_min COHUTTA_3/10/14_1:01 10.37 84 0 COHUTTA_3/10/14 10 0 Row Key contains oil pump name, date, and a time stamp
  • 9. ® © 2016 MapR Technologies 9® © 2016 MapR Technologies 9 Schema •  All events stored, CF data could be set to expire data •  Filtered alerts put in CF alerts •  Daily summaries put in CF stats Row key CF data CF alerts CF stats hz … psi psi … hz_avg … psi_min COHUTTA_3/10/14_1:01 10.37 84 0 COHUTTA_3/10/14 10 0
  • 10. ® © 2016 MapR Technologies 10® © 2016 MapR Technologies 10 Schema •  All events stored, CF data could be set to expire data •  Filtered alerts put in CF alerts •  Daily summaries put in CF stats Row key CF data CF alerts CF stats hz … psi psi … hz_avg … psi_min COHUTTA_3/10/14_1:01 10.37 84 0 COHUTTA_3/10/14 10 0
  • 11. ® © 2016 MapR Technologies 11® © 2016 MapR Technologies 11 Serve DataStore DataCollect Data What Do We Need to Do ? Process DataData Sources ? ? ? ?
  • 12. ® © 2016 MapR Technologies 12® © 2016 MapR Technologies 12 How do we do this with High Performance at Scale? •  Parallel operations and minimize disk read/write time
  • 13. ® © 2016 MapR Technologies 13® © 2016 MapR Technologies 13 Collect the Data Data Ingest MapR-FS Source Stream Topic •  Data Ingest: –  File Based: NFS with MapR-FS, HDFS –  Network Based: MapR Streams, Kafka, Kinesis, Twitter, Sockets...
  • 14. ® © 2016 MapR Technologies 14® © 2016 MapR Technologies 14 MapR Streams Publish Subscribe Messaging Topics Organize Events into Categories and decouple Producers from Consumers
  • 15. ® © 2016 MapR Technologies 15® © 2016 MapR Technologies 15 Scalable Messaging with MapR Streams Topics are partitioned for throughput and scalability
  • 16. ® © 2016 MapR Technologies 16® © 2016 MapR Technologies 16 How do we do this with High Performance at Scale? •  Parallel , Partitioned = fast , scalable –  Messaging with MapR Streams
  • 17. ® © 2016 MapR Technologies 17® © 2016 MapR Technologies 17 Collect Data Process the Data with Spark Streaming MapR-FS Process Data Stream Topic •  Extension of the core Spark AP •  Enables scalable, high-throughput, fault-tolerant stream processing of live data
  • 18. ® © 2016 MapR Technologies 18® © 2016 MapR Technologies 18 Processing Spark DStreams Data stream divided into batches of X milliseconds = DStreams
  • 19. ® © 2016 MapR Technologies 19® © 2016 MapR Technologies 19 Spark Resilient Distributed Datasets RDD W Executor P4 W Executor P1 P3 W Executor P2 partitioned Partition 1 8213034705, 95, 2.927373, jake7870, 0…… Partition 2 8213034705, 115, 2.943484, Davidbresler2, 1…. Partition 3 8213034705, 100, 2.951285, gladimacowgirl, 58… Partition 4 8213034705, 117, 2.998947, daysrus, 95…. Spark revolves around RDDs •  Read only collection of elements •  Partitioned across a cluster •  Operated on in parallel •  Cached in memory
  • 20. ® © 2016 MapR Technologies 20® © 2016 MapR Technologies 20 Spark Resilient Distributed Datasets Spark revolves around RDDs •  Read only collection of elements •  Partitioned across a cluster •  Operated on in parallel •  Cached in memory
  • 21. ® © 2016 MapR Technologies 21® © 2016 MapR Technologies 21 How do we do this with High Performance at Scale? •  Parallel , Partitioned = fast , scalable –  Processing with Spark
  • 22. ® © 2016 MapR Technologies 22® © 2016 MapR Technologies 22 Processing Spark DStreams transformations à create new RDDs Two types of operations on DStreams: •  Transformations: –  Create new DStreams –  map, filter, reduceByKey, SQL. . . •  Output Operations DStream RDDs DStream RDDs transform  transform   data from time 0 to 1 RDD @ time 1 data from time 1 to 2 RDD @ time 2 data from time 2 to 3 RDD @ time 3 RDD @ time 3 transform   RDD @ time 1 RDD @ time 2
  • 23. ® © 2016 MapR Technologies 23® © 2016 MapR Technologies 23 Two types of operations on DStreams •  Transformations •  Output Operations: trigger Computation –  Save to File, HBase.. •  saveAsHadoopFiles •  saveAsHadoopDataset •  saveAsTextFiles Processing Spark DStreams Output operations à trigger computation MapR-FS MapR-DB DStream RDDs data from time 0 to 1 data from time 1 to 2 data from time 2 to 3 RDD @ time 3RDD @ time 1 RDD @ time 2 mapmap map savesave save
  • 24. ® © 2016 MapR Technologies 24® © 2016 MapR Technologies 24 Serve DataStore DataCollect Data What Do We Need to Do ? MapR-FS Process DataData Sources MapR-FS Stream Topic
  • 25. ® © 2016 MapR Technologies 25® © 2016 MapR Technologies 25 MapR-DB (HBase API) is Designed to Scale Key Range xxxx xxxx Key Range xxxx xxxx Key Range xxxx xxxx Key colB col C val val val xxx val val Key colB col C val val val xxx val val Key colB col C val val val xxx val val Fast Reads and Writes by Key! Data is automatically partitioned by Key Range!
  • 26. ® © 2016 MapR Technologies 26® © 2016 MapR Technologies 26 Store Lots of Data with NoSQL MapR-DB bottleneck Key colB col C val val val xxx val val Key colB col C val val val xxx val val Key colB col C val val val xxx val val Storage ModelRDBMS MapR-DB Normalized schema à Joins for queries can cause bottleneck De-Normalized schema à Data that is read together is stored together
  • 27. ® © 2016 MapR Technologies 27® © 2016 MapR Technologies 27 Key to Real Time: Event-based Data Flows Key to Scale = Parallel Partitioned: •  Messaging •  Processing •  Storage
  • 28. ® © 2016 MapR Technologies 28® © 2016 MapR Technologies 28 Serve DataStore DataCollect Data What Do We Need to Do ? MapR-FS Process DataData Sources MapR-FS Stream Topic
  • 29. ® © 2016 MapR Technologies 29® © 2016 MapR Technologies 29 Use Case Example Code Data for real-time monitoring read Sensor time-stamped data Spark processing Spark Streaming Stream Topic
  • 30. ® © 2016 MapR Technologies 30® © 2016 MapR Technologies 30 Use Case Example Code Data for real-time monitoring read Sensor time-stamped data Spark processing Spark Streaming Stream Topic
  • 31. ® © 2016 MapR Technologies 31® © 2016 MapR Technologies 31 KafkaProducer String topic=“/streams/pump:warning”; public static KafkaProducer producer; Properties properties = new Properties(); properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer"); // Instantiate KafkaProducer with properties producer = new KafkaProducer<String, String>(properties); String txt = “msg text”; ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topic, txt); producer.send(rec);
  • 32. ® © 2016 MapR Technologies 32® © 2016 MapR Technologies 32 Use Case Example Code Data for real-time monitoring read Sensor time-stamped data Spark processing Spark Streaming Stream Topic
  • 33. ® © 2016 MapR Technologies 33® © 2016 MapR Technologies 33 Create a DStream DStream: a sequence of RDDs representing a stream of data val ssc = new StreamingContext(sparkConf, Seconds(5)) val dStream = KafkaUtils.createDirectStream[String, String](ssc, kafkaParams, topicsSet) batch time 0 to 1 batch time 1 to 2 batch time 2 to 3 dStream Stored in memory as an RDD
  • 34. ® © 2016 MapR Technologies 34® © 2016 MapR Technologies 34 Process DStream val sensorDStream = dStream.map(_._2).map(parseSensor) dStream RDDs batch time 2 to 3 batch time 1 to 2 batch time 0 to 1 sensorDStream RDDs New RDDs created for every batch map map map
  • 35. ® © 2016 MapR Technologies 35® © 2016 MapR Technologies 35 Message Data to Sensor Object case class Sensor(resid: String, date: String, time: String, hz: Double, disp: Double, flo: Double, sedPPM: Double, psi: Double, chlPPM: Double) def parseSensor(str: String): Sensor = { val p = str.split(",") Sensor(p(0), p(1), p(2), p(3).toDouble, p(4).toDouble, p(5).toDouble, p(6).toDouble, p(7).toDouble, p(8).toDouble) }
  • 36. ® © 2016 MapR Technologies 36® © 2016 MapR Technologies 36 DataFrame and SQL Operations // for Each RDD sensorDStream.foreachRDD { rdd => val sqlContext = SQLContext.getOrCreate(rdd.sparkContext) rdd.toDF().registerTempTable("sensor") val res = sqlContext.sql( "SELECT resid, date, max(hz) as maxhz, min(hz) as minhz, avg(hz) as avghz, max(disp) as maxdisp, min(disp) as mindisp, avg(disp) as avgdisp, max(flo) as maxflo, min(flo) as minflo, avg(flo) as avgflo, max(psi) as maxpsi, min(psi) as minpsi, avg(psi) as avgpsi FROM sensor GROUP BY resid,date") res.show() }
  • 37. ® © 2016 MapR Technologies 37® © 2016 MapR Technologies 37 Streaming Application Output
  • 38. ® © 2016 MapR Technologies 38® © 2016 MapR Technologies 38 Save to HBase rdd.map(Sensor.convertToPut).saveAsHadoopDataset(jobConfig) linesRDD DStream sensorRDD DStream output operation: persist data to external storage Put objects written to HBase batch time 2-3 batch time 1 to 2 batch time 0 to 1 mapmap map savesave save
  • 39. ® © 2016 MapR Technologies 39® © 2016 MapR Technologies 39 Start Receiving Data sensorDStream.foreachRDD { rdd => . . . } // Start the computation ssc.start() // Wait for the computation to terminate ssc.awaitTermination()
  • 40. ® © 2016 MapR Technologies 40® © 2016 MapR Technologies 40 Stream Processing Building a Complete Data Architecture MapR File System (MapR-FS) MapR Converged Data Platform MapR Database (MapR-DB) MapR Streams Sources/Apps Bulk Processing
  • 41. ® © 2016 MapR Technologies 41® © 2016 MapR Technologies 41 To Learn More: •  Read explanation of and Download code –  https://www.mapr.com/blog/fast-scalable-streaming-applications-mapr-streams- spark-streaming-and-mapr-db –  https://www.mapr.com/blog/spark-streaming-hbase
  • 42. ® © 2016 MapR Technologies 42® © 2016 MapR Technologies 42 To Learn More: •  http://learn.mapr.com/
  • 43. ® © 2016 MapR Technologies 43® © 2016 MapR Technologies 43 Q&A @mapr @caroljmcdonald https://www.mapr.com/blog/author/carol-mcdonald Engage with us! mapr-technologies