SlideShare a Scribd company logo
1 of 32
Download to read offline
Spark Streaming
March 2015
A Brief Introduction for Developers
@StratioBD
Who am I?
SPARK STREAMING OVERVIEW
Big Data Developer at Stratio. Working on ingestion and
streaming projects with Spark Streaming and Apache Flume.
Currently researching on Spark SQL optimizations and other stuff.
Santiago Mola
@mola_io
SPARK
• What is Apache Spark?
• RDD
• RDD API
1 2 SPARK STREAMING
• What is Spark Streaming?
• Who uses it?
• Receivers
• Discretized Streams (DStream)
• Window functions
• Use case: Twitter text classification
INDEX
WHAT IS
APACHE SPARK?1
1.1. What is Apache Spark?
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
Apache Spark™ is a fast and general engine for large-scale data processing.
“The Spark engine runs in a variety of environments, from cloud services to Hadoop or Mesos
clusters. It is used to perform ETL, interactive queries (SQL), advancedanalytics (e.g. machine learning)
and streaming over large datasets in a wide range of data stores (e.g. HDFS, Cassandra,HBase, S3).
Spark supports a variety of popular development languages including Java, Python and Scala.”
Databricks – What is Spark?
https://databricks.com/spark/about
1.1. What is Apache Spark?
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
1.1. What does it look like?
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
Let’s count words…
val textFile = spark.textFile("hdfs://...")
val counts = textFile
.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
1.2. Resilient Distributed Dataset (RDD)
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
A RDD is a collection of elements that is immutable, distributed and fault-tolerant.
Transformations can be applied to a RDD, resulting in new RDD.
Actions can be applied to a RDD to obtain a value.
RDD is lazy.
Resilient Distributed Dataset (RDD)
1.2. Resilient Distributed Dataset (RDD)
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
RDD[String]
(textFile)
“hello world”
“foo bar”
“foo foo bar”
“bye world”
RDD[String]
(flatMap)
“hello”
“world”
“foo”
“bar”
“foo”
“foo”
“bar”
“bye”
“world”
RDD[(String,Int)]
(map)
(“hello”, 1)
(“world”, 1)
(“foo”, 1)
(“bar”, 1)
(“foo”, 1)
(“foo”, 1)
(“bar”, 1)
(“bye”, 1)
(“world”, 1)
RDD[(String,Int)]
(reduceByKey)
(“hello”, 1)
(“foo”, 3)
(“bar”, 2)
(“bye”, 1)
(“world”, 2)
1.2. Resilient Distributed Dataset (RDD)
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
val textFile : RDD[String] = spark.textFile("hdfs://...")
val flatMapped : RDD[String] = textFile.flatMap(line => line.split(" "))
val mapped : RDD[(String,Int)] = flatMapped.map(Word => (word, 1))
val counts : RDD[(String,Int)] = mapped.reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
1.3. RDD API
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
map(func)
filter(func)
flatMap(func)
mapPartitions(func)
mapPartitionsWithIndex(func)
sample(withReplacement, fraction, seed)
union(otherDataset)
intersection(otherDataset)
distinct([numTasks]))
groupByKey([numTasks])
reduceByKey(func, [numTasks])
aggregateByKey(zeroValue)(seqOp, combOp, [numTasks])
sortByKey([ascending], [numTasks])
join(otherDataset, [numTasks])
cogroup(otherDataset, [numTasks])
cartesian(otherDataset)
pipe(command, [envVars])
coalesce(numPartitions)
repartition(numPartitions)
repartitionAndSortWithinPartitions(partitioner)
Transformations
reduce(func)
collect()
count()
first()
take(n)
takeSample(withReplacement, num, [seed])
takeOrdered(n, [ordering])
saveAsTextFile(path)
saveAsSequenceFile(path)
saveAsObjectFile(path)
countByKey()
foreach(func)
Actions
https://spark.apache.org/docs/latest/programming-guide.html
Full docs
1. Recap…
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
• Apache Spark is an awesome, distributed, fault-tolerant, easy-to-use processing engine.
• The most important concept is the RDD, which is an immutable and distributed collection of elements.
• RDD API provides a lot of high-level transformations that make distributed processing easier.
• On top of Spark core, we have MLLib (machine learning), Spark SQL (query engine), GraphX (graph
algorithms) and… Spark Streaming (stream processing)!
2. SPARK
STREAMING
2.1 What is Spark Streaming?
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
Source: http://spark.apache.org/docs/latest/streaming-programming-guide.html
2.1 What is Spark Streaming?
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
Source: http://spark.apache.org/docs/latest/streaming-programming-guide.html
2.1 Who uses it?
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
Source: http://es.slideshare.net/pacoid/databricks-meetup-los-angeles-apache-spark-user-group
2.2. Receivers
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
• File Stream
• Sockets
• Actors (Akka)
• Queue RDDs (Testing)
2.2. Receivers
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
Twitter
Flume
Kafka
Kinesis
2.2. Discretized streams (DStream)
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
Spark Streaming does not work with continuous live streams, but with a discretized representation.
The DStream (discretized stream) represents a sequence of RDDs, each of them corresponding to a
micro-batch.
2.3. What does it look like?
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
Let’s count words… again…
val textStream = ssc.socketTextStream(“localhost“, 9000)
val counts = textStream
.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
counts.print()
2.2. Discretized streams (DStream)
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
2.2. Window operations
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
2.3. What does it look like?
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
Let’s count words… and print every 10 seconds the counters of the last 60 seconds
val textStream = ssc.socketTextStream(“localhost“, 9000)
val counts = textStream
.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(10))
counts.print()
2.4. Twitter text classification
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
println("Initializing Streaming Spark Context...")
val conf = new SparkConf().setAppName(this.getClass.getSimpleName)
val ssc = new StreamingContext(conf, Seconds(5))
println("Initializing Twitter stream...")
val tweets = TwitterUtils.createStream(ssc, Utils.getAuth)
val statuses = tweets.map(_.getText)
println("Initalizaing the the KMeans model...")
val model =
new KMeansModel(ssc.sparkContext.objectFile[Vector](modelFile.toString).collect())
val filteredTweets = statuses
.filter(t => model.predict(Utils.featurize(t)) == clusterNumber)
filteredTweets.print()
Source: http://databricks.gitbooks.io/databricks-spark-reference-applications/content/twitter_classifier/predict.html
2.5. Recap…
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
Source: http://databricks.gitbooks.io/databricks-spark-reference-applications/content/twitter_classifier/predict.html
• Spark Streaming uses a discrete representation of live streams, where each batch is a RDD.
• Data can be received from a wide variety of sources.
• Streaming APIs resemble RDD APIs: learning it is trivial for Spark (batch) users.
• Streaming API has a wide variety of high-level transformations (most transformations available to RDD
+ window transformations).
• It can be combined with the RDD API… that means integration with Mllib (machine learning), GraphX
(graph algorithms), RDD persistence or any other Spark components.
Thanks!
www.stratio.com
https://github.com/Stratio
@StratioBD
SUPPORT SLIDES
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
RDD, Stages…
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
StreamingContext
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
Checkpointing
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
Transform
Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015

More Related Content

What's hot

Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Brian O'Neill
 
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Cedric CARBONE
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and CassandraNatalino Busa
 
Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015Databricks
 
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Summit
 
Introduction to TitanDB
Introduction to TitanDB Introduction to TitanDB
Introduction to TitanDB Knoldus Inc.
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark Summit
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaSpark Summit
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Petr Zapletal
 
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Spark Summit
 
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaSpark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaDatabricks
 
Vertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And DataVertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And DataSpark Summit
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Anton Kirillov
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisHelena Edelson
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 
Spark Streaming, Machine Learning and meetup.com streaming API.
Spark Streaming, Machine Learning and  meetup.com streaming API.Spark Streaming, Machine Learning and  meetup.com streaming API.
Spark Streaming, Machine Learning and meetup.com streaming API.Sergey Zelvenskiy
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1Joe Stein
 
Adding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug GrallAdding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug GrallSpark Summit
 

What's hot (20)

Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
 
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
 
Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015
 
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
 
Introduction to TitanDB
Introduction to TitanDB Introduction to TitanDB
Introduction to TitanDB
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017
 
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
 
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaSpark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
 
Vertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And DataVertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And Data
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Spark Streaming, Machine Learning and meetup.com streaming API.
Spark Streaming, Machine Learning and  meetup.com streaming API.Spark Streaming, Machine Learning and  meetup.com streaming API.
Spark Streaming, Machine Learning and meetup.com streaming API.
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
 
Adding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug GrallAdding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug Grall
 
Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
 

Viewers also liked

Introduction to Asynchronous scala
Introduction to Asynchronous scalaIntroduction to Asynchronous scala
Introduction to Asynchronous scalaStratio
 
Functional programming in scala
Functional programming in scalaFunctional programming in scala
Functional programming in scalaStratio
 
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014Stratio
 
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016Stratio
 
Lunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelosLunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelosStratio
 
Stratio platform overview v4.1
Stratio platform overview v4.1Stratio platform overview v4.1
Stratio platform overview v4.1Stratio
 
Distributed Logistic Model Trees
Distributed Logistic Model TreesDistributed Logistic Model Trees
Distributed Logistic Model TreesStratio
 
On-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
On-the-fly ETL con EFK: ElasticSearch, Flume, KibanaOn-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
On-the-fly ETL con EFK: ElasticSearch, Flume, KibanaStratio
 
Meetup: Spark + Kerberos
Meetup: Spark + KerberosMeetup: Spark + Kerberos
Meetup: Spark + KerberosStratio
 

Viewers also liked (9)

Introduction to Asynchronous scala
Introduction to Asynchronous scalaIntroduction to Asynchronous scala
Introduction to Asynchronous scala
 
Functional programming in scala
Functional programming in scalaFunctional programming in scala
Functional programming in scala
 
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
 
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
 
Lunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelosLunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelos
 
Stratio platform overview v4.1
Stratio platform overview v4.1Stratio platform overview v4.1
Stratio platform overview v4.1
 
Distributed Logistic Model Trees
Distributed Logistic Model TreesDistributed Logistic Model Trees
Distributed Logistic Model Trees
 
On-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
On-the-fly ETL con EFK: ElasticSearch, Flume, KibanaOn-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
On-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
 
Meetup: Spark + Kerberos
Meetup: Spark + KerberosMeetup: Spark + Kerberos
Meetup: Spark + Kerberos
 

Similar to Spark Streaming @ Berlin Apache Spark Meetup, March 2015

Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Michael Rys
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Michael Rys
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupNed Shawa
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's comingDatabricks
 
An Introduction to Spark
An Introduction to SparkAn Introduction to Spark
An Introduction to Sparkjlacefie
 
An Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark MeetupAn Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark Meetupjlacefie
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsGuido Schmutz
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Michael Rys
 
Azure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkAzure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkIke Ellis
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureDataStax Academy
 
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...BigDataEverywhere
 
20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 Andrey Vykhodtsev
 
Spark streaming state of the union
Spark streaming state of the unionSpark streaming state of the union
Spark streaming state of the unionDatabricks
 
Realizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamRealizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamDataWorks Summit
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Helena Edelson
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Simplilearn
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQLYousun Jeong
 

Similar to Spark Streaming @ Berlin Apache Spark Meetup, March 2015 (20)

Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming
 
An Introduction to Spark
An Introduction to SparkAn Introduction to Spark
An Introduction to Spark
 
An Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark MeetupAn Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark Meetup
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
Spark core
Spark coreSpark core
Spark core
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
 
Azure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkAzure Databricks is Easier Than You Think
Azure Databricks is Easier Than You Think
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and Furure
 
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
 
20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 20150716 introduction to apache spark v3
20150716 introduction to apache spark v3
 
Spark streaming state of the union
Spark streaming state of the unionSpark streaming state of the union
Spark streaming state of the union
 
Realizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamRealizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache Beam
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 

More from Stratio

Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...Stratio
 
Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18Stratio
 
Kafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka MeetupKafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka MeetupStratio
 
Wild Data - The Data Science Meetup
Wild Data - The Data Science MeetupWild Data - The Data Science Meetup
Wild Data - The Data Science MeetupStratio
 
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka MeetupUsing Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka MeetupStratio
 
Ensemble methods in Machine Learning
Ensemble methods in Machine Learning Ensemble methods in Machine Learning
Ensemble methods in Machine Learning Stratio
 
Stratio Sparta 2.0
Stratio Sparta 2.0Stratio Sparta 2.0
Stratio Sparta 2.0Stratio
 
Big Data Security: Facing the challenge
Big Data Security: Facing the challengeBig Data Security: Facing the challenge
Big Data Security: Facing the challengeStratio
 
Operationalizing Big Data
Operationalizing Big DataOperationalizing Big Data
Operationalizing Big DataStratio
 
Artificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric PlatformArtificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric PlatformStratio
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksStratio
 
“A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack” “A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack” Stratio
 
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...Stratio
 
Advanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in CassandraAdvanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in CassandraStratio
 

More from Stratio (14)

Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
 
Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18
 
Kafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka MeetupKafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka Meetup
 
Wild Data - The Data Science Meetup
Wild Data - The Data Science MeetupWild Data - The Data Science Meetup
Wild Data - The Data Science Meetup
 
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka MeetupUsing Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
 
Ensemble methods in Machine Learning
Ensemble methods in Machine Learning Ensemble methods in Machine Learning
Ensemble methods in Machine Learning
 
Stratio Sparta 2.0
Stratio Sparta 2.0Stratio Sparta 2.0
Stratio Sparta 2.0
 
Big Data Security: Facing the challenge
Big Data Security: Facing the challengeBig Data Security: Facing the challenge
Big Data Security: Facing the challenge
 
Operationalizing Big Data
Operationalizing Big DataOperationalizing Big Data
Operationalizing Big Data
 
Artificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric PlatformArtificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric Platform
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural Networks
 
“A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack” “A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack”
 
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
 
Advanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in CassandraAdvanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in Cassandra
 

Recently uploaded

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 

Recently uploaded (20)

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 

Spark Streaming @ Berlin Apache Spark Meetup, March 2015

  • 1. Spark Streaming March 2015 A Brief Introduction for Developers @StratioBD
  • 2. Who am I? SPARK STREAMING OVERVIEW Big Data Developer at Stratio. Working on ingestion and streaming projects with Spark Streaming and Apache Flume. Currently researching on Spark SQL optimizations and other stuff. Santiago Mola @mola_io
  • 3. SPARK • What is Apache Spark? • RDD • RDD API 1 2 SPARK STREAMING • What is Spark Streaming? • Who uses it? • Receivers • Discretized Streams (DStream) • Window functions • Use case: Twitter text classification INDEX
  • 5. 1.1. What is Apache Spark? Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 Apache Spark™ is a fast and general engine for large-scale data processing. “The Spark engine runs in a variety of environments, from cloud services to Hadoop or Mesos clusters. It is used to perform ETL, interactive queries (SQL), advancedanalytics (e.g. machine learning) and streaming over large datasets in a wide range of data stores (e.g. HDFS, Cassandra,HBase, S3). Spark supports a variety of popular development languages including Java, Python and Scala.” Databricks – What is Spark? https://databricks.com/spark/about
  • 6. 1.1. What is Apache Spark? Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
  • 7. 1.1. What does it look like? Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 Let’s count words… val textFile = spark.textFile("hdfs://...") val counts = textFile .flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) counts.saveAsTextFile("hdfs://...")
  • 8. 1.2. Resilient Distributed Dataset (RDD) Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 A RDD is a collection of elements that is immutable, distributed and fault-tolerant. Transformations can be applied to a RDD, resulting in new RDD. Actions can be applied to a RDD to obtain a value. RDD is lazy. Resilient Distributed Dataset (RDD)
  • 9. 1.2. Resilient Distributed Dataset (RDD) Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 RDD[String] (textFile) “hello world” “foo bar” “foo foo bar” “bye world” RDD[String] (flatMap) “hello” “world” “foo” “bar” “foo” “foo” “bar” “bye” “world” RDD[(String,Int)] (map) (“hello”, 1) (“world”, 1) (“foo”, 1) (“bar”, 1) (“foo”, 1) (“foo”, 1) (“bar”, 1) (“bye”, 1) (“world”, 1) RDD[(String,Int)] (reduceByKey) (“hello”, 1) (“foo”, 3) (“bar”, 2) (“bye”, 1) (“world”, 2)
  • 10. 1.2. Resilient Distributed Dataset (RDD) Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 val textFile : RDD[String] = spark.textFile("hdfs://...") val flatMapped : RDD[String] = textFile.flatMap(line => line.split(" ")) val mapped : RDD[(String,Int)] = flatMapped.map(Word => (word, 1)) val counts : RDD[(String,Int)] = mapped.reduceByKey(_ + _) counts.saveAsTextFile("hdfs://...")
  • 11. 1.3. RDD API Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 map(func) filter(func) flatMap(func) mapPartitions(func) mapPartitionsWithIndex(func) sample(withReplacement, fraction, seed) union(otherDataset) intersection(otherDataset) distinct([numTasks])) groupByKey([numTasks]) reduceByKey(func, [numTasks]) aggregateByKey(zeroValue)(seqOp, combOp, [numTasks]) sortByKey([ascending], [numTasks]) join(otherDataset, [numTasks]) cogroup(otherDataset, [numTasks]) cartesian(otherDataset) pipe(command, [envVars]) coalesce(numPartitions) repartition(numPartitions) repartitionAndSortWithinPartitions(partitioner) Transformations reduce(func) collect() count() first() take(n) takeSample(withReplacement, num, [seed]) takeOrdered(n, [ordering]) saveAsTextFile(path) saveAsSequenceFile(path) saveAsObjectFile(path) countByKey() foreach(func) Actions https://spark.apache.org/docs/latest/programming-guide.html Full docs
  • 12. 1. Recap… Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 • Apache Spark is an awesome, distributed, fault-tolerant, easy-to-use processing engine. • The most important concept is the RDD, which is an immutable and distributed collection of elements. • RDD API provides a lot of high-level transformations that make distributed processing easier. • On top of Spark core, we have MLLib (machine learning), Spark SQL (query engine), GraphX (graph algorithms) and… Spark Streaming (stream processing)!
  • 14. 2.1 What is Spark Streaming? Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 Source: http://spark.apache.org/docs/latest/streaming-programming-guide.html
  • 15. 2.1 What is Spark Streaming? Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 Source: http://spark.apache.org/docs/latest/streaming-programming-guide.html
  • 16. 2.1 Who uses it? Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 Source: http://es.slideshare.net/pacoid/databricks-meetup-los-angeles-apache-spark-user-group
  • 17. 2.2. Receivers Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 • File Stream • Sockets • Actors (Akka) • Queue RDDs (Testing)
  • 18. 2.2. Receivers Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 Twitter Flume Kafka Kinesis
  • 19. 2.2. Discretized streams (DStream) Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 Spark Streaming does not work with continuous live streams, but with a discretized representation. The DStream (discretized stream) represents a sequence of RDDs, each of them corresponding to a micro-batch.
  • 20. 2.3. What does it look like? Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 Let’s count words… again… val textStream = ssc.socketTextStream(“localhost“, 9000) val counts = textStream .flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) counts.print()
  • 21. 2.2. Discretized streams (DStream) Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
  • 22. 2.2. Window operations Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
  • 23. 2.3. What does it look like? Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 Let’s count words… and print every 10 seconds the counters of the last 60 seconds val textStream = ssc.socketTextStream(“localhost“, 9000) val counts = textStream .flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(10)) counts.print()
  • 24. 2.4. Twitter text classification Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 println("Initializing Streaming Spark Context...") val conf = new SparkConf().setAppName(this.getClass.getSimpleName) val ssc = new StreamingContext(conf, Seconds(5)) println("Initializing Twitter stream...") val tweets = TwitterUtils.createStream(ssc, Utils.getAuth) val statuses = tweets.map(_.getText) println("Initalizaing the the KMeans model...") val model = new KMeansModel(ssc.sparkContext.objectFile[Vector](modelFile.toString).collect()) val filteredTweets = statuses .filter(t => model.predict(Utils.featurize(t)) == clusterNumber) filteredTweets.print() Source: http://databricks.gitbooks.io/databricks-spark-reference-applications/content/twitter_classifier/predict.html
  • 25. 2.5. Recap… Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015 Source: http://databricks.gitbooks.io/databricks-spark-reference-applications/content/twitter_classifier/predict.html • Spark Streaming uses a discrete representation of live streams, where each batch is a RDD. • Data can be received from a wide variety of sources. • Streaming APIs resemble RDD APIs: learning it is trivial for Spark (batch) users. • Streaming API has a wide variety of high-level transformations (most transformations available to RDD + window transformations). • It can be combined with the RDD API… that means integration with Mllib (machine learning), GraphX (graph algorithms), RDD persistence or any other Spark components.
  • 27.
  • 28. SUPPORT SLIDES Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
  • 29. RDD, Stages… Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
  • 30. StreamingContext Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
  • 31. Checkpointing Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015
  • 32. Transform Spark Streaming: A Brief Introduction for Developers @ Berlin, March 2015