Introduction to Spark with Scala
Himanshu Gupta
Software Consultant
Knoldus Software LLP
Who am I?
Himanshu Gupta (@himanshug735)
Software Consultant at Knoldus Software LLP
Spark & Scala enthusiast
Agenda
● What is Spark?
● Why do we need Spark?
● Brief introduction to RDD
● Brief introduction to Spark Streaming
● How to install Spark?
● Demo
What is Apache Spark?
A fast and general engine for large-scale data processing, with libraries for SQL, streaming, and advanced analytics.
Spark History (timeline, 2009 to 2015)
● 2009: Project begins at UCB AMP Lab
● 2010: Open sourced
● 2013: Apache Incubator; Spark Summit 2013
● 2014: Apache top-level project; Spark Summit 2014
● 2015: Data Frames
● Also on the timeline: Cloudera support
Spark Stack
Img src - http://spark.apache.org/
Fastest Growing Open Source Project
Img src - https://databricks.com/blog/2015/03/31/spark-turns-five-years-old.html
Why do we need Spark?
Code Size
Img src - http://spark-summit.org/wp-content/uploads/2013/10/Zaharia-spark-summit-2013-matei.pdf
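Word Count Example
Hadoop MapReduce (Java):
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Enclosing class added so the nested static classes compile; job setup is not shown
public class WordCount {
  public static class WordCountMapClass extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emit (word, 1) for every token in the input line
    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, one);
      }
    }
  }

  public static class WordCountReduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    // Sum the counts emitted for each word
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }
}

Spark (Scala):
// spark is a SparkContext
val file = spark.textFile("hdfs://...")
val counts = file.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")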
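To see what the Spark pipeline above computes without a cluster, here is a minimal sketch of the same flatMap / map / reduce-by-key steps on plain Scala collections. It does not use Spark. The object name LocalWordCount and the sample sentences are made up for illustration, and groupBy followed by a per-key reduce stands in for Spark's reduceByKey.

```scala
object LocalWordCount {
  // Same shape as the Spark version, but on an in-memory Seq (no Spark involved).
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" ").toSeq)   // one element per word
      .map(word => (word, 1))        // pair every word with a count of 1
      .groupBy(_._1)                 // gather the pairs that share a word
      .map { case (word, pairs) =>   // sum the counts per word (stand-in for reduceByKey(_ + _))
        (word, pairs.map(_._2).reduce(_ + _))
      }
}
```

For example, LocalWordCount.wordCount(Seq("to be or", "not to be")) gives a map with to -> 2, be -> 2, or -> 1, not -> 1.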
Daytona GraySort Record (data to sort: 100 TB)
Hadoop (2013): 2100 nodes, 72 minutes
Spark (2014): 206 nodes, 23 minutes
That is about 3x faster on about 10x fewer nodes.
Img src - http://www.slideshare.net/databricks/new-directions-for-apache-spark-in-2015
Runs Everywhere
Img src - http://spark.apache.org/
Who is using Apache Spark?
Img src - http://www.slideshare.net/datamantra/introduction-to-apache-spark-45062010
Brief Introduction to RDD
● RDD stands for Resilient Distributed Dataset
● A fault-tolerant, distributed collection of objects.
● In Spark, all work is expressed in one of the following ways:
1) Creating new RDD(s)
2) Transforming existing RDD(s)
3) Calling operations (actions) on RDD(s)
Example (RDD)
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits (reduceByKey); needed before Spark 1.3

val master = "local"
// Spark configuration; an application name must also be set
val conf = new SparkConf().setMaster(master).setAppName("Spark Demo")
// Spark context
val sc = new SparkContext(conf)
// Extract lines from the text file
val lines = sc.textFile("demo.txt")
// Map lines to (word, 1) pairs (transformation)
val words = lines.flatMap(_.split(" ")).map((_, 1))
// Word count RDD: sum the counts per word (transformation)
val wordCountRDD = words.reduceByKey(_ + _)
// collect is an action: it starts the computation and returns an Array[(word, count)]
val wordCount = wordCountRDD.collect

Transformations (flatMap, map, reduceByKey) only describe the work; the action (collect) starts the computation.
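The split between lazy transformations and an action that starts the computation can be imitated with a plain Scala Iterator. This is a rough analogy only, not Spark code: the object name LazyPipelineDemo and the call counter are made up for illustration, and toList plays the role that collect plays on an RDD.

```scala
object LazyPipelineDemo {
  // Returns (map calls before forcing, map calls after forcing, result).
  def run(lines: Seq[String]): (Int, Int, List[(String, Int)]) = {
    var calls = 0
    // "Transformations": building the pipeline runs nothing yet.
    val pairs = lines.iterator
      .flatMap(_.split(" ").iterator)
      .map { word => calls += 1; (word, 1) }
    val before = calls            // still 0, no element has been processed
    // "Action": forcing the iterator runs the whole pipeline.
    val result = pairs.toList
    (before, calls, result)
  }
}
```

Calling LazyPipelineDemo.run(Seq("a b", "c")) reports zero calls before toList and three calls afterwards, one per word.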
Brief Introduction to Spark Streaming
Img src - http://spark.apache.org/
How Spark Streaming Works?
Img src - http://spark.apache.org/
Why do we need Spark Streaming?
High-level API:
TwitterUtils.createStream(...)
  .filter(_.getText.contains("Spark"))
  .countByWindow(Seconds(10), Seconds(5))
// Counts tweets over a sliding window (10-second window, sliding every 5 seconds)
Fault tolerant:
Integration:
Img src - http://spark.apache.org/
Integrated with Spark SQL, MLlib, GraphX...
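The sliding-window count in the high-level API example can be worked through by hand. The sketch below is plain Scala, not Spark: the object name SlidingWindowCount, its parameters, and the sample timestamps are made up for illustration. It assumes windows end at multiples of the slide interval and that each window covers the range (end - window, end], which is a simplification of how a real stream is batched.

```scala
object SlidingWindowCount {
  // For each window end (a multiple of slideSec up to untilSec), count the events
  // whose timestamp falls in (end - windowSec, end].
  def countByWindow(eventTimesSec: Seq[Int], windowSec: Int, slideSec: Int, untilSec: Int): Seq[(Int, Int)] =
    (slideSec to untilSec by slideSec).map { end =>
      val count = eventTimesSec.count(t => t > end - windowSec && t <= end)
      (end, count)
    }
}
```

With events at seconds 1, 3, 6, 7 and 12, a 10-second window and a 5-second slide, SlidingWindowCount.countByWindow(Seq(1, 3, 6, 7, 12), 10, 5, 15) yields (5, 2), (10, 4), (15, 3). The events at 6 and 7 are counted in two windows because consecutive windows overlap by 5 seconds.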
Example (Spark Streaming)
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._  // pair-DStream implicits (reduceByKey); needed before Spark 1.3

// Use at least two local threads: one runs the socket receiver, the rest process the data
val master = "local[2]"
// Specify the Spark configuration
val conf = new SparkConf().setMaster(master).setAppName("Spark Demo")
// Set up the streaming context with a 10-second batch interval
val ssc = new StreamingContext(conf, Seconds(10))
// ReceiverInputDStream of lines read from the socket
val lines = ssc.socketTextStream("localhost", 9999)
// words/pairs DStream (a DStream is a sequence of RDDs, one per batch interval)
val words = lines.flatMap(_.split(" ")).map((_, 1))
// wordCount DStream: groups each batch by word and sums the counts
val wordCounts = words.reduceByKey(_ + _)
// An output operation is required, otherwise there is nothing to execute
wordCounts.print()
// Start streaming and computation, then block until the stream is stopped
ssc.start()
ssc.awaitTermination()

Diagram: the lines DStream is split into batches (at time 0 - 1, 1 - 2, 2 - 3, 3 - 4); map produces the words/pairs DStream and the grouping step produces the wordCount DStream.
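The idea that a DStream is a sequence of RDDs, one per batch interval, can be sketched on plain Scala collections. This is not Spark code: the object name MicroBatchWordCount, the (timestamp, line) event format, and the sample data are made up for illustration. Each batch is counted independently, which mirrors how reduceByKey on a DStream works within a single batch.

```scala
object MicroBatchWordCount {
  // Splits (timestampSec, line) events into fixed batch intervals and
  // word-counts each batch on its own.
  def wordCountsPerBatch(events: Seq[(Int, String)], batchSec: Int): Map[Int, Map[String, Int]] =
    events
      .groupBy { case (t, _) => t / batchSec }   // batch index = interval the event falls in
      .map { case (batch, evs) =>
        val counts = evs
          .flatMap { case (_, line) => line.split(" ").toSeq }
          .groupBy(identity)
          .map { case (word, ws) => (word, ws.size) }
        (batch, counts)
      }
}
```

With a 10-second batch interval, the events (0, "a b"), (3, "a"), (10, "b b"), (12, "c") fall into batch 0 (a -> 2, b -> 1) and batch 1 (b -> 2, c -> 1). Counts are not carried over from one batch to the next.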
How to Install Spark?
● Download Spark from http://spark.apache.org/downloads.html
● Extract it to a suitable directory.
● Go to the directory in a terminal and run the following command (this build step applies to the source release; a pre-built package can skip it):
mvn -DskipTests clean package
● Spark is now ready to run in interactive mode:
./bin/spark-shell
sbt Setup
name := "Spark Demo"
version := "1.0"
scalaVersion := "2.10.5"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.2.1",
  "org.apache.spark" %% "spark-streaming" % "1.2.1",
  "org.apache.spark" %% "spark-sql" % "1.2.1",
  "org.apache.spark" %% "spark-mllib" % "1.2.1"
)
Demo
Download Code
https://github.com/knoldus/spark-scala
References
http://spark.apache.org/
http://spark-summit.org/2014
http://spark.apache.org/docs/latest/quick-start.html
http://stackoverflow.com/questions/tagged/apache-spark
https://www.youtube.com/results?search_query=apache+spark
http://apache-spark-user-list.1001560.n3.nabble.com/
http://www.slideshare.net/paulszulc/apache-spark-101-in-50-min
Presenter:
himanshu@knoldus.com
@himanshug735
Organizer:
@Knolspeak
http://www.knoldus.com
http://blog.knoldus.com
Thanks

DfMAy 2024 - key insights and contributions
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
The Role of Electrical and Electronics Engineers in IOT Technology.pdf
The Role of Electrical and Electronics Engineers in IOT Technology.pdfThe Role of Electrical and Electronics Engineers in IOT Technology.pdf
The Role of Electrical and Electronics Engineers in IOT Technology.pdf
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
 

Introduction to Spark with Scala

  • 1. Introduction to Spark with Scala. Himanshu Gupta, Software Consultant, Knoldus Software LLP
  • 2. Who am I? Himanshu Gupta (@himanshug735), Software Consultant at Knoldus Software LLP, Spark & Scala enthusiast
  • 3. Agenda ● What is Spark? ● Why do we need Spark? ● Brief introduction to RDD ● Brief introduction to Spark Streaming ● How to install Spark? ● Demo
  • 4. What is Apache Spark? A fast and general engine for large-scale data processing, with libraries for SQL, streaming, and advanced analytics
  • 5. Spark History. Timeline from 2009 to 2015. Milestones shown: project begins at UCB AMP Lab, open sourced, Apache Incubator, Cloudera support, Apache top-level project, Spark Summit 2013, Spark Summit 2014, Data Frames
  • 6. Spark Stack. Img src - http://spark.apache.org/
  • 7. Fastest Growing Open Source Project. Img src - https://databricks.com/blog/2015/03/31/spark-turns-five-years-old.html
  • 8. Agenda (as on slide 3)
  • 9. Code Size. Img src - http://spark-summit.org/wp-content/uploads/2013/10/Zaharia-spark-summit-2013-matei.pdf
  • 10. Word Count Ex.
    Hadoop MapReduce (Java):
      public static class WordCountMapClass extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emit (word, 1) for every token in the input line.
        public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
          String line = value.toString();
          StringTokenizer itr = new StringTokenizer(line);
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);
          }
        }
      }

      public static class WordCountReduce extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        // Sum the counts collected for each word.
        public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
          int sum = 0;
          while (values.hasNext()) {
            sum += values.next().get();
          }
          output.collect(key, new IntWritable(sum));
        }
      }
    Spark (Scala):
      val file = spark.textFile("hdfs://...")
      val counts = file.flatMap(line => line.split(" "))
                       .map(word => (word, 1))
                       .reduceByKey(_ + _)
      counts.saveAsTextFile("hdfs://...")
  • 11. Daytona GraySort Record: Data to sort 100TB. Hadoop (2013): 2100 nodes, 72 minutes. Spark (2014): 206 nodes, 23 minutes. Img src - http://www.slideshare.net/databricks/new-directions-for-apache-spark-in-2015
  • 12. Runs Everywhere. Img src - http://spark.apache.org/
  • 13. Who is using Apache Spark? Img src - http://www.slideshare.net/datamantra/introduction-to-apache-spark-45062010
  • 14. Agenda (as on slide 3)
  • 15. Brief Introduction to RDD ● RDD stands for Resilient Distributed Dataset. ● A fault-tolerant, distributed collection of objects. ● In Spark, all work is expressed in one of the following ways: 1) creating new RDD(s), 2) transforming existing RDD(s), 3) calling operations on RDD(s)
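
The three kinds of work listed on slide 15 can be sketched in a few lines. This is only an illustration, not code from the deck: the object name `RddBasics`, the helper `sumOfEvenSquares`, the app name, and the `local[*]` master are assumptions made for the sketch, and it presumes the `spark-core` dependency is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; none of these names come from the slides.
object RddBasics {
  def sumOfEvenSquares(sc: SparkContext): Int = {
    // 1) Creating a new RDD from a local collection.
    val numbers = sc.parallelize(1 to 10)
    // 2) Transforming existing RDDs: map and filter are lazy and only describe the work.
    val evenSquares = numbers.map(n => n * n).filter(_ % 2 == 0)
    // 3) Calling an operation (action): reduce triggers the computation and returns a value.
    evenSquares.reduce(_ + _)
  }

  def main(args: Array[String]): Unit = {
    // An application name is set here because SparkContext expects one.
    val conf = new SparkConf().setMaster("local[*]").setAppName("rdd-basics")
    val sc = new SparkContext(conf)
    try println(sumOfEvenSquares(sc)) // 4 + 16 + 36 + 64 + 100 = 220
    finally sc.stop()
  }
}
```

Keeping the RDD logic in a function that takes the `SparkContext` as a parameter makes it easy to try the same code from `spark-shell`, where a context named `sc` is already provided.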
  • 16. Example (RDD). This is the Spark configuration:
      val master = "local"
      val conf = new SparkConf().setMaster(master)
  • 17. Example (RDD), contd. This is the Spark context:
      val master = "local"
      val conf = new SparkConf().setMaster(master)
      val sc = new SparkContext(conf)
  • 18. Example (RDD), contd. This is the Spark context:
      val master = "local"
      val conf = new SparkConf().setMaster(master)
      val sc = new SparkContext(conf)
  • 19. Example (RDD), contd. Extract lines from the text file:
      val master = "local"
      val conf = new SparkConf().setMaster(master)
      val sc = new SparkContext(conf)
      val lines = sc.textFile("demo.txt")
  • 20. Example (RDD), contd. Map lines to words (diagram: map):
      val master = "local"
      val conf = new SparkConf().setMaster(master)
      val sc = new SparkContext(conf)
      val lines = sc.textFile("demo.txt")
      val words = lines.flatMap(_.split(" ")).map((_,1))
  • 21. Example (RDD), contd. Word count RDD (diagram: map, groupBy):
      val master = "local"
      val conf = new SparkConf().setMaster(master)
      val sc = new SparkContext(conf)
      val lines = sc.textFile("demo.txt")
      val words = lines.flatMap(_.split(" ")).map((_,1))
      val wordCountRDD = words.reduceByKey(_ + _)
  • 22. Example (RDD), contd. collect starts the computation and returns the (word, count) pairs to the driver (diagram: map, groupBy, collect; result labelled Map[word, count]):
      val master = "local"
      val conf = new SparkConf().setMaster(master)
      val sc = new SparkContext(conf)
      val lines = sc.textFile("demo.txt")
      val words = lines.flatMap(_.split(" ")).map((_,1))
      val wordCountRDD = words.reduceByKey(_ + _)
      val wordCount = wordCountRDD.collect
  • 23. Example (RDD), contd. Diagram: map and groupBy are transformations, collect is an action:
      val master = "local"
      val conf = new SparkConf().setMaster(master)
      val sc = new SparkContext(conf)
      val lines = sc.textFile("demo.txt")
      val words = lines.flatMap(_.split(" ")).map((_,1))
      val wordCountRDD = words.reduceByKey(_ + _)
      val wordCount = wordCountRDD.collect
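
For reference, the RDD example from slides 16 to 23 could be assembled into a standalone program along the following lines. This is a sketch under stated assumptions rather than the presenter's demo code: the object name `WordCountApp`, the helper `wordCounts`, and the app name are hypothetical, an application name is set because `SparkContext` expects one, and the input is an in-memory collection instead of `demo.txt` so the sketch does not depend on a file.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; needed on Spark 1.2.x, not on later versions
import org.apache.spark.rdd.RDD

// Hypothetical example object; the names are not from the slides.
object WordCountApp {
  // Transformations only: nothing runs until an action is called on the result.
  def wordCounts(lines: RDD[String]): RDD[(String, Int)] =
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("word-count-demo")
    val sc = new SparkContext(conf)
    try {
      // Assumption: in-memory input stands in for sc.textFile("demo.txt").
      val lines = sc.parallelize(Seq("to be or not to be", "to do"))
      // collect is the action: it starts the job and returns an Array[(String, Int)].
      val counts = wordCounts(lines).collect()
      counts.sortBy(_._1).foreach { case (word, n) => println(s"$word $n") }
    } finally sc.stop()
  }
}
```

Note that `collect` brings every pair back to the driver, which is fine for a demo but may not be for a large dataset; an action such as `saveAsTextFile`, as used on slide 10, writes the result out from the executors instead.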
  • 24. Agenda (as on slide 3)
  • 25. Brief Introduction to Spark Streaming. Img src - http://spark.apache.org/
  • 26. How does Spark Streaming work? Img src - http://spark.apache.org/
  • 27. Why do we need Spark Streaming? ● High-level API, e.g. counting tweets on a sliding window:
      TwitterUtils.createStream(...)
        .filter(_.getText.contains("Spark"))
        .countByWindow(Seconds(10), Seconds(5))
    ● Fault tolerant ● Integration: integrated with Spark SQL, MLlib, GraphX... Img src - http://spark.apache.org/
  • 28. Example (Spark Streaming). Specify the Spark configuration:
      val master = "local"
      val conf = new SparkConf().setMaster(master)
  • 29. Example (Spark Streaming), contd. Set up the streaming context:
      val master = "local"
      val conf = new SparkConf().setMaster(master)
      val ssc = new StreamingContext(conf, Seconds(10))
  • 30. Example (Spark Streaming), contd. This is the ReceiverInputDStream (diagram: lines DStream with batches at time 0-1, 1-2, 2-3, 3-4):
      val master = "local"
      val conf = new SparkConf().setMaster(master)
      val ssc = new StreamingContext(conf, Seconds(10))
      val lines = ssc.socketTextStream("localhost", 9999)
  • 31. Example (Spark Streaming), contd. map creates a new DStream (a sequence of RDDs) (diagram: lines DStream mapped to words/pairs DStream, batch by batch):
      val master = "local"
      val conf = new SparkConf().setMaster(master)
      val ssc = new StreamingContext(conf, Seconds(10))
      val lines = ssc.socketTextStream("localhost", 9999)
      val words = lines.flatMap(_.split(" ")).map((_, 1))
  • 32. Example (Spark Streaming), contd. Groups the DStream by words (diagram: lines DStream, map, words/pairs DStream, groupBy, wordCount DStream):
      val master = "local"
      val conf = new SparkConf().setMaster(master)
      val ssc = new StreamingContext(conf, Seconds(10))
      val lines = ssc.socketTextStream("localhost", 9999)
      val words = lines.flatMap(_.split(" ")).map((_, 1))
      val wordCounts = words.reduceByKey(_ + _)
  • 33. Example (Spark Streaming), contd. Start streaming and computation (diagram: lines DStream, map, words/pairs DStream, groupBy, wordCount DStream):
      val master = "local"
      val conf = new SparkConf().setMaster(master)
      val ssc = new StreamingContext(conf, Seconds(10))
      val lines = ssc.socketTextStream("localhost", 9999)
      val words = lines.flatMap(_.split(" ")).map((_, 1))
      val wordCounts = words.reduceByKey(_ + _)
      ssc.start()
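
The streaming example from slides 28 to 33 might be completed into a standalone program as sketched below. Several details are assumptions added for the sketch and are not on the slides: the object name `StreamingWordCount` and the helper `wordCounts` are hypothetical; the master is `local[2]` because a receiver-based stream needs one thread for the receiver and at least one for processing; `print()` is added because Spark Streaming expects at least one output operation before `start()`; and `awaitTermination()` keeps the driver alive.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits; needed on Spark 1.2.x
import org.apache.spark.streaming.dstream.DStream

// Hypothetical example object; the names are not from the slides.
object StreamingWordCount {
  // Applied independently to the RDD of each batch interval.
  def wordCounts(lines: DStream[String]): DStream[(String, Int)] =
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("streaming-word-count")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second batches, as on the slides
    val lines = ssc.socketTextStream("localhost", 9999)
    wordCounts(lines).print() // output operation: prints the first few counts of every batch
    ssc.start()               // start receiving and processing
    ssc.awaitTermination()    // block until the job is stopped or fails
  }
}
```

To try it locally, something must be listening on port 9999 before the program starts; a common choice is `nc -lk 9999`, after which each line typed into that terminal is counted in the next batch.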
  • 34. Agenda (as on slide 3)
  • 35. How to Install Spark? ● Download Spark from http://spark.apache.org/downloads.html ● Extract it to a suitable directory. ● Go to the directory in a terminal and run the following command: mvn -DskipTests clean package ● Spark is now ready to run in interactive mode: ./bin/spark-shell
  • 36. sbt Setup
      name := "Spark Demo"
      version := "1.0"
      scalaVersion := "2.10.5"
      libraryDependencies ++= Seq(
        "org.apache.spark" %% "spark-core" % "1.2.1",
        "org.apache.spark" %% "spark-streaming" % "1.2.1",
        "org.apache.spark" %% "spark-sql" % "1.2.1",
        "org.apache.spark" %% "spark-mllib" % "1.2.1"
      )
  • 37. Agenda (as on slide 3)
  • 38. Demo

Editor's Notes

  1. Why JavaScript, and why bother with it? Because, as you know, it is hard to do web development without JavaScript. It is essentially the only language supported by web browsers, so at some point you need JavaScript code. It is a scripting language, not designed to scale to large, rich web applications.
  2. Easy to learn: JavaScript is easy to pick up because of the very flexible nature of the language. Because JavaScript is garbage-collected, things like memory management are not a big concern. Easy to edit: it is easy to get started with because you don't need much to do so. It is a scripting language, so the code you write does not need to be compiled and does not require a compiler or any expensive software. Prototyping language: it is a prototype-based language. In a prototype-based language, objects inherit directly from other objects rather than being instances of classes. That means objects can be defined and developed on the fly to suit a particular use, rather than having to build out specific classes to handle a specific need. Easy to debug: there are many tools, like Firebug, to debug JavaScript and trace errors.
  3. Why compile to JavaScript? JavaScript has gained many new APIs, but the language itself is mostly the same. Some developers really like JavaScript but feel that it should include other features. Many platforms compile a high-level language to JavaScript. Doing so removes many of JavaScript's hidden dangers, such as missing critical semicolons, and lets you write better JavaScript code in another language. Major reason: to work consistently with the same language on both the server and the client, so that one doesn't need to change gears all the time.
  4. TypeScript: a compiler that compiles to JavaScript and adds some new features such as type annotations, classes and interfaces. CoffeeScript, Dart: CoffeeScript is very popular and targets JavaScript. One of the main reasons for its popularity is that it gets rid of JavaScript's C-like syntax, because some people apparently dislike curly braces and semicolons very much. CoffeeScript is inspired by Ruby, Python and Haskell. Google created Dart as a replacement for JavaScript; they are hoping that one day it will replace JavaScript. Others: Parenscript, Emscripten, JSIL, GWT, js-scala.
  5. Scala: short for "Scalable Language", a careful integration of object-oriented and functional language concepts. Scala runs on the JVM. Scala.js supports all of the Scala language, so it can compile the entire Scala standard library.
  13. In Scala, one can define implicit conversions as methods with the implicit keyword: case class ID(val id: String); implicit def stringToID(s: String): ID = ID(s); def lookup(id: ID): Book = { ... }. With these in scope, val book = lookup("foo") and val id: ID = "bar" are valid, because the type-checker rewrites the first as val book = lookup(stringToID("foo")). User-defined dynamic types: since version 2.10, Scala has a special feature, scala.Dynamic, which is used to define custom dynamic types. It allows calling methods that don't exist on an object. Dynamic doesn't have any members; it is a marker interface. After import scala.language.dynamics, you can write empl.lname = "Doe" in place of something like empl.set("lname", "Doe"): when you write empl.lname = "Doe", the compiler converts it to the call empl.updateDynamic("lname")("Doe").
  14. Scala.js compiles Scala code to JavaScript, allowing you to write your web application entirely in Scala. It compiles full-fledged Scala code down to JavaScript, which can be integrated into your web application. It provides very good interoperability with JavaScript code, both from Scala.js to JavaScript and vice versa; e.g., you can use jQuery and HTML5 from your Scala.js code. Since Scala as a language, and also its library, rely on the Java standard library, it is impossible to support all of Scala without supporting some of Java. Hence Scala.js includes part of the Java standard library, written in Scala itself. If you are developing a rich internet application in Scala and using all the goodness of Scala, but sacrificing JavaScript interoperability, you can use Scala.js, a Scala-to-JavaScript compiler, so that you can build the entire web application in Scala. In short: a JavaScript backend for Scala.
  15. Scala.js compiles your Scala code to JavaScript code. It is just a usual Scala compiler that takes Scala code and produces JavaScript code instead of JVM bytecode. js-scala, on the other hand, is a Scala library providing composable JavaScript code generators. You can use them in your usual Scala program to write a JavaScript program generator: your Scala program is compiled into JVM bytecode by the Scala compiler, and executing this program generates a JavaScript program. The main difference is that js-scala is a library while Scala.js is a compiler. Suppose that you want to write a JavaScript program solving a given problem. In js-scala you write a Scala program that generates a JavaScript program solving the given problem. In Scala.js you write a Scala program solving the given problem.
  16. Nowadays, interoperability between statically typed and dynamically typed languages is in ever greater demand, which is why many statically typed languages are targeting JavaScript. Statically typed means that the type of a variable is known at compile time; dynamically typed means that the type of a variable is determined at run time. Interoperability with the object-oriented and functional features of JavaScript is essential, but existing languages have poor support for this. Scala.js's interoperability system, however, is built on powerful support for type-directed interoperability with dynamically typed languages. It accommodates both the functional and object-oriented features of Scala and provides very natural interoperability between the two languages. It is expressive enough to represent the DOM and jQuery in both statically and dynamically typed styles. Scala has a very powerful type system with a unique combination of features: traits, generics, implicit conversions, higher-order functions and user-defined dynamic types. As a functional and object-oriented language, its concepts beneath the type system are also very close to JavaScript's: for example, there are no static methods.
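
The two features described in note 13 can be tried together in plain Scala, without Scala.js. The following is a minimal sketch, not code from the talk: the object name, the `Book` and `Employee` definitions, the in-memory `catalogue`, and the `Option` return type of `lookup` are all assumptions made so the example is self-contained. `selectDynamic` is defined alongside `updateDynamic` so that the field can be read back.

```scala
import scala.collection.mutable
import scala.language.dynamics
import scala.language.implicitConversions

// Hypothetical example object; only ID, stringToID and lookup echo the note.
object ImplicitAndDynamicDemo {
  case class ID(id: String)
  case class Book(id: ID, title: String)

  // Implicit conversion: lets a String be used wherever an ID is expected.
  implicit def stringToID(s: String): ID = ID(s)

  private val catalogue = Map(ID("foo") -> Book(ID("foo"), "Foo in Action"))

  def lookup(id: ID): Option[Book] = catalogue.get(id)

  // Dynamic is a marker trait: member accesses that do not type-check
  // are rewritten by the compiler into the calls below.
  class Employee extends Dynamic {
    private val fields = mutable.Map.empty[String, Any]
    def selectDynamic(name: String): Any = fields(name)
    def updateDynamic(name: String)(value: Any): Unit = { fields.update(name, value) }
  }

  def demo(): (Option[Book], ID, Any) = {
    val book = lookup("foo") // rewritten as lookup(stringToID("foo"))
    val id: ID = "bar"       // rewritten as stringToID("bar")
    val empl = new Employee
    empl.lname = "Doe"       // rewritten as empl.updateDynamic("lname")("Doe")
    (book, id, empl.lname)   // empl.lname becomes empl.selectDynamic("lname")
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

Both features trade compile-time checking for convenience: a mistyped field name such as `empl.lnmae` still compiles and only fails at run time, which is one reason they sit behind explicit `scala.language` imports.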
  22. Supports all of Scala (including macros!) except for a few semantic differences. Because the target platform of Scala.js is quite different from that of Scala, a few differences in language semantics exist.
  23. Piyush Mishra