Spark Core

Introducing Spark Core
Friday, January 22, 16

Agenda
• Assumptions
• Why Spark?
• What do you need to know to begin?

Assumptions
One or more of the following:
• You want to learn Apache Spark, but need to know where
to begin
• You need to know the fundamentals of Spark in order to
progress in your learning of Spark
• You need to evaluate if Spark could be an appropriate fit
for your use cases or career growth

In a nutshell, why Spark?
• Engine for efficient large-scale data processing. It is often faster than
Hadoop MapReduce, largely because it can keep working data in memory
• Spark can complement your existing Hadoop investments
such as HDFS and Hive
• Rich ecosystem including support for SQL, Machine
Learning, Streaming and multiple language APIs such as
Scala, Python and Java

Introduction
• Ok, so where should I start?

Spark Essentials
To begin, you need to know:
• Resilient Distributed Datasets (RDDs)
• Transformations
• Actions
• Spark Driver Programs and SparkContext

Resilient Distributed Datasets (RDDs)
• RDDs are Spark’s primary abstraction for data
interaction (lazy, in memory)
• An RDD is an immutable, distributed collection of
elements separated into partitions
• There are multiple types of RDDs
• RDDs can be created from external data sets such as
Hadoop InputFormats, from text files on a variety of file
systems, or from existing RDDs via Spark transformations
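
To make "immutable" and "separated into partitions" concrete, here is a minimal plain-Python sketch. It is not Spark code: split_into_partitions is a hypothetical helper written for this slide, and Spark decides partition boundaries by its own rules.

```python
def split_into_partitions(elements, num_partitions):
    """Split a sequence into num_partitions contiguous, read-only chunks."""
    elements = tuple(elements)  # tuples cannot be modified in place
    size, extra = divmod(len(elements), num_partitions)
    partitions = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(elements[start:end])
        start = end
    return tuple(partitions)


partitions = split_into_partitions(["a", "b", "c", "d", "e"], 2)
print(partitions)  # (('a', 'b', 'c'), ('d', 'e'))
```

Each chunk could be processed on a different machine, which is the point of partitioning.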

Transformations
• RDD functions that return a new RDD; nothing is computed
until an action runs (remember: lazy)
• map, flatMap, filter, etc.
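
Laziness can be seen without a cluster. The sketch below uses Python's built-in map and filter, which are also lazy, as an analogy only; it is not the Spark API, and the names double and calls are made up for the illustration.

```python
calls = []  # records each time the mapped function actually runs


def double(x):
    calls.append(x)
    return x * 2


numbers = [1, 2, 3, 4]
doubled = map(double, numbers)           # declared, not yet computed
big = filter(lambda x: x > 4, doubled)   # still nothing computed

assert calls == []                       # no work has happened so far

result = list(big)                       # consuming the pipeline triggers the work
print(result)  # [6, 8]
print(calls)   # [1, 2, 3, 4]
```

One difference to keep in mind: a Python iterator can be consumed only once, while an RDD can be reused by several actions.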

Actions
• RDD functions that trigger the computation and return values to the driver
• reduce, collect, count, etc.
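
The split between transformations and actions can be sketched with a small stand-in class. ToyRDD is hypothetical and runs in a single Python process; it borrows the method names map, filter, collect, count and reduce from the bullets above, but it is not Spark's implementation.

```python
from functools import reduce


class ToyRDD:
    """Plain-Python stand-in for an RDD: immutable data plus a deferred pipeline."""

    def __init__(self, data, steps=()):
        self._data = tuple(data)
        self._steps = tuple(steps)

    # Transformations: return a new ToyRDD and compute nothing.
    def map(self, fn):
        return ToyRDD(self._data, self._steps + (("map", fn),))

    def filter(self, fn):
        return ToyRDD(self._data, self._steps + (("filter", fn),))

    def _compute(self):
        # Replay the recorded steps over the source data.
        items = iter(self._data)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    # Actions: run the pipeline and return a plain value to the caller.
    def collect(self):
        return self._compute()

    def count(self):
        return len(self._compute())

    def reduce(self, fn):
        return reduce(fn, self._compute())


letters = ToyRDD(["a", "b", "c", "d", "e"])
print(letters.count())                                # 5
print(letters.filter(lambda s: s < "d").collect())    # ['a', 'b', 'c']

squares_sum = ToyRDD([1, 2, 3, 4]).map(lambda x: x * x).reduce(lambda a, b: a + b)
print(squares_sum)                                    # 30
```

Note that filter leaves letters itself unchanged, so the same source can feed several pipelines.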

Spark RDDs, Transformations, Actions Diagram
[Diagram: data is loaded from an external source (example: textFile) into RDDs; transformations produce new RDDs; actions produce output value(s) (example: count, collect returning 5, ['a', 'b', 'c']).]

Spark Driver Programs and SparkContext
• A Spark driver is a program that declares transformations and
actions on RDDs of data
• The driver connects to the cluster through its SparkContext. The
cluster manager (master) allocates executors on the workers, and the
driver breaks the RDD graph into tasks and sends them to those executors.
• Workers are where the tasks are actually executed.
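
As a rough single-machine analogy for this flow, the sketch below hands one task per partition to a pool of worker threads and combines the partial results on the driver side. The names run_job and count_words are hypothetical, and a thread pool stands in for a real cluster; Spark's scheduling is considerably more involved.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(partition):
    """Task body: runs on a worker against one partition of lines."""
    return sum(len(line.split()) for line in partition)


def run_job(partitions, num_workers=2):
    """Driver side: hand out one task per partition, then combine the results."""
    with ThreadPoolExecutor(max_workers=num_workers) as workers:
        partial_counts = list(workers.map(count_words, partitions))
    return sum(partial_counts)


partitions = [
    ["spark core", "rdd basics"],
    ["lazy transformations"],
    ["actions return values"],
]
total_words = run_job(partitions)
print(total_words)  # 9
```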

Driver Program and SparkContext
Image borrowed from http://spark.apache.org/docs/latest/cluster-overview.html

References
• For course information and discount coupons, visit http://www.supergloo.com/
• Learning Spark Book Summary http://www.amazon.com/Learning-Spark-Summary-Lightning-Fast-Deconstructed-ebook/dp/B019HS7USA/

Next Steps
