https://www.learntek.org/scala-spark-training/
Learntek is a global online training provider for Big Data Analytics, Hadoop, Machine Learning, Deep Learning, IoT, AI, Cloud Technology, DevOps, Digital Marketing, and other IT and management courses.
Scala and Spark Training – What is Scala?
Scala is a modern multi-paradigm
programming language designed to express common
programming patterns in a concise, elegant, and type-safe way.
The name Scala comes from “Scalable Language”. It is a hybrid
language that smoothly integrates the features of
object-oriented and functional programming, and it is
compiled to run on the Java Virtual Machine.
Scala was created by Martin Odersky and released in 2003.
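As a minimal sketch of what "hybrid" means here, the snippet below mixes an object-oriented class hierarchy with functional-style operations on an immutable list. The names Shape, Circle, Rectangle, and ShapeDemo are hypothetical and chosen only for illustration.

```scala
// Object-oriented side: a small class hierarchy with a shared interface.
sealed trait Shape {
  def area: Double
}
final case class Circle(radius: Double) extends Shape {
  def area: Double = math.Pi * radius * radius
}
final case class Rectangle(width: Double, height: Double) extends Shape {
  def area: Double = width * height
}

object ShapeDemo {
  // Functional side: functions passed as values, no mutation.
  def totalArea(shapes: List[Shape]): Double =
    shapes.map(_.area).sum

  def largerThan(minArea: Double)(shapes: List[Shape]): List[Shape] =
    shapes.filter(_.area > minArea)

  def main(args: Array[String]): Unit = {
    val shapes = List(Rectangle(2.0, 3.0), Circle(1.0), Rectangle(1.0, 1.0))
    println(totalArea(shapes))
    println(largerThan(2.0)(shapes))
  }
}
```

The compiled classes run on the JVM like any Java class, which is why existing Java libraries can be called from code such as this.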
Copyright @ 2019 Learntek. All Rights Reserved.
Why Scala?
The following reasons encourage learning Scala.
Many companies that depend on Java for business-critical
applications are turning to Scala to boost their
development productivity, application scalability, and overall
reliability.
Scala is a type-safe JVM language that incorporates both
object-oriented and functional programming features into a
concise, logical, and powerful language.
Scala offers a “better Java” alternative by keeping its syntax
close to the Java language syntax, which minimizes the
learning difficulty.
Scala was created specifically with the goal of being a
better language, avoiding the features of Java that are
restrictive, overly tedious, or frustrating.
Scala is a cleaner, well-organized language that is
ultimately easier to use and increases productivity.
What is Spark?
Spark is a fast cluster computing technology designed for fast
computation on Hadoop clusters. It builds on the Hadoop
MapReduce programming model and extends it to efficiently
support more types of computation, such as
interactive queries and stream processing. Hadoop can serve
Spark in two ways: for storage and for
processing. Because Spark has its own cluster management,
it typically uses Hadoop for storage only.
Spark began as a research project in 2009 at UC
Berkeley’s AMPLab, started by Matei Zaharia. It was open sourced in
2010 under a BSD license. It was donated to the Apache Software
Foundation in 2013, and Apache Spark became a top-level
Apache project in February 2014.
Why Spark?
Spark, now an Apache Software Foundation project, was introduced to
speed up the Hadoop computing process.
The main feature of Spark is its in-memory cluster
computing, which greatly increases the processing speed of an
application.
Spark is designed to cover a wide range of workloads such as
batch applications, iterative algorithms, interactive queries, and
streaming applications. This reduces the management burden of
maintaining separate tools.
Apache Spark also has the following features.
• Speed: Spark helps run an application in a Hadoop cluster up
to 100 times faster in memory and 10 times faster when
running on disk, by reducing the number of read/write operations to
disk and by storing the intermediate processing data in
memory.
• Supports multiple languages: Spark provides built-in APIs in
Java, Scala, and Python for application development, and comes
with more than 80 high-level operators for interactive querying.
• Advanced analytics: Spark supports not only ‘map’ and
‘reduce’ programming but also SQL queries,
streaming data, machine learning (ML), and graph algorithms.
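To make the ‘map’ and ‘reduce’ style mentioned above concrete, here is a small word-count sketch. It deliberately uses plain Scala collections rather than Spark itself, so it runs without a cluster or any Spark dependency; the object name WordCountSketch and its method names are hypothetical. On a Spark RDD the same shape is usually written with the flatMap, map, and reduceByKey operations.

```scala
object WordCountSketch {
  // "Map" phase: split each line into lowercase words and emit (word, 1) pairs.
  def mapPhase(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.toLowerCase.split("\\s+").toSeq)
      .filter(_.nonEmpty)
      .map(word => (word, 1))

  // "Reduce" phase: group the pairs by word and sum the counts for each key.
  def reducePhase(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (word, ones) => (word, ones.map(_._2).sum) }

  def wordCount(lines: Seq[String]): Map[String, Int] =
    reducePhase(mapPhase(lines))

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("spark is fast", "Spark is general")))
}
```

This local version holds everything in one JVM. Spark's contribution is distributing the same two phases across partitions and keeping intermediate data in memory between steps.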
The following topics will be covered in our Scala and Spark
Training:
Scala and Spark Training – Introduction to Scala
Overview of Scala
Installing Scala
Scala Basics
IDE for Scala
Scala Worksheet
Scala Programming
Variables & Methods
Literals
Reserved Words
Operators
Precedence Rules
Operator Associativity
Ways of Executing a Scala Program
Expressions and Loops
If Expression
For Expression
Usage of ‘yield’ keyword in For Expression
Exception handling with Try Expression
Match Expression
While Loops
Do-While Loops
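The expression topics listed above can be sketched in a few lines. The following is an illustrative example only, with hypothetical names (ExpressionsSketch, sign, evenSquares, and so on). A do-while loop is left out because Scala 3 dropped that construct, so the sketch compiles on both Scala 2.13 and Scala 3.

```scala
object ExpressionsSketch {
  // `if` is an expression: it yields a value instead of just branching.
  def sign(n: Int): String =
    if (n < 0) "negative" else if (n == 0) "zero" else "positive"

  // `for` with a guard and `yield` builds a new collection.
  def evenSquares(limit: Int): Seq[Int] =
    for (i <- 1 to limit if i % 2 == 0) yield i * i

  // `try` is also an expression; the catch clause supplies the fallback value.
  def parseOrDefault(text: String, default: Int): Int =
    try text.trim.toInt
    catch { case _: NumberFormatException => default }

  // `match` selects a result by pattern.
  def dayType(day: String): String = day.toLowerCase match {
    case "saturday" | "sunday" => "weekend"
    case _                     => "weekday"
  }

  // A `while` loop is a statement, so it relies on mutable variables.
  def digitSum(n: Int): Int = {
    var remaining = math.abs(n)
    var total = 0
    while (remaining > 0) {
      total += remaining % 10
      remaining /= 10
    }
    total
  }
}
```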
Functions in Scala
Methods
Nested Methods
First class Function
Higher Order Methods
Function Literal
Partially Applied Function
Tail Recursion
Closure
Currying
Control Abstraction
Call-by-name Vs call-by-value
Repeated Parameter passing mechanism
Named Parameter mechanism
Default parameter mechanism
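Several of the function topics above fit into one short sketch. All names here (FunctionsSketch, factorial, scale, doubler, counter, twice, sum, greet) are hypothetical and only illustrate the concepts; this is not taken from the course material.

```scala
import scala.annotation.tailrec

object FunctionsSketch {
  // Nested method + tail recursion: @tailrec makes the compiler verify
  // that the recursive call can be compiled into a loop.
  def factorial(n: Int): BigInt = {
    @tailrec
    def loop(remaining: Int, acc: BigInt): BigInt =
      if (remaining <= 1) acc else loop(remaining - 1, acc * remaining)
    loop(n, BigInt(1))
  }

  // Currying: the parameters are split into two parameter lists.
  def scale(factor: Int)(value: Int): Int = factor * value

  // Partial application: only the first parameter list is supplied,
  // which leaves a function value of type Int => Int.
  val doubler: Int => Int = scale(2)

  // Closure: the returned function captures and updates `count`.
  def counter(): () => Int = {
    var count = 0
    () => { count += 1; count }
  }

  // Call-by-name: `body` is re-evaluated every time it is referenced.
  def twice(body: => Int): Int = body + body

  // Repeated parameters.
  def sum(numbers: Int*): Int = numbers.sum

  // Default and named parameters.
  def greet(name: String, greeting: String = "Hello"): String =
    s"$greeting, $name"
}
```

For example, `FunctionsSketch.greet(greeting = "Hi", name = "Ann")` passes the arguments by name in a different order than they are declared, while `FunctionsSketch.greet("Ann")` falls back to the default greeting.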
OOP in Scala
Classes & Objects
Defining a Constructor
Constructor Parameter Vs Class Parameter
Singleton Object
Companion Object
Abstract Class
Uniform Access Principle
Access Modifiers
Extending a Class
Namespace in Scala
Calling a superclass Constructor
Dynamic Binding in Scala
Final Member in Scala Class
Scala Class Hierarchy
Object Equality in Scala
Factory Design Pattern in Scala
Traits
Introduction to Traits
Inheritance in Traits
Mixing a Trait
Trait Vs Class
Ordered Trait
Example of Ordered Trait
Stackable Modification behaviour of Trait
Example of Stackable Modification
Rules of mixing of multiple traits
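As one possible illustration of the Ordered trait and of stackable modifications, the sketch below follows the well-known integer-queue pattern. The class and trait names (Version, IntQueue, BasicIntQueue, Doubling, Incrementing) are assumptions made for this example.

```scala
import scala.collection.mutable.ArrayBuffer

// Ordered trait: implementing `compare` provides <, >, <= and >=.
final case class Version(major: Int, minor: Int) extends Ordered[Version] {
  def compare(that: Version): Int =
    if (major != that.major) Integer.compare(major, that.major)
    else Integer.compare(minor, that.minor)
}

abstract class IntQueue {
  def get(): Int
  def put(x: Int): Unit
}

class BasicIntQueue extends IntQueue {
  private val buf = ArrayBuffer.empty[Int]
  def get(): Int = buf.remove(0)
  def put(x: Int): Unit = { buf += x }
}

// Stackable modifications: `abstract override` lets each trait call
// super.put, which is resolved only when the trait is mixed in.
trait Doubling extends IntQueue {
  abstract override def put(x: Int): Unit = super.put(2 * x)
}

trait Incrementing extends IntQueue {
  abstract override def put(x: Int): Unit = super.put(x + 1)
}

object TraitDemo {
  def main(args: Array[String]): Unit = {
    // The rightmost trait is applied first: double to 20, then increment.
    val q = new BasicIntQueue with Incrementing with Doubling
    q.put(10)
    println(q.get())
    println(Version(1, 2) < Version(1, 10))
  }
}
```

The mixing order matters: swapping the two traits makes the queue increment first and double afterwards, so the same input is stored as a different value.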
Scala Programming Packaging
Package
Different form of Scala Package
Import statement
Different form of Import
Package Object
Implicit Imports
Case Class & Pattern Matching
Introduction to Case Class
Introduction to Pattern Matching
Example of Pattern Matching
Wildcard Pattern
Constant Pattern
Variable Pattern
Constructor Pattern
Sequence Pattern
Tuple Pattern
Type Pattern
Variable Binding
Pattern Guard
Sealed Class
Option Data Type
Usage of Option Data Type
Pattern Usage
Partial Function
Case Class and Partial Function
Usage of Pattern in For Expression
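A short sketch can show how several of the pattern kinds above look in practice. The expression hierarchy (Expr, Num, Add, Neg) and the object PatternSketch are hypothetical names used only for this illustration.

```scala
// A sealed hierarchy of case classes: all subclasses live in this file,
// so the compiler can check a match for missing cases.
sealed trait Expr
final case class Num(value: Int) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr
final case class Neg(inner: Expr) extends Expr

object PatternSketch {
  // Constructor patterns take the case classes apart.
  def eval(expr: Expr): Int = expr match {
    case Num(v)    => v
    case Add(l, r) => eval(l) + eval(r)
    case Neg(e)    => -eval(e)
  }

  def describe(x: Any): String = x match {
    case 0               => "zero"                          // constant pattern
    case n: Int if n < 0 => "negative int"                  // type pattern with a guard
    case s: String       => s"string of length ${s.length}" // type pattern
    case (a, b)          => s"pair of $a and $b"            // tuple pattern
    case List(first, _*) => s"list starting with $first"    // sequence pattern
    case _               => "something else"                // wildcard pattern
  }

  // Option makes "no result" explicit, and patterns handle both cases.
  def label(result: Option[Int]): String = result match {
    case Some(n) => s"found $n"
    case None    => "nothing found"
  }

  // A partial function is defined only where its case clause matches.
  val half: PartialFunction[Int, Int] = { case n if n % 2 == 0 => n / 2 }

  // Patterns in a for expression: pairs that do not match are skipped.
  def namesWithAge(people: List[(String, Option[Int])]): List[String] =
    for ((name, Some(age)) <- people) yield s"$name is $age"
}
```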
Scala Collection
Immutable and Mutable collection
Constructing objects of Array, Set, List, Tuple, Map
Detailed discussion of various methods in the List class and List object
List Construction
Basic operations like head, tail, isEmpty on List
List Pattern
Example of using List Pattern
Categories of methods in List
First Order Methods in List
Higher Order Methods in List
map vs flatMap
Filtering a List
Example of takeWhile, dropWhile, span, partition
Predicates over List
Folding Over List
foldLeft vs foldRight
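The List operations named above can be compared side by side in a small sketch. The object name ListSketch and its sample values are hypothetical; every method called is part of the standard Scala List API.

```scala
object ListSketch {
  val nums: List[Int] = List(1, 2, 3, 10, 4)
  val words: List[String] = List("ab", "c")

  // map keeps one result per element; flatMap flattens the nested results.
  val mapped: List[List[Char]] = words.map(_.toList)
  val flatMapped: List[Char] = words.flatMap(_.toList)

  // takeWhile / dropWhile / span stop at the first element failing the test;
  // partition tests every element.
  val taken: List[Int] = nums.takeWhile(_ < 5)
  val dropped: List[Int] = nums.dropWhile(_ < 5)
  val spanned: (List[Int], List[Int]) = nums.span(_ < 5)
  val parted: (List[Int], List[Int]) = nums.partition(_ < 5)

  // Predicates over a list.
  val anyLarge: Boolean = nums.exists(_ > 5)
  val allPositive: Boolean = nums.forall(_ > 0)

  // foldLeft groups from the left: ((0 - 1) - 2) - 3
  val left: Int = List(1, 2, 3).foldLeft(0)(_ - _)
  // foldRight groups from the right: 1 - (2 - (3 - 0))
  val right: Int = List(1, 2, 3).foldRight(0)(_ - _)

  // List patterns: Nil and :: mirror how a list is constructed.
  def describe(xs: List[Int]): String = xs match {
    case Nil          => "empty"
    case head :: Nil  => s"one element: $head"
    case head :: tail => s"head $head, then ${tail.length} more"
  }
}
```

With subtraction the two folds give different results, which is why it is a convenient operator for seeing the difference; with an associative operator such as addition they would agree.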
Scala and Spark Training – Introduction to Spark
Introduction to Big Data
Big Data Problem
Scale-Up Vs Scale-Out Architecture
Characteristics of Scale-Out
Introduction to Hadoop, Map-Reduce and HDFS
Introducing Spark
Hortonworks Data Platform (HDP) using VirtualBox
Importing HDP VM image using VirtualBox on local machine
Configuring HDP
Overview of Ambari and its components
Overview of services configuration using Ambari
Overview of Apache Zeppelin
Creating, importing and executing notebooks in Apache Zeppelin
IDEs for Spark Applications
SBT and its overview
IntelliJ
Eclipse
Resolving dependencies for Spark applications
Spark Basics
Spark Shell
Overview of Spark architecture
Storage layers for Spark
Initializing a SparkContext and building applications
Submitting a Spark Application
Use of Spark History Server
Spark Components
Spark Driver Process
Spark Executor
SparkConf and SparkContext
SparkSession object
Overview of spark-submit command
Spark UI
RDDs
Overview of RDD
RDD and Partitions
Ways of Creating RDD
RDD transformations and Actions
Lazy evaluation
RDD Lineage Graph (DAG)
Element wise transformations
Map Vs FlatMap Transformation
Set Transformation
RDD Actions
Overview of RDD persistence
Methods for persisting RDD
Persisting RDD with Storage option
Illustration of Caching on an RDD in DAG
Removal of Cached RDD
Pair RDDs
Overview of Key-Value Pair RDD
Ways of creating Pair RDDs
Transformations on Pair RDD
reduceByKey(), foldByKey(), mapValues(), flatMapValues(), keys() and values() transformations
Grouping, Joining, Sorting on Pair RDD
reduceByKey() vs groupByKey()
Pair RDD Action
Launching Spark on cluster
Configure and launch Spark Cluster on Google Cloud
Configure and launch Spark Cluster on Microsoft Azure
Logging and Debugging a Spark Application
Setting up a Windows environment for executing Spark Application using IDE
Steps of using slf4j logging mechanism in Spark Application
Attaching a debugger to Spark Application
Example of debugging a Spark application running inside a cluster
Spark Application Architecture
Spark Application Distributed Architecture
Spark Application submission Mode
Overview of Cluster Manager
Example of using Standalone Cluster Manager
Driver and its responsibilities
Overview of Job, Stage and Tasks
Spark Job Hierarchy
Executor
Spark-submit command and various submission options
Yarn Cluster Manager
Yarn Architecture
Client and Cluster Deploy-mode
Advanced concepts in Spark
Accumulator
Broadcast
RDD partitioning
Re-partition RDD
Determining RDD partitioner
Spark SQL
Introduction to SparkSQL
Creating SparkSession with Hive Support
DataFrame
Ways of Creating DataFrame
Registering a DataFrame as View
DataFrame Transformations API
DataFrame SQL statement
Aggregate Operations
DataFrame Action
Catalyst Optimizer
Catalog API
Limitation of DataFrame
Introduction to Dataset
Introduction to Encoder
Creating Dataset
Functional transformation on Dataset
Loading CSV, JSON, Parquet format files in Spark SQL
Loading and saving data from/in Hive, JDBC, HDFS, Cassandra
Introduction to User-Defined-Function (UDF)
Customizing a UDF
Usage of UDF in DataFrame Transformations API
Usage of UDF in Spark SQL statement
Introduction to Window Function
Steps of defining a window function
Illustration of Window function usage
Introduction to UDAF
Customizing a UDAF
Illustration of customized UDAF usage
Spark Streaming
Introduction to data streaming
Spark Streaming framework
Spark Streaming and Micro batch
Introduction of DStreams
DStreams and RDD
Word Count example using Socket Text Stream
Streaming with Twitter feeds
Setting up a Twitter App
Resolving Twitter dependency in Spark Streaming Application
Steps of creating Uber Jar
Example of extracting hashtags from tweet data
Troubleshooting Twitter Streaming issue in Spark Application
Steps of creating Spark Streaming Application
Architecture of Spark Streaming
Stateless Transformations
Twitter Streaming examples using stateless transformation
Introduction to stateful Transformations
Window Transformations
Window Duration and Slide Duration
Window Operations
Naive and inverse window reduce operation
Checkpoint
Tracking State of an event using updateStateByKey operation
Interact directly with RDD using transform() operation
Example of HDFS file streaming
Example of Spark-Kafka interaction
Saving DStreams to external file system
For more training information, contact us:
Email: info@learntek.org
USA: +1734 418 2465
INDIA: +40 4018 1306
+7799713624