Scala and Spark Training
Scala and Spark Training – What is Scala?
Scala is a modern multi-paradigm programming language designed
to express common programming patterns in a concise, elegant,
and type-safe way. The name Scala comes from “Scalable
Language”. It is a hybrid language that smoothly integrates
features of object-oriented and functional programming, and it
compiles to run on the Java Virtual Machine. Scala was created
by Martin Odersky and first released in 2003.
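As a minimal sketch of how the object-oriented and functional styles combine, the example below defines a small class and then processes a list of its instances with higher-order functions. The names Employee, raise, and totalAbove are hypothetical and chosen only for illustration.

```scala
// Hypothetical example: Employee, raise and totalAbove are illustrative names.
object HybridStyle {
  // Object-oriented side: an immutable class with fields and a method.
  final case class Employee(name: String, salary: Int) {
    def raise(amount: Int): Employee = copy(salary = salary + amount)
  }

  // Functional side: filter and map take functions as arguments.
  def totalAbove(staff: List[Employee], threshold: Int): Int =
    staff.filter(_.salary > threshold).map(_.salary).sum

  def main(args: Array[String]): Unit = {
    val staff = List(Employee("Ann", 1000), Employee("Bob", 2000))
    println(totalAbove(staff.map(_.raise(100)), 1500))
  }
}
```

The class carries the data and behaviour, while the list operations stay free of explicit loops and mutable state.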
Copyright @ 2019 Learntek. All Rights Reserved. 4
Why Scala?
Several reasons encourage learning Scala.
Many companies that depend on Java for business-critical
applications are turning to Scala to improve their
development productivity, application scalability, and overall
reliability.
Scala is a type-safe JVM language that combines
object-oriented and functional programming features in a
concise, logical, and powerful language.
Scala offers a “better Java” alternative by keeping its syntax
close to the Java language syntax, which minimizes the
learning difficulty.
Scala was created specifically to be a better language,
avoiding the features of Java that are restrictive, overly
tedious, or frustrating.
Scala is a cleaner, well-organized language that is
ultimately easier to use and increases productivity.
What is Spark?
Spark is a cluster computing technology designed for fast
computation on Hadoop clusters. It builds on the Hadoop
MapReduce programming model and extends it to support more
types of computation efficiently, such as interactive queries
and stream processing. Hadoop can serve Spark in two ways:
storage and processing. Because Spark has its own cluster
management, it typically uses Hadoop for storage only.
Spark began in 2009 as a research project at UC Berkeley’s
AMPLab, started by Matei Zaharia. It was open sourced in
2010 under a BSD license. It was donated to the Apache Software
Foundation in 2013 and became a top-level Apache project in
February 2014.
Why Spark?
Spark was introduced to speed up the Hadoop computing
process.
The main feature of Spark is in-memory cluster computing,
which greatly increases application processing speed.
Spark is designed to cover a wide range of workloads, such as
batch applications, iterative algorithms, interactive queries, and
streaming applications, which reduces the management burden of
maintaining separate tools.
Apache Spark also has the following features.
•Speed− Spark can run an application in a Hadoop cluster up
to 100 times faster in memory and up to 10 times faster on
disk. It does this by reducing the number of read/write
operations to disk and by storing intermediate processing data
in memory.
•Supports multiple languages− Spark provides built-in APIs in
Java, Scala, and Python for application development, and
offers more than 80 high-level operators for interactive
querying.
•Advanced Analytics− Spark supports not only ‘map’ and
‘reduce’ operations but also SQL queries, streaming data,
machine learning (ML), and graph algorithms.
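The ‘map’ and ‘reduce’ style mentioned above can be illustrated without a cluster. The following is a minimal sketch on plain Scala collections, not the Spark API; the object name LocalWordCount and the method wordCount are hypothetical. It splits lines into words (the map step) and then counts occurrences per word (the reduce step).

```scala
// Plain Scala collections only; this is an analogy, not Spark code.
// LocalWordCount and wordCount are hypothetical names.
object LocalWordCount {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+")) // map step: lines to words
      .filter(_.nonEmpty)                   // drop empty tokens
      .groupBy(identity)                    // bring equal words together
      .map { case (word, occurrences) => word -> occurrences.size } // reduce step

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("spark and scala", "Spark on Hadoop")))
}
```

On a real cluster the same shape of computation is distributed across machines, with intermediate data held in memory where possible.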
The following topics will be covered in our Scala and Spark
Training:
Scala and Spark Training – Introduction to Scala
Overview of Scala
Installing Scala
Scala Basics
IDE for Scala
Scala Worksheet
Scala Programming
Variables & Methods
Literals
Reserved Words
Operators
Precedence Rules
Operator Associativity
Ways of Executing a Scala Program
Expressions and Loops
If Expression
For Expression
Usage of ‘yield’ keyword in For Expression
Exception handling with Try Expression
Match Expression
While Loops
Do-While Loops
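Several of the expression topics listed above (the if expression, the for expression with ‘yield’, exception handling with Try, the match expression, and a while loop) can be sketched together. This is a minimal illustration, assuming hypothetical names such as ExpressionsDemo, evenSquares, and parse.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical names throughout; a sketch of the listed expression forms.
object ExpressionsDemo {
  // `if` is an expression, so it produces a value directly.
  def sign(n: Int): String =
    if (n < 0) "negative" else if (n == 0) "zero" else "positive"

  // A for expression with `yield` builds a new collection.
  def evenSquares(limit: Int): Seq[Int] =
    for (i <- 1 to limit if i % 2 == 0) yield i * i

  // Try captures an exception as a value; match inspects the outcome.
  def parse(text: String): String =
    Try(text.trim.toInt) match {
      case Success(n) => s"number $n"
      case Failure(_) => "not a number"
    }

  // A while loop relies on mutable local state.
  def digitSum(n: Int): Int = {
    var remaining = math.abs(n)
    var sum = 0
    while (remaining > 0) {
      sum += remaining % 10
      remaining /= 10
    }
    sum
  }

  def main(args: Array[String]): Unit =
    println(evenSquares(6).mkString(", "))
}
```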
Functions in Scala
Methods
Nested Methods
First class Function
Higher Order Methods
Function Literal
Partially Applied Function
Tail Recursion
Closure
Currying
Control Abstraction
Call-by-name Vs call-by-value
Repeated Parameter passing mechanism
Named Parameter mechanism
Default parameter mechanism
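A few of the function topics listed above can be shown in one short sketch: a nested tail-recursive method, currying with partial application, a call-by-name parameter, and repeated parameters. The names FunctionsDemo, factorial, add, firstOrElse, and total are hypothetical.

```scala
import scala.annotation.tailrec

// Hypothetical names; each member illustrates one topic from the list.
object FunctionsDemo {
  // Nested method plus tail recursion: @tailrec makes the compiler
  // verify that the recursive call is in tail position.
  def factorial(n: Int): BigInt = {
    @tailrec
    def loop(remaining: Int, acc: BigInt): BigInt =
      if (remaining <= 1) acc else loop(remaining - 1, acc * remaining)
    loop(n, BigInt(1))
  }

  // Currying: two parameter lists allow supplying arguments in stages.
  def add(a: Int)(b: Int): Int = a + b
  val addTen: Int => Int = add(10)

  // Call-by-name: `default` is evaluated only if it is actually used.
  def firstOrElse(xs: List[Int], default: => Int): Int =
    xs.headOption.getOrElse(default)

  // Repeated parameters: callers may pass any number of Ints.
  def total(numbers: Int*): Int = numbers.sum

  def main(args: Array[String]): Unit =
    println(factorial(10))
}
```

Because default is passed by name, a call such as firstOrElse(List(7), sys.error("unused")) returns normally: the error expression is never evaluated when the list is non-empty.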
OOPs in Scala
Classes & Objects
Defining a Constructor
Constructor Parameter Vs Class Parameter
Singleton Object
Companion Object
Abstract Class
Uniform Access Principle
Access Modifiers
Extending a Class
Namespace in Scala
Calling a superclass Constructor
Dynamic Binding in Scala
Final Member in Scala Class
Scala Class Hierarchy
Object Equality in Scala
Factory Design Pattern in Scala
Traits
Introduction to Traits
Inheritance in Traits
Mixing a Trait
Trait Vs Class
Ordered Trait
Example of Ordered Trait
Stackable Modification behaviour of Trait
Example of Stackable Modification
Rules of mixing of multiple traits
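The stackable modification behaviour listed above can be sketched as follows. This is an illustrative example with hypothetical names (Transformer, Identity, Doubling, Incrementing); it shows that the order in which traits are mixed in changes the result, because the rightmost trait is applied last.

```scala
// Hypothetical names; a sketch of stackable trait modifications.
object StackableTraits {
  abstract class Transformer {
    def transform(x: Int): Int
  }

  class Identity extends Transformer {
    def transform(x: Int): Int = x
  }

  // `abstract override` lets a trait call super.transform even though
  // the concrete implementation is only known at mix-in time.
  trait Doubling extends Transformer {
    abstract override def transform(x: Int): Int = super.transform(x) * 2
  }

  trait Incrementing extends Transformer {
    abstract override def transform(x: Int): Int = super.transform(x) + 1
  }

  // Rightmost trait wraps the others: (x * 2) + 1
  val doubleThenIncrement: Transformer = new Identity with Doubling with Incrementing
  // Rightmost trait wraps the others: (x + 1) * 2
  val incrementThenDouble: Transformer = new Identity with Incrementing with Doubling

  def main(args: Array[String]): Unit = {
    println(doubleThenIncrement.transform(5))
    println(incrementThenDouble.transform(5))
  }
}
```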
Scala Programming Packaging
Package
Different form of Scala Package
Imports statement
Different form of Import
Package Object
Implicit Imports
Case Class & Pattern Matching
Introduction to Case Class
Introduction to Pattern Matching
Example of Pattern Matching
Wildcard Pattern
Constant Pattern
Variable Pattern
Constructor Pattern
Sequence Pattern
Tuple Pattern
Type Pattern
Variable Binding
Pattern Guard
Sealed Class
Option Data Type
Usage of Option Data Type
Pattern Usage
Partial Function
Case Class and Partial Function
Usage of Pattern in For Expression
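Many of the pattern kinds listed above fit into one small sketch: constant, type, constructor, tuple, and wildcard patterns, a pattern guard, a sealed class hierarchy, the Option type, and a partial function. The names PatternDemo, Shape, Circle, Rect, describe, lookup, and half are hypothetical.

```scala
// Hypothetical names; a sketch of the listed pattern kinds.
object PatternDemo {
  // A sealed hierarchy: all subclasses must live in this file.
  sealed trait Shape
  final case class Circle(radius: Double) extends Shape
  final case class Rect(width: Double, height: Double) extends Shape

  def describe(value: Any): String = value match {
    case 0                    => "zero"               // constant pattern
    case n: Int if n < 0      => "negative int"       // type pattern with guard
    case Rect(w, h) if w == h => "square"             // constructor pattern with guard
    case Rect(_, _)           => "rectangle"          // constructor + wildcards
    case Circle(_)            => "circle"
    case (a, b)               => s"pair of $a and $b" // tuple pattern
    case _                    => "something else"     // wildcard pattern
  }

  // Option makes a possibly missing value explicit.
  def lookup(ages: Map[String, Int], name: String): String =
    ages.get(name) match {
      case Some(age) => s"$name is $age"
      case None      => s"$name is unknown"
    }

  // A partial function is defined only where a case matches.
  val half: PartialFunction[Int, Int] = { case n if n % 2 == 0 => n / 2 }

  def main(args: Array[String]): Unit =
    println(List(1, 2, 3, 4).collect(half))
}
```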
Scala Collection
Immutable and Mutable collection
Constructing objects of Array, Set, List, Tuple, Map
Detailed discussion of various methods in the List class and List object
List Construction
Basic operations like head, tail, isEmpty on List
List Pattern
Example of using List Pattern
Categories of methods in List
First Order Methods in List
Higher Order Methods in List
map vs flatMap
Filtering a List
Example of takeWhile, dropWhile, span, partition
Predicates over List
Folding Over List
foldLeft vs foldRight
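Three of the comparisons listed above can be sketched on small lists: map versus flatMap, span versus partition, and foldLeft versus foldRight. The object and method names below are hypothetical; subtraction is used in the folds only because it makes the difference in association visible.

```scala
// Hypothetical names; a sketch of the listed List comparisons.
object ListDemo {
  // map keeps one result per element, so splitting gives a nested list.
  def nestedWords(lines: List[String]): List[List[String]] =
    lines.map(_.split(" ").toList)

  // flatMap flattens the per-element results into one list.
  def allWords(lines: List[String]): List[String] =
    lines.flatMap(_.split(" ").toList)

  // span stops at the first element that fails the predicate.
  def leadingSmall(xs: List[Int], limit: Int): (List[Int], List[Int]) =
    xs.span(_ < limit)

  // partition tests every element, wherever it occurs.
  def allSmall(xs: List[Int], limit: Int): (List[Int], List[Int]) =
    xs.partition(_ < limit)

  // foldLeft associates to the left: ((0 - 1) - 2) - 3
  def leftDifference(xs: List[Int]): Int = xs.foldLeft(0)(_ - _)

  // foldRight associates to the right: 1 - (2 - (3 - 0))
  def rightDifference(xs: List[Int]): Int = xs.foldRight(0)(_ - _)

  def main(args: Array[String]): Unit = {
    println(leftDifference(List(1, 2, 3)))
    println(rightDifference(List(1, 2, 3)))
  }
}
```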
Scala and Spark Training – Introduction to Spark
Introduction to Big Data
Big Data Problem
Scale-Up Vs Scale-Out Architecture
Characteristics of Scale-Out
Introduction to Hadoop, Map-Reduce and HDFS
Introducing Spark
Hortonworks Data Platform (HDP) using VirtualBox
Importing HDP VM image using VirtualBox on local machine
Configuring HDP
Overview of Ambari and its components
Overview of services configuration using Ambari
Overview of Apache Zeppelin
Creating, importing and executing notebooks in Apache Zeppelin
IDEs for Spark Applications
SBT and its overview
IntelliJ IDEA
Eclipse
Resolving dependencies for Spark applications
Spark Basics
Spark Shell
Overview of Spark architecture
Storage layers for Spark
Initializing a SparkContext and building applications
Submitting a Spark Application
Use of Spark History Server
Spark Components
Spark Driver Process
Spark Executor
SparkConf and SparkContext
SparkSession object
Overview of spark-submit command
Spark UI
RDDs
Overview of RDD
RDD and Partitions
Ways of Creating RDD
RDD transformations and Actions
Lazy evaluation
RDD Lineage Graph (DAG)
Element wise transformations
Map Vs FlatMap Transformation
Set Transformation
RDD Actions
Overview of RDD persistence
Methods for persisting RDD
Persisting RDD with Storage option
Illustration of Caching on an RDD in DAG
Removal of Cached RDD
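The lazy evaluation topic listed above means that transformations only describe work, and nothing runs until an action asks for a result. The sketch below imitates that behaviour with a plain Scala collection view; it is an analogy and not the Spark RDD API, and the names LazyPipeline and run are hypothetical.

```scala
// Analogy only: a Scala collection view, not a Spark RDD.
// LazyPipeline and run are hypothetical names.
object LazyPipeline {
  // Returns (calls before forcing, calls after forcing, result).
  def run(data: List[Int]): (Int, Int, List[Int]) = {
    var calls = 0
    // "Transformations": building the view performs no work yet.
    val pipeline = data.view
      .map { x => calls += 1; x * 2 }
      .filter(_ > 4)
    val callsBeforeAction = calls
    // "Action": forcing the view runs the whole pipeline.
    val result = pipeline.toList
    (callsBeforeAction, calls, result)
  }

  def main(args: Array[String]): Unit =
    println(run(List(1, 2, 3, 4)))
}
```

The call counter stays at zero until toList forces the pipeline, which mirrors how a chain of transformations is only recorded until an action triggers execution.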
Pair RDDs
Overview of Key-Value Pair RDD
Ways of creating Pair RDDs
Transformations on Pair RDD
reduceByKey(), foldByKey(), mapValues(), flatMapValues(), keys() and values() transformations
Grouping, Joining, Sorting on Pair RDD
reduceByKey() vs groupByKey()
Pair RDD Action
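The reduceByKey() versus groupByKey() comparison listed above comes down to when values are combined. In Spark, reduceByKey merges values for each key within a partition before shuffling, while groupByKey moves every value across the network first. The sketch below shows the two shapes on a local sequence of pairs; it uses plain Scala collections rather than the Spark API, and the names PairOps, sumViaGrouping, and sumViaReducing are hypothetical.

```scala
// Analogy only: local collections, not Spark pair RDDs.
// PairOps, sumViaGrouping and sumViaReducing are hypothetical names.
object PairOps {
  // groupByKey style: materialize all values per key, then aggregate.
  def sumViaGrouping(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs
      .groupBy(_._1)
      .map { case (key, kvs) => key -> kvs.map(_._2).sum }

  // reduceByKey style: fold each value into a running total per key,
  // so the full list of values for a key is never held at once.
  def sumViaReducing(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.foldLeft(Map.empty[String, Int]) { case (acc, (key, value)) =>
      acc.updated(key, acc.getOrElse(key, 0) + value)
    }

  def main(args: Array[String]): Unit =
    println(sumViaReducing(Seq("a" -> 1, "b" -> 2, "a" -> 3)))
}
```

Both functions return the same totals; the difference is in how much intermediate data exists before aggregation, which is what matters on a cluster.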
Launching Spark on cluster
Configure and launch Spark Cluster on Google Cloud
Configure and launch Spark Cluster on Microsoft Azure
Logging and Debugging a Spark Application
Setting up a Windows environment for executing Spark Application using IDE
Steps of using slf4j logging mechanism in Spark Application
Attaching a debugger to Spark Application
Example of debugging a Spark application running inside a cluster
Spark Application Architecture
Spark Application Distributed Architecture
Spark Application submission Mode
Overview of Cluster Manager
Example of using Standalone Cluster Manager
Driver and its responsibilities
Overview of Job, Stage and Tasks
Spark Job Hierarchy
Executor
Spark-submit command and various submission options
Yarn Cluster Manager
Yarn Architecture
Client and Cluster Deploy-mode
Advanced Concepts in Spark
Accumulator
Broadcast
RDD partitioning
Re-partition RDD
Determining RDD partitioner
Spark SQL
Introduction to SparkSQL
Creating SparkSession with Hive Support
DataFrame
Ways of Creating DataFrame
Registering a DataFrame as View
DataFrame Transformations API
DataFrame SQL statement
Aggregate Operations
DataFrame Action
Catalyst Optimizer
Catalog API
Limitations of DataFrame
Introduction to Dataset
Introduction to Encoder
Creating Dataset
Functional transformation on Dataset
Loading CSV, JSON, Parquet format files in SparkSQL
Loading and saving data from/in Hive, JDBC, HDFS, Cassandra
Introduction to User-Defined-Function (UDF)
Customizing a UDF
Usage of UDF in DataFrame Transformations API
Usage of UDF in Spark SQL statement
Introduction to Window Function
Steps of defining a window function
Illustration of Window function usage
Introduction to UDAF
Customizing a UDAF
Illustration of customized UDAF usage
Spark Streaming
Introduction to data streaming
Spark Streaming framework
Spark Streaming and Micro batch
Introduction of DStreams
DStreams and RDD
Word Count example using Socket Text Stream
Streaming with Twitter feeds
Setting up a Twitter App
Resolving Twitter dependency in Spark Streaming Application
Steps of creating Uber Jar
Example of extracting hashtags from tweet data
Troubleshooting Twitter Streaming issue in Spark Application
Steps of creating Spark Streaming Application
Architecture of Spark Streaming
Stateless Transformations
Twitter Streaming examples using stateless transformation
Introduction to stateful Transformations
Window Transformations
Window Duration and Slide Duration
Window Operations
Naive and inverse window reduce operation
Checkpoint
Tracking State of an event using updateStateByKey operation
Interacting directly with RDD using transform() operation
Example of HDFS file streaming
Example of Spark-Kafka interaction
Saving DStreams to external file system
For more training information, contact us
Email : info@learntek.org
USA : +1734 418 2465
INDIA : +40 4018 1306
+7799713624

Scala and Spark

  • 1. Scala and Spark Training
  • 3. What is Scala? Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way. The name comes from “Scalable Language”. Scala smoothly integrates the features of object-oriented and functional programming languages and is compiled to run on the Java Virtual Machine. Scala was created by Martin Odersky and first released in 2003.
  • 4. Copyright @ 2019 Learntek. All Rights Reserved. Why Scala? Several reasons encourage learning Scala. Many companies that depend on Java for business-critical applications are turning to Scala to boost their development productivity, application scalability, and overall reliability. Scala is a type-safe JVM language that combines object-oriented and functional programming features in a concise, logical, and powerful language.
  • 5. Scala offers a “better Java” alternative by keeping its syntax close to Java's, which minimizes the learning difficulty. Scala was created specifically with the goal of being a better language, in contrast with the features of Java regarded as restrictive, overly tedious, or frustrating. Scala is a cleaner, well-organized language that is ultimately easier to use and increases productivity.
  • 6. What is Spark? Spark is a cluster computing technology designed for fast computation, often on Hadoop clusters. It builds on the MapReduce model and extends it to efficiently support more types of computation, such as interactive queries and stream processing. Spark can use Hadoop in two ways: for storage and for processing. Because Spark has its own cluster management, it can use Hadoop for storage only.
  • 7. Spark was started in 2009 as a research project at UC Berkeley's AMPLab by Matei Zaharia. It was open sourced in 2010 under a BSD license. It was donated to the Apache Software Foundation in 2013, and Apache Spark became a top-level Apache project in February 2014.
  • 8. Why Spark? Spark was developed to speed up Hadoop-style computation. Its main feature is in-memory cluster computing, which greatly increases the processing speed of an application. Spark is designed to cover a wide range of workloads, such as batch applications, iterative algorithms, interactive queries, and streaming applications, which reduces the management burden of maintaining separate tools.
  • 9. Apache Spark also has the following features. • Speed: Spark can run an application in a Hadoop cluster up to 100 times faster in memory and 10 times faster on disk, by reducing the number of read/write operations to disk and by storing intermediate processing data in memory.
  • 10. • Supports multiple languages: Spark comes with 80 high-level operators for interactive querying and provides built-in APIs for application development in Java, Scala, and Python. • Advanced analytics: Spark supports not only ‘Map’ and ‘reduce’ programming but also SQL queries, streaming data, machine learning (ML), and graph algorithms.
  • 11. The following topics will be covered in our Scala and Spark Training. Introduction to Scala: Overview of Scala; Installing Scala; Scala Basics; IDE for Scala; Scala Worksheet.
  • 12. Scala Programming: Variables & Methods; Literals; Reserved Words; Operators; Precedence Rules; Operator Associativity; Ways of Executing a Scala Program; Expressions and Loops; If Expression; For Expression; Usage of ‘yield’ keyword in For Expression; Exception handling with Try Expression; Match Expression; While Loops; Do-While Loops.
  • 13. Functions in Scala: Methods; Nested Methods; First-Class Function; Higher-Order Methods; Function Literal; Partially Applied Function; Tail Recursion; Closure; Currying; Control Abstraction; Call-by-name vs call-by-value; Repeated Parameter passing mechanism; Named Parameter mechanism; Default Parameter mechanism.
  • 14. OOP in Scala: Classes & Objects; Defining a Constructor; Constructor Parameter vs Class Parameter; Singleton Object; Companion Object; Abstract Class; Uniform Access Principle; Access Modifiers; Extending a Class; Namespace in Scala; Calling a Superclass Constructor; Dynamic Binding in Scala; Final Member in Scala Class; Scala Class Hierarchy; Object Equality in Scala; Factory Design Pattern in Scala.
  • 15. Traits: Introduction to Traits; Inheritance in Traits; Mixing a Trait; Trait vs Class; Ordered Trait; Example of Ordered Trait; Stackable Modification behaviour of Trait; Example of Stackable Modification; Rules of mixing multiple traits.
  • 16. Scala Programming (Packaging): Package; Different forms of Scala Package; Import statement; Different forms of Import; Package Object; Implicit Imports.
  • 17. Case Class & Pattern Matching: Introduction to Case Class; Introduction to Pattern Matching; Example of Pattern Matching; Wildcard Pattern; Constant Pattern; Variable Pattern; Constructor Pattern; Sequence Pattern; Tuple Pattern; Type Pattern; Variable Binding; Pattern Guard; Sealed Class; Option Data Type; Usage of Option Data Type; Pattern Usage; Partial Function; Case Class and Partial Function; Usage of Pattern in For Expression.
  • 18. Scala Collection: Immutable and Mutable collections; Constructing objects of Array, Set, List, Tuple, Map; Detailed discussion of various methods in the List class and List object; List Construction; Basic operations like head, tail, isEmpty on List; List Pattern; Example of using List Pattern; Categories of methods in List; First-Order Methods in List; Higher-Order Methods in List; map vs flatMap; Filtering a List; Example of takeWhile, dropWhile, span, partition; Predicates over List; Folding over List; foldLeft vs foldRight.
  • 19. Introduction to Spark: Introduction to Big Data; Big Data Problem; Scale-Up vs Scale-Out Architecture; Characteristics of Scale-Out; Introduction to Hadoop, MapReduce and HDFS; Introducing Spark.
  • 20. Hortonworks Data Platform (HDP) using VirtualBox: Importing HDP VM image using VirtualBox on local machine; Configuring HDP; Overview of Ambari and its components; Overview of services configuration using Ambari; Overview of Apache Zeppelin; Creating, importing and executing notebooks in Apache Zeppelin.
  • 21. IDEs for Spark Applications: SBT and its overview; IntelliJ; Eclipse; Resolving dependencies for Spark applications.
  • 22. Spark Basics: Spark Shell; Overview of Spark architecture; Storage layers for Spark; Initializing a SparkContext and building applications; Submitting a Spark Application; Use of Spark History Server; Spark Components; Spark Driver Process; Spark Executor; SparkConf and SparkContext; SparkSession object; Overview of spark-submit command; Spark UI.
  • 23. RDDs: Overview of RDD; RDD and Partitions; Ways of Creating RDD; RDD Transformations and Actions; Lazy evaluation; RDD Lineage Graph (DAG); Element-wise transformations; map vs flatMap Transformation; Set Transformation; RDD Actions; Overview of RDD persistence; Methods for persisting RDD; Persisting RDD with Storage option; Illustration of Caching on an RDD in DAG; Removal of Cached RDD.
  • 24. Pair RDDs: Overview of Key-Value Pair RDD; Ways of creating Pair RDDs; Transformations on Pair RDD; reduceByKey(), foldByKey(), mapValues(), flatMapValues(), keys() and values() Transformations; Grouping, Joining, Sorting on Pair RDD; reduceByKey() vs groupByKey(); Pair RDD Action.
  • 25. Launching Spark on cluster: Configure and launch Spark Cluster on Google Cloud; Configure and launch Spark Cluster on Microsoft Azure; Logging and Debugging a Spark Application; Setting up a Windows environment for executing Spark Application using IDE; Steps of using slf4j logging mechanism in Spark Application; Attaching a debugger to Spark Application; Example of debugging a Spark application running inside a cluster.
  • 26. Spark Application Architecture: Spark Application Distributed Architecture; Spark Application submission Mode; Overview of Cluster Manager; Example of using Standalone Cluster Manager; Driver and its responsibilities; Overview of Job, Stage and Tasks; Spark Job Hierarchy; Executor; spark-submit command and various submission options; YARN Cluster Manager; YARN Architecture; Client and Cluster Deploy-mode.
  • 27. Advanced concepts in Spark: Accumulator; Broadcast; RDD partitioning; Re-partition RDD; Determining RDD partitioner.
  • 28. Spark SQL: Introduction to Spark SQL; Creating SparkSession with Hive Support; DataFrame; Ways of Creating DataFrame; Registering a DataFrame as View; DataFrame Transformations API; DataFrame SQL statement; Aggregate Operations; DataFrame Action; Catalyst Optimizer; Catalog API.
  • 29. Spark SQL (continued): Limitation of DataFrame; Introduction to Dataset; Introduction to Encoder; Creating Dataset; Functional transformation on Dataset; Loading CSV, JSON, Parquet format file in Spark SQL; Loading and saving data from/in Hive, JDBC, HDFS, Cassandra; Introduction to User-Defined Function (UDF); Customizing a UDF; Usage of UDF in DataFrame Transformations API; Usage of UDF in Spark SQL statement; Introduction to Window Function; Steps of defining a window function; Illustration of Window function usage; Introduction to UDAF; Customizing a UDAF; Illustration of customized UDAF usage.
  • 30. Spark Streaming: Introduction to data streaming; Spark Streaming framework; Spark Streaming and Micro batch; Introduction of DStreams; DStreams and RDD; Word Count example using Socket Text Stream; Streaming with Twitter feeds; Setting up a Twitter App; Resolving Twitter dependency in Spark Streaming Application; Steps of creating Uber Jar; Example of extracting hashtags from tweet data; Troubleshooting Twitter Streaming issue in Spark Application; Steps of creating Spark Streaming Application; Architecture of Spark Streaming; Stateless Transformations.
  • 31. Spark Streaming (continued): Twitter Streaming examples using stateless transformation; Introduction to stateful Transformations; Window Transformations; Window Duration and Slide Duration; Window Operations; Naive and inverse window reduce operation; Checkpoint; Tracking State of an event using updateStateByKey operation; Interacting directly with RDD using transform() operation; Example of HDFS file streaming; Example of Spark-Kafka interaction; Saving DStreams to external file system.
  • 32. For more training information, contact us. Email: info@learntek.org; USA: +1734 418 2465; INDIA: +40 4018 1306 +7799713624
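
Slide 10 mentions ‘Map’ and ‘reduce’ programming, and slide 3 describes Scala as blending object-oriented and functional features. The following is a minimal sketch of that style, not Spark code: it uses only Scala's standard collections on a single machine. The names `WordCountSketch` and `wordCount` are hypothetical and chosen for illustration. Spark's RDD API offers similarly named operators (for example `flatMap`, `map`, and `reduceByKey`) that apply the same idea across a cluster.

```scala
object WordCountSketch {
  // Hypothetical helper: counts word occurrences in the map/reduce style
  // using plain Scala collections (no Spark dependency is assumed).
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+"))  // split every line into words
      .filter(_.nonEmpty)                    // drop empty tokens from blank lines or leading spaces
      .map(word => (word, 1))                // "map" step: emit a (word, 1) pair per word
      .groupBy(_._1)                         // group the pairs by word (the key)
      .map { case (word, pairs) =>           // "reduce" step: sum the counts per word
        word -> pairs.map(_._2).sum
      }

  def main(args: Array[String]): Unit = {
    val lines = Seq("spark and scala", "scala on the jvm")
    // Print the counts in alphabetical order of the words.
    wordCount(lines).toSeq.sortBy(_._1).foreach { case (word, count) =>
      println(s"$word: $count")
    }
  }
}
```

The function is written as one chain of higher-order methods inside a singleton object, which is one way to see the object-oriented and functional sides of Scala working together.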