Apache Spark with Java 8
Copyright @ 2019 Learntek. All Rights Reserved. 3
Apache Spark with Java 8 Training: Why Spark?
Spark, an Apache Software Foundation project, was introduced to speed up
Hadoop-style cluster computation.
Its main feature is in-memory cluster computing, which greatly increases
the processing speed of an application.
Spark is designed to cover a wide range of workloads, such as batch applications,
iterative algorithms, interactive queries and streaming applications, which reduces
the management burden of maintaining separate tools.
Apache Spark also has the following features.
Speed: Spark can run an application on a Hadoop cluster up to 100 times faster
in memory and up to 10 times faster on disk, by reducing the number of
read/write operations to disk and by keeping intermediate processing data in
memory.
Supports multiple languages: Spark provides more than 80 high-level operators for
interactive querying and offers built-in APIs for application development in
Java, Scala and Python.
Advanced analytics: Spark supports not only map and reduce programming but
also SQL queries, streaming data, machine learning (ML) and graph
algorithms.
Apache Spark with Java 8 Training: Why Java 8?
Java 8 introduced lambda expressions, which bring functional programming
support to the language. It also introduced the Stream API, which can be
thought of as a collection-like framework for functional programming that
processes elements without storing them. With lambda expressions, code can
be written more concisely and elegantly. The learning curve is also smoother,
since one has to learn only the Apache Spark API, not Scala.
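The difference that lambda expressions and the Stream API make can be sketched in plain Java 8, without any Spark dependency. The class name, method names and sample words below are hypothetical and chosen only for illustration; they do not come from the course material.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of the course material.
class LambdaVsAnonymous {

    // Pre-Java 8 style: the comparator is written as an anonymous inner class.
    public static List<String> sortByLengthAnonymous(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return copy;
    }

    // Java 8 style: the same comparator expressed as a lambda expression.
    public static List<String> sortByLengthLambda(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort((a, b) -> Integer.compare(a.length(), b.length()));
        return copy;
    }

    // Stream API: a pipeline that filters and transforms elements as they
    // flow through, collecting only the final result into a list.
    public static List<String> upperCaseLongWords(List<String> words, int minLength) {
        return words.stream()
                .filter(w -> w.length() >= minLength)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "java", "rdd", "lambda");
        System.out.println(sortByLengthAnonymous(words));
        System.out.println(sortByLengthLambda(words));
        System.out.println(upperCaseLongWords(words, 5));
    }
}
```

Both sort methods return the same ordering; the lambda version simply drops the boilerplate of the anonymous class.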
Apache Spark with Java – Overview of Java 8
Overview of Interface, Static method and Default method in interface
Anonymous Inner Classes
Introduction to Lambda Expressions
Functional Interface, type inference
Method references
Composing Lambda
Understanding Closure
Overview of Streams
Working with Streams
Infinite Streams
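Several of the topics listed above (method references, composing lambdas, closures and infinite streams) can be sketched together in a few lines of standard Java 8. This is only an illustrative sketch: the class and method names are hypothetical and are not taken from the course.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of the course material.
class Java8Features {

    // Method reference: String::length is shorthand for s -> s.length().
    public static List<Integer> lengths(List<String> words) {
        return words.stream().map(String::length).collect(Collectors.toList());
    }

    // Composing lambdas: andThen applies 'doubler' first, then 'increment'.
    public static int doubleThenIncrement(int x) {
        Function<Integer, Integer> doubler = n -> n * 2;
        Function<Integer, Integer> increment = n -> n + 1;
        return doubler.andThen(increment).apply(x);
    }

    // Closure: the returned lambda captures the effectively final
    // parameter 'threshold' from the enclosing method.
    public static Predicate<Integer> greaterThan(int threshold) {
        return n -> n > threshold;
    }

    // Infinite stream: Stream.iterate is evaluated lazily, so the sequence
    // 0, 2, 4, ... is only generated until limit(count) is satisfied.
    public static List<Integer> firstEvenNumbers(int count) {
        return Stream.iterate(0, n -> n + 2)
                .limit(count)
                .collect(Collectors.toList());
    }
}
```

For example, doubleThenIncrement(5) evaluates (5 * 2) + 1, and greaterThan(3).test(4) checks 4 against the captured threshold 3.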
Apache Spark with Java – Introduction to Spark
Introduction to Big Data
Big Data Problem
Scale-Up Vs Scale-Out Architecture
Characteristics of Scale-Out
Introduction to Hadoop, Map-Reduce and HDFS
Introducing Spark
Hortonworks Data Platform (HDP) using VirtualBox
Importing the HDP VM image using VirtualBox on a local machine
Configuring HDP
Overview of Ambari and its components
Overview of services configuration using Ambari
Overview of Apache Zeppelin
Creating, importing and executing notebooks in Apache Zeppelin
IDEs for Spark Applications
IntelliJ IDEA
Eclipse
Resolving dependencies for Spark applications
Spark Basics
Spark Shell
Overview of Spark architecture
Storage layers for Spark
Initializing a Spark Context and building applications
Submitting a Spark Application
Use of Spark History Server
Spark Components
Spark Driver Process
Spark Executor
Spark Conf and Spark Context
SparkSession object
Overview of spark-submit command
Spark UI
RDDs
Overview of RDD
RDD and Partitions
Ways of Creating RDD
RDD transformations and Actions
Lazy evaluation
RDD Lineage Graph (DAG)
Element wise transformations
Map Vs FlatMap Transformation
Set Transformation
RDD Actions
Overview of RDD persistence
Methods for persisting RDD
Persisting RDD with Storage option
Illustration of Caching on an RDD in DAG
Removal of Cached RDD
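The "Map Vs FlatMap" distinction listed above has a close analogue in the Java 8 Stream API, which can be tried without a Spark installation. The sketch below uses plain Java streams, not the Spark RDD API, and its class and method names are hypothetical: map produces exactly one output element per input element, while flatMap lets each input element expand into zero or more output elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration using java.util.stream, not Spark RDDs.
class MapVsFlatMap {

    // map: one result per line (here, the number of words in that line).
    public static List<Integer> wordCountsPerLine(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(" ").length)
                .collect(Collectors.toList());
    }

    // flatMap: each line expands into its words, and the per-line
    // word streams are flattened into a single stream of words.
    public static List<String> allWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }
}
```

Given the two lines "hello spark" and "java 8 streams", the first method yields two counts (one per line), while the second yields five individual words.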
Pair RDDs
Overview of Key-Value Pair RDD
Ways of creating Pair RDDs
Transformations on Pair RDD
reduceByKey(), foldByKey(), mapValues(), flatMapValues(), keys() and values() transformations
Grouping, Joining, Sorting on Pair RDD
reduceByKey() vs groupByKey()
Pair RDD Action
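The idea behind "reduceByKey() vs groupByKey()" can be illustrated with ordinary Java collections. The sketch below is an analogy only: it does not use the Spark API, and the class, the helper pair() and the sample data are hypothetical. The reduce-style method keeps a single running total per key, while the group-style method first materializes every value for a key and only then sums them, which is why grouping tends to hold more data in memory.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical analogy in plain Java collections; not Spark code.
class PerKeyAggregation {

    // Helper to build a key-value pair.
    public static Map.Entry<String, Integer> pair(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Reduce-style: merge each value into a running total for its key.
    public static Map<String, Integer> reduceStyle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    // Group-style: collect all values per key first, then sum each group.
    public static Map<String, Integer> groupStyle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            totals.put(e.getKey(), sum);
        }
        return totals;
    }
}
```

Both methods return the same totals; they differ only in how much intermediate data is held per key along the way.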
Launching Spark on cluster
Configure and launch Spark Cluster on Google Cloud
Configure and launch Spark Cluster on Microsoft Azure
Logging and Debugging a Spark Application
Setting up a Windows environment for executing a Spark Application using an IDE
Steps of using slf4j logging mechanism in Spark Application
Attaching a debugger to Spark Application
Example of debugging a Spark application running inside a cluster
Spark Application Architecture
Spark Application Distributed Architecture
Spark Application submission Mode
Overview of Cluster Manager
Example of using Standalone Cluster Manager
Driver and its responsibilities
Overview of Job, Stage and Tasks
Spark Job Hierarchy
Executor
spark-submit command and various submission options
Yarn Cluster Manager
Yarn Architecture
Client and Cluster Deploy-mode
Advanced concepts in Spark
Accumulator
Broadcast
RDD partitioning
Re-partition RDD
Determining RDD partitioner
Partition-based RDD operations such as mapPartitions, mapPartitionsWithIndex and mapPartitionsToPair
Spark SQL
Introduction to SparkSQL
Creating SparkSession with Hive Support
DataFrame
Ways of Creating DataFrame
Registering a DataFrame as View
DataFrame Transformations API
DataFrame SQL statement
Aggregate Operations
DataFrame Action
Catalyst Optimizer
Limitation of DataFrame
Introduction to Dataset
Introduction to Encoder
Creating Dataset
Functional transformation on Dataset
Loading CSV, JSON, Parquet format file in SparkSQL
Loading and saving data from/in Hive, JDBC, HDFS, Cassandra
Introduction to User-Defined-Function (UDF)
Customizing a UDF
Usage of UDF in DataFrame Transformations API
Usage of UDF in Spark SQL statement
Introduction to Window Function
Steps of defining a window function
Illustration of Window function usage
Introduction to UDAF
Customizing a UDAF
Illustration of customized UDAF usage
Basic Spark Streaming
Introduction to data streaming
Spark Streaming framework
Spark Streaming and Micro batch
Introduction of DStreams
DStreams and RDD
Word Count example using Socket Text Stream
Streaming with Twitter feeds
Setting up a Twitter App
Resolving Twitter dependency in Spark Streaming Application
Steps of creating Uber Jar
Example of extracting hashtags from tweet data
Troubleshooting Twitter Streaming issue in Spark Application
Steps of creating Spark Streaming Application
Architecture of Spark Streaming
Stateless Transformations
Twitter Streaming examples using stateless transformation
Introduction to stateful Transformations
Window Duration and Slide Duration
Window Operations
Naive and inverse window reduce operation
Checkpoint
Tracking State of an event using updateStateByKey operation
Interacting directly with RDDs using the transform() operation
Example of HDFS file streaming
Example of Spark-Kafka interaction
Saving DStreams to external file system
Prerequisites of Apache Spark with Java 8:
An understanding of object-oriented programming (OOP) concepts and programming
constructs in Java is required. Programming experience in Java 7 is mandatory.
An understanding of, or experience with, lambda expressions in Java 8 is an added
advantage.
For more training information, contact us
Email : info@learntek.org
USA : +1734 418 2465
INDIA : +40 4018 1306
+7799713624
