9/21/2018 Top 20 Apache Spark Interview Questions & Answers 2017 | Acadgild Blogs
https://acadgild.com/blog/top-20-apache-spark-interview-questions-2017 1/8
Top 20 Apache Spark Interview Questions 2017
prateek • September 6, 2017
Big Data Hadoop & Spark - Advanced
Here are the top 20 Apache Spark interview questions, with the answer given
directly below each one. These sample questions were framed by consultants
from Acadgild who train for Spark coaching, to give you an idea of the sort
of questions that can be asked in an interview. We have taken full care to
give correct answers for all the Apache Spark interview questions.
Top 20 Apache Spark Interview
Questions
1. What is Apache Spark?
A. Apache Spark is a cluster computing framework which runs on a cluster of
commodity hardware and performs data unification, i.e., reading and writing a
wide variety of data from multiple sources. In Spark, a task is an operation that can
be a map task or a reduce task. The Spark Context handles the execution of the job.
Spark also provides APIs in different languages, i.e., Scala, Java and Python, to develop
applications, and it executes faster than MapReduce.
2. Why is Spark faster than MapReduce?
A. There are a few important reasons why Spark is faster than MapReduce; some
of them are below:
There is no tight coupling in Spark, i.e., there is no mandatory rule that reduce
must come after map.
Spark tries to keep the data “in-memory” as much as possible.
In MapReduce, intermediate data is written to disk (to HDFS between chained jobs), so it takes
longer to read it back for the next step; this is not the case with Spark when the data fits in memory.
3. Explain the Apache Spark Architecture.
A. An Apache Spark application contains two kinds of programs, namely a driver program
and worker programs.
A cluster manager sits in between to interact with these two kinds of cluster
nodes. The Spark Context keeps in touch with the worker nodes with the help
of the cluster manager.
The Spark Context is like a master and the Spark workers are like slaves.
Workers contain the executors that run the job. If any dependencies or
arguments have to be passed, the Spark Context takes care of that. RDDs
reside on the Spark executors.
You can also run Spark applications locally using a thread, and if you want to
take advantage of distributed environments you can use S3, HDFS
or any other storage system.
4. What is RDD?
A. RDD stands for Resilient Distributed Dataset. If you have a large amount
of data that is not necessarily stored on a single system, the data can be
distributed across all the nodes. One subset of the data is called a partition, which
will be processed by a particular task. RDD partitions are very close to input splits in
MapReduce.
5. What is the role of coalesce() and repartition() in Spark?
A. Both coalesce and repartition are used to modify the number of partitions in an
RDD, but coalesce avoids a full shuffle.
If you go from 1000 partitions to 100 partitions, there will not be a shuffle; instead,
each of the 100 new partitions will claim 10 of the current partitions.
Repartition performs a coalesce with a shuffle. It results in the specified
number of partitions, with the data distributed using a hash partitioner.
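The difference can be sketched in plain Python. This is only a toy model of the idea, not the Spark API: partitions are modeled as lists, the function names simply mirror the Spark ones, and hashing each record directly is a simplifying assumption about how the hash partitioner is applied.

```python
def coalesce(partitions, n):
    """Merge whole partitions into n groups; no record is moved on its own."""
    n = min(n, len(partitions))
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Old partition i is claimed as a unit by exactly one new partition.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions, n):
    """Full shuffle: every record is routed individually by its hash."""
    out = [[] for _ in range(n)]
    for part in partitions:
        for rec in part:
            out[hash(rec) % n].append(rec)
    return out


# 1000 single-record partitions, as in the example above (integers hash to themselves).
partitions = [[i] for i in range(1000)]
merged = coalesce(partitions, 100)
shuffled = repartition(partitions, 100)
```

In the coalesce model, each of the 100 new partitions simply absorbs 10 neighbouring old partitions, while in the repartition model every record may end up in a different place.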
6. How do you specify the number of partitions while
creating an RDD?
A. You can specify the number of partitions while creating an RDD, either by using
sc.textFile or by using the parallelize function, as follows:
val rdd = sc.parallelize(data, 4)    // data is an existing local collection; 4 partitions
val lines = sc.textFile("path", 4)   // 4 is the minimum number of partitions
7. What are actions and transformations?
A. Transformations create new RDDs from an existing RDD. Transformations
are lazy and will not be executed until you call an action.
E.g.: map(), filter(), flatMap(), etc.
Actions compute a result from an RDD and return it to the driver.
E.g.: reduce(), count(), collect(), etc.
8. What is Lazy Evaluation?
A. Creating an RDD from an existing RDD is called a transformation, and
unless you call an action your RDD will not be materialized. Spark
delays the result until you really want it, because there could be
situations where you have typed something wrong and then have to
correct it interactively; evaluating every line eagerly would add time and create
unnecessary delays. Also, Spark optimizes the required calculations and takes
intelligent decisions, which is not possible with line-by-line code execution. Spark
recovers from failures and slow workers.
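Lazy evaluation can be illustrated with a small plain-Python model. This is a sketch under stated assumptions, not Spark code: the class LazyRDD and the helper double are hypothetical names, and the only point is that transformations record work while an action runs it.

```python
class LazyRDD:
    """Toy stand-in for an RDD: transformations record steps, actions run them."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: returns a new object, computes nothing yet.
        return LazyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only recorded.
        return LazyRDD(self._source, self._steps + (("filter", pred),))

    def _run(self):
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                yield item

    def collect(self):
        # Action: this is the point where the recorded steps execute.
        return list(self._run())


calls = []


def double(x):
    calls.append(x)  # record that the function really ran
    return x * 2


rdd = LazyRDD(range(5)).map(double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # nothing has been computed so far
result = collect_result = rdd.collect()
```

Here double is not called at all until collect() runs, which mirrors how an RDD is only materialized when an action is called.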
9. Mention some Transformations and Actions
A. Transformations: map(), filter(), flatMap()
Actions: reduce(), count(), collect()
10. What is the role of cache() and persist()?
A. Whenever you want to keep an RDD in memory, because it will be used
multiple times or because it was created after a lot of complex processing,
you can take advantage of cache() or persist().
You can mark an RDD to be persisted by calling the persist() or cache() function on it.
The first time it is computed in an action, it will be kept in memory on the nodes.
When you call persist(), you can specify whether you want to store the RDD on disk,
in memory, or both. If it is in memory, you can also define whether it should be stored in
serialized or deserialized format.
cache() is the same as persist() with the storage level set to memory only.
11. What are Accumulators?
A. Accumulators are write-only variables which are initialized once and sent to
the workers. The workers update them based on the logic written, and the updates are sent back to
the driver, which aggregates or processes them based on the logic.
Only the driver can access the accumulator's value. For tasks, accumulators are write-only.
For example, an accumulator can be used to count the number of errors seen in an RDD across
workers.
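The error-counting example can be modeled in plain Python. This is an illustration of the idea only, not the Spark accumulator API: run_task, the sample log lines, and the driver loop are all hypothetical.

```python
def run_task(partition):
    """Runs on a 'worker': handles one partition and reports only its own update."""
    local_errors = 0
    for line in partition:
        if "ERROR" in line:
            local_errors += 1  # the task only adds; it never reads the total
    return local_errors


# Hypothetical log data, already split into three partitions.
partitions = [
    ["INFO start", "ERROR disk full"],
    ["ERROR timeout", "WARN slow", "ERROR retry failed"],
    ["INFO done"],
]

# Driver side: merge the per-task updates; only the driver sees the final value.
error_count = 0
for part in partitions:
    error_count += run_task(part)
```

Each task contributes an update without seeing the running total, and the combined value is read only on the driver side.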
12. What are Broadcast Variables?
A. Broadcast variables are read-only shared variables. Suppose there is a set of
data which may have to be used multiple times in the workers at different phases;
we can share those variables with the workers from the driver, and every machine
can read them.
13. What are the optimizations that a developer can
make while working with Spark?
A. Spark is memory intensive; it does as much of its work as it can in memory.
Firstly, you can adjust how long Spark will wait before it times out on each of the
phases of data locality (process local -> node local -> rack local ->
any).
Filter out data as early as possible. For caching, choose wisely from the various storage
levels.
Tune the number of partitions in Spark.
14. What is Spark SQL?
A. Spark SQL is a module for structured data processing where we take advantage
of SQL queries running on the datasets.
15. What is a Data Frame?
A. A data frame is like a table: its data is organized into named
columns. You can create a data frame from a file, from tables in Hive, from external
SQL or NoSQL databases, or from existing RDDs.
16. How can you connect Hive to Spark SQL?
A. The first important thing is that you have to place the hive-site.xml file in the conf
directory of Spark.
Then, with the help of a Spark session object (created with Hive support enabled), we can construct a data frame as:
result = spark.sql("select * from <hive_table>")
17. What is GraphX?
A. Many times you have to process data in the form of graphs because you
have to do some analysis on it. GraphX performs graph computation in Spark on
data that is present in files or in RDDs.
GraphX is built on top of Spark core, so it has all the capabilities of Apache
Spark, like fault tolerance and scaling, and there are many built-in graph algorithms as well.
GraphX unifies ETL, exploratory analysis and iterative graph computation within a
single system.
You can view the same data as both graphs and collections, transform and join
graphs with RDDs efficiently, and write custom iterative algorithms using the Pregel
API.
GraphX competes on performance with the fastest graph systems while retaining
Spark's flexibility, fault tolerance and ease of use.
18. What is PageRank Algorithm?
A. One of the algorithms in GraphX is the PageRank algorithm. PageRank measures the
importance of each vertex in a graph, assuming an edge from u to v represents an
endorsement of v's importance by u.
For example, on Twitter, if a user is followed by many other users, that
particular user will be ranked highly. GraphX comes with static and dynamic
implementations of PageRank as methods on the PageRank object.
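The endorsement idea can be sketched as a short iterative computation in plain Python. This is a minimal illustration of PageRank, not GraphX's implementation: the function pagerank, its parameters num_iter and reset_prob, and the follower data are assumptions made for the example.

```python
def pagerank(edges, num_iter=50, reset_prob=0.15):
    """Iteratively pass each vertex's rank along its outgoing edges."""
    vertices = sorted({v for edge in edges for v in edge})
    out_degree = {v: 0 for v in vertices}
    for src, _ in edges:
        out_degree[src] += 1

    rank = {v: 1.0 for v in vertices}
    for _ in range(num_iter):
        contrib = {v: 0.0 for v in vertices}
        for src, dst in edges:
            # An edge src -> dst endorses dst with an equal share of src's rank.
            contrib[dst] += rank[src] / out_degree[src]
        rank = {v: reset_prob + (1 - reset_prob) * contrib[v] for v in vertices}
    return rank


# Hypothetical follower graph: an edge (u, v) means "u follows v".
follows = [("bob", "alice"), ("carol", "alice"), ("dave", "alice"), ("alice", "bob")]
ranks = pagerank(follows)
top_user = max(ranks, key=ranks.get)
```

In this made-up graph, the user with the most followers ends up with the highest rank, and users nobody follows stay at the base value.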
19. What is Spark Streaming?
A. Whenever there is data flowing continuously and you want to process the data
as early as possible, you can take advantage of Spark Streaming. It
is the API for stream processing of live data.
Data can flow from Kafka, Flume, TCP sockets, Kinesis, etc., and you can do
complex processing on the data before pushing it to its destinations.
Destinations can be file systems, databases or dashboards.
20. What is Sliding Window?
A. In Spark Streaming, you have to specify the batch interval. For example, suppose
your batch interval is 10 seconds. Spark will then process whatever data it
gets in the last 10 seconds, i.e., the last batch interval.
But with a sliding window, you can specify how many of the last batches have to be
processed: you specify the batch
interval and how many batches you want to process.
Apart from this, you can also specify how often you want to process the sliding
window. For example, you may want to process the last 3 batches whenever there are 2 new
batches. That is, you specify when you want to slide and how many batches have to be
processed in that window.
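The "last 3 batches whenever there are 2 new batches" example can be modeled in plain Python. This is a sketch of the windowing logic only, not the Spark Streaming API: sliding_windows and its parameters are hypothetical names, and the window length and slide interval are counted in batches for simplicity.

```python
def sliding_windows(batches, window_length, slide_interval):
    """Return (batches_seen, window_contents) each time the window slides."""
    windows = []
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        # The window covers the most recent window_length batches.
        records = [rec for batch in batches[start:end] for rec in batch]
        windows.append((end, records))
    return windows


# Six hypothetical batches, one per batch interval.
batches = [["a"], ["b", "c"], ["d"], ["e"], ["f", "g"], ["h"]]
windows = sliding_windows(batches, window_length=3, slide_interval=2)
```

Because the window is longer than the slide, consecutive windows overlap by one batch, so some records are processed twice.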
We hope this post helped you learn some important Spark interview questions
that are often asked on the Apache Spark topic.
More Related Content

What's hot

Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Edureka!
 
Apache Spark 101
Apache Spark 101Apache Spark 101
Apache Spark 101
Abdullah Çetin ÇAVDAR
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Edureka!
 
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Simplilearn
 
Spark vs Hadoop
Spark vs HadoopSpark vs Hadoop
Spark vs Hadoop
Olesya Eidam
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
Benjamin Bengfort
 
Evolution of spark framework for simplifying data analysis.
Evolution of spark framework for simplifying data analysis.Evolution of spark framework for simplifying data analysis.
Evolution of spark framework for simplifying data analysis.
Anirudh Gangwar
 
What's new with Apache Spark?
What's new with Apache Spark?What's new with Apache Spark?
What's new with Apache Spark?
Paco Nathan
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
GauravBiswas9
 
Spark, Python and Parquet
Spark, Python and Parquet Spark, Python and Parquet
Spark, Python and Parquet
odsc
 
Module01
 Module01 Module01
Module01
NPN Training
 
Spark for big data analytics
Spark for big data analyticsSpark for big data analytics
Spark for big data analytics
Edureka!
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
Happiest Minds Technologies
 
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Edureka!
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
Slim Baltagi
 
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
Edureka!
 
Introduction to Spark ML
Introduction to Spark MLIntroduction to Spark ML
Introduction to Spark ML
Holden Karau
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
Home
 

What's hot (20)

Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
Spark SQL Tutorial | Spark Tutorial for Beginners | Apache Spark Training | E...
 
Apache Spark 101
Apache Spark 101Apache Spark 101
Apache Spark 101
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
 
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
 
Spark vs Hadoop
Spark vs HadoopSpark vs Hadoop
Spark vs Hadoop
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
 
Evolution of spark framework for simplifying data analysis.
Evolution of spark framework for simplifying data analysis.Evolution of spark framework for simplifying data analysis.
Evolution of spark framework for simplifying data analysis.
 
spark_v1_2
spark_v1_2spark_v1_2
spark_v1_2
 
What's new with Apache Spark?
What's new with Apache Spark?What's new with Apache Spark?
What's new with Apache Spark?
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
 
Spark, Python and Parquet
Spark, Python and Parquet Spark, Python and Parquet
Spark, Python and Parquet
 
Module01
 Module01 Module01
Module01
 
Spark for big data analytics
Spark for big data analyticsSpark for big data analytics
Spark for big data analytics
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
 
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
 
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
 
Introduction to Spark ML
Introduction to Spark MLIntroduction to Spark ML
Introduction to Spark ML
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 

Similar to spark interview questions & answers acadgild blogs

Apache Spark Introduction.pdf
Apache Spark Introduction.pdfApache Spark Introduction.pdf
Apache Spark Introduction.pdf
MaheshPandit16
 
Apache Spark PDF
Apache Spark PDFApache Spark PDF
Apache Spark PDF
Naresh Rupareliya
 
Spark from the Surface
Spark from the SurfaceSpark from the Surface
Spark from the Surface
Josi Aranda
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
Edureka!
 
Introduction to Spark - DataFactZ
Introduction to Spark - DataFactZIntroduction to Spark - DataFactZ
Introduction to Spark - DataFactZ
DataFactZ
 
Machine Learning with SparkR
Machine Learning with SparkRMachine Learning with SparkR
Machine Learning with SparkR
Olgun Aydın
 
5 reasons why spark is in demand!
5 reasons why spark is in demand!5 reasons why spark is in demand!
5 reasons why spark is in demand!
Edureka!
 
Spark1
Spark1Spark1
Hadoop vs spark
Hadoop vs sparkHadoop vs spark
Hadoop vs spark
amarkayam
 
Apache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduceApache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduce
Edureka!
 
Learn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemtsLearn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemts
siddharth30121
 
Why Spark over Hadoop?
Why Spark over Hadoop?Why Spark over Hadoop?
Why Spark over Hadoop?
Prwatech Institution
 
Spark
SparkSpark
Apache spark
Apache sparkApache spark
Apache spark
Dona Mary Philip
 
Quick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsQuick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skills
Ravindra kumar
 
5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!
Edureka!
 
Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1
Abbas Maazallahi
 
Using pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewUsing pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 preview
Mario Cartia
 
APACHE SPARK INTERVIEW QUESTIONS AND ANSWERS 2021
APACHE SPARK INTERVIEW QUESTIONS AND ANSWERS 2021APACHE SPARK INTERVIEW QUESTIONS AND ANSWERS 2021
APACHE SPARK INTERVIEW QUESTIONS AND ANSWERS 2021
Sprintzeal
 

Similar to spark interview questions & answers acadgild blogs (20)

Apache Spark Introduction.pdf
Apache Spark Introduction.pdfApache Spark Introduction.pdf
Apache Spark Introduction.pdf
 
Apache Spark PDF
Apache Spark PDFApache Spark PDF
Apache Spark PDF
 
Spark from the Surface
Spark from the SurfaceSpark from the Surface
Spark from the Surface
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
 
SparkPaper
SparkPaperSparkPaper
SparkPaper
 
Introduction to Spark - DataFactZ
Introduction to Spark - DataFactZIntroduction to Spark - DataFactZ
Introduction to Spark - DataFactZ
 
Machine Learning with SparkR
Machine Learning with SparkRMachine Learning with SparkR
Machine Learning with SparkR
 
5 reasons why spark is in demand!
5 reasons why spark is in demand!5 reasons why spark is in demand!
5 reasons why spark is in demand!
 
Spark1
Spark1Spark1
Spark1
 
Hadoop vs spark
Hadoop vs sparkHadoop vs spark
Hadoop vs spark
 
Apache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduceApache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduce
 
Learn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemtsLearn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemts
 
Why Spark over Hadoop?
Why Spark over Hadoop?Why Spark over Hadoop?
Why Spark over Hadoop?
 
Spark
SparkSpark
Spark
 
Apache spark
Apache sparkApache spark
Apache spark
 
Quick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skillsQuick Guide to Refresh Spark skills
Quick Guide to Refresh Spark skills
 
5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!
 
Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1
 
Using pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewUsing pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 preview
 
APACHE SPARK INTERVIEW QUESTIONS AND ANSWERS 2021
APACHE SPARK INTERVIEW QUESTIONS AND ANSWERS 2021APACHE SPARK INTERVIEW QUESTIONS AND ANSWERS 2021
APACHE SPARK INTERVIEW QUESTIONS AND ANSWERS 2021
 

Recently uploaded

Brand Identity For A Sportscaster Project and Portfolio I
Brand Identity For A Sportscaster Project and Portfolio IBrand Identity For A Sportscaster Project and Portfolio I
Brand Identity For A Sportscaster Project and Portfolio I
thomasaolson2000
 
一比一原版(TMU毕业证)多伦多都会大学毕业证如何办理
一比一原版(TMU毕业证)多伦多都会大学毕业证如何办理一比一原版(TMU毕业证)多伦多都会大学毕业证如何办理
一比一原版(TMU毕业证)多伦多都会大学毕业证如何办理
yuhofha
 
Full Sail_Morales_Michael_SMM_2024-05.pptx
Full Sail_Morales_Michael_SMM_2024-05.pptxFull Sail_Morales_Michael_SMM_2024-05.pptx
Full Sail_Morales_Michael_SMM_2024-05.pptx
mmorales2173
 
Personal Brand Exploration Comedy Jxnelle.
Personal Brand Exploration Comedy Jxnelle.Personal Brand Exploration Comedy Jxnelle.
Personal Brand Exploration Comedy Jxnelle.
alexthomas971
 
Andrea Kate Portfolio Presentation.pdf
Andrea Kate  Portfolio  Presentation.pdfAndrea Kate  Portfolio  Presentation.pdf
Andrea Kate Portfolio Presentation.pdf
andreakaterasco
 
How to Master LinkedIn for Career and Business
How to Master LinkedIn for Career and BusinessHow to Master LinkedIn for Career and Business
How to Master LinkedIn for Career and Business
ideatoipo
 
一比一原版(YU毕业证)约克大学毕业证如何办理
一比一原版(YU毕业证)约克大学毕业证如何办理一比一原版(YU毕业证)约克大学毕业证如何办理
一比一原版(YU毕业证)约克大学毕业证如何办理
yuhofha
 
Digital Marketing Training In Bangalore
Digital  Marketing Training In BangaloreDigital  Marketing Training In Bangalore
Digital Marketing Training In Bangalore
nidm599
 
lab.123456789123456789123456789123456789
lab.123456789123456789123456789123456789lab.123456789123456789123456789123456789
lab.123456789123456789123456789123456789
Ghh
 
一比一原版(UVic毕业证)维多利亚大学毕业证如何办理
一比一原版(UVic毕业证)维多利亚大学毕业证如何办理一比一原版(UVic毕业证)维多利亚大学毕业证如何办理
一比一原版(UVic毕业证)维多利亚大学毕业证如何办理
pxyhy
 
'Guidance and counselling- role of Psychologist in Guidance and Counselling.
'Guidance and counselling- role of Psychologist in Guidance and Counselling.'Guidance and counselling- role of Psychologist in Guidance and Counselling.
'Guidance and counselling- role of Psychologist in Guidance and Counselling.
PaviBangera
 
How Mentoring Elevates Your PM Career | PMI Silver Spring Chapter
How Mentoring Elevates Your PM Career | PMI Silver Spring ChapterHow Mentoring Elevates Your PM Career | PMI Silver Spring Chapter
How Mentoring Elevates Your PM Career | PMI Silver Spring Chapter
Hector Del Castillo, CPM, CPMM
 
Jill Pizzola's Tenure as Senior Talent Acquisition Partner at THOMSON REUTERS...
Jill Pizzola's Tenure as Senior Talent Acquisition Partner at THOMSON REUTERS...Jill Pizzola's Tenure as Senior Talent Acquisition Partner at THOMSON REUTERS...
Jill Pizzola's Tenure as Senior Talent Acquisition Partner at THOMSON REUTERS...
dsnow9802
 
0624.speakingengagementsandteaching-01.pdf
0624.speakingengagementsandteaching-01.pdf0624.speakingengagementsandteaching-01.pdf
0624.speakingengagementsandteaching-01.pdf
Thomas GIRARD BDes
 
New Explore Careers and College Majors 2024
New Explore Careers and College Majors 2024New Explore Careers and College Majors 2024
New Explore Careers and College Majors 2024
Dr. Mary Askew
 
RECOGNITION AWARD 13 - TO ALESSANDRO MARTINS.pdf
RECOGNITION AWARD 13 - TO ALESSANDRO MARTINS.pdfRECOGNITION AWARD 13 - TO ALESSANDRO MARTINS.pdf
RECOGNITION AWARD 13 - TO ALESSANDRO MARTINS.pdf
AlessandroMartins454470
 
原版制作(RMIT毕业证书)墨尔本皇家理工大学毕业证在读证明一模一样
原版制作(RMIT毕业证书)墨尔本皇家理工大学毕业证在读证明一模一样原版制作(RMIT毕业证书)墨尔本皇家理工大学毕业证在读证明一模一样
原版制作(RMIT毕业证书)墨尔本皇家理工大学毕业证在读证明一模一样
atwvhyhm
 
labb123456789123456789123456789123456789
labb123456789123456789123456789123456789labb123456789123456789123456789123456789
labb123456789123456789123456789123456789
Ghh
 
Exploring Career Paths in Cybersecurity for Technical Communicators
Exploring Career Paths in Cybersecurity for Technical CommunicatorsExploring Career Paths in Cybersecurity for Technical Communicators
Exploring Career Paths in Cybersecurity for Technical Communicators
Ben Woelk, CISSP, CPTC
 
MISS TEEN GONDA 2024 - WINNER ABHA VISHWAKARMA
MISS TEEN GONDA 2024 - WINNER ABHA VISHWAKARMAMISS TEEN GONDA 2024 - WINNER ABHA VISHWAKARMA
MISS TEEN GONDA 2024 - WINNER ABHA VISHWAKARMA
DK PAGEANT
 

Recently uploaded (20)

Brand Identity For A Sportscaster Project and Portfolio I
Brand Identity For A Sportscaster Project and Portfolio IBrand Identity For A Sportscaster Project and Portfolio I
Brand Identity For A Sportscaster Project and Portfolio I
 
一比一原版(TMU毕业证)多伦多都会大学毕业证如何办理
一比一原版(TMU毕业证)多伦多都会大学毕业证如何办理一比一原版(TMU毕业证)多伦多都会大学毕业证如何办理
一比一原版(TMU毕业证)多伦多都会大学毕业证如何办理
 
Full Sail_Morales_Michael_SMM_2024-05.pptx
Full Sail_Morales_Michael_SMM_2024-05.pptxFull Sail_Morales_Michael_SMM_2024-05.pptx
Full Sail_Morales_Michael_SMM_2024-05.pptx
 
Personal Brand Exploration Comedy Jxnelle.
Personal Brand Exploration Comedy Jxnelle.Personal Brand Exploration Comedy Jxnelle.
Personal Brand Exploration Comedy Jxnelle.
 
Andrea Kate Portfolio Presentation.pdf
Andrea Kate  Portfolio  Presentation.pdfAndrea Kate  Portfolio  Presentation.pdf
Andrea Kate Portfolio Presentation.pdf
 
How to Master LinkedIn for Career and Business
How to Master LinkedIn for Career and BusinessHow to Master LinkedIn for Career and Business
How to Master LinkedIn for Career and Business
 
一比一原版(YU毕业证)约克大学毕业证如何办理
一比一原版(YU毕业证)约克大学毕业证如何办理一比一原版(YU毕业证)约克大学毕业证如何办理
一比一原版(YU毕业证)约克大学毕业证如何办理
 
Digital Marketing Training In Bangalore
Digital  Marketing Training In BangaloreDigital  Marketing Training In Bangalore
Digital Marketing Training In Bangalore
 
lab.123456789123456789123456789123456789
lab.123456789123456789123456789123456789lab.123456789123456789123456789123456789
lab.123456789123456789123456789123456789
 
一比一原版(UVic毕业证)维多利亚大学毕业证如何办理
一比一原版(UVic毕业证)维多利亚大学毕业证如何办理一比一原版(UVic毕业证)维多利亚大学毕业证如何办理
一比一原版(UVic毕业证)维多利亚大学毕业证如何办理
 
'Guidance and counselling- role of Psychologist in Guidance and Counselling.
'Guidance and counselling- role of Psychologist in Guidance and Counselling.'Guidance and counselling- role of Psychologist in Guidance and Counselling.
'Guidance and counselling- role of Psychologist in Guidance and Counselling.
 
How Mentoring Elevates Your PM Career | PMI Silver Spring Chapter
How Mentoring Elevates Your PM Career | PMI Silver Spring ChapterHow Mentoring Elevates Your PM Career | PMI Silver Spring Chapter
How Mentoring Elevates Your PM Career | PMI Silver Spring Chapter
 
Jill Pizzola's Tenure as Senior Talent Acquisition Partner at THOMSON REUTERS...
Jill Pizzola's Tenure as Senior Talent Acquisition Partner at THOMSON REUTERS...Jill Pizzola's Tenure as Senior Talent Acquisition Partner at THOMSON REUTERS...
Jill Pizzola's Tenure as Senior Talent Acquisition Partner at THOMSON REUTERS...
 
0624.speakingengagementsandteaching-01.pdf
0624.speakingengagementsandteaching-01.pdf0624.speakingengagementsandteaching-01.pdf
0624.speakingengagementsandteaching-01.pdf
 
New Explore Careers and College Majors 2024
New Explore Careers and College Majors 2024New Explore Careers and College Majors 2024
New Explore Careers and College Majors 2024
 
RECOGNITION AWARD 13 - TO ALESSANDRO MARTINS.pdf
RECOGNITION AWARD 13 - TO ALESSANDRO MARTINS.pdfRECOGNITION AWARD 13 - TO ALESSANDRO MARTINS.pdf
RECOGNITION AWARD 13 - TO ALESSANDRO MARTINS.pdf
 
原版制作(RMIT毕业证书)墨尔本皇家理工大学毕业证在读证明一模一样
原版制作(RMIT毕业证书)墨尔本皇家理工大学毕业证在读证明一模一样原版制作(RMIT毕业证书)墨尔本皇家理工大学毕业证在读证明一模一样
原版制作(RMIT毕业证书)墨尔本皇家理工大学毕业证在读证明一模一样
 
labb123456789123456789123456789123456789
labb123456789123456789123456789123456789labb123456789123456789123456789123456789
labb123456789123456789123456789123456789
 
Exploring Career Paths in Cybersecurity for Technical Communicators
Exploring Career Paths in Cybersecurity for Technical CommunicatorsExploring Career Paths in Cybersecurity for Technical Communicators
Exploring Career Paths in Cybersecurity for Technical Communicators
 
MISS TEEN GONDA 2024 - WINNER ABHA VISHWAKARMA
MISS TEEN GONDA 2024 - WINNER ABHA VISHWAKARMAMISS TEEN GONDA 2024 - WINNER ABHA VISHWAKARMA
MISS TEEN GONDA 2024 - WINNER ABHA VISHWAKARMA
 

spark interview questions & answers acadgild blogs

  • 1. 9/21/2018 Top 20 Apache Spark Interview Questions & Answers 2017 | Acadgild Blogs https://acadgild.com/blog/top-20-apache-spark-interview-questions-2017 1/8 Top 20 Apache Spark Interview Questions 2017 prateek • September 6, 2017  1  6,151 Big Data Hadoop & Spark - Advanced Here are the top 20 Apache spark interview questions and their answers are given just under to them. These sample spark interview questions are framed by consultants from Aadgild who train for Spark coaching.To allow you an inspiration of the sort to queries which can be asked in associate degree interview. we’ve taken full care to convey correct answers for all the Apache interview questions. Click here for Hadoop Interview questions – Sqoop and Kafka Top 20 Apache Spark Interview Questions 1. What is Apache Spark? A. Apache Spark is a cluster compu ng framework which runs on a cluster of commodity hardware and performs data unifica on i.e., reading and wri ng of wide variety of data from mul ple sources. In Spark, a task is an opera on that can 100% Free Course On Big Data Essentials Subscribe to our blog and get access to this course ABSOLUTELY FREE. Name Email Phone Submit
  • 2. be a map task or a reduce task. SparkContext handles the execution of the job, and Spark provides APIs in several languages, i.e., Scala, Java and Python, for developing applications that execute faster than their MapReduce equivalents.
2. Why is Spark faster than MapReduce?
A. There are a few important reasons why Spark is faster than MapReduce; some of them are below:
There is no tight coupling in Spark, i.e., there is no mandatory rule that reduce must come after map.
Spark tries to keep the data "in-memory" as much as possible.
In MapReduce, intermediate data is written to disk (the output of each job in a chain goes to HDFS), so it takes longer to read the data back; this is not the case with Spark.
3. Explain the Apache Spark Architecture.
A. An Apache Spark application contains two kinds of programs, namely a Driver program and Worker programs. A cluster manager sits in between and mediates between the driver and the worker nodes. Spark Context keeps in touch with the worker nodes with the help of the Cluster Manager. Spark Context is like a master and the Spark workers are like slaves. Workers contain the executors that run the job. If any dependencies or arguments have to be passed, Spark Context takes care of that. RDDs reside on the Spark executors. You can also run Spark applications locally using a thread, and if you want to take advantage of distributed environments you can use S3, HDFS or any other storage system.
4. What is RDD?
A. RDD stands for Resilient Distributed Dataset. If you have a large amount of data that is not necessarily stored on a single system, the data can be distributed across all the nodes; one subset of the data is called a partition, and each partition is processed by a particular task. RDD partitions are very close to input splits in MapReduce.
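The idea in question 4, that a dataset is split into partitions and each partition is processed by its own task, can be illustrated without a cluster. The following is a minimal plain-Python sketch, not Spark's API: the helper names `split_into_partitions` and `run_task` are hypothetical, and a thread pool merely stands in for Spark executors.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into num_partitions contiguous slices of near-equal size."""
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional element each.
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def run_task(partition):
    """One 'task': processes exactly one partition (here, sums its squares)."""
    return sum(x * x for x in partition)


data = list(range(1, 11))                    # stand-in for a large dataset
partitions = split_into_partitions(data, 4)  # hypothetical helper, not a Spark call

# Each partition is handed to its own task; threads stand in for executors.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_results = list(pool.map(run_task, partitions))

total = sum(partial_results)                 # the driver combines per-partition results
print(partitions)
print(partial_results, total)
```

In real Spark the partitions live on different machines and the scheduler decides where each task runs; the sketch only shows the one-task-per-partition relationship.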
  • 3. 5. What is the role of coalesce() and repartition() in Spark?
A. Both coalesce and repartition are used to modify the number of partitions in an RDD, but coalesce avoids a full shuffle. If you go from 1000 partitions to 100 partitions, there will not be a shuffle; instead, each of the 100 new partitions will claim 10 of the current partitions. Repartition performs a coalesce with shuffle. Repartition results in the specified number of partitions, with the data distributed using a hash partitioner.
6. How do you specify the number of partitions while creating an RDD?
A. You can specify the number of partitions while creating an RDD either by using sc.textFile or by using the parallelize function, as follows (for textFile the number is a minimum):
val rdd = sc.parallelize(data, 4)
val lines = sc.textFile("path", 4)
7. What are actions and transformations?
A. Transformations create new RDDs from an existing RDD. Transformations are lazy and will not be executed until you call an action. E.g.: map(), filter(), flatMap(), etc.
Actions return the results of an RDD. E.g.: reduce(), count(), collect(), etc.
8. What is Lazy Evaluation?
A. Creating an RDD from an existing RDD is called a transformation, and unless you call an action your RDD will not be materialized. The reason is that Spark delays the result until you really want it, because there could be situations where you have typed something and it went wrong, and then you have to
  • 4. correct it in an interactive way, which increases the time and creates unnecessary delays. Also, Spark optimizes the required calculations and takes intelligent decisions, which is not possible with line-by-line code execution. Spark recovers from failures and slow workers.
9. Mention some Transformations and Actions
A. Transformations: map(), filter(), flatMap()
Actions: reduce(), count(), collect()
10. What is the role of cache() and persist()?
A. Whenever you want to store an RDD in memory because it will be used multiple times, or because it was created after a lot of complex processing, you can take advantage of cache or persist. You can make an RDD persistent by calling the persist() or cache() function on it. The first time it is computed in an action, it will be kept in memory on the nodes. When you call persist(), you can specify whether you want to store the RDD on disk, in memory, or both. If it is in memory, you can also define whether it should be stored in serialized or deserialized format. cache() is just persist() with the storage level set to memory only.
11. What are Accumulators?
A. Accumulators are write-only variables which are initialized once and sent to the workers. The workers update them based on the logic written, and the updates are sent back to the driver, which aggregates or processes them based on the logic. Only the driver can access the accumulator's value. For tasks, accumulators are write-only. For example, an accumulator can be used to count the number of errors seen in an RDD across workers.
12. What are Broadcast Variables?
  • 5. A. Broadcast variables are read-only shared variables. Suppose there is a set of data which may have to be used multiple times by the workers at different phases: we can share it with the workers from the driver, and every machine can read it.
13. What are the optimizations that a developer can make while working with Spark?
A. Spark is memory intensive; whatever it does, it does in memory.
Firstly, you can adjust how long Spark will wait before it times out on each of the phases of data locality (process local -> node local -> rack local -> any).
Filter out data as early as possible.
For caching, choose wisely from the various storage levels.
Tune the number of partitions in Spark.
14. What is Spark SQL?
A. Spark SQL is a module for structured data processing, where we take advantage of SQL queries running on the datasets.
15. What is a Data Frame?
A. A data frame is like a table: its data is organized into named columns. You can create a data frame from a file, from tables in Hive, from external SQL or NoSQL databases, or from existing RDDs.
16. How can you connect Hive to Spark SQL?
A. The first important thing is that you have to place the hive-site.xml file in the conf directory of Spark. Then, with the help of a Spark session object (created with Hive support enabled), we can construct a data frame as:
result = spark.sql("select * from <hive_table>")
17. What is GraphX?
  • 6. A. Many times you have to process data in the form of graphs, because you have to do some analysis on it. GraphX performs graph computation in Spark on data that is present in files or in RDDs. GraphX is built on top of Spark core, so it has all the capabilities of Apache Spark, such as fault tolerance and scaling, and there are many built-in graph algorithms as well. GraphX unifies ETL, exploratory analysis and iterative graph computation within a single system. You can view the same data as both graphs and collections, transform and join graphs with RDDs efficiently, and write custom iterative algorithms using the Pregel API. GraphX competes on performance with the fastest graph systems while retaining Spark's flexibility, fault tolerance and ease of use.
18. What is PageRank Algorithm?
A. One of the algorithms in GraphX is the PageRank algorithm. PageRank measures the importance of each vertex in a graph, assuming an edge from u to v represents an endorsement of v's importance by u. For example, if a Twitter user is followed by many other users, that particular user will be ranked highly. GraphX comes with static and dynamic implementations of PageRank as methods on the PageRank object.
19. What is Spark Streaming?
A. Whenever there is data flowing continuously and you want to process the data as early as possible, you can take advantage of Spark Streaming. It is the API for stream processing of live data. Data can flow from Kafka, Flume, TCP sockets, Kinesis, etc., and you can do complex processing on the data before pushing it to its destination. Destinations can be file systems, databases or dashboards.
20. What is Sliding Window?
A. In Spark Streaming, you have to specify the batch interval.
For example, suppose your batch interval is 10 seconds. Spark will then process whatever data it
  • 7. gets in the last 10 seconds, i.e., the last batch interval. But with a Sliding Window, you can specify how many of the last batches have to be processed. In the screenshot below, you can see that you can specify the batch interval and how many batches you want to process. Apart from this, you can also specify when you want to process your last sliding window. For example, you may want to process the last 3 batches whenever there are 2 new batches. In other words, you choose when to slide and how many batches have to be processed in that window.
Hope this post helped you learn some important Spark interview questions that are often asked on the Apache Spark topic.
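The sliding window from question 20 (process the last 3 batches whenever 2 new batches have arrived) can be sketched in plain Python. This is only an illustration of the windowing logic, not Spark Streaming's API: the function `sliding_windows` and its parameters are hypothetical names, and the window and slide are counted in batches here, whereas Spark Streaming expresses them as durations that are multiples of the batch interval.

```python
def sliding_windows(batches, window_length, slide_interval):
    """Yield (batches_seen, window) pairs.

    window_length  -- how many of the most recent batches each window covers
    slide_interval -- how many new batches must arrive before the next window
    """
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)  # early windows may be shorter
        yield end, batches[start:end]


# Each inner list stands for the records received in one 10-second batch.
batches = [[1, 2], [3], [4, 5], [6], [7, 8], [9]]

# Process the last 3 batches every time 2 new batches have arrived.
results = [(end, sum(sum(batch) for batch in window))
           for end, window in sliding_windows(batches, 3, 2)]
print(results)
```

Because the window (3 batches) is longer than the slide (2 batches), consecutive windows overlap by one batch, so some records are processed twice; that overlap is what distinguishes a sliding window from plain per-batch processing.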