Apache Spark Installation
SHWETA PATNAIK
M.TECH (CSE)
 What is SPARK
 SPARK Architecture
 Why Spark when Hadoop is already there?
 Spark vs. Hadoop MapReduce
 Apache Spark Installation
 Operating or Deploying a Spark Cluster Manually
 Running Spark Application
 Spark Environment
 Spark was introduced by the Apache Software Foundation to speed up Hadoop's computational processing.
 Apache Spark is an open-source cluster computing framework for real-time
processing.
 Spark is being adopted by major players like Amazon, eBay, and Yahoo!
 It builds on the ideas of Hadoop MapReduce and extends the MapReduce model to efficiently support more types of computations.
 Hadoop is just one of the ways to deploy Spark.
 Spark can use Hadoop in two ways – one for storage and one for processing.
 Since Spark has its own cluster management, it can also use Hadoop for storage alone.
 Hadoop is based on batch processing of big data. This means that data is stored over a period of time and is then processed using Hadoop.
 In Spark, by contrast, processing can take place in real time. This real-time processing power helps solve use cases in real-time analytics.
 Spark can also run batch processing up to 100 times faster than Hadoop MapReduce (the processing framework in Apache Hadoop).
1. Introduction
 Apache Spark – An open-source big data framework that provides a faster, more general-purpose data-processing engine. Spark is designed primarily for fast computation.
 Hadoop MapReduce – Also an open-source framework, used for writing applications that process structured and unstructured data stored in HDFS. MapReduce processes data in batch mode.
2. Speed
 Apache Spark – Spark is a lightning-fast cluster computing tool. Apache Spark runs applications up to 100x faster in memory and 10x faster on disk than Hadoop.
 Hadoop MapReduce – MapReduce reads from and writes to disk, which slows down processing.
3. Real-time analysis
 Apache Spark – It can process real-time data, i.e. data coming from real-time event streams at rates of millions of events per second, e.g. Twitter or Facebook feeds. Spark’s strength is the ability to process live streams efficiently.
 Hadoop MapReduce – MapReduce falls short for real-time data processing, as it was designed to perform batch processing on voluminous amounts of data.
4. Latency
 Apache Spark – Spark provides low-latency computing.
 Hadoop MapReduce – MapReduce is a high latency computing framework.
5. Interactive mode
 Apache Spark – Spark can process data interactively.
 Hadoop MapReduce – MapReduce doesn’t have an interactive mode.
6. Streaming
 Apache Spark – Spark can process real-time data through Spark Streaming; a short sketch follows below.
 Hadoop MapReduce – With MapReduce, you can only process data in batch mode.
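For illustration only, here is a minimal Spark Streaming sketch in Scala, assuming the Spark shell (which already provides the SparkContext sc) and a TCP text source on localhost:9999 that is invented for this example; it is not part of the original slides.

  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // 5-second micro-batches on top of the existing SparkContext (sc from spark-shell)
  val ssc = new StreamingContext(sc, Seconds(5))

  // The host and port of the text stream are illustrative assumptions
  val lines = ssc.socketTextStream("localhost", 9999)

  // Count words in each batch and print the per-batch result
  val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
  wordCounts.print()

  ssc.start()
  ssc.awaitTermination()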
7. Fault tolerance
 Apache Spark – Spark is fault-tolerant. As a result, there is no need to restart
the application from scratch in case of any failure.
 Hadoop MapReduce – Like Apache Spark, MapReduce is also fault-tolerant,
so there is no need to restart the application from scratch in case of any
failure.
8. Cost
 Apache Spark – Spark requires a lot of RAM to run in memory, which increases the cluster size and its cost.
 Hadoop MapReduce – MapReduce is the cheaper option when compared purely in terms of hardware cost.
9. Development language
 Apache Spark – Spark is developed in Scala.
 Hadoop MapReduce – Hadoop MapReduce is developed in Java.
10. OS support
 Apache Spark – Spark is cross-platform.
 Hadoop MapReduce – Hadoop MapReduce is also cross-platform.
11. Programming Language support
 Apache Spark – Scala, Java, Python, R, SQL.
 Hadoop MapReduce – Primarily Java; other languages such as C, C++, Ruby, Groovy, Perl, and Python are also supported via Hadoop Streaming.
12. SQL support
 Apache Spark – It enables users to run SQL queries using Spark SQL; a short sketch follows below.
 Hadoop MapReduce – It enables users to run SQL queries using Apache Hive (HiveQL).
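As a brief, non-authoritative sketch of the Spark SQL point above (Spark 1.6-era API, assuming spark-shell provides sc and sqlContext; the Person case class and the data are invented for this example):

  case class Person(name: String, age: Int)

  // Build a DataFrame from a small in-memory RDD of case-class rows
  val people = sc.parallelize(Seq(Person("Ann", 31), Person("Bob", 25)))
  val df = sqlContext.createDataFrame(people)

  // Register it as a temporary table and query it with plain SQL
  df.registerTempTable("people")
  sqlContext.sql("SELECT name FROM people WHERE age > 30").show()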
13. Scalability
 Apache Spark – Spark is highly scalable, so nodes can keep being added to the cluster. The largest known Spark cluster has around 8,000 nodes.
 Hadoop MapReduce – MapReduce is also highly scalable; nodes can likewise keep being added. The largest known Hadoop cluster has around 14,000 nodes.
14. Lines of code
 Apache Spark – Apache Spark was implemented in merely about 20,000 lines of code.
 Hadoop MapReduce – Hadoop 2.0 has about 120,000 lines of code.
15. Machine Learning
 Apache Spark – Spark ships with its own machine learning library, MLlib; a short sketch follows below.
 Hadoop MapReduce – Hadoop requires an external machine learning tool, for example Apache Mahout.
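A minimal MLlib sketch (RDD-based API), assuming the Spark shell and a handful of made-up 2-D points, just to show that the library ships with Spark:

  import org.apache.spark.mllib.clustering.KMeans
  import org.apache.spark.mllib.linalg.Vectors

  // Toy data; in practice these vectors would be parsed from a file
  val points = sc.parallelize(Seq(
    Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
    Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)
  ))

  // Cluster into 2 groups with 10 iterations and print the centres
  val model = KMeans.train(points, 2, 10)
  model.clusterCenters.foreach(println)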
16. Hardware requirements
 Apache Spark – Spark needs mid- to high-end hardware.
 Hadoop MapReduce – MapReduce runs very well on commodity hardware.
 Download Scala from the link:
http://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.msi
 Install Scala under the C drive, into a Scala folder (C:\scala).
 My Computer → Properties → Advanced system settings → Environment Variables → under User variables, click New.
 User variable:
Variable: SCALA_HOME
Value: C:\scala
Click the OK button.
 System variable:
Variable: PATH
Value: C:\scala\bin (append this to the existing PATH entries)
Then click OK → OK → OK.
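As a quick sanity check (an optional step, not in the original slides), open a new command prompt, run scala, and print the version from the REPL:

  // Should report 2.11.8 if the installation and PATH entry above worked
  println(s"Running Scala ${util.Properties.versionString}")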
 Download Spark from the following link:
http://spark.apache.org/downloads.html and extract it into the E drive, for example E:\spark (you can extract it to another drive as well).
 User variable:
Variable: SPARK_HOME
Value: E:\spark\spark-1.6.1-bin-hadoop2.3
 System variable:
Variable: PATH
Value: E:\spark\spark-1.6.1-bin-hadoop2.3\bin (append this to the existing PATH entries)
 Download winutils.exe from the link:
 https://github.com/prabaprakash/Hadoop-2.3/tree/master/bin
and paste it into E:\spark\spark-1.6.1-bin-hadoop2.3\bin.
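If winutils.exe is still not picked up, one common workaround (an assumption added here, not taken from the slides) is to point Hadoop's home directory at the Spark folder from inside the shell before running any job:

  // hadoop.home.dir must contain a bin\winutils.exe; the path mirrors the install location used above
  System.setProperty("hadoop.home.dir", "E:\\spark\\spark-1.6.1-bin-hadoop2.3")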
 Let’s look at the classic Hadoop MapReduce example of Word Count, this time in Apache Spark –
 The input file, input.txt, contains the following text –
 We will submit the word count example in Apache Spark using the Spark shell, rather than running the word count program as a packaged application -
 Let’s start the Spark shell.
 Let’s create a Spark RDD from the input file that we want to run our first Spark program on. You should specify the absolute path of the input file -
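A sketch of this step in the Spark shell (started by running spark-shell from the bin directory added to PATH earlier); the input path below is an assumption matching the drive used in the installation steps:

  // Read the input file into an RDD of lines
  val inputFile = sc.textFile("E:/spark/input.txt")

  // Peek at the first few lines to confirm the file was found
  inputFile.take(5).foreach(println)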
 On executing the above command, the following output is observed -
 Now comes the step of counting the words –
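A sketch of the counting step, continuing from the inputFile RDD created above:

  // Split lines into words, pair each word with 1, then sum the counts per word
  val counts = inputFile.flatMap(line => line.split(" "))
                        .map(word => (word, 1))
                        .reduceByKey(_ + _)

  // Bring the (word, count) pairs back to the driver and print them
  counts.collect().foreach(println)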
 You will get the following output:
 The next step is to store the output in a text file-
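A sketch of the save step; note that Spark writes a directory of part files rather than a single text file (the output path is an assumption):

  // Persist the (word, count) pairs; "output" becomes a directory containing part-0000x files
  counts.saveAsTextFile("E:/spark/output")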
 Go to the output directory (the location where the directory named output was created).
Thank You