SlideShare a Scribd company logo
1 of 21
Download to read offline
www.edureka.co/apache-spark-scala-training
5 Reasons why Spark is in demand!
www.edureka.co/apache-spark-scala-training
Reasons why Spark is in Demand
 Reason #1 : Low Latency
 Reason #2 : Streaming Support
 Reason #3 : Machine Learning and Graph
 Reason #4 : Data Frame API introduction
 Reason #5 : Spark integration with Hadoop
www.edureka.co/apache-spark-scala-training
Spark Architecture
Machine
Learning
Library
Graph
programming
Spark interface
For RDBMS
lovers
Utility for
continuous
ingestion of data
www.edureka.co/apache-spark-scala-training
Low Latency
www.edureka.co/apache-spark-scala-training
Sparks Cuts Down Read/Write I/O To Disk
Spark is good for both data that fit In-Memory and Off-Memory
Spark tries to keep things in-memory of its distributed workers, allowing for significantly
faster/lower-latency computations, whereas MapReduce keeps shuffling things in and out of
disk.
www.edureka.co/apache-spark-scala-training
Time taken for a System to Sort 100 TB Of Data
The previous world record was 72 minutes, set by Yahoo using a Hadoop MapReduce
cluster of 2100 nodes
Using Spark on 206 EC2 nodes, it took only 23 minutes.
Spark sorted the same data 3X faster using 10X fewer machines
www.edureka.co/apache-spark-scala-training
Streaming Support
www.edureka.co/apache-spark-scala-training
Event processing
Used for processing the real-time streaming data.
It uses the DStream which is a series of RDDs, for processing the continuous real-time data.
The Spark Streaming API closely matches that of the Spark Core
The Spark Streaming API closely matches that of the Spark Core
www.edureka.co/apache-spark-scala-training
Machine Learning and Graph
Implementation with DAG
www.edureka.co/apache-spark-scala-training
Machine Learning
MLlib, a
machine
learning library
Classification Regression Clustering
Collaborative
filtering
Some of the algorithms also work with streaming data, such as linear regression using
ordinary least squares or k-means clustering
www.edureka.co/apache-spark-scala-training
Cyclic Data Flows
• All jobs in spark comprise a series of operators and run on a set of data.
• All the operators in a job are used to construct a DAG (Directed Acyclic Graph).
• The DAG is optimized by rearranging and combining operators where possible.
www.edureka.co/apache-spark-scala-training
GraphX
Graph
Algorithms
Page Rank
Connected
Components
Triangle
Counting
Component for graphs and graph-parallel computation
Extends the Spark RDD by introducing a new Graph abstraction
www.edureka.co/apache-spark-scala-training
Support for Data Frames
www.edureka.co/apache-spark-scala-training
DataFrame
As Spark continues to grow, it wants to enable wider audiences beyond “big data” engineers to
leverage the power of distributed processing.
Inspired by DataFrames in R and Python (pandas)
DataFrames API is designed to make big data processing on tabular data easier
DataFrame is a distributed collection of data organized into named columns.
Provides operations to filter, group, or compute aggregates, and can be used with Spark SQL.
Can be constructed from structured data files, existing RDDs, tables in Hive, or external
databases.
www.edureka.co/apache-spark-scala-training
DataFrame features
Ability to scale from KBs to PBs
APIs for python, java, scala, and R (in development via sparkr)
Seamless integration with all big data tooling and infrastructure via spark
State-of-the-art optimization and code generation through the spark SQL catalyst optimizer
Support for a wide array of data formats and storage systems
www.edureka.co/apache-spark-scala-training
Spark Integration with Hadoop
www.edureka.co/apache-spark-scala-training
Spark Execution Platforms
Spark can leverage the resource negotiator of Hadoop framework i.e. YARN
Spark workloads can make use of Symphony scheduling policies and execute via YARN
Spark execution
modes
Standalone Mesos HDFS
www.edureka.co/apache-spark-scala-training
Spark Features/Modules in Demand
www.edureka.co/apache-spark-scala-training
New Features In 2015
Data Frames 
• Similar API to data frames in R and Pandas
• Automatically optimised via Spark SQL
• Released in Spark 1.3
SparkR 
• Released in Spark 1.4
• Exposes DataFrames, RDD’s & ML library in R
Machine Learning Pipelines 
• High Level API
• Featurization
• Evaluation
• Model Tuning
External Data Sources 
• Platform API to plug Data-Sources into Spark
• Pushes logic into sources
www.edureka.co/apache-spark-scala-training
Spark overview
www.edureka.co/apache-spark-scala-training
Thank You
Questions/Queries/Feedback
Recording and presentation will be made available to you within 24 hours

More Related Content

What's hot

Big data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaBig data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & Scala
Edureka!
 

What's hot (20)

Spark for big data analytics
Spark for big data analyticsSpark for big data analytics
Spark for big data analytics
 
Apache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesApache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & libraries
 
Spark Streaming
Spark StreamingSpark Streaming
Spark Streaming
 
Big Data Processing With Spark
Big Data Processing With SparkBig Data Processing With Spark
Big Data Processing With Spark
 
Machine Learning with SparkR
Machine Learning with SparkRMachine Learning with SparkR
Machine Learning with SparkR
 
Apache Spark PDF
Apache Spark PDFApache Spark PDF
Apache Spark PDF
 
Spark Will Replace Hadoop ! Know Why
Spark Will Replace Hadoop ! Know Why Spark Will Replace Hadoop ! Know Why
Spark Will Replace Hadoop ! Know Why
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
 
Spark_Part 1
Spark_Part 1Spark_Part 1
Spark_Part 1
 
Big data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaBig data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & Scala
 
Spark For Faster Batch Processing
Spark For Faster Batch ProcessingSpark For Faster Batch Processing
Spark For Faster Batch Processing
 
Hadoop vs Apache Spark
Hadoop vs Apache SparkHadoop vs Apache Spark
Hadoop vs Apache Spark
 
Apache spark
Apache sparkApache spark
Apache spark
 
Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...
Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...
Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...
 
Data processing with spark in r & python
Data processing with spark in r & pythonData processing with spark in r & python
Data processing with spark in r & python
 
Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)
 
Lighting up Big Data Analytics with Apache Spark in Azure
Lighting up Big Data Analytics with Apache Spark in AzureLighting up Big Data Analytics with Apache Spark in Azure
Lighting up Big Data Analytics with Apache Spark in Azure
 
Big data overview
Big data overviewBig data overview
Big data overview
 
Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
 

Viewers also liked

"Introduction to R Programming and Machine Learning"
"Introduction to R Programming and Machine Learning""Introduction to R Programming and Machine Learning"
"Introduction to R Programming and Machine Learning"
Edureka!
 

Viewers also liked (8)

"Introduction to R Programming and Machine Learning"
"Introduction to R Programming and Machine Learning""Introduction to R Programming and Machine Learning"
"Introduction to R Programming and Machine Learning"
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Apache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterApache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and Smarter
 
Introduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingIntroduction to Apache Spark Developer Training
Introduction to Apache Spark Developer Training
 
What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...
What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...
What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...
 
Apache spark basics
Apache spark basicsApache spark basics
Apache spark basics
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 

Similar to 5 Reasons why Spark is in demand!

What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 
Low latency access of bigdata using spark and shark
Low latency access of bigdata using spark and sharkLow latency access of bigdata using spark and shark
Low latency access of bigdata using spark and shark
Pradeep Kumar G.S
 

Similar to 5 Reasons why Spark is in demand! (20)

5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
 
Apache spark
Apache sparkApache spark
Apache spark
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
 
Module01
 Module01 Module01
Module01
 
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
 
SparkPaper
SparkPaperSparkPaper
SparkPaper
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
spark_v1_2
spark_v1_2spark_v1_2
spark_v1_2
 
Apache Spark Introduction.pdf
Apache Spark Introduction.pdfApache Spark Introduction.pdf
Apache Spark Introduction.pdf
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Low latency access of bigdata using spark and shark
Low latency access of bigdata using spark and sharkLow latency access of bigdata using spark and shark
Low latency access of bigdata using spark and shark
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Scalable Machine Learning with PySpark
Scalable Machine Learning with PySparkScalable Machine Learning with PySpark
Scalable Machine Learning with PySpark
 
APACHE SPARK.pptx
APACHE SPARK.pptxAPACHE SPARK.pptx
APACHE SPARK.pptx
 
Spark introduction & Architecture.pptx
Spark introduction & Architecture.pptxSpark introduction & Architecture.pptx
Spark introduction & Architecture.pptx
 
Hadoop vs spark
Hadoop vs sparkHadoop vs spark
Hadoop vs spark
 

More from Edureka!

More from Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 

5 Reasons why Spark is in demand!