A SEMINAR PRESENTATION ON
“Introduction To Apache Spark”
SUBMITTED TO: Mrs. Suman Singh (HOD of CSE Dept.)
SUBMITTED BY: Nikita Vijay, B. Tech – VIII sem (CSE)
Agenda
● Need for a new generation of distributed systems
● Hardware/software evolution in the last decade
● Apache Spark
○ Components of Apache Spark
● Why Spark?
● Who is using Spark?
Why do we need a new generation?
● A lot has changed since 2000
● Both hardware and software have gone through changes
● Big data has become a necessity
● Let’s look at what changed over the decade
State of hardware in 2000
● Disk was cheap, so disk was the primary source of data
● Network was costly, so data locality mattered
● RAM was very costly
● Single-core machines were dominant
Now
● RAM is king
● RAM is the primary source of data, and disk is used as a fallback
● Networks are faster
● Multi-core machines are commonplace
Software in 2000
● Object orientation was king
● Software was optimized for a single core
● No open frameworks for creating
○ Distributed storage
○ Distributed processing
● SQL was the only dominant way to analyze data
Now
● Functional programming is on the rise
● Software needs to exploit multiple cores on a single node
● There are good frameworks for creating distributed systems
○ HDFS for storage
● NoSQL is a real alternative now
Big Data processing needs in 2000
● Very few companies had a big data problem
● Batch processing systems ruled the world
● Volume was a bigger concern than velocity
● Mostly used for
○ Search
○ Log analysis
Now
● All companies use big data
● Velocity is as much of a concern as volume
● Real-time needs are as important as batch processing
Apache Spark
● A fast and general engine for large-scale data processing
● Created at UC Berkeley's AMPLab
● Written in Scala
● Licensed under the Apache License
Components of Apache Spark
● Spark SQL
● Spark Streaming
● MLlib
● GraphX
Benefits of a Unified Platform
• No copying of data between systems
• Combine processing types in one program
• Code reuse
• One system to learn
• One system to maintain
History of Apache Spark
● Mesos, a distributed systems framework, started as a class project at UC Berkeley in 2009
● Spark was created to test how Mesos works
● Focused on
○ Iterative programs (ML)
○ Unifying real-time and batch processing
● Open sourced in 2010
Runs everywhere
● Spark can run on top of several cluster managers
● It can run on
○ YARN
○ Apache Mesos
○ Its own standalone cluster
Small and Simple
● Apache Spark is highly modular
● The original version contained only 1600 lines of Scala code
● The Apache Spark API is extremely simple compared to the Java API of MapReduce
● The API is concise and consistent
Source : http://spark-summit.org/wp-content/uploads/2013/10/Zaharia-spark-summit-2013-
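To give a feel for the concise, functional style described on this slide, here is a minimal sketch of a word count written as a chain of flat-map, map, and reduce-by-key steps. It is plain Python and does not call Spark. The helpers `flat_map` and `reduce_by_key` are hypothetical stand-ins that mirror the RDD operations `flatMap` and `reduceByKey`, and the sample lines are invented for illustration.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain
from operator import add


def flat_map(func, items):
    """Apply func to each item and flatten the results (hypothetical helper)."""
    return list(chain.from_iterable(func(item) for item in items))


def reduce_by_key(func, pairs):
    """Group (key, value) pairs by key and fold each group's values with func."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(func, values) for key, values in groups.items()}


# Invented sample input; a real job would read lines from distributed storage.
lines = ["spark is fast", "spark is simple"]

words = flat_map(lambda line: line.split(), lines)  # split lines into words
pairs = [(word, 1) for word in words]               # map each word to (word, 1)
counts = reduce_by_key(add, pairs)                  # sum the ones per word

print(counts)
```

The whole job is three short transformations, which is the contrast this slide draws with the mapper and reducer classes a Java MapReduce job typically needs.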
In-memory aka Speed
● In Spark, you can cache HDFS data in the main memory of worker nodes
● Spark analysis can be executed directly on in-memory data
● Shuffling can also be done in memory
● Fault tolerant
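The effect of caching on iterative workloads can be sketched in plain Python. This is a toy model under stated assumptions, not Spark's API: the `Dataset` class, its `cache` method, and the `load_from_disk` function are hypothetical, and the storage read is simulated by a counter. It shows a dataset being loaded once and then served from memory across several passes.

```python
class Dataset:
    """Toy lazily loaded dataset (hypothetical; not a Spark class)."""

    def __init__(self, loader):
        self._loader = loader    # function that produces the records
        self._cached = None      # in-memory copy, filled on first use
        self._use_cache = False

    def cache(self):
        """Mark the dataset to be kept in memory after the first load."""
        self._use_cache = True
        return self

    def records(self):
        if self._use_cache and self._cached is not None:
            return self._cached  # served from memory
        data = self._loader()    # simulated slow read from storage
        if self._use_cache:
            self._cached = data
        return data


reads = {"count": 0}  # how many times the simulated storage was read


def load_from_disk():
    """Simulated storage read that records each call."""
    reads["count"] += 1
    return [1, 2, 3, 4]


data = Dataset(load_from_disk).cache()

# An iterative job makes several passes over the same data.
totals = [sum(x * step for x in data.records()) for step in range(1, 4)]

print(totals, reads["count"])
```

Without the `cache()` call, each pass in this toy model would trigger another simulated read, which is the cost that keeping data in memory avoids for iterative programs.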
Integration with Hadoop
● No separate storage layer
● Integrates well with HDFS
● Can run on Hadoop 1.0 and on Hadoop 2.0 YARN
● Excellent integration with ecosystem projects such as Apache Hive, HBase, etc.
Multi-language API
● Written in Scala, but the API is not limited to it
● Offers APIs in
○ Scala
○ Java
○ Python
● You can also run SQL using Spark SQL
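As a rough illustration of offering both a programmatic API and SQL over the same data, the sketch below computes one aggregation twice: once in ordinary Python code and once as a SQL query. It uses Python's built-in `sqlite3` module purely as a stand-in SQL engine. Spark SQL itself is not used here, and the `events` table and its rows are invented for the example.

```python
import sqlite3
from collections import Counter

# Invented sample records: (name, bytes_sent).
rows = [("alice", 120), ("bob", 80), ("alice", 30)]

# Programmatic version: sum bytes per name in code.
totals_code = Counter()
for name, sent in rows:
    totals_code[name] += sent

# SQL version: the same aggregation as a declarative query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, bytes_sent INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
totals_sql = dict(
    conn.execute("SELECT name, SUM(bytes_sent) FROM events GROUP BY name")
)
conn.close()

print(dict(totals_code) == totals_sql)
```

Both versions produce the same per-name totals, so the choice between code and SQL can follow whichever is clearer for the task at hand.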
Who is using Spark?
seminar presentation on apache-spark
seminar presentation on apache-spark

More Related Content

What's hot

Challenges in Building a Data Pipeline
Challenges in Building a Data PipelineChallenges in Building a Data Pipeline
Challenges in Building a Data Pipeline
Manish Kumar
 
Big data Question bank.pdf
Big data Question bank.pdfBig data Question bank.pdf
Big data Question bank.pdf
Sitamarhi Institute of Technology
 
Introduction to Nebula Graph, an Open-Source Distributed Graph Database
Introduction to Nebula Graph, an Open-Source Distributed Graph DatabaseIntroduction to Nebula Graph, an Open-Source Distributed Graph Database
Introduction to Nebula Graph, an Open-Source Distributed Graph Database
Nebula Graph
 
Spark vs Hadoop
Spark vs HadoopSpark vs Hadoop
Spark vs Hadoop
Olesya Eidam
 
Using pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewUsing pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 preview
Mario Cartia
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
Prashant Gupta
 
Big Data Analytics with Spark
Big Data Analytics with SparkBig Data Analytics with Spark
Big Data Analytics with Spark
Mohammed Guller
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
Timothy Spann
 
Introduction to Pig
Introduction to PigIntroduction to Pig
Introduction to Pig
Prashanth Babu
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
Data Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache ArrowData Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache Arrow
Databricks
 
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and Parquet
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and ParquetBig Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and Parquet
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and Parquet
DataWorks Summit
 
When apache pulsar meets apache flink
When apache pulsar meets apache flinkWhen apache pulsar meets apache flink
When apache pulsar meets apache flink
StreamNative
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
DataWorks Summit
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
What Is DATA MINING(INTRODUCTION)
What Is DATA MINING(INTRODUCTION)What Is DATA MINING(INTRODUCTION)
What Is DATA MINING(INTRODUCTION)
Pratik Tambekar
 
Data Streaming For Big Data
Data Streaming For Big DataData Streaming For Big Data
Data Streaming For Big Data
Seval Çapraz
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Dr. C.V. Suresh Babu
 
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Databricks
 

What's hot (20)

Challenges in Building a Data Pipeline
Challenges in Building a Data PipelineChallenges in Building a Data Pipeline
Challenges in Building a Data Pipeline
 
Big data Question bank.pdf
Big data Question bank.pdfBig data Question bank.pdf
Big data Question bank.pdf
 
Introduction to Nebula Graph, an Open-Source Distributed Graph Database
Introduction to Nebula Graph, an Open-Source Distributed Graph DatabaseIntroduction to Nebula Graph, an Open-Source Distributed Graph Database
Introduction to Nebula Graph, an Open-Source Distributed Graph Database
 
Spark vs Hadoop
Spark vs HadoopSpark vs Hadoop
Spark vs Hadoop
 
Using pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewUsing pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 preview
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Big Data Analytics with Spark
Big Data Analytics with SparkBig Data Analytics with Spark
Big Data Analytics with Spark
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
Introduction to Pig
Introduction to PigIntroduction to Pig
Introduction to Pig
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Data Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache ArrowData Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache Arrow
 
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and Parquet
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and ParquetBig Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and Parquet
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and Parquet
 
When apache pulsar meets apache flink
When apache pulsar meets apache flinkWhen apache pulsar meets apache flink
When apache pulsar meets apache flink
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
What Is DATA MINING(INTRODUCTION)
What Is DATA MINING(INTRODUCTION)What Is DATA MINING(INTRODUCTION)
What Is DATA MINING(INTRODUCTION)
 
Data Streaming For Big Data
Data Streaming For Big DataData Streaming For Big Data
Data Streaming For Big Data
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
 

Similar to seminar presentation on apache-spark

Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
datamantra
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
Sohil Jain
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
Sohil Jain
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
AKASH SIHAG
 
Apache Spark in Industry
Apache Spark in IndustryApache Spark in Industry
Apache Spark in Industry
Dorian Beganovic
 
Spark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark GroupSpark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark Group
Phaneendra Chiruvella
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
Zahra Eskandari
 
Getting Started with Spark Scala
Getting Started with Spark ScalaGetting Started with Spark Scala
Getting Started with Spark Scala
Knoldus Inc.
 
Stream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaStream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and Kafka
Itai Yaffe
 
spark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark examplespark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark example
ShidrokhGoudarzi1
 
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Databricks
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for Beginners
Anirudh
 
BlackRay - The open Source Data Engine
BlackRay - The open Source Data EngineBlackRay - The open Source Data Engine
BlackRay - The open Source Data Engine
fschupp
 
Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014
mahchiev
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
Knoldus Inc.
 
Spark Workshop
Spark WorkshopSpark Workshop
Spark Workshop
Navid Kalaei
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
Marco Parenzan
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
DataWorks Summit
 
The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...
Impetus Technologies
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
DataWorks Summit/Hadoop Summit
 

Similar to seminar presentation on apache-spark (20)

Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
 
Apache Spark in Industry
Apache Spark in IndustryApache Spark in Industry
Apache Spark in Industry
 
Spark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark GroupSpark Streaming and MLlib - Hyderabad Spark Group
Spark Streaming and MLlib - Hyderabad Spark Group
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
 
Getting Started with Spark Scala
Getting Started with Spark ScalaGetting Started with Spark Scala
Getting Started with Spark Scala
 
Stream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaStream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and Kafka
 
spark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark examplespark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark example
 
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for Beginners
 
BlackRay - The open Source Data Engine
BlackRay - The open Source Data EngineBlackRay - The open Source Data Engine
BlackRay - The open Source Data Engine
 
Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
 
Spark Workshop
Spark WorkshopSpark Workshop
Spark Workshop
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
 
The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 

More from Jawhar Ali

seminar report on What is ransomware
seminar report on What is ransomwareseminar report on What is ransomware
seminar report on What is ransomware
Jawhar Ali
 
seminar report on Sql injection
seminar report on Sql injectionseminar report on Sql injection
seminar report on Sql injection
Jawhar Ali
 
seminar report on kingapp application
seminar report on kingapp applicationseminar report on kingapp application
seminar report on kingapp application
Jawhar Ali
 
seminar report on school management system
seminar report on school management systemseminar report on school management system
seminar report on school management system
Jawhar Ali
 
seminar presentation on Face ricognition technology
seminar presentation on Face ricognition technologyseminar presentation on Face ricognition technology
seminar presentation on Face ricognition technology
Jawhar Ali
 
seminar presentation on Digital Jwellery
seminar presentation on Digital Jwelleryseminar presentation on Digital Jwellery
seminar presentation on Digital Jwellery
Jawhar Ali
 
powerpoint presentation on sixth sense Technology
powerpoint presentation  on sixth sense Technologypowerpoint presentation  on sixth sense Technology
powerpoint presentation on sixth sense Technology
Jawhar Ali
 
Powerpoint presentation on 5G wireless technology
Powerpoint presentation on 5G wireless technologyPowerpoint presentation on 5G wireless technology
Powerpoint presentation on 5G wireless technology
Jawhar Ali
 
powerpoint presentation on Google glass
powerpoint presentation on Google glasspowerpoint presentation on Google glass
powerpoint presentation on Google glass
Jawhar Ali
 
Table Of Contents Google Glass
Table Of Contents Google GlassTable Of Contents Google Glass
Table Of Contents Google Glass
Jawhar Ali
 
introduction and abstract on Google Glass Major report
introduction and abstract on  Google Glass Major reportintroduction and abstract on  Google Glass Major report
introduction and abstract on Google Glass Major report
Jawhar Ali
 
Candidate declaration on Google Glass
Candidate declaration on Google GlassCandidate declaration on Google Glass
Candidate declaration on Google Glass
Jawhar Ali
 
front Page on Google Glass
 front Page on Google Glass front Page on Google Glass
front Page on Google Glass
Jawhar Ali
 
Table of contents on blood bank management system
Table of contents on blood bank management systemTable of contents on blood bank management system
Table of contents on blood bank management system
Jawhar Ali
 
List of figures in Blood bank management system
List of figures in Blood bank management systemList of figures in Blood bank management system
List of figures in Blood bank management system
Jawhar Ali
 
Full report on blood bank management system
Full report on  blood bank management systemFull report on  blood bank management system
Full report on blood bank management system
Jawhar Ali
 
Cand declaration
Cand declaration Cand declaration
Cand declaration
Jawhar Ali
 
Training report on web developing
Training report on web developingTraining report on web developing
Training report on web developing
Jawhar Ali
 
seminar report on wireless Sensor network
seminar report on wireless Sensor networkseminar report on wireless Sensor network
seminar report on wireless Sensor network
Jawhar Ali
 
Cloud computing ppt
Cloud computing pptCloud computing ppt
Cloud computing ppt
Jawhar Ali
 

More from Jawhar Ali (20)

seminar report on What is ransomware
seminar report on What is ransomwareseminar report on What is ransomware
seminar report on What is ransomware
 
seminar report on Sql injection
seminar report on Sql injectionseminar report on Sql injection
seminar report on Sql injection
 
seminar report on kingapp application
seminar report on kingapp applicationseminar report on kingapp application
seminar report on kingapp application
 
seminar report on school management system
seminar report on school management systemseminar report on school management system
seminar report on school management system
 
seminar presentation on Face ricognition technology
seminar presentation on Face ricognition technologyseminar presentation on Face ricognition technology
seminar presentation on Face ricognition technology
 
seminar presentation on Digital Jwellery
seminar presentation on Digital Jwelleryseminar presentation on Digital Jwellery
seminar presentation on Digital Jwellery
 
powerpoint presentation on sixth sense Technology
powerpoint presentation  on sixth sense Technologypowerpoint presentation  on sixth sense Technology
powerpoint presentation on sixth sense Technology
 
Powerpoint presentation on 5G wireless technology
Powerpoint presentation on 5G wireless technologyPowerpoint presentation on 5G wireless technology
Powerpoint presentation on 5G wireless technology
 
powerpoint presentation on Google glass
powerpoint presentation on Google glasspowerpoint presentation on Google glass
powerpoint presentation on Google glass
 
Table Of Contents Google Glass
Table Of Contents Google GlassTable Of Contents Google Glass
Table Of Contents Google Glass
 
introduction and abstract on Google Glass Major report
introduction and abstract on  Google Glass Major reportintroduction and abstract on  Google Glass Major report
introduction and abstract on Google Glass Major report
 
Candidate declaration on Google Glass
Candidate declaration on Google GlassCandidate declaration on Google Glass
Candidate declaration on Google Glass
 
front Page on Google Glass
 front Page on Google Glass front Page on Google Glass
front Page on Google Glass
 
Table of contents on blood bank management system
Table of contents on blood bank management systemTable of contents on blood bank management system
Table of contents on blood bank management system
 
List of figures in Blood bank management system
List of figures in Blood bank management systemList of figures in Blood bank management system
List of figures in Blood bank management system
 
Full report on blood bank management system
Full report on  blood bank management systemFull report on  blood bank management system
Full report on blood bank management system
 
Cand declaration
Cand declaration Cand declaration
Cand declaration
 
Training report on web developing
Training report on web developingTraining report on web developing
Training report on web developing
 
seminar report on wireless Sensor network
seminar report on wireless Sensor networkseminar report on wireless Sensor network
seminar report on wireless Sensor network
 
Cloud computing ppt
Cloud computing pptCloud computing ppt
Cloud computing ppt
 

Recently uploaded

The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
Steve Thomason
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
rosedainty
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
bennyroshan06
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
GeoBlogs
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
Celine George
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
Fundacja Rozwoju Społeczeństwa Przedsiębiorczego
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
Nguyen Thanh Tu Collection
 

Recently uploaded (20)

The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
Model Attribute Check Company Auto Property

seminar presentation on apache-spark

  • 1. A Seminar Presentation on “Introduction To Apache Spark”. Submitted to: Mrs. Suman Singh (HOD of CSE Dept.). Submitted by: Nikita Vijay, B. Tech – VIII sem (CSE).
  • 2. Agenda
      ● Need for a new generation of distributed systems
      ● Hardware/software evolution in the last decade
      ● Apache Spark
      ● Components of Apache Spark
      ● Why Spark?
      ● Who is using Spark?
  • 3. Why do we need a new generation?
      ● A lot has changed since 2000
      ● Both hardware and software have gone through changes
      ● Big data has become a necessity
      ● Let’s look at what changed over the decade
  • 4. State of hardware in 2000
      ● Disk was cheap, so disk was the primary source of data
      ● Network was costly, so data locality mattered
      ● RAM was very costly
      ● Single-core machines were dominant
    Now
      ● RAM is king: RAM is the primary source of data, and disk is used as a fallback
      ● Networks are faster
      ● Multi-core machines are commonplace
  • 5. Software in 2000
      ● Object orientation was king
      ● Software was optimized for a single core
      ● No open frameworks for creating
          ○ Distributed storage
          ○ Distributed processing
      ● SQL was the only dominant way to analyze data
    Now
      ● Functional programming is on the rise
      ● Software needs to exploit multiple cores on a single node
      ● There are good frameworks for creating distributed systems
          ○ HDFS for storage
      ● NoSQL is now a real alternative
  • 6. Big Data processing needs in 2000
      ● Very few companies had a big data problem
      ● Batch processing systems ruled the world
      ● Volume was a bigger concern than velocity
      ● Mostly used for
          ○ Search
          ○ Log analysis
    Now
      ● All companies use big data
      ● Velocity is as much a concern as volume
      ● Real-time needs are as important as batch processing
  • 7. Apache Spark
      ● A fast and general engine for large-scale data processing
      ● Created by AMPLab
      ● Written in Scala
      ● Licensed under the Apache License
  • 10. Benefits of a Unified Platform
      ● No copying of data between systems
      ● Combine processing types in one program
      ● Code reuse
      ● One system to learn
      ● One system to maintain
  • 11. History of Apache Spark
      ● Mesos, a distributed systems framework, started as a class project at UC Berkeley in 2009
      ● Spark was created to test how Mesos works
      ● Focused on
          ○ Iterative programs (ML)
          ○ Unifying real-time and batch processing
      ● Open sourced in 2010
  • 12. Runs everywhere
      ● You can run Spark on top of many distributed systems
      ● It can run on
          ○ YARN
          ○ Apache Mesos
          ○ Its own standalone cluster
  • 13. Small and Simple
      ● Apache Spark is highly modular
      ● The original version contained only 1600 lines of Scala code
      ● The Apache Spark API is extremely simple compared to the Java API of MapReduce (M/R)
      ● The API is concise and consistent
  • 15. In-memory aka Speed
      ● In Spark, you can cache HDFS data in the main memory of worker nodes
      ● Spark analysis can be executed directly on in-memory data
      ● Shuffling can also be done in memory
      ● Fault tolerant
  • 16. Integration with Hadoop
      ● No separate storage layer
      ● Integrates well with HDFS
      ● Can run on Hadoop 1.0 and Hadoop 2.0 YARN
      ● Excellent integration with ecosystem projects such as Apache Hive, HBase, etc.
  • 17. Multi language API
      ● Written in Scala, but the API is not limited to it
      ● Offers APIs in
          ○ Scala
          ○ Java
          ○ Python
      ● You can also run SQL queries using Spark SQL
  • 18. Who is using Spark
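
To make slide 13 (concise API) and slide 15 (in-memory caching) more concrete, here is a minimal word-count sketch using the Python API mentioned on slide 17. It is an illustration under stated assumptions, not code from the presentation: the names `tokenize` and `word_count`, the application name, and the default `local[*]` master are all hypothetical choices, and it assumes PySpark and a Java runtime are installed.

```python
from operator import add


def tokenize(line):
    """Split one line of text into lowercase words."""
    return line.lower().split()


def word_count(lines, master="local[*]"):
    """Count words with Spark's RDD API.

    Returns (total number of words, dict of word -> count).
    `master` is an assumption for this sketch: "local[*]" runs Spark in
    the current process; a cluster master URL would be passed here
    instead when running on YARN, Mesos, or a standalone cluster.
    """
    # Imported lazily so tokenize() stays usable without PySpark installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster(master).setAppName("word-count-sketch")
    sc = SparkContext.getOrCreate(conf)
    try:
        # cache() asks Spark to keep the tokenized words in worker memory
        # once they have been computed.
        words = sc.parallelize(lines).flatMap(tokenize).cache()

        # First action: computes the RDD and populates the cache.
        total = words.count()

        # Second action: reuses the cached words instead of recomputing them.
        counts = words.map(lambda w: (w, 1)).reduceByKey(add).collect()
        return total, dict(counts)
    finally:
        sc.stop()
```

Calling `word_count(["spark is fast", "spark is simple"])` runs the whole pipeline locally. The core logic is three chained calls (`flatMap`, `map`, `reduceByKey`), which is the kind of brevity slide 13 contrasts with the Java MapReduce API.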