What is Apache Spark?
Apache Spark is a unified analytics engine for large-scale data processing. It can quickly run computations over very large
data sets and can distribute data processing tasks across multiple machines, either on its own or in combination with other
distributed computing tools. These two qualities make it important to big data and machine learning.
At a basic level, a Spark application consists of two main components: a driver, which converts the user's code into multiple
tasks that can be distributed across worker nodes, and executors, which run on those nodes and carry out the tasks assigned to
them. Some form of cluster manager is needed to mediate between the two.
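The driver and executor split described above can be sketched in plain Python. This is only a toy model, not Spark's API: the function names (split_into_partitions, run_task, driver) are hypothetical, and a local thread pool stands in for the executors that a real cluster manager would place on worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Driver side: slice the input into roughly equal partitions."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_task(partition):
    """Executor side: one task processes one partition independently."""
    return sum(x * x for x in partition)


def driver(data, num_executors=4):
    """Turn one job (sum of squares) into tasks, run them, combine the results."""
    partitions = split_into_partitions(data, num_executors)
    # The thread pool is a stand-in for executors on worker nodes (assumption).
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        partial_results = list(pool.map(run_task, partitions))
    return sum(partial_results)


total = driver(list(range(10)))
print(total)
```

The point of the sketch is the division of labour: the driver only plans and combines, while each task sees just its own partition, which is what lets the work spread across machines.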
The features that make Spark one of the most widely used big data platforms are:
It provides advanced analytical support:
Spark supports more than simple 'map' and 'reduce' operations. It also supports SQL queries, streaming data, machine learning
(ML), and graph algorithms. It ships with a versatile stack of libraries: Spark SQL and DataFrames, MLlib for machine learning,
GraphX for graph processing, and Spark Streaming. Notably, Spark lets you combine all of these libraries in a single workflow.
Ease of use:
Spark applications can be written in Java, Scala, Python, and R, so developers can build and run scalable Spark apps in their
preferred programming language. In addition, Spark comes with a built-in set of more than 80 high-level operators. You can also
use Spark interactively from the Scala, Python, R, and SQL shells to query data.
Fast processing speed:
Big data analysis is all about processing vast quantities of data, so companies and organizations want a framework that can
process massive data sets at high speed. In Hadoop clusters, Spark apps can run up to 100 times faster in memory and up to 10
times faster on disk. Spark uses the Resilient Distributed Dataset (RDD), which lets it store data in memory transparently and
write to disk only when required. This removes most of the disk read and write time during data processing.
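The idea of keeping a computed data set in memory so that later steps avoid re-reading the source can be illustrated with a small pure-Python model. ToyDataset and load_numbers are hypothetical names invented for this sketch. The class only imitates the general shape of lazy transformations plus an opt-in cache and does not reflect how Spark implements RDDs.

```python
class ToyDataset:
    """Toy stand-in for an RDD: transformations are recorded lazily and the
    computed result can be kept in memory. Not Spark's implementation."""

    def __init__(self, source, transforms=()):
        self._source = source          # zero-argument callable returning raw records
        self._transforms = transforms  # pending (kind, function) steps
        self._cache_enabled = False
        self._cached = None

    def map(self, fn):
        return ToyDataset(self._source, self._transforms + (("map", fn),))

    def filter(self, fn):
        return ToyDataset(self._source, self._transforms + (("filter", fn),))

    def cache(self):
        """Ask for the result to be kept in memory after the first computation."""
        self._cache_enabled = True
        return self

    def collect(self):
        """Run the recorded steps, or reuse the in-memory result if present."""
        if self._cached is not None:
            return list(self._cached)
        records = list(self._source())  # the expensive "disk read" in this model
        for kind, fn in self._transforms:
            if kind == "map":
                records = [fn(r) for r in records]
            else:
                records = [r for r in records if fn(r)]
        if self._cache_enabled:
            self._cached = records
        return list(records)


read_count = {"n": 0}  # counts how often the source is actually read


def load_numbers():
    read_count["n"] += 1
    return range(1, 6)


evens_squared = (
    ToyDataset(load_numbers)
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
    .cache()
)
first = evens_squared.collect()   # reads the source once
second = evens_squared.collect()  # served from memory, no second read
print(first, second, read_count["n"])
```

Without the cache() call, every collect() would go back to the source, which is the repeated disk traffic that in-memory storage is meant to avoid.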
Flexibility:
Spark can run standalone or on Hadoop YARN, Apache Mesos, Kubernetes, and in the cloud. It can also access a variety of data
sources. For example, Spark can run on the YARN cluster manager and read any existing Hadoop data, including data stored in
HBase, HDFS, Hive, and Cassandra. This makes Spark a good tool for migrating pure Hadoop applications, provided the
application's use case is Spark-friendly.
Processing of real-time sources:
Spark is designed to process streaming data in real time. While MapReduce is built to manage and process data already stored
in Hadoop clusters, Spark can do that and also handle and manipulate data in real time through Spark Streaming. Unlike other
streaming approaches, Spark Streaming recovers lost work and provides exactly-once semantics out of the box, without needing
additional code or configuration. It also lets you reuse the same code for batch and stream processing, and even join
streaming data with historical data.
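The reuse of one piece of logic for both batch and stream processing can be sketched as follows. This is a plain-Python illustration built on assumptions, not Spark Streaming code: count_words and micro_batches are hypothetical helpers, the stream is an ordinary iterator cut into small batches, and nothing here models fault recovery or exactly-once delivery.

```python
from collections import Counter


def count_words(lines):
    """Batch logic: count words in any iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def micro_batches(stream, batch_size):
    """Group an incoming iterator of records into small fixed-size batches."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


# Historical data is processed once with the batch function.
historical_lines = ["spark reads stored data", "spark streams data"]
running_counts = count_words(historical_lines)

# New records are processed with the same function, one small batch at a time,
# and merged into the historical result.
incoming = iter(["new data arrives", "spark processes new data", "more data"])
for batch in micro_batches(incoming, batch_size=2):
    running_counts += count_words(batch)

print(running_counts.most_common(3))
```

Because the streaming path calls the same count_words function as the batch path, the two results are directly comparable and can be merged, which is the property the paragraph above describes.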
Conclusion: Spark is a highly versatile big data platform with a broad feature set. Because it is open source, it is actively
developed and keeps gaining functionality. As big data applications diversify and expand, so will the use cases for Apache
Spark.
krishna-kumar-learnbay
August 25, 2020
Data Science and Artificial Intelligence enthusiast and founder of Learnbay and workvista Co-works. 9+ years of industry
experience in Python, embedded systems, databases, and IoT. Organiser of Data Science, Artificial Intelligence, Python, and
Blockchain meet-up groups in Bangalore.
Dive into the new features of apache spark