SlideShare a Scribd company logo
1 of 4
What is Spark
Apache Spark is open source framework for fast, in-memory data processing. It
currently supports Scala, Java and Python. Besides the core libraries, there is
support for streaming, machine learning, data frames, integration with R and a
version of SQL.
EricMarshall
Spark compatibility and ecosystem
• Spark runs in a clustered environment of arbitrary size and is designed to sit on
top of a distributed file systems like HDFS, Cassandra, or S3.
• Spark integrates with schedulers including Yarn and Mesos. Spark scales well
and has deployed a cluster of 8000 nodes at the time of this writing.
•Spark can read from most all sources and has performant connectors to nosql
and sql datastores and tools like Tableau.
Spark and
Hadoop
 Spark can read from most all sources and has
performant connectors to the Hadoop eco-system,
other nosql and sql datastores and tools like
Tableau. Spark can connect to streams or work in
batches.
 Spark also can run in a stand-alone clustered
mode with HDFS or any form of shared file system
(like NFS mounted to each node with the same
path).
 Spark can run highly available. Spark is resilient to
Worker failures and will move work to other
Workers. Spark supports standby Masters or can
rely on the cluster’s scheduling software.
 Or run within Hadoop as aYarn job; reading/writng
from HFDS and connecting to other data sources.
Spark Tasks
 Spark is agnostic regarding the underlying cluster manager. Spark applications run as
independent sets of processes on a cluster, coordinated by the SparkContext object in
your main program (called the driver program).
 Specifically, to run on a cluster, the SparkContext can connect to several types of
cluster managers (either Spark’s own standalone cluster manager or Mesos/YARN),
which allocate resources across applications.
 Each application has its own executor processes: managing threads, providing
isolation between Spark contexts, also useful on the scheduling side as a unit of work.
 Spark uses resources dynamically, if configured to do so. Scaling up and down as the
work demands. (Currently only supported via Yarn)

More Related Content

What's hot

Low latency access of bigdata using spark and shark
Low latency access of bigdata using spark and sharkLow latency access of bigdata using spark and shark
Low latency access of bigdata using spark and sharkPradeep Kumar G.S
 
NoSQL_Databases
NoSQL_DatabasesNoSQL_Databases
NoSQL_DatabasesRick Perry
 
Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...
Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...
Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...Edureka!
 
Apache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - PanoraysApache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - PanoraysDemi Ben-Ari
 
NoSQL (Non-Relational Databases)
NoSQL (Non-Relational Databases)NoSQL (Non-Relational Databases)
NoSQL (Non-Relational Databases)Ehsan Javanmard
 
Vskills Apache Cassandra sample material
Vskills Apache Cassandra sample materialVskills Apache Cassandra sample material
Vskills Apache Cassandra sample materialVskills
 
Hadoop vs spark
Hadoop vs sparkHadoop vs spark
Hadoop vs sparkamarkayam
 
Hadoop in three use cases
Hadoop in three use casesHadoop in three use cases
Hadoop in three use casesJoey Echeverria
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introductionChirag Ahuja
 
Introduction to NOSQL databases
Introduction to NOSQL databasesIntroduction to NOSQL databases
Introduction to NOSQL databasesAshwani Kumar
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystemJakub Stransky
 
Introduction to hadoop
Introduction to hadoopIntroduction to hadoop
Introduction to hadoopChad Richeson
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra nehabsairam
 

What's hot (20)

Low latency access of bigdata using spark and shark
Low latency access of bigdata using spark and sharkLow latency access of bigdata using spark and shark
Low latency access of bigdata using spark and shark
 
Cassandra Learning
Cassandra LearningCassandra Learning
Cassandra Learning
 
NoSQL_Databases
NoSQL_DatabasesNoSQL_Databases
NoSQL_Databases
 
Hadoop An Introduction
Hadoop An IntroductionHadoop An Introduction
Hadoop An Introduction
 
Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...
Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...
Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...
 
No sq lv2
No sq lv2No sq lv2
No sq lv2
 
Apache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - PanoraysApache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - Panorays
 
NoSQL (Non-Relational Databases)
NoSQL (Non-Relational Databases)NoSQL (Non-Relational Databases)
NoSQL (Non-Relational Databases)
 
Vskills Apache Cassandra sample material
Vskills Apache Cassandra sample materialVskills Apache Cassandra sample material
Vskills Apache Cassandra sample material
 
Hadoop vs spark
Hadoop vs sparkHadoop vs spark
Hadoop vs spark
 
Hadoop in three use cases
Hadoop in three use casesHadoop in three use cases
Hadoop in three use cases
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Introduction to NOSQL databases
Introduction to NOSQL databasesIntroduction to NOSQL databases
Introduction to NOSQL databases
 
Cloudera Hadoop Distribution
Cloudera Hadoop DistributionCloudera Hadoop Distribution
Cloudera Hadoop Distribution
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystem
 
Apache Spark Notes
Apache Spark NotesApache Spark Notes
Apache Spark Notes
 
Introduction to hadoop
Introduction to hadoopIntroduction to hadoop
Introduction to hadoop
 
Analytics 3
Analytics 3Analytics 3
Analytics 3
 
Hadoop
HadoopHadoop
Hadoop
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
 

Viewers also liked

Partner Cloud Solutions Event August 2016-Forlac
Partner Cloud Solutions Event August 2016-ForlacPartner Cloud Solutions Event August 2016-Forlac
Partner Cloud Solutions Event August 2016-ForlacDave Rendón
 
Case-for-Support-Evas-Phoenix-Rising-Capital-Campaign
Case-for-Support-Evas-Phoenix-Rising-Capital-CampaignCase-for-Support-Evas-Phoenix-Rising-Capital-Campaign
Case-for-Support-Evas-Phoenix-Rising-Capital-CampaignHelen Choi (최효경), CFRE
 
Usage of Reliable Actors in Azure Service Fabric
Usage of Reliable Actors in Azure Service FabricUsage of Reliable Actors in Azure Service Fabric
Usage of Reliable Actors in Azure Service FabricAlexander Laysha
 
Azure web apps - designing and debugging
Azure web apps  - designing and debuggingAzure web apps  - designing and debugging
Azure web apps - designing and debuggingAlexey Bokov
 
CAP теорема Брюера и ее применения на практике
CAP теорема Брюера и ее применения на практикеCAP теорема Брюера и ее применения на практике
CAP теорема Брюера и ее применения на практикеAlexey Bokov
 
Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...
Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...
Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...AFAS Software
 
Azure Service Fabric and the Actor Model: when did we forget Object Orientation?
Azure Service Fabric and the Actor Model: when did we forget Object Orientation?Azure Service Fabric and the Actor Model: when did we forget Object Orientation?
Azure Service Fabric and the Actor Model: when did we forget Object Orientation?João Pedro Martins
 
コミュニティ立ち上げのときに本当にあった恐い話
コミュニティ立ち上げのときに本当にあった恐い話 コミュニティ立ち上げのときに本当にあった恐い話
コミュニティ立ち上げのときに本当にあった恐い話 Mio Konagaya
 
English Speaking Session: Introduction (WordCamp Tokyo 2015)
English Speaking Session: Introduction (WordCamp Tokyo 2015)English Speaking Session: Introduction (WordCamp Tokyo 2015)
English Speaking Session: Introduction (WordCamp Tokyo 2015)Toru Miki
 
MLeap: Productionize Data Science Workflows Using Spark
MLeap: Productionize Data Science Workflows Using SparkMLeap: Productionize Data Science Workflows Using Spark
MLeap: Productionize Data Science Workflows Using SparkJen Aman
 
_s + bootstrapで始めるWordPressテーマ開発入門
_s + bootstrapで始めるWordPressテーマ開発入門_s + bootstrapで始めるWordPressテーマ開発入門
_s + bootstrapで始めるWordPressテーマ開発入門Hidetaka Okamoto
 

Viewers also liked (13)

Spark rdd part 2
Spark rdd part 2Spark rdd part 2
Spark rdd part 2
 
Partner Cloud Solutions Event August 2016-Forlac
Partner Cloud Solutions Event August 2016-ForlacPartner Cloud Solutions Event August 2016-Forlac
Partner Cloud Solutions Event August 2016-Forlac
 
Case-for-Support-Evas-Phoenix-Rising-Capital-Campaign
Case-for-Support-Evas-Phoenix-Rising-Capital-CampaignCase-for-Support-Evas-Phoenix-Rising-Capital-Campaign
Case-for-Support-Evas-Phoenix-Rising-Capital-Campaign
 
Usage of Reliable Actors in Azure Service Fabric
Usage of Reliable Actors in Azure Service FabricUsage of Reliable Actors in Azure Service Fabric
Usage of Reliable Actors in Azure Service Fabric
 
Azure web apps - designing and debugging
Azure web apps  - designing and debuggingAzure web apps  - designing and debugging
Azure web apps - designing and debugging
 
CAP теорема Брюера и ее применения на практике
CAP теорема Брюера и ее применения на практикеCAP теорема Брюера и ее применения на практике
CAP теорема Брюера и ее применения на практике
 
Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...
Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...
Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...
 
Azure Service Fabric and the Actor Model: when did we forget Object Orientation?
Azure Service Fabric and the Actor Model: when did we forget Object Orientation?Azure Service Fabric and the Actor Model: when did we forget Object Orientation?
Azure Service Fabric and the Actor Model: when did we forget Object Orientation?
 
コミュニティ立ち上げのときに本当にあった恐い話
コミュニティ立ち上げのときに本当にあった恐い話 コミュニティ立ち上げのときに本当にあった恐い話
コミュニティ立ち上げのときに本当にあった恐い話
 
English Speaking Session: Introduction (WordCamp Tokyo 2015)
English Speaking Session: Introduction (WordCamp Tokyo 2015)English Speaking Session: Introduction (WordCamp Tokyo 2015)
English Speaking Session: Introduction (WordCamp Tokyo 2015)
 
MLeap: Productionize Data Science Workflows Using Spark
MLeap: Productionize Data Science Workflows Using SparkMLeap: Productionize Data Science Workflows Using Spark
MLeap: Productionize Data Science Workflows Using Spark
 
_s + bootstrapで始めるWordPressテーマ開発入門
_s + bootstrapで始めるWordPressテーマ開発入門_s + bootstrapで始めるWordPressテーマ開発入門
_s + bootstrapで始めるWordPressテーマ開発入門
 
Spark core
Spark coreSpark core
Spark core
 

Similar to What is Apache Spark - An In-Memory Data Processing Framework

Apachespark 160612140708
Apachespark 160612140708Apachespark 160612140708
Apachespark 160612140708Srikrishna k
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introductionsudhakara st
 
Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130Xuan-Chao Huang
 
Apache Spark vs. Hadoop Is Spark Set to Replace Hadoop.pdf
Apache Spark vs. Hadoop Is Spark Set to Replace Hadoop.pdfApache Spark vs. Hadoop Is Spark Set to Replace Hadoop.pdf
Apache Spark vs. Hadoop Is Spark Set to Replace Hadoop.pdfMounikaPolabathina
 
In Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkIn Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkVenkata Naga Ravi
 
spark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark examplespark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark exampleShidrokhGoudarzi1
 
Exploiting Apache Spark's Potential Changing Enormous Information Investigati...
Exploiting Apache Spark's Potential Changing Enormous Information Investigati...Exploiting Apache Spark's Potential Changing Enormous Information Investigati...
Exploiting Apache Spark's Potential Changing Enormous Information Investigati...rajeshseo5
 
Big_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_SessionBig_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_SessionRUHULAMINHAZARIKA
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkSamy Dindane
 
Cassandra Lunch #89: Semi-Structured Data in Cassandra
Cassandra Lunch #89: Semi-Structured Data in CassandraCassandra Lunch #89: Semi-Structured Data in Cassandra
Cassandra Lunch #89: Semi-Structured Data in CassandraAnant Corporation
 
Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)Jyotasana Bharti
 
Lighting up Big Data Analytics with Apache Spark in Azure
Lighting up Big Data Analytics with Apache Spark in AzureLighting up Big Data Analytics with Apache Spark in Azure
Lighting up Big Data Analytics with Apache Spark in AzureJen Stirrup
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonBenjamin Bengfort
 
BigData & Hadoop Ecosystem.pptx
BigData & Hadoop Ecosystem.pptxBigData & Hadoop Ecosystem.pptx
BigData & Hadoop Ecosystem.pptxBibhasDeb1
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Olalekan Fuad Elesin
 

Similar to What is Apache Spark - An In-Memory Data Processing Framework (20)

Apachespark 160612140708
Apachespark 160612140708Apachespark 160612140708
Apachespark 160612140708
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130
 
Apache spark
Apache sparkApache spark
Apache spark
 
Apache Spark vs. Hadoop Is Spark Set to Replace Hadoop.pdf
Apache Spark vs. Hadoop Is Spark Set to Replace Hadoop.pdfApache Spark vs. Hadoop Is Spark Set to Replace Hadoop.pdf
Apache Spark vs. Hadoop Is Spark Set to Replace Hadoop.pdf
 
APACHE SPARK.pptx
APACHE SPARK.pptxAPACHE SPARK.pptx
APACHE SPARK.pptx
 
Spark core
Spark coreSpark core
Spark core
 
In Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkIn Memory Analytics with Apache Spark
In Memory Analytics with Apache Spark
 
spark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark examplespark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark example
 
Exploiting Apache Spark's Potential Changing Enormous Information Investigati...
Exploiting Apache Spark's Potential Changing Enormous Information Investigati...Exploiting Apache Spark's Potential Changing Enormous Information Investigati...
Exploiting Apache Spark's Potential Changing Enormous Information Investigati...
 
Big_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_SessionBig_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_Session
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Cassandra Lunch #89: Semi-Structured Data in Cassandra
Cassandra Lunch #89: Semi-Structured Data in CassandraCassandra Lunch #89: Semi-Structured Data in Cassandra
Cassandra Lunch #89: Semi-Structured Data in Cassandra
 
Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)
 
Lighting up Big Data Analytics with Apache Spark in Azure
Lighting up Big Data Analytics with Apache Spark in AzureLighting up Big Data Analytics with Apache Spark in Azure
Lighting up Big Data Analytics with Apache Spark in Azure
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
 
BigData & Hadoop Ecosystem.pptx
BigData & Hadoop Ecosystem.pptxBigData & Hadoop Ecosystem.pptx
BigData & Hadoop Ecosystem.pptx
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
 
Module01
 Module01 Module01
Module01
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 

More from ericwilliammarshall (7)

Nosql
NosqlNosql
Nosql
 
File maker for yap
File maker for yapFile maker for yap
File maker for yap
 
Web arch gfdl
Web arch gfdlWeb arch gfdl
Web arch gfdl
 
Shibboleth
ShibbolethShibboleth
Shibboleth
 
Condor
CondorCondor
Condor
 
Hadoop for sysadmins
Hadoop for sysadminsHadoop for sysadmins
Hadoop for sysadmins
 
high performance computing exposed
high performance computing exposedhigh performance computing exposed
high performance computing exposed
 

Recently uploaded

Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineeringssuserb3a23b
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 

Recently uploaded (20)

Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineering
 
Odoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting ServiceOdoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting Service
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 

What is Apache Spark - An In-Memory Data Processing Framework

  • 1. What is Spark Apache Spark is open source framework for fast, in-memory data processing. It currently supports Scala, Java and Python. Besides the core libraries, there is support for streaming, machine learning, data frames, integration with R and a version of SQL. EricMarshall
  • 2. Spark compatibility and ecosystem • Spark runs in a clustered environment of arbitrary size and is designed to sit on top of a distributed file systems like HDFS, Cassandra, or S3. • Spark integrates with schedulers including Yarn and Mesos. Spark scales well and has deployed a cluster of 8000 nodes at the time of this writing. •Spark can read from most all sources and has performant connectors to nosql and sql datastores and tools like Tableau.
  • 3. Spark and Hadoop  Spark can read from most all sources and has performant connectors to the Hadoop eco-system, other nosql and sql datastores and tools like Tableau. Spark can connect to streams or work in batches.  Spark also can run in a stand-alone clustered mode with HDFS or any form of shared file system (like NFS mounted to each node with the same path).  Spark can run highly available. Spark is resilient to Worker failures and will move work to other Workers. Spark supports standby Masters or can rely on the cluster’s scheduling software.  Or run within Hadoop as aYarn job; reading/writng from HFDS and connecting to other data sources.
  • 4. Spark Tasks  Spark is agnostic regarding the underlying cluster manager. Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program (called the driver program).  Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers (either Spark’s own standalone cluster manager or Mesos/YARN), which allocate resources across applications.  Each application has its own executor processes: managing threads, providing isolation between Spark contexts, also useful on the scheduling side as a unit of work.  Spark uses resources dynamically, if configured to do so. Scaling up and down as the work demands. (Currently only supported via Yarn)