1. What is Spark?
Apache Spark is an open-source framework for fast, in-memory data processing. It
currently supports Scala, Java, and Python. Besides the core libraries, there is
support for streaming, machine learning, data frames, integration with R, and a
version of SQL.
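As a taste of the core API, here is a minimal word count in Scala; a sketch only,
where the input file name and the local[*] master are assumptions for illustration:

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        // Run locally, using as many worker threads as there are cores.
        val conf = new SparkConf().setAppName("word-count").setMaster("local[*]")
        val sc = new SparkContext(conf)

        // Load the file, split lines into words, and count each word in memory.
        val counts = sc.textFile("input.txt")   // hypothetical input path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.take(10).foreach(println)
        sc.stop()
      }
    }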
2. Spark compatibility and ecosystem
• Spark runs in a clustered environment of arbitrary size and is designed to sit on
top of a distributed storage layer such as HDFS, Cassandra, or S3.
• Spark integrates with schedulers including YARN and Mesos. Spark scales well;
clusters of 8,000 nodes have been deployed at the time of this writing.
• Spark can read from nearly all common sources and has performant connectors to
NoSQL and SQL datastores and to tools like Tableau; a brief sketch follows this list.
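To illustrate, the same textFile call reads from different backends depending on
the URL scheme. A minimal sketch, given a SparkContext named sc (for example, the
one the spark-shell provides); all paths and host names are hypothetical:

    // The URL scheme selects the storage backend.
    val fromHdfs  = sc.textFile("hdfs://namenode:8020/data/events.log")
    val fromS3    = sc.textFile("s3n://my-bucket/events.log")  // needs the Hadoop S3 libraries on the classpath
    val fromLocal = sc.textFile("/shared/data/events.log")     // e.g. an NFS mount visible on every node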
3. Spark and Hadoop
Within the Hadoop ecosystem, Spark reads from nearly all common sources and has
performant connectors to HDFS, to other NoSQL and SQL datastores, and to tools
like Tableau. Spark can connect to streams or work in batches.
Spark can also run in a stand-alone clustered mode, with HDFS or any form of
shared file system (such as NFS mounted on each node at the same path).
Spark can run highly available: it is resilient to Worker failures and will move
work to the remaining Workers, and it supports standby Masters or can rely on
the cluster’s scheduling software for recovery.
Alternatively, Spark can run within Hadoop as a YARN job, reading and writing
HDFS and connecting to other data sources, as sketched below.
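A sketch of such a job; on YARN the master URL is normally supplied by
spark-submit (--master yarn) rather than hard-coded, and all paths here are
hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}

    object ErrorReport {
      def main(args: Array[String]): Unit = {
        // No setMaster here: on YARN the master is passed in by spark-submit.
        val sc = new SparkContext(new SparkConf().setAppName("error-report"))

        val logs   = sc.textFile("hdfs:///logs/2015/*.log")   // read from HDFS
        val errors = logs.filter(_.contains("ERROR"))
        errors.saveAsTextFile("hdfs:///reports/errors")       // write back to HDFS

        sc.stop()
      }
    }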
4. Spark Tasks
Spark is agnostic regarding the underlying cluster manager. Spark applications run as
independent sets of processes on a cluster, coordinated by the SparkContext object in
your main program (called the driver program).
Specifically, to run on a cluster, the SparkContext can connect to several types of
cluster managers (either Spark’s own standalone cluster manager or Mesos/YARN),
which allocate resources across applications.
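In code, the choice of cluster manager comes down to the master URL handed to the
SparkContext. A minimal sketch; the application name and host names are
hypothetical, and in practice the master is often set via spark-submit instead:

    import org.apache.spark.{SparkConf, SparkContext}

    object MyApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("my-app")
        // The master URL selects the cluster manager:
        conf.setMaster("spark://master-host:7077")   // Spark's own standalone manager
        // conf.setMaster("mesos://mesos-host:5050") // Mesos
        // conf.setMaster("yarn-client")             // YARN
        val sc = new SparkContext(conf)              // coordinates the application from the driver
        // ... job logic ...
        sc.stop()
      }
    }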
Each application has its own executor processes, which run tasks in multiple
threads. Executors provide isolation between Spark contexts and also serve as a
unit of work on the scheduling side.
Spark uses resources dynamically, if configured to do so, scaling up and down as
the work demands (currently supported only via YARN); the sketch below shows the
relevant settings.
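A minimal sketch of that configuration; the property keys are Spark’s, while the
application name and executor bounds are illustrative assumptions:

    import org.apache.spark.SparkConf

    // Dynamic allocation also requires the external shuffle service to be
    // running on each YARN NodeManager so executors can be released safely.
    val conf = new SparkConf()
      .setAppName("elastic-app")                         // hypothetical name
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "2")  // illustrative bounds
      .set("spark.dynamicAllocation.maxExecutors", "20")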