SlideShare a Scribd company logo
1 of 15
Download to read offline
Coming up to speed on Spark
Please send any comments to:
Adarsh Pannu
adarshrp@us.ibm.com
Intro
What is Spark? How does it
relate to Hadoop? When would
you use it?
1-2 hours
Basic Understand basic technology
and write simple programs
1-2 days
Intermediate
Start enabling customers in the
field, hand-holding them
through problems and issues.
5-15 days and
more
Expert
Know Spark inside out even if
you don’t intend to contribute to
the project itself.
Weeks to months
Intro Spark
Go through these presentations to understand the value of Spark. These speakers also
attempt to differentiate Spark from Hadoop, and enumerate its comparative strengths. (Not
much code here)
!  Turning Data into Value, Ion Stoica, Spark Summit 2013 Video & Slides 25 mins
!  An Overview of Apache Spark, Jim Scott, Video Slides 1 hr 06 mins
!  How Companies are Using Spark, and Where the Edge in Big Data Will Be, Matei
Zaharia, Video & Slides 12 mins
!  Spark Fundamentals I (Lesson 1 only), Big Data University <20 mins
Basic Spark
!  Pick up some Scala through this article co-
authored by Scala’s creator, Martin Odersky.
Link
Estimated time: 2 hours
Basic Spark (contd.)
!  Do these two courses. They cover Spark basics and include a
certification. You can use the supplied Docker images for all other
labs.
7 hours
Basic Spark (contd.)
!  Go to spark.apache.org and study the Overview and the
Spark Programming Guide. Many online courses borrow
liberally from this material. Information on this site is
updated with every new Spark release.
Estimated 7-8 hours.
Intermediate Spark
!  Stay at spark.apache.org. Go through the component specific Programming Guides as
well as the sections on Deploying and More. Browse the Spark API as needed.
Estimated time 3-5 days and more.
Intermediate Spark (contd.)
Learn about the operational aspects of Spark:
!  Advanced Apache Spark (DevOps) Video Slides 6 hours " EXCELLENT!
!  Tuning and Debugging Spark Video Slides 48 mins
!  How-to: Tune Your Apache Spark Jobs Link ~ 1 hour
!  (Tons of other presentations, to be listed later)
Gain a high-level understanding of Spark architecture:
!  Introduction to AmpLab Spark Internals, Matei Zaharia (Databricks), Video 1 hr 15 mins
!  A Deeper Understanding of Spark Internals, Aaron Davidson (Databricks) Video PDF
44 mins
Intermediate Spark (contd.)
Experiment, experiment, experiment ... “Play the role of the customer”
!  Setup your personal 3-4 node cluster
!  Download some “open” data. E.g. “airline” data on stat-computing.org/dataexpo/2009/
!  Write some code, make it run, see how it performs, tune it, trouble-shoot it
!  Experiment with different deployment modes (Standalone + YARN)
!  Play with different configuration knobs, check out dashboards, etc.
!  Explore all subcomponents (especially Core, SQL, MLLib)
Read the original academic papers
!  Resilient Distributed Datasets: A Fault-
Tolerant Abstraction for In-Memory Cluster
Computing, Matei Zaharia, et. al.
!  Discretized Streams: An Efficient and Fault-
Tolerant Model for Stream Processing on
Large Clusters, Matei Zaharia, et. al.
!  GraphX: A Resilient Distributed Graph
System on Spark, Reynold S. Xin, et. al.
!  Spark SQL: Relational Data Processing in
Spark, Michael Armbrust, et. al.
Advanced Spark: Original Papers
Advanced Spark: Enhance your Scala skills
This book by
Odersky is arduously
long and isn’t meant
to give you a quick
start.
!  Use this as your
primary Scala text
!  Excellent MooC by Odersky. Some of
the material is meant for CS majors.
Highly recommended for STC
developers.
35+ hours
Advanced Spark: Browse Conference Proceedings
Spark Summits cover technology and use cases. Technology is also covered in various other places so
you could consider skipping those tracks. Don’t forget to check out the customer stories. That is how we
learn about enablement opportunities and challenges, and in some cases, we can see through the
Spark hype ☺
100+ hours of FREE videos and associated PDFs available on spark-summit.org. You don’t even have
to pay the conference fee! Go back in time and “attend” these conferences!
We can produce a smaller ”watch list” of important videos and publish that internally.
Advanced Spark: Browse YouTube Videos
YouTube is full of training videos, some good, other not so much. These are the
only channels you need to watch though. There is a lot of repetition in the
material, and some of the videos are from the conferences mentioned earlier.
Advanced Spark: Check out these books
Provides a good overview of Spark but
much of the material is also available
through other sources previously
mentioned. Could be skipped.
!  Covers concrete statistical analysis /
machine learning use cases. Covers
Spark APIs and MLLib. Highly
recommended for data scientists.
Advanced Spark: Yes ... read the code
Even if you don’t intend to contribute to Spark, there are a ton of valuable comments in the code that
provide insights into Spark’s design. Don’t be shy! Go to github.com/apache/spark and check it to out.

More Related Content

Viewers also liked

Nmap(network mapping)
Nmap(network mapping)Nmap(network mapping)
Nmap(network mapping)shwetha mk
 
Detection of Idle Stealth Port Scan Attack in Network Intrusion Detection Sys...
Detection of Idle Stealth Port Scan Attack in Network Intrusion Detection Sys...Detection of Idle Stealth Port Scan Attack in Network Intrusion Detection Sys...
Detection of Idle Stealth Port Scan Attack in Network Intrusion Detection Sys...skpatel91
 
Hacking With Nmap - Scanning Techniques
Hacking With Nmap - Scanning TechniquesHacking With Nmap - Scanning Techniques
Hacking With Nmap - Scanning Techniquesamiable_indian
 
Hadoop and Big Data Overview
Hadoop and Big Data OverviewHadoop and Big Data Overview
Hadoop and Big Data OverviewPrabhu Thukkaram
 
Apache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream AnalyticsApache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream AnalyticsPrabhu Thukkaram
 
A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark Anyscale
 
Apache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACKApache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACKMaxim Shelest
 
Dive into Spark Streaming
Dive into Spark StreamingDive into Spark Streaming
Dive into Spark StreamingGerard Maas
 
Exploring language classification with spark and the spark notebook
Exploring language classification with spark and the spark notebookExploring language classification with spark and the spark notebook
Exploring language classification with spark and the spark notebookGerard Maas
 
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...Lightbend
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseHortonworks
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsAnyscale
 
Deep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an IntroductionDeep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an IntroductionEmanuele Bezzi
 
Witgesterde blauwborst powerpoint
Witgesterde blauwborst powerpointWitgesterde blauwborst powerpoint
Witgesterde blauwborst powerpointArie Yuri Bakker
 
تلاش‏هاى جهانى جهت كنترل تسليحات و خلع سلاح
تلاش‏هاى جهانى جهت كنترل تسليحات و خلع سلاحتلاش‏هاى جهانى جهت كنترل تسليحات و خلع سلاح
تلاش‏هاى جهانى جهت كنترل تسليحات و خلع سلاحMajid Zavari
 

Viewers also liked (18)

Nmap(network mapping)
Nmap(network mapping)Nmap(network mapping)
Nmap(network mapping)
 
Detection of Idle Stealth Port Scan Attack in Network Intrusion Detection Sys...
Detection of Idle Stealth Port Scan Attack in Network Intrusion Detection Sys...Detection of Idle Stealth Port Scan Attack in Network Intrusion Detection Sys...
Detection of Idle Stealth Port Scan Attack in Network Intrusion Detection Sys...
 
Understanding NMAP
Understanding NMAPUnderstanding NMAP
Understanding NMAP
 
Nmap Basics
Nmap BasicsNmap Basics
Nmap Basics
 
Hacking With Nmap - Scanning Techniques
Hacking With Nmap - Scanning TechniquesHacking With Nmap - Scanning Techniques
Hacking With Nmap - Scanning Techniques
 
Hadoop and Big Data Overview
Hadoop and Big Data OverviewHadoop and Big Data Overview
Hadoop and Big Data Overview
 
Apache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream AnalyticsApache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream Analytics
 
Spark on YARN
Spark on YARNSpark on YARN
Spark on YARN
 
A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark
 
Apache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACKApache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACK
 
Dive into Spark Streaming
Dive into Spark StreamingDive into Spark Streaming
Dive into Spark Streaming
 
Exploring language classification with spark and the spark notebook
Exploring language classification with spark and the spark notebookExploring language classification with spark and the spark notebook
Exploring language classification with spark and the spark notebook
 
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSense
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
Deep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an IntroductionDeep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an Introduction
 
Witgesterde blauwborst powerpoint
Witgesterde blauwborst powerpointWitgesterde blauwborst powerpoint
Witgesterde blauwborst powerpoint
 
تلاش‏هاى جهانى جهت كنترل تسليحات و خلع سلاح
تلاش‏هاى جهانى جهت كنترل تسليحات و خلع سلاحتلاش‏هاى جهانى جهت كنترل تسليحات و خلع سلاح
تلاش‏هاى جهانى جهت كنترل تسليحات و خلع سلاح
 

Similar to Apache Spark: Coming up to speed

Getting Started with Spark Scala
Getting Started with Spark ScalaGetting Started with Spark Scala
Getting Started with Spark ScalaKnoldus Inc.
 
Contributing to Apache Spark 3
Contributing to Apache Spark 3Contributing to Apache Spark 3
Contributing to Apache Spark 3Holden Karau
 
Apexand visualforcearchitecture
Apexand visualforcearchitectureApexand visualforcearchitecture
Apexand visualforcearchitectureCMR WORLD TECH
 
Rg apexand visualforcearchitecture
Rg apexand visualforcearchitectureRg apexand visualforcearchitecture
Rg apexand visualforcearchitectureCMR WORLD TECH
 
99 Apache Spark interview questions for professionals - https://www.amazon.co...
99 Apache Spark interview questions for professionals - https://www.amazon.co...99 Apache Spark interview questions for professionals - https://www.amazon.co...
99 Apache Spark interview questions for professionals - https://www.amazon.co...Yogesh Kumar
 
TiConf NYC - Documenting Your Titanium Applications
TiConf NYC - Documenting Your Titanium ApplicationsTiConf NYC - Documenting Your Titanium Applications
TiConf NYC - Documenting Your Titanium ApplicationsJamil Spain
 
Documenting apps ti confnyc
Documenting apps   ti confnycDocumenting apps   ti confnyc
Documenting apps ti confnycJamil Spain
 
Building Commercial Applications with Oracle Applications Express by Scott Sp...
Building Commercial Applications with Oracle Applications Express by Scott Sp...Building Commercial Applications with Oracle Applications Express by Scott Sp...
Building Commercial Applications with Oracle Applications Express by Scott Sp...Enkitec
 
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...Edureka!
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideWhizlabs
 
An Insider’s Guide to Maximizing Spark SQL Performance
 An Insider’s Guide to Maximizing Spark SQL Performance An Insider’s Guide to Maximizing Spark SQL Performance
An Insider’s Guide to Maximizing Spark SQL PerformanceTakuya UESHIN
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introductionbigdata trunk
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationCraig Chao
 
Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...
Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...
Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...Anya Bida
 
Just enough DevOps for Data Scientists (Part II)
Just enough DevOps for Data Scientists (Part II)Just enough DevOps for Data Scientists (Part II)
Just enough DevOps for Data Scientists (Part II)Databricks
 
Spark is going to replace Apache Hadoop! Know Why?
Spark is going to replace Apache Hadoop! Know Why?Spark is going to replace Apache Hadoop! Know Why?
Spark is going to replace Apache Hadoop! Know Why?Edureka!
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersDatabricks
 
Get your organization’s feet wet with Semantic Web Technologies
Get your organization’s feet wet with Semantic Web TechnologiesGet your organization’s feet wet with Semantic Web Technologies
Get your organization’s feet wet with Semantic Web TechnologiesAndré Torkveen
 

Similar to Apache Spark: Coming up to speed (20)

Getting Started with Spark Scala
Getting Started with Spark ScalaGetting Started with Spark Scala
Getting Started with Spark Scala
 
Contributing to Apache Spark 3
Contributing to Apache Spark 3Contributing to Apache Spark 3
Contributing to Apache Spark 3
 
Apexand visualforcearchitecture
Apexand visualforcearchitectureApexand visualforcearchitecture
Apexand visualforcearchitecture
 
Rg apexand visualforcearchitecture
Rg apexand visualforcearchitectureRg apexand visualforcearchitecture
Rg apexand visualforcearchitecture
 
99 Apache Spark interview questions for professionals - https://www.amazon.co...
99 Apache Spark interview questions for professionals - https://www.amazon.co...99 Apache Spark interview questions for professionals - https://www.amazon.co...
99 Apache Spark interview questions for professionals - https://www.amazon.co...
 
TiConf NYC - Documenting Your Titanium Applications
TiConf NYC - Documenting Your Titanium ApplicationsTiConf NYC - Documenting Your Titanium Applications
TiConf NYC - Documenting Your Titanium Applications
 
Documenting apps ti confnyc
Documenting apps   ti confnycDocumenting apps   ti confnyc
Documenting apps ti confnyc
 
Building Commercial Applications with Oracle Applications Express by Scott Sp...
Building Commercial Applications with Oracle Applications Express by Scott Sp...Building Commercial Applications with Oracle Applications Express by Scott Sp...
Building Commercial Applications with Oracle Applications Express by Scott Sp...
 
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive Guide
 
An Insider’s Guide to Maximizing Spark SQL Performance
 An Insider’s Guide to Maximizing Spark SQL Performance An Insider’s Guide to Maximizing Spark SQL Performance
An Insider’s Guide to Maximizing Spark SQL Performance
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
SparkTokyo2019
SparkTokyo2019SparkTokyo2019
SparkTokyo2019
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimization
 
Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...
Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...
Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...
 
Just enough DevOps for Data Scientists (Part II)
Just enough DevOps for Data Scientists (Part II)Just enough DevOps for Data Scientists (Part II)
Just enough DevOps for Data Scientists (Part II)
 
Spark is going to replace Apache Hadoop! Know Why?
Spark is going to replace Apache Hadoop! Know Why?Spark is going to replace Apache Hadoop! Know Why?
Spark is going to replace Apache Hadoop! Know Why?
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
 
STC Design
STC DesignSTC Design
STC Design
 
Get your organization’s feet wet with Semantic Web Technologies
Get your organization’s feet wet with Semantic Web TechnologiesGet your organization’s feet wet with Semantic Web Technologies
Get your organization’s feet wet with Semantic Web Technologies
 

Recently uploaded

Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 

Recently uploaded (20)

Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 

Apache Spark: Coming up to speed

  • 1. Coming up to speed on Spark Please send any comments to: Adarsh Pannu adarshrp@us.ibm.com
  • 2. Intro What is Spark? How does it relate to Hadoop? When would you use it? 1-2 hours Basic Understand basic technology and write simple programs 1-2 days Intermediate Start enabling customers in the field, hand-holding them through problems and issues. 5-15 days and more Expert Know Spark inside out even if you don’t intend to contribute to the project itself. Weeks to months
  • 3. Intro Spark Go through these presentations to understand the value of Spark. These speakers also attempt to differentiate Spark from Hadoop, and enumerate its comparative strengths. (Not much code here) !  Turning Data into Value, Ion Stoica, Spark Summit 2013 Video & Slides 25 mins !  An Overview of Apache Spark, Jim Scott, Video Slides 1 hr 06 mins !  How Companies are Using Spark, and Where the Edge in Big Data Will Be, Matei Zaharia, Video & Slides 12 mins !  Spark Fundamentals I (Lesson 1 only), Big Data University <20 mins
  • 4. Basic Spark !  Pick up some Scala through this article co- authored by Scala’s creator, Martin Odersky. Link Estimated time: 2 hours
  • 5. Basic Spark (contd.) !  Do these two courses. They cover Spark basics and include a certification. You can use the supplied Docker images for all other labs. 7 hours
  • 6. Basic Spark (contd.) !  Go to spark.apache.org and study the Overview and the Spark Programming Guide. Many online courses borrow liberally from this material. Information on this site is updated with every new Spark release. Estimated 7-8 hours.
  • 7. Intermediate Spark !  Stay at spark.apache.org. Go through the component specific Programming Guides as well as the sections on Deploying and More. Browse the Spark API as needed. Estimated time 3-5 days and more.
  • 8. Intermediate Spark (contd.) Learn about the operational aspects of Spark: !  Advanced Apache Spark (DevOps) Video Slides 6 hours " EXCELLENT! !  Tuning and Debugging Spark Video Slides 48 mins !  How-to: Tune Your Apache Spark Jobs Link ~ 1 hour !  (Tons of other presentations, to be listed later) Gain a high-level understanding of Spark architecture: !  Introduction to AmpLab Spark Internals, Matei Zaharia (Databricks), Video 1 hr 15 mins !  A Deeper Understanding of Spark Internals, Aaron Davidson (Databricks) Video PDF 44 mins
  • 9. Intermediate Spark (contd.) Experiment, experiment, experiment ... “Play the role of the customer” !  Setup your personal 3-4 node cluster !  Download some “open” data. E.g. “airline” data on stat-computing.org/dataexpo/2009/ !  Write some code, make it run, see how it performs, tune it, trouble-shoot it !  Experiment with different deployment modes (Standalone + YARN) !  Play with different configuration knobs, check out dashboards, etc. !  Explore all subcomponents (especially Core, SQL, MLLib)
  • 10. Read the original academic papers !  Resilient Distributed Datasets: A Fault- Tolerant Abstraction for In-Memory Cluster Computing, Matei Zaharia, et. al. !  Discretized Streams: An Efficient and Fault- Tolerant Model for Stream Processing on Large Clusters, Matei Zaharia, et. al. !  GraphX: A Resilient Distributed Graph System on Spark, Reynold S. Xin, et. al. !  Spark SQL: Relational Data Processing in Spark, Michael Armbrust, et. al. Advanced Spark: Original Papers
  • 11. Advanced Spark: Enhance your Scala skills This book by Odersky is arduously long and isn’t meant to give you a quick start. !  Use this as your primary Scala text !  Excellent MooC by Odersky. Some of the material is meant for CS majors. Highly recommended for STC developers. 35+ hours
  • 12. Advanced Spark: Browse Conference Proceedings Spark Summits cover technology and use cases. Technology is also covered in various other places so you could consider skipping those tracks. Don’t forget to check out the customer stories. That is how we learn about enablement opportunities and challenges, and in some cases, we can see through the Spark hype ☺ 100+ hours of FREE videos and associated PDFs available on spark-summit.org. You don’t even have to pay the conference fee! Go back in time and “attend” these conferences! We can produce a smaller ”watch list” of important videos and publish that internally.
  • 13. Advanced Spark: Browse YouTube Videos YouTube is full of training videos, some good, other not so much. These are the only channels you need to watch though. There is a lot of repetition in the material, and some of the videos are from the conferences mentioned earlier.
  • 14. Advanced Spark: Check out these books Provides a good overview of Spark but much of the material is also available through other sources previously mentioned. Could be skipped. !  Covers concrete statistical analysis / machine learning use cases. Covers Spark APIs and MLLib. Highly recommended for data scientists.
  • 15. Advanced Spark: Yes ... read the code Even if you don’t intend to contribute to Spark, there are a ton of valuable comments in the code that provide insights into Spark’s design. Don’t be shy! Go to github.com/apache/spark and check it to out.