Coming up to speed on Spark
Please send any comments to:
Adarsh Pannu
adarshrp@us.ibm.com
Intro (1-2 hours): What is Spark? How does it relate to Hadoop? When would you use it?
Basic (1-2 days): Understand the basic technology and write simple programs.
Intermediate (5-15 days or more): Start enabling customers in the field, hand-holding them through problems and issues.
Expert (weeks to months): Know Spark inside out, even if you don’t intend to contribute to the project itself.
Intro Spark
Go through these presentations to understand the value of Spark. The speakers also attempt to differentiate Spark from Hadoop and enumerate its comparative strengths. (There is not much code here.)
• Turning Data into Value, Ion Stoica, Spark Summit 2013 (video & slides, 25 mins)
• An Overview of Apache Spark, Jim Scott (video, slides; 1 hr 06 mins)
• How Companies are Using Spark, and Where the Edge in Big Data Will Be, Matei Zaharia (video & slides, 12 mins)
• Spark Fundamentals I (Lesson 1 only), Big Data University (<20 mins)
Basic Spark
• Pick up some Scala through this article co-authored by Scala’s creator, Martin Odersky.
Estimated time: 2 hours
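To get a feel for the Scala style that Spark builds on, here is a small sketch that uses only the standard Scala collections, with no Spark dependency. `WordCountLocal` is a hypothetical name chosen for illustration; the point is the chain of `flatMap`, `filter`, `map`, and a keyed grouping, which is the same functional style Spark’s Scala RDD API is modeled on.

```scala
// Hypothetical example: plain Scala collections only, no Spark dependency.
object WordCountLocal {
  // Counts words case-insensitively across all input lines.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\W+")) // one line -> many words
      .filter(_.nonEmpty)                   // drop empty tokens left by leading punctuation
      .map(word => (word, 1))               // pair each word with a count of 1
      .groupBy(_._1)                        // Map[word, Seq[(word, 1)]]
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // add up the 1s

  def main(args: Array[String]): Unit = {
    val lines = Seq("to be or not to be", "that is the question")
    println(wordCount(lines)("to")) // prints 2
  }
}
```

On a Spark RDD of lines, the same pipeline would typically end in `reduceByKey(_ + _)` instead of `groupBy` followed by a sum, so that counts are combined within each partition before data is shuffled across the cluster.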
Basic Spark (contd.)
• Do these two courses. They cover Spark basics and include a certification. You can use the supplied Docker images for all other labs.
Estimated time: 7 hours
Basic Spark (contd.)
• Go to spark.apache.org and study the Overview and the Spark Programming Guide. Many online courses borrow liberally from this material, and the site is updated with every new Spark release.
Estimated time: 7-8 hours
Intermediate Spark
• Stay at spark.apache.org. Go through the component-specific programming guides as well as the “Deploying” and “More” sections. Browse the Spark API documentation as needed.
Estimated time: 3-5 days or more
Intermediate Spark (contd.)
Learn about the operational aspects of Spark:
• Advanced Apache Spark (DevOps) (video, slides; 6 hours). Excellent!
• Tuning and Debugging Spark (video, slides; 48 mins)
• How-to: Tune Your Apache Spark Jobs (~1 hour)
• (Many other presentations, to be listed later)
Gain a high-level understanding of Spark architecture:
• Introduction to AmpLab Spark Internals, Matei Zaharia (Databricks) (video, 1 hr 15 mins)
• A Deeper Understanding of Spark Internals, Aaron Davidson (Databricks) (video, PDF; 44 mins)
Intermediate Spark (contd.)
Experiment, experiment, experiment ... “Play the role of the customer”
• Set up your personal 3-4 node cluster.
• Download some “open” data, e.g. the “airline” data on stat-computing.org/dataexpo/2009/.
• Write some code, make it run, see how it performs, tune it, and troubleshoot it.
• Experiment with different deployment modes (Standalone and YARN).
• Play with different configuration knobs, check out the dashboards, etc.
• Explore all subcomponents (especially Core, SQL, and MLlib).
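Before running anything on a cluster, it can help to prototype the logic locally on a few lines of the downloaded data. This is a minimal plain-Scala sketch under stated assumptions: the input is CSV with a header row containing `UniqueCarrier` and `ArrDelay` columns, and missing delays appear as the literal string `NA`; check both against the files you actually download. `AvgDelayLocal` and the sample rows are hypothetical and made up for illustration.

```scala
// Hypothetical local prototype; assumes a CSV header with UniqueCarrier and ArrDelay
// columns and "NA" for missing values. Verify both against your downloaded files.
object AvgDelayLocal {
  // Mean arrival delay per carrier; rows with an empty or "NA" delay are skipped.
  def avgArrDelay(csv: Seq[String]): Map[String, Double] = {
    val header  = csv.head.split(",").map(_.trim)
    val carrier = header.indexOf("UniqueCarrier") // look columns up by name, not position
    val delay   = header.indexOf("ArrDelay")
    require(carrier >= 0 && delay >= 0, "header must contain UniqueCarrier and ArrDelay")

    csv.tail
      .map(_.split(",", -1).map(_.trim))                      // -1 keeps empty trailing fields
      .filter(cols => cols.length > math.max(carrier, delay)) // skip short or malformed rows
      .filter(cols => cols(delay).nonEmpty && cols(delay) != "NA")
      .map(cols => (cols(carrier), cols(delay).toDouble))
      .groupBy(_._1)                                          // group delays by carrier
      .map { case (c, rows) => (c, rows.map(_._2).sum / rows.size) }
  }

  def main(args: Array[String]): Unit = {
    // Made-up rows for illustration only; not real airline data.
    val sample = Seq("UniqueCarrier,ArrDelay", "AA,10", "AA,-4", "UA,NA", "UA,7")
    println(avgArrDelay(sample)("AA")) // prints 3.0
  }
}
```

Once the logic looks right, a Spark DataFrame version would be roughly `spark.read.option("header", "true").option("nullValue", "NA").option("inferSchema", "true").csv(path).groupBy("UniqueCarrier").agg(avg("ArrDelay"))`, with `org.apache.spark.sql.functions.avg` imported. That job is a reasonable starting point for measuring performance and trying different configuration knobs.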
Advanced Spark: Original Papers
Read the original academic papers:
• Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, Matei Zaharia et al.
• Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters, Matei Zaharia et al.
• GraphX: A Resilient Distributed Graph System on Spark, Reynold S. Xin et al.
• Spark SQL: Relational Data Processing in Spark, Michael Armbrust et al.
Advanced Spark: Enhance your Scala skills
• This book by Odersky is very long and isn’t meant to give you a quick start. Use it as your primary Scala text.
• Excellent MOOC by Odersky. Some of the material is meant for CS majors. Highly recommended for STC developers.
Estimated time: 35+ hours
Advanced Spark: Browse Conference Proceedings
Spark Summits cover both technology and use cases. Because the technology is also covered in various other places, you could consider skipping those tracks. Don’t forget to check out the customer stories: that is how we learn about enablement opportunities and challenges, and in some cases we can see through the Spark hype ☺
More than 100 hours of free videos and associated PDFs are available on spark-summit.org. You don’t even have to pay the conference fee! Go back in time and “attend” these conferences!
We can produce a smaller “watch list” of important videos and publish it internally.
Advanced Spark: Browse YouTube Videos
YouTube is full of training videos, some good, others not so much. These are the only channels you need to watch, though. There is a lot of repetition in the material, and some of the videos are from the conferences mentioned earlier.
Advanced Spark: Check out these books
• Provides a good overview of Spark, but much of the material is also available through the sources mentioned previously. Could be skipped.
• Covers concrete statistical analysis / machine learning use cases, along with the Spark APIs and MLlib. Highly recommended for data scientists.
Advanced Spark: Yes ... read the code
Even if you don’t intend to contribute to Spark, the code contains a ton of valuable comments that provide insight into Spark’s design. Don’t be shy! Go to github.com/apache/spark and check it out.