SlideShare a Scribd company logo
Spark Tuning for Enterprise
System Administrators
Anya T. Bida, PhD
Rachel B. Warren
Don't worry about missing something...
Video: https://www.youtube.com/watch?v=DNWaMR8uKDc&feature=youtu.be
Presentation: http://www.slideshare.net/anyabida
Cheat-sheet: http://techsuppdiva.github.io/
!
!
Anya: https://www.linkedin.com/in/anyabida
Rachel: https://www.linkedin.com/in/rachelbwarren
!
!

 !2
About Anya About Rachel
Operations Engineer
!
!
!
Spark & Scala Enthusiast /
Data Engineer
Alpine Data
!
alpinenow.com
About You*
Intermittent
Reliable
Optimal
Spark practitioners
mySparkApp Success
*
Intermittent
Reliable
Optimal
mySparkApp Success
Default != Recommended
Example: By default, spark.executor.memory = 1g
1g allows small jobs to finish out of the box.
Spark assumes you'll increase this parameter.

!6
Which parameters are important?
!
How do I configure them?
!7
Default != Recommended
Filter* data
before an
expensive reduce
or aggregation
consider*
coalesce(
Use* data
structures that
require less
memory
Serialize*
PySpark
serializing
is built-in
Scala/
Java?
persist(storageLevel.[*]_SER)
Recommended:
kryoserializer *
tuning.html#tuning-
data-structures
See "Optimize partitions."
*
See "GC investigation." *
See "Checkpointing." *
The Spark Tuning Cheat-Sheet
Intermittent
Reliable
Optimal
mySparkApp Success
Memory trouble
Initial config
Intermittent
Reliable
Optimal
mySparkApp Success
Memory trouble
Initial config
!11
How many in the
audience have their own
cluster?
!12
Fair Schedulers
!13
YARN
<allocations>
<queue name="sample_queue">
<minResources>4000 mb,0vcores</minResources>
<maxResources>8000 mb,8vcores</maxResources>
<maxRunningApps>10</maxRunningApps>
<weight>2.0</weight>
<schedulingPolicy>fair</schedulingPolicy>
</queue>
</allocations>
SPARK
<allocations>

<pool name="sample_queue">
<schedulingMode>FAIR</sch
<weight>1</weight>

<minShare>2</minShare>

</pool>

</allocations>
Fair Schedulers
!14
YARN
<allocations>
<queue name="sample_queue">
<minResources>4000 mb,0vcores</minResources>
<maxResources>8000 mb,8vcores</maxResources>
<maxRunningApps>10</maxRunningApps>
<weight>2.0</weight>
<schedulingPolicy>fair</schedulingPolicy>
</queue>
</allocations>
SPARK
<allocations>

<pool name="sample_queue">
<schedulingMode>FAIR</sch
<weight>1</weight>

<minShare>2</minShare>

</pool>

</allocations>
Fair Schedulers
!15
YARN
<allocations>
<queue name="sample_queue">
<minResources>4000 mb,0vcores</minResources>
<maxResources>8000 mb,8vcores</maxResources>
<maxRunningApps>10</maxRunningApps>
<weight>2.0</weight>
<schedulingPolicy>fair</schedulingPolicy>
</queue>
</allocations>
SPARK
<allocations>

<pool name="sample_queue">
<schedulingMode>FAIR</sch
<weight>1</weight>

<minShare>2</minShare>

</pool>

</allocations>
Fair Schedulers
!16
YARN
<allocations>
<queue name="sample_queue">
<minResources>4000 mb,0vcores</minResources>
<maxResources>8000 mb,8vcores</maxResources>
<maxRunningApps>10</maxRunningApps>
<weight>2.0</weight>
<schedulingPolicy>fair</schedulingPolicy>
</queue>
</allocations>
SPARK
<allocations>

<pool name="sample_queue">
<schedulingMode>FAIR</sch
<weight>1</weight>

<minShare>2</minShare>

</pool>

</allocations>
Fair Schedulers
!17
YARN
<allocations>
<queue name="sample_queue">
<minResources>4000 mb,0vcores</minResources>
<maxResources>8000 mb,8vcores</maxResources>
<maxRunningApps>10</maxRunningApps>
<weight>2.0</weight>
<schedulingPolicy>fair</schedulingPolicy>
</queue>
</allocations>
SPARK
<allocations>

<pool name="sample_queue">
<schedulingMode>FAIR</sch
<weight>1</weight>

<minShare>2</minShare>

</pool>

</allocations>
Use these parameters!
Fair Schedulers
!18
YARN
<allocations>
<user name="sample_user">
<maxRunningApps>6</maxRunningApps>
</user>
<userMaxAppsDefault>5</userMaxAppsDefault>
!
</allocations>
Fair Schedulers
!19
YARN
<allocations>
<user name="sample_user">
<maxRunningApps>6</maxRunningApps>
</user>
<userMaxAppsDefault>5</userMaxAppsDefault>
!
</allocations>
What is the memory limit for
mySparkApp?
!20
!21
Driver
Executor
Cluster Manager
Sidebar: Spark Architecture
Mark Grover:
http://www.slideshare.net/SparkSummit/top-5-mistakes-when-writing-spark-applications-by-mark-grover-and-ted-
malaska
Executor
!22
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
!
!
!
What is the memory limit for
mySparkApp?
!23
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
!
!
!
What is the memory limit for
mySparkApp?
!24
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
!
!
!
<maxResources>___mb</maxResources>
Limitation
What is the memory limit for
mySparkApp?
What is the memory limit for
mySparkApp?
!25
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
!
!
!
Reserve 25% for overhead
!26
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
!
!
!
What is the memory limit for
mySparkApp?
!27
!28
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
!
mySparkApp_mem_limit > driver.memory + (executor.memory
x dynamicAllocation.maxExecutors)
What is the memory limit for
mySparkApp?
!29
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
!
mySparkApp_mem_limit > driver.memory + (executor.memory
x dynamicAllocation.maxExecutors)
What is the memory limit for
mySparkApp?
!30
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
!
mySparkApp_mem_limit > driver.memory + (executor.memory
x dynamicAllocation.maxExecutors)
What is the memory limit for
mySparkApp?
Limitation: Driver must not be
larger than a single node.
!31
yarn.nodemanager.resource.memory-mb
Driver Container
spark.driver.memory
!32
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
!
mySparkApp_mem_limit > driver.memory + (executor.memory
x dynamicAllocation.maxExecutors)
What is the memory limit for
mySparkApp?
!33
Driver
Executor
Cluster Manager
Sidebar: Spark Architecture
Mark Grover:
http://www.slideshare.net/SparkSummit/top-5-mistakes-when-writing-spark-applications-by-mark-grover-and-ted-
malaska
Executor
!34
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
!
mySparkApp_mem_limit > driver.memory + (executor.memory
x dynamicAllocation.maxExecutors)
What is the memory limit for
mySparkApp?
Verify my calculations respect this
limitation.
!35
Intermittent
Reliable
Optimal
mySparkApp Success
Memory trouble
Initial config
Intermittent
Reliable
Optimal
mySparkApp Success
Memory trouble
Initial config
mySparkApp memory issues
Reduce the memory needed for
mySparkApp. How?
Gracefully handle memory
limitations. How?
mySparkApp memory issues
Reduce the memory needed for
mySparkApp. How?
Gracefully handle memory
limitations. How?
mySparkApp memory issues
Reduce the memory needed for
mySparkApp. How?
Gracefully handle memory
limitations. How?
mySparkApp memory issues
here let's talk about one scenario
Reduce the memory needed for
mySparkApp. How?
Gracefully handle memory
limitations. How?
mySparkApp memory issues
persist(storageLevel.[*]_SER)
Reduce the memory needed for
mySparkApp. How?
Gracefully handle memory
limitations. How?
mySparkApp memory issues
persist(storageLevel.[*]_SER)
Reduce the memory needed for
mySparkApp. How?
Gracefully handle memory
limitations. How?
mySparkApp memory issues
persist(storageLevel.[*]_SER)
Recommended: kryoserializer *
Reduce the memory needed for
mySparkApp. How?
Gracefully handle memory
limitations. How?
mySparkApp memory issues
persist(storageLevel.[*]_SER)
Recommended: kryoserializer *
Reduce the memory needed for
mySparkApp. How?
Gracefully handle memory
limitations. How?
mySparkApp memory issues
Reduce the memory needed for
mySparkApp. How?
Gracefully handle memory
limitations. How?
mySparkApp memory issues
here let's talk about one scenario
Spark 1.1-1.5,
Recommendation: Increase
spark.memory.storageFraction
!51
Alexey Grishchenko:
https://0x0fff.com/spark-memory-management/
Spark 1.1-1.5,
Recommendation: Increase
spark.memory.storageFraction
!
Spark 1.6, Recommendation:
UnifiedMemoryManager
Alexey Grishchenko:
https://0x0fff.com/spark-memory-management/
Sandy Ryza:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
yarn.nodemanager.resource.memory-mb
spark.yarn.executor.memoryOverhead
Executor Container
spark.executor.memory
!53
Driver
Cluster Manager
Sidebar: Spark Architecture
yarn.nodema
spark.yarn.e
Exec
spark.e
yarn.nodema
spark.yarn.e
Exec
spark.e
yarn.nodema
spark.yarn.e
Exec
spark.e
Executor
Executor
Intermittent
Reliable
Optimal
mySparkApp Success
Memory trouble
Initial config
Intermittent
Reliable
Optimal
mySparkApp Success
Memory trouble
Initial config
Instead of 2.5 hours, myApp
completes in 1 hour.
Cheat-sheet
techsuppdiva.github.io/
Intermittent
Reliable
Optimal
mySparkApp Success
Memory trouble
Initial config
HighPerformanceSpark.com
Further Reading:
• Spark Tuning Cheat-sheet

techsuppdiva.github.io
• Apache Spark Documentation

https://spark.apache.org/docs/latest

• Checkpointing

http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing

https://github.com/jaceklaskowski/mastering-apache-spark-book/blob/master/spark-rdd-checkpointing.adoc

• Learning Spark, by H. Karau, A. Konwinski, P. Wendell, M. Zaharia, 2015
!58
More Questions?
!59
Video: https://www.youtube.com/watch?v=DNWaMR8uKDc&feature=youtu.be
Presentation: http://www.slideshare.net/anyabida
Cheat-sheet: http://techsuppdiva.github.io/
!
!
Anya: https://www.linkedin.com/in/anyabida
Rachel: https://www.linkedin.com/in/rachelbwarren
!
!

 Thanks!

More Related Content

What's hot

Picking the right AWS backend for your application (September 2017)
Picking the right AWS backend for your application (September 2017)Picking the right AWS backend for your application (September 2017)
Picking the right AWS backend for your application (September 2017)
Julien SIMON
 
Akka in Practice: Designing Actor-based Applications
Akka in Practice: Designing Actor-based ApplicationsAkka in Practice: Designing Actor-based Applications
Akka in Practice: Designing Actor-based ApplicationsNLJUG
 
Spark Autotuning Talk - Strata New York
Spark Autotuning Talk - Strata New YorkSpark Autotuning Talk - Strata New York
Spark Autotuning Talk - Strata New York
Holden Karau
 
Getting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at Scale
Getting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at ScaleGetting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at Scale
Getting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at Scale
Bishop Fox
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at Parse
Travis Redman
 
개발자가 알아두면 좋은 5가지 AWS 인공 지능 서비스 깨알 지식 (윤석찬, AWS 테크에반젤리스트) :: AWS DevDay 2018
개발자가 알아두면 좋은 5가지 AWS 인공 지능 서비스 깨알 지식 (윤석찬, AWS 테크에반젤리스트) ::  AWS DevDay 2018개발자가 알아두면 좋은 5가지 AWS 인공 지능 서비스 깨알 지식 (윤석찬, AWS 테크에반젤리스트) ::  AWS DevDay 2018
개발자가 알아두면 좋은 5가지 AWS 인공 지능 서비스 깨알 지식 (윤석찬, AWS 테크에반젤리스트) :: AWS DevDay 2018
Amazon Web Services Korea
 
SF Solr Meetup - Interactively Search and Visualize Your Big Data
SF Solr Meetup - Interactively Search and Visualize Your Big DataSF Solr Meetup - Interactively Search and Visualize Your Big Data
SF Solr Meetup - Interactively Search and Visualize Your Big Data
gethue
 
Spark intro by Adform Research
Spark intro by Adform ResearchSpark intro by Adform Research
Spark intro by Adform ResearchVasil Remeniuk
 
(GAM303) Riot Games: Migrating Mountains of Data to AWS
(GAM303) Riot Games: Migrating Mountains of Data to AWS(GAM303) Riot Games: Migrating Mountains of Data to AWS
(GAM303) Riot Games: Migrating Mountains of Data to AWS
Amazon Web Services
 
Quarkus - a shrink ray to your Java Application
Quarkus - a shrink ray to your Java ApplicationQuarkus - a shrink ray to your Java Application
Quarkus - a shrink ray to your Java Application
CodeOps Technologies LLP
 
Testing in Scala. Adform Research
Testing in Scala. Adform ResearchTesting in Scala. Adform Research
Testing in Scala. Adform Research
Vasil Remeniuk
 
CQRS + ES with Scala and Akka
CQRS + ES with Scala and AkkaCQRS + ES with Scala and Akka
CQRS + ES with Scala and Akka
Bharadwaj N
 
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...
Amazon Web Services
 
FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)
Julien SIMON
 
JahiaOne - Performance Tuning
JahiaOne - Performance TuningJahiaOne - Performance Tuning
JahiaOne - Performance Tuning
Jahia Solutions Group
 
Scalding - Big Data Programming with Scala
Scalding - Big Data Programming with ScalaScalding - Big Data Programming with Scala
Scalding - Big Data Programming with Scala
Taewook Eom
 
Running Yarn at Scale
Running Yarn at Scale Running Yarn at Scale
Running Yarn at Scale
DataWorks Summit
 
Scaling Twitter
Scaling TwitterScaling Twitter
Scaling Twitter
Blaine
 
Going serverless
Going serverlessGoing serverless
Going serverless
Jeremy Green
 
How we took our server side application to the cloud and liked what we got
How we took our server side application to the cloud and liked what we gotHow we took our server side application to the cloud and liked what we got
How we took our server side application to the cloud and liked what we gotBaruch Sadogursky
 

What's hot (20)

Picking the right AWS backend for your application (September 2017)
Picking the right AWS backend for your application (September 2017)Picking the right AWS backend for your application (September 2017)
Picking the right AWS backend for your application (September 2017)
 
Akka in Practice: Designing Actor-based Applications
Akka in Practice: Designing Actor-based ApplicationsAkka in Practice: Designing Actor-based Applications
Akka in Practice: Designing Actor-based Applications
 
Spark Autotuning Talk - Strata New York
Spark Autotuning Talk - Strata New YorkSpark Autotuning Talk - Strata New York
Spark Autotuning Talk - Strata New York
 
Getting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at Scale
Getting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at ScaleGetting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at Scale
Getting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at Scale
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at Parse
 
개발자가 알아두면 좋은 5가지 AWS 인공 지능 서비스 깨알 지식 (윤석찬, AWS 테크에반젤리스트) :: AWS DevDay 2018
개발자가 알아두면 좋은 5가지 AWS 인공 지능 서비스 깨알 지식 (윤석찬, AWS 테크에반젤리스트) ::  AWS DevDay 2018개발자가 알아두면 좋은 5가지 AWS 인공 지능 서비스 깨알 지식 (윤석찬, AWS 테크에반젤리스트) ::  AWS DevDay 2018
개발자가 알아두면 좋은 5가지 AWS 인공 지능 서비스 깨알 지식 (윤석찬, AWS 테크에반젤리스트) :: AWS DevDay 2018
 
SF Solr Meetup - Interactively Search and Visualize Your Big Data
SF Solr Meetup - Interactively Search and Visualize Your Big DataSF Solr Meetup - Interactively Search and Visualize Your Big Data
SF Solr Meetup - Interactively Search and Visualize Your Big Data
 
Spark intro by Adform Research
Spark intro by Adform ResearchSpark intro by Adform Research
Spark intro by Adform Research
 
(GAM303) Riot Games: Migrating Mountains of Data to AWS
(GAM303) Riot Games: Migrating Mountains of Data to AWS(GAM303) Riot Games: Migrating Mountains of Data to AWS
(GAM303) Riot Games: Migrating Mountains of Data to AWS
 
Quarkus - a shrink ray to your Java Application
Quarkus - a shrink ray to your Java ApplicationQuarkus - a shrink ray to your Java Application
Quarkus - a shrink ray to your Java Application
 
Testing in Scala. Adform Research
Testing in Scala. Adform ResearchTesting in Scala. Adform Research
Testing in Scala. Adform Research
 
CQRS + ES with Scala and Akka
CQRS + ES with Scala and AkkaCQRS + ES with Scala and Akka
CQRS + ES with Scala and Akka
 
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...
 
FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)
 
JahiaOne - Performance Tuning
JahiaOne - Performance TuningJahiaOne - Performance Tuning
JahiaOne - Performance Tuning
 
Scalding - Big Data Programming with Scala
Scalding - Big Data Programming with ScalaScalding - Big Data Programming with Scala
Scalding - Big Data Programming with Scala
 
Running Yarn at Scale
Running Yarn at Scale Running Yarn at Scale
Running Yarn at Scale
 
Scaling Twitter
Scaling TwitterScaling Twitter
Scaling Twitter
 
Going serverless
Going serverlessGoing serverless
Going serverless
 
How we took our server side application to the cloud and liked what we got
How we took our server side application to the cloud and liked what we gotHow we took our server side application to the cloud and liked what we got
How we took our server side application to the cloud and liked what we got
 

Viewers also liked

Yarns About Yarn
Yarns About YarnYarns About Yarn
Yarns About Yarn
Cloudera, Inc.
 
The Future of Data
The Future of DataThe Future of Data
The Future of Data
blynnbuckley
 
Hw09 Cloudera Desktop In Detail
Hw09   Cloudera Desktop In DetailHw09   Cloudera Desktop In Detail
Hw09 Cloudera Desktop In DetailCloudera, Inc.
 
451 Research Impact Report
451 Research Impact Report451 Research Impact Report
451 Research Impact Report
Infochimps, a CSC Big Data Business
 
Cloudera introduction
Cloudera introductionCloudera introduction
Cloudera introduction
Phate334
 
Introduction to YARN Apps
Introduction to YARN AppsIntroduction to YARN Apps
Introduction to YARN Apps
Cloudera, Inc.
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hive
rxu
 
Unlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator OptimizerUnlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator Optimizer
Cloudera, Inc.
 
A beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopA beginners guide to Cloudera Hadoop
A beginners guide to Cloudera Hadoop
David Yahalom
 
Hadoop administration using cloudera student lab guidebook
Hadoop administration using cloudera   student lab guidebookHadoop administration using cloudera   student lab guidebook
Hadoop administration using cloudera student lab guidebook
Niranjan Pandey
 
Hadoop & Cloudera Workshop
Hadoop & Cloudera WorkshopHadoop & Cloudera Workshop
Hadoop & Cloudera Workshop
Serkan Sakınmaz
 
Cloudera Desktop
Cloudera DesktopCloudera Desktop
Cloudera Desktop
Hadoop User Group
 
Hadoop I/O Analysis
Hadoop I/O AnalysisHadoop I/O Analysis
Hadoop I/O Analysis
Richard McDougall
 
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopData Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache Hadoop
Cloudera, Inc.
 
Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2
IMC Institute
 

Viewers also liked (15)

Yarns About Yarn
Yarns About YarnYarns About Yarn
Yarns About Yarn
 
The Future of Data
The Future of DataThe Future of Data
The Future of Data
 
Hw09 Cloudera Desktop In Detail
Hw09   Cloudera Desktop In DetailHw09   Cloudera Desktop In Detail
Hw09 Cloudera Desktop In Detail
 
451 Research Impact Report
451 Research Impact Report451 Research Impact Report
451 Research Impact Report
 
Cloudera introduction
Cloudera introductionCloudera introduction
Cloudera introduction
 
Introduction to YARN Apps
Introduction to YARN AppsIntroduction to YARN Apps
Introduction to YARN Apps
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hive
 
Unlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator OptimizerUnlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator Optimizer
 
A beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopA beginners guide to Cloudera Hadoop
A beginners guide to Cloudera Hadoop
 
Hadoop administration using cloudera student lab guidebook
Hadoop administration using cloudera   student lab guidebookHadoop administration using cloudera   student lab guidebook
Hadoop administration using cloudera student lab guidebook
 
Hadoop & Cloudera Workshop
Hadoop & Cloudera WorkshopHadoop & Cloudera Workshop
Hadoop & Cloudera Workshop
 
Cloudera Desktop
Cloudera DesktopCloudera Desktop
Cloudera Desktop
 
Hadoop I/O Analysis
Hadoop I/O AnalysisHadoop I/O Analysis
Hadoop I/O Analysis
 
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopData Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache Hadoop
 
Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2
 

Similar to Spark tuning2016may11bida

Spark Tuning for Enterprise System Administrators By Anya Bida
Spark Tuning for Enterprise System Administrators By Anya BidaSpark Tuning for Enterprise System Administrators By Anya Bida
Spark Tuning for Enterprise System Administrators By Anya Bida
Spark Summit
 
Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...
Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...
Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...
Anya Bida
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at Facebook
Databricks
 
Debugging PySpark - PyCon US 2018
Debugging PySpark -  PyCon US 2018Debugging PySpark -  PyCon US 2018
Debugging PySpark - PyCon US 2018
Holden Karau
 
Spark autotuning talk final
Spark autotuning talk finalSpark autotuning talk final
Spark autotuning talk final
Rachel Warren
 
Spark / Mesos Cluster Optimization
Spark / Mesos Cluster OptimizationSpark / Mesos Cluster Optimization
Spark / Mesos Cluster Optimization
ebiznext
 
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at Datadog
Vadim Semenov
 
Spark Intro by Adform Research
Spark Intro by Adform ResearchSpark Intro by Adform Research
Spark Intro by Adform ResearchVasil Remeniuk
 
Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...
Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...
Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...
confluent
 
Debugging Apache Spark - Scala & Python super happy fun times 2017
Debugging Apache Spark -   Scala & Python super happy fun times 2017Debugging Apache Spark -   Scala & Python super happy fun times 2017
Debugging Apache Spark - Scala & Python super happy fun times 2017
Holden Karau
 
Just enough DevOps for Data Scientists (Part II)
Just enough DevOps for Data Scientists (Part II)Just enough DevOps for Data Scientists (Part II)
Just enough DevOps for Data Scientists (Part II)
Databricks
 
Getting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesGetting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on Kubernetes
Databricks
 
Debugging PySpark - Spark Summit East 2017
Debugging PySpark - Spark Summit East 2017Debugging PySpark - Spark Summit East 2017
Debugging PySpark - Spark Summit East 2017
Holden Karau
 
Debugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden KarauDebugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden Karau
Spark Summit
 
Understanding Spark Tuning: Strata New York
Understanding Spark Tuning: Strata New YorkUnderstanding Spark Tuning: Strata New York
Understanding Spark Tuning: Strata New York
Rachel Warren
 
Spark Autotuning - Strata EU 2018
Spark Autotuning - Strata EU 2018Spark Autotuning - Strata EU 2018
Spark Autotuning - Strata EU 2018
Holden Karau
 
Apache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easierApache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easier
Databricks
 
Debugging Spark: Scala and Python - Super Happy Fun Times @ Data Day Texas 2018
Debugging Spark:  Scala and Python - Super Happy Fun Times @ Data Day Texas 2018Debugging Spark:  Scala and Python - Super Happy Fun Times @ Data Day Texas 2018
Debugging Spark: Scala and Python - Super Happy Fun Times @ Data Day Texas 2018
Holden Karau
 

Similar to Spark tuning2016may11bida (20)

Spark Tuning for Enterprise System Administrators By Anya Bida
Spark Tuning for Enterprise System Administrators By Anya BidaSpark Tuning for Enterprise System Administrators By Anya Bida
Spark Tuning for Enterprise System Administrators By Anya Bida
 
Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...
Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...
Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When ...
 
Spark Meetup
Spark MeetupSpark Meetup
Spark Meetup
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at Facebook
 
Debugging PySpark - PyCon US 2018
Debugging PySpark -  PyCon US 2018Debugging PySpark -  PyCon US 2018
Debugging PySpark - PyCon US 2018
 
Spark autotuning talk final
Spark autotuning talk finalSpark autotuning talk final
Spark autotuning talk final
 
Spark / Mesos Cluster Optimization
Spark / Mesos Cluster OptimizationSpark / Mesos Cluster Optimization
Spark / Mesos Cluster Optimization
 
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at Datadog
 
Spark Intro by Adform Research
Spark Intro by Adform ResearchSpark Intro by Adform Research
Spark Intro by Adform Research
 
NYC_2016_slides
NYC_2016_slidesNYC_2016_slides
NYC_2016_slides
 
Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...
Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...
Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...
 
Debugging Apache Spark - Scala & Python super happy fun times 2017
Debugging Apache Spark -   Scala & Python super happy fun times 2017Debugging Apache Spark -   Scala & Python super happy fun times 2017
Debugging Apache Spark - Scala & Python super happy fun times 2017
 
Just enough DevOps for Data Scientists (Part II)
Just enough DevOps for Data Scientists (Part II)Just enough DevOps for Data Scientists (Part II)
Just enough DevOps for Data Scientists (Part II)
 
Getting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesGetting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on Kubernetes
 
Debugging PySpark - Spark Summit East 2017
Debugging PySpark - Spark Summit East 2017Debugging PySpark - Spark Summit East 2017
Debugging PySpark - Spark Summit East 2017
 
Debugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden KarauDebugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden Karau
 
Understanding Spark Tuning: Strata New York
Understanding Spark Tuning: Strata New YorkUnderstanding Spark Tuning: Strata New York
Understanding Spark Tuning: Strata New York
 
Spark Autotuning - Strata EU 2018
Spark Autotuning - Strata EU 2018Spark Autotuning - Strata EU 2018
Spark Autotuning - Strata EU 2018
 
Apache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easierApache Spark Performance is too hard. Let's make it easier
Apache Spark Performance is too hard. Let's make it easier
 
Debugging Spark: Scala and Python - Super Happy Fun Times @ Data Day Texas 2018
Debugging Spark:  Scala and Python - Super Happy Fun Times @ Data Day Texas 2018Debugging Spark:  Scala and Python - Super Happy Fun Times @ Data Day Texas 2018
Debugging Spark: Scala and Python - Super Happy Fun Times @ Data Day Texas 2018
 

Recently uploaded

一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 

Recently uploaded (20)

一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 

Spark tuning2016may11bida