© Cloudera, Inc. All rights reserved.
Why your Spark Job is Failing
Kostas Sakellis
Me
• Software Engineer at Cloudera
• Contributor to Apache Spark
• Before that, worked on Cloudera Manager
com.esotericsoftware.kryo.KryoException: Unable to find class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$4$$anonfun$apply$3
We go about our day ignoring manholes until…
Courtesy of: http://www.independent.co.uk/incoming/article9127706.ece/binary/original/maholev23.jpg
… something goes wrong.
Courtesy of: http://greenpointers.com/wp-content/uploads/2015/03/Manhole-Explosion1.jpg
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException: For input string: "3.9166,10.2491,-4.0926,-4.4659,0"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
at java.lang.Double.parseDouble(Double.java:540)
at scala.collection.immutable.StringLike
[...]
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
[...]
Job? What now?
Courtesy of: http://calvert.lib.md.us/jobs_pic.jpg
Example
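sc.textFile("hdfs://…", 4) // read the file as 4 partitions
.map(x => x.toInt) // parse each line as an Int
.filter(_ > 10) // keep only values greater than 10
.sum() // action: runs the job and returns the total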
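A plain-Scala sketch of what this job does, with no Spark dependency. It assumes the input has already been split into four in-memory partitions of text lines (the sample data and the names `partitions` and `runTask` are hypothetical): each partition is processed independently by one task running `map` and `filter`, and the `sum()` action combines the per-partition results. Spark's `sum()` returns a `Double`; the sketch keeps `Long`s for exact comparison.

```scala
// Hypothetical input: four partitions of text lines, standing in for the
// four splits that sc.textFile("hdfs://…", 4) would read.
val partitions: Vector[Vector[String]] = Vector(
  Vector("5", "12", "30"),
  Vector("7", "11"),
  Vector("40"),
  Vector("1", "2", "25")
)

// One task per partition: map and filter only that partition,
// then reduce it to a partial sum.
def runTask(lines: Vector[String]): Long =
  lines.map(_.toInt).filter(_ > 10).map(_.toLong).sum

// The action: run every task, then combine the partial results
// (in Spark, the driver performs this final combine).
val partials: Vector[Long] = partitions.map(runTask)
val total: Long = partials.sum
```

Passing `runTask` a partition that contains a line such as `3.9166,10.2491,-4.0926,-4.4659,0` throws a `NumberFormatException`, the same exception type as in the stack trace above.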
Then what the heck is a stage?
Courtesy of: https://writinginadeadworld.files.wordpress.com/2014/03/rock1.jpeg
Partitions
[Diagram: the input file in HDFS is read as four partitions, Partition 1 to Partition 4]
RDDs
[Diagram: each step of the pipeline produces a new RDD (RDD1, RDD2, RDD3), each with Partitions 1 to 4; sum() then combines the partitions of RDD3 into a single Sum]
RDD Lineage
[Diagram: the chain HDFS → RDD1 → RDD2 → RDD3 → Sum, labeled as the Lineage]
RDD Dependencies
[Diagram: each partition depends only on the corresponding partition of its parent, labeled Narrow Dependencies]
• Narrow and Wide Dependencies
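With a narrow dependency, partition i of a child RDD is computed from partition i of its parent alone. A minimal sketch, assuming the lineage is modeled as a hypothetical list of per-partition steps: because every step is narrow, one lost partition can be rebuilt by replaying the lineage on its own source partition, without touching the others. The data and names are illustrative only.

```scala
// Hypothetical model: a lineage is a sequence of narrow, per-partition steps.
type Step = Vector[Int] => Vector[Int]

val lineage: List[Step] = List(
  (part: Vector[Int]) => part.map(_ * 2),    // like rdd.map(...)
  (part: Vector[Int]) => part.filter(_ > 10) // like rdd.filter(...)
)

// Source partitions, standing in for the blocks read from HDFS.
val source: Vector[Vector[Int]] =
  Vector(Vector(1, 6), Vector(8, 2), Vector(9), Vector(3, 7))

// Compute one partition by replaying the lineage on its source partition only.
def computePartition(i: Int): Vector[Int] =
  lineage.foldLeft(source(i))((part, step) => step(part))

val all: Vector[Vector[Int]] = source.indices.toVector.map(computePartition)

// Simulate losing partition 2: only that partition is recomputed.
val recovered: Vector[Int] = computePartition(2)
```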
Wide Dependencies
• Sometimes records need to be grouped together
• Examples
• join
• groupByKey
• Stages created at wide dependency boundaries
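A wide dependency such as `groupByKey` needs every record with the same key in the same output partition, so one output partition may draw from every input partition. The plain-Scala sketch below imitates that shuffle by hash-partitioning keys, similar in spirit to Spark's `HashPartitioner` (a non-negative `key.hashCode` modulo the number of partitions). The `shuffleGroupByKey` helper and the sample data are hypothetical.

```scala
// Non-negative modulo, so keys with negative hash codes still land in [0, n).
def partitionFor(key: Any, numPartitions: Int): Int = {
  val raw = key.hashCode % numPartitions
  if (raw < 0) raw + numPartitions else raw
}

// Hypothetical shuffle: every input partition routes each record to the
// output partition chosen by its key; each output partition then groups by key.
def shuffleGroupByKey[K, V](
    input: Vector[Vector[(K, V)]],
    numPartitions: Int): Vector[Map[K, Vector[V]]] = {
  val routed = input.flatten.groupBy { case (k, _) => partitionFor(k, numPartitions) }
  Vector.tabulate(numPartitions) { p =>
    routed.getOrElse(p, Vector.empty)
      .groupBy(_._1)
      .map { case (k, kvs) => k -> kvs.map(_._2) }
  }
}

val input = Vector(
  Vector("a" -> 1, "b" -> 2),
  Vector("a" -> 3, "c" -> 4),
  Vector("b" -> 5)
)
val grouped = shuffleGroupByKey(input, 2)
```

Output partition 0 ends up holding key `b`, whose records came from input partitions 0 and 2; that cross-partition dependency is what forces a stage boundary.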
A more Interesting Spark Job
val rdd1 = sc.textFile("hdfs://...")
.map(someFunc)
.filter(filterFunc)
val rdd2 = sc.hadoopFile("hdfs://...")
.groupByKey()
.map(someOtherFunc)
val rdd3 = rdd1.join(rdd2)
.map(someFunc)
rdd3.collect()
[Diagram: rdd1 is built by textFile → map → filter, rdd2 by hadoopFile → groupByKey → map, and rdd3 by join → map over the two]
[Diagram: calling rdd3.collect() runs the whole graph; its Wide Dependencies (groupByKey and join) divide it into stages numbered 1 to 4]
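The stage split can be sketched as a toy model: treat each operation as a node, mark each edge narrow or wide, and put nodes connected by narrow edges into the same stage. This is a simplified illustration, not Spark's actual `DAGScheduler`; the node names (`map1`, `map2`, `map3`) are hypothetical labels for the three `map` calls in the job above.

```scala
// An edge from a parent operation to a child, flagged wide if it needs a shuffle.
final case class Edge(parent: String, child: String, wide: Boolean)

val edges = List(
  Edge("textFile", "map1", wide = false),
  Edge("map1", "filter", wide = false),
  Edge("filter", "join", wide = true),
  Edge("hadoopFile", "groupByKey", wide = true),
  Edge("groupByKey", "map2", wide = false),
  Edge("map2", "join", wide = true),
  Edge("join", "map3", wide = false)
)

// Stages = connected components over narrow edges only.
def stages(edges: List[Edge]): Set[Set[String]] = {
  val nodes = edges.flatMap(e => List(e.parent, e.child)).toSet
  val narrowNeighbors: Map[String, Set[String]] =
    edges.filterNot(_.wide)
      .flatMap(e => List(e.parent -> e.child, e.child -> e.parent))
      .groupBy(_._1)
      .map { case (n, pairs) => n -> pairs.map(_._2).toSet }

  // Depth-first search collecting everything reachable through narrow edges.
  def component(start: String): Set[String] = {
    @annotation.tailrec
    def loop(stack: List[String], seen: Set[String]): Set[String] = stack match {
      case Nil => seen
      case n :: rest if seen(n) => loop(rest, seen)
      case n :: rest =>
        loop(narrowNeighbors.getOrElse(n, Set.empty[String]).toList ++ rest, seen + n)
    }
    loop(List(start), Set.empty)
  }

  nodes.map(component)
}

val jobStages = stages(edges)
```

The model finds four stages: `textFile → map → filter`, `hadoopFile` alone (it writes the shuffle that `groupByKey` reads), `groupByKey → map`, and `join → map`.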
Get to the point before I stop caring!
What was the failure?
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException: For input string: "3.9166,10.2491,-4.0926,-4.4659,0"
[...]
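The input string in this exception is a whole comma-separated line, so `parseDouble` was handed five numbers at once. Below is a sketch of one common fix, assuming each line is a CSV record of doubles: split the line before parsing, and wrap parsing in `scala.util.Try` so a malformed record can be skipped or counted instead of failing the task. The `parseRecord` name is hypothetical.

```scala
import scala.util.Try

// Parse a comma-separated line into doubles; None if any field is malformed.
def parseRecord(line: String): Option[Vector[Double]] =
  Try(line.split(",").iterator.map(_.trim.toDouble).toVector).toOption

val line = "3.9166,10.2491,-4.0926,-4.4659,0"

// Parsing the whole line as one number reproduces the failure from the trace.
val naive = Try(line.toDouble)

val parsed = parseRecord(line)
val bad = parseRecord("3.9166,oops")
```

In a Spark job, the same function could be applied with something like `rdd.flatMap(line => parseRecord(line))` so that bad lines are dropped; that call is a suggestion, not code from the talk.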
[Diagram: a stage made up of four tasks]
spark.task.maxFailures=4
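The message "Task 1 in stage 0.0 failed 4 times … Lost task 1.3" reflects this setting: task index 1 was attempted four times (attempts 0 to 3) before the stage was aborted. A toy retry loop, assuming a hypothetical `runTask` function that receives the attempt number, shows why a deterministic problem such as bad input still fails after `spark.task.maxFailures` attempts, while a transient error is absorbed by a retry.

```scala
import scala.util.{Failure, Success, Try}

// Run a task up to maxFailures times; return the result or the last error,
// together with the number of attempts made.
def runWithRetries[A](maxFailures: Int)(runTask: Int => A): (Either[Throwable, A], Int) = {
  require(maxFailures >= 1, "maxFailures must be positive")
  @annotation.tailrec
  def attempt(n: Int): (Either[Throwable, A], Int) =
    Try(runTask(n)) match {
      case Success(value) => (Right(value), n + 1)
      case Failure(err) if n + 1 >= maxFailures => (Left(err), n + 1)
      case Failure(_) => attempt(n + 1)
    }
  attempt(0)
}

// A deterministic bug: the same bad input fails on every attempt.
val (badResult, badAttempts) =
  runWithRetries(4)(_ => "3.9166,10.2491,-4.0926,-4.4659,0".toDouble)

// A transient problem: fails on the first attempt only.
val (okResult, okAttempts) =
  runWithRetries(4)(n => if (n == 0) throw new RuntimeException("lost executor") else 42)
```

Retries help with failures such as a lost executor; they cannot fix bad data, which fails identically on every attempt.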
ERROR executor.Executor: Exception in task ID 2866
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:648)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:706)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:164)
[...]
Spark Architecture
YARN Architecture
[Diagram: Client, Resource Manager, Application Master, and two Node Managers, each hosting two Containers that run Processes]
Spark on YARN Architecture
[Diagram: the same components for a Spark application, built up in two steps: first the Client, Resource Manager, Node Managers, and Containers, then the Application Master]
spark-submit --executor-memory 2g \
--master yarn-client \
--num-executors 2 \
--executor-cores 2
Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.2 GB of 2.1 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container.
[...]
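Memory allocation
[Diagram: yarn.nodemanager.resource.memory-mb is the memory each node offers to YARN containers. An Executor Container holds spark.executor.memory plus spark.yarn.executor.memoryOverhead (7% of the executor memory, 10% in Spark 1.4, with a 384 MB minimum). Inside spark.executor.memory, spark.shuffle.memoryFraction (default 0.2) and spark.storage.memoryFraction (default 0.6) reserve heap for shuffles and for cached data.]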
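As a worked example, assume the Spark 1.4 overhead rule: 10% of `spark.executor.memory`, with a 384 MB floor. The container YARN must grant is the executor memory plus that overhead, and the shuffle and storage regions are fractions of the executor heap. The sketch ignores the extra safety fractions Spark applies inside those regions and any rounding YARN applies to the request; the helper names are hypothetical.

```scala
// Overhead rule assumed here: max(384 MB, factor * executor memory),
// with factor 0.07 before Spark 1.4 and 0.10 from 1.4.
val OverheadMinMb = 384L

def overheadMb(executorMemoryMb: Long, factor: Double): Long =
  math.max((factor * executorMemoryMb).toLong, OverheadMinMb)

// Memory YARN must grant for one executor container.
def containerRequestMb(executorMemoryMb: Long, factor: Double): Long =
  executorMemoryMb + overheadMb(executorMemoryMb, factor)

// Heap regions before any safety fraction is applied.
def shuffleRegionMb(executorMemoryMb: Long, shuffleFraction: Double = 0.2): Long =
  (executorMemoryMb * shuffleFraction).toLong

def storageRegionMb(executorMemoryMb: Long, storageFraction: Double = 0.6): Long =
  (executorMemoryMb * storageFraction).toLong

// --executor-memory 2g, as in the spark-submit example.
val twoGb = 2048L
val request2g = containerRequestMb(twoGb, 0.10)  // 2048 + 384
val request8g = containerRequestMb(8192L, 0.10)  // 8192 + 819
```

Under this rule a 2 GB executor already needs about 2.4 GB from YARN, so when a container is killed for exceeding its physical memory limit, raising `spark.yarn.executor.memoryOverhead` (not only `spark.executor.memory`) is often worth trying.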
Sometimes jobs run slow or even…
Courtesy of: http://blog.sdrock.com/pastors/files/2013/06/time-clock.jpg
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
[...]
GC Stalls
Too much spilling!
Courtesy of: http://tgnp.me/wp-content/uploads/2014/05/spilled-starbucks.jpg
Shuffle Boundaries
[Diagram: the same job graph (textFile → map → filter; hadoopFile → groupByKey → map; join → map) with the Shuffle boundaries marked]
Most performance issues are in shuffles!
Inside a Task: Fetch & Aggregate
[Diagram: fetched shuffle Blocks are deserialized and inserted into an ExternalAppendOnlyMap (key1 -> values, key2 -> values, key3 -> values, key4 -> values); when it grows too large, its contents are sorted and spilled (Sort & Spill)]
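The fetch-and-aggregate step can be sketched as a toy external aggregation: values are combined in an in-memory hash map, and once the map holds more than a threshold number of keys its contents are sorted and spilled; at the end, the sorted runs are merge-combined. This is a simplified stand-in, not `ExternalAppendOnlyMap` itself: spills go to an in-memory buffer instead of disk, the threshold counts keys rather than bytes, and runs are sorted by key rather than by key hash code. The class name and threshold are hypothetical.

```scala
import scala.collection.mutable

// Toy external aggregator (hypothetical): sums Long values per key and
// spills a sorted run whenever more than maxKeysInMemory keys are held.
final class SpillingAggregator(maxKeysInMemory: Int) {
  private val current = mutable.HashMap.empty[String, Long]
  // Each run stands in for a sorted spill file on local disk.
  private val spilledRuns = mutable.ArrayBuffer.empty[Vector[(String, Long)]]

  def spillCount: Int = spilledRuns.size

  def insert(key: String, value: Long): Unit = {
    current(key) = current.getOrElse(key, 0L) + value
    if (current.size > maxKeysInMemory) {
      spilledRuns += current.toVector.sortBy(_._1) // sort, then "spill"
      current.clear()
    }
  }

  // k-way merge of all sorted runs plus the in-memory remainder,
  // combining the values of equal keys as they come out in order.
  def result(): Vector[(String, Long)] = {
    val runs = (spilledRuns :+ current.toVector.sortBy(_._1)).map(_.iterator).toVector
    // Smallest key first (PriorityQueue dequeues the maximum, hence reverse).
    implicit val smallestKeyFirst: Ordering[(String, Long, Int)] =
      Ordering.by[(String, Long, Int), String](_._1).reverse
    val heap = mutable.PriorityQueue.empty[(String, Long, Int)]
    for (i <- runs.indices if runs(i).hasNext) {
      val (k, v) = runs(i).next()
      heap.enqueue((k, v, i))
    }
    val out = mutable.ArrayBuffer.empty[(String, Long)]
    while (heap.nonEmpty) {
      val (k, v, i) = heap.dequeue()
      if (out.nonEmpty && out.last._1 == k) out(out.length - 1) = (k, out.last._2 + v)
      else out += ((k, v))
      if (runs(i).hasNext) {
        val (nk, nv) = runs(i).next()
        heap.enqueue((nk, nv, i))
      }
    }
    out.toVector
  }
}

val words = Vector("b", "a", "c", "a", "d", "b", "e", "a")
val agg = new SpillingAggregator(maxKeysInMemory = 2)
words.foreach(w => agg.insert(w, 1L))
val counts = agg.result()
```

Lowering the threshold (or feeding each task more distinct keys) increases the spill count, which is the "too much spilling" effect; spreading keys over more partitions gives each task fewer keys to hold.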
Inside a Task: Specify partitions
rdd.reduceByKey(reduceFunc, numPartitions=1000)
Why not set partitions to ∞?
Excessive parallelism
• Overwhelming scheduler overhead
• More fetches -> more disk seeks
• Driver needs to track state per task
So how to choose?
• Easy answer:
• Keep multiplying by 1.5 and see what works
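The "keep multiplying by 1.5" advice can be written as a small search, assuming a hypothetical `trialRun` check supplied by the caller (for instance, re-running a sample of the job and checking that it finishes without heavy spilling or GC stalls). The sketch only generates the candidate partition counts and picks the first one that passes.

```scala
// Candidate partition counts: start, then repeatedly 1.5x (rounded up), up to max.
// Longs avoid Int overflow while stepping past max.
def partitionCandidates(start: Int, max: Int): List[Int] = {
  require(start > 0, "start must be positive, or the sequence never grows")
  Iterator.iterate(start.toLong)(n => math.ceil(n * 1.5).toLong)
    .takeWhile(_ <= max)
    .map(_.toInt)
    .toList
}

// Return the first candidate for which the (hypothetical) trial run succeeds.
def firstWorking(start: Int, max: Int)(trialRun: Int => Boolean): Option[Int] =
  partitionCandidates(start, max).find(trialRun)

val candidates = partitionCandidates(200, 1000)
val chosen = firstWorking(200, 1000)(n => n >= 400) // stand-in for a real trial run
```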
Is Spark bad?
Courtesy of: https://theferkel.files.wordpress.com/2015/04/250474-breaking-bad-quotes.jpg
Thank you

Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 

Recently uploaded (20)

Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 

Why your Spark Job is Failing

  • 1. © Cloudera, Inc. All rights reserved. Why your Spark Job is Failing. Kostas Sakellis
  • 2. Me • Software Engineer at Cloudera • Contributor to Apache Spark • Before that, worked on Cloudera Manager
  • 3. com.esotericsoftware.kryo.KryoException: Unable to find class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$4$$anonfun$apply$3
  • 4. We go about our day ignoring manholes until… Courtesy of: http://www.independent.co.uk/incoming/article9127706.ece/binary/original/maholev23.jpg
  • 5. … something goes wrong. Courtesy of: http://greenpointers.com/wp-content/uploads/2015/03/Manhole-Explosion1.jpg
  • 6. org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException: For input string: "3.9166,10.2491,-4.0926,-4.4659,0" at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250) at java.lang.Double.parseDouble(Double.java:540) at scala.collection.immutable.StringLike [...] Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192) [...]
  • 8. Job? What now? Courtesy of: http://calvert.lib.md.us/jobs_pic.jpg
  • 9. Example: sc.textFile("hdfs://…", 4).map((x) => x.toInt).filter(_ > 10).sum()
  • 12. Then what the heck is a stage? Courtesy of: https://writinginadeadworld.files.wordpress.com/2014/03/rock1.jpeg
  • 13. Partitions: sc.textFile("hdfs://…", 4).map((x) => x.toInt).filter(_ > 10).sum() [Figure: the HDFS input split into Partitions 1 to 4]
  • 14. RDDs: sc.textFile("hdfs://…", 4).map((x) => x.toInt).filter(_ > 10).sum() [Figure: HDFS → RDD1 (Partitions 1 to 4)]
  • 15. RDDs: sc.textFile("hdfs://…", 4).map((x) => x.toInt).filter(_ > 10).sum() [Figure: HDFS → RDD1 → RDD2, each with Partitions 1 to 4]
  • 16. RDDs: sc.textFile("hdfs://…", 4).map((x) => x.toInt).filter(_ > 10).sum() [Figure: HDFS → RDD1 → RDD2 → RDD3, each with Partitions 1 to 4]
  • 17. RDDs: sc.textFile("hdfs://…", 4).map((x) => x.toInt).filter(_ > 10).sum() [Figure: HDFS → RDD1 → RDD2 → RDD3 (Partitions 1 to 4 each) → Sum]
  • 18. RDD Lineage: sc.textFile("hdfs://…", 4).map((x) => x.toInt).filter(_ > 10).sum() [Figure: HDFS → RDD1 → RDD2 → RDD3 (Partitions 1 to 4 each) → Sum, with the chain labeled Lineage]
  • 19. RDD Dependencies [Figure: HDFS → RDD1 → RDD2 → RDD3 (Partitions 1 to 4 each) → Sum, with the links labeled Narrow Dependencies] • Narrow and Wide Dependencies
  • 20. Wide Dependencies • Sometimes records need to be grouped together • Examples: join, groupByKey • Stages are created at wide dependency boundaries
  • 21. A more interesting Spark job: val rdd1 = sc.textFile("hdfs://...").map(someFunc).filter(filterFunc); val rdd2 = sc.hadoopFile("hdfs://...").groupByKey().map(someOtherFunc); val rdd3 = rdd1.join(rdd2).map(someFunc); rdd3.collect()
  • 22. A more interesting Spark job: val rdd1 = sc.textFile("hdfs://...").map(someFunc).filter(filterFunc) [Figure: textFile → map → filter]
  • 23. A more interesting Spark job: val rdd2 = sc.hadoopFile("hdfs://...").groupByKey().map(someOtherFunc) [Figure: hadoopFile → groupByKey → map]
  • 24. A more interesting Spark job: val rdd3 = rdd1.join(rdd2).map(someFunc) [Figure: join → map]
  • 25. A more interesting Spark job: rdd3.collect() [Figure: textFile → map → filter and hadoopFile → groupByKey → map, both feeding join → map; the wide dependencies mark the boundaries of stages 1 to 4]
  • 26. Get to the point before I stop caring!
  • 27. org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException: For input string: "3.9166,10.2491,-4.0926,-4.4659,0" at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250) at java.lang.Double.parseDouble(Double.java:540) at scala.collection.immutable.StringLike [...] Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192) [...]
  • 28. What was the failure? org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException: For input string: "3.9166,10.2491,-4.0926,-4.4659,0" [...]
  • 29. What was the failure? [Figure: a Stage made up of four Tasks]
  • 31. What was the failure? [Figure: a Stage made up of four Tasks] spark.task.maxFailures=4
  • 32. org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException: For input string: "3.9166,10.2491,-4.0926,-4.4659,0" at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250) at java.lang.Double.parseDouble(Double.java:540) at scala.collection.immutable.StringLike [...] Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192) [...]
  • 34. ERROR executor.Executor: Exception in task ID 2866 java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:648) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:706) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209) at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173) at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206) at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45) at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:164) [...]
  • 35. Spark Architecture
  • 36. YARN Architecture [Figure: Client, Resource Manager, two Node Managers each hosting two Containers, an Application Master, and two Processes]
  • 37. Spark on YARN Architecture [Figure: Client, Resource Manager, and two Node Managers each hosting two Containers, with two Processes]
  • 38. Spark on YARN Architecture [Figure: Client, Resource Manager, two Node Managers each hosting two Containers, an Application Master, and two Processes]
  • 39. spark-submit --executor-memory 2g --master yarn-client --num-executors 2 --executor-cores 2
  • 40. Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.2 GB of 2.1 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container. [...]
  • 42. spark-submit --executor-memory 2g --master yarn-client --num-executors 2 --executor-cores 2
  • 43. Memory allocation [Figure: yarn.nodemanager.resource.memory-mb contains the executor container, which holds spark.yarn.executor.memoryOverhead (7% of executor memory; 10% in 1.4) and spark.executor.memory, with spark.shuffle.memoryFraction (0.4) and spark.storage.memoryFraction (0.6)]
  • 44. Sometimes jobs run slow or even… Courtesy of: http://blog.sdrock.com/pastors/files/2013/06/time-clock.jpg
  • 45. java.lang.OutOfMemoryError: GC overhead limit exceeded at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) [...]
  • 46. GC Stalls
  • 47. Too much spilling! Courtesy of: http://tgnp.me/wp-content/uploads/2014/05/spilled-starbucks.jpg
  • 48. Shuffle Boundaries [Figure: textFile → map → filter and hadoopFile → groupByKey → map, both feeding join → map, with the shuffles marked]
  • 49. Most performance issues are in shuffles!
  • 50. Inside a Task: Fetch & Aggregate [Figure: fetched Blocks are deserialized and inserted into an ExternalAppendOnlyMap (key1 -> values, key2 -> values, key3 -> values, key4 -> values), which is sorted and spilled (Sort & Spill) as key1 -> values, key2 -> values, key3 -> values]
  • 51. 51© Cloudera, Inc. All rights reserved. rdd.reduceByKey(reduceFunc, numPartitions=1000) Inside a Task: Specify partitions
  • 52. Why not set partitions to ∞?
  • 53. Excessive parallelism • Overwhelming scheduler overhead • More fetches -> more disk seeks • Driver needs to track state per task
  • 54. So how to choose? • Easy answer: • Keep multiplying by 1.5 and see what works
  • 55. Is Spark bad? Courtesy of: https://theferkel.files.wordpress.com/2015/04/250474-breaking-bad-quotes.jpg
  • 56. Thank you
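The memory allocation on slide 43 can be turned into simple arithmetic for sizing executors. This is a minimal sketch, assuming the defaults named on the slide (overhead of 7% of spark.executor.memory, 10% from Spark 1.4) and the 384 MB overhead floor; memory_overhead_mb, container_request_mb and executors_per_node are hypothetical helpers, not Spark APIs, and the 24576 MB node size is invented for illustration. YARN's scheduler may also round each request up to its allocation increment, which the sketch ignores. Slide 40's container was killed because its process used more physical memory than its allocation, and that allocation is roughly executor memory plus overhead.

```python
# Hypothetical sketch of Spark-on-YARN executor container sizing.
# Defaults follow slide 43: overhead = 7% of executor memory (10% from
# Spark 1.4), never less than 384 MB. YARN rounding is ignored here.

MIN_OVERHEAD_MB = 384


def memory_overhead_mb(executor_memory_mb: int, overhead_fraction: float = 0.07) -> int:
    """Off-heap headroom requested on top of the executor heap."""
    return max(int(executor_memory_mb * overhead_fraction), MIN_OVERHEAD_MB)


def container_request_mb(executor_memory_mb: int, overhead_fraction: float = 0.07) -> int:
    """Total memory requested from YARN for one executor container."""
    return executor_memory_mb + memory_overhead_mb(executor_memory_mb, overhead_fraction)


def executors_per_node(node_memory_mb: int, executor_memory_mb: int,
                       overhead_fraction: float = 0.07) -> int:
    """How many such containers fit in yarn.nodemanager.resource.memory-mb."""
    return node_memory_mb // container_request_mb(executor_memory_mb, overhead_fraction)


if __name__ == "__main__":
    # --executor-memory 2g, as in the spark-submit command on slide 42
    print(container_request_mb(2048))
    # A larger executor under the Spark 1.4 default of 10%
    print(container_request_mb(8192, 0.10))
    # Hypothetical node offering 24 GB to YARN containers
    print(executors_per_node(24576, 8192))
```

With --executor-memory 2g the 7% overhead (143 MB) falls below the floor, so the request becomes 2048 + 384 = 2432 MB; the floor dominates for small executors, while the percentage dominates for large ones.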
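Slide 54's "keep multiplying by 1.5 and see what works" answer can be written as a search loop. This is a sketch under assumptions: find_partition_count and job_succeeds are hypothetical names, job_succeeds stands in for actually re-running the job with a new numPartitions value (as in the reduceByKey call on slide 51), and the memory model in the usage example is invented purely for illustration. The upper bound reflects slide 53: parallelism is not free, so the search should stop somewhere.

```python
import math
from typing import Callable, Optional


def find_partition_count(start: int,
                         max_partitions: int,
                         job_succeeds: Callable[[int], bool],
                         factor: float = 1.5) -> Optional[int]:
    """Grow the partition count by `factor` until `job_succeeds(n)` is True.

    Returns None if no count up to `max_partitions` works, since excessive
    parallelism brings its own costs (scheduler overhead, more fetches,
    per-task state on the driver).
    """
    n = start
    while n <= max_partitions:
        if job_succeeds(n):
            return n
        # Always make progress, even for tiny counts (1 * 1.5 rounds up to 2).
        n = max(n + 1, math.ceil(n * factor))
    return None


def fits_in_memory(num_partitions: int) -> bool:
    """Hypothetical model: 10,000 MB of shuffle data, 16 MB budget per task."""
    return 10_000 / num_partitions <= 16


if __name__ == "__main__":
    # Tries 200, 300, 450 and 675 partitions in turn.
    print(find_partition_count(200, 5_000, fits_in_memory))
```

Multiplying rather than adding keeps the number of trial runs small: going from 200 to well over 1000 partitions takes only a handful of steps.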

Editor's Notes

  1. Early on, a colleague of ours sent us this exception (this is truncated). This talk is going to be about these kinds of errors you sometimes get when running…
  2. This is probably the most common failure you’re going to see. First of all, in this case, the punchline is going to be that the problem is your fault. But second of all, what does all this other stuff mean, and why is Spark telling me this in this way?
  4. Let’s start with an example program in Spark.
  5. The sum() call launches a job.
  7. A partition is a chunk of data somewhere. It could be on the Hadoop Distributed File System (HDFS) or cached in Spark. Partitions define the degree of parallelism.
  8. An RDD describes a way of generating input and output partitions. RDDs are immutable, which is very important! RDDs can depend on other RDDs: most have a single parent, while joins have multiple parents. Spark relies on lineage rather than replication for fault tolerance: https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
  13. Narrow transformations: map, filter. Wide transformations: join, groupByKey.
  14. More details: a job is a DAG of stages. The scheduler creates a set of tasks per stage, and each partition is assigned to a task.
  16. Talk about the relationship between stages and tasks.
  17. So with all this information in hand, we can come back and interpret this error. We tried to run a job, but it failed because one of its stages failed. Why did that stage fail? Because one of its tasks failed. Failed tasks are retried (up to 4 attempts by default, set by spark.task.maxFailures), and the job is aborted only when one task has used up all its attempts. Mercifully, Spark gives us the exception that caused the most recent failure.
  18. Mercifully, Spark gives us the exception that caused the most recent failure.
  20. Let’s review the general Spark architecture. The driver is where the DAG scheduler lives; it drives the show and is a single point of failure. Executors communicate with the driver and run the tasks the driver creates; think of an executor as a ThreadPoolExecutor in Java. Cluster managers are pluggable: YARN, Mesos, or standalone. ----- Meeting Notes (6/10/15 14:57) -----
  21. Let’s review the general Spark architecture.
  24. This shows up in the YARN NodeManager logs.
  25. Mention that this is what happens with a groupByKey or reduceByKey. Show blocks being deserialized into Java objects and placed into the map, then show the spill. With fewer tasks, more of these blocks have to go to the same reducer, and more data needs to be held in this map.
  26. No. Distributed systems are complicated.
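Notes 13 and 14 describe how narrow and wide transformations determine a job's stages. As a simplified sketch, assuming a single linear chain of operations (real jobs are DAGs, and joins have multiple parents) and a hard-coded set of wide operations, a job can be cut into stages wherever a shuffle is needed. split_into_stages and WIDE_OPS are illustrative names, not part of Spark's DAGScheduler.

```python
from typing import List

# Hypothetical classification following note 13: map and filter are narrow,
# while join, groupByKey and reduceByKey require a shuffle.
WIDE_OPS = {"groupByKey", "reduceByKey", "join"}


def split_into_stages(ops: List[str]) -> List[List[str]]:
    """Cut a linear chain of RDD operations into stages at each wide op.

    Narrow ops are pipelined into the current stage; a wide op needs a
    shuffle, so it starts a new stage.
    """
    stages: List[List[str]] = []
    current: List[str] = []
    for op in ops:
        if op in WIDE_OPS and current:
            stages.append(current)
            current = []
        current.append(op)
    if current:
        stages.append(current)
    return stages


if __name__ == "__main__":
    chain = ["textFile", "map", "filter", "groupByKey", "map"]
    for i, stage in enumerate(split_into_stages(chain)):
        print(f"stage {i}: {' -> '.join(stage)}")
```

Each resulting stage would then get one task per partition, which is why the error messages talk about "task 1 in stage 0.0".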
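Note 17 explains the shape of the "Job aborted due to stage failure" message: the job fails because a stage fails, and the stage fails because one task used up its retries. The sketch below models that retry loop under stated assumptions: run_task and JobAborted are hypothetical names, the limit of 4 attempts mirrors the default of spark.task.maxFailures, and the message only imitates Spark's "Lost task 1.3" wording (task index 1, attempt 3). It also shows why retries cannot rescue a deterministic bug such as parsing a whole CSV line as one number: every attempt fails the same way.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class JobAborted(Exception):
    """Hypothetical stand-in for Spark's 'Job aborted due to stage failure'."""


def run_task(task_index: int, task: Callable[[int], T], max_failures: int = 4) -> T:
    """Call task(attempt) until it succeeds or has failed max_failures times.

    max_failures=4 mirrors the default of spark.task.maxFailures.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_failures):
        try:
            return task(attempt)
        except Exception as err:  # any task failure triggers a retry
            last_error = err
    # Attempts are numbered from 0, so the last one is max_failures - 1;
    # with the default of 4 that gives "Lost task 1.3" for task index 1.
    raise JobAborted(
        f"Task {task_index} failed {max_failures} times, most recent failure: "
        f"Lost task {task_index}.{max_failures - 1}: {last_error!r}"
    ) from last_error


if __name__ == "__main__":
    bad_line = "3.9166,10.2491,-4.0926,-4.4659,0"
    try:
        # Same input on every attempt, so every attempt fails identically.
        run_task(1, lambda attempt: float(bad_line))
    except JobAborted as err:
        print(err)
```

A transient failure (a lost executor, say) would succeed on a later attempt; a parsing bug like this one surfaces only as the "most recent failure" at the end of the message.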
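Slide 50 and note 25 describe what a reduce-side task does in a groupByKey or reduceByKey: deserialize fetched blocks into an in-memory map, then sort and spill that map when it grows too large. The following is a simplified sketch of the idea, not Spark's ExternalAppendOnlyMap: spilled runs are kept in Python lists instead of disk files, they are ordered by key rather than by Spark's hash-based ordering, and max_in_memory_keys is a hypothetical stand-in for the memory budget that spark.shuffle.memoryFraction controls.

```python
import heapq
import itertools
from typing import Dict, Iterable, List, Tuple


def aggregate_with_spill(records: Iterable[Tuple[str, int]],
                         max_in_memory_keys: int) -> Tuple[Dict[str, List[int]], int]:
    """groupByKey-style aggregation that spills sorted runs when the map is full.

    Returns the grouped values and the number of spills that occurred.
    """
    current: Dict[str, List[int]] = {}
    spills: List[List[Tuple[str, List[int]]]] = []
    for key, value in records:
        current.setdefault(key, []).append(value)
        if len(current) >= max_in_memory_keys:
            # "Sort & Spill": write the map out as a sorted run and start over.
            spills.append(sorted(current.items()))
            current = {}
    runs = spills + [sorted(current.items())]
    # Merge the sorted runs; equal keys end up adjacent, so combine them.
    merged = heapq.merge(*runs, key=lambda kv: kv[0])
    result: Dict[str, List[int]] = {}
    for key, group in itertools.groupby(merged, key=lambda kv: kv[0]):
        result[key] = [v for _, values in group for v in values]
    return result, len(spills)


if __name__ == "__main__":
    records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
    for budget in (2, 10):
        grouped, num_spills = aggregate_with_spill(records, budget)
        print(budget, grouped, num_spills)
```

Shrinking the budget relative to the data a task receives (equivalently, sending more data to each of fewer tasks) increases the spill count: for the five sample records, a budget of 2 keys causes 2 spills, while a budget of 10 causes none, and the grouped result is the same either way.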