2. What does “failure” mean for Spark?
• Spark is a cluster-compute framework
targeted at analytics workloads
• Supported failure modes:
– Transient errors (e.g., network, HDFS outage)
– Worker machine failures
• Unsupported failure modes:
– Systemic exceptions (e.g., bad code, OOMs)
– Driver machine failure
3. What makes a recovery model good?
• A good recovery model should:
– Be simple
– Consistently make progress towards
completion
– Always be in use (“fail constantly”)
4. Outline of this talk
• Spark architecture overview
• Common failures
• Special considerations for fault tolerance
5. Example program
Goal: Find number of names per “first character”
sc.textFile("hdfs:/names")
  .map(name => (name.charAt(0), 1))
  .reduceByKey(_ + _)
  .collect()
Input: うえださいとうえしん
After map: (う, 1) (さ, 1) (う, 1)
After reduceByKey: (う, 2) (さ, 1)
After collect: res0 = [(う,2), (さ,1)]
9. Spark Execution Model
1. Create DAG of RDDs to represent
computation
2. Create logical execution plan for DAG
3. Schedule and execute individual tasks
12. Step 2: Create execution plan
• Pipeline as much as possible
• Split into “stages” based on need to
reorganize data
[Diagram: Stage 1 pipelines HadoopRDD → map() over the input うえださいとうえしん, producing (う, 1) (さ, 1) (う, 1); the shuffle required by reduceByKey() starts Stage 2, which runs reduceByKey() → collect() to yield (う, 2) (さ, 1) and res0 = [(う,2), (さ,1)]]
13. Step 3: Schedule tasks
• Split each stage into tasks
• A task is data + computation
• Execute all tasks within a stage before
moving on
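The stage split above can be mimicked with plain Scala collections. This is only an analogy, not Spark's actual scheduler: narrow transformations fuse into a single pass the way lazy iterators compose, while reduceByKey needs every value for a key and so ends the pipeline.

```scala
// Plain-Scala analogy for stage pipelining (not Spark's real scheduler).
// Iterator.map composes lazily, so reading and map() fuse into one pass
// per partition -- that fused pass is Stage 1.
val names = Iterator("うえだ", "さいとう", "うえしん")
val pairs = names.map(name => (name.charAt(0), 1)) // still lazy, one pass

// reduceByKey must see every pair for a key, so the pipeline has to
// break here; groupBy stands in for the shuffle that starts Stage 2.
val counts = pairs.toSeq
  .groupBy(_._1)
  .map { case (ch, vs) => (ch, vs.map(_._2).sum) }
// counts: Map(う -> 2, さ -> 1)
```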
29. When things go wrong
• Task failure
• Task taking a long time
• Executor failure
30. Task Failure
• Task fails with an exception → retry it
• RDDs are immutable and “stateless”, so
rerunning should have same effect
– Special logic required for tasks which write
data out (atomic rename)
– Statelessness not enforced by programming
model
sc.parallelize(0 until 100).map { x =>
  // Stateful: reads and writes a JVM-wide system property, so a
  // retried task sees leftover state and returns different values
  val myVal = sys.props.getOrElse("foo", "0").toInt + x
  sys.props("foo") = myVal.toString
  myVal
}
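The danger is visible without a cluster. In this plain-Scala sketch, a stateless function gives the same answer on a retry, while a stateful one (like the snippet above) does not:

```scala
// Stateless: output depends only on the input, so a failed attempt
// and its retry produce identical results.
val input = Seq(1, 2, 3)
def statelessTask(xs: Seq[Int]): Seq[Int] = xs.map(_ * 2)
val attempt = statelessTask(input)
val retry   = statelessTask(input)   // same as attempt

// Stateful: shared mutable state survives between runs, so the retry
// computes different values than the original attempt did.
var state = 0
def statefulTask(xs: Seq[Int]): Seq[Int] = xs.map { x => state += x; state }
val firstRun  = statefulTask(input)  // List(1, 3, 6)
val secondRun = statefulTask(input)  // List(7, 9, 12) -- state leaked
```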
32. Speculative Execution
• Try to predict slow or failing tasks, restart
task on a different machine in parallel
• Also assumes immutability and
statelessness
• Enable with spark.speculation=true
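A minimal sketch of enabling this when building a job's configuration; spark.speculation is from the slide, while the multiplier and quantile keys are related tuning knobs shown here at their usual defaults (the app name is illustrative):

```scala
import org.apache.spark.SparkConf

// Sketch: turn on speculative execution for a job.
val conf = new SparkConf()
  .setAppName("speculation-demo")               // illustrative name
  .set("spark.speculation", "true")             // from the slide
  .set("spark.speculation.multiplier", "1.5")   // "slow" = 1.5x median task time
  .set("spark.speculation.quantile", "0.75")    // check once 75% of tasks finish
```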
38. Executor Failure
• Examine tasks run on that executor:
– If task from final stage, we’ve already
collected its results – don’t rerun
– If task from intermediate stage, must rerun.
• May require executing “finished” stage
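The rerun decision can be sketched as a tiny predicate. The types here are a hypothetical model; the real DAGScheduler tracks much more state than this.

```scala
// Hypothetical model of the rerun decision after an executor is lost.
case class Task(stage: Int, resultAtDriver: Boolean)

// A task is safe to skip only if it belonged to the final stage and
// its result already reached the driver. An intermediate task's
// shuffle output lived on the dead executor, so it must rerun.
def mustRerun(task: Task, finalStage: Int): Boolean =
  !(task.stage == finalStage && task.resultAtDriver)
```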
42. Other Failure Scenarios
What happens when:
1. We have a large number of stages?
2. Our input data is not immutable (e.g.
streaming)?
3. Executors had cached data?
43. 1. Dealing with many stages
Problem:
Executor loss causes recomputation of all non-final stages
Solution:
Checkpoint whole RDD to HDFS periodically
[Diagram: a lineage of Stage 1 → … → Stage 7, with an intermediate stage’s RDD written to HDFS so recovery restarts from the checkpoint instead of Stage 1]
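In code, checkpointing is opt-in per RDD. A sketch assuming a live SparkContext `sc`; the directory path is illustrative:

```scala
// Sketch: truncate a long lineage by persisting an RDD to HDFS.
sc.setCheckpointDir("hdfs:/checkpoints")   // where checkpoint files go

val counts = sc.textFile("hdfs:/names")
  .map(name => (name.charAt(0), 1))
  .reduceByKey(_ + _)

counts.checkpoint()   // marked; materialized by the next action
counts.count()        // runs the job and writes the checkpoint, so
                      // later failures recover from HDFS, not Stage 1
```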
45. 2. Dealing with lost input data
Problem:
Input data is consumed when read (e.g.,
streaming), and re-execution is not possible.
Solution:
No general solution today – either use an HDFS
source or implement it yourself.
Spark 1.2 roadmap includes a general solution
which may trade throughput for safety.
46. 3. Loss of cached data
Problem:
Executor loss causes cache to become
incomplete.
Solution:
Do nothing – a task caches data locally while it
runs, causing the cache to stabilize.
47. 3. Loss of cached data
val file = sc.textFile("s3n://").cache() // 8 blocks
for (i <- 0 until 10) {
  file.count()
}
[Diagram: two executors each cache four of the eight blocks during the first count (i = 0), and later iterations read entirely from cache. When an executor is lost, its blocks are re-read from the source and re-cached on the remaining executors over the next iterations, so the cache becomes complete again]
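The “do nothing” recovery can be modeled with a toy cache in plain Scala. Assumptions: 8 input blocks, and each pass caches whatever blocks it had to re-read, as the slides describe.

```scala
import scala.collection.mutable

// Toy model of the diagram above: each count() re-reads any blocks
// missing from the cache and caches them as a side effect, so the
// cache converges back to complete after an executor loss.
val allBlocks = (0 until 8).toSet
val cache     = mutable.Set[Int]() ++ allBlocks   // warm after i = 0

cache --= Set(1, 3, 5, 7)   // executor holding the odd blocks dies

def count(): Int = {
  val missing = allBlocks -- cache   // re-read these from the source
  cache ++= missing                  // tasks cache locally as they run
  allBlocks.size
}

count()   // the next iteration touches every block, re-caching the
          // lost ones -- the cache has stabilized
```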
49. Conclusions
• Spark comes equipped to handle the most
common forms of failure
• Special care must be taken in certain
cases:
– Highly iterative use-cases (checkpointing)
– Streaming (atomic data consumption)
– Violating Spark’s core immutability and
statelessness assumptions