Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Spark Application
Development Made
Fast and Easy
Shivnath Babu
Lance Co Ting Keh
Lance Co Ting Keh
Machine Learning @ Box
Distributed ML Infrastructure
Go Blue Devils!
Shivnath Babu
Associate Professor @...
Spark @Box
What’s so great about Spark?
From:
https://weminoredinfilm.files.wordpress.com/2014/04/59911951.jpg
Complexity
Spark Execution
part-0
part-1
part-2
part-3
part-0
part-1
part-2
part-3
map filter reduceBykey map saveAsTextFile
part-0
p...
Spark Execution
sc.textFile(hdfsPath)
.map(parseInput)
.filter(subThreshold)
.reduceByKey(tallyCount)
.map(formatOutput)
....
part-0
part-1
part-2
part-3
part-0
part-1
part-2
part-3
map filter reduceBykey map saveAsTextFile
part-0
part-1
part-2
par...
Spark Execution
part-0
part-1
part-2
part-3
part-0
part-1
part-2
part-3
map filter reduceBykey map saveAsTextFile
part-0
p...
Spark Execution
part-0
part-1
part-2
part-3
part-0
part-1
part-2
part-3
map filter reduceBykey map saveAsTextFile
part-0
p...
Spark Execution
part-0
part-1
part-2
part-3
part-0
part-1
part-2
part-3
map filter reduceBykey map saveAsTextFile
part-0
p...
Anything that can go wrong,
will go wrong (at some point)
What can go wrong?
Failures
•  My query failed after 6 hours!
•  What does this exception mean?
What can go wrong?
•  Failures
•  My query failed after 6 hours!
•  What does this exception mean?
•  Wrong results
•  Res...
How do application developers
detect & fix these problems today?
Too much to look at!
Look at Logs?
Logs in distributed systems are spread out, incomplete,
& usually very difficult to understand
There has to be a better way for
application developers to
detect & fix problems
Visualize:
Show me all relevant data in one place
Optimize:
Analyze the data for me and give me
diagnoses and fixes
Strategize:
Help me prevent the problems from
happening and meet my goals
Demo
Visualize:
Show me all relevant data in one place
Visualize:
Show me all relevant data in one place
Optimize:
Analyze the data for me and give me diagnoses
and fixes
Strategize:
Help me prevent the problems from happening
and meet my goals
Strategize:
Help me prevent the problems from happening
and meet my goals
We are hiring: jobs@unraveldata.com
Sign up for early access at:
bit.ly/unravelspark
 Better Visibility into Spark Execution for Faster Application Development-(Shivnath Babubabu and Lance Co Ting Keh, Duke)
Upcoming SlideShare
Loading in …5
×

Better Visibility into Spark Execution for Faster Application Development-(Shivnath Babubabu and Lance Co Ting Keh, Duke)

3,158 views

Published on

Presentation at Spark Summit

Published in: Data & Analytics

Better Visibility into Spark Execution for Faster Application Development-(Shivnath Babubabu and Lance Co Ting Keh, Duke)

  1. 1. Spark Application Development Made Fast and Easy Shivnath Babu Lance Co Ting Keh
  2. 2. Lance Co Ting Keh Machine Learning @ Box Distributed ML Infrastructure Go Blue Devils! Shivnath Babu Associate Professor @ Duke Chief Scientist at Unravel Data Sys. R&D in Management of Data Systems
  3. 3. Spark @Box
  4. 4. What’s so great about Spark? From:
  5. 5. https://weminoredinfilm.files.wordpress.com/2014/04/59911951.jpg Complexity
  6. 6. Spark Execution part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 map filter reduceBykey map saveAsTextFile part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 RDD0 RDD1 RDD2 RDD3 RDD4 HDFS sc.textFile(hdfsPath) .map(parseInput) .filter(subThreshold) .reduceByKey(tallyCount) .map(formatOutput) .saveAsTextFile(outPath)
  7. 7. Spark Execution sc.textFile(hdfsPath) .map(parseInput) .filter(subThreshold) .reduceByKey(tallyCount) .map(formatOutput) .saveAsTextFile(outPath) part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 map filter reduceBykey map saveAsTextFile part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 HDFS Stage 0 Stage 1 RDD0 RDD1 RDD2 RDD3 RDD4
  8. 8. part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 map filter reduceBykey map saveAsTextFile part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 RDD0 RDD1 RDD2 RDD3 RDD4 HDFS Stage 0 Stage 1 sc.textFile(hdfsPath) .map(parseInput) .filter(subThreshold) .reduceByKey(tallyCount) .map(formatOutput) .saveAsTextFile(outPath) Spark Execution
  9. 9. Spark Execution part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 map filter reduceBykey map saveAsTextFile part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 Exec0Exec1Exec2Exec3 RDD0 RDD1 RDD2 RDD3 RDD4 HDFS sc.textFile(hdfsPath) .map(parseInput) .filter(subThreshold) .reduceByKey(tallyCount) .map(formatOutput) .saveAsTextFile(outPath) Stage 0 Stage 1
  10. 10. Spark Execution part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 map filter reduceBykey map saveAsTextFile part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 Exec0Exec1Exec2Exec3 RDD0 RDD1 RDD2 RDD3 RDD4 HDFS sc.textFile(hdfsPath) .map(parseInput) .filter(subThreshold) .reduceByKey(tallyCount) .map(formatOutput) .saveAsTextFile(outPath) Stage 0 Stage 1
  11. 11. Spark Execution part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 map filter reduceBykey map saveAsTextFile part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 Exec0Exec1Exec2Exec3 RDD0 RDD1 RDD2 RDD3 RDD4 HDFS sc.textFile(hdfsPath) .map(parseInput) .filter(subThreshold) .reduceByKey(tallyCount) .map(formatOutput) .saveAsTextFile(outPath) Stage 0 Stage 1
  12. 12. Anything that can go wrong, will go wrong (at some point)
  13. 13. What can go wrong? Failures •  My query failed after 6 hours! •  What does this exception mean?
  14. 14. What can go wrong? •  Failures •  My query failed after 6 hours! •  What does this exception mean? •  Wrong results •  Result of my job looks wrong •  Bad performance •  My app is very slow •  Pipeline is not meeting the 4hr SLA •  Poor scalability •  Oh, but it worked on the dev cluster! •  Bad App(le)s •  Tom’s query brought the cluster down •  Application Problems •  Poor choice of transformations •  Ineffective caching •  Bloated data structures •  Data/Storage Problems •  Skewed data, load imbalance •  Small files, poor data partitioning •  Spark Problems •  Shuffle •  Lazy evaluation causes confusion •  Resource Problems •  Resource contention •  Performance degradation And Why?
  15. 15. How do application developers detect & fix these problems today?
  16. 16. Too much to look at!
  17. 17. Look at Logs? Logs in distributed systems are spread out, incomplete, & usually very difficult to understand
  18. 18. There has to be a better way for application developers to detect & fix problems
  19. 19. Visualize: Show me all relevant data in one place
  20. 20. Optimize: Analyze the data for me and give me diagnoses and fixes
  21. 21. Strategize: Help me prevent the problems from happening and meet my goals
  22. 22. Demo
  23. 23. Visualize: Show me all relevant data in one place
  24. 24. Visualize: Show me all relevant data in one place
  25. 25. Optimize: Analyze the data for me and give me diagnoses and fixes
  26. 26. Strategize: Help me prevent the problems from happening and meet my goals
  27. 27. Strategize: Help me prevent the problems from happening and meet my goals
  28. 28. We are hiring: jobs@unraveldata.com Sign up for early access at: bit.ly/unravelspark

×