Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Upcoming SlideShare
Meetup ml spark_ppt
Meetup ml spark_ppt
Loading in …3
×
1 of 29

Better Visibility into Spark Execution for Faster Application Development-(Shivnath Babubabu and Lance Co Ting Keh, Duke)

6

Share

Download to read offline

Presentation at Spark Summit

Better Visibility into Spark Execution for Faster Application Development-(Shivnath Babubabu and Lance Co Ting Keh, Duke)

  1. 1. Spark Application Development Made Fast and Easy Shivnath Babu Lance Co Ting Keh
  2. 2. Lance Co Ting Keh Machine Learning @ Box Distributed ML Infrastructure Go Blue Devils! Shivnath Babu Associate Professor @ Duke Chief Scientist at Unravel Data Sys. R&D in Management of Data Systems
  3. 3. Spark @Box
  4. 4. What’s so great about Spark? From:
  5. 5. https://weminoredinfilm.files.wordpress.com/2014/04/59911951.jpg Complexity
  6. 6. Spark Execution part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 map filter reduceBykey map saveAsTextFile part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 RDD0 RDD1 RDD2 RDD3 RDD4 HDFS sc.textFile(hdfsPath) .map(parseInput) .filter(subThreshold) .reduceByKey(tallyCount) .map(formatOutput) .saveAsTextFile(outPath)
  7. 7. Spark Execution sc.textFile(hdfsPath) .map(parseInput) .filter(subThreshold) .reduceByKey(tallyCount) .map(formatOutput) .saveAsTextFile(outPath) part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 map filter reduceBykey map saveAsTextFile part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 HDFS Stage 0 Stage 1 RDD0 RDD1 RDD2 RDD3 RDD4
  8. 8. part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 map filter reduceBykey map saveAsTextFile part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 RDD0 RDD1 RDD2 RDD3 RDD4 HDFS Stage 0 Stage 1 sc.textFile(hdfsPath) .map(parseInput) .filter(subThreshold) .reduceByKey(tallyCount) .map(formatOutput) .saveAsTextFile(outPath) Spark Execution
  9. 9. Spark Execution part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 map filter reduceBykey map saveAsTextFile part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 Exec0Exec1Exec2Exec3 RDD0 RDD1 RDD2 RDD3 RDD4 HDFS sc.textFile(hdfsPath) .map(parseInput) .filter(subThreshold) .reduceByKey(tallyCount) .map(formatOutput) .saveAsTextFile(outPath) Stage 0 Stage 1
  10. 10. Spark Execution part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 map filter reduceBykey map saveAsTextFile part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 Exec0Exec1Exec2Exec3 RDD0 RDD1 RDD2 RDD3 RDD4 HDFS sc.textFile(hdfsPath) .map(parseInput) .filter(subThreshold) .reduceByKey(tallyCount) .map(formatOutput) .saveAsTextFile(outPath) Stage 0 Stage 1
  11. 11. Spark Execution part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 map filter reduceBykey map saveAsTextFile part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 part-0 part-1 part-2 part-3 Exec0Exec1Exec2Exec3 RDD0 RDD1 RDD2 RDD3 RDD4 HDFS sc.textFile(hdfsPath) .map(parseInput) .filter(subThreshold) .reduceByKey(tallyCount) .map(formatOutput) .saveAsTextFile(outPath) Stage 0 Stage 1
  12. 12. Anything that can go wrong, will go wrong (at some point)
  13. 13. What can go wrong? Failures •  My query failed after 6 hours! •  What does this exception mean?
  14. 14. What can go wrong? •  Failures •  My query failed after 6 hours! •  What does this exception mean? •  Wrong results •  Result of my job looks wrong •  Bad performance •  My app is very slow •  Pipeline is not meeting the 4hr SLA •  Poor scalability •  Oh, but it worked on the dev cluster! •  Bad App(le)s •  Tom’s query brought the cluster down •  Application Problems •  Poor choice of transformations •  Ineffective caching •  Bloated data structures •  Data/Storage Problems •  Skewed data, load imbalance •  Small files, poor data partitioning •  Spark Problems •  Shuffle •  Lazy evaluation causes confusion •  Resource Problems •  Resource contention •  Performance degradation And Why?
  15. 15. How do application developers detect & fix these problems today?
  16. 16. Too much to look at!
  17. 17. Look at Logs? Logs in distributed systems are spread out, incomplete, & usually very difficult to understand
  18. 18. There has to be a better way for application developers to detect & fix problems
  19. 19. Visualize: Show me all relevant data in one place
  20. 20. Optimize: Analyze the data for me and give me diagnoses and fixes
  21. 21. Strategize: Help me prevent the problems from happening and meet my goals
  22. 22. Demo
  23. 23. Visualize: Show me all relevant data in one place
  24. 24. Visualize: Show me all relevant data in one place
  25. 25. Optimize: Analyze the data for me and give me diagnoses and fixes
  26. 26. Strategize: Help me prevent the problems from happening and meet my goals
  27. 27. Strategize: Help me prevent the problems from happening and meet my goals
  28. 28. We are hiring: jobs@unraveldata.com Sign up for early access at: bit.ly/unravelspark

×