
Why your Spark job is failing


Published in: Data & Analytics


  1. ● Data science at Cloudera ● Recently led Apache Spark development at Cloudera ● Before that, a committer on Apache YARN and MapReduce ● Hadoop project management committee
  2. com.esotericsoftware.kryo.KryoException: Unable to find class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$4$$anonfun$apply$3
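
The $iwC$$iwC... names are wrapper classes that the spark-shell REPL generates around each interpreted line, so this error generally means an executor tried to deserialize a task closure and could not load a class defined on the driver side. For a standalone app, the analogous failure is usually a jar that never reached the executors; a minimal sketch, with a hypothetical jar path and app name:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hedged sketch: ship the jar containing your closure classes to the
    // executors so deserialization can find them.
    val conf = new SparkConf()
      .setAppName("closure-classes")        // hypothetical
      .setJars(Seq("/path/to/my-app.jar"))  // hypothetical path
    val sc = new SparkContext(conf)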
  3. org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: Exception failure in TID 6 on host bottou02-10g.pa.cloudera.com: java.lang.ArithmeticException: / by zero $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply$mcII$sp(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) scala.collection.Iterator$$anon$11.next(Iterator.scala:328) org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016) [...] Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015) [...]
  4. org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: Exception failure in TID 6 on host bottou02-10g.pa.cloudera.com: java.lang.ArithmeticException: / by zero $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply$mcII$sp(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) scala.collection.Iterator$$anon$11.next(Iterator.scala:328) org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016) [...]
  5. org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: Exception failure in TID 6 on host bottou02-10g.pa.cloudera.com: java.lang.ArithmeticException: / by zero $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply$mcII$sp(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) scala.collection.Iterator$$anon$11.next(Iterator.scala:328) org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016) [...]
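
Slides 3-5 repeat the same trace, presumably highlighting different parts of it in the original deck: the root cause (an ArithmeticException thrown inside a user closure) sits in the executor-side frames at the top, while the "Driver stacktrace" section only records the DAGScheduler giving up after four task attempts. A minimal sketch of the failure and a guard, using hypothetical data:

    val nums = sc.parallelize(Seq(1, 2, 0, 4))
    // nums.map(100 / _).count()              // one task throws: / by zero
    nums.filter(_ != 0).map(100 / _).count()  // drop bad records before dividing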
  6. val file = sc.textFile("hdfs://...") file.filter(_.startsWith("banana")).count()
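
Standard Spark semantics worth spelling out here: textFile and filter are lazy transformations that only record lineage, and the count() action is what actually launches the job, so that is where errors from the whole pipeline surface. The same snippet, annotated:

    val file = sc.textFile("hdfs://...")               // lazy: lineage only
    val bananas = file.filter(_.startsWith("banana"))  // still lazy
    bananas.count()                                    // action: runs the job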
  7. [Diagram: a Job is split into Stages, and each Stage into parallel Tasks]
  8. val rdd1 = sc.textFile("hdfs://...").map(someFunc).filter(filterFunc) [Diagram: textFile → map → filter]
  9. val rdd2 = sc.hadoopFile("hdfs://...").groupByKey().map(someOtherFunc) [Diagram: hadoopFile → groupByKey → map]
  10. val rdd3 = rdd1.join(rdd2).map(someFunc) [Diagram: join → map]
  11. rdd3.collect()
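
collect() is the action that forces the whole lineage from slides 8-10 to run, and it also hauls every result record back to the driver, which is its own failure mode on large outputs. A hedged sketch of bounded alternatives:

    val sample = rdd3.take(10)          // only 10 records reach the driver
    rdd3.saveAsTextFile("hdfs://...")   // or write out in parallel instead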
  12. [Diagram: the combined lineage — textFile → map → filter and hadoopFile → groupByKey → map, feeding join → map]
  13. [Same lineage diagram, repeated]
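
Spark cuts this graph into stages at the shuffle edges (groupByKey and join here); the narrow transformations in between are pipelined into a single stage. You can inspect the planned structure directly:

    // toDebugString prints the RDD lineage, indented by stage.
    println(rdd3.toDebugString)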
  14. [Diagram: a Stage and its parallel Tasks]
  15. [Same Stage/Task diagram, repeated]
  16. [Same Stage/Task diagram, repeated]
  17. org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: Exception failure in TID 6 on host bottou02-10g.pa.cloudera.com: java.lang.ArithmeticException: / by zero $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply$mcII$sp(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13) scala.collection.Iterator$$anon$11.next(Iterator.scala:328) org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016) [...]
  18. 14/04/22 11:59:58 ERROR executor.Executor: Exception in task ID 2866 java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:648) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:706) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209) at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173) at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206) at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45) at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:164) at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:149) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:27) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388) at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102) at org.apache.spark.scheduler.Task.run(Task.scala:53) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724)
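
A hedged note on "Filesystem closed": Hadoop caches FileSystem instances, so one task (or a shutdown hook) closing the shared HDFS client can break every other task still reading through it on that executor. One workaround sometimes suggested, at the cost of extra client instances, is disabling the cache; spark.hadoop.* properties are forwarded into the Hadoop configuration:

    // conf is a SparkConf, as on slide 36. Assumption: the cached client is
    // being closed out from under running tasks.
    conf.set("spark.hadoop.fs.hdfs.impl.disable.cache", "true")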
  19. [Diagram: YARN architecture — a Client talks to the ResourceManager, which runs an Application Master and other Containers on NodeManagers]
  20. [Diagram: the same YARN layout with a Map Task and a Reduce Task running in containers]
  21. [Same YARN diagram, repeated]
  22. [Same YARN diagram, repeated]
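
Mapping slides 19-22 onto code: the client submits the application, the ResourceManager starts an ApplicationMaster container, and the AM asks NodeManagers for executor containers. A minimal Spark-1.x-era sketch (master URLs from that era):

    import org.apache.spark.{SparkConf, SparkContext}

    // "yarn-client" keeps the driver on the submitting machine;
    // "yarn-cluster" runs it inside the ApplicationMaster.
    val conf = new SparkConf().setAppName("on-yarn").setMaster("yarn-client")
    val sc = new SparkContext(conf)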
  23. Container [pid=63375, containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container.
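
This message means YARN killed the container because total process memory (JVM heap plus off-heap allocations, thread stacks, and so on) exceeded the container's limit, not that the heap itself overflowed. The usual remedy is a bigger overhead cushion; values here are illustrative:

    // conf is a SparkConf, as on slide 36.
    conf.set("spark.executor.memory", "2g")
    conf.set("spark.yarn.executor.memoryOverhead", "768")  // MB in Spark 1.x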
  24. [Diagram: executor memory layout — yarn.nodemanager.resource.memory-mb bounds the executor container, which holds spark.yarn.executor.memoryOverhead plus spark.executor.memory; spark.shuffle.memoryFraction and spark.storage.memoryFraction carve up the latter]
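
Rough arithmetic behind that layout, using Spark-1.x-era defaults (hedged, the defaults shifted across releases):

    val executorMem = 2048L                      // MB, illustrative
    val overhead    = 384L                       // MB, early-1.x default (hedged)
    val containerMB = executorMem + overhead     // must fit under yarn.nodemanager.resource.memory-mb
    val shuffleMB   = (executorMem * 0.2).toLong // spark.shuffle.memoryFraction default 0.2
    val storageMB   = (executorMem * 0.6).toLong // spark.storage.memoryFraction default 0.6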
  25. [Diagram: shuffle blocks are deserialized into an ExternalAppendOnlyMap]
  26. [Diagram: the ExternalAppendOnlyMap accumulates key → values entries]
  27. [Same ExternalAppendOnlyMap diagram, repeated]
  28. [Diagram: when the ExternalAppendOnlyMap fills, its key → values entries are sorted and spilled to disk]
  29. rdd.reduceByKey(reduceFunc, numPartitions=1000)
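
That snippet is the Python API; raising numPartitions means each reduce task builds a smaller ExternalAppendOnlyMap and spills less. A Scala equivalent with hypothetical data:

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    val reduced = pairs.reduceByKey(_ + _, 1000)  // 1000 reduce-side partitions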
  30. java.io.FileNotFoundException: /dn6/spark/local/spark-local-20140610134115-2cee/30/merged_shuffle_0_368_14 (Too many open files)
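
Hash-based shuffle can hold open on the order of (concurrent map tasks × reduce partitions) files per machine, which overruns the default file-descriptor ulimit. Besides raising the OS limit, the 1.x era offered shuffle-file consolidation:

    // conf is a SparkConf, as on slide 36.
    conf.set("spark.shuffle.consolidateFiles", "true")  // reuse files across map tasks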
  31. [Diagram: map-side Tasks write key → values shuffle output that reduce-side Tasks read ("write stuff out")]
  32. [Same shuffle-write diagram, repeated]
  33. [Diagram: hash shuffle — incoming Records are appended to a separate file per partition (Partition 1/2/3 files)]
  34. [Diagram: records accumulate in an in-memory Buffer]
  35. [Diagram: sort shuffle — the Buffer is sorted and spilled so Partition 1/2/3 records land in a single file, plus an index file]
  36. conf.set("spark.shuffle.manager", "sort")
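
The sort-based shuffle from slide 35 writes a single partition-ordered file plus an index per map task, which bounds the open-file count. It arrived in Spark 1.1 and became the default in 1.2, so the explicit setting only matters on older versions.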
  37. ● No ● Distributed systems are complicated
