SlideShare a Scribd company logo
1© Cloudera, Inc. All rights reserved.
Neelesh Srinivas Salian
Breaking Spark
Top 5 Mistakes to avoid when
using Apache Spark in
Production
2© Cloudera, Inc. All rights reserved.
Outline
• Introduction to Spark
• Information gathering
• Classification of the Issues
• Description
• Remedies
• Q&A
3© Cloudera, Inc. All rights reserved.
4© Cloudera, Inc. All rights reserved.
Information Gathering
Regular Spark Applications and
memory configuration problems
Problems in the Ingest pipelines
Problems in interacting with
Kafka
Usually Resource Management
issues
Spark Core Spark Streaming Spark on YARN
Source:
• Cloudera Customer Opened Cases
• Posts in Cloudera Community Forums
• Spark use cases for the following:
5© Cloudera, Inc. All rights reserved.
Classification
1. Scaling of the Architecture
2. Memory Configurations
3. End user Code
4. Incompatible Dependencies
5. Administration/Operation related issues
6© Cloudera, Inc. All rights reserved.
1. Increase in the cluster Size
2. Utilization of the Cluster
1. Scaling of the Architecture
7© Cloudera, Inc. All rights reserved.
1. Understand the initial setup and current one
2. Look at logs / console to find the error
3. Determine parameters
4. Tune away
1. Scale of the Architecture: Larger Datasets
8© Cloudera, Inc. All rights reserved.
YARN
Remember:
Container Memory
• yarn.nodemanager.resource.memory-mb
Container Memory Maximum
• yarn.scheduler.maximum-allocation-mb
Container Memory Minimum
• yarn.scheduler.minimum-allocation-mb
2. Memory Configurations
Standalone
Java Heap Size of Worker in Bytes
• worker_max_heapsize
Java Heap Size of Master in Bytes
• master_max_heapsize
9© Cloudera, Inc. All rights reserved.
Symptoms:
Executor error: ERROR
util.SparkUncaughtExceptionHandler: Uncaught
exception in thread Thread[Executor task launch worker-
0,5,main]
java.lang.OutOfMemoryError: Java heap space
Driver error: ERROR actor.ActorSystemImpl: Uncaught
fatal error from thread [sparkDriver-akka.actor.default-
dispatcher-60] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
Cause:
The driver or executor is configured
with a memory limit that is too low for
the application being run.
2. Memory Configurations: Insufficient Memory
10© Cloudera, Inc. All rights reserved.
1. Understand Spark Configuration Order of Precedence
1. Spark configuration file: Parameters are passed to SparkConf.
2. Command-line arguments are passed to spark-submit, spark-shell, or pyspark.
3. Cloudera Manager UI properties set in the spark-defaults.conf configuration file.
2. Verify the Driver and Executor Memory Used
1. Environment tab in History Server
2. Check driver and executor memory usage values
3. Increase the Heap
Caveat: do not set the spark.driver.memory config using SparkConf , use - - driver-memory
• spark.driver.memory
• spark.executor.memory
2. Memory Configurations: Insufficient Memory
11© Cloudera, Inc. All rights reserved.
1. Why chose Spark code over
MapReduce?
1. Expensive Operations
1. Syntax / Command Errors
3. End User Code
12© Cloudera, Inc. All rights reserved.
1. Incorrectly built jars
2. Version mismatch
4. Incompatible Dependencies
13© Cloudera, Inc. All rights reserved.
Symptoms
Error: Could not find or load main class
org.apache.spark.deploy.yarn.Application
Master
java.lang.ClassNotFoundException:
org.apache.hadoop.mapreduce.MRJobCon
fig
java.lang.NoClassDefFoundError:
org/apache/hadoop/fs/FSDataInputStrea
m
ImportError: No module named
qt_returns_summary_wrapper
4. Incompatible Dependencies: Classpath Issues
14© Cloudera, Inc. All rights reserved.
What to Do?
1. Verify if an application is using
classes that are not part of the
framework (Hadoop/Spark)
1. Verify that an application is built
using correct artifacts.
4. Incompatible Dependencies: Classpath Issues
15© Cloudera, Inc. All rights reserved.
1. Management of Cluster
2. User-related issues
3. Security/Permissions
4. No Configurations setup
5. Administration/Operation Related issues
16© Cloudera, Inc. All rights reserved.
• Learn from the Mistakes
• Be prepared
• References:
• Tuning Spark Blog Cloudera
• Tuning YARN Cloudera
• Spark Documentation
Summary
17© Cloudera, Inc. All rights reserved.
• Wilfred
• Paula
• Bjorn
• Mark
• Anand
• Sean
• Attila
Credits
18© Cloudera, Inc. All rights reserved.
Thank you
Neelesh Srinivas Salian
LinkedIn
Email: neeleshssalian@gmail.com
Twitter: NeelS7

More Related Content

What's hot

Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
hadooparchbook
 
Why Your Apache Spark Job is Failing
Why Your Apache Spark Job is FailingWhy Your Apache Spark Job is Failing
Why Your Apache Spark Job is Failing
Cloudera, Inc.
 
Spark tuning
Spark tuningSpark tuning
Cloudera Impala
Cloudera ImpalaCloudera Impala
Cloudera Impala
Alex Moundalexis
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
Jen Aman
 
Compression talk
Compression talkCompression talk
Compression talk
Ilya Ganelin
 
A Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanA Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen Fan
Databricks
 
Solving low latency query over big data with Spark SQL
Solving low latency query over big data with Spark SQLSolving low latency query over big data with Spark SQL
Solving low latency query over big data with Spark SQL
Julien Pierre
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and Spark
Evan Chan
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Databricks
 
Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...
Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...
Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...
Spark Summit
 
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Anya Bida
 
Scale-Out Using Spark in Serverless Herd Mode!
Scale-Out Using Spark in Serverless Herd Mode!Scale-Out Using Spark in Serverless Herd Mode!
Scale-Out Using Spark in Serverless Herd Mode!
Databricks
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at Yelp
Nathan Handler
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using Spark
Jen Aman
 
Spark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg SchadSpark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg Schad
Spark Summit
 
Understanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And ProfitUnderstanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And Profit
Spark Summit
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
markgrover
 
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusMonitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Databricks
 

What's hot (19)

Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Why Your Apache Spark Job is Failing
Why Your Apache Spark Job is FailingWhy Your Apache Spark Job is Failing
Why Your Apache Spark Job is Failing
 
Spark tuning
Spark tuningSpark tuning
Spark tuning
 
Cloudera Impala
Cloudera ImpalaCloudera Impala
Cloudera Impala
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Compression talk
Compression talkCompression talk
Compression talk
 
A Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanA Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen Fan
 
Solving low latency query over big data with Spark SQL
Solving low latency query over big data with Spark SQLSolving low latency query over big data with Spark SQL
Solving low latency query over big data with Spark SQL
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and Spark
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...
Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...
Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...
 
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
 
Scale-Out Using Spark in Serverless Herd Mode!
Scale-Out Using Spark in Serverless Herd Mode!Scale-Out Using Spark in Serverless Herd Mode!
Scale-Out Using Spark in Serverless Herd Mode!
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at Yelp
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using Spark
 
Spark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg SchadSpark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg Schad
 
Understanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And ProfitUnderstanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And Profit
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusMonitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
 

Viewers also liked

Apache Hadoop - A Deep Dive (Part 1 - HDFS)
Apache Hadoop - A Deep Dive (Part 1 - HDFS) Apache Hadoop - A Deep Dive (Part 1 - HDFS)
Apache Hadoop - A Deep Dive (Part 1 - HDFS)
Debarchan Sarkar
 
HDFS Deep Dive
HDFS Deep DiveHDFS Deep Dive
HDFS Deep Dive
Zoltan C. Toth
 
Application architectures with Hadoop and Sessionization in MR
Application architectures with Hadoop and Sessionization in MRApplication architectures with Hadoop and Sessionization in MR
Application architectures with Hadoop and Sessionization in MR
markgrover
 
HDFS Analysis for Small Files
HDFS Analysis for Small FilesHDFS Analysis for Small Files
HDFS Analysis for Small Files
DataWorks Summit/Hadoop Summit
 
HBaseCon 2013: Using Apache HBase for Large Matrices
HBaseCon 2013: Using Apache HBase for Large MatricesHBaseCon 2013: Using Apache HBase for Large Matrices
HBaseCon 2013: Using Apache HBase for Large Matrices
Cloudera, Inc.
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
DataWorks Summit
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop Tutorial
Edureka!
 

Viewers also liked (8)

Apache Hadoop - A Deep Dive (Part 1 - HDFS)
Apache Hadoop - A Deep Dive (Part 1 - HDFS) Apache Hadoop - A Deep Dive (Part 1 - HDFS)
Apache Hadoop - A Deep Dive (Part 1 - HDFS)
 
HDFS Deep Dive
HDFS Deep DiveHDFS Deep Dive
HDFS Deep Dive
 
Application architectures with Hadoop and Sessionization in MR
Application architectures with Hadoop and Sessionization in MRApplication architectures with Hadoop and Sessionization in MR
Application architectures with Hadoop and Sessionization in MR
 
HDFS Analysis for Small Files
HDFS Analysis for Small FilesHDFS Analysis for Small Files
HDFS Analysis for Small Files
 
HBaseCon 2013: Using Apache HBase for Large Matrices
HBaseCon 2013: Using Apache HBase for Large MatricesHBaseCon 2013: Using Apache HBase for Large Matrices
HBaseCon 2013: Using Apache HBase for Large Matrices
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop Tutorial
 

Similar to Breaking Spark: Top 5 mistakes to avoid when using Apache Spark in production

Building Efficient Pipelines in Apache Spark
Building Efficient Pipelines in Apache SparkBuilding Efficient Pipelines in Apache Spark
Building Efficient Pipelines in Apache Spark
Jeremy Beard
 
Effective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant ClustersEffective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant Clusters
DataWorks Summit/Hadoop Summit
 
Apache Spark Operations
Apache Spark OperationsApache Spark Operations
Apache Spark Operations
Cloudera, Inc.
 
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduBuilding Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Jeremy Beard
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
Cloudera, Inc.
 
Chicago spark meetup-april2017-public
Chicago spark meetup-april2017-publicChicago spark meetup-april2017-public
Chicago spark meetup-april2017-public
Guru Dharmateja Medasani
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Cloudera, Inc.
 
Getting Apache Spark Customers to Production
Getting Apache Spark Customers to ProductionGetting Apache Spark Customers to Production
Getting Apache Spark Customers to Production
Cloudera, Inc.
 
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiApache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Databricks
 
Cloudera showcase c5.4
Cloudera showcase c5.4Cloudera showcase c5.4
Cloudera showcase c5.4
Cloudera, Inc.
 
Oracle EM12c Release 4 New Features!
Oracle EM12c Release 4 New Features!Oracle EM12c Release 4 New Features!
Oracle EM12c Release 4 New Features!
Kellyn Pot'Vin-Gorman
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

Cloudera, Inc.
 
Yarns about YARN: Migrating to MapReduce v2
Yarns about YARN: Migrating to MapReduce v2Yarns about YARN: Migrating to MapReduce v2
Yarns about YARN: Migrating to MapReduce v2
DataWorks Summit
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
Gwen (Chen) Shapira
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
Evan Chan
 
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Kellyn Pot'Vin-Gorman
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache Kafka
Grant Henke
 
Kscope Not Your Father's Enterprise Manager
Kscope Not Your Father's Enterprise ManagerKscope Not Your Father's Enterprise Manager
Kscope Not Your Father's Enterprise Manager
Kellyn Pot'Vin-Gorman
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
DataWorks Summit
 
Power of the AWR Warehouse- HotSos Symposium 2015
Power of the AWR Warehouse-  HotSos Symposium 2015Power of the AWR Warehouse-  HotSos Symposium 2015
Power of the AWR Warehouse- HotSos Symposium 2015
Kellyn Pot'Vin-Gorman
 

Similar to Breaking Spark: Top 5 mistakes to avoid when using Apache Spark in production (20)

Building Efficient Pipelines in Apache Spark
Building Efficient Pipelines in Apache SparkBuilding Efficient Pipelines in Apache Spark
Building Efficient Pipelines in Apache Spark
 
Effective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant ClustersEffective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant Clusters
 
Apache Spark Operations
Apache Spark OperationsApache Spark Operations
Apache Spark Operations
 
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduBuilding Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
Chicago spark meetup-april2017-public
Chicago spark meetup-april2017-publicChicago spark meetup-april2017-public
Chicago spark meetup-april2017-public
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
 
Getting Apache Spark Customers to Production
Getting Apache Spark Customers to ProductionGetting Apache Spark Customers to Production
Getting Apache Spark Customers to Production
 
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiApache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
 
Cloudera showcase c5.4
Cloudera showcase c5.4Cloudera showcase c5.4
Cloudera showcase c5.4
 
Oracle EM12c Release 4 New Features!
Oracle EM12c Release 4 New Features!Oracle EM12c Release 4 New Features!
Oracle EM12c Release 4 New Features!
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
Yarns about YARN: Migrating to MapReduce v2
Yarns about YARN: Migrating to MapReduce v2Yarns about YARN: Migrating to MapReduce v2
Yarns about YARN: Migrating to MapReduce v2
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache Kafka
 
Kscope Not Your Father's Enterprise Manager
Kscope Not Your Father's Enterprise ManagerKscope Not Your Father's Enterprise Manager
Kscope Not Your Father's Enterprise Manager
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
Power of the AWR Warehouse- HotSos Symposium 2015
Power of the AWR Warehouse-  HotSos Symposium 2015Power of the AWR Warehouse-  HotSos Symposium 2015
Power of the AWR Warehouse- HotSos Symposium 2015
 

Recently uploaded

Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
PreethaV16
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
Open Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surfaceOpen Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surface
Indrajeet sahu
 
5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf
AlvianRamadhani5
 
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
Gino153088
 
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENTNATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
Addu25809
 
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Transcat
 
Introduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.pptIntroduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.ppt
Dwarkadas J Sanghvi College of Engineering
 
Blood finder application project report (1).pdf
Blood finder application project report (1).pdfBlood finder application project report (1).pdf
Blood finder application project report (1).pdf
Kamal Acharya
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
VANDANAMOHANGOUDA
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
Kamal Acharya
 
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by AnantLLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
Anant Corporation
 
Supermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdfSupermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdf
Kamal Acharya
 
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
MadhavJungKarki
 
Mechatronics material . Mechanical engineering
Mechatronics material . Mechanical engineeringMechatronics material . Mechanical engineering
Mechatronics material . Mechanical engineering
sachin chaurasia
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
ecqow
 
Zener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and ApplicationsZener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and Applications
Shiny Christobel
 
Height and depth gauge linear metrology.pdf
Height and depth gauge linear metrology.pdfHeight and depth gauge linear metrology.pdf
Height and depth gauge linear metrology.pdf
q30122000
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
upoux
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
ElakkiaU
 

Recently uploaded (20)

Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
Open Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surfaceOpen Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surface
 
5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf
 
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
 
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENTNATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
 
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
 
Introduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.pptIntroduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.ppt
 
Blood finder application project report (1).pdf
Blood finder application project report (1).pdfBlood finder application project report (1).pdf
Blood finder application project report (1).pdf
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
 
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by AnantLLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
 
Supermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdfSupermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdf
 
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
 
Mechatronics material . Mechanical engineering
Mechatronics material . Mechanical engineeringMechatronics material . Mechanical engineering
Mechatronics material . Mechanical engineering
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
 
Zener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and ApplicationsZener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and Applications
 
Height and depth gauge linear metrology.pdf
Height and depth gauge linear metrology.pdfHeight and depth gauge linear metrology.pdf
Height and depth gauge linear metrology.pdf
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
 

Breaking Spark: Top 5 mistakes to avoid when using Apache Spark in production

  • 1. 1© Cloudera, Inc. All rights reserved. Neelesh Srinivas Salian Breaking Spark Top 5 Mistakes to avoid when using Apache Spark in Production
  • 2. 2© Cloudera, Inc. All rights reserved. Outline • Introduction to Spark • Information gathering • Classification of the Issues • Description • Remedies • Q&A
  • 3. 3© Cloudera, Inc. All rights reserved.
  • 4. 4© Cloudera, Inc. All rights reserved. Information Gathering Regular Spark Applications and memory configuration problems Problems in the Ingest pipelines Problems in interacting with Kafka Usually Resource Management issues Spark Core Spark Streaming Spark on YARN Source: • Cloudera Customer Opened Cases • Posts in Cloudera Community Forums • Spark use cases for the following:
  • 5. 5© Cloudera, Inc. All rights reserved. Classification 1. Scaling of the Architecture 2. Memory Configurations 3. End user Code 4. Incompatible Dependencies 5. Administration/Operation related issues
  • 6. 6© Cloudera, Inc. All rights reserved. 1. Increase in the cluster Size 2. Utilization of the Cluster 1. Scaling of the Architecture
  • 7. 7© Cloudera, Inc. All rights reserved. 1. Understand the initial setup and current one 2. Look at logs / console to find the error 3. Determine parameters 4. Tune away 1. Scale of the Architecture: Larger Datasets
  • 8. 8© Cloudera, Inc. All rights reserved. YARN Remember: Container Memory • yarn.nodemanager.resource.memory-mb Container Memory Maximum • yarn.scheduler.maximum-allocation-mb Container Memory Minimum • yarn.scheduler.minimum-allocation-mb 2. Memory Configurations Standalone Java Heap Size of Worker in Bytes • worker_max_heapsize Java Heap Size of Master in Bytes • master_max_heapsize
  • 9. 9© Cloudera, Inc. All rights reserved. Symptoms: Executor error: ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker- 0,5,main] java.lang.OutOfMemoryError: Java heap space Driver error: ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default- dispatcher-60] shutting down ActorSystem [sparkDriver] java.lang.OutOfMemoryError: Java heap space Cause: The driver or executor is configured with a memory limit that is too low for the application being run. 2. Memory Configurations: Insufficient Memory
  • 10. 10© Cloudera, Inc. All rights reserved. 1. Understand Spark Configuration Order of Precedence 1. Spark configuration file: Parameters are passed to SparkConf. 2. Command-line arguments are passed to spark-submit, spark-shell, or pyspark. 3. Cloudera Manager UI properties set in the spark-defaults.conf configuration file. 2. Verify the Driver and Executor Memory Used 1. Environment tab in History Server 2. Check driver and executor memory usage values 3. Increase the Heap Caveat: do not set the spark.driver.memory config using SparkConf , use - - driver-memory • spark.driver.memory • spark.executor.memory 2. Memory Configurations: Insufficient Memory
  • 11. 11© Cloudera, Inc. All rights reserved. 1. Why chose Spark code over MapReduce? 1. Expensive Operations 1. Syntax / Command Errors 3. End User Code
  • 12. 12© Cloudera, Inc. All rights reserved. 1. Incorrectly built jars 2. Version mismatch 4. Incompatible Dependencies
  • 13. 13© Cloudera, Inc. All rights reserved. Symptoms Error: Could not find or load main class org.apache.spark.deploy.yarn.Application Master java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.MRJobCon fig java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStrea m ImportError: No module named qt_returns_summary_wrapper 4. Incompatible Dependencies: Classpath Issues
  • 14. 14© Cloudera, Inc. All rights reserved. What to Do? 1. Verify if an application is using classes that are not part of the framework (Hadoop/Spark) 1. Verify that an application is built using correct artifacts. 4. Incompatible Dependencies: Classpath Issues
  • 15. 15© Cloudera, Inc. All rights reserved. 1. Management of Cluster 2. User-related issues 3. Security/Permissions 4. No Configurations setup 5. Administration/Operation Related issues
  • 16. 16© Cloudera, Inc. All rights reserved. • Learn from the Mistakes • Be prepared • References: • Tuning Spark Blog Cloudera • Tuning YARN Cloudera • Spark Documentation Summary
  • 17. 17© Cloudera, Inc. All rights reserved. • Wilfred • Paula • Bjorn • Mark • Anand • Sean • Attila Credits
  • 18. 18© Cloudera, Inc. All rights reserved. Thank you Neelesh Srinivas Salian LinkedIn Email: neeleshssalian@gmail.com Twitter: NeelS7

Editor's Notes

  1. So just to begin with a brief introduction: Spark is an in-memory framework that enables a user to process data and perform analytics using the different components within it. This includes Core (Which is the main crux or the engine of Spark) Streaming (which allows you to process data that comes to the user in the form a stream of bits) MLLib ( the machine learning library that empowers the users to explore the usage of Machine learning algorithms to enable better insights) SQL ( the SQL engine in the Spark framework ) Spark, on the other hand, is a data-processing tool that operates on those distributed data collections; it doesn't do distributed storage. Having been in the environment that includes a lot of other components that form the CDH ecosystem, Spark has increased a lot in terms of Adoption.