Presented by
Rohith Sharma, Naganarasimha &
Sunil
About us
Rohith Sharma K S
-Hadoop Committer, works for Huawei
-5+ years of experience in the Hadoop ecosystem
Naganarasimha G R
-Apache Hadoop Contributor for YARN, Huawei
-4+ years of experience in the Hadoop ecosystem
Sunil Govind
-Apache Hadoop Contributor for YARN and MapReduce
-3+ years of experience in the Hadoop ecosystem
Agenda
➔Overview of general cluster deployment
➔YARN cluster resource configuration walkthrough
➔Anti Patterns
◆ MapReduce
◆ YARN
● RM Restart/HA
● Queue Planning
➔Summary
Brief Overview: General Cluster Deployment
A sample Hadoop cluster layout with HA
[Diagram: a Client and an ATS alongside an RM (Master) and RM (Backup), an NN (Master) and NN (Backup), several worker nodes each running an NM and a DN, and a three-node ZooKeeper cluster]
RM - Resource Manager
NM - Node Manager
NN - Name Node
DN - Data Node
ATS - Application Timeline Server
ZK - ZooKeeper
YARN Configuration: An Example
Legacy NodeManagers and DataNodes had low resource configurations. Nowadays most systems have
high-end capability, and customers want high-end machines with fewer nodes (50~100 nodes) to
achieve better performance.
A sample NodeManager configuration could look like:
-64 GB of memory
-8/16 CPU cores
-1Gb network cards
-100 TB disk (or disk arrays)
We focus on this kind of deployment and cover anti-patterns and best usages in the coming
slides.
YARN Configuration: Related to Resources
NodeManager:
●yarn.nodemanager.resource.memory-mb
●yarn.nodemanager.resource.cpu-vcores
●yarn.nodemanager.vmem-pmem-ratio
●yarn.nodemanager.log-dirs
●yarn.nodemanager.local-dirs
Scheduler:
●yarn.scheduler.minimum-allocation-mb
●yarn.scheduler.maximum-allocation-mb
MapReduce:
●mapreduce.map/reduce.java.opts
●mapreduce.map/reduce.memory.mb
●mapreduce.map/reduce.cpu.vcores
YARN and MR have these resource tuning configurations to help achieve better resource
allocation.
●With “vmem-pmem-ratio” (2:1 for example), the NodeManager can kill a container if its virtual
memory usage grows beyond twice its configured physical memory.
●It is advisable to configure “local-dirs” and “log-dirs” on different mount points.
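The effect of the vmem-pmem ratio can be illustrated with a little arithmetic. The following is a minimal sketch, not NodeManager code: the function names and the sample values (a 2048 MB container, ratio 2.0) are assumptions chosen for illustration.

```python
def vmem_limit_mb(container_memory_mb: int, vmem_pmem_ratio: float) -> float:
    """Virtual memory ceiling implied by the container size and the ratio."""
    return container_memory_mb * vmem_pmem_ratio


def exceeds_vmem_limit(vmem_used_mb: float,
                       container_memory_mb: int,
                       vmem_pmem_ratio: float) -> bool:
    """True when the container's virtual memory usage is above the ceiling."""
    return vmem_used_mb > vmem_limit_mb(container_memory_mb, vmem_pmem_ratio)


if __name__ == "__main__":
    # Hypothetical container: 2048 MB of physical memory, ratio 2:1.
    print(vmem_limit_mb(2048, 2.0))
    print(exceeds_vmem_limit(5000, 2048, 2.0))
```

With a 2:1 ratio, a 2048 MB container may use up to 4096 MB of virtual memory before it becomes a candidate for being killed.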
Anti Pattern in MRAppMaster
Container Memory Vs Container Heap Memory
Customer : “Enough container memory is configured, but the job still runs slowly, and sometimes,
when the data is relatively large, tasks fail with OOM.”
Resolution:
1.Container memory and container heap size are different configurations.
2.If mapreduce.map/reduce.memory.mb is configured, make sure to also configure
mapreduce.map/reduce.java.opts for the heap size.
3.Since this was a common mistake by users, this scenario is now handled in trunk: the heap size is
set to 0.8 of the configured/requested container memory.
a. If mapreduce.map/reduce.memory.mb values are specified but no -Xmx is supplied for the
mapreduce.map/reduce.java.opts keys, the -Xmx value is derived from the former's value.
b. For such conversions, a scaling factor specified by the property
mapreduce.job.heap.memory-mb.ratio is used (default 80%) to account for the overhead between heap
usage and actual physical memory usage.
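The derivation of the heap size from the container size can be sketched as follows. This is an illustration only, not the Hadoop implementation: the function name is hypothetical, and rounding down to whole megabytes is an assumption made here for simplicity.

```python
def effective_java_opts(java_opts: str,
                        container_memory_mb: int,
                        heap_ratio: float = 0.8) -> str:
    """Return java opts with -Xmx derived from the container size when absent.

    heap_ratio mirrors mapreduce.job.heap.memory-mb.ratio (default 80%).
    Rounding down to whole MB is an assumption of this sketch.
    """
    if "-Xmx" in java_opts:
        # The user set the heap explicitly, so leave the options untouched.
        return java_opts
    xmx_mb = int(container_memory_mb * heap_ratio)
    return f"{java_opts} -Xmx{xmx_mb}m".strip()


if __name__ == "__main__":
    # Hypothetical task with memory.mb = 2048 and no java.opts configured.
    print(effective_java_opts("", 2048))
```

The remaining 20% of the container is left for non-heap usage such as thread stacks, native buffers and JVM overhead.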
Shuffle phase is taking a long time
Customer: “A job with 500 GB of data finished in 4 hours, but on the cluster a job with 1000 GB
of data has been running for 12 hours in the reducer phase. I think the job is stuck.”
On enquiring further about the resource configuration, it turned out that
the same resource configurations were used for both jobs.
Resolution:
1.The job is NOT hung/stuck; rather, the time is being spent copying map output.
2.Increase the task resources.
3.Tune the configurations:
mapreduce.reduce.shuffle.parallelcopies
mapreduce.reduce.shuffle.input.buffer.percent
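To get a feel for why the reducer heap matters here, the in-memory shuffle buffer can be estimated as a fraction of the reducer's maximum heap, which is what mapreduce.reduce.shuffle.input.buffer.percent controls. The sketch below is a rough estimate under that assumption; the function name and the sample values (1638 MB and 3276 MB heaps, 0.70 buffer fraction) are illustrative, not recommendations.

```python
def shuffle_buffer_mb(reducer_heap_mb: float, input_buffer_percent: float) -> float:
    """Approximate heap available for holding map outputs during shuffle."""
    if not 0.0 <= input_buffer_percent <= 1.0:
        raise ValueError("input_buffer_percent must be between 0 and 1")
    return reducer_heap_mb * input_buffer_percent


if __name__ == "__main__":
    # Doubling the reducer heap doubles the in-memory shuffle buffer.
    for heap_mb in (1638, 3276):
        print(heap_mb, round(shuffle_buffer_mb(heap_mb, 0.70), 1))
```

When the input doubles but the reducer heap does not, a larger share of the map output has to be spilled to disk and merged, which is one reason the copy phase can take disproportionately longer.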
Anti Pattern in YARN
RM Restart : RMStateStore Limit
Customer: “yarn.resourcemanager.max-completed-applications is configured to 100000. The number of
completed applications in the cluster has reached the limit, and many applications are still
running. We observe that the RM service takes 10-15 seconds to come up.”
Resolution:
1.Configuring max-completed-applications to 100000 is NOT suggested.
2.Use the Timeline Server for the history of YARN applications.
3.The higher the value, the more significantly it impacts RM recovery.
Queue planning
Queue planning : Queue Mapping
Queue planning : Queue Capacity Planning and Preemption
Queue planning : Queue Capacity Planning for multiple users
Customer : “I have multiple users submitting apps to a queue, but it seems that all the resources have
been taken by a single user’s app(s), though other apps are activated.”
Queue Capacity Planning :
The Capacity Scheduler (CS) provides options to control the resources used by different users under a queue.
yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent and yarn.scheduler.capacity.<queue-path>.user-limit-factor
are the configurations that determine the amount of resources each user gets.
yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent defaults to 100%, which implies no user limits are imposed.
It defines the minimum share of resources each user is going to get.
yarn.scheduler.capacity.<queue-path>.user-limit-factor defaults to 1, which implies that a single user can never take more
than the queue’s configured capacity. It needs to be configured according to how much a particular user may take even when
other users are not using the queue.
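The interaction of the two settings can be approximated with a small model. This is a simplified sketch, not the Capacity Scheduler's actual computation: the function names are hypothetical, and the model ignores details such as pending demand, elasticity and resource types.

```python
def user_share_percent(active_users: int, minimum_user_limit_percent: float) -> float:
    """Approximate share of the queue (in percent) one user can hold under contention.

    Each active user gets an equal split, but never less than the configured minimum.
    """
    if active_users < 1:
        raise ValueError("at least one active user is required")
    return max(100.0 / active_users, float(minimum_user_limit_percent))


def max_user_capacity(queue_capacity: float, user_limit_factor: float) -> float:
    """Upper bound for a single user, as a multiple of the queue's configured capacity."""
    return queue_capacity * user_limit_factor


if __name__ == "__main__":
    # Hypothetical queue with minimum-user-limit-percent = 25.
    for users in (1, 2, 3, 5):
        print(users, round(user_share_percent(users, 25), 2))
    # Hypothetical queue configured at 40% of the cluster, user-limit-factor = 2.
    print(max_user_capacity(40.0, 2))
```

With the default of 100%, the model returns 100 for any number of users, which matches the symptom in the customer report: the first user's applications can occupy the whole queue.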
Queue planning : AM Resource Limit
Customer: “Hey buddy, most of my jobs are in the ACCEPTED state and never start to run.
What could be the problem?”
“All my jobs were running fine. But after an RM switchover, a few jobs didn’t resume their work.
Why is the RM not able to allocate new containers to these jobs?”
Resolution:
1.Users need to ensure that the AM Resource Limit is properly configured w.r.t. their deployment needs.
The maximum resource limit for running AM containers needs to be analyzed and configured correctly to
ensure effective progress of applications.
a. Refer to yarn.scheduler.capacity.maximum-am-resource-percent
2.After an RM switchover, if a few NMs have not registered back, the cluster size can change
compared to what it was prior to the failover. This affects the AM Resource Limit, and hence fewer AMs
will be activated after the restart.
3.For analytical workloads: higher AM limit. For batch queries: lower AM limit.
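The dependence of the AM limit on cluster size can be sketched with simple arithmetic. This is a rough estimate under stated assumptions, not the scheduler's code: it considers memory only, treats the whole cluster as one queue, and the node count, node size, AM container size and the 0.1 percentage are illustrative values.

```python
def max_concurrent_ams(cluster_memory_mb: int,
                       max_am_resource_percent: float,
                       am_container_mb: int) -> int:
    """Approximate number of ApplicationMasters that can run at the same time."""
    am_budget_mb = cluster_memory_mb * max_am_resource_percent
    return int(am_budget_mb // am_container_mb)


if __name__ == "__main__":
    node_mb = 64 * 1024  # hypothetical 64 GB NodeManagers
    # Before failover: 10 NMs registered. After failover: only 8 registered back.
    print(max_concurrent_ams(10 * node_mb, 0.1, 2048))
    print(max_concurrent_ams(8 * node_mb, 0.1, 2048))
```

Applications beyond this number stay in the ACCEPTED state until a running AM finishes or the limit grows, which is why missing NMs after a switchover can leave previously running jobs waiting.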
Queue planning : Application Priority within Queue
Customer : “I have many applications running in my cluster, and a few are very important jobs
which have to execute fast. I now use separate queues to run some very important
applications. The configuration seems very complex here, and I feel cluster resources are not
utilized well because of this.”
Resolution:
root
  sales (50%)
    low 40%
    high 20%
    med 40%
  inventory (50%)
    low 40%
    high 20%
    med 40%
The configuration seems very complex for this case, and
cluster resources may not be utilized very well.
We suggest using Application Priority instead.
Resolution:
Application Priority will be available in YARN from the 2.8 release onwards. A brief heads-up
about this feature:
1.Configure “yarn.cluster.max-application-priority” in yarn-site.xml. This is the maximum
priority that can be configured for any user/application.
2.Within a queue, applications are currently selected using an OrderingPolicy (FIFO/Fair). If
applications are submitted with a priority, the Capacity Scheduler also considers the priority of the
application in FifoOrderingPolicy. Hence the application with the highest priority is always
picked first for resource allocation.
3.For MapReduce, use “mapreduce.job.priority” to set the priority.
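The ordering described in point 2 can be sketched as a sort on priority followed by submission order. This is a minimal illustration, not the scheduler's implementation: the App class and its fields are hypothetical, and the sketch assumes that a larger integer means a higher priority.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class App:
    app_id: str        # hypothetical application identifier
    priority: int      # assumption: larger value means higher priority
    submit_order: int  # position in submission sequence, smaller is earlier


def allocation_order(apps: List[App]) -> List[App]:
    """Order apps by priority (highest first), then FIFO among equal priorities."""
    return sorted(apps, key=lambda a: (-a.priority, a.submit_order))


if __name__ == "__main__":
    queue = [
        App("report", priority=1, submit_order=1),
        App("billing", priority=5, submit_order=2),
        App("cleanup", priority=1, submit_order=3),
        App("alerts", priority=5, submit_order=4),
    ]
    print([a.app_id for a in allocation_order(queue)])
```

A single queue with priorities replaces the three per-team sub-queues in the example above, so capacity no longer has to be split in advance between low, medium and high.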
Application Priority within Queue
(contd..)
Resource Request Limits
Customer: “I am not very sure about the capacity of the node managers and the maximum-allocation
resource configuration. But my application is not getting any containers, or it is getting killed.”
Resolution/Suggestion:
In this case the NMs did not have more than 6 GB of memory. If a container request has a large memory/CPU demand that
is more than a node manager’s memory but less than the default “maximum-allocation-mb”, then
the container request will not be served by the RM. Unfortunately this is not reported as an error to the user,
and the application waits indefinitely for allocation. Meanwhile, the Scheduler also keeps waiting for
some node to satisfy this heavy resource request.
Use yarn.scheduler.maximum-allocation-mb and yarn.scheduler.maximum-allocation-vcores effectively by checking them
against the NodeManager memory/CPU limits.
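The three possible outcomes for a memory request can be sketched as a simple classification. This is an illustrative model only: the function name and outcome labels are hypothetical, it considers memory but not vcores, and the 8192 MB maximum used in the example is an assumed value for maximum-allocation-mb.

```python
def classify_request(request_mb: int, max_allocation_mb: int, largest_nm_mb: int) -> str:
    """Classify a container memory request against scheduler and node limits."""
    if request_mb > max_allocation_mb:
        # Above the scheduler maximum: the request is not accepted.
        return "rejected"
    if request_mb > largest_nm_mb:
        # Within the scheduler maximum, but no single node can ever fit it.
        return "waits"
    return "schedulable"


if __name__ == "__main__":
    # Hypothetical cluster: NMs with 6 GB each, maximum-allocation-mb = 8192.
    for request in (4096, 7168, 10240):
        print(request, classify_request(request, 8192, 6144))
```

Keeping maximum-allocation-mb at or below the memory of the largest NodeManager removes the silent "waits" case, because such requests are then refused up front instead of pending forever.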
Reservation Issue
Customer : “My application has reserved a container on a node and is never able to get new
containers.”
Resolution:
The reservation feature in the Capacity Scheduler goes a long way towards ensuring better linear resource
allocation. However, there can be a few corner cases. For example, an application has
made a reservation on a node, but this node has various long-lived containers running, so the chances of
getting free resources from this node in an immediate time frame are minimal.
Configurations like the one below can help in having time-framed reservations for effective cluster usage.
●yarn.scheduler.capacity.reservations-continue-look-all-nodes helps in looking for suitable resources on other
nodes too.
Suggestions in Resource Configuration
Thank you

LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 

Anti patterns in Hadoop Cluster deployment

  • 1. Presented by Rohith Sharma, Naganarasimha & Sunil
  • 2. About us
      Rohith Sharma K S: Hadoop committer, works for Huawei; 5+ years of experience in the Hadoop ecosystem.
      Naganarasimha G R: Apache Hadoop contributor for YARN, Huawei; 4+ years of experience in the Hadoop ecosystem.
      Sunil Govind: Apache Hadoop contributor for YARN and MapReduce; 3+ years of experience in the Hadoop ecosystem.
  • 3. Agenda
      - Overview of general cluster deployment
      - YARN cluster resource configuration walkthrough
      - Anti-patterns
          - MapReduce
          - YARN: RM Restart/HA, Queue Planning
      - Summary
  • 4. Brief Overview: General Cluster Deployment
      A sample Hadoop cluster layout with HA.
      [Diagram. Components shown: Client, RM (Master), RM (Backup), NN (Master), NN (Backup), worker nodes running NM and DN, ATS, and a ZooKeeper cluster of three ZK nodes.]
      Legend: RM - Resource Manager, NM - Node Manager, NN - Name Node, DN - Data Node, ATS - Application Timeline Server, ZK - ZooKeeper
  • 5. YARN Configuration: An Example
      Legacy NodeManagers and DataNodes had low resource configurations. Nowadays most systems have high-end capability, and customers want high-end machines with fewer nodes (50~100 nodes) to achieve better performance.
      A sample NodeManager configuration could look like:
      - 64 GB of memory
      - 8/16 CPU cores
      - 1 Gb network cards
      - 100 TB disk (or disk arrays)
      We focus on this kind of deployment and will try to cover anti-patterns and best usages in the coming slides.
  • 6. YARN Configuration: Related to Resources
      NodeManager:
      - yarn.nodemanager.resource.memory-mb
      - yarn.nodemanager.resource.cpu-vcores
      - yarn.nodemanager.vmem-pmem-ratio
      - yarn.nodemanager.log-dirs
      - yarn.nodemanager.local-dirs
      Scheduler:
      - yarn.scheduler.minimum-allocation-mb
      - yarn.scheduler.maximum-allocation-mb
      MapReduce:
      - mapreduce.map/reduce.java.opts
      - mapreduce.map/reduce.memory.mb
      - mapreduce.map/reduce.cpu.vcores
      YARN and MR provide these resource tuning configurations to help achieve better resource allocation.
      - With “vmem-pmem-ratio” (2:1 for example), the NodeManager can kill a container if its virtual memory usage exceeds twice its configured memory.
      - It is advised to configure “local-dirs” and “log-dirs” on different mount points.
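The virtual memory check described on this slide comes down to simple arithmetic. The following is a minimal sketch, assuming the 2:1 ratio used as the slide's example; the function names are hypothetical and are not part of any Hadoop API.

```python
def vmem_limit_mb(container_memory_mb: float, vmem_pmem_ratio: float = 2.0) -> float:
    """Virtual memory ceiling for a container, in MB.

    The 2.0 default mirrors the slide's 2:1 example, not necessarily
    the value configured on a real cluster.
    """
    return container_memory_mb * vmem_pmem_ratio


def exceeds_vmem_limit(vmem_used_mb: float, container_memory_mb: float,
                       vmem_pmem_ratio: float = 2.0) -> bool:
    """True if the container would be a candidate for being killed."""
    return vmem_used_mb > vmem_limit_mb(container_memory_mb, vmem_pmem_ratio)


# A 2048 MB container with a 2:1 ratio may use up to 4096 MB of virtual memory.
print(vmem_limit_mb(2048))
print(exceeds_vmem_limit(5000, 2048))
```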
  • 8. Container Memory vs. Container Heap Memory
      Customer: “Enough container memory is configured, yet the job runs slowly, and when the data is relatively large, tasks sometimes fail with OOM.”
      Resolution:
      1. Container memory and container heap size are separate configurations.
      2. If mapreduce.map/reduce.memory.mb is configured, also configure mapreduce.map/reduce.java.opts to set the heap size.
      3. Because this was a common user mistake, it is now handled in trunk: the heap is set to 0.8 of the container's configured/requested memory.
         a. If mapreduce.map/reduce.memory.mb values are specified but no -Xmx is supplied in the mapreduce.map/reduce.java.opts keys, the -Xmx value is derived from the memory.mb value.
         b. The conversion uses a scaling factor given by the property mapreduce.job.heap.memory-mb.ratio (default 80%) to account for the overhead between heap usage and actual physical memory usage.
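The heap derivation described above can be sketched as follows. This is an illustration only: the helper name is hypothetical, and truncating to a whole number of megabytes is an assumption, since the slide does not say how the result is rounded.

```python
import re


def derive_java_opts(memory_mb: int, java_opts: str = "", heap_ratio: float = 0.8) -> str:
    """Return java opts with -Xmx derived from the container memory when absent.

    heap_ratio stands in for mapreduce.job.heap.memory-mb.ratio (default 0.8).
    Truncation to whole MB is an assumption made for this sketch.
    """
    if re.search(r"-Xmx\d+[kKmMgG]?", java_opts):
        # The user already chose a heap size; leave it untouched.
        return java_opts
    heap_mb = int(memory_mb * heap_ratio)
    return f"{java_opts} -Xmx{heap_mb}m".strip()


# A 4096 MB container with no explicit -Xmx gets a heap of roughly 80% of it.
print(derive_java_opts(4096))
```

Under these assumptions, a task configured with 4096 MB of container memory gets a heap a little above 3 GB, leaving the remainder for non-heap JVM overhead.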
  • 9. Shuffle phase is taking a long time
      Customer: “A 500 GB job finished in 4 hours, but on the same cluster a 1000 GB job has been running in the reducer phase for 12 hours. I think the job is stuck.”
      On enquiring further, we found that the same resource configuration was used for both jobs.
      Resolution:
      1. The job is NOT hung or stuck; the time is being spent copying map output.
      2. Increase the task resources.
      3. Tune these configurations:
         - mapreduce.reduce.shuffle.parallelcopies
         - mapreduce.reduce.shuffle.input.buffer.percent
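To see why the same resource configuration can behave very differently when the input doubles, the sketch below estimates the in-memory shuffle buffer as a fraction of the reducer heap, which is what mapreduce.reduce.shuffle.input.buffer.percent controls. The function names and the sample numbers are hypothetical, and the model ignores merge thresholds and other shuffle details.

```python
def shuffle_buffer_mb(reduce_heap_mb: float, input_buffer_percent: float) -> float:
    """Heap memory (MB) available for holding map outputs during the shuffle."""
    return reduce_heap_mb * input_buffer_percent


def fits_in_memory(reducer_input_mb: float, reduce_heap_mb: float,
                   input_buffer_percent: float) -> bool:
    """True if one reducer's share of map output fits in the shuffle buffer."""
    return reducer_input_mb <= shuffle_buffer_mb(reduce_heap_mb, input_buffer_percent)


# Hypothetical numbers: with a 4000 MB heap and a 0.5 buffer fraction,
# 1500 MB of reducer input fits in memory, but 3000 MB does not.
print(fits_in_memory(1500, 4000, 0.5))
print(fits_in_memory(3000, 4000, 0.5))
```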
  • 11. RM Restart : RMStateStore Limit
      Customer: “yarn.resourcemanager.max-completed-applications is configured to 100000. The completed applications in the cluster have reached that limit, and many applications are still running. We observe that the RM service takes 10-15 seconds to come up.”
      Resolution:
      1. Configuring max-completed-applications to 100000 is NOT recommended.
      2. Use the Timeline Server for the history of YARN applications.
      3. The higher the value, the greater the impact on RM recovery.
  • 13. Queue planning : Queue Mapping
  • 14. Queue planning : Queue Capacity Planning and Preemption
  • 15. Queue planning : Queue Capacity Planning for multiple users
      Customer: “I have multiple users submitting apps to a queue. It seems all the resources have been taken by a single user's app(s), even though other apps are activated.”
      Queue Capacity Planning:
      CS (Capacity Scheduler) provides options to control the resources used by different users under a queue. yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent and yarn.scheduler.capacity.<queue-path>.user-limit-factor are the configurations that determine how much resource each user gets.
      - yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent defaults to 100%, which implies that no user limits are imposed. It defines the minimum share of resources each user is going to get.
      - yarn.scheduler.capacity.<queue-path>.user-limit-factor defaults to 1, which implies that a single user can never take more than the queue's configured capacity. Configure it according to how much a particular user should be able to take when other users are not using the queue.
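The interaction of the two settings can be illustrated with a deliberately simplified model. This is a sketch, not the Capacity Scheduler's actual computation: the function name is hypothetical, and the formula (the larger of an equal split and the guaranteed minimum, capped by the factor) is an assumption chosen to mirror the description above.

```python
def per_user_limit_mb(queue_capacity_mb: float, active_users: int,
                      minimum_user_limit_percent: float = 100,
                      user_limit_factor: float = 1.0) -> float:
    """Approximate resource ceiling (MB) for one user in a queue.

    Simplified model: each user may use the larger of an equal split and
    the guaranteed minimum, but never more than capacity * user_limit_factor.
    """
    if active_users < 1:
        raise ValueError("active_users must be at least 1")
    equal_split = queue_capacity_mb / active_users
    guaranteed = queue_capacity_mb * minimum_user_limit_percent / 100
    ceiling = queue_capacity_mb * user_limit_factor
    return min(max(equal_split, guaranteed), ceiling)


# With the defaults, each of two users may still take the whole queue,
# which matches the customer's observation.
print(per_user_limit_mb(100_000, active_users=2))
# With minimum-user-limit-percent = 25, two users are held to half each.
print(per_user_limit_mb(100_000, active_users=2, minimum_user_limit_percent=25))
```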
  • 16. Queue planning : AM Resource Limit
      Customer: “Hey buddy, most of my jobs are in the ACCEPTED state and never start running. What could be the problem?”
      “All my jobs were running fine, but after an RM switchover a few jobs didn't resume their work. Why is the RM not able to allocate new containers to these jobs?”
      Resolution:
      1. Users need to ensure that the AM Resource Limit is properly configured for their deployment needs. The maximum resource limit for running AM containers needs to be analyzed and configured correctly to ensure that applications make progress.
         a. Refer to yarn.scheduler.capacity.maximum-am-resource-percent.
      2. If a few NMs have not registered back after an RM switchover, the cluster size can differ from what it was before the failover. This affects the AM Resource Limit, so fewer AMs are activated after the restart.
      3. For analytical workloads, use a higher AM limit; for batch queries, use a lower AM limit.
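A rough calculation shows how the AM limit shrinks when NodeManagers fail to re-register. The sketch below assumes memory is the only resource considered and that every AM container has the same size; the function name, node sizes, and AM size are hypothetical, and 0.1 is used as the value of yarn.scheduler.capacity.maximum-am-resource-percent.

```python
import math


def max_concurrent_ams(total_memory_mb: float, am_container_mb: float,
                       max_am_resource_percent: float = 0.1) -> int:
    """Approximate number of ApplicationMasters that can run at once."""
    am_limit_mb = total_memory_mb * max_am_resource_percent
    return math.floor(am_limit_mb / am_container_mb)


NODE_MB = 64 * 1024  # 64 GB NodeManagers, as in the sample configuration

# Ten registered NodeManagers before the switchover, six afterwards.
print(max_concurrent_ams(10 * NODE_MB, am_container_mb=1536))
print(max_concurrent_ams(6 * NODE_MB, am_container_mb=1536))
```

Applications beyond this count stay in the ACCEPTED state until an AM slot frees up, which is the symptom described by the customer.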
  • 17. Queue planning : Application Priority within Queue
      Customer: “I have many applications running in my cluster, and a few are very important jobs that have to execute fast. I currently use separate queues to run some very important applications. The configuration seems very complex, and I feel cluster resources are not utilized well because of this.”
      Resolution:
      Queue layout: root has two queues, sales (50%) and inventory (50%), each split into low (40%), high (20%), and med (40%).
      The configuration is very complex for this case, and cluster resources may not be utilized well. We suggest using Application Priority instead.
  • 18. Application Priority within Queue (contd.)
      Resolution: Application Priority will be available in YARN from the 2.8 release onwards. A brief heads-up about this feature:
      1. Configure “yarn.cluster.max-application-priority” in yarn-site.xml. This is the maximum priority that any user or application can be configured with.
      2. Within a queue, applications are currently selected using an OrderingPolicy (FIFO/Fair). If applications are submitted with a priority, Capacity Scheduler also considers the application's priority in FifoOrderingPolicy. Hence the application with the highest priority is always picked first for resource allocation.
      3. For MapReduce, use “mapreduce.job.priority” to set the priority.
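The ordering described in point 2 can be sketched as a sort on priority first and submission order second. This is an illustrative model, not scheduler code: the App type and function name are hypothetical, a larger integer is taken to mean a higher priority, and capping a submitted priority at the cluster maximum is an assumption of this sketch.

```python
from typing import List, NamedTuple


class App(NamedTuple):
    app_id: str
    priority: int      # larger value = more important (assumed)
    submit_order: int  # smaller value = submitted earlier


def schedule_order(apps: List[App], max_priority: int) -> List[str]:
    """Order apps by priority (highest first), then FIFO within a priority."""
    def sort_key(app: App):
        # Assumption: priorities above the cluster maximum are capped.
        effective = min(app.priority, max_priority)
        return (-effective, app.submit_order)

    return [app.app_id for app in sorted(apps, key=sort_key)]


apps = [App("a", 0, 1), App("b", 5, 2), App("c", 9, 3), App("d", 2, 4)]
print(schedule_order(apps, max_priority=5))
```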
  • 19. Resource Request Limits
      Customer: “I am not very sure about the capacity of the node managers and the maximum-allocation resource configuration. But my application is not getting any containers, or it is getting killed.”
      Resolution/Suggestion:
      In this case the NMs had no more than 6 GB of memory. If a container request has a memory/CPU demand that is larger than any NodeManager's capacity but smaller than the default “maximum-allocation-mb”, the request will never be served by the RM. Unfortunately this is not reported as an error to the user, and the application waits for allocation indefinitely. Meanwhile, the scheduler also keeps waiting for a node that can satisfy the heavy request.
      Use yarn.scheduler.maximum-allocation-mb and yarn.scheduler.maximum-allocation-vcores effectively by checking them against the NodeManager memory/CPU limits.
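A quick sanity check along these lines can catch requests that no node will ever satisfy. The sketch below is hypothetical helper code, not a YARN API: nodes are modelled as (memory in MB, vcores) pairs taken from each NodeManager's configured resources.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (memory_mb, vcores) configured on a NodeManager


def can_ever_fit(request_mb: int, request_vcores: int, nodes: List[Node]) -> bool:
    """True if at least one NodeManager is large enough for the request."""
    return any(request_mb <= mem and request_vcores <= cores for mem, cores in nodes)


def suggested_max_allocation(nodes: List[Node]) -> Tuple[int, int]:
    """Largest per-node memory and vcores, a sensible upper bound for
    yarn.scheduler.maximum-allocation-mb and maximum-allocation-vcores."""
    return max(mem for mem, _ in nodes), max(cores for _, cores in nodes)


nodes = [(6144, 8), (6144, 8)]         # NodeManagers with 6 GB each
print(can_ever_fit(8192, 1, nodes))    # an 8 GB request never fits on these nodes
print(suggested_max_allocation(nodes))
```

Setting the scheduler maximums no higher than these values makes an oversized request fail at submission time instead of waiting silently.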
  • 20. Reservation Issue
      Customer: “My application has reserved a container on a node and is never able to get new containers.”
      Resolution:
      The reservation feature in Capacity Scheduler goes a long way toward ensuring better linear resource allocation, but there can be a few corner cases. For example, an application has made a reservation on a node, but this node is running various long-lived containers, so the chance of resources freeing up on it in the near term is minimal.
      Configurations like the one below can help make reservations time-bound for effective cluster usage.
      - yarn.scheduler.capacity.reservations-continue-look-all-nodes helps by looking for a suitable resource on other nodes too.
  • 21. Suggestions in Resource Configuration