Tune up Yarn and Hive
Richard Xu
Systems Architect at Hortonworks
rxu@hortonworks.com
Toronto Hadoop User Group Nov 27, 2015
Today’s Agenda
• Review a real battle
• Tuning cluster and Yarn
• Tuning Hive queries
• Yet more tuning needed
• LLAP---what we can expect in the near future
Page 2
No issue or big one…
Page 3
Cluster overview
• 14 nodes
• 2.46 TB memory available for Yarn applications
Page 4
Main use cases
• 200+ Hive queries kicked off by Oozie to aggregate
data quarter-hourly, hourly, daily and weekly.
• HBase tables are loaded into memory as in-memory
cache; Hive queries retrieve data from these HBase
tables via Hive UDF
Initial complaints
• Cluster is slow
• Almost everybody’s jobs, which are supposed to finish
in a few minutes, hang for hours
• Hadoop does not work
Page 5
Started with cluster and Yarn
tuning…
Page 6
Initial Approaches
Page 7
• Ensure best-practice configurations are in place in all aspects: OS
(disable Transparent Huge Pages, disable swappiness on datanodes
only), network (disable iptables), hard drives --- found the ulimit setting
too low
• Create 2 more Yarn capacity scheduler queues: batch (60%), ad-hoc
(30%) in addition to the default queue (10%)
• Applied default configurations suggested by hdp-configuration-
utils.py
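For illustration only, the queue split above might be expressed in capacity-scheduler.xml roughly as follows (the queue names and exact property layout here are assumptions based on the percentages listed, not the customer's actual file):
yarn.scheduler.capacity.root.queues=default,batch,ad-hoc
yarn.scheduler.capacity.root.batch.capacity=60
yarn.scheduler.capacity.root.ad-hoc.capacity=30
yarn.scheduler.capacity.root.default.capacity=10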
Issues after initial approach
Page 8
The very first issue we encountered was that one off-shore team member’s bad query
used up all the resources of the cluster. Fine, we need to limit user capacity to avoid it:
1. Set user-limit-factor from its default value of 1 to 0.1, to restrict any user from
using resources beyond 10% of the queue capacity.
2. Set minimum-user-limit-percent from 100 to 10, so that the queue can serve 10
users at the same time.
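A sketch of those two settings in capacity-scheduler.xml, assuming they were applied to the batch queue (the later slides show it as prts-batch):
yarn.scheduler.capacity.root.prts-batch.user-limit-factor=0.1
yarn.scheduler.capacity.root.prts-batch.minimum-user-limit-percent=10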
New issues right after applying the above
changes
• Some users submitting 2 Oozie jobs at the same time get
stuck.
• The cluster is not running at full load/speed ---- we
observe pending applications while the cluster still
has resources
Page 9
Screenshot
Only 18.7% of the cluster’s resources are being used.
Page 10
Why?
Page 11
Reason: the Yarn capacity queue property “Max Schedulable Applications Per User”. As we allow more concurrent users, the number of max schedulable applications per user decreases!
Related source code:
public static int computeMaxActiveApplicationsPerUser(
    int maxActiveApplications, int userLimit, float userLimitFactor) {
  return Math.max(
      (int) Math.ceil(maxActiveApplications * (userLimit / 100.0f) * userLimitFactor),
      1);
}
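To see the interaction, plug in some illustrative numbers (hypothetical, chosen only to show the shape of the formula):
Before the change: userLimit=100, userLimitFactor=1, maxActiveApplications=20
  max(ceil(20 * (100/100) * 1), 1) = 20 schedulable applications per user
After the change: userLimit=10, userLimitFactor=0.1, maxActiveApplications=20
  max(ceil(20 * (10/100) * 0.1), 1) = max(ceil(0.2), 1) = 1 schedulable application per user
So a user submitting two Oozie jobs at the same time has the second one stuck waiting, which matches what we observed.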
Screenshot
Page 12
Solution
Page 13
Increase yarn.scheduler.capacity.maximum-am-resource-percent to assign more
resources to ApplicationMasters:
yarn.scheduler.capacity.root.prts-batch.minimum-user-limit-percent=10
yarn.scheduler.capacity.root.prts-batch.user-limit-factor=0.5
yarn.scheduler.capacity.maximum-am-resource-percent=0.2
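As a rough back-of-the-envelope (assuming the default of 0.1 for maximum-am-resource-percent and that maxActiveApplications is derived from it, as in the Capacity Scheduler of that era): raising it to 0.2 roughly doubles the memory that can go to ApplicationMasters, about 0.2 * 2.46 TB ≈ 500 GB cluster-wide, which in turn raises maxActiveApplications and, through the formula above, the per-user limit.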
Change made from original settings:
Page 14
yarn.scheduler.capacity.root.prts-batch.user-limit-factor - changed from 2 to 0.5
mapreduce.map.java.opts - changed from 3g to 4g
mapreduce.reduce.java.opts - changed from 3g to 4g
Default virtual memory for a job's map-task - changed from 4g to 8g
Default virtual memory for a job's reduce-task - changed from 4g to 16g
yarn.app.mapreduce.am.resource.mb - changed from 4g to 16g
yarn.app.mapreduce.am.command-opts - changed from 4g to 12g
mapreduce.reduce.java.opts - changed from 4g to 12g
mapreduce.map.java.opts - changed from 4g to 6g
yarn.scheduler.minimum-allocation-mb - changed from 3g to 8g
====== tez ======
tez.am.resource.memory.mb - changed from 4g to 8g
tez.task.resource.memory.mb - changed from 4g to 8g
tez.am.java.opts - changed from 6g to 4g
====== hive ======
hive.tez.container.size - changed from 4g to 8g
hive.tez.java.opts - changed from 2560 MB to 6144 MB
hive.auto.convert.join.noconditionaltask.size - changed from 2.5 GB to 512 MB
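The general pattern behind these pairs (a rule of thumb, not the customer's exact final values) is to keep the Java heap in the *.java.opts settings at roughly 75-80% of the container size, leaving headroom for off-heap memory, e.g.:
mapreduce.map.memory.mb=8192
mapreduce.map.java.opts=-Xmx6144m
hive.tez.container.size=8192
hive.tez.java.opts=-Xmx6144m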
Change made from original settings:
Page 15
Max containers per host: moved from 21 to 58
Suggested hardware change:
IO is the bottleneck to move beyond; remove the RAID pair on the OS disks and use
the additional drive for HDFS
Tuning Hive Queries
Page 16
Starting point
Page 17
• Ofer’s blog, “5 Ways to Make Your Hive Queries Run
Faster”, is a great guide:
• http://hortonworks.com/blog/5-ways-make-hive-queries-run-faster/
Then look at individual Hive queries,
especially those taking much longer than their peers.
Page 18
50 concurrent clients
Issue: Execution plan shows that a whole
table is loaded in--- 9 GB, millions of rows
The subquery below should be filtered first:
(select fddcell_key,date_key as date, hour_key, qhour_key,
OSSC_RC,MeContext,EUtranCellFDD,enodebfunction from
tf001_fddcell_qhourly tf0001 ) tf001
changed to:
left outer join tf001_fddcell_qhourly tf0001 tf001
Result: the job that used to run for 2 hours now takes 25 mins --- and remember, it can be
further improved.
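To illustrate the filter-first pattern the slide describes (a sketch; the join key, the outer alias and the run_date/run_hour variables are hypothetical, only the table and column names above are from the real query):
-- Before: the whole 9 GB table is materialized by the subquery and then joined
left outer join
  (select fddcell_key, date_key as date, hour_key, qhour_key
   from tf001_fddcell_qhourly) tf001
  on a.fddcell_key = tf001.fddcell_key
-- After: filter inside the subquery so only the needed rows/partitions are read
left outer join
  (select fddcell_key, date_key as date, hour_key, qhour_key
   from tf001_fddcell_qhourly
   where date_key = '${hivevar:run_date}' and hour_key = '${hivevar:run_hour}') tf001
  on a.fddcell_key = tf001.fddcell_key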
Issue: unix_timestamp function
Page 19
50 concurrent clients
Throughput = 1095 reqs/s
The unix_timestamp function is used to get the current day and hour in the where
clause when joining tables. unix_timestamp is a non-deterministic function, which
means it is not evaluated when the query is compiled; it is executed at runtime,
for each row. This disables dynamic partition pruning, since the optimizer can’t
tell what the date and hour are before each row is read. Full table scans for
everyone! In Hive 1.2.0 and beyond, the unix_timestamp function begins to be
deprecated; it is replaced with a current_timestamp function that is deterministic.
Since Sprint is on Hive 0.14, we installed a UDF that implements the
current_timestamp code.
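A sketch of the difference (the table and column names are from the earlier slide; the run_date variable is hypothetical). On this cluster the fix was the current_timestamp-style UDF, but passing the day in as a literal, e.g. from the Oozie workflow, illustrates the same idea:
-- Non-deterministic: evaluated at runtime, per row, so partition pruning is lost
where date_key = from_unixtime(unix_timestamp(), 'yyyyMMdd')
-- Deterministic: the value is known at compile time, so partitions can be pruned
where date_key = '${hivevar:run_date}'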
Issue: Hive query hanging with Tez and failing with MapReduce
Hive queries were hanging with the Tez engine and failing with MapReduce
(Generate Map Join Task Error).
Solution:
Disable hive.auto.convert.join
or hive.auto.convert.sortmerge.join.
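In practice this was a per-session/per-query workaround, something like:
set hive.auto.convert.join=false;
-- or, if only the sort-merge-bucket conversion is the problem:
set hive.auto.convert.sortmerge.join=false;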
Page 20
Issue: datatype mismatch on join
When joining tables it is important to make sure that the data types
match on the join columns. There were a couple of joins that were trying to
join a bigint to an int. The inability to cast the bigint to an int was causing
a table scan (I believe because of the join order). Performing an explicit
cast of the int to a bigint allowed the query to do a range scan instead.
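A minimal sketch of the rewrite (the table and column names are hypothetical):
-- Before: bigint/int mismatch on the join key forces a table scan
select b.val from big_t b join small_t s on b.key_bigint = s.key_int;
-- After: explicit cast of the int side to bigint enables a range scan
select b.val from big_t b join small_t s on b.key_bigint = cast(s.key_int as bigint);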
Page 21
Issue: join orders
For example, there were some places where joining a larger table to a smaller one
in a subquery, before joining it back to another large table, helps to filter the large
table and improve the performance of the query.
1. Consider a JOIN as follows: SELECT * FROM A JOIN B JOIN C (assuming all on
the same ID fields)
2. Consider further that A is a very, very large table, and both B and C are relatively
small tables.
3. Without CBO the plan may become (A JOIN B) -> T, then T JOIN C -> output.
This is of course not very efficient, since T is also very large (based on A) and
thus we have two very large joins. Alternatively you could do (B JOIN C) -> T, and
then (A JOIN T). You would think CBO would do this, but with HDP 2.1 it does not. Not
sure if this would be different in 2.2 - potentially yes, but not sure.
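A sketch of the manual reorder (A, B and C as in the example above; the id and val columns are hypothetical):
-- Without CBO: (A JOIN B) runs first, producing a huge intermediate result
select a.*, b.val as b_val, c.val as c_val
from A a
join B b on a.id = b.id
join C c on a.id = c.id;
-- Manual reorder: join the two small tables first, then join the result to A once
select a.*, t.b_val, t.c_val
from A a
join (select b.id, b.val as b_val, c.val as c_val
      from B b join C c on b.id = c.id) t
  on a.id = t.id;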
Page 22
Overall Tuning
Page 23
Tuning Specifics
Page 24
• Use at most 80% of RAM in Yarn --- leave space for shuffling
• Oozie launcher and Hive query go to different queues
• Tez unique features
1. Dynamic partition pruning+tuning
hive.tez.dynamic.partition.pruning=true
hive.tez.dynamic.partition.pruning.max.data.size
hive.tez.dynamic.partition.pruning.max.event.size
2. Auto-reducer parallelism+tuning
hive.tez.auto.reducer.parallelism=true
hive.tez.min.partition.factor=0.01
3. Tunable slow-start
tez.shuffle-vertex-manager.min-src-fraction=0.99
tez.shuffle-vertex-manager.max-src-fraction=1.0
4. Min-held containers instead of prewarm
tez.am.session.min.held-containers=3
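These can also be set per session while experimenting, e.g. (the values are the ones above, which are aggressive and workload-specific rather than general defaults):
set hive.tez.dynamic.partition.pruning=true;
set hive.tez.auto.reducer.parallelism=true;
set hive.tez.min.partition.factor=0.01;
set tez.shuffle-vertex-manager.min-src-fraction=0.99;
set tez.shuffle-vertex-manager.max-src-fraction=1.0;
set tez.am.session.min.held-containers=3;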
Useful tools
Page 25
• Tez Performance Visualization
• https://github.com/apache/tez/tree/master/tez-tools/swimlanes
• http://people.apache.org/~gopalv/query27.svg
Useful tools (continued)
Page 26
• https://github.com/t3rmin4t0r/lipwig
• http://people.apache.org/~gopalv/query27.svg
• http://people.apache.org/~gopalv/q27-plan.svg
Lessons Learned
Page 27
• Take a holistic approach to performance tuning
• You cannot tune the system around bad code
• Know your performance target before you begin
• Tuning Hive queries requires a better understanding
of the underlying data structures than tuning relational databases does
• Developers may not know how to tune or even read
an explain plan
• It is NOT always bad developer code
• Get engaged with the customer developers EARLY
LLAP: Long-lived execution in
Hive
Page 28
Tez with LLAP engine
LLAP is an optional daemon process running on multiple nodes that provides
the following:
• Caching and data reuse across queries with compressed columnar data in-memory (off-heap)
• Multi-threaded execution including reads with predicate pushdown and hash joins
• High throughput IO using Async IO Elevator with dedicated thread and core per disk
• Granular column level security across applications
• YARN will provide workload management in LLAP by using delegation
[Architecture diagram: LLAP daemon processes run on multiple nodes, each holding an in-memory columnar cache backed by HDFS; Hive query fragments run as read tasks inside the LLAP processes, accelerating Tez tasks across the cluster.]
Questions?