Hadoop Summit San Jose 2014 - Apache Hadoop YARN: Best Practices

© Hortonworks Inc. 2014
Apache Hadoop YARN
Best Practices
Zhijie Shen
zshen [at] hortonworks.com
Varun Vasudev
vvasudev [at] hortonworks.com
Page 1

Who we are
• Zhijie Shen
– Software engineer at Hortonworks
– Apache Hadoop Committer
– Apache SAMZA Committer and PPMC
– PhD from National University of Singapore
• Varun Vasudev
– Software engineer at Hortonworks, working on YARN
– Worked on image and web search at Yahoo!
Page 2
Architecting the Future of Big Data

Agenda
• Talking about what we have learnt from our experiences working with
YARN users
• Best practices for
– Administrators
– Application Developers
Page 3

For Administrators
Page 4

Sub-Agenda
• Overview of YARN configuration
• ResourceManager
• Schedulers
• NodeManagers
• Others
– Log aggregation
– Metrics
Page 5

Overview of YARN configuration
• Almost everything YARN-related is in yarn-site.xml
• Granular – individual variables documented
• Nearly 150 configuration properties
– Required: Very small set – hostnames etc
– Common: Client and server
– Advanced: RPC retries etc.
– yarn.resourcemanager.* and yarn.nodemanager.* are usually server configs
– Admins can mark them ‘final’ so that users cannot override them
– yarn.client.* are client configs
• Security, ResourceManager, NodeManager, TimelineServer, Scheduler –
all in one file
• Topology scripts on RM, NM and all nodes
– BUG: MR AM has to read the same script. Work in progress to send it from RM to
AMs
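A minimal sketch of how a ‘final’ property looks in yarn-site.xml, generated here with Python's standard library. The hostname is hypothetical; the <name>/<value>/<final> layout is the standard Hadoop configuration format.

```python
import xml.etree.ElementTree as ET


def make_property(name, value, final=False):
    """Build one Hadoop-style <property> element."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    if final:
        # <final>true</final> keeps later-loaded resources from overriding the value.
        ET.SubElement(prop, "final").text = "true"
    return prop


def render_site(properties):
    """Render (name, value, final) tuples as a yarn-site.xml string."""
    root = ET.Element("configuration")
    for name, value, final in properties:
        root.append(make_property(name, value, final))
    return ET.tostring(root, encoding="unicode")


site_xml = render_site([
    # Hypothetical hostname, marked final so users cannot override it.
    ("yarn.resourcemanager.hostname", "rm.example.com", True),
    ("yarn.log-aggregation-enable", "true", False),
])
print(site_xml)
```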
Page 6

ResourceManager
• Hardware requirements
– ResourceManager needs CPU
– Doesn’t require as much memory as JobTracker
– 4 to 8 GB should be fine
• JobHistoryServer
– Needs memory, at least 8 GB
Page 7

Enable RM HA
• Enable RM HA - availability
• Only supported using ZooKeeper
– Leader election used
– Fencing support
• Automatic failover enabled by default
– Using ZooKeeper again
– Embedded ZKFC, no need to explicitly start a separate process
• You can start multiple ResourceManagers
• Specify rm-ids using yarn.resourcemanager.ha.rm-ids
– e.g. yarn.resourcemanager.ha.rm-ids = rm1,rm2
• Associate hostnames with rm-ids using
yarn.resourcemanager.hostname.rm1,
yarn.resourcemanager.hostname.rm2
– No need to change any other configs – scheduler, resource-tracker addresses are
automatically taken care of
• Web UIs automatically redirect to the active ResourceManager
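A sketch of the name/value pairs an HA setup puts into yarn-site.xml. The rm-ids and hostname.<rm-id> keys come from the slide; yarn.resourcemanager.ha.enabled, yarn.resourcemanager.cluster-id and yarn.resourcemanager.zk-address are taken from the Hadoop HA documentation rather than the slides, so check them against your release. All hostnames and the cluster id are hypothetical.

```python
def rm_ha_properties(rm_hosts, zk_quorum, cluster_id):
    """Return yarn-site.xml name/value pairs for ResourceManager HA.

    rm_hosts maps an rm-id (e.g. "rm1") to that ResourceManager's hostname.
    """
    if len(rm_hosts) < 2:
        raise ValueError("HA needs at least two ResourceManagers")
    props = {
        "yarn.resourcemanager.ha.enabled": "true",
        "yarn.resourcemanager.cluster-id": cluster_id,
        "yarn.resourcemanager.ha.rm-ids": ",".join(rm_hosts),
        "yarn.resourcemanager.zk-address": zk_quorum,
    }
    # One hostname entry per rm-id; other addresses are derived from it.
    for rm_id, host in rm_hosts.items():
        props["yarn.resourcemanager.hostname." + rm_id] = host
    return props


ha_props = rm_ha_properties(
    {"rm1": "master1.example.com", "rm2": "master2.example.com"},  # hypothetical hosts
    zk_quorum="zk1.example.com:2181,zk2.example.com:2181",         # hypothetical quorum
    cluster_id="cluster1",                                         # hypothetical id
)
for name, value in ha_props.items():
    print(name, "=", value)
```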
Page 8

YARN schedulers
• Two main schedulers
– capacity
– fair
• Capacity Scheduler allows you to set up queues to split resources –
useful for multi-tenant clusters where you want to guarantee resources
• Fair Scheduler allows you to split resources ‘fairly’ across applications
• Both have admin files which can be used to dynamically change the
setup
• If you have enabled HA, queue configuration files are on local disk
– Make sure queue files are consistent across nodes
– Feature to centralize configs in progress
Page 9

Capacity Scheduler
[Figure: guaranteed resources split across three queues, each running its own apps: queue-1 50%, queue-2 30%, queue-3 20%]
Page 10

YARN Capacity scheduler
• Configuration in capacity-scheduler.xml
• Take some time to set up your queues!
• Queues have per-queue ACLs to restrict queue access
– Access can be dynamically changed
• Elasticity can be limited on a per-queue basis – use
yarn.scheduler.capacity.<queue-path>.maximum-capacity
• Use yarn.scheduler.capacity.<queue-path>.state to drain queues
– ‘Decommissioning’ a queue
• Run yarn rmadmin -refreshQueues to apply changes at runtime
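A sketch of the capacity-scheduler.xml entries for a 50/30/20 split across three queues, with one queue capped by maximum-capacity and one being drained via state. The helper name and the cap of 60 are illustrative assumptions; the check that sibling capacities add up to 100 reflects the scheduler's requirement.

```python
def queue_properties(queues, parent="root"):
    """Build capacity-scheduler.xml name/value pairs for the children of one queue.

    queues maps a queue name to a dict with "capacity" (percent) and, optionally,
    "maximum-capacity" (percent) and "state" ("RUNNING" or "STOPPED").
    """
    total = sum(q["capacity"] for q in queues.values())
    if total != 100:
        raise ValueError("sibling capacities must add up to 100, got %s" % total)
    props = {"yarn.scheduler.capacity.%s.queues" % parent: ",".join(queues)}
    for name, q in queues.items():
        prefix = "yarn.scheduler.capacity.%s.%s." % (parent, name)
        props[prefix + "capacity"] = str(q["capacity"])
        if "maximum-capacity" in q:
            if q["maximum-capacity"] < q["capacity"]:
                raise ValueError("maximum-capacity below capacity for " + name)
            # Limits elasticity: the queue never grows beyond this share.
            props[prefix + "maximum-capacity"] = str(q["maximum-capacity"])
        # STOPPED drains the queue: running apps finish, new ones are rejected.
        props[prefix + "state"] = q.get("state", "RUNNING")
    return props


cs_props = queue_properties({
    "queue-1": {"capacity": 50},
    "queue-2": {"capacity": 30, "maximum-capacity": 60},  # illustrative cap
    "queue-3": {"capacity": 20, "state": "STOPPED"},      # being drained
})
for name, value in cs_props.items():
    print(name, "=", value)
```

After editing the file on the ResourceManager host, the refresh command from the slide picks up the change without a restart.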
Page 11

YARN Fair Scheduler
• Apps get equal share of resources, on average, over time
• No worry about starvation
• Support for queues – meant to be used so that you can prevent users
from flooding the system with apps
• Has support for fairness policy which can be modified at runtime
• Good if you have lots of small jobs
Page 12

Size your containers
• Memory and cores – minimum and maximum allocation, affects
containers per node
• yarn.scheduler.*-allocation-*
• Defaults: 1 GB minimum and 8 GB maximum memory; 1 core minimum and 32 cores maximum
• CPU scheduling needs a bit more stabilization
– Historically – translate to memory calculations
• Similarly Disk-scheduling
– translate disk limits to memory/cpu.
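A rough sketch of the sizing arithmetic, using the default memory limits listed above. Rounding a request up to a multiple of the minimum allocation is an assumption that matches how the Capacity Scheduler normalizes requests; other schedulers may round differently, so treat the numbers as estimates.

```python
import math


def normalize_request_mb(requested_mb, min_alloc_mb=1024, max_alloc_mb=8192):
    """Round a memory request up to a multiple of the minimum allocation."""
    if requested_mb > max_alloc_mb:
        raise ValueError("request exceeds the maximum allocation")
    steps = max(1, math.ceil(requested_mb / min_alloc_mb))
    return steps * min_alloc_mb


def containers_per_node(node_memory_mb, container_mb):
    """How many containers of one size fit into the NodeManager's memory."""
    return node_memory_mb // container_mb


container_mb = normalize_request_mb(1500)        # a 1500 MB ask becomes 2048 MB
print(container_mb, containers_per_node(8192, container_mb))
```

A smaller minimum allocation packs more containers onto a node, while a larger one wastes less scheduler effort on tiny containers.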
[Figure: number of containers per node (y-axis, 0 to 70) plotted against memory for NodeManager in GB (x-axis, 4 to 64)]
Page 13

NodeManagers
• Set resource-memory – variable is yarn.nodemanager.resource.memory-mb
– Sets how much memory YARN can use for containers
– Default is 8GB
• Set up a health-checker script!
– Check disk
– Check network
– Check any external resources required for job completion
– Test it on your OS
– Weed out bad nodes automatically!
• Figure out if the physical and virtual memory monitors make sense;
both are enabled by default.
– Default virtual-to-physical memory ratio is 2.1
• Multiple disks for containers on NodeManagers
– HDFS accesses them too
– If bottlenecked on disks, separate them. Haven’t seen it in the wild though
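A minimal sketch of a health-checker script that covers the disk check only. It relies on the documented NodeManager convention (from the Hadoop docs, not the slides) that a node is reported unhealthy when the script prints a line starting with ERROR; the script location is set with yarn.nodemanager.health-checker.script.path. The 10% threshold and the function name are assumptions to adapt.

```python
#!/usr/bin/env python3
import shutil
import sys


def disk_errors(paths, min_free_fraction=0.10):
    """Return one 'ERROR ...' line per path that is unreadable or low on space."""
    errors = []
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except OSError as exc:
            errors.append("ERROR cannot check %s: %s" % (path, exc))
            continue
        free_fraction = usage.free / usage.total if usage.total else 0.0
        if free_fraction < min_free_fraction:
            errors.append("ERROR low disk space on %s (%.0f%% free)"
                          % (path, free_fraction * 100))
    return errors


if __name__ == "__main__":
    # Directories to check are passed as arguments; "/" is a fallback.
    for line in disk_errors(sys.argv[1:] or ["/"]):
        print(line)
```

Network and external-resource checks can be added as further functions that return ERROR lines in the same way.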
Page 14

YARN log aggregation
• Log aggregation can be enabled using yarn.log-aggregation-enable.
• Can control how long you keep the logs by setting parameters for
purging
• App logs can be obtained using “yarn logs” command
• Creates lots of small files, can affect HDFS performance
Page 15

YARN Metrics
• JMX – http://<rm address>:<port>/jmx, http://<nm address>:<port>/jmx
– Cluster metrics – apps running, successful, failed, etc
– Scheduler metrics – queue usage
– RPC metrics
• Web UI – http://<rm address>:<port>/cluster
– Cluster metrics
– Scheduler metrics – easier to digest, especially queue usage
– Healthy, failed nodes
• Can be emitted to Ganglia directly using the metrics sink
– Metrics configuration file
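A sketch of reading the JMX endpoint with Python's standard library. The endpoint returns JSON with a top-level "beans" list; the sample payload below is illustrative data with made-up values, not captured output, and the helper names are assumptions.

```python
import json
from urllib.request import urlopen


def fetch_jmx(base_url, timeout=10):
    """Fetch and decode http://<address>:<port>/jmx from an RM or NM."""
    with urlopen(base_url.rstrip("/") + "/jmx", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def find_beans(jmx, name_fragment):
    """Select the beans whose name contains the given fragment."""
    return [b for b in jmx.get("beans", []) if name_fragment in b.get("name", "")]


# Illustrative payload in the shape the endpoint returns; values are made up.
SAMPLE = json.loads("""
{"beans": [
  {"name": "Hadoop:service=ResourceManager,name=ClusterMetrics", "NumActiveNMs": 4},
  {"name": "java.lang:type=Memory"}
]}
""")

cluster = find_beans(SAMPLE, "name=ClusterMetrics")
print(cluster[0]["NumActiveNMs"])
```

Against a live cluster, `find_beans(fetch_jmx("http://<rm address>:<port>"), "ClusterMetrics")` would do the same with real data.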
Page 16

For Application Developers
Page 17

Sub-Agenda
• Framework or a native Application?
• Understanding YARN Basics
• Writing a YARN Client
• Writing an ApplicationMaster
• Misc Lessons
Page 18

Framework or a native app?
• Two choices
– Write applications on top of existing frameworks
– Battle tested
– Already work
– APIs
– Roll your own native YARN application
• Existing frameworks
– Scalable batch processing: MapReduce
– Stream processing: Storm/Samza
– Interactive processing, iterations: Tez/Spark
– SQL: Hive
– Data pipelines: Pig
– Graph processing: Giraph
– Existing app: Slider
• Apache: Your App Store
Page 19

Ease of development
• Check the other development and deployment tools
[Figure: complexity of the development options: Native, Slider, Frameworks, Twill/REEF]
Page 20

Understanding YARN Components
• ResourceManager
– Master of a cluster
• NodeManager
– Slave to take care of one host
• ApplicationMaster
– Master of an application
• Container
– Resource abstraction, process to complete a task
Page 21

User code: Client and AM
• Client
– Client to ResourceManager
• ApplicationMaster
– ApplicationMaster to scheduler
– Allocate resources
– ApplicationMaster to NodeManager
– Manage containers
Page 22

Client: Rule of Thumb
• Use the client libraries
– YarnClient
– Submit an application
– AMRMClient(Async)
– Negotiate resources
– NMClient(Async)
– Manage containers
– TimelineClient
– Monitor an application
Page 23

Writing Client
1. Get the application ID from RM
2. Construct ApplicationSubmissionContext
   1. Shell command to run the AM
   2. Environment (class path, env-variable)
   3. LocalResources (job jars downloaded from HDFS)
3. Submit the request to RM
   1. submitApplication
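The three parts of the submission context can be modelled as plain data. The sketch below is a Python illustration, not the YarnClient API: the dictionary layout, the class name com.example.MyAppMaster and the HDFS path are hypothetical. The `<LOG_DIR>` token is the placeholder YARN expands to the container's log directory, and `$JAVA_HOME` is resolved on the node that runs the AM.

```python
LOG_DIR = "<LOG_DIR>"  # expanded by YARN to the container's log directory


def am_launch_command(main_class, heap_mb, args=()):
    """Build the shell command that starts the ApplicationMaster."""
    parts = ["$JAVA_HOME/bin/java", "-Xmx%dm" % heap_mb, main_class]
    parts.extend(args)
    # Redirect output into the container log directory so logs can be collected.
    parts.append("1>%s/stdout" % LOG_DIR)
    parts.append("2>%s/stderr" % LOG_DIR)
    return " ".join(parts)


# Hypothetical description of what goes into the submission context.
submission = {
    "command": am_launch_command("com.example.MyAppMaster", 256, ["--tasks", "4"]),
    "environment": {"CLASSPATH": "./*"},  # the localized jars in the working dir
    "local_resources": {"app.jar": "hdfs:///apps/myapp/app.jar"},  # hypothetical path
}
print(submission["command"])
```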
Page 24

Tips for Writing Client
• Cluster Dependencies
– Try to make zero assumptions about the cluster
– Cluster location
– Cluster size
– The same applies to the ApplicationMaster
• Your application bundle should deploy everything required
using YARN’s local resources.
Page 25

Writing ApplicationMaster
1. AM registers with RM (registerApplicationMaster)
2. Heartbeats (allocate) with RM, asynchronously
   1. Send the request
      1. Request new containers
      2. Release containers
   2. Receive containers and send requests to the NM to start them
      1. Construct ContainerLaunchContext
         – commands
         – env
         – jars
3. AM unregisters with RM (finishApplicationMaster)
Page 26

Tips for writing ApplicationMaster
• RM assigns containers asynchronously
– Containers are usually not returned by the same allocate call that requested them.
– Keep sending empty requests (heartbeats) until the requested containers arrive.
– ResourceRequest is incremental.
• Locality requests may not always be met
– Relaxed Locality
• AMs can fail
– They run on cluster nodes which can fail
– RM restarts AMs automatically
– Write AMs to handle failures on restarts - recovery
– Ideally, continue your work where it left off when the AM restarts
• Optionally talk to your containers directly through the AM
– To get progress, give work, kill it, etc
– YARN doesn’t do anything for you
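The asynchronous allocation pattern can be illustrated with a toy model. `ToyResourceManager` is a stand-in written for this example, not the YARN API: it hands out at most a fixed number of containers per heartbeat and never in the same call that carried the ask. The AM loop sends its ask once and then heartbeats with empty requests.

```python
class ToyResourceManager:
    """Stand-in for the RM scheduler (NOT the YARN API)."""

    def __init__(self, per_heartbeat=2):
        self.per_heartbeat = per_heartbeat
        self.pending = 0   # containers asked for but not yet granted
        self.next_id = 1

    def allocate(self, new_asks=0):
        """One heartbeat: grant against earlier asks, then record new ones."""
        grant = min(self.per_heartbeat, self.pending)
        granted = list(range(self.next_id, self.next_id + grant))
        self.next_id += grant
        self.pending += new_asks - grant
        return granted


def run_am(rm, wanted):
    """Ask once, then keep heartbeating until every container has arrived."""
    containers, heartbeats = [], 0
    asks = wanted
    while len(containers) < wanted:
        containers.extend(rm.allocate(asks))
        asks = 0  # later heartbeats carry no new asks
        heartbeats += 1
    return containers, heartbeats


containers, heartbeats = run_am(ToyResourceManager(per_heartbeat=2), wanted=5)
print(containers, heartbeats)
```

In this model, repeating the full ask on every heartbeat would request far more containers than intended, which is why only changes are sent.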
Page 27

Using the Timeline Service
• Metadata/Metrics
• Put application specific information
– TimelineClient
– POJO objects
• Query the information
– Get all entities of an entity type
– Get one specific entity
– Get all events of an entity type
Page 28

Summary: Application Workflow
• Execution Sequence
1. Client submits an application
2. RM allocates a container to start the AM
3. AM registers with RM
4. AM requests containers from RM
5. AM notifies NM to launch containers
6. Application code is executed in the container
7. Client contacts RM/AM to monitor the application’s status
8. AM unregisters with RM
[Figure: diagram of Client, RM, NM and AM with interactions numbered 1 to 8, matching the steps above]
Page 29

Misc Lessons: Taking What YARN offers
• Monitor your application
– RM
– NM
– Timeline server
Page 30

Misc Lessons: Debugging/Testing
• MiniYARNCluster
– In-JVM YARN cluster!
– Regression tests for your applications
• Unmanaged AM
– Support to run the AM outside of a YARN cluster for development and
testing
– AM logs on your console!
• Logs
– RM/NM logs
– App Log aggregation
– Accessible via CLI, web UI
Page 31

Thank you!
Questions?
Page 32
