Hadoop administration

Hadoop Administration
issues and fixes

InMobi cluster
HDP Distribution - 2.2
450 nodes
5PB data storage
60-75K applications runs everyday
9000 Cores
14 TB of Total RAM

Our Hadoop ecosystem
falcon
oozie
scribe/databus
kafka
httpfs services
zookeeper
aerospike

HardWare Configurations
NameNode
64GB of RAM / 24 core cpu
DataNode
64GB of RAM /32 core/ 6*2TB disks
128GB of RAM/40 core / 10*2TB disks
OS
Ubuntu 12.04

Configuration management
Puppet
system package pushes
system configuration management
hadoop package pushes
hadoop configuration management using templates

Hadoop configurations
1.5GB of RAM and 1 core for a container
Max container limit is 20G - to enable spark jobs
Max vcore allocation is 8 cpu
Using Capacity Scheduler
Node Manager recovery is enabled
CGroups for CPU isolation

jvm configuration
NameNode - 30GB
NodeManager - 2GB
DataNode - 3GB
gc - concurrent mark sweep collector ( CMS)

Alers and Trends
Using Nagios for Alerting.
monitor only master node services eg:
namenode,resourcemanager,historyserver
datanodes/nodemanagers are monitored based on availability ( say 90% of
the nodes are up)
alerts are configured to send to PagerDuty

Alers and Trends [contd]
Using Graphite for Metrics
jmx metrics
use jmxtrans and custom scripts to send metrics to graphite
clusterwide alerts on top on graphite data

Issues And challenges - hadoop
Tasks hogging up whole allocated CPU vcores.
Problem
We have couple of jobs which are very CPU intensive. While these jobs are running, other tasks are starving for
resources, leads to SLA misses.
Reason
with the DefaultContainerExecutor class, there is no limit in using CPU by a single task. Other tasks running on the
node node starve for CPU resource and can lead to long running jobs.

Issues And challenges - hadoop [cntd]
solution
We mitigated this problem by implementing strict CPU limit with cgroups and LinuxContainerExecutor class.
install libcgroup1 package

Job priority
Problem
yarn does not honor mapred job priority ( mapreduce.job.priority VERY_LOW, LOW,HIGH,VERY_HIGH)
Reason
Priorities across application within the same queue is not implemented yet.

Solution
Implemented hierarchical queues within a queue with different thresholds
allocate different proportions to the sub-queues say 30% to report, 20% to report_low and 30%
report_high queue.

Namenode HA failover with ssh fencing
Problem
Active Namenode(Namenode1) server went down physically. We expected the standby to become active. But that
didn’t happen.
Reason
We are using Namenode HA with ssh fencing. zkfc(zookeeper failover controller) does ssh to the active
namenode and make sure the namenode process is killed to avoid any possible chance of split-brain. Since the active
namenode was physically down, the ssh-fencing always returned non-zero status

Solution
1. We added a dummy host in place of the Namenode1 in the hdfs-site.xml. The zkfc running in standby
successfully
2. We can use shell fencing method to return exit status ‘0’
3. You can also use custom scripts with shell fencing method to handle this situation

tasks running slow
problem
Jobs are running extremely slow.
Reason
There can be system issues as well as application issues that can cause this problem.
1. misconfigured nodemanager
The actual cpu available in a nodemanager is 12, but you set
yarn.nodemanager.resource.cpu-vcores as 24.

2. Nodes were configured with 100Mb/s network speed
3. Bad disks
Solution
a. Enabling Speculative execution
cons
Enabling speculative execution can lead to wastage of resources. We
have patched speculative execution, It spawn duplicate tasks only in case absolute necessity.

b. Excluding bad disks from the cluster
We exclude bad disks from a node using puppet automatically.
c. Exclude bad nodes from the cluster
We are using custom healthcheck to blacklist a node if a certain number of applications fails
on this node in a certain amount of time.
i. separate the NodeManager Audit log to a different file using log4j

Then use a custom script to get the number of failures in nodemanager and use that to
blacklist a node.

d. machines with bad hardware configurations
Solution
Use script to validate each nodes

History Server log size.
have seen AM logs with size 1G
controlled AM log size with yarn.app.mapreduce.am.container.log.limit.kb
log aggregation is enabled and logs are kept in HDFS for 3 days

Q&AWe are hiring for rockstars like you. If interested please drop a note at
sivakumar@inmobi.com/srikanth.sundarrajan@inmobi.com

Hadoop administration

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Hadoop administration

Similar to Hadoop administration (20)

Recently uploaded

Recently uploaded (20)

Hadoop administration