Andrew Ryan describes how Facebook operates Hadoop as a shared resource between groups.
More information and video at:
http://developer.yahoo.com/blogs/hadoop/posts/2011/02/hug-feb-2011-recap/
3. Agenda
1 Hadoop operations @Facebook: an overview
2 Existing operational best practices
3 The challenges ahead: new directions in Hadoop
4 Emerging operational best practices
5 Conclusions and next steps
5. Hadoop Operations @Facebook
▪ Lean staffing, fast moving, highly leveraged
▪ Basic oncall structure:
▪ Level 1: 24x7 sysadmin team (“SRO”) for whole site
▪ Level 2: 2 people (“AppOps”) trading 1-week oncall shifts
▪ Level 3: 4 different Hadoop dev subteams with 1-week rotations
▪ Plus oncalls from other adjunct teams: SiteOps for machine repairs, NetEng for network, etc.
▪ Every engineer @FB is issued a cell phone and is expected to be available in emergencies or after making a change to a production system or code.
6. Operational gaps in Hadoop
Our best practices address all these gaps
▪ Hardware selection, preparation, and configuration
▪ Installation/packaging
▪ Upgrades
▪ Autostart/start/stop/restart/status as correct UNIX user
▪ Node level application and system monitoring
▪ Cluster-level and job-level monitoring
▪ Integrated log viewing/tailing/grepping
▪ Fast, reliable, centrally logged cluster-level shell ( != slaves.sh)
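The last gap above, a fast and centrally logged cluster-level shell, can be sketched in a few lines. The ssh invocation, timeout, and degree of parallelism below are illustrative assumptions, not Facebook's actual tool; the point is that every host runs in parallel with a timeout, so one hung node cannot stall the sweep the way `slaves.sh` can.

```python
# Sketch of a parallel cluster shell (assumed design, not the real tool).
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_on_host(host, command, timeout=30, use_ssh=True):
    """Run one command on one host; returns (host, exit_code, stdout)."""
    # use_ssh=False runs the command locally, which keeps the sketch testable.
    argv = ["ssh", host, command] if use_ssh else ["sh", "-c", command]
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
        return host, proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        return host, -1, ""  # a hung node must not stall the whole sweep

def cluster_shell(hosts, command, parallelism=64, **kwargs):
    """Run a command on every host in parallel; results keyed by host."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = pool.map(lambda h: run_on_host(h, command, **kwargs), hosts)
        return {host: (rc, out) for host, rc, out in results}
```

Collecting results centrally, rather than streaming interleaved output, is what makes the run greppable and loggable afterwards.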
7. Existing operational best practices (1)
Sysadmin
▪ All the stuff you would do for a large distributed system but especially…
▪ Failed/failing hardware is your biggest enemy. FIND IT AND FIX IT, OR
GET IT OUT OF YOUR CLUSTERS! (the ‘excludes’ file is your friend)
▪ Regularly run every possible diagnostic to safely scan for bad hardware
▪ Identify and remove “repeat offender” hardware
▪ Fail fast, recover quickly, small things add up in big clusters:
▪ RHEL/CentOS kickstart steals your disk space (1.5%-3%+ per disk)
▪ No swap + vm.panic_on_oom=1 + kernel.kdb=0 for “fast auto reboot on OOM”
▪ Never fsck ext3 data drives unless Hadoop says you have to
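The "excludes file is your friend" advice can be made concrete: keep the file that `dfs.hosts.exclude` points at in sync with repair status, then trigger decommissioning. The file path and helper below are illustrative; only the `hadoop dfsadmin -refreshNodes` step is real Hadoop.

```python
# Sketch: drive the HDFS excludes file from repair status (path assumed).
def update_excludes(path, hosts_in_repair):
    """Rewrite the excludes file: one bad host per line, sorted and
    de-duplicated. Returns the list that was written."""
    excluded = sorted(set(hosts_in_repair))
    with open(path, "w") as f:
        for host in excluded:
            f.write(host + "\n")
    # An operator (or an excluderator-style tool) would then run
    # `hadoop dfsadmin -refreshNodes` so the namenode starts
    # decommissioning the listed datanodes.
    return excluded
```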
9. Existing operational best practices (2)
Tooling
▪ Maintain a central registry of clusters, nodes, and each node’s role in
the cluster, integrated with your service/asset management platform
▪ Build centrally maintained tools to:
▪ Start/stop/restart/autostart daemons on hosts (hadoopctl)
▪ View/grep/tail daemon logs on hosts (hadooplog)
▪ Start/stop, or execute commands on entire clusters (clusterctl)
▪ Manage excludes files based on repair status (excluderator)
▪ Deploy any arbitrary version of software to clusters
▪ Monitor daemon health and collect statistics
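The central registry that the tools above build on can be as simple as a mapping from cluster to role to hosts. The schema and names below are assumptions for illustration (not Facebook's actual registry); they show how a tool like clusterctl might expand `CLUSTER` or `CLUSTER:role` arguments.

```python
# Sketch of a central cluster/node/role registry (shape assumed).
REGISTRY = {
    "DFS1": {
        "namenode": ["nn001.example.com"],
        "datanode": ["dn001.example.com", "dn002.example.com"],
    },
    "SILVER": {
        "jobtracker": ["jt001.example.com"],
        "tasktracker": ["tt001.example.com", "tt002.example.com"],
    },
}

def nodes_for(cluster, role=None, registry=REGISTRY):
    """Resolve a cluster (or cluster:role) to its host list."""
    roles = registry[cluster]
    if role is not None:
        return list(roles[role])
    # No role given: every node in the cluster, across all roles.
    return [h for hosts in roles.values() for h in hosts]
```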
10. Tooling example
Deploy & upgrade clusters
# Deploy an HDFS/MapReduce cluster pair: 2 to 4000 nodes via torrent
$ deploy-hadoop-release.py --clusterdeploy=DFS1,SILVER branch@rev
$ clusterctl restart DFS1 SILVER
# “Refresh deploy” on 10 clusters, and then restart just the datanodes
$ deploy-hadoop-release.py --poddeploy=DFSSCRIBE-ALL redeploy
$ clusterctl restart DFSSCRIBE-ALL:datanode
11. Existing operational best practices (3)
Process
▪ Document everything
▪ Segregate different classes of users on different clusters, with
appropriate service levels and capacities
▪ Graph user-visible metrics like HDFS and job latency
▪ Build “least destructive” procedures for getting hardware back in service
▪ Developers and Ops should use the same procedures and tools
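Graphing user-visible metrics like HDFS and job latency usually means sampling a canary operation and plotting percentiles rather than means. A minimal sketch of the percentile step (nearest-rank method; the choice of method and units is an assumption):

```python
# Sketch: percentile of canary latency samples, for graphing.
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (e.g. ms)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100.0 * len(ordered))  # nearest-rank method
    return ordered[max(0, rank - 1)]
```

Graphing p50 alongside p99 is what surfaces the slow-node tail that averages hide.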
13. A Hadoop cluster admin’s worst enemies
▪ The “X-Files”: machines which fail in strange ways, undetected by your monitoring systems
▪ Get your basics under control, then you’ll have more time for these
▪ “America’s Most Wanted”: machines which keep failing, again and again
▪ Our data: 1% of our machines accounted for 30% of our repair tickets
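Finding those "America's Most Wanted" machines is a counting exercise over the repair-ticket log. The threshold below is an illustrative choice, not Facebook's; tune it against your own ticket data.

```python
# Sketch: flag "repeat offender" machines from repair tickets.
from collections import Counter

def repeat_offenders(tickets, min_tickets=3):
    """tickets: iterable of hostnames, one entry per repair ticket.
    Returns (host, ticket_count) pairs at or above the threshold,
    worst offenders first."""
    counts = Counter(tickets)
    return [(h, n) for h, n in counts.most_common() if n >= min_tickets]
```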
14. New directions for Hadoop
▪ HBase (Facebook Messages, real-time click logs)
▪ Zero-downtime upgrades (AvatarNode, rolling upgrades)
▪ “Megadatanodes” and Hadoop RAID
▪ HDFS as an “appliance”
See also:
http://www.facebook.com/notes/facebook-engineering/looking-at-the-code-behind-our-three-uses-of-apache-hadoop/468211193919
15. HBase and Hadoop
▪ Very new technology with emerging operational characteristics
▪ Applications using HBase are also new, with their own usage quirks
▪ Aiming for a large number of small clusters (~100 nodes)
▪ Slow/dead nodes are a big problem: these are real-time, user-facing systems
▪ Region failover is slow; there is no speculative execution
▪ Full-downtime restarts must be avoided
View the Messages tech talk here: http://fb.me/95OQ8YaD2rkb3r
16. Zero-downtime upgrades
▪ HDFS upgrades mean 1-2 hours of downtime
▪ JobTracker upgrades are quick (5 min), but kill all currently running jobs
▪ Rolling upgrades work today, but are too slow for large clusters
▪ Must be able to be both strict and lenient about multiple versions of client and server software installed and running in the cluster
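A rolling upgrade's planning step can be sketched simply: restart datanodes in batches small enough that only a bounded fraction of the cluster is down at once. The batch-size fraction is an assumed knob; a real tool would also wait for each batch's datanodes to re-register with the namenode before moving on, which is exactly why rolling upgrades are slow on large clusters today.

```python
# Sketch: plan a rolling datanode restart in bounded-size batches.
def rolling_batches(datanodes, max_down_fraction=0.02):
    """Split datanodes into restart batches of at most max_down_fraction
    of the cluster (minimum 1 node per batch), preserving order."""
    batch_size = max(1, int(len(datanodes) * max_down_fraction))
    return [datanodes[i:i + batch_size]
            for i in range(0, len(datanodes), batch_size)]
```

With 4000 nodes at 2% per batch, that is 50 batches; if each batch needs minutes of health-checking, the total runtime is the slowness the slide complains about.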
17. “Megadatanodes” and Hadoop RAID
▪ Storage requirements continue to increase rapidly, as does CPU/RAM
▪ 9X increase in datanode density from 2009-2011 (4TB->36TB)
▪ Hadoop RAID with XOR and Reed-Solomon brings tremendous cost savings along with management challenges:
▪ Losing one node is a big deal (200k-600k blocks/node?). A rack? Ouch!
▪ Tools and admin capabilities are not ready yet
▪ Will HDFS administration in 2012 be “like administering a cluster of 4000 Netapps”?
▪ Host/rack level network will be a bottleneck
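The XOR half of Hadoop RAID is easy to see in miniature: one parity block per stripe lets you reconstruct any single lost block, which is what allows replication to be reduced in exchange for parity storage. The toy byte blocks below are illustrative; real stripes are HDFS blocks.

```python
# Sketch: XOR parity and single-block reconstruction, Hadoop RAID style.
def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def reconstruct(surviving_blocks, parity):
    """Recover one lost data block: XOR of the survivors and the parity
    cancels every surviving block, leaving the missing one."""
    return xor_blocks(surviving_blocks + [parity])
```

Reed-Solomon generalizes this to tolerate multiple lost blocks per stripe, which is why losing a whole rack is survivable but operationally painful.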
18. HDFS as an “appliance”
▪ Use HDFS cluster instead of commercial storage appliance
▪ Requires commercial-grade support & features
▪ Must be price-competitive
vs.
19. Emerging operational best practices
▪ More careful selection of hardware and network designs to accommodate new uses of Hadoop
▪ Find and deal with slowness at a node/rack/segment level
▪ Auto-healing at granularity better than “reboot” or “restart”
▪ Node-level version detection and installation
▪ Rolling, zero-downtime upgrades (AvatarNode + new JobTracker)
…and do all this without making Hadoop any harder to set up and run
20. Next steps
▪ Are we trying to do too much?
▪ Facebook needs an enormous data warehouse
▪ Facebook needs a large distributed filesystem
▪ Facebook needs a database alternative to MySQL
▪ Facebook is always looking to spend less money
▪ …and all that other stuff too
▪ Failure is not an option
▪ Never a dull moment!
21. (c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0