– New features all the time
● Different from other IT projects
– One application on many hosts; not vice-versa
– Things you might run: HDFS, MapReduce, Yarn, ZooKeeper,
Oozie, Hive, Pig, HBase, Sqoop, Solr, Cloudera Impala...
Many Common Setup Issues
Operating system issues
– Transparent Huge Pages
– Clock Skew
– Reverse lookup must report the FQDN
– NICs can negotiate less than full speed
These are just examples. There are many more!
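Several of the checks above can be scripted per host. The following is a minimal sketch, not taken from the talk: it assumes a Linux host that exposes the usual sysfs files, and the function names and the `eth0` interface are hypothetical. Some distributions expose the Transparent Huge Pages setting under a different path.

```python
import re
import socket
from pathlib import Path

# Assumed location of the THP setting on mainline Linux kernels;
# some distributions use a different path.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_setting(text):
    """Return the bracketed (active) value from e.g. 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def is_fqdn(name):
    """Treat a name as fully qualified if it contains an interior dot."""
    return "." in name.strip(".")


def reverse_lookup_name(ip_address):
    """Primary name the resolver returns for ip_address (needs DNS or /etc/hosts)."""
    return socket.gethostbyaddr(ip_address)[0]


def nic_speed_mbps(interface):
    """Negotiated link speed in Mb/s as reported by sysfs, or None if unavailable."""
    try:
        return int(Path(f"/sys/class/net/{interface}/speed").read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    if THP_PATH.exists():
        print("THP:", active_thp_setting(THP_PATH.read_text()))
    # 'eth0' is a placeholder interface name.
    print("eth0 speed (Mb/s):", nic_speed_mbps("eth0"))
```

A host would pass the reverse-lookup check when `is_fqdn(reverse_lookup_name(ip))` is true for its own address. What counts as "full speed" for a NIC depends on the hardware, so the sketch only reports the negotiated value.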
Let others do the work for you
Cloudera's Distribution including Apache Hadoop (CDH)
– Enterprise-ready: tested and deployed in production on tens of
thousands of nodes
– Enterprise-grade features and innovation
Fine-grained Authorization (Sentry)
– 100% open source and Apache licensed
Available for free
– Any number of nodes
– Manage all services available in CDH
– Set up, configure, monitor, diagnose, and upgrade
– Complex workflows
5 Years of expertise baked into product
● Everything is up and running – Great!
● Add users and start running jobs, and get
a whole new set of challenges – Great...
● Find, Diagnose and fix problems
– Why are my HBase queries slow?
● View cluster activity
– Who ran the MapReduce job that made my HBase queries slow?
● Get alerts for any problems that come up
– Outage at 2AM, you want that wake-up call...right?
● Common problems that are easy to check
– Are any processes down?
– Are HDFS reads and writes working?
– Are HDFS checkpoints too slow?
– Has a host been swapping?
– Is there too much Clock Skew?
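Two of these checks are simple enough to sketch by hand. The example below is illustrative only and is not how Cloudera Manager implements its health checks: the function names, the one-second skew threshold, and the use of the cluster median as the reference clock are all assumptions. The swap check relies on the `pswpin` and `pswpout` counters in `/proc/vmstat` on Linux.

```python
import statistics


def clock_skew_offenders(host_times, max_skew_seconds=1.0):
    """Return {host: offset} for hosts whose clock is further than
    max_skew_seconds from the median of all reported clocks."""
    median = statistics.median(host_times.values())
    return {
        host: t - median
        for host, t in host_times.items()
        if abs(t - median) > max_skew_seconds
    }


def swap_counters(vmstat_text):
    """Extract pswpin/pswpout (pages swapped in/out since boot)
    from the text of /proc/vmstat."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def has_swapped(before, after):
    """True if either swap counter grew between two samples."""
    return any(after.get(k, 0) > before.get(k, 0) for k in ("pswpin", "pswpout"))
```

Sampling `swap_counters` twice, a few minutes apart, and passing both samples to `has_swapped` answers "has a host been swapping?" for that interval.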
● Grep works great on 1 machine, not on 100s
● Useful to answer
– What errors/warnings occurred when my service was slow?
– Has this error occurred before?
– When did a problem start happening?
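To make the idea concrete, here is a minimal sketch of a cross-host log search that can answer the three questions above. It assumes the log lines have already been collected per host and follow a `YYYY-MM-DD HH:MM:SS,mmm LEVEL message` layout. Both the layout and the function names are assumptions for illustration, not the product's log search.

```python
import re
from datetime import datetime

# Assumed line layout: '2013-06-01 02:00:03,123 ERROR some message'
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} (\w+) (.*)$")


def parse_line(host, line):
    """Return (timestamp, host, level, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return (ts, host, m.group(2), m.group(3))


def search_logs(logs_by_host, levels=("ERROR", "WARN"),
                start=None, end=None, contains=None):
    """Return matching events from all hosts, merged and sorted by time."""
    hits = []
    for host, lines in logs_by_host.items():
        for line in lines:
            event = parse_line(host, line)
            if event is None:
                continue
            ts, _, level, message = event
            if level not in levels:
                continue
            if start is not None and ts < start:
                continue
            if end is not None and ts > end:
                continue
            if contains is not None and contains not in message:
                continue
            hits.append(event)
    return sorted(hits)
```

The first element of the result for a given `contains` string shows when a problem started. Restricting `start` and `end` to the slow period lists the errors and warnings that occurred during it.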
● We know how to set up a cluster manually
● We know how to identify, diagnose and fix problems
● Also need to handle regular tasks
– Grow cluster
– Replace hardware
Cloudera Manager API
– Create / configure cluster and services
– Configure new host to run on cluster
– Enable HDFS High Availability
– Enable MapReduce JobTracker High Availability
– Decommission / Recommission host
– Metrics used for charting available via API
– Health checks, including export to Nagios
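As an illustration of scripting against the API, the sketch below builds an authenticated REST request using only the Python standard library. It assumes the API is served over HTTP on port 7180 under `/api/v1` and that list responses wrap their entries in an `items` array. The host name, credentials, and helper names are placeholders, so check the API documentation for the version and endpoints your installation supports.

```python
import base64
import json
import urllib.request


def cm_request(host, path, username, password, port=7180, version="v1"):
    """Build (but do not send) a GET request with HTTP Basic authentication.
    Port and API version defaults are assumptions; adjust to your install."""
    url = f"http://{host}:{port}/api/{version}/{path.lstrip('/')}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def list_cluster_names(host, username, password):
    """Fetch the cluster list and return the cluster names.
    Assumes the response body looks like {"items": [{"name": ...}, ...]}."""
    request = cm_request(host, "clusters", username, password)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [cluster["name"] for cluster in payload.get("items", [])]
```

Calling `list_cluster_names("cm.example.com", "admin", "admin")` would contact a (hypothetical) Cloudera Manager server at that address. The operations listed above, such as decommissioning a host, follow the same request pattern against other endpoints.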
Cloudera Manager API
● Java and Python client bindings
● Export health information into Nagios
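One way to picture the Nagios export is as a mapping from a health summary to a Nagios plugin exit code. This is a minimal sketch and not the integration tooling mentioned in the talk: it assumes health summaries arrive as the strings `GOOD`, `CONCERNING`, and `BAD`, and it treats anything else as unknown. The Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = ["OK", "WARNING", "CRITICAL", "UNKNOWN"]

# Assumed health summary strings; any other value maps to UNKNOWN.
HEALTH_TO_NAGIOS = {"GOOD": OK, "CONCERNING": WARNING, "BAD": CRITICAL}


def nagios_status(health_summary):
    """Translate a health summary string into a Nagios exit code."""
    return HEALTH_TO_NAGIOS.get(health_summary, UNKNOWN)


def format_check(service, health_summary):
    """Return (exit_code, status_line) in the shape a Nagios plugin prints."""
    code = nagios_status(health_summary)
    return code, f"{LABELS[code]} - {service} health is {health_summary}"
```

A plugin script would print the status line and exit with the returned code, for example by passing it to `sys.exit`.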
Common Integration Questions
● Nagios – yes
● Even have tools to help integrate
● Chef – not yet
● Puppet – yes
● Customers use CM and Puppet together to stamp out a new
cluster at the press of a button
● SNMP – yes
● Events are published and can be integrated
● Hadoop Operations - A Guide for Developers and Administrators
– Book by Eric Sammer
● CM Architecture blog
● API Examples and Tutorials
● Cloudera Manager installer link and docs
● Easily upload support bundle
– Enables proactive support
– Fix problems more quickly
● Rolling Upgrades and Restarts
● Backup and Disaster Recovery
Configuration History and Rollback