Cluster management and automation with cloudera manager

3,811 views

Published on

Darren Lo's talk from #lspe http://www.meetup.com/SF-Bay-Area-Large-Scale-Production-Engineering/events/129859402/

Published in: Technology

Cluster management and automation with cloudera manager

  1. 1. Cluster Management and Automation with Cloudera Manager Darren Lo – Software Engineer at Cloudera
  2. 2. Agenda ● Hadoop Installation and Setup ● Diagnosing Problems ● Automating Management Tasks ● Links
  3. 3. Hadoop is... ● Fast-changing – New features all the time ● Different from other IT projects – One application on many hosts; not vice-versa ● Complex – Things you might run: HDFS, MapReduce, Yarn, ZooKeeper, Oozie, Hive, Pig, HBase, Sqoop, Solr, Cloudera Impala... ● Useful
  4. 4. Many Common Setup Issues ● Operating system issues – Transparent Huge Pages – Ulimits – Clock Skew ● Networking issues – Reverse-lookup does must report FQDN – NICs can negotiate less than full speed These are just examples. There are many more!
  5. 5. Let others do the work for you ● Cloudera's Distribution including Apache Hadoop (CDH) – Enterprise-Ready: Tested and deployed in production on 10s of 1000s of nodes – Enterprise-grade features and innovation ● Fine-grained Authorization (Sentry) ● Impala, Search – 100% open source and Apache licensed
  6. 6. Cloudera Manager ● Available for free – Any number of nodes – Manage all services available in CDH – Set up, configure, monitor, diagnose, and upgrade – Complex workflows – Kerberos – API ● 5 Years of expertise baked into product
  7. 7. Installing with Cloudera Manager
  8. 8. Installing with Cloudera Manager
  9. 9. Installing with Cloudera Manager
  10. 10. Installing with Cloudera Manager
  11. 11. Installing with Cloudera Manager
  12. 12. Installing with Cloudera Manager
  13. 13. Installing with Cloudera Manager
  14. 14. Installing with Cloudera Manager
  15. 15. Installation Complete ● Everything is up and running – Great! ● Add users and start running jobs, and get a whole new set of challenges – Great...
  16. 16. Next Challenges ● Find, Diagnose and fix problems – Why are my HBase queries slow? ● View cluster activity – Who ran the MapReduce job that made my HBase queries slow? ● Get alerts for any problems that come up – Outage at 2AM, you want that wake-up call...right?
  17. 17. Health Tests ● Common problems that are easy to check – Are any processes down? – Are HDFS reads and writes working? – Are HDFS checkpoints too slow? – Has a host been swapping? – Is there too much Clock Skew?
  18. 18. Health Tests
  19. 19. Log Search ● Grep works great on 1 machine, not 100's ● Useful to answer – What errors/warnings occurred when my service was slow? – Has this error occurred before? – When did a problem start happening?
  20. 20. Log Search
  21. 21. Events and Alerts ● CM publishes a stream of events – Critical events are alerts ● Event search ● Integrate with external tools like Nagios
  22. 22. Activity Monitor ● Who was running stuff when the cluster had problems? ● See who is running MR jobs – identifies Hive jobs too
  23. 23. Activity Monitor
  24. 24. Metrics and Charts ● Like Log search, a must-have for any distributed system ● Hadoop services expose many metrics ● Collect and visualize these with – Cloudera Manager – Ganglia
  25. 25. Charting with Cloudera Manager
  26. 26. Charting with Cloudera Manager
  27. 27. Charting with Cloudera Manager
  28. 28. Next Challenges ● We know how to set up a cluster manually ● We know how to identify, diagnose and fix issues ● Also need to handle regular tasks – Grow cluster – Replace hardware
  29. 29. Cloudera Manager API ● Setup – Create / configure cluster and services – Configure new host to run on cluster ● Workflows – Enable HDFS High Availability – Enable MapReduce JobTracker High Availability – Decommission / Recommission host ● Monitoring – Metrics used for charting available via API – Health checks, including export to Nagios – Events
  30. 30. Cloudera Manager API ● http://cloudera.github.com/cm_api/ ● Java and Python client bindings ● Shell ● Export health information into Nagios
  31. 31. Common Integration Questions ● Nagios – yes ● Even have tools to help integrate ● Chef – not yet ● Puppet – yes ● Customers use CM and puppet together to press button and stamp out new cluster ● Snmp – yes ● events published and can be integrated
  32. 32. Links ● Hadoop Operations - A Guide for Developers and Administrators – Book by Eric Sammer ● CM Architecture blog – http://blog.cloudera.com/blog/2013/07/how-does-cloudera-manager-work/ ● API Examples and Tutorials – http://cloudera.github.io/cm_api/ – http://blog.cloudera.com/blog/2013/05/how-to-automate-your-hadoop-cluster-from-java/ – http://blog.cloudera.com/blog/2012/09/automating-your-cluster-with-cloudera-manager-api/ ● Cloudera Manager installer link and docs – http://www.cloudera.com/content/support/en/downloads.html – http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager- Installation-Guide/Cloudera-Manager-Installation-Guide.html
  33. 33. Enterprise Features ● Easily upload support bundle – Enables proactive support – Fix problems more quickly ● Rolling Upgrades and Restarts ● Backup and Disaster Recovery ● Auditing ● Operational Reports ● Configuration History and Rollback ● LDAP

×