Hadoop Cluster Setup Using Cloudera Manager


Published in: Technology

  1. 1. Hadoop Cluster Setup: A Simple Way with Cloudera Manager. Peng-Yi Lai. Co-graph confidential
  2. 2. Outline
     ▪ Cloudera Manager: Set Up Your Hadoop
     ▪ Flume: Data Collection Tool
  3. 3. Before Starting
     ▪ Ask yourself what you want: to be an expert who makes Hadoop itself better, or to provide a service by using Hadoop?
  4. 4. As a Hadoop Expert
     ▪ It is better to know Hadoop in as much detail as possible
     ▪ Examples: companies like Cloudera and MapR
  5. 5. Other Uses of Hadoop
     1. Learn how to use Hadoop to solve problems more effectively and efficiently
     2. Find the easiest way to make sure your Hadoop cluster works properly
  6. 6. Desired Skills
     ▪ Network knowledge is imperative
       ▪ Every node in a cluster communicates with the others over the network
       ▪ Even with Cloudera Manager, you still need to handle networking on your own
     ▪ Linux administration
       ▪ Everyone knows that!
  7. 7. Requirements for Cloudera Manager (1)
     ▪ Prepare Your Machines
       ▪ Supported OS version
         ▪ Only 64-bit Linux-based systems
       ▪ Supported browsers
         ▪ For the admin console
       ▪ Supported database
         ▪ Relevant if you need a custom database instead of the embedded PostgreSQL database
       ▪ Supported JDK version
         ▪ Cloudera Manager installs a JDK for you if none is installed
       ▪ Repositories
         ▪ All hosts must have access to the standard package repositories and the Cloudera Hadoop repositories
  8. 8. Requirements for Cloudera Manager (2)
     ▪ Networking and Security
       ▪ Properly configure DNS or /etc/hosts
         ▪ Every host should know who's who
       ▪ Use the root account or password-less sudo permission for SSH access to all cluster machines
       ▪ No blocking by iptables or firewalls
         ▪ Port 7180 is used to access Cloudera Manager
       ▪ No blocking by Security-Enhanced Linux (SELinux)
         ▪ Disabled
     ▪ There are more details on cloudera.com
     ▪ If there is a problem, don't feel ashamed to google!
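The networking requirements above can be checked before installation with a small script. The following is a minimal sketch, not part of the original slides: the function names `resolves`, `port_open`, and `preflight` are hypothetical, and the check only covers name resolution (DNS or /etc/hosts) and TCP reachability of port 7180. It does not inspect iptables rules or the SELinux mode.

```python
import socket

def resolves(hostname):
    """Return True if the local resolver (DNS or /etc/hosts) knows hostname."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def preflight(cluster_hosts, cm_host, cm_port=7180):
    """Report hosts that fail name resolution and whether the
    Cloudera Manager port (7180 by default) is reachable."""
    unresolved = [h for h in cluster_hosts if not resolves(h)]
    return {"unresolved": unresolved, "cm_port_open": port_open(cm_host, cm_port)}
```

Running `preflight` from each machine with the full host list gives a quick view of whether every host "knows who's who". A closed port is only meaningful once the Cloudera Manager server is actually running.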
  9. 9. Set Up a Hadoop Cluster
     ▪ After everything is ready, download cloudera-manager-installer.bin from the Cloudera Downloads page
     ▪ Change the permission and install
     ▪ Log in to the admin console at http://<Server host>:7180
     ▪ Follow the steps in Cloudera Manager
     ▪ Done!
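The "change the permission and install" step can be sketched as follows. This is an illustration under assumptions rather than the slides' own procedure: the helper names are hypothetical, the installer path is whatever you downloaded, and running the installer normally requires root privileges.

```python
import os
import stat
import subprocess

def make_executable(path):
    """Add the owner-execute bit, the equivalent of `chmod u+x path`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)

def admin_console_url(server_host, port=7180):
    """Build the admin console address shown on the slide."""
    return f"http://{server_host}:{port}"

def run_installer(path):
    """Mark the downloaded installer executable and run it.
    Assumption: the caller already has root or sudo rights."""
    make_executable(path)
    return subprocess.run([os.path.abspath(path)], check=True)
```

After the installer finishes, `admin_console_url("<Server host>")` gives the address to open in a supported browser.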
  10. 10. Cloudera Manager Login
  11. 11. Specify Hosts
  12. 12. Hosts Found
  13. 13. Waiting for Installation
  14. 14. Home
  15. 15. Actions of Services
  16. 16. HDFS Service
  17. 17. Configuration of HDFS
  18. 18. Selected Services
  19. 19. Services to Add
  20. 20. All Hosts
  21. 21. Information of a Host
  22. 22. More about Cloudera Manager
      ▪ Easy to upgrade your CDH version
      ▪ Easy to add or delete a host or a cluster
      ▪ Easy to configure High Availability (HA)
      ▪ Supports Hadoop security by using Kerberos
      ▪ Supports backup and disaster recovery
  23. 23. For Developers
      ▪ Use Hue (another topic)
  24. 24. Observation
  25. 25. Flume: A Data Collection Tool
  26. 26. Two Ways to Use Flume
      ▪ Independent of the Hadoop cluster
        • Flume can run entirely by itself
        • Configure flume.conf in /etc/flume-ng/conf
      ▪ On a Hadoop cluster, or on a node managed by Cloudera Manager
        • Easy to keep the agent nodes under control
        • Start, stop, and restart the service from the admin console
        • Configure Flume from the admin console
        • Convenient for checking log files
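For the standalone case, an agent is typically started with the `flume-ng agent` command and pointed at the configuration directory. The sketch below only builds that command line. The agent name `a1` and the helper name `flume_agent_command` are assumptions for illustration, while the directory and file name come from the slide.

```python
import posixpath

def flume_agent_command(agent_name, conf_dir="/etc/flume-ng/conf",
                        conf_file="flume.conf"):
    """Build the argument list for starting a standalone Flume agent.
    agent_name must match the agent prefix used inside the config file."""
    return [
        "flume-ng", "agent",
        "--conf", conf_dir,                                   # directory with env and logging settings
        "--conf-file", posixpath.join(conf_dir, conf_file),   # agent definition
        "--name", agent_name,
    ]
```

The resulting list can be passed to `subprocess.run` on a machine where `flume-ng` is on the PATH. On a node managed by Cloudera Manager, the admin console takes over this step.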
  27. 27. 3 Important Settings
      ▪ Source
        • Defines which kinds of events, sent by an external source, the agent accepts
      ▪ Channel
        • Defines how an event is held until it is consumed by a Flume sink
      ▪ Sink
        • Defines the destination, such as HDFS or another Flume agent, to which events held in the channel are written or forwarded
  28. 28. Type Example
      ▪ Source
        ▪ Avro Source
        ▪ Exec Source
        ▪ JMS Source
        ▪ NetCat Source
        ▪ Syslog TCP Source
        ▪ Syslog UDP Source
        ▪ HTTP Source
        ▪ Thrift Legacy Source
        ▪ …etc
      ▪ Channel
        ▪ Memory Channel
        ▪ JDBC Channel
        ▪ File Channel
        ▪ Pseudo Transaction Channel
        ▪ Custom Channel
      ▪ Sink
        ▪ HDFS Sink
        ▪ Logger Sink
        ▪ Avro Sink
        ▪ Thrift Sink
        ▪ IRC Sink
        ▪ File Roll Sink
        ▪ HBaseSink
        ▪ …etc
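To make the source, channel, and sink settings concrete, the sketch below renders the text of a small flume.conf that wires one NetCat Source to one Logger Sink through one Memory Channel. The component names `a1`, `r1`, `c1`, `k1`, the bind address, and the port are illustrative assumptions, not values from the slides. The property keys follow the standard Flume agent configuration layout.

```python
def render_flume_conf(agent="a1", port=44444):
    """Return flume.conf text for one NetCat source, one memory channel,
    and one logger sink. All component names are placeholders."""
    props = [
        # Declare the components that belong to this agent
        (f"{agent}.sources", "r1"),
        (f"{agent}.channels", "c1"),
        (f"{agent}.sinks", "k1"),
        # Source: which external events to accept
        (f"{agent}.sources.r1.type", "netcat"),
        (f"{agent}.sources.r1.bind", "localhost"),
        (f"{agent}.sources.r1.port", str(port)),
        # Channel: where events wait until a sink consumes them
        (f"{agent}.channels.c1.type", "memory"),
        # Sink: where events end up (here, the agent's log)
        (f"{agent}.sinks.k1.type", "logger"),
        # Wiring: a source may feed several channels, a sink reads from one
        (f"{agent}.sources.r1.channels", "c1"),
        (f"{agent}.sinks.k1.channel", "c1"),
    ]
    return "\n".join(f"{key} = {value}" for key, value in props) + "\n"
```

Swapping the sink type for an HDFS Sink or the channel for a File Channel follows the same pattern, but those types need additional properties that are not shown here.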
  29. 29. Example of Setting
  30. 30. Use Cloudera Manager