Cloudera Hadoop installation


Steps for installing Cloudera Hadoop on Ubuntu 12.04

Published in: Engineering, Technology

  1. Cloudera Hadoop (CDH 4) Installation on Ubuntu 12.04 LTS
     Sumitra Pundlik, Assistant Professor, Department of Computer Engineering, MIT College of Engineering, Kothrud, Pune 411038
  2. Agenda
     ● Introduction to Hadoop
     ● Various components of Hadoop
     ● Installation steps for Cloudera Hadoop
  3. Introduction to Hadoop
     ● The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
     ● It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
     ● The library itself is designed to detect and handle failures at the application layer.
  4. Various Components of Hadoop
  5. The project includes these modules:
     ● Hadoop Common: The common utilities that support the other Hadoop modules.
     ● Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
     ● Hadoop YARN: A framework for job scheduling and cluster resource management.
     ● Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
  6. ● Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop.
     ● Avro™: A data serialization system.
     ● Cassandra™: A scalable multi-master database with no single points of failure.
     ● Chukwa™: A data collection system for managing large distributed systems.
     ● HBase™: A scalable, distributed database that supports structured data storage for large tables.
     ● Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
     ● Mahout™: A scalable machine learning and data mining library.
  7. ● Pig™: A high-level data-flow language and execution framework for parallel computation.
     ● Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
     ● Tez™: A generalized data-flow programming framework, built on Hadoop YARN.
     ● ZooKeeper™: A high-performance coordination service for distributed applications.
  8. Cloudera Hadoop Installation
     ● What is Cloudera Hadoop?
     ● What is Cloudera Manager?
     ● Prerequisites for installation
     ● Installation steps with screenshots
  9. What is Cloudera Hadoop
     ● Cloudera describes CDH as the most complete, tested, and popular distribution of Apache Hadoop.
     ● CDH is 100% Apache-licensed open source.
     ● CDH bundles the Hadoop-related projects in one place.
  10. What is Cloudera Manager
      ● Cloudera Manager automates the installation and configuration of CDH on an entire cluster.
      ● Prerequisites
          ● Update your Ubuntu
          ● Passwordless SSH
          ● Passwordless sudo
          ● Edit the hosts file
          ● Install a database (MySQL/PostgreSQL/Oracle)
          ● Install the JDBC connector for the chosen database
  11. Update Your Ubuntu Machine
      ● Run
          sudo apt-get update
      ● If the update fails, open a root shell and rebuild the package lists:
          sudo -i
          apt-get clean
          cd /var/lib/apt
          mv lists lists.old
          mkdir -p lists/partial
          apt-get clean
          apt-get update
      ● If the problem persists, contact your technical assistant.
  12. Passwordless SSH
      ● Secure Shell (SSH) is a cryptographic network protocol for secure data communication, remote command-line login, remote command execution, and other secure network services between two networked computers.
      ● Install OpenSSH:
          sudo apt-get install openssh-server openssh-client
      ● Open the sshd_config file in /etc/ssh/:
          sudo gedit /etc/ssh/sshd_config
        and set PubkeyAuthentication to yes.
      ● Reload the SSH service:
          sudo /etc/init.d/ssh reload
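The PubkeyAuthentication setting can be read back from the file after editing. The helper below is only a sketch and is not part of the original slides: the function name `pubkey_auth_enabled` is hypothetical, Match blocks are ignored, and it relies on two OpenSSH behaviours (the first occurrence of a keyword wins, and PubkeyAuthentication defaults to yes when it is not set).

```shell
#!/bin/sh
# Sketch: succeed if public-key authentication is enabled in an sshd_config file.
# pubkey_auth_enabled is a hypothetical helper name; Match blocks are not handled.
pubkey_auth_enabled() {
    conf_file="$1"
    # First uncommented PubkeyAuthentication line wins; compare case-insensitively.
    value=$(awk 'tolower($1) == "pubkeyauthentication" { print tolower($2); exit }' "$conf_file")
    # No explicit setting means the OpenSSH default, which is "yes".
    [ -z "$value" ] || [ "$value" = "yes" ]
}

# Report on the system file only if it exists and is readable.
if [ -r /etc/ssh/sshd_config ]; then
    if pubkey_auth_enabled /etc/ssh/sshd_config; then
        echo "PubkeyAuthentication is enabled"
    else
        echo "PubkeyAuthentication is disabled"
    fi
fi
```

The check only reads the file, so it can be run before and after the reload.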
  13. Passwordless SSH
      ● Run the following commands to set up passwordless SSH:
          1. ssh-keygen
          2. ssh-add
          3. ssh-copy-id -i exam@
          4. ssh exam@
      ● For a cluster, run commands 3 and 4 from the master machine once for each client, using the client's host name or user_name@ip_address, so that the master can connect to every client machine.
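For a cluster, commands 3 and 4 can be repeated in a loop over the client machines. The following is a minimal sketch under stated assumptions: the host names `worker1` and `worker2` are placeholders, the `run` and `distribute_key` helpers are hypothetical names, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch: copy the master's public key to each client, then verify key-based login.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# distribute_key USER HOST...  (hypothetical helper)
distribute_key() {
    user="$1"
    shift
    for host in "$@"; do
        run ssh-copy-id "$user@$host"
        # BatchMode makes ssh fail instead of prompting, so a leftover
        # password prompt shows up as an error rather than a hang.
        run ssh -o BatchMode=yes "$user@$host" true
    done
}

# worker1 and worker2 are placeholder host names.
distribute_key exam worker1 worker2
```

Running it with DRY_RUN=0 from the master machine executes the commands; `ssh-copy-id` still asks for each client's password once.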
  14. Passwordless sudo
      ● Make sudo passwordless by editing the sudoers file:
          sudo gedit /etc/sudoers
        (sudo visudo opens the same file and checks the syntax before saving.)
      ● Change the %sudo line to:
          %sudo ALL=(ALL:ALL) NOPASSWD:ALL
        and save the file.
      ● For a cluster, make the same change in the sudoers file of every client machine.
  15. Edit hosts file
      ● In /etc/hosts, list the IP address and host name of the machine. Example host name: ccompl0910
      ● For a cluster, list the IP address and host name of every client in the master's hosts file, and the master's IP address and host name in each client's hosts file.
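Adding the same entries on several machines is easier to repeat safely when the edit skips names that are already present. The sketch below is illustrative only: `add_host_entry` is a hypothetical helper, the IP addresses and the names `master` and `worker1` are placeholders, and the demo writes to a temporary file rather than to /etc/hosts.

```shell
#!/bin/sh
# Sketch: append "IP<TAB>hostname" to a hosts file unless the name is already listed.
# add_host_entry FILE IP NAME  (hypothetical helper)
add_host_entry() {
    hosts_file="$1"
    ip="$2"
    name="$3"
    # Look for the name as a whole field on any non-comment line.
    if awk -v n="$name" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$hosts_file"; then
        echo "skip: $name already present"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
        echo "added: $ip $name"
    fi
}

# Demo on a scratch copy; the addresses and names are placeholders.
demo=$(mktemp)
printf '127.0.0.1\tlocalhost\n' > "$demo"
add_host_entry "$demo" 192.168.1.10 master
add_host_entry "$demo" 192.168.1.11 worker1
add_host_entry "$demo" 192.168.1.10 master
cat "$demo"
rm -f "$demo"
```

To change the real file, the function would need to run as root with /etc/hosts as the first argument.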
  16. Install database (MySQL)
          sudo apt-get install mysql-server-5.5
      Login: root
      Password: password
  17. Install JDBC connector and configure for secure installation
          sudo apt-get install libmysql-java
          sudo /usr/bin/mysql_secure_installation
      Answer the prompts as follows:
          Enter current password for root (enter for none): password
          Change the root password? [Y/n] n
          Remove anonymous users? [Y/n] y
          Disallow root login remotely? [Y/n] n
          Remove test database and access to it? [Y/n] y
          Reload privilege tables now? [Y/n] y
      Restart the MySQL server:
          sudo service mysql restart
  18. Create Database
      Log in and enter the password:
          mysql -u root -p
      Then create the databases:
          create database sttpdatabase;
          create database hive;
      Separate databases are needed for the following:
      ● Activity Monitor
      ● Service Monitor
      ● Report Manager
      ● Host Monitor
      ● Cloudera Navigator
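Creating one database per service can be scripted by generating the SQL and piping it to the mysql client. This is a sketch, not the slides' own method: `make_db_sql` is a hypothetical helper, the database names `amon`, `smon`, `rman`, `hmon` and `nav` are assumed placeholders for the five services listed above, and the utf8 character set is likewise an assumption.

```shell
#!/bin/sh
# Sketch: print one CREATE DATABASE statement per name given on the command line.
# make_db_sql NAME...  (hypothetical helper; names must be plain identifiers)
make_db_sql() {
    for db in "$@"; do
        printf 'CREATE DATABASE IF NOT EXISTS %s DEFAULT CHARACTER SET utf8;\n' "$db"
    done
}

# Placeholder names for Activity Monitor, Service Monitor, Report Manager,
# Host Monitor and Cloudera Navigator. This only prints the SQL.
make_db_sql amon smon rman hmon nav
```

To apply the statements, the output can be piped to the client, for example `make_db_sql amon smon | mysql -u root -p`.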
  19. ● Supported OS
          ● Ubuntu 10.04 (Lucid Lynx), 64-bit
          ● Ubuntu 12.04 (Precise Pangolin), 64-bit
      ● Supported Browsers
          ● Firefox 11 or later
          ● Google Chrome
          ● Internet Explorer 9
          ● Safari 5 or later
  20. ● Supported Databases
          ● MySQL: 5.0, 5.1, 5.5
          ● Oracle: 10g Release 2, 11g Release 2
          ● PostgreSQL: 8.1, 8.3, 8.4, 9.1
      ● Supported JDK
          ● JDK 1.7 or later
  21. ● Resources
          ● Disk (Cloudera Manager Server): 5 GB on the partition hosting /var, and 500 MB on the partition hosting /usr.
          ● RAM: 4 GB is appropriate for most cases, and is required when using Oracle databases.
          ● Python: Cloudera Manager uses Python.
      ● Installation Path
          ● Path A: Automated Path
          ● Path B: Your Own Method
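The disk and memory figures above can be checked before starting the installer. The script below is a rough sketch, assuming a Linux host with /proc/meminfo; the helper names `free_kb`, `total_ram_kb` and `check_min` are hypothetical. A machine with 4 GB installed usually reports slightly less than that as MemTotal, so a LOW result for RAM deserves a second look rather than being treated as a hard failure.

```shell
#!/bin/sh
# Sketch: compare free disk space and total RAM against the figures on the slide.

# Free space in KiB on the partition hosting a path (POSIX df output format).
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Total RAM in KiB as reported by the Linux kernel.
total_ram_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# check_min LABEL ACTUAL_KIB REQUIRED_KIB: print OK or LOW for one requirement.
check_min() {
    if [ "$2" -ge "$3" ]; then
        echo "OK   $1: $2 KiB (need $3)"
    else
        echo "LOW  $1: $2 KiB (need $3)"
    fi
}

check_min "/var free" "$(free_kb /var)" $((5 * 1024 * 1024))
check_min "/usr free" "$(free_kb /usr)" $((500 * 1024))
check_min "RAM total" "$(total_ram_kb)" $((4 * 1024 * 1024))
```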
  22. Path A Installation
      ● Step 1: Download and run the Cloudera Manager installer
      ● Download cloudera-manager-installer.bin
      ● Install Cloudera Manager on a single host.
      ● Give the installer executable permission:
          chmod u+x cloudera-manager-installer.bin
      ● Run the installer:
          sudo ./cloudera-manager-installer.bin
      ● After the installer finishes, open a browser at http://localhost:7180
      ● Login: admin
      ● Password: admin