Structor - Automated Building of Virtual Hadoop Clusters


Discusses Vagrant scripts to set up and deploy a working multi-node Hadoop cluster, with or without security. All source code is available at https://github.com/hortonworks/structor.

Published in: Technology

  1. Structor – Automated Building of Virtual Hadoop Clusters
     Owen O’Malley, owen@hortonworks.com, @owen_omalley
     July 2014. © Hortonworks Inc. 2014
  2. What’s the Problem?
     • Creating a virtual Hadoop cluster is hard
       – Takes time to set up and configure the VMs
       – A “learning experience” for new engineers
       – Each engineer has a different setup
       – Experimenting is hazardous!
     • Setting up security is even harder
       – Most developers don’t test with security
     • Need to test Ambari and manual installs
     • Need to test various operating systems
  3. Solution
     • Create scripts to create a working Hadoop cluster
       – Secure or non-secure
       – Multiple nodes
     • Vagrant
       – Used for creating and managing the VMs
       – VM starts as a base box with no Hadoop
     • Puppet
       – Used for provisioning the Hadoop packages
  4. Simplest Case – Development Box
     • We’ve put everything for a development box into the Vagrant base box (CentOS6)
       – Build tools: ant, git, java, maven, protobuf, thrift
     • Downloaded once and cached
     • Setup
         % vagrant init omalley/centos6_x64
         % vagrant up
         % vagrant ssh
     • Less than a minute
  5. Using the Box
     • Ssh in with “vagrant ssh”
       – Account: vagrant, Password: vagrant
       – Become root with “sudo -i”
     • Clone the directory to make copies
     • Other useful vagrant commands:
         % vagrant status – list virtual machines
         % vagrant suspend – suspend virtual machines
         % vagrant resume – resume virtual machines
         % vagrant destroy – destroy virtual machines
  6. Setting up Non-Secure Cluster
     • Commands to start cluster
         % git clone git@github.com:hortonworks/structor.git
         % cd structor
         % vagrant up
     • Default profile has 3 machines
       – gw – client gateway machine
       – nn – master (NameNode, ResourceMgr)
       – slave1 – slaves (DataNode, NodeManager)
     • HDFS, Yarn, Hive, Pig, and Zookeeper
  7. Setting up your Mac
     • Add hostnames to /etc/hosts
         240.0.0.10 gw.example.com
         240.0.0.11 nn.example.com
         240.0.0.12 slave1.example.com
         240.0.0.13 slave2.example.com
         240.0.0.14 slave3.example.com
     • HDFS – http://nn.example.com:50070/
     • Yarn – http://nn.example.com:8088/
     • For security
       – Modify /etc/krb5.conf as in README.md.
       – Use Safari or Firefox (needs config change)
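The /etc/hosts entries on slide 7 pair each node's IP with its hostname plus the cluster domain. As a rough sketch of that pattern (not part of Structor itself), the lines could be generated from a node list. The `NODES` constant and the `hosts_entries` helper are hypothetical names introduced here; the addresses are the ones shown on the slide.

```ruby
# Hypothetical node list mirroring the addresses on slide 7.
NODES = [
  { hostname: 'gw',     ip: '240.0.0.10' },
  { hostname: 'nn',     ip: '240.0.0.11' },
  { hostname: 'slave1', ip: '240.0.0.12' },
  { hostname: 'slave2', ip: '240.0.0.13' },
  { hostname: 'slave3', ip: '240.0.0.14' }
].freeze

# Build one "/etc/hosts"-style line per node: "<ip> <hostname>.<domain>".
def hosts_entries(nodes, domain)
  nodes.map { |node| "#{node[:ip]} #{node[:hostname]}.#{domain}" }
end

# Print the lines so they can be reviewed before being appended by hand.
puts hosts_entries(NODES, 'example.com')
```

Printing rather than writing to /etc/hosts keeps the sketch free of side effects; appending the output to the real file still needs root privileges.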
  8. Setting up Secure Cluster
     • Commands to start cluster
         % ln -s profiles/3node-secure.profile current.profile
         % mkdir generated (bug workaround)
         % vagrant up
     • Brings up 3 machines with security
       – Includes a KDC and principals
     • Yarn Web UI – https://nn.example.com:8090
     • “kinit vagrant” on your Mac for Web UI
     • Ssh to gw and kinit for the CLI
  9. Profiles
     • JSON files that control cluster
     • 3 node secure cluster:
         {
           "domain": "example.com",
           "realm": "EXAMPLE.COM",
           "security": true,
           "vm_mem": 2048,
           "server_mem": 300,
           "client_mem": 200,
           "clients": [ "hdfs", "yarn", "pig", "hive", "zk" ],
           "nodes": [
             { "hostname": "gw", "ip": "240.0.0.10", "roles": [ "client" ] },
             { "hostname": "nn", "ip": "240.0.0.11",
               "roles": [ "kdc", "nn", "yarn", "hive-meta", "hive-db", "zk" ] },
             { "hostname": "slave1", "ip": "240.0.0.12", "roles": [ "slave" ] }
           ]
         }
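To make the profile format on slide 9 concrete, here is a minimal sketch that parses the 3 node secure profile with Ruby's standard `json` library and looks up which hosts carry a given role. It illustrates the data layout only and is not how Structor's Vagrantfile reads profiles; `PROFILE_JSON` and `hosts_with_role` are hypothetical names, and the JSON is an inline copy of the slide's example.

```ruby
require 'json'

# Inline copy of the 3 node secure profile from slide 9 (illustration only).
PROFILE_JSON = <<~JSON
  {
    "domain": "example.com",
    "realm": "EXAMPLE.COM",
    "security": true,
    "vm_mem": 2048,
    "server_mem": 300,
    "client_mem": 200,
    "clients": ["hdfs", "yarn", "pig", "hive", "zk"],
    "nodes": [
      {"hostname": "gw", "ip": "240.0.0.10", "roles": ["client"]},
      {"hostname": "nn", "ip": "240.0.0.11",
       "roles": ["kdc", "nn", "yarn", "hive-meta", "hive-db", "zk"]},
      {"hostname": "slave1", "ip": "240.0.0.12", "roles": ["slave"]}
    ]
  }
JSON

# Return the hostnames of all nodes that carry the given role.
def hosts_with_role(profile, role)
  profile.fetch('nodes')
         .select { |node| node.fetch('roles').include?(role) }
         .map { |node| node.fetch('hostname') }
end

profile = JSON.parse(PROFILE_JSON)
puts "KDC runs on: #{hosts_with_role(profile, 'kdc').join(', ')}"
puts "Slaves: #{hosts_with_role(profile, 'slave').join(', ')}"
```

Using `fetch` instead of `[]` makes a profile with a missing `nodes` or `roles` key fail loudly rather than silently returning an empty result.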
  10. Additional Profiles
     • Various profiles
       – 1node-nonsecure
       – 3node-secure
       – 5node-nonsecure
       – ambari-nonsecure
       – knox-nonsecure
     • Great way to set up an Ambari cluster
     • Project owners should add their project
       – Help other developers use your project
  11. Choosing HDP versions
     • The master branch is Hadoop 2.4
       – There is also a Hadoop 1.1 (hdp-1.3) branch
     • All packages are installed via Puppet
       – Uses built-in OS package tools
     • Repo file is in files/repos/hdp.repo
       – Can override source of packages
       – Easy to change to download custom builds
  12. Configuration Files (e.g. core-site.xml)
     • Each configuration file is templated
     • HDFS configuration is in
       – modules/hdfs_client/templates/*.erb
       – Changes will apply to all nodes
     • We use Ruby to find NameNode:
         <% @namenode = eval(@nodes).select {|node| node[:roles].include? 'nn'}[0][:hostname] + "." + @domain; %>
         <property>
           <name>fs.defaultFS</name>
           <value>hdfs://<%= @namenode %>:8020</value>
         </property>
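The NameNode lookup on slide 12 can be tried outside Puppet with Ruby's standard `erb` library. The sketch below is a simplified stand-in, not the actual Structor template: it assumes the node list is already a Ruby array (the slide's template calls `eval(@nodes)`, which suggests the list arrives there as a string), and `NODES`, `TEMPLATE`, `namenode_fqdn`, and `render_default_fs` are hypothetical names.

```ruby
require 'erb'

# Illustrative stand-in for the node list used by the template on slide 12.
NODES = [
  { hostname: 'gw',     roles: ['client'] },
  { hostname: 'nn',     roles: ['kdc', 'nn', 'yarn'] },
  { hostname: 'slave1', roles: ['slave'] }
].freeze

# Same property as on the slide, reading a local variable instead of @namenode.
TEMPLATE = <<~XML
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://<%= namenode %>:8020</value>
  </property>
XML

# Find the first node with the 'nn' role and return its fully qualified name.
def namenode_fqdn(nodes, domain)
  node = nodes.find { |n| n[:roles].include?('nn') }
  raise ArgumentError, "no node has the 'nn' role" if node.nil?

  "#{node[:hostname]}.#{domain}"
end

# Render the fs.defaultFS property for the given cluster.
def render_default_fs(nodes, domain)
  namenode = namenode_fqdn(nodes, domain) # read by the template via binding
  ERB.new(TEMPLATE).result(binding)
end

puts render_default_fs(NODES, 'example.com')
```

Unlike the one-liner on the slide, this version raises a descriptive error when no node has the `nn` role, instead of failing with a `NoMethodError` on `nil`.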
  13. Puppet
     • Actual work is done via Puppet
       – Hides details of each OS
     • Modularized
       – Top level is manifests/default.pp
       – Each module is in modules/*
     • Top level looks like:
         include selinux
         include ntp
         if $security == "true" and hasrole($roles, 'kdc') {
           include kerberos_kdc
         }
  14. Future Directions
     • Add other Hadoop ecosystem tools
       – Tez
       – HBase
     • Add other operating systems
       – Ubuntu, SUSE, CentOS 5
     • Support other Vagrant providers
       – Amazon EC2
       – Docker
     • Support for other backing RDBs
  15. Thank You! Questions & Answers