Bare metal Hadoop provisioning


Creating a Hadoop cluster with cobbler and ansible. Easy and fully automated.

  1. GoDataDriven, proudly part of the Xebia Group
     Bare metal Hadoop provisioning with ansible and cobbler
     Kris Geusebroek, Big Data Hacker
     @krisgeus, krisgeusebroek@godatadriven.com
  2. “Give man Hadoop cluster he gain insight for a day. Teach man build Hadoop cluster he soon leave for better job. #bigdata” -- Big Data Borat
  3. “We’re hiring” -- Kris Geusebroek
  4. Don’t want to...
     Manually install everything needed for a Hadoop cluster...
  5. Separate layers...
     - Hardware
     - OS
     - Basic install and configuration (firewalls, IPSec, IPv6, NTPd, raising ulimits, disk formatting and mounting)
     - Cluster install (Cloudera Manager or Hortonworks Data Platform)
     - Extra stuff (monitoring with Ganglia, R and R packages, ...)
  6. Want...
     - Horizontal scaling: the effort for an extra machine is minimal
     - Commodity, industry-standard hardware, so cope with errors, malfunctions and re-installation
     - Multiple clusters
     - Experiment first to find the appropriate configuration for a specific goal: think memory, hard disks, number of nodes
  7. Want...
     - Automate all the tasks for every layer
     - Parameterise a lot
     - Simple configuration of the separate layers
     - Definition of roles (masternode, datanode etc.)
  8. Possible with...
     Vendor-specific tools. The problem here is that they can do only a subset of all tasks.
  9. What we have done here...
     Nothing new, just another possibility.
     Nothing tool specific: the demo installs Cloudera Manager, but it also works with Hortonworks Data Platform.
     Most important is:
  10. Stack...
  11. “Essentially, this solution is CoSSaaS.” -- Big Data Borat
  12. “Essentially, this solution is CoSSaaS. (Couple of Shell Scripts as a Service)” -- Big Data Borat
  13. Cobbler...
      Cobbler used for
      - CMS
      - DHCP server
      - OS image hosting
      - OS kickstart
      cobblerd.org
  14. Ansible...
      Ansible used for
      - Tying it all together
      - Initial setup of network config
      - One-time push of SSH key
      - Full software install
      ansible.cc
  15. Cloudera Manager...
      Cloudera Manager used for
      - Cluster software install
      - Currently manual labour, can be automated using the API
      cloudera.com
  16. Show me the code...
      Add node information to the cobbler CMS.
      First make the install DVD known to cobbler:
          mount -t iso9660 -o loop /<directoryname>/CentOS-6.4-x86_64-bin-DVD1.iso /mnt/dvd
          cobbler import --path=/mnt/dvd --name=CentOS64
      Next make the node information known:
          sudo cobbler system add --name=node01 --profile=CentOS64-x86_64 --hostname=node01 --mac=<00:00:00:00:00:00> --ip-address= --static=True
      If needed, re-enable the netboot flag:
          sudo cobbler system edit --name=node01 --netboot-enabled=True
  17. Show me the code...
      Ansible needs to know what goes where:
          [cluster]
          node01
          node02
          node03
          [cobbler]
          cobbler
          [proxy]
          cobbler
          [ganglia-master]
          node01
          [ganglia-nodes:children]
          cluster
          [cloudera-manager]
          node01
  18. Show me the code...
      For the rest it’s just a DSL thingy with extras:
          - hosts:
            - cloudera-manager
            - cluster
            user: root
            sudo: yes
            vars_files:
            - vars/common.yml
            tasks:
            - include: cloudera-manager/tasks/common.yml
            handlers:
            - include: cloudera-manager/handlers/main.yml

          - name: Configure CM4 Repo
            copy: src=cloudera-manager/files/etc/yum.repos.d/cm4.repo dest=/etc/yum.repos.d/ owner=root group=root
          - name: Install CM4 common stuff
            yum: name=$item state=installed
  19. Demo...
  20. Shared problems...
      - No magic: vendor-specific hardware can screw things up (strange names for disk mounts, for example)
      - BIOS settings and different RAID settings are not handled (yet)
      - Large amount of initial network traffic with large clusters (the same software packages are downloaded N times from yum repositories) => repo mirroring to the rescue
      - The MAC address of every node must be known
  21. Takeaways...
      - Do automate from the start
      - It’s easy
      - Use (our) open source code to get a head start
      - Our team will do the additional consultancy
  22. GoDataDriven
      We’re hiring / Questions? / Thank you!
      Kris Geusebroek, Big Data Hacker
      @krisgeus, krisgeusebroek@godatadriven.com
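
As a rough illustration of the "Show me the code" slides (16 and 17) and the note on slide 20 that every node's MAC address must be known, the following Python sketch derives both the `cobbler system add` commands and the Ansible inventory from a single node table. It is not part of the talk's code: the node table, the MAC and IP values, and the helper names `cobbler_add_command` and `render_inventory` are hypothetical. Only the cobbler flags and the inventory group names are taken from the slides.

```python
import re

# Hypothetical node table: names follow the slides, MAC and IP values are placeholders.
NODES = [
    {"name": "node01", "mac": "00:11:22:33:44:01", "ip": "192.0.2.11"},
    {"name": "node02", "mac": "00:11:22:33:44:02", "ip": "192.0.2.12"},
    {"name": "node03", "mac": "00:11:22:33:44:03", "ip": "192.0.2.13"},
]

# Six colon-separated hex pairs, e.g. 00:11:22:33:44:01.
MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")


def cobbler_add_command(node, profile="CentOS64-x86_64"):
    """Build the `cobbler system add` argument list from slide 16 for one node.

    Raises ValueError when the MAC address is missing or malformed, because
    cobbler cannot match a booting machine without it.
    """
    mac = node.get("mac") or ""
    if not MAC_RE.match(mac):
        raise ValueError(f"node {node.get('name')}: MAC address missing or invalid")
    return [
        "sudo", "cobbler", "system", "add",
        f"--name={node['name']}",
        f"--profile={profile}",
        f"--hostname={node['name']}",
        f"--mac={mac}",
        f"--ip-address={node['ip']}",
        "--static=True",
    ]


def render_inventory(groups):
    """Render an INI-style Ansible inventory from a {group: [members]} mapping."""
    lines = []
    for group, members in groups.items():
        lines.append(f"[{group}]")
        lines.extend(members)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for node in NODES:
        print(" ".join(cobbler_add_command(node)))

    # Group layout copied from slide 17.
    groups = {
        "cluster": [node["name"] for node in NODES],
        "cobbler": ["cobbler"],
        "proxy": ["cobbler"],
        "ganglia-master": ["node01"],
        "ganglia-nodes:children": ["cluster"],
        "cloudera-manager": ["node01"],
    }
    print(render_inventory(groups), end="")
```

Keeping the node list in one place means an extra machine is one more entry in the table, which matches the "effort for an extra machine is minimal" goal on slide 6. The script only prints the commands; whether to pipe them to a shell or run them through `subprocess` is left open here.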