Bare metal Hadoop provisioning

Creating a Hadoop cluster with cobbler and ansible. Easy and fully automated.

Transcript

  • 1. GoDataDriven, proudly part of the Xebia Group. Bare metal Hadoop provisioning with ansible and cobbler. Kris Geusebroek, Big Data Hacker. @krisgeus, krisgeusebroek@godatadriven.com
  • 2. “Give man Hadoop cluster he gain insight for a day. Teach man build Hadoop cluster he soon leave for better job. #bigdata” -- Big Data Borat
  • 3. “We’re hiring” -- Kris Geusebroek
  • 4. Don’t want to... manually install everything needed for a Hadoop cluster.
  • 5. Separate layers...
      - Hardware
      - OS
      - Basic install and configuration (firewalls, IPSec, IPv6, NTPd, raise ulimits, disk formatting and mounting)
      - Cluster install (Cloudera Manager or Hortonworks Data Platform)
      - Extra stuff (monitoring with Ganglia, R and R packages, ...)
  • 6. Want...
      - Horizontal scaling: the effort for an extra machine is minimal
      - Commodity, industry-standard hardware, so we must cope with errors, malfunctions and re-installation
      - Multiple clusters
      - Experiment first to find the appropriate configuration for a specific goal (think memory, hard disks, number of nodes)
  • 7. Want...
      - Automate all the tasks for every layer
      - Parameterise a lot
      - Simple configuration of the separate layers
      - Definition of roles (masternode, datanode etc.)
  • 8. Possible with... vendor-specific tools. The problem here is that they can do only a subset of all tasks.
  • 9. What we have done here...
      - Nothing new, just another possibility
      - Nothing tool specific: the demo installs Cloudera Manager, but it also works with Hortonworks Data Platform
    Most important is:
  • 10. Stack...
  • 11. “Essentially, this solution is CoSSaaS.” -- Big Data Borat
  • 12. “Essentially, this solution is CoSSaaS. (Couple of Shell Scripts as a Service)” -- Big Data Borat
  • 13. Cobbler... (cobblerd.org)
    Cobbler used for
      - CMS
      - DHCP server
      - OS image hosting
      - OS kickstart
  • 14. Ansible... (ansible.cc)
    Ansible used for
      - Tying it all together
      - Initial setup of network config
      - One-time push of SSH key
      - Full software install
  • 15. Cloudera Manager... (cloudera.com)
    Cloudera Manager used for
      - Installing the cluster software
      - Currently manual labour, but it can be automated using the API
  • 16. Show me the code...
    Add node information to the cobbler CMS.
    First make the install DVD known to cobbler:
        mount -t iso9660 -o loop /<directoryname>/CentOS-6.4-x86_64-bin-DVD1.iso /mnt/dvd
        cobbler import --path=/mnt/dvd --name=CentOS64
    Next make the node information known:
        sudo cobbler system add --name=node01 --profile=CentOS64-x86_64 --hostname=node01 \
            --mac=<00:00:00:00:00:00> --ip-address=10.20.0.101 --static=True
    If needed, re-enable the netboot flag:
        sudo cobbler system edit --name=node01 --netboot-enabled=True
  • 17. Show me the code...
    Ansible needs to know what goes where:
        [cluster]
        node01
        node02
        node03
        [cobbler]
        cobbler
        [proxy]
        cobbler
        [ganglia-master]
        node01
        [ganglia-nodes:children]
        cluster
        [cloudera-manager]
        node01
  • 18. Show me the code...
    For the rest it’s just a DSL thingy with extras:
        - hosts:
            - cloudera-manager
            - cluster
          user: root
          sudo: yes
          vars_files:
            - vars/common.yml
          tasks:
            - include: cloudera-manager/tasks/common.yml
          handlers:
            - include: cloudera-manager/handlers/main.yml

        - name: Configure CM4 Repo
          copy: src=cloudera-manager/files/etc/yum.repos.d/cm4.repo dest=/etc/yum.repos.d/ owner=root group=root
        - name: Install CM4 common stuff
          yum: name=$item state=installed
  • 19. Demo...
  • 20. Shared problems...
      - No magic: vendor-specific hardware can screw things up (strange names for disk mounts, for example)
      - BIOS settings and different RAID settings are not handled (yet)
      - Large amount of initial network traffic with large clusters (N times downloading the same software packages from yum repositories) => repo mirroring to the rescue
      - The MAC address of every node must be known
  • 21. Takeaways...
      - Do automate from the start
      - It’s easy
      - Use (our) open source code to get a head start: https://github.com/godatadriven/ansible_cluster
      - Our team will do the additional consultancy
  • 22. GoDataDriven. We’re hiring / Questions? / Thank you! Kris Geusebroek, Big Data Hacker. @krisgeus, krisgeusebroek@godatadriven.com
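
Slide 16 registers a single node by hand, and slide 20 notes that the MAC address of every node must be known. A minimal sketch of how that step might be repeated for many nodes is shown below. It assumes a hypothetical text file with one `name mac ip` line per node. The function names and the file format are illustrative and are not taken from the talk or its repository. The commands are only printed, so they can be reviewed before anything is run.

```shell
#!/usr/bin/env bash
# Sketch: emit one `cobbler system add` command per node in a node list.
# ASSUMPTION: the list has whitespace-separated "name mac ip" lines;
# this format is hypothetical and not part of the original slides.

PROFILE="CentOS64-x86_64"   # profile name used on slide 16

# Print (do not run) the cobbler command for a single node.
cobbler_add_cmd() {
    local name="$1" mac="$2" ip="$3"
    printf 'cobbler system add --name=%s --profile=%s --hostname=%s --mac=%s --ip-address=%s --static=True\n' \
        "$name" "$PROFILE" "$name" "$mac" "$ip"
}

# Read "name mac ip" lines from stdin, skipping blank lines and # comments.
gen_cobbler_cmds() {
    local name mac ip
    while read -r name mac ip || [ -n "$name" ]; do
        case "$name" in
            ''|'#'*) continue ;;
        esac
        cobbler_add_cmd "$name" "$mac" "$ip"
    done
}
```

With a hypothetical `nodes.txt`, `gen_cobbler_cmds < nodes.txt` prints the commands for inspection, and piping that output to `sudo sh` would execute them.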

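
The inventory on slide 17 already lists which hosts belong to which group, so the same file could drive other per-node steps, such as re-enabling the netboot flag from slide 16. The sketch below lists the hosts of one group in an INI-style inventory. The function name `hosts_in_group` is hypothetical, and the parsing is deliberately simple: it reads plain host lines only and does not resolve `:children` groups.

```shell
#!/usr/bin/env bash
# Sketch: list the hosts of one group in an INI-style Ansible inventory.
# ASSUMPTION: one host per line under a "[group]" header; ":children"
# groups and host variables are not interpreted.

hosts_in_group() {
    local group="$1" inventory="$2"
    awk -v g="[$group]" '
        /^\[/      { in_group = ($0 == g); next }  # section header: enter or leave the group
        /^[ \t]*$/ { next }                         # skip blank lines
        /^[#;]/    { next }                         # skip comment lines
        in_group   { print $1 }                     # first field is the host name
    ' "$inventory"
}
```

For example, assuming the inventory is saved as `hosts`, `for n in $(hosts_in_group cluster hosts); do echo sudo cobbler system edit --name="$n" --netboot-enabled=True; done` would print one re-enable command per cluster node.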