Bare metal Hadoop provisioning

Creating a Hadoop cluster with cobbler and ansible. Easy and fully automated.

Usage Rights

© All Rights Reserved


    Bare metal Hadoop provisioning: Presentation Transcript

    • GoDataDriven, proudly part of the Xebia Group
      Bare metal Hadoop provisioning with ansible and cobbler
      Kris Geusebroek, Big Data Hacker
      @krisgeus, krisgeusebroek@godatadriven.com
    • “Give man Hadoop cluster he gain insight for a day. Teach man build Hadoop cluster he soon leave for better job. #bigdata” -- Big Data Borat
    • “We’re hiring” -- Kris Geusebroek
    • Don’t want to... manually install everything needed for a Hadoop cluster...
    • Separate layers...
      - Hardware
      - OS
      - Basic install and configuration (firewalls, IPsec, IPv6, NTPd, raising ulimits, disk formatting and mounting)
      - Cluster install (Cloudera Manager or Hortonworks Data Platform)
      - Extra stuff (monitoring with Ganglia, R and R packages, ...)
    • Want...
      - Horizontal scaling: effort for an extra machine is minimal
      - Commodity, industry-standard hardware, so cope with errors, malfunctioning, re-installation
      - Multiple clusters
      - Experiment first with an appropriate configuration for a specific goal: think memory, hard disks, number of nodes
    • Want...
      - Automate all the tasks for every layer
      - Parameterise a lot
      - Simple configuration of the separate layers
      - Definition of roles (masternode, datanode etc.)
    • Possible with...
      Vendor-specific tools. The problem here is that they can do only a subset of all tasks.
    • What we have done here...
      - Nothing new, just another possibility
      - Nothing tool specific: the demo installs Cloudera Manager, but it also works with Hortonworks Data Platform
      Most important is:
    • Stack...
    • “Essentially, this solution is CoSSaaS.” -- Big Data Borat
    • “Essentially, this solution is CoSSaaS. (Couple of Shell Scripts as a Service)” -- Big Data Borat
    • Cobbler...
      Cobbler used for
      - CMS
      - DHCP server
      - OS image hosting
      - OS kickstart
      cobblerd.org
    • Ansible...
      Ansible used for
      - Tying it all together
      - Initial setup of network config
      - One time push of SSH key
      - Full software install
      ansible.cc
    • Cloudera Manager...
      Cloudera Manager used for
      - Cluster install software
      - Currently manual labour, can be automated using the API
      cloudera.com
    • Show me the code...
      Add node information to the cobbler CMS.
      First make the install DVD known to cobbler:
          mount -t iso9660 -o loop /<directoryname>/CentOS-6.4-x86_64-bin-DVD1.iso /mnt/dvd
          cobbler import --path=/mnt/dvd --name=CentOS64
      Next make the node information known:
          sudo cobbler system add --name=node01 --profile=CentOS64-x86_64 --hostname=node01 --mac=<00:00:00:00:00:00> --ip-address= --static=True
      If needed, re-enable the netboot flag:
          sudo cobbler system edit --name=node01 --netboot-enabled=True
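
Registering nodes one by one gets tedious for a larger cluster. The following is a minimal sketch, not part of the talk, that renders one `cobbler system add` command per node from a small table. It only uses the flags shown on the slide; the node names, MAC addresses and IP addresses are made-up placeholders, and the commands are printed rather than executed.

```python
import shlex

# Hypothetical node table: MAC and IP values are placeholders, not real hosts.
NODES = [
    {"name": "node01", "mac": "00:00:00:00:00:01", "ip": "192.0.2.11"},
    {"name": "node02", "mac": "00:00:00:00:00:02", "ip": "192.0.2.12"},
    {"name": "node03", "mac": "00:00:00:00:00:03", "ip": "192.0.2.13"},
]


def cobbler_add_command(node, profile="CentOS64-x86_64"):
    """Build the argument list for one `cobbler system add` call."""
    return [
        "sudo", "cobbler", "system", "add",
        f"--name={node['name']}",
        f"--profile={profile}",
        f"--hostname={node['name']}",
        f"--mac={node['mac']}",
        f"--ip-address={node['ip']}",
        "--static=True",
    ]


def render_commands(nodes):
    """Return one shell-quoted command line per node."""
    return [shlex.join(cobbler_add_command(node)) for node in nodes]


if __name__ == "__main__":
    # Print the commands for review; nothing is run against cobbler here.
    for line in render_commands(NODES):
        print(line)
```

Printing instead of executing keeps the sketch safe to run anywhere; the output can be reviewed and then piped to a shell on the cobbler host.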
    • Show me the code...
      Ansible needs to know what goes where
          [cluster]
          node01
          node02
          node03
          [cobbler]
          cobbler
          [proxy]
          cobbler
          [ganglia-master]
          node01
          [ganglia-nodes:children]
          cluster
          [cloudera-manager]
          node01
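
To make the grouping concrete, here is a small sketch of how the inventory on this slide resolves to hosts, in particular how `[ganglia-nodes:children]` pulls in every host of `cluster`. This is a simplified stand-alone parser written for illustration, not Ansible's own loader: it assumes plain host names and `:children` sections only, and ignores host variables, ranges and `:vars` sections.

```python
INVENTORY = """\
[cluster]
node01
node02
node03
[cobbler]
cobbler
[proxy]
cobbler
[ganglia-master]
node01
[ganglia-nodes:children]
cluster
[cloudera-manager]
node01
"""


def parse_inventory(text):
    """Return (hosts, children): direct hosts and child groups per group."""
    hosts, children = {}, {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            header = line[1:-1]
            if header.endswith(":children"):
                group = header[: -len(":children")]
                hosts.setdefault(group, [])
                current = children.setdefault(group, [])
            else:
                current = hosts.setdefault(header, [])
            continue
        if current is None:
            raise ValueError(f"entry outside of a group: {line!r}")
        current.append(line)
    return hosts, children


def expand(group, hosts, children, seen=None):
    """List all hosts of a group, following child groups recursively."""
    seen = set() if seen is None else seen
    if group in seen:  # guard against circular group definitions
        return []
    seen.add(group)
    result = list(hosts.get(group, []))
    for child in children.get(group, []):
        for host in expand(child, hosts, children, seen):
            if host not in result:
                result.append(host)
    return result


if __name__ == "__main__":
    hosts, children = parse_inventory(INVENTORY)
    for name in hosts:
        print(name, "->", expand(name, hosts, children))
```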
    • Show me the code...
      For the rest it’s just a DSL thingy with extras
          - hosts:
            - cloudera-manager
            - cluster
            user: root
            sudo: yes
            vars_files:
            - vars/common.yml
            tasks:
            - include: cloudera-manager/tasks/common.yml
            handlers:
            - include: cloudera-manager/handlers/main.yml

          - name: Configure CM4 Repo
            copy: src=cloudera-manager/files/etc/yum.repos.d/cm4.repo dest=/etc/yum.repos.d/ owner=root group=root
          - name: Install CM4 common stuff
            yum: name=$item state=installed
    • Demo...
    • Shared problems...
      - No magic: vendor-specific hardware can screw things up (strange names for disk mounts, for example)
      - BIOS settings and different RAID settings are not handled (yet)
      - Large amount of initial network traffic with large clusters (N times downloading the same software packages from yum repositories) => repo mirroring to the rescue
      - MAC address of all nodes must be known
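
Since every node's MAC address has to be known up front, a quick sanity check of the collected list can catch typos before anything is registered. The sketch below is an illustration only and not part of the talk; the function name `check_nodes` and the (name, MAC) input shape are assumptions. It flags missing or malformed addresses and addresses used twice.

```python
import re

# Six colon-separated hex pairs, compared in lowercase.
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def check_nodes(nodes):
    """Return a list of problem descriptions for (name, mac) pairs."""
    problems = []
    seen = {}  # normalized MAC -> first node name that used it
    for name, mac in nodes:
        norm = mac.strip().lower()
        if not MAC_RE.match(norm):
            problems.append(f"{name}: malformed or missing MAC address {mac!r}")
            continue
        if norm in seen:
            problems.append(f"{name}: MAC address {norm} already used by {seen[norm]}")
        else:
            seen[norm] = name
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only.
    sample = [
        ("node01", "00:1a:2b:3c:4d:5e"),
        ("node02", "00:1A:2B:3C:4D:5E"),  # same address, different case
        ("node03", ""),                   # not collected yet
    ]
    for problem in check_nodes(sample):
        print(problem)
```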
    • Take aways...
      - Do automate from the start
      - It’s easy
      - Use (our) open source code to get a head start
      Our team will do the additional consultancy
    • GoDataDriven
      We’re hiring / Questions? / Thank you!
      Kris Geusebroek, Big Data Hacker
      @krisgeus, krisgeusebroek@godatadriven.com