GoDataDriven
PROUDLY PART OF THE XEBIA GROUP
@krisgeus
krisgeusebroek@godatadriven.com
Bare metal Hadoop provisioning
With Ansible and Cobbler
Kris Geusebroek
Big Data Hacker
1
-- Big Data Borat
“Give man Hadoop cluster he gain
insight for a day. Teach man build
Hadoop cluster he soon leave for
better job. #bigdata”
2
-- Kris Geusebroek
“We’re hiring”
3
Don’t want to...
Manually install everything needed for a Hadoop
cluster...
4
Separate layers...
- Hardware
- OS
- Basic install and configuration (Firewalls, IPSec, IPV6,
NTPd, raise ulimits, disk formatting and mounting)
- Cluster install (Cloudera Manager or Hortonworks
Data Platform)
- Extra stuff (Monitoring Ganglia, R & R-packages, ......)
5
Want...
- Horizontal scaling: effort for an extra machine is minimal
- Commodity, industry-standard hardware
  - So cope with errors, malfunctioning, re-installation
- Multiple clusters
- Experiment first with appropriate configuration for a specific goal
  - Think memory, hard disks, number of nodes
6
Want...
- Automate all the tasks for every layer
- Parameterise a lot
- Simple configuration of the separate layers
- Definition of roles (masternode, datanode etc.)
7
Possible with...
Vendor-specific tools
The problem here is that they can only do a subset of all tasks
8
What we have done here...
Nothing new, just another possibility
Nothing tool specific
- the demo installs Cloudera Manager, but it also works with Hortonworks Data Platform.
Most important is:
9
Stack...
10
-- Big Data Borat
“Essentially, this solution is CoSSaaS.
(Couple of Shell Scripts as a Service)”
12
Cobbler...
Cobbler used for
- CMS
- DHCP server
- OS image hosting
- OS kickstart
cobblerd.org
13
Ansible...
Ansible used for
- Tying it all together
- Initial setup of network config
- One time push of SSH key
- Full software install
ansible.cc
14
Cloudera Manager...
Cloudera Manager used for
- Cluster install software.
- Currently manual labour, can be automated using
the API
cloudera.com
15
Show me the code...
Add node information to the cobbler CMS
First make the install dvd known to cobbler:
mount -t iso9660 -o loop /<directoryname>/CentOS-6.4-x86_64-bin-DVD1.iso /mnt/dvd
cobbler import --path=/mnt/dvd --name=CentOS64
Next make the node information known:
sudo cobbler system add --name=node01 --profile=CentOS64-x86_64 --hostname=node01 \
    --mac=<00:00:00:00:00:00> --ip-address=10.20.0.101 --static=True
If needed, re-enable the netboot flag:
sudo cobbler system edit --name=node01 --netboot-enabled=True
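The commands above register one node at a time. With many nodes, the same step could be driven from a list. The following is a minimal sketch, assuming a hypothetical nodes.csv file with one "name,mac,ip" entry per line and the CentOS64-x86_64 profile from the import above. The file name, the DRY_RUN variable and the function names are illustrative assumptions and are not taken from the talk. By default the commands are only printed so they can be reviewed first.

```shell
# Sketch: register every node listed in a CSV file (name,mac,ip) with cobbler.
# PROFILE and DRY_RUN are hypothetical settings chosen for this example.
PROFILE="${PROFILE:-CentOS64-x86_64}"
DRY_RUN="${DRY_RUN:-1}"

# Print (or run) the "cobbler system add" call shown above for one node.
add_node() {
    name="$1"; mac="$2"; ip="$3"
    set -- cobbler system add --name="$name" --profile="$PROFILE" \
        --hostname="$name" --mac="$mac" --ip-address="$ip" --static=True
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        # Keep the command from consuming the node list on stdin.
        sudo "$@" < /dev/null
    fi
}

# Read the node list line by line, skipping empty lines and "#" comments.
add_nodes_from_file() {
    while IFS=, read -r name mac ip; do
        case "$name" in ''|'#'*) continue ;; esac
        add_node "$name" "$mac" "$ip"
    done < "$1"
}
```

Calling add_nodes_from_file nodes.csv prints one command per node; setting DRY_RUN=0 would execute them instead.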
16
Show me the code...
Ansible needs to know what goes where
[cluster]
node01
node02
node03
[cobbler]
cobbler
[proxy]
cobbler
[ganglia-master]
node01
[ganglia-nodes:children]
cluster
[cloudera-manager]
node01
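Since adding a machine should take minimal effort, the inventory above could also be generated rather than edited by hand. This is a rough sketch under the assumption that cluster nodes are always named node01, node02 and so on. The function name make_inventory is hypothetical; the group names are copied from the inventory above.

```shell
# Sketch: print an Ansible inventory like the one above for N cluster nodes.
# make_inventory and the nodeNN naming scheme are assumptions for this example.
make_inventory() {
    count="$1"
    echo "[cluster]"
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'node%02d\n' "$i"
        i=$((i + 1))
    done
    # The remaining groups are copied unchanged from the inventory above.
    cat <<'EOF'
[cobbler]
cobbler
[proxy]
cobbler
[ganglia-master]
node01
[ganglia-nodes:children]
cluster
[cloudera-manager]
node01
EOF
}
```

For example, make_inventory 3 > hosts writes the inventory shown on this slide to a file named hosts (the file name is an assumption). An ad-hoc run such as ansible cluster -i hosts -m ping can then confirm that every node in the group is reachable.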
17
Show me the code...
For the rest it’s just a DSL thingy with extras
- hosts:
  - cloudera-manager
  - cluster
  user: root
  sudo: yes
  vars_files:
  - vars/common.yml
  tasks:
  - include: cloudera-manager/tasks/common.yml
  handlers:
  - include: cloudera-manager/handlers/main.yml

- name: Configure CM4 Repo
  copy: src=cloudera-manager/files/etc/yum.repos.d/cm4.repo dest=/etc/yum.repos.d/ owner=root group=root
- name: Install CM4 common stuff
  yum: name=$item state=installed
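A playbook like the one above is applied with the ansible-playbook command. The wrapper below is only a sketch: the function name run_playbook and the DRY_RUN variable are assumptions made for illustration, and the inventory and playbook file names are passed in by the caller because the slides do not name them. By default it prints the command rather than running it.

```shell
# Sketch: build the ansible-playbook call for one inventory group.
# run_playbook and DRY_RUN are hypothetical names for this example.
run_playbook() {
    inventory="$1"; playbook="$2"; group="$3"
    # -i selects the inventory file, --limit restricts the run to one group.
    set -- ansible-playbook -i "$inventory" "$playbook" --limit "$group"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Restricting a run with --limit makes it possible to try a change on one group, such as cloudera-manager, before applying it to the whole cluster.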
18
Demo...
19
Shared problems...
- No magic: vendor-specific hardware can screw things up (strange names for disk mounts, for example)
- BIOS settings and differing RAID settings are not handled (yet).
- Large amount of initial network traffic with large clusters (the same software packages are downloaded N times from the yum repositories) => Repo mirroring to the rescue
- The MAC addresses of all nodes must be known
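For the repo mirroring point, each node needs a yum repository definition that points at the local mirror instead of the upstream server. The sketch below prints such a definition. The function name make_repo_file and any mirror URL passed to it are hypothetical, and gpgcheck is switched off only to keep the example short; a real setup would normally keep signature checking enabled.

```shell
# Sketch: print a yum .repo stanza that points nodes at a local mirror.
# make_repo_file is a hypothetical helper; the caller supplies the mirror URL.
make_repo_file() {
    repo_id="$1"; mirror_url="$2"
    cat <<EOF
[$repo_id]
name=Local mirror of $repo_id
baseurl=$mirror_url
enabled=1
gpgcheck=0
EOF
}
```

The output could be saved as a .repo file and pushed to /etc/yum.repos.d/ with a copy task like the one used for cm4.repo, so that packages are fetched from upstream once and served locally afterwards.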
20
Take aways...
- Do automate from the start
- It’s easy
- Use (our) open source code to get a head start
https://github.com/godatadriven/ansible_cluster
- Our team will do the additional consultancy
21
GoDataDriven
We’re hiring / Questions? / Thank you!
@krisgeus
krisgeusebroek@godatadriven.com
Kris Geusebroek
Big Data Hacker
22
