SlideShare a Scribd company logo
1 of 77
Download to read offline
How Bigtop Leveraged Docker
for Build Automation and
One-Click Hadoop Provisioning
Evans Ye
Apache Big Data 2015

Budapest
Who am I
• Apache Bigtop PMC member
• Software Engineer at Trend Micro
• Develop Big Data platform
• Develop cyber security solutions using Big Data
Outline
• What is Apache Bigtop?
• Achieving Build Automation
• One-Click Hadoop Provisioning
• Bigtop at Trend Micro
What is Apache Bigtop ?
Bigtop is a project for
Packaging Testing Deployment Virtualization
for you to easily build your own Big Data Stack
• Build RPM, DEB packages for Hadoop ecosystem
• $ yum -y install hadoop
• $ apt-get upgrade hbase
• Why not just untar ?
• Version control, upgrade/downgrade, dependency
management
• Unified view of configuration, log, binary directories
• Daemons for better process management
Packaging
• You can find binary repos here:
• http://www.apache.org/dist/bigtop/bigtop-1.0.0/repos/
Supported Linux distro.
Supported components
• A fundamental testing problem may happen in your 

Big Data Stack, too
Testing
Hadoop 2.4.1
Hadoop 2.6.0
Hive 1.1.0
Hadoop 2.7.1
Hive 1.2.0
Hive 1.2.1
Tez 0.6.0
Tez 0.6.2
Tez 0.7.0
• A fundamental testing problem may happen in your 

Big Data Stack, too
Testing
Hadoop 2.4.1
Hadoop 2.6.0
Hive 1.1.0
Hadoop 2.7.1
Hive 1.2.0
Hive 1.2.1
Tez 0.6.0
Tez 0.6.2
Tez 0.7.0
• A fundamental testing problem may happen in your 

Big Data Stack, too
Testing
Hadoop 2.4.1
Hadoop 2.6.0
Hive 1.1.0
Hadoop 2.7.1
Hive 1.2.0
Hive 1.2.1
Tez 0.6.0
Tez 0.6.2
Tez 0.7.0
Does this combination still work?
There’s a need for
integration tests !
• Bigtop has a testing framework that provides APIs
• Built-in tests that Bigtop provides:
• smoke-tests
• Basic Hadoop ecosystem interoperability
• integration tests
• Advanced tests runs from jar files
Bigtop Tests
Okay, 

then how to run those tests?
• Use Bigtop Puppet to deploy a fully functional,
distributed Hadoop cluster
• Integration tests running on a fully distributed cluster
• Bigtop Puppet
• Masterless Puppet
• Numbers of components are supported
• Kerberos enabled cluster deployment
Deployment
Where to deploy ?
• Automate the infrastructure creation & setup
• Bigtop Provisioner
• Virtualbox VMs auto-provisioning
• Docker containers auto-provisioning
Virtualization
• Hadoop App developers
• Provision Hadoop cluster to test your code on
• Cluster administrators
• Use Bigtop Puppet to deploy and manage your cluster
• Run Bigtop tests to ensure your cluster is working
• Vendors
• Build your own Hadoop distribution, customized 

from Bigtop bits
From user point of view
Achieving Build Automation
What is Apache Bigtop?
One-Click Hadoop Provisioning
Bigtop at Trend Micro
• As you can see, 

we support a number of Linux distributions (m)
What’s the challenge ?
• Numbers of components (n)
What’s the challenge ?
We need to build them all 

—> m * n
Preparing build environment
…

Seriously ?
Preparing build environment
• Puppet recipes automatically install required 

libraries, build tools
• To transform a machine into a build environment, simply do
• $ gradle toolchain
• Prerequisites :
• Puppet 3.x
• Java 7
Bigtop Toolchain
CI infra
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
CI infra
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
Bigtop Toolchain
Bigtop Toolchain
Bigtop Toolchain
Bigtop Toolchain
Bigtop Toolchain
!?
https://www.flickr.com/photos/kit4na/6385016345
We got Bigtop Toolchain
for you.
Hi Bigtop, How do I test
my packaging code?
A contributor A committer
(Don’t want to answer…)
I need to setup all the
Linux environments on
my laptop?
A contributor A committer
• Three key techniques in Docker
• Use Linux Kernel’s Namespaces to create 

isolated resources
• Use Linux Kernel’s cgroups to constrain 

resource usage
• Union file system that supports git-like 

image creation
Docker - How it works
• Lightweight
• Fast creation
• Repeatable
• Portability
• runs on *any* Linux environment
The nice things about it
• Run on any kind of Linux distribution in one 

single machine
Best fit in our use case
OS specific Bins/Libs
Ship build environment 

in Docker images
Bigtop/slaves images
Dockerhub official images
Install Puppet
bigtop/slaves
Install Bigtop Toolchain
• $ git clone https://github.com/apache/bigtop.git
• $ docker run 

--rm 

--volume `pwd`/bigtop:/bigtop 

--workdir /bigtop 

bigtop/slaves:centos-7 

bash -l -c ‘./gradlew rpm’
One click to build packages
Now, easy to build & test 

on your laptop
Linux
boot2docker
CentOS
slave
Ubuntu
slave
Mac OS X

Windows
CentOS
slave
Ubuntu
slave
Docker engine
Docker engine
Flexible CI infra
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
• Fault tolerance
• Immutable
Flexible CI infra
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
• Fault tolerance
• Immutable
One-Click Hadoop Provisioning
What is Apache Bigtop?
Achieving Build Automation
Bigtop at Trend Micro
• A tool to demonstrate the full life cycle of Bigtop
Bigtop Provisioner
Packaging TestingDeploymentVirtualization
Create resources Run Bigtop Puppet Run Bigtop Tests
Bigtop Provisioner
• Fast iterative development
• Test your code in the cluster, on your laptop, w/o human intervention
• Flexibility
• Choose any combination of components as you want
• Responsive CI
• Integration tests that get you the result in mins
• A Big Data Stack playground
• Spark + Tachyon, Spark + Ignite, etc
The goal
Vagrant

+

Automation Code

+

Bigtop Puppet
One-click Hadoop Provisioning
[Bigtop
Provisioner
w/ provider
=
• We use Vagrant as an abstraction layer to support
different kind of resource providers
Vagrant
Providers
One click Hadoop provisioning
./docker-hadoop.sh -c 3
bigtop/deploy image 

on Docker hub
./docker-hadoop.sh -c 3
One click Hadoop provisioning
bigtop/deploy image 

on Docker hub
./docker-hadoop.sh -c 3
puppet apply
puppet apply
puppet apply
One click Hadoop provisioning
Bigtop/deploy images
Dockerhub official images
install Puppet
bigtop/deploy
install Vagrant ssh key
Bigtop image hierarchy
Dockerhub official images
bigtop/puppet
bigtop/deploybigtop/slaves
bigtop/puppet
=
diff
All on the Docker hub
https://hub.docker.com/u/bigtop/
A demo is worth 1k words
Oops! demo fail…

https://asciinema.org/a/55aw3zl2g3dzsfe69198homrl
• Supported providers in Bigtop 1.0.0 release
• Virtaulbox VM
• Docker container
• OpenStack support is in master branch now
Bigtop Provisioner
• For Hadoop app developers, cluster admins, users
• Run a Hadoop cluster to test your code on
• Try & test configurations before applying to Production
• Play around with Bigtop Big Data Stack
• For contributors
• Easy to test your packaging, deployment, testing code
• For vendors
• CI out of the box —> patch upstream code made easier
Use cases
Bigtop at Trend Micro
What is Apache Bigtop?
Achieving Build Automation
One Click Hadoop Provisioning
• Use Bigtop as the basis for our internal custom
distribution of Hadoop
• Apply community, internal patches to upstream
projects for business and operational need
• Newest TMH7 is based on Bigtop 1.0 SNAPSHOT
Trend Micro Hadoop (TMH)
Working with community
made our life easier
• Knowing community status made TMH7 release 

based on Bigtop 1.0 SNAPSHOT possible
Working with community
made our life easier
• Contribute Bigtop Provisioner, packaging code,
puppet recipes, bugfixes, CI infra, anything!
• Knowing community status made TMH7 release 

based on Bigtop 1.0 SNAPSHOT possible
Working with community
made our life easier
• Leverage Bigtop smoke tests and integration tests 

with Bigtop Provisioner to evaluate TMH7
Working with community
made our life easier
• Contribute feedback, evaluation, use case
through Production level adoption
• Leverage Bigtop smoke tests and integration tests 

with Bigtop Provisioner to evaluate TMH7
Hadoop YARN
Hadoop HDFS
Mapreduce
Ad-hoc Query UDFs
Pig
App A App C
Oozie
Resource
Management
Storage
Processing
Engine
APIs and

Interfases
In-house 

Apps
Trend Micro Big Data Stack
Powered by Bigtop
Kerberos
App B App D
HBase
Wuji
Solr
Cloud
Challenges we’re facing !
• We’re moving our cluster to another Data Center
• New cluster will be running TMH7
• Need Distcp to sync data from old TMH6 cluster
• Both are Kerberos enabled secure clusters
Migration x Upgrade
Sync data between clusters
Distcp between

TMH6 and TMH7 ?
Hadoop 2.0.0 Hadoop 2.6.0
• Hadoop version
• Protocols (hdfs, hftp, webhdfs)
• Where to issue Distcp (Where to run MR job)
• Secure or not
Factors that matter
Things always get complicated
with Kerberos…
• We need to do Distcp across 4 secure clusters

—> Kerberos cross realm auth across 4 clusters
• Our Hadoop management tool needs to support
this through auto-configuring
• Developing the management tool is challenging !
Kerberos cross realm authentication
• Run multiple secure HA clusters on Docker
• Dev & test the management tool iteratively
Docker comes to the rescue
TMH6 Production TMH7 Production
TMH6 Staging TMH7 Staging
Hadoop apps 

dev & test on Docker
• A Devops toolkit for Hadoop app developer 

to develop and test its code on
• Provides fixed Big Data Stack preload images

—> dev & test env up and running w/o deployment

—> support end-to-end CI test for apps
• A Hadoop env for apps to test against our new 

Hadoop distribution
• Same features are also delivered in Vagrant boxes
• https://github.com/evans-ye/hadoocker
Hadoocker
internal Docker registry
./execute.sh
Hadoop server
Hadoop client
data
Docker based dev & test env
TMH7
Hadoop app
Restful 

APIs
sample data
hadoop fs put
internal Docker registry
./execute.sh
Hadoop server
Hadoop client
data
TMH7
Hadoop app
Restful 

APIs
sample data
hadoop fs put…Solr
Oozie(Wuji)
Dependency service
Docker based dev & test env
Summary
• Bigtop helps you to build your own Big Data Stack
• Building packages made easier by offering immutable
build environment through Docker images
• Bigtop Provisioner creates you a Hadoop cluster by 

one click with flexibility
• Use Docker to simulate complex environment 

to ease development and testing efforts
• Ship apps in Docker images to dev & test anywhere
Summary
Wait,

how’s the demo ?
Questions ?
Thank you !

More Related Content

What's hot

Open Source Recipes for Chef Deployments of Hadoop
Open Source Recipes for Chef Deployments of HadoopOpen Source Recipes for Chef Deployments of Hadoop
Open Source Recipes for Chef Deployments of HadoopDataWorks Summit
 
DockerCon Keynote Ben Golub
DockerCon Keynote Ben GolubDockerCon Keynote Ben Golub
DockerCon Keynote Ben GolubdotCloud
 
Docker open stack boston
Docker open stack bostonDocker open stack boston
Docker open stack bostondotCloud
 
Building a smarter application Stack by Tomas Doran from Yelp
Building a smarter application Stack by Tomas Doran from YelpBuilding a smarter application Stack by Tomas Doran from Yelp
Building a smarter application Stack by Tomas Doran from YelpdotCloud
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...Daniel Krook
 
Are VM Passé?
Are VM Passé? Are VM Passé?
Are VM Passé? dotCloud
 
Ninja, Choose Your Weapon!
Ninja, Choose Your Weapon!Ninja, Choose Your Weapon!
Ninja, Choose Your Weapon!Anton Weiss
 
Docker in pratice -chenyifei
Docker in pratice -chenyifeiDocker in pratice -chenyifei
Docker in pratice -chenyifeidotCloud
 
Docker, the Future of DevOps
Docker, the Future of DevOpsDocker, the Future of DevOps
Docker, the Future of DevOpsandersjanmyr
 
Galera on kubernetes_no_video
Galera on kubernetes_no_videoGalera on kubernetes_no_video
Galera on kubernetes_no_videoPatrick Galbraith
 
OpenShift, Docker, Kubernetes: The next generation of PaaS
OpenShift, Docker, Kubernetes: The next generation of PaaSOpenShift, Docker, Kubernetes: The next generation of PaaS
OpenShift, Docker, Kubernetes: The next generation of PaaSGraham Dumpleton
 
Why kubernetes matters
Why kubernetes mattersWhy kubernetes matters
Why kubernetes mattersPlatform9
 
Build a PaaS with OpenShift Origin
Build a PaaS with OpenShift OriginBuild a PaaS with OpenShift Origin
Build a PaaS with OpenShift OriginSteven Pousty
 
Persistent storage tailored for containers
Persistent storage tailored for containersPersistent storage tailored for containers
Persistent storage tailored for containersDocker, Inc.
 
Docker and Containers overview - Docker Workshop
Docker and Containers overview - Docker WorkshopDocker and Containers overview - Docker Workshop
Docker and Containers overview - Docker WorkshopJonas Rosland
 
Microservices using relocatable Docker containers
Microservices using relocatable Docker containersMicroservices using relocatable Docker containers
Microservices using relocatable Docker containersMauricio Garavaglia
 
How kubernetes works community, velocity, and contribution - osls 2017 (1)
How kubernetes works  community, velocity, and contribution - osls 2017 (1)How kubernetes works  community, velocity, and contribution - osls 2017 (1)
How kubernetes works community, velocity, and contribution - osls 2017 (1)Brian Grant
 
PuppetConf 2016: Changing the Engine While in Flight – Neil Armitage, VMware
PuppetConf 2016: Changing the Engine While in Flight – Neil Armitage, VMwarePuppetConf 2016: Changing the Engine While in Flight – Neil Armitage, VMware
PuppetConf 2016: Changing the Engine While in Flight – Neil Armitage, VMwarePuppet
 
Package Management and Chef - ChefConf 2015
Package Management and Chef - ChefConf 2015Package Management and Chef - ChefConf 2015
Package Management and Chef - ChefConf 2015Chef
 

What's hot (20)

Open Source Recipes for Chef Deployments of Hadoop
Open Source Recipes for Chef Deployments of HadoopOpen Source Recipes for Chef Deployments of Hadoop
Open Source Recipes for Chef Deployments of Hadoop
 
DockerCon Keynote Ben Golub
DockerCon Keynote Ben GolubDockerCon Keynote Ben Golub
DockerCon Keynote Ben Golub
 
Docker open stack boston
Docker open stack bostonDocker open stack boston
Docker open stack boston
 
Building a smarter application Stack by Tomas Doran from Yelp
Building a smarter application Stack by Tomas Doran from YelpBuilding a smarter application Stack by Tomas Doran from Yelp
Building a smarter application Stack by Tomas Doran from Yelp
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
 
7+1 myths of the new os
7+1 myths of the new os7+1 myths of the new os
7+1 myths of the new os
 
Are VM Passé?
Are VM Passé? Are VM Passé?
Are VM Passé?
 
Ninja, Choose Your Weapon!
Ninja, Choose Your Weapon!Ninja, Choose Your Weapon!
Ninja, Choose Your Weapon!
 
Docker in pratice -chenyifei
Docker in pratice -chenyifeiDocker in pratice -chenyifei
Docker in pratice -chenyifei
 
Docker, the Future of DevOps
Docker, the Future of DevOpsDocker, the Future of DevOps
Docker, the Future of DevOps
 
Galera on kubernetes_no_video
Galera on kubernetes_no_videoGalera on kubernetes_no_video
Galera on kubernetes_no_video
 
OpenShift, Docker, Kubernetes: The next generation of PaaS
OpenShift, Docker, Kubernetes: The next generation of PaaSOpenShift, Docker, Kubernetes: The next generation of PaaS
OpenShift, Docker, Kubernetes: The next generation of PaaS
 
Why kubernetes matters
Why kubernetes mattersWhy kubernetes matters
Why kubernetes matters
 
Build a PaaS with OpenShift Origin
Build a PaaS with OpenShift OriginBuild a PaaS with OpenShift Origin
Build a PaaS with OpenShift Origin
 
Persistent storage tailored for containers
Persistent storage tailored for containersPersistent storage tailored for containers
Persistent storage tailored for containers
 
Docker and Containers overview - Docker Workshop
Docker and Containers overview - Docker WorkshopDocker and Containers overview - Docker Workshop
Docker and Containers overview - Docker Workshop
 
Microservices using relocatable Docker containers
Microservices using relocatable Docker containersMicroservices using relocatable Docker containers
Microservices using relocatable Docker containers
 
How kubernetes works community, velocity, and contribution - osls 2017 (1)
How kubernetes works  community, velocity, and contribution - osls 2017 (1)How kubernetes works  community, velocity, and contribution - osls 2017 (1)
How kubernetes works community, velocity, and contribution - osls 2017 (1)
 
PuppetConf 2016: Changing the Engine While in Flight – Neil Armitage, VMware
PuppetConf 2016: Changing the Engine While in Flight – Neil Armitage, VMwarePuppetConf 2016: Changing the Engine While in Flight – Neil Armitage, VMware
PuppetConf 2016: Changing the Engine While in Flight – Neil Armitage, VMware
 
Package Management and Chef - ChefConf 2015
Package Management and Chef - ChefConf 2015Package Management and Chef - ChefConf 2015
Package Management and Chef - ChefConf 2015
 

Viewers also liked

Industry Automation with Programmable Logic Controller
Industry Automation with Programmable Logic ControllerIndustry Automation with Programmable Logic Controller
Industry Automation with Programmable Logic ControllerArifur Rahman
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on DockerRakesh Saha
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...Hortonworks
 
Intro to Puppet Enterprise
Intro to Puppet EnterpriseIntro to Puppet Enterprise
Intro to Puppet EnterprisePuppet
 
Puppet Primer, Robbie Jerrom, Solution Architect VMware
Puppet Primer, Robbie Jerrom, Solution Architect VMwarePuppet Primer, Robbie Jerrom, Solution Architect VMware
Puppet Primer, Robbie Jerrom, Solution Architect VMwaresubtitle
 
Using the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data ProductUsing the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data ProductEvans Ye
 

Viewers also liked (6)

Industry Automation with Programmable Logic Controller
Industry Automation with Programmable Logic ControllerIndustry Automation with Programmable Logic Controller
Industry Automation with Programmable Logic Controller
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on Docker
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
 
Intro to Puppet Enterprise
Intro to Puppet EnterpriseIntro to Puppet Enterprise
Intro to Puppet Enterprise
 
Puppet Primer, Robbie Jerrom, Solution Architect VMware
Puppet Primer, Robbie Jerrom, Solution Architect VMwarePuppet Primer, Robbie Jerrom, Solution Architect VMware
Puppet Primer, Robbie Jerrom, Solution Architect VMware
 
Using the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data ProductUsing the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data Product
 

Similar to How bigtop leveraged docker for build automation and one click hadoop provisioning @ Apache Big Data EU 2015

Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningEvans Ye
 
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioningLeveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioningDataWorks Summit
 
321 codeincontainer brewbox
321 codeincontainer brewbox321 codeincontainer brewbox
321 codeincontainer brewboxLino Telera
 
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013dotCloud
 
Habitat Overview
Habitat OverviewHabitat Overview
Habitat OverviewMandi Walls
 
Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017Mandi Walls
 
Detailed Introduction To Docker
Detailed Introduction To DockerDetailed Introduction To Docker
Detailed Introduction To Dockernklmish
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopEvans Ye
 
Intro Docker october 2013
Intro Docker october 2013Intro Docker october 2013
Intro Docker october 2013dotCloud
 
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018Mandi Walls
 
habitat at docker bud
habitat at docker budhabitat at docker bud
habitat at docker budMandi Walls
 
Fluo CICD OpenStack Summit
Fluo CICD OpenStack SummitFluo CICD OpenStack Summit
Fluo CICD OpenStack SummitMiguel Zuniga
 
Intro to Docker November 2013
Intro to Docker November 2013Intro to Docker November 2013
Intro to Docker November 2013Docker, Inc.
 
State of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache BigtopState of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache BigtopGanesh Raju
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and dockerFabio Fumarola
 

Similar to How bigtop leveraged docker for build automation and one click hadoop provisioning @ Apache Big Data EU 2015 (20)

Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioning
 
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioningLeveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
 
321 codeincontainer brewbox
321 codeincontainer brewbox321 codeincontainer brewbox
321 codeincontainer brewbox
 
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
 
Habitat Overview
Habitat OverviewHabitat Overview
Habitat Overview
 
OpenStack Summit
OpenStack SummitOpenStack Summit
OpenStack Summit
 
Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017
 
Detailed Introduction To Docker
Detailed Introduction To DockerDetailed Introduction To Docker
Detailed Introduction To Docker
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 
Intro Docker october 2013
Intro Docker october 2013Intro Docker october 2013
Intro Docker october 2013
 
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
 
habitat at docker bud
habitat at docker budhabitat at docker bud
habitat at docker bud
 
Devops
DevopsDevops
Devops
 
Docker Introduction
Docker IntroductionDocker Introduction
Docker Introduction
 
Fluo CICD OpenStack Summit
Fluo CICD OpenStack SummitFluo CICD OpenStack Summit
Fluo CICD OpenStack Summit
 
Webinar Docker Tri Series
Webinar Docker Tri SeriesWebinar Docker Tri Series
Webinar Docker Tri Series
 
Intro to Docker November 2013
Intro to Docker November 2013Intro to Docker November 2013
Intro to Docker November 2013
 
State of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache BigtopState of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache Bigtop
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and docker
 
Docker-Intro
Docker-IntroDocker-Intro
Docker-Intro
 

More from Evans Ye

Join ASF to Unlock Full Possibilities of Your Professional Career.pdf
Join ASF to Unlock Full Possibilities of Your Professional Career.pdfJoin ASF to Unlock Full Possibilities of Your Professional Career.pdf
Join ASF to Unlock Full Possibilities of Your Professional Career.pdfEvans Ye
 
非常人走非常路:參與ASF打世界杯比賽
非常人走非常路:參與ASF打世界杯比賽非常人走非常路:參與ASF打世界杯比賽
非常人走非常路:參與ASF打世界杯比賽Evans Ye
 
TensorFlow on Spark: A Deep Dive into Distributed Deep Learning
TensorFlow on Spark: A Deep Dive into Distributed Deep LearningTensorFlow on Spark: A Deep Dive into Distributed Deep Learning
TensorFlow on Spark: A Deep Dive into Distributed Deep LearningEvans Ye
 
2017 big data landscape and cutting edge innovations public
2017 big data landscape and cutting edge innovations public2017 big data landscape and cutting edge innovations public
2017 big data landscape and cutting edge innovations publicEvans Ye
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartEvans Ye
 
The Apache Way: A Proven Way Toward Success
The Apache Way: A Proven Way Toward SuccessThe Apache Way: A Proven Way Toward Success
The Apache Way: A Proven Way Toward SuccessEvans Ye
 
The Apache Way
The Apache WayThe Apache Way
The Apache WayEvans Ye
 
Docker workshop
Docker workshopDocker workshop
Docker workshopEvans Ye
 
Fits docker into devops
Fits docker into devopsFits docker into devops
Fits docker into devopsEvans Ye
 
Getting involved in world class software engineering tips and tricks to join ...
Getting involved in world class software engineering tips and tricks to join ...Getting involved in world class software engineering tips and tricks to join ...
Getting involved in world class software engineering tips and tricks to join ...Evans Ye
 
Deep dive into enterprise data lake through Impala
Deep dive into enterprise data lake through ImpalaDeep dive into enterprise data lake through Impala
Deep dive into enterprise data lake through ImpalaEvans Ye
 
How we lose etu hadoop competition
How we lose etu hadoop competitionHow we lose etu hadoop competition
How we lose etu hadoop competitionEvans Ye
 
Network Traffic Search using Apache HBase
Network Traffic Search using Apache HBaseNetwork Traffic Search using Apache HBase
Network Traffic Search using Apache HBaseEvans Ye
 
Building hadoop based big data environment
Building hadoop based big data environmentBuilding hadoop based big data environment
Building hadoop based big data environmentEvans Ye
 
Hdfs ha using journal nodes
Hdfs ha using journal nodesHdfs ha using journal nodes
Hdfs ha using journal nodesEvans Ye
 
How to be a star engineer
How to be a star engineerHow to be a star engineer
How to be a star engineerEvans Ye
 

More from Evans Ye (17)

Join ASF to Unlock Full Possibilities of Your Professional Career.pdf
Join ASF to Unlock Full Possibilities of Your Professional Career.pdfJoin ASF to Unlock Full Possibilities of Your Professional Career.pdf
Join ASF to Unlock Full Possibilities of Your Professional Career.pdf
 
非常人走非常路:參與ASF打世界杯比賽
非常人走非常路:參與ASF打世界杯比賽非常人走非常路:參與ASF打世界杯比賽
非常人走非常路:參與ASF打世界杯比賽
 
TensorFlow on Spark: A Deep Dive into Distributed Deep Learning
TensorFlow on Spark: A Deep Dive into Distributed Deep LearningTensorFlow on Spark: A Deep Dive into Distributed Deep Learning
TensorFlow on Spark: A Deep Dive into Distributed Deep Learning
 
2017 big data landscape and cutting edge innovations public
2017 big data landscape and cutting edge innovations public2017 big data landscape and cutting edge innovations public
2017 big data landscape and cutting edge innovations public
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
 
The Apache Way: A Proven Way Toward Success
The Apache Way: A Proven Way Toward SuccessThe Apache Way: A Proven Way Toward Success
The Apache Way: A Proven Way Toward Success
 
The Apache Way
The Apache WayThe Apache Way
The Apache Way
 
Docker workshop
Docker workshopDocker workshop
Docker workshop
 
Fits docker into devops
Fits docker into devopsFits docker into devops
Fits docker into devops
 
Getting involved in world class software engineering tips and tricks to join ...
Getting involved in world class software engineering tips and tricks to join ...Getting involved in world class software engineering tips and tricks to join ...
Getting involved in world class software engineering tips and tricks to join ...
 
Deep dive into enterprise data lake through Impala
Deep dive into enterprise data lake through ImpalaDeep dive into enterprise data lake through Impala
Deep dive into enterprise data lake through Impala
 
How we lose etu hadoop competition
How we lose etu hadoop competitionHow we lose etu hadoop competition
How we lose etu hadoop competition
 
Network Traffic Search using Apache HBase
Network Traffic Search using Apache HBaseNetwork Traffic Search using Apache HBase
Network Traffic Search using Apache HBase
 
Vagrant
VagrantVagrant
Vagrant
 
Building hadoop based big data environment
Building hadoop based big data environmentBuilding hadoop based big data environment
Building hadoop based big data environment
 
Hdfs ha using journal nodes
Hdfs ha using journal nodesHdfs ha using journal nodes
Hdfs ha using journal nodes
 
How to be a star engineer
How to be a star engineerHow to be a star engineer
How to be a star engineer
 

Recently uploaded

The Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng PilipinasThe Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng PilipinasCherylouCamus
 
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdfBPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdfHenry Tapper
 
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170Sonam Pathan
 
government_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdfgovernment_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdfshaunmashale756
 
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...Henry Tapper
 
Stock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdfStock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdfMichael Silva
 
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证jdkhjh
 
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfmagnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfHenry Tapper
 
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一S SDS
 
Bladex 1Q24 Earning Results Presentation
Bladex 1Q24 Earning Results PresentationBladex 1Q24 Earning Results Presentation
Bladex 1Q24 Earning Results PresentationBladex
 
House of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHouse of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHenry Tapper
 
Call Girls Near Me WhatsApp:+91-9833363713
Call Girls Near Me WhatsApp:+91-9833363713Call Girls Near Me WhatsApp:+91-9833363713
Call Girls Near Me WhatsApp:+91-9833363713Sonam Pathan
 
AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...
AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...
AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...yordanosyohannes2
 
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170Sonam Pathan
 
Economic Risk Factor Update: April 2024 [SlideShare]
Economic Risk Factor Update: April 2024 [SlideShare]Economic Risk Factor Update: April 2024 [SlideShare]
Economic Risk Factor Update: April 2024 [SlideShare]Commonwealth
 
Call Girls In Yusuf Sarai Women Seeking Men 9654467111
Call Girls In Yusuf Sarai Women Seeking Men 9654467111Call Girls In Yusuf Sarai Women Seeking Men 9654467111
Call Girls In Yusuf Sarai Women Seeking Men 9654467111Sapana Sha
 
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...Amil Baba Dawood bangali
 
NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...
NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...
NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...Amil baba
 
Classical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam SmithClassical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam SmithAdamYassin2
 
fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdfHenry Tapper
 

Recently uploaded (20)

The Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng PilipinasThe Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng Pilipinas
 
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdfBPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
 
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
 
government_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdfgovernment_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdf
 
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
 
Stock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdfStock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdf
 
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
 
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfmagnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
 
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
 
Bladex 1Q24 Earning Results Presentation
Bladex 1Q24 Earning Results PresentationBladex 1Q24 Earning Results Presentation
Bladex 1Q24 Earning Results Presentation
 
House of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHouse of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview document
 
Call Girls Near Me WhatsApp:+91-9833363713
Call Girls Near Me WhatsApp:+91-9833363713Call Girls Near Me WhatsApp:+91-9833363713
Call Girls Near Me WhatsApp:+91-9833363713
 
AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...
AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...
AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...
 
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
 
Economic Risk Factor Update: April 2024 [SlideShare]
Economic Risk Factor Update: April 2024 [SlideShare]Economic Risk Factor Update: April 2024 [SlideShare]
Economic Risk Factor Update: April 2024 [SlideShare]
 
Call Girls In Yusuf Sarai Women Seeking Men 9654467111
Call Girls In Yusuf Sarai Women Seeking Men 9654467111Call Girls In Yusuf Sarai Women Seeking Men 9654467111
Call Girls In Yusuf Sarai Women Seeking Men 9654467111
 
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
 
NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...
NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...
NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...
 
Classical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam SmithClassical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam Smith
 
fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdf
 

How bigtop leveraged docker for build automation and one click hadoop provisioning @ Apache Big Data EU 2015

  • 1. How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning Evans Ye Apache Big Data 2015
 Budapest
  • 2. Who am I • Apache Bigtop PMC member • Software Engineer at Trend Micro • Develop Big Data platform • Develop cyber security solutions using Big Data
  • 3. Outline • What is Apache Bigtop? • Achieving Build Automation • One-Click Hadoop Provisioning • Bigtop at Trend Micro
  • 4. What is Apache Bigtop ?
  • 5. Bigtop is a project for Packaging Testing Deployment Virtualization for you to easily build your own Big Data Stack
  • 6. • Build RPM, DEB packages for Hadoop ecosystem • $ yum -y install hadoop • $ apt-get upgrade hbase • Why not just untar ? • Version control, upgrade/downgrade, dependency management • Unified view of configuration, log, binary directories • Daemons for better process management Packaging
  • 7. • You can find binary repos here: • http://www.apache.org/dist/bigtop/bigtop-1.0.0/repos/ Supported Linux distro.
  • 9. • A fundamental testing problem may happen in your 
 Big Data Stack, too Testing Hadoop 2.4.1 Hadoop 2.6.0 Hive 1.1.0 Hadoop 2.7.1 Hive 1.2.0 Hive 1.2.1 Tez 0.6.0 Tez 0.6.2 Tez 0.7.0
  • 10. • A fundamental testing problem may happen in your 
 Big Data Stack, too Testing Hadoop 2.4.1 Hadoop 2.6.0 Hive 1.1.0 Hadoop 2.7.1 Hive 1.2.0 Hive 1.2.1 Tez 0.6.0 Tez 0.6.2 Tez 0.7.0
  • 11. • A fundamental testing problem may happen in your 
 Big Data Stack, too Testing Hadoop 2.4.1 Hadoop 2.6.0 Hive 1.1.0 Hadoop 2.7.1 Hive 1.2.0 Hive 1.2.1 Tez 0.6.0 Tez 0.6.2 Tez 0.7.0 Does this combination still work?
  • 12. There’s a need for integration tests !
  • 13. • Bigtop has a testing framework that provides APIs • Built-in tests that Bigtop provides: • smoke-tests • Basic Hadoop ecosystem interoperability • integration tests • Advanced tests runs from jar files Bigtop Tests
  • 14. Okay, 
 then how to run those tests?
  • 15. • Use Bigtop Puppet to deploy a fully functional, distributed Hadoop cluster • Integration tests running on a fully distributed cluster • Bigtop Puppet • Masterless Puppet • Numbers of components are supported • Kerberos enabled cluster deployment Deployment
  • 17. • Automate the infrastructure creation & setup • Bigtop Provisioner • Virtualbox VMs auto-provisioning • Docker containers auto-provisioning Virtualization
  • 18. • Hadoop App developers • Provision Hadoop cluster to test your code on • Cluster administrators • Use Bigtop Puppet to deploy and manage your cluster • Run Bigtop tests to ensure your cluster is working • Vendors • Build your own Hadoop distribution, customized 
 from Bigtop bits From user point of view
  • 19. Achieving Build Automation What is Apache Bigtop? One-Click Hadoop Provisioning Bigtop at Trend Micro
  • 20. • As you can see, 
 we support a number of Linux distributions (m) What’s the challenge ?
  • 21. • Numbers of components (n) What’s the challenge ?
  • 22. We need to build them all 
 —> m * n
  • 25. • Puppet recipes automatically install required 
 libraries, build tools • To transform a machine into a build environment, simply do • $ gradle toolchain • Prerequisites : • Puppet 3.x • Java 7 Bigtop Toolchain
  • 26. CI infra CentOS slave Fedora slave Ubuntu slave Debian slave OpenSuSE slave
  • 27. CI infra CentOS slave Fedora slave Ubuntu slave Debian slave OpenSuSE slave Bigtop Toolchain Bigtop Toolchain Bigtop Toolchain Bigtop Toolchain Bigtop Toolchain
  • 29. We got Bigtop Toolchain for you. Hi Bigtop, How do I test my packaging code? A contributor A committer
  • 30. (Don’t want to answer…) I need to setup all the Linux environments on my laptop? A contributor A committer
  • 31.
  • 32. • Three key techniques in Docker • Use Linux Kernel’s Namespaces to create 
 isolated resources • Use Linux Kernel’s cgroups to constrain 
 resource usage • Union file system that supports git-like 
 image creation Docker - How it works
  • 33. • Lightweight • Fast creation • Repeatable • Portability • runs on *any* Linux environment The nice things about it
  • 34. • Run on any kind of Linux distribution in one 
 single machine Best fit in our use case OS specific Bins/Libs
  • 35. Ship build environment 
 in Docker images
  • 36. Bigtop/slaves images Dockerhub official images Install Puppet bigtop/slaves Install Bigtop Toolchain
  • 37. • $ git clone https://github.com/apache/bigtop.git • $ docker run 
 --rm 
 --volume `pwd`/bigtop:/bigtop 
 --workdir /bigtop 
 bigtop/slaves:centos-7 
 bash -l -c ‘./gradlew rpm’ One click to build packages
  • 38. Now, easy to build & test 
 on your laptop Linux boot2docker CentOS slave Ubuntu slave Mac OS X
 Windows CentOS slave Ubuntu slave Docker engine Docker engine
  • 39. Flexible CI infra CentOS slave Fedora slave Ubuntu slave Debian slave OpenSuSE slave • Fault tolerance • Immutable
  • 40. Flexible CI infra CentOS slave Fedora slave Ubuntu slave Debian slave OpenSuSE slave • Fault tolerance • Immutable
  • 41. One-Click Hadoop Provisioning What is Apache Bigtop? Achieving Build Automation Bigtop at Trend Micro
  • 42. • A tool to demonstrate the full life cycle of Bigtop Bigtop Provisioner Packaging TestingDeploymentVirtualization Create resources Run Bigtop Puppet Run Bigtop Tests Bigtop Provisioner
  • 43. • Fast iterative development • Test your code in the cluster, on your laptop, w/o human intervention • Flexibility • Choose any combination of components as you want • Responsive CI • Integration tests that get you the result in mins • A Big Data Stack playground • Spark + Tachyon, Spark + Ignite, etc The goal
  • 44. Vagrant
 +
 Automation Code
 +
 Bigtop Puppet One-click Hadoop Provisioning [Bigtop Provisioner w/ provider =
  • 45. • We use Vagrant as an abstraction layer to support different kind of resource providers Vagrant Providers
  • 46. One click Hadoop provisioning ./docker-hadoop.sh -c 3
  • 47. bigtop/deploy image 
 on Docker hub ./docker-hadoop.sh -c 3 One click Hadoop provisioning
  • 48. bigtop/deploy image 
 on Docker hub ./docker-hadoop.sh -c 3 puppet apply puppet apply puppet apply One click Hadoop provisioning
  • 49. Bigtop/deploy images Dockerhub official images install Puppet bigtop/deploy install Vagrant ssh key
  • 50. Bigtop image hierarchy Dockerhub official images bigtop/puppet bigtop/deploybigtop/slaves bigtop/puppet = diff
  • 51. All on the Docker hub https://hub.docker.com/u/bigtop/
  • 52. A demo is worth 1k words Oops! demo fail…
 https://asciinema.org/a/55aw3zl2g3dzsfe69198homrl
  • 53. • Supported providers in Bigtop 1.0.0 release • Virtaulbox VM • Docker container • OpenStack support is in master branch now Bigtop Provisioner
  • 54. • For Hadoop app developers, cluster admins, users • Run a Hadoop cluster to test your code on • Try & test configurations before applying to Production • Play around with Bigtop Big Data Stack • For contributors • Easy to test your packaging, deployment, testing code • For vendors • CI out of the box —> patch upstream code made easier Use cases
  • 55. Bigtop at Trend Micro What is Apache Bigtop? Achieving Build Automation One Click Hadoop Provisioning
  • 56. • Use Bigtop as the basis for our internal custom distribution of Hadoop • Apply community, internal patches to upstream projects for business and operational need • Newest TMH7 is based on Bigtop 1.0 SNAPSHOT Trend Micro Hadoop (TMH)
  • 57. Working with community made our life easier • Knowing community status made TMH7 release 
 based on Bigtop 1.0 SNAPSHOT possible
  • 58. Working with community made our life easier • Contribute Bigtop Provisioner, packaging code, puppet recipes, bugfixes, CI infra, anything! • Knowing community status made TMH7 release 
 based on Bigtop 1.0 SNAPSHOT possible
  • 59. Working with community made our life easier • Leverage Bigtop smoke tests and integration tests 
 with Bigtop Provisioner to evaluate TMH7
  • 60. Working with community made our life easier • Contribute feedback, evaluation, use case through Production level adoption • Leverage Bigtop smoke tests and integration tests 
 with Bigtop Provisioner to evaluate TMH7
  • 61. Hadoop YARN Hadoop HDFS Mapreduce Ad-hoc Query UDFs Pig App A App C Oozie Resource Management Storage Processing Engine APIs and
 Interfases In-house 
 Apps Trend Micro Big Data Stack Powered by Bigtop Kerberos App B App D HBase Wuji Solr Cloud
  • 63. • We’re moving our cluster to another Data Center • New cluster will be running TMH7 • Need Distcp to sync data from old TMH6 cluster • Both are Kerberos enabled secure clusters Migration x Upgrade
  • 64. Sync data between clusters
  • 65. Distcp between
 TMH6 and TMH7 ? Hadoop 2.0.0 Hadoop 2.6.0
  • 66. • Hadoop version • Protocols (hdfs, hftp, webhdfs) • Where to issue Distcp (Where to run MR job) • Secure or not Factors that matter
  • 67. Things always get complicated with Kerberos…
  • 68. • We need to do Distcp across 4 secure clusters
 —> Kerberos cross realm auth across 4 clusters • Our Hadoop management tool needs to support this through auto-configuring • Developing the management tool is challenging ! Kerberos cross realm authentication
  • 69. • Run multiple secure HA clusters on Docker • Dev & test the management tool iteratively Docker comes to the rescue TMH6 Production TMH7 Production TMH6 Staging TMH7 Staging
  • 70. Hadoop apps 
 dev & test on Docker
  • 71. • A Devops toolkit for Hadoop app developer 
 to develop and test its code on • Provides fixed Big Data Stack preload images
 —> dev & test env up and running w/o deployment
 —> support end-to-end CI test for apps • A Hadoop env for apps to test against our new 
 Hadoop distribution • Same features are also delivered in Vagrant boxes • https://github.com/evans-ye/hadoocker Hadoocker
  • 72. internal Docker registry ./execute.sh Hadoop server Hadoop client data Docker based dev & test env TMH7 Hadoop app Restful 
 APIs sample data hadoop fs put
  • 73. internal Docker registry ./execute.sh Hadoop server Hadoop client data TMH7 Hadoop app Restful 
 APIs sample data hadoop fs put…Solr Oozie(Wuji) Dependency service Docker based dev & test env
  • 75. • Bigtop helps you to build your own Big Data Stack • Building packages made easier by offering immutable build environment through Docker images • Bigtop Provisioner creates you a Hadoop cluster by 
 one click with flexibility • Use Docker to simulate complex environment 
 to ease development and testing efforts • Ship apps in Docker images to dev & test anywhere Summary