SlideShare a Scribd company logo
LEVERAGING DOCKER FOR
HADOOP BUILD
AUTOMATION
AND 

BIG DATA STACK
PROVISIONING
Evans Ye, Sr. Software Engineer
DataWorks Summit San Jose 2017
Who am I
• Tech Lead @ APAC Data Team, Y! Taiwan
• Building data products for E-Commerce business
• PMC chair of Apache Bigtop, ASF member
2
Outline
• Quick Intro to Apache Bigtop
• Docker for Bigtop Packaging
• Docker for Bigtop Provisioner
• Docker for Bigtop Sandbox
• Releases
3
QUICK INTRO TO 

APACHE BIGTOP
4
Linux Distributions
5
Hadoop Distributions
6
But there're some other great
Hadoop ecosystem components..
7
How do I add patches?
8
9
From source code to packages
Bigtop

Packaging
10
Supported components
11
Bigtop feature set
Packaging Testing Deployment Virtualization
for you to easily build your own Big Data Stack
12
Community stats
• 94 total contributors
• Spark: 1093, Hadoop: 99, HBase: 126, Hive:115
• 5 years since 2012
• 30 Hadoop ecosystem components packaged
• 5 Linux Distro., 2 archs supported
13
DOCKER FOR 

BIGTOP PACKAGING
14
Preparing build environment
15
Preparing build environment
…

Seriously ?
16
Bigtop Toolchain
• Puppet recipes to install required libraries, build tools
• To prepare a build environment:
• Prerequisite :
▪ Java
git clone https://github.com/apache/bigtop.git
cd bigtop
./bigtop_toolchain/bin/puppetize.sh
./gradlew toolchain
17
CI infrastructure
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
18
CI infrastructure
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
Bigtop Toolchain
Bigtop Toolchain
Bigtop Toolchain
Bigtop Toolchain
Bigtop Toolchain
19
CI infrastructure
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
Bigtop Toolchain
Bigtop Toolchain
Bigtop Toolchain
Bigtop Toolchain
Bigtop Toolchain
20
Dockerlized CI infrastructure
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
• Immutable env
• Fault tolerance
21
Dockerlized CI infrastructure
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
• Immutable env
• Fault tolerance
22
• Execute shell
• Bigtop CI Setup Guide
How to build packages
# OS=debian-8
# COMPONENT=hadoop
docker run -u jenkins --rm 
-v `pwd`:/bigtop --workdir /bigtop 
bigtop/slaves:trunk-$OS 
bash -l -c "./gradlew allclean $COMPONENT-pkg"
23
Bigtop packages on master
https://ci.bigtop.apache.org/view/Packages/job/Bigtop-trunk-packages/
24
• Example: How to port Bigtop Distribution to PPC64LE?
• Prepare PPC64LE docker base image
• Apply Bigtop Toolchain on PPC64LE docker image
• Build Bigtop packages on PPC64LE slaves image
• 2016: Ported 22 out of 24 Bigtop components in 2 weeks, with only 5 patches
• Credit: Amir Sanjar, IBM
Extremely friendly for porting
25
Bigtop early mission accomplished
Leveraged by app providers…
26
Get out from the Apache dome
27
New focus and target user
• Data engineers vs Distro. builders
• Solution diversity:
▪ Streaming: Flink, Apex
▪ In-memory cache: Alluxio, Ignite
▪ User/developer tools:
▪ Bigtop Provisioner
▪ Bigtop Sandbox
• Big data stack references
• Machine learning, deep learning components
28
DOCKER FOR 

BIGTOP PROVISIONER
29
Bigtop Provisioner
• A tool to demonstrate full life cycle of Bigtop
Packaging TestingDeploymentVirtualization
Create resources Run Bigtop Puppet Run Bigtop Tests
Bigtop Provisioner
30
• We use Vagrant as an abstraction layer to support
different kind of resource providers
Vagrant
Providers
One click Hadoop provisioning

(Bigtop 1.0.0)
bigtop/deploy image 

on Docker hub
./docker-hadoop.sh -c 3
puppet apply
puppet apply
puppet apply
32
https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+Provisioner+User+Guide
Problems with Vagrant’s Docker Provider
• Need to add vagrant public key into docker images
• Too many issues with auto-created boot2docker VM
• A bug for docker provider regarding provision keeps opening for 2 years
▪ Waiting for machine to boot' hangs infinitely
• Can not share same code for different providers anyway
• Not all the docker options supported in Vagrantfile
• ^#?& slow
33
Replaced by docker-compose 

(Bigtop 1.2.0)
./docker-hadoop.sh -c 3
puppet apply
puppet apply
puppet apply
34
bigtop/deploy image 

on Docker hub
Advantages
• No need to create customized image beforehand
• Better compatibility with Docker’s native solutions
• Clear, simple yaml file for orchestration settings
• Supports new features such as overlay network
• Leverage Swarm for multi-node cluster deployment
• Fast —> better user experience
35
• Execute shell
• Bigtop CI Setup Guide
How to run Docker Provisioner
# See bigtop/provisioner/docker/*.yaml
CONFIG=YOUR_CUSTOM_CONF.yaml
# provision
./gradlew -Pconfig=${CONFIG} -Pnum_instances=1 
docker-provisioner
# destroy provisioned cluster
./gradlew docker-provisioner-destroy
36
YOUR_CUSTOM_CONF.yaml example
37
docker:
memory_limit: "4g"
image: "bigtop/puppet:centos-7"
repo: "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/
centos/7/x86_64"
distro: centos
components: [hdfs, yarn, mapreduce]
enable_local_repo: false
smoke_test_components: [hdfs, yarn, mapreduce]
38
Visibility for deployments
38
Use cases
• For application developers, cluster admins, users
▪ Run a Hadoop cluster to test your code on
▪ Try & test configurations before applying to Production
▪ Play around with Bigtop Big Data Stacks
• For contributors
▪ Easy to test your packaging, deployment, testing code
• For Distro. builders
▪ CI matrix —> patch upstream code made easier
39
DOCKER FOR
BIGTOP SANDBOX
40
Introducing Bigtop Sandbox
• Easy way to get started
• Docker images that has Bigtop stacks installed and
configured
• Pseudo cluster up & running w/o installation
• Command-line tool for you to build your own stack
41
Docker image layer
Interface
Customized	big	data	stack
Deployment	&	management	tool
Base	image	(OS)
42
Docker image layer
Concrete implementation
HDFS	+	YARN	+	Spark
Bigtop	Puppet
bigtop/puppet:ubuntu-16.04
43
Building images
Ubuntu	16.04
Bigtop	Puppet
HDFS	+	YARN	+	Spark
+ site.yaml
$ puppet apply
44
site.yaml example
45
bigtop::hadoop_head_node: bigtop.example.com
bigtop::bigtop_repo_uri: http://bigtop-repos.s3.amazonaws.com/
releases/1.2.0/debian/8/x86_64
hadoop::hadoop_storage_dirs: [/data/1, /data/2]
hadoop_cluster_node::cluster_components: [hdfs, yarn, spark]
How to build
• Or specify your custom conf:
git clone https://github.com/apache/bigtop.git
cd bigtop/docker/sandbox
./build.sh -a bigtop -o ubuntu-16.04 
-c "hdfs, yarn, spark"
./build.sh-a bigtop -o ubuntu-16.04 
-f custom_site.yaml -t dws2017
46
Running images
HDFS	+	YARN	+	Spark
$ puppet apply
47
How to run
docker run --name sandbox -d 
-p 50070:50070 -p 8088:8088 
evansye/sandbox:dws2017
docker logs -f sandbox
docker exec sandbox spark-example SparkPi
48
49
Bigtop Provisioner Bigtop Sandbox
Scalable V X
Portable X V
Flexibility High Medium
Speed > 2 mins > 15 secs
Requires Network V X
Port forwarding X V
50
Bigtop Provisioner Bigtop Sandbox
Data engineers
Multi-node 

cluster testing
Build/use
sandboxes 

for dev & test
Ops
Multi-node 

cluster testing
Single node 

testing
Contributors
Test packages,
puppet recipes,

test cases
Test packages,
puppet recipes,

test cases
Distro. Builders
Test packages,
puppet recipes,

test cases
Provide Sandboxes
51
Integration test in CI/CD pipeline
Unit	
Test	
Source	
code		
Compile	
	
Build	
Image	
Integra7on	test	with	
Sandbox	
Sandbox	Service	
CD	pipeline	with	Bigtop	Sandbox	
Docker	Registry	
Push	
Image	
Deploy	
	
FINISHED	
	
Data	
52
Future
• Production deployment using Sandbox images
▪ --net host or overlay network(SDN)?
▪ External volumes for edit logs, fsimages, etc
▪ Cluster orchestration
▪ Swarm, Kubernetes?
53
RELEASES
54
▪ New components:
▪ Ambari 2.5.0
▪ GPDB 5.0.0-alpha.0

(Greenplum)
Bigtop 1.2.0 Released April, 2017
▪ Featured upgrade:
▪ Hadoop 2.7.3
▪ Spark 2.1.0
▪ Kafka 0.10.1.1
▪ HBase 1.1.3
▪ and more
55
• New features:
▪ Juju bigtop charms
▪ Bigtop Sandbox (alpha, recommended to try master)
• Improvement:
▪ Bigtop Docker Provisioner made faster
New features in Bigtop 1.2.0
56
Juju Cloud Weather Report
http://bigtop.charm.qa/
57
• Expected to be out late June
• Hadoop 2.7.4 

(Interested in docker container support back ported, but I'm not sure yet)
• Mainly bug fixes:
• Packages
• Deployments
• Sandbox
Bigtop 1.2.1 up coming
58
• Machine Learning and Deep Learning integration
• Support aarch 64
• Enhance support set in Bigtop Puppet (not all components covered)
• Extend the CI matrix coverage to Bigtop Tests
• Ambari Bigtop stack integration
• Provide Big data stack references
Road ahead towards 1.3.0
59
60
• Submit your proposal, contribute Bigtop w/ funding!
• Improvements, new features, build, test, CI, etc
• CFP opened June 13, 2017

CFP closed July 14, 2017
• https://www.odpi.org/community/bigtopgrantfund
ODPi Apache Bigtop Test Drive Program
61
• Join mailing list, ask questions, suggest features, etc
• Contribute (components, tutorials, docs)
• Report bugs
▪ Home page: http://bigtop.apache.org/
▪ mailing list: http://bigtop.apache.org/mail-lists.html
▪ Document: https://cwiki.apache.org/confluence/display/BIGTOP/Index
▪ Source code: https://github.com/apache/bigtop
▪ Packages: https://www.apache.org/dist/bigtop/bigtop-1.2.0/repos/
▪ JIRA: https://issues.apache.org/jira/browse/BIGTOP
Reference
62
63
Questions?

More Related Content

What's hot

Introduction to openshift
Introduction to openshiftIntroduction to openshift
Introduction to openshift
MamathaBusi
 
Docker Advanced registry usage
Docker Advanced registry usageDocker Advanced registry usage
Docker Advanced registry usage
Docker, Inc.
 
並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...
並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...
並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...
NTT DATA OSS Professional Services
 
Prometheus on EKS
Prometheus on EKSPrometheus on EKS
Prometheus on EKS
Jo Hoon
 
Kubernetes Secrets Management on Production with Demo
Kubernetes Secrets Management on Production with DemoKubernetes Secrets Management on Production with Demo
Kubernetes Secrets Management on Production with Demo
Opsta
 
Dockerのディスクについて ~ファイルシステム・マウント方法など~
Dockerのディスクについて ~ファイルシステム・マウント方法など~Dockerのディスクについて ~ファイルシステム・マウント方法など~
Dockerのディスクについて ~ファイルシステム・マウント方法など~
HommasSlide
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
Hortonworks
 
BigtopでHadoopをビルドする(Open Source Conference 2021 Online/Spring 発表資料)
BigtopでHadoopをビルドする(Open Source Conference 2021 Online/Spring 発表資料)BigtopでHadoopをビルドする(Open Source Conference 2021 Online/Spring 発表資料)
BigtopでHadoopをビルドする(Open Source Conference 2021 Online/Spring 発表資料)
NTT DATA Technology & Innovation
 
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
Jo Hoon
 
Internal Hive
Internal HiveInternal Hive
Internal Hive
Recruit Technologies
 
OpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
OpenShift Meetup - Tokyo - Service Mesh and Serverless OverviewOpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
OpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
María Angélica Bracho
 
OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David...
OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David...OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David...
OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David...
Altinity Ltd
 
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioningLeveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
DataWorks Summit
 
Construindo Data Lakes - Visão Prática com Hadoop e BigData
Construindo Data Lakes - Visão Prática com Hadoop e BigDataConstruindo Data Lakes - Visão Prática com Hadoop e BigData
Construindo Data Lakes - Visão Prática com Hadoop e BigData
Marco Garcia
 
What is Docker
What is DockerWhat is Docker
What is Docker
Pavel Klimiankou
 
A brief study on Kubernetes and its components
A brief study on Kubernetes and its componentsA brief study on Kubernetes and its components
A brief study on Kubernetes and its components
Ramit Surana
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Skillspeed
 
[OpenStack Days Korea 2016] Track2 - 아리스타 OpenStack 연동 및 CloudVision 솔루션 소개
[OpenStack Days Korea 2016] Track2 - 아리스타 OpenStack 연동 및 CloudVision 솔루션 소개[OpenStack Days Korea 2016] Track2 - 아리스타 OpenStack 연동 및 CloudVision 솔루션 소개
[OpenStack Days Korea 2016] Track2 - 아리스타 OpenStack 연동 및 CloudVision 솔루션 소개
OpenStack Korea Community
 
OpenShift Overview
OpenShift OverviewOpenShift Overview
OpenShift Overview
roundman
 
From HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark ClustersFrom HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark Clusters
Databricks
 

What's hot (20)

Introduction to openshift
Introduction to openshiftIntroduction to openshift
Introduction to openshift
 
Docker Advanced registry usage
Docker Advanced registry usageDocker Advanced registry usage
Docker Advanced registry usage
 
並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...
並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...
並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...
 
Prometheus on EKS
Prometheus on EKSPrometheus on EKS
Prometheus on EKS
 
Kubernetes Secrets Management on Production with Demo
Kubernetes Secrets Management on Production with DemoKubernetes Secrets Management on Production with Demo
Kubernetes Secrets Management on Production with Demo
 
Dockerのディスクについて ~ファイルシステム・マウント方法など~
Dockerのディスクについて ~ファイルシステム・マウント方法など~Dockerのディスクについて ~ファイルシステム・マウント方法など~
Dockerのディスクについて ~ファイルシステム・マウント方法など~
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
 
BigtopでHadoopをビルドする(Open Source Conference 2021 Online/Spring 発表資料)
BigtopでHadoopをビルドする(Open Source Conference 2021 Online/Spring 発表資料)BigtopでHadoopをビルドする(Open Source Conference 2021 Online/Spring 発表資料)
BigtopでHadoopをビルドする(Open Source Conference 2021 Online/Spring 発表資料)
 
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
 
Internal Hive
Internal HiveInternal Hive
Internal Hive
 
OpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
OpenShift Meetup - Tokyo - Service Mesh and Serverless OverviewOpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
OpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
 
OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David...
OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David...OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David...
OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David...
 
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioningLeveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
 
Construindo Data Lakes - Visão Prática com Hadoop e BigData
Construindo Data Lakes - Visão Prática com Hadoop e BigDataConstruindo Data Lakes - Visão Prática com Hadoop e BigData
Construindo Data Lakes - Visão Prática com Hadoop e BigData
 
What is Docker
What is DockerWhat is Docker
What is Docker
 
A brief study on Kubernetes and its components
A brief study on Kubernetes and its componentsA brief study on Kubernetes and its components
A brief study on Kubernetes and its components
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
 
[OpenStack Days Korea 2016] Track2 - 아리스타 OpenStack 연동 및 CloudVision 솔루션 소개
[OpenStack Days Korea 2016] Track2 - 아리스타 OpenStack 연동 및 CloudVision 솔루션 소개[OpenStack Days Korea 2016] Track2 - 아리스타 OpenStack 연동 및 CloudVision 솔루션 소개
[OpenStack Days Korea 2016] Track2 - 아리스타 OpenStack 연동 및 CloudVision 솔루션 소개
 
OpenShift Overview
OpenShift OverviewOpenShift Overview
OpenShift Overview
 
From HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark ClustersFrom HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark Clusters
 

Similar to Leveraging docker for hadoop build automation and big data stack provisioning

How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
Evans Ye
 
How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
Evans Ye
 
State of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache BigtopState of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache Bigtop
Ganesh Raju
 
Free GitOps Workshop
Free GitOps WorkshopFree GitOps Workshop
Free GitOps Workshop
Weaveworks
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
Wei Ting Chen
 
Road to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopRoad to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache Hop
Neo4j
 
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache ApexMaking sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Apache Apex
 
Galera on kubernetes_no_video
Galera on kubernetes_no_videoGalera on kubernetes_no_video
Galera on kubernetes_no_video
Patrick Galbraith
 
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
Cloud Native NoVA
 
Jfokus_Bringing the cloud back down to earth.pptx
Jfokus_Bringing the cloud back down to earth.pptxJfokus_Bringing the cloud back down to earth.pptx
Jfokus_Bringing the cloud back down to earth.pptx
Grace Jansen
 
Fluo CICD OpenStack Summit
Fluo CICD OpenStack SummitFluo CICD OpenStack Summit
Fluo CICD OpenStack Summit
Miguel Zuniga
 
PaaSTA: Running applications at Yelp
PaaSTA: Running applications at YelpPaaSTA: Running applications at Yelp
PaaSTA: Running applications at Yelp
Nathan Handler
 
Detailed Introduction To Docker
Detailed Introduction To DockerDetailed Introduction To Docker
Detailed Introduction To Docker
nklmish
 
FooConf23_Bringing the cloud back down to earth.pptx
FooConf23_Bringing the cloud back down to earth.pptxFooConf23_Bringing the cloud back down to earth.pptx
FooConf23_Bringing the cloud back down to earth.pptx
Grace Jansen
 
Robust Network Security and Observability with GitOps and Cilium
Robust Network Security and Observability with GitOps and CiliumRobust Network Security and Observability with GitOps and Cilium
Robust Network Security and Observability with GitOps and Cilium
Weaveworks
 
Commit to excellence - Java in containers
Commit to excellence - Java in containersCommit to excellence - Java in containers
Commit to excellence - Java in containers
Red Hat Developers
 
Node.js what's next (Index 2018)
Node.js what's next (Index 2018)Node.js what's next (Index 2018)
Node.js what's next (Index 2018)
Gibson Fahnestock
 
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
Ambassador Labs
 
DevOps: Arquitectura, Estrategia y Modelo
DevOps: Arquitectura, Estrategia y ModeloDevOps: Arquitectura, Estrategia y Modelo
DevOps: Arquitectura, Estrategia y Modelo
SUSE España
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Fwdays
 

Similar to Leveraging docker for hadoop build automation and big data stack provisioning (20)

How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
 
How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
 
State of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache BigtopState of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache Bigtop
 
Free GitOps Workshop
Free GitOps WorkshopFree GitOps Workshop
Free GitOps Workshop
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
 
Road to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopRoad to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache Hop
 
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache ApexMaking sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
 
Galera on kubernetes_no_video
Galera on kubernetes_no_videoGalera on kubernetes_no_video
Galera on kubernetes_no_video
 
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
 
Jfokus_Bringing the cloud back down to earth.pptx
Jfokus_Bringing the cloud back down to earth.pptxJfokus_Bringing the cloud back down to earth.pptx
Jfokus_Bringing the cloud back down to earth.pptx
 
Fluo CICD OpenStack Summit
Fluo CICD OpenStack SummitFluo CICD OpenStack Summit
Fluo CICD OpenStack Summit
 
PaaSTA: Running applications at Yelp
PaaSTA: Running applications at YelpPaaSTA: Running applications at Yelp
PaaSTA: Running applications at Yelp
 
Detailed Introduction To Docker
Detailed Introduction To DockerDetailed Introduction To Docker
Detailed Introduction To Docker
 
FooConf23_Bringing the cloud back down to earth.pptx
FooConf23_Bringing the cloud back down to earth.pptxFooConf23_Bringing the cloud back down to earth.pptx
FooConf23_Bringing the cloud back down to earth.pptx
 
Robust Network Security and Observability with GitOps and Cilium
Robust Network Security and Observability with GitOps and CiliumRobust Network Security and Observability with GitOps and Cilium
Robust Network Security and Observability with GitOps and Cilium
 
Commit to excellence - Java in containers
Commit to excellence - Java in containersCommit to excellence - Java in containers
Commit to excellence - Java in containers
 
Node.js what's next (Index 2018)
Node.js what's next (Index 2018)Node.js what's next (Index 2018)
Node.js what's next (Index 2018)
 
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
 
DevOps: Arquitectura, Estrategia y Modelo
DevOps: Arquitectura, Estrategia y ModeloDevOps: Arquitectura, Estrategia y Modelo
DevOps: Arquitectura, Estrategia y Modelo
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
 

More from Evans Ye

Join ASF to Unlock Full Possibilities of Your Professional Career.pdf
Join ASF to Unlock Full Possibilities of Your Professional Career.pdfJoin ASF to Unlock Full Possibilities of Your Professional Career.pdf
Join ASF to Unlock Full Possibilities of Your Professional Career.pdf
Evans Ye
 
非常人走非常路:參與ASF打世界杯比賽
非常人走非常路:參與ASF打世界杯比賽非常人走非常路:參與ASF打世界杯比賽
非常人走非常路:參與ASF打世界杯比賽
Evans Ye
 
TensorFlow on Spark: A Deep Dive into Distributed Deep Learning
TensorFlow on Spark: A Deep Dive into Distributed Deep LearningTensorFlow on Spark: A Deep Dive into Distributed Deep Learning
TensorFlow on Spark: A Deep Dive into Distributed Deep Learning
Evans Ye
 
2017 big data landscape and cutting edge innovations public
2017 big data landscape and cutting edge innovations public2017 big data landscape and cutting edge innovations public
2017 big data landscape and cutting edge innovations public
Evans Ye
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
Evans Ye
 
The Apache Way: A Proven Way Toward Success
The Apache Way: A Proven Way Toward SuccessThe Apache Way: A Proven Way Toward Success
The Apache Way: A Proven Way Toward Success
Evans Ye
 
The Apache Way
The Apache WayThe Apache Way
The Apache Way
Evans Ye
 
Using the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data ProductUsing the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data Product
Evans Ye
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
Evans Ye
 
BigTop vm and docker provisioner
BigTop vm and docker provisionerBigTop vm and docker provisioner
BigTop vm and docker provisioner
Evans Ye
 
Docker workshop
Docker workshopDocker workshop
Docker workshopEvans Ye
 
Fits docker into devops
Fits docker into devopsFits docker into devops
Fits docker into devopsEvans Ye
 
Getting involved in world class software engineering tips and tricks to join ...
Getting involved in world class software engineering tips and tricks to join ...Getting involved in world class software engineering tips and tricks to join ...
Getting involved in world class software engineering tips and tricks to join ...
Evans Ye
 
Deep dive into enterprise data lake through Impala
Deep dive into enterprise data lake through ImpalaDeep dive into enterprise data lake through Impala
Deep dive into enterprise data lake through Impala
Evans Ye
 
How we lose etu hadoop competition
How we lose etu hadoop competitionHow we lose etu hadoop competition
How we lose etu hadoop competition
Evans Ye
 
Network Traffic Search using Apache HBase
Network Traffic Search using Apache HBaseNetwork Traffic Search using Apache HBase
Network Traffic Search using Apache HBaseEvans Ye
 
Building hadoop based big data environment
Building hadoop based big data environmentBuilding hadoop based big data environment
Building hadoop based big data environmentEvans Ye
 
Hdfs ha using journal nodes
Hdfs ha using journal nodesHdfs ha using journal nodes
Hdfs ha using journal nodesEvans Ye
 
How to be a star engineer
How to be a star engineerHow to be a star engineer
How to be a star engineerEvans Ye
 

More from Evans Ye (20)

Join ASF to Unlock Full Possibilities of Your Professional Career.pdf
Join ASF to Unlock Full Possibilities of Your Professional Career.pdfJoin ASF to Unlock Full Possibilities of Your Professional Career.pdf
Join ASF to Unlock Full Possibilities of Your Professional Career.pdf
 
非常人走非常路:參與ASF打世界杯比賽
非常人走非常路:參與ASF打世界杯比賽非常人走非常路:參與ASF打世界杯比賽
非常人走非常路:參與ASF打世界杯比賽
 
TensorFlow on Spark: A Deep Dive into Distributed Deep Learning
TensorFlow on Spark: A Deep Dive into Distributed Deep LearningTensorFlow on Spark: A Deep Dive into Distributed Deep Learning
TensorFlow on Spark: A Deep Dive into Distributed Deep Learning
 
2017 big data landscape and cutting edge innovations public
2017 big data landscape and cutting edge innovations public2017 big data landscape and cutting edge innovations public
2017 big data landscape and cutting edge innovations public
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
 
The Apache Way: A Proven Way Toward Success
The Apache Way: A Proven Way Toward SuccessThe Apache Way: A Proven Way Toward Success
The Apache Way: A Proven Way Toward Success
 
The Apache Way
The Apache WayThe Apache Way
The Apache Way
 
Using the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data ProductUsing the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data Product
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 
BigTop vm and docker provisioner
BigTop vm and docker provisionerBigTop vm and docker provisioner
BigTop vm and docker provisioner
 
Docker workshop
Docker workshopDocker workshop
Docker workshop
 
Fits docker into devops
Fits docker into devopsFits docker into devops
Fits docker into devops
 
Getting involved in world class software engineering tips and tricks to join ...
Getting involved in world class software engineering tips and tricks to join ...Getting involved in world class software engineering tips and tricks to join ...
Getting involved in world class software engineering tips and tricks to join ...
 
Deep dive into enterprise data lake through Impala
Deep dive into enterprise data lake through ImpalaDeep dive into enterprise data lake through Impala
Deep dive into enterprise data lake through Impala
 
How we lose etu hadoop competition
How we lose etu hadoop competitionHow we lose etu hadoop competition
How we lose etu hadoop competition
 
Network Traffic Search using Apache HBase
Network Traffic Search using Apache HBaseNetwork Traffic Search using Apache HBase
Network Traffic Search using Apache HBase
 
Vagrant
VagrantVagrant
Vagrant
 
Building hadoop based big data environment
Building hadoop based big data environmentBuilding hadoop based big data environment
Building hadoop based big data environment
 
Hdfs ha using journal nodes
Hdfs ha using journal nodesHdfs ha using journal nodes
Hdfs ha using journal nodes
 
How to be a star engineer
How to be a star engineerHow to be a star engineer
How to be a star engineer
 

Recently uploaded

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 

Recently uploaded (20)

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 

Leveraging docker for hadoop build automation and big data stack provisioning

  • 1. LEVERAGING DOCKER FOR HADOOP BUILD AUTOMATION AND 
 BIG DATA STACK PROVISIONING Evans Ye, Sr. Software Engineer DataWorks Summit San Jose 2017
  • 2. Who am I • Tech Lead @ APAC Data Team, Y! Taiwan • Building data products for E-Commerce business • PMC chair of Apache Bigtop, ASF member 2
  • 3. Outline • Quick Intro to Apache Bigtop • Docker for Bigtop Packaging • Docker for Bigtop Provisioner • Docker for Bigtop Sandbox • Releases 3
  • 4. QUICK INTRO TO 
 APACHE BIGTOP 4
  • 7. But there're some other great Hadoop ecosystem components.. 7
  • 8. How do I add patches? 8
  • 9. 9
  • 10. From source code to packages Bigtop
 Packaging 10
  • 12. Bigtop feature set Packaging Testing Deployment Virtualization for you to easily build your own Big Data Stack 12
  • 13. Community stats • 94 total contributors • Spark: 1093, Hadoop: 99, HBase: 126, Hive:115 • 5 years since 2012 • 30 Hadoop ecosystem components packaged • 5 Linux Distro., 2 archs supported 13
  • 14. DOCKER FOR 
 BIGTOP PACKAGING 14
  • 17. Bigtop Toolchain • Puppet recipes to install required libraries, build tools • To prepare a build environment: • Prerequisite : ▪ Java git clone https://github.com/apache/bigtop.git cd bigtop ./bigtop_toolchain/bin/puppetize.sh ./gradlew toolchain 17
  • 18. CI infrastructure CentOS slave Fedora slave Ubuntu slave Debian slave OpenSuSE slave 18
  • 19. CI infrastructure CentOS slave Fedora slave Ubuntu slave Debian slave OpenSuSE slave Bigtop Toolchain Bigtop Toolchain Bigtop Toolchain Bigtop Toolchain Bigtop Toolchain 19
  • 20. CI infrastructure CentOS slave Fedora slave Ubuntu slave Debian slave OpenSuSE slave Bigtop Toolchain Bigtop Toolchain Bigtop Toolchain Bigtop Toolchain Bigtop Toolchain 20
  • 21. Dockerlized CI infrastructure CentOS slave Fedora slave Ubuntu slave Debian slave OpenSuSE slave • Immutable env • Fault tolerance 21
  • 22. Dockerlized CI infrastructure CentOS slave Fedora slave Ubuntu slave Debian slave OpenSuSE slave • Immutable env • Fault tolerance 22
  • 23. • Execute shell • Bigtop CI Setup Guide How to build packages # OS=debian-8 # COMPONENT=hadoop docker run -u jenkins --rm -v `pwd`:/bigtop --workdir /bigtop bigtop/slaves:trunk-$OS bash -l -c "./gradlew allclean $COMPONENT-pkg" 23
  • 24. Bigtop packages on master https://ci.bigtop.apache.org/view/Packages/job/Bigtop-trunk-packages/ 24
  • 25. • Example: How to port Bigtop Distribution to PPC64LE? • Prepare PPC64LE docker base image • Apply Bigtop Toolchain on PPC64LE docker image • Build Bigtop packages on PPC64LE slaves image • 2016: Ported 22 out of 24 Bigtop components in 2 weeks, with only 5 patches • Credit: Amir Sanjar, IBM Extremely friendly for porting 25
  • 26. Bigtop early mission accomplished Leveraged by app providers… 26
  • 27. Get out from the Apache dome 27
  • 28. New focus and target user • Data engineers vs Distro. builders • Solution diversity: ▪ Streaming: Flink, Apex ▪ In-memory cache: Alluxio, Ignite ▪ User/developer tools: ▪ Bigtop Provisioner ▪ Bigtop Sandbox • Big data stack references • Machine learning, deep learning components 28
  • 29. DOCKER FOR 
 BIGTOP PROVISIONER 29
  • 30. Bigtop Provisioner • A tool to demonstrate full life cycle of Bigtop Packaging TestingDeploymentVirtualization Create resources Run Bigtop Puppet Run Bigtop Tests Bigtop Provisioner 30
  • 31. • We use Vagrant as an abstraction layer to support different kind of resource providers Vagrant Providers
  • 32. One click Hadoop provisioning
 (Bigtop 1.0.0) bigtop/deploy image 
 on Docker hub ./docker-hadoop.sh -c 3 puppet apply puppet apply puppet apply 32 https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+Provisioner+User+Guide
  • 33. Problems with Vagrant’s Docker Provider • Need to add vagrant public key into docker images • Too many issues with auto-created boot2docker VM • A bug for docker provider regarding provision keeps opening for 2 years ▪ Waiting for machine to boot' hangs infinitely • Can not share same code for different providers anyway • Not all the docker options supported in Vagrantfile • ^#?& slow 33
  • 34. Replaced by docker-compose 
 (Bigtop 1.2.0) ./docker-hadoop.sh -c 3 puppet apply puppet apply puppet apply 34 bigtop/deploy image 
 on Docker hub
  • 35. Advantages • No need to create customized image beforehand • Better compatibility with Docker’s native solutions • Clear, simple yaml file for orchestration settings • Supports new features such as overlay network • Leverage Swarm for multi-node cluster deployment • Fast —> better user experience 35
  • 36. • Execute shell • Bigtop CI Setup Guide How to run Docker Provisioner # See bigtop/provisioner/docker/*.yaml CONFIG=YOUR_CUSTOM_CONF.yaml # provision ./gradlew -Pconfig=${CONFIG} -Pnum_instances=1 docker-provisioner # destroy provisioned cluster ./gradlew docker-provisioner-destroy 36
  • 37. YOUR_CUSTOM_CONF.yaml example 37 docker: memory_limit: "4g" image: "bigtop/puppet:centos-7" repo: "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/ centos/7/x86_64" distro: centos components: [hdfs, yarn, mapreduce] enable_local_repo: false smoke_test_components: [hdfs, yarn, mapreduce]
  • 39. Use cases • For application developers, cluster admins, users ▪ Run a Hadoop cluster to test your code on ▪ Try & test configurations before applying to Production ▪ Play around with Bigtop Big Data Stacks • For contributors ▪ Easy to test your packaging, deployment, testing code • For Distro. builders ▪ CI matrix —> patch upstream code made easier 39
  • 41. Introducing Bigtop Sandbox • Easy way to get started • Docker images that has Bigtop stacks installed and configured • Pseudo cluster up & running w/o installation • Command-line tool for you to build your own stack 41
  • 43. Docker image layer Concrete implementation HDFS + YARN + Spark Bigtop Puppet bigtop/puppet:ubuntu-16.04 43
  • 45. site.yaml example 45 bigtop::hadoop_head_node: bigtop.example.com bigtop::bigtop_repo_uri: http://bigtop-repos.s3.amazonaws.com/ releases/1.2.0/debian/8/x86_64 hadoop::hadoop_storage_dirs: [/data/1, /data/2] hadoop_cluster_node::cluster_components: [hdfs, yarn, spark]
  • 46. How to build • Or specify your custom conf: git clone https://github.com/apache/bigtop.git cd bigtop/docker/sandbox ./build.sh -a bigtop -o ubuntu-16.04 -c "hdfs, yarn, spark" ./build.sh-a bigtop -o ubuntu-16.04 -f custom_site.yaml -t dws2017 46
  • 48. How to run docker run --name sandbox -d -p 50070:50070 -p 8088:8088 evansye/sandbox:dws2017 docker logs -f sandbox docker exec sandbox spark-example SparkPi 48
  • 49. 49
  • 50. Bigtop Provisioner Bigtop Sandbox Scalable V X Portable X V Flexibility High Medium Speed > 2 mins > 15 secs Requires Network V X Port forwarding X V 50
  • 51. Bigtop Provisioner Bigtop Sandbox Data engineers Multi-node 
 cluster testing Build/use sandboxes 
 for dev & test Ops Multi-node 
 cluster testing Single node 
 testing Contributors Test packages, puppet recipes,
 test cases Test packages, puppet recipes,
 test cases Distro. Builders Test packages, puppet recipes,
 test cases Provide Sandboxes 51
  • 52. Integration test in CI/CD pipeline Unit Test Source code Compile Build Image Integra7on test with Sandbox Sandbox Service CD pipeline with Bigtop Sandbox Docker Registry Push Image Deploy FINISHED Data 52
  • 53. Future • Production deployment using Sandbox images ▪ --net host or overlay network(SDN)? ▪ External volumes for edit logs, fsimages, etc ▪ Cluster orchestration ▪ Swarm, Kubernetes? 53
  • 55. ▪ New components: ▪ Ambari 2.5.0 ▪ GPDB 5.0.0-alpha.0
 (Greenplum) Bigtop 1.2.0 Released April, 2017 ▪ Featured upgrade: ▪ Hadoop 2.7.3 ▪ Spark 2.1.0 ▪ Kafka 0.10.1.1 ▪ HBase 1.1.3 ▪ and more 55
  • 56. • New features: ▪ Juju bigtop charms ▪ Bigtop Sandbox (alpha, recommended to try master) • Improvement: ▪ Bigtop Docker Provisioner made faster New features in Bigtop 1.2.0 56
  • 57. Juju Cloud Weather Report http://bigtop.charm.qa/ 57
  • 58. • Expected to be out late June • Hadoop 2.7.4 
 (Interested in docker container support back ported, but I'm not sure yet) • Mainly bug fixes: • Packages • Deployments • Sandbox Bigtop 1.2.1 up coming 58
  • 59. • Machine Learning and Deep Learning integration • Support aarch 64 • Enhance support set in Bigtop Puppet (not all components covered) • Extend the CI matrix coverage to Bigtop Tests • Ambari Bigtop stack integration • Provide Big data stack references Road ahead towards 1.3.0 59
  • 60. 60
  • 61. • Submit your proposal, contribute Bigtop w/ funding! • Improvements, new features, build, test, CI, etc • CFP opened June 13, 2017
 CFP closed July 14, 2017 • https://www.odpi.org/community/bigtopgrantfund ODPi Apache Bigtop Test Drive Program 61
  • 62. • Join mailing list, ask questions, suggest features, etc • Contribute (components, tutorials, docs) • Report bugs ▪ Home page: http://bigtop.apache.org/ ▪ mailing list: http://bigtop.apache.org/mail-lists.html ▪ Document: https://cwiki.apache.org/confluence/display/BIGTOP/Index ▪ Source code: https://github.com/apache/bigtop ▪ Packages: https://www.apache.org/dist/bigtop/bigtop-1.2.0/repos/ ▪ JIRA: https://issues.apache.org/jira/browse/BIGTOP Reference 62