LEVERAGING DOCKER FOR HADOOP BUILD AUTOMATION AND BIG DATA STACK PROVISIONING
Evans Ye, Sr. Software Engineer
DataWorks Summit San Jose 2017
Who am I
• Tech Lead @ APAC Data Team, Y! Taiwan
• Building data products for E-Commerce business
• PMC chair of Apache Bigtop, ASF member
Outline
• Quick Intro to Apache Bigtop
• Docker for Bigtop Packaging
• Docker for Bigtop Provisioner
• Docker for Bigtop Sandbox
• Releases
QUICK INTRO TO APACHE BIGTOP
Linux Distributions
Hadoop Distributions
But there are other great Hadoop ecosystem components...
How do I add patches?
From source code to packages
Bigtop Packaging
Supported components
Bigtop feature set
Packaging, Testing, Deployment, Virtualization
for you to easily build your own Big Data Stack
Community stats
• 94 total contributors
• Spark: 1093, Hadoop: 99, HBase: 126, Hive: 115
• 5 years since 2012
• 30 Hadoop ecosystem components packaged
• 5 Linux distros, 2 architectures supported
DOCKER FOR BIGTOP PACKAGING
Preparing build environment
… Seriously?
Bigtop Toolchain
• Puppet recipes to install required libraries and build tools
• Prerequisite:
▪ Java
• To prepare a build environment:
git clone https://github.com/apache/bigtop.git
cd bigtop
./bigtop_toolchain/bin/puppetize.sh
./gradlew toolchain
CI infrastructure
CentOS slave + Bigtop Toolchain
Fedora slave + Bigtop Toolchain
Ubuntu slave + Bigtop Toolchain
Debian slave + Bigtop Toolchain
OpenSuSE slave + Bigtop Toolchain
Dockerized CI infrastructure
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
• Immutable env
• Fault tolerance
How to build packages
• Execute shell
• Bigtop CI Setup Guide
# OS=debian-8
# COMPONENT=hadoop
docker run -u jenkins --rm \
  -v `pwd`:/bigtop --workdir /bigtop \
  bigtop/slaves:trunk-$OS \
  bash -l -c "./gradlew allclean $COMPONENT-pkg"
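The single build command above can be looped over several OS images to mimic a small CI matrix. This is a minimal sketch, assuming the image tags follow the bigtop/slaves:trunk-$OS pattern shown on the slide. The OS and component lists, the helper names build_cmd and run_matrix, and the DRY_RUN switch are illustrative and not part of Bigtop.

```shell
#!/usr/bin/env bash
# Sketch: print (or run) the package build for each OS/component pair.
# OS_LIST, COMPONENTS, build_cmd, run_matrix, and DRY_RUN are illustrative names.
OS_LIST="debian-8 centos-7"
COMPONENTS="hadoop spark"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

# Emit the docker command for one OS/component pair.
build_cmd() {
  local os="$1" component="$2"
  echo "docker run -u jenkins --rm -v $PWD:/bigtop --workdir /bigtop" \
       "bigtop/slaves:trunk-$os bash -l -c './gradlew allclean $component-pkg'"
}

# Walk the OS x component matrix.
run_matrix() {
  local os component cmd
  for os in $OS_LIST; do
    for component in $COMPONENTS; do
      cmd="$(build_cmd "$os" "$component")"
      if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
      else
        eval "$cmd"
      fi
    done
  done
}

run_matrix
```

With DRY_RUN=1 the script only prints one docker command per pair, which makes it easy to review before running it from the root of a Bigtop checkout with DRY_RUN=0.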
Bigtop packages on master
https://ci.bigtop.apache.org/view/Packages/job/Bigtop-trunk-packages/
Extremely friendly for porting
• Example: How to port the Bigtop distribution to PPC64LE?
• Prepare a PPC64LE Docker base image
• Apply Bigtop Toolchain on the PPC64LE Docker image
• Build Bigtop packages on the PPC64LE slaves image
• 2016: Ported 22 out of 24 Bigtop components in 2 weeks, with only 5 patches
• Credit: Amir Sanjar, IBM
Bigtop early mission accomplished
Leveraged by app providers…
Get out from the Apache dome
New focus and target user
• Data engineers vs Distro. builders
• Solution diversity:
▪ Streaming: Flink, Apex
▪ In-memory cache: Alluxio, Ignite
▪ User/developer tools:
▪ Bigtop Provisioner
▪ Bigtop Sandbox
• Big data stack references
• Machine learning, deep learning components
DOCKER FOR BIGTOP PROVISIONER
Bigtop Provisioner
• A tool to demonstrate the full life cycle of Bigtop
Packaging, Testing, Deployment, Virtualization
Bigtop Provisioner: Create resources, Run Bigtop Puppet, Run Bigtop Tests
One click Hadoop provisioning (Bigtop 1.0.0)
• We use Vagrant as an abstraction layer to support different kinds of resource providers
Vagrant Providers
bigtop/deploy image on Docker Hub
./docker-hadoop.sh -c 3
puppet apply (x3, one per container)
https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+Provisioner+User+Guide
Problems with Vagrant's Docker Provider
• Need to add the Vagrant public key into Docker images
• Too many issues with the auto-created boot2docker VM
• A provisioning bug in the Docker provider has stayed open for 2 years
▪ 'Waiting for machine to boot' hangs indefinitely
• Cannot share the same code across different providers anyway
• Not all Docker options are supported in the Vagrantfile
• ^#?& slow
Replaced by docker-compose (Bigtop 1.2.0)
./docker-hadoop.sh -c 3
puppet apply (x3, one per container)
bigtop/deploy image on Docker Hub
Advantages
• No need to create a customized image beforehand
• Better compatibility with Docker's native solutions
• Clear, simple YAML file for orchestration settings
• Supports new features such as overlay networks
• Leverages Swarm for multi-node cluster deployment
• Fast -> better user experience
How to run Docker Provisioner
• Execute shell
• Bigtop CI Setup Guide
# See bigtop/provisioner/docker/*.yaml
CONFIG=YOUR_CUSTOM_CONF.yaml
# provision
./gradlew -Pconfig=${CONFIG} -Pnum_instances=1 \
  docker-provisioner
# destroy provisioned cluster
./gradlew docker-provisioner-destroy
YOUR_CUSTOM_CONF.yaml example
docker:
  memory_limit: "4g"
  image: "bigtop/puppet:centos-7"
repo: "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/centos/7/x86_64"
distro: centos
components: [hdfs, yarn, mapreduce]
enable_local_repo: false
smoke_test_components: [hdfs, yarn, mapreduce]
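A custom config like the one above can be generated from a script before provisioning. The sketch below is only an illustration: the file name my_conf.yaml, the swap of mapreduce for spark, and the choice of 3 instances are assumptions, and where the file has to live relative to the Bigtop checkout is not stated on the slide.

```shell
#!/usr/bin/env bash
# Sketch: write a custom provisioner config, then show the provisioning command.
# my_conf.yaml and the "spark" component choice are illustrative assumptions.
CONFIG="${CONFIG:-my_conf.yaml}"

cat > "$CONFIG" <<'EOF'
docker:
  memory_limit: "4g"
  image: "bigtop/puppet:centos-7"
repo: "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/centos/7/x86_64"
distro: centos
components: [hdfs, yarn, spark]
enable_local_repo: false
smoke_test_components: [hdfs, yarn, spark]
EOF

# Printed rather than executed, so the sketch is safe to run outside a Bigtop checkout.
echo "./gradlew -Pconfig=$CONFIG -Pnum_instances=3 docker-provisioner"
```

Keeping the components and smoke_test_components lists in sync means every deployed component also gets a smoke test.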
Visibility for deployments
Use cases
• For application developers, cluster admins, users
▪ Run a Hadoop cluster to test your code on
▪ Try & test configurations before applying to Production
▪ Play around with Bigtop Big Data Stacks
• For contributors
▪ Easy to test your packaging, deployment, testing code
• For Distro. builders
▪ CI matrix -> patching upstream code made easier
DOCKER FOR BIGTOP SANDBOX
Introducing Bigtop Sandbox
• Easy way to get started
• Docker images that have Bigtop stacks installed and configured
• Pseudo cluster up & running w/o installation
• Command-line tool for you to build your own stack
Docker image layer: Interface
Customized big data stack
Deployment & management tool
Base image (OS)
Docker image layer: Concrete implementation
HDFS + YARN + Spark
Bigtop Puppet
bigtop/puppet:ubuntu-16.04
Building images
Ubuntu 16.04
Bigtop Puppet
HDFS + YARN + Spark
+ site.yaml
$ puppet apply
site.yaml example
bigtop::hadoop_head_node: bigtop.example.com
bigtop::bigtop_repo_uri: http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/debian/8/x86_64
hadoop::hadoop_storage_dirs: [/data/1, /data/2]
hadoop_cluster_node::cluster_components: [hdfs, yarn, spark]
How to build
git clone https://github.com/apache/bigtop.git
cd bigtop/docker/sandbox
./build.sh -a bigtop -o ubuntu-16.04 \
  -c "hdfs, yarn, spark"
• Or specify your custom conf:
./build.sh -a bigtop -o ubuntu-16.04 \
  -f custom_site.yaml -t dws2017
Running images
HDFS + YARN + Spark
$ puppet apply
How to run
docker run --name sandbox -d \
  -p 50070:50070 -p 8088:8088 \
  evansye/sandbox:dws2017
docker logs -f sandbox
docker exec sandbox spark-example SparkPi
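Because the sandbox runs puppet apply at startup, its services are not available the instant docker run returns. One way to handle that, sketched here under assumptions, is to poll a probe command until it succeeds. The helper wait_until is hypothetical and not part of Bigtop or Docker; the probe against port 50070 assumes the NameNode web UI mapping from the command above.

```shell
#!/usr/bin/env bash
# Sketch: retry a probe command until it succeeds or the attempts run out.
# wait_until is a hypothetical helper, not part of Bigtop or Docker.
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the probe quietly; success ends the wait.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (not executed here): probe the NameNode web UI published by
# -p 50070:50070 up to 30 times, 2 seconds apart, then run the Spark example.
# wait_until 30 2 curl -sf http://localhost:50070/ && \
#   docker exec sandbox spark-example SparkPi
```

The probe is passed as ordinary arguments, so the same helper works with curl, docker exec, or any other check.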
                    Bigtop Provisioner    Bigtop Sandbox
Scalable            V                     X
Portable            X                     V
Flexibility         High                  Medium
Speed               > 2 mins              > 15 secs
Requires Network    V                     X
Port forwarding     X                     V
                    Bigtop Provisioner                          Bigtop Sandbox
Data engineers      Multi-node cluster testing                  Build/use sandboxes for dev & test
Ops                 Multi-node cluster testing                  Single node testing
Contributors        Test packages, puppet recipes, test cases   Test packages, puppet recipes, test cases
Distro. builders    Test packages, puppet recipes, test cases   Provide Sandboxes
Integration test in CI/CD pipeline
[Figure: CD pipeline with Bigtop Sandbox. Labels: Source code, Unit Test, Compile, Build Image, Integration test with Sandbox, Sandbox Service, Data, Docker Registry, Push Image, Deploy, FINISHED]
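The integration-test stage of such a pipeline could look roughly like the following. This is a sketch under assumptions rather than the pipeline from the talk: the helper names run and integration_stage, the container name sandbox-it, and the DRY_RUN switch are illustrative, and the Spark example stands in for a real test suite.

```shell
#!/usr/bin/env bash
# Sketch of an integration-test stage that uses a sandbox container.
# run, integration_stage, IMAGE, NAME, and DRY_RUN are illustrative names.
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands
IMAGE="${IMAGE:-evansye/sandbox:dws2017}"
NAME="${NAME:-sandbox-it}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

integration_stage() {
  run docker run --name "$NAME" -d "$IMAGE"
  # Placeholder test: the Spark example from the "How to run" slide.
  # A real pipeline would first wait for the sandbox services to come up.
  run docker exec "$NAME" spark-example SparkPi
  local status=$?
  # Always remove the container so the next pipeline run starts clean.
  run docker rm -f "$NAME"
  return "$status"
}

integration_stage
```

Returning the test's exit status after cleanup lets the CI server fail the build while still leaving no container behind.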
Future
• Production deployment using Sandbox images
▪ --net host or overlay network (SDN)?
▪ External volumes for edit logs, fsimages, etc.
▪ Cluster orchestration
▪ Swarm, Kubernetes?
RELEASES
Bigtop 1.2.0, released April 2017
▪ New components:
▪ Ambari 2.5.0
▪ GPDB 5.0.0-alpha.0 (Greenplum)
▪ Featured upgrade:
▪ Hadoop 2.7.3
▪ Spark 2.1.0
▪ Kafka 0.10.1.1
▪ HBase 1.1.3
▪ and more
New features in Bigtop 1.2.0
• New features:
▪ Juju Bigtop charms
▪ Bigtop Sandbox (alpha, recommended to try master)
• Improvement:
▪ Bigtop Docker Provisioner made faster
Juju Cloud Weather Report
http://bigtop.charm.qa/
Bigtop 1.2.1 upcoming
• Expected to be out late June
• Hadoop 2.7.4 (interested in back-porting Docker container support, but not sure yet)
• Mainly bug fixes:
• Packages
• Deployments
• Sandbox
Road ahead towards 1.3.0
• Machine Learning and Deep Learning integration
• Support AArch64
• Enhance the support set in Bigtop Puppet (not all components covered)
• Extend the CI matrix coverage to Bigtop Tests
• Ambari Bigtop stack integration
• Provide Big Data stack references
ODPi Apache Bigtop Test Drive Program
• Submit your proposal, contribute to Bigtop w/ funding!
• Improvements, new features, build, test, CI, etc.
• CFP opened June 13, 2017; CFP closed July 14, 2017
• https://www.odpi.org/community/bigtopgrantfund
Reference
• Join the mailing list, ask questions, suggest features, etc.
• Contribute (components, tutorials, docs)
• Report bugs
▪ Home page: http://bigtop.apache.org/
▪ Mailing list: http://bigtop.apache.org/mail-lists.html
▪ Documentation: https://cwiki.apache.org/confluence/display/BIGTOP/Index
▪ Source code: https://github.com/apache/bigtop
▪ Packages: https://www.apache.org/dist/bigtop/bigtop-1.2.0/repos/
▪ JIRA: https://issues.apache.org/jira/browse/BIGTOP
Questions?

More Related Content

What's hot

Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
GetInData
 
Ozone: An Object Store in HDFS
Ozone: An Object Store in HDFSOzone: An Object Store in HDFS
Ozone: An Object Store in HDFS
DataWorks Summit
 
Apache Kylin Meetup: Berlin - With OLX Group
Apache Kylin Meetup: Berlin - With OLX GroupApache Kylin Meetup: Berlin - With OLX Group
Apache Kylin Meetup: Berlin - With OLX Group
Tyler Wishnoff
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
 
File Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and ParquetFile Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and Parquet
DataWorks Summit/Hadoop Summit
 
Building IAM for OpenStack
Building IAM for OpenStackBuilding IAM for OpenStack
Building IAM for OpenStack
Steve Martinelli
 
Seamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseSeamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive Warehouse
DataWorks Summit
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
DataWorks Summit
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best Practices
DataWorks Summit
 
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Databricks
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Kai Wähner
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
DataWorks Summit
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
Owen O'Malley
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
DataWorks Summit/Hadoop Summit
 
Achieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache ArrowAchieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache Arrow
Neo4j
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Toshihiro Suzuki
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
HPCC Systems
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
DataWorks Summit
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
Timothy Spann
 

What's hot (20)

Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
 
Ozone: An Object Store in HDFS
Ozone: An Object Store in HDFSOzone: An Object Store in HDFS
Ozone: An Object Store in HDFS
 
Apache Kylin Meetup: Berlin - With OLX Group
Apache Kylin Meetup: Berlin - With OLX GroupApache Kylin Meetup: Berlin - With OLX Group
Apache Kylin Meetup: Berlin - With OLX Group
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
File Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and ParquetFile Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and Parquet
 
Building IAM for OpenStack
Building IAM for OpenStackBuilding IAM for OpenStack
Building IAM for OpenStack
 
Seamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseSeamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive Warehouse
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best Practices
 
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
 
Achieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache ArrowAchieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache Arrow
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
 

Similar to Leveraging Docker for Hadoop build automation and Big Data stack provisioning

How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
Evans Ye
 
How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
Evans Ye
 
State of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache BigtopState of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache Bigtop
Ganesh Raju
 
Free GitOps Workshop
Free GitOps WorkshopFree GitOps Workshop
Free GitOps Workshop
Weaveworks
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
Wei Ting Chen
 
Road to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopRoad to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache Hop
Neo4j
 
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache ApexMaking sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Apache Apex
 
Galera on kubernetes_no_video
Galera on kubernetes_no_videoGalera on kubernetes_no_video
Galera on kubernetes_no_video
Patrick Galbraith
 
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platformApache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
rhatr
 
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
Cloud Native NoVA
 
Jfokus_Bringing the cloud back down to earth.pptx
Jfokus_Bringing the cloud back down to earth.pptxJfokus_Bringing the cloud back down to earth.pptx
Jfokus_Bringing the cloud back down to earth.pptx
Grace Jansen
 
Fluo CICD OpenStack Summit
Fluo CICD OpenStack SummitFluo CICD OpenStack Summit
Fluo CICD OpenStack Summit
Miguel Zuniga
 
PaaSTA: Running applications at Yelp
PaaSTA: Running applications at YelpPaaSTA: Running applications at Yelp
PaaSTA: Running applications at Yelp
Nathan Handler
 
Detailed Introduction To Docker
Detailed Introduction To DockerDetailed Introduction To Docker
Detailed Introduction To Docker
nklmish
 
FooConf23_Bringing the cloud back down to earth.pptx
FooConf23_Bringing the cloud back down to earth.pptxFooConf23_Bringing the cloud back down to earth.pptx
FooConf23_Bringing the cloud back down to earth.pptx
Grace Jansen
 
Robust Network Security and Observability with GitOps and Cilium
Robust Network Security and Observability with GitOps and CiliumRobust Network Security and Observability with GitOps and Cilium
Robust Network Security and Observability with GitOps and Cilium
Weaveworks
 
Commit to excellence - Java in containers
Commit to excellence - Java in containersCommit to excellence - Java in containers
Commit to excellence - Java in containers
Red Hat Developers
 
Node.js what's next (Index 2018)
Node.js what's next (Index 2018)Node.js what's next (Index 2018)
Node.js what's next (Index 2018)
Gibson Fahnestock
 
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
Ambassador Labs
 
DevOps: Arquitectura, Estrategia y Modelo
DevOps: Arquitectura, Estrategia y ModeloDevOps: Arquitectura, Estrategia y Modelo
DevOps: Arquitectura, Estrategia y Modelo
SUSE España
 

Similar to Leveraging Docker for Hadoop build automation and Big Data stack provisioning (20)

How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
 
How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
 
State of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache BigtopState of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache Bigtop
 
Free GitOps Workshop
Free GitOps WorkshopFree GitOps Workshop
Free GitOps Workshop
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
 
Road to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopRoad to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache Hop
 
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache ApexMaking sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
 
Galera on kubernetes_no_video
Galera on kubernetes_no_videoGalera on kubernetes_no_video
Galera on kubernetes_no_video
 
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platformApache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
 
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
 
Jfokus_Bringing the cloud back down to earth.pptx
Jfokus_Bringing the cloud back down to earth.pptxJfokus_Bringing the cloud back down to earth.pptx
Jfokus_Bringing the cloud back down to earth.pptx
 
Fluo CICD OpenStack Summit
Fluo CICD OpenStack SummitFluo CICD OpenStack Summit
Fluo CICD OpenStack Summit
 
PaaSTA: Running applications at Yelp
PaaSTA: Running applications at YelpPaaSTA: Running applications at Yelp
PaaSTA: Running applications at Yelp
 
Detailed Introduction To Docker
Detailed Introduction To DockerDetailed Introduction To Docker
Detailed Introduction To Docker
 
FooConf23_Bringing the cloud back down to earth.pptx
FooConf23_Bringing the cloud back down to earth.pptxFooConf23_Bringing the cloud back down to earth.pptx
FooConf23_Bringing the cloud back down to earth.pptx
 
Robust Network Security and Observability with GitOps and Cilium
Robust Network Security and Observability with GitOps and CiliumRobust Network Security and Observability with GitOps and Cilium
Robust Network Security and Observability with GitOps and Cilium
 
Commit to excellence - Java in containers
Commit to excellence - Java in containersCommit to excellence - Java in containers
Commit to excellence - Java in containers
 
Node.js what's next (Index 2018)
Node.js what's next (Index 2018)Node.js what's next (Index 2018)
Node.js what's next (Index 2018)
 
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
O'Reilly Software Architecture Conference London 2017: Building Resilient Mic...
 
DevOps: Arquitectura, Estrategia y Modelo
DevOps: Arquitectura, Estrategia y ModeloDevOps: Arquitectura, Estrategia y Modelo
DevOps: Arquitectura, Estrategia y Modelo
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

Leveraging Docker for Hadoop build automation and Big Data stack provisioning

  • 1. LEVERAGING DOCKER FOR HADOOP BUILD AUTOMATION AND BIG DATA STACK PROVISIONING • Evans Ye, Sr. Software Engineer • DataWorks Summit San Jose 2017
  • 2. Who am I • Tech Lead @ APAC Data Team, Y! Taiwan • Building data products for E-Commerce business • PMC chair of Apache Bigtop, ASF member
  • 3. Outline • Quick Intro to Apache Bigtop • Docker for Bigtop Packaging • Docker for Bigtop Provisioner • Docker for Bigtop Sandbox • Releases
  • 4. QUICK INTRO TO APACHE BIGTOP
  • 7. But there are some other great Hadoop ecosystem components...
  • 8. How do I add patches?
  • 10. From source code to packages • Bigtop Packaging
  • 12. Bigtop feature set • Packaging • Testing • Deployment • Virtualization • For you to easily build your own Big Data Stack
  • 13. Community stats • 94 total contributors • Spark: 1093, Hadoop: 99, HBase: 126, Hive: 115 • 5 years since 2012 • 30 Hadoop ecosystem components packaged • 5 Linux distributions, 2 architectures supported
  • 14. DOCKER FOR BIGTOP PACKAGING
  • 17. Bigtop Toolchain • Puppet recipes to install required libraries and build tools • Prerequisite: Java • To prepare a build environment:
        git clone https://github.com/apache/bigtop.git
        cd bigtop
        ./bigtop_toolchain/bin/puppetize.sh
        ./gradlew toolchain
  • 18. CI infrastructure • CentOS slave • Fedora slave • Ubuntu slave • Debian slave • OpenSuSE slave
  • 19. CI infrastructure • Bigtop Toolchain applied on every slave: CentOS, Fedora, Ubuntu, Debian, OpenSuSE
  • 21. Dockerized CI infrastructure • CentOS slave • Fedora slave • Ubuntu slave • Debian slave • OpenSuSE slave • Immutable env • Fault tolerance
  • 23. How to build packages • Execute shell • Bigtop CI Setup Guide
        # OS=debian-8
        # COMPONENT=hadoop
        docker run -u jenkins --rm -v `pwd`:/bigtop --workdir /bigtop bigtop/slaves:trunk-$OS bash -l -c "./gradlew allclean $COMPONENT-pkg"
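The single build command above extends naturally to a CI matrix of operating systems and components. The following is a minimal dry-run sketch: it only prints the commands, so it runs without Docker. The helper names `build_cmd` and `build_matrix` are hypothetical, and the `centos-7` tag and `spark` component are assumptions used for illustration; only `debian-8` and `hadoop` appear on the slide.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) one package build per OS/component pair.
set -eu

# Print the docker command from the slide for one OS tag and one component.
# "$PWD" is printed literally so the command expands it when actually run.
build_cmd() {
  local os="$1" component="$2"
  printf 'docker run -u jenkins --rm -v "$PWD":/bigtop --workdir /bigtop bigtop/slaves:trunk-%s bash -l -c "./gradlew allclean %s-pkg"\n' "$os" "$component"
}

# Print the full matrix.
# Arguments: space-separated OS tags, space-separated component names.
build_matrix() {
  local os component
  for os in $1; do
    for component in $2; do
      build_cmd "$os" "$component"
    done
  done
}

# Assumed values for illustration: centos-7 and spark are not on the slide.
build_matrix "debian-8 centos-7" "hadoop spark"
```

Piping the output to `bash` from the root of a Bigtop checkout would run the builds one after another; a CI server would more likely schedule each printed line as its own job.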
  • 24. Bigtop packages on master • https://ci.bigtop.apache.org/view/Packages/job/Bigtop-trunk-packages/
  • 25. Extremely friendly for porting • Example: How to port Bigtop Distribution to PPC64LE? ▪ Prepare PPC64LE docker base image ▪ Apply Bigtop Toolchain on PPC64LE docker image ▪ Build Bigtop packages on PPC64LE slaves image • 2016: Ported 22 out of 24 Bigtop components in 2 weeks, with only 5 patches • Credit: Amir Sanjar, IBM
  • 26. Bigtop early mission accomplished • Leveraged by app providers…
  • 27. Get out from the Apache dome
  • 28. New focus and target user • Data engineers vs Distro. builders • Solution diversity: ▪ Streaming: Flink, Apex ▪ In-memory cache: Alluxio, Ignite ▪ User/developer tools: ▪ Bigtop Provisioner ▪ Bigtop Sandbox • Big data stack references • Machine learning, deep learning components
  • 29. DOCKER FOR BIGTOP PROVISIONER
  • 30. Bigtop Provisioner • A tool to demonstrate the full life cycle of Bigtop: Packaging, Testing, Deployment, Virtualization • Bigtop Provisioner steps: Create resources, Run Bigtop Puppet, Run Bigtop Tests
  • 31. Vagrant Providers • We use Vagrant as an abstraction layer to support different kinds of resource providers
  • 32. One click Hadoop provisioning (Bigtop 1.0.0) • bigtop/deploy image on Docker Hub • ./docker-hadoop.sh -c 3 • puppet apply runs on each of the 3 containers • https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+Provisioner+User+Guide
  • 33. Problems with Vagrant’s Docker Provider • Need to add the vagrant public key into docker images • Too many issues with the auto-created boot2docker VM • A provisioning bug in the docker provider has stayed open for 2 years ▪ 'Waiting for machine to boot' hangs indefinitely • Cannot share the same code across different providers anyway • Not all docker options are supported in Vagrantfile • ^#?& slow
  • 34. Replaced by docker-compose (Bigtop 1.2.0) • bigtop/deploy image on Docker Hub • ./docker-hadoop.sh -c 3 • puppet apply runs on each of the 3 containers
  • 35. Advantages • No need to create customized image beforehand • Better compatibility with Docker’s native solutions • Clear, simple yaml file for orchestration settings • Supports new features such as overlay network • Leverage Swarm for multi-node cluster deployment • Fast -> better user experience
  • 36. How to run Docker Provisioner • Execute shell • Bigtop CI Setup Guide
        # See bigtop/provisioner/docker/*.yaml
        CONFIG=YOUR_CUSTOM_CONF.yaml
        # provision
        ./gradlew -Pconfig=${CONFIG} -Pnum_instances=1 docker-provisioner
        # destroy provisioned cluster
        ./gradlew docker-provisioner-destroy
  • 37. YOUR_CUSTOM_CONF.yaml example
        docker:
          memory_limit: "4g"
          image: "bigtop/puppet:centos-7"
        repo: "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/centos/7/x86_64"
        distro: centos
        components: [hdfs, yarn, mapreduce]
        enable_local_repo: false
        smoke_test_components: [hdfs, yarn, mapreduce]
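To switch stacks quickly, the config file can be generated from a few parameters instead of edited by hand. This is a minimal sketch under assumptions: `write_config` is a hypothetical helper, the nesting of `memory_limit` and `image` under `docker:` is inferred from the flattened slide text, and the provisioning command is printed rather than executed so the sketch runs without a Bigtop checkout.

```shell
#!/usr/bin/env bash
set -eu

# Write a provisioner config using the keys shown on the slide.
# Arguments: output file, distro, image, repo URL, comma-separated components.
write_config() {
  local out="$1" distro="$2" image="$3" repo="$4" components="$5"
  cat > "$out" <<EOF
docker:
  memory_limit: "4g"
  image: "$image"

repo: "$repo"
distro: $distro
components: [$components]
enable_local_repo: false
smoke_test_components: [$components]
EOF
}

# A temporary file keeps the sketch from touching a real checkout.
conf="$(mktemp)"
write_config "$conf" centos "bigtop/puppet:centos-7" \
  "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/centos/7/x86_64" \
  "hdfs, yarn, mapreduce"
cat "$conf"

# The provisioning command from the deck, printed only.
echo "./gradlew -Pconfig=$conf -Pnum_instances=1 docker-provisioner"
```

The same component list is reused for `smoke_test_components` here, mirroring the slide; a real setup might smoke-test only a subset.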
  • 39. Use cases • For application developers, cluster admins, users ▪ Run a Hadoop cluster to test your code on ▪ Try & test configurations before applying to Production ▪ Play around with Bigtop Big Data Stacks • For contributors ▪ Easy to test your packaging, deployment, testing code • For Distro. builders ▪ CI matrix -> patching upstream code made easier
  • 41. Introducing Bigtop Sandbox • Easy way to get started • Docker images that have Bigtop stacks installed and configured • Pseudo cluster up & running w/o installation • Command-line tool for you to build your own stack
  • 43. Docker image layer • Concrete implementation: HDFS + YARN + Spark, Bigtop Puppet, bigtop/puppet:ubuntu-16.04
  • 45. site.yaml example
        bigtop::hadoop_head_node: bigtop.example.com
        bigtop::bigtop_repo_uri: http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/debian/8/x86_64
        hadoop::hadoop_storage_dirs: [/data/1, /data/2]
        hadoop_cluster_node::cluster_components: [hdfs, yarn, spark]
  • 46. How to build
        git clone https://github.com/apache/bigtop.git
        cd bigtop/docker/sandbox
        ./build.sh -a bigtop -o ubuntu-16.04 -c "hdfs, yarn, spark"
    • Or specify your custom conf:
        ./build.sh -a bigtop -o ubuntu-16.04 -f custom_site.yaml -t dws2017
  • 48. How to run
        docker run --name sandbox -d -p 50070:50070 -p 8088:8088 evansye/sandbox:dws2017
        docker logs -f sandbox
        docker exec sandbox spark-example SparkPi
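Because the sandbox starts in the background (`-d`), a script that uses it needs to wait until the services answer before submitting work. Below is a minimal sketch of a generic retry helper; `wait_for` is a hypothetical name, and the commented usage assumes `curl` is installed and that ports 50070 and 8088 are published as in the `docker run` line above.

```shell
#!/usr/bin/env bash
set -u

# Retry a command until it succeeds or the attempts run out.
# Arguments: max attempts, delay in seconds, then the command to run.
# Returns 0 on the first success, 1 if every attempt failed.
wait_for() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Intended use once the sandbox container is running:
#   wait_for 30 2 curl -sf http://localhost:50070/ && echo "HDFS web UI is up"
#   wait_for 30 2 curl -sf http://localhost:8088/  && echo "YARN web UI is up"
```

Polling the published web ports is only one possible readiness signal; following `docker logs -f sandbox` by eye, as the slide does, works just as well for interactive use.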
  • 50. Bigtop Provisioner vs Bigtop Sandbox
                            Bigtop Provisioner    Bigtop Sandbox
        Scalable            yes                   no
        Portable            no                    yes
        Flexibility         High                  Medium
        Speed               > 2 mins              > 15 secs
        Requires network    yes                   no
        Port forwarding     no                    yes
  • 51. Who uses what
                            Bigtop Provisioner                          Bigtop Sandbox
        Data engineers      Multi-node cluster testing                  Build/use sandboxes for dev & test
        Ops                 Multi-node cluster testing                  Single node testing
        Contributors        Test packages, puppet recipes, test cases   Test packages, puppet recipes, test cases
        Distro. builders    Test packages, puppet recipes, test cases   Provide Sandboxes
  • 52. CD pipeline with Bigtop Sandbox • Integration test in CI/CD pipeline • Diagram labels: Unit Test, Source code, Compile, Build Image, Integration test with Sandbox, Sandbox, Service, Data, Docker Registry, Push Image, Deploy, FINISHED
  • 53. Future • Production deployment using Sandbox images ▪ --net host or overlay network (SDN)? ▪ External volumes for edit logs, fsimages, etc. ▪ Cluster orchestration ▪ Swarm, Kubernetes?
  • 55. Bigtop 1.2.0, released April 2017 • New components: ▪ Ambari 2.5.0 ▪ GPDB 5.0.0-alpha.0 (Greenplum) • Featured upgrades: ▪ Hadoop 2.7.3 ▪ Spark 2.1.0 ▪ Kafka 0.10.1.1 ▪ HBase 1.1.3 ▪ and more
  • 56. New features in Bigtop 1.2.0 • New features: ▪ Juju bigtop charms ▪ Bigtop Sandbox (alpha, recommended to try master) • Improvement: ▪ Bigtop Docker Provisioner made faster
  • 57. Juju Cloud Weather Report • http://bigtop.charm.qa/
  • 58. Bigtop 1.2.1 upcoming • Expected to be out late June • Hadoop 2.7.4 (interested in getting docker container support backported, but I'm not sure yet) • Mainly bug fixes: ▪ Packages ▪ Deployments ▪ Sandbox
  • 59. Road ahead towards 1.3.0 • Machine Learning and Deep Learning integration • Support aarch64 • Enhance support set in Bigtop Puppet (not all components covered) • Extend the CI matrix coverage to Bigtop Tests • Ambari Bigtop stack integration • Provide Big data stack references
  • 61. ODPi Apache Bigtop Test Drive Program • Submit your proposal, contribute to Bigtop with funding! • Improvements, new features, build, test, CI, etc. • CFP opened June 13, 2017 • CFP closed July 14, 2017 • https://www.odpi.org/community/bigtopgrantfund
  • 62. Reference • Join mailing list, ask questions, suggest features, etc. • Contribute (components, tutorials, docs) • Report bugs ▪ Home page: http://bigtop.apache.org/ ▪ Mailing list: http://bigtop.apache.org/mail-lists.html ▪ Document: https://cwiki.apache.org/confluence/display/BIGTOP/Index ▪ Source code: https://github.com/apache/bigtop ▪ Packages: https://www.apache.org/dist/bigtop/bigtop-1.2.0/repos/ ▪ JIRA: https://issues.apache.org/jira/browse/BIGTOP