This presentation, delivered at Cisco Live 2015, describes how Hortonworks is delivering Hadoop on Docker for a cloud-agnostic deployment approach.
Lessons Learned Running Hadoop and Spark in Docker Containers - BlueData, Inc.
Many initiatives for running applications inside containers have been scoped to run on a single host. Using Docker containers for large-scale production environments poses interesting challenges, especially when deploying distributed big data applications like Apache Hadoop and Apache Spark. This session at Strata + Hadoop World in New York City (September 2016) explores various solutions and tips to address the challenges encountered while deploying multi-node Hadoop and Spark production workloads using Docker containers.
Some of these challenges include container life-cycle management, smart scheduling for optimal resource utilization, network configuration and security, and performance. BlueData is "all in" on Docker containers—with a specific focus on big data applications. BlueData has learned firsthand how to address these challenges for Fortune 500 enterprises and government organizations that want to deploy big data workloads using Docker.
This session by Thomas Phelan, co-founder and chief architect at BlueData, discusses how to securely network Docker containers across multiple hosts and ways to achieve high availability across distributed big data applications and hosts in your data center. Since we’re talking about very large volumes of data, performance is a key factor, so Thomas shares some of the storage options implemented at BlueData to achieve near bare-metal I/O performance for Hadoop and Spark using Docker, as well as lessons learned and some tips and tricks on how to Dockerize your big data applications in a reliable, scalable, and high-performance environment.
http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52042
Big Data in Container; Hadoop Spark in Docker and Mesos - Heiko Loewe
Three examples of containerized Big Data analytics:
1. Installation with Docker and Weave, for small and medium deployments
2. Hadoop on Mesos with Apache Myriad
3. Spark on Mesos
This session will examine the many options the data scientist has for running Spark clusters in public and private clouds. We will discuss various environments employing AWS, Mesos, containers, docker, and BlueData EPIC technologies and the benefits and challenges of each.
Speakers:
Tom Phelan, Co-founder and Chief Architect - BlueData Inc. Tom has spent the last 25 years as a senior architect, developer, and team lead in the computer software industry in Silicon Valley. Prior to co-founding BlueData, Tom spent 10 years at VMware as a senior architect and team lead in the core R&D Storage and Availability group. Most recently, Tom led one of the key projects – vFlash, focusing on integration of server-based Flash into the vSphere core hypervisor. Prior to VMware, Tom was part of the early team at Silicon Graphics that developed XFS, one of the most successful open source file systems. Earlier in his career, he was a key member of the Stratus team that ported the Unix operating system to their highly available computing platform. Tom received his Computer Science degree from the University of California, Berkeley.
How to Protect Big Data in a Containerized Environment - BlueData, Inc.
Every enterprise spends significant resources to protect its data. This is especially true in the case of big data, since some of this data may include sensitive or confidential customer and financial information. Common methods for protecting data include permissions and access controls as well as the encryption of data at rest and in flight.
The Hadoop community has recently rolled out Transparent Data Encryption (TDE) support in HDFS. Transparent Data Encryption refers to the process whereby data is transparently encrypted by the big data application writing the data; it is not decrypted again until it is accessed by another application. The data is encrypted during its entire lifespan—in transit and at rest—except when it is being specifically accessed by a processing application.
TDE is an excellent approach for protecting data stored in data lakes built on the latest versions of HDFS. However, it does have its challenges and limitations. Systems that want to use TDE require tight integration with enterprise-wide Kerberos Key Distribution Center (KDC) services and Key Management Systems (KMS). This integration isn’t easy to set up or maintain. These issues can be even more challenging in a virtualized or containerized environment where one Kerberos realm may be used to secure the big data compute cluster and a different Kerberos realm may be used to secure the HDFS filesystem accessed by this cluster.
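For context, here is a minimal sketch of how an HDFS encryption zone is typically created. It assumes a running Hadoop KMS and a kerberized HDFS client; the key and path names are illustrative. The commands are standard HDFS TDE commands, wrapped in Python for readability:

    import subprocess

    def run(cmd):
        # Run a command and fail loudly if it errors.
        subprocess.run(cmd, check=True)

    # 1. Create an encryption key in the KMS.
    run(["hadoop", "key", "create", "reportsKey"])
    # 2. Create an empty directory and mark it as an encryption zone.
    run(["hdfs", "dfs", "-mkdir", "/data/secure"])
    run(["hdfs", "crypto", "-createZone", "-keyName", "reportsKey",
         "-path", "/data/secure"])
    # 3. Verify: files written under /data/secure are now encrypted
    # transparently, and readers need access to reportsKey via the KMS.
    run(["hdfs", "crypto", "-listZones"])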
BlueData has developed significant expertise in configuring, managing, and optimizing access to TDE-protected HDFS. This session at the Strata Data Conference in March 2018 (by Thomas Phelan, co-founder and chief architect at BlueData) offers a detailed overview of how transparent data encryption works with HDFS, with a particular focus on containerized environments.
You’ll learn how HDFS TDE is configured and maintained in an environment where many big data frameworks run simultaneously (e.g., in a hybrid cloud architecture using Docker containers). Moreover, you’ll learn how KDC credentials can be managed in a Kerberos cross-realm environment to provide data scientists and analysts with the greatest flexibility in accessing data while maintaining complete enterprise-grade data security.
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63763
Structor - Automated Building of Virtual Hadoop Clusters - Owen O'Malley
Discusses Vagrant scripts to set up and deploy a working multi-node Hadoop cluster, with or without security. All source code is available at https://github.com/hortonworks/structor .
How to deploy Apache Spark in a multi-tenant, on-premises environment - BlueData, Inc.
Adoption of Apache Spark in the enterprise is increasing rapidly - it's become one of the fastest growing and most popular technologies in the Big Data ecosystem.
However, implementing an enterprise-ready, on-premises Spark deployment can be very complex, and it requires expertise that is generally not available to all.
BlueData makes it easier to deploy Apache Spark on-premises. With BlueData, you can spin up virtual Spark clusters within minutes – providing secure, self-service, on-demand access to Big Data analytics and infrastructure. You can deploy Spark in standalone mode or with Hadoop / YARN. You can also build analytical pipelines and create Spark clusters using our RESTful APIs, and use web-based Zeppelin notebooks for interactive data analytics.
BlueData’s software platform leverages virtualization and Docker containers – combined with our own patent-pending innovations – to make it faster and more cost-effective for enterprises to get up and running with a multi-tenant Spark deployment on-premises.
Learn more at www.bluedata.com
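As an illustration of the RESTful-API approach mentioned above, here is a hedged Python sketch. The endpoint, payload fields, and authentication header are hypothetical stand-ins; BlueData's actual EPIC API is not reproduced here:

    import requests

    EPIC = "https://epic.example.com/api/v1"  # hypothetical controller URL
    session = requests.Session()
    session.headers["X-Auth-Token"] = "..."   # token acquisition omitted

    # Hypothetical payload: a 4-node standalone Spark cluster.
    spec = {
        "name": "spark-dev",
        "flavor": "spark-standalone",  # illustrative cluster template
        "node_count": 4,
    }
    resp = session.post(EPIC + "/clusters", json=spec)
    resp.raise_for_status()
    print("cluster id:", resp.json().get("id"))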
Kubernetes is an open source container cluster orchestration platform founded by Google. This presentation covers an overview of its main concepts, plus how it fits into Google Cloud Platform. It was delivered by Kit Merker at DevNexus 2015 in Atlanta.
Bare-metal performance for Big Data workloads on Docker containers - BlueData, Inc.
In a benchmark study, Intel® compared the performance of Big Data workloads running on a bare-metal deployment versus running in Docker* containers with the BlueData® EPIC™ software platform.
This in-depth study shows that performance ratios for container-based Hadoop workloads on BlueData EPIC are equal to — and in some cases, better than — bare-metal Hadoop. For example, benchmark tests showed that the BlueData EPIC platform demonstrated an average 2.33% performance gain over bare metal, for a configuration with 50 Hadoop compute nodes and 10 terabytes (TB) of data. These performance results were achieved without any modifications to the Hadoop software.
This is a revolutionary milestone, and the result of an ongoing collaboration between Intel and BlueData software engineering teams.
This white paper describes the software and hardware configurations for the benchmark tests, as well as details of the performance benchmark process and results.
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud - DataWorks Summit
Apache Hadoop YARN is the modern distributed operating system for big data applications. In Apache Hadoop 3.1.0, YARN added a service framework that supports long-running services. This new capability goes hand in hand with the recent improvements in YARN to support Docker containers. Together these features have made it significantly easier to bring new applications and services to YARN.
In this talk you will learn about YARN service framework, its new containerization capabilities and how it lays the foundation for a hybrid and uniform architecture for compute and storage across on-prem and multi-cloud environments. This will include examples highlighting how easy it is to bring applications to the YARN service framework as well as how to containerize applications.
Here's what to expect in this talk:
- Motivation for YARN service framework and containerization
- YARN service framework overview
- YARN service examples
- Containerization overview
- Containerization for Big Data and non-Big Data workloads - wait, that's everything
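To make the service framework concrete, here is a minimal sketch of launching a Dockerized long-running service through the YARN Services REST API introduced alongside Hadoop 3.1. The ResourceManager address, image, and resource sizes are illustrative, and the cluster must have the Docker runtime and the services API enabled:

    import requests

    service = {
        "name": "redis-service",
        "version": "1.0.0",
        "components": [{
            "name": "redis",
            "number_of_containers": 2,
            "artifact": {"id": "library/redis", "type": "DOCKER"},
            "launch_command": "",  # use the image's default entrypoint
            "resource": {"cpus": 1, "memory": "256"},
        }],
    }
    # POST the spec to the ResourceManager's services endpoint.
    resp = requests.post(
        "http://resourcemanager.example.com:8088/app/v1/services",
        json=service)
    resp.raise_for_status()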
Using Ansible to deploy a 6-node Hortonworks Data Platform (Hadoop) cluster on AWS with the ObjectRocket ansible-hadoop playbook.
Presented at the Ansible NOVA MeetUp on February 23, 2017: https://www.meetup.com/Ansible-NOVA/events/236853616/
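For context, a hedged sketch of driving such a playbook run from Python; the inventory path, playbook name, and extra variable are illustrative rather than the exact files in the ObjectRocket repo:

    import subprocess

    # Illustrative invocation: a static inventory listing the six AWS
    # nodes and a site playbook that installs Ambari and deploys HDP.
    subprocess.run(
        ["ansible-playbook", "-i", "inventory/static", "site.yml",
         "--extra-vars", "cluster_name=hdp-demo"],
        check=True,
    )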
HPC and cloud distributed computing, as a journey - Peter Clapham
Introducing an internal cloud brings new paradigms, tools and infrastructure management. When placed alongside traditional HPC, the new opportunities are significant. But getting to the new world with micro-services, autoscaling and auto-healing is a journey that cannot be achieved in a single step.
CBlocks - POSIX-compliant file systems for HDFS - DataWorks Summit
With YARN running Docker containers, it is possible to run applications that are not HDFS-aware inside these containers. It is hard to customize these applications since most of them assume a POSIX file system with rewrite capabilities. In this talk, we will dive into how we created a block storage layer, how it is being tested internally, and the storage containers that make it all possible.
The storage container framework was developed as part of Ozone (HDFS-7240). This talk will also explore the current state of Ozone along with CBlocks, the architecture of storage containers, how replication is handled, scaling to millions of volumes, and I/O performance optimizations.
As Hadoop becomes the de facto big data platform, enterprises deploy HDP across a wide range of physical and virtual environments spanning private and public clouds. This session will cover key considerations for cloud deployment and showcase Cloudbreak for simple and consistent deployment across the cloud providers of your choice.
Apache Toree: A Jupyter Kernel for Spark, by Marius van Niekerk - Spark Summit
Many data scientists are already making heavy use of the Jupyter ecosystem for analyzing data using interactive notebooks.
Apache Toree (incubating) is a Jupyter kernel designed to act as a gateway to Spark by enabling users to run Spark from standard Jupyter notebooks. This allows users to easily integrate Spark into their existing Jupyter deployments and to move between languages and contexts without needing to switch to a different set of tools.
Apache Toree is designed expressly for interactive work. It supports interpreters in Scala, Python, and R.
In this talk, I will cover the design of Toree, how it interacts with the Jupyter ecosystem and various ways in which users can extend the functionality of Apache Toree via a powerful plugin system.
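For context, a minimal sketch of getting Toree running locally; the SPARK_HOME path is illustrative:

    import subprocess

    # Install Toree and register its kernels with Jupyter.
    subprocess.run(["pip", "install", "toree"], check=True)
    subprocess.run(
        ["jupyter", "toree", "install",
         "--spark_home=/usr/local/spark",
         "--interpreters=Scala,PySpark"],
        check=True,
    )
    # New "Apache Toree" kernels then appear in the notebook launcher.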
Cloud Foundry and OpenStack - A Marriage Made in Heaven! (Cloud Foundry Summit) - VMware Tanzu
Business Track presented by Animesh Singh, Lead Architect and Strategist at IBM.
Bringing the world's best IaaS to the world's best PaaS: in this talk, IBM and Rackspace share their experiences of running Cloud Foundry on OpenStack. The talk focuses on how Cloud Foundry and OpenStack complement each other, how they technically integrate using the Cloud Provider Interface (CPI), how OpenStack setup can be automated for Cloud Foundry deployments, and some of the best practices for configuring a scalable environment.
Dev ops for big data cluster management tools - Ran Silberman
What tools can we find today to manage a Hadoop cluster and its ecosystem?
There are two tools ready today: Cloudera Manager, and Ambari from Hortonworks.
In this presentation I explain what they do and why to use them, as well as their pros and cons.
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily - Jeffrey Breen
Part 3 of 3 in a series focusing on the infrastructure aspect of getting started with Big Data. This presentation demonstrates how to use Apache Whirr to launch a Hadoop cluster on Amazon EC2, easily.
Presented at the Boston Predictive Analytics Big Data Workshop, March 10, 2012. Sample code and configuration files are available on github.
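For context, a minimal sketch of the Whirr workflow the talk demonstrates: write a recipe of properties, then launch. The cluster shape is illustrative, and AWS credentials are read from the environment:

    import os
    import subprocess

    recipe = [
        "whirr.cluster-name=hadoop-demo",
        # One master plus three workers (illustrative sizing).
        "whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,"
        "3 hadoop-datanode+hadoop-tasktracker",
        "whirr.provider=aws-ec2",
        "whirr.identity=" + os.environ["AWS_ACCESS_KEY_ID"],
        "whirr.credential=" + os.environ["AWS_SECRET_ACCESS_KEY"],
    ]
    with open("hadoop.properties", "w") as f:
        f.write("\n".join(recipe) + "\n")

    subprocess.run(
        ["whirr", "launch-cluster", "--config", "hadoop.properties"],
        check=True)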
Managing Docker Containers In A Cluster - Introducing Kubernetes - Marc Sluiter
Containerising your applications with Docker is attracting more and more attention. While managing your Docker containers on your developer machine or on a single server is not a big hassle, it can get uncomfortable very quickly when you want to deploy your containers in a cluster, whether in the cloud or on premises. How do you provide high availability, scaling and monitoring? Fortunately there is a rapidly growing ecosystem around Docker, and there are tools available to support you with this. In this session I want to introduce you to Kubernetes, the Docker orchestration tool started and open sourced by Google. Based on the experience with their data centers, Google uses some interesting declarative concepts in Kubernetes, like pods, replication controllers and services, which I will explain to you. While Kubernetes is still a quite young project, it reached its first stable version this summer, thanks to many contributions by Red Hat, Microsoft, IBM and many more.
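To illustrate the declarative concepts the talk covers, here is a minimal sketch of a replication controller that asks Kubernetes to keep three nginx pods running, posted straight to the API server. The insecure local API address is illustrative; real clusters use authenticated access:

    import requests

    rc = {
        "apiVersion": "v1",
        "kind": "ReplicationController",
        "metadata": {"name": "web"},
        "spec": {
            "replicas": 3,                  # desired pod count
            "selector": {"app": "web"},     # pods this RC manages
            "template": {                   # pod template to stamp out
                "metadata": {"labels": {"app": "web"}},
                "spec": {"containers": [{
                    "name": "nginx",
                    "image": "nginx",
                    "ports": [{"containerPort": 80}],
                }]},
            },
        },
    }
    resp = requests.post(
        "http://127.0.0.1:8080/api/v1/namespaces/default/replicationcontrollers",
        json=rc,
    )
    resp.raise_for_status()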
Hortonworks Technical Workshop: What's New in HDP 2.3 - Hortonworks
The recently launched HDP 2.3 is a major advancement of Open Enterprise Hadoop. It represents the best of community-led development, with innovations spanning Apache Hadoop, Apache Ambari, Ranger, HBase, Spark and Storm. In this session we will provide an in-depth overview of the new functionality and discuss its impact on new and ongoing big data initiatives.
A Tight Ship: How Containers and SDS Optimize the Enterprise - Eric Kavanagh
The Briefing Room with Dez Blanchfield and Red Hat
Think of containers as the drones of modern computing. They're small, agile, and can carry a significant payload. In many ways, they represent the fruition of the last two major paradigm shifts in enterprise software: SOA and virtualization. However, for companies to fully leverage this innovative approach, a persistent storage platform is needed that is as flexible and scalable as containers themselves.
Register for this episode of The Briefing Room to hear Bloor Group Data Scientist Dez Blanchfield, who will explain the significance of container technology, and the relevance of software-defined storage (SDS) in a constantly evolving IT world. He'll be briefed by Steve Watt and Sayan Saha of Red Hat, who will demonstrate how open-source technology can help organizations take advantage of this brave new world of enterprise computing. They will explain how containers are the next step in the evolution of the operating system, and why SDS is now the optimal solution.
PaaS Anywhere - Deploying an OpenShift PaaS into your Cloud Provider of Choice - Isaac Christoffersen
Choice matters. And OpenShift Enterprise by Red Hat is the only Platform-as-a-Service (PaaS) product that runs in the hosting environment of your choice.
While true PaaS products provide a level of infrastructure abstraction so developers can be more productive, choosing the right environment to host your PaaS product is critical. You must take many factors into consideration, like cost, availability, scalability, and manageability. Having a PaaS product that’s truly environment-agnostic can maximize your options and help you make the choice that best fits your needs.
This talk has a demo of deploying OpenShift Enterprise to multiple public cloud providers and within local virtual machines. By using common provisioning tools and the various cloud vendor APIs, you’ll see the realization of open, hybrid PaaS. You’ll also learn about considerations when running OpenShift Enterprise in these different environments, and monitoring strategies for proactively maintaining the health of an OpenShift Enterprise environment.
How to build "AutoScale and AutoHeal" systems using DevOps practices by using modern technologies.
A complete build pipeline and the process of architecting a nearly unbreakable system were part of the presentation.
These slides were presented at the 2018 DevOps conference in Singapore: http://claridenglobal.com/conference/devops-sg-2018/
[Srijan Wednesday Webinars] How to Build a Cloud Native Platform for Enterpri... - Srijan Technologies
Drupal has been a consistent leader in the Gartner Magic Quadrant for Web Content Management. However, enterprises leveraging Drupal have traditionally relied on PaaS providers for their hosting, scaling and lifecycle management. And that usually leads to enterprise applications being locked-in with a particular cloud or vendor.
As container and container orchestration technologies disrupt the cloud and platform landscape, there’s a clear way to avoid this state of affairs. In this webinar, we discuss why it's important to build a cloud-native Drupal platform, and exactly how to do that.
Join the webinar to understand how you can avoid vendor lock-in, and create a secure platform to manage, operate and scale your Drupal applications in a multi-cloud portable manner.
Key Takeaways:
- Why you need a cloud-native Drupal platform and how to build one
- How to craft an idiomatic development workflow
- Understanding infrastructure and cloud engineering - under the hood
- Demystifying the art and science of Docker and Kubernetes: deep dive into scaling the LAMP stack
- Exploring cost optimization and cloud governance
- Understand portability of applications
- A hands-on demo of how the platform works
Building Cloud Native Applications with Oracle Autonomous Database - Oracle Developers
In this session, Manish Kapur from the Oracle Application Development Cloud Platform team will provide an overview of Oracle's Cloud-Native Application Development platform. He will talk about developing and deploying cloud-native applications like Microservices and Serverless functions using Continuous Integration and Delivery Pipelines. This will include a demonstration of how to use the CI/CD approach to build and deploy a simple Node.js based microservices application that uses Oracle Autonomous Transaction Processing (ATP) database for persistence.
9. Cloudbreak
• Developed by SequenceIQ
• Open source with Apache 2.0 license [Apache project soon]
• Deploys selected services to public and private cloud via Ambari Blueprints
• Elastic – can spin up any number of nodes, add/remove on the fly
• Provides full cloud lifecycle management post-deployment
10. Launch HDP on Any Cloud for Any Application
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP!
Example Ambari Blueprints: IoT Apps (Storm, HBase, Hive), BI / Analytics (Hive), Data Science (Spark), Dev / Test (all HDP services)
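To make the "pick a Blueprint" step concrete, here is a minimal sketch of registering a blueprint through Ambari’s REST API. The host, credentials, and the one-node host group are illustrative; a real blueprint needs the full complement of components:

    import requests

    blueprint = {
        "Blueprints": {"stack_name": "HDP", "stack_version": "2.2"},
        "host_groups": [{
            "name": "master",
            "cardinality": "1",
            "components": [
                {"name": "NAMENODE"},
                {"name": "RESOURCEMANAGER"},
                {"name": "ZOOKEEPER_SERVER"},
            ],
        }],
    }
    requests.post(
        "http://ambari.example.com:8080/api/v1/blueprints/datascience",
        json=blueprint,
        auth=("admin", "admin"),
        headers={"X-Requested-By": "ambari"},  # header required by Ambari
    ).raise_for_status()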
11. Hadoop in Cloud Provisioning with Cloudbreak
Create Templates → Provide Blueprint → Associate Credentials → Launch Cluster
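A hedged Python sketch of those four steps as REST calls follows. The endpoint paths and payload fields are hypothetical stand-ins, not Cloudbreak’s actual API:

    import requests

    cb = "https://cloudbreak.example.com/api"  # hypothetical base URL
    s = requests.Session()
    s.headers["Authorization"] = "Bearer ..."  # token acquisition omitted

    # 1. Create a template (node size, network setup).
    tpl = s.post(cb + "/templates",
                 json={"name": "worker", "instanceType": "m3.xlarge"}).json()
    # 2. Provide a blueprint (an Ambari blueprint registered with Cloudbreak).
    bp = s.post(cb + "/blueprints",
                json={"name": "datascience", "ambariBlueprint": "..."}).json()
    # 3. Associate cloud credentials.
    cred = s.post(cb + "/credentials",
                  json={"name": "dev", "cloudPlatform": "AWS"}).json()
    # 4. Launch the cluster from the three pieces above.
    s.post(cb + "/stacks", json={
        "name": "spark-cluster",
        "credentialId": cred["id"],
        "blueprintId": bp["id"],
        "instanceGroups": [{"templateId": tpl["id"], "nodeCount": 4,
                            "group": "worker"}],
    }).raise_for_status()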
17. Optimize cloud usage via Elastic Clusters
[Diagram: an autoscaling policy applied to example clusters such as BI / Analytics (Hive), IoT Apps (Storm, HBase, Hive), Dev / Test (all HDP services), and Data Science (Spark).]
• Policies based on any Ambari metrics
• Coordinates with YARN
• Policies are based on Metrics or Time
• Scaling can be service or component type specific
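As a hedged sketch of what such a policy might look like, consider a rule that adds worker nodes when a YARN metric stays above a threshold. The endpoint and field names below are hypothetical, chosen only to mirror the "Ambari metric crosses a threshold, so scale the host group" idea:

    import requests

    policy = {
        "alert": {
            "metric": "pendingContainers",  # an Ambari/YARN metric
            "threshold": 75,
            "period_minutes": 10,
        },
        "scaling": {
            "hostGroup": "worker",
            "adjustment": "+3",             # add three nodes when triggered
            "cooldown_minutes": 30,
        },
    }
    requests.post("https://cloudbreak.example.com/api/clusters/42/policies",
                  json=policy).raise_for_status()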
19. Provisioning – How it works
1. Start VMs with a running Docker daemon
2. Cloudbreak Bootstrap: start a Consul cluster, then start a Swarm cluster (using Consul for discovery)
3. Start Ambari servers/agents via the Swarm API
4. Ambari services registered in Consul (via Registrator)
5. Post the Blueprint
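For context, once Registrator has published the Ambari containers into Consul, any node can discover them through Consul’s standard HTTP catalog API. A minimal sketch (the service name is illustrative; Registrator typically derives it from the image and port):

    import requests

    nodes = requests.get(
        "http://127.0.0.1:8500/v1/catalog/service/ambari-8080"
    ).json()
    for n in nodes:
        # Each entry carries the host and port where the service listens.
        print(n["Node"], n["Address"], n["ServicePort"])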
21. Docker is a “Shipping Container” System for Code
An engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container. [Diagram: the multiplicity of stacks (static website, web frontend, user DB, queue, analytics DB) crossed with the multiplicity of hardware environments (development VM, QA server, public cloud, contributor’s laptop, production cluster, customer data center).]
22. Docker
• Lightweight, portable
• Build once, run anywhere
• VM – without the overhead of a VM
• Isolated containers
• Automated and scripted
23. Why Is Docker So Exciting?
For Developers: Build once…run anywhere
• A clean, safe, and portable runtime environment for your app
• No missing dependencies, packages etc.
• Run each app in its own isolated container
• Automate testing, integration, packaging
• Reduce/eliminate concerns about compatibility on different platforms
• Cheap, zero-penalty containers to deploy services
For DevOps: Configure once…run anything
• Make the entire lifecycle more efficient, consistent, and repeatable
• Eliminate inconsistencies between SDLC stages
• Support segregation of duties
• Significantly improves the speed and reliability of CI/CD
• Significantly lightweight compared to VMs
24. Docker: Containers vs. VMs
[Diagram: a VM stack (server → host OS → Type 2 hypervisor → per-app guest OS + bins/libs for App A, App A’, App B) next to a Docker stack (server → host OS kernel → containers holding just app binaries and libraries).]
Containers are isolated and share only the kernel; the result is significantly faster deployment, much less overhead, easier migration, and faster restart.
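A small sketch of the "much less overhead, faster restart" point: starting a container is a single API call against the Docker daemon, with no guest OS to boot. This assumes a local Docker daemon and the Docker SDK for Python (pip install docker):

    import time

    import docker

    client = docker.from_env()
    t0 = time.time()
    # Runs the command in a fresh container and returns its output.
    out = client.containers.run("alpine", ["echo", "hello from a container"])
    print(out.decode().strip(), "after %.2fs" % (time.time() - t0))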
26. HDP as Docker Containers via Cloudbreak – Run Hadoop as Docker Containers
• Running Ambari Cluster in Containers
• Use Blueprint to define services
• All HDP services share a single container
[Diagram: Cloudbreak provisions VMs (each running Docker on Linux) from cloud providers or bare metal, installs Ambari on the VMs, and then instructs Ambari to build the HDP cluster.]
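Continuing the earlier Ambari blueprint sketch, the "instructs Ambari to build the HDP cluster" step maps concrete hosts onto the blueprint’s host groups via another Ambari REST call (host names and credentials are illustrative):

    import requests

    cluster = {
        "blueprint": "datascience",  # the blueprint registered earlier
        "host_groups": [
            {"name": "master", "hosts": [{"fqdn": "node1.example.com"}]},
        ],
    }
    requests.post(
        "http://ambari.example.com:8080/api/v1/clusters/spark-cluster",
        json=cluster,
        auth=("admin", "admin"),
        headers={"X-Requested-By": "ambari"},
    ).raise_for_status()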
31. Benefits of running Hadoop on Docker
• Quick installation with pre-pulled rpms
• Same process/images for dev/qa/prod
• Same process for single/multi-node
43. Cisco and Hortonworks’ Partnership
• 100% open source Hadoop Distribution, Support and Training
• Integrated Infrastructures for Big Data
Cisco and Hortonworks are partnering to help you build your big data solution and reach massive scalability, superior efficiency and dramatically lower total cost of ownership, thanks to a validated joint architecture.
44. Results of the collaboration
• Efficient Hadoop as a service
• Adoption of Docker for enterprise Hadoop deployment

Task                                 Cisco InterCloud   Public Cloud Provider
HDP installation                     15:04 mins         11:55 mins
Teragen (avg of 3 executions)        7:08 mins          22:15 mins
Terasort (avg of 3 executions)       32:09 mins         60:12 mins
Teravalidate (avg of 3 executions)   2:31 mins          10:40 mins
45. Observations / Future Collaboration
Observations:
• Docker is maturing inside enterprises
• Interest in running Docker on top of bare metal
• Big data app developers are leaning towards containerization of apps
• YARN is becoming an application deployment platform beyond big data apps
• Demand for natively containerized, fully managed apps on YARN
Future collaboration:
• Run Docker natively on OpenStack
• Run Docker on YARN
• OpenStack bare metal
47. Learn More
• Download the Hortonworks Sandbox
• Learn Hadoop
• Build Your Analytic App
• Try Hadoop 2
More about Cisco & Hortonworks: http://hortonworks.com/partner/cisco/
More about Hortonworks’ Acquisition of SequenceIQ: http://bit.ly/1R1ktxO
Editor's Notes
Deploying Hadoop on OpenStack has never been easier, and the Hortonworks and Cisco collaboration over the last few months makes it completely automated and seamless.
This is a cautionary statement, as this presentation may include product and collaboration direction that is subject to change.
We were founded in 2011 by 24 developers from Yahoo where Hadoop was conceived to address data challenges at internet scale. What we now know of as Hadoop really started in 2005, when a team at Yahoo was directed to build out a large-scale data storage and processing technology that would allow them to improve their most critical application, Search.
Their challenge was essentially two-fold. First, they needed to capture and archive the contents of the internet, and then process the data so that users could search through it effectively and efficiently. Clearly traditional approaches were both technically (due to the size of the data) and commercially (due to the cost) impractical. The result was the Apache Hadoop project that delivered large-scale storage (HDFS) and processing (MapReduce).
Today we are over 600 employees and have partnered with over 1000 companies who are the leaders in the data center
We have also been very fortunate to achieve very significant customer adoption with over 330 customers as of the end of 2014, spanning nearly every vertical.
Hortonworks was founded with the sole intent to make Hadoop an enterprise data platform. With YARN as its foundation, HDP delivers a centralized architecture with true multi-tenancy for data processing and shared services for security, governance and operations to satisfy enterprise requirements, all deeply integrated and certified with leading datacenter technologies.
We are uniquely focused on this transformation of Hadoop and doing our work completely in open source. This is all predicated on our leadership in the community, which enables us not only to best support users but also to uniquely represent customer requirements within this open, thriving community.
Hortonworks’ approach is quite clear… we are focused on delivering enterprise-grade Hadoop as a reliable data platform that will enable your transition to a modern data architecture. To this end, we work solely within the broad open source community, with a focus on innovation at the core of Apache Hadoop, with YARN as a foundation, and then within all the related projects that deliver on the key requirements for the enterprise such as governance, security and operations.
Since our inception just three years ago, we have grown to more than 450 employees and have partnered closely with the leaders in the datacenter, all of whom share this vision: to enable a modern data architecture with Hadoop in order to allow their customers to address the architectural challenge that they all are facing due to exploding data volumes.
Hortonworks’ open platform approach enables us to partner and co-exist with other data center technologies. Our deep engineering relationship with data center leaders like Cisco makes it possible for customers to augment their data center with Hadoop technologies for their next-generation modern data architecture.
Hortonworks’ Hadoop platform already enabled deploying Hadoop in any environment, from Linux to Windows and bare metal to cloud, so that the deployment environment is a business decision rather than a technical one. Continuing this Hadoop Everywhere vision, Hortonworks’ recent acquisition of SequenceIQ added a provisioning and auto-scaling toolset which makes it even easier to deploy Hadoop in private and public clouds, accelerating time-to-value for Hadoop deployments.
Cloudbreak was developed by SequenceIQ, a company from the beautiful city of Budapest. Hortonworks acquired them in April.
Cloudbreak is open source with an Apache 2.0 license and uses many other open source technologies as its building blocks, including Docker.
It is a Hadoop cluster deployment and management tool which can deploy an app- or use-case-specific Hadoop cluster to public and private cloud environments in a matter of minutes.
It also provides ongoing cluster infrastructure management, including policy-based auto-scaling of clusters to optimize infrastructure usage.
Cloudbreak enables launching a Hadoop cluster in four easy steps.
The template captures your Hadoop cluster infrastructure definition: node size, network setup.
Cloudbreak supports heterogeneous instances for building the Hadoop cluster, since not all services or service components have the same resource requirements.
Cloudbreak not only simplifies Hadoop cluster provisioning in the Cisco OpenStack cloud but also automatically scales Hadoop clusters based on SLA- or time-based policies. SLAs are monitored through Hadoop service metrics captured by Ambari. This way Cloudbreak gives you elastic Hadoop clusters very quickly in the Cisco OpenStack cloud.
Cloudbreak actively monitors Ambari metrics to assess the health of every Hadoop service. It allows defining policies based on these metrics for every cluster deployed and enabled for auto-scaling. Based on these metrics and user-defined policies, Cloudbreak can scale clusters or services by adding nodes or allocating more YARN containers, depending on the type of Hadoop service.
A view from 10,000 feet.
The only thing it needs is a running Docker daemon. All cloud providers are moving towards Docker, including Cisco Intercloud.
Quick question: how many of you have used Docker before?
Docker is a container-based virtualization framework. It is an open platform for developers and admins to build, ship, and run distributed applications. Consisting of Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. Docker is like a lightweight, portable VM, but without the overhead of a VM.
Unlike traditional virtualization, Docker is fast, lightweight and easy to use. Docker allows you to create containers holding all the dependencies for an application. Each container is kept isolated from any other, and nothing gets shared.
Steps:
Swarm can spin up Docker containers remotely on hosts, considering:
1. Resource management – aware of the cluster resources (e.g. can schedule with bin packing, placing a container anywhere 1 GB of memory is available) or randomly
2. Constraints using labels (label a node and start the container based on labels)
3. Affinity – containers can be co-scheduled (link, volumes-from, net=container on the same host)
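A minimal sketch of those three scheduling hints using classic Docker Swarm’s constraint and affinity expressions, which are passed as environment variables on docker run (node labels, images, and container names are illustrative):

    import subprocess

    # 1. Resource management: reserve 1 GB so Swarm's bin-packing strategy
    #    places the container where that much memory is free.
    subprocess.run(["docker", "run", "-d", "-m", "1g", "redis"], check=True)
    # 2. Constraint: only run on nodes labeled storage=ssd.
    subprocess.run(["docker", "run", "-d",
                    "-e", "constraint:storage==ssd", "redis"], check=True)
    # 3. Affinity: co-schedule next to the container named "frontend".
    subprocess.run(["docker", "run", "-d",
                    "-e", "affinity:container==frontend", "busybox"],
                   check=True)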
The best of Hadoop, Docker and OpenStack in a single cloud platform for our joint customers.
Description        Texas 3            GCP
VM types           GP2-2Xlarge        n1-standard-8
Cores              8                  8
Memory             32 GB              30 GB
Volume size        2 x 400GB          2 x 400GB
Volume type        HDD (magnetic)     generic (magnetic)
Data nodes count   10                 10
HDFS size          8 TB               8 TB
Yarn memory        240 GB             240 GB
HDP blueprint      multinode-hdfs-yarn
We are expanding our Cloud strategy to meet Enterprise customer demand.
Look at the top first. We’ve done a great job of taking our platform for Private Cloud and provisioning Enterprise workloads. We’ve done a great job with UCS, with VBlock, with FlexPod. As a matter of fact, we are the leader in converged infrastructure today, and that market is expanding as customers look to Cisco and our Partners to deliver the Enterprise workloads and the benefits of Private Cloud. They’re also asking for Dev/Ops models. They want to create truly native applications for the Public Cloud. They want to harness the value of Hadoop and Big Data Analytics and Hana. And they want to leverage the collaborative platform present today. We are the leader in Private Cloud infrastructure.
Along the left-hand side, our Partners have done some amazing things. 3 million seats of HCS, the IaaS platforms that they’ve invested in, small, medium, large, local community-based infrastructure platforms. Some Partners have enabled the PaaS platform. Some Partners are hosting Microsoft applications, like Dimension Data does today… globally around the world. Some Partners have managed to build a Citrix or VMware virtual desktop offer.
So what Cisco Cloud Services offer is an engine to generate more services to augment capabilities we’ve invested in, and to do so in a way that only we could do together. You’ll see us leverage the extensions through innovations in the WebEx platform. You’ll see that Meraki is a very powerful model to continue to expand. You’ll hear more about the portfolio of Unified Threat Defense, and comprehensive threat defense that we think only we can bring to the cloud.
You’ll see more about analytics, and the platforms that we have in store. You’ll soon see more about HANA-as-a-Service. And all the capabilities we can bring will be an acceleration of those offers that we can bring to you. Why not accelerate all of our capabilities together, using our capabilities in a way that no one else has. And by the way, we can’t ignore the big Public Clouds. Let’s use the Intercloud Fabric manager when appropriate to just move a workload out to that Public Cloud. I don’t care if it’s Azure, or Amazon or Google. Only Cisco can do this through some of the innovations that we have.
How are we going to do this?