Hadoop Everywhere
Hortonworks. We do Hadoop.
$ whoami
Sean Roberts
Partner Solutions Engineer
London, EMEA &
everywhere
@seano
linkedin.com/in/seanorama
MacGyver. Data Freak. Cook.
Autodidact. Volunteer. Ancestral
Health. Fito. Couchsurfer. Nomad
- HDP 2.3
- http://hortonworks.com/
- Hadoop Summit recordings:
- http://2015.hadoopsummit.org/san-jose/
- http://2015.hadoopsummit.org/brussels/
- Past & Future workshops:
- http://hortonworks.com/partners/learn/
What’s New!
Agenda
● Hadoop Everywhere
● Deployment challenges & requirements
● Cloudbreak & our Docker approach
● Workshop: Your own Cloudbreak
○ And auto-scaling with Periscope
● Cloud best practices
Reminder:
● Attendee phone lines are muted
● Please ask questions in the chat
Page 5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Disclaimer
This document may contain product features and technology directions that are under
development, may be developed in the future, or may ultimately not be developed.
Project capabilities are based on information that is publicly available within the Apache
Software Foundation project websites ("Apache"). Progress of the project capabilities
can be tracked from inception to release through Apache, however, technical feasibility,
market demand, user feedback and the overarching Apache Software Foundation
community development process can all affect timing and final delivery.
This document’s description of these features and technology directions does not
represent a contractual commitment, promise or obligation from Hortonworks to deliver
these features in any generally available product.
Product features and technology directions are subject to change, and must not be
included in contracts, purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans,
customers should not rely upon it when making purchasing decisions.
Hadoop Everywhere
Any application
Batch, interactive, and real-time
Any data
Existing and new datasets
Anywhere
Complete range of deployment options
Commodity Appliance Cloud
YARN: data operating system
Existing
applications
New
analytics
Partner
applications
Data access: batch, interactive, real-time
Hadoop Everywhere
Hybrid Deployment Choice
Windows, Linux, On-Premise or Cloud
Data “gravity” guides choice
Compatible Clusters
Run applications and data processing
workloads wherever and whenever
needed
Replicated Datasets
Democratize Hadoop data access via
automated sharing of datasets using
Apache Falcon
Hadoop Up There, Down Here... Everywhere!
[Diagram: Dev / Test, BI / ML and IoT apps running in the cloud, alongside on-premises clusters.]
Anywhere? Up There or Down Here?
Use case: where?
● Active Archive / Compliance Reporting: Sensitive data = “down here”; “up there” valid for many scenarios
● ETL / Data Warehouse Optimization: Usually has “down here” gravity; DW in the cloud is changing that
● Smart Meter Analysis: Data typically flows “up there”
● Single View of Customer: May have “down here” gravity; unless you’re using SaaS apps
● Supply Chain Optimization: May have heavy “down here” gravity
● New Data for Product Management: “Up there” could be considered for many scenarios
● Vehicle Data for Transportation/Logistics: Why not “up there”?
● Vehicle Data for Insurance: May have “down here” gravity (ex. join with existing risk data)
Deployment
Challenges & Requirements
Deployment challenges
● Infrastructure is different everywhere
○ e.g. Each cloud provider has their own API
○ e.g. Each provider has different networking methods
● OS/images are different everywhere
● How to do service discovery?
● How to dynamically scale/manage?
See prior operations workshops
- Infrastructure
- Operating System
- Environment Prepared (see docs)
- Ambari Agent/Server installed & registered
- Deploy HDP Cluster
- Ambari Blueprints or Cluster Wizard
- Ongoing configuration/management
Deployment requirements
Options for Automation
- Many combinations of tools
- e.g. Foreman, Ansible, Chef, Puppet, docker-ambari,
shell scripts, CloudFormation, …
- Provider specific
- Cisco UCS, Teradata, HP, Google’s bdutil, …
- Docker with Cloudbreak
Using Ambari with all of the above!
https://github.com/seanorama/ambari-bootstrap/
Demo: Basic script-based example
https://github.com/seanorama/ambari-bootstrap
Requirements:
● Infrastructure prepped (see HDP docs)
● Nodes running Red Hat Enterprise Linux or CentOS 6
● HDFS paths mounted (see HDP docs)
● sudo or root access
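A minimal run sketch, assuming the invocation documented in the ambari-bootstrap README (the `install_ambari_server` and `ambari_server` variables are taken from that README; the server IP is a placeholder):

```shell
# On the node that will run Ambari Server:
export install_ambari_server=true
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh

# On each agent node, point the agent at the server first:
export ambari_server=<ambari-server-ip>   # placeholder
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh
```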
ambari-bootstrap
After Ambari deployment
● (optional) Configure local YUM/APT repos
● Deploy HDP with Ambari Wizard or Blueprint
● Ongoing configuration/management
Using Ansible
https://github.com/rackerlabs/ansible-hadoop
Build once. Deploy anywhere.
Docker
Docker is a “Shipping Container” System for Code
[Diagram: a multiplicity of stacks (static website, web frontend, user DB, queue, analytics DB) against a multiplicity of hardware environments (development VM, QA server, public cloud, contributor’s laptop, production cluster, customer data center).]
An engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container.
Docker
• Container-based virtualization
• Lightweight and portable
• Build once, run anywhere
• Ease of packaging applications
• Automated and scripted
• Isolated
Why Is Docker So Exciting?
For Developers:
Build once…run anywhere
• A clean, safe, and portable runtime
environment for your app.
• No missing dependencies, packages etc.
• Run each app in its own isolated container
• Automate testing, integration, packaging
• Reduce/eliminate concerns about
compatibility on different platforms
• Cheap, zero-penalty containers to deploy
services
For DevOps:
Configure once…run anything
• Make the entire lifecycle more efficient,
consistent, and repeatable
• Eliminate inconsistencies between SDLC
stages
• Support segregation of duties
• Significantly improve the speed and reliability of CI/CD
• Significantly more lightweight than VMs
More Technical Explanation
WHY
• Run on any Linux
• Regardless of kernel version (2.6.32+)
• Regardless of host distro
• Physical or virtual, cloud or not
• Container and host architecture must match
• Run anything
• If it can run on the host, it can run in the container
• i.e. if it can run on a Linux kernel, it can run
WHAT
• High Level: it’s a lightweight VM
• Own process space
• Own network interface
• Can run stuff as root
• Low Level: it’s chroot on steroids
• Container = isolated processes
• Shares kernel with host
• No device emulation (neither HVM nor PV)
Docker - How it works
[Diagram: VMs vs. containers. With a Type 2 hypervisor, each app (App A, App A’, App B) carries its own guest OS and bins/libs on top of the host OS and server. With Docker, apps run directly on the host OS kernel.]
Containers are isolated but share the OS kernel and bins/libraries. The result is significantly faster deployment, much less overhead, easier migration, and faster restart.
Cloudbreak
A tool for provisioning and managing Hadoop clusters in the cloud
Cloudbreak
• Developed by SequenceIQ
• Open source with Apache 2.0
license [ Apache project soon ]
• Cloud- and infrastructure-agnostic, cost-effective Hadoop-as-a-Service platform API
• Elastic – can spin up any number
of nodes, add/remove on the fly
• Provides full cloud lifecycle
management post-deployment
Key Features of Cloudbreak
Elastic
• Provision a cluster with an arbitrary number of nodes
• Commission/decommission nodes from the cluster
• Policy- and time-based scaling of the cluster
Flexible
• Declarative and flexible Hadoop cluster creation using blueprints
• Provision to multiple public cloud providers or an OpenStack-based private cloud using the same common API
• Access all of this functionality through a rich UI, secured REST API or automatable shell
Enterprise-ready
• Supports basic, token-based and OAuth2 authentication models
• The cluster is provisioned in a logically isolated network
• Tracks usage and cluster metrics
Launch HDP on Any Cloud for Any Application
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP!
Example Ambari Blueprints: IoT Apps (Storm, HBase, Hive), BI / Analytics (Hive), Data Science (Spark), Dev / Test (all HDP services)
Cloudbreak Approach
• Use Ambari for heavy lifting
• Provisioning of Hadoop services
• Monitoring
• Use Ambari Blueprints
• Assign Host groups to physical instance types
• Public/Private Cloud provider API abstracted
• Azure/Google/Amazon/Openstack
• Run Ambari agent/server in Docker container
• Networking: docker run --net=host
• Service discovery: consul (previously serf)
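A rough illustration of those last two points. The image name and start commands below are illustrative stand-ins; the real containers are launched by Cloudbreak itself:

```shell
# Host networking: each container shares the host's network stack,
# so Ambari and Hadoop see real host IPs and ports (no port mapping).
docker run -d --net=host --name ambari-server sequenceiq/ambari /start-server
docker run -d --net=host --name ambari-agent  sequenceiq/ambari /start-agent
```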
Workshop: Your own Cloudbreak
cloudbreak-deployer
● https://github.com/sequenceiq/cloudbreak-deployer
Requirements:
● A Docker host (laptop, server or Cloud infrastructure)
● Resources:
○ Very little. Tested with 2GB of RAM.
Workshop: Your Own Cloudbreak
Requirement: a Docker host
● OSX or Windows: http://boot2docker.io/
○ boot2docker init
○ boot2docker up
○ eval "$(boot2docker shellinit)"
○ boot2docker ssh
● Linux: Install the docker daemon
● Anywhere: docker-machine “lets you create Docker hosts on your
computer, on cloud providers, and inside your own data center”
○ Example on Rackspace:
■ docker-machine create --driver rackspace \
  --rackspace-api-key $OS_PASSWORD \
  --rackspace-username $OS_USERNAME \
  --rackspace-region DFW docker-rax
■ docker-machine ssh docker-rax
Install cloudbreak-deployer
https://github.com/sequenceiq/cloudbreak-deployer
● curl https://raw.githubusercontent.com/sequenceiq/cloudbreak-deployer/master/install | sh && cbd --version
● cbd init
● cbd start
You’ll then have your own Cloudbreak & Periscope server
with API and Web UI
Done: Your own Cloudbreak
Deploy a cluster with your Cloudbreak
Documentation:
http://sequenceiq.com/cloudbreak/#cloudbreak-credentials
1. Add Credentials
2. Create Cluster
3. Use your Cluster
Ambari available as expected
To reach your Hadoop hosts:
● SSH to Docker Host
○ Hosts are listed in “Cloud stack description”
○ ssh cloudbreak@IPofHost
● Shell into the “ambari-agent” container
○ sudo docker ps | grep ambari-agent
■ note the CONTAINER ID
○ sudo docker exec -it CONTAINERID bash
● Use the hosts as usual. e.g.:
○ hadoop fs -ls /
Cloudbreak internals
Cloudbreak Internals
[Diagram: the browser talks to Uluwatu (Cloudbreak UI) and Sultans (user management UI); the Cloudbreak shell and UIs talk to OAuth2 (UAA, backed by uaa-db/psql), Cloudbreak (REST API, backed by cb-db/psql) and Periscope (autoscaling, backed by ps-db/psql); consul, registrator and ambassador run alongside, all on Docker.]
Swarm
• Native clustering for Docker
• Distributed container orchestration
• Same API as Docker
Swarm – How it works
• Swarm managers/agents
• Discovery services
• Advanced scheduling
Consul
• Service discovery/registry
• Health checking
• Key/Value store
• DNS
• Multi datacenter aware
Consul – How it works
• Consul servers/agents
• Consistency through a quorum (Raft)
• Scalability via a gossip-based protocol (SWIM)
• Decentralized and fault tolerant
• Highly available
• Consistency over availability (CP)
• Multiple interfaces - HTTP and DNS
• Support for watches
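For a feel of those interfaces, here is how a registered service can be looked up from a node running a Consul agent. The service name `ambari-8080` is hypothetical; ports 8600 (DNS) and 8500 (HTTP) are Consul's defaults:

```shell
# DNS interface: resolve a registered service by name
dig @127.0.0.1 -p 8600 ambari-8080.service.consul +short

# HTTP interface: list the service catalog, then write and read a key
curl -s http://127.0.0.1:8500/v1/catalog/services
curl -s -X PUT -d 'bar' http://127.0.0.1:8500/v1/kv/foo
curl -s http://127.0.0.1:8500/v1/kv/foo?raw
```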
Apache Ambari
• Easy Hadoop cluster provisioning
• Management and monitoring
• Key feature - Blueprints
• REST API, CLI shell
• Extensible
• Stacks
• Services
• Views
Apache Ambari – How it works
• Ambari server/agents
• Define a blueprint (blueprint.json)
• Define a host mapping (hostmapping.json)
• Post the cluster create
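"Post the cluster create" boils down to two calls against the standard Ambari REST API; the host, credentials and blueprint/cluster names below are placeholders:

```shell
AMBARI=http://ambari-host:8080/api/v1

# 1. Register the blueprint
curl -u admin:admin -H 'X-Requested-By: ambari' \
  -X POST -d @blueprint.json "$AMBARI/blueprints/my-blueprint"

# 2. Create the cluster, mapping blueprint host groups to real hosts
curl -u admin:admin -H 'X-Requested-By: ambari' \
  -X POST -d @hostmapping.json "$AMBARI/clusters/my-cluster"
```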
Run Hadoop as Docker containers
HDP as Docker
Containers
via Cloudbreak
• Fully Automated Ambari Cluster installation
• Avoid the GUI, use the REST API only (ambari-shell)
• Fully Automated HDP installation with blueprints
• Quick installation (pre-pulled rpms)
• Same process/images for dev/qa/prod
• Same process for single/multinode
[Diagram: Cloudbreak provisions VMs (with Docker) from cloud providers or bare metal, installs Ambari in Docker containers on the VMs, and then instructs Ambari to build the HDP cluster.]
Provisioning – How it works
1. Start VMs with a running Docker daemon
2. Cloudbreak bootstrap: start the Consul cluster, then the Swarm cluster (using Consul for discovery)
3. Start Ambari servers/agents via the Swarm API
4. Ambari services are registered in Consul (by Registrator)
5. Post the Blueprint
Cloudbreak
Run Hadoop as Docker containers
[Diagram sequence: Cloudbreak starts Docker on each node; launches one ambari-server container and ambari-agent containers across the nodes; posts the Blueprint; Ambari then installs the HDP services (e.g. namenode, hdfs, yarn, hive, hbase, zookeeper) inside the agent containers.]
Workshop: Auto-Scale your Cluster
with Periscope
Optimize Cloud Usage via Elastic HDP Clusters
Dev / Test
Auto-scaling
Policy
• Policies based on any Ambari metrics
• Dynamically scale to achieve physical elasticity
• Coordinates with YARN to achieve elasticity based on
the policies.
Scaling for Static and Dynamic Clusters
[Diagram: Ambari metrics and alerts feed Cloudbreak/Periscope, which enforces the auto-scale policies. For dynamic clusters, Cloudbreak provisions and scales nodes; for static clusters, Periscope scales cluster/YARN apps in coordination with YARN.]
Scale by Ambari Monitoring Metric
1. Ambari: review metric
2. Cloudbreak: set alert
3. Cloudbreak: set scaling policy
Scale up/down by time
1. Set time-based alert
2. Set scaling policy
Repeat with an alert
and policy which
scales down
Roadmap
Release Summary
Cloudbreak
● Its own project
(separate from Ambari)
● Supported on Linux
flavors which support
Docker
Periscope
● Feature of Cloudbreak 1.0
● Will be embedded in
Ambari later in 2015
Release Timeline
● Cloudbreak 1.0 GA: June/July 2015
● Cloudbreak Incubator Proposal: July/August 2015 (est)
● Cloudbreak 1.1: August 2015 (est)
● Cloudbreak 2.0 GA: 2H2015
● Related Ambari/HDP releases: Ambari 2.1.0 with HDP “Dal” / 2.3; Ambari 2.1.1 with HDP “Dal-M10”; Ambari 2.2 with HDP “Erie” / 2.4
Supported Cloud Environments
Cloudbreak with HDP 2.3:
● Microsoft Azure: GA
● AWS: GA
● Google Compute: GA
OpenStack (Cloudbreak with HDP 2.3 / HDP 2.4):
● OpenStack Community: Tech Preview / Tech Preview
● Red Hat OSP: TBD
● HP Helion: GA (Tentative)
● Mirantis OpenStack
HDP as a Service
Hortonworks Data Platform On Azure
Rackspace
Cloud Big Data Platform
● Rapidly spin up on-demand HDP clusters
● Integrated with Cloud Files (OpenStack Swift)
● Opt-in for Managed Services by Rackspace
Managed Big Data Platform
● Fully Managed HDP on Dedicated and/or Cloud
● Leverage Fanatical Support and industry-leading SLAs
● Supported by Rackspace with escalation to Hortonworks
CSC
HDP on IaaS - Best Practices
Microsoft Azure
● Deployment
○ Deploy using Cloudbreak
○ Deploy using HWX Azure Gallery Image
● Integrated with Azure Blob Storage
● Supported directly by Hortonworks
● Other offerings
○ Microsoft HDInsight
○ HDP Sandbox
Azure Deployment Guideline
● All in same Region
● Instance Types
○ Typical: A7
○ Performance: D14
○ 8 x 1TB Standard LRS (3x replicated) Virtual Hard Disks per server
● Multiple Storage Accounts are recommended
○ Recommend no more than 40 Virtual Hard Disks
per Storage Account
Azure Blob Store
Azure Blob Store (Object Storage)
● wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
Can be used as a replacement for HDFS
● Thoroughly tested in HDP release test suites
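Once the storage account key is configured in core-site.xml, WASB paths work with the ordinary Hadoop filesystem commands. The container and account names below are placeholders:

```shell
# Requires fs.azure.account.key.<accountname>.blob.core.windows.net in core-site.xml
hadoop fs -ls wasb://mycontainer@myaccount.blob.core.windows.net/
hadoop fs -put data.csv wasb://mycontainer@myaccount.blob.core.windows.net/landing/
```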
Amazon Web Services
● Deploy using Cloudbreak
● Integrated with AWS S3 (object storage)
● Supported directly by Hortonworks
Amazon Deployment Guideline
● All in same Region/AZ
● Instances with Enhanced
Networking
Master Nodes:
● Choose EBS Optimized
● Boot: 100GB on EBS
● Data: 4+ 1TB on EBS
Worker Nodes:
● Boot: 100GB on EBS
● Data: Instance Storage
○ EBS can be used, but local
is preferred
Instance Types:
● Typical: d2 family
● Performance: i2 family
https://aws.amazon.com/ec2/instance-types/
AWS RDS
● Some services rely on MySQL, Oracle or PostgreSQL:
○ Apache Ambari
○ Apache Hive
○ Apache Oozie
○ Apache Ranger
● Use RDS for these instead of managing yourself.
AWS S3 (Object Storage)
● s3n:// with HDP 2.2 (Hadoop 2.6)
● s3a:// with HDP 2.3 (Hadoop 2.7)
Not currently a direct replacement for HDFS
Recommended to configure access with IAM Role/Policy
● https://docs.aws.amazon.com/IAM/latest/UserGuide/policies_examples.html#iam-policy-example-s3
● Example: http://git.io/vLoGY
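With an IAM role of that kind attached to the instances, s3a paths can be used directly without embedding keys. The bucket name below is a placeholder:

```shell
# HDP 2.3 / Hadoop 2.7: s3a is the preferred connector
hadoop fs -ls s3a://my-bucket/
hadoop fs -put report.csv s3a://my-bucket/incoming/
```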
Google Cloud
● Deploy using
○ Cloudbreak
○ Google bdutil with Apache Ambari plug-in
● Integrated with Google Cloud Storage
● Supported directly by Hortonworks
Google Deployment Guideline
● Instance Types
○ Typical: n1-standard-4 with a single 1.5TB persistent disk
○ Performance: n1-standard-8 with 1TB SSD
● Google GCS (Object Storage)
● gs://<CONFIGBUCKET>/dir/file
● Not currently a replacement for HDFS
S3 & GCS as Secondary storage system
The connectors are currently eventually consistent, so they do not replace HDFS.
Backup
● Falcon, distCP, hadoop fs, HBase ExportSnapshot
● Kafka+Storm bolt sends messages to S3/GCS
providing backup & point-in-time recovery source
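Two backup sketches using the tools named above; the paths, bucket and table names are placeholders:

```shell
# Copy a dataset from HDFS to object storage with DistCp
hadoop distcp hdfs:///data/events s3a://backup-bucket/events

# Snapshot an HBase table, then export the snapshot to object storage
echo "snapshot 'events', 'events-snap'" | hbase shell
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot events-snap -copy-to s3a://backup-bucket/hbase
```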
Input/Output
● Convenient & broadly used upload/download method
○ As a middleware to ease integration with Hadoop & limit access
● Publishing static content (optionally with CloudFront)
○ Removes need to manage any web services
● Storage for temporary/ephemeral clusters
Questions
$ shutdown -h now
Hadoop Operations - Past, Present, and Future
 
IoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFiIoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFi
 
Apache MXNet for IoT with Apache NiFi
Apache MXNet for IoT with Apache NiFiApache MXNet for IoT with Apache NiFi
Apache MXNet for IoT with Apache NiFi
 
Transforming Application Delivery with PaaS and Linux Containers
Transforming Application Delivery with PaaS and Linux ContainersTransforming Application Delivery with PaaS and Linux Containers
Transforming Application Delivery with PaaS and Linux Containers
 
Containers Anywhere with OpenShift by Red Hat
Containers Anywhere with OpenShift by Red HatContainers Anywhere with OpenShift by Red Hat
Containers Anywhere with OpenShift by Red Hat
 

More from Hortonworks

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's NewHortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidHortonworks
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleHortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseHortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationHortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementHortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
 

More from Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Recently uploaded

Your enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4jYour enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4jNeo4j
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPTiSEO AI
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...FIDO Alliance
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!Memoori
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireExakis Nelite
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingScyllaDB
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessUXDXConf
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfSrushith Repakula
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandIES VE
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimaginedpanagenda
 
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideCollecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideStefan Dietze
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxFIDO Alliance
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Patrick Viafore
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch TuesdayIvanti
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe中 央社
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfFIDO Alliance
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Skynet Technologies
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfFIDO Alliance
 

Recently uploaded (20)

Your enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4jYour enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4j
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideCollecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 

Hortonworks Technical Workshop: HDP everywhere - cloud considerations using cloudbreak 2015 june

  • 7. Hadoop Everywhere
● Any application: batch, interactive, and real-time
● Any data: existing and new datasets
● Anywhere: complete range of deployment options (commodity, appliance, cloud)
● YARN as the data operating system, supporting existing applications, new analytics, and partner applications
● Data access: batch, interactive, real-time
  • 8. Hadoop Up There, Down Here... Everywhere!
● Hybrid deployment choice: Windows, Linux, on-premises or cloud; data "gravity" guides the choice
● Compatible clusters: run applications and data processing workloads wherever and whenever needed
● Replicated datasets: democratize Hadoop data access via automated sharing of datasets using Apache Falcon
(Diagram: Dev / Test, BI / ML, IoT Apps, On-Premises)
  • 9. Anywhere? Up There or Down Here?
Use cases and where they tend to run:
● Active Archive / Compliance Reporting: sensitive data = "down here"; "up there" is valid for many scenarios
● ETL / Data Warehouse Optimization: usually has "down here" gravity; DW in the cloud is changing that
● Smart Meter Analysis: data typically flows "up there"
● Single View of Customer: may have "down here" gravity, unless you're using SaaS apps
● Supply Chain Optimization: may have heavy "down here" gravity
● New Data for Product Management: "up there" could be considered for many scenarios
● Vehicle Data for Transportation/Logistics: why not "up there"?
● Vehicle Data for Insurance: may have "down here" gravity (e.g. joining with existing risk data)
  • 11. Deployment challenges
● Infrastructure is different everywhere
○ e.g. each cloud provider has its own API
○ e.g. each provider has different networking methods
● OS/images are different everywhere
● How to do service discovery?
● How to dynamically scale/manage?
See prior operations workshops.
  • 12. Deployment requirements
- Infrastructure
- Operating system
- Environment prepared (see docs)
- Ambari agent/server installed & registered
- Deploy HDP cluster
  - Ambari Blueprints or Cluster Wizard
- Ongoing configuration/management
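An Ambari Blueprint, as used in the deployment step above, is just a JSON document mapping host groups to stack components. A minimal illustrative sketch follows; the host group names, component list, and blueprint name here are assumptions for illustration, not a tested production layout. A quick JSON sanity check is included before the file would be registered with Ambari:

```shell
# Write a minimal two-host-group blueprint (names are illustrative)
cat > blueprint.json <<'EOF'
{
  "Blueprints": {
    "blueprint_name": "hdp-minimal",
    "stack_name": "HDP",
    "stack_version": "2.3"
  },
  "host_groups": [
    { "name": "master", "cardinality": "1",
      "components": [ { "name": "NAMENODE" }, { "name": "RESOURCEMANAGER" } ] },
    { "name": "worker", "cardinality": "1+",
      "components": [ { "name": "DATANODE" }, { "name": "NODEMANAGER" } ] }
  ]
}
EOF

# Sanity-check the JSON before POSTing it to the Ambari REST API
python -m json.tool < blueprint.json > /dev/null && echo "blueprint OK"
```

The file would then typically be POSTed to the Ambari server's blueprints endpoint, after which a cluster creation request maps each host group to actual hosts.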
  • 13. Options for Automation
- Many combinations of tools
  - e.g. Foreman, Ansible, Chef, Puppet, docker-ambari, shell scripts, CloudFormation, ...
- Provider specific
  - Cisco UCS, Teradata, HP, Google's bdutil, ...
- Docker with Cloudbreak
Using Ambari with all of the above!
  • 15. ambari-bootstrap
https://github.com/seanorama/ambari-bootstrap
Requirements:
● Infrastructure prepped (see HDP docs)
● Nodes running Red Hat Enterprise Linux or CentOS 6
● HDFS paths mounted (see HDP docs)
● sudo or root access
  • 16. After Ambari deployment
● (optional) Configure local YUM/APT repos
● Deploy HDP with Ambari Wizard or Blueprint
● Ongoing configuration/management
  • 18. Docker: Build once. Deploy anywhere.
  • 20. Docker is a "Shipping Container" System for Code
● An engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container
● Multiplicity of stacks: static website, web frontend, user DB, queue, analytics DB
● Multiplicity of hardware environments: development VM, QA server, public cloud, contributor's laptop, production cluster, customer data center
  • 21. Docker
● Container-based virtualization
● Lightweight and portable
● Build once, run anywhere
● Ease of packaging applications
● Automated and scripted
● Isolated
  • 22. Why Is Docker So Exciting?
For Developers: build once... run anywhere
● A clean, safe, and portable runtime environment for your app
● No missing dependencies, packages, etc.
● Run each app in its own isolated container
● Automate testing, integration, packaging
● Reduce or eliminate concerns about compatibility on different platforms
● Cheap, zero-penalty containers to deploy services
For DevOps: configure once... run anything
● Make the entire lifecycle more efficient, consistent, and repeatable
● Eliminate inconsistencies between SDLC stages
● Support segregation of duties
● Significantly improve the speed and reliability of CI/CD
● Significantly lighter weight than VMs
  • 23. More Technical Explanation
Why:
● Run on any Linux
  ○ Regardless of kernel version (2.6.32+)
  ○ Regardless of host distro
  ○ Physical or virtual, cloud or not
  ○ (Container and host architecture must match)
● Run anything
  ○ If it can run on the host, it can run in the container
  ○ i.e. if it can run on a Linux kernel, it can run
What:
● High level: it's a lightweight VM
  ○ Own process space
  ○ Own network interface
  ○ Can run stuff as root
● Low level: it's chroot on steroids
  ○ Container = isolated processes
  ○ Shares the kernel with the host
  ○ No device emulation (neither HVM nor PV) from the host
  • 24. Docker: How it works
(Diagram: VMs vs. containers)
● VM (Type 2 hypervisor): each app (App A, App A', App B) carries its own guest OS plus bins/libs on top of the hypervisor, host OS, and server
● Container: apps share the host OS kernel and bins/libraries; containers are isolated
● Result: significantly faster deployment, much less overhead, easier migration, faster restart
  • 25. Cloudbreak: a tool for provisioning and managing Hadoop clusters in the cloud
  • 26. Cloudbreak
● Developed by SequenceIQ
● Open source with Apache 2.0 license [Apache project soon]
● Cloud- and infrastructure-agnostic, cost-effective Hadoop-as-a-Service platform API
● Elastic: can spin up any number of nodes, and add/remove nodes on the fly
● Provides full cloud lifecycle management post-deployment
  • 27. Key Features of Cloudbreak
Elastic
● Provision a cluster of an arbitrary number of nodes
● Commission and decommission nodes in a cluster
● Policy- and time-based scaling of the cluster
Flexible
● Declarative and flexible Hadoop cluster creation using blueprints
● Provision to multiple public cloud providers or an OpenStack-based private cloud using the same common API
● Access all of this functionality through a rich UI, secured REST API, or automatable shell
Enterprise-ready
● Supports basic, token-based, and OAuth2 authentication models
● The cluster is provisioned in a logically isolated network
● Tracks usage and cluster metrics
  • 28. Launch HDP on Any Cloud for Any Application
With Cloudbreak:
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP!
Example Ambari Blueprints: IoT Apps (Storm, HBase, Hive), BI / Analytics (Hive), Data Science (Spark), Dev / Test (all HDP services)
  • 29. Cloudbreak Approach
● Use Ambari for the heavy lifting
  ○ Provisioning of Hadoop services
  ○ Monitoring
● Use Ambari Blueprints
  ○ Assign host groups to physical instance types
● Public/private cloud provider APIs abstracted
  ○ Azure / Google / Amazon / OpenStack
● Run Ambari agent/server in a Docker container
  ○ Networking: docker run --net=host
  ○ Service discovery: Consul (previously Serf)
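The `--net=host` point above means the Ambari containers share the host's network stack directly: no port mapping and no virtual bridge, so Hadoop daemons bind the host's real interfaces and see its real hostname, which Hadoop's hostname-based configuration expects. A dry-run sketch of what such a launch looks like; the image name and tag are assumptions for illustration, not the exact images Cloudbreak pins:

```shell
# Hypothetical image/tag; Cloudbreak manages its own Ambari images
AMBARI_IMAGE="sequenceiq/ambari:2.0.0"

# --net=host shares the host network namespace with the container,
# so the Ambari server inside it is reachable on the host's own ports
launch_cmd="docker run -d --net=host --name ambari-server ${AMBARI_IMAGE}"

# Dry run: print the command instead of launching it
echo "${launch_cmd}"
```

The trade-off of host networking is losing per-container port isolation, which is acceptable here because each cloud instance runs a single Ambari container.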
  • 30. Workshop: Your own Cloudbreak
  • 31. cloudbreak-deployer ● https://github.com/sequenceiq/cloudbreak-deployer Requirements: ● A Docker host (laptop, server or cloud infrastructure) ● Resources: ○ Very little. Tested with 2GB of RAM. Workshop: Your Own Cloudbreak
  • 32. Requirement: a Docker host ● OSX or Windows: http://boot2docker.io/ ○ boot2docker init ○ boot2docker up ○ eval "$(boot2docker shellinit)" ○ boot2docker ssh ● Linux: Install the docker daemon ● Anywhere: docker-machine “lets you create Docker hosts on your computer, on cloud providers, and inside your own data center” ○ Example on Rackspace: ■ docker-machine create --driver rackspace --rackspace-api-key $OS_PASSWORD --rackspace-username $OS_USERNAME --rackspace-region DFW docker-rax ■ docker-machine ssh docker-rax
  • 33. Install cloudbreak-deployer https://github.com/sequenceiq/cloudbreak-deployer ● curl https://raw.githubusercontent.com/sequenceiq/cloudbreak-deployer/master/install | sh && cbd --version ● cbd init ● cbd start You’ll then have your own Cloudbreak & Periscope server with an API and Web UI
  • 34. Done: Your own Cloudbreak
  • 35. Deploy a cluster with your Cloudbreak
  • 38. 3. Use your Cluster Ambari is available as expected. To reach your Hadoop hosts: ● SSH to the Docker host ○ Hosts are listed in “Cloud stack description” ○ ssh cloudbreak@IPofHost ● Shell into the “ambari-agent” container ○ sudo docker ps | grep ambari-agent ■ note the CONTAINER ID ○ sudo docker exec -it CONTAINERID bash ● Use the hosts as usual, e.g.: ○ hadoop fs -ls /
  • 40. Cloudbreak Internals [Diagram: browser and Cloudbreak shell clients in front of] • Uluwatu (Cloudbreak UI) • Sultans (user management UI) • OAuth2 (UAA) with uaa-db (PostgreSQL) • Cloudbreak (REST API) with cb-db (PostgreSQL) • Periscope (autoscaling) with ps-db (PostgreSQL) • consul, registrator, ambassador • docker
  • 42. Swarm • Native clustering for Docker • Distributed container orchestration • Same API as Docker
  • 43. Swarm – How it works • Swarm managers/agents • Discovery services • Advanced scheduling
  • 44. Consul • Service discovery/registry • Health checking • Key/Value store • DNS • Multi-datacenter aware
  • 45. Consul – How it works • Consul servers/agents • Consistency through a quorum (Raft) • Scalability through a gossip-based protocol (SWIM) • Decentralized and fault tolerant • Highly available • Consistency over availability (CP) • Multiple interfaces: HTTP and DNS • Support for watches
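To make the service-registry idea above concrete: Consul registers services via small JSON definitions. This is a minimal sketch; the service name, port and check URL are illustrative (here, an Ambari server registered under a hypothetical name):

```json
{
  "service": {
    "name": "ambari-8080",
    "port": 8080,
    "check": {
      "http": "http://localhost:8080/",
      "interval": "10s"
    }
  }
}
```

Once registered, the service is resolvable through Consul's DNS interface (e.g. `ambari-8080.service.consul`) as well as the HTTP API, and the health check controls whether it is returned in lookups.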
  • 46. Apache Ambari • Easy Hadoop cluster provisioning • Management and monitoring • Key feature: Blueprints • REST API, CLI shell • Extensible: Stacks, Services, Views
  • 47. Apache Ambari – How it works • Ambari server/agents • Define a blueprint (blueprint.json) • Define a host mapping (hostmapping.json) • POST the cluster create request
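The blueprint/host-mapping flow above can be sketched in a few lines. This is a minimal illustration, not a production blueprint: the blueprint name, host group names, host FQDNs and the exact component mix are assumptions for the example.

```python
import json

# Blueprint: declares the stack and which components run in each host group.
# Names, cardinalities and components here are illustrative.
blueprint = {
    "Blueprints": {"blueprint_name": "single-master-hdfs",
                   "stack_name": "HDP", "stack_version": "2.3"},
    "host_groups": [
        {"name": "master",
         "components": [{"name": "NAMENODE"}, {"name": "HDFS_CLIENT"}],
         "cardinality": "1"},
        {"name": "worker",
         "components": [{"name": "DATANODE"}],
         "cardinality": "1+"},
    ],
}

# Host mapping: binds concrete hosts to the blueprint's host groups.
hostmapping = {
    "blueprint": "single-master-hdfs",
    "host_groups": [
        {"name": "master", "hosts": [{"fqdn": "master-1.example.com"}]},
        {"name": "worker", "hosts": [{"fqdn": "worker-1.example.com"}]},
    ],
}

print(json.dumps(blueprint, indent=2))
# These payloads would then be POSTed to the Ambari REST API, e.g.:
#   POST /api/v1/blueprints/single-master-hdfs   (blueprint.json)
#   POST /api/v1/clusters/mycluster              (hostmapping.json)
```

Cloudbreak drives exactly this API on your behalf, which is why picking a blueprint is step 1 of the "Pick a Blueprint, Choose a Cloud, Launch HDP" flow.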
  • 48. Run Hadoop as Docker containers: HDP as Docker Containers via Cloudbreak • Fully automated Ambari cluster installation • Avoid the GUI; use the REST API only (ambari-shell) • Fully automated HDP installation with blueprints • Quick installation (pre-pulled RPMs) • Same process/images for dev/qa/prod • Same process for single-node and multi-node clusters [Diagram: Cloudbreak provisions VMs (running Docker) from a cloud provider or bare metal, installs Ambari on the VMs, then instructs Ambari to build the HDP cluster]
  • 49. Provisioning – How it works 1. Start VMs with a running Docker daemon 2. Cloudbreak bootstrap: start the Consul cluster, then start the Swarm cluster (using Consul for discovery) 3. Start Ambari servers/agents via the Swarm API 4. Ambari services are registered in Consul (Registrator) 5. Post the Blueprint
  • 50.–53. Run Hadoop as Docker containers [Diagram sequence: Cloudbreak starts Docker on every node; an ambari-server container starts on one node and ambari-agent containers on the rest; a Blueprint is posted; Ambari then installs the HDP services (HDFS, YARN, Hive, HBase, ZooKeeper, NameNode) into the agent containers]
  • 54. Workshop: Auto-Scale your Cluster with Periscope
  • 55. Optimize Cloud Usage via Elastic HDP Clusters Auto-scaling Policy • Policies based on any Ambari metric • Dynamically scale to achieve physical elasticity • Coordinates with YARN to achieve elasticity based on the policies
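To make the policy idea above concrete, here is a minimal sketch of threshold-based scaling logic of the kind a metric policy expresses. This is purely illustrative and NOT Periscope's actual implementation; the function name, thresholds and the choice of driving metric are all assumptions.

```python
def scaling_adjustment(metric_value, scale_up_above, scale_down_below,
                       step=1, current_nodes=3, min_nodes=1, max_nodes=10):
    """Return the node-count delta a simple metric-threshold policy would request.

    Illustrative only -- not Periscope's real algorithm.
    """
    if metric_value > scale_up_above and current_nodes < max_nodes:
        # Scale up, but never past the cluster's node ceiling.
        return min(step, max_nodes - current_nodes)
    if metric_value < scale_down_below and current_nodes > min_nodes:
        # Scale down, but never below the cluster's node floor.
        return -min(step, current_nodes - min_nodes)
    return 0  # metric within the band: no scaling action

# e.g. pending YARN containers as the driving Ambari metric
print(scaling_adjustment(metric_value=80, scale_up_above=50, scale_down_below=10))
```

In Periscope the equivalent policy is declared against an Ambari metric alert rather than coded, and the adjustment is applied through Cloudbreak and YARN.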
  • 56. Scaling for Static and Dynamic Clusters [Diagram: Ambari metrics and alerts feed Cloudbreak/Periscope, which enforces the auto-scale policies; for dynamic clusters Cloudbreak handles provisioning and scales the cluster and YARN apps]
  • 57. Scale by Ambari Monitoring Metric 1. Ambari: review the metric 2. Cloudbreak: set an alert 3. Cloudbreak: set a scaling policy
  • 58. Scale up/down by time 1. Set a time-based alert 2. Set a scaling policy Repeat with an alert and policy that scale down
  • 60. Release Summary Cloudbreak ● Its own project (separate from Ambari) ● Supported on Linux flavors that support Docker Periscope ● Feature of Cloudbreak 1.0 ● Will be embedded in Ambari later in 2015
  • 61. Release Timeline • Cloudbreak 1.0 GA – June/July 2015 (alongside Ambari 2.1.0, HDP “Dal” / 2.3) • Cloudbreak Incubator Proposal – July/August 2015 (est) • Cloudbreak 1.1 – August 2015 (est) (alongside Ambari 2.1.1, HDP “Dal-M10”) • Cloudbreak 2.0 GA – 2H2015 (alongside Ambari 2.2, HDP “Erie” / 2.4)
  • 62. Supported Cloud Environments Cloudbreak with HDP 2.3: • Microsoft Azure – GA • AWS – GA • Google Compute – GA OpenStack (Cloudbreak with HDP 2.3 / HDP 2.4): • OpenStack Community – Tech Preview / Tech Preview • Red Hat OSP – TBD • HP Helion – GA (tentative) • Mirantis OpenStack
  • 63. HDP as a Service
  • 65. Rackspace Cloud Big Data Platform ● Rapidly spin up on-demand HDP clusters ● Integrated with Cloud Files (OpenStack Swift) ● Opt-in for Managed Services by Rackspace Managed Big Data Platform ● Fully managed HDP on dedicated and/or cloud infrastructure ● Leverage Fanatical Support and industry-leading SLAs ● Supported by Rackspace with escalation to Hortonworks
  • 66. CSC
  • 67. HDP on IaaS - Best Practices
  • 68. Microsoft Azure ● Deployment ○ Deploy using Cloudbreak ○ Deploy using the HWX Azure Gallery Image ● Integrated with Azure Blob Storage ● Supported directly by Hortonworks ● Other offerings ○ Microsoft HDInsight ○ HDP Sandbox
  • 69. Azure Deployment Guidelines ● Keep everything in the same Region ● Instance Types ○ Typical: A7 ○ Performance: D14 ○ 8 × 1TB Standard LRS virtual hard disks per server (LRS stores 3 local copies) ● Multiple Storage Accounts are recommended ○ No more than 40 virtual hard disks per Storage Account
  • 70. Azure Blob Store (Object Storage) ● wasb[s]://&lt;containername&gt;@&lt;accountname&gt;.blob.core.windows.net/&lt;path&gt; Can be used as a replacement for HDFS ● Thoroughly tested in HDP release test suites
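Wiring WASB in comes down to a couple of `core-site.xml` properties. A minimal sketch, where the account name, container name and key are placeholders you would replace with your own:

```xml
<!-- core-site.xml: grant Hadoop access to the storage account -->
<property>
  <name>fs.azure.account.key.YOURACCOUNT.blob.core.windows.net</name>
  <value>YOUR_STORAGE_ACCOUNT_ACCESS_KEY</value>
</property>
<!-- optional: make WASB the default filesystem instead of HDFS -->
<property>
  <name>fs.defaultFS</name>
  <value>wasb://yourcontainer@YOURACCOUNT.blob.core.windows.net</value>
</property>
```

With only the key property set, WASB paths work alongside HDFS (`hadoop fs -ls wasb://...`); setting `fs.defaultFS` is what makes it a full HDFS replacement as the slide describes.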
  • 71. Amazon Web Services ● Deploy using Cloudbreak ● Integrated with AWS S3 (object storage) ● Supported directly by Hortonworks
  • 72. Amazon Deployment Guidelines ● Keep everything in the same Region/AZ ● Use instances with Enhanced Networking Master Nodes: ● Choose EBS-Optimized ● Boot: 100GB on EBS ● Data: 4+ × 1TB on EBS Worker Nodes: ● Boot: 100GB on EBS ● Data: Instance Storage ○ EBS can be used, but local storage is preferred Instance Types: ● Typical: d2 family ● Performance: i2 family https://aws.amazon.com/ec2/instance-types/
  • 73. AWS RDS ● Some services rely on MySQL, Oracle or PostgreSQL: ○ Apache Ambari ○ Apache Hive ○ Apache Oozie ○ Apache Ranger ● Use RDS for these instead of managing the databases yourself.
  • 74. AWS S3 (Object Storage) ● s3n:// with HDP 2.2 (Hadoop 2.6) ● s3a:// with HDP 2.3 (Hadoop 2.7) Not currently a direct replacement for HDFS. Recommended to configure access with an IAM Role/Policy ● https://docs.aws.amazon.com/IAM/latest/UserGuide/policies_examples.html#iam-policy-example-s3 ● Example: http://git.io/vLoGY
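The linked AWS docs show full examples; as a minimal sketch, an IAM policy of the kind referenced (the bucket name is a placeholder) grants list access on the bucket and object access under it:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-hdp-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-hdp-bucket/*"
    }
  ]
}
```

Attaching this to an IAM role on the instances avoids embedding access keys in Hadoop configuration files.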
  • 75. Google Cloud ● Deploy using ○ Cloudbreak ○ Google bdutil with the Apache Ambari plug-in ● Integrated with Google Cloud Storage ● Supported directly by Hortonworks
  • 76. Google Deployment Guidelines ● Instance Types ○ Typical: n1-standard-4 with a single 1.5TB persistent disk ○ Performance: n1-standard-8 with 1TB SSD ● Google GCS (Object Storage) ● gs://&lt;CONFIGBUCKET&gt;/dir/file ● Not currently a replacement for HDFS
  • 77. S3 &amp; GCS as Secondary Storage Systems The connectors are currently eventually consistent, so they do not replace HDFS. Backup ● Falcon, DistCp, hadoop fs, HBase ExportSnapshot ● A Kafka+Storm bolt sends messages to S3/GCS, providing a backup and point-in-time recovery source Input/Output ● Convenient &amp; broadly used upload/download method ○ As middleware to ease integration with Hadoop &amp; limit access ● Publishing static content (optionally with CloudFront) ○ Removes the need to manage any web services ● Storage for temporary/ephemeral clusters
  • 79. $ shutdown -h now - HDP 2.3 - http://hortonworks.com/ - Hadoop Summit recordings: - http://2015.hadoopsummit.org/san-jose/ - http://2015.hadoopsummit.org/brussels/ - Past & Future workshops: - http://hortonworks.com/partners/learn/