Deploying Hadoop on OpenStack has never been easier, and the collaboration between Hortonworks and Cisco over the last few months has made it completely automated and seamless.
This is a cautionary statement: this presentation may include product and collaboration direction that is subject to change.
We were founded in 2011 by 24 developers from Yahoo, where Hadoop was conceived to address data challenges at internet scale. What we now know as Hadoop really started in 2005, when a team at Yahoo was directed to build out a large-scale data storage and processing technology that would allow them to improve their most critical application, Search.
Their challenge was essentially two-fold. First they needed to capture and archive the contents of the internet, and then process the data so that users could search through it effectively and efficiently. Clearly traditional approaches were both technically (due to the size of the data) and commercially (due to the cost) impractical. The result was the Apache Hadoop project, which delivered large-scale storage (HDFS) and processing (MapReduce).
Today we are over 600 employees and have partnered with over 1,000 companies that are leaders in the data center.
We have also been fortunate to achieve significant customer adoption, with over 330 customers as of the end of 2014, spanning nearly every vertical.
Hortonworks was founded with the sole intent to make Hadoop an enterprise data platform. With YARN as its foundation, HDP delivers a centralized architecture with true multi-tenancy for data processing and shared services for security, governance and operations to satisfy enterprise requirements, all deeply integrated and certified with leading datacenter technologies.
We are uniquely focused on this transformation of Hadoop and do our work completely in open source. This is all predicated on our leadership in the community, which enables us not only to best support users of the platform but also to uniquely represent customer requirements within this open, thriving community.
Hortonworks' approach is quite clear… we are focused on delivering enterprise-grade Hadoop as a reliable data platform that will enable your transition to a modern data architecture. To this end, we work solely within the broad open source community, with a focus on innovation at the core of Apache Hadoop with YARN as a foundation, and then within all the related projects that deliver on the key requirements for the enterprise such as governance, security and operations.
Since our inception just three years ago, we have grown to more than 450 employees and have partnered closely with the leaders in the datacenter, all of whom share this vision: to enable a modern data architecture with Hadoop in order to allow their customers to address the architectural challenge they all face due to exploding data volumes.
Hortonworks' open platform approach enables us to partner and co-exist with other data center technologies. Our deep engineering relationships with data center leaders like Cisco make it possible for customers to augment their data centers with Hadoop technologies for their next-generation modern data architecture.
Hortonworks' Hadoop platform already enables deploying Hadoop in any environment, from Linux to Windows and from bare metal to cloud, so that the choice of deployment environment is a business decision rather than a technical one. Continuing this Hadoop Everywhere vision, Hortonworks' recent acquisition of SequenceIQ added a provisioning and auto-scaling toolset that makes it even easier to deploy Hadoop in private and public clouds, accelerating the time-to-value of a Hadoop deployment.
Cloudbreak was developed by SequenceIQ, a company from the beautiful city of Budapest. Hortonworks acquired them in April.
Cloudbreak is open source under the Apache 2.0 license and uses many other open source technologies as its building blocks, including Docker.
It is a Hadoop cluster deployment and management tool that can deploy any app- or use-case-specific Hadoop cluster to public and private cloud environments in a matter of minutes.
It also provides ongoing cluster infrastructure management, including policy-based auto-scaling of clusters to optimize infrastructure usage.
Cloudbreak enables launching a Hadoop cluster in 4 easy steps.
Cloudbreak supports heterogeneous instance types when building the Hadoop cluster, since not all services or service components have the same resource requirements.
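The cluster layout that makes heterogeneous instances possible is expressed through Ambari host groups in the blueprint: each group can map to a different instance flavor. A trimmed, illustrative blueprint (component lists abbreviated, names hypothetical) might look like:

```json
{
  "Blueprints": {
    "blueprint_name": "bi-analytics",
    "stack_name": "HDP",
    "stack_version": "2.3"
  },
  "host_groups": [
    {
      "name": "master",
      "cardinality": "1",
      "components": [
        { "name": "NAMENODE" },
        { "name": "RESOURCEMANAGER" },
        { "name": "HBASE_MASTER" }
      ]
    },
    {
      "name": "worker",
      "cardinality": "3",
      "components": [
        { "name": "DATANODE" },
        { "name": "NODEMANAGER" },
        { "name": "HBASE_REGIONSERVER" }
      ]
    }
  ]
}
```

A real blueprint would also list client and supporting components (ZooKeeper, clients, etc.); the point here is only that "master" and "worker" groups can be backed by differently sized instances.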
Cloudbreak not only simplifies Hadoop cluster provisioning in the Cisco OpenStack cloud but also automatically scales Hadoop clusters based on SLA-based or time-based policies. The SLA is monitored through Hadoop service metrics captured by Ambari. This way Cloudbreak gives you elastic Hadoop clusters very quickly in the Cisco OpenStack cloud.
Cloudbreak actively monitors Ambari metrics to assess the health of every Hadoop service. It allows defining policies based on these metrics for every cluster deployed and enabled for auto-scaling. Based on these metrics and user-defined policies, Cloudbreak can scale clusters or services by adding nodes or allocating more YARN containers, depending on the type of Hadoop service.
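The core of such a metric-driven policy can be sketched as a simple threshold check. This is a hypothetical illustration of the decision logic, not Cloudbreak's actual API (all names here are made up for the sketch):

```python
# Hypothetical sketch of an SLA-based scaling decision, illustrating the
# kind of policy Cloudbreak evaluates against Ambari metrics. The names
# and structure are illustrative, not Cloudbreak's actual API.

def scaling_decision(metric_value, threshold, operator, adjustment):
    """Return the node-count adjustment if the policy fires, else 0.

    metric_value: current Ambari metric (e.g. pending YARN containers)
    threshold:    policy threshold for that metric
    operator:     'GREATER' or 'LESS'
    adjustment:   nodes to add (positive) or remove (negative)
    """
    if operator == 'GREATER' and metric_value > threshold:
        return adjustment
    if operator == 'LESS' and metric_value < threshold:
        return adjustment
    return 0

# Example: scale out by 2 nodes when pending containers exceed 75,
# scale in by 1 node when cluster memory utilization drops below 20%.
print(scaling_decision(90, 75, 'GREATER', 2))   # fires: prints 2
print(scaling_decision(15, 20, 'LESS', -1))     # fires: prints -1
```

In the real product the policy is configured through the Cloudbreak UI/API and evaluated periodically against metrics pulled from Ambari.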
A view from 10,000 feet.
The only thing it needs is a Docker daemon. All cloud providers are moving toward Docker, including Cisco Intercloud.
Quick question: how many of you have used Docker before?
Docker is a container-based virtualization framework. It is an open platform for developers and admins to build, ship, and run distributed applications.
Consisting of Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. Think of Docker as a lightweight, portable VM, but without the overhead of a VM.
Unlike traditional virtualization Docker is fast, lightweight and easy to use. Docker allows you to create containers holding all the dependencies for an application. Each container is kept isolated from any other, and nothing gets shared.
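The "all dependencies in one container" idea looks like this in practice. A minimal, illustrative Dockerfile (the app name and base image are hypothetical):

```dockerfile
# Illustrative only: package an app together with all of its
# dependencies so the same image runs identically in dev, QA, and prod.
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y openjdk-7-jre-headless
COPY app.jar /opt/app/app.jar
CMD ["java", "-jar", "/opt/app/app.jar"]
```

Building (`docker build -t myapp .`) and running (`docker run -d myapp`) produces an isolated container holding exactly this filesystem, sharing nothing with other containers by default.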
Steps: Swarm can spin up Docker containers remotely on hosts, considering:
1. Resource management – it is aware of the cluster's resources (e.g. it can schedule with bin packing: place a container anywhere 1 GB of memory is available) or place containers randomly.
2. Constraints using labels (label a node and start the container based on labels).
3. Affinity – containers can be co-scheduled (link, volumes-from, net=container on the same host).
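The three scheduling strategies above map to classic Docker Swarm's filters. A hedged sketch, with hypothetical image and label names:

```shell
# Illustrative classic-Swarm (v1) scheduling examples; image names and
# label values are hypothetical.

# 1. Resource management: Swarm bin-packs this container onto any node
#    with 1 GB of memory available.
docker run -d -m 1g myorg/worker

# 2. Constraints: only schedule on nodes labeled storage=ssd.
docker run -d -e constraint:storage==ssd myorg/datanode

# 3. Affinity: co-schedule on the same host as the "namenode" container.
docker run -d -e affinity:container==namenode myorg/datanode
```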
The best of Hadoop, Docker, and OpenStack in a single cloud platform for our joint customers.
Description    Texas 3           GCP
VM types       GP2-2Xlarge       n1-standard-8
Cores          8                 8
Memory         32 GB             30 GB
Volume size    2 x 400 GB        2 x 400 GB
Volume type    HDD (magnetic)    generic (magnetic)
We are expanding our Cloud strategy to meet Enterprise customer demand.
Look at the top first. We’ve done a great job of taking our platform for Private Cloud and provisioning Enterprise workloads. We’ve done a great job with UCS, with Vblock, with FlexPod. As a matter of fact, we are the leader in converged infrastructure today, and that market is expanding as customers look to Cisco and our Partners to deliver the Enterprise workloads and the benefits of Private Cloud. They’re also asking for DevOps models. They want to create truly native applications for the Public Cloud. They want to harness the value of Hadoop, Big Data Analytics, and HANA. And they want to leverage the collaborative platforms present today. We are the leader in Private Cloud infrastructure.
Along the left-hand side, our Partners have done some amazing things: 3 million seats of HCS, and the IaaS platforms they’ve invested in – small, medium, large, and local community-based infrastructure platforms. Some Partners have enabled the PaaS platform. Some Partners are hosting Microsoft applications, like Dimension Data does today…globally around the world. Some Partners have managed to build a Citrix or VMware virtual desktop offer.
So what Cisco Cloud Services offers is an engine to generate more services to augment the capabilities we’ve invested in, and to do so in a way that only we could do together. You’ll see us leverage the extensions through innovations in the WebEx platform. You’ll see that Meraki is a very powerful model to continue to expand. You’ll hear more about the portfolio of Unified Threat Defense, and comprehensive threat defense that we think only we can bring to the cloud.
You’ll see more about analytics, and the Platforms that we have in store. You’ll soon see more about HANA-as-a-Service. And all the capabilities we can bring will be an acceleration of those offers that we can bring to you. Why not accelerate all of our capabilities together, using our capabilities in a way that no one else has. And by the way, we can’t ignore the big Public Clouds. Let’s use the Intercloud Fabric™ manager when appropriate to just move a workload out to that Public Cloud. I don’t care if it’s Azure, or Amazon, or Google. Only Cisco can do this through some of the innovations that we have.
How are we going to do this?
Cisco Intercloud Fabric: Solution Overview
Hadoop on Docker
On Cisco InterCloud
Innovation Architect, CIS CTO Group
Dmitri Chtchourov Rakesh Saha
• Developed by SequenceIQ
• Open source with an Apache 2.0 license [Apache project soon]
• Deploys selected services to public and private clouds via Docker containers
• Elastic – can spin up any number of nodes, add/remove on the fly
• Provides full cloud lifecycle management
BI / Analytics (Storm, HBase, Hive)
Launch HDP on Any Cloud for Any Application
Dev / Test (all HDP services)
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP!
IoT Apps, BI / Analytics, Data Science, Dev / Test
Hadoop in Cloud Provisioning with Cloudbreak
BI / Analytics (Storm, HBase, Hive)
Dev / Test (all HDP services)
• Policies based on any Ambari metrics
• Coordinates with YARN
• Policies are based on metrics or time
• Scaling can be at the service or component level
Optimize cloud usage via Elastic Clusters
Provisioning – How it works
Start VMs with a running Docker daemon
Orchestrate containers via the Swarm API
[Diagram: the Docker "matrix" – payloads (static website, web frontend, user DB, queue, analytics DB) across environments (VM, QA server, public cloud)]
Docker is a “Shipping Container” System for Code
• An engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container
• Build once, run anywhere
• A lightweight, portable VM – without the overhead of a VM
• Automated and scripted
Why Is Docker So Exciting?
Build once…run anywhere
• A clean, safe, and portable runtime environment for your app
• No missing dependencies, packages, etc.
• Run each app in its own isolated container
• Automate testing, integration, and packaging
• Reduce/eliminate concerns about compatibility on different platforms
• Cheap, zero-penalty containers to deploy
Configure once…run anything
• Make the entire lifecycle more efficient, consistent, and repeatable
• Eliminate inconsistencies between SDLC environments
• Support segregation of duties
• Significantly improves the speed and reliability of CI/CD
• Significantly more lightweight than VMs
Docker: Containers vs. VMs
[Diagram: VMs running on a Type 2 hypervisor over the host OS kernel vs. containers sharing the host OS kernel]
Containers are isolated and share only the kernel
…the result is significantly faster deployment, much less overhead, easier migration, and faster restart
HDP as Docker Containers
• Running an Ambari cluster in containers
• Use a Blueprint to define services
• All HDP services share a single container
Cloud Provider/Bare Metal
Run Hadoop as Docker Containers
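Spinning up a containerized Ambari cluster can be sketched roughly as follows. This is illustrative only: sequenceiq/ambari was SequenceIQ's public image, but the exact tags and startup arguments may differ from what is shown here.

```shell
# Illustrative only – consult the image documentation for the actual
# entrypoints and options.
docker run -d --name ambari-server -p 8080:8080 sequenceiq/ambari
docker run -d --name ambari-agent --link ambari-server:ambari-server sequenceiq/ambari
```

Once the server is up, an Ambari blueprint is posted to it to lay out the HDP services across the agent containers.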
Cisco and Hortonworks’ Partnership
100% open source Hadoop Distribution,
Support and Training
Integrated Infrastructures for Big Data
Cisco and Hortonworks are partnering to help you build your big data solution and reach massive scalability, superior efficiency and dramatically lower total cost of ownership, thanks to a validated joint architecture.
Results of the collaboration
• Efficient Hadoop as a Service
• Adoption of Docker for big data workloads
15:04 mins    11:55 mins
Teragen (avg of 3 executions):       7:08 mins     22:15 mins
Terasort (avg of 3 executions):      32:09 mins    60:12 mins
Teravalidate (avg of 3 executions):  2:31 mins     10:40 mins
Observations
• Docker is maturing inside enterprises
• Interest in running Docker on top of bare metal
• Big data app developers are leaning towards containerization of apps
• YARN is becoming an application deployment platform beyond big data
• Demand for natively containerized, fully managed apps on YARN
Future Collaboration
• Run Docker natively on OpenStack bare metal
• Run Docker on YARN
HDP + Cisco InterCloud - Efficient Hadoop-as-a-service
Download the Hortonworks Sandbox
Build Your Analytic App
Try Hadoop 2
More about Cisco & Hortonworks
More about Hortonworks’ Acquisition of SequenceIQ