Hadoop on Docker


This presentation, given at Cisco Live 2015, describes how Hortonworks delivers Hadoop on Docker as a cloud-agnostic deployment approach.

Published in: Software

  1. Docker-Based Hadoop Provisioning on Cisco InterCloud • Dmitri Chtchourov, Innovation Architect, CIS CTO Group, Cisco • Rakesh Saha, Product Management, Hortonworks
  2. © Hortonworks Inc. 2011 – 2015. All Rights Reserved. Cautionary Statement Regarding Forward-Looking Statements: This presentation contains forward-looking statements involving risks and uncertainties. Such forward-looking statements in this presentation generally relate to future events, our ability to increase the number of support subscription customers, the growth in usage of the Hadoop framework, our ability to innovate and develop the various open source projects that will enhance the capabilities of the Hortonworks Data Platform, anticipated customer benefits and general business outlook. In some cases, you can identify forward-looking statements because they contain words such as “may,” “will,” “should,” “expects,” “plans,” “anticipates,” “could,” “intends,” “target,” “projects,” “contemplates,” “believes,” “estimates,” “predicts,” “potential” or “continue” or similar terms or expressions that concern our expectations, strategy, plans or intentions. You should not rely upon forward-looking statements as predictions of future events. We have based the forward-looking statements contained in this presentation primarily on our current expectations and projections about future events and trends that we believe may affect our business, financial condition and prospects. We cannot assure you that the results, events and circumstances reflected in the forward-looking statements will be achieved or occur, and actual results, events, or circumstances could differ materially from those described in the forward-looking statements. The forward-looking statements made in this prospectus relate only to events as of the date on which the statements are made and we undertake no obligation to update any of the information in this presentation. Trademarks: Hortonworks is a trademark of Hortonworks, Inc. in the United States and other jurisdictions. Other names used herein may be trademarks of their respective owners.
  3. Speakers • Rakesh Saha, Product Management, Hortonworks • Dmitri Chtchourov, Innovation Architect, CIS CTO Group, Cisco
  4. Agenda • About Hortonworks • Cloudbreak: Docker-based Hadoop provisioning tool • Introduction to Docker • Hadoop Provisioning using Docker • Cisco and Hortonworks Collaboration
  5. About Hortonworks • ONLY 100% open source Apache Hadoop data platform • Founded in 2011 • 1st Hadoop distribution to go public: IPO Fall 2014 (NASDAQ: HDP) • 322 subscription customers • 600+ employees across 17 countries • 1000+ technology partners
  6. Hortonworks Mission: Power your Modern Data Architecture with HDP and Enterprise Apache Hadoop. Customer Momentum • 300+ customers in seven quarters, growing at 75+/quarter • Two thirds of customers come from the F1000. Hortonworks and Hadoop at Scale • HDP in production on the largest clusters on the planet • Multiple 1000+ node clusters, including 35,000 nodes at Yahoo! and 800 nodes at Spotify. Company • Founded in 2011 • Original 24 architects, developers, operators of Hadoop from Yahoo! • We are leaders in the Hadoop community • 500+ employees
  7. HDP is deeply integrated in the data center • Sources: Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured • Data systems: RDBMS, EDW, MPP • Applications • Operational tools • Dev & data tools • Infrastructure • HDP: Governance & Integration, Security, Operations, Data Access, Data Management, YARN • Deep Partnerships: Hortonworks engages in deep engineered relationships with the leaders in the data center, such as Cisco, Microsoft, EMC, Pivotal, Teradata, Red Hat, SAS & SAP. • Broad Partnerships: Over 1,000 partners work with us to certify their applications to work with Hadoop so they can extend big data to their users.
  8. Agenda • Cloudbreak • Docker • Provisioning • Collaboration
  9. Cloudbreak • Developed by SequenceIQ • Open source with Apache 2.0 license [ Apache project soon ] • Deploys selected services to public and private cloud via Ambari Blueprints • Elastic: can spin up any number of nodes, add/remove on the fly • Provides full cloud lifecycle management post-deployment
  10. Launch HDP on Any Cloud for Any Application • Cloudbreak: 1. Pick a Blueprint 2. Choose a Cloud 3. Launch HDP! • Example Ambari Blueprints: IoT Apps (Storm, HBase, Hive), BI / Analytics (Hive), Data Science (Spark), Dev / Test (all HDP services)
  11. Hadoop in Cloud Provisioning with Cloudbreak • Create Templates • Provide Blueprint • Associate Credentials • Launch Cluster
  12. Provisioning: Template • Create Template • Provide Blueprint • Associate Credentials • Launch Cluster
  13. Provisioning: Blueprint • Create Template • Provide Blueprint • Associate Credentials • Launch Cluster
  14. Provisioning: Provider Credentials • Create Template • Provide Blueprint • Associate Credentials • Launch Cluster
  15. Provisioning: Launch • Create Template • Provide Blueprint • Associate Credentials • Launch Cluster
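
The "Provide Blueprint" step in the slides above can be illustrated with a minimal sketch of an Ambari blueprint document, built here in Python. The blueprint name, host-group names, component selection, and stack version are illustrative assumptions rather than values from the presentation, and the layout is deliberately incomplete (a working cluster needs more components). Cloudbreak's own template and credential objects are not modeled.

```python
import json


def make_blueprint(name, stack_version, host_groups):
    """Build an Ambari-style blueprint dict.

    host_groups maps a group name to (cardinality, [component names]).
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": group,
                "cardinality": str(cardinality),
                "components": [{"name": component} for component in components],
            }
            for group, (cardinality, components) in host_groups.items()
        ],
    }


# Hypothetical two-group layout: one master node and three workers.
blueprint = make_blueprint(
    "hdp-small-example",  # assumed name, not from the slides
    "2.2",                # assumed HDP version
    {
        "master": (1, ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"]),
        "worker": (3, ["DATANODE", "NODEMANAGER"]),
    },
)

if __name__ == "__main__":
    print(json.dumps(blueprint, indent=2))
```

The resulting JSON is the kind of document that is registered with Ambari before a cluster is created from it; which host groups map to which cloud instances is decided separately, at launch time.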
  16. Specialized Blueprints: Quick productivity with pre-configured cluster blueprints • Lambda Architecture • Machine Learning • Batch ETL • …
  17. Autoscaling Policy: Optimize cloud usage via Elastic Clusters • Policies based on any Ambari metrics • Coordinates with YARN • Policies are based on Metrics or Time • Scaling can be service or component type specific • Workloads shown: BI / Analytics (Hive), IoT Apps (Storm, HBase, Hive), Dev / Test (all HDP services), Data Science (Spark)
  18. Scaling for Static and Dynamic Clusters • Each cluster has an Auto-scale Policy, YARN, and Ambari • Ambari Metrics and Ambari Alerts feed Cloudbreak • Cloudbreak enforces policies and scales the cluster/YARN apps • Cloudbreak provisioning covers both static and dynamic clusters
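
The two policy kinds named in the autoscaling slides (metric-based and time-based) can be sketched as follows. This is a toy model under stated assumptions, not Cloudbreak's implementation: the class names, the `pending_containers` metric name, the thresholds, and the node limits are all hypothetical.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class MetricPolicy:
    """Scale when a metric exceeds a threshold (hypothetical model)."""
    metric: str
    threshold: float
    adjustment: int  # nodes to add (positive) or remove (negative)

    def decide(self, metrics, now):
        value = metrics.get(self.metric, 0.0)
        return self.adjustment if value > self.threshold else 0


@dataclass
class TimePolicy:
    """Scale inside a daily time window that does not cross midnight."""
    start: time
    end: time
    adjustment: int

    def decide(self, metrics, now):
        return self.adjustment if self.start <= now < self.end else 0


def target_size(current, policies, metrics, now, min_nodes=1, max_nodes=10):
    """Sum the adjustments of all policies and clamp the resulting size."""
    delta = sum(policy.decide(metrics, now) for policy in policies)
    return max(min_nodes, min(max_nodes, current + delta))


# Example policies: add two nodes under load, drop one node overnight.
policies = [
    MetricPolicy("pending_containers", threshold=10.0, adjustment=2),
    TimePolicy(start=time(1, 0), end=time(5, 0), adjustment=-1),
]

if __name__ == "__main__":
    print(target_size(4, policies, {"pending_containers": 25.0}, time(14, 30)))
    print(target_size(4, policies, {}, time(2, 0)))
```

Clamping to a minimum and maximum node count is a design choice of this sketch; it keeps a misconfigured policy from shrinking the cluster to nothing or growing it without bound.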
  19. Provisioning: How it works • Start VMs with a running Docker daemon • Cloudbreak Bootstrap: start Consul cluster, then start Swarm cluster (Consul for discovery) • Start Ambari servers/agents via the Swarm API • Ambari services registered in Consul (Registrator) • Post Blueprint
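
The ordering of the bootstrap steps on this slide can be made explicit as a dependency graph. The sketch below is only an illustration of that ordering using Python's standard `graphlib` module (Python 3.9+); the step names are hypothetical labels for the slide's bullets, and no Docker, Consul, Swarm, or Ambari API is called.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
# Step names are illustrative labels, not Cloudbreak identifiers.
BOOTSTRAP_DEPENDENCIES = {
    "start_vms_with_docker": set(),
    "start_consul_cluster": {"start_vms_with_docker"},
    "start_swarm_cluster": {"start_consul_cluster"},
    "start_ambari_containers": {"start_swarm_cluster"},
    "register_ambari_in_consul": {
        "start_ambari_containers",
        "start_consul_cluster",
    },
    "post_blueprint": {"register_ambari_in_consul"},
}


def bootstrap_order(dependencies):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step in bootstrap_order(BOOTSTRAP_DEPENDENCIES):
        print(step)
```

Swarm depends on Consul here because the slide states that Consul is used for discovery; the blueprint is posted last because Ambari must be up and discoverable before it can build the cluster.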
  20. Agenda • Cloudbreak • Docker • Provisioning • Collaboration
  21. Docker is a “Shipping Container” System for Code • Multiplicity of stacks: Static website, Web frontend, User DB, Queue, Analytics DB • Multiplicity of hardware environments: Development VM, QA server, Public Cloud, Contributor’s laptop, Production Cluster, Customer Data Center • An engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container
  22. Docker • Lightweight, portable • Build once, run anywhere • VM, without the overhead of a VM • Isolated containers • Automated and scripted
  23. Why Is Docker So Exciting? For Developers: Build once…run anywhere • A clean, safe, and portable runtime environment for your app • No missing dependencies, packages, etc. • Run each app in its own isolated container • Automate testing, integration, packaging • Reduce/eliminate concerns about compatibility on different platforms • Cheap, zero-penalty containers to deploy services. For DevOps: Configure once…run anything • Make the entire lifecycle more efficient, consistent, and repeatable • Eliminate inconsistencies between SDLC stages • Support segregation of duties • Significantly improve the speed and reliability of CI/CD • Significantly more lightweight than VMs
  24. Docker: Containers vs. VMs • VM stack: apps (App A, App A’, App B), each with its own Bins/Libs and Guest OS, on a Hypervisor (Type 2), Host OS, Server • Container stack: apps (AppA, AppB) with their bins/libs, on Docker, Host OS kernel, Server • Containers are isolated, share only the kernel • …result is significantly faster deployment, much less overhead, easier migration, faster restart
  25. Agenda • Cloudbreak • Docker • Provisioning • Collaboration
  26. HDP as Docker Containers via Cloudbreak • Running Ambari Cluster in Containers • Use Blueprint to define services • All HDP services share a single container • Diagram (Run Hadoop as Docker Containers): Cloudbreak provisions VMs from cloud providers, installs Ambari on the VMs, and instructs Ambari to build the HDP cluster • Stack: HDP, Ambari, Docker, Linux VM, Cloud Provider/Bare Metal
  27. Swarm + Consul for Placement and Discovery
  28. Cloudbreak: Run Hadoop as Docker containers • Diagram: six Docker hosts
  29. Cloudbreak: Run Hadoop as Docker containers • Diagram: six Docker hosts running one amb-ser container and five amb-agn containers • Blueprint
  30. Cloudbreak: Run Hadoop as Docker containers • Diagram: six Docker hosts running amb-ser and five amb-agn containers with services: (hdfs, hbase), (hdfs, hive), (hdfs, yarn), (hdfs, zookpr), (nmnode, hdfs)
  31. Benefits of running Hadoop on Docker • Quick installation with pre-pulled rpms • Same process/images for dev/qa/prod • Same process for single/multi-node
  32. Demo
  33. Agenda • Cloudbreak • Docker • Provisioning • Collaboration
  35. Results of the collaboration • Efficient Hadoop as a service • Adoption of Docker for enterprise Hadoop deployment
      Task                                   Cisco InterCloud   Public Cloud Provider
      HDP installation                       15:04 mins         11:55 mins
      Teragen (avg. of 3 executions)         7:08 mins          22:15 mins
      Terasort (avg. of 3 executions)        32:09 mins         60:12 mins
      Teravalidate (avg. of 3 executions)    2:31 mins          10:40 mins
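
The timings on the results slide are easier to compare as ratios. The sketch below converts each `mm:ss` value to seconds and divides the public-cloud time by the Cisco InterCloud time. It assumes the first figure in each row belongs to Cisco InterCloud and the second to the public cloud provider, following the column order on the slide; the function and variable names are illustrative.

```python
def to_seconds(mmss):
    """Convert a 'mm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (Cisco InterCloud, public cloud provider), as listed on the slide.
RESULTS = {
    "HDP installation": ("15:04", "11:55"),
    "Teragen": ("7:08", "22:15"),
    "Terasort": ("32:09", "60:12"),
    "Teravalidate": ("2:31", "10:40"),
}


def public_to_intercloud_ratio(results):
    """Ratio > 1 means the task took longer on the public cloud."""
    return {
        task: to_seconds(public) / to_seconds(intercloud)
        for task, (intercloud, public) in results.items()
    }


if __name__ == "__main__":
    for task, ratio in public_to_intercloud_ratio(RESULTS).items():
        print(f"{task}: {ratio:.2f}x")
```

Under this reading, the three Tera* benchmarks ran faster on Cisco InterCloud, while the HDP installation itself was quicker on the public cloud.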
  36. Observations • Docker is maturing inside enterprises • Interest to run Docker on top of bare metal • Big data app developers are leaning towards containerization of apps • YARN is becoming an application deployment platform beyond big data apps • Demand for native containerized fully managed apps on YARN. Future Collaboration • Run Docker natively on OpenStack • Run Docker on YARN • OpenStack bare metal
  37. Conclusion: HDP + Cisco InterCloud = Efficient Hadoop-as-a-service • Blueprints: Data Science, IoT, BI / Analytics, Dev / Test • HDP
  38. Learn More • Download the Hortonworks Sandbox • Learn Hadoop • Build Your Analytic App • Try Hadoop 2 • More about Cisco & Hortonworks • More about Hortonworks’ Acquisition of SequenceIQ