Apache Ambari BOF - OpenStack - Hadoop Summit 2013



Apache Ambari BOF Meet Up @ Hadoop Summit 2013



Published in: Technology


  1. © Hortonworks Inc. 2013. Hadoop + OpenStack Integration Roadmap. Himanshu Bari, Sr. Product Manager (hbari@hortonworks.com). June 28th, 2013.
  2. Disclaimer
     •  This document may contain product features and technology directions that are under development or may be under development in the future.
     •  Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all affect timing and final delivery.
     •  This document's description of these features and technology directions does not represent a contractual commitment from Hortonworks to deliver these features in any generally available product.
     •  Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
  3. Agenda: Why Hadoop on OpenStack; Use cases; A bit under the hood
  4. Big Data & Cloud Intersection Point → 2013: Big Data & Cloud are a top priority for CIOs
  5. OpenStack is an open source cloud management platform (Apache License)
     •  Glance: Image Service
     •  Keystone: Identity Service
     •  Horizon
     •  Quantum
     •  Nova
     •  Cinder: Block Store
     •  Swift: Object Store
     •  Ceilometer: Metering
     •  Heat: Orchestration
     Integrated. Multi-hypervisor & guest OS support.
  6. OpenStack has overtaken Amazon AWS in market awareness… Source: Google Trends
  7. Maturing quickly with broad support: pushed by 150+ vendors; millions of dollars in venture capital; early adoption across all verticals
  8. Why Hadoop & OpenStack? Marries two of the largest open source movements
     Hadoop provides a greenfield use case
     •  Net new workload
     •  Needs scale out infrastructure
     •  Shared platform
     OpenStack provides the perfect cloud platform
     •  Operational agility
     •  Supports scale out architecture
     •  Deployment choice across public & private clouds
     1.  Open source communities provide the fastest path to innovation
     2.  Open source is changing the game as economics and accessibility serve to accelerate cloud & big data market trends
     3.  Both are attracting major ecosystem players: IBM, RHT, HP, RAX, etc.
  9. Accelerate Adoption of Hadoop on OpenStack: the leading contributor to Apache Hadoop; the leading system integrator for OpenStack; the leading contributor to OpenStack. Apache Hadoop… the killer app for OpenStack
  10. Collaborating on Project Savanna: automate deployment of Apache Hadoop on OpenStack
      [Diagram labels: OpenStack Infrastructure; Savanna (Elastic Hadoop Controller); Swift storage; Hadoop Cluster; Ambari (Hadoop management)]
      1.  Cluster templates: deploy preconfigured Hadoop clusters in seconds from Horizon or Ambari
      2.  HDFS-Swift connectors: move data between HDFS and Swift object storage
      3.  Simplified elasticity
  11. Agenda: Why Hadoop on OpenStack; Use cases; A bit under the hood
  12. Project Savanna design goals
      -  Focus on API driven tight integration
      -  Hide Hadoop complexity through APIs
      -  “It Just Works” experience
      -  Fully leverage virtualization
      -  Scalability, Reliability, Performance
  13. Problems driving use cases
      Cluster requirements vary by business unit, data type & analytics use case (Finance, Compliance, IT, Marketing; Web, Mobile, Sensor; Interactive, Batch; Dev, QA, Prod):
      -  Operational nightmare of supporting multiple cluster flavors
      -  Lack of agility
      -  Underutilized resources
      -  Maintenance complications
      -  Can't migrate from public to private cloud
  14. Provisioning related use cases
      -  Frequent dev/test/staging cluster provision requests
      -  Migrations from staging to prod and vice versa
      -  Reduce operator error in cluster provisioning
      -  Migrate away from Amazon EMR for ad hoc analytics requests to support experimentation
  15. Simplified provisioning
      Phase-1: Template based provisioning
      -  Cluster template, e.g. QA cluster
      -  Node template: (a) resource based: node.Large; (b) function based: node.NameNode
      -  Use as is: single click provisioning
      -  Modify: update VM resource allocation, service to VM mapping and service config
      -  Provision and/or save template
      Phase-2: Hadoop as a service (job flow based provisioning)
      -  Pick job type + Cascading, streaming & custom jar
      -  Upload data to Swift
      -  Get results in Swift
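The "use as is" and "modify" paths for cluster and node templates can be pictured with a minimal sketch. Every class, field, and function name below is hypothetical and chosen only for illustration; none of it is Savanna's or Ambari's actual API.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NodeTemplate:
    """Hypothetical node template, e.g. node.Large or node.NameNode."""
    name: str
    vcpus: int
    memory_mb: int
    services: Tuple[str, ...] = ()  # Hadoop services mapped to this VM


@dataclass(frozen=True)
class ClusterTemplate:
    """Hypothetical cluster template: node template name -> VM count."""
    name: str
    node_counts: Dict[str, int] = field(default_factory=dict)


def provision_request(cluster: ClusterTemplate,
                      nodes: Dict[str, NodeTemplate]) -> List[dict]:
    """Expand a cluster template into one VM spec per node ("use as is")."""
    vms = []
    for node_name, count in cluster.node_counts.items():
        node = nodes[node_name]
        for i in range(count):
            vms.append({
                "hostname": f"{cluster.name}-{node.name}-{i}",
                "vcpus": node.vcpus,
                "memory_mb": node.memory_mb,
                "services": list(node.services),
            })
    return vms


NODES = {
    "node.NameNode": NodeTemplate("node.NameNode", 4, 16384, ("NAMENODE",)),
    "node.Large": NodeTemplate("node.Large", 8, 32768, ("DATANODE",)),
}
QA = ClusterTemplate("qa", {"node.NameNode": 1, "node.Large": 3})

# "Modify": derive a new template without touching the saved one.
QA_BIG = replace(QA, name="qa-big",
                 node_counts={**QA.node_counts, "node.Large": 5})
```

Calling `provision_request(QA, NODES)` corresponds to single click provisioning, while `QA_BIG` shows a modified copy that could be provisioned or saved as a new template.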
  16. Ambari embedded in Horizon
  17. Swift object store support
      Phase-1: Read/write data from/to Swift object stores
      -  Option-1: Copy data from Swift to HDFS, run MapReduce and copy results back to Swift
      -  Option-2: Run MapReduce directly on top of Swift (output data still needs to be copied from HDFS to Swift)
      Phase-2: Bug fixes & optimizations
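The two options differ only in where the job reads its input. The sketch below builds the command lines for each option using the standard `hadoop distcp` and `hadoop jar` commands. It assumes the `swift://<container>.<service>/<path>` URL form of the Hadoop Swift connector; the container, service, jar, class, and staging path names are all hypothetical, as is the convention that the job takes an input and an output path as arguments.

```python
def swift_url(container: str, service: str, path: str) -> str:
    """Build a swift:// URL (container.service form is an assumption)."""
    return f"swift://{container}.{service}/{path.lstrip('/')}"


def option1_commands(container, service, job_jar, main_class):
    """Option-1: copy Swift -> HDFS, run the job on HDFS, copy results back."""
    src = swift_url(container, service, "input")
    dst = swift_url(container, service, "output")
    return [
        ["hadoop", "distcp", src, "hdfs:///staging/input"],
        ["hadoop", "jar", job_jar, main_class,
         "hdfs:///staging/input", "hdfs:///staging/output"],
        ["hadoop", "distcp", "hdfs:///staging/output", dst],
    ]


def option2_commands(container, service, job_jar, main_class):
    """Option-2: read input straight from Swift; output still goes via HDFS."""
    src = swift_url(container, service, "input")
    dst = swift_url(container, service, "output")
    return [
        ["hadoop", "jar", job_jar, main_class, src, "hdfs:///staging/output"],
        ["hadoop", "distcp", "hdfs:///staging/output", dst],
    ]


if __name__ == "__main__":
    for cmd in option2_commands("container-1", "store1",
                                "job.jar", "com.example.Job"):
        print(" ".join(cmd))
```

Option-2 saves the initial copy step but, as the slide notes, the results still have to be copied from HDFS to Swift afterwards.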
  18. Elasticity related use cases
      -  Commission a new node or decommission a node for maintenance
      -  For dev/test/staging clusters: automatically vary cluster data & compute capacity based on tenant, workload, time of day, resource utilization, etc.
      -  Automatically vary compute capacity for production clusters
  19. Elasticity
      [Chart: node elasticity (compute and/or data; Manual or Rule based) against cluster life (Long lived or Short lived; Swift or HDFS used for storage), with regions marked Phase-1 and Phase-2. Cell labels:]
      -  Handle variable workloads, e.g. alter cluster compute node count for peak/off-peak hrs.
      -  Job flow based clusters for ad-hoc analysis
      -  Best for Dev/QA use
      -  Best for predictable workloads
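Rule based elasticity for peak and off-peak hours can be illustrated with a small time-of-day rule table. This is a minimal sketch under assumed semantics; the rule shape, the names, and the node counts are hypothetical and do not come from Savanna.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ScalingRule:
    """Hypothetical rule: desired compute nodes for [start_hour, end_hour)."""
    start_hour: int
    end_hour: int
    compute_nodes: int


def target_compute_nodes(hour: int, rules: Iterable[ScalingRule],
                         default: int) -> int:
    """Return the node count of the first matching rule, else the default."""
    for rule in rules:
        if rule.start_hour <= hour < rule.end_hour:
            return rule.compute_nodes
    return default


def scaling_delta(current: int, target: int) -> int:
    """Positive: nodes to commission. Negative: nodes to decommission."""
    return target - current


PEAK_RULES = [ScalingRule(start_hour=9, end_hour=18, compute_nodes=20)]

if __name__ == "__main__":
    for hour in (3, 10, 22):
        target = target_compute_nodes(hour, PEAK_RULES, default=5)
        print(hour, target, scaling_delta(5, target))
```

A manual mode would skip the rule lookup and pass an operator-chosen target straight to `scaling_delta`.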
  20. Multi-tenancy related use cases
      -  Improve server utilization by creating a common server pool for Hadoop and non-Hadoop workloads
      -  Simplify maintenance & upgrade testing with the ability to run multiple Hadoop clusters with different versions on the same server pool
      -  Support varying SLAs based on tenant and workload through resource isolation provided by VMs
      -  Simplify chargeback/showback
  21. Multi-tenancy
      Phase-1
      •  Access isolation
         •  Single sign-on for Ambari & HUE through Keystone integration
         •  Dedicated Ambari & HUE instance per cluster per tenant
      •  Resource isolation
         •  CPU, memory isolation through VMs
         •  Ability to pin a Hadoop VM to a given set of physical hosts to enable per tenant physical host isolation
      •  Version isolation
         •  Choice of Hadoop versions for tenants
      Phase-2
      •  Access isolation
         •  Single Ambari instance per tenant (multi-cluster support with Ambari)
         •  Keystone enhancements to support Hadoop job flow level RBAC to support Hadoop as a service
  22. Agenda: Why Hadoop on OpenStack; Use cases; A bit under the hood
  23. Savanna logical architecture
      [Diagram labels: OpenStack Infrastructure (Network, Storage, Security, Compute); Savanna Controller; HDP Savanna plugin; API; Hadoop Provisioning; Ambari template management; Horizon + Savanna UI; Configuration; Elasticity; Orchestration; Plugin manager; Hadoop Cluster; Ambari + API]
  24. Provisioning workflow overview
      [Diagram labels: Horizon; Savanna Controller + HDP OpenStack Plugin; Nova; Glance; VM image (OS only OR preloaded with HDP bits); Hadoop Cluster with Ambari Server, HUE, NN, JT, DN, DN]
      -  Cluster request
      -  Provisions vanilla VMs
      -  HDP plugin installs Ambari
      -  HDP plugin passes cluster template to Ambari
      -  Ambari configures all services and starts the cluster
  25. Ambari based cluster templates
      Preconfigured information across all clusters using this template
      -  HDP Stack Information: Services & Components & Packages; Description; Package Dependencies
      -  Hadoop Topology: Component / Host Group Mapping
      -  Hadoop Configuration: all Hadoop configuration for the cluster (hundreds of parameters and their values)
      Per cluster pluggable data
      -  User names
      -  Passwords
      -  Host names
      -  Host VM flavors (CPU/Mem)
      -  Node count per host group
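The split between shared template content and per cluster pluggable data can be sketched as a template dictionary plus a validation step. The dictionary layout and key names are hypothetical and are not Ambari's template format; `dfs.replication` stands in for the hundreds of configuration parameters the slide mentions.

```python
import copy

# Shared across every cluster created from this template (illustrative only).
QA_TEMPLATE = {
    "stack": {"name": "HDP", "services": ["HDFS", "MAPREDUCE"]},
    "host_groups": {
        "master": ["NAMENODE", "JOBTRACKER"],
        "worker": ["DATANODE", "TASKTRACKER"],
    },
    "configuration": {"dfs.replication": "3"},
}

# Per cluster pluggable data named on the slide.
REQUIRED_KEYS = {"user_names", "passwords", "host_names",
                 "vm_flavors", "node_counts"}


def instantiate(template: dict, per_cluster: dict) -> dict:
    """Combine a shared template with per-cluster data, leaving both intact."""
    missing = REQUIRED_KEYS - set(per_cluster)
    if missing:
        raise ValueError(f"missing per-cluster data: {sorted(missing)}")
    unknown = set(per_cluster["node_counts"]) - set(template["host_groups"])
    if unknown:
        raise ValueError(f"unknown host groups: {sorted(unknown)}")
    cluster = copy.deepcopy(template)
    cluster["per_cluster"] = copy.deepcopy(per_cluster)
    return cluster
```

Keeping the template immutable and copying it per cluster means one saved template can back many clusters that differ only in credentials, host names, flavors, and node counts.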
  26. Swift object store support (HADOOP-8545)
      [Diagram labels: a directory Dir containing file1, file2, file3 maps through the HDFS + Swift Bridge to objects Dir/file1, Dir/file2, Dir/file3 in Container-1 and Container-2 across Swift store-1 … Swift store-n; Keystone; MapReduce, Pig & Hive; input data; output results]
      Supported operations: create, read, write, delete, mkdir, ls, mv & stat
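The mapping from a directory tree to flat object names such as `Dir/file1` can be illustrated in a few lines. This is a simplified sketch of directory emulation over a flat namespace, not the connector's implementation; the function names are hypothetical and a plain list of object names stands in for a Swift container.

```python
from typing import List


def object_name(path: str) -> str:
    """Map a filesystem-style path to a flat object name: /Dir/file1 -> Dir/file1."""
    return path.strip("/")


def ls(object_names: List[str], directory: str) -> List[str]:
    """Emulate ls by grouping object names that share the directory prefix."""
    key = object_name(directory)
    prefix = key + "/" if key else ""
    children = set()
    for name in object_names:
        if name.startswith(prefix):
            rest = name[len(prefix):]
            if rest:
                children.add(rest.split("/", 1)[0])
    return sorted(children)


def mv(object_names: List[str], src: str, dst: str) -> List[str]:
    """Emulate mv: every object under src gets a new name under dst."""
    src_key, dst_key = object_name(src), object_name(dst)
    renamed = []
    for name in object_names:
        if name == src_key or name.startswith(src_key + "/"):
            renamed.append(dst_key + name[len(src_key):])
        else:
            renamed.append(name)
    return renamed


CONTAINER_1 = ["Dir/file1", "Dir/file2", "Dir/file3", "Other/x"]
```

The `mv` sketch shows why renaming a directory touches every object beneath it in a flat store, unlike a single metadata update in HDFS.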
  27. Hadoop virtualization extensions (HVE)
      •  Account for the additional ‘node group’ layer so replicas do not end up on VMs in the same hypervisor
      •  Available in HDP 1.3. Work in progress to enable in HDP 2.0 (YARN & HDFS)
      [Topology diagram: Data Center → Rack-1 and Rack-2; each rack has Node group-1 and Node group-2; each node group has VM1 and VM2]
      -  Replica (place, choose & remove) policies
      -  Balancer policies
      -  Task placement & container allocation (YARN)
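The effect of the node group layer on replica placement can be shown with a simplified greedy chooser over the topology on this slide. It is a sketch of the idea only, not HVE's actual placement policy; the `VM` class and `place_replicas` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VM:
    name: str
    rack: str
    node_group: str  # VMs in one node group share a hypervisor


def place_replicas(vms: List[VM], writer: VM, replicas: int = 3) -> List[VM]:
    """Pick replica targets so that no two share a (rack, node group).

    The writer's VM holds the first replica. Off-rack VMs are tried first so
    the replicas span racks. Fewer than `replicas` VMs are returned if there
    are not enough distinct node groups.
    """
    chosen = [writer]
    used_groups = {(writer.rack, writer.node_group)}
    # sorted() is stable; False sorts before True, so off-rack VMs come first.
    candidates = sorted((vm for vm in vms if vm != writer),
                        key=lambda vm: vm.rack == writer.rack)
    for vm in candidates:
        if len(chosen) == replicas:
            break
        group = (vm.rack, vm.node_group)
        if group not in used_groups:
            chosen.append(vm)
            used_groups.add(group)
    return chosen


# Topology from the slide: 2 racks x 2 node groups x 2 VMs.
TOPOLOGY = [
    VM(f"{rack}/{group}/{vm}", rack, group)
    for rack in ("Rack-1", "Rack-2")
    for group in ("Node group-1", "Node group-2")
    for vm in ("VM1", "VM2")
]

if __name__ == "__main__":
    for target in place_replicas(TOPOLOGY, writer=TOPOLOGY[0]):
        print(target.name)
```

Without the node group check, a rack-aware policy could put two replicas on VM1 and VM2 of the same hypervisor, so a single host failure would take out both copies.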