
Hadoop Everywhere & Cloudbreak

Hadoop Everywhere with Hortonworks Data Platform (HDP) & CloudBreak


  1. Hadoop Everywhere. Hortonworks. We do Hadoop.
  2. $ whoami: Sean Roberts, Partner Solutions Engineer, London, EMEA & everywhere. @seano, linkedin.com/in/seanorama. MacGyver. Data Freak. Cook. Autodidact. Volunteer. Ancestral Health. Fito. Couchsurfer. Nomad.
  3. What’s New! - HDP 2.3: http://hortonworks.com/ - Hadoop Summit recordings: http://2015.hadoopsummit.org/san-jose/ and http://2015.hadoopsummit.org/brussels/ - Past & future workshops: http://hortonworks.com/partners/learn/
  4. Agenda ● Hadoop Everywhere ● Deployment challenges & requirements ● Cloudbreak & our Docker approach ● Workshop: your own Cloudbreak, and auto-scaling with Periscope ● Cloud best practices. Reminder: attendee phone lines are muted; please ask questions in the chat.
  5. Page 5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved. Disclaimer: This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed. Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery. This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  6. Hadoop Everywhere
  7. Hadoop Everywhere: Any application (batch, interactive, and real-time). Any data (existing and new datasets). Anywhere (a complete range of deployment options: commodity, appliance, cloud). YARN, the data operating system, supports existing applications, new analytics, and partner applications with batch, interactive, and real-time data access.
  8. Hadoop Up There, Down Here... Everywhere! Hybrid Deployment Choice: Windows, Linux, on-premises or cloud; data “gravity” guides the choice. Compatible Clusters: run applications and data processing workloads wherever and whenever needed (dev/test, BI/ML, IoT apps, on-premises). Replicated Datasets: democratize Hadoop data access via automated sharing of datasets using Apache Falcon.
  9. Use Cases: Up There or Down Here?
     - Active Archive / Compliance Reporting: sensitive data = “down here”; “up there” valid for many scenarios
     - ETL / Data Warehouse Optimization: usually has “down here” gravity; DW in the cloud is changing that
     - Smart Meter Analysis: data typically flows “up there”
     - Single View of Customer: may have “down here” gravity, unless you’re using SaaS apps
     - Supply Chain Optimization: may have heavy “down here” gravity
     - New Data for Product Management: “up there” could be considered for many scenarios
     - Vehicle Data for Transportation/Logistics: why not “up there”?
     - Vehicle Data for Insurance: may have “down here” gravity (e.g. join with existing risk data)
  10. Deployment Challenges & Requirements
  11. Deployment challenges ● Infrastructure is different everywhere ○ e.g. each cloud provider has its own API ○ e.g. each provider has different networking methods ● OS/images are different everywhere ● How to do service discovery? ● How to dynamically scale/manage? See prior operations workshops.
  12. Deployment requirements - Infrastructure - Operating system - Environment prepared (see docs) - Ambari agent/server installed & registered - Deploy HDP cluster - Ambari Blueprints or Cluster Wizard - Ongoing configuration/management
  13. Options for Automation - Many combinations of tools, e.g. Foreman, Ansible, Chef, Puppet, docker-ambari, shell scripts, CloudFormation, … - Provider specific: Cisco UCS, Teradata, HP, Google’s bdutil, … - Docker with Cloudbreak. Ambari is used with all of the above!
  14. Demo: Basic script-based example - https://github.com/seanorama/ambari-bootstrap/
  15. ambari-bootstrap - https://github.com/seanorama/ambari-bootstrap Requirements: ● Infrastructure prepped (see HDP docs) ● Nodes running Red Hat Enterprise Linux or CentOS 6 ● HDFS paths mounted (see HDP docs) ● sudo or root access
  16. After Ambari deployment ● (optional) Configure local YUM/APT repos ● Deploy HDP with the Ambari Wizard or a Blueprint ● Ongoing configuration/management
  17. Using Ansible: https://github.com/rackerlabs/ansible-hadoop
  18. Docker: Build once. Deploy anywhere.
  20. Docker is a “Shipping Container” System for Code: an engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container. A multiplicity of stacks (static website, web frontend, user DB, queue, analytics DB) meets a multiplicity of hardware environments (development VM, QA server, public cloud, contributor’s laptop, production cluster, customer data center).
  21. Docker • Container-based virtualization • Lightweight and portable • Build once, run anywhere • Ease of packaging applications • Automated and scripted • Isolated
  22. Why Is Docker So Exciting? For developers: build once… run anywhere • A clean, safe, and portable runtime environment for your app • No missing dependencies, packages, etc. • Run each app in its own isolated container • Automate testing, integration, packaging • Reduce/eliminate concerns about compatibility on different platforms • Cheap, zero-penalty containers to deploy services. For DevOps: configure once… run anything • Make the entire lifecycle more efficient, consistent, and repeatable • Eliminate inconsistencies between SDLC stages • Support segregation of duties • Significantly improve the speed and reliability of CI/CD • Significantly more lightweight than VMs
  23. More Technical Explanation. What: • Run on any Linux, regardless of kernel version (2.6.32+) or host distro • Physical or virtual, cloud or not • Container and host architecture must match • Run anything: if it can run on the host, it can run in the container (i.e. if it can run on a Linux kernel, it can run). Why: • High level: it’s a lightweight VM (own process space, own network interface, can run stuff as root) • Low level: it’s chroot on steroids (a container is a set of isolated processes that share the kernel with the host; no device emulation, neither HVM nor PV)
  24. Docker: How it works. A VM runs each app on its own guest OS with its own bins/libs, on a Type 2 hypervisor over the host OS. A Docker container runs each app with its bins/libs directly on the host OS kernel. Containers are isolated but share the OS and, where appropriate, bins/libraries; the result is significantly faster deployment, much less overhead, easier migration, and faster restart.
  25. Cloudbreak: a tool for provisioning and managing Hadoop clusters in the cloud
  26. Cloudbreak • Developed by SequenceIQ • Open source with Apache 2.0 license [Apache project soon] • Cloud- and infrastructure-agnostic, cost-effective Hadoop-as-a-Service platform API • Elastic: can spin up any number of nodes, add/remove on the fly • Provides full cloud lifecycle management post-deployment
  27. Key Features of Cloudbreak. Elastic • Provision a cluster of an arbitrary number of nodes • Commission/decommission nodes from the cluster • Policy- and time-based scaling of the cluster. Flexible • Declarative and flexible Hadoop cluster creation using blueprints • Provision to multiple public cloud providers or an OpenStack-based private cloud using the same common API • Access all of this functionality through a rich UI, secured REST API or automatable shell. Enterprise-ready • Supports basic, token-based and OAuth2 authentication models • The cluster is provisioned in a logically isolated network • Tracks usage and cluster metrics
  28. Launch HDP on Any Cloud for Any Application: 1. Pick a Blueprint 2. Choose a Cloud 3. Launch HDP! Example Ambari Blueprints: IoT Apps (Storm, HBase, Hive), BI / Analytics (Hive), Data Science (Spark), Dev / Test (all HDP services)
  29. Cloudbreak Approach • Use Ambari for the heavy lifting: provisioning of Hadoop services, monitoring • Use Ambari Blueprints: assign host groups to physical instance types • Public/private cloud provider APIs abstracted: Azure/Google/Amazon/OpenStack • Run Ambari agent/server in Docker containers • Networking: docker run --net=host • Service discovery: Consul (previously Serf)
  30. Workshop: Your own Cloudbreak
  31. Workshop: Your own Cloudbreak - cloudbreak-deployer ● https://github.com/sequenceiq/cloudbreak-deployer Requirements: ● A Docker host (laptop, server or cloud infrastructure) ● Resources: very little; tested with 2GB of RAM
  32. Requirement: a Docker host ● OS X or Windows: http://boot2docker.io/ ○ boot2docker init ○ boot2docker up ○ eval "$(boot2docker shellinit)" ○ boot2docker ssh ● Linux: install the Docker daemon ● Anywhere: docker-machine “lets you create Docker hosts on your computer, on cloud providers, and inside your own data center” ○ Example on Rackspace: ■ docker-machine create --driver rackspace --rackspace-api-key $OS_PASSWORD --rackspace-username $OS_USERNAME --rackspace-region DFW docker-rax ■ docker-machine ssh docker-rax
  33. Install cloudbreak-deployer - https://github.com/sequenceiq/cloudbreak-deployer ● curl https://raw.githubusercontent.com/sequenceiq/cloudbreak-deployer/master/install | sh && cbd --version ● cbd init ● cbd start You’ll then have your own Cloudbreak & Periscope server with an API and web UI.
  34. Done: Your own Cloudbreak
  35. Deploy a cluster with your Cloudbreak
  36. 1. Add Credentials. Documentation: http://sequenceiq.com/cloudbreak/#cloudbreak-credentials
  37. 2. Create Cluster
  38. 3. Use your Cluster. Ambari is available as expected. To reach your Hadoop hosts: ● SSH to the Docker host ○ Hosts are listed in “Cloud stack description” ○ ssh cloudbreak@IPofHost ● Shell into the “ambari-agent” container ○ sudo docker ps | grep ambari-agent ■ note the CONTAINER ID ○ sudo docker exec -it CONTAINERID bash ● Use the hosts as usual, e.g.: ○ hadoop fs -ls /
  39. Cloudbreak internals
  40. Cloudbreak Internals (all running on Docker): Uluwatu (Cloudbreak UI), Sultans (user management UI), Cloudbreak shell, OAuth2 (UAA) with uaa-db (PostgreSQL), Cloudbreak REST API with cb-db (PostgreSQL), Periscope (autoscaling) with ps-db (PostgreSQL), plus consul, registrator and ambassador.
  41. Docker
  42. Swarm • Native clustering for Docker • Distributed container orchestration • Same API as Docker
  43. Swarm: How it works • Swarm managers/agents • Discovery services • Advanced scheduling
  44. Consul • Service discovery/registry • Health checking • Key/value store • DNS • Multi-datacenter aware
  45. Consul: How it works • Consul servers/agents • Consistency through a quorum (Raft) • Scalability due to a gossip-based protocol (SWIM) • Decentralized and fault tolerant • Highly available • Consistency over availability (CP) • Multiple interfaces: HTTP and DNS • Support for watches
  46. Apache Ambari • Easy Hadoop cluster provisioning • Management and monitoring • Key feature: Blueprints • REST API, CLI shell • Extensible: stacks, services, views
  47. Apache Ambari: How it works • Ambari server/agents • Define a blueprint (blueprint.json) • Define a host mapping (hostmapping.json) • POST the cluster create
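A minimal sketch of what those two files might look like, based on Ambari's Blueprint API. The blueprint name, host group layout and component list here are illustrative only, not from the deck:

```
# blueprint.json -- registered with: POST /api/v1/blueprints/<name>
{
  "Blueprints": {
    "blueprint_name": "hdfs-only",
    "stack_name": "HDP",
    "stack_version": "2.3"
  },
  "host_groups": [
    {
      "name": "master",
      "cardinality": "1",
      "components": [
        { "name": "NAMENODE" },
        { "name": "HDFS_CLIENT" },
        { "name": "DATANODE" }
      ]
    }
  ]
}

# hostmapping.json -- the cluster create: POST /api/v1/clusters/<cluster-name>
{
  "blueprint": "hdfs-only",
  "default_password": "admin",
  "host_groups": [
    { "name": "master", "hosts": [ { "fqdn": "node1.example.com" } ] }
  ]
}
```

Both requests go to the Ambari server REST API (port 8080 by default) with the X-Requested-By header set; Ambari then installs and starts the mapped components on each host.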
  48. HDP as Docker Containers via Cloudbreak • Fully automated Ambari cluster installation • Avoid the GUI, use the REST API only (ambari-shell) • Fully automated HDP installation with blueprints • Quick installation (pre-pulled RPMs) • Same process/images for dev/qa/prod • Same process for single-node and multi-node. Cloudbreak provisions VMs from cloud providers or bare metal, installs Ambari on the VMs (as Docker containers), then instructs Ambari to build the HDP cluster.
  49. Provisioning: How it works. Start VMs with a running Docker daemon; Cloudbreak Bootstrap starts the Consul cluster and the Swarm cluster (Consul for discovery); start Ambari servers/agents via the Swarm API; Ambari services are registered in Consul (Registrator); POST the Blueprint.
  50.-53. Run Hadoop as Docker containers (diagram sequence: Docker daemons running on each host; then an ambari-server container and ambari-agent containers started across them; then a Blueprint posted to the server; finally HDP services such as HDFS, YARN, Hive, HBase, ZooKeeper and the NameNode running inside the agent containers).
  54. Workshop: Auto-Scale your Cluster with Periscope
  55. Optimize Cloud Usage via Elastic HDP Clusters. Auto-scaling Policy • Policies based on any Ambari metric • Dynamically scale to achieve physical elasticity • Coordinates with YARN to achieve elasticity based on the policies
  56. Scaling for Static and Dynamic Clusters: Ambari metrics and alerts feed Cloudbreak/Periscope, which enforces the auto-scale policies; for dynamic clusters it scales the cluster via Cloudbreak provisioning, and for static clusters it scales YARN applications.
  57. Scale by Ambari Monitoring Metric: 1. Ambari: review the metric 2. Cloudbreak: set an alert 3. Cloudbreak: set a scaling policy
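The decision logic behind a metric-based policy can be sketched as follows. This is a toy illustration of the idea, not Periscope's actual API; the function name, the choice of pending YARN containers as the metric, and all thresholds are assumptions:

```python
# Toy sketch of a metric-driven auto-scaling decision.
# All names and thresholds here are illustrative, not Periscope's API.
def scaling_decision(pending_containers, current_nodes,
                     scale_up_at=10, scale_down_at=0,
                     adjustment=2, min_nodes=3, max_nodes=20):
    """Return the desired node count given one Ambari-style metric."""
    if pending_containers >= scale_up_at:      # cluster is saturated
        return min(current_nodes + adjustment, max_nodes)
    if pending_containers <= scale_down_at:    # cluster is idle
        return max(current_nodes - adjustment, min_nodes)
    return current_nodes                       # within the comfort band
```

A real policy adds a cooldown period between adjustments so the cluster does not oscillate while nodes are still joining or leaving.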
  58. Scale up/down by time: 1. Set a time-based alert 2. Set a scaling policy. Repeat with an alert and policy which scale down.
  59. Roadmap
  60. Release Summary. Cloudbreak ● Its own project (separate from Ambari) ● Supported on Linux flavors which support Docker. Periscope ● Feature of Cloudbreak 1.0 ● Will be embedded in Ambari later in 2015
  61. Release Timeline: Cloudbreak 1.0 GA, June/July 2015 (Ambari 2.1.0, HDP “Dal” / 2.3) → Cloudbreak Incubator proposal, July/August 2015 (est) → Cloudbreak 1.1, August 2015 (est) (Ambari 2.1.1, HDP “Dal-M10”) → Cloudbreak 2.0 GA, 2H2015 (Ambari 2.2, HDP “Erie” / 2.4)
  62. Supported Cloud Environments. With Cloudbreak and HDP 2.3: Microsoft Azure (GA), AWS (GA), Google Compute (GA). OpenStack-based, with Cloudbreak (HDP 2.3 / HDP 2.4): OpenStack Community (Tech Preview / Tech Preview), Red Hat OSP (TBD), HP Helion (GA, tentative), Mirantis OpenStack.
  63. HDP as a Service
  64. Hortonworks Data Platform on Azure
  65. Rackspace. Cloud Big Data Platform ● Rapidly spin up on-demand HDP clusters ● Integrated with Cloud Files (OpenStack Swift) ● Opt in to Managed Services by Rackspace. Managed Big Data Platform ● Fully managed HDP on dedicated and/or cloud infrastructure ● Leverage Fanatical Support and industry-leading SLAs ● Supported by Rackspace with escalation to Hortonworks
  66. CSC
  67. HDP on IaaS - Best Practices
  68. Microsoft Azure ● Deployment ○ Deploy using Cloudbreak ○ Deploy using the HWX Azure Gallery image ● Integrated with Azure Blob Storage ● Supported directly by Hortonworks ● Other offerings ○ Microsoft HDInsight ○ HDP Sandbox
  69. Azure Deployment Guideline ● All in the same region ● Instance types ○ Typical: A7 ○ Performance: D14 ○ 8 x 1TB Standard LRS virtual hard disks per server (LRS keeps 3 local replicas) ● Multiple storage accounts are recommended ○ Recommend no more than 40 virtual hard disks per storage account
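The 40-VHD-per-account guideline implies a quick sizing rule. A small sketch (the function name and defaults are mine; the 8-disks-per-server and 40-VHD figures come from the slide above):

```python
import math

# How many Azure storage accounts does a cluster need if each server
# attaches 8 data VHDs and an account should hold at most 40 VHDs?
def storage_accounts_needed(servers, disks_per_server=8,
                            max_vhds_per_account=40):
    return math.ceil(servers * disks_per_server / max_vhds_per_account)
```

For example, a 10-node cluster with 8 disks each (80 VHDs) needs at least 2 storage accounts under this guideline.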
  70. Azure Blob Store (object storage) ● wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path> ● Can be used as a replacement for HDFS ● Thoroughly tested in HDP release test suites
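To make the URI format above concrete, here is a small helper that assembles a wasb[s]:// path from its parts. The helper itself is illustrative (it is not part of HDP or the Azure SDK):

```python
# Build a wasb[s]:// URI in the format shown above.
# (Illustrative helper; container/account names below are made up.)
def wasb_url(container, account, path, secure=False):
    scheme = "wasbs" if secure else "wasb"
    return "{0}://{1}@{2}.blob.core.windows.net/{3}".format(
        scheme, container, account, path.lstrip("/"))
```

So a file apps/logs in container "data" of storage account "mystore" is addressed as wasb://data@mystore.blob.core.windows.net/apps/logs, and with wasbs:// when TLS is required.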
  71. Amazon Web Services ● Deploy using Cloudbreak ● Integrated with AWS S3 (object storage) ● Supported directly by Hortonworks
  72. Amazon Deployment Guideline ● All in the same region/AZ ● Instances with Enhanced Networking. Master nodes: ● Choose EBS-optimized ● Boot: 100GB on EBS ● Data: 4+ x 1TB on EBS. Worker nodes: ● Boot: 100GB on EBS ● Data: instance storage ○ EBS can be used, but local is preferred. Instance types: ● Typical: d2.* ● Performance: i2.* ● https://aws.amazon.com/ec2/instance-types/
  73. AWS RDS ● Some services rely on MySQL, Oracle or PostgreSQL: ○ Apache Ambari ○ Apache Hive ○ Apache Oozie ○ Apache Ranger ● Use RDS for these instead of managing the databases yourself.
  74. AWS S3 (object storage) ● s3n:// with HDP 2.2 (Hadoop 2.6) ● s3a:// with HDP 2.3 (Hadoop 2.7) ● Not currently a direct replacement for HDFS ● Recommended to configure access with an IAM role/policy ○ https://docs.aws.amazon.com/IAM/latest/UserGuide/policies_examples.html#iam-policy-example-s3 ○ Example: http://git.io/vLoGY
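A sketch of what such an IAM policy might contain, granting a cluster's instance role access to a single bucket. The bucket name is hypothetical; trim the action list to what your workloads actually need:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-hdp-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-hdp-bucket/*"
    }
  ]
}
```

Attached to an instance role, this lets s3n://' and s3a:// access work without embedding AWS keys in Hadoop configuration files.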
  76. Google Cloud ● Deploy using ○ Cloudbreak ○ Google bdutil with the Apache Ambari plug-in ● Integrated with Google Cloud Storage ● Supported directly by Hortonworks
  77. Google Deployment Guideline ● Instance types ○ Typical: n1-standard-4 with single 1.5TB persistent disks ○ Performance: n1-standard-8 with 1TB SSD ● Google GCS (object storage) ● gs://<CONFIGBUCKET>/dir/file ● Not currently a replacement for HDFS
  78. S3 & GCS as a secondary storage system. The connectors are currently eventually consistent, so they do not replace HDFS. Backup ● Falcon, DistCp, hadoop fs, HBase ExportSnapshot ● A Kafka+Storm bolt sending messages to S3/GCS provides a backup & point-in-time recovery source. Input/output ● Convenient & broadly used upload/download method ○ As middleware to ease integration with Hadoop & limit access ● Publishing static content (optionally with CloudFront) ○ Removes the need to manage any web services ● Storage for temporary/ephemeral clusters
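One way such a bolt could lay out its objects so that point-in-time recovery becomes a simple prefix listing: encode topic, time and offset into the key. The naming scheme and function below are illustrative assumptions, not from the deck:

```python
from datetime import datetime

# Hypothetical S3/GCS object-key layout for Kafka message batches.
# Keys sort by time, so recovering "everything up to 12:00" is a
# prefix listing; the first offset in the batch makes replay exact.
def backup_key(topic, partition, first_offset, batch_time):
    return "backup/{0}/{1:%Y/%m/%d/%H}/{2:04d}-{3:012d}.json".format(
        topic, batch_time, partition, first_offset)
```

Zero-padding the partition and offset keeps lexicographic order equal to numeric order, which both S3 and GCS listings rely on.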
  79. Questions
  80. $ shutdown -h now - HDP 2.3: http://hortonworks.com/ - Hadoop Summit recordings: http://2015.hadoopsummit.org/san-jose/ and http://2015.hadoopsummit.org/brussels/ - Past & future workshops: http://hortonworks.com/partners/learn/
