Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

How bigtop leveraged docker for build automation and one click hadoop provisioning @ HadoopCon 2015

2,894 views

Published on

What is Apache Bigtop?
Achieving Build Automation
One-Click Hadoop Provisioning
Bigtop at Trend Micro

Published in: Software
  • Be the first to comment

How bigtop leveraged docker for build automation and one click hadoop provisioning @ HadoopCon 2015

  1. 1. How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning 葉祐欣 Evans Ye HadoopCon 2015
  2. 2. About me • Apache Bigtop PMC member • WAT? checkout my talk in HadoopCon 2014 • made Committer 2014/12, PMC member 2015/06 • Develop Big Data infra & apps @ Trend Micro
  3. 3. Outline • What is Apache Bigtop? • Achieving Build Automation • One-Click Hadoop Provisioning • Bigtop at Trend Micro
  4. 4. What is Apache Bigtop ?
  5. 5. Simple answer :
  6. 6. A project helps you to build your own Big Data Stack
  7. 7. Hmm…what exactly Big Data Stack is?
  8. 8. Big Data Stack Powered by Bigtop Hadoop YARN Hadoop HDFS TezSpark Spark Streaming Spark GraphX Hive Pig App A App B App C Tachyon HBase Pheonix Resource Management Storage Processing Engine APIs and Interfases In-house Apps Ignite
  9. 9. Comprehensive answer :
  10. 10. Bigtop is a project for Packaging Testing Deployment Virtualization
  11. 11. • Build RPM, DEB packages for Hadoop ecosystem • $ yum -y install hadoop • $ apt-get upgrade hbase • Why not just untar hadoop-2.7.1.tar.gz ? • Version control, upgrade/downgrade, dependency management • Unified view of configuration, log, binary directory layout • Daemons for better process management Packaging
  12. 12. • You can find binary repos here: • http://www.apache.org/dist/bigtop/bigtop-1.0.0/repos/ Supported Linux distro.
  13. 13. Supported components
  14. 14. • A fundamental CI problem may happen in your Big Data Stack, too Testing Hadoop 2.4.1 Hadoop 2.6.0 Hive 1.1.0 Hadoop 2.7.1 Hive 1.2.0 Hive 1.2.1 Tez 0.6.0 Tez 0.6.2 Tez 0.7.0
  15. 15. • A fundamental CI problem may happen in your Big Data Stack, too Testing Hadoop 2.4.1 Hadoop 2.6.0 Hive 1.1.0 Hadoop 2.7.1 Hive 1.2.0 Hive 1.2.1 Tez 0.6.0 Tez 0.6.2 Tez 0.7.0
  16. 16. • A fundamental CI problem may happen in your Big Data Stack, too Testing Hadoop 2.4.1 Hadoop 2.6.0 Hive 1.1.0 Hadoop 2.7.1 Hive 1.2.0 Hive 1.2.1 Tez 0.6.0 Tez 0.6.2 Tez 0.7.0 Does this combination still work?
  17. 17. There’s a need for integration tests !
  18. 18. • Bigtop has a testing framework that provides APIs • Built-in tests that Bigtop provides: • smoke-tests • Basic Hadoop ecosystem interoperability • integration tests • Advanced tests runs from jar files Bigtop Tests
  19. 19. Okay, then how to run those tests?
  20. 20. • Use Bigtop Puppet to deploy a fully functional, distributed Hadoop cluster • Integration tests running on a fully distributed cluster • Bigtop Puppet • Masterless Puppet • Numbers of components are supported • Kerberos enabled cluster deployment Deployment
  21. 21. Where to deploy ?
  22. 22. • Automate the infrastructure creation & setup • Bigtop Provisioner • Virtualbox VMs auto-provisioning • Docker containers auto-provisioning Virtualization
  23. 23. That’s good ! But I don’t need that much What else I can have?
  24. 24. • Apache Hadoop App developers • One-click to run distributed cluster to test your code on • Cluster administrators or deployment gurus • Use Bigtop Puppet to deploy and manage your cluster • Run Bigtop tests to ensure your cluster is working • Vendors • Build your own Hadoop distribution, customized from Bigtop bits From user point of view
  25. 25. Achieving Build Automation
  26. 26. • As you can see, we support a number of Linux distributions (m) What’s the challenge ?
  27. 27. • Numbers of components (n) What’s the challenge ?
  28. 28. We need to build them all —> m * n
  29. 29. Preparing build environment
  30. 30. … Seriously ? Preparing build environment
  31. 31. • Puppet recipes automatically install required libraries, build tools • To transform a machine into a build slave, simply do • $ ./gradlew toolchain • Prerequisites : • Puppet 3.x • JRE 7 Bigtop Toolchain
  32. 32. Continuous Integration(CI) CentOS slave Fedora slave Ubuntu slave Debian slave OpenSUSE slave
  33. 33. Continuous Integration(CI) CentOS slave Fedora slave Ubuntu slave Debian slave OpenSUSE slave Bigtop Toolchain Bigtop Toolchain Bigtop Toolchain Bigtop Toolchain Bigtop Toolchain
  34. 34. We got Bigtop Toolchain for you. Hi Bigtop, How do I test my packaging code? A contributor A committer
  35. 35. (Don’t want to answer…) I need to setup all the Linux environments on my laptop? A contributor A committer
  36. 36. • A very high level view : think it like a VM • Three key techniques in Docker • Use Linux Kernel’s Namespaces to create isolated resources • Use Linux Kernel’s cgroups to constrain resource usage • Union file system that supports git-like image creation Docker - How it works
  37. 37. • Lightweight • Fast creation • Repeatable • Portability • runs on *any* Linux environment The nice things about it
  38. 38. • build, test, and run different Linux distribution in one single machine Best fit in our use case OS specific Bins/Libs
  39. 39. Bigtop/slaves images Dockerhub official images Install Puppet bigtop/slaves Install Bigtop Toolchain
  40. 40. Now, easy to build & test on your laptop Linux boot2docker CentOS slave Ubuntu slave Mac OS X Windows CentOS slave Ubuntu slave Docker engine Docker engine
  41. 41. Flexible CI infra CentOS slave Fedora slave Ubuntu slave Debian slave OpenSUSE slave
  42. 42. One-Click Hadoop Provisioning
  43. 43. • A tool to demonstrate the full life cycle of Bigtop Bigtop Provisioner Packaging TestingDeploymentVirtualization Spin up resources Run Bigtop Puppet Run Bigtop Tests Bigtop Provisioner
  44. 44. • Fast iterative development • Test your code in the cluster, on your laptop, w/o human intervention • Flexibility • Choose any combination of components as you want • Responsive CI • Integration tests that get you the result in mins • A Big Data Stack playground • Spark + Tachyon, Pig on Tez, etc The goal
  45. 45. Vagrant + Automation Code + Bigtop Puppet One-click Hadoop Provisioning Bigtop Provisioner w/ provider =
  46. 46. • We use Vagrant as an abstraction layer to support different kinds of resource providers Vagrant Providers
  47. 47. One click Hadoop provisioning
  48. 48. bigtop/deploy images on Docker hub One click Hadoop provisioning
  49. 49. bigtop/deploy images on Docker hub puppet apply puppet apply puppet apply One click Hadoop provisioning
  50. 50. Bigtop/deploy images Dockerhub official images install Puppet bigtop/deploy install Vagrant ssh key
  51. 51. Bigtop image hierarchy Dockerhub official images bigtop/puppet bigtop/deploybigtop/slaves bigtop/puppet = diff
  52. 52. All on the Docker hub https://hub.docker.com/u/bigtop/
  53. 53. A demo is worth 1k words In case my demo fail https://asciinema.org/a/a7kiibouqav38xl5p2wrms51r
  54. 54. • Supported providers in Bigtop 1.0.0 release • Virtaulbox VM • Docker container • OpenStack support is in master branch now Bigtop Provisioner
  55. 55. • For Hadoop app developers, cluster admins, users • Run a distributed cluster to test your code on • Tweak & test configurations before applying to Production • Play around with Bigtop Big Data Stack • For Bigtop contributors • Easy to test your packaging, deployment, testing code • For vendors • CI out of the box —> patch upstream code made easier Use cases
  56. 56. Bigtop at Trend Micro
  57. 57. • Use Bigtop as the basis for our internal custom distribution of Hadoop • Apply community, internal patches to upstream projects for business and operational need • Newest TMH7 is based on Bigtop 1.0 SNAPSHOT Trend Micro Hadoop (TMH)
  58. 58. Working with community made our life easier • Knowing community status made us being able to release TMH7 based on 1.0 SNAPSHOT
  59. 59. Working with community made our life easier • Contribute Bigtop Provisioner, packaging code, pupept recipes, bugfixes, CI setup, anything! • Knowing community status made us being able to release TMH7 based on 1.0 SNAPSHOT
  60. 60. Working with community made our life easier • Running Bigtop smoke tests and integration tests using Bigtop Provisioner for TMH7
  61. 61. Working with community made our life easier • Contribute feedback, evaluation, and bugfixes through Production level adoption • Running Bigtop smoke tests and integration tests using Bigtop Provisioner for TMH7
  62. 62. Hadoop YARN Hadoop HDFS Mapreduce Ad-hoc Query UDFs Pig App A App C Oozie Resource Management Storage Processing Engine APIs and Interfases In-house Apps Trend Micro Big Data Stack Powered by Bigtop Kerberos App B App C HBase Wuji
  63. 63. Challenges we’re facing !
  64. 64. • We’re moving our cluster to another Data Center • New cluster will be running TMH7 • Need Distcp to copy data from old TMH6 cluster • Both are Kerberos enabled secure clusters Migration x Upgrade
  65. 65. Distcp between TMH6 and TMH7 ?
  66. 66. From our long history we can say: Things always get complicated with Kerberos…
  67. 67. Multi-Cluster Live Synchronization with Kerberos Federated Hadoop Mammi Chang 16:20~17:10 3F 第二會議室
  68. 68. • Moving data across Production and Staging cluster in different Data Centers is a need • Doing Distcp across 4 clusters ! • Our Hadoop deployment tool needs to support Distcop auto-configuration • Developing the deployment tool is challenging ! Hadoop cluster deployment
  69. 69. • Run multiple secure HA clusters on Docker • Develop & test the deployment tool repeatedly Docker comes to the rescue TMH6 Production TMH7 Production TMH6 Staging TMH7 Staging
  70. 70. Hadoop apps development on Docker
  71. 71. • A Devops toolkit for Hadoop app developer to develop and test its code on • Images with TMH7(Bigtop) already built-in —> up and running w/o deployment • A Hadoop environment for apps to test against our new Hadoop distribution Hadoocker
  72. 72. internal Docker registry Hadoop server Hadoop client data Zero deployment Hadoop TMH7 Hadoop app Restfull APIs sample data hadoop fs put…
  73. 73. • Support apps to run end-to end CI against a Hadoop environment • Support image creation from scratch all the way up to Hadoop pseudo cluster and your apps • Allowing customization for your own environment • Same feature is also delivered in Vagrant boxes • https://github.com/evans-ye/hadoocker Features
  74. 74. Summary
  75. 75. • Bigtop helps you to build your own Big Data Stack • Building Bigtop packages made easier by offering immutable build environment through Docker images • Bigtop Provisioner creates you Hadoop cluster by just one click • Use Docker to simulate complex environment to ease development and testing efforts • Ship you apps in Docker image with zero deployment Summary
  76. 76. Come and join the worldwide open source party !
  77. 77. Questions ? Thank you !

×