
Apache Provisionr (incubating) - Bucharest JUG 10



My slides on Apache Provisionr (incubating), a service for creating and managing pools of virtual machines on multiple clouds.


  1. Apache Provisionr (incubating)
     Andrei Savu, Bucharest JUG #10
  2. About me
     ● Founder of
     ● Organizer of Bucharest JUG (
     ● Apache Whirr PMC, ZooKeeper contributor
     ● Passion for DevOps & Data Analysis
     ● Connect with me on LinkedIn
  3. @ Axemblr
     ● Data Processing Infrastructure
     ● Deployment Automation
     ● Product: Hadoop On-Demand Appliance
     ● Open Source (part of our DNA)
     ● Fair amount of consulting (bootstrapping)
  4. Agenda
     ● What is Provisionr?
     ● Challenges & Architecture
     ● Demo (HDFS on EC2)
     ● Future @ Apache Software Foundation
  5. What is Provisionr?
     ...and how does it help me create pools of virtual machines?
  6. What?
     ● A simple service for managing pools of 10s or 100s of virtual machines
     ● A way to create clusters of machines that share a common set of characteristics on multiple cloud providers
  7. Characteristics like?
     ● Operating system
     ● Pre-installed packages & binaries
     ● Sane DNS settings (forward & reverse DNS resolution)
     ● NTP settings
     ● Network settings
     ● Firewall
     ● SSH config
     ● Admin access
     ● VPN access
     ● etc.
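To make "a common set of characteristics" concrete, here is a minimal Java sketch of a pool specification. The `PoolSpec` record, its fields, the hostname pattern and the sample values are all hypothetical and not taken from Provisionr's actual API; it assumes Java 16 or later for records and `Stream.toList()`.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.IntStream;

// Hypothetical model (not Provisionr's actual API) of the settings that every
// machine in a pool shares.
public record PoolSpec(
        String operatingSystem,          // image or distribution name
        int size,                        // number of machines in the pool
        List<String> packages,           // pre-installed packages
        Map<Integer, String> firewall,   // open port -> allowed source CIDR
        String adminUser,                // account with SSH admin access
        String ntpServer) {              // time source for all machines

    public PoolSpec {
        Objects.requireNonNull(operatingSystem, "operatingSystem");
        Objects.requireNonNull(adminUser, "adminUser");
        Objects.requireNonNull(ntpServer, "ntpServer");
        if (size <= 0) {
            throw new IllegalArgumentException("pool size must be positive");
        }
        // Immutable copies, so the spec cannot change after the pool is requested.
        packages = List.copyOf(packages);
        firewall = Map.copyOf(firewall);
    }

    /** One predictable hostname per machine, from which forward and reverse DNS entries could be generated. */
    public List<String> hostnames(String prefix, String domain) {
        return IntStream.range(0, size)
                .mapToObj(i -> prefix + "-" + i + "." + domain)
                .toList();
    }

    public static void main(String[] args) {
        // Sample values are placeholders, not recommendations.
        PoolSpec spec = new PoolSpec("ubuntu-12.04", 3, List.of("openjdk-7-jdk"),
                Map.of(22, "0.0.0.0/0"), "admin", "pool.ntp.org");
        System.out.println(spec.hostnames("hdfs", "example.internal"));
        // prints [hdfs-0.example.internal, hdfs-1.example.internal, hdfs-2.example.internal]
    }
}
```

Validating the spec once, up front, means a bad request fails before any machine is started.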
  8. Why? (initially)
     ● Set up on-demand Hadoop clusters (Axemblr)
     ● Handle the basic setup for large clusters
     ● Service configuration done with third-party apps like Ambari or Cloudera Manager
  9. Why? (long term)
     ● Core functionality is generic
     ● Next generation Apache Whirr?
     [Diagram: Specification, Provisionr, External Configuration and Monitoring, connected by Events]
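One way to read the long-term diagram is as a publish/subscribe design in which the core emits events that external configuration and monitoring tools consume. The sketch below assumes exactly that reading; `PoolEvents`, its event types and the listener API are hypothetical and not part of Provisionr.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch: the core publishes pool lifecycle events and external
// tools (configuration, monitoring) subscribe to them.
public final class PoolEvents {

    public enum Type { POOL_REQUESTED, MACHINE_STARTED, POOL_READY, POOL_DESTROYED }

    public record Event(Type type, String poolId, String detail) {}

    // Copy-on-write list: listeners can subscribe while events are being published.
    private final List<Consumer<Event>> listeners = new CopyOnWriteArrayList<>();

    public void subscribe(Consumer<Event> listener) {
        listeners.add(listener);
    }

    /** Delivers the event synchronously to every subscriber, in subscription order. */
    public void publish(Event event) {
        for (Consumer<Event> listener : listeners) {
            listener.accept(event);
        }
    }

    public static void main(String[] args) {
        PoolEvents bus = new PoolEvents();
        // A monitoring hook could start watching machines as soon as they boot.
        bus.subscribe(e -> System.out.println("monitoring: " + e));
        // An external configuration tool might act only once the whole pool is ready.
        bus.subscribe(e -> {
            if (e.type() == Type.POOL_READY) {
                System.out.println("configure pool " + e.poolId());
            }
        });
        bus.publish(new Event(Type.MACHINE_STARTED, "hdfs", "machine-1"));
        bus.publish(new Event(Type.POOL_READY, "hdfs", "3 machines"));
    }
}
```

Keeping the core unaware of who listens is what would let configuration and monitoring tools be swapped without touching the provisioning code.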
  10. FAQ: Looks like Puppet?
      ● No
      ● Provisionr actually uses Puppet
      ● Focus: interact with IaaS APIs to start machines in groups with minimal configuration (as listed before). Simple & reliable.
  11. Challenges
      How is the game different when we work with 50-100+ virtual machines?
  12. Challenges #1
      ● API throttling (batch calls)
      ● Concurrency control (across multiple instances)
      ● Error handling, partial failures and automatic retries (idempotency)
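Two of these challenges can be illustrated with plain Java: split a large request into batches to stay under API rate limits, and retry each batch with exponential backoff. `BatchedRetry` and its parameters are hypothetical; the sketch assumes the provider call is idempotent (for example, keyed by a client-supplied token), which is what makes blind retries safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helpers (not Provisionr's code): split a large request into
// batches and retry each batch with exponential backoff.
public final class BatchedRetry {

    /** Splits items into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            result.add(List.copyOf(items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return result;
    }

    /** Delay before retry number attempt (0-based): baseMillis * 2^attempt, capped at maxMillis. */
    public static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis << Math.min(attempt, 20); // bound the shift so small bases cannot overflow
        return Math.min(delay, maxMillis);
    }

    /**
     * Invokes call until it succeeds or maxAttempts is used up. The call must be
     * idempotent, so a retry after a timeout cannot start the same machines twice.
     */
    public static <T, R> R withRetries(T request, Function<T, R> call, int maxAttempts,
                                       long baseMillis, long maxMillis) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.apply(request);
            } catch (RuntimeException e) {
                last = e; // e.g. a throttling or transient network error
                if (attempt + 1 < maxAttempts) {
                    try {
                        Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt(); // preserve the interrupt flag
                        throw new IllegalStateException("interrupted during backoff", ie);
                    }
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        List<String> machines = new ArrayList<>();
        for (int i = 0; i < 7; i++) machines.add("machine-" + i);
        for (List<String> batch : batches(machines, 3)) {
            // Stand-in for one provider API call per batch; here it just counts the batch.
            int started = withRetries(batch, List::size, 5, 100, 2_000);
            System.out.println("started " + started);
        }
    }
}
```

Backoff only spaces out the retries of a single caller; concurrency control across several running instances would additionally need a shared lock or quota, which this sketch does not attempt.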
  13. Challenges #2
      ● Granular internal workflows (short transactions)
      ● State persistence across restarts and upgrades
      ● Audit & logging
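Short transactions and state persistence fit together: if each small step is checkpointed as soon as it completes, a restart can resume instead of starting over. A minimal sketch under that assumption follows; `ResumableWorkflow` and `CheckpointStore` are hypothetical names, and the in-memory store stands in for durable storage (the architecture in this talk delegates persistence to Activiti, which the sketch does not use).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a workflow split into small named steps whose
// completion is recorded, so a restarted process resumes where it stopped.
public final class ResumableWorkflow {

    /** Where completed step names are recorded; a real system would use a database. */
    public interface CheckpointStore {
        boolean isDone(String workflowId, String step);
        void markDone(String workflowId, String step);
    }

    /** In-memory store: fine for a demo, but it does not survive a restart. */
    public static final class InMemoryStore implements CheckpointStore {
        private final Set<String> done = ConcurrentHashMap.newKeySet();
        public boolean isDone(String id, String step) { return done.contains(id + "/" + step); }
        public void markDone(String id, String step) { done.add(id + "/" + step); }
    }

    private final Map<String, Runnable> steps = new LinkedHashMap<>(); // keeps insertion order

    public ResumableWorkflow step(String name, Runnable action) {
        steps.put(name, action);
        return this;
    }

    /** Runs every step not yet recorded as done; returns the names it actually ran. */
    public List<String> run(String workflowId, CheckpointStore store) {
        List<String> ran = new ArrayList<>();
        for (Map.Entry<String, Runnable> e : steps.entrySet()) {
            if (store.isDone(workflowId, e.getKey())) continue; // finished before a restart
            e.getValue().run();                                  // one short unit of work
            store.markDone(workflowId, e.getKey());              // checkpoint right after it
            ran.add(e.getKey());
        }
        return ran;
    }

    public static void main(String[] args) {
        CheckpointStore store = new InMemoryStore();
        ResumableWorkflow wf = new ResumableWorkflow()
                .step("start-machines", () -> System.out.println("starting machines"))
                .step("configure-dns", () -> System.out.println("configuring DNS"));
        System.out.println(wf.run("pool-1", store)); // both steps run
        System.out.println(wf.run("pool-1", store)); // nothing left to do: []
    }
}
```

A crash between a step finishing and `markDone()` makes that step run again after a restart, so each step still has to be idempotent.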
  14. Challenges #3
      ● Integrating multiple native provider SDKs
      ● A plugin architecture (run only a subset of the features)
      ● Semi-automated and fully automated modes
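A plugin architecture that wraps several native SDKs can be sketched as two interfaces: one per cloud provider and one per optional feature. Everything here (`Plugins`, `Provider`, `Feature`, the fake provider) is hypothetical; a real adapter would call the AWS SDK for Java or jclouds inside `startMachines`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical plugin model: each cloud provider wraps its native SDK behind
// one interface, and optional features are enabled per pool.
public final class Plugins {

    /** What every provider adapter must offer. */
    public interface Provider {
        String id();
        List<String> startMachines(int count); // returns provider-specific machine ids
    }

    /** An optional step applied after the machines start (DNS, NTP, packages, ...). */
    public interface Feature {
        String name();
        void apply(List<String> machineIds);
    }

    // TreeMap gives a deterministic (alphabetical) order; a real system would need explicit ordering.
    private final Map<String, Provider> providers = new TreeMap<>();
    private final Map<String, Feature> features = new TreeMap<>();

    public Plugins register(Provider p) { providers.put(p.id(), p); return this; }
    public Plugins register(Feature f) { features.put(f.name(), f); return this; }

    /** Starts machines on one provider and applies only the requested subset of features. */
    public List<String> provision(String providerId, int count, Set<String> enabled) {
        Provider provider = providers.get(providerId);
        if (provider == null) throw new IllegalArgumentException("unknown provider: " + providerId);
        List<String> ids = provider.startMachines(count);
        for (Feature f : features.values()) {
            if (enabled.contains(f.name())) f.apply(ids);
        }
        return ids;
    }

    public static void main(String[] args) {
        Plugins plugins = new Plugins().register(new Provider() {
            public String id() { return "fake"; } // stands in for a real SDK adapter
            public List<String> startMachines(int count) {
                List<String> ids = new ArrayList<>();
                for (int i = 0; i < count; i++) ids.add("vm-" + i);
                return ids;
            }
        });
        System.out.println(plugins.provision("fake", 3, Set.of())); // prints [vm-0, vm-1, vm-2]
    }
}
```

Because the core only sees the two interfaces, running "just a subset of all the features" reduces to passing a smaller `enabled` set.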
  15. Challenges #4
      ● Automatic creation of gold images
  16. Architecture
      Building Blocks, Internals, Persistence, Packaging, Plugins
  17. Activiti (from Alfresco)
      ● Lightweight workflow engine (BPM)
      ● Has a nice Java API
      ● Has a nice set of tools
      ● Handles persistence as expected
      ● Good error handling (retryable activities)
  18. Activiti – Process Execution
  19. Activiti – Interactive View
  20. Apache Karaf
      ● Using it as an application server
      ● Provides an interactive shell
      ● Integrated with Activiti
      ● Solves the packaging problem (custom distribution)
  21. Apache Karaf - Shell
  22. IaaS SDKs
      ● AWS SDK for Java –
      ● jclouds (for CloudStack) –
  23. Demo Time (video)
      Provisionr & Rundeck: CDH4 HDFS cluster on EC2
  24. Summary
      ● Provisionr solves the problem of creating large pools of virtual machines (100s)
      ● Cloud portability: from an application's perspective, the machines and the cluster are indistinguishable across clouds
  25. Working on
      ● Short term: a first release compliant with the Apache Software Foundation policies
      ● Automatic AMI creation (fast provisioning of large clusters)
      ● Bundle Rundeck with the binary release and write support services
  26. You're invited to join!
      ● Code: git clone provisionr.git
      ● Mailing list: –
  27. Thanks! Questions?
      Andrei Savu
      Twitter: @andreisavu