OSCON 2013 - The Hitchhiker’s Guide to Open Source Cloud Computing


While the Hitchhiker’s Guide to the Galaxy (HHGTTG) is a wholly remarkable book, it doesn’t cover the nuances of cloud computing. Whether you want to build a public, private or hybrid cloud, there are free and open source tools that can provide a complete solution or augment your existing Amazon or other hosted cloud solution. That’s why you need the Hitchhiker’s Guide to (Open Source) Cloud Computing (HHGTCC), or at least to attend this talk to understand the current state of open source cloud computing. This talk will cover infrastructure-as-a-service, platform-as-a-service and developments in big data, and how to deploy and manage open source flavors of these technologies more effectively. Specifically, the guide will cover:

Infrastructure-as-a-Service – The Systems Cloud – Get a comparison of the open source cloud platforms including OpenStack, Apache CloudStack, Eucalyptus and OpenNebula
Platform-as-a-Service – The Developers Cloud – Learn about the tools that abstract away complexity for developers and are used to build portable, auto-scaling applications on CloudFoundry, OpenShift, Stackato and more.
Data-as-a-Service – The Analytics Cloud – Want to figure out the who, what, where, when and why of big data? You’ll get an overview of open source NoSQL databases and technologies like MapReduce to help parallelize data mining tasks and crunch massive data sets in the cloud.
Network-as-a-Service – The Network Cloud – The final pillar for truly fungible network infrastructure is network virtualization. We will give an overview of software-defined networking, including OpenStack Quantum, Nicira, Open vSwitch and others.
Finally, this talk will provide an overview of the tools that can help you really take advantage of the cloud. Do you want to auto-scale to serve millions of web pages and scale back down as demand fluctuates? Are you interested in automating the total lifecycle of cloud computing environments? You’ll learn how to combine these tools into toolchains that provide continuous deployment systems, helping you become agile and spend more time improving your IT rather than simply maintaining it.

[Finally, for those of you who are Douglas Adams fans, please accept the deepest apologies for bad analogies to the HHGTTG.]

Published in: Technology, Education


  1. The Hitchhiker’s Guide to Open Source Cloud Computing. OSCON 2013. Mark R. Hinkle, Sr. Director, Open Source Solutions, Citrix Systems Inc. @mrhinkle
  2. Mark Hinkle, Sr. Director, Open Source Solutions
      • Dedicated to the success of the Apache CloudStack, OpenDaylight and Xen Project communities on Citrix’s behalf
      • Run learning activities all over the world
      • Joined Citrix via acquisition, July 2011
      • Zenoss Core open source project to 100,000 users, 1.5 million downloads
      • Former LinuxWorld Magazine Editor-in-Chief
      • Open Management Consortium organizer
      • Author: “Windows to Linux Business Desktop Migration” (Thomson)
      • NetDirector Project: open source configuration management
      • Sometime author and blogger at NetworkWorld Open Source Subnet
      Hitchhiker’s Guide to the Open Cloud by @mrhinkle
  3. Slides Online. Slideshare: edsoftware
  4. Quick Cloud Computing Overview, or the Obligatory “What is the Cloud?” Explanation
  5. 60 Second Cloud Definitions
      Five characteristics of cloud: 1. On-Demand Self-Service 2. Broad Network Access 3. Resource Pooling 4. Rapid Elasticity 5. Measured Service
      • User Cloud, a.k.a. Software-as-a-Service
      • Development Cloud, a.k.a. Platform-as-a-Service
      • Systems Cloud, a.k.a. Infrastructure-as-a-Service
  6. Building Open Source Clouds
  7. Cloud Architecture
  8. Hypervisors
      Open Source
      • Xen Project, Xen Cloud Platform (XCP)
      • KVM – Kernel-based Virtual Machine
      • VirtualBox* – Oracle-supported virtualization solution
      • OpenVZ* – Container-based, similar to Solaris Containers or BSD jails
      • LXC – User space chrooted installs
      Proprietary
      • VMware
      • Citrix XenServer (based
      • Microsoft Hyper-V
      • Oracle VM (based on open source Xen)
  9. Open Virtual Machine Formats
      Open Virtualization Format (OVF) is an open standard for packaging and distributing virtual appliances or, more generally, software to be run in virtual machines.
      Formats for hypervisors/cloud technologies:
      • Amazon – AMI
      • KVM – QCOW2
      • VMware – VMDK
      • Xen Project – IMG
      • Hyper-V – VHD (Virtual Hard Disk)
  10. Sourcing Cloud Appliances (tool/project and what you can do with it)
      • BitNami: provides free, ready-to-run environments for your favorite open source web applications and frameworks, including Drupal, Joomla!, WordPress, PHP, Rails, Django and many more.
      • BoxGrinder: a set of projects that help you grind out appliances for multiple virtualization and cloud providers.
      • Oz: command-line tool that can create images for common Linux distributions to run on KVM.
      • SUSE Studio: supports building and deploying directly to cloud services such as Amazon EC2.
  11. Scale-Up or Scale-Out
      • Vertical Scaling (Scale-Up): allocate additional resources to VMs; requires a reboot; no need for distributed app logic; single point of OS failure.
      • Horizontal Scaling (Scale-Out): application needs logic to work in a distributed fashion (e.g. HAProxy and Apache, Hadoop).
  12. Compute Clouds (IaaS) (project: year started; license; virtualization technologies)
      • Apache CloudStack: 2008; Apache; XenServer, Xen Cloud Platform, KVM, VMware (Hyper-V in development)
      • Eucalyptus: 2006; GPL; Xen, KVM, VMware (commercial version)
      • OpenNebula: 2005; Apache; Xen, KVM, VMware
      • OpenStack: 2010 (previously developed for NASA by Anso Labs); Apache; VMware ESX and ESXi, Xen, Xen Cloud Platform, KVM, LXC, QEMU and VirtualBox
  13. OpenStack – Ecosystem of Projects
      [Diagram labels]
      • Compute “Nova”
      • Object Storage “Swift”
      • Image Service “Glance”
      • Dashboard “Horizon”
      • Identity Services “Keystone”
      • Enterprise message queue based on RabbitMQ (ESB)
      • Hypervisors: KVM, VMware, Xen Cloud Platform
      • Storage: Ceph, Gluster
      • Quantum networking fabric: REST API, plugins, Open vSwitch Quantum plugin
      • Advanced cloud and networking services accessing the Quantum API: Firewall Service, Gateway Service
  14. Cloud APIs
      • jclouds
      • libcloud
      • deltacloud
      • fog
  15. Cloud Computing Storage (project and description)
      • Ceph: distributed file storage system developed by DreamHost
      • GlusterFS: scale-out NAS system aggregating storage over Ethernet or InfiniBand
      • OpenStack Storage: long-term object storage system
      • Riak CS: open source software designed to provide simple, available, distributed cloud storage at any scale. Riak CS is S3-API compatible and supports per-tenant reporting for billing and metering use cases.
      • Sheepdog: distributed storage for KVM hypervisors
  16. Platform-as-a-Service (PaaS) (project: year started; sponsor; languages/frameworks)
      • CloudFoundry: 2011; VMware; Spring for Java, Rails and Sinatra for Ruby, node.js, Grails, Scala on Lift and more via partners (e.g. Python, PHP)
      • Cloudify: 2012; GigaSpaces; [Groovy for deployment recipes]
      • OpenShift: 2011; Red Hat; Java, Ruby, PHP, Perl and Python
      • Stackato: 2012; ActiveState; Java, Python, PHP, Ruby, Perl, Node.js, others
      • WSO2 Stratus: 2010; WSO2; JBoss, Java EE 6
  17. What’s Coming… the Rise of LXC
      • Platform-as-a-Service (PaaS) sounds good but…
      • Gives us a standard payload container for Linux-based workloads
      • You can run LXC on a virtualized environment or natively
      • There are already huge numbers of tools that can manage LXC
      • SELinux provides a proven security model users are already familiar with
  18. Software Defined Networking (SDN)
  19. Overview of Software Defined Networking [Diagram layers: Business Applications, Network Services, Network Devices]
  20. Cloud Promise, Reality and Networks
      • Promise: Centralized Configuration and Automation. Reality: without true virtualization, network devices must still be manually configured.
      • Promise: Instant Self-Service Provisioning. Reality: in a physical network, it could take a long time for a network engineer to provision new services.
      • Promise: Elasticity and Scalability. Reality: by horizontally scaling up the physical network, elasticity is lost.
      • Promise: Designed for Failure. Reality: failover can be automated and physical network limitations can be alleviated.
  21. OpenFlow
      OpenFlow enables networks to evolve by giving a remote controller the power to modify the behavior of network devices through a well-defined "forwarding instruction set". The growing OpenFlow ecosystem now includes routers, switches, virtual switches, and access points from a range of vendors.
  22. Software Defined Networking (SDN) (project and description)
      • Floodlight: an enterprise-class, Apache-licensed, Java-based OpenFlow controller.
      • Indigo: an open source project to support OpenFlow on a range of physical switches. By leveraging hardware features of Ethernet switch ASICs, Indigo supports high rates for high port counts, up to 48 10-gigabit ports. Multiple gigabit platforms with 10-gigabit uplinks are also supported.
      • OpenDaylight: Linux Foundation Collaborative Project, in development, based on the Cisco ONE controller and plugins from numerous vendors (e.g. IBM DOVE).
      • OpenStack “Quantum” Networking: pluggable, scalable, API-driven network and IP management.
      • Open vSwitch: an open source (ASL 2.0), multilayer virtual switch designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).
  23. Big Data
  24. 1 Billion Facebook Users - October 2012 [Chart: vertical axis 0 to 1200; horizontal axis quarterly from Dec-04 to Sep-12]
  25. Big Data and Storage Infrastructure
      “Data is growing faster than storage capacity and computing power. Legacy systems hold organizations back; storage software must include multi-petabyte capacity, support potentially billions of objects, and provide application performance awareness and agile provisioning.” - Gartner, Big Data Challenges for the IT Infrastructure Team
  26. Open Source NoSQL Databases (name, type: description)
      • Apache Cassandra, Wide Column Store/Families: API: many; Query Method: MapReduce; Replication: ; Written in: Java; Concurrency: eventually consistent; Misc: like "Big-Table on Amazon Dynamo alike", initiated by Facebook
      • CouchDB, Document Store: API: Memcached API+protocol (binary and ASCII), most languages; Protocol: Memcached REST interface for cluster conf + management; Written in: C/C++ + Erlang (clustering); Replication: Peer to Peer, fully consistent; Misc: Transparent topology changes during operation, provides memcached-compatible caching buckets
      • HBase, Wide Column Store/Families: API: Java / any writer; Protocol: any write call; Query Method: MapReduce Java / any exec; Replication: HDFS Replication; Written in: Java
      • Hypertable, Wide Column Store/Families: API: Thrift (Java, PHP, Perl, Python, Ruby, etc.); Protocol: Thrift; Query Method: HQL, native Thrift API; Replication: HDFS Replication; Concurrency: MVCC; Consistency Model: Fully consistent; Misc: High performance C++ implementation of Google's Bigtable
      • MongoDB, Document Store: API: BSON; Protocol: C; Query Method: dynamic object-based language & MapReduce; Replication: Master Slave & Auto-Sharding; Written in: C++, Concurrency
      • Redis, Key Value / Tuple Store: API: Tons of languages; Written in: C; Concurrency: in memory and saves asynchronously to disk after a defined time. Append-only mode available. Different kinds of fsync policies; Replication: Master / Slave; Misc: also lists, sets, sorted sets, hashes, queues
      • Riak, Key Value / Tuple Store: API: JSON; Protocol: REST; Query Method: MapReduce term matching; Scaling: Multiple Masters; Written in: Erlang; Concurrency: eventually consistent (stronger than MVCC via Vector Clocks)
  27. MapReduce [Diagram labels: Problem Data, Master Node, Worker Node 1, Worker Node 2, Worker Node 3, Solution Data]
  28. Apache Hadoop
      Overview
      • Handles large amounts of data
      • Stores data in native format
      • Delivers linear scalability at low cost
      • Resilient in case of infrastructure failures
      • Transparent application scalability
      Facts
      • Apache top-level open source project
      • One framework for storage and compute
        – Scalable storage in the Hadoop Distributed File System (HDFS)
        – Compute via the MapReduce distributed processing platform
      • Domain Specific Language (DSL) - Java
  29. Hadoop Architecture
      • Hadoop Common
      • HDFS: distributes and replicates data across machines
      • MapReduce: distributes and monitors tasks
      • Hive: data warehouse that provides a SQL interface. Ad hoc projection of data structure to unstructured
      • HBase: column-oriented, schema-less distributed DB modeled after Google’s BigTable. Random real-time read/write.
      • Pig: platform for manipulating and analyzing large data sets. Scripting language for analysts.
      • Mahout: machine learning libraries for recommendations, clustering, classifications and item sets.
      • Chukwa
      • ZooKeeper
  30. Big Data Summary
      • Quantity of machine-created data increasing drastically (examples: networked sensor data from mobile phones and GPS devices)
      • Data manipulation moving from batched to real-time
      • Cloud services giving everyone Big Data tools
      • Consumer company speed and scale requirements driving efficiencies in Big Data storage and analytics
      • New and broader number of data sources being meshed together
      • Big Data apps mean using Big Data is faster and easier
  31. Cloud Management Tools
  32. Automation in the Cloud
  33. 4 Types of Management Tools
      • Provisioning: installation of operating systems and other software
      • Configuration Management: sets the parameters for servers, can specify installation parameters
      • Orchestration/Automation: automates tasks across systems
      • Monitoring: records errors and health of IT infrastructure
  34. Management Toolchains Configuration Patching and Provisioning Monitoring
  35. Conceptual Automated Toolchain
      • Generate Images: SUSE Studio, BoxGrinder
      • Provision: Cobbler, SUSE Studio
      • Bootstrapped Image: CloudStack, OpenStack
      • Configuration: Puppet, Chef
      • Start/Stop Services: RunDeck, Capistrano, MCollective
      • Monitoring: Nagios, Zenoss, Cacti
  37. Goodbye and Thanks for All the Fish! Slides can be viewed and downloaded at:
  38. Contact Me
  39. Appendix (a.k.a. Great Stuff I Didn’t Say)
  40. Additional Resources
      • DevOps Toolchains Group
      • Software Defined Networking: The New Norm for Networks (Whitepaper)
      • DevOps Wikipedia Page
      • – Ultimate Guide to the Non-Relational Universe
      • Open Cloud Initiative
      • NIST Cloud Computing Platform
      • Open Virtualization Format Specs
      • Clouderati Twitter Account
      • Planet DevOps
      • Nicira Whitepaper – It’s Time to Virtualize the Network
      • Why Open vSwitch FAQ
  41. Big Data Landscape
  42. Monitoring Tools (tool: license; type of monitoring; collection methods)
      • Cacti / RRDTool: GPL; Performance; SNMP, syslog
      • Graphite: Apache 2.0; Performance; Agent
      • Nagios: GPL; Availability; SNMP, TCP, ICMP, IPMI, syslog
      • Zabbix: GPL; Availability/Performance and more; SNMP, TCP/ICMP, IPMI, Synthetic Transactions
      • Zenoss: GPL; Availability, Performance, Event Management; SNMP, ICMP, SSH, syslog, WMI
  43. Provisioning (project and installation targets)
      • Apache Provisionr (incubating): can provision 10s to 1000s of machines on various clouds.
      • Cobbler: distributed virtual infrastructure using koan (kickstart over a network) to PXE boot VMs for Red Hat, openSUSE, Fedora, Debian and Ubuntu.
      • Crowbar: bare metal provisioning.
      • Juju: public clouds (Amazon Web Services, HP Cloud), private OpenStack clouds, bare metal via MAAS.
      • Salt Cloud: tool to provision “salted” VMs that can then be updated by a central server via ZeroMQ.
  44. Configuration Management Tools (project: year started; language; license; client/server)
      • Cfengine: 1993; C; Apache; Yes
      • Chef: 2009; Ruby; Apache; Chef Solo: No, Chef Server: Yes
      • Puppet: 2004; Ruby; GPL; Yes, and standalone
      • Salt: 2011; Python; Apache; Yes
  45. Automation/Orchestration Tools (project and description)
      • Ansible: Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately.
      • Capistrano: utility and framework for executing commands in parallel on multiple remote machines via SSH. It uses a simple DSL that allows you to define tasks, which may be applied to machines in certain roles.
      • RunDeck: an open source process automation and command orchestration tool with a web console.
      • Func: provides a two-way authenticated system for generically executing tasks; integrates with Puppet and Cobbler.
      • MCollective: the Marionette Collective, a.k.a. MCollective, is a framework to build server orchestration or parallel job execution systems.
      • Salt: execute arbitrary shell commands or choose from dozens of pre-built modules of common (or complex) commands.
      • Scalr: provides scaling across multiple cloud computing platforms; integrates with Chef.
  46. Netflix Open Source Toolbag for AWS
      • Asgard
      • Astyanax
      • Edda
      • Eureka
      • Priam
      • Simian Army
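
The MapReduce slide shows problem data passing through a master node to three worker nodes and coming back as solution data. A minimal sketch of that flow, assuming a word-count job and using ordinary function calls as a stand-in for real workers (the names map_phase, reduce_phase and run_job are illustrative and do not come from the talk or from Hadoop):

```python
from collections import Counter
from functools import reduce


def map_phase(chunk):
    """Worker step: count the words in one chunk of the problem data."""
    return Counter(word.lower() for line in chunk for word in line.split())


def reduce_phase(partials):
    """Master step: merge the per-worker counts into the solution data."""
    return reduce(lambda left, right: left + right, partials, Counter())


def run_job(lines, workers=3):
    """Master node: split the input, give one chunk to each worker, merge."""
    chunks = [lines[i::workers] for i in range(workers)]
    # Assumption: workers run one after another here; a real cluster
    # would run them in parallel on separate machines.
    partials = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(partials)


if __name__ == "__main__":
    data = ["the cloud is open", "open source cloud", "the open cloud"]
    print(run_job(data))
```

Because each worker only sees its own chunk and the merge step is associative, the same job could be spread over more workers without changing the result, which is the property horizontal scale-out relies on.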
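
The OpenFlow slide describes a remote controller changing the behavior of network devices through a forwarding instruction set. The idea can be sketched as a toy match-action table. This is a simplified model and not the OpenFlow wire protocol; the FlowRule and Switch classes, the field names and the table-miss behavior are all assumptions made for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class FlowRule:
    match: dict        # header fields that must match, e.g. {"dst_ip": "10.0.0.2"}
    action: str        # illustrative action string, e.g. "output:2" or "drop"
    priority: int = 0  # higher priority rules are checked first


@dataclass
class Switch:
    table: list = field(default_factory=list)

    def install(self, rule):
        """Called by the remote controller to change forwarding behavior."""
        self.table.append(rule)
        self.table.sort(key=lambda r: r.priority, reverse=True)

    def forward(self, packet):
        """Apply the highest-priority rule whose match fields all agree."""
        for rule in self.table:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        # Assumption: on a table miss the packet is handed to the controller.
        return "send_to_controller"


if __name__ == "__main__":
    switch = Switch()
    switch.install(FlowRule({"dst_ip": "10.0.0.2"}, "output:2", priority=10))
    switch.install(FlowRule({}, "drop", priority=0))
    print(switch.forward({"dst_ip": "10.0.0.2"}))
```

The switch itself holds no policy: everything it does is determined by the rules the controller installs, which is what makes centralized configuration and automation possible.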