Hitchhiker’s Guide to The Open Cloud: Linux Foundation Collaboration Summit 2013, Mark R. Hinkle, Sr. Director, OPEN SOURCE SOLU...
Mark Hinkle, Sr. Director, Open Source Solutions: • Dedicated to the success of the Apache CloudStack, Open...
Why Open Source and Cloud Computing? • User-Driven Context from Solving Real Problems • Lower Barrier...
Quick Cloud Computing Overview, or the Obligatory “What is the Cloud Explanation”: Infinite Improbability Drive
Five Characteristics of Cloud: 1. On-Demand Self-Service 2. Broad Network Access 3. Resource Pooling 4. Ra...
Cloud Computing Service Models: USER CLOUD a.k.a. SOFTWARE AS A SERVICE: Single application, multi-tenancy, network-based, one...
Deployment Models: Public, Private & Hybrid. Hitchhiker’s Guide to the Open Cloud by @mrhinkle
Building Open Source Clouds
Cloud Architecture
Hypervisors: Open Source • Xen, Project Xen Cloud Platform (XCP) • KVM – Kernel-based Virtualiza...
Open Virtual Machine Formats: Open Virtualization Format (OVF) is an open standard for packaging and di...
Sourcing Cloud Appliances: Tool/Project, What you can do with them. Bitnami: BitNami provides free, rea...
Scale-Up or Scale-Out: Vertical Scaling (Scale-Up): Allocate additional resources to VMs, requires a r...
Compute Clouds (IaaS): Year Started, License, Virtualization Technologies. Apache CloudStack: 2008, Apache, X...
OpenStack – Ecosystem of Projects: Enterprise Message Queue based on Rabbit MQ (ESB), Object Storage “Swi...
Cloud APIs: • jclouds • libcloud • deltacloud • fog
Cloud Computing Storage: Project, Description. Ceph: Distributed file storage system developed by DreamHos...
Platform-as-a-Service (PaaS): Project, Year Started, Sponsors, Languages/Frameworks. CloudFoundry: 2011, VMware...
Software Defined Networking (SDN)
Overview of Software Defined Networking: Business Applications, Network Services, SDN Control Software, API, API, Network Dev...
Cloud Promise, Reality and Networks: Cloud Promise, Cloud Reality. Centralized Configuration and Automation With...
OpenFlow: OpenFlow enables networks to evolve, by giving a remote controller the power to modify ...
Software Defined Networking (SDN): Project, Description. Floodlight: The Floodlight controller is an enterprise-cl...
Big Data: Deep Thought
1 Billion Facebook Users - October 2012 (chart of user growth from December 2004)
Twitter at 400M Tweets Per Day – June 2012 (chart of daily tweets from January 2007)
Data is growing faster than storage capacity and computing power. Legacy systems hold organization...
Big Data Landscape. Source: BigDataGroup.com
Open Source NoSQL Databases: Name, Type, Description. Apache Cassandra: Wide Column Store/Families. API: many » Qu...
MapReduce: Problem Data, Master Node, Worker Node 1, Worker Node 2, Worker Node 3, Solution Data ...
Apache Hadoop: Overview • Handles large amounts of data • Stores data in native format • Delivers ...
Hadoop Architecture: Hadoop Common. HDFS: Distributes & replicates data across machines. MapReduce: Di...
Big Data Summary: • Quantity of Machine Created Data Increasing Drastically (examples: networked sensor d...
Cloud Management Tools
Automation in the Cloud: Meat Cloud, Cloud Operations
4 Types of Management Tools: Provisioning: Installation of operating systems and other software. Configuration Management: Sets th...
Management Toolchains: Configuration, Patching and Provisioning, Monitoring. Toolchain (n): A set of tools where the outp...
Provisioning: Project, Installation Targets. Apache Provisionr (incubating): Can provision 10s to 1000s of machine...
Configuration Management Tools: Project, Year Started, Language, License, Client/Server. Cfengine: 1993, C, ...
Automation/Orchestration Tools: Project, Description. Ansible: Ansible's SSH-key based access allows contribu...
Conceptual Automated Toolchain: BootStrapped Image, CloudStack, OpenStack, Configuration, Puppet, Chef, Start/Stop S...
Netflix Open Source ToolBag for AWS: ASGARD, ASTYANAX, EDDA, EUREKA, PRIAM, SIMIAN ARMY
Goodbye and thanks for all the fish!
Questions? Slides Can be Viewed and Downloaded at: http://www.slideshare.net/socializedsoftware/ Copyright Mark R. Hinkle, av...
Contact Me: Professional: mark.hinkle@citrix.com, Personal: mrhinkle@gmail.com, Phone: 919.228.8049, Personal: http://www.socializ...
Additional Resources: • Devops Toolchains Group • Software Defined Networking: The New Norm for Netwo...
Monitoring Tools: License, Type of Monitoring, Collection Methods. Cacti / RRDTool: GPL, Performance ...
Linux Foundation Collaboration Summit: Hitchhiker's Guide to the Cloud



Imagine it's eight o'clock on a Thursday morning and you awake to see a bulldozer out your window ready to plow over your data center. Normally you may wish to consult the Encyclopedia Galactica to discern the best course of action, but your copy is likely out of date. And while the Hitchhiker's Guide to the Galaxy (HHGTTG) is a wholly remarkable book, it doesn't cover the nuances of cloud computing. That's why you need the Hitchhiker's Guide to Cloud Computing (HHGTCC), or at least to attend this talk to understand the state of open source cloud computing. Specifically, this talk will cover infrastructure-as-a-service, platform-as-a-service and developments in big data, and how to take advantage of these technologies more effectively using open source software. Technologies that will be covered in this talk include Apache CloudStack, Chef, CloudFoundry, NoSQL, OpenStack, Puppet and many more.

Published in: Technology
  • Infinite Improbability Drive: The Infinite Improbability Drive is a faster-than-light drive. The most prominent usage of the drive is in the starship Heart of Gold. It is based on a particular perception of quantum theory: a subatomic particle is most likely to be in a particular place, such as near the nucleus of an atom, but there is also an infinitesimally small probability of it being found very far from its point of origin (for example close to a distant star). Thus, a body could travel from place to place without passing through the intervening space (or hyperspace, for that matter), if you had sufficient control of probability.
    Reference: Michael Lockwood (2005). The Labyrinth of Time: Introducing the Universe. Oxford University Press. ISBN 0-19-924995-4.
  • Cloud Software as a Service (SaaS) – The Application Cloud: The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
    Cloud Platform as a Service (PaaS) – The Development Cloud: The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
    Cloud Infrastructure as a Service (IaaS) – The Systems Cloud: The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
  • Private cloud: The cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on premise or off premise.
    Public cloud: The cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
    Hybrid cloud: The cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
  • The Kill-o-Zap is a weapon first appearing in the novel The Hitchhiker's Guide to the Galaxy, wielded by the police from Blagulon Kappa when they come to Magrathea to arrest Zaphod. It is referenced throughout the series in the role of a standard and widespread brand of raygun.
    In the novel The Restaurant at the End of the Universe it is described in more detail: The designer of the gun had clearly not been instructed to beat about the bush. 'Make it evil,' he'd been told. 'Make it totally clear that this gun has a right end and a wrong end. Make it totally clear to anyone standing at the wrong end that things are going badly for them. If that means sticking all sort of spikes and prongs and blackened bits all over it then so be it. This is not a gun for hanging over the fireplace or sticking in the umbrella stand, it is a gun for going out and making people miserable with.'
    In the novel Life, the Universe and Everything, the group arms themselves with Kill-o-Zap guns against the Krikkiters. Arthur "fumbled to release the safety catch and engage the extreme danger catch as Ford had shown him. He was shaking so much that if he'd fired at anybody at that moment he probably would have burnt his signature on them."
    In the 2005 movie adaptation, the gun has a sophisticated look. It is more of a white circle that covers the hand and has a trigger on the inside. This version is wielded by Marvin.
  • Derived from the NIST diagram.
    Physical Resources: Networking, Compute, Storage, BIOS/Firmware
    Software: Kernel, Operating Systems with Type II Hypervisors, VM Manager (VMM) – Type 1 Hypervisors
    Virtualized Resources: Networking, Compute, Storage
    Virtualized Resources: Metadata, Virtual Machine Images
  • Top choices for cloud computing are Xen and KVM. OpenVZ, container virtualization for Linux, is an interesting option because it adds very little overhead when scaling application space, similar to containers like BSD Jails. Its advantage is that memory allocation is soft, so unutilized memory can be used by other applications.
  • OVF: An OVF package consists of several files, placed in one directory. A one-file alternative is the OVA package, which is a TAR file with the OVF directory inside. OVF is a packaging format for software appliances. From a technical point of view, an OVF is a transport mechanism for virtual machine templates. One OVF may contain a single VM, or many VMs (it is left to the software appliance developer to decide which arrangement best suits their application). OVFs must be installed before they can be run; a particular virtualization platform may run the VM from the OVF, but this is not required. If this is done, the OVF itself can no longer be viewed as a “golden image” version of the appliance, since run-time state for the virtual machine(s) will pervade the OVF. Moreover, the digital signature that allows the platform to check the integrity of the OVF will be invalid.
    Amazon AMI: An Amazon Machine Image (AMI) is a special type of virtual appliance which is used to instantiate (create) a virtual machine within the Amazon Elastic Compute Cloud. It serves as the basic unit of deployment for services delivered using EC2. Like all virtual appliances, the main component of an AMI is a read-only filesystem image which includes an operating system (e.g., Linux, UNIX, or Windows) and any additional software required to deliver a service or a portion of it. The AMI filesystem is compressed, encrypted, signed, split into a series of 10MB chunks and uploaded into Amazon S3 for storage. An XML manifest file stores information about the AMI, including name, version, architecture, default kernel id, decryption key and digests for all of the filesystem chunks. An AMI does not include a kernel image, only a pointer to the default kernel id, which can be chosen from an approved list of safe kernels maintained by Amazon and its partners (e.g., Red Hat, Canonical, Microsoft). Users may choose kernels other than the default when booting an AMI.
    QCOW2 – QEMU “Copy on Write” Version 2: qcow stands for "QEMU Copy On Write" and denotes a disk storage optimization strategy that delays allocation of storage until it is actually needed. QEMU is an emulator and virtual machine container, and it can use a variety of virtual disk images which are generally associated with specific guest operating systems. qcow2 is a newer version of the qcow format. QEMU can use a base image which is read-only, and store all writes to the qcow2 image. Among the QEMU supported formats, this is the most versatile format. Features include smaller images (useful if the filesystem does not support holes, for example on FAT32), optional AES encryption, zlib based compression and support of multiple VM snapshots. QEMU and Xen have retained the qcow format for backwards compatibility. Users can easily convert qcow disk images to the qcow2 format.
    VMDK – Virtual Machine Disk: VMDK (Virtual Machine Disk) is a file format used for virtual appliances developed for VMware products. The format is a container for virtual hard disk drives to be used in virtualization products like VMware Workstation or VirtualBox. VMDK is an open format.
    IMG: The IMG file extension is used by files which are standardized raw dumps of a disk, and by files in various formats created by different imaging programs. Xen can use raw disk images and physical disks as filesystems for a Xen based domU. Another option is to use the disk images used by QEMU.
    VHD – Virtual Hard Disk: Virtual Hard Disk format started by Connectix (now part of Microsoft), made open through the Microsoft Open Specification Promise. VHDs are implemented as files that reside on the native host file system. The following types of VHD formats are supported by Microsoft Virtual PC and Virtual Server:
    Fixed hard disk image: a file that is allocated to the size of the virtual disk. Fixed VHDs consist of a raw disk image followed by a VHD footer (512 or formerly 511 bytes).
    Dynamic hard disk image: a file that at any given time is as large as the actual data written to it, plus the size of the header and footer. Dynamic and differencing VHDs begin with a copy of the VHD footer (padded to 512 bytes), and for dynamic or differencing VHDs created by Microsoft products this results in a VHD-cookie string conectix at the beginning of the VHD file.
    Differencing hard disk image: a set of modified blocks (maintained in a separate file referred to as the "child image") in comparison to a parent image. The differencing hard disk image format allows the concept of Undo Changes: when enabled, all changes to a hard drive contained within a VHD (the parent image) are stored in a separate file (the child image). Options are available to undo the changes to the VHD, or to merge them permanently into the VHD. Different child images based on the same parent image also allow "cloning" of VHDs; at least the globally unique identifier (GUID) must be different.
    Linked to a hard disk: a file which contains a link to a physical hard drive or partition of a physical hard drive.
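The copy-on-write idea behind qcow2 base images and VHD differencing disks can be illustrated with a small sketch. This is a toy in-memory model, not the on-disk layout of either format; the class names `BaseImage` and `Overlay` and their methods are made up for illustration.

```python
class BaseImage:
    """Read-only parent image: a fixed sequence of blocks."""

    def __init__(self, blocks):
        self._blocks = tuple(blocks)

    def read(self, index):
        return self._blocks[index]

    def __len__(self):
        return len(self._blocks)


class Overlay:
    """Child image that stores only the blocks written after it was created."""

    def __init__(self, parent):
        self.parent = parent
        self.changed = {}  # block index -> new contents

    def read(self, index):
        # Prefer the child's copy of a block; fall back to the parent.
        if index in self.changed:
            return self.changed[index]
        return self.parent.read(index)

    def write(self, index, data):
        if not 0 <= index < len(self.parent):
            raise IndexError("block index out of range")
        self.changed[index] = data  # the parent is never modified

    def undo(self):
        """Discard every change, returning to the parent's state."""
        self.changed.clear()

    def merged(self):
        """Build a new base image with the changes applied permanently."""
        return BaseImage(self.read(i) for i in range(len(self.parent)))


golden = BaseImage([b"boot", b"os", b"data"])
clone_a, clone_b = Overlay(golden), Overlay(golden)
clone_a.write(2, b"app-a")
print(clone_a.read(2), clone_b.read(2))
```

Two overlays on the same parent behave like the "cloning" described above: each clone costs only the blocks it has changed, and the golden image stays intact.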
  • Appliances are like toasters, they do one thing very well.
    Bitnami: BitNami Cloud Images allow BitNami Stacks to run in a cloud computing environment. BitNami offers Amazon Machine Images (AMIs) for running BitNami Stacks on the Amazon Cloud, as well as BitNami Cloud Hosting, a service that simplifies the process of running open source applications on Amazon EC2.
    BoxGrinder: BoxGrinder supports many virtualization and cloud platforms like EC2, Xen, KVM, VMware. You can create an appliance based on Fedora, Red Hat Enterprise Linux or CentOS. You are of course free to write your own plugin to support any other virtualization platform or operating system.
    Oz: Oz is a command-line tool that has the ability to create images for common Linux distributions. There are lots of tools for image building. Oz is a bit different from most others in that it actually spawns a VM to do an install, while most other tools simply use a loopback mounted filesystem.
    SUSE Studio: SUSE Studio allows you to use a hosted build service and an on-premise virtual build system. Has a RESTful API to make calls to SUSE Studio. Builds openSUSE, SUSE Linux Enterprise and JeOS images. Integrates with SUSE Lifecycle Management Server and WebYaST. Can share images in the SUSE Studio Gallery.
    Other projects: Imagefactory – http://imgfac.org/ – builds images for a variety of operating system/cloud combinations. UShareSoft – create cloud server templates on any OS in minutes via a SaaS.
  • Scale Up Scale Out
  • CloudStack – www.cloudstack.org – CloudStack is an Apache Software Foundation project released under ASL 2.0 that provides a highly capable IaaS solution for service providers and enterprises. Features: robust web interface; comprehensive API; secure single sign-on; dynamic workload management; XenServer, Xen Cloud Platform, KVM, VMware and Oracle VM support; secure AJAX console for VMs; Networking-as-a-Service (create VLANs to segregate traffic); EC2 API compatibility; usage metering.
    Eucalyptus – http://open.eucalyptus.com – IaaS platform originally targeted to provide a migration path from Amazon EC2 to private cloud. Features: Amazon AWS interface compatibility; supports Amazon AMI; high availability; network management, security groups, traffic isolation; self service; S3-compatible storage; bucket-based storage; Xen and KVM hypervisor support (VMware in Enterprise Edition); user group and role-based management; single data center.
    OpenStack – www.openstack.org
    OpenStack Compute (Nova) – Nova is a cloud orchestration platform similar to Amazon EC2. Features: orchestration of popular hypervisors (Xen, XenServer, KVM, Hyper-V, VMware, Linux Containers); floating IP addresses (keep IPs and DNS correct when restarting VMs); VNC proxy through the web; Apache 2.0 license; Android/iOS clients; block storage support (AoE, iSCSI, Sheepdog).
    OpenStack Storage (Swift) – An S3-style object store used for long-term storage, not real-time storage. Swift is used for creating redundant, scalable object storage using clusters of standardized servers to store petabytes of accessible data. Features: store and manage files programmatically; create public and private folders; uses commodity hardware; fault tolerant (nodes/HDD); scale-out, scale-up.
    OpenStack Image Service (Glance) – OpenStack Image Service (code-named Glance) provides discovery, registration, and delivery services for virtual disk images. Features: provides images-as-a-service; supports Raw, VHD, VDI, qcow2, VMDK, OVF; RESTful API; backend options – Swift, local, S3, HTTP; version control and logging.
    OpenNebula – http://www.opennebula.org/ – Cloud computing toolkit, Apache license.
  • Nova – Compute Fabric Controller similar to Amazon EC2: Nova is the project name for OpenStack Compute, a cloud computing fabric controller, the main part of an IaaS system. Individuals and organizations can use Nova to host and manage their own cloud computing.
    Component based architecture: Quickly add new behaviors
    Highly available: Scale to very serious workloads
    Fault-Tolerant: Isolated processes avoid cascading failures
    Recoverable: Failures should be easy to diagnose, debug, and rectify
    Open Standards: Be a reference implementation for a community-driven API
    API Compatibility: Nova strives to be API-compatible with popular systems like Amazon EC2
    Object Storage – “Swift” – Object Storage like Amazon S3: Object Storage is ideal for cost effective, scale-out storage. It provides a fully distributed, API-accessible storage platform that can be integrated directly into applications or used for backup, archiving and data retention. Block Storage allows block devices to be exposed and connected to compute instances for expanded storage, better performance and integration with enterprise storage platforms, such as NetApp, Nexenta and SolidFire.
    Image Service – “Glance”: The OpenStack Image Service provides discovery, registration and delivery services for disk and server images. The ability to copy or snapshot a server image and immediately store it away is a powerful capability of the OpenStack cloud operating system. Stored images can be used as a template to get new servers up and running quickly (and more consistently, if you are provisioning multiple servers) compared with installing a server operating system and individually configuring additional services. It can also be used to store and catalog an unlimited number of backups.
    Identity Service – “Keystone”: OpenStack Identity provides a central directory of users mapped to the OpenStack services they can access. It acts as a common authentication system across the cloud operating system and can integrate with existing backend directory services like LDAP. It supports multiple forms of authentication including standard username and password credentials, token-based systems and AWS-style logins. Additionally, the catalog provides a queryable list of all of the services deployed in an OpenStack cloud in a single registry. Users and third-party tools can programmatically determine which resources they can access. As an administrator, OpenStack Identity enables you to:
    Configure centralized policies across users and systems
    Create users and tenants and define permissions for compute, storage and networking resources using role-based access control (RBAC) features
    Integrate with an existing directory like LDAP, allowing for a single source of identity authentication across the enterprise
    Networking – “Quantum”: The Quantum API allows for creation and management of “virtual networks”, each of which can have one or more “ports”. A port on a virtual network can be attached to a “network interface”, where a “network interface” is anything which can source traffic, such as a vNIC exposed by a virtual machine, an interface on a load balancer, and so on. These abstractions offered by Quantum (virtual networks, virtual ports, and network interfaces) are the building blocks for building and managing logical network topologies. Of course, the technology that implements Quantum is fully decoupled from the API (that is, the backend is “pluggable”).
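The three abstractions named in the Quantum description (virtual networks, ports, and attached interfaces) can be modelled in a few lines. This is a sketch of the data model only, with an in-memory backend standing in for a pluggable one; the names `InMemoryBackend`, `create_network`, `create_port` and `attach` are hypothetical and are not the Quantum REST API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Port:
    id: str
    interface: Optional[str] = None  # e.g. the identifier of a VM's vNIC


@dataclass
class Network:
    id: str
    name: str
    ports: Dict[str, Port] = field(default_factory=dict)


class InMemoryBackend:
    """Hypothetical backend; a real plugin would program switches instead."""

    def __init__(self):
        self.networks: Dict[str, Network] = {}

    def create_network(self, name):
        net = Network(id=str(uuid.uuid4()), name=name)
        self.networks[net.id] = net
        return net

    def create_port(self, network_id):
        port = Port(id=str(uuid.uuid4()))
        self.networks[network_id].ports[port.id] = port
        return port

    def attach(self, network_id, port_id, interface):
        port = self.networks[network_id].ports[port_id]
        if port.interface is not None:
            raise ValueError("port already has an interface attached")
        port.interface = interface
        return port


backend = InMemoryBackend()
web = backend.create_network("web-tier")
p = backend.create_port(web.id)
backend.attach(web.id, p.id, "vm-1-vnic0")
print(web.name, len(web.ports), p.interface)
```

Because callers only touch `create_network`, `create_port` and `attach`, the backend class could be swapped for another implementation without changing them, which is the sense of "pluggable" in the text above.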
  • Types of Tasks Accomplished by an API:
    Provisioning (creating, re-creating, moving, or deleting components, e.g. virtual machines, VLANs)
    Configuration (assigning or changing attributes of the architecture such as security and network settings)
    Cloud Providers:
    jclouds – Java API abstraction
    Libcloud – started by Cloudkick (now Rackspace) to abstract clouds, Apache incubator project
    Deltacloud – started by Red Hat to abstract clouds, Apache incubator project
    Fog – provider and abstraction level API across compute and storage, written in Ruby
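What an abstraction library buys you can be shown with a small sketch: the application codes against one interface, and provider-specific drivers hide the differences. Everything here is hypothetical (the `ComputeDriver` interface and the two fake providers); it does not reproduce the actual API of jclouds, Libcloud, Deltacloud or Fog.

```python
from abc import ABC, abstractmethod


class ComputeDriver(ABC):
    """Provider-neutral interface the application codes against."""

    @abstractmethod
    def create_node(self, name): ...

    @abstractmethod
    def destroy_node(self, node_id): ...

    @abstractmethod
    def list_nodes(self): ...


class FakeCloudA(ComputeDriver):
    """Hypothetical provider that identifies machines by a counter."""

    def __init__(self):
        self._vms = {}
        self._next = 1

    def create_node(self, name):
        node_id = str(self._next)
        self._vms[node_id] = name
        self._next += 1
        return node_id

    def destroy_node(self, node_id):
        del self._vms[node_id]

    def list_nodes(self):
        return sorted(self._vms.values())


class FakeCloudB(ComputeDriver):
    """Hypothetical provider that derives IDs from the machine name."""

    def __init__(self):
        self._instances = []

    def create_node(self, name):
        self._instances.append(name)
        return "i-" + name

    def destroy_node(self, node_id):
        self._instances.remove(node_id[len("i-"):])

    def list_nodes(self):
        return sorted(self._instances)


def provision_web_tier(driver, count):
    """The same provisioning code runs unchanged against any driver."""
    return [driver.create_node(f"web{i}") for i in range(count)]


for cloud in (FakeCloudA(), FakeCloudB()):
    provision_web_tier(cloud, 2)
    print(type(cloud).__name__, cloud.list_nodes())
```

The provisioning task from the list above (`provision_web_tier`) never mentions a provider, so moving between clouds means changing only which driver is constructed.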
  • Primary Storage, Secondary Storage
    GlusterFS: GlusterFS is an open source scale-out NAS solution. The software is a powerful and flexible solution that simplifies the task of managing unstructured file data whether you have a few terabytes of storage or multiple petabytes.
    Ceph: Ceph is a distributed network storage and file system designed to provide excellent performance, reliability, and scalability. Ceph is based on a reliable and scalable distributed object store, with a distributed metadata management cluster layered on top to provide a distributed file system with POSIX semantics. There are a variety of ways to interact with the system. Used by Bloomberg and DreamHost.
    OpenStack Storage (code-named Swift) is open source software for creating redundant, scalable object storage using clusters of standardized servers to store petabytes of accessible data. It is not a file system or real-time data storage system, but rather a long-term storage system for a more permanent type of static data that can be retrieved, leveraged, and then updated if necessary. Primary examples of data that best fit this type of storage model are virtual machine images, photo storage, email storage and backup archiving. Having no central "brain" or master point of control provides greater scalability, redundancy and permanence.
    Riak Cloud Storage is simple, available storage software built on top of Riak. It features: large object support (up to 5GB/object); S3-compatible API and authentication; multi-tenancy and per-user reporting; per-node or capacity-based pricing; multi-datacenter replication.
    Sheepdog: Sheepdog is a distributed storage system for QEMU/KVM. It provides highly available block level storage volumes that can be attached to QEMU/KVM virtual machines. Sheepdog scales to several hundred nodes, and supports advanced volume management features such as snapshot, cloning, and thin provisioning.
  • CloudFoundry: Cloud Foundry is a VMware-led project for building a Platform as a Service (PaaS) offering. Cloud Foundry provides a platform for building, deploying, and running cloud apps using Spring for Java developers, Rails and Sinatra for Ruby developers, Node.js and other JVM frameworks including Grails.
    Cloudify: Cloudify is designed to bring any app to any cloud, enabling enterprises, ISVs, and managed service providers alike to quickly benefit from the cloud automation and elasticity organizations today need. Cloudify helps you maximize application onboarding and automation by externally orchestrating the application deployment and runtime. Cloudify’s DevOps approach treats infrastructure as code, enabling you to describe deployment and post-deployment steps for any application through an external blueprint (AKA a recipe), which you can then take from cloud to cloud, unchanged. Cloudify recipes are available on GitHub.
    OpenShift: A free Platform-as-a-Service that enables developers to deploy apps written in multiple frameworks and languages across clouds. Open source licensing is forthcoming.
    Stackato: Stackato enables you to create a private PaaS hosted on the cloud of your choice (your own or with a hosting provider) to empower your developers to deploy, run, and manage their applications in the cloud. Stackato includes a multi-choice cloud application platform with automatic provisioning:
    choice of language (Java, Python, PHP, Ruby, Perl, Node.js, Erlang, Scala, Clojure)
    choice of framework (popular frameworks for each of the languages above, such as Spring, Django, Pyramid, Rails, Mojolicious, Catalyst and more)
    choice of data service (MySQL, PostgreSQL, Redis, MongoDB) plus ability to connect to others
    WSO2: The WSO2 middleware platform offers a full range of core services: application server, enterprise service bus (ESB), governance registry and repository, identity and access management, business process management (BPM), business activity monitor (BAM), portal server and more. WSO2 Stratos monitors CPU, memory and bandwidth utilization, and SLAs. Then it automatically scales up or down depending on the load. When new resources are needed, WSO2 Stratos transparently adds services, and when load goes down, WSO2 Stratos automatically brings services down. Dynamic discovery enables services to automatically detect when resource allocations change; there is no need for manual monitoring or reconfiguration.
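The load-based scaling described for WSO2 Stratos follows a common pattern: compare a utilization metric against thresholds and adjust the instance count within limits. The sketch below shows that general pattern only; it is not how Stratos is implemented, and the function name, thresholds and limits are illustrative assumptions.

```python
def desired_instances(current, cpu_percent, low=30.0, high=75.0,
                      minimum=1, maximum=10):
    """Return the instance count to run during the next interval.

    All thresholds and limits are assumed values for illustration.
    """
    if cpu_percent > high:
        target = current + 1      # overloaded: add a service instance
    elif cpu_percent < low:
        target = current - 1      # idle: bring an instance down
    else:
        target = current          # within the comfortable band
    # Never go below the floor or above the ceiling.
    return max(minimum, min(maximum, target))


count = 2
for cpu in (90.0, 85.0, 50.0, 10.0):
    count = desired_instances(count, cpu)
    print(cpu, count)
```

Keeping a gap between the low and high thresholds avoids adding and removing instances on every small fluctuation in load.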
  • In the SDN architecture, the control and data planes are decoupled, network intelligence and state are logically centralized, and the underlying network infrastructure is abstracted from the applications. As a result, enterprises and carriers gain unprecedented programmability, automation, and network control, enabling them to build highly scalable, flexible networks that readily adapt to changing business needs.
  • Software Defined Networking (SDN) is an emerging network architecture where network control is decoupled from forwarding and is directly programmable. This migration of control, formerly tightly bound in individual network devices, into accessible computing devices enables the underlying infrastructure to be abstracted for applications and network services, which can treat the network as a logical or virtual entity. This figure depicts a logical view of the SDN architecture. Network intelligence is (logically) centralized in software-based SDN controllers, which maintain a global view of the network. As a result, the network appears to the applications and policy engines as a single, logical switch. With SDN, enterprises and carriers gain vendor-independent control over the entire network from a single logical point, which greatly simplifies the network design and operation. SDN also greatly simplifies the network devices themselves, since they no longer need to understand and process thousands of protocol standards but merely accept instructions from the SDN controllers.
  • The limitations of the hardware-dependent network are preventing enterprises from realizing the full potential of their clouds, and vastly limiting the return on their investment. To get the most from your cloud, you must untether your network.
  • OpenFlow
    OpenFlow is an open standard that enables researchers to run experimental protocols in the campus networks we use every day. OpenFlow is added as a feature to commercial Ethernet switches, routers and wireless access points, and provides a standardized hook to allow researchers to run experiments without requiring vendors to expose the internal workings of their network devices. OpenFlow is currently being implemented by major vendors, with OpenFlow-enabled switches now commercially available.

    In a classical router or switch, the fast packet forwarding (data path) and the high-level routing decisions (control path) occur on the same device. An OpenFlow Switch separates these two functions. The data path portion still resides on the switch, while high-level routing decisions are moved to a separate controller, typically a standard server. The OpenFlow Switch and Controller communicate via the OpenFlow protocol, which defines messages such as packet-received, send-packet-out, modify-forwarding-table, and get-stats.

    The data path of an OpenFlow Switch presents a clean flow table abstraction; each flow table entry contains a set of packet fields to match and an action (such as send-out-port, modify-field, or drop). When an OpenFlow Switch receives a packet it has never seen before, for which it has no matching flow entries, it sends this packet to the controller. The controller then decides how to handle this packet. It can drop the packet, or it can add a flow entry directing the switch on how to forward similar packets in the future.

    OpenFlow is the first standard communications interface defined between the control and forwarding layers of an SDN architecture. OpenFlow allows direct access to and manipulation of the forwarding plane of network devices such as switches and routers, both physical and virtual (hypervisor-based).

    It is the absence of an open interface to the forwarding plane that has led to the characterization of today’s networking devices as monolithic, closed, and mainframe-like. No other standard protocol does what OpenFlow does, and a protocol like OpenFlow is needed to move network control out of the networking switches to logically centralized control software.
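The flow-table behaviour described in the OpenFlow notes above can be sketched in a few lines of Python. This is only an illustrative model, not the OpenFlow wire protocol: the names `FlowEntry`, `Switch`, `simple_controller`, the field name `dst`, and the action strings are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Packet = Dict[str, str]  # hypothetical packet: header field name -> value


@dataclass
class FlowEntry:
    match: Packet   # packet fields that must all be equal for a match
    action: str     # illustrative action string, e.g. "output:2" or "drop"

    def matches(self, packet: Packet) -> bool:
        return all(packet.get(f) == v for f, v in self.match.items())


Controller = Callable[[Packet], Optional[FlowEntry]]


class Switch:
    """Data path only: forwarding decisions live in the controller."""

    def __init__(self, controller: Controller) -> None:
        self.flow_table: List[FlowEntry] = []
        self.controller = controller
        self.controller_queries = 0

    def receive(self, packet: Packet) -> str:
        # Fast path: use the first matching flow entry.
        for entry in self.flow_table:
            if entry.matches(packet):
                return entry.action
        # Table miss: hand the packet to the controller.
        self.controller_queries += 1
        new_entry = self.controller(packet)
        if new_entry is None:
            return "drop"            # controller chose to drop this packet
        self.flow_table.append(new_entry)  # similar packets now stay on the switch
        return new_entry.action


def simple_controller(packet: Packet) -> Optional[FlowEntry]:
    # Hypothetical policy: forward traffic for one host, drop the rest.
    if packet.get("dst") == "10.0.0.2":
        return FlowEntry(match={"dst": "10.0.0.2"}, action="output:2")
    return None


switch = Switch(simple_controller)
first = switch.receive({"src": "10.0.0.1", "dst": "10.0.0.2"})
second = switch.receive({"src": "10.0.0.9", "dst": "10.0.0.2"})
dropped = switch.receive({"src": "10.0.0.1", "dst": "10.0.0.3"})
```

In this sketch only the first packet to `10.0.0.2` reaches the controller; the second is handled from the flow table, which is the separation of control path and data path that the notes describe.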
  • Floodlight - http://floodlight.openflowhub.org/
    The Floodlight controller is an enterprise-class, Apache-licensed, Java-based OpenFlow Controller. It is supported by a community of developers including a number of engineers from Big Switch Networks. OpenFlow is an open standard managed by the Open Networking Foundation (ONF). It specifies a protocol through which a remote controller can modify the behavior of networking devices through a well-defined “forwarding instruction set”. Floodlight is designed to work with the growing number of switches, routers, virtual switches, and access points that support the OpenFlow standard.

    Open Daylight - http://www.opendaylight.com
    The adoption of new technologies and pursuit of programmable networks has the potential to significantly improve levels of functionality, flexibility and adaptability of mainstream datacenter architectures. To leverage this abstraction to its fullest requires the network to adapt and evolve to a Software-Defined architecture. One of the architectural elements required to achieve this goal is a Software-Defined Networking (SDN) platform that enables network control and programmability.

    OpenStack Networking “Quantum” - https://www.openstack.org/software/openstack-networking/
    OpenStack Networking is a pluggable, scalable and API-driven system for managing networks and IP addresses. Like other aspects of the cloud operating system, it can be used by administrators and users to increase the value of existing datacenter assets. OpenStack Networking ensures the network will not be the bottleneck or limiting factor in a cloud deployment and gives users real self-service, even over their network configurations.

    Networking Capabilities
    - OpenStack provides flexible networking models to suit the needs of different applications or user groups. Standard models include flat networks or VLANs for separation of servers and traffic.
    - OpenStack Networking manages IP addresses, allowing for dedicated static IPs or DHCP. Floating IPs allow traffic to be dynamically rerouted to any of your compute resources, which allows you to redirect traffic during maintenance or in the case of failure.
    - Users can create their own networks, control traffic and connect servers and devices to one or more networks.
    - The pluggable backend architecture lets users take advantage of commodity gear or advanced networking services from supported vendors.
    - Administrators can take advantage of software-defined networking (SDN) technology like OpenFlow to allow for high levels of multi-tenancy and massive scale.
    - OpenStack Networking has an extension framework allowing additional network services, such as intrusion detection systems (IDS), load balancing, firewalls and virtual private networks (VPN), to be deployed and managed.

    Open vSwitch
    Open vSwitch is a production-quality, multilayer virtual switch licensed under the open source Apache 2.0 license. It is designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag). In addition, it is designed to support distribution across multiple physical servers, similar to VMware's vNetwork distributed vswitch or Cisco's Nexus 1000V.
  • Deep Thought is a computer that was created by the pan-dimensional, hyper-intelligent species of beings (whose three-dimensional protrusions into our universe are ordinary white mice) to come up with the Answer to The Ultimate Question of Life, the Universe, and Everything. Deep Thought is the size of a small city. When, after seven and a half million years of calculation, the answer finally turns out to be 42, Deep Thought admonishes Loonquawl and Phouchg (the receivers of the Ultimate Answer) that “[he] checked it very thoroughly, and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you've never actually known what the question was.”

    Deep Thought does not know the ultimate question to Life, the Universe and Everything, but offers to design an even more powerful computer, Earth, to calculate it. After ten million years of calculation, the Earth is destroyed by Vogons five minutes before the computation is complete.
  • http://www.benphoster.com/facebook-to-1-billion-users-i-predict-august-16-2012/
  • Applications Infrastructure
  • NoSQL
    In computing, NoSQL (commonly interpreted as "not only SQL"[1]) is a broad class of database management systems identified by non-adherence to the widely used relational database management system model. NoSQL databases are not built primarily on tables, and generally do not use SQL for data manipulation. NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.

    Apache Cassandra
    The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages. Cassandra's ColumnFamily data model offers the convenience of column indexes with the performance of log-structured updates, strong support for materialized views, and powerful built-in caching. Cassandra is in use at Netflix, Twitter, Urban Airship, Constant Contact, Reddit, Cisco, OpenX, Digg, CloudKick, Ooyala, and more companies that have large, active data sets. The largest known Cassandra cluster has over 300 TB of data in over 400 machines.

    Hypertable
    Hypertable is based on a design developed by Google (i.e. it is a BigTable clone) to meet their scalability requirements, and it solves the scale problem better than any of the other NoSQL solutions out there.

    MongoDB

    Redis
    Redis is an open source, BSD-licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.

    Riak
  • MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers. The model is inspired by the map and reduce functions commonly used in functional programming, although their purpose in the MapReduce framework is not the same as in their original forms.[3] MapReduce libraries have been written in many programming languages. A popular free implementation is Apache Hadoop.
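As a rough illustration of the programming model described above, here is a single-process word count in Python. It is a sketch of the map, shuffle and reduce steps only; it does not use Hadoop or any distributed runtime, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are chosen for the example.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    # Map: emit a (key, value) pair for every word in one input record.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all values that share a key, as the framework would
    # do between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: fold the grouped values for one key into a single result.
    return key, reduce(lambda a, b: a + b, values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["the cloud", "the open cloud"])
```

In a real cluster each `map_phase` call would run on a worker node near its data and the shuffle would move pairs across the network; the sketch keeps everything in one process to show the data flow.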
  • Chukwa - http://incubator.apache.org/chukwa/
    Chukwa is a Hadoop subproject devoted to large-scale log collection and analysis. Chukwa is built on top of the Hadoop distributed filesystem (HDFS) and MapReduce framework and inherits Hadoop’s scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring and analyzing results, in order to make the best use of this collected data.

    ZooKeeper - http://zookeeper.apache.org/
  • The MeatCloud can’t keep up with cloud computing. DevOps and Agile IT philosophy: script repetitive tasks. Automate, automate, automate.
  • Other disciplines like back-up, log management, performance and security (virus and intrusion detection) are important but not core to the delivery of cloud computing systems.
  • Ideally, for the cloud you create management toolchains that automate the management of your cloud, so that the output of one tool informs the input of another.
  • These tools are all appropriate for Linux guest operating systems; Windows operating system provisioning is not well addressed in OSS.

    Axemblr Provisionr
    Provisionr solves the problem of cloud portability by completely hiding the APIs and focusing only on building a cluster that matches the same set of assumptions on all clouds, assumptions like: a specific OS, pre-installed packages and binaries, sane DNS settings, SSH and VPN access, etc. Think of it as a solid foundation for configuration. As a secondary goal, Provisionr will also provide primitives for building automatic or semi-automatic workflows for configuring and monitoring services, workflows that assume that all the machines share a common set of characteristics as described above.

    Cobbler
    Cobbler is a Linux installation server that allows for rapid setup of network installation environments. It glues together and automates many associated Linux tasks so you do not have to hop between lots of various commands and applications when rolling out new systems and, in some cases, changing existing ones. With a simple series of commands, network installs can be configured for PXE, reinstallations, media-based net-installs, and virtualized installs (supporting Xen, qemu, KVM, and some variants of VMware). Cobbler uses a helper program called 'koan' (which interacts with Cobbler) for reinstallation and virtualization support.

    Crowbar
    Bare metal provisioning for OpenStack developed by Dell using Opscode Chef.

    Juju

    Metal as a Service (MAAS)
    MAAS offers a nice UI to provision your Ubuntu servers. Each physical server (“node”) will be commissioned automatically on first boot. During the commissioning process administrators are able to configure hardware settings manually before an automated smoke test and burn-in test are done. Once commissioned, a node can be deployed on demand by name, or allocated to a queue for dynamic allocation to services being deployed on this MAAS.

    Salt Cloud
    Salt Cloud is a tool for provisioning salted minions across various cloud providers. Currently supported providers are:
    - Amazon EC2
    - GoGrid
    - HP Cloud (using OpenStack)
    - Joyent
    - Linode
    - OpenStack
    - Rackspace (using OpenStack)
    The salt-cloud command can be used to query configured providers, create VMs on them, deploy salt-minion on those VMs and destroy them when no longer needed. Salt Cloud requires Salt to be installed, but does not require any Salt daemons to be running. However, if used in a salted environment, it is best to run Salt Cloud on the salt-master, so that it can properly lay down salt keys when it deploys machines and then properly remove them later. If Salt Cloud is run in this manner, minions will automatically be approved by the master; there is no need to manually authenticate them later.

    Spacewalk (deprecated)
    Spacewalk manages software content updates for Red Hat derived distributions such as Fedora, CentOS, and Scientific Linux, within your firewall. You can stage software content through different environments, managing the deployment of updates to systems and allowing you to view the update level of any given system across your deployment. A clean central web interface allows viewing of systems and their software update status, and initiating update actions.
  • Salt - https://github.com/saltstack/salt
  • Ansible
    Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately. Ansible is also used to roll out and manage clusters of machines and ISV software, such as Basho's flagship key-value store Riak.

    Capistrano
    Capistrano is a developer tool for deploying web applications. It is typically installed on a workstation and used to deploy code from your source code management (SCM) system to one or more servers. Capistrano recently added classes capabilities that match Cobbler.

    RunDeck
    RunDeck is cross-platform open source software that helps you automate ad-hoc and routine procedures in data center or cloud environments. RunDeck allows you to run tasks on any number of nodes from a web-based or command-line interface. RunDeck also includes other features that make it easy to scale up your scripting efforts, including access control, workflow building, scheduling, logging, and integration with external sources for node and option data.

    Func
    Func allows for running commands on remote systems in a secure way, like SSH, but offers several improvements:
    - Func allows you to manage an arbitrary group of machines all at once.
    - Func automatically distributes certificates to all "slave" machines. There's almost nothing to configure.
    - Func comes with a command line for sending remote commands and gathering data.
    - There are lots of modules already provided for common tasks.
    - Anyone can write their own modules using the simple Python module API.
    - Everything that can be done with the command line can be done with the Python client API. The hack potential is unlimited.
    - You'll never have to use "expect" or other ugly hacks to automate your workflow.
    - It's really simple under the covers. Func works over XMLRPC and SSL.
    - Since Func uses certmaster, any program can use Func certificates, latch on to them, and take advantage of secure master-to-slave communication.
    - There are no databases or crazy stuff to install and configure. Again, certificate distribution is automatic too.

    MCollective
    The Marionette Collective, a.k.a. MCollective, is a framework to build server orchestration or parallel job execution systems. MCollective is used as a means of programmatic execution of systems administration actions on clusters of servers. MCollective uses modern tools like publish-subscribe middleware and modern philosophies like real-time discovery of network resources using metadata rather than hostnames, delivering a very scalable and very fast parallel execution environment.

    Scalr
    Scalr is a pretty darn good open source cloud management tool. It provides both an automation framework (do Foo when Bar) and a web interface (where is this volume mounted) for managing infrastructure on the cloud, like EC2.
    FEATURES
    * Integrated into Opscode Chef, for configuration management.
    * Pre-automated software, such as nginx, mysql, redis, mongo, and rabbitmq
    * Blazing fast UI
    * Multi-cloud
    * More at http://scalr.net/features/
    ROADMAP
    * http://wiki.scalr.net/Roadmap
  • Automated Toolchain (for Linux guests)
    - A bootstrapped image is launched from a template in the cloud provider, then searches for the Cobbler server.
    - After Cobbler runs, its post-install step kicks off Puppet with a defined management class to configure the server using roles.
    - Services can then be started and stopped with RunDeck or post-install scripts.
    - RunDeck can then insert the new hosts in Zenoss or Nagios.
    - Finally, as network conditions change, Zenoss can remediate via other tools based on situational awareness.
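The idea that each tool's output feeds the next tool can be sketched as a chain of plain functions. Nothing below calls Cobbler, Puppet, RunDeck, Zenoss or Nagios; the function names, the `ROLE_PACKAGES` table and the record fields are hypothetical stand-ins that only mimic the hand-off between provisioning, configuration and monitoring.

```python
from typing import Dict, List

# Hypothetical role definitions, standing in for configuration management classes.
ROLE_PACKAGES: Dict[str, List[str]] = {
    "web": ["nginx"],
    "db": ["mysql"],
}


def provision(hostname: str, role: str) -> Dict[str, object]:
    # Stand-in for the provisioning step: returns a record for the new host.
    return {"hostname": hostname, "role": role, "state": "installed"}


def configure(host: Dict[str, object]) -> Dict[str, object]:
    # Stand-in for configuration management: the role chosen at provisioning
    # time decides which packages the host receives.
    packages = ROLE_PACKAGES.get(str(host["role"]), [])
    return {**host, "packages": packages, "state": "configured"}


def register_monitoring(host: Dict[str, object],
                        inventory: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Stand-in for monitoring registration: one check per configured package.
    inventory[str(host["hostname"])] = [f"check_{p}" for p in host["packages"]]
    return inventory


def run_toolchain(hostname: str, role: str,
                  inventory: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # The output of each step is the input of the next.
    return register_monitoring(configure(provision(hostname, role)), inventory)


monitoring = run_toolchain("web01", "web", {})
monitoring = run_toolchain("db01", "db", monitoring)
```

The point of the sketch is the shape of the pipeline: no step needs a human to re-enter what an earlier step already knows about the host.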
  • Netflix AWS Toolbag - http://netflix.github.com
    Over 25 projects developed by Netflix to manage their cloud deployments.

    Asgard
    Asgard is a web-based tool for managing cloud-based applications and infrastructure.

    Astyanax
    Astyanax is a high-level Java client for Apache Cassandra. Apache Cassandra is a highly available column-oriented database.

    Edda
    Edda is a service to track changes in your cloud deployments.

    Eureka
    Eureka is a REST (Representational State Transfer) based service that is primarily used in the AWS cloud for locating services for the purpose of load balancing and failover of middle-tier servers. At Netflix, Eureka is used for the following purposes apart from playing a critical part in mid-tier load balancing:
    - For aiding Netflix Asgard, an open source service which makes cloud deployments easier, in:
      - Fast rollback of versions in case of problems, avoiding the re-launch of 100's of instances, which could take a long time.
      - Rolling pushes, for avoiding propagation of a new version to all instances in case of problems.
    - For our Cassandra deployments, to take instances out of traffic for maintenance.
    - For our memcached caching services, to identify the list of nodes in the ring.

    Priam
    Priam is a process/tool that runs alongside Apache Cassandra to automate the following:
    - Backup and recovery (complete and incremental)
    - Token management
    - Seed discovery
    - Configuration
    - Support AWS environment

    Simian Army
    The Simian Army is a suite of tools for keeping your cloud operating in top form. Chaos Monkey, the first member, is a resiliency tool that helps ensure that your applications can tolerate random instance failures.
  • Cacti
    Cacti is a complete network graphing solution designed to harness the power of RRDtool's data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy-to-use interface that makes sense for LAN-sized installations up to complex networks with hundreds of devices.

    RRDtool
    RRDtool is the open source industry standard, high-performance data logging and graphing system for time series data. RRDtool can be easily integrated in shell scripts and in Perl, Python, Ruby, Lua or Tcl applications.

    Graphite
    Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing, and send it to Graphite's processing backend, carbon, which stores the data in Graphite's specialized database. The data can then be visualized through Graphite's web interfaces.
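The Graphite note above says that your application sends numeric time-series data to carbon. A minimal sketch of that step follows, assuming carbon's plaintext line format (`<metric path> <value> <timestamp>`) and a plaintext listener on port 2003 of `localhost`; the host, port and the metric name `servers.web01.load` are assumptions for illustration, so check them against your own Graphite installation.

```python
import socket
import time
from typing import Optional


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> str:
    # Build one line in carbon's plaintext format: "path value timestamp\n".
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line: str, host: str = "localhost", port: int = 2003) -> None:
    # Open a TCP connection to the (assumed) carbon plaintext listener
    # and send a single formatted metric line.
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


line = format_metric("servers.web01.load", 0.42, 1365000000)
# send_metric(line)  # uncomment when a carbon listener is reachable
```

Keeping formatting separate from sending makes the line format easy to check without a running Graphite server.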
  • Linux Foundation Collaboration Summit: Hitchhiker's Guide to the Cloud

    1. Hitchhiker’s Guide to The Open Cloud
       Linux Foundation Collaboration Summit 2013
       Mark R. Hinkle, Sr. Director, Open Source Solutions, Citrix Systems Inc.
       @mrhinkle
       mrhinkle@gmail.com
    2. Mark Hinkle, Sr. Director, Open Source Solutions
       • Dedicated to the success of the Apache CloudStack, Open Daylight & Xen Project Communities on Citrix's behalf
       • Run BuildACloud.org learning activities all over the world
       • Joined Citrix via Cloud.com acquisition July 2011
       • Zenoss Core Open Source project to 100,000 users, 1.5 million downloads
       • Former LinuxWorld Magazine Editor-in-Chief
       • Open Management Consortium organizer
       • Author - “Windows to Linux Business Desktop Migration” - Thomson
       • NetDirector Project - Open Source Configuration Management
       • Sometimes Author and Blogger at SocializedSoftware.com
       • NetworkWorld Open Source Subnet
       Hitchhiker’s Guide to the Open Cloud by @mrhinkle
    3. Why Open Source and the Cloud Computing?
       • User-Driven Context from Solving Real Problems
       • Lower Barrier to Participation
       • Larger user base, users helping users
       • Aggressive release cycles stay current with the state-of-the-art
       • Open Source innovating faster than commercial
       • Open data, Open standards, Open APIs
    4. Quick Cloud Computing Overview or the Obligatory “What is the Cloud Explanation”
       Infinite Improbability Drive
    5. Five Characteristics of Cloud
       1. On-Demand Self-Service
       2. Broad Network Access
       3. Resource Pooling
       4. Rapid Elasticity
       5. Measured Service
    6. Cloud Computing Service Models
       USER CLOUD a.k.a. SOFTWARE AS A SERVICE
       Single application, multi-tenancy, network-based, one-to-many delivery of applications, all users have same access to features.
       Examples: Salesforce.com, Google Docs, Red Hat Network/RHEL
       DEVELOPMENT CLOUD a.k.a. PLATFORM-AS-A-SERVICE
       Application developer model. Application deployed to an elastic service that autoscales, low administrative overhead. No concept of virtual machines or operating system. Code it and deploy it.
       Examples: VMware CloudFoundry, Google AppEngine, Windows Azure, Rackspace Sites, Red Hat OpenShift, Active State Stackato, Appfog
       SYSTEMS CLOUD a.k.a. INFRASTRUCTURE-AS-A-SERVICE
       Servers and storage are made available in a scalable way over a network.
       Examples: EC2, Rackspace CloudFiles, OpenStack, CloudStack, Eucalyptus, OpenNebula
    7. Deployment Models: Public, Private & Hybrid
    8. Building Open Source Clouds
    9. Cloud Architecture
    10. Hypervisors
        Open Source
        • Xen Project, Xen Cloud Platform (XCP)
        • KVM - Kernel-based Virtualization
        • VirtualBox* - Oracle supported Virtualization Solutions
        • OpenVZ* - Container-based, Similar to Solaris Containers or BSD Zones
        • LXC - User Space chrooted installs
        Proprietary
        • VMware
        • Citrix Xenserver (based
        • Microsoft Hyper-V
        • OracleVM (Based on OS Xen)
    11. Open Virtual Machine Formats
        Open Virtualization Format (OVF) is an open standard for packaging and distributing virtual appliances or, more generally, software to be run in virtual machines.
        Formats for hypervisors/cloud technologies:
        • Amazon - AMI
        • KVM - QCOW2
        • VMware - VMDK
        • Xen Project - IMG
        • VHD - Virtual Hard Disk - Hyper-V
    12. Sourcing Cloud Appliances
        Tool/Project: What you can do with them
        • Bitnami: BitNami provides free, ready to run environments for your favorite open source web applications and frameworks, including Drupal, Joomla!, Wordpress, PHP, Rails, Django and many more.
        • Boxgrinder: BoxGrinder is a set of projects that help you grind out appliances for multiple virtualization and Cloud providers.
        • Oz: Command-line tool that has the ability to create images for common Linux distributions to run on KVM.
        • SUSE Studio: SUSE Studio supports building and deploying directly to cloud services such as Amazon EC2.
    13. Scale-Up or Scale-Out
        Vertical Scaling (Scale-Up)
        Allocate additional resources to VMs, requires a reboot, no need for distributed app logic, single point of OS failure
        Horizontal Scaling (Scale-Out)
        Application needs logic to work in distributed fashion (e.g. HA-Proxy and Apache, Hadoop)
    14. Compute Clouds (IaaS)
        Project: Year Started; License; Virtualization Technologies
        • Apache CloudStack: 2008; Apache; Xenserver, Xen Cloud Platform, KVM, VMware (Hyper-V developing)
        • Eucalyptus: 2006; GPL; Xen, KVM, VMware (commercial version)
        • OpenNebula: 2005; Apache; Xen, KVM, VMware
        • OpenStack: 2010 (Developed by NASA by Anso Labs previously); Apache; VMware ESX and ESXi, Xen, Xen Cloud Platform, KVM, LXC, QEMU and Virtual Box
        Numerous companies are building cloud software on OpenStack including Nebula, Piston Inc., CloudScaling
    15. OpenStack - Ecosystem of Projects
        [Diagram labels]
        • Enterprise Message Queue based on Rabbit MQ (ESB)
        • Object Storage “Swift”
        • Image Service “Glance”
        • Compute “Nova”
        • Dashboard “Horizon”
        • KVM, VMware, Xen Cloud Platform
        • Ceph, Gluster
        • Advanced Cloud and Networking services accessing the Quantum API: Firewall Service, Gateway Service
        • Quantum Networking Fabric: REST API, Plugins, OpenvSwitch, Quantum Plug-ins
        • Identity Services “Keystone” API
        20+ Collective projects hosted at: https://launchpad.net/openstack
    16. Cloud APIs
        • jclouds
        • libcloud
        • deltacloud
        • fog
    17. Cloud Computing Storage
        Project: Description
        • Ceph: Distributed file storage system developed by DreamHost
        • GlusterFS: Scale Out NAS system aggregating storage over Ethernet or Infiniband
        • OpenStack Storage: Long-term object storage system
        • Riak CS: Riak CS is open source software designed to provide simple, available, distributed cloud storage at any scale. Riak CS is S3-API compatible and supports per-tenant reporting for billing and metering use cases.
        • Sheepdog: Distributed storage for KVM hypervisors
    18. Platform-as-a-Service (PaaS)
        Project: Year Started; Sponsors; Languages/Frameworks
        • CloudFoundry: 2011; VMware; Spring for Java, Ruby for Rails and Sinatra, node.js, Grails, Scala on Lift and more via partners (e.g. Python, PHP)
        • Cloudify: 2012; Gigaspaces; [Groovy for deployment recipes]
        • OpenShift **: 2011; Red Hat; Java, Ruby, PHP, Perl and Python
        • Stackato*: 2012; ActiveState; Java, Python, PHP, Ruby, Perl, Node.js, others
        • WSO2 Stratus: 2010; WSO2; Jboss, Java EE6
    19. Software Defined Networking (SDN)
    20. Overview of Software Defined Networking
        [Diagram: three layers]
        • Application Layer: Business Applications
        • Control Layer: SDN Control Software and Network Services, reached from the Application Layer through APIs
        • Infrastructure Layer: Network Devices, reached through the Control Data Plane Interface (e.g. OpenFlow)
    21. Cloud Promise, Reality and Networks
        Cloud Promise: Cloud Reality
        • Centralized Configuration and Automation: Without true virtualization, network devices must still be manually configured.
        • Instant Self-Service Provisioning: In a physical network, it could take a long time for a network engineer to provision new services.
        • Elasticity and Scalability: By horizontally scaling up the physical network, elasticity is lost.
        • Designed for Failure: Failover can be automated and physical network limitations can be alleviated.
        Source: Midokura
    22. OpenFlow
        OpenFlow enables networks to evolve, by giving a remote controller the power to modify the behavior of network devices, through a well-defined "forwarding instruction set". The growing OpenFlow ecosystem now includes routers, switches, virtual switches, and access points from a range of vendors.
        Image from http://www.openflow.org/documents/openflow-wp-latest.pdf
    23. Software Defined Networking (SDN)
        Project: Description
        • Floodlight: The Floodlight controller is an enterprise-class, Apache-licensed, Java-based OpenFlow Controller.
        • Indigo: Indigo is an open source project to support OpenFlow on a range of physical switches. By leveraging hardware features of Ethernet switch ASICs, Indigo supports high rates for high port counts, up to 48 10-gigabit ports. Multiple gigabit platforms with 10-gigabit uplinks are also supported.
        • Open Daylight: Linux Foundation Collaborative Project based on Cisco One Controller and
        • OpenStack Networking “Quantum”: Pluggable, scalable, API-driven network and IP management
        • Open vSwitch: Open vSwitch is an open source (ASL 2.0), multilayer virtual switch designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).
    24. Big Data
        Deep Thought
    25. 1 Billion Facebook Users - October 2012
        [Chart: Facebook users in millions, Dec-04 to Sep-12. Source: Benphoster.com]
    26. Twitter at 400M Tweets Per Day - June 2012
        [Chart: Tweets in millions, Jan-07 to May-12. Source: TheBigDataGroup.com]
    27. Big Data and Storage Infrastructure
        “Data is growing faster than storage capacity and computing power. Legacy systems hold organizations back; storage software must include multi-petabyte capacity, support potentially billions of objects, and provide application performance awareness and agile provisioning.”
        - Gartner, Big Data Challenges for the IT Infrastructure Team
    28. Big Data Landscape
        Source: BigDataGroup.com
    29. Open Source NoSQL Databases
        Name (Type): Description
        • Apache Cassandra (Wide Column Store/Families): API: many, Query Method: MapReduce, Written in: Java, Concurrency: eventually consistent, Misc: like "Big-Table on Amazon Dynamo alike", initiated by Facebook
        • CouchDB (Document Store): API: Memcached API+protocol (binary and ASCII), most languages, Protocol: Memcached REST interface for cluster conf + management, Written in: C/C++ + Erlang (clustering), Replication: Peer to Peer, fully consistent, Misc: Transparent topology changes during operation, provides memcached-compatible caching buckets
        • HBase (Wide Column Store/Families): API: Java / any writer, Protocol: any write call, Query Method: MapReduce Java / any exec, Replication: HDFS Replication, Written in: Java
        • Hypertable (Wide Column Store/Families): API: Thrift (Java, PHP, Perl, Python, Ruby, etc.), Protocol: Thrift, Query Method: HQL, native Thrift API, Replication: HDFS Replication, Concurrency: MVCC, Consistency Model: Fully consistent, Misc: High performance C++ implementation of Google's Bigtable.
        • MongoDB (Document Store): API: BSON, Protocol: C, Query Method: dynamic object-based language & MapReduce, Replication: Master Slave & Auto-Sharding, Written in: C++, Concurrency
        • Redis (Key Value / Tuple Store): API: Tons of languages, Written in: C, Concurrency: in memory and saves asynchronously to disk after a defined time. Append only mode available. Different kinds of fsync policies. Replication: Master / Slave, Misc: also lists, sets, sorted sets, hashes, queues.
        • Riak (Key Value / Tuple Store): API: JSON, Protocol: REST, Query Method: MapReduce term matching, Scaling: Multiple Masters, Written in: Erlang, Concurrency: eventually consistent (stronger than MVCC via Vector Clocks)
    30. MapReduce
        [Diagram: Problem Data goes to the Master Node, which distributes work to Worker Node 1, Worker Node 2 and Worker Node 3; the results are combined into Solution Data]
    31. Apache Hadoop
        Overview
        • Handles large amounts of data
        • Stores data in native format
        • Delivers linear scalability at low cost
        • Resilient in case of infrastructure failures
        • Transparent application scalability
        Facts
        • Apache top-level open source project
        • One framework for storage and compute
          - Scalable storage in Hadoop Distributed File System (HDFS)
          - Compute via the MapReduce distributed processing platform
        • Domain Specific Language (DSL) - Java
    32. 32. Hadoop Architecture
        •  Hadoop Common
        •  HDFS: Distributes & replicates data across machines
        •  MapReduce: Distributes & monitors tasks. Parallel programming. Handles large data blocks.
        •  Hive: Data warehouse that provides SQL interface. Ad hoc projection of data structure to unstructured data.
        •  Non-Relational DB: HBase. Column-oriented schema-less distributed DB modeled after Google's BigTable. Random real time read/write.
        •  Scripting: Pig. Platform for manipulating and analyzing large data sets. Scripting language for analysts.
        •  Machine Learning: Mahout. Machine learning libraries for recommendations, clustering, classifications and item sets.
        •  Management: Chukwa, ZooKeeper
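The HDFS entry says it distributes and replicates data across machines. The toy sketch below shows one way to picture that idea: cut a byte string into fixed-size blocks, then assign each block to several distinct nodes. The round-robin placement, the function names and the node names are assumptions made for illustration only; HDFS uses its own placement policy and far larger blocks.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    This is a toy policy, not the one HDFS actually implements.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)]
            for offset in range(replication)
        ]
    return placement


# Hypothetical cluster of four nodes holding a ten-byte "file".
blocks = split_into_blocks(b"abcdefghij", block_size=4)
layout = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], replication=2)
print(blocks)
print(layout)
```

Because every block lives on more than one node, losing a single node in this layout still leaves a copy of each block elsewhere, which is the resilience property the Hadoop overview slide refers to.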
    33. 33. Big Data Summary
        •  Quantity of machine-created data increasing drastically (examples: networked sensor data from mobile phones and GPS devices)
        •  Data manipulation moving from batched to real-time
        •  Cloud services giving everyone Big Data tools
        •  Consumer company speed and scale requirements driving efficiencies in Big Data storage and analytics
        •  New and broader number of data sources being meshed together
        •  Big Data apps make using Big Data faster and easier
    34. 34. Cloud Management Tools
    35. 35. Automation in the Cloud
        Figure labels: Meat Cloud, Cloud Operations
    36. 36. 4 Types of Management Tools
        •  Provisioning: Installation of operating systems and other software
        •  Configuration Management: Sets the parameters for servers, can specify installation parameters
        •  Orchestration/Automation: Automate tasks across systems
        •  Monitoring: Records errors and health of IT infrastructure
    37. 37. Management Toolchains
        Diagram labels: Configuration, Patching and Provisioning, Monitoring
        Toolchain (n): A set of tools where the output of one tool becomes the input of another tool
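The toolchain definition (the output of one tool becomes the input of another) maps naturally onto function composition. The sketch below chains three stand-in stages in Python. The stage functions, the dictionary fields and the host name `web01` are all hypothetical; real tools would exchange images, manifests or inventory files rather than dictionaries.

```python
from functools import reduce


def provision(hostname):
    """Stage 1: produce a bare server record for the given host name."""
    return {"host": hostname, "os": "linux", "packages": [], "checks": []}


def configure(server):
    """Stage 2: take the provisioned record and add a package to it."""
    return {**server, "packages": server["packages"] + ["httpd"]}


def monitor(server):
    """Stage 3: derive health checks from what the earlier stages produced."""
    package_checks = [f"{name}-running" for name in server["packages"]]
    return {**server, "checks": server["checks"] + ["ping"] + package_checks}


def run_toolchain(stages, initial):
    """Feed each stage's output into the next stage, left to right."""
    return reduce(lambda value, stage: stage(value), stages, initial)


result = run_toolchain([provision, configure, monitor], "web01")
print(result)
```

The monitoring stage never needs to be told which packages exist; it reads them from the configuration stage's output, which is the practical benefit of wiring tools into a chain.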
    38. 38. Provisioning
        Project: Installation Targets
        •  Apache Provisionr (incubating): Can provision 10s to 1000s of machines on various clouds.
        •  Cobbler: Distributed virtual infrastructure using koan (kickstart over a network, to PXE boot VMs) for Red Hat, OpenSUSE, Fedora, Debian, Ubuntu VMs
        •  Crowbar: Bare metal provisioning
        •  Juju: Public clouds (Amazon Web Services, HP Cloud), private OpenStack clouds, bare metal via MAAS.
        •  Salt Cloud: Tool to provision "salted" VMs that can then be updated by a central server via ZeroMQ
    39. 39. Configuration Management Tools
        Project: Year Started; Language; License; Client/Server
        •  Cfengine: 1993; C; Apache; Yes
        •  Chef: 2009; Ruby; Apache; Chef Solo - No, Chef Server - Yes
        •  Puppet: 2004; Ruby; GPL; Yes & standalone
        •  Salt: 2011; Python; Apache; Yes
    40. 40. Automation/Orchestration Tools
        Project: Description
        •  Ansible: Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately.
        •  Capistrano: Utility and framework for executing commands in parallel on multiple remote machines, via SSH. It uses a simple DSL that allows you to define tasks, which may be applied to machines in certain roles.
        •  RunDeck: Rundeck is an open-source process automation and command orchestration tool with a web console.
        •  Func: Func provides a two-way authenticated system for generically executing tasks, and integrates with Puppet and Cobbler.
        •  MCollective: The Marionette Collective, AKA MCollective, is a framework to build server orchestration or parallel job execution systems.
        •  Salt: Execute arbitrary shell commands or choose from dozens of pre-built modules of common (or complex) commands.
        •  Scalr: Provides scaling across multiple cloud computing platforms, integrates with Chef.
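Several of the tools above share one core pattern: run the same command in parallel on every machine that holds a given role. The Python sketch below imitates that pattern with a thread pool. It does not use any of the listed tools, and `fake_remote_run`, the inventory layout and the host names are placeholders invented here; a real setup would swap in an SSH client call for the stand-in runner.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_remote_run(host, command):
    """Stand-in for an SSH call; it only reports what it would have run."""
    return f"{host}: ran '{command}'"


def run_on_role(inventory, role, command, runner=fake_remote_run, max_workers=8):
    """Run `command` on every host in `role` in parallel.

    Returns a dict mapping each host name to that host's output.
    """
    hosts = inventory.get(role, [])
    if not hosts:
        return {}
    with ThreadPoolExecutor(max_workers=min(max_workers, len(hosts))) as pool:
        outputs = pool.map(lambda host: runner(host, command), hosts)
        return dict(zip(hosts, outputs))


# Hypothetical inventory: roles mapped to the hosts that carry them.
inventory = {"web": ["web01", "web02"], "db": ["db01"]}
results = run_on_role(inventory, "web", "uptime")
print(results)
```

Passing the runner as a parameter keeps the fan-out logic separate from the transport, so the same function can be exercised locally before it is pointed at real machines.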
    41. 41. Conceptual Automated Toolchain
        •  Bootstrapped Image: CloudStack, OpenStack
        •  Configuration: Puppet, Chef
        •  Start/Stop Services: RunDeck, Capistrano, MCollective
        •  Provision: Cobbler, SUSE Studio
        •  Monitoring: Nagios, Zenoss, Cacti
        •  Generate Images: SUSE Studio, BoxGrinder
    42. 42. Netflix Open Source ToolBag for AWS
        ASGARD, ASTYANAX, EDDA, EUREKA, PRIAM, SIMIAN ARMY
        http://netflix.github.com
    44. 44. Goodbye and thanks for all the fish!
    45. 45. Questions?
        Slides can be viewed and downloaded at: http://www.slideshare.net/socializedsoftware/
        Copyright Mark R. Hinkle, available under the CC BY-SA license, some rights reserved. 2012-2013
    46. 46. Contact Me
        Mark R. Hinkle, Senior Director, Open Source Solutions, Citrix Systems Inc., Open Source Enthusiast
        Professional: mark.hinkle@citrix.com
        Personal: mrhinkle@gmail.com
        Phone: 919.228.8049
        Personal: http://www.socializedsoftware.com
        Twitter: @mrhinkle
    47. 47. Appendix
    48. 48. Additional Resources
        •  DevOps Toolchains Group
        •  Software Defined Networking: The New Norm for Networks (Whitepaper)
        •  DevOps Wikipedia Page
        •  NoSQL-Database.org – Ultimate Guide to the Non-Relational Universe
        •  Open Cloud Initiative
        •  NIST Cloud Computing Platform
        •  Open Virtualization Format Specs
        •  Clouderati Twitter Account
        •  Planet DevOps
        •  Nicira Whitepaper – It's Time to Virtualize the Network
        •  Why Open vSwitch FAQ
    49. 49. Monitoring Tools
        Project: License; Type of Monitoring; Collection Methods
        •  Cacti / RRDTool: GPL; Performance; SNMP, syslog
        •  Graphite: Apache 2.0; Performance; Agent
        •  Nagios: GPL; Availability; SNMP, TCP, ICMP, IPMI, syslog
        •  Zabbix: GPL; Availability/Performance and more; SNMP, TCP/ICMP, IPMI, Synthetic Transactions
        •  Zenoss: GPL; Availability, Performance, Event Management; SNMP, ICMP, SSH, syslog, WMI
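The monitoring table lists TCP among the collection methods used for availability monitoring. A minimal sketch of such a probe in Python, using only the standard library, might look like the following. The function name `tcp_check` and the example host and port are assumptions for illustration, and none of the listed tools is involved.

```python
import socket


def tcp_check(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical target; replace with a service you are allowed to probe.
    status = "UP" if tcp_check("127.0.0.1", 22) else "DOWN"
    print(f"127.0.0.1:22 is {status}")
```

A successful connection only shows that something is listening on the port; performance monitoring, as the table distinguishes it, would also need timing or metric collection on top of a check like this.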