Great Wide Open: Crash Course Open Source Cloud Computing - 2014
 


Very few trends in IT have generated as much buzz as cloud computing. This session will cut through the hype and quickly clarify the ontology for cloud computing. The bulk of the conversation will focus on the open source software that can be used to build compute clouds (infrastructure-as-a-service) and the complementary open source management tools that can be combined to automate the management of cloud computing environments.

The session will appeal to anyone who has a good grasp of traditional data center infrastructure but is struggling with the benefits and migration path to a cloud computing environment. Systems administrators and IT generalists will leave the discussion with a general overview of the options at their disposal to effectively build and manage their own cloud computing environments using free and open source software.

Usage Rights

CC Attribution-NonCommercial-ShareAlike License

  • Private cloud: The cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on premise or off premise.
    Public cloud: The cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
    Hybrid cloud: The cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
  • Top choices for cloud computing are Xen and KVM. OpenVZ, container virtualization for Linux, is an interesting option because it adds very little overhead when scaling application space, similar to containers such as BSD jails. An advantage is that memory allocation is soft, so unutilized memory can be used by other applications.
  • OVF: OVF is a packaging format for software appliances. An OVF package consists of several files placed in one directory. A one-file alternative is the OVA package, which is a TAR file with the OVF directory inside. From a technical point of view, an OVF is a transport mechanism for virtual machine templates. One OVF may contain a single VM or many VMs (it is left to the software appliance developer to decide which arrangement best suits their application). OVFs must be installed before they can be run; a particular virtualization platform may run the VM directly from the OVF, but this is not required. If this is done, the OVF itself can no longer be viewed as a “golden image” version of the appliance, since run-time state for the virtual machine(s) will pervade the OVF. Moreover, the digital signature that allows the platform to check the integrity of the OVF will be invalid.
    Amazon AMI: An Amazon Machine Image (AMI) is a special type of virtual appliance which is used to instantiate (create) a virtual machine within the Amazon Elastic Compute Cloud. It serves as the basic unit of deployment for services delivered using EC2. Like all virtual appliances, the main component of an AMI is a read-only filesystem image which includes an operating system (e.g., Linux, UNIX, or Windows) and any additional software required to deliver a service or a portion of it. The AMI filesystem is compressed, encrypted, signed, split into a series of 10 MB chunks and uploaded into Amazon S3 for storage. An XML manifest file stores information about the AMI, including name, version, architecture, default kernel id, decryption key and digests for all of the filesystem chunks. An AMI does not include a kernel image, only a pointer to the default kernel id, which can be chosen from an approved list of safe kernels maintained by Amazon and its partners (e.g., Red Hat, Canonical, Microsoft). Users may choose kernels other than the default when booting an AMI.
    QCOW2 – QEMU “Copy on Write” Version 2: qcow stands for "QEMU Copy On Write" and denotes a disk storage optimization strategy that delays allocation of storage until it is actually needed. QEMU is an emulator and virtual machine container, and it can use a variety of virtual disk images, which are generally associated with specific guest operating systems. qcow2 is a newer version of the qcow format. QEMU can use a base image which is read-only and store all writes to the qcow2 image. Among the QEMU supported formats, this is the most versatile. Features include smaller images (useful if the filesystem does not support holes, for example on FAT32), optional AES encryption, zlib-based compression and support for multiple VM snapshots. QEMU and Xen have retained the qcow format for backwards compatibility. Users can easily convert qcow disk images to the qcow2 format.
    VMDK – Virtual Machine Disk: VMDK is a file format used for virtual appliances developed for VMware products. The format is a container for virtual hard disk drives to be used in virtual machines like VMware Workstation or VirtualBox. VMDK is an open format.
    IMG: The IMG file extension is used by files which are standardized raw dumps of a disk, and by files in various formats created by different imaging programs. Xen can use raw disk images and physical disks as filesystems for a Xen-based domainU. Another option is to use the disk images used by QEMU.
    VHD – Virtual Hard Disk: The Virtual Hard Disk format was started by Connectix (now part of Microsoft) and made open through the Microsoft Open Specification Promise. VHDs are implemented as files that reside on the native host file system. The following types of VHD formats are supported by Microsoft Virtual PC and Virtual Server:
    • Fixed hard disk image: a file that is allocated to the size of the virtual disk. Fixed VHDs consist of a raw disk image followed by a VHD footer (512 or formerly 511 bytes).
    • Dynamic hard disk image: a file that at any given time is as large as the actual data written to it, plus the size of the header and footer. Dynamic and differencing VHDs begin with a copy of the VHD footer (padded to 512 bytes), and for dynamic or differencing VHDs created by Microsoft products this results in the VHD cookie string conectix at the beginning of the VHD file.
    • Differencing hard disk image: a set of modified blocks (maintained in a separate file referred to as the "child image") in comparison to a parent image. The differencing hard disk image format allows the concept of Undo Changes: when enabled, all changes to a hard drive contained within a VHD (the parent image) are stored in a separate file (the child image). Options are available to undo the changes to the VHD, or to merge them permanently into the VHD. Different child images based on the same parent image also allow "cloning" of VHDs; at least the globally unique identifier (GUID) must be different.
    • Linked to a hard disk: a file which contains a link to a physical hard drive or a partition of a physical hard drive.
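The VHD layout described above (a footer starting with the cookie string conectix, placed at the end of fixed images and also copied to the start of dynamic and differencing images) can be checked with a few lines of Python. This is only a rough sketch: the function name `classify_vhd` and its return labels are made up for illustration, it assumes the current 512-byte footer, and it does not validate checksums or any other footer field.

```python
import io

VHD_COOKIE = b"conectix"  # cookie string that opens a VHD footer
FOOTER_SIZE = 512         # assumption: current footer size (older files may use 511)


def classify_vhd(stream):
    """Guess the VHD flavour of a seekable binary stream.

    Returns 'dynamic-or-differencing', 'fixed', or None when no cookie is found.
    """
    stream.seek(0)
    if stream.read(len(VHD_COOKIE)) == VHD_COOKIE:
        # Dynamic and differencing images begin with a copy of the footer.
        return "dynamic-or-differencing"

    size = stream.seek(0, io.SEEK_END)
    if size >= FOOTER_SIZE:
        stream.seek(size - FOOTER_SIZE)
        if stream.read(len(VHD_COOKIE)) == VHD_COOKIE:
            # Fixed images are raw disk data followed by the footer.
            return "fixed"
    return None


# Usage with an in-memory stand-in for a fixed image (1 KiB of data + footer).
fake_fixed = io.BytesIO(b"\x00" * 1024 + VHD_COOKIE + b"\x00" * 504)
print(classify_vhd(fake_fixed))
```

With a real file, the same function can be called on `open(path, "rb")`.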
  • Common use cases for Docker include:
    • Automating the packaging and deployment of applications
    • Creation of lightweight, private PaaS environments
    • Automated testing and continuous integration/deployment
    • Deploying and scaling web apps, databases and backend services
  • Source: http://www.slideshare.net/jpetazzo/presentations
  • OpenStack Shared Services - https://www.openstack.org/software/openstack-shared-services/
    Identity Service: OpenStack Identity provides a central directory of users mapped to the OpenStack services they can access. It acts as a common authentication system across the cloud operating system and can integrate with existing backend directory services like LDAP. It supports multiple forms of authentication including standard username and password credentials, token-based systems and AWS-style logins.
    Image Service: The OpenStack Image Service provides discovery, registration and delivery services for disk and server images. The ability to copy or snapshot a server image and immediately store it away is a powerful capability of the OpenStack cloud operating system. Stored images can be used as templates to get new servers up and running more quickly and consistently than installing a server operating system and individually configuring additional services, particularly when provisioning multiple servers. The service can also be used to store and catalog an unlimited number of backups.
    Telemetry Service: The OpenStack Telemetry service aggregates usage and performance data across the services deployed in an OpenStack cloud. This capability provides visibility and insight into the usage of the cloud across dozens of data points and allows cloud operators to view metrics globally or by individual deployed resources.
    Orchestration Service: OpenStack Orchestration is a template-driven engine that allows application developers to describe and automate the deployment of infrastructure. The flexible template language can specify compute, storage and networking configurations as well as detailed post-deployment activity to automate the full provisioning of infrastructure as well as services and applications. Through integration with the Telemetry service, the Orchestration engine can also perform auto-scaling of certain infrastructure elements.
  • Canonical Ubuntu OpenStack - http://www.ubuntu.com/cloud/tools/openstack
    CloudScaling – Elastic Cloud Infrastructure - http://www.cloudscaling.com/
    Elastic Cloud Infrastructure – built on OpenStack – enables any IT group to deploy cloud services comparable to the capabilities of the world’s largest and most successful public clouds. Cloudscaling solutions allow your organization to rapidly scale resources, achieve new levels of agility and improve market responsiveness. All with full control and governance in the privacy of your on-premise data center.
    HP Cloud OS - http://www8.hp.com/us/en/business-solutions/solution.html?compURI=1421776#.UzoD3K1dVDo
    Based on OpenStack technology, HP Cloud OS provides the foundation for the HP Cloud common architecture across private, public, and hybrid cloud delivery.
    Piston Cloud Computing - http://www.pistoncloud.com/openstack-cloud-software/
    Piston OpenStack is a software product that uses advanced systems intelligence to orchestrate an entire private cloud environment using commodity hardware. Starting with an extremely lightweight custom Linux OS called Iocane Micro-OS™, and using an advanced high-availability system called Moxie Runtime Environment™, Piston keeps your cloud running no matter what – through hardware failure, operator error, upgrades, and power outages.
    Red Hat Distribution of OpenStack - http://openstack.redhat.com/Main_Page
    RDO is a community of people using and deploying OpenStack on Red Hat Enterprise Linux, Fedora and distributions derived from these (such as CentOS, Scientific Linux and others). We have documentation to help get started, forums where you can connect with other users, and community-supported packages of the most up-to-date OpenStack releases available for download.
    Rackspace Private Cloud powered by OpenStack - http://www.rackspace.com/cloud/private/
  • Types of tasks accomplished by an API:
    • Provisioning (creating, re-creating, moving, or deleting components, e.g. virtual machines, VLANs)
    • Configuration (assigning or changing attributes of the architecture such as security and network settings)
    Cloud Providers:
    • Dasein
    • jclouds – Java API abstraction
    • Libcloud – started by CloudKick (now Rackspace) to abstract clouds; now an Apache project that graduated from the Apache Incubator
    • Deltacloud – started by Red Hat to abstract clouds; now an Apache project that graduated from the Apache Incubator
    • Fog – provider and abstraction level API across compute and storage, written in Ruby
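To make the idea of a cloud abstraction API concrete, here is a minimal sketch of a provider-neutral interface with one provisioning call and one configuration call. Everything in it (`CloudDriver`, `InMemoryDriver`, `provision_web_server`, the `web` security group) is hypothetical and invented for illustration; it is not the API of Libcloud, jclouds, Deltacloud, Fog or Dasein, and the "provider" is just an in-memory dictionary.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    node_id: str
    name: str
    security_groups: list = field(default_factory=list)


class CloudDriver(ABC):
    """Hypothetical provider-neutral interface (not a real library API)."""

    @abstractmethod
    def create_node(self, name):
        """Provisioning: create a virtual machine."""

    @abstractmethod
    def destroy_node(self, node_id):
        """Provisioning: delete a virtual machine."""

    @abstractmethod
    def set_security_groups(self, node_id, groups):
        """Configuration: change security attributes of a machine."""


class InMemoryDriver(CloudDriver):
    """Stand-in for a real provider; keeps nodes in a dict."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.nodes = {}
        self._ids = count(1)

    def create_node(self, name):
        node = Node(f"{self.prefix}-{next(self._ids)}", name)
        self.nodes[node.node_id] = node
        return node

    def destroy_node(self, node_id):
        return self.nodes.pop(node_id, None) is not None

    def set_security_groups(self, node_id, groups):
        self.nodes[node_id].security_groups = list(groups)
        return self.nodes[node_id]


def provision_web_server(driver, name):
    """Caller code depends only on the interface, never on the provider."""
    node = driver.create_node(name)                    # provisioning task
    driver.set_security_groups(node.node_id, ["web"])  # configuration task
    return node


for driver in (InMemoryDriver("cloud-a"), InMemoryDriver("cloud-b")):
    print(provision_web_server(driver, "www1"))
```

The design point is that `provision_web_server` runs unchanged against either driver, which is the portability these libraries aim for.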
  • Software Defined Networking (SDN) is an emerging network architecture where network control is decoupled from forwarding and is directly programmable. This migration of control, formerly tightly bound in individual network devices, into accessible computing devices enables the underlying infrastructure to be abstracted for applications and network services, which can treat the network as a logical or virtual entity. This figure depicts a logical view of the SDN architecture. Network intelligence is (logically) centralized in software-based SDN controllers, which maintain a global view of the network. As a result, the network appears to the applications and policy engines as a single, logical switch. With SDN, enterprises and carriers gain vendor-independent control over the entire network from a single logical point, which greatly simplifies the network design and operation. SDN also greatly simplifies the network devices themselves, since they no longer need to understand and process thousands of protocol standards but merely accept instructions from the SDN controllers.
  • OpenFlow: OpenFlow is an open standard that enables researchers to run experimental protocols in the campus networks we use every day. OpenFlow is added as a feature to commercial Ethernet switches, routers and wireless access points, and provides a standardized hook to allow researchers to run experiments without requiring vendors to expose the internal workings of their network devices. OpenFlow is currently being implemented by major vendors, with OpenFlow-enabled switches now commercially available.
    In a classical router or switch, the fast packet forwarding (data path) and the high level routing decisions (control path) occur on the same device. An OpenFlow Switch separates these two functions. The data path portion still resides on the switch, while high-level routing decisions are moved to a separate controller, typically a standard server. The OpenFlow Switch and Controller communicate via the OpenFlow protocol, which defines messages such as packet-received, send-packet-out, modify-forwarding-table, and get-stats.
    The data path of an OpenFlow Switch presents a clean flow table abstraction; each flow table entry contains a set of packet fields to match, and an action (such as send-out-port, modify-field, or drop). When an OpenFlow Switch receives a packet it has never seen before, for which it has no matching flow entries, it sends this packet to the controller. The controller then makes a decision on how to handle this packet. It can drop the packet, or it can add a flow entry directing the switch on how to forward similar packets in the future.
    OpenFlow is the first standard communications interface defined between the control and forwarding layers of an SDN architecture. OpenFlow allows direct access to and manipulation of the forwarding plane of network devices such as switches and routers, both physical and virtual (hypervisor-based).
    It is the absence of an open interface to the forwarding plane that has led to the characterization of today’s networking devices as monolithic, closed, and mainframe-like. No other standard protocol does what OpenFlow does, and a protocol like OpenFlow is needed to move network control out of the networking switches to logically centralized control software.
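The switch and controller split described above can be sketched as a toy flow table. This is a simplified model rather than the OpenFlow protocol itself: the class `FlowTable`, the function `simple_controller`, the field name `eth_dst` and the action strings are illustrative assumptions, and real flow entries also carry priorities, counters and timeouts.

```python
class FlowTable:
    """Toy data path: match packets against entries, ask the controller on a miss."""

    def __init__(self, controller):
        self.entries = []            # list of (match fields, action) pairs
        self.controller = controller

    def handle(self, packet):
        # Fast path: the first entry whose fields all match decides the action.
        for match, action in self.entries:
            if all(packet.get(name) == value for name, value in match.items()):
                return action

        # Table miss: hand the packet to the controller for a decision.
        decision = self.controller(packet)
        if decision is None:
            return "drop"
        match, action = decision
        self.entries.append((match, action))  # similar packets now stay on the switch
        return action


def simple_controller(packet):
    """Hypothetical control logic: a static map from destination to port."""
    ports = {"aa:aa": 1, "bb:bb": 2}
    port = ports.get(packet["eth_dst"])
    if port is None:
        return None  # tell the switch to drop the packet
    return {"eth_dst": packet["eth_dst"]}, f"output:{port}"


table = FlowTable(simple_controller)
print(table.handle({"eth_dst": "aa:aa"}))  # miss, controller installs an entry
print(table.handle({"eth_dst": "aa:aa"}))  # hit, handled by the flow table
print(table.handle({"eth_dst": "zz:zz"}))  # unknown destination, dropped
```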
  • Why Open vSwitch - http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob_plain;f=WHY-OVS;hb=HEAD
    Hypervisors need the ability to bridge traffic between VMs and with the outside world. On Linux-based hypervisors, this used to mean using the built-in L2 switch (the Linux bridge), which is fast and reliable. So, it is reasonable to ask why Open vSwitch is used.
    The answer is that Open vSwitch is targeted at multi-server virtualization deployments, a landscape for which the previous stack is not well suited. These environments are often characterized by highly dynamic end-points, the maintenance of logical abstractions, and (sometimes) integration with or offloading to special purpose switching hardware.
  • Floodlight - http://floodlight.openflowhub.org/
    The Floodlight controller is an enterprise-class, Apache-licensed, Java-based OpenFlow controller. It is supported by a community of developers including a number of engineers from Big Switch Networks. OpenFlow is an open standard managed by the Open Networking Foundation (ONF). It specifies a protocol through which a remote controller can modify the behavior of networking devices through a well-defined “forwarding instruction set”. Floodlight is designed to work with the growing number of switches, routers, virtual switches, and access points that support the OpenFlow standard.
    OpenDaylight – http://www.opendaylight.com
    The adoption of new technologies and pursuit of programmable networks has the potential to significantly improve levels of functionality, flexibility and adaptability of mainstream datacenter architectures. To leverage this abstraction to its fullest requires the network to adapt and evolve to a software-defined architecture. One of the architectural elements required to achieve this goal is a Software-Defined Networking (SDN) platform that enables network control and programmability.
    OpenStack Networking “Quantum” – https://www.openstack.org/software/openstack-networking/
    OpenStack Networking is a pluggable, scalable and API-driven system for managing networks and IP addresses. Like other aspects of the cloud operating system, it can be used by administrators and users to increase the value of existing datacenter assets. OpenStack Networking ensures the network will not be the bottleneck or limiting factor in a cloud deployment and gives users real self service, even over their network configurations.
    Networking capabilities:
    • OpenStack provides flexible networking models to suit the needs of different applications or user groups. Standard models include flat networks or VLANs for separation of servers and traffic.
    • OpenStack Networking manages IP addresses, allowing for dedicated static IPs or DHCP. Floating IPs allow traffic to be dynamically rerouted to any of your compute resources, which allows you to redirect traffic during maintenance or in the case of failure.
    • Users can create their own networks, control traffic and connect servers and devices to one or more networks.
    • The pluggable backend architecture lets users take advantage of commodity gear or advanced networking services from supported vendors.
    • Administrators can take advantage of software-defined networking (SDN) technology like OpenFlow to allow for high levels of multi-tenancy and massive scale.
    • OpenStack Networking has an extension framework allowing additional network services, such as intrusion detection systems (IDS), load balancing, firewalls and virtual private networks (VPN), to be deployed and managed.
    Open vSwitch
    Open vSwitch is a production quality, multilayer virtual switch licensed under the open source Apache 2.0 license. It is designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag). In addition, it is designed to support distribution across multiple physical servers, similar to VMware's vNetwork distributed vswitch or Cisco's Nexus 1000V.
  • Big data: the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."
  • NoSQL: In computing, NoSQL (commonly interpreted as "not only SQL") is a broad class of database management systems identified by non-adherence to the widely used relational database management system model. NoSQL databases are not built primarily on tables, and generally do not use SQL for data manipulation. NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key–value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
    Apache Cassandra: The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages. Cassandra's ColumnFamily data model offers the convenience of column indexes with the performance of log-structured updates, strong support for materialized views, and powerful built-in caching. Cassandra is in use at Netflix, Twitter, Urban Airship, Constant Contact, Reddit, Cisco, OpenX, Digg, CloudKick, Ooyala, and more companies that have large, active data sets. The largest known Cassandra cluster has over 300 TB of data in over 400 machines.
    Hypertable: Hypertable is based on a design developed by Google (i.e. it is a BigTable clone) to meet their scalability requirements, and claims to solve the scale problem better than the other NoSQL solutions.
    MongoDB: MongoDB (from "humongous") is a cross-platform document-oriented database system.
    Redis: Redis is an open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
    Riak: Riak is known for its ability to distribute data across nodes using consistent hashing in a simple key/value scheme in namespaces called buckets.
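Consistent hashing, mentioned above for Riak, can be illustrated with a generic hash ring. This is a textbook-style sketch and not Riak's actual implementation: the class `HashRing`, the `vnodes` parameter, the MD5 hash and the node names are all assumptions chosen for the example.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a position on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Generic consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        ring = []
        for node in nodes:
            # Each physical node owns several points on the ring to even out load.
            for i in range(vnodes):
                ring.append((_hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._positions = [position for position, _ in ring]

    def node_for(self, bucket, key):
        """Return the node owning bucket/key: the first ring point clockwise."""
        position = _hash(f"{bucket}/{key}")
        index = bisect.bisect(self._positions, position) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["riak1", "riak2", "riak3"])
print(ring.node_for("users", "alice"))
```

The property that makes this useful for a distributed store is that removing a node only moves the keys that node owned; keys on the remaining nodes keep their placement.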
  • MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster. A MapReduce program is composed of a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). The "MapReduce System" (also called "infrastructure" or "framework") orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance.
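The student example above can be written out as a single-process sketch. It only shows the shape of the Map and Reduce steps; the function names and the sample data are invented for illustration, and a real MapReduce framework would run these phases in parallel across a cluster and handle failures.

```python
from collections import defaultdict


def map_phase(students):
    """Map: emit a (first name, full name) pair for every student."""
    for full_name in students:
        first_name = full_name.split()[0]
        yield first_name, full_name


def shuffle(pairs):
    """Group emitted pairs by key: one queue per first name."""
    queues = defaultdict(list)
    for key, value in pairs:
        queues[key].append(value)
    return queues


def reduce_phase(queues):
    """Reduce: summarize each queue as a count, yielding name frequencies."""
    return {name: len(members) for name, members in queues.items()}


students = ["Ada Lovelace", "Alan Turing", "Ada Yonath", "Grace Hopper", "Alan Kay"]
frequencies = reduce_phase(shuffle(map_phase(students)))
print(frequencies)  # {'Ada': 2, 'Alan': 2, 'Grace': 1}
```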
  • Cacti: Cacti is a complete network graphing solution designed to harness the power of RRDtool's data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy to use interface that makes sense for LAN-sized installations up to complex networks with hundreds of devices.
    RRDtool: RRDtool is the open source industry standard, high performance data logging and graphing system for time series data. RRDtool can be easily integrated in shell scripts, Perl, Python, Ruby, Lua or Tcl applications.
    Graphite: Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing, and send it to Graphite's processing backend, carbon, which stores the data in Graphite's specialized database. The data can then be visualized through Graphite's web interfaces.
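As a rough illustration of "send it to carbon", the sketch below formats a data point for carbon's plaintext listener, which accepts lines of the form `<metric path> <value> <timestamp>`. The helper names `format_metric` and `send_metric`, the metric path, and the host are assumptions for the example; port 2003 is carbon's usual plaintext port but may differ in a given installation.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext line: '<metric path> <value> <unix timestamp>'."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Send a single data point to a carbon plaintext listener (assumed reachable)."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Formatting alone needs no running Graphite instance.
print(format_metric("servers.web01.load", 0.42, 1400000000), end="")
```

Calling `send_metric("servers.web01.load", 0.42)` would then deliver the point, provided a carbon daemon is listening on the given host and port.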
  • These tools are all appropriate for Linux guest operating systems; Windows operating system provisioning is not well addressed in OSS.
    Axemblr Provisionr: Provisionr solves the problem of cloud portability by completely hiding the APIs and focusing only on building a cluster that matches the same set of assumptions on all clouds, assumptions like a specific OS, pre-installed packages and binaries, sane DNS settings, SSH and VPN access, etc. Think of it as a solid foundation for configuration. As a secondary goal, Provisionr will also provide primitives for building automatic or semi-automatic workflows for configuring and monitoring services, workflows that assume that all the machines share a common set of characteristics as described above.
    Cobbler: Cobbler is a Linux installation server that allows for rapid setup of network installation environments. It glues together and automates many associated Linux tasks so you do not have to hop between lots of various commands and applications when rolling out new systems and, in some cases, changing existing ones. With a simple series of commands, network installs can be configured for PXE, reinstallations, media-based net-installs, and virtualized installs (supporting Xen, QEMU, KVM, and some variants of VMware). Cobbler uses a helper program called 'koan' (which interacts with Cobbler) for reinstallation and virtualization support.
    Crowbar: Bare metal provisioning for OpenStack developed by Dell using Opscode Chef.
    Juju
    Metal as a Service (MAAS): MAAS offers a nice UI to provision your Ubuntu servers. Each physical server (“node”) will be commissioned automatically on first boot. During the commissioning process administrators are able to configure hardware settings manually before an automated smoke test and burn-in test are done. Once commissioned, a node can be deployed on demand by name, or allocated to a queue for dynamic allocation to services being deployed on this MAAS.
    Salt Cloud: Salt Cloud is a tool for provisioning salted minions across various cloud providers. Currently supported providers are:
    • Amazon EC2
    • GoGrid
    • HP Cloud (using OpenStack)
    • Joyent
    • Linode
    • OpenStack
    • Rackspace (using OpenStack)
    The salt-cloud command can be used to query configured providers, create VMs on them, deploy salt-minion on those VMs and destroy them when no longer needed. Salt Cloud requires Salt to be installed, but does not require any Salt daemons to be running. However, if used in a salted environment, it is best to run Salt Cloud on the salt-master, so that it can properly lay down salt keys when it deploys machines, and then properly remove them later. If Salt Cloud is run in this manner, minions will automatically be approved by the master, with no need to manually authenticate them later.
    Deprecated:
    Spacewalk: Spacewalk manages software content updates for Red Hat derived distributions such as Fedora, CentOS, and Scientific Linux, within your firewall. You can stage software content through different environments, managing the deployment of updates to systems and allowing you to view the update level of any given system across your deployment. A clean central web interface allows viewing of systems and their software update status, and initiating update actions.
  • Salt - https://github.com/saltstack/salt
  • Ansible: Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately. Ansible is also used to roll out and manage clusters of machines and ISV software, such as Basho's flagship key-value store Riak.
    Capistrano: Capistrano is a developer tool for deploying web applications. It is typically installed on a workstation and used to deploy code from your source code management (SCM) system to one or more servers. Capistrano recently added classes capabilities that match Cobbler.
    RunDeck: RunDeck is cross-platform open source software that helps you automate ad-hoc and routine procedures in data center or cloud environments. RunDeck allows you to run tasks on any number of nodes from a web-based or command-line interface. RunDeck also includes other features that make it easy to scale up your scripting efforts, including access control, workflow building, scheduling, logging, and integration with external sources for node and option data.
    Func: Func allows for running commands on remote systems in a secure way, like SSH, but offers several improvements:
    • Func allows you to manage an arbitrary group of machines all at once.
    • Func automatically distributes certificates to all "slave" machines. There's almost nothing to configure.
    • Func comes with a command line for sending remote commands and gathering data.
    • There are lots of modules already provided for common tasks.
    • Anyone can write their own modules using the simple Python module API.
    • Everything that can be done with the command line can be done with the Python client API. The hack potential is unlimited.
    • You'll never have to use "expect" or other ugly hacks to automate your workflow.
    • It's really simple under the covers. Func works over XMLRPC and SSL.
    • Since Func uses certmaster, any program can use Func certificates, latch on to them, and take advantage of secure master-to-slave communication.
    • There are no databases or crazy stuff to install and configure. Again, certificate distribution is automatic too.
    MCollective: The Marionette Collective, AKA MCollective, is a framework to build server orchestration or parallel job execution systems. MCollective is used as a means of programmatic execution of systems administration actions on clusters of servers. MCollective uses modern tools like publish/subscribe middleware and modern philosophies like real-time discovery of network resources using metadata rather than hostnames, delivering a very scalable and very fast parallel execution environment.
    Scalr: Scalr is a pretty darn good open source cloud management tool. It provides both an automation framework (do Foo when Bar) and a web interface (where is this volume mounted) for managing infrastructure on the cloud, like EC2.
    Features:
    • Integrated into Opscode Chef, for configuration management
    • Pre-automated software, such as nginx, MySQL, Redis, MongoDB, and RabbitMQ
    • Blazing fast UI
    • Multi-cloud
    • More at http://scalr.net/features/
    Roadmap:
    • http://wiki.scalr.net/Roadmap
  • NetFlix AWS Toolbag – http://netflix.github.com
    Over 25 projects developed by NetFlix to manage their cloud deployments.
    Asgard: Asgard is a web-based tool for managing cloud-based applications and infrastructure.
    Astyanax: Astyanax is a high level Java client for Apache Cassandra. Apache Cassandra is a highly available column oriented database.
    Edda: Edda is a service to track changes in your cloud deployments.
    Eureka: Eureka is a REST (Representational State Transfer) based service that is primarily used in the AWS cloud for locating services for the purpose of load balancing and failover of middle-tier servers. At Netflix, Eureka is used for the following purposes apart from playing a critical part in mid-tier load balancing:
    - For aiding Netflix Asgard, an open source service which makes cloud deployments easier, in fast rollback of versions in case of problems (avoiding the re-launch of 100's of instances, which could take a long time) and in rolling pushes (avoiding propagation of a new version to all instances in case of problems).
    - For our Cassandra deployments, to take instances out of traffic for maintenance.
    - For our memcached caching services, to identify the list of nodes in the ring.
    Priam: Priam is a process/tool that runs alongside Apache Cassandra to automate the following:
    - Backup and recovery (complete and incremental)
    - Token management
    - Seed discovery
    - Configuration
    - Support for the AWS environment
    Simian Army: The Simian Army is a suite of tools for keeping your cloud operating in top form. Chaos Monkey, the first member, is a resiliency tool that helps ensure that your applications can tolerate random instance failures.

Great Wide Open: Crash Course Open Source Cloud Computing - 2014 Presentation Transcript

  • Mark Hinkle Senior Director, Open Source Solutions Citrix Inc. mark.hinkle@citrix.com mrhinkle@gmail.com @mrhinkle Crash Course Open Source Cloud Computing
  • By Mark R. Hinkle @mrhinkle mrhinkle@gmail.com Crash Course in Open Source Cloud Computing ABOUT ME I Help Build Open Source Ecosystems Open Source Experience • Manage Citrix Open Source Business Office • Apache CloudStack Committer • Advisory boards Gluster and Xen Project • Joined Citrix via Cloud.com acquisition July 2011 • Zenoss Core open source project to 100,000 users, 1.5 million downloads • Former LinuxWorld Magazine Editor-in-Chief • Open Management Consortium organizer • Author - “Windows to Linux Business Desktop Migration” – Thomson • NetDirector Project - Open Source Configuration Management
  • Slides Available on Slideshare: http://www.slideshare.net/socializedsoftware
    Creative Commons Attribution-ShareAlike 4.0 International
    - Share: copy and redistribute the material in any medium or format.
    - Adapt: remix, transform, and build upon the material for any purpose, even commercially. The licensor cannot revoke these freedoms as long as you follow the license terms.
    - Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
    - ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
  • VETTING OPEN SOURCE PROJECTS How can you tell if they’re legit • Code Velocity • Committers • Committer Reputation • User-driven or Vendor-Driven Innovation • User Activity • Corporate Support* • Reputation of Foundation*
  • OPEN SOURCE ANALYSIS Visualizing Community Activity http://www.ohloh.net http://activity.openstack.org
  • 60 SECOND CLOUD DEFINITION
    5 CHARACTERISTICS OF CLOUD: 1. On-Demand Self-Service 2. Broad Network Access 3. Resource Pooling 4. Rapid Elasticity 5. Measured Service
    - User Cloud a.k.a. SOFTWARE-AS-A-SERVICE
    - Developer Cloud a.k.a. PLATFORM-AS-A-SERVICE
    - Systems Cloud a.k.a. INFRASTRUCTURE-AS-A-SERVICE
  • Elasticity and the cloud: SCALE-UP vs. SCALE-OUT
    - Vertical Scaling (Scale-Up): Allocate additional resources to VMs; requires a reboot; no need for distributed app logic; single point of OS failure
    - Horizontal Scaling (Scale-Out): Application needs logic to work in distributed fashion (e.g. HA-Proxy and Apache Hadoop)
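The "logic to work in distributed fashion" that scale-out requires can be as small as a dispatcher that spreads requests over a growing pool of identical backends. The sketch below is only an illustration of that idea in Python; `RoundRobinPool` and the backend names are hypothetical and are not taken from the slides or from HAProxy.

```python
class RoundRobinPool:
    """Toy round-robin dispatcher (hypothetical, for illustration only)."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = 0  # index of the next request, used to pick a backend

    def add(self, backend):
        """Scale out: add one more backend without touching the others."""
        self.backends.append(backend)

    def route(self, request):
        """Return the backend that should handle this request."""
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return backend


if __name__ == "__main__":
    pool = RoundRobinPool(["web1", "web2"])
    print([pool.route(i) for i in range(4)])
    pool.add("web3")  # capacity grows by adding a node, no reboot needed
    print([pool.route(i) for i in range(3)])
```

Scaling up, by contrast, would leave the pool at a single backend and resize that one machine.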
  • VIRTUALIZATION Carving up compute resources
    - OPEN SOURCE: Xen Project, Citrix XenServer, KVM, VirtualBox, OpenVZ, LXC
    - PROPRIETARY: VMware, Microsoft Hyper-V, OracleVM (Based on Xen Project)
  • OPEN VIRTUALIZATION FORMATS Virtualization Payloads
    Open Virtualization Format (OVF) is an open standard for packaging and distributing virtual appliances or, more generally, software to be run in virtual machines.
    Formats for hypervisors/cloud technologies:
    - Amazon – AMI
    - KVM – QCOW2
    - VMware – VMDK
    - Xen Project – IMG
    - Hyper-V – VHD (Virtual Hard Disk)
    - LXC – local file system/mount point – Docker*
  • SOURCING CLOUD APPLIANCES Packaging Engines for VMs (Tool/Project: What you can do with them)
    - Bitnami: BitNami provides free, ready to run environments for your favorite open source web applications and frameworks, including Drupal, Joomla!, Wordpress, PHP, Rails, Django and many more.
    - Boxgrinder: BoxGrinder is a set of projects that help you grind out appliances for multiple virtualization and Cloud providers.
    - Oz: Command-line tool that has the ability to create images for common Linux distributions to run on KVM.
    - SUSE Studio: SUSE Studio supports building and deploying directly to cloud services such as Amazon EC2.
  • LINUX CONTAINERS (LXC) “Lightweight” Linux Virtualization • Lets you run a Linux system within another Linux system • A container is a group of processes on a Linux box, put together to provide an isolated environment • From the inside, it looks like a VM • Externally it looks like normal processes • “chroot on steroids”
  • LXC VS. VMs Containers compared to Hardware Virtualization Source: http://www.slideshare.net/jpetazzo/presentations
  • DOCKER CONTAINER PACKAGING Open source LXC packaging engine Docker is an open-source project to easily create lightweight, portable, self-sufficient containers from any application. The same container that a developer builds and tests on a laptop can run at scale, in production, on VMs, bare metal, public clouds and more. To learn more please visit our website: www.docker.io
  • CARGO TRANSPORT PRE-1960
    - Multiplicity of goods
    - Multiplicity of methods for transporting/storing
    - Do I worry about how goods interact (e.g. coffee beans next to spices)?
    - Can I transport quickly and smoothly (e.g. from boat to train to truck)?
  • SOLUTION: INTERMODAL SHIPPING CONTAINER
    - Multiplicity of goods
    - Multiplicity of methods for transporting/storing
    - Do I worry about how goods interact (e.g. coffee beans next to spices)?
    - Can I transport quickly and smoothly (e.g. from boat to train to truck)?
    A standard container that is loaded with virtually any goods, and stays sealed until it reaches final delivery. …in between, can be loaded and unloaded, stacked, transported efficiently over long distances, and transferred from one mode of transport to another.
  • DOCKER IS A SHIPPING CONTAINER FOR CODE
    - Multiplicity of stacks: Static website, Web frontend, User DB, Queue, Analytics DB
    - Multiplicity of hardware environments: Development VM, QA server, Public Cloud, Contributor’s laptop, Production Cluster, Customer Data Center
    - Do services and apps interact appropriately?
    - Can I migrate smoothly and quickly?
    An engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container… …that can be manipulated using standard operations and run consistently on virtually any hardware platform.
  • INFRASTRUCTURE AS-A-SERVICE Compute Orchestration (Project: Year Started; License; Virtualization Technologies)
    - Apache CloudStack: 2008; Apache; XenServer, Xen Cloud Platform, KVM, VMware, Hyper-V
    - Eucalyptus: 2006; GPL; Xen, KVM, VMware (commercial version)
    - OpenNebula: 2005; Apache; Xen, KVM, VMware
    - OpenStack: 2010 (Developed by NASA by Anso Labs previously); Apache; VMware ESX and ESXi, Xen, XenServer, KVM, LXC, QEMU and VirtualBox
  • OPENSTACK The Boy Band of the Open Source Cloud
  • OPENSTACK SHARED SERVICES Span Compute, Storage and Networking: IDENTITY SERVICE, IMAGE SERVICE, TELEMETRY SERVICE, ORCHESTRATION SERVICE
  • EVEN MORE OPENSTACK PROJECTS Span Compute, Storage and Networking • Cinder (Block Storage Service) • Metering/Monitoring (Ceilometer) • Orchestration (Heat) • Trove (Database Service) • Bare Metal (Ironic) • Queue Service (Marconi)
  • OPENSTACK SOLUTION PROVIDERS If you can’t do it yourself
  • CLOUD APIS Everything (should) have an API in the Cloud • deltacloud • dasein • jclouds • libcloud • fog
  • CLOUD STORAGE Virtualized, Distributed usually on Commodity Hardware (Project: Description)
    - Ceph: Distributed file storage system developed by DreamHost
    - GlusterFS: Scale Out NAS system aggregating storage over Ethernet or Infiniband
    - OpenStack Storage: Long-term object storage system
    - Riak CS: Riak CS is open source software designed to provide simple, available, distributed cloud storage at any scale. Riak CS is S3-API compatible and supports per-tenant reporting for billing and metering use cases.
    - Sheepdog: Distributed storage for KVM hypervisors
  • PLATFORM-AS-A-SERVICE Abstracted Cloud-Scale Run-Time Environments (Project: Sponsors; Languages/Frameworks)
    - CloudFoundry: VMware -> Pivotal -> CloudFoundry Foundation; Spring for Java, Rails and Sinatra for Ruby, node.js, Grails, Scala on Lift and more via partners (e.g. Python, PHP)
    - Cloudify: Gigaspaces; [Groovy for deployment recipes]
    - OpenShift Origin: Red Hat; Java, Ruby, PHP, Perl and Python
    - Apache Stratos: WSO2 -> Apache Stratos; PHP, Tomcat, MySQL “cartridges”
  • SOFTWARE DEFINED NETWORKING (SDN) Virtualization meets the network
    Decoupling of the control and data planes of the network to improve efficiency. Communication from an SDN controller via a protocol to network devices, both physical and virtual.
    - Automation: Abstractions allow for programmable networks.
    - Dynamic Networks: Network can be changed quickly via a controller.
    - Security: Network offerings can match virtualization offerings for finer grained security in a highly volatile compute landscape.
    - Heterogeneous Management: Single control point for various devices.
  • SDN OVERVIEW
    - Application Layer: Business Applications
    - API
    - Control Layer: SDN Control Software, Network Services
    - Control Data Plane Interface (e.g. OpenFlow)
    - Infrastructure Layer: Network Devices
  • OPENFLOW Virtualization meets the network OpenFlow enables networks to evolve, by giving a remote controller the power to modify the behavior of network devices, through a well-defined "forwarding instruction set". The growing OpenFlow ecosystem now includes routers, switches, virtual switches, and access points from a range of vendors.
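The idea of a controller changing device behavior through a forwarding instruction set can be pictured as a match/action table that the controller fills in and the switch consults per packet. What follows is a toy Python model of that idea only; it does not speak the OpenFlow wire protocol, and `FlowRule`, `Switch`, the field names, and the table-miss behavior are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRule:
    match: dict        # header fields that must all match, e.g. {"dst_ip": "10.0.0.2"}
    action: str        # illustrative action string, e.g. "output:2" or "drop"
    priority: int = 0  # higher priority rules are checked first


@dataclass
class Switch:
    table: list = field(default_factory=list)

    def install(self, rule):
        """Called by the (remote) controller to program the switch."""
        self.table.append(rule)
        self.table.sort(key=lambda r: r.priority, reverse=True)

    def forward(self, packet):
        """Data plane: apply the first matching rule to a packet (a dict of headers)."""
        for rule in self.table:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        # Assumed table-miss behavior for this sketch: ask the controller.
        return "send_to_controller"


if __name__ == "__main__":
    sw = Switch()
    sw.install(FlowRule({"dst_ip": "10.0.0.2"}, "output:2", priority=10))
    sw.install(FlowRule({}, "drop", priority=0))  # empty match acts as a catch-all
    print(sw.forward({"dst_ip": "10.0.0.2"}))
    print(sw.forward({"dst_ip": "10.0.0.9"}))
```

The split between `install` and `forward` mirrors the control plane / data plane separation described on the SDN slides.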
  • OPEN VSWITCH Open vSwitch is a production quality, multilayer virtual switch licensed under the open source Apache 2.0 license. It is designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag). To learn more please visit our website: http://openvswitch.org/
  • OPEN SOURCE SDN (Project: Description)
    - Floodlight: The Floodlight controller is an enterprise-class, Apache-licensed, Java-based OpenFlow Controller.
    - Indigo: Indigo is an open source project to support OpenFlow on a range of physical switches. By leveraging hardware features of Ethernet switch ASICs, Indigo supports high rates for high port counts, up to 48 10-gigabit ports. Multiple gigabit platforms with 10-gigabit uplinks are also supported.
    - Open Daylight: Linux Foundation Collaborative Project based on Cisco One Controller and plugins from numerous vendors in development, e.g. IBM DOVE.
    - OpenStack Network: Pluggable, scalable, API-driven network and IP management.
    - Open vSwitch: Open vSwitch is an open source (ASL 2.0), multilayer virtual switch designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).
    Hitchhiker’s Guide to the Open Cloud by @mrhinkle
  • BIG DATA
  • NOSQL DATABASES Horizontally scalable unstructured data retrieval (Name: Type; Description)
    - Apache Cassandra: Wide Column Store/Families; API: many, Query Method: MapReduce, Written in: Java, Concurrency: eventually consistent, Misc: like "Big-Table on Amazon Dynamo alike", initiated by Facebook
    - CouchDB: Document Store; API: Memcached API+protocol (binary and ASCII), most languages, Protocol: Memcached REST interface for cluster conf + management, Written in: C/C++ + Erlang (clustering), Replication: Peer to Peer, fully consistent, Misc: Transparent topology changes during operation, provides memcached-compatible caching buckets
    - HBase: Wide Column Store/Families; API: Java / any writer, Protocol: any write call, Query Method: MapReduce Java / any exec, Replication: HDFS Replication, Written in: Java
    - Hypertable: Wide Column Store/Families; API: Thrift (Java, PHP, Perl, Python, Ruby, etc.), Protocol: Thrift, Query Method: HQL, native Thrift API, Replication: HDFS Replication, Concurrency: MVCC, Consistency Model: Fully consistent, Misc: High performance C++ implementation of Google's Bigtable
    - MongoDB: Document Store; API: BSON, Protocol: C, Query Method: dynamic object-based language & MapReduce, Replication: Master Slave & Auto-Sharding, Written in: C++
    - Redis: Key Value / Tuple Store; API: Tons of languages, Written in: C, Concurrency: in memory and saves asynchronously to disk after a defined time. Append only mode available. Different kinds of fsync policies. Replication: Master / Slave, Misc: also lists, sets, sorted sets, hashes, queues
    - Riak: Key Value / Tuple Store; API: JSON, Protocol: REST, Query Method: MapReduce term matching, Scaling: Multiple Masters, Written in: Erlang, Concurrency: eventually consistent (stronger than MVCC via Vector Clocks)
  • MAP REDUCE Algorithm for Parallelized Data Set Processing (diagram labels: Problem Data, Master Node, Worker Node 1, Worker Node 2, Worker Node 3, Solution Data; phases: Map, Reduce)
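As a rough illustration of the map and reduce phases named on this slide, here is a single-process word count in Python. It is a minimal sketch, not Hadoop code: the function names are hypothetical, and the three input chunks stand in for the data a master node would hand to three worker nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: turn one chunk of input into (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all values by key so each key can be reduced independently."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk could be mapped on a different worker; here it runs serially.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    chunks = ["open source cloud", "cloud storage", "open cloud"]
    print(word_count(chunks))
```

Because each call to `map_phase` and `reduce_phase` depends only on its own inputs, the work can be spread across workers, which is what makes the pattern parallelizable.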
  • APACHE HADOOP Apache Project for Parallelized Data Set Processing Overview and Features • Handles large amounts of data • Stores data in native format • Delivers linear scalability at low cost • Resilient in case of infrastructure failures • Transparent application scalability
  • APACHE HADOOP ECOSYSTEM
    - Hadoop: Hadoop Common
    - HDFS: Distributes & replicates data across machines
    - MapReduce: Distributes & monitors tasks; parallel programming; handles large data blocks
    - Hive: Data warehouse that provides SQL interface. Ad hoc projection of data structure to unstructured
    - HBase (Non-Relational DB): Column-oriented schema-less distributed DB modeled after Google’s BigTable. Random real time read/write.
    - Pig (Scripting): Platform for manipulating and analyzing large data sets. Scripting language for analysts.
    - Mahout (Machine Learning): Machine learning libraries for recommendations, clustering, classifications and item sets.
    - Chukwa
    - ZooKeeper
  • CONTACT ME Happy to Chat about Open Source, Cloud or Pittsburgh Sports Professional: mark.hinkle@citrix.com Personal: mrhinkle@gmail.com Phone: 919.228.8049 Professional: http://open.citrix.com Personal: http://www.socializedsoftware.com Twitter: @mrhinkle
  • Appendix
  • Additional Links • Devops Toolchains Group • Software Defined Networking: The New Norm for Networks (Whitepaper) • DevOps Wikipedia Page • NoSQL-Database.org – Ultimate Guide to the Non-Relational Universe • Open Cloud Initiative • NIST Cloud Computing Platform • Open Virtualization Format Specs • Clouderati Twitter Account • Planet DevOps • Nicira Whitepaper – It’s Time to Virtualize the Network • Why Open vSwitch FAQ
  • Cloud Monitoring Tools (Tool: License; Type of Monitoring; Collection Methods)
    - Cacti / RRDTool: GPL; Performance; SNMP, syslog
    - Graphite: Apache 2.0; Performance; Agent
    - Nagios: GPL; Availability; SNMP, TCP, ICMP, IPMI, syslog
    - Zabbix: GPL; Availability/Performance and more; SNMP, TCP/ICMP, IPMI, Synthetic Transactions
    - Zenoss: GPL; Availability, Performance, Event Management; SNMP, ICMP, SSH, syslog, WMI
  • Cloud Provisioning (Project: Installation Targets)
    - Apache Provisionr (incubating): Can provision 10s to 1000s of machines on various clouds.
    - Cobbler: Distributed virtual infrastructure using koan (kickstart of a network to PXE boot VMs) for Red Hat, OpenSUSE, Fedora, Debian, Ubuntu VMs
    - Crowbar: (Bare metal provisioning)
    - JuJu: Public Clouds (Amazon Web Services, HP Cloud), Private OpenStack clouds, Bare Metal via MAAS
    - Salt Cloud: Tool to provision “salted” VMs that can then be updated by a central server via ZeroMQ
  • Configuration Management Tools (Project: Year Started; Language; License; Client/Server)
    - Cfengine: 1993; C; Apache; Yes
    - Chef: 2009; Ruby; Apache; Chef Solo – No, Chef Server – Yes
    - Puppet: 2004; Ruby; GPL; Yes & standalone
    - Salt: 2011; Python; Apache; Yes
  • Automation/Orchestration Tools (Project: Description)
    - Ansible: Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately.
    - Capistrano: Utility and framework for executing commands in parallel on multiple remote machines, via SSH. It uses a simple DSL that allows you to define tasks, which may be applied to machines in certain roles.
    - RunDeck: Rundeck is an open-source process automation and command orchestration tool with a web console.
    - Func: Func provides a two-way authenticated system for generically executing tasks, integrations with puppet and cobbler.
    - MCollective: The Marionette Collective AKA MCollective is a framework to build server orchestration or parallel job execution systems.
    - Salt: Execute arbitrary shell commands or choose from dozens of pre-built modules of common (or complex) commands.
    - Scalr: Provide scaling across multiple cloud computing platforms, integrates with Chef.
  • NetFlix Open Source ToolBag for AWS: ASGARD, ASTYANAX, EDDA, EUREKA, PRIAM, SIMIAN ARMY http://netflix.github.com