OSCON 2013 - The Hitchhiker’s Guide to Open Source Cloud Computing



And while the Hitchhiker’s Guide to the Galaxy (HHGTTG) is a wholly remarkable book, it doesn’t cover the nuances of cloud computing. Whether you want to build a public, private or hybrid cloud, there are free and open source tools that can provide a complete solution or augment your existing Amazon or other hosted cloud solution. That’s why you need the Hitchhiker’s Guide to (Open Source) Cloud Computing (HHGTCC), or at least need to attend this talk to understand the current state of open source cloud computing. This talk will cover infrastructure-as-a-service, platform-as-a-service and developments in big data, and how to deploy and manage open source flavors of these technologies more effectively. Specifically, the guide will cover:

Infrastructure-as-a-Service – The Systems Cloud – Get a comparison of the open source cloud platforms including OpenStack, Apache CloudStack, Eucalyptus and OpenNebula
Platform-as-a-Service – The Developers Cloud – Learn about the tools that abstract away complexity for developers and are used to build portable, auto-scaling applications on CloudFoundry, OpenShift, Stackato and more.
Data-as-a-Service – The Analytics Cloud – Want to figure out the who, what, where, when and why of big data? You’ll get an overview of open source NoSQL databases and technologies like MapReduce to help parallelize data mining tasks and crunch massive data sets in the cloud.
Network-as-a-Service – The Network Cloud – The final pillar for truly fungible network infrastructure is network virtualization. We will give an overview of software-defined networking, including OpenStack Quantum, Nicira, Open vSwitch and others.
Finally, this talk will provide an overview of the tools that can help you really take advantage of the cloud. Do you want to auto-scale to serve millions of web pages and scale back down as demand fluctuates? Are you interested in automating the total lifecycle of cloud computing environments? You’ll learn how to combine these tools into tool chains that provide continuous deployment systems, helping you become agile and spend more time improving your IT rather than simply maintaining it.

[Finally, for those of you who are Douglas Adams fans, please accept the deepest apologies for the bad analogies to the HHGTTG.]

  • The Encyclopædia Galactica is a fictional or hypothetical encyclopædia of a galaxy-spanning civilization, containing all the knowledge accumulated by a society with quadrillions of people and thousands of years of history. The name evokes the exhaustive and imperialistic aspects of the real-life Encyclopædia Britannica.
  • Infinite Improbability Drive: The Infinite Improbability Drive is a faster-than-light drive. The most prominent usage of the drive is in the starship Heart of Gold. It is based on a particular perception of quantum theory: a subatomic particle is most likely to be in a particular place, such as near the nucleus of an atom, but there is also an infinitesimally small probability of it being found very far from its point of origin (for example, close to a distant star). Thus, a body could travel from place to place without passing through the intervening space (or hyperspace, for that matter), if you had sufficient control of probability. Reference: Michael Lockwood (2005). The Labyrinth of Time: introducing the universe. Oxford University Press. ISBN 0-19-924995-4.
  • Private cloud: The cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on premise or off premise. Public cloud: The cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services. Hybrid cloud: The cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
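The hybrid cloud definition mentions cloud bursting. As a minimal sketch of the placement idea only, assuming a hypothetical `Cloud` type that simply counts VM slots, workloads fill the private cloud first and overflow ("burst") to the public cloud once it is full. A real bursting policy would also weigh cost, data locality and latency.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cloud:
    """Hypothetical cloud with a fixed number of VM slots."""
    name: str
    capacity: int                                    # maximum VMs this cloud will host
    placed: List[str] = field(default_factory=list)  # workloads placed here so far

    def has_room(self) -> bool:
        return len(self.placed) < self.capacity


def place_with_bursting(workloads: List[str], private: Cloud, public: Cloud) -> Dict[str, str]:
    """Place each workload privately while capacity lasts, then burst to public."""
    placement: Dict[str, str] = {}
    for w in workloads:
        target = private if private.has_room() else public
        if not target.has_room():
            raise RuntimeError(f"no capacity left anywhere for {w}")
        target.placed.append(w)
        placement[w] = target.name
    return placement
```

With a private capacity of 2, a list of four workloads puts the first two on the private cloud and the remaining two on the public one.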
  • The Kill-o-Zap is a weapon first appearing in the novel The Hitchhiker's Guide to the Galaxy, wielded by the police from Blagulon Kappa when they come to Magrathea to arrest Zaphod. It is referenced throughout the series in the role of a standard and widespread brand of raygun. In the novel The Restaurant at the End of the Universe it is described in more detail: The designer of the gun had clearly not been instructed to beat about the bush. 'Make it evil,' he'd been told. 'Make it totally clear that this gun has a right end and a wrong end. Make it totally clear to anyone standing at the wrong end that things are going badly for them. If that means sticking all sort of spikes and prongs and blackened bits all over it then so be it. This is not a gun for hanging over the fireplace or sticking in the umbrella stand, it is a gun for going out and making people miserable with.' In the novel Life, the Universe and Everything, the group arms themselves with Kill-o-Zap guns against the Krikkiters. Arthur "fumbled to release the safety catch and engage the extreme danger catch as Ford had shown him. He was shaking so much that if he'd fired at anybody at that moment he probably would have burnt his signature on them." In the 2005 movie adaptation, the gun has a sophisticated look. It is more of a white circle that covers the hand and has a trigger on the inside. This version is wielded by Marvin.
  • Derived from the NIST diagram. Physical resources: networking, compute, storage, BIOS/firmware. Software kernel: operating systems with Type II hypervisors. VM manager (VMM): Type 1 hypervisors. Virtualized resources: networking, compute, storage. Virtualized resource metadata: virtual machine images.
  • The top choices for cloud computing are Xen and KVM. OpenVZ, container virtualization for Linux, is an interesting option because it partitions application space with very little overhead, much like containers such as BSD Jails. An advantage is that memory allocation is soft, so unutilized memory can be used by other applications.
  • OVF: An OVF package consists of several files placed in one directory. A one-file alternative is the OVA package, which is a TAR file with the OVF directory inside. OVF is a packaging format for software appliances. From a technical point of view, an OVF is a transport mechanism for virtual machine templates. One OVF may contain a single VM or many VMs (it is left to the software appliance developer to decide which arrangement best suits their application). OVFs must be installed before they can be run; a particular virtualization platform may run the VM from the OVF, but this is not required. If this is done, the OVF itself can no longer be viewed as a “golden image” version of the appliance, since run-time state for the virtual machine(s) will pervade the OVF. Moreover, the digital signature that allows the platform to check the integrity of the OVF will be invalid. Amazon AMI: An Amazon Machine Image (AMI) is a special type of virtual appliance which is used to instantiate (create) a virtual machine within the Amazon Elastic Compute Cloud. It serves as the basic unit of deployment for services delivered using EC2. Like all virtual appliances, the main component of an AMI is a read-only filesystem image which includes an operating system (e.g., Linux, UNIX, or Windows) and any additional software required to deliver a service or a portion of it.[2] The AMI filesystem is compressed, encrypted, signed, split into a series of 10MB chunks and uploaded into Amazon S3 for storage.
An XML manifest file stores information about the AMI, including name, version, architecture, default kernel id, decryption key and digests for all of the filesystem chunks. An AMI does not include a kernel image, only a pointer to the default kernel id, which can be chosen from an approved list of safe kernels maintained by Amazon and its partners (e.g., Red Hat, Canonical, Microsoft). Users may choose kernels other than the default when booting an AMI. QCOW2 – QEMU “Copy on Write” Version 2: qcow stands for "QEMU Copy On Write" and denotes a disk storage optimization strategy that delays allocation of storage until it is actually needed. QEMU is an emulator and virtual machine container, and it can use a variety of virtual disk images, which are generally associated with specific guest operating systems. qcow2 is a newer version of the qcow format. QEMU can use a base image which is read-only and store all writes to the qcow2 image. Among the QEMU-supported formats, this is the most versatile. Features include smaller images (useful if the filesystem does not support holes, for example on FAT32), optional AES encryption, zlib-based compression and support for multiple VM snapshots. QEMU and Xen have retained the qcow format for backwards compatibility. Users can easily convert qcow disk images to the qcow2 format. VMDK – Virtual Machine Disk: VMDK (Virtual Machine Disk) is a file format used for virtual appliances developed for VMware products. The format is a container for virtual hard disk drives to be used in virtual machines like VMware Workstation or VirtualBox. VMDK is an open format. IMG: The IMG file extension is used by files which are standardized raw dumps of a disk, and by files in various formats created by different imaging programs. Xen can use raw disk images and physical disks as filesystems for a Xen-based domainU. Another option is to use the disk images used by QEMU.
VHD – Virtual Hard Disk: The Virtual Hard Disk format was started by Connectix (now part of Microsoft) and made open through the Microsoft Open Specification Promise. VHDs are implemented as files that reside on the native host file system. The following types of VHD formats are supported by Microsoft Virtual PC and Virtual Server. Fixed hard disk image: a file that is allocated to the size of the virtual disk. Fixed VHDs consist of a raw disk image followed by a VHD footer (512 or formerly 511 bytes).[1] Dynamic hard disk image: a file that at any given time is as large as the actual data written to it, plus the size of the header and footer. Dynamic and differencing VHDs begin with a copy of the VHD footer (padded to 512 bytes), and for dynamic or differencing VHDs created by Microsoft products this results in the VHD cookie string "conectix" at the beginning of the VHD file.[1] Differencing hard disk image: a set of modified blocks (maintained in a separate file referred to as the "child image") in comparison to a parent image. The differencing hard disk image format allows the concept of Undo Changes: when enabled, all changes to a hard drive contained within a VHD (the parent image) are stored in a separate file (the child image). Options are available to undo the changes to the VHD, or to merge them permanently into the VHD. Different child images based on the same parent image also allow "cloning" of VHDs; at least the globally unique identifier (GUID) must be different. Linked to a hard disk: a file which contains a link to a physical hard drive or partition of a physical hard drive.
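Several of the image formats above can be told apart by their leading or trailing bytes. The following is a heuristic Python sketch, not a substitute for a tool such as `qemu-img info`: it checks the qcow magic `QFI\xfb` (followed by a big-endian version number), the VMDK sparse-extent magic `KDMV`, and the VHD cookie `conectix` either at offset 0 (dynamic and differencing VHDs) or at the start of the final 512 bytes (fixed VHDs). Text-descriptor VMDKs, legacy 511-byte VHD footers and OVF/OVA packages are not recognized, and anything unrecognized is reported as raw.

```python
import io
from typing import BinaryIO


def detect_image_format(f: BinaryIO) -> str:
    """Guess a virtual disk image format from magic bytes (heuristic sketch)."""
    f.seek(0)
    head = f.read(512)
    if head[:4] == b"QFI\xfb":
        # qcow header: magic, then a big-endian 32-bit version (1 = qcow, 2+ = qcow2).
        version = int.from_bytes(head[4:8], "big")
        return "qcow2" if version >= 2 else "qcow"
    if head[:4] == b"KDMV":
        return "vmdk"  # VMware hosted sparse extent header
    if head[:8] == b"conectix":
        return "vhd"   # dynamic/differencing VHD: starts with a copy of the footer
    # Fixed VHD: raw data followed by a 512-byte footer that starts with the cookie.
    f.seek(0, io.SEEK_END)
    size = f.tell()
    if size >= 512:
        f.seek(size - 512)
        if f.read(8) == b"conectix":
            return "vhd"
    return "raw"
```

Usage: `with open("disk.img", "rb") as f: print(detect_image_format(f))`, where `disk.img` is a placeholder path.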
  • Appliances are like toasters; they do one thing very well. BitNami: BitNami Cloud Images allow BitNami Stacks to run in a cloud computing environment. BitNami offers Amazon Machine Images (AMIs) for running BitNami Stacks on the Amazon Cloud, as well as BitNami Cloud Hosting, a service that simplifies the process of running open source applications on Amazon EC2. BoxGrinder: BoxGrinder supports many virtualization and cloud platforms like EC2, Xen, KVM and VMware. You can create an appliance based on Fedora, Red Hat Enterprise Linux or CentOS, and you are of course free to write your own plugin to support any other virtualization platform or operating system. Oz: Oz is a command-line tool that can create images for common Linux distributions. There are lots of tools for image building; Oz is a bit different from most others in that it actually spawns a VM to do an install, while most other tools simply use a loopback-mounted filesystem. SUSE Studio: SUSE Studio lets you use either a hosted build service or an on-premise virtual build system. It has a RESTful API for making calls to SUSE Studio, works with openSUSE, SUSE Linux Enterprise and JeOS, integrates with SUSE Lifecycle Management Server and WebYaST, and can share images in the SUSE Studio Gallery. Other projects: Imagefactory (http://imgfac.org/) builds images for a variety of operating system/cloud combinations. UShareSoft lets you create cloud server templates on any OS in minutes via a SaaS offering.
  • Scale up vs. scale out
  • CloudStack – www.cloudstack.org – CloudStack is an Apache Software Foundation project released under ASL 2.0 that provides a highly capable IaaS solution for service providers and enterprises. Features: robust web interface; comprehensive API; secure single sign-on; dynamic workload management; XenServer, Xen Cloud Platform, KVM, VMware and Oracle VM support; secure AJAX console for VMs; networking-as-a-service (create VLANs to segregate traffic); EC2 API compatibility; usage metering. Eucalyptus – http://open.eucalyptus.com – An IaaS platform originally targeted at providing a migration path from Amazon EC2 to a private cloud. Features: Amazon AWS interface compatibility; support for Amazon AMIs; high availability; network management, security groups and traffic isolation; self-service, S3-compatible, bucket-based storage; Xen and KVM hypervisor support (VMware in the Enterprise Edition); user, group and role-based management; single data center. OpenStack – www.openstack.org – OpenStack Compute (Nova): Nova is a cloud orchestration platform similar to Amazon EC2. Features: orchestration of popular hypervisors (Xen, XenServer, KVM, Hyper-V, VMware, Linux Containers); floating IP addresses (keep IPs and DNS correct when restarting VMs); VNC proxy through the web; Apache 2.0 license; Android/iOS clients; block storage support (AoE, iSCSI, Sheepdog). OpenStack Storage (Swift) is an S3-style object storage solution used for long-term storage, not real-time storage.
Swift is used for creating redundant, scalable object storage using clusters of standardized servers to store petabytes of accessible data. Features: store and manage files programmatically; create public and private folders; runs on commodity hardware; fault tolerant (nodes/HDDs); scale-out and scale-up. OpenStack Image Service (Glance): The OpenStack Image Service (code-named Glance) provides discovery, registration, and delivery services for virtual disk images. Features: images-as-a-service; supports raw, VHD, VDI, qcow2, VMDK and OVF; RESTful API; backend options: Swift, local, S3, HTTP; version control and logging. OpenNebula – http://www.opennebula.org/ – Cloud computing toolkit, Apache license.
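CloudStack's "comprehensive API" is an HTTP query API in which each request is signed with the user's secret key. The Python sketch below follows the signing procedure as described in the CloudStack API developer's guide, as best I recall it: URL-encode each value, sort the pairs by field name, lowercase the entire string, compute an HMAC-SHA1 with the secret key, then Base64- and URL-encode the digest. The key, secret and command values in the usage note are placeholders, and percent-encoding edge cases (characters such as `*` or `~`) should be checked against the guide before relying on this.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_cloudstack_request(params: dict, secret_key: str) -> str:
    """Return a query string with a CloudStack-style `signature` parameter appended.

    Only the lowercased string is signed; the query itself is sent with its
    original case.
    """
    # quote(..., safe="") encodes spaces as %20 (not '+') and also encodes '/'.
    pairs = sorted(((k, quote(str(v), safe="")) for k, v in params.items()),
                   key=lambda kv: kv[0].lower())
    query = "&".join(f"{k}={v}" for k, v in pairs)
    digest = hmac.new(secret_key.encode("utf-8"), query.lower().encode("utf-8"),
                      hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode("ascii"), safe="")
    return f"{query}&signature={signature}"
```

For example, `sign_cloudstack_request({"command": "listVirtualMachines", "apiKey": "KEY", "response": "json"}, "SECRET")` yields a string beginning `apiKey=KEY&command=listVirtualMachines&response=json&signature=`, which is then appended to the management server's API endpoint.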
  • Nova – Compute fabric controller similar to Amazon EC2: Nova is the project name for OpenStack Compute, a cloud computing fabric controller and the main part of an IaaS system. Individuals and organizations can use Nova to host and manage their own cloud computing systems. Design goals: component-based architecture (quickly add new behaviors); highly available (scale to very serious workloads); fault-tolerant (isolated processes avoid cascading failures); recoverable (failures should be easy to diagnose, debug, and rectify); open standards (be a reference implementation for a community-driven API); API compatibility (Nova strives to be API-compatible with popular systems like Amazon EC2). Object Storage – Swift – object storage like Amazon S3: Object Storage is ideal for cost-effective, scale-out storage. It provides a fully distributed, API-accessible storage platform that can be integrated directly into applications or used for backup, archiving and data retention. Block Storage allows block devices to be exposed and connected to compute instances for expanded storage, better performance and integration with enterprise storage platforms, such as NetApp, Nexenta and SolidFire. Image Service – “Glance”: The OpenStack Image Service provides discovery, registration and delivery services for disk and server images. The ability to copy or snapshot a server image and immediately store it away is a powerful capability of the OpenStack cloud operating system. Stored images can be used as templates to get new servers up and running more quickly and consistently than installing a server operating system and individually configuring additional services, especially when you are provisioning multiple servers. The service can also be used to store and catalog an unlimited number of backups. Identity Service – “Keystone”: OpenStack Identity provides a central directory of users mapped to the OpenStack services they can access.
It acts as a common authentication system across the cloud operating system and can integrate with existing backend directory services like LDAP. It supports multiple forms of authentication, including standard username and password credentials, token-based systems and AWS-style logins. Additionally, the catalog provides a queryable list of all of the services deployed in an OpenStack cloud in a single registry. Users and third-party tools can programmatically determine which resources they can access. As an administrator, OpenStack Identity enables you to: configure centralized policies across users and systems; create users and tenants and define permissions for compute, storage and networking resources using role-based access control (RBAC) features; and integrate with an existing directory like LDAP, allowing for a single source of identity authentication across the enterprise. Quantum: The Quantum API allows for creation and management of “virtual networks”, each of which can have one or more “ports”. A port on a virtual network can be attached to a “network interface”, where a “network interface” is anything which can source traffic, such as a vNIC exposed by a virtual machine, an interface on a load balancer, and so on. These abstractions offered by Quantum (virtual networks, virtual ports, and network interfaces) are the building blocks for building and managing logical network topologies. Of course, the technology that implements Quantum is fully decoupled from the API (that is, the backend is “pluggable”).
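To make the Quantum abstractions concrete, here is a hypothetical in-memory Python model of virtual networks, ports and attachments. It is not the Quantum API: class and method names are invented for illustration, and it only records the logical topology, which in Quantum a pluggable backend would then realize on real or virtual switches.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Port:
    id: str
    # Id of whatever "network interface" is plugged in (a VM vNIC, a load
    # balancer interface, ...); None while the port is unattached.
    attachment: Optional[str] = None


@dataclass
class Network:
    id: str
    name: str
    ports: Dict[str, Port] = field(default_factory=dict)


class LogicalNetworkModel:
    """Hypothetical in-memory model of networks, ports and attachments."""

    def __init__(self) -> None:
        self.networks: Dict[str, Network] = {}

    def create_network(self, name: str) -> Network:
        net = Network(id=str(uuid.uuid4()), name=name)
        self.networks[net.id] = net
        return net

    def create_port(self, net_id: str) -> Port:
        port = Port(id=str(uuid.uuid4()))
        self.networks[net_id].ports[port.id] = port
        return port

    def attach(self, net_id: str, port_id: str, interface_id: str) -> None:
        """Plug one network interface into a port; a port holds at most one."""
        port = self.networks[net_id].ports[port_id]
        if port.attachment is not None:
            raise ValueError(f"port {port_id} is already attached to {port.attachment}")
        port.attachment = interface_id

    def detach(self, net_id: str, port_id: str) -> None:
        self.networks[net_id].ports[port_id].attachment = None
```

The design mirrors the text: a network owns many ports, and each port is attached to at most one traffic source.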
  • Types of tasks accomplished by an API: provisioning (creating, re-creating, moving, or deleting components, e.g. virtual machines, VLANs) and configuration (assigning or changing attributes of the architecture, such as security and network settings). Cloud provider abstraction libraries: jclouds, a Java API abstraction; Libcloud, started by CloudKick (now part of Rackspace) to abstract clouds, an Apache Software Foundation project; Deltacloud, started by Red Hat to abstract clouds, an Apache Software Foundation project; Fog, a provider- and abstraction-level API across compute and storage, written in Ruby.
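Libraries such as jclouds, Libcloud, Deltacloud and fog share one pattern: a provider-neutral driver interface, with provisioning logic written once against it. The Python sketch below shows that pattern with a hypothetical `ComputeDriver` interface and an in-memory stand-in provider; none of these names come from the libraries themselves. Apache Libcloud, for instance, exposes the same idea through `get_driver(Provider...)` and driver methods such as `list_nodes()` and `create_node()`.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ComputeDriver(ABC):
    """Hypothetical provider-neutral compute interface."""

    @abstractmethod
    def create_node(self, name: str, size: str) -> str: ...

    @abstractmethod
    def destroy_node(self, node_id: str) -> None: ...

    @abstractmethod
    def list_nodes(self) -> List[str]: ...


class InMemoryDriver(ComputeDriver):
    """Stand-in provider that keeps nodes in a dict (illustration and testing only)."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.nodes: Dict[str, Dict[str, str]] = {}
        self._counter = 0

    def create_node(self, name: str, size: str) -> str:
        self._counter += 1
        node_id = f"{self.prefix}-{self._counter}"
        self.nodes[node_id] = {"name": name, "size": size}
        return node_id

    def destroy_node(self, node_id: str) -> None:
        del self.nodes[node_id]

    def list_nodes(self) -> List[str]:
        return sorted(self.nodes)


def provision(driver: ComputeDriver, count: int, size: str) -> List[str]:
    """A provisioning task written once and usable with any driver."""
    return [driver.create_node(f"web-{i}", size) for i in range(count)]
```

Swapping providers then means constructing a different driver; `provision` itself does not change.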
  • Primary storage / secondary storage. GlusterFS: GlusterFS is an open source scale-out NAS solution. The software is a powerful and flexible solution that simplifies the task of managing unstructured file data, whether you have a few terabytes of storage or multiple petabytes. Ceph: Ceph is a distributed network storage and file system designed to provide excellent performance, reliability, and scalability. Ceph is based on a reliable and scalable distributed object store, with a distributed metadata management cluster layered on top to provide a distributed file system with POSIX semantics. There are a variety of ways to interact with the system. Ceph is used by Bloomberg and DreamHost. OpenStack Storage (code-named Swift) is open source software for creating redundant, scalable object storage using clusters of standardized servers to store petabytes of accessible data. It is not a file system or real-time data storage system, but rather a long-term storage system for a more permanent type of static data that can be retrieved, leveraged, and then updated if necessary. Primary examples of data that best fit this type of storage model are virtual machine images, photo storage, email storage and backup archiving. Having no central "brain" or master point of control provides greater scalability, redundancy and permanence. Riak Cloud Storage: Riak CS is simple, available storage software built on top of Riak. Features: large object support (up to 5GB/object); S3-compatible API and authentication; multi-tenancy and per-user reporting; per-node or capacity-based pricing; multi-datacenter replication. Sheepdog: Sheepdog is a distributed storage system for QEMU/KVM. It provides highly available block-level storage volumes that can be attached to QEMU/KVM virtual machines. Sheepdog scales to several hundred nodes and supports advanced volume management features such as snapshots, cloning, and thin provisioning.
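Swift's lack of a central "brain" rests on its ring: any node can compute where an object lives by hashing its path. The Python sketch below shows the partition lookup in the style Swift uses (MD5 of the salted `/account/container/object` path, keeping the top `part_power` bits), plus a deliberately simplified round-robin replica table. The hash prefix/suffix are empty here, whereas a real cluster sets them in `swift.conf`, and real rings balance partitions by device weight and spread replicas across zones.

```python
import hashlib
import struct
from typing import List


def swift_partition(account: str, container: str, obj: str, part_power: int,
                    prefix: str = "", suffix: str = "") -> int:
    """Map an object path to one of 2**part_power ring partitions, Swift-style.

    The salted path is hashed with MD5, and the top `part_power` bits of the
    first 32 bits of the digest select the partition.
    """
    path = f"{prefix}/{account}/{container}/{obj}{suffix}"
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return struct.unpack_from(">I", digest)[0] >> (32 - part_power)


def build_replica_table(devices: List[str], part_power: int, replicas: int) -> List[List[str]]:
    """Assign each partition to `replicas` distinct devices, round-robin.

    Simplification: ignores device weights and zones.
    """
    if replicas > len(devices):
        raise ValueError("need at least as many devices as replicas")
    return [[devices[(p + r) % len(devices)] for r in range(replicas)]
            for p in range(2 ** part_power)]
```

A lookup is then `table[swift_partition("AUTH_test", "photos", "towel.jpg", part_power=4)]`, with all names being placeholders; every node computes the same answer without consulting a master.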
  • CloudFoundry: Cloud Foundry is a VMware-led project for building a Platform-as-a-Service (PaaS) offering. Cloud Foundry provides a platform for building, deploying, and running cloud apps using Spring for Java developers, Rails and Sinatra for Ruby developers, Node.js, and other JVM frameworks including Grails. Cloudify: Cloudify is designed to bring any app to any cloud, enabling enterprises, ISVs, and managed service providers alike to quickly benefit from the cloud automation and elasticity organizations need today. Cloudify helps you maximize application onboarding and automation by externally orchestrating the application deployment and runtime. Cloudify’s DevOps approach treats infrastructure as code, enabling you to describe deployment and post-deployment steps for any application through an external blueprint (a.k.a. a recipe), which you can then take from cloud to cloud, unchanged. Cloudify recipes are available on GitHub. OpenShift: A free Platform-as-a-Service that enables developers to deploy apps written in multiple frameworks and languages across clouds. Open source licensing is forthcoming. Stackato: Stackato enables you to create a private PaaS hosted on the cloud of your choice (your own or with a hosting provider) to empower your developers to deploy, run, and manage their applications in the cloud.
Stackato includes a multi-choice cloud application platform with automatic provisioning: choice of language (Java, Python, PHP, Ruby, Perl, Node.js, Erlang, Scala, Clojure); choice of framework (popular frameworks for each of the languages above, such as Spring, Django, Pyramid, Rails, Mojolicious, Catalyst and more); choice of data service (MySQL, PostgreSQL, Redis, MongoDB), plus the ability to connect to others. WSO2: The WSO2 middleware platform offers a full range of core services: application server, enterprise service bus (ESB), governance registry and repository, identity and access management, business process management (BPM), business activity monitor (BAM), portal server and more. WSO2 Stratos monitors CPU, memory and bandwidth utilization, and SLAs. It then automatically scales up or down depending on the load. When new resources are needed, WSO2 Stratos transparently adds services, and when load goes down, WSO2 Stratos automatically brings services down. Dynamic discovery enables services to automatically detect when resource allocations change; there is no need for manual monitoring or reconfiguration.
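The load-driven scaling described for WSO2 Stratos can be illustrated with a simple threshold policy. This is a generic Python sketch, not Stratos's actual algorithm: `ScalingPolicy` and its default thresholds are hypothetical. The most saturated of the monitored resources (CPU, memory, bandwidth) drives the decision, and the gap between the two thresholds acts as hysteresis so the pool does not flap around a single set point.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ScalingPolicy:
    """Hypothetical threshold policy; names and defaults are illustrative."""
    scale_up_at: float = 0.75    # add an instance when load exceeds this
    scale_down_at: float = 0.25  # remove one when load falls below this
    min_instances: int = 1
    max_instances: int = 10


def desired_instances(current: int, metrics: Dict[str, float], policy: ScalingPolicy) -> int:
    """Return the new instance count from average utilizations in [0.0, 1.0].

    `metrics` maps a resource name (e.g. "cpu", "memory", "bandwidth") to its
    average utilization across the pool.
    """
    if not metrics:
        return current
    load = max(metrics.values())  # the most saturated resource decides
    if load > policy.scale_up_at:
        current += 1
    elif load < policy.scale_down_at:
        current -= 1
    return max(policy.min_instances, min(policy.max_instances, current))
```

Called periodically with fresh measurements, this adds one instance at a time under load and removes one at a time when every resource is idle, always staying within the configured bounds.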
  • A towel, [The Hitchhiker's Guide to the Galaxy] says, is about the most massively useful thing an interstellar hitchhiker can have. Partly it has great practical value. You can wrap it around you for warmth as you bound across the cold moons of Jaglan Beta; you can lie on it on the brilliant marble-sanded beaches of Santraginus V, inhaling the heady sea vapors; you can sleep under it beneath the stars which shine so redly on the desert world of Kakrafoon; use it to sail a miniraft down the slow heavy River Moth; wet it for use in hand-to-hand-combat; wrap it round your head to ward off noxious fumes or avoid the gaze of the Ravenous Bugblatter Beast of Traal (such a mind-bogglingly stupid animal, it assumes that if you can't see it, it can't see you); you can wave your towel in emergencies as a distress signal, and of course dry yourself off with it if it still seems to be clean enough.
  • In the SDN architecture, the control and data planes are decoupled, network intelligence and state are logically centralized, and the underlying network infrastructure is abstracted from the applications. As a result, enterprises and carriers gain unprecedented programmability, automation, and network control, enabling them to build highly scalable, flexible networks that readily adapt to changing business needs.
  • Software Defined Networking (SDN) is an emerging network architecture where network control is decoupled from forwarding and is directly programmable. This migration of control, formerly tightly bound in individual network devices, into accessible computing devices enables the underlying infrastructure to be abstracted for applications and network services, which can treat the network as a logical or virtual entity. This figure depicts a logical view of the SDN architecture. Network intelligence is (logically) centralized in software-based SDN controllers, which maintain a global view of the network. As a result, the network appears to the applications and policy engines as a single, logical switch. With SDN, enterprises and carriers gain vendor-independent control over the entire network from a single logical point, which greatly simplifies the network design and operation. SDN also greatly simplifies the network devices themselves, since they no longer need to understand and process thousands of protocol standards but merely accept instructions from the SDN controllers.
  • The limitations of the hardware-dependent network are preventing enterprises from realizing the full potential of their clouds and are vastly limiting the return on their investment. To get the most from your cloud, you must untether your network.
  • OpenFlow: OpenFlow is an open standard that enables researchers to run experimental protocols in the campus networks we use every day. OpenFlow is added as a feature to commercial Ethernet switches, routers and wireless access points, and provides a standardized hook to allow researchers to run experiments without requiring vendors to expose the internal workings of their network devices. OpenFlow is currently being implemented by major vendors, with OpenFlow-enabled switches now commercially available. In a classical router or switch, the fast packet forwarding (data path) and the high-level routing decisions (control path) occur on the same device. An OpenFlow Switch separates these two functions. The data path portion still resides on the switch, while high-level routing decisions are moved to a separate controller, typically a standard server. The OpenFlow Switch and Controller communicate via the OpenFlow protocol, which defines messages such as packet-received, send-packet-out, modify-forwarding-table, and get-stats. The data path of an OpenFlow Switch presents a clean flow table abstraction; each flow table entry contains a set of packet fields to match, and an action (such as send-out-port, modify-field, or drop). When an OpenFlow Switch receives a packet it has never seen before, for which it has no matching flow entries, it sends this packet to the controller. The controller then makes a decision on how to handle this packet. It can drop the packet, or it can add a flow entry directing the switch on how to forward similar packets in the future. OpenFlow is the first standard communications interface defined between the control and forwarding layers of an SDN architecture. OpenFlow allows direct access to and manipulation of the forwarding plane of network devices such as switches and routers, both physical and virtual (hypervisor-based).
It is the absence of an open interface to the forwarding plane that has led to the characterization of today’s networking devices as monolithic, closed, and mainframe-like. No other standard protocol does what OpenFlow does, and a protocol like OpenFlow is needed to move network control out of the networking switches to logically centralized control software.
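The flow-table behavior described above (match fields, actions, and a table miss that is sent to the controller) can be modelled in a few lines of Python. This is a conceptual sketch, not the OpenFlow wire protocol: packets are plain dicts, action strings such as `"output:2"` are illustrative, and `static_controller` is a hypothetical controller that already knows which port each destination lives on. A real controller would also send the packet back out with a packet-out message.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A packet is modelled as a dict of header fields, e.g. {"in_port": 1, "eth_dst": "h2"}.
Packet = Dict[str, object]


@dataclass
class FlowEntry:
    match: Dict[str, object]  # fields that must equal the packet's; omitted fields are wildcards
    action: str               # illustrative action strings such as "output:2" or "drop"
    priority: int = 0

    def matches(self, pkt: Packet) -> bool:
        return all(pkt.get(k) == v for k, v in self.match.items())


# The controller receives the switch and the unmatched packet and may return
# a flow entry to install (or None, meaning drop the packet).
Controller = Callable[["Switch", Packet], Optional[FlowEntry]]


@dataclass
class Switch:
    controller: Controller
    table: List[FlowEntry] = field(default_factory=list)

    def add_flow(self, entry: FlowEntry) -> None:
        """Install a flow, keeping the table ordered by descending priority."""
        self.table.append(entry)
        self.table.sort(key=lambda e: e.priority, reverse=True)

    def receive(self, pkt: Packet) -> str:
        """Data path: return the action for a packet, asking the controller on a miss."""
        for entry in self.table:
            if entry.matches(pkt):
                return entry.action
        entry = self.controller(self, pkt)  # table miss: hand the packet to the controller
        if entry is None:
            return "drop"
        self.add_flow(entry)                # similar packets now stay on the fast path
        return entry.action


def static_controller(hosts: Dict[str, int]) -> Controller:
    """Hypothetical controller mapping destination addresses to output ports."""
    def decide(switch: Switch, pkt: Packet) -> Optional[FlowEntry]:
        port = hosts.get(pkt.get("eth_dst"))
        if port is None:
            return None
        return FlowEntry(match={"eth_dst": pkt["eth_dst"]}, action=f"output:{port}", priority=10)
    return decide
```

Only the first packet to a given destination reaches the controller; once the entry is installed, later packets are forwarded from the switch's own table.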
  • Floodlight – http://floodlight.openflowhub.org/ – The Floodlight controller is an enterprise-class, Apache-licensed, Java-based OpenFlow controller. It is supported by a community of developers, including a number of engineers from Big Switch Networks. OpenFlow is an open standard managed by the Open Networking Foundation (ONF). It specifies a protocol through which a remote controller can modify the behavior of networking devices through a well-defined “forwarding instruction set”. Floodlight is designed to work with the growing number of switches, routers, virtual switches, and access points that support the OpenFlow standard. OpenDaylight – http://www.opendaylight.com – The adoption of new technologies and the pursuit of programmable networks have the potential to significantly improve the functionality, flexibility and adaptability of mainstream datacenter architectures. To leverage this abstraction to its fullest, the network must adapt and evolve to a software-defined architecture. One of the architectural elements required to achieve this goal is a Software-Defined Networking (SDN) platform that enables network control and programmability. OpenStack Networking “Quantum” – https://www.openstack.org/software/openstack-networking/ – OpenStack Networking is a pluggable, scalable and API-driven system for managing networks and IP addresses. Like other aspects of the cloud operating system, it can be used by administrators and users to increase the value of existing datacenter assets. OpenStack Networking ensures the network will not be the bottleneck or limiting factor in a cloud deployment and gives users real self-service, even over their network configurations. Networking capabilities: OpenStack provides flexible networking models to suit the needs of different applications or user groups. Standard models include flat networks or VLANs for separation of servers and traffic. OpenStack Networking manages IP addresses, allowing for dedicated static IPs or DHCP.
Floating IPs allow traffic to be dynamically rerouted to any of your compute resources, which allows you to redirect traffic during maintenance or in the case of failure. Users can create their own networks, control traffic and connect servers and devices to one or more networks. The pluggable backend architecture lets users take advantage of commodity gear or advanced networking services from supported vendors. Administrators can take advantage of software-defined networking (SDN) technology like OpenFlow to allow for high levels of multi-tenancy and massive scale. OpenStack Networking has an extension framework allowing additional network services, such as intrusion detection systems (IDS), load balancing, firewalls and virtual private networks (VPN), to be deployed and managed. Open vSwitch: Open vSwitch is a production-quality, multilayer virtual switch licensed under the open source Apache 2.0 license. It is designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag). In addition, it is designed to support distribution across multiple physical servers, similar to VMware's vNetwork distributed vswitch or Cisco's Nexus 1000V.
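The floating IP behavior above amounts to a remappable table from public addresses to fixed (private) addresses. The following is a hypothetical Python model of that idea only, not the OpenStack Networking API: `FloatingIPTable` and its methods are invented for illustration, and the addresses are documentation/private ranges. In practice the forwarding is done with NAT on the network node, but the effect is the same: re-associating a floating IP redirects traffic while clients keep using the same address.

```python
from typing import Dict, List, Optional


class FloatingIPTable:
    """Hypothetical model of floating IPs as a remappable public-to-fixed table."""

    def __init__(self, pool: List[str]) -> None:
        self.free = list(pool)                     # unallocated public addresses
        self.assoc: Dict[str, Optional[str]] = {}  # floating IP -> fixed IP (or None)

    def allocate(self) -> str:
        if not self.free:
            raise RuntimeError("floating IP pool exhausted")
        ip = self.free.pop(0)
        self.assoc[ip] = None
        return ip

    def associate(self, floating_ip: str, fixed_ip: str) -> None:
        """Point a floating IP at an instance; re-associating redirects traffic."""
        if floating_ip not in self.assoc:
            raise KeyError(f"{floating_ip} has not been allocated")
        self.assoc[floating_ip] = fixed_ip

    def route(self, floating_ip: str) -> Optional[str]:
        """Return the fixed IP that should receive traffic for floating_ip, if any."""
        return self.assoc.get(floating_ip)
```

During maintenance, for example, an operator would re-associate the public address from the primary instance (say `10.0.0.5`) to a standby (`10.0.0.6`) without any DNS change.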
  • Deep Thought is a computer that was created by the pan-dimensional, hyper-intelligent species of beings (whose three-dimensional protrusions into our universe are ordinary white mice) to come up with the Answer to The Ultimate Question of Life, the Universe, and Everything. Deep Thought is the size of a small city. When, after seven and a half million years of calculation, the answer finally turns out to be 42, Deep Thought admonishes Loonquawl and Phouchg (the receivers of the Ultimate Answer) that "[he] checked it very thoroughly, and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you've never actually known what the question was." Deep Thought does not know the Ultimate Question to Life, the Universe and Everything, but offers to design an even more powerful computer, Earth, to calculate it. After ten million years of calculation, the Earth is destroyed by Vogons five minutes before the computation is complete.
  • http://www.benphoster.com/facebook-to-1-billion-users-i-predict-august-16-2012/
  • NoSQL: In computing, NoSQL (commonly interpreted as "not only SQL"[1]) is a broad class of database management systems identified by non-adherence to the widely used relational database management system model. NoSQL databases are not built primarily on tables, and generally do not use SQL for data manipulation. NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key–value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models. Apache Cassandra: The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages. Cassandra's ColumnFamily data model offers the convenience of column indexes with the performance of log-structured updates, strong support for materialized views, and powerful built-in caching. Cassandra is in use at Netflix, Twitter, Urban Airship, Constant Contact, Reddit, Cisco, OpenX, Digg, CloudKick, Ooyala, and more companies that have large, active data sets. The largest known Cassandra cluster has over 300 TB of data in over 400 machines. Hypertable: Hypertable is based on a design developed by Google (it is a clone of Google's BigTable) to meet Google's scalability requirements; its developers claim it solves the scale problem better than other NoSQL solutions. MongoDB. Redis: Redis is an open source, BSD-licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets. Riak.
  • MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers. The model is inspired by the map and reduce functions commonly used in functional programming, although their purpose in the MapReduce framework is not the same as in their original forms.[3] MapReduce libraries have been written in many programming languages. A popular free implementation is Apache Hadoop.
  • Chukwa (http://incubator.apache.org/chukwa/): Chukwa is a Hadoop subproject devoted to large-scale log collection and analysis. Chukwa is built on top of the Hadoop Distributed File System (HDFS) and the MapReduce framework and inherits Hadoop’s scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring and analyzing results, in order to make the best use of the collected data. ZooKeeper: http://zookeeper.apache.org/
  • Marvin, the Paranoid Android, is a fictional character in The Hitchhiker's Guide to the Galaxy series by Douglas Adams. Marvin is the ship's robot aboard the starship Heart of Gold. Originally built as a failed prototype of Sirius Cybernetics Corporation's GPP (Genuine People Personalities) technology, Marvin is afflicted with severe depression and boredom, in part because he has a "brain the size of a planet"[1] which he is seldom, if ever, given the chance to use. Indeed, the true horror of Marvin's existence is that no task he could be given would occupy even the tiniest fraction of his vast intellect. Marvin claims he is 50,000 times more intelligent than a human,[2] (or 30 billion times more intelligent than a live mattress) though this is, if anything, a vast underestimation. When kidnapped by the bellicose Krikkit robots and tied to the interfaces of their intelligent war computer, Marvin simultaneously manages to plan the entire planet's military strategy, solve "all of the major mathematical, physical, chemical, biological, sociological, philosophical, etymological, meteorological and psychological problems of the Universe except his own, three times over," and compose a number of lullabies.
  • MeatCloud: can’t keep up with cloud computing. DevOps & Agile IT philosophy. Script repetitive tasks. Automate, automate, automate.
  • Other disciplines like backup, log management, performance and security (virus and intrusion detection) are important but not core to the delivery of cloud computing systems.
  • Ideally, for the cloud you create management toolchains that automate the management of your cloud, so that the output of one tool informs the input of another.
  • Automated toolchain (for Linux guests): A bootstrapped image is launched from a template in the cloud provider, then searches for the Cobbler server. The post-install step from Cobbler kicks off Puppet with a defined management class to configure the server using roles. After Cobbler runs, configuration management proceeds in Puppet. Services can then be started and stopped with RunDeck or post-install scripts. RunDeck can then insert the new hosts into Zenoss or Nagios. Finally, as network conditions change, Zenoss can remediate via other tools based on situational awareness.
  • Applications Infrastructure
  • Cacti: Cacti is a complete network graphing solution designed to harness the power of RRDTool's data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy-to-use interface that makes sense for LAN-sized installations up to complex networks with hundreds of devices. RRDtool: RRDtool is the open source industry standard, high-performance data logging and graphing system for time series data. RRDtool can be easily integrated in shell scripts, Perl, Python, Ruby, Lua or Tcl applications. Graphite: Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing, and send it to Graphite's processing backend, carbon, which stores the data in Graphite's specialized database. The data can then be visualized through Graphite's web interfaces.
  • These tools are all appropriate for Linux guest operating systems; Windows operating system provisioning is not well addressed in OSS. Axembler. Provisionr: Provisionr solves the problem of cloud portability by completely hiding the APIs and focusing only on building a cluster that matches the same set of assumptions on all clouds, assumptions like: a specific OS, pre-installed packages and binaries, sane DNS settings, SSH & VPN access, etc. Think of it as a solid foundation for configuration. As a secondary goal, Provisionr will also provide primitives for building automatic or semi-automatic workflows for configuring and monitoring services, workflows that assume that all the machines share a common set of characteristics as described above. Cobbler: Cobbler is a Linux installation server that allows for rapid setup of network installation environments. It glues together and automates many associated Linux tasks so you do not have to hop between lots of various commands and applications when rolling out new systems and, in some cases, changing existing ones. With a simple series of commands, network installs can be configured for PXE, reinstallations, media-based net-installs, and virtualized installs (supporting Xen, qemu, KVM, and some variants of VMware). Cobbler uses a helper program called 'koan' (which interacts with Cobbler) for reinstallation and virtualization support. Crowbar: Bare metal provisioning for OpenStack, developed by Dell using Opscode Chef. Juju. Metal as a Service (MAAS): MAAS offers a nice UI to provision your Ubuntu servers. Each physical server (“node”) will be commissioned automatically on first boot. During the commissioning process, administrators are able to configure hardware settings manually before an automated smoke test and burn-in test are done. 
Once commissioned, a node can be deployed on demand by name, or allocated to a queue for dynamic allocation to services being deployed on this MAAS. Salt Cloud: Salt Cloud is a tool for provisioning salted minions across various cloud providers. Currently supported providers are: Amazon EC2, GoGrid, HP Cloud (using OpenStack), Joyent, Linode, OpenStack, and Rackspace (using OpenStack). The salt-cloud command can be used to query configured providers, create VMs on them, deploy salt-minion on those VMs and destroy them when no longer needed. Salt Cloud requires Salt to be installed, but does not require any Salt daemons to be running. However, if used in a salted environment, it is best to run Salt Cloud on the salt-master, so that it can properly lay down salt keys when it deploys machines, and then properly remove them later. If Salt Cloud is run in this manner, minions will automatically be approved by the master; there is no need to manually authenticate them later. Spacewalk (deprecated): Spacewalk manages software content updates for Red Hat derived distributions such as Fedora, CentOS, and Scientific Linux, within your firewall. You can stage software content through different environments, managing the deployment of updates to systems and allowing you to view the update level of any given system across your deployment. A clean central web interface allows viewing of systems and their software update status, and initiating update actions.
  • Salt - https://github.com/saltstack/salt
  • Ansible: Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately. Ansible is also used to roll out and manage clusters of machines and ISV software, such as Basho's flagship key-value store Riak. Capistrano: Capistrano is a developer tool for deploying web applications. It is typically installed on a workstation, and used to deploy code from your source code management (SCM) system to one or more servers. Capistrano recently added classes capabilities that match Cobbler. RunDeck: RunDeck is cross-platform open source software that helps you automate ad-hoc and routine procedures in data center or cloud environments. RunDeck allows you to run tasks on any number of nodes from a web-based or command-line interface. RunDeck also includes other features that make it easy to scale up your scripting efforts, including access control, workflow building, scheduling, logging, and integration with external sources for node and option data. Func: Func allows for running commands on remote systems in a secure way, like SSH, but offers several improvements. Func allows you to manage an arbitrary group of machines all at once. Func automatically distributes certificates to all "slave" machines. There's almost nothing to configure. Func comes with a command line for sending remote commands and gathering data. There are lots of modules already provided for common tasks. Anyone can write their own modules using the simple Python module API. Everything that can be done with the command line can be done with the Python client API. The hack potential is unlimited. You'll never have to use "expect" or other ugly hacks to automate your workflow. It's really simple under the covers. Func works over XMLRPC and SSL. Since Func uses certmaster, any program can use Func certificates, latch on to them, and take advantage of secure master-to-slave communication. 
There are no databases or crazy stuff to install and configure. Again, certificate distribution is automatic. MCollective: The Marionette Collective, AKA MCollective, is a framework to build server orchestration or parallel job execution systems. MCollective is used as a means of programmatic execution of systems administration actions on clusters of servers. MCollective uses modern tools like publish-subscribe middleware and modern philosophies like real-time discovery of network resources using metadata rather than hostnames, delivering a very scalable and very fast parallel execution environment. Scalr: Scalr is a pretty darn good open source cloud management tool. It provides both an automation framework (do Foo when Bar) and a web interface (where is this volume mounted) for managing infrastructure on the cloud, like EC2. Features: integrated into Opscode Chef for configuration management; pre-automated software, such as nginx, MySQL, Redis, MongoDB, and RabbitMQ; blazing fast UI; multi-cloud; more at http://scalr.net/features/. Roadmap: http://wiki.scalr.net/Roadmap
  • Netflix AWS Toolbag (http://netflix.github.com): Over 25 projects developed by Netflix to manage their cloud deployments. Asgard: Asgard is a web-based tool for managing cloud-based applications and infrastructure. Astyanax: Astyanax is a high-level Java client for Apache Cassandra, a highly available column-oriented database. Edda: Edda is a service to track changes in your cloud deployments. Eureka: Eureka is a REST (Representational State Transfer) based service that is primarily used in the AWS cloud for locating services for the purpose of load balancing and failover of middle-tier servers. At Netflix, Eureka is used for the following purposes apart from playing a critical part in mid-tier load balancing: aiding Netflix Asgard (an open source service which makes cloud deployments easier) in fast rollback of versions in case of problems, avoiding the re-launch of hundreds of instances, which could take a long time; in rolling pushes, avoiding propagation of a new version to all instances in case of problems; in Cassandra deployments, taking instances out of traffic for maintenance; and in memcached caching services, identifying the list of nodes in the ring. Priam: Priam is a process/tool that runs alongside Apache Cassandra to automate the following: backup and recovery (complete and incremental), token management, seed discovery, configuration, and support for the AWS environment. Simian Army: The Simian Army is a suite of tools for keeping your cloud operating in top form. Chaos Monkey, the first member, is a resiliency tool that helps ensure that your applications can tolerate random instance failures.
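The word-count example commonly used to introduce MapReduce shows the map, shuffle and reduce phases in miniature. This is a single-process sketch in plain Python, not Hadoop code: on a real cluster the map and reduce calls run in parallel on worker nodes and the framework performs the shuffle. The function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str) -> list:
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs) -> dict:
    # Shuffle: group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> tuple:
    # Reduce: combine all values emitted for one key.
    return key, sum(values)


def word_count(splits: list) -> dict:
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


print(word_count(["the open cloud", "The cloud"]))  # {'the': 2, 'open': 1, 'cloud': 2}
```

Because each split is mapped independently and each key is reduced independently, both phases can be spread across machines; only the shuffle needs to move data between them.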
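The toolchain notes above describe chaining management tools so that the output of one tool informs the input of the next. That idea can be sketched as a list of functions that each take and return a host record. Every stage here is a hypothetical stand-in that only annotates a dictionary; none of them calls the real Cobbler, Puppet, RunDeck, Zenoss or Nagios interfaces, and the role names and OS profile are made up.

```python
from typing import Callable, Dict, List

Host = Dict[str, object]
Stage = Callable[[Host], Host]

# Hypothetical role-to-class mapping, standing in for Puppet classes.
ROLE_CLASSES = {"web": ["apache", "memcached"], "db": ["mysql"]}


def provision(host: Host) -> Host:
    # Stand-in for Cobbler: record the OS profile that was installed.
    return {**host, "os": "centos-6"}


def configure(host: Host) -> Host:
    # Stand-in for Puppet: choose management classes from the host's role.
    return {**host, "classes": ROLE_CLASSES.get(host["role"], [])}


def start_services(host: Host) -> Host:
    # Stand-in for RunDeck: start one service per configured class.
    return {**host, "services": [f"{c}:running" for c in host["classes"]]}


def register_monitoring(host: Host) -> Host:
    # Stand-in for adding the host to Zenoss or Nagios.
    return {**host, "monitored": True}


def run_toolchain(host: Host, stages: List[Stage]) -> Host:
    # Each stage receives the previous stage's output as its input.
    for stage in stages:
        host = stage(host)
    return host


result = run_toolchain(
    {"name": "web01", "role": "web"},
    [provision, configure, start_services, register_monitoring],
)
print(result)
```

Keeping each stage a pure function of the host record makes it easy to reorder stages or swap one tool for another without touching the rest of the chain.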
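The Graphite note says an application sends numeric time-series data to carbon. One simple way to do that is carbon's plaintext protocol: one `metric.path value timestamp` line per data point over TCP, on port 2003 by default. The sketch assumes that protocol is enabled on the carbon side; the host name and metric path are hypothetical.

```python
import socket
import time
from typing import Optional

CARBON_HOST = "graphite.example.com"  # hypothetical carbon host
CARBON_PORT = 2003  # carbon's default port for the plaintext protocol


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> str:
    # Plaintext protocol line: "<metric path> <value> <unix timestamp>\n"
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_metric(path: str, value: float,
                host: str = CARBON_HOST, port: int = CARBON_PORT) -> None:
    # Open a TCP connection to carbon and write a single metric line.
    line = format_metric(path, value)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


print(format_metric("servers.web01.load", 0.42, 1370000000), end="")
```

A collector would typically call `send_metric` on a fixed interval; batching several lines per connection is a straightforward extension.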
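The idea behind Chaos Monkey (deliberately terminating random instances so that failure becomes routine and tested) can be reduced to a selection step. This hypothetical sketch only chooses victims from made-up auto-scaling groups as a dry run; it does not call AWS, and the real Simian Army tools are considerably more involved.

```python
import random
from typing import Dict, List


def pick_victims(groups: Dict[str, List[str]], probability: float,
                 rng: random.Random) -> Dict[str, str]:
    """Dry run: for each group, with the given probability, choose one instance to kill."""
    victims = {}
    for group, instances in groups.items():
        # Skip empty groups; otherwise roll the dice once per group.
        if instances and rng.random() < probability:
            victims[group] = rng.choice(instances)
    return victims


groups = {"api": ["i-01", "i-02", "i-03"], "web": ["i-04"], "batch": []}
print(pick_victims(groups, probability=1.0, rng=random.Random(7)))
```

Passing in a seeded `random.Random` keeps runs reproducible, which is useful when checking what a chaos schedule would have done.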
OSCON 2013 - The Hitchhiker’s Guide to Open Source Cloud Computing

    1. The Hitchhiker’s Guide to Open Source Cloud Computing, OSCON 2013. Mark R. Hinkle, Sr. Director, Open Source Solutions, Citrix Systems Inc. @mrhinkle mrhinkle@gmail.com
    2. Mark Hinkle, Sr. Director, Open Source Solutions • Dedicated to the success of the Apache CloudStack, OpenDaylight & Xen Project communities on Citrix's behalf • Run BuildACloud.org learning activities all over the world • Joined Citrix via the Cloud.com acquisition, July 2011 • Zenoss Core open source project: 100,000 users, 1.5 million downloads • Former LinuxWorld Magazine Editor-in-Chief • Open Management Consortium organizer • Author, “Windows to Linux Business Desktop Migration” (Thomson) • NetDirector Project: open source configuration management • Occasional author and blogger at SocializedSoftware.com • NetworkWorld Open Source Subnet. Hitchhiker’s Guide to the Open Cloud by @mrhinkle
    3. Slides Online: SlideShare: http://www.slideshare.net/socializedsoftware
    4. Quick Cloud Computing Overview, or the Obligatory “What Is the Cloud?” Explanation
    5. 60-Second Cloud Definitions: user cloud, a.k.a. Software-as-a-Service; development cloud, a.k.a. Platform-as-a-Service; systems cloud, a.k.a. Infrastructure-as-a-Service. Five characteristics of cloud: (1) on-demand self-service, (2) broad network access, (3) resource pooling, (4) rapid elasticity, (5) measured service.
    6. Building Open Source Clouds
    7. Cloud Architecture
    8. Hypervisors. Open source: • Xen Project, Xen Cloud Platform (XCP) • KVM – Kernel-based Virtual Machine • VirtualBox* – Oracle-supported virtualization solution • OpenVZ* – container-based, similar to Solaris Containers or FreeBSD jails • LXC – Linux Containers (OS-level virtualization). Proprietary: • VMware • Citrix XenServer (based on open source Xen) • Microsoft Hyper-V • Oracle VM (based on open source Xen)
    9. Open Virtual Machine Formats. Open Virtualization Format (OVF) is an open standard for packaging and distributing virtual appliances, or more generally software to be run in virtual machines. Formats for hypervisors/cloud technologies: • Amazon – AMI • KVM – QCOW2 • VMware – VMDK • Xen Project – IMG • Hyper-V – VHD (Virtual Hard Disk)
    10. Sourcing Cloud Appliances (Tool/Project: what you can do with it) • Bitnami: Bitnami provides free, ready-to-run environments for your favorite open source web applications and frameworks, including Drupal, Joomla!, WordPress, PHP, Rails, Django and many more. • BoxGrinder: BoxGrinder is a set of projects that help you grind out appliances for multiple virtualization and cloud providers. • Oz: command-line tool that can create images of common Linux distributions to run on KVM. • SUSE Studio: SUSE Studio supports building and deploying directly to cloud services such as Amazon EC2.
    11. Scale-Up or Scale-Out • Vertical scaling (scale-up): allocate additional resources to VMs; requires a reboot; no need for distributed application logic; the OS remains a single point of failure. • Horizontal scaling (scale-out): the application needs logic to work in a distributed fashion (e.g. HAProxy and Apache, Hadoop).
    12. Compute Clouds (IaaS) • Apache CloudStack: started 2008; license: Apache; virtualization: XenServer, Xen Cloud Platform, KVM, VMware (Hyper-V in development) • Eucalyptus: started 2006; license: GPL; virtualization: Xen, KVM, VMware (commercial version) • OpenNebula: started 2005; license: Apache; virtualization: Xen, KVM, VMware • OpenStack: started 2010 (previously developed at NASA by Anso Labs); license: Apache; virtualization: VMware ESX and ESXi, Xen, Xen Cloud Platform, KVM, LXC, QEMU and VirtualBox
    13. OpenStack – Ecosystem of Projects [Diagram: Compute (“Nova”) running on KVM, VMware and Xen Cloud Platform; Object Storage (“Swift”); Image Service (“Glance”); Dashboard (“Horizon”); Identity Services (“Keystone”) API; enterprise message queue (ESB) based on RabbitMQ; Ceph and Gluster storage; Quantum networking fabric with a REST API and plugins such as Open vSwitch; advanced cloud and networking services (firewall service, gateway service) accessing the Quantum API]
    14. Cloud APIs • jclouds • libcloud • deltacloud • fog
    15. Cloud Computing Storage (Project: Description) • Ceph: distributed file storage system developed by DreamHost • GlusterFS: scale-out NAS system aggregating storage over Ethernet or InfiniBand • OpenStack Storage: long-term object storage system • Riak CS: Riak CS is open source software designed to provide simple, available, distributed cloud storage at any scale. Riak CS is S3-API compatible and supports per-tenant reporting for billing and metering use cases. • Sheepdog: distributed storage for KVM hypervisors
    16. Platform-as-a-Service (PaaS) (Project: year started; sponsor; languages/frameworks) • CloudFoundry: 2011; VMware; Spring for Java, Ruby on Rails and Sinatra, node.js, Grails, Scala on Lift, and more via partners (e.g. Python, PHP) • Cloudify: 2012; GigaSpaces; [Groovy for deployment recipes] • OpenShift: 2011; Red Hat; Java, Ruby, PHP, Perl and Python • Stackato: 2012; ActiveState; Java, Python, PHP, Ruby, Perl, Node.js, others • WSO2 Stratos: 2010; WSO2; JBoss, Java EE 6
    17. What’s Coming… the Rise of LXC • Platform-as-a-Service (PaaS) sounds good, but… • LXC gives us a standard payload container for Linux-based workloads • You can run LXC in a virtualized environment or natively • Many existing tools can already manage LXC • SELinux provides a proven security model users are already familiar with
    18. Software Defined Networking (SDN)
    19. Overview of Software Defined Networking [Diagram layers: Business Applications; Network Services; Network Devices]
    20. Cloud Promise, Reality and Networks • Promise: centralized configuration and automation. Reality: without true virtualization, network devices must still be manually configured. • Promise: instant self-service provisioning. Reality: in a physical network, it could take a long time for a network engineer to provision new services. • Promise: elasticity and scalability. Reality: by horizontally scaling up the physical network, elasticity is lost. • Promise: designed for failure. Reality: failover can be automated and physical network limitations can be alleviated.
    21. OpenFlow: OpenFlow enables networks to evolve by giving a remote controller the power to modify the behavior of network devices through a well-defined "forwarding instruction set". The growing OpenFlow ecosystem now includes routers, switches, virtual switches, and access points from a range of vendors.
    22. Software Defined Networking (SDN) (Project: Description) • Floodlight: the Floodlight controller is an enterprise-class, Apache-licensed, Java-based OpenFlow controller. • Indigo: Indigo is an open source project to support OpenFlow on a range of physical switches. By leveraging hardware features of Ethernet switch ASICs, Indigo supports high rates for high port counts, up to 48 10-gigabit ports. Multiple gigabit platforms with 10-gigabit uplinks are also supported. • OpenDaylight: Linux Foundation collaborative project based on the Cisco ONE controller, with plugins from numerous vendors in development (e.g. IBM DOVE). • OpenStack “Quantum” Networking: pluggable, scalable, API-driven network and IP management. • Open vSwitch: Open vSwitch is an open source (ASL 2.0), multilayer virtual switch designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).
    23. Big Data
    24. 1 Billion Facebook Users - October 2012 [Chart: Facebook users in millions, December 2004 to September 2012]
    25. Big Data and Storage Infrastructure: “Data is growing faster than storage capacity and computing power. Legacy systems hold organizations back; storage software must include multi-petabyte capacity, support potentially billions of objects, and provide application performance awareness and agile provisioning.” (Gartner, Big Data Challenges for the IT Infrastructure Team)
    26. Open Source NoSQL Databases (Name: type; description) • Apache Cassandra: wide column store/families; API: many; Query Method: MapReduce; Written in: Java; Concurrency: eventually consistent; Misc: like "BigTable on Amazon Dynamo alike", initiated by Facebook • Couchbase Server: document store; API: Memcached API+protocol (binary and ASCII), most languages; Protocol: Memcached, REST interface for cluster configuration + management; Written in: C/C++ + Erlang (clustering); Replication: peer to peer, fully consistent; Misc: transparent topology changes during operation, provides memcached-compatible caching buckets • HBase: wide column store/families; API: Java / any writer; Protocol: any write call; Query Method: MapReduce Java / any exec; Replication: HDFS replication; Written in: Java • Hypertable: wide column store/families; API: Thrift (Java, PHP, Perl, Python, Ruby, etc.); Protocol: Thrift; Query Method: HQL, native Thrift API; Replication: HDFS replication; Concurrency: MVCC; Consistency Model: fully consistent; Misc: high-performance C++ implementation of Google's Bigtable • MongoDB: document store; API: BSON; Protocol: C; Query Method: dynamic object-based language & MapReduce; Replication: master/slave & auto-sharding; Written in: C++ • Redis: key value / tuple store; API: tons of languages; Written in: C; Concurrency: in memory, saving asynchronously to disk after a defined time, append-only mode available, different kinds of fsync policies; Replication: master/slave; Misc: also lists, sets, sorted sets, hashes, queues • Riak: key value / tuple store; API: JSON; Protocol: REST; Query Method: MapReduce, term matching; Scaling: multiple masters; Written in: Erlang; Concurrency: eventually consistent (stronger than MVCC via vector clocks)
    27. MapReduce [Diagram: problem data goes to a master node, which distributes work to worker nodes 1, 2 and 3; their output forms the solution data]
    28. Apache Hadoop Overview • Handles large amounts of data • Stores data in native format • Delivers linear scalability at low cost • Resilient in case of infrastructure failures • Transparent application scalability. Facts • Apache top-level open source project • One framework for storage and compute: scalable storage in the Hadoop Distributed File System (HDFS), and compute via the MapReduce distributed processing platform • Primary programming language: Java
    29. Hadoop Architecture • Hadoop Common • HDFS: distributes & replicates data across machines • MapReduce: distributes & monitors tasks • Hive: data warehouse that provides a SQL interface; ad hoc projection of data structure onto unstructured data • HBase: column-oriented, schema-less distributed DB modeled after Google’s BigTable; random, real-time read/write • Pig: platform for manipulating and analyzing large data sets; scripting language for analysts • Mahout: machine learning libraries for recommendations, clustering, classification and item sets • Chukwa • ZooKeeper
    30. Big Data Summary • The quantity of machine-created data is increasing drastically (examples: networked sensor data from mobile phones and GPS devices) • Data manipulation is moving from batch to real time • Cloud services are giving everyone Big Data tools • Consumer companies' speed and scale requirements are driving efficiencies in Big Data storage and analytics • A new and broader range of data sources is being meshed together • Big Data apps make using Big Data faster and easier
    31. Cloud Management Tools
    32. Automation in the Cloud
    33. 4 Types of Management Tools • Provisioning: installation of operating systems and other software • Configuration Management: sets the parameters for servers; can specify installation parameters • Orchestration/Automation: automates tasks across systems • Monitoring: records errors and the health of IT infrastructure
    34. Management Toolchains: Configuration; Patching and Provisioning; Monitoring
    35. Conceptual Automated Toolchain • Bootstrapped image: CloudStack, OpenStack • Configuration: Puppet, Chef • Start/stop services: RunDeck, Capistrano, MCollective • Provision: Cobbler, SUSE Studio • Monitoring: Nagios, Zenoss, Cacti • Generate images: SUSE Studio, BoxGrinder
    36.
    37. Goodbye and Thanks for All the Fish! Slides can be viewed and downloaded at: http://www.slideshare.net/socializedsoftware/
    38. Contact Me
    39. Appendix (a.k.a. Great Stuff I Didn’t Say)
    40. Additional Resources • DevOps Toolchains Group • Software Defined Networking: The New Norm for Networks (whitepaper) • DevOps Wikipedia page • NoSQL-Database.org: Ultimate Guide to the Non-Relational Universe • Open Cloud Initiative • NIST Cloud Computing Program • Open Virtualization Format specs • Clouderati Twitter account • Planet DevOps • Nicira whitepaper: It’s Time to Virtualize the Network • Why Open vSwitch FAQ
    41. Big Data Landscape
    42. Monitoring Tools (Tool: license; type of monitoring; collection methods) • Cacti / RRDTool: GPL; performance; SNMP, syslog • Graphite: Apache 2.0; performance; agent • Nagios: GPL; availability; SNMP, TCP, ICMP, IPMI, syslog • Zabbix: GPL; availability/performance and more; SNMP, TCP/ICMP, IPMI, synthetic transactions • Zenoss: GPL; availability, performance, event management; SNMP, ICMP, SSH, syslog, WMI
    43. Provisioning (Project: installation targets) • Apache Provisionr (incubating): can provision tens to thousands of machines on various clouds • Cobbler: distributed virtual infrastructure using koan (kickstart over a network) to PXE-boot Red Hat, openSUSE, Fedora, Debian and Ubuntu VMs • Crowbar: bare metal provisioning • Juju: public clouds (Amazon Web Services, HP Cloud), private OpenStack clouds, bare metal via MAAS • Salt Cloud: tool to provision “salted” VMs that can then be updated by a central server via ZeroMQ
    44. Configuration Management Tools (Project: year started; language; license; client/server) • Cfengine: 1993; C; Apache; yes • Chef: 2009; Ruby; Apache; Chef Solo: no, Chef Server: yes • Puppet: 2004; Ruby; Apache; yes & standalone • Salt: 2011; Python; Apache; yes
    45. Automation/Orchestration Tools (Project: Description) • Ansible: Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately. • Capistrano: utility and framework for executing commands in parallel on multiple remote machines via SSH. It uses a simple DSL that allows you to define tasks, which may be applied to machines in certain roles. • RunDeck: Rundeck is an open-source process automation and command orchestration tool with a web console. • Func: Func provides a two-way authenticated system for generically executing tasks, with integrations with Puppet and Cobbler. • MCollective: the Marionette Collective, AKA MCollective, is a framework to build server orchestration or parallel job execution systems. • Salt: execute arbitrary shell commands or choose from dozens of pre-built modules of common (or complex) commands. • Scalr: provides scaling across multiple cloud computing platforms; integrates with Chef.
    46. Netflix Open Source Toolbag for AWS • Asgard • Astyanax • Edda • Eureka • Priam • Simian Army
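The Scale-Up or Scale-Out slide notes that horizontal scaling needs logic to spread work across instances, for example HAProxy in front of Apache. A minimal sketch of the simplest such policy, round-robin (HAProxy offers a balancing mode by that name), shows why adding an instance is enough to share the load. The class is hypothetical and ignores health checks and session affinity.

```python
from typing import List


class RoundRobinBalancer:
    """Hypothetical dispatcher that rotates requests across identical backends."""

    def __init__(self, backends: List[str]) -> None:
        self.backends = list(backends)
        self._next = 0

    def scale_out(self, backend: str) -> None:
        # Horizontal scaling: add another identical instance to the pool.
        self.backends.append(backend)

    def pick(self) -> str:
        # Choose the next backend in rotation.
        if not self.backends:
            raise RuntimeError("no backends available")
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return backend


lb = RoundRobinBalancer(["web1", "web2"])
print([lb.pick() for _ in range(4)])  # ['web1', 'web2', 'web1', 'web2']
lb.scale_out("web3")
print(sorted(lb.pick() for _ in range(3)))  # ['web1', 'web2', 'web3']
```

Scaling up, by contrast, would mean giving `web1` more CPU or memory, which needs no dispatcher but leaves a single instance as the point of failure.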
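The OpenFlow slide describes a remote controller changing device behavior through a "forwarding instruction set". At its core, a switch matches packet headers against a flow table of prioritized match/action entries that the controller installs. The sketch models that lookup in plain Python. The field names and action strings are illustrative rather than OpenFlow wire-format values, and sending unmatched packets to the controller is just one possible table-miss policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FlowEntry:
    priority: int
    match: Dict[str, object]  # header field -> required value; absent fields are wildcards
    actions: Tuple[str, ...]


def lookup(table: List[FlowEntry], packet: Dict[str, object]) -> Tuple[str, ...]:
    """Return the actions of the highest-priority entry whose match fields all agree."""
    hits = [e for e in table
            if all(packet.get(name) == value for name, value in e.match.items())]
    if not hits:
        # Table miss: in this sketch, hand the packet to the controller.
        return ("CONTROLLER",)
    return max(hits, key=lambda e: e.priority).actions


table = [
    FlowEntry(priority=10, match={"in_port": 1}, actions=("output:2",)),
    FlowEntry(priority=100, match={"in_port": 1, "tcp_dst": 22}, actions=("drop",)),
]
print(lookup(table, {"in_port": 1, "tcp_dst": 80}))  # ('output:2',)
print(lookup(table, {"in_port": 1, "tcp_dst": 22}))  # ('drop',)
print(lookup(table, {"in_port": 3}))                 # ('CONTROLLER',)
```

The controller changes forwarding behavior simply by adding, removing or reprioritizing entries; the lookup logic in the switch stays the same.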
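The Riak entry on the NoSQL slide mentions vector clocks, which let an eventually consistent store tell whether one version of a value supersedes another or whether two writes happened concurrently. The sketch shows the general technique, with clocks as plain dictionaries of per-node counters. It is not Riak's implementation, and the node names are hypothetical.

```python
from typing import Dict

Clock = Dict[str, int]


def increment(clock: Clock, node: str) -> Clock:
    # A node bumps its own counter when it updates the value.
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: Clock, b: Clock) -> Clock:
    # Element-wise maximum: the clock of a value that has seen both histories.
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: Clock, b: Clock) -> str:
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b descends from a, so a can be discarded
    if b_le_a:
        return "after"       # a descends from b
    return "concurrent"      # conflicting siblings that need resolution


base = increment({}, "node-a")
x = increment(base, "node-a")   # node-a writes again
y = increment(base, "node-b")   # node-b writes concurrently
print(compare(base, x), compare(x, y), compare(x, merge(x, y)))  # before concurrent before
```

When `compare` returns "concurrent", the store cannot pick a winner from the clocks alone and must keep both versions or apply a resolution rule.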