This document provides a summary of Mark Hinkle's presentation on open source cloud computing technologies. It includes an agenda covering vetting open source projects, virtualization, infrastructure as a service, platform as a service and SDN. It then discusses various open source projects for virtualization, IaaS including OpenStack, PaaS including CloudFoundry, container technologies like Docker and Kubernetes, storage options like Ceph and GlusterFS, and SDN projects like OpenFlow and Open vSwitch.
InteropNY/CloudConnect 2014 - Quick Crash Course in Open Source Cloud Computing
1. Crash Course In
Open Source Cloud
Computing
Mark Hinkle
Senior Director, Open Source Solutions
Citrix Inc.
mark.hinkle@citrix.com
mrhinkle@gmail.com
@mrhinkle
Last updated: 10/1/2014
2. By Mark R. Hinkle
@mrhinkle
mrhinkle@gmail.com
ABOUT ME
I Help Build Open Source Ecosystems
Open Source Experience
• Manage Citrix Open Source Business Office
• Apache CloudStack Committer and PMC Member
• Advisory boards Gluster and Xen Project
• Joined Citrix via Cloud.com acquisition July 2011
• Zenoss Core open source project to 100,000 users,
1.5 million downloads
• Former LinuxWorld Magazine Editor-in-Chief
• Open Management Consortium organizer
• Author - “Windows to Linux Business Desktop
Migration” – Thomson
• NetDirector Project - Open Source Configuration
Management
INTEROP NY 2014 - Crash Course in Open Source Cloud Computing
3. Slides Available on Slideshare: http://www.slideshare.net/socializedsoftware
Creative Commons Attribution-ShareAlike 4.0 International
Share: copy and redistribute the material in any medium or format
Adapt: remix, transform, and build upon the material for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
4. AGENDA
• Vetting Open Source Projects
• Virtualization
• Infrastructure-as-a-Service
• Platform-as-a-Service
• SDN
5. OPEN SOURCE ISN’T A ZERO-SUM GAME
“…the future of technological innovation is not stealing limited resources away from one another, but creating new resources – and new opportunities to create new resources – together in a rich ecosystem.”
Allison Randal, Open Source Hacker, Former OSCON Program Chair, @allisonrandal
6. VETTING OPEN SOURCE PROJECTS
How can you tell if they’re legit?
• Code Velocity
• Committers
• Committer Reputation
• User-driven or Vendor-Driven Innovation
• User Activity
• Corporate Support*
• Reputation of Foundation*
8. VIRTUALIZATION
Carving up compute resources
OPEN SOURCE
• Xen Project
• Citrix XenServer
• KVM
• VirtualBox
• OpenVZ
• LXC
PROPRIETARY
• VMware
• Microsoft Hyper-V
• OracleVM (Based on Xen Project)
9. HYPERVISORS AND CONTAINERS
Differences in virtualization
Type 1 Hypervisors: VMware, Xen Project, Hyper-V
Type 2 Hypervisors: KVM, VirtualBox
Containers: LXC
10. LINUX CONTAINERS (LXC)
“Lightweight” Linux Virtualization
• Lets you run a Linux system within another Linux system
• A container is a group of processes on a Linux box, put together to provide an isolated environment
• From the inside, it looks like a VM
• Externally it looks like normal processes
• “chroot on steroids”
11. THE PORTABILITY PROBLEM
Containers compared to Hardware Virtualization
• Different file formats for virtual machines
• VMware uses vmdk file format, Xen and Hyper-V use VHD, KVM uses Raw or QCOW2
• Guest images may be “processor architecture” bound
• VMware and Xen can manage SCSI devices, but KVM cannot
• KVM and Xen can use virtio drivers but not VMware
• VMware uses a proprietary agent inside the guest OS (VMware tools) which does not work with Xen or KVM
• Xen uses VirtIo and ParaVirtualized drivers, Xen uses …
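The format mismatch described on this slide can be sketched as a simple lookup. This is a minimal illustration only: the table below copies the hypervisor-to-format pairs listed on the slide, and the names `NATIVE_FORMATS` and `needs_conversion` are hypothetical helpers, not part of any real tool.

```python
# Disk formats per hypervisor, as listed on the slide (not an exhaustive list).
NATIVE_FORMATS = {
    "vmware": {"vmdk"},
    "xen": {"vhd"},
    "hyper-v": {"vhd"},
    "kvm": {"raw", "qcow2"},
}


def needs_conversion(image_format: str, target_hypervisor: str) -> bool:
    """Return True if the image format is not native to the target hypervisor."""
    formats = NATIVE_FORMATS[target_hypervisor.lower()]
    return image_format.lower() not in formats


# A VMware image moved to KVM needs converting; a QCOW2 image does not.
print(needs_conversion("vmdk", "kvm"))
print(needs_conversion("qcow2", "kvm"))
```

A format match is only one part of portability; the driver and guest-agent differences in the bullets above still apply after conversion.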
12. CONTINUOUS INTEGRATION
Rebuild Applications on any Cloud and/or Virtualized Infrastructure
• Code – Application is stored in a repository (Subversion, Git)
• Build – Code is built (Jenkins)
• Test – Unit tests are automated (Jenkins)
• Deploy – Deploy code to server various ways
Code -> Build -> Test -> Deploy
Thoughtworks Go – Open Source Continuous Delivery System
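The Code, Build, Test, Deploy flow can be sketched as an ordered list of stages that stops at the first failure. This is a toy model, not how Jenkins or Go is configured: `run_pipeline` and the four stage functions are hypothetical stand-ins for real repository, build, test and deploy tooling.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order; stop at the first one that reports failure."""
    completed = []
    for name, action in stages:
        if not action():
            break  # a failed stage blocks everything after it
        completed.append(name)
    return completed


# Hypothetical stand-ins for the real tools behind each stage.
def checkout() -> bool:
    return True


def build() -> bool:
    return True


def run_unit_tests() -> bool:
    return 2 + 2 == 4


def deploy() -> bool:
    return True


stages = [
    ("code", checkout),
    ("build", build),
    ("test", run_unit_tests),
    ("deploy", deploy),
]
print(run_pipeline(stages))
```

The point of the ordering is that a broken build or failing test never reaches the deploy stage.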
13. DOCKER CONTAINER PACKAGING
Open source LXC Packaging Engine
Docker is an open-source project to easily create lightweight, portable, self-sufficient containers from any application. The same container that a developer builds and tests on a laptop can run at scale, in production, on VMs, bare metal, public clouds and more.
To learn more please visit:
www.docker.io
14. WHAT IS DOCKER
System for Managing and Deploying LXC Containers
• Complement to LXC, not a replacement
• Manages daemonized processes on Linux using LXC
• Creates the ability to re-use and manage similar applications
• Content agnostic
• Hardware agnostic
• Easy to automate
• Integrated with other tools: Chef, OpenShift, Puppet, VMware, etc.
15. KUBERNETES
Container Cluster Management – Scheduler
Kubernetes builds on top of Docker to construct a clustered container scheduling service. Kubernetes enables users to ask a cluster to run a set of containers. The system will automatically pick worker nodes to run those containers on, which we think of more as "scheduling" than "orchestration".
Greek for Shipmaster
To learn more please visit:
https://github.com/GoogleCloudPlatform/kubernetes
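The idea of a cluster "picking worker nodes" can be illustrated with a toy scheduler. This sketch is not the Kubernetes algorithm or API: `Node`, `Container` and `schedule` are hypothetical names, and the placement rule (choose the fitting node with the most free CPU) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_cpu: float
    free_mem_mb: int
    containers: List[str] = field(default_factory=list)


@dataclass
class Container:
    name: str
    cpu: float
    mem_mb: int


def schedule(containers: List[Container], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """Place each container on the fitting node with the most free CPU."""
    placement: Dict[str, Optional[str]] = {}
    for c in containers:
        candidates = [n for n in nodes
                      if n.free_cpu >= c.cpu and n.free_mem_mb >= c.mem_mb]
        if not candidates:
            placement[c.name] = None  # nothing fits; left unscheduled
            continue
        target = max(candidates, key=lambda n: (n.free_cpu, n.free_mem_mb))
        target.free_cpu -= c.cpu
        target.free_mem_mb -= c.mem_mb
        target.containers.append(c.name)
        placement[c.name] = target.name
    return placement


nodes = [Node("worker-1", 2.0, 4096), Node("worker-2", 4.0, 8192)]
work = [Container("web", 1.0, 512), Container("db", 2.0, 2048),
        Container("batch", 8.0, 1024)]
print(schedule(work, nodes))
```

The user only states what to run; which node ends up hosting each container is the scheduler's decision.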
16. APACHE MESOS
One to many tools for managing large numbers of devices
Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers. Largely supported by Twitter, used by LinkedIn, Airbnb too.
Features
• Fault-tolerant replicated master using ZooKeeper
• Scalability to 10,000s of nodes
• Isolation between tasks with Linux Containers
• Multi-resource scheduling (memory and CPU aware)
• Java, Python and C++ APIs for developing new parallel applications
• Web UI for viewing cluster state
To learn more please visit:
http://mesos.apache.org/
17. INFRASTRUCTURE-AS-A-SERVICE
Compute Orchestration
Project | Year Started | License | Virtualization Technologies
Apache CloudStack | 2008 | Apache | (Bare Metal), XenServer, KVM, LXC, VMware, Hyper-V
HP Eucalyptus | 2006 | GPL | Xen, KVM, VMware (commercial version)
OpenNebula | 2005 | Apache | Xen, KVM, VMware
OpenStack | 2010 (Developed by NASA by Anso Labs previously) | Apache | VMware ESX and ESXi, Xen, XenServer, KVM, LXC, QEMU and VirtualBox
18. OPENSTACK
The Boy Band of the Open Source Cloud
19. OPENSTACK SHARED SERVICES
Span Compute, Storage and Networking
• IDENTITY SERVICE
• IMAGE SERVICE
• TELEMETRY SERVICE
• ORCHESTRATION SERVICE
20. EVEN MORE OPENSTACK PROJECTS
Span Compute, Storage and Networking
• Trove – Database Service
• Ironic – Bare Metal
• Marconi – Queue Service
• Cinder – Block Storage Service
• Ceilometer – Metering/Monitoring
• Heat – Orchestration
21. OPENSTACK SOLUTION PROVIDERS
If you can’t do it yourself
“OpenStack is not a product. If you are building a large infrastructure, it’s more like a tool kit. It gives you a lot of technologies that do take a lot of effort to integrate.”
Chris Kemp, OpenStack Board Member and Co-Founder, CEO of Piston Computing
22. CLOUD APIS
Everything (should) have an API in the Cloud
• Deltacloud (Ruby)
• Dasein (Java)
• jclouds (Java)
• Libcloud (Python)
• Fog (Ruby)
23. CLOUD STORAGE
Virtualized, Distributed usually on Commodity Hardware
Project | Description
Ceph | Distributed file storage system developed by DreamHost -> Inktank -> Red Hat (block, object, file)
GlusterFS | Scale Out NAS system aggregating storage over Ethernet or InfiniBand (file)
Riak CS | Riak CS is open source software designed to provide simple, available, distributed cloud storage at any scale. Riak CS is S3-API compatible and supports per-tenant reporting for billing and metering use cases. (object)
Sheepdog | Distributed storage for KVM hypervisors, distributed iSCSI
OpenStack Storage | Long-term object storage system (object)
24. PLATFORM-AS-A-SERVICE
Abstracted Cloud-Scale Run-Time Environments
Project | Sponsors | Languages/Frameworks
CloudFoundry | VMware -> Pivotal -> CloudFoundry Foundation | Spring for Java, Ruby for Rails and Sinatra, node.js, Grails, Scala on Lift and more via partners (e.g. Python, PHP)
Cloudify | Gigaspaces | [Groovy for deployment recipes]
OpenShift Origin | Red Hat | Java, Ruby, PHP, Perl and Python
Apache Stratos | WSO2 -> Apache Stratos | PHP, Tomcat, MySQL “cartridges”
25. SOFTWARE DEFINED NETWORKING (SDN)
Virtualization meets the network
Decoupling of the control and data planes of the network to improve efficiency. Communication from an SDN controller via a protocol to network devices both physical and virtual.
Automation: Abstractions allow for programmable networks.
Dynamic Networks: Network can be changed quickly via a controller.
Security: Network offerings can match virtualization offerings for finer grained security in a highly volatile compute landscape.
Heterogeneous Management: Single control point for various devices.
26. SDN OVERVIEW
[Diagram: three-layer SDN architecture]
Application Layer: Business Applications, connected to the control layer through APIs
Control Layer: SDN Control Software, Network Services
Control Data Plane Interface (e.g. OpenFlow)
Infrastructure Layer: Network Devices
27. OPENFLOW
Virtualization meets the network
OpenFlow enables networks to evolve, by giving a remote controller the power to modify the behavior of network devices, through a well-defined "forwarding instruction set". The growing OpenFlow ecosystem now includes routers, switches, virtual switches, and access points from a range of vendors.
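The controller-programs-the-switch idea can be sketched as a flow table of match/action rules. This is a simplified model, not the OpenFlow wire protocol or any controller's API: `FlowEntry`, `Switch`, the field names and the action strings are hypothetical, and sending unmatched packets to the controller is an assumption of this toy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FlowEntry:
    match: Dict[str, str]  # header fields that must equal the packet's fields
    action: str            # e.g. "output:2" or "drop"
    priority: int = 0


class Switch:
    """Data plane: forwards packets using rules installed by a controller."""

    def __init__(self) -> None:
        self.flow_table: List[FlowEntry] = []

    def install(self, entry: FlowEntry) -> None:
        # Called by the (remote) controller; highest priority is checked first.
        self.flow_table.append(entry)
        self.flow_table.sort(key=lambda e: e.priority, reverse=True)

    def forward(self, packet: Dict[str, str]) -> str:
        for entry in self.flow_table:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                return entry.action
        return "send_to_controller"  # no rule matched


switch = Switch()
switch.install(FlowEntry({"ip_dst": "10.0.0.2"}, "output:2", priority=10))
switch.install(FlowEntry({"ip_src": "10.0.0.66"}, "drop", priority=20))
print(switch.forward({"ip_src": "10.0.0.1", "ip_dst": "10.0.0.2"}))
print(switch.forward({"ip_src": "10.0.0.66", "ip_dst": "10.0.0.2"}))
```

Changing network behavior then means installing or replacing entries from the controller, without touching the forwarding code on the device.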
28. OPEN SOURCE SDN
Software Defined Network Controllers and more
Project | Description
Floodlight | The Floodlight Open SDN Controller is an enterprise-class, Apache-licensed, Java-based OpenFlow Controller. It is supported by a community of developers including a number of engineers from Big Switch Networks. See http://www.projectfloodlight.org/floodlight/
Indigo | Indigo is an open source project aimed at enabling support for OpenFlow on physical and hypervisor switches. Big Switch has helped numerous companies OpenFlow enable their equipment, and we provide firmware for a number of popular switches. Indigo is the basis of Switch Light by Big Switch Networks. See http://www.projectfloodlight.org/indigo/
Lincx | LINCX is a pure OpenFlow software switch written in Erlang. It runs within a separate domain under Xen hypervisor using LING (erlangonxen.org).
Nox | NOX is the original OpenFlow controller, and facilitates development of fast C++ controllers on Linux.
Open Daylight | Linux Foundation Collaborative Project based on Cisco One Controller and plugins from numerous vendors in development, e.g. IBM DOVE
Open vSwitch | Open vSwitch is an open source (ASL 2.0), multilayer virtual switch designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).
29. OPEN VSWITCH
Open vSwitch is a production quality, multilayer virtual switch licensed under the open source Apache 2.0 license. It is designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).
To learn more please visit our website:
http://openvswitch.org/
30. OPEN SOURCE CLOUD STACK
Platform-as-a-Service – CloudFoundry, OpenShift, Gigaspaces
Mesos, Kubernetes
Docker
Orchestration (OpenStack, Apache CloudStack, Eucalyptus)
Compute (LXC (CoreOS), KVM, Xen)
Storage (Ceph, Gluster)
Networking (OpenDaylight, Contrail)
DevOps Toolchain
Orchestration - Ansible/SaltStack/Scalr*
Configuration Management (Cfengine/Chef/Puppet)
Monitoring (logstash, graphite)
31. CONTACT ME
Happy to Chat about Open Source, Cloud or Pittsburgh Sports
Professional: mark.hinkle@citrix.com
Personal: mrhinkle@gmail.com
Phone: 919.228.8049
Professional: http://open.citrix.com
Personal: http://www.socializedsoftware.com
Twitter: @mrhinkle
32. APPENDIX A
Additional Links to related stuff
33. ADDITIONAL LINKS
• Devops Toolchains Group
• Software Defined Networking: The New Norm for Networks (Whitepaper)
• DevOps Wikipedia Page
• NoSQL-Database.org – Ultimate Guide to the Non-Relational Universe
• Open Cloud Initiative
• NIST Cloud Computing Platform
• Open Virtualization Format Specs
• Clouderati Twitter Account
• Planet DevOps
• Nicira Whitepaper – It’s Time to Virtualize the Network
• Why Open vSwitch FAQ
• Stanford Seminar - Software-Defined Networking at the Crossroads
34. ADDITIONAL LINKS (CONT’D)
• SDN, NFV, and open source: The Operator’s View
• Puppet Labs: Build a Toolbox for Continuous Delivery
35. APPENDIX B
Stuff I’d like to have talked about but didn’t have time for
36. 60 SECOND CLOUD DEFINITION
Just because Software Marketing Guys Think it’s the Internet
5 CHARACTERISTICS OF CLOUD
1. On-Demand Self-Service
2. Broad Network Access
3. Resource Pooling
4. Rapid Elasticity
5. Measured Service
User Cloud a.k.a. SOFTWARE-AS-A-SERVICE
Developer Cloud a.k.a. PLATFORM-AS-A-SERVICE
Systems Cloud a.k.a. INFRASTRUCTURE-AS-A-SERVICE
37. APACHE ZOOKEEPER
Centralized Server to Service Distributed Apps
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications.
To learn more please visit:
http://zookeeper.apache.org/
38. SCALE-UP SCALE OUT
Elasticity and the cloud
Vertical Scaling (Scale-Up): Allocate additional resources to VMs, requires a reboot, no need for distributed app logic, single-point of OS failure
Horizontal Scaling (Scale-Out): Application needs logic to work in distributed fashion (e.g. HA-Proxy and Apache Hadoop)
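The "logic to work in distributed fashion" that scale-out requires can be as small as a round-robin dispatcher in front of several identical servers. This is a sketch of the general technique, not HA-Proxy configuration: `RoundRobinBalancer` and the backend names are hypothetical.

```python
from typing import Iterable, List


class RoundRobinBalancer:
    """Spread requests evenly across interchangeable backends."""

    def __init__(self, backends: Iterable[str]) -> None:
        self.backends: List[str] = list(backends)
        self._next = 0  # running request counter

    def add_backend(self, backend: str) -> None:
        # Scaling out: add another instance, no reboot of existing ones.
        self.backends.append(backend)

    def route(self) -> str:
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return backend


lb = RoundRobinBalancer(["app-1", "app-2"])
print([lb.route() for _ in range(4)])
lb.add_backend("app-3")
print([lb.route() for _ in range(3)])
```

Scale-up, by contrast, would keep a single backend and give it more CPU or memory, which avoids this extra layer but leaves one OS as the single point of failure.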
39. OPEN VIRTUALIZATION FORMATS
Virtualization Payloads
Open Virtualization Format (OVF) is an open standard for packaging and distributing virtual appliances or more generally software to be run in virtual machines.
Formats for hypervisors/cloud technologies:
• Amazon – AMI
• KVM – QCOW2
• VMware – VMDK
• Xen Project – IMG
• Hyper-V – VHD (Virtual Hard Disk)
• LXC – local file system/mount point – Docker*
40. SOURCING CLOUD APPLIANCES
Packaging Engines for VMs
Tool/Project | What you can do with them
Bitnami | BitNami provides free, ready to run environments for your favorite open source web applications and frameworks, including Drupal, Joomla!, Wordpress, PHP, Rails, Django and many more.
Boxgrinder | BoxGrinder is a set of projects that help you grind out appliances for multiple virtualization and Cloud providers
Oz | Command-line tool that has the ability to create images for common Linux distributions to run on KVM
SUSE Studio | SUSE Studio supports building and deploying directly to cloud services such as Amazon EC2.
41. CLOUD MONITORING TOOLS
Tools with features for monitoring cloud infrastructure
Project | Type of Monitoring | Collection Methods
Cacti / RRDTool | Performance | SNMP, syslog
Graphite | Performance | Agent
Nagios | Availability | SNMP, TCP, ICMP, IPMI, syslog
Sensu | Availability | Agent
Zabbix | Availability/Performance and more | SNMP, TCP/ICMP, IPMI, Synthetic Transactions
Zenoss | Availability, Performance, Event Management | SNMP, ICMP, SSH, syslog, WMI
Hitchhiker’s Guide to the Open Cloud by @mrhinkle
42. CLOUD PROVISIONING TOOLS
Packaging Engines for VMs
Project | Installation Targets
Apache Provisionr (incubating) | Can provision 10s to 1000s of machines on various clouds.
Cobbler | Distributed virtual infrastructure using koan (kickstart over a network to PXE boot VMs) for Red Hat, OpenSUSE, Fedora, Debian, Ubuntu VMs
Crowbar | (Bare metal provisioning)
JuJu | Public Clouds - Amazon Web Services, HP Cloud, Private OpenStack clouds, Bare Metal via MAAS.
Salt Cloud | Tool to provision “salted” VMs that can then be updated by a central server via ZeroMQ
43. BIG DATA
44. NOSQL DATABASES
Horizontally scalable unstructured data retrieval
Name | Type | Description
Apache Cassandra | Wide Column Store/Families | API: many » Query Method: MapReduce, Replication: , Written in: Java, Concurrency: eventually consistent, Misc: like "Big-Table on Amazon Dynamo alike", initiated by Facebook
CouchDB | Document Store | API: Memcached API+protocol (binary and ASCII), most languages, Protocol: Memcached REST interface for cluster conf + management, Written in: C/C++ + Erlang (clustering), Replication: Peer to Peer, fully consistent, Misc: Transparent topology changes during operation, provides memcached-compatible caching buckets
HBase | Wide Column Store/Families | API: Java / any writer, Protocol: any write call, Query Method: MapReduce Java / any exec, Replication: HDFS Replication, Written in: Java
Hypertable | Wide Column Store/Families | API: Thrift (Java, PHP, Perl, Python, Ruby, etc.), Protocol: Thrift, Query Method: HQL, native Thrift API, Replication: HDFS Replication, Concurrency: MVCC, Consistency Model: Fully consistent, Misc: High performance C++ implementation of Google's Bigtable.
MongoDB | Document Store | API: BSON, Protocol: C, Query Method: dynamic object-based language & MapReduce, Replication: Master Slave & Auto-Sharding, Written in: C++, Concurrency: …
Redis | Key Value / Tuple Store | API: Tons of languages, Written in: C, Concurrency: in memory and saves asynchronous disk after a defined time. Append only mode available. Different kinds of fsync policies. Replication: Master / Slave, Misc: also lists, sets, sorted sets, hashes, queues.
Riak | Key Value / Tuple Store | API: JSON, Protocol: REST, Query Method: MapReduce term matching, Scaling: Multiple Masters; Written in: Erlang, Concurrency: eventually consistent (stronger than MVCC via Vector Clocks)
45. MAP REDUCE
Algorithm for Parallelized Data Set Processing
[Diagram: Problem Data enters the Master Node; the Map step distributes work to Worker Node 1, Worker Node 2 and Worker Node 3; the Reduce step combines their results into Solution Data]
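The Map and Reduce steps in the diagram can be sketched with a single-process word count. This is an illustration of the algorithm's shape under simplifying assumptions, not Hadoop's API: each input chunk stands in for the data handed to one worker node, and `map_phase`, `shuffle`, `reduce_phase` and `map_reduce` are hypothetical function names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: a worker turns its chunk into (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key so each key can be reduced on its own."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def map_reduce(chunks: Iterable[str]) -> Dict[str, int]:
    # In a real cluster each chunk would be mapped on a different worker node.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))


problem_data = ["open source cloud", "open cloud", "cloud"]
print(map_reduce(problem_data))
```

Because each chunk is mapped independently and each key is reduced independently, both phases can be spread across as many workers as there are chunks or keys.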
46. APACHE HADOOP
Apache Project for Parallelized Data Set Processing
Overview / Features
• Handles large amounts of data
• Stores data in native format
• Delivers linear scalability at low cost
• Resilient in case of infrastructure failures
• Transparent application scalability
47. BENEFITS OF SDN
Network Virtualization is the final frontier of Software Defined Datacenter
• Dynamically update networks
• Automate network functionality
• “Program” security into the network
• Centrally apply policies to network and services
• Optimize networks
48. APACHE HADOOP ECOSYSTEM
Hadoop: Hadoop Common
HDFS: Distributes & replicates data across machines
MapReduce: Distributes & monitors tasks
• Parallel programming
• Handles large data blocks
Hive: Data warehouse that provides SQL interface. Ad hoc projection of data structure to unstructured
Non-Relational DB – HBase: Column-oriented schema-less distributed DB modeled after Google’s BigTable. Random real time read/write.
Scripting – Pig: Platform for manipulating and analyzing large data sets. Scripting language for analysts.
Machine Learning – Mahout: Machine learning libraries for recommendations, clustering, classifications and item sets.
Chukwa, ZooKeeper
49. CONFIGURATION MANAGEMENT TOOLS
Tools with features for configuring cloud infrastructure
Project | Year Started | Language | License | Client/Server
CFengine | 1993 | C | Apache | Yes
Chef | 2009 | Ruby | Apache | Chef Solo – No; Chef Server – Yes
Puppet | 2004 | Ruby | GPL | Yes & standalone
Salt | 2011 | Python | Apache | Yes
50. CLOUD AUTOMATION TOOLS
One to many tools for managing large numbers of devices
Project | Description
Ansible | Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately. (From the original author of Func)
Capistrano | Utility and framework for executing commands in parallel on multiple remote machines, via SSH. It uses a simple DSL that allows you to define tasks, which may be applied to machines in certain roles
Func | Func provides a two-way authenticated system for generically executing tasks, with integrations for Puppet and Cobbler.
MCollective | The Marionette Collective AKA MCollective is a framework to build server orchestration or parallel job execution systems.
RunDeck | Rundeck is an open-source process automation and command orchestration tool with a web console.
Salt | Execute arbitrary shell commands or choose from dozens of pre-built modules of common (or complex) commands.
Scalr | Provides scaling across multiple cloud computing platforms, integrates with Chef.
51. NETFLIX AWS TOOLBAG
Tools developed by a super Amazon Web Services Power User
EUREKA, PRIAM, SIMIAN ARMY, ASGARD, ASTYANAX, EDDA
http://netflix.github.com
Editor's Notes
Why Open vSwitch - http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob_plain;f=WHY-OVS;hb=HEAD
Hypervisors need the ability to bridge traffic between VMs and with the outside world. On Linux-based hypervisors, this used to mean using the built-in L2 switch (the Linux bridge), which is fast and reliable. So, it is reasonable to ask why Open vSwitch is used.
The answer is that Open vSwitch is targeted at multi-server virtualization deployments, a landscape for which the previous stack is not well suited. These environments are often characterized by highly
dynamic end-points, the maintenance of logical abstractions, and (sometimes) integration with or offloading to special purpose switching hardware.