
Crash Course in Cloud Computing

All Things Open 2014 - Day 1
Wednesday, October 22nd, 2014

Mark Hinkle
Senior Director, Citrix Open Source Business Office for Citrix Cloud

Find more of Mark's talks here: http://www.slideshare.net/socializedsoftware

  1. All Things Open 2014: Crash Course in Open Source Cloud Computing. Mark Hinkle, Senior Director, Open Source Solutions, Citrix Inc. mark.hinkle@citrix.com, mrhinkle@gmail.com, @mrhinkle
  2. ABOUT ME: I Help Build Open Source Ecosystems. Open Source Experience • Manage Citrix Open Source Business Office • Apache CloudStack Committer and PMC Member • Advisory boards for Gluster and Xen Project • Joined Citrix via the Cloud.com acquisition, July 2011 • Zenoss Core open source project: 100,000 users, 1.5 million downloads • Former LinuxWorld Magazine Editor-in-Chief • Open Management Consortium organizer • Author, “Windows to Linux Business Desktop Migration” (Thomson) • NetDirector Project: Open Source Configuration Management. By Mark R. Hinkle, @mrhinkle, mrhinkle@gmail.com. All Things Open 2014 - Open Source Cloud Computing
  3. Slides Available on Slideshare: http://www.slideshare.net/socializedsoftware. License: Creative Commons Attribution-ShareAlike 4.0 International. Share: copy and redistribute the material in any medium or format. Adapt: remix, transform, and build upon the material for any purpose, even commercially. The licensor cannot revoke these freedoms as long as you follow the license terms. Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
  4. AGENDA • Vetting Open Source Cloud Projects • Virtualization • Infrastructure-as-a-Service • Platform-as-a-Service • SDN • Open Source for Amazon Web Services
  5. VETTING OPEN SOURCE PROJECTS: How can you tell if they’re legit? • Code Velocity • Committers • Committer Reputation • User-Driven or Vendor-Driven Innovation • User Activity • Corporate Support* • Reputation of Foundation*
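Code velocity and committer counts can be approximated from a project's history. A minimal sketch, assuming the history has been exported one commit per line as `author|YYYY-MM-DD` (for example with `git log --date=short --format='%an|%ad'`); the function names are illustrative, and the share of commits from the top committer is only a crude hint of how narrow the contributor base is.

```python
from collections import Counter
from datetime import date


def parse_git_log(text):
    """Parse lines of the form 'Author Name|YYYY-MM-DD' into (author, date) tuples."""
    records = []
    for line in text.strip().splitlines():
        author, day = line.rsplit("|", 1)
        records.append((author.strip(), date.fromisoformat(day.strip())))
    return records


def activity_summary(commits):
    """Rough velocity and committer signals from (author, date) commit records."""
    if not commits:
        return {"commits_per_month": {}, "committers": 0, "top_committer_share": 0.0}
    per_author = Counter(author for author, _ in commits)
    per_month = Counter(f"{d.year}-{d.month:02d}" for _, d in commits)
    return {
        "commits_per_month": dict(sorted(per_month.items())),
        "committers": len(per_author),
        # A high share from one committer can hint at a narrow contributor base.
        "top_committer_share": max(per_author.values()) / len(commits),
    }


log = """alice|2014-09-01
bob|2014-09-15
alice|2014-10-02
carol|2014-10-20"""
print(activity_summary(parse_git_log(log)))
```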
  6. OPEN SOURCE ISN’T A ZERO-SUM GAME: “…the future of technological innovation is not stealing limited resources away from one another, but creating new resources, and new opportunities to create new resources, together in a rich ecosystem.” (Allison Randal, Open Source Hacker, Former OSCON Program Chair, @allisonrandal)
  7. OPEN SOURCE ANALYSIS: Visualizing Community Activity. http://www.openhub.net, http://activity.openstack.org
  8. OPEN SOURCE CLOUD STACK (diagram): Platform-as-a-Service (PaaS): ?, ? | Infrastructure-as-a-Service (IaaS) Orchestration: ? | Compute, Storage, Networking (Networking-as-a-Service) | DevOps Toolchain: Orchestration, Configuration Management, Monitoring
  9. VIRTUALIZATION: Carving up compute resources. OPEN SOURCE • Xen Project • Citrix XenServer • KVM • VirtualBox • OpenVZ • LXC • libcontainer. PROPRIETARY • VMware • Microsoft Hyper-V • Oracle VM (based on Xen Project)
  10. HYPERVISORS AND CONTAINERS: Differences in virtualization. Type 1 Hypervisors: VMware, Xen Project, Hyper-V. Type 2 Hypervisors: KVM, VirtualBox. Containers: LXC, libcontainer
  11. THE PORTABILITY PROBLEM: Containers compared to Hardware Virtualization • Different file formats for virtual machines (VMware uses the VMDK format, Xen and Hyper-V use VHD, KVM uses raw or QCOW2) • Guest images may be bound to a processor architecture • VMware and Xen can manage SCSI devices, but KVM cannot • KVM and Xen can use virtio drivers, but VMware cannot • VMware uses a proprietary agent inside the guest OS (VMware Tools) which does not work with Xen or KVM • Yada, Yada, Yada
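The file-format half of the problem is usually handled with `qemu-img convert`, which takes an input format (`-f`) and an output format (`-O`). A sketch that only builds the command line, assuming the simplified hypervisor-to-format mapping from the slide (qemu-img names the VHD format `vpc`); converting the disk does nothing about guest drivers (virtio) or agents such as VMware Tools.

```python
import shlex

# Simplified hypervisor -> disk-format mapping taken from the slide.
# qemu-img calls the VHD format "vpc".
DISK_FORMAT = {"vmware": "vmdk", "xen": "vpc", "hyper-v": "vpc", "kvm": "qcow2"}


def convert_command(src, src_hypervisor, dst, dst_hypervisor):
    """Build (but do not run) a qemu-img command that converts a disk image."""
    return [
        "qemu-img", "convert",
        "-f", DISK_FORMAT[src_hypervisor],   # input format
        "-O", DISK_FORMAT[dst_hypervisor],   # output format
        src, dst,
    ]


cmd = convert_command("web.vmdk", "vmware", "web.qcow2", "kvm")
print(shlex.join(cmd))
```

On a host with qemu-img installed, the list can be passed to `subprocess.run(cmd, check=True)`.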
  12. LINUX CONTAINERS: “Lightweight” Linux Virtualization • Lets you run a Linux system within another Linux system • A container is a group of processes on a Linux box, put together to provide an isolated environment • From the inside, it looks like a VM • Externally, it looks like normal processes • “chroot on steroids”
  13. CONTINUOUS INTEGRATION: Rebuild Applications on any Cloud and/or Virtualized Infrastructure • Code: application is stored in a repository (Subversion, Git) • Build: code is built (Jenkins) • Test: unit tests are automated (Jenkins) • Deploy: code is deployed to servers in various ways. Pipeline: Code → Build → Test → Deploy. ThoughtWorks Go: Open Source Continuous Delivery System
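The Code → Build → Test → Deploy flow is essentially an ordered list of stages that halts at the first failure. A minimal sketch of that control flow; the stage bodies are hypothetical placeholders, whereas a real CI server such as Jenkins or Go would check out the repository, compile, and run the test suite.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[List[str], bool]:
    """Run stages in order, stopping at the first one that reports failure."""
    ran = []
    for name, step in stages:
        ran.append(name)
        if not step():
            return ran, False   # later stages (e.g. deploy) never run
    return ran, True


# Hypothetical stage bodies; each returns True on success.
stages = [
    ("code", lambda: True),
    ("build", lambda: True),
    ("test", lambda: False),   # simulate a failing unit test
    ("deploy", lambda: True),
]
print(run_pipeline(stages))
```

Because deploy only runs after test succeeds, a broken build never reaches the servers.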
  14. DOCKER CONTAINER PACKAGING: Open source LXC Packaging Engine. Docker is an open-source project to easily create lightweight, portable, self-sufficient containers from any application. The same container that a developer builds and tests on a laptop can run at scale, in production, on VMs, bare metal, public clouds and more. To learn more please visit: www.docker.io
  15. WHAT IS DOCKER: System for Managing and Deploying LXC Containers • Complement to LXC, not a replacement • Manages daemonized processes on Linux using LXC or libcontainer • Creates the ability to re-use and manage similar applications • Content agnostic • Hardware agnostic • Easy to automate • Integrated with other tools: Chef, OpenShift, Puppet, VMware, etc.
  16. DOCKER’S GROWING ECOSYSTEM
  17. KUBERNETES (Greek for “shipmaster”): Container Cluster Management, Scheduler. Kubernetes builds on top of Docker to construct a clustered container scheduling service. Kubernetes enables users to ask a cluster to run a set of containers. The system will automatically pick worker nodes to run those containers on, which we think of more as "scheduling" than "orchestration". To learn more please visit: https://github.com/GoogleCloudPlatform/kubernetes
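"Automatically pick worker nodes" boils down to filtering out nodes that cannot fit a container and ranking the rest. A toy sketch under that assumption; the most-free-CPU ranking is an illustrative choice, not the policy Kubernetes actually uses, and the `Node` type is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    cpu_free: float   # cores
    mem_free: int     # MiB
    containers: List[str] = field(default_factory=list)


def schedule(container: str, cpu: float, mem: int, nodes: List[Node]) -> Optional[str]:
    """Filter nodes that can fit the request, then pick the one with the most free CPU."""
    feasible = [n for n in nodes if n.cpu_free >= cpu and n.mem_free >= mem]
    if not feasible:
        return None  # nothing fits; a real scheduler would leave it pending
    best = max(feasible, key=lambda n: (n.cpu_free, n.mem_free))
    best.cpu_free -= cpu
    best.mem_free -= mem
    best.containers.append(container)
    return best.name


nodes = [Node("node-a", 2.0, 4096), Node("node-b", 4.0, 2048)]
print(schedule("web", 1.0, 1024, nodes))   # node-b: most free CPU
print(schedule("db", 1.0, 2048, nodes))    # node-a: node-b no longer has enough memory
print(schedule("big", 8.0, 512, nodes))    # None: no node has 8 free cores
```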
  18. DOCKER-RELATED PROJECTS • Fig: fast, isolated development environments • Flynn: next-generation application platform • Panamax: drag-and-drop Docker containerization • Project Atomic: JeOS (Just Enough OS) designed to run Docker containers • SocketPlane: Docker networking (coming soon) • Weave: Docker networking • 13,000+ Docker-related repos on GitHub
  19. PUBLIC CLOUD (provider logos not captured): $141 Billion Market Cap, $363 Billion Market Cap, $356 Billion Market Cap
  20. MINIMUM VIABLE CLOUD: Infrastructure-as-a-Service (IaaS) Compute Orchestration
      Project | Year Started | License | Virtualization Technologies
      Apache CloudStack | 2008 | Apache | Bare Metal, XenServer, KVM, LXC, VMware, Hyper-V
      Eucalyptus | 2006 | GPL | Xen, KVM, VMware (commercial version)
      OpenNebula | 2005 | Apache | Xen, KVM, VMware
      OpenStack | 2010 (previously developed at NASA by Anso Labs) | Apache | VMware ESX and ESXi, Xen, XenServer, KVM, LXC, QEMU and VirtualBox
  21. OPENSTACK: The Boy Band of the Open Source Cloud
  22. OPENSTACK SHARED SERVICES: Span Compute, Storage and Networking • Identity Service • Image Service • Telemetry Service • Orchestration Service
  23. EVEN MORE OPENSTACK PROJECTS: Span Compute, Storage and Networking • Trove Database Service • Ironic Bare Metal • Marconi Queue Service • Cinder Block Storage Service • Ceilometer Metering/Monitoring • Heat Orchestration
  24. OPENSTACK SOLUTION PROVIDERS: If you can’t do it yourself. “OpenStack is not a product. If you are building a large infrastructure, it’s more like a tool kit. It gives you a lot of technologies that do take a lot of effort to integrate.” (Chris Kemp, OpenStack Board Member and Co-Founder, CEO of Piston Cloud Computing)
  25. CLOUD APIS: Everything (should) have an API in the Cloud • Deltacloud (Ruby) • Dasein (Java) • jclouds (Java) • Libcloud (Python) • Fog (Ruby)
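These libraries put one provider-neutral interface in front of many clouds. In Libcloud, for example, `get_driver(Provider.EC2)` returns a driver class that you instantiate with credentials, and the instance exposes methods such as `list_nodes()` and `create_node()`. The sketch below imitates that shape with a hypothetical interface and an in-memory fake provider so it runs without credentials; it is not Libcloud's actual code.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ComputeDriver(ABC):
    """Hypothetical provider-neutral interface, in the spirit of Libcloud/jclouds/Fog."""

    @abstractmethod
    def list_nodes(self) -> List[Dict]:
        ...

    @abstractmethod
    def create_node(self, name: str, size: str, image: str) -> Dict:
        ...


class InMemoryDriver(ComputeDriver):
    """Fake provider so the example runs without cloud credentials."""

    def __init__(self):
        self._nodes: List[Dict] = []

    def list_nodes(self):
        return list(self._nodes)

    def create_node(self, name, size, image):
        node = {"name": name, "size": size, "image": image, "state": "running"}
        self._nodes.append(node)
        return node


# A real library would register EC2, OpenStack, CloudStack, ... drivers here.
DRIVERS = {"memory": InMemoryDriver}


def get_driver(provider: str) -> ComputeDriver:
    return DRIVERS[provider]()


driver = get_driver("memory")
driver.create_node("web-1", size="small", image="ubuntu-14.04")
print([n["name"] for n in driver.list_nodes()])
```

Application code written against `ComputeDriver` does not change when the provider key does.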
  26. CLOUD STORAGE: Virtualized, Distributed, usually on Commodity Hardware
      Project | Description
      Ceph | Distributed file storage system developed by DreamHost -> Inktank -> Red Hat (block, object, file)
      GlusterFS | Scale-out NAS system aggregating storage over Ethernet or InfiniBand (file)
      OpenStack Storage | Long-term object storage system (object)
      Riak CS | Riak CS is open source software designed to provide simple, available, distributed cloud storage at any scale. Riak CS is S3-API compatible and supports per-tenant reporting for billing and metering use cases. (object)
      Sheepdog | Distributed storage for KVM hypervisors, distributed iSCSI
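Distributed stores have to decide which commodity servers hold each object. Riak places data with consistent hashing; Ceph uses its own CRUSH algorithm instead. A generic consistent-hash ring with virtual nodes, as an illustration of the idea rather than any product's implementation (MD5 and 64 virtual nodes per server are arbitrary choices):

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        self._nodes = set(nodes)
        # Each server appears at many points on the ring to even out the load.
        self._ring = sorted((_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self._points = [h for h, _ in self._ring]

    def nodes_for(self, key, replicas=2):
        """Walk clockwise from the key's position, collecting distinct servers."""
        replicas = min(replicas, len(self._nodes))
        i = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        chosen = []
        while len(chosen) < replicas:
            node = self._ring[i][1]
            if node not in chosen:
                chosen.append(node)
            i = (i + 1) % len(self._ring)
        return chosen


ring = HashRing(["store-1", "store-2", "store-3"])
print(ring.nodes_for("bucket/photo.jpg"))
```

The payoff is that removing a server only moves the keys that server owned; every other key keeps its primary location.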
  27. CLOUD AUTOMATION TOOLS: One-to-many tools for managing large numbers of devices
      Project | Description
      Ansible | Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately. (From the original author of Func.)
      Capistrano | Utility and framework for executing commands in parallel on multiple remote machines, via SSH. It uses a simple DSL that allows you to define tasks, which may be applied to machines in certain roles.
      Func | Func provides a two-way authenticated system for generically executing tasks; integrations with Puppet and Cobbler.
      MCollective | The Marionette Collective, AKA MCollective, is a framework to build server orchestration or parallel job execution systems.
      Rundeck | Rundeck is an open-source process automation and command orchestration tool with a web console.
      Salt | Execute arbitrary shell commands or choose from dozens of pre-built modules of common (or complex) commands.
      Scalr | Provides scaling across multiple cloud computing platforms; integrates with Chef.
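The common core of these one-to-many tools is fanning a single command out to many machines in parallel and collecting the results per host. A minimal sketch, assuming plain `ssh` with key-based authentication is already configured; the function names are hypothetical, and the real tools add inventories, roles, authentication, and retries on top.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(host, command):
    """Run `command` on `host` with the ssh CLI (assumes key-based auth is set up)."""
    proc = subprocess.run(["ssh", host, command], capture_output=True, text=True)
    return proc.returncode, proc.stdout


def run_everywhere(hosts, command, runner=ssh_runner, max_workers=10):
    """Fan one command out to many hosts in parallel; return {host: (exit_code, output)}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(runner, host, command) for host in hosts}
        return {host: fut.result() for host, fut in futures.items()}


# A fake runner keeps the example self-contained; pass ssh_runner for real hosts.
def fake_runner(host, command):
    return 0, f"{host}: ran {command!r}"


print(run_everywhere(["web1", "web2", "db1"], "uptime", runner=fake_runner))
```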
  28. (Image-only slide)
  29. PLATFORM-AS-A-SERVICE: Abstracted Cloud-Scale Run-Time Environments
      Project | Sponsors | Languages/Frameworks
      CloudFoundry | VMware -> Pivotal -> CloudFoundry Foundation | Spring for Java, Ruby for Rails and Sinatra, node.js, Grails, Scala on Lift and more via partners (e.g. Python, PHP)
      Cloudify | GigaSpaces | [Groovy for deployment recipes]
      OpenShift Origin | Red Hat | Java, Ruby, PHP, Perl and Python
      Apache Stratos | WSO2 -> Apache Stratos | PHP, Tomcat, MySQL “cartridges”
  30. APACHE MESOS: One to many tools for managing large numbers of devices. Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers. Largely supported by Twitter; also used by LinkedIn and Airbnb. Features • Fault-tolerant replicated master using ZooKeeper • Scalability to 10,000s of nodes • Isolation between tasks with Linux Containers • Multi-resource scheduling (memory and CPU aware) • Java, Python and C++ APIs for developing new parallel applications • Web UI for viewing cluster state. To learn more please visit: http://mesos.apache.org/
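Multi-resource scheduling in Mesos is based on Dominant Resource Fairness (DRF): each framework's dominant share is its largest fraction of any single resource, and the next resources go to the framework with the smallest dominant share. A simplified sketch of that allocation rule; real Mesos makes resource offers that frameworks accept or decline, which this skips, and it assumes every task demands a positive amount of at least one resource.

```python
def drf_allocate(total, demands):
    """Dominant Resource Fairness: repeatedly launch one task for the user whose
    dominant share (largest fraction of any single resource) is smallest."""
    used = {r: 0 for r in total}
    alloc = {u: {r: 0 for r in total} for u in demands}
    tasks = {u: 0 for u in demands}
    active = sorted(demands)  # sorted so ties break deterministically by name

    def dominant_share(user):
        return max(alloc[user][r] / total[r] for r in total)

    while active:
        user = min(active, key=dominant_share)
        need = demands[user]
        if all(used[r] + need[r] <= total[r] for r in total):
            for r in total:
                used[r] += need[r]
                alloc[user][r] += need[r]
            tasks[user] += 1
        else:
            active.remove(user)  # this user's next task no longer fits
    return tasks


# Example: 9 CPUs and 18 GB of memory shared by two frameworks.
total = {"cpu": 9, "mem_gb": 18}
demands = {"A": {"cpu": 1, "mem_gb": 4}, "B": {"cpu": 3, "mem_gb": 1}}
print(drf_allocate(total, demands))
```

Framework A is memory-heavy and B is CPU-heavy; DRF gives A three tasks (3 CPU, 12 GB) and B two (6 CPU, 2 GB), so both end with a dominant share of 2/3.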
  31. SOFTWARE DEFINED NETWORKING (SDN): Virtualization meets the network. Decoupling of the control and data planes of the network to improve efficiency. Communication from an SDN controller via a protocol to network devices, both physical and virtual. • Automation: abstractions allow for programmable networks • Dynamic Networks: the network can be changed quickly via a controller • Security: network offerings can match virtualization offerings for finer-grained security in a highly volatile compute landscape • Heterogeneous Management: single control point for various devices
  32. SDN OVERVIEW (diagram): Application Layer: Business Applications | API | Control Layer: SDN Control Software, Network Services | Control Data Plane Interface (e.g. OpenFlow) | Infrastructure Layer: Network Devices
  33. BENEFITS OF SDN: Network Virtualization is the final frontier of the Software Defined Datacenter • Dynamically update networks • Automate network functionality • “Program” security into the network • Centrally apply policies to network and services • Optimize networks
  34. OPENFLOW: Virtualization meets the network. OpenFlow enables networks to evolve by giving a remote controller the power to modify the behavior of network devices through a well-defined "forwarding instruction set". The growing OpenFlow ecosystem now includes routers, switches, virtual switches, and access points from a range of vendors.
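The "forwarding instruction set" is a table of match/action entries that the controller installs on a switch; the switch applies the highest-priority entry that matches a packet. A toy model of that lookup, assuming simplified header field names and that table misses are sent to the controller (the OpenFlow 1.0 default; later versions let a table-miss entry decide). It is not a real OpenFlow message format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FlowEntry:
    priority: int
    match: Dict[str, object]   # header field -> required value; absent fields are wildcards
    actions: List[str]         # e.g. ["output:2"] or ["drop"]


class FlowTable:
    def __init__(self):
        self.entries: List[FlowEntry] = []

    def install(self, entry: FlowEntry):
        """What a controller's 'add flow' instruction would do on the switch."""
        self.entries.append(entry)
        self.entries.sort(key=lambda e: e.priority, reverse=True)

    def lookup(self, packet: Dict[str, object]) -> List[str]:
        """Return the actions of the highest-priority matching entry."""
        for entry in self.entries:
            if all(packet.get(name) == value for name, value in entry.match.items()):
                return entry.actions
        return ["controller"]   # table miss: hand the packet to the controller


table = FlowTable()
table.install(FlowEntry(10, {"tcp_dst": 22}, ["drop"]))
table.install(FlowEntry(5, {"ipv4_dst": "10.0.0.2"}, ["output:2"]))
print(table.lookup({"ipv4_dst": "10.0.0.2", "tcp_dst": 80}))   # ['output:2']
```

Changing network behavior means installing different entries from the controller, not reconfiguring each device by hand.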
  35. OPEN SOURCE SDN: Software Defined Network Controllers and more
      Project | Description
      Floodlight | The Floodlight Open SDN Controller is an enterprise-class, Apache-licensed, Java-based OpenFlow Controller. It is supported by a community of developers including a number of engineers from Big Switch Networks. See more at: http://www.projectfloodlight.org/floodlight/
      Indigo | Indigo is an open source project aimed at enabling support for OpenFlow on physical and hypervisor switches. Big Switch has helped numerous companies OpenFlow-enable their equipment, and we provide firmware for a number of popular switches. Indigo is the basis of Switch Light by Big Switch Networks. See more at: http://www.projectfloodlight.org/indigo/
      LINCX | LINCX is a pure OpenFlow software switch written in Erlang. It runs within a separate domain under the Xen hypervisor using LING (erlangonxen.org).
      NOX | NOX is the original OpenFlow controller, and facilitates development of fast C++ controllers on Linux.
      OpenDaylight | Linux Foundation Collaborative Project based on the Cisco One Controller and plugins from numerous vendors in development, e.g. IBM DOVE.
      Open vSwitch | Open vSwitch is an open source (ASL 2.0), multilayer virtual switch designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).
  36. OPEN VSWITCH: Open vSwitch is a production quality, multilayer virtual switch licensed under the open source Apache 2.0 license. It is designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag). To learn more please visit our website: http://openvswitch.org/
  37. OPEN SOURCE CLOUD STACK (diagram): Platform-as-a-Service: CloudFoundry, OpenShift, Gigaspaces; Docker; Mesos; Kubernetes | Infrastructure-as-a-Service (IaaS) Orchestration: OpenStack, Apache CloudStack, Eucalyptus | Compute (Containers, KVM, Xen), Storage (Ceph, Gluster), Networking (OpenDaylight, Contrail) | DevOps Toolchain: Orchestration (Ansible/SaltStack/Scalr*), Configuration Management (CFEngine/Chef/Puppet), Monitoring (logstash, graphite)
  38. NETFLIX AWS TOOLBAG: Tools developed by a super Amazon Web Services Power User • Eureka • Priam • Simian Army • Asgard • Astyanax • Edda. http://netflix.github.com
  39. CONTACT ME: Happy to Chat about Open Source, Cloud or Pittsburgh Sports • Email: mark.hinkle@citrix.com (professional), mrhinkle@gmail.com (personal) • Phone: 919.228.8049 • Web: http://open.citrix.com (professional), http://www.socializedsoftware.com (personal) • Twitter: @mrhinkle
  40. APPENDIX A: Additional Links to Related Stuff
  41. ADDITIONAL LINKS • DevOps Toolchains Group • Software Defined Networking: The New Norm for Networks (Whitepaper) • DevOps Wikipedia Page • NoSQL-Database.org: Ultimate Guide to the Non-Relational Universe • Open Cloud Initiative • NIST Cloud Computing Platform • Open Virtualization Format Specs • Clouderati Twitter Account • Planet DevOps • Nicira Whitepaper: It’s Time to Virtualize the Network • Why Open vSwitch FAQ • Stanford Seminar: Software-Defined Networking at the Crossroads
  42. ADDITIONAL LINKS (CONT’D) • SDN, NFV, and open source: The Operator’s View • Puppet Labs: Build a Toolbox for Continuous Delivery
  43. APPENDIX B: Stuff I’d have liked to talk about but didn’t have time for
  44. 60 SECOND CLOUD DEFINITION: Just Because Software Marketing Guys Think It’s the Internet. 5 CHARACTERISTICS OF CLOUD: 1. On-Demand Self-Service 2. Broad Network Access 3. Resource Pooling 4. Rapid Elasticity 5. Measured Service. User Cloud, a.k.a. SOFTWARE-AS-A-SERVICE; Developer Cloud, a.k.a. PLATFORM-AS-A-SERVICE; Systems Cloud, a.k.a. INFRASTRUCTURE-AS-A-SERVICE
  45. SCALE-UP vs. SCALE-OUT: Elasticity and the cloud. Vertical Scaling (Scale-Up): allocate additional resources to VMs; requires a reboot; no need for distributed app logic; single point of OS failure. Horizontal Scaling (Scale-Out): the application needs logic to work in a distributed fashion (e.g. HAProxy and Apache Hadoop)
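Part of the "distributed logic" that scale-out needs is a front end that spreads requests across identical instances, which is what HAProxy's `balance roundrobin` mode does. A toy round-robin router showing that adding capacity means adding a backend rather than resizing one VM; the class is hypothetical, and it ignores the other half of the problem (session state must live somewhere all instances can reach).

```python
import itertools


class RoundRobinBalancer:
    """Toy front end that hands each request to the next backend in turn."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._cycle = itertools.cycle(self.backends)

    def add_backend(self, backend):
        """Scaling out: add another instance and restart the rotation."""
        self.backends.append(backend)
        self._cycle = itertools.cycle(self.backends)

    def route(self, request):
        return next(self._cycle)


lb = RoundRobinBalancer(["app1", "app2"])
print([lb.route(i) for i in range(4)])
lb.add_backend("app3")
print([lb.route(i) for i in range(3)])
```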
  46. SOURCING CLOUD APPLIANCES: Packaging Engines for VMs
      Tool/Project | What you can do with them
      Bitnami | Bitnami provides free, ready-to-run environments for your favorite open source web applications and frameworks, including Drupal, Joomla!, WordPress, PHP, Rails, Django and many more.
      BoxGrinder | BoxGrinder is a set of projects that help you grind out appliances for multiple virtualization and Cloud providers.
      Oz | Command-line tool that has the ability to create images for common Linux distributions to run on KVM.
      SUSE Studio | SUSE Studio supports building and deploying directly to cloud services such as Amazon EC2.
  47. PACKER: Multiplatform VM Creation, Open source Automation for VMs. Packer is easy to use and automates the creation of any type of machine image. It embraces modern configuration management by encouraging you to use automated scripts to install and configure the software within your Packer-made images. To learn more please visit: www.packer.io
  48. CONFIGURATION MANAGEMENT TOOLS: Tools with features for configuring cloud infrastructure (from “Hitchhiker’s Guide to the Open Cloud” by @mrhinkle)
      Project | Year Started | Language | License | Client/Server
      CFEngine | 1993 | C | Apache | Yes
      Chef | 2009 | Ruby | Apache | Chef Solo: No; Chef Server: Yes
      Puppet | 2004 | Ruby | GPL | Yes, and standalone
      Salt | 2011 | Python | Apache | Yes
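What these tools share is a desired-state model: you declare how resources should look, and each run changes only what differs, so a second run on a converged system does nothing. A minimal sketch of that idempotent convergence loop, assuming hypothetical `type:name` resource keys and a plain dictionary standing in for the machine's real state; Chef, Puppet, CFEngine, and Salt each implement this with their own resource types and providers.

```python
def converge(desired, actual):
    """Bring `actual` in line with `desired`; return the changes made.
    Running it again on a converged system makes no changes (idempotence)."""
    changes = []
    for resource, want in desired.items():
        have = actual.get(resource)
        if have != want:
            changes.append((resource, have, want))
            actual[resource] = want  # stand-in for installing/starting/writing
    return changes


desired = {"package:nginx": "installed", "service:nginx": "running", "file:/etc/motd": "Welcome"}
actual = {"package:nginx": "installed", "service:nginx": "stopped"}
print(converge(desired, actual))  # two changes: start the service, write the file
print(converge(desired, actual))  # [] on the second run
```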
  49. CLOUD MONITORING TOOLS: Tools with features for monitoring cloud infrastructure
      Project | Type of Monitoring | Collection Methods
      Cacti / RRDTool | Performance | SNMP, syslog
      Graphite | Performance | Agent
      Nagios | Availability | SNMP, TCP, ICMP, IPMI, syslog
      Sensu | Availability | Agent
      Zabbix | Availability/Performance and more | SNMP, TCP/ICMP, IPMI, Synthetic Transactions
      Zenoss | Availability, Performance, Event Management | SNMP, ICMP, SSH, syslog, WMI
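Agent-based collection into Graphite can be as simple as writing lines to Carbon's plaintext listener, where each line is `<metric path> <value> <unix timestamp>` followed by a newline, conventionally on TCP port 2003. A sketch, assuming a Carbon daemon on `localhost` with the default port; the metric name is made up.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one data point as '<metric path> <value> <unix timestamp>' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line, host="localhost", port=2003):
    """Send formatted lines to a Carbon plaintext listener (2003 is the usual default port)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


print(repr(graphite_line("web01.cpu.load", 0.42, 1414000000)))
```

Calling `send_metric(graphite_line(...))` from a cron job or small agent loop is enough to start graphing a value.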
  50. CLOUD PROVISIONING TOOLS: Packaging Engines for VMs
      Project | Installation Targets
      Apache Provisionr (incubating) | Can provision 10s to 1000s of machines on various clouds.
      Cobbler | Distributed virtual infrastructure using koan (kickstart over a network, to PXE boot VMs) for Red Hat, openSUSE, Fedora, Debian, Ubuntu VMs
      Crowbar | Bare metal provisioning
      JuJu | Public Clouds (Amazon Web Services, HP Cloud), private OpenStack clouds, Bare Metal via MAAS.
      Salt Cloud | Tool to provision “salted” VMs that can then be updated by a central server via ZeroMQ
  51. BIG DATA
  52. NOSQL DATABASES: Horizontally scalable unstructured data retrieval
      Name | Type | Description
      Apache Cassandra | Wide Column Store/Families | API: many; Query Method: MapReduce; Written in: Java; Concurrency: eventually consistent; Misc: like "Big-Table on Amazon Dynamo alike", initiated by Facebook
      HBase | Wide Column Store/Families | API: Java / any writer; Protocol: any write call; Query Method: MapReduce Java / any exec; Replication: HDFS Replication; Written in: Java
      Hypertable | Wide Column Store/Families | API: Thrift (Java, PHP, Perl, Python, Ruby, etc.); Protocol: Thrift; Query Method: HQL, native Thrift API; Replication: HDFS Replication; Concurrency: MVCC; Consistency Model: fully consistent; Misc: high performance C++ implementation of Google's Bigtable
      CouchDB | Document Store | API: Memcached API+protocol (binary and ASCII), most languages; Protocol: Memcached, REST interface for cluster conf + management; Written in: C/C++ + Erlang (clustering); Replication: Peer to Peer, fully consistent; Misc: transparent topology changes during operation, provides memcached-compatible caching buckets
      MongoDB | Document Store | API: BSON; Protocol: C; Query Method: dynamic object-based language & MapReduce; Replication: Master Slave & Auto-Sharding; Written in: C++
      Redis | Key Value/Tuple Store | API: tons of languages; Written in: C; Concurrency: in memory, saving asynchronously to disk after a defined time; append-only mode available; different kinds of fsync policies; Replication: Master / Slave; Misc: also lists, sets, sorted sets, hashes, queues
      Riak | Key Value/Tuple Store | API: JSON; Protocol: REST; Query Method: MapReduce term matching; Scaling: Multiple Masters; Written in: Erlang; Concurrency: eventually consistent (stronger than MVCC via Vector Clocks)
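Riak's "eventually consistent via Vector Clocks" means each stored value carries a per-node counter map, which lets the store tell whether one version supersedes another or whether two writes were concurrent and must both be kept. A minimal sketch of the comparison and merge rules, using plain dictionaries; node names are hypothetical and this is not Riak's implementation.

```python
from typing import Dict

Clock = Dict[str, int]


def increment(clock: Clock, node: str) -> Clock:
    """Return a copy of `clock` with `node`'s counter bumped (a write coordinated by `node`)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: Clock, b: Clock) -> Clock:
    """Element-wise maximum: the clock of a value that has seen both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: Clock, b: Clock) -> str:
    """'equal', 'before' (a happened before b), 'after', or 'concurrent' (a conflict)."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


v1 = increment({}, "node-x")
v2 = increment(v1, "node-x")   # later write through node-x
v3 = increment(v1, "node-y")   # independent write through node-y
print(compare(v1, v2), compare(v2, v3))   # before concurrent
```

A "concurrent" result is the case where the store keeps both versions (siblings) for the application or a later write to resolve.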
  53. MAP REDUCE: Algorithm for Parallelized Data Set Processing (diagram): Problem Data → Master Node → Worker Nodes 1, 2, and 3 (Map) → Reduce → Solution Data
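The map, shuffle, and reduce steps in the diagram can be shown with the classic word-count example. A single-machine sketch in which a thread pool stands in for the worker nodes; Hadoop would additionally split input across HDFS blocks, run map tasks near the data, and distribute the reducers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit (word, 1) for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(chunks, workers=3):
    # Threads stand in for the worker nodes in the diagram.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


print(word_count(["the cloud", "The data", "cloud data cloud"]))
```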
  54. APACHE HADOOP: Apache Project for Parallelized Data Set Processing. Overview and Features • Handles large amounts of data • Stores data in native format • Delivers linear scalability at low cost • Resilient in case of infrastructure failures • Transparent application scalability
  55. APACHE HADOOP ECOSYSTEM • Hadoop Common • HDFS: distributes & replicates data across machines • MapReduce: distributes & monitors tasks; parallel programming; handles large data blocks • Hive (Data Warehouse): data warehouse that provides an SQL interface; ad hoc projection of data structure onto unstructured data • HBase (Non-Relational DB): column-oriented, schema-less distributed DB modeled after Google’s BigTable; random real-time read/write • Pig (Scripting): platform for manipulating and analyzing large data sets; scripting language for analysts • Mahout (Machine Learning): machine learning libraries for recommendations, clustering, classifications and item sets • Chukwa • ZooKeeper
