Hands on Virtualization with Ganeti (part 1)

Lance Albertson
@ramereth
Associate Director
OSU Open Source Lab
About us
● OSU Open Source Lab
● Server hosting for Open Source
  Projects
  ○ Linux Foundation, Apache Software Foundation,
    Drupal, Python Software Foundation, Freenode,
    Gentoo, Debian, CentOS, Fedora, etc etc ...
● Open Source development projects
  ○ Ganeti Web Manager
Session Overview (part 1)
● Ganeti Introduction
  ● Terminology
  ● Major Components
● Latest Features
● Using Ganeti in Practice
● How Ganeti is deployed at OSUOSL
Session Overview (part 2)
● Hands on Demo
● Installation and Initialization
● Cluster Management
  ● Adding instances (VMs)
  ● Controlling instances
  ● Auto Allocation
● Dealing with node failures
What can Ganeti do?
●   Virtual machine management software tool
●   Manages clusters of physical machines
●   Xen/KVM/LXC VM deployment
●   Live Migration
●   Resiliency to failure
    ●   data redundancy via DRBD
● Cluster Balancing
● Ease of repairs and hardware swaps
Ganeti Cluster
Comparing Ganeti
●   Private IaaS
●   Primarily utilizes local storage
●   Designed for hardware failures
●   Mature project
●   Low package requirements
●   Simple administration
●   Easily pluggable via hooks & RAPI
Project Background
●   Google funded project
●   Used in internal corporate env
●   Open Sourced in 2007 GPLv2
●   Team based in Google Switzerland
●   Active mailing list & IRC channel
●   Started internally before libvirt,
    openstack, etc
Goals of Ganeti
Goals: Low Entry Level
● Keeping the entry level as low as
  possible
● Easy to install, manage and upgrade
● No specialized hardware needed
  ● i.e. SANs
● Lightweight
  ● no "expensive" package dependencies
Goals: Enterprise Scale
● Manage simultaneously from 1 to ~200
  host machines
● Access to advanced features
  ● drbd, live migration, API, OOB control
● Batch VM deployments
● Ease of lateral expansion and
  rebalancing
Goals: Open Source Citizen
● Design and code discussions are open
● External contributions are welcome
● Cooperate with other "big scale"
  Ganeti users
● Welcome third-party projects
  ● Ganeti Web Manager (OSL), Synnefo
    (GRNET)
Terminology
Terminology

Node         virtualization host
Node Group   homogeneous set of nodes (i.e. a rack of nodes)
Instance     virtualization guest
Cluster      set of nodes, managed as a collective
Job          a Ganeti operation
Architecture
Components
● Linux & standard utils
  ○ (iproute2, bridge-utils, ssh)
● KVM, Xen or LXC
● DRBD, LVM, RBD, or SAN
● Python
  ○ (plus a few modules)
● socat
● Haskell
  ○ (optional, for auto-allocation)
Node Roles (management level)

Master Node         Runs ganeti-masterd, rapi, noded
                    and confd

Master Candidates   Have a full copy of the config, can
                    become master
                    Run ganeti-confd and noded

Regular Nodes       Cannot become master
                    Get only part of the config

Offline nodes       In repair or decommissioned
Node Roles (instance hosting level)

VM Capable Node   Can run virtual machines

Drained Nodes     Are being evacuated

Offline Nodes     Are in repair
Instances

● Virtual machine that runs on the cluster
● Fault-tolerant/HA entity within the cluster
Instance Parameters

● Hypervisor: hvparams
● General: beparams
● Networking: nicparams
● Modifiable at the instance or cluster level
hvparams
● Boot order, CDROM Image
● NIC Type, Disk Type
● VNC Parameters, Serial console
● Kernel Path, initrd, args
● Other Hypervisor specific
  parameters
beparams / nicparams

● Memory / Virtual CPUs
● Adding or removing disks
● MAC
● NIC mode (routed or bridged)
● Link
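These parameter groups map directly onto gnt-instance modify and gnt-cluster modify. A minimal sketch (instance name and values are illustrative):

```shell
# Increase memory and CPU count for instance "web" (backend params)
gnt-instance modify -B memory=1024,vcpus=2 web

# Change a hypervisor parameter, e.g. the boot order
gnt-instance modify -H boot_order=network web

# Set a cluster-wide default instead of a per-instance value
gnt-cluster modify -B memory=512

# Most changes take effect at the next instance (re)boot
gnt-instance reboot web
```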
Disk Template

drbd       LVM + DRBD between 2 nodes

rbd        RBD volumes residing inside a RADOS cluster *

plain      LVM with no redundancy

diskless   No disks. Useful for testing only

* experimental support added in 2.6
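The template is chosen at creation time with -t; a sketch comparing the two most common choices (node and instance names are illustrative):

```shell
# Redundant instance: DRBD mirror between primary node1 and secondary node2
gnt-instance add -t drbd -n node1:node2 \
  -o debootstrap+default -s 10G web1

# Non-redundant instance on local LVM only
gnt-instance add -t plain -n node1 \
  -o debootstrap+default -s 10G web2

# An existing plain instance can be converted to drbd later,
# picking a secondary node for the mirror
gnt-instance modify -t drbd -n node2 web2
```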
Primary & Secondary Concepts

● Instances always run on their primary node
● The secondary node is used for disk replication
● Whether a secondary exists depends on the disk template (i.e. drbd vs. plain)
Instance creation scripts
           also known as OS Definitions
●   Ganeti requires an operating system installation
    script
●   OS Definitions provide scripts to deploy various
    operating systems
●   Ganeti Instance Debootstrap
    ● upstream supported
●   Ganeti Instance Image
    ● written by me
OS Variants
● Variants of the OS Definition
● Used for defining guest operating
  system
● Types of deployment settings:
  ● Extra packages
  ● Filesystem
  ● Image directory
  ● Image Name
Latest Features
2.4 (March 2011)
● Out of Band management
● vhost net support (KVM)
● hugepages support (KVM)
● initial node groups

2.5 (April 2012)
● shared storage (SAN) support
● improved node groups (scalability, evacuate, commands)
● master IP turnup customization
● full SPICE support (KVM)
Latest Features
2.6 (July 2012)
● RBD support (ceph)
● initial memory ballooning (KVM, Xen)
● cpu pinning
● OVF export/import support
● customized drbd parameters
● policies for better resource modeling
● Optional haskell ganeti-confd

Upcoming (just ideas, not promises)
● Full dynamic memory support
● Better instance networking customization
● Rolling Reboot
● Better automation, self-healing, availability
● Higher Scalability
● KVM block device migration
● Better OS Installation
Initializing your cluster

The node needs to be set up following the Ganeti installation guide.

gnt-cluster init [-s ip] ... \
  --enabled-hypervisors=kvm cluster
gnt-cluster
Cluster wide operations:

gnt-cluster      info
gnt-cluster      modify [-B/H/N ...]
gnt-cluster      verify
gnt-cluster      master-failover
gnt-cluster      command/copyfile ...
Adding nodes
gnt-node add [-s ip] node2
gnt-node add [-s ip] node3
gnt-node add [-s ip] node4
Adding instances
# install instance-{debootstrap, image}
gnt-os list
gnt-instance add -t drbd \
  {-n node3:node2 | -I hail } \
  -o debootstrap+default web
ping web
ssh web # easy with OS hooks
gnt-node
Per node operations:
gnt-node remove node4
gnt-node modify \
  [ --master-candidate yes|no ] \
  [ --drained yes|no ] \
  [ --offline yes|no ] node2
gnt-node evacuate/failover/migrate
gnt-node powercycle
-t drbd

DRBD provides redundancy to instance data, and
makes it possible to perform live migration without
having shared storage between the nodes.

"RAID1" over the network
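With -t drbd, an instance can move between its primary and secondary without any shared storage. A sketch of the common operations (names are illustrative):

```shell
# Live-migrate "web" from its primary to its secondary node;
# the primary/secondary roles swap afterwards
gnt-instance migrate web

# Planned maintenance: migrate every instance whose primary is node2
gnt-node migrate node2
```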
Recovering from failure
# set the node offline
gnt-node modify -O yes node3
Recovering from failure
# failover instances to their secondaries
gnt-node failover --ignore-consistency node3

# or, for each instance:
gnt-instance failover \
  --ignore-consistency web
Recovering from failure
# restore redundancy
gnt-node evacuate -I hail node3

# or, for each instance:
gnt-instance replace-disks \
  {-n node1 | -I hail } web
gnt-backup
Manage instance exports/backups:

gnt-backup export -n node1 web
gnt-backup import -t plain \
  {-n node3 | -I hail } \
  --src-node node1 \
  --src-dir /tmp/myexport web
gnt-backup list
gnt-backup remove
htools: cluster resource management
● Written in Haskell
● Where do I put a new instance?
● Where do I move an existing one?
  ● hail: the H iallocator
● How much space do I have?
  ● hspace: the H space calculator
● How do I fix an N+1 error?
  ● hbal: the cluster balancer
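Run on the master, the htools answer these questions over the local luxi interface (-L). A sketch:

```shell
# How much capacity is left? (hspace simulates placing
# standard-sized instances until the cluster is full)
hspace -L

# Is the cluster unbalanced, and which moves would fix it?
# Without -X this only prints the suggested solution.
hbal -L

# Actually execute the suggested moves through the job queue
hbal -L -X
```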
Controlling Ganeti
● Command line *
● Ganeti Web Manager
  ● Developed by OSUOSL
● RAPI (RESTful HTTP interface) *
● On-cluster "luxi" interface *
  ● luxi is currently JSON over a Unix socket
  ● client code exists for Python and Haskell

* programmable interfaces
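The RAPI speaks JSON over HTTPS (by default on port 5080), so any language with an HTTP client can drive it. A minimal sketch of consuming a /2/instances listing; the sample payload below is hand-written for illustration, not captured from a real cluster:

```python
import json

# Illustrative response body for GET /2/instances: a list of
# {"id": <name>, "uri": <resource path>} objects.
sample = (
    '[{"id": "web", "uri": "/2/instances/web"},'
    ' {"id": "db", "uri": "/2/instances/db"}]'
)

def instance_names(body):
    """Extract instance names from a /2/instances listing."""
    return [entry["id"] for entry in json.loads(body)]

print(instance_names(sample))  # ['web', 'db']
```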
Job Queue
● Ganeti operations generate jobs in the master
  ○ with the exception of queries
● Jobs execute concurrently
● You can cancel non-started jobs, inspect the queue
  status, and inspect jobs


gnt-job     list
gnt-job     info
gnt-job     watch
gnt-job     cancel
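Most gnt-* commands also accept --submit, which queues the job and returns its ID immediately instead of waiting. A sketch (the job ID is illustrative):

```shell
# Queue the reboot and get back a job ID right away
gnt-instance reboot --submit web

# Follow that job's log messages until it finishes
gnt-job watch 1234

# Inspect or cancel queued work
gnt-job list
gnt-job cancel 1234
```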
gnt-group
Managing node groups:

gnt-group add
gnt-group assign-nodes
gnt-group evacuate
gnt-group list
gnt-group modify
gnt-group remove
gnt-group rename
gnt-instance change-group
Running Ganeti in Production
             What should you add?
●   Monitoring/Automation
    ● Check host disks, memory, load
    ● Trigger events (evacuate, send to repairs, re-add
      node, rebalance)
    ● Automated host installation/setup (config
      management)
●   Self service use
    ● Instance creation and resize
    ● Instance console access
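A minimal starting point for the monitoring side is simply scheduling Ganeti's own health check. A sketch as a cron entry (assumes a working local MTA; the address is illustrative):

```shell
# /etc/cron.d/ganeti-verify on the master node:
# run cluster verification nightly and mail any complaints
0 3 * * * root gnt-cluster verify 2>&1 | mail -s "ganeti verify" ops@example.org
```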
Ganeti in practice
● Small to medium virtualization
  environments
● High performance
  ● Dedicated hardware, faster disks, more spindles on
      local storage
● Cheap hardware to high-end
  hardware
● Higher reliability
Ganeti as a "cloud"
● Not a traditional cloud environment
  ● No AWS APIs (yet at least), no object store
  ● Ganeti specific API
● Tools to extend it
  ● Ganeti Web Manager, Synnefo, GlusterFS, Ceph
● Storage layer differences
  ● block devices instead of disk images (typically)
How the OSL uses Ganeti
● Powers all managed virtualization
● Project hosting
● KVM based
● Hundreds of VMs
● Web hosts, code hosting, etc
● Per-project clusters: PSF, OSGeo,
  phpBB, Gentoo
● Powers Supercell
Ganeti at OSL
● Node OS: Gentoo
  ● Migrating towards CentOS
● CFEngine for node configuration setup
● Utilize instance-image for guest installs
  ● Flexibility on guest operating systems we can
      deploy
● 10 clusters, 27 nodes, 230 instances
● Ganeti Web Manager
Ganeti at OSL
● Production cluster
  ● busybox, darcs, inkscape, musicbrainz, openmrs,
    php.net, qemu, freenode, yum
  ● 5 nodes, 20 instances per machine
  ● 64GB RAM / 3-7TB disk / 24 cores (x2)
  ● 24GB RAM / 670GB disk / 4 cores (x3)
● Reduced cooling footprint
● Per-project clusters enabled flexibility
People running Ganeti
● Google
  ● Corporate Computing Infra
● osuosl.org
  ● Oregon State University Open Source Lab
● grnet.gr
  ● Greek Research & Technology Network
● nero.net
  ● Network for Education & Research in Oregon
Questions?                     (Part 1 Conclusion)
               Lance Albertson
              lance@osuosl.org
                 @ramereth
         http://lancealbertson.com

            Check it out at: http://code.google.com/p/ganeti/
                        Or just search for "Ganeti"
        Try it. Love it. Improve it. Contribute back (CLA required).
                   © 2009-2012 Oregon State University
Use under CC-by-SA / Some content borrowed/modified from Iustin Pop (with
                               permission)

Hands on Virtualization with Ganeti (part 1) - LinuxCon 2012
