Compute 101 - OpenStack Summit Vancouver 2015

OpenStack Compute 101
Stephen Gordon (@xsgordon)
Sr. Technical Product Manager,
Red Hat

Agenda
● Overview
● Instance Lifecycle
● Compute Drivers
● Scaling Compute
● Segregating Compute
● New in Kilo

OVERVIEW

What is OpenStack?
● A group of related projects that when combined form an
Open Source cloud infrastructure platform for providing
Infrastructure-as-a-Service.
● Intended to be “massively scalable”, scales horizontally
not vertically, on commodity hardware.
● Modular architecture allows consumers of the platform
to deploy only what they need.

OpenStack Components

What is OpenStack Compute (Nova)?
● One of the two original OpenStack projects, along with
Object Storage (Swift).
● Exposes a rich API for defining compute instances and
managing their lifecycle.
● Pluggable support for multiple common hypervisor
platforms, relatively solution agnostic.

Compute Components
● RESTful nova-api
interface exposed on TCP
port 8774.
● AMQP message queue
used for RPC
communications.
● nova-scheduler handles
hypervisor selection for
instance placement.

Components (cont.)
● nova-compute acts as the
Compute agent, interacting
with the relevant
hypervisor APIs to
launch/manage guests.
● nova-conductor handles
database access
(no-db-compute).

Other Components
● Metadata service - nova-metadata-api
● Traditional networking model - nova-network
● L2 agent - e.g.:
○ neutron-openvswitch-agent
○ neutron-linuxbridge-agent
● Ceilometer agent:
○ openstack-ceilometer-compute
● EC2 API: nova-ec2, nova-cert
● Console Auth and Proxies: noVNC, SPICE, etc.

INSTANCE LIFECYCLE

Authentication
$ cat keystonerc_demo
export OS_USERNAME=demo
export OS_TENANT_NAME=demo
export OS_PASSWORD=c8500b92ed7f4ed0
export OS_AUTH_URL=http://93.184.216.34:5000/v2.0/
export PS1='[\u@\h \W(keystone_demo)]\$ '
$ source keystonerc_demo
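
Once the file has been sourced, client tools read the credentials from the environment. A minimal sketch of that lookup, assuming only the four OS_* variables shown above; the helper name load_credentials is hypothetical and not part of any OpenStack client library:

```python
import os

# The four settings exported by keystonerc_demo.
REQUIRED = ("OS_USERNAME", "OS_TENANT_NAME", "OS_PASSWORD", "OS_AUTH_URL")


def load_credentials(environ=None):
    """Return the credential settings as a dict, or raise if any is missing.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED}


# Values copied from the keystonerc_demo listing above.
EXAMPLE_ENV = {
    "OS_USERNAME": "demo",
    "OS_TENANT_NAME": "demo",
    "OS_PASSWORD": "c8500b92ed7f4ed0",
    "OS_AUTH_URL": "http://93.184.216.34:5000/v2.0/",
}
print(load_credentials(EXAMPLE_ENV)["OS_AUTH_URL"])
```

Passing a plain dict instead of os.environ keeps the sketch easy to try without exporting anything.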

Instance Creation
● Instances are created using the nova boot command.
● The minimal set of arguments includes a flavor and an
image:
$ nova boot --flavor <flavor> --image <image>
[--nic net-id=<net-id>] <name>
● Flavor determines the “size” of an instance.
● Image determines the disk image used to boot the
instance.

Flavor Selection
● Flavors simplify the process of packing
instances onto physical hosts.
● Each flavor is typically twice
the size (CPU, RAM, Disk) of
the next smaller flavor, and so on.
● Admin may want to customize
depending on workload
patterns.
http://bit.ly/1QPNVaZ

Flavor Selection
$ nova flavor-list
+----+-----------+-----------+------+-----------+------+-------+
| ID | Name      | Memory_MB | Disk | Ephemeral | Swap | VCPUs |
+----+-----------+-----------+------+-----------+------+-------+
| 1  | m1.tiny   | 512       | 1    | 0         |      | 1     |
| 2  | m1.small  | 2048      | 20   | 0         |      | 1     |
| 3  | m1.medium | 4096      | 40   | 0         |      | 2     |
| 4  | m1.large  | 8192      | 80   | 0         |      | 4     |
| 5  | m1.xlarge | 16384     | 160  | 0         |      | 8     |
+----+-----------+-----------+------+-----------+------+-------+
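
The doubling pattern can be checked against the flavor list above. The sketch below starts from the m1.small values (2048 MB, 20 GB, 1 vCPU) and doubles each step; the function name flavor_ladder is hypothetical. Note that m1.tiny does not follow the pattern, so it is left out.

```python
def flavor_ladder(base, names):
    """Return (name, ram_mb, disk_gb, vcpus) rows, doubling every dimension per step."""
    ram_mb, disk_gb, vcpus = base
    ladder = []
    for name in names:
        ladder.append((name, ram_mb, disk_gb, vcpus))
        ram_mb, disk_gb, vcpus = ram_mb * 2, disk_gb * 2, vcpus * 2
    return ladder


# Base values taken from the m1.small row of the flavor list.
LADDER = flavor_ladder((2048, 20, 1),
                       ["m1.small", "m1.medium", "m1.large", "m1.xlarge"])
for row in LADDER:
    print("%-10s %6d MB %4d GB %2d vCPU" % row)
```

Because every dimension doubles together, two instances of one flavor occupy the same space as one instance of the next flavor up, which is what makes packing predictable.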

Flavor Selection
$ nova flavor-show m1.small
+----------------------------+----------+
| Property                   | Value    |
+----------------------------+----------+
| ...                        | ...      |
| extra_specs                | {}       |
| id                         | 2        |
| name                       | m1.small |
| os-flavor-access:is_public | True     |
| ram                        | 2048     |
| rxtx_factor                | 1.0      |
| swap                       |          |
| vcpus                      | 1        |
+----------------------------+----------+

Network Selection
$ neutron net-list
+--------------------------------------+---------+------------------------------------------------------+
| id | name | subnets |
+--------------------------------------+---------+------------------------------------------------------+
| 605b65dd-dd7a-4f82-91f3-7c10d8e2e448 | public | 59358224-3090-4970-b07e-330b867a4411 172.24.4.224/28 |
| 7a9a376d-88cc-41ae-a08f-e3ca274f88cd | private | d68302bf-6397-480d-a61a-1eaa45e9edb9 10.0.0.0/24 |
+--------------------------------------+---------+------------------------------------------------------+

What just happened?
● Retrieved token and endpoints from Keystone API
○ Compute end-point of the form: http[s]://<ip>:8774/v2/%(tenant_id)s
● Confirmed image identifier:
○ Retrieved list of available images from Nova API
■ http://93.184.216.34:8774/v2/fc50f6843ba644baaae2af0398e7f04e/images
○ Retrieved specific image detail from Nova API
■ .../v2/fc50f6843ba644baaae2af0398e7f04e/images/3a752292-4484-469c-a716-de2542b5742f
● Confirmed flavor identifier:
○ Retrieved list of available flavors from Nova API
■ .../v2/fc50f6843ba644baaae2af0398e7f04e/flavors
○ Retrieved specific flavor detail from Nova API
■ .../v2/fc50f6843ba644baaae2af0398e7f04e/flavors/2
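
The URLs above all hang off the tenant-scoped compute endpoint. A small sketch of how they are assembled from the endpoint template; the %(host)s placeholder and the helper name compute_url are assumptions added for illustration, while the host and tenant ID are the ones from the example:

```python
# Template modelled on the endpoint form shown above; %(host)s is an
# illustrative placeholder, %(tenant_id)s is the one used in the catalog entry.
ENDPOINT_TEMPLATE = "http://%(host)s:8774/v2/%(tenant_id)s"


def compute_url(host, tenant_id, *path):
    """Join path segments onto the tenant-scoped compute endpoint."""
    base = ENDPOINT_TEMPLATE % {"host": host, "tenant_id": tenant_id}
    return "/".join([base] + [str(segment) for segment in path])


HOST = "93.184.216.34"
TENANT = "fc50f6843ba644baaae2af0398e7f04e"

print(compute_url(HOST, TENANT, "images"))
print(compute_url(HOST, TENANT, "flavors", 2))
```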

What just happened? (cont.)
● User request was sent to the compute endpoint in
JSON format:
{"server":
{"name": "test-instance",
"imageRef": "3a752292-4484-469c-a716-de2542b5742f",
"flavorRef": "2", "max_count": 1, "min_count": 1,
"networks": [{"uuid": "7a9a376d-88cc-41ae-a08f-e3ca274f88cd"}]
}
}
● Request is picked up by nova-api service.
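
The request body above can be built programmatically. This is a minimal sketch that only mirrors the fields shown in the example; the function name build_boot_request is hypothetical, and a real client handles many more options:

```python
import json


def build_boot_request(name, image_ref, flavor_ref, network_ids, count=1):
    """Build a server-create body with the same fields as the example above."""
    return {
        "server": {
            "name": name,
            "imageRef": image_ref,
            "flavorRef": str(flavor_ref),
            "min_count": count,
            "max_count": count,
            # One entry per network the instance should be attached to.
            "networks": [{"uuid": net_id} for net_id in network_ids],
        }
    }


BODY = build_boot_request(
    "test-instance",
    "3a752292-4484-469c-a716-de2542b5742f",
    2,
    ["7a9a376d-88cc-41ae-a08f-e3ca274f88cd"],
)
print(json.dumps(BODY, indent=2))
```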

What just happened? (cont.)
● nova-api:
○ Extracts parameters for basic validation.
○ Retrieves a reference to the selected flavor.
○ Retrieves a reference to selected boot media:
■ Image using Glance client (in this example); OR
■ Volume using Cinder client (boot from volume)
○ Saves initial instance state to database.
○ Puts a message on the message queue for the conductor.
● API call returns at this point, with instance status of
BUILD, task state SCHEDULING.

Scheduling
● Conductor asks the scheduler where to build the
instance.
● Default implementation is a filter scheduler
● Applies filters and weights based on configuration
○ Filter examples:
■ ComputeFilter - is this host on?
■ CoreFilter - is this host exposing enough free vCPUs?
■ RamFilter - is this host exposing enough free vRAM?
■ ImagePropertiesFilter - does this host conform to selected image properties
(architecture, hypervisor type, etc.).
○ Weight examples:
■ RAM Weigher - give preference to hosts with more or less RAM free.
● Can also take user provided hints
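
The filter-then-weigh flow can be sketched in a few lines. This is a simplified illustration, not Nova's scheduler code: the three filter functions are stand-ins for ComputeFilter, CoreFilter and RamFilter, the weigher prefers hosts with more free RAM, and the host names and capacities are made up.

```python
from collections import namedtuple

Host = namedtuple("Host", "name is_up free_vcpus free_ram_mb")
Request = namedtuple("Request", "vcpus ram_mb")


# Simplified stand-ins for the filters named above.
def compute_filter(host, request):
    return host.is_up


def core_filter(host, request):
    return host.free_vcpus >= request.vcpus


def ram_filter(host, request):
    return host.free_ram_mb >= request.ram_mb


def ram_weigher(host):
    """Higher weight for hosts with more free RAM (spreads instances out)."""
    return host.free_ram_mb


def schedule(hosts, request,
             filters=(compute_filter, core_filter, ram_filter),
             weigher=ram_weigher):
    """Drop hosts that fail any filter, then pick the heaviest survivor."""
    candidates = list(hosts)
    for host_filter in filters:
        candidates = [h for h in candidates if host_filter(h, request)]
    if not candidates:
        raise RuntimeError("no valid host found")
    return max(candidates, key=weigher)


# Hypothetical hosts for illustration.
HOSTS = [
    Host("node-a", True, 4, 3208),
    Host("node-b", True, 1, 8000),   # fails the core filter
    Host("node-c", False, 8, 9000),  # fails the compute filter
    Host("node-d", True, 2, 1024),   # fails the RAM filter
    Host("node-e", True, 8, 6000),
]
print(schedule(HOSTS, Request(vcpus=2, ram_mb=2048)).name)
```

Negating the weigher would give preference to hosts with less free RAM, packing instances together instead of spreading them.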

Filter Scheduler Example

Filter Scheduler Example (cont.)
● Running with debug=True:
[req-... None] Starting with 3 host(s)
[req-... None] Filter RetryFilter returned 3 host(s)
[req-... None] Filter AvailabilityZoneFilter returned 3 host(s)
[req-... None] Filter RamFilter returned 2 host(s)
...
[req-... None] Filtered [(localhost.localdomain, localhost.localdomain)
ram:3208 disk:7168 io_ops:0 instances:1] _schedule ...
[req-... None] Weighed [ WeighedHost [host: (localhost.localdomain,
localhost.localdomain) ram:3208 disk:7168 io_ops:0 instances:1, weight:
1.0]] ...

Scheduling (cont.)
● Updates instance state in database.
● Returns to conductor, conductor places message on the
queue for openstack-nova-compute (the compute
agent) on the selected compute node.

Compute Agent
● Prepares for instance launch:
○ Calls Glance and/or Cinder to retrieve boot media info (image or
volume).
○ Calls Neutron or nova-network to get network and security group
information and “plug” virtual interfaces.
○ Calls Cinder to attach volume if necessary.
○ Sets up configuration drive if necessary.
● Uses hypervisor APIs to create virtual machine!
● Updates virtual machine state in DB (using conductor).

COMPUTE DRIVERS

Driver Selection
● Two tools to help guide operators:
○ Driver testing status
■ “Is this driver tested using unit and/or functional tests in the gate?”
○ Hypervisor support matrix
■ “Does this driver support actions x, y, and z?”

Driver Testing Status
● Multi-tiered:
○ Group A - Fully supported.
■ Coverage includes unit and functional tests in the gate.
○ Group B - Middle ground.
■ Test coverage includes unit tests that gate commits, functional testing by an external
system that does not gate but does comment on patches.
○ Group C - Drivers that have limited testing, use at own risk.
■ Test coverage includes (potentially) unit tests that gate commits and no public
functional testing.
● https://wiki.openstack.org/wiki/HypervisorSupportMatrix#Driver_Testing_Status

Hypervisor Support Matrix
● Lists mandatory and optional driver capabilities:
○ http://docs.openstack.org/developer/nova/support-matrix.html
● Examples of capabilities:
○ Launch instance (mandatory)
○ Attach block volume to instance (optional)

Hypervisor Support Matrix
● 11+ in-tree drivers:
○ Hyper-V
○ Ironic
○ Libvirt:
■ KVM (x86)
■ KVM (ppc64)
■ KVM (s390)
■ QEMU (x86)
■ LXC
■ Xen
■ Parallels CT
■ Parallels VM
○ VMware vCenter
○ XenServer
● Out of tree (stackforge):
○ Docker
○ PowerVM
○ zVM
● Others may exist!

SCALING COMPUTE

Scaling Compute
● Compute services scale
horizontally (simply add
more).
● Scheduler needs to be
scaled a little more
carefully.
● Message queue and
database can be
clustered.

Cells
● Divide multiple compute
installations into “cells”.
● API cell handles incoming
requests, schedules to a compute
cell.
● Each cell has an instance of
nova-cells, its own message
queue and database.

Cells
● Pros:
○ Maintain a single compute endpoint.
○ Relieve pressure on queues/database at
scale.
○ Introduce additional layer of scheduling.

Cells
● Cons:
○ Lack of “cell awareness” in other projects
(e.g. Neutron).
○ Minimal test coverage in the gate.
○ Some standard functionality remains
broken with cells (Security Groups, Host
Aggregates).
● CellsV2, currently under
development, offers more promise
for the future.

SEGREGATING COMPUTE

Why Segregate Compute Resources?
● Expose logical groupings:
○ Geographical region, data center, rack, power source, network, etc.
● Expose special capabilities:
○ Faster NICs, storage, special devices, etc.
● The divisions mean whatever you want them to mean!

Regions
● Complete OpenStack deployments
○ Share as many or as few services as
needed.
○ Implement their own targetable API
endpoints, networks, and compute.
● By default all services are in one region; register endpoints in another with:
$ keystone endpoint-create --region
"RegionTwo" ...
● Target actions at a region's endpoint:
$ nova --os-region-name "RegionTwo" boot ...

Host Aggregates
● Logical groupings of hosts based on metadata.
● Typically metadata describes capabilities hosts expose:
○ SSD hard disks for ephemeral data storage.
○ PCI devices for passthrough.
○ Etc.
● Hosts can be in multiple host aggregates:
○ “Hosts that have SSD storage and 40G interfaces”.

Host Aggregates (cont.)
● Implicitly user targetable:
○ Admin defines host aggregate with metadata and flavor to match:
■ $ nova aggregate-create hypervisors-with-SSD
■ $ nova aggregate-set-metadata 1 SSDs=true
■ $ nova aggregate-add-host 1 hypervisor-1
■ $ nova flavor-key 1 set
aggregate_instance_extra_specs:SSDs=true
○ User selects flavor when requesting instance.
○ Scheduler places on host aggregate with metadata matching flavor
extra specifications using AggregateInstanceExtraSpecsFilter
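
The matching step can be illustrated as follows. This is a simplified sketch of the idea behind AggregateInstanceExtraSpecsFilter, not its actual implementation: it only handles exact string matches on keys carrying the aggregate_instance_extra_specs: scope, and the function name host_passes is hypothetical.

```python
PREFIX = "aggregate_instance_extra_specs:"


def host_passes(flavor_extra_specs, aggregates_metadata):
    """Return True if every scoped flavor spec is matched by the metadata of
    at least one aggregate the host belongs to.

    aggregates_metadata is a list of metadata dicts, one per aggregate.
    """
    for key, wanted in flavor_extra_specs.items():
        if not key.startswith(PREFIX):
            continue  # keys with other scopes are ignored in this sketch
        name = key[len(PREFIX):]
        offered = {meta[name] for meta in aggregates_metadata if name in meta}
        if wanted not in offered:
            return False
    return True


flavor_specs = {"aggregate_instance_extra_specs:SSDs": "true"}
print(host_passes(flavor_specs, [{"SSDs": "true"}]))  # host in the SSD aggregate
print(host_passes(flavor_specs, []))                  # host in no aggregate
```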

Availability Zones
● Logical groupings of hosts based on arbitrary factors
like:
○ Location (country, data center, rack, etc.)
○ Network layout
○ Power source
● Explicitly user targetable:
$ nova boot --availability-zone "rack-1"

Availability Zones
● Host aggregates are made explicitly user targetable by
creating them as an AZ:
○ $ nova aggregate-create tier-1 us-east-tier-1
○ tier-1 is the aggregate name, us-east-tier-1 is the AZ name.
● The host aggregate is the availability zone!
○ Unlike aggregates, hosts cannot be in multiple availability zones.
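
The difference between the two groupings can be expressed as a membership rule. The sketch below is an illustrative in-memory model, not Nova code: the Aggregate class, add_host function, and the second zone name us-east-tier-2 are all hypothetical.

```python
class Aggregate:
    """Hypothetical stand-in for a host aggregate."""

    def __init__(self, name, availability_zone=None):
        self.name = name
        self.availability_zone = availability_zone  # None: not exposed as an AZ
        self.hosts = set()


def add_host(aggregates, target, host):
    """Add host to target, refusing to place it in a second availability zone."""
    if target.availability_zone is not None:
        for agg in aggregates:
            other_az = agg.availability_zone
            if host in agg.hosts and other_az not in (None, target.availability_zone):
                raise ValueError("%s is already in availability zone %s"
                                 % (host, other_az))
    target.hosts.add(host)


ssd = Aggregate("hypervisors-with-SSD")          # plain aggregate
tier1 = Aggregate("tier-1", "us-east-tier-1")    # aggregate exposed as an AZ
tier2 = Aggregate("tier-2", "us-east-tier-2")    # hypothetical second AZ
ALL = [ssd, tier1, tier2]

add_host(ALL, ssd, "hypervisor-1")    # fine: aggregates may overlap
add_host(ALL, tier1, "hypervisor-1")  # fine: first availability zone
```

Adding hypervisor-1 to tier-2 at this point raises ValueError, because the host already belongs to us-east-tier-1.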

SEGREGATION EXAMPLE

NEW IN KILO

API Microversions
● Compute API V2 has been in place for some time, was
to be superseded by V3.
● Determined that implementing new major version of API
would be too difficult:
○ User impact.
○ Developer overhead.
● V2 is extended by adding “extensions”, lots of them.

API Microversions
● Microversions aim to:
○ Make it possible to evolve the API incrementally.
○ Provide backwards compatibility for REST API users.
○ Improve code cleanliness to make doing the “right thing” easier.

API Microversions
● Use a single monotonic counter of the form X.Y where:
○ X will only be changed when a significant backwards incompatible
API change is made. Expected to be incremented rarely, if ever.
○ Y will be changed when making any change to the API. Whether such
a change is backwards compatible or not will be reflected via
documentation.
● Client will specify the version it supports, e.g.:
○ X-OpenStack-Nova-API-Version: 2.114
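
Because X and Y are separate integer counters, versions must be compared as pairs of integers rather than as decimal numbers. A minimal sketch, with hypothetical helper names:

```python
HEADER = "X-OpenStack-Nova-API-Version"


def parse_microversion(text):
    """Split 'X.Y' into an (X, Y) tuple of integers."""
    major, minor = text.split(".")
    return int(major), int(minor)


def supports(requested, minimum):
    """True if the requested version is at least the minimum version."""
    return parse_microversion(requested) >= parse_microversion(minimum)


def version_header(version):
    """Build the request header a client would send for the given version."""
    parse_microversion(version)  # raises ValueError on malformed input
    return {HEADER: version}


print(supports("2.114", "2.12"))        # tuple comparison: (2, 114) >= (2, 12)
print(float("2.114") >= float("2.12"))  # decimal comparison gives the wrong answer
print(version_header("2.114"))
```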

API Microversions
● Initial implementation in Kilo:
○ v2.0 API code still used to serve v2.0 API requests.
■ Plan is in Liberty v2.1 API code will serve both v2.0 and v2.1.
○ v2.0 API is frozen:
■ All new features will be added to v2.1 using microversions.
○ python-novaclient does not yet support v2.1.

vCPU Pinning
● Allows assignment of vCPU cores, and the associated
emulator threads, to dedicated pCPU cores.
● Administrator defines host(s) that accept dedicated
resourcing requests, scheduler places guests on them.
○ Reserve cores for guests using kernel isolcpus and nova
vcpu_pin_set
○ Create flavor and matching host aggregates.
● Scheduler and agent work together to assign
appropriate CPU cores for vCPUs.
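
The core idea of dedicated placement is a one-to-one mapping from guest vCPUs to host pCPUs. The sketch below shows only that mapping; it ignores NUMA topology and emulator threads, and the function name pin_vcpus and the core numbers are made up for illustration.

```python
def pin_vcpus(num_vcpus, free_pcpus):
    """Map each guest vCPU to its own host pCPU, lowest-numbered cores first."""
    available = sorted(free_pcpus)
    if num_vcpus > len(available):
        raise ValueError("not enough free dedicated pCPUs")
    return dict(zip(range(num_vcpus), available))


# Hypothetical host: cores 4-7 are reserved for guests, core 4 is already in use.
reserved_for_guests = {4, 5, 6, 7}
already_pinned = {4}
free = reserved_for_guests - already_pinned

print(pin_vcpus(2, free))
```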

Huge Pages
● Huge pages allow the use of larger page sizes (2 MB, 1
GB), increasing CPU TLB cache efficiency.
○ Backing guest memory with huge pages allows predictable memory
access, at the expense of the ability to over-commit.
○ Different workloads extract different performance characteristics from
different page sizes - bigger is not always better!
● Administrator reserves large pages during compute
node setup and creates flavors to match:
○ hw:mem_page_size=large|small|any|2048|1048576
● User requests using flavor or image properties.
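
A quick way to size the reservation is to work out how many pages a flavor's RAM needs. The sketch assumes the numeric hw:mem_page_size values are expressed in KiB (2048 for 2 MB pages, 1048576 for 1 GB pages); the function name pages_needed is hypothetical.

```python
def pages_needed(ram_mb, page_size_kb):
    """Number of huge pages needed to back ram_mb of guest memory."""
    ram_kb = ram_mb * 1024
    if ram_kb % page_size_kb:
        raise ValueError("flavor RAM is not a multiple of the page size")
    return ram_kb // page_size_kb


# m1.small has 2048 MB of RAM.
print(pages_needed(2048, 2048))     # 2 MB pages
print(pages_needed(2048, 1048576))  # 1 GB pages
```

A flavor whose RAM is not a whole multiple of the page size, such as 512 MB with 1 GB pages, is rejected by this sketch.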

I/O (PCIe) based NUMA Scheduling
● Extends Libvirt driver to capture NUMA locality of PCI
devices on the host.
● Extends NUMATopologyFilter to take into account
locality of any PCI devices being passed to the guest.

Standalone EC2 API
● Aims to:
○ Implement AWS Virtual Private Cloud API.
○ Provide the EC2 API as a standalone service.
○ Ultimately replace/supersede current Nova EC2 implementation.
● Current state:
○ Recent 0.1.0 release:
■ https://launchpad.net/ec2-api/trunk/0.1.0
○ In addition to Nova EC2 API coverage includes:
■ VPC API
■ Filtering
■ Tags
■ Paging

Storage Enhancements
● Consistent snapshots using qemu-guest-agent
● Libvirt driver support for KVM/QEMU built-in iSCSI
initiator - allow direct attachment of volumes to guests.
● vCenter driver support for vSAN datastores.
● vCenter driver support for ephemeral disks.
● Libvirt and Hyper-V driver support for SMB based
volumes.

New In-tree Driver Support
● Libvirt driver support for IBM System Z (KVM)
● Libvirt driver support for Parallels Cloud Server

THANK YOU
@xsgordon
http://www.slideshare.net/sgordon2/
