The document discusses plans to deploy baremetal instances on OpenStack for the NII dodai-compute2.0 project. It compares the architectures of dodai-compute1.0 and NTTdocomo-openstack with respect to coupling with Nova Scheduler, OS provisioning, and network virtualization. The goal is to extend the upstream framework to support the use cases of NII's Academic Research Cloud while keeping the contributions in the upstream project.
2. NII dodai-compute2.0 project
$ who am i
Etsuji Nakai
– Senior solution architect and cloud evangelist at Red Hat.
– Working for NII (National Institute of Informatics Japan) as a cloud technology consultant.
– The author of the “Professional Linux Systems” series.
• Available only in Japanese. Translation offers from publishers are welcome ;-)
[Book covers: Professional Linux Systems: Technology for Next Decade / Professional Linux Systems: Deployment and Management / Professional Linux Systems: Network Management]
Why does baremetal matter?
General use cases
– I/O Intensive application (RDB)
– Realtime application (Deterministic latency)
– Native Processor Features
– etc....
Specific use case in NII's “Academic Research Cloud (ARC)”
– Flexible extension of existing server cluster.
– Flexible extension of existing cloud infrastructure.
Academic Research Cloud (ARC) at NII today
This is a prototype of the Japan-wide research cloud.
– It's now running in NII's laboratories, and will be extended into a Japan-wide research cloud.
Research labs can extend their existing clusters (HPC clusters, cloud infrastructures, etc.) by attaching baremetal servers from the resource pool.
[Figure: Through the Self Service Portal, baremetal servers in the Baremetal Resource Pool are provisioned/de-provisioned on demand and attached over an L2 connection (VLAN) to an Existing HPC Cluster or Existing Cloud Infrastructure, flexibly extending the existing cluster.]
Future plan of the ARC.
ARC will be extended into a Japan-wide cloud with SINET4 WAN connections.
– SINET4 is an MPLS-based wide area Ethernet service for academic facilities in Japan, operated by NII.
[Figure: Baremetal Resource Pool, Existing HPC Clusters, and Existing Cloud Infrastructures interconnected over the SINET4 MPLS-based Wide Area Ethernet (http://www.sinet.ad.jp/index_en.html).]
Overview of dodai-compute1.0
What is dodai-compute?
– Baremetal driver extension of Nova, currently used in ARC.
• Designed and developed by NII in 2012
• Based on Diablo with Ubuntu 11.10
• Source code: https://github.com/nii-cloud/dodai-compute
– Upside: Simple extension aimed at the specific use case :-)
– Downside: Unsuitable for general use cases :-(
• Cannot manage a mixed environment of baremetal and hypervisor hosts.
• One-to-one mapping from instance flavor to baremetal host. (No scheduling logic to select a suitable host automatically.)
• Nonstandard use of availability zones. (Used for host status management.)
The most outstanding issue: it's not merged upstream. No community support, no future!
Planning of ARC baremetal provisioning feature
It should be designed based on the upstream framework.
– Existing framework: GeneralBareMetalProvisioningFramework.
• So-called “NTTdocomo-openstack.”
• Blueprint - http://wiki.openstack.org/GeneralBareMetalProvisioningFramework
• Source code - https://github.com/NTTdocomo-openstack/nova
As a first step, we compared the architectures of “dodai-compute” and “NTTdocomo-openstack”, and considered the following questions.
– What's common and what's different?
– What can be further generalized in “NTTdocomo-openstack”?
– What should be added for ARC?
The goals of the “dodai-compute2.0” project are:
- To extend the upstream framework for ARC.
- Not to become a private branch, but to stay in the upstream.
Note:
– The NTTdocomo-openstack branch has been merged into the upstream with many modifications. Although this presentation is based on the NTTdocomo-openstack branch, future extensions will be done directly on the upstream.
By the way, what does “dodai” stand for?
1. Base, Foundation, Framework, etc...
2. A sub flight system (SFS) featured in Mobile Suit Gundam.
General flow of instance launch
Question:
– How can we use baremetal servers in place of VM instances in this structure?
[Figure: Each Compute Driver registers its host to Nova Scheduler. Nova Scheduler selects a host for the new instance and asks that host's Compute Driver to launch the instance; the driver launches the VM.]
A1. Register “Baremetal Pool” as an “Instance Host”
dodai-compute takes this approach. Its driver acts as a single host which accommodates multiple baremetal servers.
[Figure: Each Compute Driver registers its Baremetal Pool to Nova Scheduler. Nova Scheduler selects a pool for the new instance and asks that pool's Compute Driver to launch the instance; the driver then selects a baremetal server in the pool and launches it.]
A2. Register each baremetal as a “Single Instance Host”
NTTdocomo-openstack takes this approach. Its driver acts as a proxy for baremetal servers, each of which accommodates just one instance.
[Figure: The Compute Driver registers each baremetal server as a host to Nova Scheduler. Nova Scheduler selects a baremetal server for the new instance and asks the Compute Driver to launch the instance; the driver launches the selected baremetal server.]
Class structure for coupling with Nova
dodai-compute1.0 and NTTdocomo-openstack have basically the same class structure in terms of coupling with Nova.
– The diagram shows the case of dodai-compute1.0.
– NTTdocomo-openstack uses “BareMetalDriver” in place of “DodaiConnection”.
[Figure: class diagram with a base class for different kinds of virtualization hosts, a driver for libvirt-managed hypervisors (KVM/LXC), and a driver for baremetal management (DodaiConnection).]
https://github.com/nii-cloud/dodai-compute/wiki/Developer-guide
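The coupling can be pictured with a minimal Python sketch, assuming a one-method driver interface. The class names follow the roles in the diagram, but the `spawn(instance_name, flavor)` signature and the return strings are hypothetical simplifications, not Nova's actual driver API.

```python
from abc import ABC, abstractmethod


class ComputeDriver(ABC):
    """Base class for different kinds of virtualization hosts (simplified)."""

    @abstractmethod
    def spawn(self, instance_name: str, flavor: str) -> str:
        """Launch an instance and describe where it runs."""


class LibvirtDriver(ComputeDriver):
    """Driver for a libvirt-managed hypervisor (KVM/LXC)."""

    def spawn(self, instance_name: str, flavor: str) -> str:
        # A real driver would define and start a libvirt domain here.
        return f"VM {instance_name} ({flavor}) on hypervisor"


class BareMetalDriver(ComputeDriver):
    """Driver for baremetal management (the DodaiConnection role)."""

    def spawn(self, instance_name: str, flavor: str) -> str:
        # A real driver would power on a server via IPMI and PXE boot it.
        return f"{instance_name} ({flavor}) on baremetal server"


def launch(driver: ComputeDriver, instance_name: str, flavor: str) -> str:
    # The compute service only uses the base-class interface, so it does
    # not need to know which kind of host the driver manages.
    return driver.spawn(instance_name, flavor)


for drv in (LibvirtDriver(), BareMetalDriver()):
    print(launch(drv, "inst-1", "m1.small"))
```

Because both drivers subclass the same base class, swapping a hypervisor host for a baremetal host requires no change on the caller's side, which is why both projects could plug into Nova the same way.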
How does Nova Scheduler see baremetal servers?
dodai-compute's driver acts as a single host which accommodates multiple baremetal servers.
– It's like representing a baremetal pool as a single “Host” which runs baremetal servers as its “VMs”.
– The scheduling policy is implemented on the driver side. (Nova Scheduler has no choice of hosts.)
[Figure: Nova API → Nova Scheduler → Nova Compute (dodaiConnection). The scheduler recognizes Nova Compute as a single host of “baremetal VMs”; Nova Compute chooses the baremetal server to provision by referring to the dodai db (baremetal server information).]
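A minimal sketch of this “pool as one host” model, assuming the dodai db can be reduced to a one-to-one mapping from instance type to server. `DodaiPoolHost`, `report_host()` and `select_server()` are hypothetical names for illustration, not dodai-compute's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class DodaiPoolHost:
    """One 'host' that, from the scheduler's view, runs baremetal 'VMs'."""

    host_name: str
    # Stand-in for the dodai db: one-to-one mapping of instance type to server.
    flavor_to_server: dict[str, str]
    in_use: set[str] = field(default_factory=set)

    def report_host(self) -> dict:
        # The scheduler only sees a single host; individual servers are hidden.
        free = len(self.flavor_to_server) - len(self.in_use)
        return {"host": self.host_name, "free_slots": free}

    def select_server(self, flavor: str) -> str:
        # Driver-side "scheduling": the flavor alone determines the server.
        server = self.flavor_to_server.get(flavor)
        if server is None:
            raise LookupError(f"no server is mapped to flavor {flavor!r}")
        if server in self.in_use:
            raise RuntimeError(f"server {server!r} is already in use")
        self.in_use.add(server)
        return server


pool = DodaiPoolHost("dodai-pool", {"bm.node01": "node01", "bm.node02": "node02"})
print(pool.report_host())
print(pool.select_server("bm.node01"))
```

Since each flavor names one server, a user can pick a baremetal server explicitly, but nothing selects a suitable server automatically; that is the trade-off discussed below.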
How does Nova Scheduler see baremetal servers?
The NTTdocomo-openstack driver acts as a proxy for all baremetal hosts.
– Each baremetal server is seen as an independent host which can accommodate up to one instance.
– The scheduling policy is implemented as a part of Nova Scheduler. It uses “extra_specs” metadata to distinguish baremetal hosts from hypervisor hosts.
[Figure: Nova API → Nova Scheduler → Nova Compute (BareMetalDriver). Nova Compute registers all baremetal hosts, each a host of just one instance, by referring to the baremetal db (baremetal server information), and the scheduler recognizes all of them. Example: extra_specs=cpu_arch:x86_64]
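The per-host model can be sketched as a filter step, assuming each baremetal host reports a dictionary of capabilities and holds at most one instance. `BaremetalHost`, `host_passes()` and `filter_hosts()` are hypothetical names that only mimic the idea of a Filter Scheduler filter; they are not Nova's filter classes.

```python
from dataclasses import dataclass


@dataclass
class BaremetalHost:
    name: str
    capabilities: dict[str, str]
    instances: int = 0  # a baremetal host accommodates at most one instance

    def is_free(self) -> bool:
        return self.instances == 0


def host_passes(host: BaremetalHost, extra_specs: dict[str, str]) -> bool:
    # An occupied baremetal host can never take a second instance.
    if not host.is_free():
        return False
    # Every extra_specs key must match the capability the host reported.
    return all(host.capabilities.get(k) == v for k, v in extra_specs.items())


def filter_hosts(hosts: list[BaremetalHost], extra_specs: dict[str, str]) -> list[str]:
    return [h.name for h in hosts if host_passes(h, extra_specs)]


hosts = [
    BaremetalHost("bm-01", {"cpu_arch": "x86_64"}),
    BaremetalHost("bm-02", {"cpu_arch": "x86_64"}, instances=1),
    BaremetalHost("kvm-01", {"hypervisor": "kvm"}),
]
print(filter_hosts(hosts, {"cpu_arch": "x86_64"}))
```

In this toy data the hypervisor host does not report `cpu_arch`, so the extra_specs entry alone separates baremetal hosts from hypervisor hosts, and the occupied `bm-02` is skipped because it already holds its one instance.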
Considerations on the Nova Scheduler coupling
dodai-compute
– Scheduling (server selection logic) is up to the driver.
• Currently, there's no intelligence in the driver's scheduler. One-to-one mappings between physical servers and instance types are pre-defined.
• However, this enables users to choose a baremetal server explicitly.
NTTdocomo-openstack
– Scheduling (server selection logic) is up to Nova Scheduler.
• Currently, the standard “Filter Scheduler” is used.
• “instance_type_extra_specs=cpu_arch:x86_64” is used to distinguish baremetal hosts from hypervisor hosts.
• Users cannot explicitly choose which baremetal server to use.
This must be addressed for the ARC use case.
We may use additional “labels” in instance_type_extra_specs, such as “instance_type_extra_specs=cpu_arch:x86_64,racklocation:a32”
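A minimal sketch of how such labels could narrow the candidate hosts, assuming the comma-separated `key:value` format of the example string above and that each host carries a dictionary of labels. `parse_extra_specs()` and `matching_hosts()` are hypothetical helpers, not existing Nova functions.

```python
def parse_extra_specs(spec: str) -> dict[str, str]:
    """Parse 'key:value,key:value' into a dict (format assumed from the example)."""
    labels = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"label {item!r} is not in key:value form")
        labels[key.strip()] = value.strip()
    return labels


def matching_hosts(hosts: dict[str, dict[str, str]], spec: str) -> list[str]:
    """Return names of hosts whose labels include every requested label."""
    wanted = parse_extra_specs(spec)
    return sorted(
        name
        for name, labels in hosts.items()
        if all(labels.get(k) == v for k, v in wanted.items())
    )


rack = {
    "bm-01": {"cpu_arch": "x86_64", "racklocation": "a32"},
    "bm-02": {"cpu_arch": "x86_64", "racklocation": "b07"},
}
print(matching_hosts(rack, "cpu_arch:x86_64"))
print(matching_hosts(rack, "cpu_arch:x86_64,racklocation:a32"))
```

With only `cpu_arch` both servers qualify; adding `racklocation:a32` pins the request to one rack, which gives users back some of the explicit choice dodai-compute offered while keeping the selection inside the scheduler.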
OS Installation Mechanism of dodai-compute1.0
The basic flow of OS installation in dodai-compute1.0
– Management IPs (IPMI) of baremetal servers are stored in a database.
– The driver prepares a boot image and an installation script.
– The actual installation work is handled by the script.
[Figure: BareMetal Driver, PXEBoot Server (pxe boot image), OS Installation Server, and Baremetal Server]
(1) The driver fetches the target image (a tar ball of the root file system contents) from Glance and prepares the installation script.
(2) The driver passes the installation script URL as a kernel parameter.
(3) The baremetal server fetches the installation script and runs it.
(4) The script fetches the image tar ball and expands it onto the local disk.
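Steps (2) and (3) can be sketched as follows, assuming a pxelinux-style boot configuration. The `DEFAULT`/`LABEL`/`KERNEL`/`APPEND` keywords are standard pxelinux syntax, but the kernel parameter name `dodai_script`, the file names and the URL are hypothetical placeholders. The boot image would read the parameter back from the kernel command line (on Linux, `/proc/cmdline`).

```python
def render_pxe_config(kernel: str, initrd: str, script_url: str) -> str:
    """Render a pxelinux.cfg entry that passes the script URL to the kernel.

    'dodai_script' is a hypothetical parameter name used for illustration.
    """
    return "\n".join([
        "DEFAULT install",
        "LABEL install",
        f"  KERNEL {kernel}",
        f"  APPEND initrd={initrd} dodai_script={script_url}",
        "",
    ])


def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Parse a kernel command line into key/value pairs.

    Tokens are split on whitespace (quoting is ignored); flags without '='
    map to an empty string.
    """
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


cfg = render_pxe_config("vmlinuz", "initrd.img", "http://install.example.com/node01.sh")
print(cfg)
# Inside the booted image, the script URL would be recovered like this:
print(parse_cmdline("initrd=initrd.img dodai_script=http://install.example.com/node01.sh quiet"))
```

Passing only a URL keeps the PXE boot image generic: the per-server logic lives in the downloaded script, which then partitions the disk, unpacks the tar ball and installs grub.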
OS Installation Mechanism of NTTdocomo-openstack
The basic flow of OS installation in NTTdocomo-openstack.
– Management IPs (IPMI) of baremetal servers are stored in a database.
– The driver prepares a boot image and an installation script.
– The actual installation work is handled by the script.
[Figure: BareMetal Driver, PXEBoot Server (pxe boot image), OS Installation Server, and Baremetal Server]
(1) The driver fetches the target image (a dd image of the root filesystem) from Glance and prepares the installation script.
(2) The driver embeds the installation script into the init script.
(3) The baremetal server exports its local disk as an iSCSI LUN and asks the installation service to fill it.
(4) The installation server attaches the iSCSI LUN and fills it with the dd image.
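Step (4) boils down to a sequential block copy, equivalent to running `dd` with a large block size against the attached LUN. The sketch below assumes the LUN is already attached and visible as a device path; attaching it over iSCSI is not shown, and `fill_device()` is a hypothetical helper. A regular file can stand in for the block device so the sketch runs anywhere.

```python
import os


def fill_device(image_path: str, device_path: str,
                block_size: int = 4 * 1024 * 1024) -> int:
    """Copy a raw disk image onto a target device, like `dd bs=4M`.

    Returns the number of bytes written. The target is opened without
    truncation ("r+b"), as a block device would be.
    """
    written = 0
    with open(image_path, "rb") as src, open(device_path, "r+b") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
        dst.flush()
        # Make sure the data reaches the device before the LUN is detached.
        os.fsync(dst.fileno())
    return written
```

Because the image is copied byte for byte, the result includes whatever partition layout the dd image carries, which is why no partitioning step is needed on the baremetal side in this method.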
OS Installation Mechanism
The basic framework is the same for both of them.
– Management IPs (IPMI) of baremetal servers are stored in a database.
– The driver prepares a PXE boot image to start the OS installation.
– The actual installation work is handled by scripts in the boot image.
The only difference lies in the actual installation method.
– Installation script of dodai-compute1.0:
• Make partitions and filesystems on the local disk.
• Fetch the tar.gz image and unbundle it directly onto the local filesystem.
• Install grub to the local disk.
– Installation script of NTTdocomo-openstack:
• Start tgtd (iSCSI target daemon) and export the local disk as an iSCSI LUN.
• Ask the external “Installation Server” to install the OS in that LUN.
• The installation server attaches the LUN and copies the “dd” image to it.
• Grub is not installed. The baremetal server relies on PXE boot even for bootstrapping the OS provisioned on the local disk.
So,...
Considerations on OS Installation Mechanism
We could provide a more general framework that allows multiple installation methods.
Registered machine images need to have metadata that specifies:
– Type of installation service
– FQDN of the installation service
• We may use the “properties” attribute of the image.
[Figure: BareMetal Driver, PXEBoot Server, OS Installation Servers A and B, and Baremetal Server]
(1) Prepare the target image in the corresponding installation service.
(2) Prepare the PXE boot image/initrd script corresponding to the selected installation service.
(3) The script in the initrd starts the installation using the selected installation service.
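The selection logic can be sketched as a lookup on the image properties. The property keys `install_type` and `install_server`, the supported type names, and the `initrd-<type>` boot profile naming are all hypothetical choices for illustration; they are not existing Glance or Nova conventions.

```python
from dataclasses import dataclass

# Installation methods known to this sketch (names are illustrative).
SUPPORTED_TYPES = {"tarball", "dd", "kickstart"}


@dataclass(frozen=True)
class InstallPlan:
    install_type: str
    install_server: str
    boot_profile: str  # which PXE boot image/initrd script to prepare


def plan_installation(image_properties: dict[str, str]) -> InstallPlan:
    """Pick the installation service from an image's properties.

    'install_type' and 'install_server' are hypothetical property keys.
    """
    try:
        install_type = image_properties["install_type"]
        install_server = image_properties["install_server"]
    except KeyError as missing:
        raise ValueError(f"image property {missing} is required") from None
    if install_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported installation type {install_type!r}")
    return InstallPlan(install_type, install_server, f"initrd-{install_type}")


print(plan_installation({"install_type": "dd", "install_server": "install-b.example.com"}))
```

Keeping the method and the server FQDN on the image means the driver itself stays generic: adding a new installation method only requires a new boot profile and an installation server that understands it.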
Considerations on OS Installation Mechanism
Candidate installation services:
– Existing ones, such as those in dodai-compute and NTTdocomo-openstack.
– We'd like to add a Kickstart method, too.
• The image contains a ks.cfg file instead of an actual binary image.
• The installation service installs the baremetal server using Kickstart.
Kickstart gives more flexibility and ease of use for customizing image contents.
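To illustrate the customization point, here is a minimal sketch that generates a ks.cfg from a few parameters. The directives used (`url`, `lang`, `keyboard`, `timezone`, `rootpw --iscrypted`, `bootloader`, `zerombr`, `clearpart`, `autopart`, `reboot`, `%packages`/`%end`) are standard Kickstart commands; the repository URL, password hash and package list are placeholders, and `render_kickstart()` is a hypothetical helper.

```python
def render_kickstart(repo_url: str, root_pw_hash: str, packages: list[str],
                     timezone: str = "Asia/Tokyo") -> str:
    """Build a minimal ks.cfg; every value passed in is a placeholder."""
    lines = [
        f"url --url={repo_url}",
        "lang en_US.UTF-8",
        "keyboard us",
        f"timezone {timezone}",
        f"rootpw --iscrypted {root_pw_hash}",
        "bootloader --location=mbr",
        "zerombr",
        "clearpart --all --initlabel",
        "autopart",
        "reboot",
        "",
        "%packages",
        *packages,
        "%end",
        "",
    ]
    return "\n".join(lines)


print(render_kickstart("http://repo.example.com/os", "$6$salt$hash", ["@core", "openssh-server"]))
```

Changing what a lab gets installed is then a matter of editing a package list or a few lines of text, rather than rebuilding and re-uploading a binary image.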
Network configuration of dodai-compute1.0
L2 separation is done by VLAN.
– Each lab has its own fixed VLAN ID assigned on SINET4.
– dodai-compute asks the OpenFlow controller to set up a SINET4 port/VLAN mapping. The VLAN is explicitly specified by the user.
– Mappings between the baremetal servers' NICs and the associated switch ports are stored in a database.
OS-side configuration is done by the local agent.
– NIC bonding is also configured for redundancy.
– NIC bonding is mandatory in ARC.
[Figure: The Baremetal Server is connected by bonded NICs (service IP) to Service Network Switches #1 and #2, which use VLAN trunking, and by a fixed management IP to the Management Network. dodai-compute performs PXE boot and agent operations over the management network and asks the OpenFlow Controller for port/VLAN mappings. The service IP and bonding configuration is done by the local agent based on the request from dodai-compute.]
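The data flow can be sketched as a lookup from the NIC/port table to a list of port/VLAN mappings for the lab's VLAN. The record layout, the `vlan_mappings()` helper and the dictionary form of each mapping are hypothetical; the actual request sent to the OpenFlow controller is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NicPort:
    server: str
    nic: str
    switch: str
    port: int


def vlan_mappings(nic_ports: list[NicPort], server: str, vlan_id: int) -> list[dict]:
    """Build port/VLAN mapping requests for every service NIC of a server."""
    # 0 and 4095 are reserved, so usable VLAN IDs are 1..4094.
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"invalid VLAN ID {vlan_id}")
    mappings = [
        {"switch": p.switch, "port": p.port, "vlan": vlan_id}
        for p in nic_ports
        if p.server == server
    ]
    if not mappings:
        raise LookupError(f"no NIC/port mapping registered for {server!r}")
    return mappings


db = [
    NicPort("node01", "eth1", "switch1", 11),
    NicPort("node01", "eth2", "switch2", 11),
    NicPort("node02", "eth1", "switch1", 12),
]
print(vlan_mappings(db, "node01", 120))
```

Both NICs of `node01` receive the same VLAN on two different switches, which is what allows the local agent to bond them for redundancy.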
Network configuration of NTTdocomo-openstack
Virtual networks are managed by the Quantum API and the NEC OpenFlow plug-in.
– L2 separation is done by port-based packet separation using flowtable entries.
– Mappings between the baremetal servers' NICs and the associated switch ports are stored in a database.
– VLAN-based separation needs to be added for the ARC use case.
– When a user specifies more than two NICs, the driver chooses unused NICs from the database and sets up the flowtable entries for the associated ports.
– A NIC bonding mechanism needs to be added for the ARC use case.
[Figure: The Baremetal Server is connected to a Service Network Switch (service IP) controlled by the OpenFlow Controller, and to the Management Network (fixed management IP), over which the BaremetalDriver performs PXE boot.]
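The “choose unused NICs” step can be sketched as an all-or-nothing reservation over one server's NIC records. `NicRecord` and `allocate_nics()` are hypothetical, and programming the flowtable entries for the returned ports is left out.

```python
from dataclasses import dataclass


@dataclass
class NicRecord:
    nic: str
    switch_port: str
    in_use: bool = False


def allocate_nics(records: list[NicRecord], count: int) -> list[str]:
    """Reserve `count` unused NICs of one server and return their switch ports.

    Either all requested NICs are reserved or none are.
    """
    if count < 0:
        raise ValueError("count must not be negative")
    free = [r for r in records if not r.in_use]
    if count > len(free):
        raise RuntimeError(f"requested {count} NICs, only {len(free)} free")
    chosen = free[:count]
    for record in chosen:
        record.in_use = True
    return [r.switch_port for r in chosen]


nics = [NicRecord("eth0", "sw1:1", in_use=True), NicRecord("eth1", "sw1:2"), NicRecord("eth2", "sw2:2")]
print(allocate_nics(nics, 2))
```

Checking availability before marking anything keeps a failed request from leaving half-reserved NICs in the database.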
How will the Quantum API be used for the ARC use case?
Using the Quantum API and a plugin is the preferable choice for ARC, but we need some modifications/extensions, too.
VLAN-based separation needs to be added for the ARC use case.
– Our plan is to add a BareMetal VLAN plugin which configures port/VLAN mappings using flowtable entries, or directly configures port VLANs on Cisco switches.
– This enables not only SINET4 VLAN connections but also interconnection with VM instances using the OVS plugin (via VLAN).
A NIC bonding mechanism needs to be added for the ARC use case.
– As all NICs of baremetal servers are registered in the database, we may add redundancy information there (e.g., NIC-A should be paired with NIC-B for bonding).
– We may still need a local agent to make the actual bonding configuration.
[Figure: SINET4 connects via VLAN trunking to Service Network Switches #1 and #2. The BareMetal VLAN Plugin sets port VLANs for the Baremetal Server, and the OVS Plugin connects the Hypervisor Host via VLAN trunking.]
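A minimal sketch of how pairing information in the database could drive the local agent, assuming the pairs are stored as NIC tuples. The output uses RHEL-style network-scripts ifcfg keys (`DEVICE`, `BONDING_OPTS`, `MASTER`, `SLAVE`, ...) as one possible target, and `active-backup` is only an example mode; `bonding_files()` and the table layout are hypothetical.

```python
def find_partner(pairs: list[tuple[str, str]], nic: str) -> str:
    """Look up the bonding partner of a NIC in the (hypothetical) pair table."""
    for a, b in pairs:
        if nic == a:
            return b
        if nic == b:
            return a
    raise LookupError(f"no bonding partner registered for {nic!r}")


def bonding_files(pairs: list[tuple[str, str]], nic: str, bond: str,
                  ip: str, netmask: str, mode: str = "active-backup") -> dict[str, str]:
    """Return ifcfg-style file contents for a bond built from one NIC pair."""
    partner = find_partner(pairs, nic)
    files = {
        f"ifcfg-{bond}": "\n".join([
            f"DEVICE={bond}",
            f'BONDING_OPTS="mode={mode} miimon=100"',
            "BOOTPROTO=none",
            f"IPADDR={ip}",
            f"NETMASK={netmask}",
            "ONBOOT=yes",
            "",
        ])
    }
    # Both members of the pair become slaves of the bond device.
    for slave in (nic, partner):
        files[f"ifcfg-{slave}"] = "\n".join([
            f"DEVICE={slave}",
            f"MASTER={bond}",
            "SLAVE=yes",
            "BOOTPROTO=none",
            "ONBOOT=yes",
            "",
        ])
    return files


for name, body in bonding_files([("eth1", "eth2")], "eth1", "bond0", "192.0.2.10", "255.255.255.0").items():
    print(f"--- {name}\n{body}")
```

Storing only the pairing in the database keeps the plugin side declarative, while the OS-specific file format stays inside the local agent.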
Summary
Target areas for the future extension:
1. Scheduler extension for grouping of baremetal servers.
– Allowing users to specify baremetal servers to be used.
2. Multiple OS provisioning methods.
– Allowing multiple types of OS images such as:
• dd-image (NTTdocomo-openstack style)
• tar ball (dodai-compute style)
• Kickstart installation (new feature)
3. Baremetal Quantum plugin for VLAN inter-connection.
– Allowing inter-connection to existing VLAN networks.
– Allowing NIC-bonding configuration.
As the NTTdocomo-openstack branch has been merged into the upstream, future extensions will be done directly on the upstream.