CLIMB Technical Overview
Arif Ali
Thursday 13th July
Hardware Overview
• IBM/Lenovo x3550 M4
– 3 x Controller Nodes (Cardiff Only)
– 1 x Server Provisioning Node
• IBM/Lenovo x3650 M4
– 3 x Controller Nodes (Warwick and Swansea)
– 4 x GPFS Servers
• IBM/Lenovo x3750 M4
– 21 x Cloud Compute nodes
• IBM/Lenovo x3950 X6
– 3 x Large Memory Nodes
• IBM Storwize V3700 Storage
– 4 x Dual Controllers
– 16 x Expansion Shelves
Cardiff HW Layout
Warwick Rack Layout
Swansea Rack Layout
Key Architecture Differences
• Cardiff University has x3550 M4 instead of x3650 M4 for controller nodes
• Cardiff University uses Cat6 for 10G instead of 10G DACs
Key Architecture Challenges
• Cable Lengths
• Cable Types
• Rack layouts
• Differences in Hardware and design
Software Overview
• xCAT
• IBM Spectrum Scale (originally GPFS)
• CentOS 7
• SaltStack
• RDO OpenStack Juno/Kilo
• Icinga
xCAT
• eXtreme Cluster/Cloud Administration Toolkit
• Management of clusters (Clouds, HPC, Grids)
• Baremetal Provisioning
• Scriptable
• Large-scale management (lights-out, remote console, distributed shell)
• Configures key services based on tables
Why are we using xCAT?
• Provides tight integration with IBM/Lenovo Systems
– Automagic Discovery
– IPMI integration
• Can manage the Mellanox switches from CLI
• OCF has extensive operational and development experience with xCAT
xCAT Configuration
• Base images for each machine
– Highmem/compute
– Controller
– Storage
• Network configuration is defined within xCAT
• Only salt-minion is configured through xCAT
• All software and configs are done via SaltStack
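To make the provisioning flow concrete, here is a hedged sketch of typical xCAT commands; the node, group and osimage names are illustrative, not the CLIMB values:

    # Hedged sketch -- node, group and osimage names are placeholders
    chdef -t node cpu01 groups=compute,all arch=x86_64 mgt=ipmi netboot=xnba
    makehosts cpu01 && makedns -n && makedhcp -n    # regenerate hosts, DNS and DHCP from the xCAT tables
    nodeset cpu01 osimage=centos7-x86_64-install-compute
    rpower cpu01 boot                               # PXE-install the base image; SaltStack takes over afterwards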
What is SaltStack?
“Software to automate the management and configuration
of any infrastructure or application at scale”
• Uses YAML
• Security is controlled by server/client (master/minion) public/private keys
• Daemons run on both the master and the clients (minions)
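A minimal Salt state, to give a feel for the YAML format; the package and service are illustrative, not part of the CLIMB configuration:

    # /srv/salt/chrony/init.sls -- illustrative state: install a package and keep its service running
    chrony:
      pkg.installed: []
      service.running:
        - name: chronyd
        - enable: True
        - require:
          - pkg: chrony

    # applied from the master with, for example:  salt '*' state.sls chrony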
Why SaltStack
• Previously used by UoB, where some of the OpenStack
configuration had already been started
• Automate the configuration
• Consistency across installations
• Re-usable for future installs
• Repeatable
OpenStack and SaltStack
• Integration of some key applications
– Keystone
– RabbitMQ
– MySQL/MariaDB
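A hedged sketch of how such resources can be declared as Salt states; the resource names, password and pillar key are placeholders:

    # illustrative sketch only -- names, passwords and pillar keys are placeholders
    openstack_rabbit_user:
      rabbitmq_user.present:
        - name: openstack
        - password: {{ pillar.get('rabbit_password', 'changeme') }}

    nova_db:
      mysql_database.present:
        - name: nova

Keystone tenants, users and endpoints can be declared in a similar way.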
What is OpenStack?
“To produce the ubiquitous Open Source cloud computing
platform that will meet the needs of public and private
cloud providers regardless of size, by being simple to
implement and massively scalable”
OpenStack Logical Architecture
Conceptual Architecture
Nova (Compute)
• Manage virtualised server
resources
• Live guest migration
• Live VM management
• Security groups
• VNC proxy
• Support for various hypervisors
– KVM, LXC, VMware, Xen, Hyper-V, ESX
APIs Supported:
• OpenStack Compute API
• EC2 API
• Admin API
Nova Configuration
• EC2 API has been enabled
• Enable extra extensions for monitoring for ceilometer
• Ephemeral storage is centrally located on GPFS
• Security Groups are controlled by neutron
• Enabled live migration
• Create snapshots using RAW format
• Availability zones to distinguish between normal cloud nodes and
large memory nodes
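A hedged sketch of how these settings might be managed from Salt with an ini.options_present state; the option names are the usual Juno/Kilo-era nova.conf names, and only the /gpfs/data/nova path is taken from this deployment:

    # nova/conf.sls -- sketch only; option names are Juno/Kilo-era, values illustrative
    /etc/nova/nova.conf:
      ini.options_present:
        - sections:
            DEFAULT:
              enabled_apis: ec2,osapi_compute,metadata   # keep the EC2 API enabled
              security_group_api: neutron                # security groups handled by Neutron
              instances_path: /gpfs/data/nova            # ephemeral storage centrally on GPFS
              snapshot_image_format: raw                 # snapshots created as RAW images
    # availability zones for the large-memory nodes are defined via host aggregates, not in nova.conf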
Neutron (Networking)
• Framework for Software
Defined Network (SDN)
• Responsible for managing
networks, ports, routers
• Create/delete L2 networks
• L3 support
• Attach/Detach host to
network
• Support for SW and HW plugins
– Open vSwitch, OpenFlow, Cisco Nexus, Arista, NCS, Mellanox
Neutron Configuration
• Enable security groups
• Tenant network type set to VXLAN
• Overlapping of IPs is allowed
• DHCP agents per network increased to 2
• Enable layer 3 HA
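A hedged sketch of the corresponding Kilo-era options, again as Salt ini.options_present states; the file paths and option names are the RDO defaults, not confirmed CLIMB values:

    # neutron/conf.sls -- sketch only
    /etc/neutron/neutron.conf:
      ini.options_present:
        - sections:
            DEFAULT:
              allow_overlapping_ips: 'True'
              dhcp_agents_per_network: 2
              l3_ha: 'True'                      # layer-3 HA for virtual routers

    /etc/neutron/plugins/ml2/ml2_conf.ini:
      ini.options_present:
        - sections:
            ml2:
              tenant_network_types: vxlan
            securitygroup:
              enable_security_group: 'True'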
Glance (Image Registry)
• Image Registry, not Image Repository
• Query for information on public and
private disk images
• Register new disk images
• Disk images can be stored in and delivered from a variety of stores
– Filesystem, Swift, Amazon S3
• Supported formats
– Raw, Machine (AMI), VHD (Hyper-V), VDI (VirtualBox), qcow2 (QEMU/KVM), VMDK (VMware), OVF (VMware), and others
Glance Configuration
• Glance images are stored on the GPFS filesystem or in Swift
• The scrub data is kept locally on the controllers
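A hedged sketch of the two relevant glance-api.conf options as a Salt state; the [glance_store] section name applies to Kilo, the GPFS path comes from this deployment, and the scrubber path is the usual default:

    # glance/conf.sls -- sketch only
    /etc/glance/glance-api.conf:
      ini.options_present:
        - sections:
            glance_store:
              filesystem_store_datadir: /gpfs/data/glance   # image store on GPFS
            DEFAULT:
              scrubber_datadir: /var/lib/glance/scrubber    # scrub data kept locally on each controller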
Keystone (Authentication)
• Identity service provides auth credentials validation and data
• Token service validates and manages tokens used to authenticate
requests after initial credential verification
• Catalog service provides an endpoint registry used for endpoint discovery
• Policy service provides a rule-based authorisation engine and the
associated rule management interface
• Each service is configured to serve data from a pluggable back-end
– Key-Value, SQL, PAM, LDAP, Templates
Keystone Configuration
• Using the V2 API
• Local authentication, with credentials stored in the database
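With the V2 API and database-backed local authentication, clients typically source an openrc file such as the sketch below; the endpoint host and credentials are placeholders:

    # openrc sketch -- endpoint host and credentials are placeholders
    export OS_AUTH_URL=http://controller:5000/v2.0
    export OS_TENANT_NAME=admin
    export OS_USERNAME=admin
    export OS_PASSWORD=changeme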
Swift (Storage)
• Object server that stores
objects
• Storage, retrieval, deletion
of objects
• Updates to objects
• Replication
• Modeled after Amazon
S3’s service
Swift Configuration
• Swift is configured to store its data on GPFS
• The rings for Swift have to be created manually, as sketched below
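Ring creation follows the usual swift-ring-builder pattern; a hedged sketch in which the partition power, replica count, IP, ports and device name are illustrative:

    # sketch only -- parameters are illustrative
    swift-ring-builder object.builder create 10 1 1
    swift-ring-builder object.builder add r1z1-10.0.0.11:6000/swiftdev 100
    swift-ring-builder object.builder rebalance
    # repeat for account.builder (port 6002) and container.builder (port 6001)

Since the objects live on GPFS, replication can be left to the filesystem rather than handled by Swift.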
Cinder (Block Storage)
• Responsible for managing the lifecycle of volumes and exposing attachments
• Enables additional attached persistent
block storage to virtual machines
• Allows multiple volumes to be attached
per virtual machine
• Supports the following back-ends
– iSCSI
– RADOS block devices (e.g. Ceph)
– NetApp
– GPFS
• Similar to Amazon EBS Service
Cinder Configuration
• The block devices within OpenStack are stored on GPFS
• Enable Copy on Write, which is a feature that is available
in the GPFS driver for OpenStack
• Specify the GPFS storage pool to be used by Cinder
• OpenStack provides a native GPFS driver for Cinder
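A hedged sketch of the GPFS driver settings in cinder.conf, expressed as a Salt ini.options_present state; the driver and option names are those of the Juno/Kilo GPFS volume driver, and the paths and nlsas pool are the ones described later under storage configuration:

    # cinder/conf.sls -- sketch only
    /etc/cinder/cinder.conf:
      ini.options_present:
        - sections:
            DEFAULT:
              volume_driver: cinder.volume.drivers.ibm.gpfs.GPFSDriver
              gpfs_mount_point_base: /gpfs/data/cinder      # volumes on GPFS
              gpfs_storage_pool: nlsas                      # pin volumes to the nlsas pool
              gpfs_images_dir: /gpfs/data/glance            # lets the driver clone Glance images...
              gpfs_images_share_mode: copy_on_write         # ...as copy-on-write volumes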
Heat (Orchestration)
• Declarative, template-defined deployment
• Compatible with AWS CloudFormation
• Many CloudFormation-compatible resources
• Templating, using HOT or CFN
• Controls complex groups of cloud
resources
• Multiple use cases
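A minimal HOT template, to show the declarative style; the template version matches Juno, and the image, flavor and network names are placeholders:

    # sketch only -- image, flavor and network names are placeholders
    heat_template_version: 2014-10-16
    description: Boot a single server
    resources:
      my_server:
        type: OS::Nova::Server
        properties:
          image: centos7
          flavor: m1.small
          networks:
            - network: private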
Horizon (Dashboard)
• Provides simple self service
UI for end-users
• Basic Cloud administrator
functions
• Thin wrapper over APIs, no
local state
• Out-of-the-box support for all core OpenStack projects
– Nova, Glance, Swift, Neutron
• Anyone can add a new component
• A consistent visual and interaction paradigm is maintained
Other useful projects
• OpenStack
– Ceilometer
– Trove
– Sahara
– Ironic
– Magnum
• Dependencies
– RabbitMQ
– MariaDB Galera Server
– HAProxy
– Keepalived
Why not RDO Manager/TripleO?
• Ironic was still in technology preview
• RDO Manager was not available in Juno/Kilo
• RDO Manager would then conflict with xCAT and its DHCP configuration
• Issues with static IPs for machines until Mitaka. This would be
an issue for GPFS
• At the time, it only worked in specific scenarios
• A provisioning system would be required to install the GPFS
Servers
What is IBM Spectrum Scale?
• IBM Spectrum Scale (Previously known as GPFS)
• Parallel filesystem
– A single POSIX filesystem spans multiple block devices
– Concurrent filesystem access from multiple clients
• Feature Rich (tiering, ILM, replication, snapshotting)
• Distributed Architecture
Why we are using Spectrum Scale
• Building block scalable solution
• Large capacity
– Supports up to 8 Yottabyte (8422162432 PB)
– Supports up to 8 Exabyte Files (8192 PB)
• Proven Technology (Spectrum Scale started in 1993)
• Highly Parallel
– Scale Up
– Scale Out
• Native client access over InfiniBand
• IBM is actively supporting OpenStack development and involving the community
• Storage Tiering
OpenStack links with GPFS
• Spectrum Scale provides OpenStack drivers (Juno)
• Spectrum Scale provides snapshotting, Copy on Write,
large concurrent file access
• Has detailed documentation on Swift configuration
• Roadmaps to provide further support (including Manila)
Storage Capacity and Configuration
• 533TB on Swansea and Warwick
• 399TB on Cardiff
• Mounted into /gpfs
• Uses RDMA
• The Cinder driver directly uses the nlsas storage pool, with volumes in /gpfs/data/cinder
• A separate inode space is required for Swift, provided by a new fileset at /gpfs/swift
• Ephemeral storage for instances is also on GPFS, located in /gpfs/data/nova
• Finally, the Glance store is located in /gpfs/data/glance
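Creating the independent fileset for Swift follows the standard Spectrum Scale pattern; a hedged sketch, assuming the filesystem device is named gpfs:

    # sketch only -- the filesystem device name is an assumption
    mmcrfileset gpfs swift --inode-space=new   # independent fileset with its own inode space
    mmlinkfileset gpfs swift -J /gpfs/swift    # link it at the junction path used by Swift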
Initial SW install
• RDO OpenStack Juno
• CentOS 7.0
• GPFS 4.1.0-5
Upgraded Installed SW
• RDO OpenStack Kilo
• CentOS 7.1
• GPFS 4.1.1
Upgrade to Kilo
• Migration of the Salt configs was done internally at OCF
• The first run of the upgrade was carried out on Warwick
• Once fine-tuned, the system at Warwick was re-installed
• The same re-installation was then carried out on Swansea
• Finally, Cardiff was migrated from Juno to Kilo
– Test bed at OCF
– Challenges
• Kilo required FQDNs for hosts
• The updated Cinder GPFS driver required manual intervention
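The per-service steps follow the generic RDO Juno-to-Kilo pattern of upgrading packages and then running the database migrations; a hedged outline, not the exact procedure used on CLIMB:

    # generic outline only -- not the exact CLIMB procedure
    yum -y upgrade                                       # after switching the RDO repository from Juno to Kilo
    keystone-manage db_sync
    glance-manage db_sync
    nova-manage db sync
    cinder-manage db sync
    neutron-db-manage --config-file /etc/neutron/neutron.conf \
        --config-file /etc/neutron/plugins/ml2/ml2_conf.ini upgrade head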
Future Development
• Collaborative work with the openstack-salt team
• Modularise the config
• Add support for the Keystone V3 API
• Move Keystone from eventlet to a web-server (WSGI) deployment
• Add support for OpenStack versions after Kilo, such as Liberty and Mitaka
• Maybe use TripleO to do the OpenStack deployment
Questions / Comments
Contact
• Email: aali@ocf.co.uk
• IRC: arif-ali
• Resources: http://www.github.com/arif-ali/openstack-lab
• Support: support@ocf.co.uk

Editor's Notes

• Conceptual Architecture: the diagram shows the relationships among the OpenStack services.
• Glance Configuration: keeping the scrub data local ensures Glance knows what it can delete from the database.
• Heat: HOT (Heat Orchestration Template) is one of the two template formats used by Heat. It is not backwards-compatible with AWS CloudFormation templates and can only be used with OpenStack; HOT templates are typically, though not necessarily, expressed as YAML. CFN (AWS CloudFormation) is the second template format supported by Heat, and CFN-formatted templates are typically expressed in JSON.
• What is IBM Spectrum Scale: the filesystem has been known over time as Tiger Shark, Multimedia FS, GPFS and now Spectrum Scale. Its distributed architecture uses multiple managers, quorum and a distributed workload.
• Why we are using Spectrum Scale: as a building-block solution, the block storage can be expanded in sections to grow the filesystem; scale up by adding new JBODs to existing storage systems, scale out by adding new subsystems and servers. It is actively developed, with a regular roadmap and a Spectrum Scale User Group. Native client access over InfiniBand removes the TCP overhead, which can be up to 20%. IBM supports active OpenStack development and community involvement (OpenStack / Spectrum Scale calls, involvement at the SSUG). Storage tiering allows data to be tiered to flash storage or off to tape.
• OpenStack links with GPFS: snapshotting, copy-on-write and large concurrent file access are features that assist OpenStack, saving space and speeding up image deployment.