Sanger and our upcoming flexible
compute platform
Peter Clapham - Jan 2017
Why a private cloud ?
Collaboration is hard enough already
HPC is a weak security model
Cat 4 data is a large elephant
We’re reaching the limits of POSIX scalability
Increasing demand for more flexibility regarding operating
systems and supplied libraries
Running services at scale should be able to burst to meet
demand and collapse when no longer required
We should be able to more readily take advantage of
developing technology
Linking up with common standards across the broader
community.
Openstack at Sanger.
July 2015 - Development Juno system.
September 2015 - Limited access POC Kilo system ( using Triple-O ).
January 2016 - Hybrid cloud for commercial entities.
June 2016 - Wider access POC Kilo system ( Triple-O ).
Sep -> Dec 2016 - First production Liberty system ( Triple-O )
Production openstack (I)
• 107 Compute nodes (Supermicro) each with:
• 512GB of RAM, 2 * 25Gb/s network interfaces,
• 1 * 960GB local SSD, 2 * Intel E5-2690 v4 ( 14 cores @ 2.6GHz )
• 6 Control nodes (Supermicro), allowing 2 OpenStack instances, each with:
• 256GB of RAM, 2 * 100Gb/s network interfaces,
• 1 * 120GB local SSD, 1 * Intel P3600 NVMe ( /var )
• 2 * Intel E5-2690 v4 ( 14 cores @ 2.6GHz )
• Total of 53 TB of RAM, 2996 cores, 5992 with hyperthreading.
• Red Hat OpenStack ( Liberty ) deployed with Triple-O
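The totals on this slide can be checked from the per-node figures. The sketch below is only a sanity check: the constant and method names are illustrative, and it assumes two hardware threads per core when hyperthreading is enabled and counts RAM in units of 1024 GB.

```ruby
# Per-node figures taken from the slide; names are illustrative.
COMPUTE_NODES    = 107
RAM_PER_NODE_GB  = 512
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 14
THREADS_PER_CORE = 2 # assumption: hyperthreading gives two threads per core

# Returns the cluster-wide totals for the compute nodes only.
def compute_totals
  cores = COMPUTE_NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
  {
    cores: cores,
    threads: cores * THREADS_PER_CORE,
    ram_tb: (COMPUTE_NODES * RAM_PER_NODE_GB / 1024.0).round(1)
  }
end

totals = compute_totals
puts "cores=#{totals[:cores]} threads=#{totals[:threads]} ram_tb=#{totals[:ram_tb]}"
```

The control nodes are left out of the calculation because they do not host tenant instances.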
Production openstack (II)
• 9 Storage nodes (Supermicro) each with:
• 512GB of RAM,
• 2 * 100Gb/s network interfaces,
• 60 * 6TB SAS discs, 2 system SSDs.
• 2 * Intel E5-2690 v4 ( 14 cores @ 2.6GHz )
• 4TB of Intel P3600 NVMe used for journal.
• Ubuntu Xenial.
• 3 PB of disc space, 1PB usable.
• Single instance ( 1.3 GBytes/sec write, 200 MBytes/sec read )
• Ceph benchmarks imply 7 GBytes/second
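The raw and usable capacity figures are consistent with three-way replication. The replication factor is an assumption (the slide does not state it), and the names below are illustrative.

```ruby
# Storage figures taken from the slide; names are illustrative.
STORAGE_NODES  = 9
DISCS_PER_NODE = 60
DISC_SIZE_TB   = 6
REPLICAS       = 3 # assumption: three copies of each object, not stated on the slide

# Raw capacity across all storage nodes, in TB.
def raw_capacity_tb
  STORAGE_NODES * DISCS_PER_NODE * DISC_SIZE_TB
end

# Capacity left once every object is stored `replicas` times, in TB.
def usable_capacity_tb(replicas = REPLICAS)
  raw_capacity_tb / replicas.to_f
end

puts "raw=#{raw_capacity_tb} TB usable=#{usable_capacity_tb} TB"
```

Under that assumption the raw figure comes to roughly 3 PB and the usable figure to roughly 1 PB, matching the slide.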
Production openstack (III)
• 3 Racks of equipment, 24 kW load per rack.
• 10 Arista 7060CX-32S switches.
• 1U, 32 * 100Gb/s -> 128 * 25Gb/s.
• Hardware VXLAN support integrated with OpenStack *.
• Layer two traffic limited to rack, VXLAN used inter-rack.
• Layer three between racks and interconnect to legacy systems.
• All network switch software can be upgraded without disruption.
• The switches are true Linux systems.
• 400 Gb/s from racks to spine, 160 Gb/s from spine to legacy systems.
(* VXLAN in the ML2 plugin not used in first iteration)
But what are we providing?
CloudForms
Service driven access
OpenStack Horizon
Granular control over instance
Direct API Access
Direct https access from anywhere
Accessible only from within Sanger
Ceph Object Storage (used to provide volume and image storage)
S3 Object Storage Layer
How Does this Fit with Existing Services?
OpenStack “Bubble”
• Compute
• Ceph
• 100Gb/s SDN network infrastructure
Sanger internal systems
Access to secured services, i.e.
• iRODS
• Databases
• CIFS (Windows shares)
S3 API access
OpenStack API and GUI access
80Gb/s connectivity
No access to:
• NFS
• Lustre
CloudForms Interface
Horizon Interface
Efficient Resource Management
OpenStack resources are managed at a tenant group level.
• Each “tenant” group has an assigned quota for:
• Disk
• CPU
• Memory
Once a quota is full, tenant members must either wait for resources to become available or shut down or terminate a running instance.
Initial quotas are agreed with the IC before creation.
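A minimal sketch of the quota rule described above: a new instance can only start if the tenant stays within quota for disk, CPU and memory at the same time. The `Resources` struct and `fits_quota?` method are hypothetical names for illustration and are not part of the OpenStack API.

```ruby
# Hypothetical record covering the three quota'd resources from the slide.
Resources = Struct.new(:vcpus, :memory_gb, :disk_gb)

# True when the request fits within what is left of the tenant quota
# for every resource; one exhausted resource is enough to refuse it.
def fits_quota?(quota, in_use, request)
  quota.members.all? do |resource|
    in_use[resource] + request[resource] <= quota[resource]
  end
end

# Illustrative numbers only.
quota  = Resources.new(100, 512, 2_000)
in_use = Resources.new(96, 400, 1_500)

puts fits_quota?(quota, in_use, Resources.new(4, 16, 100))
puts fits_quota?(quota, in_use, Resources.new(8, 32, 100))
```

In the second call only the vCPU limit is exceeded, which is enough for the request to be refused until another instance is shut down or terminated.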
Quotas, they are not all the same.
Some groups require a fixed number of spots to be available at all times for essential services.
Other groups would like to burst to meet demand as required.
These requirements do not fit well with each other.
The Proposed Workaround
For those projects which require guaranteed access:
• We create a dedicated tenant group that has specific access to a set quota allocation of vCPU, disk and memory.
• This is tied directly to reserved hardware.
This guarantees that the requested resource will be available when required, whilst providing security, operating system flexibility and instance management.
BUT there is no ability to use more than the requested allocation.
Dynamic Workflows
Dynamic workflows can expand to meet demand and collapse when not required.
A quota that only matches the initial resource request would therefore leave the system constantly under-allocated.
For the initial release we will start by:
• Overcommitting CPU by 1.5 : 1 (available total vCPU ~9000)
• Overallocating quotas so that 115% of the overcommitted vCPU is available to tenants.
This gives tenants some initial ability to request more of the system than may be available.
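The ~9000 vCPU figure follows from the 5992 hardware threads quoted earlier in the deck. The sketch below works through both ratios; the names are illustrative and rounding down to whole vCPUs is an assumption.

```ruby
# Figures taken from the slides; names are illustrative.
HARDWARE_THREADS     = 5992 # cores with hyperthreading
CPU_OVERCOMMIT       = 1.5  # vCPUs offered per hardware thread
QUOTA_OVERALLOCATION = 1.15 # sum of tenant quotas relative to offered vCPUs

# vCPUs the scheduler can hand out after overcommitting CPU.
def available_vcpus(threads = HARDWARE_THREADS, ratio = CPU_OVERCOMMIT)
  (threads * ratio).floor
end

# Total vCPUs that can be promised across all tenant quotas.
def total_quota_vcpus(vcpus = available_vcpus, factor = QUOTA_OVERALLOCATION)
  (vcpus * factor).floor
end

puts "available vCPUs: #{available_vcpus}"
puts "vCPUs in quotas: #{total_quota_vcpus}"
```

The gap between the two numbers is the headroom that lets bursting tenants grow, on the expectation that not every tenant fills its quota at the same moment.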
For More Details, see
https://docs.google.com/a/sanger.ac.uk/document/d/17z9urhh3bTLRhQo9b8CcsZW_3O7cxlGY9uiwpAS_GqQ/edit?usp=sharing
Or
http://tinyurl.com/zzurp5s
We are adding monitoring and metrics gathering to the system. This
will provide a feedback loop for quota and project management.
New Opportunities for Application Development
Cloud application development aims to scale out compute and provide:
• Auto scaling of key services
• Cost-effective pipelines on commercial platform providers
• Self-healing of service components that fail
• Resilient services with reduced impact when service components fail
• Independence from any one specific environment
• Sharing of code, images and services with collaborators. This can dramatically reduce the need to copy large data sets around the world and permits running complex pipelines where the data resides.
How do we see Migration ?
Initial Early Adopters.
We have some early adopters !
1. Mutational signatures
2. Imputation service
3. Blast service
4. Pan-prostate
We look forward to hearing more from
these groups soon !
Mostly Share a Common Approach
Web Interface
Data upload
Run analysis
Update job status data base
Present data
Invoke Analysis
Retain a copy
Adaptation to Cloud-based tools
Stage | Current approach | Cloud approach
User details | Local databases, directory services, OAuth | OAuth, directory services
Data Downloads | Globus or https | S3, Globus
Job status | RDBMS: MySQL, Oracle or PostgreSQL | NoSQL: MongoDB, Cassandra or Redis
Invoke Job Analysis | Hand-crafted request to LSF | AMQP
Run Analysis | LSF job submission | AMQP, Heat orchestration or API call to OpenStack
Present data | Make available via sftp, Globus or https web upload | S3 automatically generated URLs
Keep data | No consistent approach | S3, archive as required
Service failure | Await systems | Use IFTTT or add code to instance to raise or restart an instance as required
Autoscale options | Await systems | Use IFTTT or add code to instance to raise or restart an instance as required
Service discovery | Manual | Cloud-init, Heat templates, dynamic DNS
New service, New Image ?
Cloud software stacks are based around services (micro-services) and
are an exemplar of service-orientated architectures.
Instances are mostly started from pre-created images and these form
the building blocks for a given service.
Starting with:
• Ubuntu with Docker support
• RStudio
• An NFS server
• OpenLava cluster
But what if you need something different? You could ask, or you could use the tools provided to create your own. Think /software+++
Developing machine images.
• Start simple and add complexity later.
• We understand that Biologists are not often software engineers.
• We believe that the process of image creation should be codified
and software development best practices followed.
• Openstack images are based on images from a vendor.
• It is possible to import other virtualisation system images to
Openstack ( these images could be made with automated tools ).
• Virtualisation allows the possibility of software reproducibility.
Software development
• Source control ( git ), gitflow
• Infrastructure as code ( Packer )
• Continuous Integration ( gitlab CI )
• Test driven development ( test kitchen )
Git branches (gitflow)
• Gitflow http://nvie.com/posts/a-successful-git-branching-model/
• We follow the principle but do not use the software.
• The master branch is always usable.
• New features are integrated on the development branch.
• Develop on a feature branch created from the development branch.
• When a feature is complete, pull the feature branch into the development branch.
• When a set of features is ready, pull development into master and tag the release.
• Develop bug fixes on a branch off development and cherry-pick them to a bug-fix release branch created from the release tag.
Semantic versioning
MAJOR.MINOR.PATCH
http://semver.org/spec/v2.0.0.html
• MAJOR version when you make incompatible API changes,
• MINOR version when you add functionality in a backwards-
compatible manner, and
• PATCH version when you make backwards-compatible bug fixes.
We treat changes in environment variables as a change to the “api”.
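A minimal sketch of the bump rules, assuming plain MAJOR.MINOR.PATCH strings with no pre-release or build suffix. The `bump` method is a hypothetical helper, not part of any tool named in this deck.

```ruby
# Returns the next version string for a given kind of change.
# :major resets minor and patch; :minor resets patch.
def bump(version, change)
  major, minor, patch = version.split('.').map { |part| Integer(part) }
  case change
  when :major then "#{major + 1}.0.0"
  when :minor then "#{major}.#{minor + 1}.0"
  when :patch then "#{major}.#{minor}.#{patch + 1}"
  else raise ArgumentError, "unknown change type: #{change}"
  end
end

puts bump('6.1.4', :major)
puts bump('6.1.4', :minor)
puts bump('6.1.4', :patch)
```

Under the rule above, renaming or removing an environment variable that an image expects would count as an incompatible change and so call for `:major`.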
Packer
• https://packer.io/
• Machine image configuration as code.
• In use by systems at Sanger since 2014 ( used to build Lustre clients )
• Supports multiple virtualization platforms.
• Supports both Linux and Windows.
• Simple example that can be used without CI:
• https://github.com/wtsi-ssg/image-creation
Packer, Provisioners
• Provisioners change the state of the machine.
• Provisioners are bits of code written in various languages.
• Multiple provisioners are allowed and run in the order they are defined.
• Can be restricted to specific builds.
• Shell - simple shell scripts.
• File uploads.
• Ansible
• Chef, Puppet, Salt
• Powershell, Windows-Shell
Packer, Builders
• Builders are responsible for creating machines and generating
images from them for various platforms.
• Amazon: takes an image and applies changes.
• OpenStack: takes an image and applies changes.
• VMware: uses an ISO and installs, then applies changes.
• Docker: takes a container and applies changes.
• VirtualBox: uses an ISO and installs, then applies changes.
• Others...
Gitlab CI
• Allows processes to be run in response to a push to a repository.
• Configured by a yaml file ( .gitlab-ci.yml )
• A build consists of multiple stages.
• Each stage is run sequentially.
• Parallel execution of tasks in each stage
• State needs to be stored in separate files/directories ( $CI_BUILD_ID )
• Tags control which runners execute a job.
• https://about.gitlab.com/gitlab-ci/
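The execution model above (stages in sequence, jobs within a stage in parallel) can be sketched in plain Ruby. This is a toy model of the ordering only, not GitLab CI itself; the stage names, job names and `run_pipeline` method are all hypothetical.

```ruby
# Hypothetical pipeline: an ordered list of stages, each holding named jobs.
PIPELINE = [
  ['build', { 'packer_build' => -> { 'image built' } }],
  ['test',  { 'serverspec' => -> { 'tests passed' }, 'lint' => -> { 'lint clean' } }]
].freeze

# Runs each stage in turn; a stage only starts once the previous one is done.
def run_pipeline(pipeline)
  pipeline.map do |stage, jobs|
    # Jobs within one stage run in parallel threads.
    threads = jobs.map { |name, job| [name, Thread.new { job.call }] }
    # Thread#value blocks until that job has finished and returns its result.
    [stage, threads.map { |name, thread| [name, thread.value] }.to_h]
  end
end

run_pipeline(PIPELINE).each do |stage, results|
  puts "#{stage}: #{results}"
end
```

Because parallel jobs may run on different runners, anything they need to share has to be written to separate files or directories, which is the role `$CI_BUILD_ID` plays in the list above.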
Test Kitchen
• http://kitchen.ci/
• Creates new instances to run tests on.
• Drivers for various systems eg.
• Amazon
• Openstack
• Docker
• Windows
• Configured with a single file ( .kitchen.yml ) which is an ERB template.
Test Kitchen
• Each group should have an openstack tenant for CI.
• Credentials are stored in GitLab's variables section.
• Tenant needs to have an ssh security group.
• Tenant needs a single network.
• Configuration is shared in environment variables.
• Supports multiple test frameworks:
• ServerSpec
• Bats
Testing orchestration
Test Kitchen can have multiple servers running at one time. Each test
runs from a separate directory, which allows us to test client-server
systems:
• In a server directory start a machine and run server tests.
• Extract the internal IP address from the server ( the master ).
• In a client directory start a machine, inject master location.
• Run client tests.
• Stop client, stop master.
ServerSpec
• RSpec is a behaviour-driven development framework for unit tests.
• ServerSpec allows rspec tests to check server status.
• E.g.
require 'serverspec'
# Required by serverspec
set :backend, :exec
describe "file system checks" do
  describe file('/data1') do
    it { should be_mounted }
  end
end
A CI workflow
image creation
• Our base image.
• Used to make changes that will affect all the images.
• https://github.com/wtsi-ssg/image-creation-ci
• Multiple tags, each tag is a release eg.
• v5.0.0 migration from openstack beta to openstack gamma
• v6.0.0 adding ansible as a system for configuration
• v7.0.0 adding support for xenial and centos 7.2 as well as trusty.
ISG repository.
• https://github.com/wtsi-ssg/simple-image-builder
• Continuous Integration and test infrastructure framework already
available, additional tests will need writing.
• Chain of software reproducibility relies on
• Trust that vendor built an image consistently.
• Note that operating system packages will be pulled in at the time of creation.
• Critical components need to be pulled in from a fixed source.
• Tests should be written to validate the system.
Batch scheduling is a bit old...
Openlava image
• A single image which is used for both master/head node and
compute nodes.
• Includes NFS server for home directory.
• Currently based on Trusty ( Ubuntu 14.04 ).
• Development branch for Xenial ( Ubuntu 16.04 ).
• Development branch for CentOS 7.2.
• ServerSpec tests using multiple servers.
A canned demonstration
New tools and images are
already being created internally
WR from Sendu:
https://github.com/VertebrateResequencing/wr
NPG are producing an AMQP service image
https://gitlab.internal.sanger.ac.uk
When completed
And there’s more !
We are listening to your requests for features and supporting
infrastructure:
https://docs.google.com/spreadsheets/d/1_oeBz27beLLj_4xe3yoyZYjpaYhDFTE__F_L6pVNcTE/edit
OR
http://tinyurl.com/z5bh5q5
But also online tutorials, videos
and documentation
Some, hopefully, useful examples have been collated here:
https://ssg-confluence.internal.sanger.ac.uk/display/OPENSTACK/Distributed+applications%3A+links+and+resources
Or
http://tinyurl.com/gwbrtfl
And still more
10th March OpenStack event here at Sanger
Tim Bell from CERN.
• Head of the CERN OpenStack team
• 200,000+ vCPUs
• Many, many PB of Ceph
Final schedule TBA
Almost done
Release date:
On time for March 1st.
Watch out for the upcoming flyers
Acknowledgements
Current group staff: Pete Clapham, James Beal, Helen Brimmer, John Constable,
Brett Hartley, Dave Holland, Jon Nicholson, Matthew Vernon.
Previous group staff: Simon Fraser, Andrew Perry, Matthew Rahtz.
All our early testers and those who have provided constructive feedback !
P.S.
11 more days to migrate from Lustre 108, 109, 110 and 111 before the
system is made read-only.
And only 1 month (1st March) until the old Lustre systems are securely
wiped and ready for removal from campus.
Remember, Lustre is not backed up.

More Related Content

What's hot

Speed Up Your Existing Relational Databases with Hazelcast and Speedment
Speed Up Your Existing Relational Databases with Hazelcast and SpeedmentSpeed Up Your Existing Relational Databases with Hazelcast and Speedment
Speed Up Your Existing Relational Databases with Hazelcast and Speedment
Hazelcast
 
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Spark Summit
 
Enterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on HadoopEnterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on Hadoop
DataWorks Summit/Hadoop Summit
 
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
Yahoo Developer Network
 
Big Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosBig Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and Mesos
Heiko Loewe
 
State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)
Nicolas Poggi
 
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
DataStax
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environment
BlueData, Inc.
 
Cisco UCS Integrated Infrastructure for Big Data with Cassandra
Cisco UCS Integrated Infrastructure for Big Data with CassandraCisco UCS Integrated Infrastructure for Big Data with Cassandra
Cisco UCS Integrated Infrastructure for Big Data with Cassandra
DataStax Academy
 
Hadoop on-mesos
Hadoop on-mesosHadoop on-mesos
Hadoop on-mesos
Henry Cai 蔡明航
 
20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 20150716 introduction to apache spark v3
20150716 introduction to apache spark v3
Andrey Vykhodtsev
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
DataWorks Summit
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized Environment
BlueData, Inc.
 
Effective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant ClustersEffective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant Clusters
DataWorks Summit/Hadoop Summit
 
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Spark Summit
 
Structor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop ClustersStructor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop Clusters
Owen O'Malley
 
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
DataWorks Summit/Hadoop Summit
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
Wei Ting Chen
 
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...
DataStax Academy
 
Hive on spark berlin buzzwords
Hive on spark berlin buzzwordsHive on spark berlin buzzwords
Hive on spark berlin buzzwords
Szehon Ho
 

What's hot (20)

Speed Up Your Existing Relational Databases with Hazelcast and Speedment
Speed Up Your Existing Relational Databases with Hazelcast and SpeedmentSpeed Up Your Existing Relational Databases with Hazelcast and Speedment
Speed Up Your Existing Relational Databases with Hazelcast and Speedment
 
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
 
Enterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on HadoopEnterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on Hadoop
 
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
 
Big Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosBig Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and Mesos
 
State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)
 
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environment
 
Cisco UCS Integrated Infrastructure for Big Data with Cassandra
Cisco UCS Integrated Infrastructure for Big Data with CassandraCisco UCS Integrated Infrastructure for Big Data with Cassandra
Cisco UCS Integrated Infrastructure for Big Data with Cassandra
 
Hadoop on-mesos
Hadoop on-mesosHadoop on-mesos
Hadoop on-mesos
 
20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 20150716 introduction to apache spark v3
20150716 introduction to apache spark v3
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized Environment
 
Effective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant ClustersEffective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant Clusters
 
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
 
Structor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop ClustersStructor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop Clusters
 
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
 
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...
 
Hive on spark berlin buzzwords
Hive on spark berlin buzzwordsHive on spark berlin buzzwords
Hive on spark berlin buzzwords
 

Similar to Flexible compute

Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017
Dave Holland
 
Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
 Openstack - An introduction/Installation - Presented at Dr Dobb's conference... Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
Rahul Krishna Upadhyaya
 
Open shift and docker - october,2014
Open shift and docker - october,2014Open shift and docker - october,2014
Open shift and docker - october,2014
Hojoong Kim
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
MayaData Inc
 
Netflix oss season 2 episode 1 - meetup Lightning talks
Netflix oss   season 2 episode 1 - meetup Lightning talksNetflix oss   season 2 episode 1 - meetup Lightning talks
Netflix oss season 2 episode 1 - meetup Lightning talks
Ruslan Meshenberg
 
Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin  Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin
Kuberton
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container Ecosystem
Vinay Rao
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio, Inc.
 
Provisioning Servers Made Easy
Provisioning Servers Made EasyProvisioning Servers Made Easy
Provisioning Servers Made Easy
All Things Open
 
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Joe Stein
 
Container Security
Container SecurityContainer Security
Container Security
Paul Cichonski
 
Openstack_administration
Openstack_administrationOpenstack_administration
Openstack_administration
Ashish Sharma
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications
OpenEBS
 
Kubernetes for HCL Connections Component Pack - Build or Buy?
Kubernetes for HCL Connections Component Pack - Build or Buy?Kubernetes for HCL Connections Component Pack - Build or Buy?
Kubernetes for HCL Connections Component Pack - Build or Buy?
Martin Schmidt
 
Build cloud native solution using open source
Build cloud native solution using open source Build cloud native solution using open source
Build cloud native solution using open source
Nitesh Jadhav
 
Engage 2020 - Kubernetes for HCL Connections Component Pack - Build or Buy?
Engage 2020 - Kubernetes for HCL Connections Component Pack - Build or Buy?Engage 2020 - Kubernetes for HCL Connections Component Pack - Build or Buy?
Engage 2020 - Kubernetes for HCL Connections Component Pack - Build or Buy?
panagenda
 
Introduction to Stacki - World's fastest Linux server provisioning Tool
Introduction to Stacki - World's fastest Linux server provisioning ToolIntroduction to Stacki - World's fastest Linux server provisioning Tool
Introduction to Stacki - World's fastest Linux server provisioning Tool
Suresh Paulraj
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Tibo Beijen
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
Ceph Community
 

Similar to Flexible compute (20)

Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017
 
Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
 Openstack - An introduction/Installation - Presented at Dr Dobb's conference... Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
 
Open shift and docker - october,2014
Open shift and docker - october,2014Open shift and docker - october,2014
Open shift and docker - october,2014
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
 
Netflix oss season 2 episode 1 - meetup Lightning talks
Netflix oss   season 2 episode 1 - meetup Lightning talksNetflix oss   season 2 episode 1 - meetup Lightning talks
Netflix oss season 2 episode 1 - meetup Lightning talks
 
Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin  Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container Ecosystem
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio

Flexible compute

  • 1. Sanger and our upcoming flexible compute platform Peter Clapham - Jan 2017
  • 2. Why a private cloud? • Collaboration is hard enough already. • HPC is a weak security model. • Cat 4 data is a large elephant. • We’re reaching POSIX scalability. • Increasing demand for more flexibility regarding operating systems and supplied libraries. • Running services at scale should be able to burst to meet demand and collapse when no longer required. • We should be able to more readily take advantage of developing technology. • Linking up with common standards across the broader community.
  • 3. Openstack at Sanger. • July 2015 - Development Juno system. • September 2015 - Limited access POC Kilo system ( using Triple-O ). • January 2016 - Hybrid cloud for commercial entities. • June 2016 - Wider access POC Kilo system ( Triple-O ). • Sep -> Dec 2016 - First production Liberty system ( Triple-O ).
  • 4. Production openstack (I) • 107 Compute nodes (Supermicro) each with: • 512 GB of RAM, 2 * 25 Gb/s network interfaces, • 1 * 960 GB local SSD, 2 * Intel E5-2690 v4 ( 14 cores @ 2.6 GHz ) • 6 Control nodes (Supermicro) allow 2 openstack instances. • 256 GB RAM, 2 * 100 Gb/s network interfaces, • 1 * 120 GB local SSD, 1 * Intel P3600 NVMe ( /var ) • 2 * Intel E5-2690 v4 ( 14 cores @ 2.6 GHz ) • Total of 53 TB of RAM, 2996 cores, 5992 with hyperthreading. • Red Hat Liberty deployed with Triple-O
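The totals on the slide above can be re-derived from the per-node figures. The following is a minimal sketch; it assumes the totals count only the 107 compute nodes and that "TB" is used in the binary sense (1 TB = 1024 GB), neither of which the slide states explicitly.

```python
# Per-node figures taken from the "Production openstack (I)" slide.
COMPUTE_NODES = 107
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 14
RAM_GB_PER_NODE = 512


def cluster_totals(nodes=COMPUTE_NODES):
    """Return (physical cores, hardware threads, RAM in TB) for the compute nodes."""
    cores = nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET
    threads = cores * 2  # hyperthreading exposes two threads per core
    ram_tb = nodes * RAM_GB_PER_NODE / 1024  # assumption: binary TB
    return cores, threads, ram_tb


cores, threads, ram_tb = cluster_totals()
print(cores, threads, ram_tb)
```

Under those assumptions the result matches the slide: 2996 cores, 5992 threads and roughly 53 TB of RAM.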
  • 5. Production openstack (II) • 9 Storage nodes (Supermicro) each with: • 512 GB of RAM, • 2 * 100 Gb/s network interfaces, • 60 * 6 TB SAS discs, 2 system SSDs. • 2 * Intel E5-2690 v4 ( 14 cores @ 2.6 GHz ) • 4 TB of Intel P3600 NVMe used for journal. • Ubuntu Xenial. • 3 PB of disc space, 1 PB usable. • Single instance ( 1.3 GBytes/sec write, 200 MBytes/sec read ) • Ceph benchmarks imply 7 GBytes/second
  • 6. Production openstack (III) • 3 Racks of equipment, 24 kW load per rack. • 10 Arista 7060CX-32S switches. • 1U, 32 * 100 Gb/s -> 128 * 25 Gb/s. • Hardware VXLAN support integrated with openstack *. • Layer two traffic limited to rack, VXLAN used inter-rack. • Layer three between racks and interconnect to legacy systems. • All network switch software can be upgraded without disruption. • True Linux systems. • 400 Gb/s from racks to spine, 160 Gb/s from spine to legacy systems. (* VXLAN in ml2 plugin not used in first iteration)
  • 7. But what are we providing? • CloudForms • Service driven access • OpenStack Horizon • Granular control over instance • Direct API Access • Direct https access from anywhere • Accessible only from within Sanger • Ceph Object Storage (used to provide volume and image storage) • S3 Object Storage Layer
  • 8. How Does this Fit with Existing Services? • OpenStack “Bubble” • Compute • Ceph • 100 Gb/s SDN network infrastructure • Sanger internal systems • Access to secured services, e.g. iRODS, databases, CIFS (Windows shares) • S3 API access • OpenStack API and GUI access • 80 Gb/s connectivity • No access to: NFS, Lustre
  • 11. Efficient Resource Management OpenStack resources are managed at a tenant group level. • Each “tenant” group has an assigned quota for: • Disk • CPU • Memory Once a quota is full, tenant members must either wait for resources to become available or shut down or terminate a running instance. Initial quotas are agreed with the IC before creation.
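The quota rule on the slide above can be sketched as a simple admission check. This is an illustration only, not the OpenStack implementation: the `Resources` type, the `fits_quota` helper and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical container for the three quota dimensions named on the slide."""
    vcpus: int
    ram_gb: int
    disk_gb: int


def fits_quota(quota, in_use, request):
    """Return True if the request fits inside the tenant's remaining quota."""
    return (
        in_use.vcpus + request.vcpus <= quota.vcpus
        and in_use.ram_gb + request.ram_gb <= quota.ram_gb
        and in_use.disk_gb + request.disk_gb <= quota.disk_gb
    )


# Illustrative numbers, not real Sanger quotas.
quota = Resources(vcpus=64, ram_gb=256, disk_gb=2000)
in_use = Resources(vcpus=60, ram_gb=128, disk_gb=500)

print(fits_quota(quota, in_use, Resources(4, 16, 100)))  # uses the last 4 vCPUs
print(fits_quota(quota, in_use, Resources(8, 16, 100)))  # exceeds the vCPU quota
```

When the check fails, the tenant has to wait or free resources by shutting down or terminating an instance, as described above.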
  • 12. Quotas, they are not all the same. Some groups require an absolute number of spots to be available for essential services. Other groups would like to burst to meet demand as required. These requirements do not fit well with each other.
  • 13. The Proposed Workaround For those projects which require guaranteed access: • We create a dedicated tenant group that has specific access to a set quota allocation of vCPU, disk and memory. • This is tied directly to reserved hardware. This guarantees that the requested resource will be available when required, whilst providing security, operating system flexibility and instance management. BUT there is no ability to use more than the requested allocation.
  • 14. Dynamic Workflows Dynamic workflows can expand to meet demand and collapse when not required, so a quota that matches the initial resource request would leave the system constantly under-allocated. For the initial release we will start by: • Overcommitting CPU by 1.5 : 1 (available total vCPU ~9000) • Overallocating quotas so that 115% of the overcommitted vCPU is available to tenants. This gives some initial ability to use more of the system than may be available.
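The overcommit figures above can be checked with a little arithmetic. This sketch assumes the 1.5 : 1 ratio is applied to the 5992 hardware threads quoted on the production hardware slide; the function name is hypothetical.

```python
HARDWARE_THREADS = 5992      # from the "Production openstack (I)" slide
CPU_OVERCOMMIT = 1.5         # vCPUs scheduled per hardware thread
QUOTA_OVERALLOCATION = 1.15  # quotas handed out as a fraction of those vCPUs


def vcpu_budget(threads=HARDWARE_THREADS,
                overcommit=CPU_OVERCOMMIT,
                overallocation=QUOTA_OVERALLOCATION):
    """Return (schedulable vCPUs, total vCPUs handed out as tenant quota)."""
    schedulable = int(threads * overcommit)
    quota_total = int(schedulable * overallocation)
    return schedulable, quota_total


schedulable, quota_total = vcpu_budget()
print(schedulable, quota_total)
```

Under that assumption the schedulable pool is 8988 vCPUs, consistent with the "~9000" on the slide, and the sum of tenant quotas comes to about 10,300 vCPUs, so not every tenant can fill its quota at the same time.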
  • 15. For More Details, see https://docs.google.com/a/sanger.ac.uk/document/d/17z9urhh3bTLRhQo9b8CcsZW_3O7cxlGY9uiwpAS_GqQ/edit?usp=sharing or http://tinyurl.com/zzurp5s • We are adding monitoring and metrics gathering to the system. This will provide a feedback loop for quota and project management.
  • 16. New Opportunities for Application Development Cloud application development aims to scale out compute and provide: • Auto scaling of key services • Making pipelines cost effective on commercial platform providers • Self-healing of service components that fail • Creating resilient services with reduced impact when service components fail • Not tied to any one specific environment • Enabling sharing of code, images and services with collaborators. This can dramatically reduce the need to copy large data sets around the world and permit running complex pipelines where the data resides.
  • 17. How do we see Migration ?
  • 18. Initial Early Adopters. We have some early adopters ! 1. Mutational signatures 2. Imputation service 3. Blast service 4. Pan-prostate We look forward to hearing more from these groups soon !
  • 19. Mostly Share a Common Approach • Web Interface • Data upload • Run analysis • Update job status database • Present data • Invoke Analysis • Retain a copy
  • 20. Adaptation to Cloud based tools
      Stage | Current approach | Cloud approach
      User details | Local databases, directory services, OAuth | OAuth, directory services
      Data downloads | Globus or https | S3, Globus
      Job status | RDBMS: MySQL, Oracle or PostgreSQL | NoSQL: MongoDB, Cassandra or Redis
      Invoke job analysis | Hand-crafted request to LSF | AMQP
      Run analysis | LSF job submission | AMQP, Heat orchestration or API call to OpenStack
      Present data | Make available via sftp, Globus or https web upload | S3 automatically generated URLs
      Keep data | No consistent approach | S3, archive as required
      Service failure | Await systems | Use IFTTT or add code to instance to raise or restart an instance as required
      Autoscale options | Await systems | Use IFTTT or add code to instance to raise or restart an instance as required
      Service discovery | Manual | Cloud-init, Heat templates, dynamic DNS
  • 21. New service, New Image ? Cloud software stacks are based around services (micro-services) and are an exemplar of service-orientated architectures. Instances are mostly started from pre-created images and these form the building blocks for a given service. Starting with: • Ubuntu with Docker support • Rstudio • An NFS server • OpenLava cluster But what if you need something different ? You could ask or you could use the tools provided to create your own. Think /software+++
  • 22. Developing machine images. • Start simple and add complexity later. • We understand that biologists are not often software engineers. • We believe that the process of image creation should be codified and software development best practices followed. • Openstack images are based on images from a vendor. • It is possible to import images from other virtualisation systems into Openstack ( these images could be made with automated tools ). • Virtualisation allows the possibility of software reproducibility.
  • 23. Software development • Source control ( git ), gitflow • Infrastructure as code ( Packer ) • Continuous Integration ( gitlab CI ) • Test driven development ( test kitchen )
  • 24. Git branches (gitflow) • Gitflow http://nvie.com/posts/a-successful-git-branching-model/ • We follow the principle but do not use the software • The master branch is always useable. • New features are integrated on the development branch. • Develop on a feature branch created from the development branch. • When a feature is complete pull feature to development branch. • When a set of features is ready pull development to master and tag release. • Develop bug fixes on a branch off development and cherry pick to bug release branch created from tag of release.
  • 25. Semantic versioning MAJOR.MINOR.PATCH http://semver.org/spec/v2.0.0.html • MAJOR version when you make incompatible API changes, • MINOR version when you add functionality in a backwards-compatible manner, and • PATCH version when you make backwards-compatible bug fixes. We treat changes in environment variables as a change to the “api”.
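The bump rules above can be sketched in a few lines. This is a minimal illustration that handles plain MAJOR.MINOR.PATCH strings only (no pre-release or build metadata); the `bump` helper is hypothetical and is not part of any Sanger tooling.

```python
def bump(version, change):
    """Return the next version string for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":   # incompatible API change, e.g. a renamed environment variable
        return f"{major + 1}.0.0"
    if change == "minor":   # backwards-compatible new functionality
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # backwards-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump("6.0.0", "major"))
print(bump("6.0.0", "minor"))
print(bump("6.1.3", "patch"))
```

Lower-order fields reset to zero whenever a higher-order field is incremented, as the specification requires.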
  • 26. Packer • https://packer.io/ • Machine image configuration as code. • In use by systems at Sanger since 2014 ( used to build lustre clients ) • Supports multiple virtualization platforms. • Supports both linux and Windows. • Simple example that can be used without CI: • https://github.com/wtsi-ssg/image-creation
  • 27. Packer, Provisioners • Provisioners change the state of the machine. • Provisioners are bits of code written in various languages. • Multiple provisioners are allowed in an order. • Can be restricted to specific builds. • Shell - simple shell scripts. • File uploads. • Ansible • Chef, Puppet, Salt • Powershell, Windows-Shell
  • 28. Packer, Builders • Builders are responsible for creating machines and generating images from them for various platforms. • Amazon, Takes an Image and applies changes. • Openstack, Takes an Image and applies changes. • Vmware , Uses an ISO and installs, then applies changes. • Docker, Takes a container and applies changes. • VirtualBox, Uses an ISO and installs, then applies changes. • Others….
  • 29. Gitlab CI • Allows processes to be run in response to a push to a repository. • Configured by a yaml file ( .gitlab-ci.yml ) • A build consists of multiple stages. • Each stage is run sequentially. • Parallel execution of tasks in each stage • State needs to be stored in separate files/directories ( $CI_BUILD_ID ) • Tags control which processes execute the stage. • https://about.gitlab.com/gitlab-ci/
  • 30. Test Kitchen • http://kitchen.ci/ • Creates new instances to run tests on. • Drivers for various systems, e.g. • Amazon • Openstack • Docker • Windows • Configured with a single file ( .kitchen.yml ) which is an ERB template.
  • 31. Test Kitchen • Each group should have an openstack tenant for CI. • Credentials are stored in GitLab's variables section. • Tenant needs to have an SSH security group. • Tenant needs a single network. • Configuration is shared in environment variables. • Supports multiple test frameworks: • ServerSpec • Bats
  • 32. Testing orchestration Test kitchen can have multiple servers running at one time, and each test runs from a separate directory. This allows us to test client-server systems: • In a server directory start a machine (the master) and run server tests. • Extract the internal IP address from the master. • In a client directory start a machine, inject master location. • Run client tests. • Stop client, stop master.
  • 33. ServerSpec • RSpec is a behaviour-driven development framework for unit tests. • ServerSpec allows RSpec tests to check server status. • E.g.
      require 'serverspec'

      # Required by serverspec
      set :backend, :exec

      describe "file system checks" do
        describe file('/data1') do
          it { should be_mounted }
        end
      end
  • 35. image creation • Our base image. • Used to make changes that will affect all the images. • https://github.com/wtsi-ssg/image-creation-ci • Multiple tags, each tag is a release eg. • v5.0.0 migration from openstack beta to openstack gamma • v6.0.0 adding ansible as a system for configuration • v7.0.0 adding support for xenial and centos 7.2 as well as trusty.
  • 36. ISG repository. • https://github.com/wtsi-ssg/simple-image-builder • Continuous Integration and test infrastructure framework already available; additional tests will need writing. • Chain of software reproducibility relies on: • Trust that the vendor built an image consistently. • Note that operating system packages will be pulled in at the time of creation. • Critical components need to be pulled in from a fixed source. • Tests should be written to validate the system.
  • 37. Batch scheduling is a bit old...
  • 38. Openlava image • A single image which is used for both master/head node and compute nodes. • Includes NFS server for home directory. • Currently based on trusty ( Ubuntu 14.04 ). • Development branch for Xenial ( Ubuntu 16.04 ) . • Development branch for Centos 7.2 . • ServerSpec tests using multiple servers.
  • 40. New tools and images are already being created internally • WR from Sendu: https://github.com/VertebrateResequencing/wr • NPG are producing an AMQP service image: https://gitlab.internal.sanger.ac.uk When completed
  • 41. And there’s more ! We are listening to your requests for features and supporting infrastructure: https://docs.google.com/spreadsheets/d/1_oeBz27beLLj_4xe3yoyZYjpaYhDFTE__F_L6pVNcTE/edit or http://tinyurl.com/z5bh5q5
  • 42. But also online tutorials, videos and documentation Some, hopefully, useful examples have been collated here: https://ssg-confluence.internal.sanger.ac.uk/display/OPENSTACK/Distributed+applications%3A+links+and+resources or http://tinyurl.com/gwbrtfl
  • 43. And still more • 10th March: OpenStack event here at Sanger • Tim Bell from CERN. • Head of the CERN OpenStack team • 200,000+ vCPUs • Many, many PB of Ceph • Final schedule TBA
  • 44. Almost done Release date: On time for March 1st. Watch out for the upcoming flyers
  • 46. Acknowledgements Current group staff: Pete Clapham, James Beal, Helen Brimmer, John Constable, Brett Hartley, Dave Holland, Jon Nicholson, Matthew Vernon. Previous group staff: Simon Fraser, Andrew Perry, Matthew Rahtz. All our early testers and those who have provided constructive feedback !
  • 47. P.S. 11 more days to migrate from lustre 108, 109, 110 and 111 before the system is made read only. And only 1 month (1st March) until the old lustre systems are securely wiped and ready for removal from campus. Remember, lustre is not backed up.