© 2015 IBM Corporation
IBM Spectrum Scale
(formerly Elastic Storage)
in the Cloud
Neeta Garimella (neeta@us.ibm.com)
and
Brad DesAulniers(bradd@us.ibm.com)
Agenda
• What is IBM Spectrum Scale (a.k.a. Elastic Storage)?
• IBM Spectrum Scale Key Features
• Real World Use Cases
1
Software Defined Storage
• Software defined storage is enterprise class storage that uses
standard commodity hardware with all the important storage and
management functions performed in intelligent software
• Software defined storage delivers automated, policy-driven,
application-aware storage services through orchestration of the
underlying storage infrastructure in support of an overall
software defined environment (SDE)
Software Defined Storage Benefits
• Save acquisition costs by using standard servers and storage
instead of expensive, special purpose hardware
• Realize extreme scale and performance through linear, building
block, scale-out
• Increase resource and operational efficiency by pooling
redundant isolated resources and optimizing utilization
• Achieve greater IT agility by being able to quickly react,
provision and redeploy resources in response to new
requirements
• Lower data management costs through policy driven
automation and tiered storage management
What is Elastic Storage?
Elastic Storage is the Infrastructure for
Global Data Management
Virtualized Access to Data
Elastic Storage software: virtualized; centrally deployed, managed, backed up, and grown
Clustered file system: all nodes access the data
Seamless capacity and performance scaling
Elastic Storage Strengths

Extreme Scalability
• Maximum file system size: 2^99 bytes (~1 million yottabytes)
• Up to 2^63 (~9 quintillion) files per file system
• Maximum file size equals file system size
• Customers with 18 PB file systems
• IPv6
• Future proof
• Commodity hardware

Proven Reliability
• Snapshots, replication
• Declustered RAID
• Built-in heartbeat, automatic failover/failback
• Add/remove nodes and storage on the fly
• Rolling upgrades
• End-to-end data integrity
• Administer from any node
• Commodity hardware

High Performance
• Parallel file access
• Distributed, scalable, high-performance metadata
• Flash acceleration
• Automatic tiering
• Over 400 GB/s
• Commodity hardware
Elastic Storage: A deployment scenario
[Diagram: a GPFS Storage + Server Cluster provides a global name space across single-site and multi-site deployments (Sites B and C) with local data access for computation & analytics; data is ingested onto disk, flash, SSD, and internal disk; Hadoop, file, and RDBMS workloads share the same storage; automatic tiering and migration move cold data to tape; high data output serves life sciences, research, energy, and financial services, with failover across sites.]
IBM Spectrum Scale:
Key Features
7
• Elastic Storage software is a single scale-out data plane for the entire data center
• Unifies storage for VMs, analytics, objects, HPC, and file serving
• Single name space view no matter where data resides
• De-clustered parity - Software RAID for commodity storage with no HW RAID
• Abstracts storage “pools” across various back-end storage (direct attached, SAN-based, JBOD arrays, integrated storage)
• Data in best location, on the best tier (performance & cost), at the right time
 ICStore and AFM for data movement across cloud geographies
IBM Spectrum Scale: Cloud Data Plane Vision
[Diagram: IBM Spectrum Scale presents a single namespace across file sharing (POSIX, NFS, SMB3, iSCSI), object access (OpenStack Swift), analytics (Hadoop), virtualization (VMware vSphere, VASA/vVols, VAAI), and OpenStack (Cinder, Manila, Glance, Nova), with Active File Management linking Elastic Storage sites and ILM spanning flash, fast disk, tape, optical/DVD, SAN/SW RAID, and cloud storage via the ICStore gateway.]
Elastic Storage Cluster Models
• Symmetric cluster with direct-attached disks (FPO)
• Dedicated storage nodes (ESS)
• SAN storage
A global namespace combining multiple cluster models is also possible
Elastic Storage parallel architecture
 Clients use data; Network Shared Disk (NSD) servers serve shared data
 All NSD servers export to all clients in active-active mode
 Elastic Storage stripes files across NSD servers and NSDs in units of the file-system block size
 The NSD client communicates with all the servers
 File-system load is spread evenly across all the servers and storage: no hot spots
 Easy to scale file-system capacity and performance while keeping the architecture balanced
The NSD client performs real-time parallel I/O to all the NSD servers and storage volumes/NSDs; files are stored in blocks (see the command sketch below).
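A minimal sketch of standing up this balanced NSD architecture with standard GPFS commands; the node names, device paths, block size, and replica counts below are illustrative assumptions, not values from this deck:

# nsd.stanza – each disk is served by two NSD servers, so every client reaches every NSD in active-active mode
%nsd: nsd=nsd1 device=/dev/sdb servers=nsdsrv1,nsdsrv2 usage=dataAndMetadata failureGroup=1
%nsd: nsd=nsd2 device=/dev/sdc servers=nsdsrv2,nsdsrv1 usage=dataAndMetadata failureGroup=2

mmcrnsd -F nsd.stanza                        # define the NSDs
mmcrfs gpfs1 -F nsd.stanza -B 1M -m 2 -r 2   # create the file system with a 1 MiB block size and 2 replicas
mmmount gpfs1 -a                             # mount on all nodes; every node sees the same namespace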
Manage the full data lifecycle cost effectively
[Diagram: data flows from ingestion/creation through processing and access to archival.]
• High-performance disk tier (Flash, SSD, SAS): parallel access; highest performance for the most demanding applications
• High-volume storage: single global name space across all tiers; lower costs by allocating the right tier of storage to the right need
• Archival storage with low-cost disk or tape: integration with Tivoli Storage Manager/LTFS; policy-based archival and remote disaster recovery
Leverage the Metadata for Management Purposes
• Use the right tool for the right job
• Average utilization >80%
• Automated tiered storage
• Policy-driven file movement between tiers
• Store petabytes of data on terabytes of disk (inactive files are auto-migrated to tape)
• Migration as granular as a per-file basis
[Diagram: LUNs grouped into storage pools and tiers, from FlashSystems and SSD (Tier 1) to high-capacity HDD (Tier 2), at decreasing cost per capacity.]
Elastic Storage Virtualization
[Diagram: a virtualized global namespace presents one logical directory tree (e.g. /home/appl/data/web/important_big_spreadsheet.xls, big_architecture_drawing.ppt, unstructured_big_video.mpg) while the policy engine physically places the files across Pool 1 (FlashSystems), Pool 2 (solid-state drives), and Pool 3 (Nearline SAS drives), served by GPFS nodes and storage controllers.]
Elastic Storage: Flash for optimization
• Solid-state drives for metadata
• Metadata includes directories, inodes, indirect blocks
• All metadata fits in relatively few SSDs (metadata is typically ~1% of total storage); see the stanza sketch below
• Solid-state drives for data caching (LROC)
• Extend the page pool memory to include SSD for read caching
– Writes invalidate the cache and are consistent across nodes
• Highly Available Write Cache (HAWC)
• Data is written to the GPFS recovery log and committed by forcing the log to flash
• As log blocks fill, data is rewritten to its home location
– A relatively small amount of NVDIMM is required to maximize bandwidth to disk
– The recovery log is bandwidth bound, not IOPS bound
• Solves problems with small writes
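A hedged sketch of dedicating SSDs to metadata through NSD usage types; the device, server, and pool names are illustrative assumptions:

# ssd.stanza – SSD holds metadata only (system pool); SAS disk holds data only
%nsd: nsd=ssd1 device=/dev/nvme0n1 servers=nsdsrv1 usage=metadataOnly failureGroup=10 pool=system
%nsd: nsd=sas1 device=/dev/sdd servers=nsdsrv1 usage=dataOnly failureGroup=20 pool=data1

mmadddisk gpfs1 -F ssd.stanza   # add the metadata-only SSD and data-only SAS NSDs to the file system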
• Select data for the highest tiers based on the file's “heat”
• On-line: SSD, SAS, SATA
• Off-line: tape, cloud
• A GPFS policy transparently moves data between on-line and off-line storage
– File name and directory do not indicate where the data resides
• The file system tracks access to file data and computes a “temperature” for the data
• Tracking is done at the file level and assumes uniform access to file data
• Users define policy rules for migration and choose the time for execution

EXTERNAL POOL 'bronze' EXEC '/var/mmfs/etc/GlueCode'
RULE 'DefineTiers' GROUP POOL 'TIERS'
    IS 'gold' LIMIT(80)
    THEN 'silver' LIMIT(90)
    THEN 'bronze'
RULE 'Rebalance' MIGRATE FROM POOL 'TIERS' TO POOL 'TIERS' WEIGHT(FILE_HEAT)
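A hedged sketch of how such a tiering policy is typically installed and run; the file-system name, policy file name, and file-heat tuning values are illustrative assumptions:

mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10   # enable and tune file-heat tracking
mmchpolicy gpfs1 tiering.pol         # install the policy rules on file system gpfs1
mmapplypolicy gpfs1 -P tiering.pol   # run the migration/rebalance rules, e.g. from a scheduled job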
Local Read-Only Cache (LROC) Architecture
• Cache data and directory blocks on the client side of the switch
• Lowest latency (close to the consumer)
• Cached data must be treated as volatile
– Read-only cache
– Synchronous write-through to disk
• Largest benefit comes from caching metadata
• Extends the node's buffer pool
• Cheaper than adding more DRAM -> more capacity
• Buffer priority/LRU list maintained like standard memory buffers
• Cache consistency ensured by standard GPFS byte-range tokens
– Remote caching is very hard for block devices to do
• Hot-swappable SSD
• Increase/decrease LROC space while the file system is on-line
• Dynamic configuration – the file system stays on-line
[Diagram: GPFS client nodes each with local LROC SSDs, e.g. 500 GB of SSD added to each interface node.]
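A hedged sketch of enabling LROC on a client node: the local SSD is defined as an NSD with usage=localCache, and the lroc* options choose what to cache; the device and node names are illustrative assumptions and the option names should be checked against the installed GPFS release:

# lroc.stanza – the SSD belongs only to this client and is used purely as a local read-only cache
%nsd: nsd=lroc_client1 device=/dev/ssd1 servers=client1 usage=localCache

mmcrnsd -F lroc.stanza
mmchconfig lrocData=yes,lrocDirectories=yes,lrocInodes=yes -N client1   # cache data, directory, and inode blocks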
LROC Flash Cache Example Speed Up
• Initially, with all data coming from the disk storage system, the client reads data from the 10K RPM SAS disks at ~5,000 IOPS
• As more data is cached in flash, client performance increases to ~32,000 IOPS (~6x) while reducing the load on the disk subsystem by more than 95%
• Two consumer-grade 200 GB SSDs cache a forty-eight-drive, 300 GB 10K SAS disk storage system
Highly-Available Write Cache (HAWC) – GPFS Client Side
• Place both metadata and small writes in the recovery log, which is now stored in fast storage (NVRAM) such as flash-backed DIMMs or fast SSDs
• Log small writes and send large writes directly to disk
• Scale linearly with the number of nodes
• Each file system and server has its own recovery log
• On node failure, quickly replay the recovery log to recover the file system
• Designed to handle *bursts* of small I/O requests
• Optimize write requests to slow disk
• Allow aggregation of small writes into more efficient large write requests to storage
• Workloads that benefit include VMs, databases, and logging
• GPFS flash optimizations
• Creating a “hot file” policy with a fast storage pool improves HAWC steady-state performance
1. Write data to HAWC
2. Write data to fast storage
3. Down-tier to slow storage
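A hedged sketch of turning HAWC on for a file system so that small synchronous writes are hardened in the recovery log on fast storage first; the --write-cache-threshold flag and the 64K value are recalled from the 4.1.1-era documentation and should be verified, and gpfs1 is an illustrative file-system name:

mmchfs gpfs1 --write-cache-threshold 64K   # writes at or below 64 KiB are committed via the recovery log
mmlsfs gpfs1 --write-cache-threshold       # confirm the current setting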
HAWC Configuration 1: Store in Fast Storage
[Diagram: a synchronous write lands in the GPFS client's page pool, is (1) stored in the recovery log on fast storage, and is later (2) written to primary GPFS storage (SSD, fast disk, slow disk, tape).]
• Store small data write requests in the recovery log on fast storage
• Fast storage could be a separate system (FAS840) or integrated (SSDs in a storage server or V7K)
• Once the recovery log is full, or memory is low, write data to the primary storage system
HAWC Configuration 2: Replicate Across GPFS Clients
[Diagram: a synchronous write lands in one GPFS client's page pool, is (1) replicated into the recovery logs (RAM/NVRAM) of two GPFS clients, and is later (2) written to primary GPFS storage (SSD, fast disk, slow disk, tape).]
• Replicate small data write requests in the recovery log on a fast storage device
• Once the recovery log is full, or memory is low, write data to the primary storage system
Elastic Storage: Active File Management (AFM) for Global Data Mobility
• Global WAN caching removes latency effects
[Diagram: multiple geo-dispersed data centers, each running GPFS/AFM with the same protocol stack (CIFS, NFS, HTTP, FTP, SCP) and management services (central administration, monitoring, file management, availability, data migration, replication, backup), connected over the network as geo-dispersed replicas.]
• A “Global” namespace, not just a “Common” namespace
• Ownership and relationships are all defined on a fileset boundary
• Data migration/ingest from legacy NAS
Global Namespace and Caching with AFM
4-21
Clients at every site access the same paths: /global/data1 through /global/data6.
File system store1: local filesets /data3, /data4; cache filesets /data1, /data2, /data5, /data6
File system store2: local filesets /data1, /data2; cache filesets /data3, /data4, /data5, /data6
File system store3: local filesets /data5, /data6; cache filesets /data1, /data2, /data3, /data4
 Map a local fileset to any remote export
 See all data from any cluster
 Cache as much data as required or fetch data on demand
 All VFS operations are trapped at the cache cluster, so caching is transparent to all applications
 Multi-writer support (Independent Writer)
 Multi-protocol support (NFS, GPFS RPC)
 Multiple end-point support (legacy filer, other POSIX local filesystems)
 Multi-node parallel read/write support
 WAN performance improvements: Aspera transport
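A hedged sketch of creating one AFM cache fileset that maps a local fileset to a remote NFS export in independent-writer mode; the file-system, fileset, and export names are illustrative assumptions:

mmcrfileset store1 data3 --inode-space new -p afmTarget=nfs://homecluster/exports/data3 -p afmMode=iw
mmlinkfileset store1 data3 -J /gpfs/store1/data3   # expose the cache fileset in the global namespace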
IBM Spectrum Scale RAID: De-clustered RAID
• ESS : Elastic Storage Server with GPFS Native RAID (De-clustered
RAID)
‒ Data and parity stripes are uniformly partitioned and distributed across array
‒ Rebuilds that take days on other systems, take minutes on Elastic Storage
• 2-fault and 3-fault tolerance
‒ Reed-Solomon parity encoding; 2-fault or 3-fault tolerant
‒ 3 or 4-way mirroring
• End-to-end checksum & dropped write detection
‒ From disk surface to Elastic Storage user / client
‒ Detects and corrects off-track and lost / dropped disk writes
• Asynchronous error diagnosis while affected I/Os continue
‒ If media error: verify and restore if possible
‒ If path problem: attempt alternate paths
• Supports live replacement of disks
‒ I/O operations continue for tracks whose disks are removed during service
Declustered RAID Example
[Figure: a conventional layout of 3 one-fault-tolerant mirrored groups (RAID1), each with 7 stripes of 2 strips, 21 stripes (42 strips) across 6 disks plus 1 spare disk, versus a declustered layout that spreads the same 42 strips plus 7 spare strips (49 strips) uniformly across all 7 disks.]
Elastic Storage: Native Encryption and Secure Erase
Encryption of data at rest
• Files are encrypted before they are stored on disk
• Master Keys are never written to file-system disks
• Protects data from security breaches,
unauthorized access, and being lost, stolen or
improperly discarded
• Complies with NIST SP 800-131A and is FIPS
140-2 certified
• Supports HIPAA, Sarbanes-Oxley, EU and
national data privacy law compliance
Secure deletion
• Ability to destroy arbitrarily large subsets of a file
system
• No “digital shredding”, no overwriting: secure
deletion is a cryptographic operation
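Encryption and secure erase are driven by policy rules that bind files to master encryption keys held in a remote key manager (RKM); the sketch below follows the shape of the documented examples, but the rule names, key identifier, and RKM name are placeholders, not values from this deck:

RULE 'EncSpec1' ENCRYPTION 'E1' IS
    ALGO 'DEFAULTNISTSP800131A'
    KEYS('KEY-1a2b3c:RKM_1')
RULE 'EncryptEverything' SET ENCRYPTION 'E1' WHERE NAME LIKE '%'
/* secure deletion then reduces to destroying the corresponding master keys on the RKM */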
Elastic Storage: Application data access
[Diagram: a GPFS Storage Server Cluster serves technical computing (POSIX and NFS file access), big data & analytics (Elastic Storage Hadoop connector), and cloud (Cinder block and Swift object) workloads.]
• Single software-defined storage solution across all these application types
• Linear capacity & performance scale-out
• Enterprise storage on commodity hardware
• Single name space
Elastic Storage: HDFS
• Elastic Storage Hadoop connector
• Supports IBM BigInsights analytics and Apache Hadoop
• Existing infrastructure can run Hadoop-based analytics
‒ No need to purchase a dedicated analytics infrastructure, lowering CAPEX and OPEX
• No need to move data in and out of a dedicated analytics silo
‒ Speeds results
• Enterprise-class protection and efficiency
‒ Full data lifecycle management
‒ Policy-based tiering from flash to disk to tape
• Reduce cost, simplify management
[Diagram: a compute cluster accesses a GPFS Storage Server Cluster through the connector instead of HDFS.]
GPFS Hadoop Connector Overview
[Diagram: applications and higher-level languages (Hive, BigSQL, JAQL, Pig, …) sit on the Map/Reduce API and the Hadoop FileSystem APIs; the FileSystem API abstracts the file system interface, which is why applications don't see a difference between GPFS and HDFS. The GPFS Hadoop connector plugs the GPFS distributed file system in underneath, in place of HDFS.]
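Because the connector sits behind the Hadoop FileSystem API, ordinary Hadoop commands and jobs run unchanged whether GPFS or HDFS backs the cluster; a hedged sketch in which the paths and example jar are illustrative assumptions:

hadoop fs -mkdir -p /user/analyst/input
hadoop fs -put local_logs/*.log /user/analyst/input
hadoop jar hadoop-mapreduce-examples.jar wordcount /user/analyst/input /user/analyst/output
hadoop fs -cat /user/analyst/output/part-r-00000 | head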
High-Level Comparison of GPFS and HDFS

Enterprise readiness
• POSIX compliance – GPFS: full support; HDFS: limited support
• Metadata replication – GPFS: triplication and scale-out metadata management for many years; HDFS: High Availability for HDFS since v2.2
• Access protocols – GPFS: data lake with rich access protocols (file, NFSv3/v4, CNFS, FTP, M/R, object, block, etc.); HDFS: M/R, NFSv3 gateway (no HA yet) since v2.2

Protection & Recovery
• Snapshot – GPFS: yes, mature feature for years; HDFS: yes, since v2.2
• Geographical distribution, DR – GPFS: yes (AFM); HDFS: no
• Backup – GPFS: scalable, fast recovery, TSM; HDFS: no
• Tape integration – GPFS: LTFS, DMAPI, HPSS, TSM; HDFS: no

Storage Efficiency & Cost Optimization
• Erasure code – GPFS: GPFS Native RAID for storage efficiency and end-to-end data availability, reliability, and integrity in tier-1 storage; HDFS: non-tier-1 storage implementations, for data archive
• Heterogeneous storage pools – GPFS: yes for many years, with policy-driven ILM capability, no application modification needed; HDFS: yes since v2.3/2.6, API driven, applications must be modified
• Block sizes – GPFS: variable block sizes, suited to multiple types of data and data access patterns; HDFS: large block sizes, poor support for small files

Workload
• Access pattern – GPFS: all, update in place; HDFS: write-once-read-many, append only
• Preferred workload – GPFS: covers the whole spectrum; HDFS: write-once-read-many apps, large files
High-Level Comparison of Elastic Storage and HDFS (cont.)

Privacy and Security
• Encryption – GPFS: encryption and secure erase; HDFS: encryption for data at rest since v2.6
• Access control lists – GPFS: ACLs for years; HDFS: ACL support since v2.4
• Data retention – GPFS: immutability and retention features; HDFS: no

Ease of Use
• Policy-based ILM – GPFS: policy driven, automatic; HDFS: API driven, no policy or automation
• Fine-grained policy control – GPFS: fileset, user, group, file, etc.; HDFS: no
• Disk maintenance & replacement – GPFS: yes, GNR/GSS disk management features such as disk LED control; HDFS: no
• Rolling upgrades – GPFS: yes for many years; HDFS: yes since v2.4, with limitations such as downtime in rollback & downgrade
• User-defined node classes – GPFS: yes; HDFS: no

Flexible Architecture
• Server – GPFS: x86, Power; HDFS: x86
• OS – GPFS: Linux (x/p), AIX, Windows; HDFS: Linux
• SSD as dynamic read and write cache – GPFS: LROC, HAWC; HDFS: no
• Hybrid storage architecture – GPFS: external shared storage, GSS, server internal storage; HDFS: server internal storage
Elastic Storage and OpenStack: Cinder Driver
• The OpenStack Havana release includes an Elastic Storage Cinder driver
• Giving architects access to the features and capabilities of the industry's leading enterprise scale-out software-defined storage
• With OpenStack on Elastic Storage, all nodes see all data
• Copying data between services, such as Glance to Cinder, is minimized or eliminated
• Speeds instance creation and conserves storage space
• Rich set of data management and information lifecycle features
• Efficient file clones
• Policy-based automation optimizing data placement for locality or performance tier
• Backup
• Industrial-strength reliability, minimizing risk
• The Elastic Storage Cinder driver provides resilient block storage, minimal data copying between services, speedy instance creation, and efficient space utilization
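A hedged sketch of the cinder.conf settings behind this; the option names follow the GPFS Cinder driver documentation of that era, the driver class path moved between releases, and the mount points are illustrative assumptions:

[DEFAULT]
volume_driver = cinder.volume.drivers.ibm.gpfs.GPFSDriver   # earlier releases used cinder.volume.drivers.gpfs.GPFSDriver
gpfs_mount_point_base = /gpfs/cinder/volumes   # directory in the GPFS file system that holds volume files
gpfs_images_dir = /gpfs/glance/images          # lets Cinder see Glance images in the same file system
gpfs_images_share_mode = copy_on_write         # create volumes from images as near-instant file clones
gpfs_sparse_volumes = True                     # thin-provision volume files
gpfs_max_clone_depth = 8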
Elastic Storage Object: OpenStack Swift

Challenge
 The world is not object today!
 and never will be completely…
 Inefficient NAS “copy and change” gateways

Primary Use Cases
1. Single management plane
 Manage file and object within a single system
2. Create and share
 Create data via the file interface and share it globally using the object interface
3. Sync/archive and analyze
 Ingest through object and analyze data (Hadoop) using the file interface

Collaborating with the open-source community on the SwiftOnFile StackForge project
[Diagram: Elastic Storage Object spanning SSD, fast disk, slow disk, and tape.]
Elastic Storage Object: Design and Benefits
• Only configuration changes are required
• Place the Swift proxy, Swift object service, and GPFS client on all nodes
• Object ring
• Set only a single object server per ring (see the sketch below)
• The proxy only contacts the local object service daemon
• Object-replicator
• Run infrequently, to clean up tombstone and deleted files
• Object-auditor (disk scrubbing)
• Compares the file-level checksum in the xattr with data on disk
• Do not run
– Leverage GSS checksums and disk scrubbing/auditing
– Leverage ‘immutability’ bits to prevent changes
• Swift virtual devices and partitions
• Create a ‘reasonably’ sized directory tree depending on the expected number of objects
• Current focus is on shared-storage GPFS deployments
• e.g., GSS, NSD servers + SAN; but it will also work with FPO
• Keystone authentication
• Integration into LDAP/AD
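A hedged sketch of the single-object-server ring described above: replicas are set to 1 because GPFS already provides the shared, protected copy; the part power, address, device name, and weight are illustrative assumptions:

swift-ring-builder object.builder create 14 1 1               # part_power=14, replicas=1, min_part_hours=1
swift-ring-builder object.builder add r1z1-127.0.0.1:6000/gpfs_vdev 100
swift-ring-builder object.builder rebalance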
[Diagram: HTTP Swift requests hit a load balancer in front of GPFS object nodes, each running the proxy service and object service on GPFS over the storage network; a Keystone authentication service and additional services in the cluster (e.g. Memcached) complete the stack, with the geo-distributed GPFS object store spanning SSD, fast disk, slow disk, and tape.]
In Summary: Why Elastic Storage?
[Diagram: separate HPC, Big Data, Cloud, and File Sharing silos, each with its own compute, network, and storage.]
IT infrastructure silos lead to:
• Rigid and manual assignment of redundant IT resources
• Low utilization
• Needless data movement and copying

Why Elastic Storage? Smart storage:
• Versatile + flexible + “silo-less”
• High performance + scaling
• Low TCO + easy to manage
• Reliable + proven
• Advanced features: all data
IBM Spectrum Scale:
Real World Use Cases
Elastic Storage in Cloud Deployments
1. IBM Spectrum Scale Cloud Services (on IBM
SoftLayer)
2. University of Birmingham
3. eMedLab-UK
4. SuperVessel Cloud for Open Power
 More storage capability - Easily meet additional
resource demands without the cost of purchasing
or managing in-house infrastructure
 Lower risks and upfront costs – increase storage
incrementally as needed. No more guessing how
much you will need 3 years from now.
 Secure - Ensure data security with physical
isolation of storage and networks on the cloud
 Easy public cloud adoption – Minimize administrative burden with fully supported, ready-to-run software-defined storage in the cloud
What’s new
• Elastic Storage delivered as a service, bringing high performance, scalable storage and integrated data
governance for managing large amounts of data and files in the cloud
• Deployed on dedicated bare metal resources at a named data center for optimal I/O performance & security
• Optimized for technical computing & analytics workloads
• Installed, integrated & administered by skilled Cloud Ops team
A complete, application-ready cluster in the cloud,
optimized for technical computing & analytics
Use Case #1: Elastic Storage on SoftLayer Cloud
Platform LSF
(SaaS)
Platform
Symphony
(SaaS)
SoftLayer, an IBM Company
bare metal infrastructure
24X7 CloudOps Support
Elastic Storage (GPFS) on Cloud
IBM® Platform Computing™ Cloud Service
Non-shared storage paradigm
Private VLAN
Platform LSF® Master
& Platform Application
Center Server
Compute Nodes + GPFS clients
NSD Servers
Replication
Elastic Storage Cluster
Elastic Storage servers and storage are isolated inside each organization's private VLAN – no sharing, for maximum security
Elastic Storage on Cloud is a fully integrated solution that includes server and client
licenses, installation, support & maintenance of the Elastic Storage environment
Storage Requirements are Devouring CAPEX and OPEX Resources
[Chart: data grows from gigabytes (1980) to terabytes (1990), petabytes (2000), exabytes (2010), and toward zettabytes (2014), roughly 1000x per decade, while storage budgets increase only 1-5%.]
Data doubles approximately every 2 years. Elastic Storage closes this gap in 3 ways:
1) Easy to manage at scale
2) Data lifecycle automation
3) Commodity hardware
Use Case #2: CLIMB - University of Birmingham
• CLIMB project (Cloud Infrastructure for Microbial Bioinformatics)
• Funded by Medical Research Council (MRC) : ~£8m (~$13M) grant
• Four partner Universities
– Birmingham
– Cardiff
– Swansea
– Warwick
• CLIMB goal is to develop and deploy a world leading cyber
infrastructure for microbial bioinformatics.
• Private cloud, running 1000 VMs over 4 sites
40
CLIMB Overview
41
CLIMB Specs
• Private cloud, running 1000 VMs over 4 sites
• Separate OpenStack region per site, with a single gateway for access
• Local high-performance GPFS storage (~0.5 PB per site)
• Storage cluster replicated across sites
• Takes advantage of the GPFS driver for Cinder in OpenStack
• Nova Compute, Swift, and Glance use GPFS directly
GPFS magic sauce & OpenStack
• Swift: object storage, separate fileset
• Glance: image service (where we store VM images)
• Cinder: volume (block disk) service
• Shares a fileset with Cinder, using file clones to create images
• Nova compute: the bit that runs on the hypervisor servers
• Point Nova compute at GPFS – no GPFS magic, it's just ‘local’ storage for Nova to use (see the sketch below)
• It's a shared file system, so instances can live-migrate
• Normal GPFS storage, so RDMA can be used
• Will LROC improve performance here?
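A hedged sketch of the Nova and Glance settings that make GPFS look like ‘local’ storage while still allowing live migration and file clones; the /gpfs paths are illustrative assumptions and the Glance section name varies by release:

# nova.conf – instance disks live on the shared GPFS file system, so instances can live-migrate
[DEFAULT]
instances_path = /gpfs/climb/nova/instances

# glance-api.conf – images are plain files on GPFS, which Cinder can file-clone into volumes
[glance_store]
default_store = file
filesystem_store_datadir = /gpfs/climb/glance/images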
42
CLIMB : GPFS @UoB
• GPFS @UoB
• BlueBEAR – Linux HPC running over FDR-10
• Research Data Store – multi-data centre, replicated, HA failover system for bulk data
for research projects
• Hadoop?
• CLIMB : Future work
• Tune GPFS environment – any thoughts?
• Add local SSDs to enable LROC for nova-compute nodes?
• AFM to replicate glance across sites
• Integrate OpenStack environment with GPFS and CEPH storage
• Contact : Simon Thompson, Research Computing Team, University of
Birmingham, England, UK
• S.J.Thompson@bham.ac.uk
• www.roamingzebra.co.uk (shameless blog plug)
• Project: www.climb.ac.uk
• Twitter: @MRCClimb
43
Use Case #3: eMedLab
• Background
• Funded by Medical Research Council for 5 years
• UK research council focusing on health, budget of £850M ($1.3B) in
2013/14
• Similar to NIH in the US
• Allocation of £6.8M ($10M) for capital equipment (including hosting and
power costs)
• Medical Bioinformatics: Data-Driven Discovery for Personalized
Medicine
• Objective of creating an off-site data center
• This resource will allow scientists to analyze human genome
data and medical images, together with clinical and other
physiological and social data, for the benefit of human health.
44
eMedLab Objectives
• To accumulate medical and biological data on an
unprecedented scale and complexity
• To coordinate it
• To store it safely and securely
• To make it readily available to interested researchers
• To allow customized use of resources
• To enable innovative ways of working collaboratively
• To allow a distributed support model
• To help generate new insights and clinical outcomes by
combining data from diverse sources
45
eMedLab Infrastructure
• High Performance Scratch (1.3PB @ ~25GB/s)
• VM storage (475TB) – storage for VMs and snapshots
• Project storage (1.5PB) – project specific, medium term
• Reference data (2.7PB) – where data will be shared
• 252 servers
• Why GPFS/GSS?
• Good pedigree
• Great scalability as and when we
grow
• POSIX compatible
• Active in OpenStack projects
• 4.1 Security model
• Management tools improving
46
eMedLab Design
47
Present GPFS straight into instances, use separate GPFS
disk for Glance and Cinder
eMedLab: Courtesy and contacts
48
Dr Bruno Silva
High Performance Computing Lead
The Francis Crick Institute
Thomas King
Head of Research Infrastructure
Queen Mary University of London
t.king@qmul.ac.uk
Use Case #4 – SuperVessel: Cloud for OpenPOWER
• In China, around 300,000 computer science students graduate from universities; only 1,000~2,000 of them have training on POWER (3~6 months).
• China STG donated POWER servers to 10~15 universities years ago. Most of them are POWER6 machines, AIX only, and too old to install today's new platforms (e.g. Hadoop).
• Limited by the teachers' capabilities, the POWER machines could not be effectively shared by students or used to create up-to-date content (e.g. IaaS, PaaS) for POWER learning.
• The POWER Technology Open Lab is the first lab to support the OpenPOWER ecosystem in GCG
• Co-led by IBM Research – China and the IBM STG lab in China
• Endorsed by GCG (Andy Ho) and Global OpenPOWER (Ken King)
49
Architecture of SuperVessel – Cloud for OpenPOWER
[Diagram: an OpenStack controller (Horizon, HEAT, Nova, Neutron, Glance, Cinder, Keystone, Manila) manages KVM pools for POWER8 LE, POWER8 BE, and x86 plus LXC/Docker container pools for POWER8 LE and BE, running on IBM POWER servers with FPGA/GPU-equipped OpenPOWER servers and GPFS storage. Cloud-admin services cover system maintenance, system monitoring, resource usage metering, and system analysis; user-facing services include user account & authentication management, user and admin interfaces, virtual point management, statistics and analysis, bare-metal management, image management, the Cloud Infrastructure Service, the Big Data Service, the OpenPOWER enablement service, the Super Class service, and the Super Project service.]
GPFS providing a shared file system for Cloud IaaS and the Big Data Service on SuperVessel Cloud
[Diagram: the OpenStack controller (Horizon, HEAT, Nova, Neutron, Glance, Manila, Keystone, Cinder) drives the Cloud Infrastructure Service and the Big Data Service. A HEAT template for a big data cluster lets the user select the computing framework (MapReduce, Spark), the cluster size, and the data folder size; Docker instances running Symphony and Spark, plus KVM/Docker web-app instances, run on POWER7/POWER8 servers backed by GPFS FPO, with per-user folders (Folder A for User A, Folder B for User B).]
• HEAT orchestrates the Docker instances, subnet, and data folder based on the user's request
• Manila provides the NFS service using GPFS as the backend, and the folder is mounted via nova-docker (with -v support); see the sketch below
• A folder created by Manila can be accessed by the KVM/Docker instances created for big data and other purposes
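A hedged sketch of the Manila side of this flow; the GPFS share driver class path and option are recalled from the OpenStack Manila driver of that era, and the share size and names are illustrative assumptions:

# manila.conf – GPFS share backend exporting filesets over NFS
[gpfs_backend]
share_driver = manila.share.drivers.ibm.gpfs.GPFSShareDriver
gpfs_mount_point_base = /gpfs/manila/shares

# create a 10 GB NFS share that a HEAT-provisioned big data cluster mounts as its data folder
manila create NFS 10 --name bigdata-folder-a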
SuperVessel: IBM technologies in an integrated cloud environment
• IBM Cloud Management with OpenStack: product providing unified management for the cloud infrastructure and big data infrastructure on POWER
• IBM General Parallel File System (FPO): product providing the shared file system for cloud IaaS and big data
• Platform Symphony: product providing the big data service
• Manila: open-source project providing the NFS management service within OpenStack
• Nova-docker: open-source project providing a Docker driver for Nova in OpenStack
• Research technologies on cloud
52
53
Notices and Disclaimers
Copyright © 2015 by International Business Machines Corporation (IBM). No part of this document may be reproduced or
transmitted in any form without written permission from IBM.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been
reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM
shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY,
EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF
THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT
OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the
agreements under which they are provided.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without
notice.
Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are
presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual
performance, cost, savings or other results in other operating environments may vary.
References in this document to IBM products, programs, or services do not imply that IBM intends to make such products,
programs or services available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not
necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither
intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.
It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal
counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s
business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or
represent or warrant that its services or products will ensure that the customer is in compliance with any law.
Notices and Disclaimers (con’t)
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products in connection with this
publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to
interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED,
INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any
IBM patents, copyrights, trademarks or other intellectual property right.
• IBM, the IBM logo, ibm.com, Bluemix, Blueworks Live, CICS, Clearcase, DOORS®, Enterprise Document
Management System™, Global Business Services ®, Global Technology Services ®, Information on Demand,
ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™,
PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®,
pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, SoDA, SPSS, StoredIQ, Tivoli®, Trusteer®,
urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of
International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and
service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on
the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
Thank You
Your Feedback is
Important!
Access the InterConnect 2015
Conference CONNECT Attendee
Portal to complete your session
surveys from your smartphone,
laptop or conference kiosk.
 

• 10. Elastic Storage Cluster Models: a symmetric cluster with direct-attached disks (FPO), SAN-attached storage, or dedicated storage nodes (ESS). A global namespace combining multiple cluster models is also possible.
• 11. Elastic Storage parallel architecture
– Clients consume data; Network Shared Disk (NSD) servers serve the shared data
– All NSD servers export to all clients in active-active mode
– Elastic Storage stripes files across NSD servers and NSDs in units of the file-system block size
– Each NSD client communicates with all the servers, so file-system load is spread evenly across all servers and storage, with no hot spots
– Easy to scale file-system capacity and performance while keeping the architecture balanced
(Diagram: the NSD client performs real-time parallel I/O to all NSD servers and storage volumes/NSDs; files are stored in blocks.)
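As a worked illustration (the block size and server count are assumptions, not from the deck): with a 4 MiB file-system block size and 8 NSD servers, a 32 MiB file is split into 8 blocks striped round-robin across the servers, so a single client read or write engages all 8 servers and their disks in parallel instead of queuing on one server. Adding NSD servers therefore adds both capacity and bandwidth, which is the balanced scaling the slide refers to.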
• 12. Manage the full data lifecycle cost effectively: a single global namespace spans all tiers, from data ingestion or creation and data processing through access and archival.
– High-performance disk tier (flash, SSD, SAS) with parallel access provides the highest performance for the most demanding applications
– High-volume storage lowers costs by allocating the right tier of storage to the right need
– Archival storage on low-cost disk or tape, with Tivoli Storage Manager/LTFS integration, policy-based archival, and remote disaster recovery
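A minimal sketch of how such policy-based archival can be expressed in the GPFS policy language (the pool names and the interface-script path are illustrative assumptions, not taken from the deck), in the same style as the rules shown on the flash-optimization slide below:

  /* External pool handled by an HSM/TSM interface script; the path is an assumption */
  EXTERNAL POOL 'archive' EXEC '/var/mmfs/etc/hsmControl'
  /* Move files not accessed for 90 days from the nearline disk pool to the archive pool */
  RULE 'archiveCold' MIGRATE FROM POOL 'nearline'
    TO POOL 'archive'
    WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 90

Such a policy would typically be run on a schedule with mmapplypolicy; the actual pools and the TSM/LTFS integration script depend on the installation.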
• 13. Leverage the metadata for management purposes: use the right tool for the right job
– Average utilization >80%
– Automated tiered storage, with policy-driven file movement between tiers
– Store petabytes of data on terabytes of disk (inactive files are auto-migrated to tape)
– Migration can be as granular as a per-file basis
(Diagram: storage pools built from FlashSystems, SSD, and high-capacity HDD LUNs form tiers of decreasing cost.)
• 14. Elastic Storage Virtualization: a policy engine presents a single virtualized global namespace (e.g. /home/appl/data/web/important_big_spreadsheet.xls, big_architecture_drawing.ppt, unstructured_big_video.mpg) while placing each file on the appropriate physical pool: Pool 1 on FlashSystems, Pool 2 on solid-state drives, Pool 3 on nearline SAS drives, served by GPFS nodes and storage controllers. The logical view stays the same regardless of physical placement.
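To make the pool mapping concrete, here is a minimal placement-and-migration sketch in the same policy language; the pool names 'flash', 'ssd' and 'nearline' are assumptions for illustration:

  /* Initial placement: route new files to a pool by name pattern, defaulting to the ssd pool */
  RULE 'placeVideo' SET POOL 'nearline' WHERE NAME LIKE '%.mpg'
  RULE 'placeDb' SET POOL 'flash' WHERE NAME LIKE '%.db'
  RULE 'default' SET POOL 'ssd'
  /* When the flash pool passes 85% full, migrate the least recently accessed files down until it drops to 70% */
  RULE 'spill' MIGRATE FROM POOL 'flash' THRESHOLD(85,70)
    WEIGHT(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))
    TO POOL 'nearline'

Applications keep using the same logical path in the global namespace; only the physical pool backing each file changes.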
• 15. Elastic Storage: Flash for optimization
– Solid-state drives for metadata: metadata (directories, inodes, indirect blocks) is typically about 1% of total storage, so all of it fits on relatively few SSDs
– Solid-state drives for data caching (LROC): extend the page-pool memory to include SSD for read caching; writes invalidate the cache and remain consistent across nodes
– Highly Available Write Cache (HAWC): data is written to the GPFS recovery log and committed by forcing the log to flash; as log blocks fill, data is rewritten to its home location. A relatively small amount of NVDIMM maximizes bandwidth to disk, the recovery log is bandwidth-bound rather than IOP-bound, and this solves problems with small writes
– Select data for the highest tiers based on file "heat": on-line tiers (SSD, SAS, SATA) and off-line tiers (tape, cloud); GPFS policy transparently moves data between on-line and off-line storage, and the file name and directory never indicate where the data resides
– The file system tracks access to file data and computes a "temperature" for each file (tracking is at the file level and assumes uniform access within a file); users define policy rules for migration and choose when to run them, for example:
  EXTERNAL POOL 'bronze' EXEC '/var/mmfs/etc/GlueCode'
  RULE 'DefineTiers' GROUP POOL 'TIERS' IS 'gold' LIMIT(80) THEN 'silver' LIMIT(90) THEN 'bronze'
  RULE 'Rebalance' MIGRATE FROM POOL 'TIERS' TO POOL 'TIERS' WEIGHT(FILE_HEAT)
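As a usage note (not stated on the slide): rules such as the GROUP POOL example above are normally installed with mmchpolicy, and the migration rules executed, on demand or from a scheduler, with mmapplypolicy. The GROUP POOL rule treats 'gold', 'silver' and 'bronze' as one logical tier set with fill limits of 80% and 90% on the first two tiers, and the MIGRATE ... WEIGHT(FILE_HEAT) rule then rebalances files across that set by descending file heat, keeping the hottest files on the fastest pool.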
• 16. Local Read-Only Cache (LROC) architecture
– Caches data and directory blocks on the client side of the switch, giving the lowest latency (close to the consumer)
– Cached data must be treated as volatile: a read-only cache with synchronous write-through to disk; the largest benefit comes from caching metadata
– Extends the node's buffer pool; cheaper than adding more DRAM, so more capacity, with the buffer priority/LRU list maintained like standard memory buffers
– Cache consistency is ensured by standard GPFS byte-range tokens (remote caching is very hard for block devices to do)
– Hot-swappable SSD: LROC space can be increased or decreased, and the configuration changed dynamically, while the file system stays on-line
(Diagram: GPFS client nodes each with LROC SSDs, e.g. adding 500 GB of SSD to each interface node.)
• 17. LROC flash cache example speed-up
– Two consumer-grade 200 GB SSDs cache a storage system of forty-eight 300 GB 10K RPM SAS disks
– Initially, with all data coming from the disk storage system, the client reads from the SAS disks at ~5,000 IOPS
– As more data is cached in flash, client performance increases to ~32,000 IOPS (about 6x) while the load on the disk subsystem drops by more than 95%
• 18. Highly Available Write Cache (HAWC), GPFS client side
– Places both metadata and small writes in the recovery log, which is now stored on fast storage (NVRAM such as flash-backed DIMMs, or fast SSDs); small writes are logged, while large writes go directly to disk
– Scales linearly with the number of nodes: each file system and server has its own recovery log, and on node failure the recovery log is quickly replayed to recover the file system
– Designed to handle bursts of small I/O requests: write requests to slow disk are optimized by aggregating small writes into more efficient large write requests to storage; workloads that benefit include VMs, databases, and logging
– GPFS flash optimizations: creating a "hot file" policy with a fast storage pool improves HAWC steady-state performance
(Flow: 1. write data to HAWC, 2. write data to fast storage, 3. down-tier to slow storage.)
• 19. HAWC configuration 1: store in fast storage
– Small data write requests are stored in the recovery log on fast storage
– Fast storage could be a separate system (FAS840) or integrated (SSDs in a storage server or V7K)
– Once the recovery log is full, or memory is low, the data is written to the primary storage system (SSD, fast disk, slow disk, tape)
• 20. HAWC configuration 2: replicate across GPFS clients
– Small data write requests are replicated in the recovery logs on fast storage devices across GPFS clients (page pool in RAM, recovery log in NVRAM)
– Once the recovery log is full, or memory is low, the data is written to the primary storage system (SSD, fast disk, slow disk, tape)
• 21. Elastic Storage: Active File Management (AFM) for global data mobility
– Global WAN caching removes latency effects between geo-dispersed replicas
– Each site runs the same GPFS/AFM stack: access protocols (CIFS, NFS, HTTP, FTP, SCP) plus management functions (central administration, monitoring, file management, availability, data migration, replication, backup)
– A truly "global" namespace, not just a "common" namespace; data-center ownership and relationships are all defined on a fileset boundary
– Data migration/ingest from legacy NAS
• 22. Global namespace and caching with AFM
(Diagram: three GPFS clusters with file systems store1, store2 and store3; each holds two of the six filesets locally and caches the other four, yet clients at every site access the same paths /global/data1 through /global/data6.)
– Map a local fileset to any remote export and see all data from any cluster
– Cache as much data as required, or fetch data on demand
– All VFS operations are trapped at the cache cluster, so caching is transparent to all applications
– Multi-writer support (independent writer), multi-protocol support (NFS, GPFS RPC), and multiple end-point support (legacy filers, other POSIX local file systems)
– Multi-node parallel read/write support; WAN performance improvements via Aspera transport
• 23. IBM Spectrum Scale RAID: de-clustered RAID
– ESS (Elastic Storage Server) with GPFS Native RAID (de-clustered RAID): data and parity stripes are uniformly partitioned and distributed across the array, so rebuilds that take days on other systems take minutes on Elastic Storage
– 2-fault and 3-fault tolerance: Reed-Solomon parity encoding (2- or 3-fault tolerant), or 3- or 4-way mirroring
– End-to-end checksum and dropped-write detection, from the disk surface to the Elastic Storage user/client; detects and corrects off-track and lost/dropped disk writes
– Asynchronous error diagnosis while affected I/Os continue: on a media error, verify and restore if possible; on a path problem, attempt alternate paths
– Supports live replacement of disks: I/O operations continue for tracks whose disks are removed during service
• 24. Declustered RAID example: 3 one-fault-tolerant mirrored groups (RAID 1) that would conventionally occupy 3 groups of 2 disks plus a spare disk (7 disks in total) are instead declustered across all 7 disks: 21 stripes (42 strips) at 7 stripes per group (2 strips per stripe), plus 7 spare strips, for 49 strips in all.
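Spelling out the arithmetic from the slide: 3 mirrored groups × 7 stripes per group = 21 stripes, and at 2 strips per stripe that is 42 data strips; adding the 7 spare strips (the capacity of the former dedicated spare disk) gives 49 strips, spread evenly as 7 strips on each of the 7 disks. Because every disk then holds a share of every group plus spare space, rebuilding a failed disk reads small amounts from all surviving disks rather than copying one whole mirror partner, which is why rebuild times shrink from days to minutes.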
• 25. Elastic Storage: native encryption and secure erase
Encryption of data at rest
– Files are encrypted before they are stored on disk; master keys are never written to file-system disks
– Protects data from security breaches, unauthorized access, and being lost, stolen or improperly discarded
– Complies with NIST SP 800-131A and is FIPS 140-2 certified; supports HIPAA, Sarbanes-Oxley, and EU and national data privacy law compliance
Secure deletion
– Ability to destroy arbitrarily large subsets of a file system
– No "digital shredding", no overwriting: secure deletion is a cryptographic operation
• 26. Elastic Storage: application data access
(Diagram: a single Elastic Storage / GPFS Storage Server cluster serves technical computing over POSIX and NFS file access, big data and analytics over the Elastic Storage Hadoop connector, and cloud workloads over Cinder block and Swift object interfaces.)
– A single software-defined storage solution across all these application types
– Linear capacity and performance scale-out
– Enterprise storage on commodity hardware, with a single namespace
• 27. Elastic Storage: HDFS
– The Elastic Storage Hadoop connector supports IBM BigInsights analytics and Apache Hadoop
– Existing infrastructure can run Hadoop-based analytics: no need to purchase a dedicated analytics infrastructure, lowering CAPEX and OPEX
– No need to move data in and out of a dedicated analytics silo, which speeds results
– Enterprise-class protection and efficiency: full data lifecycle management and policy-based tiering from flash to disk to tape
– Reduces cost and simplifies management
(Diagram: a compute cluster accesses the GPFS Storage Server cluster in place of HDFS.)
• 28. GPFS Hadoop Connector overview
(Diagram: applications and higher-level languages such as Hive, BigSQL, JAQL and Pig sit on the Map/Reduce and Hadoop FileSystem APIs; the higher-level languages abstract applications from the Map/Reduce API, and the FileSystem API abstracts the underlying distributed file system, so the GPFS Hadoop connector can stand in for HDFS and applications see no difference between GPFS and HDFS.)
• 29. High-level comparison of GPFS and HDFS
Enterprise readiness
– POSIX compliance: GPFS full support; HDFS limited support
– Metadata replication: GPFS triplication and scale-out metadata management for many years; HDFS high availability only since v2.2
– Access protocols: GPFS is a data lake with rich access protocols (file, NFSv3/v4, CNFS, FTP, M/R, object, block, etc.); HDFS offers M/R and an NFSv3 gateway (no HA yet) since v2.2
Protection and recovery
– Snapshots: GPFS a mature feature for years; HDFS since v2.2
– Geographical distribution / DR: GPFS yes (AFM); HDFS no
– Backup: GPFS scalable, fast recovery, TSM; HDFS no
– Tape integration: GPFS LTFS, DMAPI, HPSS, TSM; HDFS no
Storage efficiency and cost optimization
– Erasure code: GPFS Native RAID for storage efficiency and end-to-end data availability, reliability and integrity in tier-1 storage; HDFS non-tier-1 storage implementations, for data archive
– Heterogeneous storage pools: GPFS for many years, with policy-driven ILM and no application modification needed; HDFS since v2.3/2.6, API-driven, applications must be modified
– Block sizes: GPFS variable block sizes suited to multiple data types and access patterns; HDFS large block sizes with poor support for small files
Workload
– Access pattern: GPFS all patterns, update in place; HDFS write-once-read-many, append only
– Preferred workload: GPFS covers the whole spectrum; HDFS write-once-read-many applications with large files
• 30. High-level comparison of Elastic Storage and HDFS (continued)
Privacy and security
– Encryption: GPFS encryption and secure erase; HDFS encryption for data at rest since v2.6
– Access control lists: GPFS ACLs for years; HDFS ACL support since v2.4
– Data retention: GPFS immutability and retention features; HDFS no
Ease of use
– Policy-based ILM: GPFS policy-driven and automatic; HDFS API-driven, with no policy or automation
– Fine-grained policy control: GPFS by fileset, user, group, file, etc.; HDFS no
– Disk maintenance and replacement: GPFS yes, with GNR/GSS disk management features such as disk LED control; HDFS no
– Rolling upgrades: GPFS for many years; HDFS since v2.4, with limitations such as downtime on rollback and downgrade
– User-defined node classes: GPFS yes; HDFS no
Flexible architecture
– Servers: GPFS x86 and Power; HDFS x86
– OS: GPFS Linux (x/p), AIX, Windows; HDFS Linux
– SSD as dynamic read and write cache: GPFS LROC and HAWC; HDFS no
– Hybrid storage architecture: GPFS external shared storage, GSS, or server-internal storage; HDFS server-internal storage only
• 31. Elastic Storage and OpenStack: Cinder driver
– The OpenStack Havana release includes an Elastic Storage Cinder driver, giving architects access to the features and capabilities of the industry's leading enterprise scale-out software-defined storage
– With OpenStack on Elastic Storage, all nodes see all data: copying data between services (such as Glance to Cinder) is minimized or eliminated, speeding instance creation and conserving storage space
– Rich set of data management and information lifecycle features: efficient file clones, policy-based automation optimizing data placement for locality or performance tier, and backup
– Industrial-strength reliability, minimizing risk
– In short, the Elastic Storage Cinder driver provides resilient block storage, minimal data copying between services, speedy instance creation and efficient space utilization
• 32. Elastic Storage Object: OpenStack Swift
– Challenge: the world is not all object today, and never will be completely; NAS "copy and change" gateways are inefficient
– Primary use cases:
  1. Single management plane: manage file and object within a single system
  2. Create and share: create data via the file interface and share it globally using the object interface
  3. Sync/archive and analyze: ingest through object and analyze the data (Hadoop) using the file interface
– Collaborating with the open-source community on the SwiftOnFile StackForge project
(Diagram: Elastic Storage Object spanning SSD, fast disk, slow disk and tape.)
• 33. Elastic Storage Object: design and benefits
– Only configuration changes are required: place the Swift proxy, the Swift object service and the GPFS client on all nodes
– Object ring: set only a single object server per ring; the proxy contacts only the local object service daemon
– Object-replicator: run infrequently, just to clean up tombstones and deleted files
– Object-auditor (disk scrubbing) compares the file-level checksum in an xattr with the data on disk; it need not run, since GSS checksums and disk scrubbing/auditing can be leveraged, along with 'immutability' bits to prevent changes
– Swift virtual devices and partitions: create a reasonably sized directory tree depending on the expected number of objects
– Current focus is on shared-storage GPFS deployments (e.g. GSS, NSD servers, SAN), but it will also work with FPO
– Keystone authentication, with integration into LDAP/AD
(Diagram: a load balancer distributes HTTP Swift requests to GPFS object nodes running the proxy and object services over a geo-distributed GPFS object store spanning SSD, fast disk, slow disk and tape; Keystone and memcached run as additional services in the cluster.)
• 34. In summary: why Elastic Storage? IT infrastructure silos, with separate compute, network and storage stacks for HPC, big data, cloud and file sharing, lead to rigid and manual assignment of redundant IT resources, low utilization, and needless data movement and copying.
• 35. Why Elastic Storage? Smart storage: versatile + flexible + "silo-less", high performance + scaling, low TCO + easy to manage, reliable + proven, advanced features for all data.
  • 36. IBM Spectrum Scale: Real World Use Cases
• 37. Elastic Storage in cloud deployments: 1. IBM Spectrum Scale Cloud Services (on IBM SoftLayer), 2. University of Birmingham, 3. eMedLab (UK), 4. SuperVessel Cloud for OpenPOWER
• 38. Use Case #1: Elastic Storage on the SoftLayer cloud
– More storage capability: easily meet additional resource demands without the cost of purchasing or managing in-house infrastructure
– Lower risks and upfront costs: increase storage incrementally as needed; no more guessing how much you will need 3 years from now
– Secure: data security is ensured with physical isolation of storage and networks on the cloud
– Easy public cloud adoption: minimize administrative burden with fully supported, ready-to-run software-defined storage in the cloud
What's new
– Elastic Storage delivered as a service, bringing high-performance, scalable storage and integrated data governance for managing large amounts of data and files in the cloud
– Deployed on dedicated bare-metal resources at a named data center for optimal I/O performance and security
– Optimized for technical computing and analytics workloads; installed, integrated and administered by a skilled CloudOps team
(Diagram: the IBM® Platform Computing™ Cloud Service combines Platform LSF and Platform Symphony as SaaS with Elastic Storage (GPFS) on Cloud, running on SoftLayer bare-metal infrastructure with 24x7 CloudOps support: a complete, application-ready cluster in the cloud, optimized for technical computing and analytics.)
• 39. Non-shared storage paradigm
– Elastic Storage servers and storage are isolated inside each organization's private VLAN: no sharing, for maximum security
– Each private VLAN contains the Platform LSF master and Platform Application Center server, compute nodes with GPFS clients, and replicated NSD servers forming the Elastic Storage cluster
– Elastic Storage on Cloud is a fully integrated solution that includes server and client licenses, installation, and support and maintenance of the Elastic Storage environment
• 40. Storage requirements are devouring CAPEX and OPEX resources
– Data volumes have grown from gigabytes in the 1980s through terabytes and petabytes toward exabytes and zettabytes (roughly 1000x per step), doubling approximately every 2 years, while storage budgets increase only 1-5%
– Elastic Storage closes this gap in three ways: 1) easy to manage at scale, 2) data lifecycle automation, 3) commodity hardware
• 41. Use Case #2: CLIMB, University of Birmingham
– CLIMB project (Cloud Infrastructure for Microbial Bioinformatics), funded by a Medical Research Council (MRC) grant of ~£8M (~$13M)
– Four partner universities: Birmingham, Cardiff, Swansea, Warwick
– CLIMB's goal is to develop and deploy a world-leading cyber infrastructure for microbial bioinformatics
– Private cloud running 1000 VMs over 4 sites
• 43. CLIMB specs
– Private cloud running 1000 VMs over 4 sites, with a separate OpenStack region per site and a single gateway for access
– Local high-performance GPFS (~0.5 PB per site), with the storage cluster replicated across sites
– Takes advantage of the GPFS driver for Cinder in OpenStack; Nova compute, Swift and Glance use GPFS directly
GPFS "magic sauce" and OpenStack
– Swift: object storage, on a separate fileset
– Glance: image service (where VM images are stored); it shares a fileset with Cinder, using file clones to create images
– Cinder: volume (block disk) service
– Nova compute, the part that runs on the hypervisor servers, is simply pointed at GPFS: no GPFS magic, it's just 'local' storage for Nova to use; because it is a shared file system, live migration works, and as normal GPFS storage it can use RDMA. Will LROC improve performance here?
• 44. CLIMB: GPFS at the University of Birmingham
– GPFS at UoB: BlueBEAR (Linux HPC running over FDR-10); Research Data Store (multi-data-centre, replicated, HA-failover system for bulk research project data); Hadoop?
– CLIMB future work: tune the GPFS environment (any thoughts?); add local SSDs to enable LROC on nova-compute nodes?; use AFM to replicate Glance across sites; integrate the OpenStack environment with GPFS and Ceph storage
– Contact: Simon Thompson, Research Computing Team, University of Birmingham, England, UK; S.J.Thompson@bham.ac.uk; www.roamingzebra.co.uk (shameless blog plug); project: www.climb.ac.uk; Twitter: @MRCClimb
• 45. Use Case #3: eMedLab
– Background: funded by the Medical Research Council for 5 years; the MRC is a UK research council focusing on health (similar to the NIH in the US), with a budget of £850M ($1.3B) in 2013/14 and an allocation of £6.8M ($10M) for capital equipment (including hosting and power costs)
– Medical bioinformatics: data-driven discovery for personalized medicine, with the objective of creating an off-site data center
– This resource will allow scientists to analyze human genome data and medical images, together with clinical and other physiological and social data, for the benefit of human health
• 46. eMedLab objectives: to accumulate medical and biological data on an unprecedented scale and complexity; to coordinate it; to store it safely and securely; to make it readily available to interested researchers; to allow customized use of resources; to enable innovative ways of working collaboratively; to allow a distributed support model; and to help generate new insights and clinical outcomes by combining data from diverse sources
• 47. eMedLab infrastructure
– High-performance scratch (1.3 PB @ ~25 GB/s); VM storage (475 TB) for VMs and snapshots; project storage (1.5 PB), project-specific and medium-term; reference data (2.7 PB), where data will be shared; 252 servers
– Why GPFS/GSS? Good pedigree, great scalability as and when the system grows, POSIX compatibility, active involvement in OpenStack projects, the 4.1 security model, and improving management tools
• 48. eMedLab design: present GPFS straight into instances, and use separate GPFS disk for Glance and Cinder
• 49. eMedLab courtesy and contacts: Dr Bruno Silva, High Performance Computing Lead, The Francis Crick Institute; Thomas King, Head of Research Infrastructure, Queen Mary University of London (t.king@qmul.ac.uk)
• 50. Use Case #4 – SuperVessel: Cloud for OpenPOWER
– In China, around 300,000 computer science students graduate from universities each year, but only 1,000~2,000 of them receive POWER training (3~6 months)
– China STG donated POWER servers to 10~15 universities years ago, but most are POWER6, AIX-only machines, too old to run today's new platforms (e.g. Hadoop)
– Limited by the teaching capacity at the universities, the POWER machines could not be effectively shared among students or used to create up-to-date content (e.g. IaaS, PaaS) for POWER learning
– The POWER Technology Open Lab is the first lab to support the OpenPOWER ecosystem in GCG, co-led by IBM Research – China and the IBM STG lab in China, and endorsed by GCG (Andy Ho) and Global OpenPOWER (Ken King)
• 51. Architecture of SuperVessel, the cloud for OpenPOWER
(Diagram: an OpenStack controller (Horizon, Nova, Neutron, Glance, Cinder, Keystone, Manila, HEAT) manages KVM pools for POWER8 little-endian and big-endian, LxC/Docker container pools for POWER8, and a KVM pool for x86, backed by IBM POWER servers with FPGA/GPU and GPFS storage. Services for the cloud admin cover system maintenance, system monitoring, resource-usage metering, system analysis, user account and authentication management, virtual point management, statistics and analysis, bare-metal management and image management. On top sit the Cloud Infrastructure Service, Big Data Service, OpenPOWER enablement service, Super Class service and Super Project Service.)
• 52. GPFS provides the shared file system for the Cloud IaaS and Big Data services on the SuperVessel cloud
– Users select a big data computing framework (MapReduce, Spark), a cluster size and a data folder size, which feed a HEAT template for the big data cluster
– HEAT orchestrates the Docker instances (Platform Symphony or Spark), subnet and data folder based on the user's request
– Manila provides the NFS service using GPFS as the backend, and the folder is mounted via nova-docker (with -v support)
– Folders created by Manila can be accessed by the KVM/Docker instances created for big data and other purposes
(Diagram: the OpenStack controller (Horizon, HEAT, Neutron, Glance, Manila, Nova, Keystone, Cinder) runs the Cloud Infrastructure Service and Big Data Service over GPFS FPO on POWER7/POWER8 servers; per-user folders A and B live in the shared GPFS FPO file system.)
• 53. SuperVessel: IBM technologies in an integrated cloud environment
– IBM Cloud Management with OpenStack: product providing unified management of the cloud and big data infrastructure on POWER
– IBM General Parallel File System (FPO): product providing the shared file system for cloud IaaS and big data
– Platform Symphony: product providing the big data service
– Manila: open-source project providing the NFS management service with OpenStack
– Nova-docker: open-source project providing the Docker driver for Nova in OpenStack
– Research technologies on cloud
  • 55. Notices and Disclaimers Copyright © 2015 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM. Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided. Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice. Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary. References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business. Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation. It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.
  • 56. Notices and Disclaimers (con’t) Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right. • IBM, the IBM logo, ibm.com, Bluemix, Blueworks Live, CICS, Clearcase, DOORS®, Enterprise Document Management System™, Global Business Services ®, Global Technology Services ®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, SoDA, SPSS, StoredIQ, Tivoli®, Trusteer®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
  • 57. Thank You Your Feedback is Important! Access the InterConnect 2015 Conference CONNECT Attendee Portal to complete your session surveys from your smartphone, laptop or conference kiosk.