Introduction to IBM Spectrum Scale and Its Use in Life Science

#ibmedge © 2016 IBM Corporation
SBD-1266
Sven Oehme, IBM Research
Konstantin Arnold, University of Basel

Spectrum Scale Architecture Highlights: Scalability

Spectrum Scale Architecture Highlights: HA/Reliability

Spectrum Scale Software Local Read Only Cache (LROC)
• Many NAS workloads benefit from a large read cache
– SPECsfs
– OpenStack, VMware and other virtualization
– Database
• Augment the Spectrum Scale node DRAM cache with SSD/NVMe
• Used to cache:
– Data
– Inodes
– Indirect blocks
• Cache consistency is ensured by standard Spectrum Scale tokens
• Assumes the SSD device is unreliable: data is protected by checksum and verified on read
• Provide low-latency access to file system metadata and data
• Implement with consumer flash for maximum cache/$
• Enabled by FLEA’s LSA (data is written sequentially to the device, to eliminate wear leveling)
• Reach small-file performance leadership compared to other NAS devices
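
The "unreliable device, checksum verified on read" idea from the bullets above can be illustrated with a small sketch. This is not Spectrum Scale code: the class name, the SHA-256 checksum and the in-memory dictionary standing in for the SSD are all assumptions made for illustration. It only shows the general pattern of a read-only cache that falls back to the backing store whenever a cached block fails verification.

```python
import hashlib
from typing import Callable, Dict, Tuple


class ChecksummedReadCache:
    """Toy read-only cache that never trusts its own storage (hypothetical, not LROC itself)."""

    def __init__(self, read_from_disk: Callable[[int], bytes]) -> None:
        self._read_from_disk = read_from_disk              # authoritative backing store
        self._slots: Dict[int, Tuple[bytes, bytes]] = {}   # block id -> (checksum, data)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _checksum(data: bytes) -> bytes:
        # SHA-256 is an arbitrary choice for this sketch.
        return hashlib.sha256(data).digest()

    def read(self, block_id: int) -> bytes:
        entry = self._slots.get(block_id)
        if entry is not None:
            checksum, data = entry
            if self._checksum(data) == checksum:
                self.hits += 1
                return data
            # Cached copy is corrupt: discard it and fall through to the backing store.
            del self._slots[block_id]
        self.misses += 1
        data = self._read_from_disk(block_id)
        self._slots[block_id] = (self._checksum(data), data)
        return data

    def corrupt(self, block_id: int) -> None:
        """Test helper: flip the bits of the first byte of a cached block."""
        checksum, data = self._slots[block_id]
        self._slots[block_id] = (checksum, bytes([data[0] ^ 0xFF]) + data[1:])
```

Because the cache holds only copies of data that also lives on disk, a failed verification costs one extra disk read and never loses data, which is why cheap consumer flash is acceptable for this role.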

LROC Example Speed Up
• Two consumer-grade 200 GB SSDs cache a Spectrum Scale storage system built from forty-eight 300 GB 10K SAS disks
• Initially, with all data coming from the disk storage system, the client reads data from the SAS disks at ~3,000 IOPS
• As more data is cached in flash, client performance increases to 33,000 IOPS while the load on the disk subsystem drops by more than 95%
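
The figures on this slide can be turned into a speed-up and an implied cache hit ratio. The sketch below assumes that "load on the disk subsystem" means read IOPS served by the SAS disks, and that the initial ~3,000 IOPS were served entirely from disk. The function names are made up for this example.

```python
def lroc_speedup(iops_before: float, iops_after: float) -> float:
    """Ratio of client IOPS with a warm cache to client IOPS without it."""
    return iops_after / iops_before


def min_cache_hit_ratio(iops_before: float, iops_after: float,
                        disk_load_reduction: float) -> float:
    """Lower bound on the share of reads served from flash.

    Assumes disk load is measured in IOPS and that all of iops_before
    originally came from disk.
    """
    disk_iops_after = iops_before * (1.0 - disk_load_reduction)
    return 1.0 - disk_iops_after / iops_after


speedup = lroc_speedup(3_000, 33_000)
hit_ratio = min_cache_hit_ratio(3_000, 33_000, 0.95)
print(f"speed-up: {speedup:.0f}x, implied hit ratio: at least {hit_ratio:.1%}")
```

Under these assumptions the disks serve at most about 150 IOPS once the cache is warm, so nearly all client reads are satisfied from the two SSDs.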

Spectrum Scale RAID features

ESS (Spectrum Scale RAID Building Blocks)
• Elastic Storage Server (ESS) is a prepackaged solution based on the Spectrum Scale RAID technology and commodity hardware components
• SSD/10k SAS models
– GS1, GS2, GS4, GS6
– 2 x high-volume servers
– 1/2/4/6 x JBOD disk enclosures
• NL-SAS models
– GL2, GL4, GL6
– 2 x high-volume servers
– 2/4/6 x JBOD disk enclosures
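
The model list above can be read as a small lookup table. The sketch below assumes that the digit in each model name equals its number of JBOD enclosures, which is how the slide lines up "GS1, GS2, GS4, GS6" with "1/2/4/6". The `EssModel` type and `build_catalog` helper are hypothetical names, not part of any IBM tool.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EssModel:
    name: str
    media: str        # disk technology as named on the slide
    servers: int      # every building block has two servers
    enclosures: int   # number of JBOD disk enclosures


def build_catalog() -> Dict[str, EssModel]:
    """Build the model table, assuming model digit == enclosure count."""
    catalog: Dict[str, EssModel] = {}
    for n in (1, 2, 4, 6):
        catalog[f"GS{n}"] = EssModel(f"GS{n}", "SSD/10k SAS", 2, n)
    for n in (2, 4, 6):
        catalog[f"GL{n}"] = EssModel(f"GL{n}", "NL-SAS", 2, n)
    return catalog


for model in build_catalog().values():
    print(f"{model.name}: {model.servers} servers, "
          f"{model.enclosures} x JBOD, {model.media}")
```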

ESS: various models

University of Basel, Switzerland
1460: first and only university in Switzerland until the 19th century
7 faculties: Humanities, Science, Medicine, Law, Business and Economics, Psychology, Theology
7600 undergraduate students
3700 postgraduate and doctoral students
1300 academic staff
358 Professors

Scientific Computing @ University of Basel
• HPC clusters – specialized for large IO (bioinformatics) and high-speed interconnects (molecular simulations)
• Central systems administration
• Up-to-date scientific databases
• Up-to-date software stack
• Back-up service
• User training
• User support
• Developer support (code versioning, issue tracking, wiki, etc.)
• Dedicated 24/7 production server environment for web services (SWISS-MODEL, Ismara, Mirz, etc.)
Key figures: 3.5 PB storage, 10'000 CPU cores
Services: HPC compute clusters, scientific software, training & support

Supporting research in Northwest Switzerland
• Hosting reference bioinformatics services
• 500 registered users
• 110 research groups
• Acknowledged in 70 life-science publications in 2016
From stellar astrophysics … to brain genomics … to structural biology … to hosting reference services (SWISS-MODEL)
Major funding

Scientific Storage and Computing Infrastructure
Once upon a time …
[Diagram: HPC Cluster and one NFS Server]

Cluster and storage grew bigger ...
[Diagram: HPC Cluster and three NSD Servers]

Spectrum Scale Data Hub Layer
[Diagram labels: SONAS; three NSD Servers; TSM-HSM; LTFS-EE; HPC Cluster; Biomedical Research; Life Sciences Department; Physics Department; Chemistry Department; Psychology Department; Microscopy Facility; Economy Department; Genome Sequencing Facility; …]

Cluster Export Services
Highly available file and object export services
- export/share configuration is straightforward
- authentication against AD or LDAP
Important for planning:
- NFS and Apple OS X
- SMB1 not supported
- mixed workload and performance
- changes in authentication
[Diagram labels: three NSD Servers; Protocol Nodes; CIFS; NFS; Active Directory Authentication]

AFM for Data migration, Example: SONAS migration
Operational advantages:
- preparing and prefetching before switching clients
- data keeps migrating while clients already work on the new share
- minimal downtime: 1 min (AFM) for a share of 30 TB and 30M inodes, vs. several months (using a transfer host with robocopy)
Technical advantages:
- data transfer: observed up to 1 TB/h per gateway host
- ACLs: transferred together with the data
- direct storage → storage migration, no transfer host or copy software needed (e.g. robocopy, rsync)
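
As a rough planning aid, the observed transfer rate gives a lower bound on the background copy time for a share. The sketch below uses the slide's figures (30 TB share, up to 1 TB/h per gateway host). The assumption that throughput scales linearly with the number of gateway hosts is ours and is not stated on the slide. The function name is hypothetical.

```python
def min_transfer_hours(share_tb: float, gateways: int = 1,
                       tb_per_hour_per_gateway: float = 1.0) -> float:
    """Best-case hours to copy a share, given a per-gateway peak rate.

    Assumes throughput scales linearly with the number of gateway hosts,
    which is an assumption of this sketch, not a measured result.
    """
    if gateways < 1:
        raise ValueError("at least one gateway host is required")
    if tb_per_hour_per_gateway <= 0:
        raise ValueError("transfer rate must be positive")
    return share_tb / (gateways * tb_per_hour_per_gateway)


one_gateway = min_transfer_hours(30.0)              # the 30 TB share from the slide
two_gateways = min_transfer_hours(30.0, gateways=2)
print(f"1 gateway: at least {one_gateway:.0f} h, 2 gateways: at least {two_gateways:.0f} h")
```

This copy runs in the background while clients already work on the new share, so it does not count toward the roughly one minute of client-visible downtime.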
[Diagram labels: three NSD Servers; SONAS; Gateway Nodes; Home Cluster]

Example: Scientific web server
Protein sequences vs. protein structures

Protein annotation: humans vs. machines

Disaster recovery: AFM between two sites
- less work to develop data replication to DR site
Scientific pipeline speedup x8: big pagepool + LROC
- processing steps depend on bigger datasets, unchanged for 1 week
- update of datasets very simple, no data distribution required
[Diagram labels: NSD Server; HPC Cluster; Internet; 200 km; pagepool=128GB; LROC: 1TB SSD; AFM independent writer (replication not speed critical)]

Information Lifecycle Management - HSM
Use of tape to lower cost of storage
Spectrum Archive EE (LTFS-EE):
- easy to manage, direct control of tape
- use of policies for fine-grained placement
- well suited for data export
- not a full-fledged backup system
Spectrum Protect for Space Management:
- integration with backup system
- requires TSM infrastructure
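
Policy-driven tiering boils down to selecting files by their attributes and acting on the matches. In Spectrum Scale this is expressed in the product's own SQL-like policy rules. The plain-Python sketch below only illustrates the selection idea, picking files whose last access is older than a threshold as candidates for migration to tape. The function name and the idle-days threshold are hypothetical.

```python
import os
import time
from typing import Iterator, Optional


def cold_files(root: str, min_idle_days: float,
               now: Optional[float] = None) -> Iterator[str]:
    """Yield paths under root whose last access is older than min_idle_days.

    Illustrative only: a real deployment would express this as a policy
    rule evaluated by the file system, not as a directory walk.
    """
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400  # seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_atime < cutoff:
                yield path
```

A call such as `list(cold_files("/some/project/dir", min_idle_days=90))` would return the candidates. The path and the 90-day value are placeholders.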
[Diagram labels: Clients; NSD Servers; Disk Pool; Spectrum Archive EE; Spectrum Protect for Space Management; TSM Server; two TS3500; …]

Secure environment for biomedical research
Encryption
- encryption of data at rest and on the network
- defined via policies
- possibility of fine-grained access groups
- encryption keys managed by key management software (IBM SKLM)
- integration with general research infrastructure
- suited for biomedical data and processing
[Diagram labels: SKLM; Secure research environment; Login; HPC Cluster; NSD Server; General research environment; NSD Server; Clients]

Summary
[Diagram labels: SONAS; NSD Server; TSM-HSM; LTFS-EE; HPC Cluster; Biomedical Research; Life Sciences Department; Physics Department; Chemistry Department; Psychology Department; Microscopy Facility; Economy Department; Genome Sequencing Facility; …; SKLM; Secure research environment; Login; HPC Cluster; NSD Server; Remote Site; Remote Cluster]
[Feature callouts: AFM; CES: CIFS, NFS; Encryption; ILM, HSM; LROC]

Spectrum Scale User Group
• The Spectrum Scale User Group is free to join and open to all using, interested in using or integrating Spectrum Scale.
• Join the User Group activities to meet your peers and get access to experts from partners and IBM.
• Next meetings:
- APAC: October 14, Melbourne
- Global at SC16: November 13, 1pm to 5pm, Salt Lake City
• Web page: http://www.spectrumscale.org/
• Presentations: http://www.spectrumscale.org/presentations/
• Mailing list: http://www.spectrumscale.org/join/
• Contact: http://www.spectrumscale.org/committee/
• Meet Bob Oesterlin (US Co-Principal) at Edge2016: Robert.Oesterlin@nuance.com

Session: Futures of IBM Spectrum Scale
NDA & Customers ONLY
• Who: IBM Spectrum Scale Offering Management
– Carl Zetie, Ron Riffe
• When: Tuesday, September 20, 2016
– 1pm to 2pm
• Where: MGM Grand, Signature Tower 3
– Meeting Room D
• Contact (if any questions)
– douglasof@us.ibm.com, cmukhya@us.ibm.com

Session: How to apply Flash benefits to big data analytics and unstructured data
NDA & Customers ONLY
• Who: IBM Elastic Storage Server Offering Management
– Alex Chen
• When: Thursday, September 22, 2016
– 1:15pm to 2:15pm
• Where: Grand Garden Arena, Lower Level, MGM, Studio 10
• Contact (if any questions)
– cmukhya@us.ibm.com, douglasof@us.ibm.com

Spectrum Scale Trial VM
• Download the IBM Spectrum Scale Trial VM from:
– http://www-03.ibm.com/systems/storage/spectrum/scale/trial.html

Spectrum Scale Edge – Technical Sessions
• Just search for “Spectrum Scale” in the IBM Events mobile app. There are 15+ sessions on various topics, including lab sessions.
Lab Sessions:
• Spectrum Scale Problem Determination Lab
Date: Sept 20th, 2:15 PM – 3:15 PM
Location: MGM Grand, Room 317, Lab Center F
• Spectrum Scale Trial VM Lab
Date: Sept 20th, 3:45 PM – 4:45 PM
Location: MGM Grand, Room 317, Lab Center F
• Booth on ESS, Spectrum Scale + TCT and DeepFlash
