IBM Platform Computing 
Elastic Storage 
Gord Sissons 
Platform Symphony Product Marketing 
Scott Campbell 
Platform Symphony Product Manager 
Rohit Valia 
Director, Product Marketing
Traditional Storage

(Diagram: storage grows as islands of independent filers, Filer 1 through Filer 8.)
Solution: global workload sharing, resource-balanced storage
(Diagram: Filers 1 through 4 consolidated under a Global Namespace with automated storage tiering.)
Elastic Storage provides massively parallel, scale-out storage
Elastic Storage – Key features

Extreme Scalability
• Maximum file system size: 1 million yottabytes
• 2^64 files per file system
• Maximum file size equals file system size
• Customers with 18 PB file systems

Futureproof
• IPv6
• Commodity hardware

Proven Reliability
• Snapshots, replication
• Built-in heartbeat, automatic failover/failback
• Add/remove on the fly
• Rolling upgrades
• Administer from any node
• Commodity hardware

High Performance
• Parallel file access
• Distributed, scalable, high-performance metadata
• Flash acceleration
• Automatic tiering
• Over 400 GB/s
• Commodity hardware
Supported storage hardware 
In addition to IBM Storage, IBM General 
Parallel File System (GPFS™) supports 
storage hardware from these vendors: 
EMC 
Hitachi 
Hewlett Packard 
DDN 
GPFS supports many storage systems, and the IBM support team can help 
customers using storage hardware solutions not on this list of tested devices.
Supported server hardware

General Parallel File System (GPFS™) for IBM POWER Systems™ is supported on both IBM AIX® and Linux®. GPFS for x86 Architecture™ is supported on both Linux® and Windows Server 2008.

GPFS for x86 Architecture is supported on multiple x86 and AMD compatible systems:
 IBM Intelligent Cluster
 IBM iDataPlex®
 IBM System x® rack-optimized servers
 IBM BladeCenter® servers
 Non-IBM x86 and AMD compatible servers

GPFS for Power is supported on multiple IBM POWER platforms:
 IBM System p®
 BladeCenter servers
 IBM Blue Gene®
Sharing Data Across an Organization

• 1993: GPFS introduced concurrent file system access from multiple nodes.
• 2005: Multi-cluster expands the global namespace by connecting multiple sites.
• 2011: AFM takes the global namespace truly global by automatically managing asynchronous replication of data.

A minimal multi-cluster setup is sketched below.
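A minimal sketch of wiring two clusters together with multi-cluster; the cluster names, contact nodes, and key file names are invented for illustration:

    # On the cluster that owns file system store1:
    mmauth genkey new                      # generate this cluster's key pair
    mmauth update . -l AUTHONLY            # require authenticated connections
    mmauth add access.example.com -k access_id_rsa.pub
    mmauth grant access.example.com -f store1

    # On the accessing cluster:
    mmauth genkey new
    mmremotecluster add owner.example.com -n owner1,owner2 -k owner_id_rsa.pub
    mmremotefs add rstore1 -f store1 -C owner.example.com -T /gpfs/rstore1
    mmmount rstore1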
Global Namespace

Clients in all three clusters access the same paths: /global/data1 through /global/data6. Each file system holds two of the filesets locally and caches the other four:

• File system store1: local filesets /data3, /data4; cache filesets /data1, /data2, /data5, /data6
• File system store2: local filesets /data1, /data2; cache filesets /data3, /data4, /data5, /data6
• File system store3: local filesets /data5, /data6; cache filesets /data1, /data2, /data3, /data4

See all data from any cluster. Cache as much data as required, or fetch data on demand (an AFM fileset sketch follows).
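A minimal sketch of defining one of these cache filesets with AFM; the NFS target, fileset, and path names are assumptions:

    # On the cluster owning store2, create a cache fileset for /data1,
    # whose home copy lives in store1 (target path is hypothetical):
    mmcrfileset store2 data1 --inode-space new \
        -p afmTarget=nfs://home1/gpfs/store1/data1 -p afmMode=read-only
    mmlinkfileset store2 data1 -J /gpfs/store2/data1
    # Files are fetched from home on first access and cached locally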
Elastic Storage Data Life Cycle Management

(Diagram: a single namespace spanning SSD, SAS, and SATA pools inside the Elastic Storage file system, with CIFS access on top and external pools managed by TSM, LTFS, and HPSS.)

Use Elastic Storage filesets and ILM policies to control data placement, deletion, and movement across storage tiers (pools), as in the sketch below.
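A minimal placement-plus-migration policy, assuming pools named 'ssd', 'sas', and 'sata'; the pool names and the WHERE clause are illustrative:

    /* lifecycle.pol: place new .db files on flash, everything else on SAS;
       when SAS passes 80% full, migrate the coldest files to SATA until 60% */
    RULE 'hot' SET POOL 'ssd' WHERE LOWER(NAME) LIKE '%.db'
    RULE 'default' SET POOL 'sas'
    RULE 'cool' MIGRATE FROM POOL 'sas' THRESHOLD(80,60)
        WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME) TO POOL 'sata'

Installed and run with:

    mmchpolicy store1 lifecycle.pol    # install placement and migration rules
    mmapplypolicy store1               # evaluate the migration rules now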
A Typical Hadoop HDFS Environment

(Diagram: users submit jobs to a MapReduce cluster; each node runs HDFS on its local disks, and data is ingested from NFS filers.)

 Uses disk local to each server
 Aggregates the local disk space into a single, redundant shared file system
 HDFS is the open-source standard file system used with Hadoop MapReduce
Hadoop MapReduce Environment Using Elastic Storage FPO

(Diagram: the same MapReduce cluster, with Elastic Storage FPO replacing HDFS on the local disks.)

 Uses disk local to each server
 Aggregates the local disk space into a single, redundant shared file system
 Designed for MapReduce workloads
 Unlike HDFS, GPFS-FPO is POSIX-compliant, so data maintenance is easy
 Intended as a drop-in replacement for open-source HDFS (the IBM BigInsights product may be required); a sample FPO pool definition is sketched below
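A minimal NSD stanza file for an FPO-style data pool on server-internal disks; the device names, node names, pool name, and file system name are assumptions:

    # fpo.stanza: a data pool with FPO write affinity, plus metadata NSDs
    %pool: pool=fpodata blockSize=2M layoutMap=cluster allowWriteAffinity=yes writeAffinityDepth=1 blockGroupFactor=128
    %nsd: nsd=md1 device=/dev/sdb servers=node1 usage=metadataOnly pool=system
    %nsd: nsd=md2 device=/dev/sdb servers=node2 usage=metadataOnly pool=system
    %nsd: nsd=d1 device=/dev/sdc servers=node1 usage=dataOnly pool=fpodata failureGroup=1
    %nsd: nsd=d2 device=/dev/sdc servers=node2 usage=dataOnly pool=fpodata failureGroup=2

    mmcrnsd -F fpo.stanza
    mmcrfs bigfs -F fpo.stanza -T /gpfs/bigfs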
The Vision

(Diagram: analytics, file storage, media, and data-ingest workloads reach Elastic Storage through POSIX, NFS, MapReduce, and object interfaces; data is placed across solid state, spinning disk, tape, ESS, a cloud tier (ICStore: IBM public cloud, Amazon S3, MS Azure, private cloud), and world-wide data distribution.)

• Single namespace, no matter where data resides
• Data in the best location, on the best tier (performance and cost), at the right time
• Multi-tenancy
• All in software
Architecture
Elastic Storage Cluster Models

(Diagram: two deployment models. In the NSD client/server model, application nodes reach NSD servers over a TCP/IP or InfiniBand RDMA network, and the NSD servers attach to the storage. In the SAN model, application nodes attach directly to the storage over a storage network.)

A minimal cluster of the first kind is sketched below.
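A minimal sketch of standing up an NSD client/server cluster; every node, device, and file system name here is invented for illustration:

    # Create the cluster: nsd1/nsd2 will serve disks, app1/app2 run applications
    mmcrcluster -N "nsd1:quorum-manager,nsd2:quorum-manager,app1,app2" \
        -p nsd1 -s nsd2 -r /usr/bin/ssh -R /usr/bin/scp
    # disks.stanza holds %nsd stanzas naming the devices and their server nodes
    mmcrnsd -F disks.stanza
    # Create the file system on those NSDs and mount it everywhere
    mmcrfs store1 -F disks.stanza -B 1M -T /gpfs/store1
    mmmount store1 -a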
Features
Elastic Storage – Key features in more detail

Basics
• Distributed journaled file system; scalable, high-performance metadata
• AIX, Linux, and Windows
• Single namespace
• Parallel file access
• Built-in heartbeat, automatic failover/failback, quorum
• Administer from any node
• Add/remove servers or disks on the fly
• Rolling upgrades

Standard
• SNMP (running on a Linux node)
• Snapshots, backup, replication
• Filesets, quotas
• Active/active dual site with synchronous replication
• Multi-cluster
• Server-internal disks (FPO)
• Flash acceleration (LROC, Linux)
• File clones
• Automatic tiering (ILM), even to tape with HSM software
• Geographic asynchronous caching (AFM)
• Clustered NFS servers (cNFS, Linux) for access beyond the Elastic Storage cluster

Advanced
• Native encryption
• Secure deletion

A fileset, snapshot, and quota sketch follows this list.
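A short sketch of the fileset, snapshot, and quota basics listed above; the file system, fileset, and snapshot names are illustrative:

    # Create and link an independent fileset
    mmcrfileset store1 projects --inode-space new
    mmlinkfileset store1 projects -J /gpfs/store1/projects
    # Point-in-time snapshot of the whole file system
    mmcrsnapshot store1 nightly_2014_06_01
    # Report quota usage for the fileset (quotas must be enabled on the file system)
    mmlsquota -j projects store1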
Elastic Storage Manages the Full Data Lifecycle Cost-Effectively

• Policy-driven automation and tiered storage management
• Match the cost of storage to the value of data
• Storage pools create tiers of storage:
‒ High-performance SSD
‒ High-speed SAS drives
‒ High-capacity NL-SAS drives

(Diagram: application servers in front of an Elastic Storage Server or commodity hardware, with auto-tiering and migration to a tape library.)

• Integrated with IBM Tivoli Storage Manager (TSM) and IBM LTFS Enterprise Edition (EE)
‒ Elastic Storage handles all metadata processing, then hands the data to TSM or LTFS EE for storage on tape
‒ Data is retrieved from the external storage pool on demand, for example when an application opens a migrated file
‒ Policies move data from one pool to another without changing the file's location in the directory structure (see the external-pool sketch below)
• Tape migration bottom line: cuts storage costs by up to 90%
• Right data, right place, right time, right performance, right cost
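A hedged sketch of such an external-pool rule; the pool names and threshold are illustrative, and the EXEC script path is an assumption:

    /* tape.pol: define an external HSM pool and age files out to it.
       GPFS ships an mmpolicyExec-hsm sample under /usr/lpp/mmfs/samples/ilm;
       the installed path below is assumed. */
    RULE EXTERNAL POOL 'hsm' EXEC '/var/mmfs/etc/mmpolicyExec-hsm' OPTS '-v'
    RULE 'to_tape' MIGRATE FROM POOL 'sata' TO POOL 'hsm'
        WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 90

    mmapplypolicy store1 -P tape.pol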
Flash Local Read Only Cache (LROC)

(Diagram: client nodes with local LROC SSDs in front of shared Elastic Storage.)
• Inexpensive SSDs placed directly in Client nodes 
• Accelerates I/O performance up to 6x by reducing the amount of time 
CPUs wait for data 
• Also decreases the overall load on the network, benefitting 
performance across the board 
• Improves application performance while maintaining all the 
manageability benefits of shared storage 
• Cache consistency ensured by standard tokens 
• Data is protected by checksum and verified on read 
• Elastic Storage handles the flash cache automatically, so data is transparently available to your application with very low latency and no code changes (a configuration sketch follows)
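A minimal sketch of enabling LROC on a client node's local SSD; the device name, NSD name, and node name are invented:

    # lroc.stanza: describe the client-local SSD as an NSD reserved for caching
    %nsd: nsd=lroc_app1 device=/dev/sdc servers=app1 usage=localCache

    mmcrnsd -F lroc.stanza
    # Cache user data, directories, and inodes in LROC on that node
    mmchconfig lrocData=yes,lrocDirectories=yes,lrocInodes=yes -N app1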
Elastic Storage: Tiering to tape with LTFS EE

• Automatic migration to tape
• The file user does not see where a file is stored
• Scales by adding tape drives or nodes
• Load is balanced across nodes and drives
• Tapes can be exported and imported
• Redbook: IBM Linear Tape File System Enterprise Edition V1.1 Installation and Configuration Guide, SG24-8143

(Diagram: users and applications work in a global namespace over GPFS file systems holding user data and metadata; GPFS nodes 1 and 2 each run LTFS EE and move user data to a TSxxxx tape library.)
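LTFS EE plugs into the same external-pool policy mechanism. A hedged sketch follows; the ltfsee executable path and OPTS string are assumptions, and the Redbook cited above gives the exact form:

    /* ltfsee.pol: hand cold files to LTFS EE for migration to a tape pool
       (path and options assumed; see SG24-8143) */
    RULE EXTERNAL POOL 'ltfs' EXEC '/opt/ibm/ltfsee/bin/ltfsee' OPTS '-p PrimaryPool'
    RULE 'stub_to_tape' MIGRATE FROM POOL 'system' TO POOL 'ltfs'
        WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 180

    mmapplypolicy gpfs0 -P ltfsee.pol
    # Recall is transparent: opening a migrated file pulls it back from tape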
File Placement Optimizer 
(GPFS-FPO)
Elastic Storage – FPO

 Uses disk local to each server
 All nodes are both NSD servers and NSD clients
 Designed for MapReduce workloads
Elastic Storage: advanced storage for Hadoop

Hadoop HDFS → IBM GPFS-FPO advantages:

• HDFS NameNode is a single point of failure → no single point of failure, distributed metadata
• Large block sizes, poor support for small files → variable block sizes, suited to multiple types of data and data-access patterns
• Non-POSIX file system, obscure commands → POSIX file system, easy to use and manage
• Data is difficult to ingest, special tools required → policy-based data ingest
• Single-purpose, Hadoop MapReduce only → versatile, multi-purpose
• Not recommended for critical data → enterprise-class advanced storage features
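To make the POSIX point concrete: on GPFS-FPO, ordinary Unix tools work on MapReduce data directly, where HDFS requires its own client. The paths below are illustrative:

    # HDFS: data must go through the hadoop client
    hadoop fs -put /staging/events.log /data/events.log
    hadoop fs -ls /data

    # GPFS-FPO: the same data is just files in a POSIX namespace
    cp /staging/events.log /gpfs/bigdata/data/events.log
    ls -l /gpfs/bigdata/data
    grep ERROR /gpfs/bigdata/data/events.log | head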
OpenStack
OpenStack Delivers a Massively Scalable Cloud Operating System

OpenStack mission: to produce the ubiquitous open source cloud computing platform that will meet the needs of public and private cloud providers regardless of size, by being simple to implement and massively scalable.
OpenStack Key Components

• Horizon (dashboard)
• Nova (compute)
• Cinder (block storage)
• Swift (object storage)
• Neutron (networking)
• Glance (image service)
• Keystone (identity)
OpenStack GPFS Cinder Driver

• The OpenStack Havana release includes a GPFS Cinder driver
– Giving architects access to the features and capabilities of the industry's leading enterprise scale-out software-defined storage
• With OpenStack on GPFS, all nodes see all data
– Copying data between services, such as Glance to Cinder, is minimized or eliminated, speeding instance creation and conserving storage space
• Rich set of data management and information lifecycle features
- Volume placement: on GPFS storage pools, or FPO-based placement
- Resilience: per-volume replication level, direct I/O (DIO) volumes
- Storage migration: transparent or user-directed migration of volumes between GPFS storage pools, between GPFS nodes, or to other Cinder back ends
- Glance integration: convert a volume to an image, or an image to a volume, through a copy-on-write (COW) mechanism, giving fast instance provisioning and capture

A sample driver configuration is sketched below.
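A minimal configuration fragment for the Havana-era GPFS Cinder driver; the mount points are assumptions, and the option names should be checked against the release documentation:

    # Fragment of /etc/cinder/cinder.conf, [DEFAULT] section
    volume_driver = cinder.volume.drivers.gpfs.GPFSDriver
    gpfs_mount_point_base = /gpfs/openstack/cinder/volumes
    gpfs_images_dir = /gpfs/openstack/glance/images
    gpfs_images_share_mode = copy_on_write
    gpfs_sparse_volumes = True

With gpfs_images_share_mode set to copy_on_write and Glance images on the same GPFS file system, volume creation from an image is a near-instant COW clone rather than a full copy.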
Competition
IBM GPFS vs. Competitors

Columns: IBM GPFS / Lustre / EMC Isilon / IBRIX Fusion / HDFS / MapR. Rows that list fewer entries than columns had blank cells on the original slide.

• POSIX interface: Yes / Yes / Yes / Yes / No / Yes
• Multi-OS support: Yes / Linux only / N/A / No / No / No
• Hadoop FS API or location-aware connector: Yes / No / Yes / Yes / Yes
• Lifecycle management, tape archival: Yes / No / No / No / No
• Global namespace: Yes / No / Yes / No
• Distributed metadata: Yes / No / Yes / Yes
• Expand capacity online: Yes / No
• WAN caching / replication: Yes / No / No / No
• File system snapshots: Yes
• Quotas: Yes
• Open source: No / Yes / No / Yes / No
• Commercial support: Yes / Yes (Oracle, Cray, Bull, SGI and others) / Yes / Yes (HP) / Yes (Cloudera, IBM and others) / Yes
Elastic Storage – Editions
Elastic Storage – New Pricing Structure

Socket-based licensing, with Server and Client licenses for each socket
• Simpler: no more PVUs

Express Edition
• gpfs.base (no ILM, AFM, or cNFS)
• gpfs.docs
• gpfs.gpl
• gpfs.msg
• gpfs.gskit

Standard Edition
• Adds gpfs.ext

Advanced Edition
• Adds gpfs.crypto

Platforms
• zLinux
• Ubuntu

Features by edition
• Basic GPFS functionality: Express, Standard, Advanced
• ILM (storage pools, policy, mmbackup): Standard, Advanced
• Active File Management (AFM): Standard, Advanced
• Clustered NFS (cNFS): Standard, Advanced
• Encryption: Advanced only

License designation per node is shown in the sketch below.
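A brief sketch of how per-node license designations might be set and verified; the node names are invented:

    # Designate Server licenses on NSD/manager nodes, Client licenses elsewhere
    mmchlicense server --accept -N nsd1,nsd2
    mmchlicense client --accept -N app1,app2,app3
    # FPO nodes that serve only their own local disks can take the FPO license (GPFS 4.1)
    mmchlicense fpo --accept -N fpo1,fpo2
    # Review current designations
    mmlslicense -L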
Elastic Storage Cluster Models – licensing

(Diagram: the cluster models from the Architecture section, annotated with license types: Client licenses on application nodes, Server licenses on NSD servers, and FPO or Server licenses on shared-nothing nodes.)
Elastic Storage Server
Elastic Storage Server

Replaces a specialized hardware controller with software.

 Delivers extreme data integrity
– 2- and 3-fault-tolerant erasure codes
– End-to-end checksum
– Protection against lost writes
– Fastest rebuild times, using declustered RAID
 Breakthrough performance
– Declustered RAID reduces application load during rebuilds
– Up to 3x lower overhead to applications
– Built-in SSDs and NVRAM for write performance
– Faster than alternatives today – and tomorrow!
 Lowers TCO
– 3 years maintenance and support
– General-purpose servers
– Off-the-shelf SBODs
– Standardized in-band SES management
– Standard Linux
– Modular upgrades
Elastic Storage Server GL Models

Machine type 5146. High-capacity storage for analytics and cloud serving, using 4U, 60-drive storage enclosures with 2 TB or 4 TB drives: a client-ready petabyte in a single rack.

• Model GL2 (analytics focused): 2 enclosures, 12U; 116 NL-SAS + 2 SSD; 5+ GB/s
• Model GL4 (analytics and cloud): 4 enclosures, 20U; 232 NL-SAS + 2 SSD; 10+ GB/s
• Model GL6 (petascale storage): 6 enclosures, 28U; 348 NL-SAS + 2 SSD; 12+ GB/s

Common to all GL models:
• Power S822L servers, 20 cores each
• 1818-80e expansion chassis
• Red Hat 7
• Graphical user interface
• Management server and HMC
• Elastic Storage software with Elastic Storage Native RAID
• xCAT or Platform Cluster Manager (optional)
• 10 Gb / 40 Gb Ethernet, FDR InfiniBand
• From 116 to 348 spinning disks
• 3 years maintenance
• Building-block approach to growth
Elastic Storage Server GS Models

Smaller configurations for high-velocity ingest, or a lower-cost entry point. Uses 2U, 24-drive storage enclosures (FC 5887) with 400 GB or 800 GB SSDs, or 1.2 TB SAS drives. Highest "performance per U" delivered to clients. Deployable alone, or as the "platinum" tier of an ESS configuration.

• Model GS1: 24 SSD; 6 GB/s
• Model GS2: 46 SAS + 2 SSD, or 48 SSD; 2 GB/s (SAS) or 12 GB/s (SSD)
• Model GS4: 94 SAS + 2 SSD, or 96 SSD; 5 GB/s (SAS) or 16 GB/s (SSD)
• Model GS6: 142 SAS + 2 SSD; 7 GB/s

Common to all GS models (machine type 5146):
• Power S822L servers, 20 cores each
• Power expansion chassis
• Red Hat 7
• Graphical user interface
• Management server and HMC
• Elastic Storage software with Elastic Storage Native RAID
• xCAT or Platform Cluster Manager (optional)
• 10 Gb / 40 Gb Ethernet, FDR InfiniBand
• 3 years maintenance
• Building-block approach to growth
Elastic Storage Ensures End-to-End Data Availability, Reliability, and Integrity

• GPFS Elastic Storage Native RAID (declustered RAID)
‒ Data and parity stripes are uniformly partitioned and distributed across the array
‒ Rebuilds that take days on other systems take minutes on Elastic Storage
• 2-fault and 3-fault tolerance
‒ Reed-Solomon parity encoding, 2-fault or 3-fault tolerant
‒ 3- or 4-way mirroring
• End-to-end checksum & dropped write detection 
‒ From disk surface to Elastic Storage user / client 
‒ Detects and corrects off-track and lost / dropped disk writes 
• Asynchronous error diagnosis while affected I/Os continue 
‒ If media error: verify and restore if possible 
‒ If path problem: attempt alternate paths 
• Supports live replacement of disks 
‒ I/O operations continue for tracks whose disks are removed during service
