The document discusses IBM Spectrum Scale's unified file and object access feature. It allows data to be accessed as both files and objects within the same namespace, without data copies. This enables use cases such as running analytics directly on object data with Hadoop/Spark, with no data movement, and publishing analytics results back as objects. The feature supports common user authentication for both file and object access and flexible identity management modes. A demo shows uploading a file as an object, running analytics on it, and downloading the result as an object.
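As a rough illustration of the in-place analytics idea, the hedged PySpark sketch below reads an ingested object directly from a hypothetical fileset path (/ibm/gpfs0/obj/container1, assumed to be mounted on every worker) and writes the result back to the same path. The file name and mount point are placeholders, not from the original deck.

```python
# Minimal sketch of in-place analytics on data ingested as an object.
# Assumes (hypothetically) that the object container maps to the fileset path
# /ibm/gpfs0/obj/container1 on every Spark worker node; adjust to your layout.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("in-place-object-analytics").getOrCreate()

# The object "sensor_log.txt" was uploaded through the object interface but is
# readable here as an ordinary file -- no copy into HDFS is needed.
lines = spark.read.text("file:///ibm/gpfs0/obj/container1/sensor_log.txt")

word_counts = (
    lines.select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
         .groupBy("word")
         .count()
         .orderBy(F.desc("count"))
)

# Write the result back under the same fileset, where it can be downloaded
# again as an object through the object interface.
word_counts.write.mode("overwrite").csv("file:///ibm/gpfs0/obj/container1/wordcount_result")
```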
Spectrum Scale Unified File and Object with WAN Caching - Sandeep Patil
This document provides an overview of IBM Spectrum Scale's Active File Management (AFM) capabilities and use cases. AFM uses a home-and-cache model to cache data from a home site at local clusters for low-latency access. It expands GPFS' global namespace across geographical distances and provides automated namespace management. The document discusses AFM caching basics, global sharing, use cases like content distribution and disaster recovery. It also provides details on Spectrum Scale's protocol support, unified file and object access, using AFM with object storage, and configuration.
Analytics with unified file and object - Sandeep Patil
This presentation takes you through one way to achieve in-place Hadoop-based analytics for your file and object data. It also gives an example of storage integration with cloud cognitive services.
In Place Analytics For File and Object Data - Sandeep Patil
The document discusses IBM Spectrum Scale's unified file and object access feature. It introduces Spectrum Scale and its support for file and object access. The unified file and object access feature allows data to be accessed as both files and objects without copying, through a single management plane. Use cases like in-place analytics for object data and common identity management across file and object access are enabled. A demo is presented where a file is uploaded as an object, analytics is run on it, and the result downloaded as an object, without data movement.
Spectrum Scale - Diversified analytic solution based on various storage servi... - Wei Gong
These slides describe diversified analytic solutions based on Spectrum Scale with various deployment modes, such as storage-rich servers, shared storage, IBM DeepFlash 150, and Elastic Storage Server. They take a deep dive into several advanced data management features and solutions for BD&A workloads derived from Spectrum Scale.
Hadoop and Spark Analytics over Better Storage - Sandeep Patil
This document discusses using IBM Spectrum Scale to provide a colder storage tier for Hadoop & Spark workloads using IBM Elastic Storage Server (ESS) and HDFS transparency. Some key points discussed include:
- Using Spectrum Scale to federate ESS with existing HDFS or Spectrum Scale filesystems, allowing data to be seamlessly accessed even if moved to the ESS tier.
- Extending HDFS across multiple HDFS and Spectrum Scale clusters without needing to move data using Spectrum Scale's HDFS transparency connector.
- Integrating ESS tier with Spectrum Protect for backup and Spectrum Archive for archiving to take advantage of their policy engines and automation.
- Examples of using the unified storage for analytics workflows, life
Ozone: Evolution of HDFS scalability & built-in GDPR compliance - Dinesh Chitlangia
This talk was delivered at ApacheCON, Las Vegas USA, September 2019.
Audio Recording: https://feathercast.apache.org/2019/09/12/ozone-evolving-hdfs-scalability-to-new-heights-built-in-gdpr-compliance-dinesh-chitlangia/
Speakers:
Dinesh Chitlangia: https://www.linkedin.com/in/dineshchitlangia/
Ajay Kumar aka Ajay Yadav: https://www.linkedin.com/in/ajayydv/
Abstract:
https://www.apachecon.com/acna19/s/#/scheduledEvent/1176
Apache Hadoop Ozone is a robust, distributed key-value object store for Hadoop with a layered architecture and strong consistency. It separates namespace management from the block and node management layer, which allows users to scale independently on both axes. Ozone is interoperable with the Hadoop ecosystem: it provides OzoneFS (a Hadoop-compatible file system API), data locality, and plug-and-play deployment with HDFS, since it can be installed in an existing Hadoop cluster and can share storage disks with HDFS. Ozone addresses HDFS's scalability challenges by being size agnostic; consequently, users can store trillions of files in Ozone and access them as if they were on HDFS. Ozone plugs into existing Hadoop deployments seamlessly, and programs like YARN, MapReduce, Spark, and Hive work without any modifications. In an era of growing data privacy needs and regulations, Ozone also aims to provide built-in support for GDPR compliance, with a strong focus on the Right to be Forgotten, i.e., data erasure. At the end of this presentation the audience will be able to understand: 1. Overview of current challenges with HDFS scalability 2. How Ozone's architecture solves these challenges 3. Overview of GDPR 4. Built-in support for GDPR in Ozone
The document discusses scaling challenges with HDFS and proposed solutions from Hortonworks called HDDS and Ozone. HDFS scales well for data and IO but has limitations scaling the namespace beyond 500 million files. HDDS aims to scale the block layer using block containers which can reduce block reports. Ozone uses a flat key-value namespace that is easier to shard and scale beyond billions of objects compared to HDFS hierarchical namespace. It also provides an HDFS compatible filesystem called OzoneFS. Together HDDS and Ozone aim to retain HDFS features while scaling to exabytes of data and trillions of files.
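For context, here is a hedged PySpark sketch of what "access them as if they are on HDFS" can look like through OzoneFS. The ofs:// host, volume, bucket, and field names are placeholders, and the Ozone filesystem jar is assumed to be on the Spark classpath.

```python
# Hypothetical sketch of reading Ozone data through its Hadoop-compatible
# filesystem interface from PySpark. Host, volume, bucket and column names
# below are placeholders for your deployment.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ozone-read").getOrCreate()

# Ozone keys under /vol1/bucket1 appear as files, so an existing Spark job
# works unchanged once the URI scheme is switched from hdfs:// to ofs://.
df = spark.read.json("ofs://ozone-om.example.com/vol1/bucket1/events/")
df.groupBy("eventType").count().show()
```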
Running secured Spark job in Kubernetes compute cluster and integrating with ... - DataWorks Summit
This presentation will provide technical design and development insights for running a secured Spark job in a Kubernetes compute cluster that accesses job data from a Kerberized HDFS cluster. Joy will show how to run a long-running machine learning or ETL Spark job in Kubernetes and how to access data from HDFS using a Kerberos Principal and Delegation Token.
The first part of this presentation covers the design and best practices for deploying and running Spark in Kubernetes integrated with HDFS: creating an on-demand multi-node Spark cluster during job submission, installing and resolving software dependencies (packages), executing and monitoring the workload, and finally disposing of the resources when the job completes. The second part covers the design and development details to set up a Spark+Kubernetes cluster that supports long-running jobs accessing data from secured HDFS storage by creating and renewing Kerberos delegation tokens seamlessly from the end user's Kerberos Principal.
All the techniques covered in this presentation are essential in order to set up a Spark+Kubernetes compute cluster that accesses data securely from distributed storage cluster such as HDFS in a corporate environment. No prior knowledge of any of these technologies is required to attend this presentation.
Speaker
Joy Chakraborty, Data Architect
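The sketch below is an illustrative (not the speaker's) PySpark configuration for the setup the abstract describes: a Spark driver targeting a Kubernetes master and a Kerberized HDFS cluster. The image name, namespace, principal, keytab path, and HDFS URIs are placeholders, and the exact spark.kubernetes.* / spark.kerberos.* properties needed depend on the Spark version and cluster setup.

```python
# Illustrative sketch only: all names, hosts and paths are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("secured-etl")
    .master("k8s://https://k8s-apiserver.example.com:6443")
    .config("spark.kubernetes.namespace", "analytics")
    .config("spark.kubernetes.container.image", "registry.example.com/spark:3.4.1")
    .config("spark.executor.instances", "4")
    # Spark obtains and renews an HDFS delegation token from this principal,
    # so long-running executors keep access to the Kerberized cluster.
    .config("spark.kerberos.principal", "etl-user@EXAMPLE.COM")
    .config("spark.kerberos.keytab", "/etc/security/keytabs/etl-user.keytab")
    .getOrCreate()
)

# Read from and write back to the Kerberized HDFS cluster.
df = spark.read.parquet("hdfs://secure-nn.example.com:8020/warehouse/clicks/")
df.groupBy("country").count().write.mode("overwrite").parquet(
    "hdfs://secure-nn.example.com:8020/tmp/clicks_by_country"
)
```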
Hadoop has some built-in data protection features like replication, snapshots, and trash bins. However, these may not be sufficient on their own. Hadoop data can still be lost due to software bugs or human errors. A well-designed data protection strategy for Hadoop should include diversified copies of valuable data both within and outside the Hadoop environment. This protects against data loss from both software and hardware failures.
How to Protect Big Data in a Containerized Environment - BlueData, Inc.
Every enterprise spends significant resources to protect its data. This is especially true in the case of big data, since some of this data may include sensitive or confidential customer and financial information. Common methods for protecting data include permissions and access controls as well as the encryption of data at rest and in flight.
The Hadoop community has recently rolled out Transparent Data Encryption (TDE) support in HDFS. Transparent Data Encryption refers to the process whereby data is transparently encrypted by the big data application writing the data; it is not decrypted again until it is accessed by another application. The data is encrypted during its entire lifespan—in transit and at rest—except when it is being specifically accessed by a processing application.
TDE is an excellent approach for protecting data stored in data lakes built on the latest versions of HDFS. However, it does have its challenges and limitations. Systems that want to use TDE require tight integration with enterprise-wide Kerberos Key Distribution Center (KDC) services and Key Management Systems (KMS). This integration isn’t easy to set up or maintain. These issues can be even more challenging in a virtualized or containerized environment where one Kerberos realm may be used to secure the big data compute cluster and a different Kerberos realm may be used to secure the HDFS filesystem accessed by this cluster.
BlueData has developed significant expertise in configuring, managing, and optimizing access to TDE-protected HDFS. This session at the Strata Data Conference in March 2018 (by Thomas Phelan, co-founder and chief architect at BlueData) offers a detailed overview of how transparent data encryption works with HDFS, with a particular focus on containerized environments.
You’ll learn how HDFS TDE is configured and maintained in an environment where many big data frameworks run simultaneously (e.g., in a hybrid cloud architecture using Docker containers). Moreover, you’ll learn how KDC credentials can be managed in a Kerberos cross-realm environment to provide data scientists and analysts with the greatest flexibility in accessing data while maintaining complete enterprise-grade data security.
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63763
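As a rough companion to the TDE discussion, the following Python sketch wraps the standard Hadoop CLI steps for creating an encryption zone; the key name and paths are hypothetical, and it assumes a KMS is configured and a Kerberos ticket is already held.

```python
# A minimal sketch (not from the talk) of setting up an HDFS encryption zone.
import subprocess

def run(cmd):
    """Run a command, echoing it first, and fail loudly on error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Create an encryption key in the Hadoop KMS (key name is hypothetical).
run(["hadoop", "key", "create", "finance_key"])

# 2. Create an empty directory and mark it as an encryption zone using that key.
run(["hdfs", "dfs", "-mkdir", "-p", "/data/finance"])
run(["hdfs", "crypto", "-createZone", "-keyName", "finance_key", "-path", "/data/finance"])

# 3. Files written under /data/finance are now encrypted at rest transparently;
#    readers authorized for the key see plaintext, everyone else sees ciphertext.
run(["hdfs", "dfs", "-put", "payroll.csv", "/data/finance/"])
```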
A comprehensive overview of the security concepts in the open source Hadoop stack in mid 2015 with a look back into the "old days" and an outlook into future developments.
Most users know HDFS as the reliable store of record for big data analytics. HDFS is also used to store transient and operational data when working with cloud object stores, such as Microsoft Azure or Amazon S3, and on-premises object stores, such as Western Digital’s ActiveScale. In these settings, applications often manage data stored in multiple storage systems or clusters, requiring a complex workflow for synchronizing data between filesystems for business continuity planning (BCP) and/or supporting hybrid cloud architectures to achieve the required business goals for durability, performance, and coordination.
To resolve this complexity, HDFS-9806 has added a PROVIDED storage tier to mount external storage systems in the HDFS NameNode. Building on this functionality, we can now allow remote namespaces to be synchronized with HDFS, enabling asynchronous writes to the remote storage and the possibility to synchronously and transparently read data back to a local application wanting to access file data which is stored remotely. In this talk, which corresponds to the work in progress under HDFS-12090, we will present how the Hadoop admin can manage storage tiering between clusters and how that is then handled inside HDFS through the snapshotting mechanism and asynchronously satisfying the storage policy.
Speakers
Chris Douglas, Microsoft, Principal Research Software Engineer
Thomas Denmoor, Western Digital, Object Storage Architect
Seamless replication and disaster recovery for Apache Hive Warehouse - DataWorks Summit
As Apache Hadoop clusters become central to an organization’s operations, they have clusters in more than one data center. Historically, this has been largely driven by requirements of business continuity planning or geo localization. It has also recently been gaining a lot of interest from a hybrid cloud perspective, i.e. wherein people are trying to augment their traditional on-prem setup with cloud-based additions as well. A robust replication solution is a fundamental requirement in such cases.
Seamless disaster recovery has several challenges. Data, metadata, and transaction information need to be moved in sync. It should also be easy for the users and applications to reason about the state of the replica. The “hadoop scale” also brings unique challenges as bandwidth between clusters can be a limiting factor. The data transfer has to be minimized for replication, failover, as well as fail back scenarios.
In this talk we will discuss how the above challenges are addressed for supporting seamless replication and disaster recovery for Hive.
Speakers
Sankar Hariappan, Hortonworks, Staff Software Engineer
Anishek Agarwal, Hortonworks, Engineering Manager
This document provides summaries of various distributed file systems and distributed programming frameworks that are part of the Hadoop ecosystem. It summarizes Apache HDFS, GlusterFS, QFS, Ceph, Lustre, Alluxio, GridGain, XtreemFS, Apache Ignite, Apache MapReduce, and Apache Pig. For each one it provides 1-3 links to additional resources about the project.
HDFS Tiered Storage: Mounting Object Stores in HDFS - DataWorks Summit
Most users know HDFS as the reliable store of record for big data analytics. HDFS is also used to store transient and operational data when working with cloud object stores, such as Azure HDInsight and Amazon EMR. In these settings- but also in more traditional, on premise deployments- applications often manage data stored in multiple storage systems or clusters, requiring a complex workflow for synchronizing data between filesystems to achieve goals for durability, performance, and coordination.
Building on existing heterogeneous storage support, we add a storage tier to HDFS to work with external stores, allowing remote namespaces to be "mounted" in HDFS. This capability not only supports transparent caching of remote data as HDFS blocks, it also supports synchronous writes to remote clusters for business continuity planning (BCP) and supports hybrid cloud architectures.
This idea was presented at last year’s Summit in San Jose. Lots of progress has been made since then and the feature is in active development at the Apache Software Foundation on branch HDFS-9806, driven by Microsoft and Western Digital. We will discuss the refined design & implementation and present how end-users and admins will be able to use this powerful functionality.
The document discusses deploying Hadoop in the cloud. Some key benefits of using Hadoop in the cloud include scalability, automated failover of replicated data, and cost efficiency through distributed processing and storage. Microsoft's Azure HDInsight offering provides a fully managed Hadoop and Spark service in the cloud that allows clusters to be provisioned in minutes and is optimized for analytics workloads. The Cortana Intelligence Suite integrates big data technologies like HDInsight with machine learning and data processing tools.
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com... - DataWorks Summit
This document discusses authentication, authorization, and application integration considerations for a large company implementing a self-service and secure multitenant data lake on Hadoop. It describes three approaches to integrating the data lake with the company's existing identity and access management system and evaluates the tradeoffs of each. It also examines options for authorization controls in Hadoop, methods for applications to authenticate to the data lake, and how applications can access data impersonating user permissions. The goal is to provide analytics capabilities to users while maintaining security, compliance, and governance.
HPE Hadoop Solutions - From use cases to proposal - DataWorks Summit
Hadoop now does a lot more than storage and Map/Reduce, and it keeps improving and innovating. It brings near-real-time, interactive, and cost-efficient capabilities to Big Data.
Join us to hear about solutions based on Hadoop, how they respond to specific customer needs, which components from the Hadoop ecosystem they use, and which HPE Reference Architectures for the platform they are based on.
Hadoop solutions such as ETL offloading, predictive analytics, ad hoc query, complex event processing, stream processing, search, machine learning, deep learning, and more.
Based on software components such as Spark, Hive, HBase, Kafka, Storm, Flume, Impala, and Elasticsearch.
Speaker
John Osborn, SA, Hewlett Packard Enterprise
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics - DataWorks Summit
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics - Apache Spark's in-memory capabilities catapulted it to become the premier processing framework for Hadoop. Apache Ignite and Alluxio, both high-performance, integrated, and distributed in-memory platforms, take Apache Spark to the next level by providing an even more powerful, faster, and more scalable platform for the most demanding data processing and analytics environments.
Speaker
Irfan Elahi, Consultant, Deloitte
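A small, hedged sketch of the Alluxio side of that comparison: Spark reading a dataset through the alluxio:// scheme. The master address and path are placeholders, and the Alluxio client jar is assumed to be on the Spark classpath.

```python
# Hedged sketch: reading a dataset cached in Alluxio from Spark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("alluxio-read").getOrCreate()

# Hot data is served from Alluxio's memory tiers; the same code works against
# hdfs:// or s3a:// paths, which is what makes the comparison interesting.
df = spark.read.csv(
    "alluxio://alluxio-master.example.com:19998/datasets/trips.csv",
    header=True,
)
print(df.count())
```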
Bare-metal performance for Big Data workloads on Docker containers - BlueData, Inc.
In a benchmark study, Intel® compared the performance of Big Data workloads running on a bare-metal deployment versus running in Docker* containers with the BlueData® EPIC™ software platform.
This in-depth study shows that performance ratios for container-based Hadoop workloads on BlueData EPIC are equal to — and in some cases, better than — bare-metal Hadoop. For example, benchmark tests showed that the BlueData EPIC platform demonstrated an average 2.33% performance gain over bare metal, for a configuration with 50 Hadoop compute nodes and 10 terabytes (TB) of data. These performance results were achieved without any modifications to the Hadoop software.
This is a revolutionary milestone, and the result of an ongoing collaboration between Intel and BlueData software engineering teams.
This white paper describes the software and hardware configurations for the benchmark tests, as well as details of the performance benchmark process and results.
How to deploy Apache Spark in a multi-tenant, on-premises environment - BlueData, Inc.
Adoption of Apache Spark in the enterprise is increasing rapidly - it's become one of the fastest growing and most popular technologies in the Big Data ecosystem.
However, implementing an enterprise-ready, on-premises Spark deployment can be very complex and it requires expertise that is generally not available to all.
BlueData makes it easier to deploy Apache Spark on-premises. With BlueData, you can spin up virtual Spark clusters within minutes – providing secure, self-service, on-demand access to Big Data analytics and infrastructure. You can deploy Spark in standalone mode or with Hadoop / YARN. You can also build analytical pipelines and create Spark clusters using our RESTful APIs, and use web-based Zeppelin notebooks for interactive data analytics.
BlueData’s software platform leverages virtualization and Docker containers – combined with our own patent-pending innovations – to make it faster, and more cost-effective for enterprises to get up and running with a multi-tenant Spark deployment on-premises.
Learn more at www.bluedata.com
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes - DataWorks Summit
The document discusses scaling HDFS to manage billions of files through distributed storage schemes. It outlines the current HDFS architecture and challenges with namespace and block scaling. It proposes a storage container architecture with distributed block maps and a storage container manager to address these challenges. This would allow HDFS to easily scale to manage trillions of blocks and billions of files across large clusters.
This document discusses security features in Apache Kafka including SSL for encryption, SASL/Kerberos for authentication, authorization controls using an authorizer, and securing Zookeeper. It provides details on how these security components work, such as how SSL establishes an encrypted channel and SASL performs authentication. The authorizer implementation stores ACLs in Zookeeper and caches them for performance. Securing Zookeeper involves setting ACLs on Zookeeper nodes and migrating security configurations. Future plans include moving more functionality to the broker side and adding new authorization features.
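To make the moving parts concrete, here is a hedged producer sketch using confluent-kafka (librdkafka) with the security features described above; the broker address, CA bundle path, and Kerberos service name are placeholders for your environment.

```python
# Hedged sketch of a Kafka producer using TLS encryption plus SASL/Kerberos
# authentication; the broker's authorizer then checks ACLs for this principal.
from confluent_kafka import Producer

conf = {
    "bootstrap.servers": "broker1.example.com:9093",
    # TLS for encryption of data in flight.
    "security.protocol": "SASL_SSL",
    "ssl.ca.location": "/etc/pki/tls/certs/ca-bundle.crt",
    # Kerberos (GSSAPI) for authentication.
    "sasl.mechanism": "GSSAPI",
    "sasl.kerberos.service.name": "kafka",
}

producer = Producer(conf)
producer.produce("audit-events", key=b"user42", value=b'{"action":"login"}')
producer.flush()
```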
As Hadoop becomes a critical part of Enterprise data infrastructure, securing Hadoop has become critically important. Enterprises want assurance that all their data is protected and that only authorized users have access to the relevant bits of information. In this session we will cover all aspects of Hadoop security including authentication, authorization, audit and data protection. We will also provide demonstration and detailed instructions for implementing comprehensive Hadoop security.
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad... - DataWorks Summit
Businesses often have to interact with different data sources to get a unified view of the business or to resolve discrepancies. These EDW data repositories are often large and complex, are business critical, and cannot afford downtime. This session will share best practices and lessons learned for building a Data Fabric on Spark / Hadoop / Hive / NoSQL that provides a unified view, enables simplified access to the data repositories, resolves technical challenges, and adds business value.
Ibm spectrum scale_backup_n_archive_v03_ash - Ashutosh Mate
IBM Spectrum Scale can be used as both the source and destination for backup and archiving. As a source, Spectrum Scale data can be backed up to products like Spectrum Protect, Spectrum Archive, and third-party backup software. As a destination, Spectrum Protect can use Spectrum Scale and ESS storage for storing backed up or archived data, providing scalability, performance, and cost benefits over other solutions. Case studies demonstrate how large enterprises and regional hospital networks have consolidated backup infrastructure and improved availability, capacity, and backup/restore speeds by combining Spectrum Scale and Spectrum Protect.
From archive to insight debunking myths of analytics on object stores - Dean Hildebrand
This document summarizes four common myths about using object stores for analytics and debunks each one. It discusses that data does not need to migrate between Swift and HDFS, object stores can support frameworks beyond just in-memory analytics, they can efficiently support frameworks that require appending to files like Hive and HBase, and object stores are not inherently slow for analytics when used with Swift-on-File. The document demonstrates Swift-on-File, which allows objects stored in Swift to be accessed as files, avoiding unnecessary data movement and enabling analytics in place directly on the object store.
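A minimal, hedged sketch of the Swift-on-File idea: write an object through the Swift API, then read the same bytes back as a file. The Keystone endpoint, credentials, and the assumed fileset layout under /mnt/gpfs0/swift are placeholders.

```python
# Illustrative sketch of the "no data movement" idea behind Swift-on-File.
from swiftclient.client import Connection

conn = Connection(
    authurl="https://keystone.example.com:5000/v3",
    user="analyst",
    key="secret",
    auth_version="3",
    os_options={
        "project_name": "analytics",
        "user_domain_name": "Default",
        "project_domain_name": "Default",
    },
)

# Object PUT through the normal Swift interface.
conn.put_object("logs", "2016/04/run1.csv", contents=b"ts,value\n1,42\n")

# With Swift-on-File the object is laid out as a real file under the
# container's directory, so file-based tools (or Hadoop jobs) can read it in
# place -- the path below is an assumed example layout, not a fixed rule.
with open("/mnt/gpfs0/swift/AUTH_analytics/logs/2016/04/run1.csv") as f:
    print(f.read())
```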
The document discusses IBM's Watson IoT platform and how it can be used from device connectivity to analytics. It provides an overview of the different phases of using the platform from try/dev to managing services. It also discusses how the platform allows composing applications using tools like Node-RED and integrating various cloud services for analytics, security and more. Industry solutions and the value of IBM's IoT platform from connecting assets to optimizing operations and innovating new business models is highlighted.
Software Defined Analytics with File and Object Access Plus Geographically Di... - Trishali Nayar
Introduction to Spectrum Scale Active File Management (AFM) and its use cases. Spectrum Scale Protocols - Unified File & Object Access (UFO) feature details. AFM + Object: unique WAN caching for object store.
IBM Spectrum Scale for File and Object Storage - Tony Pearson
This document provides information about a technical university presentation on IBM Spectrum Scale for file and object storage given by Tony Pearson. The presentation schedule lists topics such as software defined storage, converged and hyperconverged environments, big data architectures, and IBM storage integration with OpenStack. The document discusses challenges of islands of block, file, and object level data and how IBM Spectrum Scale provides a single global namespace and universal data access across various protocols. It describes features of IBM Spectrum Scale such as extreme scalability, high performance, reliability, and supported topologies.
This document summarizes the top 6 advantages of using Cleversafe for object storage: scalability up to over 100PB deployed by customers, encryption providing government-grade security without risk of data breach from single disk/node/site failures, availability with no downtime during upgrades or hardware/disk/node/site failures, manageability needing no RAID or replication management even with petabytes of storage, efficiency using less storage, power and space for the lowest total cost of ownership, and reliability of 9 nines with on-premises, hybrid and public cloud options available from day one. It also notes that Cleversafe is the #1 ranked solution for unstructured data in Gartner reports.
Ibm spectrum scale fundamentals workshop for americas part 8 spectrumscale ba... - xKinAnx
The document provides an overview of key concepts covered in a GPFS 4.1 system administration course, including backups using mmbackup, SOBAR integration, snapshots, quotas, clones, and extended attributes. The document includes examples of commands and procedures for administering these GPFS functions.
IBM Spectrum Storage Family is a suite of storage management and optimization software including IBM Spectrum Control, IBM Spectrum Protect, IBM Spectrum Archive, IBM Spectrum Virtualize, IBM Spectrum Accelerate, and IBM Spectrum Scale. These solutions provide analytics-driven hybrid cloud data management, optimized hybrid cloud data protection, fast data retention, virtualization of mixed block environments, enterprise block storage for hybrid cloud, high-performance scalable storage for unstructured data, and flexible scalable hybrid cloud object storage.
IBM Object Storage and Software Defined Solutions - Cleversafe - Diego Alberto Tamayo
Digital Content Growth
• Continued growth in graphical content creation
• Multi-device and HD/4K/8K make it even more challenging to store and process data
• Time & location shifting: viewing on individual schedule
• Content life-cycle management provides a balance between cost & performance while maintaining customer experience
• Metadata availability and access
• Digital disruption with Over The Top (OTT) digital-only competitors - new business models
• OTT viewing will grow from 3.4% of TV viewing hours in 2013 to 20.4% by 2017 in NA
• 63% stream on-demand media more than weekly
• Security and data protection are big issues
• File sharing declines with legal on-demand options
• Connection speeds are increasing, making new options possible
IBM Spectrum Scale provides unified file and object access, allowing data to be ingested and stored as either files or objects and accessed via both file and object interfaces. Key capabilities include a single global namespace for files and objects, automatic placement of data on optimal storage tiers, ability to analyze data in place without copying or moving data, and support for both legacy file applications and new object-based workloads and data stores.
IBM Streams V4.2 Submission Time Fusion and Configuration - lisanl
Brad Fawcett, Queenie Ma, and Mary Komor are developers with IBM Streams. In their presentation, they cover the new Submission Time Fusion and Configuration support available in IBM Streams V4.2.
IBM Cloud Object Storage provides flexible, scalable, and simple storage designed for today's data challenges. It offers hybrid cloud storage options that can be deployed both on-premise and off-premise. Key benefits include lower total cost of ownership compared to traditional storage, massive scalability across IBM's global network, and unified management. IBM Cloud Object Storage is used by organizations across industries for various use cases including backup, archive, content management, and more.
Non-Blocking Checkpointing for Consistent Regions in IBM Streams V4.2 - lisanl
Fang Zheng is a developer with IBM Streams. In his presentation, Fang describes the enhancements related to consistent regions that are available in IBM Streams V4.2.
This document provides an introduction and disclaimer for an IBM Streams presentation. It notes that the information is provided as-is without warranty and is subject to change. It directs the reader to several IBM Streams resources including the Streams developer website, GitHub organization, tutorials, an online SPL course, and a water conservation starter kit. Contact information is provided for any questions about the presentation.
Introduction to IBM Spectrum Scale and Its Use in Life Science - Sandeep Patil
IBM Spectrum Scale is a scalable file system that can be used to support life science research. It provides high scalability, high availability, and a software read cache called Local Read Only Cache (LROC) that uses SSDs to improve performance. The University of Basel uses Spectrum Scale in their scientific computing and storage infrastructure to support various research areas including bioinformatics, structural biology, and hosting reference services. It provides features such as cluster file systems, data migration, hierarchical storage management, encryption, and disaster recovery between two sites using asynchronous file migration.
Installation and Setup for IBM InfoSphere Streams V4.0 - lisanl
Laurie Williams is the Installation component lead on the InfoSphere Streams developement team. Her presentation describes the installation and setup of IBM InfoSphere Streams V4.0 in a multi-host environment.
View related presentations and recordings from the Streams V4.0 Developers Conference at:
https://developer.ibm.com/answers/questions/183353/ibm-infosphere-streams-40-developers-conference-on.html?smartspace=streamsdev
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program - inside-BigData.com
In this video from the DDN User Group at SC16, Sven Oehme Chief Research Strategist, IBM, presents "Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program."
Watch the video presentation: http://wp.me/p3RLHQ-g52
Sign up for our insideHPC Newsletter: http://wp.me/p3RLHQ-g52
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5 - Doug O'Flaherty
The document discusses IBM Spectrum Scale, a software-defined storage product. It provides a unified file and object storage system with integrated analytics support. New features in Spectrum Scale 4.2 and Elastic Storage Server 3.5 include reducing costs through compression and quality-of-service policies, accelerating analytics with native HDFS support, and simplifying deployment with new graphical user interfaces.
This document provides an overview of object storage. It defines object storage and when it is applicable, such as for unstructured data workloads over 100TB, distributed access to content, data archiving, and non-high performance applications. It describes how object storage uses metadata and a flat organization without directories. Examples of use cases like media storage, content stores, data analytics, private clouds, and backup/archive are listed. Characteristics of object storage systems and how they are built from clusters of storage nodes are also covered.
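To ground the definition, here is a hedged Python sketch of the basic object-storage access pattern (a flat bucket/key namespace plus user metadata) against any S3-compatible endpoint; the endpoint, credentials, bucket, and key are placeholders.

```python
# Minimal, hedged sketch of object-storage access: no directories, just keys
# in a flat namespace, with user-defined metadata attached to each object.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objects.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# "media/2017/clip.mp4" is simply a key; the slashes are part of the name.
s3.put_object(
    Bucket="archive",
    Key="media/2017/clip.mp4",
    Body=b"example payload",
    Metadata={"camera": "cam-07", "retention": "7y"},  # user-defined metadata
)

obj = s3.head_object(Bucket="archive", Key="media/2017/clip.mp4")
print(obj["Metadata"])
```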
BIOIT14: Deploying very low cost cloud storage technology in a traditional re... - Dirk Petersen
When implementing storage chargebacks, we wanted to offer researchers an alternative storage solution that would not cost more than AWS Glacier. We also wanted it to be long-term durable, self-protecting, easy to manage, able to store petabytes, survive the loss of an entire data center, and deliver predictable performance. Learn how to avoid pitfalls and determine whether a solution like this makes sense for your organization.
Elastic storage in the cloud session 5224 final v2 - BradDesAulniers2
IBM Spectrum Scale (formerly Elastic Storage) provides software defined storage capabilities using standard commodity hardware. It delivers automated, policy-driven storage services through orchestration of the underlying storage infrastructure. Key features include massive scalability up to a yottabyte in size, built-in high availability, data integrity, and the ability to non-disruptively add or remove storage resources. The software provides a single global namespace, inline and offline data tiering, and integration with applications like HDFS to enable analytics on existing storage infrastructure.
SoftLayer Storage Services Overview (for Interop Las Vegas 2015) - Michael Fork
Introduction to SoftLayer's Storage Services. Topics covered include Block and File offerings (Endurance, Performance), Mass Storage Servers (QuantaStor), Backup (EVault, R1Soft), Object Storage (OpenStack Swift), CDN, Data Transfer Service, and Aspera.
IBM Cloud Object Storage System (powered by Cleversafe) and its Applications - Tony Pearson
This document discusses IBM's Cloud Object Storage System, which was acquired from Cleversafe. It provides a scalable and cost-effective solution for storing large amounts of unstructured data, such as photos, videos, and research files. The system uses erasure coding to split data into slices and distribute them across commodity hardware for high availability and reliability without proprietary equipment. It can be deployed as software, pre-built appliances, or as a cloud service on IBM SoftLayer infrastructure.
In this session, we’ll focus exclusively on OpenStack Swift, OpenStack’s object store capability. We’ll review the architecture, use cases, deployment strategies and common obstacles as we “open up the covers” on this exciting element of the OpenStack architecture.
Design - Building a Foundation for Hybrid Cloud Storage - LaurenWendler
Building a foundation for hybrid cloud storage involves using object storage as a key building block and software defined storage to provide compatible infrastructure across on-premises and cloud environments. This allows for workflow mobility across locations by overcoming data gravity, seamless operations, strong data security, and flexibility in infrastructure choice. Object storage is designed to handle the growing amounts of unstructured data and needs of web-scale applications, while hybrid cloud storage integrates on-premises and cloud storage through methods like using cloud storage as remote storage, replicating and mirroring data between locations, and providing consistent APIs and workload management across environments.
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Cloudian
This document discusses implementing Hadoop and Elastic MapReduce on Cloudian's scale-out object storage platform. It describes Cloudian's hybrid cloud storage capabilities and how their approach reduces costs and provides faster analytics by analyzing log and event data directly on their storage platform without needing to transform the data for HDFS. Key benefits highlighted include no redundant storage, scaling analytics with storage capacity by adding nodes, and taking advantage of multi-core CPUs for MapReduce tasks.
Webinar: What Your Object Storage Vendor Isn’t Telling You About NFS SupportStorage Switzerland
NFS has been the “go to” file system for large data stores but there is a new offering on the horizon…Object Storage. To help ease the transition, many Object Storage vendors have provided a gateway that allows their systems to look like NFS servers. The problem is that most of these implementations are very limiting and often create more problems than they fix. In this webinar Storage Switzerland and Caringo discuss why object storage systems are the heir apparent to NFS servers and how to make that transition without the typical roadblocks that NFS gateways create.
Attend this webinar to learn:
* What is Object Storage
* Object Storage vs. NFS
* Challenges of Object Storage without NFS
* How NFS makes Object Storage more useful
* Architecture of a typical NFS gateway
* The Challenges of Hosting NFS on an Object Store
Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...Amazon Web Services
AWS gives designers of enterprise storage systems a completely new set of options. Aimed at enterprise storage specialists and managers of cloud-integration teams, this session gives you the tools and perspective to confidently integrate your storage workloads with AWS. We show working use cases, a thorough TCO model, and detailed customer blueprints. Throughout we analyze how data-tiering options measure up to the design criteria that matter most: performance, efficiency, cost, security, and integration.
#MFSummit2016 Operate: The race for spaceMicro Focus
The Race for Space: File Storage Challenges and Solutions Facing escalating storage requirements? Being held to ransom by your vendors? Would secure, scalable, highly-available and cost-effective file storage that works with your current infrastructure help? Micro Focus and SUSE could help. Presenters: David Shepherd, Solutions Consultant, Micro Focus and Stephen Mogg, Solutions Consultant SUSE
Se training storage grid webscale technical overviewsolarisyougood
The document provides an overview of StorageGRID Webscale, an object storage solution from NetApp. It discusses key concepts including how StorageGRID Webscale uses a distributed architecture with different node types to provide a global object namespace and scale to support billions of objects and petabytes of storage. The document also describes how StorageGRID Webscale leverages extensive metadata and policy-driven management to intelligently distribute and tier data across storage pools.
Introduction to types of cloud storage and overview and comparison of the SoftLayer Storage Services. Topics covered include Block and File offerings "Codename: Prime", Consistent Performance, Mass Storage Servers (QuantaStor), Backup (EVault, R1Soft), Object Storage (OpenStack Swift), CDN, Data Transfer Service, and Aspera.
Big Data Architecture Workshop - Vahid Amiridatastack
Big Data Architecture Workshop
This slide deck covers big data tools, technologies, and layers that can be used in enterprise solutions.
TopHPC Conference
2019
IBM Cloud Object Storage System, presented Oct 16, 2017 at IBM Systems Technical University in New Orleans, LA. This covers IBM's object storage offering, which came from the acquisition of Cleversafe and was formerly known as the DSnet product.
John Readey presented on HDF5 in the cloud using HDFCloud. HDF5 can provide a cost-effective cloud infrastructure by paying for what is used rather than what may be needed. HDFCloud uses an HDF5 server to enable accessing HDF5 data through a REST API, allowing users to access large datasets without downloading entire files. It maps HDF5 objects to cloud object storage for scalable performance and uses Docker containers for elastic scaling.
Cloud computing UNIT 2.1 presentation inRahulBhole12
Cloud storage allows users to store files online through cloud storage providers like Apple iCloud, Dropbox, Google Drive, Amazon Cloud Drive, and Microsoft SkyDrive. These providers offer various amounts of free storage and options to purchase additional storage. They allow files to be securely uploaded, accessed, and synced across devices. The best cloud storage provider depends on individual needs and preferences regarding storage space requirements and features offered.
NetApp Se training storage grid webscale technical overviewsolarisyougood
The document provides an overview of StorageGRID Webscale, an object storage platform from NetApp. It discusses key concepts such as object storage, metadata management, and StorageGRID's dynamic policy engine. The policy engine uses metadata and user-defined rules to intelligently place and manage objects across multiple sites, storage tiers, and protocols (e.g. S3) over their lifecycle. This allows building complex data management policies without impacting performance or capacity.
Storage Made Easy - File Fabric Use CasesHybrid Cloud
The File Fabric provides a multi-cloud solution for on-site and on-cloud data and can be used for solutions as diverse as data governance and compliance through to Big Data / Object Storage use cases.
2. #ibmedge
Agenda
• Introduction to Spectrum Scale
• Introduction to Spectrum Scale Analytics
• Introduction to Spectrum Scale Object Store
• Unified File & Object Access (UFO) Feature Details
• Use Cases Enabled By UFO
• Deep Dive of In-Place Analytics Use Case
• Demo
• Q & A
1
16. #ibmedge
GPFS-FPO Advanced Storage for Map Reduce Data
15
Hadoop HDFS:
• NameNode is a single point of failure
• Large block sizes – poor support for small files
• Non-POSIX file system – obscure commands
• Difficult to ingest data – special tools required
• Single-purpose, Hadoop MapReduce only
• Not recommended for critical data
IBM GPFS advantages:
• No single point of failure, distributed metadata
• Variable block sizes – suited to multiple types of data and data access patterns
• POSIX file system – easy to use and manage
• Policy-based data ingest
• Versatile, multi-purpose
• Enterprise-class advanced storage features
17. #ibmedge
Use Case: Big Data Analytics
• Problem: Separate storage systems for ingest/distribution and analysis
– Data movement overhead is a significant part of my time to insight
– Increased cost from data duplication & overhead
– Inconsistent results
• Solution: Native HDFS support
– Decreased time to results
– Run Map/Reduce directly – no waiting for data transfer between storage systems
– Immediately share results
16
[Diagram: Spectrum Scale exposing File/Object and File/HDFS interfaces for global ingest and distribution, business analytics, custom applications, and packaged applications]
19. #ibmedge
IBM Spectrum Scale
• Avoid vendor lock-in with true Software
Defined Storage and Open Standards
• Seamless performance & capacity scaling
• Automate data management at scale
• Enable global collaboration
Data management at scale: OpenStack and Spectrum Scale help clients manage data at scale.
Client needs:
• Business: I need virtually unlimited storage
• Operations: I need a flexible infrastructure that supports both object and file based storage
• Operations: I need to minimize the time it takes to perform common storage management tasks
• Collaboration: I need to share data between people, departments and sites with low latency
How Spectrum Scale answers them:
• A single data plane that supports Cinder, Glance, Swift, and Manila as well as NFS, et al.
• A fully automated, policy-based data placement and migration tool
• An open and scalable cloud platform
• Sharing with a variety of WAN caching modes
Results:
• Converge file and object based storage under one roof
• Employ enterprise features to protect data, e.g. Snapshots, Backup, and Disaster Recovery
• Support native file, block, and object access to data
[Diagram: Spectrum Scale as a single data plane exposing NFS, SMB, POSIX, Swift, HDFS, Cinder, Glance, and Manila over SSD, fast disk, slow disk, and tape tiers, with cognitive services on top]
18
20. #ibmedge
Spectrum Scale Object Storage
• Basic support added in 4.1.1 release & enhanced in 4.2 and 4.2.1 release
• Based on OpenStack Swift (Juno release)
• REST-based data access
• Growing number of clients due to extremely simple protocol
• Applications can easily save & access data from anywhere using HTTP
• Simple set of atomic operations:
– PUT (upload)
– POST (update metadata)
– GET (download)
– DELETE
• Amazon S3 Protocol support
• High Availability with CES Integration
• Simple and Automated Installation Process
• Integrated authentication (Keystone) support
• Native GPFS Command Line Interface to manage Object service (mmobj command)
19
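To make the atomic operations above concrete, here is a minimal sketch using curl against a Swift endpoint; the endpoint URL, account, container, and object names are illustrative placeholders (not taken from this deck), and the token comes from the integrated Keystone service.
# Obtain a Keystone token and address the Swift proxy (endpoint and account are examples only).
TOKEN=$(openstack token issue -f value -c id)
URL=https://scale-ces.example.com:8080/v1/AUTH_acct
curl -i -X PUT    -H "X-Auth-Token: $TOKEN" -T report.csv $URL/cont/report.csv   # upload
curl -i -X POST   -H "X-Auth-Token: $TOKEN" -H "X-Object-Meta-Team: analytics" \
     $URL/cont/report.csv                                                        # update metadata
curl -s -X GET    -H "X-Auth-Token: $TOKEN" $URL/cont/report.csv -o report.csv   # download
curl -i -X DELETE -H "X-Auth-Token: $TOKEN" $URL/cont/report.csv                 # delete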
21. #ibmedge
Spectrum Scale Object Store – Additional Features
• Unified file and object support with Hadoop connectors
• Support for Encryption
• Support for Compression
• Only Object Store with Tape support for Backup
• Object store with integrated transparent cloud tiering Support
• Multi Region support
• AD/LDAP support for authentication
• ILM support for Object
• Movement of objects across storage tiers based on access heat
• Spectrum Scale Object with IBM DeepFlash provides an all-flash object store for newer, faster workloads
• Spectrum Scale Object with WAN caching support (AFM)
20
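As an illustration of the ILM and heat-based movement bullets above, the sketch below shows what a Spectrum Scale policy rule for cooling object data could look like; the pool names, threshold, and file name are assumptions for illustration only, and file heat tracking must already be enabled on the cluster.
# Hedged sketch: migrate object data that has cooled off to a capacity pool,
# then run a migration pass (pool names and threshold are illustrative).
cat > /tmp/object_heat.pol <<'EOF'
/* move cold object files from the fast pool to the capacity pool */
RULE 'cool_objects' MIGRATE FROM POOL 'system' TO POOL 'capacity'
  WHERE FILE_HEAT < 0.1
EOF
mmapplypolicy gpfs0 -P /tmp/object_heat.pol -I yes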
23. #ibmedge
The right solution for the workload
22
Spectrum Scale
Ideal Workloads:
• Big Data Analytics
• High Performance Computing, e.g. Engineering Applications
• Performance optimized Backup and Restore
• Multi-site file collaboration
• Multi-tier File Sync and Share
• Cold data archive with lowest cost data storage tier
Differentiation:
• Designed for high performance
• Unified Storage Infrastructure: Native File, Object & Hadoop
• Robust tiering with policy based data placement and data movement
• Multi-site collaboration with advanced routing and caching
• Enterprise features, e.g. Encryption, Compression, QoS, & Disaster Recovery
IBM Cloud Object Store (Cleversafe)
Ideal Workloads:
• Active Archive (warm data, mostly static)
• Cost optimized cloud backup target
• Web app content
• Remote office storage consolidation
• Storage as a service
Differentiation:
• Designed for easy deployment and management at scale
• Always-on architecture
• Geo-dispersed erasure coding for site fault tolerance and DR
• Simple keyless native encryption and multi-tenant security
• Reduced cost and complexity
25. #ibmedge
Unified File & Object (UFO) Support
• Challenge
• The world has not converged on file/object/HDFS today!
• …and it never will be completely
• Unified Scale-out Content Repository
• File or object in. Object or file out.
• Integrated big data analytics support
• Native protocol support
• High-performance that scales
• Single Management Plane
24
[Diagram: Spectrum Scale exposing NFS, SMB, POSIX, Swift/S3, and HDFS over SSD, fast disk, slow disk, and tape tiers]
Spectrum Scale: Redefining Unified Storage
26. #ibmedge
Spectrum Scale Unified File & Object
• Access same content both as a File & as an Object without making a copy or needing File or Object
Gateways!
• File-In-Object-Out and Object-In-File-Out Support
• Support for File Access Protocols (NFS/SMB/POSIX) and Object Access Protocols (Swift/S3)
• Objects ingested into designated Unified Container available as Files and Files ingested into it available as
Objects.
• Support for File & Object ACLs with Unified Mode ID Mapping
25
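To see the "object in, file out" behavior above in action, a minimal sketch follows; the container name and on-disk path components are illustrative placeholders, and the exact path depends on the storage policy, device, and account ID, as shown in the filesystem layout slide later in this deck.
# Ingest as an object, then read the very same data as a file with no copy.
swift upload cont a.jpg
ls -l /ibm/gpfs0/<Sof_policy_fileset>/<device>/AUTH_<acctID>/cont/a.jpg    # visible immediately
md5sum /ibm/gpfs0/<Sof_policy_fileset>/<device>/AUTH_<acctID>/cont/a.jpg   # same bytes, same place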
28. #ibmedge
What is Unified File and Object Access ?
• Accessing object using file interfaces
(SMB/NFS/POSIX) and accessing file using object
interfaces (REST) helps legacy applications
designed for file to seamlessly start integrating into the
object world.
• It allows object data to be accessed using
applications designed to process files. It allows file
data to be published as objects.
• Multi protocol access for file and object in the same
namespace (with common User ID management
capability) allows supporting and hosting data oceans
of different types of data with multiple access options.
• Optimizes various use cases and solution architectures
resulting in better efficiency as well as cost savings.
27
[Diagram: a clustered file system running Swift (with Swift on File). (1) Data is ingested as objects over HTTP into a container; (2) those objects are accessed as files through file exports created at the container level or through POSIX access from the container level; (3) data ingested as files into the same container is (4) accessed as objects.]
29. #ibmedge
Flexible Identity Management Modes
• Supports two identity management modes
• Administrators can choose the mode based on their need and use case using the CLI:
28
#mmobj config change --ccrfile object-server-sof.conf --section DEFAULT --property id_mgmt --value unified_mode | local_mode
Identity Management Modes
Local_Mode:
• Objects created through the object interface are owned by the internal "swift" user
• An application processing the object data from the file interface needs the required file ACL to access the data
• Object authentication setup is independent of the file authentication setup
• Suitable when the auth schemes for file and object are different and unified access is for applications
Unified_Mode:
• Objects created through the object interface are owned by the user doing the object PUT (i.e. the file is owned by that user's UID/GID)
• Object and file users are expected to use common auth and come from the same directory service (only AD + RFC 2307, or LDAP)
• The owner of the object owns and has access to the data from the file interface
• Suitable for unified file and object access for end users; leverage common ILM policies for file and object data based on data ownership
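For reference, the unified-mode form of the command above is shown below, together with a query of the current setting; the mmobj config list invocation is an assumption about the CLI and should be checked against your release's documentation.
# Set unified_mode (command taken from the slide above):
mmobj config change --ccrfile object-server-sof.conf --section DEFAULT \
      --property id_mgmt --value unified_mode
# Assumed companion query of the active setting (verify against the documentation):
mmobj config list --ccrfile object-server-sof.conf --section DEFAULT --property id_mgmt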
31. #ibmedge
Use case 1 – Enabling “In-Place” analytics for Object
data repository with analytic results available as objects
30
[Diagram: data ingested as objects over HTTP into <SOF_Fileset>/<Device> on the clustered file system is analyzed in place by Spark or Hadoop MapReduce, with results returned in place and published as objects. Image source: https://aws.amazon.com/elasticmapreduce/]
Analytics on a traditional object store: data has to be copied from the object store to a dedicated analytics cluster, analyzed there, and the results copied back to the object store for publishing – explicit data movement.
Analytics with Unified File and Object Access: object data is available as files on the same fileset, so analytics systems like Hadoop MapReduce or Spark can work on the data directly. No data movement, i.e. in-place, immediate data analytics, with results published as objects with no data movement.
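A hedged sketch of this in-place flow, assuming a unified container named analytics and a placeholder Spark job; the script name and the fileset/device path components are illustrative, and the HDFS transparency connector could be used instead of the file:// URIs.
# 1. Ingest data once, as an object.
swift upload analytics events.csv
# 2. Analyze the same bytes in place from the file view (no copy into a separate HDFS):
spark-submit wordcount.py \
  file:///ibm/gpfs0/<Sof_policy_fileset>/<device>/AUTH_<acctID>/analytics/events.csv \
  file:///ibm/gpfs0/<Sof_policy_fileset>/<device>/AUTH_<acctID>/analytics/result/
# 3. After the result files are objectized, publish or fetch them as objects:
swift download analytics --prefix result/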
32. #ibmedge
Use case 2 : Process Object Data with File-Oriented
Applications and Publish Outcomes as Objects
31
[Diagram: a media house runs an OpenStack cloud platform with its subsidiaries as tenants. Media objects are ingested into Swift on File containers (Container 1 and Container 2). Manila shares expose an NFS export on Container 1 only to Subsidiary 1 and an NFS export on Container 2 only to Subsidiary 2, so each subsidiary's VM farm processes the raw media content over files (object-to-file access). The processed videos are written through an NFS export on Container 1' and converted into objects for publishing (file-to-object access), making the final videos available as objects for streaming through the publishing channels.]
33. #ibmedge
Use case 3 : Users read/write data via File and Object
with Common User Authentication and Identity
32
[Diagram: users John and Riya access common data in the clustered file system over NFS, SMB, and Object using the same user credentials across all protocols, authenticated against the corporate user directory (Active Directory/LDAP). Riya's data read or written from the object interface is owned by Riya (User: Riya, UID: 1001, GID: 2000, Domain: XYZ) when accessed from file (SMB/NFS/POSIX).]
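A small sketch of how the ownership behavior in the figure can be observed under unified_mode; the credentials, container, and path components are placeholders, with only the user names and IDs taken from the figure.
# Riya uploads an object with her own credentials...
swift --os-username riya --os-password '<password>' --os-project-name XYZ \
      upload shared report.docx
# ...and the resulting file is owned by Riya (UID 1001, GID 2000) on the file side:
ls -ln /ibm/gpfs0/<Sof_policy_fileset>/<device>/AUTH_<acctID>/shared/report.docx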
37. #ibmedge
Setup Details
36
[Diagram: IBM BigInsights with Spectrum Scale demo setup. A three-node Spectrum Scale cluster – viknode1 (admin, quorum, NSD), viknode2 (quorum, NSD, CES node), viknode3 (quorum, CES node) – with disks /dev/dm-2 and /dev/dm-3, plus an Ambari server managing the IBM BigInsights stack (YARN, Spark, Hive, Oozie, Slider, Knox).]
38. #ibmedge
Prerequisites For Demo
37
Set up a Spectrum Scale cluster with the NFS, SMB and Object protocols enabled
Set up the same authentication for File and Object
Enable unified access mode
Enable file access capabilities
Create a swift storage policy with File access enabled
Install BigInsights
Start Ambari server
Configure Cyberduck Client to access object store
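A condensed sketch of the object-side prerequisites, using only commands that appear elsewhere in this deck; authentication setup, protocol enablement, and the BigInsights/Ambari installation follow their own procedures and are not reproduced here.
# Enable unified identity management mode for the object service:
mmobj config change --ccrfile object-server-sof.conf --section DEFAULT \
      --property id_mgmt --value unified_mode
# Create a Swift storage policy with file access enabled (used by the demo container):
mmobj policy create SwiftOnFileFS --enable-file-access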
41. #ibmedge
Spectrum Scale User Group
• The Spectrum Scale User Group is free to join and open to anyone using, interested in using, or integrating Spectrum Scale.
• Join the User Group activities to meet
your peers and get access to experts
from partners and IBM.
• Next meetings:
- APAC: October 14, Melbourne
- Global at SC16 : November 13 1pm to 5pm, Salt Lake City
• Web page: http://www.spectrumscale.org/
• Presentations: http://www.spectrumscale.org/presentations/
• Mailing list: http://www.spectrumscale.org/join/
• Contact: http://www.spectrumscale.org/committee/
• Meet Bob Oesterlin (US Co-Principal) at Edge2016: Robert.Oesterlin@nuance.com
42. #ibmedge
Session : Futures of IBM Spectrum Scale
NDA & Customers ONLY
• Who: IBM Spectrum Scale Offering Management
• Carl Zetie, Ron Riffe
• When: Tuesday, September 20, 2016
• 1pm to 2pm
• Where: MGM Grand, Signature Tower 3
• Meeting Room D
• Contact (if any questions)
• douglasof@us.ibm.com, cmukhya@us.ibm.com
41
43. #ibmedge
Session : How to apply Flash benefits to big data
analytics and unstructured data
NDA & Customers ONLY
• Who: IBM Elastic Storage Server Offering Management
• Alex Chen
• When: Thursday, September 22, 2016
• 1:15pm to 2:15pm
• Where: Grand Garden Arena, Lower Level, MGM, Studio 10
• Contact (if any questions)
• cmukhya@us.ibm.com, douglasof@us.ibm.com
42
44. #ibmedge
Trial VM
• Download the IBM Spectrum Scale Trial VM from: http://www-03.ibm.com/systems/storage/spectrum/scale/trial.html
43
45. #ibmedge
References
Write a File, read as an Object: OpenStack Summit, Austin, TX, Apr 2016
https://www.youtube.com/watch?v=6ovLb6aktbM&feature=youtu.be&t=2
Amalgamating Manila and Swift for Unified Data Sharing: OpenStack Summit, Austin, TX, Apr 2016
https://www.youtube.com/watch?v=3MMrMUaA_Mg
Hadoop HDFS Vs Spectrum Scale: https://www.youtube.com/watch?v=kOeEbdO8F4A
From Archive to Insight: Debunking Myths of Analytics on Object Stores – Dean Hildebrand, Bill Owen,
Simon Lorenz, Luis Pabon, Rui Zhang. Vancouver Summit, Spring 2015.
https://www.youtube.com/watch?v=brhEUptD3JQ
Deploying Swift on a File System – Bill Owen, Thiago Da Silva. BrownBag at OpenStack Paris, Fall 2014
https://www.youtube.com/watch?v=vPn2uZF4yWo
Breaking the Mold with OpenStack Swift and GlusterFS – Jon Dickinson, Luis Pabo. Atlanta Summit, Spring 2014
https://www.youtube.com/watch?v=pSWdzjA8WuA
SNIA SDC 2015
http://www.snia.org/sites/default/files/SDC15_presentations/security/DeanHildebrand_Sasi__OpenStack%20SwiftOnFile.pdf
Spectrum Scale Infocenter
http://www.ibm.com/support/knowledgecenter/#!/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_manageunifiedaccess.htm
44
46. #ibmedge
OpenStack Summit 2016: IBM Spectrum Scale in an
OpenStack Environment Redpaper Published.
45
http://www.redbooks.ibm.com/abstracts/redp5331.html
48. #ibmedge
IBM Spectrum Scale - Unified File and Object Access
Feature Overview
• Multi protocol access for file and object in the same namespace
• Access object as file from POSIX, NFS and SMB
• Access file as object
– Provision to convert files to object automatically via background service called ‘objectizer’
– Provision to explicitly and immediately convert files to objects using CLI
• The feature is specifically made available as an "object storage policy"
• Allows it to coexist with traditional object and other policies
• Create multiple unified file and object access policies
• Since policies are applied per container, end users have the flexibility to create certain containers with the Unified File and Object Access policy and certain ones without it.
Flexible Identity Management Mode Support
• Local Mode: Suitable when auth schemes for file and object are different and unified access is for applications
• Object created by Object interface will be owned by internal “swift” user
• Unified Mode: Suitable for unified file and object access by end users. Leverage common ILM policies for file and object data based on data
ownership.
• Object created from Object interface should be owned by the user doing the Object PUT (i.e. FILE will be owned by UID/GID of the
user)
• Ability to run in-place analytics of object data using Spectrum Scale Hadoop connectors via POSIX interface.
47
49. #ibmedge
Filesystem Layout (Traditional Vs Unified File and Object
Access)
• One of the key advantages of unified file and object access is the placement and naming of objects when stored on the file system. Unified file and object access stores objects following the same path hierarchy as the object's URL.
• In contrast, the default object implementation stores the object following the mapping given by the ring, and its final file path cannot easily be determined by the user.
48
Ingest object URL: https://swift.example.com/v1/acct/cont/a.jpg
Traditional Swift on-disk path: ibm/gpfs0/object_fileset/o/z1device108/objects/7551/125/75fc66179f12dc513580a239e92c3125/a.jpg
Unified File and Object Access on-disk path: ibm/gpfs0/<Sof_policy_fileset>/<device>/AUTH_acctID/cont/a.jpg
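The mapping on the unified-access side is simple enough to compute by hand; the tiny sketch below derives the file path from the ingest URL above, with the fileset and device components left as placeholders since they depend on the storage policy configuration.
# Derive the unified file and object access path from an object URL (illustrative).
url="https://swift.example.com/v1/acct/cont/a.jpg"
acct=$(echo "$url" | cut -d/ -f5)      # account component, e.g. acct
rest=$(echo "$url" | cut -d/ -f6-)     # container/object, e.g. cont/a.jpg
echo "/ibm/gpfs0/<Sof_policy_fileset>/<device>/AUTH_${acct}/${rest}"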
50. #ibmedge
Easy Access Of Objects as Files via supported File
Interfaces (NFS/SMB/POSIX)
• Objects ingested are available immediately for File access via the 3 supported file protocols.
• ID management modes (explained later) gives flexibility of assigning/retaining of owners, generally required by file protocols.
• Object authorization semantics are used during object access and file authorization semantics are used during file access of
the same data – thus ensuring compatibility of object and file applications
49
[Diagram: (1) data ingested as objects over HTTP into <SOF_Fileset>/<Device>/<AUTH_account_ID>/<Container> on the Spectrum Scale filesystem is (2) accessed as files, with file exports created at the container level or POSIX access from the container level.]
51. #ibmedge
Objectization – Making Files as Objects (Accessing File
via Object interface)
• Spectrum Scale 4.2 ships with a system service called ibmobjectizer that is responsible for objectization.
• Objectization is the process that makes files ingested through the file interface into a unified file and object access enabled container path available from the object interface.
• When new files are added from the file interface, they need to become visible to the Swift database so that container listings and container or account statistics are correct.
50
[Diagram: (1) data ingested as files via NFS/SMB/POSIX into the unified file and object fileset on the Spectrum Scale filesystem is (2) converted by the ibmobjectizer service (objectization) and (3) the files are then accessed as objects over HTTP.]
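A minimal sketch of the file-in / object-out direction described above, assuming a unified container named cont whose directory is reachable over POSIX or an NFS mount; the path components are placeholders, and the object appears only after the ibmobjectizer interval has elapsed.
# Drop a file into the container's directory from the file side...
cp nightly_report.pdf /ibm/gpfs0/<Sof_policy_fileset>/<device>/AUTH_<acctID>/cont/
# ...and once the ibmobjectizer service has run, it is listed and readable as an object:
swift list cont
swift stat cont nightly_report.pdf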
52. #ibmedge
Unified File and Object Access – Policy Integration for
Flexibility
• This feature is specifically made available as an “object storage policy” as it gives the following
advantages:
• Flexibility for the administrator to manage unified file and object access separately
• Allows it to coexist with traditional object and other policies
• Create multiple unified file and object access policies, which can vary based on the underlying storage
• Since policies are applied per container, end users have the flexibility to create certain containers with the Unified File and Object Access policy and certain ones without it.
• Example: mmobj policy create SwiftOnFileFS --enable-file-access
51
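Once such a policy exists, a container can be bound to it at creation time using the standard Swift X-Storage-Policy header; the container name below is illustrative.
# Create a container that uses the SwiftOnFileFS policy from the example above:
swift post unified_cont --header "X-Storage-Policy: SwiftOnFileFS"
swift stat unified_cont   # shows the storage policy assigned to the container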
54. #ibmedge
Notices and Disclaimers Con’t.
53
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not
tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the
ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT
NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual
property right.
IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®,
FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG,
Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®,
PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®,
StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business
Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.