Transcending the Desktop Metaphor:
Building an object-store-based hybrid cloud collaboration and analysis infrastructure in less than one hour.
John Martin
© 2017 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---1
THETA Deep Dive Session @ 3:45pm
Plenary 2
Future Ready and the New Reality
Transcending the Desktop Metaphor : Impacts and Opportunities of AI, Cloud and IoT in Research Data Management
All universities and research organisations in Australia have a well-documented set of research data management policies, not only because they help them comply with legislative and community expectations, but also because these policies facilitate the application processes under the National Competitive Grants Program. These plans, however, are often framed and limited by technology assumptions and powerful metaphors for managing unstructured data that originated in the late 1960s and early 1970s, creating an invisible straitjacket that conceptually constrains thinking about how data can and should be managed.
Some of these invisible constraints include:
 Data is created by people in a fixed workplace who work primarily with paper documents, manila folders, and filing cabinets
 This data will be consumed primarily by other humans
 Each piece of data has a unique idiosyncratic name which is given to it by its creator, generally without a broadly agreed naming standard
 Data has a physical location in one place, and while copies can be made to other storage locations, there is no inherent link between these copies
 Information about the data in the file is inferred from the file name and its containing folder.
 Other forms of metadata, such as security permissions are generally minimal, often hidden, difficult to administer, and may serve as a malware
vector.
None of these constraints holds true where the majority of data is created by machines, for machines, and where the possibility of deriving value from globally aggregated, diverse and unstructured data is driving the world’s most influential organisations.
This presentation highlights some areas where recent and anticipated advances in machine learning, IoT, data management and multi-cloud architectures transcend the metaphors and technologies of the late 20th century, and can provide new opportunities for research organisations to create significant new value. It will also outline where existing policy frameworks may require clarification or augmentation to ensure that taking advantage of these new opportunities meets the expectations of the research and broader communities around security, probity, and ethical standards.
Deep Dive Session
Transcending the Desktop Metaphor : Building a hybrid cloud collaboration and analysis infrastructure in less than one hour.
Leveraging the innovation and scale of public cloud providers using hybrid-cloud data workflows, automatically tagging data with rich metadata for future use via machine learning, and safely sharing data with diverse organisations across the world has the potential to transform an individual project’s research data into a valuable global resource. This workshop aims to give a taste of each of these capabilities that can be used to break through the limitations of the desktop metaphor, and inform the development of new paradigms for research data management.
This deep dive session takes participants through the steps required to create a secure data sharing and analysis infrastructure across a hybrid cloud infrastructure using a combination of open-source and commercial toolsets, including:
1. A brief introduction to object storage and the AWS S3 API and toolsets
2. Adding data into a private cloud object repository using human and machine oriented interfaces
3. Configuring event driven cloud based machine learning to automatically classify data and tag it with additional metadata
4. Sharing data between two different organisations using bucket policies
5. Creating temporary access credentials for data
6. Creating a cloud based Network File Sharing (NFS v3) environment, and copying data into it from an object repository
7. Attaching cloud compute to the file sharing environment for custom machine learning
8. Building a “sync n share” capability for cloud resident data using open source tools
9. Building a policy based data distribution and lifecycle management solution (time allowing)
Participants will need to bring their own laptop, Chromebook or similar, as exercises will be done using a variety of web-based resources in a browser. Some exercises will benefit from command-line access and a willingness to install an iOS or Android application from the relevant app store. Familiarity with Amazon Web Services will also be helpful, but is not required.
http://libguides.library.curtin.edu.au/c.php?g=202401&p=1333108
Why Object Stores are the answer to research data management
 What’s Driving My Information Growth?
   Decision Support / Analytics
   Machine Generated Data
   Systems of Record
   Systems of Engagement
 What drives the negative Inflection Point?
   Focus on Costs rather than Outcomes
   Fragmented Data Silos
   Lack of Standardisation & Automation
   No support for Lean or Agile methods
[Figure: business velocity over time, 2010 to 2020. After the inflection point the curve splits: on the upper branch information becomes a propellant to research; on the lower branch data becomes a burden to IT infrastructure.]
Object Storage
What is it?
 Extremely large data sets
   Automotive, Manufacturing, Media/Entertainment, Research, Service Providers, etc.
 Mostly unstructured data
   Images, Video, Audio, large amounts of text
 Data is seldom deleted
   Backup & Archive keep growing
 Balancing between performance and cost
[Diagram: object storage at the centre, surrounded by massive scalability, large archives, media repositories and web datastores]
Object Storage
Why Does It Matter?
 Unstructured data continues to grow: “Millions of files were not bad, billions are scary, and trillions are terrifying…”
 Data access is changing: “I want applications that can access my data wherever it lives…”
 Highly cost sensitive petabyte scale repositories: “Storage economics change at petabytes and decades…”
 Storage being managed in a cloud ecosystem: “I need it in my data center today and in the cloud next year…”
File vs. Object: Abstracting Massive Scale (No. of Files)
Scenario: You drive to the airport for your cross-country trip and want to park in long-term parking.
File based = Parking Garage: “Where did I park?” Garage 1, Level 3, Row 3, Space 2 (C:\Users\Otto\Parking\Garage1\Level3\Row3\Space2.file)
Object based = Valet Parking: Ticket Nr. 317 (Object UID 317)
File vs. Object: Abstracting Data Locality from Access
You travel to where you are going, and at the door the valet has your car waiting for you.
File vs. Object: Abstracting Storage Changes from Access
Say you are gone for an extended amount of time… maybe 5 years.
Tape: You hope your car is still in the garage, that you remember where it is, and that it will start.
Object: The valet has made sure that your car is in good working order and delivers it to you.
Let’s Start with the Object…
An object is the atomically referenced unit of data storage and transmission.
[Diagram: clients connect over HTTPS through the S3 API or the Swift API (PUT / GET / DELETE) to an object. The object carries an object key / name and metadata, for example object type: JPG, date modified: 07/21/2014, GPS coordinates: lat, long, location: DC @ Seattle.]
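To make the object model concrete, here is a minimal in-memory sketch in Python. It is not the S3 or Swift API: the `StoredObject` and `Bucket` classes and their `put` / `get` / `delete` methods are hypothetical names chosen only to mirror the PUT / GET / DELETE verbs on the slide, and the key and metadata values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object: a key, its data, and metadata stored alongside it."""
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class Bucket:
    """Hypothetical in-memory bucket addressed by a flat object key."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        # PUT replaces the whole object; there is no in-place edit.
        self._objects[key] = StoredObject(key, data, dict(metadata or {}))

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        self._objects.pop(key, None)

    def keys(self):
        return sorted(self._objects)


bucket = Bucket()
bucket.put("photos/seattle.jpg", b"...jpeg bytes...",
           {"object-type": "JPG", "location": "DC @ Seattle"})
photo = bucket.get("photos/seattle.jpg")
```

The "/" in the key is just a character in a flat name; nothing in the store treats it as a folder.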
Object, Bucket, Tenant, and Grid
 An object belongs in a bucket.
 A bucket belongs in a tenant or account.
 A StorageGRID Webscale system manages data of the tenants (Tenant 1 … Tenant N) across multiple sites (Site 1, Site 2 … Site N).
Comparison SAN/NAS/Object Storage
[Diagram: three storage stacks, client at the top and disk at the bottom]
SAN: Client (Application, Operating System, Filesystem) -> SAN fabric network (iSCSI, FC) -> Storage (Volumes / LUNs, Disk Blocks)
NAS: Client (Application, Operating System) -> LAN network (NFS, SMB) -> Storage (Export / Share, Filesystem, Volumes / LUNs, Disk Blocks)
Object Storage: Client (Application) -> LAN network (S3, Swift) -> Storage (Buckets, Objects made of Data and Metadata, Volumes / LUNs, Disk Blocks)
Object Storage Versus File System
                 | File                                                 | Object
Protocols        | NFS and CIFS (file protocols)                        | S3 and Swift (RESTful APIs)
Portability      | Often OS-specific                                    | Independent of OS
Locking          | Portable Operating System Interface (POSIX) standard | Application needs to coordinate locks; last write wins
Editing Data     | Inline edits possible                                | Overwrite full object
Access Latency   | <1 ms                                                | ~10 ms
Custom Metadata  | No                                                   | Yes
Namespace        | Usually within one data center                       | Multiple data centers
Protection       | Snapshot copies, clones, and so on                   | Versioning and bucket policies
StorageGRID Use Cases
 Content repository for Media and new Cloud Applications
 Backup and Archive to Cloud to reduce cost, improve RTO
 Data Analytics with unstructured data and machine learning (S3)
 Tier cold data, snapshots and DR volumes to cloud with FabricPool
   Performance Tier with Hot Data
   Capacity Tier with Cold Data, snapshots and DR volumes
 IoT / Industry 4.0 direct interaction between sensors, machines and storage
 Collaboration: Roaming User Sync & Share Gateways (S3)
 Web Applications built on object storage
 Backup & restore Office 365 and Salesforce data with NetApp SaaS Backup (Exchange, SharePoint, OneDrive)
Industry Collaboration via iRODS for Enhanced Research Data Management
 Open Source Data Management Software
 Used in both university and commercial spaces
for research data
 e.g. Bayer uses this for genomics data
 Includes support for multiple back-ends
 S3
 NFS
 POSIX is the default
 Can act as an S3 gateway
 Initial deployments are often SAN based using
traditional virtualization techniques
 In 2016 Bayer found challenges at petabyte
scale without Object Stores
 Costs, D/R Capability, Recovery+Backup
https://irods.org
Why use Object Storage for Research Data Management ?
 Highly scalable (from TB to PB)
 Low price tag
 Fully automated
 Low administrative effort
 Backup included by design
 DR included by design (replication)
 Data security features (fast rebuild, checksumming, self healing, access control, encryption)
 Appliance with vendor support
 S3 interface support (de facto standard with iRODS
plugin available)
 Performs Well
Based on Bayer’s Findings - https://irods.org/uploads/2016/06/Bayer_iRODS-and-Objectstorage_UGM2016.pdf
Current Data Management / Storage Options at UoW
 StorageGRID has the potential to become a single easy to use repository covering the majority of use cases for research data management
 Current use cases don’t highlight:
   Need for wide area collaboration on large datasets
   Data lifecycle and fine grained data placement
   Opportunity for exploratory analytics and cross project data mining
   Opportunities to commercialize or publish access to refined datasets
Hands on Exercise
How to make object
stores friendly
Workshop exercise 1 – Build your own OneDrive Equivalent
1) Navigate to test.crankbird.com/nextcloud
2) Download the demo-1-xx-setup.txt file (it has the security and login info etc you’ll need)
3) Install NextCloud onto a webserver using instructions from that file
4) Enable the external storage plugin
5) Create your own S3 bucket using the external storage plugin
6) Copy a file into that bucket
7) Log into the AWS console using the instructions from the demo-1-xx-setup.txt file, head to S3 and look at the bucket, select the file (not the checkbox) and inspect Properties->Metadata
8) EXTRA CREDIT – Try to access a pre-existing datastore (aka fun with s3fs)
10 minutes
AKA .. How to be nice without losing all your stuff
How to secure and share
data
Versioning — Basics
 Allows multiple objects with same key in same bucket
 Cannot be disabled once enabled (only suspended)
 Object version needs to be explicitly deleted
[Diagram: versions of Key = object1 in a versioned bucket. With VersionId = 123 and VersionId = 456 stored, the object is visible in the bucket listing. Once a delete marker (Id = 789) sits on top of those versions, the object is invisible in the bucket listing. With the delete marker removed, the object is visible in the bucket listing again.]
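The delete-marker behaviour described above can be sketched with a small in-memory model. This is an illustration only, not how S3 or StorageGRID is implemented: `VersionedBucket`, its methods and the counter-based version IDs are all hypothetical.

```python
import itertools

DELETE_MARKER = object()  # sentinel standing in for a delete marker


class VersionedBucket:
    """Hypothetical in-memory model of a versioning-enabled bucket."""

    def __init__(self):
        self._versions = {}              # key -> list of (version_id, payload)
        self._ids = itertools.count(1)   # toy version IDs: "1", "2", ...

    def _push(self, key, payload):
        version_id = str(next(self._ids))
        self._versions.setdefault(key, []).append((version_id, payload))
        return version_id

    def put(self, key, data):
        # Every PUT adds a new version instead of overwriting the old one.
        return self._push(key, data)

    def delete(self, key, version_id=None):
        if version_id is None:
            # A plain delete only stacks a delete marker on top.
            return self._push(key, DELETE_MARKER)
        # Deleting a specific version (or marker) removes it for good.
        self._versions[key] = [v for v in self._versions.get(key, [])
                               if v[0] != version_id]
        return version_id

    def listing(self):
        # A key is listed only if its latest version is not a delete marker.
        return sorted(key for key, stack in self._versions.items()
                      if stack and stack[-1][1] is not DELETE_MARKER)


bucket = VersionedBucket()
bucket.put("object1", b"first version")
bucket.put("object1", b"second version")
marker = bucket.delete("object1")     # object1 drops out of the listing
hidden = bucket.listing()
bucket.delete("object1", marker)      # removing the marker brings it back
restored = bucket.listing()
```

Deleting the marker by its ID is the same move that makes the file "magically reappear" in the versioning exercise.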
Enabling Versioning
From the shell you can use the aws command line:
https://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-versioning.html
aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled
Enabling Versioning – For Pro-gamers
Remember GenXers … don’t try this without help from your millennials
Python
import boto3
s3 = boto3.resource('s3')
bucket_name = 'my-bucket'
# Bucket versioning should be disabled (status is None)
s3.Bucket(bucket_name).Versioning().status
# Enable versioning
s3.Bucket(bucket_name).Versioning().enable()
# Versioning should now be enabled (returns 'Enabled')
s3.Bucket(bucket_name).Versioning().status
PowerShell
# Bucket versioning should be disabled (returns nothing)
Get-S3BucketVersioning -BucketName $BucketName
# Enable versioning
Enable-S3BucketVersioning -BucketName $BucketName
# Versioning should now be enabled (returns 'Enabled')
Get-S3BucketVersioning -BucketName $BucketName
Workshop exercise 2
1) Using ssh from your computer, log into the server running the Nextcloud websites
2) Enable bucket versioning
3) Delete one of the files in the bucket
4) Delete the “delete marker”
5) List the files again
6) See the file magically reappear
5 minutes
ssh -i thetademo.pem user-1-xx@demo-1.crankbird.com
aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled
aws s3api delete-object --bucket my-bucket --key test.txt
aws s3api list-object-versions --bucket my-bucket
aws s3api delete-object --bucket my-bucket --version-id <version> --key test.txt
Sharing Data with Pre-Signed URLs
 All objects by default are private. Only the object owner has permission to access these
objects. However, the object owner can optionally share objects with others by creating a
presigned URL, using their own security credentials, to grant time-limited permission to
download the objects.
 A pre-signed URL can be accessed without credentials for a pre-determined period of time
with a set of pre-determined actions. (for example, object GET or PUT)
 Generating Pre-Signed URLs does NOT require a connection to the Object Storage
aws s3 presign s3://awsexamplebucket/test2.txt --expires-in 604800
https://examplebucket.s3.amazonaws.com/test2.txt?AWSAccessKeyId=AKIAEXAMPLEACCESSKEY&Signature=EXHCcBe%EXAMPLEKnz3r8O0AgEXAMPLE&Expires=1556132848
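Why no connection is needed becomes clearer when the signing step is written out. The sketch below builds a URL with the same three query parameters as the example above (AWSAccessKeyId, Signature, Expires), using the legacy Signature Version 2 query-string scheme and only the Python standard library. The endpoint, key names and the `presign_get` / `is_valid` helpers are assumptions for illustration; in practice the `aws s3 presign` command or an SDK does this, and newer clients use Signature Version 4.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qs, quote, urlencode, urlparse


def _sign(secret_key, expires_at, resource):
    # Signature Version 2 string-to-sign for a GET with no extra headers.
    string_to_sign = f"GET\n\n\n{expires_at}\n{resource}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def presign_get(access_key, secret_key, bucket, key, expires_at,
                endpoint="https://s3.example.com"):
    """Return a time-limited GET URL; computed locally, no network call."""
    resource = f"/{bucket}/{quote(key)}"
    query = urlencode({
        "AWSAccessKeyId": access_key,
        "Expires": expires_at,
        "Signature": _sign(secret_key, expires_at, resource),
    })
    return f"{endpoint}{resource}?{query}"


def is_valid(url, secret_key, now):
    """Roughly what the store does on receipt: recompute and compare."""
    parts = urlparse(url)
    params = parse_qs(parts.query)
    expires_at = int(params["Expires"][0])
    expected = _sign(secret_key, expires_at, parts.path)
    return now <= expires_at and hmac.compare_digest(
        expected, params["Signature"][0])


# Hypothetical credentials; 1556132848 is the expiry shown in the example URL.
url = presign_get("AKIAEXAMPLE", "not-a-real-secret",
                  "examplebucket", "test2.txt", 1556132848)
```

The `--expires-in 604800` in the CLI example is seven days expressed in seconds; the signer simply adds it to the current time to get the Expires value.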
Workshop exercise 3
1) Using the aws command line utility create a pre-signed URL for an object in
your bucket (preferably a picture of a cat)
2) Copy that URL and paste it into a web browser, (or email it to a colleague
and get them to try it)
3) See your cat picture
5 minutes
This is some kind of next level …
Multi-organization data
sharing
Multi-Organisational Data Sharing
 Unix (NFS) and Windows (SMB) authentication mechanisms remain simple because they work within a single security domain, typically defined by LDAP / Active Directory
 Access to data from multiple security contexts usually requires enabling anonymous access or opening up inter-forest trust relationships. Security teams rarely allow this level of trust between institutions
 S3 compatible object stores have richer and more flexible authentication and access permission schemes, defined by JSON formatted rules, that allow completely different security contexts to safely share objects and buckets
 Pre-signed URLs, which provide temporary access to anyone, are an inherent capability
S3 Request Authentication
An S3 request passes through four checks before it reaches the object:
1. AccessKey lookup: FAIL if the AccessKey cannot be found
2. Signature validation: FAIL if the request signature does not match the calculated signature
3. Group Policy validation: DENY if an explicit DENY has been specified
4. Bucket Policy validation: DENY if an explicit DENY has been specified, or if no ALLOW is specified in the Bucket or Group Policy (implicit DENY)
Bucket Policies follow the Access Policy Language Definition
The AWS Policy Generator can be used to create policies online
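The order of the four authentication checks matters, so here is a hedged Python sketch of the decision logic. It is a simplification under stated assumptions: policies are plain dicts with "Effect", "Action" and "Resource" lists, principals are ignored, signature checking is reduced to a boolean the caller supplies, and wildcard matching is done with `fnmatchcase`. The function and parameter names are hypothetical, not a StorageGRID API.

```python
from fnmatch import fnmatchcase


def _matches(statement, action, resource):
    # A statement applies if any action pattern and any resource pattern match.
    return (any(fnmatchcase(action, a) for a in statement["Action"])
            and any(fnmatchcase(resource, r) for r in statement["Resource"]))


def _effects(policy, action, resource):
    # Collect the effects ("Allow" / "Deny") of every applicable statement.
    return {s["Effect"] for s in policy.get("Statement", [])
            if _matches(s, action, resource)}


def authorize(access_key, signature_ok, action, resource,
              known_keys, group_policy, bucket_policy):
    """Return 'FAIL', 'DENY' or 'ALLOW', applying the four checks in order."""
    if access_key not in known_keys:
        return "FAIL"                  # 1. AccessKey lookup
    if not signature_ok:
        return "FAIL"                  # 2. Signature validation
    group = _effects(group_policy, action, resource)
    if "Deny" in group:
        return "DENY"                  # 3. explicit DENY in the group policy
    bucket = _effects(bucket_policy, action, resource)
    if "Deny" in bucket:
        return "DENY"                  # 4. explicit DENY in the bucket policy
    if "Allow" in group or "Allow" in bucket:
        return "ALLOW"
    return "DENY"                      # implicit DENY: nothing said ALLOW


read_only = {"Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": ["urn:sgws:s3:::shared-bucket",
                 "urn:sgws:s3:::shared-bucket/*"],
}]}
result = authorize("KEY1", True, "s3:GetObject",
                   "urn:sgws:s3:::shared-bucket/cat.jpg",
                   {"KEY1"}, {}, read_only)
```

Because an explicit DENY is checked before any ALLOW, a single deny statement in either policy wins over every allow.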
Workshop exercise 4
1) Open up https://awspolicygen.s3.amazonaws.com/policygen.html in a web browser
2) Try to create a policy template that looks a bit like this
5 minutes
$ cat data_sharing_policy.json
{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "SGWS": "17468297853691494465" },
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "urn:sgws:s3:::shared-bucket",
        "urn:sgws:s3:::shared-bucket/*"
      ]
    }
  ]
}
Callouts: "Principal" is the tenant we want to delegate access to; "Action" lists the allowed actions; "Resource" is the “shared” bucket.
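If you would rather generate this kind of policy than hand-edit it, a few lines of Python will do. This is a sketch under assumptions: `sharing_policy` is a hypothetical helper, and the resource prefix follows the `urn:sgws:s3:::` form used in the PowerShell bucket policy example later in this deck.

```python
import json


def sharing_policy(tenant_id, bucket,
                   actions=("s3:GetObject", "s3:ListBucket")):
    """Build a read-only sharing policy shaped like data_sharing_policy.json."""
    return {
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"SGWS": tenant_id},   # tenant being granted access
            "Action": list(actions),            # allowed actions
            "Resource": [
                f"urn:sgws:s3:::{bucket}",      # the bucket itself (listing)
                f"urn:sgws:s3:::{bucket}/*",    # every object in the bucket
            ],
        }]
    }


policy_json = json.dumps(
    sharing_policy("17468297853691494465", "shared-bucket"), indent=2)
```

Both resource entries are needed: listing applies to the bucket, while GetObject applies to the objects under it.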
How to enrich data
for analysis
Artificial Intelligence Timeline
[Figure: timeline from 1956 to 2018 with markers at 1974, 1980, 1987 and 1993 and two “AI Winter” periods, ending with Deep Learning and Deep Reinforcement Learning. A second chart plots performance against data, with labels Statistical, Machine Learning and Deep Learning.]
Why Metadata Becomes More Important Over Time
 Exploratory Analytics is probably the highest value thing you can do with your data lake
 You do feature extraction / assignment when the data is new, for real time analytics
 Features are Metadata
 You can’t easily do feature extraction across Petabytes of cold data
 It’s pretty much impossible to do it on tape
International Co-Operation and Hybrid Cloud Data Pipelines
[Diagram: four StorageGRID sites: Site 1 San Francisco, Site 2 New York, Site 3 Munich, Site 4 Tokyo]
With StorageGRID Webscale Platform Services
 Mirror my bucket from Webscale to S3 and run Hive script on Elastic MapReduce (Amazon S3, Amazon EMR)
 Notify my Lambda function to run facial recognition on images in my SG bucket (Amazon SNS, Amazon Lambda, Amazon Rekognition)
 Search object metadata in my SG bucket as they are updated (Amazon Elasticsearch)
Metadata
 Objects can have up to 2 KB of metadata in the form of key-value pairs
 Case insensitive and HTTP header compatible ASCII characters allowed
 Metadata is an integral part of the object
 Metadata can only be changed by copying the object to itself
 Metadata can trigger ILM Policy
[Diagram: an object made of metadata and data]
PowerShell
Write-S3Object -BucketName $BucketName -Key 'my-object' -Content 'Hello World!' -Metadata @{company='NetApp';location='Germany'}
Get-S3ObjectMetadata -BucketName $BucketName -Key 'my-object'
Python
import boto3
s3 = boto3.resource('s3')
bucket_name = 'my-bucket'
metadata = {'company': 'NetApp', 'location': 'Germany'}
obj = s3.Object(bucket_name, 'my-object')
obj.put(Body='Hello World!', Metadata=metadata)
s3.Object(bucket_name, 'my-object').get().get('Metadata')
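Since metadata is fixed at write time, it can help to check it before the PUT. The following is a minimal sketch of such a pre-flight check based on the rules listed on the slide (2 KB, case insensitive keys, ASCII only). How the 2 KB is counted here, as the summed length of keys and values, is an assumption, and `normalise_metadata` is a hypothetical helper rather than part of any SDK.

```python
MAX_METADATA_BYTES = 2 * 1024


def normalise_metadata(metadata):
    """Lower-case the keys and enforce the ASCII and 2 KB rules."""
    cleaned = {}
    total = 0
    for key, value in metadata.items():
        key, value = str(key).lower(), str(value)
        if not (key.isascii() and value.isascii()):
            raise ValueError(f"metadata must be ASCII: {key!r}")
        # Assumption: the limit counts key and value characters together.
        total += len(key) + len(value)
        cleaned[key] = value
    if total > MAX_METADATA_BYTES:
        raise ValueError(
            f"metadata is {total} bytes, limit is {MAX_METADATA_BYTES}")
    return cleaned


checked = normalise_metadata({"Company": "NetApp", "location": "Germany"})
```

The returned dict could then be passed as the `Metadata` argument in the Python example above.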
Object Tagging
 Similar to metadata, but more flexible
 Can be added or updated independently of object data
 Up to 10 unique tags per object
 128 characters for key and 256 for value, case sensitive!
 Can trigger ILM Policy
[Diagram: an object made of metadata and data, with several tags attached]
PowerShell
Set-S3ObjectTagging -BucketName $BucketName -Key 'my-object' -Tags @{Name='projectId';Value='12345'},@{Name='company';Value='NetApp'}
Get-S3ObjectTagging -BucketName $BucketName -Key 'my-object'
Remove-S3ObjectTagging -BucketName $BucketName -Key 'my-object'
Python
import boto3
s3 = boto3.resource('s3')
bucket_name = 'my-bucket'
obj = s3.Object(bucket_name, 'my-object')
tags = 'projectId=12345&company=NetApp'
obj.put(Body='Hello World!', Tagging=tags)
s3.meta.client.get_object_tagging(Bucket=bucket_name, Key='my-object')
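The `Tagging` string in the Python example is an ordinary URL query string, so it can be built and checked from a dict. Here is a small sketch that applies the limits from the slide (10 tags, 128 characters per key, 256 per value); `encode_tags` is a hypothetical helper, not an SDK function.

```python
from urllib.parse import parse_qsl, urlencode

MAX_TAGS, MAX_KEY_CHARS, MAX_VALUE_CHARS = 10, 128, 256


def encode_tags(tags):
    """Validate a dict of tags and encode it as a URL query string."""
    if len(tags) > MAX_TAGS:
        raise ValueError(f"{len(tags)} tags given, at most {MAX_TAGS} allowed")
    for key, value in tags.items():
        if len(key) > MAX_KEY_CHARS or len(value) > MAX_VALUE_CHARS:
            raise ValueError(f"tag too long: {key!r}")
    return urlencode(tags)


tagging = encode_tags({"projectId": "12345", "company": "NetApp"})
# Tag keys are case sensitive, so these count as two distinct tags.
two_tags = parse_qsl(encode_tags({"Project": "a", "project": "b"}))
```

`urlencode` also escapes characters such as spaces and ampersands in tag values, which a hand-built string would get wrong.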
Applications of Elasticsearch
Search engine
Use as a search engine for your object store
Data visualization
Kibana dashboard to visualize the data based on metadata fields
Data cataloging
Keep track of all your data in your data lake for a single source of truth about your data
Training data analytics
Analyze the data to ensure a representative training dataset for ML/AI models
Bucket monitoring
Track how much data is stored and usage stats
Kibana Data Visualization
Use Case: High Performance Data Lake for Data Analytics
[Diagram: data flows from Edge to Core to Cloud]
 Edge: Docker based StorageGRID running on bare metal ruggedized server with local SSDs (single copy)
 Edge to Core: automatic replication using S3 based CloudMirror with guaranteed data delivery
 Core: >100PB Data Lake built on StorageGRID Appliances (Erasure Coding for increased availability and durability)
 Core to Cloud: automatic replication using S3 based CloudMirror with guaranteed data delivery; usage of cloud services; cloud bursting
 Clients:
   Windows Client (HiL/SiL): mount Data Lake as Windows Drive using CloudBerry Drive
   Linux Client (HiL/SiL): mount Data Lake as local mountpoint using s3fs-fuse
   S3-enabled Application (HiL/SiL)
 Throughput labels: multiple PB ingest/day; >100 GByte/s retrieval
Data Improvement / Refinement at Australian Universities
Monetising your data
Requester Pays Buckets
Workshop exercise 5
1) Log into the AWS console
2) Select Lambda (under Compute)
3) Create Function -> Use Blueprint -> rekognition-python -> Configure
4) Name -> Use existing role -> UberLambda
5) Choose your bucket from the dropdown
6) Enable trigger -> Create function
7) Open Designer -> Add events for your bucket (PUT and POST)
8) Copy in the code from S3Recog.py
9) Add a new picture into the bucket from Nextcloud
10) Inspect the new metadata from the S3 interface
15 minutes
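For orientation, the function you create in this exercise is invoked with an S3 event that names the bucket and the new object. The sketch below shows only that unpacking step; it is not the contents of S3Recog.py. `objects_in_event` and `handler` are hypothetical names, and `detect_labels` is a stand-in for whatever image-analysis call the real function makes.

```python
from urllib.parse import unquote_plus


def objects_in_event(event):
    """Yield (bucket, key) for every record in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded, with spaces sent as '+'.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def handler(event, context, detect_labels=lambda bucket, key: []):
    """Lambda-style entry point mapping each new object to its labels."""
    return {f"{bucket}/{key}": detect_labels(bucket, key)
            for bucket, key in objects_in_event(event)}


# A trimmed-down event with only the fields this sketch reads.
sample_event = {"Records": [{"s3": {
    "bucket": {"name": "my-bucket"},
    "object": {"key": "cat+pictures/tom.jpg"},
}}]}
labels = handler(sample_event, None, lambda bucket, key: ["Cat"])
```

In the exercise, the labels would be written back to the object so they show up when you inspect it from the S3 interface.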
Building hybrid cloud data
analytics workflows using NetApp
DataFabric Technologies
Extra Credit
A Vision for an Australian Research Data Network
 A single multi-tenant StorageGRID
for secure collaborative data
resource between Government,
Defense, and leading Research
Universities
 Data stays under full custodianship
of Australian Citizens with policy
based placement, control and role
based access
 Local access to hot data with
throughput measured in
Gigabytes/sec
 Cold data distributed at lowest cost
 Leverages HPC, Hadoop analytics
ecosystem in Private, Public and
Hybrid Cloud deployments
120 PB capacity, 100B objects, 16 sites
Cloud Sync: NFS / SMB to Object in Cloud and On Premises
 Cloud Sync is a service
 From https://cloud.netapp.com
 Create your relationship
 Identify the NFS / CIFS server
 Launch a Data Broker
 Select your S3 bucket
 Source, target and the Data Broker
may be in the cloud or on-premises
StorageGRID User Driven Platform Services
Empower Users to Implement Hybrid Cloud Data Pipelines with Amazon Web Services
 Cloud Mirroring: Replicate log objects with prefix sgws://bucket_name/logs/2017-08* to s3://us-east-1.bucket_name to be analyzed in Elastic MapReduce
 Bucket Event Notifications: When new objects with prefix sgws://bucket_name/images/* are created, notify my Lambda function to copy the image into AWS S3 and detect features in the scene
 Search Integration: Update my Elasticsearch index when new object metadata is written or updated for objects matching prefix sgws://bucket_name/images/*
[Diagram: numbered data flows from StorageGRID (SG) to Amazon S3 and Amazon EMR, and from StorageGRID through Amazon SNS and Amazon Lambda to Amazon S3 and Amazon Rekognition]
Thank You.
Persisting S3 Credentials
 Credentials need to be stored locally in ~/.aws/credentials (Linux) or
$HOME/.aws/credentials (Windows) or in a properties file, but never in code
 Config parameters can be stored in ~/.aws/config (Linux) or $HOME/.aws/config
(Windows) or in a properties file
 Create an AWS Profile
   via PowerShell:
    New-AwsProfile -ProfileName 'default' -AccessKey 'FAHKA1WXDM5E7N4L1JP3' -SecretKey '1iqbkq+yECyIFXbSnpzdeF8zVcWDm/u6DOgxNzHW' -EndpointUrl 'https://webscaledemo.netapp.com'
   manually, by editing $HOME/.aws/credentials or ~/.aws/credentials:
    [default]
    aws_access_key_id = FAHKA1WXDM5E7N4L1JP3
    aws_secret_access_key = 1iqbkq+yECyIFXbSnpzdeF8zVcWDm/u6DOgxNzHW
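The credentials file is plain INI, which is why keeping keys out of code costs so little. As an illustration, here is a sketch that reads a profile with Python's standard `configparser`; `load_profile` is a hypothetical helper and the key values are placeholders. SDKs and the aws CLI read this file on their own, so you would rarely need to do this yourself.

```python
import configparser


def load_profile(text, profile="default"):
    """Return (access_key, secret_key) for one profile of a credentials file."""
    # interpolation=None so that a '%' inside a secret is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser[profile]
    return section["aws_access_key_id"], section["aws_secret_access_key"]


# Placeholder content in the same layout as ~/.aws/credentials.
sample = """\
[default]
aws_access_key_id = EXAMPLEACCESSKEY
aws_secret_access_key = example/secret+100%
"""
access_key, secret_key = load_profile(sample)
```

To read the real file, pass the text of `~/.aws/credentials` instead of `sample`.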
How to handle custom metadata
 As Object Metadata is an integral part of the object and is limited to 2KB it should be used
to describe how and where the object should be stored, e.g. for
 Service Level Agreements
 Locality or Geo-Fencing
 Importance
 Confidentiality
 As object tags can be added and removed, but are limited to 10 tags, they should be used
to group objects by e.g.
 cost center
 project
Bucket Policies
Python
import boto3
s3 = boto3.resource('s3')
bucket_name = 'my-bucket'
# Reference Bucket Policy
bucket_policy = s3.BucketPolicy(bucket_name)
# Load Policy from local file and attach it to bucket
with open('/path/to/policy.json', 'r') as f:
    bucket_policy.put(Policy=f.read())
# Detach current Bucket Policy from bucket
bucket_policy.delete()
PowerShell
# create Bucket Policy
$BucketPolicy = New-AwsPolicy -Principal @{SGWS='17468297853691494465'} `
    -Action "s3:GetObject","s3:ListBucket" `
    -Resource "urn:sgws:s3:::shared-bucket","urn:sgws:s3:::shared-bucket/*"
Set-S3BucketPolicy -BucketName $BucketName -Policy $BucketPolicy
Remove-S3BucketPolicy -BucketName $BucketName
Pre-Signed URLs
 A pre-signed URL can be accessed without credentials for a pre-determined period of time with a set of pre-determined actions (for example, object GET or PUT)
 Generating Pre-Signed URLs does NOT require a connection to the Object Storage
Python
import boto3
client = boto3.client('s3')
bucket_name = 'my-bucket'
url = client.generate_presigned_url(
    'get_object', {'Bucket': bucket_name, 'Key': 'my-key'},
    ExpiresIn=3600)
Resulting URL (query string wrapped for readability):
https://webscaledemo.netapp.com/mybucket/my-key?Action=GET
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=IZUDAQ5NQ39AUJ6HR3Q4%2F20181220%2Fus-east-1%2Fs3%2Faws4_request
&X-Amz-Date=20181220T162225Z
&X-Amz-Expires=3600
&X-Amz-SignedHeaders=host
&X-Amz-Signature=4a5981c70bb69e8d103cd37f1f5985784979fe95b5d17e202860d778dcb22b7c
PowerShell
Get-S3PresignedUrl -Method 'GET' -BucketName $BucketName -Key 'my-key' -Expires (Get-Date).AddHours(1)
Pre-Signed URLs — Example
 Use case: Photo-sharing app on
mobile phone
 Bad design: Put S3 credentials in
phone app to upload photos to S3
 Better design: Use pre-signed URLs
 S3 credentials will only be required on
the (secured) server side, but not
inside the phone app
 Traffic does not need to be routed
through backend servers
[Diagram: Phone (Frontend App) and Servers (Backend App)]
1. Request photo upload
2. Generate pre-signed URL (in backend)
3. Return pre-signed URL for photo upload
4. Upload photo via pre-signed URL
S3 Best Practices: Bucket Listing
 Listing objects of a bucket with millions of objects
is a very expensive operation
 Metadata needs to be fetched from all storage nodes
 Can result in high CPU utilization due to high load
on metadata database
 Bucket event notifications
 SNS notifications can be a much better approach in many cases
 Available since NetApp® StorageGRID® 11.0
 Example: When a new object is ingested with prefix
sgws://bucket_name/images/* notify my application
to analyze the object and update its metadata
 Try it out with localstack SNS
https://github.com/localstack/localstack
[Diagram: StorageGRID (SG) publishes to Amazon SNS, which notifies an SNS receiver application, in five numbered steps]

A REVIEW ON RESOURCE ALLOCATION MECHANISM IN CLOUD ENVIORNMENT
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
 
Week 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud Computing
Week 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud ComputingWeek 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud Computing
Week 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud Computing
 
Iot dan cc
Iot dan ccIot dan cc
Iot dan cc
 
Hadoop
HadoopHadoop
Hadoop
 
Future of big data nick kabra speaker compendium march 2013
Future of big data nick kabra speaker compendium march 2013Future of big data nick kabra speaker compendium march 2013
Future of big data nick kabra speaker compendium march 2013
 
Research, the Cloud, and the IRB
Research, the Cloud, and the IRBResearch, the Cloud, and the IRB
Research, the Cloud, and the IRB
 
Big Data: Big Issues for IP
Big Data: Big Issues for IPBig Data: Big Issues for IP
Big Data: Big Issues for IP
 
Forecast of Big Data Trends
Forecast of Big Data TrendsForecast of Big Data Trends
Forecast of Big Data Trends
 
A scalabl e and cost effective framework for privacy preservation over big d...
A  scalabl e and cost effective framework for privacy preservation over big d...A  scalabl e and cost effective framework for privacy preservation over big d...
A scalabl e and cost effective framework for privacy preservation over big d...
 
Why edge computing is critical to hybrid IT and cloud success
Why edge computing is critical to hybrid IT and cloud successWhy edge computing is critical to hybrid IT and cloud success
Why edge computing is critical to hybrid IT and cloud success
 
C017421624
C017421624C017421624
C017421624
 

Similar to Research data management 1.5

DCA Symposium 6 Feb 2023.pdf
DCA Symposium 6 Feb 2023.pdfDCA Symposium 6 Feb 2023.pdf
DCA Symposium 6 Feb 2023.pdfAlan Morrison
 
Data Division in Cloud for Secured Data Storage using RSA Algorithm
Data Division in Cloud for Secured Data Storage using RSA AlgorithmData Division in Cloud for Secured Data Storage using RSA Algorithm
Data Division in Cloud for Secured Data Storage using RSA AlgorithmIRJET Journal
 
云计算及其应用
云计算及其应用云计算及其应用
云计算及其应用lantianlcdx
 
The Proliferation And Advances Of Computer Networks
The Proliferation And Advances Of Computer NetworksThe Proliferation And Advances Of Computer Networks
The Proliferation And Advances Of Computer NetworksJessica Deakin
 
CSIT 534 Presentation Cherri_edmond
CSIT 534 Presentation Cherri_edmondCSIT 534 Presentation Cherri_edmond
CSIT 534 Presentation Cherri_edmondlowedmond
 
Cloud Computing_ICT Concepts & Trends.pptx
Cloud Computing_ICT Concepts & Trends.pptxCloud Computing_ICT Concepts & Trends.pptx
Cloud Computing_ICT Concepts & Trends.pptxssuser6063b0
 
Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...
Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...
Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...YogeshIJTSRD
 
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)Denodo
 
IRJET- Distributed Decentralized Data Storage using IPFS
IRJET- Distributed Decentralized Data Storage using IPFSIRJET- Distributed Decentralized Data Storage using IPFS
IRJET- Distributed Decentralized Data Storage using IPFSIRJET Journal
 
An Efficient and Safe Data Sharing Scheme for Mobile Cloud Computing
An Efficient and Safe Data Sharing Scheme for Mobile Cloud ComputingAn Efficient and Safe Data Sharing Scheme for Mobile Cloud Computing
An Efficient and Safe Data Sharing Scheme for Mobile Cloud Computingijtsrd
 
Xanadu for Big Data + IoT + Deep Learning + Cloud Integration Strategy
Xanadu for Big Data + IoT + Deep Learning + Cloud Integration StrategyXanadu for Big Data + IoT + Deep Learning + Cloud Integration Strategy
Xanadu for Big Data + IoT + Deep Learning + Cloud Integration StrategyAlex G. Lee, Ph.D. Esq. CLP
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An IntroductionDenodo
 
The Growth Of Data Centers
The Growth Of Data CentersThe Growth Of Data Centers
The Growth Of Data CentersGina Buck
 
Intro To Cloud Computing
Intro To Cloud ComputingIntro To Cloud Computing
Intro To Cloud Computingprakashjjaya
 
Government Applications of Cloud Computing
Government Applications of Cloud ComputingGovernment Applications of Cloud Computing
Government Applications of Cloud ComputingRoger Smith
 
Cloud Storage: Focusing On Back End Storage Architecture
Cloud Storage: Focusing On Back End Storage ArchitectureCloud Storage: Focusing On Back End Storage Architecture
Cloud Storage: Focusing On Back End Storage ArchitectureIOSR Journals
 
Wicsa2011 cloud tutorial
Wicsa2011 cloud tutorialWicsa2011 cloud tutorial
Wicsa2011 cloud tutorialAnna Liu
 

Similar to Research data management 1.5 (20)

DCA Symposium 6 Feb 2023.pdf
DCA Symposium 6 Feb 2023.pdfDCA Symposium 6 Feb 2023.pdf
DCA Symposium 6 Feb 2023.pdf
 
Data Division in Cloud for Secured Data Storage using RSA Algorithm
Data Division in Cloud for Secured Data Storage using RSA AlgorithmData Division in Cloud for Secured Data Storage using RSA Algorithm
Data Division in Cloud for Secured Data Storage using RSA Algorithm
 
云计算及其应用
云计算及其应用云计算及其应用
云计算及其应用
 
The Proliferation And Advances Of Computer Networks
The Proliferation And Advances Of Computer NetworksThe Proliferation And Advances Of Computer Networks
The Proliferation And Advances Of Computer Networks
 
CSIT 534 Presentation Cherri_edmond
CSIT 534 Presentation Cherri_edmondCSIT 534 Presentation Cherri_edmond
CSIT 534 Presentation Cherri_edmond
 
Hybrid Cloud Strategy for Big Data and Analytics
Hybrid Cloud Strategy for Big Data and Analytics Hybrid Cloud Strategy for Big Data and Analytics
Hybrid Cloud Strategy for Big Data and Analytics
 
Cloud Computing_ICT Concepts & Trends.pptx
Cloud Computing_ICT Concepts & Trends.pptxCloud Computing_ICT Concepts & Trends.pptx
Cloud Computing_ICT Concepts & Trends.pptx
 
Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...
Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...
Cloud Analytics Ability to Design, Build, Secure, and Maintain Analytics Solu...
 
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)
 
IRJET- Distributed Decentralized Data Storage using IPFS
IRJET- Distributed Decentralized Data Storage using IPFSIRJET- Distributed Decentralized Data Storage using IPFS
IRJET- Distributed Decentralized Data Storage using IPFS
 
An Efficient and Safe Data Sharing Scheme for Mobile Cloud Computing
An Efficient and Safe Data Sharing Scheme for Mobile Cloud ComputingAn Efficient and Safe Data Sharing Scheme for Mobile Cloud Computing
An Efficient and Safe Data Sharing Scheme for Mobile Cloud Computing
 
Xanadu for Big Data + IoT + Deep Learning + Cloud Integration Strategy
Xanadu for Big Data + IoT + Deep Learning + Cloud Integration StrategyXanadu for Big Data + IoT + Deep Learning + Cloud Integration Strategy
Xanadu for Big Data + IoT + Deep Learning + Cloud Integration Strategy
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
J017547478
J017547478J017547478
J017547478
 
The Growth Of Data Centers
The Growth Of Data CentersThe Growth Of Data Centers
The Growth Of Data Centers
 
Intro To Cloud Computing
Intro To Cloud ComputingIntro To Cloud Computing
Intro To Cloud Computing
 
Government Applications of Cloud Computing
Government Applications of Cloud ComputingGovernment Applications of Cloud Computing
Government Applications of Cloud Computing
 
Cloud Storage: Focusing On Back End Storage Architecture
Cloud Storage: Focusing On Back End Storage ArchitectureCloud Storage: Focusing On Back End Storage Architecture
Cloud Storage: Focusing On Back End Storage Architecture
 
K017146064
K017146064K017146064
K017146064
 
Wicsa2011 cloud tutorial
Wicsa2011 cloud tutorialWicsa2011 cloud tutorial
Wicsa2011 cloud tutorial
 

Recently uploaded

BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxSulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxnoordubaliya2003
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 

Recently uploaded (20)

BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxSulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 

Research data management 1.5

  • 1. Transcending the Desktop Metaphor: Building an object store-based hybrid cloud collaboration and analysis infrastructure in less than one hour. John Martin. THETA Deep Dive Session @ 3:45pm, Plenary 2. © 2017 NetApp, Inc. All rights reserved. NETAPP CONFIDENTIAL
  • 2. Future Ready and the New Reality. Transcending the Desktop Metaphor: Impacts and Opportunities of AI, Cloud and IoT in Research Data Management
    All universities and research organisations in Australia have a well-documented set of research data management policies, not only because these policies help them comply with legislative and community expectations, but also because they facilitate the application processes under the National Competitive Grants Program. These plans, however, are often framed and limited by technology assumptions and powerful metaphors for managing unstructured data that originated in the late 1960s and early 1970s, creating an invisible straitjacket that constrains thinking about how data can and should be managed.
    Some of these invisible constraints include:
    - Data is created by people in a fixed workplace who work primarily with paper documents, manila folders, and filing cabinets
    - This data will be consumed primarily by other humans
    - Each piece of data has a unique, idiosyncratic name given to it by its creator, generally without a broadly agreed naming standard
    - Data has a physical location in one place, and while copies can be made to other storage locations, there is no inherent link between these copies
    - Information about the data in the file is inferred from the file name and its containing folder
    - Other forms of metadata, such as security permissions, are generally minimal, often hidden, difficult to administer, and may serve as a malware vector
    None of these constraints holds true where the majority of data is created by machines, for machines, and where the possibility of deriving value from globally aggregated, diverse and unstructured data is driving the world’s most influential organisations.
    This presentation highlights some areas where recent and anticipated advances in machine learning, IoT, data management and multi-cloud architectures transcend the metaphors and technologies of the late 20th century and can provide new opportunities for research organisations to create significant new value. It will also outline where existing policy frameworks may require clarification or augmentation to ensure that taking advantage of these new opportunities meets the expectations of the research and broader communities around security, probity, and ethical standards.
  • 3. Deep Dive Session. Transcending the Desktop Metaphor: Building a hybrid cloud collaboration and analysis infrastructure in less than one hour.
    Leveraging the innovation and scale of public cloud providers using hybrid-cloud data workflows, automatically tagging data with rich metadata for future use via machine learning, and safely sharing data with diverse organisations across the world have the potential to transform an individual project’s research data into a valuable global resource. This workshop aims to give a taste of each of these capabilities, which can be used to break through the limitations of the desktop metaphor and inform the development of new paradigms for research data management.
    This deep dive session takes participants through the steps required to create a secure data sharing and analysis infrastructure across a hybrid cloud using a combination of open-source and commercial toolsets, including:
    1. A brief introduction to object storage and the AWS S3 API and toolsets
    2. Adding data into a private cloud object repository using human- and machine-oriented interfaces
    3. Configuring event-driven, cloud-based machine learning to automatically classify data and tag it with additional metadata
    4. Sharing data between two different organisations using bucket policies
    5. Creating temporary access credentials for data
    6. Creating a cloud-based Network File System (NFS v3) environment, and copying data into it from an object repository
    7. Attaching cloud compute to the file sharing environment for custom machine learning
    8. Building a “sync n share” capability for cloud-resident data using open-source tools
    9. Building a policy-based data distribution and lifecycle management solution (time allowing)
    Participants will need to bring their own laptop, Chromebook or similar, as exercises will be done in a browser using a variety of web-based resources. Some exercises will benefit from command-line access and a willingness to install an iOS or Android application from the relevant app store. Familiarity with Amazon Web Services will also be helpful, but is not required.
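As an illustration of step 4 (sharing data between two organisations using bucket policies), the following is a minimal sketch of what such a policy document might look like in the AWS-style JSON policy grammar. The bucket name `research-data`, the account ID `111122223333`, and the helper `cross_account_read_policy` are hypothetical placeholders, not values from the workshop, and other S3-compatible stores may expect a different principal or resource naming format.

```python
import json


def cross_account_read_policy(bucket: str, partner_account_id: str) -> dict:
    """Build a bucket policy that grants another account read-only access.

    Both arguments are illustrative placeholders supplied by the caller.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PartnerReadOnly",
                "Effect": "Allow",
                # The partner organisation's account (AWS-style principal ARN).
                "Principal": {"AWS": f"arn:aws:iam::{partner_account_id}:root"},
                # Read objects and list the bucket; no write or delete rights.
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


policy = cross_account_read_policy("research-data", "111122223333")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

On AWS, a document like this could be saved to a file and attached with `aws s3api put-bucket-policy --bucket research-data --policy file://policy.json`; the partner then reads the bucket with its own credentials rather than with shared ones.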
  • 4. http://libguides.library.curtin.edu.au/c.php?g=202401&p=1333108
  • 5. Why Object Stores are the answer to research data management
    - What’s driving my information growth? Decision support / analytics; machine-generated data; systems of record; systems of engagement
    - What drives the negative inflection point? Focus on costs rather than outcomes; fragmented data silos; lack of standardisation and automation; no support for Lean or Agile methods
    [Chart: Business Velocity, 2010 to 2020; labels: Inflection Point, Information Becomes a Propellant to Research, Data Becomes a Burden to IT Infrastructure]
    © 2018 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use Only
  • 6. Object Storage: What is it?
    - Extremely large data sets: Automotive, Manufacturing, Media/Entertainment, Research, Service Providers, etc.
    - Mostly unstructured data: images, video, audio, large amounts of text
    - Data is seldom deleted: backup and archive keep growing
    - Balancing between performance and cost
    [Diagram: Object Storage: Massive Scalability, Large Archives, Media Repositories, Web Datastores]
    © 2019 NetApp, Inc. All Rights Reserved. Limited Use.
  • 7. Object Storage: Why Does It Matter?
    - Unstructured data continues to grow: “Millions of files were not bad, billions are scary, and trillions are terrifying…”
    - Data access is changing: “I want applications that can access my data wherever it lives…”
    - Highly cost-sensitive petabyte-scale repositories: “Storage economics change at petabytes and decades…”
    - Storage being managed in a cloud ecosystem: “I need it in my data center today and in the cloud next year…”
  • 8. File vs. Object: Abstracting Massive Scale (No. of Files). Scenario: You drive to the airport for your cross-country trip and want to park in long-term parking.
    - File based = Parking Garage: “Where did I park?” Garage 1, Level 3, Row 3, Space 2 (C:\Users\Otto\Parking\Garage1\Level3\Row3\Space2.file)
    - Object based = Valet Parking: Ticket Nr. 317 (Object UID 317)
  • 9. File vs. Object: Abstracting Data Locality from Access. You travel to where you are going, and at the door the valet has your car waiting for you.
  • 10. File vs. Object: Abstracting Storage Changes from Access. Say you are gone for an extended amount of time, maybe 5 years.
    - Tape: You hope your car is still in the garage, that you remember where it is, and that it will start.
    - Object: The valet has made sure that your car is in good working order and delivers it to you.
  • 11. Let’s Start with the Object… An object is the atomically referenced unit of data storage and transmission.
    [Diagram: clients connect over HTTPS using the S3 API or Swift API (PUT / GET / DELETE). An object has a key / name and metadata, e.g. Object type: JPG; Date modified: 07/21/2014; GPS Coordinates: Lat, Long; Location: DC @ Seattle]
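To make the idea of an object as a key plus data plus metadata concrete, here is a small in-memory sketch of the PUT / GET / DELETE model. It is a toy stand-in and not the S3 or Swift API; the class names `StoredObject` and `ToyObjectStore`, the key, and the metadata values are assumptions chosen to echo the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoredObject:
    """One object: opaque data plus free-form metadata."""
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyObjectStore:
    """In-memory stand-in for a single bucket: a flat map of key -> object."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes,
            metadata: Optional[Dict[str, str]] = None) -> None:
        # A PUT always writes the whole object, replacing any previous one.
        self._objects[key] = StoredObject(data, dict(metadata or {}))

    def get(self, key: str) -> StoredObject:
        # The key is the only handle needed; there is no directory to walk.
        return self._objects[key]

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)

    def keys(self) -> List[str]:
        return sorted(self._objects)


store = ToyObjectStore()
store.put(
    "ticket-317",
    b"<jpeg bytes>",
    {"object-type": "JPG", "date-modified": "07/21/2014", "location": "DC @ Seattle"},
)
obj = store.get("ticket-317")
print(obj.metadata["object-type"])
store.delete("ticket-317")
remaining = store.keys()
```

The flat key space is the design point: the caller holds the "valet ticket" (the key), and where the bytes physically live is left to the store.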
  • 12. Object, Bucket, Tenant, and Grid. An object belongs in a bucket. A bucket belongs in a tenant or account. A StorageGRID Webscale system manages the data of the tenants across multiple sites.
    [Diagram: Tenant 1 … Tenant N, spread across Site 1, Site 2, … Site N]
  • 13. Comparison SAN/NAS/Object Storage
    - SAN: Client (Application, Operating System, Filesystem); Network (SAN fabric: iSCSI, FC); Storage (Volumes / LUNs, Disk Blocks)
    - NAS: Client (Application, Operating System); Network (LAN: NFS, SMB); Storage (Export / Share, Filesystem, Volumes / LUNs, Disk Blocks)
    - Object Storage: Client (Application); Network (LAN: S3, Swift); Storage (Buckets, Objects with Data and Metadata, Volumes / LUNs, Disk Blocks)
  • 14. Object Storage Versus File System
    Attribute       | File                                                 | Object
    Protocols       | NFS and CIFS (file protocols)                        | S3 and Swift (RESTful APIs)
    Portability     | Often OS-specific                                    | Independent of OS
    Locking         | Portable Operating System Interface (POSIX) standard | Application needs to coordinate locks; last write wins
    Editing Data    | Inline edits possible                                | Overwrite full object
    Access Latency  | <1 ms                                                | ~10 ms
    Custom Metadata | No                                                   | Yes
    Namespace       | Usually within one data center                       | Multiple data centers
    Protection      | Snapshot copies, clones, and so on                   | Versioning and bucket policies
  • 15. StorageGRID Use Cases
    - Content repository for media and new cloud applications
    - Backup and archive to cloud to reduce cost and improve RTO
    - Data analytics with unstructured data and machine learning
    - Tier cold data, snapshots and DR volumes to cloud with FabricPool (performance tier with hot data; capacity tier with cold data, snapshots and DR volumes)
    - IoT / Industry 4.0: direct interaction between sensors, machines and storage
    - Collaboration: roaming users, sync & share, gateways
    - Web applications built on object storage (S3)
    - NetApp SaaS Backup: backup and restore of Office 365 (Exchange, SharePoint, OneDrive) and Salesforce data
  • 16. Industry Collaboration via iRODS for Enhanced Research Data Management (https://irods.org)
    - Open-source data management software
    - Used in both university and commercial spaces for research data, e.g. Bayer uses it for genomics data
    - Includes support for multiple back-ends: S3, NFS; POSIX is the default
    - Can act as an S3 gateway
    - Initial deployments are often SAN based, using traditional virtualization techniques
    - In 2016 Bayer found challenges at petabyte scale without object stores: costs, D/R capability, recovery and backup
  • 17. Why use Object Storage for Research Data Management?
    - Highly scalable (from TB to PB)
    - Low price tag
    - Fully automated
    - Low administrative effort
    - Backup included by design
    - DR included by design (replication)
    - Data security features (fast rebuild, checksumming, self-healing, access control, encryption)
    - Appliance with vendor support
    - S3 interface support (de facto standard, with iRODS plugin available)
    - Performs well
    Based on Bayer’s findings: https://irods.org/uploads/2016/06/Bayer_iRODS-and-Objectstorage_UGM2016.pdf
  • 18. Current Data Management / Storage Options at UoW
    - StorageGRID has the potential to become a single, easy-to-use repository covering the majority of use cases for research data management
    - Current use cases don’t highlight:
      - Need for wide-area collaboration on large datasets
      - Data lifecycle and fine-grained data placement
      - Opportunity for exploratory analytics and cross-project data mining
      - Opportunities to commercialize or publish access to refined datasets
  • 19. Hands-on Exercise: How to make object stores friendly
  • 20. Workshop exercise 1: Build your own OneDrive equivalent (10 minutes) 1) Navigate to test.crankbird.com/nextcloud 2) Download the demo-1-xx-setup.txt file (it has the security and login info you'll need) 3) Install NextCloud onto a webserver using the instructions from that file 4) Enable the external storage plugin 5) Create your own S3 bucket using the external storage plugin 6) Copy a file into that bucket 7) Log into the AWS console using the instructions from the demo-1-xx-setup.txt file, head to S3 and look at the bucket, select the file (click the name, not the checkbox) and inspect Properties -> Metadata 8) EXTRA CREDIT: Try to access a pre-existing datastore (aka fun with s3fs)
  • 21. How to secure and share data (AKA: how to be nice without losing all your stuff)
  • 22. Versioning: Basics ▪ Allows multiple objects with the same key in the same bucket ▪ Cannot be disabled once enabled (only suspended) ▪ An object version needs to be explicitly deleted. [Diagram, three states of key object1: (a) versions 123 and 456 stored, object visible in bucket listing; (b) delete marker 789 placed on top of versions 123 and 456, object invisible in bucket listing; (c) delete marker removed, object visible in bucket listing again]
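The delete-marker behaviour on this slide can be sketched as a toy in-memory model. This is only an illustration of the semantics, not a real S3 client: the class `VersionedBucket` and its methods are hypothetical names invented for this example, and version IDs are simple counters rather than the opaque strings a real object store returns.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of S3 versioning semantics (hypothetical, not a client)."""

    def __init__(self):
        self._versions = {}              # key -> list of versions, oldest first
        self._ids = itertools.count(1)   # stand-in for opaque version IDs

    def put(self, key, body):
        """Store a new version on top of any existing versions of the key."""
        version_id = str(next(self._ids))
        self._versions.setdefault(key, []).append(
            {"VersionId": version_id, "Body": body, "IsDeleteMarker": False})
        return version_id

    def delete(self, key, version_id=None):
        """Without a version ID, stack a delete marker; with one, remove that version."""
        versions = self._versions.setdefault(key, [])
        if version_id is None:
            marker_id = str(next(self._ids))
            versions.append(
                {"VersionId": marker_id, "Body": None, "IsDeleteMarker": True})
            return marker_id
        self._versions[key] = [v for v in versions if v["VersionId"] != version_id]
        return version_id

    def list_objects(self):
        """Keys whose newest version is not a delete marker."""
        return sorted(key for key, versions in self._versions.items()
                      if versions and not versions[-1]["IsDeleteMarker"])


bucket = VersionedBucket()
bucket.put("object1", b"first")
bucket.put("object1", b"second")
marker = bucket.delete("object1")             # object disappears from the listing
hidden = bucket.list_objects()
bucket.delete("object1", version_id=marker)   # removing the marker brings it back
restored = bucket.list_objects()
```

A plain delete never destroys data in this model; only a delete that names a specific version does, which is why an object version "needs to be explicitly deleted".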
  • 23. Enabling Versioning: From the shell you can use the aws command line. https://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-versioning.html
      aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled
  • 24. Enabling Versioning: For Pro-gamers. Remember GenXers … don't try this without help from your millennials
      Python:
      # Bucket versioning should be disabled (returns 'None')
      s3.Bucket(bucket_name).Versioning().status
      # Enable versioning
      s3.Bucket(bucket_name).Versioning().enable()
      # Versioning should now be enabled (returns 'Enabled')
      s3.Bucket(bucket_name).Versioning().status
      PowerShell:
      # Bucket versioning should be disabled (returns nothing)
      Get-S3BucketVersioning -BucketName $BucketName
      # Enable versioning
      Enable-S3BucketVersioning -BucketName $BucketName
      # Versioning should now be enabled (returns 'Enabled')
      Get-S3BucketVersioning -BucketName $BucketName
  • 25. Workshop exercise 2 (5 minutes) 1) Using ssh from your computer, log into the server running the nextcloud websites 2) Enable bucket versioning 3) Delete one of the files in the bucket 4) Delete the "delete marker" 5) List the files again 6) See the file magically reappear
      ssh -i thetademo.pem user-1-xx@demo-1.crankbird.com
      aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled
      aws s3api delete-object --bucket my-bucket --key test.txt
      aws s3api list-object-versions --bucket my-bucket
      aws s3api delete-object --bucket my-bucket --version-id <version> --key test.txt
  • 26. Sharing Data with Pre-Signed URLs ▪ All objects are private by default. Only the object owner has permission to access them. However, the object owner can optionally share objects with others by creating a pre-signed URL, using their own security credentials, to grant time-limited permission to download the objects. ▪ A pre-signed URL can be accessed without credentials for a pre-determined period of time with a set of pre-determined actions (for example, object GET or PUT). ▪ Generating pre-signed URLs does NOT require a connection to the object storage.
      aws s3 presign s3://awsexamplebucket/test2.txt --expires-in 604800
      https://examplebucket.s3.amazonaws.com/test2.txt?AWSAccessKeyId=AKIAEXAMPLEACCESSKEY&Signature=EXHCcBe%EXAMPLEKnz3r8O0AgEXAMPLE&Expires=1556132848
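Why no connection is needed: a pre-signed URL is just an HMAC computed locally over the request details. The sketch below builds a URL in the older query-string form shown on this slide (`AWSAccessKeyId`, `Signature`, `Expires`), using only the Python standard library. It is a minimal illustration under stated assumptions: the function name `presign_get_v2`, the endpoint, and the credentials are made up for the example, and current tooling normally emits the newer `X-Amz-*` (Signature Version 4) form instead.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote


def presign_get_v2(endpoint, bucket, key, access_key, secret_key,
                   expires_in, now=None):
    """Build a legacy (Signature Version 2) pre-signed GET URL, entirely offline."""
    # Absolute expiry time in seconds since the epoch.
    expires = int(now if now is not None else time.time()) + expires_in
    resource = f"/{bucket}/{quote(key)}"
    # Verb, Content-MD5, Content-Type (both empty for a GET), expiry, resource.
    string_to_sign = f"GET\n\n\n{expires}\n{resource}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return (f"{endpoint}{resource}?AWSAccessKeyId={access_key}"
            f"&Signature={signature}&Expires={expires}")


# Placeholder endpoint and credentials; `now` is fixed only to make the
# example reproducible.
url = presign_get_v2("https://s3.example.com", "my-bucket", "cat.jpg",
                     "EXAMPLEACCESSKEY", "example-secret-key",
                     expires_in=3600, now=1000)
```

Anyone holding the URL can fetch the object until the expiry time, so the lifetime should be kept as short as the use case allows.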
  • 27. Workshop exercise 3 (5 minutes) 1) Using the aws command line utility, create a pre-signed URL for an object in your bucket (preferably a picture of a cat) 2) Copy that URL and paste it into a web browser (or email it to a colleague and get them to try it) 3) See your cat picture
  • 28. Multi-organization data sharing: This is some kind of next level …
  • 29. Multi-Organisational Data Sharing ▪ Unix (NFS) and Windows (SMB) rely on authentication mechanisms that remain simple because they work within a single security domain, typically defined by LDAP / Active Directory ▪ Access to data from multiple security contexts usually requires enabling anonymous access or opening up inter-forest trust relationships. Security teams rarely allow this level of trust between institutions ▪ S3-compatible object stores have richer and more flexible authentication and access permission schemes, defined by JSON-formatted rules, that allow completely different security contexts to safely share objects and buckets ▪ Pre-signed URLs, which provide temporary access to anyone, are an inherent capability
  • 30. S3 Request Authentication. An S3 request passes four checks before it reaches the object: 1) AccessKey lookup: FAIL if the AccessKey cannot be found 2) Signature validation: FAIL if the request signature does not match the calculated signature 3) Group Policy validation: DENY if an explicit DENY has been specified 4) Bucket Policy validation: DENY if an explicit DENY has been specified, or if no ALLOW is specified in the Bucket or Group Policy (implicit DENY). Bucket Policies follow the Access Policy Language definition. The AWS Policy Generator can be used to create policies online
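The policy part of that decision chain (explicit DENY wins, otherwise an ALLOW is required, otherwise implicit DENY) can be sketched in a few lines. This is a simplified illustration only: the function `evaluate` is a hypothetical name, principals and conditions are ignored, and shell-style wildcard matching stands in for the real policy language's matching rules.

```python
from fnmatch import fnmatchcase


def _matches(statement, action, resource):
    """True if the statement covers this action on this resource."""
    return (any(fnmatchcase(action, pattern) for pattern in statement["Action"])
            and any(fnmatchcase(resource, pattern) for pattern in statement["Resource"]))


def evaluate(group_policy, bucket_policy, action, resource):
    """Combine group and bucket policy statements into a single decision."""
    matching = [s for s in group_policy + bucket_policy
                if _matches(s, action, resource)]
    if any(s["Effect"] == "Deny" for s in matching):
        return "DENY (explicit)"
    if any(s["Effect"] == "Allow" for s in matching):
        return "ALLOW"
    return "DENY (implicit)"


# Illustrative policies: the group may do anything in shared-bucket,
# but the bucket policy forbids deletes.
group = [{"Effect": "Allow", "Action": ["s3:*"],
          "Resource": ["urn:sgws:s3:::shared-bucket/*"]}]
bucket = [{"Effect": "Deny", "Action": ["s3:DeleteObject"],
           "Resource": ["urn:sgws:s3:::shared-bucket/*"]}]

read_result = evaluate(group, bucket, "s3:GetObject",
                       "urn:sgws:s3:::shared-bucket/a.txt")
delete_result = evaluate(group, bucket, "s3:DeleteObject",
                         "urn:sgws:s3:::shared-bucket/a.txt")
other_result = evaluate(group, bucket, "s3:GetObject",
                        "urn:sgws:s3:::private-bucket/a.txt")
```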
  • 31. Workshop exercise 4 (5 minutes) 1) Open up https://awspolicygen.s3.amazonaws.com/policygen.html in a web browser 2) Try to create a policy template that looks a bit like this:
      $ cat data_sharing_policy.json
      {
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": { "SGWS": "17468297853691494465" },
            "Action": [ "s3:GetObject", "s3:ListBucket" ],
            "Resource": [ "urn:sgws:s3:::shared-bucket", "urn:sgws:s3:::shared-bucket/*" ]
          }
        ]
      }
      Callouts: "Principal" is the tenant we want to delegate access to; "Action" lists the allowed actions; "Resource" names the "shared" bucket
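If you would rather generate the policy file than click through the generator, a small script can produce the same JSON. This is a sketch, not a NetApp tool: the helper `sharing_policy` is a hypothetical name, and the `urn:sgws:s3:::` resource prefix is an assumption taken from the Bucket Policies slide later in this deck.

```python
import json


def sharing_policy(tenant_id, bucket,
                   actions=("s3:GetObject", "s3:ListBucket")):
    """Build a read-only sharing policy for another tenant as a plain dict."""
    return {
        "Statement": [{
            "Effect": "Allow",
            # Tenant account we want to delegate access to.
            "Principal": {"SGWS": tenant_id},
            "Action": list(actions),
            # The bucket itself (needed for ListBucket) and every object in it.
            "Resource": [f"urn:sgws:s3:::{bucket}",
                         f"urn:sgws:s3:::{bucket}/*"],
        }]
    }


policy_json = json.dumps(
    sharing_policy("17468297853691494465", "shared-bucket"), indent=2)
```

Writing `policy_json` to `data_sharing_policy.json` gives a file you can attach to the bucket with whichever client you use.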
  • 32. How to enrich data for analysis
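  • 33. Artificial Intelligence Timeline [Figure: timeline from 1956 to 2018 with markers at 1974, 1980, 1987 and 1993, two "AI Winter" periods, then Deep Learning and Deep Reinforcement Learning; inset chart of performance against data for statistical, machine learning and deep learning approaches]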
  • 34. Why Metadata Becomes More Important Over Time ▪ Exploratory analytics is probably the highest-value thing you can do with your data lake ▪ You do feature extraction / assignment when the data is new, for real-time analytics ▪ Features are metadata ▪ You can't easily do feature extraction across petabytes of cold data ▪ It's pretty much impossible to do it on tape © 2018 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use Only
  • 35. International Co-Operation and Hybrid Cloud Data Pipelines with StorageGRID Webscale Platform Services. Sites: StorageGRID Site 1 San Francisco, Site 2 New York, Site 3 Munich, Site 4 Tokyo. Pipelines: mirror my bucket from Webscale to S3 and run a Hive script on Elastic MapReduce (Amazon S3, Amazon EMR); notify my Lambda function to run facial recognition on images in my SG bucket (Amazon SNS, Amazon Lambda, Amazon Rekognition); search object metadata in my SG bucket as it is updated (Amazon Elasticsearch)
  • 36. Metadata ▪ Objects can have up to 2 KB of metadata in the form of key-value pairs ▪ Case insensitive; HTTP-header-compatible ASCII characters allowed ▪ Metadata is an integral part of the object ▪ Metadata can only be changed by copying the object to itself. Metadata can trigger ILM policy.
      PowerShell:
      Write-S3Object -BucketName $BucketName -Key 'my-object' -Content 'Hello World!' -Metadata @{company='NetApp';location='Germany'}
      Get-S3ObjectMetadata -BucketName $BucketName -Key 'my-object'
      Python:
      metadata = {'company': 'NetApp', 'location': 'Germany'}
      obj = s3.Object(bucket_name, 'my-object')
      obj.put(Body='Hello World!', Metadata=metadata)
      s3.Object(bucket_name, 'my-object').get().get('Metadata')
  • 37. Object Tagging ▪ Similar to metadata, but more flexible ▪ Can be added or updated independently of object data ▪ Up to 10 unique tags per object ▪ 128 characters for key and 256 for value, case sensitive! Tags can trigger ILM policy.
      PowerShell:
      Set-S3ObjectTagging -BucketName $BucketName -Key 'my-object' -Tags @{Name='projectId';Value='12345'},@{Name='company';Value='NetApp'}
      Get-S3ObjectTagging -BucketName $BucketName -Key 'my-object'
      Remove-S3ObjectTagging -BucketName $BucketName -Key 'my-object'
      Python:
      obj = s3.Object(bucket_name, 'my-object')
      tags = 'projectId=12345&company=NetApp'
      obj.put(Body='Hello World!', Tagging=tags)
      # boto3 client methods take keyword arguments only
      client.get_object_tagging(Bucket=bucket_name, Key='my-object')
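The tag limits on this slide are easy to trip over when tags are assembled programmatically, so it can help to check them before the upload. The helper below is a hypothetical sketch (the name `build_tagging` is not part of any SDK): it enforces the limits quoted above and returns the URL-encoded `key=value&key=value` string used as the `Tagging` value in the Python example.

```python
from urllib.parse import urlencode

MAX_TAGS = 10          # tags per object
MAX_KEY_LEN = 128      # characters per tag key
MAX_VALUE_LEN = 256    # characters per tag value


def build_tagging(tags):
    """Validate a dict of tags and return it as a URL-encoded tagging string."""
    if len(tags) > MAX_TAGS:
        raise ValueError(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for key, value in tags.items():
        if not key or len(key) > MAX_KEY_LEN:
            raise ValueError(f"tag key must be 1..{MAX_KEY_LEN} characters: {key!r}")
        if len(value) > MAX_VALUE_LEN:
            raise ValueError(f"tag value longer than {MAX_VALUE_LEN} characters: {key!r}")
    return urlencode(tags)


tagging = build_tagging({"projectId": "12345", "company": "NetApp"})
```

Because tag keys are case sensitive, `projectId` and `projectid` would count as two different tags against the limit of 10.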
  • 38. Applications of Elasticsearch. Search engine: use as a search engine for your object store. Data visualization: Kibana dashboard to visualize the data based on metadata fields. Data cataloging: keep track of all the data in your data lake for a single source of truth about your data. Training data analytics: analyze the data to ensure a representative training dataset for ML/AI models. Bucket monitoring: track how much data is stored and usage stats
  • 39. Kibana Data Visualization
  • 40. Use Case: High Performance Data Lake for Data Analytics. Edge: Docker-based StorageGRID running on a bare-metal ruggedized server with local SSDs (single copy). Core: >100 PB data lake built on StorageGRID appliances (erasure coding for increased availability and durability). Cloud: usage of cloud services, cloud bursting. Edge to core and core to cloud: automatic replication using S3-based CloudMirror with guaranteed data delivery. Clients: Windows client (HiL/SiL) mounts the data lake as a Windows drive using CloudBerry Drive; Linux client (HiL/SiL) mounts the data lake as a local mountpoint using s3fs-fuse; S3-enabled application (HiL/SiL). Throughput: multiple PB ingest/day, >100 GByte/s retrieval
  • 41. Data Improvement / Refinement at Australian Universities
  • 42. Monetising your data: Requestor Pays Buckets
  • 43. Workshop exercise 5 (15 minutes) 1) Log into the AWS console 2) Select Lambda (under Compute) 3) Create Function -> Use Blueprint -> rekognition-python -> Configure 4) Name -> Use existing role -> UberLambda 5) Choose your bucket from the dropdown 6) Enable trigger -> Create function 7) Open Designer -> Add events for your bucket (PUT and POST) 8) Copy in the code from S3Recog.py 9) Add a new picture into the bucket from nextcloud 10) Inspect the new metadata from the S3 interface
  • 44. Extra Credit: Building hybrid cloud data analytics workflows using NetApp DataFabric Technologies
  • 45. A Vision for an Australian Research Data Network ▪ A single multi-tenant StorageGRID as a secure collaborative data resource between Government, Defense, and leading Research Universities ▪ Data stays under full custodianship of Australian citizens with policy-based placement, control and role-based access ▪ Local access to hot data with throughput measured in gigabytes/sec ▪ Cold data distributed at lowest cost ▪ Leverages the HPC and Hadoop analytics ecosystem in private, public and hybrid cloud deployments. Scale: 120 PB capacity, 100B objects, 16 sites
  • 46. Cloud Sync: NFS / SMB to Object, in Cloud and On Premises ▪ Cloud Sync is a service ▪ From https://cloud.netapp.com ▪ Create your relationship ▪ Identify the NFS / CIFS server ▪ Launch a Data Broker ▪ Select your S3 bucket ▪ Source, target and the Data Broker may be in the cloud or on-premises. [Diagram: NFS / CIFS source, Data Broker, AWS S3 target]
  • 47. StorageGRID User-Driven Platform Services: Empower Users to Implement Hybrid Cloud Data Pipelines with Amazon Web Services. Cloud Mirroring: replicate log objects with prefix sgws://bucket_name/logs/2017-08* to s3://us-east-1.bucket_name to be analyzed in Elastic MapReduce. Bucket Event Notifications: when new objects with prefix sgws://bucket_name/images/* are created, notify my Lambda function to copy the image into AWS S3 and detect features in the scene. Search Integration: update my Elasticsearch index when new object metadata is written or updated for objects matching prefix sgws://bucket_name/images/*. [Diagram labels: SG, Amazon EMR, Amazon S3, Amazon SNS, Amazon Lambda, Amazon Rekognition]
  • 48. Thank You.
  • 49. Pre-Signed URLs
      Python:
      url = client.generate_presigned_url('get_object', {'Bucket': bucket_name, 'Key': 'my-key'}, ExpiresIn=3600)
      Resulting URL:
      https://webscaledemo.netapp.com/mybucket/my-key?Action=GET&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=IZUDAQ5NQ39AUJ6HR3Q4%2F20181220%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20181220T162225Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=4a5981c70bb69e8d103cd37f1f5985784979fe95b5d17e202860d778dcb22b7c
      PowerShell:
      Get-S3PresignedUrl -Method 'GET' -BucketName $BucketName -Key 'my-key' -Expires (Get-Date).AddHours(1)
  • 50. Persisting S3 Credentials ▪ Credentials need to be stored locally in ~/.aws/credentials (Linux) or $HOME/.aws/credentials (Windows) or in a properties file, but never in code ▪ Config parameters can be stored in ~/.aws/config (Linux) or $HOME/.aws/config (Windows) or in a properties file ▪ Create an AWS profile ▪ via PowerShell:
      New-AwsProfile -ProfileName 'default' -AccessKey 'FAHKA1WXDM5E7N4L1JP3' -SecretKey '1iqbkq+yECyIFXbSnpzdeF8zVcWDm/u6DOgxNzHW' -EndpointUrl 'https://webscaledemo.netapp.com'
      ▪ manually, by editing $HOME/.aws/credentials or ~/.aws/credentials:
      [default]
      aws_access_key_id = FAHKA1WXDM5E7N4L1JP3
      aws_secret_access_key = 1iqbkq+yECyIFXbSnpzdeF8zVcWDm/u6DOgxNzHW
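The credentials file is plain INI, so a script that cannot use an SDK can still read a profile from it rather than hard-coding keys. The following is a minimal sketch using only the standard library; the function `load_profile` and the sample values are made up for illustration, and SDKs such as boto3 already read this file for you.

```python
import configparser

# Sample file contents with obviously fake values.
SAMPLE_CREDENTIALS = """\
[default]
aws_access_key_id = EXAMPLEACCESSKEYID
aws_secret_access_key = example/secret+key%value
"""


def load_profile(text, profile="default"):
    """Return (access_key, secret_key) for one profile of a credentials file."""
    # interpolation=None: secret keys may contain '%', which the default
    # interpolation would try to expand.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser[profile]
    return section["aws_access_key_id"], section["aws_secret_access_key"]


access_key, secret_key = load_profile(SAMPLE_CREDENTIALS)
```

To read the real file, pass the contents of `~/.aws/credentials` (for example via `pathlib.Path.home() / ".aws" / "credentials"`) instead of the sample string.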
  • 51. How to handle custom metadata ▪ As object metadata is an integral part of the object and is limited to 2 KB, it should be used to describe how and where the object should be stored, e.g. for ▪ Service Level Agreements ▪ Locality or geo-fencing ▪ Importance ▪ Confidentiality ▪ As object tags can be added and removed, but are limited to 10 tags, they should be used to group objects, e.g. by ▪ cost center ▪ project
  • 52. Bucket Policies
      Python:
      # Reference Bucket Policy
      bucket_policy = s3.BucketPolicy(bucket_name)
      # Load Policy from local file and attach it to bucket
      with open('/path/to/policy.json', 'r') as f:
          bucket_policy.put(Policy=f.read())
      # Detach current Bucket Policy from bucket
      bucket_policy.delete()
      PowerShell:
      # Create Bucket Policy
      $BucketPolicy = New-AwsPolicy -Principal @{SGWS='17468297853691494465'} -Action "s3:GetObject","s3:ListBucket" -Resource "urn:sgws:s3:::shared-bucket","urn:sgws:s3:::shared-bucket/*"
      Set-S3BucketPolicy -BucketName $BucketName -Policy $BucketPolicy
      Remove-S3BucketPolicy -BucketName $BucketName
  • 53. Pre-Signed URLs ▪ A pre-signed URL can be accessed without credentials for a pre-determined period of time with a set of pre-determined actions (for example, object GET or PUT) ▪ Generating pre-signed URLs does NOT require a connection to the object storage
      Python:
      url = client.generate_presigned_url('get_object', {'Bucket': bucket_name, 'Key': 'my-key'}, ExpiresIn=3600)
      Resulting URL:
      https://webscaledemo.netapp.com/mybucket/my-key?Action=GET&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=IZUDAQ5NQ39AUJ6HR3Q4%2F20181220%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20181220T162225Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=4a5981c70bb69e8d103cd37f1f5985784979fe95b5d17e202860d778dcb22b7c
      PowerShell:
      Get-S3PresignedUrl -Method 'GET' -BucketName $BucketName -Key 'my-key' -Expires (Get-Date).AddHours(1)
  • 54. Pre-Signed URLs: Example ▪ Use case: photo-sharing app on a mobile phone ▪ Bad design: put S3 credentials in the phone app to upload photos to S3 ▪ Better design: use pre-signed URLs ▪ S3 credentials are only required on the (secured) server side, not inside the phone app ▪ Traffic does not need to be routed through backend servers. Flow between the phone frontend app and the backend app on the servers: 1) Request photo upload 2) Generate pre-signed URL (in backend) 3) Return pre-signed URL for photo upload 4) Upload photo via pre-signed URL
  • 55. S3 Best Practices: Bucket Listing ▪ Listing the objects of a bucket with millions of objects is a very expensive operation ▪ Metadata needs to be fetched from all storage nodes ▪ Can result in high CPU utilization due to high load on the metadata database ▪ Bucket event notifications ▪ SNS notifications can be a much better approach in many cases ▪ Available since NetApp® StorageGRID® 11.0 ▪ Example: when a new object is ingested with prefix sgws://bucket_name/images/*, notify my application to analyze the object and update its metadata ▪ Try it out with localstack SNS: https://github.com/localstack/localstack [Diagram: SG, Amazon SNS, SNS receiver application, steps 1 to 5]
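On the receiving side, the application gets each notification as a JSON document instead of polling with bucket listings. The sketch below shows how a receiver might pick out newly created objects under a prefix. It assumes the AWS-style layout, where the SNS envelope carries the S3 event as a JSON string in its `Message` field and object keys are URL-encoded; check the payload your own grid or localstack setup actually delivers. The function name `created_objects` is hypothetical.

```python
import json
from urllib.parse import unquote_plus


def created_objects(sns_body, prefix="images/"):
    """Yield (bucket, key) for each object-created record under the prefix."""
    envelope = json.loads(sns_body)
    # The S3 event itself travels as a JSON string inside the SNS envelope.
    event = json.loads(envelope["Message"])
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        # Keys are URL-encoded in AWS-style event payloads.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(prefix):
            yield record["s3"]["bucket"]["name"], key


# Hand-written sample notification, for illustration only.
sample_event = {"Records": [
    {"eventName": "ObjectCreated:Put",
     "s3": {"bucket": {"name": "bucket_name"},
            "object": {"key": "images/cat+1.jpg"}}},
    {"eventName": "ObjectCreated:Put",
     "s3": {"bucket": {"name": "bucket_name"},
            "object": {"key": "logs/2017-08-01.log"}}},
]}
sample_body = json.dumps({"Type": "Notification",
                          "Message": json.dumps(sample_event)})
to_analyze = list(created_objects(sample_body))
```

Each returned pair can then be handed to whatever analysis step updates the object's metadata or tags.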