Deep Dive on Archiving and Compliance

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Deep Dive on Archiving and Compliance with Amazon Glacier
Henry Zhang
Senior Product Manager, Amazon Glacier

Cloud Data Migration
• Direct Connect
• Snow* data transport family
• 3rd Party Connectors
• Transfer Acceleration
• Storage Gateway
• Kinesis Firehose

The AWS Storage Portfolio
• Object: Amazon S3, Amazon Glacier
• Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
• File: Amazon EFS

Satellite Image Archive
• DigitalGlobe captures satellite imagery of the Earth
• 100 PB image library = 6 billion square kilometers
• 1 PB of new imagery every year
• Images to be archived and retained for decades

Patient data–Philips Healthcare
• HealthSuite digital platform powered by AWS
• 15 petabytes of patient data
• Archived for decades (beyond the lifetime of patients)
• Uses AWS HIPAA-eligible services in the BAA

Public sector–King County
• Most populous county in Washington state
• Replaced tape solution for backup from 17 agencies
• Meets compliance requirements
• Saved $1MM in first year; no more tape refresh or management churn

Archive: data retained for the long term, for compliance or potential future reference
Data archiving needs are growing everywhere
• Media assets, 4K, 8K
• Health care/life sciences
• Financial services
• Regulated industries
• Oil and gas/geospatial
• Digital preservation
• Long-term backups
• Logs

Consideration 1 – Total Archive Cost

Traditional archiving approaches
• Tape libraries, robots, drives, media
• Onsite (online and offline)
• Offsite tape out/vaulting
• Specialized software and personnel
• Tape refresh every 3-5 years

How can AWS help with your archival?
• Metered usage: pay as you go
• No capital investment
• No commitment
• No risky capacity planning
• Avoid risks of physical media handling
• Control your geographic locality for performance and compliance

Storage pricing - pay only for what you use
• 1 PB raw storage
• 800 TB usable storage
• 600 TB allocated storage
• 400 TB application data
AWS Cloud Storage
• Amazon Glacier starts at $0.004/GB/month
• Price dropped by 43% on 11/21/2016
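As a rough illustration of metered pricing, the sketch below compares paying for the 400 TB of application data actually stored against paying for the full 1 PB of raw capacity, both at the $0.004/GB/month rate quoted on the slide. The function name and the decimal conversion (1 TB = 1,000 GB) are assumptions made for this example, and request, retrieval, and transfer charges are ignored.

```python
GLACIER_PRICE_PER_GB_MONTH = 0.004  # rate quoted on the slide
GB_PER_TB = 1000                    # assumption: decimal units


def monthly_storage_cost(terabytes, price_per_gb_month=GLACIER_PRICE_PER_GB_MONTH):
    """Return the monthly storage charge in dollars for a given volume."""
    return terabytes * GB_PER_TB * price_per_gb_month


# Pay only for the application data that is actually stored...
application_data_cost = monthly_storage_cost(400)
# ...instead of for the raw capacity a tape or disk system must provision.
raw_capacity_cost = monthly_storage_cost(1000)

print(f"400 TB stored:    ${application_data_cost:,.2f}/month")
print(f"1 PB provisioned: ${raw_capacity_cost:,.2f}/month")
```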

Consideration 2 – Durability

Durability for long-term preservation
• 99.999999999% durability
• Built-in fixity checking
• Automatic recovery
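The service-side fixity checks are internal to Amazon Glacier, but the same idea can be applied on the client. The Glacier API documents a SHA-256 tree hash (computed over 1 MiB chunks) as its checksum format, so a minimal sketch of a client-side fixity check might look like the following. The function names are illustrative and are not part of any AWS SDK.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB leaves, as documented for Glacier tree hashes


def tree_hash(data: bytes) -> str:
    """Compute a SHA-256 tree hash: hash each 1 MiB chunk, then combine pairwise."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # Parent digest is the hash of the two concatenated child digests.
                next_level.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                # An unpaired digest is promoted to the next level unchanged.
                next_level.append(level[i])
        level = next_level
    return level[0].hex()


def verify_fixity(data: bytes, expected_tree_hash: str) -> bool:
    """Return True if the data still matches the checksum recorded at upload."""
    return tree_hash(data) == expected_tree_hash
```

Recording the tree hash at upload time and re-checking it after each retrieval gives an end-to-end integrity check that does not depend on the storage layer.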

Consideration 3 – Accessibility

Accessing Amazon Glacier
1. Direct Amazon Glacier API/SDK
2. Amazon S3 lifecycle integration
3. Third-party tools and gateways (e.g., FastGlacier)

Amazon Glacier – Direct access/APIs
Data upload:
1. Create vault
2. Configure access
3. Upload archives
4. Register archive ID
Data retrieval:
1. Initiate retrieval
2. Async retrieval completion
3. Completion notification
4. Download data
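The upload and retrieval steps above can be sketched with the boto3 Glacier client. This is a minimal outline, assuming `glacier` is an object created with `boto3.client("glacier")`; the helper names and the polling interval are illustrative. Polling with `describe_job` stands in for the completion notification, which in practice can be delivered to an SNS topic named in the job parameters. Access configuration (vault access policies) is left out.

```python
import time

ACCOUNT_ID = "-"  # "-" tells Glacier to use the account that owns the credentials


def upload_to_vault(glacier, vault_name, data, description=""):
    """Create the vault if needed, upload one archive, and return its archive ID."""
    glacier.create_vault(accountId=ACCOUNT_ID, vaultName=vault_name)
    response = glacier.upload_archive(
        accountId=ACCOUNT_ID,
        vaultName=vault_name,
        archiveDescription=description,
        body=data,
    )
    # The archive ID is the only handle to the data, so the caller must record it.
    return response["archiveId"]


def retrieve_from_vault(glacier, vault_name, archive_id, poll_seconds=900):
    """Start an asynchronous retrieval job, wait for it, and download the bytes."""
    job = glacier.initiate_job(
        accountId=ACCOUNT_ID,
        vaultName=vault_name,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )
    job_id = job["jobId"]
    # Retrieval is asynchronous: poll until the job reports completion.
    while not glacier.describe_job(
        accountId=ACCOUNT_ID, vaultName=vault_name, jobId=job_id
    )["Completed"]:
        time.sleep(poll_seconds)
    output = glacier.get_job_output(
        accountId=ACCOUNT_ID, vaultName=vault_name, jobId=job_id
    )
    return output["body"].read()
```

Passing the client in as a parameter keeps the helpers easy to exercise against a stub before pointing them at a real vault.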

Use Glacier via S3 Object Lifecycle
• S3 Standard: active data, synchronous access, $0.023/GB/mo.
• S3 - Infrequent Access: infrequently accessed data, synchronous access, $0.0125/GB/mo.
• Amazon Glacier: archive data, async access, $0.004/GB/mo.

Data lifecycle management
• Transition Standard to Standard-IA
• Transition Standard-IA to Amazon Glacier
• Transition based on object tags
• Expiration and versioning
[Chart: data access frequency over time, from T to T + 365 days]
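The lifecycle actions listed above map onto a single S3 lifecycle rule. The sketch below builds a configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the day counts, the rule ID, and the tag key and value are hypothetical choices for illustration, not values from the presentation.

```python
def build_lifecycle_configuration(
    ia_after_days=30,
    glacier_after_days=90,
    expire_after_days=3650,
    tag_key="archive",   # hypothetical tag used to select objects
    tag_value="true",
):
    """Return a lifecycle configuration: Standard -> Standard-IA -> Glacier -> expire."""
    # S3 requires objects to spend at least 30 days in Standard before moving to IA.
    if ia_after_days < 30:
        raise ValueError("Standard-IA transition needs at least 30 days")
    # Conservative check: leave at least 30 days in Standard-IA before Glacier.
    if glacier_after_days < ia_after_days + 30:
        raise ValueError("Glacier transition should come 30+ days after Standard-IA")
    if expire_after_days <= glacier_after_days:
        raise ValueError("expiration must come after the Glacier transition")
    return {
        "Rules": [
            {
                "ID": "tiered-archive",
                "Status": "Enabled",
                # Tag-based filter: only objects carrying this tag are transitioned.
                "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},
                # On versioned buckets, clean up old versions as well.
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    }


# Usage (assumes s3 = boto3.client("s3") and an existing bucket name):
# s3.put_bucket_lifecycle_configuration(
#     Bucket="example-bucket",
#     LifecycleConfiguration=build_lifecycle_configuration(),
# )
```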

Transition older videos to Standard-IA

Save money on storage
• About 45% saving over S3 Standard by moving to S3 Standard-IA
• 68% saving over S3 Standard-IA by moving to Amazon Glacier
* Assumes the highest public pricing tier
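The two percentages can be checked against the per-GB prices quoted earlier in the deck ($0.023, $0.0125, and $0.004 per GB per month). A small worked calculation, with an illustrative helper name:

```python
PRICES_PER_GB_MONTH = {
    "S3 Standard": 0.023,
    "S3 Standard-IA": 0.0125,
    "Amazon Glacier": 0.004,
}


def percent_saving(from_price, to_price):
    """Percentage saved per GB-month when moving data to a cheaper storage class."""
    return (1 - to_price / from_price) * 100


standard_to_ia = percent_saving(
    PRICES_PER_GB_MONTH["S3 Standard"], PRICES_PER_GB_MONTH["S3 Standard-IA"]
)
ia_to_glacier = percent_saving(
    PRICES_PER_GB_MONTH["S3 Standard-IA"], PRICES_PER_GB_MONTH["Amazon Glacier"]
)

print(f"Standard -> Standard-IA: {standard_to_ia:.1f}% saving")
print(f"Standard-IA -> Glacier:  {ia_to_glacier:.1f}% saving")
```

On these prices, the roughly 45% figure corresponds to the move from S3 Standard to Standard-IA, and the 68% figure to the move from Standard-IA to Amazon Glacier.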

Amazon Glacier – Third-party tools and gateways
• Consumer grade: less than $50
• Example: CloudBerry, FastGlacier, Arq (Haystack Software)
• Small / medium business: $500 - $1,000
• Example: Synology, Veeam, QNAP
• Enterprise gateway and data management software
• Example: NetApp AltaVault, Commvault, StorNext, StorReduce, Vidispine

Which option should I choose?
• Use S3 lifecycle managed Amazon Glacier if the S3 object keys are sufficient for index/search capability
• Use Amazon Glacier directly if you already plan to store more metadata/indices in a database
• Use 3rd party tools to minimize coding

Amazon Glacier – Data Retrieval Tiers
Expedited Retrieval
• Emergency access
• 1-5 minutes
• Last-minute play-out schedule swap
• $0.03/GB
Standard Retrieval
• Current model
• 3-5 hours
• Disaster recovery
• $0.01/GB
Bulk Retrieval
• Batch/bulk access
• 5-12 hours
• PB-scale re-transcoding or video/image analysis
• $0.0025/GB
Use cases: on-site tape replacement, off-site tape replacement
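One way to reason about the tiers is to pick the cheapest one whose worst-case retrieval time still meets a deadline. The sketch below does that using the time ranges and per-GB prices from the slide, with prices matched to tiers on the assumption that faster access costs more (Expedited $0.03, Standard $0.01, Bulk $0.0025). Per-request fees are ignored and the function names are illustrative.

```python
# (tier name, worst-case retrieval time in hours, price per GB), cheapest first.
RETRIEVAL_TIERS = [
    ("Bulk", 12.0, 0.0025),
    ("Standard", 5.0, 0.01),
    ("Expedited", 5 / 60, 0.03),
]


def choose_tier(deadline_hours):
    """Return (tier, price_per_gb) for the cheapest tier that meets the deadline."""
    for name, worst_case_hours, price_per_gb in RETRIEVAL_TIERS:
        if worst_case_hours <= deadline_hours:
            return name, price_per_gb
    raise ValueError("no retrieval tier can meet a deadline this short")


def retrieval_cost(gigabytes, deadline_hours):
    """Return (tier, dollars) for retrieving the given volume within the deadline."""
    name, price_per_gb = choose_tier(deadline_hours)
    return name, gigabytes * price_per_gb


for deadline in (24, 6, 0.5):
    tier, cost = retrieval_cost(1000, deadline)
    print(f"deadline {deadline} h -> {tier}, ${cost:.2f} per 1,000 GB")
```

The tier names match the values accepted by the `Tier` field of a Glacier retrieval job, so the chosen name could be passed straight into the job parameters.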

Video archives
• Media distribution backbone (Ve.nue platform)
• Over-the-top (OTT) broadcast service
• 20 PB of media assets, 1MM+ hours of high-res content
• Assets to be archived and retained for decades

Comprehensive media lifecycle
@SonyDADCNMS

“If physical deliveries can happen within one hour based on unpredictable requests, surely we are able to exceed such expectations digitally”

Our migration
The Challenge:
• Seamlessly migrate a platform that enables content delivery across all devices and more than 1,200 distribution points worldwide
• Store 20 petabytes of motion picture and television content
• Equating to 1,000,000+ hours of content
• Growing at ~1 petabyte every quarter
Desired Goals:
• One-hour delivery turnaround time
• Agile, scalable, predictable cost model and infrastructure
• Investing in innovation vs. hardware

On-premises asset storage workflow

AWS Cloud-based asset storage workflow

Amazon Glacier vs. on-premises cost comparison

Consideration 4 - Compliance

Compliance storage with Vault Lock
Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy
• Time-based retention
• MFA authentication
• Controls govern all records in a vault
• Immutable policy
• Two-step locking

Vault Lock for compliance storage
• Non-overwrite, non-erasable records
• Time-based retention with “ArchiveAgeInDays” control
• Policy lockdown (strong governance)
• Legal hold with vault-level tags
• Configure optional designated third-party access and grant temporary access
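The retention and legal-hold controls can be expressed as a Vault Lock policy and applied with the two-step lock. The sketch below is a minimal outline, assuming `glacier` is a boto3 Glacier client; the vault ARN, the `LegalHold` tag name, the statement IDs, and the helper names are hypothetical, and the `confirm` callback stands in for whatever review happens before the lock is made permanent.

```python
import json


def build_vault_lock_policy(vault_arn, retention_days):
    """Deny archive deletion until the retention period ends or while a hold is set."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Time-based retention using the ArchiveAgeInDays condition key.
                "Sid": "deny-delete-before-retention",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
                },
            },
            {
                # Legal hold: deletion stays blocked while the vault carries the tag.
                "Sid": "deny-delete-under-legal-hold",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {"StringLike": {"glacier:ResourceTag/LegalHold": "true"}},
            },
        ],
    }


def lock_vault(glacier, vault_name, policy, confirm):
    """Two-step lock: attach the policy in a trial state, then complete or abort."""
    response = glacier.initiate_vault_lock(
        accountId="-", vaultName=vault_name, policy={"Policy": json.dumps(policy)}
    )
    lock_id = response["lockId"]
    if confirm(lock_id):
        # After completion the policy can no longer be changed or removed.
        glacier.complete_vault_lock(accountId="-", vaultName=vault_name, lockId=lock_id)
        return True
    glacier.abort_vault_lock(accountId="-", vaultName=vault_name)
    return False
```

The trial state between the two calls is what makes the lock safe to adopt: the policy can be tested against real requests and aborted if it turns out to be wrong.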

Amazon Glacier received a third-party assessment
from Cohasset Associates on how Amazon Glacier
with Vault Lock can be used to meet the requirements
of SEC Rule 17a-4(f) and CFTC 1.31(b)-(c).

Proofpoint
• Cloud-based security and compliance for the enterprise:
threat research, email, mobile, social, digital risk
• Founded 2002, public in 2012
• $350M annual revenue, $3B market cap
• Big AWS user

Proofpoint SocialPatrol
Policy controls and enforcement for social
• Combats fraudulent brand impersonation
• Moderates content at scale
• Ensures compliance in publishing
• Integrates with social APIs
• 150+ classifiers using NLP and ML
• Text, links, images, metadata
• Ingesting >1M social posts per day
• Built in AWS

Proofpoint SocialPatrol Archive with Glacier
SEC Rule 17a-4(f)-compliant archive, purpose-built for social, enabled by Amazon Glacier and Vault Lock
[Architecture diagram: PFPT in AWS (Social, policy engine, MySQL/C*/Solr) with Amazon Glacier & Vault Lock]

Proofpoint SocialPatrol Archive
1. The customer specifies the retention period in Proofpoint Social.
2. Via AWS API we create a vault for that customer.
3. Via AWS API, we lock the vault, and specify policy to observe a legal hold via a tag.
4. As social content flows in, we record its purge date and surface that to the user. Each piece of social content is an archive in the vault.
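The purge-date bookkeeping and the tag-based legal hold described above can be sketched as follows. This is an illustration under stated assumptions, not Proofpoint's implementation: the helper names and the `LegalHold` tag key are hypothetical, and `glacier` is assumed to be a boto3 Glacier client.

```python
from datetime import date, timedelta


def purge_date(ingested_on, retention_days):
    """Earliest date on which an archive may be deleted."""
    return ingested_on + timedelta(days=retention_days)


def is_purgeable(ingested_on, retention_days, today, legal_hold=False):
    """An archive can be purged once retention has elapsed and no hold is active."""
    return not legal_hold and today >= purge_date(ingested_on, retention_days)


def set_legal_hold(glacier, vault_name):
    """Tag the vault so a locked policy keyed on the tag blocks deletions."""
    glacier.add_tags_to_vault(
        accountId="-", vaultName=vault_name, Tags={"LegalHold": "true"}
    )


# Example: a post ingested on 2017-01-01 with a 365-day retention period.
print(purge_date(date(2017, 1, 1), 365))
```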

aws.amazon.com/activate
Everything and Anything Startups
Need to Get Started on AWS
