© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Henry Zhang, Senior Product Manager, AWS
Rich Sutton, VP of Engineering, Digital Risk, Proofpoint
November 30, 2016
STG209
Strategic Planning for Long-Term
Data Archiving with Amazon Glacier
AWS storage maturity
File: Amazon EFS
Block: Amazon Elastic Block Store, Amazon EC2 Instance Store
Object: Amazon S3, Amazon Glacier
Data Transfer: AWS Direct Connect, AWS Snowball, ISV Connectors, Amazon Kinesis Firehose, Amazon S3 Transfer Acceleration, AWS Storage Gateway
Video archives
• Media distribution backbone (Ve.nue platform)
• Over-The-Top (OTT) broadcast service
• 20 PB of media assets, 800,000 hours of high-res content
• Assets to be archived and retained for decades
Patient data–Philips Healthcare
• HealthSuite digital platform powered by AWS
• 15 petabytes of patient data
• Archived for decades (beyond the lifetime of patients)
• Uses AWS HIPAA-eligible services in the BAA
Public sector–King County
• Most populous county in Washington state
• Replaced tape solution for backup from 17 agencies
• Meets compliance requirement
• Saved $1 million in the first year; no more tape refresh or management churn
Archive: Data retained for the long term, for compliance or potential future reference
Data archiving needs are growing everywhere
• Media assets, 4K, 8K
• Health care/life sciences
• Financial services
• Regulated industries
• Oil and gas/geospatial
• Digital preservation
• Long-term backups
• Logs
Consideration 1 – Total Archive Cost
Traditional archiving approaches
• Tape libraries, robots, drives, media
• Onsite (online and offline)
• Offsite tape out/vaulting
• Specialized software and personnel
• Tape refresh every 3-5 years
How can AWS help with your archival?
• Metered usage: pay as you go
• No capital investment
• No commitment
• No risky capacity planning
• Avoid risks of physical media handling
• Control your geographic locality for performance and compliance
1 PB raw storage
800 TB usable storage
600 TB allocated storage
400 TB application data
Storage pricing - pay only for what you use
AWS Cloud
Storage
Amazon Glacier starts at $0.004/GB/month
43% price drop announced on November 21, 2016
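To make the pay-only-for-what-you-use point concrete, the arithmetic below contrasts the capacity provisioned on premises with what would be billed in Amazon Glacier, using the figures from the slide. It is a rough sketch: it assumes decimal units (1 TB = 1,000 GB) and the $0.004/GB/month starting price, and it ignores request, retrieval, and transfer charges. The function and variable names are illustrative.

```python
GB_PER_TB = 1_000  # assumption: decimal units


def monthly_storage_cost(data_tb: float, price_per_gb_month: float = 0.004) -> float:
    """Monthly storage charge in USD for `data_tb` terabytes of stored data."""
    return data_tb * GB_PER_TB * price_per_gb_month


raw_tb = 1_000          # 1 PB of raw storage purchased on premises
application_tb = 400    # data the applications actually store

# Share of the purchased capacity that holds application data.
utilization = application_tb / raw_tb

print(f"On-premises utilization: {utilization:.0%}")
print(f"Glacier storage for {application_tb} TB: "
      f"${monthly_storage_cost(application_tb):,.2f} per month")
```

With metered pricing, only the 400 TB of application data is billed; the remaining 600 TB of raw capacity has no equivalent in the cloud bill.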
Consideration 2 – Durability
99.999999999% Durability
Durability for long-term preservation
Built-in Fixity Checking
Automatic recovery
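Glacier's fixity checking happens inside the service, so nothing needs to be coded for it. A client-side analogue can still be useful for validating data before upload and after retrieval. The sketch below shows that idea with a SHA-256 digest from Python's standard library; it is an illustration of the concept, not a description of how Glacier implements its checks, and the function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, expected_digest: str) -> bool:
    """True if `data` still matches the digest recorded at ingest time."""
    return sha256_hex(data) == expected_digest


original = b"archived media asset"
digest_at_ingest = sha256_hex(original)   # store this alongside your metadata

# Later, after retrieval: an unchanged copy passes, a damaged copy does not.
retrieved_ok = b"archived media asset"
retrieved_damaged = b"archived media asse7"
print(verify_fixity(retrieved_ok, digest_at_ingest))
print(verify_fixity(retrieved_damaged, digest_at_ingest))
```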
Consideration 3 – Accessibility
Amazon Glacier – Data Retrieval Tiers
Expedited Retrieval ($0.03/GB)
• Emergency access
• 1-5 minutes
• Last-minute play-out schedule swap
Standard Retrieval ($0.01/GB)
• Current model
• 3-5 hours
• Disaster recovery
Bulk Retrieval ($0.0025/GB)
• Batch/bulk access
• 5-12 hours
• PB-scale re-transcoding or video/image analysis
On-site tape replacement Off-site tape replacement
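The trade-off between retrieval speed and price can be expressed as a small selection routine: pick the cheapest tier whose completion window fits the deadline. The sketch below assumes the three per-GB prices on the slide map to Expedited, Standard, and Bulk in that order, uses the upper end of each quoted time range, and ignores per-request fees. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetrievalTier:
    name: str
    max_hours: float      # upper end of the quoted completion window
    price_per_gb: float   # USD per GB retrieved


TIERS = [
    RetrievalTier("Expedited", 5 / 60, 0.03),   # 1-5 minutes
    RetrievalTier("Standard", 5.0, 0.01),       # 3-5 hours
    RetrievalTier("Bulk", 12.0, 0.0025),        # 5-12 hours
]


def cheapest_tier(deadline_hours: float) -> Optional[RetrievalTier]:
    """Cheapest tier expected to finish within `deadline_hours`, if any."""
    candidates = [t for t in TIERS if t.max_hours <= deadline_hours]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.price_per_gb)


def retrieval_cost(tier: RetrievalTier, size_gb: float) -> float:
    """Per-GB retrieval charge in USD (request fees not included)."""
    return tier.price_per_gb * size_gb


for deadline in (1, 6, 24):
    tier = cheapest_tier(deadline)
    print(f"Deadline {deadline} h: {tier.name}, "
          f"${retrieval_cost(tier, 1_000):,.2f} per 1,000 GB")
```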
Consideration 4 - Application & Data Management
Amazon Glacier – 3 ways to Access
• Direct Glacier API/SDK
• S3 lifecycle integration
• Third-party tools and gateways
Amazon Glacier – Direct access/APIs
Data Upload: Create Vault -> Configure Access -> Upload Archives -> Register Archive ID
Data Retrieval: Initiate Retrieval -> Async Retrieval Completion -> Completion Notification -> Download Data
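The upload and retrieval flows above can be sketched with the AWS SDK for Python (boto3). This is a minimal illustration, not a production pattern: the `glacier` argument is expected to be a `boto3.client("glacier")` object, the access-configuration step is omitted, and polling `describe_job` stands in for the completion notification, which in practice would usually arrive through an Amazon SNS topic. The function names are hypothetical.

```python
import time


def archive_and_register(glacier, vault_name: str, payload: bytes,
                         description: str) -> str:
    """Create the vault, upload one archive, and return its archive ID.

    The caller must record the returned ID (for example in a database),
    because it is the only handle for retrieving the archive later.
    """
    glacier.create_vault(vaultName=vault_name)
    response = glacier.upload_archive(
        vaultName=vault_name,
        archiveDescription=description,
        body=payload,
    )
    return response["archiveId"]


def retrieve(glacier, vault_name: str, archive_id: str,
             tier: str = "Standard", poll_seconds: float = 900) -> bytes:
    """Start an asynchronous retrieval job, wait for it, download the data."""
    job = glacier.initiate_job(
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,  # "Expedited", "Standard", or "Bulk"
        },
    )
    job_id = job["jobId"]

    # Simplification: poll until the job completes instead of waiting
    # for a notification.
    while not glacier.describe_job(vaultName=vault_name, jobId=job_id)["Completed"]:
        time.sleep(poll_seconds)

    output = glacier.get_job_output(vaultName=vault_name, jobId=job_id)
    return output["body"].read()


# Usage (requires boto3 and AWS credentials; the vault name is an example):
#   import boto3
#   glacier = boto3.client("glacier")
#   archive_id = archive_and_register(glacier, "example-vault", b"...", "demo")
#   data = retrieve(glacier, "example-vault", archive_id, tier="Bulk")
```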
Use Glacier via S3 Object Lifecycle
             S3 Standard          S3 - Infrequent Access        Amazon Glacier
Data type    Active data          Infrequently accessed data    Archive data
Access       Synchronous access   Synchronous access            Async access
Price        $0.023/GB/mo.        $0.0125/GB/mo.                $0.004/GB/mo.
- Transition Standard to Standard-IA
- Transition Standard-IA to Amazon Glacier
- Transition based on object tags
- Expiration and versioning
Data lifecycle management
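A lifecycle policy covering these transitions can be expressed as a rule document and applied with boto3's `put_bucket_lifecycle_configuration`. The sketch below only builds the rule; the tag key and value, the rule ID, the bucket name, and the 30/90-day thresholds are all assumptions chosen for illustration, and versioning-related settings are left out.

```python
from typing import Optional


def lifecycle_rule(rule_id: str, tag_key: str, tag_value: str,
                   ia_after_days: int = 30, glacier_after_days: int = 90,
                   expire_after_days: Optional[int] = None) -> dict:
    """Build one S3 lifecycle rule that applies to objects with a given tag."""
    rule = {
        "ID": rule_id,
        "Status": "Enabled",
        # Transition based on object tags.
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        "Transitions": [
            # Standard -> Standard-IA
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            # Standard-IA -> Amazon Glacier
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        # Optional expiration at the end of the retention period.
        rule["Expiration"] = {"Days": expire_after_days}
    return rule


config = {"Rules": [lifecycle_rule("archive-old-videos", "class", "video",
                                   expire_after_days=3650)]}
print(config)

# Applying it (requires boto3 and AWS credentials; bucket name is an example):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```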
Data access frequency over time (chart; x-axis runs from T to T + 365 days)
Transition older videos to Standard-IA
Save money on storage
45% saving over S3 Standard
44% saving over S3 Standard-IA
* Assumes the highest public pricing tier
Amazon Glacier – Third-party tools and gateways
• Consumer grade: less than $50
• Example: CloudBerry, FastGlacier, Arq (Haystack Software)
• Small / medium business: $500 - $1,000
• Example: Synology, Veeam, QNAP
• Enterprise gateway and data management software
• Example: NetApp AltaVault, Commvault, StorNext, Vidispine
Which option should I choose?
• Use S3 lifecycle-managed Amazon Glacier if the S3 object keys are sufficient for index/search capability
• Use Amazon Glacier directly if you already plan to store more metadata/indices in a database
• Use third-party tools to minimize coding
• Does the tool write data in proprietary or native format in AWS?
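When Glacier is used directly, each archive is addressed only by its archive ID, so search has to come from an index you maintain yourself. The sketch below shows one possible shape for such an index using SQLite from Python's standard library. The table layout, column names, and sample records are hypothetical; a real asset manager would hold far richer metadata.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metadata index that maps titles to archive IDs."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS archives (
               archive_id  TEXT PRIMARY KEY,
               vault       TEXT NOT NULL,
               title       TEXT NOT NULL,
               uploaded_on TEXT NOT NULL
           )"""
    )
    return conn


def register(conn: sqlite3.Connection, archive_id: str, vault: str,
             title: str, uploaded_on: str) -> None:
    """Record an archive ID returned by an upload, with searchable metadata."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO archives VALUES (?, ?, ?, ?)",
                     (archive_id, vault, title, uploaded_on))


def find_by_title(conn: sqlite3.Connection, fragment: str) -> list:
    """Return the archive IDs whose title contains `fragment`."""
    rows = conn.execute(
        "SELECT archive_id FROM archives WHERE title LIKE ? ORDER BY archive_id",
        (f"%{fragment}%",),
    )
    return [row[0] for row in rows]


index = open_index()
register(index, "id-001", "media-vault", "Season 1 Episode 1 master", "2016-11-30")
register(index, "id-002", "media-vault", "Season 1 trailer", "2016-11-30")
print(find_by_title(index, "Episode"))
```

If object keys alone would answer every lookup you need, this extra index is unnecessary and the S3 lifecycle route is the simpler choice.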
Media Archive and Metadata (cloud transition)
[Diagram, starting point: the corporate data center holds the Onsite Archive, On-Premise Tape, Hierarchical Storage Manager, Metadata (Asset Manager), and Processing Tasks, with an Offsite Tape Archive.]
Media Archive (transition to the cloud)
[Diagram, step 1: AWS Direct Connect links the corporate data center to an AWS Region running Amazon Glacier and a Cloud DAM that syncs metadata from on-prem.]
[Diagram, step 2: Amazon S3 and Cloud Based Processing Tasks are added in the AWS Region.]
[Diagram, step 3: an Onsite Cache appears in the corporate data center alongside the existing components.]
Consideration 5 - Compliance and Retention
Compliance storage with Vault Lock
Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy
• Time-based retention
• Multi-factor authentication (MFA)
• Controls govern all records in a vault
• Immutable policy
• Two-step locking
Vault Lock for compliance storage
• Non-overwrite, non-erasable records
• Time-based retention with “ArchiveAgeInDays” control
• Policy lockdown (strong governance)
• Legal hold with vault-level tags
• Configure optional designated third-party access and grant temporary access
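The retention and legal-hold controls listed above are written as a vault lock policy and applied in two steps. The sketch below builds a policy that denies archive deletion until a given age, and while a vault tag marks a legal hold, then locks it through boto3's `initiate_vault_lock` and `complete_vault_lock`. The tag name `LegalHold`, the statement IDs, and the function names are illustrative assumptions; `glacier` is expected to be a `boto3.client("glacier")` object.

```python
import json


def vault_lock_policy(vault_arn: str, retention_days: int) -> str:
    """Return a vault lock policy (JSON text) for time-based retention
    plus a tag-driven legal hold."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention-ends",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "NumericLessThan": {
                        "glacier:ArchiveAgeInDays": str(retention_days)
                    }
                },
            },
            {
                "Sid": "deny-delete-under-legal-hold",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "StringLike": {"glacier:ResourceTag/LegalHold": ["true", ""]}
                },
            },
        ],
    }
    return json.dumps(policy)


def lock_vault(glacier, vault_name: str, policy_json: str) -> str:
    """Two-step locking: attach the policy, then make it immutable."""
    lock_id = glacier.initiate_vault_lock(
        vaultName=vault_name, policy={"Policy": policy_json}
    )["lockId"]
    # In practice, test the policy between the two calls;
    # abort_vault_lock discards a lock that has not been completed.
    glacier.complete_vault_lock(vaultName=vault_name, lockId=lock_id)
    return lock_id


example = vault_lock_policy(
    "arn:aws:glacier:us-west-2:123456789012:vaults/example-vault", 365)
print(example)
```

Once the second step completes, the policy can no longer be changed, which is why the two-step design leaves room for testing first.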
Amazon Glacier received a third-party assessment
from Cohasset Associates on how Amazon Glacier
with Vault Lock can be used to meet the requirements
of SEC Rule 17a-4(f) and CFTC 1.31(b)-(c).
Rich Sutton, VP of Engineering
Digital Risk, Social Media Security, and Compliance
Proofpoint SocialPatrol Archive
Amazon Glacier and Vault Lock
Use Case
Proofpoint
• Cloud-based security and compliance for the enterprise: threat research, email, mobile, social, digital risk
• Founded 2002, public in 2012
• $350M annual revenue, $3B market cap
• Huge AWS user
Proofpoint SocialPatrol
Policy controls and enforcement for social
• Combats fraudulent brand impersonation
• Moderates content at scale
• Ensures compliance in publishing
• Integrates with social APIs
• 150+ classifiers using NLP and ML
• Text, links, images, metadata
• Ingesting >1M social posts per day
• Built in AWS
Proofpoint SocialPatrol
How it works:
PFPT in AWS
Policy engine MySQL/C*/Solr
Enterprise
Archive
“Awesome. Help me with retention by integrating with my existing email archive.”
Social
Proofpoint SocialPatrol archiving integration
Imperfect …
• Social != Email
• Every archive is different
• Requires internal collaboration
Proofpoint SocialPatrol Archive
SEC Rule 17a-4(f)-compliant archive, purpose-built for social, enabled by Amazon Glacier and Vault Lock
[Diagram: Social, PFPT in AWS (policy engine, MySQL/C*/Solr), Amazon Glacier & Vault Lock]
Proofpoint SocialPatrol Archive
The customer specifies the retention period in Proofpoint
Social:
Proofpoint SocialPatrol Archive
Via AWS API we create a vault for that customer:
Proofpoint SocialPatrol Archive
Via AWS API, we lock the vault and specify a policy to observe a legal hold via a tag.
Proofpoint SocialPatrol Archive
As social content flows in, we record its purge date and
surface that to the user. Each piece of social content is an
archive in the vault.
Proofpoint SocialPatrol Archive
Search UI uses the copy of the data we already had.
As archives expire, we purge them.
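The purge bookkeeping described on these slides comes down to date arithmetic: a purge date is recorded when content is captured, and an expiry sweep later selects what may be deleted. The sketch below is a hypothetical illustration of that logic, not Proofpoint's implementation; the record layout, function names, and the vault-wide legal-hold flag are assumptions.

```python
from datetime import date, timedelta


def purge_date(captured_on: date, retention_days: int) -> date:
    """Date on which an archive becomes eligible for purging."""
    return captured_on + timedelta(days=retention_days)


def archives_to_purge(records, today: date, legal_hold: bool = False) -> list:
    """Archive IDs whose purge date has passed.

    `records` is an iterable of (archive_id, purge_on) pairs.
    Nothing is purged while the vault is under legal hold.
    """
    if legal_hold:
        return []
    return [archive_id for archive_id, purge_on in records if purge_on <= today]


records = [
    ("post-1", purge_date(date(2016, 1, 1), 365)),
    ("post-2", purge_date(date(2016, 6, 1), 365)),
]

today = date(2017, 1, 15)
print(archives_to_purge(records, today))
print(archives_to_purge(records, today, legal_hold=True))
```

The vault lock policy remains the enforcing control; a sweep like this only decides when to issue delete requests that the policy will then allow.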
Proofpoint SocialPatrol Archive
• Legal hold can be put in place by Proofpoint Support
• Data can be exported from Amazon Glacier by Proofpoint Support when necessary
• Amazon Glacier with Vault Lock allowed us to build a product that complies with SEC Rule 17a-4(f) and CFTC Rule 1.31(b)-(c)
What would it have cost for us to build a WORM data store,
get it certified, and scale it … ?
Data ingestion into AWS storage services
Snowball Edge
• Accelerate PBs with AWS-provided appliances
• NEW 100 TB model with compute
Storage Gateway
• Instant hybrid cloud
• Up to 120 MB/s cloud upload rate (4x improvement)
Firehose
• Ingest data streams directly into AWS data stores
Direct Connect
• COLO to AWS
ISV Connectors
• Commvault
• Veritas
• And others
NEW S3 Transfer Acceleration
• Accelerate object transfer up to 300% using AWS’s private network
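A quick way to choose between network upload and a shipped appliance is to estimate how long the network path would take. The sketch below does that arithmetic for the 120 MB/s figure quoted for Storage Gateway. It assumes that rate is sustained for the whole transfer and uses decimal units (1 TB = 1,000,000 MB); shipping and loading time for an appliance is not modeled, and the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000  # assumption: decimal units


def transfer_days(data_tb: float, rate_mb_per_s: float) -> float:
    """Days needed to move `data_tb` terabytes at a sustained rate."""
    total_mb = data_tb * MB_PER_TB
    return total_mb / rate_mb_per_s / SECONDS_PER_DAY


for size_tb in (100, 1_000):
    print(f"{size_tb} TB at 120 MB/s: {transfer_days(size_tb, 120):.1f} days")
```

When the estimate runs to weeks or months, shipping the data on an appliance is likely the more practical route.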
Related Sessions
STG302 - Deep Dive on Amazon Glacier
STG210 - Simplified Data Center Migration—Lessons
Learned by Live Nation
STG312 - Workshop: Working with AWS Snowball -
Accelerating Data Ingest into the Cloud
Remember to complete
your evaluations!
Thank you!

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

AWS re:Invent 2016: Strategic Planning for Long-Term Data Archiving with Amazon Glacier (STG209)

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Henry Zhang, Senior Product Manager, AWS Rich Sutton, VP of Engineering, Digital Risk, Proofpoint November 30, 2016 STG209 Strategic Planning for Long-Term Data Archiving with Amazon Glacier
  • 2. AWS storage maturity • File: Amazon EFS • Block: Amazon Elastic Block Store, Amazon EC2 Instance Store • Object: Amazon S3, Amazon Glacier • Data Transfer: AWS Direct Connect, AWS Snowball, ISV Connectors, Amazon Kinesis Firehose, Amazon S3 Transfer Acceleration, AWS Storage Gateway
  • 3. Video archives • Media distribution backbone (Ve.nue platform) • Over-The-Top (OTT) broadcast service • 20 PB of media assets, 800,000 hours of high-res content • Assets to be archived and retained for decades
  • 4. Patient data–Philips Healthcare • HealthSuite digital platform powered by AWS • 15 petabytes of patient data • Archived for decades (beyond the lifetime of patients) • Uses AWS HIPAA-eligible services in the BAA
  • 5. Public sector–King County • Most populous county in Washington state • Replaced tape solution for backup from 17 agencies • Meets compliance requirement • Saved $1MM in first year; no more tape refresh or management churn
  • 6. Archive: Data retained for the long term, for compliance or potential future reference Data archiving needs are growing everywhere • Media assets, 4K, 8K • Health care/life sciences • Financial services • Regulated industries • Oil and gas/geospatial • Digital preservation • Long-term backups • Logs
  • 7. Consideration 1 – Total Archive Cost
  • 8. Traditional archiving approaches • Tape libraries, robots, drives, media • Onsite (online and offline) • Offsite tape out/vaulting • Specialized software and personnel • Tape refresh every 3-5 years
  • 9. How can AWS help with your archival? • Metered usage: pay as you go • No capital investment • No commitment • No risky capacity planning • Avoid risks of physical media handling • Control your geographic locality for performance and compliance
  • 10. Storage pricing - pay only for what you use • 1 PB raw storage → 800 TB usable storage → 600 TB allocated storage → 400 TB application data • AWS Cloud Storage: Amazon Glacier starts at $0.004/GB/month • Price drop by 43% on 11/21
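The capacity funnel and the per-GB price on slide 10 can be turned into a quick back-of-the-envelope calculation. The sketch below is illustrative only: the figures come from the slide, while the 1 TB = 1,000 GB conversion and the function names are assumptions made for the example.

```python
# Figures from slide 10; unit conversion is a simplifying assumption.
GLACIER_USD_PER_GB_MONTH = 0.004
GB_PER_TB = 1000  # assumption: decimal units, not the billing definition

def monthly_storage_cost(terabytes: float, usd_per_gb_month: float) -> float:
    """Monthly storage charge for the data actually stored."""
    return terabytes * GB_PER_TB * usd_per_gb_month

def utilization(application_tb: float, raw_tb: float) -> float:
    """Share of purchased raw capacity that ends up holding application data."""
    return application_tb / raw_tb

raw_tb = 1000  # 1 PB raw storage
app_tb = 400   # 400 TB application data

print(f"Raw capacity holding application data: {utilization(app_tb, raw_tb):.0%}")
cost = monthly_storage_cost(app_tb, GLACIER_USD_PER_GB_MONTH)
print(f"Glacier storage for {app_tb} TB: ${cost:,.2f} per month")
```

With metered pricing the charge follows the 400 TB of application data rather than the 1 PB of raw capacity that would have to be provisioned up front.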
  • 11. Consideration 2 – Durability
  • 12. Durability for long-term preservation • 99.999999999% durability • Built-in fixity checking • Automatic recovery
  • 13. Consideration 3 – Accessibility
  • 14. Amazon Glacier – Data Retrieval Tiers • Expedited Retrieval ($0.03/GB): emergency access; 1-5 minutes; last minute play-out schedule swap • Standard Retrieval ($0.01/GB): current model; 3-5 hours; disaster recovery • Bulk Retrieval ($0.0025/GB): batch/bulk access; 5-12 hours; PB scale re-transcoding or video/image analysis • On-site tape replacement • Off-site tape replacement
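Choosing a retrieval tier is a trade-off between wait time and per-GB price. A minimal sketch of that trade-off follows, assuming the three prices on slide 14 map to Expedited, Standard, and Bulk in that order and using the upper end of each quoted time range as the worst case. Per-request fees are not modeled, and the function names are hypothetical.

```python
from typing import Optional

# Per-GB prices and time ranges from slide 14 (tier-to-price mapping assumed).
RETRIEVAL_TIERS = {
    "Expedited": {"usd_per_gb": 0.03, "max_wait_hours": 5 / 60},
    "Standard": {"usd_per_gb": 0.01, "max_wait_hours": 5},
    "Bulk": {"usd_per_gb": 0.0025, "max_wait_hours": 12},
}

def retrieval_cost(gigabytes: float, tier: str) -> float:
    """Per-GB retrieval charge only; request fees are ignored."""
    return gigabytes * RETRIEVAL_TIERS[tier]["usd_per_gb"]

def cheapest_tier_within(deadline_hours: float) -> Optional[str]:
    """Cheapest tier whose worst-case wait still meets the deadline."""
    candidates = [
        (spec["usd_per_gb"], name)
        for name, spec in RETRIEVAL_TIERS.items()
        if spec["max_wait_hours"] <= deadline_hours
    ]
    return min(candidates)[1] if candidates else None

for deadline in (0.25, 6, 24):
    tier = cheapest_tier_within(deadline)
    print(f"deadline {deadline} h -> {tier}, "
          f"1 TB costs ${retrieval_cost(1000, tier):.2f}")
```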
  • 15. Consideration 4 - Application & Data Management
  • 16. Amazon Glacier – 3 ways to access • Direct Glacier API/SDK • S3 lifecycle integration • Third-party tools and gateways
  • 17. Amazon Glacier – Direct access/APIs • Data upload: Create Vault → Configure Access → Upload Archives → Register Archive ID • Data retrieval: Initiate Retrieval → Async Retrieval Completion → Completion Notification → Download Data
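Retrieval through the direct API is asynchronous: the caller submits a job, waits for a completion notification, and then downloads the output. The sketch below only builds the parameters for the "Initiate Retrieval" step. The field names (`Type`, `ArchiveId`, `Tier`, `SNSTopic`, `Description`) follow the Glacier Initiate Job request as I understand it; the archive ID and topic ARN are placeholders, and the helper name is hypothetical.

```python
import json
from typing import Optional

VALID_TIERS = ("Expedited", "Standard", "Bulk")

def build_retrieval_job(archive_id: str,
                        tier: str = "Standard",
                        sns_topic_arn: Optional[str] = None,
                        description: Optional[str] = None) -> dict:
    """Assemble job parameters for an archive-retrieval request."""
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {VALID_TIERS}")
    params = {"Type": "archive-retrieval", "ArchiveId": archive_id, "Tier": tier}
    if sns_topic_arn:
        # Completion notification is delivered to this SNS topic.
        params["SNSTopic"] = sns_topic_arn
    if description:
        params["Description"] = description
    return params

job = build_retrieval_job(
    archive_id="EXAMPLE-ARCHIVE-ID",  # placeholder: the ID registered at upload
    tier="Bulk",
    sns_topic_arn="arn:aws:sns:us-west-2:123456789012:example-topic",  # placeholder
)
print(json.dumps(job, indent=2))
```

With an SDK such as boto3, a dictionary like this would typically be passed as the `jobParameters` argument of the Glacier client's `initiate_job` call. The archive ID has to come from the caller's own index, which is why the upload path includes a "Register Archive ID" step.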
  • 18. Use Glacier via S3 Object Lifecycle • S3 Standard: active data; synchronous access; $0.023/GB/mo. • S3 - Infrequent Access: infrequently accessed data; synchronous access; $0.0125/GB/mo. • Amazon Glacier: archive data; async access; $0.004/GB/mo.
  • 19. Data lifecycle management • Transition Standard to Standard-IA • Transition Standard-IA to Amazon Glacier • Transition based on object tags • Expiration and versioning • Chart: data access frequency over time, from T through T+3, T+5, T+15, T+25, T+30, T+60, T+90, T+150, T+250, and T+365 days
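The transitions listed on slide 19 are expressed as rules in an S3 lifecycle configuration. Here is a minimal sketch that builds one such rule as a plain dictionary in the shape S3 lifecycle rules use (`Filter`, `Transitions`, `Expiration`). The prefix, the day thresholds, and the helper name are assumptions for illustration, and the 30-day spacing checks are a conservative reading of S3's transition constraints rather than something stated in the slides.

```python
import json
from typing import Optional

def build_lifecycle_rule(prefix: str,
                         ia_after_days: int,
                         glacier_after_days: int,
                         expire_after_days: Optional[int] = None) -> dict:
    """Standard -> Standard-IA -> Glacier, with optional expiration."""
    if ia_after_days < 30:
        raise ValueError("Standard-IA transition assumed to need >= 30 days")
    if glacier_after_days < ia_after_days + 30:
        raise ValueError("Glacier transition assumed >= 30 days after Standard-IA")
    rule = {
        "ID": f"archive-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        if expire_after_days <= glacier_after_days:
            raise ValueError("expiration must come after the Glacier transition")
        rule["Expiration"] = {"Days": expire_after_days}
    return rule

# Hypothetical thresholds: IA at 30 days, Glacier at 90 days, expire after ~10 years.
config = {"Rules": [build_lifecycle_rule("videos/", 30, 90, 3650)]}
print(json.dumps(config, indent=2))
```

A tag-based rule would replace the `Prefix` filter with a `Tag` filter holding a key and a value.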
  • 20. Transition older videos to Standard-IA
  • 21. Save money on storage 45% saving over S3 Standard 44% saving over S3 Standard-IA * Assumes the highest public pricing tier
  • 22. Amazon Glacier – Third-party tools and gateways • Consumer grade: less than $50 • Example: Cloudberry, FastGlacier, Arq (Haystack Software) • Small / medium business: $500 - $1,000 • Example: Synology, Veeam, QNap • Enterprise gateway and data management software • Example: NetApp AltaVault, CommVault, StorNext, Vidispine
  • 23. Which option should I choose? • Use S3 lifecycle managed Amazon Glacier if the S3 object keys are sufficient for index/search capability • Use Amazon Glacier directly if you already plan to store more metadata/indices in a database • Use 3rd party tools to minimize coding • Does the tool write data in proprietary or native format in AWS?
  • 24. corporate data center Media Archive and Metadata (cloud transition) Onsite Archive Offsite Tape Archive Hierarchical Storage Manager Metadata (Asset Manager) Processing Tasks On-Premise Tape
  • 25. Onsite Archive Hierarchical Storage Manager Metadata (Asset Manager) Processing Tasks corporate data center AWS Region Amazon Glacier Cloud DAM (Syncing Metadata from on-prem) Amazon Direct Connect Offsite Tape Archive On-Premise Tape Media Archive (transition to the cloud)
  • 26. Onsite Archive Hierarchical Storage Manager Metadata (Asset Manager) Processing Tasks corporate data center AWS Region Amazon Glacier Cloud DAM (Syncing Metadata from on-prem) Amazon S3 Cloud Based Processing Tasks Amazon Direct Connect On-Premise Tape Offsite Tape Archive Media Archive (transition to the cloud)
  • 27. Onsite Archive Hierarchical Storage Manager Metadata (Asset Manager) Processing Tasks corporate data center AWS Region Amazon Glacier Cloud DAM (Syncing Metadata from on-prem) Amazon S3 Cloud Based Processing Tasks Amazon Direct Connect Onsite Cache Offsite Tape Archive On-Premise Tape Media Archive (transition to the cloud)
  • 28. Consideration 5 - Compliance and Retention
  • 29. Compliance storage with Vault Lock • Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy • Time-based retention • MFA authentication • Controls govern all records in a vault • Immutable policy • Two-step locking
  • 30. Vault Lock for compliance storage • Non-overwrite, non-erasable records • Time-based retention with “ArchiveAgeInDays” control • Policy lockdown (strong governance) • Legal hold with vault-level tags • Configure optional designated third-party access and grant temporary access
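The retention and legal-hold controls on slide 30 are written as deny statements in a Vault Lock policy. The sketch below assembles such a policy as a dictionary, modeled on the publicly documented condition keys `glacier:ArchiveAgeInDays` and `glacier:ResourceTag/<tag>`. The vault ARN, the tag name `LegalHold`, the statement IDs, and the helper name are placeholders chosen for illustration.

```python
import json

def build_vault_lock_policy(vault_arn: str,
                            retention_days: int,
                            legal_hold_tag: str = "LegalHold") -> dict:
    """Deny archive deletion before retention ends or while a hold tag is set."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention-ends",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                # Time-based retention: block deletes of archives younger than N days.
                "Condition": {
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
                },
            },
            {
                "Sid": "deny-delete-under-legal-hold",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                # Legal hold: block deletes while the vault-level tag is "true".
                "Condition": {
                    "StringLike": {f"glacier:ResourceTag/{legal_hold_tag}": ["true"]}
                },
            },
        ],
    }

policy = build_vault_lock_policy(
    "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault",  # placeholder
    retention_days=365,
)
print(json.dumps(policy, indent=2))
```

Locking is a two-step process: initiating the lock attaches the policy in an in-progress state and returns a lock ID, and completing the lock with that ID makes the policy immutable. Because the legal hold is driven by a tag rather than by the policy text, it can still be switched on and off after the policy is locked.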
  • 31. Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC Rule 17a-4(f) and CFTC 1.31(b)-(c).
  • 32. Rich Sutton, VP of Engineering Digital Risk, Social Media Security, and Compliance Proofpoint SocialPatrol Archive AWS Glacier and Vault Lock Use Case
  • 33. Proofpoint • Cloud-based security and compliance for the enterprise: threat research, email, mobile, social, digital risk • Founded 2002, public in 2012 • $350M annual revenue, $3B market cap • Huge AWS user
  • 34. Proofpoint SocialPatrol Policy controls and enforcement for social • Combats fraudulent brand impersonation • Moderates content at scale • Ensures compliance in publishing • Integrates with social APIs • 150+ classifiers using NLP and ML • Text, links, images, meta data • Ingesting >1M social posts per day • Built in AWS
  • 35. Proofpoint SocialPatrol How it works: PFPT in AWS Policy engine MySQL/C*/Solr Enterprise Archive “Awesome. Help me with retention by integrating with my existing email archive.” Social
  • 36. Proofpoint SocialPatrol archiving integration • Imperfect … • Social != Email • Every archive is different • Requires internal collaboration
  • 37. Proofpoint SocialPatrol Archive SEC Rule 17a-4(f)-compliant archive, purpose-built for social, enabled by Amazon Glacier and Vault Lock PFPT in AWS Policy engine MySQL/C*/Solr Social Amazon Glacier & Vault Lock
  • 38. Proofpoint SocialPatrol Archive The customer specifies the retention period in Proofpoint Social:
  • 39. Proofpoint SocialPatrol Archive Via AWS API we create a vault for that customer:
  • 40. Proofpoint SocialPatrol Archive Via AWS API, we lock the vault, and specify policy to observe a legal hold via a tag.
  • 41. Proofpoint SocialPatrol Archive As social content flows in, we record its purge date and surface that to the user. Each piece of social content is an archive in the vault.
  • 42. Proofpoint SocialPatrol Archive Search UI uses the copy of the data we already had. As archives expire, we purge them.
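Slides 41 and 42 describe recording a purge date per archive and purging archives once they expire. A minimal sketch of that bookkeeping follows, under the assumption that the purge date is the ingest date plus the customer's retention period and that a vault-level legal hold suspends purging. The record type, field names, and function are hypothetical and are not taken from Proofpoint's implementation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List

@dataclass(frozen=True)
class ArchivedPost:
    archive_id: str      # ID returned when the post was uploaded as an archive
    archived_on: date    # date the social content was ingested
    retention_days: int  # retention period chosen by the customer

    @property
    def purge_date(self) -> date:
        """First day on which the archive may be deleted."""
        return self.archived_on + timedelta(days=self.retention_days)

def due_for_purge(posts: Iterable[ArchivedPost],
                  today: date,
                  vault_on_legal_hold: bool = False) -> List[str]:
    """Archive IDs whose retention has elapsed; none while a hold is active."""
    if vault_on_legal_hold:
        return []
    return [p.archive_id for p in posts if today >= p.purge_date]

posts = [
    ArchivedPost("a-1", date(2017, 1, 1), 30),
    ArchivedPost("a-2", date(2017, 1, 20), 30),
]
print(posts[0].purge_date)
print(due_for_purge(posts, today=date(2017, 2, 1)))
print(due_for_purge(posts, today=date(2017, 2, 1), vault_on_legal_hold=True))
```

The locked vault policy remains the enforcement point. Application-side logic like this only decides when a delete is worth attempting.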
  • 43. Proofpoint SocialPatrol Archive • Legal hold can be put in place by Proofpoint Support • Data can be exported from Amazon Glacier by Proofpoint Support when necessary • Amazon Glacier with Vault Lock allowed us to build a product that complies with SEC Rule 17a-4(f) and CFTC Rule 1.31(b)-(c) What would it have cost for us to build a WORM data store, get it certified, and scale it … ?
  • 44. Data ingestion into AWS storage services • Snowball Edge: accelerate PBs with AWS-provided appliances; NEW 100 TB model with compute • Storage Gateway: instant hybrid cloud; up to 120 MB/s cloud upload rate (4x improvement) • Firehose: ingest data streams directly into AWS data stores • Direct Connect: COLO to AWS • ISV Connectors: Commvault, Veritas, etcetera • NEW S3 Transfer Acceleration: accelerate object transfer up to 300% using AWS’s private network
  • 45. Related Sessions STG302 - Deep Dive on Amazon Glacier STG210 - Simplified Data Center Migration—Lessons Learned by Live Nation STG312 - Workshop: Working with AWS Snowball - Accelerating Data Ingest into the Cloud