Data Storage for the Long Haul: Compliance and Archive


This session is for IT pros working with compliance managers to deliver solutions that lower costs and still meet compliance demands. You will learn how to move large-scale data stores to the cloud while remaining compliant with existing regulations. Services mentioned: S3, Glacier and its Vault Lock feature, Snowball, and ingestion services.

Published in: Technology

  1. 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Henry Zhang, Senior Product Manager, Amazon Glacier August 11, 2016 Data Storage for the Long Haul: Compliance and Archive
  2. 2. AWS storage maturity • File: Amazon EFS • Block: Amazon Elastic Block Store, Amazon EC2 Instance Store • Object: Amazon S3, Amazon Glacier • Data transfer: AWS Direct Connect, AWS Snowball, ISV connectors, Amazon Kinesis Firehose, Amazon S3 Transfer Acceleration, AWS Storage Gateway
  3. 3. Audio archives–SoundCloud • World’s leading social sound platform • Audio files transcoded and stored in multiple formats • Stores petabytes (PBs) of data • Transcoded files served from S3 • Originals moved to Amazon Glacier for long-term retention
  4. 4. Video archives • Media distribution backbone (Ve.nue platform) • Over-The-Top (OTT) broadcast service • PBs of media assets • Assets to be archived and retained for decades
  5. 5. Patient data–Philips Healthcare • HealthSuite digital platform powered by AWS • 15 petabytes of patient data • Archived for decades (beyond the lifetime of patients) • Uses AWS HIPAA-eligible services in the BAA
  6. 6. Public sector–King County • Most populous county in Washington state • Replaced tape solution for backup from 17 agencies • Meets compliance requirement • Saved $1MM in first year; no more tape refresh or management churn
  7. 7. Archive: Data retained for the long term, for compliance or potential future reference Data archiving needs are growing everywhere • Media assets, 4K, 8K • Health care/life sciences • Financial services • Regulated industries • Oil and gas/geospatial • Digital preservation • Long-term backups • Logs
  8. 8. Traditional archiving approaches • Storage arrays/disk arrays • Tape silos/tape libraries • Tape drives (LTO-X/DLT/etc.) • Virtual tape libraries (VTLs) • Tape out/vaulting • Specialized software and personnel
  9. 9. How can AWS help with your archival? • Metered usage: pay as you go • No capital investment • No commitment • No risky capacity planning • Avoid risks of physical media handling • Control your geographic locality for performance and compliance
  10. 10. Archive Options–Storage Tiers and Data Lifecycle
  11. 11. Object storage options • S3 Standard (active data): millisecond access, $0.03/GB/mo. • S3 Standard - Infrequent Access (infrequently accessed data): millisecond access, $0.0125/GB/mo. • Amazon Glacier (archive data): 3-5 hour retrieval, $0.007/GB/mo.
  12. 12. A closer look: S3-IA and Amazon Glacier. S3 Standard - Infrequent Access (S3-IA): • Same durability and throughput as S3 Standard • Instant access • $0.01/GB on each data retrieval. Amazon Glacier: • Same 11 9s durability as S3 Standard • 3-5 hour data retrieval latency • Suitable for cold archive such as offsite tapes
  13. 13. Data lifecycle management • Transition Standard to Standard-IA • Transition Standard-IA to Amazon Glacier • Expiration lifecycle policy • Versioning support [Figure: data access frequency over time, from T to T + 365 days]
  14. 14. Set up lifecycle policy
  15. 15. Transition older videos to Standard-IA
  16. 16. Archive to S3-IA after 30 days. Lifecycle policy (Standard Storage -> Standard-IA): <LifecycleConfiguration> <Rule> <ID>sample-rule</ID> <Prefix>documents/</Prefix> <Status>Enabled</Status> <Transition> <Days>30</Days> <StorageClass>STANDARD_IA</StorageClass> </Transition> <Transition> <Days>365</Days> <StorageClass>GLACIER</StorageClass> </Transition> </Rule> </LifecycleConfiguration>
  17. 17. Archive to Amazon Glacier after 365 days. Lifecycle policy (Standard-IA Storage -> Amazon Glacier): the second transition of the same sample-rule moves objects to Amazon Glacier: <Transition> <Days>365</Days> <StorageClass>GLACIER</StorageClass> </Transition>
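The two-step lifecycle rule on slides 16 and 17 can also be expressed as a Python dictionary. This is a minimal sketch, assuming the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts; the function name `lifecycle_configuration` and the bucket name in the comment are hypothetical. Note that the S3 API spells the storage class `STANDARD_IA`, with an underscore.

```python
def lifecycle_configuration(prefix, ia_days=30, glacier_days=365, rule_id="sample-rule"):
    """Build a lifecycle configuration mirroring the XML rule on the slides."""
    # The Standard-IA transition has to happen before the Glacier transition.
    if not 0 < ia_days < glacier_days:
        raise ValueError("Standard-IA transition must come before the Glacier transition")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = lifecycle_configuration("documents/")

# Assumption: with boto3 installed and credentials configured, the dict could be
# applied like this (the bucket name is a placeholder):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Building the dictionary separately from the API call keeps the rule easy to review with a compliance manager before it is applied.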
  18. 18. Save money on storage • 58% saving over S3 Standard • 44% saving over S3 Standard-IA (* Assumes the highest public pricing tier)
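The two percentages appear to follow from the per-GB prices on slide 11. The sketch below assumes, since the slide text does not say so explicitly, that 58% compares Standard-IA with S3 Standard and 44% compares Amazon Glacier with Standard-IA. The helper name `saving_percent` is illustrative.

```python
# 2016 list prices per GB per month, as quoted on slide 11.
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.03,
    "S3 Standard-IA": 0.0125,
    "Amazon Glacier": 0.007,
}


def saving_percent(cheaper, baseline):
    """Percentage saved by storing data in `cheaper` instead of `baseline`."""
    return 100 * (1 - PRICE_PER_GB_MONTH[cheaper] / PRICE_PER_GB_MONTH[baseline])


ia_vs_standard = saving_percent("S3 Standard-IA", "S3 Standard")
glacier_vs_ia = saving_percent("Amazon Glacier", "S3 Standard-IA")
print(f"Standard-IA vs Standard: {ia_vs_standard:.0f}%")
print(f"Glacier vs Standard-IA: {glacier_vs_ia:.0f}%")
```

These figures cover storage only; retrieval and request charges are left out.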
  19. 19. Example backup software integration • Commvault: native integration with S3 and Amazon Glacier • Deduplication and encryption • Single-console management
  20. 20. Compliance Use Case 1–Regulatory Retention
  21. 21. Compliance storage with Vault Lock: Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy • Time-based retention • MFA authentication • Controls govern all records in a vault • Immutable policy • Two-step locking
  22. 22. Vault Lock for compliance storage • Non-overwrite, non-erasable records • Time-based retention with “ArchiveAgeInDays” control • Policy lockdown (strong governance) • Legal hold with vault-level tags • Configure optional designated third-party access and grant temporary access
  23. 23. Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC Rule 17a-4(f) and CFTC 1.31(b)-(c).
  24. 24. Example control: 1-year record retention
  25. 25. Example control: 1-year record retention
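The slide images for this control are not part of the transcript. As a hedged sketch, a 1-year retention control can be written as a Vault Lock policy that denies `glacier:DeleteArchive` while an archive is younger than 365 days, using the `ArchiveAgeInDays` condition named on slide 22. The vault ARN, account ID, and `Sid` below are placeholders.

```python
import json


def retention_policy(vault_arn, retention_days=365):
    """Vault Lock policy that blocks archive deletion until the retention period ends."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention-ends",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                # Deny applies while the archive is younger than retention_days.
                "Condition": {
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
                },
            }
        ],
    }


# Placeholder ARN for illustration only.
policy_json = json.dumps(
    retention_policy("arn:aws:glacier:us-west-2:123456789012:vaults/examplevault"),
    indent=2,
)
print(policy_json)
```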
  26. 26. Vault Lock: Two-step locking
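Two-step locking can be sketched as follows: initiating the lock attaches the policy in a testable, in-progress state and returns a lock ID, and the lock then has to be completed (making the policy immutable) or aborted. The function assumes `glacier` behaves like a boto3 Glacier client, whose `initiate_vault_lock`, `complete_vault_lock`, and `abort_vault_lock` calls it uses; the `approve` callback is a hypothetical stand-in for whatever review step confirms that the policy behaves as intended.

```python
import json


def lock_vault(glacier, vault_name, policy, approve):
    """Run the two-step Vault Lock flow and return the resulting state."""
    # Step 1: attach the policy in the in-progress state and obtain a lock ID.
    response = glacier.initiate_vault_lock(
        accountId="-",  # "-" refers to the account that owns the credentials in use
        vaultName=vault_name,
        policy={"Policy": json.dumps(policy)},
    )
    lock_id = response["lockId"]

    # Step 2: after validating the policy, either make it permanent or back out.
    if approve(lock_id):
        glacier.complete_vault_lock(
            accountId="-", vaultName=vault_name, lockId=lock_id
        )
        return "Locked"
    glacier.abort_vault_lock(accountId="-", vaultName=vault_name)
    return "Aborted"
```

Passing the client in as a parameter lets the flow be exercised against a stub before it is pointed at a real vault, which matters because a completed lock cannot be undone.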
  27. 27. Legal hold with vault-level tags
  28. 28. Example control: Legal hold
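The slide image for this control is likewise missing from the transcript. One way to sketch a tag-based legal hold is a policy statement that denies archive deletion whenever the vault carries a `LegalHold` tag. The tag key, `Sid`, and ARN are assumptions made for illustration.

```python
import json


def legal_hold_statement(vault_arn, tag_key="LegalHold"):
    """Deny archive deletion while the vault is tagged as under legal hold."""
    return {
        "Sid": "deny-delete-under-legal-hold",
        "Principal": "*",
        "Effect": "Deny",
        "Action": ["glacier:DeleteArchive"],
        "Resource": [vault_arn],
        # Matches when the tag is set to "true" or present with an empty value.
        "Condition": {
            "StringLike": {f"glacier:ResourceTag/{tag_key}": ["true", ""]}
        },
    }


policy = {
    "Version": "2012-10-17",
    "Statement": [
        legal_hold_statement("arn:aws:glacier:us-west-2:123456789012:vaults/examplevault")
    ],
}
print(json.dumps(policy, indent=2))

# Assumption: the hold itself would be applied or lifted by tagging the vault,
# for example with a boto3 Glacier client:
#   glacier.add_tags_to_vault(vaultName="examplevault", Tags={"LegalHold": "true"})
```

Because the hold is driven by a tag rather than by the locked policy text, it can be placed or released without changing the immutable policy.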
  29. 29. Proofpoint SocialPatrol Archive: Amazon Glacier and Vault Lock use case. Rich Sutton, VP of Engineering, Digital Risk, Social Media Security, and Compliance, Proofpoint
  30. 30. Proofpoint • Cloud-based security and compliance for the enterprise: threat research, email, mobile, social, digital risk • Founded 2002, public in 2012 • $350M annual revenue, $3B market cap • Huge AWS user
  31. 31. Proofpoint SocialPatrol: Policy controls and enforcement for social • Combats fraudulent brand impersonation • Moderates content at scale • Ensures compliance in publishing • Integrates with social APIs • 150+ classifiers using NLP and ML • Text, links, images, metadata • Ingesting >1M social posts per day • Built in AWS
  32. 32. Proofpoint SocialPatrol: How it works [Diagram components: Social; PFPT in AWS (policy engine, MySQL/C*/Solr); Enterprise Archive] Customer request: “Awesome. Help me with retention by integrating with my existing email archive.”
  33. 33. Proofpoint SocialPatrol archiving integration • Imperfect … • Social != Email • Every archive is different • Requires internal collaboration
  34. 34. Proofpoint SocialPatrol Archive: SEC Rule 17a-4(f)-compliant archive, purpose-built for social, enabled by Amazon Glacier and Vault Lock [Diagram components: Social; PFPT in AWS (policy engine, MySQL/C*/Solr); Amazon Glacier & Vault Lock]
  35. 35. Proofpoint SocialPatrol Archive The customer specifies the retention period in Proofpoint Social:
  36. 36. Proofpoint SocialPatrol Archive Via AWS API we create a vault for that customer:
  37. 37. Proofpoint SocialPatrol Archive Via AWS API, we lock the vault and specify a policy that observes a legal hold via a tag.
  38. 38. Proofpoint SocialPatrol Archive As social content flows in, we record its purge date and surface that to the user. Each piece of social content is an archive in the vault.
  39. 39. Proofpoint SocialPatrol Archive Search UI uses the copy of the data we already had. As archives expire, we purge them.
  40. 40. Proofpoint SocialPatrol Archive • Legal hold can be put in place by Proofpoint Support • Data can be exported from Amazon Glacier by Proofpoint Support when necessary • Amazon Glacier with Vault Lock allowed us to build a product that complies with SEC Rule 17a-4(f) and CFTC Rule 1.31(b)-(c) What would it have cost for us to build a WORM data store, get it certified, and scale it … ?
  41. 41. Compliance Use Case 2–Auditing and Alerts
  42. 42. Audit logging with AWS CloudTrail • S3 and Amazon Glacier can log API calls for audit via CloudTrail • Enable CloudTrail in the AWS console and designate your log bucket • S3 logs bucket-level activities; object activities supported via event notification • Amazon Glacier logs all API calls for vault and archives
  43. 43. Access policy for a storage container • Control access to a storage container in a single location – S3 bucket or Amazon Glacier vault access policy – Grant/revoke access to internal business units/teams – “Marketing_Vault” has a distinct access policy from “DevOps_Vault” • Easily manage cross-account access for your business partner – Simply add a section for your business partner in the same policy – Cross-account activities (API calls) also show up in CloudTrail logs
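Adding "a section for your business partner" to a vault access policy can be sketched as one extra `Allow` statement whose principal is the partner's AWS account. The account IDs, vault ARN, `Sid`, and choice of retrieval actions below are illustrative assumptions.

```python
import json


def partner_statement(vault_arn, partner_account_id):
    """Statement granting a partner account permission to retrieve archives."""
    return {
        "Sid": f"partner-{partner_account_id}-retrieval",
        "Principal": {"AWS": [f"arn:aws:iam::{partner_account_id}:root"]},
        "Effect": "Allow",
        "Action": ["glacier:InitiateJob", "glacier:GetJobOutput"],
        "Resource": [vault_arn],
    }


def add_partner(policy, vault_arn, partner_account_id):
    """Return a copy of the policy with the partner statement appended."""
    statements = list(policy.get("Statement", []))
    statements.append(partner_statement(vault_arn, partner_account_id))
    return {"Version": policy.get("Version", "2012-10-17"), "Statement": statements}


# Placeholder vault ARN and partner account ID.
vault_arn = "arn:aws:glacier:us-west-2:123456789012:vaults/Marketing_Vault"
access_policy = add_partner({"Version": "2012-10-17", "Statement": []}, vault_arn, "999999999999")
print(json.dumps(access_policy, indent=2))
```

Revoking the partner's access then amounts to removing that one statement from the same policy.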
  44. 44. S3 event notifications • Event destinations: Amazon SNS topic, Amazon SQS queue, AWS Lambda function • Notification when objects are created via PUT, POST, Copy, or Multipart Upload, or removed via DELETE • Filtering on prefixes and suffixes for all types of notifications
  45. 45. Request specific notifications • Request notifications on specific PUT APIs: s3:ObjectCreated:*, s3:ObjectCreated:Put, s3:ObjectCreated:Post, s3:ObjectCreated:Copy, s3:ObjectCreated:CompleteMultipartUpload • Request notifications on specific DELETE APIs: s3:ObjectRemoved:*, s3:ObjectRemoved:Delete, s3:ObjectRemoved:DeleteMarkerCreated
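The event names and prefix/suffix filters from slides 44 and 45 can be combined into one notification configuration. This is a minimal sketch, assuming the dictionary shape that boto3's `put_bucket_notification_configuration` accepts; the SNS topic ARN, bucket name, and filter values are placeholders.

```python
def topic_notification(topic_arn, events, prefix=None, suffix=None):
    """Build a notification configuration that publishes matching events to an SNS topic."""
    config = {"TopicArn": topic_arn, "Events": list(events)}
    rules = []
    if prefix is not None:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix is not None:
        rules.append({"Name": "suffix", "Value": suffix})
    if rules:
        # Only add the filter when at least one rule was requested.
        config["Filter"] = {"Key": {"FilterRules": rules}}
    return {"TopicConfigurations": [config]}


# Alert on every deletion of a .log object under audit/ (values are illustrative).
notification = topic_notification(
    "arn:aws:sns:us-west-2:123456789012:example-audit-topic",
    ["s3:ObjectRemoved:*"],
    prefix="audit/",
    suffix=".log",
)

# Assumption: with boto3 this could be applied as
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="example-bucket", NotificationConfiguration=notification)
```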
  46. 46. Compliance Use Case 3–Geographic Redundancy
  47. 47. S3 cross-region replication: automated, fast, and reliable asynchronous replication of data across AWS regions • Secure: remote replicas managed by separate AWS accounts • Lower latency: distribute data to regional customers • Compliance: store hundreds of miles apart
  48. 48. Cross-region replication: Details • Cost: usual charges for storage, requests, and inter-region data transfer for the replicated copy of data; replicate into Standard-IA or Amazon Glacier • Replication status: HEAD operation on a source object to determine replication status; replicated objects will not be re-replicated; use S3 COPY to replicate existing objects • Delete operation: DELETE without object version ID, marker replicated; DELETE specific object version ID, marker NOT replicated • Access control: object ACL updates are replicated; objects with Amazon-managed encryption key replicated; AWS KMS encryption not replicated
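A replication rule that lands replicas directly in Standard-IA, as the cost notes describe, might be sketched like this. It assumes the dictionary shape that boto3's `put_bucket_replication` accepts (the older rule form with a top-level `Prefix`); the IAM role ARN, bucket names, and rule ID are placeholders.

```python
def replication_configuration(role_arn, destination_bucket_arn, prefix="",
                              storage_class="STANDARD_IA", rule_id="replicate-archive"):
    """Build a cross-region replication configuration for a source bucket."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": rule_id,
                "Prefix": prefix,  # an empty prefix selects every object
                "Status": "Enabled",
                "Destination": {
                    "Bucket": destination_bucket_arn,
                    "StorageClass": storage_class,
                },
            }
        ],
    }


replication = replication_configuration(
    "arn:aws:iam::123456789012:role/example-replication-role",
    "arn:aws:s3:::example-bucket-b",
    prefix="videos/",
)

# Assumption: versioning is enabled on both buckets, and with boto3 the rule is applied as
#   boto3.client("s3").put_bucket_replication(
#       Bucket="example-bucket-a", ReplicationConfiguration=replication)
```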
  49. 49. Versioning with cross-region replication [Diagram: object versions Vid1-v1 through Vid1-v4 under key A/vid1 in bucket A, and the same versions under key B/vid1 in bucket B]
  50. 50. Cross-region replication with lifecycle archiving [Diagram components: S3 Bucket A, S3 Bucket B, Amazon Glacier]
  51. 51. Data ingestion into AWS storage services • Snowball: accelerate PBs with AWS-provided appliances; NEW 80 TB model • Storage Gateway: instant hybrid cloud; up to 120 MB/s cloud upload rate (4x improvement) • Firehose: ingest data streams directly into AWS data stores • Direct Connect: COLO to AWS • ISV connectors: Commvault, Veritas, etc. • S3 Transfer Acceleration (NEW): accelerate object transfer up to 300% using AWS’s private network
  52. 52. What is Snowball? Petabyte-scale data transport • E-ink shipping label • Ruggedized case (“8.5G Impact”) • All data encrypted end-to-end • 50 TB or 80 TB • 10 G network • Rain and dust resistant • Tamper-resistant case and electronics
  53. 53. Pricing • Usage charge per job: $250.00 • Extra day charge (first 10 days* are free): $15.00 • Data transfer in: $0.00/GB • Data transfer out: $0.02/GB • Shipping**: varies • Amazon S3 charges: standard storage and request fees apply. * Starts one day after the appliance is delivered to you. The first day the appliance is received at your site and the last day the appliance is shipped out are also free and not included in the 10-day free usage time. ** Shipping charges are based on your shipment destination and the shipping option (e.g., overnight, 2-day) you choose. Transfer 1 PB with 13 devices in parallel in 1 week!
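The "1 PB with 13 devices" figure and the per-job fees can be checked with a short calculation. The sketch assumes 1 PB = 1,000 TB, the full 80 TB of each appliance being usable, and one job per appliance; shipping and Amazon S3 charges are left out. The function names are illustrative.

```python
import math

# Fees quoted on the pricing slide (2016).
JOB_FEE = 250.00
EXTRA_DAY_FEE = 15.00
FREE_DAYS = 10
TRANSFER_OUT_PER_GB = 0.02


def devices_needed(data_tb, device_capacity_tb=80):
    """Number of appliances required to carry data_tb terabytes."""
    return math.ceil(data_tb / device_capacity_tb)


def job_cost(days_on_site, data_out_gb=0):
    """Cost of one Snowball job, excluding shipping and S3 charges."""
    extra_days = max(0, days_on_site - FREE_DAYS)
    return JOB_FEE + extra_days * EXTRA_DAY_FEE + data_out_gb * TRANSFER_OUT_PER_GB


devices = devices_needed(1000)       # 1 PB expressed in TB
import_cost = devices * job_cost(7)  # one week on site, import only
print(devices, import_cost)
```

Keeping each appliance within the free 10-day window leaves only the flat per-job fee, so a parallel one-week import is priced per device rather than per terabyte.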
  54. 54. Remember to complete your evaluations!
  55. 55. Thank you!