
Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad

Check these out next

1 of 78 Ad

Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks

Download to read offline

Organizations face significant challenges moving their applications to the cloud when they require a standard file system interface for accessing their cloud data. In this technical session, we will explore the world’s first cloud-scale file system and its targeted use cases. Attendees will learn about the Amazon Elastic File System (EFS) features and benefits, how to identify applications that are appropriate for use with Amazon EFS, and details about its performance and security models. We will highlight and demonstrate how to deploy Amazon EFS in one of our most common use cases and will share tips for success throughout.

Learning Objectives:
• Recognize why and when to use Amazon EFS
• Understand key technical/security concepts
• Learn how to leverage EFS’s performance
• See a demo of EFS in action
• Review EFS’s economics

  1. Deep Dive on Amazon EFS
     Edward Naim, Head of Product, Amazon EFS; Darryl Osborne, Storage Specialist Solutions Architect; David Green, Enterprise Solutions Architect
     February 23rd, 2017
     © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. What to expect from this webinar
     • Learn why and when to use Amazon EFS
     • Understand key technical & security concepts
     • Discover how to leverage EFS’s performance
     • See EFS in action: hands-on demos
     • Review EFS’s economics
     • Answer your questions (Q&A)
  3. Why & When to Use Amazon EFS
  4. The AWS Storage Portfolio (AWS Storage Platform and Solutions)
     • Object: Amazon S3, Amazon Glacier
     • Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
     • File: Amazon EFS
     • Cloud data migration: Direct Connect, Snow* data transport family, 3rd-party connectors, Transfer Acceleration, Storage Gateway, Kinesis Firehose
  5. Amazon EFS attributes
     1) Standard file system interface & semantics
     2) Shared storage
     3) Highly available
     4) Highly durable
     5) Consistent, low latencies
     6) Scalable (storage & throughput)
     7) Elastic capacity
     8) Fully managed
  6. We focused on changing the game: 1) Simple, 2) Elastic, 3) Scalable; highly durable and highly available
  7. Amazon EFS is Simple
     • Fully managed: no hardware, network, or file layer to manage; create a scalable file system in seconds!
     • Seamless integration with existing tools and apps: NFS v4.1 (widespread, open), standard file system access semantics, works with standard OS file system APIs
     • Simple pricing = simple forecasting
  8. Amazon EFS is Elastic
     • File systems grow and shrink automatically as you add and remove files
     • No need to provision storage capacity or performance
     • You pay only for the storage space you use, with no minimum fee
  9. Amazon EFS is Scalable
     • File systems can grow to petabytes of capacity
     • Throughput scales automatically as file systems grow
     • Consistent low latencies regardless of file system size
     • Support for thousands of concurrent NFS connections
  10. High Durability & High Availability
     • Every file system object is redundantly stored across multiple Availability Zones in a Region
     • Designed to sustain Availability Zone offline conditions
     • Superior to traditional NAS availability models
     • Appropriate for production/tier 0 applications
  11. In which Regions can I use EFS today?
     • US West (Oregon), US East (N. Virginia), US East (Ohio), EU (Ireland)
     • More coming soon!
  12. Do you need an EFS file system?
     If you have an application (EC2 or on-premises) or use case that requires a file system AND it:
     • Requires multi-attach, OR
     • Needs GB/s of throughput, OR
     • Needs Multi-AZ availability/durability, OR
     • Requires automatic scaling (grow/shrink) of storage
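The checklist above is a simple boolean rule: a file system requirement AND any one of four further needs. A minimal Python sketch of that rule follows; the `Workload` class and its field names are hypothetical labels for the slide's criteria, not part of any AWS API.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical fields, one per criterion on the slide.
    needs_file_system: bool      # the app or use case requires a file system interface
    needs_multi_attach: bool     # many instances must share the same data
    needs_gbps_throughput: bool  # aggregate throughput in the GB/s range
    needs_multi_az: bool         # Multi-AZ availability/durability
    needs_auto_scaling: bool     # storage must grow and shrink automatically


def efs_is_a_candidate(w: Workload) -> bool:
    """A file system requirement AND at least one of the four extra needs."""
    return w.needs_file_system and (
        w.needs_multi_attach
        or w.needs_gbps_throughput
        or w.needs_multi_az
        or w.needs_auto_scaling
    )


# Example: a shared web-content directory read by a fleet of web servers.
web_content = Workload(True, True, False, True, False)
print(efs_is_a_candidate(web_content))  # True
```

Writing the rule out makes the AND/OR structure explicit: an application with none of the four extra needs can usually stay on a single-instance block volume.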
  13. What customers are using EFS for today: web serving, content management, analytics, media and entertainment workflows, workflow management, home directories, container storage, database backups
  14. Understand Key Technical and Security Concepts
  15. What is a file system?
     • The primary resource in EFS
     • Where you store files and directories
     • You can create up to 125 file systems per account
  16. What is a mount target?
     • To access your file system within a VPC, you create mount targets in the VPC
     • A mount target is an NFS endpoint that lives in your VPC
     • A mount target has an IP address and a DNS name you use in your mount command
     • A mount target is highly available
     [Diagram: a Region with three Availability Zones, a VPC, EC2 instances, and a mount target]
  17. How to access a file system from an instance
     • You “mount” a file system on an Amazon EC2 instance with the standard command; the file system then appears as a local set of directories and files
     • An NFS v4.1 client is standard on Linux distributions
     mount -t nfs4 -o nfsvers=4.1 [file system DNS name]:/ /[user’s target directory]
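The mount command can be scripted. The Python sketch below only assembles and returns the command by default (a dry run) and executes it when `dry_run=False`; the file system DNS name and mount point are hypothetical placeholders, and actually mounting requires root privileges and network access to a mount target in the instance's VPC.

```python
import shlex
import subprocess
from typing import List


def efs_mount_argv(fs_dns_name: str, target_dir: str) -> List[str]:
    """argv for the slide's mount command, using the standard Linux NFS v4.1 client."""
    return [
        "mount",
        "-t", "nfs4",          # file system type
        "-o", "nfsvers=4.1",   # NFS protocol version
        f"{fs_dns_name}:/",    # the root of the EFS file system
        target_dir,            # an existing local directory to mount onto
    ]


def mount_efs(fs_dns_name: str, target_dir: str, dry_run: bool = True) -> str:
    """Return the shell-quoted command; run it only when dry_run is False."""
    argv = efs_mount_argv(fs_dns_name, target_dir)
    if not dry_run:
        subprocess.run(argv, check=True)  # needs root and a reachable mount target
    return shlex.join(argv)


# Hypothetical DNS name and mount point, for illustration only.
print(mount_efs("fs-12345678.efs.us-west-2.amazonaws.com", "/mnt/efs"))
```

Passing an argument list to `subprocess.run` (rather than one shell string) avoids quoting problems if the target path contains spaces.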
  18. How does it all fit together?
     [Diagram: EC2 instances in three Availability Zones of a Region, inside one VPC, all accessing a single file system]
     Data can be accessed from any AZ in the Region while maintaining full consistency.
  19. Several security mechanisms
     • Control network traffic to and from file systems (mount targets) by using VPC security groups and network ACLs
     • Control file and directory access by using POSIX permissions
     • Control administrative access (API access) to file systems by using AWS Identity and Access Management (IAM)
     • EFS supports action-level and resource-level permissions
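As a sketch of the IAM side of these mechanisms, the Python below builds a policy document that combines an action-level permission list with a resource-level ARN for one file system. The ARN pattern and the two `elasticfilesystem:` action names are assumed to match the EFS IAM documentation; the region, account ID, and file system ID are hypothetical, and it is worth confirming which actions accept resource-level permissions before relying on this.

```python
import json


def read_only_efs_policy(region: str, account_id: str, file_system_id: str) -> str:
    """Sketch of an IAM policy allowing two describe calls on a single file system."""
    resource_arn = (
        f"arn:aws:elasticfilesystem:{region}:{account_id}:"
        f"file-system/{file_system_id}"
    )
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Action-level permissions: only these API actions are allowed.
                "Action": [
                    "elasticfilesystem:DescribeFileSystems",
                    "elasticfilesystem:DescribeMountTargets",
                ],
                # Resource-level permissions: only this file system.
                "Resource": resource_arn,
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Hypothetical identifiers, for illustration only.
print(read_only_efs_policy("us-west-2", "123456789012", "fs-12345678"))
```

Network access (security groups) and file access (POSIX permissions) are enforced separately from this policy, so all three layers need to be configured.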
  20. Access your EFS file system via AWS Direct Connect
     [Diagram: on-premises servers connect over AWS Direct Connect to EFS in your Amazon VPC]
  21. Direct Connect support addresses three of the four hybrid scenarios: bursting, migration, tiering, and backup/DR
  22. Learn How to Leverage EFS’s Performance
  23. Amazon EFS is designed for a wide spectrum of performance needs, ranging from high throughput and parallel I/O to low latency and serial I/O
     Example workloads across the spectrum: genomics, big data analytics, scale-out jobs, home directories, content management, web serving, metadata-intensive jobs
  24. Choose the performance mode best suited to your workload
     • General Purpose (default)
       What it’s for: latency-sensitive applications and general-purpose workloads
       Advantages: lowest latencies for file operations
       Tradeoffs: limit of 7,000 ops/sec
       When to use: best choice for most workloads
     • Max I/O
       What it’s for: large-scale and data-heavy applications
       Advantages: virtually unlimited ability to scale out throughput/IOPS
       Tradeoffs: slightly higher latencies
       When to use: consider if 10s (or more) instances access your file system concurrently
  25. Use the PercentIOLimit CloudWatch metric to determine whether you’re constrained by General Purpose mode
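The performance-mode guidance on slides 24 and 25 can be turned into a rough heuristic. In this Python sketch, the 95% "near the limit" threshold and the cutoff of 10 concurrent instances are assumptions read from the slide wording ("at or near 100%", "10s (or more) instances"); the return values use the EFS API's mode names `generalPurpose` and `maxIO`. Treat the result as a candidate to benchmark, not a verdict.

```python
from typing import Sequence


def suggest_performance_mode(percent_io_limit: Sequence[float],
                             concurrent_instances: int,
                             near_limit_pct: float = 95.0,
                             many_instances: int = 10) -> str:
    """Return "generalPurpose" or "maxIO" as a candidate to test, not a final answer."""
    # PercentIOLimit at or near 100% means General Purpose mode is the constraint.
    hitting_limit = any(p >= near_limit_pct for p in percent_io_limit)
    # "Consider [Max I/O] if 10s (or more) instances access your file system concurrently."
    scale_out = concurrent_instances >= many_instances
    return "maxIO" if hitting_limit or scale_out else "generalPurpose"


# PercentIOLimit samples from a test run on 4 instances (illustrative values).
print(suggest_performance_mode([42.0, 97.5, 88.0], concurrent_instances=4))  # maxIO
```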
  26. Amazon EFS has a distributed data storage design
     [Diagram: many groups of EC2 instances accessing the distributed file system]
     • File systems are distributed across an unconstrained number of servers
     • Avoids the bottlenecks/constraints of traditional file servers
     • Enables high levels of aggregate IOPS/throughput
     • Data is also distributed across Availability Zones (durability, availability)
  27. How to think about EFS performance relative to EBS (Amazon EFS vs. Amazon EBS PIOPS)
     Performance
     • Per-operation latency: EFS low, consistent; EBS PIOPS lowest, consistent
     • Throughput scale: EFS multiple GBs per second; EBS PIOPS single GB per second
     Characteristics
     • Data availability/durability: EFS stored redundantly across multiple AZs; EBS PIOPS stored redundantly in a single AZ
     • Access: EFS 1 to 1000s of EC2 instances, from multiple AZs, concurrently; EBS PIOPS a single EC2 instance in a single AZ
     Use cases
     • EFS: big data and analytics, media processing workflows, content management, web serving, home directories
     • EBS PIOPS: boot volumes, transactional and NoSQL databases, data warehousing & ETL
  28. An implication of per-operation latency: I/O size impacts the throughput of serialized operations
     [Chart: throughput vs. I/O size, for I/O sizes of 4 KB, 32 KB, 256 KB, 2 MB, and 16 MB]
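Why I/O size matters for serialized operations: with only one operation in flight, each operation must finish before the next starts, so throughput is I/O size divided by per-operation latency. The Python sketch below assumes a constant latency (5 ms, an illustrative value rather than a measured EFS number) regardless of I/O size, which overstates the gain for very large I/Os but shows the trend.

```python
KIB = 1024
MIB = 1024 * KIB


def serialized_throughput(io_size_bytes: int, latency_s: float) -> float:
    """Bytes/sec when each I/O must complete before the next one starts."""
    # One operation in flight at a time: ops/sec = 1 / latency.
    return io_size_bytes / latency_s


ASSUMED_LATENCY_S = 0.005  # illustrative, not a measured EFS latency

for size in (4 * KIB, 32 * KIB, 256 * KIB, 2 * MIB, 16 * MIB):
    mib_per_s = serialized_throughput(size, ASSUMED_LATENCY_S) / MIB
    print(f"{size // KIB:>6} KiB per op -> {mib_per_s:8.1f} MiB/s")
```

Under this model, going from 4 KiB to 2 MiB I/Os raises serialized throughput 512-fold, which is why aggregating I/O into larger requests is one of the optimization levers.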
  29. How to take advantage of EFS’s distributed architecture: parallelize via multiple threads and/or multiple instances
     [Chart: aggregate IOPS of parallel writes using 10 m4.xlarge instances; IOPS (0 to 30,000) vs. total number of threads (0 to 160)]
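A minimal Python sketch of the thread-level half of "parallelize": each worker thread keeps its own write in flight, so per-operation latencies overlap instead of adding up. The file names, payload, and thread count are illustrative; to exercise EFS, point `target_dir` at a directory under your mount point instead of the temporary directory used in the demo.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_file(path: str, payload: bytes) -> int:
    with open(path, "wb") as f:
        f.write(payload)
    return len(payload)


def parallel_write(target_dir: str, n_files: int, size: int, n_threads: int) -> int:
    """Write n_files files of `size` bytes using n_threads concurrent writers."""
    payload = os.urandom(size)
    paths = [os.path.join(target_dir, f"file_{i:06d}.bin") for i in range(n_files)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Each thread blocks on its own I/O, so many operations are outstanding at once.
        return sum(pool.map(lambda p: write_file(p, payload), paths))


if __name__ == "__main__":
    # Temporary local directory so the sketch runs anywhere; use an EFS path on EC2.
    with tempfile.TemporaryDirectory() as d:
        print(parallel_write(d, n_files=100, size=4096, n_threads=16), "bytes written")
```

The other half, multiple instances, is the same idea one level up: run this on several instances against the same file system.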
  30. Use CloudWatch for a number of views of file system performance
     • DataReadIOBytes, DataWriteIOBytes, MetadataIOBytes, TotalIOBytes: measure throughput (‘Sum’ of bytes divided by seconds in the time period) or ops/sec (‘Data Samples’ divided by seconds in the time period)
     • BurstCreditBalance: monitor your burst credit usage over time to ensure sufficient throughput capacity
     • PermittedThroughput: compare to actual throughput to determine whether you’re being constrained by the burst model
     • ClientConnections: view the number of clients connected to your file system
     • PercentIOLimit: determine whether you’re being constrained by General Purpose mode (PercentIOLimit at or near 100%)
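The throughput and ops/sec arithmetic on this slide, as a Python sketch. It assumes datapoints shaped like those returned by boto3's CloudWatch `get_metric_statistics` (namespace `AWS/EFS`, requested with `Statistics=["Sum", "SampleCount"]`), that the console's "Data Samples" corresponds to the `SampleCount` statistic, and that PermittedThroughput is compared in bytes per second. The 95% margin in `burst_constrained` is an arbitrary assumption.

```python
from typing import Dict

PERIOD_S = 60  # the Period (seconds) used when requesting the statistics


def throughput_bytes_per_s(datapoint: Dict[str, float], period_s: int = PERIOD_S) -> float:
    """Throughput: the 'Sum' of an *IOBytes metric divided by seconds in the period."""
    return datapoint["Sum"] / period_s


def ops_per_s(datapoint: Dict[str, float], period_s: int = PERIOD_S) -> float:
    """Ops/sec: the 'SampleCount' of an *IOBytes metric divided by seconds in the period."""
    return datapoint["SampleCount"] / period_s


def burst_constrained(actual_bps: float, permitted_bps: float, margin: float = 0.95) -> bool:
    """True when actual throughput sits at or near PermittedThroughput."""
    return actual_bps >= margin * permitted_bps


# Illustrative TotalIOBytes datapoint for one 60-second period (not real data).
dp = {"Sum": 6_291_456_000.0, "SampleCount": 120_000.0}
print(throughput_bytes_per_s(dp) / 2**20, "MiB/s")  # 100.0 MiB/s
print(ops_per_s(dp), "ops/s")                       # 2000.0 ops/s
```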
  31. Recommended kernel version and NFS mount options
     • Kernel version: use Linux kernel 4.0+ (e.g., Amazon Linux 2016.03.0, Ubuntu 15.10 or 16.04)
     • Mount options: mount via NFSv4.1; specify 1 MB read/write buffers (“rsize”/“wsize”); ensure operations are asynchronous
     Recommended mount options (one comma-separated list, no spaces):
     -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,async
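The recommended options as a Python sketch that renders both a one-off mount command and an `/etc/fstab` line. The option list must reach `mount` as a single comma-separated token. The DNS name and mount point are hypothetical, and the trailing `0 0` fields (no dump, no fsck) are a common choice for network file systems rather than something the slide specifies.

```python
RECOMMENDED_OPTIONS = (
    "nfsvers=4.1",
    "rsize=1048576",  # 1 MiB read buffer
    "wsize=1048576",  # 1 MiB write buffer
    "hard",           # keep retrying NFS requests instead of returning errors
    "timeo=600",      # timeout in tenths of a second (60 s) before a retry
    "retrans=2",      # retries before further recovery action (see nfs(5))
    "async",
)


def mount_option_string() -> str:
    # A single comma-separated token: a space would split the -o argument.
    return ",".join(RECOMMENDED_OPTIONS)


def fstab_entry(fs_dns_name: str, mount_point: str) -> str:
    """fstab fields: device, mount point, type, options, dump, fsck pass."""
    return f"{fs_dns_name}:/ {mount_point} nfs4 {mount_option_string()} 0 0"


dns = "fs-12345678.efs.us-west-2.amazonaws.com"  # hypothetical
print("mount -t nfs4 -o", mount_option_string(), f"{dns}:/ /mnt/efs")
print(fstab_entry(dns, "/mnt/efs"))
```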
  32. See EFS in Action: Move Data
  33. Goal: Move Data Quickly!!
  34. Two Scenarios:
  35. Transferring media assets to EFS
     • File sizes range from a few GB to 100+ GB per file
     • Data sources: Amazon S3, Amazon EBS
  36. Transferring many small files to EFS
     • File sizes range from 64 KB to 256 KB
     • Data sources: Amazon S3, Amazon EBS
  37. Serial vs Parallel
  38. Serial file transfer
  39. Parallel file transfer
  40. How do we do this?
  41. GNU parallel (“For people who live life in the parallel lane”)
     • Tool for executing jobs in parallel
     • Similar to xargs
     • Replaces loops in shell scripts
     • GNU parallel makes sure the output from the commands is the same as you would get if you had run the commands sequentially
     https://www.gnu.org/software/parallel/
  42. Use parallel threads – GNU parallel (run from the source directory, with N_THREADS and DST_DIR set)
     # Create destination directory tree from source
     find . -type d -print0 | parallel -j "$N_THREADS" -0 "mkdir -p ${DST_DIR}/{}" > /dev/null 2>&1
     # Copy files (everything that is not a directory)
     find . ! -type d -print0 | parallel -j "$N_THREADS" -0 "cp -f {} ${DST_DIR}/{}"
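For environments without GNU parallel, the same two-phase idea (create the directory tree first, then copy files concurrently) can be sketched in Python with a thread pool. This is a swapped-in technique for illustration, not the presenters' tooling; `src`, `dst`, and `n_threads` correspond loosely to the current directory, `DST_DIR`, and `N_THREADS` in the shell version. Note that `shutil.copy2` also preserves timestamps, which plain `cp -f` does not.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def parallel_copy_tree(src: str, dst: str, n_threads: int) -> int:
    """Create the directory tree, then copy files with n_threads workers."""
    dirs, files = [], []
    for root, _subdirs, names in os.walk(src):
        rel = os.path.relpath(root, src)
        dirs.append(rel)
        files.extend(os.path.join(rel, n) for n in names)

    # Phase 1: recreate the directory tree (serially here; it is cheap).
    for rel in dirs:
        os.makedirs(os.path.join(dst, rel), exist_ok=True)

    # Phase 2: copy files concurrently; list() surfaces any worker exception.
    def copy_one(rel: str) -> None:
        shutil.copy2(os.path.join(src, rel), os.path.join(dst, rel))

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(copy_one, files))
    return len(files)


if __name__ == "__main__":
    # Demo on temporary local directories; on EC2, dst would be under the EFS mount.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        os.makedirs(os.path.join(src, "a", "b"))
        for rel in ("top.txt", os.path.join("a", "one.txt"), os.path.join("a", "b", "two.txt")):
            with open(os.path.join(src, rel), "w") as f:
                f.write(rel)
        print(parallel_copy_tree(src, dst, n_threads=8), "files copied")
```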
  43. Optimizing Transfers
  44. Monitoring performance
     • Data-driven results
     • Repeatable outcomes
     • Optimize for costs
  45. Benchmark different instance types
     • Determine the optimal instance size: which is best? T2, C3, C4, M3, M4, R3, X?
     • Transfer a test set of 1,000 small files
     • Increase the thread count from 1 to 1,024 concurrent threads
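A sketch of the thread-count sweep described above, in Python. `run` is a hypothetical callable you supply (for example, one that copies the 1,000-file test set to a fresh directory on EFS with a given number of threads and returns how many files it moved); the harness only times each run and reports files per minute.

```python
import time
from typing import Callable, Dict, Iterable


def files_per_minute(run: Callable[[int], int], n_threads: int) -> float:
    """Time one call of run(n_threads), which returns the number of files moved."""
    start = time.perf_counter()
    n_files = run(n_threads)
    elapsed = time.perf_counter() - start
    return n_files / elapsed * 60


def sweep_threads(run: Callable[[int], int], thread_counts: Iterable[int]) -> Dict[int, float]:
    """Files per minute for each thread count, in the order given."""
    return {t: files_per_minute(run, t) for t in thread_counts}


def best_thread_count(results: Dict[int, float]) -> int:
    """Thread count with the highest measured files per minute."""
    return max(results, key=results.get)


# Usage sketch: thread counts 1, 2, 4, ..., 1024, with your own `run` function.
THREAD_COUNTS = [2 ** k for k in range(11)]
```

Running the same sweep on each candidate instance type, and logging the results, gives the data-driven, repeatable comparison the slides call for.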
  46. Tools
     • Command orchestration
     • Instance configuration
     • Log collection
     • Visualization
     • Instance performance
  47. Test Results – Large Files
  48. Large Files: Four Instances
  49. Large Files: Four Instances
  50. Adding Additional Instances
  51. Large Files: 50 Instances
  52. Test Results – Small Files
  53. Small File Performance: Instance Family Test (~200 threads)
  54. c3.large – 5,342 files per minute @ 200 threads
  55. Increase Instance Count
     • Use the optimal instance size: c3.large
     • Use the optimal thread count: ~200 per instance
     • Increase the instance count: 300 instances
     • Optimize for costs: EC2 Spot market
  56. EC2 Spot
  57. c3.large – 300 instances
  58. Summary / tl;dr
  59. Results: small files with 300 instances; large files with 50 instances
  60. Demo
  61. Summary / tl;dr
     • Parallelize everything: threads and instances
     • Test, test, test
     • Capture & analyze test data
     • Less than $5/hr for 300 instances
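A back-of-the-envelope Python sketch combining the per-instance rate on slide 54 with the fleet size and cost on slides 57 and 61. It assumes throughput scales linearly with instance count, which the slides do not state, so the aggregate figure is an upper-bound estimate rather than a reported result; because the fleet cost is quoted as "less than $5/hr", the cost figures are upper bounds too.

```python
def aggregate_files_per_minute(per_instance_rate: float, instances: int) -> float:
    """Upper-bound estimate: assumes per-instance throughput holds as the fleet grows."""
    return per_instance_rate * instances


def cost_per_instance_hour(fleet_cost_per_hour: float, instances: int) -> float:
    return fleet_cost_per_hour / instances


def cost_per_million_files(fleet_cost_per_hour: float, files_per_minute: float) -> float:
    files_per_hour = files_per_minute * 60
    return fleet_cost_per_hour / files_per_hour * 1_000_000


rate = aggregate_files_per_minute(5_342, 300)  # c3.large @ ~200 threads, 300 instances
print(f"{rate:,.0f} files/min at most")
print(f"${cost_per_instance_hour(5.0, 300):.4f} per instance-hour at most")
print(f"${cost_per_million_files(5.0, rate):.3f} per million files (if scaling is linear)")
```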
  62. See EFS in Action: Web Serving
  63. Content Management & Web Serving: web-based applications for creating and managing website content, such as wikis, blogs, and discussion boards
  64. WordPress: free and open-source content management system hosted on a web platform
     • Web software to create beautiful websites, blogs, or apps
     • “Free and priceless at the same time” – WordPress.org
     • CODE IS POETRY
  65. WordPress is Popular
     • 27% of all websites (November 2016) – Web Technology Surveys
     • Easiest and most popular blogging system in use on the Web – CMS Usage Statistics
     • Supporting more than 60 million websites – Forbes
  66. How are people running WordPress today? Available as:
     • A managed web hosting service
     • A software package from WordPress.org installed on a self-provisioned web platform, like AWS
  67. Web server on Amazon EC2 (Amazon Linux, Apache, PHP, OPcache); structured data (posts, pages, comments, categories, tags, etc.) in Amazon RDS; unstructured data (directories, PHP files, config, themes, plugins, etc.) on Amazon EFS
  68. WordPress Demo
  69. Reference Architecture (coming soon): https://aws.amazon.com/architecture/
  70. Economics
  71. Simple and predictable pricing
     • With Amazon EFS, you pay only for the storage space you use: no minimum commitments or up-front fees; no need to provision storage in advance; no other fees, charges, or billing dimensions
     • EFS price: $0.30/GB-month (US Regions), $0.33/GB-month (EU Ireland)
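How "pay only for the storage space you use" translates into a bill, as a Python sketch. It assumes usage is metered as the average amount stored over the month (sampled hourly here), a simplification for illustration rather than a statement of AWS's exact metering; the per-GB-month prices are the February 2017 figures from this slide.

```python
from typing import Sequence

# February 2017 EFS prices from the slide, in $ per GB-month.
PRICE_US = 0.30
PRICE_EU_IRELAND = 0.33


def monthly_storage_cost(hourly_gb: Sequence[float], price_per_gb_month: float) -> float:
    """Average GB stored over the month (hourly samples) times the GB-month price."""
    if not hourly_gb:
        return 0.0
    average_gb = sum(hourly_gb) / len(hourly_gb)
    return average_gb * price_per_gb_month


# Under this metering assumption, holding 1,000 GB for half of a 720-hour month and
# then deleting it costs the same as holding 500 GB all month: nothing is provisioned.
half_full = [1000.0] * 360 + [0.0] * 360
print(round(monthly_storage_cost(half_full, PRICE_US), 2))  # 150.0
```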
  72. Typical multi-AZ file system setup without EFS
     [Diagram: a Region with three Availability Zones; EC2 compute nodes to manage a 3rd-party file system layer, replicated EBS storage volumes, inter-AZ traffic for replication, and an EC2 NFS client accessing the file system over NFS]
  73. TCO example
     Let’s say you need to store ~500 GB and require high availability and durability. Using a shared file layer on top of EBS, you might provision 600 GB (at ~85% utilization) and fully replicate the data to a second Availability Zone for availability/durability.
     Example comparative cost:
     • Storage (2x 600 GB EBS gp2 volumes): $120 per month
     • Compute (2x m4.xlarge instances): $350 per month
     • Inter-AZ data transfer costs (est.): $129 per month
     • Total: $599 per month
     EFS cost is 500 GB × $0.30/GB-month = $150 per month, with no additional charges.
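The TCO arithmetic on this slide, reproduced in Python so the inputs can be swapped for your own. The $0.10/GB-month gp2 price is implied by the slide's $120 for 1,200 GB of volumes; the compute and inter-AZ figures are the slide's estimates, not fresh quotes.

```python
def diy_monthly_cost(ebs_gb: float, ebs_volumes: int, ebs_price_gb_month: float,
                     compute_per_month: float, inter_az_per_month: float) -> float:
    """Self-managed, replicated file layer on EBS (the slide 72 setup)."""
    storage = ebs_gb * ebs_volumes * ebs_price_gb_month
    return storage + compute_per_month + inter_az_per_month


def efs_monthly_cost(stored_gb: float, price_gb_month: float) -> float:
    """EFS bills for stored data only: no provisioning, compute, or replication traffic."""
    return stored_gb * price_gb_month


diy = diy_monthly_cost(600, 2, 0.10, 350.0, 129.0)
efs = efs_monthly_cost(500, 0.30)
print(f"DIY ${diy:.0f}/month vs EFS ${efs:.0f}/month, saving ${diy - efs:.0f}/month")
# DIY $599/month vs EFS $150/month, saving $449/month
```

Note that the DIY side pays for provisioned capacity (600 GB per copy) while EFS is billed on the ~500 GB actually stored.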
  74. Summary
  75. Key Recommendations
     • Test your application!
     • Use General Purpose mode for lowest latency, Max I/O for scale-out
     • Use Linux kernel version 4.0 or newer; mount via NFSv4.1
     • To optimize, look for opportunities to: aggregate I/O; perform async operations; parallelize (demo later); cache (demo later)
     • Don’t forget to check your burst credit earn/spend rate when testing, and ensure a sufficient amount of storage
  76. Coming Soon: Encryption of data at rest
     • Integrated with AWS Key Management Service
     • Encryption/decryption handled transparently
     • No extra cost
  77. Additional Resources
     • Amazon EFS Site: https://aws.amazon.com/efs/
     • Amazon EFS User Guide: https://docs.aws.amazon.com/efs/latest/ug/whatisefs.html
     • AWS 10-Minute Tutorials: https://aws.amazon.com/getting-started/tutorials/
     • Reference Architecture, WordPress on EFS (coming soon): https://aws.amazon.com/architecture/
     • qwikLABS: https://aws.qwiklabs.com/
     • YouTube: Amazon Web Services Channel
  78. Thank you!
