Big Data and Analytics Workloads on Amazon EFS - AWS Online Tech Talks
1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Webinar
Darryl S. Osborne
Solutions Architect – AWS File Services
Big Data and Analytics Workloads on Amazon EFS
Joe Disher
Sr. Product Marketing Manager – Amazon EFS - AWS
2.
Big Data and Analytics Workloads on Amazon EFS
Phase 1: Choose Storage Platform
Phase 2: Big Data and Analytics Defined
Phase 3: Big Data and Analytics Workloads
Phase 4: Wrapping Up
3.
Phase 1: Choose Storage Platform
4.
What do you think about when choosing a storage solution?
Storage type
Features and performance
Economics
5.
Three types of storage
File
Block
Object
6.
Three types of storage
File
Data stored as files in a directory hierarchy
Shared over a network
7.
Three types of storage
Block
Data stored as blocks on a disk or disks
Locally attached
8.
Three types of storage
Object
Data is stored as an object that’s identified by a key in a flat space
Simple API to get and put data based on key
9.
Why is file storage so popular?
Works natively with operating systems
Provides shared access while providing consistency guarantees and locking functionality
Provides hierarchical namespace
10.
How does performance compare?
Chart: File, Object, and Block storage compared by latency and throughput
11.
Before Amazon EFS… DIY file storage costs
Diagram: clients accessing file servers and their storage volumes in two Availability Zones (AZ-a and AZ-b)
12.
Amazon EFS
A fully managed file service
17.
Key features of Amazon EFS
Simple
Elastic
Scalable
Highly available and durable
22.
Security model & security features
Control network traffic using Amazon VPC security groups and network ACLs
Control file and directory access using POSIX permissions
Control administrative access (API access) using AWS IAM (action-level and resource-level permissions)
Encryption of data at rest and in transit
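The file and directory access control above relies on ordinary POSIX mode bits. A minimal sketch of tightening a directory's permissions follows; it uses a temporary local directory as a stand-in for a directory on a mounted EFS file system, and the helper name restrict_to_owner_and_group is hypothetical, not part of any AWS tooling.

```python
import os
import stat
import tempfile


def restrict_to_owner_and_group(path):
    """Set mode 0o750: owner rwx, group r-x, no access for others."""
    os.chmod(path, 0o750)
    # Return the resulting mode in the familiar `ls -l` notation.
    return stat.filemode(os.stat(path).st_mode)


# Stand-in for a directory on a mounted file system (assumption for the demo).
with tempfile.TemporaryDirectory() as shared_dir:
    print(restrict_to_owner_and_group(shared_dir))
```

Network-level controls (security groups, network ACLs) and IAM policies are configured separately and are not shown here.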
23.
Where is Amazon EFS available today?
• US West (Oregon)
• US West (N. California)
• US East (N. Virginia)
• US East (Ohio)
• EU (Ireland)
• EU (Frankfurt)
• Asia Pacific (Sydney)
• Asia Pacific (Seoul)
More coming soon!
24.
Amazon EFS economics
$0.30/GB-Month (US N. Virginia, Ohio, Oregon)
$0.33/GB-Month (US N. California, EU Ireland, AP Seoul)
$0.36/GB-Month (EU Frankfurt, AP Sydney)
No minimum commitments or up-front fees
No need to provision storage in advance
No other fees, charges, or billing dimensions
25.
Before Amazon EFS… DIY file storage costs
Diagram: clients accessing file servers and their storage volumes in two Availability Zones (AZ-a and AZ-b)
• Amazon EC2 instance costs
• Inter-AZ data transfer costs
• Amazon EBS volume costs
26.
TCO example
Amazon EFS cost: (500 GB * $0.30/GB-month*) = $150 per month
For DIY, you might provision 600 GB of Amazon EBS (i.e., ~85% utilization):
Storage (2x 600 GB EBS gp2 volumes): $120 per month
Compute (2x m4.xlarge instances): $290 per month
Inter-AZ data transfer costs (est.): $130 per month
Total $540 per month
For storing 500 GB, Amazon EFS costs roughly 70% less than DIY ($150 vs. $540 per month)
* US N. Virginia pricing
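The arithmetic behind the TCO example can be checked with a short sketch. All figures are copied from the slide (2018, US N. Virginia); the variable and function names are illustrative only.

```python
# Figures quoted on the TCO slide; not current pricing.
EFS_PRICE_PER_GB_MONTH = 0.30
STORED_GB = 500

# DIY monthly line items exactly as listed on the slide (USD).
DIY_MONTHLY_COSTS = {
    "storage_2x_600gb_ebs_gp2": 120,
    "compute_2x_m4_xlarge": 290,
    "inter_az_transfer_est": 130,
}


def efs_monthly_cost(stored_gb, price_per_gb_month):
    """EFS bills only for the data actually stored."""
    return stored_gb * price_per_gb_month


def savings_fraction(efs_cost, diy_cost):
    """Fraction by which the EFS cost is lower than the DIY cost."""
    return 1 - efs_cost / diy_cost


efs_cost = efs_monthly_cost(STORED_GB, EFS_PRICE_PER_GB_MONTH)
diy_cost = sum(DIY_MONTHLY_COSTS.values())
savings = savings_fraction(efs_cost, diy_cost)
print(f"EFS ${efs_cost:.0f}/month, DIY ${diy_cost}/month, savings {savings:.0%}")
```

With these inputs the savings come out at about 72%, which the slide rounds to 70%.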
27.
Designed for a wide spectrum of needs
Scale-out jobs
Metadata-intensive jobs
Analytics
Media workflows
Enterprise apps and messaging
Web serving
Content management
Database backups
Container storage
Dev tooling
Home directories
Low latency and serial I/O
High throughput and parallel I/O
28.
Amazon EFS customers and partners
New York University
University of Pennsylvania
Cornell University
29.
Phase 2: Big Data and Analytics Defined
33.
What is Big Data and analytics?
Volume
TiBs & PiBs
Velocity
Collected, stored, processed, analyzed within a relatively short window
Variety
Web logs, social media interactions, transactions
38.
What is the common data flow of Big Data?
1. Collect
• Raw data
2. Store
• Secure, scalable, durable
• Pre, post, temp
3. Process and analyze
• Transform
4. Consume and visualize
• Tools to explore data
39.
Phase 3: Big Data and Analytics Workloads
40.
Big Data workloads
Run custom code for virtually any type of data application
Easily run and scale big data processing frameworks on managed clusters
Use Amazon EFS as a durable, decoupled, and secure file system accessible to all nodes in the cluster
41.
Writing objects in parallel into Amazon EFS
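One way to write many files in parallel is a thread pool that spreads the files across several directories. The sketch below is an illustration under stated assumptions, not the method shown in the webinar: the function names, directory layout, and file names are hypothetical, and the demo writes to a temporary local directory where a real run would point at an EFS mount point.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_file(path, payload):
    """Write one file, creating its parent directory if needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(payload)
    return path


def write_in_parallel(root, num_files, payload, num_dirs=4, max_workers=8):
    """Write num_files copies of payload under root using a thread pool."""
    # Spread files round-robin over num_dirs subdirectories.
    paths = [
        os.path.join(root, f"dir{i % num_dirs}", f"object-{i:05d}.bin")
        for i in range(num_files)
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: write_file(p, payload), paths))


# Demo against a temporary directory standing in for an EFS mount point.
with tempfile.TemporaryDirectory() as demo_root:
    written = write_in_parallel(demo_root, num_files=16, payload=b"x" * 1024 * 1024)
    print(f"wrote {len(written)} files")
```

Each file is written with a single 1 MiB write call, in keeping with the deck's advice to favor large I/O sizes, multiple threads, and multiple directories.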
43.
Informatica PowerCenter Quick Start
https://aws.amazon.com/quickstart/architecture/informatica-powercenter/
44.
SAS Grid
SAS Grid is an analytics computing environment
Use Amazon EFS to share bootstrap information and shared storage among all the machines in the grid
45.
SAS Grid whitepaper
https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/1866-2018.pdf
46.
SAS Grid recommendations when using EFS
• If placing SAS Home and SAS Configuration files on an Amazon EFS file system, disable file locks in the sasv9.cfg file for the home directory
“-filelocks <HOME> none”
• Where <HOME> is the directory containing the SAS Home and SAS Configuration files
• Any one particular file can have up to 87 locks across all users of the file system
47.
SAS Grid recommendations when using EFS
• Use an appropriately sized Amazon EC2 instance for your workload, keeping network performance in mind
• i3.8xlarge instances are recommended by SAS due to CPU, memory, instance store (temp space), and consistent 10 Gbps network performance
• Maximum throughput each Amazon EC2 instance can drive to a single file system is 250 MB/s
• Mount multiple file systems to achieve greater than 250 MB/s throughput to EFS
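The sizing logic in these recommendations can be sketched as simple arithmetic. The 250 MB/s per-file-system limit and the 10 Gbps network figure come from the slide; the function names are hypothetical, and the network ceiling is a theoretical upper bound that ignores protocol overhead, not a measured result.

```python
import math

PER_FS_LIMIT_MB_S = 250   # per-instance, per-file-system limit quoted on the slide
NETWORK_GBPS = 10         # i3.8xlarge network performance quoted on the slide


def network_ceiling_mb_s(gbps):
    """Convert link speed to MB/s (decimal units, no protocol overhead)."""
    return gbps * 1000 / 8


def file_systems_needed(target_mb_s, per_fs_limit=PER_FS_LIMIT_MB_S):
    """Smallest number of mounted file systems that can reach the target."""
    return math.ceil(target_mb_s / per_fs_limit)


def max_aggregate_mb_s(num_fs, gbps=NETWORK_GBPS, per_fs_limit=PER_FS_LIMIT_MB_S):
    """Upper bound for one instance: file system limits capped by the network."""
    return min(num_fs * per_fs_limit, network_ceiling_mb_s(gbps))


print(file_systems_needed(1000))   # file systems needed to target 1000 MB/s
print(max_aggregate_mb_s(8))       # bound for 8 mounted file systems
```

Under these assumptions, beyond five file systems the 10 Gbps network link, rather than the per-file-system limit, becomes the bound for a single instance.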
48.
SAS Grid performance results from one node
During the SAS POC, we ran a SAS simulation load from one i3.8xlarge instance with 8 EFS file systems mounted
49.
SAS Grid Quick Start
https://aws.amazon.com/quickstart/architecture/sas-grid-infrastructure/
50.
Wrapping up
51.
Best practices using Amazon EFS
• Linux kernel 4.0+
• EFS mount helper (NFSv4.1)
• Multiple instances
• Large IO size (aggregate IO)
• Multiple threads
• Multiple directories
• Monitor metrics
52.
Informatica PowerCenter Quick Start
https://aws.amazon.com/quickstart/architecture/informatica-powercenter/
53.
SAS Grid Quick Start
https://aws.amazon.com/quickstart/architecture/sas-grid-infrastructure/