© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Webinar
Big Data and Analytics Workloads on Amazon EFS
Darryl S. Osborne
Solutions Architect – AWS File Services
Joe Disher
Sr. Product Marketing Manager – Amazon EFS - AWS
Big Data and Analytics Workloads on Amazon EFS
Phase 1: Choose Storage Platform
Phase 2: Big Data and Analytics Defined
Phase 3: Big Data and Analytics Workloads
Phase 4: Wrapping Up
Phase 1:
Choose Storage Platform
What do you think about when
choosing a storage solution?
Storage type
Features and performance
Economics
Three types of storage
File, Block, Object
File
Data stored as files in a
directory hierarchy
Shared over a network
Block
Data stored as blocks on a
disk or disks
Locally attached
Object
Data is stored as an object that’s
identified by a key in a flat space
Simple API to get and put data
based on key
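The difference between the object and file models can be illustrated with a small sketch. This is an illustration only: a plain dictionary stands in for an object store and a temporary local directory stands in for a shared file system. `ObjectStore`, `put`, and `get` are hypothetical names chosen for this example, not any AWS API.

```python
import os
import tempfile


class ObjectStore:
    """Hypothetical in-memory stand-in for an object store: a flat key space."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        # The whole object is written at once under its key.
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


# Object model: "logs/2018/app.log" is one opaque key, not a chain of directories.
store = ObjectStore()
store.put("logs/2018/app.log", b"first line\n")
object_data = store.get("logs/2018/app.log")

# File model: the same name is a real directory hierarchy, and the file
# can be appended to in place through ordinary operating system calls.
root = tempfile.mkdtemp()
log_dir = os.path.join(root, "logs", "2018")
os.makedirs(log_dir)
log_path = os.path.join(log_dir, "app.log")
with open(log_path, "wb") as f:
    f.write(b"first line\n")
with open(log_path, "ab") as f:
    f.write(b"second line\n")
with open(log_path, "rb") as f:
    file_data = f.read()
directory_listing = os.listdir(os.path.join(root, "logs"))
```

In the file model the hierarchy can be listed and the file extended in place, while in the sketched object model the only operations are whole-object put and get by key.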
Why is file storage so popular?
Works natively with operating systems
Provides shared access with consistency guarantees and locking functionality
Provides hierarchical namespace
How does performance compare?
[Chart: latency and throughput compared for file, object, and block storage]
Before Amazon EFS… DIY file storage costs
[Diagram: clients connecting to file servers, each with attached storage volumes, in two Availability Zones (AZ-a and AZ-b)]
Amazon EFS
A fully managed file service
Key features of Amazon EFS
Simple
Elastic
Scalable
Highly available and durable
Security model & security features
Control network traffic using Amazon VPC security groups and network ACLs
Control file and directory access using POSIX permissions
Control administrative access (API access) using AWS IAM (action-level and resource-level permissions)
Encryption of data at rest and in transit
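File and directory access on a mounted file system is governed by ordinary POSIX mode bits. As a minimal sketch, assuming a Linux client and using a temporary local directory as a stand-in for a directory on an EFS mount (the name `shared_dir` is hypothetical), the following Python sets permissions and reads them back:

```python
import os
import stat
import tempfile

# Stand-in for a directory on a mounted file system (hypothetical location).
shared_dir = tempfile.mkdtemp(prefix="shared_dir_")

# Owner: read/write/execute; group: read/execute; others: no access.
os.chmod(shared_dir, 0o750)

st_mode = os.stat(shared_dir).st_mode
mode = stat.S_IMODE(st_mode)            # permission bits only
mode_string = stat.filemode(st_mode)    # human-readable form, like `ls -l`
others_can_read = bool(mode & stat.S_IROTH)
```

Network-level controls (security groups, network ACLs) and IAM permissions are configured outside the file system and are not shown here.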
Where is Amazon EFS available today?
• US West (Oregon)
• US West (N. California)
• US East (N. Virginia)
• US East (Ohio)
• EU (Ireland)
• EU (Frankfurt)
• Asia Pacific (Sydney)
• Asia Pacific (Seoul)
More coming soon!
Amazon EFS economics
$0.30/GB-Month (US N. Virginia, Ohio, Oregon)
$0.33/GB-Month (US N. California, EU Ireland, AP Seoul)
$0.36/GB-Month (EU Frankfurt, AP Sydney)
No minimum commitments or up-front fees
No need to provision storage in advance
No other fees, charges, or billing dimensions
Before Amazon EFS… DIY file storage costs
[Diagram: clients and file servers with storage volumes in AZ-a and AZ-b, annotated with the DIY cost components: Amazon EC2 instance costs, Amazon EBS volume costs, and inter-AZ data transfer costs]
TCO example
Amazon EFS cost: (500 GB * $0.30/GB-month*) = $150 per month
For DIY, you might provision 600 GB of Amazon EBS (i.e., ~85% utilization):
Storage (2x 600 GB EBS gp2 volumes): $120 per month
Compute (2x m4.xlarge instances): $290 per month
Inter-AZ data transfer costs (est.): $130 per month
Total: $540 per month
For storing 500 GB, Amazon EFS costs about 70% less than DIY
* US N. Virginia pricing
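The arithmetic behind the TCO example can be reproduced in a few lines. The sketch below uses only the figures quoted on the slide; the variable names are illustrative. The exact ratio comes out at roughly 72%, which the slide rounds to 70%.

```python
# Figures taken from the TCO example (US N. Virginia pricing).
EFS_PRICE_PER_GB_MONTH = 0.30
stored_gb = 500

efs_monthly = stored_gb * EFS_PRICE_PER_GB_MONTH

diy_costs = {
    "storage (2x 600 GB EBS gp2 volumes)": 120,
    "compute (2x m4.xlarge instances)": 290,
    "inter-AZ data transfer (est.)": 130,
}
diy_monthly = sum(diy_costs.values())

savings_fraction = 1 - efs_monthly / diy_monthly
utilization = stored_gb / 600  # 500 GB stored on a 600 GB volume

print(f"EFS: ${efs_monthly:.0f}/month, DIY: ${diy_monthly}/month")
print(f"Savings: {savings_fraction:.0%}, DIY utilization: {utilization:.0%}")
```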
Designed for a wide spectrum of needs
Scale-out jobs: high throughput and parallel I/O
Metadata-intensive jobs: low latency and serial I/O
Analytics
Media workflows
Enterprise apps and messaging
Web serving
Content management
Database backups
Container storage
Dev tooling
Home directories
Amazon EFS customers and partners
New York University
University of Pennsylvania
Cornell University
Phase 2:
Big Data and Analytics Defined
What is Big Data and analytics?
Volume: TiBs & PiBs
Velocity: collected, stored, processed, analyzed within a relatively short window
Variety: web logs, social media interactions, transactions
What is the common data flow of Big Data?
1. Collect
• Raw data
2. Store
• Secure, scalable, durable
• Pre, post, temp
3. Process and analyze
• Transform
4. Consume and visualize
• Tools to explore data
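The four stages above can be sketched end to end on a toy scale. This is only an illustration: the log lines are made up, a temporary directory stands in for shared storage, and a text bar chart stands in for a real visualization tool.

```python
import collections
import os
import tempfile

# 1. Collect: raw data (made-up web log lines for illustration).
raw_lines = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /orders 200",
]

# 2. Store: persist the raw data (a temp directory stands in for shared storage).
storage_dir = tempfile.mkdtemp()
raw_path = os.path.join(storage_dir, "raw.log")
with open(raw_path, "w") as f:
    f.write("\n".join(raw_lines) + "\n")

# 3. Process and analyze: transform raw lines into counts per status code.
with open(raw_path) as f:
    status_counts = collections.Counter(
        line.split()[-1] for line in f if line.strip()
    )

# 4. Consume and visualize: a plain-text summary in place of a real tool.
for status, count in sorted(status_counts.items()):
    print(f"{status}: {'#' * count} ({count})")
```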
Phase 3:
Big Data and Analytics Workloads
Big Data workloads
Run custom code for virtually any type of data application
Easily run and scale big data processing frameworks on managed clusters
Use Amazon EFS as a durable, decoupled, and secure file system accessible to all nodes in the cluster
Writing objects in parallel into Amazon EFS
Informatica PowerCenter
PowerCenter is a data integration solution
Use Amazon EFS to write files such as cache, source, and target files.
Informatica PowerCenter Quick Start
https://aws.amazon.com/quickstart/architecture/informatica-powercenter/
SAS Grid
SAS Grid is an analytics computing environment
Use Amazon EFS to share bootstrap information and provide shared storage among all the machines in the grid
SAS Grid whitepaper
https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/1866-2018.pdf
SAS Grid recommendations when using EFS
• If placing SAS Home and SAS Configuration files on an Amazon EFS file system, disable file locks in the sasv9.cfg file for the home directory
“-filelocks <HOME> none”
• Where <HOME> is the directory containing the SAS Home and SAS Configuration files
• Any one particular file can have up to 87 locks across all users of the file system
SAS Grid recommendations when using EFS
• Use an appropriately sized Amazon EC2 instance for your workload, keeping network performance in mind
• i3.8xlarge instances are recommended by SAS due to CPU, memory, instance store (temp space), and consistent 10 Gbps network performance
• The maximum throughput each Amazon EC2 instance can drive to a single file system is 250 MB/s
• Mount multiple file systems to achieve greater than 250 MB/s throughput to EFS
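The sizing logic in these recommendations can be sketched as simple arithmetic. The 250 MB/s per-file-system limit and the 10 Gbps network figure come from the slide. The conversion of 1 Gbps to 125 MB/s (decimal units, no protocol overhead) and the function names are assumptions made for this example. Real throughput also depends on the workload and the file system's own limits.

```python
import math

PER_FS_LIMIT_MB_S = 250   # per-instance, per-file-system limit from the slide
NETWORK_GBPS = 10         # i3.8xlarge network performance from the slide
# Assumption: 1 Gbps is treated as 125 MB/s (decimal units, no overhead).
NETWORK_LIMIT_MB_S = NETWORK_GBPS * 125


def file_systems_needed(target_mb_s: float) -> int:
    """Smallest number of mounted file systems whose combined limit meets the target."""
    return math.ceil(target_mb_s / PER_FS_LIMIT_MB_S)


def aggregate_ceiling_mb_s(mounted_file_systems: int) -> int:
    """Upper bound for one instance: summed file-system limits, capped by the network."""
    return min(mounted_file_systems * PER_FS_LIMIT_MB_S, NETWORK_LIMIT_MB_S)


needed_for_1000 = file_systems_needed(1000)
ceiling_with_8 = aggregate_ceiling_mb_s(8)
```

Under these assumptions, an instance with eight file systems mounted is bounded by its network link rather than by the per-file-system limit.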
SAS Grid performance results from one node
During the SAS POC, we ran a SAS simulation load from one i3.8xlarge instance with 8 EFS file systems mounted
SAS Grid Quick Start
https://aws.amazon.com/quickstart/architecture/sas-grid-infrastructure/
Wrapping up
Best practices using Amazon EFS
Linux kernel 4.0+
EFS mount helper (NFSv4.1)
Multiple instances
Large IO size (aggregate IO)
Multiple threads
Multiple directories
Monitor metrics
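Several of these practices (large IO size, multiple threads, multiple directories) can be combined in one access pattern. The following is a minimal sketch, not a benchmark: a temporary local directory stands in for an EFS mount point, and the constants and the `write_file` helper are illustrative choices, not recommended values.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

IO_SIZE = 1024 * 1024   # 1 MiB per write call (illustrative "large IO size")
FILES_PER_DIR = 4
NUM_DIRS = 4
NUM_THREADS = 8


def write_file(path: str, total_bytes: int) -> int:
    """Write total_bytes to path in IO_SIZE chunks and return the bytes written."""
    chunk = b"\0" * IO_SIZE
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(IO_SIZE, total_bytes - written)
            f.write(chunk[:n])
            written += n
    return written


# Stand-in for an EFS mount point (hypothetical; a real client would use its mount path).
mount_point = tempfile.mkdtemp(prefix="efs_standin_")

# Spread files over several directories rather than one.
paths = []
for d in range(NUM_DIRS):
    dir_path = os.path.join(mount_point, f"dir{d}")
    os.makedirs(dir_path)
    paths.extend(os.path.join(dir_path, f"part{i}.bin") for i in range(FILES_PER_DIR))

# Issue the writes from multiple threads so that requests are in flight in parallel.
with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
    results = list(pool.map(lambda p: write_file(p, 2 * IO_SIZE), paths))

total_written = sum(results)
```

Running the same pattern from multiple instances would extend the parallelism further, in line with the "multiple instances" practice.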
Informatica PowerCenter Quick Start
https://aws.amazon.com/quickstart/architecture/informatica-powercenter/
SAS Grid Quick Start
https://aws.amazon.com/quickstart/architecture/sas-grid-infrastructure/
Resources
https://aws.amazon.com/efs/resources
Thank you

Big Data and Analytics Workloads on Amazon EFS - AWS Online Tech Talks

  • 1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar Darryl S. Osborne Solutions Architect – AWS File Services Big Data and Analytics Workloads on Amazon EFS Joe Disher Sr. Product Marketing Manager – Amazon EFS - AWS
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar Big Data and Analytics Workloads on Amazon EFS Phase 1: Choose Storage Platform Phase 2: Big Data and Analytics Defined Phase 3: Big Data and Analytics Workloads Phase 4: Wrapping Up
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar Phase 1: Choose Storage Platform
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar What do you think about when choosing a storage solution? EconomicsStorage type Features and performance
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar Three types of storage File ObjectBlock
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar Three types of storage File Data stored as files in a directory hierarchy Shared over a network
  • 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar Three types of storage Block Data stored as blocks on a disk or disks Locally attached
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar Three types of storage Object Data is stored as an object that’s identified by a key in a flat space Simple API to get and put data based on key
  • 9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar Why is file storage so popular? Works natively with operating systems Provides shared access while providing consistency guarantees and locking functionality Provides hierarchical namespace
  • 10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar How does performance compare File Object Block Latency Throughput
  • 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar Before Amazon EFS… DIY file storage costs AZ-a Clients Storage volumes AZ-b File server Storage volumes File server
  • 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar Amazon EFS A fully managed file service
  • 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar Key features of Amazon EFS Simple
  • 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar Key features of Amazon EFS Elastic
  • 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar Key features of Amazon EFS Scalable
  • 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar Key features of Amazon EFS Highly available and durable
  • 17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Webinar Key features of Amazon EFS Simple ScalableElastic Highly available and durable
  • 18. Security model & security features
  • 19. Security model & security features • Control network traffic using Amazon VPC security groups and network ACLs
  • 20. Security model & security features • Control network traffic using Amazon VPC security groups and network ACLs • Control file and directory access using POSIX permissions
  • 21. Security model & security features • Control network traffic using Amazon VPC security groups and network ACLs • Control file and directory access using POSIX permissions • Control administrative access (API access) using AWS IAM (action-level and resource-level permissions)
  • 22. Security model & security features • Control network traffic using Amazon VPC security groups and network ACLs • Control file and directory access using POSIX permissions • Control administrative access (API access) using AWS IAM (action-level and resource-level permissions) • Encryption of data at rest and in transit
  • 23. Where is Amazon EFS available today? • US West (Oregon) • US West (N. California) • US East (N. Virginia) • US East (Ohio) • EU (Ireland) • EU (Frankfurt) • Asia Pacific (Sydney) • Asia Pacific (Seoul) More coming soon!
  • 24. Amazon EFS economics • $0.30/GB-month (US N. Virginia, Ohio, Oregon) • $0.33/GB-month (US N. California, EU Ireland, AP Seoul) • $0.36/GB-month (EU Frankfurt, AP Sydney) • No minimum commitments or up-front fees • No need to provision storage in advance • No other fees, charges, or billing dimensions
  • 25. Before Amazon EFS… DIY file storage costs [Diagram: clients, plus a file server with storage volumes in each of AZ-a and AZ-b] • Amazon EC2 instance costs • Inter-AZ data transfer costs • Amazon EBS volume costs
  • 26. TCO example • Amazon EFS cost: 500 GB × $0.30/GB-month* = $150 per month • For DIY, you might provision 600 GB of Amazon EBS (i.e., ~85% utilization) • Storage (2x 600 GB EBS gp2 volumes): $120 per month • Compute (2x m4.xlarge instances): $290 per month • Inter-AZ data transfer costs (est.): $130 per month • Total: $540 per month • For storing 500 GB, Amazon EFS is 70% less than DIY (* US N. Virginia pricing)
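The TCO arithmetic on this slide can be reproduced with a short sketch. The dollar figures are the ones quoted on the slide (2018, US N. Virginia); the function and variable names are illustrative assumptions, not part of any AWS tool.

```python
# Minimal sketch of the slide's TCO comparison.
# All dollar amounts come from the slide; names are hypothetical.

EFS_PRICE_PER_GB_MONTH = 0.30  # US N. Virginia price quoted on the slide


def efs_monthly_cost(stored_gb, price_per_gb=EFS_PRICE_PER_GB_MONTH):
    """EFS bills only for the data actually stored."""
    return stored_gb * price_per_gb


def diy_monthly_cost(storage, compute, transfer):
    """DIY file storage adds up EBS, EC2, and inter-AZ transfer costs."""
    return storage + compute + transfer


efs = efs_monthly_cost(500)
diy = diy_monthly_cost(storage=120, compute=290, transfer=130)
savings = 1 - efs / diy

print(f"EFS: ${efs:.0f}/month, DIY: ${diy:.0f}/month, savings: {savings:.0%}")
```

With these inputs the savings come to roughly 72%, which the slide rounds to 70%.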
  • 27. Designed for a wide spectrum of needs • Workloads: scale-out jobs, metadata-intensive jobs, analytics, media workflows, enterprise apps and messaging, web serving, content management, database backups, container storage, dev tooling, home directories • Spectrum ends: low latency and serial I/O; high throughput and parallel I/O
  • 28. Amazon EFS customers and partners: New York University, University of Pennsylvania, Cornell University
  • 29. Phase 2: Big Data and Analytics Defined
  • 30. What is Big Data and analytics?
  • 31. What is Big Data and analytics? • Volume: TiBs & PiBs
  • 32. What is Big Data and analytics? • Volume: TiBs & PiBs • Variety: Web logs, social media interactions, transactions
  • 33. What is Big Data and analytics? • Volume: TiBs & PiBs • Velocity: Collected, stored, processed, analyzed within a relatively short window • Variety: Web logs, social media interactions, transactions
  • 34. What is the common data flow of Big Data?
  • 35. What is the common data flow of Big Data? 1. Collect: Raw data
  • 36. What is the common data flow of Big Data? 1. Collect: Raw data 2. Store: Secure, scalable, durable; pre, post, temp
  • 37. What is the common data flow of Big Data? 1. Collect: Raw data 2. Store: Secure, scalable, durable; pre, post, temp 3. Process and analyze: Transform
  • 38. What is the common data flow of Big Data? 1. Collect: Raw data 2. Store: Secure, scalable, durable; pre, post, temp 3. Process and analyze: Transform 4. Consume and visualize: Tools to explore data
  • 39. Phase 3: Big Data and Analytics Workloads
  • 40. Big Data workloads • Run custom code for virtually any type of data application • Easily run and scale big data processing frameworks on managed clusters • Use Amazon EFS as a durable, decoupled, and secure file system accessible to all nodes in the cluster
  • 41. Writing objects in parallel into Amazon EFS
  • 42. Informatica PowerCenter • PowerCenter is a data integration solution • Use Amazon EFS to write files such as cache, source, and target files
  • 43. Informatica PowerCenter Quick Start https://aws.amazon.com/quickstart/architecture/informatica-powercenter/
  • 44. SAS Grid • SAS Grid is an analytics computing environment • Use Amazon EFS to share bootstrap information and shared storage among all the machines in the grid
  • 45. SAS Grid whitepaper https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/1866-2018.pdf
  • 46. SAS Grid recommendations when using EFS • If you place SAS Home and SAS Configuration files on an Amazon EFS file system, disable file locks for that directory in the sasv9.cfg file with “-filelocks <HOME> none”, where <HOME> is the directory containing the SAS Home and SAS Configuration files • Any one file can have up to 87 locks across all users of the file system
  • 47. SAS Grid recommendations when using EFS • Use an appropriately sized Amazon EC2 instance for your workload, and keep network performance in mind • SAS recommends i3.8xlarge instances for their CPU, memory, instance store (temp space), and consistent 10 Gbps network performance • The maximum throughput a single Amazon EC2 instance can drive to a single file system is 250 MB/s • Mount multiple file systems to achieve more than 250 MB/s of throughput to EFS
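The sizing rule above can be sketched as a small calculation. The 250 MB/s per-instance, per-file-system limit and the 10 Gbps network figure come from the slide. Treating the instance network as a hard cap of 1250 MB/s (10 Gbps divided by 8) is a simplifying assumption made here, and the function names are hypothetical.

```python
import math

# Figures quoted on the slide.
PER_FS_LIMIT_MBPS = 250        # one instance to one EFS file system
NETWORK_LIMIT_MBPS = 10_000 / 8  # 10 Gbps expressed in MB/s (assumed hard cap)


def file_systems_needed(target_mbps, per_fs_limit=PER_FS_LIMIT_MBPS):
    """Number of file systems one instance must mount to reach target_mbps."""
    if target_mbps <= 0:
        raise ValueError("target_mbps must be positive")
    return math.ceil(target_mbps / per_fs_limit)


def estimated_ceiling(mounted, per_fs_limit=PER_FS_LIMIT_MBPS,
                      network_limit=NETWORK_LIMIT_MBPS):
    """Rough upper bound: per-file-system limits add up until the network caps them."""
    return min(mounted * per_fs_limit, network_limit)


print(file_systems_needed(1000))  # file systems needed for 1000 MB/s
print(estimated_ceiling(8))       # rough ceiling with 8 file systems mounted
```

Under these assumptions, an instance with 8 file systems mounted would be limited by its network rather than by the per-file-system cap, which is why the slide asks you to keep network performance in mind.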
  • 48. SAS Grid performance results from one node: During the SAS POC, we ran a SAS simulation load from one i3.8xlarge instance with 8 EFS file systems mounted
  • 49. SAS Grid Quick Start https://aws.amazon.com/quickstart/architecture/sas-grid-infrastructure/
  • 50. Wrapping up
  • 51. Best practices using Amazon EFS • Linux kernel 4.0+ • EFS mount helper (NFSv4.1) • Multiple instances • Large IO size (aggregate IO) • Multiple threads • Multiple directories • Monitor metrics
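Three of these practices (multiple threads, multiple directories, large IO size) can be illustrated with a minimal sketch of a parallel writer. This is not code from the webinar. The `EFS_ROOT` environment variable, the directory and file names, and the 1 MiB write size are all assumptions chosen for illustration, and the script falls back to a temporary directory so it runs without an EFS mount.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024 * 1024  # 1 MiB per write call (assumed "large IO" size)


def write_file(path, size_bytes, chunk_size=CHUNK_SIZE):
    """Write size_bytes of zeros to path using large sequential writes."""
    chunk = b"\0" * chunk_size
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(chunk[:n])
            remaining -= n
    return size_bytes


def parallel_write(root, n_dirs=4, files_per_dir=4,
                   size_bytes=4 * CHUNK_SIZE, threads=8):
    """Spread files across several directories and write them concurrently."""
    paths = []
    for d in range(n_dirs):
        dir_path = os.path.join(root, f"dir{d:02d}")
        os.makedirs(dir_path, exist_ok=True)
        for i in range(files_per_dir):
            paths.append(os.path.join(dir_path, f"file{i:03d}.bin"))
    with ThreadPoolExecutor(max_workers=threads) as pool:
        written = list(pool.map(lambda p: write_file(p, size_bytes), paths))
    return sum(written)


if __name__ == "__main__":
    # EFS_ROOT is a hypothetical variable pointing at an EFS mount point.
    root = os.environ.get("EFS_ROOT") or tempfile.mkdtemp()
    total = parallel_write(root)
    print(f"wrote {total} bytes under {root}")
```

Running the same script from several instances against the same file system would cover the "multiple instances" item as well.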
  • 52. Informatica PowerCenter Quick Start https://aws.amazon.com/quickstart/architecture/informatica-powercenter/
  • 53. SAS Grid Quick Start https://aws.amazon.com/quickstart/architecture/sas-grid-infrastructure/