© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Case Study: How Spokeo Improved Web Application Response Times with Amazon EFS
December 2, 2016
STG206
Austin Fonacier, Spokeo
Sajee Mathew, AWS Principal Solutions Architect
What to Expect from the Session
• Overview of Amazon EFS
• How Spokeo uses EFS
• What we do at Spokeo
• Spokeo Tech Stack
• Our challenge
• Off-the-shelf CDN
• Writing our own reverse proxy
• Back ends
• Populating EFS at scale
• Lessons learned
AWS Storage Overview
• File: Amazon EFS
• Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
• Object: Amazon S3, Amazon Glacier
• Data movement (batches and streams): Direct Connect, Snowball, Snowmobile, 3rd-party connectors, Transfer Acceleration, Storage Gateway, Amazon Kinesis Firehose
Operating shared file storage today is a pain
App owners and developers | Business managers | IT administrators
• Estimate demand
• Procure, set up, and maintain hardware & space
• Provide demand forecasts/business case
• Limited flexibility and agility
• CAPEX & over-buy
• Constant upgrade/refresh cycle
What if you could…
App owners and developers | Business managers | IT administrators
• Eliminate management & maintenance
• Scale
• Migrate code, apps, tools
• Build new cloud-native apps
• Predict cost & eliminate CAPEX
• Increase agility
• Spend less time managing file systems
What is Amazon EFS?
• Fully managed file system for EC2
• File system access semantics that work with standard OS APIs
• Sharable across thousands of clients
• Grows elastically to petabyte scale
• Highly available and durable
• Strong consistency
Amazon EFS is simple
Fully managed
- No hardware, network, or file layer to manage
- Create a scalable file system in seconds!
Seamless integration with existing tools and apps
- NFS v4.1: widespread, open
- Standard file system access semantics
- Works with standard OS file system APIs
Simple pricing = simple forecasting
Amazon EFS is elastic
- File systems grow and shrink automatically as you add and remove files
- No need to provision storage capacity or performance
- You pay for only the storage space you use, with no minimum fee
Amazon EFS is scalable
- File systems can grow to petabyte scale
- Throughput and IOPS scale automatically as file systems grow
- Consistent low latencies regardless of file system size
- Support for thousands of concurrent NFS connections
Highly durable and highly available
- Designed to sustain Availability Zone (AZ) offline conditions
- Resources aggregated across multiple AZs
- Superior to traditional NAS availability models
- Appropriate for production / Tier 0 applications
- Every file system object (directory, file, and link) is redundantly stored across multiple Availability Zones in a region
[Figure: an Amazon EFS file system spanning Availability Zones 1, 2, and 3 within a region]
Example use cases
Big data analytics
Media workflow processing
Web serving
Content management
Home directories
The AWS Management Console, CLI, and SDK each
enable you to perform a variety of management tasks
Create a file system
Create and manage mount targets
Tag a file system
Delete a file system
View details on file systems in your AWS account
Setting up and mounting a file system takes under a minute
1. Create a file system
2. Create a mount target in each Availability Zone from which you want to access the file system
3. Enable the NFS client on your instances
4. Run the mount command
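A minimal sketch of step 4 in Python, assuming the NFSv4.1 mount options that AWS's EFS documentation recommends (`nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2`) and the `<file-system-id>.efs.<region>.amazonaws.com` DNS name. It only builds the command; running it (for example with `subprocess.run(cmd, check=True)`) needs root, an existing mount point, a mount target in the instance's Availability Zone, and a security group that allows NFS (TCP 2049). The file system ID `fs-12345678` and the mount point `/mnt/efs` are placeholders.

```python
import shlex

# NFSv4.1 mount options recommended in AWS's EFS documentation; a starting point, not a requirement.
DEFAULT_NFS_OPTIONS = (
    "nfsvers=4.1",
    "rsize=1048576",   # 1 MiB read buffer
    "wsize=1048576",   # 1 MiB write buffer
    "hard",            # keep retrying instead of returning I/O errors to the application
    "timeo=600",       # 60 s timeout (the unit is tenths of a second)
    "retrans=2",
)


def efs_dns_name(file_system_id: str, region: str) -> str:
    """DNS name that resolves to the mount target in the caller's Availability Zone."""
    return f"{file_system_id}.efs.{region}.amazonaws.com"


def mount_command(file_system_id: str, region: str, mount_point: str,
                  options=DEFAULT_NFS_OPTIONS) -> list:
    """Build the step-4 mount command as an argv list (suitable for subprocess.run)."""
    source = f"{efs_dns_name(file_system_id, region)}:/"
    return ["sudo", "mount", "-t", "nfs4", "-o", ",".join(options), source, mount_point]


if __name__ == "__main__":
    # fs-12345678 and /mnt/efs are placeholders.
    print(shlex.join(mount_command("fs-12345678", "us-east-1", "/mnt/efs")))
```

Returning an argv list rather than a shell string avoids quoting problems if the command is later run through `subprocess`.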
Two performance modes designed to support this broad spectrum of use cases
• General Purpose mode (default; recommended for most use cases): optimized for latency-sensitive applications and general-purpose, file-based workloads; the best option for the majority of use cases
• Max I/O mode: can scale to higher levels of aggregate throughput, with a tradeoff of slightly higher latencies for file operations
Use Amazon CloudWatch to determine whether your application can benefit from Max I/O mode; if not, you’ll get the best performance in General Purpose mode
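One way to act on the CloudWatch advice: EFS publishes a `PercentIOLimit` metric (namespace `AWS/EFS`, dimension `FileSystemId`) for General Purpose file systems, showing how close the file system is to the General Purpose I/O limit. The sketch below assumes you have already pulled those samples (for example with CloudWatch `GetMetricStatistics`) and flags the file system when it sits near the limit for a large share of the window. The 95 % threshold and the 50 % share are illustrative assumptions, not AWS guidance.

```python
def should_consider_max_io(percent_io_limit: list,
                           threshold: float = 95.0,
                           min_share: float = 0.5) -> bool:
    """Return True when PercentIOLimit samples are at or above `threshold` for at least
    `min_share` of the observation window (both defaults are illustrative assumptions)."""
    if not percent_io_limit:
        return False  # no data, so no evidence that General Purpose mode is the bottleneck
    near_limit = sum(1 for sample in percent_io_limit if sample >= threshold)
    return near_limit / len(percent_io_limit) >= min_share


if __name__ == "__main__":
    # Hypothetical hourly maxima of PercentIOLimit collected during a load test.
    samples = [97.5, 99.0, 100.0, 98.2, 61.0, 99.4]
    print(should_consider_max_io(samples))  # True: 5 of 6 samples are at or above 95 %
```

Because Max I/O trades latency for aggregate throughput, a file system that rarely approaches the limit is usually better served by staying in General Purpose mode.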
EFS provides a throughput bursting model that scales as a file system grows
• As a file system gets larger, it needs access to more throughput
• Many file workloads are spiky, with peak throughput well above average levels
The Amazon EFS scalable bursting model is designed to make performance available when you need it
Throughput bursting model based on earning and spending “bursting credits”
• File systems earn credits at a “baseline rate” of 0.05 MB/s per GB stored and use credits by
performing file system operations; file systems can drive throughput at “baseline rate” indefinitely
• File systems with a positive bursting credit balance can “burst” to higher levels for periods of time:
100 MB/s for file systems 1 TB or smaller, 100 MB/s per TB for file systems larger than 1 TB
• New file systems start with a full credit balance
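The credit model above can be written out directly. This sketch uses the slide's decimal units (1 TB = 1,000 GB) and simplifies the real service: it lets a file system burst whenever its balance is positive and ignores any cap on accumulated credits. `step_credits` is a hypothetical helper for reasoning about the model, not an AWS API.

```python
GB_PER_TB = 1_000  # decimal units, as in the slide's examples


def baseline_mbps(size_gb: float) -> float:
    """Rate at which credits are earned, and which can be sustained indefinitely: 0.05 MB/s per GB."""
    return size_gb / 20  # 0.05 == 1/20; dividing avoids floating-point noise


def burst_mbps(size_gb: float) -> float:
    """Burst ceiling: 100 MB/s for 1 TB or smaller, 100 MB/s per TB above that."""
    if size_gb <= GB_PER_TB:
        return 100.0
    return 100.0 * size_gb / GB_PER_TB


def step_credits(balance_mb: float, size_gb: float, demanded_mbps: float,
                 seconds: float = 1.0) -> tuple:
    """Advance the credit balance by one interval; return (new_balance_mb, delivered_mbps).

    Simplification: any positive balance allows bursting, and the balance is not capped.
    """
    base = baseline_mbps(size_gb)
    ceiling = burst_mbps(size_gb) if balance_mb > 0 else base
    delivered = min(demanded_mbps, ceiling)
    new_balance = balance_mb + (base - delivered) * seconds  # earn at baseline, spend what was used
    return max(new_balance, 0.0), delivered


if __name__ == "__main__":
    for size in (100, 1_000, 10_000):
        print(size, "GB:", baseline_mbps(size), "MB/s baseline,", burst_mbps(size), "MB/s burst")
```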
Bursting model examples (file system size: read/write throughput)
• A 100 GB EFS file system can drive up to 5 MB/s continuously, or burst to 100 MB/s for up to 72 minutes each day
• A 1 TB EFS file system can drive up to 50 MB/s continuously, or burst to 100 MB/s for up to 12 hours each day
• A 10 TB EFS file system can drive up to 500 MB/s continuously, or burst to 1 GB/s for up to 12 hours each day
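These examples follow from the credit model with simple arithmetic: a day's credits are the baseline rate times 86,400 seconds, and dividing that by the burst rate gives the daily burst time. The sketch reproduces the three examples under that reading; like the slide, it ignores credits earned while bursting and the full starting balance of a new file system. Decimal units are assumed.

```python
SECONDS_PER_DAY = 86_400


def daily_burst_minutes(size_gb: float) -> float:
    """Minutes per day a file system can burst, spending one day's worth of earned credits."""
    baseline = size_gb / 20                                  # 0.05 MB/s per GB stored
    burst = 100.0 if size_gb <= 1_000 else size_gb / 10      # 100 MB/s, or 100 MB/s per TB above 1 TB
    credits_per_day_mb = baseline * SECONDS_PER_DAY          # MB of credit earned in 24 hours
    return credits_per_day_mb / burst / 60


if __name__ == "__main__":
    for size in (100, 1_000, 10_000):
        print(f"{size:>6} GB: {daily_burst_minutes(size):.0f} minutes of burst per day")
```

For 100 GB this is 5 MB/s × 86,400 s = 432,000 MB of credit, which lasts 4,320 s (72 minutes) at 100 MB/s; for 1 TB and 10 TB the baseline and burst rates scale together to give 720 minutes (12 hours).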
In Which Regions Can I Use EFS Today?
US East (N. Virginia) – us-east-1
US East (Ohio) – us-east-2
US West (Oregon) – us-west-2
EU (Ireland) – eu-west-1
More coming soon!
Simple and predictable pricing
With EFS, you pay for only the storage space you use
• No minimum commitments or upfront fees
• No need to provision storage in advance
• No other fees, charges, or billing dimensions
EFS price:
• $0.30/GB/month (N. Virginia, Ohio, Oregon)
• $0.33/GB/month (Ireland)
Customers within their first 12 months on AWS can use up to 5 GB/month for free
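Because storage is the only billing dimension, a monthly estimate is a single multiplication. This sketch hard-codes the December 2016 prices listed above and treats the input as the average number of GB stored over the month, which is an assumption about how usage is metered; `monthly_cost` is a hypothetical helper.

```python
# December 2016 EFS prices from the slide, in USD per GB-month.
PRICE_PER_GB_MONTH = {
    "us-east-1": 0.30,  # N. Virginia
    "us-east-2": 0.30,  # Ohio
    "us-west-2": 0.30,  # Oregon
    "eu-west-1": 0.33,  # Ireland
}
FREE_TIER_GB = 5  # first 12 months on AWS


def monthly_cost(avg_gb_stored: float, region: str, in_free_tier: bool = False) -> float:
    """Estimated monthly bill in USD for the average storage held over the month."""
    free_gb = FREE_TIER_GB if in_free_tier else 0
    billable_gb = max(avg_gb_stored - free_gb, 0)
    return round(billable_gb * PRICE_PER_GB_MONTH[region], 2)


if __name__ == "__main__":
    # 37.4 TB expressed as 37,400 GB (decimal units assumed).
    print(monthly_cost(37_400, "us-east-1"))
```

At 37,400 GB in N. Virginia the estimate is $11,220 per month, in the same range as the roughly $11,000/month EFS figure Spokeo quotes later in the deck.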
Introduction
Austin Fonacier
Lead Software Architect at Spokeo
austin@spokeo.com
@austinrfnd
http://github.com/austinrfnd
Spokeo
People search engine
Headquartered in beautiful Pasadena, CA
200+ employees
18,000,000 unique visitors a month
8.5 billion people records
30,000,000 bot hits per 24-hour period
Spokeo the product
Search for people by any intersection of data: first name, last name, email, age, address, phone, or relative name
Email/username and address search
Spokeo tech stack
High-level Spokeo tech stack
Our challenge: SEO pages
3,000,000,000 SEO pages
≈ 37.4 terabytes of data
≈ 30,000,000 crawls per day
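A little arithmetic on these numbers shows why the pages are hard to cache: even if crawlers never requested the same page twice, 30,000,000 crawls a day would take 100 days to touch all 3,000,000,000 pages once, so any individual page is requested rarely. The sketch assumes decimal terabytes and treats the daily crawls as spread evenly over 24 hours.

```python
PAGES = 3_000_000_000
TOTAL_BYTES = 37.4e12           # 37.4 TB, assuming decimal terabytes
CRAWLS_PER_DAY = 30_000_000
SECONDS_PER_DAY = 86_400

avg_page_kb = TOTAL_BYTES / PAGES / 1_000               # about 12.5 KB per page
crawls_per_second = CRAWLS_PER_DAY / SECONDS_PER_DAY    # about 347 requests/s on average
share_crawled_per_day = CRAWLS_PER_DAY / PAGES          # at most 1 % of pages per day
days_to_touch_every_page = PAGES / CRAWLS_PER_DAY       # 100 days even with no repeat crawls

if __name__ == "__main__":
    print(f"{avg_page_kb:.1f} KB/page, {crawls_per_second:.0f} crawls/s, "
          f"{share_crawled_per_day:.0%} of pages/day, "
          f"{days_to_touch_every_page:.0f} days per full pass")
```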
SEO pages: compatibility
[Figure: crawlers and users]
The importance of page speed
●Slow pages increase the abandonment rate
●Google uses page speed as a search ranking signal
●Studies show a direct relationship between page speed and conversion rate
How to get faster?
Ninety-ninety rule
“The first 90 percent of the code accounts for the first 10
percent of the development time. The remaining 10 percent
of the code accounts for the other 90 percent of the
development time.”
- Tom Cargill, Bell Labs
How to get faster?
Switch away from Ruby on Rails
●Ton of effort
●Ton of time
●Ton of money
●Unmeasurable performance gains
How to get faster?
Reverse Proxy
●Low effort (some code/header
tweaks)
●Immediate measurable
performance gains
Off-the-shelf reverse proxy
●Fast
●Easy (CDN & header changes)
●Global delivery system
The LRU
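The diagrams on the "The LRU" slides did not survive extraction. The problem they point at, listed later as "CloudFront LRU", can be illustrated with a small simulation under one assumption: crawler requests are spread uniformly over the page space. With uniform access, a least-recently-used cache's steady-state hit rate is roughly capacity divided by the number of pages, so a cache that holds only a small fraction of 3 billion pages serves few hits. The class and function names are hypothetical.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache that tracks keys only (page bodies are irrelevant to the hit rate)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get_or_load(self, key) -> bool:
        """Return True on a hit; on a miss, insert the key and evict the least recently used one."""
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return True
        self.items[key] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the oldest entry
        return False


def simulate_uniform_hit_rate(num_pages: int, capacity: int, requests: int, seed: int = 0) -> float:
    """Hit rate of an LRU cache when every page is equally likely to be requested."""
    rng = random.Random(seed)
    cache = LRUCache(capacity)
    hits = sum(cache.get_or_load(rng.randrange(num_pages)) for _ in range(requests))
    return hits / requests


if __name__ == "__main__":
    # A cache holding 10 % of the pages gets roughly a 10 % hit rate under uniform access.
    print(simulate_uniform_hit_rate(num_pages=10_000, capacity=1_000, requests=200_000))
```

Real crawler traffic is not perfectly uniform, but the long tail described above behaves much closer to this case than to the skewed access patterns LRU caches are designed for.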
Back to the drawing board
“Always serve Google the
fastest possible page”
- Mike Daly
Spokeo CTO
Google is the toughest critic. By
making the site faster for
Google, we are making our
customers happy
Cache requirements
As fast as reasonably possible
Cost efficient
Scalable
Fault tolerant
Failover/availability
Ruby on Rails cache
Even cached pages pay the Rails penalty of going through the framework
“Always serve Google the
fastest possible page”
- Mike Daly
Spokeo CTO
Two-part project
Reverse proxy
Backend
Proposed topology
Off-the-shelf reverse proxies
Reverse proxy options
●All have in-memory mapping of keys and
values
●Nginx and Varnish are expensive
●Apache Traffic Server doesn’t notify
other nodes on writes
●All are huge code bases
Reverse proxy options
Option | Cons | Lines of code | Cost
Nginx | In-memory mapping of all keys | 164,978 | $187,573
Varnish | In-memory mapping of all keys | 220,813 | $495,999
Apache Traffic Server | In-memory mapping of all keys; writes don't propagate between nodes | 889,824 | $6,771
Costs assume 45 c4.xlarge instances per month.
Write our own: MassCache
●In-house expert knowledge
●Very simple use case
●Inexpensive to run (thin Node.js app)
●No in-memory mapping
MassCache
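MassCache's code is not shown in the deck. The Python sketch below is a hypothetical illustration of the design points listed above: no in-memory key map, just one file per URL on a shared mount such as EFS, so every proxy node sees every other node's writes. The class name `FileCache`, the SHA-1 directory sharding, and the `refresh` flag (which mimics the special-header cache invalidation described under Cacheup) are assumptions; the real MassCache is a Node.js app.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Callable


class FileCache:
    """Hypothetical MassCache-style store: one file per URL on a shared mount, no in-memory index."""

    def __init__(self, root: str):
        self.root = Path(root)

    def path_for(self, url: str) -> Path:
        # SHA-1 is used only to spread keys evenly, not for security.
        digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
        # Two levels of 256 directories keep any single directory from holding billions of files.
        return self.root / digest[:2] / digest[2:4] / digest

    def fetch(self, url: str, render: Callable[[str], bytes], refresh: bool = False) -> bytes:
        """Serve the cached page if present; otherwise render it, store it, and return it."""
        path = self.path_for(url)
        if not refresh:
            try:
                return path.read_bytes()   # hit: one file read, no lookup table
            except FileNotFoundError:
                pass                       # miss: fall through to the origin
        body = render(url)
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file, then rename over the target, so readers see the old or the new
        # page rather than a partial one (rename atomicity on NFS is worth testing, not assuming).
        fd, tmp = tempfile.mkstemp(dir=path.parent)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(body)
            os.replace(tmp, path)
        except Exception:
            os.unlink(tmp)
            raise
        return body
```

With 65,536 leaf directories, 3 billion pages works out to about 46,000 files per directory. Because every lookup is a path computation plus a file read, a node can be added or restarted without warming any in-memory state.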
Back ends
Back end | Cost* | Performance | Cons
Amazon S3/Amazon CloudFront | $6,000 | 10-1000 ms | CloudFront LRU
Amazon DynamoDB | $11,000 | 20-30 ms |
Amazon ElastiCache | $90,000 | Fast | Not data-persistent
Amazon EBS volumes | - | - | EBS mounting limitations**
Amazon EFS | $11,000/month | 17 ms reads, 30 ms writes (Max I/O mode; details below) |
EFS
Price: $11,000/month
Performance: 17 ms for reads, 30 ms for writes
• Latencies measured in Max I/O mode (General Purpose mode has lower latencies)
• Writes: Node.js open, write, and close
• Reads: Node.js file descriptor, file stats, and reading the contents
• 30 KB files; peak EFS size is 2.3 GB
Built-in data redundancy
Built-in scalability
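The benchmark's read and write sequences can be reproduced with plain POSIX calls. This Python sketch mirrors the Node.js steps listed above (open, write, close for writes; open, fstat, read for reads) and times each sequence. It is not Spokeo's benchmark code; point it at a path on an EFS mount and repeat many times, since single runs on a network file system are noisy.

```python
import os
import time


def timed_write(path: str, data: bytes) -> float:
    """Open, write, and close a file; return the elapsed seconds."""
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view):                    # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)
    return time.perf_counter() - start


def timed_read(path: str) -> tuple:
    """Open a descriptor, stat it, and read the contents; return (data, elapsed seconds)."""
    start = time.perf_counter()
    fd = os.open(path, os.O_RDONLY)
    try:
        remaining = os.fstat(fd).st_size    # the "file stats" step
        chunks = []
        while remaining > 0:
            chunk = os.read(fd, remaining)
            if not chunk:
                break                       # file shrank underneath us
            chunks.append(chunk)
            remaining -= len(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks), time.perf_counter() - start
```

For example, `timed_write("/mnt/efs/bench/page.html", b"x" * 30_000)` writes a 30 KB file like the ones in the benchmark (the path is a placeholder). Because each call is a network round trip on EFS, compare medians over many runs rather than single samples.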
EFS costs
Spokeo tech stack now
Populating EFS: Cacheup
● Actively populate EFS as fast as possible
● EFS doesn’t shy away from 250,000 requests/second
● Populate 3,000,000,000 files in one week
● Cache invalidation: sending requests with a special
header
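Populating 3,000,000,000 files in one week works out to roughly 4,960 files per second, sustained day and night. Little's law (requests in flight = throughput × time per request) then gives the concurrency a warming job like Cacheup needs. The 0.2 s render-and-write time per page below is a hypothetical figure; substitute measured latency from your own stack.

```python
import math

SECONDS_PER_DAY = 86_400


def required_rate(files: int, days: float) -> float:
    """Files per second needed to populate `files` within `days`, running around the clock."""
    return files / (days * SECONDS_PER_DAY)


def workers_needed(rate_per_s: float, seconds_per_page: float) -> int:
    """Little's law: concurrent requests in flight = throughput x time each request takes."""
    return math.ceil(rate_per_s * seconds_per_page)


if __name__ == "__main__":
    rate = required_rate(3_000_000_000, 7)   # about 4,960 files/s
    # 0.2 s per page (render plus EFS write) is a hypothetical figure.
    print(f"{rate:,.0f} files/s, {workers_needed(rate, 0.2)} concurrent workers")
```

Every extra 100 ms of per-page latency adds about 500 more workers at this rate, which is one reason warming at full speed can expose bottlenecks elsewhere in the web stack.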
Cacheup: dynamic throttling
Dynamic throttling based on key metrics of our stack:
●Application performance index scores
●Response times
●Database load
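The deck does not say how Cacheup turns these metrics into a request rate. One common approach, swapped in here purely as an illustration, is additive-increase/multiplicative-decrease (AIMD): raise the warming rate slowly while the stack is healthy and halve it as soon as any signal degrades. The Apdex formula is the standard one (satisfied responses count fully, tolerating responses count half); the thresholds (Apdex T of 0.5 s, Apdex at least 0.9, p95 response time at most 1 s, database load at most 70 %), the step size, and the `Throttle` class are hypothetical.

```python
from dataclasses import dataclass


def apdex(response_times_s: list, t: float = 0.5) -> float:
    """Apdex = (satisfied + tolerating / 2) / total; satisfied <= T, tolerating <= 4T."""
    if not response_times_s:
        return 1.0  # assumption: no traffic counts as healthy
    satisfied = sum(1 for r in response_times_s if r <= t)
    tolerating = sum(1 for r in response_times_s if t < r <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times_s)


@dataclass
class Throttle:
    rate: float                 # warming requests per second
    min_rate: float = 10.0
    max_rate: float = 5_000.0
    step: float = 50.0

    def adjust(self, apdex_score: float, p95_response_s: float, db_load: float) -> float:
        """Apply one AIMD step based on the three health signals; return the new rate."""
        healthy = apdex_score >= 0.9 and p95_response_s <= 1.0 and db_load <= 0.7
        if healthy:
            self.rate = min(self.rate + self.step, self.max_rate)   # additive increase
        else:
            self.rate = max(self.rate / 2, self.min_rate)           # multiplicative decrease
        return self.rate
```

AIMD backs off quickly when production traffic needs the capacity and creeps back up afterward, so warming yields to real users without manual tuning.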
Benefits after a year with EFS, MassCache, and Cacheup
● Costs to serve a cached page
● Horizontally scalable
• 37.4 TB
• 3,000,000,000 files
• 30,000,000 requests per day
● Active warming taught us about bottlenecks in our web stack
● Site redundancy
● Built-in DDoS protection
● Google webmaster dashboard numbers are steady
EFS is the cloud
EFS to us feels like an infinitely
scalable resource
● Fast
● Easy
● Cheap
● Data redundant
● Goldilocks solution for us
EFS gotchas
●Writes are slower than reads
●Writing a file is slightly slower than updating a file
●Improvements have been made since the preview a year ago, including support for NFSv4.1, and will continue
●Any access to EFS looks like a file access but is actually a
network call!
●General Purpose (GP) ≠ Max I/O
DDoS/site redundancy
Crawler spike protection
Related Sessions
• STG202 - Deep Dive on Amazon Elastic File System
• Recorded on Wednesday
• STG207 - Case Study: How Atlassian Uses Amazon
EFS with JIRA to Cut Costs and Accelerate Performance
• Friday 12:30pm
• STG208 - Case Study: How Monsanto Uses Amazon
EFS with Their Large-Scale Geospatial Data Sets
• Friday 11:00am
Thank you!
We’re hiring
http://spokeo.com/jobs
Remember to complete
your evaluations!
Questions?

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

AWS re:Invent 2016: Case Study: How Spokeo Improved Web Application Response Times with Amazon EFS (STG206)

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Case Study: How Spokeo Improved Web Application Response Times with Amazon EFS December 2, 2016 STG206 Austin Fonacier, Spokeo Sajee Mathew, AWS Principal Solutions Architect
  • 2. What to Expect from the Session • Overview of Amazon EFS • How Spokeo uses EFS • What we do at Spokeo • Spokeo Tech Stack • Our challenge • Off the shelf CDN • Writing our own reverse proxy • Back ends • Populating EFS at scale • Lessons learned
  • 3. AWS Storage Overview: File: Amazon EFS. Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral). Object: Amazon S3, Amazon Glacier. Data movement (batches and streams): Direct Connect, Snowball, Snowmobile, 3rd-party connectors, Transfer Acceleration, Storage Gateway, Amazon Kinesis Firehose.
  • 4. Operating shared file storage today is a pain (for app owners and developers, business managers, and IT administrators): ● Estimate demand ● Procure, set up, and maintain hardware and space ● Provide demand forecasts/business case ● Limited flexibility and agility ● CAPEX and over-buy ● Constant upgrade/refresh cycle
  • 5. What if you could… (for app owners and developers, business managers, and IT administrators) ● Eliminate management and maintenance ● Scale ● Migrate code, apps, and tools ● Build new cloud-native apps ● Predict cost and eliminate CAPEX ● Increase agility ● Spend less time managing file systems
  • 6. What is Amazon EFS? ● Fully managed file system for EC2 ● File system access semantics that work with standard OS APIs ● Sharable across thousands of clients ● Grows elastically to petabyte scale ● Highly available and durable ● Strong consistency
  • 7. Amazon EFS is simple. Fully managed: no hardware, network, or file layer to manage; create a scalable file system in seconds. Seamless integration with existing tools and apps: NFS v4.1 (widespread, open); standard file system access semantics; works with standard OS file system APIs. Simple pricing = simple forecasting.
  • 8. Amazon EFS is elastic File systems grow and shrink automatically as you add and remove files No need to provision storage capacity or performance You pay for only the storage space you use, with no minimum fee
  • 9. Amazon EFS is scalable: File systems can grow to petabyte scale. Throughput and IOPS scale automatically as file systems grow. Consistent low latencies regardless of file system size. Support for thousands of concurrent NFS connections.
  • 10. Highly durable and highly available: Designed to sustain Availability Zone (AZ) offline conditions. Resources aggregated across multiple AZs. Superior to traditional NAS availability models. Appropriate for production / Tier 0 applications.
  • 11. Highly durable and highly available: Every file system object (directory, file, and link) is redundantly stored across multiple Availability Zones in a region. (Diagram: an Amazon EFS file system spanning Availability Zones 1, 2, and 3 within a region.)
  • 12. Example use cases Big data analytics Media workflow processing Web serving Content management Home directories
  • 13. The AWS Management Console, CLI, and SDK each enable you to perform a variety of management tasks: create a file system; create and manage mount targets; tag a file system; delete a file system; view details on file systems in your AWS account.
  • 14. Setting up and mounting a file system takes under a minute 1. Create a file system 2. Create a mount target in each Availability Zone from which you want to access the file system 3. Enable the NFS client on your instances 4. Run the mount command
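Step 4 above ("Run the mount command") is the only step that happens on the instance itself. The following is a minimal Python sketch that builds that command. The file system ID `fs-12345678`, the region, and the mount point `/mnt/efs` are hypothetical placeholders. The options are the NFSv4.1 settings AWS documents for EFS, and `noresvport` comes from later AWS guidance rather than from this talk. Treat the tuple as a starting point, not a definitive configuration.

```python
import shlex

# NFSv4.1 mount options AWS documents for EFS (noresvport: use a new
# non-privileged source port when the NFS connection is re-established).
DEFAULT_NFS_OPTIONS = (
    "nfsvers=4.1",
    "rsize=1048576",
    "wsize=1048576",
    "hard",
    "timeo=600",
    "retrans=2",
    "noresvport",
)


def efs_dns_name(file_system_id: str, region: str) -> str:
    """Return the file system's DNS name, e.g. fs-12345678.efs.us-east-1.amazonaws.com."""
    return f"{file_system_id}.efs.{region}.amazonaws.com"


def mount_command(file_system_id: str, region: str, mount_point: str,
                  options=DEFAULT_NFS_OPTIONS) -> str:
    """Return a shell command that mounts the file system's root at mount_point."""
    source = f"{efs_dns_name(file_system_id, region)}:/"
    args = ["sudo", "mount", "-t", "nfs4", "-o", ",".join(options), source, mount_point]
    return shlex.join(args)  # quotes any argument that needs it (Python 3.8+)


if __name__ == "__main__":
    # Hypothetical IDs: substitute your own file system ID, region, and mount point.
    print(mount_command("fs-12345678", "us-east-1", "/mnt/efs"))
```

The command assumes the NFS client from step 3 is installed and that a mount target exists in the instance's Availability Zone (step 2).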
  • 15. Setting up and mounting a file system
  • 16. Two performance modes designed to support this broad spectrum of use cases: ● General Purpose mode (default; recommended for most use cases): optimized for latency-sensitive applications and general-purpose, file-based workloads; the best option for the majority of use cases. ● Max I/O mode: can scale to higher levels of aggregate throughput, with a tradeoff of slightly higher latencies for file operations. Use Amazon CloudWatch to determine whether your application can benefit from Max I/O mode; if not, you’ll get the best performance in General Purpose mode.
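The CloudWatch check above could be automated. EFS publishes a `PercentIOLimit` metric (namespace `AWS/EFS`) that shows how close a General Purpose file system is to its I/O limit. The following is a minimal sketch, assuming the samples have already been fetched from CloudWatch. The 95% threshold and the 10% share are hypothetical tuning knobs, not AWS guidance. The return values match the EFS API's `PerformanceMode` values.

```python
from typing import Sequence


def recommend_performance_mode(samples: Sequence[float],
                               threshold: float = 95.0,
                               min_share: float = 0.10) -> str:
    """Suggest "maxIO" if PercentIOLimit samples are often near 100%.

    samples: PercentIOLimit datapoints (0-100) from a General Purpose file system.
    threshold, min_share: hypothetical cutoffs; tune them for your workload.
    """
    if not samples:
        return "generalPurpose"  # no evidence either way: keep the default
    near_limit = sum(1 for s in samples if s >= threshold)
    if near_limit / len(samples) >= min_share:
        return "maxIO"
    return "generalPurpose"
```

The performance mode is chosen when a file system is created. A "maxIO" result therefore means creating a new file system and copying data to it, not flipping a setting in place.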
  • 17. EFS provides a throughput bursting model that scales as a file system grows: As a file system gets larger, it needs access to more throughput. Many file workloads are spiky, with peak throughput well above average levels. The Amazon EFS scalable bursting model is designed to make performance available when you need it.
  • 18. Throughput bursting model based on earning and spending “bursting credits” • File systems earn credits at a “baseline rate” of 0.05 MB/s per GB stored and use credits by performing file system operations; file systems can drive throughput at “baseline rate” indefinitely • File systems with a positive bursting credit balance can “burst” to higher levels for periods of time: 100 MB/s for file systems 1 TB or smaller, 100 MB/s per TB for file systems larger than 1 TB • New file systems start with a full credit balance
  • 19. Bursting model examples (file system size → read/write throughput): ● A 100 GB EFS file system can drive up to 5 MB/s continuously, or burst to 100 MB/s for up to 72 minutes each day ● A 1 TB EFS file system can drive up to 50 MB/s continuously, or burst to 100 MB/s for up to 12 hours each day ● A 10 TB EFS file system can drive up to 500 MB/s continuously, or burst to 1 GB/s for up to 12 hours each day
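The figures on slides 18 and 19 follow from two rules. The baseline is 0.05 MB/s per GB stored. The burst rate is 100 MB/s up to 1 TB, then 100 MB/s per TB. A day's worth of credits earned at the baseline rate buys bursting for (baseline ÷ burst) of the day. The following is a small Python sketch that reproduces the examples. It assumes decimal units (1 TB = 1,000 GB) and a file system that is otherwise idle while it isn't bursting. It reflects the 2016 model described here; EFS throughput options have changed since.

```python
from dataclasses import dataclass

BASELINE_MBPS_PER_GB = 0.05   # credits earned per GB stored (slide 18)
BURST_FLOOR_MBPS = 100.0      # burst rate for file systems of 1 TB or smaller
BURST_MBPS_PER_TB = 100.0     # burst rate per TB for larger file systems
MINUTES_PER_DAY = 24 * 60


@dataclass
class BurstProfile:
    baseline_mbps: float          # sustainable indefinitely
    burst_mbps: float             # available while credits remain
    burst_minutes_per_day: float  # burst time one day of earned credits pays for


def burst_profile(size_gb: float) -> BurstProfile:
    baseline = size_gb * BASELINE_MBPS_PER_GB
    size_tb = size_gb / 1000  # decimal units, as on the slides
    burst = BURST_FLOOR_MBPS if size_tb <= 1 else BURST_MBPS_PER_TB * size_tb
    # Credits earned over a day at the baseline rate are spent at the burst
    # rate, so bursting can last (baseline / burst) of the day, capped at 100%.
    minutes = MINUTES_PER_DAY * min(baseline / burst, 1.0)
    return BurstProfile(baseline, burst, minutes)


if __name__ == "__main__":
    for size in (100, 1_000, 10_000):
        print(size, burst_profile(size))
```

For 100 GB, this gives 5 MB/s baseline and 100 MB/s burst, and 5% of a day is 72 minutes. For both 1 TB and 10 TB, the ratio is 50%, which gives the 12 hours on the slide.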
  • 20. In Which Regions Can I Use EFS Today? US East (N. Virginia) – us-east-1 US East (Ohio) – us-east-2 US West (Oregon) – us-west-2 EU (Ireland) – eu-west-1 More coming soon!
  • 21. Simple and predictable pricing With EFS, you pay for only the storage space you use • No minimum commitments or upfront fees • No need to provision storage in advance • No other fees, charges, or billing dimensions EFS price: • $0.30/GB/month (N. Virginia, Ohio, Oregon) • $0.33/GB/month (Ireland) Customers within their first 12 months on AWS can use up to 5 GB/month for free
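Because storage is the only billing dimension, a monthly estimate is a single multiplication. The following is a minimal sketch using the 2016 per-GB prices above and the region codes from slide 20. It uses `Decimal` so currency amounts don't pick up floating-point noise. Modeling the free tier as 5 GB subtracted from usage is a simplifying assumption.

```python
from decimal import Decimal

# 2016 prices from the slide, keyed by the region codes listed on slide 20.
PRICE_PER_GB_MONTH = {
    "us-east-1": Decimal("0.30"),  # N. Virginia
    "us-east-2": Decimal("0.30"),  # Ohio
    "us-west-2": Decimal("0.30"),  # Oregon
    "eu-west-1": Decimal("0.33"),  # Ireland
}
FREE_TIER_GB = Decimal(5)  # first 12 months on AWS


def monthly_cost(gb_stored, region: str, in_free_tier: bool = False) -> Decimal:
    """Estimated monthly EFS bill in USD for gb_stored GB-months of storage."""
    gb = Decimal(str(gb_stored))
    if in_free_tier:
        gb = max(gb - FREE_TIER_GB, Decimal(0))
    return gb * PRICE_PER_GB_MONTH[region]


if __name__ == "__main__":
    print(monthly_cost(37_400, "us-east-1"))  # 37.4 TB in N. Virginia
```

For example, 37.4 TB (37,400 GB) at $0.30 comes to $11,220 per month. That is in line with the roughly $11,000/month EFS figure in Spokeo's back-end comparison.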
  • 22. Introduction Austin Fonacier Lead Software Architect at Spokeo austin@spokeo.com @austinrfnd http://github.com/austinrfnd
  • 23. Spokeo People search engine Headquartered in beautiful Pasadena, CA 200+ employees 18,000,000 unique visitors a month 8.5 billion people records 30,000,000 bot hits per 24 hour period
  • 24. Spokeo the product: Search for people by any intersection of data: first name, last name, email, age, address, phone, or relative name
  • 28. Our challenge: SEO pages. 3,000,000,000 SEO pages, about 37.4 terabytes of data, and about 30,000,000 crawls per day
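Dividing out the slide's figures shows why page caching was the problem to solve. The following back-of-the-envelope calculation assumes decimal terabytes and crawls spread evenly over the day. It also assumes no page is crawled twice, so the last line is a lower bound.

```latex
\begin{aligned}
\text{average page size} &\approx \frac{37.4 \times 10^{12}\ \text{B}}{3 \times 10^{9}\ \text{pages}} \approx 12.5\ \text{kB} \\
\text{average crawl rate} &\approx \frac{3 \times 10^{7}\ \text{crawls/day}}{86{,}400\ \text{s/day}} \approx 347\ \text{crawls/s} \\
\text{days to crawl every page once} &\geq \frac{3 \times 10^{9}\ \text{pages}}{3 \times 10^{7}\ \text{crawls/day}} = 100\ \text{days}
\end{aligned}
```

Most crawled pages are therefore "cold". A cache that only keeps recently requested pages would miss on most crawler hits.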
  • 30. The importance of page speed ●Page speed affects abandonment rate ●Google uses page speed in search ranking ●Studies show a direct relationship between page speed and conversion rate
  • 31. How to get faster?
  • 32. Ninety-ninety rule: “The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.” (Tom Cargill, Bell Labs)
  • 33. How to get faster?
  • 34. Switch away from Ruby on Rails ●Ton of effort ●Ton of time ●Ton of money ●Unmeasurable performance gains
  • 35. How to get faster?
  • 36. Reverse Proxy ●Low effort (some code/header tweaks) ●Immediate measurable performance gains
  • 37. Over-the-counter reverse proxy ●Fast ●Easy (CDN and header changes) ●Global delivery system
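The "header changes" that put a CDN or reverse proxy in front of an app usually come down to `Cache-Control`. The following is a language-neutral sketch in Python; Spokeo's Rails app would set the same headers through Rails. The TTL values and the function name are hypothetical. Per the HTTP caching specification (RFC 7234, since superseded by RFC 9111), `s-maxage` applies only to shared caches and overrides `max-age` there. This lets a CDN keep a page longer than browsers do.

```python
def cacheable_page_headers(browser_ttl_s: int = 300,
                           shared_ttl_s: int = 86_400) -> dict:
    """Response headers that let a CDN/reverse proxy cache a rendered page.

    max-age: how long browsers may reuse the page.
    s-maxage: how long shared caches (CDN, reverse proxy) may reuse it.
    Both TTLs are hypothetical defaults.
    """
    if browser_ttl_s < 0 or shared_ttl_s < 0:
        raise ValueError("TTLs must be non-negative")
    return {
        "Cache-Control": f"public, max-age={browser_ttl_s}, s-maxage={shared_ttl_s}",
        "Vary": "Accept-Encoding",  # cache gzip and plain variants separately
    }
```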
  • 42. Back to the drawing board: “Always serve Google the fastest possible page” (Mike Daly, Spokeo CTO). Google is the toughest critic; by making the site faster for Google, we are making our customers happy.
  • 43. Cache requirements: ●As fast as reasonably possible ●Cost-efficient ●Scalable ●Fault-tolerant ●Failover/availability
  • 44. Ruby on Rails cache: ●Rails penalty of going through the framework. “Always serve Google the fastest possible page” (Mike Daly, Spokeo CTO)
  • 48. Reverse proxy options ●All have in-memory mapping of keys and values ●Nginx and Varnish are expensive ●Apache Traffic Server doesn’t notify other nodes on writes ●All are huge code bases
  • 49. Reverse proxy options (cost assumes 45 c4.xlarge instances per month):
    ● Nginx: 164,978 lines of code; $187,573; con: memory mapping of all keys
    ● Varnish: 220,813 lines of code; $495,999; con: memory mapping of all keys
    ● Apache Traffic Server: 889,824 lines of code; $6,771; cons: memory mapping of all keys, writes don’t propagate between nodes
  • 50. Write our own: MassCache ●In-house expert knowledge ●Very simple use case ●Inexpensive to run (thin Node.js app) ●No in-memory mapping
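The talk doesn't show MassCache's code (MassCache itself is a Node.js app). The following is a minimal Python sketch of the idea on the slide: no in-memory key map. Each node derives a page's file path from its URL, so the shared file system itself is the index. Several details are assumptions, not Spokeo's design:

- SHA-256 keying
- two-level directory sharding
- atomic temp-file-plus-rename writes
- the `FileCache` class
- the `force_refresh` flag, which stands in for Cacheup's special invalidation header (the talk doesn't name that header)

```python
import hashlib
import os
import tempfile
from typing import Callable


class FileCache:
    """Page cache whose only index is the (shared) file system under `root`."""

    def __init__(self, root: str, fetch_from_origin: Callable[[str], bytes]):
        self.root = root                            # e.g. an EFS mount point
        self.fetch_from_origin = fetch_from_origin  # stands in for the Rails app

    def path_for(self, url: str) -> str:
        # The path is computed from the key, so any node sharing the mount
        # finds the same file without keeping a key map in memory.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return os.path.join(self.root, digest[:2], digest[2:4], digest)

    def get(self, url: str, force_refresh: bool = False) -> bytes:
        path = self.path_for(url)
        if not force_refresh:
            try:
                with open(path, "rb") as f:
                    return f.read()                 # cache hit
            except FileNotFoundError:
                pass                                # cache miss: go to origin
        body = self.fetch_from_origin(url)
        self._write_atomically(path, body)
        return body

    def _write_atomically(self, path: str, body: bytes) -> None:
        directory = os.path.dirname(path)
        os.makedirs(directory, exist_ok=True)
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(body)
            # Rename replaces the file in one step on POSIX file systems, so
            # readers see the old page or the new one, never a partial write.
            os.replace(tmp_path, path)
        except BaseException:
            os.unlink(tmp_path)
            raise
```

Sharding into subdirectories keeps any single directory from holding billions of entries. Writing through a temp file matters more on a shared mount, because other nodes may read a page while it is being rewritten.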
  • 53. Back ends (cost*, performance, cons):
    ● Amazon S3/Amazon CloudFront: $6,000; 10-1000 ms; con: CloudFront LRU eviction
    ● Amazon DynamoDB: $11,000; 20-30 ms
    ● Amazon ElastiCache: $90,000; fast; con: not data-persistent
    ● Amazon EBS volumes: cost and performance not listed; con: EBS mounting limitations**
    ● Amazon EFS: $11,000/month; 17 ms reads, 30 ms writes (Max I/O mode; more details on the next slide)
  • 54. EFS. Price: $11,000/month. Performance: 17 ms for reads, 30 ms for writes • Latencies measured in Max I/O mode (General Purpose mode has lower latencies) • Writes: Node.js open, write, and close • Reads: Node.js file descriptor, file stats, and reading the contents • 30 KB files; peak EFS size was 2.3 GB • Built-in data redundancy • Built-in scalability
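The slide's latency numbers time whole system-call sequences, not single calls. A write is open, write, and close. A read is open (file descriptor), fstat, and read. The following is a Python sketch of the same breakdown, in place of the Node.js code used in the talk. The 30 KB size comes from the slide. The probe file name and the default directory are hypothetical. Point `directory` at an EFS mount to measure network latency rather than local disk.

```python
import os
import tempfile
import time
from typing import Tuple


def timed_write(path: str, payload: bytes) -> float:
    """open + write + close; returns elapsed milliseconds."""
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:                        # os.write may write fewer bytes than asked
            view = view[os.write(fd, view):]
    finally:
        os.close(fd)
    return (time.perf_counter() - start) * 1000.0


def timed_read(path: str) -> Tuple[float, bytes]:
    """open (file descriptor) + fstat + read; returns (elapsed ms, contents)."""
    start = time.perf_counter()
    fd = os.open(path, os.O_RDONLY)
    try:
        remaining = os.fstat(fd).st_size   # the "file stats" step
        chunks = []
        while remaining > 0:
            chunk = os.read(fd, remaining)
            if not chunk:                  # file shrank while reading
                break
            chunks.append(chunk)
            remaining -= len(chunk)
    finally:
        os.close(fd)
    return (time.perf_counter() - start) * 1000.0, b"".join(chunks)


def measure(directory: str = tempfile.gettempdir(),
            size: int = 30 * 1024) -> Tuple[float, float]:
    """Write then read one probe file; returns (write_ms, read_ms)."""
    path = os.path.join(directory, "efs-latency-probe.bin")  # hypothetical name
    payload = os.urandom(size)
    try:
        write_ms = timed_write(path, payload)
        read_ms, data = timed_read(path)
    finally:
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass
    if data != payload:
        raise RuntimeError("read back different bytes than were written")
    return write_ms, read_ms
```

A single sample is noisy. To compare against figures like 17 ms and 30 ms, repeat `measure` many times and look at percentiles.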
  • 57. Populating EFS: Cacheup ● Actively populate EFS as fast as possible ● EFS doesn’t shy away from 250,000 requests/second ● Populate 3,000,000,000 files in one week ● Cache invalidation: send requests with a special header
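The one-week target sets the warming rate. The following calculation assumes a uniform rate over the whole week and the roughly 30 ms write latency measured on slide 54. Applying Little's law then gives how many writes must be in flight at once.

```latex
\begin{aligned}
\text{required write rate} &= \frac{3 \times 10^{9}\ \text{files}}{7 \times 86{,}400\ \text{s}} = \frac{3 \times 10^{9}\ \text{files}}{604{,}800\ \text{s}} \approx 4{,}960\ \text{writes/s} \\
\text{writes in flight} &\approx 4{,}960\ \text{writes/s} \times 0.030\ \text{s} \approx 149
\end{aligned}
```

That rate is far below the 250,000 requests/second quoted above. Each warmed page also needs a render from the web stack, though, so the web stack is the more likely limit on warming speed than EFS itself.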
  • 58. Cacheup: dynamic throttling. Dynamic throttling based on key metrics of our stack: ●Application performance index (Apdex) scores ●Response times ●Database load
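The talk names the throttle's inputs but not its control law. The following is a hedged Python sketch using additive-increase/multiplicative-decrease (AIMD), a common choice for this kind of feedback loop; it is not Cacheup's documented algorithm. Every threshold and rate below is a hypothetical placeholder. The loop that samples metrics and sends warming requests is left out.

```python
from dataclasses import dataclass


@dataclass
class StackMetrics:
    apdex: float             # 0.0 to 1.0, higher is better
    response_time_ms: float  # e.g. median page response time
    db_load_pct: float       # e.g. database CPU utilization


@dataclass
class Throttle:
    rate: float = 100.0      # warming requests per second (hypothetical start)
    min_rate: float = 10.0
    max_rate: float = 5000.0
    step: float = 50.0       # additive increase per healthy interval
    backoff: float = 0.5     # multiplicative decrease when the stack is strained

    def healthy(self, m: StackMetrics) -> bool:
        # Hypothetical thresholds: all three signals must look good.
        return m.apdex >= 0.9 and m.response_time_ms <= 500 and m.db_load_pct <= 70

    def update(self, m: StackMetrics) -> float:
        """Adjust and return the warming rate after one metrics interval."""
        if self.healthy(m):
            self.rate = min(self.rate + self.step, self.max_rate)
        else:
            self.rate = max(self.rate * self.backoff, self.min_rate)
        return self.rate
```

AIMD ramps up gently while users are well served and backs off fast when any signal degrades. That protects the live site, which Cacheup shares with real traffic.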
  • 59. Benefits after a year with EFS, MassCache, and Cacheup: ● Costs to serve a cached page ● Horizontally scalable • 37.4 TB • 3,000,000,000 files • 30,000,000 requests per day ● Active warming taught us about bottlenecks on our web stack ● Site redundancy ● Built-in DDoS protection ● Google webmaster dashboard numbers are steady
  • 60. EFS is the cloud: to us, EFS feels like an infinitely scalable resource ● Fast ● Easy ● Cheap ● Data-redundant ● A Goldilocks solution for us
  • 61. EFS gotchas: ●Writes are slower than reads ●Writing a file is slightly slower than updating a file ●Improvements have been made since the preview a year ago, including support for NFSv4.1, and will continue ●Any access to EFS looks like a file access but is actually a network call! ●General Purpose (GP) mode ≠ Max I/O mode
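Because every EFS access is a network call, it helps to treat file reads the way you would treat remote requests. Bound them with a deadline and fall back to the origin. Also skip a separate existence check before `open`, which can add a round trip and races with concurrent writers. The following is a minimal Python sketch of that pattern. The pool size, the function names, and the idea of a fallback callable are assumptions for illustration. A read stuck on NFS cannot be cancelled from Python, so on timeout the worker thread keeps running and the caller simply stops waiting for it.

```python
import concurrent.futures
from typing import Callable, Optional

# Shared pool for file reads; 32 workers is a hypothetical size.
_read_pool = concurrent.futures.ThreadPoolExecutor(max_workers=32)


def _read_or_none(path: str) -> Optional[bytes]:
    # One open+read instead of exists()+open(): fewer network round trips,
    # and no window where the file vanishes between the two calls.
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None


def read_with_deadline(path: str, timeout_s: float,
                       fallback: Callable[[], bytes]) -> bytes:
    """Return the cached file, or fallback() on a miss or a slow read."""
    future = _read_pool.submit(_read_or_none, path)
    try:
        data = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback()  # the worker may still be blocked; we stop waiting
    return data if data is not None else fallback()
```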
  • 64. Related Sessions • STG202 - Deep Dive on Amazon Elastic File System • Recorded on Wednesday • STG207 - Case Study: How Atlassian Uses Amazon EFS with JIRA to Cut Costs and Accelerate Performance • Friday 12:30pm • STG208 - Case Study: How Monsanto Uses Amazon EFS with Their Large-Scale Geospatial Data Sets • Friday 11:00am