SUMMIT
Berlin
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.SUMMIT
Making S3 more resilient
using Lambda@Edge
Júlia Biró, Yann Hamon
Reliability Team
Contentful
Introduction
Júlia Biró
Reliability Engineer
Yann Hamon
Reliability Engineer
Agenda
1. Our goal
2. Proof of concept
3. Going live
4. Improving our Lambda@Edge software platform
5. Review
Our goal
Our file delivery infrastructure
Our goal: multi-region active-active
https://www.youtube.com/watch?v=2e29I3dA8o4
Active-active vs. failover
Current state-of-the-art (?)
Highly available multi region S3 website
Cloudfront distributions - Derek Higgins (2017)
Could work but:
Failover solution
No guaranteed propagation time for configuration changes in CloudFront
Manual reset
A possible solution...
Use an origin with DNS Round-Robin?
… that doesn’t work
GET /puppy.jpg HTTP/1.1
Host: examplebucket.s3-us-west-2.amazonaws.com
Date: Mon, 11 Apr 2016 12:00:00 GMT
x-amz-date: Mon, 11 Apr 2016 12:00:00 GMT
Authorization: authorization string
There must be a way...
Dynamically Route Viewer Requests
to Any Origin Using Lambda@Edge
Jake Wells, AWS Blog (Nov. 2017)
A/B Testing with Lambda@Edge
Load-balancing with Lambda@Edge
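const dns = require('dns');
exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  // The CNAME target of the origin record names the bucket region to use.
  dns.resolveCname('files-origin.contentful.com', (err, result) => {
    if (err) return callback(err);
    let bucketName, region;
    if (result[0].includes('us-east-1')) {
      bucketName = 'cf-files.s3.us-east-1.amazonaws.com';
      region = 'us-east-1';
    } else {
      bucketName = 'cf-files.s3.eu-west-1.amazonaws.com';
      region = 'eu-west-1';
    }
    // Point the S3 origin and the Host header at the chosen bucket.
    request.origin.s3.region = region;
    request.origin.s3.domainName = bucketName;
    request.headers['host'] = [{key: 'host', value: bucketName}];
    callback(null, request);
  });
};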
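The rewrite above can be isolated into a pure function, which makes it possible to exercise the routing decision without deploying to CloudFront. This is a minimal sketch, assuming the request object follows the CloudFront origin-request event shape (where the S3 origin field is `domainName`); `routeToBucket`, the sample request and the sample CNAME target are hypothetical and not taken from the talk.

```javascript
// Hypothetical helper (not from the talk): applies the origin rewrite to a
// CloudFront origin-request object, given an already-resolved CNAME target.
function routeToBucket(request, cnameTarget) {
  const region = cnameTarget.includes('us-east-1') ? 'us-east-1' : 'eu-west-1';
  const bucketName = `cf-files.s3.${region}.amazonaws.com`;
  request.origin.s3.region = region;
  request.origin.s3.domainName = bucketName;
  // S3 uses the Host header to identify the bucket endpoint, so it has to
  // match the bucket that was chosen.
  request.headers['host'] = [{ key: 'host', value: bucketName }];
  return request;
}

// Sample request with placeholder values (assumed, for illustration only).
const sampleRequest = {
  uri: '/puppy.jpg',
  headers: { host: [{ key: 'host', value: 'placeholder.example' }] },
  origin: {
    s3: {
      domainName: 'placeholder.example',
      region: '',
      authMethod: 'origin-access-identity',
      path: '',
      customHeaders: {},
    },
  },
};

const routed = routeToBucket(sampleRequest, 'cf-files.s3.us-east-1.amazonaws.com');
console.log(routed.origin.s3.domainName, routed.origin.s3.region);
```

Keeping the DNS lookup outside this function means the same logic can be reused whether the CNAME target comes from a live lookup or from a cache.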
Proof of concept
Technology validation
We built a small proof of concept. Only 20 lines of code!
Proof: different image with the same path in both regions.
us-east-1 eu-west-1
Learnings:
Our JavaScript is not great - we don't do this every day!
We do a DNS resolution on every cache miss
DNS Caching
static async ResolveCname(fqdn) {
  let now = Date.now();
  let cachedEntry = this.cache[fqdn];
  if (cachedEntry && now - cachedEntry.updatedAt < this.defaultTTL) {
    return cachedEntry.answers;
  }
  [...]
DNS lookups add up to 100ms latency to our requests...
But we can cache the results [1]
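The slide elides the rest of the method, so the following is only a sketch of how a TTL cache around the CNAME lookup could be completed, not the authors' implementation. The class name `CachedResolver`, the 30-second default TTL and the injectable `resolveFn` parameter are assumptions added so the cache can be tried without network access.

```javascript
const dns = require('dns').promises;

class CachedResolver {
  // ttlMs and resolveFn are assumptions; resolveFn defaults to a real lookup.
  constructor(ttlMs = 30000, resolveFn = (fqdn) => dns.resolveCname(fqdn)) {
    this.ttlMs = ttlMs;
    this.resolveFn = resolveFn;
    this.cache = {};
  }

  async resolveCname(fqdn) {
    const now = Date.now();
    const cachedEntry = this.cache[fqdn];
    // Serve from the cache while the entry is younger than the TTL.
    if (cachedEntry && now - cachedEntry.updatedAt < this.ttlMs) {
      return cachedEntry.answers;
    }
    // Otherwise resolve again and remember when the answer was obtained.
    const answers = await this.resolveFn(fqdn);
    this.cache[fqdn] = { answers, updatedAt: now };
    return answers;
  }
}

// Usage with a fake resolver that counts how often it is called.
let lookups = 0;
const resolver = new CachedResolver(60000, async () => {
  lookups += 1;
  return ['cf-files.s3.eu-west-1.amazonaws.com'];
});

const demo = (async () => {
  await resolver.resolveCname('files-origin.contentful.com');
  await resolver.resolveCname('files-origin.contentful.com');
  console.log(`lookups performed: ${lookups}`);
})();
```

If the resolver instance is created at module scope, the cache survives between invocations for as long as the Lambda execution environment stays warm, which is what keeps most cache misses from paying for a fresh lookup.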
It works!
Leap of faith
Our team is not used to writing JavaScript
Lambda@Edge is a new technology for the company
High cost of failure
©https://www.iimef.marines.mil/Units/MEF-Support-Battalion/Article/1096888/leap-of-faith/
High cost of failure
Contractual SLAs
100s of requests/second
No graceful degradation
We need safety gear.
Why do we trust our current software platform?
testing
deployment
alerting
logging
version control
monitoring
dashboards
debugging
performance
Production-readiness criteria
Solution-agnostic criteria list
Gap analysis
Translated for the specific solution
Production-readiness list is an ideal
The goal is to reduce uncertainty and risk
No need to meet all requirements:
Our existing software platforms did not meet all requirements
Skipping a requirement needs to be a conscious and documented decision
Image: RUNNIN' RHINO, design by Allan Faustino
Risk-aversion
Going live
Going live… gradually
1. Attach the Lambda - new feature turned off completely
2. Feature-flag / dark release
3. Whitelist for some internal test customers
4. Gradually roll out to all traffic
5. Monitor all steps
Big red button: ability to quickly revert at all times
Expect to meet unknowns.
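The rollout steps above can be sketched as a single gate that decides, per request, whether the new routing applies. This is an illustrative sketch under stated assumptions, not the code used in production: the `rollout` configuration object, its field names, the customer identifier and the hash-based percentage bucketing are all hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical rollout configuration; names and values are assumptions.
const rollout = {
  enabled: true,                                // "big red button": false reverts everything
  whitelist: new Set(['internal-test-space']),  // internal test customers
  percentage: 10,                               // share of remaining traffic, 0 to 100
};

// Map a customer identifier to a stable number in [0, 100) so that the same
// customer stays in the same cohort while the percentage is raised.
function cohortOf(customerId) {
  const digest = crypto.createHash('sha256').update(customerId).digest();
  return digest.readUInt32BE(0) % 100;
}

function useNewRouting(config, customerId) {
  if (!config.enabled) return false;                 // steps 1 and 2: attached but dark
  if (config.whitelist.has(customerId)) return true; // step 3: whitelisted customers
  return cohortOf(customerId) < config.percentage;   // step 4: gradual rollout
}

console.log(useNewRouting(rollout, 'internal-test-space'));
console.log(useNewRouting({ ...rollout, enabled: false }, 'internal-test-space'));
```

Hashing the identifier instead of drawing a random number keeps a customer's behaviour consistent across requests, which makes the monitoring in step 5 easier to interpret.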
Our new file delivery infrastructure
Improving our Lambda@Edge
software platform
Migrating a larger service to Lambda@Edge
10 times more requests per second
Full-fledged API, a lot of logic
Request: /yadj1kx9rmg0/wtrHxeu3zEoEce2MokCSi/cf6f68efdcf625fdc060607df0f3baef/quwowooybuqbl6ntboz3.jpg?fm=jpg&w=250&h=100
TypeScript
JavaScript
Rolling out again
https://commons.wikimedia.org/wiki/File:Pit_stop_raikkonen.jpg
Improved CI
Code review
Rollbacks
Backport tooling
Operational experience
Working with Lambda@Edge
Near-immediate scale-up
Marginal costs
Highly available
Working with Lambda@Edge
But...
Challenging development environment (esp. integration testing)
Logs saved in each region
Our deployment workflow is still manual
TypeScript helped us write safer code
We contributed the Lambda@Edge types to the AWS TypeScript package
Future improvements
Use Geo-routing to forward requests to the closest S3 bucket
Run the DNS resolution outside of the main event loop
Automate deployments
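One way the geo-routing idea could look is to pick the bucket from the region in which the function is executing. This is a rough sketch and not something described in the talk: the `closestBucketRegion` and `bucketFor` helpers and the region-prefix mapping are hypothetical, and it assumes the `AWS_REGION` environment variable reflects the region where the function is running.

```javascript
// Hypothetical mapping: regions in the Americas go to us-east-1, everything
// else to eu-west-1. A real deployment would likely use measured latency.
function closestBucketRegion(executionRegion) {
  const americas = /^(us|ca|sa)-/;
  return americas.test(executionRegion) ? 'us-east-1' : 'eu-west-1';
}

function bucketFor(executionRegion) {
  const region = closestBucketRegion(executionRegion);
  return { region, domainName: `cf-files.s3.${region}.amazonaws.com` };
}

// Assumption: AWS_REGION is set by the Lambda runtime; the fallback value
// only exists so the sketch also runs outside Lambda.
const target = bucketFor(process.env.AWS_REGION || 'eu-central-1');
console.log(target.domainName);
```

A geographic preference like this would still need to be combined with a health signal, such as the DNS record used today, so that requests move away from a region that is failing.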
A few months later...
A few months later...
Similar implementations emerged:
Using Amazon CloudFront with Multi-Region Amazon S3 Origins (Sept. 30th, Seldam)
Amazon S3 Region Failover — Part 2: CloudFront S3 origin failover (Oct. 30th, Frias)
CloudFront origin failover was introduced:
Amazon CloudFront announces support for Origin Failover (Nov. 20th, AWS)
But…
Our current solution has proven cheap, fast and stable...
and has laid the groundwork for other multi-region projects.
Takeaways
Lambda@Edge is a cheap, scalable & highly reliable platform to build stateless APIs
Feature-flagging, canarying, gradual rollouts are easy to use with Lambda@Edge to reduce risk of large-scale changes
When the cost of failure is high, use production-readiness lists
Thank you!
Júlia Biró, Yann Hamon
julia.biro@contentful.com, yann.hamon@contentful.com
