This document provides an experience report on going to production with serverless applications. It opens with the problems of the previous EC2-based setup: hidden complexities and dependencies, low utilisation kept to leave room for traffic spikes, slow EC2 scaling that forced scaling early, high costs for unused resources, and deployments that took up to 30 minutes and required downtime. It then outlines goals for serverless deployments: being small and fast, having zero downtime, loose coupling, and minimizing costs and ops effort. The document describes the author's migration of several services to serverless and the resulting improvements in deployment frequency, cost savings, and time to delivery. It concludes by discussing various practices and tools related to deploying and managing serverless applications at production scale.
11. hidden complexities and dependencies
low utilisation to leave room for traffic spikes
EC2 scaling is slow, so scale earlier
lots of cost for unused resources
up to 30 mins for deployment
deployment required downtime
12. “lead time to someone saying
thank you is the only reputation
metric that matters.”
- Dan North
69. “…We find that tests that mock external
libraries often need to be complex to
get the code into the right state for the
functionality we need to exercise.
The mess in such tests is telling us that
the design isn’t right but, instead of
fixing the problem by improving the
code, we have to carry the extra
complexity in both code and test…”
Don’t Mock Types You Can’t Change
70. “…The second risk is that we have to be
sure that the behaviour we stub or mock
matches what the external library will
actually do…
Even if we get it right once, we have to
make sure that the tests remain valid
when we upgrade the libraries…”
Don’t Mock Types You Can’t Change
76. what unit tests will not tell you…
Lambda, API Gateway, DynamoDB
is our request correct?
is the request mapping set up correctly?
are the API resources configured correctly?
are we assuming the correct schema?
is Lambda proxy configured correctly?
is IAM policy set up correctly?
is the table created?
78. observation
most Lambda functions are simple and
have a single purpose, so the risk of
shipping broken software has largely
shifted to how they integrate with
external services
80. But it slows down
my feedback loop…
IT’S NOT
ABOUT YOU!
81. …if a service can’t provide
you with a relatively easy
way to test the interface in
reality, then you should
consider using another one.
Paul Johnston
82. “…Wherever possible, an acceptance
test should exercise the system end-to-
end without directly calling its internal
code.
An end-to-end test interacts with the
system only from the outside: through
its interface…”
Testing End-to-End
93. “the earlier you consider CI + CD, the
more time you save in the long run”
- Yan
94. “…We prefer to have the end-to-end
tests exercise both the system and the
process by which it’s built and
deployed…
This sounds like a lot of effort (it is), but
has to be done anyway repeatedly
during the software’s lifetime…”
Testing End-to-End
143. console.log("hydrating yubls from db…");
console.log("fetching user info from user-api");
console.log("MONITORING|1489795335|27.4|latency|user-api-latency");
console.log("MONITORING|1489795335|8|count|yubls-served");
logs: the first two lines
metrics: the last two lines
metric format: MONITORING|timestamp|metric value|metric type|metric name
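The metric lines on this slide follow a pipe-delimited layout, so whatever reads the log output can tell metrics apart from ordinary logs. A minimal sketch of that separation follows. The function names `parseMetricLine` and `splitLogsAndMetrics` are hypothetical and do not come from the slides, and the slides do not show how the log output is actually consumed.

```javascript
// Hypothetical helper (not from the slides): parse one line of the form
// MONITORING|timestamp|metric value|metric type|metric name
function parseMetricLine(line) {
  const parts = line.split("|");
  if (parts.length !== 5 || parts[0] !== "MONITORING") {
    return null; // an ordinary log line, not a metric
  }
  const [, timestamp, value, type, name] = parts;
  const metric = { timestamp: Number(timestamp), value: Number(value), type, name };
  if (Number.isNaN(metric.timestamp) || Number.isNaN(metric.value)) {
    return null; // malformed numbers: treat the line as a plain log
  }
  return metric;
}

// Hypothetical helper: split a batch of log lines into logs and metrics.
function splitLogsAndMetrics(lines) {
  const logs = [];
  const metrics = [];
  for (const line of lines) {
    const metric = parseMetricLine(line);
    if (metric) {
      metrics.push(metric);
    } else {
      logs.push(line);
    }
  }
  return { logs, metrics };
}
```

Keeping metrics in the same stream as logs means the function itself only ever calls `console.log`; the parsing cost is paid by whatever processes the logs afterwards.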
193. complexity ceiling of a
Node.js app
complexity
for managing complexity:
referential transparency
immutability as default
type inference
option types
union types
…
194. complexity ceiling of a
Node.js app
complexity
complexity ceiling of a
Node.js Lambda function
195. if you can limit the complexity
of your solution, maybe you
won’t need the tools for
managing that complexity.
me
206. “AWS Lambda polls your stream and
invokes your Lambda function. Therefore, if
a Lambda function fails, AWS Lambda
attempts to process the erring batch of
records until the time the data expires…”
http://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html
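One consequence of the retry behaviour quoted above is that a record which always fails keeps its batch being retried until the data expires. A sketch of one possible way to handle this is to catch errors per record and set the failures aside, so the function itself does not fail. This approach is an assumption and is not stated on the slide. `processBatch` and `handleRecord` are hypothetical names, and the payload is assumed to be JSON; the `Records[].kinesis.data` (base64) and `sequenceNumber` fields are part of the Kinesis event that Lambda delivers.

```javascript
// Hypothetical sketch (not from the slides): process a Kinesis batch
// record by record so that one bad record does not fail the whole batch.
function processBatch(event, handleRecord) {
  const failed = [];
  let succeeded = 0;
  for (const record of event.Records) {
    // Kinesis record payloads arrive base64-encoded.
    const payload = Buffer.from(record.kinesis.data, "base64").toString("utf8");
    try {
      // Assumption: payloads are JSON documents.
      handleRecord(JSON.parse(payload));
      succeeded += 1;
    } catch (err) {
      // Set the failure aside instead of throwing, which would
      // cause the whole batch to be retried.
      failed.push({
        sequenceNumber: record.kinesis.sequenceNumber,
        error: err.message,
      });
    }
  }
  return { succeeded, failed };
}
```

What to do with the `failed` list (log it, write it somewhere for later replay) depends on the application; swallowing errors silently would trade a blocked shard for lost data.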
212. “Each shard can support up to
5 transactions per second for
reads, up to a maximum total data
read rate of 2 MB per second.”
http://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html
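The per-shard limits quoted above are shared by everything that reads from the shard. As a rough worked example, the sketch below divides the quoted limits evenly between consumers. The even split is an assumption made for illustration, and `perConsumerShare` is a hypothetical name.

```javascript
// Limits quoted on the slide, per shard.
const READS_PER_SECOND_PER_SHARD = 5;
const READ_MB_PER_SECOND_PER_SHARD = 2;

// Hypothetical helper: the share of a shard's read limits each consumer
// gets, assuming the limits are divided evenly between consumers.
function perConsumerShare(consumerCount) {
  if (!Number.isInteger(consumerCount) || consumerCount < 1) {
    throw new RangeError("consumerCount must be a positive integer");
  }
  return {
    readsPerSecond: READS_PER_SECOND_PER_SHARD / consumerCount,
    megabytesPerSecond: READ_MB_PER_SECOND_PER_SHARD / consumerCount,
  };
}

// With 5 consumers on one shard, each can read about once per second.
console.log(perConsumerShare(5));
```

Under this assumption, adding a sixth consumer that also reads once per second would push the shard past its 5 reads per second.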
213. “If your stream has 100 active shards,
there will be 100 Lambda functions
running concurrently. Then, each
Lambda function processes events
on a shard in the order that they arrive.”
http://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html
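To illustrate the per-shard ordering described above, here is a small local simulation. Kinesis assigns a record to a shard by taking the MD5 hash of its partition key as a 128-bit integer and finding the shard whose hash key range contains it. The sketch assumes the hash key space is split evenly between shards, which need not hold for a real stream, and `shardFor` and `groupByShard` are hypothetical names.

```javascript
const { createHash } = require("crypto");

// Hypothetical helper: map a partition key to a shard index, assuming
// the 128-bit hash key space is divided evenly between shardCount shards.
function shardFor(partitionKey, shardCount) {
  const hex = createHash("md5").update(partitionKey).digest("hex");
  const hash = BigInt("0x" + hex);
  const rangeSize = (2n ** 128n) / BigInt(shardCount);
  const index = hash / rangeSize;
  // The last range absorbs any remainder of the division.
  const last = BigInt(shardCount) - 1n;
  return Number(index > last ? last : index);
}

// Hypothetical helper: group records by shard, preserving arrival order
// within each shard. Each group stands for the work of one concurrent
// function invocation stream.
function groupByShard(records, shardCount) {
  const shards = new Map();
  for (const record of records) {
    const shard = shardFor(record.partitionKey, shardCount);
    if (!shards.has(shard)) {
      shards.set(shard, []);
    }
    shards.get(shard).push(record);
  }
  return shards;
}
```

Records that share a partition key always land in the same group, in arrival order, which is why ordering guarantees hold per shard rather than across the whole stream.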