Serverless in production, an experience report (NDC London 2018)

AWS Lambda has changed the way we deploy and run software, but this new serverless paradigm has created new challenges to old problems - how do you test a cloud-hosted function locally? How do you monitor them? What about logging and config management? And how do we start migrating from existing architectures?

In this talk Yan and Domas will discuss solutions to these challenges by drawing from real-world experience running Lambda in production and migrating from an existing monolithic architecture.

Published in: Technology

  1. 1. Serverless in production: an experience report. What you should know before you go to production.
  2. 2. Yan Cui http://theburningmonk.com @theburningmonk AWS user since 2009
  3. 3. Yan Cui Server Architect Principal Engineer Lead Developer Senior Developer http://theburningmonk.com @theburningmonk Senior Developer
  4. 4. Yan Cui Server Architect Principal Engineer Lead Developer Senior Developer http://theburningmonk.com @theburningmonk Senior Developer
  5. 5. Domas Lasauskas AWS user since 2012
  6. 6. Domas Lasauskas Senior Developer Server Developer Senior Developer Server Developer
  7. 7. apr, 2016
  8. 8. hidden complexities and dependencies low utilisation to leave room for traffic spikes EC2 scaling is slow, so scale earlier lots of cost for unused resources up to 30 mins for deployment deployment required downtime
  9. 9. - Dan North “lead time to someone saying thank you is the only reputation metric that matters.”
  10. 10. “what would good look like for us?”
  11. 11. be small be fast have zero downtime have no lock-step DEPLOYMENTS SHOULD...
  12. 12. FEATURES SHOULD... be deployable independently be loosely-coupled
  13. 13. WE WANT TO... minimise cost for unused resources minimise ops effort reduce tech mess deliver visible improvements faster
  14. 14. nov, 2016
  15. 15. 170 Lambda functions in prod 1.2 GB deployment packages in prod 95% cost saving vs EC2 15x no. of prod releases per month
  16. 16. time is a good fit
  17. 17. 1st function in prod! time is a good fit
  18. 18. ? time is a good fit 1st function in prod!
  19. 19. ALERTING CI / CD TESTING LOGGING MONITORING
  20. 20. Principles (what is good?), Practices (how to make it good?), Tools (with what?)
  21. 21. Principles outlast Tools
  22. 22. 170 functions WOOF! ? ? time is a good fit 1st function in prod!
  23. 23. SECURITY DISTRIBUTED TRACING CONFIG MANAGEMENT
  24. 24. evolving the PLATFORM
  25. 25. rebuilt search
  26. 26. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch
  27. 27. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda
  28. 28. new analytics pipeline
  29. 29. Legacy Monolith Amazon Kinesis Amazon Lambda Google BigQuery
  30. 30. Legacy Monolith Amazon Kinesis Amazon Lambda Google BigQuery 1 developer, 2 days from design to production (his 1st serverless project)
  31. 31. Legacy Monolith Amazon Kinesis Amazon Lambda Google BigQuery “nothing ever got done this fast at Skype!” - Chris Twamley
  32. 32. - Dan North “lead time to someone saying thank you is the only reputation metric that matters.”
  33. 33. Rebuilt with Lambda
  34. 34. Rebuilt with Lambda
  35. 35. BigQuery
  36. 36. BigQuery
  37. 37. grapheneDB BigQuery
  38. 38. grapheneDB BigQuery
  39. 39. grapheneDB BigQuery
  40. 40. getting PRODUCTION READY
  41. 41. CHOOSE A FRAMEWORK DEPLOYMENT
  42. 42. http://serverless.com
  43. 43. https://github.com/awslabs/serverless-application-model
  44. 44. http://apex.run
  45. 45. https://apex.github.io/up
  46. 46. https://github.com/claudiajs/claudia
  47. 47. https://github.com/Miserlou/Zappa
  48. 48. http://gosparta.io/
  49. 49. TESTING
  50. 50. amzn.to/29Lxuzu
  51. 51. Level of Testing 1.Unit do our objects do the right thing? are they easy to work with?
  52. 52. Level of Testing 1.Unit 2.Integration does our code work against code we can’t change?
  53. 53. handler
  54. 54. handler test by invoking the handler
  55. 55. Level of Testing 1.Unit 2.Integration 3.Acceptance does the whole system work?
  56. 56. Level of Testing unit integration acceptance feedback confidence
  57. 57. “…We find that tests that mock external libraries often need to be complex to get the code into the right state for the functionality we need to exercise. The mess in such tests is telling us that the design isn’t right but, instead of fixing the problem by improving the code, we have to carry the extra complexity in both code and test…” Don’t Mock Types You Can’t Change
  58. 58. “…The second risk is that we have to be sure that the behaviour we stub or mock matches what the external library will actually do… Even if we get it right once, we have to make sure that the tests remain valid when we upgrade the libraries…” Don’t Mock Types You Can’t Change
  59. 59. Don’t Mock Types You Can’t Change Services
  60. 60. Paul Johnston The serverless approach to testing is different and may actually be easier. http://bit.ly/2t5viwK
  61. 61. Lambda API Gateway DynamoDB
  62. 62. Lambda API Gateway DynamoDB Unit Tests
  63. 63. Lambda API Gateway DynamoDB Unit Tests Mock/Stub
  64. 64. what unit tests will not tell you… (Lambda, API Gateway, DynamoDB) is our request correct? is the request mapping set up correctly? are the API resources configured correctly? are we assuming the correct schema? is Lambda proxy configured correctly? is IAM policy set up correctly? is the table created?
  65. 65. observation: most Lambda functions are simple and have a single purpose, so the risk of shipping broken software has largely shifted to how they integrate with external services
  66. 66. But it slows down my feedback loop… IT’S NOT ABOUT YOU!
  67. 67. …if a service can’t provide you with a relatively easy way to test the interface in reality, then you should consider using another one. Paul Johnston
  68. 68. “…Wherever possible, an acceptance test should exercise the system end-to- end without directly calling its internal code. An end-to-end test interacts with the system only from the outside: through its interface…” Testing End-to-End
  69. 69. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda
  70. 70. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda Test Input
  71. 71. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda Test Input Validate
  72. 72. integration tests exercise system’s Integration with its external dependencies my code
  73. 73. acceptance tests exercise system End-to-End from the outside my code
  74. 74. integration tests differ from acceptance tests only in HOW the Lambda functions are invoked observation
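The observation above can be sketched as one test case that is handed different ways of invoking the function. This is a minimal illustration, not the speakers' test suite: the `handler`, the `/search` path, `httpGet` and `baseUrl` are all hypothetical names chosen for the example.

```javascript
// Hypothetical handler under test: returns an API Gateway proxy-style response.
const handler = async (event) => ({
  statusCode: 200,
  body: JSON.stringify({ results: [`match for ${event.queryStringParameters.q}`] }),
});

// Integration mode: call the handler function directly, in-process.
const invokeLocally = (query) =>
  handler({ queryStringParameters: { q: query } });

// Acceptance mode: go through the deployed HTTP endpoint instead.
// `httpGet` is an injected function (for example a thin wrapper over an HTTP
// library) and `baseUrl` is the deployed API's root URL. Both are assumptions.
const invokeViaHttp = (httpGet, baseUrl) => (query) =>
  httpGet(`${baseUrl}/search?q=${encodeURIComponent(query)}`);

// The test case itself does not know how the function was invoked.
const searchReturnsResults = async (invoke) => {
  const res = await invoke('serverless');
  const body = JSON.parse(res.body);
  return res.statusCode === 200 && body.results.length > 0;
};
```

Running `searchReturnsResults(invokeLocally)` gives the fast feedback of an integration test, while passing `invokeViaHttp(...)` exercises the same expectation from the outside.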
  75. 75. CI + CD PIPELINE
  76. 76. “the earlier you consider CI + CD, the more time you save in the long run” - Yan
  77. 77. “…We prefer to have the end-to-end tests exercise both the system and the process by which it’s built and deployed… This sounds like a lot of effort (it is), but has to be done anyway repeatedly during the software’s lifetime…” Testing End-to-End
  78. 78. “deployment scripts that only live on the CI box is a disaster waiting to happen” - Yan
  79. 79. Jenkins build config deploys and tests unit + integration tests deploy acceptance tests
  80. 80. build.sh allows repeatable builds on both local & CI
  81. 81. build.sh:
      if [ "$1" = "deploy" ] && [ $# -eq 4 ]; then
        STAGE=$2
        REGION=$3
        PROFILE=$4
        npm install
        AWS_PROFILE=$PROFILE 'node_modules/.bin/sls' deploy -s $STAGE -r $REGION
      elif [ "$1" = "int-test" ] && [ $# -eq 4 ]; then
        STAGE=$2
        REGION=$3
        PROFILE=$4
        npm install
        AWS_PROFILE=$PROFILE npm run int-$STAGE
      elif [ "$1" = "acceptance-test" ] && [ $# -eq 4 ]; then
        STAGE=$2
        REGION=$3
        PROFILE=$4
        npm install
        AWS_PROFILE=$PROFILE npm run acceptance-$STAGE
      else
        usage
        exit 1
      fi
  82. 82. Jenkinsfile
  83. 83. Auto Auto Manual
  84. 84. LOGGING
  85. 85. 2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?
  86. 86. 2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now? UTC Timestamp API Gateway Request Id your log message
  87. 87. function name date function version
  88. 88. Yan Logs are not easily searchable in CloudWatch Logs.
  89. 89. LOG OVERLOAD
  90. 90. CENTRALISE LOGS
  91. 91. CENTRALISE LOGS MAKE THEM EASILY SEARCHABLE
  92. 92. + + the elk stack
  93. 93. CloudWatch Logs
  94. 94. CloudWatch Logs AWS Lambda ELK stack
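A rough sketch of the Lambda function in the middle of this pipeline might look like the following. CloudWatch Logs delivers subscription data to Lambda as base64-encoded, gzipped JSON under `event.awslogs.data`; everything else here (the document shape, the `ship` function, the name `makeLogShipper`) is an assumption for illustration.

```javascript
const zlib = require('zlib');

// Decode the base64-encoded, gzipped JSON payload sent by CloudWatch Logs.
const decodeLogEvent = (event) => {
  const compressed = Buffer.from(event.awslogs.data, 'base64');
  return JSON.parse(zlib.gunzipSync(compressed).toString('utf8'));
};

// Turn each log event into a flat document ready for indexing.
// The field names are illustrative, not a required ELK schema.
const toDocuments = (payload) =>
  payload.logEvents.map((e) => ({
    logGroup: payload.logGroup,
    logStream: payload.logStream,
    timestamp: new Date(e.timestamp).toISOString(),
    message: e.message.trim(),
  }));

// `ship` is a hypothetical async function that forwards documents to the
// ELK stack (for example to Logstash or straight to Elasticsearch).
const makeLogShipper = (ship) => async (event) => {
  const docs = toDocuments(decodeLogEvent(event));
  await ship(docs);
  return docs.length;
};
```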
  95. 95. CloudWatch Events
  96. 96. http://bit.ly/2f3zxQG
  97. 97. DISTRIBUTED TRACING
  98. 98. “my followers didn’t receive my new post!” - a user
  99. 99. where could the problem be?
  100. 100. correlation IDs* * eg. request-id, user-id, yubl-id, etc.
  101. 101. ROLL YOUR OWN CLIENTS
  102. 102. kinesis client http client sns client
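One way such a client could carry correlation IDs is sketched below for the HTTP case. It assumes the IDs travel in headers prefixed with `x-correlation-` and that `send` is some function taking `{ url, headers }`; both are conventions invented for this example. `context.awsRequestId` is the request ID exposed by the Lambda Node.js runtime.

```javascript
// Assumption: correlation IDs travel in HTTP headers with this prefix.
const PREFIX = 'x-correlation-';

// Pull correlation IDs out of an incoming API Gateway proxy-style event.
// If the caller sent none, start a new chain from the Lambda request ID.
const captureCorrelationIds = (event, context) => {
  const ids = {};
  for (const [name, value] of Object.entries(event.headers || {})) {
    const key = name.toLowerCase();
    if (key.startsWith(PREFIX)) ids[key] = value;
  }
  if (!ids[`${PREFIX}id`]) ids[`${PREFIX}id`] = context.awsRequestId;
  return ids;
};

// Wrap an HTTP function so that every outgoing call forwards the IDs.
// `send` is a hypothetical function taking { url, headers }.
const withCorrelationIds = (send, ids) => (request) =>
  send({ ...request, headers: { ...(request.headers || {}), ...ids } });
```

The same idea would apply to Kinesis and SNS clients, with the IDs placed in the record payload or message attributes rather than in HTTP headers.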
  103. 103. http://bit.ly/2k93hAj
  104. 104. ROLL YOUR OWN CLIENTS X-RAY
  105. 105. Amazon X-Ray
  106. 106. Amazon X-Ray
  107. 107. traces do not span over API Gateway
  108. 108. http://bit.ly/2s9yxmA
  109. 109. MONITORING + ALERTING
  110. 110. “where do I install monitoring agents?”
  111. 111. you can’t
  112. 112. • invocation Count • error Count • latency • throttling • granular to the minute • support custom metrics
  113. 113. • same metrics as CW • better dashboard • support custom metrics https://www.datadoghq.com/blog/monitoring-lambda-functions-datadog/
  114. 114. my code
  115. 115. my code
  116. 116. my code internet internet press button something happens
  117. 117. “how do I batch up and send logs in the background?”
  118. 118. you can’t (kinda)
  119. 119. logs:
      console.log("hydrating yubls from db…");
      console.log("fetching user info from user-api");
      metrics:
      console.log("MONITORING|1489795335|27.4|latency|user-api-latency");
      console.log("MONITORING|1489795335|8|count|yubls-served");
      metric format: MONITORING|timestamp|metric value|metric type|metric name
  120. 120. CloudWatch Logs AWS Lambda ELK stack logs metrics CloudWatch
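The log-processing function then has to tell metric lines apart from ordinary log lines. A minimal sketch of that split, assuming the pipe-delimited `MONITORING|timestamp|value|type|name` layout shown on the previous slide (the function names are hypothetical):

```javascript
// Parse one log message. Returns a metric object for MONITORING lines,
// or null for ordinary log lines.
const parseMetric = (message) => {
  const parts = message.trim().split('|');
  if (parts.length !== 5 || parts[0] !== 'MONITORING') return null;
  const [, timestamp, value, type, name] = parts;
  return { timestamp: Number(timestamp), value: Number(value), type, name };
};

// Split a batch of messages: logs would go on to the ELK stack,
// metrics would be published to CloudWatch.
const splitLogsAndMetrics = (messages) => {
  const logs = [];
  const metrics = [];
  for (const message of messages) {
    const metric = parseMetric(message);
    if (metric) metrics.push(metric);
    else logs.push(message);
  }
  return { logs, metrics };
};
```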
  121. 121. http://bit.ly/2gGredx
  122. 122. DASHBOARDS
  123. 123. DASHBOARDS SET ALARMS
  124. 124. DASHBOARDS SET ALARMS TRACK APP-LEVEL METRICS
  125. 125. Not Only CloudWatch
  126. 126. “you really don't want your monitoring system to fail at the same time as the system it monitors” - me
  127. 127. CONFIG MANAGEMENT
  128. 128. easily and quickly propagate config changes
  129. 129. Yan Environment variables make it hard to share configurations across functions.
  130. 130. Yan Environment variables make it hard to implement fine-grained access to sensitive info.
  131. 131. CENTRALISED CONFIG SERVICE
  132. 132. config service goes here
  133. 133. SSM Parameter Store
  134. 134. sensitive data should be encrypted in-flight, and at rest (credentials, connection string, etc.)
  135. 135. role-based access
  136. 136. SSM Parameter Store HTTPS role-based access encrypted in-flight
  137. 137. SSM Parameter Store encrypt role-based access
  138. 138. SSM Parameter Store encrypted at-rest
  139. 139. HTTPS role-based access SSM Parameter Store encrypted in-flight
  140. 140. CENTRALISED CONFIG SERVICE CLIENT LIBRARY
  141. 141. fetch & cache at Cold Start
  142. 142. invalidate at interval + signal
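The "fetch and cache, invalidate at interval or on signal" behaviour could be sketched roughly as below. `loadConfig` stands in for whatever call reads from the config service (for example SSM Parameter Store) and is hypothetical, as are `createConfigClient` and its parameters; `now` is injectable only to make the expiry logic easy to test.

```javascript
// A minimal caching client. Create it outside the handler so the cache is
// filled on cold start and reused by later invocations of the same container.
const createConfigClient = (loadConfig, ttlMs, now = Date.now) => {
  let cached = null;
  let expiresAt = 0;
  return {
    // Return the cached config, reloading it when missing or expired.
    async get() {
      if (cached === null || now() >= expiresAt) {
        cached = await loadConfig();
        expiresAt = now() + ttlMs;
      }
      return cached;
    },
    // Signal-based invalidation: the next get() reloads from the source.
    invalidate() {
      cached = null;
    },
  };
};
```

Note that two concurrent `get()` calls on an empty cache would both hit the config service; a fuller client might share the in-flight promise.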
  143. 143. http://bit.ly/2yLUjwd
  144. 144. PRO TIPS
  145. 145. max 75 GB total deployment package size* * limit is per AWS region
  146. 146. Janitor Monkey
  147. 147. Janitor Lambda http://bit.ly/2xzVu4a
  148. 148. disable versionFunctions in
  149. 149. install Serverless framework as dev dependency at project level dev dependencies are excluded since 1.16.0
  150. 150. http://bit.ly/2vzBqhC
  151. 151. http://amzn.to/2vtUkDU
  152. 152. UNDERSTAND COLDSTARTS
  153. 153. Amazon X-Ray 1st invocation 2nd invocation cold start
  154. 154. source: http://bit.ly/2oBEbw2
  155. 155. http://bit.ly/2rtCCBz
  156. 156. C# http://bit.ly/2rtCCBz
  157. 157. Java http://bit.ly/2rtCCBz
  158. 158. NodeJs, Python http://bit.ly/2rtCCBz
  159. 159. EMBRACE NODE.JS & PYTHON
  160. 160. what about type safety?
  161. 161. complexity ceiling of a Node.js app complexity
  162. 162. complexity ceiling of a Node.js app complexity referential transparency immutability as default type inference option types union types …
  163. 163. for managing complexity complexity ceiling of a Node.js app complexity referential transparency immutability as default type inference option types union types …
  164. 164. complexity ceiling of a Node.js app complexity complexity ceiling of a Node.js Lambda function
  165. 165. if you can limit the complexity of your solution, maybe you won’t need the tools for managing that complexity. me
  166. 166. AVOID COLDSTARTS
  167. 167. CloudWatch Event AWS Lambda
  168. 168. CloudWatch Event AWS Lambda ping ping ping ping
  169. 169. CloudWatch Event AWS Lambda ping ping ping ping
  170. 170. CloudWatch Event AWS Lambda ping ping ping ping HEALTH CHECKS?
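On the function side, the ping needs to be recognised and answered without running the real business logic. A small sketch, assuming the scheduled CloudWatch Event is configured with a constant JSON input of `{ "source": "warmup-ping" }` (a marker invented for this example):

```javascript
// Assumption: the scheduled rule sends { "source": "warmup-ping" } as its input.
const isWarmUpPing = (event) => Boolean(event) && event.source === 'warmup-ping';

// Wrap a handler so that pings return early and real events pass through.
const withWarmUp = (handler) => async (event, context) => {
  if (isWarmUpPing(event)) return 'pong';
  return handler(event, context);
};
```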
  171. 171. AVOID HARD ASSUMPTIONS ABOUT FUNCTION LIFETIME
  172. 172. USE STATE FOR OPTIMISATION
  173. 173. max 5 mins execution time
  174. 174. USE RECURSION FOR LONG RUNNING TASKS
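A sketch of the recursion idea: work through the items until the remaining execution time runs low, then hand the rest to a fresh invocation. `context.getRemainingTimeInMillis()` is provided by the Lambda Node.js runtime; `processItem`, `invokeSelf`, the event shape and the 10-second margin are assumptions made for illustration.

```javascript
// Stop and recurse when less than this much time is left (illustrative value).
const SAFETY_MARGIN_MS = 10000;

// `processItem` handles one unit of work. `invokeSelf` is a hypothetical
// function that asynchronously invokes this same Lambda with a new payload.
const makeRecursiveHandler = (processItem, invokeSelf) => async (event, context) => {
  const items = event.items;
  let position = event.position || 0;
  while (position < items.length) {
    if (context.getRemainingTimeInMillis() < SAFETY_MARGIN_MS) {
      // Not enough time left: pass the current position to the next invocation.
      await invokeSelf({ items, position });
      return { done: false, position };
    }
    await processItem(items[position]);
    position += 1;
  }
  return { done: true, position };
};
```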
  175. 175. CONSIDER PARTIAL FAILURES
  176. 176. “AWS Lambda polls your stream and invokes your Lambda function. Therefore, if a Lambda function fails, AWS Lambda attempts to process the erring batch of records until the time the data expires…” http://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html
  177. 177. should function fail on partial/any failures?
  178. 178. SNS Kinesis SQS after 3 attempts share processing logic events are processed in chronological order failed events are retried out of sequence
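One possible reading of this slide is: retry a failing record a few times inside the function, then park it on a fallback queue or topic so the rest of the batch is not blocked. The sketch below follows that reading and is not the speakers' implementation; `processRecord` and `sendToFallback` are hypothetical injected functions.

```javascript
const MAX_ATTEMPTS = 3; // "after 3 attempts" on the slide

// Process every record in the batch. Records that still fail after
// MAX_ATTEMPTS go to the fallback (for example an SNS topic or SQS queue)
// instead of failing the whole batch.
const processBatch = async (records, processRecord, sendToFallback) => {
  let failed = 0;
  for (const record of records) {
    let ok = false;
    for (let attempt = 1; attempt <= MAX_ATTEMPTS && !ok; attempt += 1) {
      try {
        await processRecord(record);
        ok = true;
      } catch (err) {
        // Swallow the error and try again until attempts run out.
      }
    }
    if (!ok) {
      await sendToFallback(record);
      failed += 1;
    }
  }
  return { processed: records.length - failed, failed };
};
```

The trade-off named on the slide still applies: records handed to the fallback are retried out of sequence relative to the stream.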
  179. 179. PROCESS SQS WITH RECURSIVE FUNCTIONS
  180. 180. http://bit.ly/2npomX6
  181. 181. AVOID HOT KINESIS STREAMS
  182. 182. “Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second.” http://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html
  183. 183. “If your stream has 100 active shards, there will be 100 Lambda functions running concurrently. Then, each Lambda function processes events on a shard in the order that they arrive.” http://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html
  184. 184. when no. of processors goes up…
  185. 185. ReadProvisionedThroughputExceeded can have too many Kinesis read operations…
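The arithmetic behind this can be sketched with the per-shard limit of 5 read transactions per second quoted earlier. The polling rate is an assumption here: the example uses one poll per second per subscribed function as an illustrative default, and the function names are made up.

```javascript
// Limit quoted from the Kinesis documentation: 5 reads per second per shard.
const MAX_READS_PER_SHARD_PER_SECOND = 5;

// Assumption: every subscribed function polls each shard `pollsPerSecond`
// times a second (1 is an illustrative default, not a documented guarantee).
const readsPerShardPerSecond = (subscribedFunctions, pollsPerSecond = 1) =>
  subscribedFunctions * pollsPerSecond;

// True when the combined polling would go over the per-shard read limit.
const exceedsReadLimit = (subscribedFunctions, pollsPerSecond = 1) =>
  readsPerShardPerSecond(subscribedFunctions, pollsPerSecond) >
  MAX_READS_PER_SHARD_PER_SECOND;
```

Under that assumption, a sixth function subscribed to the same stream is enough to start triggering throttled reads.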
  186. 186. ReadRecords.IteratorAge unpredictable spikes in read ‘latency’…
  187. 187. can kinda workaround…
  188. 188. http://bit.ly/2uv5LsH
  189. 189. clever, but costly
  190. 190. for subsystems that don’t have to be realtime, or are task-based (ie. order doesn’t matter), consider other triggers such as S3 or SNS. - me
  191. 191. @theburningmonk theburningmonk.com github.com/theburningmonk
  192. 192. API Gateway and Kinesis, Authentication & authorisation (IAM, Cognito), Testing, Running & Debugging functions locally, Log aggregation, Monitoring & Alerting, X-Ray, Correlation IDs, CI/CD, Performance and Cost optimisation, Error Handling, Configuration management, VPC, Security, Leading practices (API Gateway, Kinesis, Lambda), Step Functions, Serverless design patterns. https://bit.ly/aws-lambda-in-motion
  193. 193. API Gateway and Kinesis, Authentication & authorisation (IAM, Cognito), Testing, Running & Debugging functions locally, Log aggregation, Monitoring & Alerting, X-Ray, Correlation IDs, CI/CD, Performance and Cost optimisation, Error Handling, Configuration management, VPC, Security, Leading practices (API Gateway, Kinesis, Lambda), Step Functions, Serverless design patterns. get 40% off with: ytcui https://bit.ly/aws-lambda-in-motion
