How to build a social network on Serverless (AWS Community Summit)

Many people are building different workloads using serverless technologies these days, but what would a non-trivial system such as a social network look like on serverless?

In this talk Yan will discuss his journey of migrating a social network startup to serverless, and how his team was able to improve performance, scalability and feature delivery using serverless technologies.

Yan will discuss how serverless technologies such as Lambda are used to implement each part of their system, including search, push notifications, timeline, user recommendations, and business intelligence. If you're wondering how serverless can be used to solve a wide variety of challenges in your business, this is the talk for you.

Published in: Technology

How to build a social network on Serverless (AWS Community Summit)

  1. 1. MANCHESTER
  2. 2. How to build a social network with serverless AWS CREATIVE STUDIO | 2018
  3. 3. Yan Cui http://theburningmonk.com @theburningmonk Principal Engineer, Independent Consultant, Instructor, Advisor
  4. 4. “Netflix for sports” offices in London, Leeds, Katowice and Amsterdam
  5. 5. available in Austria, Switzerland, Germany, Japan, Italy, Spain, Canada and USA available on 30+ platforms
  6. 6. ~1,000,000 concurrent viewers
  7. 7. “Netflix for sports” offices in London, Leeds, Katowice and Amsterdam We’re hiring! Visit engineering.dazn.com to learn more. Follow @dazneng for updates about the engineering team.
  8. 8. apr, 2016
  9. 9. nov, 2016
  10. 10. WHY?
  11. 11. hey guys, vote on this post and I’ll announce a winner at 10PM tonight
  12. 12. 10PM traffic
  13. 13. 10PM traffic 70-100x
  14. 14. low utilisation to leave room for spikes; EC2 scaling is slow, so scale earlier
  15. 15. lots of $$$ for unused resources
  16. 16. up to 30 mins for deployment; deployment required downtime
  17. 17. features took months to develop
  18. 18. “lead time to someone saying thank you is the only reputation metric that matters.” - Dan North
  19. 19. WHY? to deliver better UX
  20. 20. WHY? to deliver better UX to deliver value faster
  21. 21. WHY? to deliver better UX to deliver value faster to be more cost efficient
  22. 22. WHY? to deliver better UX to deliver value faster to be more cost efficient HOW?
  23. 23. what would good look like for us?
  24. 24. deployments should be… small, fast, zero downtime, no lock-step
  25. 25. features should be… deployable independently, loosely-coupled
  26. 26. we want to… minimise cost for unused resources
  27. 27. we want to… minimise cost for unused resources minimise ops effort
  28. 28. we want to… minimise cost for unused resources minimise ops effort reduce tech mess
  29. 29. we want to… minimise cost for unused resources minimise ops effort reduce tech mess deliver visible improvements faster
  30. 30. WHY? to deliver better UX to deliver value faster to be more cost efficient HOW? microservices
  31. 31. WHY? to deliver better UX to deliver value faster to be more cost efficient HOW? microservices event-driven
  32. 32. WHY? to deliver better UX to deliver value faster to be more cost efficient HOW? microservices event-driven serverless
  33. 33. WHY? to deliver better UX to deliver value faster to be more cost efficient HOW? microservices event-driven serverless WHAT? this talk!
  34. 34. WHY? to deliver better UX to deliver value faster to be more cost efficient HOW? microservices event-driven serverless WHAT? this talk!
  35. 35. 170 Lambda functions in prod
  36. 36. 95% cost saving vs. EC2
  37. 37. 15x no. of prod releases per month
  38. 38. 15x no. of prod releases per month (features were sometimes implemented on the same day)
  39. 39. time is a good fit
  40. 40. 1st function in prod! time is a good fit
  41. 41. ? time is a good fit 1st function in prod!
  42. 42. CI/CD?
  43. 43. CI/CD? testing?
  44. 44. CI/CD? testing? logging, monitoring, alerting?
  45. 45. time is a good fit 1st function in prod! CI/CD, testing, logging, monitoring, alerting
  46. 46. 170 functions ? time is a good fit 1st function in prod! CI/CD, testing, logging, monitoring, alerting
  47. 47. tracing?
  48. 48. tracing? config management?
  49. 49. tracing? config management? security?
  50. 50. 170 functions time is a good fit 1st function in prod! CI/CD, testing, logging, monitoring, alerting tracing, config management, security
  51. 51. API Gateway and Kinesis; Authentication & authorisation (IAM, Cognito); Testing; Running & Debugging functions locally; Log aggregation; Monitoring & Alerting; X-Ray; Correlation IDs; CI/CD; Performance and Cost optimisation; Error Handling; Configuration management; VPC; Security; Leading practices (API Gateway, Kinesis, Lambda); Canary deployments. http://bit.ly/production-ready-serverless get 40% off with: ytcui
  52. 52. evolving the PLATFORM
  53. 53. Legacy Monolith Amazon Kinesis Step 1. ALL state changes!
  54. 54. events are an enabler for COMPOSABILITY
  55. 55. AWS LAMBDA is the...
  56. 56. Kinesis
  57. 57. Kinesis API Gateway AWS Lambda API Gateway AWS Lambda service-A service-B
  58. 58. Kinesis API Gateway AWS Lambda API Gateway AWS Lambda service-A service-B
  59. 59. Kinesis API Gateway AWS Lambda API Gateway AWS Lambda service-A service-B AWS Lambda AWS Lambda AWS Lambda
  60. 60. Kinesis API Gateway AWS Lambda API Gateway AWS Lambda service-A service-B AWS Lambda AWS Lambda AWS Lambda DynamoDB IoT
  61. 61. Kinesis API Gateway AWS Lambda API Gateway AWS Lambda service-A service-B AWS Lambda AWS Lambda AWS Lambda DynamoDB IoT
  62. 62. Kinesis API Gateway AWS Lambda API Gateway AWS Lambda service-A service-B AWS Lambda AWS Lambda AWS Lambda DynamoDB IoT AWS Lambda AWS Lambda
  63. 63. build loosely-coupled system through events
  64. 64. service A service B service C service D bounded context bounded context
  65. 65. service A service B service C service D bounded context bounded context
  66. 66. service A service B service C service D
  67. 67. there are no silver bullets
  68. 68. service A service B service C service D
  69. 69. service A service B service C service D
  70. 70. service A service B service C service D update!
  71. 71. service A service B service C service D backward-compatible? update!
  72. 72. bounded context DON’T use events to orchestrate workflows within the same bounded context
  73. 73. bounded context adds unnecessary complexity to logging, tracing, and end-to-end reporting
  74. 74. bounded context the workflow doesn’t exist as a standalone concept, but as the sum of a series of loosely connected parts
  75. 75. Step Functions use Step Functions instead
  76. 76. Step Functions don’t forget to emit events from the workflow
  77. 77. Step Functions so others can react to state changes that happened as part of the workflow
  78. 78. “how do I organize my functions into code repositories?”
  79. 79. monorepo?
  80. 80. github repo
  81. 81. https://lumigo.io/blog/mono-repo-vs-one-per-service/
  82. 82. monorepo !== monostack
  83. 83. one repo per service?
  84. 84. github repo github repo github repo github repo user-api timeline-api relationship-api search-api
  85. 85. CI/CD pipeline per service
  86. 86. functions are deployed together, as a stack
  87. 87. Strangler Pattern: incrementally migrate the legacy system by gradually moving pieces of functionality to the new system
  88. 88. rebuilt search
  89. 89. Legacy Monolith Amazon Kinesis AWS Lambda Amazon CloudSearch
  90. 90. Legacy Monolith Amazon Kinesis AWS Lambda Amazon CloudSearch Amazon API Gateway AWS Lambda
  91. 91. proxy requests from monolith to new service
  92. 92. new analytics pipeline
  93. 93. expensive ($3000/month) don’t understand our domain JS based query language
  94. 94. Legacy Monolith Amazon Kinesis AWS Lambda Google BigQuery
  95. 95. Legacy Monolith Amazon Kinesis AWS Lambda Google BigQuery 1 developer, 2 days from design to production (his 1st serverless project)
  96. 96. Legacy Monolith Amazon Kinesis AWS Lambda Google BigQuery “nothing ever got done this fast at Skype!” - Chris Twamley
  97. 97. “lead time to someone saying thank you is the only reputation metric that matters.” - Dan North
  98. 98. $3000/month $0.03/month
  99. 99. Kinesis sink
  100. 100. Kinesis Kinesis Firehose batch Kinesis events
  101. 101. Kinesis Kinesis Firehose S3 data lake
  102. 102. Kinesis Kinesis Firehose S3 Glue analyze data schema, catalog data into tables
  103. 103. Kinesis Kinesis Firehose S3 Athena Glue query engine
  104. 104. Kinesis Kinesis Firehose S3 Athena QuickSight Glue visualization, dashboards
  105. 105. Kinesis Kinesis Firehose S3 Athena QuickSight Glue no code is required!
  106. 106. Kinesis Kinesis Firehose S3 Athena QuickSight Glue no code is required! pay-per-use!
  107. 107. user action business intelligence
  108. 108. user action business intelligence
  109. 109. Problem didn’t work…
  110. 110. Problem didn’t work… over-engineered…
  111. 111. try to figure out what’s going on here…
  112. 112. Problem didn’t work… over-engineered… didn’t scale…
  113. 113. Rebuilt with Lambda
  114. 114. built-in retry and DLQ
  115. 115. built-in retry and DLQ; avoid repeating the expensive work of fetching millions of relationships
  116. 116. github repo timeline-api
       service: timeline-api
       provider:
         name: aws
         runtime: nodejs6.10
         stage: dev
         region: us-east-1
       functions:
         distribute-yubl: …
         undistribute-yubl: …
  117. 117. Problem didn’t work…
  118. 118. “it returns the first 30 users in the database, by creation time…”
  119. 119. Rebuilt with Lambda
  120. 120. BigQuery
  121. 121. BigQuery
  122. 122. grapheneDB BigQuery
  123. 123. grapheneDB BigQuery
  124. 124. grapheneDB BigQuery
  125. 125. grapheneDB BigQuery mostly built in one sleepless night…
  126. 126. Building a scalable notification system
  127. 127. expensive ($3000/month) don’t understand our domain JS based query language
  128. 128. all the analytics data is already in BigQuery powerful query engine
  129. 129. all the analytics data is already in BigQuery powerful query engine
  130. 130. Design Goals ad-hoc notifications
  131. 131. Design Goals ad-hoc notifications scheduled notifications
  132. 132. Design Goals ad-hoc notifications scheduled notifications A/B testing
  133. 133. Design Goals ad-hoc notifications scheduled notifications A/B testing scalable
  134. 134. Design Goals ad-hoc notifications scheduled notifications A/B testing scalable cost-effective
  135. 135. scheduled notifications
  136. 136. how to send notifications what to send
  137. 137. other processes can leverage this capability of sending notifications
  138. 138. why not SNS?
  139. 139. ad-hoc notifications
  140. 140. Oversight vs. Frictionless
  141. 141. Oversight vs. Frictionless don’t make life difficult for the marketing team
  142. 142. Oversight vs. Frictionless don’t make life difficult for the marketing team don’t let marketing team spam users
  143. 143. Oversight vs. Frictionless don’t make life difficult for the marketing team don’t let marketing team spam users driving usage/engagement maintaining user experience
  144. 144. Marketing works with BI on query; request approval from CPO/CTO; approver checks impact and tests message format; send notifications
  145. 145. more Scalable (and scales faster!)
  146. 146. Cheaper (don’t pay for idle servers)
  147. 147. Resilient (built-in redundancy and multi-AZ)
  148. 148. Secure
  149. 149. request blue-green deployment req/s auto-scaling us-east-1a us-east-1b us-east-1c multi-AZ
  150. 150. idea production greater Velocity from idea to product
  151. 151. WHY? to deliver better UX to deliver value faster to be more cost efficient
  152. 152. @theburningmonk theburningmonk.com github.com/theburningmonk
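
Slides 53 to 63 describe pushing every state change into Kinesis and letting Lambda functions subscribe to the stream. As a rough sketch of what one such subscriber might look like (this is not code from the talk), the JavaScript below decodes a batch of Kinesis records and routes each one by type. The `eventType` field, the `user-followed` event name and all helper names are assumptions made for illustration. The only detail taken from the Lambda/Kinesis integration itself is that each record's payload arrives base64-encoded under `Records[].kinesis.data`. The sketch also assumes a Node.js runtime with async/await, which is newer than the nodejs6.10 runtime shown on slide 116.

```javascript
'use strict';

// Decode the base64 payload of each Kinesis record delivered to Lambda.
// Assumes producers publish JSON objects with an `eventType` field
// (a hypothetical convention, not taken from the slides).
function parseKinesisEvent(event) {
  return event.Records.map((record) => {
    const json = Buffer.from(record.kinesis.data, 'base64').toString('utf8');
    return JSON.parse(json);
  });
}

// Build a Lambda handler that routes each domain event to the callbacks
// registered for its type. Events with no registered callbacks are skipped,
// so producers can publish new event types without breaking this consumer.
function createHandler(subscribers) {
  return async function handler(event) {
    const domainEvents = parseKinesisEvent(event);
    let handled = 0;
    for (const domainEvent of domainEvents) {
      const callbacks = subscribers[domainEvent.eventType] || [];
      for (const callback of callbacks) {
        await callback(domainEvent);
        handled += 1;
      }
    }
    return { received: domainEvents.length, handled };
  };
}

// Helper for local experiments: wrap plain objects in the same envelope
// that Lambda receives from a Kinesis stream.
function toKinesisEvent(payloads) {
  return {
    Records: payloads.map((payload) => ({
      kinesis: {
        data: Buffer.from(JSON.stringify(payload)).toString('base64'),
      },
    })),
  };
}

// Example wiring with a hypothetical event type.
const exampleHandler = createHandler({
  'user-followed': [
    async (e) => console.log(`follow: ${e.followerId} -> ${e.followeeId}`),
  ],
});
```

Calling `exampleHandler(toKinesisEvent([{ eventType: 'user-followed', followerId: 'a', followeeId: 'b' }]))` locally resolves with a count of received and handled events. Ignoring unknown event types is a deliberate choice in this sketch: it keeps consumers loosely coupled to producers, in the spirit of slide 63, though it does not by itself answer the backward-compatibility question raised on slide 71.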
