Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Techdays Finland 2019 - Adventures of building a (multi-tenant) PaaS on Microsoft Azure

441 views

Published on

Building a multi-tenant PaaS is not a walk in the part, certainly if the platform you are building on is constantly changing.

In this session I'll walk you through the adventure we've been on where you'll learn about the challenges we've had and how we approached them and whether or not our decisions worked out or not.

– How to design for scale
– How to operate the platform
– How to grow a platform mindset and force ownership
– How to run tests for your whole platform
– How to design for multi-tenancy
– How to approach constant change
– etc

Cloud projects are never finished so you'd better come prepared.

Published in: Technology
  • The Surprising Reason 11:11 Keeps Popping-Up: Free report reveals the Universe's secret "Sign Posts" that point the way to success, wealth and happiness. Claim your copy right now! ●●● http://ishbv.com/manifmagic/pdf
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Free Magical Healing Soundscape - Attract more abundance... clear negativity... reduce stress... sleep deeper... and bring astonishing miracles into your life... simply by listening for a few minutes each day. Get yours now. ■■■ http://ishbv.com/manifmagic/pdf
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Fact: Penis Enlargement CAN Work. Here's How. ■■■ https://bit.ly/30G1ZO1
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Techdays Finland 2019 - Adventures of building a (multi-tenant) PaaS on Microsoft Azure

  1. 1. Adventures of building a (multi-tenant) PaaS on Microsoft Azure Tom Kerkhove 1
  2. 2. Azure Architect Microsoft Azure MVP & Advisor Writes on blog.tomkerkhove.be Creator of Promitor.io Hi, I’m Tom Kerkhove 2 @TomKerkhove tomkerkhove
  3. 3. Agenda | Scalability | Tenancy | Monitoring | Shipping Features | Webhooks | Embrace Change February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 3
  4. 4. Disclaimer You’ll learn about my adventures & findings, not about silver bullets 4
  5. 5. Scale 5
  6. 6. Scale up/down February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 6 Scale | Easiest way of scaling is to get a bigger box | The only trade-off is that it means your app will be unavailable for a while | At some point you’ll run out of “bigger boxes”
  7. 7. Scale out / in February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 7 Scale | Provide multiple copies of your application based on your workload | No impact on your uptime, but more complex | My preferred way of scaling, but your application needs to be designed for it
  8. 8. Choose the right compute infrastructure February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 8 Functions Functions Container Instances Cloud Services Service Service Fabric Mesh App Service Fabric Kubernetes Cluster VM Scale Sets VMs Bare Metal | As control increases, so does complexity | Every service has it’s own characteristics | How you run your application | How you package your application | How you scale your application
  9. 9. Designing for scale February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 9 Scale Order Function Order Function Order Function Order Function Order Function Order Function Azure Functions
  10. 10. Designing for scale with serverless February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 10 Scale | The good | The service handles scaling for you | The bad | The service handles scaling for you | Does not provide a lot of awareness | The ugly | Dangerous to burn a lot of money
  11. 11. Designing for scale with PaaS February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 11 Scale Instance Orders Role Cloud Services Instance InstanceInstance InstanceInstance InstanceInstance Autoscaler Message Count > 1, Add instance Scale!
  12. 12. Designing for scale with PaaS February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 12 Scale | The good | You need to define how it scales | Provides you with scaling awareness | The bad | You need to define how it scales | Hard to determine the perfect scaling rules | The ugly | Be aware of “flapping” (http://bit.ly/monitor-autoscale-best-practices) | Be aware of infinite scaling loops Use an Azure Monitor Autoscale
  13. 13. 13 Scale Node 1 Cluster Node 2 Node 3 Pod Pod Pod Pod Pod Pod Pod Pod Pod Pod Pod Pod Pod Pod Pod Designing for scale with CPaaS Custom Metric Provider(s) Horizontal Pod Autoscaler(s) Cluster Autoscaler
  14. 14. Designing for scale with CPaaS February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 14 Scale | The good | Share resources across different teams | Serverless scaling capabilities are coming with Virtual Kubelet & Virtual Nodes | The bad | You are in charge of providing enough resources | With great power, comes great responsibilities | No autoscaling out-of-the-box | Scaling on different levels | The ugly | Takes a lot of effort to ramp up on how to scale | There’s a lot to manage
  15. 15. Create awareness around your autoscaling February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 15 Scale | Avoid burning money, get notified before it’s too late! | Gain insights in your autoscaling rules | Either configured by you or managed by Azure (ie. Azure Functions) | Learn from them and tweak them | Detect autoscaling loops in TEST instead of during live-site issue | Choose the approach that fits your needs | Configure Azure Monitor notifications | Use built-in metrics to visualize and alert on | Provide your own tooling around it
  16. 16. February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 16 Source: http://blog.tdwright.co.uk/2018/09/06/beware-runonstartup-in-azure-functions-a-serverless-horror-story/
  17. 17. Create awareness around your autoscaling February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 17 Scale
  18. 18. Tips February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 18 Scale | Resource consolidation pattern does not play nice with autoscaling | Configure maximum instance count for your autoscaling | Provide representable metrics of your remaining work | Azure Monitor Autoscale is a hidden gem in Azure, use it! | Does all the great things an autoscaler should do | Learn the scalability capabilities & trade-offs of your technologies | Kubernetes and Service Fabric are harder to scale, but more transparent than Functions | Provide budget alerts, if feasible
  19. 19. Tenancy 19
  20. 20. Multi-tenancy is all about choices 20 Tenancy | Multi-tenancy is more than data sharding | How will you deploy your application? | How much isolation does it require between tenants? | How much customization will we allow? | Do our tenants need access to their data? | Will it run in multiple regions? Will one region require multiple deployments? | What is our pricing model? Do we need to reflect this in our tenancy? | You will need to make big decisions up front, which can turn out to be bad ones – Be prepared!
  21. 21. Choosing a tenancy model 21 Tenancy App Stamp A Stamp B Stamp C Stamp D App Tenant A App Tenant B #n…#1 #n…#1BA Full isolation between tenants by deploying everything for every tenant Run a multi-tenant application, but use sharded data layer App deployed in multiple stamps & geographies with sharded data layer
  22. 22. Choosing a sharding strategy 22 Tenancy | Spread all your data across multiple smaller databases instead of one big one | Good example of scaling out to handle load | A shard key is used to determine the shard based on the chosen strategy: | Lookup, Range & Hash strategy | Choose your strategy wisely and think about your query patterns | Does your customer need access to it? Then you should shard per tenant! | You cannot easily change your strategy later on | More information: http://bit.ly/sharding-pattern
  23. 23. Locating shards February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 23 Tenancy Shard Manager How do I connect to “Sello”? Order Processor #5#4#3#2#1 #6 #7 #8 Use this connection string Get Secret “Sql-Tenant-Sello”
  24. 24. Using shard managers 24 Tenancy | Provide catalog of all shard in the platform | Determine current shard based on shard key & chosen approach | Metadata is stored in a store of choice | Be careful where you store your secrets | Choosing a good shard manager | Build your own, ie on top of Azure Key Vault | Use existing tool, ie Azure SQL Database Elastic Tools
  25. 25. Cost-efficient sharding 25 Tenancy #5#4#3#2#1 #6 #7 #8 Elastic Pool | In a PaaS world you need to pay for every data store instance you have. S1–5% S1–15% S1–5% S1–25% S1–50% S1–7% S1–21% S1–41% Use an Azure SQL Elastic Pools | We pay ~€200 for 160 DTU, but only use ~20 % of it
  26. 26. Cost-efficient sharding February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 26 Tenancy #5#4#3#2#1 #6 #7 #8 Elastic Pool | Enforce resource limitation on a per-shard level 7DTU 15DTU 8DTU 25DTU 24DTU 7DTU 40DTU 21DTU 93DTU I need more resources!!! We ran out, we only have 200 DTU 25DTU
  27. 27. Cost-efficient sharding February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 27 Tenancy #5#4#3#2#1 #6 #7 #8 Elastic Pool | Enforce resource limitation on a per-shard level 7DTU 15DTU 8DTU 25DTU 24DTU 7DTU 40DTU 21DTU 50DTU I need more resources!!! You’ve had enough 37DTU
  28. 28. | Provide multiple resource pools to reduce impact of noisy neighbors Cost-efficient sharding February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 28 Tenancy #5#4#3#2#1 #6 #7 #8 Resource Pool A Resource Pool B Resource Pool C 7DTU 15DTU 8DTU 24DTU 7DTU 21DTU 50DTU 37DTU 7DTU 15DTU 8DTU 24DTU 7DTU 37DTU
  29. 29. | Reflect your pricing model in your resource pooling Cost-efficient sharding February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 29 Tenancy #5#4#3#2#1 #6 #7 #8 Basic Pool 37DTU 24DTU 5DTU 51DTU 63DTU 77DTU 158DTU 121DTU Standard Pool Premium Pool Provision 250 Standard DTUs , capped at 100Provision 100 Basic DTUs , capped at 50 Provision 500 Premium DTUs, capped at 250
  30. 30. Cost-efficient sharding February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 30 | Consider moving all shards in a resource pool | Configure maximum consumption per database | Consider using multiple resource pools to reduce impact of noisy neighbor | Resource pools are a great way to reflect your pricing model | Monitor your pools as you would do for individual databases Tenancy
  31. 31. Determining tenants February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 31 | Determining the tenant that is consuming your service via your API gateway | Map the authentication key to the registered application and use its context Tenancy API Gateway API DB POST api/v1/orders X-API-Key ABC POST api/v1/orders X-Tenant Sello Shard Manager What shard is “Sello”? Bill Bracket owns ABC. Part of “Sello” group
  32. 32. Determining tenants February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure Tenancy <policies> <inbound> <base/> <set-header name="X-Tenant" exists-action="override"> <value> @(string.Join(";", (from item in context.User.Groups where item.Name.ToLower().Contains("sello -") select item.Name.Replace("Sello - ", String.Empty).Trim()))) </value> </set-header> </inbound> <backend> <base/> </backend> <outbound> <base/> </outbound> <on-error> <base/> </on-error> </policies>
  33. 33. Monitoring 33
  34. 34. Monitoring | Azure Monitor is your one-stop-shop for monitoring in Azure February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 34 Karl Ots gives a nice overview in his “Monitoring real-life Azure applications: When to use what and why” talk - https://www.youtube.com/watch?v=93-4soieZzI
  35. 35. Correlate your telemetry February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 35 Monitoring Frontend API Order Processor DB Get Products Create Order Operation ABC Operation DEF Session XYZ
  36. 36. Correlate your telemetry February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 36 Monitoring Frontend API Order Processor DB Get Products Create Order Operation ABC Operation DEF Session XYZ Cycle 123
  37. 37. Correlate your telemetry February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 37 | Use different layers of correlation ids & make sure the terminology is consistent | Ideally you have “session id”, “operation id” & “cycle id” | Always return your correlation ids to your consumers | Easier to troubleshoot in case of support cases | Correlated all your telemetry to provide a logical flow, not just traces Monitoring
  38. 38. Enriching your telemetry February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 38 | Provide app-specific information to all telemetry | Think about service name, instance name, tenant name, business-specific ids, … | Use telemetry initializers to automatically add information Monitoring
  39. 39. GDPR February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 39 | Never track personal identifiable information | Think email address, username, first & last name, etc | You can | Stop tracking any information | Obfuscate/hash all information by using a custom Telemetry Processor Monitoring
  40. 40. Application overview with App Map February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 40 | Great way to see all your dependencies, how their health is | Living documentation of your platform Monitoring
  41. 41. Health checks February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 41 | Report status of your application - Is it still healthy? Is it ready? | This can be used for liveness & readiness probes when autoscaling | Go as far as you want – Simple HTTP 200 or list of all dependencies | Be wary of performance impact | Always provide throttling to block noisy consumers | Think about your connection management | No direct business value, until it’s too late Monitoring
  42. 42. Health checks February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 42 Monitoring
  43. 43. Availability monitoring February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 43 | Good way to monitor availability and latency across the world | Configure alerts when availability drops | Health checks are a good fit Monitoring
  44. 44. Handling alerts February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 44 | Always automate alert creation, they are part of your infrastructure as well | Build a centralized alert handling process | Azure Logic Apps is a good fit for this | Unless an alert needs different alert handling | Different alerts have different contracts | Use adapters to receive notifications | Map to internal metric contract | Handle via centralized alert handler Monitoring Azure Monitor Classic Alert Adapter Azure Monitor Metrics Alert Adapter Centralized Alert Handler ... Time to move it! Azure Classic Alerts will be deprecated by June 2019. https://docs.microsoft.com/en-gb/azure/azure-monitor/platform/monitoring-classic-retirement
  45. 45. Write Root Cause Analysis (RCA) 45 | Train your team for PROD outages, write RCAs in all environments | Provides a structured way of analysing your platform | Acts as a knowledge transfer to customers & team | Not only what happened and how it was mitigated, also how you analysed it | Use them to detect recurring issues | Define action points and follow-up on them | How could this have been prevented? | Did we have enough telemetry & alerts? | Who was impacted? Do we need to send out communications? Growing a platform mindset
  46. 46. Shipping Features 46
  47. 47. Automate everything 47 | Infrastructure-as-code is a must | Every aspect of your infrastructure should be automatically provisioned; including alerts, scaling rules, etc. | Living documentation of your infrastructure | Everything should be in source control, reviewed and versioned; it is part of your platform. | Manual interventions are evil, but not unavoidable | Use “Deployment Notes” to track these and make it part of your release process | Tenant onboarding should be automated from the start | Provide APIs to manage your platform | We all know “We’ll do it later” means “This never going to happen” Shipping Features
  48. 48. Use Task Groups 48 Shipping Features
  49. 49. Deployment rings 49 | A way of releasing to production environments in “rings”. | Allows you to have early adopters before rolling out to the masses | More information: bit.ly/deployment-rings Shipping Features
  50. 50. Deployment rings 50 Shipping Features App Stamp A Stamp B Stamp C Stamp D App Tenant A App Tenant B #n…#1 #n…#1BA Full isolation between tenants by deploying everything for every tenant Run a multi-tenant application, but use sharded data layer App deployed in multiple stamps & geographies with sharded data layer
  51. 51. Deployment rings 51 | What is defined as a “Ring” is up to you | This can be per tenant, a group of tenant of category ie. early adaptors | Depends on your tenancy-approach as well Shipping Features
  52. 52. Tracking releases in Application Insights 52 | Annotate metrics with new releases Shipping Features
  53. 53. Tracking releases in Application Insights 53 | Free extension available in Marketplace - http://bit.ly/ai-release-annotations Shipping Features
  54. 54. Verifying deployments February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 54 | Use Deployment Gates to verify deployments | Checking for Azure Alerts kicking off is a must Shipping Features
  55. 55. Verifying deployments 55 | Use Deployment Gates to verify deployments | Re-use your health checks and call via REST Shipping Features
  56. 56. Verifying deployments 56 | Swap deployment slots when instance is warmed up | Careful with slots, resources are shared which can impact production slot Shipping Features
  57. 57. Webhooks 57
  58. 58. Consuming webhooks February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 58 | Generated URLs are evil, provide good DNS names of your services | And this goes for everything, not only webhooks Webhooks API (12345.provider.com) 3rd Party POST https://12345.provider.com/api/webhooks Where did 123456 go?! API (67890.provider.com) Use an API Gateway
  59. 59. Consuming webhooks February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 59 | Do not reduce your API security because of your 3rd Party Webhooks API (12345.provider.com) 3rd Party POST http://12345.provider.com/api/webhooks Where is your cert? Use an API Gateway
  60. 60. Consuming webhooks February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 60 | Always route webhooks through an API gateway | This decouples the webhook from your internal architecture Webhooks API Gateway API 3rd Party POST http://sello.com/api/v1/webhooks POST https://sello.provider.com/api/v1/webhooks Authentication: Certificate
  61. 61. Consuming webhooks February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 61 | Always route webhooks through an API gateway | This decouples the webhook from your internal architecture Webhooks API Gateway Orders API 3rd Party POST http://sello.com/api/v1/webhooks POST https://orders.sello.provider.com/api/v1/webhooks Authentication: Certificate Stock APISome gateways support auth key via query parameters
  62. 62. Provide user-friendly webhooks February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 62 | Provide a way for consumers to provide context during registration | This context will be delivered along with the effective payload | Allows them | Provide a self-service CRUD API to register new subscriptions | Registering is one thing, opting-out or getting an overview is another story | This can be important for migration reasons or in case of bugs causing bad subscriptions | Pass your correlation id via Request-Id header | Provide an invocation history | This helps consumers troubleshoot why events were not delivered | Allow them to search on correlation id Webhooks
  63. 63. Spaghetti infrastructure 2.0? February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 63 Webhooks Warehouse Service Order Service Stock Service Shipping Service Invoice Service Payment Service
  64. 64. Spaghetti infrastructure 2.0? February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 64 | Long-term this can start to become a burden | A lot of bookkeeping to know who to update, how we should authenticate, etc | No central place to route all webhooks through | Your platform needs to be robust | What if subscriber II is not responding? Let’s build a retry mechanism! | Who says subscriber II owns foo.bar.com? | Webhooks should use a “I don’t care, here’s an update” approach Webhooks
  65. 65. Event Routers February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 65 Webhooks Warehouse Service Order Service Stock Service Shipping Service Invoice Service Payment Service Event Router
  66. 66. Event Routers February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 66 | Event Routers do all the heavy lifting for you | Provide a centralized hub for all things events | Bookkeeping of whom subscribes to what webhooks and events | They will retry sending events when they did not reply | They will perform webhook validations | Publishers can publish events to the event router and takes it from there | Great for for internal usage, but harder to use with 3rd parties | Webhook validation is not always easy to setup Webhooks
  67. 67. Tips February 2019 Adventures of building a (multi-tenant) PaaS on Microsoft Azure 67 | Webhooks are not durable, if you are not around you will miss it. | If you need to ensure at-least-once delivery, consider using a broker instead | Store audit entry of webhook that are being pushed | Can be important in case of a dispute | Optionally even include the response of the consumer | Don’t only allow global registrations, consider serving more granular updates | For example, I want updates of one flight instead of all flights | Provide rate limiting on your webhook endpoints | Don’t let your platform go down by your 3rd party provider | Webhooks are contracts as well | Provide good documentation and version them Webhooks
  68. 68. Embrace Change 68
  69. 69. How we used to ship 69 Embrace Change
  70. 70. How we used to ship 70 Embrace Change | Major releases were done every 1-3 years | Feature requests were in the works for long time | Not easy to shift product focus based on market change | Agile to the rescue! | Move faster and learn from your users more quickly | New features are rolled out ever sprint, week, day | Easier to make decisions
  71. 71. We live in a world of constant change 71 Embrace Change | Our underlying infrastructure is constantly moving & changing | Cloud vendors are competing to offer unique services | Get it out early and see if customers want this; if not, kill it | Staying up to date is a lot harder
  72. 72. The lifecycle of a service 72 Embrace Change Private Preview • Rough version of product • Shared under NDA to limited group Public Preview • Available to the masses General Available • Covered by SLA • Supported version The End • Deprecation • Silent Sunsetting • Reincarnation in 2.0
  73. 73. The end of the road 73 Embrace Change | Official deprecation | Officially announced as deprecated | Migration is required before service shutdown | Reincarnation | A new version of the service arises in a new version | Can be part of service or service in total | Silent deprecation | No further development in the product | Service is still running smoothly | Does not mean you should stop using it Azure Access Control Service, Azure Scheduler Azure Data Factory (Service) Azure Functions (Host) Can also be tooling cfr. Azure DevOps Cloud- based Loadtesting
  74. 74. Choosing an Alternative 74 Embrace Change | Be careful with the latest shiny technology | “This new technology fixes our problem!” and potentially gives you a lot of new ones | Make sure it’s already stable | Questions you should ask yourself | Is it operable? | What is the learning curve? Is it worthwhile? | Does it have a vendor lock-in? | Build or buy | There is no silver bullet
  75. 75. Cloud platforms are never finished 75 Embrace Change | There is no such thing as a cloud project, use a product mindset | Nothing is written in stone, dare to revise decisions you’ve made | Prepare for your migrations | Make sure you have enough time | Create a small POC to see if the technology is the right fit | Change is coming, so you’d better be prepared
  76. 76. Conclusion 76
  77. 77. Conclusion | Learn the scalability capabilities & trade-offs of your technologies | Provide user-friendly webhooks & route them via API gateways as a consumer | Define and roll out a good monitoring strategy, and make sure you have enough information | Automate everything, it will save you one day | We live in a world of constant change, so be prepared | You build it, you run it
  78. 78. 78

×