4. What is scalability?
• We can’t predict the load
• Scaling is the process of managing your resources to help your application meet a set of
performance requirements
• Ability to handle increased load
• Add resources without modifying the system
5. What is scaling up?
• Scaling up is the process where you increase the capacity of a given instance
• Scaling down is the process where you decrease the capacity of a given instance
• Application does not have to be designed for scalability
• Easy to implement
• Costly
• Not linear performance growth
• Restart or short interruption of the resources
6. What is scaling out or in?
• Scaling out is the process of adding more instances to support the load of your solution
• Scaling in is the process of removing instances
• Is an elastic process
• Isn’t a magic fix
• Application has to be designed for horizontal scalability
• Requires more investment to implement
• Introduces additional complexity
• Nearly linear performance increase
• Application issues: File access, Session State, Shared resources, Bottlenecks (database)
7. Autoscale
• A primary advantage of the cloud is elastic scaling.
• Many Microsoft Azure services provide the capability to scale both manually and automatically
• Autoscale refers to the capability of many of these services to monitor the application instances and
automatically scale appropriately to handle the current usage of the application
• Scale based on:
• Metrics
• Schedules
• Consider startup time
• Handle state (to Azure Cache for Redis or SQL Database)
• Serverless – infrastructure isn’t your responsibility; scaling is handled automatically
8. Application Gateway
• OWASP Protection
• URL-based routing
• Application Gateway Ingress Controller
• Scaling:
• Autoscaling
• Manual (1-125 instances)
• Creating a new instance can take some time (around six or seven minutes)
• Scaling does not cause downtime
9. App Service
• Scale out (horizontal scaling) – increases number of VM instances depending on pricing tier
• Manually
• Automatic
• Scale up (vertical scaling)
• Create own App Service Plan for the apps that need scaling
10. Autoscale metrics
Metric | Metric identifier | Description
CPU | CpuPercentage | The average amount of CPU time used across all instances of the plan
Memory | MemoryPercentage | The average amount of memory used across all instances of the plan
Data in | BytesReceived | The average incoming bandwidth used across all instances of the plan
Data out | BytesSent | The average outgoing bandwidth used across all instances of the plan
HTTP queue | HttpQueueLength | The average number of HTTP requests that had to sit in the queue before being fulfilled. A high or increasing HTTP queue length is a symptom of a plan under heavy load.
Disk queue | DiskQueueLength | The average number of both read and write requests that were queued on storage. A high disk queue length is an indication of an application that might be slowing down due to excessive disk I/O.
11. Autoscale patterns
Scale based on CPU
Scale differently on weekdays vs. weekends
Scale differently during holidays
Scale based on custom metric
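The schedule-based patterns above (weekdays vs. weekends, holidays) can be sketched as a profile picker. This is an illustrative model only, not the Azure Monitor API; the profile names, bounds, and holiday dates are all made up for the example.

```python
from datetime import date

# Hypothetical autoscale profiles: names and instance bounds are illustrative.
PROFILES = {
    "weekday": {"min": 3, "max": 10},
    "weekend": {"min": 1, "max": 4},
    "holiday": {"min": 6, "max": 20},
}

# Example fixed holiday dates (placeholders).
HOLIDAYS = {date(2024, 12, 25), date(2024, 1, 1)}

def pick_profile(day: date) -> str:
    """Mimic profile selection: holiday profiles win, then weekend, then weekday."""
    if day in HOLIDAYS:
        return "holiday"
    if day.weekday() >= 5:  # Saturday=5, Sunday=6
        return "weekend"
    return "weekday"

print(pick_profile(date(2024, 12, 25)))  # holiday
print(pick_profile(date(2024, 6, 8)))    # a Saturday -> weekend
```

A real autoscale setting expresses the same idea declaratively: one profile per schedule, each with its own rules and instance bounds.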
12. Autoscale concepts
• Each resource can have one autoscale setting:
• Autoscale settings can have one-to-many profiles
• Profiles can have one-to-many rules
• Autoscale increases instances horizontally within bounds:
• Bounds are set by using the minimum, maximum, and default values
• Thresholds are calculated at an instance level
• Autoscale successful actions and failures are logged to the Activity Log
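The one-setting → many-profiles → many-rules shape, with horizontal scaling constrained by minimum and maximum bounds, can be modeled in a few lines. All names here are invented for the sketch; this is not how the Azure Monitor autoscale engine is implemented.

```python
# Illustrative model of autoscale rule evaluation within bounds.

def evaluate(rules, metrics, current, minimum, maximum):
    """Apply scale-out/scale-in rules, then clamp to the profile bounds."""
    target = current
    for rule in rules:
        value = metrics[rule["metric"]]
        if rule["direction"] == "out" and value > rule["threshold"]:
            target += rule["step"]
        elif rule["direction"] == "in" and value < rule["threshold"]:
            target -= rule["step"]
    return max(minimum, min(maximum, target))  # bounds are inclusive

# A profile with a paired scale-out / scale-in rule combination.
rules = [
    {"metric": "cpu", "direction": "out", "threshold": 80, "step": 1},
    {"metric": "cpu", "direction": "in",  "threshold": 30, "step": 1},
]

print(evaluate(rules, {"cpu": 90}, current=2, minimum=1, maximum=5))  # scale out -> 3
print(evaluate(rules, {"cpu": 20}, current=1, minimum=1, maximum=5))  # clamped at min -> 1
```

Note how the clamp at the end is what "autoscale increases instances horizontally within bounds" means in practice.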
13. Autoscale thresholds
Scale is constrained to a minimum and maximum:
◦ Your current instance count must be between the minimum and maximum:
◦ Minimum can help guarantee availability
◦ Maximum can help control costs
14. Best practices
• Ensure that the maximum and minimum values are different and have an adequate margin
between them
• Manual scaling is reset by autoscale min and max
• Always use a scale-out and scale-in rule combination that performs an increase and decrease
• Choose the appropriate statistic for your diagnostics metric
• Choose the thresholds carefully for all metric types
15. Azure Functions
• Consumption plan
• Scales automatically
• Apps may scale to zero when idle
• Scales CPU and memory
• Max instances
• Windows 200
• Linux 100
• Premium plan
• Scales automatically – no delay (pre-warmed workers)
• Max instances
• Windows 100
• Linux 40
• Dedicated plan
• Requires predictive scaling
16. Azure Functions scalability best practices
• Share and manage connections
• Avoid sharing storage accounts between function apps
• Don't host production code in shared function app
• Use async code but avoid blocking calls
• Use multiple worker processes
• Configure host behaviors to better handle concurrency
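The "share and manage connections" guidance can be sketched as follows: create the client once at module scope so every invocation reuses it, instead of opening a new connection per call. `FakeHttpClient` is a stand-in for a real SDK client (e.g., an HTTP or database client), used here so the sketch runs without any Azure dependencies.

```python
# Sketch of connection sharing in a function app. FakeHttpClient is a
# hypothetical stand-in that counts how many clients were constructed.

class FakeHttpClient:
    instances_created = 0

    def __init__(self):
        FakeHttpClient.instances_created += 1

    def get(self, url):
        return f"response from {url}"

client = FakeHttpClient()  # module scope: shared across all invocations

def handler(url):
    # Reuses the shared client; avoids socket exhaustion under scale-out.
    return client.get(url)

def bad_handler(url):
    # Anti-pattern: a new client (and connection pool) on every invocation.
    return FakeHttpClient().get(url)

for _ in range(3):
    handler("https://example.invalid/api")
print(FakeHttpClient.instances_created)  # still 1 for the shared client
```

With a real SDK client the same pattern applies: the object holds a connection pool, so per-invocation construction multiplies open sockets as the app scales out.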
17. API Management
• Policies
• API documentation
• Rate limiting
• Health monitoring
• Modern formats like JSON
• Security
• Scale (manual) process can take from 15 to 45 minutes to apply
• Support autoscale, based on capacity metric (takes at least 20 minutes)
• API Management service in the Consumption tier scales automatically based on the traffic
• Add caching to improve performance (reduce latency for API callers and backend load)
• If you're scaling from or to the Developer tier, there will be downtime; otherwise there is no downtime
[Diagram: APIM fronting both modern and legacy APIs]
18. Azure Static Web App
• Globally distributed content
• Integration with serverless APIs powered by Azure Functions
• Access to a variety of authentication providers
• First-class GitHub and Azure DevOps integration
• Free SSL certificates, which are automatically renewed
• Add managed Azure Front Door to reduce latency
19. Azure Service Bus
• Decouple services for greater scalability and reliability
• Supports larger message sizes of 256 KB (standard tier) or 100 MB (premium tier) per message
• Supports both at-most-once and at-least-once delivery
• Guarantees first-in, first-out (FIFO) order
• Supports role-based security
◦ Premium tier supports manual and autoscaling
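The decoupling idea above can be sketched with an in-process queue: the sender and receiver never talk directly, so receivers can be scaled out without changing the sender. Python's `queue.Queue` stands in for a Service Bus queue here; it also shows the FIFO ordering the slide mentions.

```python
import queue

bus = queue.Queue()  # stand-in for an Azure Service Bus queue

def send(message):
    bus.put(message)   # the producer just enqueues and moves on

def receive():
    return bus.get()   # each message is handled by exactly one receiver

for i in range(3):
    send(f"order-{i}")

received = [receive() for _ in range(3)]
print(received)  # FIFO: messages come out in the order they went in
```

Because the queue absorbs bursts, the destination component can be scaled in or out independently of the load the sender generates.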
20. Storage account
• Depending on the region, maximum ingress (general-purpose v2) can vary from 30 to 60 Gbps
• Default maximum request rate per storage account – 20,000 requests per second
• If your application must exceed one of the scalability targets, then create multiple storage
accounts
• Blob type will affect the performance and scalability of your solution
• Connect with Azure CDN
• CDN can typically support much higher egress limits than a single storage account
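The "create multiple storage accounts" guidance can be sketched as hash-based distribution: spread blobs across N accounts with a stable hash so each account stays under its per-account request-rate target. The account names below are hypothetical.

```python
import hashlib

# Hypothetical storage account names used only for the sketch.
ACCOUNTS = ["appdata001", "appdata002", "appdata003"]

def account_for(blob_name: str) -> str:
    """Deterministically map a blob to one of the accounts."""
    digest = hashlib.sha256(blob_name.encode()).hexdigest()
    return ACCOUNTS[int(digest, 16) % len(ACCOUNTS)]

# The same blob name always routes to the same account.
print(account_for("images/logo.png") == account_for("images/logo.png"))  # True
```

A stable hash matters: if routing were random, reads could not find the account a blob was written to.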
21. Azure SQL Database
• Scale on the fly with minimal downtime
• Elastic pools can only be scaled manually
• Expect a short connection break when the scale-up/scale-down operation completes
• Database sharding - split your data into several databases and scale them independently
• SQL Managed Instance doesn’t support serverless mode
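Database sharding as described above can be sketched as a range-based shard map: each tenant id routes to the database holding its range, and each shard can then be scaled independently. The shard names and ranges are placeholders; a real app would typically use the Elastic Database client library for this.

```python
# Hypothetical shard map: (low tenant id, high tenant id, database name).
SHARD_MAP = [
    (0,    999,  "sqldb-shard-0"),
    (1000, 1999, "sqldb-shard-1"),
    (2000, 2999, "sqldb-shard-2"),
]

def shard_for(tenant_id: int) -> str:
    """Route a tenant to the shard whose range contains its id."""
    for low, high, db in SHARD_MAP:
        if low <= tenant_id <= high:
            return db
    raise KeyError(f"no shard covers tenant {tenant_id}")

print(shard_for(42))    # sqldb-shard-0
print(shard_for(1500))  # sqldb-shard-1
```

Range-based maps make it easy to split a hot range into a new shard later without rehashing every tenant.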
22. Azure PostgreSQL
• Scale separately vCores and storage
• The number of vCores can be scaled up or down – restarts the server
• Storage can only be scaled up, not down
• Perform scale operations during non-peak hours
23. Azure Cosmos DB
• Scale the throughput (RU/s) manually or autoscale
• When storage grows, autoscale increases the maximum RU/s accordingly
• Logical and physical partition
• Choose good partition key
[Diagram: Azure Cosmos DB multi-model support – document, key-value, column family, graph, Table API, and MongoDB – with turnkey global distribution, elastic scale-out of storage and throughput, guaranteed low latency at the 99th percentile, comprehensive SLAs, and five well-defined consistency models]
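The autoscale throughput behavior can be sketched numerically. As I understand it, Cosmos DB autoscale varies RU/s between roughly 10% of the configured maximum and the maximum itself; treat the exact band and billing rules as an assumption and check the current service documentation.

```python
# Sketch of Cosmos DB autoscale throughput (band assumed to be 10%-100%
# of the configured maximum; verify against current service docs).

def autoscale_range(max_rus: int) -> tuple:
    """Return the assumed (floor, ceiling) RU/s band for a given maximum."""
    return (max_rus // 10, max_rus)

def effective_rus(observed: int, max_rus: int) -> int:
    """Clamp observed usage into the autoscale band."""
    low, high = autoscale_range(max_rus)
    return min(max(observed, low), high)

print(autoscale_range(4000))    # (400, 4000)
print(effective_rus(150, 4000))   # below the floor -> 400
print(effective_rus(2500, 4000))  # within the band -> 2500
```

This is why the slide's partition-key advice matters: throughput is divided across physical partitions, so a skewed key can starve hot partitions even when total RU/s looks sufficient.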
24. Optimize performance
• Create a read-replica of the database
• Use a globally distributed database such as Azure Cosmos DB
• Use a CDN to cache content close to users
[Diagram: a CDN serves a cached image to the client in about 40 milliseconds, versus about 120 milliseconds when fetched from the origin server]
25. Azure Cache for Redis
• Fully Managed Service
• High Performance
• Built-in Reliability
• Choose an appropriate tier
• Flexible Scaling
• Scaling takes approximately 20 minutes
• Caches remain available during the scaling operation
• You can't scale from a higher pricing tier to a lower pricing tier
26. Containers
• Containers are lightweight and well suited to scale-out scenarios
• Azure Kubernetes Service (AKS) offers two levels of autoscale:
• Horizontal autoscale: Can be enabled on service containers to add more or fewer pod
instances within the cluster
• Cluster autoscale: Can be enabled on the agent VM instances running in a node pool to
dynamically add or remove VM instances
• Azure Container Instances
27. Application Insights
• Use Application Insights Profiler
• Performance visualization and details
• Smart detection automatically warns about potential problems
• Response time degradation
• Dependency duration degradation
• Slow performance pattern
28. Tips and tricks
• Autoscale
• Move load to blobs and CDN
• Use Redis for caching
• Load test and measure performance
• Use Async in code
• Use queues
• Database Sharding and Partitioning
https://learn.microsoft.com/en-us/training/modules/azure-well-architected-performance-efficiency/2-scaling-up-and-scaling-out
Scaling is the process of managing your resources to help your application meet a set of performance requirements
Virtual machine scale set – automated scaling
Azure SQL Database – sharding
Azure App Service – automated scaling
When exposing a Web Application towards the Internet, always place an Application Gateway or Azure Front Door in front of the Web App.
The Web Application Firewall (WAF) must be enabled on the Application Gateway whenever using public endpoints for web applications.
The WAF must be configured to use the latest OWASP Core Rule Set.
The WAF must be configured to “detect and block”, the so-called “prevention mode”.
It is acceptable to have the WAF configured in “detect and log” for fine-tuning purposes and for investigating issues for a temporary period. Once the aforementioned activities have concluded, “detect and block” must be activated again.
Always ensure that traffic to the backend systems is re-encrypted, once it has been terminated by the Application Gateway to ensure end-to-end encryption.
Transform http traffic to https using redirection
Application Gateways should have both private and public IP addresses
Enable Application Insights
Enable diagnostic settings
Azure Monitor autoscaling allows you to scale the number of running instances up or down, based on telemetry data (metrics).
By default, Resource Manager-based virtual machines (VMs) and virtual machine scale sets (VMSSs) emit basic (host-level) metrics. In addition, when you configure diagnostics data collection for an Azure VM and VMSS, the Azure diagnostic extension also emits guest-OS performance counters (commonly known as guest-OS metrics). You use all these metrics in autoscale rules.
You can also perform autoscale based on common web server metrics such as the HTTP queue length. Its metric name is HttpQueueLength. This table lists available server farm (Web Apps) metrics.
You can scale by Storage queue length, which is the number of messages in the storage queue. Storage queue length is a special metric and the threshold is the number of messages per instance. For example, if there are two instances and if the threshold is set to 100, scaling occurs when the total number of messages in the queue is 200. That can be 100 messages per instance, 120 and 80, or any other combination that adds up to 200 or more.
You can scale by Service Bus queue length, which is the number of messages in the Service Bus queue. Service Bus queue length is a special metric and the threshold is the number of messages per instance. For example, if there are two instances and if the threshold is set to 100, scaling occurs when the total number of messages in the queue is 200. That can be 100 messages per instance, 120 and 80, or any other combination that adds up to 200 or more.
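The worked example above (two instances, threshold 100, scaling at 200 total messages) reduces to one comparison: for queue-length metrics the threshold is per instance, so scale-out triggers when total messages reach the threshold multiplied by the current instance count. A minimal sketch:

```python
# Per-instance threshold check for Storage/Service Bus queue-length metrics.

def should_scale_out(total_messages: int, instances: int, threshold: int) -> bool:
    """Scale out when messages-per-instance meets the configured threshold."""
    return total_messages >= threshold * instances

print(should_scale_out(200, instances=2, threshold=100))  # True  (100 per instance)
print(should_scale_out(180, instances=2, threshold=100))  # False (90 per instance)
```

Any split of the 200 messages across the two instances (100/100, 120/80, and so on) produces the same result, since only the total is compared.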
Azure Monitor Autoscale applies only to Virtual Machine Scale Sets, Cloud Services, App Service - Web Apps, and API Management services.
A resource can have only one autoscale setting.
An autoscale setting can have one or more profiles, and each profile can have one or more autoscale rules.
An autoscale setting scales instances horizontally, which is moved out by increasing the instances and in by decreasing the number of instances. An autoscale setting has a maximum, minimum, and default value of instances.
An autoscale job always reads the associated metric to scale by checking if it has crossed the configured threshold for scale-out or scale-in.
All thresholds are calculated at an instance level. For example, "scale out by one instance when average CPU > 80% when instance count is 2", means scale-out when the average CPU across all instances is greater than 80%.
All autoscale failures are logged to the Activity Log. You can then configure an activity log alert so that you can be notified via email, SMS, or webhooks whenever there is an autoscale failure.
Similarly, all successful scale actions are posted to the Activity Log. You can then configure an activity log alert so that you can be notified via email, SMS, or webhooks whenever there is a successful autoscale action. You can also configure email or webhook notifications to get notified for successful scale actions via the notifications tab on the autoscale setting.
With autoscale, you can add the right amount of resources to handle increased load on your application. It also helps you save money by removing idle resources.
You specify a minimum and maximum number of instances to run, and the system will automatically add or remove VMs based on a set of rules.
Having a minimum ensures that an application is always running, even under no load.
Having a maximum limits your total possible hourly cost.
Ensure that the maximum and minimum values are different and have an adequate margin between them
If you have a setting that has minimum=2, maximum=2 and the current instance count is 2, no scale action can occur. Keep an adequate margin between the maximum and minimum instance counts, which are inclusive. Autoscale always scales between these limits.
Manual scaling is reset by autoscale min and max
If you manually update the instance count to a value above or below the maximum, the autoscale engine automatically scales back to the minimum (if below) or the maximum (if above). For example, you set the range between 3 and 6. If you have one running instance, the autoscale engine scales to three instances on its next run. Likewise, if you manually set the scale to eight instances, on the next run autoscale will scale it back to six instances. Manual scaling is temporary unless you also reset the autoscale rules.
Always use a scale-out and scale-in rule combination that performs an increase and decrease
If you use only one part of the combination, autoscale will only take action in a single direction (scale out, or in) until it reaches the maximum, or minimum instance counts defined in the profile. This is not optimal because ideally, you want your resource to scale up at times of high usage to ensure availability. Similarly, at times of low usage you want your resource to scale down, so you can realize cost savings.
Choose the appropriate statistic for your diagnostics metric
For diagnostics metrics, you can choose among Average, Minimum, Maximum and Total as a metric to scale by. The most common statistic is Average.
Choose the thresholds carefully for all metric types
We recommend carefully choosing different thresholds for scale-out and scale-in based on practical situations.
We do not recommend autoscale settings like the examples below with the same or very similar threshold values for out and in conditions:
Increase instances by 1 count when Thread Count >= 600
Decrease instances by 1 count when Thread Count <= 600
Scale-in estimation is intended to avoid "flapping" situations, where scale-in and scale-out actions continually go back and forth. Keep this behavior in mind when you choose the same thresholds for scale-out and scale-in.
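The flapping hazard can be shown with a small simulation (an illustrative model, not the real autoscale engine). With 1,200 total threads and identical out/in thresholds of 600 per instance, the instance count oscillates: 2 instances see 600 each and scale out, 3 instances see 400 each and scale in, forever.

```python
# Simulation of "flapping" when scale-out and scale-in share one threshold.

def next_count(count, total_threads, out_at=600, in_at=600):
    """One autoscale evaluation using the per-instance metric value."""
    value = total_threads / count
    if value >= out_at:
        return count + 1              # scale out
    if value <= in_at:
        return max(1, count - 1)      # scale in
    return count

counts = [2]
for _ in range(4):                    # workload steady at 1200 total threads
    counts.append(next_count(counts[-1], total_threads=1200))
print(counts)  # [2, 3, 2, 3, 2] - scale-out and scale-in ping-pong
```

Separating the thresholds (for example, out at 700 and in at 400) leaves a dead band where the count stays put, which is exactly what the best practice above recommends.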
Considerations for scaling threshold values for special metrics
For special metrics such as Storage or Service Bus Queue length metric, the threshold is the average number of messages available per current number of instances. Carefully choose the threshold value for this metric.
API documentation. Documentation of APIs enables calling clients to quickly integrate their solutions. API Management allows you to quickly expose the structure of your API to calling clients through modern standards like Open API. You can have more than one version of an API. With multiple versions, you can stage app updates as your consuming apps don't have to use the new version straight away.
Rate limiting access. If your API could potentially access a large amount of data, it's a good idea to limit the rate at which clients can request data. Rate limiting helps maintain optimal response times for every client. API Management lets you set rate limits as a whole or for specific individual clients.
Health monitoring. APIs are consumed by remote clients. So it can be difficult to identify potential problems or errors. API Management lets you view error responses and log files, and filter by types of responses.
Modern formats like JSON. APIs have used many different data exchange formats over the years from XML to CSV and many more. API Management enables you to expose these formats using modern data models like JSON.
Connections to any API. In many businesses, APIs are located across different countries and use different formats. API Management lets you add all of these disparate APIs into a single modern interface.
Analytics. As you develop your APIs, it's useful to see how often your APIs are being called and by which types of systems. API Management allows you to visualize this data within the Azure portal.
Security. Security is paramount when dealing with system data. Unauthorized breaches can cost companies money, time lost in reworking code, and reputational loss. Security tools that you can use with Azure API management include OAuth 2.0 user authorization, and integration with Azure Active Directory.
https://docs.microsoft.com/en-us/learn/modules/publish-manage-apis-with-azure-api-management/2-create-an-api-gateway
Static web apps are commonly built using libraries and frameworks like Angular, React, Svelte, or Vue. These apps include HTML, CSS, JavaScript, and image assets that make up the application. When using a traditional web server architecture, these files are served from a single server alongside any required API endpoints.
Additional Talk:
With Static Web Apps, developers can use modular and extensible patterns to deploy apps in minutes while taking advantage of the built-in scaling and cost-savings offered by serverless technologies. Pre-rendering static content (including HTML, CSS, JavaScript, and image files) and leveraging global content distribution to serve this content removes the need for traditional web servers generating the content with every request. Moving dynamic logic to serverless APIs unlocks dynamic scale that can adjust to demand in real time and can empower developers to access the benefits of microservices as they evolve and extend individual app components.
https://docs.microsoft.com/en-us/learn/modules/publish-app-service-static-web-app-api/1-introduction?ns-enrollment-type=LearningPath&ns-enrollment-id=learn.azure-static-web-apps&pivots=angular
https://techcommunity.microsoft.com/t5/apps-on-azure-blog/introducing-app-service-static-web-apps/ba-p/1394451
Azure Static Web Apps is a service that automatically builds and deploys full stack web apps to Azure from a code repository.
The workflow of Azure Static Web Apps is tailored to a developer's daily workflow. Apps are built and deployed based off code changes.
When you create an Azure Static Web Apps resource, Azure interacts directly with GitHub or Azure DevOps to monitor a branch of your choice. Every time you push commits or accept pull requests into the watched branch, a build is automatically run and your app and API is deployed to Azure.
https://docs.microsoft.com/en-us/azure/static-web-apps/overview?WT.mc_id=dotnet-00000-cephilli
Globally distributed web hosting puts static content like HTML, CSS, JavaScript, and images closer to your users
Integrated API support provided by Azure Functions
First-class GitHub and Azure DevOps integration where repository changes trigger builds and deployments.
Free SSL certificates, which are automatically renewed
Unique preview URLs for previewing pull requests
https://docs.microsoft.com/en-us/learn/modules/publish-app-service-static-web-app-api/1-introduction?ns-enrollment-type=LearningPath&ns-enrollment-id=learn.azure-static-web-apps&pivots=angular
Image link: https://docs.microsoft.com/en-us/shows/on-net/getting-started-with-azure-static-web-apps (@05:51)
A Service Bus queue is a simple temporary storage location for messages. A sending component adds a message to the queue. A destination component picks up the message at the front of the queue. Under ordinary circumstances, each message is received by only one receiver.
Queues decouple the source and destination components to insulate destination components from high demand.
Additional Talk:
A queue responds to high demand without needing to add resources to the system. However, for messages that need to be handled quickly, creating additional instances of your destination component can allow them to share the load. Each message is handled by only one instance.
https://docs.microsoft.com/en-us/learn/modules/implement-message-workflows-with-service-bus/2-choose-a-messaging-platform
The key advantages of Service Bus queues include:
Supports larger message sizes of 256 KB (standard tier) or 100 MB (premium tier) per message versus 64 KB for Azure Storage queue messages.
Supports both at-most-once and at-least-once delivery. Choose between a very small chance that a message is lost or a very small chance it's handled twice.
Guarantees first-in, first-out (FIFO) order. Messages are handled in the same order they are added. Note that although FIFO is the normal operation of a queue, the default FIFO pattern is altered if the organization sets up sequenced or scheduled messages or during interruptions like a system crash.
Can group multiple messages in one transaction. If one message in the transaction fails to be delivered, all messages in the transaction aren't delivered.
Supports role-based security.
Does not require destination components to continuously poll the queue.
https://docs.microsoft.com/en-us/learn/modules/implement-message-workflows-with-service-bus/2-choose-a-messaging-platform
Image link: https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-quickstart-portal
A storage account is a container that groups a set of Azure Storage services together. Only data services from Azure Storage can be included in a storage account (Azure Blobs, Azure Files, Azure Queues, and Azure Tables). The following illustration shows a storage account containing several data services.
A storage account is an Azure resource and is part of a resource group. The following illustration shows an Azure subscription containing multiple resource groups, where each group contains one or more storage accounts.
https://docs.microsoft.com/en-us/learn/modules/create-azure-storage-account/2-decide-how-many-storage-accounts-you-need
As a fully managed service, Azure Cosmos DB takes database administration off your hands with automatic management, updates and patching. It also handles capacity management with cost-effective serverless and automatic scaling options that respond to application needs to match capacity with demand.
Azure Cosmos DB is a globally distributed and elastically scalable database. It has a guaranteed low latency that is backed by a comprehensive set of Service Level Agreements (SLAs). Consistency can sometimes be an issue when you are working with distributed systems, but Azure Cosmos DB alleviates this situation by offering you five different consistency levels: strong, bounded staleness, session, consistent prefix, and eventual.
All of the above is supported by Azure Cosmos DB's multi-model approach, which provides you with the ability to use document, key-value, wide-column, or graph-based data.
The final choice you have is how to access and manipulate your data. Azure Cosmos DB was built to support multiple different models, and you can continue to use industry standard APIs if they are already part of your application or database design.
https://docs.microsoft.com/en-us/azure/cosmos-db/introduction#:~:text=As%20a%20fully%20managed%20service,to%20match%20capacity%20with%20demand.
https://docs.microsoft.com/en-us/learn/modules/choose-api-for-cosmos-db/2-identify-the-technology-options
Image link: https://devblogs.microsoft.com/cosmosdb/
Azure Cache for Redis
Fully managed, open source–compatible in-memory data store to power fast, scalable applications
Fully managed service
Enjoy a fully managed version of the popular open-source Redis server with a turnkey caching solution. Harness the benefits without the need to become an expert in deploying and managing it.
High performance
Azure Cache for Redis achieves superior throughput and latency performance by storing data in memory instead of on disk. It consistently serves read and write requests within single-digit milliseconds, delivering exceedingly fast cache operations to scale data tiers as application loads increase.
Built-in reliability
Standard and Premium tiers include a redundant pair of virtual machines (VMs) configured for data replication to ensure maximum reliability. Premium caches also can replicate data across Azure regions as part of an application’s disaster-recovery implementation.
Flexible scaling
With three tiers, Azure Cache for Redis fits your needs. Start with any cache size and scale up to a larger one later without any service downtime or scale down a cache within the same tier.
Enterprise-grade security
Azure Cache for Redis supports industry-standard SSL to secure your data in transit and Azure Storage disk encryption at rest. Premium caches can be placed in your own Azure Virtual Network (VNet) so that you can further restrict traffic routes to and from your cache through your VNet topology and access policies.
Open source compatible
At its core, Azure Cache for Redis is backed by the open-source Redis server and natively supports data structures such as strings, hashes, lists, sets and sorted sets. If your application uses Redis, it will work as-is with Azure Cache for Redis.
Source:
https://azure.microsoft.com/en-in/services/cache/