Building Resilient Azure Solutions for Office 365 - SharePoint Saturday Atlanta 2017
1. Building Resilient Azure Solutions for Office 365
Josh Carlisle
B&R Business Solutions
@joshcarlisle
Developer | Level 200 | SharePoint Saturday Atlanta
#SPSATL #Office365 #Azure
2. Who Am I
• Raleigh, North Carolina
• Senior Solution Developer | B&R Business Solutions x 12 years
• Developer x 20 years | SharePoint x 12 years | Azure & Office 365 x 4 years
• Involved with SharePoint Saturday since the first event back in 2009
• Twitter: @joshcarlisle
• Web: www.joshcarlisle.io
5. Overview
• Overview of the importance of resiliency in your
Azure-based Office 365 solutions
• How to architect your infrastructure for resiliency
• How to architect your application for resiliency
6. Resiliency is all about your
application continuing to work
despite having problems.
8. Office 365 & Azure
• Azure is a powerful and popular solutions platform for
Office 365.
• Many organizations today have custom SharePoint Add-ins that are
hosted on Azure.
• Many organizations employ custom services used throughout Office
365 that are hosted on Azure.
• Azure Functions are an increasingly popular choice for various
custom solutions throughout SharePoint & Office 365.
• Organizations often have expectations regarding solutions hosted in
Azure (or any cloud platform).
9. Why Do You Need to Think About Resiliency?
• Organizations expect instant resiliency by virtue of being in the
cloud.
• Organizations expect auto-magic continuity during outages or
periods of reduced availability.
• Organizations expect any custom solution deployed to the
cloud will have instant resiliency.
• Addressing resiliency differs for every application,
depending on your solution.
• Careful analysis should be performed to identify potential failure
points.
• This analysis should be done early in the project because of infrastructure,
application design, and cost considerations.
10. Common Azure Solution – Identifying Potential Infrastructure & Edge Points of Failure
[Diagram: an API Endpoint and a SQL Database in a single Azure East US Region, with a dependency on an External API Endpoint (third party)]
11. Common Azure Solution – Adding Resiliency
[Diagram: Traffic Manager routing to API Endpoints in the Azure East US and Azure West US Regions, each region with its own SQL DB, Queue, and Function; the SQL DBs are labeled for failover and local reads; both regions depend on the External API Endpoint (third party)]
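The failover half of this design can be illustrated with a minimal Python sketch of priority routing: send all traffic to the preferred region while its health probe passes, and move to the next region when it fails. Azure Traffic Manager performs this selection at the DNS level using its own endpoint monitoring; the `Endpoint` class, the `select_endpoint` function, and the region names below are hypothetical and do not call any Azure API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Endpoint:
    # Hypothetical regional API endpoint; a lower priority value is preferred.
    name: str
    priority: int


def select_endpoint(
    endpoints: List[Endpoint],
    is_healthy: Callable[[Endpoint], bool],
) -> Optional[Endpoint]:
    """Return the highest-priority endpoint whose health probe passes.

    All traffic goes to the primary region until its probe fails, then to
    the next region in priority order (priority-based failover).
    """
    for endpoint in sorted(endpoints, key=lambda e: e.priority):
        if is_healthy(endpoint):
            return endpoint
    return None  # every region is down; the caller must degrade gracefully


if __name__ == "__main__":
    regions = [Endpoint("api-westus", 2), Endpoint("api-eastus", 1)]
    down = {"api-eastus"}  # simulate an East US outage
    chosen = select_endpoint(regions, lambda e: e.name not in down)
    print(chosen.name if chosen else "no healthy region")  # api-westus
```

Returning `None` rather than raising makes the "everything is down" case explicit, so the application has to decide how to degrade instead of failing unexpectedly.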
13. What about…
• Azure Storage w/ replication
• Service Bus Queues w/ namespaces
• Cosmos DB w/ global distribution
• Virtual Machines w/ VM Scale Sets
• Serverless Architectures
• Container Based Architectures
14. Software Resiliency
• Modern Applications are often simply orchestrating calls to other systems and
services
• Commonly experience transient failures – the “try that again” type failure
• Also experience more impactful events like service outages that can potentially
take longer to resolve.
• Applications commonly end up with timeouts and other types of blocked
operations
• Non-transient failures can consume and waste resources and contribute to slower
recovery time for affected services.
• There are common software patterns that can address these, including:
  – Retry Pattern (transient failures)
  – Circuit Breaker Pattern (non-transient failures)
  – Queue-Based Load Leveling
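Queue-Based Load Leveling puts a queue between bursty producers and the service that does the work, so the service drains requests at a rate it can sustain instead of being overwhelmed. Below is a minimal, deterministic Python simulation of that idea, assuming a single consumer with a fixed per-tick capacity. `level_load` and the tick model are hypothetical; in Azure the queue and consumer might be a Storage or Service Bus queue feeding a Function, but nothing here calls an Azure API.

```python
from collections import deque
from typing import Deque, List


def level_load(arrivals_per_tick: List[int], capacity_per_tick: int) -> List[int]:
    """Simulate queue-based load leveling.

    Producers enqueue bursts of work; one consumer drains at most
    `capacity_per_tick` messages per tick, so the downstream service never
    sees more than that rate. Returns the number of messages processed in
    each tick, continuing until the queue is empty.
    """
    if capacity_per_tick < 1:
        raise ValueError("consumer must process at least one message per tick")
    queue: Deque[int] = deque()
    processed_per_tick: List[int] = []
    next_id = 0
    tick = 0
    while tick < len(arrivals_per_tick) or queue:
        # Producers: enqueue this tick's burst, if any arrivals remain.
        if tick < len(arrivals_per_tick):
            for _ in range(arrivals_per_tick[tick]):
                queue.append(next_id)
                next_id += 1
        # Consumer: handle a bounded number of messages, oldest first.
        handled = 0
        while queue and handled < capacity_per_tick:
            queue.popleft()
            handled += 1
        processed_per_tick.append(handled)
        tick += 1
    return processed_per_tick


if __name__ == "__main__":
    # A burst of 10 requests, then quiet, then 1 more; capacity is 3 per tick.
    print(level_load([10, 0, 0, 1], capacity_per_tick=3))  # [3, 3, 3, 2]
```

The burst of 10 requests is spread over four ticks and the consumer never exceeds 3 per tick; the trade-off is added latency for the messages that wait in the queue.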
15. Transient Failures & Retry Pattern
• Cloud based solutions are often heavily dependent on network connections and external
services.
• Many issues caused by these types of failures are self-healing and quickly fix themselves.
• Common issues include dropped database connections, services under heavy load, external
services throttling connections, and issues caused by load balancers.
• Many services on Azure have built-in retry policies that offer various degrees of
configuration.
• Many common data access frameworks, such as Entity Framework, also provide retry policies.
• Consider dedicated libraries such as Polly.
• Identify the types of exceptions and faults that are candidates for retry. Idempotent operations are
good candidates (safe to retry – same result whether executed 1x or 20x).
• WARNING!!! Aggressive retry policies could further degrade failing services!!
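A minimal Python sketch of the retry pattern following these guidelines: retry only faults classified as transient, back off exponentially up to a cap, and add random jitter so many clients do not retry in lockstep against a failing service. Libraries such as Polly provide this for .NET; the code below is not Polly's API, and `TransientError`, `retry_with_backoff`, and the defaults (4 attempts, 0.5 s base delay, 8 s cap) are illustrative assumptions.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for "try that again" faults (timeouts, throttling)."""


def retry_with_backoff(
    operation: Callable[[], T],
    is_transient: Callable[[Exception], bool] = lambda e: isinstance(e, TransientError),
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `operation`, retrying transient failures with capped exponential backoff.

    Only use this on idempotent operations: a retried call may already have
    succeeded on the server before the client saw the failure.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Non-transient faults, or a failure on the last attempt, propagate.
            if not is_transient(exc) or attempt == max_attempts:
                raise
            # Delay doubles each attempt (0.5, 1, 2, ...) up to max_delay;
            # "full jitter" then picks a random wait in [0, delay].
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng.uniform(0, delay))
    raise AssertionError("unreachable")  # the loop always returns or raises


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        # Simulated service that is busy for the first two calls.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientError("service busy")
        return "ok"

    print(retry_with_backoff(flaky, base_delay=0.01))  # ok (after 2 retries)
```

Injecting `sleep` and `rng` keeps the function testable without real delays. The cap on attempts and delay is what keeps the policy from turning into the aggressive retrying the warning describes.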
16. What does this look like?
source: https://docs.microsoft.com/en-us/azure/architecture/patterns/retry
18. Circuit Breaker Pattern
• At a certain point exceptions become non-transient. The Circuit
Breaker pattern is intended to keep an application from retrying a
request that is likely to fail.
• Can often be utilized in conjunction with the Retry Pattern.
• Ideal for external resources & services.
• Prevents the overconsumption of resources on our own system
caused by the failure of another system (think timeouts).
• User requests fail more quickly but just as importantly resources
won’t be blocked waiting on failure.
• Allows downstream services to potentially recover faster
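A minimal Python sketch of a circuit breaker with the usual three states. After `failure_threshold` consecutive failures the circuit opens and calls fail fast with `CircuitOpenError` instead of waiting on a timeout; once `reset_timeout` seconds have passed, one trial call is allowed (half-open), and its outcome closes or reopens the circuit. The class, names, and defaults are hypothetical, and the sketch assumes single-threaded use (it is not thread-safe).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    """Minimal circuit breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, operation: Callable[[], T]) -> T:
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast: no thread or connection is tied up waiting on a
                # dependency that is very likely still down.
                raise CircuitOpenError("circuit open; failing fast")
            # Reset timeout elapsed: let one trial request through.
            self.state = "HALF_OPEN"
        try:
            result = operation()
        except Exception:
            self._record_failure()
            raise
        # Any success closes the circuit and resets the failure count.
        self.state = "CLOSED"
        self.failures = 0
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed trial while HALF_OPEN, or too many failures while CLOSED,
        # (re)opens the circuit and restarts the reset timer.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()


if __name__ == "__main__":
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=5.0)

    def down() -> None:
        raise ConnectionError("dependency unavailable")

    for _ in range(3):
        try:
            breaker.call(down)
        except (ConnectionError, CircuitOpenError) as exc:
            # ConnectionError, ConnectionError, then CircuitOpenError
            print(type(exc).__name__)
```

To combine it with the Retry Pattern, call `breaker.call(operation)` inside the retried operation and classify `CircuitOpenError` as non-transient, so retries stop as soon as the circuit opens.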
21. Parting Thoughts…
Plan & Design for
Resiliency
This is not just an IT problem to solve. This is not just a developer problem to solve.
Don’t expect someone else to solve this. Everyone needs to …
22. Additional Resources
• Resiliency Checklist - https://docs.microsoft.com/en-us/azure/architecture/checklist/resiliency
• Azure Retry Policies by Technology - https://docs.microsoft.com/en-us/azure/architecture/best-practices/retry-service-specific
• Polly - https://github.com/App-vNext/Polly