Reliable, scalable and secure applications can be built either by following verified design patterns or the hard way, through trial and error. Azure architecture patterns are tested and accepted solutions to common challenges; they reduce the technical risk to a project because it does not have to employ a new and untested design. Most of the patterns are relevant to any distributed system, whether hosted on Azure or on another cloud platform.
7. Cloud Architecture Design Challenges
• Availability: the time a system is functional and working (SLA uptime %)
• Data Management: data in multiple locations, performance and consistency
• Consistency: predictable behaviour, reusable decisions, maintainability
• Messaging: required by the distributed, loosely coupled nature of the cloud
• Management & Monitoring: expose runtime and debug information
• Performance & Scalability: responsiveness within a unit of time, handling load without impact
• Recover from Failures: detecting failures and recovering quickly and efficiently
• Security: prevent malicious or accidental issues outside the designed usage
8. Management & Monitoring Patterns
Anti-corruption Layer
• Adapter layer between two subsystems to isolate and translate semantics
• Consider
o Data consistency
o Extra maintenance point
o Permanent vs retiring layers
o Possible overhead and scalability challenges
o Interoperation with legacy system requires shared semantics
• Not Suitable
o No significant semantic differences between systems
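A minimal sketch of an anti-corruption layer, assuming a hypothetical legacy record format (`CUST_NO`, `FNAME`, `LNAME`, `STATUS`) and a hypothetical `Customer` model on the new side. None of these names come from the slides; they only illustrate how the adapter keeps legacy semantics out of the new subsystem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Model used by the new subsystem (hypothetical)."""
    customer_id: int
    full_name: str
    active: bool


class LegacyCustomerAdapter:
    """Anti-corruption layer: translates legacy records into the new model."""

    def to_customer(self, legacy: dict) -> Customer:
        # Assumed legacy semantics: zero-padded string IDs, split and padded
        # name fields, and a 'Y'/'N' status flag.
        return Customer(
            customer_id=int(legacy["CUST_NO"]),
            full_name=f'{legacy["FNAME"].strip()} {legacy["LNAME"].strip()}',
            active=legacy["STATUS"] == "Y",
        )


adapter = LegacyCustomerAdapter()
customer = adapter.to_customer(
    {"CUST_NO": "0042", "FNAME": "Ada ", "LNAME": "Lovelace", "STATUS": "Y"}
)
```

Only the adapter knows the legacy field names, so retiring the legacy system later means deleting one class rather than touching the new model.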
Gateway Offloading Pattern
• Offload shared or specialized service functionality to an API gateway / proxy
• Consider
o Central handling of shared features
o Maintain shared features in multiple places
o Delegate certain responsibilities to a specialized team (e.g. security)
o Ensure appropriate scaling, avoid bottlenecks
• Not Suitable
o When introducing coupling between services
o No business logic shall be offloaded (keep reusable)
9. When to use Messaging Services in Azure?
Event Grid
• Event-driven reactive programming
Key Points
• Cheap (€0.5 per 1M)
• At least once delivery
• Does not deliver data
• Ordered delivery not possible
• Push-based delivery
Reason to Use
• React to events
Event Hub
• Big data and telemetry pipeline
Key Points
• Semi-cheap (€9.2/month)
• At least once delivery
• Low latency, millions of events per sec.
Reason to Use
• Telemetry streaming
Service Bus
• High-value secure messaging and control
Key Points
• 13M free, €0.0114 unit/h
• Batches, filters, duplicate detection, transactions
• Pull-based delivery
• Ordered delivery
Reason to Use
• Cross-system transactions
https://docs.microsoft.com/en-us/azure/event-grid/compare-messaging-services
10. Claim-Check Pattern
• Split a large message into a claim check and a payload to protect the message bus from being overwhelmed
• Consider
o Custom logic to apply pattern for large messages only
o Delete the message data after consuming
• Not Suitable
o Overhead for small messages
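The claim-check idea can be sketched as follows. This is an in-memory illustration only: `BlobStore`, `send`, `receive` and the 64 KB threshold are assumptions made for the example, not an Azure API. A plain list stands in for the message bus.

```python
import uuid

MAX_INLINE_BYTES = 64 * 1024  # assumed threshold for "large" messages


class BlobStore:
    """Stand-in for an external payload store (hypothetical)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = str(uuid.uuid4())
        self._blobs[key] = data
        return key

    def take(self, key: str) -> bytes:
        # Delete the payload once it has been consumed.
        return self._blobs.pop(key)


def send(payload: bytes, store: BlobStore, queue: list) -> None:
    # Apply the pattern to large messages only; small ones travel inline.
    if len(payload) > MAX_INLINE_BYTES:
        queue.append({"claim_check": store.put(payload)})
    else:
        queue.append({"body": payload})


def receive(store: BlobStore, queue: list) -> bytes:
    message = queue.pop(0)
    if "claim_check" in message:
        return store.take(message["claim_check"])
    return message["body"]
```

The size check is the "custom logic" mentioned above: it avoids the extra round trip to the store for small messages.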
Messaging Patterns
Asynchronous Request-Reply
• Decouple backend processing from a frontend host
• Consider
o Validate request prior to starting a long running task
o API endpoint shall return:
• Location – a place where to poll, includes CorrelationID
• Retry Interval – when to retry for new status to reduce unnecessary load
o Callback endpoint could be used instead
• Not Suitable
o When response latency is important
o When callback, WebSockets or SignalR are possible
https://github.com/SeanFeldman/ServiceBus.AttachmentPlugin
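A framework-free sketch of the asynchronous request-reply flow described above. The `JobApi` class, its method names, the `/status/...` path and the 5-second retry interval are assumptions for illustration. Responses are modelled as `(status_code, headers, body)` tuples, using HTTP 202 with `Location` and `Retry-After` headers.

```python
import uuid


class JobApi:
    """Hypothetical API front end that hands work to a backend."""

    def __init__(self):
        self._jobs = {}

    def submit(self, request: dict):
        # Validate the request before starting a long-running task.
        if "input" not in request:
            return 400, {}, {"error": "missing 'input'"}
        correlation_id = str(uuid.uuid4())
        self._jobs[correlation_id] = {"status": "Running", "result": None}
        headers = {
            "Location": f"/status/{correlation_id}",  # where to poll
            "Retry-After": "5",                       # when to poll again
        }
        return 202, headers, {}

    def complete(self, correlation_id: str, result) -> None:
        # Called by the backend worker when processing finishes.
        self._jobs[correlation_id] = {"status": "Succeeded", "result": result}

    def status(self, correlation_id: str):
        job = self._jobs.get(correlation_id)
        if job is None:
            return 404, {}, {}
        if job["status"] == "Running":
            return 200, {"Retry-After": "5"}, {"status": "Running"}
        return 200, {}, job
```

A client would call `submit`, read the correlation ID from the `Location` header, and poll `status` at the suggested interval until the job is no longer running.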
11. Availability Patterns
Throttling
• Control resource consumption so the system keeps functioning under extreme load
• Consider
o Reject requests from some users (service time or last)
o Degrade Quality of Service (bandwidth, compression, time)
o Delay operation (e.g. queue processing time)
o Provide a specific throttling error code to the denied user
o Quickly detect high demand and apply throttling
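One common way to implement throttling is a token bucket; the slides do not prescribe an algorithm, so this is a swapped-in technique shown as a sketch. `TokenBucket` and `handle` are hypothetical names, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill tokens for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def handle(bucket: TokenBucket):
    # A denied caller receives a specific throttling status (HTTP 429).
    if bucket.allow():
        return 200, "OK"
    return 429, "Too Many Requests"
```

In a multi-tenant service one bucket per user or tenant would let the system reject only the callers that exceed their share.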
Health Endpoint Monitoring
• Functional check of an application through an external endpoint at a given interval
• Consider
o Tools (App Insights Web Tests, System Center Ops Manager)
o Get response time to a health verification endpoint
o Analyse results (Alive != Working, response body)
o Consider action in case of failure (e.g. restart)
o Secure the endpoint – authentication, IP filter
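The "Alive != Working" point can be sketched as a health check that probes each dependency instead of only returning a fixed response. `check_health` and the probe callables are hypothetical; a real endpoint would wrap this in the web framework in use and add authentication.

```python
def check_health(dependencies: dict):
    """Run each probe and return (status_code, per-dependency results).

    `dependencies` maps a name to a zero-argument callable that raises
    on failure (an assumption made for this sketch).
    """
    results = {}
    for name, probe in dependencies.items():
        try:
            probe()
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failed: {exc}"
    healthy = all(value == "ok" for value in results.values())
    # 503 tells the monitoring tool the app is alive but not working.
    return (200 if healthy else 503), results
```

Returning the per-dependency body lets the monitoring tool analyse why a check failed rather than only that it failed.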
12. Two Types of Azure Queue Services?
Storage Queue
• Part of Azure storage infrastructure
• Simple Get/Put/Peek interface
Key Points
• No message ordering guarantee
• At least once delivery
• Intended for decoupling for scalability scenarios
• Scheduled delivery and poison message support
• No duplicate detection
• No session support
• Max message size 64KB
Service Bus Queue
• Part of Azure Messaging infrastructure
• Publish–subscribe model
Key Points
• Message ordering guarantee
• At least once / At most once delivery
• Transactions within a queue
• Scheduled delivery and poison message support
• Duplicate detection based on MessageId
• Session support for message processing affinity
• Max message size 1MB
https://docs.microsoft.com/en-us/azure/event-grid/compare-messaging-services
13. Pipes and Filters Integration
• Decompose a complex task into separate, individually scalable elements
• Suitable – reusable pipeline filters; avoid bottleneck filters; breakable processing; flexibility to reorder steps; context sharing
• Sample Implementation
o Message queue (e.g. Azure Service Bus) receives the raw message
o Filter task (e.g. Azure Function App) listens and transforms the message
o Message is enqueued on the next queue
o Repeat until the final message is built
• Not suitable – processing steps are not independent (i.e. bad design) or are transactional; a huge context may make the process inefficient; insufficient scalability of underlying resources (e.g. DB)
Consistent Design Patterns
https://github.com/mspnp/cloud-design-patterns/tree/master/pipes-and-filters
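The queue-filter-queue chain from the sample implementation can be sketched in memory. Here `queue.Queue` stands in for a message queue such as Service Bus and plain functions stand in for filter tasks; `run_filter` and `run_pipeline` are hypothetical names, and a real deployment would run each filter as its own scalable process.

```python
from queue import Queue


def run_filter(inbox: Queue, outbox: Queue, transform) -> None:
    """One filter: read each message, transform it, enqueue it on the next queue."""
    while not inbox.empty():
        outbox.put(transform(inbox.get()))


def run_pipeline(messages, filters):
    """Chain filters through intermediate queues until the final message is built."""
    inbox = Queue()
    for message in messages:
        inbox.put(message)
    for transform in filters:
        outbox = Queue()
        run_filter(inbox, outbox, transform)
        inbox = outbox  # the output queue feeds the next filter
    results = []
    while not inbox.empty():
        results.append(inbox.get())
    return results


cleaned = run_pipeline([" order-1 ", "order-2"], [str.strip, str.upper])
```

Because each filter only depends on its input queue, steps can be reordered or reused by changing the `filters` list.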
14. Gateway Aggregation
• Aggregate individual requests into one to improve performance on high-latency networks
• Suitable
o CorrelationID identification of original calls
o Partial response on service failure
o Caching
o No service coupling for backend services
o Near to backend to reduce latency
• Not Suitable
o Need to reduce calls to backend (e.g. with batch handling)
o Application is near to backend and latency is practically zero
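A sketch of gateway aggregation with partial responses, assuming the backend calls are coroutines. `aggregate` and the service names in the usage note are hypothetical; `asyncio.gather(..., return_exceptions=True)` is used so that one failing service does not discard the others' results.

```python
import asyncio


async def aggregate(calls: dict) -> dict:
    """Fan out to all backend calls concurrently and merge the results.

    `calls` maps a service name to a zero-argument coroutine function.
    """
    names = list(calls)
    results = await asyncio.gather(
        *(calls[name]() for name in names), return_exceptions=True
    )
    response = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            # Partial response: report the failure, keep the other data.
            response[name] = {"error": str(result)}
        else:
            response[name] = {"data": result}
    return response
```

A client then makes one call, for example `asyncio.run(aggregate({"profile": get_profile, "orders": get_orders}))`, instead of one round trip per service over the high-latency link.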
Consistent Design Patterns
Command and Query Responsibility Segregation (CQRS)
• Separate read and update operations on a data store for performance, scalability and security.
• Suitable
o Enqueue async commands
o Separate R/W databases, individual scaling R/W endpoints
• Not suitable
o Simple domain and business rules;
o When CRUD interface implementation is sufficient
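A minimal sketch of the read/write split, assuming an event list as the hand-off between the two sides. `OrderCommands`, `OrderQueries` and the `OrderPlaced` event are hypothetical names for the example; in practice the two sides could use separate databases and scale independently.

```python
class OrderCommands:
    """Write side: validates commands and records the resulting events."""

    def __init__(self, event_log: list):
        self._log = event_log

    def place_order(self, order_id: str, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._log.append({"type": "OrderPlaced", "id": order_id, "amount": amount})


class OrderQueries:
    """Read side: keeps a query-optimized projection of the events."""

    def __init__(self):
        self._totals = {}

    def apply(self, event: dict) -> None:
        if event["type"] == "OrderPlaced":
            self._totals[event["id"]] = event["amount"]

    def get_total(self, order_id: str):
        return self._totals.get(order_id)
```

The read model only changes when events are applied to it, which is where the eventual consistency between the two sides comes from.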
15. Ambassador Proxy
• External process sends requests on behalf of a consumer service or application
• Suitable
o Offload common concerns to a sidecar on the same host
o Extend legacy or unmodifiable apps
• Not suitable
o Latency overhead unacceptable
o Context sharing required
o Reusability cannot be achieved
Consistent Design Patterns
Strangler Façade
• Incrementally migrate a legacy system, gradually replacing pieces of functionality
• Suitable
o Avoid bottleneck façade
o Assure common resources are accessible
• Not suitable
o Cannot intercept backend calls
o No complex wholesale replacement
16. Sidecar Pattern
• Deploy components in a separate process to provide isolation and encapsulation
• e.g. infrastructure sidecar that monitors the main app
• Consider
o Suitable interprocess communication (reliability, performance)
o Service or daemon instead of sidecar
• Suitable
o Heterogeneous languages
o A different team or entity owns a component
o Independent update of components shall be enabled
• Not Suitable
o Performance of communication is critical
o Small solutions where the design benefit is not worth the overhead
o Individual scaling requirements may require a service
Consistent Design Patterns
17. Circuit Breaker Pattern
• Prevent an application from repeatedly trying to call a remote service that is likely to fail.
• Suitable – temporary errors due to timeout, network issues, high resource utilization
• States
o Closed – request routed to endpoint (Threshold)
o Open – request fails immediately (Timer)
o Half-open – a limited number of requests are monitored to decide on Open/Closed
• Challenges
o Resource differentiation and resource abstraction
o Manual override to open state
o Enqueue failed requests for reprocessing
• Not Suitable - local/in-memory resources
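The three states can be sketched as a small state machine. This is an illustrative implementation under stated assumptions: `CircuitBreaker`, `CircuitOpenError`, the default threshold and timeout are hypothetical, and the half-open state here admits a single trial request.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is not attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast while the circuit is open")
            self.state = "half-open"  # timer elapsed: allow a trial request
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial, or too many failures while closed, opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.state = "closed"
        return result
```

Callers wrap remote calls as `breaker.call(fetch, url)` and treat `CircuitOpenError` as a signal to fall back or enqueue the request for later reprocessing.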
18. Resiliency Patterns
Compensating Transaction Pattern
• Undo the work performed by a series of steps that together form an eventually consistent operation.
• Challenges
o Simply restoring the previous state is rarely possible
o Record information on each step on how it can be undone
o Undo might not be doable in exactly the reverse order
o Consider retry logic to try to avoid compensating transactions
o Restore the entities most sensitive to change first
o Compensating transaction shall be idempotent (repeatable)
• Suitable
o Avoid distributed transactions with eventual consistency
o Undo failed steps by performing the reverse action
• Not Suitable
o Try to avoid the complications if possible
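A sketch of recording an undo action per step and running the recorded actions when a later step fails. `run_with_compensation` is a hypothetical helper; it undoes steps in reverse order, which is the simplest case and, as noted above, not always the right order in practice.

```python
def run_with_compensation(steps):
    """Run (action, compensation) pairs; on failure undo the completed steps.

    Each compensation is assumed to be idempotent so it can be retried safely.
    """
    completed = []
    try:
        for action, compensate in steps:
            action()
            # Record how to undo this step only after it has succeeded.
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()
        raise
```

For a travel booking, the steps might be (book flight, cancel flight), (book hotel, cancel hotel) and (charge card, refund card); if charging fails, the hotel and the flight are cancelled and the error is re-raised to the caller.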
19. Valet Key Pattern
• Token for restricted access to a resource, to offload workload from the main application
• Consider
o Manage key validation, use short key expiration
o Key only for the required operation
o Audit all operations; deliver the key securely
o Provide the client with a key or token that the data store can validate.
• Not Suitable
o When action is required before sending to datastore.
o Limit user behavior – i.e. subscribe to events of resource to validate
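A sketch of a valet key as an HMAC-signed, short-lived token scoped to one resource and one operation. This is not the Azure Shared Access Signature format; `issue_key`, `validate_key`, the `|`-separated layout and the demo secret are assumptions for illustration only.

```python
import hashlib
import hmac
import time

SECRET = b"demo-secret"  # hypothetical; a real key would come from a secret store


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_key(resource: str, operation: str, ttl_seconds: int, now=None) -> str:
    """Application side: grant one operation on one resource for a short time."""
    issued_at = time.time() if now is None else now
    payload = f"{resource}|{operation}|{int(issued_at + ttl_seconds)}"
    return f"{payload}|{_sign(payload)}"


def validate_key(token: str, resource: str, operation: str, now=None) -> bool:
    """Data store side: verify signature, scope and expiry."""
    try:
        res, op, expires, signature = token.rsplit("|", 3)
    except ValueError:
        return False
    current = time.time() if now is None else now
    return (
        hmac.compare_digest(signature, _sign(f"{res}|{op}|{expires}"))
        and res == resource
        and op == operation
        and expires.isdigit()
        and current < int(expires)
    )
```

The client presents the token directly to the data store, so the main application is not in the data path for the upload or download itself.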
Gatekeeper Pattern
• Limit attack surface by using a dedicated instance to sanitize and validate requests
• Challenges
o The backend host shall not expose unprotected endpoints
o May introduce single point of failure or performance hit
o Shall not perform actions other than sanitization
• Suitable
o Services with high degree of sensitive data
o Centralize validation
Security Patterns
20. Resiliency Patterns
Queue-Based Load Leveling
• Use a buffer queue between a task and a service to smooth demand peaks.
• Suitable
o Maximize availability when overloading is expected
o Maximize scalability as tasks and services grow independently
o Reasonable cost – services scale on load, rather than max load
• Challenges
o Communication is one-way. When a response is needed, use Async request-reply with a correlation ID (e.g. sequence number)
o Control the rate of consuming messages to avoid overloading underlying resources (e.g. scaling consumers and DB contention)
• Not Suitable
o When minimal latency is critical
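The smoothing effect can be illustrated with a small simulation. `simulate` is a hypothetical helper: requests arrive in bursts, wait in a queue, and the service drains them at a fixed rate per tick.

```python
from collections import deque


def simulate(arrivals_per_tick, service_rate):
    """Return (requests handled per tick, backlog left in the queue)."""
    queue = deque()
    handled_per_tick = []
    for arrivals in arrivals_per_tick:
        queue.extend(range(arrivals))            # the task enqueues its burst
        handled = min(service_rate, len(queue))  # the service consumes at its own pace
        for _ in range(handled):
            queue.popleft()
        handled_per_tick.append(handled)
    return handled_per_tick, len(queue)


processed, backlog = simulate([10, 0, 0, 0], service_rate=3)
```

A burst of 10 requests in the first tick never reaches the service as more than 3 per tick; the cost is the added latency for the requests that wait, which is why the pattern is unsuitable when minimal latency is critical.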
21. Performance and Scalability Patterns
Materialized View Pattern
• Prepopulated view in the necessary format to support efficient querying
• Suitable
o Performance improvement; limited data access
o Query simplification – hide data complexity (RDBMS + NoSQL)
• Challenges
o Reusability – likely to have multiple hits
o Disposability – can be regenerated at any time
o Variability – can vary on user or query parameters
o Consistency – data may become outdated
o Regeneration – update on new data, manual trigger
• Not Suitable
o Easy to read source data
o Data changes very quickly and requires lots of regenerations
o Consistency is a priority
o Domain Driven Design – behaviour w/o data (as in microservices)
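A sketch of a materialized view as a precomputed aggregate that is rebuilt from the source data. `build_sales_view` and the order fields are hypothetical. The view is disposable: it can be dropped and regenerated at any time, and it is stale between regenerations.

```python
from collections import defaultdict


def build_sales_view(orders):
    """Precompute total sales per product from the source orders."""
    view = defaultdict(int)
    for order in orders:
        view[order["product"]] += order["qty"] * order["price"]
    return dict(view)


orders = [
    {"product": "A", "qty": 2, "price": 5},
    {"product": "B", "qty": 1, "price": 7},
    {"product": "A", "qty": 1, "price": 5},
]
sales_view = build_sales_view(orders)  # regenerate on new data or on a manual trigger
```

Queries then read `sales_view["A"]` directly instead of scanning and aggregating every order, at the price of the view being outdated until the next rebuild.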