Cloud Design Patterns:
Prepare your application for Azure
Carlos Mendible
+34 648 76 84 17
carlos.mendible@sogeti.com
Carlos Mendible
Lead Solutions Architect
carlos.mendible.com/blog
@cmendibl3
carlosmendible
Agenda
► Design for the Cloud
► Problem Areas in the Cloud
• Availability
• Data Management
• Design and Implementation
• Messaging
• Management and Monitoring
• Performance and Scalability
• Resiliency
• Security
► Cloud Design Patterns
• Cache-aside
• Circuit Breaker
• Competing Consumers
• CQRS
• Event Sourcing
• Valet Key
• Health Endpoint Monitoring
• Static Content Hosting
Pokemon GO Facts
► Total number of downloads – 100 million (by August 8th, Google Play)
► Total revenue – $268 million (by August 12th)
► Percentage of iOS users who make in-app purchases – 80%
► Daily active users – 20+ million
► Gender split, female vs. male – 40/60
http://www.businessofapps.com/pokemon-go-usage-revenue-statistics/
Pokemon GO Architecture?
► Google Cloud Platform
► Java
► NoSQL (BigTable?)
► No offline mode
► No global consistency
► Load Balancing
► Sharding
► Dynamic Scaling
Design for the Cloud
Designing for Cloud
• Multi-tenant
• Distributed system
• Abstraction
• Commodity hardware at Internet scale
• Composed of multiple services
Services at Internet Scale
► Failure is expected
► Latency is a fact of life
► CAP: Consistent, available, and partition tolerant … pick two.
► Upgrade without downtime requires multiple concurrent service versions
► You can’t know what you don’t measure.
► Nothing is like production
► Services should be as simple as possible
► As services scale, cost/resource should decline
Designing for Cloud
► Design for Scale (Out)
• Partition application, scale by adding (or removing) resources
• Optimize density by using resources efficiently
• Use the right services for the right job
► Design for Availability
• Degrade gracefully, isolate faults, fall back to alternate delivery paths
• Ensure customers (and client devices) can access and use the service
• Services that are “live” but cannot handle desired/required demand are not available
► Design for Operations
• Insight is critical: instrumentation, monitoring and alerting
• Lifecycle management: service operations, configuration and updates
• Know the quality of your end user experience before Twitter does
Problem Areas in the Cloud
The Book: Cloud Design Patterns
► http://aka.ms/Cloud-Design-Patterns
Problem Areas in the Cloud
► Availability
► Data Management
► Design and Implementation
► Messaging
► Management and Monitoring
► Performance and Scalability
► Resiliency
► Security
Availability
► Availability defines the proportion of time that the system is functional
and working.
► Cloud applications typically provide users with a service level agreement
(SLA), so they must be designed to maximize availability.
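Availability as a proportion of time can be turned into a downtime budget. A minimal sketch, assuming a 30-day month and a 99.9% SLA purely as illustrative figures (neither comes from the slides); the function names are hypothetical:

```python
def availability(uptime_minutes, total_minutes):
    """Proportion of time the system was functional and working."""
    return uptime_minutes / total_minutes


def allowed_downtime_minutes(sla_percent, total_minutes):
    """Downtime budget implied by an SLA over a given period."""
    return total_minutes * (1 - sla_percent / 100)


MINUTES_IN_30_DAYS = 30 * 24 * 60  # 43,200 minutes (assumed billing period)

# A 99.9% SLA leaves roughly 43.2 minutes of downtime per 30 days.
budget = allowed_downtime_minutes(99.9, MINUTES_IN_30_DAYS)

# One hour of downtime in the same period breaks that SLA.
measured = availability(MINUTES_IN_30_DAYS - 60, MINUTES_IN_30_DAYS)
meets_sla = measured >= 0.999
```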
Data Management
► Data management is the key element of cloud applications, and influences
most of the quality attributes.
► Data is typically hosted in different locations and across multiple
servers.
Design and Implementation
► Consistency and coherence in component design and deployment
► Maintainability
► Reusability
► Design decisions have a huge impact on quality and on total cost of ownership (TCO)
Messaging
► Messaging infrastructure to connect the components and services
► Loosely coupled.
► Asynchronous messaging.
Management and Monitoring
► Management and monitoring are more difficult than in an on-premises
deployment.
► Applications must expose runtime information that administrators and
operators can use to manage and monitor the system.
► Applications must support changing business requirements and customization
without requiring the application to be stopped or redeployed.
Performance and Scalability
► Performance is an indication of the responsiveness of a system to
execute any action within a given time interval.
► Scalability is the ability of a system to handle increases in load
without impact on performance.
► Applications should be able to scale out within limits to meet peaks in
demand, and scale in when demand decreases.
Resiliency
► Resiliency is the ability of a system to gracefully handle and recover
from failures.
► Cloud + Internet → increased likelihood that both transient and more
permanent faults will arise.
► Detecting failures, and recovering quickly and efficiently, is necessary
Security
► Applications must be designed and deployed in a way that protects
them from malicious attacks, restricts access to only approved users,
and protects sensitive data.
Cloud Design Patterns
Retry Pattern
Runtime Reconfiguration
Scheduler Agent Supervisor
Sharding Pattern
Compute Resource Consolidation
Throttling
External Configuration Store
Federated Identity
Gatekeeper
Compensating Transaction
Index Table
Leader Election
Materialized View
Pipes and Filters
Priority Queue
Queue-Based Load Leveling
Cloud Design Patterns
Cache-Aside
Circuit Breaker
Competing Consumers
CQRS
Event Sourcing
Health Endpoint Monitoring
Static Content Hosting
Valet Key
Cache-Aside
► Context
• Applications use a cache to optimize repeated access to information held in a data store.
• It is usually impractical to expect that cached data will always be completely consistent with the data in the data store.
► Solution
• Implement a strategy that helps to ensure that the data in the cache is as up to date as possible.
• Detect and handle situations that arise when the data in the cache has become stale.
• Load data into the cache on demand.
► Usage
• Use when: the cache does not provide native read-through and write-through operations.
• Use when: resource demand is unpredictable.
• Not suitable when: the cached data set is static.
• Not suitable for: caching session state information in a web application hosted in a web farm.
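The load-on-demand and stale-data points can be sketched in a few lines. This is a minimal in-memory illustration, not the demo from the talk: the `CacheAside` class, the TTL-based expiry, and the dict standing in for the data store are all assumptions made for the example.

```python
import time


class CacheAside:
    """Cache-aside: the application, not the cache, loads and invalidates entries."""

    def __init__(self, load_from_store, write_to_store, ttl_seconds=60.0,
                 clock=time.monotonic):
        self._load = load_from_store      # hypothetical data-store read function
        self._write = write_to_store      # hypothetical data-store write function
        self._ttl = ttl_seconds           # assumed expiry policy to bound staleness
        self._clock = clock
        self._entries = {}                # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]               # cache hit, still fresh
        value = self._load(key)           # miss or stale: read the data store
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def update(self, key, value):
        self._write(key, value)           # write to the data store first
        self._entries.pop(key, None)      # then invalidate the cached copy


# Usage: a dict stands in for the data store, and reads are recorded.
store = {"user:1": "Ada"}
reads = []


def load(key):
    reads.append(key)
    return store[key]


cache = CacheAside(load, store.__setitem__)
first = cache.get("user:1")       # miss: loaded from the store
second = cache.get("user:1")      # hit: served from the cache
cache.update("user:1", "Grace")   # store updated, cache entry invalidated
third = cache.get("user:1")       # miss again: the new value is loaded
```

Invalidating on write rather than updating the cached value keeps the store as the single source of truth; the next read repopulates the entry.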
Cache-Aside Demo
Circuit Breaker
► Context
• Applications access remote resources and services.
• Partial loss of connectivity.
• Complete failure of a service.
• It is pointless for an application to continually retry the operation.
► Solution
• Avoid cascading failures.
• Prevent an application from repeatedly trying to execute an operation that is likely to fail.
• Detect whether the fault has been resolved.
► Usage
• Use to: prevent an application from attempting to invoke a remote service or access a shared resource if this operation is highly likely to fail.
• Not suitable for: handling access to local private resources in an application.
• Not suitable as: a substitute for handling exceptions in the business logic of your applications.
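The closed / open / half-open behaviour implied above can be sketched as a small state machine. This is an illustrative sketch, not the demo from the talk: the `CircuitBreaker` class, its thresholds, and the injectable clock are assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at = None                      # None means closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"                      # allow one trial call
        return "open"

    def call(self, operation, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("circuit is open; failing fast")
        try:
            result = operation(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0                          # success closes the circuit
        self._opened_at = None
        return result


# Usage with a fake clock so the timeout can be stepped manually.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0,
                         clock=lambda: now[0])


def flaky():
    raise ConnectionError("remote service unavailable")


for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

state_after_failures = breaker.state      # two failures reached the threshold
now[0] = 11.0                             # the reset timeout has elapsed
state_after_timeout = breaker.state
recovered = breaker.call(lambda: "ok")    # successful trial call
state_after_success = breaker.state
```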
Circuit Breaker Demo
Competing Consumers
► Context
• An application running in the cloud may be expected to handle a large number of requests.
• The application passes the requests through a messaging system and handles them asynchronously through a consumer service.
► Solution
• Use a message queue to implement the communication channel between the application and the instances of the consumer service.
• Consumer service instances receive messages from the queue and process them.
► Usage
• Use when: the application workload can run asynchronously.
• Use when: tasks are independent and can run in parallel.
• Use when: the volume of work is highly variable.
• Not suitable when: it is not easy to separate the application workload into discrete tasks.
• Not suitable when: tasks must be performed synchronously or in a specific sequence.
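A single queue read by several consumer instances can be sketched with the standard library. This is an in-process stand-in, assuming threads in place of separate service instances and `queue.Queue` in place of a cloud message queue; `run_competing_consumers` is a hypothetical helper.

```python
import queue
import threading


def run_competing_consumers(tasks, handler, consumer_count=3):
    """Process tasks with several consumers reading from one shared queue."""
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def consume(name):
        while True:
            message = work.get()          # blocks until a message arrives
            if message is None:           # sentinel: no more work (tasks must not be None)
                return
            outcome = handler(message)    # assumed not to raise in this sketch
            with results_lock:
                results.append((name, outcome))

    consumers = [
        threading.Thread(target=consume, args=(f"consumer-{i}",))
        for i in range(consumer_count)
    ]
    for consumer in consumers:
        consumer.start()
    for task in tasks:
        work.put(task)                    # the producer only knows the queue
    for _ in consumers:
        work.put(None)                    # one sentinel per consumer
    for consumer in consumers:
        consumer.join()
    return results


# Usage: ten independent tasks shared among three consumers.
processed = run_competing_consumers(range(10), lambda n: n * n)
squares = sorted(outcome for _, outcome in processed)
```

Which consumer handles which message is not deterministic, which is why the pattern fits independent tasks and not work that must run in a specific sequence.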
CQRS
► Context
• CRUD operations are applied to the same representation of an entity.
• Data contention in a collaborative domain.
• Mismatch between the read and write representations of the data.
► Solution
• Use separate query and update models for the data.
• It is common to separate the data into different physical stores to maximize performance, scalability, and security.
► Usage
• Use for: task-based user interfaces.
• Use when: performance of data reads must be tuned separately from data writes.
• Use for: integration with other systems.
• Not suitable when: business rules are simple.
• Not suitable when: CRUD is sufficient.
• Not suitable for: implementation across the whole system.
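Separate update and query models can be sketched as two small classes. The order domain, the `AddItem` command, and the synchronous projection callback are assumptions made for the example; a real system would often update the read store asynchronously.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddItem:
    """A command: expresses intent to change state, returns no data."""
    order_id: str
    product: str
    quantity: int


class OrderWriteModel:
    """Update model: validates commands and owns the authoritative state."""

    def __init__(self, on_change):
        self._orders = {}             # order_id -> {product: quantity}
        self._on_change = on_change   # hook used to refresh the read side

    def handle(self, command):
        if command.quantity <= 0:
            raise ValueError("quantity must be positive")
        items = self._orders.setdefault(command.order_id, {})
        items[command.product] = items.get(command.product, 0) + command.quantity
        self._on_change(command.order_id, dict(items))


class OrderSummaryReadModel:
    """Query model: a denormalized view shaped for reads only."""

    def __init__(self):
        self._summaries = {}

    def project(self, order_id, items):
        self._summaries[order_id] = {
            "lines": len(items),
            "units": sum(items.values()),
        }

    def summary(self, order_id):
        return self._summaries.get(order_id)


# Usage: commands go to the write model, queries go to the read model.
read_model = OrderSummaryReadModel()
write_model = OrderWriteModel(on_change=read_model.project)
write_model.handle(AddItem("o-1", "book", 2))
write_model.handle(AddItem("o-1", "pen", 3))
summary = read_model.summary("o-1")
```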
Event Sourcing
► Context
• CRUD systems perform update operations directly against the data store, which can hit performance and responsiveness and limit scalability.
• History is lost unless the details of each operation are recorded in a separate log.
► Solution
• Handle operations on data that is driven by a sequence of events.
• Use an append-only event store.
► Usage
• Use to: capture “intent,” “purpose,” or “reason”.
• Use when: it’s vital to minimize conflicting updates.
• Use to: restore the state of a system.
• Use when: eventual consistency is acceptable.
• Not suitable for: simple domains.
• Not suitable when: consistency is required.
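An append-only event store and state rebuilt by replay can be sketched as follows. The bank-account domain, the `Event` fields, and the in-memory `EventStore` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str      # "deposited" or "withdrawn" (hypothetical event types)
    amount: int
    reason: str    # captures the intent behind the change


class EventStore:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def events(self):
        return tuple(self._events)   # read-only snapshot of the log


def replay_balance(events):
    """Rebuild the current state by folding over the event sequence."""
    balance = 0
    for event in events:
        if event.kind == "deposited":
            balance += event.amount
        elif event.kind == "withdrawn":
            balance -= event.amount
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return balance


# Usage: state is never stored directly, only derived from the log.
log = EventStore()
log.append(Event("deposited", 100, "salary"))
log.append(Event("withdrawn", 30, "groceries"))
log.append(Event("deposited", 5, "refund"))

balance = replay_balance(log.events())               # 100 - 30 + 5
balance_after_two = replay_balance(log.events()[:2]) # state at an earlier point
```

Replaying a prefix of the log is what makes it possible to restore the state of the system as of an earlier point in time.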
Health Endpoint Monitoring
► Context
• It is more difficult to monitor services running in the cloud than it is to monitor on-premises services.
• Services typically depend on other services provided by third parties.
• The required level of availability (SLA) must be ensured.
► Solution
• Implement health monitoring by sending requests to an endpoint on the application.
► Usage
• Verify availability.
• Check for correct operation.
• Monitor middle-tier or shared services.
• Complement existing instrumentation.
• Does not replace the requirement for logging and auditing.
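A health endpoint and an external probe can be sketched with the standard library alone. This is not the demo from the talk: the `/health` path, the JSON shape, and the `check_dependencies` helper are assumptions made for the example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_dependencies():
    """Hypothetical checks; a real service would test its database, cache, etc."""
    return {"database": True, "cache": True}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        checks = check_dependencies()
        healthy = all(checks.values())
        body = json.dumps({
            "status": "ok" if healthy else "degraded",
            "checks": checks,
        }).encode()
        # 200 when every check passes, 503 so monitors can alert otherwise.
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def probe(url, timeout=2.0):
    """What a monitoring agent does: request the endpoint, inspect the result.

    Note: urlopen raises urllib.error.HTTPError for a 503 response.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, json.loads(response.read())


# Usage: serve on an ephemeral local port and probe it once.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, report = probe(f"http://127.0.0.1:{server.server_port}/health")
server.shutdown()
server.server_close()
```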
Health Endpoint Monitoring Demo
Static Content Hosting
► Context
• Applications receive requests to download static content.
• The processing cycles spent serving it can be put to better use.
► Solution
• Locate some of an application’s resources and static pages in a storage service.
► Usage
• Use to: minimize costs related to hosting static content.
• Use with: a CDN.
• Monitor costs and bandwidth usage.
• Not suitable when: the application needs to perform some processing on the static content.
• Not suitable when: the volume of static content is very small.
Valet Key
► Context
• Client programs and web browsers often need to read and write files or data streams to and from an application’s storage.
• Handling these transfers in the application absorbs valuable resources such as compute, memory, and bandwidth.
• Data stores have the capability to handle upload and download of data.
► Solution
• Provide the client with a key or token (valet key) that the data store itself can validate.
• The key provides time-limited access to specific resources.
► Usage
• Use to: maximize performance and scalability.
• Use to: minimize operational cost.
• Use when: clients regularly upload or download data.
• Not suitable when: the application must perform some task on the data before it is stored or before it is sent to the client.
• Not suitable when: audit trails are required, or the number of times a data transfer is performed must be controlled.
• Not suitable when: the size of the data must be limited.
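A time-limited token that the data store can validate on its own can be sketched with an HMAC signature. This is an illustration of the idea only, not the demo from the talk and not the format of any real storage service; the token layout, the `SECRET` value, and both function names are assumptions made for the example.

```python
import hashlib
import hmac
import time

# Hypothetical signing secret, shared by the application and the data store.
SECRET = b"demo-secret-known-only-to-the-data-store"


def issue_valet_key(resource, permission, ttl_seconds, now=None):
    """Application side: grant time-limited access to one resource."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    payload = f"{resource}|{permission}|{expires}"
    signature = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{signature}"


def validate_valet_key(token, resource, permission, now=None):
    """Data-store side: validate the token without calling the application."""
    try:
        tok_resource, tok_permission, expires, signature = token.rsplit("|", 3)
    except ValueError:
        return False                       # malformed token
    payload = f"{tok_resource}|{tok_permission}|{expires}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False                       # tampered or forged token
    current = time.time() if now is None else now
    return (
        tok_resource == resource
        and tok_permission == permission
        and current < int(expires)         # reject expired keys
    )


# Usage with fixed timestamps so the expiry can be checked deterministically.
token = issue_valet_key("photos/cat.png", "read", ttl_seconds=60, now=1000)
valid = validate_valet_key(token, "photos/cat.png", "read", now=1030)
expired = validate_valet_key(token, "photos/cat.png", "read", now=1061)
wrong_permission = validate_valet_key(token, "photos/cat.png", "write", now=1030)
tampered = validate_valet_key(token.replace("|read|", "|write|"),
                              "photos/cat.png", "write", now=1030)
```

Because the token carries its own scope and expiry, the client can talk to the data store directly and the application stays out of the data path.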
Valet Key Demo
Questions?
Thank you
Carlos Mendible
