Cloud computing

Cloud Computing
Considerations

Malisa Ncube
Developer Evangelist – Microsoft
be.com@malisancube
inbox@malisancube.com

What does it take to run an app?
Inspired by Steve Marx
http://blog.smarx.com/posts/what-is-windows-azure-a-hand-drawn-video

Scalability
• Measured by the number of users that the application can support effectively at the
same time.
• Relates to the hardware resources needed (CPU, memory, disk and network bandwidth).
• Application logic runs on compute nodes and data on data nodes.
• Vertical scaling is achieved by increasing resources within existing nodes. This is limited
by hardware.
• Horizontal scaling is achieved by adding more nodes. It is more efficient with
homogeneous nodes.
• A scale unit is a combination of resources that need to be scaled together when
scaling horizontally.
• Resource contention limits scalability.
• Scalability is a business concern. Google noticed a 20% reduction in traffic after adding
500 ms to page response time. At Amazon, an extra 100 ms caused a 1% decrease in revenue.

The cloud
• Gives the illusion of infinite resources, limited only by the capacity of
individual virtual machines.
• Enabled by a short-term resource rental model.
• Enabled by a metered pay-for-use model. Usage costs are transparent.
• Enabled by self-service, on-demand, programmatic provisioning and
releasing of resources, so scaling can be automated.
• Gives an ecosystem of managed platform services such as VMs, data
storage, networking, messaging and caching.
• Gives a simplified application development model.

A cloud native application
• Lets the platform do the hard stuff by leveraging the application services.
• Uses non-blocking asynchronous communication in a loosely coupled
architecture.
• Scales horizontally and elastically.
• Does not waste resources.
• Handles scaling events, node failures, transient failures without downtime
or performance degradation.
• Uses geographic distribution to minimize network latency.
• Upgrades without downtime.
• Scales automatically using proactive and reactive actions.
• Monitors and manages application logs as nodes come and go.

Horizontal scaling compute Pattern
• Horizontal scaling is reversible: it supports both scaling out and scaling in.
• Stateful nodes keep user session information on the node, which makes each node a
single point of failure for its sessions.
• Stateless nodes store session information externally from the nodes.
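
A minimal sketch of the stateless approach, assuming a hypothetical SessionStore class that stands in for an external cache or database service. Because neither node holds session data, any node can serve any request:

```python
class SessionStore:
    """Stand-in for an external session store (hypothetical, in-memory here)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = dict(session)


class StatelessNode:
    """A compute node that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        # Load the session, change it, and write it back to the shared store.
        session = self.store.get(session_id)
        cart = list(session.get("cart", []))
        cart.append(item)
        self.store.put(session_id, {"cart": cart})
        return cart


shared_store = SessionStore()
node_a = StatelessNode("a", shared_store)
node_b = StatelessNode("b", shared_store)

node_a.add_to_cart("s1", "book")          # first request lands on node a
cart = node_b.add_to_cart("s1", "pen")    # next request lands on node b
```

Removing node_a after the first request would not lose the cart, which is what makes scaling in safe.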

Queue-Centric Workflow Pattern
• Used in web applications to decouple communication between the web tier and the service
tier by focusing on the flow of commands.
• A service tier that is unreliable or slow can affect the web tier negatively.
• All communication is asynchronous, as messages over a queue.
• The sender and receiver are loosely coupled. Neither one knows about the
implementation of the other.
• Edge cases arise around the invisibility window: when processing takes longer than the
window allows, the message becomes visible again and may be processed twice.
• Repeated delivery raises idempotency concerns. These are handled with database
transactions or compensating transactions.
• Poison messages are placed in a dead-letter queue.
• Queue-Centric Workflow (QCW) is not full CQRS, as it does not articulate the read model.
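
The receiver side can be sketched as follows. This is an in-memory illustration, not a real queue service API: the message shape, MAX_DEQUEUE_COUNT and the drain function are assumptions made for the example. It shows an idempotent skip for a duplicate delivery and a poison message moved to a dead-letter list after repeated failures:

```python
from collections import deque

MAX_DEQUEUE_COUNT = 3  # assumed threshold before a message counts as poison


def drain(queue, handler, processed_ids, dead_letters):
    """Process messages until the queue is empty."""
    while queue:
        msg = queue.popleft()
        msg["dequeue_count"] += 1
        if msg["id"] in processed_ids:
            continue  # duplicate delivery: already handled, so skip it
        try:
            handler(msg["body"])
            processed_ids.add(msg["id"])
        except Exception:
            if msg["dequeue_count"] >= MAX_DEQUEUE_COUNT:
                dead_letters.append(msg)  # poison message
            else:
                queue.append(msg)  # message becomes visible again


results = []


def handler(body):
    if body == "bad":
        raise ValueError("cannot process")
    results.append(body.upper())


queue = deque(
    {"id": i, "body": b, "dequeue_count": 0}
    for i, b in [(1, "resize"), (2, "bad"), (1, "resize")]
)
processed, dead = set(), []
drain(queue, handler, processed, dead)
```

Tracking processed message IDs is only one way to get idempotency; making the operation itself safe to repeat is another.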

Autoscaling Pattern
• Assumes a horizontally scaling architecture.
• Concerns are cost optimization and scalability.
• Auto-scaling solutions support scheduled (proactive) and signal-driven (reactive)
rules that provision and release resources as needed.
• Throttling selectively enables or disables features or functionality
based on environmental signals.
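
A rough sketch of how a reactive rule and a proactive rule might combine. The function name, CPU thresholds, business hours and node limits are all illustrative assumptions, not values from any particular platform:

```python
def desired_nodes(current, avg_cpu, hour, min_nodes=2, max_nodes=10):
    """Return the node count the rules ask for (all thresholds are assumed)."""
    target = current

    # Reactive rule: respond to the measured load signal.
    if avg_cpu > 0.75:
        target = current + 1
    elif avg_cpu < 0.25:
        target = current - 1

    # Proactive rule: keep a scheduled floor during business hours.
    if 8 <= hour < 18:
        target = max(target, 4)

    # Never leave the allowed range.
    return max(min_nodes, min(max_nodes, target))
```

For example, desired_nodes(3, 0.9, 22) asks for one more node at night under high load, while desired_nodes(2, 0.1, 9) still asks for 4 nodes because the scheduled floor wins over the scale-in rule.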

Eventual Consistency
• Simultaneous requests for the same data may result in different values.
• Leads to better performance and lower cost.
• Based on Brewer’s CAP theorem (Consistency, Availability and Partition
tolerance): three guarantees, of which an application can pick only two.
• Consistency. Everyone gets the same answer.
• Availability. Clients have ongoing access (even if there is a partial system failure).
• Partition tolerance. Correct operation even if some nodes are cut off from the
network.
• DNS updates and many NoSQL databases are examples of eventually consistent services.
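
A toy illustration of the first bullet, assuming a single primary that replicates to a read replica with a delay. The Primary and Replica classes are hypothetical and model no specific product:

```python
class Replica:
    """A read-only copy that lags behind the primary."""

    def __init__(self):
        self.data = {}


class Primary(Replica):
    """Accepts writes and pushes them to replicas later."""

    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas
        self.pending = []

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))  # not yet visible on replicas

    def replicate(self):
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


replica = Replica()
primary = Primary([replica])

primary.write("price", 10)
stale = replica.data.get("price")   # read before replication: no value yet
primary.replicate()
fresh = replica.data.get("price")   # read after replication: matches primary
```

Between the write and the replicate call, a read from the primary and a read from the replica disagree; afterwards they converge.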

MapReduce Pattern
• A data processing approach for large datasets that can be processed in a highly
parallel fashion.
• Requires a mapper function and a reducer function. The mapper accepts subsets of
the data and produces output; the mapper output is aggregated and sent
to the reducer.
• Used to process documents, server logs and social graphs.
• Hadoop implements MapReduce (MR) as a batch processing system, optimized for large
amounts of data rather than response time.
• MapReduce was created by Google Inc.
• Most effective when the compute function is brought to the data.
• Commonly associated with Big Data.
• Hadoop has abstractions built on top, e.g. Mahout (machine learning),
Hive (SQL-like queries), Pig (dataflow) and Sqoop (RDBMS connector).
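
The mapper and reducer roles can be shown with a single-process word count. This is only a sketch of the programming model, not how Hadoop distributes work; the function names are chosen for the example:

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Combine all values emitted for one key."""
    return word, sum(counts)


def map_reduce(lines):
    # Group mapper output by key before handing each group to the reducer.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


counts = map_reduce(["the cloud scales", "the cloud fails"])
```

In a real cluster each mapper would run near its slice of the data, and the grouping step would move data between nodes.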

Database sharding/Federation Pattern
• A database divided into several shards, where each database row
exists only on one shard.
• Shards do not reference other shards.
• Slave shard nodes are typically eventually consistent and read-only.
• Programming model is simplified by maintaining a single logical
database with horizontal scaling.
• Fan-out queries are used to make updates to dependent federation
members. Similar to Windows Azure SQL Data Sync and MapReduce.
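
A small sketch of the idea, assuming hash-based routing on a string key. The ShardedStore class is hypothetical, and real federations often route by key range instead. Each row lives on exactly one shard, and a fan-out query visits every shard:

```python
import hashlib


class ShardedStore:
    """One logical store over several in-memory shards (illustration only)."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash keeps the same key on the same shard across runs.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key, row):
        self._shard_for(key)[key] = row

    def get(self, key):
        return self._shard_for(key).get(key)

    def fan_out(self, predicate):
        # Ask every shard and merge the partial results.
        return [row for shard in self.shards
                for row in shard.values() if predicate(row)]


users = ShardedStore(4)
for i in range(20):
    users.put(f"user-{i}", {"id": i, "active": i % 2 == 0})

active_ids = sorted(row["id"] for row in users.fan_out(lambda r: r["active"]))
```

Single-key lookups touch one shard, which is where the scaling benefit comes from; fan-out queries touch all of them and are correspondingly more expensive.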

Multitenancy and commodity Hardware
• Multitenancy: multiple companies share one system, usually a software
system, each with the illusion of being the only tenant.
• Multitenant services are standard in the cloud: DNS services, hardware for
VMs, load balancers and identity management, among others.
• Commonly used in SaaS environments where each tenant runs in a
secure sandbox (Hyper-V, RDBMS).
• Performance is managed by using quotas and by running resource-hungry
services alongside less intensive ones.
• Commodity hardware fails occasionally. Plan on it happening on your
compute nodes and plan on handling it.

Busy Signal Pattern
• Applies to services or resources accessed over a network that may respond
with a busy signal.
• These may include management services, data services and more, and
periodic transient failure should be expected, much like a busy signal on
a telephone.
• A good application should be able to handle retries and properly
handle failures.
• Over HTTP, the busy signal is response 503 Service Unavailable.
• Clearly distinguish busy signals from errors, and retry on a busy state after an
interval. Log both for further analysis of patterns.
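
A minimal retry sketch under a few assumptions: the operation is a callable returning a (status, body) pair, 503 is the only busy signal, and the interval doubles on each attempt (exponential backoff is a choice made here, the slides only say to retry after an interval). BusyError and call_with_retry are names invented for the example:

```python
import time


class BusyError(Exception):
    """Raised when the service is still busy after all attempts."""


def call_with_retry(operation, max_attempts=4, base_delay=0.01, sleep=time.sleep):
    """Retry on 503, fail fast on other errors, return the body on success."""
    for attempt in range(1, max_attempts + 1):
        status, body = operation()
        if status == 503:
            if attempt == max_attempts:
                raise BusyError(f"still busy after {max_attempts} attempts")
            sleep(base_delay * 2 ** (attempt - 1))  # wait longer each time
            continue
        if status >= 400:
            # A real error, not a busy signal: retrying would not help.
            raise RuntimeError(f"non-retryable error {status}")
        return body


# Simulated service: busy twice, then succeeds.
responses = iter([(503, None), (503, None), (200, "ok")])
delays = []
result = call_with_retry(lambda: next(responses), sleep=delays.append)
```

Passing sleep as a parameter keeps the example testable; in the run above the recorded delays also double as the log the slides recommend keeping.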

Node Failure Pattern
• Concerns availability and graceful handling of unexpected
application/hardware failures, reboots or node shutdown.
• Application state should be in reliable storage, not on local disk or
individual node.
• Avoid single point of failure by using the N+1 rule.
• AWS and Azure send signals indicating that a node is shutting down, and traffic
is routed to other nodes.
• One approach is to have the UI code retry on failures, and to throttle
some features while recovery is taking place.
• Azure runs nodes in two fault domains.
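
The N+1 rule can be read as simple arithmetic: deploy the N nodes needed for peak load plus one spare. A sketch, where the function name and the load figures are made up for illustration:

```python
import math


def nodes_required(peak_rps, rps_per_node, spares=1):
    """N nodes to carry the peak load, plus spare capacity for a node failure."""
    needed = math.ceil(peak_rps / rps_per_node)
    return needed + spares


total = nodes_required(peak_rps=900, rps_per_node=250)
```

With these assumed numbers, four nodes carry the load and the fifth absorbs the loss of any single node.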

Network latency problem
• Network latency is a function of distance and bandwidth.
• Consider data compression, background processing and predictive fetching.
• Move applications closer to users.
• Move application data closer to users.
• Ensure nodes within your application are close together (colocation).
• Windows Azure (WA) uses Affinity Groups for this.
• Consider the Valet Key Pattern for public or temporary access (e.g. to Blob storage),
protected through hashing.
• Consider a Content Delivery Network (CDN): a globally distributed cache,
effective for frequently accessed content. Can be inconsistent.
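
One way to picture a valet key is a path plus an expiry time, signed with a keyed hash (HMAC) so the storage side can verify it without contacting the application. This is an illustration only and not the format of any real service's access signatures; SECRET and the function names are hypothetical:

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical key shared by issuer and storage service


def sign(path, expires_at, secret=SECRET):
    """Keyed hash over the resource path and its expiry time."""
    message = f"{path}\n{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def issue_valet_key(path, now, ttl_seconds):
    """Grant temporary access to one resource."""
    expires_at = now + ttl_seconds
    return {"path": path, "expires": expires_at, "sig": sign(path, expires_at)}


def is_valid(key, now):
    """Accept the key only if it is untampered and not yet expired."""
    expected = sign(key["path"], key["expires"])
    return hmac.compare_digest(expected, key["sig"]) and now <= key["expires"]


key = issue_valet_key("/photos/cat.png", now=1000, ttl_seconds=60)
tampered = dict(key, path="/photos/secret.png")
```

The client downloads directly from storage with the key, so the request never passes through the application's own nodes.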

Feedback, materials and contacts
@malisancube
