Architecting for the Cloud
Len and Matt Bass
Deployment
Deployment pipeline
• Developers commit code
• Code is compiled
• Binary is processed by a build and unit test tool which builds
the service
• Integration tests are run followed by performance tests.
• Result is a machine image (assuming virtualization)
• The service (its image) is deployed to production.
Deployment Overview
[Diagram: multiple instances of a service are executing. Red is the service being replaced with a new version (VA → VB), blue are clients, and green are dependent services. New instances pass through UAT / staging / performance tests before entering production.]
Additional considerations
• Recall release plan.
• Each release requires coordination among
stakeholders and development teams.
– Can be explicitly performed
– Can be implicit in the architecture (we saw an
example of this)
• You as an architect need to know how frequently
new releases will be deployed.
– Some organizations deploy dozens of times a day
– Other organizations deploy once a week
– Still others once a month or longer
Monthly deployment
• There is time for coordination among
development teams to ensure that
components are consistent
• There is time for “release manager” to ensure
that a release is correct before moving it to
production
Weekly deployment
• Limited time for coordination among
development teams
• Time for “release manager” to ensure that a
release is correct before moving it to
production
Daily deployment
• No time for
– Coordination among development teams
– Release manager to validate release
• These items must be implicit in the
architecture.
Deployment goal and constraints
• Goal of a deployment is to move from current state (N
instances of version A of a service) to a new state (N instances
of version B of a service)
• Constraints associated with Continuous Deployment:
– Any development team can deploy their service at any
time. I.e. New version of a service can be deployed either
before or after a new version of a client. (no
synchronization among development teams)
– It takes time to replace one instance of version A with an
instance of version B (order of minutes)
– Service to clients must be maintained while the new
version is being deployed.
Deployment strategies
• Two basic all-or-nothing strategies
– Red/Black – leave N instances with version A as they are,
allocate and provision N instances with version B, and then
switch to version B and release the instances with version A.
– Rolling Upgrade – allocate one instance, provision it with
version B, release one version A instance. Repeat N times.
• Other deployment topics
– Partial strategies (canary testing, A/B testing). We will
discuss them later. For now we are discussing all-or-nothing
deployment.
– Rollback
– Packaging services into machine images
Trade offs – Red/Black and
Rolling Upgrade
• Red/Black
– Only one version available to the
client at any particular time.
– Requires 2N instances (additional
costs)
– Accomplished by moving
environments
• Rolling Upgrade
– Multiple versions are available for
service at the same time
– Requires N+1 instances.
• Both are heavily used. Choice
depends on
– Cost
– Managing complications from
using rolling upgrade
Rolling Upgrade in EC2
[Diagram: a cycle] Update Auto Scaling Group → Sort Instances →
Confirm Upgrade Spec → Remove & Deregister Old Instance from ELB →
Terminate Old Instance → Wait for ASG to Start New Instance →
Register New Instance with ELB → (repeat for each instance)
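The cycle above can be sketched as a loop. This is an illustrative simulation only: the fleet is a plain Python list standing in for real ASG/ELB calls, and all names are invented, not the AWS API.

```python
# Illustrative simulation of a rolling upgrade: replace N instances of
# version A with version B, one at a time, while the rest keep serving.
# The "cloud" here is just a list; real code would call the ASG/ELB APIs.

def rolling_upgrade(instances, new_version):
    """Upgrade each instance in place, one at a time."""
    for i in range(len(instances)):
        # 1. Remove & deregister the old instance from the load balancer
        instances[i]["in_service"] = False
        # 2. Terminate the old instance and wait for the ASG to start a
        #    replacement provisioned with the new version
        instances[i] = {"version": new_version, "in_service": False}
        # 3. Register the new instance with the load balancer
        instances[i]["in_service"] = True
        # At most one instance is ever out of service, so service to
        # clients is maintained throughout the upgrade.
    return instances

fleet = [{"version": "A", "in_service": True} for _ in range(4)]
fleet = rolling_upgrade(fleet, "B")
```

Note the N+1 shape of the real process: while one replacement is being provisioned, the remaining N-1 old instances plus the already-upgraded ones keep serving.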
Types of failures during rolling upgrade
• Provisioning failure – research topic
• Logical failure – inconsistencies, to be discussed
• Instance failure – handled by Auto Scaling Group in EC2
What are the problems with
Rolling Upgrade?
• Recall that any development team can deploy
their service at any time.
• Three concerns
– Maintaining consistency between different versions of
the same service when performing a rolling upgrade
– Maintaining consistency among different services
– Maintaining consistency between a service and
persistent data
Maintaining consistency between
different versions of the same
service
• Key idea – differentiate between installing a new
version and activating a new version
• Involves “feature toggles” (described momentarily)
• Sequence
– Develop version B with new code under control of feature
toggle
– Install each instance of version B with the new code
toggled off.
– When all of the instances of version A have been replaced
with instances of version B, activate new code through
toggling the feature.
Issues
• Do you remember feature toggles?
• How do I manage features that extend across
multiple services?
• How do I activate all relevant instances at
once?
Feature toggle
• Place feature-dependent new code inside an "if" statement
whose condition is an external variable; the new code executes
only when the variable is true. The code being replaced becomes
the "else" branch.
• Used to allow developers to check in
uncompleted code. Uncompleted code is toggled
off.
• During deployment, until new code is activated, it
will not be executed.
• Removing feature toggles when a new feature
has been committed is important.
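The mechanism can be sketched in a few lines of Python. The toggle store and the pricing rule below are invented for illustration:

```python
# Minimal feature-toggle sketch: new code runs only when the externally
# controlled flag is true; otherwise the old ("else") code path runs.
toggles = {"new_pricing": False}  # set by deployment tooling, not the app

def compute_price(base):
    if toggles.get("new_pricing", False):
        # New code: installed on every instance, but inactive until the
        # feature is toggled on across the fleet
        return round(base * 1.08, 2)
    else:
        # Old code: becomes dead once the toggle is removed after release
        return round(base * 1.05, 2)

before = compute_price(100)    # old path
toggles["new_pricing"] = True  # activation step
after = compute_price(100)     # new path
```

Removing the toggle (and the dead "else" branch) once the feature is committed is what keeps toggle debt from accumulating.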
Multi service features
• Most features will involve multiple services.
• Each service has some code under control of a
feature toggle.
• Activate feature when all instances of all services
involved in a feature have been installed.
– Maintain a catalog with feature vs service version
number.
– A feature toggle manager determines when all old
instances of each version have been replaced. This
could be done using registry/load balancer.
– The feature manager activates the feature.
Activating feature
• The feature toggle manager changes the value of
the feature toggle. Two possible techniques to
get new value to instances.
– Push. Broadcasting the new value will instruct each
instance to use new code. If a lag of several seconds
between the first service to be toggled and the last
can be tolerated, there is no problem. Otherwise
synchronizing value across network must be done.
– Pull. Querying the manager by each instance to get
latest value may cause performance problems.
• A coordination mechanism such as ZooKeeper will
overcome both problems.
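The push model can be sketched in-process as below. This is only an analogue of the real mechanism: with ZooKeeper, each instance would set a watch on the toggle's znode and be called back on change rather than registering a direct callback. All class and feature names are invented.

```python
# In-process sketch of push-style toggle activation: instances register a
# callback with the manager; changing the value notifies all of them.
# A real deployment would use ZooKeeper watches instead of callbacks.

class FeatureToggleManager:
    def __init__(self):
        self._values = {}
        self._watchers = {}  # feature -> list of callbacks

    def watch(self, feature, callback):
        self._watchers.setdefault(feature, []).append(callback)
        callback(self._values.get(feature, False))  # deliver current value

    def set(self, feature, value):
        self._values[feature] = value
        for cb in self._watchers.get(feature, []):  # "broadcast" the change
            cb(value)

class ServiceInstance:
    def __init__(self, manager, feature):
        self.active = None
        manager.watch(feature, lambda v: setattr(self, "active", v))

mgr = FeatureToggleManager()
fleet = [ServiceInstance(mgr, "multi_service_feature") for _ in range(3)]
mgr.set("multi_service_feature", True)  # activate once all are installed
```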
Maintaining consistency across
versions (summary)
• Install all instances before activating any new
code
• Use feature toggles to activate new code
• Use feature toggle manager to determine
when to activate new code
• Use ZooKeeper to coordinate activation with
low overhead
Maintaining consistency among
different services
• Use case:
– Wish to deploy new version of service A without
coordinating with development team for clients of
service A.
• I.e. new version of service A should be backward
compatible in terms of its interfaces.
• May also require forward compatibility in certain
circumstances, e.g. rollback
Achieving Backwards Compatibility
• APIs can be extended but must always be backward
compatible.
• Leads to a translation layer
[Layer diagram:]
Clients
↓
External APIs (unchanging, but with the ability to extend or add new ones)
↓
Translation to internal APIs
↓
Internal APIs (changes require changes to the translation layer but do not propagate further)
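The translation layer can be sketched as a thin adapter: the external operation names stay fixed while the mapping to the (changeable) internal API is updated in one place. The service and method names below are invented for illustration:

```python
# Hypothetical sketch of the translation layer: the external API is
# stable; renames or signature changes in the internal API are absorbed
# here instead of propagating to clients.

class InternalCatalogV2:
    # Internal API changed: v1's lookup(title) became
    # find_by_title(title, region) with a new required argument
    def find_by_title(self, title, region):
        return {"title": title, "region": region}

class ExternalCatalogAPI:
    """Stable, backward-compatible surface presented to clients."""
    def __init__(self, internal):
        self._internal = internal

    def lookup(self, title):
        # Translation: supply a default for the new required argument so
        # existing clients are unaffected by the internal change
        return self._internal.find_by_title(title, region="US")

api = ExternalCatalogAPI(InternalCatalogV2())
result = api.lookup("The Crowd")
```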
What about dependent services?
• Dependent services that are within your control
should maintain backward compatibility
• Dependent services not within your control (third
party software) cannot be forced to maintain
backward compatibility.
– Minimize impact of changes by localizing interactions
with third party software within a single module.
– Keeping services independent and packaging as much
as possible into a virtual machine means that only
third party software accessed through message
passing will cause problems.
Forward Compatibility
• Gracefully handle unknown calls and database schema information
– Suppose your service receives a method call it does
not recognize. It could be intended for a later version
where this method is supported.
– Suppose your service retrieves a database table with
an unknown field. It could have been added to
support a later version.
• Forward compatibility allows a version of a service to be upgraded
or rolled back independently from its clients. It involves both
– The service handling unrecognized information
– The client handling returns that indicate unrecognized
information.
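A sketch of this behavior, with invented method and field names: the service ignores fields it does not recognize and answers unknown methods with an explicit "unsupported" response instead of failing.

```python
# Illustrative forward-compatibility sketch: tolerate method names and
# record fields introduced by a version this service does not know about.

KNOWN_FIELDS = {"user_id", "bookmark"}

def handle_call(method, payload):
    handlers = {"get_bookmark": lambda p: p.get("bookmark")}
    if method not in handlers:
        # Unknown call: likely meant for a later version. Report it as
        # unsupported rather than crashing; the client must handle this.
        return {"status": "unsupported", "method": method}
    # Ignore unknown fields (e.g. columns added for a later schema version)
    known = {k: v for k, v in payload.items() if k in KNOWN_FIELDS}
    return {"status": "ok", "result": handlers[method](known)}

ok = handle_call("get_bookmark",
                 {"user_id": 7, "bookmark": 314, "hd_profile": True})
unk = handle_call("set_hd_profile", {"user_id": 7})
```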
Maintaining consistency between a
service and persistent data
• Assume new version is correct – we will discuss the situation where
it is incorrect in a moment.
• Inconsistency in persistent data can come about because data
schema or semantics change.
• Effect can be minimized by the following practices (if possible).
– Only extend schema – do not change semantics of
existing fields. This preserves backwards compatibility.
– Treat schema modifications as features to be toggled.
This maintains consistency among various services
that access data.
I really must change the schema
• In this case, apply pattern for backward
compatibility of interfaces to schemas.
• Use features of database system (I am
assuming a relational DBMS) to restructure
data while maintaining access to not yet
restructured data.
Summary of consistency discussion so
far.
• Feature toggles are used to maintain consistency
within instances of a service
• The backward compatibility pattern is used to
maintain consistency between a service and its
clients.
• Discouraging modification of schema will
maintain consistency between services and
persistent data.
– If schema must be modified, then synchronize
modifications with feature toggles.
Canary testing
• Canaries are a small number of instances of a new version
placed in production in order to perform live testing in a
production environment.
• Canaries are observed closely to determine whether the new
version introduces any logical or performance problems. If
not, roll out new version globally. If so, roll back canaries.
• Named after canaries
in coal mines.
Implementation of canaries
• Designate a collection of instances as canaries. They do not need to
be aware of their designation.
• Designate a collection of customers as testing the canaries. Can be,
for example
– Organizationally based
– Geographically based
• Then
– Activate feature or version to be tested for canaries. Can be
done through feature activation synchronization mechanism
– Route messages from canary customers to canaries. Can be
done through making registry/load balancer canary aware.
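One way to make routing canary-aware is to partition customers deterministically, so a designated fraction consistently reaches the canary pool. This sketch (with invented instance names) is not Netflix's actual mechanism:

```python
# Hypothetical canary-aware routing sketch: a fixed fraction of customers
# is deterministically routed to canary instances; everyone else gets the
# stable pool.
import hashlib

CANARY_FRACTION = 0.05  # ~5% of customers exercise the new version

def is_canary_customer(customer_id):
    # Stable hash so a given customer consistently hits the same pool
    digest = hashlib.sha256(str(customer_id).encode()).hexdigest()
    return (int(digest, 16) % 100) < CANARY_FRACTION * 100

def route(customer_id, stable_pool, canary_pool):
    pool = canary_pool if is_canary_customer(customer_id) else stable_pool
    return pool[customer_id % len(pool)]

stable = ["i-stable-0", "i-stable-1", "i-stable-2"]
canary = ["i-canary-0"]
chosen = route(42, stable, canary)
```

The same deterministic partitioning works for organizationally or geographically based designations: replace the hash with a lookup on the customer's attributes.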
A/B testing
• Suppose you wish to test user response to a
system variant. E.g. UI difference or marketing
effort. A is one variant and B is the other.
• You simultaneously make available both
variants to different audiences and compare
the responses.
• Implementation is the same as canary testing.
Rollback
• New versions of a service may be unacceptable either
for logical or performance reasons.
• Two options in this case
– Roll back (undo deployment)
– Roll forward (discontinue current deployment and create a
new release without the problem)
• The decision to roll back or roll forward is almost never automated
because there are multiple factors to consider:
– Forward or backward recovery
– Consequences and severity of problem
– Importance of upgrade
States of upgrade.
• An upgrade can be in one of two states when
an error is detected.
– Installed (fully or partially) but new features not
activated
– Installed and new features activated.
Possibilities
• For this slide assume persistent data is correct.
• Installed but new features not activated
– Error must be in backward compatibility
– Halt deployment
– Roll back by reinstalling old version
– Roll forward by creating new version and installing
that
• Installed with new features activated
– Turn off new features
– If that is insufficient, we are at prior case.
Persistent data may be incorrect
• Keep a log of user requests (each with its own identifier)
• Identification of incorrect persistent data
– Tag each data item with metadata giving the service and
version that wrote the data
– and the user request that caused the data to be written
• Correction of incorrect persistent data (simplistic version)
– Remove data written by the incorrect version of a service
– Install the correct version
– Replay the user requests that caused the incorrect data to be
written
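The simplistic version can be sketched as follows, with in-memory stand-ins for the request log and data store. Note that a real replay would re-execute the request logic under the corrected version rather than copy the logged value:

```python
# Sketch of the correction scheme: every write is tagged with the
# (service, version) that wrote it and the originating request id, so
# writes from a bad version can be removed and their requests replayed.

log = []    # request log: (request_id, key, value)
store = {}  # key -> {"value": ..., "meta": (service, version, request_id)}

def write(service, version, request_id, key, value):
    log.append((request_id, key, value))
    store[key] = {"value": value, "meta": (service, version, request_id)}

def rollback_and_replay(service, bad_version, good_version):
    # Remove data written by the incorrect version...
    bad_requests = {m["meta"][2] for m in store.values()
                    if m["meta"][:2] == (service, bad_version)}
    for key in [k for k, m in store.items()
                if m["meta"][:2] == (service, bad_version)]:
        del store[key]
    # ...then replay the requests that produced it under the good version
    # (here we just re-apply the logged value for illustration).
    for request_id, key, value in list(log):
        if request_id in bad_requests:
            write(service, good_version, request_id, key, value)

write("bookmarks", "v2", "r1", "user7", 99)  # written by the bad version
rollback_and_replay("bookmarks", "v2", "v1")
```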
Persistent data correction
problems
I will not present good solutions to these problems.
1. Replaying user requests may involve requesting features that are
not in the current version.
– Requests can be queued until they can be correctly
re-executed
– User can be informed of error (after the fact)
2. There may be domino effects from incorrect data. i.e. other
calculations may be affected.
– Keep pedigree for data items that allows determining
which additional data items are incorrect. Remove
them and regenerate them when requests replayed.
– Data that escaped the system, e.g. sent to other
system or shown to a user, cannot be retrieved.
Summary of rollback options
• Can roll back or roll forward
• Rolling back without consideration of
persistent data is relatively straightforward.
• Managing erroneous persistent data is
complicated and will likely require manual
processing.
Packaging of services
• The last portion of the deployment pipeline is
packaging services into machine images for
installation.
• Two dimensions
– Flat vs deep service hierarchy
– One service per virtual machine vs many services
per virtual machine
Flat vs Deep Service Hierarchy
• Trading off independence of teams and
possibilities for reuse.
• Flat Service Hierarchy
– Limited dependence among services & limited
coordination needed among teams
– Difficult to reuse services
• Deep Service Hierarchy
– Provides possibility for reusing services
– Requires coordination among teams to discover reuse
possibilities. This can be done during architecture
definition.
Services per VM Image
[Diagram:]
• One service per VM: each service is developed separately and embedded
into its own VM image.
• Multiple services per VM: several services are developed separately and
embedded together into a single VM image.
One Possible Race Condition with
Multiple Services per VM
Initial state: VM image with Version N of Service 1 and Version N of Service 2
• Developer 1 builds a new image with VN+1 | VN and begins the
provisioning process with it.
• Developer 2, working without the new version of Service 1, builds a new
image with VN | VN+1 and begins the provisioning process with it.
• Results in Version N+1 of Service 1 not being updated until the next
build of the VM image.
• Could be prevented by the VM image build tool.
Another Possible Race Condition
with Multiple Services per VM
Initial state: VM image with Version N of Service 1 and Version N of Service 2
• Developer 2 builds a new image with VN+1 | VN+1 and begins the
provisioning process with it.
• Developer 1 builds a new image with VN+1 | VN and begins the
provisioning process with it; this overwrites the image created by
Developer 2.
• Results in Version N+1 of Service 2 not being updated until the next
build of the VM image.
• Could be prevented by the provisioning tool.
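The overwrite can be shown with a tiny simulation, a dict standing in for the shared VM image (service names invented):

```python
# Tiny simulation of the second race: two developers each bake a new VM
# image from the same base state, and the later (stale) push overwrites
# the earlier one, silently dropping one service's upgrade.

image = {"service1": "N", "service2": "N"}  # shared latest VM image

def build_image(base, **updates):
    return {**base, **updates}

base = dict(image)                         # both start from N | N
dev1 = build_image(base, service1="N+1")   # VN+1 | VN
dev2 = build_image(base, service1="N+1", service2="N+1")  # VN+1 | VN+1

image = dev2   # Developer 2 publishes first...
image = dev1   # ...then Developer 1's stale image overwrites it

lost_upgrade = image["service2"] == "N"  # Service 2's N+1 silently dropped
```

A provisioning tool that rejects publishing an image built from a stale base (compare-and-swap on the image version) prevents this.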
Trade offs
• One service per VM
– Message from one service to another must go
through inter VM communication mechanism –
adds latency
– No possibility of race condition
• Multiple Services per VM
– Inter VM communication requirements reduced –
reduces latency
– Adds possibility of race condition caused by
simultaneous deployment
Summary of Deployment
• Rolling upgrade is common deployment strategy
• Introduces requirements for consistency among
– Different versions of the same service
– Different services
– Services and persistent data
• Other deployment considerations include
– Canary deployment
– A/B testing
– Rollback
– Business continuity
Architecting for the Cloud
Case Study
Overview
• What is Netflix
• Migrating to the cloud
– Data
– Build process
– Testing process
• Withstanding Amazon outage of April 21, 2011.
Netflix Corporation
Launched in 1998 after founder was irritated at having to pay late fees on a DVD rental.
DVD Model
• Pay monthly membership fee that includes rentals, shipping and no late fees
• Maintain online queue of desired rentals
• When return last rental (depending on service plan), next item in queue is mailed to
you together with a return envelope.
• Customers rate movies and Netflix recommends based on your preferences
Unusual culture
• Unlimited vacation time for salaried workers
• Workers can take any percentage of their salary in stock options (up to 100%).
Streaming video - 1
Customers can watch Netflix streaming video on a wide variety of devices many of which
feed into a TV
– Roku set top box
– Blu-ray disk players
– Xbox 360
– TV directly
– PlayStation 3
– Wii
– DVRs
– Nintendo
Customers can stop and restart video at will. Netflix calls these locations in the films
“bookmarks”.
Streaming video - 2
In 2007 Netflix began to offer streaming video to its subscribers.
Initially, one hour of streaming video was available to customers for every dollar they spent
on their plan.
In January 2008, every customer became entitled to unlimited streaming video.
In May 2011, Netflix streaming video accounted for 22% of all internet traffic, and 30% of
traffic during peak usage hours.
Three bandwidth tiers
• Continuous bandwidth to the client of 5 Mbit/s – HDTV, surround sound
• Continuous bandwidth to the client of 3Mbit/s – better than DVD
• Continuous bandwidth to the client of 1.5Mbit/s – DVD quality
Overview
• What is Netflix
• Migrating to the cloud
– Data
– Build process
– Testing process
• Withstanding Amazon outage of April 21, 2011.
Netflix’s Growth
In late 2008, Netflix had a single data center with Oracle as the main database system.
With the growth of subscriptions and streaming video, it was clear that they would soon
outgrow the data center.
Not Just More Requests
• Netflix was expanding internationally
• They had unpredictable usage spikes with product releases
– E.g. Wii, and Xbox
– Need infrastructure to handle usage spikes
• Datacenters are expensive
– Netflix was running Oracle on IBM hardware (not commodity hardware)
– Could switch to commodity hardware but would have to have in-house
expertise
• Can’t hire enough system and database administrators to grow fast
enough
Netflix Moves to Cloud
Four reasons cited by Netflix for moving to the cloud
1. Every layer of the software stack needed to scale horizontally, be more reliable,
redundant, and fault tolerant. This leads to reason #2
2. Outsourcing data center infrastructure to Amazon allowed Netflix engineers to focus
on building and improving their business.
3. Netflix is not very good at predicting customer growth or device engagement. They
underestimated their growth rate. The cloud supports rapid scaling.
4. Cloud computing is the future. This will help Netflix with recruiting engineers who are
interested in honing their skills, and will help scale the business. It will also ensure
competition among cloud providers helping to keep costs down.
Why Amazon and EC2? In 2008, Amazon was the leading supplier. Netflix wanted
an IaaS so they could focus on their core competencies.
Netflix applications
What applications does Netflix have?
• Video ratings, reviews, and recommendations
• Video streaming
• User registration, log-in
• Video queues
• Billing
• DVD disc management – inventory and shipping
• Video metadata management – movie cast information
Strategy was to move in a phased manner.
• A subset of applications would move in a particular phase
• Means that data may have to be replicated between cloud and data center during the
move.
Amazon’s S3 and SimpleDB
SimpleDB is Amazon’s non-relational data store
– Optimized for data access speed
– Indexes data for fast retrieval
– Physical drives are less dense to support this
Amazon’s Simple Storage Service (S3)
– Stores raw data
– Optimized on storing larger data sets inexpensively
Choosing Cloud-Based Data Store
SimpleDB and S3 both provide
• Disaster recovery
• Managed fail over and fail back
• Distribution across availability zones
• Persistence
• Eventual consistency
• In addition, SimpleDB (unlike S3) provides a query language
Cloud-Based Data Store II
SimpleDB and S3 do not provide:
• Transactions
• Locks
• Sequences
• Triggers
• Clocks
• Constraints
• Joins
• Schemas and associated integrity checking
• Native data types. All data in SimpleDB is string data
Migrating Data from Oracle to SimpleDB
Whole data sets were moved. E.g. instant watch bookmarks were lifted wholesale into
SimpleDB
Incremental data changes were kept in synch (bi-directional) between Oracle and
SimpleDB. Device support moved from Oracle to the cloud. During that transition,
a user may begin watching a movie on a device supported in the cloud and
continue watching on a device supported in Oracle. Instant watch bookmarks are
kept in synch for these types of possibilities.
Once the need for the Oracle version is removed, the synchronization is
discontinued and the data no longer resides in the Oracle database.
Using SimpleDB and S3
Move normal DB functionality up the stack to the application.
Specifically:
• Do “Group By” and “Join” in application
• De-normalize tables to reduce necessity for joins
• Do without triggers
• Manage concurrency using time stamps and conditional Put and
Delete
• Do without clock operations
• The application checks constraints on read and repairs data as a side
effect
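Managing concurrency with time stamps and conditional put can be sketched with an in-memory analogue. SimpleDB's real conditional put compares an expected attribute value supplied by the caller; the stand-in below is purely illustrative:

```python
# Illustrative in-memory analogue of timestamp + conditional put: a write
# succeeds only if the stored timestamp still matches the one the writer
# read, so concurrent updaters cannot silently overwrite each other.

store = {"queue:user7": {"value": ["Metropolis"], "ts": 1}}

def conditional_put(key, new_value, expected_ts):
    item = store[key]
    if item["ts"] != expected_ts:
        return False  # someone else wrote first; re-read and retry
    store[key] = {"value": new_value, "ts": expected_ts + 1}
    return True

item = store["queue:user7"]
won = conditional_put("queue:user7", item["value"] + ["Sunrise"],
                      item["ts"])
lost = conditional_put("queue:user7", ["stale write"], item["ts"])
```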
Exploiting SimpleDB
All datasets with a significant write load are partitioned across multiple SimpleDB domains
Since all data is stored as strings
• Store all date-times in ISO8601 format
• Zero pad any numeric columns used in sorts or WHERE clause inequalities
Atomic writes requiring both a delete of an attribute and an update of another attribute in the same
item are accomplished through deleting and rewriting the entire item.
SimpleDB is case sensitive to attribute names. Netflix implemented a common data access layer that
normalizes all attribute names (e.g. TO_UPPER)
Misspelled attribute names can fail silently so Netflix has a schema validator in the common data access
layer
Deal with eventual consistency by avoiding reads immediately after writes. If this is not possible, use
ConsistentRead.
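The two string-encoding rules can be sketched directly; since all SimpleDB values are strings, comparisons in queries are lexicographic, and the encodings below make lexicographic order agree with numeric and chronological order (widths and values are illustrative):

```python
# Sketch of the string-encoding rules: zero-padded numbers make string
# inequality match numeric inequality, and ISO 8601 date-times sort
# correctly as plain strings.
from datetime import datetime, timezone

def encode_number(n, width=10):
    return str(n).zfill(width)  # zero-pad so "2" < "10" holds as strings

def encode_datetime(dt):
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")  # ISO 8601, sortable

a, b = encode_number(2), encode_number(10)
t1 = encode_datetime(datetime(2011, 4, 21, 0, 47, tzinfo=timezone.utc))
t2 = encode_datetime(datetime(2011, 4, 21, 12, 4, tzinfo=timezone.utc))
```

Without the padding, `"10" < "2"` as strings, which silently breaks sorts and WHERE-clause inequalities.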
Netflix Build Process - Ivy
Ant is a tool for automating the software build process (like Make).
Ivy is an extension to Ant for managing (recording, tracking, resolving and
reporting) project dependencies.
Netflix Build Process - Artifactory
Artifactory is a repository manager.
• Artifactory acts as a proxy between your build tool (Maven, Ant, Ivy, Gradle etc.) and the
outside world.
• It caches remote artifacts so that you don’t have to download them over and over again.
• It blocks unwanted (and sometimes security-sensitive) external requests for internal
artifacts and controls how and where artifacts are deployed, and by whom.
Netflix Build Process - Jenkins
Jenkins is an open source continuous integration tool.
Jenkins provides a so-called continuous integration system, making it easier for developers to integrate
changes to the project, and making it easier for users to obtain a fresh build.
Jenkins monitors executions of externally-run jobs, such as cron jobs and procmail jobs, even those that are
run on a remote machine. For example, with cron, Jenkins keeps e-mail outputs in a repository.
Migration Lessons
• The change from reliable to commodity hardware impacts many things
– E.g. in the old world memory based session state was fine
• Hardware failures were rare
– Not appropriate on the cloud
• Co-tenancy is hard
– In the cloud all resources are shared e.g.
• Hardware, network, storage, …
– This introduces variable performance depending on load
• At all levels of the stack
– Have to be willing to abandon sub-task or manage resources to avoid co-tenancy
• The best way to avoid failure is to fail consistently
– Netflix has designed their systems to be tolerant to failure
– They test this capability regularly
Unleash the Monkey
• As a result of disruption of services Netflix adopted a process of
testing the system
• They learned that the best way to test the system was at scale
• In order to learn how their systems handle failure they introduced
faults
– Along came the “Chaos Monkey”
• The Chaos Monkey randomly disables production instances
• This is done in a tightly controlled environment with engineers
standing by
– But it is done in the production environment
• The engineers then learn how their systems handle failures and can
improve the solution accordingly
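A toy version of the idea, with stub instance names; the real Chaos Monkey terminates actual production instances under close supervision, with the auto-scaling group replacing them:

```python
# Toy Chaos Monkey: randomly kill one production instance and let the
# (simulated) auto-scaling group replace it. Purely illustrative.
import random

def chaos_monkey(instances, rng):
    victim = rng.choice(instances)
    instances.remove(victim)  # simulate the instance being terminated
    return victim

def auto_scaling_group(instances, desired):
    while len(instances) < desired:  # replace whatever the monkey killed
        instances.append(f"i-replacement-{len(instances)}")

rng = random.Random(0)               # seeded for repeatability
fleet = [f"i-{n}" for n in range(5)]
killed = chaos_monkey(fleet, rng)
auto_scaling_group(fleet, desired=5)
```

The test of interest is not the monkey itself but what the engineers observe: whether client-visible behavior survived the kill.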
Increase the Chaos
• The Chaos Monkey was so successful Netflix created a “Simian Army” to test the system
• The "Army" includes:
Netflix Test Suite - 1
– Chaos monkey. Randomly kill a process and monitor the effect.
– Latency monkey. Randomly introduce latency and monitor the effect.
– Doctor monkey. The Doctor Monkey taps into health checks that run on
each instance as well as monitors other external signs of health (e.g. CPU
load) to detect unhealthy instances.
– Janitor Monkey. The Janitor Monkey ensures that the Netflix cloud
environment is running free of clutter and waste. It searches for unused
resources and disposes of them.
Netflix Test Suite - 2
• Conformity Monkey. The Conformity Monkey finds instances that don’t adhere to best-
practices and shuts them down. For example, if an instance does not belong to an auto-
scaling group, that is a potential problem.
• Security Monkey The Security Monkey is an extension of Conformity Monkey. It finds
security violations or vulnerabilities, such as improperly configured AWS security
groups, and terminates the offending instances. It also ensures that all our SSL and DRM
certificates are valid and are not coming up for renewal.
• 10-18 Monkey The 10-18 Monkey (Localization-Internationalization) detects
configuration and run time problems in instances serving customers in multiple
geographic regions, using different languages and character sets. The name 10-18
comes from L10n and I18n which are the number of characters in the words localization
and internationalization.
• Chaos Gorilla: The Gorilla is similar to the Chaos Monkey but simulates the outage of an
entire Amazon availability zone
Overview
• What is Netflix
• Migrating to the cloud
– Data
– Build process
– Testing process
• Withstanding Amazon outage of April 21, 2011
April 21, 2011 Outage – Background
An EBS cluster is comprised of a set of EBS nodes.
When Node A loses connectivity to a node (Node B) to which it is replicating data,
Node A assumes node B has failed
In this case, it must find another node to which data is replicated. This is called re-
mirroring.
When data is being re-mirrored, all nodes that have copies of that data retain that
data and block all external access to that data. This is called the node being “stuck”
In the normal case, a node is stuck for only a few milliseconds.
April 21, 2011 Outage - Events
At 12:47 AM PDT, a configuration change was made to upgrade the capacity of the primary EBS
network within one availability zone in the Eastern Region.
• Standard practice is to shift traffic off of one of the redundant routers in the primary EBS
network to allow the upgrade to happen
• Traffic was shifted incorrectly to the lower capacity backup network.
Consequently, there was no functioning primary or secondary network in the affected portion of
the availability zone since traffic was shifted off of the primary and the secondary became
quickly overloaded.
When the error was detected and connectivity restored, a large number of nodes that had been
isolated began searching for other nodes to which they could re-mirror their data.
This caused a re-mirroring storm where all of the available space in the affected cluster was used
up and the attempts to gain access to another availability zone overwhelmed the primary
communication network.
This caused the whole region to be unavailable
April 21, 2011 Outage – Consequences
At 12:04 PM PDT, the outage was contained to the one affected availability zone. 13%
of the nodes in the availability zone remained "stuck".
Additional capacity was added to the availability zone to allow those nodes to become
unstuck.
In the meantime, some of the stuck volumes experienced hardware failures. Hardware
failures are a normal occurrence within a cluster.
Ultimately, the data on 0.07% of the nodes in the affected availability zone could not be
recovered and this data was permanently lost.
Netflix and the Amazon Outage of April, 2011
On April 21, 2011, one of the availability zones in Amazon’s Eastern Region failed.
– Many organizations were unavailable for the duration of the outage (24 hours)
– Some organizations permanently lost data
Netflix was largely unaffected
– Customers experienced no disruption of service
– There were some increased latencies
– Higher than normal error rate
– No interruption in customers' ability to find and watch movies
How Did Netflix Accomplish This?
Stateless services
• Services are largely stateless
• If a server fails the request can be re-routed
• Remember the discussion of modular redundancy?
– The lack of state means failover times are negligible
Data stored across availability zones
• In some cases it was not practical to re-architect the system to be stateless
• In these cases there are multiple hot copies of the data across availability zones
• In the event of a failure again the failover time is negligible
– What is the cost of doing this?
Design Decisions II
Graceful degradation: The Netflix systems are designed for failure. A lot of thought went
into what to do when they fail.
• Fail fast – aggressive time outs so that failing components do not slow the whole
system down
• Fallbacks – every feature has the ability to fall back to a lower quality representation.
E.g. if Netflix cannot generate personalized recommendations, they will fall back to non-
personalized results
• Feature removal. If a feature is non-critical and is slow, then it may be removed from
any given page.
N+1 redundancy
• The system is architected with more capacity than it needs at any time
• This allows them to cope with large spikes in load caused by users directly or the ripple
effect of transient failures
What Issues Were Experienced?
Netflix did have some issues, however, including:
• As the availability zone started to fail, Netflix decided to pull out altogether
• This required manual configuration changes to AWS
• Each service team was involved in moving their service out of the zone
• This was a time consuming and error prone process
Load balancers had to be manually adjusted to keep traffic
out of affected availability zone.
• Netflix uses Amazon’s Elastic Load Balancing (ELB)
• This first balances load to availability zones and then instances
• If you have many instances go down the others in that zone have to pick up the slack
• If you can’t bring up more nodes you will experience a cascading effect until all nodes go down
Amazon Outage of Aug 8, 2011
On Aug 8, 2011, the Eastern Region of AWS went down for about
half an hour due to connectivity issues that affected all
availability zones …
Netflix was down as well…
Summary
Netflix moved to the cloud because
• They doubted their ability to correctly predict the load
• They did not want to invest in data centers directly
Netflix chose SimpleDB as their original target database system and their migration
strategy involved
• Running Oracle and SimpleDB concurrently for a period
• Moving some features provided by Oracle up the application stack
Netflix has a number of test programs that inject faults or look for specific types of
violations.
Netflix re-architected their build process to support sophisticated deployment practices
and to allow for continuous integration.
Netflix did a number of things to promote availability and these enabled Netflix to
continue customer service through an extensive AWS outage.
Reference
http://techblog.netflix.com/
Questions??
Architecting for the Cloud
Future Trends
Topics
• Containers
• Augmented reality
• Cloudlets
• Internet of things – cars, appliances, etc.
Containers
• Loading a full VM at deployment time
– Takes time for large machine images
– Results in many slightly different VM images for
different versions
– Depends on a particular processor even if virtualized
• Containers are a proposed solution
– Machine image consists of OS + libraries
– Application portion runs inside this machine image
and is loaded once the image is instantiated
What is a Container
Containers vs VMs
Virtues of Containers
Adoption of Containers
• Containers are an old concept
• Docker is a software system that packages the
creation of containers
• Cloud providers are adopting containers since
they control the entire stack and can choose
OSs.
Topics
• Containers
• Augmented reality
• Cloudlets
• Internet of things – cars, appliances, etc.
Augmented Reality
• Superimpose computer generated images
over visual images
Simple text has been available
for a long time
• CMU wearable computer group early 1990s
• Google Glass modern display device
Computer power needed for more
sophisticated images
• Route planning – showing what is ahead if you
take a particular route
• Visualizing furniture in a room
• These applications require
– Location determination
– Orientation determination
• This computer power is not available on
mobile devices.
• This is a lead in to “cloudlets”
Cloudlets
• Mobile devices do not have the CPU power
necessary for some applications.
• Consequently, they act as data sources for
apps in the cloud.
– This introduces latency in responding to a user
request
– This latency does not matter for some purposes –
e.g. airline reservations, music streaming
– But – it does matter for others, e.g. voice
recognition, translation, augmented reality.
How does one match computation
power and data with low latency?
• Move computation power closer to the data.
• But, but, but
– Data is inherently mobile (with some latency)
– Computation is tied to a server. Real computation
power is not so mobile.
– I am in an automobile and my mobile is constantly
moving and getting further away from any fixed
computation source
Anyway, won’t my mobile have
sufficient computational power?
• Mobiles will always inherently be less powerful
than static client and server hardware
• Mobiles have to trade off
– size,
– weight,
– battery life,
– storage
with computational power
• Computational power is always going to be of
lower priority than these other qualities.
Cloudlet infrastructure
• A cloudlet infrastructure is
– A collection of servers with more computational power
than my mobile
– Connected to the internet
– Connected to a power source
– One data hop away from my mobile
• Where would these servers live?
– Doctor’s offices
– Coffee shops
– Think wireless access points
• A new client would instantiate a VM with the desired
app in a local cloudlet server.
How would it work?
• Three models
1. Download application VM from cloud into cloudlet
server on demand.
• Time consuming
• Allows for bare cloudlet server
• Allows arbitrary applications
2. Preload most popular application VMs
• Fast
• Limited in terms of number of applications available
3. Use containers to preload the most popular platforms and
download the remaining software of an app.
• Intermediate in time required to initiate
• Intermediate in flexibility of apps.
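The time trade-off among the three models can be sketched with assumed numbers (image sizes and link speed below are illustrative, not measured values):

```python
# Illustrative comparison of cloudlet provisioning models (all figures are assumptions).
# Model 1: download full application VM on demand; Model 2: preloaded VM;
# Model 3: container app layer downloaded onto a preloaded platform.
VM_IMAGE_MB = 8000        # full application VM image
CONTAINER_LAYER_MB = 200  # application layer only
LINK_MBPS = 100           # cloudlet's downlink from the cloud

def startup_seconds(download_mb, link_mbps=LINK_MBPS):
    """Download time: megabytes -> megabits, divided by the link rate."""
    return download_mb * 8 / link_mbps

on_demand = startup_seconds(VM_IMAGE_MB)         # model 1: slow but fully flexible
preloaded = 0.0                                  # model 2: instant, limited catalogue
container = startup_seconds(CONTAINER_LAYER_MB)  # model 3: intermediate

assert preloaded < container < on_demand  # matches the ordering on the slide
```

Under these assumptions the full-VM download takes minutes while the container layer takes seconds, which is the intermediate position the slide describes.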
Revisit AR application
• Consider what is
involved in showing what
is ahead based on
head location.
– Location
– Orientation
• Cloudlets have
potential to supply
necessary
infrastructure.
Summary of cloudlets
• Locally available servers with higher
computational power than mobiles
• Enables real time application of some
technologies such as augmented reality or real
time translation
• Based on creating VM in local server with
necessary app
• Feasible with current technology
Topics
• Containers
• Augmented reality
• Cloudlets
• Internet of things – cars, appliances, etc.
Internet of Things
• “Everything” is connected to the internet
– Toasters
– Refrigerators
– Automobiles
– …
• Every device has both data generation and
control aspects.
• From a cloud perspective, data generation is
important.
Data available from a device
• Current state
• Current activity
• Current location
• Current power usage
• Information about the environment of the
device
• …
How frequently is the data sampled?
• Suppose each device is sampled once a
minute
• Suppose each sample contains 1MB
• The number of devices being sampled may be
in the 100,000s or 1,000,000s (think
automobiles, or engines, or tires, or …)
• Petabytes of data streaming into a data center.
• “Big Data” is the buzz term used for this level
of data.
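The back-of-the-envelope arithmetic behind the "petabytes" claim, using the figures on the slide:

```python
# Data-rate estimate using the slide's figures: one million devices,
# 1 MB per sample, one sample per device per minute.
devices = 1_000_000
sample_mb = 1
samples_per_day = 24 * 60  # one sample per minute

mb_per_day = devices * sample_mb * samples_per_day
pb_per_day = mb_per_day / 1_000_000_000  # 1 PB = 10^9 MB (decimal units)

# About 1.44 petabytes streaming into the data center every day.
assert abs(pb_per_day - 1.44) < 1e-9
```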
Leads to streaming DBMSs
• SQLstream
• STREAM
• Aurora, StreamBase Systems, Inc.
• TelegraphCQ
• NiagaraCQ
• QStream
• PIPES, webMethods Business Events
• StreamGlobe
• Odysseus
• StreamInsight
• InfoSphere Streams
• Kinesis
Requirements for streaming DBMSs
Database management system (DBMS)           | Data stream management system (DSMS)
Persistent data (relations)                 | Volatile data streams
Random access                               | Sequential access
One-time queries                            | Continuous queries
(Theoretically) unlimited secondary storage | Limited main memory
Only the current state is relevant          | Consideration of the order of the input
Relatively low update rate                  | Potentially extremely high update rate
Little or no time requirements              | Real-time requirements
Assumes exact data                          | Assumes outdated/inaccurate data
Plannable query processing                  | Variable data arrival and data characteristics
Managing the data
• Synopses
– Select sample of raw data to support a particular analysis.
– Moving average of data
– May be inaccurate when performing analysis
• Windows
– Only look at a portion of the data
– Last 10 elements
– Last 10 seconds
• Choice of approach for managing the data is driven by
the types of analyses that the data is used for.
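The "last 10 elements" window can be sketched in a few lines: a bounded deque keeps only the most recent readings, so memory stays constant no matter how long the stream runs.

```python
from collections import deque

# Sketch of an element-count window over a stream: the deque's maxlen
# discards the oldest reading automatically as each new one arrives.
WINDOW = 10
window = deque(maxlen=WINDOW)

def observe(value):
    """Ingest one stream element and return the moving average over the window."""
    window.append(value)
    return sum(window) / len(window)

for v in range(1, 21):  # stream the values 1..20
    avg = observe(v)

# After 20 elements, only 11..20 remain in the window.
assert list(window) == list(range(11, 21))
assert avg == 15.5
```

A time-based window ("last 10 seconds") works the same way, except expiry is driven by timestamps rather than element count.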
Demand is for ever faster
processing
• Businesses want to use the data for real time
reaction
• E.g., Amazon has “customers who viewed this
item also viewed.” Suppose they use this
information and your browsing history to offer
you a deal on two items rather than have you
decide individually to buy the two items.
• Just one example out of hundreds.
Issues
• Managing the volume of data
• Security – how do you know data is correctly
identified?
• Privacy – what kind of information sharing will
consumers tolerate?
Summary for Internet of Things
• If every device is connected to the internet,
the data generated will be massive
• Specialized data management systems
• Specialized analysis systems.
• Problems include security and privacy
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 

Recently uploaded (20)

ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 

Architecture for the cloud deployment case study future

1. Architecting for the Cloud
Len and Matt Bass
Deployment

2. Deployment pipeline
• Developers commit code
• Code is compiled
• The binary is processed by a build and unit test tool which builds the service
• Integration tests are run, followed by performance tests
• The result is a machine image (assuming virtualization)
• The service (its image) is deployed to production
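The pipeline stages above can be sketched as a chain of steps over a build artifact. This is a minimal illustration only; the stage names and the artifact dictionary are assumptions, not part of the deck.

```python
# Minimal sketch of the deployment pipeline as composable stages.
# Stage names and artifact fields are illustrative.

def compile_source(artifact):
    artifact["binary"] = "compiled:" + artifact["source"]
    return artifact

def build_and_unit_test(artifact):
    artifact["unit_tests"] = "passed"            # build tool runs unit tests
    return artifact

def integration_and_performance(artifact):
    artifact["integration_tests"] = "passed"     # integration, then performance
    artifact["performance_tests"] = "passed"
    return artifact

def bake_machine_image(artifact):
    artifact["image"] = "image-of-" + artifact["source"]  # assuming virtualization
    return artifact

def deploy(artifact):
    artifact["deployed"] = True                  # image goes to production
    return artifact

def run_pipeline(committed_source):
    artifact = {"source": committed_source}      # a developer commit enters here
    for stage in (compile_source, build_and_unit_test,
                  integration_and_performance, bake_machine_image, deploy):
        artifact = stage(artifact)
    return artifact
```

Each stage only runs if the previous one completed, mirroring the gate-by-gate flow of the pipeline.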
3. Deployment Overview
Multiple instances of a service are executing
• Red is the service being replaced with a new version
• Blue are clients
• Green are dependent services
(Diagram: version A and version B instances behind UAT / staging / performance tests)

4. Additional considerations
• Recall the release plan.
• Each release requires coordination among stakeholders and development teams.
– Can be explicitly performed
– Can be implicit in the architecture (we saw an example of this)
• You as an architect need to know how frequently new releases will be deployed.
– Some organizations deploy dozens of times a day
– Other organizations deploy once a week
– Still others once a month or longer

5. Monthly deployment
• There is time for coordination among development teams to ensure that components are consistent
• There is time for a “release manager” to ensure that a release is correct before moving it to production

6. Weekly deployment
• Limited time for coordination among development teams
• Time for a “release manager” to ensure that a release is correct before moving it to production

7. Daily deployment
• No time for
– Coordination among development teams
– A release manager to validate the release
• These items must be implicit in the architecture.

8. Deployment goal and constraints
• The goal of a deployment is to move from the current state (N instances of version A of a service) to a new state (N instances of version B of that service)
• Constraints associated with Continuous Deployment:
– Any development team can deploy their service at any time, i.e. a new version of a service can be deployed either before or after a new version of a client (no synchronization among development teams)
– It takes time to replace one instance of version A with an instance of version B (on the order of minutes)
– Service to clients must be maintained while the new version is being deployed

9. Deployment strategies
• Two basic all-or-nothing strategies
– Red/Black – leave the N instances with version A as they are, allocate and provision N instances with version B, then switch to version B and release the instances with version A
– Rolling Upgrade – allocate one instance, provision it with version B, release one version A instance; repeat N times
• Other deployment topics
– Partial strategies (canary testing, A/B testing). We will discuss them later; for now we are discussing all-or-nothing deployment.
– Rollback
– Packaging services into machine images
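The two all-or-nothing strategies can be contrasted with a hedged sketch in which instances are just version strings and the real allocation/provisioning calls are elided.

```python
# Sketch of the two all-or-nothing strategies; instances are represented
# as version strings, and cloud provisioning calls are elided.

def red_black(fleet, new_version):
    """Provision N new instances, then switch all traffic at once."""
    new_fleet = [new_version] * len(fleet)  # 2N instances exist briefly
    # traffic switches to new_fleet; the old fleet is then released
    return new_fleet

def rolling_upgrade(fleet, new_version):
    """Replace one instance at a time; at most N+1 instances exist."""
    fleet = list(fleet)
    for i in range(len(fleet)):
        # allocate one instance, provision it with the new version,
        # then release one old-version instance
        fleet[i] = new_version
        # note: both versions serve traffic during this loop
    return fleet
```

The comment inside the loop is the source of the consistency problems discussed on the following slides: during a rolling upgrade, versions A and B coexist.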
10. Trade-offs – Red/Black and Rolling Upgrade
• Red/Black
– Only one version is available to the client at any particular time
– Requires 2N instances (additional cost)
– Accomplished by moving environments
• Rolling Upgrade
– Multiple versions are available for service at the same time
– Requires N+1 instances
• Both are heavily used. The choice depends on
– Cost
– Managing the complications of using rolling upgrade
(Diagram – Rolling Upgrade in EC2: Update Auto Scaling Group; Sort Instances; Remove & Deregister Old Instance from ELB; Confirm Upgrade Spec; Terminate Old Instance; Wait for ASG to Start New Instance; Register New Instance with ELB)

11. Types of failures during rolling upgrade
• Provisioning failure – research topic
• Logical failure – inconsistencies to be discussed
• Instance failure – handled by the Auto Scaling Group in EC2

12. What are the problems with Rolling Upgrade?
• Recall that any development team can deploy their service at any time.
• Three concerns
– Maintaining consistency between different versions of the same service when performing a rolling upgrade
– Maintaining consistency among different services
– Maintaining consistency between a service and persistent data

13. Maintaining consistency between different versions of the same service
• Key idea – differentiate between installing a new version and activating a new version
• Involves “feature toggles” (described momentarily)
• Sequence
– Develop version B with the new code under control of a feature toggle
– Install each instance of version B with the new code toggled off
– When all of the instances of version A have been replaced with instances of version B, activate the new code by toggling the feature

14. Issues
• Do you remember feature toggles?
• How do I manage features that extend across multiple services?
• How do I activate all relevant instances at once?

15. Feature toggle
• Place feature-dependent new code inside an “if” statement where the code is executed if an external variable is true. Removed code becomes the “else” portion.
• Used to allow developers to check in uncompleted code; uncompleted code is toggled off.
• During deployment, until the new code is activated, it will not be executed.
• Removing feature toggles once a new feature has been committed is important.
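A feature toggle in exactly this spirit: the new code sits inside an "if" on an external variable, with the old code as the "else" branch. The toggle store and the pricing rule here are illustrative.

```python
# Feature toggle: new code behind an "if" on an external variable.
# The toggle store and the discount rule are illustrative.

toggles = {"new_pricing": False}   # external variable; off until activated

def price(base):
    if toggles["new_pricing"]:
        return base * 0.9          # new, feature-dependent code
    else:
        return base                # old code; removed once the toggle is retired
```

Flipping `toggles["new_pricing"]` to `True` activates the new code path without redeploying anything, which is what makes install and activate separable steps.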
16. Multi-service features
• Most features will involve multiple services.
• Each service has some code under control of a feature toggle.
• Activate the feature when all instances of all services involved in the feature have been installed.
– Maintain a catalog of feature vs. service version number.
– A feature toggle manager determines when all old instances of each version have been replaced. This could be done using the registry/load balancer.
– The feature toggle manager activates the feature.
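The catalog idea above can be sketched as follows: each feature lists the minimum service versions it needs, and the toggle manager activates the feature only when the registry shows no older instance remains. The feature name, service names, and registry shape are assumptions for illustration.

```python
# Sketch of a feature catalog plus readiness check for a toggle manager.
# Feature/service names and the registry format are illustrative.

feature_catalog = {"instant_watch": {"ui": 2, "billing": 3}}

def ready_to_activate(feature, registry):
    """registry maps a service name to the versions currently running."""
    return all(
        min(registry[service]) >= version   # every old instance replaced
        for service, version in feature_catalog[feature].items()
    )
```

In a real system the registry data would come from the registry/load balancer, as the slide suggests.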
17. Activating a feature
• The feature toggle manager changes the value of the feature toggle. There are two possible techniques for getting the new value to instances.
– Push. Broadcasting the new value instructs each instance to use the new code. If a lag of several seconds between the first service to be toggled and the last can be tolerated, there is no problem; otherwise the value must be synchronized across the network.
– Pull. Having each instance query the manager for the latest value may cause performance problems.
• A coordination mechanism such as ZooKeeper overcomes both problems.

18. Maintaining consistency across versions (summary)
• Install all instances before activating any new code
• Use feature toggles to activate new code
• Use a feature toggle manager to determine when to activate new code
• Use ZooKeeper to coordinate activation with low overhead

19. Maintaining consistency among different services
• Use case:
– We wish to deploy a new version of service A without coordinating with the development teams for clients of service A.
• I.e. the new version of service A should be backward compatible in terms of its interfaces.
• It may also require forward compatibility in certain circumstances, e.g. rollback.

20. Achieving Backwards Compatibility
• APIs can be extended but must always be backward compatible.
• This leads to a translation layer:
– Clients call external APIs (unchanging, but with the ability to extend or add new ones)
– External calls are translated to internal APIs
– Internal APIs (changes require changes to the translation layer but do not propagate further)
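The translation-layer pattern can be sketched in a few lines: the external API is frozen, and only the translation function changes when the internal API does. Both function names and parameters below are illustrative, not from the deck.

```python
# Sketch of the translation-layer pattern: a frozen external API
# delegating to a changed internal API. All names are illustrative.

def internal_get_account_v2(account_id, include_history):
    # internal API changed in v2: a new required parameter was added
    return {"id": account_id, "history": [] if include_history else None}

def get_account(account_id):
    """External API seen by clients: its signature never changes."""
    # the translation layer absorbs the internal change
    return internal_get_account_v2(account_id, include_history=False)
```

Clients keep calling `get_account` unchanged; an internal rename or new required parameter is absorbed entirely inside the adapter.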
21. What about dependent services?
• Dependent services that are within your control should maintain backward compatibility.
• Dependent services not within your control (third-party software) cannot be forced to maintain backward compatibility.
– Minimize the impact of changes by localizing interactions with third-party software within a single module.
– Keeping services independent and packaging as much as possible into a virtual machine means that only third-party software accessed through message passing will cause problems.

22. Forward Compatibility
• Gracefully handle unknown calls and database schema information.
– Suppose your service receives a method call it does not recognize. It could be intended for a later version where this method is supported.
– Suppose your service retrieves a database table with an unknown field. It could have been added to support a later version.
• Forward compatibility allows a version of a service to be upgraded or rolled back independently from its clients. It involves both
– The service handling unrecognized information
– The client handling returns that indicate unrecognized information
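Both halves of forward compatibility can be sketched together: the service answers an unrecognized method with an explicit indication rather than failing, and the client handles that return. The method names and reply shape are assumptions for illustration.

```python
# Sketch of forward compatibility: graceful handling of unknown calls
# on the service side and of "unrecognized" replies on the client side.

handlers = {"get_status": lambda req: {"ok": True, "status": "running"}}

def service_dispatch(request):
    handler = handlers.get(request["method"])
    if handler is None:
        # possibly a method supported only by a later version: do not crash
        return {"ok": False, "error": "unrecognized_method"}
    return handler(request)

def client_call(request):
    reply = service_dispatch(request)
    if not reply["ok"] and reply.get("error") == "unrecognized_method":
        return "fallback"   # the client degrades gracefully
    return reply
```

Because neither side crashes on the unknown method, the service can be rolled back (or the client upgraded early) without breaking the other.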
23. Maintaining consistency between a service and persistent data
• Assume the new version is correct – we will discuss the situation where it is incorrect in a moment.
• Inconsistency in persistent data can come about because the data schema or its semantics change.
• The effect can be minimized by the following practices (if possible):
– Only extend the schema – do not change the semantics of existing fields. This preserves backward compatibility.
– Treat schema modifications as features to be toggled. This maintains consistency among the various services that access the data.

24. I really must change the schema
• In this case, apply the pattern for backward compatibility of interfaces to schemas.
• Use features of the database system (assuming a relational DBMS) to restructure data while maintaining access to not-yet-restructured data.
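The "only extend the schema" practice can be demonstrated with sqlite3 standing in for a relational DBMS: a new column is added with a default, so readers of the old columns keep working while new readers see the extension. The table and column names are illustrative.

```python
# Schema extension sketch: add a column instead of renaming or
# repurposing one, so old and new readers both keep working.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookmarks (user_id TEXT, position INTEGER)")
conn.execute("INSERT INTO bookmarks VALUES ('u1', 120)")

# extend rather than change: existing rows pick up the default value
conn.execute("ALTER TABLE bookmarks ADD COLUMN device TEXT DEFAULT 'unknown'")

old_reader = conn.execute("SELECT user_id, position FROM bookmarks").fetchone()
new_reader = conn.execute("SELECT user_id, device FROM bookmarks").fetchone()
```

A version-A service querying only `user_id` and `position` is undisturbed by the extension, which is the backward-compatibility property the slide asks for.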
25. Summary of the consistency discussion so far
• Feature toggles are used to maintain consistency among instances of a service.
• The backward compatibility pattern is used to maintain consistency between a service and its clients.
• Discouraging modification of the schema maintains consistency between services and persistent data.
– If the schema must be modified, synchronize the modifications with feature toggles.

26. Canary testing
• Canaries are a small number of instances of a new version placed in production in order to perform live testing in a production environment.
• Canaries are observed closely to determine whether the new version introduces any logical or performance problems. If not, roll out the new version globally; if so, roll back the canaries.
• Named after canaries in coal mines.

27. Implementation of canaries
• Designate a collection of instances as canaries. They do not need to be aware of their designation.
• Designate a collection of customers as testing the canaries. This can be, for example
– Organizationally based
– Geographically based
• Then
– Activate the feature or version to be tested for the canaries. This can be done through the feature activation synchronization mechanism.
– Route messages from canary customers to the canaries. This can be done by making the registry/load balancer canary-aware.
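The routing half of the implementation can be sketched as a canary-aware registry/load balancer: requests from designated canary customers go to the canary pool, everyone else to the stable pool. The pools and customer designations below are illustrative.

```python
# Sketch of a canary-aware router. Pools and the canary-customer
# designation (here organizationally based) are illustrative.

canary_customers = {"org-beta"}               # e.g. organizationally based
canary_pool = ["10.0.0.9"]
stable_pool = ["10.0.0.1", "10.0.0.2"]

def route(customer_id):
    pool = canary_pool if customer_id in canary_customers else stable_pool
    # deterministic spread across the chosen pool
    return pool[sum(map(ord, customer_id)) % len(pool)]
```

The same routing mechanism serves A/B testing, which is why the next slide notes that its implementation is identical.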
28. A/B testing
• Suppose you wish to test user response to a system variant, e.g. a UI difference or a marketing effort. A is one variant and B is the other.
• You simultaneously make both variants available to different audiences and compare the responses.
• Implementation is the same as canary testing.

29. Rollback
• New versions of a service may be unacceptable for either logical or performance reasons.
• Two options in this case:
– Roll back (undo the deployment)
– Roll forward (discontinue the current deployment and create a new release without the problem)
• The decision to roll back or roll forward is almost never automated because there are multiple factors to consider:
– Forward or backward recovery
– Consequences and severity of the problem
– Importance of the upgrade

30. States of an upgrade
• An upgrade can be in one of two states when an error is detected:
– Installed (fully or partially) but new features not activated
– Installed and new features activated

31. Possibilities
• For this slide, assume persistent data is correct.
• Installed but new features not activated
– The error must be in backward compatibility
– Halt the deployment
– Roll back by reinstalling the old version
– Roll forward by creating a new version and installing that
• Installed with new features activated
– Turn off the new features
– If that is insufficient, we are at the prior case

32. Persistent data may be incorrect
• Keep a log of user requests (each with its own identification)
• Identification of incorrect persistent data
– Tag each data item with metadata that records the service and version that wrote the data
– and the user request that caused the data to be written
• Correction of incorrect persistent data (simplistic version)
– Remove data written by the incorrect version of a service
– Install the correct version
– Replay the user requests that caused the incorrect data to be written
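The simplistic correction scheme above can be sketched end to end: every write is tagged with the service version and the originating request id; data written by the bad version is removed and its requests replayed against the fixed version. The "bug" in the bad version (doubling the value) is purely illustrative.

```python
# Sketch of tag-remove-replay correction of persistent data.
# The version names and the doubling "bug" are illustrative.

request_log = []   # log of user requests, each with its own identification
store = {}         # key -> (value, writer_version, request_id)

def service_write(value, version):
    return value * 2 if version == "B-bad" else value   # hypothetical bug

def handle(request_id, key, value, version, log=True):
    if log:
        request_log.append({"id": request_id, "key": key, "value": value})
    store[key] = (service_write(value, version), version, request_id)

def remove_and_replay(bad_version, good_version):
    bad_keys = [k for k, (_, v, _) in store.items() if v == bad_version]
    bad_ids = {store[k][2] for k in bad_keys}
    for k in bad_keys:
        del store[k]                       # remove the incorrect data
    for req in list(request_log):          # replay the offending requests
        if req["id"] in bad_ids:
            handle(req["id"], req["key"], req["value"], good_version, log=False)
```

The next slide's caveats apply directly to this sketch: replay assumes the fixed version still supports the replayed requests, and it cannot recover data that has already escaped the system.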
33. Persistent data correction problems
I will not present good solutions to these problems.
1. Replaying user requests may involve requesting features that are not in the current version.
– Requests can be queued until they can be correctly re-executed
– The user can be informed of the error (after the fact)
2. There may be domino effects from incorrect data, i.e. other calculations may be affected.
– Keep a pedigree for data items that allows determining which additional data items are incorrect; remove them and regenerate them when the requests are replayed
– Data that escaped the system, e.g. sent to another system or shown to a user, cannot be retrieved

34. Summary of rollback options
• You can roll back or roll forward.
• Rolling back without consideration of persistent data is relatively straightforward.
• Managing erroneous persistent data is complicated and will likely require manual processing.

35. Packaging of services
• The last portion of the deployment pipeline is packaging services into machine images for installation.
• Two dimensions
– Flat vs deep service hierarchy
– One service per virtual machine vs many services per virtual machine

36. Flat vs Deep Service Hierarchy
• Trades off independence of teams against possibilities for reuse.
• Flat service hierarchy
– Limited dependence among services and limited coordination needed among teams
– Difficult to reuse services
• Deep service hierarchy
– Provides the possibility of reusing services
– Requires coordination among teams to discover reuse possibilities; this can be done during architecture definition

37. Services per VM Image
(Diagram: with one service per VM, a single service is developed and embedded in its own VM image; with multiple services per VM, several services are each developed and embedded in the same VM image)

38. One Possible Race Condition with Multiple Services per VM
• Initial state: VM image with version N of Service 1 and version N of Service 2
• Developer 1 builds a new image with VN+1|VN and begins the provisioning process with the new image
• Developer 2 builds a new image with VN|VN+1 and begins the provisioning process with an image lacking the new version of Service 1
• Result: version N+1 of Service 1 is not updated until the next build of the VM image
• Could be prevented by the VM image build tool

39. Another Possible Race Condition with Multiple Services per VM
• Initial state: VM image with version N of Service 1 and version N of Service 2
• Developer 2 builds a new image with VN+1|VN+1 and begins the provisioning process with the new image
• Developer 1 builds a new image with VN+1|VN and begins the provisioning process; this overwrites the image created by Developer 2
• Result: version N+1 of Service 2 is not updated until the next build of the VM image
• Could be prevented by the provisioning tool

40. Trade-offs
• One service per VM
– A message from one service to another must go through an inter-VM communication mechanism – adds latency
– No possibility of a race condition
• Multiple services per VM
– Inter-VM communication requirements are reduced – reduces latency
– Adds the possibility of a race condition caused by simultaneous deployment

41. Summary of Deployment
• Rolling upgrade is a common deployment strategy.
• It introduces requirements for consistency among
– Different versions of the same service
– Different services
– Services and persistent data
• Other deployment considerations include
– Canary deployment
– A/B testing
– Rollback
– Business continuity

42. Architecting for the Cloud
Case Study

43. Overview
• What is Netflix
• Migrating to the cloud
– Data
– Build process
– Testing process
• Withstanding the Amazon outage of April 21, 2011

44. Overview
• What is Netflix
• Migrating to the cloud
– Data
– Build process
– Testing process
• Withstanding the Amazon outage of April 21, 2011

45. Netflix Corporation
Launched in 1998 after the founder was irritated at having to pay late fees on a DVD rental.
DVD model
• Pay a monthly membership fee that includes rentals and shipping, with no late fees
• Maintain an online queue of desired rentals
• When you return your last rental (depending on the service plan), the next item in your queue is mailed to you together with a return envelope
• Customers rate movies and Netflix recommends based on their preferences
Unusual culture
• Unlimited vacation time for salaried workers
• Workers can take any percentage of their salary in stock options (up to 100%)

46. Streaming video - 1
Customers can watch Netflix streaming video on a wide variety of devices, many of which feed into a TV
– Roku set top box
– Blu-ray disc players
– Xbox 360
– TV directly
– PlayStation 3
– Wii
– DVRs
– Nintendo
Customers can stop and restart video at will. Netflix calls these locations in the films “bookmarks”.

47. Streaming video - 2
• In 2007 Netflix began to offer streaming video to its subscribers.
• Initially, one hour of streaming video was available to customers for every dollar they spent on their plan.
• In January 2008, every customer became entitled to unlimited streaming video.
• In May 2011, Netflix streaming video accounted for 22% of all internet traffic, and 30% of traffic during peak usage hours.
• Three bandwidth tiers
– Continuous bandwidth to the client of 5 Mbit/s – HDTV, surround sound
– Continuous bandwidth to the client of 3 Mbit/s – better than DVD
– Continuous bandwidth to the client of 1.5 Mbit/s – DVD quality

48. Overview
• What is Netflix
• Migrating to the cloud
– Data
– Build process
– Testing process
• Withstanding the Amazon outage of April 21, 2011

49. Netflix’s Growth
• In late 2008, Netflix had a single data center with Oracle as the main database system.
• With the growth of subscriptions and streaming video, it was clear that they would soon outgrow the data center.

50. Not Just More Requests
• Netflix was expanding internationally.
• They had unpredictable usage spikes with product releases
– E.g. Wii and Xbox
– Need infrastructure to handle usage spikes
• Data centers are expensive
– Netflix was running Oracle on IBM hardware (not commodity hardware)
– They could switch to commodity hardware but would need in-house expertise
• They could not hire enough system and database administrators to grow fast enough.

51. Netflix Moves to the Cloud
Four reasons cited by Netflix for moving to the cloud:
1. Every layer of the software stack needed to scale horizontally and be more reliable, redundant, and fault tolerant. This leads to reason #2.
2. Outsourcing data center infrastructure to Amazon allowed Netflix engineers to focus on building and improving their business.
3. Netflix is not very good at predicting customer growth or device engagement; they underestimated their growth rate. The cloud supports rapid scaling.
4. Cloud computing is the future. This will help Netflix recruit engineers who are interested in honing their skills, and will help scale the business. It will also ensure competition among cloud providers, helping to keep costs down.
Why Amazon and EC2? In 2008, Amazon was the leading supplier, and Netflix wanted an IaaS so they could focus on their core competencies.

52. Netflix applications
What applications does Netflix have?
• Video ratings, reviews, and recommendations
• Video streaming
• User registration and log-in
• Video queues
• Billing
• DVD disc management – inventory and shipping
• Video metadata management – movie cast information
The strategy was to move in a phased manner.
• A subset of applications would move in a particular phase.
• This means that data may have to be replicated between the cloud and the data center during the move.

53. Amazon’s S3 and SimpleDB
SimpleDB is Amazon’s non-relational data store
– Optimized for data access speed
– Indexes data for fast retrieval
– Physical drives are less dense to support this
Amazon’s Simple Storage Service (S3)
– Stores raw data
– Optimized for storing larger data sets inexpensively

54. Choosing a Cloud-Based Data Store
SimpleDB and S3 both provide
• Disaster recovery
• Managed fail over and fail back
• Distribution across availability zones
• Persistence
• Eventual consistency
SimpleDB also has a query language.

55. Cloud-Based Data Store II
SimpleDB and S3 do not provide:
• Transactions
• Locks
• Sequences
• Triggers
• Clocks
• Constraints
• Joins
• Schemas and associated integrity checking
• Native data types – all data in SimpleDB is string data

56. Migrating Data from Oracle to SimpleDB
• Whole data sets were moved; e.g. instant watch bookmarks were lifted wholesale into SimpleDB.
• Incremental data changes were kept in sync (bi-directionally) between Oracle and SimpleDB.
• Device support moved from Oracle to the cloud. During that transition, a user might begin watching a movie on a device supported in the cloud and continue watching on a device supported in Oracle. Instant watch bookmarks were kept in sync for these kinds of possibilities.
• Once the need for the Oracle version was removed, the synchronization was discontinued and the data no longer resided in the Oracle database.
57. Using SimpleDB and S3
Move normal DB functionality up the stack to the application. Specifically:
• Do “Group By” and “Join” in the application
• De-normalize tables to reduce the need for joins
• Do without triggers
• Manage concurrency using time stamps and conditional Put and Delete
• Do without clock operations
• Have the application check constraints on read and repair data as a side effect
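The first two bullets above can be sketched directly: a join and a "group by" performed in application code, since SimpleDB provides neither. The rows below are illustrative data.

```python
# Sketch of moving DB functionality up the stack: application-level
# join and group-by over rows fetched from a store with neither.
from collections import defaultdict

users = [{"id": "u1", "plan": "basic"}, {"id": "u2", "plan": "premium"}]
views = [{"user": "u1", "title": "A"},
         {"user": "u1", "title": "B"},
         {"user": "u2", "title": "A"}]

# application-level join: attach each user's plan to their views
plan_by_user = {u["id"]: u["plan"] for u in users}
joined = [{**v, "plan": plan_by_user[v["user"]]} for v in views]

# application-level group-by: count views per plan
views_per_plan = defaultdict(int)
for row in joined:
    views_per_plan[row["plan"]] += 1
```

De-normalizing the tables (storing the plan on each view row at write time) would remove the join step entirely, which is the point of the second bullet.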
58. Exploiting SimpleDB
• All data sets with a significant write load are partitioned across multiple SimpleDB domains.
• Since all data is stored as strings:
– Store all date-times in ISO 8601 format
– Zero-pad any numeric columns used in sorts or WHERE-clause inequalities
• Atomic writes requiring both a delete of one attribute and an update of another attribute in the same item are accomplished by deleting and rewriting the entire item.
• SimpleDB is case sensitive to attribute names, so Netflix implemented a common data access layer that normalizes all attribute names (e.g. TO_UPPER).
• Misspelled attribute names can fail silently, so Netflix has a schema validator in the common data access layer.
• Deal with eventual consistency by avoiding reads immediately after writes. If this is not possible, use ConsistentRead.
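Several of these string-encoding practices can be sketched as a small common data access layer: ISO 8601 date-times, zero-padded numerics (so lexicographic order matches numeric order), and attribute names normalized to upper case. The pad width is an assumption.

```python
# Sketch of a common data access layer for a string-only store:
# ISO 8601 dates, zero-padded numerics, upper-cased attribute names.
from datetime import datetime

PAD_WIDTH = 10   # assumed fixed width for sortable numeric strings

def encode_item(attrs):
    out = {}
    for name, value in attrs.items():
        if isinstance(value, datetime):
            value = value.isoformat()            # ISO 8601 representation
        elif isinstance(value, int):
            value = str(value).zfill(PAD_WIDTH)  # string order == numeric order
        out[name.upper()] = value                # normalize attribute names
    return out
```

Without the zero-padding, a string inequality would put "9" after "10"; with it, "0000000009" correctly sorts before "0000000010".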
  • 59. Netflix Build Process - Ivy Ant is a tool for automating the software build process (like Make). Ivy is an extension to Ant for managing (recording, tracking, resolving and reporting) project dependencies. Ant, Ivy
  • 60. Netflix Build Process - Artifactory Artifactory is a repository manager. • Artifactory acts as a proxy between your build tool (Maven, Ant, Ivy, Gradle etc.) and the outside world. • It caches remote artifacts so that you don’t have to download them over and over again. • It blocks unwanted (and sometimes security-sensitive) external requests for internal artifacts and controls how and where artifacts are deployed, and by whom. Artifactory
  • 61. Netflix Build Process - Jenkins Jenkins is an open source continuous integration tool. Jenkins provides a so-called continuous integration system, making it easier for developers to integrate changes to the project, and making it easier for users to obtain a fresh build. Jenkins monitors executions of externally-run jobs, such as cron jobs and procmail jobs, even those that are run on a remote machine. For example, with cron, Jenkins keeps e-mail outputs in a repository.
  • 62. Migration Lessons • The change from reliable to commodity hardware impacts many things – E.g. in the old world memory-based session state was fine • Hardware failures were rare – Not appropriate on the cloud • Co-tenancy is hard – In the cloud all resources are shared, e.g. • Hardware, network, storage, … – This introduces variable performance depending on load • At all levels of the stack – Have to be willing to abandon a sub-task or manage resources to avoid co-tenancy • The best way to avoid failure is to fail constantly – Netflix has designed their systems to be tolerant of failure – They test this capability regularly
  • 63. Unleash the Monkey • As a result of disruption of services Netflix adopted a process of testing the system • They learned that the best way to test the system was at scale • In order to learn how their systems handle failure they introduced faults – Along came the “Chaos Monkey” • The Chaos Monkey randomly disables production instances • This is done in a tightly controlled environment with engineers standing by – But it is done in the production environment • The engineers then learn how their systems handle failures and can improve the solution accordingly
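The Chaos Monkey idea above, randomly disabling a production instance while engineers watch, can be sketched as a toy. This is an illustration only, not Netflix's implementation; `terminate` stands in for a cloud API call:

```python
import random

def chaos_monkey(instances, terminate, rng=random):
    """Pick one running instance at random and terminate it.

    `instances` is a list of dicts with an 'state' field; `terminate` is a
    callback (in reality a cloud API call). Returns the victim so engineers
    can observe how the rest of the system copes with its loss."""
    candidates = [i for i in instances if i["state"] == "running"]
    if not candidates:
        return None  # nothing left to kill
    victim = rng.choice(candidates)
    terminate(victim)
    victim["state"] = "terminated"
    return victim
```

The point of the exercise is what happens *after* the call: the surviving instances must absorb the load, which is what the engineers standing by are verifying.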
  • 64. Increase the Chaos • The Chaos Monkey was so successful Netflix created a “Simian Army” to test the system • The “Army” includes: – Chaos Monkey. Randomly kill a process and monitor the effect. – Latency Monkey. Randomly introduce latency and monitor the effect. – Doctor Monkey. The Doctor Monkey taps into health checks that run on each instance as well as monitoring other external signs of health (e.g. CPU load) to detect unhealthy instances. – Janitor Monkey. The Janitor Monkey ensures that the Netflix cloud environment is running free of clutter and waste. It searches for unused resources and disposes of them.
  • 65. Netflix Test Suite - 2 • Conformity Monkey. The Conformity Monkey finds instances that don’t adhere to best practices and shuts them down. For example, if an instance does not belong to an auto-scaling group, that is a potential problem. • Security Monkey. The Security Monkey is an extension of Conformity Monkey. It finds security violations or vulnerabilities, such as improperly configured AWS security groups, and terminates the offending instances. It also ensures that all SSL and DRM certificates are valid and are not coming up for renewal. • 10-18 Monkey. The 10-18 Monkey (Localization-Internationalization) detects configuration and run-time problems in instances serving customers in multiple geographic regions, using different languages and character sets. The name 10-18 comes from L10n and I18n, where 10 and 18 are the number of letters abbreviated out of localization and internationalization. • Chaos Gorilla. The Chaos Gorilla is similar to the Chaos Monkey but simulates the outage of an entire Amazon availability zone.
  • 66. Overview • What is Netflix • Migrating to the cloud – Data – Build process – Testing process • Withstanding Amazon outage of April 21, 2011
  • 67. April 21, 2011 Outage – Background An EBS cluster is composed of a set of EBS nodes. When Node A loses connectivity to a node (Node B) to which it is replicating data, Node A assumes Node B has failed. In this case, it must find another node to which data can be replicated. This is called re-mirroring. When data is being re-mirrored, all nodes that have copies of that data retain that data and block all external access to it. This is called the node being “stuck”. In the normal case, a node is stuck for only a few milliseconds.
  • 68. April 21, 2011 Outage - Events At 12:47 AM PDT, a configuration change was made to upgrade the capacity of the primary EBS network within one availability zone in the Eastern Region. • Standard practice is to shift traffic off of one of the redundant routers in the primary EBS network to allow the upgrade to happen • Traffic was shifted incorrectly to the lower capacity backup network. Consequently, there was no functioning primary or secondary network in the affected portion of the availability zone since traffic was shifted off of the primary and the secondary became quickly overloaded. When the error was detected and connectivity restored, a large number of nodes that had been isolated began searching for other nodes to which they could re-mirror their data. This caused a re-mirroring storm where all of the available space in the affected cluster was used up and the attempts to gain access to another availability zone overwhelmed the primary communication network. This caused the whole region to be unavailable
  • 69. April 21, 2011 Outage – Consequences At 12:04 PM PDT, the outage was contained to the one affected availability zone. 13% of the nodes in the availability zone remained “stuck”. Additional capacity was added to the availability zone to allow those nodes to become unstuck. In the meantime, some of the stuck volumes experienced hardware failures. Hardware failures are a normal occurrence within a cluster. Ultimately, the data on 0.07% of the nodes in the affected availability zone could not be recovered, and this data was permanently lost.
  • 70. Netflix and the Amazon Outage of April 2011 On April 21, 2011, one of the availability zones in Amazon’s Eastern Region failed. – Many organizations were unavailable for the duration of the outage (24 hours) – Some organizations permanently lost data Netflix was largely unaffected – Customers experienced no disruption of service – There were some increased latencies – Higher than normal error rate – No interruption in customers’ ability to find and watch movies
  • 71. How Did Netflix Accomplish This? Stateless services • Services are largely stateless • If a server fails the request can be re-routed • Remember the discussion of modular redundancy? – The lack of state means failover times are negligible Data stored across availability zones • In some cases it was not practical to re-architect the system to be stateless • In these cases there are multiple hot copies of the data across availability zones • In the event of a failure, again, the failover time is negligible – What is the cost of doing this?
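Why statelessness makes failover cheap can be shown in a toy sketch: since any replica can serve any request, the client simply re-routes to the next one. This is illustrative only; the `replicas` here are plain Python callables standing in for service endpoints:

```python
def call_with_failover(replicas, request):
    """Try each stateless replica in turn; because no replica holds session
    state, re-routing a failed request is just picking another replica."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            last_error = exc  # replica down: try the next one immediately
    raise last_error  # every replica failed
```

With session state held in the server, the failed request could not simply be replayed elsewhere, which is the cost the slide's question ("what is the cost of doing this?") contrasts against keeping hot data copies per zone.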
  • 72. Design Decisions II Graceful degradation: The Netflix systems are designed for failure. A lot of thought went into what to do when they fail. • Fail fast – aggressive timeouts so that failing components do not slow the whole system down • Fallbacks – every feature has the ability to fall back to a lower quality representation. E.g. if Netflix cannot generate personalized recommendations, they will fall back to non-personalized results • Feature removal – if a feature is non-critical and is slow, it may be removed from any given page N+1 redundancy • The system is architected with more capacity than it needs at any time • This allows them to cope with large spikes in load caused by users directly or the ripple effect of transient failures
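The fail-fast-plus-fallback pattern above can be sketched as follows. This is an illustration, not Netflix's library (they use Java-based machinery); the 200 ms deadline is an assumed value:

```python
import concurrent.futures

def with_fallback(primary, fallback, timeout_s=0.2):
    """Fail fast: give the primary (e.g. personalized recommendations) a short
    deadline; on timeout or error, degrade to the fallback (e.g. generic
    results) rather than letting one slow dependency stall the whole page."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(primary)
    try:
        return future.result(timeout=timeout_s)  # raises on timeout or error
    except Exception:
        return fallback()  # lower-quality representation, served immediately
    finally:
        pool.shutdown(wait=False)  # don't let a slow primary hold us up
```

The key design point is that the timeout bounds the *user-visible* latency: the page renders with degraded content in at most `timeout_s`, whatever the dependency is doing.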
  • 73. What Issues Were Experienced? Netflix did have some issues, however, including: • As the availability zone started to fail Netflix decided to pull out altogether • This required manual configuration changes to AWS • Each service team was involved in moving their service out of the zone • This was a time consuming and error prone process Load balancers had to be manually adjusted to keep traffic out of the affected availability zone. • Netflix uses Amazon’s Elastic Load Balancing (ELB) • This first balances load to availability zones and then instances • If you have many instances go down the others in that zone have to pick up the slack • If you can’t bring up more nodes you will experience a cascading effect until all nodes go down
  • 74. Amazon Outage of Aug 8, 2011 On Aug 8, 2011, the Eastern Region of AWS went down for about half an hour due to connectivity issues that affected all availability zones … Netflix was down as well…
  • 75. Summary Netflix moved to the cloud because • They doubted their ability to correctly predict the load • They did not want to invest in data centers directly Netflix chose SimpleDB as their original target database system and their migration strategy involved • Running Oracle and SimpleDB concurrently for a period • Moving some features provided by Oracle up the application stack Netflix has a number of test programs that inject faults or look for specific types of violations. Netflix re-architected their build process to support sophisticated deployment practices and to allow for continuous integration. Netflix did a number of things to promote availability, and these enabled Netflix to continue customer service through an extensive AWS outage.
  • 78. Architecting for the Cloud Future Trends
  • 79. Topics • Containers • Augmented reality • Cloudlets • Internet of things – cars, appliances, etc.
  • 80. Containers • Loading full VM at deployment time – Takes time for large machine images – Results in many slightly different VM images for different versions – Depends on particular processor even if virtualized. • Containers are a proposed solution – Machine image consists of OS + libraries – Application portion runs inside of this machine image and is loaded by the machine image once it is instantiated.
  • 81. What is a Container
  • 84. Adoption of Containers • Containers are an old concept • Docker is a software system that packages applications into containers • Cloud providers are adopting containers since they control the entire stack and can choose OSs.
  • 85. Topics • Containers • Augmented reality • Cloudlets • Internet of things – cars, appliances, etc.
  • 86. Augmented Reality • Superimpose computer generated images over visual images
  • 87. Simple text has been available for a long time • CMU wearable computer group early 1990s • Google Glass modern display device
  • 88. Computer power needed for more sophisticated images • Route planning – showing what is ahead if you take a particular route • Visualizing furniture in a room • These applications require – Location determination – Orientation determination • This computer power is not available on mobile devices. • This is a lead in to “cloudlets”
  • 89. Cloudlets • Mobile devices do not have the CPU power necessary for some applications. • Consequently, they act as data sources for apps in the cloud. – This introduces latency in responding to a user request – This latency does not matter for some purposes – e.g. airline reservations, music streaming – But – it does matter for others, e.g. voice recognition, translation, augmented reality.
  • 90. How does one match computation power and data with low latency? • Move computation power closer to the data. • But, but, but – Data is inherently mobile (with some latency) – Computation is tied to a server. Real computation power is not so mobile. – I am in an automobile and my mobile is constantly moving and getting further away from any fixed computation source
  • 91. Anyway, won’t my mobile have sufficient computational power? • Mobiles will always inherently be less powerful than static client and server hardware • Mobiles have to trade off size, weight, battery life, and storage against computational power • Computational power is always going to be of lower priority than these other qualities.
  • 92. Cloudlet infrastructure • A cloudlet infrastructure is – A collection of servers with more computational power than my mobile – Connected to the internet – Connected to a power source – One data hop away from my mobile • Where would these servers live? – Doctor’s offices – Coffee shops – Think wireless access points • A new client would instantiate a VM with the desired app in a local cloudlet server.
  • 93. How would it work? • Three models 1. Download application VM from cloud into cloudlet server on demand. • Time consuming • Allows for bare cloudlet server • Allows arbitrary applications 2. Pre load most popular application VMs • Fast • Limited in terms of number of applications available 3. Use containers to preload most popular platforms and download remaining software of an app. • Intermediate in time required to initiate • Intermediate in flexibility of apps.
  • 94. Revisit AR application • Consider what is involved to show what is ahead based on head location. – Location – Orientation • Cloudlets have potential to supply necessary infrastructure.
  • 95. Summary of cloudlets • Locally available servers with higher computational power than mobiles • Enables real time application of some technologies such as augmented reality or real time translation • Based on creating VM in local server with necessary app • Feasible with current technology
  • 96. Topics • Containers • Augmented reality • Cloudlets • Internet of things – cars, appliances, etc.
  • 97. Internet of Things • “Everything” is connected to the internet – Toasters – Refrigerators – Automobiles – … • Every device has both data generation and control aspects. • From a cloud perspective, data generation is important.
  • 98. Data available from a device • Current state • Current activity • Current location • Current power usage • Information about the environment of the device • …
  • 99. How frequently is the data sampled? • Suppose each device is sampled once a minute • Suppose each sample contains 1MB • The number of devices being sampled may be in the 100,000s or 1,000,000s (think automobiles, or engines, or tires, or …) • Petabytes of data streaming into a data center. • “Big Data” is the buzz term used for this level of data.
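The slide's assumptions work out as follows (decimal units assumed, 1 PB = 10^9 MB):

```python
# Back-of-the-envelope data rate using the slide's assumptions:
# 1,000,000 devices, one 1 MB sample per device per minute.
devices = 1_000_000
sample_mb = 1
samples_per_day = 24 * 60          # one sample per minute

mb_per_day = devices * sample_mb * samples_per_day   # 1.44 billion MB
pb_per_day = mb_per_day / 1_000_000_000              # ≈ 1.44 PB per day
```

So a million once-a-minute devices stream roughly 1.44 PB into the data center every day, which is why the slide lands on "petabytes".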
  • 100. Leads to streaming DBMSs • SQLstream • STREAM [1] • Aurora [2] / StreamBase Systems, Inc. • TelegraphCQ [3] • NiagaraCQ [4] • QStream • PIPES • webMethods Business Events • StreamGlobe • Odysseus • StreamInsight • InfoSphere Streams • Kinesis
  • 101. Requirements for a streaming DBMS Database management system (DBMS) vs. data stream management system (DSMS): – Persistent data (relations) vs. volatile data streams – Random access vs. sequential access – One-time queries vs. continuous queries – (Theoretically) unlimited secondary storage vs. limited main memory – Only the current state is relevant vs. consideration of the order of the input – Relatively low update rate vs. potentially extremely high update rate – Little or no real-time requirements vs. real-time requirements – Assumes exact data vs. assumes outdated/inaccurate data – Plannable query processing vs. variable data arrival and data characteristics
  • 102. Managing the data • Synopses – Select sample of raw data to support a particular analysis. – Moving average of data – May be inaccurate when performing analysis • Windows – Only look at a portion of the data – Last 10 elements – Last 10 seconds • Choice of approach for managing the data is driven by the types of analyses that the data is used for.
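The two window types above, last-N-elements and last-T-seconds, can be sketched as minimal classes (hypothetical names, illustration only):

```python
from collections import deque

class CountWindow:
    """Keep only the last `n` elements (e.g. the last 10 readings)."""
    def __init__(self, n):
        self.items = deque(maxlen=n)   # deque drops the oldest automatically

    def add(self, value):
        self.items.append(value)

    def average(self):
        return sum(self.items) / len(self.items)

class TimeWindow:
    """Keep only elements from the last `seconds` (e.g. the last 10 seconds)."""
    def __init__(self, seconds):
        self.seconds = seconds
        self.items = deque()           # (timestamp, value) pairs, arrival order

    def add(self, ts, value):
        self.items.append((ts, value))
        # expire everything older than the window
        while self.items and self.items[0][0] <= ts - self.seconds:
            self.items.popleft()

    def average(self):
        return sum(v for _, v in self.items) / len(self.items)
```

The moving average here is one possible synopsis; as the slide notes, which window and synopsis to use depends on the analysis the data feeds.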
  • 103. Demand is for ever faster processing • Businesses want to use the data for real-time reaction • E.g. Amazon has “customers who looked at this also looked at”. Suppose they use this information and your browsing history to offer you a deal on two items rather than have you decide individually to buy the two items. • Just one example out of hundreds.
  • 104. Issues • Managing the volume of data • Security – how do you know data is correctly identified? • Privacy – what kind of information sharing will consumers tolerate?
  • 105. Summary for Internet of Things • If every device is connected to the internet, the data generated will be massive • Specialized data management systems • Specialized analysis systems. • Problems include security and privacy