Design consequences of dev ops practices

This is a tutorial that I am giving at WICSA 2014 http://www.wicsa.net/ and SATURN 2014 https://www.sei.cmu.edu/saturn/2014/

Transcript

  1. NICTA Copyright 2012 From imagination to impact Design Consequences of DevOps Practices Len Bass
  2. Introductions
     • Me
     • You
  3. Overview of Tutorial
     • DevOps practices, when taken to the limit for internet-scale organizations => continuous delivery
     • Economics of deployment when there are many instances of services => rolling upgrade
  4. Outline
     • What is DevOps?
       – Definitions
       – Deriving an architecturally significant requirement
     • Architectural style elaboration
     • Deployment
     • Summary
  5. What is DevOps?
     • “DevOps is a software development method that stresses communication, collaboration, and integration between software developers and IT professionals” – Wikipedia
     • From an architect’s or developer’s perspective, it means treating system administrators and operators as first-class stakeholders.
  6. What is DevOps - 2
     • DevOps is accompanied by a certain amount of mysticism.
       – “Be Self-Aware; Be aware of a project’s maturity; Be aware of others” http://architects.dzone.com/articles/zen-and-art-collaborative
     • Similar to the early days of agile.
  7. What problem is DevOps trying to solve?
     • Poor communication between developers and operations personnel
     • Slow release schedule
     • Limited capacity of operations staff
     • Limited organizational insight into operations
  8. Communication between developers and operations staff
     • Log messages
       – What information is needed to do monitoring and error diagnosis?
       – Where is the best place to put particular types of information?
     • Release planning
       – What is the schedule for the next release?
       – What capacity is needed for the next release?
       – What are the infrastructure compatibility requirements for the next release?
  9. Release plan*
     1. Define and agree release and deployment plans with customers/stakeholders.
     2. Ensure that each release package consists of a set of related assets and service components that are compatible with each other.
     3. Ensure that integrity of a release package and its constituent components is maintained throughout the transition activities and recorded accurately in the configuration management system.
     4. Ensure that all release and deployment packages can be tracked, installed, tested, verified, and/or uninstalled or backed out, if appropriate.
     5. Ensure that change is managed during the release and deployment activities.
     6. Record and manage deviations, risks, issues related to the new or changed service, and take necessary corrective action.
     7. Ensure that there is knowledge transfer to enable the customers and users to optimise their use of the service to support their business activities.
     8. Ensure that skills and knowledge are transferred to operations and support staff to enable them to effectively and efficiently deliver, support and maintain the service, according to required warranties and service levels.
     *http://en.wikipedia.org/wiki/Deployment_Plan
  10. Limited capacity of operations staff
      • The number of physical servers that a single sys admin can administer varies with context, but some data points*:
        – As low as 10 per admin
        – Norm of 30 per admin at small-to-medium businesses
      • Depends on whether the admin performs just maintenance or is also involved in other projects
      *http://www.computerworld.com.au/article/352635/there_best_practice_server_system_administrator_ratio_/
  11. Limited organizational insight into operations
      • An organization has budgetary insight into operations.
      • The impact of various operational activities on business value is difficult to discern.
      • This is a long-running complaint that goes under the heading of “aligning IT with the business”. There are differences in
        – Objectives
        – Culture
        – Incentives
  12. DevOps can also be a role
      • DevOps practices rely on a high degree of automation and standardization of tools.
      • Someone has to be responsible for these tools.
      • The person filling this role is a “DevOps Engineer”.
  13. My Take on DevOps
      • DevOps is a set of practices intended to
        – Reduce management overhead
        – Speed up deployment
        – Move some (formerly) IT responsibilities to developers
        – Increase communication between developers and operations
        – Reduce operations costs
      • Are there architecturally significant requirements in these practices?
  14. Architecturally significant requirement
      • Speed up deployment by minimizing synchronous coordination among development teams.
      • Synchronous coordination, such as a meeting, adds time since it requires
        – Ensuring that all parties are available
        – Ensuring that all parties have the background to make the coordination productive
        – Following up on decisions made during the meeting
  15. Summary of this section
      • DevOps is a collection of practices designed, among other things, to reduce the time to deploy new features.
      • Reducing the time to deploy new features can be accomplished by reducing synchronous coordination among development teams.
        – This is an architecturally significant requirement that we will carry forward.
  16. Questions
  17. Outline
      • What is DevOps?
      • Architectural Style Elaboration
        – Micro Service Oriented Architecture
        – Categories of design decisions
        – How micro SOA specifies or delegates the categories of design decisions
      • Deployment
      • Summary
  18. Deployment pipeline
      • Developers commit code
      • Code is compiled
      • Binary is processed by a build and unit test tool which builds the service
      • Integration tests are run, followed by performance tests.
      • Result is a machine image (assuming virtualization)
      • The service (its image) is deployed to production.
  19. Continuous Deployment
      • Deployment pipeline is triggered by commit of code
      • All gates from one phase to the next are automatic.
  20. Requirements that drive the design in this section
      • Reduce synchronous communication among development teams
        – Continuous deployment
        – Individual developers can commit to production (as long as automated tests are passed)
      • Scalability and performance
      • Reliability
      • A different ordering of requirements will produce a different design
  21. Architectural Style
      • An architectural style (pattern) can specify many decisions that might otherwise require synchronous coordination among development teams.
      • The remainder of this section will justify why the Micro Service Oriented Architecture style satisfies our identified Architecturally Significant Requirement.
  22. Amazon design rules - 1
      • All teams will henceforth expose their data and functionality through service interfaces.
      • Teams must communicate with each other through these interfaces.
      • There will be no other form of inter-process communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
  23. Amazon design rules - 2
      • It doesn’t matter what technology they [services] use.
      • All service interfaces, without exception, must be designed from the ground up to be externalizable.
      • Amazon is optimizing for its workload with these requirements
        – Mainly searching and browsing and web page delivery
        – Some transactions, but not the dominant portion of the workload.
  24. Micro service oriented architecture
      [Figure: Service]
      • Each user request is satisfied by some sequence of services.
      • Most services are not externally available.
      • Each service communicates with other services through service interfaces.
      • Service depth may be 70, e.g. LinkedIn
  25. Relation of teams and services
      • Each service is the responsibility of a single development team
      • Individual developers can deploy a new version without coordination with other developers.
      • It is possible that a single development team is responsible for multiple services
      • Team size
        – Coordination among team members must be high bandwidth and low overhead.
        – Typically this is done with small teams, as in agile.
  26. Design decisions
      • Seven categories of design decisions*.
        1. Allocation of responsibilities.
        2. Coordination model.
        3. Data model.
        4. Management of resources.
        5. Mapping among architectural elements.
        6. Binding time decisions.
        7. Choice of technology.
      *Software Architecture in Practice, 3rd edition, Chap 4
  27. Design decisions made or delegated by choice of micro SOA
      • Micro service oriented architecture either specifies or delegates to the development team five out of the seven categories of design decisions.
        1. Allocation of responsibilities.
        2. Coordination model.
        3. Data model.
        4. Management of resources.
        5. Mapping among architectural elements.
        6. Binding time decisions.
        7. Choice of technology.
  28. Roadmap for next several slides
      • Micro service oriented architectural style will either specify or allow delegation of five different categories of design decisions.
      • Each decision category will be discussed separately.
  29. Decision 1 – allocation of responsibilities
      • This decision is not delegated to the team or specified.
      • Development teams must coordinate to divide responsibilities for features that are to be added.
      • Typically this happens at the beginning of each iteration cycle.
  30. Decision 2 - coordination model
      • Elements of service interaction
        – Services communicate asynchronously through message passing
        – Each service could (in principle) be deployed anywhere on the net.
      • Latency requirements will probably force particular deployment location choices.
      • Services must discover the location of dependent services.
  31. Service discovery
      • When an instance of a service is launched, it registers with a registry/load balancer
      • When a client wishes to utilize a service, it gets the location of an instance from the registry/load balancer.
      • Eureka is an open source registry/load balancer
      [Figure labels: Instance of a service, Client, Registry/load balancer; arrows: Register, Query registry, Invoke]
  32. Subtleties of registry/load balancer
      • When multiple instances of the same service have registered, the load balancer can rotate through them to equalize the number of requests to each instance.
      • Each instance must renew its registration periodically (~90 seconds) so that the load balancer does not schedule a message to a failed instance.
      • The registry can keep other information in addition to the address of an instance, for example the version number of the service instance.
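
The registration, renewal and rotation behaviour described in slides 31 and 32 can be sketched in a few lines. This is a toy, in-memory illustration and not Eureka's API: the `Registry` class, its method names and the explicit `now` argument (used instead of a real clock so the behaviour is easy to follow) are all assumptions made for the example.

```python
from dataclasses import dataclass

LEASE_SECONDS = 90.0  # renewal window, taken from the "~90 seconds" on the slide


@dataclass
class Registration:
    address: str
    version: str       # extra information kept alongside the address
    expires_at: float  # time after which the instance is presumed failed


class Registry:
    """Hypothetical registry/load balancer: register, renew, rotate."""

    def __init__(self, lease_seconds: float = LEASE_SECONDS):
        self.lease_seconds = lease_seconds
        self._instances = {}  # service name -> {address: Registration}
        self._counter = {}    # service name -> number of lookups served

    def register(self, service: str, address: str, version: str, now: float) -> None:
        """Register a new instance or renew an existing one."""
        self._instances.setdefault(service, {})[address] = Registration(
            address, version, now + self.lease_seconds
        )

    def lookup(self, service: str, now: float) -> Registration:
        """Return the next live instance, rotating to spread requests evenly."""
        live = sorted(
            (r for r in self._instances.get(service, {}).values() if r.expires_at > now),
            key=lambda r: r.address,
        )
        if not live:
            raise LookupError(f"no live instance of {service}")
        count = self._counter.get(service, 0)
        self._counter[service] = count + 1
        return live[count % len(live)]
```

An instance that stops renewing simply drops out of the rotation once its lease expires, which is the point of the periodic renewal.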
  33. Decision 3 – Data model
      • Schema-based database system (relational). Requires coordination.
        – Development teams must coordinate when the schema is defined or modified.
        – Schema definition happens once, when the architecture is defined. Schema modification should be a rare occurrence. Schema extensions (new fields or tables) do not cause problems.
      • NoSQL systems. Will still require coordination over the semantics of data.
        – Data written by one service is typically read by others; they must agree on semantics.
  34. Decision 4 – Resource Management
      • Each instance of a service can process a certain workload.
        – Could be expressed in terms of requests
        – Could be expressed in terms of resource requirements, e.g. CPU
      • Each client instance will require resources from the service to process its requests.
      • Service Level Agreements (SLAs) are a means for automating the resource assumptions of the clients and the resource requirements of the service.
  35. Managing SLAs
      • A requirement for each service is to provide an SLA for its response time in terms of the workload asked of it.
        – E.g. for a workload of Y requests per second, I will provide a response within X seconds.
      • A requirement for each client is to provide an estimate of the requests it will make of each dependent service.
        – E.g. for each request I receive, I will make Z requests for your service per second.
      • This combination will enable a run time determination of the number of instances required for each service to meet its SLA.
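
One way to read the run time determination on slide 35 is as simple arithmetic: add up the request rates the clients declare and divide by the workload Y that one instance can serve within its SLA. The sketch below assumes exactly that reading; the function names, the interpretation of Z as a multiplier on a client's incoming request rate, and the rule of always keeping at least one instance are illustrative assumptions, not something the slides specify.

```python
import math


def client_demand(incoming_rate: float, z: float) -> float:
    """Requests per second a client sends to a service, assuming it makes
    z downstream requests for every request it receives (assumption)."""
    return incoming_rate * z


def instances_needed(declared_rates: dict, capacity_per_instance: float) -> int:
    """Instances required so that no instance exceeds the workload Y
    (capacity_per_instance) for which the service's SLA holds."""
    total = sum(declared_rates.values())
    # Keeping at least one instance alive is an assumption of this sketch.
    return max(1, math.ceil(total / capacity_per_instance))
```

For example, two clients declaring 120 and 45 requests per second against a service whose SLA holds up to 50 requests per second per instance would need `instances_needed({"web": 120.0, "batch": 45.0}, 50.0)` instances, which rounds 3.3 up to 4.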
  36. Provisioning new instances
      • When the desired workload of a service is greater than can be provided by the existing number of instances of that service, new instances can be instantiated (at runtime).
      • Four possibilities for initiating a new instance of a service:
        1. Client. The client determines whether the service is adequately provisioned for its needs based on the service's SLA and the service's current workload.
        2. Service. The service determines whether it is adequately provisioned based on the number of requests it expects from clients.
        3. Registry/load balancer. It determines the appropriate number of instances of a service based on the SLA and client instance requests.
        4. External entity. It can initiate creation of new instances.
  37. Responsibilities of development teams
      • SLA determination of a service is done by the service development team prior to deployment, augmented by run time discovery.
      • Determination of a client's requirements for a service is done by the client’s development team.
      • Choice of which component has responsibility for instantiating/deinstantiating instances of a service is done as a portion of the architecture definition.
  38. Decision 5 – Mapping among architectural elements
      • Decisions about packaging modules into processes and processes into a service are delegated to the service development team.
      • Decisions about deployment of a service will be discussed in the next section.
  39. Decision 6 – Binding time
      • Configuration information binding time is decided during the development of the architecture and the deployment pipeline.
      • Other binding time decisions are delegated to the service development team.
  40. Decision 7 – Technology choices
      • All technology choices are delegated to the service development team.
  41. Questions about Micro SOA
      • /Q/ Isn’t it possible that different teams will implement the same functionality, likely differently?
      • /A/ Yes, but so what? Major duplications are avoided through assignment of responsibilities to services. Minor duplications are the price to be paid to avoid the necessity for synchronous coordination.
      • /Q/ What about transactions?
      • /A/ Micro SOA privileges flexibility above reliability and performance. Transactions are recoverable through logging of service interactions. This may introduce some delays if failures occur.
  42. Summary
      • Synchronous coordination among development teams is avoided by
        – Using a micro SOA architecture
        – Having the architecture specify the coordination model and resource management techniques used by the application.
        – Delegating to the development team mapping, binding time, and technology decisions.
        – Having each service be the responsibility of a single development team.
      • Micro SOA privileges flexibility and development team independence over performance and reliability.
  43. Questions
  44. Outline
      • What is DevOps?
      • Overall Architectural Style
      • Deployment
        – Deployment strategies
        – Maintaining Logical Consistency
      • Summary
  45. Deployment Overview
      Multiple instances of a service are executing
      • Red is the service being replaced with a new version
      • Blue are clients
      • Green are dependent services
      [Figure: service instances labeled VA and VB; UAT / staging / performance tests]
  46. Deployment goal and constraints
      • Goal of a deployment is to move from the current state (N instances of version A of a service) to a new state (N instances of version B of a service)
      • Constraints:
        – Any development team can deploy their service at any time. I.e. a new version of a service can be deployed either before or after a new version of a client (no synchronization among development teams).
        – It takes time to replace one instance of version A with an instance of version B (order of minutes)
        – Service to clients must be maintained while the new version is being deployed.
  47. Deployment strategies
      • Two basic all-or-nothing strategies
        – Big Flip: leave N instances with version A as they are, allocate and provision N instances with version B, then switch to version B and release the instances with version A.
        – Rolling Upgrade: allocate one instance, provision it with version B, release one version A instance. Repeat N times.
      • Other deployment topics
        – Partial strategies (canary testing, A/B testing). We will discuss them later. For now we are discussing all-or-nothing deployment.
        – Rollback
        – Packaging services into machine images
  48. Trade offs - Big Flip and Rolling Upgrade
      • Big Flip
        – Only one version available to the client at any particular time.
        – Requires 2N instances (additional costs)
      • Rolling Upgrade
        – Multiple versions are available for service at the same time
        – Requires N+1 instances.
      • Rolling upgrade is commonly preferred.
      [Figure: Rolling Upgrade in EC2. Step labels: Update Auto Scaling Group; Sort Instances; Remove & Deregister Old Instance from ELB; Confirm Upgrade Spec; Terminate Old Instance; Wait for ASG to Start New Instance; Register New Instance with ELB]
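
The 2N versus N+1 trade-off on slides 47 and 48 can be checked with a small simulation. The sketch below models a fleet as a list of version labels and follows the two procedures as the slides describe them; the `Outcome` type and function names are invented for the illustration, and real provisioning (EC2, Auto Scaling Groups, ELB) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    peak_instances: int          # most instances allocated at any moment
    mixed_versions_served: bool  # did clients ever see A and B at once?
    final_fleet: list


def big_flip(n: int) -> Outcome:
    serving = ["A"] * n
    standby = ["B"] * n                 # allocate and provision N instances of B
    peak = len(serving) + len(standby)  # both fleets exist until the switch
    serving, standby = standby, []      # switch to B, then release the A instances
    return Outcome(peak, False, serving)


def rolling_upgrade(n: int) -> Outcome:
    serving = ["A"] * n
    peak, mixed = len(serving), False
    for _ in range(n):
        serving.append("B")             # allocate one instance, provision with B
        peak = max(peak, len(serving))
        mixed = mixed or ("A" in serving and "B" in serving)
        serving.remove("A")             # release one version A instance
    return Outcome(peak, mixed, serving)
```

With N = 4, the big flip peaks at 8 instances and never serves mixed versions, while the rolling upgrade peaks at 5 but does serve both versions during the upgrade, which is what motivates the consistency discussion that follows.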
  49. Types of failures during rolling upgrade
      • Rolling upgrade failure
        – Provisioning: see references at end
        – Logical failure: inconsistencies to be discussed
        – Instance failure: handled by Auto Scaling Group in EC2
  50. What are the problems with Rolling Upgrade?
      • Recall that any development team can deploy their service at any time.
      • Three concerns
        – Maintaining consistency between different versions of the same service when performing a rolling upgrade
        – Maintaining consistency among different services
        – Maintaining consistency between a service and persistent data
  51. Maintaining consistency between different versions of the same service
      • Key idea: differentiate between installing a new version and activating a new version
      • Involves “feature toggles” (described momentarily)
      • Sequence
        – Develop version B with new code under control of a feature toggle
        – Install each instance of version B with the new code toggled off.
        – When all of the instances of version A have been replaced with instances of version B, activate the new code by toggling the feature.
  52. Issues
      • What is a feature toggle?
      • How do I manage features that extend across multiple services?
      • How do I activate all relevant instances at once?
  53. Feature toggle
      • Place feature-dependent new code inside an “if” statement where the code is executed if an external variable is true. Removed code would be the “else” portion.
      • Used to allow developers to check in uncompleted code. Uncompleted code is toggled off.
      • During deployment, until new code is activated, it will not be executed.
      • Removing feature toggles when a new feature has been committed is important.
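
A feature toggle as described on slide 53 is just a conditional on an externally supplied value. The sketch below is a minimal illustration; the toggle name `new_discount`, the `TOGGLES` dictionary standing in for the external variable, and the pricing logic are all made up for the example.

```python
# Stand-in for the external variable; in practice this would be supplied
# by configuration or a toggle manager rather than hard-coded.
TOGGLES = {"new_discount": False}


def price_cents(amount_cents: int, toggles: dict = TOGGLES) -> int:
    """Return the price to charge, in cents."""
    if toggles.get("new_discount", False):
        # New code: installed with version B but only run once activated.
        return amount_cents * 90 // 100
    else:
        # Old code: the branch that is removed once the feature is committed.
        return amount_cents
```

Until the toggle is set to true, `price_cents(1000)` returns 1000, so instances of version B behave exactly like version A during the rolling upgrade.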
  54. Multi service features
      • Most features will involve multiple services.
      • Each service has some code under control of a feature toggle.
      • Activate the feature when all instances of all services involved in the feature have been installed.
        – Maintain a catalog with feature vs service version number.
        – A feature toggle manager determines when all old instances of each version have been replaced. This could be done using the registry/load balancer.
        – The feature manager activates the feature.
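
The catalog check on slide 54 can be sketched as follows. The sketch assumes integer version numbers, a catalog mapping each feature to the minimum version of every service it involves, and a `running` view (service name to the versions of its live instances) of the kind a registry/load balancer could supply. The class and function names are hypothetical.

```python
def feature_ready(required: dict, running: dict) -> bool:
    """True when every live instance of every involved service is at or
    above the version that contains the feature's code."""
    for service, minimum in required.items():
        versions = running.get(service, [])
        if not versions or any(v < minimum for v in versions):
            return False
    return True


class FeatureToggleManager:
    """Hypothetical manager: activates a feature once installation is complete."""

    def __init__(self, catalog: dict):
        self.catalog = catalog  # feature -> {service: minimum version}
        self.active = {name: False for name in catalog}

    def refresh(self, running: dict) -> dict:
        """Re-check the catalog against the running instances."""
        for name, required in self.catalog.items():
            if not self.active[name] and feature_ready(required, running):
                self.active[name] = True  # toggle the feature on
        return dict(self.active)
```

While one checkout instance is still on the old version the feature stays off; it is activated only after the last old instance has been replaced.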
  55. Activating feature
      • The feature toggle manager changes the value of the feature toggle. Two possible techniques to get the new value to instances:
        – Push. Broadcasting the new value will instruct each instance to use the new code. If a lag of several seconds between the first service to be toggled and the last can be tolerated, there is no problem. Otherwise the value must be synchronized across the network.
        – Pull. Querying the manager by each instance to get the latest value may cause performance problems.
      • A coordination mechanism such as Zookeeper will overcome both problems. I will discuss Zookeeper if I have time at the end.
  56. Maintaining consistency across versions (summary)
      • Install all instances before activating any new code
      • Use feature toggles to activate new code
      • Use feature toggle manager to determine when to activate new code
      • Use Zookeeper to coordinate activation with low overhead
  57. Maintaining consistency among different services
      • Use case:
        – Wish to deploy a new version of service A without coordinating with the development teams for clients of service A.
      • I.e. the new version of service A should be backward compatible in terms of its interfaces.
      • May also require forward compatibility in certain circumstances, e.g. rollback
  58. Achieving Backwards Compatibility
      • APIs can be extended but must always be backward compatible.
      • Leads to a translation layer
      [Figure: Clients call External APIs (unchanging but with ability to extend or add new ones); a layer performs Translation to internal APIs; Internal APIs (changes require changes to translation layer but do not propagate further)]
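
A translation layer of the kind shown on slide 58 might look like the sketch below. The customer example is invented: it assumes the internal API changed from a single `name` field to separate first and last names, while the external `get_customer` call keeps its original shape and is only extended with new fields.

```python
def _internal_get_customer(customer_id: int) -> dict:
    """Internal API (hypothetical): free to change between versions.
    Here it now stores first and last names separately."""
    return {"id": customer_id, "first_name": "Ada", "last_name": "Lovelace"}


def get_customer(customer_id: int) -> dict:
    """External API: the translation layer keeps the original fields
    ("id", "name") and only adds new ones, so old clients keep working."""
    record = _internal_get_customer(customer_id)
    return {
        "id": record["id"],
        "name": f'{record["first_name"]} {record["last_name"]}',  # original field
        "first_name": record["first_name"],                       # extension
        "last_name": record["last_name"],                         # extension
    }
```

If the internal representation changes again, only `get_customer` needs to be updated; clients that read `name` are unaffected.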
  59. What about dependent services?
      • Dependent services that are within your control should maintain backward compatibility
      • Dependent services not within your control (third party software) cannot be forced to maintain backward compatibility.
        – Minimize the impact of changes by localizing interactions with third party software within a single module.
        – Keeping services independent and packaging as much as possible into a virtual machine means that only third party software accessed through message passing will cause problems.
  60. Forward Compatibility
      • Gracefully handle unknown calls and database schema information
        – Suppose your service receives a method call it does not recognize. It could be intended for a later version where this method is supported.
        – Suppose your service retrieves a database table with an unknown field. It could have been added to support a later version.
      • Forward compatibility allows a version of a service to be upgraded or rolled back independently from its clients. It involves both
        – The service handling unrecognized information
        – The client handling returns that indicate unrecognized information.
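
Both halves of forward compatibility from slide 60 can be illustrated together: a service that answers an unrecognized call with an explicit "unsupported" reply and ignores unknown fields, and a client that falls back when it gets that reply. The reply format, the method names and the `KNOWN_FIELDS` set are assumptions made for this sketch.

```python
KNOWN_FIELDS = {"id", "email"}  # fields this version of the service understands


def handle_call(method: str, handlers: dict, *args) -> dict:
    """Service side: answer unknown methods gracefully instead of failing."""
    handler = handlers.get(method)
    if handler is None:
        # Possibly a method that only exists in a later version.
        return {"status": "unsupported", "method": method}
    return {"status": "ok", "result": handler(*args)}


def read_row(row: dict) -> dict:
    """Service side: ignore fields added to the table for a later version."""
    return {k: v for k, v in row.items() if k in KNOWN_FIELDS}


def client_display_name(call, user_id: int) -> str:
    """Client side: handle a return that indicates unrecognized information."""
    reply = call("get_display_name", user_id)
    if reply["status"] == "unsupported":
        # Fall back to a method the older service version supports.
        reply = call("get_email", user_id)
    return reply["result"]
```

With this in place a client built against a newer service version keeps working when the service is rolled back to a version that lacks `get_display_name`.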
  61. Maintaining consistency between a service and persistent data
      • Assume the new version is correct. We will discuss the situation where it is incorrect in a moment.
      • Inconsistency in persistent data can come about because the data schema or semantics change.
      • The effect can be minimized by the following practices (if possible).
        – Only extend the schema; do not change the semantics of existing fields. This preserves backwards compatibility.
        – Treat schema modifications as features to be toggled. This maintains consistency among the various services that access the data.
  62. I really must change the schema
      • In this case, apply the pattern for backward compatibility of interfaces to schemas.
      • Use features of the database system (I am assuming a relational DBMS) to restructure data while maintaining access to not yet restructured data.
  63. Summary of consistency discussion so far
      • Feature toggles are used to maintain consistency within instances of a service
      • The backward compatibility pattern is used to maintain consistency between a service and its clients.
      • Discouraging modification of the schema will maintain consistency between services and persistent data.
        – If the schema must be modified, then synchronize modifications with feature toggles.
  64. Canary testing
      • Canaries are a small number of instances of a new version placed in production in order to perform live testing in a production environment.
      • Canaries are observed closely to determine whether the new version introduces any logical or performance problems. If not, roll out the new version globally. If so, roll back the canaries.
      • Named after canaries in coal mines.
  65. Implementation of canaries
      • Designate a collection of instances as canaries. They do not need to be aware of their designation.
      • Designate a collection of customers as testing the canaries. Can be, for example
        – Organizationally based
        – Geographically based
      • Then
        – Activate the feature or version to be tested for canaries. Can be done through the feature activation synchronization mechanism
        – Route messages from canary customers to canaries. Can be done by making the registry/load balancer canary aware.
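
Making the registry/load balancer canary aware, as slide 65 suggests, amounts to choosing the instance pool based on who is calling. The sketch below assumes organizationally based canary customers and a simple instance record with a `canary` flag; the names, the fallback to regular instances when no canary is running, and the hash-based choice within a pool are all illustrative assumptions.

```python
import zlib


def route(customer: str, instances: list, canary_customers: set) -> str:
    """Return the address of the instance that should serve this customer.

    instances: list of {"address": str, "canary": bool}; the instances
    themselves do not need to know whether they are canaries.
    """
    wants_canary = customer in canary_customers
    pool = [i for i in instances if i["canary"] == wants_canary]
    if not pool and wants_canary:
        # No canary running: fall back to regular instances (assumption).
        pool = [i for i in instances if not i["canary"]]
    if not pool:
        raise LookupError("no instance available")
    # Stable per-customer choice within the pool.
    index = zlib.crc32(customer.encode("utf-8")) % len(pool)
    return pool[index]["address"]
```

The same routing serves A/B testing if the flag is read as "variant B" rather than "canary".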
  66. NICTA Copyright 2012 From imagination to impact A/B testing • Suppose you wish to test user response to a system variant. E.g. UI difference or marketing effort. A is one variant and B is the other. • You simultaneously make available both variants to different audiences and compare the responses. • Implementation is the same as canary testing. 66
  67. NICTA Copyright 2012 From imagination to impact Rollback • New versions of a service may be unacceptable either for logical or performance reasons. • Two options in this case – Roll back (undo deployment) – Roll forward (discontinue current deployment and create a new release without the problem). • Decision to roll back or roll forward is almost never automated because there are multiple factors to consider. – Forward or backward recovery – Consequences and severity of problem – Importance of upgrade 67
  68. NICTA Copyright 2012 From imagination to impact States of upgrade. • An upgrade can be in one of two states when an error is detected. – Installed (fully or partially) but new features not activated – Installed and new features activated. 68
  69. NICTA Copyright 2012 From imagination to impact Possibilities • Initially we will discuss the situation where persistent data is correct. Later we will discuss incorrect persistent data. • Installed but new features not activated – Error must be in backward compatibility – Halt deployment – Roll back by reinstalling old version – Roll forward by creating new version and installing that • Installed with new features activated – Turn off new features – If that is insufficient, we are at prior case. 69
  70. NICTA Copyright 2012 From imagination to impact Persistent data • Keep log of user requests (each with their own identification) • Identification of incorrect persistent data – Tag each data item with metadata that identifies the service and version that wrote the data and the user request that caused the data to be written • Correction of incorrect persistent data (simplistic version) – Remove data written by incorrect version of a service – Install correct version – Replay user requests that caused incorrect data to be written 70
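The simplistic correction procedure can be sketched as below. The `Record` layout, the `billing` service, and the bug in version 2.0 are all invented for illustration; only the three steps (remove, install correct version, replay) come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    value: int
    service: str      # service that wrote the item
    version: str      # version of that service
    request_id: str   # user request that caused the write


def handle(request, version):
    """Hypothetical service logic: version 2.0 has a bug and doubles amounts."""
    request_id, key, amount = request
    value = amount * 2 if version == "2.0" else amount
    return Record(key, value, "billing", version, request_id)


def correct(store, request_log, bad_version, good_version):
    """Remove items written by bad_version, then replay the requests behind them."""
    bad_ids = {r.request_id for r in store if r.version == bad_version}
    kept = [r for r in store if r.version != bad_version]
    replayed = [
        handle(req, good_version) for req in request_log if req[0] in bad_ids
    ]
    return kept + replayed


request_log = [("r1", "a", 10), ("r2", "b", 20)]
store = [handle(request_log[0], "1.0"), handle(request_log[1], "2.0")]
fixed = correct(store, request_log, bad_version="2.0", good_version="2.1")
```

The metadata on each record is what makes both the removal and the choice of requests to replay possible.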
  71. NICTA Copyright 2012 From imagination to impact Persistent data correction problems • I will not present good solutions to these problems. 1. Replaying user requests may involve requesting features that are not in the current version. – Requests can be queued until they can be correctly re-executed – User can be informed of error (after the fact) 2. There may be domino effects from incorrect data, i.e., other calculations may be affected. – Keep pedigree for data items that allows determining which additional data items are incorrect. Remove them and regenerate them when requests are replayed. – Data that escaped the system, e.g. sent to another system or shown to a user, cannot be retrieved. 71
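Finding the additional incorrect items from a pedigree amounts to a reachability search. The sketch below assumes the pedigree is stored as a mapping from each item to the items it was computed from; that layout and the item names are assumptions made for the example.

```python
from collections import deque


def tainted_items(derived_from, bad_items):
    """Return every item that is bad or was computed, directly or not, from one.

    derived_from maps an item id to the ids of the items it was computed
    from (its pedigree). This layout is an assumption for illustration.
    """
    # Invert the pedigree: for each item, which items depend on it?
    dependents = {}
    for item, sources in derived_from.items():
        for src in sources:
            dependents.setdefault(src, set()).add(item)

    tainted = set(bad_items)
    queue = deque(bad_items)
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, ()):
            if dep not in tainted:
                tainted.add(dep)
                queue.append(dep)
    return tainted


pedigree = {"invoice": ["price", "qty"], "report": ["invoice"], "audit": ["qty"]}
result = tainted_items(pedigree, ["price"])
```

Here `audit` is untouched because it depends only on `qty`, while `report` is affected through `invoice`.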
  72. NICTA Copyright 2012 From imagination to impact Summary of rollback options • Can roll back or roll forward • Rolling back without consideration of persistent data is relatively straightforward. • Managing erroneous persistent data is complicated and will likely require manual processing. 72
  73. NICTA Copyright 2012 From imagination to impact Packaging of services • The last portion of the deployment pipeline is packaging services into machine images for installation. • Two dimensions – Flat vs deep service hierarchy – One service per virtual machine vs many services per virtual machine 73
  74. NICTA Copyright 2012 From imagination to impact Flat vs Deep Service Hierarchy • Trading off independence of teams and possibilities for reuse. • Flat Service Hierarchy – Limited dependence among services & limited coordination needed among teams – Difficult to reuse services • Deep Service Hierarchy – Provides possibility for reusing services – Requires coordination among teams to discover reuse possibilities. This can be done during architecture definition. 74
  75. NICTA Copyright 2012 From imagination to impact Services per VM Image • [Figure: two panels] • One service per VM: a service is developed and embedded in its own VM image • Multiple services per VM: Service 1 and Service 2 are developed separately and embedded in the same VM image 75
  76. NICTA Copyright 2012 From imagination to impact One Possible Race Condition with Multiple Services per VM • [Figure: timeline] • Initial state: VM image with Version N of Service 1 and Version N of Service 2 • Developer 1: builds new image with VN+1|VN, then begins provisioning process with new image • Developer 2: builds new image with VN|VN+1, then begins provisioning process with new image, which lacks the new version of Service 1 • Result: Version N+1 of Service 1 is not deployed until the next build of the VM image • Could be prevented by the VM image build tool 76
  77. NICTA Copyright 2012 From imagination to impact Another Possible Race Condition with Multiple Services per VM • [Figure: timeline] • Initial state: VM image with Version N of Service 1 and Version N of Service 2 • Developer 1: builds new image with VN+1|VN, then begins provisioning process with new image, which overwrites the image created by Developer 2 • Developer 2: builds new image with VN+1|VN+1, then begins provisioning process with new image • Result: Version N+1 of Service 2 is not deployed until the next build of the VM image • Could be prevented by the provisioning tool 77
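One way a build or provisioning tool could prevent both races is an optimistic revision check: an image is accepted only if it was built on top of the latest published image. The sketch below is an assumption about how such a tool might work, not a description of any particular tool; `ImageRegistry` and its methods are hypothetical.

```python
import threading


class ImageRegistry:
    """Hypothetical registry holding the current VM image manifest."""

    def __init__(self, manifest):
        self._lock = threading.Lock()
        self.revision = 0
        self.manifest = dict(manifest)   # service name -> version

    def snapshot(self):
        with self._lock:
            return self.revision, dict(self.manifest)

    def publish(self, base_revision, manifest):
        """Accept the new image only if nobody published since base_revision."""
        with self._lock:
            if base_revision != self.revision:
                return False             # stale base: caller must rebuild
            self.revision += 1
            self.manifest = dict(manifest)
            return True


def build_and_publish(registry, service, version):
    """Rebuild on top of the latest image until the publish succeeds."""
    while True:
        base, manifest = registry.snapshot()
        manifest[service] = version
        if registry.publish(base, manifest):
            return


registry = ImageRegistry({"service1": "N", "service2": "N"})
# Both developers start from the same base image, as in the race scenarios.
base1, m1 = registry.snapshot()
base2, m2 = registry.snapshot()
m1["service1"] = "N+1"
m2["service2"] = "N+1"
first = registry.publish(base1, m1)
second = registry.publish(base2, m2)     # rejected: base is stale
build_and_publish(registry, "service2", "N+1")
```

The rejected build is repeated on top of the other developer's image, so neither new version is lost.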
  78. NICTA Copyright 2012 From imagination to impact Trade offs • One service per VM – Message from one service to another must go through inter VM communication mechanism – adds latency – No possibility of race condition • Multiple Services per VM – Inter VM communication requirements reduced – reduces latency – Adds possibility of race condition caused by simultaneous deployment 78
  79. NICTA Copyright 2012 From imagination to impact Summary of Deployment • Rolling upgrade is common deployment strategy • Introduces requirements for consistency among – Different versions of the same service – Different services – Services and persistent data • Other deployment considerations include – Canary deployment – A/B testing – Rollback 79
  80. NICTA Copyright 2012 From imagination to impact Question 80
  81. NICTA Copyright 2012 From imagination to impact Zookeeper • What purpose does Zookeeper serve? • Use cases – Leader election – Group membership – Distributed locks – Synchronization – Configuration • In our case, we will use Zookeeper to manage activating features 81
  82. NICTA Copyright 2012 From imagination to impact Distributed applications • Zookeeper provides a guaranteed consistent (mostly) data structure for every instance of a distributed application. – Definition of “mostly” is within eventual consistency lag (but this is small) • Zookeeper deals with managing failure as well as consistency. – Done using Zab, a Paxos-like atomic broadcast protocol. • Zookeeper guarantees that updates are linearly ordered and that each client's requests are processed in FIFO order
  83. NICTA Copyright 2012 From imagination to impact Model • Zookeeper maintains a file-system-like data structure – Hierarchical – Data in every node (called znode) – Amount of data in each node assumed small (<1 MB) – Intended for metadata • Configuration • Location • Group
  84. NICTA Copyright 2012 From imagination to impact Zookeeper znode structure / <data> /b1 <data> /b1/c1 <data> /b1/c2 <data> /b2 <data> /b2/c1 <data>
  85. NICTA Copyright 2012 From imagination to impact API • Function (type): create (write), delete (write), exists (read), get children (read), get data (read), set data (write), plus others • All calls return atomic views of state: they either succeed or fail, and no partial state is returned. • Writes are also atomic: they either succeed or fail, and a failed write has no side effects.
  86. NICTA Copyright 2012 From imagination to impact Use Case – leader election • Many distributed applications have master (leader)/slave structure – One master, many slaves – Master • Sends work to slaves • Monitors health of slaves and creates new ones as needed.
  87. NICTA Copyright 2012 From imagination to impact Using Zookeeper to elect master • Suppose master fails. Then must create/choose a new master. • All candidates issue “create” call with node name “master”. • Only one of these create requests will succeed, the rest will fail. This is one of the consistency elements enforced by Zookeeper. • Client who successfully creates znode named “master” will become new master.
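The election depends only on create being atomic. The sketch below imitates that single property with an in-memory stand-in so it can run without a server; `FakeZk` and `NodeExistsError` are hypothetical names and are not the real Zookeeper client API.

```python
import threading


class NodeExistsError(Exception):
    pass


class FakeZk:
    """In-memory stand-in for the atomic create call; not the real client API."""

    def __init__(self):
        self._lock = threading.Lock()
        self.nodes = {}

    def create(self, path, data):
        # The check and the write happen under one lock, so they are atomic.
        with self._lock:
            if path in self.nodes:
                raise NodeExistsError(path)
            self.nodes[path] = data


def try_become_master(zk, candidate_id):
    """Return True if this candidate created the master node."""
    try:
        zk.create("/master", candidate_id)
        return True
    except NodeExistsError:
        return False


zk = FakeZk()
results = {}


def campaign(candidate_id):
    results[candidate_id] = try_become_master(zk, candidate_id)


threads = [threading.Thread(target=campaign, args=(f"c{i}",)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
winners = [cid for cid, won in results.items() if won]
```

Which candidate wins depends on thread scheduling, but exactly one create succeeds in every run.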
  88. NICTA Copyright 2012 From imagination to impact Using Zookeeper to manage group membership • App connects to zookeeper – Get list of zookeeper servers – Create session (if server fails – automatic fail over) • Known group name – Create /group_name • If already exists get a failure • Client joins group by creating /group_name/my_id • Client can list children of /group_name and get members of group. • Watcher will inform client if group members fail or leave.
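Group membership can be modelled in a few lines if member nodes are tied to a client session and disappear when that session ends. The in-memory `FakeZk` below is a hypothetical stand-in, not the real client API, and its watches fire on every change, which is a simplification made for the example.

```python
class FakeZk:
    """In-memory model of group nodes and child watches (illustrative only)."""

    def __init__(self):
        self.nodes = {}        # path -> owning session (None if not session-bound)
        self.watchers = {}     # parent path -> list of callbacks

    def create(self, path, session=None):
        if path in self.nodes:
            return False       # node already exists
        self.nodes[path] = session
        self._notify(path)
        return True

    def children(self, parent):
        prefix = parent.rstrip("/") + "/"
        return sorted(
            p[len(prefix):] for p in self.nodes
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def watch_children(self, parent, callback):
        # Simplification: this watch stays registered after it fires.
        self.watchers.setdefault(parent, []).append(callback)

    def close_session(self, session):
        """Simulate a client failing or leaving: its nodes vanish."""
        owned = [p for p, s in self.nodes.items() if s is not None and s == session]
        for path in owned:
            del self.nodes[path]
            self._notify(path)

    def _notify(self, path):
        parent = path.rsplit("/", 1)[0]
        for callback in self.watchers.get(parent, []):
            callback(self.children(parent))


zk = FakeZk()
events = []
zk.create("/workers")
zk.watch_children("/workers", events.append)
zk.create("/workers/a", session="sess-a")
zk.create("/workers/b", session="sess-b")
zk.close_session("sess-a")     # member a fails or leaves
```

Each entry in `events` is the member list a watching client would see after a join or a departure.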
  89. NICTA Copyright 2012 From imagination to impact Using Zookeeper to manage distributed locks - 1 • Naïve solution – All clients attempt to create /lockname – Successful client has lock. • Client will delete znode when finished with lock • Znode will be deleted if client fails – Unsuccessful clients will watch /lockname. If it is deleted then they will attempt to create it. – Repeat
  90. NICTA Copyright 2012 From imagination to impact Distributed locks – 2 • Problem with naïve solution is “herd effect”. – If many clients all wake up and try to grab lock at once there will be an impact on the system load. • Better solution is for each client to watch its predecessor. – Each client creates a sequence-numbered znode under /lockname; Zookeeper enforces the order – When the predecessor's znode is deleted, the client acquires the lock. – If predecessor fails, client is informed and will watch predecessor’s predecessor. Etc.
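The difference from the naïve solution is that a deletion wakes one client instead of all of them. The sketch below models that with an in-memory queue, assuming each waiting client holds a sequence-numbered node and watches the nearest live node below its own; `LockQueue` is a hypothetical model, not the real client API.

```python
class LockQueue:
    """In-memory model of sequence-numbered lock nodes (illustrative only)."""

    def __init__(self):
        self._next_seq = 0
        self.waiting = {}      # sequence number -> client name
        self.wakeups = []      # (client woken, node whose deletion woke it)

    def enqueue(self, client):
        """Create a sequence-numbered node for the client."""
        seq = self._next_seq
        self._next_seq += 1
        self.waiting[seq] = client
        return seq

    def holder(self):
        """The client with the lowest sequence number holds the lock."""
        return self.waiting[min(self.waiting)] if self.waiting else None

    def predecessor(self, seq):
        """The node a client watches: the highest live sequence below its own."""
        lower = [s for s in self.waiting if s < seq]
        return max(lower) if lower else None

    def remove(self, seq):
        """Release the lock or fail: only the direct successor is woken."""
        successors = [s for s in self.waiting if s > seq]
        del self.waiting[seq]
        if successors:
            self.wakeups.append((self.waiting[min(successors)], seq))


q = LockQueue()
a, b, c = (q.enqueue(name) for name in ("A", "B", "C"))
holder_before = q.holder()
q.remove(b)                    # B fails while waiting; only C is woken
pred_of_c = q.predecessor(c)   # C now watches A, its predecessor's predecessor
q.remove(a)                    # A releases the lock; only C is woken
holder_after = q.holder()
```

Every entry in `wakeups` names a single client, so the load per deletion stays constant however many clients are waiting.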
  91. NICTA Copyright 2012 From imagination to impact Using Zookeeper for distributed synchronization • Create new synchronization client. – It creates synchronization node – Other clients register on synchronization node at beginning of computation. – At end of computation they remove themselves from synchronization node • Synchronization client watches clients that have registered themselves. If one fails, it removes it from synchronization node. • When synchronization node is empty, synchronization client deletes it and other clients (who are watching) can proceed.
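The synchronization scheme behaves like a barrier that opens when its member list becomes empty. The sketch below models it in memory with threads; `SyncNode` is a hypothetical stand-in, and the third worker is treated as failed so that the synchronization client has to remove it.

```python
import threading


class SyncNode:
    """In-memory stand-in for the synchronization node and its members."""

    def __init__(self):
        self._lock = threading.Lock()
        self._members = set()
        self.done = threading.Event()   # set when the node is deleted

    def register(self, client):
        with self._lock:
            self._members.add(client)

    def remove(self, client):
        """Called by a finishing client, or on behalf of a failed one."""
        with self._lock:
            self._members.discard(client)
            if not self._members:
                self.done.set()         # node is empty: release the watchers


node = SyncNode()
finished = []


def worker(name):
    finished.append(name)               # stands in for the real computation
    node.remove(name)


names = ["w1", "w2", "w3"]
for n in names:
    node.register(n)                    # everyone registers before work starts
threads = [threading.Thread(target=worker, args=(n,)) for n in names[:2]]
for t in threads:
    t.start()
for t in threads:
    t.join()
released_early = node.done.is_set()     # w3 is still registered
node.remove("w3")                       # synchronization client removes failed w3
```

Clients waiting to proceed would block on `node.done.wait()` until the last member is removed.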
  92. NICTA Copyright 2012 From imagination to impact Using Zookeeper for configuration Each client records configuration information as data in a child node it creates under a main configuration node. Checking configuration is a matter of getting data from all of the children of the configuration node.
  93. NICTA Copyright 2012 From imagination to impact Using Zookeeper to synchronize activation of features. • Feature manager creates znode containing – <feature flag name, feature flag value> – Written only when all services are available. • Service retrieves feature flag value from znode – if Znode_read_value(feature flag name) then feature is active else feature is inactive • Feature flag value guaranteed to be consistent across services. • Latency is low (order of microseconds) since Zookeeper keeps data structures in memory. 93
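The separation of installation from activation can be sketched as below. The in-memory `FlagStore` stands in for the znode, and `FeatureManager`, the service names, and the `new_checkout` flag are hypothetical; a real deployment would read and write the flag through a Zookeeper client instead.

```python
class FlagStore:
    """In-memory stand-in for the znode that holds feature flags."""

    def __init__(self):
        self._flags = {}

    def write(self, name, value):
        self._flags[name] = value

    def read(self, name):
        # A flag that was never written counts as inactive.
        return self._flags.get(name, False)


class FeatureManager:
    """Writes a flag only once every required service reports the new version."""

    def __init__(self, store, required_services):
        self.store = store
        self.pending = set(required_services)

    def report_installed(self, service):
        self.pending.discard(service)

    def activate(self, flag):
        if self.pending:
            return False          # some service is still on the old version
        self.store.write(flag, True)
        return True


def checkout(store):
    """Hypothetical service code path guarded by the flag."""
    return "new flow" if store.read("new_checkout") else "old flow"


store = FlagStore()
manager = FeatureManager(store, {"cart", "payments"})
manager.report_installed("cart")
early = manager.activate("new_checkout")      # payments not upgraded yet
before = checkout(store)
manager.report_installed("payments")
late = manager.activate("new_checkout")
after = checkout(store)
```

Turning the feature off again during a rollback would be a single write of `False` to the same flag.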
  94. NICTA Copyright 2012 From imagination to impact Summary of tutorial • DevOps practices lead to requirement to minimize inter team coordination • Continuous deployment has no human intervention from developer commit until deployment to production • Micro SOA architectural style determines or delegates 5 of 7 design decision categories • Deployment strategies raise issues of consistency. Separation of installation and activation enables turning features on or off. • Zookeeper is one tool to manage synchronizing the activations of features. 94
  95. NICTA Copyright 2012 From imagination to impact NICTA Team • Anna Liu • Alan Fekete • Min Fu • Daniel Sun • Hiroshi Wada • Ingo Weber • Xiwei Xu • Liming Zhu 95
  96. NICTA Copyright 2012 From imagination to impact Readings • http://www.slideshare.net/lenbass/what-is-dev-ops-for-review • http://www.slideshare.net/lenbass/02-team-practcies-and-overall-architecture • http://www.slideshare.net/lenbass/03-build-structure-and-testing • NICTA research papers: https://ssrg.nicta.com.au/projects/cloud/ 96
